Smart Tasks
Cluster A: Exam Question Innovation
for commercial vocational examination tasks
The dashboard below presents the empirical findings on the evaluation and innovation of commercial vocational examination tasks in the context of artificial intelligence. Drawing on central explanatory models and a traffic-light classification, it illustrates which task types remain suitable for examination practice.
The empirical basis of the study comprises N = 102 examination tasks. In the AI analysis, a standardised test prompt was applied across a total of 14 AI systems. The prompt asked each system to evaluate the tasks from a didactic perspective rather than merely solve them. The generated results were subsequently aggregated and analysed.
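The aggregation step can be illustrated with a minimal sketch. The study does not state how the results of the 14 systems were combined, so everything here is an assumption made for illustration: the numeric ratings, the use of the arithmetic mean, the function name `aggregate_ratings`, and the toy data.

```python
from statistics import mean


def aggregate_ratings(ratings_by_system):
    """Average the per-task ratings across AI systems.

    ratings_by_system maps a system name to a dict of task id -> numeric rating.
    Returns a dict of task id -> mean rating over the systems that rated the task.
    Hypothetical helper: the mean is an assumed aggregation rule, not the study's.
    """
    collected = {}
    for ratings in ratings_by_system.values():
        for task_id, rating in ratings.items():
            collected.setdefault(task_id, []).append(rating)
    return {task_id: mean(values) for task_id, values in collected.items()}


# Toy data for two hypothetical systems and two hypothetical tasks.
ratings = {
    "system_a": {"task_1": 4, "task_2": 1},
    "system_b": {"task_1": 5, "task_2": 2},
}
aggregated = aggregate_ratings(ratings)
```

In a real analysis the same structure would hold one entry per AI system and one rating per examined task; a median or a categorical majority vote would be equally plausible aggregation rules.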
Empirically significant effects on the overall evaluation of an examination task:
1. Judgement competence
The more human judgement a task requires, the more suitable it is for examination purposes.
2. Openness
The more open-ended a task is, the more likely it is to remain suitable for examinations despite AI.
3. AI susceptibility to error
The more readily AI reaches its limits on a task, the more suitable the task is for examination use.
Characteristics without empirical significance in the model:
Traffic-light classification of task types:
Green: suitable for examination
High openness, argumentative contextualisation, and strong judgement and reflective competence. AI produces incomplete solutions and is prone to error.
Yellow: suitable for examination with limitations
Partially open-ended. AI provides superficial answers. High reading competence is required, and the AI-generated solution must be corrected or expanded.
Red: unsuitable for examination
Closed, knowledge-based, or computational tasks with low demands on numeracy and data literacy. AI provides consistent and complete solutions.
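As a rough illustration, the traffic-light logic can be sketched as a rule that combines the three significant characteristics (judgement competence, openness, AI susceptibility to error). The 0 to 1 scales, the equal weighting, the thresholds, and the names `TaskProfile` and `traffic_light` are all hypothetical; the study does not publish a scoring formula, so this is not the authors' model.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    # All three values use an assumed 0..1 scale (hypothetical, not from the study).
    judgement: float  # degree of human judgement the task requires
    openness: float   # how open-ended the task is
    ai_error: float   # how error-prone AI solutions to the task are


def traffic_light(profile, green_min=0.66, yellow_min=0.33):
    """Map a task profile to 'green', 'yellow', or 'red'.

    Uses an unweighted mean of the three characteristics; both the
    weighting and the thresholds are illustrative assumptions.
    """
    score = (profile.judgement + profile.openness + profile.ai_error) / 3
    if score >= green_min:
        return "green"
    if score >= yellow_min:
        return "yellow"
    return "red"


open_case_study = TaskProfile(judgement=0.9, openness=0.8, ai_error=0.7)
partly_open_task = TaskProfile(judgement=0.5, openness=0.4, ai_error=0.5)
closed_calculation = TaskProfile(judgement=0.1, openness=0.0, ai_error=0.2)
```

Under these assumed values, the open case study is classified as green, the partly open task as yellow, and the closed calculation as red, mirroring the direction of the three effects listed above.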