Cluster A – Exam Question Innovation – Smart Tasks

Cluster A: Exam Question Innovation for commercial vocational examination tasks

The dashboard below presents the empirical findings on the evaluation and innovation of commercial vocational examination tasks in the context of artificial intelligence. Drawing on central explanatory models and a traffic-light classification, it illustrates which task types remain suitable for examination practice.

Data basis & AI analysis process

The empirical basis of the study comprises N = 102 examination tasks. In the AI analysis process, a standardised test prompt was applied to the tasks across a total of 14 AI systems. The prompt asked each AI to evaluate the tasks from a didactic perspective rather than merely solve them. The generated results were then aggregated and analysed.

Examination suitability & traffic-light classification
Green: 26 tasks (25.5%)
Yellow: 52 tasks (51.0%)
Red: 24 tasks (23.5%)
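
The percentages in the traffic-light distribution follow directly from the category counts and the total of N = 102 tasks. The short sketch below recomputes them as a plausibility check; the variable names are illustrative and not taken from the study.

```python
# Counts per traffic-light category, as reported in the dashboard.
counts = {"Green": 26, "Yellow": 52, "Red": 24}

# The counts should add up to the N = 102 examined tasks.
total = sum(counts.values())

# Share of each category in percent, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label}: {n} tasks ({shares[label]:.1f}%)")
```

Roughly half of the tasks fall into the Yellow category, with the remainder split almost evenly between Green and Red.
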
Central effect model

Empirically significant effects on the overall evaluation of an examination task:

1. Judgement competence

The more human judgement a task requires, the more suitable it is for examination purposes.

2. Openness

The more open-ended a task is, the more likely it is to remain suitable for examinations despite AI.

3. AI susceptibility to error

The more often AI reaches its limits and produces errors on a task, the more suitable the task is for examination use.

Non-significant examined effects

Characteristics without empirical significance in the model:

Reflective competence, problem-solving competence, numeracy/data literacy, reading competence, integrative competence, contextual embeddedness, need for argumentation, need for interpretation

Assessment logic of the traffic-light scheme

Green: suitable for examination

High openness, argumentative contextualisation, and strong judgement and reflective competence. AI produces incomplete solutions and is prone to error.

Yellow: suitable for examination with limitations

Partially open-ended. AI provides superficial answers. High reading competence is required, and the AI-generated solution must be corrected or expanded.

Red: unsuitable for examination

Closed tasks, whether knowledge-based or computational. Numeracy and data literacy demands are low. AI generates consistent, complete solutions.
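
The assessment logic above can be read as a simple decision rule. The following is a minimal sketch under stated assumptions, not the study's actual classification procedure, which the dashboard does not describe. It reduces each task to two hypothetical qualitative ratings (`openness` and `ai_solution`) and treats every mixed case as Yellow, which is a simplification.

```python
from enum import Enum


class Light(Enum):
    GREEN = "suitable for examination"
    YELLOW = "suitable for examination with limitations"
    RED = "unsuitable for examination"


# Hypothetical rating scales, chosen for illustration only.
OPENNESS_LEVELS = {"closed", "partial", "open"}
AI_SOLUTION_LEVELS = {"complete", "superficial", "incomplete"}


def classify(openness: str, ai_solution: str) -> Light:
    """Map two illustrative ratings of a task to a traffic-light category."""
    if openness not in OPENNESS_LEVELS:
        raise ValueError(f"unknown openness rating: {openness!r}")
    if ai_solution not in AI_SOLUTION_LEVELS:
        raise ValueError(f"unknown AI solution rating: {ai_solution!r}")

    # Green: highly open task on which AI output is incomplete or error-prone.
    if openness == "open" and ai_solution == "incomplete":
        return Light.GREEN
    # Red: closed task that AI solves completely.
    if openness == "closed" and ai_solution == "complete":
        return Light.RED
    # Assumption: all remaining combinations count as Yellow.
    return Light.YELLOW


print(classify("partial", "superficial").value)
```

A fuller model would also need a rating for judgement competence, which the effect model identifies as significant; it is left out here to keep the rule readable.
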