Smart Tasks
Cluster A: Exam Question Innovation
for commercial vocational examination tasks
The dashboard below presents the empirical findings on the evaluation and innovation of commercial vocational examination tasks in the context of artificial intelligence. Drawing on central explanatory models and a traffic-light classification, it illustrates which task types remain suitable for examination practice.
The empirical basis of the study comprises N = 102 examination tasks. A standardised test prompt was applied across a total of 14 AI systems, asking the AI to evaluate the tasks from a didactic perspective rather than merely solve them. The generated results were subsequently aggregated and analysed.
Empirically significant effects on the overall evaluation of an examination task:
1. Judgement competence
The more human judgement a task requires, the more suitable it is for examination purposes.
2. Openness
The more open-ended a task is, the more likely it is to remain suitable for examinations despite AI.
3. AI susceptibility to error
The more readily AI reaches its limits on a task, the more suitable the task is for examination use.
Characteristics without empirical significance in the model:
Green: suitable for examination
High openness, argumentative contextualisation, and strong judgement and reflective competence. AI produces incomplete solutions and is prone to error.
Yellow: suitable for examination with limitations
Partially open-ended. AI provides superficial answers. High reading competence is required, and the AI-generated solution must be corrected or expanded.
Red: unsuitable for examination
Closed, knowledge-based or computational tasks. Low numeracy and data literacy demands. AI generates consistent and complete solutions.
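The traffic-light logic can be sketched in a few lines of Python. This is only an illustration of how the three significant characteristics might be combined. The 0-to-1 rating scale, the equal weighting and the two thresholds are hypothetical assumptions and are not taken from the study's model.

```python
from dataclasses import dataclass


@dataclass
class TaskRating:
    """Hypothetical 0..1 ratings for the three significant characteristics."""
    judgement: float  # how much human judgement the task requires
    openness: float   # how open-ended the task is
    ai_error: float   # how error-prone AI solutions to the task are


def traffic_light(rating: TaskRating,
                  green_at: float = 2 / 3,
                  yellow_at: float = 1 / 3) -> str:
    """Map a rating to 'green', 'yellow' or 'red'.

    ASSUMPTION: equal weights and fixed thresholds, chosen for illustration only.
    """
    score = (rating.judgement + rating.openness + rating.ai_error) / 3
    if score >= green_at:
        return "green"
    if score >= yellow_at:
        return "yellow"
    return "red"


print(traffic_light(TaskRating(judgement=0.9, openness=0.8, ai_error=0.9)))
print(traffic_light(TaskRating(judgement=0.1, openness=0.1, ai_error=0.0)))
```

An open, judgement-heavy task on which AI often fails lands in the green band, while a closed computational task that AI solves completely lands in the red band.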
The Augmentation Trap Model
A dynamic analysis of AI productivity and the erosion of expertise
Introduction
The use of artificial intelligence can raise productivity in the short run, but it can also weaken workers' expertise when core cognitive processes are delegated to the tool. Caosun and Aral (2026) develop a dynamic model that captures this tension between immediate productivity gains and skill erosion. Two parameters are central: the skill-neutral productivity effect α, which is independent of the user's expertise, and the knowledge-complementary productivity effect β, which scales with expertise. The delegation intensity u describes how much of the task is handed over to AI.
Production function
The production function combines a human contribution that declines as AI delegation increases with a productivity effect of AI use that depends on usage intensity and the worker's skill level. S denotes the worker's current skill, while u denotes the share of the task delegated to AI. The first component (1 - u)S is the remaining human contribution: the more work is delegated, the smaller this contribution becomes. The second component (α + βS - γu)u captures the productivity effect of AI use. Here, α is the skill-independent gain from tasks that the AI can handle largely on its own. β captures the knowledge-complementary gain: cases in which the quality of AI output depends on the worker's judgment. γ imposes diminishing marginal returns to AI use, because the easiest-to-delegate tasks are transferred to the AI first.
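Written out, and assuming the two components are simply added, the production function described above takes the following form. The symbol Y for output is introduced here for illustration.

```latex
\begin{aligned}
Y(S,u) &= (1-u)\,S + (\alpha + \beta S - \gamma u)\,u \\
       &= S + \bigl(\alpha + (\beta - 1)\,S - \gamma u\bigr)\,u
\end{aligned}
```

The second line follows by expanding and collecting terms. It shows that the net effect of delegation depends on skill only through (β - 1)S: each delegated unit gives up S of human contribution and returns βS through the AI channel.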
The skill dynamic arises because high delegation reduces opportunities for independent practice. The optimal delegation policy u*(S) therefore depends not only on α, β and γ, but also on the skill recovery or forgetting rate κ and the decision-maker's evaluation horizon. When β > 1, complementarity dominates and experts gain more from AI use. When β < 1, AI is more substitutive, giving more experienced workers weaker incentives to delegate.
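How the sign of β - 1 shapes delegation incentives can be illustrated with a simplified, one-period version of the problem. The sketch below maximises current output only, assuming the additive production function (1 - u)S + (α + βS - γu)u and ignoring the skill dynamic and the evaluation horizon. It is therefore not the dynamic policy u*(S) from the paper, and the function name and parameter values are illustrative assumptions.

```python
def myopic_delegation(S: float, alpha: float, beta: float, gamma: float) -> float:
    """Delegation share that maximises one-period output only.

    Setting the derivative of (1 - u)*S + (alpha + beta*S - gamma*u)*u with
    respect to u to zero gives u = (alpha + (beta - 1)*S) / (2*gamma).
    Output is concave in u for gamma > 0, so clipping to [0, 1] is optimal.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive for an interior optimum")
    u = (alpha + (beta - 1.0) * S) / (2.0 * gamma)
    return min(1.0, max(0.0, u))


# Illustrative parameter values, not estimates from the paper.
for beta in (1.5, 1.0, 0.5):
    shares = [round(myopic_delegation(S, alpha=0.5, beta=beta, gamma=1.0), 3)
              for S in (0.2, 0.6, 1.0)]
    print(f"beta={beta}: delegation at S=0.2, 0.6, 1.0 -> {shares}")
```

In this simplified setting the delegation share rises with skill when β > 1, is flat when β = 1, and falls with skill when β < 1.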
The two AI productivity channels
Skill-neutral channel α
This component delivers value independently of the user's expertise. Examples include the automated drafting of routine text, standard data preparation or form completion. It can raise output even for novices, but it contributes little to learning or skill formation.
Knowledge-complementary channel β
This component increases with the user's expertise. Experienced users can direct, evaluate and refine the tool more effectively, producing higher-quality results. Examples include complex programming, diagnostic reasoning or research-intensive work.
Skill recovery rate κ
This parameter describes how quickly skills are rebuilt through active practice or lost under sustained delegation. Deliberate practice, mentoring and unassisted work phases can reduce the long-run cost of cognitive offloading.
Key results of the model
Steady-state loss
Even when AI raises short-run output, the long-run steady state can fall below the no-AI benchmark because the worker's expertise erodes over time.
Augmentation trap
When decision-makers have short evaluation horizons or ignore the private value of skill, they may choose excessive delegation, shifting the long-run cost of skill atrophy to workers.
Complementarity vs. substitution
When β > 1, AI and human expertise reinforce each other. When β = 1, the AI effect is skill-neutral. When β < 1, AI substitutes for skill and changes who delegates most.
Skill divergence
Especially under low β, the workforce can split: experienced workers preserve expertise and continue to benefit, while less experienced workers delegate more and may deskill.
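The steady-state loss and the role of the evaluation horizon can be made concrete with a small simulation. The sketch assumes the additive production function and a constant delegation share. The skill update rule, in which skill moves a fraction κ of the way towards the undelegated share 1 - u each period, is a hypothetical stand-in and not the authors' law of motion. All parameter values are illustrative.

```python
def output(S, u, alpha, beta, gamma):
    """One-period output for skill S and delegation share u (additive form assumed)."""
    return (1 - u) * S + (alpha + beta * S - gamma * u) * u


def simulate(u, alpha, beta, gamma, kappa, periods, S0=1.0):
    """Per-period outputs under a constant delegation share u.

    ASSUMPTION: skill adjusts by S += kappa * ((1 - u) - S), so practice on the
    undelegated share rebuilds skill and sustained delegation erodes it.
    """
    S, outputs = S0, []
    for _ in range(periods):
        outputs.append(output(S, u, alpha, beta, gamma))
        S += kappa * ((1 - u) - S)
    return outputs


params = dict(alpha=0.3, beta=1.0, gamma=0.5, kappa=0.1)  # illustrative values only
with_ai = simulate(u=0.3, periods=200, **params)
no_ai = simulate(u=0.0, periods=200, **params)

print(f"first period: AI {with_ai[0]:.3f} vs no AI {no_ai[0]:.3f}")
print(f"long run:     AI {with_ai[-1]:.3f} vs no AI {no_ai[-1]:.3f}")
for horizon in (2, 4, 5, 10):
    gain = sum(with_ai[:horizon]) - sum(no_ai[:horizon])
    print(f"horizon {horizon:>2}: cumulative gain from delegating = {gain:+.3f}")
```

Under these assumptions, delegation raises output in the first periods but the long-run level settles below the no-AI benchmark. A decision-maker who sums output over only a few periods sees a gain, while a longer horizon reveals a loss.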
Five regimes of AI deployment
The interaction of α, β and the skill recovery rate κ partitions the parameter space into five regimes with different long-run implications for output and expertise:
Region I: Non-adoption
α and β are too low to justify AI use. The productivity gain does not offset coordination and delegation costs, so no adoption is optimal.
Region II: Augmentation (worse off)
Between the adoption boundary C0 and the long-run break-even boundary B, AI use is attractive in the short run, but the long-run state is worse than the no-AI benchmark.
Region III: Automation (worse off)
AI is productive enough to justify full automation (u = 1), but its raw output α remains below human potential. Expertise collapses and long-run output falls.
Region IV: Augmentation (better off)
With high β, AI complements human judgment. Skill remains valuable, and long-run productivity exceeds the no-AI benchmark.
Region V: Automation (better off)
The skill-neutral AI contribution α is high enough to dominate human potential, making full automation both rational and beneficial for highly standardized tasks.
Design implications
To preserve the long-run benefits of AI and avoid the augmentation trap, the authors point to design and governance choices that raise complementarity and protect skill formation.
Source: Caosun, Michael & Aral, Sinan (2026): The Augmentation Trap: AI Productivity and the Cost of Cognitive Offloading, arXiv (Cornell University). DOI: 10.48550/arxiv.2604.03501