MEASUREMENTS & ANALYTICS
Figure 1
For far too long, we in learning & development (L&D) have faced the persistent challenge of proving our value. It’s not for a lack of desire, but often a lack of time and resources.
Squeezed by the demands of learning design and delivery, we often under-invest in measurement, typically spending only 15% of our time on it.
This creates a significant disconnect with the C-suite. While 75% of CEOs believe L&D analytics are important, only 25% feel they receive sufficient information about how learning drives business results, reduces risk and strengthens workforce capability. This gap between our activity and executive expectation has persisted for years.
How can we finally close this measurement expectations gap? Artificial Intelligence (AI) offers some surprising new answers.
Every L&D professional knows the “Kirkpatrick/Phillips conundrum.” We can confidently say “yes” when it comes to measuring Level 1 (reaction) and Level 2 (learning). We hit “sometimes” for Level 3 (behavior). But when we get to Level 4 (impact) and Level 5 (return on investment or ROI) – the very things our leaders care about most – our answer becomes a deflating “rarely” or “nope” (Figure 1).
AI finally offers us a practical, powerful way to shatter this long-standing impasse. It does this not just by making us faster, but by also making us smarter.
Its dual benefits are resource augmentation, which increases the efficiency of our measurement activities, and targeted expertise, which injects modern measurement knowledge into our process. This combination is transformative.
AI doesn’t just help us do old measurement tasks more quickly; it injects the specialized expertise needed to perform entirely new, more sophisticated analysis that was previously out of reach for most teams. It makes measuring all five levels not just a theoretical goal, but an achievable reality.
Designing and analyzing learner surveys is an area where AI delivers immediate value. An L&D professional can prompt an AI to create a performance-focused survey, using its knowledge base to suggest relevant, high-impact question types like “intent to apply” or “relevance of course material.”
After collecting learner responses, the same AI can process the data, automatically generating histogram charts for quantitative results. But its most powerful capability is in analyzing open-response questions.
AI is excellent at performing sentiment analysis on responses to these questions. The output is a concise summary of key insights, broken down into “positive” and “opportunity” categories. This transforms raw feedback into strategic intelligence almost instantly, giving us a clear, immediate understanding of a learning program’s strengths and weaknesses.
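To make the quantitative step concrete, here is a minimal Python sketch of tallying 1-5 ratings for a single survey item into a text histogram. The sample ratings and the `rating_histogram` and `render` helpers are illustrative assumptions, not the output of any particular AI tool.

```python
from collections import Counter

def rating_histogram(ratings, scale=(1, 2, 3, 4, 5)):
    """Count how many learners chose each point on the rating scale."""
    counts = Counter(ratings)
    # Include scale points nobody chose so gaps stay visible.
    return {point: counts.get(point, 0) for point in scale}

def render(histogram):
    """Render the counts as one text bar per scale point."""
    return "\n".join(f"{point}: {'#' * n} ({n})" for point, n in histogram.items())

# Hypothetical responses to an "intent to apply" item (1 = low, 5 = high).
intent_to_apply = [5, 4, 4, 3, 5, 2, 4, 5, 4, 3]
hist = rating_histogram(intent_to_apply)
print(render(hist))
```

The same counts could feed a charting tool; the text bars simply keep the sketch dependency-free.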
Today, more than 95% of life sciences exam questions are multiple-choice or other closed-end question types (like true/false or all that apply). Multiple-choice questions are great for testing recall and some level of understanding, but we know that it is difficult to write multiple-choice questions that test more complex reasoning, critical thinking and decision making.
So, why do we continue to use them? Because they are easy to write and score. Open-response (essay) questions are, of course, much better at probing for higher levels of understanding, but we don’t tend to use them because we don’t want to spend the time and effort to score them.
Consider the difference between a straightforward multiple-choice question and an AI-generated open-response question about the human heart.
A basic multiple-choice question is: Blood returning from the body enters which chamber of the heart first?
A. Left atrium
B. Left ventricle
C. Right atrium
D. Right ventricle
An AI-generated open-response question is: “Explain the path that blood takes as it flows through the heart. In your response, include the names of the chambers, valves and major blood vessels involved, and describe whether the blood is oxygenated or deoxygenated at each stage.”
The AI-generated question is undeniably more comprehensive and requires a higher level of understanding of cardiovascular function to answer correctly.
Having created the question, AI can next be prompted to create a scoring rubric, associating scores (for example, 1-5) with key expected elements of a learner response. To aid human scorers and ensure consistent grading, it can also generate “anchor” responses – sample answers that exemplify what a 1-, 2-, 3-, 4- or 5-point answer would look like.
But can the AI accurately score the student responses itself? Our own experiments with AI scoring suggest that it does an accurate job, usually scoring as well as human raters, especially if the AI has first been primed with the scoring rubric and anchor responses.
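One way to picture such a rubric is as a small data structure: a list of expected elements plus a rule that maps the number of elements covered to a 1-5 score. The Python sketch below is an assumption for illustration only. The element names are drawn from the sample heart question, the `Rubric` class and its scoring rule are hypothetical, and deciding whether a response actually covers an element would still fall to a human or AI rater.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rubric:
    question: str
    expected_elements: tuple  # key points a full answer should cover
    max_score: int = 5

    def score(self, elements_covered):
        """Map the set of covered elements to a score from 1 to max_score."""
        valid = set(elements_covered) & set(self.expected_elements)
        # Hypothetical rule: 1 baseline point plus a proportional share
        # of the remaining points, rounded down.
        return 1 + (len(valid) * (self.max_score - 1)) // len(self.expected_elements)

heart_rubric = Rubric(
    question="Explain the path that blood takes as it flows through the heart.",
    expected_elements=(
        "chambers",
        "valves",
        "major blood vessels",
        "oxygenation at each stage",
    ),
)

# A response judged to name the chambers and valves, but nothing else.
partial_score = heart_rubric.score({"chambers", "valves"})
```

Anchor responses could be stored alongside the rubric in the same way, one sample answer per score point.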
This is a game-changer because it allows test authors to create, at scale, robust exams that assess beyond the level of recall.
While AI is a powerful tool, it is crucial to remember the principle of caveat emptor (buyer beware). AI is not infallible. A real-world example from Dallas school officials highlights this: When state tests were graded by AI, scores came in lower than expected, based upon previous years’ results. Upon human review, 2,000 of the 4,600 submitted samples received a higher score. Imagine the compliance implications if 2,000 employees were incorrectly assessed on a critical certification due to an unverified AI scoring model!
This underscores the need for a “human-in-the-loop” (HITL) approach: a collaborative model where humans provide continuous feedback and oversight. This partnership is essential for responsible implementation and delivers several key benefits:
Improved accuracy and reliability: Human feedback helps to correct errors and biases in AI models.
Increased adaptability: HITL systems can learn and adapt to new situations and contexts through human feedback.
Enhanced trust and accountability: Human involvement ensures that AI systems remain transparent and accountable for their outputs.
Ultimately, AI is an incredibly powerful assistant, not an infallible oracle. Our oversight is the non-negotiable reality check that ensures its outputs are accurate, fair and effective.
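A human-in-the-loop check can be as simple as having people re-score a sample of AI-scored responses and measuring how often the two agree. The following Python sketch assumes paired AI and human scores on the same 1-5 scale. The `audit_ai_scores` function, the sample data and the optional tolerance are hypothetical choices, not a standard method.

```python
def audit_ai_scores(ai_scores, human_scores, tolerance=0):
    """Compare AI and human scores for the same responses.

    Returns the agreement rate and the indexes of responses whose scores
    differ by more than `tolerance`, so a human can review them.
    """
    if len(ai_scores) != len(human_scores):
        raise ValueError("each AI score needs a matching human score")
    if not ai_scores:
        raise ValueError("no scores to audit")
    flagged = [
        i for i, (ai, human) in enumerate(zip(ai_scores, human_scores))
        if abs(ai - human) > tolerance
    ]
    agreement = 1 - len(flagged) / len(ai_scores)
    return agreement, flagged

# Hypothetical audit sample: eight responses scored by both AI and a human.
ai = [3, 4, 2, 5, 3, 1, 4, 2]
human = [3, 4, 3, 5, 4, 1, 4, 4]
agreement, flagged = audit_ai_scores(ai, human)
print(f"Exact agreement: {agreement:.1%}; review responses {flagged}")
```

For scale, the Dallas example works out to 2,000 / 4,600, or roughly 43% of the submitted samples, being scored higher on human review. An audit like this, run on a sample before scores are released, is one way to surface that kind of gap early.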
Figure 2
With AI as a partner, we can finally break free from the measurement conundrum, spending less time creating and scoring Level 1 surveys and Level 2 assessments and more time architecting organizational impact studies. The integration of AI into learning measurement has the potential to fundamentally reshape our role. By automating the manual, time-intensive tasks of data collection, analysis and assessment design, AI frees us to focus on higher-value strategic work (Figure 2).
This isn’t a story of replacement, but of evolution. The shift is away from manual execution and toward strategic design and oversight. As Scott Wu, co-founder of Cognition, puts it:
“Within our lifetime, engineers will go from bricklayers to architects, focusing on the creativity of designing systems rather than the manual labor of putting them together.”
The same is true for L&D. How will you start shifting your focus from “bricklaying” to “architecture” in your own learning practice?
Steven Just, Ed.D., is CEO and principal consultant at Princeton Metrics. Email Steven at sjust@princetonmetrics.com or connect through LinkedIn at linkedin.com/in/steven-just-081b76.
James Delaney is the founder and principal consultant of Talent Experience Group. Email Jim at jdelaney@talentexperiencegroup.com or connect through linkedin.com/in/jdlearning.