METRICS & ANALYSIS
By Steven Just, Ed.D.
We all know that life sciences products emerge from a rigorous, scientific process. Research results must be valid and reliable and stand up to external scrutiny. The consequences of a non-rigorous research & development (R&D) process can be substantial.
It’s reasonable to recognize that learning & development (L&D) is not R&D. But, as learning professionals, shouldn’t we attempt to be scientific and rigorous in our practice as well?
This is especially true when we are administering assessments. If we are giving exams with potential job or career consequences, shouldn’t we be able to demonstrate that our exams are fair, valid and reliable? And if we can’t, have we not potentially placed the company in legal jeopardy?
We know from experience that companies vary widely in how seriously they treat exams. At one end of the spectrum, some companies treat their assessments as mild annoyances to be dispensed with as effortlessly and as quickly as possible (a check-the-box approach), while at the other end some companies follow rigorous, defensible processes.
So, what makes a process defensible? These are the questions (with answers!) I most frequently encounter:
What is validity?
There are several types of validity: construct, predictive, content and face. For the most part, we are concerned with content validity, which is the adequacy with which a domain of content is tested. Importantly, content validity is not a quantitative measure. It flows from a process, including properly constructed questions, domain coverage, setting a valid passing score, ensuring fairness and using a “sensible” method of testing.
What is reliability?
Unlike validity, reliability is a number. It varies from 0 (no reliability) to 1 (perfect reliability). It refers to “replicability.”
That is, if we gave the same test again to the same or an equivalent group of people, would we get the same results? Since you almost never give the same exam to the same people twice, there are statistical methods for calculating reliability from a single exam delivery.
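The article doesn’t name a specific statistic, but the most common single-administration estimate is Cronbach’s alpha (equivalent to KR-20 when items are scored right/wrong). Here is a minimal sketch, using a hypothetical score matrix, of how that estimate is computed:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Internal-consistency reliability: 0 = no reliability, 1 = perfect reliability."""
    k = scores.shape[1]                          # number of questions
    item_vars = scores.var(axis=0, ddof=1)       # variance of each question's scores
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of test-takers' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 test-takers x 4 questions, scored 1 (correct) or 0 (incorrect)
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
])
print(f"reliability (alpha) = {cronbach_alpha(scores):.2f}")  # 0.80 for this toy data
```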
Can an exam be reliable but not valid?
Yes. For example, if we give a calculus test to a second-grade class, the results will be very poor (to say the least). If, two days later, we gave the same exam to these students, the results would be the same. This exam is reliable, but not valid. (It doesn’t have construct validity.) But for an exam to be valid, it must also be reliable.
Does every exam need to be validated?
No, only exams with consequences (also known as high-stakes exams). If you are using assessments for quizzing, pre-testing, gamification, adaptive questioning, module-level tests or for diagnosing learning gaps and targeting remediation, the assessment does not need to be validated.
That said, the questions in any assessment, including low-stakes assessments, should be written according to sound question-writing rules.
Is our passing score defensible?
This is very important. It depends on how you arrived at that number. Was your reasoning something like: “Well, that’s what it’s always been at our company, so that’s what I use”? If it is, you could be placing your company in legal jeopardy.
If exam failure has job or career consequences, you need to be able to justify that number, and the only way to justify a passing score is through a validated, legally defensible standard-setting process.
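The article doesn’t prescribe a particular standard-setting method, but one widely used, defensible approach is the modified Angoff procedure: subject-matter experts estimate, for each question, the probability that a minimally qualified candidate would answer it correctly, and the passing score is derived from those judgments. A minimal sketch, with hypothetical ratings:

```python
# Hypothetical modified-Angoff ratings: for each question, each of three
# subject-matter experts estimates the probability (0 to 1) that a minimally
# qualified candidate would answer correctly.
angoff_ratings = {
    "Q1": [0.90, 0.85, 0.80],
    "Q2": [0.60, 0.70, 0.65],
    "Q3": [0.75, 0.80, 0.70],
}

# The recommended cut score is the average expected score of that
# minimally qualified candidate across all questions.
item_means = [sum(ratings) / len(ratings) for ratings in angoff_ratings.values()]
cut_score = sum(item_means) / len(item_means)
print(f"Recommended passing score: {cut_score:.0%}")  # 75% for these ratings
```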
How long should an exam be?
The simple answer: as long as it needs to be to cover the important content. I have validated more than a hundred product information or package insert (PI) certification exams. Typically, I find that an exam in the 40-question range will cover a PI. Other exams could be shorter or longer depending on the content being tested.
As a rule of thumb, I generally limit exams to no more than 40 or so questions. If more questions are required to cover the content, then I break the exam into two parts.
Should exams be timed?
For certification exams, the answer is absolutely yes. Since most exams are delivered remotely and are not proctored, if you do not time the exam, it effectively becomes an open-book exam.
Timing the exam as a whole (one overall time limit) is still equivalent to making the exam open book. Why? Because test-takers will go through the exam, answer the questions they know, and then use the leftover time to look up the answers to the questions they don’t know.
In an unproctored exam, the only way to ensure exam integrity is to time by the question. If your testing system does not have this basic feature, then you may want to think about acquiring a new testing system.
Side note: I have found that many companies use their learning management systems for testing, and these systems frequently lack key features, including question-level timing.
Why shouldn’t I use “all of the above” as a choice?
Two reasons. First, when “all of the above” is a choice, it is more often than not the correct answer. Second, in a multiple-choice question with four or more choices, you only need to know that two of the choices are correct to know that “all of the above” must be the correct answer.
I recommend using “select all that apply” questions when there are multiple correct responses. Be sure to include at least one incorrect response.
Can I use “select all that apply” questions?
Yes, but see above. There is one problem with “all that apply” questions, however: they are more difficult than standard multiple-choice questions (unless you give partial credit, but very few testing systems have this feature).
Here’s a trick: If you want to simplify an “all that apply” question, tell the test-taker how many responses are correct. (“Select the three correct answers from the list below.”)
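For testing systems that do support the partial credit mentioned above, “select all that apply” items are often scored by rewarding correct selections and penalizing incorrect ones. The article doesn’t specify a formula; this is a minimal sketch of one common scheme:

```python
def partial_credit(selected: set, correct: set) -> float:
    """Score in [0, 1]: credit for each correct pick, equal penalty for each wrong pick."""
    hits = len(selected & correct)           # correct options the test-taker picked
    false_picks = len(selected - correct)    # incorrect options the test-taker picked
    return max(0.0, (hits - false_picks) / len(correct))

# Hypothetical item with three correct options out of five
correct = {"A", "C", "E"}
print(round(partial_credit({"A", "C"}, correct), 2))       # 0.67: two of three correct picks
print(round(partial_credit({"A", "B", "C"}, correct), 2))  # 0.33: penalty for the wrong pick
```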
Can I use “none of the above” as a choice?
I’ve heard people say not to use it, but I have no problem with it as a choice if you don’t overuse it. It actually makes the question more difficult.
Why? In a standard multiple-choice question, even if you are not sure of the correct answer, you know one of the choices must be correct. But if you include “none of the above” as a choice, the correct response could be anything, not just one of the listed choices.
What about true/false questions?
T/F questions are a valid question type. However, as we well know, the test-taker has a 50% chance of guessing the correct answer. So you can use them, but sparingly.
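To see why sparing use matters, here is a minimal sketch (all numbers hypothetical) of the chance that a pure guesser reaches a passing score on an exam made entirely of T/F questions; the longer the exam, the smaller that chance:

```python
from math import ceil, comb

def p_pass_by_guessing(n_items: int, passing_fraction: float, p_correct: float = 0.5) -> float:
    """Probability that a test-taker guessing every item reaches the passing score."""
    needed = ceil(n_items * passing_fraction)          # questions needed to pass
    return sum(
        comb(n_items, k) * p_correct**k * (1 - p_correct)**(n_items - k)
        for k in range(needed, n_items + 1)
    )

print(f"{p_pass_by_guessing(10, 0.8):.1%}")  # about 5.5% on a 10-question all-T/F exam
print(f"{p_pass_by_guessing(40, 0.8):.3%}")  # under 0.01% on a 40-question exam
```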
There is a variation on the T/F question that I like: the “T/F because” question, in which you must select why the statement is true or false.
Creating fair, valid and reliable assessments is a science. As learning professionals, we need to incorporate this science into our practice.
Steven Just, Ed.D., is the CEO and principal consultant of Princeton Metrics. Dr. Just can be reached at sjust@princetonmetrics.com.