Although I’ve complained a fair bit about validity and reliability issues in CME assessment, I haven’t offered much on this blog to actually address these concerns. Well, the thought of thousands (and thousands and…) of dear and devoted readers facing each new day with the same, tired CME assessment questions has become too much to bear. That, and I was recently required to do a presentation on guidelines and common flaws in the creation of multiple-choice questions…so I thought I’d share it here.
I’d love to claim these pearls are all mine, but they’re just borrowed. Nevertheless, this slide deck may serve as a handy single-resource when constructing your next assessment (and it contains some cool facts about shark attacks).
I’ve talked a lot about effect size: what it is (here), how to calculate it (here, here and here), what to do with the result (here and here)…and then some about limitations (here). Overall, I’ve been trying to convince you that effect size is a sound (and simple) approach to quantifying the magnitude of CME effectiveness. Now it’s time to talk about how it may be total garbage.
All this effect size talk includes the supposition that the data from which it is calculated is both reliable and valid. In CME, the data source is overwhelmingly survey-based – and the questions within typically include self-efficacy scales, single-correct answer knowledge tests and / or case vignettes. But how do you know that your survey questions actually measure what they're intended to measure (validity) and do so with consistency (reliability)? CME has been repeatedly dinged for not using validated measurement tools. And if your survey isn’t valid (or reliable), why would your data be worth anything? Effect size does not correct for bad questions. So maybe next time you’re touting a great effect size (or trying to bury a bad one), you should also consider (and be able to document) whether you’ve demonstrated the effectiveness of your CME or the ineffectiveness of your survey.
So what can be done? Well, you can hire a psychometrician and add complicated-sounding things like “factor analysis” and “Cronbach’s alpha” to your lexicon (yell those out during the next CME presentation you attend…and then quickly run out of the room). Or (actually “and”), you can start with sound question-design principles. The key thing to note: no amount of complex statistics can make a bad question good – so you need to know the fundamentals of assessing knowledge and competence in medical education. Where do you get those? Here are some suggestions to get you started:
- Take the National Board of Medical Examiners (NBME) U course entitled: Assessment Principles, Methods, and Competency Framework. This is an awesome (dare I say, the best) resource for anyone assessing knowledge and competence in medical education. Complete this course (there are 20 lessons, each under 30 minutes) and you’ll be as expert as anyone in CME. You can register here. And it’s free!
- Check out Dr. Wendy Turell’s session entitled Tips to Make You a Survey Measurement Rock Star during the next CMEpalooza (April 8th at 1:30 eastern). This is her wheelhouse – so steal every bit of her expertise you can. Once again, it’s free.
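And in case “Cronbach’s alpha” sounds scarier than it is, here is a minimal sketch of the standard formula in Python: alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the response data are hypothetical, made up purely for illustration, and the sketch assumes every respondent answered every item on the same numeric scale. It is a quick consistency check, not a substitute for a proper psychometric workup.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Estimate Cronbach's alpha (internal consistency) for a survey scale.

    responses: list of rows, one per respondent; each row is a list of
    numeric item scores with no missing values (an assumption of this sketch).
    """
    n_respondents = len(responses)
    k = len(responses[0])  # number of items in the scale
    if k < 2 or n_respondents < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item (column) across respondents
    item_variance_sum = sum(variance(column) for column in zip(*responses))

    # Sample variance of each respondent's total score
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical data: 5 respondents answering 4 self-efficacy items (1-5 scale)
example_responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
print(round(cronbach_alpha(example_responses), 2))
```

Values closer to 1 suggest the items hang together as a scale. Just remember that a high alpha tells you the questions are consistent with each other, not that they measure the right thing.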
Oh, I so want to say I measure patient outcomes. Everyone gets so excited. Imagine these two presentation titles: 1) “Reliability and Validity in Educational Outcome Assessment” and 2) “Measuring Patient Outcomes Associated with CME Participation”. Which one are you going to attend? Well…yes, to most folks those both sound pretty boring. But this is a CME blog. And in this part of town, it’d be like asking whether you’d rather hang out with some guy who runs a strip mall accounting firm or Will Ferrell.
But I’m not Will Ferrell. And instead of an accountant, I’d like to introduce you to Drs. Cook and West who present a very clear and thoughtful piece on ~~why Will Ferrell really isn’t that funny~~ why patient outcomes may not be the best CME outcome target (click here for the article).
Read this article and be prepared. If you’re presenting on patient outcomes, I’m going to ask about things like “dilution” and “teaching-to-the-test”. Unless, of course, you are Will Ferrell. In which case, thank you for Elf.
There are very few validated instruments in CME assessment. I’ve only encountered three, and each of these was designed to measure satisfaction. As to why there are so few validated instruments…it’s pretty simple: 1) validating an instrument is difficult and 2) grant funding in CME research is scarce (forcing the necessary research talent to focus elsewhere).
Of course, an enterprising investigator might see this gap as an opportunity to build a research career in an area with very low competition. For that person, I’d recommend starting with the following articles:
- Downing SM. Face validity of assessments: faith-based interpretation or evidence-based science. Medical Education 2006;40:7-8. (PubMed)
- Downing SM. Validity: on the meaningful interpretation of assessment data. Medical Education 2003;37:830-7. (abstract)
- Downing SM. Reliability: on the reproducibility of assessment data. Medical Education 2004;38:1006-12. (abstract)