I bought my first car at 16. It was an awesome little blue 4×4 (Bronco II). The test drive was perfect. I got to blast the radio and drive off-road through a subdivision under construction. Bouncing over piles of debris, I can still remember the exhilaration. Both the seller and I laughed the whole time. Only problem…he was still laughing two weeks later, while I was on the side of the highway spitting steam and pouring oil mixed with engine coolant. That 4×4 rusted in my driveway for another year before a neighbor bought it for less than 20% of what I paid.
Yeah…I skipped the inspection part. It was just too much fun to think about that. And since it handled the test drive, what could really go wrong? I was going to be so freakin’ cool come fall in high school.
Tell me I’m the only one who’s ever dreamed of the stars and ended up on the bus.
Now that brings us to outcomes. Maybe you’ve been kicking the tires of a new CME program and hoping it will generate great outcomes? Don’t get distracted by the shiny bits…there are three key things to inspect for every outcomes project (in descending order of importance and ascending order of coolness):
- Study design: the main concern here is “internal validity”, which refers to how well a study controls for the factors that could confound the relationship between the intervention and outcome (ie, how do we know something else isn’t accelerating or braking our path toward the desired outcome?). There are many threats to internal validity and, correspondingly, many distinct study designs to address them. One-group pretest-posttest is a study design; so is posttest-only with nonequivalent groups (ie, a posttest administered to CME participants and a non-participant “control” group). There are about a dozen more options. You should understand why a particular study design was selected and what answers it can (and cannot) provide.
- Data collection: second to study design is data collection. The big deal here is “construct validity” (ie, can the data collection tool measure what it claims?). Just because you want your survey or chart abstraction to measure a certain outcome doesn’t mean it actually will. Can you speak to the data that support the effectiveness of your tool in measuring what it intends? If not, you should consider another option. Note: it is really fun to say “chart abstraction”, but it’s a data collection tool, not a study design. If your study design is flawed, you have to consider those challenges to internal validity plus any construct validity issues associated with your chart abstraction. The more issues you collect, the weaker your final argument regarding your desired outcome. An expensive study (eg, chart review) does not guarantee a result of any importance, but it does sound good.
- Analysis: this is the shiny bit, and just like your parents told you, the least important. Remember Mom’s advice: if your friends don’t think you’re cool, then they aren’t really your friends. Well, think about study design and data collection as the “beauty on the inside” and analysis as a really groovy jacket and great hair. Oh yeah, it matters, but rather less so if they keep getting you stuck on the highway. You may have heard statisticians are nerds, but they’re the NASCAR drivers of the research community – and I’m here to tell you the car and pit crew are more important. In short, if your outcomes are all about analysis, they probably aren’t worth much.
Although I’ve complained a fair bit about validity and reliability issues in CME assessment, I haven’t offered much on this blog to actually address these concerns. Well, the thought of thousands (and thousands and…) of dear and devoted readers facing each new day with the same tired CME assessment questions has become too much to bear. That, and I was recently required to do a presentation on guidelines and common flaws in the creation of multiple-choice questions…so I thought I’d share it here.
I’d love to claim these pearls are all mine, but they’re just borrowed. Nevertheless, this slide deck may serve as a handy single resource when constructing your next assessment (and it contains some cool facts about shark attacks).
I’ve talked a lot about effect size: what it is (here), how to calculate it (here, here and here), what to do with the result (here and here)…and then some about limitations (here). Overall, I’ve been trying to convince you that effect size is a sound (and simple) approach to quantifying the magnitude of CME effectiveness. Now it’s time to talk about how it may be total garbage.
All this effect size talk includes the supposition that the data from which it is calculated are both reliable and valid. In CME, the data source is overwhelmingly surveys – and the questions within typically include self-efficacy scales, single-correct-answer knowledge tests and/or case vignettes. But how do you know that your survey questions actually measure what they intend (validity) and do so with consistency (reliability)? CME has been repeatedly dinged for not using validated measurement tools. And if your survey isn’t valid (or reliable), why would your data be worth anything? Effect size does not correct for bad questions. So maybe next time you’re touting a great effect size (or trying to bury a bad one), you should also consider (and be able to document) whether you’ve demonstrated the effectiveness of your CME or the ineffectiveness of your survey.
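To see why effect size can’t rescue bad questions, it helps to look at the arithmetic. Here’s a minimal sketch (plain Python, entirely made-up scores) of a standardized mean difference along the lines of Cohen’s d – notice that the formula will cheerfully spit out a number no matter how invalid or unreliable the underlying survey items are:

```python
from statistics import mean, stdev

def cohens_d(pre, post):
    """Cohen's d: standardized mean difference using a pooled standard deviation."""
    n1, n2 = len(pre), len(post)
    s1, s2 = stdev(pre), stdev(post)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(post) - mean(pre)) / pooled_sd

# Hypothetical pre/post knowledge-test scores (0-100) from a CME survey
pre  = [50, 62, 45, 70, 58, 66]
post = [60, 70, 52, 78, 64, 74]

print(round(cohens_d(pre, post), 2))  # → 0.82 for these made-up numbers
```

The calculation never asks whether the questions behind those scores measured anything real – that’s the validity work that has to happen before the statistics.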
So what can be done? Well, you can hire a psychometrician and add complicated-sounding things like “factor analysis” and “Cronbach’s alpha” to your lexicon (yell those out during the next CME presentation you attend…and then quickly run out of the room). Or (actually “and”), you can start with sound question-design principles. The key thing to note: no amount of complex statistics can make a bad question good – so you need to know the fundamentals of assessing knowledge and competence in medical education. Where do you get those? Here are some suggestions to get you started:
- Take the National Board of Medical Examiners (NBME) U course entitled: Assessment Principles, Methods, and Competency Framework. This is an awesome (daresay, the best) resource for anyone assessing knowledge and competence in medical education. Complete this course (there are 20 lessons, each under 30 minutes) and you’ll be as expert as anyone in CME. You can register here. And it’s free!
- Check out Dr. Wendy Turell’s session entitled Tips to Make You a Survey Measurement Rock Star during the next CMEpalooza (April 8th at 1:30 eastern). This is her wheelhouse – so steal every bit of her expertise you can. Once again, it’s free.
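And if you’d like to peek behind one of those complicated-sounding terms from above: Cronbach’s alpha is just a ratio of variances, simple enough to sketch in a few lines. Here’s a minimal Python version with invented 5-point self-efficacy ratings (all data hypothetical):

```python
def cronbachs_alpha(responses):
    """Internal-consistency reliability (Cronbach's alpha).

    `responses` is a list of rows: one list of item scores per respondent.
    """
    k = len(responses[0])  # number of items

    def variance(xs):      # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point self-efficacy ratings: 6 respondents x 4 items
data = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [3, 2, 3, 3],
]

print(round(cronbachs_alpha(data), 2))  # → 0.94 for this contrived, very consistent sample
```

A high alpha only tells you the items hang together; it says nothing about whether they measure the right construct – which is exactly why the question-design fundamentals come first.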
Oh, I so want to say I measure patient outcomes. Everyone gets so excited. Imagine these two presentation titles: 1) “Reliability and Validity in Educational Outcome Assessment” and 2) “Measuring Patient Outcomes Associated with CME Participation”. Which one are you going to attend? Well…yes, to most folks those both sound pretty boring. But this is a CME blog. And in this part of town, it’d be like asking whether you’d rather hang out with some guy who runs a strip mall accounting firm or Will Ferrell.
But I’m not Will Ferrell. And instead of an accountant, I’d like to introduce you to Drs. Cook and West, who present a very clear and thoughtful piece on ~~why Will Ferrell really isn’t that funny~~ why patient outcomes may not be the best CME outcome target (click here for the article).
Read this article and be prepared. If you’re presenting on patient outcomes, I’m going to ask about things like “dilution” and “teaching-to-the-test”. Unless, of course, you are Will Ferrell. In which case, thank you for Elf.
There are very few validated instruments in CME assessment. I’ve only encountered three, and each of these was designed to measure satisfaction. As to why there are so few validated instruments…it’s pretty simple: 1) validating an instrument is difficult and 2) grant funding in CME research is scarce (forcing the necessary research talent to focus elsewhere).
Of course, an enterprising investigator might see this gap as an opportunity to build a research career in an area with very low competition. For that person, I’d recommend starting with the following articles:
- Downing SM. Face validity of assessments: faith-based interpretation or evidence-based science. Medical Education 2006;40:7-8. (PubMed)
- Downing SM. Validity: on the meaningful interpretation of assessment data. Medical Education 2003;37:830-7. (abstract)
- Downing SM. Reliability: on the reproducibility of assessment data. Medical Education 2004;38:1006-12. (abstract)