Where did the knowledge go?

What does it mean when your CME participants score worse on a post-test assessment (compared to pre-test)?

Here are some likely explanations:

  1. The post-activity change was not statistically significant.  Significance testing determines whether a measured difference pre/post could be attributed to random chance.  If the difference was not significant, we can’t say the result was due to anything other than chance.  If the pre/post response was too low to warrant statistical testing, the direction of change is meaningless – you don’t have a representative sample.
  2. Measurement bias (specifically, “multiple comparisons”).  This measurement bias results from multiple comparisons being conducted within a single sample (i.e., asking dozens of pre/post questions within a single audience).  The issue with multiple comparisons is that the more questions you ask, the more likely you are to find a significant difference where it shouldn’t exist (and wouldn’t if subject to more focused assessment).  Yes, this is a bias to which many CME assessments are subject.
  3. Bad question design. Did you follow key question development guidelines?  If not, the post-activity knowledge drop could be due to misinterpretation of the question.  You can learn more about question design principles here.
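
To put rough numbers on the first two explanations, here is a minimal Python sketch. It is not taken from any particular CME dataset: the counts, the function names, the 0.05 threshold and the choice of an unpaired two-proportion z-test are all assumptions made for illustration. The first function asks whether a drop in percent correct could plausibly be chance; the second shows how quickly the odds of at least one spurious "significant" result grow as you add questions (assuming the questions are independent, which real survey items rarely are).

```python
import math


def two_proportion_p_value(correct_pre, n_pre, correct_post, n_post):
    """Two-sided p-value for a pre/post difference in percent correct.

    Uses an unpaired two-proportion z-test with a pooled standard error.
    """
    p_pre = correct_pre / n_pre
    p_post = correct_post / n_post
    pooled = (correct_pre + correct_post) / (n_pre + n_post)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_pre + 1 / n_post))
    z = (p_post - p_pre) / std_err
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def familywise_error_rate(num_questions, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_questions


def bonferroni_alpha(num_questions, alpha=0.05):
    """Per-question threshold that keeps the overall error rate near alpha."""
    return alpha / num_questions


if __name__ == "__main__":
    # Hypothetical question: 30 of 50 correct before, 27 of 50 correct after.
    p = two_proportion_p_value(30, 50, 27, 50)
    print(f"p-value for the drop: {p:.3f}")

    # Hypothetical survey with 20 pre/post questions.
    print(f"family-wise error rate: {familywise_error_rate(20):.2f}")
    print(f"Bonferroni threshold:   {bonferroni_alpha(20):.4f}")
```

In this made-up case, a six-point drop with 50 respondents per group is nowhere near significant, and a 20-question survey tested at 0.05 has roughly a 64% chance of producing at least one false alarm. Bonferroni is only one (conservative) way to adjust for that.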

Filed under Outcomes, question design, Statistical tests of significance

CME Outcomes Statistician, first grade

I was very excited to have my CMEPalooza session (Secrets of CME Outcome Assessment) officially sanctioned by the League of Assessors (LoA).  Accordingly, participants who passed the associated examination were awarded “CME Outcome Statistician, first grade” certifications.  It’s a grueling test, but three candidates made it through and received their certifications today (names withheld due to exclusivity).

More good news…I petitioned the LoA to extend the qualifying exam for another six weeks (expiring May 29, 2015) and was officially approved!  So you can still view the CMEPalooza session (here) and then take the qualifying exam (here). Good luck!

Filed under CME, CMEpalooza, League of Assessors, Outcomes

CMEPalooza

On Tuesday, Chicago will decide on either Rahm or Chuy.  But Wednesday, it’s all about CMEPalooza.  Thank you to our industry’s “Jane’s Addiction” for organizing the third installment of this CME free-for-all.  Imedex is a proud second-time sponsor and I’ll be presenting on CME outcomes assessment (11 AM eastern). My session is designed for those who fall into the following categories:

  • Regularly use surveys to measure learning and competence change
  • No formal process for reviewing survey questions
  • Unsure of how to utilize statistical tests

Oh, but there’s more…this session has been accredited by the apocryphal League of CME Assessors (sorry, can’t provide a link due to exclusivity).  If, after completing the session, you wish to be considered for eligibility as “CME Outcome Statistician, first grade”, click here to take their test. There’s even a certificate if you pass. Good luck!

Filed under CMEpalooza

Writing questions good

Although I’ve complained a fair bit about validity and reliability issues in CME assessment, I haven’t offered much on this blog to actually address these concerns. Well, the thought of thousands (and thousands and…) of dear and devoted readers facing each new day with the same, tired CME assessment questions has become too much to bear. That, and I was recently required to do a presentation on guidelines and common flaws in the creation of multiple-choice questions…so I thought I’d share it here.

I’d love to claim these pearls are all mine, but they’re just borrowed.  Nevertheless, this slide deck may serve as a handy single-resource when constructing your next assessment (and it contains some cool facts about shark attacks).

Filed under Best practices, CME, MCQs, multiple-choice questions, Reliability, Summative assessment, Survey, survey design, Validity

Effect size kryptonite

I’ve talked a lot about effect size: what it is (here), how to calculate it (here, here and here), what to do with the result (here and here)…and then some about limitations (here).  Overall, I’ve been trying to convince you that effect size is a sound (and simple) approach to quantifying the magnitude of CME effectiveness.  Now it’s time to talk about how it may be total garbage.

All this effect size talk includes the supposition that the data from which it is calculated is both reliable and valid.  In CME, the data source is overwhelmingly surveys – and the questions within typically include self-efficacy scales, single-correct answer knowledge tests and / or case vignettes.  But how do you know that your survey questions actually measure what they intend to (validity) and do so with consistency (reliability)?  CME has been repeatedly dinged for not using validated measurement tools.  And if your survey isn’t valid (or reliable), why would your data be worth anything?  Effect size does not correct for bad questions.  So maybe next time you’re touting a great effect size (or trying to bury a bad one), you should also consider (and be able to document) whether you’ve demonstrated the effectiveness of your CME or the ineffectiveness of your survey.

So what can be done?  Well, you can hire a psychometrist and add complicated-sounding things like “factor analysis” and “Cronbach’s alpha” to your lexicon (yell those out during the next CME presentation you attend…and then quickly run out of the room).  Or (actually “and”), you can start with sound question-design principles.  The key thing to note: no amount of complex statistics can make a bad question good – so you need to know the fundamentals of assessing knowledge and competence in medical education.  Where do you get those?  Here are some suggestions to get you started:

  • Take the National Board of Medical Examiners (NBME) U course entitled: Assessment Principles, Methods, and Competency Framework.  This is an awesome (daresay, the best) resource for anyone assessing knowledge and competence in medical education.  Complete this course (there are 20 lessons, each under 30 minutes) and you’ll be as expert as anyone in CME.  You can register here.  And it’s free!
  • Check out Dr. Wendy Turell’s session entitled Tips to Make You a Survey Measurement Rock Star during the next CMEpalooza (April 8th at 1:30 eastern).  This is her wheelhouse – so steal every bit of her expertise you can.  Once again, it’s free.
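
For the curious, “Cronbach’s alpha” is less scary than it sounds. It is a standard internal-consistency (reliability) statistic: alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. Below is a minimal Python sketch; the function name and the five-respondent, three-item self-efficacy data are invented for illustration, and a real analysis would want far more respondents (and a psychometrist’s eye on what the number means).

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item scores.

    `responses` is a list of rows, one per respondent, each row holding
    that respondent's score on every item of the scale.
    """
    num_items = len(responses[0])
    if num_items < 2:
        raise ValueError("Cronbach's alpha needs at least two items")

    # Variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])

    return (num_items / (num_items - 1)) * (
        1 - sum(item_variances) / total_variance
    )


if __name__ == "__main__":
    # Hypothetical 1-5 self-efficacy ratings: 5 respondents x 3 items.
    ratings = [
        [3, 4, 3],
        [4, 4, 5],
        [2, 3, 2],
        [5, 5, 4],
        [1, 2, 2],
    ]
    print(f"Cronbach's alpha: {cronbach_alpha(ratings):.2f}")
```

Values closer to 1 suggest the items move together. Note that alpha speaks only to consistency; it says nothing about whether the items measure the right thing (validity).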

Filed under ACCME, CMEpalooza, Item writing, question design, Reliability, Validity

Bringing boring back

I want to play guitar. I want to play loud, fast and funky.  But right now, I’m wrestling basic open chords.  And my fingers hurt.  And I keep forgetting to breathe when I play.  And my daughter gets annoyed listening to the same three songs over and over.  But so is the way.

When my daughter “plays”, she cranks up a song on Pandora, jumps on and off the furniture, and windmills through the strings like Pete Townshend.  She’d light the thing on fire if I didn’t hide the matches.  Guess who’s more fun to watch.  But take away the adorable face and the hard rock attitude and what do you have?  Yeah…a really bad guitar player.

I was reminded of this juxtaposition while perusing the ACEhp 2015 Annual Conference schedule.  I know inserting “patient outcomes”  into an abstract title is a rock star move.  But on what foundation is this claim built?  What limitations are we overlooking?  Have we truly put in the work to ensure we’re measuring what we claim?

My interests tend to be boring.  Was the assessment tool validated?  How do you ensure a representative sample?  How best to control for confounding factors?  What’s the appropriate statistical test?  Blah, blah, blah…  I like to know I have a sturdy home before I think about where to put the entertainment system.

So imagine how excited I was to find this title: Competence Assessments: To Pair or Not to Pair, That Is the Question (scheduled for Thursday, January 15 at 1:15).  Under the assumption that interesting-sounding title and informational value are inversely proportional, I had to find out more.  Here’s an excerpt:

While not ideal, providers are often left with unpaired outcomes data due to factors such as anonymity of data, and low overall participation. Despite the common use of unpaired results, literature on the use of unpaired assessments as a surrogate for paired data in the CME setting is limited.

Yes, that is a common problem.  I very frequently have data for which I cannot match a respondent’s pre- and post-activity responses.  I assume the same respondents are in both groups, but I can’t make a direct link (i.e., I have “unpaired” data).  Statistically speaking, paired data is better, because each respondent serves as his or her own control.  The practical question the presenters of this research intend to answer is how unpaired data may affect conclusions about competence-level outcomes.  Yes, that may sound boring, but it is incredibly practical because it happens all the time in CME – and I bet very few people even knew it might be an issue.
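
To see why pairing matters, here is a small Python sketch that runs the same made-up scores through a paired t statistic and an unpaired (Welch) t statistic. Everything in it is an assumption for illustration: the eight hypothetical respondents, the scores and the function names. It has no connection to the presenters’ data or methods.

```python
import math
from statistics import mean, stdev, variance


def paired_t(pre, post):
    """t statistic for matched pre/post scores (same respondent, same index)."""
    diffs = [after - before for before, after in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def unpaired_t(pre, post):
    """Welch's t statistic, treating pre and post as independent groups."""
    std_err = math.sqrt(variance(pre) / len(pre) + variance(post) / len(post))
    return (mean(post) - mean(pre)) / std_err


if __name__ == "__main__":
    # Hypothetical competence scores for the same 8 respondents.
    pre = [2, 3, 3, 4, 5, 5, 6, 7]
    post = [3, 4, 3, 5, 6, 5, 7, 8]

    print(f"paired t:   {paired_t(pre, post):.2f}")
    print(f"unpaired t: {unpaired_t(pre, post):.2f}")
```

In this invented example, nearly everyone improves by about one point, so the paired statistic is large. Throw away the link between each person’s pre and post scores and the same improvement gets lost in the spread between respondents, leaving a much smaller statistic. Same data, potentially different conclusion.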

So thank you Allison Heintz and Dr. Fagerlie.  I’ll definitely be in attendance.

Filed under ACEhp, Alliance for CME, CME, Methodology, paired data, Statistical tests of significance, Statistics, unpaired data

CME is Effective! Now what?

The ACCME just released an updated synthesis of published systematic reviews regarding the effectiveness of CME.  You can find it here.  In short, the authors offer the following conclusions (this is pulled verbatim from the report on p. 14):

  • CME does improve physician performance and patient health outcomes;
  • CME has a more reliably positive impact on physician performance than on patient health outcomes; and
  • CME leads to greater improvement in physician performance and patient health if it is more interactive, uses more methods, involves multiple exposures, is longer, and is focused on outcomes that are considered important by physicians.

Yes, there are issues of validity, heterogeneity, standardization and good-ole-fashioned publication bias in CME research, but that aside, there’s enough evidence out there to comfortably assume CME can positively affect physician performance and patient health.  While that’s good news, we can’t ignore the next question: Why is it effective?

To borrow another section from this report (p. 15):

The authors of the systematic reviews make clear that the research regarding mechanisms of action by which CME improves physician performance and patient health outcomes is in the early stages and needs greater theoretical and methodological sophistication. Several authors make the argument that future research must take account of the wider social, political, and organizational factors that play a role in physician performance and patient health outcomes.

The third bullet point above shines some light on these “mechanisms of action”, but the recipe for effective CME is still vague.  For example: How do I make my activity more interactive?  More importantly, what qualifies as interactive in the first place?  If multiple exposures are better, how many, and at what intensity?  How effective are these “mechanisms of action” across various physician audiences?  Do oncologists and internists learn the same way?  What internal and external (e.g., practice environment) factors are influential?

There are several careers’ worth of research questions here.  Anyone funding?

Filed under ACCME, CME, Effectiveness