Outcomes curriculum at the Alliance 2020 Annual Meeting

If outcomes are the “cowbell” of CME, then the upcoming Alliance Annual Meeting is, well…going to have a lot of cowbell. Sorry, I ran out of steam with this intro.

So…more quickly to the matter at hand: the Alliance has organized a five-part outcomes curriculum within the regular sessions at the upcoming Annual Meeting in San Francisco. That means there is no additional charge to participate, but attendance will be limited-ish (trying to keep each session to around 30 participants to facilitate a more skills-based, workshop style). Just “favorite” the sessions via the conference app to make sure you are on the list.

As for the curriculum components, it kicks off with a 60-minute session (Thursday, January 9 @ 2:30) led by Jack Kues regarding outcome study design (yes, there is more out there than pre/post). On the following day (Friday, January 10), the Alliance digs into data…a 90-minute workshop on qualitative data (guided by Wendy Turell @ 10:15), a 60-minute quantitative analysis primer (steered by Karyn Ruiz-Cordell @ 1:15) and a 60-minute discussion of data ethics (yes, it’s a thing) helmed by Gary Bird @ 4:15. On the final day (Saturday, January 11), Karen Roy will direct a 90-minute workshop on reporting.

The intention of this curriculum is to empower learners in all phases of outcome study – not just highlight one particular area. Sure, you can cherry-pick sessions in the curriculum, but be aware that good outcomes speak to each of these five areas – focusing on just a few affects the credibility of whatever conclusions you eventually draw from your data.

Filed under ACEhp, Outcomes, Uncategorized

What do you mean statistically significant?

gratisography-231H(advice)

Today’s post is brought to you by the letter P and the number .05. That’s right, we’re going to dig into what it really means to be “statistically significant.”

Setting the stage…imagine you wanted to educate primary care physicians about a guideline update for the management of hypertension. To do so, envision a CME roadshow of one-hour dinner meetings hitting all the major metros in the US. And to determine whether you had any impact on awareness…how about a case-based, pre- vs. post-activity survey of participants?

Once all the data is collected, you tabulate a score (% correct) for each case-based question (ideally matched for each learner) from pre to post. Now…moment of truth, you pick the appropriate statistical test of significance, say a short prayer, and hit “calculate”. The great hope being that your P values will be less than .05. Because…glory be! That’s statistical significance!

So let’s take this scenario to its hopeful conclusion. What does it really mean when we say “statistical significance”?

Maybe not quite what you thought.

You see…statistical tests of significance (eg, chi square, t test, Wilcoxon signed-rank, McNemar’s) are hypothesis tests. And the hypothesis (or expectation) is that the two comparison groups are the same. In this case, the hypothesis is that the pre- and post-activity % correct (for each question) of your CME participants are equivalent. So…when you cross the threshold into “statistical significance” you’re not saying “Hey, these groups are different!” Instead you’re saying, “Hey, these groups are supposed to be the same, but the data doesn’t support that expectation!” Which, if said quickly, sounds like the same thing…but there’s a very important distinction. Statistical tests of significance do not test whether two groups are different; they test whether two groups are the same. You may jump to the conclusion that if they aren’t the same, they must be different, but statistically, you have no evidence to that point.

Yes…that is confusing. Which is probably why it gets glossed over in so many reports of CME outcomes. In actuality, you should think of the P value as a sniff test. If you expect every flower to smell like a rose, the P value can tell you if it does, but if P is < .05 (indicating the data doesn’t support that expectation), you can’t make any assumption about the flower’s true scent. You’d need other tests to isolate the actual smell. Same thing with our CME example…a P < .05 indicates that we can’t confirm the expectation that pre- and post-activity % correct for a given question are equivalent, but it doesn’t tell us that pre and post are different in any substantive way. It’s simply a threshold test…if we find statistical significance, the correct interpretation should be “Hey, I didn’t expect that, we should look into this further”. And that’s when other tests come into play (eg, effect size).
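To make the scenario concrete, here is a minimal sketch of one of the tests named above (McNemar’s, in its exact binomial form) applied to matched pre/post answers for a single question. The function names and the counts are made up for illustration; nothing here comes from a real activity. It prints the P value next to the raw change in % correct, because the P value by itself says nothing about how big the change is.

```python
from math import comb


def mcnemar_exact_p(gained: int, lost: int) -> float:
    """Two-sided exact McNemar P value for matched pre/post answers.

    gained: learners who were wrong pre-activity and right post-activity
    lost:   learners who were right pre-activity and wrong post-activity
    Learners whose answer did not change carry no information for this test.
    """
    n = gained + lost
    if n == 0:
        return 1.0  # nobody changed, so there is nothing to test
    k = min(gained, lost)
    # Probability of a split at least this lopsided if changes in either
    # direction were equally likely (the "pre and post are the same" hypothesis).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def change_in_percent_correct(pre_correct: int, post_correct: int, learners: int) -> float:
    """Post minus pre % correct, in percentage points."""
    return 100 * (post_correct - pre_correct) / learners


# Hypothetical question: 100 matched learners, 40 correct before the activity.
# 15 moved from wrong to right, 5 moved from right to wrong.
p_value = mcnemar_exact_p(gained=15, lost=5)
change = change_in_percent_correct(pre_correct=40, post_correct=50, learners=100)
print(f"P = {p_value:.3f}, change = {change:+.1f} percentage points")
```

With these invented counts the P value lands just under .05 while the change is 10 percentage points. Whether 10 points matters is a separate question that the P value cannot answer.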

In summary, P value is not an endpoint. Be wary of any outcome data punctuated solely by P values. Hence, the image for this post (which is from Gratisography – a truly wonderful resource of free images): are you really getting what you expected from P values?

Filed under CME, Outcomes, P value, Pre vs. Post

Give me Moore

Ten years ago, the Journal of Continuing Education in the Health Professions published Achieving Desired Results and Improved Outcomes: Integrating Planning and Assessment Throughout Learning Activities, which quickly became known as the “Moore’s outcomes paper” (sorry, Green and Gallis). While applauded for several years, it has now become vogue to wave it aside as an antiquated interpretation of physician learning outcomes.

Why a paper primarily focused on the design and implementation of educational interventions for clinicians was ever considered bedrock for outcome assessment escapes me. Then again, maybe I shouldn’t be surprised that the learning objectives of a passive, print intervention would be so poorly translated to its target audience.

So what were Moore, Green and Gallis trying to communicate? Specifically…the central point of this article is that before outcomes can be measured, educational planning focused on the outcomes must occur so that these outcomes can be expected to happen (JCEHP 2009; 29: p. 5). To be fair, they do use the word “outcomes” a lot in that sentence, but the key terms are clearly “educational planning”. Overall, this was meant to be an instructional guide for planning continuing medical education (CME) – not an outcomes paper. Here are the key points:

  1. There may be five stages of physician learning
  2. If there are such stages, designing CME using the predisposing-enabling-reinforcement framework may be a good idea
  3. The seven level outcomes framework may help CME providers apply the predisposing-enabling-reinforcement framework
  4. Formative assessment is really important and can be incorporated in the predisposing-enabling-reinforcement framework

Noting a pattern here? Whole lotta chatter about the predisposing-enabling-reinforcement framework.  As far as outcomes, the “Moore’s model” is a simple amalgam of frameworks – I suspect that neither Moore nor Green nor Gallis really care which framework you use, as long as it also incorporates…let’s all say it together now…the predisposing-enabling-reinforcement framework!

While I recognize that many a fine point has been made in criticism of the Moore’s outcomes framework as an independent entity (ie, outside of the context of the article in which it was published), my concern is that we’re tossing the baby with the bathwater for those newly initiated into the field of CME. Everyone in the practice of CME should read this paper. The insights neatly tucked into 15 pages may not instantly transform a CME provider’s practice, but they will at least help tune their attention to the evidence-based barriers and facilitators to transferring clinical education to practice.

Filed under CME, Formative assessment, JCEHP, Methodology, Outcomes, Predisposing-enabling-reinforcement, Summative assessment

Outcomes test drive

I bought my first car at 16. It was an awesome little blue 4×4 (Bronco II). The test drive was perfect. I got to blast the radio and drive off-road through a sub-division under construction. Bouncing over piles of debris, I can still remember the exhilaration. Both the seller and I laughed the whole time. Only problem…he was still laughing two weeks later, while I was on the side of the highway spitting steam and pouring oil mixed with engine coolant. That 4×4 rusted in my driveway for another year before a neighbor bought it for less than 20% of what I paid.

Yeah…I skipped the inspection part. It was just too much fun to think about that. And since it handled the test drive, what could really go wrong? I was going to be so freakin’ cool come fall in high school.

Tell me I’m the only one who’s ever dreamed of the stars and ended up on the bus.

Now that brings us to outcomes. Maybe you’ve been kicking the tires of a new CME program and hoping it will generate great outcomes? Don’t get distracted by the shiny bits…there are three key things to inspect for every outcomes project (in descending order of importance and ascending in order of coolness):

  1. Study design: the main concern here is “internal validity”, which refers to how well a study controls for the factors that could confound the relationship between the intervention and outcome (ie, how do we know something else isn’t accelerating or braking our path toward the desired outcome?). There are many threats to internal validity and, correspondingly, many distinct study designs to address them. One group pretest-posttest is a study design; so is posttest only with nonequivalent groups (ie, post-test administered to CME participants and a non-participant “control” group). There are about a dozen more options. You should understand why a particular study design was selected and what answers it can (and cannot) provide.
  2. Data collection: second to study design is data collection. The big deal here is “construct validity” (ie, can the data collection tool measure what it claims?). Just because you want your survey or chart abstraction to measure a certain outcome doesn’t mean it actually will. Can you speak to the data that supports the effectiveness of your tool in measuring its intention? If not, you should consider another option. Note: it is really fun to say “chart abstraction”, but it’s a data collection tool, not a study design. If your study design is flawed, you have to consider those challenges to internal validity plus any construct validity issues associated with your chart abstraction. The more issues you collect, the weaker your final argument regarding your desired outcome. An expensive study (eg, chart review) does not guarantee a result of any importance, but it does sound good.
  3. Analysis: this is the shiny bit, and just like your parents told you, the least important. Remember Mom’s advice: if your friends don’t think you’re cool, then they aren’t really your friends. Well, think about study design and data collection as the “beauty on the inside” and analysis as a really groovy jacket and great hair. Oh yeah, it matters, but rather less so if they keep getting you stuck on the highway. You may have heard statisticians are nerds, but they’re the NASCAR drivers of the research community – and I’m here to tell you the car and pit crew are more important. In short, if your outcomes are all about analysis, they probably aren’t worth much.

Filed under CME, Confounders, Construct validity, Internal validity, Methodology, Uncategorized

Cause and effect in CME

There is rumor of a sacred mountain in Tibet, the peak of which can only be ascended when Jupiter, Mercury and Venus are in triangular alignment. At the summit, there lives a man who will provide the truth for any one question a plucky adventurer may pose. One day, I hope to be that adventurer. My question…is CME an effective means for impacting clinician competence, performance and (dare I say) patient health?

Unfortunately, the next anticipated triangular alignment isn’t until 2021. In the interim, I have to: 1) learn how to climb mountains and 2) go about establishing cause and effect the old-fashioned way.

To that end…If I want to argue that a relationship exists between CME and some effect (eg, competence gain), I must establish three things:

  • Temporal precedence: the effect comes after the presumed cause. For example, CME participants score better on a case-based, post-activity assessment than pre-activity. Pretty straightforward, right? Who needs a mountain guru?
  • Covariation: the effect is systematically (ie, not randomly) related to the presumed cause. For example, a high level of competence would be more likely among CME participants than non-participants and/or more CME participation would equal more competence than less CME participation. Wait…this sounds like a control group study. Didn’t we (ie, me in this conversation with myself) say control groups in CME are bunk? Okay, I exaggerated a skosh. Simple post-test only nonequivalent control group design (ie, surveys to participants and nonparticipants after a CME activity) is pretty much at the bottom of the research credibility scale, but there are more robust methods to employ control groups. I’ll cover these in a subsequent post.
  • Plausible alternatives: once both temporal precedence and covariation are established, all other possible explanations for the effect (ie, confounders) must be explored. This addresses the internal validity of your assessment (ie, how well it avoids confounding). I’ll talk about some threats to internal validity in a subsequent post. Until then, note there is no perfect study: internal validity exists on a spectrum. The more internally valid (ie, the less confounded), the more confident you can be in your interpretation of cause and effect.

In the absence of divine wisdom, every CME outcome assessment should speak to these three factors. I’d say we do a pretty good job establishing temporal precedence, but it’s a rare occasion to discuss covariation or confounders. Next time you find yourself creating or reviewing an outcome report, take that opportunity to push us all forward a bit on these critical factors in establishing the value of CME.

Filed under Causality, CME, Covariation, Internal validity, Outcomes, Temporal precedence, Uncategorized

Losing Control

CME has been walking around with spinach in its teeth for more than 10 years.  And while my midwestern mindset defaults to “don’t make waves”, I think it’s officially time to offer a toothpick to progress and pluck that pesky control group from the front teeth of our standard outcomes methodology.

That’s right, CME control groups are bunk. Sure, they make sense at first glance: randomized controlled trials (RCTs) use control groups and they’re the empirical gold standard.  However, as we’ll see, the magic of RCTs is the randomization, not the control: without the “R” the “C” falls flat.  Moreover, efforts to demographically-match controls to CME participants on a few simple factors (eg, degree, specialty, practice type and self-report patient experience) fall well short of the vast assemblage of confounders that could account for differences between these groups. In the end, only you can prevent forest fires and only randomization can ensure balance between samples.

So let’s dig into this randomization thing. Imagine you wanted to determine the efficacy of a new treatment for detrimental modesty (a condition in which individuals are unable to communicate mildly embarrassing facts). A review of clinical history shows that individuals who suffer this condition represent a wide range of race, ethnicity and socioeconomic strata, as well as vary in health metrics such as age, BMI and comorbidities. Accordingly, you recruit a sufficient sample* of patients with this diagnosis and randomly designate them into two categories: 1) those who will receive the new treatment and 2) those who will receive a placebo. The purpose of this randomization is to balance the factors that could confound the relationship you wish to examine (ie, treatment to outcome). Assume the outcome of interest is likelihood to tell a stranger he has spinach in his teeth. Is there a limit to the number of factors you can imagine that might influence an individual’s ability for such candor? And remember, clinical history indicated that patients with detrimental modesty are diverse in regard to social and physical characteristics. How can you know that age, gender, height, religious affiliation, ethnicity or odontophobia won’t enhance or reduce the effect of your treatment? If these factors are not evenly distributed across the treatment and control groups, your conclusion about treatment efficacy will be confounded.

So…you could attempt to match the treatment and control groups on all potential confounders or you could take the considerably less burdensome route and simply randomize your subjects into either group.  While all of these potential confounders still exist, randomization ensures that both the treatment and control group are equally “not uniform” across all these factors and therefore comparable.  It’s very important to note that the “control” group is simply what you call the population who doesn’t receive treatment.  The only reason it works is because of randomization.  Accordingly, simply applying a control group to your CME outcome assessment without randomization is like giving a broke man a wallet – it’s so not the thing that matters.
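If you would rather see that than take my word for it, here is a small simulation sketch. Everything in it is invented for illustration: the population is 10,000 made-up people with ages spread evenly from 25 to 70, age stands in for any confounder, and the “self-selected” rule (older people opt in more often) is just an assumed bias. It compares the age gap between groups formed by coin flip with the gap between groups formed by opting in.

```python
import random
from statistics import mean


def mean_age_gap(group_a, group_b):
    """Absolute difference in mean age between two groups."""
    return abs(mean(group_a) - mean(group_b))


def randomized_gap(ages, rng):
    """Assign each person to treatment or control by coin flip."""
    treatment, control = [], []
    for age in ages:
        (treatment if rng.random() < 0.5 else control).append(age)
    return mean_age_gap(treatment, control)


def self_selected_gap(ages, rng):
    """Let older people be more likely to opt in (an assumed bias)."""
    youngest, oldest = min(ages), max(ages)
    opted_in, opted_out = [], []
    for age in ages:
        chance = (age - youngest) / (oldest - youngest)
        (opted_in if rng.random() < chance else opted_out).append(age)
    return mean_age_gap(opted_in, opted_out)


rng = random.Random(2015)  # fixed seed so the run is repeatable
ages = [rng.uniform(25, 70) for _ in range(10_000)]
print(f"randomized gap:    {randomized_gap(ages, rng):.1f} years")
print(f"self-selected gap: {self_selected_gap(ages, rng):.1f} years")
```

The coin flip never looks at age, yet the two randomized groups end up with nearly the same average age. The opt-in groups do not, and the same would hold for any confounder you forgot to measure.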

Now let’s bring this understanding to CME. There are approximately 18,000 oncology physicians in the United States. In only two scenarios will the participants in your oncology-focused CME represent an unbiased sample of this population: 1) all 18,000 physicians participate or 2) at least 377 participate (sounds much more likely) who have been randomly sampled (wait…what?). For option #2, the CME provider would require access to the entire population of oncology physicians, from which they would apply a randomization scheme to create a sample based on their empirically expected response rate to invitations in order to achieve the 377 participation target. Probably not standard practice. If neither scenario applies to your CME activity, then the participants are a biased representation of your target learners. Of note, biased doesn’t mean bad. It just means that there are likely factors that differentiate your CME participants from the overall population of target learners and, most importantly, these factors could influence your target outcomes. How many potential factors? Some CME researchers suggest more than 30.
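For the curious, a figure like 377 is what falls out of a standard sample size formula for a proportion with a finite population correction. The sketch below is one way to reproduce it; the 95% confidence level, the ±5% margin of error and the worst-case proportion of 0.5 are assumptions on my part (common defaults, not stated above), and the function name is made up.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(population, margin=0.05, confidence=0.95, proportion=0.5):
    """Random sample size needed to estimate a proportion in a finite population."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Size needed if the population were effectively unlimited.
    n_infinite = z ** 2 * proportion * (1 - proportion) / margin ** 2
    # Finite population correction shrinks it for a known, limited population.
    n_finite = n_infinite / (1 + (n_infinite - 1) / population)
    return ceil(n_finite)


print(required_sample_size(18_000))  # 377 under the assumptions above
```

Note that this only tells you how many randomly sampled physicians you need. It does nothing to make a self-selected audience of 377 (or 3,770) representative.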

Now think about a control group. Are you pulling a random sample of your target physician population? See scenario #2 above. Also, are you having any difficulty attracting physicians to participate in control surveys? What’s your typical response rate? Maybe you use incentives to help? Does it seem plausible that the physicians who choose to respond to your control group surveys would be distinct from the overall physician population you hope they represent? Do you think matching this control group to participants based on just profession, specialty, practice location and type is sufficient to balance these groups? Remember, it’s not the control group, it’s the randomization that matters. RCTs would be a lot less cumbersome if they only had to match comparison groups on four factors. Of course, our resulting pharmacy would be terrifying.

So, based on current methods, we’re comparing a biased sample of CME participants to a biased sample of non-participants (control) and attributing any measured differences to CME exposure.  This is a flawed model.  Without balancing the inherent differences between these two samples, it is impossible to associate any measured differences in response to survey questions to any specific exposure.  So why are you finding significant differences (ie, P < .05) between groups?  Because they are different.  The problem is we have no idea why.

By what complicated method can we pluck this pesky piece of spinach?  Simple pre- versus post-activity comparison.  Remember, we want to ensure that confounding factors are balanced between comparison groups.  While participants in your CME activity will always be a biased representation of your overall target learner population, those biases are balanced when participants are used as their own controls (as in the pre- vs. post-activity comparison).  That is, both comparison groups are equally “non-uniform” in that they are comprised of the same individuals. In the end, you won’t know how participants differ from non-participants, but you will be able to associate post-activity changes to your CME.

Filed under Best practices, CME, Confounders, Control groups, Needs Assessment, Outcomes, Power calculation, Pre vs. Post

Where did the knowledge go?

What does it mean when your CME participants score worse on a post-test assessment (compared to pre-test)?

Here are some likely explanations:

  1. The data was not statistically significant.  Significance testing determines whether we reject the null hypothesis (null hypothesis = pre- and post-test scores are equivalent).  If the difference was not significant (ie, P > .05), we can’t reject this assumption.  If the pre/post response was too low to warrant statistical testing, the direction of change is meaningless – you don’t have a representative sample.
  2. Measurement bias (specifically, “multiple comparisons”).  This measurement bias results from multiple comparisons being conducted within a single sample (ie, asking dozens of pre/post questions within a single audience).  The issue with multiple comparisons is that the more questions you ask, the more likely you are to find a significant difference where it shouldn’t exist (and wouldn’t if subject to more focused assessment).  Yes, this is a bias to which many CME assessments are subject.
  3. Bad question design. Did you follow key question development guidelines?  If not, the post-activity knowledge drop could be due to misinterpretation of the question.  You can learn more about question design principles here.
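The multiple comparisons problem in item 2 is easy to put numbers on. The sketch below assumes every question is tested at the same threshold and that the tests are independent of one another (a simplification; questions answered by the same audience rarely are). It also shows the Bonferroni threshold, which is one simple and conservative correction, offered here as an illustration rather than a recommendation. The function names are made up.

```python
def chance_of_false_positive(questions, alpha=0.05):
    """Chance that at least one of several independent tests comes up
    'significant' when there is truly no pre/post difference on any question."""
    return 1 - (1 - alpha) ** questions


def bonferroni_threshold(questions, alpha=0.05):
    """Per-question P value cutoff that keeps the overall chance near alpha."""
    return alpha / questions


for questions in (1, 5, 20, 40):
    risk = chance_of_false_positive(questions)
    cutoff = bonferroni_threshold(questions)
    print(f"{questions:>2} questions: {risk:.0%} chance of a fluke, "
          f"Bonferroni cutoff P < {cutoff:.4f}")
```

Under these assumptions, 20 pre/post questions carry roughly a 64% chance of at least one spurious “significant” result, in either direction, which is one way a knowledge drop can appear out of nowhere.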

Filed under Outcomes, question design, Statistical tests of significance

CME Outcomes Statistician, first grade

I was very excited to have my CMEPalooza session (Secrets of CME Outcome Assessment) officially sanctioned by the League of Assessors (LoA).  Accordingly, participants who passed the associated examination were awarded “CME Outcome Statistician, first grade” certifications.  It’s a grueling test, but three candidates made it through and received their certifications today (names withheld due to exclusivity).

More good news…I petitioned the LoA to extend the qualifying exam for another six weeks (expiring May 29, 2015) and was officially approved!  So you can still view the CMEPalooza session (here) and then take the qualifying exam (sorry, exam is now closed). Good luck!

Filed under CME, CMEpalooza, League of Assessors, Outcomes

CMEPalooza

On Tuesday, Chicago will decide on either Rahm or Chuy. But Wednesday, it’s all about CMEPalooza. Thank you to our industry’s “Jane’s Addiction” for organizing the third installment of this CME free-for-all. I’ll be presenting on CME outcomes assessment (11 AM eastern). My session is designed for those who fall into the following categories:

  • Regularly use surveys to measure learning and competence change
  • No formal process for reviewing survey questions
  • Unsure of how to utilize statistical tests

Oh, but there’s more…this session has been accredited by the apocryphal League of CME Assessors (sorry, can’t provide a link due to exclusivity). If, after completing the session, you wish to be considered for eligibility as “CME Outcome Statistician, first grade”, click here (sorry, this test is now closed) to take their test. There’s even a certificate if you pass. Good luck!

Filed under CMEpalooza

Writing questions good

Although I’ve complained a fair bit about validity and reliability issues in CME assessment, I haven’t offered much on this blog to actually address these concerns. Well, the thought of thousands (and thousands and…) of dear and devoted readers facing each new day with the same, tired CME assessment questions has become too much to bear. That, and I was recently required to do a presentation on guidelines and common flaws in the creation of multiple-choice questions…so I thought I’d share it here.

I’d love to claim these pearls are all mine, but they’re just borrowed.  Nevertheless, this slide deck may serve as a handy single-resource when constructing your next assessment (and it contains some cool facts about shark attacks).

Filed under Best practices, CME, MCQs, multiple-choice questions, Reliability, Summative assessment, Survey, survey design, Validity