This past Thursday, I gave a short presentation on effect size at the SACME Spring Meeting in Cincinnati (a surprisingly cool city, by the way – make sure to stop by Abigail Street). Instead of a talk about why effect size is important in CME, I focused on its limitations, hoping for feedback on how to refine current methods. My main concerns:
- Using mean and standard deviation from ordinal variables to determine effect size (how big of a deal is this?)
- Transforming Cramer’s V to Cohen’s d (is there a better method?)
- How many outcome questions should be aggregated for a given CME activity to determine an overall effect? (my current minimum is four)
The SACME slide deck is here. I got some good feedback at the meeting, which may lead to some changes in the approach I’ve previously recommended. Until then, if you have any suggestions, let me know.
Statistics can help answer important questions about your CME. For example, was there an educational effect and, if so, how big was it? The first question is typically answered with a P value and the second with an effect size.
If this were 10 years ago, you’d either be purchasing some expensive statistical software or hiring a consultant to answer these questions. Today (thank you Internet), it’s simple and basically free.
A step-by-step approach can be found here.
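For the curious, here's roughly what those free tools are doing under the hood – a Python sketch with invented ratings (SciPy's `ttest_rel` supplies the P value; the pooled-SD formula is one common way to compute Cohen's d):

```python
from statistics import mean, stdev
from scipy.stats import ttest_rel

pre = [2, 3, 2, 3, 2, 3, 2, 4]    # hypothetical pre-activity ratings
post = [4, 4, 3, 5, 4, 4, 3, 5]   # hypothetical post-activity ratings

# Question 1: was there an educational effect? (P value, paired t-test)
t_stat, p_value = ttest_rel(pre, post)

# Question 2: how big was it? (Cohen's d, pooled standard deviation)
pooled_sd = ((stdev(pre) ** 2 + stdev(post) ** 2) / 2) ** 0.5
d = (mean(post) - mean(pre)) / pooled_sd

print(round(p_value, 4), round(d, 2))
```

Either way – online calculator or a few lines of code – the answers cost you nothing.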
CE Measure just published our manuscript regarding effect size in CME assessment. In it, we compare traditional tests of statistical significance (e.g., t-test) with effect size measures and then provide a step-by-step guide for calculating Cohen’s d (one of the more popular effect size measures).
Check it out. Like this blog – all the information is free.
Over the previous three posts, I introduced effect size, discussed its calculation and interpretation, and even provided an example of how you can use effect size to demonstrate the effectiveness of your overall CME program. My intention was to present a method for CME assessment that is both practical and powerful.
For those a bit more statistically savvy, you likely noticed that my previous effect size example focused on paired, ordinal data. That is, I used a pre- vs. post-activity survey (i.e., paired) comprised of rating-scale (i.e., ordinal) questions. I chose this path because it’s fairly common in CME outcome assessments and it’s the most straightforward calculation of Cohen’s d (which was the effect size measure of interest).
Here are some other scenarios:
- If you’re using pre- vs. post-activity case-based surveys, you’re now working with paired, nominal (or categorical) data that has most likely been dichotomized (e.g., transformed into correct/evidence-based preferred answer = 1, all other responses = 0). In this case, the road to effect size is a bit more complex (i.e., use McNemar’s test for statistical significance, calculate an odds ratio [OR], and convert the odds ratio to Cohen’s d). Of note, an OR is itself an effect size measure, and converting it to Cohen’s d is optional. The formula for this conversion is d = ln(OR)/1.81 (Chinn S: A simple method for converting an odds ratio to effect size for use in meta-analysis. Statistics in Medicine 2000, 19:3127-3131).
- If you’re using post-activity case-based surveys administered to CME participants and a representative control group, you’re now working with unpaired, nominal data (that is typically dichotomized into correct answer vs. incorrect answer). In this case, you’ll use a chi-square test (if the sample is large) or Fisher’s exact test (if the sample is small) and also calculate a Cramer’s V. You’ll then need to convert Cramer’s V to Cohen’s d (which you can do here).
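To make the two conversions concrete, here is a quick Python sketch. All the numbers are invented, and the phi-to-d formula assumes your contingency table is 2x2 (where Cramer’s V equals phi):

```python
import math

# Paired, dichotomized data: McNemar's test gives significance; the odds
# ratio is b/c, the ratio of discordant pairs (hypothetical counts below).
b, c = 30, 10               # improved vs. worsened discordant pairs
odds_ratio = b / c          # OR = 3.0

# Chinn (2000): d = ln(OR) / 1.81
d_from_or = math.log(odds_ratio) / 1.81

# Unpaired 2x2 data: Cramer's V equals phi, and phi converts to d via
# d = 2*phi / sqrt(1 - phi^2) (the standard phi-to-d formula)
v = 0.30                    # hypothetical Cramer's V
d_from_v = 2 * v / math.sqrt(1 - v ** 2)

print(round(d_from_or, 2), round(d_from_v, 2))
```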
If you’ve been doing this, or any other analysis, incorrectly (as I have in the past, often do in the present, and will surely do in the future), don’t fret. Statisticians are constantly pointing out examples of faulty use of statistics in the peer-reviewed literature (even in such prestigious journals as JAMA and NEJM). Keep making mistakes; it means you’re moving forward.
In the previous two posts, I introduced effect size, walked through an effect size calculation and provided some insight regarding interpretation. Now I want to quickly identify one application of effect size data: ACCME reaccreditation.
ACCME criterion 11 states: The provider analyzes changes in learners (competence, performance, or patient outcomes) achieved as a result of the overall program’s activities/educational interventions. I can only imagine the pages and pages of material heaped on ACCME reviewers in response to this criterion. How can you succinctly describe the effectiveness of a CME program, consisting of hundreds of activities over a two-, four- or six-year period? Oh yeah, effect size.
If you remember from the last post, effect sizes can be aggregated across activities as long as the educational outcome measurement (EOM) approach remains the same. So assume you’re a healthcare system that regularly produces RSS, conferences and eLearning activities. Furthermore, assume your standard EOM approach across these activities is to measure self-reported utilization of clinical tasks related to CME activity content. If you’ve been calculating an effect size for each of these activities, you can aggregate the effect size scores across all of these activities to come up with a single effect size for competence (Level 4 outcome). Compare this effect size to the benchmarks identified in the previous post (e.g., 0.2 = small, 0.5 = medium, and 0.8 = large) and you have data-based evidence of your overall program effectiveness at this outcome level (see example Figure).
Taking it a step further, you can stratify effect size by format type, which would tell you how effective your eLearning was in relation to conferences or RSS (see example Figure 2). You can even further stratify by topic focus to see how effective your primary care CME was in relation to rheumatology-based CME, for example.
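As a rough illustration of the aggregation idea, here is a Python sketch; the formats and d values below are invented, and a real program would pull these from its outcomes records:

```python
# Per-activity effect sizes tagged by format (all values hypothetical)
activities = [
    ("eLearning", 0.45), ("eLearning", 0.55),
    ("Conference", 0.80), ("Conference", 0.70),
    ("RSS", 0.30), ("RSS", 0.50),
]

# Single program-wide effect size: the average d across all activities
overall_d = sum(d for _, d in activities) / len(activities)

# Stratified by format: average d within each format type
by_format = {}
for fmt, d in activities:
    by_format.setdefault(fmt, []).append(d)
format_d = {fmt: sum(ds) / len(ds) for fmt, ds in by_format.items()}

print(round(overall_d, 2), format_d)
```

The same grouping trick works for stratifying by topic focus, outcome level, or year.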
Now you’re responding to Criterion 11 with just a few figures and explanatory paragraphs. And you’re using good data to do so. Maybe the next reaccreditation review won’t look so scary.
In the previous post, I introduced effect size (more specifically, Cohen’s d) as a statistical tool that can answer whether a CME activity was effective, as well as quantify the magnitude of this effectiveness and allow for comparisons of effectiveness across CME activities. Using Cohen’s d, a CME provider can report the effectiveness of an annual meeting in affecting, for example, participant competency (Level 4 outcomes) and then compare the magnitude of effect to previous years’ meetings and/or other CME activities of similar format or topic focus. Ultimately, a CME provider can determine benchmarks for effectiveness at each outcome level (or for each educational format) to quickly diagnose the performance of each CME activity. That sort of info comes in real handy for accreditation review and for communicating with sponsors (but that will be the focus of the next post).
So, all that being said, it’s now time to discuss how to actually calculate a Cohen’s d. Rest assured: you will not need a statistician, an advanced grasp of mathematics, or any specialty certification…if you can calculate (or more likely, use MS Excel to calculate) an average and a standard deviation and have access to the Internet, you’re good.
I’ll set the stage with a common example: assume that you are a CME provider who just produced a 2-hour, mixed didactic-interactive case discussion regarding advances in the detection, evaluation and treatment of high blood cholesterol in adults. You used a paper-based survey (administered both pre- and post-activity) to measure participants’ self-reported utilization (on a 5-point scale) of clinical tasks related to the CME activity content. Each survey consisted of eight assessment items (i.e., clinical tasks). Now you want to summarize this pre- vs. post-activity data into a single effect size. The steps are as follows:
- Calculate a mean rating and standard deviation for each assessment item in the pre-survey.
- Calculate a mean rating and standard deviation for each assessment item in the post-survey.
- Type “effect size calculator” into Google and click any of the identified links (I like to use this one).
- Enter the data from steps 1 and 2 (above) into the effect size calculator.
- Behold the effect size for your activity!
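If you’d rather skip the online calculator, the steps above reduce to a few lines of Python. The ratings below are invented, and the pooled-SD version of Cohen’s d shown here is what most online calculators compute from your means and standard deviations:

```python
from statistics import mean, stdev

pre  = [3, 2, 3, 4, 2, 3, 3, 2]   # hypothetical pre-activity ratings
post = [4, 4, 4, 5, 3, 4, 4, 3]   # hypothetical post-activity ratings

# Steps 1 and 2: mean and standard deviation for each survey
m_pre, s_pre = mean(pre), stdev(pre)
m_post, s_post = mean(post), stdev(post)

# Steps 3-5 in one line: Cohen's d with a pooled standard deviation
pooled_sd = ((s_pre ** 2 + s_post ** 2) / 2) ** 0.5
d = (m_post - m_pre) / pooled_sd

print(round(d, 2))
```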
There is one more step…interpretation. For that, you need to be aware of the following:
- Cohen’s d is expressed in standard deviation units. Accordingly, a Cohen’s d of 1.0 indicates that one standard deviation separates the pre-activity average rating vs. the post-activity average rating (with the post-activity rating being greater).
- Cohen’s d is proportional. Therefore, a Cohen’s d of 1.0 is twice the magnitude of a Cohen’s d of .5 (or half the magnitude of a 2.0).
- There is no theoretical upper or lower bound to Cohen’s d. In practice, however, values rarely fall outside -3 to +3, and the majority fall within -1 to +1.
- Benchmarks are used to assess the magnitude of a Cohen’s d. Based on repeated measurement, benchmarks (or expected ranges of Cohen’s d) can be established in a given area (e.g., mixed, didactic-interactive CME). In areas where benchmarks remain to be established, the following preliminary benchmarks can be used to assess magnitude of effect: 0.2 (small), 0.5 (medium) and 0.8 (large) (Cohen 1988).
- You can compare the Cohen’s d from one activity to the d from any other activity that used a similar outcome assessment method (e.g., a case-based survey).
- You can aggregate Cohen’s d across activities (i.e., take an average d across all of your eLearning activities, or all of your cholesterol-focused CME – assuming you used the same outcome assessment method for these activities [see the previous point]).
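If you want to automate the benchmark lookup, a tiny helper like this works; the function name and the choice to label sub-0.2 effects “negligible” are my own additions, not part of Cohen’s benchmarks:

```python
def interpret_d(d):
    """Label the magnitude of a Cohen's d using the 0.2 / 0.5 / 0.8
    preliminary benchmarks (Cohen 1988)."""
    size = abs(d)  # magnitude, regardless of direction of effect
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"

print(interpret_d(0.65), interpret_d(-1.1), interpret_d(0.1))
```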
And just like that, you are now proficient in calculating and interpreting effect size in CME. I told you this would be easy. Now go forth and make this look hard to all of your competition.
Reference: Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, 2nd edition. Erlbaum, Hillsdale, NJ.
Imagine you’ve created a survey to assess changes in competence among participants in a CME activity (pre- vs. post-participation). Let’s say there are a total of eight questions in your survey (each one asking participants to self-report their utilization of a specific clinical task related to the CME activity content on a 5-point scale). One week after the CME activity is completed, you’re staring at an Excel spreadsheet containing all of the pre- and post-activity responses. Now what? How do you use this data to determine whether your activity was effective? And how do you compare this data to the results of other outcome assessments?
If this were an infomercial, I’d now burst onto the scene (colorfully sweatered) and introduce the latest advancement in outcome measurement: Cohen’s d. It slices, it dices…it reduces all of your outcomes data into a single metric that summarizes the overall effectiveness of your CME. But wait, there’s more…you can use Cohen’s d to compare the effectiveness of your CME across activities. That’s right. Want to know how effective this year’s conference was relative to last year’s? Cohen’s d. Want to know how effective your live CME activity was relative to its repurposed enduring material? Cohen’s d. Want to know the effectiveness of your overall CME program by topic? Or by format? That’s right, Cohen’s d.
Although this may sound like an innovation, Cohen’s d has been around for decades. It’s even been used in CME; however, this has largely been restricted to academic publications. But the claims made above are true. Cohen’s d can answer the following questions: 1) Was my CME effective? 2) How effective was it? And 3) how effective was my CME relative to other CME activities? Better still, it’s remarkably simple to calculate – if you can calculate a mean and a standard deviation (not by hand, of course, use Excel), then you can calculate a Cohen’s d.
I’m going to dedicate the next two blog posts to Cohen’s d. The first will provide step-by-step instructions of how to calculate it and interpret the results. The next will demonstrate how you can use Cohen’s d to assess the effectiveness of your CME beyond the individual activity (i.e., by topic, or format, or year). And I promise, you’ll be amazed at how easy this is to do.