Category Archives: Needs Assessment

Losing Control

CME has been walking around with spinach in its teeth for more than 10 years.  And while my midwestern mindset defaults to “don’t make waves”, I think it’s officially time to offer a toothpick to progress and pluck that pesky control group from the front teeth of our standard outcomes methodology.

That’s right, CME control groups are bunk. Sure, they make sense at first glance: randomized controlled trials (RCTs) use control groups and they’re the empirical gold standard.  However, as we’ll see, the magic of RCTs is the randomization, not the control: without the “R” the “C” falls flat.  Moreover, efforts to demographically-match controls to CME participants on a few simple factors (eg, degree, specialty, practice type and self-report patient experience) fall well short of the vast assemblage of confounders that could account for differences between these groups. In the end, only you can prevent forest fires and only randomization can ensure balance between samples.

So let’s dig into this randomization thing.  Imagine you wanted to determine the efficacy of a new treatment for detrimental modesty (a condition in which individuals are unable to communicate mildly embarrassing facts).  A review of clinical history shows that individuals who suffer this condition represent a wide range of race, ethnicity and socioeconomic strata, as well as vary in health metrics such as age, BMI and comorbidities.  Accordingly, you recruit a sufficient sample* of patients with this diagnosis and randomly designate them into two categories: 1) those who will receive the new treatment and 2) those who will receive a placebo.  The purpose of this randomization is to balance the factors that could confound the relationship you wish to examine (ie, treatment to outcome).  Assume the outcome of interest is likelihood to tell a stranger he has spinach in his teeth.  Is there a limit to the number of factors you can imagine that might influence an individual’s ability for such candor?  And remember, clinical history indicated that patients with detrimental modesty are diverse in regard to social and physical characteristics.  How can you know that age, gender, height, religious affiliation, ethnicity or odontophobia won’t enhance or reduce the effect of your treatment?  If these factors are not evenly distributed across the treatment and control groups, your conclusion about treatment efficacy will be confounded.

So…you could attempt to match the treatment and control groups on all potential confounders or you could take the considerably less burdensome route and simply randomize your subjects into either group.  While all of these potential confounders still exist, randomization ensures that both the treatment and control group are equally “not uniform” across all these factors and therefore comparable.  It’s very important to note that the “control” group is simply what you call the population who doesn’t receive treatment.  The only reason it works is because of randomization.  Accordingly, simply applying a control group to your CME outcome assessment without randomization is like giving a broke man a wallet – it’s so not the thing that matters.
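To make the point concrete, here is a minimal simulation sketch. Everything in it is hypothetical: the single confounder (age), the population of 10,000 and the rule that older subjects are more likely to opt in are all assumptions chosen for illustration. It compares the age gap between treatment and control groups under random assignment versus self-selection.

```python
import random
from statistics import mean


def simulate(n=10_000, seed=0):
    """Return the treatment-minus-control age gap under two assignment schemes."""
    rng = random.Random(seed)
    # Hypothetical confounder: age, spread evenly between 25 and 75.
    ages = [rng.uniform(25, 75) for _ in range(n)]

    # Randomized assignment: a coin flip decides who gets the treatment.
    randomized = [rng.random() < 0.5 for _ in ages]
    # Self-selection (assumed rule): the older you are, the likelier you opt in.
    self_selected = [rng.random() < (age - 25) / 50 for age in ages]

    def age_gap(in_treatment):
        treated = [a for a, t in zip(ages, in_treatment) if t]
        control = [a for a, t in zip(ages, in_treatment) if not t]
        return mean(treated) - mean(control)

    return age_gap(randomized), age_gap(self_selected)


random_gap, selected_gap = simulate()
print(f"Age gap with randomization: {random_gap:.2f} years")
print(f"Age gap with self-selection: {selected_gap:.2f} years")
```

With randomization the gap should hover near zero; with self-selection it should be large. The same logic applies to every confounder you didn't think to measure, which is the whole appeal.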

Now let’s bring this understanding to CME.  There are approximately 18,000 oncology physicians in the United States.  In only two scenarios will the participants in your oncology-focused CME represent an unbiased sample of this population: 1) all 18,000 physicians participate or 2) at least 377 participate (sounds much more likely) that have been randomly sampled (wait…what?).  For option #2, the CME provider would require access to the entire population of oncology physicians from which they would apply a randomization scheme to create a sample based on their empirically expected response rate to invitations in order to achieve the 377 participation target.  Probably not standard practice.  If neither scenario applies to your CME activity, then the participants are a biased representation of your target learners.  Of note, biased doesn’t mean bad.  It just means that there are likely factors that differentiate your CME participants from the overall population of target learners and, most importantly, these factors could influence your target outcomes.  How many potential factors? Some CME researchers suggest more than 30.

Now think about a control group. Are you pulling a random sample of your target physician population?  See scenario #2 above.  Also, are you having any difficulty attracting physicians to participate in control surveys?  What’s your typical response rate?  Maybe you use incentives to help?  Does it seem plausible that the physicians who choose to respond to your control group surveys would be distinct from the overall physician population you hope they represent?  Do you think matching this control group to participants based on just profession, specialty, practice location and type is sufficient to balance these groups?  Remember, it’s not the control group, it’s the randomization that matters.  RCTs would be a lot less cumbersome if they only had to match comparison groups on four factors.  Of course, our resulting pharmacy would be terrifying.

So, based on current methods, we’re comparing a biased sample of CME participants to a biased sample of non-participants (control) and attributing any measured differences to CME exposure.  This is a flawed model.  Without balancing the inherent differences between these two samples, it is impossible to associate any measured differences in response to survey questions to any specific exposure.  So why are you finding significant differences (ie, P < .05) between groups?  Because they are different.  The problem is we have no idea why.

By what complicated method can we pluck this pesky piece of spinach?  Simple pre- versus post-activity comparison.  Remember, we want to ensure that confounding factors are balanced between comparison groups.  While participants in your CME activity will always be a biased representation of your overall target learner population, those biases are balanced when participants are used as their own controls (as in the pre- vs. post-activity comparison).  That is, both comparison groups are equally “non-uniform” in that they are comprised of the same individuals. In the end, you won’t know how participants differ from non-participants, but you will be able to associate post-activity changes to your CME.
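For a single knowledge or case vignette question scored right or wrong, one common way to run a paired pre- versus post-activity comparison is McNemar's test. The post doesn't name a specific test, so treat this as a sketch of one reasonable option. The counts below are hypothetical: 10 participants who switched from incorrect to correct and 2 who switched the other way. Participants whose answers didn't change drop out of the calculation.

```python
from math import comb


def mcnemar_exact(improved, worsened):
    """Two-sided exact McNemar p-value computed from the discordant pairs.

    improved: participants incorrect before the activity and correct after
    worsened: participants correct before the activity and incorrect after
    """
    n = improved + worsened
    if n == 0:
        return 1.0  # nobody changed their answer, so there is nothing to test
    k = min(improved, worsened)
    # Under the null hypothesis, each switch is equally likely in either direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts for one question.
p_value = mcnemar_exact(improved=10, worsened=2)
print(f"p = {p_value:.4f}")  # p = 0.0386
```

Because each participant serves as their own control, the test only looks at who changed, which is exactly the balance argument above.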

Filed under Best practices, CME, Confounders, Control groups, Needs Assessment, Outcomes, Power calculation, Pre vs. Post

The dark side of SurveyMonkey

I love SurveyMonkey…survey creation, distribution and data collection are a snap with this service (and it’s super cheap).  What could possibly be bad about making surveys so accessible to everyone?  Oh, yeah…it’s probably making surveys so accessible to everyone.  Surveys used to represent a significant time and financial investment (e.g., postage, envelope stuffing, data entry).  Now all you need is a list of emails.  Without previous barriers, the decision to survey can come a little too quickly.

Admittedly, I’ve done more than one survey simply because it was easy…rather than necessary.  Now I’m afraid that all this ease is actually making surveying harder than ever.  There are only so many physicians, and if we’re all bombing their inboxes with survey invitations, what’s the difference between us and cheap Viagra spam?

In his recent JCEHP Editorial, Dr. Olson eloquently describes this concern:

“…a survey population is a commons, a resource that is shared by a community, and like other commons such as ocean fisheries or antibiotics, it can be degraded by overuse” (p. 94)

Dr. Olson goes on to detail five ways in which we most typically misuse this common resource – which are much easier to address than climate change.  I highly recommend reading this editorial.   Afterward, continue to “reduce, reuse, recycle” and add: resist.


Filed under Best practices, CME, JCEHP, Needs Assessment, Survey

Thoughts on organizing your outcomes data

An experiment begins with a hypothesis. For example…I suspect that the next person to enter this coffee shop will be a hipster (denied, by the way).

A neat and tidy hypothesis for CME outcome assessment might read: I suspect that participants in this CME activity will increase compliance with <insert evidence-based quality indicator here>.

Unfortunately, access to data that would answer such a question is beyond the reach of most CME providers. So we use proxy measures such as knowledge tests or case vignette surveys through which we hope to show data suggestive of CME participants increasing their compliance with <insert evidence-based quality indicator here>.

Although this data is much easier to access, it can be pretty tedious to weed through. Issue #1: How do you reduce the data across multiple knowledge or case vignette questions into a single statement about CME effectiveness? Issue #2: How do you systematically organize the outcomes data to develop specific recommendations for future CME?

For issue #1, I’d recommend using “effect size”. There’s more about that here.

For issue #2, consider organizing your outcome results into the following four buckets (of note, there is some overlap between these buckets):

1. Unconfirmed gap – pre-activity question data suggests knowledge or competence is already high (typically defined as >70% of respondents identifying the evidence-based correct answer OR agreeing on a single answer if there is no correct response). Important note: although we shouldn’t expect every anticipated gap to be present in our CME participants, one cause of an unconfirmed gap (other than a bad needs assessment) is the use of assessment questions that are too easy and/or don’t align with the education.

2. Confirmed gap – pre-activity question data suggests that knowledge or competence is sufficiently low to warrant educational focus (typically defined as <70% of respondents identifying the evidence-based correct answer OR agreeing on a single answer if there is no correct response)

3. Residual gap

a. Post-activity data only = typically defined as <70% of respondents identifying the evidence-based correct answer OR agreeing on a single answer if there is no evidence-based correct response

b. Pre- vs. post-activity data = no significant difference between pre- and post-activity responses

4. Gap addressed

a. Post-activity data only = typically defined as >70% of respondents identifying the evidence-based correct answer OR agreeing on a single answer if there is no correct response

b. Pre- vs. post-activity data = significant difference between pre- and post-activity responses

Most important to note, if the outcome assessment questions do not accurately reflect gaps identified in the needs assessment, the results of the final report are not going to make any sense (no matter how you organize the results).
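If you track results in a spreadsheet or script, the four buckets can be sketched as a pair of small functions. The function names are hypothetical, and so is one boundary decision: the definitions above say >70% and <70% but are silent on exactly 70%, so this sketch assumes exactly 70% counts as a gap.

```python
THRESHOLD = 0.70  # the "typically defined" 70% cutoff from the buckets above


def classify_pre(pre_rate):
    """Bucket 1 or 2, from the share of respondents correct before the activity."""
    # Assumption: exactly 70% is treated as a gap (the post leaves this open).
    return "unconfirmed gap" if pre_rate > THRESHOLD else "confirmed gap"


def classify_post(post_rate=None, significant_change=None):
    """Bucket 3 or 4.

    Pass significant_change (True/False) when you have pre- vs. post-activity
    data; otherwise pass post_rate and the 70% cutoff is used.
    """
    if significant_change is not None:
        return "gap addressed" if significant_change else "residual gap"
    if post_rate is None:
        raise ValueError("need post_rate or significant_change")
    return "gap addressed" if post_rate > THRESHOLD else "residual gap"


# Hypothetical question: 45% correct before, 82% correct after.
print(classify_pre(0.45))             # confirmed gap
print(classify_post(post_rate=0.82))  # gap addressed
```

Running every question through both functions gives you a two-column table (gap status before, gap status after) that maps straight onto recommendations for future CME.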

Filed under CME, Gap analysis, Needs Assessment, Outcomes, Reporting, Statistics

Formative Assessment

Outcomes assessment is “summative”, which is a fancy way of saying it measures whether desired results have been achieved.  A “formative” assessment, however, addresses something while in development to be sure it’s on track.  Moore et al (2009) make a strong case for formative assessment in CME, but leave the “how-to” details to our imagination (I guess when you’re covering every aspect of CME you need to leave a few bits out).

Here’s one recipe for formative assessment (for live CME activities):

  1. Have your course faculty develop knowledge and/or case vignette questions relative to their pending talks
  2. Turn these questions into a web-based survey (
  3. At least two weeks prior to the activity date, email the survey to all activity registrants
  4. Share the registrants’ responses with your course faculty
  5. Adjust the pending talks accordingly

If you feel the need to incentivize respondents (which I never discourage), offer them a discount off registration for another activity.  If you want more detail, check out this short JCEHP article.

I’ve used this approach a few times and it’s been generally successful (i.e., good response rate and faculty have used some of the data to modify their presentations).  However, I don’t want to pretend this approach is “setting-the-bar” for formative assessment.  If you’re not doing any such assessment, this is a good way to get started.  Play with this for a while and you’ll discover ways to get more sophisticated – just remember to share what you’re doing with the rest of us!

Filed under Formative assessment, Methodology, Needs Assessment, Summative assessment

In support of didactic CME

Here is an excerpt from a needs assessment for a didactic CME activity:

…a 2000 survey of Kaiser Permanente physicians reported lecture to be the perceived most useful and effective CME format (1).  Although there is little evidence in support of live format CME in regard to changing physician behavior, performance or patient outcomes (2), there appears to be considerable preference for this format.  Some suggest that limitations of methods for attributing physician practice and/or patient health changes to CME make it difficult to validate any format, whereas physician preferences may be a more useful metric (3).

In that definitive data in support of any specific CME format remains to be reported and preference data of our target audience (as well as external physician populations) continues to support the live CME format, we will continue to include live CME in our programming.

1. Price DW, et al. Results of the First National Kaiser Permanente Continuing Medical Education Needs Assessment Survey. The Permanente Journal 2002;6:76-84.
2. Davis D, et al. The impact of formal continuing medical education: do conferences, workshops, rounds and other traditional continuing education activities change physician behavior or health outcomes? JAMA 1999;282:867-74.
3. McLeod PJ and McLeod AH. If formal CME is ineffective, why do physicians still participate? Medical Teacher 2004;26:184-6.

I hope this is helpful.

Filed under CME, Didactic, Needs Assessment

Calculating sample size for a needs assessment survey

Here are the steps:

1)      Determine the size of your target population.  Let’s say you want to survey pediatricians in the United States…a quick Google search (search terms = “how many US general pediatricians”) points to the American Academy of Pediatrics Division of Workforce and Medical Education Policy webpage, which reports a total of 57,200 U.S. general pediatricians (based on data from the 2006 American Medical Association Masterfile).

So, in this example, the target population size = 57,200.

2)      Determine how big a sample is needed to represent the target population.  Thankfully, there’s an abundance of free sample size calculators online.   I typically use this one.  Four things are needed to calculate sample size: 1) margin of error, 2) confidence level, 3) population size, and 4) response distribution.  Actually, the only thing you really need to know is population size (which for U.S. pediatricians is 57,200).  Just like we all accept P < .05 as the benchmark for statistical significance, the standards for margin of error, confidence level and response distribution are 5%, 95% and 50%, respectively.  Click here for a sample size calculator screen shot using U.S. pediatricians as the target population (the recommended sample size is 382).
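If you'd rather see what a calculator like this is doing, here is a minimal sketch using a standard sample size formula for a proportion with a finite population correction. That the linked calculator uses exactly this formula is an assumption, but with the defaults above it reproduces the recommended sample size of 382 for 57,200 pediatricians. The function and parameter names are mine.

```python
from math import ceil
from statistics import NormalDist


def sample_size(population, margin=0.05, confidence=0.95, distribution=0.5):
    """Respondents needed to represent a finite population.

    margin: margin of error (5% by default)
    confidence: confidence level (95% by default)
    distribution: expected response distribution (50% is the most conservative)
    """
    # z-score for a two-sided interval at the given confidence level (about 1.96).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    x = z ** 2 * distribution * (1 - distribution)
    n = population * x / ((population - 1) * margin ** 2 + x)
    return ceil(n)  # round up so the margin of error is not exceeded


print(sample_size(57_200))  # 382
```

Note how little population size matters once it gets large: the same defaults applied to a population of 18,000 give 377, only five fewer respondents.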

3)      Before you start surveying, there’s one more important (and often overlooked) step: pulling a random sample from your target population for your survey pool.  To do this, you’ll need to estimate your survey response rate.  The best way to estimate your survey’s response rate is to see what’s been achieved in other studies.  Relevant to our pediatrician example, a quick PubMed search (search terms = email + pediatricians + survey) identified the following:

  • McMahon SR, et al. Comparison of e-mail, fax, and postal surveys of pediatricians. Pediatrics 2003;111:e299-303 (abstract).

This study of pediatricians in Georgia reported a 26% response rate to an email survey (after two invitations).  So if I’m expecting a 26% response rate (assuming I’m doing a web-based survey of pediatricians) and my recommended sample size is 382, then I will need to randomly select 1469 U.S. pediatricians from the AMA Masterfile (based on this calculation: 0.26 × x = 382, so x ≈ 1469).  A 26% response rate from 1469 U.S. pediatricians randomly selected from the AMA Masterfile will meet my sample size requirement of 382.
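The arithmetic is simple enough to script if you survey often. This sketch (the function name is hypothetical) divides the required sample size by the expected response rate. One small wrinkle: 382 / 0.26 is about 1469.2, so rounding up to be strictly safe gives 1470, one more than the rounded figure above.

```python
from math import ceil


def survey_pool_size(required_responses, expected_response_rate):
    """Invitations needed to expect at least required_responses completed surveys."""
    return ceil(required_responses / expected_response_rate)


print(round(382 / 0.26))            # 1469 (nearest whole number)
print(survey_pool_size(382, 0.26))  # 1470 (rounded up)
```

Either way, the response rate is only an estimate, so padding the pool a little is cheap insurance.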

You need to pull a random sample to reduce concerns such as self-selection bias (i.e., respondents’ decision to participate in your survey may be correlated with traits that affect the study, making the participants a non-representative sample).  There are a number of ways to pull a random sample, as well as a number of factors that dictate which method to use (click here for the Wikipedia summary).  In the following paragraph, I describe a method for pulling a “simple random sample”.

You can identify a random sample using MS Excel.  Start with an Excel spreadsheet containing everyone in your target population (continuing with our example, that would be 57,200 U.S. pediatricians).  Create a new column (call it “random sample”) and type this formula in the first cell: =RAND().  This will provide a random number between 0 and 1.  Copy and paste this formula into all cells in that column.  You now have a random number in each row.  Sort the entire worksheet based on this column.  Select the first however many rows you need for your random sample (in our case, the first 1469).  This is your survey pool.  Of note, after the sort, the random number in each row will recalculate (making it look like the rows were never sorted).  Ignore this.  The rows were sorted first (in ascending or descending order) and then the random values recalculated.  This column will recalculate every time you run a function in this worksheet.
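If Excel isn't your thing, the same simple random sample can be drawn in a few lines of Python. This is a sketch, not the method described above: the placeholder names stand in for a real list (such as an export from the AMA Masterfile), and the seed is an arbitrary choice that just makes the draw repeatable.

```python
import random


def draw_survey_pool(population, pool_size, seed=None):
    """Draw a simple random sample without replacement.

    Every member of the population has the same chance of being selected,
    which is what shuffling on a RAND() column achieves in a spreadsheet.
    """
    rng = random.Random(seed)
    return rng.sample(population, pool_size)


# Placeholder list standing in for the 57,200 U.S. general pediatricians.
physicians = [f"pediatrician_{i}" for i in range(57_200)]
pool = draw_survey_pool(physicians, 1469, seed=2024)
print(len(pool))  # 1469
```

A fixed seed has one practical advantage over the spreadsheet approach: you can regenerate the exact same survey pool later, which helps when someone asks how the sample was drawn.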


Filed under Needs Assessment, Sample size