I was recently asked the following:

*Do you have any information on the sample size needed to obtain statistical significance for surveys?*

That depends on the type of survey. If you’re looking for the sample size necessary for a needs assessment survey, you can find clear instructions here. For a comparative assessment (e.g., participants pre- vs. post-CME activity or CME participants vs. a representative control group), the necessary sample size would be determined by a power calculation…but don’t worry about how to do a power calculation; odds are it doesn’t fit your assessment.

A very helpful explanation of power calculations by Professor Mean (think “average” not “unpleasant”) can be found here. Professor Mean details three things needed for a power calculation:

- a research hypothesis,
- a standard deviation for your outcome measure, and
- an estimate of a clinically relevant difference for this outcome measure.
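
To see how the last two ingredients drive the answer, here is a minimal sketch of the textbook normal-approximation formula for comparing two group means. It is not taken from Professor Mean's page; the function name, the two-sided alpha of 0.05, the 80% power, and the example numbers are all assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Uses the normal approximation: n = 2 * ((z_alpha + z_beta) * sd / difference)^2.
    sd         -- assumed standard deviation of the outcome measure
    difference -- assumed smallest clinically relevant difference
    """
    if sd <= 0 or difference == 0:
        raise ValueError("sd must be positive and difference must be nonzero")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = standard_normal.inv_cdf(power)           # quantile for desired power
    return ceil(2 * ((z_alpha + z_beta) * sd / abs(difference)) ** 2)


# Made-up inputs: SD of 1.0 scale point, relevant difference of 0.5 points.
n = sample_size_per_group(sd=1.0, difference=0.5)
print(n)  # 63 per group under these assumed inputs
```

Notice that the function cannot return anything until you supply `sd` and `difference`. Those are exactly the numbers that are so hard to pin down for a typical CME survey question.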

The standard CME assessment is as follows: participants in a CME activity are given a survey (consisting of case-based questions, Likert-scale questions, or both) and their responses are compared pre- vs. post-participation, post-participation vs. the responses of a representative non-participant group, or both. Other than the umbrella expectation that CME participants will respond better to each question after CME exposure (i.e., more in accordance with the educational messages of the CME activity), there is seldom a specific hypothesis defined (see power calculation criterion #1 above). You could argue that each survey question is a hypothesis, in which case you would need to be able to identify a standard deviation (criterion #2) and a clinically relevant difference (criterion #3) for each. If you’re using a Likert-scale survey, what’s the standard deviation for self-efficacy in performing a diabetic foot exam? And if a physician’s self-efficacy climbs 1 point, is that clinically relevant? If you’re using a case-based instrument, what’s the standard deviation for prescribing an LDL-lowering drug in a patient with 0-1 risk factors for CHD and an LDL level of > 190 mg/dL? Can you imagine having to answer these questions for every CME assessment instrument for every CME activity? I can’t. Which is why I/we don’t/shouldn’t worry about power calculations.

The purpose of a power calculation is to conserve resources and protect people from harm. In regard to clinical drug trials, each subject added to your study increases both expense and exposure to potentially harmful treatment. Clearly a calculation to identify the minimum number of study subjects is useful in this setting. In CME, we want to educate as many physicians as possible and each additional physician educated *should* decrease the amount of harm experienced by their patients. Power calculations don’t make sense in CME planning, and we shouldn’t pretend otherwise.

Now for the best part…go ahead and run statistical tests on your survey data. If your results achieve statistical significance, then you had adequate power. That doesn’t mean your assessment is without methodologic flaws…just that power isn’t one of them. If your results don’t achieve statistical significance, then you have two possible conclusions: 1) in this assessment, there was not a difference between CME participants and the comparison group, or 2) the inability to detect a difference could be due to an insufficient number of assessment participants.
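
If you want to see what "run the test" can look like for a pre- vs. post-participation comparison, here is a rough sketch. It uses an exact sign-flip permutation test, which is just one reasonable choice for paired scores and not a method this post prescribes. The function name and the Likert responses are invented for illustration.

```python
from itertools import product


def paired_sign_flip_test(pre, post):
    """Exact two-sided sign-flip permutation test for paired scores.

    Enumerates all 2^n sign assignments, so it is only practical for
    small samples (roughly 20 pairs or fewer).
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    diffs = [after - before for before, after in zip(pre, post)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Under the null hypothesis, each paired difference is equally
    # likely to be positive or negative.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed:
            extreme += 1
    return extreme / total


# Hypothetical 5-point Likert responses from eight participants (made up).
pre = [2, 3, 3, 2, 4, 3, 2, 3]
post = [3, 4, 3, 4, 4, 4, 3, 4]
p_value = paired_sign_flip_test(pre, post)
print(p_value)  # 0.03125 for this made-up data
```

With these invented numbers the p-value falls below the conventional 0.05 threshold, so by the reasoning above, power was not the problem. Had it come out above 0.05, you would be left choosing between the two conclusions listed above.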

I know it sounds smart to talk about power calculations, but in most cases the truth is exactly the opposite. Next time you hear someone claim they did a power calculation for a CME assessment, ask them to answer each of Professor Mean’s three criteria.