Monthly Archives: August 2012

P values – controlling for multiple comparisons

A p-value helps you decide whether to reject a given null hypothesis. In CME, however, we often don’t have a single hypothesis. Sure, we expect physicians who participate in CME to have changes in competency, performance or maybe even their patients’ health, but we’re not very good at testing this directly via a single hypothesis (as compared to clinical drug trials). Our typical approach is to give CME participants a survey containing several knowledge, self-efficacy or case questions and then run t- or chi-square tests to see if they answer differently pre vs. post (or even post vs. a control group). This results in a p-value for each question, which means that each question is essentially its own hypothesis. If you’re going to have more than one hypothesis in a single study, you need to control for multiple comparisons. This is because each additional hypothesis tested in a single study increases the likelihood that at least one of the differences uncovered is due to chance (as opposed to a true difference between the comparison groups).

For example, if you conduct a single statistical test and use the conventional significance threshold (.05), there is only a 5% chance that you’ll reject a true null hypothesis (i.e., conclude that a difference exists between groups when it does not). But if you have a 20-question survey and you’re conducting a statistical test for each question, you now have a 64% chance of making one or more false findings (this follows from 1 − (1 − .05)^20 ≈ .64, which assumes the tests are independent).
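To see how quickly the risk grows with the number of questions, here is a minimal Python sketch. The function name `familywise_error_rate` is just an illustrative label, and the calculation assumes the tests are independent, which survey questions often are not.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false finding across independent tests."""
    # Probability that every test avoids a false positive is (1 - alpha)^m;
    # the complement is the chance of one or more false findings.
    return 1.0 - (1.0 - alpha) ** num_tests


for m in (1, 5, 10, 20):
    print(f"{m:>2} tests: {familywise_error_rate(m):.0%}")
```

With 20 tests this gives the 64% quoted above, and even 5 tests push the chance of a false finding above 20%.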

Although I’d first recommend not conducting multiple comparisons, there aren’t many viable alternatives for most CME providers, and such an approach can have value for hypothesis generation. That being said, a simple way to address the multiple comparison issue is via the Simes-Hochberg correction [1,2].

Here are the steps:

  1. After you’ve run all your statistical tests, order all p-values from high to low.
  2. If the highest p-value is less than .05, stop here: all tests are significant.
  3. Otherwise, if the second highest p-value is less than .025 (which is .05/2), stop here: that test and all following tests are significant.
  4. Otherwise, if the third highest p-value is less than .017 (which is .05/3), stop here: that test and all following tests are significant.
  5. And so on, comparing each p-value with .05 divided by its rank among all of the multiple comparison p-values.
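The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `hochberg` and the five p-values in the usage example are made up for demonstration, and the comparison uses a strict "less than" to match the steps as written.

```python
def hochberg(p_values, alpha=0.05):
    """Apply the step-up procedure described in the steps above.

    Returns a list of booleans (True = still significant after
    correction), in the same order as the input p-values.
    """
    # Indices of the p-values, ordered from highest to lowest p-value.
    order = sorted(range(len(p_values)), key=lambda i: p_values[i], reverse=True)
    significant = [False] * len(p_values)
    for rank, idx in enumerate(order, start=1):
        # Compare the rank-th highest p-value with alpha divided by its rank.
        if p_values[idx] < alpha / rank:
            # This test and every test with a smaller p-value are significant.
            for j in order[rank - 1:]:
                significant[j] = True
            break
    return significant


# Hypothetical p-values, for illustration only.
example = [0.001, 0.012, 0.03, 0.04, 0.2]
print(hochberg(example))
```

In this made-up example four of the five p-values are below .05 before correction, but only the two smallest survive: the fourth highest p-value (.012) is the first to fall below its threshold (.05/4 = .0125).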

Here’s an example (note that 5 comparisons were significant prior to the multiple comparison correction, after which none of the comparisons maintained statistical significance):


[Table: P value and adjusted p-value for Questions 1 through 10]

1. Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika 1986;73:751–54.

2. Hochberg Y. A sharper Bonferroni procedure for multiple significance testing. Biometrika 1988;75:800–02.

Filed under P value, Statistics

Open-ended Survey Questions: Non-response Bias

You can’t make everyone happy. I don’t think I’ve ever seen outcome data for a CME activity that didn’t include at least one harsh comment in the open-ended feedback section. Although in the minority, something about these comments makes them feel particularly weighty. Maybe it’s because someone actually took the time to write something down, rather than simply checking boxes on an evaluation form. When you find yourself (or a sponsor) particularly affected by such comments, consider the following…

One consideration in the interpretation of survey data is non-response bias. Non-response bias is the possibility that individuals responding to a survey differ from non-respondents in a way that limits the generalizability of survey data to the overall CME participant population being evaluated. Generally speaking, the lower the survey response rate, the greater the potential for non-response bias. For example, a CME evaluation survey with a 20% response rate is less likely to be representative of the overall CME participants than a survey with a 40% response rate. The concern is that the 20% who choose to complete the survey are unique in some way that creates a bias in the data. The higher the response rate, the less likely survey respondents are to be distinct from the overall population of CME participants.

Open-ended questions are particularly susceptible to non-response bias. Even when someone elects to respond to a survey, research has shown that these respondents complete open-ended questions less than 40% of the time (Borg, 2005; Poncheri et al., 2008; Siem, 2005).  So even if survey respondents are deemed representative of the overall population (for example, based on a demographic comparison between respondents and the overall population), the subgroup of survey respondents who complete the open-ended questions may differ enough to introduce bias.
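A quick back-of-the-envelope calculation shows how small this subgroup can get. The sketch below is purely illustrative: it combines the hypothetical 20% survey response rate from the example above with the 40% upper bound on open-ended completion, and the function name is made up for this purpose.

```python
def open_ended_share(survey_response_rate, open_ended_completion_rate):
    """Share of all participants who end up writing an open-ended comment."""
    return survey_response_rate * open_ended_completion_rate


# Hypothetical inputs: 20% respond to the survey, and at most 40% of
# those respondents complete the open-ended questions.
share = open_ended_share(0.20, 0.40)
print(f"At most {share:.0%} of participants leave a comment")
```

Under these assumptions, the comments would come from no more than about 8% of the overall CME participants.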

So do respondents who complete open-ended questions differ from those who skip them? Research has shown that survey respondents with lower satisfaction are more likely to respond to open-ended questions than satisfied respondents (McNeely, 1990; Poncheri et al., 2008). This is supported by the general psychological phenomenon that dissatisfied individuals are more likely to consider the causes of their dissatisfaction than satisfied individuals are to consider the source of their satisfaction. Accordingly, satisfied individuals will have less to communicate than dissatisfied individuals when asked to provide comments (Baumeister, Bratslavsky, Finkenauer, & Vohs, 2001; Peeters, 1971; Harman-Poncheri, 2008).

By focusing on open-ended comments in CME evaluation surveys, we may be drawing conclusions based only on the least satisfied respondents (who are likely a minority of the overall CME participants). Although such feedback is still valuable for identifying areas of improvement, assuming it reflects the whole would likely skew our perception of how CME participants really feel about their CME experience.


  • Baumeister, R. F., Bratslavsky, E., Finkenauer, C., & Vohs, K. D. (2001). Bad is stronger than good. Review of General Psychology, 5, 323-370.
  • Borg, I. (2005, April). Who writes what kinds of comments? Some new findings. In A. I. Kraut (Chair), Grappling with write-in comments in a web-enabled survey world. Symposium conducted at the 20th annual conference of the Society for Industrial and Organizational Psychology, Los Angeles, California.
  • Harman-Poncheri, R. Understanding Survey Comment Nonresponse and the Characteristics of Nonresponders. Dissertation, North Carolina State University, 2008.
  • McNeely, R. L. (1990). Do respondents who pen comments onto mail surveys differ from other respondents? A research note on the human services job satisfaction literature. Journal of Sociology & Social Welfare, 17(4), 127-137.
  • Peeters, G. (1971). The positive-negative asymmetry: On cognitive consistency and positivity bias. European Journal of Social Psychology, 1, 455-474.
  • Poncheri, R. M., Lindberg, J. T., Thompson, L. F., & Surface, E. A. (2008). A comment on employee surveys: Negativity bias in open-ended responses. Organizational Research Methods, 11, 614-630.
  • Siem (2005, April). History of survey comments at the Boeing Company. In K. J. Fenlason (Chair), Comments: Where have we been? Where are we going? Symposium conducted at the 20th annual conference of the Society for Industrial and Organizational Psychology, Los Angeles, California.

Filed under Bias, CME, Open-ended questions, Response rates, Survey