Tag Archives: statistics

November 17, 2014 · 11:57 am

Bringing boring back

I want to play guitar. I want to play loud, fast and funky. But right now, I’m wrestling basic open chords. And my fingers hurt. And I keep forgetting to breathe when I play. And my daughter gets annoyed listening to the same three songs over and over. But so is the way.

When my daughter “plays,” she cranks up a song on Pandora, jumps on and off the furniture, and windmills through the strings like Pete Townshend. She’d light the thing on fire if I didn’t hide the matches. Guess who’s more fun to watch. But take away the adorable face and the hard rock attitude and what do you have? Yeah…a really bad guitar player.

I was reminded of this juxtaposition while perusing the ACEhp 2015 Annual Conference schedule. I know inserting “patient outcomes” into an abstract title is a rock star move. But on what foundation is this claim built? What limitations are we overlooking? Have we truly put in the work to ensure we’re measuring what we claim?

My interests tend to be boring. Was the assessment tool validated? How do you ensure a representative sample? How best to control for confounding factors? What’s the appropriate statistical test? Blah, blah, blah… I like to know I have a sturdy home before I think about where to put the entertainment system.

So imagine how excited I was to find this title: Competence Assessments: To Pair or Not to Pair, That Is the Question (scheduled for Thursday, January 15 at 1:15). Under the assumption that an interesting-sounding title and informational value are inversely proportional, I had to find out more. Here’s an excerpt:

While not ideal, providers are often left with unpaired outcomes data due to factors such as anonymity of data, and low overall participation. Despite the common use of unpaired results, literature on the use of unpaired assessments as a surrogate for paired data in the CME setting is limited.

Yes, that is a common problem. I very frequently have data for which I cannot match a respondent’s pre- and post-activity responses. I assume the same respondents are in both groups, but I can’t make a direct link (i.e., I have “unpaired” data). Statistically speaking, paired data is better: each respondent serves as his or her own comparison, so differences between respondents don’t cloud the pre/post change. The practical question the presenters of this research intend to answer is how unpaired data may affect conclusions about competence-level outcomes. Yes, that may sound boring, but it is incredibly practical because it happens all the time in CME – and I bet very few people even knew it might be an issue.
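
To make the paired-versus-unpaired difference concrete, here is a minimal Python sketch (standard library only). The scores are made up for illustration and the function names are hypothetical; it computes the t statistic both ways for the same ten numbers.

```python
from math import sqrt
from statistics import mean, stdev, variance

def paired_t(pre, post):
    """t statistic for paired data: based on each respondent's own change."""
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def unpaired_t(pre, post):
    """Independent-samples t statistic (pooled variance), ignoring who is who."""
    n1, n2 = len(pre), len(post)
    pooled_var = ((n1 - 1) * variance(pre) + (n2 - 1) * variance(post)) / (n1 + n2 - 2)
    return (mean(post) - mean(pre)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# Hypothetical pre/post scores for the same five respondents (illustration only)
pre = [2, 3, 4, 5, 6]
post = [3, 4, 5, 6, 8]

print(round(paired_t(pre, post), 2))    # 6.0
print(round(unpaired_t(pre, post), 2))  # 1.08
```

Same numbers, very different test statistics. When the responses really do come from the same people, analyzing them as unpaired throws away the link that makes a consistent change easy to detect.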

So thank you Allison Heintz and Dr. Fagerlie. I’ll definitely be in attendance.

Filed under ACEhp, Alliance for CME, CME, Methodology, paired data, Statistical tests of significance, Statistics, unpaired data

Tagged as ACEhp, CME, methodology, paired data, statistical tests of significance, statistics, upaired data

October 21, 2014 · 5:29 pm

Script Concordance Tests: where have you been hiding?

When I saw the JCEHP editorial title lead with “How Significant is Statistical Significance…” I knew I’d be blogging about it. As I remember the progression through graduate school statistics courses, it began with learning how to select the appropriate significance test, progressed to application and then concluded with all the reasons why the results didn’t really mean much. So I was ready to build a “cut-and-paste” blog post out of old class papers detailing an unhealthy dependence on the results of statistical tests (which I expected to be the opinion of this editorial). And that would have worked fine, but then I found a rabbit hole: script concordance tests (SCTs).

Casually introduced by the authors via an educational scenario illustrating the limitations of statistical significance, SCT is a case-based assessment method designed to measure the clinical decision-making process (as opposed to simply identifying whether someone knows a correct diagnosis or treatment). As educators, this could be quite helpful in clarifying educational gaps. For evaluators, this approach has some encouraging validity data. I’ve got a way to go before I can even claim familiarity with SCTs, but will be diving into the literature immediately (and assuming expert status by hopefully next week). If anyone else is interested, here are some suggestions to learn more:

Fournier JP, Demeester A, Charlin B. Script concordance tests: guidelines for construction. BMC Med Inform Decis Mak 2008;8:18.
Charlin B, Roy L, Brailovsky C, Goulet F, van der Vleuten C. The script concordance test: a tool to assess the reflective clinician. Teach Learn Med 2000;12:189–195.
Dory V, Gagnon R, Dominique V, Charlin B. How to construct and implement script concordance tests: insights from a systematic review. Med Educ 2012;46:552–563.
Lubarsky S, Charlin B, Cook DA, Chalk C, van der Vleuten C. Script concordance method: a review of published validity evidence. Med Educ 2011;45:329–338.

FYI – it turns out SCTs were introduced in the late 1990s. So I’m less than 20 years behind the curve, and perfectly in tune with the traditional adoption curve of evidence to clinical practice (which hovers around 17 years).

Filed under Case vignettes, CME, Script concordance tests, Statistical tests of significance, Statistics

Tagged as case vignettes, CME, script concordance tests, statistical tests of significance, statistics

October 13, 2014 · 5:42 pm

Same question, two different scales

It happens. Your carefully crafted evaluation questions are administered to the survey population using a different scale pre- and post-activity. Miscommunication, cut & paste fail, whatever the cause…what do you do with the data?

1. Nothing. You report it as is, don’t attempt any statistical testing, and hope it doesn’t happen again.
2. Transform. Call on your inner MacGyver and make these two scales compatible.

Tempting as option #1 may be, this blog wouldn’t be much use if we take that route. So here are the simplest fixes:

Proportional transformation: if you want to make a 5-point scale talk to a 7-point scale, you multiply each 5-point score by 7/5 (alternatively, you could reduce a 7-point scale to 5-point by multiplying each 7-point score by 5/7).
Z-score transformation: convert each score (e.g., all 5-point and 7-point scores) to a standard z-score using the following formula: z = (raw score – mean of raw scores) / standard deviation of raw scores.

In this case, simple may also be right (or right enough). To see how these approaches compare to more complex transformations, check out this article.
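
Both fixes take only a few lines of Python. This is a minimal sketch with made-up responses and hypothetical function names; the post doesn’t say which standard deviation to use, so the sample standard deviation here is an assumption.

```python
from statistics import mean, stdev

def rescale(scores, from_points, to_points):
    """Proportional transformation: multiply each score by to_points/from_points."""
    return [s * to_points / from_points for s in scores]

def z_scores(scores):
    """z = (raw score - mean of raw scores) / standard deviation of raw scores."""
    m, sd = mean(scores), stdev(scores)  # sample SD: an assumption of this sketch
    return [(s - m) / sd for s in scores]

pre_5pt = [2, 3, 3, 4, 5]    # hypothetical pre-activity responses, 5-point scale
post_7pt = [4, 5, 6, 6, 7]   # hypothetical post-activity responses, 7-point scale

pre_as_7pt = rescale(pre_5pt, 5, 7)   # 5-point scores expressed on a 7-point scale
pre_z = z_scores(pre_5pt)
post_z = z_scores(post_7pt)
```

One thing to watch with z-scores: any set of scores standardized against its own mean ends up centered on zero, so think about which mean and standard deviation you standardize against before comparing pre to post.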

Filed under CME, data, Likert scale, Statistics, transformation

Tagged as CME, data, likert scale, statistics, transformation

May 7, 2014 · 3:47 pm

Issues with effect size in CME

This past Thursday, I gave a short presentation on effect size at the SACME Spring Meeting in Cincinnati (a surprisingly cool city, by the way – make sure to stop by Abigail Street). Instead of a talk about why effect size is important in CME, I focused on its limitations. My expectation was feedback about how to refine current methods. My main concerns:

Using mean and standard deviation from ordinal variables to determine effect size (how big of a deal is this?)
Transforming Cramer’s V to Cohen’s d (is there a better method?)
How many outcome questions should be aggregated for a given CME activity to determine an overall effect? (my current minimum is four)

The SACME slide deck is here. I got some good feedback at the meeting, which may lead to some changes in the approach I’ve previously recommended. Until then, if you have any suggestions, let me know.
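
For anyone who hasn’t computed these by hand, here is a minimal sketch of the two pieces named in the concerns above: Cohen’s d from two sets of scores (using a pooled standard deviation), and one commonly cited conversion from a correlation-type coefficient r to d, d = 2r / sqrt(1 – r²). Treating a Cramer’s V from a 2x2 table as r is an assumption of this sketch, not necessarily the method in the slide deck, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_sd = sqrt(((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2))
                     / (n1 + n2 - 2))
    return (mean(group2) - mean(group1)) / pooled_sd

def d_from_r(r):
    """Convert a correlation-type effect size r to d (requires -1 < r < 1)."""
    return 2 * r / sqrt(1 - r ** 2)

print(cohens_d([1, 2, 3], [2, 3, 4]))  # 1.0
print(round(d_from_r(0.6), 2))         # 1.5
```

The first function is exactly where the ordinal-data concern bites: it takes a mean and standard deviation of whatever numbers it is given, whether or not the scale spacing justifies that.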

Filed under CME, Cohen's d, Cramer's V, Effect size, Statistics

Tagged as CME, Cohen's d, Cramer's V, Effect size, SACME, statistics

April 8, 2014 · 3:39 pm

Thoughts on organizing your outcomes data

An experiment begins with a hypothesis. For example…I suspect that the next person to enter this coffee shop will be a hipster (denied, by the way).

A neat and tidy hypothesis for CME outcome assessment might read: I suspect that participants in this CME activity will increase compliance with <insert evidence-based quality indicator here>.

Unfortunately, access to data that would answer such a question is beyond the reach of most CME providers. So we use proxy measures such as knowledge tests or case vignette surveys through which we hope to show data suggestive of CME participants increasing their compliance with <insert evidence-based quality indicator here>.

Although this data is much easier to access, it can be pretty tedious to weed through. Issue #1: How do you reduce the data across multiple knowledge or case vignette questions into a single statement about CME effectiveness? Issue #2: How do you systematically organize the outcomes data to develop specific recommendations for future CME?

For issue #1, I’d recommend using “effect size”. There’s more about that here.

For issue #2, consider organizing your outcome results into the following four buckets (of note, there is some overlap between these buckets):

1. Unconfirmed gap – pre-activity question data suggests knowledge or competence already high (typically defined as >70% of respondents identifying the evidence-based correct answer OR agreeing on a single answer if there is no correct response). Important note: although we shouldn’t expect every anticipated gap to be present in our CME participants, one cause of an unconfirmed gap (other than a bad needs assessment) is the use of assessment questions that are too easy and/or don’t align with the education.

2. Confirmed gap – pre-activity question data suggest that knowledge or competence is sufficiently low to warrant educational focus (typically defined as <70% of respondents identifying the evidence-based correct answer OR agreeing on a single answer if there is no correct response).

3. Residual gap

a. Post-activity data only = typically defined as <70% of respondents identifying the evidence-based correct answer OR agreeing on a single answer if there is no evidence-based correct response

b. Pre- vs. post-activity data = no significant difference between pre- and post-activity responses

4. Gap addressed

a. Post-activity data only = typically defined as >70% of respondents identifying the evidence-based correct answer OR agreeing on a single answer if there is no correct response

b. Pre- vs. post-activity data = significant difference between pre- and post-activity responses

Most important to note, if the outcome assessment questions do not accurately reflect gaps identified in the needs assessment, the results of the final report are not going to make any sense (no matter how you organize the results).
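
If you’re sorting more than a handful of questions, the four buckets can be turned into a small helper. This is only a sketch: the function name is hypothetical, proportions are expressed as 0–1, and counting exactly 70% as “high” is an assumption (the definitions above only cover above and below 70%).

```python
THRESHOLD = 0.70  # the 70% cut-off used in the bucket definitions

def classify(pre_correct, post_correct, significant=None):
    """Return the buckets for one outcome question.

    pre_correct / post_correct: proportion of respondents choosing the
        evidence-based (or most common) answer.
    significant: True/False result of a pre- vs. post-activity test, or
        None when only the post-activity proportion is available.
    """
    buckets = ["unconfirmed gap" if pre_correct >= THRESHOLD else "confirmed gap"]
    if significant is None:
        addressed = post_correct >= THRESHOLD   # post-activity data only
    else:
        addressed = significant                 # pre- vs. post-activity comparison
    buckets.append("gap addressed" if addressed else "residual gap")
    return buckets

print(classify(0.45, 0.82))         # ['confirmed gap', 'gap addressed']
print(classify(0.45, 0.60, False))  # ['confirmed gap', 'residual gap']
print(classify(0.85, 0.90))         # ['unconfirmed gap', 'gap addressed']
```

Each question lands in two buckets (one for the pre-activity picture, one for the post-activity picture), which is the overlap mentioned above.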

Filed under CME, Gap analysis, Needs Assessment, Outcomes, Reporting, Statistics

Tagged as CME, Effect size, gap analysis, needs assessment, outcome reports, outcomes, statistics

March 28, 2014 · 2:25 pm

Statistical analysis in CME

Statistics can help answer important questions about your CME. For example, was there an educational effect and, if so, how big was it? The first question is typically answered with a P value and the second with an effect size.

If this were 10 years ago, you’d either be purchasing some expensive statistical software or hiring a consultant to answer these questions. Today (thank you Internet), it’s simple and basically free.

A step-by-step approach can be found here.

Filed under CME, CMEpalooza, Cohen's d, Effect size, P value, Statistical tests of significance, Statistics

Tagged as CME, CMEpalooza, Cohen's d, Effect size, outcomes, P value, statistical tests of significance, statistics

March 14, 2014 · 12:22 pm

Data analysis in Excel

Oh, was I excited to find VassarStats. I haven’t yet encountered a CME outcome analysis that it can’t handle – and it’s free. Yes, having to cut & paste data between Excel and VassarStats is a bit cumbersome (and subject to error), but I felt it a small price to pay. And then I found the “Analysis ToolPak” in Excel. Well, actually, I found Jacob Coverstone’s CME/CPD blog, which unlocks this little secret here. We’ve been sitting on the tools all along. Thanks, Jacob, for pointing this out.

Filed under Microsoft Excel, Statistics

Tagged as Microsoft Excel, statistics

July 24, 2013 · 2:34 pm

Choosing the right statistical test

I cheat. Well, I use “cheat” sheets. I’ve had plenty of statistics training, but honestly, my brain just doesn’t want to hold onto the assumptions associated with this or that test. So I create little charts or tables, which I reference often. This frees up brain space to continually re-assess the pros and cons of buying a used Jeep Wrangler versus a Subaru WRX (three years strong and I still haven’t decided).

Anyway…here’s a quick reference guide for choosing a statistical test:

Answer the following questions:

What is my variable type?
Is the comparison group data paired or unpaired? (i.e., can you link data from individual respondents in the two comparison groups or not)
What is the sample size?

Once you’ve answered these questions, use the following to identify which statistical test of significance to use:

Variable type	Paired data	Sample size (in each group)	Test of significance
Categorical	No	>5	Chi-square
Categorical	No	<5	Fisher’s exact test
Categorical	Yes	N/A	McNemar’s Test
Continuous	No	>30	Independent samples t-test
Continuous	No	<30	Mann-Whitney U test
Continuous	Yes	>30	Paired t-test
Continuous	Yes	<30	Wilcoxon signed-rank test
Ordinal	No	N/A	Mann-Whitney U test
Ordinal	Yes	N/A	Wilcoxon signed-rank test
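
The same cheat sheet can live in code as a small lookup. This is a sketch with a hypothetical function name; the table doesn’t say what to do at exactly 5 or 30 per group, so sending those boundary cases to the small-sample test is an assumption.

```python
def choose_test(variable_type, paired, n_per_group=None):
    """Look up a test of significance from the quick reference table.

    variable_type: "categorical", "continuous", or "ordinal"
    paired: True if individual respondents can be linked across groups
    n_per_group: sample size in each group (not needed where the table says N/A)
    """
    kind = variable_type.strip().lower()
    if kind == "ordinal":
        return "Wilcoxon signed-rank test" if paired else "Mann-Whitney U test"
    if kind == "categorical" and paired:
        return "McNemar's test"
    if kind not in ("categorical", "continuous"):
        raise ValueError(f"unknown variable type: {variable_type!r}")
    if n_per_group is None:
        raise ValueError("sample size per group is required for this lookup")
    if kind == "categorical":
        # Boundary case (exactly 5) goes to the small-sample test: an assumption
        return "Chi-square" if n_per_group > 5 else "Fisher's exact test"
    # Continuous data; boundary case (exactly 30) goes to the small-sample test
    if paired:
        return "Paired t-test" if n_per_group > 30 else "Wilcoxon signed-rank test"
    return "Independent samples t-test" if n_per_group > 30 else "Mann-Whitney U test"

print(choose_test("categorical", paired=False, n_per_group=40))  # Chi-square
print(choose_test("continuous", paired=True, n_per_group=12))    # Wilcoxon signed-rank test
```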

Filed under CME, Fisher's exact test, Mann-Whitney U test, McNemar's Test, Statistics, t-test, Wilcoxon signed rank test

Tagged as chi-square test, CME, Fisher's exact test, Mann-Whitney U test, McNemar's test, statistics, wilcoxon signed rank test

August 23, 2012 · 9:00 am

P values – controlling for multiple comparisons

A p-value helps you decide whether to reject a given null hypothesis. In CME, however, we often don’t have a hypothesis. Sure, we expect physicians who participate in CME to have changes in competency, performance or maybe even their patients’ health, but we’re not very good at testing this directly via a single hypothesis (as compared to clinical drug trials). Our typical approach is to give CME participants a survey containing several knowledge/self-efficacy/case questions and then run t- or chi-square tests to see if they answer differently pre vs. post (or even post vs. a control group). This results in a p-value for each question, which means that each question is essentially a hypothesis. If you’re going to have more than one hypothesis in a single study, you need to control for multiple comparisons. This is because each additional hypothesis tested in a single study increases the likelihood that at least one of the differences uncovered is due to chance (as opposed to a true difference between the comparison groups).

For example, if you conduct a single statistical test and use the conventional significance threshold (p < .05), there is only a 5% chance that you’ll reject your null hypothesis (i.e., find that a difference exists between groups) and be incorrect. But if you have a 20-question survey and you’re conducting a statistical test for each question, you now have a 64% chance of making one or more false findings, assuming the tests are independent (the formula from which this was derived can be found here).
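
The 64% figure is easy to check. Assuming the tests are independent and every null hypothesis is true, the chance of at least one false finding across k tests is 1 – (1 – .05)^k. A minimal sketch (the function name is hypothetical):

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of one or more false positives across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

print(round(familywise_error_rate(1), 2))   # 0.05
print(round(familywise_error_rate(20), 2))  # 0.64
```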

Although I’d first recommend not conducting multiple comparisons, there aren’t many viable alternatives for most CME providers and such an approach can have value for hypothesis-generation. That being said, a simple way to address the multiple comparisons issue is via the Simes-Hochberg correction [1,2].

Here are the steps:

1. After you’ve run all your statistical tests, order all p-values from high to low.
2. If the highest p-value is < .05, stop here; all tests are significant.
3. Otherwise, if the second highest p-value is < .025 (which is .05/2), stop here; that test and all following tests are significant.
4. Otherwise, if the third highest p-value is < .017 (which is .05/3), stop here; that test and all following tests are significant.
5. And so on, comparing each p-value with .05 divided by its rank among all multiple comparison p-values.

Here’s an example (note that 5 comparisons were significant at p ≤ .05 prior to the multiple comparison correction, after which none of the comparisons maintained statistical significance):

Comparison	P value	Rank	Adjusted threshold (.05/rank)
Question 1	.43	1	.05
Question 2	.37	2	.025
Question 3	.28	3	.017
Question 4	.18	4	.0125
Question 5	.07	5	.01
Question 6	.05	6	.0083
Question 7	.04	7	.0071
Question 8	.04	8	.0063
Question 9	.03	9	.0056
Question 10	.01	10	.005
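
The steps can also be scripted. This is a minimal sketch with a hypothetical function name; it uses the strict “less than” comparison as the steps are written here (Hochberg’s paper states the rule with “less than or equal to”), and it is run on the ten p-values from the table.

```python
def hochberg(p_values, alpha=0.05):
    """Step-up correction: return True/False per test, in the input order."""
    # Indices of the tests, ordered from highest to lowest p-value
    order = sorted(range(len(p_values)), key=lambda i: p_values[i], reverse=True)
    significant = [False] * len(p_values)
    for rank, idx in enumerate(order, start=1):
        # Compare each p-value with alpha divided by its rank
        if p_values[idx] < alpha / rank:
            # This test and every test with a lower p-value are significant
            for j in order[rank - 1:]:
                significant[j] = True
            break
    return significant

p_values = [.43, .37, .28, .18, .07, .05, .04, .04, .03, .01]

print(sum(p <= .05 for p in p_values))  # 5 significant before correction
print(any(hochberg(p_values)))          # False: none survive the correction
```

For comparison, a set like .20, .02 and .001 would keep two significant results: .20 fails at rank 1, but .02 is below .05/2, so it and .001 both pass.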

References:

1. Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika 1986;73:751–54.

2. Hochberg Y. A sharper Bonferroni procedure for multiple significance testing. Biometrika 1988;75:800–02.

Filed under P value, Statistics

Tagged as CME, multiple comparisons, P value, Simes-Hochberg, statistics

April 20, 2012 · 5:12 pm

Thank you Internet: Best online statistics resource

In search of clarity regarding an obscure statistical question, I recently stumbled onto vassarstats.net. Well, thank you again, Internet (and those quiet heroes that labor after hours in dimly lit cubicles, cinder-block academic offices, or 3rd bedroom/home office/craft rooms to fill you with information for no reason other than to advance the efforts of others). Check out vassarstats.net before you consider buying any statistical software – I bet it can handle just about anything related to CME outcome assessment. And it’s free.

Filed under CME, Statistics, Website

Tagged as CME, statistics
