# Calculating sample size for a needs assessment survey

Here are the steps:

1) Determine the size of your target population. Let’s say you want to survey pediatricians in the United States. A quick Google search (search terms = “how many US general pediatricians”) points to the American Academy of Pediatrics Division of Workforce and Medical Education Policy webpage, which reports a total of 57,200 U.S. general pediatricians (based on data from the 2006 American Medical Association Masterfile).

So, in this example, the target population size = 57,200.

2) Determine how big a sample is needed to represent the target population. Thankfully, there’s an abundance of free sample size calculators online. I typically use this one. Four inputs are needed to calculate sample size: (a) margin of error, (b) confidence level, (c) population size, and (d) response distribution. In practice, the only one you need to look up is population size (which for U.S. pediatricians is 57,200). Just as we all accept P < .05 as the benchmark for statistical significance, the conventional values for margin of error, confidence level and response distribution are 5%, 95% and 50%, respectively. (A 50% response distribution is the most conservative choice: it yields the largest recommended sample size.) Click here for a sample size calculator screen shot using U.S. pediatricians as the target population (the recommended sample size is 382).
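For readers who want to see where the 382 comes from, here is a minimal sketch of the arithmetic. It assumes the calculator uses the standard formula for estimating a proportion with a finite population correction (I have not confirmed what any particular online calculator does internally), and the function name `sample_size` is just illustrative.

```python
import math


def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Recommended sample size for estimating a proportion.

    Assumptions: z=1.96 corresponds to a 95% confidence level,
    margin is the margin of error (5%), and p is the response
    distribution (50%). A finite population correction is applied.
    """
    # Sample size for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink it to account for the finite target population
    n = n0 / (1 + (n0 - 1) / population)
    # Round up, since you cannot survey a fraction of a person
    return math.ceil(n)


print(sample_size(57_200))  # 382 for the U.S. pediatrician example
```

Because the population here is large, the correction barely matters: without it the formula gives about 384.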

3) Before you start surveying, there’s one more important (and often overlooked) step: pulling a random sample from your target population for your survey pool. To do this, you’ll need to estimate your survey response rate. The best way to estimate your survey’s response rate is to see what’s been achieved in other studies. Relevant to our pediatrician example, a quick PubMed search (search terms = email + pediatricians + survey) identified the following:

• McMahon SR, et al. Comparison of e-mail, fax, and postal surveys of pediatricians. Pediatrics 2003;111:e299-303 (abstract).

This study of pediatricians in Georgia reported a 26% response rate to an email survey (after two invitations). So if I’m expecting a 26% response rate (assuming I’m doing a web-based survey of pediatricians) and my recommended sample size is 382, then I will need to randomly select about 1469 U.S. pediatricians from the AMA Masterfile (based on this calculation: 0.26 × x = 382, so x = 382 / 0.26 ≈ 1469). A 26% response rate from roughly 1469 randomly selected U.S. pediatricians will meet my sample size requirement of 382.
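The same calculation can be sketched in a few lines of Python. The function name `invitations_needed` is hypothetical. Note that this sketch rounds up rather than to the nearest whole number, so it returns 1470, one more than the 1469 used in the text (382 / 0.26 is about 1469.2).

```python
import math


def invitations_needed(required_responses, response_rate):
    """Size of the survey pool needed to hit a target number of responses."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be a fraction between 0 and 1")
    # Round up so the expected number of responses is not below the target
    return math.ceil(required_responses / response_rate)


print(invitations_needed(382, 0.26))  # 1470
```

The difference of one invitation is immaterial in practice, since the 26% response rate is itself only an estimate from another study.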

You need to pull a random sample to reduce concerns such as self-selection bias (i.e., respondents’ decision to participate in your survey may be correlated with traits that affect the study, making the participants a non-representative sample). There are a number of ways to pull a random sample, as well as a number of factors that dictate which method to use (click here for the Wikipedia summary). In the following paragraph, I describe a method for pulling a “simple random sample”.

You can identify a random sample using MS Excel. Start with an Excel spreadsheet containing everyone in your target population, one person per row (continuing with our example, that would be 57,200 U.S. pediatricians). Create a new column (call it “random sample”) and type this formula in the first cell: =RAND(). This will provide a random number between 0 and 1. Copy and paste this formula into all remaining cells in that column. You now have a random number in each row. Sort the entire worksheet based on this column. Select the first however many rows you need for your random sample (in our case, the first 1469). This is your survey pool. Of note, after the sort, the random number in each row will recalculate (making it look like the rows were never sorted). Ignore this. The rows were sorted first (ascending or descending) and then the random values recalculated. This column will recalculate every time you run a function in this worksheet. If that bothers you, copy the column and paste it back as values before sorting.
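If you would rather script this step than do it in Excel, here is a minimal Python sketch of the same idea. It is not the Excel method itself: it uses the standard library’s `random.sample`, which draws without replacement so that every pediatrician has the same chance of being picked. The placeholder names in `population` and the function name `draw_survey_pool` are made up for illustration; in practice you would load your real list.

```python
import random


def draw_survey_pool(population, pool_size, seed=None):
    """Draw a simple random sample (without replacement) from population."""
    # A seeded generator makes the draw reproducible, so you can document it
    rng = random.Random(seed)
    return rng.sample(population, pool_size)


# Placeholder stand-in for the 57,200 names in the target population
population = [f"pediatrician_{i}" for i in range(57_200)]

pool = draw_survey_pool(population, 1469, seed=42)
print(len(pool))  # 1469
```

Fixing the seed also sidesteps the recalculation quirk described above, because rerunning the script with the same seed and the same list returns the same pool.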