19.Aug.2015

Kevin Lyons
Research Supervisor

Does Your Sample Size Matter?

3 Tips to Ensure Reliable Research Data

When discussing survey research, one of the first things our clients want to know is what we think the sample size will be and whether that number will be “good” enough. In truth, there is no magic number that makes a sample good or valid. Rather than focusing on the sample size alone, two other important factors provide greater insight into the validity of survey data: margin of error and the representativeness of the sample.

Margin of error
A reliable survey is consistent: each time you conduct it, you get roughly the same information. The margin of error measures this reliability. Any survey draws a sample from the whole population and then generalizes the results back to that population. This invariably introduces the possibility of error, because the whole is unlikely to be described perfectly by any one part. For example, if a survey finds that 36 percent of respondents watch television while eating lunch, that information is incomplete on its own. If the margin of error is, say, 4 percentage points, the 36 percent should be interpreted as somewhere between 32 and 40 percent. Margins of error are especially important to consider when looking for differences between waves of benchmark research or between segments of respondents.

In statistics, the two most fundamental concepts behind sample size and margin of error are:
• The larger the sample size, the smaller the margin of error
• After a point, increasing the sample size provides diminishing returns because the gain in accuracy becomes negligible

In general, the precision of an estimate is related to the square root of the sample size – in other words, to double the precision, the sample size must be quadrupled. As a general rule, sample sizes of 200 to 300 respondents provide an acceptable margin of error and fall before the point of diminishing returns.
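To see the square-root relationship in practice, here is a minimal Python sketch of the standard margin-of-error formula for a proportion at a 95 percent confidence level (the sample sizes are purely illustrative); each quadrupling of the sample roughly halves the margin:

import math

def margin_of_error(n, p=0.5, z=1.96):
    # Half-width of a 95% confidence interval for a proportion p observed in a sample of size n
    return z * math.sqrt(p * (1 - p) / n)

for n in (200, 800, 3200):
    print(n, round(100 * margin_of_error(n), 1))
# 200  -> ~6.9 points
# 800  -> ~3.5 points
# 3200 -> ~1.7 points (quadrupling n halves the margin of error)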

Table A
Approximate margin of error
(at a 95% confidence level)

Survey sample size    Margin of error
2,000                 +/- 2
1,500                 +/- 3
1,000                 +/- 3
900                   +/- 3
800                   +/- 3
700                   +/- 4
600                   +/- 4
500                   +/- 4
400                   +/- 5
300                   +/- 6
200                   +/- 7
100                   +/- 10
50                    +/- 14

Note: This table reflects the convention among survey researchers of reporting sampling errors based on a 50 percent split, where the margin of error is largest.

Table A is a simplified approximation of sample sizes and their resulting margins of error. A more precise figure can be obtained from one of many free online calculators, including the one from Wolfram|Alpha embedded below, which also takes into consideration the standard deviation (how different the responses are) and the sample mean (margins of error are largest around 50 percent, where there is the most disagreement, and smallest around 0 and 100 percent, where there is the most agreement).
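A quick sketch of how the sample mean enters the calculation: holding the sample size fixed at 500 and reusing the same formula, the margin narrows as responses move away from an even split (the proportions below are illustrative):

import math

def margin_of_error(n, p, z=1.96):
    # Half-width of a 95% confidence interval for an observed proportion p
    return z * math.sqrt(p * (1 - p) / n)

n = 500
for p in (0.50, 0.80, 0.95):
    print(p, round(100 * margin_of_error(n, p), 1))
# 0.50 -> ~4.4 points (most disagreement, widest margin)
# 0.80 -> ~3.5 points
# 0.95 -> ~1.9 points (near-unanimous, narrowest margin)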

Representativeness of sample

The second key consideration is the representativeness of the sample. To check this, compare key demographic and behavioral statistics for the entire study population to those of the sample. To find the statistics for the study population, query a constituent database or pull data from a trusted source (e.g., the Census Bureau). Then compare those figures to the survey data to make sure the sample is geographically, demographically, and, if possible, behaviorally representative. You can calculate the total variance of your sample from the population distributions fairly easily in a spreadsheet – a good sample doesn’t vary substantially.
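As a minimal sketch of that spreadsheet check in Python, the snippet below uses the age figures from the donor example in Table B (survey sample versus all donors in the database) and sums the absolute differences between the two distributions:

# Age distributions from Table B: survey sample vs. all donors in the database
sample     = {"Under 30": 0.04, "30-45": 0.27, "46-64": 0.32, "65 and older": 0.37}
population = {"Under 30": 0.05, "30-45": 0.29, "46-64": 0.26, "65 and older": 0.40}

total_deviation = 0.0
for group, pop_share in population.items():
    diff = abs(sample[group] - pop_share)
    print(f"{group}: sample {sample[group]:.0%} vs. population {pop_share:.0%} (off by {diff:.0%})")
    total_deviation += diff

# A small total deviation suggests the sample mirrors the population reasonably well
print(f"Total absolute deviation: {total_deviation:.0%}")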

Table B shows an example of this analysis for a donor survey conducted via email. A common concern of our clients is that an online survey of donors will not reflect the donor population as a whole (including those who do not have or provide their email address). Key demographic (age and gender) and behavioral (donation) characteristics are fairly similar across the survey population, the email-addressable portion of the donor database, and the donor database as a whole.

Table B
Example of a fairly representative survey sample
                         Survey population   Email-addressable donors in database   All donors in database
Gender
  Male                   43%                 44%                                    44%
  Female                 57%                 56%                                    56%
Age
  Under 30               4%                  7%                                     5%
  30-45                  27%                 29%                                    29%
  46-64                  32%                 33%                                    26%
  65 and older           37%                 31%                                    40%
Last Donation
  Within past year       26%                 25%                                    26%
  1 to 3 years ago       34%                 33%                                    31%
  More than 3 years ago  40%                 42%                                    43%

If the survey sample is found not to be representative, two good options exist to correct the situation. The first is weighting the data. For example, if everything else appears to be in line but females are over-represented and males are under-represented as a percentage of the total sample, one could simply place more weight on each male response and less weight on each female response. The second option is collecting more data, but only within the under-represented population. Both approaches provide sample data that better mirror the population.
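As a minimal sketch of the weighting approach, the snippet below uses the gender splits from Table B; the respondent-level scores are made up purely for illustration:

# Population (full database) vs. achieved sample shares by gender, from Table B
population_share = {"Male": 0.44, "Female": 0.56}
sample_share     = {"Male": 0.43, "Female": 0.57}

# Weight = population share / sample share, so under-represented groups count more
weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Hypothetical respondent-level data: (gender, satisfaction rating on a 1-10 scale)
responses = [("Male", 7), ("Female", 9), ("Female", 6), ("Male", 8)]

weighted_sum = sum(weights[g] * score for g, score in responses)
total_weight = sum(weights[g] for g, _ in responses)
print(f"Weighted mean satisfaction: {weighted_sum / total_weight:.2f}")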

A final consideration related to representativeness is individuals who are inherent outliers, drastically different from the rest of the sample (and population). For research among constituents, this may be an individual with an atypical relationship with the organization (e.g., a board member or staff member). An outlier may provide wildly different responses to survey questions, skewing the results. An example would be asking donors how frequently they visit the organization’s webpage. A donor who is also a staff member may visit the page daily, inflating the average when it is reported as a summary statistic. It is very important that survey invitation lists be carefully constructed so that outliers are deliberately included or excluded.
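A quick, hypothetical illustration of that effect: most donors in the made-up data below visit the website once or twice a month, but a single staff member who visits roughly daily pulls the average up noticeably.

# Hypothetical monthly website visits reported by surveyed donors
donor_visits = [1, 2, 0, 3, 1, 2, 1, 0, 2, 1]
staff_member_visits = 30  # a donor who is also a staff member, visiting roughly daily

mean_without = sum(donor_visits) / len(donor_visits)
with_outlier = donor_visits + [staff_member_visits]
mean_with = sum(with_outlier) / len(with_outlier)

print(f"Mean without the outlier: {mean_without:.1f} visits/month")  # 1.3
print(f"Mean with the outlier:    {mean_with:.1f} visits/month")     # 3.9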

Tips for collecting a good sample

We are also sometimes asked for tips on collecting a good, robust sample. First, keep the survey open long enough to allow adequate participation. This not only boosts the total number of responses and the overall participation rate, it also helps avoid sampling bias. For example, conducting a survey for a few days around school vacation week could lead to under-sampling of parents of school-age children. Keeping the survey open for a few weeks with periodic reminder emails ensures that all types of potential respondents have an opportunity to participate.

You can also improve your sample by partnering with a third party for some types of projects. Surveys about fundraising and attitudes toward giving are one example. In these cases, partnering with a third party, and ensuring that raw, respondent-level data will not be shared, helps ease concerns that participating in the survey will result in a solicitation. Another benefit of partnering is gaining the expertise of survey specialists: if a survey is poorly designed, potential respondents will drop out and not complete it, opening the potential for bias.

The final tip is to pay attention to best practices in email marketing if the survey invitation is being emailed. Make the invitation (and survey tool) look as professional as possible. Using your organization’s branding helps, as does having the request come “from” an actual person within the organization, including a physical address and all relevant contact information. Be sure to provide a mechanism for individuals to opt out, and honor those requests. Also, stagger your reminders to non-respondents and send them frequently enough to catch people at a convenient time, but do not ask more than two or three times in total.

By following these best practices, and by understanding the principles of survey statistics, you can feel confident that your survey data are valid and that your decisions are guided by solid information.