Estimating population attributes from a sample of the population under study is the basis of many study designs and of the development of statistical methods. The methods used to analyse sample survey data vary according to the data collection method, the type of questions asked in the survey and the objectives of the survey. This article deals with cross-cutting issues in the analysis of survey data; it is meant to be simple and to offer guidance on what a complete survey analysis should entail. It is by no means a benchmark, but rather a set of recommendations.

The process involved in survey data analysis should include the following steps:

  1. Data validation and exploratory analysis
  2. Confirmatory analysis
  3. Interpretation
  4. Documentation/storage for future analysis

Data validation and exploratory analysis

Data validation checks that the survey questionnaires were completed correctly and that the data are consistent and logical. Recommended data checks include, but are not limited to:

  • Out-of-range entries: These are often the result of poor questionnaire design or faulty data entry. For instance, a respondent’s age is captured in a non-categorical question as 200 years, which is not plausible.
  • Logically inconsistent data: This arises when the responses to two or more variables/questions do not make logical sense when taken together. Branch logic built into the questionnaire design can help avoid this kind of inconsistency, though not entirely.
  • Coding: This involves checking that all categorical responses are coded, so that, for instance, no response has a value to which a meaningful pre-assigned label cannot be attached. If some open-ended questions in the survey are to be categorized, human expertise, possibly with the aid of qualitative analysis tools, can be employed to group the responses.
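
The three checks above can be sketched in a few lines of Python. This is only an illustration: the records, the field names, the plausible age range and the employment codebook are all made up for the example, and a real survey would use its own codebook and rules.

```python
# Hypothetical survey records; field names and codes are illustrative only.
records = [
    {"id": 1, "age": 34, "sex": "M", "pregnant": "no", "employment": 1},
    {"id": 2, "age": 200, "sex": "F", "pregnant": "no", "employment": 2},
    {"id": 3, "age": 41, "sex": "M", "pregnant": "yes", "employment": 3},
    {"id": 4, "age": 27, "sex": "F", "pregnant": "yes", "employment": 9},
]

# Assumed plausible age range and codebook for this example.
AGE_RANGE = (0, 120)
EMPLOYMENT_LABELS = {1: "employed", 2: "unemployed", 3: "student"}


def validate(record):
    """Return a list of the problems found in one record."""
    problems = []
    low, high = AGE_RANGE
    # Out-of-range check on a numeric answer.
    if not low <= record["age"] <= high:
        problems.append("age out of range")
    # Logical consistency check across two questions.
    if record["sex"] == "M" and record["pregnant"] == "yes":
        problems.append("inconsistent: male and pregnant")
    # Coding check: every code must have a pre-assigned label.
    if record["employment"] not in EMPLOYMENT_LABELS:
        problems.append("employment code has no label")
    return problems


# Keep only the records that have at least one problem.
flagged = {}
for record in records:
    problems = validate(record)
    if problems:
        flagged[record["id"]] = problems

for record_id, problems in flagged.items():
    print(record_id, problems)
```

Flagged records would normally be checked against the original questionnaires rather than corrected automatically.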

Once the above checks are done, the exploratory tables and graphs can be generated. The data cleaning/validation process actually continues during exploratory analysis, since the summaries can bring out other problems, if any, in the data. In the exploratory survey analysis you should be looking for:

  • Odd or extreme observations that might be errors to correct
  • The major patterns which answer the questions or objectives. For example, is the proportion of men higher among those suffering from a particular condition than among those who are not?
  • Indications that the results will be clearer if the variables are modified, for example by recoding or transforming them.
  • Patterns which might suggest new questions that will be more revealing than those originally posed. This is important for hypothesis generation.
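
As a small illustration of the second point, the proportion of men among respondents with and without a condition can be tabulated as below. The respondents are invented for the example, and the comparison is purely descriptive at this stage.

```python
from collections import Counter

# Hypothetical respondents as (sex, has_condition) pairs.
respondents = [
    ("M", True), ("M", True), ("M", True), ("F", True),
    ("M", False), ("F", False), ("F", False), ("F", False),
    ("M", False), ("F", True),
]

# Cell counts of the 2x2 cross-tabulation of sex by condition.
counts = Counter(respondents)


def proportion_men(has_condition):
    """Proportion of men among respondents in one condition group."""
    men = counts[("M", has_condition)]
    women = counts[("F", has_condition)]
    return men / (men + women)


p_with = proportion_men(True)
p_without = proportion_men(False)
print(f"men among those with the condition:    {p_with:.2f}")
print(f"men among those without the condition: {p_without:.2f}")
```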

Where simple random sampling is not applied, statistical adjustments such as weighting are necessary in preparation for the definitive analyses. The researcher can nevertheless usually start exploring the data before these adjustments are made. The common statistical adjustments are:

  • Weighting: This is adjusting the data so that a particular respondent or case is given more or less importance than other respondents or cases. This makes the data more representative of the source population. Typical practice is to assign observations/cases weights inversely proportional to their probability of selection into the sample, so that cases that were less likely to be selected count for more.
  • Variable re-specification: This creates new variables from the original ones by redefining their meaning or by re-categorizing variables that are already categorical. For example, the categories used to answer a question can be collapsed into fewer categories, as when a ten-category scale is reduced to two categories.
  • Scale transformation: This can be used when a survey employs scales of differing lengths and types, with the goal of making them comparable or compatible.
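
A minimal sketch of the three adjustments follows. It assumes that each case's selection probability is known and takes the weight as its inverse (the usual design weight); the sample values, the cut-off used to collapse the ten-point scale and the helper names `collapse` and `rescale` are all hypothetical.

```python
# Hypothetical sample: each case has a score (1-10) and a selection probability.
sample = [
    {"score": 8, "p_select": 0.10},
    {"score": 3, "p_select": 0.50},
    {"score": 6, "p_select": 0.25},
]

# Weighting: design weight = 1 / probability of selection (assumed known).
for case in sample:
    case["weight"] = 1.0 / case["p_select"]

total_weight = sum(case["weight"] for case in sample)
weighted_mean = sum(case["weight"] * case["score"] for case in sample) / total_weight
unweighted_mean = sum(case["score"] for case in sample) / len(sample)


def collapse(score):
    """Variable re-specification: collapse a ten-point scale into two categories."""
    return "low" if score <= 5 else "high"


def rescale(value, low, high):
    """Scale transformation: map a value on a low..high scale onto 0..1."""
    return (value - low) / (high - low)


print(weighted_mean, unweighted_mean)
print([collapse(case["score"]) for case in sample])
# The same raw answer of 4 sits at different points of a 5-point and a 7-point scale.
print(rescale(4, 1, 5), rescale(4, 1, 7))
```

Rescaling to a common range makes answers comparable only in a rough sense; whether a 5-point and a 7-point scale measure the same thing is a substantive question the code cannot answer.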


Confirmatory data analysis

The exploratory analysis can be used to describe what is going on, but this will only be tentative. You need to confirm that the patterns are real. Because of sampling error, you need estimates of uncertainty, such as standard errors or confidence intervals, for key estimates. At this point formal statistical analysis is required.
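
As a simple illustration of an uncertainty estimate, the sketch below computes the standard error and an approximate 95% confidence interval for a proportion. It assumes simple random sampling and a sample large enough for the normal approximation; a complex design (weights, clusters, strata) would need a design-based variance estimate instead. The function name and the counts are made up for the example.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Estimate a proportion with its standard error and a normal-approximation CI.

    Assumes simple random sampling; z=1.96 gives an approximate 95% interval.
    """
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)


# Hypothetical result: 120 of 400 respondents answered "yes".
p, se, (lower, upper) = proportion_ci(120, 400)
print(f"estimate={p:.3f}  se={se:.4f}  95% CI=({lower:.3f}, {upper:.3f})")
```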

The statistical analysis procedure to adopt should depend on:

  • the design of the survey study
  • the type of response variable
  • the type of the explanatory variables

Standard data analysis for a sample survey includes computing proportions for the variables of interest, together with their standard errors. Continuous response/dependent variables can be analysed using simple or multiple linear regression models. Non-linear regression can sometimes be employed if the relationship between the variables is not well characterized by a straight line.
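
For a continuous response and a single explanatory variable, simple linear regression can be fitted by ordinary least squares, as sketched below. The data and variable names are invented, and the sketch ignores survey weights and design effects, which a real analysis would have to account for.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and sum of cross-products of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: years of schooling and monthly income (arbitrary units).
years_schooling = [8, 10, 12, 14, 16]
income = [20, 24, 31, 35, 40]

intercept, slope = fit_line(years_schooling, income)
print(f"income = {intercept:.2f} + {slope:.2f} * years_schooling")
```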

Relationships between ordinal variables can be assessed using statistical methods such as Spearman’s rank correlation and Kendall’s tau. Working with nominal data usually involves reporting the percentage of responses per category; the Chi-square test or Fisher’s exact test is commonly used to assess the relationship between two nominal variables. If the response variable is binary and its relationship with one or more explanatory variables is of interest, logistic regression can be applied; if the response is ordinal, ordinal logistic regression is suggested. For surveys where lower-level observations are clustered within higher-level units, multilevel modeling can be an appropriate approach to analysis.
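
To make the test for two nominal variables concrete, here is a sketch of Pearson's Chi-square test for a 2x2 table, written without external libraries. The counts are hypothetical, no continuity correction is applied, and the p-value formula used holds only for one degree of freedom; with small expected counts Fisher's exact test would be the safer choice.

```python
import math


def chi_square_2x2(table):
    """Pearson Chi-square statistic and p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows are men/women, columns are condition yes/no.
table = [[30, 20], [20, 30]]
statistic, p_value = chi_square_2x2(table)
print(f"chi-square={statistic:.2f}  p={p_value:.4f}")
```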

The use of cross-tabulations can be very effective in presentations if they center on the key findings or objectives of the sample survey; cross-tabulations are usually an essential component of summary and comparative analysis.


Interpretation of results

When you are done with the analysis, it’s time to really think about what the results mean in the context of the problem at hand. Below are points to keep in mind during your data interpretation:

  • Clearly state the substantive new knowledge that has been generated.
  • Discuss whether this new information and understanding adds more examples of something previously known, means that general rules or principles can be stated with more confidence, or means that current understanding or theory has to be substantially modified.
  • Use the quantitative information from the sample survey to make quantitative predictions about the target population.
  • Explain how your results help towards the objective of the survey and what the next steps, if any, will be.


Documentation and storage for future analysis

Documentation cannot be overemphasized! Someone else might need to independently replicate your results; you might also find yourself referring back to what you did a while ago, and if it is not well documented you may struggle to recall it. We have previously discussed the importance of documentation as part of data management in a post. Please take some time to read through it.

