Data analysis and presentation
Scope and purpose
Data analysis is the process of developing answers to questions through the examination and interpretation of data. The basic steps in the analytic process consist of identifying issues, determining the availability of suitable data, deciding on which methods are appropriate for answering the questions of interest, applying the methods and evaluating, summarizing and communicating the results.
Analytical results underscore the usefulness of data sources by shedding light on relevant issues. Some Statistics Canada programs depend on analytical output as a major data product because, for confidentiality reasons, it is not possible to release the microdata to the public. Data analysis also plays a key role in data quality assessment by pointing to data quality problems in a given survey. Analysis can thus influence future improvements to the survey process.
Data analysis is essential for understanding results from surveys, administrative sources and pilot studies; for providing information on data gaps; for designing and redesigning surveys; for planning new statistical activities; and for formulating quality objectives.
Results of data analysis are often published or summarized in official Statistics Canada releases.
A statistical agency is concerned with the relevance and usefulness to users of the information contained in its data. Analysis is the principal tool for obtaining information from the data.
Data from a survey can be used for descriptive or analytic studies. Descriptive studies are directed at the estimation of summary measures of a target population, for example, the average profits of owner-operated businesses in 2005 or the proportion of 2007 high school graduates who went on to higher education in the next twelve months. Analytical studies may be used to explain the behaviour of and relationships among characteristics; for example, a study of risk factors for obesity in children would be analytic.
To be effective, the analyst needs to understand the relevant issues, both current and those likely to emerge in the future, and how to present the results to the audience. The study of background information allows the analyst to choose suitable data sources and appropriate statistical methods. Any conclusions presented in an analysis, including those that can impact public policy, must be supported by the data being analyzed.
Prior to conducting an analytical study, the following questions should be addressed:
Objectives. What are the objectives of this analysis? What issue am I addressing? What question(s) will I answer?
Justification. Why is this issue interesting? How will these answers contribute to existing knowledge? How is this study relevant?
Data. What data am I using? Why is it the best source for this analysis? Are there any limitations?
Analytical methods. What statistical techniques are appropriate? Will they satisfy the objectives?
Audience. Who is interested in this issue and why?
Ensure that the data are appropriate for the analysis to be carried out. This requires investigation of a wide range of details, such as whether the target population of the data source is sufficiently related to the target population of the analysis; whether the source variables and their concepts and definitions are relevant to the study; whether the longitudinal or cross-sectional nature of the data source is appropriate for the analysis; whether the sample size in the study domain is sufficient to obtain meaningful results; and whether the quality of the data, as outlined in the survey documentation or assessed through analysis, is sufficient.
If more than one data source is being used for the analysis, investigate whether the sources are consistent and how they may be appropriately integrated into the analysis.
Appropriate methods and tools
Choose an analytical approach that is appropriate for the question being investigated and the data to be analyzed.
When analyzing data from a probability sample, analytical methods that ignore the survey design can be appropriate, provided that sufficient model conditions for analysis are met. (See Binder and Roberts, 2003.) However, methods that incorporate the sample design information will generally be effective even when some aspects of the model are incorrectly specified.
Assess whether the survey design information can be incorporated into the analysis and, if so, how this should be done, for example by using design-based methods. See Binder and Roberts (2009) and Thompson (1997) for discussion of approaches to inferences on data from a probability sample.
See Chambers and Skinner (2003), Korn and Graubard (1999), Lehtonen and Pahkinen (2004), Lohr (1999), and Skinner, Holt and Smith (1989) for a number of examples illustrating design-based analytical methods.
For a design-based analysis, consult the survey documentation about the recommended approach for variance estimation for the survey. If the data from more than one survey are included in the same analysis, determine whether or not the different samples were independently selected and how this would impact the appropriate approach to variance estimation.
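As an illustration of what a documented variance approach can look like, the following is a minimal sketch of variance estimation with replicate weights. It assumes the survey file supplies a main weight plus a set of replicate (for example, bootstrap) weights, and that the variance is the average squared deviation of the replicate estimates from the full-sample estimate. Both are assumptions made for the example: the function names and data are hypothetical, and the survey documentation may prescribe a different multiplier or method.

```python
from math import sqrt


def weighted_mean(values, weights):
    """Survey-weighted mean: sum(w * y) / sum(w)."""
    total_weight = sum(weights)
    return sum(w * y for w, y in zip(weights, values)) / total_weight


def replicate_variance(values, main_weights, replicate_weights):
    """Variance of the weighted mean from a set of replicate weights.

    Assumes the simple form (1/B) * sum((theta_b - theta)^2); a given
    survey may prescribe a different multiplier.
    """
    theta = weighted_mean(values, main_weights)
    replicates = [weighted_mean(values, w) for w in replicate_weights]
    return sum((t - theta) ** 2 for t in replicates) / len(replicates)


# Hypothetical data: four respondents and three sets of replicate weights.
y = [10.0, 20.0, 30.0, 40.0]
w = [1.0, 1.0, 1.0, 1.0]
reps = [
    [2.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 2.0, 0.0],
    [0.0, 2.0, 1.0, 1.0],
]

estimate = weighted_mean(y, w)
std_error = sqrt(replicate_variance(y, w, reps))
print(estimate, std_error)
```

In practice the number of replicates is far larger than three, and the same replicate weights are reused for every estimate produced from the file.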
The data files for probability surveys frequently contain more than one weight variable, particularly if the survey is longitudinal or if it has both cross-sectional and longitudinal purposes. Consult the survey documentation and survey experts if it is not obvious which weight is best suited to a particular design-based analysis.
When analyzing data from a probability survey, there may be insufficient design information available to carry out analyses using a full design-based approach. Assess the alternatives.
Consult with experts on the subject matter, on the data source and on the statistical methods if any of these is unfamiliar to you.
Having determined the appropriate analytical method for the data, investigate the software choices that are available to apply the method. If analyzing data from a probability sample by design-based methods, use software designed specifically for survey data, since standard analytical software packages that can produce weighted point estimates do not correctly calculate variances for survey-weighted estimates.
It is advisable to use commercial software, if suitable, for implementing the chosen analyses, since these software packages have usually undergone more testing than non-commercial software.
Determine whether it is necessary to reformat your data in order to use the selected software.
Include a variety of diagnostics among your analytical methods if you are fitting any models to your data.
Data sources vary widely with respect to missing data. At one extreme, there are data sources which seem complete, where any missing units have been accounted for through a weight variable with a nonresponse component and all missing items on responding units have been filled in by imputed values. At the other extreme, there are data sources where no processing has been done with respect to missing data. The work required by the analyst to handle missing data can thus vary widely. It should be noted that the handling of missing data in analysis is an ongoing topic of research.
Refer to the documentation about the data source to determine the degree and types of missing data and the processing of missing data that has been performed. This information will be a starting point for what further work may be required.
Consider how unit and/or item nonresponse could be handled in the analysis, taking into consideration the degree and types of missing data in the data sources being used.
Consider whether imputed values should be included in the analysis and if so, how they should be handled. If imputed values are not used, consideration must be given to what other methods may be used to properly account for the effect of nonresponse in the analysis.
If the analysis includes modelling, it could be appropriate to include some aspects of nonresponse in the analytical model.
Report any caveats about how the approaches used to handle missing data could affect the results.
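One common way to account for unit nonresponse, when the data source has not already done so, is to inflate respondent weights within adjustment classes. The sketch below illustrates the idea only. The record keys ('cls', 'weight', 'responded') and the sample data are hypothetical, and the choice of adjustment classes is assumed to have been made elsewhere.

```python
from collections import defaultdict


def adjust_for_nonresponse(units):
    """Return respondents with weights adjusted within each class.

    Each unit is a dict with hypothetical keys 'cls' (adjustment class),
    'weight' (design weight) and 'responded' (bool). Within each class,
    respondent weights are inflated so they sum to the class total.
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for u in units:
        total[u["cls"]] += u["weight"]
        if u["responded"]:
            responding[u["cls"]] += u["weight"]

    adjusted = []
    for u in units:
        if not u["responded"]:
            continue  # nonrespondents are dropped after adjustment
        factor = total[u["cls"]] / responding[u["cls"]]
        adjusted.append({**u, "weight": u["weight"] * factor})
    return adjusted


sample = [
    {"cls": "A", "weight": 10.0, "responded": True},
    {"cls": "A", "weight": 10.0, "responded": False},
    {"cls": "B", "weight": 5.0, "responded": True},
    {"cls": "B", "weight": 5.0, "responded": True},
]
respondents = adjust_for_nonresponse(sample)
print([r["weight"] for r in respondents])
```

The adjusted respondent weights sum to the original total design weight. This removes nonresponse bias only to the extent that respondents and nonrespondents are similar within each class, which is the kind of caveat that should be reported.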
Interpretation of results
Since most analyses are based on observational studies rather than on the results of a controlled experiment, avoid drawing conclusions concerning causality.
When studying changes over time, beware of focusing on short-term trends without inspecting them in light of medium- and long-term trends. Frequently, short-term trends are merely minor fluctuations around a more important medium- and/or long-term trend.
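A simple way to view short-term movements against a longer-term trend is to smooth the series. The following is a minimal sketch using a centred moving average. The function name, the window length and the monthly figures are hypothetical, and other smoothing or seasonal adjustment methods may be more appropriate for a given series.

```python
def centred_moving_average(series, window):
    """Centred moving average for an odd window; ends are left as None."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = [None] * len(series)
    for i in range(half, len(series) - half):
        smoothed[i] = sum(series[i - half : i + half + 1]) / window
    return smoothed


# Hypothetical monthly values that zigzag around a slow upward trend.
monthly = [100, 104, 99, 105, 101, 107, 103]
trend = centred_moving_average(monthly, 3)
print(trend)
```

The raw series alternates between increases and decreases from month to month, while the smoothed values change much less, which makes the underlying direction easier to judge.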
Where possible, avoid arbitrary time reference points. Instead, use meaningful points of reference, such as the last major turning point for economic data, generation-to-generation differences for demographic statistics, and legislative changes for social statistics.
Presentation of results
Focus the article on the important variables and topics. Trying to be too comprehensive will often interfere with a strong story line.
Arrange ideas in a logical order and in order of relevance or importance. Use headings, subheadings and sidebars to strengthen the organization of the article.
Keep the language as simple as the subject permits. Depending on the targeted audience for the article, some loss of precision may sometimes be an acceptable trade-off for more readable text.
Use graphs in addition to text and tables to communicate the message. Use headings that capture the meaning (e.g. "Women's earnings still trail men's") in preference to traditional chart titles (e.g. "Income by age and sex"). Always help readers understand the information in the tables and charts by discussing it in the text.
When tables are used, take care that the overall format contributes to the clarity of the data in the tables and prevents misinterpretation. This includes spacing; the wording, placement and appearance of titles; row and column headings and other labeling.
Explain rounding practices or procedures. In the presentation of rounded data, do not use more significant digits than are consistent with the accuracy of the data.
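Rounding to a fixed number of significant digits can be sketched as follows. The helper name is hypothetical, and the number of digits to keep is a judgment based on the accuracy of the estimate, not something the code decides.

```python
from math import floor, log10


def round_sig(value, digits):
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0.0
    magnitude = floor(log10(abs(value)))
    return round(value, digits - 1 - magnitude)


print(round_sig(1234567.0, 3))
print(round_sig(0.045678, 2))
```

For example, an estimated total of 1,234,567 kept to three significant digits would be presented as 1,230,000.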
Satisfy any confidentiality requirements (e.g. minimum cell sizes) imposed by the surveys or administrative sources whose data are being analysed.
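A minimum cell size rule can be applied mechanically before a table is released. The sketch below is illustrative only: the threshold of 5, the "x" marker and the table contents are hypothetical, and the actual rules are those set by the survey or administrative source.

```python
def suppress_small_cells(table, min_count, marker="x"):
    """Replace counts below min_count with a suppression marker."""
    return {
        cell: (count if count >= min_count else marker)
        for cell, count in table.items()
    }


# Hypothetical unweighted respondent counts by region.
counts = {"Region A": 120, "Region B": 3, "Region C": 47}
released = suppress_small_cells(counts, min_count=5)
print(released)
```

This covers only the primary suppression step. If row or column totals are also published, additional cells may need to be suppressed so that the hidden values cannot be derived by subtraction.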
Include information about the data sources used and any shortcomings in the data that may have affected the analysis. Either have a section in the paper about the data or a reference to where the reader can get the details.
Include information about the analytical methods and tools used. Either have a section on methods or a reference to where the reader can get the details.
Include information regarding the quality of the results. Standard errors, confidence intervals and/or coefficients of variation provide the reader important information about data quality. The choice of indicator may vary depending on where the article is published.
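The three indicators are closely related, as the following minimal sketch shows. It assumes a normal approximation for the confidence interval (z = 1.96 for a 95% interval). The function name and the figures are hypothetical.

```python
def quality_indicators(estimate, std_error, z=1.96):
    """Return the CV (as a percentage) and a normal-approximation CI.

    The estimate must be non-zero for the CV to be defined.
    """
    cv_percent = 100.0 * std_error / abs(estimate)
    half_width = z * std_error
    return cv_percent, (estimate - half_width, estimate + half_width)


cv, (lower, upper) = quality_indicators(estimate=50.0, std_error=2.0)
print(cv, lower, upper)
```

The coefficient of variation expresses the standard error relative to the size of the estimate, which makes it convenient for comparing the quality of estimates of different magnitudes. The confidence interval conveys the same information on the scale of the estimate itself.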
Ensure that all references are accurate, consistent and are referenced in the text.
Check for errors in the article. Check details such as the consistency of figures used in the text, tables and charts, the accuracy of external data, and simple arithmetic.
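Some of these checks can be automated. The following sketch tests whether the components of a table add up to the published total, within a tolerance that allows for rounding. The function name, the figures and the tolerance are hypothetical.

```python
def components_match_total(components, total, tolerance=0.0):
    """True if the components sum to the total within the tolerance."""
    return abs(sum(components) - total) <= tolerance


# Hypothetical rounded percentage shares and their published total.
shares = [33.3, 33.3, 33.4]
ok = components_match_total(shares, 100.0, tolerance=0.05)

# A table with a transcription error: the shares sum to 95, not 100.
bad = components_match_total([40.0, 35.0, 20.0], 100.0, tolerance=0.05)
print(ok, bad)
```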
Ensure that the intentions stated in the introduction are fulfilled by the rest of the article. Make sure that the conclusions are consistent with the evidence.
Have the article reviewed by others for relevance, accuracy and comprehensibility, regardless of where it is to be disseminated. As a good practice, ask someone from the data-providing division to review how the data were used. If the article is to be disseminated outside of Statistics Canada, it must undergo institutional and peer review as specified in the Policy on the Review of Information Products (Statistics Canada, 2003).
If the article is to be disseminated in a Statistics Canada publication, make sure that it complies with the current Statistics Canada Publishing Standards. These standards affect graphs, tables and style, among other things.
As a good practice, consider presenting the results to peers prior to finalizing the text. This is another kind of peer review that can help improve the article. Always do a dry run of presentations involving external audiences.
Refer to available documents that could provide further guidance for improvement of your article, such as Guidelines on Writing Analytical Articles (Statistics Canada, 2008) and the Style Guide (Statistics Canada, 2004).
Main quality elements: relevance, interpretability, accuracy, accessibility
An analytical product is relevant if there is an audience who is (or will be) interested in the results of the study.
For the interpretability of an analytical article to be high, the style of writing must suit the intended audience. As well, sufficient details must be provided so that another person, if allowed access to the data, could replicate the results.
For an analytical product to be accurate, appropriate methods and tools need to be used to produce the results.
For an analytical product to be accessible, it must be available to people for whom the research results would be useful.
Binder, D.A. and G.R. Roberts. 2003. "Design-based methods for estimating model parameters." In Analysis of Survey Data. R.L. Chambers and C.J. Skinner (eds.) Chichester. Wiley. p. 29-48.
Binder, D.A. and G.R. Roberts. 2009. "Design and Model Based Inference for Model Parameters." In Handbook of Statistics 29B: Sample Surveys: Inference and Analysis. D. Pfeffermann and C.R. Rao (eds.) Vol. 29B. Chapter 24. Amsterdam. Elsevier. 666 p.
Chambers, R.L. and C.J. Skinner (eds.) 2003. Analysis of Survey Data. Chichester. Wiley. 398 p.
Korn, E.L. and B.I. Graubard. 1999. Analysis of Health Surveys. New York. Wiley. 408 p.
Lehtonen, R. and E.J. Pahkinen. 2004. Practical Methods for Design and Analysis of Complex Surveys. Second edition. Chichester. Wiley.
Lohr, S.L. 1999. Sampling: Design and Analysis. Duxbury Press. 512 p.
Skinner, C.J., D. Holt and T.M.F. Smith. 1989. Analysis of Complex Surveys. Chichester. Wiley. 328 p.
Thompson, M.E. 1997. Theory of Sample Surveys. London. Chapman and Hall. 312 p.
Statistics Canada. 2003. "Policy on the Review of Information Products." Statistics Canada Policy Manual. Section 2.5. Last updated March 4, 2009.
Statistics Canada. 2004. Style Guide. Last updated October 6, 2004.
Statistics Canada. 2008. Guidelines on Writing Analytical Articles. Last updated September 16, 2008.
I. Organization and Approach
For most research paper formats in the social and behavioral sciences, there are two possible ways of presenting and organizing the results. Both approaches are appropriate in how you report your findings, but use only one format or the other.
- Present a synopsis of the results followed by an explanation of key findings. For example, you may have noticed an unusual correlation between two variables during the analysis of your findings. It is correct to point this out in the results section. However, speculating as to why this correlation exists, and offering a hypothesis about what may be happening, belongs in the discussion section of your paper.
- Present a result and then explain it, before presenting the next result then explaining it, and so on, then end with an overall synopsis. This is more common in longer papers because it helps the reader to better understand each finding. This is also the preferred approach if you have multiple results of equal significance. In this model, it is helpful to provide a brief conclusion that ties each of the findings together and provides a narrative bridge to the discussion section of your paper.
NOTE: Just as the literature review should be arranged under conceptual categories rather than systematically describing each source, organize your findings under key themes related to addressing the research problem. This can be done under either format noted above [i.e., a synopsis of the results followed by an explanation of key findings, or a sequential description and explanation of each key finding].
II. Content
In general, the content of your results section should include the following:
- An introductory context for understanding the results by restating the research problem underpinning your study. This is useful in orienting the reader's focus back to the research after reading about the methods of data gathering and analysis.
- Inclusion of non-textual elements, such as figures, charts, photos, maps, tables, etc., to further illustrate key findings, if appropriate. Rather than relying entirely on descriptive text, consider the ways your findings can be presented visually. This is a helpful way of condensing a lot of data into one place that can then be referred to in the text. Consider using appendices if there are many non-textual elements.
- A systematic description of your results, highlighting for the reader observations that are most relevant to the topic under investigation [remember that not all results that emerge from the methodology used to gather information may be related to answering the "So What?" question]. Do not confuse observations with interpretations; observations in this context refer to highlighting important findings you discovered through a process of reviewing prior literature and gathering data.
- The page length of your results section is guided by the amount and types of data to be reported. However, focus only on findings that are important and related to addressing the research problem. It is not uncommon to have unanticipated results that are not relevant to answering the research question, and this is not to say that you don't acknowledge tangential findings, but spending time describing them only clutters your overall results section.
- A short paragraph that concludes the results section by synthesizing the key findings of the study. Highlight the most important findings you want readers to remember as they transition into the discussion section. This is particularly important if, for example, there are many results to report, the findings are complicated or unanticipated, or they are impactful or actionable in some way [i.e., able to be acted upon in a feasible way applied to practice].
NOTE: Use the past tense when referring to your results. Reference to findings should always be described as having already happened because the method of gathering data has been completed.
III. Problems to Avoid
When writing the results section, avoid doing the following:
- Discussing or interpreting your results. Save all this for the next section of your paper, although where appropriate, you should compare or contrast specific results to those found in other studies [e.g., "Similar to Smith, one of the findings of this study is the strong correlation between motivation and academic achievement...."].
- Reporting background information or attempting to explain your findings. This should have been done in your Introduction section, but don't panic! Often the results of a study point to the need for additional background information or to explain the topic further, so don't think you did something wrong. Revise your introduction as needed.
- Ignoring negative results. If some of your results fail to support your hypothesis, do not ignore them. Document them, then state in your discussion section why you believe a negative result emerged from your study. Note that negative results, and how you handle them, offer you the opportunity to write a more engaging discussion section, so don't be afraid to highlight them.
- Including raw data or intermediate calculations. Ask your professor if you need to include any raw data generated by your study, such as transcripts from interviews or data files. If raw data is to be included, place it in an appendix or set of appendices that are referred to in the text.
- Using vague or non-specific phrases. Be as factual and concise as possible in reporting your findings, and avoid phrases such as "appeared to be greater or lesser than..." or "demonstrates promising trends that...."
- Presenting the same data or repeating the same information more than once. If it is important to highlight a particular finding, you will have an opportunity to emphasize its significance in the discussion section.
- Confusing figures with tables. Be sure to properly label any non-textual elements in your paper. Don't call a chart an illustration or a figure a table.