Data Preparation - Biostatistics
A clean, suitably structured, and well-documented data set is critical for efficient and accurate statistical analysis. Most commonly, data are imported into statistical analysis programs as a comma-delimited text file. For easy and accurate importation into statistical software, the data must adhere to a regular structure with consistent entries.
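As a rough illustration of what a "regular structure" means for a comma-delimited file, the Python sketch below flags rows whose number of fields differs from the header row. The function name `find_irregular_rows` and the sample data are hypothetical and used only for illustration; this is one possible check, not a complete validation procedure.

```python
import csv
import io


def find_irregular_rows(csv_text):
    """Return (line_number, field_count) for each row whose number of
    fields differs from the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    problems = []
    for row in reader:
        if len(row) != len(header):
            # reader.line_num is the number of source lines read so far
            problems.append((reader.line_num, len(row)))
    return problems


# Hypothetical sample: the row for participant 2 is missing the "sex" field.
sample = "id,age,sex\n1,34,F\n2,41\n3,29,M\n"
irregular = find_irregular_rows(sample)
print(irregular)
```

A check like this can be run before handing a file to a statistician, so that structural problems are caught while the original records are still at hand.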
While it is not required, using REDCap (Research Electronic Data Capture) can greatly simplify data collection and minimize costly and time-consuming data clean-up activities. REDCap is a secure web-based application for building and managing online databases for research and is supported by the CTSC Biomedical Informatics team.
Regardless of the software used to record data, adhering to the following guidelines will facilitate importation of the data into statistical software. In addition, every data set must include a data dictionary (also called a codebook) that describes each variable and identifies its acceptable values. An example of a codebook can be found here. Additional information on data dictionaries is available on the UC Davis REDCap website.
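To make the idea of a data dictionary concrete, the following Python sketch stores a description and the acceptable values for each variable, then checks one record against it. The variable names, ranges, and the `validate_record` helper are all assumptions made for illustration; they are not taken from REDCap or from the example codebook referenced above.

```python
# Hypothetical data dictionary: variable name -> description and acceptable values.
DATA_DICTIONARY = {
    "sex": {"description": "Sex at enrollment", "allowed": {"F", "M"}},
    "age": {"description": "Age in years", "min": 0, "max": 120},
}


def validate_record(record, dictionary):
    """Return a list of messages describing entries that violate the dictionary."""
    errors = []
    for name, rule in dictionary.items():
        if name not in record:
            errors.append(f"{name}: missing")
            continue
        value = record[name]
        if "allowed" in rule:
            # Categorical variable: the entry must be one of the listed codes.
            if value not in rule["allowed"]:
                errors.append(f"{name}: unexpected value {value!r}")
        else:
            # Numeric variable: the entry must parse as a number within range.
            try:
                number = float(value)
            except (TypeError, ValueError):
                errors.append(f"{name}: not numeric ({value!r})")
                continue
            if not rule["min"] <= number <= rule["max"]:
                errors.append(f"{name}: {value!r} outside {rule['min']}-{rule['max']}")
    return errors


good = validate_record({"sex": "F", "age": "34"}, DATA_DICTIONARY)
bad = validate_record({"sex": "female", "age": "340"}, DATA_DICTIONARY)
print(good)
print(bad)
```

In this sketch the inconsistent entry "female" is reported because the dictionary lists only the codes "F" and "M", which is the kind of inconsistency a data dictionary is meant to prevent.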
Additional tips for data management are available in the PDF document, “Guidance for Database Developers for Efficient Import to Statistical Software.”
Recommendations for organizing spreadsheet data to reduce errors and facilitate statistical analyses are available in the PDF documents “Data Organization in Spreadsheets” and “Biostatistics Center Guidelines for Excel and Access.”