Search Close Search
Search Close Search
Page Menu

Guidelines for constructing data sets for analysis

Advice on constructing data sets in Excel

  •      Use only one row for column headings; do not use entire questions for column headings; use short names
  • Put only one item of information in each column (e.g., 120.2 not 120.2 mg/dl (put mg/dl in a second column if needed)
  • If the column is numeric, do not use NA or UNK for Not Available or Unknown (either leave blank or use a numeric unknown code – like 9999 or -1)
  • Make sure that cells to record numbers have a numeric format in Excel
  • While you can use colors or different fonts to highlight numbers, those do not translate into any of the statistical packages. If you need to highlight something, create another column for an “*”
  • While you can include summary statistics in your spreadsheet, please keep the data area as purely data – we will recalculate the statistics
  • Please include only related items in a row. If you have multiple dates for multiple measurements, it is better to create either separate rows for each date or separate spreadsheets for the different measurements
  • Always include ID information on each row for each patient/participant
    • Never include PHI on your spreadsheet
    • Assign a “study ID” to each patient and use only that on the file
    • If there are multiple rows of data for each patient or multiple spreadsheets for different measures, make sure that each row has the patient Study ID, date, and any other information that is needed to uniquely identify that information
  • Always create a code book of information for your data set
    • Complete name for each column [e.g., TC = Total Cholesterol (mg/dl)]
    • Meaning of each code used in each column (e.g., 1 = Low, 2 = Normal, 3 = High, 9 = Unknown/not available)
  • Please give your spreadsheet a descriptive name and include your name in the Properties of the spreadsheet (e.g., Cholesterol_Data_Jones_05012011.xlsx, not Data.xlsx 

Advice on constructing data sets in SPSS, SAS, Stata

  • Please make sure that variable labels and format labels are in place
  • As an alternative to Excel, please consider something like SPSS as a convenient way to record data and the associated labels and formats. Conversion from SPSS to SAS is very easy with very little extra work. Conversion from Excel to SAS is very time-consuming and can require additional programming.  The very best solution is REDCap, but additional training is involved.

Click here to return to the QMC main menu