PHASE 1 – January 26, 2018
Examples of questions where multiple linear regression is not applicable:
Examples of research questions might be:
PHASE 2 – February 9, 2018
2b. Identify the source or sources of your data. When using cross-sectional data ALL data must be pulled from the same time period but can be pulled from different sources. Identify the source and time period for each data element.
2c. Data sets must have a minimum of 50 observations and must not exceed 100 observations.
2d. Include a definition of each data element, data source, and period for which data element is captured.
For example: BMI = Body Mass Index, measured as body mass divided by the square of the individual’s height. Data source: Health and Human Services, www.hhs.gov, fiscal year 2010, state level data.
Example #2: Parental income = combined annual household income. Data source: www.bls.gov, state level data for 2012.
IMPORTANT: Your dependent variable cannot be binary, categorical/ranking, or strictly a discrete variable. Your dependent variable must be a continuous variable. Your independent variables can be all continuous or a mix of discrete and continuous. Your independent variables CANNOT be strictly discrete variables.
Submission of Phases I – II: Must be submitted in WORD format. The Word document should be attached to your email. Do not send Phase I and II as part of the body of an email. As part of the Phase II submission include Phase I. Definition of variables (see Phase II ‘2b’) should include the time period in which the data is captured and an explicit definition of how the variable is measured or how it will be transformed for inclusion in Phase IV. For example:
INCORRECT: Weight – ‘how much a person weighs.’
CORRECT: Weight – ‘weight as measured in pounds’, data source Center for Disease Control, www.cdc.gov, individual level data, 2010.
INCORRECT: Unemployment rate – ‘the unemployment rate’
CORRECT: Unemployment rate – ‘the number of people unemployed per 1000’, data source Bureau of Labor Statistics, www.bls.gov, state level data, 2011.
Transformed Variables (see Phase 3 for example): Categorical variables such as gender, race, color, manufacturing sector, team, or geographic region need to be transformed into quantitative variables. For example, if gender is captured as M or F, it will need to be transformed into a 0,1 variable.
Example: M = 0 and F = 1, or M = 1 and F = 0. The definition should read: if M then M = 0, and if F then F = 1.
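The M/F recoding described above can be sketched in a few lines of Python. This is only an illustration: the sample values in `raw_gender` and the helper name `encode_gender` are hypothetical, and the coding M = 0, F = 1 is just one of the two acceptable choices.

```python
# Hypothetical raw values as they might appear in a data column.
raw_gender = ["M", "F", "F", "M"]

# Coding rule stated in the variable definition: M -> 0, F -> 1.
GENDER_CODES = {"M": 0, "F": 1}


def encode_gender(values):
    """Map each M/F label to its 0/1 code; reject anything unexpected."""
    encoded = []
    for value in values:
        if value not in GENDER_CODES:
            raise ValueError(f"unexpected gender label: {value!r}")
        encoded.append(GENDER_CODES[value])
    return encoded


gender_dummy = encode_gender(raw_gender)
print(gender_dummy)
```

Raising an error on an unexpected label, rather than silently assigning a code, makes it easier to catch blank or misspelled entries before the regression is run.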
Example: Let’s assume your data contains 4 geographic regions: North, South, East, and West. For Phase II you will need to define the states that comprise the North geographic region, and similarly for East, West, and South. In Phase III, you will need to transform these variables into columns of 0,1 dummy variables. See examples below.
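As a rough sketch of the region transformation, the Python below turns a state label into four 0,1 columns, one per region. The state-to-region assignments in `STATE_TO_REGION` are hypothetical placeholders, not the definitions required for Phase II, and `region_dummies` is an illustrative helper name.

```python
REGIONS = ["North", "South", "East", "West"]

# Hypothetical Phase II definition: which states make up each region.
STATE_TO_REGION = {
    "MN": "North",
    "TX": "South",
    "GA": "South",
    "NY": "East",
    "CA": "West",
}


def region_dummies(states):
    """Return one dict per observation with a 0/1 entry for every region."""
    rows = []
    for state in states:
        region = STATE_TO_REGION[state]  # KeyError flags an undefined state
        rows.append({name: int(name == region) for name in REGIONS})
    return rows


rows = region_dummies(["TX", "CA", "MN"])
for row in rows:
    print(row)
```

Each observation has exactly one column equal to 1. Note that when a regression includes an intercept, one of the four columns is normally left out as the baseline category, since the four columns always sum to 1.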