Fall 2017 Data Project

Introduction and data description:  Mindset Matters

In 2007, a Harvard psychologist recruited 75 female maids working in different hotels to participate in a study.  She informed 41 maids (randomly chosen) that the work they do satisfies the Surgeon General’s recommendations for an active lifestyle (which is true), giving them examples showing that their work is good exercise.  The other 34 maids were told nothing.  Various characteristics such as weight, body mass index (BMI), blood pressure, etc. were recorded for each subject at the start of the experiment and again four weeks later.

Note that the dataset is available in multiple formats: a .txt file, a .csv file, a .mtw file, and a .mpjx file.  This is meant to give you flexibility.  If you are most comfortable using Minitab Express, use the .mpjx file.  If you don’t want to use Minitab Express, you can use other statistical software (the .mtw file opens in Minitab 17 or 18).  You could even use Excel or StatKey: work from the .csv or .txt file (uploading it, in the case of StatKey) and experiment to make graphical and numerical summaries.

This is a very interesting dataset, and there are many questions we could ask.  The questions below are all explicitly matched to the textbook section that covers the corresponding topic/concept/procedure.  The textbook section is included in parentheses for each question.

Consider first how the data was gathered. 

(1.3) Is this an observational study, a randomized comparative experiment or a matched-pairs experiment?

(1.1) Listed below are a few of the variables from the dataset.  Classify each of them as nominal, ordinal, discrete, or continuous.  Go ahead and look at the dataset to see what kind of observations are entered for each variable.

  • Treatment (informed active, uninformed)
  • Age
  • Weight_Before
  • Weight_Change
  • Body_Condition (underweight, normal, overweight, obese)
  • Waist_To_Hip_Ratio_Before

A complete list of variable descriptions is included at the bottom of this document for your reference.

(1.2) Will the results of this study apply to male maids? Why or why not?

 

(1.2) Do you think this is a simple random sample?  Use this as an opportunity to review the definition of a simple random sample.

 

 

Next we do some exploration and inference for all women included in the study, without considering whether or not they were informed that they met the requirements for an active lifestyle.

(2.1) What type of graphical summary could we use for the variable Body_Condition?  Which are the most and least common body conditions when considering all women in our sample?

 

 

(2.2) What is the shape of the distribution of ‘Weight_Before’, the weights from before the study began? Is it symmetric, left-skewed, or right-skewed?

 

(2.2) What is the mean BMI measured before the study began? What is the standard deviation of these BMI measurements?

 

 

(2.3) If a female maid has a BMI of 26, how many standard deviations above or below the mean is her BMI?  (hint – this is a z-score!).
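
As a quick illustration of the z-score calculation, here is a minimal Python sketch. The mean and standard deviation in it are made-up placeholder values, not results from the maids dataset; substitute the summary statistics you computed for BMI_Before.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical summary statistics, for illustration only (NOT from the dataset).
example_mean = 24.0
example_sd = 4.0

z = z_score(26, example_mean, example_sd)
print(f"z = {z:.2f}")  # prints "z = 0.50"
```

A positive z-score means the value is above the mean; a negative one means it is below.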

 

 

(2.3) In what interval do we expect 95% of BMI measurements to be for female maids? (hint – use the 95% rule!).
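
The 95% rule can be sketched in a few lines of Python. The rule assumes a roughly symmetric, bell-shaped distribution, and the numbers below are placeholders rather than values from the dataset.

```python
def interval_95(mean, sd):
    """Approximate range holding 95% of values: mean - 2*sd to mean + 2*sd."""
    return (mean - 2 * sd, mean + 2 * sd)

# Hypothetical summary statistics, for illustration only (NOT from the dataset).
low, high = interval_95(24.0, 4.0)
print(f"About 95% of values fall between {low} and {high}")
```

Replace the two placeholder arguments with the mean and standard deviation you found for BMI_Before.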

 

 

(2.5, 2.6) Which of the variables listed below is the best predictor of BMI_Before, as measured by the highest R-squared?

  • Age
  • Weight_Before
  • Fat_Percent_Before
  • Waist_To_Hip_Ratio_Before
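
If you prefer to script this comparison instead of running four separate regressions, one possible approach is sketched below in Python. For a simple linear regression with one predictor, R-squared equals the squared correlation, which is what the sketch computes. The function name, the predictor names, and all data values are hypothetical; in practice you would load the real columns from the .csv file.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical values, for illustration only (NOT from the dataset).
response = [22.0, 25.5, 31.0, 27.2, 24.1]
candidates = {
    "predictor_a": [130.0, 150.0, 185.0, 160.0, 142.0],
    "predictor_b": [45.0, 31.0, 52.0, 28.0, 39.0],
}

# Compute R-squared for each candidate and pick the largest.
scores = {name: r_squared(values, response) for name, values in candidates.items()}
best = max(scores, key=scores.get)
print(scores)
print("Highest R-squared:", best)
```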

(9.1) Provide an interpretation in context of the R-squared for BMI_Before and the variable you selected above.

 

(6.2) Find and interpret a 95% confidence interval for the population mean weight of maids (use the Weight_Before variable).
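
For reference, the t-based interval has the form sample mean ± t* × s / √n. The Python sketch below shows that arithmetic on a tiny made-up sample; the weights are hypothetical, and the multiplier 2.776 is the standard t-table value for 95% confidence with 4 degrees of freedom, which matches only this toy sample of five. For the real data you would look up (or let software find) the multiplier for your own sample size.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(sample, t_star):
    """Confidence interval for a population mean: x-bar +/- t* * s / sqrt(n)."""
    n = len(sample)
    center = mean(sample)
    margin = t_star * stdev(sample) / sqrt(n)  # stdev() is the sample standard deviation
    return (center - margin, center + margin)

# Hypothetical weights in pounds, for illustration only (NOT from the dataset).
sample = [120.0, 135.0, 150.0, 142.0, 128.0]

# t* for 95% confidence with df = n - 1 = 4, taken from a t-table.
low, high = t_interval(sample, t_star=2.776)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```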

Let’s compare the two groups (informed active and uninformed) at baseline (measurements taken at the beginning of the four-week period).

How many women were in each treatment group?

 

(2.2—2.4) Make boxplots comparing the two groups’ weight at the beginning of the study and compare them.  For example, compare their medians, IQR, minimum, maximum, and number of outliers.

 

 

(6.4) Is there a significant difference in weights before the study when comparing the treatment groups?  State the null and alternative hypotheses, find the p-value, and include both a generic conclusion and a conclusion in context.
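
To see what software computes behind the scenes for a difference in means, here is a minimal Python sketch of the two-sample t statistic with unpooled standard errors. The degrees of freedom use the Welch approximation, which is a common software default; some textbooks instead use the smaller of n1 – 1 and n2 – 1. The function name and both data lists are hypothetical, and the p-value itself is left to your statistical software.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group1, group2):
    """Return (t statistic, approximate degrees of freedom) for mean1 - mean2."""
    n1, n2 = len(group1), len(group2)
    v1 = variance(group1) / n1  # squared standard error contribution, group 1
    v2 = variance(group2) / n2  # squared standard error contribution, group 2
    t = (mean(group1) - mean(group2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical baseline weights in pounds (NOT from the dataset).
group_a = [140.0, 152.0, 138.0, 160.0, 149.0]
group_b = [145.0, 150.0, 155.0, 139.0, 151.0]

t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```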

 

 

 

(6.4)  Is there a significant difference in BMI before the study when comparing the treatment groups?  State the null and alternative hypotheses, find the p-value, and include both a generic conclusion and a conclusion in context.

 

 

 

(7.2) Would we be able to perform a chi-square test to determine if the treatment is significantly associated with the maids’ body conditions at the beginning of the study? If yes, state the null and alternative hypotheses, find the p-value, and include both a generic conclusion and a conclusion in context. If no, explain why not.
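
Checking the chi-square condition means computing the expected count for every cell of the two-way table (row total × column total / grand total) and confirming each is at least 5. The Python sketch below shows that check on a made-up table; the counts, the table shape, and the function name are all hypothetical and do not come from the maids dataset.

```python
def expected_counts(table):
    """Expected cell counts under no association: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

# Hypothetical observed counts: 2 groups (rows) by 3 categories (columns).
observed = [[10, 20, 10],
            [5, 25, 30]]

expected = expected_counts(observed)
condition_met = all(cell >= 5 for row in expected for cell in row)
print(expected)
print("All expected counts at least 5:", condition_met)
```

Note that the condition is about expected counts, not observed counts: the observed table above contains a cell with exactly 5, but what matters is the expected table.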

 

 

 

 

 

 

 

 

Finally, we will investigate the differences between the treatments.

(6.4) Do maids who are informed that their job satisfies active lifestyle requirements lose significantly more weight than maids who are uninformed?  Note that, because Weight_Change is calculated using after – before, losing more weight means a smaller (more negative) change.  The question is therefore equivalent to asking whether the mean weight change for informed maids is less than that for uninformed maids.

Set up the hypotheses, calculate the p-value (use any method you’d like, but software will probably be fastest!), interpret your p-value, and report your conclusions both generic and in context.
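
Since any method is allowed, one option besides the t-test is a randomization test, which StatKey also offers. The Python sketch below illustrates that alternative, not the distribution-based procedure of section 6.4: it shuffles the group labels many times and counts how often the shuffled difference in means is at least as negative as the observed one. The function name, the number of repetitions, the seed, and both data lists are hypothetical.

```python
import random
from statistics import mean

def randomization_p_value(informed, uninformed, reps=5000, seed=1):
    """One-sided p-value for Ha: mean(informed) - mean(uninformed) < 0."""
    rng = random.Random(seed)
    observed = mean(informed) - mean(uninformed)
    pooled = list(informed) + list(uninformed)
    n1 = len(informed)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = mean(pooled[:n1]) - mean(pooled[n1:])
        if diff <= observed:  # at least as extreme in the "less than" direction
            count += 1
    return count / reps

# Hypothetical weight changes in pounds, after minus before (NOT from the dataset).
informed_change = [-3.0, -1.5, -2.0, 0.5, -4.0, -1.0]
uninformed_change = [0.5, -0.5, 1.0, 0.0, -1.0, 1.5]

p_value = randomization_p_value(informed_change, uninformed_change)
print(f"one-sided p-value: {p_value:.4f}")
```

The direction of the comparison (`diff <= observed`) encodes the one-sided alternative, so it must match the order in which you subtract the group means.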

 

 

 

(6.4) Do maids who are informed that their job satisfies active lifestyle requirements lower their BMI further than maids who are uninformed?  Again note that this is equivalent to asking whether the mean BMI change for informed maids is less than that for uninformed maids.

Again, perform the complete hypothesis test, including a p-value interpretation and both generic and in context conclusions.

 

 

 

 

 

(6.5) If we consider only the women who were uninformed and look at the difference between their after and before body fat percentage measurements, what would our parameter of interest be?

 

(3.4) Use StatKey to create a 95% confidence interval for this mean difference using the percentile method.  To do this, copy the first 34 rows of the Fat_Percent_Change column in Minitab, and paste them into the ‘edit data’ window in StatKey.  You should be using the ‘create a bootstrap confidence interval for one mean’ page.
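
To make the percentile method concrete, here is a minimal Python sketch of the same idea StatKey carries out: resample the data with replacement many times, record each resample's mean, and take the middle 95% of those means. The function name, repetition count, seed, and data values are hypothetical, and StatKey's exact percentile rule may differ slightly from the simple indexing used here.

```python
import random
from statistics import mean

def bootstrap_percentile_ci(sample, reps=5000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for a single mean."""
    rng = random.Random(seed)
    n = len(sample)
    # Each bootstrap sample has the same size as the original, drawn with replacement.
    boot_means = sorted(mean(rng.choices(sample, k=n)) for _ in range(reps))
    tail = (1 - level) / 2
    low = boot_means[int(tail * reps)]
    high = boot_means[int((1 - tail) * reps) - 1]
    return low, high

# Hypothetical body fat percentage changes (NOT from the dataset).
changes = [-0.8, 0.3, -1.2, 0.5, -0.4, 0.9, -0.1, -0.6]

ci_low, ci_high = bootstrap_percentile_ci(changes)
print(f"95% bootstrap CI: ({ci_low:.3f}, {ci_high:.3f})")
```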

Can we claim at 95% confidence that women who are uninformed lower their body fat percentage over a four week period?

Are the conditions for using distribution-based inference met?  If they are, what would the correct t-multiplier be to build the 95% confidence interval?

 

 

 

(3.4) Now use StatKey to create a 95% confidence interval for the mean difference of body fat percentage just for the women who were informed that they led an active lifestyle.  Again, move the data into StatKey by copying and pasting rows 35 through 75 into the edit data window.  Be sure to correctly select whether or not your data includes a header row.

Can we claim at 95% confidence that women who are informed that they lead an active lifestyle lower their body fat percentage over a four week period?

 

Make sure you look at the StatKey output and know what each dot represents in the original sample, in the bootstrap sample, and in the bootstrap distribution.

 

Summary of variables:

Variable Name Variable Description
Treatment Treatment group: ‘uninformed’ for participants who were told nothing, ‘informed active’ for participants who were told the work they do satisfies the Surgeon General’s recommendations for an active lifestyle.
Age Age in years
Body_Condition  ‘underweight’, ‘normal’, ‘overweight’, ‘obese’
Weight_Before Weight (in pounds) measured at the beginning of the study
Weight_After Weight (in pounds) measured at the end of the study
Weight_Change Weight change (in pounds), calculated using after – before
BMI_Before Body Mass Index (BMI) measured at the beginning of the study
BMI_After Body Mass Index (BMI) measured at the end of the study
BMI_Change Change in Body Mass Index (BMI), calculated using after – before
Fat_Percent_Before Body fat percentage measured at the beginning of the study
Fat_Percent_After Body fat percentage measured at the end of the study
Fat_Percent_Change Change in body fat percentage, calculated using after – before
Waist_To_Hip_Ratio_Before Waist circumference divided by hip circumference as measured at the beginning of the study
Waist_To_Hip_Ratio_After Waist circumference divided by hip circumference as measured at the end of the study
Waist_To_Hip_Ratio_Change Change in Waist to Hip Ratio, calculated using after – before
Systolic_BP_Before Systolic Blood Pressure, measured at the beginning of the study
Systolic_BP_After Systolic Blood Pressure, measured at the end of the study
Systolic_BP_Change Change in systolic blood pressure, calculated using after – before
Diastolic_BP_Before Diastolic Blood Pressure, measured at the beginning of the study
Diastolic_BP_After Diastolic Blood Pressure, measured at the end of the study
Diastolic_BP_Change Change in diastolic blood pressure, calculated using after – before

 
