Statistics
Statistics assignment. Some of the questions require using Stata. See the attachments for the assignment.
1. Consider three education variables whose survey questions and response categories are listed below. What types/forms of variables are these? What do you see as
some challenges to describing the distribution of education (in the target population) using each of these variables? Are there other ways you might gather education
data that might be more informative?
“What is the highest degree you have completed?”
1=high school diploma
2=GED
3=associate’s degree
4=bachelor’s degree
5=master’s degree
6=PhD
7=JD
8=MD
9=other (specify)
“How many years of school have you completed?”
0 to 25 years
“How far did you get in school?”
9=9th grade
10=10th grade
11=11th grade
12=12th grade
13=high school diploma or GED
14=some college
15=associate’s degree
16=bachelor’s degree
17=master’s or professional degree
18=doctoral degree
Q1-A: What type of variables are these? 3 pts
Q1-B: What do you see as some challenges to describing the distribution of education (in the target population) using each of these variables? 4 pts
Q1-C: Are there other ways you might gather education data that might be more informative? 3 pts
2. What education variables are available in the South African CSG Impact Evaluation dataset?
Q2-A: Identify three of these variables and produce appropriate descriptive statistics for each one. 6 pts
Q2-B: What is limiting about these CSG education measures? 3 pts
3. Using the South African CSG Impact Evaluation dataset, produce a histogram, box plot and summary statistics for the variable (ad3q20a2) that shows the total
amount of expenditures (in Rands) on school fees by households. Describe the patterns you see in these data. How many households are missing information on the amount
of school fees for this variable (hint: use the codebook command and browse the data to assess the cases where no fee amount is recorded)? What do you learn? Does no
response imply no school fees were paid?
Q3-A: Produce a histogram, box plot and summary statistics for the variable (ad3q20a2) that shows the total amount of expenditures (in Rands) on school fees by
households. 3 pts
Q3-B: Describe the patterns you see in these data. 3 pts
Q3-C: How many households are missing information on the amount of school fees for this variable (hint: use the codebook command and browse the data to assess the
cases where no fee amount is recorded)? 4 pts
Q3-D: What do you learn? Does no response imply no school fees were paid? 4 pts
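The distinction Q3-C and Q3-D turn on (a fee amount that was never recorded versus a recorded amount of zero) can be illustrated with a small sketch. This is only an illustration in Python on made-up values, not Stata output and not CSG data; the list `fees` and the use of `None` for an unrecorded amount are assumptions made for the example.

```python
# Hypothetical fee amounts in Rands; None stands for "no amount recorded".
# These values are invented for illustration and are NOT from the CSG dataset.
fees = [0, 150, None, 300, None, 0, 1200]

n_total = len(fees)
n_missing = sum(1 for f in fees if f is None)          # nothing recorded
n_zero = sum(1 for f in fees if f == 0)                # recorded as zero
n_positive = sum(1 for f in fees if f is not None and f > 0)

# The three groups are mutually exclusive and cover every household.
print(f"total={n_total} missing={n_missing} zero={n_zero} positive={n_positive}")
```

Tabulating the three groups separately keeps "missing" from being silently read as "zero"; whether the two mean the same thing in the survey is exactly what the question asks you to assess.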
4. Examine the information in Table 6 of the CSG Impact Evaluation Fieldwork Report.
Q4-A: Based on the information reported, where did the fieldwork team have the greatest success in securing completed surveys (in terms of sampling unit and
geography)? Present statistics to explain your answer. 4 pts
Q4-B: Where did they have the lowest success rates? Present statistics to explain your answer. 3 pts
5. Look at Table 15 in the CSG Impact Evaluation Fieldwork Report (p. 80).
Q5-A: What is the total number of questionnaires generated in the fieldwork? 3 pts
Q5-B: What fraction of all questionnaires obtained in the fieldwork were invalid? 3 pts
Q5-C: What paypoint/province/household type combination generated the largest number of invalid questionnaires? 3 pts
Q5-D: For that paypoint/province/household type combination with the largest number of invalid questionnaires, what fraction of total attempted questionnaires did the
invalid questionnaires represent? (Show your math work in your response). 3 pts
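The "show your math work" step in Q5-D is a single division: invalid questionnaires over total attempted questionnaires for the chosen combination. A minimal Python sketch of that arithmetic follows. The function name `invalid_share` and the counts 12 and 80 are hypothetical placeholders, not figures from Table 15.

```python
from fractions import Fraction


def invalid_share(invalid, attempted):
    """Return invalid / attempted as an exact fraction."""
    if attempted <= 0:
        raise ValueError("attempted must be a positive count")
    return Fraction(invalid, attempted)


# Hypothetical counts for illustration only; substitute the Table 15 values.
share = invalid_share(12, 80)
print(f"{share} = {float(share):.1%}")
```

Reporting both the reduced fraction and the percentage makes the calculation easy to check against the table.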
6. Open the Los Angeles public school student data (“LA students”) in Stata. Assume that you have the population of students eligible for free extra academic
assistance (as mandated under No Child Left Behind) in the 2012-13 school year. Calculate basic descriptive statistics (including the mean and standard deviation) and
produce a histogram for 2012-13 math test scores in this dataset. Characterize the distribution of test scores (in words) for the LA school district administrator,
keeping in mind what an education leader might want to understand from this distribution.
Q6-A: Calculate basic descriptive statistics (including the mean and standard deviation) and produce a histogram for 2012-13 math test scores in this dataset. 4 pts
Q6-B: Characterize the distribution of test scores (in words) for the LA school district administrator, keeping in mind what an education leader might want to
understand from this distribution. 3 pts
7. Draw a random sample of 5,000 students from the LA students dataset and compute and record the mean and standard deviation of 2012-13 math test scores in your
sample. Then “clear” the data and repeat these steps (i.e., sampling 5,000 students and computing and recording the sample mean and standard deviation of their 2012-
13 math test scores). Also be sure to track the number of observations for each set of statistics recorded. Continue to repeat this step until you have 10 sample
means and standard deviations. 10 pts
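The draw, compute, record, repeat loop described in #7 can be sketched as follows. This is a Python illustration of the procedure, not the Stata workflow the assignment requires (in Stata the draw itself would typically use the `sample` command with its `count` option). The population here is simulated as a stand-in for the LA students data, and the helper name `draw_sample_stats` is hypothetical.

```python
import random
import statistics


def draw_sample_stats(scores, sample_size, n_draws, seed=None):
    """Draw n_draws simple random samples (without replacement) and
    record the sample size, mean, and standard deviation of each."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_draws):
        sample = rng.sample(scores, sample_size)
        results.append({
            "n": len(sample),                   # track observations per draw
            "mean": statistics.mean(sample),
            "sd": statistics.stdev(sample),     # sample SD (n - 1 denominator)
        })
    return results


# Simulated stand-in population; NOT the LA students data.
pop_rng = random.Random(0)
population = [pop_rng.gauss(0, 1) for _ in range(20000)]

records = draw_sample_stats(population, sample_size=5000, n_draws=10, seed=1)
for r in records:
    print(r["n"], round(r["mean"], 3), round(r["sd"], 3))
```

Each draw starts again from the full population, which mirrors the instruction to "clear" the data and reload before sampling again.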
8. Follow the same steps in #7, but instead draw 10 samples of 500 students each, computing the same statistics and tracking the number of observations. Repeat
these steps once more to obtain 10 samples of 50 students each.
Q8-A: Follow the same steps in #7, but instead draw 10 samples of 500 students each, computing the same statistics and tracking the number of observations. 5 pts
Q8-B: Repeat these steps once more to obtain 10 samples of 50 students each. 5 pts
9. Now treat each set of sample statistics for a given sample size (5,000, 500 and 50) as a dataset (n=10) and compute the mean and standard deviation for each.
How do these means compare with the mean you calculated in #6 for all students with 2012-13 math test scores in this dataset? How do the three sets of sample standard
deviations compare to each other and to the standard deviation that you calculated in #6 for all students?
Q9-A: Now treat each set of sample statistics for a given sample size (5,000, 500 and 50) as a dataset (n=10) and compute the mean and standard deviation for each.
10 pts
Q9-B: How do these means compare with the mean you calculated in #6 for all students with 2012-13 math test scores in this dataset? 3 pts.
Q9-C: How do the three sets of sample standard deviations compare to each other and to the standard deviation that you calculated in #6 for all students? 3 pts.
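One way to organize the bookkeeping for #9 is to treat the 10 recorded means (and the 10 recorded standard deviations) for each sample size as a small dataset and summarize each. The Python sketch below does this on a simulated population, so its numbers say nothing about the LA data; the dictionary `summary` and its keys are hypothetical names chosen for the example.

```python
import random
import statistics

rng = random.Random(42)
# Simulated stand-in population; NOT the LA students data.
population = [rng.gauss(0, 1) for _ in range(20000)]

summary = {}
for size in (5000, 500, 50):
    means, sds = [], []
    for _ in range(10):                      # 10 samples per sample size
        sample = rng.sample(population, size)
        means.append(statistics.mean(sample))
        sds.append(statistics.stdev(sample))
    # Treat the 10 recorded statistics as a dataset with n = 10.
    summary[size] = {
        "mean_of_means": statistics.mean(means),
        "sd_of_means": statistics.stdev(means),
        "mean_of_sds": statistics.mean(sds),
        "sd_of_sds": statistics.stdev(sds),
    }

for size, row in summary.items():
    print(size, {k: round(v, 4) for k, v in row.items()})
```

Laying the three rows side by side, next to the mean and standard deviation for all students from #6, gives the comparison that Q9-B and Q9-C ask you to describe.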
10. If you had been asked to draw sample sizes of 50,000 students each (10 different times), how would you expect the mean and standard deviation of those
resulting statistics to compare to those you already calculated above? Explain your answer. 5 pts.