Statistics

 

 M7 Assignment and Discussion

We will obtain a new data set for this module that we will continue to use for the remainder of the term. All external links below will open in a new window.

  • Visit the United States Environmental Protection Agency’s download daily data page at http://www.epa.gov/airdata/ad_data_daily.html.
  • Select the Pollutant PM2.5.
  • Select the Year 2015.
  • Select a City or County near where you live.
  • For the Monitor Site category, select only one site if multiple monitoring sites exist in your geographic area.
  • For the category Exceptional Events, select to include the exceptional events data.
  • Click on Get Data.
  • In a few moments a link will appear at the bottom of that same page.  Click on that link to download the data to your own computer.  Save the data file in an Excel format (.xls or .xlsx).  You may have to use the “Save As” feature in Excel to do this.

Time to check out your data!

Take a look at the names of the columns.  We will be analyzing fine particulate pollution, referred to as PM2.5.  The PM2.5 concentrations are given in the Daily Mean PM2.5 Concentration column, and the air quality index for that pollutant is given in the Daily_AQI_Value column.  Please take a few minutes right now to learn more about the Air Quality Index (AQI), which is calculated for five major air pollutants regulated by the Clean Air Act: AQI Brochure.

Look in your data set for the AQS_Parameter_Code column and scroll down this column.  There are two different codes you might see here, namely 88101 and 88502.  These denote the reference method used for measuring mass concentrations of PM2.5.  Code 88101 denotes a single filter 24-hour balanced model PQ200 PM2.5 sampler with WINS, while Code 88502 denotes an R&P model 2025 PM2.5 sequential air sampler with VSCC.  Each is considered an acceptable method for collecting the PM2.5 particulate measurements.  If your data set has both measurement types, delete all rows for one of the two codes (either one) so that only a single measurement type remains.
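If you prefer to check this step outside Excel, the following is a minimal Python sketch of the same idea. It assumes the rows have already been read into a list of dictionaries keyed by column name. The function name, the sample rows, and the choice to keep the more frequent code are illustrative assumptions; the assignment allows you to keep either code.

```python
from collections import Counter

def keep_one_parameter_code(rows, code_column="AQS_Parameter_Code"):
    """Keep only the rows that share the most frequent parameter code.

    Keeping the more frequent code is an assumption made for this sketch;
    the assignment permits keeping either code.
    """
    counts = Counter(row[code_column] for row in rows)
    if not counts:
        return []
    code_to_keep, _ = counts.most_common(1)[0]
    return [row for row in rows if row[code_column] == code_to_keep]

# Hypothetical rows for illustration only (not real EPA measurements).
sample_rows = [
    {"AQS_Parameter_Code": "88101", "Daily_AQI_Value": 42},
    {"AQS_Parameter_Code": "88502", "Daily_AQI_Value": 55},
    {"AQS_Parameter_Code": "88101", "Daily_AQI_Value": 61},
]

filtered = keep_one_parameter_code(sample_rows)
print(len(filtered), "rows kept")
```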

Please edit the name of the sheet with the data on it, giving it the name “PM2.5 Data”.  Create a second sheet and give that sheet/tab the name “Mod 7” (as we will be working in this same file for the remainder of the term).  For the activities of Module 7 please work in the Mod 7 tab you have created.

Use Excel’s COUNTIF function to find the number of days with a PM2.5 AQI value over 50.  On such days, the air quality conditions are considered not to be “Good” per the EPA’s Air Quality Index. Use this to find the proportion of days for which the air quality was not good in your sample (phat). A success will be a day in which the PM2.5 AQI is above 50.
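In Excel, a count of this kind typically takes the form =COUNTIF(range, ">50"), where range is the block of Daily_AQI_Value cells on your data sheet. As a cross-check, here is a minimal Python sketch of the same calculation. The function name and the AQI values are hypothetical and only illustrate the counting rule; note that a value of exactly 50 is not a success.

```python
def proportion_not_good(aqi_values, threshold=50):
    """Return (successes, n, phat) where a success is an AQI strictly above threshold."""
    successes = sum(1 for value in aqi_values if value > threshold)
    n = len(aqi_values)
    return successes, n, successes / n

# Hypothetical AQI values for illustration only (not real EPA data).
aqi = [35, 52, 48, 77, 50, 61, 29, 44, 90, 50]

successes, n, phat = proportion_not_good(aqi)
print(successes, n, phat)
```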

We first must check that the conditions of the Central Limit Theorem apply for estimating proportions in a population.

  1. The Random and Independent condition is met by the EPA’s collection agencies.
  2. The Large Sample condition must be checked. If phat is the proportion of days with AQI above 50, then we need to have both n*phat and n*(1-phat) greater than or equal to 10.
  3. The Big Population condition is met for our data.
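The Large Sample condition in item 2 can be checked with a short sketch like the one below. Because n*phat is the number of successes and n*(1-phat) is the number of failures, the sketch compares those two counts to 10 directly. The function name and the example counts are hypothetical.

```python
def large_sample_ok(successes, n, minimum=10):
    """Check that n*phat and n*(1-phat) are both at least `minimum`.

    n*phat equals the number of successes and n*(1-phat) equals the
    number of failures, so the counts are compared directly.
    """
    failures = n - successes
    return successes >= minimum and failures >= minimum

# Hypothetical counts for illustration only.
print(large_sample_ok(40, 120))   # 40 successes, 80 failures
print(large_sample_ok(5, 120))    # only 5 successes
```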

When these three conditions are met, we can use the Normal distribution to find probabilities concerning the sample proportion.  If your data set does not meet the Large Sample condition, obtain a new data set for a different city or county near where you live and then check these conditions again.

Clearly label cells with the names and values for the following:  number of successes in sample, sample size, sample proportion of successes, z value multiplier for 95% confidence interval, the estimated standard error, and the confidence interval.  Calculate the estimated standard error and the confidence interval by hand (using a calculator to do the math) using formulas 7.2 from our text.  Confirm your results using StatCrunch, inserting your StatCrunch results into your worksheet.
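As a way to sanity-check the hand calculation, here is a minimal Python sketch of the standard large-sample interval for a proportion: estimated standard error sqrt(phat*(1-phat)/n) and interval phat ± z*SE. It is assumed, not confirmed, that this matches formulas 7.2 in the text. The function name, the multiplier 1.96 for 95% confidence, and the example counts are illustrative choices.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Return (phat, estimated standard error, (lower, upper)).

    Uses the usual large-sample formulas; z=1.96 is the common
    multiplier for a 95% confidence level.
    """
    phat = successes / n
    standard_error = math.sqrt(phat * (1 - phat) / n)
    margin = z * standard_error
    return phat, standard_error, (phat - margin, phat + margin)

# Hypothetical counts for illustration only.
phat, se, (lower, upper) = proportion_confidence_interval(50, 100)
print(f"phat={phat:.3f}, SE={se:.3f}, CI=({lower:.3f}, {upper:.3f})")
```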

We are 95% confident that the actual proportion of days with AQI that is not “Good” is within the confidence interval we found here.

Now find the 90% confidence interval using any method (by hand or with StatCrunch).  How is the 90% confidence interval different from the 95% confidence interval?  Why is this so?
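To compare the two intervals numerically, a sketch along the following lines may help. It derives the z multiplier from the standard Normal distribution using Python's statistics.NormalDist and computes the interval width at each confidence level. The function names and the example counts are hypothetical; interpreting the difference is still left to you.

```python
import math
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided z multiplier for a confidence level such as 0.90 or 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def interval_width(successes, n, confidence):
    """Width (upper minus lower) of the large-sample interval for a proportion."""
    phat = successes / n
    standard_error = math.sqrt(phat * (1 - phat) / n)
    return 2 * z_multiplier(confidence) * standard_error

# Hypothetical counts for illustration only.
width_90 = interval_width(50, 100, 0.90)
width_95 = interval_width(50, 100, 0.95)
print(f"90% width: {width_90:.4f}")
print(f"95% width: {width_95:.4f}")
```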
