Code for Data Analysis | Statistics
Answer all questions and complete all tasks throughout the document, following the directions.
The documents are attached; also download this one:
http://www.wikiupload.com/UGSP0VJRQYEXZA1
Code for Data Analysis:
In R, go to File > New Script (New Document on Mac). A new script window will pop up. Write all of your code in the script window, not directly in the console.
The R code can be loaded directly into R by copying and pasting everything below "R Code DA4" into a script window.
Note: Any time you see #, R will not read what follows on that line. I will use this to make comments about the command that follows.
R Code DA4
####################################################################
# Number 1
#####################################################################
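# Load the data set called EPAFE2017Data.csv. Let's call it fueldata.
# file.choose() opens a file browser so you can select the CSV file.
fueldata = read.csv(file.choose(), header = TRUE)
head(fueldata) # Show the first six rows to check that the data loaded correctly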
# Make a box plot to compare the combined fuel efficiency of cars made by American companies
# and international companies for 2017.
boxplot(CombFE ~ International, data = fueldata, horizontal = TRUE,
        col = c("coral", "lightblue"),
        main = "2017 EPA Estimated Combined Fuel Efficiency:\nAmerican vs International Car Companies",
        xlab = "Miles per Gallon")
# Get Summary Statistics
# aggregate() applies a function to CombFE within each group; here it gives the
# sample means, sample standard deviations, and sample sizes.
aggregate(CombFE~International, data = fueldata, mean) # Means
aggregate(CombFE~International, data = fueldata, sd) # Standard Deviations
aggregate(CombFE~International, data = fueldata, length) # Sample Size
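# Illustrative sketch (not part of the assignment): the three aggregate() calls above
# can be combined into one by passing a function that returns several statistics.
# It uses R's built-in mtcars data as a stand-in (mpg grouped by am), and the name
# demo_stats is made up for this example.
demo_stats = aggregate(mpg ~ am, data = mtcars,
                       FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x)))
demo_stats # One row per group; the mpg column holds the mean, sd, and n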
# Perform a two-sample t test with a 99% confidence level.
# (R's default is the Welch test, which does not assume equal variances.)
t.test(CombFE ~ International, data = fueldata, conf.level = 0.99)
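# Illustrative sketch (not part of the assignment): t.test() returns an object whose
# pieces can be pulled out by name instead of being read off the printed output.
# Uses the built-in mtcars data as a stand-in; demo_tt is a made-up name.
demo_tt = t.test(mpg ~ am, data = mtcars, conf.level = 0.99)
demo_tt$estimate  # The two sample means
demo_tt$conf.int  # 99% confidence interval for the difference in means
demo_tt$p.value   # P-value for the test of equal means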
#####################################################################
# OPTIONAL Number 2 Data Table
#####################################################################
# Create a 2x2 table of International vs Guzzler status.
table(fueldata$International, fueldata$Guzzler)
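# Illustrative sketch (not part of the assignment): labelling the table and adding
# totals or row proportions can make a two-way table easier to read.
# Uses mtcars as a stand-in (am vs vs); demo_tab is a made-up name.
demo_tab = table(mtcars$am, mtcars$vs, dnn = c("am", "vs"))
addmargins(demo_tab)              # Counts with row and column totals
prop.table(demo_tab, margin = 1)  # Proportions within each row (each row sums to 1)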
#####################################################################
# Number 3 ANOVA
#####################################################################
# Create a boxplot of Combined Fuel Efficiency vs Drive Types
boxplot(CombFE ~ Drive, data = fueldata, horizontal = TRUE, col = rainbow(5),
        main = "Estimated Fuel Efficiency\nfor 2017 Vehicles among Drive Types",
        cex.axis = 0.7, xlab = "MPG")
# Note: if you can't see the drive categories, enlarge your graph window.
# Get means, SDs, and sample sizes for each drive type
aggregate(CombFE~Drive, data = fueldata, mean) # Means
aggregate(CombFE~Drive, data = fueldata, sd) # Standard Deviations
aggregate(CombFE~Drive, data = fueldata, length) # Sample Sizes for each
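# Test whether any means differ from each other with an overall F test.
# factor() makes sure Drive is treated as a grouping factor: since R 4.0, read.csv()
# reads text columns as character, and TukeyHSD() needs a factor.
mod = aov(CombFE ~ factor(Drive), data = fueldata)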
summary(mod)
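# Illustrative sketch (not part of the assignment): the F statistic and p-value can be
# extracted from the ANOVA table rather than read off the printout.
# Uses mtcars as a stand-in (mpg by number of cylinders); demo_* names are made up.
demo_fit = aov(mpg ~ factor(cyl), data = mtcars)
demo_anova = summary(demo_fit)[[1]]  # The ANOVA table as a data frame
demo_anova[["F value"]][1]           # Overall F statistic
demo_anova[["Pr(>F)"]][1]            # P-value for the overall F test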
# If the F statistic in the ANOVA is significant,
# perform a multiple comparisons test to see which
# drive types differ significantly.
# Tukey's multiple comparisons test
TukeyHSD(mod, conf.level = 0.95)
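# Illustrative sketch (not part of the assignment): TukeyHSD() returns one matrix per
# factor, so the significantly different pairs can be picked out in code.
# Uses mtcars as a stand-in; demo_* names are made up.
demo_cars = transform(mtcars, cyl = factor(cyl))  # The grouping variable must be a factor
demo_tukey = TukeyHSD(aov(mpg ~ cyl, data = demo_cars), conf.level = 0.95)
demo_pairs = demo_tukey$cyl  # Columns: diff, lwr, upr, p adj
demo_pairs[demo_pairs[, "p adj"] < 0.05, , drop = FALSE]  # Pairs that differ at the 5% level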
# Subset the data to look only at part-time 4-wheel-drive vehicles. What type of vehicles are these?
PT4wdonly = subset(fueldata, Drive == "Part-time 4-WD")
# Who makes these vehicles?
PT4wdonly$Division
# What are the car names?
PT4wdonly$Carline
# What type of cars are they?
PT4wdonly$CarType
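# Illustrative sketch (not part of the assignment): unique() and table() summarise a
# subset more compactly than printing a whole column.
# Uses mtcars as a stand-in; demo_sub is a made-up name.
demo_sub = subset(mtcars, cyl == 4)  # Keep only the rows with 4 cylinders
nrow(demo_sub)                       # How many rows are in the subset
unique(demo_sub$gear)                # Distinct values, each listed once
table(demo_sub$gear)                 # How often each value occurs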