Python

Overview (data source: Kaggle)
titanic.csv

Use five-fold cross-validation to separate the data into training and testing sets. Use the training set to build your machine learning models. Your models will be
based on “features” such as passengers’ gender and class. You can also use feature engineering to create new features.

Use the test set to see how well your model performs on unseen data. Report your result as the average accuracy across the five folds.
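
The splitting and scoring procedure described above can be sketched as follows. This is a minimal, standard-library illustration rather than a prescribed solution: the names `five_fold_splits`, `fit_majority`, and `average_accuracy` are hypothetical, the majority-class "model" is only a stand-in for a real classifier, and the toy labels are made up. In a notebook, scikit-learn's `KFold` or `cross_val_score` would be the more usual choice.

```python
import random
from statistics import mean

def five_fold_splits(n_rows, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs that partition range(n_rows)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are not ordered
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != held_out for j in fold]
        yield train_idx, test_idx

def fit_majority(train_rows, train_labels):
    """Stand-in model (an assumption): always predict the most common training label."""
    classes = sorted(set(train_labels))  # sorted so ties resolve deterministically
    most_common = max(classes, key=train_labels.count)
    return lambda row: most_common

def average_accuracy(rows, labels, fit, n_folds=5):
    """Fit on each training fold, score on the held-out fold, and average the scores."""
    fold_scores = []
    for train_idx, test_idx in five_fold_splits(len(rows), n_folds):
        predict = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(rows[i]) == labels[i] for i in test_idx)
        fold_scores.append(correct / len(test_idx))
    return mean(fold_scores)

# Made-up placeholder data, not taken from titanic.csv.
toy_rows = list(range(20))
toy_labels = [0] * 12 + [1] * 8
toy_average = average_accuracy(toy_rows, toy_labels, fit_majority)
```

Any function with the same `fit(train_rows, train_labels)` shape that returns a prediction callable could replace `fit_majority` here.
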
Data Dictionary

Variable    Definition               Key
survival    Survival                 0 = No, 1 = Yes
pclass      Ticket class             1 = 1st, 2 = 2nd, 3 = 3rd
sex         Sex
Age         Age in years
sibsp       # of siblings/spouses
parch       # of parents/children
fare        Passenger fare
embarked    Port of Embarkation      C = Cherbourg, Q = Queenstown, S = Southampton
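
One possible way to read these variables into typed records is sketched below, using only the standard library. It assumes the CSV header uses exactly the variable names listed in the dictionary (the real titanic.csv may capitalize or name columns differently) and that blank cells can occur. `load_passengers` and `to_number` are hypothetical helpers, and the sample rows are made up for illustration.

```python
import csv
import io

def to_number(text, cast):
    """Convert a CSV cell to a number, mapping blank cells to None."""
    text = text.strip()
    return cast(text) if text else None

def load_passengers(file_obj):
    """Read passenger rows into dicts with typed values."""
    passengers = []
    for row in csv.DictReader(file_obj):
        passengers.append({
            "survival": to_number(row["survival"], int),  # 0 = No, 1 = Yes
            "pclass": to_number(row["pclass"], int),      # 1, 2, or 3
            "sex": row["sex"].strip().lower(),
            "age": to_number(row["Age"], float),          # float: ages below 1 are fractional
            "sibsp": to_number(row["sibsp"], int),
            "parch": to_number(row["parch"], int),
            "fare": to_number(row["fare"], float),
            "embarked": row["embarked"].strip().upper(),  # C, Q, or S
        })
    return passengers

# Made-up rows for illustration only; they are not taken from titanic.csv.
SAMPLE_CSV = """survival,pclass,sex,Age,sibsp,parch,fare,embarked
1,1,female,30,0,0,80.0,C
0,3,male,,1,0,8.05,S
1,2,female,0.75,0,1,19.5,S
"""

passengers = load_passengers(io.StringIO(SAMPLE_CSV))
```

For the actual file, the same function could be called inside `with open("titanic.csv", newline="") as f:` once the header names have been checked against the file.
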

Variable Notes
pclass: A proxy for socio-economic status (SES)
1st = Upper
2nd = Middle
3rd = Lower

age: Age is fractional if less than 1.

sibsp: The dataset defines family relations in this way…
Sibling = brother, sister, stepbrother, stepsister
Spouse = husband, wife (mistresses and fiancés were ignored)

parch: The dataset defines family relations in this way…
Parent = mother, father
Child = daughter, son, stepdaughter, stepson
Some children travelled only with a nanny, therefore parch=0 for them.
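
As an example of feature engineering from these definitions, sibsp and parch could be combined into a family-size feature. The sketch below is only one option: `family_size` and `is_alone` are hypothetical feature names, not part of the dataset, and the example passengers are made up.

```python
def add_family_features(passenger):
    """Return a copy of the passenger dict with two derived features."""
    enriched = dict(passenger)
    # The passenger plus any siblings/spouses and parents/children counted by the dataset.
    enriched["family_size"] = passenger["sibsp"] + passenger["parch"] + 1
    enriched["is_alone"] = enriched["family_size"] == 1
    return enriched

# Made-up passengers for illustration.
child_with_nanny = add_family_features({"sibsp": 0, "parch": 0})
couple_with_child = add_family_features({"sibsp": 1, "parch": 1})
```

Note that a child travelling only with a nanny has parch = 0, so this feature would mark that child as alone even though an adult accompanied them.
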

Analysis requirements:
1. How is the survival chance related to gender?
2. How is the survival chance related to age?
3. How is the survival chance related to socio-economic status?
4. What models do you choose for the prediction and why?
5. What are the three most important factors related to the survival chance?
6. What is your average prediction accuracy?
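
Questions 1 to 3 can be approached by comparing survival rates across groups. The following is a minimal sketch under stated assumptions: `survival_rate_by` and `age_band` are hypothetical helpers, the cutoff of 16 years for "child" is an arbitrary choice, and the rows are made up, so the resulting rates say nothing about the real answers.

```python
from collections import defaultdict

def survival_rate_by(passengers, key):
    """Return {group: fraction survived}, skipping passengers whose group is None."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for passenger in passengers:
        group = key(passenger)
        if group is None:
            continue  # e.g. age unknown
        totals[group] += 1
        survivors[group] += passenger["survival"]
    return {group: survivors[group] / totals[group] for group in totals}

def age_band(passenger):
    """Bucket age into two bands; the cutoff of 16 is an assumption."""
    if passenger["age"] is None:
        return None
    return "child" if passenger["age"] < 16 else "adult"

# Made-up rows for illustration only; not taken from titanic.csv.
toy_passengers = [
    {"sex": "female", "pclass": 1, "age": 30.0, "survival": 1},
    {"sex": "female", "pclass": 3, "age": 4.0, "survival": 1},
    {"sex": "male", "pclass": 3, "age": 40.0, "survival": 0},
    {"sex": "male", "pclass": 1, "age": 50.0, "survival": 1},
    {"sex": "male", "pclass": 2, "age": None, "survival": 0},
    {"sex": "female", "pclass": 3, "age": 22.0, "survival": 0},
]

by_sex = survival_rate_by(toy_passengers, lambda p: p["sex"])
by_class = survival_rate_by(toy_passengers, lambda p: p["pclass"])
by_age = survival_rate_by(toy_passengers, age_band)
```

In a Jupyter notebook, the same comparison is commonly written with a pandas `groupby` on the relevant column followed by the mean of the survival column.
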

Submission requirements:
Use the submission script and submit your project by 05/03/17 11:59pm.
Format your project report as a Jupyter notebook.