Analytic and Computing essay -Computer science

Analytics and Computing for Industrial Engineers

1. Suppose that you need to apply supervised learning to build an accurate prediction model of
the tuition from the other variables in the dataset used in Labs 3 and 4. Please describe your steps
for model building in detail, including data preprocessing, model selection and building, baseline
model, and performance evaluation.
2. Should we strive for the highest possible accuracy with the training set? Why or why not?
3. When should the training data set be balanced? When should the test data set be balanced?
4. How does two-fold cross validation work? List the major steps.
5. Suppose we are running a fraud classification model, with a training set of 10,000 records of
which only 500 are fraudulent. How many fraudulent records need to be resampled if we would
like the proportion of fraudulent records in the balanced data set to be 15%?
6. Some of the problems below could be addressed using either a supervised learning algorithm
or an unsupervised learning algorithm. For each problem, which would you adopt, supervised or
unsupervised learning, and why?
(a) Given data on how 1000 medical patients respond to an experimental drug (such as
effectiveness of the treatment, side effects, etc.), discover whether there are different
categories or “types” of patients in terms of how they respond to the drug, and if so what
these categories are.
(b) Given genetic (DNA) data from a person, predict the odds of him/her developing diabetes
over the next 10 years.
(c) Given a large dataset of medical records from patients suffering from heart disease, try
to learn whether there might be different clusters of such patients for which we might
tailor separate treatments.
(d) Have a computer examine an audio clip of a piece of music, and classify whether or not
there are vocals (i.e., a human voice singing) in that audio clip, or if it is a clip of only
musical instruments (and no vocals).
7. Indicate the regions of underfitting and overfitting, respectively, in the following plot.
8. The following figure shows three blue curves that fit the scatter points.
(a) Based on your observation, which regression model is the best fit, and why?
(b) From the perspective of bias and variance, comment on each of the three
regression models.
9. An effective way to diagnose the bias and variance of a prediction model is the learning curve. A
learning curve is a plot of the training and cross-validation (testing) error as a function of the
number of training points. Two learning curves are shown below.
(a) Please specify which one shows high bias and which one shows high variance. Give your
reasons.
(b) For a high-bias model, what actions can be taken? What actions can be taken for a
high-variance model?
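
As a rough illustration of the procedure that question 4 asks about, the sketch below runs two-fold cross validation on synthetic data. Everything here is an assumption made for the example: the data are generated rather than taken from the labs, the model is a simple one-variable least-squares line, and the helper names (fit_line, mse, two_fold_cv) are hypothetical. The steps are: shuffle the records, split them into two equal folds, train on one fold and test on the other, swap the roles, and average the two test errors.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(model, xs, ys):
    """Mean squared error of a fitted line on the given records."""
    slope, intercept = model
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def two_fold_cv(xs, ys, seed=0):
    # Step 1: shuffle the record indices.
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    # Step 2: split into two folds of (nearly) equal size.
    half = len(idx) // 2
    folds = [idx[:half], idx[half:]]
    errors = []
    # Steps 3 and 4: each fold serves once as the test set, once as the training set.
    for k in (0, 1):
        test, train = folds[k], folds[1 - k]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mse(model, [xs[i] for i in test], [ys[i] for i in test]))
    # Step 5: average the two test errors.
    return errors, sum(errors) / 2

# Synthetic data (an assumption for illustration): y = 3 + 2x plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]

fold_errors, cv_error = two_fold_cv(xs, ys)
print(fold_errors, cv_error)
```

Every record is used for testing exactly once and for training exactly once, which is what distinguishes this from a single train/test split.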
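
The arithmetic behind question 5 can be sketched as follows, assuming that "resampled" means adding x copies of fraudulent records (sampling with replacement) on top of the original 10,000, so that (500 + x) / (10,000 + x) reaches the target proportion. The function name records_to_resample is hypothetical, and a different balancing scheme (for example, undersampling the non-fraudulent records) would lead to a different calculation.

```python
import math
from fractions import Fraction

def records_to_resample(total, minority, target):
    """Smallest whole number x with (minority + x) / (total + x) >= target.

    Assumes balancing is done by adding resampled minority records only.
    """
    t = Fraction(str(target))  # exact arithmetic, avoids float rounding
    # Solving (minority + x) / (total + x) = t gives
    # x = (t * total - minority) / (1 - t).
    x = (t * total - minority) / (1 - t)
    return max(0, math.ceil(x))

needed = records_to_resample(10_000, 500, 0.15)
print(needed)  # 1177
print((500 + needed) / (10_000 + needed))  # about 0.15
```

The exact solution is 1000 / 0.85, roughly 1176.5, so it is rounded up to the next whole record.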
