Data Mining

1. Competitive Auctions on eBay.com. The file eBayAuctions.xls contains information on 1972 auctions transacted on eBay.com during May – 2004. The goal is to use these data to build a model that will distinguish competitive auctions from non-competitive ones. A competitive auction is defined as an auction with at least two bids placed on the item being auctioned. The data include variables that describe the item (auction category), the seller (his/her eBay rating), and the auction terms that the seller selected (auction duration, opening price, currency, day-of-week of auction close). In addition, we have the price at which the auction closed. The goal is to predict whether or not the auction will be competitive (textbook reference – 9.1).
Data Preprocessing. Create dummy variables for the categorical predictors. These variables include Category (18 categories), Currency (USD, GBP, Euro), EndDay (Monday-Sunday), and Duration (1, 3, 5, 7, or 10 days). Split the data into training and validation sets using a 60% : 40% ratio.
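The preprocessing step above can be sketched in plain Python as follows. This is a minimal illustration, not the XLMiner procedure: the helper names `add_dummies` and `split_60_40`, the random seed, and the ten toy records (including their category labels) are all hypothetical and are not taken from eBayAuctions.xls.

```python
import random

def add_dummies(records, column):
    """Replace one categorical column with 0/1 indicator columns,
    one per level observed in the data."""
    levels = sorted({str(r[column]) for r in records})
    encoded = []
    for r in records:
        row = {k: v for k, v in r.items() if k != column}
        for level in levels:
            row[f"{column}_{level}"] = 1 if str(r[column]) == level else 0
        encoded.append(row)
    return encoded

def split_60_40(records, seed=1):
    """Shuffle a copy of the records and cut it 60% training / 40% validation."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(0.6 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical toy records standing in for the real auction data.
categories = ["Books", "Toys", "Music"]  # illustrative labels only
currencies = ["USD", "GBP", "Euro"]
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
durations = [1, 3, 5, 7, 10]
auctions = [
    {
        "Category": categories[i % 3],
        "Currency": currencies[i % 3],
        "EndDay": days[i % 7],
        "Duration": durations[i % 5],
        "OpenPrice": 1.0 + i,
        "Competitive": i % 2,
    }
    for i in range(10)
]

encoded = auctions
for col in ("Category", "Currency", "EndDay", "Duration"):
    encoded = add_dummies(encoded, col)

train, valid = split_60_40(encoded)
print(len(train), "training records,", len(valid), "validation records")
```

Duration is encoded as a categorical predictor here, as the exercise asks, even though its values are numeric. The sketch keeps one indicator per level; dropping a reference level matters for regression-type models but is not required for a classification tree.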
a. Fit a classification tree using all predictors (using the Best-Pruned tree). To avoid overfitting, set the minimum number of observations in a leaf node to 50. Also, set the maximum number of displayed levels to seven (the maximum allowed in XLMiner). To remain within the limitation of 30 predictors, combine some of the categories of categorical predictors. Write down the results in terms of rules.
b. Is this model practical for predicting the outcome of a new auction? (Hint: Consider Closing Price)
c. Describe the expected and unexpected results from the rules in Part a.
d. Fit another classification tree (using the Best-Pruned tree, with a minimum number of observations per leaf node = 50 and the maximum allowed number of displayed levels), this time including only predictors that can be used for predicting the outcome of a new auction. Describe the resulting tree in terms of rules. Make sure to report the smallest set of rules required for classification.
e. Examine the lift chart and classification table for the tree. What can you say about the predictive performance of this new tree/model compared to the tree/model in Part a?
f. Based on the last tree, what can you conclude from these data about the chances of an auction obtaining at least two bids and its relationship to the auction settings set by the seller (i.e., duration, opening price, ending day, currency)? What strategy would you recommend
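For the trees in Parts a and d, the effect of the minimum-leaf-size and maximum-depth settings, and the way a fitted tree reads as rules, can be sketched as below. This is a plain Gini-based recursive partitioning written for illustration; it does no pruning and is not XLMiner's Best-Pruned procedure. The function names, the two predictors, and the 200 synthetic records (where an auction is labeled competitive exactly when its opening price is at most 100) are assumptions made up for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, min_leaf):
    """Return (feature_index, threshold) with the lowest weighted Gini impurity,
    or None if no split improves on the parent or respects min_leaf."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold midway between adjacent values
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # would create a leaf with too few observations
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best

def tree_rules(rows, labels, names, min_leaf, max_depth, path=()):
    """Grow the tree recursively and return one (rule, predicted class, n) per leaf."""
    split = best_split(rows, labels, min_leaf) if max_depth > 0 else None
    if split is None:
        majority = Counter(labels).most_common(1)[0][0]
        return [(" AND ".join(path) or "(all records)", majority, len(labels))]
    j, t = split
    rules = []
    for op, keep in (("<=", lambda v: v <= t), (">", lambda v: v > t)):
        subset = [(r, y) for r, y in zip(rows, labels) if keep(r[j])]
        rules += tree_rules(
            [r for r, _ in subset],
            [y for _, y in subset],
            names, min_leaf, max_depth - 1,
            path + (f"{names[j]} {op} {t}",),
        )
    return rules

# Synthetic data (an assumption, not the eBay file): 200 auctions with
# opening prices 1..200 and durations cycling through the five allowed values.
durations = [1, 3, 5, 7, 10]
names = ["OpenPrice", "Duration"]
rows = [(price, durations[price % 5]) for price in range(1, 201)]
labels = [1 if price <= 100 else 0 for price in range(1, 201)]

rules = tree_rules(rows, labels, names, min_leaf=50, max_depth=7)
for rule, predicted, n in rules:
    print(f"IF {rule} THEN Competitive = {predicted}  (n = {n})")
```

Leaving a predictor such as the closing price out of `rows` and `names` mirrors the restriction in Part d. In scikit-learn, the corresponding settings on `DecisionTreeClassifier` are `min_samples_leaf` and `max_depth`.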
