Data mining and analytics
This assignment is designed to build your understanding of clustering before we use it for analysis when we move on to regressions. You will work with the jeans.dta file, which covers 689 stores selling four different types of jeans: original, stretch, leisure, and fashion. As an internal auditor of the company, you would like to understand whether the promotions rolled out in several groups of stores are appropriate. To do so, we will cluster this data.
The fields are:
StoreID | Store identification number
Fashion | The number of pairs of “fashion” style jeans sold last month
Leisure | The number of pairs of “leisure” style jeans sold last month
Stretch | The number of pairs of “stretch” style jeans sold last month
Original | The number of pairs of “original” style jeans sold last month
TotalSold | The total number of jeans sold last month
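Before clustering, it helps to load the file and sanity-check the fields above. A minimal sketch in Python follows; since jeans.dta is not included here, it builds a synthetic stand-in with the same schema (in practice you would use `pd.read_stata("jeans.dta")` instead):

```python
import pandas as pd
import numpy as np

# Hypothetical stand-in for jeans.dta; in practice:
#   df = pd.read_stata("jeans.dta")
rng = np.random.default_rng(0)
styles = ["Fashion", "Leisure", "Stretch", "Original"]
df = pd.DataFrame(rng.poisson(30, size=(689, 4)), columns=styles)
df.insert(0, "StoreID", range(1, 690))
df["TotalSold"] = df[styles].sum(axis=1)

# Sanity check: TotalSold should equal the sum of the four styles
assert (df["TotalSold"] == df[styles].sum(axis=1)).all()
print(df.head())
```

Note that TotalSold is a linear combination of the four style columns, which is why the instructions below use only the four styles as clustering variables.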
Instructions:
- Perform hierarchical clustering using the four styles as the selected variables, with normalized input data and the single linkage method; set the number of clusters to 5
- How many clusters do you have in the output?
- Explain the dendrogram
- Go back to the raw data and perform k-means clustering using the four styles as the selected variables, with normalized input data, 5 clusters, 50 iterations, and a random start of 1
- How many clusters do you have?
- Which cluster is the largest (cluster number)?
- How many stores are in the largest cluster?
- Which cluster has the largest original jeans sales?
- Do you think a 5-cluster solution is a meaningful way to categorize these stores? Why or why not?
- Now rerun the k-means step, keeping everything the same but with 10 clusters.
- How many clusters do you have?
- Which cluster is the largest (cluster number)?
- How many stores are in the largest cluster?
- Which cluster has the largest original jeans sales?
- Do you think a 10-cluster solution is a meaningful way to categorize these stores? Why or why not?
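The steps above can be sketched in Python with scipy and scikit-learn. This is a hedged illustration, not the required tool for the assignment: it uses synthetic data in place of jeans.dta (the column order Fashion, Leisure, Stretch, Original is an assumption), z-score normalization, single-linkage hierarchical clustering cut at 5 clusters, and k-means with 5 clusters, at most 50 iterations, and a fixed random seed:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for jeans.dta: 689 stores x 4 style columns
# (assumed order: Fashion, Leisure, Stretch, Original)
rng = np.random.default_rng(1)
X = rng.poisson(lam=[30, 25, 40, 50], size=(689, 4)).astype(float)

# Normalize the four style variables (z-scores)
Xz = StandardScaler().fit_transform(X)

# Hierarchical clustering, single linkage, cut the tree into 5 clusters
Z = linkage(Xz, method="single")
h_labels = fcluster(Z, t=5, criterion="maxclust")
print("hierarchical clusters:", np.unique(h_labels).size)

# k-means: 5 clusters, up to 50 iterations, one fixed random start
km = KMeans(n_clusters=5, max_iter=50, n_init=1, random_state=1).fit(Xz)
sizes = np.bincount(km.labels_, minlength=5)
largest = int(sizes.argmax())
print("largest cluster:", largest, "with", sizes[largest], "stores")

# Cluster with the highest mean 'Original' sales (assumed column index 3)
orig_means = [X[km.labels_ == k, 3].mean() for k in range(5)]
print("cluster with largest original sales:", int(np.argmax(orig_means)))
```

For the 10-cluster rerun, change `n_clusters=5` to `n_clusters=10` (and `t=5` to `t=10` in `fcluster`) and recompute the same summaries.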