CSc 120: Pokemon data analysis

Introduction

This problem involves some simple data analysis and aims to give you some more practice with combining Python data structures in interesting ways: in this case, using two-level dictionaries (i.e., a dictionary of dictionaries). The data, as it happens, is about Pokemon (source: www.kaggle.com). You are to write a program to read in Pokemon data from a file and organize it according to Pokemon type (we will only consider Type 1 for this assignment), then repeatedly read in queries from the user and print out solutions to those queries.

Input Format

The input file, Pokemon.csv, is in CSV (“comma-separated values”) format. This simple file format is typically used for tabular data such as spreadsheets, and if you like you can open the file in a program like Excel or LibreOffice to view the data in an easier-to-read form.

Any line in the input file that begins with the character ‘#’ (without quotes) is a comment line that should be ignored for data analysis.

The first line of the input file, which is a comment line, gives the meaning of the various data columns. In the table below, the left column gives each field's position in a row of comma-separated values (e.g., “Attack” is at position 6):

Position  Field
0         No.
1         Name
2         Type 1
3         Type 2
4         Total strength
5         HP
6         Attack
7         Defense
8         Special Attack
9         Special Defense
10        Speed
11        Generation
12        Legendary?
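A minimal sketch of reading the data lines and skipping comments. It assumes the file is opened with open() in text mode and that no field contains a quoted comma (a plain split(",") would mis-handle quoted commas; Python's csv module is an alternative). The function name read_pokemon_rows is hypothetical, and the sample lines are illustrative: the comment line is made up, and the data row is modeled on the first Pokemon in the Kaggle dataset.

```python
def read_pokemon_rows(lines):
    """Yield the list of fields for every data line in `lines`.

    Lines beginning with '#' are comments and are skipped. Empty lines are
    skipped too (an assumption: the assignment does not mention them).
    Splitting on ',' assumes no field contains a quoted comma.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line == "" or line.startswith("#"):
            continue
        yield line.split(",")


# Illustrative input: one comment line and one data row.
sample = [
    "#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary\n",
    "1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False\n",
]
for fields in read_pokemon_rows(sample):
    print(fields[1], fields[2], fields[6])  # Name, Type 1, Attack
```

In the program itself the lines would come from the data file, e.g. `with open(filename) as f: rows = list(read_pokemon_rows(f))`.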

Expected Behavior

Write a program, in a file pokemon.py, that behaves as follows.

  1. Read in the name of a data file (do not prompt the user). This is a CSV file containing data about Pokemon in the format described above. It can be the full Pokemon.csv data file, but you can also specify other input files that contain more or less information (e.g., a smaller file may be useful for testing or debugging).
  2. Read the specified data file and organize the data into a data structure that groups together information about the different Pokemon types (for this assignment, use only the Type 1 field and ignore Type 2, since Type 2 is not defined for all Pokemon).
  3. Repeatedly read and process queries from the user (see Queries below) until the user enters an empty line. Some examples are given in the Examples section below.
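The query loop in step 3 might be sketched as follows. The function run_queries and its read_line/write parameters are hypothetical (they exist only so the loop can be exercised without a keyboard), and the sketch assumes the output lines for each query have already been precomputed into a dictionary keyed by the lower-cased query word. As with the file name in step 1, input() is called with no prompt string.

```python
def run_queries(answers, read_line=input, write=print):
    """Answer user queries until an empty line is entered.

    `answers` maps a lower-cased query word (e.g. "speed") to the output
    lines precomputed for it. Queries not in `answers` are ignored.
    """
    while True:
        try:
            query = read_line()
        except EOFError:  # input ran out before an empty line was seen
            break
        if query == "":
            break
        # Lower-casing the query makes the lookup case-insensitive.
        for line in answers.get(query.lower(), []):
            write(line)
```

Called as `run_queries(answers)`, it reads from standard input and prints to standard output; the `filename = input()` call for step 1 would come before it.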

Queries

Your program will read in queries from the user, and for each query, analyze the Pokemon data based on the query and print out the results (see Output Format below). The queries and the corresponding analyses are as follows:

User query       Program action
Total            Compute the Pokemon type(s) with the highest average Total strength.
HP               Compute the Pokemon type(s) with the highest average HP.
Attack           Compute the Pokemon type(s) with the highest average Attack.
Defense          Compute the Pokemon type(s) with the highest average Defense.
SpecialAttack    Compute the Pokemon type(s) with the highest average Special Attack.
SpecialDefense   Compute the Pokemon type(s) with the highest average Special Defense.
Speed            Compute the Pokemon type(s) with the highest average Speed.
(empty line)     Terminate query processing.
anything else    Ignore the query.

Note that, in each case, there may be more than one type of Pokemon with the highest average value computed. You should print out information about each of them according to the output format given below.

Matching the user's queries against the User query column shown above should be case-insensitive. For example, the user inputs Attack, attack, ATTACK, and AtTaCk should all be processed the same way.
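One way to get case-insensitive matching is to lower-case each query and look it up in a dictionary keyed by lower-case query words. In this sketch (QUERY_COLUMNS and query_column are hypothetical names), the dictionary maps each query to its column position from the Input Format table; an unrecognized query maps to None and can simply be ignored.

```python
# Column position of each queryable property, from the Input Format table.
QUERY_COLUMNS = {
    "total": 4,
    "hp": 5,
    "attack": 6,
    "defense": 7,
    "specialattack": 8,
    "specialdefense": 9,
    "speed": 10,
}


def query_column(query):
    """Return the column index for `query`, or None if it should be ignored.

    Matching is case-insensitive: "Attack", "ATTACK" and "AtTaCk" all give 6.
    """
    return QUERY_COLUMNS.get(query.lower())
```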

Output Format

For each Pokemon type identified by your analysis, print out the result as follows:

 

print("{}: {}".format(pokemon_type, max_average))

where pokemon_type is the type of Pokemon, and max_average is the average value computed for that Pokemon type for that query (e.g., average total, average HP, average Attack, etc.), which should be equal to the maximum value for that query across all types.
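Because several types can share the maximum, the printing step has to handle ties. A minimal sketch, assuming the averages for one property are already in a dictionary keyed by type; best_type_lines is a hypothetical name and the averages in the usage lines are made up. Tied types come out in the dictionary's insertion order, which is an assumption since the assignment does not specify an order.

```python
def best_type_lines(averages):
    """Return one output line per type whose average equals the maximum.

    `averages` maps each Pokemon type to its average for a single property.
    """
    max_average = max(averages.values())
    return [
        "{}: {}".format(pokemon_type, average)
        for pokemon_type, average in averages.items()
        if average == max_average
    ]


# Hypothetical averages with a tie for the maximum.
for line in best_type_lines({"Fire": 70.5, "Water": 70.5, "Grass": 60.0}):
    print(line)
```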

Programming Requirements

  1. Follow the style guidelines for this class. 
  2. Your code should not repeatedly and unnecessarily traverse all the data about all the Pokemon when processing queries. To this end, organize your code and data as follows. 
    A. Data organization.
    Use a two-level dictionary (i.e., a dictionary of dictionaries) to implement your Pokemon database, as explained below:

    • At the top level, information should be grouped by Pokemon type: i.e., all of the information about Pokemon belonging to a particular type should be grouped together. A data structure that will do this efficiently is the dictionary.
    • For each Pokemon type, we have to store information about all of the different Pokemon that belong to that type. Again, this can be done efficiently using a dictionary that maps the Pokemon’s name to its properties (Total strength, Attack, Defense, etc.).

    Additionally, for each Pokemon type you should pre-compute the average values for all of its properties (see Code Organization below). These average values should also be organized as a dictionary keyed by Pokemon type.
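A minimal sketch of building the two-level dictionary from parsed rows (lists of field strings, one per non-comment line). It assumes the property columns hold integers, and it keys each Pokemon's properties by the lower-cased query word so later lookups line up with the queries. build_database and PROPERTY_COLUMNS are hypothetical names, and the two sample rows are modeled on the first two Pokemon of the Kaggle dataset.

```python
# Column positions from the Input Format table, keyed by lower-cased query word.
PROPERTY_COLUMNS = {
    "total": 4, "hp": 5, "attack": 6, "defense": 7,
    "specialattack": 8, "specialdefense": 9, "speed": 10,
}


def build_database(rows):
    """Return {type_1: {name: {property: value}}} from parsed CSV rows."""
    database = {}
    for fields in rows:
        name, type1 = fields[1], fields[2]
        stats = {prop: int(fields[col]) for prop, col in PROPERTY_COLUMNS.items()}
        # setdefault creates the inner dictionary the first time a type is seen.
        database.setdefault(type1, {})[name] = stats
    return database


rows = [
    ["1", "Bulbasaur", "Grass", "Poison", "318", "45", "49", "49", "65", "65", "45", "1", "False"],
    ["2", "Ivysaur", "Grass", "Poison", "405", "60", "62", "63", "80", "80", "60", "1", "False"],
]
db = build_database(rows)
print(db["Grass"]["Ivysaur"]["defense"])  # 63
```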

    B. Code organization.
    Notice that the Pokemon properties you read in do not change during the computation. This means that the average value of any property for a given Pokemon type also stays the same, and so do the maximum average values. These values can therefore all be computed once and saved, with query processing simply looking up the saved values as needed. This approach is closely related to a speedup technique called memoization. Your code should be organized as follows:

    • After reading in all the Pokemon data: for each Pokemon type, compute the average value for each of its properties across all of the Pokemon that belong to that type. Save this result into a dictionary indexed by Pokemon type.
    • Next, process the average values obtained in the previous step to compute the maximum average value for each property. Optionally, at this point you can also compute which Pokemon types have the maximum average value for each property.
    • Use these data to help process user queries until there are no queries to process.
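The steps above might be sketched as follows, assuming the two-level layout {type: {name: {property: value}}}. The function names are hypothetical, and the database in the usage lines is a trimmed, illustrative example with only two properties. compute_best keeps, for each property, the maximum average together with every type that reaches it, so ties are preserved and each query reduces to a single dictionary lookup.

```python
def compute_averages(database):
    """Return {type: {property: average}} from {type: {name: {property: value}}}."""
    averages = {}
    for pokemon_type, members in database.items():
        sums = {}
        for stats in members.values():
            for prop, value in stats.items():
                sums[prop] = sums.get(prop, 0) + value
        averages[pokemon_type] = {prop: s / len(members) for prop, s in sums.items()}
    return averages


def compute_best(averages):
    """Return {property: (max_average, [types having that average])}."""
    best = {}
    for pokemon_type, props in averages.items():
        for prop, avg in props.items():
            if prop not in best or avg > best[prop][0]:
                best[prop] = (avg, [pokemon_type])  # new maximum
            elif avg == best[prop][0]:
                best[prop][1].append(pokemon_type)  # tie with current maximum
    return best


database = {
    "Grass": {"Bulbasaur": {"hp": 45, "speed": 45}, "Ivysaur": {"hp": 60, "speed": 60}},
    "Fire": {"Charmander": {"hp": 39, "speed": 65}},
}
best = compute_best(compute_averages(database))
print(best["hp"])     # (52.5, ['Grass'])
print(best["speed"])  # (65.0, ['Fire'])
```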

Examples

Some examples of query processing on different datasets are shown below.

It can be helpful, for testing and debugging, to start out with small input files and work up to larger inputs. In addition to the full Pokemon.csv dataset, the examples below use four smaller slices of the full dataset. In each case, the program was given the same four queries (using a mixture of upper- and lower-case characters to test the case-insensitive comparison in the code):

speed
ATTACK
Defense
hP

The characteristics of the five input files, and the results of these four queries, are as follows.

 

 

  • PokeInfo-tiny.csv. The first 3 lines from the full dataset. There are two Pokemon, and only one type. 
    speed
    Grass: 52.5
    ATTACK
    Grass: 55.5
    Defense
    Grass: 56.0
    hP
    Grass: 52.5
    

     

  • PokeInfo-small.csv. The first 12 lines from the full dataset. There are 11 Pokemon, from three different types.
    speed
    Fire: 89.0
    ATTACK
    Fire: 86.8
    Defense
    Grass: 79.5
    hP
    Grass: 66.25
    

     

  • PokeInfo-100.csv. The first 100 lines from the full dataset. There are 99 Pokemon from 12 types.
    speed
    Psychic: 116.25
    ATTACK
    Fighting: 99.0
    Defense
    Rock: 115.0
    hP
    Fairy: 82.5
    

     

  • PokeInfo-250.csv. The first 250 lines from the full dataset. 249 Pokemon, 17 types.
    speed
    Ghost: 100.0
    ATTACK
    Fighting: 102.85714285714286
    Defense
    Steel: 190.0
    hP
    Normal: 78.3529411764706
    

     

  • Pokemon.csv. The complete dataset. 800 Pokemon, 18 types.
    speed
    Flying: 102.5
    ATTACK
    Dragon: 112.125
    Defense
    Steel: 126.37037037037037
    hP
    Dragon: 83.3125
    
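As a hand check, the PokeInfo-tiny.csv results above can be reproduced directly. This sketch assumes the two Pokemon in that file are Bulbasaur and Ivysaur (the first two data rows of the Kaggle dataset) and types their stats in by hand rather than reading the file.

```python
# Assumed contents of PokeInfo-tiny.csv: Bulbasaur and Ivysaur, both Type 1 "Grass".
grass = {
    "Bulbasaur": {"speed": 45, "attack": 49, "defense": 49, "hp": 45},
    "Ivysaur": {"speed": 60, "attack": 62, "defense": 63, "hp": 60},
}

results = []
for prop in ("speed", "attack", "defense", "hp"):  # the four example queries
    average = sum(stats[prop] for stats in grass.values()) / len(grass)
    results.append("{}: {}".format("Grass", average))
print("\n".join(results))
```

Each line matches the corresponding PokeInfo-tiny.csv result, e.g. Defense is (49 + 63) / 2 = 56.0.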