Note

  • Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.
In [3]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

purchase_data.head()
Out[3]:
Purchase ID SN Age Gender Item ID Item Name Price
0 0 Lisim78 20 Male 108 Extraction, Quickblade Of Trembling Hands 3.53
1 1 Lisovynya38 40 Male 143 Frenzied Scimitar 1.56
2 2 Ithergue48 24 Male 92 Final Critic 4.88
3 3 Chamassasya86 24 Male 100 Blindscythe 3.27
4 4 Iskosia90 23 Male 131 Fury 1.44

Player Count

  • Display the total number of players
In [4]:
player_count = len(purchase_data["SN"].unique())
print(f"There are {player_count} players total in Pymoli!")
There are 576 players total in Pymoli!

Purchasing Analysis (Total)

  • Run basic calculations to obtain number of unique items, average price, etc.
  • Create a summary data frame to hold the results
  • Optional: give the displayed data cleaner formatting
  • Display the summary data frame
In [5]:
item_count = len(purchase_data["Item ID"].unique())
print(f"There are {item_count} purchasable in-game items, total!")

avg_price = purchase_data["Price"].mean()
print("The average price of the items sold in-game was ${:.2f}".format(avg_price))
avg_price_string = "${:.2f}".format(avg_price)

item_sold_count = purchase_data["Purchase ID"].count()
print("The total number of items sold in the Pymoli game was", item_sold_count)

total_revenue = purchase_data["Price"].sum()
print("The total revenue from in-game item purchases was ${:,}".format(total_revenue))
total_revenue_string = "${:,}".format(total_revenue)

#put info into a dataframe and show as a table!
item_summary_stats_df = pd.DataFrame({"Number of Unique Items":[item_count],
                                     "Average Price":[avg_price_string],
                                     "Number of Purchases":[item_sold_count],
                                     "Total Revenue":[total_revenue_string]})
item_summary_stats_df
There are 179 purchasable in-game items, total!
The average price of the items sold in-game was $3.05
The total number of items sold in the Pymoli game was 780
The total revenue from in-game item purchases was $2,379.77
Out[5]:
Number of Unique Items Average Price Number of Purchases Total Revenue
0 179 $3.05 780 $2,379.77

Gender Demographics

  • Percentage and Count of Male Players
  • Percentage and Count of Female Players
  • Percentage and Count of Other / Non-Disclosed
In [6]:
#remember the total player count from earlier - this will help with percentages!
print("There are {} total players of Pymoli.".format(player_count))

#now find number of male players
all_male_purchases = purchase_data.loc[purchase_data["Gender"] == "Male"]
male_player_count = len(all_male_purchases["SN"].unique())
print("Of the {} total players of Pymoli, there are {} male players.".format(player_count, male_player_count))

#now find number of female players
all_female_purchases = purchase_data.loc[purchase_data["Gender"] == "Female"]
female_player_count = len(all_female_purchases["SN"].unique())
print("Of the {} total players of Pymoli, there are {} female players.".format(player_count, female_player_count))

#now find number of other/non-disclosed players
other_players = player_count - (male_player_count + female_player_count)
print("Of the {} total players of Pymoli, there are {} other/non-disclosed players.".format(player_count, other_players))

#create a dataframe table to hold this player count by gender information, as well as show the percentages of the
#player population that are categorized by each gender
stats_dict = {"Total Count":{"Male": male_player_count, 
                             "Female": female_player_count, 
                             "Other/Non-Disclosed": other_players},
             "Percentage of Players":{"Male": ("{:.2%}".format(male_player_count/player_count)), 
                                      "Female": "{:.2%}".format(female_player_count/player_count), 
                                      "Other/Non-Disclosed": "{:.2%}".format(other_players/player_count)},}
player_gender_df = pd.DataFrame(stats_dict)

#output table with desired counts and percentages by gender
player_gender_df
There are 576 total players of Pymoli.
Of the 576 total players of Pymoli, there are 484 male players.
Of the 576 total players of Pymoli, there are 81 female players.
Of the 576 total players of Pymoli, there are 11 other/non-disclosed players.
Out[6]:
Total Count Percentage of Players
Male 484 84.03%
Female 81 14.06%
Other/Non-Disclosed 11 1.91%

Purchasing Analysis (Gender)

  • Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender
  • Create a summary data frame to hold the results
  • Optional: give the displayed data cleaner formatting
  • Display the summary data frame
In [7]:
purchases_by_gender = purchase_data["Gender"].value_counts()
total_purchases = purchases_by_gender.sum()

#summary series to show the different purchase counts by gender
print(purchases_by_gender)

#lookup the purhcase counts by gender and store in variables
male_purchases = purchases_by_gender["Male"]
female_purchases = purchases_by_gender["Female"]
other_purchases = total_purchases - (male_purchases + female_purchases)

#calculate average purchase price by gender

#first, filter all the purchase records to only include male purchases
male_purchases_only = purchase_data.loc[purchase_data["Gender"] == "Male"]

#get the average purchase price for any given male purchase
male_avg_purchase_price = male_purchases_only["Price"].mean()
male_avg_purchase_price = round(male_avg_purchase_price, 2)
print("The average in-game item purchase price in the Pymoli game for male players was ${}".format(male_avg_purchase_price))

#get the total revenue from male purchases
male_total_purchase_value = male_purchases_only["Price"].sum()
male_total_purchase_value = round(male_total_purchase_value, 2)
print("The total in-game item purchase price in the Pymoli game for male players was ${:,}".format(male_total_purchase_value))

#get the average purchase value per male player (average amount spent by each player)
avg_male_purchase_per_player = male_total_purchase_value/male_player_count
avg_male_purchase_per_player = round(avg_male_purchase_per_player, 2)
print("The average in-game item purchase amount in the Pymoli game for male players was ${:.2f}".format(avg_male_purchase_per_player))


#now, filter the purchase records to only include female purchases
female_purchases_only = purchase_data.loc[purchase_data["Gender"] == "Female"]

#get the average purchase price for any given female purchase
female_avg_purchase_price = female_purchases_only["Price"].mean()
female_avg_purchase_price = round(female_avg_purchase_price, 2)
print("The average in-game item purchase price in the Pymoli game for female players was ${:.2f}".format(female_avg_purchase_price))

#get the total revenue from female purchases
female_total_purchase_value = female_purchases_only["Price"].sum()
female_total_purchase_value = round(female_total_purchase_value, 2)
print("The total in-game item purchase price in the Pymoli game for female players was ${:,}".format(female_total_purchase_value))

#get the average purchase value per female player (average amount spent by each player)
avg_female_purchase_per_player = female_total_purchase_value/female_player_count
avg_female_purchase_per_player = round(avg_female_purchase_per_player, 2)
print("The average in-game item purchase amount in the Pymoli game for female players was ${:.2f}".format(avg_female_purchase_per_player))

#now, filter the purchase records to only include other/non-disclosed purchases
other_purchases_only = purchase_data.loc[((purchase_data["Gender"] != "Female") & (purchase_data["Gender"] != "Male")) ]
# print(other_purchases_only)

#get the average purchase price for any given 'other/undisclosed' purchase
other_avg_purchase_price = other_purchases_only["Price"].mean()
other_avg_purchase_price = round(other_avg_purchase_price, 2)
print("The average in-game item purchase price in the Pymoli game for 'other/undisclosed' gender players was ${:.2f}".format(other_avg_purchase_price))

#get the total revenue from 'other/undisclosed' purchases
other_total_purchase_value = other_purchases_only["Price"].sum()
other_total_purchase_value = round(other_total_purchase_value, 2)
print("The total in-game item purchase price in the Pymoli game for 'other/undisclosed' gender players was ${:,}".format(other_total_purchase_value))

#get the average purchase value per 'other/undisclosed' gender player (average amount spent by each player)
avg_other_purchase_per_player = other_total_purchase_value/other_players
avg_other_purchase_per_player = round(avg_other_purchase_per_player, 2)
print("The average in-game item purchase amount in the Pymoli game for 'other/undisclosed' gender players was ${:.2f}".format(avg_other_purchase_per_player))

purchase_count_dict = {"Male":male_purchases, "Female":female_purchases, "Other / Non-Disclosed":other_purchases}
avg_purchase_price_dict = {"Male":male_avg_purchase_price, "Female":female_avg_purchase_price, "Other / Non-Disclosed":other_avg_purchase_price}
total_purchase_dict = {"Male":male_total_purchase_value, "Female":female_total_purchase_value, "Other / Non-Disclosed":other_total_purchase_value}
avg_purchase_per_player_dict = {"Male":avg_male_purchase_per_player, "Female":avg_female_purchase_per_player, "Other / Non-Disclosed":avg_other_purchase_per_player}

overall_table_dict = {"Purchase Count":purchase_count_dict, "Average Purchase Price":avg_purchase_price_dict,
                     "Total Purchase Value":total_purchase_dict, "Avg Total Purchase Per Person":avg_purchase_per_player_dict}

purchases_summary_stats_df = pd.DataFrame(overall_table_dict)
purchases_summary_stats_df
Male                     652
Female                   113
Other / Non-Disclosed     15
Name: Gender, dtype: int64
The average in-game item purchase price in the Pymoli game for male players was $3.02
The total in-game item purchase price in the Pymoli game for male players was $1,967.64
The average in-game item purchase amount in the Pymoli game for male players was $4.07
The average in-game item purchase price in the Pymoli game for female players was $3.20
The total in-game item purchase price in the Pymoli game for female players was $361.94
The average in-game item purchase amount in the Pymoli game for female players was $4.47
The average in-game item purchase price in the Pymoli game for 'other/undisclosed' gender players was $3.35
The total in-game item purchase price in the Pymoli game for 'other/undisclosed' gender players was $50.19
The average in-game item purchase amount in the Pymoli game for 'other/undisclosed' gender players was $4.56
Out[7]:
Purchase Count Average Purchase Price Total Purchase Value Avg Total Purchase Per Person
Male 652 3.02 1967.64 4.07
Female 113 3.20 361.94 4.47
Other / Non-Disclosed 15 3.35 50.19 4.56

Age Demographics

  • Establish bins for ages
  • Categorize the existing players using the age bins. Hint: use pd.cut()
  • Calculate the numbers and percentages by age group
  • Create a summary data frame to hold the results
  • Optional: round the percentage column to two decimal points
  • Display Age Demographics Table
In [8]:
#get age information for unique players - the mean of the ages should be the same as the age for each 
#player, since they have the same age for each purchase
unique_players = purchase_data.loc[:, ["SN", "Age"]].groupby("SN").mean()
print("Here is the series of unique player SN's and ages: \n", unique_players)


bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
bin_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

age_df = pd.cut(unique_players["Age"], bins, labels=bin_labels)

age_table_df = pd.DataFrame(age_df.value_counts())
age_table_df["Percentage of Players"] = (age_table_df["Age"]/player_count).map("{:.2%}".format)
age_table_df.index.name = "Age"
age_table_df.columns = ["Player Count", "Percentage of Players"]
age_table_df.reindex(bin_labels)
Here is the series of unique player SN's and ages: 
                Age
SN                
Adairialis76    16
Adastirin33     35
Aeda94          17
Aela59          21
Aelaria33       23
...            ...
Yathecal82      20
Yathedeu43      22
Yoishirrala98   17
Zhisrisu83      10
Zontibe81       21

[576 rows x 1 columns]
Out[8]:
Player Count Percentage of Players
Age
<10 17 2.95%
10-14 22 3.82%
15-19 107 18.58%
20-24 258 44.79%
25-29 77 13.37%
30-34 52 9.03%
35-39 31 5.38%
40+ 12 2.08%

Purchasing Analysis (Age)

  • Bin the purchase_data data frame by age
  • Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below
  • Create a summary data frame to hold the results
  • Optional: give the displayed data cleaner formatting
  • Display the summary data frame
In [9]:
purchase_data_copy = purchase_data.copy()

bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
bin_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

#replaces the number age with the age range bin
purchase_data_copy["Age"] = pd.cut(purchase_data_copy["Age"], bins, labels=bin_labels)

#get purchase counts by age bin
age_summary_df = purchase_data_copy.groupby("Age").count()
#remove extra columns
age_summary_df = age_summary_df.drop([age_summary_df.columns[a] for a in range(1, len(age_summary_df.columns))], axis=1)
age_summary_df.columns = ["Purchase Count"]
#print(age_summary_df)

#get average purchase prices for each age bin
avg_purchase_price_df = purchase_data_copy.groupby("Age").mean()
avg_purchase_price_df = avg_purchase_price_df.loc[:, "Price"].map("${:.2f}".format)
#add to the summary stats table
age_summary_df = age_summary_df.merge(avg_purchase_price_df, on="Age")
update_cols = list(age_summary_df.columns)
update_cols[-1] = "Average Purchase Price"
age_summary_df.columns = update_cols

#get total purchase values for each age bin
total_purchase_price_df = purchase_data_copy.groupby("Age").sum()
total_purchase_price_df = total_purchase_price_df.loc[:, "Price"].map("${:.2f}".format)
#add to the summary stats table
age_summary_df = age_summary_df.merge(total_purchase_price_df, on="Age")
update_cols = list(age_summary_df.columns)
update_cols[-1] = "Total Purchase Price"
age_summary_df.columns = update_cols

#get avg total purchase per person for each age bin
avg_total_purchase_price_df = purchase_data_copy.groupby("Age").sum()#/(age_table_df["Player Count"])
avg_total_purchase_price_df = pd.DataFrame(avg_total_purchase_price_df.loc[:, "Price"]).merge(
    age_table_df.loc[:, "Player Count"], on="Age")
avg_total_purchase_price_df["Avg Total Purchase Per Person"] = avg_total_purchase_price_df["Price"]/avg_total_purchase_price_df["Player Count"]
#print(avg_total_purchase_price_df)
avg_total_purchase_price_df = avg_total_purchase_price_df.loc[:, "Avg Total Purchase Per Person"].map("${:.2f}".format)
#print(avg_total_purchase_price_df) -> correct avg total per person calc, now add to summary stats table
age_summary_df = age_summary_df.merge(avg_total_purchase_price_df, on="Age")
update_cols = list(age_summary_df.columns)
update_cols[-1] = "Avg Total Purchase Per Person"
age_summary_df.columns = update_cols

print("The summary stats table for age-related purchasing analysis is below:")
age_summary_df
The summary stats table for age-related purchasing analysis is below:
Out[9]:
Purchase Count Average Purchase Price Total Purchase Price Avg Total Purchase Per Person
Age
<10 23 $3.35 $77.13 $4.54
10-14 28 $2.96 $82.78 $3.76
15-19 136 $3.04 $412.89 $3.86
20-24 365 $3.05 $1114.06 $4.32
25-29 101 $2.90 $293.00 $3.81
30-34 73 $2.93 $214.00 $4.12
35-39 41 $3.60 $147.67 $4.76
40+ 13 $2.94 $38.24 $3.19

Top Spenders

  • Run basic calculations to obtain the results in the table below
  • Create a summary data frame to hold the results
  • Sort the total purchase value column in descending order
  • Optional: give the displayed data cleaner formatting
  • Display a preview of the summary data frame
In [10]:
#group data by SN's
top_spenders_df = pd.DataFrame(purchase_data.groupby("SN").count())
#remove extra columns
top_spenders_df = top_spenders_df.drop([top_spenders_df.columns[a] for a in range(1, len(top_spenders_df.columns))], axis=1)
top_spenders_df.columns = ["Purchase Count"]

#get the average purchase price in a DataFrame
avg_spent_df = pd.DataFrame(purchase_data.groupby("SN").mean())
avg_spent_df = avg_spent_df.loc[:, "Price"].map("${:.2f}".format)
#add into top_spenders_df
top_spenders_df = top_spenders_df.merge(avg_spent_df, on="SN")
update_cols = list(top_spenders_df.columns)
update_cols[-1] = "Average Purchase Price"
top_spenders_df.columns = update_cols

#get the total purchase price in a DataFrame
total_spent_df = pd.DataFrame(purchase_data.groupby("SN").sum())
#remove other columns, just leaving the price totals
total_spent_df = total_spent_df.loc[:, "Price"]
#add into top_spenders_df
top_spenders_df = top_spenders_df.merge(total_spent_df, on="SN")
update_cols = list(top_spenders_df.columns)
update_cols[-1] = "Total Purchase Price"
top_spenders_df.columns = update_cols
#sort values based on "Total Purchase Price"
top_spenders_df = top_spenders_df.sort_values(by="Total Purchase Price", ascending=False)
#can now update formatting to show dollar sign, otherwise the sorting is wrong because the 
#"Total Purchase Price" column is sorting on strings, not numeric values!
top_spenders_df["Total Purchase Price"] = top_spenders_df["Total Purchase Price"].map("${:.2f}".format)
top_spenders_df.head(5)
Out[10]:
Purchase Count Average Purchase Price Total Purchase Price
SN
Lisosia93 5 $3.79 $18.96
Idastidru52 4 $3.86 $15.45
Chamjask73 3 $4.61 $13.83
Iral74 4 $3.40 $13.62
Iskadarya95 3 $4.37 $13.10
  • Retrieve the Item ID, Item Name, and Item Price columns
  • Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value
  • Create a summary data frame to hold the results
  • Sort the purchase count column in descending order
  • Optional: give the displayed data cleaner formatting
  • Display a preview of the summary data frame
In [11]:
#first, group items by both ID and Name
pop_items = purchase_data[["Item ID", "Item Name", "Price"]]
group_by_item = pop_items.groupby(["Item ID", "Item Name"])

#use the average price as the item price
pop_items = pd.DataFrame(group_by_item.mean())
pop_items.columns = ["Item Price (Average)"]

#get count of items purchased by item name and ID
item_purchase_count = group_by_item.count()
pop_items["Purchase Count"] = item_purchase_count

#get count of items purchased by item name and ID
item_purchase_total = group_by_item.sum()
pop_items["Purchase Amount"] = item_purchase_total

#sort the summary dataframe by item purchase count
pop_items = pop_items.sort_values(by="Purchase Count", ascending=False)

#format pop_items df so table looks better
formatted_pop_items = pop_items.copy()
formatted_pop_items["Item Price (Average)"] = pop_items["Item Price (Average)"].map("${:.2f}".format)
formatted_pop_items["Purchase Amount"] = pop_items["Purchase Amount"].map("${:.2f}".format)
formatted_pop_items.head()
Out[11]:
Item Price (Average) Purchase Count Purchase Amount
Item ID Item Name
92 Final Critic $4.61 13 $59.99
178 Oathbreaker, Last Hope of the Breaking Storm $4.23 12 $50.76
145 Fiery Glass Crusader $4.58 9 $41.22
132 Persuasion $3.22 9 $28.99
108 Extraction, Quickblade Of Trembling Hands $3.53 9 $31.77

Most Profitable Items

  • Sort the above table by total purchase value in descending order
  • Optional: give the displayed data cleaner formatting
  • Display a preview of the data frame
In [12]:
#use original items table to sort by numeric values instead of strings
money_items = pop_items.copy()
money_items = money_items.sort_values(by="Purchase Amount", ascending=False)

#format table to look cleaner
formatted_money_items = money_items.copy()
formatted_money_items["Item Price (Average)"] = money_items["Item Price (Average)"].map("${:.2f}".format)
formatted_money_items["Purchase Amount"] = money_items["Purchase Amount"].map("${:.2f}".format)

formatted_money_items.head()
Out[12]:
Item Price (Average) Purchase Count Purchase Amount
Item ID Item Name
92 Final Critic $4.61 13 $59.99
178 Oathbreaker, Last Hope of the Breaking Storm $4.23 12 $50.76
82 Nirvana $4.90 9 $44.10
145 Fiery Glass Crusader $4.58 9 $41.22
103 Singed Scalpel $4.35 8 $34.80

Conclusions

  • First, we notice that the game is heavily dominated by male players, with about 84% of the players identifying as male.
  • Similarly, we find that about 86% of the players are between the ages of 15 and 34, which is a large majority.
  • Finally, the purchasing data suggest that approximately 83% of the revenue from in-game item purchases result from male players purchasing items, as opposed to female or "other" genders.

These trends in the purchasing data suggest that most of the game marketing and features should attract and support young male players (between the ages of 15 and 34) to grow the player base and drive revenue growth, since that demographic spends the most money in Heroes of Pymoli and has the largest market segment of the current player base. There are only 576 players of Heroes of Pymoli, so promoting the game to this demographic could increase the number of players, with an additional 1.35 items purchased on average per male player, and a \$4.07 average increase in revenue per additional male player. So if the marketing was able to result in attracting 1000 additional male players to Heroes of Pymoli, the revenue from in-game purchases would be expected to increase by about \\$4070.

In [ ]:
 
In [ ]: