Group 4

Hai Huang, Kuai Yu, Jiahua Ma

How to become a top PUBG player

PUBG

PlayerUnknown's Battlegrounds (PUBG) is an online multiplayer battle royale game.

The last player or team left alive wins the match.

Our goal

Our goal is to analyze how the top players differ from the rest of the players, and to use those differences to help players become top players.

The Data

The data comes from a Kaggle dataset.

webpage: https://www.kaggle.com/skihikingkevin/pubg-match-deaths

Design

We use Jupyter Notebook and Python 3.7 for most of the code.

The visualization packages we use are plotly, seaborn and matplotlib.

Data concatenation

Each CSV file is around 2 GB. The agg_match files contain per-player match information.

The kill_match files contain information about each death: the placement and position of the killer and the victim, and what the victim was killed by.

In [2]:
import pandas as pd

#import and concat data for agg
agg_final = pd.concat([pd.read_csv('agg_match_stats_0.csv'),
                      pd.read_csv('agg_match_stats_1.csv')], ignore_index = True)
In [3]:
#import and concat data for deaths
kill_final = pd.concat([pd.read_csv('kill_match_stats_final_0.csv'),
                      pd.read_csv('kill_match_stats_final_1.csv')], ignore_index = True)
In [4]:
#Drop nan data from both datasets
agg_final = agg_final.dropna()
kill_final = kill_final.dropna()
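
Since each file is around 2 GB, reading it in one call can strain memory. The following is a minimal sketch of one alternative, assuming that only a few columns are needed: read the file in chunks and drop incomplete rows chunk by chunk. The helper name `load_csv_in_chunks` and the tiny in-memory sample are illustrative assumptions, not part of the notebook. Note that with `usecols` set, missing values are only detected in the selected columns, so the result can differ from calling `dropna()` on the full table.

```python
import io
import pandas as pd

def load_csv_in_chunks(source, usecols=None, chunksize=1_000_000):
    """Read a large CSV piece by piece and drop incomplete rows per chunk."""
    chunks = []
    for chunk in pd.read_csv(source, usecols=usecols, chunksize=chunksize):
        chunks.append(chunk.dropna())
    return pd.concat(chunks, ignore_index=True)

# Tiny in-memory stand-in for one of the real ~2 GB files (hypothetical rows).
sample_csv = io.StringIO(
    "match_id,party_size,player_name,team_placement\n"
    "m1,1,alice,1\n"
    "m1,1,,2\n"
    "m2,2,bob,5\n"
)
agg_sample = load_csv_in_chunks(
    sample_csv,
    usecols=["match_id", "party_size", "player_name", "team_placement"],
    chunksize=2,
)
print(agg_sample)
```

With the real files, `source` would be a path such as 'agg_match_stats_0.csv', and the two results could be combined with `pd.concat` as in the cells above.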

Sneak peek at the two datasets

In [5]:
agg_final.head()
Out[5]:
                       date  game_size                                           match_id  match_mode  party_size  player_assists  player_dbno  player_dist_ride  player_dist_walk  player_dmg  player_kills  player_name  player_survive_time  team_id  team_placement
0  2017-11-26T20:59:40+0000         37  2U4GBNA0YmnNZYkzjkfgN4ev-hXSrak_BSey_YEG6kIuDG...         tpp           2               0            1        2870.72400       1784.847780         117             1     SnuffIes             1106.320        4              18
1  2017-11-26T20:59:40+0000         37  2U4GBNA0YmnNZYkzjkfgN4ev-hXSrak_BSey_YEG6kIuDG...         tpp           2               0            1        2938.40723       1756.079710         127             1       Ozon3r             1106.315        4              18
2  2017-11-26T20:59:40+0000         37  2U4GBNA0YmnNZYkzjkfgN4ev-hXSrak_BSey_YEG6kIuDG...         tpp           2               0            0           0.00000        224.157562          67             0       bovize              235.558        5              33
3  2017-11-26T20:59:40+0000         37  2U4GBNA0YmnNZYkzjkfgN4ev-hXSrak_BSey_YEG6kIuDG...         tpp           2               0            0           0.00000         92.935150           0             0      sbahn87              197.553        5              33
4  2017-11-26T20:59:40+0000         37  2U4GBNA0YmnNZYkzjkfgN4ev-hXSrak_BSey_YEG6kIuDG...         tpp           2               0            0        2619.07739       2510.447000         175             2    GeminiZZZ             1537.495       14              11
In [6]:
kill_final.head()
Out[6]:
      killed_by       killer_name  killer_placement  killer_position_x  killer_position_y      map                                           match_id  time      victim_name  victim_placement  victim_position_x  victim_position_y
0       Grenade   KrazyPortuguese               5.0          657725.10           146275.2  MIRAMAR  2U4GBNA0YmnLSqvEycnTjo-KT000vfUnhSA2vfVhVPe1QB...   823  KrazyPortuguese               5.0          657725.10           146275.2
1        SCAR-L  nide2Bxiaojiejie              31.0           93091.37           722236.4  MIRAMAR  2U4GBNA0YmnLSqvEycnTjo-KT000vfUnhSA2vfVhVPe1QB...   194      X3evolution              33.0           92238.68           723375.1
2          S686          Ascholes              43.0          366921.40           421623.9  MIRAMAR  2U4GBNA0YmnLSqvEycnTjo-KT000vfUnhSA2vfVhVPe1QB...   103          CtrlZee              46.0          367304.50           421216.1
3  Down and Out        Weirdo7777               9.0          472014.20           313274.8  MIRAMAR  2U4GBNA0YmnLSqvEycnTjo-KT000vfUnhSA2vfVhVPe1QB...  1018        BlackDpre              13.0          476645.90           316758.4
4          M416         Solayuki1               9.0          473357.80           318340.5  MIRAMAR  2U4GBNA0YmnLSqvEycnTjo-KT000vfUnhSA2vfVhVPe1QB...  1018            Vjolt              13.0          473588.50           318418.8

Columns and shapes of both datasets

In [7]:
agg_final.columns
Out[7]:
Index(['date', 'game_size', 'match_id', 'match_mode', 'party_size',
       'player_assists', 'player_dbno', 'player_dist_ride', 'player_dist_walk',
       'player_dmg', 'player_kills', 'player_name', 'player_survive_time',
       'team_id', 'team_placement'],
      dtype='object')
In [8]:
kill_final.columns
Out[8]:
Index(['killed_by', 'killer_name', 'killer_placement', 'killer_position_x',
       'killer_position_y', 'map', 'match_id', 'time', 'victim_name',
       'victim_placement', 'victim_position_x', 'victim_position_y'],
      dtype='object')
In [9]:
agg_final.shape
Out[9]:
(27653247, 15)
In [10]:
kill_final.shape
Out[10]:
(24217596, 12)

Data preprocessing

The match id is the only feature that appears in both datasets.

Each match id can appear up to 100 times in a dataset, because a match has at most 100 players.

We therefore group the data by match id and by the features we use to split the data, such as party size and map.
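
Because the match id is the shared key, the map recorded in the kill data can be attached to the per-player rows with a join. The sketch below illustrates this on toy data; the frames `agg` and `kills` and their values are made up for the example, and it assumes that the map is constant within a match.

```python
import pandas as pd

# Toy stand-ins for agg_final and kill_final (hypothetical values).
agg = pd.DataFrame({
    "match_id": ["m1", "m1", "m2"],
    "party_size": [1, 1, 2],
    "player_name": ["alice", "bob", "carol"],
})
kills = pd.DataFrame({
    "match_id": ["m1", "m1", "m2"],
    "map": ["ERANGEL", "ERANGEL", "MIRAMAR"],
    "victim_name": ["bob", "alice", "carol"],
})

# One row per match, assuming a match is played on a single map.
match_map = kills[["match_id", "map"]].drop_duplicates()

# Left join keeps every player row; validate raises if a match id
# maps to more than one row on the right-hand side.
agg_with_map = agg.merge(match_map, on="match_id", how="left",
                         validate="many_to_one")

# Number of distinct matches per map and party size.
print(agg_with_map.groupby(["map", "party_size"])["match_id"].nunique())
```

Compared with filtering by `isin`, a join keeps the map as a column, so later steps can group by it instead of building one subset per map.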

Subsets of data

Find the match ids for each party size.

Find the match ids for each map.

In [11]:
solo_matches = agg_final.loc[agg_final['party_size'] == 1,'match_id'].drop_duplicates()
duo_matches = agg_final.loc[agg_final['party_size'] == 2,'match_id'].drop_duplicates()
squad_matches = agg_final.loc[agg_final['party_size'] == 4,'match_id'].drop_duplicates()
In [12]:
er_matches = kill_final.loc[kill_final['map'] == 'ERANGEL','match_id'].drop_duplicates()
mr_matches = kill_final.loc[kill_final['map'] == 'MIRAMAR','match_id'].drop_duplicates()

Match the match ids across the two datasets and create subsets of the data.

This gives solo, duo and squad subsets for each of the two maps.

In [13]:
er_agg = agg_final[agg_final['match_id'].isin(er_matches.values)]
top_solo_era = er_agg[(er_agg['party_size'] == 1) & (er_agg['team_placement'] < 6)]
top_duo_era = er_agg[(er_agg['party_size'] == 2) & (er_agg['team_placement'] < 6)]
top_squad_era = er_agg[(er_agg['party_size'] == 4) & (er_agg['team_placement'] < 6)]

# rest = every placement outside the top 5 (using > 6 would drop 6th place)
rest_solo_era = er_agg[(er_agg['party_size'] == 1) & (er_agg['team_placement'] > 5)]
rest_duo_era = er_agg[(er_agg['party_size'] == 2) & (er_agg['team_placement'] > 5)]
rest_squad_era = er_agg[(er_agg['party_size'] == 4) & (er_agg['team_placement'] > 5)]
In [14]:
mr_agg = agg_final[agg_final['match_id'].isin(mr_matches.values)]
top_solo_mir = mr_agg[(mr_agg['party_size'] == 1) & (mr_agg['team_placement'] < 6)]
top_duo_mir = mr_agg[(mr_agg['party_size'] == 2) & (mr_agg['team_placement'] < 6)]
top_squad_mir = mr_agg[(mr_agg['party_size'] == 4) & (mr_agg['team_placement'] < 6)]

# rest = every placement outside the top 5 (using > 6 would drop 6th place)
rest_solo_mir = mr_agg[(mr_agg['party_size'] == 1) & (mr_agg['team_placement'] > 5)]
rest_duo_mir = mr_agg[(mr_agg['party_size'] == 2) & (mr_agg['team_placement'] > 5)]
rest_squad_mir = mr_agg[(mr_agg['party_size'] == 4) & (mr_agg['team_placement'] > 5)]
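
The filters above repeat the same pattern for every party size and map. As a possible simplification, the split could be wrapped in a small helper. This is only a sketch: `split_top_rest` and its `top_n` parameter are hypothetical names, and in this version the rest group is every row placed outside the top `top_n`, sixth place included.

```python
import pandas as pd

def split_top_rest(df, party_size, top_n=5):
    """Return (top, rest) rows for one party size, split on team_placement."""
    party = df[df["party_size"] == party_size]
    is_top = party["team_placement"] <= top_n
    return party[is_top], party[~is_top]

# Toy data standing in for er_agg or mr_agg (hypothetical values).
toy = pd.DataFrame({
    "party_size": [1, 1, 1, 2, 2],
    "team_placement": [1, 6, 30, 5, 7],
    "player_kills": [8, 2, 0, 4, 1],
})
top_solo, rest_solo = split_top_rest(toy, party_size=1)
print(len(top_solo), len(rest_solo))
```

Called as `split_top_rest(er_agg, 1)`, `split_top_rest(er_agg, 2)` and so on, one function would cover all twelve subsets.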

Number of solo, duo and squad matches on the two maps

In [15]:
print('Number of solo queue matches: %i' % len(solo_matches))
solo_deaths = kill_final[kill_final['match_id'].isin(solo_matches.values)]
deaths_solo_er = solo_deaths[solo_deaths['map'] == 'ERANGEL']
deaths_solo_mr = solo_deaths[solo_deaths['map'] == 'MIRAMAR']
print('  Number of Erangel solo matches: %i' % len(deaths_solo_er.groupby('match_id').first()))
print('  Number of Miramar solo matches: %i' % len(deaths_solo_mr.groupby('match_id').first()))
Number of solo queue matches: 62702
  Number of Erangel solo matches: 51548
  Number of Miramar solo matches: 10090
In [16]:
print('Number of duo queue matches: %i' % len(duo_matches))
duo_deaths = kill_final[kill_final['match_id'].isin(duo_matches.values)]
deaths_duo_er = duo_deaths[duo_deaths['map'] == 'ERANGEL']
deaths_duo_mr = duo_deaths[duo_deaths['map'] == 'MIRAMAR']
print('  Number of Erangel duo matches: %i' % len(deaths_duo_er.groupby('match_id').first()))
print('  Number of Miramar duo matches: %i' % len(deaths_duo_mr.groupby('match_id').first()))
Number of duo queue matches: 95727
  Number of Erangel duo matches: 76717
  Number of Miramar duo matches: 16569
In [17]:
print('Number of squad queue matches: %i' % len(squad_matches))
squad_deaths = kill_final[kill_final['match_id'].isin(squad_matches.values)]
deaths_squad_er = squad_deaths[squad_deaths['map'] == 'ERANGEL']
deaths_squad_mr = squad_deaths[squad_deaths['map'] == 'MIRAMAR']
print('  Number of Erangel squad matches: %i' % len(deaths_squad_er.groupby('match_id').first()))
print('  Number of Miramar squad matches: %i' % len(deaths_squad_mr.groupby('match_id').first()))
Number of squad queue matches: 141555
  Number of Erangel squad matches: 109757
  Number of Miramar squad matches: 28460

Stats of the top 5 players versus the rest of the players, by map and party size

In [32]:
layout = dict(title='Maps gamemode stats difference', showlegend=False,
              updatemenus=updatemenus)
figmenu = dict(data=menu_data, layout=layout)
py.iplot(figmenu, filename='update_button')
Out[32]:
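
The cells that build `menu_data` and `updatemenus` are not shown here. As a rough illustration of the kind of comparison such a chart could be based on, the sketch below averages a few stat columns for a top group and a rest group and puts them side by side. The function `compare_means`, the choice of columns and the toy values are assumptions made for the example, not the notebook's actual computation.

```python
import pandas as pd

STAT_COLUMNS = ["player_kills", "player_dmg",
                "player_dist_walk", "player_survive_time"]

def compare_means(top, rest, columns=STAT_COLUMNS):
    """Average each stat for top and rest players, side by side."""
    summary = pd.DataFrame({
        "top": top[columns].mean(),
        "rest": rest[columns].mean(),
    })
    # How many times larger the top average is than the rest average.
    summary["ratio"] = summary["top"] / summary["rest"]
    return summary

# Toy groups (hypothetical values) in place of e.g. top_solo_era / rest_solo_era.
toy_top = pd.DataFrame({
    "player_kills": [4, 6],
    "player_dmg": [400, 600],
    "player_dist_walk": [2000.0, 3000.0],
    "player_survive_time": [1800.0, 1900.0],
})
toy_rest = pd.DataFrame({
    "player_kills": [0, 2],
    "player_dmg": [50, 150],
    "player_dist_walk": [300.0, 700.0],
    "player_survive_time": [200.0, 600.0],
})
summary = compare_means(toy_top, toy_rest)
print(summary)
```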

Top 20 weapons used by top players versus the rest of the players

In [34]:
waysOfDeath = kill_final['killed_by'].unique()
nwaysOfDeath = kill_final['killed_by'].nunique()
rank = 20
nwaysOfDeath
Out[34]:
56
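
The cells that build the donut charts are not shown here. One way the weapon shares behind such charts might be computed is sketched below, under the assumption that kills are split by `killer_placement` into top 5 and rest; `top_weapon_share` and the toy rows are illustrative only.

```python
import pandas as pd

def top_weapon_share(kills, rank=20):
    """Share of kills per weapon, keeping only the `rank` most common ones."""
    share = kills["killed_by"].value_counts(normalize=True)
    return share.head(rank)

# Toy kill rows (hypothetical values) standing in for kill_final.
toy_kills = pd.DataFrame({
    "killed_by": ["M416", "M416", "SCAR-L", "Grenade", "M416", "S686"],
    "killer_placement": [1.0, 3.0, 2.0, 40.0, 55.0, 60.0],
})
top_kills = toy_kills[toy_kills["killer_placement"] <= 5]
rest_kills = toy_kills[toy_kills["killer_placement"] > 5]

top_share = top_weapon_share(top_kills, rank=2)
rest_share = top_weapon_share(rest_kills, rank=2)
print(top_share)
print(rest_share)
```

The shares are fractions of all kills in the group, so after truncating to the top `rank` weapons they may sum to less than 1.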
In [61]:
py.iplot(fig1, filename='donut6')
Out[61]:
In [63]:
py.iplot(fig2, filename='donut1')
Out[63]:
In [65]:
py.iplot(fig3, filename='donut2')
Out[65]:
In [67]:
py.iplot(fig4, filename='donut3')
Out[67]:
In [69]:
py.iplot(fig5, filename='donut4')
Out[69]:
In [71]:
py.iplot(fig6, filename='donut5')
Out[71]:

Heat map of where the last circle tends to be

We use the death position of the second-place player as an estimate of where the last circle was.

We only use solo matches here, because in duo and squad matches a second-place team can have several death positions.

The images are too large to fit properly on the screen, so we uploaded them:

Original Erangel map: http://personal.psu.edu/hkh5094/DS330/Finalproject/erangel.jpg

Original Miramar map: http://personal.psu.edu/hkh5094/DS330/Finalproject/miramar.jpg

Erangel heat map: http://personal.psu.edu/hkh5094/DS330/Finalproject/era.png

Miramar heat map: http://personal.psu.edu/hkh5094/DS330/Finalproject/mir.png
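
The plotting code for the heat maps is not included here. A minimal sketch of the underlying idea, binning the death positions of second-place players into a 2D grid, might look like the following. The coordinate range `MAP_EXTENT` is an assumption and should be checked against the real data, and `second_place_density` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

MAP_EXTENT = 800_000  # assumed coordinate range on both axes; verify on real data

def second_place_density(deaths, bins=100):
    """2D histogram of death positions for victims who placed second."""
    second = deaths[deaths["victim_placement"] == 2]
    counts, x_edges, y_edges = np.histogram2d(
        second["victim_position_x"].to_numpy(),
        second["victim_position_y"].to_numpy(),
        bins=bins,
        range=[[0, MAP_EXTENT], [0, MAP_EXTENT]],
    )
    return counts, x_edges, y_edges

# Toy solo deaths (hypothetical values) standing in for deaths_solo_er.
toy_deaths = pd.DataFrame({
    "victim_placement": [2.0, 2.0, 2.0, 17.0],
    "victim_position_x": [100_000.0, 110_000.0, 700_000.0, 400_000.0],
    "victim_position_y": [100_000.0, 120_000.0, 700_000.0, 400_000.0],
})
counts, x_edges, y_edges = second_place_density(toy_deaths, bins=4)
print(counts)
```

The resulting grid could then be drawn semi-transparently over the map image, for example with matplotlib's `imshow`; the grid may need to be transposed or flipped to match the image's axis orientation.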