2016 World Cup of Hockey NHL Team Captains

Today we’ll scrape some data from Wikipedia and use Bokeh to make a stacked bar chart showing the number of NHL captains and alternate captains on each team in the 2016 World Cup of Hockey. This was the third edition of the tournament, after 1996 and 2004, and it was held in Toronto.

As usual, the complete work can be seen in my IPython notebook.


Getting the data

Using the Python libraries requests and Beautiful Soup, we can fetch and parse the page as shown below. We first get the names of the current NHL captains.

import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_current_NHL_captains_and_alternate_captains'
page = requests.get(url)
print('Got %s type object of url using the requests library' % str(type(page)))

>> Got <class 'requests.models.Response'> type object of url using the requests library

soup = BeautifulSoup(page.content, 'html.parser')
print('Fed %s type object into BeautifulSoup to create a %s type object' % (str(type(page.content)), str(type(soup))))

>> Fed <class 'bytes'> type object into BeautifulSoup to create a <class 'bs4.BeautifulSoup'> type object

page_tables = soup.findAll('table')
print('Got %s type object of length %d' % (str(type(page_tables)), len(page_tables)))

>> Got <class 'bs4.element.ResultSet'> type object of length 11

Each table on the page is its own object, and page_tables can be iterated over. For example:

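for table in page_tables:
    # Assumed loop body (the original one is missing): print each table's caption, if any
    print(table.caption.get_text(strip=True) if table.caption else None)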

>> Position abbreviations
>> List of current NHL Captains
>> List of current NHL Alternate Captains
>> None
>> None
>> …etc

We are clearly interested in the 2nd and 3rd items in page_tables. There are many ways to extract the data at this point. We’ll iterate through the cells of the table and use i % 3 to pick out only one of its 3 columns.

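# Getting the captains
C_players = []
for i, n in enumerate(page_tables[1].findAll('td')):
    if i % 3 == 0:
        try:
            # Assumed reconstruction: take the player name from the cell's first link
            C_players.append(n.find('a').get_text())
        except AttributeError:
            # N/A entry
            print('Skipping entry:', n.contents)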
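To see the column-selection idea on its own, here is a minimal sketch that uses a plain list in place of the td cells. The cell values are placeholders, and the assumption that the wanted column is the first of the three is only for illustration; the real table layout may differ.

```python
# Hypothetical flattened table cells: three cells per row (layout assumed)
cells = [
    'Player A', 'Team 1', 'C',
    'Player B', 'Team 2', 'C',
    'Player C', 'Team 3', 'C',
]

# Keep every third cell starting at index 0, the same test as `i % 3 == 0`
first_column = [cell for i, cell in enumerate(cells) if i % 3 == 0]

# Slicing with a step of 3 picks out the same cells more compactly
assert first_column == cells[0::3]
print(first_column)
```

Changing the remainder that i % 3 is compared against (0, 1 or 2) selects a different column.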

We can do something similar to get the alternate captains.

The World Cup rosters can be acquired in a similar way, as seen in my IPython notebook linked at the top of this post.


Plotting the results

At this point we have the following data in memory:

  • players – a list containing lists of players on each team
  • teams – the World Cup team names
  • C_players – list of NHL team captains
  • A_players – list of NHL team alternate captains

We could get the total number of captains playing in the tournament as follows:

# Flatten player list
all_players = [p for p_list in players for p in p_list]
# Initialize counters
N_C, N_A = 0, 0
for player in all_players:
    if player in C_players:
        N_C += 1
    elif player in A_players:
        N_A += 1
print('%d captains and %d alternates ' % (N_C, N_A))

>> 18 captains and 25 alternates

This turns out to be 69% of the NHL team captains and 42% of the alternates. Overall, about half of all NHL captains and alternates are playing in this tournament.
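The percentages are just the counts divided by the lengths of C_players and A_players. Those lengths are not printed in this post, so the totals in the sketch below (26 and 60) are assumptions, chosen only because they reproduce the quoted figures.

```python
def percent(part, total):
    """Return part/total as a whole-number percentage."""
    return round(100 * part / total)

N_C, N_A = 18, 25          # counts printed above
total_C, total_A = 26, 60  # ASSUMED values of len(C_players) and len(A_players)

print('%d%% of captains' % percent(N_C, total_C))
print('%d%% of alternates' % percent(N_A, total_A))
print('%d%% combined' % percent(N_C + N_A, total_C + total_A))
```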

Building the above counter into a function, we can apply it to each list of players. Using the pandas library, we can then build the following dataframe:
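As a rough sketch of that step (not the notebook's actual code), the counter can be turned into a function that returns the matching names for one roster. The function name split_captains is hypothetical, the rosters are cut down to a few names, and 'Player X' and 'Player Y' are placeholders. The column names follow the ones used later in this post.

```python
def split_captains(team_players, C_players, A_players):
    """Return (captains, alternates) found on one team's roster."""
    captains = [p for p in team_players if p in C_players]
    # Mirror the elif above: a player is counted as a captain first
    alternates = [p for p in team_players
                  if p in A_players and p not in C_players]
    return captains, alternates

# Toy inputs; 'Player X' and 'Player Y' are placeholders, not real roster entries
teams = ['Team Europe', 'Czech Republic']
players = [
    ['Zdeno Chara', 'Anze Kopitar', 'Roman Josi', 'Mark Streit', 'Player X'],
    ['Martin Hanzal', 'Tomas Plekanec', 'Player Y'],
]
C_players = ['Zdeno Chara', 'Anze Kopitar']
A_players = ['Roman Josi', 'Mark Streit', 'Martin Hanzal', 'Tomas Plekanec']

# One row per team, ready to be handed to pandas
rows = []
for team, roster in zip(teams, players):
    C, A = split_captains(roster, C_players, A_players)
    rows.append({
        'Team': team,
        'Number of Captains': len(C),
        'Number of Alternate Captains': len(A),
        'C': ', '.join(C),
        'A': ', '.join(A),
    })

for row in rows:
    print(row)
```

Passing rows to pandas.DataFrame(rows) would then give a dataframe with one row per team and these columns.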



With this dataframe (named df), we can use Bokeh to create the stacked bar plot as follows [1]:

# Import libraries and setup for ipython notebook display
# Note: bokeh.charts was later moved out of Bokeh into the separate bkcharts package
from bokeh.charts import Bar
from bokeh.charts.attributes import color, cat
from bokeh.charts.operations import blend
from bokeh.io import output_notebook, show
output_notebook()

# Make the plot
bar = Bar(df,
          values=blend('Number of Captains', 'Number of Alternate Captains',
                       name='Number of Captains', labels_name='caps'),
          stack=cat(columns='caps', sort=False),
          label=cat(columns='Team', sort=False),
          color=color(columns='caps', palette=['OrangeRed', 'Orange'], sort=False),
          title='2016 World Cup of Hockey NHL Captains')
# Any further keyword arguments from the original call are not shown here
show(bar)


This method is good for creating charts where the data is ordered in the same way as in the dataframe. In this case we ordered df by the total number of captains.
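The ordering step can be sketched without pandas. The example below sorts a few (team, captains, alternates) tuples by their combined total, largest first; the counts come from the per-team list in this post, and the tuple layout is an assumption made for the example.

```python
# (team, number of captains, number of alternates), deliberately out of order
counts = [
    ('Czech Republic', 0, 2),
    ('United States', 4, 6),
    ('Team Europe', 2, 2),
]

# Sort by captains + alternates, largest total first
ordered = sorted(counts, key=lambda row: row[1] + row[2], reverse=True)

for team, n_c, n_a in ordered:
    print(team, n_c + n_a)
```

With a dataframe, a similar result could presumably be reached by adding a total column and calling sort_values on it before plotting.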

Most of my time on this little project was spent trying to get some of Bokeh’s interactive plot features working. In particular, I wanted to use the hover tool to show the names of the players in each bar. Unfortunately, this is not easy to do with the current version of Bokeh, and I could not get it working. For more details, check out my Stack Overflow question about the issue.

Here is the list of captains / alternates on each team:

for T, C, A in zip(df.Team, df.C, df.A):
    print(T, '\nC -', C, '\nA -', A, '\n')

Canada
C - Alex Pietrangelo, Sidney Crosby, Ryan Getzlaf, Claude Giroux, Steven Stamkos, John Tavares, Jonathan Toews
A - Drew Doughty, Shea Weber, Patrice Bergeron, Logan Couture, Ryan O’Reilly, Corey Perry, Joe Thornton

United States
C - Ryan McDonagh, Max Pacioretty, Joe Pavelski, Blake Wheeler
A - Dustin Byfuglien, Ryan Suter, Brandon Dubinsky, Ryan Kesler, Zach Parise, Derek Stepan

Sweden
C - Erik Karlsson, Gabriel Landeskog, Henrik Sedin
A - Oliver Ekman-Larsson, Nicklas Backstrom, Daniel Sedin

Team Europe
C - Zdeno Chara, Anze Kopitar
A - Roman Josi, Mark Streit

Russia
C - Alexander Ovechkin
A - Andrei Markov, Evgeni Malkin

Czech Republic
C -
A - Martin Hanzal, Tomas Plekanec

Finland
C - Mikko Koivu
A - Jussi Jokinen

Team North America
C -
A - Ryan Nugent-Hopkins, Mark Scheifele


Thanks for reading! If you would like to discuss anything or have questions/corrections then please write a comment, email me at agalea91@gmail.com, or tweet me @agalea91


[1] – I used this example from the Bokeh docs as a guide to create my bar plot.
