Web Scraping in R: A Hands-On Guide to NBA Player Data

Web scraping is a valuable technique for extracting data from websites, especially when the data isn't readily available in a structured format. Analysts often need to gather information from several online sources to conduct a thorough analysis.

Businesses often use web scraping to gain competitive insights from data that others overlook. When executed effectively, it lets us retrieve information from a site and convert it into a format suitable for analysis or reporting.

Because the extraction is automated, web scraping removes the need for manual data collection.

Some practical applications of web scraping include collecting product reviews, tracking real-time prices for travel accommodations, or aggregating job listings.

There are numerous libraries in popular programming languages designed for parsing HTML content, such as Beautiful Soup in Python. However, in this tutorial, we will utilize rvest, an R package specifically created for harvesting web data. We will focus on how to scrape details about current NBA players from the ESPN website.
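As a rough first look at rvest, the sketch below parses a small inline HTML snippet rather than a live site, so it runs offline. The markup and the `.player` class are made up for illustration; only `minimal_html()`, `html_elements()`, and `html_text2()` are rvest functions.

```r
library(rvest)

# A tiny inline page stands in for a real website (made-up markup).
page <- minimal_html('
  <h1>Roster</h1>
  <ul>
    <li class="player">Player A</li>
    <li class="player">Player B</li>
  </ul>
')

# Select every element matching a CSS selector, then extract its text.
players <- page %>%
  html_elements(".player") %>%
  html_text2()

print(players)
```

Against a real page, `read_html(url)` would take the place of `minimal_html()`; the selection and text-extraction steps stay the same.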

Player Profile

When scraping data, it's crucial to ensure consistency across the web pages you plan to target. Automating the process across multiple pages only works if the pages share a recognizable pattern in how their data is structured.

For instance, let's examine the roster pages for the Boston Celtics and the New York Knicks.

Both teams present their rosters in a tabular format, listing each player's name, position, age, height, weight, college, and salary. The URLs also follow a common structure in which only the team identifier changes: ../bos/boston-celtics for the Celtics and ../ny/new-york-knicks for the Knicks. This will be useful as we proceed.
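One way to exploit this pattern is to build each roster URL from the two pieces that change. The sketch below is only an illustration: `base_url` is a placeholder that keeps the elided ".." prefix from the examples above, and the `teams` data frame is a hypothetical container for the abbreviation and slug of each team.

```r
# Placeholder prefix: the text above elides the leading part of the URL as "..",
# so replace this with the real roster URL prefix before scraping.
base_url <- ".."

# Abbreviation and slug pairs taken from the two examples in the text.
teams <- data.frame(
  abbr = c("bos", "ny"),
  slug = c("boston-celtics", "new-york-knicks"),
  stringsAsFactors = FALSE
)

# Build one roster URL per team by filling in the parts that vary.
teams$url <- paste(base_url, teams$abbr, teams$slug, sep = "/")

print(teams$url)
```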

Initially, we will concentrate on scraping data for just one team. Once we successfully extract this data, we can easily implement a loop to automate the process for all teams.

To effectively scrape data, it's beneficial to have a basic understanding of HTML and CSS, the foundational technologies for web development. HTML structures the content, while CSS enhances the visual layout of a page.

We will streamline this process using a Chrome extension called SelectorGadget, which allows us to effortlessly generate CSS selectors by highlighting the desired elements on a webpage.

In this instance, we will select the elements corresponding to each player's details for the Boston Celtics and store them in variables. Subsequently, we will compile this information into a data frame.
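A minimal sketch of that step might look like the following. It assumes an inline table with made-up class names (`.name`, `.pos`, `.age`) and placeholder players; on the real roster page, the selectors would be whatever SelectorGadget reports. The helper `scrape_text()` is hypothetical and not part of rvest.

```r
library(rvest)

# Inline HTML standing in for a roster page; class names and values
# are invented for illustration and are not the site's real markup.
roster_page <- minimal_html('
  <table>
    <tr><td class="name">Player A</td><td class="pos">SF</td><td class="age">24</td></tr>
    <tr><td class="name">Player B</td><td class="pos">PG</td><td class="age">31</td></tr>
  </table>
')

# Hypothetical helper: text of every element matching a CSS selector.
scrape_text <- function(page, css) {
  page %>% html_elements(css) %>% html_text2()
}

# Store each column in its own variable.
name <- scrape_text(roster_page, ".name")
position <- scrape_text(roster_page, ".pos")
age <- as.integer(scrape_text(roster_page, ".age"))

# Combine the variables into a single data frame, one row per player.
roster <- data.frame(name, position, age, stringsAsFactors = FALSE)
print(roster)
```

Scraping each column separately only lines up correctly when every selector returns exactly one value per player, so it is worth checking that the vectors have equal length before combining them.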

Player Regular Season Statistics

Let's enhance our scraping process further.

Each player’s name features a hyperlink that directs to a separate page detailing their performance during the latest NBA regular season.

For example, consider Jayson Tatum and Derrick Rose.

In this phase, we aim to merge this performance data with our original data frame.

To achieve this, we will create a function that retrieves the seasonal statistics for each player on our roster.
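The sketch below shows one possible shape for such a function, not the code from the original workbook. To keep it runnable offline, `read_page()` looks up made-up HTML in `fake_pages` instead of fetching a URL; the player keys, class names, and numbers are all invented for illustration.

```r
library(rvest)

# Stand-in pages keyed by hypothetical player URLs (made-up markup and values).
fake_pages <- list(
  "player-a" = '<table><tr><td class="pts">20.0</td><td class="reb">5.5</td></tr></table>',
  "player-b" = '<table><tr><td class="pts">10.5</td><td class="reb">3.0</td></tr></table>'
)

# Against a live site this would simply be rvest::read_html(url).
read_page <- function(url) minimal_html(fake_pages[[url]])

# Return a one-row data frame of season statistics for one player page.
get_player_stats <- function(url) {
  page <- read_page(url)
  stat <- function(css) {
    as.numeric(page %>% html_element(css) %>% html_text2())
  }
  data.frame(url = url, points = stat(".pts"), rebounds = stat(".reb"),
             stringsAsFactors = FALSE)
}

# Roster with the link to each player's page.
roster <- data.frame(
  name = c("Player A", "Player B"),
  url = c("player-a", "player-b"),
  stringsAsFactors = FALSE
)

# Apply the function to every player, stack the rows, and join on the URL.
stats <- do.call(rbind, lapply(roster$url, get_player_stats))
roster_stats <- merge(roster, stats, by = "url")
print(roster_stats)
```

On a real roster page, the player links could be collected with `html_attr("href")` on the name elements and then passed to the function in the same way.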

Automate for All NBA Teams

Having grasped the process for the Boston Celtics, we can extend it to encompass all 30 NBA teams.

We simply need to adjust the URL for each team using a loop. The entire operation took my computer approximately 9 minutes. Quite efficient!
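The loop itself can be sketched as below. `scrape_team()` is a hypothetical stand-in for the single-team scraping steps and returns a fixed row so the example runs without network access; the URLs reuse the elided form from earlier in the article.

```r
# Hypothetical stand-in for the single-team scraper; it returns a fixed
# one-row data frame instead of requesting the page.
scrape_team <- function(url) {
  data.frame(url = url, name = "Player A", stringsAsFactors = FALSE)
}

team_urls <- c("../bos/boston-celtics", "../ny/new-york-knicks")

# Pre-allocate a list and fill one slot per team.
results <- vector("list", length(team_urls))
for (i in seq_along(team_urls)) {
  results[[i]] <- scrape_team(team_urls[i])
  Sys.sleep(0.1)  # brief pause between requests to go easy on the server
}

# Stack the per-team data frames into one.
all_players <- do.call(rbind, results)
print(all_players)
```

Collecting results in a list and binding once at the end avoids repeatedly growing a data frame inside the loop.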

After some additional data cleaning (converting player heights to centimeters and weights to kilograms, adjusting data types, and renaming columns), we arrive at the final data frame.
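The unit conversions can be sketched as follows. This assumes heights arrive as text like 6' 8" and weights like "210 lbs", which may not match the scraped format exactly; `height_to_cm()` and `weight_to_kg()` are hypothetical helper names.

```r
# Convert a height written as feet and inches (e.g. 6' 8") to centimetres.
height_to_cm <- function(height) {
  # Pull out the numbers in order: feet first, then inches.
  parts <- regmatches(height, gregexpr("[0-9]+", height))
  vapply(parts, function(p) {
    feet <- as.numeric(p[1])
    inches <- as.numeric(p[2])
    round((feet * 12 + inches) * 2.54, 1)
  }, numeric(1))
}

# Convert a weight written in pounds (e.g. "210 lbs") to kilograms.
weight_to_kg <- function(weight) {
  pounds <- as.numeric(gsub("[^0-9.]", "", weight))
  round(pounds * 0.45359237, 1)
}

heights_cm <- height_to_cm(c("6' 8\"", "6' 3\""))
weights_kg <- weight_to_kg(c("210 lbs", "200 lbs"))
print(heights_cm)
print(weights_kg)
```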

Bonus: Exploratory Data Analysis

This segment serves more for my amusement than as a web scraping tutorial. I’ve employed basic data visualizations to derive insights from the compiled data on current NBA players.

While the analysis may not be groundbreaking, I have included my observations in the captions accompanying each chart.

Thank you for reading! I hope you gained valuable insights into the fundamentals of web scraping with R. I encourage you to explore the workbook associated with this exercise on my GitHub, which includes all the code utilized throughout the project.

