Creating Stunning Ridgeline Plots Using JoyPy in Python
Chapter 1: Introduction to Ridgeline Plots
Ridgeline plots serve as a compelling visualization method for illustrating repetitive signals. By arranging each frequency period along distinct lines and stacking them, these plots create a pseudo-3D effect that resembles a range of mountain ridges. This visual depth is enhanced by "fading" the periods, mimicking atmospheric perspective.
Example ridgeline plot showcasing atmospheric perspective (by author)
In a typical ridgeline plot, the vertical axis indicates the density of the variable, while the horizontal axis reflects its range of values. The alignment of density plots is optimized to reduce overlap and enhance the clarity of the data distribution. This plotting technique can effectively convey intricate data distributions in an aesthetically pleasing and comprehensible manner.
Ridgeline plots are particularly suited for visualizing weather data due to its repetitive nature. In this Quick Success Data Science project, we will utilize Python and a ridgeline plot to investigate whether droughts in East-Central Texas are linked to La Niña weather events.
What is La Niña?
La Niña, which translates to "the little girl" in Spanish, is a climate pattern that occurs in the Pacific Ocean at irregular intervals. It is the counterpart to the El Niño phenomenon.
During La Niña years, robust trade winds push warm surface water from South America towards Indonesia. As this warm water shifts westward, colder water from the ocean depths rises to take its place. Consequently, the eastern Pacific waters cool, leading to cooler air above, which pushes the jet stream northward and results in drier, hotter conditions across the southwestern United States.
Importing Necessary Libraries
The JoyPy library simplifies the creation of ridgeline plots by leveraging kernel density estimation (KDE) plots from the pandas library. KDE is a technique for visualizing data distributions, similar to a histogram. If you frequently use pandas and matplotlib, you’ll find JoyPy quite straightforward. To install JoyPy, you can use:
pip install joypy
or
conda install -c conda-forge joypy
In addition to JoyPy, you’ll need Python, NumPy, matplotlib, and pandas.
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from joypy import joyplot
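Since each ridge in the plot is a KDE curve, it can help to see what a KDE actually computes. The sketch below is a minimal NumPy-only Gaussian KDE run on synthetic data. The function name `gaussian_kde_1d`, the fixed bandwidth of 0.4, and the random sample are illustrative assumptions; JoyPy handles this step internally with its own settings.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each point in `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance between every grid point and every sample.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # One Gaussian "bump" per sample, evaluated on the grid.
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the bumps and rescale so the curve integrates to about 1.
    return kernels.mean(axis=1) / bandwidth


rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=200)  # synthetic data (assumption)
grid = np.linspace(-5, 5, 201)
density = gaussian_kde_1d(samples, grid, bandwidth=0.4)

# Approximate area under the curve with a simple Riemann sum.
area = float(np.sum(density) * (grid[1] - grid[0]))
```

A smaller bandwidth gives a bumpier curve and a larger one gives a smoother curve, which is the same trade-off as choosing bin widths in a histogram.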
Loading Rainfall Data
We'll analyze monthly rainfall data (in inches) from the Easterwood Field weather station, located near College Station in East-Central Texas, covering the period from 2000 to 2021. This data is available through the Southern Regional Climate Center Data Portal managed by Texas A&M University.
Loading and Cleaning the Data
The following code snippet reads a CSV file from a URL and stores it in a pandas DataFrame. It replaces trace values, marked by "T", with a small numerical value (0.01). Then, it converts all columns to numeric types and changes the "Year" column to integers.
df = pd.read_csv('http://bit.ly/3yo4ZRI')
df = df.replace('T', 0.01) # Handle trace rainfall amounts.
df = df.apply(pd.to_numeric, errors='coerce', axis=1)
df.Year = df.Year.astype(int)
df.head(3)
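To see what the cleaning steps do without downloading anything, here is a small sketch on an inline CSV. The rows, the column subset, and the "M" marker for a missing reading are made up for illustration and are not taken from the Easterwood Field file. Reading with `dtype=object` is also an assumption of this sketch, used so that the raw text cells behave the same way across pandas versions.

```python
import io

import pandas as pd

# Made-up rows that mimic the layout described above (not real station data).
csv_text = """Year,JAN,FEB,Yearly
2000,1.25,T,30.10
2001,T,M,41.75
"""

# Keep every cell as a raw Python object so 'T' can be swapped for a number.
toy = pd.read_csv(io.StringIO(csv_text), dtype=object)

toy = toy.replace('T', 0.01)                      # trace rainfall -> small value
toy = toy.apply(pd.to_numeric, errors='coerce')   # unparseable cells -> NaN
toy['Year'] = toy['Year'].astype(int)

# Count the cells that were silently coerced to NaN.
n_missing = int(toy.isna().sum().sum())
```

Because `errors='coerce'` turns anything it cannot parse into NaN without raising an error, counting the NaN cells afterward is a quick sanity check on the real data as well.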
Calculating Rainfall Anomaly
A rainfall anomaly indicates how a given month's total rainfall deviates from the mean. Here, we will calculate the mean for the period from 2000 to 2021. The following code utilizes pandas to transform the data for visualization.
col = df.loc[:, 'JAN':'DEC']
df_anomaly = pd.concat([df['Year'], (col - col.mean())], axis=1)
df_melted = pd.melt(df_anomaly,
                    id_vars=['Year'],
                    var_name='Month',
                    value_name='Anomaly')
df_melted.head(3)
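This code creates a new DataFrame, "col", consisting solely of the monthly columns from "df". It then builds a new DataFrame, "df_anomaly," by subtracting each month's 2000-2021 mean from that month's values and combining the result with the "Year" column. Finally, pd.melt() reshapes the data from wide to long format, so that each row of "df_melted" holds one year, one month, and that month's anomaly.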
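The same transformation is easier to follow on a tiny table. The sketch below uses three made-up years and only two months; the numbers are chosen for easy arithmetic and are not real rainfall values.

```python
import pandas as pd

# Made-up values for illustration only.
toy = pd.DataFrame({
    'Year': [2000, 2001, 2002],
    'JAN': [1.0, 2.0, 3.0],   # mean = 2.0
    'FEB': [4.0, 4.0, 7.0],   # mean = 5.0
})

months = toy.loc[:, 'JAN':'FEB']

# Subtract each column's mean from that column (pandas aligns on column labels).
anomaly = pd.concat([toy['Year'], months - months.mean()], axis=1)

# Wide -> long: one row per (Year, Month) pair.
melted = pd.melt(anomaly,
                 id_vars=['Year'],
                 var_name='Month',
                 value_name='Anomaly')
```

The three-row, two-month table becomes six rows, and the anomalies within each month sum to zero, as expected for deviations from a mean.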
Defining La Niña Years
Next, we need to categorize each year as either a La Niña event year or not, and mark years with below-average rainfall. This will help us analyze the connection between La Niña events and drought conditions in East-Central Texas.
We will compile a list of moderate to strong La Niña years manually, named "nina."
# Identify moderate-to-strong La Niña events:
nina = [2000, 2007, 2008, 2010, 2011, 2012, 2020, 2021]
df['event'] = np.where(df['Year'].isin(nina), 1, 0)
# Identify dry years:
MAX_INCHES_DRY_YEAR = 38
dry_years = df.loc[df['Yearly'] < MAX_INCHES_DRY_YEAR, 'Year'].tolist()
After creating the list, we add a new column to the "df" DataFrame called "event," which flags La Niña years. The np.where() function assigns a value of 1 for years in the "nina" list and 0 otherwise. To identify dry years, we use a threshold of 38 inches, representing approximately 3 inches below the average annual rainfall.
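Here is the same flagging logic on a four-row table, as a minimal sketch. The "nina" list and the 38-inch threshold come from the code above; the yearly totals are made up for illustration and are not the station's actual values.

```python
import numpy as np
import pandas as pd

nina = [2000, 2007, 2008, 2010, 2011, 2012, 2020, 2021]
MAX_INCHES_DRY_YEAR = 38

# Made-up yearly totals (inches), for illustration only.
toy = pd.DataFrame({
    'Year': [2010, 2011, 2013, 2015],
    'Yearly': [40.0, 20.0, 36.5, 55.0],
})

# 1 where the year is in the La Niña list, 0 otherwise.
toy['event'] = np.where(toy['Year'].isin(nina), 1, 0)

# Years whose total falls below the dry-year threshold.
dry_years = toy.loc[toy['Yearly'] < MAX_INCHES_DRY_YEAR, 'Year'].tolist()
```

Note that the two flags are independent: in this toy table 2010 is a La Niña year that is not dry, and 2013 is a dry year that is not in the La Niña list.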
Creating a Custom Colormap for the Plot
Ridgeline plots typically employ sequential color bars, but in this case, we want to emphasize two categories: "La Niña" and "non-La Niña" years. Note that non-La Niña years may not necessarily correspond to El Niño years, as some are classified as "neutral."
The following code generates a custom matplotlib colormap based on the values in the "event" column of the "df" DataFrame.
norm = plt.Normalize(df['event'].min(), df['event'].max())
arr = np.array(df['event'])
original_cmap = plt.cm.coolwarm
cmap = matplotlib.colors.ListedColormap(original_cmap(norm(arr)))
The first line establishes a normalization object based on the minimum and maximum values in the "event" column. The second line extracts values into a NumPy array. The third line assigns the "coolwarm" matplotlib colormap, and the last line creates a new colormap with custom colors based on the "event" values.
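To check what this colormap contains, the sketch below repeats the four steps on a short, hypothetical array of event flags. It imports `cm` and `colors` from matplotlib directly rather than going through `plt`, which is only a convenience for this example.

```python
import numpy as np
from matplotlib import cm, colors

# Hypothetical flags: 1 = La Niña year, 0 = any other year.
event = np.array([1, 0, 0, 1, 0])

# Map the flags onto the 0-1 range that colormaps expect.
norm = colors.Normalize(event.min(), event.max())

# Look up one RGBA color per flag and store them as a discrete colormap.
cmap = colors.ListedColormap(cm.coolwarm(norm(event)))

n_colors = cmap.N  # one entry per row of the input
```

The resulting colormap holds one color per row, in row order: flags of 1 get the red end of "coolwarm" and flags of 0 get the blue end.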
Making the Ridgeline Plot
The following code generates a ridgeline plot illustrating the monthly rainfall anomaly at College Station, TX, with each year’s data displayed on its own axis.
fig, axes = joyplot(data=df_melted,
                    by='Year',
                    column='Anomaly',
                    range_style='own',
                    grid="both",
                    linewidth=1,
                    legend=False,
                    figsize=(7.5, 5),
                    title=("Monthly Rainfall Anomaly (inches)"
                           " at College Station, TX"),
                    colormap=cmap,
                    fade=False)
# Add annotations:
for year in dry_years:
    axes[year - 2000].annotate('Dry Year',
                               xy=(-9, 0),
                               fontsize='medium')
axes[11].annotate('Driest Year in Texas History',
                  xy=(1.5, 0),
                  fontsize='large',
                  color='firebrick')
axes[15].annotate('Wettest Year in Texas History',
                  xy=(7, 0),
                  fontsize='large',
                  color='b')
axes[17].annotate('Hurricane Harvey',
                  xy=(15.5, 0.03),
                  fontsize='large')
axes[-1].annotate('Red = Moderate to Strong La Niña event',
                  xy=(2, 0.92),
                  color='firebrick',
                  fontsize='x-large',
                  weight='bold')
axes[-1].axvline(0, color='k');
Outcome of the Analysis
The analysis reveals that five out of seven dry years in College Station coincided with La Niña events, suggesting a notable correlation between these weather patterns and droughts in the Southwestern US. It appears that meteorologists may have valid insights after all!
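A count like this can also be checked in code rather than read off the chart. The sketch below is one way to do it; the helper name `share_in_events` is hypothetical, and the "dry_years" list is a placeholder, not the station's actual dry years. In the real analysis you would pass in the "dry_years" list computed earlier.

```python
def share_in_events(dry_years, event_years):
    """Return the dry years that are also event years, and their share."""
    dry = set(dry_years)
    hits = sorted(dry & set(event_years))
    return hits, len(hits) / len(dry)


nina = [2000, 2007, 2008, 2010, 2011, 2012, 2020, 2021]

# Placeholder values for illustration only (not real results).
dry_years = [2005, 2011, 2020]

hits, share = share_in_events(dry_years, nina)
```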
Identifying Trends in Data
Ridgeline plots are instrumental in uncovering trends within data, as illustrated in the plot below. For subtle trends, tilting the plot can enhance visibility. Ridgeline plots are best viewed on devices that allow for easy tilting, such as smartphones or tablets.
Limitations of Ridgeline Plots
One drawback of ridgeline plots is that the kernel density plots forming the "ridges" can be challenging for viewers to interpret. For instance, in our rainfall analysis, the wet years of 2004, 2015, and 2017, which exceeded 50 inches of rainfall, are not easily distinguishable.
Exploring Long-term Drought Trends with Ridgeline Plots
Another question we can address with a ridgeline plot is: "Have droughts increased in East-Central Texas over the past fifty years?" The resulting chart indicates that drought occurrences have not risen in recent years. Notably, the first half of this period (1970-1995) recorded the same number of dry years as the latter half. While the driest year on record was in 2011, several of the driest years occurred in the 1950s.
Alternatives to JoyPy
Ridgeline plots provide a captivating means to present repetitive signal data. While JoyPy, built upon the widely used pandas and matplotlib libraries, is a strong option, there are alternatives available. For seaborn users, refer to their documentation for ridgeline plot creation. If you're a Plotly enthusiast, the "ridgeplot" library is also worth exploring.
Thanks for Reading!
Thank you for following along! Stay tuned for more Quick Success Data Science projects.