Line Chart Customization with Quartiles and Percentiles in R
Introduction
A line chart of a single summary value, such as the median, hides how widely the data are spread around it. In this article, we'll add the first quartile (25th percentile), the third quartile (75th percentile), and the 90th percentile to a line chart in R, using dplyr to compute the statistics and ggplot2 to draw them.
Background
Before diving into the code, let’s review some key concepts:
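- Quantiles: Quantiles are cut points that divide a sorted dataset into groups holding equal shares of the observations. Quartiles are the three cut points that split the data into four such groups: about 25% of the observations lie below the first quartile, the second quartile is the median, and about 75% lie below the third quartile. The first and third quartiles are often described as the medians of the lower and upper halves of the data; R's quantile() function interpolates between observations by default (type = 7), so its results can differ slightly from that rule.
- Percentiles: Percentiles are quantiles expressed on a 0 to 100 scale. The 90th percentile is the value below which about 90% of the data falls, and the first and third quartiles are the 25th and 75th percentiles.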
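To make these definitions concrete, here is a minimal sketch using base R's quantile() on a vector small enough to check by hand. The vector x is an assumption chosen for illustration and is not part of the article's dataset.

```r
# Hand-checkable example: the integers 1 to 9 (illustrative data only)
x <- 1:9

# First quartile, third quartile, and 90th percentile in one call
q <- quantile(x, probs = c(0.25, 0.75, 0.90))
print(q)
# 25% 75% 90%
# 3.0 7.0 8.2

# The 50% quantile is the median
print(quantile(x, probs = 0.5))
print(median(x))
```

Note that the "median of the lower half" rule would give 2.5 for the first quartile here (the median of 1, 2, 3, 4), while quantile() with its default interpolation returns 3.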
Sample Dataset
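We'll start with a simulated dataset with two columns: y, the observed value, and time, the timepoint (1, 2, or 3). Each timepoint has 100 observations drawn from a normal distribution with a standard deviation of 2 and a mean of 10, 15, or 20 respectively.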
# Load required libraries
library(ggplot2)
# Create a sample dataset
dd <- data.frame(
  y = c(rnorm(100, 10, 2), rnorm(100, 15, 2), rnorm(100, 20, 2)),
  time = rep(c(1, 2, 3), each = 100)
)
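Because rnorm() draws random numbers, the dataset and every statistic computed from it change on each run. A possible way to make the example reproducible is to fix the random seed first; the seed value 42 below is an arbitrary assumption, not something the original code specifies.

```r
# Fix the random seed so repeated runs give the same data
# (42 is an arbitrary choice made for this sketch)
set.seed(42)

dd <- data.frame(
  y = c(rnorm(100, 10, 2), rnorm(100, 15, 2), rnorm(100, 20, 2)),
  time = rep(c(1, 2, 3), each = 100)
)

# Confirm the layout: one row per observation, 100 rows per timepoint
print(table(dd$time))
str(dd)
```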
Summarizing the Data
To calculate the desired statistics (median, first quartile, third quartile, and 90th percentile) for each timepoint, we'll group the data by time with dplyr's group_by function and then compute one row of statistics per group with summarise.
# Load required libraries
library(dplyr)
# Calculate summary statistics
dd1 <- dd %>%
  group_by(time) %>%
  summarise(
    med = median(y),
    firstquart = quantile(y, probs = 0.25),
    thirdquart = quantile(y, probs = 0.75),
    ninety_percentile = quantile(y, probs = 0.9)
  )
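As a quick cross-check of the grouped summary, the same statistics can be computed with base R alone. This is a sketch under assumptions: the seed and the object name stats are introduced here for illustration, and the data are regenerated so the block runs on its own.

```r
# Regenerate the sample data (seed chosen arbitrarily for this sketch)
set.seed(42)
dd <- data.frame(
  y = c(rnorm(100, 10, 2), rnorm(100, 15, 2), rnorm(100, 20, 2)),
  time = rep(c(1, 2, 3), each = 100)
)

# Split y by timepoint and compute all four quantiles per group.
# The result is a matrix: one row per statistic, one column per timepoint.
stats <- sapply(
  split(dd$y, dd$time),
  quantile,
  probs = c(0.25, 0.5, 0.75, 0.9)
)
print(round(stats, 2))
```

The rows labelled 25%, 50%, 75%, and 90% correspond to firstquart, med, thirdquart, and ninety_percentile in the dplyr summary.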
Plotting the Data
We’ll use ggplot2 to create a line chart with the desired statistics.
# Create a line chart
ggplot(dd1, aes(x = time, y = med)) +
  geom_line() +
  geom_line(aes(x = time, y = firstquart), colour = "red") +
  geom_line(aes(x = time, y = thirdquart), colour = "green") +
  geom_line(aes(x = time, y = ninety_percentile), colour = "blue")
This code creates a line chart with four lines:
- The first line represents the median (med) of each timepoint, drawn in the default colour (black).
- The second line represents the first quartile (firstquart) in red.
- The third line represents the third quartile (thirdquart) in green.
- The fourth line represents the 90th percentile (ninety_percentile) in blue.
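Because each colour in the code above is set outside aes(), ggplot2 draws no legend, so a reader has to be told which line is which. One possible alternative, sketched here, is to reshape the summary into long format and map the statistic name to colour. This assumes the tidyr package is installed (it is not used elsewhere in this article); the names dd_long, statistic, and value, and the seed, are illustrative choices.

```r
library(ggplot2)
library(dplyr)
library(tidyr)  # assumption: tidyr is available; used only for pivot_longer()

# Regenerate the data and summary so this block runs on its own
set.seed(42)  # arbitrary seed for this sketch
dd <- data.frame(
  y = c(rnorm(100, 10, 2), rnorm(100, 15, 2), rnorm(100, 20, 2)),
  time = rep(c(1, 2, 3), each = 100)
)
dd1 <- dd %>%
  group_by(time) %>%
  summarise(
    med = median(y),
    firstquart = quantile(y, probs = 0.25),
    thirdquart = quantile(y, probs = 0.75),
    ninety_percentile = quantile(y, probs = 0.9)
  )

# Long format: one row per (timepoint, statistic) pair
dd_long <- dd1 %>%
  pivot_longer(cols = -time, names_to = "statistic", values_to = "value")

# Mapping statistic to colour inside aes() produces a legend automatically
p <- ggplot(dd_long, aes(x = time, y = value, colour = statistic)) +
  geom_line() +
  scale_colour_manual(values = c(
    med = "black",
    firstquart = "red",
    thirdquart = "green",
    ninety_percentile = "blue"
  ))
print(p)
```

With this layout, adding another statistic only requires one more line in summarise() and one more entry in the colour vector.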
Customizing the Plot
We can customize the plot further by adding a theme, changing the axis labels, and including a title.
# Customize the plot
ggplot(dd1, aes(x = time, y = med)) +
  geom_line() +
  geom_line(aes(x = time, y = firstquart), colour = "red") +
  geom_line(aes(x = time, y = thirdquart), colour = "green") +
  geom_line(aes(x = time, y = ninety_percentile), colour = "blue") +
  theme_minimal() +
  labs(title = "Line Chart with Quartiles and Percentiles",
       subtitle = "Sample dataset with 100 observations per timepoint",
       x = "Time",
       y = "Value")
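A further customization some readers may prefer is to shade the region between the first and third quartiles instead of drawing them as two separate lines. The sketch below uses ggplot2's geom_ribbon() for that band and scale_x_continuous() to restrict the axis breaks to the three timepoints. The seed, fill colour, transparency, and dashed line type are illustrative assumptions.

```r
library(ggplot2)
library(dplyr)

# Regenerate the data and summary so this block runs on its own
set.seed(42)  # arbitrary seed for this sketch
dd <- data.frame(
  y = c(rnorm(100, 10, 2), rnorm(100, 15, 2), rnorm(100, 20, 2)),
  time = rep(c(1, 2, 3), each = 100)
)
dd1 <- dd %>%
  group_by(time) %>%
  summarise(
    med = median(y),
    firstquart = quantile(y, probs = 0.25),
    thirdquart = quantile(y, probs = 0.75),
    ninety_percentile = quantile(y, probs = 0.9)
  )

p <- ggplot(dd1, aes(x = time)) +
  # Shaded band spanning the first to third quartile
  geom_ribbon(aes(ymin = firstquart, ymax = thirdquart),
              fill = "grey70", alpha = 0.4) +
  # Median as a solid line, 90th percentile as a dashed blue line
  geom_line(aes(y = med)) +
  geom_line(aes(y = ninety_percentile), colour = "blue", linetype = "dashed") +
  # Show only the three timepoints on the x axis
  scale_x_continuous(breaks = 1:3) +
  theme_minimal() +
  labs(title = "Median with Interquartile Band and 90th Percentile",
       x = "Time",
       y = "Value")
print(p)
```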
Conclusion
In this article, we demonstrated how to add the first quartile, third quartile, and 90th percentile to a line chart in R. We simulated a sample dataset, summarised it per timepoint with dplyr, plotted each statistic as its own line with ggplot2, and then added a theme, axis labels, and a title. The same steps apply to your own data: group by the x-axis variable, compute the quantiles you need, and add one geom_line per statistic.
Last modified on 2023-10-30