Seaborn in Python
Introduction to Seaborn in Python
Seaborn is a powerful visualization library built on top of Matplotlib that makes it easier to create aesthetically pleasing and informative statistical plots. It integrates well with pandas DataFrames and provides high-level functions for common visualizations such as bar plots, scatter plots, box plots, and heatmaps.
Installation
To install Seaborn, use pip:
pip install seaborn
Importing Seaborn
Import Seaborn and other libraries:
import seaborn as sns
import matplotlib.pyplot as plt
Basic Plots in Seaborn
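- Line Plot: A line plot shows how one variable changes across an ordered or continuous variable. In the example below, each day appears many times in the data, so Seaborn plots the mean total_bill for each day with a shaded confidence interval.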
import seaborn as sns
import matplotlib.pyplot as plt
# Load example dataset
tips = sns.load_dataset("tips")
# Create a line plot
sns.lineplot(x="day", y="total_bill", data=tips)
# Display the plot
plt.show()
Note: "According to the tips dataset documentation, the Tips dataset is a data frame with 244 rows and 7 variables which represents some tipping data where one waiter recorded information about each tip he received over a period of a few months working in one restaurant."
- Scatter Plot: A scatter plot displays the relationship between two continuous variables.
sns.scatterplot(x="total_bill", y="tip", data=tips)
plt.show()
- Bar Plot: A bar plot compares quantities across categories. By default, sns.barplot shows the mean of y for each category, with error bars for the confidence interval.
sns.barplot(x="day", y="total_bill", data=tips)
plt.show()
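- Box Plot: A box plot summarizes a distribution using its quartiles. The box spans the first to the third quartile with a line at the median. By default, the whiskers extend to the most extreme points within 1.5 times the interquartile range, and points beyond them are drawn individually as outliers.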
sns.boxplot(x="day", y="total_bill", data=tips)
plt.show()
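To see where the parts of a box plot come from, you can compute the quartiles yourself. This is a rough sketch using pandas on a made-up sample (the numbers are hypothetical). It assumes the common 1.5 times IQR rule for the whiskers, which is the default in sns.boxplot.

```python
import pandas as pd

# Hypothetical sample with one unusually large value
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

# Quartiles that define the box and the median line
q1 = values.quantile(0.25)
median = values.quantile(0.50)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Points outside these fences are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3)     # 3.25 5.5 7.75
print(outliers.tolist())  # [30]
```

Here the value 30 lies above the upper fence of 14.5, so a box plot of this sample would show it as a separate point above the upper whisker.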
- Histogram: A histogram shows the distribution of a single variable by counting the observations that fall into each bin.
sns.histplot(tips['total_bill'], kde=True) # Including kernel density estimate (KDE)
plt.show()
- Heatmap: A heatmap visualizes data in matrix form, with each value represented by color intensity.
# Create a correlation matrix from the numeric columns only
# (tips also contains categorical columns such as sex and day)
corr = tips.corr(numeric_only=True)
# Create a heatmap
sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f")
plt.show()
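Correlation is only defined for numeric columns, so a DataFrame that mixes numbers and text needs some care. Below is a self-contained sketch that runs without downloading a dataset. The DataFrame is made up for illustration, and the numeric_only argument assumes a reasonably recent pandas (1.5 or later).

```python
import pandas as pd
import seaborn as sns

# Hypothetical data mixing numeric and text columns
df = pd.DataFrame({
    "bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.0, 3.0, 4.0],
    "size": [4, 3, 2, 1],
    "day": ["Thur", "Fri", "Sat", "Sun"],
})

# numeric_only=True leaves the text column out of the calculation
corr = df.corr(numeric_only=True)
print(list(corr.columns))  # ['bill', 'tip', 'size']

# vmin and vmax pin the color scale to the full correlation range
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f", vmin=-1, vmax=1)
```

Fixing vmin and vmax at -1 and 1 keeps the colors comparable between heatmaps, since otherwise the scale is stretched to whatever values happen to be in the matrix.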
Customizing Seaborn Plots
Seaborn allows you to easily customize your plots using various parameters.
- Customizing Colors: You can change the color palette used for your plots.
# Use a predefined color palette such as "Set2"
sns.set_palette("Set2")
sns.scatterplot(x="total_bill", y="tip", data=tips)
plt.show()
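A palette is simply a list of colors, and you can inspect one before applying it. The sketch below uses sns.color_palette with the built-in "Set2" palette. The choice of four colors is arbitrary.

```python
import seaborn as sns

# Ask for four colors from the named palette "Set2"
palette = sns.color_palette("Set2", 4)
print(len(palette))  # 4

# Each color is an (r, g, b) tuple with values between 0 and 1
first_color = palette[0]
print(first_color)

# as_hex() converts the colors to hex strings such as "#rrggbb"
hex_codes = palette.as_hex()
print(hex_codes)
```

Other commonly used palette names include "deep", "muted", "pastel", "bright", "dark", and "colorblind".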
- Adding Titles and Labels: You can add a title and axis labels, and set the figure size.
# Set the figure size (width, height in inches) before plotting
plt.figure(figsize=(8, 5))
sns.barplot(x="day", y="total_bill", data=tips)
# Add title and labels
plt.title("Average Total Bill by Day")
plt.xlabel("Day")
plt.ylabel("Total Bill")
plt.show()
Seaborn Themes
Seaborn provides five built-in styles that control the look of the plots:
- darkgrid: Gray background with white gridlines.
- whitegrid: White background with gridlines.
- dark: Gray background with no gridlines.
- white: White background with no gridlines.
- ticks: White background with tick marks on the axes.
Using Themes
# Set the style to darkgrid
sns.set_theme(style="darkgrid")
# Create a plot
sns.scatterplot(x="total_bill", y="tip", data=tips)
plt.show()
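sns.set_theme changes the style for every plot that follows. If you only want a style for a single figure, one option is the sns.axes_style context manager. This is a minimal sketch with a made-up DataFrame (the columns x and y are hypothetical).

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data for illustration
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 3, 5]})

# The style applies only to axes created inside the with block
with sns.axes_style("whitegrid"):
    fig, ax = plt.subplots()
    sns.scatterplot(x="x", y="y", data=df, ax=ax)

# axes_style() also returns the style settings as a dictionary
grid_on = sns.axes_style("whitegrid")["axes.grid"]
grid_off = sns.axes_style("white")["axes.grid"]
print(grid_on, grid_off)  # True False
```

Plots created after the with block go back to whatever style was active before it.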
Pair Plot
A pair plot shows the pairwise relationships between the numeric variables in a dataset: a scatter plot for each pair of variables and, on the diagonal, the distribution of each variable.
sns.pairplot(tips)
plt.show()
FacetGrid
FacetGrid lets you draw the same plot on multiple subplots, one for each level of a categorical variable.
# Create FacetGrid based on the "sex" column
g = sns.FacetGrid(tips, col="sex")
g.map(sns.scatterplot, "total_bill", "tip")
plt.show()
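For common cases you do not have to build the grid by hand. Figure-level functions such as sns.relplot create a FacetGrid for you when you pass col. The following sketch uses a small made-up DataFrame whose column names mirror the tips dataset. The values themselves are hypothetical.

```python
import pandas as pd
import seaborn as sns

# Hypothetical rows with the same column names as the tips dataset
df = pd.DataFrame({
    "total_bill": [10.0, 15.5, 20.0, 25.0, 30.5, 12.0],
    "tip": [1.5, 2.0, 3.0, 4.0, 5.0, 2.5],
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
})

# relplot returns a FacetGrid with one column of subplots per value of "sex"
g = sns.relplot(data=df, x="total_bill", y="tip", col="sex", kind="scatter")
print(g.axes.shape)  # (1, 2)

# FacetGrid methods adjust labels and titles for all subplots at once
g.set_axis_labels("Total Bill", "Tip")
g.set_titles("{col_name}")
```

Using sns.FacetGrid directly, as in the example above, is still useful when you want to map a custom plotting function onto each subplot.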
Regression Plot
A regression plot shows the relationship between two variables and fits a regression line.
sns.regplot(x="total_bill", y="tip", data=tips)
plt.show()
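sns.regplot draws the fitted line but does not hand back the slope and intercept. If you need those numbers, one option is to fit the line separately, for example with numpy.polyfit. This is only a sketch on made-up data that lies exactly on a known line, so the result is easy to check.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# A degree-1 polynomial fit is an ordinary least-squares straight line
slope, intercept = np.polyfit(x, y, deg=1)
print(round(float(slope), 6), round(float(intercept), 6))  # 2.0 1.0
```

With real data such as total_bill and tip, you would pass those two columns as x and y instead.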
Violin Plot
A violin plot combines aspects of box plots and density plots. It shows the distribution of the data across different categories.
sns.violinplot(x="day", y="total_bill", data=tips)
plt.show()
Seaborn with Pandas DataFrames
Seaborn works seamlessly with pandas DataFrames, and it's easy to pass data directly from pandas to Seaborn functions.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Create a pandas DataFrame
df = pd.DataFrame({
'Category': ['A', 'B', 'C', 'A', 'B', 'C'],
'Values': [1, 2, 3, 4, 5, 6]
})
# Create a bar plot directly from the DataFrame
sns.barplot(x='Category', y='Values', data=df)
plt.show()
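Each category appears twice in this DataFrame, so the bar heights are not the raw values. By default, sns.barplot draws the mean of each category. A short sketch that checks this with pandas, using the same data as above:

```python
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "Category": ["A", "B", "C", "A", "B", "C"],
    "Values": [1, 2, 3, 4, 5, 6],
})

# The mean of each category, computed directly with pandas
means = df.groupby("Category")["Values"].mean()
print(means.to_dict())  # {'A': 2.5, 'B': 3.5, 'C': 4.5}

# The heights of the bars that Seaborn draws, for comparison
ax = sns.barplot(x="Category", y="Values", data=df)
heights = [bar.get_height() for bar in ax.patches]
print(heights)
```

To plot a different summary, such as the sum, you can pass a function through the estimator parameter, for example estimator=sum.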
Saving Seaborn Plots
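You can save your Seaborn plots as image files (e.g., PNG, SVG) using Matplotlib's savefig function. Call it before plt.show(), since the figure may be cleared once the plot window is closed.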
# Create a bar plot
sns.barplot(x="day", y="total_bill", data=tips)
# Save the plot as PNG
plt.savefig("seaborn_plot.png")
# Show the plot
plt.show()
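savefig accepts a few options that are often useful, and the file extension decides the output format. The sketch below saves the same figure as PNG and SVG into a temporary folder. The DataFrame values and the dpi setting are illustrative assumptions, not recommendations.

```python
import os
import tempfile

import pandas as pd
import seaborn as sns

# Hypothetical data for illustration
df = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sun"],
    "total_bill": [10, 20, 30, 40],
})

ax = sns.barplot(x="day", y="total_bill", data=df)
fig = ax.get_figure()  # the Matplotlib Figure that holds the plot

# Write the files to a temporary folder
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "seaborn_plot.png")
svg_path = os.path.join(out_dir, "seaborn_plot.svg")

# dpi sets the resolution; bbox_inches="tight" trims extra whitespace
fig.savefig(png_path, dpi=300, bbox_inches="tight")
fig.savefig(svg_path, bbox_inches="tight")

print(os.path.exists(png_path), os.path.exists(svg_path))  # True True
```

Saving through the Figure object works the same way as plt.savefig, but it makes explicit which figure is being written when several are open.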
Key Seaborn Plots
- Line Plot: Shows how a variable changes across an ordered or continuous variable.
- Bar Plot: Compares quantities across categories.
- Scatter Plot: Displays relationships between two variables.
- Box Plot: Shows data distribution based on quartiles.
- Histogram: Displays the distribution of a single variable.
- Heatmap: Visualizes matrix-like data.
- Pair Plot: Visualizes relationships between multiple variables.
- FacetGrid: Creates subplots based on categories.
- Regression Plot: Plots data with a fitted regression line.
- Violin Plot: Displays data distribution and density.