Data visualization is fundamental to data analysis and data science. Seaborn is a Python library that produces statistical plots with more polished default styling and less code than plain Matplotlib. In this article, we will explore the basics of Seaborn and how to use it to visualize data effectively.
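Seaborn is built on top of Matplotlib and works directly with pandas DataFrames: you pass a DataFrame and the names of its columns, and Seaborn handles the grouping, aggregation, and styling. Because every Seaborn plot is a Matplotlib figure underneath, Matplotlib functions such as plt.title() and plt.show() can still be used to customize and display it.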
To get started, you need to install the library. You can do this with the following command:
    pip install seaborn
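Seaborn includes several example datasets that we can use for practice. The sns.load_dataset() function downloads them from Seaborn's online data repository, so the first call needs an internet connection. Let's see how to load one: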
    import seaborn as sns
    import matplotlib.pyplot as plt  # used by the plt.title() and plt.show() calls in the plots

    # Load the example dataset as a pandas DataFrame
    iris = sns.load_dataset("iris")
    print(iris.head())
A scatter plot is useful for visualizing the relationship between two variables.
    # Plot sepal length against sepal width, one point per flower
    sns.scatterplot(x="sepal_length", y="sepal_width", data=iris)
    plt.title("Iris Scatter Plot")
    plt.show()
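A common next step with a scatter plot is to color the points by a category. The sketch below uses the `hue` parameter of `sns.scatterplot` for that. The `toy` DataFrame is a hypothetical stand-in with made-up values and the same column names as iris, so the snippet runs without downloading anything; with the real dataset you would pass `iris` instead.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for iris: made-up values, same column names.
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.4, 6.9, 6.3, 7.1],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.3, 3.0],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

fig, ax = plt.subplots()
# hue assigns one color per species and adds a legend.
sns.scatterplot(data=toy, x="sepal_length", y="sepal_width",
                hue="species", ax=ax)
ax.set_title("Sepal Length vs. Width by Species (toy data)")
plt.show()
```

Creating the axes with `plt.subplots()` and passing `ax=ax` keeps each plot on its own figure, which helps when several plots are drawn in one script.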
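Bar plots compare a numeric value across categories. By default, sns.barplot() draws the mean of each category, with an error bar showing the uncertainty around that mean.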
    # Each bar shows the mean sepal length of one species
    sns.barplot(x="species", y="sepal_length", data=iris)
    plt.title("Average Sepal Length by Species")
    plt.show()
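To see where the bar heights come from, it can help to compute the same per-category means directly with pandas. This is a minimal sketch on a hypothetical `toy` DataFrame with made-up values, not the real iris measurements.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical toy data: two made-up measurements per species.
toy = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
    "sepal_length": [5.0, 5.2, 5.9, 6.1, 6.5, 6.7],
})

# The per-category mean is what barplot uses as the bar height by default.
means = toy.groupby("species")["sepal_length"].mean()
print(means)

fig, ax = plt.subplots()
sns.barplot(data=toy, x="species", y="sepal_length", ax=ax)
ax.set_title("Average Sepal Length by Species (toy data)")
plt.show()
```

If a different summary is needed, `sns.barplot` accepts an `estimator` argument, for example `estimator="median"` in recent Seaborn versions.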
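A histogram helps us visualize the distribution of a variable by counting how many values fall into each bin. Setting kde=True overlays a kernel density estimate, a smoothed curve of the same distribution.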
    # 20 bins, with a kernel density estimate drawn on top
    sns.histplot(iris["sepal_length"], bins=20, kde=True)
    plt.title("Sepal Length Distribution")
    plt.show()
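The `bins` argument controls how the value range is split. As a rough illustration of what an integer bin count means, the sketch below uses NumPy's `np.histogram` to divide the data range into equal-width bins and count the values in each. The `values` array is made up for the example, and this only approximates the binning step, not everything `histplot` does.

```python
import numpy as np

# Made-up measurements, for illustration only.
values = np.array([4.6, 4.9, 5.0, 5.1, 5.4, 5.7, 6.1, 6.3, 6.7, 7.0])

n_bins = 4
# counts[i] is the number of values between edges[i] and edges[i + 1].
counts, edges = np.histogram(values, bins=n_bins)

print("bin edges:", edges)
print("counts:   ", counts)
```

There is always one more edge than there are bins, and the counts add up to the number of observations. Fewer bins give a coarser picture; more bins show more detail but also more noise.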
Box plots summarize a distribution by its median and quartiles and draw outliers as individual points.
    # One box per species, showing the spread of petal length
    sns.boxplot(x="species", y="petal_length", data=iris)
    plt.title("Petal Length Distribution by Species")
    plt.show()
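By default, `sns.boxplot` uses `whis=1.5`: the whiskers reach the most extreme points within 1.5 times the interquartile range (IQR) of the box, and anything beyond is drawn as an outlier. The sketch below reproduces that rule with pandas on a made-up series, so you can see which points would be flagged.

```python
import pandas as pd

# Made-up values with one point far from the rest.
values = pd.Series([1.0, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 3.5])

q1 = values.quantile(0.25)   # lower edge of the box
q3 = values.quantile(0.75)   # upper edge of the box
iqr = q3 - q1                # height of the box

# Points outside these fences are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = values[(values < lower_fence) | (values > upper_fence)]
print(outliers.tolist())
```

Passing a larger `whis` value to `sns.boxplot` widens the fences and flags fewer points.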
A heatmap of the correlation matrix shows how strongly each pair of numerical variables is related.
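    # Correlate only the numeric columns; the text column "species" raises an error in pandas 2.0+
    corr = iris.select_dtypes(include="number").corr()
    sns.heatmap(corr, annot=True, cmap="coolwarm", linewidths=0.5)
    plt.title("Correlation Heatmap")
    plt.show()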
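A correlation matrix is symmetric, so half of the heatmap repeats the other half. One common way to tidy it up is the `mask` parameter of `sns.heatmap`, which hides the cells where the mask is `True`. The sketch below hides the upper triangle and the diagonal; the `toy` DataFrame holds made-up values and is only a stand-in for iris.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for iris: made-up values.
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.4, 6.9, 6.3, 7.1],
    "petal_length": [1.4, 1.5, 4.5, 4.9, 6.0, 5.9],
    "petal_width": [0.2, 0.2, 1.5, 1.5, 2.5, 2.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Keep only numeric columns before computing correlations.
corr = toy.select_dtypes(include="number").corr()

# True marks cells to hide: the diagonal and everything above it.
mask = np.triu(np.ones(corr.shape, dtype=bool))

fig, ax = plt.subplots()
sns.heatmap(corr, mask=mask, annot=True, cmap="coolwarm",
            vmin=-1, vmax=1, linewidths=0.5, ax=ax)
ax.set_title("Correlation Heatmap, Lower Triangle (toy data)")
plt.show()
```

Fixing `vmin=-1` and `vmax=1` pins the color scale to the full range of a correlation coefficient, so the same color means the same strength across different plots.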
A pair plot draws a scatter plot for every pair of numerical variables in a single figure, with each variable's own distribution on the diagonal.
    # Color the points by species in every panel
    sns.pairplot(iris, hue="species")
    plt.show()