Section: Heatmap and Clustermap | Data Visualization with Seaborn

Seaborn also allows you to create heatmaps using heatmap().

Parameter name	Description	Format	Example
data	The dataset to plot	2D dataset that can be converted to a NumPy array, such as a pandas DataFrame	data=table
cmap	Heatmap colors. Either a Matplotlib colormap or a custom one.	String naming a Matplotlib colormap, or a colormap built with Seaborn's color_palette	cmap="viridis" or cmap=sns.color_palette("light:blue", as_cmap=True)
annot	Parameter that determines whether to display the values inside the cells.	Boolean	annot=True, default is False
vmin	Minimum value that will be taken into account for the colormap.	Float	vmin=30.6
vmax	Maximum value that will be taken into account for the colormap.	Float	vmax=42
linecolor	Parameter used to choose the color of the lines between the cells.	String corresponding to a color	linecolor="blue"
linewidths	Parameter controlling the thickness of the lines between the cells	Float	linewidths=0.2 or linewidths=10
mask	Parameter used to hide cells: data is not shown wherever the mask is True.	Boolean array or DataFrame with the same shape as data	mask=table_mask

Here is a code example:

import seaborn as sns

glue = sns.load_dataset("glue").pivot(index="Model", columns="Task", values="Score")
sns.heatmap(glue)

We use pivot to reshape the data into the grid that heatmap will draw:

index defines the variable for the y-axis (ordinates).
columns defines the variable for the x-axis (abscissas).
values must be a numerical variable, and it is what the heatmap will use for coloring.

Figure: basic heatmap
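To make the effect of pivot more concrete, here is a minimal sketch on a small hand-made table. The column names mirror those of the glue dataset, but the models, tasks and scores are invented for illustration only.

```python
import pandas as pd

# Hypothetical long-format table: one row per (Model, Task) pair
long_table = pd.DataFrame({
    "Model": ["A", "A", "B", "B"],
    "Task": ["T1", "T2", "T1", "T2"],
    "Score": [71.0, 55.5, 64.2, 80.1],
})

# One row per Model (y-axis), one column per Task (x-axis), cells filled with Score
wide_table = long_table.pivot(index="Model", columns="Task", values="Score")
print(wide_table)
```

The resulting wide_table has the shape that heatmap expects: its index labels the rows and its columns label the x-axis.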

sns.heatmap(glue, cmap="viridis", annot=True, vmin=20, vmax=80, linecolor="red", linewidths=0.1)

With the vmin and vmax parameters, we set the range of values that the colormap spans; values outside this range are drawn with the colors at the ends of the colormap. We also have visual options such as linecolor and linewidths for the lines between the cells.

Figure: customized heatmap
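The mask parameter listed in the table is not used in the calls above. As a rough sketch, assuming a small random table (the data and the names table and table_mask are made up here), a common use is to hide the redundant upper triangle of a correlation matrix:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 50 rows of random values in 4 columns
rng = np.random.default_rng(0)
table = pd.DataFrame(rng.normal(size=(50, 4)), columns=["a", "b", "c", "d"])
corr = table.corr()

# True marks the cells to hide: everything strictly above the diagonal
table_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

ax = sns.heatmap(corr, mask=table_mask, cmap="viridis", annot=True, vmin=-1, vmax=1)
```

Cells where table_mask is True are left blank, so only the diagonal and the lower triangle are drawn.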

If we need clustering in the heatmap, we can use Seaborn’s clustermap(). This function requires SciPy, so SciPy must be installed in the environment you are working in. If you are using Colab, no installation is needed, as SciPy can be imported directly.

Figure: clustermap signature

Parameter name	Description	Format	Example
data	The rectangular dataset to cluster. It must not contain missing values.	2D array-like, such as a pandas DataFrame	data=table
method	SciPy linkage method used to perform clustering.	String corresponding to a SciPy linkage method.	method="centroid"
metric	SciPy distance metric used for clustering.	String corresponding to a SciPy distance metric.	metric="jaccard"
z_score	Parameter used to center and standardize the data (subtract the mean, divide by the standard deviation).	0 to standardize rows, 1 to standardize columns.	z_score=0
standard_scale	Parameter used to rescale the data to the 0 to 1 range (subtract the minimum, divide by the range).	0 to rescale rows, 1 to rescale columns.	standard_scale=1
row_cluster, col_cluster	Parameters used to choose whether rows and columns are clustered.	Boolean	row_cluster=False, default is True
figsize	Parameter controlling the size of the figure.	tuple (width, height)	figsize=(4,4)
dendrogram_ratio	Parameter controlling the size ratio of the dendrograms.	tuple (row ratio, column ratio)	dendrogram_ratio=(0.2,0.1)
cbar_pos	Parameter controlling the position of the color bar.	tuple (left, bottom, width, height)	cbar_pos=(0,0.1,0.05,0.6)
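The difference between z_score and standard_scale is easier to see on numbers. The sketch below reproduces the two column-wise transformations by hand with pandas on an invented table; it assumes the sample standard deviation that pandas uses by default, and is meant as an illustration of the idea rather than a copy of Seaborn's internals.

```python
import pandas as pd

# Hypothetical table with two columns on very different scales
table = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [10.0, 20.0, 40.0, 80.0],
})

# Column-wise z-score (the idea behind z_score=1):
# subtract each column's mean, divide by its standard deviation
z_scored = (table - table.mean()) / table.std()

# Column-wise rescaling (the idea behind standard_scale=1):
# subtract each column's minimum, divide by its range
scaled = (table - table.min()) / (table.max() - table.min())

print(z_scored)
print(scaled)
```

After the z-score, every column has mean 0 and standard deviation 1; after the rescaling, every column runs from 0 to 1. Either way, columns with large values no longer dominate the distances used for clustering.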

Here is a code example:

iris = sns.load_dataset("iris")
species = iris.pop("species")  # remove the non-numeric column and keep it for later
sns.clustermap(iris)

Figure: basic clustermap

Now let’s explore different parameters:

lut = dict(zip(species.unique(), "rbg"))  # one color per species
row_colors = species.map(lut)
sns.clustermap(iris, row_cluster=True, dendrogram_ratio=(0.2, 0.1), row_colors=row_colors, method="weighted", metric="correlation", z_score=1, annot=True, figsize=(3, 9), cbar_pos=(0, 0.1, 0.02, 0.8))

row_cluster groups rows by similarity in order to reveal clusters. dendrogram_ratio controls the size of the dendrograms: the first value corresponds to the row dendrogram on the left, and the second to the column dendrogram at the top. row_colors adds a color indicator next to the rows; here, with the lut mapping defined just before, it shows the species of each row. metric defines the distance measure used to compare rows and columns, and method specifies the linkage algorithm used for clustering. Setting z_score to 1 standardizes each column (0 would standardize each row). cbar_pos sets the position and size of the color bar.

Figure: more complex clustermap
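To get a feel for what method and metric do, the same kind of hierarchical clustering can be run directly with SciPy. This is a minimal sketch on invented random data; it assumes that the linkage function from scipy.cluster.hierarchy is a fair stand-in for the clustering step, and the exact routine Seaborn calls may differ depending on what is installed.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

# Hypothetical data: 8 rows (observations) described by 4 numeric columns
rng = np.random.default_rng(0)
rows = rng.normal(size=(8, 4))

# metric: how the distance between two rows is measured
# method: how the distance between two groups of rows is derived from those distances
tree = linkage(rows, method="weighted", metric="correlation")

# Order of the rows along the dendrogram leaves, i.e. the kind of
# reordering a clustered heatmap applies to its rows
order = leaves_list(tree)
print(order)
```

Changing method or metric changes the tree, and therefore the order in which rows would appear in the clustered heatmap.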