The reason I am using Altair for most of my visualization in Python




May 04, 2019

Sadly, in Python, we do not have a ggplot2.

Python’s go-to visualization library, matplotlib, is very powerful [1] but has severe limitations. At times its flexibility is a blessing, but it is easy to get frustrated when adding a small feature to your graph. Also, matplotlib’s dual object-oriented and state-based interface is confusing. I still don’t completely grasp it even though I have been using matplotlib for years. Lastly, it only makes static graphs.
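To make the "dual interface" point concrete, here is a minimal sketch that draws the same scatter plot twice, once through the state-based pyplot functions and once through explicit figure and axes objects. The numbers are placeholders chosen only for illustration, and the Agg backend is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

# Placeholder numbers, for illustration only
population = [1, 100, 200, 300, 400, 500]
income = [1000, 50, 200, 300, 200, 150]

# State-based interface: pyplot keeps track of a "current" figure and axes
plt.figure()
plt.scatter(population, income)
plt.xlabel("population")
plt.ylabel("income")
state_ax = plt.gca()  # grab the implicit current axes to inspect it later

# Object-oriented interface: we hold the figure and axes explicitly
fig, ax = plt.subplots()
ax.scatter(population, income)
ax.set_xlabel("population")
ax.set_ylabel("income")
```

Both halves produce the same plot, but the function names differ (`plt.xlabel` versus `ax.set_xlabel`), which is a common source of confusion when mixing snippets written in the two styles.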

Altair and the grammar of graphics

Enter Altair. Altair is a wrapper for Vega-Lite, a high-level JavaScript visualization library. One of Vega-Lite’s [2] most important features is that its API is based on the grammar of graphics.

The grammar of graphics may sound like an abstract feature, but it is the main difference between Altair and other Python visualization libraries. Altair matches the way we reason about visualizing data.

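Altair only needs three main parameters:

  1. Mark: how the data is displayed (for example, circles, lines, or bars).
  2. Channel: which visual property of the mark (for example, the x-axis or the color) a column is mapped to.
  3. Encoding: the type of each variable (for example, quantitative or nominal).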

Based on these, Altair will pick sensible defaults to display your data.

My favorite example of Altair’s sensibility is how it chooses colors. If you tell Altair to color a quantitative variable, it will use a continuous color scale (light blue, blue, dark blue). If you tell Altair to color a categorical variable [3], it will use a different color for each category (red, yellow, blue).

Let’s see a concrete example:

I made up 6 countries and population numbers. The data looks like this:

import pandas as pd
import altair as alt

data = pd.DataFrame({'country_id': [1, 2, 3, 4, 5, 6],
                     'population': [1, 100, 200, 300, 400, 500],
                     'income':     [1000, 50, 200, 300, 200, 150]})

country_id  population  income
1           1           1000
2           100         50
3           200         200
4           300         300
5           400         200
6           500         150

We will first plot the population data for each country:

"""As we mentioned before, we need to define 3 parameters:
 1. Mark: We do this by using "mark_circle".
 2. Channel: We map the x-axis to the population and the color to the country_id.
 3. Encodings: We define both variables as quantitative by using :Q after the column name"""

categorical_chart = alt.Chart(data).mark_circle(size=200).encode(
                        x='population:Q',
                        color='country_id:Q')

Does this coloring make sense?

Altair picked a continuous color scale. That doesn’t make sense! The problem is that we defined the country_id as a quantitative variable, but it is really a categorical one.

# We changed color='country_id:Q' to color='country_id:N' to indicate it is a nominal variable
categorical_chart = alt.Chart(data).mark_circle(size=200).encode(
                        x='population:Q',
                        color='country_id:N')

This makes more sense! Each country should be represented by its own distinctive color!

We only changed the encoding of the variable country_id. Instead of using Q (Quantitative) we use N (Nominal). That’s enough for Altair to know that it shouldn’t use a continuous color scale.

Extending your graphs

Another beauty of Altair is that you can usually build on an existing graph easily. For example, let’s say that we now want to add income to our graph. We simply tell Altair to map the y-axis to income:

categorical_chart = alt.Chart(data).mark_circle(size=200).encode(
                        x='population:Q',
                        y='income:Q',
                        color='country_id:N')

Want to add tooltips? One line is all you need:

categorical_chart = alt.Chart(data).mark_circle(size=200).encode(
                        x='population:Q',
                        y='income:Q',
                        color='country_id:N',
                        tooltip=['country_id', 'population', 'income'])

Is that all?

At first, I was skeptical of using a wrapper around another library as my main visualization tool. Wrappers are often a bad idea. For example, there are many wrappers for ggplot2 that haven’t been widely adopted by the Python community. It is hard to create one that is feature-complete and up to date. But Altair is different:

Interactive chart using Altair

Combination of line, circle, and text marks. The output can easily be made interactive.

Altair’s main disadvantages


If this got you excited (or at least curious), I highly recommend Altair’s documentation. It is a concise and clear place to start. Don’t forget to check out the example gallery and the details of Altair’s internals.


  1. matplotlib recently came into the spotlight again for being credited with helping produce the first black hole image.

  2. In the rest of the article, I will mainly refer to Altair, but Vega-Lite deserves as much (or more) credit.

  3. Vega-Lite has two types of categorical data: nominal and ordinal. Nominal data are categories where the order has no meaning. For example, the continents: Europe, Asia, Africa, America, and Oceania (for me America is a continent, not the USA). Ordinal data are categories where the order has meaning. For example, an Amazon review can be one, two, three, four, or five stars.

Fernando Irarrázaval

Copyright, 2026