Pretty printing tables in Python

2013-03-13

I've published a Python pretty-printer library for tables.

I have used it across several projects and decided it was time to package it properly. I often deal with tabular data in Python (a list of lists, a NumPy array), and from time to time I want to print it quickly, easily and in a reasonably human-readable format.

What's a reasonable format for me?

  • columns are aligned properly (and a table looks like a _table_)
  • values are properly padded to help readability
  • text is aligned to the left (or centered)
  • numbers are aligned by a decimal point (or aligned to the right)
  • the output can be used elsewhere (edited in a text editor, or inserted into a lightweight markup like Markdown)

What is easy for me?

  • one function, one argument, no objects, no setup; it cannot be any simpler than

    >>> from tabulate import tabulate
    >>> print(tabulate([["spam", 1], ["eggs", 42]]))
    ----  --
    spam   1
    eggs  42
    ----  --
    

    I prefer to just feed the data into the function and let it analyze the table structure and choose the best layout for me (column width, alignment, padding, whether horizontal and vertical lines are necessary).

  • headers are optional, but if they are given, they are displayed nicely

    >>> print(tabulate([["spam", 1], ["eggs", 42]], ["item", "quantity"]))
    item      quantity
    ------  ----------
    spam             1
    eggs            42
    
  • common lightweight markup table formats are supported:

    • plain tables without pseudographics
    • simple tables, like in Pandoc (it's the default format)
    • grid tables, which can be edited with the Emacs table.el package and are accepted by Pandoc and reStructuredText
    • pipe tables, like in PHP Markdown Extra (Pandoc, Python Markdown and several other Markdown implementations accept them too)
    • orgtbl tables, like in Emacs org-mode and orgtbl-mode; they are easy to edit in Emacs, and Pandoc accepts them
    • rst tables, which are similar to simple tables but follow the conventions of reStructuredText

    An example of a grid table:

    +-----------+-----------+
    | strings   |   numbers |
    +===========+===========+
    | spam      |   41.9999 |
    +-----------+-----------+
    | eggs      |  451      |
    +-----------+-----------+
    

    A pipe table:

    | strings   |   numbers |
    |:----------|----------:|
    | spam      |   41.9999 |
    | eggs      |  451      |
    

    An orgtbl table:

    | strings   |   numbers |
    |-----------+-----------|
    | spam      |   41.9999 |
    | eggs      |  451      |
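
The pipe format is simple enough to sketch by hand. The following is only an illustration of the layout, not tabulate's own code: `pipe_table` is a hypothetical helper, and it right-aligns numbers instead of aligning decimal points, so its output is narrower than the tabulate output shown above.

```python
def pipe_table(headers, rows):
    """Render rows as a pipe table (hypothetical helper, not part of tabulate)."""
    cells = [[str(value) for value in row] for row in rows]
    ncols = len(headers)
    # A column counts as numeric only if every cell in it is an int or a float.
    numeric = [all(isinstance(row[i], (int, float)) for row in rows)
               for i in range(ncols)]
    widths = [max(len(headers[i]), max(len(row[i]) for row in cells))
              for i in range(ncols)]

    def fmt_row(values):
        parts = [value.rjust(width) if num else value.ljust(width)
                 for value, width, num in zip(values, widths, numeric)]
        return "| " + " | ".join(parts) + " |"

    def rule():
        # The colon marks the alignment: ":---" is left, "---:" is right.
        parts = []
        for width, num in zip(widths, numeric):
            dashes = "-" * (width + 1)
            parts.append(dashes + ":" if num else ":" + dashes)
        return "|" + "|".join(parts) + "|"

    return "\n".join([fmt_row(headers), rule()] + [fmt_row(row) for row in cells])


print(pipe_table(["strings", "numbers"], [["spam", 41.9999], ["eggs", 451]]))
```

The sketch assumes at least one data row and that every row has as many cells as there are headers.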
    

Two peculiar features of tabulate deserve a mention.

  • It tries to parse every value as a number and formats it accordingly:

    >>> print(tabulate([["one", "1.0"], ["forty two", "42"]]))
    ---------  --
    one         1
    forty two  42
    ---------  --
    

    This feature comes in handy when reading mixed textual and numeric data from a text file.

  • The default alignment for the columns of numbers is unusual; tabulate aligns the decimal points:

    >>> print(tabulate([[1.234], [123.4], [12.34]]))
    -------
      1.234
    123.4
     12.34
    -------
    

    This helps to compare numbers visually.
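
Both behaviours can be approximated in a few lines of plain Python. This is a rough sketch of the idea, not tabulate's actual implementation: `to_number` and `align_decimal` are hypothetical names, and the alignment ignores exponent notation such as `1e-05`.

```python
def to_number(text):
    """Return an int or a float if text parses as one, else return it unchanged."""
    for convert in (int, float):
        try:
            return convert(text)
        except (TypeError, ValueError):
            pass
    return text


def align_decimal(values):
    """Pad number strings so that their decimal points share one column."""
    # Split each number into the parts before and after the decimal point.
    parts = [str(value).partition(".") for value in values]
    left = max(len(whole) for whole, _, _ in parts)
    right = max(len(dot + frac) for _, dot, frac in parts)
    return [whole.rjust(left) + (dot + frac).ljust(right)
            for whole, dot, frac in parts]


print([to_number(s) for s in ["1.0", "42", "forty two"]])
# prints [1.0, 42, 'forty two']

for line in align_decimal([1.234, 123.4, 12.34]):
    print(line.rstrip())
# prints:
#   1.234
# 123.4
#  12.34
```

Padding the integer and fractional parts separately is what keeps the decimal points in one column; a plain right alignment would line up the last digits instead.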

Things tabulate doesn't do and probably never will:

  • create tables with cell and row spans; it is designed to print regular tabular data, not to be a replacement for publishing software
  • parse tables; I want tabulate to remain a small single-purpose library; if I ever need to parse text tables, that would be a separate project

Things which tabulate will probably get in future versions:

  • More output formats (Vim tables, LaTeX tables)
  • Multi-line rows
  • Different heuristics for deciding on formatting. Currently tabulate formats data on a per-column basis; per-cell and per-row are also reasonable auto-formatting strategies.
  • Deal with empty cells or occasional textual data in an otherwise numeric column (this may be useful to print tournament cross-tables, for example).
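
To make the per-column strategy concrete, here is a minimal sketch of how a column type could be deduced. It is an assumption about the general approach, not tabulate's source: `cell_type`, `column_type` and the `GENERALITY` ranking are hypothetical names.

```python
def cell_type(value):
    """Classify a single cell as int, float or str."""
    if isinstance(value, int):
        return int
    if isinstance(value, float):
        return float
    # Anything else (typically a string) is tested by trying to convert it.
    for convert in (int, float):
        try:
            convert(value)
            return convert
        except (TypeError, ValueError):
            pass
    return str


# Ranking used to pick the most general type seen in a column.
GENERALITY = {int: 0, float: 1, str: 2}


def column_type(cells):
    """Return the most general type among the cells: int < float < str."""
    return max((cell_type(cell) for cell in cells),
               key=GENERALITY.get, default=int)


print(column_type([1, "2", 3]).__name__)      # prints int
print(column_type([1, "2.5", 3]).__name__)    # prints float
print(column_type([1, "n/a", 3]).__name__)    # prints str
```

The last call shows why occasional text in a numeric column is awkward under this strategy: a single textual cell turns the whole column into text, and the numbers lose their numeric alignment.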

Some implementation details:

  • tabulate.py is just one file, so it's easy to bundle with other projects,
  • it works in Python 3 too,
  • the MIT license is friendly.

There are some other Python pretty printers for tables:

  • PrettyTable requires setting up a pretty-printer object in object-oriented style. It goes against my habit of exploratory programming.
  • texttable requires object-oriented setup; I don't like its programming interface.
  • asciitable (now astropy.io.ascii) with a FixedWidth writer is not bad, but it doesn't do what I need by default, and it's overkill for me most of the time (and I didn't know about it when I wrote tabulate).

Decimal point alignment seems to be a unique feature of tabulate. Performance-wise, tabulate is much faster than PrettyTable and texttable, but slightly slower than asciitable.

Now I only need to port tabulate to Clojure :-)

For future updates about tabulate, see the #tabulate tag or subscribe to the tabulate Atom feed.