Write your own formatter

Like creating your own lexer, writing a new formatter for Pygments is straightforward.

A formatter is a class that is initialized with some keyword arguments (the formatter options) and that must provide a format() method. Additionally, a formatter should provide a get_style_defs() method that returns the style definitions from the style in a form usable for the formatter’s output format.


The most basic formatter shipped with Pygments is the NullFormatter. It just sends the value of a token to the output stream:

from pygments.formatter import Formatter

class NullFormatter(Formatter):
    def format(self, tokensource, outfile):
        for ttype, value in tokensource:
            # write the text of every token unchanged, ignoring its type
            outfile.write(value)

As you can see, the format() method is passed two parameters: tokensource and outfile. The first is an iterable of (token_type, value) tuples, the second a file-like object with a write() method.

Because the formatter is that basic, it doesn’t override the get_style_defs() method.
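To see this calling contract in isolation, here is a minimal sketch that runs without Pygments installed. EchoFormatter is a hypothetical stand-in (it does not derive from Formatter), and plain strings stand in for real token types; only the shape of the format() call is meant to match the description above.

```python
import io


class EchoFormatter:
    """Hypothetical stand-in with the same format() signature as a
    Pygments formatter. It is not a pygments.formatter.Formatter
    subclass, so the sketch runs with the standard library alone."""

    def format(self, tokensource, outfile):
        for ttype, value in tokensource:
            # the token type is ignored; only the text is written
            outfile.write(value)


# hand-written (token_type, value) pairs; the strings are placeholders
# for real token types such as Token.Keyword
tokens = [
    ("Keyword", "print"),
    ("Punctuation", "("),
    ("String", '"hi"'),
    ("Punctuation", ")"),
]

outfile = io.StringIO()  # any object with a write() method will do
EchoFormatter().format(iter(tokens), outfile)
result = outfile.getvalue()
```

Concatenating the values reproduces the original source text, which is all a formatter of this kind does. With Pygments installed, the token stream would normally come from a lexer rather than being written by hand.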


Styles

Styles aren’t instantiated, but their metaclass provides some class functions so that you can access the style definitions easily.

Styles are iterable and yield tuples in the form (ttype, d) where ttype is a token and d is a dict with the following keys:

'color': Hexadecimal color value (e.g. 'ff0000' for red) or None if not defined.
'bold': True if the value should be bold.
'italic': True if the value should be italic.
'underline': True if the value should be underlined.
'bgcolor': Hexadecimal color value for the background (e.g. 'eeeeee' for light gray) or None if not defined.
'border': Hexadecimal color value for the border (e.g. '0000aa' for a dark blue) or None for no border.

Additional keys might appear in the future; formatters should ignore all keys they don’t support.
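As an illustration of reading such a dict defensively, the sketch below turns one style dict into a pair of ANSI terminal escape strings. The function name ansi_wrapper and the sample dict are made up for this example, and it assumes colors arrive as six-digit 'RRGGBB' strings. It reads keys with d.get(), so keys it does not know about are simply ignored.

```python
def ansi_wrapper(d):
    """Return a (start, end) pair of ANSI escape strings for one
    style dict. Hypothetical helper, not part of Pygments."""
    codes = []
    color = d.get("color")
    if color:
        # 'RRGGBB' -> three decimal components for a 24-bit foreground
        r, g, b = (int(color[i:i + 2], 16) for i in (0, 2, 4))
        codes.append("38;2;%d;%d;%d" % (r, g, b))
    if d.get("bold"):
        codes.append("1")
    if d.get("italic"):
        codes.append("3")
    if d.get("underline"):
        codes.append("4")
    if not codes:
        # nothing to style: no escape sequences at all
        return "", ""
    return "\x1b[" + ";".join(codes) + "m", "\x1b[0m"


# hand-written sample in the shape described above, plus one key this
# helper does not know about
sample = {
    "color": "ff0000",
    "bold": True,
    "italic": False,
    "underline": False,
    "bgcolor": None,
    "border": None,
    "some_future_key": 123,
}
start, end = ansi_wrapper(sample)
```

The unknown key has no effect on the result, which is the behavior the paragraph above asks of formatters.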

HTML 3.2 Formatter

For a more complex example, let’s implement an HTML 3.2 formatter. We don’t use CSS but inline markup (<u>, <font>, etc.). Because this isn’t good style, this formatter isn’t in the standard library ;-)

from pygments.formatter import Formatter

class OldHtmlFormatter(Formatter):

    def __init__(self, **options):
        Formatter.__init__(self, **options)

        # create a dict of (start, end) tuples that wrap the
        # value of a token so that we can use it in the format
        # method later
        self.styles = {}

        # iterating over the style class yields (token, style)
        # pairs, where style holds the parsed style values.
        for token, style in self.style:
            start = end = ''
            # style is a dict with keys such as 'color', 'bold',
            # 'italic' and 'underline';
            # colors are readily specified in hex: 'RRGGBB'
            if style['color']:
                start += '<font color="#%s">' % style['color']
                end = '</font>' + end
            if style['bold']:
                start += '<b>'
                end = '</b>' + end
            if style['italic']:
                start += '<i>'
                end = '</i>' + end
            if style['underline']:
                start += '<u>'
                end = '</u>' + end
            self.styles[token] = (start, end)

    def format(self, tokensource, outfile):
        # lastval is a string we use for caching
        # because it's possible that a lexer yields a number
        # of consecutive tokens with the same token type.
        # to minimize the size of the generated html markup we
        # try to join the values of same-type tokens here
        lastval = ''
        lasttype = None

        # wrap the whole output with <pre>
        outfile.write('<pre>')

        for ttype, value in tokensource:
            # if the token type doesn't exist in the stylemap
            # we try it with the parent of the token type
            # eg: parent of Token.Literal.String.Double is
            # Token.Literal.String
            while ttype not in self.styles:
                ttype = ttype.parent
            if ttype == lasttype:
                # the current token type is the same as in the
                # last iteration. cache it
                lastval += value
            else:
                # not the same token as last iteration, but we
                # have some data in the buffer. wrap it with the
                # defined style and write it to the output file
                if lastval:
                    stylebegin, styleend = self.styles[lasttype]
                    outfile.write(stylebegin + lastval + styleend)
                # set lastval/lasttype to current values
                lastval = value
                lasttype = ttype

        # if something is left in the buffer, write it to the
        # output file, then close the opened <pre> tag
        if lastval:
            stylebegin, styleend = self.styles[lasttype]
            outfile.write(stylebegin + lastval + styleend)
        outfile.write('</pre>\n')

The comments should explain it. Again, this formatter doesn’t override the get_style_defs() method. If we had used CSS classes instead of inline HTML markup, we would need to generate the CSS first. The get_style_defs() method exists for that purpose.
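Returning to the buffering in format(): joining consecutive tokens of the same type can also be expressed with itertools.groupby from the standard library. This is an alternative sketch, not the approach used in the formatter above. The name merge_runs is made up for this example, plain strings stand in for token types, and the fallback to parent token types is left out.

```python
from itertools import groupby
from operator import itemgetter


def merge_runs(tokensource):
    """Yield (ttype, text) pairs in which the values of consecutive
    tokens with the same type have been joined into one string."""
    for ttype, group in groupby(tokensource, key=itemgetter(0)):
        yield ttype, "".join(value for _, value in group)


# strings are placeholders for real token types
tokens = [
    ("Name", "foo"),
    ("Text", " "),
    ("Text", " "),
    ("Name", "bar"),
    ("Name", "baz"),
]
merged = list(merge_runs(tokens))
```

A format() method built on this helper would write one wrapped chunk per merged pair. Note that groupby only merges adjacent items, so "foo" stays separate from the later run of Name tokens.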

Generating Style Definitions

Some formatters, like the LatexFormatter and the HtmlFormatter, don’t output inline markup but reference either macros or CSS classes. Because the definitions of those are not part of the output, the get_style_defs() method exists. It is passed one parameter (whether and how it is used is up to the formatter) and has to return a string or None.
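A minimal sketch of what such a method might look like for a CSS-based formatter follows. CssSketchFormatter is hypothetical and does not derive from Formatter, so it runs without Pygments; the style data and the mapping from token types to CSS class names are hand-written assumptions. Here the single parameter is interpreted as an optional selector prefix, which is one possible choice among many.

```python
class CssSketchFormatter:
    """Hypothetical stand-in, not a real Pygments class."""

    def __init__(self, style_items, class_names):
        # style_items: iterable of (ttype, d) pairs in the shape a
        #              style yields when iterated
        # class_names: assumed mapping from ttype to a CSS class name
        self.style_items = list(style_items)
        self.class_names = class_names

    def get_style_defs(self, arg=""):
        """Return the CSS rules as one string. `arg` is treated as an
        optional selector prefix such as '.highlight'."""
        prefix = arg + " " if arg else ""
        rules = []
        for ttype, d in self.style_items:
            decls = []
            if d.get("color"):
                decls.append("color: #%s" % d["color"])
            if d.get("bold"):
                decls.append("font-weight: bold")
            if d.get("italic"):
                decls.append("font-style: italic")
            if d.get("underline"):
                decls.append("text-decoration: underline")
            name = self.class_names.get(ttype)
            # skip token types without a class name or without styling
            if name and decls:
                rules.append("%s.%s { %s }" % (prefix, name, "; ".join(decls)))
        return "\n".join(rules)


fmt = CssSketchFormatter(
    [
        ("Keyword", {"color": "008000", "bold": True}),
        ("Comment", {"color": "408080", "italic": True}),
        ("Text", {}),
    ],
    {"Keyword": "k", "Comment": "c", "Text": ""},
)
css = fmt.get_style_defs(".highlight")
```

The returned string would be written into a stylesheet once, while format() would emit only markup that references the class names.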