Filters¶
Added in version 0.7.
Transforming a stream of tokens into another stream is called “filtering” and is done by filters. The most common filters transform each token by applying a simple rule, such as highlighting the token if it is a TODO or another special word, or converting keywords to uppercase to enforce a style guide. More complex filters can transform the whole stream of tokens, for example removing the line indentation or merging tokens together. Note that Pygments filters are entirely unrelated to Python’s built-in filter().
An arbitrary number of filters can be applied to token streams coming from lexers to improve or annotate the output. To apply a filter, you can use the add_filter() method of a lexer:
>>> from pygments.lexers import PythonLexer
>>> l = PythonLexer()
>>> # add a filter given by a string and options
>>> l.add_filter('codetagify', codetags=['TODO', 'FIXME', 'XXX'])
>>> l.filters
[<pygments.filters.CodeTagFilter object at 0xb785decc>]
>>> from pygments.filters import KeywordCaseFilter
>>> # or give an instance
>>> l.add_filter(KeywordCaseFilter(case='lower'))
The add_filter() method takes keyword arguments which are forwarded to the constructor of the filter.
To get a list of all registered filters by name, you can use the get_all_filters() function from the pygments.filters module, which returns an iterable of the names of all known filters.
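For example, this prints the name of every registered filter (output omitted here; the exact set depends on your Pygments version):
>>> from pygments.filters import get_all_filters
>>> for name in sorted(get_all_filters()):
...     print(name)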
If you want to write your own filter, have a look at Write your own filter.
Builtin Filters¶
- class CodeTagFilter¶
- Name:
codetagify
Highlight special code tags in comments and docstrings.
Options accepted:
- codetags : list of strings
A list of strings that are flagged as code tags. The default is to highlight XXX, TODO, FIXME, BUG and NOTE.
Changed in version 2.13: Now recognizes FIXME by default.
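A minimal usage sketch that extends the default tag set (the extra tag 'HACK' is just a placeholder):
from pygments.lexers import PythonLexer
from pygments.filters import CodeTagFilter
lexer = PythonLexer()
# Flag the default tags plus a project-specific 'HACK' marker.
lexer.add_filter(CodeTagFilter(codetags=['XXX', 'TODO', 'FIXME', 'BUG', 'NOTE', 'HACK']))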
- class KeywordCaseFilter¶
- Name:
keywordcase
Convert keywords to lowercase or uppercase, or capitalize them (first letter uppercase, rest lowercase).
This can be useful, e.g., if you highlight Pascal code and want to adapt the code to your style guide.
Options accepted:
- case : string
The casing to convert keywords to. Must be one of 'lower', 'upper' or 'capitalize'. The default is 'lower'.
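For example, to render every keyword in uppercase (a minimal sketch using the Python lexer; any lexer that emits keyword tokens works):
from pygments.lexers import PythonLexer
lexer = PythonLexer()
# Keywords such as def, class and if are output as DEF, CLASS and IF.
lexer.add_filter('keywordcase', case='upper')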
- class NameHighlightFilter¶
- Name:
highlight
Highlight a normal Name (and Name.*) token with a different token type.
Example:
filter = NameHighlightFilter(
    names=['foo', 'bar', 'baz'],
    tokentype=Name.Function,
)
This would highlight the names “foo”, “bar” and “baz” as functions. Name.Function is the default token type.
Options accepted:
- names : list of strings
A list of names that should be given the different token type. There is no default.
- tokentype : TokenType or string
A token type or a string containing a token type name that is used for highlighting the strings in names. The default is Name.Function.
- class RaiseOnErrorTokenFilter¶
- Name:
raiseonerror
Raise an exception when the lexer generates an error token.
Options accepted:
- excclass : Exception class
The exception class to raise. The default is pygments.filters.ErrorToken.
Added in version 0.8.
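A sketch of how this can be used to reject input the lexer cannot tokenize cleanly (raising ValueError instead of the default exception is just one possible choice):
from pygments.lexers import PythonLexer
from pygments.filters import RaiseOnErrorTokenFilter
lexer = PythonLexer()
# Iterating over lexer.get_tokens(...) now raises ValueError on the first error token.
lexer.add_filter(RaiseOnErrorTokenFilter(excclass=ValueError))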
- class VisibleWhitespaceFilter¶
- Name:
whitespace
Convert tabs, newlines and/or spaces to visible characters.
Options accepted:
- spaces : string or bool
If this is a one-character string, spaces will be replaced by this string. If it is another true value, spaces will be replaced by · (unicode MIDDLE DOT). If it is a false value, spaces will not be replaced. The default is False.
- tabs : string or bool
The same as for spaces, but the default replacement character is » (unicode RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK). The default value is False. Note: this will not work if the tabsize option for the lexer is nonzero, as tabs will already have been expanded then.
- tabsize : int
If tabs are to be replaced by this filter (see the tabs option), this is the total number of characters that a tab should be expanded to. The default is 8.
- newlines : string or bool
The same as for spaces, but the default replacement character is ¶ (unicode PILCROW SIGN). The default value is False.
- wstokentype : bool
If true, give whitespace the special Whitespace token type. This allows styling the visible whitespace differently (e.g. greyed out), but it can disrupt background colors. The default is True.
Added in version 0.8.
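For example, to make spaces and tabs visible while leaving newlines untouched (a minimal sketch; the option values are just one possible combination):
from pygments.lexers import PythonLexer
lexer = PythonLexer()
# Spaces become '·' and tabs become '»'; newlines are left alone.
lexer.add_filter('whitespace', spaces=True, tabs=True, newlines=False)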
- class GobbleFilter¶
- Name:
gobble
Gobbles source code lines (eats initial characters).
This filter drops the first n characters off every line of code. This may be useful when the source code fed to the lexer is indented by a fixed amount of space that isn’t desired in the output.
Options accepted:
- n : int
The number of characters to gobble.
Added in version 1.2.
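For example, if every input line carries four characters of unwanted indentation (the value 4 is only illustrative):
from pygments.lexers import PythonLexer
lexer = PythonLexer()
# Drop the first four characters of every line before highlighting.
lexer.add_filter('gobble', n=4)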
- class TokenMergeFilter¶
- Name:
tokenmerge
Merges consecutive tokens with the same token type in the output stream of a lexer.
Added in version 1.2.
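The filter takes no options, so registering it by name is enough (a minimal sketch):
from pygments.lexers import PythonLexer
lexer = PythonLexer()
# Consecutive tokens of the same type are merged into a single token.
lexer.add_filter('tokenmerge')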
- class SymbolFilter¶
- Name:
symbols
Convert mathematical symbols such as \<longrightarrow> in Isabelle or \longrightarrow in LaTeX into Unicode characters.
This is mostly useful for HTML or console output when you want to approximate the source rendering you’d see in an IDE.
Options accepted:
- lang : string
The symbol language. Must be one of 'isabelle' or 'latex'. The default is 'isabelle'.
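For example, to convert LaTeX commands into their Unicode symbols when highlighting TeX source (a minimal sketch using Pygments’ TexLexer):
from pygments.lexers import TexLexer
lexer = TexLexer()
# Commands such as \longrightarrow are replaced by the corresponding Unicode character.
lexer.add_filter('symbols', lang='latex')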