Builtin Tokens¶
In the pygments.token
module, there is a special object called Token
that is used to create token types.
You can create a new token type by accessing an attribute of Token whose name starts with an uppercase letter:
>>> from pygments.token import Token
>>> Token.String
Token.String
>>> Token.String is Token.String
True
Note that tokens are singletons, so you can use the is operator to compare token types.
You can also use the in operator to perform set tests:
>>> from pygments.token import Comment
>>> Comment.Single in Comment
True
>>> Comment in Comment.Multi
False
This can be useful in filters and if you write lexers on your own without using the base lexers.
You can also split a token type into a hierarchy, and get the parent of it:
>>> from pygments.token import String
>>> String.split()
[Token, Token.Literal, Token.Literal.String]
>>> String.parent
Token.Literal
In principle, you can create an unlimited number of token types, but nobody can guarantee that a style will define rules for every token type. Because of that, Pygments proposes a set of standard token types, defined in the pygments.token.STANDARD_TYPES dict.
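The dict maps each standard token type to the short name that formatters use, for example as a CSS class in HTML output:

```python
from pygments.token import STANDARD_TYPES, Token

# Each standard token type has a short name used by formatters
# (e.g. as a CSS class in HTML output).
print(STANDARD_TYPES[Token.Keyword])         # k
print(STANDARD_TYPES[Token.Literal.String])  # s
```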
For some tokens, aliases are already defined:
>>> from pygments.token import String
>>> String
Token.Literal.String
Inside the pygments.token module the following aliases are defined:

Text        | Token.Text            | for any type of text data
Whitespace  | Token.Text.Whitespace | for whitespace
Error       | Token.Error           | represents lexer errors
Other       | Token.Other           | special token for data not matched by a parser (e.g. HTML markup in PHP code)
Keyword     | Token.Keyword         | any kind of keywords
Name        | Token.Name            | variable/function names
Literal     | Token.Literal         | any literals
String      | Token.Literal.String  | string literals
Number      | Token.Literal.Number  | number literals
Operator    | Token.Operator        | operators (+, not, ...)
Punctuation | Token.Punctuation     | punctuation ([, (, ...)
Comment     | Token.Comment         | any kind of comments
Generic     | Token.Generic         | generic tokens (have a look at the explanation below)
Normally you just create token types using the already defined aliases. For each of those token aliases, a number of subtypes exist (excluding the special tokens Token.Text, Token.Error and Token.Other).
It’s also possible to convert strings to token types (for example if you want to supply a token from the command line):
>>> from pygments.token import String, string_to_tokentype
>>> string_to_tokentype("String")
Token.Literal.String
>>> string_to_tokentype("Token.Literal.String")
Token.Literal.String
>>> string_to_tokentype(String)
Token.Literal.String
Keyword Tokens¶
- Keyword
For any kind of keyword (especially if it doesn't match any of the subtypes, of course).
- Keyword.Constant
For keywords that are constants (e.g. None in future Python versions).
- Keyword.Declaration
For keywords used for variable declaration (e.g. var in some programming languages like JavaScript).
- Keyword.Namespace
For keywords used for namespace declarations (e.g. import in Python and Java and package in Java).
- Keyword.Pseudo
For keywords that aren't really keywords (e.g. None in old Python versions).
- Keyword.Reserved
For reserved keywords.
- Keyword.Type
For builtin types that can't be used as identifiers (e.g. int, char etc. in C).
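As an illustration, a toy lexer (not part of Pygments) might assign these keyword subtypes in its rules like this:

```python
from pygments.lexer import RegexLexer
from pygments.token import Keyword, Name, Whitespace


# A toy lexer, invented for this example, showing how Keyword
# subtypes are typically assigned in RegexLexer rules.
class ToyLexer(RegexLexer):
    name = 'Toy'
    tokens = {
        'root': [
            (r'\s+', Whitespace),
            (r'\b(?:if|else|while)\b', Keyword),
            (r'\b(?:int|char)\b', Keyword.Type),
            (r'\w+', Name),
        ],
    }


for ttype, value in ToyLexer().get_tokens('if int x'):
    print(ttype, repr(value))
```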
Name Tokens¶
- Name
For any name (variable names, function names, classes).
- Name.Attribute
For all attributes (e.g. in HTML tags).
- Name.Builtin
Builtin names; names that are available in the global namespace.
- Name.Builtin.Pseudo
Builtin names that are implicit (e.g. self in Ruby, this in Java).
- Name.Class
Class names. Because no lexer can know if a name is a class or a function or something else, this token is meant for class declarations.
- Name.Constant
Token type for constants. In some languages you can recognise a token by the way it's defined (the value after a const keyword for example). In other languages constants are uppercase by definition (Ruby).
- Name.Decorator
Token type for decorators. Decorators are syntactic elements in the Python language. Similar syntax elements exist in C# and Java.
- Name.Entity
Token type for special entities (e.g. &nbsp; in HTML).
- Name.Exception
Token type for exception names (e.g. RuntimeError in Python). Some languages define exceptions in the function signature (Java). You can highlight the name of that exception using this token then.
- Name.Function
Token type for function names.
- Name.Function.Magic
Same as Name.Function but for special function names that have an implicit use in a language (e.g. the __init__ method in Python).
- Name.Label
Token type for label names (e.g. in languages that support goto).
- Name.Namespace
Token type for namespaces (e.g. import paths in Java/Python), names following the module/namespace keyword in other languages.
- Name.Other
Other names. Normally unused.
- Name.Property
Additional token type occasionally used for class attributes.
- Name.Tag
Tag names (in HTML/XML markup or configuration files).
- Name.Variable
Token type for variables. Some languages have prefixes for variable names (PHP, Ruby, Perl). You can highlight them using this token.
- Name.Variable.Class
Same as Name.Variable but for class variables (also static variables).
- Name.Variable.Global
Same as Name.Variable but for global variables (used in Ruby, for example).
- Name.Variable.Instance
Same as Name.Variable but for instance variables.
- Name.Variable.Magic
Same as Name.Variable but for special variable names that have an implicit use in a language (e.g. __doc__ in Python).
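To see a few of these Name subtypes in practice, you can tokenize a small snippet with the Python lexer (the exact token types assumed here reflect the current PythonLexer rules):

```python
from pygments.lexers import PythonLexer
from pygments.token import Name

# print and len are builtins, so they should come out as Name.Builtin;
# a plain identifier like items comes out as Name.
code = "print(len(items))"
for ttype, value in PythonLexer().get_tokens(code):
    if ttype in Name:  # matches Name and every Name.* subtype
        print(ttype, repr(value))
```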
Literals¶
- Literal
For any literal (if not further defined).
- Literal.Date
For date literals (e.g. 42d in Boo).
- String
For any string literal.
- String.Affix
Token type for affixes that further specify the type of the string they're attached to (e.g. the prefixes r and u8 in r"foo" and u8"foo").
- String.Backtick
Token type for strings enclosed in backticks.
- String.Char
Token type for single characters (e.g. Java, C).
- String.Delimiter
Token type for delimiting identifiers in "heredoc", raw and other similar strings (e.g. the word END in Perl code print <<'END';).
- String.Doc
Token type for documentation strings (for example Python).
- String.Double
Double quoted strings.
- String.Escape
Token type for escape sequences in strings.
- String.Heredoc
Token type for "heredoc" strings (e.g. in Ruby or Perl).
- String.Interpol
Token type for interpolated parts in strings (e.g. #{foo} in Ruby).
- String.Other
Token type for any other strings (for example %q{foo} string constructs in Ruby).
- String.Regex
Token type for regular expression literals (e.g. /foo/ in JavaScript).
- String.Single
Token type for single quoted strings.
- String.Symbol
Token type for symbols (e.g. :foo in LISP or Ruby).
- Number
Token type for any number literal.
- Number.Bin
Token type for binary literals (e.g. 0b101010).
- Number.Float
Token type for float literals (e.g. 42.0).
- Number.Hex
Token type for hexadecimal number literals (e.g. 0xdeadbeef).
- Number.Integer
Token type for integer literals (e.g. 42).
- Number.Integer.Long
Token type for long integer literals (e.g. 42L in Python).
- Number.Oct
Token type for octal literals.
Operators¶
- Operator
For any punctuation operator (e.g. +, -).
- Operator.Word
For any operator that is a word (e.g. not).
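For example, the Python lexer distinguishes symbol operators from word operators (a quick check, assuming the current PythonLexer rules):

```python
from pygments.lexers import PythonLexer
from pygments.token import Operator

# '=' should come out as Operator, 'not' as Operator.Word;
# "t in Operator" matches both, since Operator.Word is a subtype.
toks = list(PythonLexer().get_tokens("x = not y"))
print([(t, v) for t, v in toks if t in Operator])
```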
Punctuation¶
Added in version 0.7.
- Punctuation
For any punctuation which is not an operator (e.g. [, (, ...).
- Punctuation.Marker
For markers that point to a location (e.g., carets in Python tracebacks for syntax errors).
Added in version 2.10.
Generic Tokens¶
Generic tokens are for special lexers like the DiffLexer, which highlights not a programming language but a patch file.
- Generic
A generic, unstyled token. Normally you don’t use this token type.
- Generic.Deleted
Marks the token value as deleted.
- Generic.Emph
Marks the token value as emphasized.
- Generic.Error
Marks the token value as an error message.
- Generic.Heading
Marks the token value as a headline.
- Generic.Inserted
Marks the token value as inserted.
- Generic.Output
Marks the token value as program output (e.g. for the Python console lexer).
- Generic.Prompt
Marks the token value as a command prompt (e.g. in the bash lexer).
- Generic.Strong
Marks the token value as bold (e.g. for the reST lexer).
- Generic.EmphStrong
Marks the token value as bold and emphasized.
- Generic.Subheading
Marks the token value as a subheadline.
- Generic.Traceback
Marks the token value as a part of an error traceback.
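A quick way to see the Generic tokens in action is to run the DiffLexer over a small patch (the expected token types assume the current DiffLexer rules):

```python
from pygments.lexers import DiffLexer
from pygments.token import Generic

patch = "--- a/file\n+++ b/file\n@@ -1 +1 @@\n-old line\n+new line\n"
toks = list(DiffLexer().get_tokens(patch))

# Token types are singletons, so identity tests work here too.
inserted = [v for t, v in toks if t is Generic.Inserted]
deleted = [v for t, v in toks if t is Generic.Deleted]
print(inserted, deleted)
```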
Comments¶
- Comment
Token type for any comment.
- Comment.Hashbang
Token type for hashbang comments (i.e. first lines of files that start with #!).
- Comment.Multiline
Token type for multiline comments.
- Comment.Preproc
Token type for preprocessor comments (also <?php/<% constructs).
- Comment.PreprocFile
Token type for filenames in preprocessor comments, such as include files in C/C++.
- Comment.Single
Token type for comments that end at the end of a line (e.g. # foo).
- Comment.Special
Special data in comments. For example code tags, author and license information, etc.