.. _python:

======================================================================
The new Python parser
======================================================================

:Maintainer: Colomban Wendling <ban@herbesfolles.org>

Introduction
---------------------------------------------------------------------

The old Python parser was a line-oriented parser that grew far beyond
its capabilities. It ended up riddled with hacks and was easily fooled
by perfectly valid input. By design, it had particular trouble with
constructs spanning multiple lines, such as triple-quoted strings or
implicitly continued lines. Several less tricky constructs were also
mishandled. In addition, the handling of lexical constructs was
duplicated, and each copy evolved in its own direction, supporting
different features and having different bugs depending on where it
was used.

All this made it very hard to fix some existing bugs or to add new
features. To remedy this, the parser has been rewritten from scratch,
separating lexical analysis (generating tokens) from syntactic
analysis (understanding what the lexemes mean). Recognizing lexemes
now happens in a single location, which makes it consistent and easier
to extend with new lexemes. It also lightens the burden on the parsing
code, making it more concise, robust and clear.

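The separation of lexing from parsing can be illustrated with a minimal
sketch. This is not the ctags implementation: it borrows Python's
standard ``tokenize`` module as the lexer, and ``lex``,
``parse_definitions`` and ``SOURCE`` are hypothetical names chosen for
illustration. Because the lexer has already turned the triple-quoted
string and the continued parameter list into plain tokens, the parsing
step needs no special cases for either.

```python
import io
import tokenize


def lex(source):
    """Lexical analysis: turn source text into (type, text, line) tokens."""
    reader = io.StringIO(source).readline
    return [(tok.type, tok.string, tok.start[0])
            for tok in tokenize.generate_tokens(reader)]


def parse_definitions(tokens):
    """Syntactic analysis: collect the names that follow 'def' or 'class'."""
    tags = []
    for (kind, text, _), (next_kind, next_text, line) in zip(tokens, tokens[1:]):
        is_keyword = kind == tokenize.NAME and text in ("def", "class")
        if is_keyword and next_kind == tokenize.NAME:
            tags.append((next_text, text, line))
    return tags


# Illustrative input: a 'def' hidden inside a triple-quoted string, and
# a parameter list that continues implicitly onto the next line.
SOURCE = '''\
doc = """
def not_a_function(): pass
"""
class Widget:
    def resize(self,
               width, height):
        pass
'''

tags = parse_definitions(lex(SOURCE))
print(tags)
```
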
The rewrite made it fairly easy to fix all known bugs of the old
parser and to add many new features, including:

- Tagging function parameters
- Extraction of decorators
- Proper handling of semicolons
- Extracting multiple variables in a combined declaration
- More accurate support of mixed indentation
- Tagging local variables


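As an illustration, the following small Python file contains most of
the constructs listed above. It is only a sample input with
hypothetical names (``logged``, ``scale``, ``width``, and so on). Which
tags are actually emitted for it depends on the kinds and fields
enabled when running ctags, which this document does not specify.
Mixed indentation is left out because tabs and spaces are hard to show
reliably in a listing.

```python
import functools


def logged(func):
    """A trivial decorator, used only so that one appears below."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@logged                        # decorator
def scale(value, factor=2):    # function parameters: value, factor
    result = value * factor    # local variable
    return result


width, height = 640, 480       # several variables in one declaration
x = 1; y = 2                   # two statements separated by a semicolon
```
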
The new parser should be compatible with the old one.
