.. _python:

======================================================================
The new Python parser
======================================================================

:Maintainer: Colomban Wendling <ban@herbesfolles.org>

Introduction
---------------------------------------------------------------------

The old Python parser was a line-oriented parser that grew well beyond
what its design could support, and ended up riddled with hacks and
easily fooled by perfectly valid input.  By design, it had particular
trouble with constructs spanning multiple lines, such as triple-quoted
strings or implicitly continued lines.  Several less tricky constructs
were also mishandled.  In addition, the handling of lexical constructs
was duplicated, and each copy evolved in its own direction, supporting
different features and having different bugs depending on where it was
used.

All this made it very hard to fix existing bugs or to add new
features.  To remedy this, the parser has been rewritten from scratch,
separating lexical analysis (generating tokens) from syntactic
analysis (understanding what the lexemes mean).  Recognizing lexemes
now happens in a single location, which makes it consistent and easier
to extend with new lexemes.  It also lightens the burden on the parsing
code, making it more concise, robust and clear.
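
The benefit of this separation can be illustrated with a minimal sketch.
It is not the actual ctags implementation: it is written in Python, it
relies on the standard library ``tokenize`` module as a stand-in lexer,
and the names ``lex``, ``find_definitions`` and ``SAMPLE`` are
hypothetical.  The point is only that the second stage looks at tokens
and never at raw lines, so a triple-quoted string or a continued line
cannot fool it.

.. code-block:: python

    import io
    import tokenize


    def lex(source):
        """Lexical analysis: turn source text into (kind, text, line) tokens."""
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            yield tokenize.tok_name[tok.type], tok.string, tok.start[0]


    def find_definitions(source):
        """Syntactic analysis: decide what the tokens mean."""
        tags = []
        previous = None  # text of the previous token if it was a NAME
        for kind, text, line in lex(source):
            if kind == "NAME" and previous in ("def", "class"):
                tags.append((previous, text, line))
            previous = text if kind == "NAME" else None
        return tags


    SAMPLE = '''\
    doc = """
    def not_a_function(): pass
    """
    class Greeter:
        def greet(self,
                  name): return name
    '''

    print(find_definitions(SAMPLE))

Because the whole triple-quoted string reaches the second stage as a
single token, the ``def`` inside it is not reported.  Only ``Greeter``
and ``greet`` are, along with the lines where they are defined.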

This rewrite made it fairly easy to fix all known bugs of the old
parser and to add many new features, including:

- Tagging function parameters
- Extraction of decorators
- Proper handling of semicolons
- Extracting multiple variables in a combined declaration
- More accurate support of mixed indentation
- Tagging local variables

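As an illustration, the following hypothetical input exercises several
of the constructs listed above.  It is only a sample written for this
page, not a test case from the ctags sources, and which tags are
actually emitted for it depends on which tag kinds are enabled.

.. code-block:: python

    import functools


    def logged(func):                       # "func" is a function parameter
        @functools.wraps(func)              # a decorator applied to "wrapper"
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)  # "result" is a local variable
            return result
        return wrapper


    x = 1; y = 2                # two statements separated by a semicolon
    first, second = x, y        # several variables in a combined declaration


    @logged
    def add(a, b):
        return a + b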

The new parser is intended to be compatible with the old one.