.. ctags Internal API
.. ---------------------------------------------------------------------

.. _input-text-stream:

Input text stream
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. figure:: input-text-stream.svg
    :scale: 80%

Function prototypes for handling the input text stream are declared in
``main/read.h``. The file exists in Exuberant Ctags, too. However, the
names of the functions were changed when overhauling the
``--line-directive`` option. (In addition, macros were converted to
functions to make the data structures for the input text stream opaque.)

Ctags has 3 groups of functions for handling input: *input*, *bypass*, and
*raw*. Parser developers should use the input group. The other two
are for the ctags main part.


.. _inputFile:

`inputFile` type and the functions of input group
......................................................................

.. note:: The original version of this section was written
    before ``inputFile`` type and ``File`` variable were made private.

``inputFile`` is the type for representing the input file and stream for
a parser. It was declared in ``main/read.h`` but now it is defined in
``main/read.c``.

Ctags uses a file static variable ``File`` having type ``inputFile`` for
maintaining the input file and stream. ``File`` is also defined in
``main/read.c``, as ``inputFile`` is.

``fp`` and ``line`` are the essential fields of ``File``. ``fp`` has the
well-known type ``MIO``, declared in ``main/mio.h``. By calling functions of
the input group (``getcFromInputFile`` and ``readLineFromInputFile``), a
parser gets input text from ``fp``.

The functions of the input group update the fields ``input`` and ``source``
of the ``File`` variable. These two fields have type ``inputFileInfo``. They
are mainly for tracking the name of the file and the current line number.
Usually ctags uses only the ``input`` field. The ``source`` field is used
only when a ``#line`` directive is found in the current input text stream.

When a tool generates the input file from another file, the tool
can record the original source file in the generated file using
the ``#line`` directive. The ``source`` field is used for tracking/recording
the information that appears in ``#line`` directives.

Regex pattern matching is also done behind the scenes when the functions of
this group are called.


The functions of bypass group
......................................................................
The functions of the bypass group (``readLineFromBypass`` and
``readLineFromBypassSlow``) are used for reading text from the ``fp`` field
of the ``File`` static variable without updating the ``input`` and
``source`` fields of the ``File`` variable.


Parsers may not need the functions of this group. The functions are
used in the ctags main part, for example to make the pattern
fields of a tags file.


The functions of raw group
......................................................................
The functions of this group (``readLineRaw`` and ``readLineRawWithNoSeek``)
take a parameter having type ``MIO`` and don't touch the ``File`` static
variable.

Parsers may not need the functions of this group. The functions are
used in the ctags main part, for example to load option files.


.. NOT REVIEWED YET

.. _output-tag-stream:

Output tag stream
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. figure:: output-tag-stream.svg
    :scale: 80%

Ctags provides ``makeTagEntry`` to parsers as an entry point for writing
tag information to MIO. ``makeTagEntry`` calls ``writeTagEntry`` if the
parser does not set ``CORK_QUEUE`` to the ``useCork`` field.
``writeTagEntry`` calls ``writerWriteTag``.
``writerWriteTag`` just calls ``writeEntry`` of the writer backends.
The ``writerTable`` variable holds the four backends: ctagsWriter,
etagsWriter, xrefWriter, and jsonWriter.
One of them is chosen depending on the arguments passed to ctags.

If ``CORK_QUEUE`` is set to ``useCork``, the tag information goes to a queue
in memory. The queue is flushed when ``useCork`` is unset. See "`cork API`_"
for more details.

cork API
......................................................................

Background and Idea
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
*cork API* was introduced to make recording scope information easier.

Before the cork API was introduced, scope information had to be recorded as
strings. This is flexible, but memory management is required.
The following code is taken from ``clojure.c`` (with some modifications).

.. code-block:: c

    if (vStringLength (parent) > 0)
    {
        current.extensionFields.scope[0] = ClojureKinds[K_NAMESPACE].name;
        current.extensionFields.scope[1] = vStringValue (parent);
    }

    makeTagEntry (&current);

``parent`` is a vString, and ``scope [0]`` and ``scope [1]`` point at
strings. The parser must manage their life cycles; the parser cannot free
them until the tag referring to them via its scope fields is emitted, and
must free them after emitting.

The cork API provides a more solid way to hold scope information. The cork
API expects that ``parent``, which represents the scope of the tag
(``current``) the parser is currently dealing with, is recorded to a *tags*
file before the ``current`` tag is recorded via the ``makeTagEntry``
function.

For passing the information about ``parent`` to ``makeTagEntry``, a
``tagEntryInfo`` object was created. It was used just for recording, and
freed after recording. With the cork API, it is not freed after recording;
a parser can reuse it as scope information.

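
The idea can be pictured with a toy model. The following is only a sketch
and not the ctags implementation: every ``toy``-prefixed name is
hypothetical, the fixed-size array stands in for the real queue, and the
output format is made up for illustration. It shows the one property the
cork API relies on: a queued entry stays alive, so a later entry can refer
to it by an integer handle instead of by copied strings.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    #define TOY_CORK_NIL  0     /* hypothetical stand-in for CORK_NIL */
    #define TOY_QUEUE_MAX 16

    /* Hypothetical, heavily reduced stand-in for tagEntryInfo. */
    struct toyTag {
        const char *name;
        int scopeIndex;         /* handle of the parent, or TOY_CORK_NIL */
    };

    static struct toyTag toyQueue[TOY_QUEUE_MAX];
    static int toyCount = 1;    /* slot 0 is reserved so 0 can mean "no scope" */

    /* Queue a tag instead of writing it; return its index as a handle. */
    static int toyMakeTagEntry (const char *name, int scopeIndex)
    {
        assert (toyCount < TOY_QUEUE_MAX);
        assert (scopeIndex >= TOY_CORK_NIL && scopeIndex < toyCount);
        toyQueue[toyCount].name = name;
        toyQueue[toyCount].scopeIndex = scopeIndex;
        return toyCount++;
    }

    /* Render one queued entry; the scope is looked up through the handle. */
    static void toyFormatEntry (int index, char *buf, size_t size)
    {
        const struct toyTag *tag = &toyQueue[index];

        if (tag->scopeIndex == TOY_CORK_NIL)
            snprintf (buf, size, "%s", tag->name);
        else
            snprintf (buf, size, "%s\tscope:%s",
                      tag->name, toyQueue[tag->scopeIndex].name);
    }

In this model the parser never owns a scope string: it keeps the integer
returned for the parent and hands it to the child entry, and the scope text
is produced only when the queue is rendered.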

How to use
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See the commit titled "`clojure: use cork <https://github.com/universal-ctags/ctags/commit/ef181e6>`_",
in which I applied the cork API to the Clojure parser.

The cork API can be enabled and disabled per parser,
and is disabled by default. So there is no impact until you
enable it in your parser.

The ``useCork`` field is introduced in the ``parserDefinition`` type:

.. code-block:: c

    typedef struct {
        ...
        unsigned int useCork;
        ...
    } parserDefinition;

Set ``CORK_QUEUE`` to ``useCork`` like:

.. code-block:: c

    extern parserDefinition *ClojureParser (void)
    {
        ...
        parserDefinition *def = parserNew ("Clojure");
        ...
        def->useCork = CORK_QUEUE;
        return def;
    }

When ctags runs a parser whose ``useCork`` is ``CORK_QUEUE``, all output
requested by calling ``makeTagEntry`` is stored in an internal
queue, not in the ``tags`` file. When parsing of an input file is done, the
tag information stored in the queue is automatically flushed to the
``tags`` file in a batch.

When ``makeTagEntry`` is called with a ``tagEntryInfo`` object (``parent``),
it returns an integer. The integer can be used as a handle for referring to
the object after the call.


.. code-block:: c

    int parent = CORK_NIL;
    ...
    parent = makeTagEntry (&e);

The handle can be used by assigning it to the ``scopeIndex``
field of the ``current`` tag, which is in the scope of ``parent``.

.. code-block:: c

    current.extensionFields.scopeIndex = parent;

When ``current`` is passed to ``makeTagEntry``, the ``scopeIndex`` is
referred to for emitting the scope information of ``current``.

``scopeIndex`` must be set to ``CORK_NIL`` if a tag is not in any scope.
When using ``scopeIndex`` of ``current``, ``NULL`` must be assigned to both
``current.extensionFields.scope[0]`` and
``current.extensionFields.scope[1]``. The ``initTagEntry`` function does this
initialization internally, so you generally don't have to write
the initialization explicitly.

Automatic full qualified tag generation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If a parser uses the cork API for recording and emitting scope
information, ctags can reuse it for generating *full qualified (FQ)
tags*. Set the ``requestAutomaticFQTag`` field of ``parserDefinition`` to
``TRUE``; then the main part of ctags emits FQ tags on behalf of the parser
if ``--extras=+q`` is given.

An example can be found in the DTS parser:

.. code-block:: c

    extern parserDefinition* DTSParser (void)
    {
        static const char *const extensions [] = { "dts", "dtsi", NULL };
        parserDefinition* const def = parserNew ("DTS");
        ...
        def->requestAutomaticFQTag = TRUE;
        return def;
    }

Setting ``requestAutomaticFQTag`` to ``TRUE`` implies setting
``useCork`` to ``CORK_QUEUE``.

.. NOT REVIEWED YET

.. _symtabAPI:

symbol table API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The *symbol table* API is an extension to the cork API. The cork API was
introduced to provide a simple way to represent the mapping (*forward
mapping*) from a language object (*child object*) to its upper scope
(*parent object*). The *symbol table* API is for representing the mapping
in the opposite direction (*reverse mapping*); you can look up (or traverse)
child tags defined (or used) in a given tag.

To use this API, a parser must set ``CORK_SYMTAB`` to the ``useCork`` member
of ``parserDefinition`` in addition to setting ``CORK_QUEUE`` as preparation.

An example taken from the R parser:

.. code-block:: c

    extern parserDefinition *RParser (void)
    {
        static const char *const extensions[] = { "r", "R", "s", "q", NULL };
        parserDefinition *const def = parserNew ("R");

        ...

        def->useCork = CORK_QUEUE | CORK_SYMTAB;

        ...

        return def;
    }


To install a reverse mapping between a parent and its child tags,
call ``registerEntry`` with the cork index of a child after making
the child tag with its ``scopeIndex`` filled in:

.. code-block:: c

    int parent = CORK_NIL;
    ...
    parent = makeTagEntry (&e_parent);

    ...

    tagEntryInfo e_child;
    ...
    initTagEntry (&e_child, ...);
    e_child.extensionFields.scopeIndex = parent; /* setting up forward mapping */
    ...
    int child = makeTagEntry (&e_child);

    registerEntry (child); /* setting up reverse mapping */

``registerEntry`` stores ``child`` in the symbol table of ``parent``.
If ``scopeIndex`` of ``child`` is ``CORK_NIL``, the ``child`` is stored
in the *toplevel scope*.

``unregisterEntry`` is for clearing (and updating) the reverse mapping
of a child. Consider the case where you want to change the scope of
``child`` to ``newParent``.

.. code-block:: c

    unregisterEntry (child); /* delete the reverse mapping. */
    tagEntryInfo *e_child = getEntryInCorkQueue (child);
    e_child->extensionFields.scopeIndex = newParent; /* update the forward mapping. */
    registerEntry (child); /* set the new reverse mapping. */

``foreachEntriesInScope`` is the function for traversing all child
tags stored in the parent tag specified with ``corkIndex``.
If the ``corkIndex`` is ``CORK_NIL``, the children defined (and/or
used) in the *toplevel scope* are traversed.

.. code-block:: c

    typedef bool (* entryForeachFunc) (int corkIndex,
                                       tagEntryInfo * entry,
                                       void * data);
    bool foreachEntriesInScope (int corkIndex,
                                const char *name, /* or NULL */
                                entryForeachFunc func,
                                void *data);

``foreachEntriesInScope`` takes an ``entryForeachFunc`` typed
callback function. ``foreachEntriesInScope`` passes the cork
index and a pointer to the ``tagEntryInfo`` object of each child to the
callback.

``anyEntryInScope`` is a function for finding a child tag stored
in the parent tag specified with ``corkIndex``. It returns
the cork index of the child tag. If ``corkIndex`` is ``CORK_NIL``,
``anyEntryInScope`` finds a tag stored in the toplevel scope.
The returned child tag has ``name`` as its name as long as ``name``
is not ``NULL``.

.. code-block:: c

    int anyEntryInScope (int corkIndex,
                         const char *name,
                         bool onlyDefinitionTag);


.. _tokeninfo:

tokenInfo API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In Exuberant Ctags, a developer can write a parser in any way; only the
input stream and the tagEntryInfo data structure are given.

However, while maintaining Universal Ctags I (Masatake YAMATO) came to think
we should have a framework for writing parsers. Of course the framework
is optional; you can still write a parser without the framework.

To design a framework, I have studied how @b4n (Colomban Wendling)
writes parsers. The tokenInfo API is the first fruit of my study.

TBW

Multiple parsers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _promiseAPI:

Guest parser (promise API)
......................................................................

See ":ref:`host-guest-parsers`" about the concept of guest parsers.

Background and Idea
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
More than one programming language can be used in one input text stream.
*promise API* allows a host parser to run a :ref:`guest parser
<host-guest-parsers>` in a specified area of the input text stream.

For example, code written in the C language (C code) can be embedded
in code written in the Yacc language (Yacc code). Let's think about this
input stream.

.. code-block:: yacc

    /* foo.y */
    %token
        END_OF_FILE 0
        ERROR       255
        BELL        1

    %{
    /* C language */
    int counter;
    %}
    %right EQUALS
    %left PLUS MINUS
    ...
    %%
    CfgFile     :   CfgEntryList
                    { InterpretConfigs($1); }
                ;

    ...
    %%
    int
    yyerror(char *s)
    {
        (void)fprintf(stderr,"%s: line %d of %s\n",s,lineNum,
                      (scanFile?scanFile:"(unknown)"));
        if (scanStr)
            (void)fprintf(stderr,"last scanned symbol is: %s\n",scanStr);
        return 1;
    }

In the input, the area from ``%{`` to ``%}`` and the area from
the second ``%%`` to the end of the file are written in C. Yacc can be
called the *host language*, and C can be called the *guest language*.

Ctags may choose the Yacc parser for the input. However, the parser
doesn't know about C syntax. Implementing a C parser in the Yacc parser
is one approach. However, ctags already has a C parser. The Yacc
parser should utilize the existing C parser. The promise API allows this.

See also ":ref:`host-guest-parsers`" for more about the concept and examples
of the guest parser.

Usage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See the commit titled "`Yacc: run C parser in the areas where code
is written in C <https://github.com/universal-ctags/ctags/commit/757673f>`_",
in which I applied the promise API to the Yacc parser.

The parser for the host language must track and record the ``start`` and the
``end`` of a guest language area. Each of ``start`` and ``end`` is
represented as a pair of ``line number`` and ``byte offset``. When the
``start`` and ``end`` are fixed, call ``makePromise`` with (1) the guest
parser name, (2) ``start``, and (3) ``end``. (This description is somewhat
simplified compared to the real usage.)


Let's see the actual code from "`parsers/yacc.c
<https://github.com/universal-ctags/ctags/blob/master/parsers/yacc.c>`_".

.. code-block:: c

    struct cStart {
        unsigned long input;
        unsigned long source;
    };

Both fields are for recording ``start``. The ``input`` field
is for recording the value returned from ``getInputLineNumber``.
``source`` is for ``getSourceLineNumber``. See "`inputFile`_" for the
difference between the two.

``enter_c_prologue``, shown next, is a function called when ``%{`` is
found in the current input text stream. Remember, in yacc syntax, ``%{``
is a marker of a C code area.

.. code-block:: c

    static void enter_c_prologue (const char *line CTAGS_ATTR_UNUSED,
                                  const regexMatch *matches CTAGS_ATTR_UNUSED,
                                  unsigned int count CTAGS_ATTR_UNUSED,
                                  void *data)
    {
        struct cStart *cstart = data;


        readLineFromInputFile ();
        cstart->input  = getInputLineNumber ();
        cstart->source = getSourceLineNumber ();
    }


The function just records the start line. It calls
``readLineFromInputFile`` because the C code may start on the line after
the line where the marker is.

``leave_c_prologue``, shown next, is a function called when ``%}``,
the end marker of a C code area, is found in the current input text stream.

.. code-block:: c

    static void leave_c_prologue (const char *line CTAGS_ATTR_UNUSED,
                                  const regexMatch *matches CTAGS_ATTR_UNUSED,
                                  unsigned int count CTAGS_ATTR_UNUSED,
                                  void *data)
    {
        struct cStart *cstart = data;
        unsigned long c_end;

        c_end = getInputLineNumber ();
        makePromise ("C", cstart->input, 0, c_end, 0, cstart->source);
    }

After recording the line number of the end of the C code area,
``leave_c_prologue`` calls ``makePromise``.

Of course ``"C"`` stands for the C language, the name of the guest parser.
Available parser names can be listed by running ctags with the
``--list-languages`` option. In this example two ``0`` values are passed as
the 3rd and 5th arguments. They are the byte offsets of the start and the end
of the C language area from the beginning of the line, which is 0 in this
case. In general, the guest language's section does not have to start at the
beginning of the line, in which case the two offsets have to be provided.
Parsers reading the input character by character can obtain the current
offset by calling ``getInputLineOffset()``.

Internal design
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. figure:: promise.svg
    :scale: 80%

A host parser cannot run a guest parser directly. What the host parser
can do is just ask the ctags main part to schedule running the
guest parser for the specified area, which is defined with the ``start`` and
``end``. These scheduling requests are called *promises*.

After running the host parser, before closing the input stream, the
ctags main part checks for the existence of promise(s). If there are any,
the main part makes a sub input stream and runs the guest parser specified
in the promise. The sub input stream is made from the original input
stream by narrowing it as requested in the promise. The main part
iterates the above process until there are no promises left.
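
The scheduling described above can be modelled in a few lines. This is a
sketch under stated assumptions, not the code in the ctags main part: all
``toy``-prefixed names are hypothetical, byte offsets are left out, the line
range is treated as inclusive, and "running a guest parser" is reduced to a
callback that prints the request. The point is the control flow: the host
only records requests, and the main part drains them afterwards.

.. code-block:: c

    #include <assert.h>
    #include <stdio.h>

    #define TOY_PROMISE_MAX 8

    /* Hypothetical, reduced promise: which parser to run on which lines. */
    struct toyPromise {
        const char *guestParser;
        unsigned long startLine;
        unsigned long endLine;      /* inclusive in this toy model */
    };

    static struct toyPromise toyPromises[TOY_PROMISE_MAX];
    static int toyPromiseCount = 0;
    static unsigned long toyLinesSeen = 0;

    /* Called by a (toy) host parser: only records the request.
       Returns the index of the promise, or -1 if the table is full. */
    static int toyMakePromise (const char *guestParser,
                               unsigned long startLine, unsigned long endLine)
    {
        if (toyPromiseCount >= TOY_PROMISE_MAX)
            return -1;
        toyPromises[toyPromiseCount].guestParser = guestParser;
        toyPromises[toyPromiseCount].startLine = startLine;
        toyPromises[toyPromiseCount].endLine = endLine;
        return toyPromiseCount++;
    }

    /* Stand-in for a guest parser working on a narrowed sub stream. */
    static void toyCountingGuest (const struct toyPromise *p)
    {
        printf ("run %s on lines %lu-%lu\n",
                p->guestParser, p->startLine, p->endLine);
        toyLinesSeen += p->endLine - p->startLine + 1;
    }

    /* Called by the (toy) main part after the host parser has returned.
       The loop re-reads toyPromiseCount, so promises made by a guest
       while draining are run as well. Returns how many promises ran. */
    static int toyRunPromises (void (*runGuest) (const struct toyPromise *))
    {
        int i;

        for (i = 0; i < toyPromiseCount; i++)
            runGuest (&toyPromises[i]);
        toyPromiseCount = 0;
        return i;
    }

Because the guest runs only after the host has returned, the two never share
a call stack in this model, which mirrors the separation the promise
mechanism is built around.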

Theoretically guest parsers can be nested; a guest parser can itself make a
promise. The level 2 guest is also just scheduled. (However, I have never
tested such a nested guest parser.)

Why not run the guest parser directly from the context of the host
parser? Remember that many parsers have their own file static variables. If
a parser is called from within a running parser, the variables may be
corrupted.

API for subparser
......................................................................

See ":ref:`base-sub-parsers`" about the concept of subparser.

.. note:: Consider using optlib when implementing a subparser. It is much
    easier and simpler. See ":ref:`defining-subparsers`" for details.

Outline
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You have to work on both sides: a base parser and subparsers.

A base parser must define a data structure type (``baseMethodTable``) for
its subparsers by extending ``struct subparser`` defined in
``main/subparser.h``. A subparser defines a variable (``subparser var``)
having type ``baseMethodTable`` by filling its fields, and registers
``subparser var`` to the base parser using the dependency API.

The base parser calls functions pointed to by the ``baseMethodTable`` of
its subparsers during parsing. A function for probing a higher level
language may be included in ``baseMethodTable``. What kind of fields
should be included in ``baseMethodTable`` is up to the design of the base
parser and the requirements of its subparsers. A method for
probing is one of them.

Registering a ``subparser var`` to a base parser is enough for the
bottom up choice. For handling the top down choice (e.g. specifying
``--language-force=<subparser>`` on a command line), more code is needed.

In the top down choice, the subparser must call
``scheduleRunningBaseparser``, declared in ``main/subparser.h``, in its
``parser`` method.
Here, ``parser`` method means a function assigned to the ``parser`` member of
the ``parserDefinition`` of the subparser.
``scheduleRunningBaseparser`` takes an integer argument
that specifies the dependency used for registering the ``subparser var``.

By extending ``struct subparser`` you can define a type for
your subparser. Then make a variable of the type and
declare a dependency on the base parser.

Fields of ``subparser`` type
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here the source code of the Autoconf/m4 parsers is used as an example.

``main/types.h``:

.. code-block:: C

    struct sSubparser;
    typedef struct sSubparser subparser;


``main/subparser.h``:

.. code-block:: C

    typedef enum eSubparserRunDirection {
        SUBPARSER_BASE_RUNS_SUB = 1 << 0,
        SUBPARSER_SUB_RUNS_BASE = 1 << 1,
        SUBPARSER_BI_DIRECTION  = SUBPARSER_BASE_RUNS_SUB|SUBPARSER_SUB_RUNS_BASE,
    } subparserRunDirection;

    struct sSubparser {
        ...

        /* public to the parser */
        subparserRunDirection direction;

        void (* inputStart) (subparser *s);
        void (* inputEnd) (subparser *s);
        void (* exclusiveSubparserChosenNotify) (subparser *s, void *data);
    };

A subparser must fill the fields of ``subparser``.

The ``direction`` field specifies how the subparser is called. See
":ref:`multiple_parsers_directions`" in ":ref:`multiple_parsers`" about
*direction flags*, and see ":ref:`optlib_directions`" in ":ref:`optlib`" for
examples of using the direction flags.

=========================== ======================
``direction`` field         Direction Flag
=========================== ======================
``SUBPARSER_BASE_RUNS_SUB`` ``shared`` (default)
``SUBPARSER_SUB_RUNS_BASE`` ``dedicated``
``SUBPARSER_BI_DIRECTION``  ``bidirectional``
=========================== ======================

If a subparser runs exclusively and is chosen in a top down way, set the
``SUBPARSER_SUB_RUNS_BASE`` flag. If a subparser runs in a coexisting way and
is chosen in a bottom up way, set ``SUBPARSER_BASE_RUNS_SUB``. Use
``SUBPARSER_BI_DIRECTION`` if both cases can occur.

The SystemdUnit parser runs as a subparser of the iniconf base parser.
The SystemdUnit parser specifies ``SUBPARSER_SUB_RUNS_BASE`` because
unit files of systemd have very specific file extensions though
they are written in iniconf syntax. Therefore we expect the SystemdUnit
parser to be chosen in a top down way. The same logic is applicable to the
YumRepo parser.

The Autoconf parser specifies ``SUBPARSER_BI_DIRECTION``. For an input
file named ``configure.ac``, the Autoconf parser is chosen in a top down way
by pattern matching. On the other hand, for a file named ``foo.m4``, the
Autoconf parser can be chosen in a bottom up way.

.. TODO: Write about SUBPARSER_BASE_RUNS_SUB after implementing python-celery.

``inputStart`` is called before the base parser starts parsing a new input file.
``inputEnd`` is called after the base parser finishes parsing the input file.
The Universal Ctags main part calls these methods. Therefore, a base parser
doesn't have to call them.

``exclusiveSubparserChosenNotify`` is called when a parser is chosen
as an exclusive parser. Calling this method is the job of the base parser.


Extending ``subparser`` type
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The m4 parser extends the ``subparser`` type as follows:

``parsers/m4.h``:

.. code-block:: C

    typedef struct sM4Subparser m4Subparser;
    struct sM4Subparser {
        subparser subparser;

        bool (* probeLanguage) (m4Subparser *m4, const char* token);

        /* return value: Cork index */
        int (* newMacroNotify) (m4Subparser *m4, const char* token);

        bool (* doesLineCommentStart)   (m4Subparser *m4, int c, const char *token);
        bool (* doesStringLiteralStart) (m4Subparser *m4, int c);
    };


Put ``subparser`` as the first member of the extended struct (here
``sM4Subparser``). In addition to the first field, 4 methods are defined in
the extended struct.

Until a subparser is chosen for the current input file, the m4 parser calls
the ``probeLanguage`` method of its subparsers each time it finds a token
in the input file. A subparser returns ``true`` if, by analyzing the tokens
passed from the base parser, it recognizes that the input file is for
itself.

``parsers/autoconf.c``:

.. code-block:: C

    extern parserDefinition* AutoconfParser (void)
    {
        static const char *const patterns [] = { "configure.in", NULL };
        static const char *const extensions [] = { "ac", NULL };
        parserDefinition* const def = parserNew("Autoconf");

        static m4Subparser autoconfSubparser = {
            .subparser = {
                .direction = SUBPARSER_BI_DIRECTION,
                .exclusiveSubparserChosenNotify = exclusiveSubparserChosenCallback,
            },
            .probeLanguage  = probeLanguage,
            .newMacroNotify = newMacroCallback,
            .doesLineCommentStart = doesLineCommentStart,
            .doesStringLiteralStart = doesStringLiteralStart,
        };

The ``probeLanguage`` function defined in ``autoconf.c`` is connected to
the ``probeLanguage`` member of ``autoconfSubparser``. The ``probeLanguage``
function of Autoconf is very simple:

``parsers/autoconf.c``:

.. code-block:: C

    static bool probeLanguage (m4Subparser *m4, const char* token)
    {
        return strncmp (token, "m4_", 3) == 0
            || strncmp (token, "AC_", 3) == 0
            || strncmp (token, "AM_", 3) == 0
            || strncmp (token, "AS_", 3) == 0
            || strncmp (token, "AH_", 3) == 0
            ;
    }

This function checks the prefix of the passed token. If a known
prefix is found, the Autoconf parser assumes this is Autoconf input
and returns ``true``.

``parsers/m4.c``:

.. code-block:: C

    if (m4tmp->probeLanguage
        && m4tmp->probeLanguage (m4tmp, token))
    {
        chooseExclusiveSubparser (tmp, NULL);
        m4found = m4tmp;
    }

The m4 parser calls the ``probeLanguage`` function of a subparser. If
``true`` is returned, the m4 parser calls the ``chooseExclusiveSubparser``
function, which is defined in the main part. ``chooseExclusiveSubparser``
calls the ``exclusiveSubparserChosenNotify`` method of the chosen subparser.

The method is implemented in the Autoconf subparser as follows:

``parsers/autoconf.c``:

.. code-block:: C

    static void exclusiveSubparserChosenCallback (subparser *s, void *data)
    {
        setM4Quotes ('[', ']');
    }

It changes the quote characters of the m4 parser.


Making a tag in a subparser
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By calling callback functions defined in subparsers, their base parser
gives them a chance to make tag entries.

The m4 parser calls the ``newMacroNotify`` method when it finds that an m4
macro is used. The Autoconf parser connects the ``newMacroCallback`` function
defined in ``parsers/autoconf.c`` to this method.


``parsers/autoconf.c``:


.. code-block:: C

    static int newMacroCallback (m4Subparser *m4, const char* token)
    {
        int keyword;
        int index = CORK_NIL;

        keyword = lookupKeyword (token, getInputLanguage ());

        /* TODO:
           AH_VERBATIM
         */
        switch (keyword)
        {
        case KEYWORD_NONE:
            break;
        case KEYWORD_init:
            index = makeAutoconfTag (PACKAGE_KIND);
            break;

    ...

    extern parserDefinition* AutoconfParser (void)
    {
        ...
        static m4Subparser autoconfSubparser = {
            .subparser = {
                .direction = SUBPARSER_BI_DIRECTION,
                .exclusiveSubparserChosenNotify = exclusiveSubparserChosenCallback,
            },
            .probeLanguage  = probeLanguage,
            .newMacroNotify = newMacroCallback,

In the ``newMacroCallback`` function, the Autoconf parser receives the name
of the macro found by the base parser and analyzes whether the macro is
interesting in the context of the Autoconf language or not. If it is an
interesting name, the Autoconf parser makes a tag for it.


Calling methods of subparsers from a base parser
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A base parser can use the ``foreachSubparser`` macro for accessing its
subparsers. A base parser should call ``enterSubparser`` before calling a
method of a subparser, and call ``leaveSubparser`` after calling the
method. The macro and functions are declared in ``main/subparser.h``.


``parsers/m4.c``:

.. code-block:: C

    static m4Subparser * maySwitchLanguage (const char* token)
    {
        subparser *tmp;
        m4Subparser *m4found = NULL;

        foreachSubparser (tmp, false)
        {
            m4Subparser *m4tmp = (m4Subparser *)tmp;

            enterSubparser(tmp);
            if (m4tmp->probeLanguage
                && m4tmp->probeLanguage (m4tmp, token))
            {
                chooseExclusiveSubparser (tmp, NULL);
                m4found = m4tmp;
            }
            leaveSubparser();

            if (m4found)
                break;
        }

        return m4found;
    }

``foreachSubparser`` takes a variable having type ``subparser *``.
For each iteration, the value of the variable is updated.

``enterSubparser`` takes a variable having type ``subparser *``. By
calling ``enterSubparser``, the current language (the value returned from
``getInputLanguage``) can be temporarily switched to the language specified
with the variable. One effect of the switching is that the ``language``
field of tags made in the callback function called between
``enterSubparser`` and ``leaveSubparser`` is adjusted.

Registering a subparser to its base parser
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use the ``DEPTYPE_SUBPARSER`` dependency in a subparser for registration.

``parsers/autoconf.c``:

.. code-block:: C

    extern parserDefinition* AutoconfParser (void)
    {
        parserDefinition* const def = parserNew("Autoconf");

        static m4Subparser autoconfSubparser = {
            .subparser = {
                .direction = SUBPARSER_BI_DIRECTION,
                .exclusiveSubparserChosenNotify = exclusiveSubparserChosenCallback,
            },
            .probeLanguage  = probeLanguage,
            .newMacroNotify = newMacroCallback,
            .doesLineCommentStart = doesLineCommentStart,
            .doesStringLiteralStart = doesStringLiteralStart,
        };
        static parserDependency dependencies [] = {
            [0] = { DEPTYPE_SUBPARSER, "M4", &autoconfSubparser },
        };

        def->dependencies = dependencies;
        def->dependencyCount = ARRAY_SIZE (dependencies);


``DEPTYPE_SUBPARSER`` is specified in the 0th element of the ``dependencies``
function static variable. Next, the literal string "M4" is
specified, followed by ``&autoconfSubparser``. The intent of the code is to
register the ``autoconfSubparser`` subparser definition to the base parser
named "M4".

The ``dependencies`` function static variable must be assigned to the
``dependencies`` field of the ``parserDefinition`` variable.
The main part of Universal Ctags refers to the field when
initializing parsers.

``[0]`` emphasizes that this is "the 0th element". The subparser may refer
to the index of the array when it calls
``scheduleRunningBaseparser``.


Scheduling running the base parser
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For the case where a subparser is chosen top down, the subparser
must call ``scheduleRunningBaseparser`` in its main ``parser`` method.

``parsers/autoconf.c``:

.. code-block:: C

    static void findAutoconfTags(void)
    {
        scheduleRunningBaseparser (0);
    }

    extern parserDefinition* AutoconfParser (void)
    {
        ...
        parserDefinition* const def = parserNew("Autoconf");
        ...
        static parserDependency dependencies [] = {
            [0] = { DEPTYPE_SUBPARSER, "M4", &autoconfSubparser },
        };

        def->dependencies = dependencies;
        ...
        def->parser = findAutoconfTags;
        ...
        return def;
    }

A subparser does nothing on its own. A base parser makes its subparser
work by calling methods of the subparser. Therefore a subparser must
run its base parser when the subparser is chosen in a top down way.
The main part provides the ``scheduleRunningBaseparser`` function for this
purpose.

A subparser should call the function from the ``parser`` method of its
``parserDefinition``. ``scheduleRunningBaseparser`` takes an integer. It
specifies the index of the dependency that is used for registering the
subparser.


PackCC compiler-compiler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PackCC is a compiler-compiler; it translates a ``.peg`` grammar file to a
``.c`` file. PackCC was originally written by Arihiro Yoshida. Its source
repository is at https://github.com/arithy/packcc.

The source tree of PackCC is grafted at the ``misc/packcc`` directory.
Building PackCC and building ctags are integrated in the build scripts of
Universal Ctags.

Refer to `peg/varlink.peg
<https://github.com/universal-ctags/ctags/blob/master/peg/varlink.peg>`_ as a
sample of a parser using PackCC.

Automatic parser guessing (TBW)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Managing regular expression parsers (TBW)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ghost kind in regex parser (TBW)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. TODO: Q: what is the point of documenting this?
    from comment on #2916: I (@masatake) must explain the ghost kind.
    from comment on #2916:
    I (@masatake) found I must explain "placeholder tag". The ghost kind is
    useful for fill the find field of the placeholder tag. I will write about
    the Ghost kind when I write about the placeholder tag. I will write about
    the placeholder tag when I write about Optscript.

    If a whitespace is used as a kind letter, it is never printed when
    ctags is called with ``--list-kinds`` option. This kind is
    automatically assigned to an empty name pattern.

    Normally you don't need to know this.