.. ctags Internal API
.. ---------------------------------------------------------------------

.. _input-text-stream:

Input text stream
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. figure:: input-text-stream.svg
    :scale: 80%

Function prototypes for handling the input text stream are declared in
``main/read.h``. The file exists in Exuberant Ctags, too. However, the
function names were changed when the ``--line-directive`` option was
overhauled. (In addition, macros were converted to functions to make the
data structures for the input text stream opaque.)

Ctags has three groups of functions for handling input: *input*, *bypass*, and
*raw*. Parser developers should use the input group. The other two
groups are for the ctags main part.


.. _inputFile:

`inputFile` type and the functions of input group
......................................................................

.. note:: The original version of this section was written
    before the ``inputFile`` type and the ``File`` variable were made private.

``inputFile`` is the type representing the input file and stream for
a parser. It used to be declared in ``main/read.h``, but now it is defined in
``main/read.c``.

Ctags uses a file static variable ``File`` of type ``inputFile`` for
maintaining the input file and stream. ``File`` is also defined in
``main/read.c``, as ``inputFile`` is.

``fp`` and ``line`` are the essential fields of ``File``. ``fp`` has the
type ``MIO``, declared in ``main/mio.h``. By calling the functions of the
input group (``getcFromInputFile`` and ``readLineFromInputFile``), a parser
gets input text from ``fp``.

The functions of the input group update the ``input`` and ``source`` fields
of the ``File`` variable. Both fields have type ``inputFileInfo`` and mainly
track the file name and the current line number. Usually ctags uses
only the ``input`` field. The ``source`` field is used only when a ``#line``
directive is found in the current input text stream.

When a tool generates the input file from another file, the tool
can record the original source file in the generated file by using
the ``#line`` directive. The ``source`` field tracks the
information that appears in ``#line`` directives.

Regex pattern matching is also done behind the scenes when the functions of
this group are called.

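
The difference between the ``input`` and ``source`` fields can be
illustrated with a small standalone sketch. It is not the code of
``main/read.c``: ``lineInfo``, ``tracker``, ``trackerInit`` and
``trackerFeedLine`` are hypothetical names invented for this example, and
only the simplest form of the ``#line`` directive (a number followed by a
quoted file name) is recognized.

.. code-block:: c

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical stand-in for inputFileInfo: a file name and a line number. */
    struct lineInfo {
        char name[256];
        unsigned long lineNumber;
    };

    /* Hypothetical stand-in for the two tracking fields of File. */
    struct tracker {
        struct lineInfo input;   /* position in the file actually being read */
        struct lineInfo source;  /* position claimed by #line directives */
    };

    static void trackerInit(struct tracker *t, const char *fileName)
    {
        snprintf(t->input.name, sizeof t->input.name, "%s", fileName);
        t->input.lineNumber = 0;
        t->source = t->input;
    }

    /* Account for one line of input text. */
    static void trackerFeedLine(struct tracker *t, const char *line)
    {
        unsigned long n;
        char name[256];

        t->input.lineNumber++;
        if (sscanf(line, "#line %lu \"%255[^\"]\"", &n, name) == 2) {
            /* The directive describes the line that follows it. */
            snprintf(t->source.name, sizeof t->source.name, "%s", name);
            t->source.lineNumber = n > 0 ? n - 1 : 0;
        } else {
            t->source.lineNumber++;
        }
    }

Feeding the three lines ``int a;``, ``#line 10 "foo.y"`` and ``int b;`` of
a file ``foo.c`` leaves ``input`` at line 3 of ``foo.c``, while ``source``
points to line 10 of ``foo.y``.
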

The functions of bypass group
......................................................................
The functions of the bypass group (``readLineFromBypass`` and
``readLineFromBypassSlow``) read text from the ``fp`` field of the
``File`` static variable without updating the ``input`` and ``source``
fields of ``File``.

Parsers may not need the functions of this group. They are
used in the ctags main part, for example to make the pattern
fields of a tags file.


The functions of raw group
......................................................................
The functions of this group (``readLineRaw`` and ``readLineRawWithNoSeek``)
take a parameter of type ``MIO`` and don't touch the ``File`` static
variable.

Parsers may not need the functions of this group. They are
used in the ctags main part, for example to load option files.


.. NOT REVIEWED YET

.. _output-tag-stream:

Output tag stream
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. figure:: output-tag-stream.svg
    :scale: 80%

Ctags provides ``makeTagEntry`` to parsers as an entry point for writing
tag information to MIO. ``makeTagEntry`` calls ``writeTagEntry`` if the
parser does not set ``CORK_QUEUE`` to the ``useCork`` field. ``writeTagEntry``
calls ``writerWriteTag``.
``writerWriteTag`` just calls the ``writeEntry`` method of a writer backend.
The ``writerTable`` variable holds the four backends: ctagsWriter, etagsWriter,
xrefWriter, and jsonWriter.
One of them is chosen depending on the arguments passed to ctags.

If ``CORK_QUEUE`` is set to ``useCork``, the tag information goes to a queue
in memory. The queue is flushed when ``useCork`` is unset. See "`cork API`_"
for more details.

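
The dispatch through a table of writer backends can be pictured with the
following standalone sketch. It only mirrors the idea of ``writerTable``
and ``writerWriteTag``: the types ``miniTag`` and ``miniWriter``, the two
backends, their names and their output formats are all made up for this
example and do not match the real ctags writers.

.. code-block:: c

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical, much smaller stand-in for tagEntryInfo. */
    typedef struct {
        const char *name;
        const char *file;
        unsigned long line;
    } miniTag;

    /* Hypothetical writer backend: a name plus a writeEntry method. */
    typedef struct {
        const char *name;
        int (*writeEntry)(char *buf, size_t size, const miniTag *tag);
    } miniWriter;

    static int writeTabSeparated(char *buf, size_t size, const miniTag *tag)
    {
        return snprintf(buf, size, "%s\t%s\t%lu", tag->name, tag->file, tag->line);
    }

    static int writeColumns(char *buf, size_t size, const miniTag *tag)
    {
        return snprintf(buf, size, "%s %lu %s", tag->name, tag->line, tag->file);
    }

    /* Plays the role of writerTable; the formats are illustrative only. */
    static const miniWriter miniWriterTable[] = {
        { "tabs",    writeTabSeparated },
        { "columns", writeColumns },
    };

    /* Choose a backend by name; ctags itself decides from its arguments. */
    static const miniWriter *miniChooseWriter(const char *name)
    {
        size_t i;

        for (i = 0; i < sizeof miniWriterTable / sizeof miniWriterTable[0]; i++)
            if (strcmp(miniWriterTable[i].name, name) == 0)
                return &miniWriterTable[i];
        return NULL;
    }

    /* Counterpart of writerWriteTag: just forward to the chosen backend. */
    static int miniWriterWriteTag(const miniWriter *w, char *buf, size_t size,
                                  const miniTag *tag)
    {
        return w->writeEntry(buf, size, tag);
    }

The caller never needs to know which backend is active; it passes the tag to
``miniWriterWriteTag`` and the function pointer in the table does the rest.
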

cork API
......................................................................

Background and Idea
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The *cork API* was introduced to make recording scope information easier.

Before the cork API was introduced, scope information had to be recorded as
strings. This is flexible, but it requires memory management.
The following code is taken from ``clojure.c`` (with some modifications).

.. code-block:: c

    if (vStringLength (parent) > 0)
    {
        current.extensionFields.scope[0] = ClojureKinds[K_NAMESPACE].name;
        current.extensionFields.scope[1] = vStringValue (parent);
    }

    makeTagEntry (&current);

``parent``, ``scope[0]``, and ``scope[1]`` all hold strings (``scope[1]``
points into the ``parent`` vString). The parser must manage
their life cycles: it cannot free them until the tag referring to them via
its scope fields has been emitted, and it must free them after emitting.

The cork API provides a more solid way to hold scope information. It
expects that ``parent``, which represents the scope of the tag (``current``)
the parser is currently dealing with, is recorded to a *tags* file before
the ``current`` tag is recorded via the ``makeTagEntry`` function.

To pass the information about ``parent`` to ``makeTagEntry``, a
``tagEntryInfo`` object was created. Without the cork API it was used just
for recording and was freed after recording. With the cork API it is not
freed after recording; a parser can reuse it as scope information.


How to use
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See the commit titled "`clojure: use cork <https://github.com/universal-ctags/ctags/commit/ef181e6>`_".
I applied the cork API to the clojure parser.

The cork API can be enabled and disabled per parser,
and is disabled by default. So there is no impact until you
enable it in your parser.

The ``useCork`` field was introduced in the ``parserDefinition`` type:

.. code-block:: c

    typedef struct {
    ...
            unsigned int useCork;
    ...
    } parserDefinition;

Set ``CORK_QUEUE`` to ``useCork`` like this:

.. code-block:: c

    extern parserDefinition *ClojureParser (void)
    {
        ...
        parserDefinition *def = parserNew ("Clojure");
        ...
        def->useCork = CORK_QUEUE;
        return def;
    }

When ctags runs a parser whose ``useCork`` is ``CORK_QUEUE``, all output
requested by calling ``makeTagEntry`` is stored to an internal
queue, not to the ``tags`` file. When parsing of an input file is done, the
tag information stored in the queue is automatically flushed to the
``tags`` file in a batch.

When ``makeTagEntry`` is called with a ``tagEntryInfo`` object (``parent``),
it returns an integer. The integer can be used as a handle for referring to
the object after the call.

.. code-block:: c

    int parent = CORK_NIL;
    ...
    parent = makeTagEntry (&e);

The handle can be used by setting it to the ``scopeIndex``
field of the ``current`` tag, which is in the scope of ``parent``.

.. code-block:: c

    current.extensionFields.scopeIndex = parent;

When ``current`` is passed to ``makeTagEntry``, the ``scopeIndex`` is
referred to for emitting the scope information of ``current``.

``scopeIndex`` must be set to ``CORK_NIL`` if a tag is not in any scope.
When using the ``scopeIndex`` of ``current``, ``NULL`` must be assigned to both
``current.extensionFields.scope[0]`` and
``current.extensionFields.scope[1]``. The ``initTagEntry`` function does this
initialization internally, so you generally don't have to write
the initialization explicitly.

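
The interplay of the queue, the returned handle and ``scopeIndex`` can be
sketched in a few lines of standalone C. This is only a model of the idea,
not the implementation in ctags: ``miniEntry``, ``miniMakeTagEntry``,
``miniFlush``, the fixed queue size and the output format are all
hypothetical, and reserving index 0 as the "no scope" value
(``MINI_CORK_NIL``) is an assumption of the sketch.

.. code-block:: c

    #include <stdio.h>
    #include <string.h>

    #define MINI_CORK_NIL   0   /* "not in any scope" (assumption of this sketch) */
    #define MINI_QUEUE_MAX 16

    /* Hypothetical, tiny stand-in for tagEntryInfo. */
    typedef struct {
        const char *name;
        int scopeIndex;         /* handle of the parent, or MINI_CORK_NIL */
    } miniEntry;

    static miniEntry miniQueue[MINI_QUEUE_MAX];
    static int miniQueueCount = 1;      /* slot 0 is reserved for MINI_CORK_NIL */

    /* Queue a copy of the entry and return its handle. */
    static int miniMakeTagEntry(const miniEntry *e)
    {
        if (miniQueueCount >= MINI_QUEUE_MAX)
            return MINI_CORK_NIL;
        miniQueue[miniQueueCount] = *e;
        return miniQueueCount++;
    }

    /* Emit all queued entries in a batch, resolving scopeIndex to a name. */
    static size_t miniFlush(char *out, size_t size)
    {
        size_t used = 0;
        int i;

        for (i = 1; i < miniQueueCount; i++) {
            const miniEntry *e = &miniQueue[i];
            int n;

            if (e->scopeIndex != MINI_CORK_NIL)
                n = snprintf(out + used, size - used, "%s\tscope:%s\n",
                             e->name, miniQueue[e->scopeIndex].name);
            else
                n = snprintf(out + used, size - used, "%s\n", e->name);
            if (n < 0 || (size_t)n >= size - used)
                break;          /* output buffer is full */
            used += (size_t)n;
        }
        miniQueueCount = 1;     /* the queue is empty again */
        return used;
    }

Because the parent entry stays in the queue until the flush, the child only
has to remember an integer; no string has to be kept alive by the parser.
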

Automatic full qualified tag generation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If a parser uses the cork API for recording and emitting scope
information, ctags can reuse it for generating *full qualified (FQ)
tags*. Set the ``requestAutomaticFQTag`` field of ``parserDefinition`` to
``TRUE``; the main part of ctags then emits FQ tags on behalf of the parser
if ``--extras=+q`` is given.

An example can be found in the DTS parser:

.. code-block:: c

    extern parserDefinition* DTSParser (void)
    {
        static const char *const extensions [] = { "dts", "dtsi", NULL };
        parserDefinition* const def = parserNew ("DTS");
        ...
        def->requestAutomaticFQTag = TRUE;
        return def;
    }

Setting ``requestAutomaticFQTag`` to ``TRUE`` implies setting
``useCork`` to ``CORK_QUEUE``.


.. NOT REVIEWED YET

.. _symtabAPI:

symbol table API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The *symbol table* API is an extension to the cork API. The cork API was
introduced to provide a simple way to represent the mapping (*forward
mapping*) from a language object (*child object*) to its upper scope
(*parent object*). The *symbol table* API represents the mapping in the
opposite direction (*reverse mapping*); you can look up (or traverse)
the child tags defined (or used) in a given tag.

To use this API, a parser must set ``CORK_SYMTAB`` to the ``useCork`` member
of ``parserDefinition``, in addition to setting ``CORK_QUEUE``, as preparation.

An example taken from the R parser:

.. code-block:: c

    extern parserDefinition *RParser (void)
    {
        static const char *const extensions[] = { "r", "R", "s", "q", NULL };
        parserDefinition *const def = parserNew ("R");

        ...

        def->useCork = CORK_QUEUE | CORK_SYMTAB;

        ...

        return def;
    }


To install a reverse mapping between a parent and its child tags,
call ``registerEntry`` with the cork index of a child after making
the child tag with its ``scopeIndex`` filled in:

.. code-block:: c

    int parent = CORK_NIL;
    ...
    parent = makeTagEntry (&e_parent);

    ...

    tagEntryInfo e_child;
    ...
    initTagEntry (&e_child, ...);
    e_child.extensionFields.scopeIndex = parent;    /* setting up forward mapping */
    ...
    int child = makeTagEntry (&e_child);

    registerEntry (child);                          /* setting up reverse mapping */

``registerEntry`` stores ``child`` to the symbol table of ``parent``.
If the ``scopeIndex`` of ``child`` is ``CORK_NIL``, the ``child`` is stored
to the *toplevel scope*.

``unregisterEntry`` is for clearing (and updating) the reverse mapping
of a child. Consider the case where you want to change the scope of ``child``
to ``newParent``.

.. code-block:: c

    unregisterEntry (child);                         /* delete the reverse mapping. */
    tagEntryInfo *e_child = getEntryInCorkQueue (child);
    e_child->extensionFields.scopeIndex = newParent; /* update the forward mapping. */
    registerEntry (child);                           /* set the new reverse mapping. */

``foreachEntriesInScope`` is the function for traversing all child
tags stored to the parent tag specified with ``corkIndex``.
If ``corkIndex`` is ``CORK_NIL``, the children defined (and/or
used) in the *toplevel scope* are traversed.

.. code-block:: c

    typedef bool (* entryForeachFunc) (int corkIndex,
                                       tagEntryInfo * entry,
                                       void * data);
    bool          foreachEntriesInScope (int corkIndex,
                                         const char *name, /* or NULL */
                                         entryForeachFunc func,
                                         void *data);

``foreachEntriesInScope`` takes an ``entryForeachFunc`` typed
callback function. ``foreachEntriesInScope`` passes the cork
index and a pointer to the ``tagEntryInfo`` object of each child to the
callback.

``anyEntryInScope`` is a function for finding a child tag stored
to the parent tag specified with ``corkIndex``. It returns
the cork index of the child tag. If ``corkIndex`` is ``CORK_NIL``,
``anyEntryInScope`` finds a tag stored to the toplevel scope.
The returned child tag has ``name`` as its name as long as ``name``
is not ``NULL``.

.. code-block:: c

    int           anyEntryInScope       (int corkIndex,
                                         const char *name,
                                         bool onlyDefinitionTag);

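
The relation between the forward and the reverse mapping can be modeled with
a small standalone sketch. All names in it (``symEntry``, ``symMake``,
``symRegister``, ``symUnregister``, ``symForeachInScope``, ``symCount``) are
hypothetical. Unlike a real symbol table it finds the children by a linear
scan over a fixed array, and stopping the traversal when the callback
returns ``false`` is an assumption of the sketch, not a documented property
of ``foreachEntriesInScope``.

.. code-block:: c

    #include <stdbool.h>
    #include <string.h>

    #define SYM_NIL  0          /* toplevel scope (assumption of this sketch) */
    #define SYM_MAX 16

    typedef struct {
        const char *name;
        int scopeIndex;         /* forward mapping: child -> parent */
        bool registered;        /* true while the reverse mapping is installed */
    } symEntry;

    static symEntry symEntries[SYM_MAX];
    static int symEntryCount = 1;       /* slot 0 stands for the toplevel */

    static int symMake(const char *name, int scopeIndex)
    {
        if (symEntryCount >= SYM_MAX)
            return SYM_NIL;
        symEntries[symEntryCount].name = name;
        symEntries[symEntryCount].scopeIndex = scopeIndex;
        symEntries[symEntryCount].registered = false;
        return symEntryCount++;
    }

    static void symRegister(int index)   { symEntries[index].registered = true; }
    static void symUnregister(int index) { symEntries[index].registered = false; }

    typedef bool (*symForeachFunc)(int index, symEntry *entry, void *data);

    /* Visit the registered children of scope; name may be NULL for "any". */
    static bool symForeachInScope(int scope, const char *name,
                                  symForeachFunc func, void *data)
    {
        int i;

        for (i = 1; i < symEntryCount; i++) {
            symEntry *e = &symEntries[i];

            if (!e->registered || e->scopeIndex != scope)
                continue;
            if (name && strcmp(e->name, name) != 0)
                continue;
            if (!func(i, e, data))
                return false;   /* the callback asked to stop */
        }
        return true;
    }

    /* Example callback: count the visited entries in *(int *)data. */
    static bool symCount(int index, symEntry *entry, void *data)
    {
        (void)index;
        (void)entry;
        ++*(int *)data;
        return true;
    }

Moving a child to another scope follows the same three steps as above:
unregister it, update its ``scopeIndex``, and register it again.
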

.. _tokeninfo:

tokenInfo API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In Exuberant Ctags, a developer can write a parser in any way they like;
only the input stream and the ``tagEntryInfo`` data structure are given.

However, while maintaining Universal Ctags I (Masatake YAMATO) have come to
think that we should have a framework for writing parsers. Of course the
framework is optional; you can still write a parser without it.

To design a framework, I have studied how @b4n (Colomban Wendling)
writes parsers. The tokenInfo API is the first fruit of my study.

TBW


Multiple parsers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _promiseAPI:

Guest parser (promise API)
......................................................................

See ":ref:`host-guest-parsers`" about the concept of guest parsers.

Background and Idea
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
More than one programming language can be used in one input text stream.
The *promise API* allows a host parser to run a :ref:`guest parser
<host-guest-parsers>` in a specified area of the input text stream.

For example, code written in the C language (C code) can be embedded
in code written in the Yacc language (Yacc code). Let's think about this
input stream.

.. code-block:: yacc

    /* foo.y */
    %token
            END_OF_FILE     0
            ERROR           255
            BELL            1

    %{
    /* C language */
    int counter;
    %}
    %right  EQUALS
    %left   PLUS MINUS
    ...
    %%
    CfgFile         :       CfgEntryList
                            { InterpretConfigs($1); }
                    ;

    ...
    %%
    int
    yyerror(char *s)
    {
        (void)fprintf(stderr,"%s: line %d of %s\n",s,lineNum,
                                            (scanFile?scanFile:"(unknown)"));
        if (scanStr)
            (void)fprintf(stderr,"last scanned symbol is: %s\n",scanStr);
        return 1;
    }

In this input, the area from ``%{`` to ``%}`` and the area from
the second ``%%`` to the end of the file are written in C. Yacc can be called
the *host language*, and C can be called the *guest language*.

Ctags may choose the Yacc parser for the input. However, the parser
doesn't know about C syntax. Implementing a C parser in the Yacc parser
is one approach. However, ctags already has a C parser. The Yacc
parser should utilize the existing C parser. The promise API allows this.

See also ":ref:`host-guest-parsers`" for more about the concept and for
examples of guest parsers.


Usage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See the commit titled "`Yacc: run C parser in the areas where code
is written in C <https://github.com/universal-ctags/ctags/commit/757673f>`_".
I applied the promise API to the Yacc parser.

The parser for the host language must track and record the ``start`` and the
``end`` of a guest language area. Each of them is represented by a pair of
``line number`` and ``byte offset``. When the ``start`` and ``end`` are
fixed, call ``makePromise`` with (1) the guest parser name, (2) ``start``,
and (3) ``end``. (This description is a bit simplified compared with the
real usage.)


Let's see the actual code from "`parsers/yacc.c
<https://github.com/universal-ctags/ctags/blob/master/parsers/yacc.c>`_".

.. code-block:: c

    struct cStart {
        unsigned long input;
        unsigned long source;
    };

Both fields are for recording ``start``. The ``input`` field
records the value returned from ``getInputLineNumber``.
``source`` is for ``getSourceLineNumber``. See "`inputFile`_" for the
difference between the two.

``enter_c_prologue``, shown next, is a function called when ``%{`` is
found in the current input text stream. Remember, in yacc syntax, ``%{``
is the start marker of a C code area.

.. code-block:: c

    static void enter_c_prologue (const char *line CTAGS_ATTR_UNUSED,
                                  const regexMatch *matches CTAGS_ATTR_UNUSED,
                                  unsigned int count CTAGS_ATTR_UNUSED,
                                  void *data)
    {
        struct cStart *cstart = data;


        readLineFromInputFile ();
        cstart->input  = getInputLineNumber ();
        cstart->source = getSourceLineNumber ();
    }


The function just records the start line. It calls
``readLineFromInputFile`` because the C code may start on the line after
the line where the marker is.

``leave_c_prologue``, shown next, is a function called when ``%}``,
the end marker of a C code area, is found in the current input text stream.

.. code-block:: c

    static void leave_c_prologue (const char *line CTAGS_ATTR_UNUSED,
                                  const regexMatch *matches CTAGS_ATTR_UNUSED,
                                  unsigned int count CTAGS_ATTR_UNUSED,
                                  void *data)
    {
        struct cStart *cstart = data;
        unsigned long c_end;

        c_end = getInputLineNumber ();
        makePromise ("C", cstart->input, 0, c_end, 0, cstart->source);
    }

After recording the line number of the end of the C code area,
``leave_c_prologue`` calls ``makePromise``.

``"C"`` is the name of the guest parser, here the one for the C language.
Available parser names can be listed by running ctags with the
``--list-languages`` option. In this example, ``0`` is passed as both
the 3rd and the 5th argument. They are the byte offsets of the start and the
end of the C language area from the beginning of their lines, which are 0 in
this case. In general, the guest language's area does not have to start at
the beginning of a line, in which case the two offsets have to be provided.
Parsers reading the input character by character can obtain the current
offset by calling ``getInputLineOffset()``.


Internal design
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. figure:: promise.svg
    :scale: 80%

A host parser cannot run a guest parser directly. All the host parser
can do is ask the ctags main part to schedule running the
guest parser for the specified area, which is defined by the ``start`` and
``end``. These scheduling requests are called *promises*.

After running the host parser, and before closing the input stream, the
ctags main part checks whether any promises exist. If so, the
main part makes a sub input stream and runs the guest parser specified
in the promise. The sub input stream is made from the original input
stream by narrowing it as requested in the promise. The main part
repeats this process until there is no promise left.

Theoretically guest parsers can be nested; a guest parser can make a promise
itself. The level 2 guest is also just scheduled. (However, I have never
tested such a nested guest parser.)

Why not run the guest parser directly from the context of the host
parser? Remember that many parsers have their own file static variables. If
a parser is called from within another parser, the variables may be
clobbered.

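
The scheduling described above boils down to a queue that is filled while
the host parser runs and drained by the main part afterwards. The following
standalone sketch models only that control flow. ``miniPromise``,
``miniMakePromise``, ``miniRunPromises`` and ``miniCountLines`` are
hypothetical names, byte offsets are left out, and narrowing of the input
stream is replaced by a callback that merely receives the line range.

.. code-block:: c

    #include <stdio.h>

    #define MINI_PROMISE_MAX 8

    /* Hypothetical promise: which guest parser to run on which lines. */
    typedef struct {
        const char *guest;
        unsigned long startLine;
        unsigned long endLine;
    } miniPromise;

    static miniPromise miniPromises[MINI_PROMISE_MAX];
    static int miniPromiseCount;

    /* Called by a host (or guest) parser: only records the request. */
    static int miniMakePromise(const char *guest,
                               unsigned long startLine, unsigned long endLine)
    {
        if (miniPromiseCount >= MINI_PROMISE_MAX)
            return -1;
        miniPromises[miniPromiseCount].guest = guest;
        miniPromises[miniPromiseCount].startLine = startLine;
        miniPromises[miniPromiseCount].endLine = endLine;
        return miniPromiseCount++;
    }

    typedef void (*miniGuestRunner)(const miniPromise *promise, void *data);

    /* Called by the "main part" after the host parser has returned.
     * A guest may add promises while running; the loop picks them up too. */
    static int miniRunPromises(miniGuestRunner run, void *data)
    {
        int done = 0;

        while (done < miniPromiseCount) {
            miniPromise p = miniPromises[done++];
            run(&p, data);
        }
        miniPromiseCount = 0;
        return done;
    }

    /* Example guest: add the number of lines in the area to *data. */
    static void miniCountLines(const miniPromise *promise, void *data)
    {
        *(unsigned long *)data += promise->endLine - promise->startLine + 1;
    }

Since a guest is started only after the host has returned, the two parsers
never run interleaved, which is exactly what protects their file static
variables.
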

API for subparser
......................................................................

See ":ref:`base-sub-parsers`" about the concept of subparsers.

.. note:: Consider using optlib when implementing a subparser. It is much
    easier and simpler. See ":ref:`defining-subparsers`" for details.

Outline
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You have to work on both sides: a base parser and its subparsers.

A base parser must define a data structure type (``baseMethodTable``) for
its subparsers by extending ``struct subparser`` defined in
``main/subparser.h``. A subparser defines a variable (``subparser var``)
of type ``baseMethodTable``, fills its fields, and registers
``subparser var`` to the base parser using the dependency API.

The base parser calls the functions pointed to by the ``baseMethodTable`` of
its subparsers during parsing. What kind of fields
should be included in ``baseMethodTable`` is up to the design of the base
parser and the requirements of its subparsers. A method for
probing a higher level language is one of them.

Registering a ``subparser var`` to a base parser is enough for the
bottom up choice. For handling the top down choice (e.g. specifying
``--language-force=<subparser>`` on the command line), more code is needed.

In the top down choice, the subparser must call ``scheduleRunningBaseparser``,
declared in ``main/subparser.h``, in its ``parser`` method.
Here, the ``parser`` method means the function assigned to the ``parser``
member of the ``parserDefinition`` of the subparser.
``scheduleRunningBaseparser`` takes an integer argument
that specifies the dependency used for registering the ``subparser var``.

By extending ``struct subparser`` you can define a type for
your subparser. Then make a variable of that type and
declare a dependency on the base parser.


Fields of ``subparser`` type
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The source code of the Autoconf and m4 parsers is used here as an example.

``main/types.h``:

.. code-block:: C

    struct sSubparser;
    typedef struct sSubparser subparser;


``main/subparser.h``:

.. code-block:: C

    typedef enum eSubparserRunDirection {
        SUBPARSER_BASE_RUNS_SUB = 1 << 0,
        SUBPARSER_SUB_RUNS_BASE = 1 << 1,
        SUBPARSER_BI_DIRECTION  = SUBPARSER_BASE_RUNS_SUB|SUBPARSER_SUB_RUNS_BASE,
    } subparserRunDirection;

    struct sSubparser {
        ...

        /* public to the parser */
        subparserRunDirection direction;

        void (* inputStart) (subparser *s);
        void (* inputEnd) (subparser *s);
        void (* exclusiveSubparserChosenNotify) (subparser *s, void *data);
    };

A subparser must fill the fields of ``subparser``.

The ``direction`` field specifies how the subparser is called. See
":ref:`multiple_parsers_directions`" in ":ref:`multiple_parsers`" about
*direction flags*, and see ":ref:`optlib_directions`" in ":ref:`optlib`" for
examples of using the direction flags.

===========================  ======================
``direction`` field          Direction Flag
===========================  ======================
``SUBPARSER_BASE_RUNS_SUB``  ``shared`` (default)
``SUBPARSER_SUB_RUNS_BASE``  ``dedicated``
``SUBPARSER_BI_DIRECTION``   ``bidirectional``
===========================  ======================

If a subparser runs exclusively and is chosen in the top down way, set the
``SUBPARSER_SUB_RUNS_BASE`` flag. If a subparser coexists with other
subparsers and is chosen in the bottom up way, set ``SUBPARSER_BASE_RUNS_SUB``.
Use ``SUBPARSER_BI_DIRECTION`` if both cases are possible.

The SystemdUnit parser runs as a subparser of the iniconf base parser.
The SystemdUnit parser specifies ``SUBPARSER_SUB_RUNS_BASE`` because
unit files of systemd have very specific file extensions, though
they are written in iniconf syntax. Therefore we expect the SystemdUnit
parser to be chosen in the top down way. The same logic is applicable to
the YumRepo parser.

The Autoconf parser specifies ``SUBPARSER_BI_DIRECTION``. For an input
file named ``configure.ac``, the Autoconf parser is chosen in the top down
way by pattern matching. On the other hand, for a file named ``foo.m4``,
the Autoconf parser can be chosen in the bottom up way.

.. TODO: Write about SUBPARSER_BASE_RUNS_SUB after implementing python-celery.

``inputStart`` is called before the base parser starts parsing a new input file.
``inputEnd`` is called after the base parser finishes parsing the input file.
The Universal Ctags main part calls these methods. Therefore, a base parser
doesn't have to call them.

``exclusiveSubparserChosenNotify`` is called when a parser is chosen
as an exclusive parser. Calling this method is a job of the base parser.


Extending ``subparser`` type
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The m4 parser extends the ``subparser`` type as follows:

``parsers/m4.h``:

.. code-block:: C

    typedef struct sM4Subparser m4Subparser;
    struct sM4Subparser {
        subparser subparser;

        bool (* probeLanguage) (m4Subparser *m4, const char* token);

        /* return value: Cork index */
        int  (* newMacroNotify) (m4Subparser *m4, const char* token);

        bool (* doesLineCommentStart)   (m4Subparser *m4, int c, const char *token);
        bool (* doesStringLiteralStart) (m4Subparser *m4, int c);
    };


Put ``subparser`` as the first member of the extended struct (here
``sM4Subparser``). In addition to the first field, four methods are defined
in the extended struct.

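
The reason for putting ``subparser`` first is that C guarantees a pointer to
a struct can be converted to a pointer to its first member and back. Generic
code can therefore handle a plain ``subparser *`` while the base parser casts
it back to its extended type. The following standalone sketch shows the
technique; ``miniSubparser``, ``miniExtSubparser``, ``fooProbe``,
``fooSubparser`` and ``miniCallProbe`` are hypothetical names that do not
exist in ctags.

.. code-block:: c

    #include <stdbool.h>
    #include <string.h>

    /* Hypothetical stand-in for the generic subparser type. */
    typedef struct sMiniSubparser {
        const char *name;
    } miniSubparser;

    /* Hypothetical extended type defined by a base parser. */
    typedef struct sMiniExtSubparser miniExtSubparser;
    struct sMiniExtSubparser {
        miniSubparser subparser;    /* must be the first member */
        bool (*probeLanguage)(miniExtSubparser *self, const char *token);
    };

    static bool fooProbe(miniExtSubparser *self, const char *token)
    {
        (void)self;
        return strncmp(token, "FOO_", 4) == 0;
    }

    static miniExtSubparser fooSubparser = {
        .subparser = { .name = "Foo" },
        .probeLanguage = fooProbe,
    };

    /* Generic code hands over a miniSubparser *; the base parser casts it
     * back, which is valid because subparser is the first member. */
    static bool miniCallProbe(miniSubparser *s, const char *token)
    {
        miniExtSubparser *ext = (miniExtSubparser *)s;

        return ext->probeLanguage && ext->probeLanguage(ext, token);
    }

The same cast appears in the m4 parser, where the ``subparser *`` obtained
while iterating over the subparsers is converted to ``m4Subparser *``.
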

Until a subparser is chosen for the current input file, the m4 parser calls
the ``probeLanguage`` method of its subparsers each time it finds a token
in the input file. A subparser returns ``true`` if, by analyzing the tokens
passed from the base parser, it recognizes that the input file is written
in its own language.

``parsers/autoconf.c``:

.. code-block:: C

    extern parserDefinition* AutoconfParser (void)
    {
        static const char *const patterns [] = { "configure.in", NULL };
        static const char *const extensions [] = { "ac", NULL };
        parserDefinition* const def = parserNew("Autoconf");

        static m4Subparser autoconfSubparser = {
            .subparser = {
                .direction = SUBPARSER_BI_DIRECTION,
                .exclusiveSubparserChosenNotify = exclusiveSubparserChosenCallback,
            },
            .probeLanguage  = probeLanguage,
            .newMacroNotify = newMacroCallback,
            .doesLineCommentStart = doesLineCommentStart,
            .doesStringLiteralStart = doesStringLiteralStart,
        };

The ``probeLanguage`` function defined in ``autoconf.c`` is connected to
the ``probeLanguage`` member of ``autoconfSubparser``. The ``probeLanguage``
function of Autoconf is very simple:

``parsers/autoconf.c``:

.. code-block:: C

    static bool probeLanguage (m4Subparser *m4, const char* token)
    {
        return strncmp (token, "m4_", 3) == 0
            || strncmp (token, "AC_", 3) == 0
            || strncmp (token, "AM_", 3) == 0
            || strncmp (token, "AS_", 3) == 0
            || strncmp (token, "AH_", 3) == 0
            ;
    }

This function checks the prefix of the passed token. If a known
prefix is found, Autoconf assumes this is an Autoconf input
and returns ``true``.

``parsers/m4.c``:

.. code-block:: C

    if (m4tmp->probeLanguage
        && m4tmp->probeLanguage (m4tmp, token))
    {
        chooseExclusiveSubparser (tmp, NULL);
        m4found = m4tmp;
    }

The m4 parser calls the ``probeLanguage`` function of a subparser. If ``true``
is returned, it calls the ``chooseExclusiveSubparser`` function, which is
defined in the main part. ``chooseExclusiveSubparser`` calls the
``exclusiveSubparserChosenNotify`` method of the chosen subparser.

The method is implemented in the Autoconf subparser as follows:

``parsers/autoconf.c``:

.. code-block:: C

    static void exclusiveSubparserChosenCallback (subparser *s, void *data)
    {
        setM4Quotes ('[', ']');
    }

It changes the quote characters of the m4 parser.


Making a tag in a subparser
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By calling the callback functions defined in its subparsers, a base parser
gives them a chance to make tag entries.

The m4 parser calls the ``newMacroNotify`` method when it finds that an m4
macro is used. The Autoconf parser connects the ``newMacroCallback`` function
defined in ``parsers/autoconf.c`` to it.


``parsers/autoconf.c``:


.. code-block:: C

    static int newMacroCallback (m4Subparser *m4, const char* token)
    {
        int keyword;
        int index = CORK_NIL;

        keyword = lookupKeyword (token, getInputLanguage ());

        /* TODO:
           AH_VERBATIM
         */
        switch (keyword)
        {
        case KEYWORD_NONE:
            break;
        case KEYWORD_init:
            index = makeAutoconfTag (PACKAGE_KIND);
            break;

    ...

    extern parserDefinition* AutoconfParser (void)
    {
        ...
        static m4Subparser autoconfSubparser = {
            .subparser = {
                .direction = SUBPARSER_BI_DIRECTION,
                .exclusiveSubparserChosenNotify = exclusiveSubparserChosenCallback,
            },
            .probeLanguage  = probeLanguage,
            .newMacroNotify = newMacroCallback,

In the ``newMacroCallback`` function, the Autoconf parser receives the name
of the macro found by the base parser and analyzes whether the macro is
interesting in the context of the Autoconf language. If the name is
interesting, the Autoconf parser makes a tag for it.


Calling methods of subparsers from a base parser
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A base parser can use the ``foreachSubparser`` macro for accessing its
subparsers. A base parser should call ``enterSubparser`` before calling a
method of a subparser, and call ``leaveSubparser`` after calling the
method. The macro and the functions are declared in ``main/subparser.h``.


``parsers/m4.c``:

.. code-block:: C

    static m4Subparser * maySwitchLanguage (const char* token)
    {
        subparser *tmp;
        m4Subparser *m4found = NULL;

        foreachSubparser (tmp, false)
        {
            m4Subparser *m4tmp = (m4Subparser *)tmp;

            enterSubparser(tmp);
            if (m4tmp->probeLanguage
                && m4tmp->probeLanguage (m4tmp, token))
            {
                chooseExclusiveSubparser (tmp, NULL);
                m4found = m4tmp;
            }
            leaveSubparser();

            if (m4found)
                break;
        }

        return m4found;
    }

``foreachSubparser`` takes a variable of type ``subparser *``.
For each iteration, the value of the variable is updated.

``enterSubparser`` also takes a variable of type ``subparser *``. By
calling ``enterSubparser``, the current language (the value returned from
``getInputLanguage``) is temporarily switched to the language specified
with the variable. One of the effects of the switch is that the ``language``
field of tags made in a callback function called between
``enterSubparser`` and ``leaveSubparser`` is adjusted.

850Registering a subparser to its base parser
851^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
852
853Use ``DEPTYPE_SUBPARSER`` dependency in a subparser for registration.
854
855``parsers/autoconf.c``:
856
857.. code-block:: C
858
859    extern parserDefinition* AutoconfParser (void)
860    {
861	    parserDefinition* const def = parserNew("Autoconf");
862
863	    static m4Subparser autoconfSubparser = {
864		    .subparser = {
865			    .direction = SUBPARSER_BI_DIRECTION,
866			    .exclusiveSubparserChosenNotify = exclusiveSubparserChosenCallback,
867		    },
868		    .probeLanguage  = probeLanguage,
869		    .newMacroNotify = newMacroCallback,
870		    .doesLineCommentStart = doesLineCommentStart,
871		    .doesStringLiteralStart = doesStringLiteralStart,
872	    };
873	    static parserDependency dependencies [] = {
874		    [0] = { DEPTYPE_SUBPARSER, "M4", &autoconfSubparser },
875	    };
876
877	    def->dependencies = dependencies;
878	    def->dependencyCount = ARRAY_SIZE (dependencies);


``DEPTYPE_SUBPARSER`` is specified in the 0th element of the function-static
array ``dependencies``. It is followed by the string literal ``"M4"`` and
then by a pointer to ``autoconfSubparser``. The intent of the code is
to register the ``autoconfSubparser`` subparser definition to a base parser
named "M4".

The ``dependencies`` array must be assigned to the ``dependencies``
field of the ``parserDefinition`` variable.
The main part of Universal Ctags refers to the field when
initializing parsers.

``[0]`` emphasizes that this is "the 0th element". The subparser may refer
to this array index when it calls
``scheduleRunningBaseparser``.
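
Because the index is meaningful, one way to keep the registration and a
later lookup in sync is to pair the designated initializer with a named
constant. The sketch below uses simplified, hypothetical types
(``toyDependency``, ``TOY_DEP_M4`` and the other ``toy`` names are not
part of ctags) and relies only on C99 designated initializers.

.. code-block:: C

    #include <stddef.h>

    #define TOY_ARRAY_SIZE(X) (sizeof (X) / sizeof ((X)[0]))

    /* Hypothetical, simplified dependency record. */
    typedef struct {
        int type;
        const char *upperParser;
    } toyDependency;

    enum { TOY_DEPTYPE_SUBPARSER = 1 };

    /* Named index of the dependency on the base parser. */
    enum { TOY_DEP_M4 = 0 };

    static toyDependency toyDependencies [] = {
        [TOY_DEP_M4] = { TOY_DEPTYPE_SUBPARSER, "M4" },
    };

    /* Return the base parser name registered at INDEX,
     * or NULL if INDEX is out of range. */
    static const char *toyBaseParserName (size_t index)
    {
        if (index >= TOY_ARRAY_SIZE (toyDependencies))
            return NULL;
        return toyDependencies [index].upperParser;
    }
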


Scheduling running the base parser
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When a subparser is chosen in a top-down way, the subparser
must call ``scheduleRunningBaseparser`` in its main ``parser`` method.

``parsers/autoconf.c``:

.. code-block:: C

    static void findAutoconfTags(void)
    {
        scheduleRunningBaseparser (0);
    }

    extern parserDefinition* AutoconfParser (void)
    {
        ...
        parserDefinition* const def = parserNew("Autoconf");
        ...
        static parserDependency dependencies [] = {
            [0] = { DEPTYPE_SUBPARSER, "M4", &autoconfSubparser },
        };

        def->dependencies = dependencies;
        ...
        def->parser = findAutoconfTags;
        ...
        return def;
    }

A subparser cannot do anything actively by itself. A base parser makes its
subparser work by calling methods of the subparser. Therefore a subparser must
run its base parser when the subparser is chosen in a top-down way.
The main part provides the ``scheduleRunningBaseparser`` function for this purpose.

A subparser should call the function from the ``parser`` method of its
``parserDefinition``. ``scheduleRunningBaseparser`` takes an integer
that specifies the index of the dependency used for registering the subparser.
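
The delegation described above can be modeled in a few lines. This is
only a sketch of the control flow under simplifying assumptions, not the
code in the main part of ctags. All ``toy`` names are hypothetical, and
"running" the base parser is reduced to returning its name.

.. code-block:: C

    #include <stddef.h>

    /* Hypothetical, simplified model; none of these names exist in ctags. */
    typedef struct {
        const char *baseParserName;
    } toyDependency;

    typedef struct {
        const toyDependency *dependencies;
        size_t dependencyCount;
        void (*parser) (void);
    } toyParserDefinition;

    static int toyScheduledIndex = -1;  /* -1: nothing scheduled */

    /* Record which dependency's base parser should run next. */
    static void toyScheduleRunningBaseparser (int dependencyIndex)
    {
        toyScheduledIndex = dependencyIndex;
    }

    /* The subparser's parser method only delegates to its base parser. */
    static void toyFindAutoconfTags (void)
    {
        toyScheduleRunningBaseparser (0);
    }

    /* Main part: call the chosen parser's method, then resolve the base
     * parser it scheduled. Returns NULL if nothing valid was scheduled. */
    static const char *toyRunParser (const toyParserDefinition *def)
    {
        toyScheduledIndex = -1;
        def->parser ();
        if (toyScheduledIndex < 0
            || (size_t) toyScheduledIndex >= def->dependencyCount)
            return NULL;
        return def->dependencies [toyScheduledIndex].baseParserName;
    }

The integer passed to the scheduling function is meaningful only as an
index into the same array that was assigned to the ``dependencies``
field, which is why the two must agree.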


PackCC compiler-compiler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PackCC is a compiler-compiler; it translates a ``.peg`` grammar file into a
``.c`` file.  PackCC was originally written by Arihiro Yoshida. Its source
repository is at https://github.com/arithy/packcc.

The source tree of PackCC is grafted at the ``misc/packcc`` directory.
Building PackCC is integrated into the build scripts of
Universal Ctags.

Refer to `peg/varlink.peg
<https://github.com/universal-ctags/ctags/blob/master/peg/varlink.peg>`_ as a
sample of a parser using PackCC.

Automatic parser guessing (TBW)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Managing regular expression parsers (TBW)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ghost kind in regex parser (TBW)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. TODO: Q: what is the point of documenting this?
    from comment on #2916: I (@masatake) must explain the ghost kind.
    from comment on #2916:
        I (@masatake) found I must explain "placeholder tag". The ghost kind is
        useful for fill the find field of the placeholder tag. I will write about
        the Ghost kind when I write about the placeholder tag. I will write about
        the placeholder tag when I write about Optscript.

    If a whitespace is used as a kind letter, it is never printed when
    ctags is called with ``--list-kinds`` option.  This kind is
    automatically assigned to an empty name pattern.

    Normally you don't need to know this.
