.. _optlib:

Extending ctags with Regex parser (*optlib*)
---------------------------------------------------------------------

:Maintainer: Masatake YAMATO <yamato@redhat.com>

.. contents:: `Table of contents`
	:depth: 3
	:local:

.. TODO:
	add a section on debugging

Exuberant Ctags allows a user to add a new parser to ctags with the
``--langdef=<LANG>`` and ``--regex-<LANG>=...`` options.
Universal Ctags follows and extends the design of Exuberant Ctags in more
powerful ways and calls the feature an *optlib parser*, which is described in
:ref:`ctags-optlib(7) <ctags-optlib(7)>` and the following sections.

:ref:`ctags-optlib(7) <ctags-optlib(7)>` is the primary document of the optlib
parser feature. The following sections provide additional information and more
advanced features. Note that some of the features are experimental, and will be
marked as such in the documentation.

Many optlib parsers are included in Universal Ctags; see
`optlib/*.ctags <https://github.com/universal-ctags/ctags/tree/master/optlib>`_.
They are good examples to follow when you develop your own parsers.

An optlib parser can be translated into C source code. Your optlib parser can
thus easily become a built-in parser. See ":ref:`optlib2c`" for details.

Regular expression (regex) engine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, Universal Ctags uses `the POSIX Extended Regular Expressions (ERE)
<https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html>`_
syntax, the same as Exuberant Ctags.

When Universal Ctags is built, the ``configure`` script runs compatibility
tests of the regex engine in the system library. If the tests pass, that engine
is used; otherwise the regex engine imported from `the GNU Gnulib library
<https://www.gnu.org/software/gnulib/manual/gnulib.html#Regular-expressions>`_
is used. In the latter case, the output of ``ctags --list-features`` contains
``gnulib_regex``.

See ``regex(7)`` or `the GNU Gnulib Manual
<https://www.gnu.org/software/gnulib/manual/gnulib.html#Regular-expressions>`_
for the details of the regular expression syntax.

.. note::

	The GNU regex engine supports some GNU extensions described `here
	<https://www.gnu.org/software/gnulib/manual/gnulib.html#posix_002dextended-regular-expression-syntax>`_.
	Note that an optlib parser using the extensions may not work with Universal
	Ctags on some other systems.

POSIX Extended Regular Expressions (ERE) do
*not* support many of the "modern" extensions such as lazy quantifiers,
non-capturing groups, atomic groups, possessive quantifiers, look-ahead/behind,
etc. An ERE engine may also be notoriously slow when backtracking.

A common error is forgetting that a
POSIX ERE engine is always *greedy*; the '``*``' and '``+``' quantifiers match
as much as possible, before backtracking from the end of their match.

For example, this pattern::

	foo.*bar

will match this entire string, not just the first part::

	foobar, bar, and even more bar

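When only the shortest match is wanted, a common ERE workaround is to replace
``.*`` with a bracket expression that cannot run past the intended delimiter.
The following is a minimal sketch; the language ``Foo``, its kind ``s``, and
the quoted-string syntax are hypothetical and serve only as an illustration.

.. code-block:: ctags

	--langdef=Foo
	--kinddef-Foo=s,string,quoted strings
	# Greedy: on a line like  a "x" b "y"  the group captures  x" b "y
	# --regex-Foo=/"(.*)"/\1/s/
	# Bounded: [^"]* cannot cross a quote, so the group captures  x
	--regex-Foo=/"([^"]*)"/\1/s/
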
Another detail to keep in mind is how the regex engine treats newlines.
Universal Ctags compiles the regular expressions in the ``--regex-<LANG>`` and
``--mline-regex-<LANG>`` options with ``REG_NEWLINE`` set. What that means is documented
in the
`POSIX specification <https://pubs.opengroup.org/onlinepubs/9699919799/functions/regcomp.html>`_.
One obvious effect is that the any-character dot '``.``' does not match
newline characters, the '``^``' anchor *does* match right after a newline, and
the '``$``' anchor matches right before a newline. A more subtle issue is this text from the
chapter "`Regular Expressions <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html>`_":
"the use of literal <newline>s or any escape sequence equivalent produces undefined
results". This means that a regex pattern containing ``[^\n]+`` is invalid,
and indeed in glibc it produces very odd results. **Never use** '``\n``' in patterns
for ``--regex-<LANG>``, and **never use it** in non-matching bracket expressions
for ``--mline-regex-<LANG>`` patterns. For the experimental ``--_mtable-regex-<LANG>``
you can safely use '``\n``' because that regex is not compiled with ``REG_NEWLINE``.

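For a line-oriented ``--regex-<LANG>`` pattern, the rest of a line can usually
be captured without mentioning the newline at all, because '``.``' already
stops at the end of the line. A small sketch, assuming a hypothetical language
``Foo`` with a kind ``t``:

.. code-block:: ctags

	--langdef=Foo
	--kinddef-Foo=t,title,titles
	# Capture everything after "title" up to the end of the line.
	# '.' never matches a newline here, so [^\n]+ is not needed.
	--regex-Foo=/^title[ \t]+(.+)$/\1/t/
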
POSIX ERE also has some known "quirks"
with respect to escaping special characters in bracket expressions.
For example, a pattern of ``[^\]]+`` is wrong in POSIX ERE, because '``\``' is
*not* an escape character inside a bracket expression, and thus '``]``' can
**not** be escaped with it.
Most regex engines ignore this subtle detail in POSIX ERE, and instead allow
escaping it with '``\]``' inside the bracket expression and treat it as the
literal character '``]``'. GNU glibc, however, does not generate an error but
instead considers it undefined behavior, and in fact it will match very odd
things. Instead you **must** use the more unintuitive ``[^]]+`` syntax. The same
is technically true of other special characters inside a bracket expression,
such as ``[^\)]+``, which should instead be ``[^)]+``. The ``[^\)]+`` will
usually appear to work, but only because what it is really doing is matching any
character but '``\``' *or* '``)``'. The only exceptions for using '``\``' inside a
bracket expression are '``\t``' and '``\n``', which ctags converts to their
single literal control characters before passing the pattern to glibc.

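As an illustration of the bracket-expression rules above, the sketch below
tags a name enclosed in square brackets. The language ``Foo`` and the kind
``s`` are hypothetical.

.. code-block:: ctags

	--langdef=Foo
	--kinddef-Foo=s,section,sections
	# Outside a bracket expression, \[ and \] are ordinary escapes.
	# Inside one, ']' is literal only when it comes first (after '^').
	--regex-Foo=/^\[([^]]+)\]/\1/s/
	# Avoid: [^\]]+  (the backslash is not an escape inside brackets)
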
You should always test your regex patterns against test files with strings that
do and do not match. Pay particular attention to the cases where a pattern should
*not* match, and to how *much* it matches when it should.

Perl-compatible regular expressions (PCRE2) engine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Universal Ctags optionally supports the `Perl-Compatible Regular Expressions (PCRE2)
<https://www.pcre.org/current/doc/html/pcre2syntax.html>`_ syntax,
but only if Universal Ctags is built with the ``pcre2`` library.
See the output of the ``--list-features`` option to find out whether your Universal
Ctags is built with ``pcre2``.

PCRE2 *does* support many "modern" extensions.
For example, this pattern::

	foo.*?bar

will match just the first part, ``foobar``, not this entire string::

	foobar, bar, and even more bar

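A pattern is handed to the PCRE2 engine by adding the ``{pcre2}`` long flag,
which is documented in :ref:`ctags-optlib(7) <ctags-optlib(7)>` rather than in
this section; check that page for your version. The sketch below assumes a
build that lists ``pcre2`` in ``--list-features``, and a hypothetical language
``Foo`` with a kind ``k``:

.. code-block:: ctags

	--langdef=Foo
	--kinddef-Foo=k,keyword,keywords
	# .*? is lazy, so group 1 ends at the first "bar" on the line.
	--regex-Foo=/^(foo.*?bar)/\1/k/{pcre2}
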
Regex option argument flags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Many regex-based options described in this document support additional arguments
in the form of long flags. Long flags are specified with surrounding '``{``' and
'``}``'.

The general format and placement is as follows:

.. code-block:: ctags

	--regex-<LANG>=/<PATTERN>/<NAME>/[<KIND>/]LONGFLAGS

Some examples:

.. code-block:: ctags

	--regex-Pod=/^=head1[ \t]+(.+)/\1/c/
	--regex-Foo=/set=([^;]+)/\1/v/{icase}
	--regex-Man=/^\.TH[[:space:]]{1,}"([^"]{1,})".*/\1/t/{exclusive}{icase}{scope=push}
	--regex-Gdbinit=/^#//{exclusive}

Note that the last example only has two '``/``' forward-slashes following
the regex pattern, as a shortened form when no kind-spec exists.

The ``--mline-regex-<LANG>`` option also follows the above format. The
experimental ``--_mtable-regex-<LANG>`` option follows a slightly
modified version as well.

Regex control flags
......................................................................

.. Q: why even discuss the single-character version of the flags? Just
	make everyone use the long form.

The regex matching can be controlled by adding flags to the ``--regex-<LANG>``,
``--mline-regex-<LANG>``, and experimental ``--_mtable-regex-<LANG>`` options.
This is done either with the single-character short flags ``b``, ``e`` and
``i``, as explained in the *ctags.1* man page, or with the long flags
described earlier. The long flags require more typing but are much more
readable.

The mapping between the older short flag names and long flag names is:

=========== =========== ===========
short flag  long flag   description
=========== =========== ===========
b           basic       POSIX basic regular expression syntax.
e           extend      POSIX extended regular expression syntax (default).
i           icase       Case-insensitive matching.
=========== =========== ===========


So the following ``--regex-<LANG>`` expression:

.. code-block:: ctags

   --kinddef-m4=d,definition,definitions
   --regex-m4=/^m4_define\(\[([^]$\(]+).+$/\1/d/e

is the same as:

.. code-block:: ctags

   --kinddef-m4=d,definition,definitions
   --regex-m4=/^m4_define\(\[([^]$\(]+).+$/\1/d/{extend}

The characters '``{``' and '``}``' may not be suitable for command line
use, but long flags are mostly intended for option files.

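For instance, case-insensitive matching can be requested with either spelling.
A sketch, assuming a hypothetical language ``Foo`` with a kind ``p``:

.. code-block:: ctags

	--langdef=Foo
	--kinddef-Foo=p,procedure,procedures
	# Short form: 'i' after the kind.
	--regex-Foo=/^procedure[ \t]+([a-z_]+)/\1/p/i
	# Long form of the same rule (use one or the other, not both):
	# --regex-Foo=/^procedure[ \t]+([a-z_]+)/\1/p/{icase}

With either form, both ``PROCEDURE Main`` and ``procedure main`` are matched.
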
Exclusive flag in regex
......................................................................

By default, lines read from the input files will be matched against all the
regular expressions defined with ``--regex-<LANG>``. Each successfully matched
regular expression will emit a tag.

In some cases another policy, exclusive-matching, is preferable to the
all-matching policy. Exclusive-matching means that, for a given input line,
the remaining regular expressions are not tried once one of the regular
expressions has matched successfully.

The flags ``exclusive`` (long) and ``x`` (short) were introduced for
specifying exclusive-matching. For example, this is used in
:file:`optlib/gdbinit.ctags` for ignoring comment lines in gdb files,
as follows:

.. code-block:: ctags

	--regex-Gdbinit=/^#//{exclusive}

Comments in gdb files start with '``#``'. The above line is the first regex
in :file:`gdbinit.ctags`, so for an input line that is a comment, the
subsequent regexes are not tried.

If an empty name pattern (``//``) is used for the ``--regex-<LANG>`` option,
ctags warns that the option is used incorrectly. However, if the flag
``exclusive`` or ``x`` is specified, the warning is suppressed.
This is useful for ignoring matched patterns, as above.

NOTE: This flag does not make sense in the multi-line ``--mline-regex-<LANG>``
option nor the multi-table ``--_mtable-regex-<LANG>`` option.


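To make the ordering effect concrete, here is a minimal sketch for a
hypothetical language ``Foo`` in which '``#``' starts a comment. The language,
its kind, and the ``define`` keyword are assumptions for illustration only.

.. code-block:: ctags

	--langdef=Foo
	--map-Foo=+.foo
	--kinddef-Foo=d,definition,definitions
	# Listed first: a comment line matches here, emits no tag, and the
	# remaining patterns are skipped for that line.
	--regex-Foo=/^[ \t]*#//{exclusive}
	# Reached only for lines that the rule above did not match.
	--regex-Foo=/define[ \t]+([a-zA-Z_]+)/\1/d/

With these options, a line such as ``# define old_name`` should produce no
tag, while ``define new_name`` is tagged as usual.
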
Experimental flags
......................................................................

.. note:: These flags are experimental. They apply to all regex option
	types: basic ``--regex-<LANG>``, multi-line ``--mline-regex-<LANG>``,
	and the experimental multi-table ``--_mtable-regex-<LANG>`` option.

``_extra``

	This flag indicates the tag should only be generated if the given
	``extra`` type is enabled, as explained in ":ref:`extras`".

``_field``

	This flag allows a regex match to add additional custom fields to the
	generated tag entry, as explained in ":ref:`fields`".

``_role``

	This flag allows a regex match to generate a reference tag entry and
	specify the role of the reference, as explained in ":ref:`roles`".

.. NOT REVIEWED YET

``_anonymous=PREFIX``

	This flag allows a regex match to generate an anonymous tag entry.
	ctags generates a name starting with ``PREFIX`` and emits a tag with it.
	This flag is useful for recording the position of a language object
	that has no name. A lambda function in a functional programming
	language is a typical example of such an object.

	Consider the following input (``input.foo``):

	.. code-block:: lisp

		(let ((f (lambda (x) (+ 1 x))))
			...
			)

	Consider the following optlib file (``foo.ctags``):

	.. code-block:: ctags
		:emphasize-lines: 4

		--langdef=Foo
		--map-Foo=+.foo
		--kinddef-Foo=l,lambda,lambda functions
		--regex-Foo=/.*\(lambda .*//l/{_anonymous=L}

	You get the following tags file:

	.. code-block:: console

		$ u-ctags  --options=foo.ctags -o - /tmp/input.foo
		Le4679d360100	/tmp/input.foo	/^(let ((f (lambda (x) (+ 1 x))))$/;"	l


.. _extras:

Conditional tagging with extras
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. NEEDS MORE REVIEWS

If a matched pattern should only be tagged when an ``extra`` flag is enabled,
mark the pattern with ``{_extra=XNAME}`` where ``XNAME`` is the name of the
extra. You must define ``XNAME`` with the
``--_extradef-<LANG>=XNAME,DESCRIPTION`` option before defining a regex
marked with ``{_extra=XNAME}``.

.. code-block:: python

	if __name__ == '__main__':
		do_something()

To capture the lines above in a Python program (``input.py``), an ``extra`` flag can
be used.

.. code-block:: ctags
	:emphasize-lines: 1-2

	--_extradef-Python=main,__main__ entry points
	--regex-Python=/^if __name__ == '__main__':/__main__/f/{_extra=main}

The above optlib (``python-main.ctags``) introduces the ``main`` extra to the Python parser.
The pattern matching is done only when the ``main`` extra is enabled.

.. code-block:: console

	$ ctags --options=python-main.ctags -o - --extras-Python='+{main}' input.py
	__main__	input.py	/^if __name__ == '__main__':$/;"	f


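The same mechanism works for a language defined entirely in an optlib file.
The following sketch uses a hypothetical language ``Foo``; the extra name
``todo`` and the kind ``t`` are made up for the example.

.. code-block:: ctags

	--langdef=Foo
	--map-Foo=+.foo
	--kinddef-Foo=t,todo,TODO markers
	# The extra must be defined before a regex refers to it.
	--_extradef-Foo=todo,include TODO markers
	--regex-Foo=/TODO:[ \t]*(.+)/\1/t/{_extra=todo}

As in the Python example, the tags are expected to appear only when the extra
is enabled, for instance with ``--extras-Foo='+{todo}'``.
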
.. TODO: this "fields" section should probably be moved up this document, as a
	subsection in the "Regex option argument flags" section

.. _fields:

Adding custom fields to the tag output
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. NEEDS MORE REVIEWS

Exuberant Ctags allows just one of the specified groups in a regex pattern to
be used as a part of the name of a tag entry.

Universal Ctags allows using the other groups in the regex pattern.
An optlib parser can define its own fields, and the groups can be used as
values of those fields in a tag entry.

Let's think about `Unknown`, an imaginary language.
Here is a source file (``input.unknown``) written in `Unknown`:

.. code-block:: java

	public func foo(n, m);
	protected func bar(n);
	private func baz(n,...);

With ``--regex-Unknown=...`` Exuberant Ctags can capture ``foo``, ``bar``, and ``baz``
as names. Universal Ctags can attach extra context information to the
names as values for fields. Let's focus on ``bar``. ``protected`` is a
keyword to control how widely the identifier ``bar`` can be accessed.
``(n)`` is the parameter list of ``bar``. ``protected`` and ``(n)`` are
extra context information of ``bar``.

With the following optlib file (``unknown.ctags``), ctags can attach
``protected`` to the field ``protection`` and ``(n)`` to the field ``signature``.

.. code-block:: ctags
	:emphasize-lines: 5-9

	--langdef=unknown
	--kinddef-unknown=f,func,functions
	--map-unknown=+.unknown

	--_fielddef-unknown=protection,access scope
	--_fielddef-unknown=signature,signatures

	--regex-unknown=/^((public|protected|private) +)?func ([^\(]+)\((.*)\)/\3/f/{_field=protection:\2}{_field=signature:(\4)}
	--fields-unknown=+'{protection}{signature}'

For the line ``protected func bar(n);`` you will get the following tags output::

	bar	input.unknown	/^protected func bar(n);$/;"	f	protection:protected	signature:(n)

Let's look at ``unknown.ctags`` in detail.

.. code-block:: ctags

	--_fielddef-unknown=protection,access scope

``--_fielddef-<LANG>=name,description`` defines a new field for a parser
specified by *<LANG>*.  Before defining a new field for the parser,
the parser must be defined with ``--langdef=<LANG>``. ``protection`` is
the field name used in tags output. ``access scope`` is the description
used in the output of ``--list-fields`` and ``--list-fields=Unknown``.

.. code-block:: ctags

	--_fielddef-unknown=signature,signatures

This defines a field named ``signature``.

.. code-block:: ctags

	--regex-unknown=/^((public|protected|private) +)?func ([^\(]+)\((.*)\)/\3/f/{_field=protection:\2}{_field=signature:(\4)}

This option requests making a tag for the name captured by group 3 of the
pattern, attaching group 2 (the keyword without the trailing spaces that group 1
also contains) as the value of the ``protection`` field, and attaching
group 4 as the value of the ``signature`` field. You can use the long regex flag
``_field`` for attaching fields to a tag with the following notation rule::

	{_field=FIELDNAME:GROUP}


``--fields-<LANG>=[+|-]{FIELDNAME}`` can be used to enable or disable the specified field.

A newly defined parser-specific field is disabled by default. Enable the
field explicitly to use it. See ":ref:`Parser specific fields <parser-specific-fields>`"
for the ``--fields-<LANG>`` option.

The `passwd` parser is a simple example that uses the ``--fields-<LANG>`` option.


.. _roles:

Capturing reference tags
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. NOT REVIEWED YET

To make a reference tag with an optlib parser, specify a role with the
``_role`` long regex flag. Let's see an example:

.. code-block:: ctags
	:emphasize-lines: 3-6

	--langdef=FOO
	--kinddef-FOO=m,module,modules
	--_roledef-FOO.m=imported,imported module
	--regex-FOO=/import[ \t]+([a-z]+)/\1/m/{_role=imported}
	--extras=+r
	--fields=+r

A role must be defined before it is specified as the value of the ``_role`` flag.
The ``--_roledef-<LANG>.<KIND>=<ROLE>,<ROLEDESC>`` option is for defining a role.
See the line ``--regex-FOO=...``.  In this parser `FOO`, the name of an
imported module is captured as a reference tag with the role ``imported``.

For specifying the *<KIND>* in which the role is defined, you can use either a
kind letter or a kind name surrounded by '``{``' and '``}``'.

The option has two parameters separated by a comma:

*<ROLE>*

	the role name, and

*<ROLEDESC>*

	the description of the role.

The first parameter is the name of the role. The role is defined in
the kind *<KIND>* of the language *<LANG>*. In the example, the
``imported`` role is defined in the ``module`` kind, which is specified
with ``m``. You can use ``{module}``, the name of the kind, instead.

The kind specified in the ``--_roledef-<LANG>.<KIND>`` option must be
defined *before* using the option. See the description of
``--kinddef-<LANG>`` for defining a kind.

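The ordering requirements above, together with the ``{module}`` spelling of
the kind, can be sketched as follows. This only rearranges the options from
the ``FOO`` example:

.. code-block:: ctags

	--langdef=FOO
	# 1. Define the kind first.
	--kinddef-FOO=m,module,modules
	# 2. Define the role in that kind; {module} is used in place of the letter m.
	--_roledef-FOO.{module}=imported,imported module
	# 3. Only now can a regex refer to the role.
	--regex-FOO=/import[ \t]+([a-z]+)/\1/m/{_role=imported}
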
The roles are listed with ``--list-roles=<LANG>``. The name and description
passed to the ``--_roledef-<LANG>.<KIND>`` option are used in the output like::

	$ ctags --langdef=FOO --kinddef-FOO=m,module,modules \
				--_roledef-FOO.m='imported,imported module' --list-roles=FOO
	#KIND(L/N) NAME     ENABLED DESCRIPTION
	m/module   imported on      imported module


By specifying the ``_role`` regex flag multiple times with different roles, you can
assign multiple roles to a reference tag. See the following C input (``input.c``):

.. code-block:: C

	x  = 0;
	i += 1;

An ultra fine grained C parser may capture the variable ``x`` with the
``lvalue`` role and the variable ``i`` with the ``lvalue`` and ``incremented``
roles.

You can implement such roles by extending the built-in C parser:

.. code-block:: ctags
	:emphasize-lines: 2-5

	# c-extra.ctags
	--_roledef-C.v=lvalue,locator values
	--_roledef-C.v=incremented,incremented with ++ operator
	--regex-C=/([a-zA-Z_][a-zA-Z_0-9]*) *=/\1/v/{_role=lvalue}
	--regex-C=/([a-zA-Z_][a-zA-Z_0-9]*) *\+=/\1/v/{_role=lvalue}{_role=incremented}

.. code-block:: console

	$ ctags --options=c-extra.ctags --extras=+r --fields=+r -o - input.c
	i	input.c	/^i += 1;$/;"	v	roles:lvalue,incremented
	x	input.c	/^x  = 0;$/;"	v	roles:lvalue


Scope tracking in a regex parser
......................................................................

For the ``{scope=...}`` flag itself, which is used for scope tracking, see the "FLAGS FOR
--regex-<LANG> OPTION" section of :ref:`ctags-optlib(7) <ctags-optlib(7)>`.

Example 1:

.. code-block:: python

	# in /tmp/input.foo
	class foo:
	    def bar(baz):
	        print(baz)
	class goo:
	    def gar(gaz):
	        print(gaz)

.. code-block:: ctags
	:emphasize-lines: 7,8

	# in /tmp/foo.ctags:
	--langdef=Foo
	--map-Foo=+.foo
	--kinddef-Foo=c,class,classes
	--kinddef-Foo=d,definition,definitions

	--regex-Foo=/^class[[:blank:]]+([[:alpha:]]+):/\1/c/{scope=set}
	--regex-Foo=/^[[:blank:]]+def[[:blank:]]+([[:alpha:]]+).*:/\1/d/{scope=ref}

.. code-block:: console

	$ ctags --options=/tmp/foo.ctags -o - /tmp/input.foo
	bar	/tmp/input.foo	/^    def bar(baz):$/;"	d	class:foo
	foo	/tmp/input.foo	/^class foo:$/;"	c
	gar	/tmp/input.foo	/^    def gar(gaz):$/;"	d	class:goo
	goo	/tmp/input.foo	/^class goo:$/;"	c


Example 2:

.. code-block:: c

	// in /tmp/input.pp
	class foo {
	    int bar;
	}

.. code-block:: ctags
	:emphasize-lines: 7-9

	# in /tmp/pp.ctags:
	--langdef=pp
	--map-pp=+.pp
	--kinddef-pp=c,class,classes
	--kinddef-pp=v,variable,variables

	--regex-pp=/^[[:blank:]]*\}//{scope=pop}{exclusive}
	--regex-pp=/^class[[:blank:]]*([[:alnum:]]+)[[:blank:]]*\{/\1/c/{scope=push}
	--regex-pp=/^[[:blank:]]*int[[:blank:]]*([[:alnum:]]+)/\1/v/{scope=ref}

.. code-block:: console

	$ ctags --options=/tmp/pp.ctags -o - /tmp/input.pp
	bar	/tmp/input.pp	/^    int bar;$/;"	v	class:foo
	foo	/tmp/input.pp	/^class foo {$/;"	c


Example 3:

.. code-block::

	# in /tmp/input.docdoc
	section SEC0
	...
	subsection SUB0-1
	...
	subsection SUB0-2
	...
	section SEC1
	...
	subsection SUB1-1
	...
	subsection SUB1-2
	...

.. code-block:: ctags
	:emphasize-lines: 15,21

	# in /tmp/doc.ctags:
	--langdef=doc
	--map-doc=+.docdoc
	--kinddef-doc=s,section,sections
	--kinddef-doc=S,subsection,subsections

	--_tabledef-doc=main
	--_tabledef-doc=section
	--_tabledef-doc=subsection

	--_mtable-regex-doc=main/section +([^\n]+)\n/\1/s/{scope=push}{tenter=section}
	--_mtable-regex-doc=main/[^\n]+\n|[^\n]+|\n//
	--_mtable-regex-doc=main///{scope=clear}{tquit}

	--_mtable-regex-doc=section/section +([^\n]+)\n/\1/s/{scope=replace}
	--_mtable-regex-doc=section/subsection +([^\n]+)\n/\1/S/{scope=push}{tenter=subsection}
	--_mtable-regex-doc=section/[^\n]+\n|[^\n]+|\n//
	--_mtable-regex-doc=section///{scope=clear}{tquit}

	--_mtable-regex-doc=subsection/(section )//{_advanceTo=0start}{tleave}{scope=pop}
	--_mtable-regex-doc=subsection/subsection +([^\n]+)\n/\1/S/{scope=replace}
	--_mtable-regex-doc=subsection/[^\n]+\n|[^\n]+|\n//
	--_mtable-regex-doc=subsection///{scope=clear}{tquit}

.. code-block:: console

	% ctags --sort=no --fields=+nl --options=/tmp/doc.ctags -o - /tmp/input.docdoc
	SEC0	/tmp/input.docdoc	/^section SEC0$/;"	s	line:1	language:doc
	SUB0-1	/tmp/input.docdoc	/^subsection SUB0-1$/;"	S	line:3	language:doc	section:SEC0
	SUB0-2	/tmp/input.docdoc	/^subsection SUB0-2$/;"	S	line:5	language:doc	section:SEC0
	SEC1	/tmp/input.docdoc	/^section SEC1$/;"	s	line:7	language:doc
	SUB1-1	/tmp/input.docdoc	/^subsection SUB1-1$/;"	S	line:9	language:doc	section:SEC1
	SUB1-2	/tmp/input.docdoc	/^subsection SUB1-2$/;"	S	line:11	language:doc	section:SEC1


NOTE: This flag doesn't work well with ``--mline-regex-<LANG>=``.

Overriding the letter for file kind
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. Q: this was fixed in https://github.com/universal-ctags/ctags/pull/331
	so can we remove this section?

One of the built-in tag kinds in Universal Ctags is the ``F`` file kind.
Overriding the letter for the file kind is not allowed in Universal Ctags.

.. warning::

	Don't use ``F`` as a kind letter in your parser. (See issue `#317
	<https://github.com/universal-ctags/ctags/issues/317>`_ on GitHub.)

Generating fully qualified tags automatically from scope information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If scope fields are filled properly with ``{scope=...}`` regex flags,
you can use the field values for generating fully qualified tags.
For the ``{scope=...}`` flag itself, see the "FLAGS FOR --regex-<LANG>
OPTION" section of :ref:`ctags-optlib(7) <ctags-optlib(7)>`.

Append ``{_autoFQTag}`` to the end of the ``--langdef=<LANG>`` option, as in
``--langdef=Foo{_autoFQTag}``, to make ctags generate fully qualified
tags automatically.

'``.``' is the (ctags global) default separator for combining names into a
fully qualified tag. You can customize separators with the
``--_scopesep-<LANG>=...`` option.

input.foo::

  class X
     var y
  end

foo.ctags:

.. code-block:: ctags
	:emphasize-lines: 1

	--langdef=foo{_autoFQTag}
	--map-foo=+.foo
	--kinddef-foo=c,class,classes
	--kinddef-foo=v,var,variables
	--regex-foo=/class ([A-Z]*)/\1/c/{scope=push}
	--regex-foo=/end///{placeholder}{scope=pop}
	--regex-foo=/[ \t]*var ([a-z]*)/\1/v/{scope=ref}

Output::

	$ u-ctags --quiet --options=./foo.ctags -o - input.foo
	X	input.foo	/^class X$/;"	c
	y	input.foo	/^	var y$/;"	v	class:X

	$ u-ctags --quiet --options=./foo.ctags --extras=+q -o - input.foo
	X	input.foo	/^class X$/;"	c
	X.y	input.foo	/^	var y$/;"	v	class:X
	y	input.foo	/^	var y$/;"	v	class:X


``X.y`` is printed as a fully qualified tag when ``--extras=+q`` is given.

.. NOT REVIEWED YET (--_scopesep)

Customizing scope separators
......................................................................

Use the ``--_scopesep-<LANG>=[<parent-kindLetter>]/<child-kindLetter>:<sep>``
option to customize separators if the language uses ``{_autoFQTag}``.

``parent-kindLetter``

	The kind letter for a tag of outer-scope.

	You can use '``*``' as a wildcard that means
	*any kind* for a tag of outer-scope.

	If you omit ``parent-kindLetter``, the separator is used as
	a prefix for tags having the kind specified with ``child-kindLetter``.
	This prefix can be used to refer to the global namespace or similar concepts if the
	language has one.

``child-kindLetter``

	The kind letter for a tag of inner-scope.

	You can use '``*``' as a wildcard that means
	*any kind* for a tag of inner-scope.

``sep``

	In a qualified tag, if the outer-scope has kind ``parent-kindLetter`` and
	the inner-scope has kind ``child-kindLetter``, then ``sep`` is inserted
	between the scope names in the generated tags file.

Specifying '``*``' as both ``parent-kindLetter`` and ``child-kindLetter``
sets ``sep`` as the language default separator. It is used as a fallback.

Specifying '``*``' as ``child-kindLetter`` and omitting ``parent-kindLetter``
sets ``sep`` as the language default prefix. It is used as a fallback.


NOTE: There is no ctags global default prefix.

NOTE: The ``--_scopesep-<LANG>=...`` option affects only a parser that
enables ``_autoFQTag``. A parser building fully qualified tags
manually ignores the option.

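For an optlib parser, the same option can be placed in the option file. The
sketch below reuses the ``foo`` parser defined with ``{_autoFQTag}`` in the
previous section and is an untested illustration: it assumes that a separator
given for the ``c``/``v`` kind pair replaces the global default '``.``'.

.. code-block:: ctags

	--langdef=foo{_autoFQTag}
	--map-foo=+.foo
	--kinddef-foo=c,class,classes
	--kinddef-foo=v,var,variables
	# Join a class (outer scope) and a variable (inner scope) with '->'.
	--_scopesep-foo=c/v:->
	--regex-foo=/class ([A-Z]*)/\1/c/{scope=push}
	--regex-foo=/end///{placeholder}{scope=pop}
	--regex-foo=/[ \t]*var ([a-z]*)/\1/v/{scope=ref}

Under that assumption, running ctags with ``--extras=+q`` on the earlier
``input.foo`` would be expected to emit ``X->y`` rather than ``X.y`` as the
fully qualified tag.
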
Let's see an example.
The input file is written in Tcl. The Tcl parser is not an optlib
parser. However, it uses the ``_autoFQTag`` feature internally.
Therefore, the ``--_scopesep-Tcl=`` option works well. The Tcl parser
defines two kinds, ``n`` (``namespace``) and ``p`` (``procedure``).

By default, the Tcl parser uses ``::`` as the scope separator. The parser also
uses ``::`` as the root prefix.

.. code-block:: tcl

	namespace eval N {
		namespace eval M {
			proc pr0 {s} {
				puts $s
			}
		}
	}

	proc pr1 {s} {
		puts $s
	}

``M`` is defined under the scope of ``N``. ``pr0`` is defined under the scope
of ``M``. ``N`` and ``pr1`` are at top level (so they are candidates for
a prefix). ``M`` and ``N`` are language objects with the ``n`` (``namespace``) kind.
``pr0`` and ``pr1`` are language objects with the ``p`` (``procedure``) kind.

.. code-block:: console

	$ ctags -o - --extras=+q input.tcl
	::N	input.tcl	/^namespace eval N {$/;"	n
	::N::M	input.tcl	/^	namespace eval M {$/;"	n	namespace:::N
	::N::M::pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:::N::M
	::pr1	input.tcl	/^proc pr1 {s} {$/;"	p
	M	input.tcl	/^	namespace eval M {$/;"	n	namespace:::N
	N	input.tcl	/^namespace eval N {$/;"	n
	pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:::N::M
	pr1	input.tcl	/^proc pr1 {s} {$/;"	p

Let's change the default separator to ``->``:

.. code-block:: console
	:emphasize-lines: 1

	$ ctags -o - --extras=+q --_scopesep-Tcl='*/*:->' input.tcl
	::N	input.tcl	/^namespace eval N {$/;"	n
	::N->M	input.tcl	/^	namespace eval M {$/;"	n	namespace:::N
	::N->M->pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:::N->M
	::pr1	input.tcl	/^proc pr1 {s} {$/;"	p
	M	input.tcl	/^	namespace eval M {$/;"	n	namespace:::N
	N	input.tcl	/^namespace eval N {$/;"	n
	pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:::N->M
	pr1	input.tcl	/^proc pr1 {s} {$/;"	p

Let's define '``^``' as the default prefix:

.. code-block:: console
	:emphasize-lines: 1

	$ ctags -o - --extras=+q --_scopesep-Tcl='*/*:->' --_scopesep-Tcl='/*:^' input.tcl
	M	input.tcl	/^	namespace eval M {$/;"	n	namespace:^N
	N	input.tcl	/^namespace eval N {$/;"	n
	^N	input.tcl	/^namespace eval N {$/;"	n
	^N->M	input.tcl	/^	namespace eval M {$/;"	n	namespace:^N
	^N->M->pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:^N->M
	^pr1	input.tcl	/^proc pr1 {s} {$/;"	p
	pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:^N->M
	pr1	input.tcl	/^proc pr1 {s} {$/;"	p

Let's override the separator used for combining a
namespace and a procedure with '``+``'. (For combining a namespace
and another namespace, ctags still uses the default separator.)

.. code-block:: console
	:emphasize-lines: 1

	$ ctags -o - --extras=+q --_scopesep-Tcl='*/*:->' --_scopesep-Tcl='/*:^' --_scopesep-Tcl='n/p:+' input.tcl
	M	input.tcl	/^	namespace eval M {$/;"	n	namespace:^N
	N	input.tcl	/^namespace eval N {$/;"	n
	^N	input.tcl	/^namespace eval N {$/;"	n
	^N->M	input.tcl	/^	namespace eval M {$/;"	n	namespace:^N
	^N->M+pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:^N->M
	^pr1	input.tcl	/^proc pr1 {s} {$/;"	p
	pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:^N->M
	pr1	input.tcl	/^proc pr1 {s} {$/;"	p

Let's override the prefix for a namespace with '``@``'.
(For procedures, ctags still uses the default prefix.)

.. code-block:: console
	:emphasize-lines: 1

	$ ctags -o - --extras=+q --_scopesep-Tcl='*/*:->' --_scopesep-Tcl='/*:^' --_scopesep-Tcl='n/p:+' --_scopesep-Tcl='/n:@' input.tcl
	@N	input.tcl	/^namespace eval N {$/;"	n
	@N->M	input.tcl	/^	namespace eval M {$/;"	n	namespace:@N
	@N->M+pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:@N->M
	M	input.tcl	/^	namespace eval M {$/;"	n	namespace:@N
	N	input.tcl	/^namespace eval N {$/;"	n
	^pr1	input.tcl	/^proc pr1 {s} {$/;"	p
	pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:@N->M
	pr1	input.tcl	/^proc pr1 {s} {$/;"	p


Multi-line pattern match
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We often need to scan multiple lines to generate a tag, whether due to
needing contextual information to decide whether to tag or not, or to
constrain generating tags to only certain cases, or to grab multiple
substrings to generate the tag name.

Universal Ctags has two ways to accomplish this: *multi-line regex options*,
and the experimental *multi-table regex options* described later.

The newly introduced ``--mline-regex-<LANG>`` is similar to ``--regex-<LANG>``
except the pattern is applied to the whole file's contents, not line by line.

This example is based on issue `#219
<https://github.com/universal-ctags/ctags/issues/219>`_ posted by
@andreicristianpetcu:

855.. code-block:: java
856
857	// in input.java:
858
859	@Subscribe
860	public void catchEvent(SomeEvent e)
861	{
862	   return;
863	}
864
865	@Subscribe
866	public void
867	recover(Exception e)
868	{
869	    return;
870	}
871
The above Java code resembles code written for the Java `Spring <https://spring.io>`_
873framework. The ``@Subscribe`` annotation is a keyword for the framework, and the
874developer would like to have a tag generated for each method annotated with
875``@Subscribe``, using the name of the method followed by a dash followed by the
876type of the argument. For example the developer wants the tag name
877``Event-SomeEvent`` generated for the first method shown above.
878
879To accomplish this, the developer creates a :file:`spring.ctags` file with
880the following:
881
882.. code-block:: ctags
883	:emphasize-lines: 4
884
885	# in spring.ctags:
886	--langdef=javaspring
887	--map-javaspring=+.java
888	--mline-regex-javaspring=/@Subscribe([[:space:]])*([a-z ]+)[[:space:]]*([a-zA-Z]*)\(([a-zA-Z]*)/\3-\4/s,subscription/{mgroup=3}
889	--fields=+ln
890
891And now using :file:`spring.ctags` the tag file has this:
892
893.. code-block:: console
894
895	$ ctags -o - --options=./spring.ctags input.java
896	Event-SomeEvent	input.java	/^public void catchEvent(SomeEvent e)$/;"	s	line:2	language:javaspring
897	recover-Exception	input.java	/^    recover(Exception e)$/;"	s	line:10	language:javaspring
898
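To see what "applied to the whole file's contents" and ``{mgroup=N}`` mean in
practice, here is a rough Python emulation. It is a sketch under stated
assumptions: Python's ``re`` module stands in for the POSIX engine (so
``[[:space:]]`` is written as ``\s``), ``SOURCE`` is a shortened copy of the
second method only, and ``line_of()`` and ``scan()`` are hypothetical helpers.

.. code-block:: python

	import re

	# Python's "re" has no POSIX classes, so [[:space:]] is written as \s here.
	PATTERN = re.compile(r"@Subscribe(\s)*([a-z ]+)\s*([a-zA-Z]*)\(([a-zA-Z]*)")

	SOURCE = "@Subscribe\npublic void\nrecover(Exception e)\n{\n    return;\n}\n"

	def line_of(text, offset):
	    """1-based line number of a character offset in text."""
	    return text.count("\n", 0, offset) + 1

	def scan(text, mgroup):
	    """Yield (tag name, line) for every match in the whole text."""
	    for m in PATTERN.finditer(text):
	        name = f"{m.group(3)}-{m.group(4)}"
	        yield name, line_of(text, m.start(mgroup))

	print(list(scan(SOURCE, 3)))   # line taken from the start of group 3
	print(list(scan(SOURCE, 0)))   # line taken from the start of the match

With group 3 the tag is attributed to the line holding ``recover`` (line 3 of
this shortened input); with group 0 it would be attributed to the line of the
``@Subscribe`` annotation (line 1).
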
899Multiline pattern flags
900......................................................................
901
902.. note:: These flags also apply to the experimental ``--_mtable-regex-<LANG>``
903	option described later.
904
905``{mgroup=N}``
906
907	This flag indicates the pattern should be applied to the whole file
908	contents, not line by line. ``N`` is the number of a capture group in the
909	pattern, which is used to record the line number location of the tag. In the
910	above example ``3`` is specified. The start position of the regex capture
911	group 3, relative to the whole file is used.
912
913.. warning:: You **must** add an ``{mgroup=N}`` flag to the multi-line
914	``--mline-regex-<LANG>`` option, even if the ``N`` is ``0`` (meaning the
915	start position of the whole regex pattern). You do not need to add it for
916	the multi-table ``--_mtable-regex-<LANG>``.
917
918.. TODO: Q: isn't the above restriction really a bug? I think it is. I should fix it.
919   Q to @masatake-san: Do you mean that {mgroup=0} can be omitted? -> #2918 is opened
920
921
922``{_advanceTo=N[start|end]}``
923
	A regex pattern is applied to the whole file's contents iteratively. This long
925	flag specifies from where the pattern should be applied in the next
926	iteration for regex matching. When a pattern matches, the next pattern
927	matching starts from the start or end of capture group ``N``. By default it
928	advances to the end of the whole match (i.e., ``{_advanceTo=0end}`` is
929	the default).
930
931
	Consider the following input::
934
935	   def def abc
936
937	Consider two sets of options, ``foo.ctags`` and ``bar.ctags``.
938
939	.. code-block:: ctags
940		:emphasize-lines: 5
941
942		# foo.ctags:
		--langdef=foo
		--langmap=foo:.foo
		--kinddef-foo=a,something,something
		--mline-regex-foo=/def *([a-z]+)/\1/a/{mgroup=1}
947
948
949	.. code-block:: ctags
950		:emphasize-lines: 5
951
952		# bar.ctags:
953		--langdef=bar
954		--langmap=bar:.bar
955		--kinddef-bar=a,something,something
956		--mline-regex-bar=/def *([a-z]+)/\1/a/{mgroup=1}{_advanceTo=1start}
957
	``foo.ctags`` emits the following tags output::
959
960	   def	input.foo	/^def def abc$/;"	a
961
	``bar.ctags`` emits the following tags output::
963
964	   def	input-0.bar	/^def def abc$/;"	a
965	   abc	input-0.bar	/^def def abc$/;"	a
966
967	``_advanceTo=1start`` is specified in ``bar.ctags``.
968	This allows ctags to capture ``abc``.
969
970	At the first iteration, the patterns of both
971	``foo.ctags`` and ``bar.ctags`` match as follows
972	::
973
974		0   1       (start)
975		v   v
976		def def abc
977		       ^
978		       0,1  (end)
979
	In both languages, ``def`` captured by group 1 becomes a tag. At the
	next iteration, however, the two languages resume pattern matching
	from different positions.
984
985	``foo.ctags``
986	::
987
988		       0end (default)
989		       v
990		def def abc
991
992
993	``bar.ctags``
994	::
995
996		    1start (as specified in _advanceTo long flag)
997		    v
998		def def abc
999
	This difference in positions accounts for the difference in the tags output.
1001
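	The iteration described above can be imitated in a few lines of Python.
	This is a simplified model, not the ctags code: ``scan()`` is a
	hypothetical helper, Python's ``re`` replaces the POSIX engine, and the
	guard that forces progress when the resume position does not move is an
	assumption added so that the loop always terminates.

	.. code-block:: python

		import re

		PATTERN = re.compile(r"def *([a-z]+)")

		def scan(text, group=0, edge="end"):
		    """Collect group 1 of each match, resuming where
		    {_advanceTo=<group><edge>} says."""
		    tags = []
		    pos = 0
		    while True:
		        m = PATTERN.search(text, pos)
		        if m is None:
		            break
		        tags.append(m.group(1))
		        nxt = m.start(group) if edge == "start" else m.end(group)
		        # Assumption: never stay in place, or the loop would not end.
		        pos = nxt if nxt > pos else pos + 1
		    return tags

		print(scan("def def abc"))               # like foo.ctags (default 0end)
		print(scan("def def abc", 1, "start"))   # like bar.ctags

	The first call collects only ``def``; the second one also collects
	``abc``, because matching resumes at the second ``def``.
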
1002	A more relevant use-case is when ``{_advanceTo=N[start|end]}`` is used in
1003	the experimental ``--_mtable-regex-<LANG>``, to "advance" back to the
1004	beginning of a match, so that one can generate multiple tags for the same
1005	input line(s).
1006
.. note:: This flag doesn't work well with scope-related flags and the ``exclusive`` flag.
1008
1009
1010.. Q: this was previously titled "Byte oriented pattern matching...", presumably
1011	because it "matched against the input at the current byte position, not line".
1012	But that's also true for --mline-regex-<LANG>, as far as I can tell.
1013
1014Advanced pattern matching with multiple regex tables
1015~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1016
1017.. note:: This is a highly experimental feature. This will not go into
1018	the man page of 6.0. But let's be honest, it's the most exciting feature!
1019
1020In some cases, the ``--regex-<LANG>`` and ``--mline-regex-<LANG>`` options are not
1021sufficient to generate the tags for a particular language. Some of the common
1022reasons for this are:
1023
1024* To ignore commented lines or sections for the language file, so that
1025  tags aren't generated for symbols that are within the comments.
1026* To enter and exit scope, and use it for tagging based on contextual
1027  state or with end-scope markers that are difficult to match to their
1028  associated scope entry point.
1029* To support nested scopes.
1030* To change the pattern searched for, or the resultant tag for the same
1031  pattern, based on scoping or contextual location.
1032* To break up an overly complicated ``--mline-regex-<LANG>`` pattern into
1033  separate regex patterns, for performance or readability reasons.
1034
1035To help handle such things, Universal Ctags has been enhanced with multi-table
1036regex matching. The feature is inspired by `lex`, the fast lexical analyzer
1037generator, which is a popular tool on Unix environments for writing parsers, and
1038`RegexLexer <http://pygments.org/docs/lexerdevelopment/>`_ of Pygments.
1039Knowledge about them will help you understand the new options.
1040
1041The new options are:
1042
1043``--_tabledef-<LANG>``
1044	Declares a new regex matching table of a given name for the language,
1045	as described in ":ref:`tabledef`".
1046
1047``--_mtable-regex-<LANG>``
1048	Adds a regex pattern and associated tag generation information and flags, to
1049	the given table, as described in ":ref:`mtable_regex`".
1050
1051``--_mtable-extend-<LANG>``
	Includes a previously defined regex table into the named one.
1053
1054The above will be discussed in more detail shortly.
1055
First, let's explain the feature with an example. Consider an
imaginary language `X` with a syntax similar to JavaScript: ``var`` is
used for defining variable(s), and "``/* ... */``" is used for block
comments.
1060
1061Here is our input, :file:`input.x`:
1062
1063.. code-block:: java
1064
1065   /* BLOCK COMMENT
1066   var dont_capture_me;
1067   */
1068   var a /* ANOTHER BLOCK COMMENT */, b;
1069
1070We want ctags to capture ``a`` and ``b`` - but it is difficult to write a parser
1071that will ignore ``dont_capture_me`` in the comment with a classical regex
1072parser defined with ``--regex-<LANG>`` or ``--mline-regex-<LANG>``, because of
1073the block comments.
1074
The ``--regex-<LANG>`` option only works on one line at a time, so it cannot know
that ``dont_capture_me`` is within a comment. The ``--mline-regex-<LANG>`` could
1077do it in theory, but due to the greedy nature of the regex engine it is
1078impractical and potentially inefficient to do so, given that there could be
1079multiple block comments in the file, with '``*``' inside them, etc.
1080
1081A parser written with multi-table regex, on the other hand, can capture only
1082``a`` and ``b`` safely. But it is more complicated to understand.
1083
1084Here is the 1st version of :file:`X.ctags`:
1085
1086.. code-block:: ctags
1087
1088   --langdef=X
1089   --map-X=.x
1090   --kinddef-X=v,var,variables
1091
Not so interesting. It doesn't really *do* anything yet. It just creates a new
language named ``X``, for files ending with a :file:`.x` suffix, and defines a
new kind, ``var``, for variables.
1095
1096When writing a multi-table parser, you have to think about the necessary states
1097of parsing. For the parser of language `X`, we need the following states:
1098
1099* `toplevel` (initial state)
1100* `comment` (inside comment)
1101* `vars` (var statements)
1102
1103.. _tabledef:
1104
1105Declaring a new regex table
1106......................................................................
1107
1108Before adding regular expressions, you have to declare tables for each state
1109with the ``--_tabledef-<LANG>=<TABLE>`` option.
1110
1111Here is the 2nd version of :file:`X.ctags` doing so:
1112
1113.. code-block:: ctags
1114	:emphasize-lines: 5-7
1115
1116	--langdef=X
1117	--map-X=.x
1118	--kinddef-X=v,var,variables
1119
1120	--_tabledef-X=toplevel
1121	--_tabledef-X=comment
1122	--_tabledef-X=vars
1123
1124For table names, only characters in the range ``[0-9a-zA-Z_]`` are acceptable.
1125
1126For a given language, for each file's input the ctags multi-table parser begins
1127with the first declared table. For :file:`X.ctags`, ``toplevel`` is the one.
The other tables are only ever entered/checked if another table specifies to do
so, starting with the first table. In other words, if the first declared table
1130does not find a match for the current input, and does not specify to go to
1131another table, the other tables for that language won't be used. The flags to go
1132to another table are ``{tenter}``, ``{tleave}``, and ``{tjump}``, as described
1133later.
1134
1135.. _mtable_regex:
1136
1137Adding a regex to a regex table
1138......................................................................
1139
1140The new option to add a regex to a declared table is ``--_mtable-regex-<LANG>``,
1141and it follows this form:
1142
1143.. code-block:: ctags
1144
1145	--_mtable-regex-<LANG>=<TABLE>/<PATTERN>/<NAME>/[<KIND>]/LONGFLAGS
1146
1147The parameters for ``--_mtable-regex-<LANG>`` look complicated. However,
1148``<PATTERN>``, ``<NAME>``, and ``<KIND>`` are the same as the parameters of the
1149``--regex-<LANG>`` and ``--mline-regex-<LANG>`` options. ``<TABLE>`` is simply
1150the name of a table previously declared with the ``--_tabledef-<LANG>`` option.
1151
1152A regex pattern added to a parser with ``--_mtable-regex-<LANG>`` is matched
1153against the input at the current byte position, not line. Even if you do not
1154specify the '``^``' anchor at the start of the pattern, ctags adds '``^``' to
1155the pattern automatically. Unlike the ``--regex-<LANG>`` and
1156``--mline-regex-<LANG>`` options, a '``^``' anchor does not mean "beginning of
1157line" in ``--_mtable-regex-<LANG>``; instead it means the beginning of the
1158input string (i.e., the current byte position).
1159
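The difference between "beginning of line" and "current position" can be
illustrated with Python's ``re`` module. This is only an analogy: the
``match_here()`` helper is hypothetical, and slicing the text at ``pos`` is
simply a way to make the current position look like the start of the input.

.. code-block:: python

	import re

	TEXT = "x = 1; var a;"

	def match_here(pattern, text, pos):
	    """Try pattern only at offset pos, treating pos as the start of input."""
	    return re.match(pattern, text[pos:])

	print(bool(match_here(r"var", TEXT, 0)))    # "var" occurs later, not at 0
	print(bool(match_here(r"var", TEXT, 7)))    # the input now starts with "var"
	print(bool(match_here(r"^var", TEXT, 7)))   # "^" means "here", not line start

The three calls print ``False``, ``True`` and ``True``: a pattern never skips
ahead to find a match, and an explicit '``^``' changes nothing.
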
1160The ``LONGFLAGS`` include the already discussed flags for ``--regex-<LANG>`` and
1161``--mline-regex-<LANG>``: ``{scope=...}``, ``{mgroup=N}``, ``{_advanceTo=N}``,
1162``{basic}``, ``{extend}``, and ``{icase}``. The ``{exclusive}`` flag does not
1163make sense for multi-table regex.
1164
1165In addition, several new flags are introduced exclusively for multi-table
1166regex use:
1167
1168``{tenter}``
1169	Push the current table on the stack, and enter another table.
1170
1171``{tleave}``
1172	Leave the current table, pop the stack, and go to the table that was
1173	just popped from the stack.
1174
1175``{tjump}``
1176	Jump to another table, without affecting the stack.
1177
1178``{treset}``
1179	Clear the stack, and go to another table.
1180
1181``{tquit}``
1182	Clear the stack, and stop processing the current input file for this
1183	language.
1184
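As a mental model, the five flags can be read as operations on a "current
table" plus a stack. The following Python class is a sketch of that model
only; ``TableStack`` and its attribute names are hypothetical and say nothing
about how ctags stores this state internally.

.. code-block:: python

	class TableStack:
	    """Current table plus the stack that {tenter} and {tleave} operate on."""

	    def __init__(self, first_table):
	        self.current = first_table
	        self.stack = []
	        self.finished = False

	    def tenter(self, table):
	        self.stack.append(self.current)   # remember where to come back to
	        self.current = table

	    def tleave(self):
	        self.current = self.stack.pop()

	    def tjump(self, table):
	        self.current = table              # the stack is left untouched

	    def treset(self, table):
	        self.stack.clear()
	        self.current = table

	    def tquit(self):
	        self.stack.clear()
	        self.finished = True


	state = TableStack("toplevel")
	state.tenter("vars")
	state.tenter("comment")
	state.tleave()
	print(state.current, state.stack)

After two ``tenter`` calls and one ``tleave``, the current table is ``vars``
again and only ``toplevel`` remains on the stack.
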
1185To explain the above new flags, we'll continue using our example in the
1186next section.
1187
1188Skipping block comments
1189......................................................................
1190
1191Let's continue with our example. Here is the 3rd version of :file:`X.ctags`:
1192
1193.. code-block:: ctags
1194	:emphasize-lines: 9-13
1195	:linenos:
1196
1197	--langdef=X
1198	--map-X=.x
1199	--kinddef-X=v,var,variables
1200
1201	--_tabledef-X=toplevel
1202	--_tabledef-X=comment
1203	--_tabledef-X=vars
1204
1205	--_mtable-regex-X=toplevel/\/\*//{tenter=comment}
1206	--_mtable-regex-X=toplevel/.//
1207
1208	--_mtable-regex-X=comment/\*\///{tleave}
1209	--_mtable-regex-X=comment/.//
1210
1211Four ``--_mtable-regex-X`` lines are added for skipping the block comments. Let's
1212discuss them one by one.
1213
For each new file it scans, ctags always starts with the first pattern of the
first table of the parser. Even if that table is empty, ctags will only try
the first declared table. (In such a case it would immediately fail to match
anything, and thus stop processing the input file and effectively do nothing.)
1218
1219The first declared table (``toplevel``) has the following regex added to
1220it first:
1221
1222.. code-block:: ctags
1223	:linenos:
1224	:lineno-start: 9
1225
1226	--_mtable-regex-X=toplevel/\/\*//{tenter=comment}
1227
1228A pattern of ``\/\*`` is added to the ``toplevel`` table, to match the
1229beginning of a block comment. A backslash character is used in front of the
1230leading '``/``' to escape the separation character '``/``' that separates the fields
1231of ``--_mtable-regex-<LANG>``. Another backslash inside the pattern is used
1232before the asterisk '``*``', to make it a literal asterisk character in regex.
1233
1234The last ``//`` means ctags should not tag something matching this pattern.
In ``--regex-<LANG>`` you never use ``//`` because it would be pointless to
match something and not tag it using a single-line ``--regex-<LANG>``; in
1237multi-line ``--mline-regex-<LANG>`` you rarely see it, because it would rarely
1238be useful. But in multi-table regex it's quite common, since you frequently
1239want to transition from one state to another (i.e., ``tenter`` or ``tjump``
1240from one table to another).
1241
1242The long flag added to our first regex of our first table is ``tenter``, which
1243is a long flag for switching the table and pushing on the stack. ``{tenter=comment}``
1244means "switch the table from toplevel to comment".
1245
1246So given the input file :file:`input.x` shown earlier, ctags will begin at
1247the ``toplevel`` table and try to match the first regex. It will succeed, and
1248thus push on the stack and go to the ``comment`` table.
1249
1250It will begin at the top of the ``comment`` table (it always begins at the top
1251of a given table), and try each regex line in sequence until it finds a match.
1252If it fails to find a match, it will pop the stack and go to the table that was
1253just popped from the stack, and begin trying to match at the top of *that* table.
1254If it continues failing to find a match, and ultimately reaches the end of the
1255stack, it will stop processing for this file. For the next input file, it will
1256begin again from the top of the first declared table.
1257
1258Getting back to our example, the top of the ``comment`` table has this regex:
1259
1260.. code-block:: ctags
1261	:linenos:
1262	:lineno-start: 12
1263
1264	--_mtable-regex-X=comment/\*\///{tleave}
1265
1266Similar to the previous ``toplevel`` table pattern, this one for ``\*\/`` uses
1267a backslash to escape the separator '``/``', as well as one before the '``*``' to
1268make it a literal asterisk in regex. So what it's looking for, from a simple
1269string perspective, is the sequence ``*/``. Note that this means even though
you see three slashes ``///`` at the end, the first one is escaped and used
1271for the pattern itself, and the ``--_mtable-regex-X`` only has ``//`` to
1272separate the regex pattern from the long flags, instead of the usual ``///``.
1273Thus it's using the shorthand form of the ``--_mtable-regex-X`` option.
1274It could instead have been:
1275
1276.. code-block:: ctags
1277
1278	--_mtable-regex-X=comment/\*\////{tleave}
1279
1280The above would have worked exactly the same.
1281
1282Getting back to our example, remember we're looking at the :file:`input.x`
1283file, currently using the ``comment`` table, and trying to match the first
1284regex of that table, shown above, at the following location::
1285
1286	   ,ctags is trying to match starting here
1287	  v
1288	/* BLOCK COMMENT
1289	var dont_capture_me;
1290	*/
1291	var a /* ANOTHER BLOCK COMMENT */, b;
1292
1293The pattern doesn't match for the position just after ``/*``, because that
1294position is a space character. So ctags tries the next pattern in the same
1295table:
1296
1297.. code-block:: ctags
1298	:linenos:
1299	:lineno-start: 13
1300
1301	--_mtable-regex-X=comment/.//
1302
This pattern matches any one character, including a newline; the current
position moves one character forward. Now the character at the current position is
'``B``'. The first pattern of the table (looking for ``*/``) still does not match
the input, so ctags falls through to the next pattern again. When the current
position reaches the ``*/`` on the 3rd line of :file:`input.x`, it will finally match this:
1308
1309.. code-block:: ctags
1310	:linenos:
1311	:lineno-start: 12
1312
1313	--_mtable-regex-X=comment/\*\///{tleave}
1314
1315In this pattern, the long flag ``{tleave}`` is specified. This triggers table
1316switching again. ``{tleave}`` makes ctags switch the table back to the last
1317table used before doing ``{tenter}``. In this case, ``toplevel`` is the table.
1318ctags manages a stack where references to tables are put. ``{tenter}`` pushes
1319the current table to the stack. ``{tleave}`` pops the table at the top of the
1320stack and chooses it.
1321
1322So now ctags is back to the ``toplevel`` table, and tries the first regex
1323of that table, which was this:
1324
1325.. code-block:: ctags
1326	:linenos:
1327	:lineno-start: 9
1328
1329	--_mtable-regex-X=toplevel/\/\*//{tenter=comment}
1330
1331It tries to match that against its current position, which is now the
1332newline on line 3, between the ``*/`` and the word ``var``::
1333
1334	/* BLOCK COMMENT
1335	var dont_capture_me;
	*/ <--- ctags is now at this newline (\n) character
1337	var a /* ANOTHER BLOCK COMMENT */, b;
1338
1339The first regex of the ``toplevel`` table does not match a newline, so it tries
1340the second regex:
1341
1342.. code-block:: ctags
1343	:linenos:
	:lineno-start: 10
1345
1346	--_mtable-regex-X=toplevel/.//
1347
1348This matches a newline successfully, but has no actions to perform. So ctags
1349moves one character forward (the newline it just matched), and goes back to the
1350top of the ``toplevel`` table, and tries the first regex again. Eventually we'll
1351reach the beginning of the second block comment, and do the same things as before.
1352
1353When ctags finally reaches the end of the file (the position after ``b;``),
1354it will not be able to match either the first or second regex of the
1355``toplevel`` table, and quit processing the input file.
1356
1357So far, we've successfully skipped over block comments for our new ``X``
1358language, but haven't generated any tags. The point of ctags is to generate
1359tags, not just keep your computer warm. So now let's move onto actually tagging
1360variables...
1361
1362
1363Capturing variables in a sequence
1364......................................................................
1365
1366Here is the 4th version of :file:`X.ctags`:
1367
1368.. code-block:: ctags
1369	:emphasize-lines: 10,16-19
1370	:linenos:
1371
1372	--langdef=X
1373	--map-X=.x
1374	--kinddef-X=v,var,variables
1375
1376	--_tabledef-X=toplevel
1377	--_tabledef-X=comment
1378	--_tabledef-X=vars
1379
1380	--_mtable-regex-X=toplevel/\/\*//{tenter=comment}
1381	--_mtable-regex-X=toplevel/var[ \n\t]//{tenter=vars}
1382	--_mtable-regex-X=toplevel/.//
1383
1384	--_mtable-regex-X=comment/\*\///{tleave}
1385	--_mtable-regex-X=comment/.//
1386
1387	--_mtable-regex-X=vars/;//{tleave}
1388	--_mtable-regex-X=vars/\/\*//{tenter=comment}
1389	--_mtable-regex-X=vars/([a-zA-Z][a-zA-Z0-9]*)/\1/v/
1390	--_mtable-regex-X=vars/.//
1391
1392One pattern in ``toplevel`` was added, and a new table ``vars`` with four
1393patterns was also added.
1394
1395The new regex in ``toplevel`` is this:
1396
1397.. code-block:: ctags
1398	:linenos:
1399	:lineno-start: 10
1400
1401	--_mtable-regex-X=toplevel/var[ \n\t]//{tenter=vars}
1402
1403The purpose of this being in `toplevel` is to switch to the `vars` table when
1404the keyword ``var`` is found in the input stream. We need to switch states
1405(i.e., tables) because we can't simply capture the variables ``a`` and ``b``
1406with a single regex pattern in the ``toplevel`` table, because there might be
1407block comments inside the ``var`` statement (as there are in our
1408:file:`input.x`), and we also need to create *two* tags: one for ``a`` and one
1409for ``b``, even though the word ``var`` only appears once. In other words, we
1410need to "remember" that we saw the keyword ``var``, when we later encounter the
1411names ``a`` and ``b``, so that we know to tag each of them; and saving that
1412"in-variable-statement" state is accomplished by switching tables to the
1413``vars`` table.
1414
1415The first regex in our new ``vars`` table is:
1416
1417.. code-block:: ctags
1418	:linenos:
1419	:lineno-start: 16
1420
1421	--_mtable-regex-X=vars/;//{tleave}
1422
1423This pattern is used to match a single semi-colon '``;``', and if it matches
1424pop back to the ``toplevel`` table using the ``{tleave}`` long flag. We
1425didn't have to make this the first regex pattern, because it doesn't overlap
1426with any of the other ones other than the ``/.//`` last one (which must be
1427last for this example to work).
1428
1429The second regex in our ``vars`` table is:
1430
1431.. code-block:: ctags
1432	:linenos:
1433	:lineno-start: 17
1434
1435	--_mtable-regex-X=vars/\/\*//{tenter=comment}
1436
1437We need this because block comments can be in variable definitions::
1438
1439   var a /* ANOTHER BLOCK COMMENT */, b;
1440
1441So to skip block comments in such a position, the pattern ``\/\*`` is used just
1442like it was used in the ``toplevel`` table: to find the literal ``/*`` beginning
1443of the block comment and enter the ``comment`` table. Because we're using
1444``{tenter}`` and ``{tleave}`` to push/pop from a stack of tables, we can
1445use the same ``comment`` table for both ``toplevel`` and ``vars`` to go to,
1446because ctags will *remember* the previous table and ``{tleave}`` will
1447pop back to the right one.
1448
1449The third regex in our ``vars`` table is:
1450
1451.. code-block:: ctags
1452	:linenos:
1453	:lineno-start: 18
1454
1455	--_mtable-regex-X=vars/([a-zA-Z][a-zA-Z0-9]*)/\1/v/
1456
This is nothing special, but is the one that actually tags something: it
captures the variable name and uses it for generating a tag of the ``var``
kind (letter ``v``).
1460
1461The last regex in the ``vars`` table we've seen before:
1462
1463.. code-block:: ctags
1464	:linenos:
1465	:lineno-start: 19
1466
1467	--_mtable-regex-X=vars/.//
1468
1469This makes ctags ignore any other characters, such as whitespace or the
1470comma '``,``'.
1471
1472
1473Running our example
1474......................................................................
1475
1476.. code-block:: console
1477
1478	$ cat input.x
1479	/* BLOCK COMMENT
1480	var dont_capture_me;
1481	*/
1482	var a /* ANOTHER BLOCK COMMENT */, b;
1483
1484	$ u-ctags -o - --fields=+n --options=X.ctags input.x
1486	a	input.x	/^var a \/* ANOTHER BLOCK COMMENT *\/, b;$/;"	v	line:4
1487	b	input.x	/^var a \/* ANOTHER BLOCK COMMENT *\/, b;$/;"	v	line:4
1488
1489It works!
1490
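For readers who prefer code to prose, the whole walk-through can be replayed
with a small Python model of the 4th version of :file:`X.ctags`. It is a
sketch, not a reimplementation: ``RULES`` and ``run()`` are hypothetical
names, Python's ``re`` (with ``DOTALL``) stands in for the real regex engine,
only ``{tenter}`` and ``{tleave}`` are modelled, and the model simply stops
when no pattern of the current table matches.

.. code-block:: python

	import re

	# (table, pattern, emit_tag, action, target), in the order of X.ctags.
	RULES = [
	    ("toplevel", r"/\*", False, "tenter", "comment"),
	    ("toplevel", r"var[ \n\t]", False, "tenter", "vars"),
	    ("toplevel", r".", False, None, None),
	    ("comment", r"\*/", False, "tleave", None),
	    ("comment", r".", False, None, None),
	    ("vars", r";", False, "tleave", None),
	    ("vars", r"/\*", False, "tenter", "comment"),
	    ("vars", r"([a-zA-Z][a-zA-Z0-9]*)", True, None, None),
	    ("vars", r".", False, None, None),
	]

	def run(text, first_table="toplevel"):
	    """Return (name, line) pairs found by walking text with the tables."""
	    tags, stack, table, pos = [], [], first_table, 0
	    while pos < len(text):
	        for owner, pattern, emit, action, target in RULES:
	            if owner != table:
	                continue
	            # DOTALL lets "." consume a newline, as in the walk-through.
	            m = re.compile(pattern, re.DOTALL).match(text, pos)
	            if m is None:
	                continue
	            if emit:
	                line = text.count("\n", 0, m.start(1)) + 1
	                tags.append((m.group(1), line))
	            if action == "tenter":
	                stack.append(table)
	                table = target
	            elif action == "tleave":
	                table = stack.pop()
	            pos = m.end()
	            break
	        else:
	            break   # simplification: no pattern of this table matched
	    return tags

	INPUT = (
	    "/* BLOCK COMMENT\n"
	    "var dont_capture_me;\n"
	    "*/\n"
	    "var a /* ANOTHER BLOCK COMMENT */, b;\n"
	)
	print(run(INPUT))

The model reports ``a`` and ``b``, both on line 4, and never sees
``dont_capture_me`` as a variable because that text is consumed while the
``comment`` table is active.
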
You can find additional examples of multi-table regex in our GitHub repository, under
the ``optlib`` directory. For example, ``puppetManifest.ctags`` is a serious
example. It is the primary parser for testing multi-table regex parsers, and is
used in the actual ctags program for parsing Puppet manifest files.
1495
1496
1497.. _guest-regex-flag:
1498
1499Scheduling a guest parser with ``_guest`` regex flag
1500~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1501.. NOT REVIEWED YET
1502
With the ``_guest`` regex flag, you can run a parser (a guest parser) on an
1504area of the current input file.
1505See ":ref:`host-guest-parsers`" about the concept of the guest parser.
1506
1507The ``_guest`` regex flag specifies a *guest spec*, and attaches it to
1508the associated regex pattern.
1509
1510A guest spec has three fields: *<PARSER>*, *<START>* of area, and *<END>* of area.
The ``_guest`` regex flag has the following form::
1512
1513  {_guest=<PARSER>,<START>,<END>}
1514
ctags maintains data called a *guest request* during parsing.  A
1516guest request also has three fields: `parser`, `start of area`, and
1517`end of area`.
1518
You, a parser developer, have to fill the fields of guest specs.
ctags consults the guest spec when the regex pattern associated with
it matches, tries to fill the fields of the guest request, and runs
a guest parser when all the fields of the guest request are
filled.
1524
1525If you use `Multi-line pattern match`_ to define a host parser,
1526you must specify all the fields of `guest request`.
1527
On the other hand, if you don't use `Multi-line pattern match`_ to define a host parser,
ctags can fill the fields of `guest request` incrementally; more than
one guest spec can be used to fill the fields. In other words, you can
leave some of the fields of a guest spec empty.
1532
1533The *<PARSER>* field of ``_guest`` regex flag
1534......................................................................
1535For *<PARSER>*, you can specify one of the following items:
1536
1537a name of a parser
1538
1539	If you know the guest parser you want to run before parsing
1540	the input file, specify the name of the parser. Aliases of parsers
1541	are also considered when finding a parser for the name.
1542
1543	An example of running C parser as a guest parser::
1544
1545		{_guest=C,...
1546
1547the group number of a regex pattern started from '``\``' (backslash)
1548
	If a parser name appears in an input file, write a regex pattern
	to capture the name.  Specify the number of the group that captures
	the name as *<PARSER>*, using '``\``' as the prefix for
	the number. Aliases of parsers are also considered when finding
	a parser for the name.
1554
	Let's see an example. GitHub Flavored Markdown (GFM) is a language for
	documentation. It provides a notation for quoting a snippet of
	program code; the language treats the area from ``~~~`` to
	``~~~`` as a snippet. You can specify the programming language of
	the snippet by starting the area with
	``~~~<THE_NAME_OF_LANGUAGE>``, like ``~~~C`` or ``~~~Java``.
1561
1562	To run a guest parser on the area, you have to capture the
1563	*<THE_NAME_OF_LANGUAGE>* with a regex pattern:
1564
1565	.. code-block:: ctags
1566
1567		--_mtable-regex-Markdown=main/~~~([a-zA-Z0-9][-#+a-zA-Z0-9]*)[\n]//{_guest=\1,0end,}
1568
	The pattern captures the language name in the input file with the
	regex group 1, and specifies it as *<PARSER>*::

		{_guest=\1,...
1573
1574the group number of a regex pattern started from '``*``' (asterisk)
1575
	If a file name implying a programming language appears in an input
	file, capture the file name with the regex pattern that the guest
	spec is attached to. ctags tries to find a proper parser for the
	file name by consulting the langmap.
1580
1581	Use '``*``' as the prefix to the number for specifying the group of
1582	the regex pattern that captures the file name.
1583
	Let's see an example. Suppose you have a shell script that emits
	program code instantiated from templates. Here documents
	are used to represent the templates like:
1587
1588	.. code-block:: sh
1589
1590		i=...
1591		cat > foo.c <<EOF
1592			int main (void) { return $i; }
1593		EOF
1594
1595		cat > foo.el <<EOF
1596			(defun foo () (1+ $i))
1597		EOF
1598
1599	To run guest parsers for the here document areas, the shell
1600	script parser of ctags must choose the parsers from the file
1601	names (``foo.c`` and ``foo.el``):
1602
1603	.. code-block:: ctags
1604
1605		--regex-sh=/cat > ([a-z.]+) <<EOF//{_guest=*1,0end,}
1606
	The pattern captures the file name in the input file with the
	regex group 1, and specifies it as *<PARSER>*::
1609
1610	   {_guest=*1,...
1611
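The three ways of writing *<PARSER>* can be summarized with a small Python
sketch. Everything in it is illustrative: ``KNOWN_PARSERS`` and ``LANGMAP``
are made-up stand-ins for the parser list and langmap that ctags maintains,
``resolve_parser()`` is a hypothetical helper, and parser aliases are not
modelled.

.. code-block:: python

	import re

	# Hypothetical tables; a real ctags consults its own parsers and langmap.
	KNOWN_PARSERS = {"C", "Java", "EmacsLisp"}
	LANGMAP = {".c": "C", ".java": "Java", ".el": "EmacsLisp"}

	def resolve_parser(field, match):
	    """Map the <PARSER> field of a guest spec to a parser name, or None."""
	    if field.startswith("\\"):        # \N: group N holds a parser name
	        name = match.group(int(field[1:]))
	        return name if name in KNOWN_PARSERS else None
	    if field.startswith("*"):         # *N: group N holds a file name
	        file_name = match.group(int(field[1:]))
	        dot = file_name.rfind(".")
	        return LANGMAP.get(file_name[dot:]) if dot >= 0 else None
	    return field if field in KNOWN_PARSERS else None   # literal name

	fence = re.match(r"~~~([a-zA-Z0-9][-#+a-zA-Z0-9]*)[\n]", "~~~Java\n")
	heredoc = re.match(r"cat > ([a-z.]+) <<EOF", "cat > foo.el <<EOF")

	print(resolve_parser("C", fence))       # a parser name written literally
	print(resolve_parser("\\1", fence))     # name captured from the input
	print(resolve_parser("*1", heredoc))    # file name looked up in LANGMAP

With these made-up tables the calls select ``C``, ``Java`` and ``EmacsLisp``
respectively.
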
1612The *<START>* and *<END>* fields of `_guest` regex flag
1613......................................................................
1614
1615The *<START>* and *<END>* fields specify the area the *<PARSER>* parses.  *<START>*
1616specifies the start of the area. *<END>* specifies the end of the area.
1617
The forms of the two fields are the same: a regex group number
followed by ``start`` or ``end``, e.g. ``3start`` or ``0end``.  The suffixes,
``start`` and ``end``, each represent one of the two boundaries of the group.
1621
1622Let's see an example::
1623
1624	{_guest=C,2end,3start}
1625
This guest regex flag means running the C parser on the area between
``2end`` and ``3start``. ``2end`` means the area starts at the end of
the match of the 2nd regex group associated with the flag. ``3start``
means the area ends at the beginning of the match of the 3rd regex
group associated with the flag.
1631
Let's see a more realistic example.
1633Here is an optlib file for an imaginary language `single`:
1634
1635.. code-block:: ctags
1636	:emphasize-lines: 3
1637
1638	--langdef=single
1639	--map-single=.single
1640	--regex-single=/^(BEGIN_C<).*(>END_C)$//{_guest=C,1end,2start}
1641
This parser can run the C parser and extract the ``main`` function from the
following input file::
1644
1645	BEGIN_C<int main (int argc, char **argv) { return 0; }>END_C
1646	        ^                                             ^
1647	         `- "1end" points here.                       |
1648	                               "2start" points here. -+
1649
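How ``1end`` and ``2start`` delimit the guest area can be checked with a few
lines of Python. This is a sketch using Python's ``re`` in place of the ctags
regex engine; ``boundary()`` and ``guest_area()`` are hypothetical helpers
that only compute the text a guest parser would receive.

.. code-block:: python

	import re

	LINE = "BEGIN_C<int main (int argc, char **argv) { return 0; }>END_C"

	def boundary(match, spec):
	    """Turn '<group>start' or '<group>end' into an offset."""
	    if spec.endswith("start"):
	        return match.start(int(spec[:-len("start")]))
	    return match.end(int(spec[:-len("end")]))

	def guest_area(match, start_spec, end_spec):
	    """Return the text a guest parser would be run on."""
	    begin = boundary(match, start_spec)
	    end = boundary(match, end_spec)
	    return match.string[begin:end]

	m = re.match(r"^(BEGIN_C<).*(>END_C)$", LINE)
	print(guest_area(m, "1end", "2start"))

The printed area is exactly the C code between the two markers, which is what
the C parser is asked to parse in the example above.
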
1650.. NOT REVIEWED YET
1651
1652.. _defining-subparsers:
1653
1654Defining a subparser
1655~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1656
1657Basic
1658.........................................................................
1659
For the concept of a subparser, see ":ref:`base-sub-parsers`".
1661
1662``--langdef=<LANG>`` option is extended as
1663``--langdef=<LANG>[{base=<LANG>}[{shared|dedicated|bidirectional}]][{_autoFQTag}]`` to define
1664a subparser for a specified base parser. Combining with ``--kinddef-<LANG>``
1665and ``--regex-<KIND>`` options, you can extend an existing parser
1666without risk of kind confliction.

Let's see an example.

input.c

.. code-block:: C

    static int set_one_prio(struct task_struct *p, int niceval, int error)
    {
    }

    SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
    {
	    ...;
    }

.. code-block:: console

    $ ctags  -x --_xformat="%20N %10K %10l"  -o - input.c
	    set_one_prio   function          C
	 SYSCALL_DEFINE3   function          C

The C parser doesn't understand that ``SYSCALL_DEFINE3`` is a macro for defining an
entry point for a system call.

Let's define a `linux` subparser that uses the C parser as its base parser (``linux.ctags``):

.. code-block:: ctags
	:emphasize-lines: 1,3

	--langdef=linux{base=C}
	--kinddef-linux=s,syscall,system calls
	--regex-linux=/SYSCALL_DEFINE[0-9]\(([^, )]+)[\),]*/\1/s/

With the `linux` parser, the output changes as follows:

.. code-block:: console
	:emphasize-lines: 2

	$ ctags --options=./linux.ctags -x --_xformat="%20N %10K %10l"  -o - input.c
		 setpriority    syscall      linux
		set_one_prio   function          C
	     SYSCALL_DEFINE3   function          C

``setpriority`` is recognized as a ``syscall`` of `linux`.
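
To see what the ``--regex-linux`` pattern captures, you can try it outside of
ctags. The following is a minimal sketch, assuming that Python's ``re`` module
interprets this particular pattern in the same way as the regex engine of
ctags; ``syscall_name`` is a hypothetical helper, not part of ctags:

.. code-block:: python

    import re

    # Pattern copied from the ``--regex-linux=...`` option above.
    # Regex group 1 is what ``\1`` turns into the tag name.
    SYSCALL = re.compile(r"SYSCALL_DEFINE[0-9]\(([^, )]+)[\),]*")


    def syscall_name(line):
        """Return the captured system call name, or None if no match."""
        m = SYSCALL.search(line)    # the pattern is not anchored
        return m.group(1) if m else None


    print(syscall_name(
        "SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)"))
    print(syscall_name(
        "static int set_one_prio(struct task_struct *p, int niceval, int error)"))

The first call prints ``setpriority``; the second prints ``None`` because the
line contains no ``SYSCALL_DEFINE`` macro.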

You could capture ``setpriority`` with ``--regex-C=...`` alone.
However, that approach risks kind conflicts: when introducing
a new kind with ``--regex-C=...``, you cannot use a letter or name already
used in the C parser or in ``--regex-C=...`` options specified elsewhere.

A newly defined subparser gives you a new namespace of kinds.
In addition, you can enable or disable the subparser with the
``--languages=[+|-]`` option:

.. code-block:: console

    $ ctags --options=./linux.ctags --languages=-linux -x --_xformat="%20N %10K %10l"  -o - input.c
	    set_one_prio   function          C
	 SYSCALL_DEFINE3   function          C

.. _optlib_directions:

Direction flags
.........................................................................

.. TESTCASE: Units/flags-langdef-directions.r

As explained in ":ref:`multiple_parsers_directions`" in
":ref:`multiple_parsers`", direction flags let you choose the direction(s) in
which a base parser and a subparser work together.

The following examples are taken from `#1409
<https://github.com/universal-ctags/ctags/issues/1409>`_ submitted by @sgraham to
the Universal Ctags repository on GitHub.

``input.cc`` and ``input.mojom`` are input files, and have the same
contents::

	 ABC();
	int main(void)
	{
	}

The C++ parser can capture ``main`` as a function. The `Mojom` subparser defined
below runs on top of the C++ parser and captures ``ABC``.

shared combination
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When ``{shared}`` is specified, for ``input.cc``, tags captured by both the C++ parser
and the mojom parser are recorded in the tags file. For ``input.mojom``, only
tags captured by the mojom parser are recorded in the tags file.

mojom-shared.ctags:

.. code-block:: ctags
	:emphasize-lines: 1

	--langdef=mojom{base=C++}{shared}
	--map-mojom=+.mojom
	--kinddef-mojom=f,function,functions
	--regex-mojom=/^[ ]+([a-zA-Z]+)\(/\1/f/

.. code-block:: ctags
	:emphasize-lines: 2

	$ ctags --options=mojom-shared.ctags --fields=+l -o - input.cc
	ABC	input.cc	/^ ABC();$/;"	f	language:mojom
	main	input.cc	/^int main(void)$/;"	f	language:C++	typeref:typename:int

.. code-block:: ctags
	:emphasize-lines: 2

	$ ctags --options=mojom-shared.ctags --fields=+l -o - input.mojom
	ABC	input.mojom	/^ ABC();$/;"	f	language:mojom

The mojom parser uses the C++ parser internally, but tags captured by the C++ parser are
dropped from the output.

dedicated combination
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When ``{dedicated}`` is specified, for ``input.cc``, only tags captured by the C++
parser are recorded in the tags file. For ``input.mojom``, tags captured
by both the C++ parser and the mojom parser are recorded in the tags file.

mojom-dedicated.ctags:

.. code-block:: ctags
	:emphasize-lines: 1

	--langdef=mojom{base=C++}{dedicated}
	--map-mojom=+.mojom
	--kinddef-mojom=f,function,functions
	--regex-mojom=/^[ ]+([a-zA-Z]+)\(/\1/f/

.. code-block:: ctags

	$ ctags --options=mojom-dedicated.ctags --fields=+l -o - input.cc
	main	input.cc	/^int main(void)$/;"	f	language:C++	typeref:typename:int

.. code-block:: ctags
	:emphasize-lines: 2-3

	$ ctags --options=mojom-dedicated.ctags --fields=+l -o - input.mojom
	ABC	input.mojom	/^ ABC();$/;"	f	language:mojom
	main	input.mojom	/^int main(void)$/;"	f	language:C++	typeref:typename:int

The mojom parser works only when a ``.mojom`` file is given as input.

bidirectional combination
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When ``{bidirectional}`` is specified, tags captured by both the C++ parser and
the mojom parser are recorded in the tags file for both ``input.cc`` and
``input.mojom``.

mojom-bidirectional.ctags:

.. code-block:: ctags
	:emphasize-lines: 1

	--langdef=mojom{base=C++}{bidirectional}
	--map-mojom=+.mojom
	--kinddef-mojom=f,function,functions
	--regex-mojom=/^[ ]+([a-zA-Z]+)\(/\1/f/

.. code-block:: ctags
	:emphasize-lines: 2

	$ ctags --options=mojom-bidirectional.ctags --fields=+l -o - input.cc
	ABC	input.cc	/^ ABC();$/;"	f	language:mojom
	main	input.cc	/^int main(void)$/;"	f	language:C++	typeref:typename:int

.. code-block:: ctags
	:emphasize-lines: 2-3

	$ ctags --options=mojom-bidirectional.ctags --fields=+l -o - input.mojom
	ABC	input.mojom	/^ ABC();$/;"	f	language:mojom
	main	input.mojom	/^int main(void)$/;"	f	language:C++	typeref:typename:int
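
The three combinations can be summarized as a lookup table. The following
sketch only restates the outputs shown in the examples above; the names
``RECORDED`` and ``recorded_parsers`` are hypothetical, and the code says
nothing about how ctags implements direction flags internally:

.. code-block:: python

    # Key: (direction flag, parser selected for the input file).
    # Value: the parsers whose tags are recorded in the tags file,
    # as observed in the example outputs above.
    RECORDED = {
        ("shared", "C++"): {"C++", "mojom"},
        ("shared", "mojom"): {"mojom"},
        ("dedicated", "C++"): {"C++"},
        ("dedicated", "mojom"): {"C++", "mojom"},
        ("bidirectional", "C++"): {"C++", "mojom"},
        ("bidirectional", "mojom"): {"C++", "mojom"},
    }


    def recorded_parsers(direction, input_parser):
        """Return a sorted list of parsers whose tags are recorded."""
        return sorted(RECORDED[(direction, input_parser)])


    for direction, input_parser in RECORDED:
        print(direction, input_parser, recorded_parsers(direction, input_parser))

Here ``"C++"`` as the input parser corresponds to ``input.cc`` and ``"mojom"``
corresponds to ``input.mojom``.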


.. _optlib2c:

Translating an option file into C source code (optlib2c)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Universal Ctags has an ``optlib2c`` script that translates an option file into C
source code. Your optlib parser can thus easily become a built-in parser.

To add your optlib file, ``foo.ctags``, to ctags, follow these steps:

* copy the ``foo.ctags`` file into the ``optlib/`` directory
* add ``foo.ctags`` to the ``OPTLIB2C_INPUT`` variable in ``source.mak``
* add ``fooParser`` to the ``PARSER_LIST`` macro in ``main/parser_p.h``

You are encouraged to submit your :file:`.ctags` file to our repository on
GitHub through a pull request. See ":ref:`contributions`" for more details.
1862