.. _optlib:

Extending ctags with Regex parser (*optlib*)
---------------------------------------------------------------------

:Maintainer: Masatake YAMATO <yamato@redhat.com>

.. contents:: `Table of contents`
	:depth: 3
	:local:

.. TODO:
	add a section on debugging

Exuberant Ctags allows a user to add a new parser to ctags with the
``--langdef=<LANG>`` and ``--regex-<LANG>=...`` options.
Universal Ctags follows and extends the design of Exuberant Ctags in more
powerful ways and calls the feature the *optlib parser*, which is described in
:ref:`ctags-optlib(7) <ctags-optlib(7)>` and the following sections.

:ref:`ctags-optlib(7) <ctags-optlib(7)>` is the primary document of the optlib
parser feature. The following sections provide additional information and more
advanced features. Note that some of the features are experimental, and will be
marked as such in the documentation.

Many optlib parsers are included in Universal Ctags; see
`optlib/*.ctags <https://github.com/universal-ctags/ctags/tree/master/optlib>`_.
They are good examples to study when you develop your own parsers.

An optlib parser can be translated into C source code. Your optlib parser can
thus easily become a built-in parser. See ":ref:`optlib2c`" for details.

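As a minimal sketch of how these options fit together, the following option
file defines a hypothetical language ``Greeting`` (the language name, the
``.greet`` extension, and the kind ``g`` are illustrative assumptions, not part
of Universal Ctags) and tags the word that follows ``hello`` at the start of a
line:

.. code-block:: ctags

	--langdef=Greeting
	--map-Greeting=+.greet
	--kinddef-Greeting=g,greeting,greetings
	--regex-Greeting=/^hello[ \t]+([a-zA-Z_]+)/\1/g/
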
Regular expression (regex) engine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, Universal Ctags uses `the POSIX Extended Regular Expressions (ERE)
<https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html>`_
syntax, the same as Exuberant Ctags.

While building Universal Ctags, the ``configure`` script runs compatibility
tests on the regex engine in the system library.  If the tests pass, that engine
is used; otherwise the regex engine imported from `the GNU Gnulib library
<https://www.gnu.org/software/gnulib/manual/gnulib.html#Regular-expressions>`_
is used. In the latter case, the output of ``ctags --list-features`` contains
``gnulib_regex``.

See ``regex(7)`` or `the GNU Gnulib Manual
<https://www.gnu.org/software/gnulib/manual/gnulib.html#Regular-expressions>`_
for the details of the regular expression syntax.

.. note::

	The GNU regex engine supports some GNU extensions described `here
	<https://www.gnu.org/software/gnulib/manual/gnulib.html#posix_002dextended-regular-expression-syntax>`_.
	Note that an optlib parser using the extensions may not work with Universal
	Ctags on some other systems.

POSIX Extended Regular Expressions (ERE) do
*not* support many of the "modern" extensions such as lazy (non-greedy) matching,
non-capturing grouping, atomic grouping, possessive quantifiers, look-ahead/behind,
etc. An ERE engine may also be notoriously slow when backtracking.

A common error is forgetting that a
POSIX ERE engine is always *greedy*; the '``*``' and '``+``' quantifiers match
as much as possible, before backtracking from the end of their match.

For example, this pattern::

	foo.*bar

matches this entire string, not just the first part::

	foobar, bar, and even more bar

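One common way to limit a greedy match in POSIX ERE is to replace '``.*``' with
a negated bracket expression that cannot run past the delimiter. The following
is only a sketch; the language ``Foo``, the kind ``v``, and the ``name="..."``
input syntax are assumptions made for illustration:

.. code-block:: ctags

	--langdef=Foo
	--kinddef-Foo=v,variable,variables
	# Greedy: on a line like  name="a" other="b"  group 1 captures  a" other="b
	--regex-Foo=/name="(.*)"/\1/v/
	# Bounded: group 1 stops at the first closing quote and captures  a
	--regex-Foo=/name="([^"]*)"/\1/v/
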
Another detail to keep in mind is how the regex engine treats newlines.
Universal Ctags compiles the regular expressions in the ``--regex-<LANG>`` and
``--mline-regex-<LANG>`` options with ``REG_NEWLINE`` set. What that means is documented
in the
`POSIX specification <https://pubs.opengroup.org/onlinepubs/9699919799/functions/regcomp.html>`_.
The obvious effects are that the regex any-character metacharacter '``.``' does not match
newline characters, the '``^``' anchor *does* match right after a newline, and
the '``$``' anchor matches right before a newline. A more subtle issue is this text from the
chapter "`Regular Expressions <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html>`_":
"the use of literal <newline>s or any escape sequence equivalent produces undefined
results". What that means is that using a regex pattern with ``[^\n]+`` is invalid,
and indeed in glibc it produces very odd results. **Never use** '``\n``' in patterns
for ``--regex-<LANG>``, and **never use it** in non-matching bracket expressions
for ``--mline-regex-<LANG>`` patterns. For the experimental ``--_mtable-regex-<LANG>``
you can safely use '``\n``' because that regex is not compiled with ``REG_NEWLINE``.

The POSIX ERE engine also has some known "quirks"
with respect to escaping special characters in bracket expressions.
For example, a pattern of ``[^\]]+`` is invalid in POSIX ERE, because the '``]``' is
*not* special inside a bracket expression, and thus should **not** be escaped.
Most regex engines ignore this subtle detail in POSIX ERE, and instead allow
escaping it with '``\]``' inside the bracket expression and treat it as the
literal character '``]``'. GNU glibc, however, does not generate an error but
instead considers it undefined behavior, and in fact it will match very odd
things. Instead you **must** use the more unintuitive ``[^]]+`` syntax. The same
is technically true of other special characters inside a bracket expression,
such as ``[^\)]+``, which should instead be ``[^)]+``. The ``[^\)]+`` will
usually appear to work, but only because what it is really doing is matching any
character but '``\``' *or* '``)``'. The only exceptions for using '``\``' inside a
bracket expression are for '``\t``' and '``\n``', which ctags converts to their
single literal character control codes before passing the pattern to glibc.

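To make the bracket expression rules above concrete, here is a small sketch
for a hypothetical language ``Foo`` (the language and both kinds are
assumptions for illustration only). It captures the text inside ``[...]`` and
``(...)`` without escaping '``]``' or '``)``' inside the bracket expressions:

.. code-block:: ctags

	--langdef=Foo
	--kinddef-Foo=s,subscript,subscripts
	--kinddef-Foo=a,arglist,argument lists
	# '[' and '(' are escaped only outside the bracket expression.
	# Inside it, ']' comes first (right after '^') and ')' is written as-is.
	--regex-Foo=/\[([^]]+)]/\1/s/
	--regex-Foo=/\(([^)]+)\)/\1/a/
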
You should always test your regex patterns against test files with strings that
do and do not match. Pay particular attention to when a pattern should *not* match,
and how *much* it matches when it should.

Perl-compatible regular expressions (PCRE2) engine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Universal Ctags optionally supports `Perl-Compatible Regular Expressions (PCRE2)
<https://www.pcre.org/current/doc/html/pcre2syntax.html>`_ syntax,
but only if Universal Ctags is built with the ``pcre2`` library.
See the output of the ``--list-features`` option to find out whether your Universal
Ctags is built with ``pcre2`` or not.

PCRE2 *does* support many "modern" extensions.
For example, this pattern::

       foo.*?bar

matches just the first part, ``foobar``, not this entire string::

       foobar, bar, and even more bar

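In an option file, a pattern has to request the PCRE2 engine explicitly. The
sketch below assumes the ``{pcre2}`` long flag as described in
:ref:`ctags-optlib(7) <ctags-optlib(7)>` and a ctags binary built with
``pcre2``; the language ``Foo`` and the kind ``m`` are hypothetical:

.. code-block:: ctags

	--langdef=Foo
	--kinddef-Foo=m,match,matched text
	# Lazy quantifier: on the string above, group 1 captures only "foobar".
	--regex-Foo=/(foo.*?bar)/\1/m/{pcre2}
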
Regex option argument flags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Many regex-based options described in this document support additional arguments
in the form of long flags. Long flags are specified with surrounding '``{``' and
'``}``'.

The general format and placement is as follows:

.. code-block:: ctags

	--regex-<LANG>=/<PATTERN>/<NAME>/[<KIND>/]LONGFLAGS

Some examples:

.. code-block:: ctags

	--regex-Pod=/^=head1[ \t]+(.+)/\1/c/
	--regex-Foo=/set=([^;]+)/\1/v/{icase}
	--regex-Man=/^\.TH[[:space:]]{1,}"([^"]{1,})".*/\1/t/{exclusive}{icase}{scope=push}
	--regex-Gdbinit=/^#//{exclusive}

Note that the last example has only two '``/``' forward slashes following
the regex pattern, as a shortened form when no kind-spec exists.

The ``--mline-regex-<LANG>`` option also follows the above format. The
experimental ``--_mtable-regex-<LANG>`` option follows a slightly
modified version as well.

Regex control flags
......................................................................

.. Q: why even discuss the single-character version of the flags? Just
	make everyone use the long form.

The regex matching can be controlled by adding flags to the ``--regex-<LANG>``,
``--mline-regex-<LANG>``, and experimental ``--_mtable-regex-<LANG>`` options.
This is done either by using the single-character short flags ``b``, ``e`` and
``i`` as explained in the *ctags.1* man page, or by using the long flags
described earlier. The long flags require more typing but are much more
readable.

The mapping between the older short flag names and long flag names is:

=========== =========== ===========
short flag  long flag   description
=========== =========== ===========
b           basic       POSIX basic regular expression syntax.
e           extend      POSIX extended regular expression syntax (default).
i           icase       Case-insensitive matching.
=========== =========== ===========


So the following ``--regex-<LANG>`` expression:

.. code-block:: ctags

   --kinddef-m4=d,definition,definitions
   --regex-m4=/^m4_define\(\[([^]$\(]+).+$/\1/d/e

is the same as:

.. code-block:: ctags

   --kinddef-m4=d,definition,definitions
   --regex-m4=/^m4_define\(\[([^]$\(]+).+$/\1/d/{extend}

The characters '``{``' and '``}``' may not be suitable for command line
use, but long flags are mostly intended for option files.

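The same correspondence holds for the other flags in the table. As a sketch,
for a hypothetical language ``Foo`` with a kind ``v`` (both are assumptions for
illustration), the two regex lines below are equivalent, so a real option file
would keep only one of them:

.. code-block:: ctags

	--langdef=Foo
	--kinddef-Foo=v,variable,variables
	# Short form: "i" placed after the kind spec.
	--regex-Foo=/^set[ \t]+([a-z_]+)/\1/v/i
	# Long form: also matches "SET" or "Set", exactly like the line above.
	--regex-Foo=/^set[ \t]+([a-z_]+)/\1/v/{icase}
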
Exclusive flag in regex
......................................................................

By default, lines read from the input files will be matched against all the
regular expressions defined with ``--regex-<LANG>``. Each successfully matched
regular expression will emit a tag.

In some cases another policy, exclusive-matching, is preferable to the
all-matching policy. Exclusive-matching means that once one of the regular
expressions matches an input line successfully, the remaining regular
expressions are not tried for that line.

For specifying exclusive-matching the flags ``exclusive`` (long) and ``x``
(short) were introduced. For example, this is used in
:file:`optlib/gdbinit.ctags` for ignoring comment lines in gdb files,
as follows:

.. code-block:: ctags

	--regex-Gdbinit=/^#//{exclusive}

Comments in gdb files start with '``#``', so the above line is the first regex
match line in :file:`gdbinit.ctags`; subsequent regex matches are therefore
not tried for a comment line.

If an empty name pattern (``//``) is used for the ``--regex-<LANG>`` option,
ctags warns that the option is being used incorrectly. However, if the flag
``exclusive`` or ``x`` is specified, the warning is suppressed.
This is useful for ignoring matched patterns as above.

NOTE: This flag does not make sense in the multi-line ``--mline-regex-<LANG>``
option nor the multi-table ``--_mtable-regex-<LANG>`` option.

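The same idea carries over to other optlib parsers. As a sketch, assume a
hypothetical language ``Foo`` in which comments start with '``#``' and
variables are introduced with ``set`` (the language, its file extension, and
the kind ``v`` are illustrative assumptions):

.. code-block:: ctags

	--langdef=Foo
	--map-Foo=+.foo
	--kinddef-Foo=v,variable,variables
	# Listed first: a comment line matches here, emits no tag because the name
	# pattern is empty, and is not tried against the pattern below.
	--regex-Foo=/^[ \t]*#//{exclusive}
	--regex-Foo=/set[ \t]+([a-zA-Z_]+)/\1/v/

Without the first pattern, a comment such as ``# set foo`` would be tagged as
a variable.
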

Experimental flags
......................................................................

.. note:: These flags are experimental. They apply to all regex option
	types: basic ``--regex-<LANG>``, multi-line ``--mline-regex-<LANG>``,
	and the experimental multi-table ``--_mtable-regex-<LANG>`` option.

``_extra``

	This flag indicates the tag should only be generated if the given
	``extra`` type is enabled, as explained in ":ref:`extras`".

``_field``

	This flag allows a regex match to add additional custom fields to the
	generated tag entry, as explained in ":ref:`fields`".

``_role``

	This flag allows a regex match to generate a reference tag entry and
	specify the role of the reference, as explained in ":ref:`roles`".

.. NOT REVIEWED YET

``_anonymous=PREFIX``

	This flag allows a regex match to generate an anonymous tag entry.
	ctags gives the entry a name starting with ``PREFIX`` and emits it.
	This flag is useful for recording the position of a language object
	that has no name. A lambda function in a functional programming
	language is a typical example of such an object.

	Consider the following input (``input.foo``):

	.. code-block:: lisp

		(let ((f (lambda (x) (+ 1 x))))
			...
			)

	and the following optlib file (``foo.ctags``):

	.. code-block:: ctags
		:emphasize-lines: 4

		--langdef=Foo
		--map-Foo=+.foo
		--kinddef-Foo=l,lambda,lambda functions
		--regex-Foo=/.*\(lambda .*//l/{_anonymous=L}

	You get the following tags output:

	.. code-block:: console

		$ u-ctags  --options=foo.ctags -o - /tmp/input.foo
		Le4679d360100	/tmp/input.foo	/^(let ((f (lambda (x) (+ 1 x))))$/;"	l


.. _extras:

Conditional tagging with extras
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. NEEDS MORE REVIEWS

If a matched pattern should only be tagged when an ``extra`` flag is enabled,
mark the pattern with ``{_extra=XNAME}`` where ``XNAME`` is the name of the
extra. You must define ``XNAME`` with the
``--_extradef-<LANG>=XNAME,DESCRIPTION`` option before defining a regex
marked with ``{_extra=XNAME}``.

.. code-block:: python

	if __name__ == '__main__':
		do_something()

To capture the lines above in a Python program (``input.py``), an ``extra`` flag can
be used.

.. code-block:: ctags
	:emphasize-lines: 1-2

	--_extradef-Python=main,__main__ entry points
	--regex-Python=/^if __name__ == '__main__':/__main__/f/{_extra=main}

The above optlib (``python-main.ctags``) introduces the ``main`` extra to the Python parser.
The pattern matching is done only when the ``main`` extra is enabled.

.. code-block:: console

	$ ctags --options=python-main.ctags -o - --extras-Python='+{main}' input.py
	__main__	input.py	/^if __name__ == '__main__':$/;"	f

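The same mechanism works for a parser defined entirely in an option file. The
following sketch assumes a hypothetical language ``Foo`` in which notes are
written as ``# TODO: ...``; the language, the kind ``t``, and the extra
``todo`` are illustrative names, not part of Universal Ctags:

.. code-block:: ctags

	--langdef=Foo
	--map-Foo=+.foo
	--kinddef-Foo=t,todo,TODO notes
	# The extra must be defined before a regex refers to it.
	--_extradef-Foo=todo,TODO notes in comments
	--regex-Foo=/#[ \t]*TODO:[ \t]*(.+)/\1/t/{_extra=todo}

By analogy with the Python example above, these tags would be emitted only
when the extra is enabled, for example with ``--extras-Foo='+{todo}'``.
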

.. TODO: this "fields" section should probably be moved up this document, as a
	subsection in the "Regex option argument flags" section

.. _fields:

Adding custom fields to the tag output
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. NEEDS MORE REVIEWS

Exuberant Ctags allows just one of the specified groups in a regex pattern to
be used as a part of the name of a tag entry.

Universal Ctags allows using the other groups in the regex pattern.
An optlib parser can have its own specific fields. The groups can be used as
values of the fields of a tag entry.

Let's think about `Unknown`, an imaginary language.
Here is a source file (``input.unknown``) written in `Unknown`:

.. code-block:: java

	public func foo(n, m);
	protected func bar(n);
	private func baz(n,...);

With ``--regex-Unknown=...`` Exuberant Ctags can capture ``foo``, ``bar``, and ``baz``
as names. Universal Ctags can attach extra context information to the
names as values for fields. Let's focus on ``bar``. ``protected`` is a
keyword to control how widely the identifier ``bar`` can be accessed.
``(n)`` is the parameter list of ``bar``. ``protected`` and ``(n)`` are
extra context information of ``bar``.

With the following optlib file (``unknown.ctags``), ctags can attach
``protected`` to the field ``protection`` and ``(n)`` to the field ``signature``.

.. code-block:: ctags
	:emphasize-lines: 5-9

	--langdef=unknown
	--kinddef-unknown=f,func,functions
	--map-unknown=+.unknown

	--_fielddef-unknown=protection,access scope
	--_fielddef-unknown=signature,signatures

	--regex-unknown=/^((public|protected|private) +)?func ([^\(]+)\((.*)\)/\3/f/{_field=protection:\1}{_field=signature:(\4)}
	--fields-unknown=+'{protection}{signature}'

For the line ``protected func bar(n);`` you will get the following tags output::

	bar	input.unknown	/^protected func bar(n);$/;"	f	protection:protected	signature:(n)

Let's look at ``unknown.ctags`` in detail.

.. code-block:: ctags

	--_fielddef-unknown=protection,access scope

``--_fielddef-<LANG>=name,description`` defines a new field for a parser
specified by *<LANG>*.  Before defining a new field for the parser,
the parser must be defined with ``--langdef=<LANG>``. ``protection`` is
the field name used in tags output. ``access scope`` is the description
used in the output of ``--list-fields`` and ``--list-fields=Unknown``.

.. code-block:: ctags

	--_fielddef-unknown=signature,signatures

This defines a field named ``signature``.

.. code-block:: ctags

	--regex-unknown=/^((public|protected|private) +)?func ([^\(]+)\((.*)\)/\3/f/{_field=protection:\1}{_field=signature:(\4)}

This option requests making a tag for the name specified by group 3 of the
pattern, attaching group 1 as the value of the ``protection`` field, and attaching
group 4, wrapped in parentheses, as the value of the ``signature`` field. You can use
the long regex flag ``_field`` for attaching fields to a tag with the following
notation rule::

	{_field=FIELDNAME:GROUP}

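As a further sketch of this notation, consider a hypothetical ``key = value``
configuration language. The language name ``MyConf``, its file extension, the
kind ``k``, and the field ``value`` are all assumptions made for illustration:

.. code-block:: ctags

	--langdef=MyConf
	--map-MyConf=+.myconf
	--kinddef-MyConf=k,key,keys
	--_fielddef-MyConf=value,assigned value
	# Group 1 becomes the tag name; group 2 is attached as the "value" field.
	--regex-MyConf=/^([a-zA-Z_]+)[ \t]*=[ \t]*(.*)$/\1/k/{_field=value:\2}
	--fields-MyConf=+{value}
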

``--fields-<LANG>=[+|-]{FIELDNAME}`` can be used to enable or disable the specified field.

A newly defined parser-specific field is disabled by default. Enable the
field explicitly to use it. See ":ref:`Parser specific fields <parser-specific-fields>`"
about the ``--fields-<LANG>`` option.

The `passwd` parser is a simple example that uses the ``--fields-<LANG>`` option.


.. _roles:

Capturing reference tags
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. NOT REVIEWED YET

To make a reference tag with an optlib parser, specify a role with the
``_role`` long regex flag. Let's see an example:

.. code-block:: ctags
	:emphasize-lines: 3-6

	--langdef=FOO
	--kinddef-FOO=m,module,modules
	--_roledef-FOO.m=imported,imported module
	--regex-FOO=/import[ \t]+([a-z]+)/\1/m/{_role=imported}
	--extras=+r
	--fields=+r

A role must be defined before specifying it as the value of the ``_role`` flag.
The ``--_roledef-<LANG>.<KIND>=<ROLE>,<ROLEDESC>`` option is for defining a role.
See the ``--regex-FOO=...`` line.  In this parser `FOO`, the name of an
imported module is captured as a reference tag with the role ``imported``.

For specifying *<KIND>* where the role is defined, you can use either a
kind letter or a kind name surrounded by '``{``' and '``}``'.

The option has two parameters separated by a comma:

*<ROLE>*

	the role name, and

*<ROLEDESC>*

	the description of the role.

The first parameter is the name of the role. The role is defined in
the kind *<KIND>* of the language *<LANG>*. In the example, the
``imported`` role is defined in the ``module`` kind, which is specified
with ``m``. You can use ``{module}``, the name of the kind, instead.

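For instance, the role definition from the example above could be written with
the kind name instead of the kind letter. This is a sketch based on the
notation just described:

.. code-block:: ctags

	--langdef=FOO
	--kinddef-FOO=m,module,modules
	--_roledef-FOO.{module}=imported,imported module
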
The kind specified in the ``--_roledef-<LANG>.<KIND>`` option must be
defined *before* using the option. See the description of
``--kinddef-<LANG>`` for defining a kind.

The roles are listed with ``--list-roles=<LANG>``. The name and description
passed to the ``--_roledef-<LANG>.<KIND>`` option are used in the output like::

	$ ctags --langdef=FOO --kinddef-FOO=m,module,modules \
				--_roledef-FOO.m='imported,imported module' --list-roles=FOO
	#KIND(L/N) NAME     ENABLED DESCRIPTION
	m/module   imported on      imported module


If you specify the ``_role`` regex flag multiple times with different roles, you can
assign multiple roles to a reference tag.  Consider the following C input:

.. code-block:: C

	x  = 0;
	i += 1;

An ultra fine grained C parser may capture the variable ``x`` with the
``lvalue`` role and the variable ``i`` with the ``lvalue`` and ``incremented``
roles.

You can implement such roles by extending the built-in C parser:

.. code-block:: ctags
	:emphasize-lines: 2-5

	# c-extra.ctags
	--_roledef-C.v=lvalue,locator values
	--_roledef-C.v=incremented,incremented with ++ operator
	--regex-C=/([a-zA-Z_][a-zA-Z_0-9]*) *=/\1/v/{_role=lvalue}
	--regex-C=/([a-zA-Z_][a-zA-Z_0-9]*) *\+=/\1/v/{_role=lvalue}{_role=incremented}

.. code-block:: console

	$ ctags --options=c-extra.ctags --extras=+r --fields=+r -o - input.c
	i	input.c	/^i += 1;$/;"	v	roles:lvalue,incremented
	x	input.c	/^x  = 0;$/;"	v	roles:lvalue


5053c49e28cSMasatake YAMATOScope tracking in a regex parser
......................................................................

For details of the ``{scope=...}`` flag itself, see the "FLAGS FOR
--regex-<LANG> OPTION" section of :ref:`ctags-optlib(7) <ctags-optlib(7)>`.

Example 1:

.. code-block:: python

	# in /tmp/input.foo
	class foo:
	    def bar(baz):
	        print(baz)
	class goo:
	    def gar(gaz):
	        print(gaz)

.. code-block:: ctags
	:emphasize-lines: 7,8

	# in /tmp/foo.ctags:
	--langdef=Foo
	--map-Foo=+.foo
	--kinddef-Foo=c,class,classes
	--kinddef-Foo=d,definition,definitions

	--regex-Foo=/^class[[:blank:]]+([[:alpha:]]+):/\1/c/{scope=set}
	--regex-Foo=/^[[:blank:]]+def[[:blank:]]+([[:alpha:]]+).*:/\1/d/{scope=ref}

.. code-block:: console

	$ ctags --options=/tmp/foo.ctags -o - /tmp/input.foo
	bar	/tmp/input.foo	/^    def bar(baz):$/;"	d	class:foo
	foo	/tmp/input.foo	/^class foo:$/;"	c
	gar	/tmp/input.foo	/^    def gar(gaz):$/;"	d	class:goo
	goo	/tmp/input.foo	/^class goo:$/;"	c


Example 2:

.. code-block:: c

	// in /tmp/input.pp
	class foo {
	    int bar;
	}

.. code-block:: ctags
	:emphasize-lines: 7-9

	# in /tmp/pp.ctags:
	--langdef=pp
	--map-pp=+.pp
	--kinddef-pp=c,class,classes
	--kinddef-pp=v,variable,variables

	--regex-pp=/^[[:blank:]]*\}//{scope=pop}{exclusive}
	--regex-pp=/^class[[:blank:]]*([[:alnum:]]+)[[:blank:]]*\{/\1/c/{scope=push}
	--regex-pp=/^[[:blank:]]*int[[:blank:]]*([[:alnum:]]+)/\1/v/{scope=ref}

.. code-block:: console

	$ ctags --options=/tmp/pp.ctags -o - /tmp/input.pp
	bar	/tmp/input.pp	/^    int bar;$/;"	v	class:foo
	foo	/tmp/input.pp	/^class foo {$/;"	c

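The bookkeeping behind Example 2 can be pictured as a stack of enclosing tags.
The following Python sketch only illustrates that idea; it is not the ctags
implementation. The function name ``tag_lines`` and the rule table are
hypothetical, the POSIX character classes are approximated with Python
character sets, and the sketch assumes that ``{scope=push}`` records the
previous top of the stack as the scope of the pushed tag, as the output of
Example 3 suggests.

.. code-block:: python

	import re

	# Rough Python equivalents of the three --regex-pp patterns in Example 2.
	RULES = [
	    (re.compile(r"^[ \t]*\}"), None, "pop"),
	    (re.compile(r"^class[ \t]*([A-Za-z0-9]+)[ \t]*\{"), "c", "push"),
	    (re.compile(r"^[ \t]*int[ \t]*([A-Za-z0-9]+)"), "v", "ref"),
	]


	def tag_lines(lines):
	    """Return (name, kind, scope) tuples for the given input lines."""
	    stack = []  # names of the tags that currently enclose the input
	    tags = []
	    for line in lines:
	        for pattern, kind, action in RULES:
	            match = pattern.search(line)
	            if match is None:
	                continue
	            if action == "pop":
	                if stack:
	                    stack.pop()
	                break  # models {exclusive}: try no further patterns
	            name = match.group(1)
	            # Both "ref" and (by assumption) "push" use the current top.
	            scope = stack[-1] if stack else None
	            tags.append((name, kind, scope))
	            if action == "push":
	                stack.append(name)
	    return tags


	print(tag_lines(["class foo {", "    int bar;", "}"]))

Run on the three lines of ``/tmp/input.pp``, the sketch reports ``foo`` without
a scope and ``bar`` with scope ``foo``, which corresponds to the ``class:foo``
field in the output above.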

Example 3:

.. code-block::

	# in /tmp/input.docdoc
	section SEC0
	...
	subsection SUB0-1
	...
	subsection SUB0-2
	...
	section SEC1
	...
	subsection SUB1-1
	...
	subsection SUB1-2
	...

.. code-block:: ctags
	:emphasize-lines: 15,21

	# in /tmp/doc.ctags:
	--langdef=doc
	--map-doc=+.docdoc
	--kinddef-doc=s,section,sections
	--kinddef-doc=S,subsection,subsections

	--_tabledef-doc=main
	--_tabledef-doc=section
	--_tabledef-doc=subsection

	--_mtable-regex-doc=main/section +([^\n]+)\n/\1/s/{scope=push}{tenter=section}
	--_mtable-regex-doc=main/[^\n]+\n|[^\n]+|\n//
	--_mtable-regex-doc=main///{scope=clear}{tquit}

	--_mtable-regex-doc=section/section +([^\n]+)\n/\1/s/{scope=replace}
	--_mtable-regex-doc=section/subsection +([^\n]+)\n/\1/S/{scope=push}{tenter=subsection}
	--_mtable-regex-doc=section/[^\n]+\n|[^\n]+|\n//
	--_mtable-regex-doc=section///{scope=clear}{tquit}

	--_mtable-regex-doc=subsection/(section )//{_advanceTo=0start}{tleave}{scope=pop}
	--_mtable-regex-doc=subsection/subsection +([^\n]+)\n/\1/S/{scope=replace}
	--_mtable-regex-doc=subsection/[^\n]+\n|[^\n]+|\n//
	--_mtable-regex-doc=subsection///{scope=clear}{tquit}

.. code-block:: console

	$ ctags --sort=no --fields=+nl --options=/tmp/doc.ctags -o - /tmp/input.docdoc
	SEC0	/tmp/input.docdoc	/^section SEC0$/;"	s	line:1	language:doc
	SUB0-1	/tmp/input.docdoc	/^subsection SUB0-1$/;"	S	line:3	language:doc	section:SEC0
	SUB0-2	/tmp/input.docdoc	/^subsection SUB0-2$/;"	S	line:5	language:doc	section:SEC0
	SEC1	/tmp/input.docdoc	/^section SEC1$/;"	s	line:7	language:doc
	SUB1-1	/tmp/input.docdoc	/^subsection SUB1-1$/;"	S	line:9	language:doc	section:SEC1
	SUB1-2	/tmp/input.docdoc	/^subsection SUB1-2$/;"	S	line:11	language:doc	section:SEC1


NOTE: The ``{scope=...}`` flag doesn't work well with ``--mline-regex-<LANG>=``.

Overriding the letter for file kind
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. Q: this was fixed in https://github.com/universal-ctags/ctags/pull/331
	so can we remove this section?

One of the built-in tag kinds in Universal Ctags is the ``F`` file kind.
Overriding the letter for the file kind is not allowed in Universal Ctags.

.. warning::

	Don't use ``F`` as a kind letter in your parser. (See issue `#317
	<https://github.com/universal-ctags/ctags/issues/317>`_ on GitHub.)

Generating fully qualified tags automatically from scope information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If scope fields are filled properly with ``{scope=...}`` regex flags,
you can use the field values for generating fully qualified tags.
For details of the ``{scope=...}`` flag itself, see the "FLAGS FOR
--regex-<LANG> OPTION" section of :ref:`ctags-optlib(7) <ctags-optlib(7)>`.

Append ``{_autoFQTag}`` to the end of the ``--langdef=<LANG>`` option, as in
``--langdef=Foo{_autoFQTag}``, to make ctags generate fully qualified
tags automatically.

'``.``' is the (ctags global) default separator for combining names into a
fully qualified tag. You can customize separators with the
``--_scopesep-<LANG>=...`` option.

input.foo::

  class X
     var y
  end

foo.ctags:

.. code-block:: ctags
	:emphasize-lines: 1

	--langdef=foo{_autoFQTag}
	--map-foo=+.foo
	--kinddef-foo=c,class,classes
	--kinddef-foo=v,var,variables
	--regex-foo=/class ([A-Z]*)/\1/c/{scope=push}
	--regex-foo=/end///{placeholder}{scope=pop}
	--regex-foo=/[ \t]*var ([a-z]*)/\1/v/{scope=ref}

Output::

	$ u-ctags --quiet --options=./foo.ctags -o - input.foo
	X	input.foo	/^class X$/;"	c
	y	input.foo	/^	var y$/;"	v	class:X

	$ u-ctags --quiet --options=./foo.ctags --extras=+q -o - input.foo
	X	input.foo	/^class X$/;"	c
	X.y	input.foo	/^	var y$/;"	v	class:X
	y	input.foo	/^	var y$/;"	v	class:X


``X.y`` is printed as a fully qualified tag when ``--extras=+q`` is given.

.. NOT REVIEWED YET (--_scopesep)

Customizing scope separators
......................................................................

Use the ``--_scopesep-<LANG>=[<parent-kindLetter>]/<child-kindLetter>:<sep>``
option to customize separators if the language uses ``{_autoFQTag}``.

``parent-kindLetter``

	The kind letter for a tag of the outer scope.

	You can specify '``*``' as a wildcard that means *any kind* of
	outer-scope tag.

	If you omit ``parent-kindLetter``, the separator is used as
	a prefix for tags having the kind specified with ``child-kindLetter``.
	This prefix can be used to refer to the global namespace or a similar
	concept if the language has one.

``child-kindLetter``

	The kind letter for a tag of the inner scope.

	You can specify '``*``' as a wildcard that means *any kind* of
	inner-scope tag.

``sep``

	In a qualified tag, if the outer scope has kind ``parent-kindLetter``
	and the inner scope has kind ``child-kindLetter``, then ``sep`` is
	inserted between the scope names in the generated tags file.

Specifying '``*``' as both ``parent-kindLetter`` and ``child-kindLetter``
sets ``sep`` as the language default separator. It is used as a fallback.

Specifying '``*``' as ``child-kindLetter`` and omitting ``parent-kindLetter``
sets ``sep`` as the language default prefix. It is used as a fallback.


NOTE: There is no ctags global default prefix.

NOTE: The ``--_scopesep-<LANG>=...`` option affects only a parser that
enables ``_autoFQTag``. A parser building fully qualified tags
manually ignores the option.

Let's see an example.
The input file is written in Tcl. The Tcl parser is not an optlib
parser. However, it uses the ``_autoFQTag`` feature internally.
Therefore, the ``--_scopesep-Tcl=`` option works well. The Tcl parser
defines two kinds, ``n`` (``namespace``) and ``p`` (``procedure``).

By default, the Tcl parser uses ``::`` as the scope separator. The parser also
uses ``::`` as the root prefix.

.. code-block:: tcl

	namespace eval N {
		namespace eval M {
			proc pr0 {s} {
				puts $s
			}
		}
	}

	proc pr1 {s} {
		puts $s
	}

``M`` is defined under the scope of ``N``. ``pr0`` is defined under the scope
of ``M``. ``N`` and ``pr1`` are at the top level (so they are candidates for
receiving a prefix). ``M`` and ``N`` are language objects with the ``n``
(``namespace``) kind. ``pr0`` and ``pr1`` are language objects with the ``p``
(``procedure``) kind.

.. code-block:: console

	$ ctags -o - --extras=+q input.tcl
	::N	input.tcl	/^namespace eval N {$/;"	n
	::N::M	input.tcl	/^	namespace eval M {$/;"	n	namespace:::N
	::N::M::pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:::N::M
	::pr1	input.tcl	/^proc pr1 {s} {$/;"	p
	M	input.tcl	/^	namespace eval M {$/;"	n	namespace:::N
	N	input.tcl	/^namespace eval N {$/;"	n
	pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:::N::M
	pr1	input.tcl	/^proc pr1 {s} {$/;"	p

Let's change the default separator to ``->``:

.. code-block:: console
	:emphasize-lines: 1

	$ ctags -o - --extras=+q --_scopesep-Tcl='*/*:->' input.tcl
	::N	input.tcl	/^namespace eval N {$/;"	n
	::N->M	input.tcl	/^	namespace eval M {$/;"	n	namespace:::N
	::N->M->pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:::N->M
	::pr1	input.tcl	/^proc pr1 {s} {$/;"	p
	M	input.tcl	/^	namespace eval M {$/;"	n	namespace:::N
	N	input.tcl	/^namespace eval N {$/;"	n
	pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:::N->M
	pr1	input.tcl	/^proc pr1 {s} {$/;"	p

Let's define '``^``' as the default prefix:

.. code-block:: console
	:emphasize-lines: 1

	$ ctags -o - --extras=+q --_scopesep-Tcl='*/*:->' --_scopesep-Tcl='/*:^' input.tcl
	M	input.tcl	/^	namespace eval M {$/;"	n	namespace:^N
	N	input.tcl	/^namespace eval N {$/;"	n
	^N	input.tcl	/^namespace eval N {$/;"	n
	^N->M	input.tcl	/^	namespace eval M {$/;"	n	namespace:^N
	^N->M->pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:^N->M
	^pr1	input.tcl	/^proc pr1 {s} {$/;"	p
	pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:^N->M
	pr1	input.tcl	/^proc pr1 {s} {$/;"	p

Let's override the separator used for combining a namespace and a procedure
with '``+``'. (For combining a namespace and another namespace, ctags still
uses the default separator.)

.. code-block:: console
	:emphasize-lines: 1

	$ ctags -o - --extras=+q --_scopesep-Tcl='*/*:->' --_scopesep-Tcl='/*:^' --_scopesep-Tcl='n/p:+' input.tcl
	M	input.tcl	/^	namespace eval M {$/;"	n	namespace:^N
	N	input.tcl	/^namespace eval N {$/;"	n
	^N	input.tcl	/^namespace eval N {$/;"	n
	^N->M	input.tcl	/^	namespace eval M {$/;"	n	namespace:^N
	^N->M+pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:^N->M
	^pr1	input.tcl	/^proc pr1 {s} {$/;"	p
	pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:^N->M
	pr1	input.tcl	/^proc pr1 {s} {$/;"	p

Let's override the prefix for namespaces with '``@``'.
(For procedures, ctags still uses the default prefix.)

.. code-block:: console
	:emphasize-lines: 1

	$ ctags -o - --extras=+q --_scopesep-Tcl='*/*:->' --_scopesep-Tcl='/*:^' --_scopesep-Tcl='n/p:+' --_scopesep-Tcl='/n:@' input.tcl
	@N	input.tcl	/^namespace eval N {$/;"	n
	@N->M	input.tcl	/^	namespace eval M {$/;"	n	namespace:@N
	@N->M+pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:@N->M
	M	input.tcl	/^	namespace eval M {$/;"	n	namespace:@N
	N	input.tcl	/^namespace eval N {$/;"	n
	^pr1	input.tcl	/^proc pr1 {s} {$/;"	p
	pr0	input.tcl	/^		proc pr0 {s} {$/;"	p	namespace:@N->M
	pr1	input.tcl	/^proc pr1 {s} {$/;"	p

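The fallback rules used in the Tcl walk-through can be summarized in a few
lines of Python. This is a minimal sketch under stated assumptions, not the
code ctags runs: the helper ``qualify`` and the dictionary layout are
hypothetical, a parent kind of ``None`` stands for an omitted
``parent-kindLetter``, and only exact kind pairs plus the ``*/*`` and ``/*``
fallbacks are modelled (partial wildcards such as ``n/*`` are left out).

.. code-block:: python

	GLOBAL_DEFAULT_SEP = "."  # the ctags global default separator


	def qualify(chain, seps):
	    """Join (name, kindLetter) pairs, outermost first, into one tag name.

	    `seps` maps (parent_kind, child_kind) to a separator string.
	    """
	    first_name, first_kind = chain[0]
	    # Root prefix: kind-specific entry, then "/*", else no prefix at all.
	    prefix = seps.get((None, first_kind), seps.get((None, "*"), ""))
	    result = prefix + first_name
	    for (_, parent_kind), (name, kind) in zip(chain, chain[1:]):
	        # Separator: exact pair, then "*/*", then the global default.
	        sep = seps.get((parent_kind, kind),
	                       seps.get(("*", "*"), GLOBAL_DEFAULT_SEP))
	        result += sep + name
	    return result


	# The four --_scopesep-Tcl options from the last command line above.
	tcl = {("*", "*"): "->", (None, "*"): "^", ("n", "p"): "+", (None, "n"): "@"}
	print(qualify([("N", "n"), ("M", "n"), ("pr0", "p")], tcl))
	print(qualify([("pr1", "p")], tcl))

With these settings the sketch prints ``@N->M+pr0`` and ``^pr1``, the same
qualified names as in the last console output above.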

Multi-line pattern match
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We often need to scan multiple lines to generate a tag, whether due to
needing contextual information to decide whether to tag or not, or to
constrain generating tags to only certain cases, or to grab multiple
substrings to generate the tag name.

Universal Ctags has two ways to accomplish this: the *multi-line regex
options*, and the experimental *multi-table regex options* described later.

The newly introduced ``--mline-regex-<LANG>`` is similar to ``--regex-<LANG>``
except the pattern is applied to the whole file's contents, not line by line.

This example is based on issue `#219
<https://github.com/universal-ctags/ctags/issues/219>`_ posted by
@andreicristianpetcu:

.. code-block:: java

	// in input.java:

	@Subscribe
	public void catchEvent(SomeEvent e)
	{
	   return;
	}

	@Subscribe
	public void
	recover(Exception e)
	{
	    return;
	}

The above Java code resembles code written for the Java `Spring
<https://spring.io>`_ framework. The ``@Subscribe`` annotation is a keyword for
the framework, and the developer would like to have a tag generated for each
method annotated with ``@Subscribe``, using the name of the method followed by
a dash followed by the type of the argument. For example, the developer wants
the tag name ``Event-SomeEvent`` generated for the first method shown above.
(With the pattern below, the leading lowercase ``catch`` of ``catchEvent`` is
consumed by the second capture group, ``([a-z ]+)``, so only ``Event`` is left
for group 3.)

To accomplish this, the developer creates a :file:`spring.ctags` file with
the following:

.. code-block:: ctags
	:emphasize-lines: 4

	# in spring.ctags:
	--langdef=javaspring
	--map-javaspring=+.java
	--mline-regex-javaspring=/@Subscribe([[:space:]])*([a-z ]+)[[:space:]]*([a-zA-Z]*)\(([a-zA-Z]*)/\3-\4/s,subscription/{mgroup=3}
	--fields=+ln

Using :file:`spring.ctags`, the tags output is now:

.. code-block:: console

	$ ctags -o - --options=./spring.ctags input.java
	Event-SomeEvent	input.java	/^public void catchEvent(SomeEvent e)$/;"	s	line:2	language:javaspring
	recover-Exception	input.java	/^    recover(Exception e)$/;"	s	line:10	language:javaspring

Multiline pattern flags
......................................................................

.. note:: These flags also apply to the experimental ``--_mtable-regex-<LANG>``
	option described later.

``{mgroup=N}``

	This flag indicates the pattern should be applied to the whole file
	contents, not line by line. ``N`` is the number of a capture group in the
	pattern, which is used to record the line number location of the tag. In the
	above example ``3`` is specified, so the start position of regex capture
	group 3, relative to the whole file, is used.

.. warning:: You **must** add an ``{mgroup=N}`` flag to the multi-line
	``--mline-regex-<LANG>`` option, even if ``N`` is ``0`` (meaning the
	start position of the whole regex pattern). You do not need to add it for
	the multi-table ``--_mtable-regex-<LANG>``.

.. TODO: Q: isn't the above restriction really a bug? I think it is. I should fix it.
   Q to @masatake-san: Do you mean that {mgroup=0} can be omitted? -> #2918 is opened

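To make the role of ``N`` concrete, here is a small Python sketch of how a
line number can be derived from the start offset of a capture group. It is an
illustration only: the helper ``tags_with_lines`` is hypothetical,
``[[:space:]]`` is written as ``\s``, and Python's ``re`` module stands in for
the regex engine that ctags uses.

.. code-block:: python

	import re

	# Python rendering of the pattern in spring.ctags.
	PATTERN = re.compile(r"@Subscribe(\s)*([a-z ]+)\s*([a-zA-Z]*)\(([a-zA-Z]*)")

	TEXT = """\
	@Subscribe
	public void catchEvent(SomeEvent e)
	{
	   return;
	}

	@Subscribe
	public void
	recover(Exception e)
	{
	    return;
	}
	"""


	def tags_with_lines(text, mgroup):
	    """Yield (tag name, line number) for every match in the whole text."""
	    for match in PATTERN.finditer(text):
	        name = match.group(3) + "-" + match.group(4)
	        # The line number is taken where capture group `mgroup` starts.
	        line = text.count("\n", 0, match.start(mgroup)) + 1
	        yield name, line


	print(list(tags_with_lines(TEXT, mgroup=3)))
	print(list(tags_with_lines(TEXT, mgroup=0)))

With ``mgroup=3`` each tag is located on the line where the method name starts;
with ``mgroup=0`` it is located on the ``@Subscribe`` line instead. The line
numbers refer to the string ``TEXT``, so they need not agree with the console
output shown above.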

``{_advanceTo=N[start|end]}``

	A regex pattern is applied to the whole file's contents iteratively. This
	long flag specifies from where the pattern should be applied in the next
	iteration for regex matching. When a pattern matches, the next pattern
	matching starts from the start or end of capture group ``N``. By default it
	advances to the end of the whole match (i.e., ``{_advanceTo=0end}`` is
	the default).


	Consider the following input::

	   def def abc

	Consider two sets of options, ``foo.ctags`` and ``bar.ctags``.

	.. code-block:: ctags
		:emphasize-lines: 5

		# foo.ctags:
		--langdef=foo
		--langmap=foo:.foo
		--kinddef-foo=a,something,something
		--mline-regex-foo=/def *([a-z]+)/\1/a/{mgroup=1}


	.. code-block:: ctags
		:emphasize-lines: 5

		# bar.ctags:
		--langdef=bar
		--langmap=bar:.bar
		--kinddef-bar=a,something,something
		--mline-regex-bar=/def *([a-z]+)/\1/a/{mgroup=1}{_advanceTo=1start}

	``foo.ctags`` emits the following tags output::

	   def	input.foo	/^def def abc$/;"	a

	``bar.ctags`` emits the following tags output::

	   def	input-0.bar	/^def def abc$/;"	a
	   abc	input-0.bar	/^def def abc$/;"	a

	``_advanceTo=1start`` is specified in ``bar.ctags``.
	This allows ctags to capture ``abc``.

	At the first iteration, the patterns of both
	``foo.ctags`` and ``bar.ctags`` match as follows::

		0   1       (start)
		v   v
		def def abc
		       ^
		       0,1  (end)

	``def`` in group 1 is captured as a tag in
	both languages. At the next iteration, the positions
	from which the pattern matching is applied are not the
	same in the two languages.

	``foo.ctags``
	::

		       0end (default)
		       v
		def def abc


	``bar.ctags``
	::

		    1start (as specified in _advanceTo long flag)
		    v
		def def abc

	This difference in positions accounts for the difference in tags output.

	A more relevant use-case is when ``{_advanceTo=N[start|end]}`` is used in
	the experimental ``--_mtable-regex-<LANG>``, to "advance" back to the
	beginning of a match, so that one can generate multiple tags for the same
	input line(s).

.. note:: This flag doesn't work well with scope related flags and ``exclusive`` flags.

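The iteration described for ``{_advanceTo=N[start|end]}`` can be replayed with
a short loop. The Python sketch below is an assumption-laden model rather than
the ctags implementation: the function ``scan`` is hypothetical, Python's
``re`` module replaces the regex engine, and the ``pos + 1`` guard against a
match that would not move forward is an addition of this sketch.

.. code-block:: python

	import re


	def scan(text, pattern, advance_to=(0, "end")):
	    """Collect group 1 of every match, restarting where advance_to says."""
	    regex = re.compile(pattern)
	    group, edge = advance_to
	    names = []
	    pos = 0
	    while pos <= len(text):
	        match = regex.search(text, pos)
	        if match is None:
	            break
	        names.append(match.group(1))
	        # Next start: the start or the end of the chosen capture group.
	        target = match.start(group) if edge == "start" else match.end(group)
	        pos = max(target, pos + 1)  # guard: always make progress
	    return names


	print(scan("def def abc", r"def *([a-z]+)"))
	print(scan("def def abc", r"def *([a-z]+)", advance_to=(1, "start")))

The first call models ``foo.ctags`` and yields only ``def``; the second models
``bar.ctags`` with ``{_advanceTo=1start}`` and yields ``def`` followed by
``abc``, as in the tags output above.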

.. Q: this was previously titled "Byte oriented pattern matching...", presumably
	because it "matched against the input at the current byte position, not line".
	But that's also true for --mline-regex-<LANG>, as far as I can tell.

Advanced pattern matching with multiple regex tables
101501afa120SMasatake YAMATO~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101601afa120SMasatake YAMATO
1017b40096fdSHadriel Kaplan.. note:: This is a highly experimental feature. This will not go into
1018b40096fdSHadriel Kaplan	the man page of 6.0. But let's be honest, it's the most exciting feature!
101901afa120SMasatake YAMATO
1020b40096fdSHadriel KaplanIn some cases, the ``--regex-<LANG>`` and ``--mline-regex-<LANG>`` options are not
1021b40096fdSHadriel Kaplansufficient to generate the tags for a particular language. Some of the common
1022b40096fdSHadriel Kaplanreasons for this are:
102301afa120SMasatake YAMATO
1024b40096fdSHadriel Kaplan* To ignore commented lines or sections for the language file, so that
1025b40096fdSHadriel Kaplan  tags aren't generated for symbols that are within the comments.
1026b40096fdSHadriel Kaplan* To enter and exit scope, and use it for tagging based on contextual
1027b40096fdSHadriel Kaplan  state or with end-scope markers that are difficult to match to their
1028b40096fdSHadriel Kaplan  associated scope entry point.
1029b40096fdSHadriel Kaplan* To support nested scopes.
1030b40096fdSHadriel Kaplan* To change the pattern searched for, or the resultant tag for the same
1031b40096fdSHadriel Kaplan  pattern, based on scoping or contextual location.
1032b40096fdSHadriel Kaplan* To break up an overly complicated ``--mline-regex-<LANG>`` pattern into
1033b40096fdSHadriel Kaplan  separate regex patterns, for performance or readability reasons.
103401afa120SMasatake YAMATO
1035dccba5efSHiroo HAYASHITo help handle such things, Universal Ctags has been enhanced with multi-table
1036b40096fdSHadriel Kaplanregex matching. The feature is inspired by `lex`, the fast lexical analyzer
1037b40096fdSHadriel Kaplangenerator, which is a popular tool on Unix environments for writing parsers, and
1038b40096fdSHadriel Kaplan`RegexLexer <http://pygments.org/docs/lexerdevelopment/>`_ of Pygments.
1039b40096fdSHadriel KaplanKnowledge about them will help you understand the new options.
104001afa120SMasatake YAMATO
1041b40096fdSHadriel KaplanThe new options are:
1042b40096fdSHadriel Kaplan
1043b40096fdSHadriel Kaplan``--_tabledef-<LANG>``
1044b40096fdSHadriel Kaplan	Declares a new regex matching table of a given name for the language,
104586bcb5c2SHiroo HAYASHI	as described in ":ref:`tabledef`".
1046b40096fdSHadriel Kaplan
1047b40096fdSHadriel Kaplan``--_mtable-regex-<LANG>``
1048b40096fdSHadriel Kaplan	Adds a regex pattern and associated tag generation information and flags, to
104986bcb5c2SHiroo HAYASHI	the given table, as described in ":ref:`mtable_regex`".
1050b40096fdSHadriel Kaplan
1051b40096fdSHadriel Kaplan``--_mtable-extend-<LANG>``
	Includes a previously defined regex table in the named one.
1053b40096fdSHadriel Kaplan
1054b40096fdSHadriel KaplanThe above will be discussed in more detail shortly.
1055b40096fdSHadriel Kaplan
First, let's explain the feature with an example. Consider an
imaginary language `X` whose syntax is similar to JavaScript: ``var`` is
used to define variables, and "``/* ... */``" is used for block
comments.
1060b40096fdSHadriel Kaplan
1061b40096fdSHadriel KaplanHere is our input, :file:`input.x`:
1062b40096fdSHadriel Kaplan
1063b40096fdSHadriel Kaplan.. code-block:: java
106401afa120SMasatake YAMATO
106501afa120SMasatake YAMATO   /* BLOCK COMMENT
106601afa120SMasatake YAMATO   var dont_capture_me;
106701afa120SMasatake YAMATO   */
106801afa120SMasatake YAMATO   var a /* ANOTHER BLOCK COMMENT */, b;
106901afa120SMasatake YAMATO
1070b40096fdSHadriel KaplanWe want ctags to capture ``a`` and ``b`` - but it is difficult to write a parser
1071b40096fdSHadriel Kaplanthat will ignore ``dont_capture_me`` in the comment with a classical regex
1072b40096fdSHadriel Kaplanparser defined with ``--regex-<LANG>`` or ``--mline-regex-<LANG>``, because of
1073b40096fdSHadriel Kaplanthe block comments.
107401afa120SMasatake YAMATO
The ``--regex-<LANG>`` option only works on one line at a time, so it cannot know
that ``dont_capture_me`` is within a comment. The ``--mline-regex-<LANG>`` option
could do it in theory, but due to the greedy nature of the regex engine it is
impractical and potentially inefficient to do so, given that there could be
multiple block comments in the file, with '``*``' inside them, etc.
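
As a rough illustration of the problem, here is a line-oriented parser for `X`
written only with ``--regex-<LANG>``. It is a sketch for comparison, not part of
the final :file:`X.ctags`, and the identifier pattern is an assumption:

.. code-block:: ctags

	--langdef=X
	--map-X=.x
	--kinddef-X=v,var,variables
	--regex-X=/^var[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/v/

Given :file:`input.x`, this parser should tag ``dont_capture_me``, because the
pattern cannot see the surrounding comment. It should also tag ``a`` but not
``b``, because the pattern matches only once, at the start of the line.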
108001afa120SMasatake YAMATO
1081b40096fdSHadriel KaplanA parser written with multi-table regex, on the other hand, can capture only
1082b40096fdSHadriel Kaplan``a`` and ``b`` safely. But it is more complicated to understand.
108301afa120SMasatake YAMATO
10843cd8570eSHiroo HAYASHIHere is the 1st version of :file:`X.ctags`:
1085d14dd918SMasatake YAMATO
1086d14dd918SMasatake YAMATO.. code-block:: ctags
108701afa120SMasatake YAMATO
108801afa120SMasatake YAMATO   --langdef=X
108901afa120SMasatake YAMATO   --map-X=.x
109001afa120SMasatake YAMATO   --kinddef-X=v,var,variables
109101afa120SMasatake YAMATO
Not so interesting. It doesn't really *do* anything yet. It just creates a new
language named ``X``, for files ending with a :file:`.x` suffix, and defines a
new kind, ``var``, for variable tags.
109501afa120SMasatake YAMATO
1096b40096fdSHadriel KaplanWhen writing a multi-table parser, you have to think about the necessary states
109786bcb5c2SHiroo HAYASHIof parsing. For the parser of language `X`, we need the following states:
109801afa120SMasatake YAMATO
109901afa120SMasatake YAMATO* `toplevel` (initial state)
110001afa120SMasatake YAMATO* `comment` (inside comment)
110101afa120SMasatake YAMATO* `vars` (var statements)
110201afa120SMasatake YAMATO
1103b40096fdSHadriel Kaplan.. _tabledef:
110401afa120SMasatake YAMATO
1105b40096fdSHadriel KaplanDeclaring a new regex table
1106b40096fdSHadriel Kaplan......................................................................
1107b40096fdSHadriel Kaplan
Before adding regular expressions, you have to declare a table for each state
with the ``--_tabledef-<LANG>=<TABLE>`` option.
1110b40096fdSHadriel Kaplan
1111b40096fdSHadriel KaplanHere is the 2nd version of :file:`X.ctags` doing so:
1112d14dd918SMasatake YAMATO
1113d14dd918SMasatake YAMATO.. code-block:: ctags
1114a5c14cdaSHiroo HAYASHI	:emphasize-lines: 5-7
111501afa120SMasatake YAMATO
111601afa120SMasatake YAMATO	--langdef=X
111701afa120SMasatake YAMATO	--map-X=.x
111801afa120SMasatake YAMATO	--kinddef-X=v,var,variables
111901afa120SMasatake YAMATO
112001afa120SMasatake YAMATO	--_tabledef-X=toplevel
112101afa120SMasatake YAMATO	--_tabledef-X=comment
112201afa120SMasatake YAMATO	--_tabledef-X=vars
112301afa120SMasatake YAMATO
1124b40096fdSHadriel KaplanFor table names, only characters in the range ``[0-9a-zA-Z_]`` are acceptable.
112501afa120SMasatake YAMATO
For a given language, the ctags multi-table parser begins each input file
with the first declared table. For :file:`X.ctags`, that is ``toplevel``.
The other tables are only ever entered/checked if another table specifies to do
so, starting with the first table. In other words, if the first declared table
does not find a match for the current input, and does not specify to go to
another table, the other tables for that language won't be used. The flags to go
to another table are ``{tenter}``, ``{tleave}``, and ``{tjump}``, as described
later.
113401afa120SMasatake YAMATO
1135b40096fdSHadriel Kaplan.. _mtable_regex:
113601afa120SMasatake YAMATO
1137b40096fdSHadriel KaplanAdding a regex to a regex table
1138b40096fdSHadriel Kaplan......................................................................
113901afa120SMasatake YAMATO
1140b40096fdSHadriel KaplanThe new option to add a regex to a declared table is ``--_mtable-regex-<LANG>``,
1141b40096fdSHadriel Kaplanand it follows this form:
114201afa120SMasatake YAMATO
11433cd8570eSHiroo HAYASHI.. code-block:: ctags
114401afa120SMasatake YAMATO
1145b40096fdSHadriel Kaplan	--_mtable-regex-<LANG>=<TABLE>/<PATTERN>/<NAME>/[<KIND>]/LONGFLAGS
1146b40096fdSHadriel Kaplan
1147b40096fdSHadriel KaplanThe parameters for ``--_mtable-regex-<LANG>`` look complicated. However,
1148b40096fdSHadriel Kaplan``<PATTERN>``, ``<NAME>``, and ``<KIND>`` are the same as the parameters of the
1149b40096fdSHadriel Kaplan``--regex-<LANG>`` and ``--mline-regex-<LANG>`` options. ``<TABLE>`` is simply
1150b40096fdSHadriel Kaplanthe name of a table previously declared with the ``--_tabledef-<LANG>`` option.
1151b40096fdSHadriel Kaplan
1152b40096fdSHadriel KaplanA regex pattern added to a parser with ``--_mtable-regex-<LANG>`` is matched
1153b40096fdSHadriel Kaplanagainst the input at the current byte position, not line. Even if you do not
115486bcb5c2SHiroo HAYASHIspecify the '``^``' anchor at the start of the pattern, ctags adds '``^``' to
1155b40096fdSHadriel Kaplanthe pattern automatically. Unlike the ``--regex-<LANG>`` and
115686bcb5c2SHiroo HAYASHI``--mline-regex-<LANG>`` options, a '``^``' anchor does not mean "beginning of
1157b40096fdSHadriel Kaplanline" in ``--_mtable-regex-<LANG>``; instead it means the beginning of the
1158b40096fdSHadriel Kaplaninput string (i.e., the current byte position).
1159b40096fdSHadriel Kaplan
1160b40096fdSHadriel KaplanThe ``LONGFLAGS`` include the already discussed flags for ``--regex-<LANG>`` and
1161b40096fdSHadriel Kaplan``--mline-regex-<LANG>``: ``{scope=...}``, ``{mgroup=N}``, ``{_advanceTo=N}``,
1162b40096fdSHadriel Kaplan``{basic}``, ``{extend}``, and ``{icase}``. The ``{exclusive}`` flag does not
1163b40096fdSHadriel Kaplanmake sense for multi-table regex.
1164b40096fdSHadriel Kaplan
1165b40096fdSHadriel KaplanIn addition, several new flags are introduced exclusively for multi-table
1166b40096fdSHadriel Kaplanregex use:
1167b40096fdSHadriel Kaplan
1168b40096fdSHadriel Kaplan``{tenter}``
1169b40096fdSHadriel Kaplan	Push the current table on the stack, and enter another table.
1170b40096fdSHadriel Kaplan
1171b40096fdSHadriel Kaplan``{tleave}``
1172b40096fdSHadriel Kaplan	Leave the current table, pop the stack, and go to the table that was
1173b40096fdSHadriel Kaplan	just popped from the stack.
1174b40096fdSHadriel Kaplan
1175b40096fdSHadriel Kaplan``{tjump}``
1176b40096fdSHadriel Kaplan	Jump to another table, without affecting the stack.
1177b40096fdSHadriel Kaplan
1178b40096fdSHadriel Kaplan``{treset}``
1179b40096fdSHadriel Kaplan	Clear the stack, and go to another table.
1180b40096fdSHadriel Kaplan
1181b40096fdSHadriel Kaplan``{tquit}``
1182b40096fdSHadriel Kaplan	Clear the stack, and stop processing the current input file for this
1183b40096fdSHadriel Kaplan	language.
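
The difference between ``{tenter}``/``{tleave}`` and ``{tjump}`` can be
sketched as follows, assuming the ``toplevel`` and ``comment`` tables declared
above. The two alternatives are not meant to be used together. The ``__END__``
marker in the last line is hypothetical and only shows where ``{tquit}`` would
be attached:

.. code-block:: ctags

	# Alternative 1: push toplevel, enter comment, then pop back.
	--_mtable-regex-X=toplevel/\/\*//{tenter=comment}
	--_mtable-regex-X=comment/\*\///{tleave}

	# Alternative 2: jump both ways without touching the stack.
	--_mtable-regex-X=toplevel/\/\*//{tjump=comment}
	--_mtable-regex-X=comment/\*\///{tjump=toplevel}

	# Stop parsing the file at a (hypothetical) end marker.
	--_mtable-regex-X=toplevel/__END__//{tquit}

With alternative 2 the ``comment`` table can only return to ``toplevel``, so it
could not be shared with another table.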
1184b40096fdSHadriel Kaplan
1185b40096fdSHadriel KaplanTo explain the above new flags, we'll continue using our example in the
1186b40096fdSHadriel Kaplannext section.
118701afa120SMasatake YAMATO
118801afa120SMasatake YAMATOSkipping block comments
118901afa120SMasatake YAMATO......................................................................
119001afa120SMasatake YAMATO
1191b40096fdSHadriel KaplanLet's continue with our example. Here is the 3rd version of :file:`X.ctags`:
119201afa120SMasatake YAMATO
1193d14dd918SMasatake YAMATO.. code-block:: ctags
1194a5c14cdaSHiroo HAYASHI	:emphasize-lines: 9-13
1195a5c14cdaSHiroo HAYASHI	:linenos:
119601afa120SMasatake YAMATO
119701afa120SMasatake YAMATO	--langdef=X
119801afa120SMasatake YAMATO	--map-X=.x
119901afa120SMasatake YAMATO	--kinddef-X=v,var,variables
120001afa120SMasatake YAMATO
120101afa120SMasatake YAMATO	--_tabledef-X=toplevel
120201afa120SMasatake YAMATO	--_tabledef-X=comment
120301afa120SMasatake YAMATO	--_tabledef-X=vars
120401afa120SMasatake YAMATO
120501afa120SMasatake YAMATO	--_mtable-regex-X=toplevel/\/\*//{tenter=comment}
120601afa120SMasatake YAMATO	--_mtable-regex-X=toplevel/.//
120701afa120SMasatake YAMATO
120801afa120SMasatake YAMATO	--_mtable-regex-X=comment/\*\///{tleave}
120901afa120SMasatake YAMATO	--_mtable-regex-X=comment/.//
121001afa120SMasatake YAMATO
1211b40096fdSHadriel KaplanFour ``--_mtable-regex-X`` lines are added for skipping the block comments. Let's
1212b40096fdSHadriel Kaplandiscuss them one by one.
121301afa120SMasatake YAMATO
For each new file it scans, ctags always starts with the first pattern of the
first table of the parser. Even if it's an empty table, ctags will only try
the first declared table (in such a case it would immediately fail to match
anything, and thus stop processing the input file and effectively do nothing).
121801afa120SMasatake YAMATO
1219b40096fdSHadriel KaplanThe first declared table (``toplevel``) has the following regex added to
1220b40096fdSHadriel Kaplanit first:
122101afa120SMasatake YAMATO
1222d14dd918SMasatake YAMATO.. code-block:: ctags
1223a5c14cdaSHiroo HAYASHI	:linenos:
1224a5c14cdaSHiroo HAYASHI	:lineno-start: 9
122501afa120SMasatake YAMATO
1226b40096fdSHadriel Kaplan	--_mtable-regex-X=toplevel/\/\*//{tenter=comment}
1227b40096fdSHadriel Kaplan
1228b40096fdSHadriel KaplanA pattern of ``\/\*`` is added to the ``toplevel`` table, to match the
1229b40096fdSHadriel Kaplanbeginning of a block comment. A backslash character is used in front of the
123086bcb5c2SHiroo HAYASHIleading '``/``' to escape the separation character '``/``' that separates the fields
1231b40096fdSHadriel Kaplanof ``--_mtable-regex-<LANG>``. Another backslash inside the pattern is used
123286bcb5c2SHiroo HAYASHIbefore the asterisk '``*``', to make it a literal asterisk character in regex.
1233b40096fdSHadriel Kaplan
The last ``//`` means ctags should not tag something matching this pattern.
In ``--regex-<LANG>`` you never use ``//`` because it would be pointless to
match something and not tag it using a single-line ``--regex-<LANG>``; in
multi-line ``--mline-regex-<LANG>`` you rarely see it, because it would rarely
be useful. But in multi-table regex it's quite common, since you frequently
want to transition from one state to another (i.e., ``tenter`` or ``tjump``
from one table to another).
1241b40096fdSHadriel Kaplan
The long flag added to our first regex of our first table is ``tenter``, which
is a long flag for pushing the current table on the stack and switching to another
table. ``{tenter=comment}`` means "push ``toplevel`` on the stack and switch to
``comment``".
124501afa120SMasatake YAMATO
124686bcb5c2SHiroo HAYASHISo given the input file :file:`input.x` shown earlier, ctags will begin at
1247b40096fdSHadriel Kaplanthe ``toplevel`` table and try to match the first regex. It will succeed, and
1248b40096fdSHadriel Kaplanthus push on the stack and go to the ``comment`` table.
124901afa120SMasatake YAMATO
1250b40096fdSHadriel KaplanIt will begin at the top of the ``comment`` table (it always begins at the top
1251b40096fdSHadriel Kaplanof a given table), and try each regex line in sequence until it finds a match.
1252b40096fdSHadriel KaplanIf it fails to find a match, it will pop the stack and go to the table that was
1253b40096fdSHadriel Kaplanjust popped from the stack, and begin trying to match at the top of *that* table.
1254b40096fdSHadriel KaplanIf it continues failing to find a match, and ultimately reaches the end of the
1255b40096fdSHadriel Kaplanstack, it will stop processing for this file. For the next input file, it will
1256b40096fdSHadriel Kaplanbegin again from the top of the first declared table.
125701afa120SMasatake YAMATO
1258b40096fdSHadriel KaplanGetting back to our example, the top of the ``comment`` table has this regex:
125901afa120SMasatake YAMATO
1260d14dd918SMasatake YAMATO.. code-block:: ctags
1261a5c14cdaSHiroo HAYASHI	:linenos:
1262a5c14cdaSHiroo HAYASHI	:lineno-start: 12
126301afa120SMasatake YAMATO
1264b40096fdSHadriel Kaplan	--_mtable-regex-X=comment/\*\///{tleave}
1265b40096fdSHadriel Kaplan
Similar to the previous ``toplevel`` table pattern, this one for ``\*\/`` uses
a backslash to escape the separator '``/``', as well as one before the '``*``' to
make it a literal asterisk in regex. So what it's looking for, from a simple
string perspective, is the sequence ``*/``. Note that although you see three
slashes ``///`` at the end, the first one is escaped and belongs to the pattern
itself, so the ``--_mtable-regex-X`` option only has ``//`` to
separate the regex pattern from the long flags, instead of the usual ``///``.
Thus it's using the shorthand form of the ``--_mtable-regex-X`` option.
It could instead have been:
1275b40096fdSHadriel Kaplan
1276d14dd918SMasatake YAMATO.. code-block:: ctags
1277b40096fdSHadriel Kaplan
1278b40096fdSHadriel Kaplan	--_mtable-regex-X=comment/\*\////{tleave}
1279b40096fdSHadriel Kaplan
1280b40096fdSHadriel KaplanThe above would have worked exactly the same.
1281b40096fdSHadriel Kaplan
1282b40096fdSHadriel KaplanGetting back to our example, remember we're looking at the :file:`input.x`
1283b40096fdSHadriel Kaplanfile, currently using the ``comment`` table, and trying to match the first
1284b40096fdSHadriel Kaplanregex of that table, shown above, at the following location::
1285b40096fdSHadriel Kaplan
1286b40096fdSHadriel Kaplan	   ,ctags is trying to match starting here
1287b40096fdSHadriel Kaplan	  v
128801afa120SMasatake YAMATO	/* BLOCK COMMENT
128901afa120SMasatake YAMATO	var dont_capture_me;
129001afa120SMasatake YAMATO	*/
129101afa120SMasatake YAMATO	var a /* ANOTHER BLOCK COMMENT */, b;
129201afa120SMasatake YAMATO
1293b40096fdSHadriel KaplanThe pattern doesn't match for the position just after ``/*``, because that
129486bcb5c2SHiroo HAYASHIposition is a space character. So ctags tries the next pattern in the same
1295b40096fdSHadriel Kaplantable:
129601afa120SMasatake YAMATO
1297d14dd918SMasatake YAMATO.. code-block:: ctags
1298a5c14cdaSHiroo HAYASHI	:linenos:
1299a5c14cdaSHiroo HAYASHI	:lineno-start: 13
130001afa120SMasatake YAMATO
1301b40096fdSHadriel Kaplan	--_mtable-regex-X=comment/.//
130201afa120SMasatake YAMATO
This pattern matches any one character, including a newline; the current
position moves one character forward. Now the character at the current position is
'``B``'. The first pattern of the table, which looks for ``*/``, still does not
match the input, so ctags uses the next pattern again. When the current position
reaches the ``*/`` on the 3rd line of :file:`input.x`, it will finally match this:
130801afa120SMasatake YAMATO
1309d14dd918SMasatake YAMATO.. code-block:: ctags
1310a5c14cdaSHiroo HAYASHI	:linenos:
1311a5c14cdaSHiroo HAYASHI	:lineno-start: 12
131201afa120SMasatake YAMATO
1313b40096fdSHadriel Kaplan	--_mtable-regex-X=comment/\*\///{tleave}
131401afa120SMasatake YAMATO
1315b40096fdSHadriel KaplanIn this pattern, the long flag ``{tleave}`` is specified. This triggers table
131686bcb5c2SHiroo HAYASHIswitching again. ``{tleave}`` makes ctags switch the table back to the last
1317b40096fdSHadriel Kaplantable used before doing ``{tenter}``. In this case, ``toplevel`` is the table.
131886bcb5c2SHiroo HAYASHIctags manages a stack where references to tables are put. ``{tenter}`` pushes
1319b40096fdSHadriel Kaplanthe current table to the stack. ``{tleave}`` pops the table at the top of the
1320b40096fdSHadriel Kaplanstack and chooses it.
132101afa120SMasatake YAMATO
132286bcb5c2SHiroo HAYASHISo now ctags is back to the ``toplevel`` table, and tries the first regex
1323b40096fdSHadriel Kaplanof that table, which was this:
1324b40096fdSHadriel Kaplan
1325d14dd918SMasatake YAMATO.. code-block:: ctags
1326a5c14cdaSHiroo HAYASHI	:linenos:
1327a5c14cdaSHiroo HAYASHI	:lineno-start: 9
1328b40096fdSHadriel Kaplan
1329b40096fdSHadriel Kaplan	--_mtable-regex-X=toplevel/\/\*//{tenter=comment}
1330b40096fdSHadriel Kaplan
1331b40096fdSHadriel KaplanIt tries to match that against its current position, which is now the
1332b40096fdSHadriel Kaplannewline on line 3, between the ``*/`` and the word ``var``::
1333b40096fdSHadriel Kaplan
1334b40096fdSHadriel Kaplan	/* BLOCK COMMENT
1335b40096fdSHadriel Kaplan	var dont_capture_me;
	*/ <--- ctags is now at this newline (\n) character
1337b40096fdSHadriel Kaplan	var a /* ANOTHER BLOCK COMMENT */, b;
1338b40096fdSHadriel Kaplan
1339b40096fdSHadriel KaplanThe first regex of the ``toplevel`` table does not match a newline, so it tries
1340b40096fdSHadriel Kaplanthe second regex:
1341b40096fdSHadriel Kaplan
1342d14dd918SMasatake YAMATO.. code-block:: ctags
1343a5c14cdaSHiroo HAYASHI	:linenos:
	:lineno-start: 10
1345b40096fdSHadriel Kaplan
1346b40096fdSHadriel Kaplan	--_mtable-regex-X=toplevel/.//
1347b40096fdSHadriel Kaplan
134886bcb5c2SHiroo HAYASHIThis matches a newline successfully, but has no actions to perform. So ctags
1349b40096fdSHadriel Kaplanmoves one character forward (the newline it just matched), and goes back to the
1350b40096fdSHadriel Kaplantop of the ``toplevel`` table, and tries the first regex again. Eventually we'll
1351b40096fdSHadriel Kaplanreach the beginning of the second block comment, and do the same things as before.
1352b40096fdSHadriel Kaplan
135386bcb5c2SHiroo HAYASHIWhen ctags finally reaches the end of the file (the position after ``b;``),
1354b40096fdSHadriel Kaplanit will not be able to match either the first or second regex of the
1355b40096fdSHadriel Kaplan``toplevel`` table, and quit processing the input file.
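
If you want to check this yourself, running the 3rd version of :file:`X.ctags`
on :file:`input.x` should print no tags at all, since none of its patterns
specifies a tag name or kind:

.. code-block:: console

	$ u-ctags -o - --options=X.ctags input.x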
1356b40096fdSHadriel Kaplan
1357b40096fdSHadriel KaplanSo far, we've successfully skipped over block comments for our new ``X``
135886bcb5c2SHiroo HAYASHIlanguage, but haven't generated any tags. The point of ctags is to generate
1359b40096fdSHadriel Kaplantags, not just keep your computer warm. So now let's move onto actually tagging
1360b40096fdSHadriel Kaplanvariables...
136101afa120SMasatake YAMATO
136201afa120SMasatake YAMATO
136301afa120SMasatake YAMATOCapturing variables in a sequence
136401afa120SMasatake YAMATO......................................................................
136501afa120SMasatake YAMATO
1366b40096fdSHadriel KaplanHere is the 4th version of :file:`X.ctags`:
136701afa120SMasatake YAMATO
1368d14dd918SMasatake YAMATO.. code-block:: ctags
1369a5c14cdaSHiroo HAYASHI	:emphasize-lines: 10,16-19
1370a5c14cdaSHiroo HAYASHI	:linenos:
137101afa120SMasatake YAMATO
137201afa120SMasatake YAMATO	--langdef=X
137301afa120SMasatake YAMATO	--map-X=.x
137401afa120SMasatake YAMATO	--kinddef-X=v,var,variables
137501afa120SMasatake YAMATO
137601afa120SMasatake YAMATO	--_tabledef-X=toplevel
137701afa120SMasatake YAMATO	--_tabledef-X=comment
137801afa120SMasatake YAMATO	--_tabledef-X=vars
137901afa120SMasatake YAMATO
138001afa120SMasatake YAMATO	--_mtable-regex-X=toplevel/\/\*//{tenter=comment}
138101afa120SMasatake YAMATO	--_mtable-regex-X=toplevel/var[ \n\t]//{tenter=vars}
138201afa120SMasatake YAMATO	--_mtable-regex-X=toplevel/.//
138301afa120SMasatake YAMATO
138401afa120SMasatake YAMATO	--_mtable-regex-X=comment/\*\///{tleave}
138501afa120SMasatake YAMATO	--_mtable-regex-X=comment/.//
138601afa120SMasatake YAMATO
138701afa120SMasatake YAMATO	--_mtable-regex-X=vars/;//{tleave}
138801afa120SMasatake YAMATO	--_mtable-regex-X=vars/\/\*//{tenter=comment}
138901afa120SMasatake YAMATO	--_mtable-regex-X=vars/([a-zA-Z][a-zA-Z0-9]*)/\1/v/
139001afa120SMasatake YAMATO	--_mtable-regex-X=vars/.//
139101afa120SMasatake YAMATO
1392b40096fdSHadriel KaplanOne pattern in ``toplevel`` was added, and a new table ``vars`` with four
1393b40096fdSHadriel Kaplanpatterns was also added.
139401afa120SMasatake YAMATO
1395b40096fdSHadriel KaplanThe new regex in ``toplevel`` is this:
139601afa120SMasatake YAMATO
1397d14dd918SMasatake YAMATO.. code-block:: ctags
1398a5c14cdaSHiroo HAYASHI	:linenos:
1399a5c14cdaSHiroo HAYASHI	:lineno-start: 10
140001afa120SMasatake YAMATO
1401b40096fdSHadriel Kaplan	--_mtable-regex-X=toplevel/var[ \n\t]//{tenter=vars}
140201afa120SMasatake YAMATO
1403b40096fdSHadriel KaplanThe purpose of this being in `toplevel` is to switch to the `vars` table when
1404b40096fdSHadriel Kaplanthe keyword ``var`` is found in the input stream. We need to switch states
1405b40096fdSHadriel Kaplan(i.e., tables) because we can't simply capture the variables ``a`` and ``b``
1406b40096fdSHadriel Kaplanwith a single regex pattern in the ``toplevel`` table, because there might be
1407b40096fdSHadriel Kaplanblock comments inside the ``var`` statement (as there are in our
1408b40096fdSHadriel Kaplan:file:`input.x`), and we also need to create *two* tags: one for ``a`` and one
1409b40096fdSHadriel Kaplanfor ``b``, even though the word ``var`` only appears once. In other words, we
1410b40096fdSHadriel Kaplanneed to "remember" that we saw the keyword ``var``, when we later encounter the
1411b40096fdSHadriel Kaplannames ``a`` and ``b``, so that we know to tag each of them; and saving that
1412b40096fdSHadriel Kaplan"in-variable-statement" state is accomplished by switching tables to the
1413b40096fdSHadriel Kaplan``vars`` table.
141401afa120SMasatake YAMATO
1415b40096fdSHadriel KaplanThe first regex in our new ``vars`` table is:
141601afa120SMasatake YAMATO
1417d14dd918SMasatake YAMATO.. code-block:: ctags
1418a5c14cdaSHiroo HAYASHI	:linenos:
1419a5c14cdaSHiroo HAYASHI	:lineno-start: 16
142001afa120SMasatake YAMATO
1421b40096fdSHadriel Kaplan	--_mtable-regex-X=vars/;//{tleave}
1422b40096fdSHadriel Kaplan
This pattern matches a single semicolon '``;``' and, if it matches, pops
back to the ``toplevel`` table using the ``{tleave}`` long flag. We
didn't have to make this the first regex pattern, because it doesn't overlap
with any of the other patterns except the last one, ``/.//`` (which must be
last for this example to work).
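
For instance, the following ordering of the ``vars`` table should behave the
same for :file:`input.x`. It is shown only to illustrate which orderings are
interchangeable:

.. code-block:: ctags

	--_mtable-regex-X=vars/\/\*//{tenter=comment}
	--_mtable-regex-X=vars/([a-zA-Z][a-zA-Z0-9]*)/\1/v/
	--_mtable-regex-X=vars/;//{tleave}
	--_mtable-regex-X=vars/.//

Moving ``vars/.//`` above any other line would break the parser, because ``.``
would consume every character before the more specific patterns get a chance
to match.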
1428b40096fdSHadriel Kaplan
1429b40096fdSHadriel KaplanThe second regex in our ``vars`` table is:
1430b40096fdSHadriel Kaplan
1431d14dd918SMasatake YAMATO.. code-block:: ctags
1432a5c14cdaSHiroo HAYASHI	:linenos:
1433a5c14cdaSHiroo HAYASHI	:lineno-start: 17
1434b40096fdSHadriel Kaplan
1435b40096fdSHadriel Kaplan	--_mtable-regex-X=vars/\/\*//{tenter=comment}
1436b40096fdSHadriel Kaplan
1437b40096fdSHadriel KaplanWe need this because block comments can be in variable definitions::
143801afa120SMasatake YAMATO
143901afa120SMasatake YAMATO   var a /* ANOTHER BLOCK COMMENT */, b;
144001afa120SMasatake YAMATO
1441b40096fdSHadriel KaplanSo to skip block comments in such a position, the pattern ``\/\*`` is used just
1442b40096fdSHadriel Kaplanlike it was used in the ``toplevel`` table: to find the literal ``/*`` beginning
1443b40096fdSHadriel Kaplanof the block comment and enter the ``comment`` table. Because we're using
1444b40096fdSHadriel Kaplan``{tenter}`` and ``{tleave}`` to push/pop from a stack of tables, we can
1445b40096fdSHadriel Kaplanuse the same ``comment`` table for both ``toplevel`` and ``vars`` to go to,
144686bcb5c2SHiroo HAYASHIbecause ctags will *remember* the previous table and ``{tleave}`` will
1447b40096fdSHadriel Kaplanpop back to the right one.
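
The duplicated comment-start pattern could also be shared through
``--_mtable-extend-<LANG>``. The following is an untested sketch of the table
part of :file:`X.ctags`: the table name ``skipComment`` is made up for this
example, and it assumes that the option takes a ``<DEST>+<SRC>`` argument and
that the copied patterns are appended at the point where the option appears:

.. code-block:: ctags

	--_tabledef-X=toplevel
	--_tabledef-X=comment
	--_tabledef-X=vars
	--_tabledef-X=skipComment

	--_mtable-regex-X=skipComment/\/\*//{tenter=comment}

	--_mtable-extend-X=toplevel+skipComment
	--_mtable-regex-X=toplevel/var[ \n\t]//{tenter=vars}
	--_mtable-regex-X=toplevel/.//

	--_mtable-regex-X=comment/\*\///{tleave}
	--_mtable-regex-X=comment/.//

	--_mtable-regex-X=vars/;//{tleave}
	--_mtable-extend-X=vars+skipComment
	--_mtable-regex-X=vars/([a-zA-Z][a-zA-Z0-9]*)/\1/v/
	--_mtable-regex-X=vars/.//

``skipComment`` is declared after ``toplevel`` so that ``toplevel`` remains the
first declared table, and therefore the starting table.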
144801afa120SMasatake YAMATO
1449b40096fdSHadriel KaplanThe third regex in our ``vars`` table is:
145001afa120SMasatake YAMATO
1451d14dd918SMasatake YAMATO.. code-block:: ctags
1452a5c14cdaSHiroo HAYASHI	:linenos:
1453a5c14cdaSHiroo HAYASHI	:lineno-start: 18
145401afa120SMasatake YAMATO
1455b40096fdSHadriel Kaplan	--_mtable-regex-X=vars/([a-zA-Z][a-zA-Z0-9]*)/\1/v/
145601afa120SMasatake YAMATO
This is nothing special, but is the one that actually tags something: it
captures the variable name and uses it to generate a tag of the ``var`` kind
(letter ``v``).
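
Note that the character class in this pattern has no underscore, so a statement
such as ``var my_var;`` would presumably produce two tags, ``my`` and ``var``,
with the ``_`` consumed by the catch-all pattern. If identifiers in `X` may
contain underscores (an assumption; the imaginary language does not say), the
pattern could be widened like this:

.. code-block:: ctags

	--_mtable-regex-X=vars/([a-zA-Z_][a-zA-Z0-9_]*)/\1/v/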
1460b40096fdSHadriel Kaplan
1461b40096fdSHadriel KaplanThe last regex in the ``vars`` table we've seen before:
1462b40096fdSHadriel Kaplan
1463d14dd918SMasatake YAMATO.. code-block:: ctags
1464a5c14cdaSHiroo HAYASHI	:linenos:
1465a5c14cdaSHiroo HAYASHI	:lineno-start: 19
1466b40096fdSHadriel Kaplan
1467b40096fdSHadriel Kaplan	--_mtable-regex-X=vars/.//
1468b40096fdSHadriel Kaplan
146986bcb5c2SHiroo HAYASHIThis makes ctags ignore any other characters, such as whitespace or the
147086bcb5c2SHiroo HAYASHIcomma '``,``'.
147101afa120SMasatake YAMATO
147201afa120SMasatake YAMATO
1473b40096fdSHadriel KaplanRunning our example
147401afa120SMasatake YAMATO......................................................................
147501afa120SMasatake YAMATO
147601afa120SMasatake YAMATO.. code-block:: console
147701afa120SMasatake YAMATO
147801afa120SMasatake YAMATO	$ cat input.x
147901afa120SMasatake YAMATO	/* BLOCK COMMENT
148001afa120SMasatake YAMATO	var dont_capture_me;
148101afa120SMasatake YAMATO	*/
148201afa120SMasatake YAMATO	var a /* ANOTHER BLOCK COMMENT */, b;
148301afa120SMasatake YAMATO
148401afa120SMasatake YAMATO	$ u-ctags -o - --fields=+n --options=X.ctags input.x
148601afa120SMasatake YAMATO	a	input.x	/^var a \/* ANOTHER BLOCK COMMENT *\/, b;$/;"	v	line:4
148701afa120SMasatake YAMATO	b	input.x	/^var a \/* ANOTHER BLOCK COMMENT *\/, b;$/;"	v	line:4
148801afa120SMasatake YAMATO
1489b40096fdSHadriel KaplanIt works!
149001afa120SMasatake YAMATO
You can find additional examples of multi-table regex in our GitHub repo, under
the ``optlib`` directory. For example, ``puppetManifest.ctags`` is a serious,
real-world example. It is the primary parser for testing multi-table regex
parsers, and is used in the actual ctags program for parsing Puppet manifest files.
149501afa120SMasatake YAMATO
149601afa120SMasatake YAMATO
14973f73955fSMasatake YAMATO.. _guest-regex-flag:
14983f73955fSMasatake YAMATO
1499b45a42b3SHiroo HAYASHIScheduling a guest parser with ``_guest`` regex flag
15003f73955fSMasatake YAMATO~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
15013f73955fSMasatake YAMATO.. NOT REVIEWED YET
15023f73955fSMasatake YAMATO
With the ``_guest`` regex flag, you can run a parser (a guest parser) on an
area of the current input file.
See ":ref:`host-guest-parsers`" for the concept of the guest parser.
15063f73955fSMasatake YAMATO
15073cd8570eSHiroo HAYASHIThe ``_guest`` regex flag specifies a *guest spec*, and attaches it to
15083f73955fSMasatake YAMATOthe associated regex pattern.
15093f73955fSMasatake YAMATO
A guest spec has three fields: *<PARSER>*, *<START>* of area, and *<END>* of area.
The ``_guest`` regex flag has the following form::
15123f73955fSMasatake YAMATO
151386bcb5c2SHiroo HAYASHI  {_guest=<PARSER>,<START>,<END>}
15143f73955fSMasatake YAMATO
ctags maintains a data structure called a *guest request* during parsing. A
guest request also has three fields: `parser`, `start of area`, and
`end of area`.
15183f73955fSMasatake YAMATO
You, a parser developer, have to fill the fields of guest specs.
ctags consults the guest spec when the regex pattern associated with it
matches, tries to fill the fields of the guest request,
and runs a guest parser when all the fields of the guest request are
filled.
15243f73955fSMasatake YAMATO
If you use `Multi-line pattern match`_ to define a host parser,
you must specify all the fields of the `guest request`.

On the other hand, if you don't use `Multi-line pattern match`_ to define a host parser,
ctags can fill the fields of the `guest request` incrementally; more than
one guest spec can be used to fill the fields. In other words, you can
leave some of the fields of a guest spec empty.
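
The two styles can be sketched with multi-table regex as follows. Everything
here is illustrative and untested: the language names ``MyHTML`` and
``MyMarkdown`` and the table names are hypothetical, the table declarations and
the catch-all patterns of ``main`` are omitted, and the ``<N>start``/``<N>end``
notation for *<START>* and *<END>* is assumed to mean the start or end
position of regex group ``N``:

.. code-block:: ctags

	# One guest spec fills all three fields at once.
	--_mtable-regex-MyHTML=main/<script>([^<]*)<\/script>//{_guest=JavaScript,1start,1end}

	# Two guest specs fill one guest request incrementally:
	# the first gives the parser and the start, the second gives the end.
	--_mtable-regex-MyMarkdown=main/~~~([a-zA-Z0-9]+)[\n]//{tenter=codeblock}{_guest=\1,0end,}
	--_mtable-regex-MyMarkdown=codeblock/~~~//{tleave}{_guest=,,0start}
	--_mtable-regex-MyMarkdown=codeblock/.//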
15323f73955fSMasatake YAMATO
153386bcb5c2SHiroo HAYASHIThe *<PARSER>* field of ``_guest`` regex flag
15343f73955fSMasatake YAMATO......................................................................
153586bcb5c2SHiroo HAYASHIFor *<PARSER>*, you can specify one of the following items:
15363f73955fSMasatake YAMATO
15373f73955fSMasatake YAMATOa name of a parser
15383f73955fSMasatake YAMATO
15393f73955fSMasatake YAMATO	If you know the guest parser you want to run before parsing
1540*6024deefSMasatake YAMATO	the input file, specify the name of the parser. Aliases of parsers
1541*6024deefSMasatake YAMATO	are also considered when finding a parser for the name.
15423f73955fSMasatake YAMATO
15433f73955fSMasatake YAMATO	An example of running C parser as a guest parser::
15443f73955fSMasatake YAMATO
15453f73955fSMasatake YAMATO		{_guest=C,...
15463f73955fSMasatake YAMATO
the group number of a regex pattern prefixed with '``\``' (backslash)
15483f73955fSMasatake YAMATO
	If a parser name appears in an input file, write a regex pattern
	to capture the name. Specify the number of the group that captures
	the name as the parser. In that case, use '``\``' as the prefix for
	the number. Aliases of parsers are also considered when finding
	a parser for the name.
15543f73955fSMasatake YAMATO
	Let's see an example. GitHub Flavored Markdown (GFM) is a language for
	documentation. It provides a notation for quoting a snippet of
	program code; the language treats the area between ``~~~`` and
	``~~~`` as a snippet. You can specify the programming language of
	the snippet by starting the area with
	``~~~<THE_NAME_OF_LANGUAGE>``, like ``~~~C`` or ``~~~Java``.
15613f73955fSMasatake YAMATO
15623f73955fSMasatake YAMATO	To run a guest parser on the area, you have to capture the
1563a5c14cdaSHiroo HAYASHI	*<THE_NAME_OF_LANGUAGE>* with a regex pattern:
1564a5c14cdaSHiroo HAYASHI
1565a5c14cdaSHiroo HAYASHI	.. code-block:: ctags
15663f73955fSMasatake YAMATO
15673f73955fSMasatake YAMATO		--_mtable-regex-Markdown=main/~~~([a-zA-Z0-9][-#+a-zA-Z0-9]*)[\n]//{_guest=\1,0end,}
15683f73955fSMasatake YAMATO
	The pattern captures the language name in the input file with
	regex group 1, and the flag specifies it as *<PARSER>*::

		{_guest=\1,...
15733f73955fSMasatake YAMATO
the group number of a regex pattern prefixed with '``*``' (asterisk)
15753f73955fSMasatake YAMATO
	If a file name implying a programming language appears in an input
	file, capture the file name with the regex pattern that the guest
	spec is attached to. ctags tries to find a proper parser for the
	file name by consulting the langmap.

	Use '``*``' as the prefix to the number that specifies the group of
	the regex pattern capturing the file name.
15833f73955fSMasatake YAMATO
	Let's see an example. Suppose you have a shell script that emits
	program code instantiated from one of several templates. Here documents
	are used to represent the templates like:
15873f73955fSMasatake YAMATO
15883f73955fSMasatake YAMATO	.. code-block:: sh
15893f73955fSMasatake YAMATO
15903f73955fSMasatake YAMATO		i=...
15913f73955fSMasatake YAMATO		cat > foo.c <<EOF
15923f73955fSMasatake YAMATO			int main (void) { return $i; }
15933f73955fSMasatake YAMATO		EOF
15943f73955fSMasatake YAMATO
15953f73955fSMasatake YAMATO		cat > foo.el <<EOF
15963f73955fSMasatake YAMATO			(defun foo () (1+ $i))
15973f73955fSMasatake YAMATO		EOF
15983f73955fSMasatake YAMATO
15993f73955fSMasatake YAMATO	To run guest parsers for the here document areas, the shell
16003f73955fSMasatake YAMATO	script parser of ctags must choose the parsers from the file
1601a5c14cdaSHiroo HAYASHI	names (``foo.c`` and ``foo.el``):
1602a5c14cdaSHiroo HAYASHI
1603a5c14cdaSHiroo HAYASHI	.. code-block:: ctags
16043f73955fSMasatake YAMATO
16053f73955fSMasatake YAMATO		--regex-sh=/cat > ([a-z.]+) <<EOF//{_guest=*1,0end,}
16063f73955fSMasatake YAMATO
	The pattern captures the file name in the input file with
	regex group 1, and the flag specifies it as *<PARSER>*::
16093f73955fSMasatake YAMATO
16103f73955fSMasatake YAMATO	   {_guest=*1,...
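
The three forms of *<PARSER>* can be summarized with a small Python model.
This is only an illustrative sketch, not how ctags is implemented: the
``PARSERS``, ``ALIASES``, and ``LANGMAP`` tables and the ``find_by_name`` and
``resolve_parser`` functions are hypothetical stand-ins for the parser table,
the aliases, and the langmap that ctags consults internally.

.. code-block:: python

    import os
    import re

    # Hypothetical tables standing in for ctags' internal data.
    PARSERS = {"C", "Java", "EmacsLisp", "Sh"}
    ALIASES = {"elisp": "EmacsLisp"}
    LANGMAP = {".c": "C", ".java": "Java", ".el": "EmacsLisp", ".sh": "Sh"}


    def find_by_name(name):
        """Look up a parser by name, falling back to the alias table."""
        if name in PARSERS:
            return name
        return ALIASES.get(name)


    def resolve_parser(spec, match):
        """Model the three forms of <PARSER>: NAME, \\N, and *N."""
        if spec.startswith("\\"):
            # \N: group N holds a parser name found in the input.
            return find_by_name(match.group(int(spec[1:])))
        if spec.startswith("*"):
            # *N: group N holds a file name; look its extension up in the langmap.
            _, ext = os.path.splitext(match.group(int(spec[1:])))
            return LANGMAP.get(ext)
        # A plain name is known before the input is parsed.
        return find_by_name(spec)


    fence = re.match(r"~~~([a-zA-Z0-9][-#+a-zA-Z0-9]*)\n", "~~~Java\n")
    heredoc = re.match(r"cat > ([a-z.]+) <<EOF", "cat > foo.el <<EOF")

    print(resolve_parser("C", fence))      # C
    print(resolve_parser("\\1", fence))    # Java
    print(resolve_parser("*1", heredoc))   # EmacsLisp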
16113f73955fSMasatake YAMATO
The *<START>* and *<END>* fields of ``_guest`` regex flag
16133f73955fSMasatake YAMATO......................................................................
16143f73955fSMasatake YAMATO
161586bcb5c2SHiroo HAYASHIThe *<START>* and *<END>* fields specify the area the *<PARSER>* parses.  *<START>*
161686bcb5c2SHiroo HAYASHIspecifies the start of the area. *<END>* specifies the end of the area.
16173f73955fSMasatake YAMATO
The forms of the two fields are the same: a regex group number
followed by ``start`` or ``end``, e.g. ``3start`` or ``0end``.  The suffixes,
``start`` and ``end``, represent the two boundaries of the group.
16213f73955fSMasatake YAMATO
162286bcb5c2SHiroo HAYASHILet's see an example::
16233f73955fSMasatake YAMATO
16243f73955fSMasatake YAMATO	{_guest=C,2end,3start}
16253f73955fSMasatake YAMATO
This guest regex flag means running the C parser on the area between
``2end`` and ``3start``. ``2end`` means the area starts at the end of
the match of the 2nd regex group associated with the flag. ``3start``
means the area ends at the beginning of the match of the 3rd regex
group associated with the flag.
16313f73955fSMasatake YAMATO
Let's see a more realistic example.
Here is an optlib file for an imaginary language `single`:
16343f73955fSMasatake YAMATO
1635d14dd918SMasatake YAMATO.. code-block:: ctags
1636a5c14cdaSHiroo HAYASHI	:emphasize-lines: 3
1637d14dd918SMasatake YAMATO
16383f73955fSMasatake YAMATO	--langdef=single
16393f73955fSMasatake YAMATO	--map-single=.single
16403f73955fSMasatake YAMATO	--regex-single=/^(BEGIN_C<).*(>END_C)$//{_guest=C,1end,2start}
16413f73955fSMasatake YAMATO
This parser can run the C parser and extract the ``main`` function from the
following input file::
16443f73955fSMasatake YAMATO
16453f73955fSMasatake YAMATO	BEGIN_C<int main (int argc, char **argv) { return 0; }>END_C
16463f73955fSMasatake YAMATO	        ^                                             ^
16473f73955fSMasatake YAMATO	         `- "1end" points here.                       |
16483f73955fSMasatake YAMATO	                               "2start" points here. -+
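
The mapping from ``1end`` and ``2start`` to offsets in the input can be
modelled in Python. This is a sketch under assumptions, not the ctags
implementation: it uses Python's ``re`` module in place of the regex engine
of ctags, and the ``guest_area`` function is a hypothetical helper introduced
only for illustration.

.. code-block:: python

    import re

    # The pattern from the "single" optlib example above.
    PATTERN = re.compile(r"^(BEGIN_C<).*(>END_C)$")


    def guest_area(line, start_spec, end_spec):
        """Return the text selected by specs such as '1end' and '2start'."""
        match = PATTERN.match(line)
        if match is None:
            return None

        def offset(spec):
            # Split e.g. "2start" into the group number and the boundary.
            number, boundary = re.fullmatch(r"(\d+)(start|end)", spec).groups()
            group = int(number)
            return match.start(group) if boundary == "start" else match.end(group)

        return line[offset(start_spec):offset(end_spec)]


    line = "BEGIN_C<int main (int argc, char **argv) { return 0; }>END_C"
    print(guest_area(line, "1end", "2start"))
    # int main (int argc, char **argv) { return 0; }

Group 0 stands for the whole match, so in this model ``0start`` and ``0end``
select the entire matched line.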
16493f73955fSMasatake YAMATO
1650b45a42b3SHiroo HAYASHI.. NOT REVIEWED YET
16510f3a04d2SHiroo HAYASHI
16520f3a04d2SHiroo HAYASHI.. _defining-subparsers:
16530f3a04d2SHiroo HAYASHI
16540f3a04d2SHiroo HAYASHIDefining a subparser
16550f3a04d2SHiroo HAYASHI~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
16560f3a04d2SHiroo HAYASHI
16570f3a04d2SHiroo HAYASHIBasic
16580f3a04d2SHiroo HAYASHI.........................................................................
16590f3a04d2SHiroo HAYASHI
For the concept of subparsers, see ":ref:`base-sub-parsers`".
16610f3a04d2SHiroo HAYASHI
The ``--langdef=<LANG>`` option is extended as
``--langdef=<LANG>[{base=<LANG>}[{shared|dedicated|bidirectional}]][{_autoFQTag}]`` to define
a subparser for a specified base parser. Combining it with the ``--kinddef-<LANG>``
and ``--regex-<LANG>`` options, you can extend an existing parser
without the risk of kind conflicts.
16670f3a04d2SHiroo HAYASHI
16680f3a04d2SHiroo HAYASHILet's see an example.
16690f3a04d2SHiroo HAYASHI
16700f3a04d2SHiroo HAYASHIinput.c
16710f3a04d2SHiroo HAYASHI
16720f3a04d2SHiroo HAYASHI.. code-block:: C
16730f3a04d2SHiroo HAYASHI
16740f3a04d2SHiroo HAYASHI    static int set_one_prio(struct task_struct *p, int niceval, int error)
16750f3a04d2SHiroo HAYASHI    {
16760f3a04d2SHiroo HAYASHI    }
16770f3a04d2SHiroo HAYASHI
16780f3a04d2SHiroo HAYASHI    SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
16790f3a04d2SHiroo HAYASHI    {
16800f3a04d2SHiroo HAYASHI	    ...;
16810f3a04d2SHiroo HAYASHI    }
16820f3a04d2SHiroo HAYASHI
16830f3a04d2SHiroo HAYASHI.. code-block:: console
16840f3a04d2SHiroo HAYASHI
168545e335abSHiroo HAYASHI    $ ctags  -x --_xformat="%20N %10K %10l"  -o - input.c
16860f3a04d2SHiroo HAYASHI	    set_one_prio   function          C
16870f3a04d2SHiroo HAYASHI	 SYSCALL_DEFINE3   function          C
16880f3a04d2SHiroo HAYASHI
The C parser doesn't understand that ``SYSCALL_DEFINE3`` is a macro for defining an
entry point for a system call.
16910f3a04d2SHiroo HAYASHI
Let's define a `linux` subparser that uses the C parser as its base parser (``linux.ctags``):
16930f3a04d2SHiroo HAYASHI
1694a5c14cdaSHiroo HAYASHI.. code-block:: ctags
1695a5c14cdaSHiroo HAYASHI	:emphasize-lines: 1,3
16960f3a04d2SHiroo HAYASHI
16970f3a04d2SHiroo HAYASHI	--langdef=linux{base=C}
16980f3a04d2SHiroo HAYASHI	--kinddef-linux=s,syscall,system calls
16990f3a04d2SHiroo HAYASHI	--regex-linux=/SYSCALL_DEFINE[0-9]\(([^, )]+)[\),]*/\1/s/
17000f3a04d2SHiroo HAYASHI
The output changes as follows with the `linux` parser:
17020f3a04d2SHiroo HAYASHI
17030f3a04d2SHiroo HAYASHI.. code-block:: console
1704a5c14cdaSHiroo HAYASHI	:emphasize-lines: 2
17050f3a04d2SHiroo HAYASHI
170645e335abSHiroo HAYASHI	$ ctags --options=./linux.ctags -x --_xformat="%20N %10K %10l"  -o - input.c
17070f3a04d2SHiroo HAYASHI		 setpriority    syscall      linux
17080f3a04d2SHiroo HAYASHI		set_one_prio   function          C
17090f3a04d2SHiroo HAYASHI	     SYSCALL_DEFINE3   function          C
17100f3a04d2SHiroo HAYASHI
171186bcb5c2SHiroo HAYASHI``setpriority`` is recognized as a ``syscall`` of `linux`.
17120f3a04d2SHiroo HAYASHI
You can capture ``setpriority`` using only ``--regex-C=...``.
However, there are concerns about kind conflicts; when introducing
a new kind with ``--regex-C=...``, you cannot use a letter or name already
used in the C parser or in ``--regex-C=...`` options specified elsewhere.
17170f3a04d2SHiroo HAYASHI
You can use a newly defined subparser as a new namespace of kinds.
In addition, you can enable or disable the subparser with the
``--languages=[+|-]`` option:
17210f3a04d2SHiroo HAYASHI
.. code-block:: console
17230f3a04d2SHiroo HAYASHI
172445e335abSHiroo HAYASHI    $ ctags --options=./linux.ctags --languages=-linux -x --_xformat="%20N %10K %10l"  -o - input.c
17250f3a04d2SHiroo HAYASHI	    set_one_prio   function          C
17260f3a04d2SHiroo HAYASHI	 SYSCALL_DEFINE3   function          C
17270f3a04d2SHiroo HAYASHI
1728eb56edb2SHiroo HAYASHI.. _optlib_directions:
1729eb56edb2SHiroo HAYASHI
1730eb56edb2SHiroo HAYASHIDirection flags
17310f3a04d2SHiroo HAYASHI.........................................................................
17320f3a04d2SHiroo HAYASHI
1733755aeae5SMasatake YAMATO.. TESTCASE: Units/flags-langdef-directions.r
1734755aeae5SMasatake YAMATO
As explained in ":ref:`multiple_parsers_directions`" in
":ref:`multiple_parsers`", you can use direction flags to choose the
direction(s) in which a base parser and a subparser work together.
17380f3a04d2SHiroo HAYASHI
The following examples are taken from `#1409
<https://github.com/universal-ctags/ctags/issues/1409>`_ submitted by @sgraham to
the Universal Ctags repository on GitHub.
17420f3a04d2SHiroo HAYASHI
174386bcb5c2SHiroo HAYASHI``input.cc`` and ``input.mojom`` are input files, and have the same
17440f3a04d2SHiroo HAYASHIcontents::
17450f3a04d2SHiroo HAYASHI
	 ABC();
17470f3a04d2SHiroo HAYASHI	int main(void)
17480f3a04d2SHiroo HAYASHI	{
17490f3a04d2SHiroo HAYASHI	}
17500f3a04d2SHiroo HAYASHI
The C++ parser can capture ``main`` as a function. The `Mojom` subparser defined
later runs on the C++ parser and captures ``ABC``.
17530f3a04d2SHiroo HAYASHI
17540f3a04d2SHiroo HAYASHIshared combination
1755a60d2470SHiroo HAYASHI^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When ``{shared}`` is specified, for ``input.cc``, tags captured by both the C++ parser
and the mojom parser are recorded to the tags file. For ``input.mojom``, only
tags captured by the mojom parser are recorded to the tags file.
17590f3a04d2SHiroo HAYASHI
17600f3a04d2SHiroo HAYASHImojom-shared.ctags:
17610f3a04d2SHiroo HAYASHI
17620f3a04d2SHiroo HAYASHI.. code-block:: ctags
1763a5c14cdaSHiroo HAYASHI	:emphasize-lines: 1
17640f3a04d2SHiroo HAYASHI
17650f3a04d2SHiroo HAYASHI	--langdef=mojom{base=C++}{shared}
17660f3a04d2SHiroo HAYASHI	--map-mojom=+.mojom
17670f3a04d2SHiroo HAYASHI	--kinddef-mojom=f,function,functions
17680f3a04d2SHiroo HAYASHI	--regex-mojom=/^[ ]+([a-zA-Z]+)\(/\1/f/
17690f3a04d2SHiroo HAYASHI
1770a5c14cdaSHiroo HAYASHI.. code-block:: ctags
1771a5c14cdaSHiroo HAYASHI	:emphasize-lines: 2
17720f3a04d2SHiroo HAYASHI
1773a5c14cdaSHiroo HAYASHI	$ ctags --options=mojom-shared.ctags --fields=+l -o - input.cc
17740f3a04d2SHiroo HAYASHI	ABC	input.cc	/^ ABC();$/;"	f	language:mojom
17750f3a04d2SHiroo HAYASHI	main	input.cc	/^int main(void)$/;"	f	language:C++	typeref:typename:int
17760f3a04d2SHiroo HAYASHI
1777a5c14cdaSHiroo HAYASHI.. code-block:: ctags
1778a5c14cdaSHiroo HAYASHI	:emphasize-lines: 2
17790f3a04d2SHiroo HAYASHI
1780a5c14cdaSHiroo HAYASHI	$ ctags --options=mojom-shared.ctags --fields=+l -o - input.mojom
17810f3a04d2SHiroo HAYASHI	ABC	input.mojom	/^ ABC();$/;"	f	language:mojom
17820f3a04d2SHiroo HAYASHI
The mojom parser uses the C++ parser internally, but tags captured by the C++ parser are
dropped from the output.
17850f3a04d2SHiroo HAYASHI
17860f3a04d2SHiroo HAYASHIdedicated combination
1787a60d2470SHiroo HAYASHI^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When ``{dedicated}`` is specified, for ``input.cc``, only tags captured by the C++
parser are recorded to the tags file. For ``input.mojom``, tags captured
by both the C++ parser and the mojom parser are recorded to the tags file.
17910f3a04d2SHiroo HAYASHI
17920f3a04d2SHiroo HAYASHImojom-dedicated.ctags:
17930f3a04d2SHiroo HAYASHI
17940f3a04d2SHiroo HAYASHI.. code-block:: ctags
1795a5c14cdaSHiroo HAYASHI	:emphasize-lines: 1
17960f3a04d2SHiroo HAYASHI
17970f3a04d2SHiroo HAYASHI	--langdef=mojom{base=C++}{dedicated}
17980f3a04d2SHiroo HAYASHI	--map-mojom=+.mojom
17990f3a04d2SHiroo HAYASHI	--kinddef-mojom=f,function,functions
18000f3a04d2SHiroo HAYASHI	--regex-mojom=/^[ ]+([a-zA-Z]+)\(/\1/f/
18010f3a04d2SHiroo HAYASHI
1802a5c14cdaSHiroo HAYASHI.. code-block:: ctags
18030f3a04d2SHiroo HAYASHI
1804a5c14cdaSHiroo HAYASHI	$ ctags --options=mojom-dedicated.ctags --fields=+l -o - input.cc
18050f3a04d2SHiroo HAYASHI	main	input.cc	/^int main(void)$/;"	f	language:C++	typeref:typename:int
18060f3a04d2SHiroo HAYASHI
1807a5c14cdaSHiroo HAYASHI.. code-block:: ctags
1808a5c14cdaSHiroo HAYASHI	:emphasize-lines: 2-3
18090f3a04d2SHiroo HAYASHI
1810a5c14cdaSHiroo HAYASHI	$ ctags --options=mojom-dedicated.ctags --fields=+l -o - input.mojom
18110f3a04d2SHiroo HAYASHI	ABC	input.mojom	/^ ABC();$/;"	f	language:mojom
18120f3a04d2SHiroo HAYASHI	main	input.mojom	/^int main(void)$/;"	f	language:C++	typeref:typename:int
18130f3a04d2SHiroo HAYASHI
The mojom parser works only when a ``.mojom`` file is given as input.
18150f3a04d2SHiroo HAYASHI
18160f3a04d2SHiroo HAYASHIbidirectional combination
1817a60d2470SHiroo HAYASHI^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When ``{bidirectional}`` is specified, tags captured by both the C++ parser and
the mojom parser are recorded to the tags file for both ``input.cc`` and
``input.mojom``.
18210f3a04d2SHiroo HAYASHI
18220f3a04d2SHiroo HAYASHImojom-bidirectional.ctags:
18230f3a04d2SHiroo HAYASHI
18240f3a04d2SHiroo HAYASHI.. code-block:: ctags
1825a5c14cdaSHiroo HAYASHI	:emphasize-lines: 1
18260f3a04d2SHiroo HAYASHI
18270f3a04d2SHiroo HAYASHI	--langdef=mojom{base=C++}{bidirectional}
18280f3a04d2SHiroo HAYASHI	--map-mojom=+.mojom
18290f3a04d2SHiroo HAYASHI	--kinddef-mojom=f,function,functions
18300f3a04d2SHiroo HAYASHI	--regex-mojom=/^[ ]+([a-zA-Z]+)\(/\1/f/
18310f3a04d2SHiroo HAYASHI
1832a5c14cdaSHiroo HAYASHI.. code-block:: ctags
1833a5c14cdaSHiroo HAYASHI	:emphasize-lines: 2
18340f3a04d2SHiroo HAYASHI
1835a5c14cdaSHiroo HAYASHI	$ ctags --options=mojom-bidirectional.ctags --fields=+l -o - input.cc
18360f3a04d2SHiroo HAYASHI	ABC	input.cc	/^ ABC();$/;"	f	language:mojom
18370f3a04d2SHiroo HAYASHI	main	input.cc	/^int main(void)$/;"	f	language:C++	typeref:typename:int
18380f3a04d2SHiroo HAYASHI
1839a5c14cdaSHiroo HAYASHI.. code-block:: ctags
1840a5c14cdaSHiroo HAYASHI	:emphasize-lines: 2-3
18410f3a04d2SHiroo HAYASHI
1842a5c14cdaSHiroo HAYASHI	$ ctags --options=mojom-bidirectional.ctags --fields=+l -o - input.mojom
	ABC	input.mojom	/^ ABC();$/;"	f	language:mojom
	main	input.mojom	/^int main(void)$/;"	f	language:C++	typeref:typename:int
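
The outputs of the three combinations can be condensed into a small Python
model. This is a sketch of the behavior shown in the examples above, not of
how ctags implements it; the ``emitting_parsers`` function and its parameters
are hypothetical names introduced for illustration.

.. code-block:: python

    def emitting_parsers(direction, input_parser, base="C++", sub="mojom"):
        """Return the parsers whose tags are recorded for an input file."""
        if direction not in ("shared", "dedicated", "bidirectional"):
            raise ValueError(f"unknown direction: {direction}")
        if input_parser == base:
            # Input mapped to the base parser (input.cc): subparser tags
            # appear with {shared} and {bidirectional}.
            with_sub = direction in ("shared", "bidirectional")
            return [base, sub] if with_sub else [base]
        if input_parser == sub:
            # Input mapped to the subparser (input.mojom): base parser tags
            # appear with {dedicated} and {bidirectional}.
            with_base = direction in ("dedicated", "bidirectional")
            return [base, sub] if with_base else [sub]
        raise ValueError(f"unknown parser: {input_parser}")


    for direction in ("shared", "dedicated", "bidirectional"):
        for parser in ("C++", "mojom"):
            print(direction, parser, emitting_parsers(direction, parser))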
18450f3a04d2SHiroo HAYASHI
18460f3a04d2SHiroo HAYASHI
1847e30940dcSHiroo HAYASHI.. _optlib2c:
1848e30940dcSHiroo HAYASHI
1849e30940dcSHiroo HAYASHITranslating an option file into C source code (optlib2c)
1850e30940dcSHiroo HAYASHI~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1851e30940dcSHiroo HAYASHIUniversal Ctags has an ``optlib2c`` script that translates an option file into C
1852e30940dcSHiroo HAYASHIsource code. Your optlib parser can thus easily become a built-in parser.
1853e30940dcSHiroo HAYASHI
To add your optlib file, ``foo.ctags``, to ctags, follow these steps:

* copy the ``foo.ctags`` file into the ``optlib/`` directory
* add ``foo.ctags`` to the ``OPTLIB2C_INPUT`` variable in ``source.mak``
* add ``fooParser`` to the ``PARSER_LIST`` macro variable in ``main/parser_p.h``
1859e30940dcSHiroo HAYASHI
You are encouraged to submit your :file:`.ctags` file to our repository on
GitHub through a pull request. See ":ref:`contributions`" for more details.