.. _ctags-optlib(7):

==============================================================
ctags-optlib
==============================================================
--------------------------------------------------------------
Universal Ctags parser definition language
--------------------------------------------------------------
:Version: @VERSION@
:Manual group: Universal Ctags
:Manual section: 7

SYNOPSIS
--------
|	**@CTAGS_NAME_EXECUTABLE@** [options] [file(s)]
|	**@ETAGS_NAME_EXECUTABLE@** [options] [file(s)]

DESCRIPTION
-----------

*Exuberant Ctags*, the ancestor of *Universal Ctags*, provided
a way to define a new parser from the command line.  Universal Ctags
extends and refines this feature. A parser defined this way is called
an *optlib parser* in Universal Ctags. "opt" indicates that the parser
is defined with a combination of command line options. "lib" indicates
that an optlib parser can be more than an ad-hoc personal configuration.

This man page is for people who want to define an optlib parser.
Readers should read ctags(1) of Universal Ctags first.

The following options define (or customize) a parser:

* ``--langdef=<name>``
* ``--map-<LANG>=[+|-]<extension>|<pattern>``
* ``--kinddef-<LANG>=<letter>,<name>,<description>``
* ``--regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``
* ``--mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``

The following options control the loading of parser definitions:

* ``--options=<pathname>``
* ``--options-maybe=<pathname>``
* ``--optlib-dir=[+]<directory>``

The design of the options and notations for defining a parser in
Exuberant Ctags seems to focus on reducing the amount of typing by the
user. Less typing matters to users who want to define (or customize)
a parser quickly.

On the other hand, the design in Universal Ctags focuses on
maintainability. The notation of Universal Ctags is more redundant than
that of Exuberant Ctags; a newly introduced kind must be declared
explicitly, (long) names are preferred to one-letter flags for
specifying kinds, and naming rules are stricter.

This man page explains only stable options and flags.  Universal Ctags
also introduces experimental options and flags whose names start
with ``_``. For documentation on these options and flags, visit the
Universal Ctags web site at https://ctags.io/.


Storing a parser definition to a file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Though it is possible to define a parser from the command line, you
don't want to type the same command line each time you need the parser.
You can store the options for defining a parser in a file.

@CTAGS_NAME_EXECUTABLE@ loads the files (preload files) listed in the
"FILES" section of ctags(1) at program startup. You can put the parser
definitions you usually need into these files.

``--options=<pathname>``, ``--options-maybe=<pathname>``, and
``--optlib-dir=[+]<directory>`` are for loading optlib files you need
only occasionally. See the "Option File Options" section of ctags(1)
for these options.

As explained in the "FILES" section of ctags(1), the options for
defining a parser are listed line by line in an optlib file. Leading
white space is ignored. A line starting with '#' is treated as a
comment.  Escaping shell meta characters is not needed.

Use ``.ctags`` as the file extension for an optlib file. You can define
multiple parsers in one optlib file, but it is better to make a
separate file for each parser definition.
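
The file format described above can be sketched as follows. The
language name ``myNote``, the ``.mynote`` extension, and the file name
``mynote.ctags`` are hypothetical, chosen only for illustration.

.. code-block:: ctags

   # mynote.ctags: a line starting with '#' is a comment.
   --langdef=myNote
   --map-myNote=+.mynote
   --kinddef-myNote=t,topic,topics
   # The next option is indented; leading white space is ignored.
   # The pattern is written as is, without shell quoting.
       --regex-myNote=/^\*[ \t]+(.+)$/\1/t/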

``--_echo=<message>`` and ``--_force-quit[=<num>]`` options are for
debugging an optlib parser.


Overview for defining a parser
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Design the parser

   You need to know both the target language and the concepts of
   ctags (definition, reference, kind, role, field, extra). For
   the concepts, ctags(1) of Universal Ctags may help you.

2. Give a name to the parser

   Use the ``--langdef=<name>`` option. *<name>* is referred to as
   *<LANG>* in the later steps.

3. Give a file pattern or file extension for activating the parser

   Use ``--map-<LANG>=[+|-]<extension>|<pattern>``.

4. Define kinds

   Use the ``--kinddef-<LANG>=<letter>,<name>,<description>`` option.
   Universal Ctags introduces this option; Exuberant Ctags doesn't
   have it. In Exuberant Ctags, a kind is defined as a side effect of
   specifying the ``--regex-<LANG>=`` option, so the user has no
   chance to recognize how important the definition of a kind is.

5. Define patterns

   Use the ``--regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``
   option for a single-line regular expression. You can also use the
   ``--mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``
   option for a multi-line regular expression.

   As *<kind-spec>*, you can use the one-letter flag defined with the
   ``--kinddef-<LANG>=<letter>,<name>,<description>`` option.
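
Steps 2 to 5 above can be sketched as a small optlib file. The
language name ``myConf``, the ``.myconf`` extension, and the ``key``
kind are hypothetical; the pattern assumes input lines of the form
``name = value``.

.. code-block:: ctags

   # Step 2: name the parser.
   --langdef=myConf
   # Step 3: activate the parser for files with the .myconf extension.
   --map-myConf=+.myconf
   # Step 4: declare a kind before any pattern uses it.
   --kinddef-myConf=k,key,configuration keys
   # Step 5: tag the text before '=' with the kind letter 'k'.
   --regex-myConf=/^([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*=/\1/k/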

OPTIONS
------------

``--langdef=<name>``
	Defines a new user-defined language, *<name>*, to be parsed with regular
	expressions. Once defined, *<name>* may be used in other options taking
	language names.

	*<name>* must consist of alphanumeric characters, '``#``', or '``+``'
	('[a-zA-Z0-9#+]+'). Graph characters other than '``#``' and
	'``+``' are disallowed (or reserved). Some of them (``[-=:{.]``) are
	disallowed because they can confuse the command line parser of
	@CTAGS_NAME_EXECUTABLE@. The rest are reserved for future
	extensions of @CTAGS_NAME_EXECUTABLE@.

	``all`` is an exception.  ``all`` is not acceptable as *<name>*
	because it is a reserved word. See the description of the
	``--kinds-(<LANG>|all)=[+|-](<kinds>|*)`` option in ctags(1) for how the
	reserved word is used.

	The names of built-in parsers are capitalized. When
	@CTAGS_NAME_EXECUTABLE@ evaluates an option on a command line and
	chooses a parser, it compares parser names in a case-insensitive
	way. Therefore, giving a name that starts with a lowercase
	character doesn't help you avoid a parser name conflict. However,
	in a tags file, @CTAGS_NAME_EXECUTABLE@ prints parser names in a
	case-sensitive way; it prints a parser name as specified in the
	``--langdef=<name>`` option.  Therefore, we recommend giving your
	private optlib parser a name that starts with a lowercase
	character. With this convention, people can tell whether a tag
	entry in a tags file comes from a built-in parser or a private
	optlib parser.
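
	As an illustration of these naming rules (all names below are
	hypothetical), the following lines show one accepted name and, in
	comments, two names that the rules above reject:

	.. code-block:: ctags

	   # Accepted: alphanumeric, starting with a lowercase character.
	   --langdef=myTool
	   # Rejected by the rules above: '-' is a disallowed character.
	   # --langdef=my-tool
	   # Rejected by the rules above: "all" is a reserved word.
	   # --langdef=all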

``--kinddef-<LANG>=<letter>,<name>,<description>``
	Define a kind for *<LANG>*.
	Do not confuse this with ``--kinds-<LANG>``.

	*<letter>* must be an alphabetical character ('[a-zA-EG-Z]')
	other than "F". "F" has been reserved for representing a file
	since Exuberant Ctags.

	*<name>* must start with an alphabetic character, and the rest
	must be alphanumeric ('[a-zA-Z][a-zA-Z0-9]*'). Do not use
	"file" as *<name>*. It has been reserved for representing a file
	since Exuberant Ctags.

	Note that using a numeric character in *<name>* violates
	version 2 of the tags file format, though @CTAGS_NAME_EXECUTABLE@
	accepts it. For more detail, see tags(5).

	*<description>* may consist of any printable ASCII characters. The
	exceptions are ``{`` and ``\``. ``{`` is reserved for adding flags
	to this option in the future. So put ``\`` before ``{`` to include
	``{`` in a description. To include ``\`` itself in a description,
	put ``\`` before ``\``.

	*<letter>*, *<name>*, and their combination must each be unique in
	a *<LANG>*.

	This option is newly introduced in Universal Ctags.  It
	reduces the typing needed to define a regex pattern with
	``--regex-<LANG>=``, and keeps the kind definitions in a language
	consistent.

	The *<letter>* can be used as an argument for the ``--kinds-<LANG>``
	option to enable or disable the kind. Unless the ``K`` field is
	enabled, the *<letter>* is used as the value of the "kind" extension
	field in tags output.

	The *<name>* surrounded by braces can be used as an argument for
	the ``--kinds-<LANG>`` option. If the ``K`` field is enabled, the
	*<name>* is used as the value of the "kind" extension field in tags
	output.

	The *<description>* and *<letter>* are listed in ``--list-kinds``
	output. All three elements of the kind-spec are listed in
	``--list-kinds-full`` output. Don't use braces in the
	*<description>*. They will be used as meta characters in the future.
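
	A minimal sketch of kind definitions and their use, assuming a
	hypothetical language ``myTool`` that has already been defined with
	``--langdef``:

	.. code-block:: ctags

	   # letter 'f', name 'function', description 'functions'
	   --kinddef-myTool=f,function,functions
	   # letter 'v', name 'variable', description 'variables'
	   --kinddef-myTool=v,variable,variables
	   # Disable the 'variable' kind by its letter ...
	   --kinds-myTool=-v
	   # ... and enable it again by its name surrounded by braces.
	   --kinds-myTool=+{variable}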

``--regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``
	Define a single-line regular expression.

	The */<line_pattern>/<name_pattern>/* pair defines a regular expression
	replacement pattern, similar in style to ``sed`` substitution
	commands, ``s/regexp/replacement/``, with which to generate tags
	from source files mapped to the named language, *<LANG>*
	(case-insensitive; either a built-in or user-defined language).

	The regular expression, *<line_pattern>*, is
	an extended regular expression (roughly that used by egrep(1)),
	which is used to locate a single source line containing a tag and
	may specify tab characters using ``\t``.

	When a matching line is
	found, a tag will be generated for the name defined by
	*<name_pattern>*, which generally will contain the special
	back-references ``\1`` through ``\9`` to refer to matching sub-expression
	groups within *<line_pattern>*.
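
	For instance, the following sketch builds the tag name from the
	second of two groups. The language ``myTool``, its kind ``f``, and
	the ``def <module>.<function>`` input syntax are assumptions made
	for illustration:

	.. code-block:: ctags

	   --langdef=myTool
	   --kinddef-myTool=f,function,functions
	   # For the input line "def net.connect", \1 is "net" and \2 is
	   # "connect"; only \2 becomes the tag name.
	   --regex-myTool=/^def[ \t]+([a-z]+)\.([a-z]+)/\2/f/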

	The '``/``' separator characters shown in the
	parameter to the option can actually be replaced by any
	character. Note that whichever separator character is used will
	have to be escaped with a backslash ('``\``') character wherever it is
	used in the parameter as something other than a separator. The
	regular expression defined by this option is added to the current
	list of regular expressions for the specified language unless the
	parameter is omitted, in which case the current list is cleared.
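
	A sketch of both approaches, assuming a hypothetical language
	``myTool`` with a kind ``p`` already defined. The two lines are
	alternatives; each tags an absolute path that follows the word
	``path``:

	.. code-block:: ctags

	   # '/' is the separator, so '/' inside the pattern is escaped.
	   --regex-myTool=/^path[ \t]+(\/.+)/\1/p/
	   # '%' is the separator, so '/' needs no escaping.
	   --regex-myTool=%^path[ \t]+(/.+)%\1%p%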

	Unless modified by *<flags>*, *<line_pattern>* is interpreted as a POSIX
	extended regular expression. The *<name_pattern>* should expand for all
	matching lines to a non-empty string of characters, or a warning
	message will be reported unless the ``{placeholder}`` regex flag is
	specified.

	A kind specifier (*<kind-spec>*) for tags matching the regexp may
	follow *<name_pattern>*, which will determine what kind of tag is
	reported in the ``kind`` extension field (see tags(5)).

	*<kind-spec>* has two forms: *one-letter form* and *full form*.

	The one-letter form is written as ``<letter>``. It just refers to a
	kind *<letter>* defined with ``--kinddef-<LANG>``. This form is
	recommended in Universal Ctags.

	The full form of *<kind-spec>* is written as
	``<letter>,<name>,<description>``. The kind *<name>*, the
	*<description>*, or both can be omitted. See the description of the
	``--kinddef-<LANG>=<letter>,<name>,<description>`` option for the
	elements.

	The full form is supported only for compatibility with Exuberant
	Ctags, which does not have the ``--kinddef-<LANG>`` option. Support
	for the full form will be removed from Universal Ctags in the future.
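
	The two forms can be compared in the following sketch, which assumes
	a hypothetical language ``myTool``. The two groups of lines are
	alternatives and are not meant to be used together:

	.. code-block:: ctags

	   # One-letter form (recommended): the kind is declared first,
	   # and the pattern refers to it by its letter.
	   --kinddef-myTool=f,function,functions
	   --regex-myTool=/^func[ \t]+([a-z]+)/\1/f/

	   # Full form (for Exuberant Ctags compatibility): the pattern
	   # itself declares the kind.
	   --regex-myTool=/^func[ \t]+([a-z]+)/\1/f,function,functions/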

	.. MEMO: the following line is commented out
		If *<kind-spec>* is omitted, it defaults to ``r,regex``.

	For *<flags>*, see "FLAGS FOR ``--regex-<LANG>`` OPTION".

	For more information on the regular expressions used by
	@CTAGS_NAME_EXECUTABLE@, see either the regex(5,7) man page, or
	the GNU info documentation for regex (e.g. "``info regex``").

``--list-regex-flags``
	Lists the flags that can be used in the ``--regex-<LANG>`` option.

``--list-mline-regex-flags``
	Lists the flags that can be used in the ``--mline-regex-<LANG>`` option.

``--mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``
	Define a multi-line regular expression.

	This option is similar to the ``--regex-<LANG>`` option except that
	the pattern is applied to the whole file’s contents, not line by line.
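
	A sketch of a multi-line pattern, assuming a hypothetical language
	``myTool`` with a kind ``f`` already defined, in which a return type
	and a function name may appear on separate lines. The ``{mgroup=1}``
	flag is not described in this page; it is taken from the Universal
	Ctags online documentation, where it selects the group whose
	position gives the line number of the tag. Check the output of
	``--list-mline-regex-flags`` to confirm what your build supports:

	.. code-block:: ctags

	   # "[^a-z]+" is used so that the gap between "int" and the
	   # name may contain spaces, tabs, or a line break.
	   --mline-regex-myTool=/int[^a-z]+([a-z]+)\(/\1/f/{mgroup=1}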

``--_echo=<message>``
	Print *<message>* to the standard error stream.  This is helpful to
	understand (and debug) the optlib loading feature of Universal Ctags.

``--_force-quit[=<num>]``
	Exits immediately when this option is processed.  If *<num>* is given,
	it is used as the exit status. The default is 0.  This is helpful to
	debug the optlib loading feature of Universal Ctags.
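
	A sketch of how these two options might be used while debugging
	a hypothetical optlib file ``mytool.ctags``:

	.. code-block:: ctags

	   --langdef=myTool
	   # Report on standard error that loading reached this line.
	   --_echo=mytool.ctags: langdef processed
	   --map-myTool=+.mytool
	   # Stop here with exit status 1.
	   --_force-quit=1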


FLAGS FOR ``--regex-<LANG>`` OPTION
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can specify one or more flags, ``<letter>|{<name>}``, at the end of
``--regex-<LANG>`` to control how Universal Ctags uses the pattern.

Exuberant Ctags uses a *<letter>* to represent a flag. In
Universal Ctags, a *<name>* surrounded by braces (name form) can be used
in addition to *<letter>*. The name form makes an optlib file easier
to read.

Most of the flags newly added in Universal Ctags
don't have a one-letter representation; they have only the name
representation. ``--list-regex-flags`` lists all the flags.

``basic`` (one-letter form ``b``)
	The pattern is interpreted as a POSIX basic regular expression.

``exclusive`` (one-letter form ``x``)
	Skip testing the other patterns if a line matches this
	pattern. This is useful to avoid spending CPU time on parsing
	line comments.
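
	For example, the following sketch skips comment lines early. It
	assumes a hypothetical language ``myTool`` with a kind ``f`` already
	defined, in which a comment starts with ``;``. It also uses the
	``{placeholder}`` flag so that no tag is emitted for the comment
	itself:

	.. code-block:: ctags

	   # A line matching this pattern is not tested against the
	   # other patterns.
	   --regex-myTool=/^[ \t]*;.*$///{exclusive}{placeholder}
	   --regex-myTool=/^func[ \t]+([a-z]+)/\1/f/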

``extend`` (one-letter form ``e``)
	The pattern is interpreted as a POSIX extended regular
	expression (default).

``pcre2`` (one-letter form ``p``, experimental)
	The pattern is interpreted as a PCRE2 regular expression as explained
	in pcre2syntax(3).  This flag is available only if ctags is
	built with the ``pcre2`` library. See the output of the
	``--list-features`` option to know whether your ctags is
	built with ``pcre2`` or not.
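
	A sketch assuming a build with ``pcre2`` and a hypothetical language
	``myTool`` with a kind ``f`` already defined. ``\s`` and ``\w`` are
	PCRE2 character type escapes:

	.. code-block:: ctags

	   # Tag the identifier between "func" and an opening parenthesis.
	   --regex-myTool=/^func\s+(\w+)\s*\(/\1/f/{pcre2}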

``icase`` (one-letter form ``i``)
	The regular expression is to be applied in a case-insensitive
	manner.

``placeholder``
	Don't emit a tag captured with a regex pattern.  The replacement
	can be an empty string.  See the following description of the
	``scope=...`` flag for how this is useful.

``scope=(ref|push|pop|clear|set|replace)``
	Specify what to do with the internal scope stack.

	A parser programmed with ``--regex-<LANG>`` has a stack (scope
	stack) internally. You can use it for tracking scope
	information. The ``scope=...`` flag is for manipulating and
	utilizing the scope stack.

	If ``{scope=push}`` is specified, a tag captured with
	``--regex-<LANG>`` is pushed to the stack. ``{scope=push}``
	implies ``{scope=ref}``.

	You can fill the scope field (``scope:``) of a captured tag with
	``{scope=ref}``. If the ``{scope=ref}`` flag is given,
	@CTAGS_NAME_EXECUTABLE@ attaches the tag at the top of the stack
	to the tag captured with ``--regex-<LANG>`` as the value for the
	``scope:`` field.

	@CTAGS_NAME_EXECUTABLE@ pops the tag at the top of the stack when
	``--regex-<LANG>`` with ``{scope=pop}`` matches the input
	line.

	Specifying ``{scope=clear}`` removes all the tags from the scope
	stack. Specifying ``{scope=set}`` removes all the tags from the
	scope stack, and then pushes the captured tag as ``{scope=push}``
	does.

	``{scope=replace}`` does three things sequentially. First it
	does the same as ``{scope=pop}``, then it fills the ``scope:`` field
	of the tag captured with ``--regex-<LANG>``, and finally it pushes
	the tag to the scope stack as if ``{scope=push}`` was given.
	You cannot specify another scope action together with
	``{scope=replace}``.

	Do not specify ``{scope=pop}{scope=push}`` as an
	alternative to ``{scope=replace}``; ``{scope=pop}{scope=push}``
	fills the ``scope:`` field of the tag captured with ``--regex-<LANG>``
	first, then pops the tag at the top of the stack, and finally pushes
	the captured tag to the scope stack. The point at which the
	``scope:`` field is filled therefore differs between
	``{scope=replace}`` and ``{scope=pop}{scope=push}``.

	In some cases, you may want to use ``--regex-<LANG>`` only for its
	side effects: using it only to manipulate the stack but not for
	capturing a tag. In such a case, leave the *<name_pattern>* component
	of the ``--regex-<LANG>`` option empty and specify ``{placeholder}``
	as a regex flag. For example, an unnamed tag can be put on
	the stack by giving the regex flags "``{scope=push}{placeholder}``".

	You may wonder what happens if a regex pattern with the
	``{scope=ref}`` flag matches an input line but the stack is empty,
	or an unnamed tag is at the top. If the regex pattern contains a
	``{scope=ref}`` flag and the stack is empty, the ``{scope=ref}``
	flag is ignored and nothing is attached to the ``scope:`` field.

	If the top of the stack contains an unnamed tag,
	@CTAGS_NAME_EXECUTABLE@ searches deeper into the stack to find the
	top-most named tag. If it reaches the bottom of the stack without
	finding a named tag, the ``{scope=ref}`` flag is ignored and
	nothing is attached to the ``scope:`` field.

	When a named tag on the stack is popped or cleared as a side
	effect of a pattern match, @CTAGS_NAME_EXECUTABLE@ attaches the
	line number of the match to the ``end:`` field of
	the named tag.

	@CTAGS_NAME_EXECUTABLE@ clears all of the tags on the stack when it
	reaches the end of the input source file. The line number of the
	end is attached to the ``end:`` field of the cleared tags.
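
	A sketch of ``{scope=set}`` and ``{scope=ref}`` working together,
	assuming a hypothetical language ``myIni`` whose input consists of
	``[section]`` header lines and ``key = value`` lines:

	.. code-block:: ctags

	   --langdef=myIni
	   --map-myIni=+.myini
	   --kinddef-myIni=s,section,sections
	   --kinddef-myIni=k,key,keys
	   # A section header empties the stack, then becomes its only entry.
	   --regex-myIni=/^\[([a-zA-Z0-9_]+)\]/\1/s/{scope=set}
	   # A key refers to the section at the top of the stack, if any.
	   --regex-myIni=/^([a-zA-Z0-9_]+)[ \t]*=/\1/k/{scope=ref}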

``warning=<message>``
	Print the given *<message>* at WARNING level.

``fatal=<message>``
	Print the given *<message>* and exit.
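
	A sketch of the ``warning`` and ``fatal`` flags, assuming a
	hypothetical language ``myTool`` in which ``%legacy`` and ``%abort``
	are directives the parser does not handle. The messages are
	illustrative only:

	.. code-block:: ctags

	   # Report the directive but keep parsing.
	   --regex-myTool=/^%legacy///{warning=legacy directive found}{placeholder}
	   # Report the directive and stop.
	   --regex-myTool=/^%abort///{fatal=abort directive found}{placeholder}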

EXAMPLES
-------------

Perl Pod
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is the definition (pod.ctags) used in ctags for parsing Pod
(https://perldoc.perl.org/perlpod.html) files.

.. code-block:: ctags

   --langdef=pod
   --map-pod=+.pod

   --kinddef-pod=c,chapter,chapters
   --kinddef-pod=s,section,sections
   --kinddef-pod=S,subsection,subsections
   --kinddef-pod=t,subsubsection,subsubsections

   --regex-pod=/^=head1[ \t]+(.+)/\1/c/
   --regex-pod=/^=head2[ \t]+(.+)/\1/s/
   --regex-pod=/^=head3[ \t]+(.+)/\1/S/
   --regex-pod=/^=head4[ \t]+(.+)/\1/t/

Using scope regex flags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's think about writing a parser for a very small subset of the Ruby
language.

Input source file (``input.srb``)::

	class Example
	  def methodA
		puts "in class_method"
	  end
	  def methodB
		puts "in class_method"
	  end
	end

The parser for the input should capture ``Example`` with the ``class``
kind, and ``methodA`` and ``methodB`` with the ``method`` kind.
``methodA`` and ``methodB`` should have ``Example`` as their scope. The
``end:`` field of each tag should have a proper value.

Optlib file (``sub-ruby.ctags``):

.. code-block:: ctags

	--langdef=subRuby
	--map-subRuby=.srb
	--kinddef-subRuby=c,class,classes
	--kinddef-subRuby=m,method,methods
	--regex-subRuby=/^class[ \t]+([a-zA-Z][a-zA-Z0-9]+)/\1/c/{scope=push}
	--regex-subRuby=/^end///{scope=pop}{placeholder}
	--regex-subRuby=/^[ \t]+def[ \t]+([a-zA-Z][a-zA-Z0-9_]+)/\1/m/{scope=push}
	--regex-subRuby=/^[ \t]+end///{scope=pop}{placeholder}

Command line and output::

	$ ctags --quiet --fields=+eK \
	--options=./sub-ruby.ctags -o - input.srb
	Example	input.srb	/^class Example$/;"	class	end:8
	methodA	input.srb	/^  def methodA$/;"	method	class:Example	end:4
	methodB	input.srb	/^  def methodB$/;"	method	class:Example	end:7


SEE ALSO
--------

The official Universal Ctags web site at:

https://ctags.io/

ctags(1), tags(5), regex(3), regex(7), egrep(1), pcre2syntax(3)

AUTHOR
------

Universal Ctags project
https://ctags.io/
(This man page is partially derived from ctags(1) of
Exuberant Ctags)

Darren Hiebert <dhiebert@users.sourceforge.net>
http://DarrenHiebert.com/
495