.. _ctags-optlib(7):

==============================================================
ctags-optlib
==============================================================

Universal Ctags parser definition language

:Version: 5.9.0
:Manual group: Universal Ctags
:Manual section: 7

SYNOPSIS
--------
| **ctags** [options] [file(s)]
| **etags** [options] [file(s)]

DESCRIPTION
-----------

*Exuberant Ctags*, the ancestor of *Universal Ctags*, provided
a way to define a new parser from the command line. Universal Ctags
extends and refines this feature. An *optlib parser* is the name for such
a parser in Universal Ctags. "opt" indicates that the parser is defined with
a combination of command line options. "lib" indicates that an optlib parser
can be more than an ad-hoc personal configuration.

This man page is for people who want to define an optlib parser.
Readers should read :ref:`ctags(1) <ctags(1)>` of Universal Ctags first.

The following options are for defining (or customizing) a parser:

* ``--langdef=<name>``
* ``--map-<LANG>=[+|-]<extension>|<pattern>``
* ``--kinddef-<LANG>=<letter>,<name>,<description>``
* ``--regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``
* ``--mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``

The following options control the loading of parser definitions:

* ``--options=<pathname>``
* ``--options-maybe=<pathname>``
* ``--optlib-dir=[+]<directory>``

The design of options and notations for defining a parser in
Exuberant Ctags may focus on reducing the amount of typing by the user.
Reducing the amount of typing is important for users who want to
define (or customize) a parser quickly.

On the other hand, the design in Universal Ctags focuses on
maintainability.
The notation of Universal Ctags is more redundant than
that of Exuberant Ctags: a newly introduced kind must be declared
explicitly, (long) names are preferred over one-letter flags
for specifying kinds, and naming rules are stricter.

This man page explains only stable options and flags. Universal Ctags
also introduces experimental options and flags which have names starting
with ``_``. For documentation on these options and flags, visit the
Universal Ctags web site at https://ctags.io/.


Storing a parser definition to a file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Though it is possible to define a parser from the command line, you don't
want to type the same command line each time you need the parser.
You can store the options for defining a parser in a file.

ctags loads the files (preload files) listed in the "FILES"
section of :ref:`ctags(1) <ctags(1)>` at program startup. You can put the parser
definitions you usually need in these files.

``--options=<pathname>``, ``--options-maybe=<pathname>``, and
``--optlib-dir=[+]<directory>`` are for loading optlib files you need
only occasionally. See the "Option File Options" section of :ref:`ctags(1) <ctags(1)>` for
these options.

As explained in the "FILES" section of :ref:`ctags(1) <ctags(1)>`, options for defining a
parser are listed line by line in an optlib file. Leading white space is
ignored. A line starting with '#' is treated as a comment. Escaping
shell metacharacters is not needed.

Use ``.ctags`` as the file extension for an optlib file. You can define
multiple parsers in an optlib file, but it is better to make a file for
each parser definition.

``--_echo=<msg>`` and ``--_force-quit=<num>`` options are for debugging
an optlib parser.


Overview for defining a parser
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Design the parser

   You need to know both the target language and the ctags
   concepts (definition, reference, kind, role, field, extra).
   For the concepts, :ref:`ctags(1) <ctags(1)>` of Universal Ctags may help you.

2. Give a name to the parser

   Use the ``--langdef=<name>`` option. *<name>* is referred to as *<LANG>* in
   the later steps.

3. Give a file pattern or file extension for activating the parser

   Use ``--map-<LANG>=[+|-]<extension>|<pattern>``.

4. Define kinds

   Use the ``--kinddef-<LANG>=<letter>,<name>,<description>`` option.
   Universal Ctags introduces this option; Exuberant Ctags does not
   have it. In Exuberant Ctags, a kind is defined as a side effect of
   specifying the ``--regex-<LANG>=`` option, so the user has no
   chance to recognize how important the definition of a kind is.

5. Define patterns

   Use the ``--regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``
   option for a single-line regular expression. You can also use the
   ``--mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``
   option for a multi-line regular expression.

   As *<kind-spec>*, you can use the one-letter flag defined with the
   ``--kinddef-<LANG>=<letter>,<name>,<description>`` option.

OPTIONS
------------

``--langdef=<name>``
        Defines a new user-defined language, *<name>*, to be parsed with regular
        expressions. Once defined, *<name>* may be used in other options taking
        language names.

        *<name>* must consist of alphanumeric characters, '``#``', or '``+``'
        ('[a-zA-Z0-9#+]+'). The graph characters other than '``#``' and
        '``+``' are disallowed (or reserved). Some of them (``[-=:{.]``) are
        disallowed because they can confuse the command line parser of
        ctags. The rest of them are just
        reserved for future extensions of ctags.

        ``all`` is an exception. ``all`` as *<name>* is not acceptable. It is
        a reserved word. See the description of the
        ``--kinds-(<LANG>|all)=[+|-](<kinds>|*)`` option in :ref:`ctags(1) <ctags(1)>` for how the
        reserved word is used.
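
        As a minimal sketch of these naming rules, the optlib lines below
        use hypothetical names chosen only for illustration. The
        commented-out lines show names that the rules above disallow:

        .. code-block:: ctags

                # Accepted: alphanumeric characters, '#', and '+' only.
                --langdef=myconf
                --langdef=myconf++

                # Disallowed: '-' and '.' are not in [a-zA-Z0-9#+].
                # --langdef=my-conf
                # --langdef=my.conf

                # Disallowed: "all" is a reserved word.
                # --langdef=all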

        The names of built-in parsers are capitalized. When
        ctags evaluates an option on a command line and
        chooses a parser, it uses the names of
        parsers in a case-insensitive way. Therefore, giving a name
        starting with a lowercase character doesn't help you avoid
        parser name conflicts. However, in a tags file,
        ctags prints parser names in a case-sensitive
        way; it prints a parser name as specified in the ``--langdef=<name>``
        option. Therefore, we recommend giving a name starting with a
        lowercase character to your private optlib parser. With this
        convention, people can tell whether a tag entry in a tags file comes
        from a built-in parser or a private optlib parser.

``--kinddef-<LANG>=<letter>,<name>,<description>``
        Define a kind for *<LANG>*.
        Do not confuse this with ``--kinds-<LANG>``.

        *<letter>* must be an alphabetical character ('[a-zA-EG-Z]')
        other than "F". "F" has been reserved for representing a file
        since Exuberant Ctags.

        *<name>* must start with an alphabetic character, and the rest
        must be alphanumeric ('[a-zA-Z][a-zA-Z0-9]*'). Do not use
        "file" as *<name>*. It has been reserved for representing a file
        since Exuberant Ctags.

        Note that using a number character in a *<name>* violates
        version 2 of the tags file format, though ctags
        accepts it. For more detail, see :ref:`tags(5) <tags(5)>`.

        *<description>* may consist of any printable ASCII characters. The
        exceptions are ``{`` and ``\``. ``{`` is reserved for adding flags
        to this option in the future. So put ``\`` before ``{`` to include
        ``{`` in a description. To include ``\`` itself in a description,
        put ``\`` before ``\``.

        *<letter>*, *<name>*, and their combination must each be unique
        within a *<LANG>*.

        This option is newly introduced in Universal Ctags.
        This option
        reduces the typing needed when defining a regex pattern with
        ``--regex-<LANG>=``, and keeps the kind
        definitions in a language consistent.

        The *<letter>* can be used as an argument for the ``--kinds-<LANG>``
        option to enable or disable the kind. Unless the ``K`` field is
        enabled, the *<letter>* is used as the value of the "kind" extension
        field in tags output.

        The *<name>* surrounded by braces can be used as an argument for the
        ``--kinds-<LANG>`` option. If the ``K`` field is enabled, the *<name>*
        is used as the value of the "kind" extension field in tags output.

        The *<description>* and *<letter>* are listed in ``--list-kinds``
        output. All three elements of the kind-spec are listed in
        ``--list-kinds-full`` output. Don't use braces in the
        *<description>*. They will be used as meta characters in the future.

``--regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``
        Define a single-line regular expression.

        The */<line_pattern>/<name_pattern>/* pair defines a regular expression
        replacement pattern, similar in style to ``sed`` substitution
        commands, ``s/regexp/replacement/``, with which to generate tags from source files mapped to
        the named language, *<LANG>*, (case-insensitive; either a built-in
        or user-defined language).

        The regular expression, *<line_pattern>*, defines
        an extended regular expression (roughly that used by egrep(1)),
        which is used to locate a single source line containing a tag and
        may specify tab characters using ``\t``.

        When a matching line is
        found, a tag will be generated for the name defined by
        *<name_pattern>*, which generally will contain the special
        back-references ``\1`` through ``\9`` to refer to matching sub-expression
        groups within *<line_pattern>*.

        The '``/``' separator characters shown in the
        parameter to the option can actually be replaced by any
        character.
        Note that whichever separator character is used will
        have to be escaped with a backslash ('``\``') character wherever it is
        used in the parameter as something other than a separator. The
        regular expression defined by this option is added to the current
        list of regular expressions for the specified language unless the
        parameter is omitted, in which case the current list is cleared.

        Unless modified by *<flags>*, *<line_pattern>* is interpreted as a POSIX
        extended regular expression. The *<name_pattern>* should expand for all
        matching lines to a non-empty string of characters, or a warning
        message will be reported unless the ``{placeholder}`` regex flag is
        specified.

        A kind specifier (*<kind-spec>*) for tags matching the regexp may
        follow *<name_pattern>*, which will determine what kind of tag is
        reported in the ``kind`` extension field (see :ref:`tags(5) <tags(5)>`).

        *<kind-spec>* has two forms: *one-letter form* and *full form*.

        The one-letter form is just ``<letter>``. It refers to a kind
        *<letter>* defined with ``--kinddef-<LANG>``. This form is recommended in
        Universal Ctags.

        The full form of *<kind-spec>* is
        ``<letter>,<name>,<description>``. The kind *<name>*, the
        *<description>*, or both can be omitted. See the description of the
        ``--kinddef-<LANG>=<letter>,<name>,<description>`` option about the
        elements.

        The full form is supported only for compatibility with Exuberant
        Ctags, which does not have the ``--kinddef-<LANG>`` option. Support for this
        form will be removed from Universal Ctags in the future.

        .. MEMO: the following line is commented out
                If *<kind-spec>* is omitted, it defaults to ``r,regex``.

        About *<flags>*, see "FLAGS FOR ``--regex-<LANG>`` OPTION".
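
        The two *<kind-spec>* forms can be sketched side by side as
        follows. This is only an illustration: the language name
        ``mylang``, the ``.my`` extension, and the ``func`` and ``var``
        keywords are assumptions made up for the example.

        .. code-block:: ctags

                --langdef=mylang
                --map-mylang=+.my

                # One-letter form: the kind is declared once with --kinddef
                # and referred to by its letter.
                --kinddef-mylang=f,function,functions
                --regex-mylang=/^func[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/f/

                # Full form, kept for Exuberant Ctags compatibility:
                # the kind is defined inline in the pattern.
                --regex-mylang=/^var[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/v,variable,variables/

        Declaring every kind with ``--kinddef-<LANG>`` keeps all kind
        definitions of a language in one place, which is why the
        one-letter form is the recommended one.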

        For more information on the regular expressions used by
        ctags, see either the regex(5,7) man page, or
        the GNU info documentation for regex (e.g. "``info regex``").

``--list-regex-flags``
        Lists the flags that can be used in the ``--regex-<LANG>`` option.

``--list-mline-regex-flags``
        Lists the flags that can be used in the ``--mline-regex-<LANG>`` option.

``--mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``
        Define a multi-line regular expression.

        This option is similar to the ``--regex-<LANG>`` option except that the pattern is
        applied to the whole file’s contents, not line by line.

``--_echo=<message>``
        Print *<message>* to the standard error stream. This is helpful to
        understand (and debug) the optlib loading feature of Universal Ctags.

``--_force-quit[=<num>]``
        Exits immediately when this option is processed. If *<num>* is given,
        it is used as the exit status. The default is 0. This is helpful to debug the optlib
        loading feature of Universal Ctags.


FLAGS FOR ``--regex-<LANG>`` OPTION
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can specify more than one flag, ``<letter>|{<name>}``, at the end of ``--regex-<LANG>`` to
control how Universal Ctags uses the pattern.

Exuberant Ctags uses a *<letter>* to represent a flag. In
Universal Ctags, a *<name>* surrounded by braces (name form) can be used
in addition to *<letter>*. The name form makes an optlib
file easier to read.

Most of the flags newly added in Universal Ctags
don't have a one-letter representation; they have only the name
representation. ``--list-regex-flags`` lists all the flags.

``basic`` (one-letter form ``b``)
        The pattern is interpreted as a POSIX basic regular expression.

``exclusive`` (one-letter form ``x``)
        Skip testing the other patterns if a line is matched to this
        pattern.
        This is useful to avoid spending CPU time on parsing line comments.

``extend`` (one-letter form ``e``)
        The pattern is interpreted as a POSIX extended regular
        expression (default).

``pcre2`` (one-letter form ``p``, experimental)
        The pattern is interpreted as a PCRE2 regular expression explained
        in pcre2syntax(3). This flag is available only if ctags is
        built with the ``pcre2`` library. See the output of the
        ``--list-features`` option to know whether your ctags is
        built with ``pcre2`` or not.

``icase`` (one-letter form ``i``)
        The regular expression is to be applied in a case-insensitive
        manner.

``placeholder``
        Don't emit a tag captured with a regex pattern. The replacement
        can be an empty string. See the following description of the
        ``scope=...`` flag about how this is useful.

``scope=(ref|push|pop|clear|set|replace)``
        Specify what to do with the internal scope stack.

        A parser programmed with ``--regex-<LANG>`` has a stack (scope
        stack) internally. You can use it for tracking scope
        information. The ``scope=...`` flag is for manipulating and
        utilizing the scope stack.

        If ``{scope=push}`` is specified, a tag captured with
        ``--regex-<LANG>`` is pushed to the stack. ``{scope=push}``
        implies ``{scope=ref}``.

        You can fill the scope field (``scope:``) of a captured tag with
        ``{scope=ref}``. If the ``{scope=ref}`` flag is given,
        ctags attaches the tag at the top of the stack to the tag
        captured with ``--regex-<LANG>`` as the value for the ``scope:``
        field.

        ctags pops the tag at the top of the stack when
        ``--regex-<LANG>`` with ``{scope=pop}`` is matched to the input
        line.

        Specifying ``{scope=clear}`` removes all the tags in the scope.
        Specifying ``{scope=set}`` removes all the tags in the scope, and
        then pushes the captured tag as ``{scope=push}`` does.

        ``{scope=replace}`` does three things sequentially.
        First it
        does the same as ``{scope=pop}``, then fills the ``scope:`` field
        of the tag captured with ``--regex-<LANG>``, and finally pushes the tag to
        the scope stack as if ``{scope=push}`` was given.
        You cannot specify another scope action together with
        ``{scope=replace}``.

        Do not specify ``{scope=pop}{scope=push}`` as an
        alternative to ``{scope=replace}``; ``{scope=pop}{scope=push}``
        fills the ``scope:`` field of the tag captured with ``--regex-<LANG>``
        first, then pops the tag at the top of the stack, and finally pushes
        the captured tag to the scope stack. The two differ in when
        the ``scope:`` field is filled: after the pop with ``{scope=replace}``,
        before the pop with ``{scope=pop}{scope=push}``.

        In some cases, you may want to use ``--regex-<LANG>`` only for its
        side effects: using it only to manipulate the stack but not for
        capturing a tag. In such a case, make the *<name_pattern>* component of the
        ``--regex-<LANG>`` option empty while specifying ``{placeholder}``
        as a regex flag. For example, an unnamed tag can be put on
        the stack by giving the regex flags "``{scope=push}{placeholder}``".

        You may wonder what happens if a regex pattern with the
        ``{scope=ref}`` flag matches an input line but the stack is empty,
        or an unnamed tag is at the top. If the regex pattern contains a
        ``{scope=ref}`` flag and the stack is empty, the ``{scope=ref}``
        flag is ignored and nothing is attached to the ``scope:`` field.

        If the top of the stack contains an unnamed tag,
        ctags searches deeper into the stack to find the
        top-most named tag. If it reaches the bottom of the stack without
        finding a named tag, the ``{scope=ref}`` flag is ignored and
        nothing is attached to the ``scope:`` field.

        When a named tag on the stack is popped or cleared as a side
        effect of a pattern match, ctags attaches the
        line number of the match to the ``end:`` field of
        the named tag.

        ctags clears all of the tags on the stack when it
        reaches the end of the input source file. The line number of the
        end is attached to the ``end:`` field of the cleared tags.

``warning=<message>``
        Print the given *<message>* at WARNING level.

``fatal=<message>``
        Print the given *<message>* and exit.

EXAMPLES
-------------

Perl Pod
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is the definition (pod.ctags) used in ctags for parsing Pod
(https://perldoc.perl.org/perlpod.html) files.

.. code-block:: ctags

        --langdef=pod
        --map-pod=+.pod

        --kinddef-pod=c,chapter,chapters
        --kinddef-pod=s,section,sections
        --kinddef-pod=S,subsection,subsections
        --kinddef-pod=t,subsubsection,subsubsections

        --regex-pod=/^=head1[ \t]+(.+)/\1/c/
        --regex-pod=/^=head2[ \t]+(.+)/\1/s/
        --regex-pod=/^=head3[ \t]+(.+)/\1/S/
        --regex-pod=/^=head4[ \t]+(.+)/\1/t/

Using scope regex flags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's think about writing a parser for a very small subset of the Ruby
language.

input source file (``input.srb``)::

        class Example
          def methodA
            puts "in class_method"
          end
          def methodB
            puts "in class_method"
          end
        end

The parser for the input should capture ``Example`` with ``class`` kind,
and ``methodA`` and ``methodB`` with ``method`` kind. ``methodA`` and ``methodB``
should have ``Example`` as their scope. The ``end:`` field of each tag
should have a proper value.

optlib file (``sub-ruby.ctags``):

.. code-block:: ctags

        --langdef=subRuby
        --map-subRuby=.srb
        --kinddef-subRuby=c,class,classes
        --kinddef-subRuby=m,method,methods
        --regex-subRuby=/^class[ \t]+([a-zA-Z][a-zA-Z0-9]+)/\1/c/{scope=push}
        --regex-subRuby=/^end///{scope=pop}{placeholder}
        --regex-subRuby=/^[ \t]+def[ \t]+([a-zA-Z][a-zA-Z0-9_]+)/\1/m/{scope=push}
        --regex-subRuby=/^[ \t]+end///{scope=pop}{placeholder}

command line and output::

        $ ctags --quiet --fields=+eK \
        --options=./sub-ruby.ctags -o - input.srb
        Example input.srb /^class Example$/;" class end:8
        methodA input.srb /^  def methodA$/;" method class:Example end:4
        methodB input.srb /^  def methodB$/;" method class:Example end:7


SEE ALSO
--------

The official Universal Ctags web site at:

https://ctags.io/

:ref:`ctags(1) <ctags(1)>`, :ref:`tags(5) <tags(5)>`, regex(3), regex(7), egrep(1), pcre2syntax(3)

AUTHOR
------

Universal Ctags project
https://ctags.io/
(This man page is partially derived from :ref:`ctags(1) <ctags(1)>` of
Exuberant Ctags)

Darren Hiebert <dhiebert@users.sourceforge.net>
http://DarrenHiebert.com/