.. _tags(5):

==============================================================
tags
==============================================================

Vi tags file format extended in ctags projects

:Version: 2+
:Manual group: Universal Ctags
:Manual section: 5

DESCRIPTION
-----------

The next section is a copy of the FORMAT file in the Exuberant
Ctags source code, from its Subversion repository at sourceforge.net.

Exceptions introduced in Universal Ctags are explained inline with
an "EXCEPTION" marker.

----

Proposal for extended Vi tags file format
-----------------------------------------

| Version: 0.06 DRAFT
| Date: 1998 Feb 8
| Author: Bram Moolenaar <Bram at vim.org> and Darren Hiebert <dhiebert at users.sourceforge.net>

Introduction
~~~~~~~~~~~~

The file format for the "tags" file, as used by Vi and many of its
descendants, has limited capabilities.

This additional functionality is desired:

1. Static or local tags.
   The scope of these tags is the file where they are defined.  The same tag
   can appear in several files, without really being a duplicate.
2. Duplicate tags.
   Allow the same tag to occur more than once.  They can be located in
   a different file and/or have a different command.
3. Support for C++.
   A tag is not only specified by its name, but also by the context (the
   class name).
4. Future extension.
   When even more additional functionality is desired, it must be possible to
   add this later, without breaking programs that don't support it.


From proposal to standard
~~~~~~~~~~~~~~~~~~~~~~~~~

To make this proposal into a standard for tags files, it needs to be supported
by most people working on versions of Vi, ctags, etc.
Currently this standard is supported by:

Darren Hiebert <dhiebert at users.sourceforge.net>
    Exuberant Ctags

Bram Moolenaar <Bram at vim.org>
    Vim (Vi IMproved)

These have been or will be asked to support this standard:

Nvi
    Keith Bostic <bostic at bsdi.com>

Vile
    Tom E. Dickey <dickey at clark.net>

NEdit
    Mark Edel <edel at ltx.com>

CRiSP
    Paul Fox <fox at crisp.demon.co.uk>

Lemmy
    James Iuliano <jai at accessone.com>

Zeus
    Jussi Jumppanen <jussij at ca.com.au>

Elvis
    Steve Kirkendall <kirkenda at cs.pdx.edu>

FTE
    Marko Macek <Marko.Macek at snet.fri.uni-lj.si>


Backwards compatibility
~~~~~~~~~~~~~~~~~~~~~~~

A tags file that is generated in the new format should still be usable by Vi.
This makes it possible to distribute tags files that are usable by all
versions and descendants of Vi.

This restricts the format to what Vi can handle.  The format is:

1. The tags file is a list of lines, each line in the format::

      {tagname}<Tab>{tagfile}<Tab>{tagaddress}


   {tagname}
       Any identifier, not containing white space.

       EXCEPTION: Universal Ctags violates this item of the proposal;
       tagname may contain spaces. However, tabs are not allowed.

   <Tab>
       Exactly one TAB character (although many versions of Vi can
       handle any amount of white space).

   {tagfile}
       The name of the file where {tagname} is defined, relative to
       the current directory (or location of the tags file?).

   {tagaddress}
       Any Ex command.  When executed, it behaves like 'magic' was
       not set.

2. The tags file is sorted on {tagname}.  This allows for a binary search in
   the file.

3. Duplicate tags are allowed, but which one is actually used is
   unpredictable (because of the binary search).
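
The line format above can be sketched in code.  The following is a minimal
illustration in Python and is not part of the original proposal: the function
names ``split_tag_line`` and ``find_tag`` are hypothetical, and the sketch
assumes the tag lines are already in memory and sorted on {tagname} by
character value.

.. code-block:: python

    import bisect


    def split_tag_line(line):
        """Split one traditional tags line into (tagname, tagfile, tagaddress)."""
        # Only the first two tabs separate fields; the tagaddress keeps the
        # rest of the line, whatever it contains.
        tagname, tagfile, tagaddress = line.rstrip("\r\n").split("\t", 2)
        return tagname, tagfile, tagaddress


    def find_tag(sorted_lines, name):
        """Return every parsed entry whose tagname equals name."""
        # Building this key list is linear; a real reader would bisect the
        # file directly.  It is done here only to keep the sketch short.
        names = [line.split("\t", 1)[0] for line in sorted_lines]
        start = bisect.bisect_left(names, name)
        end = bisect.bisect_right(names, name)
        return [split_tag_line(line) for line in sorted_lines[start:end]]

A plain binary search may land on any one of several duplicate tags, so this
sketch returns all matching entries and leaves the choice to the caller.
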

The best way to add extra text to the line for the new functionality, without
breaking it for Vi, is to put a comment in the {tagaddress}.  This gives the
freedom to use any text, and should work in any traditional Vi implementation.

For example, when the old tags file contains::

    main    main.c      /^main(argc, argv)$/
    DEBUG   defines.c   89

The new lines can be::

    main    main.c      /^main(argc, argv)$/;"any additional text
    DEBUG   defines.c   89;"any additional text

Note that the ';' is required to put the cursor in the right line, and then
the '"' is recognized as the start of a comment.

For Posix compliant Vi versions this will NOT work, since only a line number
or a search command is recognized.  I hope Posix can be adjusted.  Nvi suffers
from this.


Security
~~~~~~~~

Vi allows the use of any Ex command in a tags file.  This has the potential of
a trojan horse security leak.

The proposal is to allow only Ex commands that position the cursor in a single
file.  Other commands, like editing another file, quitting the editor,
changing a file or writing a file, are not allowed.  It is therefore logical
to call the command a tagaddress.

Specifically, these two Ex commands are allowed:

* A decimal line number::

     89

* A search command.  It is a regular expression pattern, as used by Vi,
  enclosed in // or ??::

     /^int c;$/
     ?main()?

There are two combinations possible:

* Concatenation of the above, with ';' in between.  The meaning is that the
  first line number or search command is used, the cursor is positioned in
  that line, and then the second search command is used (a line number would
  not be useful).  This can be done multiple times.  This is useful when the
  information in a single line is not unique, and the search needs to start
  in a specified line.
  ::

     /struct xyz {/;/int count;/
     389;/struct foo/;/char *s;/

* A trailing comment can be added, starting with ';"' (two characters:
  semi-colon and double-quote).  This is used below.
  ::

     89;" foo bar

This might be extended in the future.  What is currently missing is a way to
position the cursor in a certain column.


Goals
~~~~~

Now the usage of the comment text has to be defined.  The following is aimed
at:

1. Keep the text short, because:

   * The line length that Vi can handle is limited to 512 characters.
   * Tags files can contain thousands of tags.  I have seen tags files of
     several Mbytes.
   * More text makes searching slower.

2. Keep the text readable, because:

   * It is often necessary to check the output of a new ctags program.
   * Be able to edit the file by hand.
   * Make it easier to write a program to produce or parse the file.

3. Don't use special characters, because:

   * It should be possible to treat a tags file like any normal text file.

Proposal
~~~~~~~~

Use a comment after the {tagaddress} field.  The format would be::

    {tagname}<Tab>{tagfile}<Tab>{tagaddress}[;"<Tab>{tagfield}..]


{tagname}
    Any identifier, not containing white space.

    EXCEPTION: Universal Ctags violates this item of the proposal;
    name may contain spaces. However, tabs are not allowed.
    The conversion for some characters, including <Tab>, that is
    described for the "value" at the end of this section is applied.

<Tab>
    Exactly one TAB character (although many versions of Vi can
    handle any amount of white space).

{tagfile}
    The name of the file where {tagname} is defined, relative to
    the current directory (or location of the tags file?).

{tagaddress}
    Any Ex command.  When executed, it behaves like 'magic' was
    not set.
    It may be restricted to a line number or a search
    pattern (Posix).

Optionally:

;"
    semicolon + doublequote: Ends the tagaddress in a way that looks
    like the start of a comment to Vi.

{tagfield}
    See below.

A tagfield has a name, a colon, and a value: "name:value".

* The name consists only of alphabetic characters.  Upper and lower case
  are allowed.  Lower case is recommended.  Case matters ("kind:" and "Kind:"
  are different tagfields).

  EXCEPTION: Universal Ctags allows users to use a numerical character
  in the name other than its initial letter.

* The value may be empty.
  It cannot contain a <Tab>.

  - When a value contains a ``\t``, this stands for a <Tab>.
  - When a value contains a ``\r``, this stands for a <CR>.
  - When a value contains a ``\n``, this stands for a <NL>.
  - When a value contains a ``\\``, this stands for a single ``\`` character.

  Other use of the backslash character is reserved for future expansion.
  Warning: When a tagfield value holds an MS-DOS file name, the backslashes
  must be doubled!

  EXCEPTION: Universal Ctags introduces more conversion rules.

  - When a value contains a ``\a``, this stands for a <BEL> (0x07).
  - When a value contains a ``\b``, this stands for a <BS> (0x08).
  - When a value contains a ``\v``, this stands for a <VT> (0x0b).
  - When a value contains a ``\f``, this stands for a <FF> (0x0c).
  - The characters in the range 0x01 to 0x1F inclusive, and 0x7F, are
    converted to a ``\x`` prefixed hexadecimal number if the characters are
    not handled in the above "value" rules.
  - The leading space (0x20) and ``!`` (0x21) in {tagname} are converted
    to a ``\x`` prefixed hexadecimal number (``\x20`` and ``\x21``) if the
    tag is not a pseudo-tag. As described later, a pseudo-tag starts with
    ``!``.
    These rules are for distinguishing pseudo-tags and non pseudo-tags
    (regular tags) when tags lines in a tag file are sorted.

Proposed tagfield names:

=============== =============================================================================
FIELD-NAME      DESCRIPTION
=============== =============================================================================
arity           Number of arguments for a function tag.

class           Name of the class for which this tag is a member or method.

enum            Name of the enumeration in which this tag is an enumerator.

file            Static (local) tag, with a scope of the specified file.  When
                the value is empty, {tagfile} is used.

function        Function in which this tag is defined.  Useful for local
                variables (and functions).  When functions nest (e.g., in
                Pascal), the function names are concatenated, separated with
                '/', so it looks like a path.

kind            Kind of tag.  The value depends on the language.  For C and
                C++ these kinds are recommended:

                c
                    class name

                d
                    define (from #define XXX)

                e
                    enumerator

                f
                    function or method name

                F
                    file name

                g
                    enumeration name

                m
                    member (of structure or class data)

                p
                    function prototype

                s
                    structure name

                t
                    typedef

                u
                    union name

                v
                    variable

                When this field is omitted, the kind of tag is undefined.

struct          Name of the struct in which this tag is a member.

union           Name of the union in which this tag is a member.
=============== =============================================================================


Note that these are mostly for C and C++.  When tags programs are written for
other languages, this list should be extended to include the used field names.
This will help users to be independent of the tags program used.
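
The tagfield rules in this section ("name:value" pairs after ``;"`` and the
backslash conversions) can be sketched as follows.  This is an illustration
in Python only, not code from any ctags implementation: the names
``unescape_value`` and ``parse_tag_line`` are hypothetical, unknown backslash
sequences are simply left unchanged, and a field without a colon is stored
under ``kind``, a shorthand the proposal describes further on.

.. code-block:: python

    import re

    # Escapes from the original proposal plus the Universal Ctags additions.
    _SIMPLE_ESCAPES = {
        "t": "\t", "r": "\r", "n": "\n", "\\": "\\",
        "a": "\a", "b": "\b", "v": "\v", "f": "\f",
    }
    _ESCAPE_RE = re.compile(r"\\(x[0-9A-Fa-f]{2}|.)")


    def unescape_value(value):
        """Decode backslash sequences in a tagfield value."""
        def replace(match):
            code = match.group(1)
            if code[0] == "x" and len(code) == 3:
                return chr(int(code[1:], 16))
            # Other sequences are reserved; keep them as written.
            return _SIMPLE_ESCAPES.get(code, match.group(0))
        return _ESCAPE_RE.sub(replace, value)


    def parse_tag_line(line):
        """Return (tagname, tagfile, tagaddress, fields) for one tag line."""
        tagname, tagfile, rest = line.rstrip("\r\n").split("\t", 2)
        # Assumption: the first ';"' followed by a tab ends the tagaddress.
        address, sep, comment = rest.partition(';"\t')
        fields = {}
        if sep:
            for item in comment.split("\t"):
                name, colon, value = item.partition(":")
                if colon:
                    fields[name] = unescape_value(value)  # last one wins
                else:
                    fields["kind"] = unescape_value(item)
        return tagname, tagfile, address, fields

Storing fields in a dictionary means that a repeated field name keeps only
its last value.
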

Examples::

    asdf     sub.cc   /^asdf()$/;"          new_field:some\svalue   file:
    foo_t    sub.h    /^typedef foo_t$/;"   kind:t
    func3    sub.p    /^func3()$/;"         function:/func1/func2   file:
    getflag  sub.c    /^getflag(arg)$/;"    kind:f   file:
    inc      sub.cc   /^inc()$/;"           file:    class:PipeBuf


The name of the "kind:" field can be omitted.  This is to reduce the size of
the tags file by about 15%.  A program reading the tags file can recognize the
"kind:" field by the missing ':'.  Examples::

    foo_t    sub.h    /^typedef foo_t$/;"   t
    getflag  sub.c    /^getflag(arg)$/;"    f   file:


Additional remarks:

* When a tagfield appears twice in a tag line, only the last one is used.


Note about line separators:

Vi traditionally runs on Unix systems, where the line separator is a single
linefeed character <NL>.  On MS-DOS and compatible systems <CR><NL> is the
standard line separator.  To increase portability, this line separator is also
supported.

On the Macintosh a single <CR> is used as the line separator.  Supporting this
on Unix systems causes problems, because most fgets() implementations don't see
the <CR> as a line separator.  Therefore the support for a <CR> as line
separator is limited to the Macintosh.

Summary:

============== ====================== =========================
line separator generated on           accepted on
============== ====================== =========================
<LF>           Unix                   Unix, MS-DOS, Macintosh
<CR>           Macintosh              Macintosh
<CR><LF>       MS-DOS                 Unix, MS-DOS, Macintosh
============== ====================== =========================

The characters <CR> and <LF> cannot be used inside a tag line.  This is not
mentioned elsewhere (because it's obvious).


Note about white space:

Vi allowed any white space to separate the tagname from the tagfile, and the
filename from the tagaddress.  This would need to be allowed for backwards
compatibility.
However, all known programs that generate tags use a single
<Tab> to separate fields.

There is a problem with using file names with embedded white space in the
tagfile field.  To work around this, the same special characters could be used
as in the new fields, for example ``\s``.  But, unfortunately, in MS-DOS the
backslash character is used to separate file names.  The file name
``c:\vim\sap`` contains ``\s``, but this is not a <Space>.  The number of
backslashes could be doubled, but that will add a lot of characters, and make
parsing the tags file slower and clumsier.

To avoid these problems, we will only allow a <Tab> to separate fields, and
not support a file name or tagname that contains a <Tab> character.  This
means that we are not 100% Vi compatible.  However, there is no known tags
program that uses something other than a <Tab> to separate the fields.  Only
when a user typed the tags file himself, or made his own program to generate a
tags file, could we run into problems.  To solve this, the tags file should be
filtered, to replace the arbitrary white space with a single <Tab>.  This Vi
command can be used::

    :%s/^\([^ ^I]*\)[ ^I]*\([^ ^I]*\)[ ^I]*/\1^I\2^I/

(replace ^I with a real <Tab>).


TAG FILE INFORMATION:

Pseudo-tag lines can be used to encode information into the tag file regarding
details about its content (e.g. have the tags been sorted?, are the optional
tagfields present?), and regarding the program used to generate the tag file.
This information can be used both to optimize use of the tag file (e.g.
enable/disable binary searching) and provide general information (what version
of the generator was used).

The names of the tags used in these lines may be suitably chosen to ensure
that when sorted, they will always be located near the first lines of the tag
file.  The use of "!_TAG_" is recommended.  Note that a rare tag like "!"
can sort before these lines.  The program reading the tags file should be
smart enough to skip over these tags.

The lines described below have been chosen to convey a select set of
information.

Tag lines providing information about the content of the tag file::

    !_TAG_FILE_FORMAT   {version-number}   /optional comment/
    !_TAG_FILE_SORTED   {0|1}              /0=unsorted, 1=sorted/

The {version-number} used in the tag file format line reserves the value of
"1" for tag files complying with the original UNIX vi/ctags format, and
reserves the value "2" for tag files complying with this proposal.  This value
may be used to determine if the extended features described in this proposal
are present.

Tag lines providing information about the program used to generate the tag
file, and provided solely for documentation purposes::

    !_TAG_PROGRAM_AUTHOR    {author-name}    /{email-address}/
    !_TAG_PROGRAM_NAME      {program-name}   /optional comment/
    !_TAG_PROGRAM_URL       {URL}            /optional comment/
    !_TAG_PROGRAM_VERSION   {version-id}     /optional comment/

EXCEPTION: Universal Ctags introduces more kinds of pseudo-tags.
See :ref:`ctags-client-tools(7) <ctags-client-tools(7)>` about them.

----


Exceptions in Universal Ctags
--------------------------------------------

Universal Ctags supports this proposal with some
exceptions.


Exceptions
~~~~~~~~~~~

#. {tagname} in a tags file generated by Universal Ctags may contain
   spaces and several escape sequences. Parsers for documents like TeX and
   reStructuredText, or liberal languages such as JavaScript, need these
   exceptions. See {tagname} in the Proposal section for more detail about the
   conversion.

#. The "name" part of {tagfield} in a tag generated by Universal Ctags may
   contain numeric characters, but the first character of the "name"
   must be alphabetic.

   .. NOT REVIEWED YET (above item)

.. _compat-output:

Compatible output and weakness
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. NOT REVIEWED YET

The default behavior (``--output-format=u-ctags`` option) includes these
exceptions.  On the other hand, with the ``--output-format=e-ctags`` option
ctags applies no exceptions; the Universal Ctags command may use the same file
format as Exuberant Ctags. However, ``--output-format=e-ctags`` throws
away a tag entry whose name includes a space or a tab
character. The ``TAG_OUTPUT_MODE`` pseudo-tag tells which format was
used when ctags generated the tags file.

SEE ALSO
--------
:ref:`ctags(1) <ctags(1)>`, :ref:`ctags-client-tools(7) <ctags-client-tools(7)>`, :ref:`ctags-incompatibilities(7) <ctags-incompatibilities(7)>`, :ref:`readtags(1) <readtags(1)>`