.. _tags(5):

==============================================================
tags
==============================================================

Vi tags file format extended in ctags projects

:Version: 2+
:Manual group: Universal Ctags
:Manual section: 5

DESCRIPTION
-----------

The contents of the next section are a copy of the FORMAT file in the
Exuberant Ctags source code, in its Subversion repository at sourceforge.net.

Exceptions introduced in Universal Ctags are explained inline with
an "EXCEPTION" marker.

----

Proposal for extended Vi tags file format
-----------------------------------------

| Version: 0.06 DRAFT
| Date: 1998 Feb 8
| Author: Bram Moolenaar <Bram at vim.org> and Darren Hiebert <dhiebert at users.sourceforge.net>

Introduction
~~~~~~~~~~~~

The file format for the "tags" file, as used by Vi and many of its
descendants, has limited capabilities.

This additional functionality is desired:

1. Static or local tags.
   The scope of these tags is the file where they are defined.  The same tag
   can appear in several files, without really being a duplicate.
2. Duplicate tags.
   Allow the same tag to occur more than once.  They can be located in
   a different file and/or have a different command.
3. Support for C++.
   A tag is not only specified by its name, but also by the context (the
   class name).
4. Future extension.
   When even more additional functionality is desired, it must be possible to
   add this later, without breaking programs that don't support it.


From proposal to standard
~~~~~~~~~~~~~~~~~~~~~~~~~

To make this proposal into a standard for tags files, it needs to be supported
by most people working on versions of Vi, ctags, etc.  Currently this
standard is supported by:

Darren Hiebert <dhiebert at users.sourceforge.net>
	Exuberant Ctags

Bram Moolenaar <Bram at vim.org>
	Vim (Vi IMproved)

These have been or will be asked to support this standard:

Nvi
		Keith Bostic <bostic at bsdi.com>

Vile
		Tom E. Dickey <dickey at clark.net>

NEdit
		Mark Edel <edel at ltx.com>

CRiSP
		Paul Fox <fox at crisp.demon.co.uk>

Lemmy
		James Iuliano <jai at accessone.com>

Zeus
		Jussi Jumppanen <jussij at ca.com.au>

Elvis
		Steve Kirkendall <kirkenda at cs.pdx.edu>

FTE
		Marko Macek <Marko.Macek at snet.fri.uni-lj.si>


Backwards compatibility
~~~~~~~~~~~~~~~~~~~~~~~

A tags file that is generated in the new format should still be usable by Vi.
This makes it possible to distribute tags files that are usable by all
versions and descendants of Vi.

This restricts the format to what Vi can handle.  The format is:

1. The tags file is a list of lines, each line in the format::

	{tagname}<Tab>{tagfile}<Tab>{tagaddress}


   {tagname}
	Any identifier, not containing white space.

	EXCEPTION: Universal Ctags violates this item of the proposal;
	tagname may contain spaces. However, tabs are not allowed.

   <Tab>
	Exactly one TAB character (although many versions of Vi can
	handle any amount of white space).

   {tagfile}
	The name of the file where {tagname} is defined, relative to
	the current directory (or location of the tags file?).

   {tagaddress}
	Any Ex command.  When executed, it behaves like 'magic' was
	not set.

2. The tags file is sorted on {tagname}.  This allows for a binary search in
   the file.

3. Duplicate tags are allowed, but which one is actually used is
   unpredictable (because of the binary search).
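
As a rough illustration of items 2 and 3, the following Python sketch looks
up a tag in a sorted list of tag lines with a binary search.  The function
name ``find_tags`` and the sample lines are hypothetical, and the sketch
assumes the lines are sorted by plain code-point comparison of {tagname}.  A
real reader would probably bisect over file offsets instead of building the
whole key list in memory.

```python
import bisect


def find_tags(sorted_lines, name):
    """Return every line whose {tagname} equals name, using binary search."""
    # The key of a line is the text before the first <Tab>.
    keys = [line.split("\t", 1)[0] for line in sorted_lines]
    lo = bisect.bisect_left(keys, name)
    hi = bisect.bisect_right(keys, name)
    # Duplicate tags sit next to each other, so they come back together.
    return sorted_lines[lo:hi]


# Hypothetical sample data, sorted on {tagname} only.
lines = sorted(
    [
        "main\tmain.c\t/^main(argc, argv)$/",
        "DEBUG\tdefines.c\t89",
        "main\tother.c\t/^main()$/",
    ],
    key=lambda line: line.split("\t", 1)[0],
)

matches = find_tags(lines, "main")
```

Returning the whole run of equal keys sidesteps the "unpredictable" choice
between duplicates mentioned in item 3; a traditional Vi simply uses whichever
entry the search happens to land on.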

The best way to add extra text to the line for the new functionality, without
breaking it for Vi, is to put a comment in the {tagaddress}.  This gives the
freedom to use any text, and should work in any traditional Vi implementation.

For example, when the old tags file contains::

	main	main.c	/^main(argc, argv)$/
	DEBUG	defines.c	89

The new lines can be::

	main	main.c	/^main(argc, argv)$/;"any additional text
	DEBUG	defines.c	89;"any additional text

Note that the ';' is required to put the cursor in the right line, and then
the '"' is recognized as the start of a comment.

For Posix compliant Vi versions this will NOT work, since only a line number
or a search command is recognized.  I hope Posix can be adjusted.  Nvi suffers
from this.
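
The effect of the ``;"`` marker can be sketched in Python as follows.  The
function name ``split_tag_line`` is hypothetical, and the sketch makes one
simplifying assumption that is stated loudly here: it cuts at the first
``;"`` in the line, which is only correct when the search pattern itself
does not contain that two-character sequence.

```python
def split_tag_line(line):
    """Split a tag line into (tagname, tagfile, tagaddress, extension).

    extension is None when the line has no ';"' comment.
    ASSUMPTION: the tagaddress itself does not contain ';"'.
    """
    tagname, tagfile, rest = line.rstrip("\r\n").split("\t", 2)
    # Everything after ';"' is a comment as far as Vi is concerned.
    tagaddress, marker, extension = rest.partition(';"')
    return tagname, tagfile, tagaddress, (extension if marker else None)


old_style = split_tag_line("DEBUG\tdefines.c\t89")
new_style = split_tag_line('main\tmain.c\t/^main(argc, argv)$/;"any additional text')
```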


Security
~~~~~~~~

Vi allows the use of any Ex command in a tags file.  This has the potential of
a trojan horse security leak.

The proposal is to allow only Ex commands that position the cursor in a single
file.  Other commands, like editing another file, quitting the editor,
changing a file or writing a file, are not allowed.  It is therefore logical
to call the command a tagaddress.

Specifically, these two Ex commands are allowed:

* A decimal line number::

	89

* A search command.  It is a regular expression pattern, as used by Vi,
  enclosed in // or ??::

	/^int c;$/
	?main()?

There are two combinations possible:

* Concatenation of the above, with ';' in between.  The meaning is that the
  first line number or search command is used, the cursor is positioned in
  that line, and then the second search command is used (a line number would
  not be useful).  This can be done multiple times.  This is useful when the
  information in a single line is not unique, and the search needs to start
  in a specified line.
  ::

	/struct xyz {/;/int count;/
	389;/struct foo/;/char *s;/

* A trailing comment can be added, starting with ';"' (two characters:
  semi-colon and double-quote).  This is used below.
  ::

	89;" foo bar

This might be extended in the future.  What is currently missing is a way to
position the cursor in a certain column.
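
A reader that wants to enforce the restriction above could scan the
{tagaddress} instead of handing it to an Ex interpreter.  The Python sketch
below is one possible way to do that; the name ``scan_tagaddress`` is
hypothetical, and it assumes that a backslash inside a search pattern escapes
the following character.  For simplicity it also accepts a line number after
a ';', which the text above describes as allowed but not useful.

```python
DIGITS = "0123456789"


def scan_tagaddress(text):
    """Return the index just past a valid tagaddress at the start of text.

    Raises ValueError when text does not start with a decimal line number
    or a search pattern, optionally chained with ';'.
    """
    i = 0
    while True:
        if i < len(text) and text[i] in DIGITS:
            # A decimal line number.
            while i < len(text) and text[i] in DIGITS:
                i += 1
        elif i < len(text) and text[i] in "/?":
            # A search pattern: skip to the matching, unescaped delimiter.
            delim = text[i]
            i += 1
            while i < len(text) and text[i] != delim:
                i += 2 if text[i] == "\\" else 1
            if i >= len(text):
                raise ValueError("unterminated search pattern")
            i += 1  # consume the closing delimiter
        else:
            raise ValueError("expected a line number or a search pattern")
        # A ';' continues the chain unless it begins the ';"' comment marker.
        if text[i:i + 1] == ";" and text[i + 1:i + 2] != '"':
            i += 1
            continue
        return i
```

Because the scanner stops in front of ``;"``, the returned index can also
be used to find where the trailing comment begins, even when a pattern
contains a ';' of its own.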


Goals
~~~~~

Now the usage of the comment text has to be defined.  The following is aimed
at:

1. Keep the text short, because:

   * The line length that Vi can handle is limited to 512 characters.
   * Tags files can contain thousands of tags.  I have seen tags files of
     several Mbytes.
   * More text makes searching slower.

2. Keep the text readable, because:

   * It is often necessary to check the output of a new ctags program.
   * Be able to edit the file by hand.
   * Make it easier to write a program to produce or parse the file.

3. Don't use special characters, because:

   * It should be possible to treat a tags file like any normal text file.

Proposal
~~~~~~~~

Use a comment after the {tagaddress} field.  The format would be::

	{tagname}<Tab>{tagfile}<Tab>{tagaddress}[;"<Tab>{tagfield}..]


{tagname}
	Any identifier, not containing white space.

	EXCEPTION: Universal Ctags violates this item of the proposal;
	name may contain spaces. However, tabs are not allowed.
	The conversion for some characters, including <Tab>, that is
	described for the "value" at the end of this section is also
	applied to the name.

<Tab>
	Exactly one TAB character (although many versions of Vi can
	handle any amount of white space).

{tagfile}
	The name of the file where {tagname} is defined, relative to
	the current directory (or location of the tags file?).

{tagaddress}
	Any Ex command.  When executed, it behaves like 'magic' was
	not set.  It may be restricted to a line number or a search
	pattern (Posix).

Optionally:

;"
		semicolon + doublequote: Ends the tagaddress in a way that
		looks like the start of a comment to Vi.

{tagfield}
		See below.

A tagfield has a name, a colon, and a value: "name:value".

* The name consists only of alphabetical characters.  Upper and lower case
  are allowed.  Lower case is recommended.  Case matters ("kind:" and "Kind:"
  are different tagfields).

  EXCEPTION: Universal Ctags allows numerical characters in the name,
  except as its first character.

* The value may be empty.
  It cannot contain a <Tab>.

  - When a value contains a ``\t``, this stands for a <Tab>.
  - When a value contains a ``\r``, this stands for a <CR>.
  - When a value contains a ``\n``, this stands for a <NL>.
  - When a value contains a ``\\``, this stands for a single ``\`` character.

  Other use of the backslash character is reserved for future expansion.
  Warning: When a tagfield value holds an MS-DOS file name, the backslashes
  must be doubled!

  EXCEPTION: Universal Ctags introduces more conversion rules.

  - When a value contains a ``\a``, this stands for a <BEL> (0x07).
  - When a value contains a ``\b``, this stands for a <BS> (0x08).
  - When a value contains a ``\v``, this stands for a <VT> (0x0b).
  - When a value contains a ``\f``, this stands for a <FF> (0x0c).
  - The characters in the range 0x01 to 0x1F inclusive, and 0x7F, are
    converted to a ``\x`` prefixed hexadecimal number if they are
    not handled by the above "value" rules.
  - A leading space (0x20) or ``!`` (0x21) in {tagname} is converted
    to a ``\x`` prefixed hexadecimal number (``\x20`` or ``\x21``) if the
    tag is not a pseudo-tag. As described later, a pseudo-tag starts with
    ``!``. These rules make it possible to distinguish pseudo-tags from
    non pseudo-tags (regular tags) when the tag lines in a tag file are sorted.
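
The conversion rules above can be reversed when reading a tags file.  The
following Python sketch shows one way to decode a value; the name
``unescape_value`` is hypothetical.  It assumes that ``\x`` is always
followed by exactly two hexadecimal digits, as in the ``\x20`` and ``\x21``
examples, and it leaves any other backslash sequence untouched because those
are reserved.

```python
import re

# Escapes from the original proposal plus the Universal Ctags additions.
_SIMPLE = {
    "t": "\t", "r": "\r", "n": "\n", "\\": "\\",
    "a": "\a", "b": "\b", "v": "\v", "f": "\f",
}
# ASSUMPTION: a hexadecimal escape is '\x' followed by exactly two hex digits.
_ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{2}|.)")


def unescape_value(value):
    """Decode the backslash sequences in a tagfield value."""

    def replace(match):
        body = match.group(1)
        if body in _SIMPLE:
            return _SIMPLE[body]
        if body.startswith("x") and len(body) == 3:
            return chr(int(body[1:], 16))
        return match.group(0)  # reserved sequence: keep it as written

    return _ESCAPE.sub(replace, value)
```

Decoding in a single left-to-right pass matters: a doubled backslash followed
by ``t`` must yield a backslash and a ``t``, not a <Tab>.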

Proposed tagfield names:

=============== =============================================================================
FIELD-NAME	DESCRIPTION
=============== =============================================================================
arity		Number of arguments for a function tag.

class		Name of the class for which this tag is a member or method.

enum		Name of the enumeration in which this tag is an enumerator.

file		Static (local) tag, with a scope of the specified file.  When
		the value is empty, {tagfile} is used.

function	Function in which this tag is defined.  Useful for local
		variables (and functions).  When functions nest (e.g., in
		Pascal), the function names are concatenated, separated with
		'/', so it looks like a path.

kind		Kind of tag.  The value depends on the language.  For C and
		C++ these kinds are recommended:

		c
			class name

		d
			define (from #define XXX)

		e
			enumerator

		f
			function or method name

		F
			file name

		g
			enumeration name

		m
			member (of structure or class data)

		p
			function prototype

		s
			structure name

		t
			typedef

		u
			union name

		v
			variable

		When this field is omitted, the kind of tag is undefined.

struct		Name of the struct in which this tag is a member.

union		Name of the union in which this tag is a member.
=============== =============================================================================


Note that these are mostly for C and C++.  When tags programs are written for
other languages, this list should be extended to include the used field names.
This will help users to be independent of the tags program used.

Examples::

	asdf	sub.cc	/^asdf()$/;"	new_field:some\svalue	file:
	foo_t	sub.h	/^typedef foo_t$/;"	kind:t
	func3	sub.p	/^func3()$/;"	function:/func1/func2	file:
	getflag	sub.c	/^getflag(arg)$/;"	kind:f	file:
	inc	sub.cc	/^inc()$/;"	file: class:PipeBuf


The name of the "kind:" field can be omitted.  This is to reduce the size of
the tags file by about 15%.  A program reading the tags file can recognize the
"kind:" field by the missing ':'.  Examples::

	foo_t	sub.h	/^typedef foo_t$/;"	t
	getflag	sub.c	/^getflag(arg)$/;"	f	file:


Additional remarks:

* When a tagfield appears twice in a tag line, only the last one is used.
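
The tagfield rules above (the "name:value" form, the omitted "kind:" name,
and the last occurrence winning) can be sketched in Python as shown below.
The function name ``parse_tagfields`` is hypothetical, and the sketch assumes
its input is the text that follows ``;"`` with fields separated by single
<Tab> characters.  Values are returned as written, without undoing any
backslash sequences.

```python
def parse_tagfields(extension):
    """Parse the text after ';"' into a dict of tagfield name -> value."""
    fields = {}
    for item in extension.split("\t"):
        if not item:
            continue  # e.g. the empty piece before the leading <Tab>
        name, colon, value = item.partition(":")
        if not colon:
            # No ':' at all: a "kind:" field with its name omitted.
            name, value = "kind", item
        fields[name] = value  # a repeated field overwrites the earlier one
    return fields


short_kind = parse_tagfields("\tf\tfile:")
nested = parse_tagfields("\tfunction:/func1/func2\tfile:")
```

An empty ``file:`` value is kept as an empty string, so a caller can still
tell "static tag, scope is {tagfile}" apart from "no file field at all".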


Note about line separators:

Vi traditionally runs on Unix systems, where the line separator is a single
linefeed character <NL>.  On MS-DOS and compatible systems <CR><NL> is the
standard line separator.  To increase portability, this line separator is also
supported.

On the Macintosh a single <CR> is used as the line separator.  Supporting this
on Unix systems causes problems, because most fgets() implementations don't see
the <CR> as a line separator.  Therefore the support for a <CR> as line
separator is limited to the Macintosh.

Summary:

==============  ======================  =========================
line separator	generated on		accepted on
==============  ======================  =========================
<LF>		Unix			Unix, MS-DOS, Macintosh
<CR>		Macintosh		Macintosh
<CR><LF>	MS-DOS			Unix, MS-DOS, Macintosh
==============  ======================  =========================

The characters <CR> and <LF> cannot be used inside a tag line.  This is not
mentioned elsewhere (because it's obvious).
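
The summary table can be expressed as a small Python helper.  This is only a
sketch: the name ``split_tag_lines`` and its ``accept_lone_cr`` parameter are
hypothetical, with the parameter standing in for "running on the Macintosh"
in the table above.

```python
def split_tag_lines(data, accept_lone_cr=False):
    """Split the raw bytes of a tags file into lines without separators."""
    if accept_lone_cr:
        # bytes.splitlines() breaks at <LF>, <CR> and <CR><LF>.
        return data.splitlines()
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # the file ended with a line separator
    # Accept <CR><LF> by dropping a <CR> that directly precedes the <LF>.
    return [line[:-1] if line.endswith(b"\r") else line for line in lines]
```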


Note about white space:

Vi allowed any white space to separate the tagname from the tagfile, and the
filename from the tagaddress.  This would need to be allowed for backwards
compatibility.  However, all known programs that generate tags use a single
<Tab> to separate fields.

There is a problem for using file names with embedded white space in the
tagfile field.  To work around this, the same special characters could be used
as in the new fields, for example ``\s``.  But, unfortunately, in MS-DOS the
backslash character is used to separate file names.  The file name
``c:\vim\sap`` contains ``\s``, but this is not a <Space>.  The number of
backslashes could be doubled, but that will add a lot of characters, and make
parsing the tags file slower and clumsier.

To avoid these problems, we will only allow a <Tab> to separate fields, and
not support a file name or tagname that contains a <Tab> character.  This
means that we are not 100% Vi compatible.  However, there is no known tags
program that uses anything other than a <Tab> to separate the fields.
Problems could arise only when a user typed the tags file by hand, or wrote a
custom program to generate a tags file.  To solve this, the tags file should be
filtered, to replace the arbitrary white space with a single <Tab>.  This Vi
command can be used::

	:%s/^\([^ ^I]*\)[ ^I]*\([^ ^I]*\)[ ^I]*/\1^I\2^I/

(replace ^I with a real <Tab>).
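
For readers who prefer to filter the file outside of Vi, the same
substitution can be written in Python.  This is a sketch of an equivalent
filter, not part of the proposal; the name ``normalize_separators`` is
hypothetical.  Like the Vi command, it assumes that neither the tagname nor
the file name contains white space, so it is not suitable for tag names with
embedded spaces.

```python
import re

# Same pattern as the Vi command above, with ^I written as \t.
_FIELDS = re.compile(r"^([^ \t]*)[ \t]*([^ \t]*)[ \t]*")


def normalize_separators(line):
    """Replace the white space after the first two fields with one <Tab>."""
    return _FIELDS.sub(r"\1\t\2\t", line, count=1)


fixed = normalize_separators("main   main.c \t /^main(argc, argv)$/")
```

Only the first two separators are rewritten, so white space inside the
{tagaddress} is left exactly as it was.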


TAG FILE INFORMATION:

Pseudo-tag lines can be used to encode information into the tag file regarding
details about its content (e.g. have the tags been sorted?, are the optional
tagfields present?), and regarding the program used to generate the tag file.
This information can be used both to optimize use of the tag file (e.g.
enable/disable binary searching) and provide general information (what version
of the generator was used).

The names of the tags used in these lines may be suitably chosen to ensure
that when sorted, they will always be located near the first lines of the tag
file.  The use of "!_TAG_" is recommended.  Note that a rare tag like "!"
can sort to before these lines.  The program reading the tags file should be
smart enough to skip over these tags.

The lines described below have been chosen to convey a select set of
information.

Tag lines providing information about the content of the tag file::

    !_TAG_FILE_FORMAT	{version-number}	/optional comment/
    !_TAG_FILE_SORTED	{0|1}			/0=unsorted, 1=sorted/

The {version-number} used in the tag file format line reserves the value of
"1" for tag files complying with the original UNIX vi/ctags format, and
reserves the value "2" for tag files complying with this proposal. This value
may be used to determine if the extended features described in this proposal
are present.

Tag lines providing information about the program used to generate the tag
file, and provided solely for documentation purposes::

    !_TAG_PROGRAM_AUTHOR	{author-name}	/{email-address}/
    !_TAG_PROGRAM_NAME	{program-name}	/optional comment/
    !_TAG_PROGRAM_URL	{URL}	/optional comment/
    !_TAG_PROGRAM_VERSION	{version-id}	/optional comment/

EXCEPTION: Universal Ctags introduces more kinds of pseudo-tags.
See :ref:`ctags-client-tools(7) <ctags-client-tools(7)>` about them.
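
A reader can collect the pseudo-tags before deciding how to search the file.
The Python sketch below shows one possible approach; the name
``read_pseudo_tags`` and the sample lines are hypothetical.  It assumes the
fields of a pseudo-tag line are separated by single <Tab> characters, and it
only recognizes names that use the recommended "!_TAG_" prefix.

```python
PREFIX = "!_TAG_"


def read_pseudo_tags(lines):
    """Collect pseudo-tags as {name without prefix: (value, comment)}."""
    info = {}
    for line in lines:
        if not line.startswith(PREFIX):
            continue  # a regular tag, possibly a rare one such as "!"
        parts = line.rstrip("\r\n").split("\t", 2)
        name = parts[0][len(PREFIX):]
        value = parts[1] if len(parts) > 1 else ""
        comment = parts[2] if len(parts) > 2 else ""
        info[name] = (value, comment)
    return info


# Hypothetical sample: two pseudo-tags followed by a regular tag.
sample = [
    "!_TAG_FILE_FORMAT\t2\t/extended format/\n",
    "!_TAG_FILE_SORTED\t1\t/0=unsorted, 1=sorted/\n",
    "main\tmain.c\t/^main(argc, argv)$/\n",
]
info = read_pseudo_tags(sample)
use_binary_search = info.get("FILE_SORTED", ("0", ""))[0] == "1"
```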

----


Exceptions in Universal Ctags
--------------------------------------------

Universal Ctags supports this proposal with some
exceptions.


Exceptions
~~~~~~~~~~~

#. {tagname} in a tags file generated by Universal Ctags may contain
   spaces and several escape sequences. Parsers for document formats such as
   TeX and reStructuredText, or for liberal languages such as JavaScript, need
   these exceptions. See {tagname} in the Proposal section for more detail
   about the conversion.

#. The "name" part of {tagfield} in a tag generated by Universal Ctags may
   contain numeric characters, but the first character of the "name"
   must be alphabetic.

   .. NOT REVIEWED YET (above item)

.. _compat-output:

Compatible output and weakness
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. NOT REVIEWED YET

The default behavior (``--output-format=u-ctags``) includes these
exceptions.  On the other hand, with ``--output-format=e-ctags``,
ctags applies none of them; Universal Ctags may then use the same file
format as Exuberant Ctags. However, ``--output-format=e-ctags`` throws
away any tag entry whose name includes a space or a tab
character. The ``TAG_OUTPUT_MODE`` pseudo-tag tells which format was
used when ctags generated the tags file.

SEE ALSO
--------
:ref:`ctags(1) <ctags(1)>`, :ref:`ctags-client-tools(7) <ctags-client-tools(7)>`, :ref:`ctags-incompatibilities(7) <ctags-incompatibilities(7)>`, :ref:`readtags(1) <readtags(1)>`
