Proposal for extended Vi tags file format
=========================================

Version: 0.06 DRAFT
   Date: 1998 Feb 8
 Author: Bram Moolenaar <Bram at vim.org>
    and: Darren Hiebert <dhiebert at users.sourceforge.net>


1. Introduction
---------------

The file format for the "tags" file, as used by Vi and many of its
descendants, has limited capabilities.

This additional functionality is desired:

1. Static or local tags.
   The scope of these tags is the file where they are defined.  The same tag
   can appear in several files, without really being a duplicate.
2. Duplicate tags.
   Allow the same tag to occur more than once.  They can be located in
   a different file and/or have a different command.
3. Support for C++.
   A tag is not only specified by its name, but also by the context (the
   class name).
4. Future extension.
   When even more additional functionality is desired, it must be possible to
   add this later, without breaking programs that don't support it.


2. From proposal to standard
----------------------------

To make this proposal into a standard for tags files, it needs to be supported
by most people working on versions of Vi, ctags, etc.  Currently this
standard is supported by:

Darren Hiebert <dhiebert at users.sourceforge.net>   Exuberant ctags
Bram Moolenaar <Bram at vim.org>                     Vim (Vi IMproved)

These have been or will be asked to support this standard:

Nvi		Keith Bostic <bostic at bsdi.com>
Vile		Tom E. Dickey <dickey at clark.net>
NEdit		Mark Edel <edel at ltx.com>
CRiSP		Paul Fox <fox at crisp.demon.co.uk>
Lemmy		James Iuliano <jai at accessone.com>
Zeus		Jussi Jumppanen <jussij at ca.com.au>
Elvis		Steve Kirkendall <kirkenda at cs.pdx.edu>
FTE		Marko Macek <Marko.Macek at snet.fri.uni-lj.si>


3. Backwards compatibility
--------------------------

A tags file that is generated in the new format should still be usable by Vi.
This makes it possible to distribute tags files that are usable by all
versions and descendants of Vi.

This restricts the format to what Vi can handle.  The format is:

1. The tags file is a list of lines, each line in the format:

	{tagname}<Tab>{tagfile}<Tab>{tagaddress}

   {tagname}	Any identifier, not containing white space.
   <Tab>	Exactly one TAB character (although many versions of Vi can
		handle any amount of white space).
   {tagfile}	The name of the file where {tagname} is defined, relative to
   		the current directory (or location of the tags file?).
   {tagaddress}	Any Ex command.  When executed, it behaves like 'magic' was
		not set.

2. The tags file is sorted on {tagname}.  This allows for a binary search in
   the file.

3. Duplicate tags are allowed, but which one is actually used is
   unpredictable (because of the binary search).
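
As a rough illustration of items 2 and 3, the sketch below looks up a tag name
in a list of lines that is assumed to be sorted on {tagname} in plain
character order.  The helper name find_tags is hypothetical and not part of
any tags program.  For clarity it builds a list of all tag names first; a real
reader would compare lines in place instead.

```python
from bisect import bisect_left


def find_tags(lines, name):
    """Return every line whose {tagname} equals `name`.

    Assumes `lines` is sorted on {tagname} by plain character order.
    """
    # The first field (up to the first <Tab>) is the {tagname}.
    names = [line.split("\t", 1)[0] for line in lines]
    i = bisect_left(names, name)  # leftmost line that could match
    matches = []
    while i < len(names) and names[i] == name:
        matches.append(lines[i])
        i += 1
    return matches


tags = [
    "DEBUG\tdefines.c\t89",
    "main\tmain.c\t/^main(argc, argv)$/",
    "main\tother.c\t/^main()$/",
]
print(find_tags(tags, "main"))
```

A binary search that stops at the first hit could land on either "main" line,
which is the unpredictability mentioned in item 3.  Starting from the leftmost
match and stepping forward collects all duplicates.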

The best way to add extra text to the line for the new functionality, without
breaking it for Vi, is to put a comment in the {tagaddress}.  This gives the
freedom to use any text, and should work in any traditional Vi implementation.

For example, when the old tags file contains:

	main	main.c	/^main(argc, argv)$/
	DEBUG	defines.c	89

The new lines can be:

	main	main.c	/^main(argc, argv)$/;"any additional text
	DEBUG	defines.c	89;"any additional text

Note that the ';' is required to put the cursor in the right line, and then
the '"' is recognized as the start of a comment.

For Posix compliant Vi versions this will NOT work, since only a line number
or a search command is recognized.  I hope Posix can be adjusted.  Nvi suffers
from this.


4. Security
-----------

Vi allows the use of any Ex command in a tags file.  This has the potential of
a trojan horse security leak.

The proposal is to allow only Ex commands that position the cursor in a single
file.  Other commands, like editing another file, quitting the editor,
changing a file or writing a file, are not allowed.  It is therefore logical
to call the command a tagaddress.

Specifically, these two Ex commands are allowed:
- A decimal line number.
	89
- A search command.  It is a regular expression pattern, as used by Vi,
  enclosed in // or ??.
	/^int c;$/
	?main()?

There are two combinations possible:
- Concatenation of the above, with ';' in between.  The meaning is that the
  first line number or search command is used, the cursor is positioned in
  that line, and then the second search command is used (a line number would
  not be useful).  This can be done multiple times.  This is useful when the
  information in a single line is not unique, and the search needs to start
  in a specified line.
	/struct xyz {/;/int count;/
	389;/struct foo/;/char *s;/
- A trailing comment can be added, starting with ';"' (two characters:
  semi-colon and double-quote).  This is used below.
	89;" foo bar

This might be extended in the future.  What is currently missing is a way to
position the cursor in a certain column.
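
The restriction above could be checked along the following lines.  This is
only a sketch: the function name is_safe_tagaddress is hypothetical, a
backslash inside a pattern is assumed to escape the next character, and a line
number is accepted at any position even though it is only useful as the first
item.

```python
DIGITS = "0123456789"


def is_safe_tagaddress(address):
    """Return True if `address` only uses the cursor-positioning forms above."""
    i, n = 0, len(address)
    while True:
        if i < n and address[i] in DIGITS:
            # A decimal line number.
            while i < n and address[i] in DIGITS:
                i += 1
        elif i < n and address[i] in "/?":
            # A search command: scan to the matching, unescaped delimiter.
            delim = address[i]
            i += 1
            while i < n and address[i] != delim:
                i += 2 if address[i] == "\\" else 1
            if i >= n:
                return False  # unterminated pattern
            i += 1  # step over the closing delimiter
        else:
            return False  # any other Ex command is rejected
        if i == n:
            return True
        if address[i] != ";":
            return False
        if address.startswith(';"', i):
            return True  # trailing comment: the rest is free text
        i += 1  # ';' joins this item to the next one


print(is_safe_tagaddress("389;/struct foo/;/char *s;/"))
print(is_safe_tagaddress("89|q!"))
```

The scanner accepts every example address in this section and rejects
anything that is not a line number or a search, such as ":e other.c".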


5. Goals
--------

Now the usage of the comment text has to be defined.  The following is aimed
at:

1. Keep the text short, because:
   - The line length that Vi can handle is limited to 512 characters.
   - Tags files can contain thousands of tags.  I have seen tags files of
     several Mbytes.
   - More text makes searching slower.
2. Keep the text readable, because:
   - It is often necessary to check the output of a new ctags program.
   - It should be possible to edit the file by hand.
   - It is easier to write a program to produce or parse the file.
3. Don't use special characters, because:
   - It should be possible to treat a tags file like any normal text file.


6. Proposal
-----------

Use a comment after the {tagaddress} field.  The format would be:

	{tagname}<Tab>{tagfile}<Tab>{tagaddress}[;"<Tab>{tagfield}..]

   {tagname}	Any identifier, not containing white space.
   <Tab>	Exactly one TAB character (although many versions of Vi can
		handle any amount of white space).
   {tagfile}	The name of the file where {tagname} is defined, relative to
   		the current directory (or location of the tags file?).
   {tagaddress}	Any Ex command.  When executed, it behaves like 'magic' was
		not set.  It may be restricted to a line number or a search
		pattern (Posix).
Optionally:
   ;"		semicolon + doublequote: Ends the tagaddress in a way that
		looks like the start of a comment to Vi.
   {tagfield}	See below.

A tagfield has a name, a colon, and a value: "name:value".
- The name consists only of alphabetical characters.  Upper and lower case
  are allowed.  Lower case is recommended.  Case matters ("kind:" and "Kind:"
  are different tagfields).
- The value may be empty.
  It cannot contain a <Tab>.
  When a value contains a "\t", this stands for a <Tab>.
  When a value contains a "\r", this stands for a <CR>.
  When a value contains a "\n", this stands for a <NL>.
  When a value contains a "\\", this stands for a single '\' character.
  Other use of the backslash character is reserved for future expansion.
  Warning: When a tagfield value holds an MS-DOS file name, the backslashes
  must be doubled!
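
The escape rules for tagfield values can be sketched as a pair of helpers.
The names escape_value and unescape_value are hypothetical.  Because other
uses of the backslash are reserved, this sketch simply leaves any unknown
sequence untouched, which is one possible choice rather than a requirement of
the proposal.

```python
_ESCAPES = {"t": "\t", "r": "\r", "n": "\n", "\\": "\\"}


def unescape_value(value):
    """Turn the four defined escape sequences back into real characters."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value) and value[i + 1] in _ESCAPES:
            out.append(_ESCAPES[value[i + 1]])
            i += 2
        else:
            # Reserved or incomplete sequences are kept as they are.
            out.append(ch)
            i += 1
    return "".join(out)


def escape_value(value):
    """Encode a raw value so that it contains no <Tab>, <CR> or <NL>."""
    # Double the backslashes first, so the escapes added next stay intact.
    value = value.replace("\\", "\\\\")
    return value.replace("\t", "\\t").replace("\r", "\\r").replace("\n", "\\n")


# An MS-DOS file name: every backslash is doubled in the tags file.
print(escape_value("c:\\vim\\sap"))
```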


Proposed tagfield names:

FIELD-NAME	DESCRIPTION

arity		Number of arguments for a function tag.

class		Name of the class for which this tag is a member or method.

enum		Name of the enumeration in which this tag is an enumerator.

file		Static (local) tag, with a scope of the specified file.  When
		the value is empty, {tagfile} is used.

function	Function in which this tag is defined.  Useful for local
		variables (and functions).  When functions nest (e.g., in
		Pascal), the function names are concatenated, separated with
		'/', so it looks like a path.

kind		Kind of tag.  The value depends on the language.  For C and
		C++ these kinds are recommended:
		c	class name
		d	define (from #define XXX)
		e	enumerator
		f	function or method name
		F	file name
		g	enumeration name
		m	member (of structure or class data)
		p	function prototype
		s	structure name
		t	typedef
		u	union name
		v	variable
		When this field is omitted, the kind of tag is undefined.

struct		Name of the struct in which this tag is a member.

union		Name of the union in which this tag is a member.


Note that these are mostly for C and C++.  When tags programs are written for
other languages, this list should be extended to include the used field names.
This will help users to be independent of the tags program used.

Examples:

	asdf	sub.cc	/^asdf()$/;"	new_field:some\svalue	file:
	foo_t	sub.h	/^typedef foo_t$/;"	kind:t
	func3	sub.p	/^func3()$/;"	function:/func1/func2	file:
	getflag	sub.c	/^getflag(arg)$/;"	kind:f	file:
	inc	sub.cc	/^inc()$/;"	file:	class:PipeBuf


The name of the "kind:" field can be omitted.  This is to reduce the size of
the tags file by about 15%.  A program reading the tags file can recognize the
"kind:" field by the missing ':'.  Examples:

	foo_t	sub.h	/^typedef foo_t$/;"	t
	getflag	sub.c	/^getflag(arg)$/;"	f	file:


Additional remarks:
- When a tagfield appears twice in a tag line, only the last one is used.
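
Putting the rules of this section together, a reader for one tag line might
look like the sketch below.  The name parse_tag_line is hypothetical.  The
sketch assumes fields are separated by exactly one <Tab>, takes the first
occurrence of ;" followed by a <Tab> as the end of the {tagaddress} (a search
pattern containing that sequence would confuse it), and leaves values in
their escaped form.

```python
def parse_tag_line(line):
    """Split one tag line into its three core fields and a dict of tagfields."""
    tagname, tagfile, rest = line.split("\t", 2)
    # The optional part starts at the first ';"' that is followed by a <Tab>.
    address, marker, extension = rest.partition(';"\t')
    fields = {}
    if marker:
        for item in extension.split("\t"):
            if not item:
                continue  # tolerate a stray extra <Tab>
            name, colon, value = item.partition(":")
            if not colon:
                # No ':' at all: a "kind" value whose name was omitted.
                name, value = "kind", item
            fields[name] = value  # a repeated tagfield replaces the earlier one
    return tagname, tagfile, address, fields


line = 'getflag\tsub.c\t/^getflag(arg)$/;"\tf\tfile:'
print(parse_tag_line(line))
```

An empty "file:" value is returned as an empty string here; a caller would
substitute {tagfile} for it, as described in the field table.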


Note about line separators:

Vi traditionally runs on Unix systems, where the line separator is a single
linefeed character <NL>.  On MS-DOS and compatible systems <CR><NL> is the
standard line separator.  To increase portability, this line separator is also
supported.

On the Macintosh a single <CR> is used as the line separator.  Supporting this
on Unix systems causes problems, because most fgets() implementations don't
see the <CR> as a line separator.  Therefore the support for a <CR> as line
separator is limited to the Macintosh.

Summary:
line separator	generated on		accepted on
<LF>		Unix			Unix, MS-DOS, Macintosh
<CR>		Macintosh		Macintosh
<CR><LF>	MS-DOS			Unix, MS-DOS, Macintosh

The characters <CR> and <LF> cannot be used inside a tag line.  This is not
mentioned elsewhere (because it's obvious).
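
The summary table could be applied by a reader roughly as follows.  This is a
sketch only: split_tag_lines and its accept_bare_cr switch are hypothetical
names, and the switch stands in for "running on the Macintosh".

```python
def split_tag_lines(data, accept_bare_cr=False):
    """Split raw tags-file bytes into lines without their separators."""
    if accept_bare_cr:
        # Macintosh row of the table: <CR>, <LF> and <CR><LF> all end a line.
        data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # the final separator does not start another line
    # Elsewhere only <LF> and <CR><LF> are accepted: drop one trailing <CR>.
    return [ln[:-1] if ln.endswith(b"\r") else ln for ln in lines]


print(split_tag_lines(b"main\tmain.c\t1\r\nzap\tzap.c\t2\n"))
```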


Note about white space:

Vi allowed any white space to separate the tagname from the tagfile, and the
filename from the tagaddress.  This would need to be allowed for backwards
compatibility.  However, all known programs that generate tags use a single
<Tab> to separate fields.

There is a problem for using file names with embedded white space in the
tagfile field.  To work around this, the same special characters could be used
as in the new fields, for example "\s".  But, unfortunately, in MS-DOS the
backslash character is used to separate file names.  The file name
"c:\vim\sap" contains "\s", but this is not a <Space>.  The number of
backslashes could be doubled, but that will add a lot of characters, and make
parsing the tags file slower and clumsier.

To avoid these problems, we will only allow a <Tab> to separate fields, and
not support a file name or tagname that contains a <Tab> character.  This
means that we are not 100% Vi compatible.  However, there is no known tags
program that uses anything other than a <Tab> to separate the fields.
Problems can only arise when a user typed the tags file by hand, or wrote a
program of their own to generate it.  To solve this, the tags file should be
filtered, to replace the arbitrary white space with a single <Tab>.  This Vi
command can be used:

	:%s/^\([^ ^I]*\)[ ^I]*\([^ ^I]*\)[ ^I]*/\1^I\2^I/

(replace ^I with a real <Tab>).
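
For files that are filtered outside the editor, the same substitution can be
written as a small function.  This is a sketch that mirrors the Vi command
above one to one; the name normalize_separators is hypothetical.

```python
import re

# Same pattern as the Vi command: two runs of non-blank text, each followed
# by any amount of spaces and tabs, anchored at the start of the line.
_FIRST_TWO_FIELDS = re.compile(r"^([^ \t]*)[ \t]*([^ \t]*)[ \t]*")


def normalize_separators(line):
    """Rewrite the first two field separators of a tag line as single tabs."""
    return _FIRST_TWO_FIELDS.sub("\\1\t\\2\t", line, count=1)


print(normalize_separators("main   main.c \t /^int main(void)$/"))
```

Only the first two separators are touched, so spaces inside the {tagaddress}
are left alone.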


TAG FILE INFORMATION:

Pseudo-tag lines can be used to encode information into the tag file regarding
details about its content (e.g. have the tags been sorted?, are the optional
tagfields present?), and regarding the program used to generate the tag file.
This information can be used both to optimize use of the tag file (e.g.
enable/disable binary searching) and provide general information (what version
of the generator was used).

The names of the tags used in these lines may be suitably chosen to ensure
that when sorted, they will always be located near the first lines of the tag
file.  The use of "!_TAG_" is recommended.  Note that a rare tag like "!"
can sort to before these lines.  The program reading the tags file should be
smart enough to skip over these tags.

The lines described below have been chosen to convey a select set of
information.

Tag lines providing information about the content of the tag file:

!_TAG_FILE_FORMAT	{version-number}	/optional comment/
!_TAG_FILE_SORTED	{0|1}			/0=unsorted, 1=sorted/

The {version-number} used in the tag file format line reserves the value of
"1" for tag files complying with the original UNIX vi/ctags format, and
reserves the value "2" for tag files complying with this proposal. This value
may be used to determine if the extended features described in this proposal
are present.

Tag lines providing information about the program used to generate the tag
file, and provided solely for documentation purposes:

!_TAG_PROGRAM_AUTHOR	{author-name}	/{email-address}/
!_TAG_PROGRAM_NAME	{program-name}	/optional comment/
!_TAG_PROGRAM_URL	{URL}	/optional comment/
!_TAG_PROGRAM_VERSION	{version-id}	/optional comment/
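
A reader could collect these lines before deciding how to search the file.
The sketch below assumes the pseudo-tags use the recommended "!_TAG_" prefix
and sit at the top of the file; the name read_pseudo_tags is hypothetical.

```python
def read_pseudo_tags(lines, prefix="!_TAG_"):
    """Collect leading pseudo-tag lines as {name: (value, comment)}."""
    info = {}
    for line in lines:
        if not line.startswith(prefix):
            if line.startswith("!"):
                continue  # an ordinary tag such as "!" may sort in between
            break  # past the header block; ordinary tags follow
        name, value, comment = (line.split("\t", 2) + ["", ""])[:3]
        # Drop extra alignment white space and the enclosing slashes.
        info[name[len(prefix):]] = (value, comment.strip().strip("/"))
    return info


header = [
    "!_TAG_FILE_FORMAT\t2\t/extended format/",
    "!_TAG_FILE_SORTED\t1\t/0=unsorted, 1=sorted/",
    "main\tmain.c\t/^main(argc, argv)$/",
]
info = read_pseudo_tags(header)
use_binary_search = info.get("FILE_SORTED", ("0", ""))[0] == "1"
print(info, use_binary_search)
```

When the FILE_SORTED line is absent the sketch falls back to a linear search,
which is a conservative assumption and not something the proposal prescribes.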


[End Of Document]