Proposal for extended Vi tags file format
=========================================

Version: 0.06 DRAFT
   Date: 1998 Feb 8
 Author: Bram Moolenaar <Bram at vim.org>
    and: Darren Hiebert <dhiebert at users.sourceforge.net>


1. Introduction
---------------

The file format for the "tags" file, as used by Vi and many of its
descendants, has limited capabilities.

This additional functionality is desired:

1. Static or local tags.
   The scope of these tags is the file where they are defined. The same tag
   can appear in several files, without really being a duplicate.
2. Duplicate tags.
   Allow the same tag to occur more than once. They can be located in
   a different file and/or have a different command.
3. Support for C++.
   A tag is not only specified by its name, but also by the context (the
   class name).
4. Future extension.
   When even more additional functionality is desired, it must be possible to
   add this later, without breaking programs that don't support it.


2. From proposal to standard
----------------------------

To make this proposal into a standard for tags files, it needs to be supported
by most people working on versions of Vi, ctags, etc. Currently this
standard is supported by:

Darren Hiebert <dhiebert at users.sourceforge.net>    Exuberant ctags
Bram Moolenaar <Bram at vim.org>                      Vim (Vi IMproved)

These have been or will be asked to support this standard:

Nvi     Keith Bostic <bostic at bsdi.com>
Vile    Tom E. Dickey <dickey at clark.net>
NEdit   Mark Edel <edel at ltx.com>
CRiSP   Paul Fox <fox at crisp.demon.co.uk>
Lemmy   James Iuliano <jai at accessone.com>
Zeus    Jussi Jumppanen <jussij at ca.com.au>
Elvis   Steve Kirkendall <kirkenda at cs.pdx.edu>
FTE     Marko Macek <Marko.Macek at snet.fri.uni-lj.si>


3. Backwards compatibility
--------------------------

A tags file that is generated in the new format should still be usable by Vi.
This makes it possible to distribute tags files that are usable by all
versions and descendants of Vi.

This restricts the format to what Vi can handle. The format is:

1. The tags file is a list of lines, each line in the format:

	{tagname}<Tab>{tagfile}<Tab>{tagaddress}

   {tagname}     Any identifier, not containing white space.
   <Tab>         Exactly one TAB character (although many versions of Vi can
                 handle any amount of white space).
   {tagfile}     The name of the file where {tagname} is defined, relative to
                 the current directory (or location of the tags file?).
   {tagaddress}  Any Ex command. When executed, it behaves as if 'magic' were
                 not set.

2. The tags file is sorted on {tagname}. This allows for a binary search in
   the file.

3. Duplicate tags are allowed, but which one is actually used is
   unpredictable (because of the binary search).

The best way to add extra text to the line for the new functionality, without
breaking it for Vi, is to put a comment in the {tagaddress}. This gives the
freedom to use any text, and should work in any traditional Vi implementation.

For example, when the old tags file contains:

	main	main.c	/^main(argc, argv)$/
	DEBUG	defines.c	89

The new lines can be:

	main	main.c	/^main(argc, argv)$/;"any additional text
	DEBUG	defines.c	89;"any additional text

Note that the ';' is required to put the cursor in the right line, and then
the '"' is recognized as the start of a comment.

For Posix compliant Vi versions this will NOT work, since only a line number
or a search command is recognized. I hope Posix can be adjusted. Nvi suffers
from this.


4. Security
-----------

Vi allows the use of any Ex command in a tags file. This makes a tags file a
potential Trojan horse: a security hole.

The proposal is to allow only Ex commands that position the cursor in a single
file.
Other commands, like editing another file, quitting the editor,
changing a file or writing a file, are not allowed. It is therefore logical
to call the command a tagaddress.

Specifically, these two Ex commands are allowed:
- A decimal line number.
	89
- A search command. It is a regular expression pattern, as used by Vi,
  enclosed in // or ??.
	/^int c;$/
	?main()?

There are two combinations possible:
- Concatenation of the above, with ';' in between. The meaning is that the
  first line number or search command is used, the cursor is positioned in
  that line, and then the second search command is used (a line number would
  not be useful). This can be done multiple times. This is useful when the
  information in a single line is not unique, and the search needs to start
  in a specified line.
	/struct xyz {/;/int count;/
	389;/struct foo/;/char *s;/
- A trailing comment can be added, starting with ';"' (two characters:
  semi-colon and double-quote). This is used below.
	89;" foo bar

This might be extended in the future. What is currently missing is a way to
position the cursor in a certain column.


5. Goals
--------

Now the usage of the comment text has to be defined. The following is aimed
at:

1. Keep the text short, because:
   - The line length that Vi can handle is limited to 512 characters.
   - Tags files can contain thousands of tags. I have seen tags files of
     several Mbytes.
   - More text makes searching slower.
2. Keep the text readable, because:
   - It is often necessary to check the output of a new ctags program.
   - It should be possible to edit the file by hand.
   - It makes it easier to write a program to produce or parse the file.
3. Don't use special characters, because:
   - It should be possible to treat a tags file like any normal text file.


6. Proposal
-----------

Use a comment after the {tagaddress} field.
The format would be:

	{tagname}<Tab>{tagfile}<Tab>{tagaddress}[;"<Tab>{tagfield}..]

   {tagname}     Any identifier, not containing white space.
   <Tab>         Exactly one TAB character (although many versions of Vi can
                 handle any amount of white space).
   {tagfile}     The name of the file where {tagname} is defined, relative to
                 the current directory (or location of the tags file?).
   {tagaddress}  Any Ex command. When executed, it behaves as if 'magic' were
                 not set. It may be restricted to a line number or a search
                 pattern (Posix).
Optionally:
   ;"            semicolon + doublequote: Ends the tagaddress in a way that
                 looks like the start of a comment to Vi.
   {tagfield}    See below.

A tagfield has a name, a colon, and a value: "name:value".
- The name consists only of alphabetical characters. Upper and lower case
  are allowed. Lower case is recommended. Case matters ("kind:" and "Kind:"
  are different tagfields).
- The value may be empty.
  It cannot contain a <Tab>.
  When a value contains a "\t", this stands for a <Tab>.
  When a value contains a "\r", this stands for a <CR>.
  When a value contains a "\n", this stands for a <NL>.
  When a value contains a "\\", this stands for a single '\' character.
  Other use of the backslash character is reserved for future expansion.
  Warning: When a tagfield value holds an MS-DOS file name, the backslashes
  must be doubled!


Proposed tagfield names:

FIELD-NAME      DESCRIPTION

arity           Number of arguments for a function tag.

class           Name of the class for which this tag is a member or method.

enum            Name of the enumeration in which this tag is an enumerator.

file            Static (local) tag, with a scope of the specified file. When
                the value is empty, {tagfile} is used.

function        Function in which this tag is defined. Useful for local
                variables (and functions).
                When functions nest (e.g., in
                Pascal), the function names are concatenated, separated with
                '/', so it looks like a path.

kind            Kind of tag. The value depends on the language. For C and
                C++ these kinds are recommended:
                    c    class name
                    d    define (from #define XXX)
                    e    enumerator
                    f    function or method name
                    F    file name
                    g    enumeration name
                    m    member (of structure or class data)
                    p    function prototype
                    s    structure name
                    t    typedef
                    u    union name
                    v    variable
                When this field is omitted, the kind of tag is undefined.

struct          Name of the struct in which this tag is a member.

union           Name of the union in which this tag is a member.


Note that these are mostly for C and C++. When tags programs are written for
other languages, this list should be extended to include the field names they
use. This will help users to be independent of the tags program used.

Examples:

	asdf	sub.cc	/^asdf()$/;"	new_field:some\svalue	file:
	foo_t	sub.h	/^typedef foo_t$/;"	kind:t
	func3	sub.p	/^func3()$/;"	function:/func1/func2	file:
	getflag	sub.c	/^getflag(arg)$/;"	kind:f	file:
	inc	sub.cc	/^inc()$/;"	file:	class:PipeBuf


The name of the "kind:" field can be omitted. This is to reduce the size of
the tags file by about 15%. A program reading the tags file can recognize the
"kind:" field by the missing ':'. Examples:

	foo_t	sub.h	/^typedef foo_t$/;"	t
	getflag	sub.c	/^getflag(arg)$/;"	f	file:


Additional remarks:
- When a tagfield appears twice in a tag line, only the last one is used.


Note about line separators:

Vi traditionally runs on Unix systems, where the line separator is a single
linefeed character <NL>. On MS-DOS and compatible systems <CR><NL> is the
standard line separator. To increase portability, this line separator is also
supported.

On the Macintosh a single <CR> is used as line separator.
Supporting this on
Unix systems causes problems, because most fgets() implementations don't see
the <CR> as a line separator. Therefore the support for a <CR> as line
separator is limited to the Macintosh.

Summary:
line separator  generated on  accepted on
<LF>            Unix          Unix, MS-DOS, Macintosh
<CR>            Macintosh     Macintosh
<CR><LF>        MS-DOS        Unix, MS-DOS, Macintosh

The characters <CR> and <LF> cannot be used inside a tag line. This is not
mentioned elsewhere (because it's obvious).


Note about white space:

Vi allowed any white space to separate the tagname from the tagfile, and the
filename from the tagaddress. This would need to be allowed for backwards
compatibility. However, all known programs that generate tags use a single
<Tab> to separate fields.

There is a problem for using file names with embedded white space in the
tagfile field. To work around this, the same special characters could be used
as in the new fields, for example "\s". But, unfortunately, in MS-DOS the
backslash character is used to separate file names. The file name
"c:\vim\sap" contains "\s", but this is not a <Space>. The number of
backslashes could be doubled, but that will add a lot of characters, and make
parsing the tags file slower and clumsier.

To avoid these problems, we will only allow a <Tab> to separate fields, and
not support a file name or tagname that contains a <Tab> character. This
means that we are not 100% Vi compatible. However, there is no known tags
program that uses anything other than a <Tab> to separate the fields.
Problems could arise only when a user typed the tags file by hand, or wrote a
program of their own to generate it. To solve this, the tags file should be
filtered, to replace the arbitrary white space with a single <Tab>. This Vi
command can be used:

	:%s/^\([^ ^I]*\)[ ^I]*\([^ ^I]*\)[ ^I]*/\1^I\2^I/

(replace ^I with a real <Tab>).
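
With fields separated only by a <Tab>, reading a tag line in the format of
section 6 is straightforward. The Python sketch below is an illustration
only and is not taken from any existing tags program. The names
parse_tag_line and unescape are made up for this example. It assumes that
the sequence ;"<Tab> does not occur inside the search pattern itself, and it
passes reserved backslash sequences through unchanged.

```python
# Illustrative sketch only; parse_tag_line and unescape are hypothetical names.

# The four escapes the proposal defines for tagfield values.
_ESCAPES = {"t": "\t", "r": "\r", "n": "\n", "\\": "\\"}


def unescape(value):
    # Decode the four defined escapes. Any other backslash sequence is
    # reserved by the proposal, so it is left as-is here.
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value) and value[i + 1] in _ESCAPES:
            out.append(_ESCAPES[value[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_tag_line(line):
    # Accept both <NL> and <CR><NL> line endings.
    line = line.rstrip("\r\n")
    parts = line.split("\t", 2)
    if len(parts) != 3:
        raise ValueError("expected {tagname}<Tab>{tagfile}<Tab>{tagaddress}")
    name, tagfile, rest = parts

    # Assumption: ;"<Tab> ends the tagaddress and never appears inside it.
    address, _, extension = rest.partition(';"\t')
    if not extension and address.endswith(';"'):
        address = address[:-2]  # trailing ;" with no tagfields after it

    fields = {}
    for item in extension.split("\t"):
        if not item:
            continue
        key, colon, value = item.partition(":")
        if colon:
            fields[key] = unescape(value)    # repeated field: last one wins
        else:
            fields["kind"] = unescape(item)  # "kind:" name was omitted
    return {"name": name, "file": tagfile, "address": address,
            "fields": fields}
```

A dict is used for the tagfields so that a field appearing twice keeps only
its last value, as required by the additional remarks in section 6. An empty
"file:" value is returned as an empty string; substituting {tagfile} for it
is left to the caller.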


TAG FILE INFORMATION:

Pseudo-tag lines can be used to encode information into the tag file regarding
details about its content (e.g. have the tags been sorted?, are the optional
tagfields present?), and regarding the program used to generate the tag file.
This information can be used both to optimize use of the tag file (e.g.
enable/disable binary searching) and provide general information (what version
of the generator was used).

The names of the tags used in these lines may be suitably chosen to ensure
that when sorted, they will always be located near the first lines of the tag
file. The use of "!_TAG_" is recommended. Note that a rare tag like "!"
can sort to before these lines. The program reading the tags file should be
smart enough to skip over these tags.

The lines described below have been chosen to convey a select set of
information.

Tag lines providing information about the content of the tag file:

!_TAG_FILE_FORMAT	{version-number}	/optional comment/
!_TAG_FILE_SORTED	{0|1}	/0=unsorted, 1=sorted/

The {version-number} used in the tag file format line reserves the value of
"1" for tag files complying with the original UNIX vi/ctags format, and
reserves the value "2" for tag files complying with this proposal. This value
may be used to determine if the extended features described in this proposal
are present.

Tag lines providing information about the program used to generate the tag
file, and provided solely for documentation purposes:

!_TAG_PROGRAM_AUTHOR	{author-name}	/{email-address}/
!_TAG_PROGRAM_NAME	{program-name}	/optional comment/
!_TAG_PROGRAM_URL	{URL}	/optional comment/
!_TAG_PROGRAM_VERSION	{version-id}	/optional comment/


[End Of Document]