<html>
<head>
<title>Exuberant Ctags: Adding support for a new language</title>
</head>
<body>

<h1>How to Add Support for a New Language to Exuberant Ctags</h1>

<p>
<b>Exuberant Ctags</b> has been designed to make it very easy to add your own
custom language parser. As an exercise, let us assume that I want to add
support for my new language, <em>Swine</em>, the successor to Perl (i.e. Perl
before Swine &lt;wince&gt;). This language consists of simple definitions of
labels in the form "<code>def my_label</code>". Let us now examine the various
ways to do this.
</p>

<h2>Operational background</h2>

<p>
As ctags considers each file name, it tries to determine the language of the
file by applying the following three tests in order: whether the file
extension has been mapped to a language, whether the file name matches a shell
pattern mapped to a language, and finally whether the file is executable and
its first line specifies an interpreter using the Unix-style "#!" specification
(if supported on the platform). If a language was identified, the file is
opened and the appropriate language parser is called to operate on the
currently open file. The parser reads through the file and, whenever it finds
an interesting token, calls a function to define a tag entry.
</p>

<h2>Creating a user-defined language</h2>

<p>
The quickest and easiest way to do this is by defining a new language using
the program options.
In order to have Swine support available every time I
start ctags, I will place the following lines into the file
<code>$HOME/.ctags</code>, which is read every time ctags starts:

<code>
<pre>
    --langdef=swine
    --langmap=swine:.swn
    --regex-swine=/^def[ \t]+([a-zA-Z0-9_]+)/\1/d,definition/
</pre>
</code>
The first line defines the new language, the second maps a file extension to
it, and the third defines a regular expression to identify a language
definition and generate a tag file entry for it. (Requiring at least one blank
after "<code>def</code>" keeps a line such as "<code>define</code>" from being
tagged as "<code>ine</code>".)
</p>

<h2>Integrating a new language parser</h2>

<p>
Now suppose that I want to truly integrate compiled-in support for Swine into
ctags. First, I create a new module, <code>swine.c</code>, and add one
externally visible function to it, <code>extern parserDefinition
*SwineParser(void)</code>, and add its name to the table in
<code>parsers.h</code>. The job of this parser definition function is to
create an instance of the <code>parserDefinition</code> structure (using
<code>parserNew()</code>) and populate it with information defining how files
of this language are recognized, what kinds of tags it can locate, and the
function used to invoke the parser on the currently open file.
</p>

<p>
The structure <code>parserDefinition</code> allows assignment of the following
fields:

<code>
<pre>
    const char *name;               /* name of language */
    kindOption *kinds;              /* tag kinds handled by parser */
    unsigned int kindCount;         /* size of `kinds' list */
    const char *const *extensions;  /* list of default extensions */
    const char *const *patterns;    /* list of default file name patterns */
    parserInitialize initialize;    /* initialization routine, if needed */
    simpleParser parser;            /* simple parser (common case) */
    rescanParser parser2;           /* rescanning parser (unusual case) */
    boolean regex;                  /* is this a regex parser? */
</pre>
</code>
</p>

<p>
The <code>name</code> field must be set to a non-empty string. Also, unless
<code>regex</code> is set to true (see below), either <code>parser</code> or
<code>parser2</code> must be set to point to a parsing routine which will
generate the tag entries. All other fields are optional.
</p>

<p>
Now all that is left is to implement the parser. In order to do its job, the
parser should read the file stream using one of the two I/O interfaces:
either the character-oriented <code>fileGetc()</code>, or the line-oriented
<code>fileReadLine()</code>. When using <code>fileGetc()</code>, the parser
can put back a character using <code>fileUngetc()</code>. How our Swine parser
actually parses the contents of the file is entirely up to the writer of the
parser; it can be as crude or elegant as desired. The parsers included with
ctags range from the most complex (c.c) to the simplest (make.c).
</p>

<p>
When the Swine parser identifies an interesting token for which it wants to
add a tag to the tag file, it should create a <code>tagEntryInfo</code>
structure and initialize it by calling <code>initTagEntry()</code>, which
initializes defaults and fills in the current line number and the file
position of the beginning of the line. After filling in information defining
the current entry (and possibly overriding the file position or other
defaults), the parser passes this structure to <code>makeTagEntry()</code>.
</p>

<p>
Instead of writing a character-oriented parser, it may be possible to specify
regular expressions which define the tags. In this case, instead of defining a
parsing function, <code>SwineParser()</code> sets <code>regex</code> to true,
and points <code>initialize</code> to a function which calls
<code>addTagRegex()</code> to install the regular expressions which define its
tags.
The regular expressions thus installed are compared against each line
of the input file and generate a specified tag when matched. It is usually
much easier to write a regex-based parser, although it can be slower (one
parser example was 4 times slower). Whether the speed difference matters to
you depends upon how much code you have to parse. It is probably a good
strategy to implement a regex-based parser first, and if it is too slow for
you, then invest the time and effort to write a character-based parser.
</p>

<p>
A regex-based parser is inherently line-oriented (i.e. the entire tag must be
recognizable from looking at a single line) and context-insensitive (i.e. the
generation of the tag is based entirely upon whether the regular expression
matches a single line). However, a regex-based callback mechanism is also
available, installed via the function <code>addCallbackRegex()</code>. This
allows a specified function to be invoked whenever a specific regular
expression is matched, so that a character-oriented parser can operate based
upon the context of what happened on a previous line (e.g. the start or end
of a multi-line comment). Note that regex callbacks are called just before the
first character of that line is read via either <code>fileGetc()</code> or
<code>fileReadLine()</code>. The effect of this is that before either of
these routines returns, a callback routine may be invoked because the line
matched a callback's regular expression.
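</p>

<p>
The ordering rule above (a callback runs before the reader hands back the first
character of its line) can be illustrated with a toy reader that compiles
outside ctags. This is a minimal sketch, not ctags' own reading code: the names
<code>sketchReader</code>, <code>readerGetc()</code>, <code>lineHook</code>,
<code>traceHook()</code> and <code>traceChar()</code> are hypothetical, and a
plain per-line hook stands in for a regex callback (in ctags the callback fires
only when its regular expression matches the line).
</p>

```c
/* Toy model of the ordering rule: a per-line hook runs before the first
   character of each line is returned to the caller. All names here are
   hypothetical; none of them are ctags APIs. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*lineHook) (const char *line);

typedef struct {
    const char *text;    /* whole input; lines end with '\n' */
    size_t pos;          /* index of the next character to return */
    int atLineStart;     /* nonzero when pos is the start of a line */
    lineHook hook;       /* stands in for a regex callback; may be NULL */
} sketchReader;

static void readerInit (sketchReader *r, const char *text, lineHook hook)
{
    assert (r != NULL && text != NULL);
    r->text = text;
    r->pos = 0;
    r->atLineStart = 1;
    r->hook = hook;
}

/* Return the next character, or -1 at end of input. When a new line is
   about to start, first pass a copy of that line (without its '\n',
   truncated to 255 characters) to the hook. */
static int readerGetc (sketchReader *r)
{
    int c;

    if (r->text [r->pos] == '\0')
        return -1;
    if (r->atLineStart && r->hook != NULL)
    {
        char line [256];
        size_t len = strcspn (r->text + r->pos, "\n");
        if (len >= sizeof line)
            len = sizeof line - 1;
        memcpy (line, r->text + r->pos, len);
        line [len] = '\0';
        r->hook (line);
    }
    c = (unsigned char) r->text [r->pos++];
    r->atLineStart = (c == '\n');
    return c;
}

/* Demo helpers: the hook marks each line start with '|' in a trace buffer,
   and the caller appends every character it receives. */
static char trace [128];

static void traceHook (const char *line)
{
    (void) line;
    strncat (trace, "|", sizeof trace - strlen (trace) - 1);
}

static void traceChar (int c)
{
    size_t n = strlen (trace);
    if (n + 1 < sizeof trace)
    {
        trace [n] = (char) c;
        trace [n + 1] = '\0';
    }
}
```

<p>
For example, reading <code>"ab\ncd"</code> with <code>traceHook</code> installed
and passing every returned character to <code>traceChar()</code> leaves the
trace <code>"|ab\n|cd"</code>: each line's hook has already run by the time that
line's first character is returned.
</p>

<p>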
A callback function to be installed is defined by
these types:

<code>
<pre>
    typedef struct {
        size_t start;   /* character index in line where match starts */
        size_t length;  /* length of match */
    } regexMatch;

    typedef void (*regexCallback) (const char *line, const regexMatch *matches, unsigned int count);
</pre>
</code>
</p>

<p>
The callback function is passed the line matching the regular expression and
an array of <code>count</code> structures defining the subexpression matches
of the regular expression, starting from \0 (the entire match).
</p>

<p>
Lastly, be sure to add the name of the file containing your parser (e.g.
swine.c) to the macro <code>SOURCES</code> in the file <code>source.mak</code>
and an entry for the object file to the macro <code>OBJECTS</code> in the same
file, so that your new module will be compiled into the program.
</p>

<p>
If you run into problems, run <code>ctags --verbose</code> to see whether the
extensions or patterns you defined for your language conflict with those of
other languages.
</p>

<p>
This is all there is to it. All other details are specific to the parser and
how it wants to do its job. There are some support functions which can take
care of commonly needed parsing tasks, such as keyword table lookups (see
keyword.c), which you can use if desired (examples of its use can be found in
c.c, eiffel.c, and fortran.c). Almost everything else is taken care of
automatically by the infrastructure. Writing the actual parsing algorithm is
the hardest part, but it is not constrained by any need to conform to anything
in ctags other than what is mentioned above.
</p>

<p>
There are several different approaches used in the parsers inside <b>Exuberant
Ctags</b>, and you can browse through these as examples of how to go about
creating your own.
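</p>

<p>
The callback contract described earlier (the matching line plus an array of
<code>count</code> start/length pairs, beginning with \0) can be reproduced
outside ctags with the POSIX <code>regcomp()</code>/<code>regexec()</code>
functions. This is an illustrative sketch, not how ctags implements callbacks
internally: <code>regexMatch</code> and <code>regexCallback</code> copy the
types shown above, while <code>dispatchLine()</code>,
<code>recordDefinition()</code>, <code>lastName</code> and
<code>MAX_MATCHES</code> are hypothetical.
</p>

```c
/* Standalone sketch of the regex callback contract using POSIX regex.
   regexMatch/regexCallback mirror the documented types; dispatchLine(),
   recordDefinition(), lastName and MAX_MATCHES are hypothetical. */
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t start;   /* character index in line where match starts */
    size_t length;  /* length of match */
} regexMatch;

typedef void (*regexCallback) (const char *line, const regexMatch *matches,
                               unsigned int count);

#define MAX_MATCHES 10  /* hypothetical cap on \0 plus subexpressions */

/* Match one line; on success convert regmatch_t offsets into regexMatch
   entries (\0 first, then each subexpression) and call the callback.
   Returns 1 if the callback ran, 0 if the line did not match. */
static int dispatchLine (const regex_t *pattern, const char *line,
                         regexCallback callback)
{
    regmatch_t pmatch [MAX_MATCHES];
    regexMatch matches [MAX_MATCHES];
    unsigned int count;
    unsigned int i;

    assert (pattern != NULL && line != NULL && callback != NULL);
    if (regexec (pattern, line, MAX_MATCHES, pmatch, 0) != 0)
        return 0;
    count = (unsigned int) pattern->re_nsub + 1;
    if (count > MAX_MATCHES)
        count = MAX_MATCHES;
    for (i = 0; i < count; ++i)
    {
        if (pmatch [i].rm_so == -1)   /* subexpression did not participate */
        {
            matches [i].start = 0;
            matches [i].length = 0;
        }
        else
        {
            matches [i].start = (size_t) pmatch [i].rm_so;
            matches [i].length = (size_t) (pmatch [i].rm_eo - pmatch [i].rm_so);
        }
    }
    callback (line, matches, count);
    return 1;
}

/* Demo callback: copy subexpression \1 (the defined name) into lastName. */
static char lastName [64];

static void recordDefinition (const char *line, const regexMatch *matches,
                              unsigned int count)
{
    size_t len;

    if (count < 2)
        return;
    len = matches [1].length;
    if (len >= sizeof lastName)
        len = sizeof lastName - 1;
    memcpy (lastName, line + matches [1].start, len);
    lastName [len] = '\0';
}
```

<p>
For example, after compiling <code>^def[ \t]+([a-zA-Z0-9_]+)</code> with
<code>REG_EXTENDED</code>, dispatching the line <code>def my_label</code> leaves
<code>lastName</code> holding <code>my_label</code>, while
<code>define other</code> does not match and the callback is not called.
Matching on a copy of the line and reporting plain offsets, as here, keeps the
callback independent of the regex library's own match type.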
</p>

<h2>Examples</h2>

<p>
Below you will find several example parsers demonstrating most of the
facilities available. These include three alternative implementations
of a Swine parser, which generate tags for lines beginning with
"<code>def</code>" followed by some name.
</p>

<code>
<pre>
/***************************************************************************
 * swine.c
 * Character-based parser for Swine definitions
 **************************************************************************/
/* INCLUDE FILES */
#include "general.h"    /* always include first */

#include &lt;string.h&gt;     /* to declare strxxx() functions */
#include &lt;ctype.h&gt;      /* to define isxxx() macros */

#include "parse.h"      /* always include */
#include "read.h"       /* to declare fileReadLine() */

/* DATA DEFINITIONS */
typedef enum eSwineKinds {
    K_DEFINE
} swineKind;

static kindOption SwineKinds [] = {
    { TRUE, 'd', "definition", "pig definition" }
};

/* FUNCTION DEFINITIONS */

static void findSwineTags (void)
{
    vString *name = vStringNew ();
    const unsigned char *line;

    while ((line = fileReadLine ()) != NULL)
    {
        /* Look for a line beginning with "def" followed by white space */
        if (strncmp ((const char*) line, "def", (size_t) 3) == 0 &&
            isspace ((int) line [3]))
        {
            /* skip any further white space, then collect the name */
            const unsigned char *cp = line + 4;
            while (isspace ((int) *cp))
                ++cp;
            while (isalnum ((int) *cp) || *cp == '_')
            {
                vStringPut (name, (int) *cp);
                ++cp;
            }
            vStringTerminate (name);
            if (vStringLength (name) > 0)   /* ignore "def" without a name */
                makeSimpleTag (name, SwineKinds, K_DEFINE);
            vStringClear (name);
        }
    }
    vStringDelete (name);
}

/* Create parser definition structure */
extern parserDefinition* SwineParser (void)
{
    static const char *const extensions [] = { "swn", NULL };
    parserDefinition* def = parserNew ("Swine");
    def->kinds = SwineKinds;
    def->kindCount = KIND_COUNT (SwineKinds);
    def->extensions = extensions;
    def->parser = findSwineTags;
    return def;
}
</pre>
</code>

<p>
<pre>
<code>
/***************************************************************************
 * swine.c
 * Regex-based parser for Swine
 **************************************************************************/
/* INCLUDE FILES */
#include "general.h"    /* always include first */
#include "parse.h"      /* always include */

/* FUNCTION DEFINITIONS */

static void installSwineRegex (const langType language)
{
    addTagRegex (language, "^def[ \t]+([a-zA-Z0-9_]+)", "\\1", "d,definition", NULL);
}

/* Create parser definition structure */
extern parserDefinition* SwineParser (void)
{
    static const char *const extensions [] = { "swn", NULL };
    parserDefinition* def = parserNew ("Swine");
    def->extensions = extensions;
    def->initialize = installSwineRegex;
    def->regex = TRUE;
    return def;
}
</code>
</pre>

<p>
<pre>
/***************************************************************************
 * swine.c
 * Regex callback-based parser for Swine definitions
 **************************************************************************/
/* INCLUDE FILES */
#include "general.h"    /* always include first */

#include "parse.h"      /* always include */
#include "read.h"       /* to declare fileReadLine() */

/* DATA DEFINITIONS */
typedef enum eSwineKinds {
    K_DEFINE
} swineKind;

static kindOption SwineKinds [] = {
    { TRUE, 'd', "definition", "pig definition" }
};

/* FUNCTION DEFINITIONS */

static void definition (const char *const line, const regexMatch *const matches,
                        const unsigned int count)
{
    if (count > 1)    /* should always be true per regex */
    {
        vString *const name = vStringNew ();
        vStringNCopyS (name, line + matches [1].start, matches [1].length);
        makeSimpleTag (name, SwineKinds, K_DEFINE);
        vStringDelete (name);   /* release the temporary name buffer */
    }
}

static void findSwineTags (void)
{
    while (fileReadLine () != NULL)
        ;  /* don't need to do anything here since callback is sufficient */
}

static void installSwine (const langType language)
{
    addCallbackRegex (language, "^def[ \t]+([a-zA-Z0-9_]+)", NULL, definition);
}

/* Create parser definition structure */
extern parserDefinition* SwineParser (void)
{
    static const char *const extensions [] = { "swn", NULL };
    parserDefinition* def = parserNew ("Swine");
    def->kinds = SwineKinds;
    def->kindCount = KIND_COUNT (SwineKinds);
    def->extensions = extensions;
    def->parser = findSwineTags;
    def->initialize = installSwine;
    return def;
}
</pre>

<p>
<pre>
/***************************************************************************
 * make.c
 * Regex-based parser for makefile macros
 **************************************************************************/
/* INCLUDE FILES */
#include "general.h"    /* always include first */
#include "parse.h"      /* always include */

/* FUNCTION DEFINITIONS */

static void installMakefileRegex (const langType language)
{
    addTagRegex (language, "(^|[ \t])([A-Z0-9_]+)[ \t]*:?=", "\\2", "m,macro", "i");
}

/* Create parser definition structure */
extern parserDefinition* MakefileParser (void)
{
    static const char *const patterns [] = { "[Mm]akefile", NULL };
    static const char *const extensions [] = { "mak", NULL };
    parserDefinition* const def = parserNew ("Makefile");
    def->patterns = patterns;
    def->extensions = extensions;
    def->initialize = installMakefileRegex;
    def->regex = TRUE;
    return def;
}
</pre>

</body>
</html>