<html>
<head>
<title>Exuberant Ctags: Adding support for a new language</title>
</head>
<body>

<h1>How to Add Support for a New Language to Exuberant Ctags</h1>

<p>
<b>Exuberant Ctags</b> has been designed to make it very easy to add your own
custom language parser. As an exercise, let us assume that I want to add
support for my new language, <em>Swine</em>, the successor to Perl (i.e. Perl
before Swine &lt;wince&gt;). This language consists of simple definitions of
labels in the form "<code>def my_label</code>". Let us now examine the various
ways to do this.
</p>

<h2>Operational background</h2>

<p>
As ctags considers each file name, it tries to determine the language of the
file by applying the following three tests in order: if the file extension has
been mapped to a language, if the file name matches a shell pattern mapped to
a language, and finally if the file is executable and its first line specifies
an interpreter using the Unix-style "#!" specification (if supported on the
platform). If a language was identified, the file is opened and then the
appropriate language parser is called to operate on the currently open file.
The parser parses through the file and whenever it finds some interesting
token, calls a function to define a tag entry.
</p>
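
<p>
The order of the three tests can be illustrated with a small standalone
sketch. It is not the code ctags uses: the <code>langMap</code> table, its
entries, and the function <code>guessLanguage()</code> are hypothetical names
invented for this illustration. The caller is assumed to pass the first line
of the file only when the file is executable, and shell patterns are matched
with the POSIX function <code>fnmatch()</code>.
</p>

```c
#include <assert.h>
#include <fnmatch.h>   /* POSIX shell-pattern matching */
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping table; ctags keeps its own per-language lists. */
typedef struct {
    const char *language;
    const char *extension;    /* without the dot, or NULL */
    const char *pattern;      /* shell pattern for the file name, or NULL */
    const char *interpreter;  /* program name expected after "#!", or NULL */
} langMap;

static const langMap Maps[] = {
    { "Swine",    "swn", NULL,          "swine" },
    { "Makefile", "mak", "[Mm]akefile", NULL    }
};

/* Returns the language name, or NULL if none of the three tests succeeds.
 * firstLine is the first line of an executable file, or NULL otherwise. */
static const char *guessLanguage(const char *fileName, const char *firstLine)
{
    const size_t count = sizeof Maps / sizeof Maps[0];
    const char *base;
    const char *ext;
    size_t i;

    assert(fileName != NULL);
    base = strrchr(fileName, '/');
    base = (base != NULL) ? base + 1 : fileName;
    ext = strrchr(base, '.');

    /* Test 1: file extension mapped to a language */
    if (ext != NULL && ext[1] != '\0')
        for (i = 0; i < count; ++i)
            if (Maps[i].extension != NULL &&
                strcmp(ext + 1, Maps[i].extension) == 0)
                return Maps[i].language;

    /* Test 2: file name matches a shell pattern */
    for (i = 0; i < count; ++i)
        if (Maps[i].pattern != NULL && fnmatch(Maps[i].pattern, base, 0) == 0)
            return Maps[i].language;

    /* Test 3: "#!" line names a known interpreter */
    if (firstLine != NULL && strncmp(firstLine, "#!", 2) == 0)
    {
        const char *path = firstLine + 2;
        const char *end;
        const char *name;
        size_t len;

        while (*path == ' ' || *path == '\t')
            ++path;
        end = path + strcspn(path, " \t\r\n");
        name = end;
        while (name > path && name[-1] != '/')   /* strip the directory part */
            --name;
        len = (size_t) (end - name);
        for (i = 0; i < count; ++i)
            if (Maps[i].interpreter != NULL &&
                strlen(Maps[i].interpreter) == len &&
                strncmp(name, Maps[i].interpreter, len) == 0)
                return Maps[i].language;
    }
    return NULL;
}
```

<p>
With this table, <code>build/Makefile</code> has no extension and is resolved
by the shell pattern, while an executable script <code>bin/oink</code> whose
first line is <code>#!/usr/bin/swine -w</code> is resolved only by the third
test.
</p>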

<h2>Creating a user-defined language</h2>

<p>
The quickest and easiest way to do this is by defining a new language using
the program options. In order to have Swine support available every time I
start ctags, I will place the following lines into the file
<code>$HOME/.ctags</code>, which is read in every time ctags starts:

<code>
<pre>
  --langdef=swine
  --langmap=swine:.swn
  --regex-swine=/^def[ \t]+([a-zA-Z0-9_]+)/\1/d,definition/
</pre>
</code>
The first line defines the new language, the second maps a file extension to
it, and the third defines a regular expression that identifies a Swine
definition and generates a tag file entry for it.
</p>
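
<p>
To see which lines such an expression selects, the following standalone
sketch applies it with the POSIX <code>regcomp()</code> and
<code>regexec()</code> functions, assuming the option is compiled as a POSIX
extended regular expression. The helper <code>swineTagName()</code> is a
hypothetical name used only here. The sketch uses <code>[ \t]+</code>, which
requires at least one blank after <code>def</code> so that a word such as
<code>define</code> produces no tag.
</p>

```c
#include <assert.h>
#include <regex.h>     /* POSIX regular expressions */
#include <stddef.h>
#include <string.h>

/* Copies the name captured by group 1 into `name' (at most size-1 bytes).
 * Returns 1 if the line is a Swine definition, 0 otherwise.
 * The pattern is recompiled on every call to keep the sketch short. */
static int swineTagName(const char *line, char *name, size_t size)
{
    regex_t re;
    regmatch_t m[2];   /* m[0]: whole match, m[1]: the captured name */
    int found = 0;

    assert(line != NULL && name != NULL && size > 0);
    if (regcomp(&re, "^def[ \t]+([a-zA-Z0-9_]+)", REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, line, 2, m, 0) == 0)
    {
        size_t len = (size_t) (m[1].rm_eo - m[1].rm_so);
        if (len >= size)
            len = size - 1;            /* truncate rather than overflow */
        memcpy(name, line + m[1].rm_so, len);
        name[len] = '\0';
        found = 1;
    }
    regfree(&re);
    return found;
}
```

<p>
The captured group corresponds to the <code>\1</code> in the option, that is,
the text that becomes the tag name.
</p>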

<h2>Integrating a new language parser</h2>

<p>
Now suppose that I want to truly integrate compiled-in support for Swine into
ctags. First, I create a new module, <code>swine.c</code>, and add one
externally visible function to it, <code>extern parserDefinition
*SwineParser(void)</code>, and add its name to the table in
<code>parsers.h</code>. The job of this parser definition function is to
create an instance of the <code>parserDefinition</code> structure (using
<code>parserNew()</code>) and populate it with information defining how files
of this language are recognized, what kinds of tags it can locate, and the
function used to invoke the parser on the currently open file.
</p>

<p>
The structure <code>parserDefinition</code> allows assignment of the following
fields:

<code>
<pre>
  const char *name;               /* name of language */
  kindOption *kinds;              /* tag kinds handled by parser */
  unsigned int kindCount;         /* size of `kinds' list */
  const char *const *extensions;  /* list of default extensions */
  const char *const *patterns;    /* list of default file name patterns */
  parserInitialize initialize;    /* initialization routine, if needed */
  simpleParser parser;            /* simple parser (common case) */
  rescanParser parser2;           /* rescanning parser (unusual case) */
  boolean regex;                  /* is this a regex parser? */
</pre>
</code>
</p>

<p>
The <code>name</code> field must be set to a non-empty string. Also, unless
<code>regex</code> is set true (see below), either <code>parser</code> or
<code>parser2</code> must be set to point to a parsing routine which will
generate the tag entries. All other fields are optional.
</p>

<p>
Now all that is left is to implement the parser. In order to do its job, the
parser should read the file stream using one of the two I/O interfaces:
either the character-oriented <code>fileGetc()</code>, or the line-oriented
<code>fileReadLine()</code>. When using <code>fileGetc()</code>, the parser
can put back a character using <code>fileUngetc()</code>. How our Swine parser
actually parses the contents of the file is entirely up to the writer of the
parser--it can be as crude or elegant as desired. You will note a variety of
examples from the most complex (c.c) to the simplest (make.c).
</p>
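
<p>
The character-oriented style, including the put-back of one character, can be
sketched without ctags itself. In the code below, <code>sourceGetc()</code>
and <code>sourceUngetc()</code> are hypothetical stand-ins for
<code>fileGetc()</code> and <code>fileUngetc()</code> that read from a string
in memory; their names and signatures are assumptions made for this
illustration and are not the ctags interfaces.
</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-in input source: a NUL-terminated string and a read position. */
typedef struct {
    const char *text;
    size_t pos;
} charSource;

static int sourceGetc(charSource *s)     /* returns -1 at end of input */
{
    unsigned char c = (unsigned char) s->text[s->pos];
    if (c == '\0')
        return -1;
    ++s->pos;
    return c;
}

static void sourceUngetc(charSource *s)  /* puts back the last character */
{
    if (s->pos > 0)
        --s->pos;
}

/* Reads [A-Za-z0-9_]* into buf; the character that ends the word is put back. */
static size_t readWord(charSource *s, char *buf, size_t size)
{
    size_t n = 0;
    int c;

    assert(size > 0);
    while ((c = sourceGetc(s)) != -1 && (isalnum(c) || c == '_'))
        if (n + 1 < size)
            buf[n++] = (char) c;
    if (c != -1)
        sourceUngetc(s);
    buf[n] = '\0';
    return n;
}

/* Advances to the next line of the form "def <name>".
 * Returns 1 with the name stored, or 0 when the input is exhausted. */
static int nextDefinition(charSource *s, char *name, size_t size)
{
    char word[8];
    int c;

    while (s->text[s->pos] != '\0')
    {
        int found = 0;

        readWord(s, word, sizeof word);          /* first word on the line */
        if (strcmp(word, "def") == 0)
        {
            c = sourceGetc(s);
            if (c == ' ' || c == '\t')
            {
                while ((c = sourceGetc(s)) == ' ' || c == '\t')
                    ;                            /* skip further blanks */
                if (c != -1)
                    sourceUngetc(s);
                found = readWord(s, name, size) > 0;
            }
            else if (c != -1)
                sourceUngetc(s);
        }
        while ((c = sourceGetc(s)) != -1 && c != '\n')
            ;                                    /* discard the rest of the line */
        if (found)
            return 1;
    }
    return 0;
}
```

<p>
Putting the terminating character back lets each routine stop exactly at the
boundary of the token it understands, so the caller decides what that
character means.
</p>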

<p>
When the Swine parser identifies an interesting token for which it wants to
add a tag to the tag file, it should create a <code>tagEntryInfo</code>
structure and initialize it by calling <code>initTagEntry()</code>, which
initializes defaults and fills information about the current line number and
the file position of the beginning of the line. After filling in information
defining the current entry (and possibly overriding the file position or other
defaults), the parser passes this structure to <code>makeTagEntry()</code>.
</p>

<p>
Instead of writing a character-oriented parser, it may be possible to specify
regular expressions which define the tags. In this case, instead of defining a
parsing function, <code>SwineParser()</code> sets <code>regex</code> to true
and points <code>initialize</code> to a function which calls
<code>addTagRegex()</code> to install the regular expressions which define its
tags. The regular expressions thus installed are compared against each line
of the input file and generate a specified tag when matched. It is usually
much easier to write a regex-based parser, although it can be slower (one
parser example was 4 times slower). Whether the speed difference matters to
you depends upon how much code you have to parse. It is probably a good
strategy to implement a regex-based parser first, and if it is too slow for
you, then invest the time and effort to write a character-based parser.
</p>

<p>
A regex-based parser is inherently line-oriented (i.e. the entire tag must be
recognizable from looking at a single line) and context-insensitive (i.e. the
generation of the tag depends entirely upon whether the regular expression
matches a single line). However, a regex-based callback mechanism is also
available, installed via the function <code>addCallbackRegex()</code>. It
causes a specified function to be invoked whenever a specific regular
expression is matched, which allows a character-oriented parser to take into
account what happened on a previous line (e.g. the start or end of a
multi-line comment). Note that regex callbacks are called just before the
first character of that line is read via either <code>fileGetc()</code> or
<code>fileReadLine()</code>. The effect of this is that before either of
these routines returns, a callback routine may be invoked because the line
matched a regex callback. A callback function to be installed is defined by
these types:

<code>
<pre>
  typedef void (*regexCallback) (const char *line, const regexMatch *matches, unsigned int count);

  typedef struct {
      size_t start;   /* character index in line where match starts */
      size_t length;  /* length of match */
  } regexMatch;
</pre>
</code>
</p>

<p>
The callback function is passed the line matching the regular expression and
an array of <code>count</code> structures defining the subexpression matches
of the regular expression, starting from \0 (the text matched by the entire
expression).
</p>
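
<p>
How a callback receives its <code>matches</code> array can be sketched with
the POSIX regex functions. The <code>regexMatch</code> and
<code>regexCallback</code> types repeat the declarations above; everything
else (<code>dispatchLine()</code>, <code>runSwineCallback()</code>,
<code>LastName</code>, <code>DefinitionCount</code>) is a hypothetical
stand-in for work that ctags performs internally, written only to make the
example self-contained. Because the callback type carries no user-data
argument, the sketch keeps its state in file-scope variables.
</p>

```c
#include <assert.h>
#include <regex.h>     /* POSIX regular expressions */
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t start;   /* character index in line where match starts */
    size_t length;  /* length of match */
} regexMatch;

typedef void (*regexCallback) (const char *line, const regexMatch *matches,
                               unsigned int count);

#define MAX_GROUPS 4

/* Stand-in dispatcher: if the line matches, convert the POSIX offsets to
 * (start, length) pairs and invoke the callback. Returns 1 on a match. */
static int dispatchLine(const regex_t *re, const char *line,
                        regexCallback callback)
{
    regmatch_t pm[MAX_GROUPS];
    regexMatch matches[MAX_GROUPS];
    unsigned int count = 0;
    unsigned int i;

    if (regexec(re, line, MAX_GROUPS, pm, 0) != 0)
        return 0;
    for (i = 0; i < MAX_GROUPS && pm[i].rm_so != -1; ++i)
    {
        matches[i].start  = (size_t) pm[i].rm_so;
        matches[i].length = (size_t) (pm[i].rm_eo - pm[i].rm_so);
        ++count;
    }
    callback(line, matches, count);
    return 1;
}

/* State recorded by the callback */
static char LastName[64];
static unsigned int DefinitionCount;

/* matches[0] is the whole match; matches[1] is the captured name */
static void definition(const char *line, const regexMatch *matches,
                       unsigned int count)
{
    if (count > 1)
    {
        size_t len = matches[1].length;
        if (len >= sizeof LastName)
            len = sizeof LastName - 1;
        memcpy(LastName, line + matches[1].start, len);
        LastName[len] = '\0';
        ++DefinitionCount;
    }
}

/* Compiles the Swine pattern and offers one line to the callback. */
static int runSwineCallback(const char *line)
{
    regex_t re;
    int matched;

    assert(line != NULL);
    if (regcomp(&re, "^def[ \t]+([a-zA-Z0-9_]+)", REG_EXTENDED) != 0)
        return 0;
    matched = dispatchLine(&re, line, definition);
    regfree(&re);
    return matched;
}
```

<p>
After <code>runSwineCallback ("def piglet")</code>, <code>LastName</code>
holds <code>piglet</code>; a line that does not match leaves the recorded
state untouched.
</p>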

<p>
Lastly, be sure to add the name of the file containing your parser (e.g.
swine.c) to the macro <code>SOURCES</code> in the file <code>source.mak</code>
and an entry for the object file to the macro <code>OBJECTS</code> in the same
file, so that your new module will be compiled into the program.
</p>

<p>
If you run into problems, run <code>ctags --verbose</code> to see whether the
extensions or patterns you defined for your language conflict with those of
other languages.
</p>

<p>
This is all there is to it. All other details are specific to the parser and
how it wants to do its job. There are some support functions which can take
care of some commonly needed parsing tasks, such as keyword table lookups (see
keyword.c), which you can make use of if desired (examples of its use can be
found in c.c, eiffel.c, and fortran.c). Almost everything is already taken care
of automatically for you by the infrastructure.  Writing the actual parsing
algorithm is the hardest part, but is not constrained by any need to conform
to anything in ctags other than that mentioned above.
</p>

<p>
There are several different approaches used in the parsers inside <b>Exuberant
Ctags</b> and you can browse through these as examples of how to go about
creating your own.
</p>

<h2>Examples</h2>

<p>
Below you will find several example parsers demonstrating most of the
facilities available. These include three alternative implementations
of a Swine parser, which generate tags for lines beginning with
"<code>def</code>" followed by some name.
</p>

<code>
<pre>
/***************************************************************************
 * swine.c
 * Character-based parser for Swine definitions
 **************************************************************************/
/* INCLUDE FILES */
#include "general.h"    /* always include first */

#include &lt;string.h&gt;     /* to declare strxxx() functions */
#include &lt;ctype.h&gt;      /* to define isxxx() macros */

#include "parse.h"      /* always include */
#include "read.h"       /* to declare fileReadLine() */

/* DATA DEFINITIONS */
typedef enum eSwineKinds {
    K_DEFINE
} swineKind;

static kindOption SwineKinds [] = {
    { TRUE, 'd', "definition", "pig definition" }
};

/* FUNCTION DEFINITIONS */

static void findSwineTags (void)
{
    vString *name = vStringNew ();
    const unsigned char *line;

    while ((line = fileReadLine ()) != NULL)
    {
        /* Look for a line beginning with "def" followed by a name */
        if (strncmp ((const char*) line, "def", (size_t) 3) == 0  &amp;&amp;
            isspace ((int) line [3]))
        {
            const unsigned char *cp = line + 4;
            while (isspace ((int) *cp))
                ++cp;
            while (isalnum ((int) *cp)  ||  *cp == '_')
            {
                vStringPut (name, (int) *cp);
                ++cp;
            }
            makeSimpleTag (name, SwineKinds, K_DEFINE);
            vStringClear (name);
        }
    }
    vStringDelete (name);
}

/* Create parser definition structure */
extern parserDefinition* SwineParser (void)
{
    static const char *const extensions [] = { "swn", NULL };
    parserDefinition* def = parserNew ("Swine");
    def-&gt;kinds      = SwineKinds;
    def-&gt;kindCount  = KIND_COUNT (SwineKinds);
    def-&gt;extensions = extensions;
    def-&gt;parser     = findSwineTags;
    return def;
}
</pre>
</code>

<p>
<pre>
<code>
/***************************************************************************
 * swine.c
 * Regex-based parser for Swine
 **************************************************************************/
/* INCLUDE FILES */
#include "general.h"    /* always include first */
#include "parse.h"      /* always include */

/* FUNCTION DEFINITIONS */

static void installSwineRegex (const langType language)
{
    addTagRegex (language, "^def[ \t]+([a-zA-Z0-9_]+)", "\\1", "d,definition", NULL);
}

/* Create parser definition structure */
extern parserDefinition* SwineParser (void)
{
    static const char *const extensions [] = { "swn", NULL };
    parserDefinition* def = parserNew ("Swine");
    def-&gt;extensions = extensions;
    def-&gt;initialize = installSwineRegex;
    def-&gt;regex      = TRUE;
    return def;
}
</code>
</pre>

<p>
<pre>
/***************************************************************************
 * swine.c
 * Regex callback-based parser for Swine definitions
 **************************************************************************/
/* INCLUDE FILES */
#include "general.h"    /* always include first */

#include "parse.h"      /* always include */
#include "read.h"       /* to declare fileReadLine() */

/* DATA DEFINITIONS */
typedef enum eSwineKinds {
    K_DEFINE
} swineKind;

static kindOption SwineKinds [] = {
    { TRUE, 'd', "definition", "pig definition" }
};

/* FUNCTION DEFINITIONS */

static void definition (const char *const line, const regexMatch *const matches,
                       const unsigned int count)
{
    if (count &gt; 1)    /* should always be true per regex */
    {
        vString *const name = vStringNew ();
        vStringNCopyS (name, line + matches [1].start, matches [1].length);
        makeSimpleTag (name, SwineKinds, K_DEFINE);
        vStringDelete (name);    /* release the temporary string */
    }
}

static void findSwineTags (void)
{
    while (fileReadLine () != NULL)
        ;  /* don't need to do anything here since callback is sufficient */
}

static void installSwine (const langType language)
{
    addCallbackRegex (language, "^def[ \t]+([a-zA-Z0-9_]+)", NULL, definition);
}

/* Create parser definition structure */
extern parserDefinition* SwineParser (void)
{
    static const char *const extensions [] = { "swn", NULL };
    parserDefinition* def = parserNew ("Swine");
    def-&gt;kinds      = SwineKinds;
    def-&gt;kindCount  = KIND_COUNT (SwineKinds);
    def-&gt;extensions = extensions;
    def-&gt;parser     = findSwineTags;
    def-&gt;initialize = installSwine;
    return def;
}
</pre>

<p>
<pre>
/***************************************************************************
 * make.c
 * Regex-based parser for makefile macros
 **************************************************************************/
/* INCLUDE FILES */
#include "general.h"    /* always include first */
#include "parse.h"      /* always include */

/* FUNCTION DEFINITIONS */

static void installMakefileRegex (const langType language)
{
    addTagRegex (language, "(^|[ \t])([A-Z0-9_]+)[ \t]*:?=", "\\2", "m,macro", "i");
}

/* Create parser definition structure */
extern parserDefinition* MakefileParser (void)
{
    static const char *const patterns [] = { "[Mm]akefile", NULL };
    static const char *const extensions [] = { "mak", NULL };
    parserDefinition* const def = parserNew ("Makefile");
    def-&gt;patterns   = patterns;
    def-&gt;extensions = extensions;
    def-&gt;initialize = installMakefileRegex;
    def-&gt;regex      = TRUE;
    return def;
}
</pre>

</body>
</html>