.. _writing_parser_in_c:

=============================================================================
Writing a parser in C
=============================================================================

This section is based on the section "Integrating a new language parser" in "`How
to Add Support for a New Language to Exuberant Ctags (EXTENDING)
<http://ctags.sourceforge.net/EXTENDING.html>`_" in the Exuberant Ctags
documentation.

Now suppose that I want to truly integrate compiled-in support for Swine into
ctags.

Registering a parser
-------------------------------------------------
First, I create a new module, ``swine.c``, and add one externally visible function
to it, ``extern parserDefinition *SwineParser(void)``, and add its name to the
table in ``parsers.h``. The job of this parser definition function is to create
an instance of the ``parserDefinition`` structure (using ``parserNew()``) and
populate it with information defining how files of this language are recognized,
what kinds of tags it can locate, and the function used to invoke the parser on
the currently open file.

The structure ``parserDefinition`` allows assignment of the following fields:

.. code-block:: c

	struct sParserDefinition {
		/* defined by parser */
		char* name;                     /* name of language */
		kindDefinition* kindTable;      /* tag kinds handled by parser */
		unsigned int kindCount;         /* size of 'kinds' list */
		const char *const *extensions;  /* list of default extensions */
		const char *const *patterns;    /* list of default file name patterns */
		const char *const *aliases;     /* list of default aliases (alternative names) */
		parserInitialize initialize;    /* initialization routine, if needed */
		parserFinalize finalize;        /* finalize routine, if needed */
		simpleParser parser;            /* simple parser (common case) */
		rescanParser parser2;           /* rescanning parser (unusual case) */
		selectLanguage* selectLanguage; /* may be used to resolve conflicts */
		unsigned int method;            /* See METHOD_ definitions above */
		unsigned int useCork;           /* bit fields of corkUsage */
		...
	};

The ``name`` field must be set to a non-empty string. In addition, either
``parser`` or ``parser2`` must be set to point to a parsing routine which will
generate the tag entries. All other fields are optional.

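As a rough illustration of the registration step, a ``swine.c`` module might
define its parser along the following lines. This is only a sketch: it is meant
to be built inside the ctags source tree rather than on its own, and the kind
table contents, the ``swn`` extension, and the empty ``findSwineTags()`` routine
are assumptions made for the example.

.. code-block:: c

	#include "general.h"  /* must always come first */
	#include "kind.h"     /* kindDefinition */
	#include "parse.h"    /* parserDefinition, parserNew() */
	#include "routines.h" /* ARRAY_SIZE() */

	/* Hypothetical kind table: this Swine parser emits one kind of tag. */
	static kindDefinition SwineKinds [] = {
		{ true, 'd', "definition", "definitions" },
	};

	/* The parsing routine; its body is specific to the parser. */
	static void findSwineTags (void)
	{
	}

	extern parserDefinition *SwineParser (void)
	{
		/* Assumed file extension for Swine sources. */
		static const char *const extensions [] = { "swn", NULL };
		parserDefinition *def = parserNew ("Swine");

		def->kindTable  = SwineKinds;
		def->kindCount  = ARRAY_SIZE (SwineKinds);
		def->extensions = extensions;
		def->parser     = findSwineTags;
		return def;
	}

Only ``name`` (set through ``parserNew()``) and ``parser`` are required here;
the kind table and the extension list are filled in because most parsers want
them.
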
Reading input file stream
-------------------------------------------------
Now all that is left is to implement the parser. In order to do its job, the
parser should read the file stream using one of the two I/O interfaces:
either the character-oriented ``getcFromInputFile()``, or the line-oriented
``readLineFromInputFile()``.

See ":ref:`input-text-stream`" for more details.

Parsing
-------------------------------------------------
How our Swine parser actually parses the contents of the file is entirely up to
the writer of the parser; it can be as crude or elegant as desired. You will
find a variety of examples, from the most complex (``parsers/cxx/*.[hc]``) to the
simplest (``parsers/make.[ch]``).

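To give a feel for the crude end of that range, the following self-contained
helper scans a single line for a definition. It assumes a made-up Swine syntax
in which a definition looks like ``def NAME``; the function name
``swineScanDefName()`` and the syntax itself are hypothetical and do not come
from ctags.

.. code-block:: c

	#include <ctype.h>
	#include <stddef.h>
	#include <string.h>

	/* Hypothetical Swine syntax: a definition line looks like "def NAME ...".
	 * Copies NAME into 'name' (at most size - 1 characters, NUL-terminated)
	 * and returns its length, or returns 0 if the line is not a definition. */
	static size_t swineScanDefName (const char *line, char *name, size_t size)
	{
		static const char keyword [] = "def";
		const size_t klen = sizeof (keyword) - 1;
		size_t n = 0;

		/* Skip leading white space. */
		while (isspace ((unsigned char) *line))
			line++;

		/* The keyword must be followed by white space. */
		if (strncmp (line, keyword, klen) != 0
			|| !isspace ((unsigned char) line [klen]))
			return 0;
		line += klen;

		while (isspace ((unsigned char) *line))
			line++;

		/* Collect identifier characters, leaving room for the terminator. */
		while ((isalnum ((unsigned char) *line) || *line == '_') && n + 1 < size)
			name [n++] = *line++;
		if (size > 0)
			name [n] = '\0';
		return n;
	}

A parser built this way would call the helper once per input line and create a
tag whenever it returns a non-zero length. Real parsers usually need more than
this, for example to skip comments and string literals.
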
Adding a tag to the tag file
-------------------------------------------------
When the Swine parser identifies an interesting token for which it wants to add
a tag to the tag file, it should create a ``tagEntryInfo`` structure and
initialize it by calling ``initTagEntry()``, which initializes defaults and
fills information about the current line number and the file position of the
beginning of the line. After filling in information defining the current entry
(and possibly overriding the file position or other defaults), the parser passes
this structure to ``makeTagEntry()``.

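Put together with the line-oriented input interface, the sequence above might
look like the sketch below. It is meant to be compiled inside the ctags source
tree, not on its own. The ``def NAME`` syntax and the ``K_DEFINITION`` kind
index are assumptions made for the example; a real parser would use the indices
of its own kind table.

.. code-block:: c

	#include "general.h"  /* must always come first */

	#include <ctype.h>
	#include <string.h>

	#include "entry.h"    /* tagEntryInfo, initTagEntry(), makeTagEntry() */
	#include "read.h"     /* readLineFromInputFile() */
	#include "vstring.h"  /* vString helpers */

	/* Hypothetical index into the parser's kind table. */
	typedef enum { K_DEFINITION } swineKind;

	static void findSwineTags (void)
	{
		vString *name = vStringNew ();
		const unsigned char *line;

		while ((line = readLineFromInputFile ()) != NULL)
		{
			const unsigned char *cp = line;

			/* Assumed syntax: a definition starts with "def ". */
			if (strncmp ((const char *) cp, "def ", 4) != 0)
				continue;
			cp += 4;

			/* Collect the identifier that follows the keyword. */
			while (isalnum (*cp) || *cp == '_')
				vStringPut (name, *cp++);

			if (vStringLength (name) > 0)
			{
				tagEntryInfo e;

				/* Fills in defaults, including the current line number. */
				initTagEntry (&e, vStringValue (name), K_DEFINITION);
				makeTagEntry (&e);
			}
			vStringClear (name);
		}
		vStringDelete (name);
	}

Fields of the ``tagEntryInfo`` structure can be adjusted between the two calls
when the defaults set by ``initTagEntry()`` are not what the parser wants.
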
See ":ref:`output-tag-stream`" for more details.

Adding the parser to ``ctags``
-------------------------------------------------
Lastly, be sure to add the name of the file containing your parser (e.g.
``parsers/swine.c``) to the macro ``PARSER_SRCS`` in the file ``source.mak``, so
that your new module will be compiled into the program.

Misc.
-------------------------------------------------
This is all there is to it. All other details are specific to the parser and how
it wants to do its job.

There are some support functions which can take care of some commonly needed
parsing tasks, such as *keyword table lookups* (see ``main/keyword.c``), which you
can make use of if desired (examples of its use can be found in ``parsers/c.c``,
``parsers/eiffel.c``, and ``parsers/fortran.c``).

Support functions can be found in ``main/*.h`` excluding ``main/*_p.h``.

Almost everything is already taken care of automatically for you by the
infrastructure. Writing the actual parsing algorithm is the hardest part, but is
not constrained by any need to conform to anything in ctags other than that
mentioned above.

There are several different approaches used in the parsers inside Universal
Ctags and you can browse through these as examples of how to go about creating
your own.