.. _cxx:

======================================================================
The new C/C++ parser
======================================================================

:Maintainer: Szymon Tomasz Stefanek <s.stefanek@gmail.com>

Introduction
---------------------------------------------------------------------

The C++ language has evolved considerably since the old C/C++ parser was
written. The old parser struggled with some of the newer language
features and showed signs of reaching its limits. For this reason the
C/C++ parser was rewritten from scratch in February/March 2016.

The first release fixed several outstanding bugs and added some new
features. Among them:

- Tagging of "using namespace" declarations
- Tagging of function parameters
- Extraction of function parameter types
- Tagging of anonymous structures/unions/classes/enums
- Support for C++11 lambdas (as anonymous functions)
- Support for function-level scopes (for local variables and parameters)
- Extraction of local variables which include calls to constructors
- Extraction of local variables from within the for(), while(), if()
  and switch() parentheses
- Support for function prototypes/declarations with trailing return type

At the time of writing (March 2016) more features are planned.
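
As an illustration of the first-release features listed above, the
following sketch collects several of those constructs in one translation
unit. The names (``screen``, ``area``, ``sumOfSquares``) are hypothetical,
and the comments only point at the construct each line is meant to
exercise; they are not captured ctags output.

.. code-block:: C++

	#include <vector>

	using namespace std;                       // "using namespace" declaration

	struct                                     // anonymous structure
	{
	    int width;
	    int height;
	} screen = { 640, 480 };

	auto area(int w, int h) -> int             // trailing return type; parameters w and h
	{
	    return w * h;
	}

	int sumOfSquares(const vector<int> & values)
	{
	    vector<int> squares(values.size());          // local variable with a constructor call
	    auto square = [](int v) { return v * v; };   // C++11 lambda (anonymous function)
	    int total = 0;

	    for(unsigned int i = 0; i < values.size(); i++)   // local variable inside for()
	    {
	        squares[i] = square(values[i]);
	        total += squares[i];
	    }

	    if(int extra = screen.width % 2)             // local variable inside if()
	        total += extra;

	    return total;
	}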

Notable New Features
---------------------------------------------------------------------

Some of the notable new features are described below.

Properties
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Several properties of functions and variables can be extracted
and placed in a new field called ``properties``.
The syntax to enable it is:

.. code-block:: console

	$ ctags ... --fields-c++=+{properties} ...

At the time of writing the following properties are reported:

- ``virtual``: a function is marked as virtual
- ``static``: a function/variable is marked as static
- ``inline``: a function implementation is marked as inline
- ``explicit``: a function is marked as explicit
- ``extern``: a function/variable is marked as extern
- ``const``: a function is marked as const
- ``pure``: a virtual function is pure (i.e. ``= 0``)
- ``override``: a function is marked as override
- ``default``: a function is marked as default
- ``final``: a function is marked as final
- ``delete``: a function is marked as delete
- ``mutable``: a variable is marked as mutable
- ``volatile``: a function is marked as volatile
- ``specialization``: a function is a template specialization
- ``scopespecialization``: the enclosing scope is a template
  specialization, as in ``a<x>::b()``
- ``deprecated``: a function is marked as deprecated via ``__attribute__``
- ``scopedenum``: an enumeration is scoped (C++11)
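
A minimal sketch of declarations that should trigger several of these
properties follows. All type and function names are hypothetical, and the
comments list the properties each declaration is expected to carry
according to the list above; they are not output captured from ctags.

.. code-block:: C++

	struct Shape
	{
	    virtual ~Shape() = default;               // virtual, default
	    virtual double area() const = 0;          // virtual, const, pure
	    static int count;                         // static
	};

	int Shape::count = 0;

	struct Square : public Shape
	{
	    explicit Square(double s) : side(s) {}    // explicit
	    Square(const Square &) = delete;          // delete
	    double area() const override final        // const, override, final
	    {
	        return side * side;
	    }
	    mutable int reads = 0;                    // mutable
	    double side;
	};

	template<typename T> T twice(T v) { return v + v; }
	template<> inline int twice<int>(int v) { return 2 * v; }   // inline, specialization

	enum class Color { Red, Green };              // scopedenum

Running ctags on such a file with ``--fields-c++=+{properties}`` should
then add a ``properties`` field to the corresponding tags.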

Preprocessor macros
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Defining a macro from command line
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The new parser supports the definition of real preprocessor macros
via the ``-D`` option. All types of macros are supported,
including the ones with parameters and variable arguments.
Stringification, token pasting and recursive macro expansion are also supported.

Option ``-I`` is now simply a backward-compatible syntax to define a
macro with no replacement.

The syntax is similar to the corresponding gcc ``-D`` option.

Some examples follow.

.. code-block:: console

	$ ctags ... -D IGNORE_THIS ...

With this command line the following C/C++ input

.. code-block:: C

	int IGNORE_THIS a;

will be processed as if it were

.. code-block:: C

	int a;

Defining a macro with parameters uses the following syntax:

.. code-block:: console

	$ ctags ... -D "foreach(arg)=for(arg;;)" ...

This example defines ``for(arg;;)`` as the replacement for ``foreach(arg)``.
So the following C/C++ input

.. code-block:: C

	foreach(char * p,pointers)
	{

	}

is processed by the new C/C++ parser as:

.. code-block:: C

	for(char * p;;)
	{

	}

and the local variable ``p`` can be extracted.

The command line above quotes the macro definition because macros
generally contain characters that shells treat specially. Depending on
your shell, some additional escaping may be needed.

Token pasting is performed by the ``##`` operator, just like in the normal
C preprocessor.

.. code-block:: console

	$ ctags ... -D "DECLARE_FUNCTION(prefix)=int prefix ## Call();"

So the following code

.. code-block:: C

	DECLARE_FUNCTION(a)
	DECLARE_FUNCTION(b)

will be processed as

.. code-block:: C

	int aCall();
	int bCall();

Macros with variable arguments use the gcc ``__VA_ARGS__`` syntax.

.. code-block:: console

	$ ctags ... -D "DECLARE_FUNCTION(name,...)=int name(__VA_ARGS__);"

So the following code

.. code-block:: C

	DECLARE_FUNCTION(x,int a,int b)

will be processed as

.. code-block:: C

	int x(int a,int b);
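
Stringification is mentioned above but not shown. The sketch below
assumes that ``#`` and ``##`` behave in a ``-D`` definition as they do in
the C preprocessor, which is what the text states. The macro name
``DECLARE_NAME`` and the command line in the first comment are
hypothetical; the ``#define`` merely stands in for the ``-D`` option so
that the block compiles on its own.

.. code-block:: C++

	// Hypothetical command line this block stands in for:
	//   ctags ... -D "DECLARE_NAME(id)=const char * id ## _name = #id;" ...
	#define DECLARE_NAME(id) const char * id ## _name = #id;

	DECLARE_NAME(alpha)   // read as: const char * alpha_name = "alpha";
	DECLARE_NAME(beta)    // read as: const char * beta_name = "beta";

With such a definition the parser would be expected to see two ordinary
variable declarations, ``alpha_name`` and ``beta_name``.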

Automatically expanding macros defined in the same input file (HIGHLY EXPERIMENTAL)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a CPreProcessor macro is defined in a C/C++/CUDA file, invocations of
that macro in the same file can be expanded with the following options:

.. code-block:: text

   --param-CPreProcessor._expand=1
   --fields-C=+{macrodef}
   --fields-C++=+{macrodef}
   --fields-CUDA=+{macrodef}
   --fields=+{signature}

Let's see an example.

input.c:

.. code-block:: C

	#define DEFUN(NAME) int NAME (int x, int y)
	#define BEGIN {
	#define END }

	DEFUN(myfunc)
	  BEGIN
	  return -1;
	  END

The output without options:

.. code-block:: console

   $ ctags -o - input.c
   BEGIN	input.c	/^#define BEGIN /;"	d	language:C	file:
   DEFUN	input.c	/^#define DEFUN(/;"	d	language:C	file:
   END	input.c	/^#define END /;"	d	language:C	file:

The output with options:

.. code-block:: console

   $ ctags --param-CPreProcessor._expand=1 --fields-C=+'{macrodef}' --fields=+'{signature}' -o - input.c
   BEGIN	input.c	/^#define BEGIN /;"	d	language:C	file:	macrodef:{
   DEFUN	input.c	/^#define DEFUN(/;"	d	language:C	file:	signature:(NAME)	macrodef:int NAME (int x, int y)
   END	input.c	/^#define END /;"	d	language:C	file:	macrodef:}
   myfunc	input.c	/^DEFUN(myfunc)$/;"	f	language:C	typeref:typename:int	signature:(int x,int y)

``myfunc``, which is defined through the ``DEFUN`` macro, is now captured
as a function.

This feature is highly experimental. At least three limitations are known.

* This feature doesn't understand ``#undef`` yet.
  Once a macro is defined, its invocations are always expanded, even
  after the parser sees ``#undef`` for the macro in the same input
  file.

* Macros are expanded incorrectly if the result of a macro expansion
  includes an invocation of the same macro again.

* Currently, ctags can expand a macro invocation only if its
  definition is in the same input file. ctags cannot expand a macro
  defined in a header file included from the current input file.

Enabling this macro expansion feature makes parsing about two times
slower.


Incompatible Changes
---------------------------------------------------------------------

The parser is mostly compatible with the old one. There are some minor
incompatible changes which are described below.


Anonymous structure names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The old parser produced structure names in the form ``__anonN`` where N
was a number starting at 1 in each file and increasing with each new
structure. This caused collisions in symbol names when ctags was run
on multiple files.

In the new parser the anonymous structure names depend on the file name
being processed and on the type of the structure itself. Collisions are
far less likely (though not impossible as hash functions are unavoidably
imperfect).

Pitfall: the file name used for hashing includes the path as passed to the
ctags executable. So the same file "seen" from different paths will produce
different structure names. This is unavoidable, and it is up to the user to
ensure that multiple ctags runs are started from a common directory root.

File scope
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The file scope information is not 100% reliable. It never was.
There are several cases in which compiler, linker or even source code
tricks can "unhide" file scope symbols (for instance \*.c files can be
included into each other) and several other cases in which the limitation
of the scope of a symbol to a single file simply cannot be determined
with a single pass or without looking at a program as a whole.

The new parser defines a simple policy for file scope association
that tries to be as compatible as possible with the old parser and
should reflect the most common usages. The policy is the following:

- Namespaces are in file scope if declared inside a .c or .cpp file

- Function prototypes are in file scope if declared inside a .c or .cpp file

- K&R style function definitions are in file scope if declared static
  inside a .c file

- Function definitions appearing inside a namespace are in file scope only
  if declared static inside a .c or .cpp file.
  Note that this rule includes both global functions (global namespace)
  and class/struct/union members defined outside of the class/struct/union
  declaration.

- Function definitions appearing inside a class/struct/union declaration
  are in file scope only if declared static inside a .cpp file

- Function parameters are always in file scope

- Local variables are always in file scope

- Variables appearing inside a namespace are in file scope only if
  they are declared static inside a .c or .cpp file

- Variables that are members of a class/struct/union are in file scope
  only if declared in a .c or .cpp file

- Typedefs are in file scope if appearing inside a .c or .cpp file

Most of these rules are debatable in one way or another. Just keep in mind
that the file scope information is not 100% reliable.
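
The policy can be illustrated with a small file. The sketch assumes the
code is saved as a ``.cpp`` file (the name ``scope.cpp`` and all
identifiers are hypothetical) and that the global namespace counts as a
namespace for the variable rule, as it does for functions. The comments
state what the rules above imply; they are not captured ctags output.

.. code-block:: C++

	// Assumed to be saved as scope.cpp (hypothetical file name).

	typedef unsigned int Counter;     // typedef in a .cpp file: file scope

	int lookup(int key);              // prototype in a .cpp file: file scope

	static Counter hits = 0;          // static variable: file scope
	Counter misses = 0;               // non-static variable: not file scope

	static int clamp(int value)       // static function definition: file scope
	{                                 // parameter "value": always file scope
	    int result = value < 0 ? 0 : value;   // local variable: always file scope
	    return result;
	}

	int lookup(int key)               // non-static function definition: not file scope
	{
	    if(key < 0)
	    {
	        misses++;
	        return 0;
	    }
	    hits++;
	    return clamp(key);
	}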

Inheritance information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The new parser does not strip template names from base classes.
For a declaration like

.. code-block:: C

	template<typename A> class B : public C<A>

the old parser reported ``C`` as base class while the new one reports
``C<A>``.

Typeref
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The syntax of the typeref field (``typeref:A:B``) was designed with only
struct/class/union/enum types in mind. Generic types don't have ``A``
information, and in C++ the keywords are entirely optional, so the kind
of a type often cannot be told from a declaration alone. Furthermore,
struct/class/union/enum types share the same namespace and their names
can't collide, so the ``A`` information is redundant for most purposes.

To accommodate generic types and preserve some degree of backward
compatibility, the new parser uses struct/class/union/enum in place
of ``A`` where such a keyword can be inferred. Where the information is
not available it uses the ``typename`` keyword.

Generally, you should ignore the information in field ``A`` and use
only the information in field ``B``.
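
A short sketch of how this rule is expected to play out is given below.
The identifiers are hypothetical and the ``typeref`` values in the
comments are what the rule above implies, not output captured from ctags;
only ``typeref:typename:int`` also appears in the sample output earlier in
this document.

.. code-block:: C++

	struct Point { int x; int y; };
	enum Axis { AxisX, AxisY };

	struct Point origin = { 0, 0 };   // keyword present: typeref:struct:Point
	enum Axis mainAxis = AxisX;       // keyword present: typeref:enum:Axis
	Point cursor = { 3, 4 };          // no keyword:      typeref:typename:Point
	int depth = 7;                    // generic type:    typeref:typename:int

In every case the part a client can rely on is ``B``: ``Point``, ``Axis``,
``Point`` and ``int``.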
339