.. _cxx:

======================================================================
The new C/C++ parser
======================================================================

:Maintainer: Szymon Tomasz Stefanek <s.stefanek@gmail.com>

Introduction
---------------------------------------------------------------------

The C++ language has evolved considerably since the old C/C++ parser was
written. The old parser was struggling with some of the new features
of the language and was showing signs of reaching its limits. For this
reason the C/C++ parser was rewritten from scratch in February/March 2016.

In the first release several outstanding bugs were fixed and some new
features were added. Among them:

- Tagging of "using namespace" declarations
- Tagging of function parameters
- Extraction of function parameter types
- Tagging of anonymous structures/unions/classes/enums
- Support for C++11 lambdas (as anonymous functions)
- Support for function-level scopes (for local variables and parameters)
- Extraction of local variables which include calls to constructors
- Extraction of local variables from within the for(), while(), if()
  and switch() parentheses.
- Support for function prototypes/declarations with trailing return type

At the time of writing (March 2016) more features are planned.

Notable New Features
---------------------------------------------------------------------

Some of the notable new features are described below.

Properties
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Several properties of functions and variables can be extracted
and placed in a new field called ``properties``.
The syntax to enable it is:

.. code-block:: console

    $ ctags ... --fields-c++=+{properties} ...

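
As a rough illustration, the hypothetical C++ input below contains
declarations of the kind this field is meant to describe. None of the names
come from ctags itself. Each comment names the properties one would expect
for that declaration, going by the list of properties in this section.
Whether a given declaration is tagged at all depends on which kinds are
enabled, and the exact output may differ between ctags versions.

.. code-block:: C++

    // Hypothetical input file; all names are made up for illustration.
    struct Shape
    {
        virtual double area() const = 0;   // expected: virtual, const, pure
        virtual ~Shape() = default;        // expected: virtual, default
    };

    struct Square final : Shape
    {
        explicit Square(double s) : side(s) {}   // expected: explicit
        Square(const Square &) = delete;         // expected: delete

        // expected: const, override
        double area() const override { return side * side; }

        static int count;                        // expected: static
        mutable int cacheHits = 0;               // expected: mutable
        double side;
    };

    int Square::count = 0;

    enum class Color { Red, Green };             // expected: scopedenum

    // expected: inline
    inline double twice(double x) { return 2 * x; }
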

At the time of writing the following properties are reported:

- ``virtual``: a function is marked as virtual
- ``static``: a function/variable is marked as static
- ``inline``: a function implementation is marked as inline
- ``explicit``: a function is marked as explicit
- ``extern``: a function/variable is marked as extern
- ``const``: a function is marked as const
- ``pure``: a virtual function is pure (i.e. ``= 0``)
- ``override``: a function is marked as override
- ``default``: a function is marked as default
- ``final``: a function is marked as final
- ``delete``: a function is marked as delete
- ``mutable``: a variable is marked as mutable
- ``volatile``: a function is marked as volatile
- ``specialization``: a function is a template specialization
- ``scopespecialization``: template specialization of scope ``a<x>::b()``
- ``deprecated``: a function is marked as deprecated via ``__attribute__``
- ``scopedenum``: a scoped enumeration (C++11)

Preprocessor macros
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Defining a macro from the command line
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The new parser supports the definition of real preprocessor macros
via the ``-D`` option. All types of macros are supported,
including the ones with parameters and variable arguments.
Stringification, token pasting and recursive macro expansion are also supported.

Option ``-I`` is now simply a backward-compatible syntax to define a
macro with no replacement.

The syntax is similar to the corresponding gcc ``-D`` option.

Some examples follow.

.. code-block:: console

    $ ctags ... -D IGNORE_THIS ...

With this command line the following C/C++ input

.. code-block:: C

    int IGNORE_THIS a;

will be processed as if it was

.. code-block:: C

    int a;

Defining a macro with parameters uses the following syntax:

.. code-block:: console

    $ ctags ... -D "foreach(arg)=for(arg;;)" ...

This example defines ``for(arg;;)`` as the replacement for ``foreach(arg)``.
So the following C/C++ input

.. code-block:: C

    foreach(char * p,pointers)
    {

    }

is processed by the new C/C++ parser as:

.. code-block:: C

    for(char * p;;)
    {

    }

and the ``p`` local variable can be extracted.

The previous command line includes quotes since macros generally contain
characters that are treated specially by shells. You may need some escaping.

Token pasting is performed by the ``##`` operator, just like in the normal
C preprocessor.

.. code-block:: console

    $ ctags ... -D "DECLARE_FUNCTION(prefix)=int prefix ## Call();"

So the following code

.. code-block:: C

    DECLARE_FUNCTION(a)
    DECLARE_FUNCTION(b)

will be processed as

.. code-block:: C

    int aCall();
    int bCall();

Macros with variable arguments use the gcc ``__VA_ARGS__`` syntax.

.. code-block:: console

    $ ctags ... -D "DECLARE_FUNCTION(name,...)=int name(__VA_ARGS__);"

So the following code

.. code-block:: C

    DECLARE_FUNCTION(x,int a,int b)

will be processed as

.. code-block:: C

    int x(int a,int b);

Automatically expanding macros defined in the same input file (HIGHLY EXPERIMENTAL)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a CPreProcessor macro is defined in a C/C++/CUDA file, invocations of the
macro in the SAME file can be expanded with the following options:

.. code-block:: text

    --param-CPreProcessor._expand=1
    --fields-C=+{macrodef}
    --fields-C++=+{macrodef}
    --fields-CUDA=+{macrodef}
    --fields=+{signature}

Let's see an example.

input.c:

.. code-block:: C

    #define DEFUN(NAME) int NAME (int x, int y)
    #define BEGIN {
    #define END }

    DEFUN(myfunc)
    BEGIN
    return -1;
    END

The output without options:

.. code-block:: console

    $ ctags -o - input.c
    BEGIN input.c /^#define BEGIN /;" d language:C file:
    DEFUN input.c /^#define DEFUN(/;" d language:C file:
    END input.c /^#define END /;" d language:C file:

The output with options:

.. code-block:: console

    $ ctags --param-CPreProcessor._expand=1 --fields-C=+'{macrodef}' --fields=+'{signature}' -o - input.c
    BEGIN input.c /^#define BEGIN /;" d language:C file: macrodef:{
    DEFUN input.c /^#define DEFUN(/;" d language:C file: signature:(NAME) macrodef:int NAME (int x, int y)
    END input.c /^#define END /;" d language:C file: macrodef:}
    myfunc input.c /^DEFUN(myfunc)$/;" f language:C typeref:typename:int signature:(int x,int y)

``myfunc``, which is defined through the ``DEFUN`` macro, is now captured.


This feature is highly experimental. At least three limitations are known.

* This feature doesn't understand ``#undef`` yet.
  Once a macro is defined, its invocation is always expanded even
  after the parser sees ``#undef`` for the macro in the same input
  file.

* Macros are expanded incorrectly if the result of macro expansion
  includes the macro invocation again.

* Currently, ctags can expand a macro invocation only if its
  definition is in the same input file. ctags cannot expand a macro
  defined in a header file included from the current input file.

Enabling this macro expansion feature makes parsing about
two times slower.


Incompatible Changes
---------------------------------------------------------------------

The parser is mostly compatible with the old one. There are some minor
incompatible changes which are described below.



Anonymous structure names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The old parser produced structure names in the form ``__anonN`` where N
was a number starting at 1 in each file and increasing at each new
structure. This caused collisions in symbol names when ctags was run
on multiple files.

In the new parser the anonymous structure names depend on the file name
being processed and on the type of the structure itself. Collisions are
far less likely (though not impossible as hash functions are unavoidably
imperfect).

Pitfall: the file name used for hashing includes the path as passed to the
ctags executable. So the same file "seen" from different paths will produce
different structure names. This is unavoidable, and it is up to the user to
ensure that multiple ctags runs are started from a common directory root.

File scope
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The file scope information is not 100% reliable. It never was.
There are several cases in which compiler, linker or even source code
tricks can "unhide" file scope symbols (for instance \*.c files can be
included into each other) and several other cases in which the limitation
of the scope of a symbol to a single file simply cannot be determined
with a single pass or without looking at a program as a whole.

The new parser defines a simple policy for file scope association
that tries to be as compatible as possible with the old parser and
should reflect the most common usages. The policy is the following:

- Namespaces are in file scope if declared inside a .c or .cpp file

- Function prototypes are in file scope if declared inside a .c or .cpp file

- K&R style function definitions are in file scope if declared static
  inside a .c file.

- Function definitions appearing inside a namespace are in file scope only
  if declared static inside a .c or .cpp file.
  Note that this rule includes both global functions (global namespace)
  and class/struct/union members defined outside of the class/struct/union
  declaration.

- Function definitions appearing inside a class/struct/union declaration
  are in file scope only if declared static inside a .cpp file

- Function parameters are always in file scope

- Local variables are always in file scope

- Variables appearing inside a namespace are in file scope only if
  they are declared static inside a .c or .cpp file

- Variables that are members of a class/struct/union are in file scope
  only if declared in a .c or .cpp file

- Typedefs are in file scope if appearing inside a .c or .cpp file

Most of these rules are debatable in one way or another. Just keep in mind
that this is not 100% reliable.

Inheritance information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The new parser does not strip template names from base classes.
For a declaration like

.. code-block:: C++

    template<typename A> class B : public C<A>

the old parser reported ``C`` as base class while the new one reports
``C<A>``.

Typeref
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The syntax of the typeref field (``typeref:A:B``) was designed with only
struct/class/union/enum types in mind. Generic types don't have ``A``
information and the keywords became entirely optional in C++:
you just can't tell. Furthermore, struct/class/union/enum types
share the same namespace and their names can't collide, so the ``A``
information is redundant for most purposes.

To accommodate generic types and preserve some degree of backward
compatibility the new parser uses struct/class/union/enum in place
of ``A`` where such a keyword can be inferred. Where the information is
not available it uses the ``typename`` keyword.

Generally, you should ignore the information in field ``A`` and use
only the information in field ``B``.
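
For a tool that consumes tags output, this advice can be sketched as below.
``typerefName`` is a hypothetical helper, not part of ctags. It assumes the
caller has already isolated the value of the typeref field, i.e. the ``A:B``
text that follows ``typeref:``. Because ``B`` may itself contain colons
(for example ``std::string``), the sketch splits at the first colon only.

.. code-block:: C++

    #include <string>

    // Hypothetical helper, not part of ctags: given the value of a typeref
    // field in the form "A:B" (for example "typename:int"), return only
    // the "B" part and discard the "A" keyword.
    std::string typerefName(const std::string &value)
    {
        // Split at the FIRST colon: "B" may contain "::" scope separators.
        const std::string::size_type colon = value.find(':');
        if (colon == std::string::npos)
            return value;   // no "A:" prefix found; return input unchanged
        return value.substr(colon + 1);
    }

With this sketch, ``typerefName("typename:int")`` yields ``int`` and
``typerefName("struct:point")`` yields ``point``.
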