.. _testing_parser:

=============================================================================
Testing a parser
=============================================================================


.. contents:: `Table of contents`
	:depth: 3
	:local:

It is difficult for us to know the syntax of all the languages supported by
ctags. A test facility and test cases are therefore essential for maintaining
ctags with limited resources.

..	_units:

*Units* test facility
---------------------------------------------------------------------

:Maintainer: Masatake YAMATO <yamato@redhat.com>

----

**Test facility**

Exuberant Ctags has a test facility. Its test cases are stored in the
*Test* directory, so here I call the facility *Test*.

The main aim of the facility is detecting regressions. All files under the
*Test* directory are given as input to both an old and a new version of the
ctags command, and the tags files output by the two versions are compared.
If any difference is found, the check fails. *Test* assumes that the older
ctags binary is correct.

This assumption does not always hold. Suppose a parser for a new language
is added, and you want to add sample source code for that language to
*Test*. The older ctags cannot generate a tags file for the sample code,
but the newer one can. A difference is found, and *Test* reports a failure.

**Units facility**

The units test facility (*Units*) described here takes a different
approach. The contributor of a language parser provides an input file and
an expected output file. The units test facility runs the ctags command on
the input file and compares its output with the expected output file. The
expected output does not depend on the output of any other ctags binary.

If a contributor sends a patch that may improve a language parser, and the
reviewer is not familiar with that language, the reviewer cannot evaluate
it.

*Units* test files, that is, the pair of an input file and an expected
output file, can explain the intent of the patch well and help the
reviewer.

How to write a test case
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test facility recognizes input files and expected output files by
their file name patterns. Each test case should have its own directory
under the *Units* directory.

*Units/TEST/input.\** **requisite**

	The input file must have *input* as its basename. The *TEST*
	part should describe the test case well.

*Units/TEST/input[-_][0-9].\**, *Units/TEST/input[-_][0-9][-_]\*.\** **optional**

	Optional additional input files. They are passed to ctags
	together with *input.\** on the test command line.

*Units/TEST/expected.tags* **optional**

	The expected output file must be named *expected.tags*, and it
	must be in the same directory as the input file.

	If this file is not given, only the exit status of the ctags
	process is checked; the output is ignored.

	If you want to test etags output (specified with ``-e``),
	use **.tags-e** as the suffix instead of **.tags**.
	In that case you don't have to write ``-e`` in ``args.ctags``;
	the test facility sets ``-e`` automatically.

	If you want to test cross reference output (specified with ``-x``),
	use **.tags-x** as the suffix instead of **.tags**.
	In that case you don't have to write ``-x`` in ``args.ctags``;
	the test facility sets ``-x`` automatically.

	If you want to test JSON output (specified with ``--output-format=json``),
	use **.tags-json** as the suffix instead of **.tags**.
	In that case you don't have to write ``--output-format=json`` in
	``args.ctags``, nor add ``json`` to ``features`` (described below);
	the test facility sets both the option and the feature automatically.

*Units/TEST/args.ctags* **optional**

	``-o -`` is used as the default option when running ctags for a
	unit test. If you want to add more options, list them in the
	**args.ctags** file.

	Remember to put one option per line; multiple options on one
	line do not work.

*Units/TEST/filter* **optional**

	You can rearrange the output of ctags with this command
	before it is compared with *expected.tags*.
	This command is invoked with no arguments. The ctags output
	is given via stdin, and the rearranged data should be
	written to stdout.
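
As an illustration, here is a minimal *filter* that sorts the ctags output so that the comparison does not depend on the order of the output lines. Sorting is only an assumed use case, and ``sort_tags`` is a hypothetical name. The guard at the end makes the function run only when the file is executed under the name *filter*. This means the function can be sourced elsewhere without reading stdin.

.. code-block:: sh

    #!/bin/sh
    # Hypothetical Units/TEST/filter. It is invoked with no arguments,
    # reads the ctags output on stdin and writes the rearranged data to stdout.

    sort_tags() {
        # Byte-wise sort, independent of the user's locale.
        LC_ALL=C sort
    }

    # Only act when executed as the "filter" file itself, so the function can
    # also be sourced (e.g. for testing) without consuming stdin.
    if [ "${0##*/}" = filter ]; then
        sort_tags
    fi
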

*Units/TEST/features* **optional**

	If a unit test case requires special features of ctags,
	list them in this file, one per line. If the target ctags
	lacks any of the features, the test is skipped.

	If a line starts with ``!``, the effect is inverted:
	if the target ctags has the feature specified after ``!``,
	the test is skipped.

	All built-in features can be listed by passing
	``--list-features`` to ctags.
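
The skip rule above can be sketched as a small shell function. This is not the code used by *misc/units*. The name ``features_satisfied`` is hypothetical, and so is the way the available features are passed in: as a plain newline-separated list of names. The exact output format of ``--list-features`` is not assumed here.

.. code-block:: sh

    #!/bin/sh
    # Sketch of the "features" skip rule (hypothetical helper, not misc/units).
    # $1: path to a features file, one feature per line, "!" inverts a line.
    # $2: newline-separated names of the features the target ctags has.
    # Returns 0 if the test case should run, 1 if it should be skipped.
    features_satisfied() {
        features_file=$1
        available=$2
        while IFS= read -r line || [ -n "$line" ]; do
            [ -n "$line" ] || continue
            case $line in
                '!'*)
                    # Inverted: skip if the target ctags HAS this feature.
                    if printf '%s\n' "$available" | grep -qxF -- "${line#!}"; then
                        return 1
                    fi
                    ;;
                *)
                    # Normal: skip if the target ctags LACKS this feature.
                    if ! printf '%s\n' "$available" | grep -qxF -- "$line"; then
                        return 1
                    fi
                    ;;
            esac
        done < "$features_file"
        return 0
    }
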

*Units/TEST/languages* **optional**

	If a unit test case requires certain language parsers to be
	enabled/available, list them in this file, one per line. If any
	of them is disabled/unavailable, the test is skipped.

	Enabled/available language parsers can be checked by passing
	``--list-languages`` to ctags.
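
Putting the file name rules together, the following sketch creates the skeleton of a new test case. The helper name ``new_unit_test``, the test directory name and the C input are made up for illustration. The two options written to *args.ctags* are borrowed from an example elsewhere in this document. They only show the one-option-per-line rule. The sketch deliberately does not create *expected.tags*. That file should record what the parser is expected to emit, so it should be written or carefully reviewed by hand.

.. code-block:: sh

    #!/bin/sh
    # Hypothetical helper: create the skeleton of a Units test case.
    # $1: the Units directory; $2: the test case directory name below it.
    new_unit_test() {
        dir=$1/$2
        mkdir -p "$dir" || return 1

        # The input file must have "input" as its basename.
        printf '%s\n' 'int counter;' \
            'static void reset(void) { counter = 0; }' > "$dir/input.c"

        # One option per line: several options on one line do not work.
        printf '%s\n' '--extras=+r' '--fields=+rzK' > "$dir/args.ctags"

        # expected.tags is deliberately not generated here: write it by hand
        # so that it records what the parser *should* emit for input.c.
        echo "now write $dir/expected.tags"
    }
    # Usage: new_unit_test Units parser-c.r/hypothetical-example.d
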

Note for importing a test case from Test directory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I think all test cases under the *Test* directory should be converted to
*Units*.

If you convert one, use the following TEST name convention:

* use *.t* instead of *.d* as the suffix of the directory name

Here is an example::

	Test/simple.sh

This should be::

	Units/simple.sh.t

With this naming convention we can track which test cases have been
converted and which have not.
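
The naming rule can be expressed as a small shell function. It is only a sketch. The helper name ``units_path_for`` is hypothetical. It also assumes that the converted input file keeps the extension of the original *Test* file, as the *input.\** pattern requires.

.. code-block:: sh

    #!/bin/sh
    # Hypothetical helper: map a file under Test/ to its Units location.
    # Test/simple.sh -> Units/simple.sh.t/input.sh
    units_path_for() {
        name=${1##*/}                  # strip the directory part: simple.sh
        case $name in
            *.*) ext=.${name##*.} ;;   # keep the extension: .sh
            *)   ext= ;;               # no extension at all
        esac
        printf 'Units/%s.t/input%s\n' "$name" "$ext"
    }
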

Example of files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

See `Units/parser-c.r/c-sample.d
<https://github.com/universal-ctags/ctags/tree/master/Units/parser-c.r/c-sample.d>`_.

How to run unit tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use the *units* make target::

	$ make units

The result of each unit test is reported on its own line. You can
specify the test cases to run with ``UNITS=``.

An example to run *vim-command.d* only::

	$ make units UNITS=vim-command

Another example to run *vim-command.d* and *parser-python.r/bug1856363.py.d*::

	$ make units UNITS=vim-command,bug1856363.py

During testing, *OUTPUT.tmp*, *EXPECTED.tmp* and *DIFF.tmp* files are
generated in each test case directory. They are removed when the unit
test **passed**. If the result is **FAILED**, they are kept for
debugging. The following command line cleans up all these generated
files at once::

	$ make clean-units
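
Because these files are kept only when a test does not pass, their presence can be used to list the affected test case directories after a run. Here is a minimal sketch with a hypothetical helper name, assuming the default *Units* layout. *misc/review* is the tool the project provides for reviewing results.

.. code-block:: sh

    #!/bin/sh
    # Hypothetical helper: print each test case directory that still holds
    # temporary files from the last "make units" run (i.e. did not pass).
    list_failed_units() {
        units_dir=${1:-Units}
        # Any of the three leftover files marks such a directory.
        find "$units_dir" -type f \
            \( -name OUTPUT.tmp -o -name EXPECTED.tmp -o -name DIFF.tmp \) |
        while IFS= read -r f; do
            dirname "$f"
        done | sort -u
    }
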

Besides **FAILED** and **passed**, two other types of results are
defined:

**skipped**

	means the test case was skipped for some reason.

**failed (KNOWN bug)**

	means the test failed, but the failure is expected.
	See ":ref:`gathering_test`".

Example of running
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

	$ make units
	Category: ROOT
	-------------------------------------------------------------------------
	Testing 1795612.js as JavaScript                            passed
	Testing 1850914.js as JavaScript                            passed
	Testing 1878155.js as JavaScript                            passed
	Testing 1880687.js as JavaScript                            passed
	Testing 2023624.js as JavaScript                            passed
	Testing 3184782.sql as SQL                                  passed
	...

Running unit tests for specific languages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can run only the tests for specific languages by setting
``LANGUAGES`` to parser names as reported by
``ctags --list-languages``::

	$ make units LANGUAGES=PHP,C

Multiple languages can be selected using a comma-separated list.

.. _gathering_test:

Gathering test cases for known bugs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When we meet a bug, making a small test case that triggers the bug is an
important development activity. Even if the bug cannot be fixed soon, the
test case is a valuable result of the work and should be merged into the
source tree. However, we don't like seeing **FAILED** messages either.
What should we do?

In such a case, merge the test case as usual, but use *.b* instead of
*.d* as the suffix of the test case directory.

``parser-autoconf.r/nested-block.ac.b/`` is an example of *.b* suffix
usage.

When you run the units target, you will see::

    Testing c-sample as C                                 passed
    Testing css-singlequote-in-comment as CSS             failed (KNOWN bug)
    Testing ctags-simple as ctags                         passed

Suffix *.i* is a variant of *.b*. *.i* is for merging/gathering input
that makes the ctags process enter an infinite loop. Unlike *.b*, test
cases marked as *.i* are never executed. They are just skipped, but the
skips are reported::

    Testing ada-ads as Ada                                passed
    Testing ada-function as Ada                           skipped (may cause an infinite loop)
    Testing ada-protected as Ada                          passed
    ...

    Summary (see CMDLINE.tmp to reproduce without test harness)
    ------------------------------------------------------------
      #passed:                                347
      #FIXED:                                 0
      #FAILED (unexpected-exit-status):       0
      #FAILED (unexpected-output):            0
      #skipped (features):                    0
      #skipped (languages):                   0
      #skipped (infinite-loop):               1
        ada-function
      ...
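
The directory suffixes described so far can be summarized with a small shell function. This is a hypothetical helper for reading a *Units* tree, not part of *misc/units*. It covers only the *.d*, *.t*, *.b* and *.i* suffixes.

.. code-block:: sh

    #!/bin/sh
    # Hypothetical helper: describe a Units test case directory by its suffix.
    describe_unit_dir() {
        case ${1%/} in
            *.d) echo "regular test case" ;;
            *.t) echo "test case converted from the Test directory" ;;
            *.b) echo "known bug: failure is expected" ;;
            *.i) echo "may cause an infinite loop: not executed" ;;
            *)   echo "not a test case directory" ;;
        esac
    }
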

Running under valgrind and timeout
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If ``VG=1`` is given, each test case is run under valgrind.
If valgrind detects an error, it is reported as::

    $ make units VG=1
    Testing css-singlequote-in-comment as CSS             failed (valgrind-error)
    ...
    Summary (see CMDLINE.tmp to reproduce without test harness)
    ------------------------------------------------------------
    ...
    #valgrind-error:                        1
      css-singlequote-in-comment
    ...

In this case the valgrind report is recorded in
``Units/css-singlequote-in-comment/VALGRIND-CSS.tmp``.

NOTE: ``/bin/bash`` is needed to report the result. You can specify the
shell used for running the tests with the SHELL macro, like::

    $ make units VG=1 SHELL=/bin/bash

If ``TIMEOUT=N`` is given, each test case is run under the timeout
command. If ctags doesn't stop within ``N`` seconds, it is stopped by
the timeout command and reported as::

    $ make units TIMEOUT=1
    Testing css-singlequote-in-comment as CSS             failed (TIMED OUT)
    ...
    Summary (see CMDLINE.tmp to reproduce without test harness)
    ------------------------------------------------------------
    ...
    #TIMED-OUT:                             1
      css-singlequote-in-comment
    ...

If ``TIMEOUT=N`` is given, *.i* test cases are run as well; they will
be reported as *TIMED-OUT*.

Categories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. NOT REVIEWED

With the *.r* suffix, you can put test cases in a subdirectory of
*Units*. ``Units/parser-ada.r`` is an example. In the *misc/units* test
harness, such a subdirectory is called a category. ``parser-ada.r`` is
the category in the above example.

The *CATEGORIES* make macro is for running units in specified
categories. The following command line runs the units in
``Units/parser-sh.r`` and ``Units/parser-ada.r``::

  $ make units CATEGORIES='parser-sh,parser-ada'
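
In the example, the category directories are given to *CATEGORIES* without their *.r* suffix, separated by commas. The following sketch builds such a list from the category directories present under *Units*. The helper name ``list_categories`` is hypothetical. You could trim the list before passing it to make.

.. code-block:: sh

    #!/bin/sh
    # Hypothetical helper: print all categories under a Units directory as a
    # comma-separated list in the CATEGORIES=... style, e.g. "parser-ada,parser-sh".
    list_categories() {
        units_dir=${1:-Units}
        for d in "$units_dir"/*.r; do
            [ -d "$d" ] || continue      # the glob matched nothing
            basename "$d" .r             # Units/parser-ada.r -> parser-ada
        done | paste -s -d , -
    }
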

Finding minimal bad input
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When a test case fails, the input causing the ``FAILED`` result is
passed to *misc/units shrink*. *misc/units shrink* tries to find the
shortest input that still makes ctags exit with a non-zero status. The
result is written to ``Units/\*/SHRINK-${language}.tmp``, which may be
useful for debugging.
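
To make the idea concrete, here is a much simplified sketch of input shrinking. It is not the algorithm of *misc/units shrink*. It only drops whole lines, first from the end of the file and then from the beginning. It keeps each cut as long as a check command still reports the problem. ``shrink_lines`` and ``ctags_check`` are hypothetical names, and the C language in ``ctags_check`` is an assumption.

.. code-block:: sh

    #!/bin/sh
    # Simplified, hypothetical line-based shrinker; NOT the misc/units shrink
    # algorithm. It only removes whole lines from the end, then from the start.
    # $1: input file
    # $2: check command, called as "$2 FILE"; it must exit non-zero while the
    #     problem still reproduces (e.g. while ctags fails on FILE).
    # The shrunk input is printed on stdout.
    shrink_lines() {
        input=$1
        check=$2
        work=$(mktemp) || return 1
        cand=$(mktemp) || return 1
        cp "$input" "$work"

        # Phase 1: cut lines from the tail while the problem reproduces.
        n=$(sed -n '$=' "$work"); n=${n:-0}
        while [ "$n" -gt 1 ]; do
            head -n "$((n - 1))" "$work" > "$cand"
            if "$check" "$cand"; then
                break    # problem gone: keep the previous, still failing file
            fi
            cp "$cand" "$work"
            n=$((n - 1))
        done

        # Phase 2: cut lines from the head in the same way.
        n=$(sed -n '$=' "$work"); n=${n:-0}
        while [ "$n" -gt 1 ]; do
            tail -n +2 "$work" > "$cand"
            if "$check" "$cand"; then
                break
            fi
            cp "$cand" "$work"
            n=$((n - 1))
        done

        cat "$work"
        rm -f "$work" "$cand"
    }

    # Example check (assumption: C input). It exits non-zero exactly when
    # ctags does; --language-force is needed because the temporary files
    # have no extension.
    ctags_check() {
        ctags --options=NONE --language-force=C -o /dev/null "$1" 2>/dev/null
    }
    # Usage: shrink_lines input.c ctags_check > shrunk.c
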

Acknowledgments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The file name rule was suggested by Maxime Coste <frrrwww@gmail.com>.

Reviewing the results of *Units* tests
------------------------------------------------------------

Try *misc/review*.

.. code-block:: console

    $ misc/review --help
    Usage:
            misc/review          help|--help|-h                 show this message
            misc/review          [list] [-b]                    list failed Units and Tmain
                                 -b                             list .b (known bug) marked cases
            misc/review          inspect [-b]                   inspect difference interactively
                                 -b                             inspect .b (known bug) marked cases
    $

Semi-fuzz (*Fuzz*) testing
---------------------------------------------------------------------

Unexpected input can lead ctags to enter an infinite loop. The fuzz
target tries to identify these conditions by passing semi-random
(semi-broken) input to ctags.

::

	$ make fuzz LANGUAGES=LANG1[,LANG2,...]

With this command line, ctags is run with the parsers given by the
``LANGUAGES`` macro variable on random variations of all test inputs
under *Units/\*/input.\**. In this target, the output of ctags is
ignored and only the exit status is analyzed. The ctags binary is also
run under the timeout command, so that if an infinite loop is found,
ctags exits with a non-zero status. A timeout is reported as follows::

	[timeout C]                Units/test.vhd.t/input.vhd

This means that the C parser did not stop within N seconds when
*Units/test.vhd.t/input.vhd* was given as input, so timeout interrupted
ctags. The default duration can be changed with the ``TIMEOUT=N``
argument of the *make* command. If there is no timeout but the exit
status is non-zero, the target reports it as follows::

	[unexpected-status(N) C]                Units/test.vhd.t/input.vhd
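
The two report formats follow from the exit status of the timeout command. This sketch shows that classification. It assumes GNU coreutils ``timeout``, which exits with status 124 when the time limit is hit. ``run_and_report`` is a hypothetical name, and the sketch is not the code of the fuzz target. The report lines imitate the target's style but use a tab as the separator.

.. code-block:: sh

    #!/bin/sh
    # Hypothetical helper: run a command under timeout(1) and print a report
    # line in the style of the fuzz target when something goes wrong.
    # $1: seconds; $2: label (e.g. the parser name); $3: input file name;
    # remaining arguments: the command to run.
    run_and_report() {
        secs=$1
        label=$2
        file=$3
        shift 3
        timeout "$secs" "$@"
        status=$?
        if [ "$status" -eq 124 ]; then
            # GNU timeout exits with 124 when the command was stopped.
            printf '[timeout %s]\t%s\n' "$label" "$file"
        elif [ "$status" -ne 0 ]; then
            printf '[unexpected-status(%d) %s]\t%s\n' "$status" "$label" "$file"
        fi
        return "$status"
    }
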

The list of parsers that can be used as values for ``LANGUAGES`` can
be obtained with the following command line::

	$ ctags --list-languages

Besides ``LANGUAGES`` and ``TIMEOUT``, the fuzz target also takes the
following parameter:

	``VG=1``

		Run ctags under valgrind. If valgrind finds a memory
		error, it is reported as::

			[valgrind-error Verilog]                Units/array_spec.f90.t/input.f90

		The valgrind report is recorded in
		``Units/\*/VALGRIND-${language}.tmp``.

Like the units target, this semi-fuzz test target also calls
*misc/units shrink* when a test case fails. See "*Units* test facility"
for details about the shrunk result.

*Noise* testing
---------------------------------------------------------------------

After enjoying developing semi-fuzz testing, I looked for an even more
unfair approach. Run::

	$ make noise LANGUAGES=LANG1[,LANG2,...]

The noise target generates test cases by inserting or deleting one
character in the test cases of *Units*.

It takes a long time, even without ``VG=1``, so it cannot be run
under Travis CI. However, it is a good idea to run it locally.
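
The kind of variation the noise target produces can be illustrated by deleting or inserting one character at a given byte offset. The helper names and the inserted character are assumptions for illustration. This sketch does not reproduce how the real target chooses positions and characters, and it works on bytes, which equal characters only for ASCII input. ``head -c`` is not in POSIX but is provided by GNU and BSD ``head``.

.. code-block:: sh

    #!/bin/sh
    # Hypothetical helpers: make a one-character variation of a file.
    # Offsets are 0-based byte positions; results are written to stdout.

    # Delete the byte at offset $2 of file $1.
    delete_char_at() {
        head -c "$2" "$1"
        tail -c +"$(($2 + 2))" "$1"
    }

    # Insert the string $3 (normally one character) before offset $2 of file $1.
    insert_char_at() {
        head -c "$2" "$1"
        printf '%s' "$3"
        tail -c +"$(($2 + 1))" "$1"
    }
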

*Chop* and *slap* testing
---------------------------------------------------------------------

After reviewing many bug reports, we recognized that some of them are
triggered by an unexpected EOF. The chop target was developed based on
this observation.

The chop target generates many input files from an existing input file
under *Units* by truncating the existing input file at various file
positions.

::

   $ make chop LANGUAGES=LANG1[,LANG2,...]

It takes a long time, especially with ``VG=1``, so it cannot be run
under Travis CI. However, it is a good idea to run it locally.

The slap target is derived from the chop target. While the chop target
truncates the existing input files from the tail, the slap target does
the same from the head.
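
The difference between the two targets can be shown with ``head`` and ``tail``. This sketch uses hypothetical helper names. Each call produces one chopped or one slapped variant for a given byte count. The actual targets generate many such files at various positions.

.. code-block:: sh

    #!/bin/sh
    # Hypothetical helpers illustrating chop and slap on file $1.

    # chop: keep only the first $2 bytes (the tail is cut off).
    chop_at() {
        head -c "$2" "$1"
    }

    # slap: drop the first $2 bytes (the head is cut off).
    slap_at() {
        tail -c +"$(($2 + 1))" "$1"
    }
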

..	_input-validation:

Input validation for *Units*
---------------------------------------------------------------------

We have to maintain parsers for languages that we don't know well, and
we don't have enough time to learn those languages.

*Units* test cases help us avoid introducing wrong changes to a parser.

However, there is still an issue: a developer who doesn't know a target
language well may write a broken test input file for the language. This
is where "input validation" comes in.

How to run an example session of input validation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can validate the test input files of *Units* with the
*validate-input* make target if a validator for the language is
defined.

Here is an example of validating an input file for JSON.

.. code-block:: console

  $ make validate-input VALIDATORS=jq
  ...
  Category: ROOT
  ------------------------------------------------------------
  simple-json.d/input.json with jq                                 valid

  Summary
  ------------------------------------------------------------
    #valid:                                 1
    #invalid:                               0
    #skipped (known invalidation)           0
    #skipped (validator unavailable)        0

This example shows validating *simple-json.d/input.json* as an input
file with the *jq* validator. With the VALIDATORS variable passed on the
command line, you can specify which validators to run. Multiple
validators can be specified using a comma-separated list. If you don't
give VALIDATORS, the make target tries to use all available validators.

The meanings of "valid" and "invalid" in "Summary" are self-explanatory.
In two cases, the target skips validating input files:

#skipped (known invalidation)

    A test case specifies KNOWN-INVALIDATION in its *validator* file.

#skipped (validator unavailable)

    A command for a validator is not available.

*validator* file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A *validator* file in a *Units* test directory specifies which
validator the make target should use.

.. code-block:: console

  $ cat Units/simple-json.d/validator
  jq

If you put a *validator* file in a category directory (a directory
having the *.r* suffix), the make target uses the validator specified in
the file as the default. The default validator can be overridden with a
*validator* file in a subdirectory.

.. code-block:: console

  $ cat Units/parser-puppetManifest.r/validator
  puppet
  $ cat Units/parser-puppetManifest.r/puppet-append.d/validator
  KNOWN-INVALIDATION

In the example, the make target uses the *puppet* validator for
validating most of the input files under the
*Units/parser-puppetManifest.r* directory. The exception is the input
file under the *Units/parser-puppetManifest.r/puppet-append.d*
directory, which has its own *validator* file.

If a *Units* test case doesn't have an *expected.tags* file, the make
target doesn't run the validator on its input file, even if a default
validator is given in its category directory.

If a *Units* test case specifies KNOWN-INVALIDATION in its *validator*
file, the make target just increments the "#skipped (known invalidation)"
counter. The target reports the counter at the end of execution.
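
The lookup rules above can be summarized in a short sketch. The function ``validator_for`` is hypothetical and not taken from the make target. It assumes that the category is the immediate parent directory of the test case. It also reads the *expected.tags* rule as applying when the category default would be used.

.. code-block:: sh

    #!/bin/sh
    # Hypothetical helper (not the validate-input implementation).
    # $1: test case directory, e.g. Units/parser-puppetManifest.r/puppet-append.d
    # Prints the validator name, "KNOWN-INVALIDATION", or nothing when the
    # input file should not be validated.
    validator_for() {
        testdir=${1%/}
        category=${testdir%/*}
        if [ -f "$testdir/validator" ]; then
            # A validator file in the test case directory always wins.
            head -n 1 "$testdir/validator"
        elif [ -f "$testdir/expected.tags" ] && [ -f "$category/validator" ]; then
            # Fall back to the category default, only for *.r directories.
            case $category in
                *.r) head -n 1 "$category/validator" ;;
            esac
        fi
    }
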

validator command
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A validator specified in a *validator* file is a command file placed
in the *misc/validators* directory. The command must have "validator-"
as the prefix of its file name. For example,
*misc/validators/validator-jq* is the command for "jq".

The command file must be executable. The *validate-input* make target
runs the command in two ways.

*is_runnable* method

    Before running the command as a validator, the target runs
    the command with "is_runnable" as the first argument.
    The validator command tells the target whether it is runnable
    through its exit status: 0 means ready to run, non-zero means
    not ready to run.

    The make target never runs the validator command for
    validation if the exit status is non-zero.

    For example, the *misc/validators/validator-jq* command uses the
    *jq* command as its backend. If *jq* is not available on a
    system, *validator-jq* can do nothing. In such a case, the
    *is_runnable* method of *validator-jq* should exit with a
    non-zero value.

*validate* method

    The make target runs the command with "validate" and an input
    file name to validate the input file. The command exits with a
    non-zero status if the input file contains invalid syntax. This
    method is never run if the *is_runnable* method of the command
    exits with a non-zero status.
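
Following this protocol, a validator command can be quite small. Here is a sketch of a hypothetical *misc/validators/validator-sh* that checks shell scripts with ``sh -n``. That option reads a script without executing it. The sketch only illustrates the *is_runnable*/*validate* interface. It is not an existing validator. A validator for another language would call that language's checker instead, as *validator-jq* does with *jq*. The final guard dispatches only when a method name is given.

.. code-block:: sh

    #!/bin/sh
    # Hypothetical misc/validators/validator-sh (illustration only).
    # Usage: validator-sh is_runnable
    #        validator-sh validate FILE

    validator_main() {
        case $1 in
            is_runnable)
                # Exit 0 only if the backend command is available.
                command -v sh > /dev/null 2>&1
                ;;
            validate)
                # "sh -n" parses FILE without executing it and exits
                # non-zero on a syntax error.
                sh -n "$2"
                ;;
            *)
                echo "usage: ${0##*/} is_runnable|validate FILE" >&2
                return 2
                ;;
        esac
    }

    # Dispatch on the method name given by the validate-input make target.
    if [ "$#" -gt 0 ]; then
        validator_main "$@"
    fi
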

..	_man_test:

Testing examples in language specific man pages
---------------------------------------------------------------------

:Maintainer: Masatake YAMATO <yamato@redhat.com>

----

``man-test`` is a target for testing the examples in the language
specific man pages (``man/ctags-lang-<LANG>.7.rst.in``). The command
line for running the target is:

.. code-block:: console

   $ make man-test

An example for testing must have the following form:

.. code-block:: ReStructuredText

  "input.<EXT>"

  .. code-block:: <LANG>

    <INPUT LINES>

  "output.tags"
  with "<OPTIONS FOR CTAGS>"

  .. code-block:: tags

    <TAGS OUTPUT LINES>

The man-test target recognizes this form and, for each example in the
man page, does the equivalent of the following shell code:

.. code-block:: console

  $ echo <INPUT LINES> > input.<EXT>
  $ echo <TAGS OUTPUT LINES> > output.tags
  $ ctags <OPTIONS FOR CTAGS> > actual.tags
  $ diff output.tags actual.tags

A backslash character at the end of a line in ``<INPUT LINES>`` or
``<TAGS OUTPUT LINES>`` marks a continued line; the newline that
follows it is ignored.

.. code-block:: ReStructuredText

    .. code-block:: tags

       very long\
            line

is read as:

.. code-block:: ReStructuredText

    .. code-block:: tags

       very long     line
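
The continuation rule is easy to reproduce when checking such text by hand. This sketch wraps it in a hypothetical shell function, ``join_continued``, built on ``awk``. It assumes the lines have already been taken out of the *code-block*, with the block indentation removed. It keeps the leading blanks of the continued line, as in the example above.

.. code-block:: sh

    #!/bin/sh
    # Hypothetical helper: join each line ending in a backslash with the next
    # line, dropping the backslash and the newline (man-test continuation rule).
    join_continued() {
        awk '
            {
                if (sub(/\\$/, "")) {   # ends with a backslash: keep collecting
                    buf = buf $0
                } else {                # complete line: print with any prefix
                    print buf $0
                    buf = ""
                }
            }
            END { if (buf != "") print buf }
        '
    }
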

Here is an example of a test case taken from
``ctags-lang-python.7.rst.in``:

.. code-block:: ReStructuredText

	"input.py"

	.. code-block:: Python

	   import X0

	"output.tags"
	with "--options=NONE -o - --extras=+r --fields=+rzK input.py"

	.. code-block:: tags

		X0	input.py	/^import X0$/;"	kind:module	roles:imported

``make man-test`` returns 0 if all test cases in all language specific
man pages pass.

Here is an example output of the man-test target.

.. code-block:: console

	$ make man-test
	  RUN      man-test
	# Run test cases in ./man/ctags-lang-julia.7.rst.in
	```
	./man/ctags-lang-julia.7.rst.in[0]:75...passed
	./man/ctags-lang-julia.7.rst.in[1]:93...passed
	```
	# Run test cases in ./man/ctags-lang-python.7.rst.in
	```
	./man/ctags-lang-python.7.rst.in[0]:116...passed
	./man/ctags-lang-python.7.rst.in[1]:133...passed
	./man/ctags-lang-python.7.rst.in[2]:154...passed
	./man/ctags-lang-python.7.rst.in[3]:170...passed
	./man/ctags-lang-python.7.rst.in[4]:187...passed
	./man/ctags-lang-python.7.rst.in[5]:230...passed
	```
	# Run test cases in ./man/ctags-lang-verilog.7.rst.in
	```
	./man/ctags-lang-verilog.7.rst.in[0]:51...passed
	```
	OK

NOTE: keep examples in the man pages simple. If you want to test ctags
with complicated (and/or subtle) input, use the units target. The main
purpose of the examples is to explain the parser.
