/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See LICENSE.txt included in this distribution for the specific
 * language governing permissions and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at LICENSE.txt.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright (c) 2008, 2016, Oracle and/or its affiliates. All rights reserved.
 * Portions Copyright (c) 2017, Chris Fraire <cfraire@me.com>.
 *
 * Copyright © 1993 The Regents of the University of California.
 * Copyright © 1994-1996 Sun Microsystems, Inc.
 * Copyright © 1995-1997 Roger E. Critchlow Jr.
 */

Number = ([0-9]+\.[0-9]+|[0-9][0-9]*|"#" [boxBOX] [0-9a-fA-F]+)

/*
 * [1] Commands. ... Semi-colons and newlines are command separators unless
 * quoted as described below.
 *
 * [3] Words. Words of a command are separated by white space (except for
 * newlines, which are command separators).
 * [4] Double quotes. If the first character of a word is double-quote (``"'')
 * then the word is terminated by the next double-quote character.
 * [5] Braces. If the first character of a word is an open brace (``{'') then
 * the word is terminated by the matching close brace (``}'').
 *  N.b. OpenGrok handles [4] and [5] as special matches distinct from {Word}.
 *
 * [9] Comments. If a hash character (``#'') appears at a point where Tcl is
 * expecting the first character of the first word of a command, then the hash
 * character and the characters that follow it, up through the next newline,
 * are treated as a comment and ignored. The comment character only has
 * significance when it appears at the beginning of a command.
 *
 * N.b. this "OrdinaryWord" is for OpenGrok's purpose of symbol tokenization
 * and deviates from the above definitions by treating backslash escapes as
 * word breaking and precluding some characters from starting words and mostly
 * the same from continuing words. E.g., hyphen is not allowed by OpenGrok to
 * start OrdinaryWord but can be present afterward.
 */
OrdinaryWord = [\S--\-,=#\"\}\{\]\[\)\(\\] [\S--#\"\}\{\]\[\)\(\\]*
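/*
 * N.b. illustrative examples only (an editorial sketch, assuming JFlex's
 * usual character-class semantics; not taken from the Tcl manual):
 *   set        matches as a single OrdinaryWord
 *   foo-bar    matches whole, since hyphen may continue a word
 *   -foo       cannot start at the hyphen; "foo" can match once the hyphen
 *              has been consumed by some other rule
 *   foo(bar)   matches only "foo", since "(" is excluded from both classes
 *   a\tb       the backslash ends the word after "a"
 */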

/*
 * [7] Variable substitution.
 *
 * $name
 *     Name is the name of a scalar variable; the name is a sequence of one or
 *     more characters that are a letter, digit, underscore, or namespace
 *     separators (two or more colons).
 */
Varsub1 = \$ {name_unit}+
name_unit = ([\p{Letter}\p{Digit}_] | [:][:]+)
/*
 * $name(index)
 *     Name gives the name of an array variable and index gives the name of an
 *     element within that array. Name must contain only letters, digits,
 *     underscores, and namespace separators, and may be an empty string.
 */
Varsub2 = \$ {name_unit}* \( {name_unit}+ \)
/*
 * ${name}
 *     Name is the name of a scalar variable. It may contain any characters
 *     whatsoever except for close braces.
 */
Varsub3 = \$\{ [^\}]+ \}
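/*
 * N.b. illustrative examples only (an editorial sketch, not from the Tcl
 * manual), showing which macro each form is expected to match:
 *   $count, $::ns::count    {Varsub1}
 *   $arr(idx), $(idx)       {Varsub2} (the array name may be empty)
 *   ${any name at all}      {Varsub3}
 * Note that {Varsub2} as defined here accepts only name characters for the
 * index, so a nested substitution such as $arr($i) does not match it whole.
 */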

/*
 * [8] Backslash substitution.
 * Backslash plus a character, where ... in all cases but [for the characters]
 * described below, the backslash is dropped and the following character is
 * treated as an ordinary character and included in the word.
 *
 * Special cases:
 * a,f,b,n,r,t,v,backslash;
 * \<newline>whiteSpace;
 * \ooo The digits ooo (one, two, or three of them);
 * \xhh The hexadecimal digits hh .... Any number of hexadecimal digits may be
 *     present;
 * \uhhhh The hexadecimal digits hhhh (one, two, three, or four of them)
 *
 * "Backslash substitution is not performed on words enclosed in braces, except
 * for backslash-newline as described above."
 */
Backslash_sub = [\\] ([afbnrtv\\] | \p{Number}{1,3} | [x][0-9a-fA-F]+ |
    [u][0-9a-fA-F]{1,4} | [[^]--[afbnrtv\n\p{Number}xu\\]])
Backslash_nl = [\\] \n\s+
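/*
 * N.b. illustrative examples only (an editorial sketch, assuming the
 * definitions above are read with JFlex's usual semantics):
 *   \n  \t  \\      single-character escapes
 *   \101            one to three digits
 *   \x41            "x" plus one or more hexadecimal digits
 *   \u0041          "u" plus one to four hexadecimal digits
 *   \$  \;          any other character after the backslash
 * {Backslash_nl} requires at least one whitespace character after the
 * backslash-newline pair.
 */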

WordOperators = ("*" | "&&" | "||")