<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
  -->
<html>
  <head>
    <title>
      QueryParsers
    </title>
  </head>
  <body>
  <h1>Apache Lucene QueryParsers</h1>
  <p>
  This module provides a number of query parsers:
  </p>
  <ul>
     <li><a href="#classic">Classic</a>
     <li><a href="#analyzing">Analyzing</a>
     <li><a href="#complexphrase">Complex Phrase</a>
     <li><a href="#extendable">Extendable</a>
     <li><a href="#flexible">Flexible</a>
     <li><a href="#surround">Surround</a>
     <li><a href="#xml">XML</a>
  </ul>
  <hr>
  <h2><a id="classic">Classic</a></h2>
  A simple Lucene QueryParser implemented with JavaCC.
  <h2><a id="analyzing">Analyzing</a></h2>
  A QueryParser that passes Fuzzy-, Prefix-, Range-, and WildcardQuery terms through the given analyzer.
  <h2><a id="complexphrase">Complex Phrase</a></h2>
  A QueryParser that permits complex phrase query syntax, e.g. "(john jon jonathan~) peters*".
  <h2><a id="extendable">Extendable</a></h2>
  The Extendable QueryParser provides a simple and flexible extension mechanism by overloading query field names.
  <h2><a id="flexible">Flexible</a></h2>
<p>
This project contains the new Lucene query parser implementation, which matches the syntax of the core QueryParser but offers a more modular architecture to enable customization.
</p>

<p>
It is currently divided into two main packages:
</p>
<ul>
<li>{@link org.apache.lucene.queryparser.flexible.core}: contains the query parser API classes, which should be extended by query parser implementations.</li>
<li>{@link org.apache.lucene.queryparser.flexible.standard}: contains the current Lucene query parser implementation, built on the new query parser API.</li>
</ul>

<h3>Features</h3>

    <ol>
        <li>Full support for boolean logic (not enabled)</li>
        <li>QueryNode Trees - support for several syntaxes
            that can be converted into similar QueryNode trees.</li>
        <li>QueryNode Processors - optimize, validate, and rewrite the
            QueryNode trees.</li>
        <li>Processor Pipelines - select the processors you want
            and build a processor pipeline to implement the features you need.</li>
        <li>Config Interfaces - allow the consumer of the query parser to implement
            different Config Handler objects to suit their needs.</li>
        <li>Standard Builders - convert QueryNodes into several Lucene
            representations. The supported conversion uses 2.4-compatible logic.</li>
        <li>QueryNode trees can be converted to a Lucene 2.4 syntax string using toQueryString.</li>
    </ol>

<h3>Design</h3>
<p>
This new query parser was designed to have a very generic
architecture, so that it can easily be used for different
products with varying query syntaxes. This code is much more
flexible and extensible than the Lucene query parser in 2.4.X.
</p>
<p>
The goal of the new query parser is to separate the syntax and semantics of a query. E.g. 'a AND
b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
It distinguishes the semantics of the different query components, e.g.
whether and how to tokenize/lemmatize/normalize the different terms or
which Query objects to create for the terms. This makes it possible to
write a parser with a new syntax as quickly as possible, while reusing
the underlying semantics.
</p>
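<p>
The separation of syntax from semantics can be illustrated with a small self-contained sketch.
It is a toy model, not the Lucene API: the class <code>SyntaxVsSemantics</code>, its <code>Node</code> type
and the three <code>parse*</code> methods are hypothetical names invented for this example, and each
front end handles only the single form named in its comment. The point is that all three syntaxes
produce the same tree, so everything downstream of parsing can be shared.
</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: these names are hypothetical and are not part of Lucene. */
class SyntaxVsSemantics {

    /** A syntax-independent tree node: an operator with children, or a bare term. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        @Override
        public String toString() {
            if (children.isEmpty()) {
                return label;
            }
            List<String> parts = new ArrayList<>();
            for (Node child : children) {
                parts.add(child.toString());
            }
            return label + "(" + String.join(",", parts) + ")";
        }
    }

    /** Infix front end: handles only the form "x AND y". */
    static Node parseInfix(String text) {
        return andOf(text.trim().split("\\s+AND\\s+"));
    }

    /** Plus-prefix front end: handles only the form "+x +y". */
    static Node parsePlus(String text) {
        String[] raw = text.trim().split("\\s+");
        String[] terms = new String[raw.length];
        for (int i = 0; i < raw.length; i++) {
            terms[i] = raw[i].substring(1); // drop the leading '+'
        }
        return andOf(terms);
    }

    /** Function-style front end: handles only the form "AND(x,y)". */
    static Node parseFunction(String text) {
        String trimmed = text.trim();
        String inner = trimmed.substring("AND(".length(), trimmed.length() - 1);
        return andOf(inner.split("\\s*,\\s*"));
    }

    /** Shared helper: builds one AND node over the given terms. */
    private static Node andOf(String[] terms) {
        Node[] children = new Node[terms.length];
        for (int i = 0; i < terms.length; i++) {
            children[i] = new Node(terms[i]);
        }
        return new Node("AND", children);
    }

    public static void main(String[] args) {
        // All three syntaxes yield the same tree.
        System.out.println(parseInfix("a AND b"));
        System.out.println(parsePlus("+a +b"));
        System.out.println(parseFunction("AND(a,b)"));
    }
}
```

<p>
In this sketch, adding a fourth syntax would only require another <code>parse*</code> method; the tree type stays untouched.
</p>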
<p>
The query parser has three layers, and its core is what we call the
QueryNode tree. It is a tree that initially represents the syntax of the
original query, e.g. for 'a AND b':
</p>
<pre>
      AND
     /   \
    A     B
</pre>
<p>
The three layers are:
</p>
<dl>
<dt>QueryParser</dt>
<dd>
This layer is the text parsing layer, which simply transforms the
query text string into a {@link org.apache.lucene.queryparser.flexible.core.nodes.QueryNode} tree. Every text parser
must implement the interface {@link org.apache.lucene.queryparser.flexible.core.parser.SyntaxParser}.
Lucene's default implementation of this interface uses JavaCC.
</dd>

<dt>QueryNodeProcessor</dt>
<dd>The query node processors do most of the work. They form a
configurable chain of processors. Each processor can walk the tree and
modify nodes or even the tree's structure. That makes it possible, for
example, to optimize the query before it is executed or to tokenize
terms.
</dd>

<dt>QueryBuilder</dt>
<dd>
The third layer is a configurable map of builders, which maps each {@link org.apache.lucene.queryparser.flexible.core.nodes.QueryNode} type to the specific
builder that transforms the QueryNode into a Lucene Query object.
</dd>

</dl>
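<p>
The three layers can be sketched end to end in a few dozen lines. This is a toy model under stated
assumptions, not the Lucene implementation: <code>ThreeLayerSketch</code>, <code>Node</code>,
<code>lowercaseTerms</code>, <code>unwrapSingleChild</code> and the <code>BUILDERS</code> map are hypothetical
names, the parser understands only "x AND y", and a plain string stands in for the Lucene Query object
that a real builder would produce.
</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Toy model only: hypothetical names, not the Lucene flexible query parser API. */
class ThreeLayerSketch {

    static final class Node {
        final String type;          // "AND" or "TERM"
        final String text;          // term text; empty for operator nodes
        final List<Node> children;

        Node(String type, String text, List<Node> children) {
            this.type = type;
            this.text = text;
            this.children = children;
        }

        static Node term(String text) {
            return new Node("TERM", text, new ArrayList<>());
        }
    }

    // Layer 1: text parser. Turns "x AND y" into an AND node over TERM nodes.
    static Node parse(String query) {
        List<Node> terms = new ArrayList<>();
        for (String t : query.trim().split("\\s+AND\\s+")) {
            terms.add(Node.term(t));
        }
        return new Node("AND", "", terms);
    }

    // Layer 2, processor A: normalizes terms by lowercasing them.
    static Node lowercaseTerms(Node n) {
        if (n.type.equals("TERM")) {
            return Node.term(n.text.toLowerCase(Locale.ROOT));
        }
        List<Node> kids = new ArrayList<>();
        for (Node c : n.children) {
            kids.add(lowercaseTerms(c));
        }
        return new Node(n.type, n.text, kids);
    }

    // Layer 2, processor B: a small optimization that changes the tree structure.
    static Node unwrapSingleChild(Node n) {
        if (n.type.equals("AND") && n.children.size() == 1) {
            return n.children.get(0);
        }
        return n;
    }

    // Layer 3: map from node type to the builder for that type.
    static final Map<String, Function<Node, String>> BUILDERS = new HashMap<>();
    static {
        BUILDERS.put("TERM", n -> "field:" + n.text);
        BUILDERS.put("AND", n -> {
            List<String> clauses = new ArrayList<>();
            for (Node c : n.children) {
                clauses.add("+" + build(c));
            }
            return String.join(" ", clauses);
        });
    }

    static String build(Node n) {
        return BUILDERS.get(n.type).apply(n);
    }

    /** Runs parse, then the processor chain in order, then the builders. */
    static String run(String query) {
        List<UnaryOperator<Node>> pipeline = new ArrayList<>();
        pipeline.add(ThreeLayerSketch::lowercaseTerms);
        pipeline.add(ThreeLayerSketch::unwrapSingleChild);

        Node tree = parse(query);
        for (UnaryOperator<Node> processor : pipeline) {
            tree = processor.apply(tree);
        }
        return build(tree);
    }

    public static void main(String[] args) {
        System.out.println(run("Apache AND Lucene"));
        System.out.println(run("Apache"));
    }
}
```

<p>
Because the processors and builders operate only on the tree, replacing <code>parse</code> with a parser for
another syntax would leave layers two and three of this sketch unchanged.
</p>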

<p>
Furthermore, the query parser uses flexible configuration objects. It also uses message classes to
which resource bundles can be attached. This makes it possible to translate
messages, which is an important feature of a query parser.
</p>
<p>
This design makes it possible to develop different query syntaxes very quickly.
</p>

<h3>StandardQueryParser and QueryParserWrapper</h3>

<p>
The classic Lucene query parser is located under
{@link org.apache.lucene.queryparser.classic}.
</p>
<p>
To make the new query parser simpler to use,
the class {@link org.apache.lucene.queryparser.flexible.standard.StandardQueryParser} may be helpful,
especially for people who do not want to extend the query parser.
It uses the default Lucene query processors, text parser and builders, so
you don't need to worry about dealing with those.
</p>

<p>
{@link org.apache.lucene.queryparser.flexible.standard.StandardQueryParser} usage:
</p>

<pre class="prettyprint">
      StandardQueryParser qpHelper = new StandardQueryParser();
      // Configure the parser through its own setters.
      qpHelper.setAllowLeadingWildcard(true);
      qpHelper.setAnalyzer(new WhitespaceAnalyzer());
      // parse(query, defaultField) throws QueryNodeException on invalid input.
      Query query = qpHelper.parse("apache AND lucene", "defaultField");
</pre>
<h2><a id="surround">Surround</a></h2>
<p>
A QueryParser that supports the Span family of queries as well as prefix and infix notation.
</p>
<h2><a id="xml">XML</a></h2>
A QueryParser that produces Lucene Query objects from XML streams.
  </body>
</html>