<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->
<html>
  <head>
    <title>
      QueryParsers
    </title>
  </head>
  <body>
    <h1>Apache Lucene QueryParsers.</h1>
    <p>
    This module provides a number of query parsers:
    </p>
    <ul>
      <li><a href="#classic">Classic</a></li>
      <li><a href="#analyzing">Analyzing</a></li>
      <li><a href="#complexphrase">Complex Phrase</a></li>
      <li><a href="#extendable">Extendable</a></li>
      <li><a href="#flexible">Flexible</a></li>
      <li><a href="#surround">Surround</a></li>
      <li><a href="#xml">XML</a></li>
    </ul>
    <hr>
    <h2><a id="classic">Classic</a></h2>
    A simple Lucene QueryParser implemented with JavaCC.
    <h2><a id="analyzing">Analyzing</a></h2>
    QueryParser that passes Fuzzy-, Prefix-, Range-, and WildcardQuerys through the given analyzer.
    <h2><a id="complexphrase">Complex Phrase</a></h2>
    QueryParser that permits complex phrase query syntax, e.g. "(john jon jonathan~) peters*".
    <h2><a id="extendable">Extendable</a></h2>
    Extendable QueryParser provides a simple and flexible extension mechanism by overloading query field names.
    <h2><a id="flexible">Flexible</a></h2>
<p>
This project contains the new Lucene query parser implementation, which matches the syntax of the core QueryParser but offers a more modular architecture to enable customization.
</p>

<p>
It is currently divided into two main packages:
</p>
<ul>
<li>{@link org.apache.lucene.queryparser.flexible.core}: contains the query parser API classes, which should be extended by query parser implementations.</li>
<li>{@link org.apache.lucene.queryparser.flexible.standard}: contains the current Lucene query parser implementation, built on the new query parser API.</li>
</ul>

<h3>Features</h3>

    <ol>
        <li>Full support for boolean logic (not enabled)</li>
        <li>QueryNode Trees - support for several syntaxes,
            which can be converted into similar QueryNode trees.</li>
        <li>QueryNode Processors - optimize, validate, and rewrite the
            QueryNode trees</li>
        <li>Processor Pipelines - select your favorite processors
            and build a processor pipeline to implement the features you need</li>
        <li>Config Interfaces - allow the consumer of the query parser to implement
            different Config Handler objects to suit their needs.</li>
        <li>Standard Builders - convert QueryNodes into several Lucene
            representations.
            The supported conversion uses 2.4-compatible logic</li>
        <li>QueryNode trees can be converted to a Lucene 2.4 syntax string, using toQueryString</li>
    </ol>

<h3>Design</h3>
<p>
This new query parser was designed to have a very generic
architecture, so that it can easily be used for different
products with varying query syntaxes. This code is much more
flexible and extensible than the Lucene query parser in 2.4.X.
</p>
<p>
The goal of the new query parser is to separate the syntax and semantics of a query. E.g. 'a AND
b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
It distinguishes the semantics of the different query components, e.g.
whether and how to tokenize/lemmatize/normalize the different terms or
which Query objects to create for the terms. This makes it possible to
write a parser with a new syntax quickly, while reusing the underlying
semantics.
</p>
<p>
The query parser has three layers and its core is what we call the
QueryNode tree. It is a tree that initially represents the syntax of the
original query, e.g.
for 'a AND b':
</p>
<pre>
      AND
     /   \
    A     B
</pre>
<p>
The three layers are:
</p>
<dl>
<dt>QueryParser</dt>
<dd>
This layer is the text parsing layer, which simply transforms the
query text string into a {@link org.apache.lucene.queryparser.flexible.core.nodes.QueryNode} tree. Every text parser
must implement the interface {@link org.apache.lucene.queryparser.flexible.core.parser.SyntaxParser}.
Lucene's default implementation implements it using JavaCC.
</dd>

<dt>QueryNodeProcessor</dt>
<dd>The query node processors do most of the work. This layer is in fact a
configurable chain of processors. Each processor can walk the tree and
modify nodes or even the tree's structure. This makes it possible,
for example, to optimize the query before it is executed or to tokenize
terms.
</dd>

<dt>QueryBuilder</dt>
<dd>
The third layer is a configurable map of builders, which maps each {@link org.apache.lucene.queryparser.flexible.core.nodes.QueryNode} type to the specific
builder that transforms the QueryNode into a Lucene Query object.
</dd>

</dl>

<p>
Furthermore, the query parser uses flexible configuration objects.
It also uses message classes that
allow resource bundles to be attached. This makes it possible to translate
messages, which is an important feature of a query parser.
</p>
<p>
This design allows different query syntaxes to be developed very quickly.
</p>

<h3>StandardQueryParser and QueryParserWrapper</h3>

<p>
The classic Lucene query parser is located under
{@link org.apache.lucene.queryparser.classic}.
</p>
<p>
To make the new query parser simpler to use,
the class {@link org.apache.lucene.queryparser.flexible.standard.StandardQueryParser} may be helpful,
especially for people who do not want to extend the query parser.
It uses the default Lucene query processors, text parser and builders, so
you don't need to worry about dealing with those.
</p>
<p>
{@link org.apache.lucene.queryparser.flexible.standard.StandardQueryParser} usage:
</p>
<pre class="prettyprint">
      StandardQueryParser qpHelper = new StandardQueryParser();
      // The configuration setters are exposed on StandardQueryParser itself.
      qpHelper.setAllowLeadingWildcard(true);
      qpHelper.setAnalyzer(new WhitespaceAnalyzer());
      // parse(query, defaultField) throws QueryNodeException on invalid input.
      Query query = qpHelper.parse("apache AND lucene", "defaultField");
</pre>
<h2><a id="surround">Surround</a></h2>
<p>
A QueryParser that supports the Span family of queries as well as pre and infix notation.
</p>
<h2><a id="xml">XML</a></h2>
A QueryParser that produces Lucene Query objects from XML streams.
  </body>
</html>