<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements. See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->
<html>
<head>
   <title>Apache Lucene API</title>
</head>
<body>

<p>Apache Lucene is a high-performance, full-featured text search engine library.
Here's a simple example of how to use Lucene for indexing and searching (using JUnit
to check that the results are what we expect):</p>

<!-- code comes from org.apache.lucene.TestDemo.
     See LUCENE-8481 for reasons why it's out of sync with the code.
-->
<pre class="prettyprint">
import static org.junit.Assert.assertEquals;

import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.IOUtils;

// The statements below run inside a JUnit test method declared "throws Exception".
Analyzer analyzer = new StandardAnalyzer();

Path indexPath = Files.createTempDirectory("tempIndex");
try (Directory directory = FSDirectory.open(indexPath)) {
  IndexWriterConfig config = new IndexWriterConfig(analyzer);
  try (IndexWriter iwriter = new IndexWriter(directory, config)) {
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
    iwriter.addDocument(doc);
  } // closing the writer commits the added document

  // Now search the index:
  try (DirectoryReader ireader = DirectoryReader.open(directory)) {
    IndexSearcher isearcher = new IndexSearcher(ireader);
    // Parse a simple query that searches for "text":
    QueryParser parser = new QueryParser("fieldname", analyzer);
    Query query = parser.parse("text");
    ScoreDoc[] hits = isearcher.search(query, 10).scoreDocs;
    assertEquals(1, hits.length);
    // Iterate through the results:
    for (ScoreDoc hit : hits) {
      Document hitDoc = isearcher.doc(hit.doc);
      assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
    }
  }
} finally {
  // Remove the temporary index directory even if an assertion fails
  IOUtils.rm(indexPath);
}</pre>
<!-- ======================================================== -->

<p>The Lucene API is divided into several packages:</p>

<ul>
<li>
<b>{@link org.apache.lucene.analysis}</b>
defines an abstract {@link org.apache.lucene.analysis.Analyzer Analyzer}
API for converting text from a {@link java.io.Reader}
into a {@link org.apache.lucene.analysis.TokenStream TokenStream},
an enumeration of token {@link org.apache.lucene.util.Attribute Attribute}s.
A TokenStream can be composed by applying {@link org.apache.lucene.analysis.TokenFilter TokenFilter}s
to the output of a {@link org.apache.lucene.analysis.Tokenizer Tokenizer}.
Tokenizers and TokenFilters are strung together and applied with an {@link org.apache.lucene.analysis.Analyzer Analyzer}.
The grammar-based <a href="org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a>
is part of Lucene core, and <a href="../analysis/common/overview-summary.html">analysis-common</a> provides
many more Analyzer implementations, such as
<a href="../analysis/common/org/apache/lucene/analysis/core/StopAnalyzer.html">StopAnalyzer</a>.</li>

<li>
<b>{@link org.apache.lucene.codecs}</b>
provides an abstraction over the encoding and decoding of the inverted index structure,
as well as different implementations that can be chosen depending upon application needs.</li>

<li>
<b>{@link org.apache.lucene.document}</b>
provides a simple {@link org.apache.lucene.document.Document Document}
class. A Document is simply a set of named {@link org.apache.lucene.document.Field Field}s,
whose values may be strings or instances of {@link java.io.Reader}.</li>

<li>
<b>{@link org.apache.lucene.index}</b>
provides two primary classes: {@link org.apache.lucene.index.IndexWriter IndexWriter},
which creates indexes and adds documents to them, and {@link org.apache.lucene.index.IndexReader IndexReader},
which accesses the data in the index.</li>

<li>
<b>{@link org.apache.lucene.search}</b>
provides data structures to represent queries (e.g. {@link org.apache.lucene.search.TermQuery TermQuery}
for individual terms, {@link org.apache.lucene.search.PhraseQuery PhraseQuery}
for phrases, and {@link org.apache.lucene.search.BooleanQuery BooleanQuery}
for boolean combinations of queries) and the {@link org.apache.lucene.search.IndexSearcher IndexSearcher},
which turns queries into {@link org.apache.lucene.search.TopDocs TopDocs}.
A number of <a href="../queryparser/overview-summary.html">QueryParser</a>s are provided for producing
query structures from strings or XML.
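<p>These query classes can also be combined by hand instead of through a QueryParser.
The following is a minimal sketch, assuming Lucene 9.x core is on the classpath; the class
<code>QueryExamples</code>, its method <code>clamChowderInManhattan</code>, and the field name
<code>"fieldname"</code> are hypothetical names chosen for illustration. It builds a BooleanQuery
that requires both a PhraseQuery and a TermQuery, mirroring the
<code>"clam chowder" AND Manhattan</code> search in the demo below:</p>
<pre class="prettyprint">
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Hypothetical example class, not part of Lucene.
public class QueryExamples {

  // Builds the equivalent of the query string "clam chowder" AND manhattan.
  // These constructors do not analyze their input, so the terms are given
  // already lowercased, the way StandardAnalyzer would have indexed them.
  public static Query clamChowderInManhattan(String field) {
    // PhraseQuery: "clam" immediately followed by "chowder"
    Query phrase = new PhraseQuery(field, "clam", "chowder");
    // TermQuery: the single term "manhattan"
    Query term = new TermQuery(new Term(field, "manhattan"));
    // BooleanQuery: both clauses are required (Occur.MUST is printed as "+")
    return new BooleanQuery.Builder()
        .add(phrase, BooleanClause.Occur.MUST)
        .add(term, BooleanClause.Occur.MUST)
        .build();
  }

  public static void main(String[] args) {
    Query query = clamChowderInManhattan("fieldname");
    // toString(field) leaves out the name of the given default field
    System.out.println(query.toString("fieldname"));
  }
}</pre>
<p>With Lucene on the classpath, <code>main</code> should print <code>+"clam chowder" +manhattan</code>,
the same form as the "Searching for:" line in the SearchFiles demo output. Building queries this way
skips query-string parsing, which is useful when the structure of the query is fixed by the application.</p>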
</li>

<li>
<b>{@link org.apache.lucene.store}</b>
defines an abstract class for storing persistent data, the {@link org.apache.lucene.store.Directory Directory},
which is a collection of named files written by an {@link org.apache.lucene.store.IndexOutput IndexOutput}
and read by an {@link org.apache.lucene.store.IndexInput IndexInput}.
Multiple implementations are provided, but {@link org.apache.lucene.store.FSDirectory FSDirectory} is generally
recommended as it tries to use operating system disk buffer caches efficiently.</li>

<li>
<b>{@link org.apache.lucene.util}</b>
contains a few handy data structures and utility classes, e.g. {@link org.apache.lucene.util.FixedBitSet FixedBitSet}
and {@link org.apache.lucene.util.PriorityQueue PriorityQueue}.</li>
</ul>
To use Lucene, an application should:
<ol>
<li>
Create {@link org.apache.lucene.document.Document Document}s by adding
{@link org.apache.lucene.document.Field Field}s;</li>

<li>
Create an {@link org.apache.lucene.index.IndexWriter IndexWriter}
and add documents to it with {@link org.apache.lucene.index.IndexWriter#addDocument(Iterable) addDocument()};</li>

<li>
Call <a href="../queryparser/org/apache/lucene/queryparser/classic/QueryParserBase.html#parse(java.lang.String)">QueryParser.parse()</a>
to build a query from a string; and</li>

<li>
Create an {@link org.apache.lucene.search.IndexSearcher IndexSearcher}
and pass the query to its {@link org.apache.lucene.search.IndexSearcher#search(org.apache.lucene.search.Query, int) search()}
method.</li>
</ol>
Some simple examples of code that does this are:
<ul>
<li>
<a href="../demo/src-html/org/apache/lucene/demo/IndexFiles.html">IndexFiles.java</a> creates an
index for all the files contained in a directory.</li>

<li>
<a href="../demo/src-html/org/apache/lucene/demo/SearchFiles.html">SearchFiles.java</a> prompts for
queries and searches
an index.</li>
</ul>
To demonstrate these, try something like:
<blockquote><code>&gt; <b>java -cp lucene-core.jar:lucene-demo.jar:lucene-analysis-common.jar org.apache.lucene.demo.IndexFiles -index index -docs rec.food.recipes/soups</b></code>
<br><code>adding rec.food.recipes/soups/abalone-chowder</code>
<br><code>&nbsp;</code>[ ... ]

<p><code>&gt; <b>java -cp lucene-core.jar:lucene-demo.jar:lucene-queryparser.jar:lucene-analysis-common.jar org.apache.lucene.demo.SearchFiles</b></code>
<br><code>Query: <b>chowder</b></code>
<br><code>Searching for: chowder</code>
<br><code>34 total matching documents</code>
<br><code>1. rec.food.recipes/soups/spam-chowder</code>
<br><code>&nbsp;</code>[ ... thirty-four documents contain the word "chowder" ... ]

<p><code>Query: <b>"clam chowder" AND Manhattan</b></code>
<br><code>Searching for: +"clam chowder" +manhattan</code>
<br><code>2 total matching documents</code>
<br><code>1. rec.food.recipes/soups/clam-chowder</code>
<br><code>&nbsp;</code>[ ... two documents contain the phrase "clam chowder"
and the word "manhattan" ... ]
<br>[ Note: "+" and "-" are canonical, but "AND", "OR"
and "NOT" may be used. ]</blockquote>

</body>
</html>