<!--
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements.  See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License.  You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
 -->

# Apache Lucene Migration Guide

## Migration from Lucene 9.x to Lucene 10.0

### PersianStemFilter is added to PersianAnalyzer (LUCENE-10312)

`PersianAnalyzer` now includes `PersianStemFilter`, which changes analysis results. If you need exactly the same analysis
behaviour as 9.x, clone `PersianAnalyzer` from 9.x or build a custom analyzer with `CustomAnalyzer`.

### AutomatonQuery/CompiledAutomaton/RunAutomaton/RegExp no longer determinize (LUCENE-10010)

These classes no longer take a `determinizeWorkLimit` and no longer determinize
behind the scenes. It is the responsibility of the caller to call
`Operations.determinize()` for DFA execution.
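
A minimal sketch of what this can look like in caller code. It assumes that the no-argument `RegExp.toAutomaton()`, the `Operations.DEFAULT_DETERMINIZE_WORK_LIMIT` constant and the `CharacterRunAutomaton(Automaton)` constructor are available in your Lucene version; the class and method names are hypothetical and only serve as illustration:

```java
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.CharacterRunAutomaton;
import org.apache.lucene.util.automaton.Operations;
import org.apache.lucene.util.automaton.RegExp;

/** Hypothetical helper, not part of Lucene. */
class DeterminizeExample {

  /** Returns true if the whole input is accepted by the regular expression. */
  static boolean matches(String regex, String input) {
    // Parsing may yield a non-deterministic automaton.
    Automaton automaton = new RegExp(regex).toAutomaton();
    // The caller now determinizes explicitly, with an explicit work limit.
    Automaton dfa = Operations.determinize(automaton, Operations.DEFAULT_DETERMINIZE_WORK_LIMIT);
    // Run automata expect a deterministic automaton.
    return new CharacterRunAutomaton(dfa).run(input);
  }
}
```

The work limit bounds the effort spent on determinization, so a lower value than the default may be appropriate when the expression comes from untrusted input.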

### DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery removed in favor of FieldExistsQuery (LUCENE-10436)

These classes have been removed and consolidated into `FieldExistsQuery`. To migrate, callers simply replace those classes
with the new one during object instantiation.

### Normalizer and stemmer classes are now package private (LUCENE-10561)

With a few exceptions, almost all normalizer and stemmer classes are now package private. If your code depends on
constants defined in them, copy the constant values and re-define them in your code.

## Migration from Lucene 9.0 to Lucene 9.1

### Test framework package migration and module (LUCENE-10301)

The test framework is now a Java module. All the classes have been moved from
`org.apache.lucene.*` to `org.apache.lucene.tests.*` to avoid package name conflicts
with the core module. If you were using the Lucene test framework, the migration should be
fairly mechanical: only the package prefix changes.

### Minor syntactical changes in StandardQueryParser (LUCENE-10223)

Interval functions and min-should-match support were added to `StandardQueryParser`. This
means that interval function prefixes (`fn:`) and the `@` character after parentheses will
parse differently than before. If you need the exact previous behavior, clone the
`StandardSyntaxParser` from the previous version of Lucene and create a custom query parser
with that parser.

### Lucene Core now depends on java.logging (JUL) module (LUCENE-10342)

Lucene Core now logs certain warnings and errors using Java Util Logging (JUL).
It is therefore recommended to install wrapper libraries with JUL logging handlers to
feed the log events into your app's own logging system.

Under normal circumstances Lucene won't log anything, but in the case of a problem
users should find the logged information in the usual log files.

Lucene also provides a `JavaLoggingInfoStream` implementation that logs `IndexWriter`
events using JUL.

To feed Lucene's log events into Log4j, use
the [Log4j JDK Logging Adapter](https://logging.apache.org/log4j/2.x/log4j-jul/index.html)
in combination with the corresponding system property:
`java.util.logging.manager=org.apache.logging.log4j.jul.LogManager`.
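
If no bridge library is used, JUL can also be configured directly. The following is a minimal sketch using only the JDK. It assumes that Lucene's loggers live under the `org.apache.lucene` name (that is, that they are named after Lucene classes); the class and method names are hypothetical:

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper, not part of Lucene. */
class LuceneJulSetup {

  /** Attaches a handler to the assumed parent logger of all Lucene loggers. */
  static Logger configure(Handler handler, Level level) {
    // Assumption: Lucene logger names start with "org.apache.lucene".
    Logger logger = Logger.getLogger("org.apache.lucene");
    handler.setLevel(level);
    logger.addHandler(handler);
    logger.setLevel(level);
    // Return the logger so the caller can keep a strong reference to it;
    // JUL holds loggers weakly and may otherwise drop the configuration.
    return logger;
  }

  public static void main(String[] args) {
    Logger lucene = configure(new ConsoleHandler(), Level.WARNING);
    lucene.warning("JUL handler for Lucene loggers installed");
  }
}
```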

### Kuromoji and Nori analysis component constructors for custom dictionaries

The Kuromoji and Nori analysis modules offered ways to customize the backing dictionaries
by passing a path to file or classpath resources through inconsistently implemented
APIs. This was buggy from the beginning, but some users made use of it. With the move to the Java
module system, resource lookup on the classpath in particular stopped working correctly.
The Lucene team therefore implemented new APIs to create dictionary implementations
with custom data files. Unfortunately there were some shortcomings in the 9.1 version,
including when using the now-deprecated constructors, so users are advised to upgrade to
Lucene 9.2 or stay with 9.0.

See LUCENE-10558 for more details and workarounds.

## Migration from Lucene 8.x to Lucene 9.0

### Rename of binary artifacts from '**-analyzers-**' to '**-analysis-**' (LUCENE-9562)

All binary analysis packages (and corresponding Maven artifacts) have been renamed and are
now consistent with the repository module `analysis`. You will need to adjust build dependencies
to the new coordinates:

|         Old Artifact Coordinates            |        New Artifact Coordinates            |
|---------------------------------------------|--------------------------------------------|
|org.apache.lucene:lucene-analyzers-common    |org.apache.lucene:lucene-analysis-common    |
|org.apache.lucene:lucene-analyzers-icu       |org.apache.lucene:lucene-analysis-icu       |
|org.apache.lucene:lucene-analyzers-kuromoji  |org.apache.lucene:lucene-analysis-kuromoji  |
|org.apache.lucene:lucene-analyzers-morfologik|org.apache.lucene:lucene-analysis-morfologik|
|org.apache.lucene:lucene-analyzers-nori      |org.apache.lucene:lucene-analysis-nori      |
|org.apache.lucene:lucene-analyzers-opennlp   |org.apache.lucene:lucene-analysis-opennlp   |
|org.apache.lucene:lucene-analyzers-phonetic  |org.apache.lucene:lucene-analysis-phonetic  |
|org.apache.lucene:lucene-analyzers-smartcn   |org.apache.lucene:lucene-analysis-smartcn   |
|org.apache.lucene:lucene-analyzers-stempel   |org.apache.lucene:lucene-analysis-stempel   |

### LucenePackage class removed (LUCENE-10260)

The `LucenePackage` class has been removed. The implementation version string can be
retrieved from `Version.getPackageImplementationVersion()`.

### Directory API is now little-endian (LUCENE-9047)

`DataOutput`'s `writeShort()`, `writeInt()`, and `writeLong()` methods now encode with
little-endian byte order. If you have custom subclasses of `DataInput`/`DataOutput`, you
will need to adjust them from big-endian byte order to little-endian byte order.
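
To make the difference concrete, the following JDK-only sketch encodes the same `int` in both byte orders. It does not use Lucene's `DataOutput` at all; the class and method names are hypothetical and the code only illustrates the byte layout a custom subclass has to produce:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical illustration, not part of Lucene. */
class EndianDemo {

  /** Encodes an int into four bytes using the given byte order. */
  static byte[] encodeInt(int value, ByteOrder order) {
    return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
  }

  public static void main(String[] args) {
    int value = 0x01020304;
    // Big-endian (old layout): most significant byte first.
    System.out.println(Arrays.toString(encodeInt(value, ByteOrder.BIG_ENDIAN))); // [1, 2, 3, 4]
    // Little-endian (new layout): least significant byte first.
    System.out.println(Arrays.toString(encodeInt(value, ByteOrder.LITTLE_ENDIAN))); // [4, 3, 2, 1]
  }
}
```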

### NativeUnixDirectory removed and replaced by DirectIODirectory (LUCENE-8982)

Java 11 supports Direct IO from Java code without native wrappers.
`NativeUnixDirectory` in the misc module was therefore removed and replaced
by `DirectIODirectory`. To use it, you need a JVM and an operating system that
support Direct IO.

### BM25Similarity.setDiscountOverlaps and LegacyBM25Similarity.setDiscountOverlaps methods removed (LUCENE-9646)

The `discountOverlaps` parameter for both `BM25Similarity` and `LegacyBM25Similarity`
is now set by the constructor of those classes.

### Packages in misc module are renamed (LUCENE-9600)

These packages in the `lucene-misc` module are renamed:

|    Old Package Name      |       New Package Name        |
|--------------------------|-------------------------------|
|org.apache.lucene.document|org.apache.lucene.misc.document|
|org.apache.lucene.index   |org.apache.lucene.misc.index   |
|org.apache.lucene.search  |org.apache.lucene.misc.search  |
|org.apache.lucene.store   |org.apache.lucene.misc.store   |
|org.apache.lucene.util    |org.apache.lucene.misc.util    |

The following classes were moved to the `lucene-core` module:

- org.apache.lucene.document.InetAddressPoint
- org.apache.lucene.document.InetAddressRange

### Packages in sandbox module are renamed (LUCENE-9319)

These packages in the `lucene-sandbox` module are renamed:

|    Old Package Name      |       New Package Name           |
|--------------------------|----------------------------------|
|org.apache.lucene.codecs  |org.apache.lucene.sandbox.codecs  |
|org.apache.lucene.document|org.apache.lucene.sandbox.document|
|org.apache.lucene.search  |org.apache.lucene.sandbox.search  |

### Backward codecs are renamed (LUCENE-9318)

These packages in the `lucene-backwards-codecs` module are renamed:

|    Old Package Name    |       New Package Name          |
|------------------------|---------------------------------|
|org.apache.lucene.codecs|org.apache.lucene.backward_codecs|

### JapanesePartOfSpeechStopFilterFactory loads default stop tags if "tags" argument not specified (LUCENE-9567)

Previously, `JapanesePartOfSpeechStopFilterFactory` added no filter if `args` didn't include "tags". Now, it will load
the default stop tags returned by `JapaneseAnalyzer.getDefaultStopTags()` (i.e. the tags from `stoptags.txt` in the
`lucene-analysis-kuromoji` jar).

### ICUCollationKeyAnalyzer is renamed (LUCENE-9558)

These packages in the `lucene-analysis-icu` module are renamed:

|    Old Package Name       |       New Package Name       |
|---------------------------|------------------------------|
|org.apache.lucene.collation|org.apache.lucene.analysis.icu|

### Base and concrete analysis factories are moved / package renamed (LUCENE-9317)

Base analysis factories are moved to `lucene-core`, and their packages are renamed.

|                Old Class Name                    |               New Class Name                 |
|--------------------------------------------------|----------------------------------------------|
|org.apache.lucene.analysis.util.TokenizerFactory  |org.apache.lucene.analysis.TokenizerFactory   |
|org.apache.lucene.analysis.util.CharFilterFactory |org.apache.lucene.analysis.CharFilterFactory  |
|org.apache.lucene.analysis.util.TokenFilterFactory|org.apache.lucene.analysis.TokenFilterFactory |

The service provider files placed in `META-INF/services` for custom analysis factories should be renamed as follows:

- META-INF/services/org.apache.lucene.analysis.TokenizerFactory
- META-INF/services/org.apache.lucene.analysis.CharFilterFactory
- META-INF/services/org.apache.lucene.analysis.TokenFilterFactory

`StandardTokenizerFactory` is moved to the `lucene-core` module.

The `org.apache.lucene.analysis.standard` package in the `lucene-analysis-common` module
is split into `org.apache.lucene.analysis.classic` and `org.apache.lucene.analysis.email`.

### RegExpQuery now rejects invalid backslashes (LUCENE-9370)

We now follow the [Java rules](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#bs) for accepting backslashes.
Alphabetic characters other than s, S, w, W, d or D that are preceded by a backslash are considered illegal syntax and cause an exception to be thrown.

### RegExp certain regular expressions now match differently (LUCENE-9336)

The commonly used regular expressions `\w`, `\W`, `\d`, `\D`, `\s` and `\S` now work the same way [Java Pattern](https://docs.oracle.com/javase/tutorial/essential/regex/pre_char_classes.html#CHART) matching works. Previously these expressions were (mis)interpreted as searches for the literal characters w, d, s etc.
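
The semantics being adopted can be checked with the JDK's own `java.util.regex.Pattern`. The sketch below deliberately uses the JDK rather than Lucene's `RegExp` (whose overall syntax differs in other respects), so it only illustrates how the predefined character classes are meant to behave; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

/** Hypothetical illustration using java.util.regex, not Lucene's RegExp. */
class PredefinedClasses {

  /** Returns true if the whole input matches the regular expression. */
  static boolean matches(String regex, String input) {
    return Pattern.matches(regex, input);
  }

  public static void main(String[] args) {
    System.out.println(matches("\\d", "7"));       // true: \d is a digit class
    System.out.println(matches("\\d", "d"));       // false: no longer the literal character d
    System.out.println(matches("\\w+", "abc_1"));  // true: \w covers letters, digits and underscore
    System.out.println(matches("\\s", " "));       // true: \s is a whitespace class
  }
}
```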

### NGramFilterFactory "keepShortTerm" option was fixed to "preserveOriginal" (LUCENE-9259)

The factory option name to output the original term was corrected in accordance with its Javadoc.

### IndexMergeTool defaults changes (LUCENE-9206)

This command-line tool no longer forceMerges to a single segment. Instead, by
default it just follows the (configurable) merge policy. If you really want to merge
to a single segment, you can pass `-max-segments 1`.

### FST Builder is renamed FSTCompiler with fluent-style Builder (LUCENE-9089)

Simply use `FSTCompiler` instead of the previous `Builder`. Use either the simple constructor with default settings, or
the `FSTCompiler.Builder` to tune and tweak any parameter.

### Kuromoji user dictionary now forbids illegal segmentation (LUCENE-8933)

The user dictionary now strictly validates that the concatenated segments are the same as the surface form. This change avoids
unexpected runtime exceptions or behaviours.
For example, these entries are not allowed at all and an exception is thrown when loading the dictionary file.

```
# concatenated "日本経済新聞" does not match the surface form "日経新聞"
日経新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞

# concatenated "日経新聞" does not match the surface form "日本経済新聞"
日本経済新聞,日経 新聞,ニッケイ シンブン,カスタム名詞
```
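
The rule can be sketched in plain Java to pre-check dictionary entries before loading them. This is a simplified illustration and not Lucene's loader: it assumes plain comma-separated columns without quoting and single-space separators between segments, and the class and method names are hypothetical:

```java
/** Hypothetical pre-check, not part of Lucene. */
class UserDictCheck {

  /**
   * Returns true if the space-separated segmentation (second column)
   * concatenates to exactly the surface form (first column).
   */
  static boolean isValidEntry(String csvLine) {
    String[] columns = csvLine.split(",");
    if (columns.length < 2) {
      return false; // surface form and segmentation are both required
    }
    String surface = columns[0];
    String concatenated = columns[1].replace(" ", "");
    return surface.equals(concatenated);
  }

  public static void main(String[] args) {
    // Rejected: the segments concatenate to a different string than the surface form.
    System.out.println(isValidEntry("日経新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞")); // false
    // Accepted: the segments concatenate to the surface form.
    System.out.println(isValidEntry("日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞")); // true
  }
}
```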

### JapaneseTokenizer no longer emits original (compound) tokens by default when the mode is not NORMAL (LUCENE-9123)

`JapaneseTokenizer` and `JapaneseAnalyzer` no longer emit original tokens when the `discardCompoundToken` option is not specified.
The constructor option was introduced in Lucene 8.5.0, and its default value is now `true`.

When given the text "株式会社", `JapaneseTokenizer` (mode != NORMAL) emits only the decompounded tokens "株式" and "会社" and no
longer outputs the original token "株式会社" by default. To output original tokens, the `discardCompoundToken` option should be
explicitly set to `false`. Be aware that if this option is set to `false`, `SynonymFilter` or `SynonymGraphFilter` do not work
correctly (see LUCENE-9173).

### Analysis factories now have customizable symbolic names (LUCENE-8778) and need additional no-arg constructor (LUCENE-9281)

The SPI names for concrete subclasses of `TokenizerFactory`, `TokenFilterFactory`, and `CharFilterFactory` are no longer
derived from their class name. Instead, each factory must have a static "NAME" field like this:

```java
    /** o.a.l.a.standard.StandardTokenizerFactory's SPI name */
    public static final String NAME = "standard";
```

A factory can be resolved/instantiated with its `NAME` by using methods such as `TokenizerFactory.lookupClass(String)`
or `TokenizerFactory.forName(String, Map<String,String>)`.

If any user-defined factory class lacks a proper `NAME` field, an exception will be thrown
when (re)loading factories, for example when calling `TokenizerFactory.reloadTokenizers(ClassLoader)`.

In addition, all factories now need to implement a public no-arg constructor. The reason for this
change is that Lucene now uses `java.util.ServiceLoader` instead of its own implementation to
load the factory classes, to be compatible with Java Module System changes (e.g., load factories from modules).
In the future, extensions to Lucene developed on the Java Module System may expose the factories from their
`module-info.java` file instead of `META-INF/services`.

This constructor is never called by Lucene, so by default it throws an `UnsupportedOperationException`. User-defined
factory classes should implement it in the following way:

```java
    /** Default ctor for compatibility with SPI */
    public StandardTokenizerFactory() {
      throw defaultCtorException();
    }
```

(`defaultCtorException()` is a protected static helper method.)

### TermsEnum is now fully abstract (LUCENE-8292, LUCENE-8662)

`TermsEnum` has been changed to be fully abstract, so non-abstract subclasses must implement all its methods.
Non-performance-critical `TermsEnum`s can use `BaseTermsEnum` as a base class instead. The change was motivated
by several performance issues with `FilterTermsEnum` that caused significant slowdowns and massive memory consumption due
to not delegating all methods from `TermsEnum`.

### RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream removed (LUCENE-8474)

The RAM-based directory implementations have been removed.
`ByteBuffersDirectory` can be used as a RAM-resident replacement, although it
is discouraged in favor of the default `MMapDirectory`.

### Similarity.SimScorer.computeXXXFactor methods removed (LUCENE-8014)

`SpanQuery` and `PhraseQuery` now always calculate their slop factor as
`(1.0 / (1.0 + distance))`.  Payload factor calculation is performed by
`PayloadDecoder` in the `lucene-queries` module.
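
As a quick worked example of the formula above, the following JDK-only sketch evaluates it for a few distances. The class and method names are hypothetical and the code is not taken from Lucene:

```java
/** Hypothetical illustration of the fixed slop formula, not part of Lucene. */
class SlopFactor {

  /** Computes 1 / (1 + distance): exact matches weigh 1, sloppier matches weigh less. */
  static float slopFactor(int distance) {
    return 1.0f / (1.0f + distance);
  }

  public static void main(String[] args) {
    System.out.println(slopFactor(0)); // 1.0
    System.out.println(slopFactor(1)); // 0.5
    System.out.println(slopFactor(3)); // 0.25
  }
}
```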

### Scorer must produce positive scores (LUCENE-7996)

`Scorer`s are no longer allowed to produce negative scores. If you have custom
query implementations, you should make sure their score formula may never produce
negative scores.

As a side-effect of this change, negative boosts are now rejected and
`FunctionScoreQuery` maps negative values to 0.

### CustomScoreQuery, BoostedQuery and BoostingQuery removed (LUCENE-8099)

Instead use `FunctionScoreQuery` and a `DoubleValuesSource` implementation.  `BoostedQuery`
and `BoostingQuery` may be replaced by calls to `FunctionScoreQuery.boostByValue()` and
`FunctionScoreQuery.boostByQuery()`.  To replace more complex calculations in
`CustomScoreQuery`, use the `lucene-expressions` module:

```java
SimpleBindings bindings = new SimpleBindings();
bindings.add("score", DoubleValuesSource.SCORES);
bindings.add("boost1", DoubleValuesSource.fromIntField("myboostfield"));
bindings.add("boost2", DoubleValuesSource.fromIntField("myotherboostfield"));
Expression expr = JavascriptCompiler.compile("score * (boost1 + ln(boost2))");
FunctionScoreQuery q = new FunctionScoreQuery(inputQuery, expr.getDoubleValuesSource(bindings));
```

### IndexOptions can no longer be changed dynamically (LUCENE-8134)

Changing `IndexOptions` for a field on the fly will now result in an
`IllegalArgumentException`. If a field is indexed
(`FieldType.indexOptions() != IndexOptions.NONE`) then all documents must have
the same index options for that field.

### IndexSearcher.createNormalizedWeight() removed (LUCENE-8242)

Instead use `IndexSearcher.createWeight()`, rewriting the query first, and using
a boost of `1f`.
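
A minimal sketch of the replacement, assuming the `IndexSearcher.rewrite(Query)` and `IndexSearcher.createWeight(Query, ScoreMode, float)` signatures of recent Lucene versions. The helper class and method are hypothetical, and the score mode is left to the caller because the appropriate value depends on the use case:

```java
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Weight;

/** Hypothetical helper, not part of Lucene. */
class NormalizedWeight {

  /** Rewrites the query first, then creates its weight with a neutral boost of 1. */
  static Weight create(IndexSearcher searcher, Query query, ScoreMode scoreMode)
      throws IOException {
    Query rewritten = searcher.rewrite(query);
    return searcher.createWeight(rewritten, scoreMode, 1f);
  }
}
```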

### Memory codecs removed (LUCENE-8267)

Memory codecs (`MemoryPostingsFormat`, `MemoryDocValuesFormat`) have been removed from the codebase.

### Direct doc-value format removed (LUCENE-8917)

The `Direct` doc-value format has been removed from the codebase.

### QueryCachingPolicy.ALWAYS_CACHE removed (LUCENE-8144)

Caching everything is discouraged as it disables the ability to skip non-interesting documents.
`ALWAYS_CACHE` can be replaced by a `UsageTrackingQueryCachingPolicy` with an appropriate config.

### English stopwords are no longer removed by default in StandardAnalyzer (LUCENE-7444)

To retain the old behaviour, pass `EnglishAnalyzer.ENGLISH_STOP_WORDS_SET` as an argument
to the constructor.

### StandardAnalyzer.ENGLISH_STOP_WORDS_SET has been moved

English stop words are now defined in `EnglishAnalyzer.ENGLISH_STOP_WORDS_SET` in the
`analysis-common` module.

### TopDocs.maxScore removed

`TopDocs.maxScore` is removed. `IndexSearcher` and `TopFieldCollector` no longer have
an option to compute the maximum score when sorting by field. If you need to
know the maximum score for a query, the recommended approach is to run a
separate query:

```java
  TopDocs topHits = searcher.search(query, 1);
  float maxScore = topHits.scoreDocs.length == 0 ? Float.NaN : topHits.scoreDocs[0].score;
```

Thanks to other optimizations that were added to Lucene 8, this query will be
able to efficiently select the top-scoring document without having to visit
all matches.

### TopFieldCollector always assumes fillFields=true

Because filling sort values doesn't have a significant overhead, the `fillFields`
option has been removed from `TopFieldCollector` factory methods. Everything
behaves as if it was previously set to `true`.

### TopFieldCollector no longer takes a trackDocScores option

Computing scores at collection time is less efficient than running a second
request in order to only compute scores for documents that made it to the top
hits. As a consequence, the `trackDocScores` option has been removed and can be
replaced with the new `TopFieldCollector.populateScores()` helper method.

### IndexSearcher.search(After) may return lower bounds of the hit count and TopDocs.totalHits is no longer a long

Lucene 8 received optimizations for collection of top-k matches by not visiting
all matches. However these optimizations won't help if all matches still need
to be visited in order to compute the total number of hits. As a consequence,
`IndexSearcher`'s `search()` and `searchAfter()` methods were changed to only count hits
accurately up to 1,000, and `TopDocs.totalHits` was changed from a `long` to an
object that says whether the hit count is accurate or a lower bound of the
actual hit count.

### RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream are deprecated (LUCENE-8467, LUCENE-8438)

This RAM-based directory implementation is an old piece of code that uses inefficient
thread synchronization primitives and can be mistaken as "faster" than the NIO-based
`MMapDirectory`. It is deprecated and scheduled for removal in future versions of
Lucene.

### LeafCollector.setScorer() now takes a Scorable rather than a Scorer (LUCENE-6228)

`Scorer` has a number of methods that should never be called from `Collector`s, for example
those that advance the underlying iterators.  To hide these, `LeafCollector.setScorer()`
now takes a `Scorable`, an abstract class that scorers can extend, with methods
`docID()` and `score()`.

### Scorers must have non-null Weights

If a custom `Scorer` implementation does not have an associated `Weight`, it can probably
be replaced with a `Scorable` instead.

### Suggesters now return Long instead of long for weight() during indexing, and double instead of long at suggest time

Most code should just require recompilation, though possibly requiring some added casts.

### TokenStreamComponents is now final

Instead of overriding `TokenStreamComponents.setReader()` to customise analyzer
initialisation, you should now pass a `Consumer<Reader>` instance to the
`TokenStreamComponents` constructor.

### LowerCaseTokenizer and LowerCaseTokenizerFactory have been removed

`LowerCaseTokenizer` combined tokenization and filtering in a way that broke token
normalization, so both classes have been removed. Instead, use a `LetterTokenizer` followed by
a `LowerCaseFilter`.
439c7697b08STomoko Uchida
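The replacement chain might be wired up as follows. This is a sketch, not the only way to do it: the class name `LetterLowerCaseAnalyzer` is hypothetical, and overriding `normalize()` is an assumption about what most callers will want.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LetterTokenizer;

// Hypothetical analyzer: tokenization and lowercasing are separate steps.
class LetterLowerCaseAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer source = new LetterTokenizer();         // splits on non-letters
    TokenStream result = new LowerCaseFilter(source); // lowercases each token
    return new TokenStreamComponents(source, result);
  }

  @Override
  protected TokenStream normalize(String fieldName, TokenStream in) {
    // Apply the same lowercasing when normalizing query terms.
    return new LowerCaseFilter(in);
  }
}
```
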
### CharTokenizer no longer takes a normalizer function

`CharTokenizer` now only performs tokenization. To perform any type of filtering
use a `TokenFilter` chain as you would with any other `Tokenizer`.

### Highlighter and FastVectorHighlighter no longer support ToParent/ToChildBlockJoinQuery

Both `Highlighter` and `FastVectorHighlighter` need a custom `WeightedSpanTermExtractor` or `FieldQuery`, respectively,
in order to support `ToParentBlockJoinQuery`/`ToChildBlockJoinQuery`.

### MultiTermAwareComponent replaced by CharFilterFactory.normalize() and TokenFilterFactory.normalize()

Normalization is now type-safe, with `CharFilterFactory.normalize()` returning a `Reader` and
`TokenFilterFactory.normalize()` returning a `TokenFilter`.

### k1+1 constant factor removed from BM25 similarity numerator (LUCENE-8563)

Scores computed by the `BM25Similarity` are lower than previously as the `k1+1`
constant factor was removed from the numerator of the scoring formula.
Ordering of results is preserved unless scores are computed from multiple
fields using different similarities. The previous behaviour is now exposed
by the `LegacyBM25Similarity` class which can be found in the lucene-misc jar.

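To see why ordering is preserved within a single similarity, the plain-Java sketch below compares the textbook BM25 term-frequency factor with and without `k1+1`. It uses no Lucene classes and ignores details such as Lucene's norm encoding; the class name `Bm25Factor` and the sample values are made up for illustration.

```java
// Textbook BM25 term-frequency factor, with and without the (k1 + 1) numerator.
// idf is left out because it is the same in both variants.
class Bm25Factor {
  static double current(double tf, double k1, double b, double dl, double avgdl) {
    return tf / (tf + k1 * (1 - b + b * dl / avgdl));
  }

  static double legacy(double tf, double k1, double b, double dl, double avgdl) {
    return (k1 + 1) * current(tf, k1, b, dl, avgdl);
  }

  public static void main(String[] args) {
    double k1 = 1.2, b = 0.75, avgdl = 100;
    double[][] docs = {{1, 80}, {3, 120}, {5, 400}}; // {term frequency, doc length}
    for (double[] d : docs) {
      double cur = current(d[0], k1, b, d[1], avgdl);
      double old = legacy(d[0], k1, b, d[1], avgdl);
      // The ratio is always k1 + 1, so relative order does not change.
      System.out.printf("tf=%.0f dl=%.0f current=%.4f legacy=%.4f ratio=%.2f%n",
          d[0], d[1], cur, old, old / cur);
    }
  }
}
```

Because the ratio depends only on `k1`, combining fields whose similarities use different `k1` values (or different similarity classes) scales each field differently, which is where ordering can change.
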
### IndexWriter.maxDoc()/numDocs() removed in favor of IndexWriter.getDocStats()

Use `IndexWriter.getDocStats()`, which offers a consistent view of document stats, instead of
`maxDoc()` / `numDocs()`. Previously, calling two methods to get point-in-time stats was subject
to concurrent changes.

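A small sketch of the migration, assuming the caller wants the number of deleted documents. The helper class and method names are hypothetical.

```java
import org.apache.lucene.index.IndexWriter;

class DocStatsExample {
  // Both counters come from one snapshot, so they are consistent with
  // each other even if other threads keep indexing.
  static int deletedDocs(IndexWriter writer) {
    IndexWriter.DocStats stats = writer.getDocStats();
    return stats.maxDoc - stats.numDocs;
  }
}
```
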
### maxClausesCount moved from BooleanQuery To IndexSearcher (LUCENE-8811)

`IndexSearcher` now performs max clause count checks on all types of queries (including BooleanQueries).
Accordingly, the max clause count setting has moved from `BooleanQuery` to `IndexSearcher`.

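In code, the change amounts to calling the static setter on a different class. The wrapper class below is hypothetical and exists only to make the snippet compilable.

```java
import org.apache.lucene.search.IndexSearcher;

class MaxClauseCountExample {
  static void raiseLimit(int limit) {
    // Before: BooleanQuery.setMaxClauseCount(limit);
    // The setting is static, so it applies to all searchers in the JVM.
    IndexSearcher.setMaxClauseCount(limit);
  }
}
```
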
### TopDocs.merge shall no longer allow setting of shard indices

`TopDocs.merge()` no longer accepts a parameter indicating whether it should
set shard indices for hits as they are seen during the merge process. This simplifies the API
and makes it more flexible when passing in custom tie breakers.
If shard indices are to be used for tie-breaking docs with equal scores during `TopDocs.merge()`, then it is
mandatory that the input `ScoreDoc`s have their shard indices set to valid values prior to calling `merge()`.

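One way to satisfy this requirement is to stamp each hit with its shard number just before merging. The sketch assumes the position of each `TopDocs` in the array is the shard index you want; the helper names are hypothetical.

```java
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

class ShardMergeExample {
  static TopDocs mergeShards(int topN, TopDocs[] shardHits) {
    for (int shard = 0; shard < shardHits.length; shard++) {
      for (ScoreDoc hit : shardHits[shard].scoreDocs) {
        // merge() no longer sets this; the caller must.
        hit.shardIndex = shard;
      }
    }
    return TopDocs.merge(topN, shardHits);
  }
}
```
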
### TopDocsCollector Shall Throw IllegalArgumentException For Malformed Arguments

`TopDocsCollector` no longer returns an empty `TopDocs` for malformed arguments.
Instead, an `IllegalArgumentException` is thrown. This is introduced for better
defence and to ensure that errors do not bubble up when Lucene is
used in multi-level applications.

### Assumption of data consistency between different data-structures sharing the same field name

Sorting on a numeric field that is indexed with both doc values and points may use an
optimization to skip non-competitive documents. This optimization relies on the assumption
that the same data is stored in these points and doc values.

### Require consistency between data-structures on a per-field basis

The per-field data structures are implicitly defined by the first document
indexed that contains a certain field. Once defined, the per-field
data structures cannot be changed for the whole index. For example, if you
first index a document where a certain field is indexed with doc values and
points, all subsequent documents containing this field must also have this
field indexed with only doc values and points.

This also means that an index created in the previous version that doesn't
satisfy this requirement cannot be updated.

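One simple way to keep a field's schema consistent is to build every document through a single helper, so the field is always indexed with the same data structures and the same value. The field name `price` and the helper class are hypothetical.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.NumericDocValuesField;

class PriceDocs {
  // Every document gets "price" as points AND doc values, with the same value.
  static Document withPrice(int price) {
    Document doc = new Document();
    doc.add(new IntPoint("price", price));              // points
    doc.add(new NumericDocValuesField("price", price)); // doc values
    return doc;
  }
}
```
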
### Doc values updates are allowed only for doc values only fields

Previously, `IndexWriter` could update doc values for a binary or numeric doc values
field that was also indexed with other data structures (e.g. postings, vectors,
etc.). This is no longer allowed. A field must be indexed with only doc values
to be eligible for doc values updates in `IndexWriter`.

### SortedDocValues no longer extends BinaryDocValues (LUCENE-9796)

`SortedDocValues` no longer extends `BinaryDocValues`: `SortedDocValues` do not have a per-document
binary value, they have a per-document numeric `ordValue()`. The ordinal can then be dereferenced
to its binary form with `lookupOrd()`, but it was a performance trap to implement a `binaryValue()`
on the `SortedDocValues` API that does this behind the scenes on every document.

You can replace calls of `binaryValue()` with `lookupOrd(ordValue())` as a "quick fix", but it is
better to use the ordinal alone (integer-based data structures) for per-document access, and only
call `lookupOrd()` a few times at the end (e.g. for the hits you want to display). Otherwise, if you
really don't want per-document ordinals, but instead a per-document `byte[]`, use a `BinaryDocValues`
field.

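The recommended pattern could look roughly like this for a single segment: count by ordinal while iterating, then dereference only the winning ordinal. The helper class and method are hypothetical, and note that ordinals are only comparable within one segment.

```java
import java.io.IOException;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

class TopOrdinalExample {
  // Returns the most frequent value of a sorted doc values field in one
  // segment, or null if no document has a value.
  static BytesRef mostFrequent(LeafReader reader, String field) throws IOException {
    SortedDocValues values = DocValues.getSorted(reader, field);
    int[] counts = new int[values.getValueCount()];
    while (values.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
      counts[values.ordValue()]++; // per-document work stays integer-based
    }
    int best = -1;
    for (int ord = 0; ord < counts.length; ord++) {
      if (counts[ord] > 0 && (best == -1 || counts[ord] > counts[best])) {
        best = ord;
      }
    }
    // Dereference a single ordinal at the very end.
    return best == -1 ? null : BytesRef.deepCopyOf(values.lookupOrd(best));
  }
}
```
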
### Removed CodecReader.ramBytesUsed() (LUCENE-9387)

Lucene index readers are now using so little memory with the default codec that
it was decided to remove the ability to estimate their RAM usage.

### LongValueFacetCounts no longer accepts multiValued param in constructors (LUCENE-9948)

`LongValueFacetCounts` now automatically detects whether an indexed field is single- or
multi-valued. Users no longer need to provide this information to the constructors. Migrating
should be as simple as no longer providing this boolean.

### SpanQuery and subclasses have moved from core/ to the queries module

They can now be found in the `org.apache.lucene.queries.spans` package.

### SpanBoostQuery has been removed (LUCENE-8143)

`SpanBoostQuery` was a no-op unless used at the top level of a `SpanQuery` nested
structure. Use a standard `BoostQuery` here instead.

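For instance, a boosted span term query might be rewritten as below. The helper class and its parameters are hypothetical; the span classes are imported from their new package in the queries module.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.spans.SpanTermQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.Query;

class BoostedSpanExample {
  static Query boostedTerm(String field, String text, float boost) {
    // Before: new SpanBoostQuery(new SpanTermQuery(new Term(field, text)), boost)
    return new BoostQuery(new SpanTermQuery(new Term(field, text)), boost);
  }
}
```
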
### Sort is immutable (LUCENE-9325)

Rather than using `setSort()` to change sort values, you should instead create
a new `Sort` instance with the new values.

### Taxonomy-based faceting uses more modern encodings (LUCENE-9450, LUCENE-10062, LUCENE-10122)

The side-car taxonomy index now uses doc values for ord-to-path lookup (LUCENE-9450) and parent
lookup (LUCENE-10122) instead of stored fields and positions (respectively). Document ordinals
are now encoded with `SortedNumericDocValues` instead of using a custom (v-int) binary format.
Performance gains have been observed with these encoding changes. These changes were introduced
in 9.0, and 9.x releases remain backwards-compatible with 8.x indexes, but starting with 10.0,
only the newer formats are supported. Users will need to create a new index with all their
documents using 9.0 or later to pick up the new format and remain compatible with 10.x releases.
Just re-adding documents to an existing index is not enough to pick up the changes as the
format will "stick" to whatever version was used to initially create the index.

Additionally, `OrdinalsReader` (and sub-classes) are fully removed starting with 10.0. These
classes were `@Deprecated` starting with 9.0. Users are encouraged to rely on the default
taxonomy facet encodings where possible. If custom formats are needed, users will need
to manage the indexed data on their own and create new `Facet` implementations to use it.