<!--
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements.  See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License.  You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
 -->

# Apache Lucene Migration Guide

## Migration from Lucene 9.x to Lucene 10.0

### PersianStemFilter is added to PersianAnalyzer (LUCENE-10312)

`PersianAnalyzer` now includes `PersianStemFilter`, which changes analysis results. If you need exactly the same analysis
behaviour as 9.x, copy the 9.x `PersianAnalyzer` or build a custom analyzer with `CustomAnalyzer` on your own (see the sketch below).
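
A custom analyzer can be assembled from SPI names with `CustomAnalyzer`. This is only a minimal sketch: the tokenizer and
filter names below are placeholders, and the actual 9.x Persian chain has to be copied from the 9.x `PersianAnalyzer` source.

```java
// Minimal sketch: list the 9.x Persian tokenizer/filters here by their SPI names.
// "standard" and "lowercase" are placeholders, not the real Persian chain.
Analyzer analyzer = CustomAnalyzer.builder()
    .withTokenizer("standard")
    .addTokenFilter("lowercase")
    .build();
```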

### AutomatonQuery/CompiledAutomaton/RunAutomaton/RegExp no longer determinize (LUCENE-10010)

These classes no longer take a `determinizeWorkLimit` and no longer determinize
behind the scenes. It is the responsibility of the caller to call
`Operations.determinize()` for DFA execution.
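
For example, a sketch of determinizing explicitly before wrapping the automaton in a run automaton (assuming the default
work limit constant `Operations.DEFAULT_DETERMINIZE_WORK_LIMIT`):

```java
Automaton a = new RegExp("foo.*bar").toAutomaton();
// callers must determinize explicitly now; previously this happened behind the scenes
Automaton dfa = Operations.determinize(a, Operations.DEFAULT_DETERMINIZE_WORK_LIMIT);
CharacterRunAutomaton run = new CharacterRunAutomaton(dfa);
```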

### DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery removed in favor of FieldExistsQuery (LUCENE-10436)

These classes have been removed and consolidated into `FieldExistsQuery`. To migrate, simply replace those classes
with the new one wherever they are instantiated.
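
For example, a former `DocValuesFieldExistsQuery` (the same applies to the norms and kNN variants) simply becomes:

```java
// before: Query q = new DocValuesFieldExistsQuery("field");
Query q = new FieldExistsQuery("field");
```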

### Normalizer and stemmer classes are now package private (LUCENE-10561)

With only a few exceptions, normalizer and stemmer classes are now package private. If your code depends on
constants defined in them, copy the constant values and re-define them in your code.

## Migration from Lucene 9.0 to Lucene 9.1

### Test framework package migration and module (LUCENE-10301)

The test framework is now a Java module. All the classes have been moved from
`org.apache.lucene.*` to `org.apache.lucene.tests.*` to avoid package name conflicts
with the core module. If you were using the Lucene test framework, the migration should be
fairly automatic (package prefix).

### Minor syntactical changes in StandardQueryParser (LUCENE-10223)

Added interval functions and min-should-match support to `StandardQueryParser`. This
means that interval function prefixes (`fn:`) and the `@` character after parentheses will
parse differently than before. If you need the exact previous behavior, clone the
`StandardSyntaxParser` from the previous version of Lucene and create a custom query parser
with that parser.

### Lucene Core now depends on java.logging (JUL) module (LUCENE-10342)

Lucene Core now logs certain warnings and errors using Java Util Logging (JUL).
It is therefore recommended to install wrapper libraries with JUL logging handlers to
feed the log events into your app's own logging system.

Under normal circumstances Lucene won't log anything, but in the case of a problem
users should find the logged information in the usual log files.

Lucene also provides a `JavaLoggingInfoStream` implementation that logs `IndexWriter`
events using JUL.

To feed Lucene's log events into the well-known Log4j system, see the
[Log4j JDK Logging Adapter](https://logging.apache.org/log4j/2.x/log4j-jul/index.html)
used in combination with the corresponding system property:
`java.util.logging.manager=org.apache.logging.log4j.jul.LogManager`.

### Kuromoji and Nori analysis component constructors for custom dictionaries

The Kuromoji and Nori analysis modules offered ways to customize the backing dictionaries
by passing a path to file or classpath resources through inconsistently implemented APIs.
This was buggy from the beginning, but some users made use of it. With the move to the
Java module system, the resource lookup on the classpath in particular stopped working correctly.
The Lucene team therefore implemented new APIs to create dictionary implementations
with custom data files. Unfortunately there were some shortcomings in the 9.1 version,
also when using the now-deprecated constructors, so users are advised to upgrade to
Lucene 9.2 or stay with 9.0.

See LUCENE-10558 for more details and workarounds.

## Migration from Lucene 8.x to Lucene 9.0

### Rename of binary artifacts from '**-analyzers-**' to '**-analysis-**' (LUCENE-9562)

All binary analysis packages (and corresponding Maven artifacts) have been renamed and are
now consistent with repository module `analysis`. You will need to adjust build dependencies
to the new coordinates:

|         Old Artifact Coordinates            |        New Artifact Coordinates            |
|---------------------------------------------|--------------------------------------------|
|org.apache.lucene:lucene-analyzers-common    |org.apache.lucene:lucene-analysis-common    |
|org.apache.lucene:lucene-analyzers-icu       |org.apache.lucene:lucene-analysis-icu       |
|org.apache.lucene:lucene-analyzers-kuromoji  |org.apache.lucene:lucene-analysis-kuromoji  |
|org.apache.lucene:lucene-analyzers-morfologik|org.apache.lucene:lucene-analysis-morfologik|
|org.apache.lucene:lucene-analyzers-nori      |org.apache.lucene:lucene-analysis-nori      |
|org.apache.lucene:lucene-analyzers-opennlp   |org.apache.lucene:lucene-analysis-opennlp   |
|org.apache.lucene:lucene-analyzers-phonetic  |org.apache.lucene:lucene-analysis-phonetic  |
|org.apache.lucene:lucene-analyzers-smartcn   |org.apache.lucene:lucene-analysis-smartcn   |
|org.apache.lucene:lucene-analyzers-stempel   |org.apache.lucene:lucene-analysis-stempel   |


### LucenePackage class removed (LUCENE-10260)

`LucenePackage` class has been removed. The implementation string can be
retrieved from `Version.getPackageImplementationVersion()`.

### Directory API is now little-endian (LUCENE-9047)

`DataOutput`'s `writeShort()`, `writeInt()`, and `writeLong()` methods now encode with
little-endian byte order. If you have custom subclasses of `DataInput`/`DataOutput`, you
will need to adjust them from big-endian byte order to little-endian byte order.

### NativeUnixDirectory removed and replaced by DirectIODirectory (LUCENE-8982)

Java 11 allows using Direct IO from Java code without native wrappers.
`NativeUnixDirectory` in the misc module was therefore removed and replaced
by `DirectIODirectory`. To use it, you need a JVM and operating system that
support Direct IO.
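
A minimal sketch of the replacement (`DirectIODirectory` lives in the misc module, `org.apache.lucene.misc.store`,
and wraps an `FSDirectory` delegate; the index path is a placeholder):

```java
Path path = Paths.get("/path/to/index");
// DirectIODirectory delegates to a regular FSDirectory and uses Direct IO where it helps
Directory dir = new DirectIODirectory(FSDirectory.open(path));
```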

### BM25Similarity.setDiscountOverlaps and LegacyBM25Similarity.setDiscountOverlaps methods removed (LUCENE-9646)

The `discountOverlaps` parameter of both `BM25Similarity` and `LegacyBM25Similarity`
is now set via the constructors of those classes.
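
For example, instead of calling the removed setter, pass the flag at construction time (the `1.2f`/`0.75f` values
below are just the usual k1/b defaults):

```java
// before: BM25Similarity sim = new BM25Similarity(); sim.setDiscountOverlaps(false);
BM25Similarity sim = new BM25Similarity(1.2f, 0.75f, /* discountOverlaps */ false);
```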

### Packages in misc module are renamed (LUCENE-9600)

These packages in the `lucene-misc` module are renamed:

|    Old Package Name      |       New Package Name        |
|--------------------------|-------------------------------|
|org.apache.lucene.document|org.apache.lucene.misc.document|
|org.apache.lucene.index   |org.apache.lucene.misc.index   |
|org.apache.lucene.search  |org.apache.lucene.misc.search  |
|org.apache.lucene.store   |org.apache.lucene.misc.store   |
|org.apache.lucene.util    |org.apache.lucene.misc.util    |

The following classes were moved to the `lucene-core` module:

- org.apache.lucene.document.InetAddressPoint
- org.apache.lucene.document.InetAddressRange

### Packages in sandbox module are renamed (LUCENE-9319)

These packages in the `lucene-sandbox` module are renamed:

|    Old Package Name      |       New Package Name           |
|--------------------------|----------------------------------|
|org.apache.lucene.codecs  |org.apache.lucene.sandbox.codecs  |
|org.apache.lucene.document|org.apache.lucene.sandbox.document|
|org.apache.lucene.search  |org.apache.lucene.sandbox.search  |

### Backward codecs are renamed (LUCENE-9318)

These packages in the `lucene-backward-codecs` module are renamed:

|    Old Package Name    |       New Package Name          |
|------------------------|---------------------------------|
|org.apache.lucene.codecs|org.apache.lucene.backward_codecs|

### JapanesePartOfSpeechStopFilterFactory loads default stop tags if "tags" argument not specified (LUCENE-9567)

Previously, `JapanesePartOfSpeechStopFilterFactory` added no filter if `args` didn't include "tags". Now, it will load
the default stop tags returned by `JapaneseAnalyzer.getDefaultStopTags()` (i.e. the tags from `stoptags.txt` in the
`lucene-analyzers-kuromoji` jar).

### ICUCollationKeyAnalyzer is renamed (LUCENE-9558)

These packages in the `lucene-analysis-icu` module are renamed:

|    Old Package Name       |       New Package Name       |
|---------------------------|------------------------------|
|org.apache.lucene.collation|org.apache.lucene.analysis.icu|

### Base and concrete analysis factories are moved / package renamed (LUCENE-9317)

Base analysis factories are moved to `lucene-core`, and their package names are renamed.

|                Old Class Name                    |               New Class Name                 |
|--------------------------------------------------|----------------------------------------------|
|org.apache.lucene.analysis.util.TokenizerFactory  |org.apache.lucene.analysis.TokenizerFactory   |
|org.apache.lucene.analysis.util.CharFilterFactory |org.apache.lucene.analysis.CharFilterFactory  |
|org.apache.lucene.analysis.util.TokenFilterFactory|org.apache.lucene.analysis.TokenFilterFactory |

The service provider files placed in `META-INF/services` for custom analysis factories should be renamed as follows:

- META-INF/services/org.apache.lucene.analysis.TokenizerFactory
- META-INF/services/org.apache.lucene.analysis.CharFilterFactory
- META-INF/services/org.apache.lucene.analysis.TokenFilterFactory

`StandardTokenizerFactory` is moved to the `lucene-core` module.

The `org.apache.lucene.analysis.standard` package in the `lucene-analysis-common` module
is split into `org.apache.lucene.analysis.classic` and `org.apache.lucene.analysis.email`.

### RegExpQuery now rejects invalid backslashes (LUCENE-9370)

We now follow the [Java rules](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#bs) for accepting backslashes.
Alphabetic characters other than s, S, w, W, d or D that are preceded by a backslash are considered illegal syntax and will throw an exception.

### RegExp: certain regular expressions now match differently (LUCENE-9336)

The commonly used regular expressions \w \W \d \D \s and \S now work the same way [Java Pattern](https://docs.oracle.com/javase/tutorial/essential/regex/pre_char_classes.html#CHART) matching works. Previously these expressions were (mis)interpreted as searches for the literal characters w, d, s etc.

### NGramFilterFactory "keepShortTerm" option was fixed to "preserveOriginal" (LUCENE-9259)

The factory option name to output the original term was corrected in accordance with its Javadoc.

### IndexMergeTool defaults changes (LUCENE-9206)

This command-line tool no longer force-merges to a single segment. Instead, by
default it just follows the (configurable) merge policy. If you really want to merge
to a single segment, you can pass `-max-segments 1`.

### FST Builder is renamed FSTCompiler with fluent-style Builder (LUCENE-9089)

Simply use `FSTCompiler` instead of the previous `Builder`. Use either the simple constructor with default settings, or
the `FSTCompiler.Builder` to tune and tweak any parameter.
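
A hedged sketch, assuming `PositiveIntOutputs` as the output type and that inputs are added in sorted order; tuning
parameters would go through `FSTCompiler.Builder` instead of the simple constructor:

```java
PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
// simple constructor with default settings:
FSTCompiler<Long> compiler = new FSTCompiler<>(FST.INPUT_TYPE.BYTE1, outputs);
// or: new FSTCompiler.Builder<>(FST.INPUT_TYPE.BYTE1, outputs). ... .build();
IntsRefBuilder scratch = new IntsRefBuilder();
compiler.add(Util.toIntsRef(new BytesRef("cat"), scratch), 5L);  // inputs must be added in order
FST<Long> fst = compiler.compile();
```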

### Kuromoji user dictionary now forbids illegal segmentation (LUCENE-8933)

The user dictionary now strictly validates that the (concatenated) segments are the same as the surface form. This change avoids
unexpected runtime exceptions or behaviours.
For example, the following entries are not allowed at all and an exception is thrown when loading the dictionary file.

```
# concatenated "日本経済新聞" does not match the surface form "日経新聞"
日経新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞

# concatenated "日経新聞" does not match the surface form "日本経済新聞"
日本経済新聞,日経 新聞,ニッケイ シンブン,カスタム名詞
```

### JapaneseTokenizer no longer emits original (compound) tokens by default when the mode is not NORMAL (LUCENE-9123)

`JapaneseTokenizer` and `JapaneseAnalyzer` no longer emit original tokens when the `discardCompoundToken` option is not specified.
The constructor option was introduced in Lucene 8.5.0, and its default value has changed to `true`.

When given the text "株式会社", `JapaneseTokenizer` (mode != NORMAL) emits the decompounded tokens "株式" and "会社" only and no
longer outputs the original token "株式会社" by default. To output original tokens, the `discardCompoundToken` option should be
explicitly set to `false`. Be aware that if this option is set to `false`, `SynonymFilter` or `SynonymGraphFilter` will not work
correctly (see LUCENE-9173).
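
A hedged sketch of re-enabling the old behaviour through the constructor option (assuming the constructor that takes a
user dictionary, `discardPunctuation`, `discardCompoundToken` and the mode):

```java
// null user dictionary, discardPunctuation=true, discardCompoundToken=false to keep
// emitting the original compound token "株式会社" alongside "株式" and "会社"
Tokenizer tokenizer =
    new JapaneseTokenizer(null, true, /* discardCompoundToken */ false, JapaneseTokenizer.Mode.SEARCH);
```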

### Analysis factories now have customizable symbolic names (LUCENE-8778) and need additional no-arg constructor (LUCENE-9281)

The SPI names for concrete subclasses of `TokenizerFactory`, `TokenFilterFactory`, and `CharFilterFactory` are no longer
derived from their class name. Instead, each factory must have a static "NAME" field like this:

```java
    /** o.a.l.a.standard.StandardTokenizerFactory's SPI name */
    public static final String NAME = "standard";
```

A factory can be resolved/instantiated with its `NAME` by using methods such as `TokenizerFactory.lookupClass(String)`
or `TokenizerFactory.forName(String, Map<String,String>)`.
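
For example, to resolve a factory by its SPI name (the args map must be mutable because factories consume their entries):

```java
Map<String, String> args = new HashMap<>();  // must be mutable
TokenizerFactory factory = TokenizerFactory.forName("standard", args);
Tokenizer tokenizer = factory.create();
```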

If there are any user-defined factory classes that don't have a proper `NAME` field, an exception will be thrown
when (re)loading factories, e.g. when calling `TokenizerFactory.reloadTokenizers(ClassLoader)`.

In addition, all factories now need to implement a public no-arg constructor. The reason for this
change is that Lucene now uses `java.util.ServiceLoader` instead of its own implementation to
load the factory classes, to be compatible with Java Module System changes (e.g., loading factories from modules).
In the future, extensions to Lucene developed on the Java Module System may expose the factories from their
`module-info.java` file instead of `META-INF/services`.

This constructor is never called by Lucene, so by default it throws an `UnsupportedOperationException`. User-defined
factory classes should implement it in the following way:

```java
    /** Default ctor for compatibility with SPI */
    public StandardTokenizerFactory() {
      throw defaultCtorException();
    }
```

(`defaultCtorException()` is a protected static helper method.)

### TermsEnum is now fully abstract (LUCENE-8292, LUCENE-8662)

`TermsEnum` has been changed to be fully abstract, so non-abstract subclasses must implement all its methods.
Non-performance-critical `TermsEnum`s can use `BaseTermsEnum` as a base class instead. The change was motivated
by several performance issues with `FilterTermsEnum` that caused significant slowdowns and massive memory consumption due
to not delegating all methods from `TermsEnum`.

### RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream removed (LUCENE-8474)

The RAM-based directory implementations have been removed.
`ByteBuffersDirectory` can be used as a RAM-resident replacement, although it
is discouraged in favor of the default `MMapDirectory`.
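
For example (the index path below is only a placeholder):

```java
// RAM-resident replacement for RAMDirectory:
Directory ramDir = new ByteBuffersDirectory();
// usually preferred instead:
Directory fsDir = FSDirectory.open(Paths.get("/path/to/index"));
```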

### Similarity.SimScorer.computeXXXFactor methods removed (LUCENE-8014)

`SpanQuery` and `PhraseQuery` now always calculate their slops as
`(1.0 / (1.0 + distance))`.  Payload factor calculation is performed by
`PayloadDecoder` in the `lucene-queries` module.

### Scorer must produce positive scores (LUCENE-7996)

`Scorer`s are no longer allowed to produce negative scores. If you have custom
query implementations, you should make sure their score formula can never produce
negative scores.

As a side-effect of this change, negative boosts are now rejected and
`FunctionScoreQuery` maps negative values to 0.

### CustomScoreQuery, BoostedQuery and BoostingQuery removed (LUCENE-8099)

Instead use `FunctionScoreQuery` and a `DoubleValuesSource` implementation.  `BoostedQuery`
and `BoostingQuery` may be replaced by calls to `FunctionScoreQuery.boostByValue()` and
`FunctionScoreQuery.boostByQuery()`.  To replace more complex calculations in
`CustomScoreQuery`, use the `lucene-expressions` module:

```java
SimpleBindings bindings = new SimpleBindings();
bindings.add("score", DoubleValuesSource.SCORES);
bindings.add("boost1", DoubleValuesSource.fromIntField("myboostfield"));
bindings.add("boost2", DoubleValuesSource.fromIntField("myotherboostfield"));
Expression expr = JavascriptCompiler.compile("score * (boost1 + ln(boost2))");
FunctionScoreQuery q = new FunctionScoreQuery(inputQuery, expr.getDoubleValuesSource(bindings));
```

### IndexOptions can no longer be changed dynamically (LUCENE-8134)

Changing `IndexOptions` for a field on the fly will now result in an
`IllegalArgumentException`. If a field is indexed
(`FieldType.indexOptions() != IndexOptions.NONE`) then all documents must have
the same index options for that field.


### IndexSearcher.createNormalizedWeight() removed (LUCENE-8242)

Instead use `IndexSearcher.createWeight()`, rewriting the query first, and using
a boost of `1f`.
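
A sketch of the replacement:

```java
// replaces the removed createNormalizedWeight(query, needsScores)
Query rewritten = searcher.rewrite(query);
Weight weight = searcher.createWeight(rewritten, ScoreMode.COMPLETE, 1f);
```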

### Memory codecs removed (LUCENE-8267)

Memory codecs (`MemoryPostingsFormat`, `MemoryDocValuesFormat`) have been removed from the codebase.

### Direct doc-value format removed (LUCENE-8917)

The `Direct` doc-value format has been removed from the codebase.

### QueryCachingPolicy.ALWAYS_CACHE removed (LUCENE-8144)

Caching everything is discouraged as it disables the ability to skip non-interesting documents.
`ALWAYS_CACHE` can be replaced by a `UsageTrackingQueryCachingPolicy` with an appropriate config.
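
For example:

```java
// caches only queries that are costly and used often enough (default configuration)
searcher.setQueryCachingPolicy(new UsageTrackingQueryCachingPolicy());
```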

### English stopwords are no longer removed by default in StandardAnalyzer (LUCENE-7444)

To retain the old behaviour, pass `EnglishAnalyzer.ENGLISH_STOP_WORDS_SET` as an argument
to the constructor.
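
For example:

```java
// 8.x-like behaviour: StandardAnalyzer with the English stop word set
Analyzer analyzer = new StandardAnalyzer(EnglishAnalyzer.ENGLISH_STOP_WORDS_SET);
```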

### StandardAnalyzer.ENGLISH_STOP_WORDS_SET has been moved

English stop words are now defined in `EnglishAnalyzer.ENGLISH_STOP_WORDS_SET` in the
`analysis-common` module.

### TopDocs.maxScore removed

`TopDocs.maxScore` is removed. `IndexSearcher` and `TopFieldCollector` no longer have
an option to compute the maximum score when sorting by field. If you need to
know the maximum score for a query, the recommended approach is to run a
separate query:

```java
  TopDocs topHits = searcher.search(query, 1);
  float maxScore = topHits.scoreDocs.length == 0 ? Float.NaN : topHits.scoreDocs[0].score;
```

Thanks to other optimizations that were added to Lucene 8, this query will be
able to efficiently select the top-scoring document without having to visit
all matches.

### TopFieldCollector always assumes fillFields=true

Because filling sort values doesn't have a significant overhead, the `fillFields`
option has been removed from `TopFieldCollector` factory methods. Everything
behaves as if it was previously set to `true`.

### TopFieldCollector no longer takes a trackDocScores option

Computing scores at collection time is less efficient than running a second
request in order to only compute scores for documents that made it to the top
hits. As a consequence, the `trackDocScores` option has been removed and can be
replaced with the new `TopFieldCollector.populateScores()` helper method.
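
A sketch of the two-step replacement:

```java
TopFieldDocs hits = searcher.search(query, 10, sort);   // no scores computed here
// fill in scores only for the documents that made it to the top hits:
TopFieldCollector.populateScores(hits.scoreDocs, searcher, query);
```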

### IndexSearcher.search(After) may return lower bounds of the hit count and TopDocs.totalHits is no longer a long

Lucene 8 received optimizations for collection of top-k matches by not visiting
all matches. However, these optimizations won't help if all matches still need
to be visited in order to compute the total number of hits. As a consequence,
`IndexSearcher`'s `search()` and `searchAfter()` methods were changed to only count hits
accurately up to 1,000, and `TopDocs.totalHits` was changed from a `long` to an
object that says whether the hit count is accurate or a lower bound of the
actual hit count.
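
For example, the hit count now needs to be interpreted together with its relation:

```java
TopDocs topDocs = searcher.search(query, 10);
TotalHits totalHits = topDocs.totalHits;
if (totalHits.relation == TotalHits.Relation.EQUAL_TO) {
  long exactCount = totalHits.value;      // accurate hit count
} else {
  long lowerBound = totalHits.value;      // at least this many hits (GREATER_THAN_OR_EQUAL_TO)
}
```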

### RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream are deprecated (LUCENE-8467, LUCENE-8438)

This RAM-based directory implementation is an old piece of code that uses inefficient
thread synchronization primitives and can be confused as "faster" than the NIO-based
`MMapDirectory`. It is deprecated and scheduled for removal in future versions of
Lucene.

### LeafCollector.setScorer() now takes a Scorable rather than a Scorer (LUCENE-6228)

`Scorer` has a number of methods that should never be called from `Collector`s, for example
those that advance the underlying iterators.  To hide these, `LeafCollector.setScorer()`
now takes a `Scorable`, an abstract class that scorers can extend, with methods
`docID()` and `score()`.
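
A minimal sketch of a collector using the new signature:

```java
public class ScoreSummingCollector extends SimpleCollector {
  private Scorable scorer;
  public double total;

  @Override
  public void setScorer(Scorable scorer) {
    this.scorer = scorer;          // a Scorable only exposes score() and docID()
  }

  @Override
  public void collect(int doc) throws IOException {
    total += scorer.score();
  }

  @Override
  public ScoreMode scoreMode() {
    return ScoreMode.COMPLETE;     // scores are needed
  }
}
```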

### Scorers must have non-null Weights

If a custom `Scorer` implementation does not have an associated `Weight`, it can probably
be replaced with a `Scorable` instead.

### Suggesters now return Long instead of long for weight() during indexing, and double instead of long at suggest time

Most code should just require recompilation, though possibly requiring some added casts.

### TokenStreamComponents is now final

Instead of overriding `TokenStreamComponents.setReader()` to customise analyzer
initialisation, you should now pass a `Consumer<Reader>` instance to the
`TokenStreamComponents` constructor.
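
A hedged sketch of the new pattern inside `Analyzer.createComponents()`:

```java
Tokenizer source = new StandardTokenizer();
TokenStream sink = new LowerCaseFilter(source);
// instead of overriding setReader(), pass a Consumer<Reader> that attaches (and may wrap) the reader
return new TokenStreamComponents(r -> source.setReader(r), sink);
```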

### LowerCaseTokenizer and LowerCaseTokenizerFactory have been removed

`LowerCaseTokenizer` combined tokenization and filtering in a way that broke token
normalization, so they have been removed. Instead, use a `LetterTokenizer` followed by
a `LowerCaseFilter`.
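
For example, a minimal replacement analyzer:

```java
Analyzer analyzer = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer tokenizer = new LetterTokenizer();          // tokenization only
    TokenStream stream = new LowerCaseFilter(tokenizer);  // lowercasing as a separate filter
    return new TokenStreamComponents(tokenizer, stream);
  }
};
```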

### CharTokenizer no longer takes a normalizer function

`CharTokenizer` now only performs tokenization. To perform any type of filtering
use a `TokenFilter` chain as you would with any other `Tokenizer`.

### Highlighter and FastVectorHighlighter no longer support ToParent/ToChildBlockJoinQuery

Both `Highlighter` and `FastVectorHighlighter` need a custom `WeightedSpanTermExtractor` or `FieldQuery`, respectively,
in order to support `ToParentBlockJoinQuery`/`ToChildBlockJoinQuery`.

### MultiTermAwareComponent replaced by CharFilterFactory.normalize() and TokenFilterFactory.normalize()

Normalization is now type-safe, with `CharFilterFactory.normalize()` returning a `Reader` and
`TokenFilterFactory.normalize()` returning a `TokenFilter`.

### k1+1 constant factor removed from BM25 similarity numerator (LUCENE-8563)

Scores computed by `BM25Similarity` are lower than before because the `k1+1`
constant factor was removed from the numerator of the scoring formula.
Ordering of results is preserved unless scores are computed from multiple
fields using different similarities. The previous behaviour is now exposed
by the `LegacyBM25Similarity` class, which can be found in the `lucene-misc` jar.

### IndexWriter.maxDoc()/numDocs() removed in favor of IndexWriter.getDocStats()

`IndexWriter.getDocStats()` should be used instead of `maxDoc()` / `numDocs()`; it offers a consistent
view of document stats. Previously, calling the two methods separately to get point-in-time stats was
subject to concurrent changes between the calls.
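
For example:

```java
// one consistent point-in-time snapshot of both counters:
IndexWriter.DocStats stats = writer.getDocStats();
int maxDoc = stats.maxDoc;
int numDocs = stats.numDocs;
```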

### maxClausesCount moved from BooleanQuery to IndexSearcher (LUCENE-8811)

`IndexSearcher` now performs max clause count checks on all types of queries (including `BooleanQuery`).
This led to a logical move of the max clause count from `BooleanQuery` to `IndexSearcher`.
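
The limit is now configured on `IndexSearcher`, for example:

```java
// before: BooleanQuery.setMaxClauseCount(2048);
IndexSearcher.setMaxClauseCount(2048);
```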

### TopDocs.merge shall no longer allow setting of shard indices

`TopDocs.merge()`'s API has been changed to stop allowing a parameter that indicates whether it should
set shard indices for hits as they are seen during the merge process. This is done to simplify the API
and make it more dynamic in terms of passing in custom tie breakers.
If shard indices are to be used for tie-breaking docs with equal scores during `TopDocs.merge()`, then it is
mandatory that the input `ScoreDocs` have their shard indices set to valid values prior to calling `merge()`.

### TopDocsCollector Shall Throw IllegalArgumentException For Malformed Arguments

`TopDocsCollector` no longer returns an empty `TopDocs` for malformed arguments.
Instead, an `IllegalArgumentException` is thrown. This was introduced for better
defence and to ensure that errors do not silently bubble up when Lucene is
used in multi-level applications.

### Assumption of data consistency between different data-structures sharing the same field name

Sorting on a numeric field that is indexed with both doc values and points may use an
optimization to skip non-competitive documents. This optimization relies on the assumption
that the same data is stored in these points and doc values.

### Require consistency between data-structures on a per-field basis

The per-field data structures are implicitly defined by the first document
indexed that contains a certain field. Once defined, the per-field
data structures are not changeable for the whole index. For example, if you
first index a document where a certain field is indexed with doc values and
points, all subsequent documents containing this field must also have this
field indexed with only doc values and points.

This also means that an index created in a previous version that doesn't
satisfy this requirement cannot be updated.

### Doc values updates are allowed only for doc-values-only fields

Previously, `IndexWriter` could update doc values for a binary or numeric doc values
field that was also indexed with other data structures (e.g. postings, vectors,
etc.). This is no longer allowed. A field must be indexed with only doc values
to be eligible for doc values updates in `IndexWriter`.
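
A sketch (field names and values are placeholders); the updated field carries doc values only:

```java
// at index time, "price" is a doc-values-only field (no points, postings or stored value):
Document doc = new Document();
doc.add(new StringField("id", "42", Field.Store.NO));
doc.add(new NumericDocValuesField("price", 10L));
writer.addDocument(doc);

// later: in-place doc values update, keyed by a term on the "id" field
writer.updateNumericDocValue(new Term("id", "42"), "price", 15L);
```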

### SortedDocValues no longer extends BinaryDocValues (LUCENE-9796)

`SortedDocValues` no longer extends `BinaryDocValues`: `SortedDocValues` do not have a per-document
binary value, they have a per-document numeric `ordValue()`. The ordinal can then be dereferenced
to its binary form with `lookupOrd()`, but it was a performance trap to implement a `binaryValue()`
on the `SortedDocValues` API that does this behind the scenes on every document.

You can replace calls of `binaryValue()` with `lookupOrd(ordValue())` as a "quick fix", but it is
better to use the ordinal alone (integer-based data structures) for per-document access, and only
call `lookupOrd()` a few times at the end (e.g. for the hits you want to display). Otherwise, if you
really don't want per-document ordinals, but instead a per-document `byte[]`, use a `BinaryDocValues`
field.
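
For example, a per-document access sketch that works on the ordinal and only dereferences it when needed:

```java
SortedDocValues dv = DocValues.getSorted(leafReader, "category");
if (dv.advanceExact(docID)) {
  int ord = dv.ordValue();              // cheap; use this for comparisons and counting
  BytesRef value = dv.lookupOrd(ord);   // the old binaryValue(), now an explicit dereference
}
```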

### Removed CodecReader.ramBytesUsed() (LUCENE-9387)

Lucene index readers now use so little memory with the default codec that
it was decided to remove the ability to estimate their RAM usage.

### LongValueFacetCounts no longer accepts multiValued param in constructors (LUCENE-9948)

`LongValueFacetCounts` will now automatically detect whether an indexed field is single- or
multi-valued. The user no longer needs to provide this information to the constructors. Migrating should
be as simple as no longer providing this boolean.

### SpanQuery and subclasses have moved from core/ to the queries module

They can now be found in the `org.apache.lucene.queries.spans` package.

### SpanBoostQuery has been removed (LUCENE-8143)

`SpanBoostQuery` was a no-op unless used at the top level of a `SpanQuery` nested
structure. Use a standard `BoostQuery` here instead.

### Sort is immutable (LUCENE-9325)

Rather than using `setSort()` to change sort values, you should instead create
a new `Sort` instance with the new values.
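
For example:

```java
// before: sort.setSort(new SortField("price", SortField.Type.LONG));
Sort sort = new Sort(new SortField("price", SortField.Type.LONG));
// changing the sort now means building a new instance:
Sort reversed = new Sort(new SortField("price", SortField.Type.LONG, true));
```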

### Taxonomy-based faceting uses more modern encodings (LUCENE-9450, LUCENE-10062, LUCENE-10122)

The side-car taxonomy index now uses doc values for ord-to-path lookup (LUCENE-9450) and parent
lookup (LUCENE-10122) instead of stored fields and positions (respectively). Document ordinals
are now encoded with `SortedNumericDocValues` instead of using a custom (v-int) binary format.
Performance gains have been observed with these encoding changes. These changes were introduced
in 9.0, and 9.x releases remain backwards-compatible with 8.x indexes, but starting with 10.0,
only the newer formats are supported. Users will need to create a new index with all their
documents using 9.0 or later to pick up the new format and remain compatible with 10.x releases.
Just re-adding documents to an existing index is not enough to pick up the changes as the
format will "stick" to whatever version was used to initially create the index.

Additionally, `OrdinalsReader` (and sub-classes) are fully removed starting with 10.0. These
classes were `@Deprecated` starting with 9.0. Users are encouraged to rely on the default
taxonomy facet encodings where possible. If custom formats are needed, users will need
to manage the indexed data on their own and create new `Facet` implementations to use it.
568