<!--
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements.  See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License.  You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-->

# Apache Lucene Migration Guide

## Migration from Lucene 9.x to Lucene 10.0

### PersianStemFilter is added to PersianAnalyzer (LUCENE-10312)

`PersianAnalyzer` now includes `PersianStemFilter`, which changes analysis results. If you need exactly the same analysis
behaviour as 9.x, clone `PersianAnalyzer` from 9.x or build your own analyzer with `CustomAnalyzer`.

### AutomatonQuery/CompiledAutomaton/RunAutomaton/RegExp no longer determinize (LUCENE-10010)

These classes no longer take a `determinizeWorkLimit` and no longer determinize
behind the scenes. It is the responsibility of the caller to call
`Operations.determinize()` for DFA execution.

### DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery removed in favor of FieldExistsQuery (LUCENE-10436)

These classes have been removed and consolidated into `FieldExistsQuery`. To migrate, callers simply replace those classes
with the new one at instantiation.

### Normalizer and stemmer classes are now package private (LUCENE-10561)

With a few exceptions, almost all normalizer and stemmer classes are now package private.
If your code depends on
constants defined in them, copy the constant values and re-define them in your code.

## Migration from Lucene 9.0 to Lucene 9.1

### Test framework package migration and module (LUCENE-10301)

The test framework is now a Java module. All the classes have been moved from
`org.apache.lucene.*` to `org.apache.lucene.tests.*` to avoid package name conflicts
with the core module. If you were using the Lucene test framework, the migration should be
fairly mechanical (update the package prefix in your imports).

### Minor syntactical changes in StandardQueryParser (LUCENE-10223)

Added interval functions and min-should-match support to `StandardQueryParser`. This
means that interval function prefixes (`fn:`) and the `@` character after parentheses will
parse differently than before. If you need the exact previous behavior, clone the
`StandardSyntaxParser` from the previous version of Lucene and create a custom query parser
with that parser.

### Lucene Core now depends on java.logging (JUL) module (LUCENE-10342)

Lucene Core now logs certain warnings and errors using Java Util Logging (JUL).
It is therefore recommended to install wrapper libraries with JUL logging handlers to
feed the log events into your app's own logging system.

Under normal circumstances Lucene won't log anything, but in the case of a problem
users should find the logged information in the usual log files.

Lucene also provides a `JavaLoggingInfoStream` implementation that logs `IndexWriter`
events using JUL.

To feed Lucene's log events into the well-known Log4j system, use
the [Log4j JDK Logging Adapter](https://logging.apache.org/log4j/2.x/log4j-jul/index.html)
in combination with the corresponding system property:
`java.util.logging.manager=org.apache.logging.log4j.jul.LogManager`.
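
As a rough illustration of what "feeding JUL events into your own logging system" can look like, the sketch below
attaches a custom `java.util.logging.Handler` to a parent logger using only the JDK. `ForwardingHandler` is a
hypothetical class, and the logger name `org.apache.lucene` used in the usage note is an assumption (it relies on
Lucene naming its loggers after its classes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical bridge: collects JUL records so the application can forward them elsewhere. */
final class ForwardingHandler extends Handler {
  private final List<String> forwarded = new ArrayList<>();
  // Strong reference: the JUL LogManager only holds loggers weakly.
  private Logger attachedTo;

  @Override
  public synchronized void publish(LogRecord record) {
    if (isLoggable(record)) {
      // A real bridge would hand the record to the application's logging framework here.
      forwarded.add(record.getLevel() + " " + record.getLoggerName() + ": " + record.getMessage());
    }
  }

  @Override
  public void flush() {}

  @Override
  public void close() {}

  /** Returns a copy of everything captured so far. */
  synchronized List<String> forwarded() {
    return new ArrayList<>(forwarded);
  }

  /** Attaches a new handler (WARNING and above) to the given parent logger. */
  static ForwardingHandler install(String parentLoggerName) {
    ForwardingHandler handler = new ForwardingHandler();
    handler.setLevel(Level.WARNING);
    handler.attachedTo = Logger.getLogger(parentLoggerName);
    handler.attachedTo.addHandler(handler);
    return handler;
  }
}
```

Calling `ForwardingHandler.install("org.apache.lucene")` once at startup would capture warnings from any logger below
that name. For Log4j, the adapter and system property mentioned above remain the simpler route.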

### Kuromoji and Nori analysis component constructors for custom dictionaries

The Kuromoji and Nori analysis modules allowed customizing the backing dictionaries
by passing a path to file or classpath resources through some inconsistently implemented
APIs. This was buggy from the beginning, but some users relied on it. With the move to the Java
module system, resource lookup on the classpath in particular stopped working correctly.
The Lucene team therefore implemented new APIs to create dictionary implementations
with custom data files. Unfortunately the 9.1 version had some shortcomings,
including when using the now deprecated constructors, so users are advised to upgrade to
Lucene 9.2 or stay with 9.0.

See LUCENE-10558 for more details and workarounds.

## Migration from Lucene 8.x to Lucene 9.0

### Rename of binary artifacts from '**-analyzers-**' to '**-analysis-**' (LUCENE-9562)

All binary analysis packages (and corresponding Maven artifacts) have been renamed and are
now consistent with repository module `analysis`.
You will need to adjust build dependencies
to the new coordinates:

| Old Artifact Coordinates                    | New Artifact Coordinates                   |
|---------------------------------------------|--------------------------------------------|
|org.apache.lucene:lucene-analyzers-common    |org.apache.lucene:lucene-analysis-common    |
|org.apache.lucene:lucene-analyzers-icu       |org.apache.lucene:lucene-analysis-icu       |
|org.apache.lucene:lucene-analyzers-kuromoji  |org.apache.lucene:lucene-analysis-kuromoji  |
|org.apache.lucene:lucene-analyzers-morfologik|org.apache.lucene:lucene-analysis-morfologik|
|org.apache.lucene:lucene-analyzers-nori      |org.apache.lucene:lucene-analysis-nori      |
|org.apache.lucene:lucene-analyzers-opennlp   |org.apache.lucene:lucene-analysis-opennlp   |
|org.apache.lucene:lucene-analyzers-phonetic  |org.apache.lucene:lucene-analysis-phonetic  |
|org.apache.lucene:lucene-analyzers-smartcn   |org.apache.lucene:lucene-analysis-smartcn   |
|org.apache.lucene:lucene-analyzers-stempel   |org.apache.lucene:lucene-analysis-stempel   |

### LucenePackage class removed (LUCENE-10260)

The `LucenePackage` class has been removed. The implementation string can be
retrieved from `Version.getPackageImplementationVersion()`.

### Directory API is now little-endian (LUCENE-9047)

`DataOutput`'s `writeShort()`, `writeInt()`, and `writeLong()` methods now encode with
little-endian byte order. If you have custom subclasses of `DataInput`/`DataOutput`, you
will need to adjust them from big-endian byte order to little-endian byte order.

### NativeUnixDirectory removed and replaced by DirectIODirectory (LUCENE-8982)

Java 11 supports Direct IO from Java code without native wrappers.
`NativeUnixDirectory` in the misc module was therefore removed and replaced
by `DirectIODirectory`. To use it, you need a JVM and operating system that
support Direct IO.
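
Returning briefly to the byte-order change from LUCENE-9047 described earlier: the JDK-only sketch below encodes the
same `int` in both byte orders, which may help when adjusting a custom `DataInput`/`DataOutput` subclass. It does not
use Lucene's classes; `ByteOrderSketch` and its methods are made-up names for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative only: shows how the byte layout of an int differs between the two orders. */
final class ByteOrderSketch {
  /** Encodes a 32-bit value in the requested byte order. */
  static byte[] encodeInt(int value, ByteOrder order) {
    return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
  }

  /** Decodes four bytes written least-significant byte first (little-endian). */
  static int decodeLittleEndianInt(byte[] bytes) {
    return (bytes[0] & 0xFF)
        | ((bytes[1] & 0xFF) << 8)
        | ((bytes[2] & 0xFF) << 16)
        | ((bytes[3] & 0xFF) << 24);
  }
}
```

For the value `0x01020304`, the big-endian encoding is the byte sequence `01 02 03 04`, while the little-endian
encoding is `04 03 02 01`.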

### BM25Similarity.setDiscountOverlaps and LegacyBM25Similarity.setDiscountOverlaps methods removed (LUCENE-9646)

The `discountOverlaps` parameter for both `BM25Similarity` and `LegacyBM25Similarity`
is now set by the constructor of those classes.

### Packages in misc module are renamed (LUCENE-9600)

These packages in the `lucene-misc` module are renamed:

| Old Package Name         | New Package Name              |
|--------------------------|-------------------------------|
|org.apache.lucene.document|org.apache.lucene.misc.document|
|org.apache.lucene.index   |org.apache.lucene.misc.index   |
|org.apache.lucene.search  |org.apache.lucene.misc.search  |
|org.apache.lucene.store   |org.apache.lucene.misc.store   |
|org.apache.lucene.util    |org.apache.lucene.misc.util    |

The following classes were moved to the `lucene-core` module:

- org.apache.lucene.document.InetAddressPoint
- org.apache.lucene.document.InetAddressRange

### Packages in sandbox module are renamed (LUCENE-9319)

These packages in the `lucene-sandbox` module are renamed:

| Old Package Name         | New Package Name                 |
|--------------------------|----------------------------------|
|org.apache.lucene.codecs  |org.apache.lucene.sandbox.codecs  |
|org.apache.lucene.document|org.apache.lucene.sandbox.document|
|org.apache.lucene.search  |org.apache.lucene.sandbox.search  |

### Backward codecs are renamed (LUCENE-9318)

These packages in the `lucene-backward-codecs` module are renamed:

| Old Package Name       | New Package Name                |
|------------------------|---------------------------------|
|org.apache.lucene.codecs|org.apache.lucene.backward_codecs|

### JapanesePartOfSpeechStopFilterFactory loads default stop tags if "tags" argument not specified (LUCENE-9567)

Previously, `JapanesePartOfSpeechStopFilterFactory` added no filter if `args` didn't include "tags".
Now, it will load
the default stop tags returned by `JapaneseAnalyzer.getDefaultStopTags()` (i.e. the tags from `stoptags.txt` in the
`lucene-analysis-kuromoji` jar).

### ICUCollationKeyAnalyzer is renamed (LUCENE-9558)

These packages in the `lucene-analysis-icu` module are renamed:

| Old Package Name          | New Package Name             |
|---------------------------|------------------------------|
|org.apache.lucene.collation|org.apache.lucene.analysis.icu|

### Base and concrete analysis factories are moved / package renamed (LUCENE-9317)

Base analysis factories have moved to `lucene-core`, and their packages have been renamed.

| Old Class Name                                   | New Class Name                               |
|--------------------------------------------------|----------------------------------------------|
|org.apache.lucene.analysis.util.TokenizerFactory  |org.apache.lucene.analysis.TokenizerFactory   |
|org.apache.lucene.analysis.util.CharFilterFactory |org.apache.lucene.analysis.CharFilterFactory  |
|org.apache.lucene.analysis.util.TokenFilterFactory|org.apache.lucene.analysis.TokenFilterFactory |

The service provider files placed in `META-INF/services` for custom analysis factories should be renamed as follows:

- META-INF/services/org.apache.lucene.analysis.TokenizerFactory
- META-INF/services/org.apache.lucene.analysis.CharFilterFactory
- META-INF/services/org.apache.lucene.analysis.TokenFilterFactory

`StandardTokenizerFactory` is moved to the `lucene-core` module.

The `org.apache.lucene.analysis.standard` package in the `lucene-analysis-common` module
is split into `org.apache.lucene.analysis.classic` and `org.apache.lucene.analysis.email`.

### RegExpQuery now rejects invalid backslashes (LUCENE-9370)

We now follow the [Java rules](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#bs) for accepting backslashes.
Alphabetic characters other than s, S, w, W, d or D that are preceded by a backslash are considered illegal syntax and will throw an exception.

### RegExp certain regular expressions now match differently (LUCENE-9336)

The commonly used regular expressions `\w`, `\W`, `\d`, `\D`, `\s` and `\S` now work the same way [Java Pattern](https://docs.oracle.com/javase/tutorial/essential/regex/pre_char_classes.html#CHART) matching works. Previously these expressions were (mis)interpreted as searches for the literal characters w, d, s etc.

### NGramFilterFactory "keepShortTerm" option was fixed to "preserveOriginal" (LUCENE-9259)

The factory option name to output the original term was corrected in accordance with its Javadoc.

### IndexMergeTool defaults changes (LUCENE-9206)

This command-line tool no longer forceMerges to a single segment. Instead, by
default it follows the (configurable) merge policy. If you really want to merge
to a single segment, you can pass `-max-segments 1`.

### FST Builder is renamed FSTCompiler with fluent-style Builder (LUCENE-9089)

Simply use `FSTCompiler` instead of the previous `Builder`. Use either the simple constructor with default settings, or
the `FSTCompiler.Builder` to tune and tweak any parameter.

### Kuromoji user dictionary now forbids illegal segmentation (LUCENE-8933)

The user dictionary now strictly validates that the concatenated segments are the same as the surface form. This change avoids
unexpected runtime exceptions or behaviours.
For example, the following entries are not allowed, and an exception is thrown when loading the dictionary file.

```
# concatenated "日本経済新聞" does not match the surface form "日経新聞"
日経新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞

# concatenated "日経新聞" does not match the surface form "日本経済新聞"
日本経済新聞,日経 新聞,ニッケイ シンブン,カスタム名詞
```

### JapaneseTokenizer no longer emits original (compound) tokens by default when the mode is not NORMAL (LUCENE-9123)

`JapaneseTokenizer` and `JapaneseAnalyzer` no longer emit original tokens when the `discardCompoundToken` option is not specified.
This constructor option was introduced in Lucene 8.5.0, and its default value is now `true`.

When given the text "株式会社", `JapaneseTokenizer` (mode != NORMAL) emits only the decompounded tokens "株式" and "会社" and no
longer outputs the original token "株式会社" by default. To output original tokens, the `discardCompoundToken` option should be
explicitly set to `false`. Be aware that if this option is set to `false`, `SynonymFilter` or `SynonymGraphFilter` does not work
correctly (see LUCENE-9173).

### Analysis factories now have customizable symbolic names (LUCENE-8778) and need additional no-arg constructor (LUCENE-9281)

The SPI names for concrete subclasses of `TokenizerFactory`, `TokenFilterFactory`, and `CharFilterFactory` are no longer
derived from their class name. Instead, each factory must have a static "NAME" field like this:

```java
  /** o.a.l.a.standard.StandardTokenizerFactory's SPI name */
  public static final String NAME = "standard";
```

A factory can be resolved/instantiated with its `NAME` by using methods such as `TokenizerFactory.lookupClass(String)`
or `TokenizerFactory.forName(String, Map<String,String>)`.

If any user-defined factory class lacks a proper `NAME` field, an exception will be thrown
when (re)loading factories, e.g., when calling `TokenizerFactory.reloadTokenizers(ClassLoader)`.

In addition, all factories need to implement a public no-arg constructor. The reason for this
change is that Lucene now uses `java.util.ServiceLoader` instead of its own implementation to
load the factory classes, to be compatible with Java Module System changes (e.g., loading factories from modules).
In the future, extensions to Lucene developed on the Java Module System may expose the factories from their
`module-info.java` file instead of `META-INF/services`.

This constructor is never called by Lucene, so by default it throws an `UnsupportedOperationException`. User-defined
factory classes should implement it in the following way:

```java
  /** Default ctor for compatibility with SPI */
  public StandardTokenizerFactory() {
    throw defaultCtorException();
  }
```

(`defaultCtorException()` is a protected static helper method.)

### TermsEnum is now fully abstract (LUCENE-8292, LUCENE-8662)

`TermsEnum` has been changed to be fully abstract, so non-abstract subclasses must implement all its methods.
Non-performance-critical `TermsEnum`s can use `BaseTermsEnum` as a base class instead. The change was motivated
by several performance issues with `FilterTermsEnum` that caused significant slowdowns and massive memory consumption due
to not delegating all methods of `TermsEnum`.

### RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream removed (LUCENE-8474)

The RAM-based directory implementations have been removed.
`ByteBuffersDirectory` can be used as a RAM-resident replacement, although it
is discouraged in favor of the default `MMapDirectory`.

### Similarity.SimScorer.computeXXXFactor methods removed (LUCENE-8014)

`SpanQuery` and `PhraseQuery` now always calculate their slops as
`(1.0 / (1.0 + distance))`. Payload factor calculation is performed by
`PayloadDecoder` in the `lucene-queries` module.
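
For reference, the fixed slop factor quoted above is easy to reproduce. The following JDK-only sketch evaluates
`1.0 / (1.0 + distance)`; the class and method names are hypothetical and not part of the Lucene API.

```java
/** Illustrative only: reproduces the fixed slop formula quoted in this section. */
final class SlopFactorSketch {
  /** Weight applied to a sloppy match: 1 for an exact match, shrinking as the distance grows. */
  static float slopFactor(int distance) {
    return 1.0f / (1.0f + distance);
  }
}
```

An exact match (distance 0) keeps its full weight, a distance of 1 halves it, and a distance of 3 leaves a quarter.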

### Scorer must produce positive scores (LUCENE-7996)

`Scorer`s are no longer allowed to produce negative scores. If you have custom
query implementations, you should make sure their score formula can never produce
negative scores.

As a side-effect of this change, negative boosts are now rejected and
`FunctionScoreQuery` maps negative values to 0.

### CustomScoreQuery, BoostedQuery and BoostingQuery removed (LUCENE-8099)

Instead use `FunctionScoreQuery` and a `DoubleValuesSource` implementation. `BoostedQuery`
and `BoostingQuery` may be replaced by calls to `FunctionScoreQuery.boostByValue()` and
`FunctionScoreQuery.boostByQuery()`. To replace more complex calculations in
`CustomScoreQuery`, use the `lucene-expressions` module:

```java
SimpleBindings bindings = new SimpleBindings();
bindings.add("score", DoubleValuesSource.SCORES);
bindings.add("boost1", DoubleValuesSource.fromIntField("myboostfield"));
bindings.add("boost2", DoubleValuesSource.fromIntField("myotherboostfield"));
Expression expr = JavascriptCompiler.compile("score * (boost1 + ln(boost2))");
FunctionScoreQuery q = new FunctionScoreQuery(inputQuery, expr.getDoubleValuesSource(bindings));
```

### IndexOptions can no longer be changed dynamically (LUCENE-8134)

Changing `IndexOptions` for a field on the fly will now result in an
`IllegalArgumentException`. If a field is indexed
(`FieldType.indexOptions() != IndexOptions.NONE`) then all documents must have
the same index options for that field.

### IndexSearcher.createNormalizedWeight() removed (LUCENE-8242)

Instead use `IndexSearcher.createWeight()`, rewriting the query first, and using
a boost of `1f`.

### Memory codecs removed (LUCENE-8267)

Memory codecs (`MemoryPostingsFormat`, `MemoryDocValuesFormat`) have been removed from the codebase.

### Direct doc-value format removed (LUCENE-8917)

The `Direct` doc-value format has been removed from the codebase.

### QueryCachingPolicy.ALWAYS_CACHE removed (LUCENE-8144)

Caching everything is discouraged as it disables the ability to skip non-interesting documents.
`ALWAYS_CACHE` can be replaced by a `UsageTrackingQueryCachingPolicy` with an appropriate config.

### English stopwords are no longer removed by default in StandardAnalyzer (LUCENE-7444)

To retain the old behaviour, pass `EnglishAnalyzer.ENGLISH_STOP_WORDS_SET` as an argument
to the constructor.

### StandardAnalyzer.ENGLISH_STOP_WORDS_SET has been moved

English stop words are now defined in `EnglishAnalyzer.ENGLISH_STOP_WORDS_SET` in the
`analysis-common` module.

### TopDocs.maxScore removed

`TopDocs.maxScore` is removed. `IndexSearcher` and `TopFieldCollector` no longer have
an option to compute the maximum score when sorting by field. If you need to
know the maximum score for a query, the recommended approach is to run a
separate query:

```java
  TopDocs topHits = searcher.search(query, 1);
  float maxScore = topHits.scoreDocs.length == 0 ? Float.NaN : topHits.scoreDocs[0].score;
```

Thanks to other optimizations that were added to Lucene 8, this query will be
able to efficiently select the top-scoring document without having to visit
all matches.

### TopFieldCollector always assumes fillFields=true

Because filling sort values doesn't have a significant overhead, the `fillFields`
option has been removed from `TopFieldCollector` factory methods. Everything
behaves as if it was previously set to `true`.

### TopFieldCollector no longer takes a trackDocScores option

Computing scores at collection time is less efficient than running a second
request in order to only compute scores for documents that made it to the top
hits.
As a consequence, the `trackDocScores` option has been removed and can be
replaced with the new `TopFieldCollector.populateScores()` helper method.

### IndexSearcher.search(After) may return lower bounds of the hit count and TopDocs.totalHits is no longer a long

Lucene 8 received optimizations for collection of top-k matches by not visiting
all matches. However these optimizations won't help if all matches still need
to be visited in order to compute the total number of hits. As a consequence,
`IndexSearcher`'s `search()` and `searchAfter()` methods were changed to only count hits
accurately up to 1,000, and `TopDocs.totalHits` was changed from a `long` to an
object that says whether the hit count is accurate or a lower bound of the
actual hit count.

### RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream are deprecated (LUCENE-8467, LUCENE-8438)

This RAM-based directory implementation is an old piece of code that uses inefficient
thread synchronization primitives and can be mistaken for being "faster" than the NIO-based
`MMapDirectory`. It is deprecated and scheduled for removal in future versions of
Lucene.

### LeafCollector.setScorer() now takes a Scorable rather than a Scorer (LUCENE-6228)

`Scorer` has a number of methods that should never be called from `Collector`s, for example
those that advance the underlying iterators. To hide these, `LeafCollector.setScorer()`
now takes a `Scorable`, an abstract class that scorers can extend, with methods
`docID()` and `score()`.

### Scorers must have non-null Weights

If a custom `Scorer` implementation does not have an associated `Weight`, it can probably
be replaced with a `Scorable` instead.

### Suggesters now return Long instead of long for weight() during indexing, and double instead of long at suggest time

Most code should just require recompilation, though possibly requiring some added casts.

### TokenStreamComponents is now final

Instead of overriding `TokenStreamComponents.setReader()` to customise analyzer
initialisation, you should now pass a `Consumer<Reader>` instance to the
`TokenStreamComponents` constructor.

### LowerCaseTokenizer and LowerCaseTokenizerFactory have been removed

`LowerCaseTokenizer` combined tokenization and filtering in a way that broke token
normalization, so both classes have been removed. Instead, use a `LetterTokenizer` followed by
a `LowerCaseFilter`.

### CharTokenizer no longer takes a normalizer function

`CharTokenizer` now only performs tokenization. To perform any type of filtering
use a `TokenFilter` chain as you would with any other `Tokenizer`.

### Highlighter and FastVectorHighlighter no longer support ToParent/ToChildBlockJoinQuery

Both `Highlighter` and `FastVectorHighlighter` need a custom `WeightedSpanTermExtractor` or `FieldQuery`, respectively,
in order to support `ToParentBlockJoinQuery`/`ToChildBlockJoinQuery`.

### MultiTermAwareComponent replaced by CharFilterFactory.normalize() and TokenFilterFactory.normalize()

Normalization is now type-safe, with `CharFilterFactory.normalize()` returning a `Reader` and
`TokenFilterFactory.normalize()` returning a `TokenFilter`.

### k1+1 constant factor removed from BM25 similarity numerator (LUCENE-8563)

Scores computed by `BM25Similarity` are lower than previously as the `k1+1`
constant factor was removed from the numerator of the scoring formula.
Ordering of results is preserved unless scores are computed from multiple
fields using different similarities. The previous behaviour is now exposed
by the `LegacyBM25Similarity` class which can be found in the lucene-misc jar.
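
A small numeric sketch may help show why ordering is preserved within a single similarity. It assumes the usual BM25
term-frequency saturation form with parameters `k1` and `b`, and leaves out IDF and boost because they are identical
in both variants. The class and method names are illustrative only and not Lucene API.

```java
/** Illustrative only: compares the BM25 term-frequency factor with and without (k1 + 1). */
final class Bm25FactorSketch {
  /** Length normalization term: k1 * (1 - b + b * docLen / avgDocLen). */
  static double norm(double k1, double b, double docLen, double avgDocLen) {
    return k1 * (1 - b + b * docLen / avgDocLen);
  }

  /** Current form: no (k1 + 1) constant in the numerator. */
  static double tfFactor(double freq, double k1, double b, double docLen, double avgDocLen) {
    return freq / (freq + norm(k1, b, docLen, avgDocLen));
  }

  /** Legacy form: the same factor multiplied by the constant (k1 + 1). */
  static double legacyTfFactor(double freq, double k1, double b, double docLen, double avgDocLen) {
    return (k1 + 1) * tfFactor(freq, k1, b, docLen, avgDocLen);
  }
}
```

Because the two forms differ only by the constant `k1 + 1`, any two documents scored with the same similarity keep
their relative order. When fields use different similarities (for example different `k1` values), each field is
scaled by a different constant, which is where the combined ordering can change.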

### IndexWriter.maxDoc()/numDocs() removed in favor of IndexWriter.getDocStats()

`IndexWriter.getDocStats()` should be used instead of `maxDoc()` / `numDocs()`; it offers a consistent
view of document stats. Previously, calling two methods in order to get point-in-time stats was subject
to concurrent changes.

### maxClausesCount moved from BooleanQuery To IndexSearcher (LUCENE-8811)

`IndexSearcher` now performs max clause count checks on all types of queries (including BooleanQueries).
This led to a logical move of the clauses count from `BooleanQuery` to `IndexSearcher`.

### TopDocs.merge shall no longer allow setting of shard indices

`TopDocs.merge()`'s API has been changed to no longer accept a parameter indicating whether it should
set shard indices for hits as they are seen during the merge process. This simplifies the API and makes it
more flexible in terms of passing in custom tie breakers.
If shard indices are to be used for tie breaking docs with equal scores during `TopDocs.merge()`, then it is
mandatory that the input `ScoreDocs` have their shard indices set to valid values prior to calling `merge()`.

### TopDocsCollector Shall Throw IllegalArgumentException For Malformed Arguments

`TopDocsCollector` shall no longer return an empty `TopDocs` for malformed arguments.
Rather, an `IllegalArgumentException` shall be thrown. This is introduced for better
defence and to ensure that there is no bubbling up of errors when Lucene is
used in multi-level applications.

### Assumption of data consistency between different data-structures sharing the same field name

Sorting on a numeric field that is indexed with both doc values and points may use an
optimization to skip non-competitive documents. This optimization relies on the assumption
that the same data is stored in these points and doc values.

### Require consistency between data-structures on a per-field basis

The per-field data-structures are implicitly defined by the first document
indexed that contains a certain field. Once defined, the per-field
data-structures are not changeable for the whole index. For example, if you
first index a document where a certain field is indexed with doc values and
points, all subsequent documents containing this field must also have this
field indexed with only doc values and points.

This also means that an index created in the previous version that doesn't
satisfy this requirement cannot be updated.

### Doc values updates are allowed only for doc values only fields

Previously `IndexWriter` could update doc values for a binary or numeric doc values
field that was also indexed with other data structures (e.g. postings, vectors
etc). This is not allowed anymore. A field must be indexed with only doc values
to be allowed for doc values updates in `IndexWriter`.

### SortedDocValues no longer extends BinaryDocValues (LUCENE-9796)

`SortedDocValues` no longer extends `BinaryDocValues`: `SortedDocValues` do not have a per-document
binary value, they have a per-document numeric `ordValue()`. The ordinal can then be dereferenced
to its binary form with `lookupOrd()`, but it was a performance trap to implement a `binaryValue()`
on the `SortedDocValues` API that does this behind the scenes on every document.

You can replace calls of `binaryValue()` with `lookupOrd(ordValue())` as a "quick fix", but it is
better to use the ordinal alone (integer-based data structures) for per-document access, and only
call `lookupOrd()` a few times at the end (e.g. for the hits you want to display). Otherwise, if you
really don't want per-document ordinals, but instead a per-document `byte[]`, use a `BinaryDocValues`
field.

### Removed CodecReader.ramBytesUsed() (LUCENE-9387)

Lucene index readers are now using so little memory with the default codec that
it was decided to remove the ability to estimate their RAM usage.

### LongValueFacetCounts no longer accepts multiValued param in constructors (LUCENE-9948)

`LongValueFacetCounts` will now automatically detect whether an indexed field is single- or
multi-valued. The user no longer needs to provide this information to the constructors. Migrating should
be as simple as no longer providing this boolean.

### SpanQuery and subclasses have moved from core/ to the queries module

They can now be found in the `org.apache.lucene.queries.spans` package.

### SpanBoostQuery has been removed (LUCENE-8143)

`SpanBoostQuery` was a no-op unless used at the top level of a `SpanQuery` nested
structure. Use a standard `BoostQuery` here instead.

### Sort is immutable (LUCENE-9325)

Rather than using `setSort()` to change sort values, you should instead create
a new `Sort` instance with the new values.

### Taxonomy-based faceting uses more modern encodings (LUCENE-9450, LUCENE-10062, LUCENE-10122)

The side-car taxonomy index now uses doc values for ord-to-path lookup (LUCENE-9450) and parent
lookup (LUCENE-10122) instead of stored fields and positions (respectively). Document ordinals
are now encoded with `SortedNumericDocValues` instead of using a custom (v-int) binary format.
Performance gains have been observed with these encoding changes. These changes were introduced
in 9.0, and 9.x releases remain backwards-compatible with 8.x indexes, but starting with 10.0,
only the newer formats are supported. Users will need to create a new index with all their
documents using 9.0 or later to pick up the new format and remain compatible with 10.x releases.
Just re-adding documents to an existing index is not enough to pick up the changes as the
format will "stick" to whatever version was used to initially create the index.

Additionally, `OrdinalsReader` (and sub-classes) are fully removed starting with 10.0. These
classes were `@Deprecated` starting with 9.0. Users are encouraged to rely on the default
taxonomy facet encodings where possible. If custom formats are needed, users will need
to manage the indexed data on their own and create new `Facet` implementations to use it.