Lucene Change Log

For more information on past and future Lucene versions, please see:
http://s.apache.org/luceneversions

======================= Lucene 10.0.0 =======================

API Changes
---------------------

* LUCENE-10010: AutomatonQuery, CompiledAutomaton, RunAutomaton, RegExp
  classes no longer determinize NFAs. Instead it is the responsibility
  of the caller to determinize. (Robert Muir)

* LUCENE-10368: IntTaxonomyFacets has been made pkg-private and serves only as an internal
  implementation detail of taxonomy-faceting. (Greg Miller)

* LUCENE-10400: Remove deprecated dictionary constructors in Kuromoji and Nori (Tomoko Uchida)

* LUCENE-10440: TaxonomyFacets and FloatTaxonomyFacets have been made pkg-private and only serve
  as internal implementation details of taxonomy-faceting. (Greg Miller)

* LUCENE-10431: MultiTermQuery.setRewriteMethod() has been removed. (Alan Woodward)

* LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and
  KnnVectorFieldExistsQuery. (Zach Chen, Adrien Grand)

* LUCENE-10561: Reduce class/member visibility of all normalizer and stemmer classes. (Rushabh Shah)

* LUCENE-10266: Move nearest-neighbor search on points to core. (Rushabh Shah)

New Features
---------------------

* LUCENE-10010: Introduce NFARunAutomaton to run NFA directly. (Patrick Zhai)

Improvements
---------------------

* LUCENE-10416: Update Korean Dictionary to mecab-ko-dic-2.1.1-20180720 for Nori.
  (Uihyun Kim)

Optimizations
---------------------
(No changes)

Bug Fixes
---------------------

* LUCENE-10599: LogMergePolicy is more likely to keep merging segments until
  they reach the maximum merge size. (Adrien Grand)

Other
---------------------
* LUCENE-10283: The minimum required Java version was bumped from 11 to 17.
  (Adrien Grand, Uwe Schindler, Dawid Weiss, Robert Muir)

* LUCENE-10253: The @BadApple annotation has been removed from the test
  framework. (Adrien Grand)

* LUCENE-10393: Unify binary dictionary and dictionary writer in Kuromoji and Nori.
  (Tomoko Uchida, Robert Muir)

* LUCENE-10475: Merge dictionary builders in `util` package into `dict` package in Kuromoji and Nori.
  All classes in `org.apache.lucene.analysis.[ja|ko].util` were moved to `org.apache.lucene.analysis.[ja|ko].dict`.
  (Tomoko Uchida)

* LUCENE-10493: Factor out Viterbi algorithm in Kuromoji and Nori to analysis-common. (Tomoko Uchida)

======================= Lucene 9.3.0 =======================

API Changes
---------------------
(No changes)

New Features
---------------------
(No changes)

Improvements
---------------------

* LUCENE-10078: Merge on full flush is now enabled by default with a timeout of
  500ms. (Adrien Grand)

* LUCENE-10585: Facet module code cleanup (copy/paste scrubbing, simplification and some very minor
  optimization tweaks). (Greg Miller)

Optimizations
---------------------
* LUCENE-8519: MultiDocValues.getNormValues should not call getMergedFieldInfos (Rushabh Shah)

Bug Fixes
---------------------

* LUCENE-10574: Prevent pathological O(N^2) merging. (Adrien Grand)

* LUCENE-10582: Fix merging of overridden CollectionStatistics in CombinedFieldQuery (Yannick Welsch)

* LUCENE-10598: SortedSetDocValues#docValueCount() should always be greater than zero. (Lu Xugang)

* LUCENE-10563: Fix failure to tessellate complex polygon (Craig Taverner)

* LUCENE-10605: Fix error in 32-bit JVM object alignment gap calculation (Sun Wuqiang)

* GITHUB#956: Make sure KnnVectorQuery applies search boost. (Julie Tibshirani)

Other
---------------------

* LUCENE-10370: Pass proper classpath/module arguments for forking JVMs from within tests.
  (Dawid Weiss)

* LUCENE-10604: Improve ability to test and debug triangulation algorithm in Tessellator.
  (Craig Taverner)

======================= Lucene 9.2.0 =======================

API Changes
---------------------

* LUCENE-10325: Facets API extended to support getTopFacets. (Yuting Gan)

* LUCENE-10482: Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the
  taxoEpoch decide. Add a test case that demonstrates the inconsistencies caused when you reuse taxoArrays on older
  checkpoints. (Gautam Worah)

* LUCENE-10558: Add new constructors to Kuromoji and Nori dictionary classes to support classpath /
  module system usage. It is now possible to use JDK's Class/ClassLoader/Module#getResource(...) APIs
  and pass their returned URL to dictionary constructors to load resources from classpath or module
  resources. (Uwe Schindler, Tomoko Uchida, Mike Sokolov)

New Features
---------------------

* LUCENE-10312: Add PersianStemmer based on the Arabic stemmer. (Ramin Alirezaee)

* LUCENE-10539: Return a stream of completions from FSTCompletion. (Dawid Weiss)

* LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery
  to speed up computing the number of hits when possible. (Lu Xugang, Luca Cavanna, Adrien Grand)

* LUCENE-10422: Monitor Improvements: `Monitor` can use a custom `Directory`
  implementation. `Monitor` can be created with a readonly `QueryIndex` in order to
  have readonly `Monitor` instances. (Niko Usai)

* LUCENE-10456: Implement rewrite and Weight#count for MultiRangeQuery
  by merging overlapping ranges. (Jianping Weng)

* LUCENE-10444: Support alternate aggregation functions in association facets. (Greg Miller)

Improvements
---------------------

* LUCENE-10229: Return -1 for unknown offsets in ExtendedIntervalsSource.
  Modify highlighting to work properly with or without offsets. (Dawid Weiss)

* LUCENE-10494: Implement method to bulk add all collection elements to a PriorityQueue.
  (Bauyrzhan Sakhariyev)

* LUCENE-10484: Add support for concurrent random sampling by calling
  RandomSamplingFacetsCollector#createManager. (Luca Cavanna)

* LUCENE-10467: Throw IllegalArgumentException from Facets#getAllDims and Facets#getTopChildren
  if topN <= 0. (Yuting Gan)

* LUCENE-9848: Correctly sort HNSW graph neighbors when applying diversity criterion (Mayya
  Sharipova, Michael Sokolov)

* LUCENE-10527: Use 2*maxConn for the last layer in HNSW (Mayya Sharipova)

Optimizations
---------------------

* LUCENE-10555: Avoid repeated initialization of NumericLeafComparator#iteratorCost
  when NumericLeafComparator#setScorer is called. (Jianping Weng)

* LUCENE-10452: Hunspell: call checkCanceled less frequently to reduce the overhead (Peter Gromov)

* LUCENE-10451: Hunspell: don't perform potentially expensive spellchecking after timeout (Peter Gromov)

* LUCENE-10418: More `Query#rewrite` optimizations for the non-scoring case.
  (Adrien Grand)

* LUCENE-10436: Deprecate DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery
  in favor of FieldExistsQuery. (Zach Chen, Michael McCandless, Adrien Grand)

* LUCENE-10481: FacetsCollector will not request scores if it does not use them. (Mike Drob)

* LUCENE-10503: Potential speedup for pure disjunctions whose clauses produce
  scores that are very close to each other. (Adrien Grand)

* LUCENE-10315: Use SIMD instructions to decode BKD doc IDs. (Guo Feng, Adrien Grand, Ignacio Vera)

* LUCENE-8836: Speed up calls to TermsEnum#lookupOrd on doc values terms enums
  and sequences of increasing ords.
  (Bruno Roustant, Adrien Grand)

* LUCENE-10536: Doc values terms dictionaries now use the first (uncompressed)
  term of each block as a dictionary when compressing suffixes of the other 63
  terms of the block. (Adrien Grand)

* LUCENE-10411: Add nearest neighbors vectors support to ExitableDirectoryReader.
  (Zach Chen, Adrien Grand, Julie Tibshirani, Tomoko Uchida)

* LUCENE-10542: FieldSource exists implementations can avoid value retrieval (Kevin Risden)

* LUCENE-10534: MinFloatFunction / MaxFloatFunction exists check can be slow (Kevin Risden)

* LUCENE-10496: Queries sorted by field now better handle the degenerate case
  when the search order and the index order are in opposite directions.
  (Jianping Weng)

* LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle
  ordToDoc in HNSW vectors (Lu Xugang)

* LUCENE-10488: Facets#getTopDims optimized for taxonomy faceting and
  ConcurrentSortedSetDocValuesFacetCounts. (Yuting Gan)

Bug Fixes
---------------------
* LUCENE-10477: Highlighter: WeightedSpanTermExtractor.extractWeightedSpanTerms now calls Query#rewrite
  multiple times if necessary. (Christine Poerschke, Adrien Grand)

* LUCENE-10491: A correctness bug in the way scores are provided within TaxonomyFacetSumValueSource
  was fixed. (Michael McCandless, Greg Miller)

* LUCENE-10466: Ensure IndexSortSortedNumericDocValuesRangeQuery handles sort field
  types besides LONG (Andriy Redko)

* LUCENE-10292: Suggest: Fix AnalyzingInfixSuggester / BlendedInfixSuggester to correctly return
  existing lookup() results during concurrent build(). Fix other FST-based suggesters so that
  getCount() returns results consistent with lookup() during concurrent build().
  (hossman)

* LUCENE-10508: Fix some edge cases where GeoAreas were built in a way that vertical planes
  could not evaluate their sign, either because the planes were the same or the center between those
  planes was lying in one of the planes. (Ignacio Vera)

* LUCENE-10495: Fix return statement of siblingsLoaded() in TaxonomyFacets. (Yuting Gan)

* LUCENE-10533: SpellChecker.formGrams is missing a bounds check (Kevin Risden)

* LUCENE-10529: Properly handle when TestTaxonomyFacetAssociations test case randomly indexes
  no documents instead of throwing an NPE. (Greg Miller)

* LUCENE-10470: Check if polygon has been successfully tessellated before we fail (we are failing some valid
  tessellations) and allow filtering edges that fold on top of the previous one. (Ignacio Vera)

* LUCENE-10530: Avoid floating point precision test case bug in TestTaxonomyFacetAssociations.
  (Greg Miller)

* LUCENE-10552: KnnVectorQuery has incorrect equals/hashCode. (Lu Xugang)

* LUCENE-10558: Restore behaviour of deprecated Kuromoji and Nori dictionary constructors for
  custom dictionary support. Please also use new URL-based constructors for classpath/module
  system resources. (Uwe Schindler, Tomoko Uchida, Mike Sokolov)

* LUCENE-10564: Make sure SparseFixedBitSet#or updates ramBytesUsed. (Julie Tibshirani)

Build
---------------------

* GITHUB#768: Upgrade forbiddenapis to version 3.3. (Uwe Schindler)

* GITHUB#890: Detect CI builds on GitHub or Jenkins and enable errorprone. (Uwe Schindler, Dawid Weiss)

* LUCENE-10532: Remove LuceneTestCase.Slow annotation. All tests can be fast. (Robert Muir)

Other
---------------------
* LUCENE-10526: Test-framework: Add FilterFileSystemProvider.wrapPath(Path) method for mock filesystems
  to override if they need to extend the Path implementation.
  (Gautam Worah, Robert Muir)

* LUCENE-10525: Test-framework: Add detection of illegal Windows filenames to WindowsFS. (Gautam Worah)

* LUCENE-10541: Test-framework: limit the default length of MockTokenizer tokens to 255.
  (Robert Muir, Uwe Schindler, Tomoko Uchida, Dawid Weiss)

* GITHUB#854: Allow linking to GitHub pull requests from CHANGES. (Tomoko Uchida, Jan Høydahl)

======================= Lucene 9.1.0 =======================

API Changes
---------------------

* LUCENE-10244: MultiCollector::getCollectors is now public, allowing users to access the wrapped
  collectors. (Andriy Redko)

* LUCENE-10197: UnifiedHighlighter now has a Builder to construct it. The UH's setters are now
  deprecated. (Animesh Pandey, David Smiley)

* LUCENE-10301: The test framework is now a module. All the classes have been moved from
  org.apache.lucene.* to org.apache.lucene.tests.* to avoid package name conflicts with the
  core module. (Dawid Weiss)

* LUCENE-10183: KnnVectorsWriter#writeField now takes KnnVectorsReader instead of VectorValues.
  (Zach Chen, Michael Sokolov, Julie Tibshirani, Adrien Grand)

* LUCENE-10335: Deprecate helper methods for resource loading in IOUtils and StopwordAnalyzerBase
  that are not compatible with the module system (Class#getResourceAsStream() and Class#getResource()
  are caller-sensitive in Java 11). Instead, add a utility method IOUtils#requireResourceNonNull(T)
  to test the existence of a resource based on a null return value. (Uwe Schindler, Dawid Weiss)

* LUCENE-10349: WordListLoader methods now return unmodifiable CharArraySets. (Uwe Schindler)

* LUCENE-10377: SortField.getComparator() has changed signature. The second parameter is now
  a boolean indicating whether or not skipping should be enabled on the comparator.
  (Alan Woodward)

* LUCENE-10381: Require users to provide FacetsConfig for SSDV faceting.
  (Greg Miller)

* LUCENE-10368: IntTaxonomyFacets has been deprecated and is no longer a supported extension point
  for user-created faceting implementations. (Greg Miller)

* LUCENE-10400: Add constructors that take external resource Paths to dictionary classes in Kuromoji and Nori:
  ConnectionCosts, TokenInfoDictionary, and UnknownDictionary. Old constructors that take resource scheme and
  resource path in those classes are deprecated; these are replaced with the new constructors and planned to be
  removed in a future release. (Tomoko Uchida, Uwe Schindler, Mike Sokolov)

* LUCENE-10050: Deprecate DrillSideways#search(Query, Collector) in favor of
  DrillSideways#search(Query, CollectorManager). This reflects the change (LUCENE-10002) being made in
  IndexSearcher#search that trends towards using CollectorManagers over Collectors. (Gautam Worah)

* LUCENE-10420: Move functional interfaces in IOUtils to top-level interfaces.
  (David Smiley, Uwe Schindler, Dawid Weiss, Tomoko Uchida)

* LUCENE-10398: Add static method for getting Terms from LeafReader. (Spike Liu)

* LUCENE-10440: TaxonomyFacets and FloatTaxonomyFacets have been deprecated and are no longer
  supported extension points for user-created faceting implementations. (Greg Miller)

* LUCENE-10431: MultiTermQuery.setRewriteMethod() has been deprecated, and constructor
  parameters have been added to the various implementations. (Alan Woodward)

* LUCENE-10171: OpenNLPOpsFactory.getLemmatizerDictionary(String, ResourceLoader) now returns a
  DictionaryLemmatizer object instead of a raw String serialization of the dictionary.
  (Spyros Kapnissis via Michael Gibney, Alessandro Benedetti)

New Features
---------------------

* LUCENE-10255: Lucene JARs are now proper modules, with module descriptors and dependency information.
  (Chris Hegarty, Uwe Schindler, Tomoko Uchida, Dawid Weiss)

* LUCENE-10342: Lucene Core now depends on the java.logging (JUL) module and reports
  if MMapDirectory cannot unmap mapped ByteBuffers or RamUsageEstimator's object size
  calculations may be off. This was added especially for users running Lucene with the
  Java Module System where some optional features are not available by default or supported.
  For all apps using Lucene, it is strongly recommended to explicitly require non-standard
  JDK modules: jdk.unsupported (unmapping) and jdk.management (OOP size for RAM usage calculations).
  It is also recommended to install JUL logging adapters to feed the log events into your app's
  logging system. (Uwe Schindler, Dawid Weiss, Tomoko Uchida, Robert Muir)

* LUCENE-10330: Make MMapDirectory tests fail by default if unmapping does not work.
  (Uwe Schindler, Dawid Weiss)

* LUCENE-10223: Add interval function support to StandardQueryParser. Add min-should-match operator
  support to StandardQueryParser. Update and clean up package documentation in flexible query parser
  module. (Dawid Weiss, Alan Woodward)

* LUCENE-10220: Add a utility method to get IntervalSource from analyzed text (or token stream).
  (Uwe Schindler, Dawid Weiss, Alan Woodward)

* LUCENE-10085: Added Weight#count on DocValuesFieldExistsQuery to speed up the query if terms or
  points are indexed.
  (Quentin Pradet, Adrien Grand)

* LUCENE-10263: Added Weight#count to NormsFieldExistsQuery to speed up the query if all
  documents have the field. (Alan Woodward)

* LUCENE-10248: Add SpanishPluralStemFilter, for precise stemming of Spanish plurals.
  For more information, see https://s.apache.org/spanishplural (Xavier Sanchez Loro)

* LUCENE-10243: StandardTokenizer, UAX29URLEmailTokenizer, and HTMLStripCharFilter have
  been upgraded to Unicode 12.1 (Robert Muir)

* LUCENE-10335: Add ModuleResourceLoader as complement to ClasspathResourceLoader.
  (Uwe Schindler)

* LUCENE-10245: MultiDoubleValues(Source) and MultiLongValues(Source) were added as multi-valued
  versions of DoubleValues(Source) and LongValues(Source) to the facets module. LongValueFacetCounts,
  LongRangeFacetCounts and DoubleRangeFacetCounts were augmented to support these new multi-valued
  abstractions. DoubleRange and LongRange also support creating queries from these multi-valued
  sources. (Greg Miller)

* LUCENE-10250: Add support for arbitrary length hierarchical SSDV facets. (Marc D'mello)

* LUCENE-10395: Add support for TotalHitCountCollectorManager, a collector manager
  based on TotalHitCountCollector that allows users to parallelize counting the
  number of hits. (Luca Cavanna, Adrien Grand)

* LUCENE-10403: Add ArrayUtil#grow(T[]). (Greg Miller)

* LUCENE-10414: Add fn:fuzzyTerm interval function to flexible query parser (Dawid Weiss,
  Alan Woodward)

* LUCENE-10378: Implement Weight#count for PointRangeQuery to provide a faster way to calculate
  the number of matching range docs when each doc has at most one point and the points are 1-dimensional.
  (Gautam Worah, Ignacio Vera, Adrien Grand)

* LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count. (Ignacio Vera)

* LUCENE-10382: Add support for filtering in KnnVectorQuery. This allows for finding the
  nearest k documents that also match a query. (Julie Tibshirani, Joel Bernstein)

* LUCENE-10237: Add MergeOnFlushMergePolicy to sandbox.
  (Michael Froh, Anand Kotriwal)

Improvements
---------------------

* LUCENE-10313: Use java.util.logging in Luke. Add dynamic log filtering. Drop
  the persistent log previously written to ~/.luke.d/luke.log. Configure Java's default
  logging handlers to persist Luke logs according to your needs. (Tomoko Uchida, Dawid Weiss)

* LUCENE-10238: Upgrade icu4j dependency to 70.1. (Dawid Weiss)

* LUCENE-9820: Extract BKD tree interface and move intersecting logic to the
  PointValues abstract class. (Ignacio Vera, Adrien Grand)

* LUCENE-10262: Lift restrictions on navigating PointValues#PointTree that were
  added in LUCENE-9820 (Ignacio Vera)

* LUCENE-9538: Detect polygon self-intersections in the Tessellator. (Ignacio Vera)

* LUCENE-10275: Speed up MultiRangeQuery by using an interval tree. (Ignacio Vera)

* LUCENE-10229: Unify behaviour of match offsets for interval queries on fields
  with or without offsets enabled. (Patrick Zhai)

* LUCENE-10054: Make HnswGraph hierarchical (Mayya Sharipova, Julie Tibshirani, Mike Sokolov,
  Adrien Grand)

* LUCENE-10371: Make IndexRearranger able to arrange segments in a determined order.
  (Patrick Zhai)

Optimizations
---------------------

* LUCENE-10329: Use computed block mask for DirectMonotonicReader#get. (Guo Feng)

* LUCENE-10280: Optimize BKD leaves' doc IDs codec when they are continuous. (Guo Feng)

* LUCENE-10233: Store BKD leaves' doc IDs as a bitset in some cases (typically for low cardinality fields
  or sorted indices) to speed up addAll. (Guo Feng, Adrien Grand)

* LUCENE-10225: Improve IntroSelector with 3-way partitioning. (Bruno Roustant, Adrien Grand)

* LUCENE-10321: Tweak MultiRangeQuery interval tree creation to skip "pulling up" mins. (Greg Miller)

* LUCENE-10252: ValueSource.asDoubleValues and asLongValues should not compute the score unless
  asked to -- typically never.
  This fixes a performance regression since 7.3 (LUCENE-8099) when some
  older boosting queries were replaced with this. (David Smiley)

* LUCENE-10346: Optimize facet counting for single-valued TaxonomyFacetCounts. (Guo Feng)

* LUCENE-10356: Further optimize facet counting for single-valued TaxonomyFacetCounts. (Greg Miller)

* LUCENE-10379: Count directly into the dense values array in FastTaxonomyFacetCounts#countAll.
  (Guo Feng, Greg Miller)

* LUCENE-10375: Speed up HNSW vectors merge by first writing combined vector
  data to a file. (Julie Tibshirani, Adrien Grand)

* LUCENE-10388: Remove MultiLevelSkipListReader#SkipBuffer to make the JVM less confused. (Guo Feng)

* LUCENE-10367: Optimize CoveringQuery for the case when the minimum number of
  matching clauses is a constant. (LuYunCheng via Adrien Grand)

* LUCENE-10412: More `Query#rewrite` optimizations for MatchNoDocsQuery.
  (Adrien Grand)

* LUCENE-10408: Better encoding of doc IDs in vectors. (Mayya Sharipova, Julie Tibshirani, Adrien Grand)

* LUCENE-10424, LUCENE-10439: Optimize the "everything matches" case for count query in PointRangeQuery. (Ignacio Vera, Lu Xugang)

* LUCENE-10084, LUCENE-10435: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery whenever
  terms or points have a docCount that is equal to maxDoc. (Vigya Sharma, Lu Xugang)

* LUCENE-10442: When indexQuery and/or dvQuery is a MatchAllDocsQuery,
  IndexOrDocValuesQuery should be rewritten to MatchAllDocsQuery. (Lu Xugang)

* LUCENE-10450: IndexSortSortedNumericDocValuesRangeQuery can be rewritten to MatchAllDocsQuery. (Lu Xugang)

* LUCENE-10453: Indexing and search speedup with KNN vectors when using
  Euclidean distance. (Adrien Grand)

* LUCENE-10455: IndexSortSortedNumericDocValuesRangeQuery now implements the scorerSupplier API.
  (Lu Xugang)

Changes in runtime behavior
---------------------

* LUCENE-10291: Lucene now only writes files for terms and postings if at least
  one field is indexed with postings. (Yannick Welsch)

* LUCENE-10311: FixedBitSet#approximateCardinality now trades accuracy for
  speed instead of delegating to FixedBitSet#cardinality.
  (Robert Muir, Adrien Grand)

Bug Fixes
---------------------

* LUCENE-10316: Fix TestLRUQueryCache.testCachingAccountableQuery failure. (Patrick Zhai)

* LUCENE-10279: Fix equals in MultiRangeQuery. (Ignacio Vera)

* LUCENE-10349: Fix all analyzers to behave according to their documentation:
  getDefaultStopSet() methods now return unmodifiable CharArraySets. (Uwe Schindler)

* LUCENE-10352: Add missing service provider entries: KoreanNumberFilterFactory,
  DaitchMokotoffSoundexFilterFactory (Uwe Schindler, Robert Muir)

* LUCENE-10352: Fixed ctor argument checks: JapaneseKatakanaStemFilter,
  DoubleMetaphoneFilter (Uwe Schindler, Robert Muir)

* LUCENE-10236: Stop duplicating norms when scoring in CombinedFieldQuery.
  (Zach Chen, Jim Ferenczi, Julie Tibshirani)

* LUCENE-10353: Add random null injection to TestRandomChains. (Robert Muir,
  Uwe Schindler)

* LUCENE-10377: CheckIndex could incorrectly throw an error when checking index sorts
  defined on older indexes. (Alan Woodward)

* LUCENE-9952: Address inaccurate dim counts for SSDV faceting in cases where a dim is configured
  as multi-valued. (Greg Miller)

* LUCENE-10401: Fix lookups on empty doc-value terms dictionaries to no longer
  throw an ArrayIndexOutOfBoundsException. (Adrien Grand)

* LUCENE-10402: Prefix intervals should declare their automaton as binary, otherwise prefixes
  containing multibyte characters will not correctly match.
  (Alan Woodward)

* LUCENE-10407: Containing intervals could sometimes yield incorrect matches when wrapped
  in a disjunction. (Alan Woodward, Dawid Weiss)

* LUCENE-10405: When using the MemoryIndex, binary and sorted doc values are stored
  as BytesRef instead of BytesRefHash so they don't have a limit on size. (Ignacio Vera)

* LUCENE-10428: Queries with a misbehaving score function may no longer cause
  infinite loops in their parent BooleanQuery.
  (Ankit Jain, Daniel Doubrovkine, Adrien Grand)

* LUCENE-10431: MultiTermQuery no longer includes its rewrite method in its hashcode
  calculation, as this could cause problems with wrapper queries like BooleanQuery which
  expect their child queries' hashcodes to be stable. (Alan Woodward)

* LUCENE-10469: Fix ScoreMode propagation by ConstantScoreQuery. (Adrien Grand)

Other
---------------------

* LUCENE-10273: Deprecate SpanishMinimalStemFilter in favor of SpanishPluralStemFilter. (Robert Muir)

* LUCENE-10284: Upgrade morfologik-stemming to 2.1.8. (Dawid Weiss)

* LUCENE-10310: TestXYDocValuesQueries#doRandomDistanceTest no longer produces random circles
  with a radius of 0.

* LUCENE-10352: Removed duplicate instances of StringMockResourceLoader and migrated class to
  test-framework. (Uwe Schindler, Robert Muir)

* LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test
  and discover classes to check from module system. The test now checks all analyzer modules,
  so it may discover new bugs outside of analysis:common module. (Uwe Schindler, Robert Muir)

* LUCENE-10413: Make Ukrainian default stop words list available as a public getter. (Alan Woodward)

* LUCENE-10437: Polygon tessellator throws a more informative error message when the provided polygon
  does not contain enough non-collinear points.
  (Ignacio Vera)

======================= Lucene 9.0.0 =======================

New Features
---------------------

* LUCENE-9322, LUCENE-9855: Vector-valued fields, Lucene90 Codec (Mike Sokolov, Julie Tibshirani, Tomoko Uchida)

* LUCENE-9004, LUCENE-10040: Approximate nearest vector search via NSW graphs (Mike Sokolov, Tomoko Uchida et al.)

* LUCENE-9659: SpanPayloadCheckQuery now supports inequalities. (Kevin Watters, Gus Heck)

* LUCENE-9589: Swedish Minimal Stemmer (janhoy)

* LUCENE-9313: Add SerbianAnalyzer based on the snowball stemmer. (Dragan Ivanovic)

* LUCENE-10095: Add NepaliAnalyzer based on the snowball stemmer. (Robert Muir)

* LUCENE-10096: Add TamilAnalyzer based on the snowball stemmer. (Robert Muir)

* LUCENE-10102: Add JapaneseCompletionFilter for Input Method-aware auto-completion (Tomoko Uchida, Robert Muir, Jun Ohtani)

System Requirements
---------------------

* LUCENE-8738: Move to Java 11 as minimum Java version.
  (Adrien Grand, Uwe Schindler)

API Changes
---------------------

* LUCENE-8638: Remove many deprecated methods and classes including FST.lookupByOutput(),
  LegacyBM25Similarity and Jaspell suggester.

* LUCENE-8982: Separate out native code to another module to allow cpp
  build with gradle. This also changes the name of the native "posix-support"
  library to LuceneNativeIO. (Zachary Chen, Dawid Weiss)

* LUCENE-9562: All binary analysis packages (and corresponding
  Maven artifacts) with names containing '-analyzers-' have been renamed
  to '-analysis-'. (Dawid Weiss)

* LUCENE-8474: RAMDirectory and associated deprecated classes have been
  removed.
  (Dawid Weiss)

* LUCENE-3041: The deprecated Weight#extractTerms() method has been
  removed (Alan Woodward, Simon Willnauer, David Smiley, Luca Cavanna)

* LUCENE-8805: StoredFieldVisitor#stringField now takes a String rather than a
  byte[] that stores the UTF-8 bytes of the stored string.
  (Namgyu Kim via Adrien Grand)

* LUCENE-8811: BooleanQuery#setMaxClauseCount() and #getMaxClauseCount() have
  moved to IndexSearcher. The checks are now implemented using a QueryVisitor
  and apply to all queries, rather than only booleans. (Atri Sharma, Adrien
  Grand, Alan Woodward)

* LUCENE-8909: The deprecated IndexWriter#getFieldNames() method has been removed.
  (Adrien Grand, Munendra S N)

* LUCENE-8948: Change "name" argument in ICU factories to "form". Here, "form" is
  named after "Unicode Normalization Form". (Tomoko Uchida)

* LUCENE-8933: Validate JapaneseTokenizer user dictionary entry. (Tomoko Uchida)

* LUCENE-8905: Better defence against malformed arguments in TopDocsCollector
  (Atri Sharma)

* LUCENE-9089: FST Builder renamed FSTCompiler with fluent-style Builder.
  (Bruno Roustant)

* LUCENE-9212: Deprecated Intervals.multiterm() methods that take a bare Automaton
  have been removed (Alan Woodward)

* LUCENE-9264: SimpleFSDirectory has been removed in favor of NIOFSDirectory.
  (Yannick Welsch)

* LUCENE-9281: Use java.util.ServiceLoader to load codec components and analysis
  factories to be compatible with the Java Module System. This allows loading factories
  without META-INF/service from a Java module exposing the factory in the module
  descriptor. This breaks backwards compatibility as custom analysis factories
  must now also implement the default constructor (see MIGRATE.md).
  (Uwe Schindler, Dawid Weiss)

* LUCENE-9307: BufferedIndexInput#setBufferSize has been removed.
  (Adrien Grand)

* LUCENE-9340: SimpleBindings#add(SortField) has been removed. (Alan Woodward)

* LUCENE-9462: Fields without positions should still return MatchIterator.
  (Alan Woodward, Dawid Weiss)

* LUCENE-9516: Removed the ability to replace the IndexingChain / DocConsumer
  in Lucene's IndexWriter. The interface is not sufficient to efficiently
  replace the functionality with reasonable efforts. (Simon Willnauer)

* LUCENE-9317, LUCENE-9318, LUCENE-9319, LUCENE-9558, LUCENE-9600: Clean up package name conflicts
  between modules. See MIGRATE.md for details. (David Ryan, Tomoko Uchida, Uwe Schindler, Dawid Weiss)

* LUCENE-9646: Set BM25Similarity discountOverlaps via the constructor (Patrick Marty via Bruno Roustant)

* LUCENE-9480: Make DataInput's skipBytes(long) abstract as the implementation was not performant.
  IndexInput's API is unaffected: skipBytes() is implemented via seek(). (Greg Miller)

* LUCENE-9796: SortedDocValues no longer extends BinaryDocValues, as binaryValue() was not performant.
  See MIGRATE.md for details. (Robert Muir)

* LUCENE-9853: JapaneseAnalyzer should use CJKWidthCharFilter for full-width and half-width character normalization.
  (Tomoko Uchida)

* LUCENE-9387: Removed CodecReader#ramBytesUsed. (Adrien Grand)

* LUCENE-9334: Require consistency between data-structures on a per-field basis.
  A field across all documents within an index must be indexed with the same index
  options and data-structures. As a consequence of this, doc values updates are
  only applicable for fields that are indexed with doc values only. (Mayya Sharipova,
  Adrien Grand, Simon Willnauer)

* LUCENE-9047: Directory API is now little endian. (Ignacio Vera, Adrien Grand)

* LUCENE-9948: No longer require the user to specify whether or not a field is multi-valued in
  LongValueFacetCounts (detect automatically based on what is indexed).
  (Greg Miller)

* LUCENE-9843: Remove the compression option on the default codec's doc values. (Jack Conradson)

* LUCENE-9204: SpanQuery and its subclasses have been moved from core/ into the
  queries/ module. (Alan Woodward)

* LUCENE-9454: Analyzer no longer has a mutable version field. (Alan Woodward)

* LUCENE-9956: Expose the getBaseQuery and getDrillDownQueries APIs from DrillDownQuery (Gautam Worah)

* LUCENE-8143: SpanBoostQuery has been removed. (Alan Woodward)

* LUCENE-9998: Remove the unused parameter fis in StoredFieldsWriter.finish() and TermVectorsWriter.finish(),
  including their subclasses. (kkewwei)

* LUCENE-7020: TieredMergePolicy#setMaxMergeAtOnceExplicit has been removed.
  TieredMergePolicy no longer sets a limit on the maximum number of segments
  that can be merged at once via a forced merge. (Adrien Grand, Shawn Heisey)

* LUCENE-10027: The Directory reader open API taking an indexCommit and leafSorter has been modified
  to add an extra parameter, minSupportedMajorVersion. (Mayya Sharipova)

* LUCENE-9620: Added a (sometimes) faster implementation for IndexSearcher#count that relies on the new Weight#count API.
  The Weight#count API represents a cleaner way for Query classes to optimize their counting method.
  (Gautam Worah, Adrien Grand)

* LUCENE-10089: Add a method to SortField that allows enabling or disabling the numeric sort
  optimization that uses the points index to skip over non-competitive documents,
  which is enabled by default since 9.0 (Mayya Sharipova, Adrien Grand)

* LUCENE-10115: Add an extension point, BaseQueryParser#getFuzzyDistance, to allow custom
  query parsers to determine the similarity distance for fuzzy queries.
  (Chris Hegarty)

* LUCENE-10132: Support addition of diagnostics by custom merge policies (Chris Hegarty)

* LUCENE-9325: Sort is now final, and the `setSort()` method has been removed (Alan Woodward)

* LUCENE-9431: The UnifiedHighlighter's WEIGHT_MATCHES flag is now set by default, provided its
  requirements are met. It can be disabled by overriding getFlags (Animesh Pandey, David Smiley)

* LUCENE-10158: Add a new interface Unwrappable to the utils package to allow code to
  unwrap wrappers/delegators that are added by Lucene's testing framework. This will allow
  testing the new MMapDirectory implementation based on JDK Project Panama. (Uwe Schindler)

* LUCENE-10260: The LucenePackage class has been removed. The implementation string can be
  retrieved from Version.getPackageImplementationVersion(). (Uwe Schindler, Dawid Weiss)

Improvements
---------------------

* LUCENE-10234: Added Automatic-Module-Name to all JARs. This is the first step to enable full Java
  module system (JMS) support in later Lucene versions. At the moment, the automatic names should
  not be considered stable. (Dawid Weiss, Uwe Schindler)

* LUCENE-10182: TestRamUsageEstimator used RamUsageTester.sizeOf throughout, making some of the
  tests trivial. Now, it compares results from RamUsageEstimator with those from RamUsageTester.
  To prevent this error in the future, RamUsageTester.sizeOf was renamed to ramUsed.
  (Uwe Schindler, Dawid Weiss, Stefan Vodita)

* LUCENE-10129: RamUsageEstimator overloads the shallowSizeOf method for primitive arrays
  to avoid falling back on shallowSizeOf(Object), which could lead to performance traps.
  (Robert Muir, Uwe Schindler, Stefan Vodita)

* LUCENE-10139: ExternalRefSorter returns a covariant subtype of BytesRefIterator
  that is Closeable. (Dawid Weiss)

* LUCENE-10135: Correct passage selector behavior for long matching snippets (Dawid Weiss)

* LUCENE-9960: Avoid unnecessary top element replacement for equal elements in PriorityQueue. (Dawid Weiss)

* LUCENE-9633: Improve match highlighter behavior for degenerate intervals (on non-existing positions).
  (Dawid Weiss)

* LUCENE-9618: Do not call IntervalIterator.nextInterval after NO_MORE_DOCS is returned. (Patrick Zhai)

* LUCENE-9576: Improve ConcurrentMergeScheduler default settings, assuming modern I/O.
  Previously Lucene was too conservative, jumping through hoops to detect if disks were SSD-backed.
  In many common modern cases (VMs, RAID arrays, containers, encrypted mounts, non-Linux OS),
  the pessimistic heuristics were wrong, resulting in slower indexing performance. The heuristics were
  also complex and would trigger JDK issues even on unrelated mount points. Merge scheduler defaults
  are now modernized and the heuristics removed. Users with spinning disks who want to maximize I/O
  performance should tweak ConcurrentMergeScheduler. (Robert Muir)

* LUCENE-9463: Add a query match region retrieval component, plus passage scoring and formatting,
  for building custom highlighters. (Alan Woodward, Dawid Weiss)

* LUCENE-9370: RegExp query is no longer lenient about inappropriate backslashes and
  follows the Java Pattern policy for rejecting illegal syntax. (Mark Harwood)

* LUCENE-9336: RegExp query now supports \w \W \d \D \s \S expressions.
  This is a break with previous behaviour where these were (mis)interpreted
  as the literal characters w, W, d, etc. (Mark Harwood)

* LUCENE-8757: When provided with an ExecutorService to run queries across
  multiple threads, IndexSearcher now groups small segments together, up to
  250k docs per slice. (Atri Sharma via Adrien Grand)

* LUCENE-8857: Introduce custom tiebreakers in TopDocs.merge for breaking ties between
  docs with equal scores.
  Also, remove the ability of TopDocs.merge to set shard
  indices (Atri Sharma, Adrien Grand, Simon Willnauer)

* LUCENE-8958: Shared count early termination for relevance sorted indices (Atri Sharma)

* LUCENE-8937: Avoid aggressive stemming on numbers in the FrenchMinimalStemmer.
  (Adrien Gallou via Tomoko Uchida)

* LUCENE-8596: Kuromoji user dictionary now accepts entries containing a hash mark (#), which was
  previously treated as the beginning of a line-ending comment (Satoshi Kato and Masaru Hasegawa via
  Michael Sokolov)

* LUCENE-9109: Use StackWalker to implement TestSecurityManager's detection
  of JVM exit (Uwe Schindler)

* LUCENE-9110: Refactor stack analysis in tests to use generalized LuceneTestCase
  methods that use StackWalker (Uwe Schindler)

* LUCENE-9206: IndexMergeTool gets additional options to control the merging.
  This tool no longer forceMerge(1)s to a single segment by default. If you
  rely upon this behavior, pass -max-segments 1 instead. (Robert Muir)

* LUCENE-9220: Upgrade snowball to 2.0. New snowball stemmers: Hindi, Indonesian,
  Nepali, Serbian, and Tamil. New stoplist: Indonesian. Add a gradle 'snowball'
  task to regenerate and ease future upgrades. (Robert Muir, Dawid Weiss)

* LUCENE-9354: Improve the snowball French stopwords list so that it is less
  aggressive. (Philippe Ouellet)

* LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation (Atri Sharma, David Smiley)

* LUCENE-9074: Introduce Slice Executor For Dynamic Runtime Execution Of Slices (Atri Sharma)

* LUCENE-9280: Add the ability for field comparators to skip non-competitive documents.
  Creating a TopFieldCollector with totalHitsThreshold less than Integer.MAX_VALUE
  instructs Lucene to skip non-competitive documents whenever possible.
  For numeric sort fields the skipping functionality works when the same field is indexed
  both with doc values and points. In this case, it is assumed that the same data is
  stored in these points and doc values. (Mayya Sharipova, Jim Ferenczi, Adrien Grand)

* LUCENE-9449: Enhance DocComparator to provide an iterator over competitive
  documents when searching with "after". This iterator can quickly position
  on the desired "after" document, skipping all documents and segments before
  "after". Also redesign numeric comparators to provide skipping functionality
  by default. (Mayya Sharipova, Jim Ferenczi)

* LUCENE-9527: Upgrade javacc to 7.0.4, regenerate query parsers. (Dawid Weiss)

* LUCENE-9531: Consolidated CharStream and FastCharStream classes: these have been moved
  from each query parser package to org.apache.lucene.queryparser.charstream (Dawid Weiss)

* LUCENE-9450: Use BinaryDocValues for the taxonomy index instead of StoredFields.
  Add backwards compatibility tests for the taxonomy index. (Gautam Worah, Michael McCandless)

* LUCENE-9605: Update snowball to d8cf01ddf37a, which adds a Yiddish stemmer. (Robert Muir)

* LUCENE-8982: Make NativeUnixDirectory pure Java with the FileChannel direct IO flag,
  and rename it to DirectIODirectory (Zach Chen, Uwe Schindler, Mike McCandless, Dawid Weiss)

* LUCENE-9674: Implement faster advance on VectorValues using binary search.
  (Anand Kotriwal, Mike Sokolov)

* LUCENE-9794: Speed up implementations of DataInput.skipBytes(). (Greg Miller)

* LUCENE-9898: Remove the no longer used scorePayload method from BM25Similarity
  (Pieter van Boxtel)

* LUCENE-9850: Switch to PFOR encoding for doc IDs (instead of FOR). (Greg Miller)

* LUCENE-9929: Add NorwegianNormalizationFilter, which does the same as ScandinavianNormalizationFilter except
  it does not fold oo->ø and ao->å.
  (janhoy, Robert Muir, Adrien Grand)

* LUCENE-9535: Improve DocumentsWriterPerThreadPool to prefer larger instances.
  (Adrien Grand)

* LUCENE-10000: MultiCollectorManager now has parity with MultiCollector with respect to how it
  handles CollectionTerminatedException and setMinCompetitiveScore calls. (Greg Miller)

* LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)
  (Uwe Schindler)

* LUCENE-9662: Make CheckIndex concurrent by parallelizing index check across segments.
  (Zach Chen, Mike McCandless, Dawid Weiss, Robert Muir)

* LUCENE-9476: Add a new getBulkPath API to DirectoryTaxonomyReader to more efficiently retrieve FacetLabels for multiple
  facet ordinals at once. This API is 2-4% faster than iteratively calling getPath.
  The getPath API now throws an IllegalArgumentException instead of returning null if the ordinal is out of bounds.
  (Gautam Worah, Mike McCandless)

* LUCENE-10113: Use VarHandles to access int/long/short primitive types in byte arrays.
  This improves readability and performance of encoding/decoding primitives to the index
  file format in input/output classes like DataInput / DataOutput and codecs.
  (Uwe Schindler, Robert Muir)

* LUCENE-10112: Improve LZ4 compression performance with direct primitive reads/writes.
  (Tim Brooks, Uwe Schindler, Robert Muir, Adrien Grand)

* LUCENE-10125: Optimize primitive writes in OutputStreamIndexOutput.
  (Uwe Schindler, Robert Muir, Adrien Grand)

* LUCENE-10143: Delegate primitive writes in RateLimitedIndexOutput.
  (Uwe Schindler, Robert Muir, Adrien Grand)

* LUCENE-10145, LUCENE-10153: Faster flushes and merges of points by leveraging
  VarHandles. (Adrien Grand)

* LUCENE-10201: Spatial-Extras: Upgrade Spatial4j to 0.8, improving a variety of minor things.
  See release notes.
  https://github.com/locationtech/spatial4j/releases/tag/spatial4j-0.8
  (David Smiley)

* LUCENE-10062: Switch taxonomy faceting to use numeric doc values for storing ordinals instead of binary doc values
  with its own custom encoding. (Greg Miller)

Bug Fixes
---------------------

* LUCENE-9686: Fix read past EOF handling in DirectIODirectory. (Zach Chen,
  Julie Tibshirani)

* LUCENE-8663: NRTCachingDirectory.slowFileExists may open a file while
  it's inaccessible. (Dawid Weiss)

* LUCENE-9117: RamUsageEstimator hangs with AOT compilation. Removed any attempt to
  estimate Long.valueOf cache size. (Cleber Muramoto, Dawid Weiss)

* LUCENE-9290: Don't assume that different XYPoints have different hash codes
  (Ignacio Vera via Mike Drob)

* LUCENE-9372: Fix paths for cygwin/msys before gradle wrapper jar lookup.
  (Peter Barna)

* LUCENE-9365: FuzzyQuery was missing matches when prefix length was equal to the term length
  (Mark Harwood, Mike Drob)

* LUCENE-9580: Fix bug in the polygon tessellator when introducing collinear edges during polygon
  splitting. (Ignacio Vera)

* LUCENE-9930: The Ukrainian analyzer was reloading its dictionary for every new
  TokenStreamComponents, which could lead to memory leaks. (Alan Woodward)

* LUCENE-9940: The order of disjuncts in DisjunctionMaxQuery does not matter
  for equality checks (Alan Woodward)

* LUCENE-9971: Requesting facet counts for unseen dimensions in SortedSetDocValuesFacetCounts and
  ConcurrentSortedSetDocValuesFacetCounts now returns null / -1 instead of throwing
  IllegalArgumentException as per Javadoc spec in Facets. (Alexander Lukyanchikov)

* LUCENE-9823: Prevent unsafe rewrites for SynonymQuery and CombinedFieldQuery. Previously, rewriting
  could slightly change the scoring when weights were specified.
  (Naoto Minami via Julie Tibshirani)

* LUCENE-10047: Fix a value de-duping bug in LongValueFacetCounts and RangeFacetCounts
  (Greg Miller)

* LUCENE-10101, LUCENE-9281: Use getField() instead of getDeclaredField() to
  minimize the security impact of analysis SPI discovery. (Uwe Schindler)

* LUCENE-10114: Remove unused byte order mark in Lucene90PostingsWriter. This
  was initially introduced by accident in Lucene 8.4. (Uwe Schindler)

* LUCENE-10140: Fix cases where minimizing interval iterators could return
  incorrect matches (Nikolay Khitrin, Alan Woodward)

Changes in Backwards Compatibility Policy

* LUCENE-9904: Regenerated UAX29URLEmailTokenizer and the corresponding analyzer with up-to-date top
  level domains. This may change the token sequence compared to previous Lucene versions. (Dawid Weiss)

* LUCENE-9669: DirectoryReader#open now accepts an argument to open indices created with versions
  older than N-1. Lucene can now open indices created with a major version of N-2 in read-only mode.
  Opening an index created with a major version of N-2 with an IndexWriter is not supported.
  Furthermore, Lucene only supports file-format compatibility, which enables reading old indices;
  semantic changes such as analysis or certain encodings on top of the file format are only supported
  on a best-effort basis. (Simon Willnauer)

* LUCENE-10232: Fix MultiRangeQuery to confirm all dimensions for a given range match. (Greg Miller)

Build
---------------------

* LUCENE-9077, LUCENE-9433: Support Gradle build, remove Ant support from trunk (Dawid Weiss, Erick Erickson, Uwe Schindler et al.)

* LUCENE-8768: Fix Javadocs build in Java 11. (Namgyu Kim)

* LUCENE-9544: Add a regenerate gradle script for the Nori dictionary (Namgyu Kim)

* LUCENE-10195: Add gradle cache option and make some tasks cacheable.
  (Jerome Prinet, Dawid Weiss)

* LUCENE-10198: Allow external JAVA_OPTS in gradlew scripts; use sane defaults
  (balmukund.mandal@intel.com, Dawid Weiss)

* LUCENE-10163: Move LICENSE and NOTICE files to the top level to satisfy src artifact requirements (janhoy)

Other
---------------------

* LUCENE-10122: Use NumericDocValues to store taxonomy parent array (Patrick Zhai)

* LUCENE-10136: Allow 'var' declarations in source code (Dawid Weiss)

* LUCENE-9570, LUCENE-9564: Apply google-java-format and enforce it on source Java files.
  Review diffs and correct automatic formatting oddities. (Erick Erickson,
  Bruno Roustant, Dawid Weiss)

* LUCENE-9631: Properly override slice() on subclasses of OffsetRange. (Dawid Weiss)

* LUCENE-9391: Upgrade HPPC to 0.8.2. (Patrick Zhai)

* LUCENE-10021: Upgrade HPPC to 0.9.0. Replace usages of ...ScatterMap with ...HashMap. (Patrick Zhai)

* LUCENE-9092: Upgrade randomizedtesting to 2.7.5 (Dawid Weiss)

* LUCENE-8656: Deprecations in FuzzyQuery and get compiler warnings out of
  queryparser code (Alan Woodward, Erick Erickson)

* LUCENE-9344: Convert .txt files to properly formatted .md files. (Tomoko Uchida, Uwe Schindler)

* LUCENE-9267: Update MatchingQueries documentation to correct the
  time unit. (Pierre-Luc Perron via Mike Drob)

* LUCENE-9411: Fail compilation on warnings, 9x gradle-only (Erick Erickson, Dawid Weiss)
  Deserves mention here as well as Lucene CHANGES.txt since it affects both.

* LUCENE-9215: Replace checkJavaDocs.py with a doclet (Robert Muir, Dawid Weiss, Uwe Schindler)

* LUCENE-9497: Integrate Error Prone, a static analysis tool, during compilation (Dawid Weiss, Varun Thacker)

* LUCENE-9627: Remove the unused Lucene50FieldInfosFormat codec and do a small refactor of some codecs
  to separate reading the header/footer from reading the content of the file.
  (Ignacio Vera)

* LUCENE-9773: Upgrade ICU to 68.2 (Robert Muir)

* LUCENE-9822: Add assertion to PFOR exception encoding, documenting the BLOCK_SIZE assumption. (Greg Miller)

* LUCENE-9883: Turn on ecj missingEnumCaseDespiteDefault setting. (Zach Chen)

* LUCENE-9705: Make new versions of all index formats for the Lucene90 codec and move
  the existing ones to the backwards codecs. (Julie Tibshirani, Ignacio Vera)

* LUCENE-9907: Remove dependency on PackedInts#getReader() from the current codecs and move the
  method to the backwards codecs. (Ignacio Vera)

* LUCENE-10024: Catch NoSuchFileException when opening index directory with Luke.
  (Michael Wechner, Tomoko Uchida)

======================= Lucene 8.11.1 =======================

Bug Fixes
---------------------
* SOLR-15843: Update Log4J to 2.16 (Mike Drob, janhoy)

======================= Lucene 8.11.0 =======================

API Changes
---------------------
(No changes)

New Features
---------------------
(No changes)

Improvements
---------------------

* LUCENE-9662: Make CheckIndex concurrent by parallelizing index check across segments.
  (Zach Chen, Mike McCandless, Dawid Weiss, Robert Muir)

* LUCENE-10103: Make QueryCache respect Accountable queries. (Patrick Zhai)

Optimizations
---------------------

* LUCENE-9673: Substantially improve RAM efficiency of how MemoryIndex stores
  postings in memory, and reduce a bit of RAM overhead in
  IndexWriter's internal postings book-keeping (mashudong)

* LUCENE-10196: Improve IntroSorter with 3-way partitioning. (Bruno Roustant)

Bug Fixes
---------------------

* LUCENE-10111: NormValuesWriter did not account for the bytes used by DocsWithFieldSet.
  (Lu Xugang)

* LUCENE-10116: SortedSetDocValuesWriter did not account for the bytes used by DocsWithFieldSet and currentValues.
  (Lu Xugang)

* LUCENE-10070: Skip deleted docs when accumulating facet counts for all docs. (Ankur Goel, Greg Miller)

* LUCENE-10134: ConcurrentSortedSetDocValuesFacetCounts shouldn't share liveDocs Bits across threads.
  (Ankur Goel)

* LUCENE-10154: NumericLeafComparator to define getPointValues. (Mayya Sharipova, Adrien Grand)

* LUCENE-10208: Ensure that the minimum competitive score does not decrease in concurrent search. (Jim Ferenczi, Adrien Grand)

Build
---------------------

* LUCENE-10104, SOLR-15631: Upgrade forbiddenapis to version 3.2. (Uwe Schindler)

Other
---------------------

* LUCENE-10098: Add docs/links to GermanAnalyzer describing how to decompound nouns. (Robert Muir)

======================= Lucene 8.10.1 =======================

Bug Fixes
---------------------

* LUCENE-10110: MultiCollector now handles a single leaf collector that wants to skip low-scoring hits
  when the combined score mode doesn't allow it. (Jim Ferenczi)

* LUCENE-10119: Sort optimization with search_after can wrongly skip documents
  whose values are equal to the last value of the previous page (Nhat Nguyen)

* LUCENE-10126: Sort optimization with a chunked bulk scorer
  can wrongly skip documents (Nhat Nguyen, Mayya Sharipova)

======================= Lucene 8.10.0 =======================

API Changes
---------------------
* LUCENE-9962: DrillSideways allows sub-classes to provide "drill down" FacetsCollectors. They
  may provide a null collector if they choose to bypass "drill down" facet collection. (Greg Miller)

* LUCENE-9902: Change the getValue method from IntTaxonomyFacets to be protected instead of private.
  Users can now access the count of an ordinal directly without constructing an extra FacetLabel.
  Also use variable-length arguments for the getOrdinal call in TaxonomyReader. (Gautam Worah)

* LUCENE-10036: Replaced the ScoreCachingWrappingScorer ctor with a static factory method that
  ensures unnecessary wrapping doesn't occur. (Greg Miller)

* LUCENE-10027: Add a new Directory reader open API that takes an indexCommit and
  a custom comparator for sorting leaf readers. (Mayya Sharipova)

* LUCENE-7020: TieredMergePolicy#setMaxMergeAtOnceExplicit is deprecated
  and the number of segments that get merged via explicit merges is unlimited
  by default. (Adrien Grand, Shawn Heisey)

New Features
---------------------
* LUCENE-10083: Analyzer and stemmer for the Telugu language (Vinod Singh)

* LUCENE-10035: The SimpleText codec now writes skip lists.
  (wuda via Adrien Grand)

Improvements
---------------------
* LUCENE-9944: Allow DrillSideways users to provide their own CollectorManager without also requiring
  them to provide an ExecutorService. (Greg Miller)

* LUCENE-9946: Support for multi-value fields in LongRangeFacetCounts and
  DoubleRangeFacetCounts. (Greg Miller)

* LUCENE-9965: Added QueryProfilerIndexSearcher and ProfilerCollector to support debugging
  query execution strategy and timing. (Jack Conradson, Julie Tibshirani)

* LUCENE-9981: Operations.getCommonSuffix/Prefix(Automaton) is now much more
  efficient, from a worst-case exponential down to quadratic cost in the
  number of states + transitions in the Automaton.
  These methods no longer use the costly determinize method, removing the risk of
  TooComplexToDeterminizeException. (Robert Muir, Mike McCandless)

* LUCENE-9981: Operations.determinize now throws TooComplexToDeterminizeException
  based on too much "effort" spent determinizing rather than a precise state
  count on the resulting returned automaton, to better handle adversarial
  cases like det(rev(regexp("(.*a){2000}"))) that spend lots of effort but
  result in smallish eventual returned automata. (Robert Muir, Mike McCandless)

* LUCENE-9983: Stop sorting determinize powersets unnecessarily. (Patrick Zhai)

* LUCENE-9177: ICUNormalizer2CharFilter no longer requires normalization-inert
  characters as boundaries for incremental processing, vastly improving worst-case
  performance. (Michael Gibney)

* LUCENE-10030: Lazily evaluate score in DrillSidewaysScorer.doQueryFirstScoring
  (Grigoriy Troitskiy)

* LUCENE-9945: Extend DrillSideways to support exposing FacetCollectors directly.
  (Greg Miller, Sejal Pawar)

* LUCENE-10043: Decrease the default for LRUQueryCache's skipCacheFactor to 10.
  This prevents caching a query clause when it is much more expensive than
  running the top-level query. (Julie Tibshirani)

* LUCENE-5309: Optimize facet counting for single-valued SSDV / StringValueFacetCounts. (Greg Miller)

* LUCENE-9917: The BEST_SPEED compression mode now trades more compression ratio
  in exchange for faster reads. (Adrien Grand)

Optimizations
---------------------
* LUCENE-9996: Improved memory efficiency of IndexWriter's RAM buffer, in
  particular in the case of many fields and many indexing threads.
  (Adrien Grand)

* LUCENE-10022: Rewrite empty DisjunctionMaxQuery to MatchNoDocsQuery.
  (David Harsha via Julie Tibshirani)

* LUCENE-10031: Slightly faster segment merging for sorted indices.
  (Adrien Grand)

* LUCENE-10014: Lucene90DocValuesFormat was using too many bits per
  value when compressing via gcd, unnecessarily wasting index storage.
  (weizijun)

Bug Fixes
---------------------
* LUCENE-9988: Fix a DrillSideways correctness bug introduced in LUCENE-9944 (Greg Miller)

* LUCENE-9964: Duplicate long values in a document field should only be counted once when using SortedNumericDocValuesFields
  (Gautam Worah)

* LUCENE-9999: CombinedFieldQuery can fail with an exception when a document
  is missing some fields. (Jim Ferenczi, Julie Tibshirani)

* LUCENE-10020: DocComparator should not skip docs with the same docID on
  multiple sorts with search after (Mayya Sharipova, Julie Tibshirani)

* LUCENE-10026: Fix CombinedFieldQuery equals and hashCode, which ensures
  query rewrites don't drop CombinedFieldQuery clauses. (Julie Tibshirani)

* LUCENE-10039: Correct CombinedFieldQuery scoring when there is a single
  field. (Julie Tibshirani)

* LUCENE-10046: Counting bug fixed in StringValueFacetCounts. (Greg Miller)

* LUCENE-9963: FlattenGraphFilter is now more robust when handling
  incoming holes in the input token graph (Geoff Lawson)

* LUCENE-10008: Respect ignoreCase in CommonGramsFilterFactory (Vigya Sharma)

* LUCENE-10060: Ensure DrillSidewaysQuery instances never get cached. (Greg Miller, Zachary Chen)

* LUCENE-10081: KoreanTokenizer should check the max backtrace gap on whitespaces.
  (Jim Ferenczi)

* LUCENE-10106: Sort optimization can wrongly skip the first document of
  each segment (Nhat Nguyen)

Other
---------------------
(No changes)

======================= Lucene 8.9.0 =======================

API Changes
---------------------

* LUCENE-9680: IndexWriter#getFieldNames() method added to get the fields present in an index.
  This method was removed in LUCENE-8909. (Oren Ovadia)

New Features
---------------------
* LUCENE-9507: Custom order for leaves in IndexReader and IndexWriter
  (Mayya Sharipova, Mike McCandless, Jim Ferenczi)

* LUCENE-9575: PatternTypingFilter has been added to allow setting a type attribute on tokens based on
  a configured set of regular expressions. (Gus Heck)

* LUCENE-9572: TypeAsSynonymFilter has been enhanced to support ignoring some types, and to allow
  the generated synonyms to copy some or all flags from the original token. (Gus Heck)

* LUCENE-9574: A token filter to drop tokens that match all specified flags. (Gus Heck, Uwe Schindler)

* LUCENE-9537: Added a smoothingScore method and default implementation to the
  Scorable abstract class. The smoothing score allows scorers to calculate a
  score for a document where the search term or subquery is not present. The
  smoothing score acts like an idf so that documents that do not have terms or
  subqueries that are more frequent in the index are not penalized as much as
  documents that do not have less frequent terms or subqueries, and it prevents
  scores which are the product of terms or subqueries from going to zero. Added
  the implementation of the Indri AND and the IndriDirichletSimilarity from the
  academic Indri search engine: http://www.lemurproject.org/indri.php.
  (Cameron VandenBerg)

* LUCENE-9694: New tool for creating a deterministic index to enable benchmarking changes
  on a consistent multi-segment index even when they require re-indexing.
  (Patrick Zhai)

* LUCENE-9385: Add a FacetsConfig option to control which drill-down
  terms are indexed for a FacetLabel (Zachary Chen)

* LUCENE-9950: New facet counting implementation for general string doc value fields
  (SortedSetDocValues / SortedDocValues) not created through FacetsConfig (Greg Miller)

Improvements
---------------------

* LUCENE-9725: BM25FQuery was extended to handle similarities beyond BM25Similarity. It
  was renamed to CombinedFieldQuery to reflect its more general scope. (Julie Tibshirani)

* LUCENE-9663: Add compression to the terms dict of SortedSet/Sorted DocValues.
  (Jaison Bi via Bruno Roustant)

* LUCENE-9687: Hunspell support improvements: add an API for spell-checking and suggestions, support compound words,
  fix various behavior differences between the Java and C++ implementations, improve performance (Peter Gromov, Dawid Weiss)

* LUCENE-9877: Reduce index size by increasing allowable exceptions in PForUtil from 3 to 7. (Greg Miller)

* LUCENE-9935: Enable bulk merge for stored fields with index sort. (Robert Muir, Adrien Grand, Nhat Nguyen)

Optimizations
---------------------

* LUCENE-9932: Performance improvement for BKD index building (neoremind)

* LUCENE-9827: Speed up merging of stored fields and term vectors for smaller segments.
  (Daniel Mitterdorfer, Dimitrios Liapis, Adrien Grand, Robert Muir)

Bug Fixes
---------------------

* LUCENE-9791: BytesRefHash.equals/find is now thread safe, fixing a
  Luwak/Monitor bug causing registered queries to sometimes fail to
  match. (Paweł Bugalski)

* LUCENE-9887: Fixed parameter use in RadixSelector.
  (liupanfeng via Adrien Grand)

* LUCENE-9958: Fixed a performance regression for boolean queries that configure a
  minimum number of matching clauses.
  (Adrien Grand, Matt Weber)

* LUCENE-9953: LongValueFacetCounts should count each document at most once when determining
  the total count for a dimension. Prior to this fix, multi-value docs could contribute more
  than 1 to the dimension count. (Greg Miller)

* LUCENE-9967: Do not throw NullPointerException while trying to handle another exception in
  ReplicaNode.start (Steven Schlansker)

* LUCENE-9991: Fix an edge case failure in TestStringValueFacetCounts (Greg Miller)

Other
---------------------

* LUCENE-9836: Removed the pure Maven build. It is no longer possible to build
  artifacts using Maven (this feature was no longer working correctly). Due to the
  migration to Gradle for Lucene/Solr 9.0, the maintenance of the Maven build
  was no longer reasonable. POM files are generated for deployment to Maven
  Central only. Please use "ant generate-maven-artifacts" to produce and deploy
  artifacts to any repository. (Uwe Schindler, Dawid Weiss)

* LUCENE-9836: Migrate Maven tasks to use "maven-resolver-ant-tasks"
  instead of the no longer maintained "maven-ant-tasks". (Uwe Schindler)

* LUCENE-9985: Upgrade jetty to 9.4.41 (janhoy)

* LUCENE-9976: Fix WANDScorer assertion error. (Zach Chen, Adrien Grand, Dawid Weiss)

======================= Lucene 8.8.2 =======================

Bug Fixes
---------------------

* LUCENE-9870: Fix Circle2D intersectsLine t-value (distance) range clamp (Jørgen Nystad)

* LUCENE-9744: NPE on a degenerate query in
  MinimumShouldMatchIntervalsSource$MinimumMatchesIterator.getSubMatches(). (Alan Woodward)

* LUCENE-9762: DoubleValuesSource.fromQuery (also used by FunctionScoreQuery.boostByQuery) could
  throw an exception when the query implements TwoPhaseIterator and when the score is requested
  repeatedly.
  (David Smiley, hossman)

======================= Lucene 8.8.1 =======================

Bug Fixes
---------------------
(No changes)

======================= Lucene 8.8.0 =======================

New Features
---------------------

* LUCENE-9552: New LatLonPoint query that accepts an array of LatLonGeometries. (Ignacio Vera)

* LUCENE-9641: LatLonPoint query support for spatial relationships. (Ignacio Vera)

* LUCENE-9553: New XYPoint query that accepts an array of XYGeometries. (Ignacio Vera)

* LUCENE-9378: Doc values now allow configuring how to trade compression for
  retrieval speed. (Adrien Grand)

* LUCENE-9413: Add CJKWidthCharFilter and its factory (Tomoko Uchida)

Improvements
---------------------

* LUCENE-9455: ExitableTermsEnum should sample timeout and interruption
  check before calling next(). (Zach Chen via Bruno Roustant)

* LUCENE-9023: GlobalOrdinalsWithScore should not compute occurrences when the
  provided min is 1. (Jim Ferenczi)

* LUCENE-9675: Binary doc values fields now expose their configured compression mode
  in the attributes of the field info. (Jim Ferenczi)

Optimizations
---------------------

* LUCENE-9536: Reduced memory usage for OrdinalMap when a segment has all
  values. (Julie Tibshirani via Adrien Grand)

* LUCENE-9021: QueryParser: re-use the LookaheadSuccess exception. (Przemek Bruski via Mikhail Khludnev)

* LUCENE-9636: Faster decoding of postings for some numbers of bits per value.
  (Guo Feng via Adrien Grand)

* LUCENE-9346: WANDScorer now supports queries that have a
  `minimumNumberShouldMatch` configured.
  (Xi Zachary Chen via Adrien Grand)

Bug Fixes
---------------------

* LUCENE-9508: DocumentsWriter was only stalling threads for 1 second, allowing
  documents to be indexed even if the DocumentsWriter wasn't able to keep up with flushing.
  Unless IW can't make progress due to an ill-behaving DWPT, this issue was barely
  noticeable. (Simon Willnauer)

* LUCENE-9581: Japanese tokenizer should discard the compound token instead of disabling the decomposition
  of long tokens when discardCompoundToken is activated. (Jim Ferenczi)

* LUCENE-9595: Make Component2D#withinPoint implementations consistent with ShapeQuery logic.
  (Ignacio Vera)

* LUCENE-9606: Wrap boolean queries generated by shape fields with a constant score query. (Ignacio Vera)

* LUCENE-9635: BM25FQuery - Mask encoded norm long value in array lookup.
  (Yilun Cui)

* LUCENE-9617: Fix per-field memory leak in IndexWriter.deleteAll(). Reset next available internal
  field number to 0 on FieldInfos.clear(), to avoid wasting FieldInfo references. (Michael Froh)

* LUCENE-9642: When encoding triangles in ShapeField, make sure generated triangles are CCW by rotating
  triangle points before checking triangle orientation. (Ignacio Vera)

* LUCENE-9661: Fix deadlock in TermsEnum.EMPTY that occurs when trying to initialize TermsEnum and BaseTermsEnum
  at the same time (Namgyu Kim)

Other
---------------------

* SOLR-14995: Update Jetty to 9.4.34 (Mike Drob)

* LUCENE-9637: Remove some unused code and replace the Point implementation in ShapeField/ShapeQuery
  random tests.
  (Ignacio Vera)

======================= Lucene 8.7.0 =======================

API Changes
---------------------

* LUCENE-9437: Lucene's facet module's DocValuesOrdinalsReader.decode method
  is now public, making it easier for applications to decode facet
  ordinals into their corresponding labels (Ankur Goel)

* LUCENE-9515: IndexingChain now accepts individual primitives rather than a
  DocumentsWriterPerThread instance in order to create a new DocConsumer.
  (Simon Willnauer)

New Features
---------------------

* LUCENE-9386: RegExpQuery added a case-insensitive matching option. (Mark Harwood)

* LUCENE-8962: Add IndexWriter merge-on-refresh feature to selectively merge
  small segments on getReader, subject to a configurable timeout, to improve
  search performance by reducing the number of small segments for searching. (Simon Willnauer)

* LUCENE-9484: Allow sorting an index after it was created. With SortingCodecReader, existing
  unsorted segments can be wrapped and merged into a fresh index using the IndexWriter#addIndexes
  API. (Simon Willnauer, Adrien Grand)

* LUCENE-9444: Add a utility class to retrieve facet labels from the
  taxonomy index for a facet field so such fields do not also have to
  be redundantly stored (Ankur Goel)

Improvements
---------------------

* LUCENE-8574: Add a new ExpressionValueSource which will enforce only one value per name
  per hit in dependencies; ExpressionFunctionValues will no longer
  recompute already computed values (Patrick Zhai)

* LUCENE-9416: Fix CheckIndex to print an invalid non-zero norm as
  unsigned long when detecting corruption.

* LUCENE-9440: FieldInfo#checkConsistency was called twice from Lucene50(60)FieldInfosFormat#read;
  removed the (redundant?) assert and now do these checks for real.
  (Yauheni Putsykovich)

* LUCENE-9446: In BooleanQuery rewrite, always remove MatchAllDocsQuery filter clauses
  when possible. (Julie Tibshirani)

* LUCENE-9501: Improve coverage for Asserting* test classes: make sure to handle singleton doc
  values, and sometimes exercise Weight#scorer instead of Weight#bulkScorer for top-level
  queries. (Julie Tibshirani)

* LUCENE-9511: Include StoredFieldsWriter in DWPT accounting to ensure that its
  heap consumption is taken into account when IndexWriter stalls or should flush
  DWPTs. (Simon Willnauer)

* LUCENE-9514: Include TermVectorsWriter in DWPT accounting to ensure that its
  heap consumption is taken into account when IndexWriter stalls or should flush
  DWPTs. (Simon Willnauer)

* LUCENE-9523: In query shapes over shape fields, skip points while traversing the
  BKD tree when the relationship with the document is already known. (Ignacio Vera)

* LUCENE-9539: Use more compact data structures to represent sorted doc-values in memory when
  sorting a segment before flush and in SortingCodecReader. (Simon Willnauer)

* LUCENE-9458: WordDelimiterGraphFilter should order tokens at the same position by endOffset to
  emit longer tokens first. The same graph is produced. (David Smiley)

Optimizations
---------------------

* LUCENE-9395: ConstantValuesSource now shares a single DoubleValues
  instance across all segments (Tony Xu)

* LUCENE-9447, LUCENE-9486: Stored fields now get higher compression ratios on
  highly compressible data. (Adrien Grand)

* LUCENE-9373: FunctionMatchQuery now accepts a "matchCost" optimization hint.
  (Maxim Glazkov, David Smiley)

* LUCENE-9510: Indexing with an index sort is now faster by not compressing
  temporary representations of the data.
  (Adrien Grand)

Bug Fixes
---------------------

* LUCENE-9427: Fix a regression where the unified highlighter didn't produce
  highlights on fuzzy queries that correspond to exact matches. (Julie Tibshirani)

* LUCENE-9467: Fix NRTCachingDirectory to use Directory#fileLength to check if a file
  already exists instead of opening an IndexInput on the file, which might throw an AccessDeniedException
  in some Directory implementations. (Simon Willnauer)

* LUCENE-9501: Fix a bug in IndexSortSortedNumericDocValuesRangeQuery where it could violate the
  DocIdSetIterator contract. (Julie Tibshirani)

* LUCENE-9401: Include the field in ComplexPhraseQuery's toString() (Thomas Hecker via Munendra S N)

* LUCENE-9578: Fix TermRangeQuery when there is no upper bound and the lower
  bound is an excluded empty string. This would previously match no strings at
  all while it should match all non-empty strings.
  (Christoph Buescher via Adrien Grand)

* LUCENE-9524: Fix NPE in SpanWeight#explain when no scoring is required and
  SpanWeight has null Similarity.SimScorer. (Zach Chen)

Documentation
---------------------

* LUCENE-9424: Add a performance warning to AttributeSource.captureState javadocs (Patrick Zhai)

Changes in Runtime Behavior
---------------------

* LUCENE-9539: SortingCodecReader no longer caches doc values fields. Previously, SortingCodecReader
  used to cache all doc values fields after they were loaded into memory. This reader should only be used
  to sort segments after the fact using IndexWriter#addIndexes. (Simon Willnauer)

Other
---------------------

* LUCENE-9292: Refactor BKD point configuration into its own class. (Ignacio Vera)

* LUCENE-9470: Make TestXYMultiPolygonShapeQueries more resilient for CONTAINS queries.
  (Ignacio Vera)

* LUCENE-9512: Move LockFactory stress test to be a unit/integration
  test. (Uwe Schindler, Dawid Weiss, Robert Muir)

Build
---------------------

* Upgrade forbiddenapis to version 3.1. (Uwe Schindler)

======================= Lucene 8.6.3 =======================

Bug Fixes
---------------------
(No changes)

======================= Lucene 8.6.2 =======================

Bug Fixes
---------------------
* LUCENE-9478: Prevent DWPTDeleteQueue from referencing itself and leaking memory. The queue
  passed an implicit `this` reference to the next queue instance on flush, which leaked about 500 bytes
  of memory on each full flush, commit or getReader call. (Simon Willnauer)

======================= Lucene 8.6.1 =======================

Bug Fixes
---------------------
* LUCENE-9443: The UnifiedHighlighter was closing the underlying reader when there were multiple term-vector fields.
  This was a regression in 8.6.0. (David Smiley, Chris Beer)

======================= Lucene 8.6.0 =======================

API Changes
---------------------

* LUCENE-9265: SimpleFSDirectory is deprecated in favor of NIOFSDirectory. (Yannick Welsch)

* LUCENE-9304: Removed the ability to set DocumentsWriterPerThreadPool on IndexWriterConfig.
  The DocumentsWriterPerThreadPool is a package-private final class, which made it impossible
  to customize. (Simon Willnauer)

* LUCENE-9339: MergeScheduler#merge no longer accepts a parameter indicating whether a new merge was found.
  (Simon Willnauer)

* LUCENE-9330: SortFields are now responsible for writing themselves into index headers if they
  are used as index sorts. (Alan Woodward, Uwe Schindler, Adrien Grand)

* LUCENE-9340: Deprecate SimpleBindings#add(SortField). (Alan Woodward)

* LUCENE-9345: MergeScheduler is now decoupled from IndexWriter.
  Instead, it accepts a MergeSource interface that offers the basic methods to acquire pending
  merges, run the merge and do accounting around it. (Simon Willnauer)

* LUCENE-9349: QueryVisitor.consumeTermsMatching() now takes a
  Supplier<ByteRunAutomaton> to enable queries that build large automata to
  provide them lazily. TermInSetQuery now uses this method
  to report matching terms. (Alan Woodward)

* LUCENE-9366: DocValues.emptySortedNumeric() no longer takes a maxDoc parameter
  (Alan Woodward)

* LUCENE-7822: CodecUtil#checkFooter(IndexInput, Throwable) now throws a
  CorruptIndexException if checksums mismatch or if checksums can't be verified.
  (Martin Amirault, Adrien Grand)

New Features
---------------------

* LUCENE-7889: Grouping by range based on values from DoubleValuesSource and LongValuesSource
  (Alan Woodward)

* LUCENE-8962: Add IndexWriter merge-on-commit feature to selectively merge small segments on commit,
  subject to a configurable timeout, to improve search performance by reducing the number of small
  segments for searching (Michael Froh, Mike Sokolov, Mike McCandless, Simon Willnauer)

Improvements
---------------------
* LUCENE-9276: Use the same code path for updateDocuments and updateDocument in IndexWriter and
  DocumentsWriter. (Simon Willnauer)

* LUCENE-9279: Update dictionary version for Ukrainian analyzer to 4.9.1 (Andriy Rysin via Dawid Weiss)

* LUCENE-8050: PerFieldDocValuesFormat should not get the DocValuesFormat on a field that has no doc values.
  (David Smiley, Juan Rodriguez)

* LUCENE-9304: Removed the ThreadState abstraction from DocumentsWriter, which allows pooling DWPTs directly and
  improves the approachability of the IndexWriter code.
  (Simon Willnauer)

* LUCENE-9324: Add an ID to SegmentCommitInfo in order to compare commits for equality and make
  snapshots incremental on generational files. (Simon Willnauer, Mike McCandless, Adrien Grand)

* LUCENE-9342: TotalHits' relation will be EQUAL_TO when the number of hits is lower than TopDocsCollector's numHits
  (Tomás Fernández Löbbe)

* LUCENE-9353: Metadata of the terms dictionary has moved to its own file, with the
  `.tmd` extension. This allows checksums of metadata to be verified when
  opening indices and helps save seeks when opening an index. (Adrien Grand)

* LUCENE-9359: SegmentInfos#readCommit now always throws a
  CorruptIndexException if the content of the file is invalid. (Adrien Grand)

* LUCENE-9393: Make FunctionScoreQuery use ScoreMode.COMPLETE for creating the inner query weight when
  ScoreMode.TOP_DOCS is requested. (Tomás Fernández Löbbe)

* LUCENE-9392: Make FacetsConfig.DELIM_CHAR publicly accessible (Ankur Goel)

* LUCENE-9397: UniformSplit supports encodable fields metadata. (Bruno Roustant)

* LUCENE-9396: Improved truncation detection for points. (Adrien Grand, Robert Muir)

* LUCENE-9402: Let MultiCollector handle minCompetitiveScore (Tomás Fernández Löbbe, Adrien Grand)

Optimizations
---------------------

* LUCENE-9254: UniformSplit keeps FST off-heap. (Bruno Roustant)

* LUCENE-8103: DoubleValuesSource and QueryValueSource now use a TwoPhaseIterator if one is provided by the Query.
  (Michele Palmia, David Smiley)

* LUCENE-9287: UsageTrackingQueryCachingPolicy no longer caches DocValuesFieldExistsQuery. (Ignacio Vera)

* LUCENE-9286: FST.Arc.BitTable reads FST bytes directly. Arc is lightweight again and FSTEnum traversal is faster.
  (Bruno Roustant)

* LUCENE-7788: Fail precommit on unparameterised log messages and examine for wasted work/objects (Erick Erickson)

* LUCENE-9273: Speed up geometry queries by specialising Component2D spatial operations. Instead of using a generic
  relate method for all relations, we use specialized methods for each one. In addition, the type of triangle is
  computed at deserialization time, so we can be more selective when decoding points of a triangle.
  (Ignacio Vera)

* LUCENE-9087: Always build trees with full leaves and lower the default value for maxPointsPerLeafNode to 512.
  (Ignacio Vera)

* LUCENE-9148: Points now write their index in a separate file. (Adrien Grand)

Bug Fixes
---------------------
* LUCENE-9259: Fix wrong NGramFilterFactory argument name for preserveOriginal option (Paul Pazderski)

* LUCENE-8849: DocValuesRewriteMethod.visit wasn't visiting its embedded query (Michele Palmia, David Smiley)

* LUCENE-9258: DocTermsIndexDocValues assumed it was operating on a SortedDocValues (single-valued) field when
  it could be a multi-valued field used with a SortedSetSelector (Michele Palmia)

* LUCENE-9164: Ensure IW processes all internal events before it closes itself on a rollback.
  (Simon Willnauer, Nhat Nguyen, Dawid Weiss, Mike McCandless)

* LUCENE-8908: Return default value from objectVal when doc doesn't match the query in QueryValueSource
  (Bill Bell, hossman, Munendra S N, Michele Palmia)

* LUCENE-9133: Fix for potential NPE in TermFilteredPresearcher for empty fields (Marvin Justice via Mike Drob)

* LUCENE-9309: Wait for #addIndexes merges when aborting merges. (Simon Willnauer)

* LUCENE-9337: Ensure CMS updates its thread accounting data structures consistently.
  CMS today releases its lock after finishing a merge before it re-acquires it to update
  the thread accounting data structures.
  This causes threading issues where concurrently finishing threads fail to pick up pending merges,
  causing potential thread starvation on forceMerge calls. (Simon Willnauer)

* LUCENE-9314: Single-document monitor runs were using the less efficient MultiDocumentBatch
  implementation. (Pierre-Luc Perron, Alan Woodward)

* LUCENE-9362: Fix equality check in ExpressionValueSource#rewrite. This fixes rewriting of inner value sources.
  (Dmitry Emets)

* LUCENE-9405: IndexWriter incorrectly calls closeMergeReaders twice when the merged segment is 100% deleted.
  (Michael Froh, Simon Willnauer, Mike McCandless, Mike Sokolov)

* LUCENE-9400: Tessellator might build illegal polygons when several holes share the same vertex. (Ignacio Vera)

* LUCENE-9417: Tessellator might build illegal polygons when several holes are connected to the same
  vertex. (Ignacio Vera)

* LUCENE-9418: Fix ordered intervals over interleaved terms (Alan Woodward)

Other
---------------------

* LUCENE-9257: Always keep FST off-heap. FSTLoadMode, Reader attributes and openedFromWriter removed. (Bruno Roustant)

* LUCENE-9272: Checksums of the terms index are now verified when
  LeafReader#checkIntegrity is called rather than when opening the index.
  (Adrien Grand)

* LUCENE-9270: Update Javadoc about normalizeEntry in the Kuromoji DictionaryBuilder. (Namgyu Kim)

* LUCENE-9275: Make TestLatLonMultiPolygonShapeQueries more resilient for CONTAINS queries. (Ignacio Vera)

* LUCENE-9244: Adjust TestLucene60PointsFormat#testEstimatePointCount2Dims so it does not fail when a point
  is shared by multiple leaves. (Ignacio Vera)

* LUCENE-9271: ByteBufferIndexInput was refactored to work on top of the
  ByteBuffer API.
  (Adrien Grand)

* LUCENE-9191: Make LineFileDocs's random seeking more efficient, making tests using LineFileDocs faster (Robert Muir,
  Mike McCandless)

* LUCENE-9338: Refactor SimpleBindings to improve type safety and cycle detection (Alan Woodward,
  Adrien Grand)

* LUCENE-9358: Change the way the multi-dimensional BKD tree builder generates the intermediate tree representation to be
  equal to the one-dimensional case to avoid unnecessary tree and leaf rotation. (Ignacio Vera)

* LUCENE-9288: poll_mirrors.py release script can handle HTTPS mirrors. (Ignacio Vera)

* LUCENE-9232: Fix or suppress 13 resource leak precommit warnings in lucene/replicator (Andras Salamon via Erick Erickson)

* LUCENE-9398: Always keep BKD index off-heap. BKD reader no longer implements Accountable. (Ignacio Vera)

Build
---------------------

* Upgrade forbiddenapis to version 3.0.1. (Uwe Schindler)

* LUCENE-9376: Fix or suppress 20 resource leak precommit warnings in lucene/search
  (Andras Salamon via Erick Erickson)

* LUCENE-9380: Fix auxiliary class warnings in Lucene (Erick Erickson)

* LUCENE-9389: Enhance Gradle logging calls validation: eliminate getMessage() (Andras Salamon via Erick Erickson)

======================= Lucene 8.5.2 =======================

Optimizations
---------------------

* LUCENE-9350: Partial reversion of LUCENE-9068; holding Levenshtein automata on FuzzyQuery can end
  up blowing up query caches which use query objects as cache keys, so building the automata is
  now delayed to search time again.
  (Alan Woodward, Mike Drob)

======================= Lucene 8.5.1 =======================

Bug Fixes
---------------------

* LUCENE-9300: Fix corruption of the new gen field infos when doc values updates are applied on a segment created
  externally and added to the index with IndexWriter#addIndexes(Directory). (Jim Ferenczi, Adrien Grand)

======================= Lucene 8.5.0 =======================

API Changes
---------------------

* LUCENE-9093: Not an API change but a change in behavior of the UnifiedHighlighter's LengthGoalBreakIterator that will
  yield Passages sized a little differently, because the sizing pivot is now the center of the first match and
  not its left edge.

* LUCENE-9116: PostingsWriterBase and PostingsReaderBase no longer support
  setting a field's metadata via a `long[]`. (Adrien Grand)

* LUCENE-9116: The FSTOrd postings format has been removed.
  (Adrien Grand)

* LUCENE-8369: Remove the obsolete spatial module. (Nick Knize, David Smiley)

* LUCENE-8621: Refactor LatLonShape, XYShape, and all query and utility classes to core. (Nick Knize)

* LUCENE-9218: XY geometries API works in float space. (Ignacio Vera)

* LUCENE-9212: Intervals.multiterm() takes CompiledAutomaton rather than plain Automaton
  (Alan Woodward)

* LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d. (Nick Knize)

* LUCENE-9171: QueryBuilder.newTermQuery() and .newSynonymQuery() now take boost parameters.
  (Alessandro Benedetti, Alan Woodward)

New Features
---------------------

* LUCENE-8903: Add LatLonShape and XYShape point query. (Ignacio Vera)

* LUCENE-8707: Add LatLonShape and XYShape distance query. (Ignacio Vera)

* LUCENE-9238: New XYPointField field and queries for indexing, searching and sorting
  Cartesian points.
  (Ignacio Vera)

Improvements
---------------------

* LUCENE-9149: Increase data dimension limit in BKD. (Nick Knize)

* LUCENE-9102: Add maxQueryLength option to DirectSpellChecker. (Andy Webb via Bruno Roustant)

* LUCENE-9091: UnifiedHighlighter HTML escaping should only escape essentials (Nándor Mátravölgyi)

* LUCENE-9105: UniformSplit postings format detects corrupted index and better handles IO exceptions. (Bruno Roustant)

* LUCENE-9106: UniformSplit postings format allows extension of block/line serializers. (Bruno Roustant)

* LUCENE-9093: UnifiedHighlighter's LengthGoalBreakIterator has a new fragmentAlignment option to better center the
  first match in the passage. Also, the sizing point now pivots at the center of the first match term and not its left
  edge. This yields Passages that won't be identical to the previous behavior. (Nándor Mátravölgyi, David Smiley)

* LUCENE-9153: Allow WhitespaceAnalyzer to set a maxTokenLength other than the default of 255
  (Alan Woodward)

* LUCENE-9152: Improve line intersections with polygons when they are touching from the outside. (Ignacio Vera)

* LUCENE-9123: Add new JapaneseTokenizer constructors with a discardCompoundToken option that controls whether
  the tokenizer emits original (compound) tokens when the mode is not NORMAL. (Kazuaki Hiraga via Tomoko Uchida)

* LUCENE-9253: KoreanTokenizer now supports custom dictionaries (system, unknown). (Namgyu Kim)

* LUCENE-9171: QueryBuilder can now use BoostAttributes on input token streams to selectively
  boost particular terms or synonyms in parsed queries. (Alessandro Benedetti, Alan Woodward)

* LUCENE-9298: Improve RAM accounting in BufferedUpdates when deleted doc IDs and terms are cleared. (Yu Binglei, Simon Willnauer)

Optimizations
---------------------

* LUCENE-9211: Add compression for binary doc value fields.
  (Mark Harwood)

* LUCENE-4702: Better compression of terms dictionaries. (Adrien Grand)

* LUCENE-9228: Sort dvUpdates in term order before applying if they all update a
  single field to the same value. This optimization can reduce the flush time by around
  20% for the docValues update use cases. (Nhat Nguyen, Adrien Grand, Simon Willnauer)

* LUCENE-9245: Reduce AutomatonTermsEnum memory usage. (Bruno Roustant, Robert Muir)

* LUCENE-9237: Faster UniformSplit intersect TermsEnum. (Bruno Roustant)

* LUCENE-9260: LeafReader#checkIntegrity verifies checksums of CFS files.
  (Adrien Grand)

* LUCENE-9068: FuzzyQuery builds its Automaton up-front (Alan Woodward, Mike Drob)

* LUCENE-9113: Faster merging of SORTED/SORTED_SET doc values. (Adrien Grand)

* LUCENE-9125: Optimize Automaton.step() with binary search and introduce Automaton.next(). (Bruno Roustant)

* LUCENE-9147: The index of stored fields and term vectors is now off-heap.
  (Adrien Grand)

Bug Fixes
---------------------

* LUCENE-9084: Fix potential deadlock due to circular synchronization in AnalyzingInfixSuggester (Paul Ward)

* LUCENE-9115: NRTCachingDirectory no longer caches files of unknown size.
  (Adrien Grand)

* LUCENE-9144: Fix error message on OneDimensionBKDWriter when too many points are added to the writer.
  (Ignacio Vera)

* LUCENE-9135: Make UniformSplit FieldMetadata counters long.
  (Bruno Roustant)

* LUCENE-9200: Fix TieredMergePolicy to use double (not float) math to make its merging decisions, fixing
  a corner-case bug uncovered by fun randomized tests (Robert Muir, Mike McCandless)

* LUCENE-9099: Unordered and ordered interval queries now correctly handle
  repeated subterms: ordered intervals could supply an 'extra' minimized
  interval, resulting in odd matches when combined with e.g. CONTAINS queries;
  and unordered intervals would match duplicate subterms on the same position,
  so a query for UNORDERED(foo, foo) would match a document containing 'foo'
  only once. (Alan Woodward)

* LUCENE-9250: Add support for Circle2D#intersectsLine around the dateline. (Ignacio Vera)

* LUCENE-9243: Add a fudge factor when creating a bounding box of an XYCircle. (Ignacio Vera)

* LUCENE-9239: Circle2D#WithinTriangle now properly detects if a triangle is within distance. (Ignacio Vera)

* LUCENE-9251: Fix bug in the polygon tessellator where edges with different values on #isEdgeFromPolygon
  were not filtered out properly. (Ignacio Vera)

* LUCENE-9263: Fix wrong transformation of distance in meters to radians in Geo3DPoint. (Ignacio Vera)

Other
---------------------

* LUCENE-9109: Backport some changes from master (except StackWalker) to improve
  TestSecurityManager (Uwe Schindler)

* LUCENE-9110: Backport refactored stack analysis in tests to use generalized
  LuceneTestCase methods (Uwe Schindler)

* LUCENE-9141: Simplify LatLonShapeXQuery API by adding a new abstract class called LatLonGeometry. Queries are
  executed with input objects that extend this class.
  (Ignacio Vera)

* LUCENE-9096: Simplification of CompressingTermVectorsWriter#flushOffsets.
  (kkewwei via Adrien Grand)

* LUCENE-9225: Rectangle extends LatLonGeometry so it can be used in a geometry collection. (Ignacio Vera)

======================= Lucene 8.4.1 =======================

Bug Fixes
---------------------
(No changes)

======================= Lucene 8.4.0 =======================

API Changes

* LUCENE-9029: Deprecate SloppyMath toRadians/toDegrees in favor of Java Math.
  (Jack Conradson via Adrien Grand)

New Features

* LUCENE-8620: Add CONTAINS support for LatLonShape and XYShape. (Ignacio Vera)

Improvements

* LUCENE-9002: Skip caching a costly clause in LRUQueryCache if caching it would make the query
  many times slower. (Guoqiang Jiang)

* LUCENE-9006: WordDelimiterGraphFilter's catenateAll token is now ordered before any token parts, like WDF did.
  (David Smiley)

* LUCENE-9028: Introduce Intervals.multiterm() (Mikhail Khludnev)

* LUCENE-9018: ConcatenateGraphFilter now has a configurable separator. (Stanislav Mikulchik, David Smiley)

* LUCENE-9036: ExitableDirectoryReader may interrupt scanning over DocValues (Mikhail Khludnev)

* LUCENE-9062: QueryVisitor now has a consumeTermsMatching() method, allowing queries
  that match a class of terms to pass a ByteRunAutomaton matching that class
  back to the visitor. (Alan Woodward, David Smiley)

* LUCENE-9073: IntervalQuery now includes the field in toString() and explain() (Mikhail Khludnev)

Optimizations

* LUCENE-8928: When building a kd-tree for dimensions n > 2, compute exact bounds for an inner node every N splits
  to improve the quality of the tree. N is defined by SPLITS_BEFORE_EXACT_BOUNDS, which is set to 4.
  (Ignacio Vera, Adrien Grand)

* BaseDirectoryReader no longer sums up the `LeafReader#numDocs` of its leaves
  eagerly. This especially helps when creating views of readers that hide
  documents, since computing the number of live documents is an expensive
  operation. (Adrien Grand)

* LUCENE-8992: TopFieldCollector and TopScoreDocCollector can now share minimum scores across leaves
  concurrently. (Adrien Grand, Atri Sharma, Jim Ferenczi)

* LUCENE-8932: BKDReader's index is now stored off-heap when the IndexInput is
  an instance of ByteBufferIndexInput. (Jack Conradson via Adrien Grand)

* LUCENE-9024: IntroSelector now falls back to the median-of-medians algorithm
  instead of sorting when the maximum recursion level is exceeded, providing
  better worst-case runtime. (Paul Sanwald via Adrien Grand)

* LUCENE-8920: The denser arcs of FST now index labels with a bitset in order
  to provide near-constant-time access. (Bruno Roustant, Mike Sokolov via Adrien Grand)

* LUCENE-9027: Use SIMD instructions to decode postings. (Adrien Grand)

* LUCENE-9049: Remove FST cached root arcs, which are now redundant with labels indexed by a bitset.
  This frees some on-heap FST space. (Jack Conradson via Bruno Roustant)

* LUCENE-9045: Do not use TreeMap/TreeSet in BlockTree and PerFieldPostingsFormat. (Bruno Roustant)

Bug Fixes

* LUCENE-9001: Fix race condition in SetOnce. (Przemko Robakowski)

* LUCENE-9030: Fix WordnetSynonymParser behaviour so it behaves similarly to
  SolrSynonymParser. (Christoph Buescher via Alan Woodward)

* LUCENE-9054: Fix reproduceJenkinsFailures.py to not overwrite JUnit XML files when retrying (hossman)

* LUCENE-9031: UnsupportedOperationException on MatchesIterator.getQuery() (Alan Woodward, Mikhail Khludnev)

* LUCENE-8996: maxScore was sometimes missing from distributed grouped responses.
  (Julien Massenet, Diego Ceccarelli, Munendra S N, Christine Poerschke)

* LUCENE-9055: Fix the detection of lines crossing triangles through edge points.
  (Ignacio Vera)

* LUCENE-9103: Disjunctions could miss some hits under some rare conditions. (Adrien Grand)

Other

* LUCENE-8979: Code Cleanup: Use entrySet for map iteration wherever possible (part 2). (Koen De Groote)

* LUCENE-8994: Code Cleanup - Pass values to list constructor instead of empty constructor followed by addAll(). (Koen De Groote)

* LUCENE-8746: Refactor EdgeTree - Introduce a Component tree that represents the tree of components (e.g. polygons).
  Edge tree is now just a tree of edges. (Ignacio Vera)

* LUCENE-9046: Fix a wrong example in the Javadoc of TermInSetQuery (Namgyu Kim)

* LUCENE-8983: Add sandbox PhraseWildcardQuery to control multi-term expansions in a phrase. (Bruno Roustant)

* LUCENE-9067: Polygon2D#contains() is now thread-safe. (Ignacio Vera)

Build

* Upgrade forbiddenapis to version 2.7; upgrade Groovy to 2.4.17. (Uwe Schindler)

* LUCENE-9041: Upgrade ecj to 3.19.0 to fix sporadic precommit javadoc issues (Kevin Risden)

======================= Lucene 8.3.1 =======================

Bug Fixes

* LUCENE-9050: MultiTermIntervalsSource.visit() was not calling back to its
  visitor. (Alan Woodward)

======================= Lucene 8.3.0 =======================

API Changes

* LUCENE-8909: IndexWriter#getFieldNames() was used to get the fields present in the index. After LUCENE-8316, this
  method is no longer required, so IndexWriter#getFieldNames() is deprecated. (Adrien Grand, Munendra S N)

* LUCENE-8755: SpatialPrefixTreeFactory now consumes the "version" parsed with Lucene's Version class. The quad
  and packed quad prefix trees are sensitive to this.
  It's recommended to pass the version, just as you should for analysis components of tokenized text,
  or else changes to the encoding in future versions may be incompatible with older indexes.
  (Chongchen Chen, David Smiley)

* LUCENE-8956: QueryRescorer now only sorts the first topN hits instead of all
  initial hits. (Paul Sanwald via Adrien Grand)

* LUCENE-8921: IndexSearcher.termStatistics() no longer takes a TermStates; it takes the docFreq and totalTermFreq.
  It should not be called if docFreq <= 0. The previous implementation survives as deprecated and final and is removed in 9.0.
  (Bruno Roustant, David Smiley, Alan Woodward)

* LUCENE-8990: PointValues#estimateDocCount(visitor) estimates the number of documents that would be matched by
  the given IntersectVisitor. The method is used to compute the cost() of ScorerSuppliers instead of
  PointValues#estimatePointCount(visitor). (Ignacio Vera, Adrien Grand)

New Features

* LUCENE-8936: Add SpanishMinimalStemFilter (vinod kumar via Tomoko Uchida)

* LUCENE-8764 LUCENE-8945: Add "export all terms and doc freqs" feature to Luke with delimiters. (Leonardo Menezes, Amish Shah via Tomoko Uchida)

* LUCENE-8747: Composite Matches from multiple subqueries now allow access to
  their submatches, and a new NamedMatches API allows marking of subqueries
  and a simple way to find which subqueries have matched on a given document.
  (Alan Woodward, Jim Ferenczi)

* LUCENE-8769: Introduce Range Query For Multiple Connected Ranges (Atri Sharma)

* LUCENE-8960: Introduce LatLonDocValuesPointInPolygonQuery for LatLonDocValuesField (Ignacio Vera)

* LUCENE-8753: New UniformSplitPostingsFormat (name "UniformSplit"), whose primary benefits are simplicity and
  extensibility. New STUniformSplitPostingsFormat (name "SharedTermsUniformSplit") that shares a single internal
  term dictionary across fields.
  (Bruno Roustant, Juan Rodriguez, David Smiley)

Improvements

* LUCENE-8874: Show SPI names instead of class names in Luke Analysis tab. (Tomoko Uchida)

* LUCENE-8894: Add APIs to find SPI names for Tokenizer/CharFilter/TokenFilter factory classes. (Tomoko Uchida)

* LUCENE-8914: Move the logic for discarding inner nodes in FloatPointNearestNeighbor to the IntersectVisitor
  to take advantage of the change introduced in LUCENE-7862. (Ignacio Vera)

* LUCENE-8955: Move the logic for discarding inner nodes in LatLonPoint NearestNeighbor to the IntersectVisitor
  to take advantage of the change introduced in LUCENE-7862. (Ignacio Vera)

* LUCENE-8918: PhraseQuery throws exceptions at construction time if it is passed
  null arguments. (Alan Woodward)

* LUCENE-8916: GraphTokenStreamFiniteStrings preserves all Token attributes
  through its finite strings TokenStreams (Alan Woodward)

* LUCENE-8906: Expose Lucene50PostingsFormat.IntBlockTermState as public so that other postings formats can re-use it.
  (Bruno Roustant)

* LUCENE-8942: Remove redundant parameters and improve visibility strictness in
  LRUQueryCache (Atri Sharma)

* SOLR-13663: Introduce <SpanPositionRange> into XML Query Parser (Alessandro Benedetti via Mikhail Khludnev)

* LUCENE-8952: Use a sort key instead of true distance in NearestNeighbor. (Julie Tibshirani)

* LUCENE-8620: Tessellator labels each edge of the generated triangles according to whether it belongs to
  the original polygon. This information is added to the triangle encoding. (Ignacio Vera)

* LUCENE-8964: Fix geojson shape parsing on string arrays in properties
  (Alexander Reelsen)

* LUCENE-8976: Use exact distance between point and bounding rectangle in FloatPointNearestNeighbor.
  (Ignacio Vera)

* LUCENE-8966: The Korean analyzer now splits tokens on boundaries between digits and alphabetic characters. (Jim Ferenczi)

* LUCENE-8984: MoreLikeThis MLT is biased for uncommon fields (Andy Hind via Anshum Gupta)

Optimizations

* LUCENE-8922: DisjunctionMaxQuery more efficiently leverages impacts to skip
  non-competitive hits. (Adrien Grand)

* LUCENE-8935: BooleanQuery with no scoring clause can now terminate the query early when
  the total hit count is not requested. (Jim Ferenczi)

* LUCENE-8941: Matches on wildcard queries will defer building their full
  disjunction until a MatchesIterator is pulled (Alan Woodward)

* LUCENE-8755: spatial-extras quad and packed quad prefix trees now index points faster.
  (Chongchen Chen, David Smiley)

* LUCENE-8860: Add additional leaf node level optimizations in LatLonShapeBoundingBoxQuery.
  (Igor Motov via Ignacio Vera)

* LUCENE-8968: Improve performance of WITHIN and DISJOINT queries for Shape queries by
  doing just one pass whenever possible. (Ignacio Vera)

* LUCENE-8939: Introduce shared count-based early termination across multiple slices
  (Atri Sharma)

* LUCENE-8980: Blocktree's seekExact now returns false immediately if the term isn't in the min-max range of the segment.
  Large perf gain for ID/time-like data when populated sequentially. (Guoqiang Jiang)

Bug Fixes

* LUCENE-8755: spatial-extras quad and packed quad prefix trees could throw a
  NullPointerException for certain cell edge coordinates (Chongchen Chen, David Smiley)

* LUCENE-9005: BooleanQuery.visit() would pull subVisitors from its parent visitor, rather
  than from a visitor for its own specific query. This could cause problems when BQ was
  nested under another BQ.
  Instead, we now pull a MUST subvisitor, pass it to any MUST
  subclauses, and then pull SHOULD, MUST_NOT and FILTER visitors from it rather than from
  the parent. (Alan Woodward)

Other

* LUCENE-8778 LUCENE-8911 LUCENE-8957: Define analyzer SPI names as static final fields and document the names in Javadocs.
  (Tomoko Uchida, Uwe Schindler)

* LUCENE-8758: QuadPrefixTree: removed the unused levelS and levelN fields. (Amish Shah)

* LUCENE-8975: Code Cleanup: Use entryset for map iteration wherever possible. (Koen De Groote)

* LUCENE-8993, LUCENE-8807: Changed all repository and download references in build files
  to HTTPS. (Uwe Schindler)

* LUCENE-8998: Fix OverviewImplTest.testIsOptimized reproducible failure. (Tomoko Uchida)

* LUCENE-8999: LuceneTestCase.expectThrows now propagates assert/assumption failures up to the test
  without wrapping them in a new assertion failure, unless the caller has explicitly expected them (hossman)

* LUCENE-8062: GlobalOrdinalsWithScoreQuery is no longer eligible for query caching. (Jim Ferenczi)

======================= Lucene 8.2.0 =======================

API Changes

* LUCENE-8865: IndexSearcher now uses Executor instead of ExecutorService.
  This change is fully backwards compatible since ExecutorService extends
  Executor. (Simon Willnauer)

* LUCENE-8856: Intervals queries have moved from the sandbox to the queries
  module. (Alan Woodward)

* LUCENE-8893: Intervals.wildcard() and Intervals.prefix() methods now take
  BytesRef rather than String. (Alan Woodward)

New Features

* LUCENE-8632: New XYShape Field and Queries for indexing and searching general Cartesian
  geometries. (Nick Knize)

* LUCENE-8891: Snowball stemmer/analyzer for the Estonian language.
  (Gert Morten Paimla via Tomoko Uchida)

* LUCENE-8815: Provide a DoubleValues implementation for retrieving the value of features without
  requiring a separate numeric field. Note that since feature values are stored with only 8 bits of
  mantissa, the returned values may differ slightly from the originally indexed values.
  (Colin Goodheart-Smithe via Adrien Grand)

* LUCENE-8803: Provide a FeatureSortField to allow sorting search hits by descending value of a
  feature. This is exposed via the factory method FeatureField#newFeatureSort.
  (Colin Goodheart-Smithe via Adrien Grand)

* LUCENE-8784: The KoreanTokenizer now preserves punctuation if discardPunctuation is set
  to false (defaults to true).
  (Namgyu Kim via Jim Ferenczi)

* LUCENE-8812: Add new KoreanNumberFilter that can convert Hangul numerals to numbers
  and handle decimal points. It is similar to the JapaneseNumberFilter.
  (Namgyu Kim)

* LUCENE-8362: Add doc-value support to range fields. (Atri Sharma via Adrien Grand)

* LUCENE-8766: Add monitor subproject (previously Luwak monitoring library). This
  allows a stream of documents to be matched against a set of registered queries
  in an efficient manner, for use as a monitoring or classification tool.
  (Alan Woodward)

* LUCENE-7714: Add a numeric range query in sandbox that takes advantage of index sorting.
  (Julie Tibshirani via Jim Ferenczi)

* LUCENE-8859: The completion suggester's postings format now has an option to
  load its internal FST off-heap. (Jim Ferenczi)

Bug Fixes

* LUCENE-8831: Fixed LatLonShapeBoundingBoxQuery hashCode methods. (Ignacio Vera)

* LUCENE-8775: Improve tessellator to better handle cases where a hole shares a vertex
  with the polygon. (Ignacio Vera)

* LUCENE-8785: Ensure new threadstates are locked before retrieving the number of active threadstates.
  Previously this caused assertion errors and potentially broken field attributes in the IndexWriter when
  IndexWriter#deleteAll was called while actively indexing. (Simon Willnauer)

* LUCENE-8804: Forbid calls to putAttribute on frozen FieldType instances.
  (Vamshi Vijay Nakkirtha via Adrien Grand)

* LUCENE-8828: Remove the buggy 'disallow overlaps' boolean from Intervals.unordered(),
  and replace it with a new Intervals.unorderedNoOverlaps() method (Alan Woodward)

* LUCENE-8843: Don't ignore exceptions that are thrown when trying to open a
  file in IOUtils#fsync. (Jason Tedor via Adrien Grand)

* LUCENE-8835: FileSwitchDirectory now respects the file extension when listing directory
  contents to ensure we don't expose pending deletes if both directories point to the same
  underlying filesystem directory. (Simon Willnauer)

* LUCENE-8853: FileSwitchDirectory now makes a best effort to place tmp files in the same
  directory as the target files. (Simon Willnauer)

* LUCENE-8892: Add missing closing parentheses in MultiBoolFunction's description() (Florian Diebold, Munendra S N)

Improvements

* LUCENE-7840: Non-scoring BooleanQuery now removes SHOULD clauses before building the scorer supplier
  as opposed to eliminating them during scorer construction. (Atri Sharma via Jim Ferenczi)

* LUCENE-8770: BlockMaxConjunctionScorer now leverages two-phase iterators in order to avoid
  executing the second phase when scorers don't intersect.
  (Adrien Grand, Jim Ferenczi)

* LUCENE-8818: Fix smokeTestRelease.py encoding bug (janhoy)

* LUCENE-8845: Allow Intervals.prefix() and Intervals.wildcard() to specify
  their maximum allowed expansions (Alan Woodward)

* LUCENE-8875: Introduce a Collector optimized for use cases where a large
  number of hits is requested (Atri Sharma)

* LUCENE-8848 LUCENE-7757 LUCENE-8492: The UnifiedHighlighter now detects when parts of the query are not understood by
  it, so that it does not make optimizations that result in no highlights or slow highlighting. This generally works
  best for WEIGHT_MATCHES mode. Consequently, queries produced by ComplexPhraseQueryParser and the surround QueryParser
  will now highlight correctly. (David Smiley)

* LUCENE-8793: Luke: enhanced UI for CustomAnalyzer that shows detailed analysis steps. (Jun Ohtani via Tomoko Uchida)

* LUCENE-8855: Add Accountable to some Query implementations (ab, Adrien Grand)

Optimizations

* LUCENE-8796: Use exponential search instead of binary search in
  IntArrayDocIdSet#advance method (Luca Cavanna via Adrien Grand)

* LUCENE-8865: Use incoming thread for execution if IndexSearcher has an executor.
  Caller threads now execute at least one search on an index even if an
  executor is provided, to minimize thread context switching. (Simon Willnauer)

* LUCENE-8868: New storing strategy for BKD tree leaves with low cardinality.
  Each distinct value is stored once together with its cardinality, reducing the
  storage cost. (Ignacio Vera)

* LUCENE-8885: Optimise BKD reader by exploiting cardinality information stored
  on leaves. (Ignacio Vera)

* LUCENE-8896: Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, byte[])
  for several queries.
  (Ignacio Vera)

* LUCENE-8901: Load frequencies lazily, only when needed, in BlockDocsEnum and
  BlockImpactsEverythingEnum. (Mayya Sharipova)

* LUCENE-8888: Optimize distribution of points with data dimensions in
  BKD tree leaves. (Ignacio Vera)

* LUCENE-8311: Phrase queries now leverage impacts. (Adrien Grand)

Test Framework

* LUCENE-8825: CheckHits now displays the shard index in case of mismatch
  between top hits. (Atri Sharma via Adrien Grand)

Other

* LUCENE-8847: Code Cleanup: Remove StringBuilder.append with concatenated
  strings. (Koen De Groote via Uwe Schindler)

* LUCENE-8861: Script to find open GitHub PRs that need attention (janhoy)

* LUCENE-8852: ReleaseWizard tool for release managers (janhoy)

* LUCENE-8838: Remove support for Steiner points on Tessellator. (Ignacio Vera)

* LUCENE-8879: Improve BKDRadixSelector tests. (Ignacio Vera)

* LUCENE-8886: Fix TestMutablePointsReaderUtils tests. (Ignacio Vera)

======================= Lucene 8.1.1 =======================

Improvements

* LUCENE-8781: FST lookup performance has been improved in many cases by
  encoding Arcs using full-sized arrays with gaps. The new encoding is
  enabled for postings in the default codec and for suggesters. (Mike Sokolov)

======================= Lucene 8.1.0 =======================

API Changes

* LUCENE-3041: A query introspection API has been added. Queries should
  implement a visit() method, taking a QueryVisitor, and either pass the
  visitor down to any child queries, or call a visitX() or consumeX() method
  on it. All locations in the code that called Weight.extractTerms()
  have been changed to use this API, and the extractTerms() method has
  been deprecated.
  (Alan Woodward, Simon Willnauer, David Smiley, Luca Cavanna)

* LUCENE-8735: Directory.getPendingDeletions is now abstract to ensure
  subclasses override it. FilterDirectory now delegates the call, ensuring
  correct default behaviour for subclasses. (Henning Andersen)

New Features

* LUCENE-2562: "Luke", the well-known graphical user interface for inspecting Lucene
  indexes, was added as a Lucene module. It can be started from the
  binary distribution by calling the shell scripts in the module folder
  or from the source checkout by using `ant -f lucene/luke/build.xml run`.
  Luke provides a Swing-based user interface and can be used to open
  Lucene or Solr (or Elasticsearch) indexes, inspect documents, check index
  commits and segments, or test (custom) analyzers. It also has maintenance
  functions to check index structures and force merge indexes for archival.
  Luke was originally developed by Andrzej Bialecki, later maintained by
  Dmitry Kan and finally rewritten by Tomoko Uchida to use the
  ASF-license-compatible Swing framework (as shipped with JDKs).
  (Tomoko Uchida, Uwe Schindler)

Bug fixes

* LUCENE-8736: LatLonShapePolygonQuery returned incorrect WITHIN results
  with shared boundaries. Point in Polygon now correctly includes boundary
  points. Box and Polygon relations with triangles have also been improved to
  correctly include boundary points. (Nick Knize)

* LUCENE-8712: Polygon2D did not detect crossings through segment edges.
  (Ignacio Vera)

* LUCENE-8720: NameIntCacheLRU (in the facets module) had an int
  overflow bug that disabled cleaning of the cache (Russell A Brown)

* LUCENE-8726: ValueSource.asDoubleValuesSource() could leak a reference to
  IndexSearcher (Alan Woodward, Yury Pakhomov)

* LUCENE-8719: FixedShingleFilter could miss shingles at the end of a token stream if
  there are multiple paths with different lengths. (Alan Woodward)

* LUCENE-8688: TieredMergePolicy#findForcedMerges now tries to create the
  cheapest merges that allow the index to go down to `maxSegmentCount` segments
  or fewer. (Armin Braun via Adrien Grand)

* LUCENE-8477: Interval disjunctions could miss valid hits if some of the
  clauses of the disjunction are minimized away. We now rewrite intervals
  if a source contains a disjunction and the internal gaps matter for
  matching. This behaviour can be disabled if users care more about speed
  than about accuracy of matching. (Alan Woodward, Jim Ferenczi)

* LUCENE-8741: ValueSource.fromDoubleValuesSource() was casting to
  Scorer instead of Scorable, leading to ClassCastExceptions (Markus Jelsma,
  Alan Woodward)

* LUCENE-8754: Fix ConcurrentModificationException in SegmentInfo if
  attributes are accessed in MergePolicy while the merge is running (Simon Willnauer)

* LUCENE-8765: Fixed validation of the number of added points in KD trees.
  (Zhao Yang via Adrien Grand)

Improvements

* LUCENE-8673: Use radix partitioning when merging dimensional points instead
  of sorting all dimensions beforehand. (Ignacio Vera, Adrien Grand)

* LUCENE-8687: Optimise radix partitioning for points on heap. (Ignacio Vera)

* LUCENE-8699: Change HeapPointWriter to use a single byte array instead of a list
  of byte arrays.
  In addition, a new interface PointValue is added to abstract out the
  differences in format between offline and on-heap writers. (Ignacio Vera)

* LUCENE-8703: Build point writers in the BKD tree only when they are needed.
  (Ignacio Vera)

* LUCENE-8652: SynonymQuery can now deboost the document frequency of each term when
  blending the score of the synonym. (Jim Ferenczi)

* LUCENE-8631: The Korean user dictionary now picks the longest-matching word and discards
  the other matches. (Yeongsu Kim via Jim Ferenczi)

* LUCENE-8732: ConstantScoreQuery can now terminate the query early if the minimum score is
  greater than the constant score and total hits are not requested. (Jim Ferenczi)

* LUCENE-8750: Implement setMissingValue() on sort fields produced from
  DoubleValuesSource and LongValuesSource (Mike Sokolov via Alan Woodward)

* LUCENE-8701: ToParentBlockJoinQuery now creates a child scorer that disallows skipping over
  non-competitive documents if the score of a parent depends on the score of multiple
  children (avg, max, min). Additionally, the score mode `none`, which assigns a constant score to
  each parent, can terminate top-score collection early. (Jim Ferenczi)

* LUCENE-8751: Weight#matches now uses the ScorerSupplier to build scorers with a lead cost of 1
  (single document). (Jim Ferenczi)

* LUCENE-8752: The new Japanese era name '令和' (Reiwa) has been added to the dictionary used in
  JapaneseTokenizer so that the analyzer handles the era name correctly.
  Reiwa is set to replace the Heisei era on May 1, 2019. (Tomoko Uchida)

* LUCENE-8671: Introduced reader attributes, which allow per-IndexReader configuration
  of codec internals.
  This enables configuring, per reader and per field, whether FSTs are on- or off-heap.
  (Simon Willnauer)

* LUCENE-8787: spatial-extras DateRangePrefixTree used to parse only ISO-8601 timestamps with 0 or 3
  digits of millisecond precision but now parses other lengths (although digits beyond 3 are not used).
  (Thomas Lemmé via David Smiley)

Changes in Runtime Behavior

* LUCENE-8671: Load FST off-heap also for ID-like fields if the reader is not opened
  from an IndexWriter. (Simon Willnauer)

* LUCENE-8730: WordDelimiterGraphFilter always emits its original token first. This
  brings its behaviour into line with the deprecated WordDelimiterFilter, so that
  the only difference in output between the two is in the position length
  attribute. (Alan Woodward, Jim Ferenczi)

* LUCENE-7386: Disjunctions nested in disjunctions are now flattened. This might
  trigger changes in the produced scores due to changes to the order in which
  scores of sub clauses are summed up. (Adrien Grand)

* LUCENE-8756: MoreLikeThisQuery now respects custom term frequencies
  (TermFrequencyAttribute) at search time (Olli Kuonanoja)

Other

* LUCENE-8680: Refactor EdgeTree#relateTriangle method. (Ignacio Vera)

* LUCENE-8685: Refactor LatLonShape tests. (Ignacio Vera)

* LUCENE-8713: Add Line2D tests. (Ignacio Vera)

* LUCENE-8729: Workaround: Disable accessibility doclints (Java 13+),
  so compilation with recent JDKs succeeds. (Uwe Schindler)

* LUCENE-8725: Make TermsQuery.SeekingTermSetTermsEnum a public top-level class (noble)

======================= Lucene 8.0.0 =======================

API Changes

* LUCENE-8662: Change TermsEnum.seekExact(BytesRef) to abstract and delegate seekExact(BytesRef)
  in FilterLeafReader.FilterTermsEnum.
  (Jeffery Yuan via Tomás Fernández Löbbe, Simon Willnauer)

* LUCENE-8469: Deprecated StringHelper.compare has been removed. (Dawid Weiss)

* LUCENE-8039: Introduce a "delta distance" method set to GeoDistance. This
  allows distance calculations, especially for paths, to take into account an
  "excursion" to include the specified point.

* LUCENE-8007: Index statistics Terms.getSumDocFreq(), Terms.getDocCount() are
  now required to be stored by codecs. Additionally, TermsEnum.totalTermFreq()
  and Terms.getSumTotalTermFreq() are now required: if frequencies are not
  stored they are equal to TermsEnum.docFreq() and Terms.getSumDocFreq(),
  respectively, because all freq() values equal 1. (Adrien Grand, Robert Muir)

* LUCENE-8038: Deprecated PayloadScoreQuery constructors have been removed
  (Alan Woodward)

* LUCENE-8014: Similarity.computeSlopFactor() and
  Similarity.computePayloadFactor() have been removed (Alan Woodward)

* LUCENE-7996: Queries are now required to produce positive scores.
  (Adrien Grand)

* LUCENE-8099: CustomScoreQuery, BoostedQuery and BoostingQuery have been
  removed (Alan Woodward)

* LUCENE-8012: Explanation now takes Number rather than float
  (Alan Woodward, Robert Muir)

* LUCENE-8116: SimScorer now only takes a frequency and a norm as per-document
  scoring factors. (Adrien Grand)

* LUCENE-8113: TermContext has been renamed to TermStates, and can now be
  constructed lazily if term statistics are not required (Alan Woodward)

* LUCENE-8242: Deprecated method IndexSearcher#createNormalizedWeight() has
  been removed (Alan Woodward)

* LUCENE-8267: Memory codecs removed from the codebase (MemoryPostings,
  MemoryDocValues). (Dawid Weiss)

* LUCENE-8144: Moved QueryCachingPolicy.ALWAYS_CACHE to the test framework.
  (Nhat Nguyen via Adrien Grand)

* LUCENE-8356: StandardFilter and StandardFilterFactory have been removed
  (Alan Woodward)

* LUCENE-8373: StandardAnalyzer.ENGLISH_STOP_WORD_SET has been removed
  (Alan Woodward)

* LUCENE-8388: Unused PostingsEnum#attributes() method has been removed
  (Alan Woodward)

* LUCENE-8405: TopDocs.maxScore has been removed. IndexSearcher and TopFieldCollector
  no longer have an option to compute the maximum score when sorting by field.
  (Adrien Grand)

* LUCENE-8411: TopFieldCollector no longer takes a fillFields option; it now
  always fills fields. (Adrien Grand)

* LUCENE-8412: TopFieldCollector no longer takes a trackDocScores option. Scores
  need to be set on top hits via TopFieldCollector#populateScores instead.
  (Adrien Grand)

* LUCENE-6228: A new Scorable abstract class has been added, containing only those
  methods from Scorer that should be called from Collectors. LeafCollector.setScorer()
  now takes a Scorable rather than a Scorer. (Alan Woodward, Adrien Grand)

* LUCENE-8475: Deprecated constants have been removed from RamUsageEstimator.
  (Dimitrios Athanasiou)

* LUCENE-8483: Scorers may no longer take null as a Weight (Alan Woodward)

* LUCENE-8352: TokenStreamComponents is now final, and can take a Consumer<Reader>
  in its constructor (Mark Harwood, Alan Woodward, Adrien Grand)

* LUCENE-8498: LowerCaseTokenizer has been removed, and CharTokenizer no longer
  takes a normalizer function. (Alan Woodward)

* LUCENE-7875: Moved MultiFields static methods out of the class. getLiveDocs is now
  in MultiBits, which is now public. getMergedFieldInfos and getIndexedFields are now in
  FieldInfos. getTerms is now in MultiTerms. getTermPositionsEnum and getTermDocsEnum
  were collapsed and renamed to just getTermPostingsEnum and moved to MultiTerms.
  (David Smiley)

* LUCENE-8513: MultiFields.getFields has been removed. Please avoid this class,
  and Fields in general, when possible. (David Smiley)

* LUCENE-8497: MultiTermAwareComponent has been removed, and in its place
  TokenFilterFactory and CharFilterFactory now expose type-safe normalize()
  methods. This decouples normalization from tokenization entirely.
  (Mayya Sharipova, Alan Woodward)

* LUCENE-8597: IntervalIterator now exposes a gaps() method that reports the
  number of gaps between its component sub-intervals. This can be used in a
  new filter available via Intervals.maxgaps(). (Alan Woodward)

* LUCENE-8609: Remove IndexWriter#numDocs() and IndexWriter#maxDoc() in favor
  of IndexWriter#getDocStats(). (Simon Willnauer)

* LUCENE-8292: Make TermsEnum fully abstract. (Simon Willnauer)

Changes in Runtime Behavior

* LUCENE-8333: Switch MoreLikeThis.setMaxDocFreqPct to use maxDoc instead of
  numDocs. (Robert Muir, Dawid Weiss)

* LUCENE-7837: Indices that were created before the previous major version
  will now fail to open even if they have been merged with the previous major
  version. (Adrien Grand)

* LUCENE-8020: Queries such as SpanOrQuery no longer pass terms that don't exist
  to Similarities, so scoring formulas no longer require divide-by-zero hacks.
  IndexSearcher.termStatistics/collectionStatistics return null instead of
  returning bogus values for a non-existent term or field. (Robert Muir)

* LUCENE-7996: FunctionQuery and FunctionScoreQuery now return a score of 0
  when the function produces a negative value. (Adrien Grand)

* LUCENE-8116: Similarities now score fields that omit norms as if the norm
  was 1. This might change score values on fields that omit norms. (Adrien Grand)

* LUCENE-8134: Index options are no longer automatically downgraded.
  (Adrien Grand)

* LUCENE-8031: Length normalization correctly reflects omission of term frequencies.
  (Robert Muir, Adrien Grand)

* LUCENE-7444: StandardAnalyzer no longer defaults to removing English stopwords
  (Alan Woodward)

* LUCENE-8060: IndexSearcher's search and searchAfter methods now only compute
  total hit counts accurately up to 1,000 in order to enable top-hits
  optimizations such as block-max WAND (LUCENE-8135). (Adrien Grand)

* LUCENE-8505: IndexWriter#addIndices will now fail if the target index is sorted but
  the candidate is not. (Jim Ferenczi)

* LUCENE-8535: Highlighter and FVH no longer support ToParentBlockJoinQuery and ToChildBlockJoinQuery
  out of the box. In order to highlight on Block-Join Queries a custom WeightedSpanTermExtractor / FieldQuery
  should be used. (Simon Willnauer, Jim Ferenczi, Julie Tibshirani)

* LUCENE-8563: BM25 scores don't include the (k1+1) factor in their numerator
  anymore. This doesn't affect ordering as this is a constant factor which is
  the same for every document. (Luca Cavanna via Adrien Grand)

* LUCENE-8509: WordDelimiterGraphFilter will no longer set the offsets of internal
  tokens by default, preventing a number of bugs when the filter is chained with
  token filters that change the length of their tokens (Alan Woodward)

* LUCENE-8633: IntervalQuery scores do not use term weighting any more; the score
  is instead calculated as a function of the sloppy frequency of the matching
  intervals. (Alan Woodward, Jim Ferenczi)

* LUCENE-8635: FSTs can now remain off-heap, accessed via
  IndexInput, and the default codec's term dictionary
  (BlockTreeTermsReader) will now leave the FST for the terms index
  off-heap for non-primary-key fields using MMapDirectory, reducing
  heap usage for such fields.
  (Ankit Jain)

New Features

* LUCENE-8340: LongPoint#newDistanceFeatureQuery may be used to boost scores based on
  how close a value of a long field is to a configurable origin. This is
  typically useful to boost by recency. (Adrien Grand)

* LUCENE-8482: LatLonPoint#newDistanceFeatureQuery may be used to boost scores
  based on the haversine distance of a LatLonPoint field to a provided point. This is
  typically useful to boost by distance. (Ignacio Vera)

* LUCENE-8216: Added a new BM25FQuery in sandbox to blend statistics across several fields
  using the BM25F formula. (Adrien Grand, Jim Ferenczi)

* LUCENE-8564: GraphTokenFilter is an abstract class useful for token filters that need
  to read ahead in the token stream and take into account graph structures. This
  also changes FixedShingleFilter to extend GraphTokenFilter (Alan Woodward)

* LUCENE-8612: Intervals.extend() treats an interval as if it covered a wider
  span than it actually does, allowing users to force minimum gaps between
  intervals in a phrase. (Alan Woodward)

* LUCENE-8629: New interval functions: Intervals.before(), Intervals.after(),
  Intervals.within() and Intervals.overlapping(). (Alan Woodward)

* LUCENE-8622: Add a minimum-should-match interval function that produces intervals
  spanning a subset of a set of sources. (Alan Woodward)

* LUCENE-8645: Intervals.fixField() allows you to report intervals from one field
  as if they came from another. (Alan Woodward)

* LUCENE-8646: New interval functions: Intervals.prefix() and Intervals.wildcard()
  (Alan Woodward)

* LUCENE-8655: Add a getter in the FunctionScoreQuery class to access the
  underlying DoubleValuesSource.
  (Gérald Quaire via Alan Woodward)

* LUCENE-8697: GraphTokenStreamFiniteStrings correctly handles side paths
  containing gaps (Alan Woodward)

* LUCENE-8702: Simplify intervals returned from vararg Intervals factory methods
  (Alan Woodward)

Improvements

* LUCENE-7997: Add BaseSimilarityTestCase to sanity check similarities.
  SimilarityBase switches to 64-bit doubles internally to help avoid common numeric issues.
  Add missing range checks for similarity parameters.
  Improve BM25 and ClassicSimilarity's explanations. (Robert Muir)

* LUCENE-8011: Improved similarity explanations.
  (Mayya Sharipova via Adrien Grand)

* LUCENE-4198: Codecs now have the ability to index score impacts.
  (Adrien Grand)

* LUCENE-8135: Boolean queries now implement the block-max WAND algorithm in
  order to speed up selection of top scored documents. (Adrien Grand)

* LUCENE-8279: CheckIndex now cross-checks terms with norms. (Adrien Grand)

* LUCENE-8660: TopDocsCollectors now return an accurate count (instead of a lower bound)
  if the total hit count is equal to the provided threshold. (Adrien Grand, Jim Ferenczi)

Optimizations

* LUCENE-8040: Optimize IndexSearcher.collectionStatistics, avoiding MultiFields/MultiTerms
  (David Smiley, Robert Muir)

* LUCENE-4100: Disjunctions now support faster collection of top hits when the
  total hit count is not required. (Stefan Pohl, Adrien Grand, Robert Muir)

* LUCENE-7993: Phrase queries are now faster if total hit counts are not
  required. (Adrien Grand)

* LUCENE-8109: Boolean queries propagate information about the minimum
  competitive score in order to make collection faster if there are disjunctions
  or phrase queries as sub queries, which know how to leverage this information
  to run faster.
  (Adrien Grand)

* LUCENE-8439: Disjunction max queries can skip blocks to select the top documents
  if the total hit count is not required. (Jim Ferenczi, Adrien Grand)

* LUCENE-8204: Boolean queries with a mix of required and optional clauses are
  now faster if the total hit count is not required. (Jim Ferenczi, Adrien Grand)

* LUCENE-8448: Boolean queries now propagate the minimum score to their sub-scorers.
  (Jim Ferenczi, Adrien Grand)

* LUCENE-8511: MultiFields.getIndexedFields is now optimized and no longer calls
  getMergedFieldInfos. (David Smiley)

* LUCENE-8507: TopFieldCollector can now update the minimum competitive score if the primary sort
  is by relevancy and the total hit count is not required. (Jim Ferenczi)

* LUCENE-8464: ConstantScoreScorer now implements setMinCompetitiveScore in order
  to terminate the iterator early if the minimum score is greater than the constant
  score. (Christophe Bismuth via Jim Ferenczi)

* LUCENE-8607: MatchAllDocsQuery can shortcut when total hit count is not
  required (Alan Woodward, Adrien Grand)

* LUCENE-8585: Index-time jump-tables for DocValues, for O(1) advance when retrieving doc values.
  (Toke Eskildsen, Adrien Grand)

======================= Lucene 7.7.2 =======================

Bug fixes

* LUCENE-8726: ValueSource.asDoubleValuesSource() could leak a reference to
  IndexSearcher (Alan Woodward, Yury Pakhomov)

* LUCENE-8735: FilterDirectory.getPendingDeletions now forwards to the delegate
  even though the method is not abstract in the super class. This prevents issues
  where IndexWriter's best-effort attempt to carry file generations forward fails
  because pending deletions are swallowed by the FilterDirectory.
  (Henning Andersen, Simon Willnauer)

* LUCENE-8688: TieredMergePolicy#findForcedMerges now tries to create the
  cheapest merges that allow the index to go down to `maxSegmentCount` segments
  or fewer. (Armin Braun via Adrien Grand)

* LUCENE-8785: Ensure new threadstates are locked before retrieving the number of active threadstates.
  Previously, this could cause assertion errors and potentially broken field attributes in the
  IndexWriter when IndexWriter#deleteAll was called while actively indexing. (Simon Willnauer)

* LUCENE-8720: NameIntCacheLRU (in the facets module) had an int
  overflow bug that disabled cleaning of the cache (Russell A Brown)

* LUCENE-8809: Concurrent refresh and rollback could leave segment states unclosed. (Nhat Nguyen)

======================= Lucene 7.7.1 =======================
(No Changes)

======================= Lucene 7.7.0 =======================

Changes in Runtime Behavior

* LUCENE-8527: StandardTokenizer and UAX29URLEmailTokenizer now support Unicode 9.0,
  and provide Unicode UTS#51 v11.0 Emoji tokenization with the "<EMOJI>" token type.

Build

* LUCENE-8611: Update randomizedtesting to 2.7.2, JUnit to 4.12, add hamcrest-core
  dependency. (Dawid Weiss)

* LUCENE-8537: ant test command fails under lucene/tools (Peter Somogyi)

Bug fixes

* LUCENE-8669: Fix LatLonShape WITHIN queries that fail with multiple search polygons
  that share the dateline. (Nick Knize)

* LUCENE-8603: Fix the inversion of right ids for additional nouns in the Korean user dictionary.
  (Yoo Jeongin via Jim Ferenczi)

* LUCENE-8624: int overflow in ByteBuffersDataOutput.size(). (Mulugeta Mammo,
  Dawid Weiss)

* LUCENE-8625: int overflow in ByteBuffersDataInput.sliceBufferList.
  (Mulugeta Mammo, Dawid Weiss)

* LUCENE-8639: Newly created threadstates while flushing / refreshing can cause duplicated
  sequence IDs on IndexWriter. (Simon Willnauer)

* LUCENE-8649: LatLonShape's within and disjoint queries can return false positives with
  indexed multi-shapes. (Ignacio Vera)

* LUCENE-8654: Polygon2D#relateTriangle returns the wrong answer if the polygon is inside
  the triangle. (Ignacio Vera)

* LUCENE-8650: ConcatenatingTokenStream did not correctly clear its state in reset(), and
  was not propagating final position increments from its child streams correctly.
  (Dan Meehl, Alan Woodward)

* LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused
  by a big buffer (1024 chars). (Jim Ferenczi)

New Features

* LUCENE-8026: ExitableDirectoryReader may now time out queries that run on
  points such as range queries or geo queries.
  (Christophe Bismuth via Adrien Grand)

* LUCENE-8508: IndexWriter can now set the created version via
  IndexWriterConfig#setIndexCreatedVersionMajor. This is an expert feature.
  (Adrien Grand)

* LUCENE-8601: Attributes set in the IndexableFieldType for each field during indexing will
  now be recorded into the corresponding FieldInfo's attributes, accessible at search
  time (Murali Krishna P)

Improvements

* LUCENE-8463: TopFieldCollector can now early-terminate queries when sorting by SortField.DOC.
  (Christophe Bismuth via Jim Ferenczi)

* LUCENE-8562: Speed up merging segments of points with data dimensions by only sorting on the indexed
  dimensions. (Ignacio Vera)

* LUCENE-8529: TopSuggestDocsCollector will now use the completion key to break ties between
  completion suggestions with identical scores. (Jim Ferenczi)

* LUCENE-8575: SegmentInfos#toString now includes attributes and diagnostics.
  (Namgyu Kim via Adrien Grand)

* LUCENE-8548: The KoreanTokenizer no longer splits unknown words on combining diacritics and
  detects script boundaries more accurately with Character.UnicodeScript#of.
  (Christophe Bismuth, Jim Ferenczi)

* LUCENE-8581: Change LatLonShape encoding to use 4 bytes per dimension.
  (Ignacio Vera, Nick Knize, Adrien Grand)

* LUCENE-8527: Upgrade JFlex dependency to 1.7.0; in StandardTokenizer and UAX29URLEmailTokenizer,
  increase supported Unicode version from 6.3 to 9.0, and support Unicode UTS#51 v11.0 Emoji tokenization.

* LUCENE-8640: Date Range format validation (Lucky Sharma, David Smiley via Mikhail Khludnev)

Optimizations

* LUCENE-8552: FieldInfos.getMergedFieldInfos no longer does any merging if there is at most one segment.
  (Christophe Bismuth via David Smiley)

* LUCENE-8590: BufferedUpdates now uses an optimized storage for buffering docvalues updates that
  can save up to 80% of the heap used compared to the previous implementation and uses non-object-based
  data structures. (Simon Willnauer, Mike McCandless, Shai Erera, Adrien Grand)

* LUCENE-8598: Moving to the default accepted overhead ratio for packed ints in DocValuesFieldUpdates
  yields an up to 4x performance improvement when applying doc values updates. (Simon Willnauer, Adrien Grand)

* LUCENE-8599: Use sparse bitset to store docs in SingleValueDocValuesFieldUpdates.
  (Simon Willnauer, Adrien Grand)

* LUCENE-8600: Doc-value updates get applied faster by sorting with quicksort, which needs to
  perform fewer swaps, rather than with an in-place mergesort.
  (Adrien Grand)

* LUCENE-8623: Decrease I/O pressure when merging high dimensional points.
  (Ignacio Vera)

Test Framework

* LUCENE-8604: TestRuleLimitSysouts now has an optional "hard limit" of bytes that can be written
  to stderr and stdout (anything beyond the hard limit is ignored). The default hard limit is 2 GB of
  logs per test class. (Dawid Weiss)

Other

* LUCENE-8573: BKDWriter now uses FutureArrays#mismatch to compute shared prefixes.
  (Christoph Büscher via Adrien Grand)

* LUCENE-8605: Separate bounding box spatial logic from query logic on LatLonShapeBoundingBoxQuery.
  (Ignacio Vera)

* LUCENE-8609: Deprecated IndexWriter#numDocs() and IndexWriter#maxDoc() in favor of IndexWriter#getDocStats(),
  which returns consistent numDocs and maxDoc stats that are not subject to concurrent changes.
  (Simon Willnauer, Nhat Nguyen)

======================= Lucene 7.6.0 =======================

Build

* LUCENE-8504: Upgrade forbiddenapis to version 2.6. (Uwe Schindler)

* LUCENE-8493: Stop publishing insecure .sha1 files with releases (janhoy)

Bug fixes

* LUCENE-8479: QueryBuilder#analyzeGraphPhrase now throws a TooManyClauses exception
  if the number of expanded paths reaches the BooleanQuery#maxClause limit. (Jim Ferenczi)

* LUCENE-8522: throw InvalidShapeException when constructing a polygon and
  all points are coplanar. (Ignacio Vera)

* LUCENE-8531: QueryBuilder#analyzeGraphPhrase now creates one phrase query per finite string
  in the graph if the slop is greater than 0. Span queries cannot be used in this case because
  they don't handle slop the same way as phrase queries. (Steve Rowe, Uwe Schindler, Jim Ferenczi)

* LUCENE-8524: Add the Hangul Letter Araea (interpunct) as a separator in Nori's tokenizer.
  This change also removes empty terms and trims surface forms in Nori's Korean dictionary.
  (Trey Jones, Jim Ferenczi)

* LUCENE-8550: Fix filtering of coplanar points when creating the linked list in the
  polygon tessellator. (Ignacio Vera)

* LUCENE-8549: Polygon tessellator throws an error if some parts of the shape
  could not be processed. (Ignacio Vera)

* LUCENE-8540: Better handling of min/max values for Geo3d encoding. (Ignacio Vera)

* LUCENE-8534: Fix incorrect computation for triangles intersecting polygon edges in
  shape tessellation. (Ignacio Vera)

* LUCENE-8559: Fix bug where polygon edges were skipped when checking for intersections.
  (Ignacio Vera)

* LUCENE-8556: Use latitude and longitude instead of encoded values to check if a triangle is an ear
  when using Morton optimisation. (Ignacio Vera)

* LUCENE-8586: Intervals.or() could get stuck in an infinite loop on certain indexes
  (Alan Woodward)

* LUCENE-8595: Fix interleaved DV update and reset. Interleaving an update and a reset of a value
  for the same doc in the same updates package lost the update if the reset came before
  the update, and lost the reset if the update came first. (Simon Willnauer, Adrien Grand)

* LUCENE-8592: Fix index sorting corruption due to numeric overflow. The merge of sorted segments
  can produce an invalid sort if the sort field is an Integer/Long that uses reverse order and contains
  values equal to Integer/Long#MIN_VALUE. These values are always sorted first during a merge
  (instead of last because of the reverse order) due to this bug. Indices affected by the bug can be
  detected by running the CheckIndex command on a distribution that contains the fix (7.6+).
  (Jim Ferenczi, Adrien Grand, Mike McCandless, Simon Willnauer)

New Features

* LUCENE-8496: Selective indexing: modify BKDReader/BKDWriter to allow users
  to select fewer dimensions for creating the index than the total number of
  dimensions used for field encoding, i.e., dimensions 0 to N
  may be used to determine how to split the inner nodes, and dimensions N+1 to D
  are ignored and stored as data dimensions at the leaves. (Nick Knize)

* LUCENE-8538: Add a Simple WKT Shape Parser for creating Lucene Geometries (Polygon, Line,
  Rectangle) from WKT format. (Nick Knize)

* LUCENE-8462: Adds an Arabic snowball stemmer based on
  https://github.com/snowballstem/snowball/blob/master/algorithms/arabic.sbl
  (Ryadh Dahimene via Jim Ferenczi)

* LUCENE-8554: Add new LatLonShapeLineQuery that queries indexed LatLonShape fields
  by arbitrary lines. (Nick Knize)

* LUCENE-8555: Add dateline crossing support to LatLonShapeBoundingBoxQuery. (Nick Knize)

Improvements

* LUCENE-8521: Change LatLonShape encoding to 7 dimensions instead of 6, where the
  first 4 are index dimensions defining the bounding box of the triangle and the
  remaining 3 data dimensions define the vertices of the triangle. (Nick Knize)

* LUCENE-8557: LeafReader.getFieldInfos is now documented and tested to return
  the same cached instance on each call. MemoryIndex's impl now pre-creates the FieldInfos
  instead of re-calculating a new instance each time. (Tim Underwood, David Smiley)

* LUCENE-8558: Replace O(N) lookup with O(1) lookup in PerFieldMergeState#FilterFieldInfos.
  (Kranthi via Simon Willnauer)

Other

* LUCENE-8523: Correct typo in JapaneseNumberFilterFactory javadocs (Ankush Jhalani
  via Alan Woodward)

* LUCENE-8533: Fix Javadocs of DataInput#readVInt(): Negative numbers are
  supported, but should be avoided. (Vladimir Dolzhenko via Uwe Schindler)

======================= Lucene 7.5.1 =======================

Bug Fixes

* LUCENE-8454: Fix incorrect vertex indexing and other computation errors in
  shape tessellation that would sometimes cause an infinite loop. (Nick Knize)

======================= Lucene 7.5.0 =======================

API Changes

* LUCENE-8467: RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream are deprecated
  (Dawid Weiss)

* LUCENE-8356: StandardFilter is deprecated (Alan Woodward)

* LUCENE-8373: ENGLISH_STOP_WORD_SET on StandardAnalyzer is deprecated. Instead
  use EnglishAnalyzer.ENGLISH_STOP_WORD_SET. The default constructor for
  StopAnalyzer is also deprecated, and a stop word set should be explicitly
  passed to the constructor.
  (Alan Woodward)

* LUCENE-8378: Add DocIdSetIterator.range static method to return an iterator
  matching a range of docids (Mike McCandless)

* LUCENE-8379: Add experimental TermQuery.getTermStates method (Mike McCandless)

* LUCENE-8407: Add experimental SpanTermQuery.getTermStates method (David Smiley)

* LUCENE-8390: MatchesIteratorSupplier replaced by IOSupplier (Alan Woodward,
  David Smiley)

* LUCENE-8397: Add DirectoryTaxonomyWriter.getCache (Mike McCandless)

* LUCENE-8387: Add experimental IndexSearcher.getSlices API to see which slices
  IndexSearcher is searching concurrently when it's created with an ExecutorService
  (Mike McCandless)

* LUCENE-8263: TieredMergePolicy's reclaimDeletesWeight has been replaced with a
  new deletesPctAllowed setting to control how aggressively deletes should be
  reclaimed. (Erick Erickson, Adrien Grand)

* LUCENE-7314: Graduate LatLonPoint and query classes to core (Nick Knize)

* LUCENE-8428: The way that oal.util.PriorityQueue creates sentinel objects has
  been changed from a protected method to a java.util.function.Supplier as a
  constructor argument. (Adrien Grand)

* LUCENE-8437: CheckIndex.Status.cantOpenSegments and missingSegmentVersion
  have been removed as they were not computed correctly. (Adrien Grand)

* LUCENE-8286: The UnifiedHighlighter has a new HighlightFlag.WEIGHT_MATCHES flag that
  will tell this highlighter to use the new MatchesIterator API as the underlying
  approach to navigate matching hits for a query. This mode will highlight more
  accurately than any other highlighter, and can mark up phrases as one span instead of
  word-by-word. The UH's public internal APIs changed a bit in the process.
  (David Smiley)

* LUCENE-8471: IndexWriter.getFlushingBytes() returns how many bytes are currently
  being flushed to disk.
  (Alan Woodward)

* LUCENE-8422: Static helper functions for Matches and MatchesIterator implementations
  have been moved from Matches to MatchesUtils (Alan Woodward)

* LUCENE-8343: Suggesters now require Long (versus long, previously) from the weight() method
  while indexing, and provide double (versus long, previously) scores at lookup time
  (Alessandro Benedetti)

* LUCENE-8459: SearcherTaxonomyManager now has a constructor taking already opened
  IndexReaders, allowing the caller to pass a FilterDirectoryReader, for example.
  (Mike McCandless)

Bug Fixes

* LUCENE-8445: Tighten the condition for deciding whether two planes are identical, to prevent
  constructing bogus tiles when building GeoPolygons. (Ignacio Vera)

* LUCENE-8444: Prevent building functionally identical plane bounds when constructing
  DualCrossingEdgeIterator. (Ignacio Vera)

* LUCENE-8380: UTF8TaxonomyWriterCache inconsistency. (Ruslan Torobaev, Dawid Weiss)

* LUCENE-8164: IndexWriter silently accepts broken payloads. This has been fixed
  via LUCENE-8165 since we are now checking for offset+length going out of bounds.
  (Robert Muir, Nhat Nguyen, Simon Willnauer)

* LUCENE-8370: Fix reproducing
  TestLucene{54,70}DocValuesFormat.testSortedSetVariableLengthBigVsStoredFields()
  failures (Erick Erickson)

* LUCENE-8376, LUCENE-8371: ConditionalTokenFilter.end() would not propagate correctly
  if the last token in the stream was subsequently dropped; FixedShingleFilter did
  not set position increment in end() (Alan Woodward)

* LUCENE-8395: WordDelimiterGraphFilter would incorrectly insert a hole into a
  TokenStream if a token consisting entirely of delimiter characters was
  encountered, but preserve_original was set.
  (Alan Woodward)

* LUCENE-8398: TieredMergePolicy.getMaxMergedSegmentMB had a rounding error (Erick Erickson)

* LUCENE-8429: DaciukMihovAutomatonBuilder now enforces a maximum term length, so it is
  no longer prone to stack overflows. (Adrien Grand)

* LUCENE-8441: IndexWriter now checks the doc values type for index sort fields
  and fails the document if they are not compatible. (Jim Ferenczi, Mike McCandless)

* LUCENE-8458: Adjust initialization condition of PendingSoftDeletes and ensure
  it is initialized before accepting deletes (Simon Willnauer, Nhat Nguyen)

* LUCENE-8466: IndexWriter.deleteDocuments(Query... queries) incorrectly applies deletes on flush
  if the index is sorted. (Adrien Grand, Jim Ferenczi, Vish Ramachandran)

* LUCENE-8502: Allow access to delegate in FilterCodecReader. FilterCodecReader didn't
  allow access to its delegate like other filter readers. This adds a new #getDelegate method
  to access the wrapped reader. (Simon Willnauer)

Changes in Runtime Behavior

* LUCENE-7976: TieredMergePolicy now respects maxSegmentSizeMB by default when executing
  findForcedMerges and findForcedDeletesMerges (Erick Erickson)

* LUCENE-8263: TieredMergePolicy now reclaims deleted documents more
  aggressively by default, ensuring that no more than ~1/3 of the index size is
  used by deleted documents. (Adrien Grand)

* LUCENE-8503: Call #getDelegate instead of direct member access during unwrap.
  Filter*Reader instances accessed the member or the delegate directly instead of
  calling getDelegate(). In order to track access to the delegate, these methods
  should call #getDelegate() (Simon Willnauer)

Improvements

* LUCENE-8468: A ByteBuffer-based Directory implementation. (Dawid Weiss)

* LUCENE-8447: Add DISJOINT and WITHIN support to LatLonShape queries.
  (Nick Knize)

* LUCENE-8440: Add support for indexing and searching Line and Point shapes using LatLonShape encoding (Nick Knize)

* LUCENE-8435: Add new LatLonShapePolygonQuery for querying indexed LatLonShape fields by arbitrary polygons (Nick Knize)

* LUCENE-8367: Make per-dimension drill down optional for each facet dimension (Mike McCandless)

* LUCENE-8396: Add points-based shape indexing and search that decomposes shapes
  into a triangular mesh and indexes individual triangles as a 6-dimensional point (Nick Knize)

* LUCENE-8345, GitHub PR #392: Remove instantiation of redundant wrapper classes for primitives;
  add wrapper class constructors to forbiddenapis. (Michael Braun via Uwe Schindler)

* LUCENE-8415: Clean up Directory contracts and JavaDoc comments. (Dawid Weiss)

* LUCENE-8414: Make segmentInfos private in IndexWriter (Simon Willnauer, Nhat Nguyen)

* LUCENE-8446: The UnifiedHighlighter's DefaultPassageFormatter now treats overlapping matches in
  the passage as merged (as if they were one larger match). (David Smiley)

* LUCENE-8460: Better argument validation in StoredField. (Namgyu Kim)

* LUCENE-8432: TopFieldComparator stops comparing documents if the index is
  sorted, even if hits still need to be visited to compute the hit count.
  (Nikolay Khitrin)

* LUCENE-8422: IntervalQuery now returns useful Matches (Alan Woodward)

* LUCENE-7862: Store the real bounds of the leaf cells in the BKD index when the
  number of dimensions is greater than 1. It improves performance when there is
  correlation between the dimensions, for example ranges. (Ignacio Vera, Adrien Grand)

Build

* LUCENE-5143: Stop publishing KEYS file with each version; use the topmost lucene/KEYS file only.
  The buildAndPushRelease.py script validates that the RM's PGP key is in the KEYS file.
  Remove unused 'copy-to-stage' and '-dist-keys' targets from ant build. (janhoy)

Other

* LUCENE-8485: Update randomizedtesting to version 2.6.4. (Dawid Weiss)

* LUCENE-8366: Upgrade to ICU 62.1. Emoji handling now uses Unicode 11's
  Extended_Pictographic property. (Robert Muir)

* LUCENE-8408: original Highlighter: Remove obsolete static AttributeFactory instance
  in TokenStreamFromTermVector. (Michael Braun, David Smiley)

* LUCENE-8420: Upgrade OpenNLP to 1.9.0 so the OpenNLP tools can read the new model format, which 1.8.x
  cannot read. 1.9.0 can read the old format. (Koji Sekiguchi)

* LUCENE-8453: Add documentation to analysis factories of Korean (Nori) analyzer
  module. (Tomoko Uchida via Uwe Schindler)

* LUCENE-8455: Upgrade ECJ compiler to 4.6.1 in lucene/common-build.xml (Erick Erickson)

* LUCENE-8456: Upgrade Apache Commons Compress to v1.18 (Steve Rowe)

* LUCENE-765: Improved org.apache.lucene.index javadocs. (Mike Sokolov)

* LUCENE-8476: Remove redundant nullity check and switch to optimized List.sort in the
  Korean user dictionary. (Namgyu Kim)

======================= Lucene 7.4.1 =======================

Bug Fixes

 * LUCENE-8365: Fix ArrayIndexOutOfBoundsException in UnifiedHighlighter. This fixes
   an "off by one" error in the UnifiedHighlighter's code that is only triggered when
   two nested SpanNearQueries contain the same term. (Marc-Andre Morissette via Simon Willnauer)

 * LUCENE-8381: Fix IndexWriter incorrectly interpreting hard-deletes as soft-deletes
   while wrapping readers for merges. (Simon Willnauer, Nhat Nguyen)

 * LUCENE-8384: Fix missing advancement of the docValues generation while handling docValues
   updates in PendingSoftDeletes. (Simon Willnauer, Nhat Nguyen)

 * LUCENE-8472: Always rewrite the soft-deletes merge retention query.
   (Adrien Grand, Nhat Nguyen)

======================= Lucene 7.4.0 =======================

Upgrading

* LUCENE-8344: If you are using the AnalyzingSuggester or FuzzySuggester subclass, and if you
  explicitly use the preservePositionIncrements=false setting (not the default), then you ought
  to rebuild your suggester index. If you don't, queries or indexed data with trailing position
  gaps (e.g. stop words) may not work correctly. (David Smiley, Jim Ferenczi)

API Changes

* LUCENE-8242: IndexSearcher.createNormalizedWeight() has been deprecated.
  Instead use IndexSearcher.createWeight(), rewriting the query first.
  (Alan Woodward)

* LUCENE-8248: MergePolicyWrapper is renamed to FilterMergePolicy and now
  also overrides getMaxCFSSegmentSizeMB (Mike Sokolov via Mike McCandless)

* LUCENE-8303: LiveDocsFormat is now only responsible for (de)serialization of
  live docs. (Adrien Grand)

Changes in Runtime Behavior

* LUCENE-8309: Live docs are no longer backed by a FixedBitSet. (Adrien Grand)

* LUCENE-8330: Detach IndexWriter from MergePolicy. Instead of requiring IndexWriter
  as a hard dependency, MergePolicy now expects a MergeContext, which
  IndexWriter implements. (Simon Willnauer, Robert Muir, Dawid Weiss, Mike McCandless)

New Features

* LUCENE-8200: Allow doc-values to be updated atomically together
  with a document. Doc-Values updates can now be used as a soft-delete
  mechanism that allows keeping several versions of a document, or already
  deleted documents, around for later reuse. See "IW.softUpdateDocument(...)"
  for reference. (Simon Willnauer)

* LUCENE-8197: A new FeatureField makes it easy and efficient to integrate
  static relevance signals into the final score.
  (Adrien Grand, Robert Muir)

* LUCENE-8202: Add a FixedShingleFilter (Alan Woodward, Adrien Grand, Jim
  Ferenczi)

* LUCENE-8125: ICUTokenizer support for emoji/emoji sequence tokens. (Robert Muir)

* LUCENE-8196, LUCENE-8300: A new IntervalQuery in the sandbox allows efficient proximity
  searches based on minimum-interval semantics. (Alan Woodward, Adrien Grand,
  Jim Ferenczi, Simon Willnauer, Matt Weber)

* LUCENE-8233: Add support for soft deletes to IndexWriter delete accounting.
  Soft deletes are accounted for inside the index writer and therefore also
  by merge policies. A SoftDeletesRetentionMergePolicy is added that allows
  selectively carrying over soft-deleted documents across merges for retention
  policies (Simon Willnauer, Mike McCandless, Robert Muir)

* LUCENE-8237: Add a SoftDeletesDirectoryReaderWrapper that respects
  soft deletes if the reader is opened from a directory. (Simon Willnauer,
  Mike McCandless, Uwe Schindler, Adrien Grand)

* LUCENE-8229, LUCENE-8270: Add a method Weight.matches(LeafReaderContext, doc)
  that returns an iterator over matching positions for a given query and document.
  This allows exact hit extraction and will enable implementation of accurate
  highlighters. (Alan Woodward, Adrien Grand, David Smiley)

* LUCENE-8249: Implement Matches API for phrase queries (Alan Woodward, Adrien
  Grand)

* LUCENE-8246: Allow customizing the number of deletes a merge claims. This
  helps merge policies in the soft-delete case to correctly implement retention
  policies without triggering unnecessary merges. (Simon Willnauer, Mike McCandless)

* LUCENE-8231: A new analysis module (nori), similar to Kuromoji, that handles
  Korean using mecab-ko-dic and morphological analysis.
  (Robert Muir, Jim Ferenczi)

* LUCENE-8265: WordDelimiter/GraphFilter now have an option to skip tokens
  marked with KeywordAttribute (Mike Sokolov via Mike McCandless)

* LUCENE-8297: Add IW#tryUpdateDocValues(Reader, int, Fields...). IndexWriter can
  update doc values for a specific term, but this might affect all documents
  containing the term. With tryUpdateDocValues, users can update doc-values
  fields for individual documents. This allows, for instance, soft-deleting
  individual documents. (Simon Willnauer)

* LUCENE-8298: Allow DocValues updates to reset a value. Passing a DV field with a null
  value to IW#updateDocValues or IW#tryUpdateDocValues will now remove the value from the
  provided document. This allows undeleting a soft-deleted document unless it has been claimed
  by a merge. (Simon Willnauer)

* LUCENE-8273: ConditionalTokenFilter allows analysis chains to skip particular token
  filters based on the attributes of the current token. This generalises the keyword
  token logic currently used for stemmers and WDF. It is integrated into
  CustomAnalyzer by using the `when` and `whenTerm` builder methods, and a new
  ProtectedTermFilter is added as an example. (Alan Woodward, Robert Muir,
  David Smiley, Steve Rowe, Mike Sokolov)

* LUCENE-8310: Ensure IndexFileDeleter accounts for pending deletes. Previously, creating
  the IndexWriter failed when the directory had a pending delete; this was mainly done
  to prevent writing still-existing files more than once. IndexFileDeleter already
  accounts for that for existing files, and it now also takes pending deletes into
  account, which ensures that all file generations per segment always go forward.
  (Simon Willnauer)

* LUCENE-7960: Add preserveOriginal option to the NGram and EdgeNGram filters.
  (Ingomar Wesp, Shawn Heisey via Robert Muir)

* LUCENE-8335: Enforce soft-deletes field up-front. The soft-deletes field must be marked
  as such once it's introduced and can't be changed after the fact.
  (Nhat Nguyen via Simon Willnauer)

* LUCENE-8332: New ConcatenateGraphFilter for concatenating all tokens into one (or more
  in the event of a graph input). This is useful for fast analyzed exact-match lookup,
  suggesters, and as a component of a named entity recognition system. This was excised
  from CompletionTokenStream in the NRT doc suggester. (David Smiley, Jim Ferenczi)

Bug Fixes

* LUCENE-8221: MoreLikeThis.setMaxDocFreqPct can easily int-overflow on larger
  indexes.

* LUCENE-8266: Detect bogus tiles when creating a standard polygon and
  throw a TileException. (Ignacio Vera)

* LUCENE-8234: Fixed a bug in how the spatial relationship is computed for
  GeoStandardCircle when it covers the whole world. (Ignacio Vera)

* LUCENE-8236: Filter duplicated points when creating GeoPath shapes to
  avoid creation of bogus planes. (Ignacio Vera)

* LUCENE-8243: IndexWriter.addIndexes(Directory[]) did not properly preserve
  index file names for updated doc values fields (Simon Willnauer,
  Michael McCandless, Nhat Nguyen)

* LUCENE-8275: Push up #checkPendingDeletes to Directory to ensure IW fails if
  the directory has pending file deletes, even if the directory is filtered or
  a FileSwitchDirectory (Simon Willnauer, Robert Muir)

* LUCENE-8244: Do not leak open file descriptors in SearcherTaxonomyManager's
  refresh on exception (Mike McCandless)

* LUCENE-8305: ComplexPhraseQuery.rewrite now handles an embedded MultiTermQuery
  that rewrites to a MatchNoDocsQuery instead of throwing an exception.
  (Bjarke Mortensen, Andy Tran via David Smiley)

* LUCENE-8287: Ensure that empty regex completion queries always return no results.
  (Julie Tibshirani via Jim Ferenczi)

* LUCENE-8317: Prevent concurrent deletes from being applied during full flush.
  Future deletes could potentially be exposed to flushes/commits/refreshes if the
  amount of RAM used by deletes is greater than half of the IW RAM buffer. (Simon Willnauer)

* LUCENE-8320: Fix WindowsFS to correctly account for renames and hard links.
  (Simon Willnauer, Nhat Nguyen)

* LUCENE-8328: Ensure ReadersAndUpdates consistently executes under lock.
  (Nhat Nguyen via Simon Willnauer)

* LUCENE-8325: Fixed the smartcn tokenizer to not split UTF-16 surrogate pairs.
  (chengpohi via Jim Ferenczi)

* LUCENE-8186: LowerCaseTokenizerFactory now lowercases text in multi-term
  queries. (Tim Allison via Adrien Grand)

* LUCENE-8278: Some end-of-input no-scheme domain-only URL tokens are typed as
  <ALPHANUM> rather than <URL>. (Junte Zhang, Steve Rowe)

* LUCENE-8355: Prevent IW from opening an already dropped segment while DV updates
  are written. (Nhat Nguyen via Simon Willnauer)

* LUCENE-8344: TokenStreamToAutomaton (used by some suggesters) was not ignoring a trailing
  position increment when the preservePositionIncrements setting is false.
  (David Smiley, Jim Ferenczi)

* LUCENE-8357: FunctionScoreQuery.boostByQuery() and boostByValue() were
  producing truncated Explanations (Markus Jelsma, Alan Woodward)

* LUCENE-8360: NGramTokenFilter and EdgeNGramTokenFilter did not correctly
  set position increments in end() (Alan Woodward)

Other

* LUCENE-8301: Update randomizedtesting to 2.6.0. (Dawid Weiss)

* LUCENE-8299: Geo3D wrapper uses a new polygon factory method that gives better
  support for polygons with many points (>100).
  (Ignacio Vera)

* LUCENE-8261: InterpolatedProperties.interpolate and recursive property
  references. (Steve Rowe, Dawid Weiss)

* LUCENE-8228: Removed obsolete IndexDeletionPolicy clone() requirements from
  the javadoc. (Dawid Weiss)

* LUCENE-8219: Use a realistic estimate of the number of nodes and links in
  LevensteinAutomaton.java, to save reallocation of arrays.
  (Christian Ziech)

* LUCENE-8214: Improve selection of testPoint for GeoComplexPolygon.
  (Ignacio Vera)

* SOLR-10912: Add automatic patch validation. (Mano Kovacs, Steve Rowe)

* LUCENE-8122, LUCENE-8175: Upgrade analysis/icu to ICU 61.1.
  (Robert Muir, Adrien Grand, Uwe Schindler)

* LUCENE-8291: Remove QueryTemplateManager utility class from XML queryparser.
  This class is just a general XML transforming tool (using property files and
  XSLT) and has nothing to do with query parsing. It can easily be implemented
  using more sophisticated libraries or using XSL transformers from the JDK.
  This change also removes the Lucene demo webapp to prevent XSS issues in
  untested/unmaintained code. (Uwe Schindler)

Build

* LUCENE-7935: Publish .sha512 hash files with the release artifacts and stop
  publishing .md5 hashes since the algorithm is broken (janhoy)

* LUCENE-8230: Upgrade forbiddenapis to version 2.5.
  (Uwe Schindler)

Documentation

* LUCENE-8238: Improve WordDelimiterFilter and WordDelimiterGraphFilter javadocs
  (Mike Sokolov via Mike McCandless)

======================= Lucene 7.3.1 =======================

Bug Fixes

* LUCENE-8254: LRUQueryCache could cause an IndexReader to hang on close when
  shared with another reader with no CacheHelper (Alan Woodward, Simon Willnauer,
  Adrien Grand)

======================= Lucene 7.3.0 =======================

API Changes

* LUCENE-8051: LevensteinDistance renamed to LevenshteinDistance.
  (Pulak Ghosh via Adrien Grand)

* LUCENE-8099: Deprecate CustomScoreQuery, BoostedQuery and BoostingQuery.
  Users should instead use FunctionScoreQuery, possibly combined with
  a Lucene expression (Alan Woodward)

* LUCENE-8104: Remove facets module compile-time dependency on queries
  (Alan Woodward)

* LUCENE-8145: UnifiedHighlighter now uses a unitary OffsetsEnum rather
  than a list of enums (Alan Woodward, David Smiley, Jim Ferenczi, Timothy
  Rodriguez)

New Features

* LUCENE-2899: Add new module analysis/opennlp, with analysis components
  to perform tokenization, part-of-speech tagging, lemmatization and phrase
  chunking by invoking the corresponding OpenNLP tools. Named entity
  recognition is also provided as a Solr update request processor.
  (Lance Norskog, Grant Ingersoll, Joern Kottmann, Em, Kai Gülzau,
  Rene Nederhand, Robert Muir, Steven Bower, Steve Rowe)

* LUCENE-8126: Add new spatial prefix tree (SPT) based on Google S2 geometry.
  It can currently only be used with the Geo3D spatial context and it provides
  improvements on indexing time for non-points shapes and on query performance.
  (Ignacio Vera, David Smiley)
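
LUCENE-8051 above only renames the class. As a reading aid, here is a minimal Java sketch of the kind of normalized edit-distance score such a class computes. It assumes the score is 1 - distance / max(length of either string). The names LevenshteinSketch, editDistance and similarity are hypothetical, and this is not Lucene's LevenshteinDistance source.

```java
// Hypothetical illustration (not Lucene code): normalized Levenshtein similarity,
// assuming the score is 1 - editDistance / max(len(a), len(b)).
final class LevenshteinSketch {

  /** Classic dynamic-programming edit distance that keeps only two rows. */
  static int editDistance(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j; // turning "" into b[0..j) costs j insertions
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i; // turning a[0..i) into "" costs i deletions
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        int insert = curr[j - 1] + 1;
        int delete = prev[j] + 1;
        int substitute = prev[j - 1] + cost;
        curr[j] = Math.min(Math.min(insert, delete), substitute);
      }
      int[] tmp = prev; // the row just filled becomes "previous" for the next i
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  /** Similarity in [0, 1]; 1.0 means the strings are identical. */
  static float similarity(String a, String b) {
    int maxLen = Math.max(a.length(), b.length());
    if (maxLen == 0) {
      return 1.0f; // two empty strings are treated as identical
    }
    return 1.0f - (float) editDistance(a, b) / maxLen;
  }

  public static void main(String[] args) {
    System.out.println(editDistance("kitten", "sitting")); // 3
    System.out.println(similarity("kitten", "sitting"));
  }
}
```

Only two rows of the DP table are kept, so memory is O(len(b)) rather than O(len(a) * len(b)).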

Improvements

* LUCENE-8081: Allow IndexWriter to opt out of flushing on indexing threads.
  Indexing/update threads try to help out by flushing pending document buffers to
  disk. This change adds an expert setting to opt out of this behavior unless
  flushing is falling behind. (Simon Willnauer)

* LUCENE-8086: spatial-extras Geo3dFactory: Use GeoExactCircle with
  configurable precision for non-spherical planet models.
  (Ignacio Vera via David Smiley)

* LUCENE-8093: TrimFilterFactory implements MultiTermAwareComponent (Alan Woodward)

* LUCENE-8094: TermInSetQuery.toString now returns "field:(A B C)" (Mike McCandless)

* LUCENE-8121: UnifiedHighlighter passage relevancy is improved for terms that are
  position sensitive (e.g. part of a phrase) by having an accurate freq.
  (David Smiley)

* LUCENE-8129: A Unicode set filter can now be specified when using ICUFoldingFilter.
  (Ere Maijala)

* LUCENE-7966: Build Multi-Release JARs to enable usage of optimized intrinsic methods
  from Java 9 for index bounds checking and array comparison/mismatch. This change
  introduces Java 8 replacements for those Java 9 methods and patches the compiled
  classes to use the optimized variants through the MR-JAR mechanism.
  (Uwe Schindler, Robert Muir, Adrien Grand, Mike McCandless)

* LUCENE-8127: Speed up rewriteNoScoring when there are no MUST clauses.
  (Michael Braun via Adrien Grand)

* LUCENE-8152: Improve consumption of doc-value iterators. (Horatiu Lazu via
  Adrien Grand)

* LUCENE-8033: FieldInfos now always use a dense encoding. (Mayya Sharipova
  via Adrien Grand)

* LUCENE-8190: Specialized cell interface to allow any spatial prefix tree to
  benefit from the setting setPruneLeafyBranches on RecursivePrefixTreeStrategy.
  (Ignacio Vera)

Bug Fixes

* LUCENE-8077: Fixed bug in how CheckIndex verifies doc-value iterators.
  (Xiaoshan Sun via Adrien Grand)

* SOLR-11758: Fixed FloatDocValues.boolVal to correctly return true for all values != 0.0F
  (Munendra S N via hossman)

* LUCENE-8121: The UnifiedHighlighter would highlight some terms within some nested
  SpanNearQueries at positions where it should not have. This is fixed in the UH by
  switching to the SpanCollector API. The original Highlighter still has this
  problem (LUCENE-2287, LUCENE-5455, LUCENE-6796). Some public but internal parts of
  the UH were refactored. (David Smiley, Steve Davids)

* LUCENE-8120: Fix LatLonBoundingBox's toString() method (Martijn van Groningen, Adrien Grand)

* LUCENE-8130: Fix NullPointerException from TermStates.toString() (Mike McCandless)

* LUCENE-8124: Fixed HyphenationCompoundWordTokenFilter to correctly handle
  hyphenation patterns with indicator >= 7. (Holger Bruch via Adrien Grand)

* LUCENE-8163: BaseDirectoryTestCase could produce random filenames that fail
  on Windows (Alan Woodward)

* LUCENE-8174: Fixed {Float,Double,Int,Long}Range.toString(). (Oliver Kaleske
  via Adrien Grand)

* LUCENE-8182: Fixed BoostingQuery to apply the context boost instead of the parent query
  boost (Jim Ferenczi)

* LUCENE-8188: Fixed bugs in OpenNLPOpsFactory that were causing InputStreams fetched from the
  ResourceLoader to be leaked (hossman)


Other

* LUCENE-8111: IndexOrDocValuesQuery Javadoc references outdated method name.
  (Kai Chan via Adrien Grand)

* LUCENE-8106: Add script (reproduceJenkinsFailures.py) to attempt to reproduce
  failing tests from a Jenkins log. (Steve Rowe)

* LUCENE-8075: Removed unnecessary null check in IntersectTermsEnum.
  (Pulak Ghosh via Adrien Grand)

* LUCENE-8156: Require users to not have ASM on the Ant classpath during build.
  This is required by LUCENE-7966. (Adrien Grand, Uwe Schindler)

* LUCENE-8161: spatial-extras: the Spatial4j dependency has been updated from 0.6 to 0.7,
  which is drop-in compatible (Lucene doesn't expressly use any of the few API differences).
  Spatial4j 0.7 is compatible with JTS 1.15.0 and not any prior version. JTS 1.15.0 is
  dual-licensed to include BSD; prior versions were LGPL. (David Smiley)

* LUCENE-8155: Add back support in smoke tester to run against later Java versions.
  (Uwe Schindler)

* LUCENE-8169: Migrated build to use OpenClover 4.2.1 for checking code coverage.
  (Uwe Schindler)

* LUCENE-8170: Improve OpenClover reports (separate test from production code);
  enable coverage reports inside test-frameworks. (Uwe Schindler)

Build

* LUCENE-8168: Moved Groovy scripts in build files to separate files.
  Update Groovy to 2.4.13. (Uwe Schindler)

* LUCENE-8176: HttpReplicatorTest awaits more than a minute for stopping Jetty threads
  (Mikhail Khludnev)

======================= Lucene 7.2.1 =======================

Bug Fixes

* LUCENE-8117: Fix advanceExact on SortedNumericDocValues produced by Lucene54DocValues. (Jim Ferenczi)

======================= Lucene 7.2.0 =======================

API Changes

* LUCENE-8017, LUCENE-8042: Weight, DoubleValuesSource and related objects
  now implement a SegmentCacheable interface, with a single method
  isCacheable(LeafReaderContext) determining whether or not the object may
  be cached against a LeafReader. (Alan Woodward, Robert Muir)

* LUCENE-8038: Payload factors for scoring in PayloadScoreQuery are now
  calculated by a PayloadDecoder, instead of delegating to the Similarity.
  (Alan Woodward)

* LUCENE-8014: Similarity.computeSlopFactor() and
  Similarity.computePayloadFactor() have been deprecated. (Alan Woodward)

* LUCENE-6278: Scorer.freq() has been removed (Alan Woodward)

* LUCENE-7736: DoubleValuesSource and LongValuesSource now expose a
  rewrite(IndexSearcher) function. (Alan Woodward)

* LUCENE-7998: DoubleValuesSource.fromQuery() allows you to use the scores
  from a Query as a DoubleValuesSource. (Alan Woodward)

* LUCENE-8049: IndexWriter.getMergingSegments()'s return type was changed from
  Collection to Set to more accurately reflect its nature. (David Smiley)

* LUCENE-8059: TopFieldDocCollector can now terminate collection early when
  the sort order is compatible with the index order. As a consequence,
  EarlyTerminatingSortingCollector is now deprecated. (Adrien Grand)

New Features

* LUCENE-8061: Add convenience factory methods to create BBoxes and XYZSolids
  directly from bounds objects.

* LUCENE-7736: IndexReaderFunctions expose various IndexReader statistics as
  DoubleValuesSources. (Alan Woodward)

* LUCENE-8068: Allow IndexWriter to write a single DWPT to disk. Adds a
  flushNextBuffer method to IndexWriter that allows the caller to
  synchronously move the next pending or the biggest non-pending index buffer to
  disk. This enables flushing a selected buffer to disk without hijacking an
  indexing thread. This is for instance useful if more than one IW (shards) must
  be maintained in a single JVM / system. (Simon Willnauer)

Bug Fixes

* LUCENE-8076: Normalize Vincenty distance calculation for planet models that aren't normalized.
  (Ignacio Vera)

* LUCENE-8057: Exact circle bounds computation was incorrect.
  (Ignacio Vera)

* LUCENE-8056: Exact circle segment bounding suffered from precision errors.
  (Karl Wright)

* LUCENE-8054: Fix the exact circle case where relationships fail when the
  planet model has c <= ab, because the planes are constructed incorrectly.
  (Ignacio Vera)

* LUCENE-7991: KNearestNeighborDocumentClassifier.knnSearch no longer applies
  a previous boosted field's factor to subsequent unboosted fields.
  (Christine Poerschke)

* LUCENE-7999: Switch from int to long to track the name for the next
  segment to write, so that very long-lived indices with very frequent
  refreshes or commits, and high indexing thread counts, do not
  overflow an int (Mykhailo Demianenko via Mike McCandless)

* LUCENE-8025: Use sumTotalTermFreq=sumDocFreq when scoring DOCS_ONLY fields
  that omit term frequency information, as it is equivalent in that case.
  Previously bogus numbers were used, and many similarities would
  completely degrade. (Robert Muir, Adrien Grand)

* LUCENE-8045: ParallelLeafReader did not correctly report FieldInfo.dvGen
  (Alan Woodward)

* LUCENE-8034: Use subtraction instead of addition to sidestep int
  overflow in SpanNotQuery. (Hari Menon via Mike McCandless)

* LUCENE-8078: The query cache should not cache instances of
  MatchNoDocsQuery. (Jon Harper via Adrien Grand)

* LUCENE-8048: Filesystems do not guarantee order of directory updates
  (Nikolay Martynov, Simon Willnauer, Erick Erickson)

Optimizations

* LUCENE-8018: Smaller FieldInfos memory footprint by not retaining unnecessary
  references to TreeMap entries. (Julian Vassev via Adrien Grand)

* LUCENE-7994: Use int/int scatter map to gather facet counts when the
  number of hits is small relative to the number of unique facet labels
  (Dawid Weiss, Robert Muir, Mike McCandless)

* LUCENE-8062: GlobalOrdinalsQuery is no longer eligible for caching.
  (Jim Ferenczi)

* LUCENE-8058: Large instances of TermInSetQuery are no longer eligible for
  caching as they could break memory accounting of the query cache.
  (Adrien Grand)

* LUCENE-8055: MemoryIndex.MemoryDocValuesIterator returned 2 documents
  instead of 1. (Simon Willnauer)

* LUCENE-8043: Fix document accounting in IndexWriter to prevent writing too many
  documents. Once this happens, Lucene refuses to open the index and throws a
  CorruptIndexException. (Simon Willnauer, Yonik Seeley, Mike McCandless)

Tests

* LUCENE-8035: Run tests with JDK-specific options: --illegal-access=deny
  on Java 9+. (Uwe Schindler)

Build

* LUCENE-6144: Upgrade Ivy to 2.4.0; 'ant ivy-bootstrap' now removes old Ivy
  jars in ~/.ant/lib/. (Shawn Heisey, Steve Rowe)


======================= Lucene 7.1.0 =======================

Changes in Runtime Behavior

* Resolving of external entities in queryparser/xml/CoreParser is disallowed
  by default. See SOLR-11477 for details.

New Features

* LUCENE-7970: Add a shape to Geo3D that consists of multiple planes that
  approximate a true circle, rather than an ellipse, for non-spherical planet models.
  (Karl Wright, Ignacio Vera)

* LUCENE-7955: Add support for the concept of "nearest distance" to Geo3D's
  GeoPath abstraction, which is the distance along the path to the point that is
  closest to the provided point. (Karl Wright)

* LUCENE-7906: Add spatial relationships between all currently-defined Geo shapes.
  (Ignacio Vera)

* LUCENE-7955: Add support for zero-width paths. (Karl Wright)

* LUCENE-7936: Add serialization and deserialization support to Geo3D. (Karl Wright,
  Ignacio Vera)

* LUCENE-7942: Distance computations now have the ability to accurately aggregate
  distances, rather than just doing sums.
  (Karl Wright)

* LUCENE-7934: Add a planet model interface. (Karl Wright)

* LUCENE-7918: Revamp the API for composites so that it's generic and can be used
  for many kinds of shapes. (Ignacio Vera)

* LUCENE-7621: Add CoveringQuery, a query whose required number of matching
  clauses can be defined per document. (Adrien Grand)

* LUCENE-7927: Add LongValueFacetCounts, to compute facet counts for individual
  numeric values (Mike McCandless)

* LUCENE-7940: Add BengaliAnalyzer. (Md. Abdulla-Al-Sun via Robert Muir)

* LUCENE-7392: Add point-based LatLonBoundingBox as a new RangeField type.
  (Nick Knize)

* LUCENE-7951: Spatial-extras has much better Geo3d support by implementing Spatial4j
  abstractions: SpatialContextFactory, ShapeFactory, BinaryCodec, DistanceCalculator.
  (Ignacio Vera, David Smiley)

* LUCENE-7973: Update dictionary version for Ukrainian analyzer to 3.9.0 (Andriy
  Rysin via Dawid Weiss)

* LUCENE-7974: Add FloatPointNearestNeighbor, an N-dimensional FloatPoint
  K-nearest-neighbor search implementation. (Steve Rowe)

* LUCENE-7975: Change the default taxonomy facets cache to a faster
  byte[] (UTF-8) based cache. (Mike McCandless)

* LUCENE-7972: DirectoryTaxonomyReader, in Lucene's facet module, now
  implements Accountable, so you can more easily track how much heap
  it's using. (Mike McCandless)

* LUCENE-7982: A new NormsFieldExistsQuery matches documents that have
  norms in a specified field (Colin Goodheart-Smithe via Mike McCandless)

Optimizations

* LUCENE-7905: Optimize how OrdinalMap (used by
  SortedSetDocValuesFacetCounts and others) builds its map (Robert
  Muir, Adrien Grand, Mike McCandless)

* LUCENE-7655: Speed up geo-distance queries in case of dense single-valued
  fields when most documents match.
  (Maciej Zasada via Adrien Grand)

* LUCENE-7897: IndexOrDocValuesQuery now requires the range cost to be more
  than 8x greater than the cost of the lead iterator in order to use doc values.
  (Murali Krishna P via Adrien Grand)

* LUCENE-7925: Collapse duplicate SHOULD or MUST clauses by summing up their
  boosts. (Adrien Grand)

* LUCENE-7939: MinShouldMatchSumScorer now leverages two-phase iteration in
  order to be faster when used in conjunctions. (Adrien Grand)

* LUCENE-7827: AnalyzingInfixSuggester doesn't create "textgrams"
  when minPrefixChar=0 (Mikhail Khludnev)

Bug Fixes

* LUCENE-8066: It was still possible to construct a concave GeoExactCircle, so use
  a sector approach to prevent that. (Ignacio Vera)

* LUCENE-7967: The GeoDegeneratePoint isWithin() method needed allowance for
  numerical precision. (Karl Wright)

* LUCENE-7965: GeoBBoxFactory was constructing the wrong shape at the poles
  if the longitude span was greater than 180 degrees. (Karl Wright)

* LUCENE-7916: Prevent ArrayIndexOutOfBoundsException if ICUTokenizer is used
  with a different ICU JAR version than it is compiled against. Note: this is
  not recommended, as lucene-analyzers-icu contains binary data structures
  specific to the ICU/Unicode versions it is built against. (Chris Koenig, Robert Muir)

* LUCENE-7891: Lucene's taxonomy facets now use a non-buggy LRU cache
  by default. (Jan-Willem van den Broek via Mike McCandless)

* LUCENE-7959: Improve NativeFSLockFactory's exception message if it cannot create
  write.lock for an empty index due to bad permissions/read-only filesystem/etc.
  (Erick Erickson, Shawn Heisey, Robert Muir)

* LUCENE-7968: AnalyzingSuggester would sometimes order suggestions incorrectly;
  it did not properly break ties on the surface forms when both the weights and
  the analyzed forms were equal.
  (Robert Muir)

* LUCENE-7957: ConjunctionScorer.getChildren was failing to return all
  child scorers (Adrien Grand, Mike McCandless)

* SOLR-11477: Disallow resolving of external entities in queryparser/xml/CoreParser
  by default. (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)

Build

* SOLR-11181: Switch order of maven artifact publishing procedure: deploy first
  instead of locally installing first, to work around a double repository push of
  *-sources.jar and *-javadoc.jar files. (Lynn Monson via Steve Rowe)

* LUCENE-6673: Maven build fails for target javadoc:jar.
  (Ramkumar Aiyengar, Daniel Collins via Steve Rowe)

* LUCENE-7985: Upgrade forbiddenapis to 2.4.1. (Uwe Schindler)

Other

* LUCENE-7948, LUCENE-7937: Upgrade randomizedtesting to 2.5.3 (minor fixes
  in test filtering for IDEs). (Mike Sokolov, Dawid Weiss)

* LUCENE-7933: LongBitSet now validates the numBits parameter (Won
  Jonghoon, Mike McCandless)

* LUCENE-7978: Add some more documentation about setting up the build
  environment. (Anton R. Yuste via Uwe Schindler)

* LUCENE-7983: IndexWriter.IndexReaderWarmer is now a functional interface
  instead of an abstract class with a single method (Dawid Weiss)

* LUCENE-5753: Update TLDs recognized by UAX29URLEmailTokenizer. (Steve Rowe)


======================= Lucene 7.0.1 =======================

Bug Fixes

* LUCENE-7957: ConjunctionScorer.getChildren was failing to return all
  child scorers (Adrien Grand, Mike McCandless)

======================= Lucene 7.0.0 =======================

New Features

* LUCENE-7703: SegmentInfos now record the major Lucene version at index
  creation time.
  (Adrien Grand)

* LUCENE-7756: LeafReader.getMetaData now exposes the index created version as
  well as the oldest Lucene version that contributed to the segment.
  (Adrien Grand)

* LUCENE-7854: The new TermFrequencyAttribute used during analysis
  with a custom token stream allows indexing custom term frequencies
  (Mike McCandless)

* LUCENE-7866: Add a new DelimitedTermFrequencyTokenFilter that allows
  marking tokens with a custom term frequency (LUCENE-7854). It parses a numeric
  value after a separator char ('|') at the end of each token and changes
  the term frequency to this value. (Uwe Schindler, Robert Muir, Mike
  McCandless)

* LUCENE-7868: Multiple threads can now resolve deletes and doc values
  updates concurrently, giving sizable speedups in update-heavy
  indexing use cases (Simon Willnauer, Mike McCandless)

* LUCENE-7823: Pure query-based naive Bayes classifier using BM25 scores (Tommaso Teofili)

* LUCENE-7838: Knn classifier based on fuzzified term queries (Tommaso Teofili)

* LUCENE-7855: Added advanced options of the Wikipedia tokenizer to its factory.
  (Juan Pedro via Adrien Grand)

API Changes

* LUCENE-2605: Classic QueryParser no longer splits on whitespace by default.
  Use setSplitOnWhitespace(true) to get the old behavior. (Steve Rowe)

* LUCENE-7369: Similarity.coord and BooleanQuery.disableCoord are removed.
  (Adrien Grand)

* LUCENE-7368: Removed query normalization. (Adrien Grand)

* LUCENE-7355: AnalyzingQueryParser has been removed as its functionality has
  been folded into the classic QueryParser. (Adrien Grand)

* LUCENE-7407: Doc values APIs have been switched from random access
  to iterators, enabling future codec compression improvements. (Mike
  McCandless)

* LUCENE-7475: Norms now support sparsity, allowing users to pay only for what is
  actually used.
  (Adrien Grand)

* LUCENE-7494: Points now have a per-field API, like doc values. (Adrien Grand)

* LUCENE-7410: Cache keys and close listeners have been refactored in order
  to be less trappy. See IndexReader.getReaderCacheHelper and
  LeafReader.getCoreCacheHelper. (Adrien Grand)

* LUCENE-6819: Index-time boosts are not supported anymore. As a replacement,
  index-time scoring factors should be indexed into a doc value field and
  combined at query time using e.g. FunctionScoreQuery. (Adrien Grand)

* LUCENE-7734: FieldType's copy constructor was widened to accept any IndexableFieldType.
  (David Smiley)

* LUCENE-7701: Grouping collectors have been refactored, such that groups are
  now defined by a GroupSelector implementation. (Alan Woodward)

* LUCENE-7741: DoubleValuesSource now has an explain() method (Alan Woodward,
  Adrien Grand)

* LUCENE-7815: Removed the PostingsHighlighter; you should use the UnifiedHighlighter
  instead, which was derived from the PostingsHighlighter. WholeBreakIterator and
  CustomSeparatorBreakIterator were moved to UH's package. (David Smiley)

* LUCENE-7850: Removed support for legacy numerics. (Adrien Grand)

* LUCENE-7500: Removed abstract LeafReader.fields(); instead terms(fieldName)
  has been made abstract (it was formerly final). Also, MultiFields.getTerms
  was optimized to work directly instead of being implemented on getFields.
  (David Smiley)

* LUCENE-7872: TopDocs.totalHits is now a long. (Adrien Grand, hossman)

* LUCENE-7868: IndexWriterConfig.setMaxBufferedDeleteTerms is
  removed.
  (Simon Willnauer, Mike McCandless)

* LUCENE-7877: PrefixAwareTokenStream is replaced with ConcatenatingTokenStream
  (Alan Woodward, Uwe Schindler, Adrien Grand)

* LUCENE-7867: The deprecated Token class is now only available in the test
  framework (Alan Woodward, Adrien Grand)

* LUCENE-7723: DoubleValuesSource enforces implementation of equals() and
  hashCode() (Alan Woodward)

* LUCENE-7737: The spatial-extras module no longer has a dependency on the
  queries module. All uses of ValueSource are either replaced with core
  DoubleValuesSource extensions, or with the new ShapeValuesSource and
  ShapeValuesPredicate classes (Alan Woodward, David Smiley)

* LUCENE-7892: Doc-values query factory methods have been renamed so that their
  name contains "slow" in order to clearly indicate that they would usually be a
  bad choice. (Adrien Grand)

* LUCENE-7899: FieldValueQuery is renamed to DocValuesFieldExistsQuery
  (Adrien Grand, Mike McCandless)

Bug Fixes

* LUCENE-7626: IndexWriter will no longer accept broken token offsets
  (Mike McCandless)

* LUCENE-7859: Spatial-extras PackedQuadPrefixTree bug that only revealed itself
  with the new pointsOnly optimizations in LUCENE-7845. (David Smiley)

* LUCENE-7871: Fix false positive match in BlockJoinSelector when children have no value by
  introducing wrap methods accepting children as DISI; ToParentDocValues was extracted. (Mikhail Khludnev)

* LUCENE-7914: Add a maximum recursion level in automaton recursive
  functions (Operations.isFinite and Operations.topsortState) to prevent
  large automata from overflowing the stack (Robert Muir, Adrien Grand, Jim Ferenczi)

* LUCENE-7864: IndexMergeTool is not using intermediate hard links (even
  if possible). (Dawid Weiss)

* LUCENE-7956: Fixed potential stack overflow error in ICUNormalizer2CharFilter.
  (Adrien Grand)

* LUCENE-7963: Remove useless getAttribute() in DefaultIndexingChain that
  causes a performance drop, introduced by LUCENE-7626. (Daniel Mitterdorfer
  via Uwe Schindler)

Improvements

* LUCENE-7489: Better storage of sparse doc-values fields with the default
  codec. (Adrien Grand)

* LUCENE-7730: More accurate encoding of the length normalization factor
  thanks to the removal of index-time boosts. (Adrien Grand)

* LUCENE-7901: Original Highlighter now eagerly throws an exception if you
  provide components that are null. (Jason Gerlowski, David Smiley)

* LUCENE-7841: Normalize ґ to г in Ukrainian analyzer. (Andriy Rysin via Dawid Weiss)

Optimizations

* LUCENE-7416: BooleanQuery optimizes queries that have queries that occur both
  in the sets of SHOULD and FILTER clauses, or both in MUST/FILTER and MUST_NOT
  clauses. (Spyros Kapnissis via Adrien Grand, Uwe Schindler)

* LUCENE-7506: FastTaxonomyFacetCounts should use CPU in proportion to
  the size of the intersected set of hits from the query and documents
  that have a facet value, so sparse faceting works as expected
  (Adrien Grand via Mike McCandless)

* LUCENE-7519: Add optimized APIs to compute browse-only top-level
  facets (Mike McCandless)

* LUCENE-7589: Numeric doc values now have the ability to encode blocks of
  values using different numbers of bits per value if this proves to save
  storage. (Adrien Grand)

* LUCENE-7845: Enhance spatial-extras RecursivePrefixTreeStrategy queries when the
  query is a point (for 2D) or is a simple date interval (e.g. 1 month). When
  the strategy is marked as pointsOnly, the result is a TermQuery. (David Smiley)

* LUCENE-7874: DisjunctionMaxQuery rewrites to a BooleanQuery when tiebreaker is set to 1.
  (Jim Ferenczi)

* LUCENE-7828: Speed up range queries on range fields by improving how we
  compute the relation between the query and inner nodes of the BKD tree.
  (Adrien Grand)

Other

* LUCENE-7923: Removed FST.Arc.node field (unused). (Dawid Weiss)

* LUCENE-7328: Remove LegacyNumericEncoding from GeoPointField. (Nick Knize)

* LUCENE-7360: Remove Explanation.toHtml() (Alan Woodward)

* LUCENE-7681: MemoryIndex uses new DocValues API (Alan Woodward)

* LUCENE-7753: Make fields static when possible.
  (Daniel Jelinski via Adrien Grand)

* LUCENE-7540: Upgrade ICU to 59.1 (Mike McCandless, Jim Ferenczi)

* LUCENE-7852: Correct copyright year(s) in lucene/LICENSE.txt file.
  (Christine Poerschke, Steve Rowe)

* LUCENE-7719: Generalized the UnifiedHighlighter's support for AutomatonQuery
  for character & binary automata. Added AutomatonQuery.isBinary. (David Smiley)

* LUCENE-7873: Due to serious problems with context class loaders in several
  frameworks (OSGI, Java 9 Jigsaw), the lookup of Codecs, PostingsFormats,
  DocValuesFormats and all analysis factories was changed to only inspect the
  current classloader that defined the interface class (lucene-core.jar).
  See MIGRATE.txt for more information! (Uwe Schindler, Dawid Weiss)

* LUCENE-7883: Lucene no longer uses the context class loader when resolving
  resources in CustomAnalyzer or ClassPathResourceLoader. Resources are only
  resolved against Lucene's class loader by default. Please use another builder
  method to change to a custom classloader. (Uwe Schindler)

* LUCENE-5822: Convert README to Markdown (Jason Gerlowski via Mike Drob)

* LUCENE-7773: Remove unused/deprecated token types from StandardTokenizer.
  (Ahmet Arslan via Steve Rowe)

* LUCENE-7800: Remove code that potentially rethrows checked exceptions
  from methods that don't declare them ("sneaky throw" hack). (Robert Muir,
  Uwe Schindler, Dawid Weiss)

* LUCENE-7876: Avoid calls to LeafReader.fields() and MultiFields.getFields()
  that are trivially replaced by LeafReader.terms() and MultiFields.getTerms()
  (David Smiley)

======================= Lucene 6.6.5 =======================
(No Changes)

======================= Lucene 6.6.4 =======================
(No Changes)

======================= Lucene 6.6.3 =======================

Build

* LUCENE-6144: Upgrade Ivy to 2.4.0; 'ant ivy-bootstrap' now removes old Ivy
  jars in ~/.ant/lib/. (Shawn Heisey, Steve Rowe)

======================= Lucene 6.6.2 =======================

Changes in Runtime Behavior

* Resolving of external entities in queryparser/xml/CoreParser is disallowed
  by default. See SOLR-11477 for details.

Bug Fixes

* SOLR-11477: Disallow resolving of external entities in queryparser/xml/CoreParser
  by default. (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)

======================= Lucene 6.6.1 =======================

Bug Fixes

* LUCENE-7869: Changed MemoryIndex to sort 1d points. In case of 1d points, the PointInSetQuery.MergePointVisitor expects
  that these points are visited in ascending order. The memory index didn't do this, which could cause documents
  with multiple points to fail to match when they should. (Martijn van Groningen)

* LUCENE-7878: Fix query builder to keep the SHOULD clause that wraps multi-word synonyms. (Jim Ferenczi)

======================= Lucene 6.6.0 =======================

New Features

* LUCENE-7811: Add a concurrent SortedSet facets implementation.
  (Mike McCandless)

Bug Fixes

* LUCENE-7777: ByteBlockPool.readBytes sometimes threw
  ArrayIndexOutOfBoundsException when byte blocks larger than 32 KB
  were added (Mike McCandless)

* LUCENE-7797: The static FSDirectory.listAll(Path) method was always
  returning an empty array. (Atkins Chang via Mike McCandless)

* LUCENE-7481: Fixed missing rewrite methods for SpanPayloadCheckQuery
  and PayloadScoreQuery. (Erik Hatcher)

* LUCENE-7808: Fixed PayloadScoreQuery and SpanPayloadCheckQuery
  .equals and .hashCode methods. (Erik Hatcher)

* LUCENE-7798: Add .equals and .hashCode to ToParentBlockJoinSortField
  (Mikhail Khludnev)

* LUCENE-7814: DateRangePrefixTree (in spatial-extras) had edge-case bugs for
  years >= 292,000,000. (David Smiley)

* LUCENE-5365, LUCENE-7818: Fix incorrect condition in queryparser's
  QueryNodeOperation#logicalAnd(). (Olivier Binda, Amrit Sarkar,
  AppChecker via Uwe Schindler)

* LUCENE-7821: The classic and flexible query parsers, as well as Solr's
  "lucene"/standard query parser, should require " TO " in range queries,
  and accept "TO" as endpoints in range queries. (hossman, Steve Rowe)

* LUCENE-7824: Fix graph query analysis for multi-word synonym rules with common terms (e.g. new york, new york city).
  (Jim Ferenczi)

* LUCENE-7817: Pass cached query to onQueryCache instead of null.
  (Christoph Kaser via Adrien Grand)

* LUCENE-7831: CodecUtil should not seek to negative offsets. (Adrien Grand)

* LUCENE-7833: ToParentBlockJoinQuery computed the min score instead of the max
  score with ScoreMode.MAX. (Adrien Grand)

* LUCENE-7847: Fixed all-docs-match optimization of range queries on range
  fields. (Adrien Grand)

* LUCENE-7810: Fix equals() and hashCode() methods of several join queries.
  (Hossman, Adrien Grand, Martijn van Groningen)

Improvements

* LUCENE-7782: OfflineSorter now passes the total number of items it
  will write to getWriter (Mike McCandless)

* LUCENE-7785: Move dictionary for Ukrainian analyzer to external dependency.
  (Andriy Rysin via Steve Rowe, Dawid Weiss)

* LUCENE-7801: SortedSetDocValuesReaderState now implements
  Accountable so you can see how much RAM it's using (Robert Muir,
  Mike McCandless)

* LUCENE-7792: OfflineSorter can now run concurrently if you pass it
  an optional ExecutorService (Dawid Weiss, Mike McCandless)

* LUCENE-7811: Sorted set facets now use sparse storage when
  collecting hits, when appropriate. (Mike McCandless)

Optimizations

* LUCENE-7787: spatial-extras HeatmapFacetCounter will now short-circuit its
  work when Bits.MatchNoBits is passed. (David Smiley)

Other

* LUCENE-7796: Make IOUtils.reThrow idiom declare Error return type so
  callers may use it in a way that the compiler knows subsequent code is
  unreachable. reThrow is now deprecated in favor of IOUtils.rethrowAlways
  with slightly different semantics (see javadoc). (Hossman, Robert Muir,
  Dawid Weiss)

* LUCENE-7754: Inner classes should be static whenever possible.
  (Daniel Jelinski via Adrien Grand)

* LUCENE-7751: Avoid boxing primitives only to call compareTo.
  (Daniel Jelinski via Adrien Grand)

* LUCENE-7743: Never call new String(String).
  (Daniel Jelinski via Adrien Grand)

* LUCENE-7761: Fixed comment in ReqExclScorer.
  (Pablo Pita Leira via Adrien Grand)

======================= Lucene 6.5.1 =======================

Bug Fixes

* LUCENE-7755: Fixed join queries to not reference IndexReaders, as it could
  cause leaks if they are cached. (Adrien Grand)

* LUCENE-7749: Made LRUQueryCache delegate the scoreSupplier method.
  (Martin Amirault via Adrien Grand)

* LUCENE-7769: The UnifiedHighlighter wasn't highlighting portions of the query
  wrapped in BoostQuery or SpanBoostQuery. (David Smiley, Dmitry Malinin)

Other

* LUCENE-7763: Remove outdated comment in IndexWriterConfig.setIndexSort javadocs.
  (马可阳 via Christine Poerschke)

======================= Lucene 6.5.0 =======================

API Changes

* LUCENE-7740: Refactor Range Fields to remove Field suffix (e.g., DoubleRange),
  move InetAddressRange and InetAddressPoint from sandbox to misc module, and
  refactor all other range fields from sandbox to core. (Nick Knize)

* LUCENE-7624: TermsQuery has been renamed to TermInSetQuery and moved to core.
  (Alan Woodward)

* LUCENE-7637: TermInSetQuery requires that all terms come from the same field.
  (Adrien Grand)

* LUCENE-7644: FieldComparatorSource.newComparator() and
  SortField.getComparator() no longer throw IOException (Alan Woodward)

* LUCENE-7643: Replaced doc-values queries in lucene/sandbox with factory
  methods on the *DocValuesField classes. (Adrien Grand)

* LUCENE-7659: Added an IndexWriter#getFieldNames() method (experimental) to return
  all field names as visible from the IndexWriter. This would be useful for
  IndexWriter#updateDocValues() calls, to prevent calling with non-existent
  docValues fields (Ishan Chattopadhyaya, Adrien Grand, Mike McCandless)

* LUCENE-6959: Removed ToParentBlockJoinCollector in favour of
  ParentChildrenBlockJoinQuery, which can return the matching children documents per
  parent document. This query should be executed for each matching parent document
  after the main query has been executed.
  (Adrien Grand, Martijn van Groningen,
  Mike McCandless)

* LUCENE-7628: Scorer.getChildren() now only returns Scorers that are
  positioned on the current document, and can throw an IOException.
  AssertingScorer checks that getChildren() is not called on an unpositioned
  Scorer. (Alan Woodward, Adrien Grand)

* LUCENE-7702: Removed GraphQuery in favour of simple boolean query. (Matt Webber via Jim Ferenczi)

* LUCENE-7707: TopDocs.merge now takes a boolean option telling it
  when to use the incoming shard index versus when to assign the shard
  index itself, allowing users to merge shard responses incrementally
  instead of once all shard responses are present. (Simon Willnauer,
  Mike McCandless)

* LUCENE-7700: A cleanup of merge throughput control logic. Refactored all the
  code previously scattered throughout the IndexWriter and
  ConcurrentMergeScheduler into a more accessible set of public methods (see
  MergePolicy.OneMergeProgress, MergeScheduler.wrapForMerge and
  OneMerge.mergeInit). (Dawid Weiss, Mike McCandless).

* LUCENE-7734: FieldType's copy constructor was widened to accept any IndexableFieldType.
  (David Smiley)

New Features

* LUCENE-7738: Add new InetAddressRange for indexing and querying InetAddress
  ranges. (Nick Knize)

* LUCENE-7449: Add CROSSES relation support to RangeFieldQuery. (Nick Knize)

* LUCENE-7623: Add FunctionScoreQuery and FunctionMatchQuery (Alan Woodward,
  Adrien Grand, David Smiley)

* LUCENE-7619: Add WordDelimiterGraphFilter, just like
  WordDelimiterFilter except it produces correct token graphs so that
  proximity queries at search time will produce correct results (Mike
  McCandless)

* LUCENE-7656: Added the LatLonDocValuesField.new(Box/Distance)Query() factory
  methods that are the equivalent of factory methods on LatLonPoint but operate
  on doc values.
  These new methods should be wrapped in an IndexOrDocValuesQuery
  for best performance. (Adrien Grand)

* LUCENE-7673: Added MultiValued[Int/Long/Float/Double]FieldSource that given a
  SortedNumericSelector.Type can give a ValueSource view of a
  SortedNumericDocValues field. (Tomás Fernández Löbbe)

* LUCENE-7465: Add SimplePatternTokenizer and
  SimplePatternSplitTokenizer, using Lucene's regexp/automaton
  implementation for analysis/tokenization (Clinton Gormley, Mike
  McCandless)

* LUCENE-7688: Add OneMergeWrappingMergePolicy class.
  (Keith Laban, Christine Poerschke)

* LUCENE-7686: The near-real-time document suggester can now
  efficiently filter out duplicate suggestions (Uwe Schindler, Mike
  McCandless)

* LUCENE-7712: SimpleQueryParser now supports default fuzziness
  syntax, mapping foo~ to a FuzzyQuery with edit distance 2. (Lee
  Hinman, David Pilato via Mike McCandless)

Bug Fixes

* LUCENE-7630: Fix (Edge)NGramTokenFilter to no longer drop payloads
  and preserve all attributes. (Nathan Gass via Uwe Schindler)

* LUCENE-7679: MemoryIndex was ignoring omitNorms settings on passed-in
  IndexableFields. (Alan Woodward)

* LUCENE-7692: PatternReplaceCharFilterFactory now implements MultiTermAware.
  (Adrien Grand)

* LUCENE-7685: ToParentBlockJoinQuery and ToChildBlockJoinQuery now use the
  rewritten child query in their equals and hashCode implementations.
  (Adrien Grand)

* LUCENE-7698: CommonGramsQueryFilter was producing a disconnected
  token graph, messing up phrase queries when it was used during query
  parsing (Ere Maijala via Mike McCandless)

* LUCENE-7708: ShingleFilter without unigram was producing a disconnected
  token graph, messing up queries when it was used during query
  parsing (Jim Ferenczi)

Improvements

* LUCENE-7055: Added Weight#scorerSupplier, which allows estimating the cost
  of a Scorer before actually building it, in order to optimize how the query
  should be run, e.g. using points or doc values depending on costs of other
  parts of the query. (Adrien Grand)

* LUCENE-7643: IndexOrDocValuesQuery allows executing range queries using
  either points or doc values depending on which one is more efficient.
  (Adrien Grand)

* LUCENE-7662: If index files are missing, throw CorruptIndexException instead
  of the less descriptive FileNotFoundException or NoSuchFileException (Mike Drob via
  Mike McCandless, Erick Erickson)

* LUCENE-7680: UsageTrackingQueryCachingPolicy never caches term filters anymore
  since they are plenty fast. This also has the side-effect of leaving more
  space in the history for costly filters. (Adrien Grand)

* LUCENE-7677: UsageTrackingQueryCachingPolicy now caches compound queries a bit
  earlier than regular queries in order to improve cache efficiency.
  (Adrien Grand)

* LUCENE-7710: BlockPackedReader throws CorruptIndexException and includes
  IndexInput description instead of plain IOException (Mike Drob via
  Mike McCandless)

* LUCENE-7695: ComplexPhraseQueryParser now supports query-time synonyms (Markus Jelsma
  via Mikhail Khludnev)

* LUCENE-7747: QueryBuilder now iterates lazily over the possible paths when building a graph query
  (Jim Ferenczi)

Optimizations

* LUCENE-7641: Optimized point range queries to compute documents that do not
  match the range on single-valued fields when more than half the documents in
  the index would match. (Adrien Grand)

* LUCENE-7656: Speed up LatLonPointDistanceQuery by computing distances even
  less often. (Adrien Grand)

* LUCENE-7661: Speed up LatLonPointInPolygonQuery by pre-computing the
  relation of the polygon with a grid. (Adrien Grand)

* LUCENE-7660: Speed up LatLonPointDistanceQuery by improving the detection of
  whether BKD cells are entirely within the distance close to the dateline.
  (Adrien Grand)

* LUCENE-7654: ToParentBlockJoinQuery now implements two-phase iteration and
  computes scores lazily in order to be faster when used in conjunctions.
  (Adrien Grand)

* LUCENE-7667: BKDReader now calls `IntersectVisitor.grow()` on larger
  increments. (Adrien Grand)

* LUCENE-7638: Query parsers now analyze the token graph for articulation
  points (or cut vertices) in order to create more efficient queries for
  multi-token synonyms. (Jim Ferenczi)

* LUCENE-7699: Query parsers now use span queries to produce more efficient
  phrase queries for multi-token synonyms.
  (Matt Webber via Jim Ferenczi)

* LUCENE-7742: Fix places where we were unboxing and then re-boxing
  according to FindBugs (Daniel Jelinski via Mike McCandless)

* LUCENE-7739: Fix places where we unnecessarily boxed while parsing
  a numeric value according to FindBugs (Daniel Jelinski via Mike
  McCandless)

Build

* LUCENE-7653: Update randomizedtesting to version 2.5.0. (Dawid Weiss)

* LUCENE-7665: Remove grouping dependency from the join module.
  (Martijn van Groningen)

* SOLR-10023: Add non-recursive 'test-nocompile' target: Only runs unit tests.
  Jars are not downloaded; compilation is not updated; and Clover is not enabled.
  (Steve Rowe)

* LUCENE-7694: Update forbiddenapis to version 2.3. (Uwe Schindler)

* LUCENE-7693: Replace "org.apache." logic in GetMavenDependenciesTask.
  (Daniel Collins, Christine Poerschke)

* LUCENE-7726: Fix HTML entity bugs in Javadocs to be able to build with
  Java 9. (Uwe Schindler, Hossman)

* LUCENE-7727: Replace end-of-life Markdown parser "Pegdown" with "Flexmark"
  for compatibility with Java 9. (Uwe Schindler)

Other

* LUCENE-7666: Fix typos in lucene-join package info javadoc.
  (Tom Saleeba via Christine Poerschke)

* LUCENE-7658: queryparser/xml CoreParser now implements SpanQueryBuilder interface.
  (Daniel Collins, Christine Poerschke)

* LUCENE-7715: NearSpansUnordered simplifications.
  (Paul Elschot via Adrien Grand)

======================= Lucene 6.4.2 =======================

Bug Fixes

* LUCENE-7676: Fixed FilterCodecReader to override more super-class methods.
  Also added TestFilterCodecReader class. (Christine Poerschke)

* LUCENE-7717: The UnifiedHighlighter and PostingsHighlighter were not highlighting
  prefix queries with multi-byte characters. TermRangeQuery is affected too.
  (Dmitry Malinin, David Smiley)

======================= Lucene 6.4.1 =======================

Build

* LUCENE-7651: Fix Javadocs build for Java 8u121 by injecting "Google Code
  Prettify" without adding Javascript to Javadocs's -bottom parameter.
  Also update Prettify to latest version to fix Google Chrome issue.
  (Uwe Schindler)

Bug Fixes

* LUCENE-7657: Fixed potential memory leak in the case that a (Span)TermQuery
  with a TermContext is cached. (Adrien Grand)

* LUCENE-7647: Made stored fields reclaim native memory more aggressively when
  configured with BEST_COMPRESSION. This could otherwise result in out-of-memory
  issues. (Adrien Grand)

* LUCENE-7670: AnalyzingInfixSuggester should not immediately open an
  IndexWriter over an already-built index. (Steve Rowe)

======================= Lucene 6.4.0 =======================

API Changes

* LUCENE-7533: Classic query parser no longer allows autoGeneratePhraseQueries
  to be set to true when splitOnWhitespace is false (and vice-versa).

* LUCENE-7607: LeafFieldComparator.setScorer and SimpleFieldComparator.setScorer
  are declared as throwing IOException (Alan Woodward)

* LUCENE-7617: Collector construction for two-pass grouping queries is
  abstracted into a new Grouper class, which can be passed as a constructor
  parameter to GroupingSearch. The abstract base classes for the different
  grouping Collectors are renamed to remove the Abstract* prefix.
  (Alan Woodward, Martijn van Groningen)

* LUCENE-7609: The expressions module now uses the DoubleValuesSource API, and
  no longer depends on the queries module. Expression#getValueSource() is
  replaced with Expression#getDoubleValuesSource().
  (Alan Woodward, Adrien Grand)

* LUCENE-7610: The facets module now uses the DoubleValuesSource API, and
  methods that take ValueSource parameters are deprecated (Alan Woodward)

* LUCENE-7611: DocumentValueSourceDictionary now takes a LongValuesSource
  as a parameter, and the ValueSource equivalent is deprecated (Alan Woodward)

New Features

* LUCENE-5867: Added BooleanSimilarity. (Robert Muir, Adrien Grand)

* LUCENE-7466: Added AxiomaticSimilarity. (Peilin Yang via Tommaso Teofili)

* LUCENE-7590: Added DocValuesStatsCollector to compute statistics on DocValues
  fields. (Shai Erera)

* LUCENE-7587: The new FacetQuery and MultiFacetQuery helper classes
  make it simpler to execute drill down when drill sideways counts are
  not needed (Emmanuel Keller via Mike McCandless)

* LUCENE-6664: A new SynonymGraphFilter outputs a correct graph
  structure for multi-token synonyms, separating out a
  FlattenGraphFilter that is hardwired into the current
  SynonymFilter. This finally makes it possible to implement
  correct multi-token synonyms at search time. See
  http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
  for details. (Mike McCandless)

* LUCENE-5325: Added LongValuesSource and DoubleValuesSource, intended as
  type-safe replacements for ValueSource in the queries module. These
  expose per-segment LongValues or DoubleValues iterators.
  (Alan Woodward, Adrien Grand)

* LUCENE-7603: Graph token streams are now handled accurately by query
  parsers, by enumerating all paths and creating the corresponding
  query/ies as sub-clauses (Matt Weber via Mike McCandless)

* LUCENE-7588: DrillSideways can now run queries concurrently, and
  supports an IndexSearcher using an executor service to run each query
  concurrently across all segments in the index (Emmanuel Keller via
  Mike McCandless)

* LUCENE-7627: Added .intersect methods to SortedDocValues and
  SortedSetDocValues to allow filtering their TermsEnums with a
  CompiledAutomaton (Alan Woodward, Mike McCandless)

Bug Fixes

* LUCENE-7547: JapaneseTokenizerFactory was failing to close the
  dictionary file it opened (Markus via Mike McCandless)

* LUCENE-7562: CompletionFieldsConsumer sometimes throws
  NullPointerException on ghost fields (Oliver Eilhard via Mike McCandless)

* LUCENE-7533: Classic query parser: disallow autoGeneratePhraseQueries=true
  when splitOnWhitespace=false (and vice-versa). (Steve Rowe)

* LUCENE-7536: ASCIIFoldingFilterFactory used to return an illegal multi-term
  component when preserveOriginal was set to true. (Adrien Grand)

* LUCENE-7576: Fix Terms.intersect in the default codec to detect when
  the incoming automaton is a special case and throw a clearer
  exception than NullPointerException (Tom Mortimer via Mike McCandless)

* LUCENE-6989: Fix Exception handling in MMapDirectory's unmap hack
  support code to work with Java 9's new InaccessibleObjectException
  that does not extend ReflectiveOperationException in Java 9.
  (Uwe Schindler)

* LUCENE-7581: Lucene now prevents updating a doc values field that is used
  in the index sort, since this would lead to corruption.
  (Jim Ferenczi via Mike McCandless)

* LUCENE-7570: IndexWriter may deadlock if a commit is running while
  there are too many merges running and one of the merges hits a
  tragic exception (Joey Echeverria via Mike McCandless)

* LUCENE-7594: Fixed point range queries on floating-point types to recommend
  using helpers for exclusive bounds that are consistent with Double.compare.
  (Adrien Grand, Dawid Weiss)

* LUCENE-7606: Normalization with CustomAnalyzer would only apply the last
  token filter. (Adrien Grand)

* LUCENE-7612: Removed an unused dependency from the suggester to the misc
  module. (Alan Woodward)

Improvements

* LUCENE-7532: Add back lost codec file format documentation
  (Shinichiro Abe via Mike McCandless)

* LUCENE-6824: TermAutomatonQuery now rewrites to TermQuery,
  PhraseQuery or MultiPhraseQuery when the word automaton is simple
  (Mike McCandless)

* LUCENE-7431: Allow a certain amount of overlap to be specified between the include
  and exclude arguments of SpanNotQuery via negative pre and/or post arguments.
  (Marc Morissette via David Smiley)

* LUCENE-7544: UnifiedHighlighter: add extension points for handling custom queries.
  (Michael Braun, David Smiley)

* LUCENE-7538: Asking IndexWriter to store a too-massive text field
  now throws IllegalArgumentException instead of a cryptic exception
  that closes your IndexWriter (Steve Chen via Mike McCandless)

* LUCENE-7524: Added more detailed explanation of how IDF is computed in
  ClassicSimilarity and BM25Similarity. (Adrien Grand)

* LUCENE-7564: AnalyzingInfixSuggester should close its IndexWriter by default
  at the end of build(). (Steve Rowe)

* LUCENE-7526: Enhanced UnifiedHighlighter's passage relevancy for queries with
  wildcards and sometimes just terms.
  Added shouldPreferPassageRelevancyOverSpeed()
  which can be overridden to return false to eke out more speed in some cases.
  (Timothy M. Rodriguez, David Smiley)

* LUCENE-7560: QueryBuilder.createFieldQuery is no longer final,
  giving custom query parsers subclassing QueryBuilder more freedom to
  control how text is analyzed and converted into a query (Matt Weber
  via Mike McCandless)

* LUCENE-7537: Index time sorting now supports multi-valued sorts
  using selectors (MIN, MAX, etc.) (Jim Ferenczi via Mike McCandless)

* LUCENE-7575: UnifiedHighlighter can now highlight fields with queries that don't
  necessarily refer to that field (AKA requireFieldMatch==false). Disabled by default.
  See UH get/setFieldMatcher. (Jim Ferenczi via David Smiley)

* LUCENE-7592: If the segments file is truncated, we now throw
  CorruptIndexException instead of the more confusing EOFException
  (Mike Drob via Mike McCandless)

* LUCENE-6989: Make MMapDirectory's unmap hack work with Java 9 EA (b150+):
  Unmapping uses new sun.misc.Unsafe#invokeCleaner(ByteBuffer).
  Java 9 now needs the same permissions as Java 8;
  RuntimePermission("accessClassInPackage.jdk.internal.ref")
  is no longer needed. Support for older Java 9 builds was removed.
  (Uwe Schindler)

* LUCENE-7401: Changed the way BKD trees pick the split dimension in order to
  ensure all dimensions are indexed. (Adrien Grand)

* LUCENE-7614: Complex Phrase Query parser ignores double quotes around single token
  prefix, wildcard, range queries (Mikhail Khludnev)

* LUCENE-7620: Added LengthGoalBreakIterator, a wrapper around another BreakIterator to skip breaks
  that would create Passages that are too short. Only for use with the UnifiedHighlighter
  (and probably PostingsHighlighter).
  (David Smiley)

Optimizations

* LUCENE-7568: Optimize merging when index sorting is used but the
  index is already sorted (Jim Ferenczi via Mike McCandless)

* LUCENE-7563: The BKD in-memory index for dimensional points now uses
  a compressed format, using substantially less RAM in some cases
  (Adrien Grand, Mike McCandless)

* LUCENE-7583: BKD writing now buffers each leaf block in heap before
  writing to disk, giving a small speedup in points-heavy use cases.
  (Mike McCandless)

* LUCENE-7572: Doc values queries now cache their hash code. (Adrien Grand)

Other

* LUCENE-7546: Fixed references to benchmark Wikipedia data and the Jenkins line-docs file
  (David Smiley)

* LUCENE-7534: fix smokeTestRelease.py to run on Cygwin (Mikhail Khludnev)

* LUCENE-7559: UnifiedHighlighter: Make Passage and OffsetsEnum more exposed to allow
  passage creation to be customized. (David Smiley)

* LUCENE-7599: Simplify TestRandomChains using Java's built-in Predicate and
  Function interfaces. (Ahmet Arslan via Adrien Grand)

* LUCENE-7595: Improve RAMUsageTester in test-framework to estimate memory usage of
  runtime classes and work with Java 9 EA (b148+). Disable static field heap usage
  checker in LuceneTestCase. (Uwe Schindler, Dawid Weiss)

Build

* LUCENE-7387: fix defaultCodec in build.xml to account for the line ending (hossman)

* LUCENE-7543: Make changes-to-html target an offline operation, by moving the
  Lucene and Solr DOAP RDF files into the Git source repository under
  dev-tools/doap/ and then pulling release dates from those files, rather than
  from JIRA. (Mano Kovacs, hossman, Steve Rowe)

* LUCENE-7596: Update Groovy to version 2.4.8 to allow building with Java 9
  build 148+. Also update JGit version for working-copy checks.
  (Uwe Schindler)

======================= Lucene 6.3.0 =======================

API Changes

New Features

* LUCENE-7438: New "UnifiedHighlighter" derivative of the PostingsHighlighter that
  can consume offsets from postings, term vectors, or analysis. It can highlight phrases
  as accurately as the standard Highlighter. Light term vectors can be used with offsets
  in postings for fast wildcard (MultiTermQuery) highlighting.
  (David Smiley, Timothy Rodriguez)

* LUCENE-7490: SimpleQueryParser now parses '*' to MatchAllDocsQuery
  (Lee Hinman via Mike McCandless)

Bug Fixes

* LUCENE-7507: Upgrade morfologik-stemming to version 2.1.1 (fixes security
  manager issue with Polish dictionary lookup). (Dawid Weiss)

* LUCENE-7472: MultiFieldQueryParser.getFieldQuery() drops queries that are
  neither BooleanQuery nor TermQuery. (Steve Rowe)

* LUCENE-7456: PerFieldPostings/DocValues was failing to delegate the
  merge method (Julien MASSENET via Mike McCandless)

* LUCENE-7468: ASCIIFoldingFilter should not emit duplicated tokens when
  preserve original is on. (David Causse via Adrien Grand)

* LUCENE-7484: FastVectorHighlighter failed to highlight SynonymQuery
  (Jim Ferenczi via Mike McCandless)

* LUCENE-7476: JapaneseNumberFilter should not invoke incrementToken
  on its input after it's exhausted (Andy Hind via Mike McCandless)

* LUCENE-7486: DisjunctionMaxQuery does not work correctly with queries that
  return negative scores. (Ivan Provalov, Uwe Schindler, Adrien Grand)

* LUCENE-7491: Suddenly turning on dimensional points for some fields
  that already exist in an index but didn't previously index
  dimensional points could cause unexpected merge exceptions (Hans
  Lund, Mike McCandless)

* LUCENE-6914: Fixed DecimalDigitFilter in case of supplementary code points.
  (Hossman)

* LUCENE-7493: FacetCollector.search threw an unexpected exception if
  you asked for zero hits but wanted facets (Mahesh via Mike McCandless)

* LUCENE-7505: AnalyzingInfixSuggester returned invalid results when
  allTermsRequired is false and context filters are specified (Mike
  McCandless)

* LUCENE-7429: AnalyzerWrapper can now modify the normalization chain too and
  DelegatingAnalyzerWrapper does the right thing automatically. (Adrien Grand)

* LUCENE-7135: Lucene's check for 32 or 64 bit JVM now works around security
  manager blocking access to some properties (Aaron Madlon-Kay via
  Mike McCandless)

Improvements

* LUCENE-7439: FuzzyQuery now matches all terms within the specified
  edit distance, even if they are short terms (Mike McCandless)

* LUCENE-7496: Better toString for SweetSpotSimilarity (janhoy)

* LUCENE-7520: Highlighter's WeightedSpanTermExtractor shouldn't attempt to expand a MultiTermQuery
  when its field doesn't match the field the extraction is scoped to.
  (Cao Manh Dat via David Smiley)

Optimizations

* LUCENE-7501: BKDReader should not store the split dimension explicitly in the
  1D case. (Adrien Grand)

Other

* LUCENE-7513: Upgrade randomizedtesting to 2.4.0. (Dawid Weiss)

* LUCENE-7452: Block join query exception suggests how to find a doc, which
  violates orthogonality requirement. (Mikhail Khludnev)

* LUCENE-7438: Renovate the Benchmark module's support for benchmarking highlighting. All
  highlighters are supported via SearchTravRetHighlight. (David Smiley)

Build

* LUCENE-7292: Fix build to use "--release 8" instead of "-release 8" on
  Java 9 (this changed with recent EA build b135).
  (Uwe Schindler)

======================= Lucene 6.2.1 =======================

API Changes

* LUCENE-7436: MinHashFilter's constructor, and some of its default
  settings, should be public. (Doug Turnbull via Mike McCandless)

Bug Fixes

* LUCENE-7417: The standard Highlighter could throw an IllegalArgumentException when
  trying to highlight a query containing a degenerate case of a MultiPhraseQuery with one
  term. (Thomas Kappler via David Smiley)

* LUCENE-7440: Document id skipping (PostingsEnum.advance) could throw an
  ArrayIndexOutOfBoundsException on large index segments (>1.8B docs)
  with large skips. (yonik)

* LUCENE-7442: MinHashFilter's ctor should validate its args.
  (Cao Manh Dat via Steve Rowe)

* LUCENE-7318: Fix backwards compatibility issues around StandardAnalyzer
  and its components, introduced with Lucene 6.2.0. The moved classes
  were restored in their original packages: LowercaseFilter and StopFilter,
  as well as several utility classes. (Uwe Schindler, Mike McCandless)

======================= Lucene 6.2.0 =======================

API Changes

* ScoringWrapperSpans was removed since it had no purpose or effect as of Lucene 5.5.

New Features

* LUCENE-7388: Add point based IntRangeField, FloatRangeField, LongRangeField along with
  supporting queries and tests (Nick Knize)

* LUCENE-7381: Add point based DoubleRangeField and RangeFieldQuery for
  indexing and querying on Ranges up to 4 dimensions (Nick Knize)

* LUCENE-6968: LSH Filter (Tommaso Teofili, Andy Hind, Cao Manh Dat)

* LUCENE-7302: IndexWriter methods that change the index now return a
  long "sequence number" indicating the effective equivalent
  single-threaded execution order (Mike McCandless)

* LUCENE-7335: IndexWriter's commit data is now late binding,
  recording key/values from a provided iterable based on when the
  commit actually takes place (Mike McCandless)

* LUCENE-7287: UkrainianMorfologikAnalyzer is a new dictionary-based
  analyzer for the Ukrainian language (Andriy Rysin via Mike
  McCandless)

* LUCENE-7373: Directory.renameFile, which did both renaming and fsync
  of the directory metadata, has been deprecated; use the new separate
  methods Directory.rename and Directory.syncMetaData instead (Robert Muir,
  Uwe Schindler, Mike McCandless)

* LUCENE-7355: Added Analyzer#normalize(), which only applies normalization to
  an input string. (Adrien Grand)

* LUCENE-7380: Add Polygon.fromGeoJSON for more easily creating
  Polygon instances from a standard GeoJSON string (Robert Muir, Mike
  McCandless)

* LUCENE-7395: PerFieldSimilarityWrapper requires a default similarity
  for calculating query norm and coordination factor in Lucene 6.x.
  Lucene 7 will no longer have those factors. (Uwe Schindler, Sascha Markus)

* SOLR-9279: Queries module: new ComparisonBoolFunction base class
  (Doug Turnbull via David Smiley)

Bug Fixes

* LUCENE-6662: Fixed potential resource leaks.
  (Rishabh Patel via Adrien Grand)

* LUCENE-7340: MemoryIndex.toString() could throw NPE; fixed. Renamed to toStringDebug().
  (Daniel Collins, David Smiley)

* LUCENE-7382: Fix bug introduced by LUCENE-7355 that used the
  wrong default AttributeFactory for new Tokenizers.
  (Terry Smith, Uwe Schindler)

* LUCENE-7389: Fix FieldType.setDimensions(...) validation for the dimensionNumBytes
  parameter. (Martijn van Groningen)

* LUCENE-7391: Fix performance regression in MemoryIndex's fields() introduced
  in Lucene 6. (Steve Mason via David Smiley)

* LUCENE-7395, SOLR-9315: Fix PerFieldSimilarityWrapper to also delegate query
  norm and coordination factor using a default similarity added as ctor param.
  (Uwe Schindler, Sascha Markus)

* SOLR-9413: Fix analysis/kuromoji's CSVUtil.quoteEscape logic, add TestCSVUtil test.
  (AppChecker, Christine Poerschke)

* LUCENE-7419: Fix performance bug with TokenStream.end(), where it would look up
  PositionIncrementAttribute every time. (Mike McCandless, Robert Muir)

Improvements

* LUCENE-7323: Compound file writing now verifies the incoming
  sub-files' checksums and segment IDs, to catch hardware issues or
  filesystem bugs earlier (Robert Muir, Mike McCandless)

* LUCENE-6766: Index time sorting has graduated from the misc module
  to core, is much simpler to use, via
  IndexWriterConfig.setIndexSort, and now works with dimensional points.
  (Adrien Grand, Mike McCandless)

* LUCENE-5931: Detect when an application tries to reopen an
  IndexReader after (illegally) removing the old index and
  reindexing (Vitaly Funstein, Robert Muir, Mike McCandless)

* LUCENE-6171: Lucene now passes the StandardOpenOption.CREATE_NEW
  option when writing new files so the filesystem enforces our
  write-once architecture, possibly catching externally caused
  issues sooner (Robert Muir, Mike McCandless)

* LUCENE-7318: StandardAnalyzer has been moved from the analysis
  module into core and is now the default analyzer in
  IndexWriterConfig (Robert Muir, Mike McCandless)

* LUCENE-7345: RAMDirectory now enforces write-once files as well
  (Robert Muir, Mike McCandless)

* LUCENE-7337: MatchNoDocsQuery now scores with 0 normalization factor
  and empty boolean queries now rewrite to MatchNoDocsQuery instead of
  vice versa (Jim Ferenczi via Mike McCandless)

* LUCENE-7359: Add equals() and hashCode() to Explanation (Alan Woodward)

* LUCENE-7353: ScandinavianFoldingFilterFactory and
  ScandinavianNormalizationFilterFactory now implement MultiTermAwareComponent.
  (Adrien Grand)

* LUCENE-2605: Add classic QueryParser option setSplitOnWhitespace() to
  control whether to split on whitespace prior to text analysis. Default
  behavior remains unchanged: split-on-whitespace=true. (Steve Rowe)

* LUCENE-7276: MatchNoDocsQuery now includes an optional reason for
  why it was used (Jim Ferenczi via Mike McCandless)

* LUCENE-7355: AnalyzingQueryParser now only applies the subset of the analysis
  chain that is about normalization for range/fuzzy/wildcard queries.
  (Adrien Grand)

* LUCENE-7376: Add support for ToParentBlockJoinQuery to fast vector highlighter's
  FieldQuery. (Martijn van Groningen)

* LUCENE-7385: Improve/fix assert messages in SpanScorer.
  (David Smiley)

* LUCENE-7393: Add ICUTokenizer option to parse Myanmar text as syllables instead of words,
  because the ICU word-breaking algorithm has some issues. This allows for the previous
  tokenization used before Lucene 5. (AM, Robert Muir)

* LUCENE-7409: Changed MMapDirectory's unmapping to work more safely, but still with
  no guarantees. This uses a store-store barrier and yields the current thread
  before unmapping to allow in-flight requests to finish. The new code no longer
  uses WeakIdentityMap as it delegates all ByteBuffer reads through a new
  ByteBufferGuard wrapper that is shared between all ByteBufferIndexInput clones.
  (Robert Muir, Uwe Schindler)

Optimizations

* LUCENE-7330, LUCENE-7339: Speed up conjunction queries. (Adrien Grand)

* LUCENE-7356: SearchGroup tweaks. (Christine Poerschke)

* LUCENE-7351: Doc id compression for points. (Adrien Grand)

* LUCENE-7371: Point values are now better compressed using run-length
  encoding. (Adrien Grand)

* LUCENE-7311: Cached term queries do not seek the terms dictionary anymore.
  (Adrien Grand)

* LUCENE-7396, LUCENE-7399: Faster flush of points.
  (Adrien Grand, Mike McCandless)

* LUCENE-7406: Automaton and PrefixQuery tweaks (fewer object (re)allocations).
  (Christine Poerschke)

Other

* LUCENE-4787: Fixed some highlighting javadocs. (Michael Dodsworth via Adrien
  Grand)

* LUCENE-7334: Update ASM dependency to 5.1. (Uwe Schindler)

* LUCENE-7346: Update forbiddenapis to version 2.2.
  (Uwe Schindler)

* LUCENE-7360: Explanation.toHtml() is deprecated. (Alan Woodward)

* LUCENE-7372: Factor out an org.apache.lucene.search.FilterWeight class.
  (Christine Poerschke, Adrien Grand, David Smiley)

* LUCENE-7384: Removed ScoringWrapperSpans.
  And tweaked SpanWeight.buildSimWeight() to
  reuse the existing Similarity instead of creating a new one. (David Smiley)

======================= Lucene 6.1.0 =======================

New Features

* LUCENE-7099: Add LatLonDocValuesField.newDistanceSort to the sandbox.
  (Robert Muir)

* LUCENE-7140: Add PlanetModel.bisection to spatial3d (Karl Wright via
  Mike McCandless)

* LUCENE-7069: Add LatLonPoint.nearest, to find nearest N points to a
  provided query point (Mike McCandless)

* LUCENE-7234: Added InetAddressPoint.nextDown/nextUp to easily generate range
  queries with excluded bounds. (Adrien Grand)

* LUCENE-7300: The misc module now has a directory wrapper that uses hard-links if
  applicable and supported when copying files from another FSDirectory in
  Directory#copyFrom. (Simon Willnauer)

API Changes

* LUCENE-7184: Refactor LatLonPoint encoding methods to new GeoEncodingUtils
  helper class in core geo package. Also refactors LatLonPointTests to
  TestGeoEncodingUtils (Nick Knize)

* LUCENE-7163: refactor GeoRect, Polygon, and GeoUtils tests to geo
  package in core (Nick Knize)

* LUCENE-7152: Refactor GeoUtils from lucene-spatial package to
  core (Nick Knize)

* LUCENE-7141: Switch OfflineSorter's ByteSequencesReader to
  BytesRefIterator (Mike McCandless)

* LUCENE-7150: Spatial3d gets useful APIs to create common shape
  queries, matching LatLonPoint. (Karl Wright via Mike McCandless)

* LUCENE-7243: Removed the LeafReaderContext parameter from
  QueryCachingPolicy#shouldCache. (Adrien Grand)

Optimizations

* LUCENE-7071: Reduce bytes copying in OfflineSorter, giving ~10%
  speedup on merging 2D LatLonPoint values (Mike McCandless)

* LUCENE-7105, LUCENE-7215: Optimize LatLonPoint's newDistanceQuery.
  (Robert Muir)

* LUCENE-7097: IntroSorter now recurses to 2 * log_2(count) quicksort
  stack depth before switching to heapsort (Adrien Grand, Mike McCandless)

* LUCENE-7115: Speed up FieldCache.CacheEntry toString by setting initial
  StringBuilder capacity (Gregory Chanan)

* LUCENE-7147: Improve disjoint check for geo distance query traversal
  (Ryan Ernst, Robert Muir, Mike McCandless)

* LUCENE-7153: GeoPointField and LatLonPoint polygon queries now support
  multiple polygons and holes, with memory usage independent of
  polygon complexity. (Karl Wright, Mike McCandless, Robert Muir)

* LUCENE-7159: Speed up LatLonPoint polygon performance. (Robert Muir, Ryan Ernst)

* LUCENE-7211: Reduce memory & GC for spatial RPT Intersects when the number of
  matching docs is small. (Jeff Wartes, David Smiley)

* LUCENE-7235: LRUQueryCache should not take a lock for segments that it will
  not cache on anyway. (Adrien Grand)

* LUCENE-7238: Explicitly disable the query cache in MemoryIndex#createSearcher.
  (Adrien Grand)

* LUCENE-7237: LRUQueryCache now prefers returning an uncached Scorer to
  waiting on a lock. (Adrien Grand)

* LUCENE-7261, LUCENE-7262, LUCENE-7264, LUCENE-7258: Speed up DocIdSetBuilder
  (which is used by TermsQuery, multi-term queries and several point queries).
  (Adrien Grand, Jeff Wartes, David Smiley)

* LUCENE-7299: Speed up BytesRefHash.sort() using radix sort. (Adrien Grand)

* LUCENE-7306: Speed up points indexing and merging using radix sort.
  (Adrien Grand)

Bug Fixes

* LUCENE-7127: Fix corner case bugs in GeoPointDistanceQuery. (Robert Muir)

* LUCENE-7166: Fix corner case bugs in LatLonPoint/GeoPointField bounding box
  queries.
  (Robert Muir)

* LUCENE-7168: Switch to stable encoding for geo3d, remove quantization
  test leniency, remove dead code (Mike McCandless)

* LUCENE-7301: Multiple doc values updates to the same document within
  one update batch could be applied in the wrong order, resulting in
  the wrong updated value (Ishan Chattopadhyaya, hossman, Mike McCandless)

* LUCENE-7312: Fix geo3d's x/y/z double to int encoding to ensure it always
  rounds down (Karl Wright, Mike McCandless)

* LUCENE-7132: BooleanQuery sometimes assigned too-low scores in cases
  where ranges of documents had only a single clause matching while
  other ranges had more than one clause matching (Ahmet Arslan,
  hossman, Mike McCandless)

* LUCENE-7286: Added support for highlighting SynonymQuery. (Adrien Grand)

* LUCENE-7291: Spatial heatmap faceting could mis-count when the heatmap crosses the
  dateline and indexed non-point shapes are much bigger than the heatmap region.
  (David Smiley)

* LUCENE-7333: Fix test bug where randomSimpleString() generated a filename
  that is a reserved device name on Windows. (Uwe Schindler, Mike McCandless)

Other

* LUCENE-7295: TermAutomatonQuery.hashCode calculates Automaton.toDot().hash,
  equivalence relationship replaced with object identity. (Dawid Weiss)

* LUCENE-7277: Make Query.hashCode and Query.equals abstract. (Paul Elschot,
  Dawid Weiss)

* LUCENE-7174: Upgrade randomizedtesting to 2.3.4. (Uwe Schindler, Dawid Weiss)

* LUCENE-7205: Remove repeated nl.getLength() calls in
  (Boolean|DisjunctionMax|FuzzyLikeThis)QueryBuilder. (Christine Poerschke)

* LUCENE-7210: Make TestCore*Parser's analyzer choice override-able
  (Christine Poerschke, Daniel Collins)

* LUCENE-7263: Make queryparser/xml/CoreParser's SpanQueryBuilderFactory
  accessible to deriving classes.
  (Daniel Collins via Christine Poerschke)

* SOLR-9109/SOLR-9121: Allow specification of a custom Ivy settings file via system
  property "ivysettings.xml". (Misha Dmitriev, Christine Poerschke, Uwe Schindler, Steve Rowe)

* LUCENE-7206: Improve the ToParentBlockJoinQuery's explain by including the explain
  of the best matching child doc. (Ilya Kasnacheev, Jeff Evans via Martijn van Groningen)

* LUCENE-7307: Add getters to the PointInSetQuery and PointRangeQuery queries.
  (Martijn van Groningen, Adrien Grand)

Build

* LUCENE-7292: Use '-release' instead of '-source/-target' during
  compilation on Java 9+ to ensure real cross-compilation.
  (Uwe Schindler)

* LUCENE-7296: Update forbiddenapis to version 2.1.
  (Uwe Schindler)

======================= Lucene 6.0.1 =======================

New Features

* LUCENE-7278: Spatial-extras DateRangePrefixTree's Calendar is now configurable, to
  e.g. clear the Gregorian Change Date. Also, toString(cal) is now identical to
  DateTimeFormatter.ISO_INSTANT. (David Smiley)

Bug Fixes

* LUCENE-7187: Block join queries' Weight#extractTerms(...) implementations
  should delegate to the wrapped weight. (Martijn van Groningen)

* LUCENE-7209: Fixed explanations of FunctionScoreQuery. (Adrien Grand)

* LUCENE-7232: Fixed InetAddressPoint.newPrefixQuery, which was generating an
  incorrect query when the prefix length was not a multiple of 8. (Adrien Grand)

* LUCENE-7279: JapaneseTokenizer throws ArrayIndexOutOfBoundsException
  on some valid inputs (Mike McCandless)

* LUCENE-7188: remove incorrect sanity check in NRTCachingDirectory.listAll()
  that led to IllegalStateException being thrown when nothing was wrong.
  (David Smiley, yonik)

* LUCENE-7219: Make queryparser/xml (Point|LegacyNumeric)RangeQuery builders
  match the underlying queries' (lower|upper)Term optionality logic.
  (Kaneshanathan Srivisagan, Christine Poerschke)

* LUCENE-7257: Fixed PointValues#size(IndexReader, String), docCount,
  minPackedValue and maxPackedValue to skip leaves that do not have points
  rather than raising an IllegalStateException. (Adrien Grand)

* LUCENE-7284: GapSpans needs to implement positionsCost(). (Daniel Bigham, Alan
  Woodward)

* LUCENE-7231: WeightedSpanTermExtractor didn't deal correctly with single-term
  phrase queries. (Eva Popenda, Alan Woodward)

* LUCENE-7293: Don't try to highlight GeoPoint queries (Britta Weber,
  Nick Knize, Mike McCandless, Uwe Schindler)

Documentation

* LUCENE-7223: Improve XXXPoint javadocs to make it clear that you
  should separately add StoredField if you want to retrieve these
  field values at search time (Greg Huber, Robert Muir, Mike McCandless)

======================= Lucene 6.0.0 =======================

System Requirements

* LUCENE-5950: Move to Java 8 as minimum Java version.
  (Ryan Ernst, Uwe Schindler)

* LUCENE-6069: Lucene Core now gets compiled with Java 8 "compact1" profile,
  all other modules with "compact2". (Robert Muir, Uwe Schindler)

New Features

* LUCENE-6631: Lucene Document classification (Tommaso Teofili, Alessandro Benedetti)

* LUCENE-6747: FingerprintFilter is a TokenFilter that outputs a single
  token which is a concatenation of the sorted and de-duplicated set of
  input tokens. Useful for normalizing short text in clustering/linking
  tasks.
  (Mark Harwood, Adrien Grand)

* LUCENE-5735: NumberRangePrefixTreeStrategy now includes interval/range faceting
  for counting ranges that align with the underlying terms as defined by the
  NumberRangePrefixTree (e.g. familiar date units like days). (David Smiley)

* LUCENE-6711: Use CollectionStatistics.docCount() for IDF and average field
  length computations, to avoid skew from documents that don't have the field.
  (Ahmet Arslan via Robert Muir)

* LUCENE-6758: Use docCount+1 for DefaultSimilarity's IDF, so that queries
  containing nonexistent fields won't screw up querynorm. (Terry Smith, Robert Muir)

* SOLR-7876: The QueryTimeout interface now has an isTimeoutEnabled method
  that can return false to exit from ExitableDirectoryReader wrapping at
  the point fields() is called. (yonik)

* LUCENE-6825: Add low-level support for block-KD trees (Mike McCandless)

* LUCENE-6852, LUCENE-6975: Add support for points (dimensionally
  indexed values) to index, document and codec APIs, including a
  simple text implementation. (Mike McCandless)

* LUCENE-6861: Create Lucene60Codec, supporting points.
  (Mike McCandless)

* LUCENE-6879: Allow defining custom CharTokenizer instances without
  subclassing using Java 8 lambdas or method references. (Uwe Schindler)

* LUCENE-6881: Cutover all BKD implementations to points
  (Mike McCandless)

* LUCENE-6837: Add N-best output support to JapaneseTokenizer.
  (Hiroharu Konno via Christian Moen)

* LUCENE-6962: Add per-dimension min/max to points
  (Mike McCandless)

* LUCENE-6975: Add ExactPointQuery, to match a single N-dimensional
  point (Robert Muir, Mike McCandless)

* LUCENE-6989: Add preliminary support for MMapDirectory unmapping in Java 9.
  (Uwe Schindler, Chris Hegarty, Peter Levart)

* LUCENE-7040: Upgrade morfologik-stemming to version 2.1.0.
  (Dawid Weiss)

* LUCENE-7048: Add XXXPoint.newSetQuery, to create a query that
  efficiently matches all documents containing any of the specified
  point values. This is the analog of TermsQuery, but for points
  instead. (Adrien Grand, Robert Muir, Mike McCandless)

API Changes

* LUCENE-7094: BBoxStrategy and PointVectorStrategy now support
  PointValues (in addition to legacy numeric trie). Their APIs
  were changed a little and also made more consistent. PointValues/Trie
  is optional, DocValues is optional, stored value is optional.
  (Nick Knize, David Smiley)

* LUCENE-6067: Accountable.getChildResources has a default
  implementation returning the empty list. (Robert Muir)

* LUCENE-6583: FilteredQuery has been removed. Instead, you can construct a
  BooleanQuery with one MUST clause for the query, and one FILTER clause for
  the filter. (Adrien Grand)

* LUCENE-6651: AttributeImpl#reflectWith(AttributeReflector) was made
  abstract and has no reflection-based default implementation anymore.
  (Uwe Schindler)

* LUCENE-6706: PayloadTermQuery and PayloadNearQuery have been removed.
  Instead, use PayloadScoreQuery to wrap any SpanQuery. (Alan Woodward)

* LUCENE-6829: OfflineSorter and the classes that use it (suggesters,
  hunspell) now do all temporary file IO via Directory instead of
  directly through Java's temp dir. Directory.createTempOutput
  creates a uniquely named IndexOutput, and the new
  IndexOutput.getName returns its name (Dawid Weiss, Robert Muir, Mike
  McCandless)

* LUCENE-6917: Deprecate and rename NumericXXX classes to
  LegacyNumericXXX in favor of points (Mike McCandless)

* LUCENE-6947: SortField.missingValue is now protected. You can read its
  value using the new SortField.getMissingValue getter. (Adrien Grand)

* LUCENE-7028: Remove duplicate method in LegacyNumericUtils.
  (Uwe Schindler)

* LUCENE-7052, LUCENE-7053: Remove custom comparators from BytesRef
  class and solely use natural byte[] comparator throughout codebase.
  This also simplifies the API of BytesRefHash. It also replaces the natural
  comparator in ArrayUtil by Java 8's Comparator#naturalOrder().
  (Mike McCandless, Uwe Schindler, Robert Muir)

* LUCENE-7060: Update Spatial4j to 0.6. The package com.spatial4j.core
  is now org.locationtech.spatial4j. (David Smiley)

* LUCENE-7058: Add getters to various Query implementations (Guillaume Smet via
  Alan Woodward)

* LUCENE-7064: MultiPhraseQuery is now immutable and should be constructed
  with MultiPhraseQuery.Builder. (Luc Vanlerberghe via Adrien Grand)

* LUCENE-7072: Geo3DPoint always uses WGS84 planet model.
  (Robert Muir, Mike McCandless)

* LUCENE-7056: Geo3D classes are in different packages now. (David Smiley)

* LUCENE-6952: These classes are now abstract: FilterCodecReader, FilterLeafReader,
  FilterCollector, FilterDirectory. And some Filter* classes in
  lucene-test-framework too. (David Smiley)

* SOLR-8867: FunctionValues.getRangeScorer now takes a LeafReaderContext instead
  of an IndexReader, and avoids matching documents without a value in the field
  for numeric fields. (yonik)

Optimizations

* LUCENE-6891: Use prefix coding when writing points in
  each leaf block in the default codec, to reduce the index
  size (Mike McCandless)

* LUCENE-6901: Optimize points indexing: use faster
  IntroSorter instead of InPlaceMergeSorter, and specialize 1D
  merging to merge sort the already sorted segments instead of
  re-indexing (Mike McCandless)

* LUCENE-6793: LegacyNumericRangeQuery.hashCode() is now less subject to hash
  collisions. (J.B. Langston via Adrien Grand)

* LUCENE-7050: TermsQuery is now cached more aggressively by the default
  query caching policy. (Adrien Grand)

* LUCENE-7066: PointRangeQuery got optimized for the case that all documents
  have a value and all points from the segment match. (Adrien Grand)

Changes in Runtime Behavior

* LUCENE-6789: IndexSearcher's default Similarity is changed to BM25Similarity.
  Use ClassicSimilarity to get the old vector space DefaultSimilarity. (Robert Muir)

* LUCENE-6886: Reserve the .tmp file name extension for temp files,
  and codec components are no longer allowed to use this extension
  (Robert Muir, Mike McCandless)

* LUCENE-6835: Directory.listAll now returns entries in sorted order,
  to not leak platform-specific behavior, and "retrying file deletion"
  is now the responsibility of Directory.deleteFile, not the caller.
  (Robert Muir, Mike McCandless)

Tests

* LUCENE-7009: Add expectThrows utility to LuceneTestCase. This uses a lambda
  expression to encapsulate a statement that is expected to throw an exception.
  (Ryan Ernst)

Bug Fixes

* LUCENE-7065: Fix the explain for the global ordinals join query. Previously, the
  explain would also indicate that non-matching documents would match.
  On top of that, with score mode average, the explain would fail with an NPE.
  (Martijn van Groningen)

* LUCENE-7101: OfflineSorter had O(N^2) merge cost, and used too many
  temporary file descriptors, for large sorts (Mike McCandless)

* LUCENE-7111: DocValuesRangeQuery.newLongRange behaves incorrectly for
  Long.MAX_VALUE and Long.MIN_VALUE (Ishan Chattopadhyaya via Steve Rowe)

* LUCENE-7139: Fix bugs in geo3d's Vincenty surface distance
  implementation (Karl Wright via Mike McCandless)

* LUCENE-7112: WeightedSpanTermExtractor.extractUnknownQuery is only called
  on queries that could not be extracted. (Adrien Grand)

* LUCENE-7126: Remove GeoPointDistanceRangeQuery. This query was implemented
  with boolean NOT, and was incorrect for multi-valued documents. (Robert Muir)

* LUCENE-7158: Consistently use earth's WGS84 mean radius wherever our
  geo search implementations approximate the earth as a sphere (Karl
  Wright via Mike McCandless)

Other

* LUCENE-7035: Upgrade icu4j to 56.1/unicode 8. (Robert Muir)

* LUCENE-7087: Let MemoryIndex#fromDocument(...) accept 'Iterable<? extends IndexableField>'
  as document instead of 'Document'. (Martijn van Groningen)

* LUCENE-7091: Add doc values support to MemoryIndex
  (Martijn van Groningen, David Smiley)

* LUCENE-7093: Add point values support to MemoryIndex
  (Martijn van Groningen, Mike McCandless)

* LUCENE-7095: Add point values support to the numeric field query time join.
  (Martijn van Groningen, Mike McCandless)

======================= Lucene 5.5.5 =======================

Changes in Runtime Behavior

* Resolving of external entities in queryparser/xml/CoreParser is disallowed
  by default. See SOLR-11477 for details.

Bug Fixes

* LUCENE-7419: Fix performance bug with TokenStream.end(), where it would look up
  PositionIncrementAttribute every time.
  (Mike McCandless, Robert Muir)

* SOLR-11477: Disallow resolving of external entities in queryparser/xml/CoreParser
  by default. (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)

======================= Lucene 5.5.4 =======================

Bug Fixes

* LUCENE-7417: The standard Highlighter could throw an IllegalArgumentException when
  trying to highlight a query containing a degenerate case of a MultiPhraseQuery with one
  term. (Thomas Kappler via David Smiley)

* LUCENE-7657: Fixed potential memory leak in the case that a (Span)TermQuery
  with a TermContext is cached. (Adrien Grand)

* LUCENE-7647: Made stored fields reclaim native memory more aggressively when
  configured with BEST_COMPRESSION. This could otherwise result in out-of-memory
  issues. (Adrien Grand)

* LUCENE-7562: CompletionFieldsConsumer sometimes throws
  NullPointerException on ghost fields (Oliver Eilhard via Mike McCandless)

* LUCENE-7547: JapaneseTokenizerFactory was failing to close the
  dictionary file it opened (Markus via Mike McCandless)

* LUCENE-6914: Fixed DecimalDigitFilter in case of supplementary code points.
  (Hossman)

* LUCENE-7440: Document id skipping (PostingsEnum.advance) could throw an
  ArrayIndexOutOfBoundsException on large index segments (>1.8B docs)
  with large skips. (yonik)

* LUCENE-7570: IndexWriter may deadlock if a commit is running while
  there are too many merges running and one of the merges hits a
  tragic exception (Joey Echeverria via Mike McCandless)

Other

* LUCENE-6989: Backport MMapDirectory's unmapping code from Lucene 6.4 to use
  MethodHandles. This allows it to work with Java 9 (EA build 150 and later).
  (Uwe Schindler)

Build

* LUCENE-7543: Make changes-to-html target an offline operation, by moving the
  Lucene and Solr DOAP RDF files into the Git source repository under
  dev-tools/doap/ and then pulling release dates from those files, rather than
  from JIRA. (Mano Kovacs, hossman, Steve Rowe)

* LUCENE-7596: Update Groovy to version 2.4.8 to allow building with Java 9
  build 148+. Also update JGit version for working-copy checks. This does not
  fix all issues with Java 9, but allows building the distribution.
  (Uwe Schindler)

* LUCENE-7651: Backport (Lucene 6.4.1) fix for Java 8u121 to allow documentation
  build to inject "Google Code Prettify" without adding Javascript to Javadoc's
  -bottom parameter. Unfortunately, this fix disables Prettify if Javadocs are
  built with Java 7, as there is no generic way in Java 7 to inject Javascript
  without breaking Java 8 (and possibly paid Java 7 security updates). This
  fix also updates Prettify to the latest version to work around a Google Chrome
  issue. (Uwe Schindler)

======================= Lucene 5.5.3 =======================
(No Changes)

======================= Lucene 5.5.2 =======================

Bug Fixes

* LUCENE-7065: Fix the explain for the global ordinals join query. Previously, the
  explain would also indicate that non-matching documents would match.
  On top of that, with score mode average, the explain would fail with an NPE.
  (Martijn van Groningen)

* LUCENE-7111: DocValuesRangeQuery.newLongRange behaves incorrectly for
  Long.MAX_VALUE and Long.MIN_VALUE (Ishan Chattopadhyaya via Steve Rowe)

* LUCENE-7139: Fix bugs in geo3d's Vincenty surface distance
  implementation (Karl Wright via Mike McCandless)

* LUCENE-7187: Block join queries' Weight#extractTerms(...) implementations
  should delegate to the wrapped weight.
  (Martijn van Groningen)

* LUCENE-7279: JapaneseTokenizer throws ArrayIndexOutOfBoundsException
  on some valid inputs (Mike McCandless)

* LUCENE-7219: Make queryparser/xml (Point|LegacyNumeric)RangeQuery builders
  match the underlying queries' (lower|upper)Term optionality logic.
  (Kaneshanathan Srivisagan, Christine Poerschke)

* LUCENE-7284: GapSpans needs to implement positionsCost(). (Daniel Bigham, Alan
  Woodward)

* LUCENE-7231: WeightedSpanTermExtractor didn't deal correctly with single-term
  phrase queries. (Eva Popenda, Alan Woodward)

* LUCENE-7301: Multiple doc values updates to the same document within
  one update batch could be applied in the wrong order, resulting in
  the wrong updated value (Ishan Chattopadhyaya, hossman, Mike McCandless)

* LUCENE-7132: BooleanQuery sometimes assigned too-low scores in cases
  where ranges of documents had only a single clause matching while
  other ranges had more than one clause matching (Ahmet Arslan,
  hossman, Mike McCandless)

* LUCENE-7291: Spatial heatmap faceting could mis-count when the heatmap crosses the
  dateline and indexed non-point shapes are much bigger than the heatmap region.
  (David Smiley)

======================= Lucene 5.5.1 =======================

Bug Fixes

* LUCENE-7112: WeightedSpanTermExtractor.extractUnknownQuery is only called
  on queries that could not be extracted. (Adrien Grand)

* LUCENE-7188: remove incorrect sanity check in NRTCachingDirectory.listAll()
  that led to IllegalStateException being thrown when nothing was wrong.
  (David Smiley, yonik)

* LUCENE-7209: Fixed explanations of FunctionScoreQuery. (Adrien Grand)

======================= Lucene 5.5.0 =======================

New Features

* LUCENE-5868: JoinUtil.createJoinQuery(..,NumericType,..)
  query-time join for LONG and INT fields with NUMERIC and SORTED_NUMERIC doc values.
  (Alexey Zelin via Mikhail Khludnev)

* LUCENE-6939: Add exponential reciprocal scoring to
  BlendedInfixSuggester, to even more strongly favor suggestions that
  match closer to the beginning (Arcadius Ahouansou via Mike McCandless)

* LUCENE-6958: Improved CustomAnalyzer to take class references to factories
  as an alternative to their SPI names. This enables compile-time safety when
  defining an analyzer's components. (Uwe Schindler, Shai Erera)

* LUCENE-6818, LUCENE-6986: Add DFISimilarity implementing the divergence
  from independence model. (Ahmet Arslan via Robert Muir)

* SOLR-4619: Added removeAllAttributes() to AttributeSource, which removes
  all previously added attributes.

* LUCENE-7010: Added MergePolicyWrapper to allow easy wrapping of other policies.
  (Shai Erera)

API Changes

* LUCENE-6997: refactor sandboxed GeoPointField and query classes to lucene-spatial
  module under new lucene.spatial.geopoint package (Nick Knize)

* LUCENE-6908: GeoUtils static relational methods have been refactored to new
  GeoRelationUtils and now correctly handle large irregular rectangles, and
  pole crossing distance queries. (Nick Knize)

* LUCENE-6900: Grouping sortWithinGroup variables used to allow null to mean
  Sort.RELEVANCE. Null is no longer permitted. (David Smiley)

* LUCENE-6919: The Scorer class has been refactored to expose an iterator
  instead of extending DocIdSetIterator. asTwoPhaseIterator() has been renamed
  to twoPhaseIterator() for consistency. (Adrien Grand)

* LUCENE-6973: TeeSinkTokenFilter no longer accepts a SinkFilter (the latter
  has been removed). If you wish to filter the sinks, you can wrap them with
  any other TokenFilter (e.g. a FilteringTokenFilter).
  Also, you can no longer
  add a SinkTokenStream to an existing TeeSinkTokenFilter. If you need to
  share multiple streams with a single sink, chain them with multiple
  TeeSinkTokenFilters.
  DateRecognizerSinkFilter was renamed to DateRecognizerFilter and moved under
  analysis/common. TokenTypeSinkFilter was removed (use TypeTokenFilter instead).
  TokenRangeSinkFilter was removed. (Shai Erera, Uwe Schindler)

* LUCENE-6980: Default applyAllDeletes to true when opening
  near-real-time readers (Mike McCandless)

* LUCENE-6981: SpanQuery.getTermContexts() helper methods are now public, and
  SpanScorer has a public getSpans() method. (Alan Woodward)

* LUCENE-6932: IndexInput.seek implementations now throw EOFException
  if you seek beyond the end of the file (Adrien Grand, Mike McCandless)

* LUCENE-6988: IndexableField.tokenStream() no longer throws IOException
  (Alan Woodward)

* LUCENE-7028: Deprecate a duplicate method in NumericUtils.
  (Uwe Schindler)

Optimizations

* LUCENE-6930: Decouple GeoPointField from NumericType by using a custom
  and efficient GeoPointTokenStream and TermEnum designed for GeoPoint prefix
  terms. (Nick Knize)

* LUCENE-6951: Improve GeoPointInPolygonQuery using point orientation based
  line crossing algorithm, and adding result for multi-value docs when at least
  1 point satisfies polygon criteria. (Nick Knize)

* LUCENE-6889: BooleanQuery.rewrite now performs some query optimization, in
  particular to rewrite queries that look like: "+*:* #filter" to a
  "ConstantScore(filter)". (Adrien Grand)

* LUCENE-6912: Grouping's Collectors now calculate a response to needsScores()
  instead of always 'true'. (David Smiley)

* LUCENE-6815: DisjunctionScorer now advances two-phased iterators lazily,
  and stops evaluating them as soon as a single one matches.
  The other iterators will be confirmed lazily when computing score() or freq().
  (Adrien Grand)

* LUCENE-6926: MUST_NOT clauses now use the match cost API to run the slow bits
  last whenever possible. (Adrien Grand)

* LUCENE-6944: BooleanWeight no longer creates sub-scorers if BS1 is not
  applicable. (Adrien Grand)

* LUCENE-6940: MUST_NOT clauses execute faster, especially when they are sparse.
  (Adrien Grand)

* LUCENE-6470: Improve efficiency of TermsQuery constructors. (Robert Muir)

Bug Fixes

* LUCENE-6976: BytesRefTermAttributeImpl.copyTo NPE'ed if BytesRef was null.
  Added equals & hashCode, and a new test for these things. (David Smiley)

* LUCENE-6932: RAMDirectory's IndexInput was failing to throw
  EOFException in some cases (Stéphane Campinas, Adrien Grand via Mike
  McCandless)

* LUCENE-6896: Don't treat the smallest possible norm value as an infinitely
  long document in SimilarityBase or BM25Similarity. Add more warnings to sims
  that will not work well with extreme tf values. (Ahmet Arslan, Robert Muir)

* LUCENE-6984: SpanMultiTermQueryWrapper no longer modifies its wrapped query.
  (Alan Woodward, Adrien Grand)

* LUCENE-6998: Fix a couple of places to better detect truncated index files
  as corruption. (Robert Muir, Mike McCandless)

* LUCENE-7002: Fixed MultiCollector to not throw an NPE if setScorer is called
  after one of the sub collectors is done collecting. (John Wang, Adrien Grand)

* LUCENE-7027: Fixed NumericTermAttribute to not throw IllegalArgumentException
  after NumericTokenStream was exhausted. (Uwe Schindler, Lee Hinman,
  Mike McCandless)

* LUCENE-7018: Fix GeoPointTermQueryConstantScoreWrapper to add the document on
  the first GeoPointField match. (Nick Knize)

* LUCENE-7019: Add two-phase iteration to GeoPointTermQueryConstantScoreWrapper.
  (Robert Muir via Nick Knize)

* LUCENE-6989: Improve MMapDirectory's unmapping checks to catch more non-working
  cases. The unmap-hack does not yet work with recent Java 9. Official support
  will come with Lucene 6. (Uwe Schindler)

Other

* LUCENE-6924: Upgrade randomizedtesting to 2.3.2. (Dawid Weiss)

* LUCENE-6920: Improve custom function checks in the expressions module
  to use MethodHandles and work without extra security privileges.
  (Uwe Schindler, Robert Muir)

* LUCENE-6921: Fix SPIClassIterator#isParentClassLoader to not
  require extra permissions. (Uwe Schindler)

* LUCENE-6923: Fix RamUsageEstimator to access private fields inside
  an AccessController block for computing size. (Robert Muir)

* LUCENE-6907: make TestParser extendable, rename test/.../xml/
  NumericRangeQueryQuery.xml to NumericRangeQuery.xml
  (Christine Poerschke)

* LUCENE-6925: add ForceMergePolicy class in test-framework
  (Christine Poerschke)

* LUCENE-6945: factor out TestCorePlus(Queries|Extensions)Parser from
  TestParser, rename TestParser to TestCoreParser (Christine Poerschke)

* LUCENE-6949: fix (potential) resource leak in SynonymFilterFactory
  (https://scan.coverity.com/projects/5620 CID 120656)
  (Christine Poerschke, Coverity Scan (via Rishabh Patel))

* LUCENE-6961: Improve Exception handling in AnalysisFactories /
  AnalysisSPILoader: Don't wrap exceptions occurring in a factory's
  ctor inside InvocationTargetException. (Uwe Schindler)

* LUCENE-6965: Expression's JavascriptCompiler now throws ParseException
  with bad function names or bad arity instead of IllegalArgumentException.
  (Tomás Fernández Löbbe, Uwe Schindler, Ryan Ernst)

* LUCENE-6964: String-based signatures in JavascriptCompiler replaced
  with better compile-time-checked MethodType; generated class files
  are no longer marked as synthetic.
  (Uwe Schindler)

* LUCENE-6978: Refactor several code places that look up locales
  by string name to use BCP47 locale tags instead. LuceneTestCase
  now also prints locales on failing tests this way.
  Locale#forLanguageTag() and Locale#toString() were added to the list
  of forbidden signatures. (Uwe Schindler, Robert Muir)

* LUCENE-6988: You can now add IndexableFields directly to a MemoryIndex,
  and create a MemoryIndex from a Lucene Document. (Alan Woodward)

* LUCENE-7005: TieredMergePolicy tweaks (>= vs. >, @see get vs. set)
  (Christine Poerschke)

* LUCENE-7006: increase BaseMergePolicyTestCase use (TestNoMergePolicy and
  TestSortingMergePolicy now extend it, TestUpgradeIndexMergePolicy added)
  (Christine Poerschke)

======================= Lucene 5.4.1 =======================

Bug Fixes

* LUCENE-6910: fix 'if ... > Integer.MAX_VALUE' check in
  (Binary|Numeric)DocValuesFieldUpdates.merge
  (https://scan.coverity.com/projects/5620 CID 119973 and CID 120081)
  (Christine Poerschke, Coverity Scan (via Rishabh Patel))

* LUCENE-6946: SortField.equals now takes the missingValue parameter into
  account. (Adrien Grand)

* LUCENE-6918: LRUQueryCache.onDocIdSetEviction is only called when at least
  one DocIdSet is being evicted. (Adrien Grand)

* LUCENE-6929: Fix SpanNotQuery rewriting to not drop the pre/post parameters.
  (Tim Allison via Adrien Grand)

* LUCENE-6950: Fix FieldInfos handling of UninvertingReader, e.g. do not
  hide the true docvalues update generation or other properties.
  (Ishan Chattopadhyaya via Robert Muir)

* LUCENE-6948: Fix ArrayIndexOutOfBoundsException in PagedBytes$Reader.fill
  by removing an unnecessary long-to-int cast.
  (Michael Lawley via Christine Poerschke)

* SOLR-7865: BlendedInfixSuggester was returning too many results
  (Arcadius Ahouansou via Mike McCandless)

* LUCENE-6970: Fixed an off-by-one error in Lucene54DocValuesProducer that could
  potentially corrupt doc values. (Adrien Grand)

* LUCENE-2229: Fix Highlighter's SimpleSpanFragmenter, where multiple adjacent
  stop words following a span could make the fragment much too long.
  (Elmer Garduno, Lukhnos Liu via David Smiley)

======================= Lucene 5.4.0 =======================

New Features

* LUCENE-6875: New Serbian Filter. (Nikola Smolenski via Robert Muir,
  Dawid Weiss)

* LUCENE-6720: New FunctionRangeQuery wrapper around ValueSourceScorer
  (returned from ValueSource/FunctionValues.getRangeScorer()). (David Smiley)

* LUCENE-6724: Add utility APIs to GeoHashUtils to compute neighbor
  geohash cells (Nick Knize via Mike McCandless).

* LUCENE-6737: Add DecimalDigitFilter, which folds Unicode digits to Basic Latin.
  (Robert Muir)

* LUCENE-6699: Add integration of BKD tree and geo3d APIs to give a
  fast, very accurate query to find all indexed points within an
  earth-surface shape (Karl Wright, Mike McCandless)

* LUCENE-6838: Added IndexSearcher#getQueryCache and #getQueryCachingPolicy.
  (Adrien Grand)

* LUCENE-6844: PayloadScoreQuery can include or exclude underlying span scores
  from its score calculations (Bill Bell, Alan Woodward)

* LUCENE-6778: Add GeoPointDistanceRangeQuery, to search for points
  within a "ring" (beyond a minimum distance and below a maximum
  distance) (Nick Knize via Mike McCandless)

* LUCENE-6874: Add a new UnicodeWhitespaceTokenizer to analysis/common
  that uses Unicode character properties extracted from ICU4J to tokenize
  text on whitespace.
  This tokenizer will split on non-breaking space (NBSP), too.
  (David Smiley, Uwe Schindler, Steve Rowe)

API Changes

* LUCENE-6590: Query.setBoost(), Query.getBoost() and Query.clone() are gone.
  In order to apply boosts, you now need to wrap queries in a BoostQuery.
  (Adrien Grand)

* LUCENE-6716: SpanPayloadCheckQuery now takes a List<BytesRef> rather than
  a Collection<byte[]>. (Alan Woodward)

* LUCENE-6489: The various span payload queries have been moved to the queries
  submodule, and PayloadSpanUtil is now in sandbox. (Alan Woodward)

* LUCENE-6650: The spatial module no longer uses Filter in any way. All
  spatial Filters now subclass Query. The spatial heatmap/facet API
  now accepts a Bits parameter to filter counts. (David Smiley, Adrien Grand)

* LUCENE-6803: Deprecate sandbox Regexp Query. (Uwe Schindler)

* LUCENE-6301: org.apache.lucene.search.Filter is now deprecated. You should use
  Query objects instead of Filters, and the BooleanClause.Occur.FILTER clause in
  order to let Lucene know that a Query should be used for filtering but not
  scoring.

* LUCENE-6939: SpanOrQuery.addClause is now deprecated; clauses should all be
  provided at construction time. (Paul Elschot via Adrien Grand)

* LUCENE-6855: CachingWrapperQuery is deprecated and will be removed in 6.0.
  (Adrien Grand)

* LUCENE-6870: DisjunctionMaxQuery#add is now deprecated; clauses should all be
  provided at construction time. (Adrien Grand)

* LUCENE-6884: Analyzer.tokenStream() and Tokenizer.setReader() are no longer
  declared as throwing IOException.
  (Alan Woodward)

* LUCENE-6849: Expose IndexWriter.flush() method, to move all
  in-memory segments to disk without opening a near-real-time reader
  or calling fsync (Robert Muir, Simon Willnauer, Mike McCandless)

* LUCENE-6911: Add correct StandardQueryParser.getMultiFields() method,
  deprecate no-op StandardQueryParser.getMultiFields(CharSequence[]) method.
  (Christine Poerschke, Mikhail Khludnev, Coverity Scan (via Rishabh Patel))

Optimizations

* LUCENE-6708: TopFieldCollector does not compute the score several times on the
  same document anymore. (Adrien Grand)

* LUCENE-6720: ValueSourceScorer, returned from
  FunctionValues.getRangeScorer(), now uses TwoPhaseIterator. (David Smiley)

* LUCENE-6756: MatchAllDocsQuery now has a dedicated BulkScorer for better
  performance when used as a top-level query. (Adrien Grand)

* LUCENE-6746: DisjunctionMaxQuery, BoostingQuery and BoostedQuery now create
  sub weights through IndexSearcher so that they can be cached. (Adrien Grand)

* LUCENE-6754: Optimized IndexSearcher.count for the cases when it can use
  index statistics instead of collecting all matches. (Adrien Grand)

* LUCENE-6773: Nested conjunctions now iterate over documents as if clauses
  were all at the same level. (Adrien Grand)

* LUCENE-6777: Reuse BytesRef when visiting term ranges in
  GeoPointTermsEnum to reduce GC pressure (Nick Knize via Mike
  McCandless)

* LUCENE-6779: Reduce memory allocated by CompressingStoredFieldsWriter to write
  strings larger than 64kb by an amount equal to the string's UTF-8 size.
  (Dawid Weiss, Robert Muir, shalin)

* LUCENE-6850: Optimize BooleanScorer for sparse clauses. (Adrien Grand)

* LUCENE-6840: Ordinal indexes for SORTED_SET/SORTED_NUMERIC fields and
  addresses for BINARY fields are now stored on disk instead of in memory.
  (Adrien Grand)

* LUCENE-6878: Speed up TopDocs.merge. (Daniel Jelinski via Adrien Grand)

* LUCENE-6885: StandardDirectoryReader (initialCapacity) tweaks
  (Christine Poerschke)

* LUCENE-6863: Optimized storage requirements of doc values fields when fewer
  than 1% of documents have a value. (Adrien Grand)

* LUCENE-6892: various lucene.index initialCapacity tweaks
  (Christine Poerschke)

* LUCENE-6276: Added TwoPhaseIterator.matchCost(), which allows confirming the
  least costly TwoPhaseIterators first. (Paul Elschot via Adrien Grand)

* LUCENE-6898: In the default codec, the last stored field value will not
  be fully read from disk if the supplied StoredFieldVisitor doesn't want it.
  So put your largest text field value last to benefit. (David Smiley)

* LUCENE-6909: Remove unnecessary synchronized from
  FacetsConfig.getDimConfig for better concurrency (Sanne Grinovero
  via Mike McCandless)

* SOLR-7730: Speed up SlowCompositeReaderWrapper.getSortedSetDocValues() by
  avoiding merging FieldInfos just to check the doc value type.
  (Paul Vasilyev, Yuriy Pakhomov, Mikhail Khludnev, yonik)

Bug Fixes

* LUCENE-6905: Unwrap center longitude for dateline-crossing
  GeoPointDistanceQuery. (Nick Knize)

* LUCENE-6817: ComplexPhraseQueryParser.ComplexPhraseQuery did not display
  slop in toString(). (Ahmet Arslan via Dawid Weiss)

* LUCENE-6730: Hyper-parameter c was ignored in term frequency NormalizationH1.
  (Ahmet Arslan via Robert Muir)

* LUCENE-6742: The Lovins & Finnish implementations of SnowballFilter were
  fixed to behave exactly as specified. A bug in the snowball compiler
  caused differences in output of the filter in comparison to the original
  test data. In addition, the performance of those filters was improved
  significantly.
  (Uwe Schindler, Robert Muir)

* LUCENE-6783: Removed side effects from FuzzyLikeThisQuery.rewrite.
  (Adrien Grand)

* LUCENE-6776: Fix geo3d math to handle randomly squashed planet
  models (Karl Wright via Mike McCandless)

* LUCENE-6792: Fix TermsQuery.toString() to work with binary terms.
  (Ruslan Muzhikov, Robert Muir)

* LUCENE-5503: When Highlighter's WeightedSpanTermExtractor converted a
  PhraseQuery to an equivalent SpanQuery, it sometimes used a slop that was
  too low (no highlight) or determined inOrder incorrectly.
  (Tim Allison via David Smiley)

* LUCENE-6790: Fix IndexWriter thread safety when one thread is
  handling a tragic exception but another is still committing (Mike
  McCandless)

* LUCENE-6810: Upgrade to Spatial4j 0.5 -- fixes some edge-case bugs in the
  spatial module. See https://github.com/locationtech/spatial4j/blob/master/CHANGES.md
  (David Smiley)

* LUCENE-6813: OfflineSorter no longer removes its output Path up
  front, and instead opens it for write with
  StandardCopyOption.REPLACE_EXISTING to overwrite any prior file, so
  that callers can safely use Files.createTempFile for the output.
  This change also fixes OfflineSorter's default temp directory when
  running tests to use mock filesystems so e.g. we detect file handle
  leaks (Dawid Weiss, Robert Muir, Mike McCandless)

* LUCENE-6813: RangeTreeWriter was failing to close all file handles
  it opened, leading to intermittent failures on Windows (Dawid Weiss,
  Robert Muir, Mike McCandless)

* LUCENE-6826: Fix ClassCastException when merging a field that has no
  terms because they were filtered out by e.g.
  a FilterCodecReader (Trejkaz via Mike McCandless)

* LUCENE-6823: LocalReplicator now uses System.nanoTime as its clock
  source when checking for expiration (Ishan Chattopadhyaya via Mike
  McCandless)

* LUCENE-6856: The Weight wrapper used by LRUQueryCache now delegates to the
  original Weight's BulkScorer when applicable. (Adrien Grand)

* LUCENE-6858: Fix ContextSuggestField to correctly wrap the token stream
  when using CompletionAnalyzer. (Areek Zillur)

* LUCENE-6872: IndexWriter handles any VirtualMachineError, not just OOM,
  as tragic. (Robert Muir)

* LUCENE-6814: PatternTokenizer no longer hangs onto heap sized to the
  maximum input string it's ever seen, which can be a large memory
  "leak" if you tokenize large strings with many threads across many
  indices (Alex Chow via Mike McCandless)

* LUCENE-6888: Explain output of map() function now also prints the default value (janhoy)

Other

* LUCENE-6899: Upgrade randomizedtesting to 2.3.1. (Dawid Weiss)

* LUCENE-6478: Test execution can hang with java.security.debug. (Dawid Weiss)

* LUCENE-6862: Upgrade of RandomizedRunner to version 2.2.0. (Dawid Weiss)

* LUCENE-6857: Validate StandardQueryParser with NOT operator
  within parentheses. (Jigar Shah via Dawid Weiss)

* LUCENE-6827: Use an explicit-capacity ArrayList instead of a LinkedList
  in MultiFieldQueryNodeProcessor. (Dawid Weiss)

* LUCENE-6812: Upgrade RandomizedTesting to 2.1.17. (Dawid Weiss)

* LUCENE-6174: Improve "ant eclipse" to select the right JRE for building.
  (Uwe Schindler, Dawid Weiss)

* LUCENE-6417, LUCENE-6830: Upgrade ANTLR used in expressions module
  to version 4.5.1-1. (Jack Conradson, Uwe Schindler)

* LUCENE-6729: Upgrade ASM used in expressions module to version 5.0.4.
  (Uwe Schindler)

* LUCENE-6738: remove IndexWriterConfig.[gs]etIndexingChain
  (Christine Poerschke)

* LUCENE-6755: more tests of ToChildBlockJoinScorer.advance (hossman)

* LUCENE-6571: fix some private access level javadoc errors and warnings
  (Cao Manh Dat, Christine Poerschke)

* LUCENE-6768: AbstractFirstPassGroupingCollector.groupSort private member
  is not needed. (Christine Poerschke)

* LUCENE-6761: MatchAllDocsQuery's Scorers do not expose approximations
  anymore. (Adrien Grand)

* LUCENE-6775, LUCENE-6833: Improved MorfologikFilterFactory to allow
  loading of custom dictionaries from ResourceLoader. Upgraded
  Morfologik to version 2.0.1. The 'dictionary' attribute has been
  reverted and now points at the dictionary resource to be
  loaded instead of the default Polish dictionary.
  (Uwe Schindler, Dawid Weiss)

* LUCENE-6797: Make GeoCircle an interface and use a factory to create
  it, to eventually handle degenerate cases (Karl Wright via Mike
  McCandless)

* LUCENE-6800: Use XYZSolidFactory to create XYZSolids (Karl Wright
  via Mike McCandless)

* LUCENE-6798: Geo3d now models degenerate (too tiny) circles as a
  single point (Karl Wright via Mike McCandless)

* LUCENE-6770: Document in javadocs that FSDirectory canonicalizes the path.
  (Uwe Schindler, Vladimir Kuzmin)

* LUCENE-6795: Fix various places where code used
  AccessibleObject#setAccessible() without a privileged block. Code
  without a hard requirement for reflection was rewritten. This
  makes Lucene and Solr ready for Java 9 Jigsaw's module system, where
  reflection on Java's runtime classes is very restricted.
  (Robert Muir, Uwe Schindler)

* LUCENE-6467: Simplify Query.equals.
  (Paul Elschot via Adrien Grand)

* LUCENE-6845: SpanScorer is now merged into Spans (Alan Woodward, David Smiley)

* LUCENE-6887: DefaultSimilarity is deprecated; use ClassicSimilarity for equivalent behavior,
  or consider switching to BM25Similarity, which will become the new default in Lucene 6.0 (hossman)

* LUCENE-6893: factor out CorePlusQueriesParser from CorePlusExtensionsParser
  (Christine Poerschke)

* LUCENE-6902: Don't retry fsync on files / directories; fail
  immediately. (Daniel Mitterdorfer, Uwe Schindler)

* LUCENE-6801: Clarify in PhraseQuery's javadocs that it in fact supports terms
  at the same position (as does MultiPhraseQuery), treated like a conjunction.
  Added test. (David Smiley, Adrien Grand)

Build

* LUCENE-6732: Improve checker for invalid source patterns to also
  detect javadoc-style license headers. Use Groovy to implement the
  checks instead of plain Ant. (Uwe Schindler)

* LUCENE-6594: Update forbiddenapis to 2.0. (Uwe Schindler)

Tests

* LUCENE-6752: Add Math#random() to forbiddenapis. (Uwe Schindler,
  Mikhail Khludnev, Andrei Beliakov)

Changes in Backwards Compatibility Policy

* LUCENE-6742: The Lovins & Finnish implementations of SnowballFilter
  were fixed to now behave exactly like the original Snowball stemmer.
  If you have indexed text using those stemmers, you may need to reindex.
  (Uwe Schindler, Robert Muir)

Changes in Runtime Behavior

* LUCENE-6772: MultiCollector now catches CollectionTerminatedException and
  removes the collector that threw this exception from the list of sub
  collectors to collect. (Adrien Grand)

* LUCENE-6784: IndexSearcher's query caching is enabled by default. Call
  indexSearcher.setQueryCache(null) to disable it.
  (Adrien Grand)

* LUCENE-6305: BooleanQuery.equals and hashCode do not depend on the order of
  clauses anymore. (Adrien Grand)

======================= Lucene 5.3.2 =======================

Bug Fixes

* SOLR-7865: BlendedInfixSuggester was returning too many results
  (Arcadius Ahouansou via Mike McCandless)

======================= Lucene 5.3.1 =======================

Bug Fixes

* LUCENE-6774: Remove classloader hack in MorfologikFilter. (Robert Muir,
  Uwe Schindler)

* LUCENE-6748: UsageTrackingQueryCachingPolicy no longer caches trivial queries
  like MatchAllDocsQuery. (Adrien Grand)

* LUCENE-6781: Fixed BoostingQuery to rewrite wrapped queries. (Adrien Grand)

Tests

* LUCENE-6760, SOLR-7958: Move TestUtil#randomWhitespace to the only
  Solr test that is using it. The method is not useful for Lucene tests
  (and it easily breaks, e.g., in Java 9 due to Unicode version updates).
  (Uwe Schindler)

======================= Lucene 5.3.0 =======================

New Features

* LUCENE-6485: Add CustomSeparatorBreakIterator to postings
  highlighter which splits on any character. For example, it
  can be used with getMultiValueSeparator to render whole field
  values. (Luca Cavanna via Robert Muir)

* LUCENE-6459: Add common suggest API that mirrors Lucene's
  Query/IndexSearcher APIs for the Document-based suggester.
  Adds PrefixCompletionQuery, RegexCompletionQuery,
  FuzzyCompletionQuery and ContextQuery.
  (Areek Zillur via Mike McCandless)

* LUCENE-6487: Spatial Geo3D API now has a WGS84 ellipsoid world model option.
  (Karl Wright via David Smiley)

* LUCENE-6477: Add experimental BKD geospatial tree doc values format
  and queries, for fast "bbox/polygon contains lat/lon points" (Mike
  McCandless)

* LUCENE-6526: Asserting(Query|Weight|Scorer) now ensure scores are not computed
  if they are not needed. (Adrien Grand)

* LUCENE-6481: Add GeoPointField, GeoPointInBBoxQuery,
  GeoPointInPolygonQuery for simple "indexed lat/lon point in
  bbox/shape" searching. (Nick Knize via Mike McCandless)

* LUCENE-5954: The segments_N commit point now stores the Lucene
  version that wrote the commit as well as the Lucene version that
  wrote the oldest segment in the index, for faster checking of "too
  old" indices (Ryan Ernst, Robert Muir, Mike McCandless)

* LUCENE-6519: BKDPointInPolygonQuery is much faster because it avoids
  the per-hit polygon check when a leaf cell is fully contained by the
  polygon. (Nick Knize, Mike McCandless)

* LUCENE-6549: Add a preload option to MMapDirectory. (Robert Muir)

* LUCENE-6504: Add Lucene53Codec, with norms implemented directly
  via the Directory's RandomAccessInput API. (Robert Muir)

* LUCENE-6539: Add new DocValuesNumbersQuery, to match any document
  containing one of the specified long values. This change also
  moves the existing DocValuesTermsQuery and DocValuesRangeQuery
  to Lucene's sandbox module, since in general these queries are
  quite slow and are only fast in specific cases. (Adrien Grand,
  Robert Muir, Mike McCandless)

* LUCENE-6577: Give an earlier and better error message for invalid CRC.
  (Robert Muir)

* LUCENE-6544: Geo3D: (1) Regularize path & polygon construction, (2) add
  PlanetModel.surfaceDistance() (ellipsoidal calculation), (3) cache lat & lon
  in GeoPoint, (4) add thread-safety where missing -- Geo3dShape.
  (Karl Wright, David Smiley)

* LUCENE-6606: SegmentInfo.toString now confesses how the documents
  were sorted, when SortingMergePolicy was used (Christine Poerschke
  via Mike McCandless)

* LUCENE-6524: IndexWriter can now be initialized from an already-open
  near-real-time or non-NRT reader. (Boaz Leskes, Robert Muir, Mike
  McCandless)

* LUCENE-6578: Geo3D can now compute the distance from a point to a shape, both
  inner distance and to an outside edge. Multiple distance algorithms are
  available. (Karl Wright, David Smiley)

* LUCENE-6632: Geo3D: Compute circle planes more accurately.
  (Karl Wright via David Smiley)

* LUCENE-6653: Added a general-purpose BytesTermAttribute to the basic token
  attributes package that can be used for TokenStreams that solely produce
  binary terms. (Uwe Schindler)

* LUCENE-6365: Add Operations.topoSort, to run a topological sort of the
  states in an Automaton (Markus Heiden via Mike McCandless)

* LUCENE-6365: Replace Operations.getFiniteStrings with a
  more scalable iterator API (FiniteStringsIterator) (Markus Heiden
  via Mike McCandless)

* LUCENE-6589: Add a new org.apache.lucene.search.join.CheckJoinIndex class
  that can be used to validate that an index has an appropriate structure to
  run join queries. (Adrien Grand)

* LUCENE-6659: Remove IndexWriter's unnecessary hard limit on max concurrency
  (Robert Muir, Mike McCandless)

* LUCENE-6547: Add GeoPointDistanceQuery, matching all points within
  the specified distance from the center point. Fix
  GeoPointInBBoxQuery to handle dateline crossing.

* LUCENE-6694: Add LithuanianAnalyzer and LithuanianStemmer.
  (Dainius Jocas via Robert Muir)

* LUCENE-6695: Added a new BlendedTermQuery to blend statistics across several
  terms.
  (Simon Willnauer, Adrien Grand)

* LUCENE-6706: Added a new PayloadScoreQuery that generalises the behaviour of
  PayloadTermQuery and PayloadNearQuery to all Span queries. (Alan Woodward)

* LUCENE-6697: Add an experimental range tree doc values format and
  queries, based on a 1D version of the spatial BKD tree, for a faster
  and smaller alternative to postings-based numeric and binary term
  filtering. Range trees can also handle values larger than 64 bits.
  (Adrien Grand, Mike McCandless)

* LUCENE-6647: Add GeoHash string utility APIs (Nick Knize via Mike
  McCandless).

* LUCENE-6710: GeoPointField now uses the full 64 bits (up from 62) to encode
  lat/lon (Nick Knize via Mike McCandless).

* LUCENE-6580: SpanNearQuery now allows defined-width gaps in its subqueries
  (Alan Woodward, Adrien Grand).

* LUCENE-6712: Use doc values to post-filter GeoPointField hits that
  fall in boundary cells, resulting in a smaller index, faster searches
  and less heap used for each query (Nick Knize via Mike McCandless).

API Changes

* LUCENE-6508: Simplify Lock API: there is now just
  Directory.obtainLock(), which returns a Lock that can be
  released (or fails with an exception). Add lock verification
  to IndexWriter. Improve exception messages when locking fails.
  (Uwe Schindler, Mike McCandless, Robert Muir)

* LUCENE-6371, LUCENE-6490: Payload collection from Spans is moved to a more generic
  SpanCollector framework. Spans no longer implements the .hasPayload() and
  .getPayload() methods, and instead exposes a collect() method that allows
  the collection of arbitrary postings information. SpanPayloadCheckQuery and
  SpanPayloadNearCheckQuery have moved from the .spans package to the .payloads
  package.
  (Alan Woodward, David Smiley, Paul Elschot, Robert Muir)

* LUCENE-6529: Removed an optimization in UninvertingReader that was causing
  incorrect results for Numeric fields using precisionStep
  (hossman, Robert Muir)

* LUCENE-6551: Add missing ConcurrentMergeScheduler.getAutoIOThrottle
  getter (Simon Willnauer, Mike McCandless)

* LUCENE-6552: Add MergePolicy.OneMerge.getMergeInfo and rename
  setInfo to setMergeInfo (Simon Willnauer, Mike McCandless)

* LUCENE-6525: Deprecate IndexWriterConfig's writeLockTimeout.
  (Robert Muir)

* LUCENE-6583: FilteredQuery is deprecated and will be removed in 6.0. It should
  be replaced with a BooleanQuery which handles the query as a MUST clause and
  the filter as a FILTER clause. (Adrien Grand)

* LUCENE-6553: The postings, spans and scorer APIs no longer take an acceptDocs
  parameter. Live docs are now always checked on top of these APIs.
  (Adrien Grand)

* LUCENE-6634: PKIndexSplitter now takes a Query instead of a Filter to decide
  how to split an index. (Adrien Grand)

* LUCENE-6643: GroupingSearch from lucene/grouping was changed to take a Query
  object to define groups instead of a Filter. (Adrien Grand)

* LUCENE-6554: ToParentBlockJoinFieldComparator was removed because of a bug
  with missing values that could not be fixed. ToParentBlockJoinSortField now
  works with string or numeric doc values selectors. Sorting on anything other
  than a string or numeric field would require implementing a custom selector.
  (Adrien Grand)

* LUCENE-6648: All lucene/facet APIs now take Query objects where they used to
  take Filter objects. (Adrien Grand)

* LUCENE-6640: Suggesters now take a BitsProducer object instead of a Filter
  object to reduce the scope of doc IDs that may be returned, emphasizing the
  fact that these objects need to support random access.
  (Adrien Grand)

* LUCENE-6646: Make EarlyTerminatingCollector take a Sort object directly
  instead of a SortingMergePolicy. (Christine Poerschke via Adrien Grand)

* LUCENE-6649: BitDocIdSetFilter and BitDocIdSetCachingWrapperFilter are now
  deprecated in favour of BitSetProducer and QueryBitSetProducer, which do not
  extend oal.search.Filter. (Adrien Grand)

* LUCENE-6607: Factor out geo3d into its own spatial3d module. (Karl
  Wright, Nick Knize, David Smiley, Mike McCandless)

* LUCENE-6531: PhraseQuery is now immutable and can be built using the
  PhraseQuery.Builder class. (Adrien Grand)

* LUCENE-6570: BooleanQuery is now immutable and can be built using the
  BooleanQuery.Builder class. (Adrien Grand)

* LUCENE-6702: NRTSuggester: Add a method to inject context values at index time
  in ContextSuggestField. Simplify ContextQuery logic for extracting contexts and
  add a dedicated method to consider all context values at query time.
  (Areek Zillur, Mike McCandless)

* LUCENE-6719: NumericUtils getMinInt, getMaxInt, getMinLong, getMaxLong now
  return null if there are no terms for the specified field; previously these
  methods returned primitive values and raised an undocumented NullPointerException
  if there were no terms for the field. (hossman, Timothy Potter)

Bug Fixes

* LUCENE-6500: ParallelCompositeReader did not always call
  closed listeners. This was fixed by LUCENE-6501.
  (Adrien Grand, Uwe Schindler)

* LUCENE-6520: Geo3D GeoPath.done() would throw an NPE if adjacent path
  segments were collinear.
  (Karl Wright via David Smiley)

* LUCENE-5805: QueryNodeImpl.removeFromParent was doing nothing in a
  costly manner (Christoph Kaser, Cao Manh Dat via Mike McCandless)

* LUCENE-6533: SlowCompositeReaderWrapper no longer caches its live docs
  instance since this can prevent future improvements like
  disk-backed live docs (Adrien Grand, Mike McCandless)

* LUCENE-6558: Highlighters now work with CustomScoreQuery (Cao Manh
  Dat via Mike McCandless)

* LUCENE-6560: BKDPointInBBoxQuery now handles "dateline crossing"
  correctly (Nick Knize, Mike McCandless)

* LUCENE-6564: Change PrintStreamInfoStream to use thread-safe Java 8
  ISO-8601 date formatting (in Lucene 5.x use Java 7 FileTime#toString
  as a workaround); fix output of tests to use the same format. (Uwe Schindler,
  Ramkumar Aiyengar)

* LUCENE-6593: Fixed ToChildBlockJoinQuery's scorer to not refuse to advance
  to a document that belongs to the parent space. (Adrien Grand)

* LUCENE-6591: Never write a negative vLong (Robert Muir, Ryan Ernst,
  Adrien Grand, Mike McCandless)

* LUCENE-6588: Fix how ToChildBlockJoinQuery deals with acceptDocs.
  (Christoph Kaser via Adrien Grand)

* LUCENE-6597: Geo3D's GeoCircle now supports a world-globe diameter.
  (Karl Wright via David Smiley)

* LUCENE-6608: Fix a potential resource leak in BigramDictionary.
  (Rishabh Patel via Uwe Schindler)

* LUCENE-6614: Improve partition detection in IOUtils#spins() so it
  works with NVMe drives. (Uwe Schindler, Mike McCandless)

* LUCENE-6586: Fix a typo in GermanStemmer that could cause a wrong value
  for substCount. (Christoph Kaser via Mike McCandless)

* LUCENE-6658: Fix IndexUpgrader to also upgrade indexes without any
  segments.
  (Trejkaz, Uwe Schindler)

* LUCENE-6677: QueryParserBase failed to enforce maxDeterminizedStates when
  creating a WildcardQuery (David Causse via Mike McCandless)

* LUCENE-6680: Preserve two suggestions that have the same key and weight but
  different payloads (Arcadius Ahouansou via Mike McCandless)

* LUCENE-6681: SortingMergePolicy must override MergePolicy.size(...).
  (Christine Poerschke via Adrien Grand)

* LUCENE-6682: StandardTokenizer performance bug: the scanner buffer was
  unnecessarily copied when maxTokenLength didn't change. Also, instead of
  silently capping the buffer size (and thus effectively the max token length)
  at 1M chars, setMaxTokenLength() now throws an exception when the given
  length is greater than 1M chars. (Piotr Idzikowski, Steve Rowe)

* LUCENE-6696: Fix FilterDirectoryReader.close() to never close the
  underlying reader several times. (Adrien Grand)

* LUCENE-6334: FastVectorHighlighter failed to highlight phrases across
  more than one value in a multi-valued field. (Chris Earle, Nik Everett
  via Mike McCandless)

* LUCENE-6704: GeoPointDistanceQuery was visiting too many term ranges,
  consuming too much heap for a large radius (Nick Knize via Mike McCandless)

* SOLR-5882: Fix ScoreMode.Min in ToParentBlockJoinQuery (Mikhail Khludnev)

* LUCENE-6718: JoinUtil.createJoinQuery failed to rewrite queries before
  creating a Weight. (Adrien Grand)

* LUCENE-6713: TooComplexToDeterminizeException claimed to be serializable
  but wasn't (Simon Willnauer, Mike McCandless)

* LUCENE-6723: Fix date parsing problems in Java 9 with date formats using
  English weekday/month names. (Uwe Schindler)

* LUCENE-6618: Properly set MMapDirectory.UNMAP_SUPPORTED when it is not allowed
  by security policy.
  (Robert Muir)

Changes in Runtime Behavior

* LUCENE-6501: The subreader structure in ParallelCompositeReader
  was flattened, because the previous implementation had too many
  hidden bugs regarding refcounting and close listeners.
  If you create a new ParallelCompositeReader, it will just take
  all leaves of the passed readers and form a flat structure of
  ParallelLeafReaders instead of trying to assemble the original
  structure of composite and leaf readers. (Adrien Grand,
  Uwe Schindler)

* LUCENE-6537: NearSpansOrdered no longer tries to minimize its
  Span matches. This means that the matching algorithm is entirely
  lazy. All spans returned by the previous implementation are still
  reported, but matching documents may now also return additional
  spans that were previously discarded in preference to shorter
  overlapping ones. (Alan Woodward, Adrien Grand, Paul Elschot)

* LUCENE-6538: Also include java.vm.version and java.runtime.version
  in per-segment diagnostics (Robert Muir, Mike McCandless)

* LUCENE-6569: Optimize MultiFunction.anyExists and allExists to eliminate
  excessive array creation in the common two-argument usage (Jacob Graves, hossman)

* LUCENE-2880: Span queries now score more consistently with regular queries.
  (Robert Muir, Adrien Grand)

* LUCENE-6601: FilteredQuery now always rewrites to a BooleanQuery which handles
  the query as a MUST clause and the filter as a FILTER clause.
  LEAP_FROG_QUERY_FIRST_STRATEGY and LEAP_FROG_FILTER_FIRST_STRATEGY no longer
  guarantee which iterator will be advanced first; this now depends on the
  respective costs of the iterators. QUERY_FIRST_FILTER_STRATEGY and
  RANDOM_ACCESS_FILTER_STRATEGY still consume the filter using its random-access
  API, but the returned bits may be checked for different documents than
  before.
  (Adrien Grand)

* LUCENE-6542: FSDirectory's ctor now works with security policies or file systems
  that restrict write access. (Trejkaz, hossman, Uwe Schindler)

* LUCENE-6651: The default implementation of AttributeImpl#reflectWith(AttributeReflector)
  now uses AccessController#doPrivileged() to do the reflection. Please consider
  implementing this method in all your custom attributes, because the method will be
  made abstract in Lucene 6. (Uwe Schindler)

* LUCENE-6639: LRUQueryCache and CachingWrapperQuery now consider a query as
  "used" when the first Scorer is pulled instead of when a Scorer is pulled on
  the first segment of an index. (Terry Smith, Adrien Grand)

* LUCENE-6579: IndexWriter now sacrifices (closes) itself to protect the index
  when an unexpected, tragic exception strikes while merging. (Robert
  Muir, Mike McCandless)

* LUCENE-6691: SortingMergePolicy.isSorted now considers FilterLeafReader instances.
  EarlyTerminatingSortingCollector.terminatedEarly accessor added.
  TestEarlyTerminatingSortingCollector.testTerminatedEarly test added.
  (Christine Poerschke)

* LUCENE-6609: Add getSortField impls to many subclasses of FieldCacheSource which return
  the most direct SortField implementation. In many trivial sort-by-ValueSource usages, this
  will result in less RAM usage and more precise sorting of extreme values, since values are
  no longer converted to double. (hossman)

Optimizations

* LUCENE-6548: Some optimizations for BlockTree's intersect with very
  finite automata (Mike McCandless)

* LUCENE-6585: Flatten conjunctions and conjunction approximations into
  parent conjunctions. For example, a sloppy phrase query of "foo bar"~5
  with a filter of "baz" will internally leapfrog foo, bar, baz as one
  conjunction.
  (Ryan Ernst, Robert Muir, Adrien Grand)

* LUCENE-6325: Reduce RAM usage of FieldInfos, and speed up lookup by
  number, by using an array instead of a TreeMap except in very sparse
  cases (Robert Muir, Mike McCandless)

* LUCENE-6617: Reduce heap usage for small FSTs (Mike McCandless)

* LUCENE-6616: IndexWriter now lists the files in the index directory
  only once on init, and IndexFileDeleter no longer suppresses
  FileNotFoundException and NoSuchFileException. IndexFileDeleter also
  now deletes segments_N files last, so that in the
  presence of a virus checker, the index is never left in a state
  where an expired segments_N references nonexistent files (Robert
  Muir, Mike McCandless)

* LUCENE-6645: Optimized the way we merge postings lists in multi-term queries
  and TermsQuery. This should especially help when there are lots of small
  postings lists. (Adrien Grand, Mike McCandless)

* LUCENE-6668: Optimized storage for sorted set and sorted numeric doc values
  in the case that there are few unique sets of values.
  (Adrien Grand, Robert Muir)

* LUCENE-6690: Sped up MultiTermsEnum.next() on high-cardinality fields.
  (Adrien Grand)

* LUCENE-6621: Removed two unused variables in
  analysis/stempel/src/java/org/egothor/stemmer/Compile.java
  (Rishabh Patel via Christine Poerschke)

Build

* LUCENE-6518: Don't report false thread leaks from IBM J9
  ClassCache Reaper in the test framework. (Dawid Weiss)

* LUCENE-6567: Simplify payload checking in SpanPayloadCheckQuery (Alan
  Woodward)

* LUCENE-6568: Make rat invocation depend on ivy configuration being set up
  (Ramkumar Aiyengar)

* LUCENE-6683: ivy-fail goal directs people to a non-existent page
  (Mike Drob via Steve Rowe)

* LUCENE-6693: Updated Groovy to 2.4.4, Pegdown to 1.5, Svnkit to 1.8.10.
  Also fixed some PermGen errors caused by these updates while running the
  full build: tasks are now installed from root's build.xml.
  (Uwe Schindler)

* LUCENE-6741: Fix jflex files to regenerate the Java files correctly.
  (Uwe Schindler)

Test Framework

* LUCENE-6637: Fix FSTTester to not violate file permissions
  on -Dtests.verbose=true. (Mesbah M. Alam, Uwe Schindler)

* LUCENE-6542: LuceneTestCase now has runWithRestrictedPermissions() to run
  an action with reduced permissions. This can be used to simulate special
  environments (e.g., read-only dirs). If tests are running without a security
  manager, an assume cancels test execution automatically. (Uwe Schindler)

* LUCENE-6652: Removed lots of useless Byte(s)TermAttributes all over the test
  infrastructure. (Uwe Schindler)

* LUCENE-6563: Improve MockFileSystemTestCase.testURI to check if a path
  can be encoded according to local filesystem requirements. If it cannot,
  test execution is stopped. (Christine Poerschke via Uwe Schindler)

Changes in Backwards Compatibility Policy

* LUCENE-6553: The iterator returned by the LeafReader.postings method now
  always includes deleted docs, so you have to check for deleted documents on
  top of the iterator. (Adrien Grand)

* LUCENE-6633: DuplicateFilter has been deprecated and will be removed in 6.0.
  DiversifiedTopDocsCollector can be used instead with a maximum number of hits
  per key equal to 1. (Adrien Grand)

* LUCENE-6653: The workflow for consuming the TermToBytesRefAttribute was changed:
  getBytesRef() now does all the work and is called on each token; fillBytesRef()
  was removed. The implementation is free to reuse the internal BytesRef
  or return a new one on each call. (Uwe Schindler)

* LUCENE-6682: StandardTokenizer.setMaxTokenLength() now throws an exception if
  a length greater than 1M chars is given.
  Previously the effective max token
  length (the scanner's buffer) was capped at 1M chars, but getMaxTokenLength()
  incorrectly returned the previously requested length, even when it exceeded 1M.
  (Piotr Idzikowski, Steve Rowe)


======================= Lucene 5.2.1 =======================

Bug Fixes

* LUCENE-6482: Fix class loading deadlock relating to Codec initialization,
  default codec and SPI discovery. (Shikhar Bhushan, Uwe Schindler)

* LUCENE-6523: NRT readers now reflect a new commit even if there is
  no change to the commit user data (Mike McCandless)

* LUCENE-6527: Queries now get a dummy Similarity when scores are not needed
  in order not to load unnecessary information like norms. (Adrien Grand)

* LUCENE-6559: TimeLimitingCollector now also checks for timeout when a new
  leaf reader is pulled, i.e., when moving from one segment to another, even
  without collecting a hit. (Simon Willnauer)

======================= Lucene 5.2.0 =======================

New Features

* LUCENE-6308, LUCENE-6385, LUCENE-6391: Span queries now share document
  conjunction/intersection code with boolean queries, and use two-phased
  iterators for faster intersection by avoiding loading positions in certain
  cases. (Paul Elschot, Terry Smith, Robert Muir via Mike McCandless)

* LUCENE-6393: Add two-phase support to SpanPositionCheckQuery
  and its subclasses: SpanPositionRangeQuery, SpanPayloadCheckQuery,
  SpanNearPayloadCheckQuery, SpanFirstQuery. (Paul Elschot, Robert Muir)

* LUCENE-6394: Add two-phase support to SpanNotQuery and refactor
  FilterSpans to just have an accept(Spans candidate) method for
  subclasses. (Robert Muir)

* LUCENE-6373: SpanOrQuery shares disjunction logic with boolean
  queries, and supports two-phased iterators to avoid loading
  positions when possible.
  (Paul Elschot via Robert Muir)

* LUCENE-6352, LUCENE-6472: Added a new query-time join to the join module
  that uses global ordinals, which is faster for subsequent joins between
  reopens. (Martijn van Groningen, Adrien Grand)

* LUCENE-5879: Added experimental auto-prefix terms to the BlockTree terms
  dictionary, exposed as AutoPrefixPostingsFormat (Adrien Grand,
  Uwe Schindler, Robert Muir, Mike McCandless)

* LUCENE-5579: New CompositeSpatialStrategy combines the speed of RPT with
  the accuracy of SDV. Includes an optimized Intersect predicate to avoid many
  geometry checks. Uses TwoPhaseIterator. (David Smiley)

* LUCENE-5989: Allow passing BytesRef to StringField to make it easier
  to index arbitrary binary tokens, and change the experimental
  StoredFieldVisitor.stringField API to take UTF-8 byte[] instead of
  String (Mike McCandless)

* LUCENE-6389: Added ScoreMode.Min that aggregates the lowest child score
  to the parent hit. (Martijn van Groningen, Adrien Grand)

* LUCENE-6423: New LimitTokenOffsetFilter that limits tokens to those before
  a configured maximum start offset. (David Smiley)

* LUCENE-6422: New spatial PackedQuadPrefixTree, a generally more efficient
  choice than QuadPrefixTree, especially for high-precision shapes.
  When used, you should typically disable RPT's pruneLeafyBranches option.
  (Nick Knize, David Smiley)

* LUCENE-6451: Expressions now support binding keys that look like
  zero-arg functions (Jack Conradson via Ryan Ernst)

* LUCENE-6083: Add SpanWithinQuery and SpanContainingQuery that return
  spans inside of / containing other spans. (Paul Elschot via Robert Muir)

* LUCENE-6454: Added distinction between member variable and method in
  expression helper VariableContext
  (Jack Conradson via Ryan Ernst)

* LUCENE-6196: New Spatial "Geo3d" API with partial Spatial4j integration.
  It is a set of shapes implemented using 3D planar geometry for calculating
  spatial relations on the surface of a sphere. Shapes include Point, BBox,
  Circle, Path (buffered line string), and Polygon.
  (Karl Wright via David Smiley)

* LUCENE-6464: Add a new expert lookup method to
  AnalyzingInfixSuggester to accept an arbitrary BooleanQuery to
  express how contexts should be filtered. (Arcadius Ahouansou via
  Mike McCandless)

Optimizations

* LUCENE-6379: IndexWriter.deleteDocuments(Query...) now detects if
  one of the queries is MatchAllDocsQuery and just invokes the much
  faster IndexWriter.deleteAll in that case (Robert Muir, Adrien
  Grand, Mike McCandless)

* LUCENE-6388: Optimize SpanNearQuery when payloads are not present.
  (Robert Muir)

* LUCENE-6421: Defer reading of positions in MultiPhraseQuery until
  they are needed. (Robert Muir)

* LUCENE-6392: Highlighter: reduce memory of tokens in
  TokenStreamFromTermVector, and add a maxStartOffset limit. (David Smiley)

* LUCENE-6456: Queries that generate doc id sets that are too large for the
  query cache are no longer cached, instead of evicting everything from the
  cache. (Adrien Grand)

* LUCENE-6455: Require a minimum index size to enable query caching in order
  not to cache, e.g., on MemoryIndex. (Adrien Grand)

* LUCENE-6330: BooleanScorer (used for top-level disjunctions) no longer decodes
  norms when they are not needed. (Adrien Grand)

* LUCENE-6350: TermsQuery is now compressed with PrefixCodedTerms.
  (Robert Muir, Mike McCandless, Adrien Grand)

* LUCENE-6458: Multi-term queries matching few terms per segment now execute
  like a disjunction. (Adrien Grand)

* LUCENE-6360: TermsQuery rewrites to a disjunction when there are 16 matching
  terms or fewer.
  (Adrien Grand)

Bug Fixes

* LUCENE-329: Fix FuzzyQuery defaults so that exact matches rank highest.
  (Mark Harwood, Adrien Grand)

* LUCENE-6378: Fix all RuntimeExceptions to throw the underlying root cause.
  (Varun Thacker, Adrien Grand, Mike McCandless)

* LUCENE-6415: TermsQuery.extractTerms is a no-op (used to throw an
  UnsupportedOperationException). (Adrien Grand)

* LUCENE-6416: BooleanQuery.extractTerms now only extracts terms from scoring
  clauses. (Adrien Grand)

* LUCENE-6409: Fixed integer overflow in LongBitSet.ensureCapacity.
  (Luc Vanlerberghe via Adrien Grand)

* LUCENE-6424, LUCENE-6430: Fix many bugs with mockfs filesystems in the
  test-framework: always consistently wrap Path, fix buggy behavior for
  globs, implement equals/hashcode for filtered Paths, etc.
  (Ryan Ernst, Simon Willnauer, Robert Muir)

* LUCENE-6426: Fix FieldType's copy constructor to also copy over the numeric
  precision step. (Adrien Grand)

* LUCENE-6345: Null-check terms/fields in Lucene queries (Lee
  Hinman via Mike McCandless)

* LUCENE-6400: SolrSynonymParser should preserve the original token instead
  of replacing it with a synonym, when expand=true and there is no
  explicit mapping (Ian Ribas, Robert Muir, Mike McCandless)

* LUCENE-6449: Don't throw NullPointerException in PostingsHighlighter if some
  segments are missing the field being highlighted (Roman
  Khmelichek via Mike McCandless)

* LUCENE-6427: Added assertion about the presence of ghost bits in
  (Fixed|Long)BitSet. (Luc Vanlerberghe via Adrien Grand)

* LUCENE-6468: Fixed NPE with empty Kuromoji user dictionary.
  (Jun Ohtani via Christian Moen)

* LUCENE-6483: Ensure core closed listeners are called on the same cache key as
  the reader which has been used to register the listener.
  (Adrien Grand)

* LUCENE-6486: DocumentDictionary iterator no longer skips
  documents with no payloads and now returns an empty BytesRef instead
  (Marius Grama via Michael McCandless)

* LUCENE-6505: NRT readers now reflect segments_N filename and commit
  user data from previous commits (Mike McCandless)

* LUCENE-6507: Don't let NativeFSLock.close() release other locks
  (Simon Willnauer, Robert Muir, Uwe Schindler, Mike McCandless)

API Changes

* LUCENE-6377: SearcherFactory#newSearcher now accepts the previous reader
  to simplify warming logic when opening new searchers. (Simon Willnauer)

* LUCENE-6410: Removed unused "reuse" parameter to
  Terms.iterator. (Robert Muir, Mike McCandless)

* LUCENE-6425: Replaced Query.extractTerms with Weight.extractTerms.
  (Adrien Grand)

* LUCENE-6446: Simplified Explanation API. (Adrien Grand)

* LUCENE-6445: Two new methods in Highlighter's TokenSources; the existing
  methods are now marked deprecated. (David Smiley)

* LUCENE-6484: Removed EliasFanoDocIdSet, which was unused.
  (Paul Elschot via Adrien Grand)

* LUCENE-6466: Moved SpanQuery.getSpans() and .extractTerms() to SpanWeight
  (Alan Woodward, Robert Muir)

* LUCENE-6497: Allow subclasses of FieldType to check frozen state
  (Ryan Ernst)

Other

* LUCENE-6413: Test runner should report the number of suites completed/
  remaining. (Dawid Weiss)

* LUCENE-5439: Add 'ant jacoco' build target. (Robert Muir)

* LUCENE-6315: Simplify the private iterator Lucene uses internally
  when resolving deleted terms to matched docids. (Robert Muir, Adrien
  Grand, Mike McCandless)

* LUCENE-6399: Benchmark module's QueryMaker.resetInputs should call setConfig
  so queries can react to property changes in new rounds.
  (David Smiley)

* LUCENE-6382: Lucene now enforces that positions never exceed the
  maximum value IndexWriter.MAX_POSITION. (Robert Muir, Mike McCandless)

* LUCENE-6372: Simplified and improved equals/hashcode of span queries.
  (Paul Elschot via Adrien Grand)

Build

* LUCENE-6420: Update forbiddenapis to v1.8 (Uwe Schindler)

Test Framework

* LUCENE-6419: Added two-phase iteration assertions to AssertingQuery.
  (Adrien Grand)

* LUCENE-6437: Randomly set CPU core count and spins, derived from the
  test's master seed, used by ConcurrentMergeScheduler to set dynamic
  defaults, for better test randomization and to help tests reproduce
  (Robert Muir, Mike McCandless)

======================= Lucene 5.1.0 =======================

New Features

* LUCENE-6066: Added DiversifiedTopDocsCollector to misc for collecting no more
  than a given number of results under a choice of key. Introduces a new remove
  method to core's PriorityQueue. (Mark Harwood)

* LUCENE-6191: New spatial 2D heatmap faceting for PrefixTreeStrategy. (David Smiley)

* LUCENE-6227: Added BooleanClause.Occur.FILTER to filter documents without
  participating in scoring (unlike MUST). (Adrien Grand)

* LUCENE-6294: Added oal.search.CollectorManager to allow for parallelization
  of the document collection process on IndexSearcher. (Adrien Grand)

* LUCENE-6303: Added filter caching baked into IndexSearcher, disabled by
  default. (Adrien Grand)

* LUCENE-6304: Added a new MatchNoDocsQuery that matches no documents.
  (Lee Hinman via Adrien Grand)

* LUCENE-6341: Add a -fast option to CheckIndex.
  (Robert Muir)

* LUCENE-6355: IndexWriter's infoStream now also logs time to write FieldInfos
  during merge (Lee Hinman via Mike McCandless)

* LUCENE-6339: Added near-real-time document suggester via a custom postings format
  (Areek Zillur, Mike McCandless, Simon Willnauer)

Bug Fixes

* LUCENE-6368: FST.save can truncate output (BufferedOutputStream may be closed
  after the underlying stream). (Ippei Matsushima via Dawid Weiss)

* LUCENE-6249: StandardQueryParser didn't support pure negative clauses.
  (Dawid Weiss)

* LUCENE-6190: Spatial pointsOnly flag on PrefixTreeStrategy shouldn't switch all predicates to
  Intersects. (David Smiley)

* LUCENE-6242: RAM usage estimation was incorrect for SparseFixedBitSet when
  object alignment was different from 8. (Uwe Schindler, Adrien Grand)

* LUCENE-6293: Fixed a TimSorter bug. (Adrien Grand)

* LUCENE-6001: DrillSideways hits NullPointerException for certain
  BooleanQuery searches. (Dragan Jotannovic, jane chang via Mike
  McCandless)

* LUCENE-6311: Fix NIOFSDirectory and SimpleFSDirectory so that the
  toString method of IndexInputs reveals when they come from a compound
  file. (Robert Muir, Mike McCandless)

* LUCENE-6381: Add defensive wait time limit in
  DocumentsWriterStallControl to prevent hangs during indexing if a
  notify/notifyAll call is missed somewhere (Mike McCandless)

* LUCENE-6386: Correct IndexWriter.forceMerge documentation to state
  that up to 3X (X = current index size) spare disk space may be needed
  to complete forceMerge(1). (Robert Muir, Shai Erera, Mike McCandless)

* LUCENE-6395: Seeking by term ordinal was failing to set the term's
  bytes in MemoryIndex (Mike McCandless)

* LUCENE-6429: Removed the TermQuery(Term,int) constructor which could lead to
  inconsistent term statistics.
  (Adrien Grand, Robert Muir)

Optimizations

* LUCENE-6183, LUCENE-5647: Avoid recompressing stored fields
  and term vectors when merging segments without deletions.
  Lucene50Codec's BEST_COMPRESSION mode uses a higher deflate
  level for more compact storage. (Robert Muir)

* LUCENE-6184: Make BooleanScorer only score windows that contain
  matches. (Adrien Grand)

* LUCENE-6161: Speed up resolving of deleted terms to docIDs by doing
  a combined merge sort between deleted terms and segment terms
  instead of a separate merge sort for each segment. In delete-heavy
  use cases this can be a sizable speedup. (Mike McCandless)

* LUCENE-6201: BooleanScorer can now handle minShouldMatch values that
  are greater than one, and is used when queries produce dense result sets.
  (Adrien Grand)

* LUCENE-6218: Don't decode frequencies or match all positions when scoring
  is not needed. (Robert Muir)

* LUCENE-6233: Speed up CheckIndex when the index has term vectors
  (Robert Muir, Mike McCandless)

* LUCENE-6198: Added the TwoPhaseIterator API, exposed on scorers. For now it
  is only used on phrase queries and conjunctions, in order to check
  positions lazily when the phrase query is in a conjunction with other queries.
  (Robert Muir, Adrien Grand, David Smiley)

* LUCENE-6244, LUCENE-6251: All boolean queries except those that have a
  minShouldMatch > 1 now either propagate or take advantage of the two-phase
  iteration capabilities added in LUCENE-6198. (Adrien Grand, Robert Muir)

* LUCENE-6241: FSDirectory.listAll() no longer filters out subdirectories,
  for faster performance. Subdirectories don't matter to Lucene. If you need to
  filter out non-index files with some custom usage, you may want to look at
  the IndexFileNames class.
  (Robert Muir)

* LUCENE-6262: ConstantScoreQuery does not wrap the inner weight anymore when
  scores are not required. (Adrien Grand)

* LUCENE-6263: MultiCollector automatically caches scores when several
  collectors need them. (Adrien Grand)

* LUCENE-6275: SloppyPhraseScorer now uses the same logic as ConjunctionScorer
  in order to advance doc IDs, which takes advantage of the cost() API.
  (Adrien Grand)

* LUCENE-6290: QueryWrapperFilter propagates approximations and FilteredQuery
  rewrites to a BooleanQuery when the filter is a QueryWrapperFilter in order
  to leverage approximations. (Adrien Grand)

* LUCENE-6318: Reduce RAM usage of FieldInfos when there are many fields.
  (Mike McCandless, Robert Muir)

* LUCENE-6320: Speed up CheckIndex. (Robert Muir)

* LUCENE-4942: Optimized the encoding of PrefixTreeStrategy indexes for
  non-point data: 33% smaller index, 68% faster indexing, and 44% faster
  searching. YMMV. (David Smiley)

API Changes

* LUCENE-6204, LUCENE-6208: Simplify CompoundFormat: remove files()
  and remove the files parameter to write(). (Robert Muir)

* LUCENE-6217: Add IndexWriter.isOpen and getTragicException. (Simon
  Willnauer, Mike McCandless)

* LUCENE-6218, LUCENE-6220: Add Collector.needsScores() and needsScores
  parameter to Query.createWeight(). (Robert Muir, Adrien Grand)

* LUCENE-4524, LUCENE-6246, LUCENE-6256, LUCENE-6271: Merge DocsEnum and DocsAndPositionsEnum
  into a single PostingsEnum iterator. TermsEnum.docs() and TermsEnum.docsAndPositions()
  are replaced by TermsEnum.postings().
  (Alan Woodward, Simon Willnauer, Robert Muir, Ryan Ernst)

* LUCENE-6222: Removed TermFilter, use a QueryWrapperFilter(TermQuery)
  instead. This will be as efficient now that queries can opt out of
  scoring.
  (Adrien Grand)

* LUCENE-6269: Removed BooleanFilter, use a QueryWrapperFilter(BooleanQuery)
  instead. (Adrien Grand)

* LUCENE-6270: Replaced TermsFilter with TermsQuery, use a
  QueryWrapperFilter(TermsQuery) instead. (Adrien Grand)

* LUCENE-6223: Move BooleanQuery.BooleanWeight to BooleanWeight.
  (Robert Muir)

* LUCENE-1518: Make Filter extend Query and return 0 as score.
  (Uwe Schindler, Adrien Grand)

* LUCENE-6245: Force Filter subclasses to implement toString API from Query.
  (Ryan Ernst)

* LUCENE-6268: Replace FieldValueFilter and DocValuesRangeFilter with equivalent
  queries that support approximations. (Adrien Grand)

* LUCENE-6289: Replace DocValuesRangeFilter with DocValuesRangeQuery which
  supports approximations. (Adrien Grand)

* LUCENE-6266: Remove unnecessary Directory params from SegmentInfo.toString,
  SegmentInfos.files/toString, and SegmentCommitInfo.toString. (Robert Muir)

* LUCENE-6272: Scorer extends DocIdSetIterator rather than DocsEnum (Alan
  Woodward)

* LUCENE-6281: Removed support for slow collations from lucene/sandbox. Better
  performance can be achieved through CollationKeyAnalyzer or
  ICUCollationKeyAnalyzer. (Adrien Grand)

* LUCENE-6286: Removed IndexSearcher methods that take a Filter object.
  A BooleanQuery with a filter clause must be used instead. (Adrien Grand)

* LUCENE-6300: PrefixFilter, TermRangeFilter and NumericRangeFilter have been
  removed. Use PrefixQuery, TermRangeQuery and NumericRangeQuery instead.
  (Adrien Grand)

* LUCENE-6303: Replaced FilterCache with QueryCache and CachingWrapperFilter
  with CachingWrapperQuery. (Adrien Grand)

* LUCENE-6317: Deprecate DataOutput.writeStringSet and writeStringStringMap.
  Use writeSetOfStrings/Maps instead.
  (Mike McCandless, Robert Muir)

* LUCENE-6307: Rename SegmentInfo.getDocCount -> .maxDoc,
  SegmentInfos.totalDocCount -> .totalMaxDoc, MergeInfo.totalDocCount
  -> .totalMaxDoc and MergePolicy.OneMerge.totalDocCount ->
  .totalMaxDoc (Adrien Grand, Robert Muir, Mike McCandless)

* LUCENE-6367: PrefixQuery now subclasses AutomatonQuery, removing the
  specialized PrefixTermsEnum. (Robert Muir, Mike McCandless)

Other

* LUCENE-6248: Remove unused odd constants from StandardSyntaxParser.jj
  (Dawid Weiss)

* LUCENE-6193: Collapse identical catch branches in try-catch statements.
  (shalin)

* LUCENE-6239: Removed RAMUsageEstimator's sun.misc.Unsafe calls.
  (Robert Muir, Dawid Weiss, Uwe Schindler)

* LUCENE-6292: Seed StringHelper better. (Robert Muir)

* LUCENE-6333: Refactored queries to delegate their equals and hashcode
  impls to the superclass. (Lee Hinman via Adrien Grand)

* LUCENE-6343: DefaultSimilarity javadocs had the wrong float value to
  demonstrate the precision of encoded norms (András Péteri via Mike McCandless)

Changes in Runtime Behavior

* LUCENE-6255: PhraseQuery now ignores leading holes and requires that
  positions are positive and added in order. (Adrien Grand)

* LUCENE-6298: SimpleQueryParser returns an empty query rather than
  null if, e.g., all the terms were stopwords. (Lee Hinman via Robert Muir)

======================= Lucene 5.0.0 =======================

New Features

* LUCENE-5945: All file handling converted to NIO.2 apis. (Robert Muir)

* LUCENE-5946: SimpleFSDirectory now uses Files.newByteChannel, for
  portability with custom FileSystemProviders. If you want the old
  non-interruptible behavior of RandomAccessFile, use RAFDirectory
  in the misc/ module.
  (Uwe Schindler, Robert Muir)

* SOLR-3359: Added analyzer attribute/property to SynonymFilterFactory.
  (Ryo Onodera via Koji Sekiguchi)

* LUCENE-5648: Index and search date ranges, particularly multi-valued ones. It's
  implemented in the spatial module as DateRangePrefixTree used with
  NumberRangePrefixTreeStrategy. (David Smiley)

* LUCENE-5895: Lucene now stores a unique id per-segment and per-commit to aid
  in accurate replication of index files (Robert Muir, Mike McCandless)

* LUCENE-5889: Add a commit method to AnalyzingInfixSuggester, and allow just using .add
  to build up the suggester. (Varun Thacker via Mike McCandless)

* LUCENE-5123: Add a "pull" option to the postings writing API, so
  that a PostingsFormat now receives a Fields instance and it is
  responsible for iterating through all fields, terms, documents and
  positions. (Robert Muir, Mike McCandless)

* LUCENE-5268: Full cutover of all postings formats to the "pull"
  FieldsConsumer API, removing PushFieldsConsumer. Added new
  PushPostingsWriterBase for single-pass push of docs/positions to the
  postings format. (Mike McCandless)

* LUCENE-5906: Use Files.delete everywhere instead of File.delete, so that
  when things go wrong, you get a real exception message explaining why.
  (Uwe Schindler, Robert Muir)

* LUCENE-5933: Added FilterSpans for easier wrapping of a Spans instance. (Shai Erera)

* LUCENE-5925: Remove fallback logic from opening commits, instead use
  Directory.renameFile so that in-progress commits are never visible.
  (Robert Muir)

* LUCENE-5820: SuggestStopFilter should have a factory.
  (Varun Thacker via Steve Rowe)

* LUCENE-5949: Add Accountable.getChildResources(). (Robert Muir)

* SOLR-5986: Added ExitableDirectoryReader that extends FilterDirectoryReader and enables
  exiting requests that take too long to enumerate over terms.
  (Anshum Gupta, Steve Rowe, Robert Muir)

* LUCENE-5911: Add MemoryIndex.freeze() to allow thread-safe searching over a
  MemoryIndex. (Alan Woodward, David Smiley, Robert Muir)

* LUCENE-5969: Lucene 5.0 has a new index format with mismatched file detection,
  improved exception handling, and indirect norms encoding for sparse fields.
  (Mike McCandless, Ryan Ernst, Robert Muir)

* LUCENE-6053: Add Serbian analyzer. (Nikola Smolenski via Robert Muir, Mike McCandless)

* LUCENE-4400: Add support for new NYSIIS Apache commons phonetic
  codec (Thomas Neidhart via Mike McCandless)

* LUCENE-6059: Add Daitch-Mokotoff Soundex phonetic Apache commons
  phonetic codec, and upgrade to Apache commons codec 1.10. (Thomas
  Neidhart via Mike McCandless)

* LUCENE-6058: With the upgrade to Apache commons codec 1.10, the
  experimental BeiderMorseFilter has changed its behavior, so any
  index using it will need to be rebuilt. (Thomas
  Neidhart via Mike McCandless)

* LUCENE-6050: Accept MUST and MUST_NOT (in addition to SHOULD) for
  each context passed to Analyzing/BlendedInfixSuggester (Arcadius
  Ahouansou, jane chang via Mike McCandless)

* LUCENE-5929: Also extract terms to highlight from block join
  queries. (Julie Tibshirani via Mike McCandless)

* LUCENE-6063: Allow overriding whether/how ConcurrentMergeScheduler
  stalls incoming threads when merges are falling behind (Mike
  McCandless)

* LUCENE-5833: DocumentDictionary now enumerates each value separately
  in a multi-valued field (not just the first value), so you can build
  suggesters from multi-valued fields. (Varun Thacker via Mike
  McCandless)

* LUCENE-6077: Added a filter cache. (Adrien Grand, Robert Muir)

* LUCENE-6088: TermsFilter implements Accountable.
  (Adrien Grand)

* LUCENE-6034: The default highlighter, when used with QueryScorer, will highlight payload-sensitive
  queries provided that term vectors with positions, offsets, and payloads are present. This is the
  only highlighter that can highlight such queries accurately. (David Smiley)

* LUCENE-5914: Add an option to Lucene50Codec to support either BEST_SPEED
  or BEST_COMPRESSION for stored fields. (Adrien Grand, Robert Muir)

* LUCENE-6119: Add auto-IO-throttling to ConcurrentMergeScheduler, to
  rate limit IO writes for each merge depending on incoming merge
  rate. (Mike McCandless)

* LUCENE-6155: Add payload support to MemoryIndex. The default highlighter's
  QueryScorer and WeightedSpanTermExtractor now have setUsePayloads(boolean).
  (David Smiley)

* LUCENE-6166: Deletions (alone) can now trigger new merges. (Mike McCandless)

* LUCENE-6177: Add CustomAnalyzer, which allows configuring analyzers
  as you do in Solr's index schema. This class has a builder API to configure
  Tokenizers, TokenFilters, and CharFilters based on their SPI names
  and parameters as documented by the corresponding factories.
  (Uwe Schindler)

Optimizations

* LUCENE-5960: Use a more efficient bitset, not a Set<Integer>, to
  track visited states. (Markus Heiden via Mike McCandless)

* LUCENE-5959: Don't allocate excess memory when building automaton in
  finish. (Markus Heiden via Mike McCandless)

* LUCENE-5963: Reduce memory allocations in
  AnalyzingSuggester. (Markus Heiden via Mike McCandless)

* LUCENE-5938: MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE is now faster on
  queries that match few documents by using a sparse bit set implementation.
  (Adrien Grand)

* LUCENE-5969: Refactor merging to be more efficient: checksum calculation is
  per-segment/per-producer, and norms and doc values merging no longer cause
  RAM spikes for latent fields. (Mike McCandless, Robert Muir)

* LUCENE-5983: CachingWrapperFilter now uses a new DocIdSet implementation
  called RoaringDocIdSet instead of WAH8DocIdSet. (Adrien Grand)

* LUCENE-6022: DocValuesDocIdSet checks live docs before doc values.
  (Adrien Grand)

* LUCENE-6030: Add norms patched compression for a small number of common values
  (Ryan Ernst)

* LUCENE-6040: Speed up EliasFanoDocIdSet through broadword bit selection.
  (Paul Elschot)

* LUCENE-6033: CachingTokenFilter now uses ArrayList not LinkedList, and has a new
  isCached() method. (David Smiley)

* LUCENE-6031: TokenSources (in the default highlighter) converts term vectors into a
  TokenStream much faster, in linear time (not N*log(N)), using less memory, and with reset()
  implemented. Only one of offsets or positions is required of the term vector.
  (David Smiley)

* LUCENE-6089, LUCENE-6090: Tune CompressionMode.HIGH_COMPRESSION for
  better compression and less CPU usage. (Adrien Grand, Robert Muir)

* LUCENE-6034: QueryScorer, used by the default highlighter, needn't re-index the provided
  TokenStream with MemoryIndex when it comes from TokenSources (term vectors) with offsets and
  positions. (David Smiley)

* LUCENE-5951: ConcurrentMergeScheduler detects whether the index is on SSD or not
  and does a better job defaulting its settings. This only works on Linux for now;
  other OSes will continue to use the previous defaults (tuned for spinning disks).
  (Robert Muir, Uwe Schindler, hossman, Mike McCandless)

* LUCENE-6131: Optimize SortingMergePolicy. (Robert Muir)

* LUCENE-6133: Improve default StoredFieldsWriter.merge() to be more efficient.
  (Robert Muir)

* LUCENE-6145: Make EarlyTerminatingSortingCollector able to early-terminate
  when the sort order is a prefix of the index-time order. (Adrien Grand)

* LUCENE-6178: Score boolean queries containing MUST_NOT clauses with BooleanScorer2,
  to use skip list data and avoid unnecessary scoring. (Adrien Grand, Robert Muir)

API Changes

* LUCENE-5900: Deprecated more constructors taking Version in *InfixSuggester and
  ICUCollationKeyAnalyzer, and removed TEST_VERSION_CURRENT from the test framework.
  (Ryan Ernst)

* LUCENE-4535: oal.util.FilterIterator is now an internal API.
  (Adrien Grand)

* LUCENE-4924: DocIdSetIterator.docID() must now return -1 when the iterator is
  not positioned. This change affects all classes that inherit from
  DocIdSetIterator, including DocsEnum and DocsAndPositionsEnum. (Adrien Grand)

* LUCENE-5127: Reduce RAM usage of FixedGapTermsIndex. Remove
  IndexWriterConfig.setTermIndexInterval, IndexWriterConfig.setReaderTermsIndexDivisor,
  and termsIndexDivisor from StandardDirectoryReader. These options have been no-ops
  with the default codec since Lucene 4.0. If you want to configure the interval for
  this term index, pass it directly in your codec, where it can also be configured
  per-field. (Robert Muir)

* LUCENE-5388: Remove Reader from Tokenizer's constructor and from
  Analyzer's createComponents. TokenStreams now always get their input
  via setReader.
  (Benson Margulies via Robert Muir - pull request #16)

* LUCENE-5527: The Collector API has been refactored to use a dedicated Collector
  per leaf. (Shikhar Bhushan, Adrien Grand)

* LUCENE-5702: The FieldComparator API has been refactored to a per-leaf API, just
  like Collectors. (Adrien Grand)

* LUCENE-4246: IndexWriter.close now always closes, even if it throws
  an exception.
  The new IndexWriterConfig.setCommitOnClose (default true) determines
  whether close() should commit before closing.

* LUCENE-5608, LUCENE-5565: Refactor SpatialPrefixTree/Cell API. It no longer uses Strings
  as tokens, and now iterates cells on-demand during indexing instead of
  building a collection. RPT now has more setters. (David Smiley)

* LUCENE-5666: Change uninverted access (sorting, faceting, grouping, etc.)
  to use the DocValues API instead of FieldCache. For FieldCache functionality,
  use UninvertingReader in lucene/misc (or implement your own FilterReader).
  UninvertingReader is more efficient: it supports multi-valued numeric fields,
  detects when a multi-valued field is single-valued, and reuses caches
  of compatible types (e.g. SORTED also supports BINARY and SORTED_SET access
  without insanity). "Insanity" is no longer possible unless you explicitly want it.
  Rename FieldCache* and DocTermOrds* classes in the search package to DocValues*.
  Move SortedSetSortField to core and add SortedSetFieldSource to queries/, which
  takes the same selectors. Add helper methods to DocValues.java that are better
  suited for search code (never return null, etc.). (Mike McCandless, Robert Muir)

* LUCENE-5871: Remove Version from IndexWriterConfig. Use
  IndexWriterConfig.setCommitOnClose to change the behavior of IndexWriter.close().
  The default has been changed to match that of 4.x.
  (Ryan Ernst, Mike McCandless)

* LUCENE-5965: CorruptIndexException requires a String or DataInput resource.
  (Robert Muir)

* LUCENE-5972: IndexFormatTooOldException and IndexFormatTooNewException now
  extend from IOException.
  (Ryan Ernst, Robert Muir)

* LUCENE-5569: *AtomicReader/AtomicReaderContext have been renamed to *LeafReader/LeafReaderContext.
  (Ryan Ernst)

* LUCENE-5938: Removed MultiTermQuery.ConstantScoreAutoRewrite as
  MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE is usually better. (Adrien Grand)

* LUCENE-5924: Rename CheckIndex -fix option to -exorcise. This option does not
  actually fix the index; it just drops data. (Robert Muir)

* LUCENE-5969: Add Codec.compoundFormat, which handles the encoding of compound
  files. Add getMergeInstance() to codec producer APIs, which can be overridden
  to return an instance optimized for merging instead of searching. Add
  Terms.getStats() which can return additional codec-specific statistics about a field.
  Change instance method SegmentInfos.read() to two static methods: SegmentInfos.readCommit()
  and SegmentInfos.readLatestCommit().
  (Mike McCandless, Robert Muir)

* LUCENE-5992: Remove FieldInfos from SegmentInfosWriter.write API. (Robert Muir, Mike McCandless)

* LUCENE-5998: Simplify Field/SegmentInfoFormat to read+write methods.
  (Robert Muir)

* LUCENE-6000: Removed StandardTokenizerInterface. Tokenizers now use
  their jflex impl directly.
  (Ryan Ernst)

* LUCENE-6006: Removed FieldInfo.normType since it's redundant: it
  will be DocValuesType.NUMERIC if the field is indexed and does not omit
  norms, else null. (Robert Muir, Mike McCandless)

* LUCENE-6013: Removed indexed boolean from IndexableFieldType and
  FieldInfo, since it's redundant with IndexOptions != null. (Robert
  Muir, Mike McCandless)

* LUCENE-6021: FixedBitSet.nextSetBit now returns DocIdSetIterator.NO_MORE_DOCS
  instead of -1 when there are no more set bits. (Adrien Grand)

* LUCENE-5953: Directory and LockFactory APIs were restructured: locking is
  now under the responsibility of the Directory implementation. LockFactory is
  only used by subclasses of BaseDirectory to delegate locking to an impl
  class.
  LockFactories are now singletons and are responsible for creating a Lock
  instance based on a Directory implementation passed to the factory method.
  See MIGRATE.txt for more details. (Uwe Schindler, Robert Muir)

* LUCENE-6062: Throw an exception instead of silently doing nothing if you try to
  sort/group/etc. on a misconfigured field (e.g. no docvalues, no UninvertingReader, etc.).
  (Robert Muir)

* LUCENE-6068: LeafReader.fields() never returns null. (Robert Muir)

* LUCENE-6082: Remove abort() from codec APIs. (Robert Muir)

* LUCENE-6084: IndexOutput's constructor now requires a String
  resourceDescription so its toString is sane (Robert Muir, Mike
  McCandless)

* LUCENE-6087: Allow passing custom DirectoryReader to SearcherManager
  (Mike McCandless)

* LUCENE-6085: Undeprecate SegmentInfo attributes, but add safety so they
  won't be trappy if a codec tries to use them during docvalues updates.
  (Robert Muir)

* LUCENE-6097: Remove dangerous / overly expert
  IndexWriter.abortMerges and waitForMerges methods. (Robert Muir,
  Mike McCandless)

* LUCENE-6099: Add FilterDirectory.unwrap and
  FilterDirectoryReader.unwrap (Simon Willnauer, Mike McCandless)

* LUCENE-6121: CachingTokenFilter.reset() now propagates to its input if called before
  incrementToken(). You must now call reset() on this filter instead of doing it a priori on the
  input(), which previously didn't work. (David Smiley, Robert Muir)

* LUCENE-6147: Make the core Accountables.namedAccountable function public
  (Ryan Ernst)

* LUCENE-6150: Remove staleFiles set and onIndexOutputClosed() from FSDirectory.
  (Uwe Schindler, Robert Muir, Mike McCandless)

* LUCENE-6146: Replaced Directory.copy() with Directory.copyFrom().
  (Robert Muir)

* LUCENE-6149: Infix suggesters' highlighting and allTermsRequired can
  be set in the constructor for non-contextual lookup.
  (Boon Low, Tomás Fernández Löbbe)

* LUCENE-6158, LUCENE-6165: IndexWriter.addIndexes(IndexReader...) changed to
  addIndexes(CodecReader...) (Robert Muir)

* LUCENE-6179: Out-of-order scoring is not allowed anymore, so
  Weight.scoresDocsOutOfOrder and LeafCollector.acceptsDocsOutOfOrder have been
  removed and boolean queries now always score in order.

* LUCENE-6212: IndexWriter no longer accepts a per-document Analyzer in
  add/updateDocument. These methods were trappy as they made it
  easy to accidentally index tokens that were not easily
  searchable. (Mike McCandless)

Bug Fixes

* LUCENE-5650: Enforce read-only access to any path outside the temporary
  folder via security manager, and make test temp dirs absolute.
  (Ryan Ernst, Dawid Weiss)

* LUCENE-5948: RateLimiter now fully inits itself on init. (Varun
  Thacker via Mike McCandless)

* LUCENE-5981: CheckIndex obtains write.lock, since with some parameters it
  may modify the index, and to prevent false corruption reports, as it does
  not have the regular "spinlock" of DirectoryReader.open. It now implements
  Closeable and you must close it to release the lock. (Mike McCandless, Robert Muir)

* LUCENE-6004: Don't highlight the LookupResult.key returned from
  AnalyzingInfixSuggester (Christian Reuschling, jane chang via Mike McCandless)

* LUCENE-5980: Don't let document length overflow.
  (Robert Muir)

* LUCENE-5961: Fix the exists() method for FunctionValues returned by many ValueSources to
  behave properly when wrapping other ValueSources which do not exist for the specified document
  (hossman)

* LUCENE-6039: Add IndexOptions.NONE and DocValuesType.NONE instead of
  using null to mean not indexed and no doc values, renamed
  IndexOptions.DOCS_ONLY to DOCS, and pulled IndexOptions and
  DocValues out of FieldInfo into their own classes in
  org.apache.lucene.index (Simon Willnauer, Robert Muir, Mike
  McCandless)

* LUCENE-6041: Remove sugar methods FieldInfo.isIndexed and
  FieldInfo.hasDocValues. (Robert Muir, Mike McCandless)

* LUCENE-6044: Fix backcompat support for token filters with enablePositionIncrements=false.
  Also fixed backcompat for TrimFilter with updateOffsets=true. These options
  are supported with a match version before 4.4, and are no longer valid at all with 5.0.
  (Ryan Ernst)

* LUCENE-6042: CustomScoreQuery explain was incorrect in some cases,
  such as when nested inside a boolean query. (Denis Lantsman via Robert Muir)

* LUCENE-6046: Add maxDeterminizedStates safety to determinize (which has
  an exponential worst case) so that if it would create too many states, it
  now throws an exception instead of exhausting CPU/RAM. (Nik
  Everett via Mike McCandless)

* LUCENE-6054: Allow repeating the empty automaton (Nik Everett via
  Mike McCandless)

* LUCENE-6049: Don't throw cryptic exception writing a segment when
  the only docs in it had fields that hit non-aborting exceptions
  during indexing but also had doc values. (Mike McCandless)

* LUCENE-6055: PayloadAttribute.clone() now does a deep clone of the underlying
  bytes.
  (Shai Erera)

* LUCENE-6060: Remove dangerous IndexWriter.unlock method (Simon
  Willnauer, Mike McCandless)

* LUCENE-6062: Pass correct fieldinfos to docvalues producer when the
  segment has updates. (Mike McCandless, Shai Erera, Robert Muir)

* LUCENE-6075: Don't overflow int in SimpleRateLimiter (Boaz Leskes
  via Mike McCandless)

* LUCENE-5987: IndexWriter will now forcefully close itself on an
  aborting exception (an exception that would otherwise cause silent
  data loss). (Robert Muir, Mike McCandless)

* LUCENE-6094: Allow IW.rollback to stop ConcurrentMergeScheduler even
  when it's stalling because there are too many merges. (Mike McCandless)

* LUCENE-6105: Don't cache FST root arcs if the number of root arcs is
  small, or if the cache would be > 20% of the size of the FST.
  (Robert Muir, Mike McCandless)

* LUCENE-6124: Fix double-close() problems in codec and store APIs.
  (Robert Muir)

* LUCENE-6152: Fix double close problems in OutputStreamIndexOutput.
  (Uwe Schindler)

* LUCENE-6139: Highlighter: TokenGroup start & end offset getters should have
  been returning the offsets of just the matching tokens in the group when
  there's a distinction. (David Smiley)

* LUCENE-6173: NumericTermAttribute and spatial/CellTokenStream do not clone
  their BytesRef(Builder)s. Also equals/hashCode were missing. (Uwe Schindler)

* LUCENE-6205: Fixed intermittent concurrency issue that could cause
  FileNotFoundException when writing doc values updates at the same
  time that a merge kicks off. (Mike McCandless)

* LUCENE-6192: Fix int overflow corruption case in skip data for
  high frequency terms in extremely large indices (Robert Muir, Mike
  McCandless)

* LUCENE-6093: Don't throw NullPointerException from
  BlendedInfixSuggester for lookups that do not end in a prefix
  token.
  (jane chang via Mike McCandless)

* LUCENE-6214: Fixed IndexWriter deadlock when one thread is
  committing while another opens a near-real-time reader and an
  unrecoverable (tragic) exception is hit. (Simon Willnauer, Mike
  McCandless)

Documentation

* LUCENE-5392: Add/improve analysis package documentation to reflect
  analysis API changes. (Benson Margulies via Robert Muir - pull request #17)

* LUCENE-6057: Improve Sort(SortField) docs (Martin Braun via Mike McCandless)

* LUCENE-6112: Fix compile error in FST package example code
  (Tomoko Uchida via Koji Sekiguchi)

Tests

* LUCENE-5957: Add option for tests to not randomize codec
  (Ryan Ernst)

* LUCENE-5974: Add check that backcompat indexes use default codecs
  (Ryan Ernst)

* LUCENE-5971: Create addBackcompatIndexes.py script to build and add
  backcompat test indexes for a given Lucene version. Also renamed backcompat
  index files to use Version.toString() in the filename.
  (Ryan Ernst)

* LUCENE-6002: Monster tests no longer fail. Most of them now have an 80-hour
  timeout, effectively removing the timeout. The tests that operate near the 2
  billion limit now use IndexWriter.MAX_DOCS instead of Integer.MAX_VALUE.
  Some of the slow Monster tests now explicitly choose the default codec.
  (Mike McCandless, Shawn Heisey)

* LUCENE-5968: Improve error message when 'ant beast' is run on top-level
  modules. (Ramkumar Aiyengar, Uwe Schindler)

* LUCENE-6120: Fix MockDirectoryWrapper's close() handling.
  (Mike McCandless, Robert Muir)

Build

* LUCENE-5909: Smoke tester now has better command line parsing and
  optionally also runs on Java 8. (Ryan Ernst, Uwe Schindler)

* LUCENE-5902: Add bumpVersion.py script to manage version increase after release branch is cut.

* LUCENE-5962: Rename diffSources.py to createPatch.py and make it work with all text file types.
  (Ryan Ernst)

* LUCENE-5995: Upgrade ICU to 54.1 (Robert Muir)

* LUCENE-6070: Upgrade forbidden-apis to 1.7 (Uwe Schindler)

Other

* LUCENE-5563: Removed sep layout, which has fallen behind on features and doesn't
  perform as well as other options. (Robert Muir)

* LUCENE-4086: Removed support for Lucene 3.x indexes. See the migration guide for
  more information. (Robert Muir)

* LUCENE-5858: Moved Lucene 4 compatibility codecs to 'lucene-backward-codecs.jar'.
  (Adrien Grand, Robert Muir)

* LUCENE-5915: Remove Pulsing postings format. (Robert Muir)

* LUCENE-6213: Add useful exception message when commit contains segments from legacy codecs.
  (Ryan Ernst)

======================= Lucene 4.10.4 ======================

Bug fixes

* LUCENE-6019, LUCENE-6117: Remove -Dtests.assert to make IndexWriter
  infoStream sane. (Robert Muir, Mike McCandless)

* LUCENE-6161: Resolving deletes was failing to reuse DocsEnum, likely
  causing substantial performance cost for use cases that frequently
  delete old documents (Mike McCandless)

* LUCENE-6192: Fix int overflow corruption case in skip data for
  high frequency terms in extremely large indices (Robert Muir, Mike
  McCandless)

* LUCENE-6207: Fixed consumption of several terms enums on the same
  sorted (set) doc values instance at the same time.
  (Tom Shally, Robert Muir, Adrien Grand)

* LUCENE-6093: Don't throw NullPointerException from
  BlendedInfixSuggester for lookups that do not end in a prefix
  token.
  (jane chang via Mike McCandless)

* LUCENE-6279: Don't let an abusive leftover _N_upgraded.si in the
  index directory cause index corruption on upgrade (Robert Muir, Mike
  McCandless)

* LUCENE-6287: Fix concurrency bug in IndexWriter that could cause
  index corruption (missing _N.si files) the first time 4.x kisses a
  3.x index if merges are also running. (Simon Willnauer, Mike
  McCandless)

* LUCENE-6205: Fixed intermittent concurrency issue that could cause
  FileNotFoundException when writing doc values updates at the same
  time that a merge kicks off. (Mike McCandless)

* LUCENE-6214: Fixed IndexWriter deadlock when one thread is
  committing while another opens a near-real-time reader and an
  unrecoverable (tragic) exception is hit. (Simon Willnauer, Mike
  McCandless)

* LUCENE-6105: Don't cache FST root arcs if the number of root arcs is
  small, or if the cache would be > 20% of the size of the FST.
  (Robert Muir, Mike McCandless)

* LUCENE-6001: DrillSideways hits NullPointerException for certain
  BooleanQuery searches. (Dragan Jotannovic, jane chang via Mike
  McCandless)

* LUCENE-6306: Merging of doc values and norms now checks whether the
  merge was aborted so IndexWriter.rollback can more promptly abort a
  running merge. (Robert Muir, Mike McCandless)

API Changes

* LUCENE-6212: Deprecate IndexWriter APIs that accept a per-document Analyzer.
  These methods were trappy, as they made it easy to accidentally index
  tokens that were not easily searchable, and will be removed in 5.0.0.
  (Mike McCandless)

======================= Lucene 4.10.3 ======================

Bug fixes

* LUCENE-6046: Add maxDeterminizedStates safety to determinize (which has
  an exponential worst case) so that if it would create too many states, it
  now throws an exception instead of exhausting CPU/RAM.
  (Nik Everett via Mike McCandless)

* LUCENE-6054: Allow repeating the empty automaton (Nik Everett via
  Mike McCandless)

* LUCENE-6049: Don't throw cryptic exception writing a segment when
  the only docs in it had fields that hit non-aborting exceptions
  during indexing but also had doc values. (Mike McCandless)

* LUCENE-6060: Deprecate IndexWriter.unlock (Simon Willnauer, Mike
  McCandless)

* LUCENE-3229: Overlapping ordered SpanNearQuery spans should not match.
  (Ludovic Boutros, Paul Elschot, Greg Dearing, ehatcher)

* LUCENE-6004: Don't highlight the LookupResult.key returned from
  AnalyzingInfixSuggester (Christian Reuschling, jane chang via Mike McCandless)

* LUCENE-6075: Don't overflow int in SimpleRateLimiter (Boaz Leskes
  via Mike McCandless)

* LUCENE-5980: Don't let document length overflow. (Robert Muir)

* LUCENE-6042: CustomScoreQuery explain was incorrect in some cases,
  such as when nested inside a boolean query. (Denis Lantsman via Robert Muir)

* LUCENE-5948: RateLimiter now fully inits itself on init. (Varun
  Thacker via Mike McCandless)

* LUCENE-6055: PayloadAttribute.clone() now does a deep clone of the underlying
  bytes. (Shai Erera)

* LUCENE-6094: Allow IW.rollback to stop ConcurrentMergeScheduler even
  when it's stalling because there are too many merges. (Mike McCandless)

Documentation

* LUCENE-6057: Improve Sort(SortField) docs (Martin Braun via Mike McCandless)

======================= Lucene 4.10.2 ======================

Bug fixes

* LUCENE-5977: Fix tokenstream safety checks in IndexWriter to properly
  work across multi-valued fields. Previously some cases across multi-valued
  fields would happily create a corrupt index. (Dawid Weiss, Robert Muir)

* LUCENE-6019: Detect when DocValuesType illegally changes for the
  same field name.
  Also added -Dtests.asserts=true|false so we can
  run tests with and without assertions. (Simon Willnauer, Robert
  Muir, Mike McCandless)

======================= Lucene 4.10.1 ======================

Bug fixes

* LUCENE-5934: Fix backwards compatibility for 4.0 indexes.
  (Ian Lea, Uwe Schindler, Robert Muir, Ryan Ernst)

* LUCENE-5939: Regenerate old backcompat indexes to ensure they were built with
  the exact release.
  (Ryan Ernst, Uwe Schindler)

* LUCENE-5952: Improve error messages when version cannot be parsed;
  don't check for too old or too new major version (it's too low-level
  to enforce here); use a simple string tokenizer. (Ryan Ernst, Uwe Schindler,
  Robert Muir, Mike McCandless)

* LUCENE-5958: Don't let exceptions during checkpoint corrupt the index.
  Refactor existing OOM handling too, so you don't need to handle OOM specially
  for every IndexWriter method: instead such disasters will cause IW to close itself
  defensively. (Robert Muir, Mike McCandless)

* LUCENE-5904: Fixed a corruption case that can happen when 1)
  IndexWriter is uncleanly shut down (OS crash, power loss, etc.), 2)
  on startup, when a new IndexWriter is created, a virus checker is
  holding some of the previously written but unused files open and
  preventing deletion, 3) IndexWriter writes these files again during
  the course of indexing, then the files can later be deleted, causing
  corruption. This case was detected by adding evilness to
  MockDirectoryWrapper to have it simulate a virus checker holding a
  file open and preventing deletion (Robert Muir, Mike McCandless)

* LUCENE-5916: Static scope test components should be consistent between
  tests (and test iterations). Fix for FaultyIndexInput in particular.
  (Dawid Weiss)

* LUCENE-5975: Fix reading of 3.0-3.3 indexes, where bugs in these old
  index formats would result in CorruptIndexException "did not read all
  bytes from file" when reading the deleted docs file. (Patrick Mi, Robert Muir)

Tests

* LUCENE-5936: Add backcompat checks to verify that what is tested matches known versions
  (Ryan Ernst)

======================= Lucene 4.10.0 ======================

New Features

* LUCENE-5778: Support hunspell morphological description fields/aliases.
  (Robert Muir)

* LUCENE-5801: Added (back) OrdinalMappingAtomicReader for merging search
  indexes that contain category ordinals from separate taxonomy indexes.
  (Nicola Buso via Shai Erera)

* LUCENE-4175, LUCENE-5714, LUCENE-5779: Index and search rectangles with spatial
  BBoxSpatialStrategy using most predicates. Sort documents by relative overlap
  of query areas or just by indexed shape area. (Ryan McKinley, David Smiley)

* LUCENE-5806: Extend expressions grammar to support array access in variables.
  Added helper class VariableContext to parse complex variables into pieces.
  (Ryan Ernst)

* LUCENE-5826: Support proper hunspell case handling, LANG, KEEPCASE, NEEDAFFIX,
  and ONLYINCOMPOUND flags. (Robert Muir)

* LUCENE-5815: Add TermAutomatonQuery, a proximity query allowing you
  to create an arbitrary automaton, using terms on the transitions,
  expressing which sequences of terms (including a special
  "any" term) are allowed. This is a generalization of
  MultiPhraseQuery and span queries, and enables "correct" (including
  position) length search-time graph synonyms. (Mike McCandless)

* LUCENE-5819: Add OrdsLucene41 block tree terms dict and postings
  format, to include term ordinals in the index so the optional
  TermsEnum.ord() and TermsEnum.seekExact(long ord) APIs work.
  (Mike McCandless)

* LUCENE-5835: TermValComparator can sort missing values last. (Adrien Grand)

* LUCENE-5825: Benchmark module can use custom postings format, e.g.:
  codec.postingsFormat=Memory (Varun Shenoy, David Smiley)

* LUCENE-5842: When opening large files (where it's too expensive to compare
  the checksum against all the bytes), retrieve the checksum to validate the structure
  of the footer; this can detect some forms of corruption such as truncation.
  (Robert Muir)

* LUCENE-5739: Added DataInput.readZ(Int|Long) and DataOutput.writeZ(Int|Long)
  to read and write small signed integers. (Adrien Grand)

API Changes

* LUCENE-5752: Simplified Automaton API to be immutable. (Mike McCandless)

* LUCENE-5793: Add equals/hashCode to FieldType. (Shay Banon, Robert Muir)

* LUCENE-5692: DisjointSpatialFilter is deprecated (used by RecursivePrefixTreeStrategy)
  (David Smiley)

* LUCENE-5771: SpatialOperation's predicate names are now aliased to OGC standard names.
  Thus you can use: Disjoint, Equals, Intersects, Overlaps, Within, Contains, Covers,
  CoveredBy. The area requirement on the predicates was removed, and Overlaps' definition
  was fixed. (David Smiley)

* LUCENE-5850: Made Version handling more robust and extensible. Deprecated
  Constants.LUCENE_MAIN_VERSION, Constants.LUCENE_VERSION and current Version
  constants of the form LUCENE_X_Y. Added version constants that include the bugfix
  number, of the form LUCENE_X_Y_Z. Changed Version.LUCENE_CURRENT to Version.LATEST.
  CheckIndex now prints the Lucene version used to write each segment.
  (Ryan Ernst, Uwe Schindler, Robert Muir, Mike McCandless)

* LUCENE-5836: BytesRef has been split into BytesRef, whose intended usage is
  to be just a reference to a section of a larger byte[], and BytesRefBuilder,
  which is a StringBuilder-like class for BytesRef instances.
  (Adrien Grand)

* LUCENE-5883: You can now change the MergePolicy instance on a live IndexWriter,
  without first closing and reopening the writer. This makes it possible, e.g., to
  run a special merge with UpgradeIndexMergePolicy without reopening the writer.
  Also, MergePolicy no longer implements Closeable; if you need to release your
  custom MergePolicy's resources, you need to implement close() and call it
  explicitly. (Shai Erera)

* LUCENE-5859: Deprecate Analyzer constructors taking Version. Use Analyzer.setVersion()
  to set the version of an analyzer, to replicate behavior from a specific release.
  (Ryan Ernst, Robert Muir)

Optimizations

* LUCENE-5780: Make OrdinalMap more memory-efficient, especially in case the
  first segment has all values. (Adrien Grand, Robert Muir)

* LUCENE-5782: OrdinalMap now sorts enums before being built in order to
  improve compression. (Adrien Grand)

* LUCENE-5798: Optimize MultiDocsEnum reuse. (Robert Muir)

* LUCENE-5799: Optimize numeric docvalues merging. (Robert Muir)

* LUCENE-5797: Optimize norms merging (Adrien Grand, Robert Muir)

* LUCENE-5803: Add DelegatingAnalyzerWrapper, an optimized variant
  of AnalyzerWrapper that doesn't allow wrapping components or readers.
  This wrapper class is the base class of all analyzers that just delegate
  to another analyzer, e.g. per field name: PerFieldAnalyzerWrapper and
  Solr's schema support. (Shay Banon, Uwe Schindler, Robert Muir)

* LUCENE-5795: MoreLikeThisQuery now only collects the top N terms instead
  of collecting all terms from the like text when building the query.
  (Alex Ksikes, Simon Willnauer)

* LUCENE-5681: Fix RAMDirectory's IndexInput to not do double buffering
  on slices (causes useless data copying, especially on random access slices).
  This also improves slices of NRTCachingDirectory, because the cache
  is based on RAMDirectory. BufferedIndexInput.wrap() was marked with a
  warning in javadocs. It is almost always a better idea to implement
  slicing on your own! (Uwe Schindler, Robert Muir)

* LUCENE-5834: Empty sorted set and numeric doc values are now singletons.
  (Adrien Grand)

* LUCENE-5841: Improve performance of block tree terms dictionary when
  assigning terms to blocks. (Mike McCandless)

* LUCENE-5856: Optimize Fixed/Open/LongBitSet to remove unnecessary AND.
  (Robert Muir)

* LUCENE-5884: Optimize FST.ramBytesUsed. (Adrien Grand, Robert Muir,
  Mike McCandless)

* LUCENE-5882: Add Lucene410DocValuesFormat, with faster term lookups
  for SORTED/SORTED_SET fields. (Robert Muir)

* LUCENE-5887: Remove WeakIdentityMap caching in AttributeFactory,
  AttributeSource, and VirtualMethod in favour of Java 7's ClassValue.
  Always use MethodHandles to create AttributeImpl classes.
  (Uwe Schindler)

Bug Fixes

* LUCENE-5796: Fixes the Scorer.getChildren() method for two combinations
  of BooleanQuery. (Terry Smith via Robert Muir)

* LUCENE-5790: Fix compareTo in MutableValueDouble and MutableValueBool; this caused
  incorrect results when grouping on fields with missing values.
  (海老澤 志信, hossman)

* LUCENE-5817: Fix hunspell zero-affix handling: previously only zero-strips worked
  correctly. (Robert Muir)

* LUCENE-5818, LUCENE-5823: Fix hunspell overgeneration for short strings that also
  match affixes: words are only stripped to a zero-length string if the FULLSTRIP
  option is specified in the dictionary. (Robert Muir)

* LUCENE-5824: Fix hunspell 'long' flag handling. (Robert Muir)

* LUCENE-5838: Fix hunspell when the .aff file has over 64k affixes.
  (Robert Muir)

* LUCENE-5869: Restrict maxExpansions in FuzzyQuery to positive values.
  (Ryan Ernst)

* LUCENE-5672: IndexWriter.addIndexes() calls maybeMerge(), to ensure the index stays
  healthy. If you don't want merging, use NoMergePolicy instead. (Robert Muir)

* LUCENE-5908: Fix Lucene43NGramTokenizer to be final.

Test Framework

* LUCENE-5786: Unflushed/truncated events file (hung testing subprocess).
  (Dawid Weiss)

* LUCENE-5881: Add "beasting" of tests: repeats the whole "test" Ant target
  N times with "ant beast -Dbeast.iters=N". (Uwe Schindler, Robert Muir,
  Ryan Ernst, Dawid Weiss)

Build

* LUCENE-5770: Upgrade to JFlex 1.6, which has direct support for
  supplementary code points - as a result, ICU4J is no longer used
  to generate surrogate pairs to augment JFlex scanner specifications.
  (Steve Rowe)

* SOLR-6358: Remove VcsDirectoryMappings from idea configuration
  vcs.xml (Ramkumar Aiyengar via Steve Rowe)

======================= Lucene 4.9.1 ======================

Bug fixes

* LUCENE-5907: Fix corruption case when opening a pre-4.x index with
  IndexWriter, then opening an NRT reader from that writer, then
  calling commit from the writer, then closing the NRT reader. This
  case would remove the wrong files from the index, leading to a
  corrupt index. (Mike McCandless)

* LUCENE-5919: Fix exception handling inside IndexWriter when
  deleteFile throws an exception, to not over-decRef index files,
  possibly deleting a file that's still in use in the index, leading
  to corruption. (Mike McCandless)

* LUCENE-5922: DocValuesDocIdSet on 5.x and FieldCacheDocIdSet on 4.x
  are not cacheable.
  (Adrien Grand)

* LUCENE-5843: Added IndexWriter.MAX_DOCS, the maximum number of documents
  allowed in a single index; any operations that add documents will now
  throw IllegalStateException if the max count would be exceeded, instead
  of silently creating an unusable index. (Mike McCandless)

* LUCENE-5844: ArrayUtil.grow/oversize now returns a maximum of
  Integer.MAX_VALUE - 8 for the maximum array size. (Robert Muir,
  Mike McCandless)

* LUCENE-5827: Make all Directory implementations correctly fail with
  IllegalArgumentException if slices are out of bounds. (Uwe Schindler)

* LUCENE-5897, LUCENE-5400: JFlex-based tokenizers StandardTokenizer and
  UAX29URLEmailTokenizer tokenize extremely slowly over long sequences of
  text partially matching certain grammar rules. The scanner default
  buffer size was reduced, and scanner buffer growth was disabled, resulting
  in much, much faster tokenization for these text sequences.
  (Chris Geeringh, Robert Muir, Steve Rowe)

======================= Lucene 4.9.0 =======================

Changes in Runtime Behavior

* LUCENE-5611: Changing the term vector options for multiple field
  instances by the same name in one document is no longer accepted;
  IndexWriter will now throw IllegalArgumentException. (Robert Muir,
  Mike McCandless)

* LUCENE-5646: Remove rare/undertested bulk merge algorithm in
  CompressingStoredFieldsWriter. (Robert Muir, Adrien Grand)

New Features

* LUCENE-5610: Add Terms.getMin and Terms.getMax to get the lowest and
  highest terms, and NumericUtils.get{Min/Max}{Int/Long} to get the
  minimum/maximum numeric values from the provided Terms. (Robert Muir,
  Mike McCandless)

* LUCENE-5675: Add IDVersionPostingsFormat, a postings format
  optimized for primary-key (ID) fields that also record a version
  (long) for each ID.
  (Robert Muir, Mike McCandless)

* LUCENE-5680: Add ability to atomically update a set of DocValues
  fields. (Shai Erera)

* LUCENE-5717: Add support for multiterm queries nested inside
  filtered and constant-score queries to the postings highlighter.
  (Luca Cavanna via Robert Muir)

* LUCENE-5731, LUCENE-5760: Add RandomAccessInput, a random access API for directory.
  Add DirectReader/Writer, optimized for reading packed integers directly
  from Directory. Add Lucene49Codec and Lucene49DocValuesFormat that make
  use of these. (Robert Muir)

* LUCENE-5743: Add Lucene49NormsFormat, which can compress in some cases
  such as very short fields. (Ryan Ernst, Adrien Grand, Robert Muir)

* LUCENE-5748: Add SORTED_NUMERIC docvalues type, which is efficient
  for processing numeric fields with multiple values. (Robert Muir)

* LUCENE-5754: Allow "$" as part of variable and function names in
  the expressions module. (Uwe Schindler)

Changes in Backwards Compatibility Policy

* LUCENE-5634: Add reuse argument to IndexableField.tokenStream. This
  can be used by custom fieldtypes, which don't use the Analyzer, but
  implement their own TokenStream. (Uwe Schindler, Robert Muir)

* LUCENE-5640: AttributeSource.AttributeFactory was moved to a
  top-level class: org.apache.lucene.util.AttributeFactory
  (Uwe Schindler, Robert Muir)

* LUCENE-4371: Removed IndexInputSlicer and Directory.createSlicer() and replaced
  with IndexInput.slice(). (Robert Muir)

* LUCENE-5727, LUCENE-5678: Remove IndexOutput.seek, IndexOutput.setLength().
  (Robert Muir, Uwe Schindler)

API Changes

* LUCENE-5756: IndexWriter now implements Accountable and IW#ramSizeInBytes()
  has been deprecated in favor of IW#ramBytesUsed() (Simon Willnauer)

* LUCENE-5725: MoreLikeThis#like now accepts multiple values per field.
  The pre-existing method has been deprecated in favor of a variable-arguments
  version for the like text. (Alex Ksikes via Simon Willnauer)

* LUCENE-5711: MergePolicy accepts an IndexWriter instance
  on each method rather than holding state against a single
  IndexWriter instance. (Simon Willnauer)

* LUCENE-5582: Deprecate IndexOutput.length (just use
  IndexOutput.getFilePointer instead) and IndexOutput.setLength.
  (Mike McCandless)

* LUCENE-5621: Deprecate IndexOutput.flush: this is not used by Lucene.
  (Robert Muir)

* LUCENE-5611: Simplified Lucene's default indexing chain / APIs.
  AttributeSource/TokenStream.getAttribute now returns null if the
  attribute is not present (previously it threw
  IllegalArgumentException). StoredFieldsWriter.startDocument no
  longer receives the number of fields that will be added. (Robert
  Muir, Mike McCandless)

* LUCENE-5632: In preparation for coming Lucene versions, the Version
  enum constants were renamed to make them more readable. The constant
  for Lucene 4.9 is now "LUCENE_4_9". Version.parseLeniently() is still
  able to parse the old strings ("LUCENE_49"). The old identifiers got
  deprecated and will be removed in Lucene 5.0. (Uwe Schindler,
  Robert Muir)

* LUCENE-5633: Change NoMergePolicy to a singleton with no distinction between
  compound and non-compound types. (Shai Erera)

* LUCENE-5640: The Token class was deprecated. Since Lucene 2.9, TokenStreams
  use Attributes and Token is no longer used. (Uwe Schindler, Robert Muir)

* LUCENE-5679: Consolidated IndexWriter.deleteDocuments(Term) and
  IndexWriter.deleteDocuments(Query) with their varargs counterparts.
  (Shai Erera)

* LUCENE-5701: Core closed listeners are now available in the AtomicReader API;
  they used to sit only in SegmentReader.
  (Adrien Grand, Robert Muir)

* LUCENE-5706: Removed the option to unset a DocValues field through DocValues
  updates. (Shai Erera)

* LUCENE-5700: Added oal.util.Accountable that is now implemented by all
  classes whose memory usage can be estimated. (Robert Muir, Adrien Grand)

* LUCENE-5708: Remove IndexWriterConfig.clone, so now IndexWriter
  simply uses the IndexWriterConfig you pass it, and you must create a
  new IndexWriterConfig for each IndexWriter. (Mike McCandless)

* LUCENE-5678: IndexOutput no longer allows seeking, so it is no longer required
  to use RandomAccessFile to write indexes. Lucene now uses standard FileOutputStream
  wrapped with OutputStreamIndexOutput to write index data. BufferedIndexOutput was
  removed, because buffering and checksumming are provided by FilterOutputStreams
  from the JDK. (Uwe Schindler, Mike McCandless)

* LUCENE-5703: BinaryDocValues API changed to work like TermsEnum and not
  allocate/copy bytes on each access; you are responsible for cloning if you
  want to keep data around. (Adrien Grand)

* LUCENE-5695: DocIdSet implements Accountable. (Adrien Grand)

* LUCENE-5757: Moved RamUsageEstimator's reflection-based processing to RamUsageTester
  in the test-framework module. (Robert Muir)

* LUCENE-5761: Removed DiskDocValuesFormat; it was very inefficient and saved very little
  RAM over the default codec. (Robert Muir)

* LUCENE-5775: Deprecate JaspellLookup. (Mike McCandless)

Optimizations

* LUCENE-5603: hunspell stemmer more efficiently strips prefixes
  and suffixes. (Robert Muir)

* LUCENE-5599: HttpReplicator did not properly delegate bulk read() to wrapped
  InputStream. (Christoph Kaser via Shai Erera)

* LUCENE-5591: pass an IOContext with estimated flush size when applying DV
  updates.
  (Shai Erera)

* LUCENE-5634: IndexWriter reuses TokenStream instances for String and Numeric
  fields by default. (Uwe Schindler, Shay Banon, Mike McCandless, Robert Muir)

* LUCENE-5638, LUCENE-5640: TokenStream uses a more performant AttributeFactory
  by default, that packs the core attributes into one implementation
  (PackedTokenAttributeImpl), for faster clearAttributes(), saveState(), and
  restoreState(). In addition, AttributeFactory uses Java 7 MethodHandles for
  instantiating Attribute implementations. (Uwe Schindler, Robert Muir)

* LUCENE-5609: Changed the default NumericField precisionStep from 4
  to 8 (for int/float) and 16 (for long/double), for faster indexing
  time and smaller indices. (Robert Muir, Uwe Schindler, Mike McCandless)

* LUCENE-5670: Add skip/FinalOutput to FST Outputs. (Christian
  Ziech via Mike McCandless).

* LUCENE-4236: Optimize BooleanQuery's in-order scoring. This speeds up
  some types of boolean queries. (Robert Muir)

* LUCENE-5694: Don't score() subscorers in DisjunctionSumScorer or
  DisjunctionMaxScorer unless score() is called. (Robert Muir)

* LUCENE-5720: Optimize DirectPackedReader's decompression. (Robert Muir)

* LUCENE-5722: Optimize ByteBufferIndexInput#seek() by specializing
  implementations. This improves random access as used by docvalues codecs
  if used with MMapDirectory. (Robert Muir, Uwe Schindler)

* LUCENE-5730: FSDirectory.open returns MMapDirectory for 64-bit operating
  systems, not just Linux and Windows. (Robert Muir)

* LUCENE-5703: BinaryDocValues producers don't allocate or copy bytes on
  each access anymore. (Adrien Grand)

* LUCENE-5721: Monotonic compression doesn't use zig-zag encoding anymore.
  (Robert Muir, Adrien Grand)

* LUCENE-5750: Speed up monotonic addressing for BINARY and SORTED_SET
  docvalues.
  (Robert Muir)

* LUCENE-5751: Speed up MemoryDocValues. (Adrien Grand, Robert Muir)

* LUCENE-5767: OrdinalMap optimizations that mostly help on low cardinalities.
  (Martijn van Groningen, Adrien Grand)

* LUCENE-5769: SingletonSortedSetDocValues now supports random access ordinals.
  (Robert Muir)

Bug fixes

* LUCENE-5738: Ensure NativeFSLock prevents opening the file channel for the
  lock if the lock is already obtained by the JVM. Trying to obtain an already
  obtained lock in the same JVM can unlock the file, which might allow other
  processes to lock the file even without explicitly unlocking the FileLock.
  This behavior is operating system dependent. (Simon Willnauer)

* LUCENE-5673: MMapDirectory: Work around a "bug" in the JDK that throws
  a confusing OutOfMemoryError wrapped inside IOException if the FileChannel
  mapping failed because of lack of virtual address space. The IOException is
  rethrown with more useful information about the problem, omitting the
  incorrect OutOfMemoryError. (Robert Muir, Uwe Schindler)

* LUCENE-5682: NPE in QueryRescorer when Scorer is null
  (Joel Bernstein, Mike McCandless)

* LUCENE-5691: DocTermOrds lookupTerm(BytesRef) would return incorrect results
  if the underlying TermsEnum supports ord() and the insertion point would
  be at the end. (Robert Muir)

* LUCENE-5618, LUCENE-5636: SegmentReader referenced unneeded files following
  doc-values updates. Now doc-values field updates are written in a separate
  file per field.
  (Shai Erera, Robert Muir)

* LUCENE-5684: Make best effort to detect invalid usage of Lucene,
  when IndexReader is reopened after all files in its index were
  removed and recreated by the application (the proper way to do
  this is IndexWriter.deleteAll, or opening an IndexWriter with
  OpenMode.CREATE) (Mike McCandless)

* LUCENE-5704: Fix compilation error with Java 8u20. (Uwe Schindler)

* LUCENE-5710: Include the inner exception as the cause and in the
  exception message when an immense term is hit during indexing (Lee
  Hinman via Mike McCandless)

* LUCENE-5724: CompoundFileWriter was failing to pass through the
  IOContext in some cases, causing NRTCachingDirectory to cache
  compound files when it shouldn't, then causing OOMEs. (Mike
  McCandless)

* LUCENE-5747: Project-specific settings for the eclipse development
  environment will prevent automatic code reformatting. (Shawn Heisey)

* LUCENE-5768, LUCENE-5777: Hunspell condition checks containing character classes
  were buggy. (Clinton Gormley, Robert Muir)

Test Framework

* LUCENE-5622: Fail tests if they print over the given limit of bytes to
  System.out or System.err. (Robert Muir, Dawid Weiss)

* LUCENE-5619: Added backwards compatibility tests to ensure we can update existing
  indexes with doc-values updates. (Shai Erera, Robert Muir)

Build

* LUCENE-5442: The Ant check-lib-versions target now runs Ivy resolution
  transitively, then fails the build when it finds a version conflict: when a
  transitive dependency's version is more recent than the direct dependency's
  version specified in lucene/ivy-versions.properties. Exceptions are
  specifiable in lucene/ivy-ignore-conflicts.properties.
  (Steve Rowe)

* LUCENE-5715: Upgrade direct dependencies known to be older than transitive
  dependencies: com.sun.jersey.version:1.8->1.9; com.sun.xml.bind:jaxb-impl:2.2.2->2.2.3-1;
  commons-beanutils:commons-beanutils:1.7.0->1.8.3; commons-digester:commons-digester:2.0->2.1;
  commons-io:commons-io:2.1->2.3; commons-logging:commons-logging:1.1.1->1.1.3;
  io.netty:netty:3.6.2.Final->3.7.0.Final; javax.activation:activation:1.1->1.1.1;
  javax.mail:mail:1.4.1->1.4.3; log4j:log4j:1.2.16->1.2.17; org.apache.avro:avro:1.7.4->1.7.5;
  org.tukaani:xz:1.2->1.4; org.xerial.snappy:snappy-java:1.0.4.1->1.0.5 (Steve Rowe)

======================= Lucene 4.8.1 =======================

Bug fixes

* LUCENE-5639: Fix PositionLengthAttribute implementation in Token class.
  (Uwe Schindler, Robert Muir)

* LUCENE-5635: IndexWriter didn't properly handle IOException on TokenStream.reset(),
  which could leave the analyzer in an inconsistent state. (Robert Muir)

* LUCENE-5599: HttpReplicator did not properly delegate bulk read() to wrapped
  InputStream. (Christoph Kaser via Shai Erera)

* LUCENE-5600: HttpClientBase did not properly consume a connection if a server
  error occurred. (Christoph Kaser via Shai Erera)

* LUCENE-5628: Change getFiniteStrings to an iterative, not recursive,
  implementation, so that building suggesters on a long suggestion
  doesn't risk overflowing the stack; previously it consumed one Java
  stack frame per character in the expanded suggestion. If you are building
  a suggester, this is a nasty trap. (Robert Muir, Simon Willnauer,
  Mike McCandless).

* LUCENE-5559: Add additional argument validation for CapitalizationFilter
  and CodepointCountFilter. (Ahmet Arslan via Robert Muir)

* LUCENE-5641: SimpleRateLimiter would silently rate limit at 8 MB/sec
  even if you asked for higher rates.
  (Mike McCandless)

* LUCENE-5644: IndexWriter clears which threads use which internal
  thread states on flush, so that if an application reduces how many
  threads it uses for indexing, that results in a reduction of how
  many segments are flushed on a full-flush (e.g. to obtain a
  near-real-time reader). (Simon Willnauer, Mike McCandless)

* LUCENE-5653: JoinUtil with ScoreMode.Avg on a multi-valued field
  with more than 256 values would throw an exception.
  (Mikhail Khludnev via Robert Muir)

* LUCENE-5654: Fix various close() methods that could suppress
  throwables such as OutOfMemoryError, instead returning scary messages
  that look like index corruption. (Mike McCandless, Robert Muir)

* LUCENE-5656: Fix rare fd leak in SegmentReader when multiple docvalues
  fields have been updated with IndexWriter.updateXXXDocValue and one
  hits an exception. (Shai Erera, Robert Muir)

* LUCENE-5660: AnalyzingSuggester.build will now throw IllegalArgumentException if
  you give it a longer suggestion than it can handle (Robert Muir, Mike McCandless)

* LUCENE-5662: Add missing checks to Field to prevent IndexWriter.abort
  if a stored value is null. (Robert Muir)

* LUCENE-5668: Fix off-by-one in TieredMergePolicy (Mike McCandless)

* LUCENE-5671: Upgrade ICU version to fix an ICU concurrency problem that
  could cause exceptions when indexing. (feedly team, Robert Muir)

======================= Lucene 4.8.0 =======================

System Requirements

* LUCENE-4747, LUCENE-5514: Move to Java 7 as minimum Java version.
  (Robert Muir, Uwe Schindler)

Changes in Runtime Behavior

* LUCENE-5472: IndexWriter.addDocument will now throw an IllegalArgumentException
  if a Term to be indexed exceeds IndexWriter.MAX_TERM_LENGTH. To recreate previous
  behavior of silently ignoring these terms, use LengthFilter in your Analyzer.
  (hossman, Mike McCandless, Varun Thacker)

New Features

* LUCENE-5356: Morfologik filter can accept custom dictionary resources.
  (Michal Hlavac, Dawid Weiss)

* LUCENE-5454: Add SortedSetSortField to lucene/sandbox, to allow sorting
  on multi-valued fields. (Robert Muir)

* LUCENE-5478: CommonTermsQuery now allows creating custom term queries,
  similar to the query parser, by overriding a newTermQuery method.
  (Simon Willnauer)

* LUCENE-5477: AnalyzingInfixSuggester now supports near-real-time
  additions and updates (to change weight or payload of an existing
  suggestion). (Mike McCandless)

* LUCENE-5482: Improve default TurkishAnalyzer by adding apostrophe
  handling suitable for Turkish. (Ahmet Arslan via Robert Muir)

* LUCENE-5479: FacetsConfig subclass can now customize the default
  per-dim facets configuration. (Rob Audenaerde via Mike McCandless)

* LUCENE-5485: Add circumfix support to HunspellStemFilter. (Robert Muir)

* LUCENE-5224: Add iconv, oconv, and ignore support to HunspellStemFilter.
  (Robert Muir)

* LUCENE-5493: SortingMergePolicy and EarlyTerminatingSortingCollector
  support arbitrary Sort specifications.
  (Robert Muir, Mike McCandless, Adrien Grand)

* LUCENE-3758: Allow the ComplexPhraseQueryParser to search ordered or
  unordered proximity queries. (Ahmet Arslan via Erick Erickson)

* LUCENE-5530: ComplexPhraseQueryParser throws ParseException for fielded queries.
  (Erick Erickson via Tomas Fernandez Lobbe and Ahmet Arslan)

* LUCENE-5513: Add IndexWriter.updateBinaryDocValue which lets
  you update the value of a BinaryDocValuesField without reindexing the
  document(s). (Shai Erera)

* LUCENE-4072: Add ICUNormalizer2CharFilter, which lets you do Unicode normalization
  with offset correction before the tokenizer.
  (David Goldfarb, Ippei UKAI via Robert Muir)

* LUCENE-5476: Add RandomSamplingFacetsCollector for computing facets on a sampled
  set of matching hits, in cases where there are millions of hits.
  (Rob Audenaerde, Gilad Barkai, Shai Erera)

* LUCENE-4984: Add SegmentingTokenizerBase, an abstract class for tokenizers
  that want to do two-pass tokenization such as by sentence and then by word.
  (Robert Muir)

* LUCENE-5489: Add Rescorer/QueryRescorer, to re-sort the hits from a
  first pass search using scores from a more costly second pass
  search. (Simon Willnauer, Robert Muir, Mike McCandless)

* LUCENE-5528: Add context to suggesters (InputIterator and Lookup
  classes), and fix AnalyzingInfixSuggester to handle contexts.
  Suggester contexts allow you to filter suggestions. (Areek Zillur,
  Mike McCandless)

* LUCENE-5545: Add SortRescorer and Expression.getRescorer, to
  re-sort the hits from a first pass search using a Sort or an
  Expression. (Simon Willnauer, Robert Muir, Mike McCandless)

* LUCENE-5558: Add TruncateTokenFilter which truncates terms to
  the specified length. (Ahmet Arslan via Robert Muir)

* LUCENE-2446: Added checksums to lucene index files. As of 4.8, the last 8
  bytes of each file contain a zlib-crc32 checksum. Small metadata files are
  verified on load. Larger files can be checked on demand via
  AtomicReader.checkIntegrity. You can configure this to happen automatically
  before merges by enabling IndexWriterConfig.setCheckIntegrityAtMerge.
  (Robert Muir)

* LUCENE-5580: Checksums are automatically verified on the default stored
  fields format when performing a bulk merge. (Adrien Grand)

* LUCENE-5602: Checksums are automatically verified on the default term
  vectors format when performing a bulk merge. (Adrien Grand, Robert Muir)

* LUCENE-5583: Added DataInput.skipBytes.
  ChecksumIndexInput can now seek, but only forward. (Adrien Grand,
  Mike McCandless, Simon Willnauer, Uwe Schindler)

* LUCENE-5588: Lucene now calls fsync() on the index directory, ensuring
  that all file metadata is persisted on disk in case of power failure.
  This does not work on all file systems and operating systems, but Linux
  and MacOSX are known to work. On Windows, fsyncing a directory is not
  possible with Java APIs. (Mike McCandless, Uwe Schindler)

API Changes

* LUCENE-5454: Add RandomAccessOrds, an optional extension of SortedSetDocValues
  that supports random access to the ordinals in a document. (Robert Muir)

* LUCENE-5468: Move offline Sort (from suggest module) to OfflineSort. (Robert Muir)

* LUCENE-5493: SortingMergePolicy and EarlyTerminatingSortingCollector take
  Sort instead of Sorter. BlockJoinSorter is removed, replaced with
  BlockJoinComparatorSource, which can take a Sort for ordering of parents
  and a separate Sort for ordering of children within a block.
  (Robert Muir, Mike McCandless, Adrien Grand)

* LUCENE-5516: MergeScheduler#merge() now accepts a MergeTrigger as well as
  a boolean that indicates if a new merge was found in the caller thread before
  the scheduler was called. (Simon Willnauer)

* LUCENE-5487: Separated bulk scorer (new Weight.bulkScorer method) from
  normal scoring (Weight.scorer) for those queries that can do bulk
  scoring more efficiently, e.g. BooleanQuery in some cases. This
  also simplified the Weight.scorer API by removing the two confusing
  booleans. (Robert Muir, Uwe Schindler, Mike McCandless)

* LUCENE-5519: TopNSearcher now allows retrieving incomplete results if the max
  size of the candidate queue is unknown.
  The queue can still be bounded in order to apply pruning while retrieving the
  top N, but will not throw an exception if too many results are rejected to
  guarantee an absolutely correct top N result. The TopNSearcher now returns a
  struct-like class that indicates whether the result is complete in the sense
  of the top N. Consumers of this API should assert on the completeness if the
  bounded queue size is known ahead of time. (Simon Willnauer)

* LUCENE-4984: Deprecate ThaiWordFilter and smartcn SentenceTokenizer and WordTokenFilter.
  These filters would not work correctly with CharFilters and could not be safely placed
  at an arbitrary position in the analysis chain. Use ThaiTokenizer and HMMChineseTokenizer
  instead. (Robert Muir)

* LUCENE-5543: Remove/deprecate Directory.fileExists (Mike McCandless)

* LUCENE-5573: Move docvalues constants and helper methods to o.a.l.index.DocValues.
  (Dawid Weiss, Robert Muir)

* LUCENE-5604: Switched BytesRef.hashCode to MurmurHash3 (32 bit).
  TermToBytesRefAttribute.fillBytesRef no longer returns the hash
  code. BytesRefHash now uses MurmurHash3 for its hashing. (Robert
  Muir, Mike McCandless)

Optimizations

* LUCENE-5468: HunspellStemFilter uses 10 to 100x less RAM. It also loads
  all known openoffice dictionaries without error, and supports an additional
  longestOnly option for a less aggressive approach. (Robert Muir)

* LUCENE-4848: Use Java 7 NIO2-FileChannel instead of RandomAccessFile
  for NIOFSDirectory and MMapDirectory. This allows deleting open files
  on Windows if NIOFSDirectory is used; mmapped files are still locked.
  (Michael Poindexter, Robert Muir, Uwe Schindler)

* LUCENE-5515: Improved TopDocs#merge to create a merged ScoreDoc
  array with a length of at most the specified size, instead of a length
  of at most from + size as before.
  (Martijn van Groningen)

* LUCENE-5529: Spatial search of non-point indexed shapes should be a little
  faster due to skipping intersection tests on redundant cells. (David Smiley)

Bug fixes

* LUCENE-5483: Fix inaccuracies in HunspellStemFilter. Multi-stage affix-stripping,
  prefix-suffix dependencies, and COMPLEXPREFIXES now work correctly according
  to the hunspell algorithm. Removed recursionCap parameter, as it's no longer needed:
  rules for recursive affix application are driven correctly by continuation classes
  in the affix file. (Robert Muir)

* LUCENE-5497: HunspellStemFilter properly handles escaped terms and affixes without conditions.
  (Robert Muir)

* LUCENE-5505: HunspellStemFilter ignores BOM markers in dictionaries and handles varying
  types of whitespace in SET/FLAG commands. (Robert Muir)

* LUCENE-5507: Fix HunspellStemFilter loading of dictionaries with large amounts of aliases
  etc. before the encoding declaration. (Robert Muir)

* LUCENE-5111: Fix WordDelimiterFilter to return offsets in correct order. (Robert Muir)

* LUCENE-5555: Fix SortedInputIterator to correctly encode/decode contexts in the
  presence of payloads (Areek Zillur)

* LUCENE-5559: Add missing argument checks to tokenfilters taking
  numeric arguments. (Ahmet Arslan via Robert Muir)

* LUCENE-5568: Benchmark module's "default.codec" option didn't work. (David Smiley)

* SOLR-5983: HTMLStripCharFilter was treating CDATA sections incorrectly.
  (Dan Funk, Steve Rowe)

* LUCENE-5615: Validate per-segment delete counts at write time, to
  help catch bugs that might otherwise cause corruption (Mike McCandless)

* LUCENE-5612: NativeFSLockFactory no longer deletes its lock file. This cannot be done
  safely without the risk of deleting someone else's lock file.
  If you use NativeFSLockFactory, you may see write.lock hanging around from
  time to time: it's harmless.
  (Uwe Schindler, Mike McCandless, Robert Muir)

* LUCENE-5624: Ensure NativeFSLockFactory does not leak file handles if it is unable
  to obtain the lock. (Uwe Schindler, Robert Muir)

* LUCENE-5626: Fix bug in SimpleFSLockFactory's obtain() that sometimes threw
  IOException (ERROR_ACCESS_DENIED) on Windows if the lock file was created
  concurrently. This error is now handled the same way as in NativeFSLockFactory,
  by returning false. (Uwe Schindler, Robert Muir, Dawid Weiss)

* LUCENE-5630: Add missing META-INF entry for UpperCaseFilterFactory.
  (Robert Muir)

Tests

* LUCENE-5630: Fix TestAllAnalyzersHaveFactories to correctly check for the existence
  of the class and its corresponding Map<String,String> ctor. (Uwe Schindler, Robert Muir)

Test Framework

* LUCENE-5592: Incorrectly reported uncloseable files. (Dawid Weiss)

* LUCENE-5577: Temporary folder and file management (and cleanup facilities)
  (Mark Miller, Uwe Schindler, Dawid Weiss)

* LUCENE-5567: When a suite failed with zombie threads, the failure marker and count
  were not propagated properly. (Dawid Weiss)

* LUCENE-5449: Rename _TestUtil and _TestHelper to remove the leading _.

* LUCENE-5501: Added random out-of-order collection testing (when the collector
  supports it) to AssertingIndexSearcher. (Adrien Grand)

Build

* LUCENE-5463: RamUsageEstimator.(human)sizeOf(Object) is now a forbidden API.
  (Adrien Grand, Robert Muir)

* LUCENE-5512: Remove redundant typing (use diamond operator) throughout
  the codebase. (Furkan KAMACI via Robert Muir)

* LUCENE-5614: Enable building on Java 8 using Apache Ant 1.8.3 or 1.8.4
  by adding a workaround for the Ant bug.
  (Uwe Schindler)

* LUCENE-5612: Add a new Ant target in lucene/core to test LockFactory
  implementations: "ant test-lock-factory". (Uwe Schindler, Mike McCandless,
  Robert Muir)

Documentation

* LUCENE-5534: Add javadocs to GreekStemmer methods.
  (Stamatis Pitsios via Robert Muir)

======================= Lucene 4.7.2 =======================

Bug Fixes

* LUCENE-5574: Closing a near-real-time reader no longer attempts to
  delete unreferenced files if the original writer has been closed;
  this could cause index corruption in certain cases where index files
  were directly changed (deleted, overwritten, etc.) in the index
  directory outside of Lucene. (Simon Willnauer, Shai Erera, Robert
  Muir, Mike McCandless)

* LUCENE-5570: Don't let FSDirectory.sync() create new zero-byte files; instead, throw
  an exception if a file is missing. (Uwe Schindler, Mike McCandless, Robert Muir)

======================= Lucene 4.7.1 =======================

Changes in Runtime Behavior

* LUCENE-5532: AutomatonQuery.equals is no longer implemented as "accepts same language".
  This was inconsistent with hashCode, and unnecessary for any subclasses in Lucene.
  If you desire this in a custom subclass, minimize the automaton. (Robert Muir)

Bug Fixes

* LUCENE-5450: Fix getField() NPE issues with SpanOr/SpanNear when they have an
  empty list of clauses. This can happen, for example, when a wildcard matches
  no terms. (Tim Allison via Robert Muir)

* LUCENE-5473: Throw IllegalArgumentException, not
  NullPointerException, if the synonym map is empty when creating
  SynonymFilter (帅广应 via Mike McCandless)

* LUCENE-5432: EliasFanoDocIdSet: Fix number of index entry bits when the maximum
  entry is a power of 2.
  (Paul Elschot via Adrien Grand)

* LUCENE-5466: The query was always null in countDocsWithClass() of SimpleNaiveBayesClassifier.
  (Koji Sekiguchi)

* LUCENE-5502: Fixed TermsFilter.equals, which could return true for different
  filters. (Igor Motov via Adrien Grand)

* LUCENE-5522: FacetsConfig didn't add drill-down terms for association facet
  field labels. (Shai Erera)

* LUCENE-5520: ToChildBlockJoinQuery would hit
  ArrayIndexOutOfBoundsException if a parent document had no children
  (Sally Ang via Mike McCandless)

* LUCENE-5532: AutomatonQuery.hashCode was not thread-safe. (Robert Muir)

* LUCENE-5525: Implement MultiFacets.getAllDims, so you can do sparse
  facets through DrillSideways, for example. (Jose Peleteiro, Mike
  McCandless)

* LUCENE-5481: IndexWriter.forceMerge used to run a merge even if there was a
  single segment in the index. (Adrien Grand, Mike McCandless)

* LUCENE-5538: Fix FastVectorHighlighter bug with index-time synonyms when the
  query is more complex than a single phrase. (Robert Muir)

* LUCENE-5544: Exceptions during IndexWriter.rollback could leak file handles
  and the write lock. (Robert Muir)

* LUCENE-4978: Spatial RecursivePrefixTree queries could result in false negatives for
  indexed shapes within 1/2 maxDistErr from the edge of the query shape. This meant
  searching for a point by the same point as a query rarely worked. (David Smiley)

* LUCENE-5553: IndexReader#ReaderClosedListener is not always invoked when
  IndexReader#close() is called or if refCount is 0. If an exception is
  thrown during internal close or in any of the close listeners, some or all
  listeners might be missed. This can cause memory leaks if the core listeners
  are used to clear caches.
  (Simon Willnauer)

Build

* LUCENE-5511: "ant precommit" / "ant check-svn-working-copy" now work again
  with any working copy format (thanks to svnkit 1.8.4). (Uwe Schindler)

======================= Lucene 4.7.0 =======================

New Features

* LUCENE-5336: Add SimpleQueryParser: parser for human-entered queries.
  (Jack Conradson via Robert Muir)

* LUCENE-5337: Add Payload support to FileDictionary (Suggest) and make it more
  configurable (Areek Zillur via Erick Erickson)

* LUCENE-5329: suggest: DocumentDictionary and
  DocumentExpressionDictionary are now lenient for dirty documents
  (missing the term, weight or payload). (Areek Zillur via
  Mike McCandless)

* LUCENE-5404: Add .getCount method to all suggesters (Lookup); persist count
  metadata on .store(); Dictionary returns InputIterator; Dictionary.getWordIterator
  renamed to .getEntryIterator. (Areek Zillur)

* SOLR-1871: The RangeMapFloatFunction accepts an arbitrary ValueSource
  as target and default values. (Chris Harris, shalin)

* LUCENE-5371: Speed up Lucene range faceting from O(N) per hit to
  O(log(N)) per hit using segment trees; this only really starts to
  matter in practice if the number of ranges is over 10 or so. (Mike
  McCandless)

* LUCENE-5379: Add Analyzer for Kurdish. (Robert Muir)

* LUCENE-5369: Added an UpperCaseFilter to make UPPERCASE tokens. (ryan)

* LUCENE-5345: Add a new BlendedInfixSuggester, which is like
  AnalyzingInfixSuggester but boosts suggestions that matched tokens
  with lower positions. (Remi Melisson via Mike McCandless)

* LUCENE-5399: When sorting by String (SortField.STRING), you can now
  specify whether missing values should be sorted first (the default),
  using SortField.setMissingValue(SortField.STRING_FIRST), or last,
  using SortField.setMissingValue(SortField.STRING_LAST). (Rob Muir,
(Rob Muir, 8663 Mike McCandless) 8664 8665* LUCENE-5099: QueryNode should have the ability to detach from its node 8666 parent. Added QueryNode.removeFromParent() that allows nodes to be 8667 detached from its parent node. (Adriano Crestani) 8668 8669* LUCENE-5395 LUCENE-5451: Upgrade to Spatial4j 0.4.1: Parses WKT (including 8670 ENVELOPE) with extension "BUFFER"; buffering a point results in a Circle. 8671 JTS isn't needed for WKT any more but remains required for Polygons. New 8672 Shapes: ShapeCollection and BufferedLineString. Various other improvements and 8673 bug fixes too. More info: 8674 https://github.com/spatial4j/spatial4j/blob/master/CHANGES.md (David Smiley) 8675 8676* LUCENE-5415: Add multitermquery (wildcards,prefix,etc) to PostingsHighlighter. 8677 (Mike McCandless, Robert Muir) 8678 8679* LUCENE-3069: Add two memory resident dictionaries (FST terms dictionary and 8680 FSTOrd terms dictionary) to improve primary key lookups. The PostingsBaseFormat 8681 API is also changed so that term dictionaries get the ability to block 8682 encode term metadata, and all dictionary implementations can now plug in any 8683 PostingsBaseFormat. (Han Jiang, Mike McCandless) 8684 8685* LUCENE-5353: ShingleFilter's filler token should be configurable. 8686 (Ahmet Arslan, Simon Willnauer, Steve Rowe) 8687 8688* LUCENE-5320: Add SearcherTaxonomyManager over search and taxonomy index 8689 directories (i.e. not only NRT). (Shai Erera) 8690 8691* LUCENE-5410: Add fuzzy and near support via '~' operator to SimpleQueryParser. 8692 (Lee Hinman via Robert Muir) 8693 8694* LUCENE-5426: Make SortedSetDocValuesReaderState abstract to allow 8695 custom implementations for Lucene doc values faceting (John Wang via 8696 Mike McCandless) 8697 8698* LUCENE-5434: NRT support for file systems that do no have delete on last 8699 close or cannot delete while referenced semantics. 
  (Mark Miller, Mike McCandless)

* LUCENE-5418: Drilling down or sideways on a Lucene facet range
  (using Range.getFilter()) is now faster for costly filters (uses
  random access, not iteration); range facet counts now accept a
  fast-match filter to avoid computing the value for documents that
  are out of bounds, e.g. using a bounding box filter with distance
  range faceting. (Mike McCandless)

* LUCENE-5440: Add LongBitSet for managing more than 2.1B bits (otherwise use
  FixedBitSet). (Shai Erera)

* LUCENE-5437: ASCIIFoldingFilter now has an option to preserve the original token
  and emit it at the same position as the folded token, only if the actual token was
  folded. (Simon Willnauer, Nik Everett)

* LUCENE-5408: Add spatial SerializedDVStrategy, which serializes a binary
  representation of a shape into BinaryDocValues. It supports exact geometry
  relationship calculations. (David Smiley)

* LUCENE-5457: Add SloppyMath.earthDiameter(double latitude), which returns an
  approximate value of the diameter of the earth at the given latitude.
  (Adrien Grand)

* LUCENE-5979: FilteredQuery uses the cost API to decide whether to use
  random access or leap-frog to intersect the filter with the query.
  (Adrien Grand)

Build

* LUCENE-5217, LUCENE-5420: Maven config: get dependencies from Ant+Ivy config;
  disable transitive dependency resolution for all depended-on artifacts by
  putting an exclusion for each transitive dependency in the
  <dependencyManagement> section of the grandparent POM. (Steve Rowe)

* LUCENE-5322: Clean up / simplify Maven-related Ant targets.
  (Steve Rowe)

* LUCENE-5347: Upgrade forbidden-apis checker to version 1.4.
  (Uwe Schindler)

* LUCENE-4381: Upgrade analysis/icu to 52.1.
  (Robert Muir)

* LUCENE-5357: Upgrade StandardTokenizer and UAX29URLEmailTokenizer to
  Unicode 6.3; update UAX29URLEmailTokenizer's recognized top-level
  domains in URLs and emails from the IANA Root Zone Database.
  (Steve Rowe)

* LUCENE-5360: Add support for developing in the NetBeans IDE.
  (Michal Hlavac, Uwe Schindler, Steve Rowe)

* SOLR-5590: Upgrade HttpClient/HttpComponents to 4.3.x.
  (Karl Wright via Shawn Heisey)

* LUCENE-5385: "ant precommit" / "ant check-svn-working-copy" now work
  for SVN 1.8 or Git checkouts. The Ant target prints a warning instead
  of failing. It also instructs the user how to run on SVN 1.8 working
  copies. (Robert Muir, Uwe Schindler)

* LUCENE-5383: Fix changes2html to link pull requests (Steve Rowe)

* LUCENE-5411: Upgrade to released JFlex 1.5.0; stop requiring
  a locally built JFlex snapshot jar. (Steve Rowe)

* LUCENE-5465: Solr Contrib "map-reduce" broke the Manifest of all other
  JAR files by adding a broken Main-Class attribute.
  (Uwe Schindler, Steve Rowe)

Bug fixes

* LUCENE-5285: Improved highlighting of multi-valued fields with
  FastVectorHighlighter. (Nik Everett via Adrien Grand)

* LUCENE-5391: UAX29URLEmailTokenizer should not tokenize no-scheme
  domain-only URLs that are followed by an alphanumeric character.
  (Chris Geeringh, Steve Rowe)

* LUCENE-5405: If an analysis component throws an exception, Lucene
  logs the field name to the info stream to assist in
  diagnosis. (Benson Margulies)

* SOLR-5661: PriorityQueue now refuses to allocate itself if the
  incoming maxSize is too large (Raintung Li via Mike McCandless)

* LUCENE-5228: IndexWriter.addIndexes(Directory[]) now acquires a
  write lock in each Directory, to ensure that no open IndexWriter is
  changing the incoming indices.
  This also means that you cannot pass
  the same Directory to multiple concurrent addIndexes calls (which is
  unusual anyway). (Robert Muir, Mike McCandless)

* LUCENE-5415: SpanMultiTermQueryWrapper didn't handle its boost in
  hashCode/equals/toString/rewrite. (Robert Muir)

* LUCENE-5409: ToParentBlockJoinCollector.getTopGroups would fail to
  return any groups when the joined query required more than one
  rewrite step (Peng Cheng via Mike McCandless)

* LUCENE-5398: NormValueSource was incorrectly casting the long value
  to byte, before calling Similarity.decodeNormValue. (Peng Cheng via
  Mike McCandless)

* LUCENE-5436: ReferenceManager#acquire could result in an infinite loop if
  the managed resource is abused outside of the ReferenceManager. Decrementing
  the reference without a corresponding incRef() call can cause an infinite
  loop. ReferenceManager now throws IllegalStateException if the currently
  managed resource's ref count is 0. (Simon Willnauer)

* LUCENE-5443: Lucene45DocValuesProducer.ramBytesUsed() may throw
  ConcurrentModificationException. (Shai Erera, Simon Willnauer)

* LUCENE-5444: MemoryIndex didn't respect the analyzer's offset gap, and
  offsets were corrupted if multiple fields with the same name were
  added to the memory index. (Britta Weber, Simon Willnauer)

* LUCENE-5447: StandardTokenizer should break at consecutive chars matching
  Word_Break = MidLetter, MidNum and/or MidNumLet (Steve Rowe)

* LUCENE-5462: RamUsageEstimator.sizeOf(Object) is not used anymore to
  estimate memory usage of segments. This used to make
  SegmentReader.ramBytesUsed very CPU-intensive. (Adrien Grand)

* LUCENE-5461: ControlledRealTimeReopenThread would sometimes wait too
  long (up to targetMaxStaleSec) when a searcher is waiting for a
  specific generation, when it should have waited for at most
  targetMinStaleSec.
  (Hans Lund via Mike McCandless)

API Changes

* LUCENE-5339: The facet module was simplified/reworked to make the
  APIs more approachable for new users. Note: when migrating to the new
  API, you must pass the Document that is returned from FacetsConfig.build()
  to IndexWriter.addDocument(). (Shai Erera, Gilad Barkai, Rob
  Muir, Mike McCandless)

* LUCENE-5405: Make ShingleAnalyzerWrapper.getWrappedAnalyzer() public final (gsingers)

* LUCENE-5395: The SpatialArgsParser now only reads WKT (no more "lat, lon"
  etc.), but it's easy to override the parseShape method if you wish. (David
  Smiley)

* LUCENE-5414: DocumentExpressionDictionary was renamed to
  DocumentValueSourceDictionary and all dependencies on the lucene-expression
  module were removed from lucene-suggest. DocumentValueSourceDictionary now
  only accepts a ValueSource instead of a convenience ctor for an expression
  string. (Simon Willnauer)

* LUCENE-3069: PostingsWriterBase and PostingsReaderBase are no longer
  responsible for encoding/decoding a block of terms. Instead, they
  should encode/decode each term to/from a long[] and byte[]. (Han
  Jiang, Mike McCandless)

* LUCENE-5425: FacetsCollector and MatchingDocs use a general DocIdSet,
  allowing for custom implementations to be used when faceting.
  (John Wang, Lei Wang, Shai Erera)

Optimizations

* LUCENE-5372: Replace StringBuffer with StringBuilder, where possible.
  (Joshua Hartman via Uwe Schindler, Dawid Weiss, Mike McCandless)

* LUCENE-5271: A slightly more accurate SloppyMath distance.
  (Gilad Barkai via Ryan Ernst)

* LUCENE-5399: Deep paging using IndexSearcher.searchAfter when
  sorting by fields is faster (Rob Muir, Mike McCandless)

Changes in Runtime Behavior

* LUCENE-5362: IndexReader and SegmentCoreReaders now throw
  AlreadyClosedException if the refCount is incremented but
  is less than 1. (Simon Willnauer)

Documentation

* LUCENE-5384: Add some tips for making tokenfilters and tokenizers
  to the analysis package overview.
  (Benson Margulies via Robert Muir - pull request #12)

* LUCENE-5389: Add more guidance in the analysis documentation
  package overview.
  (Benson Margulies via Robert Muir - pull request #14)

======================= Lucene 4.6.1 =======================

Bug fixes

* LUCENE-5373: Memory usage of
  [Lucene40/Lucene42/Memory/Direct]DocValuesFormat was overestimated.
  (Shay Banon, Adrien Grand, Robert Muir)

* LUCENE-5361: Fixed handling of query boosts in FastVectorHighlighter.
  (Nik Everett via Adrien Grand)

* LUCENE-5374: IndexWriter processes internal events after it has
  closed itself internally. This rare condition can happen if an
  IndexWriter has internal changes that were not fully applied yet,
  such as when index/flush requests happen concurrently with the close
  or rollback call. (Simon Willnauer)

* LUCENE-5394: Fix TokenSources.getTokenStream to return payloads if
  they were indexed with the term vectors. (Mike McCandless)

* LUCENE-5344: Flexible StandardQueryParser behaves differently than
  ClassicQueryParser. (Adriano Crestani)

* LUCENE-5375: ToChildBlockJoinQuery works harder to detect misuse,
  when the parent query incorrectly returns child documents, and throws
  a clear exception saying so. (Dr.
  Oleg Savrasov via Mike McCandless)

* LUCENE-5401: Field.StringTokenStream#end() calls super.end() now,
  preventing wrong term positions for fields that use
  StringTokenStream. (Michael Busch)

* LUCENE-5377: IndexWriter.addIndexes(Directory[]) would cause corruption
  on Lucene 4.6 if any index segments were Lucene 4.0-4.5.
  (Littlestar, Mike McCandless, Shai Erera, Robert Muir)

======================= Lucene 4.6.0 =======================

New Features

* LUCENE-4906: PostingsHighlighter can now render to a custom Object,
  for advanced use cases where String is too restrictive (Luca
  Cavanna, Robert Muir, Mike McCandless)

* LUCENE-5133: Changed AnalyzingInfixSuggester.highlight to return
  Object instead of String, to allow for advanced use cases where
  String is too restrictive (Robert Muir, Shai Erera, Mike
  McCandless)

* LUCENE-5207, LUCENE-5334: Added an expressions module for customizing ranking
  with script-like syntax.
  (Jack Conradson, Ryan Ernst, Uwe Schindler via Robert Muir)

* LUCENE-5180: ShingleFilter now creates shingles with trailing holes,
  for example if a StopFilter had removed the last token. (Mike
  McCandless)

* LUCENE-5219: Add support for custom parsers to SynonymFilterFactory.
  (Ryan Ernst via Robert Muir)

* LUCENE-5235: Tokenizers now throw an IllegalStateException if the
  consumer does not call reset() before consuming the stream. Previous
  versions threw NullPointerException or ArrayIndexOutOfBoundsException
  on a best-effort basis, which was not user-friendly.
  (Uwe Schindler, Robert Muir)

* LUCENE-5240: Tokenizers now throw an IllegalStateException if the
  consumer neglects to call close() on the previous stream before consuming
  the next one.
  (Uwe Schindler, Robert Muir)

* LUCENE-5214: Add a new FreeTextSuggester to predict the next word
  using a simple ngram language model. This is useful for "long
  tail" suggestions, when a primary suggester fails to find a
  suggestion. (Mike McCandless)

* LUCENE-5251: New DocumentDictionary allows building suggesters from the
  contents of existing stored fields in an index: a field, a weight and,
  optionally, a payload (Areek Zillur via Mike McCandless)

* LUCENE-5261: Add QueryBuilder, a simple API to build queries from
  the analysis chain directly, or to make it easier to implement
  query parsers. (Robert Muir, Uwe Schindler)

* LUCENE-5270: Add Terms.hasFreqs, to determine whether a given field
  indexed per-doc term frequencies. (Mike McCandless)

* LUCENE-5269: Add CodepointCountFilter. (Robert Muir)

* LUCENE-5294: Suggest module: add DocumentExpressionDictionary to
  compute each suggestion's weight using a JavaScript expression.
  (Areek Zillur via Mike McCandless)

* LUCENE-5274: FastVectorHighlighter now supports highlighting against several
  indexed fields. (Nik Everett via Adrien Grand)

* LUCENE-5304: SingletonSortedSetDocValues can now return the wrapped
  SortedDocValues (Robert Muir, Adrien Grand)

* LUCENE-2844: The benchmark module can now test the spatial module. See
  spatial.alg (David Smiley, Liviy Ambrose)

* LUCENE-5302: Make StemmerOverrideMap's methods public (Alan Woodward)

* LUCENE-5296: Add DirectDocValuesFormat, which holds all doc values
  in heap as uncompressed Java native arrays. (Mike McCandless)

* LUCENE-5189: Add IndexWriter.updateNumericDocValues, to update
  numeric DocValues fields of documents without re-indexing them.
  (Shai Erera, Mike McCandless, Robert Muir)

* LUCENE-5298: Add SumValueSourceFacetRequest for aggregating facets by
  a ValueSource, such as a NumericDocValuesField or an expression.
  (Shai Erera)

* LUCENE-5323: Add .sizeInBytes method to all suggesters (Lookup).
  (Areek Zillur via Mike McCandless)

* LUCENE-5312: Add BlockJoinSorter, a new Sorter implementation that makes sure
  to never split up blocks of documents indexed with IndexWriter.addDocuments.
  (Adrien Grand)

* LUCENE-5297: Allow range-faceting on any ValueSource, not just
  NumericDocValues fields. (Shai Erera)

Bug Fixes

* LUCENE-5272: OpenBitSet.ensureCapacity did not modify numBits, causing
  false assertion errors in fastSet. (Shai Erera)

* LUCENE-5303: OrdinalsCache did not use coreCacheKey, resulting in
  over-caching across multiple threads. (Mike McCandless, Shai Erera)

* LUCENE-5307: Fix topScorer inconsistency in handling QueryWrapperFilter
  inside ConstantScoreQuery, which now rewrites to a query removing the
  obsolete QueryWrapperFilter. (Adrien Grand, Uwe Schindler)

* LUCENE-5330: IndexWriter didn't process all internal events on
  #getReader(), #close() and #rollback(), which caused files to be
  deleted at a later point in time. This could cause short-term disk
  pollution or OOM if in-memory directories are used. (Simon Willnauer)

* LUCENE-5342: Fixed a bulk-merge issue in CompressingStoredFieldsFormat, which
  created corrupted segments when mixing chunk sizes.
  Lucene41StoredFieldsFormat is not impacted. (Adrien Grand, Robert Muir)

API Changes

* LUCENE-5222: Add SortField.needsScores(). Previously it was not possible
  for a custom Sort that makes use of the relevance score to work correctly
  with IndexSearcher when an ExecutorService is specified.
  (Ryan Ernst, Mike McCandless, Robert Muir)

* LUCENE-5275: Change AttributeSource.toString() to display the current
  state of attributes. (Robert Muir)

* LUCENE-5277: Modify the FixedBitSet copy constructor to take an additional
  numBits parameter to allow growing/shrinking the copied bitset. You can
  use FixedBitSet.clone() if you only need to clone the bitset. (Shai Erera)

* LUCENE-5260: Use TermFreqPayloadIterator for all suggesters; those
  suggesters that can't support payloads will throw an exception if
  hasPayloads() is true. (Areek Zillur via Mike McCandless)

* LUCENE-5280: Rename TermFreqPayloadIterator -> InputIterator, along
  with associated suggest/spell classes. (Areek Zillur via Mike
  McCandless)

* LUCENE-5157: Rename OrdinalMap methods to clarify API and internal structure.
  (Boaz Leskes via Adrien Grand)

* LUCENE-5313: Move preservePositionIncrements from a setter to the ctor in
  Analyzing/FuzzySuggester. (Areek Zillur via Mike McCandless)

* LUCENE-5321: Remove Facet42DocValuesFormat. Use DirectDocValuesFormat if you
  want to load the category list into memory. (Shai Erera, Mike McCandless)

* LUCENE-5324: AnalyzerWrapper.getPositionIncrementGap and getOffsetGap can now
  be overridden. (Adrien Grand)

Optimizations

* LUCENE-5225: The ToParentBlockJoinQuery only keeps track of the child
  doc ids and child scores if the ToParentBlockJoinCollector is used.
  (Martijn van Groningen)

* LUCENE-5236: EliasFanoDocIdSet now has an index and uses broadword bit
  selection to speed up advance(). (Paul Elschot via Adrien Grand)

* LUCENE-5266: Reduced the number of read calls and branches in DirectPackedReader. (Ryan Ernst)

* LUCENE-5300: Optimized SORTED_SET storage for fields which are single-valued.
  (Adrien Grand)

Documentation

* LUCENE-5211: Better javadocs and error checking of the 'format' option in
  StopFilterFactory, as well as comments in all snowball-formatted files
  about specifying the format option. (hossman)

Changes in backwards compatibility policy

* LUCENE-5235: Subclasses of Tokenizer have to call super.reset()
  when implementing reset(). Otherwise the consumer will get an
  IllegalStateException because the Reader is not correctly assigned.
  It is important to never change the "input" field on Tokenizer
  without using setReader(). The "input" field must not be used
  outside reset(), incrementToken(), or end(), and especially not in
  the constructor. (Uwe Schindler, Robert Muir)

* LUCENE-5204: Directory doesn't have default implementations for
  LockFactory-related methods, which have been moved to BaseDirectory. If you
  had a custom Directory implementation that extended Directory, you need to
  extend BaseDirectory instead. (Adrien Grand)

Build

* LUCENE-5283: Fail the build if ant test didn't execute any tests
  (everything filtered out). (Dawid Weiss, Uwe Schindler)

* LUCENE-5249, LUCENE-5257: All Lucene/Solr modules should use the same
  dependency versions. (Steve Rowe)

* LUCENE-5273: Binary artifacts in Lucene and Solr convenience binary
  distributions accompanying a release, including on Maven Central,
  should be identical across all distributions. (Steve Rowe, Uwe Schindler,
  Shalin Shekhar Mangar)

* LUCENE-4753: Run the forbidden-apis Ant task per module. This allows more
  improvements and prevents OOMs now that the number of class files
  has grown. (Uwe Schindler)

Tests

* LUCENE-5278: Fix MockTokenizer to work better with more regular expression
  patterns.
  Previously it could only behave like CharTokenizer (where a character
  is either a "word" character or not), but now it gives a general longest-match
  behavior. (Nik Everett via Robert Muir)

======================= Lucene 4.5.1 =======================

Bug Fixes

* LUCENE-4998: Fixed a few places to pass IOContext.READONCE instead
  of IOContext.READ (Shikhar Bhushan via Mike McCandless)

* LUCENE-5242: DirectoryTaxonomyWriter.replaceTaxonomy did not fully reset
  its state, which could result in exceptions being thrown, as well as
  incorrect ordinals returned from getParent. (Shai Erera)

* LUCENE-5254: Fixed a bounded memory leak, where objects like the live
  docs bitset were not freed from the starting reader after reopening
  to a new reader and closing the original one. (Shai Erera, Mike
  McCandless)

* LUCENE-5262: Fixed file handle leaks when multiple attempts to open an
  NRT reader hit exceptions. (Shai Erera)

* LUCENE-5263: Transient IOExceptions, e.g. due to a full disk or file
  descriptor exhaustion, hit at unlucky times inside IndexWriter could
  lead to silently losing deletions. (Shai Erera, Mike McCandless)

* LUCENE-5264: CommonTermsQuery ignored minMustMatch if only high-frequency
  terms were present in the query and the high-frequency operator was set
  to SHOULD. (Simon Willnauer)

* LUCENE-5269: Fix bug in NGramTokenFilter where it would sometimes count
  Unicode characters incorrectly. (Mike McCandless, Robert Muir)

* LUCENE-5289: IndexWriter.hasUncommittedChanges was returning false
  when there were buffered delete-by-Term operations. (Shalin Shekhar Mangar,
  Mike McCandless)

======================= Lucene 4.5.0 =======================

New features

* LUCENE-5084: Added new Elias-Fano encoder, decoder and DocIdSet
  implementations.
  (Paul Elschot via Adrien Grand)

* LUCENE-5081: Added WAH8DocIdSet, an in-memory doc id set implementation based
  on word-aligned hybrid encoding. (Adrien Grand)

* LUCENE-5098: New broadword utility methods in oal.util.BroadWord.
  (Paul Elschot via Adrien Grand, Dawid Weiss)

* LUCENE-5030: FuzzySuggester now supports an optional unicodeAware setting
  (default is false). If true, edits are measured in Unicode code
  points instead of UTF-8 bytes. (Artem Lukanin via Mike McCandless)

* LUCENE-5118: SpatialStrategy.makeDistanceValueSource() now has an optional
  multiplier for scaling degrees to another unit. (David Smiley)

* LUCENE-5091: SpanNotQuery can now be configured with pre and post slop to act
  as a hypothetical SpanNotNearQuery. (Tim Allison via David Smiley)

* LUCENE-4985: FacetsAccumulator.create() is now able to create a
  MultiFacetsAccumulator over a mixed set of facet requests. MultiFacetsAccumulator
  allows wrapping multiple FacetsAccumulators, making it easy to mix
  existing and custom ones. TaxonomyFacetsAccumulator supports any
  FacetRequest which implements createFacetsAggregator and was indexed
  using the taxonomy index. (Shai Erera)

* LUCENE-5153: AnalyzerWrapper.wrapReader allows wrapping the Reader given to
  inputReader. (Shai Erera)

* LUCENE-5155: FacetRequest.getValueOf and .getFacetArraysSource were replaced by
  FacetsAggregator.createOrdinalValueResolver. This gives FacetsAggregators better
  options for resolving an ordinal's value. (Shai Erera)

* LUCENE-5165: Add SuggestStopFilter, to be used with analyzing
  suggesters, so that a stop word at the very end of the lookup query,
  and without any trailing token characters, will be preserved. This
  enables the query "a" to suggest "apple"; see
  http://blog.mikemccandless.com/2013/08/suggeststopfilter-carefully-removes.html
  for details.

* LUCENE-5178: Added support for missing values to DocValues fields.
  AtomicReader.getDocsWithField returns a Bits of documents with a value,
  and FieldCache.getDocsWithField forwards to that for DocValues fields. Things like
  SortField.setMissingValue, FunctionValues.exists, and FieldValueFilter now
  work with DocValues fields. (Robert Muir)

* LUCENE-5124: Lucene 4.5 has a new Lucene45Codec with Lucene45DocValues,
  supporting missing values and with most datastructures residing off-heap.
  Added a "Memory" docvalues format that works entirely in heap, and a "Disk"
  format that loads no datastructures into RAM. Both of these also support
  missing values. Added DiskNormsFormat (in case you want norms entirely on
  disk). (Robert Muir)

* LUCENE-2750: Added PForDeltaDocIdSet, an in-memory doc id set implementation
  based on the PFOR encoding. (Adrien Grand)

* LUCENE-5186: Added CachingWrapperFilter.getFilter in order to be able to get
  the wrapped filter. (Trejkaz via Adrien Grand)

* LUCENE-5197: Added SegmentReader.ramBytesUsed to return approximate heap RAM
  used by index datastructures. (Areek Zillur via Robert Muir)

Bug Fixes

* LUCENE-5116: IndexWriter.addIndexes(IndexReader...) should drop empty (or all
  deleted) segments. (Robert Muir, Shai Erera)

* LUCENE-5132: Spatial RecursivePrefixTree Contains predicate will throw an NPE
  when there's no indexed data and maybe in other circumstances too. (David Smiley)

* LUCENE-5146: AnalyzingSuggester's sort comparator read part of the input key as
  the weight. As a result, the sorter never sorted by weight first, since the
  weight is only considered when the inputs are equal, in which case the
  malformed weights are identical as well. (Simon Willnauer)

* LUCENE-5151: Associations FacetsAggregators could enter an infinite loop when
  some result documents were missing category associations.
  (Shai Erera)

* LUCENE-5152: Fix MemoryPostingsFormat to not modify borrowed BytesRef from FSTEnum
  seek/lookup which can cause side effects if done on a cached FST root arc.
  (Simon Willnauer)

* LUCENE-5160: Handle the case where reading from a file or FileChannel returns -1,
  which could happen in rare cases where something happens to the file between the
  time we start the read loop (where we check the length) and when we actually do
  the read. (gsingers, yonik, Robert Muir, Uwe Schindler)

* LUCENE-5166: PostingsHighlighter would throw IOOBE if a term spanned the maxLength
  boundary, made it into the top-N and went to the formatter.
  (Manuel Amoabeng, Michael McCandless, Robert Muir)

* LUCENE-4583: Indexing core no longer enforces a limit on maximum
  length binary doc values fields, but individual codecs (including
  the default one) have their own limits (David Smiley, Robert Muir,
  Mike McCandless)

* LUCENE-3849: TokenStreams now set the position increment in end(),
  so we can handle trailing holes. If you have a custom TokenStream
  implementing end() then be sure it calls super.end(). (Robert Muir,
  Mike McCandless)

* LUCENE-5192: IndexWriter could allow adding the same field name with different
  DocValueTypes under some circumstances. (Shai Erera)

* LUCENE-5191: SimpleHTMLEncoder in Highlighter module broke Unicode
  outside BMP because it encoded UTF-16 chars instead of codepoints.
  The escaping of codepoints > 127 was removed (not needed for valid HTML)
  and missing escaping for ' and / was added. (Uwe Schindler)

* LUCENE-5201: Fixed compression bug in LZ4.compressHC when the input is highly
  compressible and the start offset of the array to compress is > 0.
  (Adrien Grand)

* LUCENE-5221: SimilarityBase did not write norms the same way as DefaultSimilarity
  if discountOverlaps == false and index-time boosts are present for the field.
  (Yubin Kim via Robert Muir)

* LUCENE-5223: Fixed IndexUpgrader command line parsing: -verbose is not required
  and -dir-impl option now works correctly. (hossman)

* LUCENE-5245: Fix MultiTermQuery's constant score rewrites to always
  return a ConstantScoreQuery to make scoring consistent. Previously it
  returned an empty unwrapped BooleanQuery, if no terms were available,
  which has a different query norm. (Nik Everett, Uwe Schindler)

* LUCENE-5218: In some cases, trying to retrieve or merge a 0-length
  binary doc value would hit an ArrayIndexOutOfBoundsException.
  (Littlestar via Mike McCandless)

API Changes

* LUCENE-5094: Add ramBytesUsed() to MultiDocValues.OrdinalMap.
  (Robert Muir)

* LUCENE-5114: Remove unused boolean useCache parameter from
  TermsEnum.seekCeil and .seekExact (Mike McCandless)

* LUCENE-5128: IndexSearcher.searchAfter throws IllegalArgumentException if
  searchAfter exceeds the number of documents in the reader.
  (Crocket via Shai Erera)

* LUCENE-5129: CategoryAssociationsContainer no longer supports null
  association values for categories. If you want to index categories without
  associations, you should add them using FacetFields. (Shai Erera)

* LUCENE-4876: IndexWriter no longer clones the given IndexWriterConfig. If you
  need to use the same config more than once, e.g. when sharing between multiple
  writers, make sure to clone it before passing to each writer.
  (Shai Erera, Mike McCandless)

* LUCENE-5144: StandardFacetsAccumulator renamed to OldFacetsAccumulator, and all
  associated classes were moved under o.a.l.facet.old.
  The intention is to remove it
  one day, once the features it covers (complements, partitions, sampling) are
  migrated to the new FacetsAggregator and FacetsAccumulator API. Also,
  FacetRequest.createAggregator was replaced by OldFacetsAccumulator.createAggregator.
  (Shai Erera)

* LUCENE-5149: CommonTermsQuery now allows setting the minimum number of terms that
  should match for its high- and low-frequency sub-queries. Previously this was only
  supported on the low-frequency terms query. (Simon Willnauer)

* LUCENE-5156: CompressingTermVectors TermsEnum no longer supports ord().
  (Robert Muir)

* LUCENE-5161, LUCENE-5164: Fix default chunk sizes in FSDirectory to not be
  unnecessarily large (now 8192 bytes); also use chunking when writing to index
  files. FSDirectory#setReadChunkSize() is now deprecated and will be removed
  in Lucene 5.0. (Uwe Schindler, Robert Muir, gsingers)

* LUCENE-5170: Analyzer.ReuseStrategy instances are now stateless and can
  be reused in other Analyzer instances, which was not possible before.
  Lucene now ships with stateless singletons for per-field and global reuse.
  Legacy code can still instantiate the deprecated implementation classes,
  but new code should use the constants. Implementors of custom strategies
  have to take care of new method signatures. AnalyzerWrapper can now be
  configured to use a custom strategy, too, ideally the one from the wrapped
  Analyzer. Analyzer adds a getter to retrieve the strategy for this use-case.
  (Uwe Schindler, Robert Muir, Shay Banon)

* LUCENE-5173: Lucene never writes segments with 0 documents anymore.
  (Shai Erera, Uwe Schindler, Robert Muir)

* LUCENE-5178: SortedDocValues always returns -1 ord when a document is missing
  a value for the field. Previously it only did this if the SortedDocValues
  was produced by uninversion on the FieldCache.
  (Robert Muir)

* LUCENE-5183: Removed BinaryDocValues.MISSING. To determine whether a document
  is missing a field, use getDocsWithField instead. (Robert Muir)

Changes in Runtime Behavior

* LUCENE-5178: DocValues codec consumer APIs (iterables) return null values
  when the document has no value for the field. (Robert Muir)

* LUCENE-5200: The HighFreqTerms command-line tool returns the true top-N
  by totalTermFreq when using the -t option; it uses the term statistics (faster)
  and now always shows totalTermFreq in the output. (Robert Muir)

Optimizations

* LUCENE-5088: Added TermFilter to filter docs by a specific term.
  (Martijn van Groningen)

* LUCENE-5119: DiskDV keeps the document-to-ordinal mapping on disk for
  SortedDocValues. (Robert Muir)

* LUCENE-5145: New AppendingPackedLongBuffer, a new variant of the former
  AppendingLongBuffer which assumes values are 0-based.
  (Boaz Leskes via Adrien Grand)

* LUCENE-5145: All Appending*Buffer now support bulk get.
  (Boaz Leskes via Adrien Grand)

* LUCENE-5140: Fixed a performance regression of span queries caused by
  LUCENE-4946. (Alan Woodward, Adrien Grand)

* LUCENE-5150: Make WAH8DocIdSet able to invert its encoding in order to
  compress dense sets efficiently as well. (Adrien Grand)

* LUCENE-5159: Prefix-code the sorted/sortedset value dictionaries in DiskDV.
  (Robert Muir)

* LUCENE-5170: Fixed several wrapper analyzers to inherit the reuse strategy
  of the wrapped Analyzer. (Uwe Schindler, Robert Muir, Shay Banon)

* LUCENE-5006: Simplified DocumentsWriter and DocumentsWriterPerThread
  synchronization and concurrent interaction with IndexWriter. DWPT is now
  only set up once and has no reset logic.
  All segment publishing and state
  transition from DWPT into IndexWriter is now done via an Event-Queue
  processed from within the IndexWriter in order to prevent situations
  where DWPT or DW calling into IW caused deadlocks. (Simon Willnauer)

* LUCENE-5182: Terminate phrase searches early if max phrase window is
  exceeded in FastVectorHighlighter to prevent very long-running phrase
  extraction if phrase terms are high-frequency. (Simon Willnauer)

* LUCENE-5188: CompressingStoredFieldsFormat now slices chunks containing big
  documents into fixed-size blocks so that requesting a single field does not
  necessarily force decompressing the whole chunk. (Adrien Grand)

* LUCENE-5101: CachingWrapper makes it easier to plug in a custom cacheable
  DocIdSet implementation and uses WAH8DocIdSet by default, which should be
  more memory efficient than FixedBitSet on average as well as faster on small
  sets. (Robert Muir)

Documentation

* LUCENE-4894: Removed the facet userguide as it was outdated. It was partially
  absorbed into the package documentation and class javadocs. (Shai Erera)

* LUCENE-5206: Clarify FuzzyQuery's unexpected behavior on short
  terms. (Tim Allison via Mike McCandless)

Changes in backwards compatibility policy

* LUCENE-5141: CheckIndex.fixIndex(Status,Codec) is now
  CheckIndex.fixIndex(Status). If you used to pass a codec to this method, just
  remove it from the arguments. (Adrien Grand)

* LUCENE-5089, SOLR-5126: Update to Morfologik 1.7.1. MorfologikAnalyzer and MorfologikFilter
  no longer support multiple "dictionaries" as there is only one dictionary available.
  (Dawid Weiss)

* LUCENE-5170: Changed method signatures of Analyzer.ReuseStrategy to take
  Analyzer. Closeable interface was removed because the class was changed to
  be stateless.
  (Uwe Schindler, Robert Muir, Shay Banon)

* LUCENE-5187: SlowCompositeReaderWrapper constructor is now private,
  SlowCompositeReaderWrapper.wrap should be used instead. (Adrien Grand)

* LUCENE-5101: CachingWrapperFilter doesn't always return FixedBitSet instances
  anymore. Users of the join module can use
  oal.search.join.FixedBitSetCachingWrapperFilter instead. (Adrien Grand)

Build

* SOLR-5159: Manifest includes non-parsed maven variables.
  (Artem Karpenko via Steve Rowe)

* LUCENE-5193: Add jar-src as top-level target to generate all Lucene and Solr
  *-src.jar. (Steve Rowe, Shai Erera)

======================= Lucene 4.4.0 =======================

Changes in backwards compatibility policy

* LUCENE-5085: MorfologikFilter will no longer stem words marked as keywords
  (Dawid Weiss, Grzegorz Sobczyk)

* LUCENE-4955: NGramTokenFilter now emits all n-grams for the same token at the
  same position and preserves the position length and the offsets of the
  original token. (Simon Willnauer, Adrien Grand)

* LUCENE-4955: NGramTokenizer now emits n-grams in a different order
  (a, ab, b, bc, c) instead of (a, b, c, ab, bc) and doesn't trim trailing
  whitespaces. (Adrien Grand)

* LUCENE-5042: The n-gram and edge n-gram tokenizers and filters now correctly
  handle supplementary characters, and the tokenizers have the ability to
  pre-tokenize the input stream similarly to CharTokenizer.
  (Adrien Grand)

* LUCENE-4967: NRTManager is replaced by
  ControlledRealTimeReopenThread, for controlling which requests must
  see which indexing changes, so that it can work with any
  ReferenceManager (Mike McCandless)

* LUCENE-4973: SnapshotDeletionPolicy no longer requires a unique
  String id (Mike McCandless, Shai Erera)

* LUCENE-4946: The internal sorting API (SorterTemplate, now Sorter) has been
  completely refactored to allow for a better implementation of TimSort.
  (Adrien Grand, Uwe Schindler, Dawid Weiss)

* LUCENE-4963: Some TokenFilter options that generate broken TokenStreams have
  been deprecated: updateOffsets=true on TrimFilter and
  enablePositionIncrements=false on all classes that inherit from
  FilteringTokenFilter: JapanesePartOfSpeechStopFilter, KeepWordFilter,
  LengthFilter, StopFilter and TypeTokenFilter. (Adrien Grand)

* LUCENE-4963: In order not to take position increments into account in
  suggesters, you now need to call setPreservePositionIncrements(false) instead
  of configuring the token filters to not increment positions. (Adrien Grand)

* LUCENE-3907: EdgeNGramTokenizer now supports maxGramSize > 1024, doesn't trim
  the input, sets position increment = 1 for all tokens and doesn't support
  backward grams anymore. (Adrien Grand)

* LUCENE-3907: EdgeNGramTokenFilter does not support backward grams and does
  not update offsets anymore. (Adrien Grand)

* LUCENE-4981: PositionFilter is now deprecated as it can corrupt token stream
  graphs. Since its main use case was to make query parsers generate boolean
  queries instead of phrase queries, it is now advised to use
  QueryParser.setAutoGeneratePhraseQueries(false) (for simple cases) or to
  override QueryParser.newFieldQuery.
  (Adrien Grand, Steve Rowe)

* LUCENE-5018: CompoundWordTokenFilterBase and its children
  DictionaryCompoundWordTokenFilter and HyphenationCompoundWordTokenFilter don't
  update offsets anymore. (Adrien Grand)

* LUCENE-5015: SamplingAccumulator no longer corrects the counts of the sampled
  categories. You should set TakmiSampleFixer on SamplingParams if required (but
  notice that this means slower search). (Rob Audenaerde, Gilad Barkai, Shai Erera)

* LUCENE-4933: Replace ExactSimScorer/SloppySimScorer with just SimScorer. Previously
  there were 2 implementations as a performance hack to support tableization of
  sqrt(), but this caching is removed, as sqrt is implemented in hardware with modern
  JVMs and it's faster not to cache. (Robert Muir)

* LUCENE-5038: MergePolicy now has a default implementation for useCompoundFile based
  on segment size and noCFSRatio. The default implementation was pulled up from
  TieredMergePolicy. (Simon Willnauer)

* LUCENE-5063: FieldCache.get(Bytes|Shorts), SortField.Type.(BYTE|SHORT) and
  FieldCache.DEFAULT_(BYTE|SHORT|INT|LONG|FLOAT|DOUBLE)_PARSER are now
  deprecated. These methods/types assume that data is stored as strings although
  Lucene has much better support for numeric data through (Int|Long)Field,
  NumericRangeQuery and FieldCache.get(Int|Long)s. (Adrien Grand)

* LUCENE-5078: TfIDFSimilarity lets you encode the norm value as any arbitrary long.
  As a result, encode/decodeNormValue were made abstract with their signatures changed.
  The default implementation was moved to DefaultSimilarity, which encodes the norm as
  a single-byte value. (Shai Erera)

Bug Fixes

* LUCENE-4890: QueryTreeBuilder.getBuilder() only finds interfaces on the
  most derived class. (Adriano Crestani)

* LUCENE-4997: Internal test framework's tests are sensitive to previous
  test failures and tests.failfast.
  (Dawid Weiss, Shai Erera)

* LUCENE-4955: NGramTokenizer now supports inputs larger than 1024 chars.
  (Adrien Grand)

* LUCENE-4959: Fix incorrect return value in
  SimpleNaiveBayesClassifier.assignClass. (Alexey Kutin via Adrien Grand)

* LUCENE-4972: DirectoryTaxonomyWriter created empty commits even if no changes
  were made. (Shai Erera, Michael McCandless)

* LUCENE-949: AnalyzingQueryParser can't work with leading wildcards.
  (Tim Allison, Robert Muir, Steve Rowe)

* LUCENE-4980: Fix issues preventing mixing of RangeFacetRequest and
  non-RangeFacetRequest when using DrillSideways. (Mike McCandless,
  Shai Erera)

* LUCENE-4996: Ensure DocInverterPerField always includes field name
  in exception messages. (Markus Jelsma via Robert Muir)

* LUCENE-4992: Fix constructor of CustomScoreQuery to take FunctionQuery
  for scoringQueries. Instead use QueryValueSource to safely wrap arbitrary
  queries and use them with CustomScoreQuery. (John Wang, Robert Muir)

* LUCENE-5016: SamplingAccumulator returned inconsistent label if asked to
  aggregate a non-existing category. Also fixed a bug in RangeAccumulator if
  some readers did not have the requested numeric DV field.
  (Rob Audenaerde, Shai Erera)

* LUCENE-5028: Remove pointless and confusing doShare option in FST's
  PositiveIntOutputs (Han Jiang via Mike McCandless)

* LUCENE-5032: Fix IndexOutOfBoundsExc in PostingsHighlighter when
  multi-valued fields exceed maxLength (Tomás Fernández Löbbe
  via Mike McCandless)

* LUCENE-4933: SweetSpotSimilarity didn't apply its tf function to some
  queries (SloppyPhraseQuery, SpanQueries).
  (Robert Muir)

* LUCENE-5033: SlowFuzzyQuery was accepting too many terms (documents) when
  provided minSimilarity is an int > 1 (Tim Allison via Mike McCandless)

* LUCENE-5045: DrillSideways.search did not work on an empty index. (Shai Erera)

* LUCENE-4995: CompressingStoredFieldsReader now only reuses an internal buffer
  when there is no more than 32kb to decompress. This prevents running
  into out-of-memory errors when working with large stored fields.
  (Adrien Grand)

* LUCENE-5062: If the spatial data for a document was comprised of multiple
  overlapping or adjacent parts then a CONTAINS predicate query might not match
  when the sum of those shapes contains the query shape but none do individually.
  A flag was added to use the original faster algorithm. (David Smiley)

* LUCENE-4971: Fixed NPE in AnalyzingSuggester when there are too many
  graph expansions. (Alexey Kudinov via Mike McCandless)

* LUCENE-5080: Combined setMaxMergeCount and setMaxThreadCount into one
  setter in ConcurrentMergeScheduler: setMaxMergesAndThreads. Previously these
  setters would not work unless you invoked them very carefully.
  (Robert Muir, Shai Erera)

* LUCENE-5068: QueryParserUtil.escape() does not escape forward slash.
  (Matias Holte via Steve Rowe)

* LUCENE-5103: A join on a single-valued field with deleted docs scored too few
  docs. (David Smiley)

* LUCENE-5090: Detect mismatched readers passed to
  SortedSetDocValuesReaderState and SortedSetDocValuesAccumulator.
  (Robert Muir, Mike McCandless)

* LUCENE-5120: AnalyzingSuggester modified its FST's cached root arc if payloads
  are used and the entire output resided on the root arc on the first access. This
  caused subsequent suggest calls to fail.
  (Simon Willnauer)

Optimizations

* LUCENE-4936: Improve numeric doc values compression in case all values share
  a common divisor. In particular, this improves the compression ratio of dates
  without time when they are encoded as milliseconds since Epoch. Also support
  TABLE compressed numerics in the Disk codec. (Robert Muir, Adrien Grand)

* LUCENE-4951: DrillSideways uses the new Scorer.cost() method to make
  better decisions about which scorer to use internally. (Mike McCandless)

* LUCENE-4976: PersistentSnapshotDeletionPolicy writes its state to a
  single snapshots_N file, and no longer requires closing (Mike
  McCandless, Shai Erera)

* LUCENE-5035: Compress addresses in FieldCacheImpl.SortedDocValuesImpl more
  efficiently. (Adrien Grand, Robert Muir)

* LUCENE-4941: Sort "from" terms only once when using JoinUtil.
  (Martijn van Groningen)

* LUCENE-5050: Close the stored fields and term vectors index files as soon as
  the index has been loaded into memory to save file descriptors. (Adrien Grand)

* LUCENE-5086: RamUsageEstimator now uses the official Java 7 API or a proprietary
  Oracle Java 6 API to get the Hotspot MX bean, preventing AWT classes from being
  loaded on MacOSX. (Shay Banon, Dawid Weiss, Uwe Schindler)

New Features

* LUCENE-5085: MorfologikFilter will no longer stem words marked as keywords
  (Dawid Weiss, Grzegorz Sobczyk)

* LUCENE-5064: Added PagedMutable (internal), a paged extension of
  PackedInts.Mutable which allows for storing more than 2B values. (Adrien Grand)

* LUCENE-4766: Added a PatternCaptureGroupTokenFilter that uses Java regexes to
  emit multiple tokens, one for each capture group in one or more patterns.
  (Simon Willnauer, Clinton Gormley)

* LUCENE-4952: Expose control (protected method) in DrillSideways to
  force all sub-scorers to be on the same document being collected.
  This is necessary when using collectors like
  ToParentBlockJoinCollector with DrillSideways. (Mike McCandless)

* SOLR-4761: Add SimpleMergedSegmentWarmer, which just initializes terms,
  norms, docvalues, and so on. (Mark Miller, Mike McCandless, Robert Muir)

* LUCENE-4964: Allow arbitrary Query for per-dimension drill-down to
  DrillDownQuery and DrillSideways, to support future dynamic faceting
  methods (Mike McCandless)

* LUCENE-4966: Add CachingWrapperFilter.sizeInBytes() (Mike McCandless)

* LUCENE-4965: Add dynamic (no taxonomy index used) numeric range
  faceting to Lucene's facet module (Mike McCandless, Shai Erera)

* LUCENE-4979: LiveFieldValues can work with any ReferenceManager, not
  just ReferenceManager<IndexSearcher>. (Mike McCandless)

* LUCENE-4975: Added a new Replicator module which can replicate index
  revisions between server and client. (Shai Erera, Mike McCandless)

* LUCENE-5022: Added FacetResult.mergeHierarchies to merge multiple
  FacetResult of the same dimension into a single one with the reconstructed
  hierarchy. (Shai Erera)

* LUCENE-5026: Added PagedGrowableWriter, a new internal packed-ints structure
  that grows the number of bits per value on demand, can store more than 2B
  values and supports random write and read access. (Adrien Grand)

* LUCENE-5025: FST's Builder can now handle more than 2.1 billion
  "tail nodes" while building a minimal FST. (Aaron Binns, Adrien
  Grand, Mike McCandless)

* LUCENE-5063: FieldCache.DEFAULT.get(Ints|Longs) now uses bit-packing to save
  memory.
  (Adrien Grand)

* LUCENE-5079: IndexWriter.hasUncommittedChanges() returns true if there are
  changes that have not been committed. (yonik, Mike McCandless, Uwe Schindler)

* SOLR-4565: Extend NorwegianLightStemFilter and NorwegianMinimalStemFilter
  to handle "nynorsk" (Erlend Garåsen, janhoy via Robert Muir)

* LUCENE-5087: Add getMultiValuedSeparator to PostingsHighlighter, for cases
  where you want a different logical separator between field values. This can
  be set to e.g. U+2029 PARAGRAPH SEPARATOR if you never want passages to span
  values. (Mike McCandless, Robert Muir)

* LUCENE-5013: Added ScandinavianFoldingFilterFactory and
  ScandinavianNormalizationFilterFactory (Karl Wettin via janhoy)

* LUCENE-4845: AnalyzingInfixSuggester finds suggestions based on
  matches to any tokens in the suggestion, not just based on pure
  prefix matching. (Mike McCandless, Robert Muir)

API Changes

* LUCENE-5077: Make it easier to use compressed norms. Lucene42NormsFormat takes
  an overhead parameter, so you can easily pass a different value other than
  PackedInts.FASTEST from your own codec. (Robert Muir)

* LUCENE-5097: Analyzer now has an additional tokenStream(String fieldName,
  String text) method, so wrapping by StringReader for common use is no
  longer needed. This method uses an internal reusable reader, which was
  previously only used by the Field class. (Uwe Schindler, Robert Muir)

* LUCENE-4542: HunspellStemFilter's maximum recursion level is now configurable.
  (Piotr, Rafał Kuć via Adrien Grand)

Build

* LUCENE-4987: Upgrade randomized testing to version 2.0.10:
  Test framework may fail internally due to overly aggressive J9 optimizations.
  (Dawid Weiss, Shai Erera)

* LUCENE-5043: The eclipse target now uses the containing directory for the
  project name.
  This also enforces UTF-8 encoding when files are copied with
  filtering.

* LUCENE-5055: The "rat-sources" target now also checks build.xml, ivy.xml,
  forbidden-api signatures, and parts of resources folders. (Ryan Ernst,
  Uwe Schindler)

* LUCENE-5072: Automatically patch javadocs generated by JDK versions
  before 7u25 to work around the frame injection vulnerability (CVE-2013-1571,
  VU#225657). (Uwe Schindler)

Tests

* LUCENE-4901: TestIndexWriterOnJRECrash should work on any
  JRE vendor via Runtime.halt().
  (Mike McCandless, Robert Muir, Uwe Schindler, Rodrigo Trujillo, Dawid Weiss)

Changes in runtime behavior

* LUCENE-5038: New segments written by IndexWriter are now wrapped into CFS
  by default. DocumentsWriterPerThread doesn't consult MergePolicy anymore
  to decide if a CFS must be written; instead, IndexWriterConfig now has a
  property to enable / disable CFS for newly created segments. (Simon Willnauer)

* LUCENE-5107: Properties files written by Lucene are now UTF-8 encoded, and
  Unicode is no longer escaped. Reading of legacy properties files with
  \u escapes is still possible. (Uwe Schindler, Robert Muir)

======================= Lucene 4.3.1 =======================

Bug Fixes

* SOLR-4813: Fix SynonymFilterFactory to allow init parameters for
  tokenizer factory used when parsing synonyms file. (Shingo Sasaki, hossman)

* LUCENE-4935: CustomScoreQuery wrongly applied its query boost twice
  (boost^2). (Robert Muir)

* LUCENE-4948: Fixed ArrayIndexOutOfBoundsException in PostingsHighlighter
  if you had a 64-bit JVM without compressed OOPS: IBM J9, or Oracle with
  large heap/explicitly disabled. (Mike McCandless, Uwe Schindler, Robert Muir)

* LUCENE-4953: Fixed ParallelCompositeReader to inform ReaderClosedListeners of
  its synthetic subreaders.
  FieldCaches keyed on the atomic children will be purged
  earlier and FC insanity prevented. In addition, ParallelCompositeReader's
  toString() was changed to better reflect the reader structure.
  (Mike McCandless, Uwe Schindler)

* LUCENE-4968: Fixed ToParentBlockJoinQuery/Collector: correctly handle parent
  hits that had no child matches, don't throw IllegalArgumentEx when
  the child query has no hits, more aggressively catch cases where childQuery
  incorrectly matches parent documents (Mike McCandless)

* LUCENE-4970: Fix boost value of rewritten NGramPhraseQuery.
  (Shingo Sasaki via Adrien Grand)

* LUCENE-4974: CommitIndexTask was broken if no params were set. (Shai Erera)

* LUCENE-4986: Fixed case where a newly opened near-real-time reader
  fails to reflect a delete from IndexWriter.tryDeleteDocument (Reg,
  Mike McCandless)

* LUCENE-4994: Fix PatternKeywordMarkerFilter to have a public constructor.
  (Uwe Schindler)

* LUCENE-4993: Fix BeiderMorseFilter to preserve custom attributes when
  inserting tokens with position increment 0. (Uwe Schindler)

* LUCENE-4991: Fix handling of synonyms in classic QueryParser.getFieldQuery for
  terms not separated by whitespace. PositionIncrementAttribute was ignored, so with
  default operator AND, synonyms wrongly became mandatory clauses, and with OR, the
  coordination factor was wrong. (李威, Robert Muir)

* LUCENE-5002: IndexWriter#deleteAll() caused a deadlock in DWPT / DWSC if a
  DwPT was flushing concurrently while deleteAll() aborted all DWPT. The IW
  should never wait on DWPT via the flush control while holding on to the IW
  Lock. (Simon Willnauer)

Optimizations

* LUCENE-4938: Don't use an unnecessarily large priority queue in IndexSearcher
  methods that take top-N.
  (Uwe Schindler, Mike McCandless, Robert Muir)


======================= Lucene 4.3.0 =======================

Changes in backwards compatibility policy

* LUCENE-4810: EdgeNGramTokenFilter no longer increments position for
  multiple ngrams derived from the same input token. (Walter Underwood
  via Mike McCandless)

* LUCENE-4822: KeywordTokenFilter is now an abstract class. Subclasses
  need to implement #isKeyword() in order to mark terms as keywords.
  The existing functionality has been factored out into a new
  SetKeywordTokenFilter class. (Simon Willnauer, Uwe Schindler)

* LUCENE-4642: Remove Tokenizer's and subclasses' ctors taking
  AttributeSource. (Renaud Delbru, Uwe Schindler, Steve Rowe)

* LUCENE-4833: IndexWriterConfig used to use LogByteSizeMergePolicy when
  calling setMergePolicy(null) although the default merge policy is
  TieredMergePolicy. IndexWriterConfig setters now throw an exception when
  passed null if null is not a valid value. (Adrien Grand)

* LUCENE-4849: Made ParallelTaxonomyArrays abstract with a concrete
  implementation for DirectoryTaxonomyWriter/Reader. Also moved it under
  o.a.l.facet.taxonomy. (Shai Erera)

* LUCENE-4876: IndexDeletionPolicy is now an abstract class instead of an
  interface. IndexDeletionPolicy, MergeScheduler and InfoStream now implement
  Cloneable. (Adrien Grand)

* LUCENE-4874: FilterAtomicReader and related classes (FilterTerms,
  FilterDocsEnum, ...) no longer forward to the filtered instance when the
  method has a default implementation through other abstract methods.
  (Adrien Grand, Robert Muir)

* LUCENE-4642, LUCENE-4877: Implementors of TokenizerFactory, TokenFilterFactory,
  and CharFilterFactory now need to provide at least one constructor taking
  Map<String,String> to be able to be loaded by the SPI framework (e.g., from Solr).
  In addition, TokenizerFactory needs to implement the abstract
  create(AttributeFactory,Reader) method. (Renaud Delbru, Uwe Schindler,
  Steve Rowe, Robert Muir)

API Changes

* LUCENE-4896: Made PassageFormatter abstract in PostingsHighlighter, made
  members of DefaultPassageFormatter protected. (Luca Cavanna via Robert Muir)

* LUCENE-4844: Removed TaxonomyReader.getParent(); use
  TaxonomyReader.getParallelArrays().parents() instead. (Shai Erera)

* LUCENE-4742: Renamed spatial 'Node' to 'Cell', along with any method names
  and variables using this terminology. (David Smiley)

New Features

* LUCENE-4815: DrillSideways now allows more than one FacetRequest per
  dimension. (Mike McCandless)

* LUCENE-3918: IndexSorter has been ported to the 4.3 API and now supports
  sorting documents by a numeric DocValues field, or reversing the order of
  the documents in the index. Additionally, apps can implement their own
  sort criteria. (Anat Hashavit, Shai Erera)

* LUCENE-4817: Added KeywordRepeatFilter, which emits each token twice,
  once as a keyword and once as an ordinary token, allowing stemmers to emit
  a stemmed version along with the unstemmed version. (Simon Willnauer)

* LUCENE-4822: PatternKeywordMarkerFilter can mark tokens as keywords based
  on regular expressions. (Simon Willnauer, Uwe Schindler)

* LUCENE-4821: AnalyzingSuggester now uses the ending offset to
  determine whether the last token was finished or not, so that a
  query "i " will no longer suggest "Isla de Muerta", for example.
  (Mike McCandless)

* LUCENE-4642: Add create(AttributeFactory) to TokenizerFactory and
  subclasses with ctors taking AttributeFactory.
  (Renaud Delbru, Uwe Schindler, Steve Rowe)

* LUCENE-4820: Add payloads to Analyzing/FuzzySuggester, to record an
  arbitrary byte[] per suggestion. (Mike McCandless)

* LUCENE-4816: Add WholeBreakIterator to PostingsHighlighter
  for treating the entire content as a single Passage. (Robert
  Muir, Mike McCandless)

* LUCENE-4827: Add an additional ctor to PostingsHighlighter PassageScorer
  to provide the BM25 k1, b, and avgdl parameters. (Robert Muir)

* LUCENE-4607: Add DocIdSetIterator.cost() and Spans.cost() for optimizing
  scoring. (Simon Willnauer, Robert Muir)

* LUCENE-4795: Add SortedSetDocValuesFacetFields and
  SortedSetDocValuesAccumulator, to compute topK facet counts from a
  field's SortedSetDocValues. This method only supports flat
  (dim/label) facets, is a bit (~25%) slower, and adds a cost on
  each IndexReader open to compute its ordinal map, but it requires no
  taxonomy index and it tie-breaks facet labels in an understandable
  (by Unicode sort order) way. (Robert Muir, Mike McCandless)

* LUCENE-4843: Add LimitTokenPositionFilter: don't emit tokens with
  positions that exceed the configured limit. (Steve Rowe)

* LUCENE-4832: Add ToParentBlockJoinCollector.getTopGroupsWithAllChildDocs, to retrieve
  all children in each group. (Aleksey Aleev via Mike McCandless)

* LUCENE-4846: PostingsHighlighter subclasses can override where the
  String values come from (it still defaults to pulling from stored
  fields). (Robert Muir, Mike McCandless)

* LUCENE-4853: Add PostingsHighlighter.highlightFields method that
  takes int[] docIDs instead of TopDocs.
  (Robert Muir, Mike McCandless)

* LUCENE-4856: If there are no matches for a given field, return the
  first maxPassages sentences. (Robert Muir, Mike McCandless)

* LUCENE-4859: IndexReader now exposes Terms statistics: getDocCount,
  getSumDocFreq, getSumTotalTermFreq. (Shai Erera)

* LUCENE-4862: It is now possible to terminate collection of a single
  IndexReader leaf by throwing a CollectionTerminatedException in
  Collector.collect. (Adrien Grand, Shai Erera)

* LUCENE-4752: New SortingMergePolicy (in lucene/misc) that sorts documents
  before merging segments. (Adrien Grand, Shai Erera, David Smiley)

* LUCENE-4860: Customize scoring and formatting per-field in
  PostingsHighlighter by subclassing and overriding the getFormatter
  and/or getScorer methods. This also changes Passage.getMatchTerms()
  to return BytesRef[] instead of Term[]. (Robert Muir, Mike
  McCandless)

* LUCENE-4839: Added SorterTemplate.timSort, an O(n log n) stable sort algorithm
  that performs well on partially sorted data. (Adrien Grand)

* LUCENE-4644: Added support for the "IsWithin" spatial predicate for
  RecursivePrefixTreeStrategy. It's for matching non-point indexed shapes; if
  you only have points (1/doc) then "Intersects" is equivalent and faster.
  See the javadocs. (David Smiley)

* LUCENE-4861: Make BreakIterator per-field in PostingsHighlighter. This means
  you can override getBreakIterator(String field) to use different mechanisms
  for, e.g., title vs. body fields. (Mike McCandless, Robert Muir)

* LUCENE-4645: Added support for the "Contains" spatial predicate for
  RecursivePrefixTreeStrategy. (David Smiley)

* LUCENE-4898: DirectoryReader.openIfChanged now allows opening a reader
  on an IndexCommit starting from a near-real-time reader (previously
  this would throw IllegalArgumentException).
  (Mike McCandless)

* LUCENE-4905: Made the maxPassages parameter per-field in PostingsHighlighter.
  (Robert Muir)

* LUCENE-4897: Added TaxonomyReader.getChildren for traversing a category's
  children. (Shai Erera)

* LUCENE-4902: Added FilterDirectoryReader to allow easy filtering of a
  DirectoryReader's subreaders. (Alan Woodward, Adrien Grand, Uwe Schindler)

* LUCENE-4858: Added EarlyTerminatingSortingCollector to be used in conjunction
  with SortingMergePolicy, which allows queries on sorted indexes to terminate
  early when the sort order matches the index order. (Adrien Grand, Shai Erera)

* LUCENE-4904: Added descending sort order to NumericDocValuesSorter. (Shai Erera)

* LUCENE-3786: Added SearcherTaxonomyManager, to manage access to both
  IndexSearcher and DirectoryTaxonomyReader for near-real-time
  faceting. (Shai Erera, Mike McCandless)

* LUCENE-4915: DrillSideways now allows drilling down on fields that
  are not faceted. (Mike McCandless)

* LUCENE-4895: Added support for the "IsDisjointTo" spatial predicate for
  RecursivePrefixTreeStrategy. (David Smiley)

* LUCENE-4774: Added FieldComparator that allows sorting parent documents based on
  fields on the child / nested document level. (Martijn van Groningen)

Optimizations

* LUCENE-4839: SorterTemplate.merge can now be overridden in order to replace
  the default in-place merge implementation with a faster one that may
  require fewer swaps at the expense of some extra memory.
  ArrayUtil and CollectionUtil override it so that their mergeSort and timSort
  methods are faster but only require up to 1% of extra memory. (Adrien Grand)

* LUCENE-4571: Speed up BooleanQuerys with minNrShouldMatch by using
  skipping.
  (Stefan Pohl via Robert Muir)

* LUCENE-4863: StemmerOverrideFilter now uses an FST to represent its overrides
  in memory. (Simon Willnauer)

* LUCENE-4889: UnicodeUtil.codePointCount implementation replaced with a
  non-array-lookup version. (Dawid Weiss)

* LUCENE-4923: Speed up BooleanQuery processing of in-order disjunctions.
  (Robert Muir)

* LUCENE-4926: Speed up DisjunctionMatchQuery. (Robert Muir)

* LUCENE-4930: Reduce contention in older/buggy JVMs when using
  AttributeSource#addAttribute() because java.lang.ref.ReferenceQueue#poll()
  is implemented using synchronization. (Christian Ziech, Karl Wright,
  Uwe Schindler)

Bug Fixes

* LUCENE-4868: SumScoreFacetsAggregator used an incorrect index into
  the scores array. (Shai Erera)

* LUCENE-4882: FacetsAccumulator did not allow counting the ROOT category
  (i.e. counting dimensions). (Shai Erera)

* LUCENE-4876: IndexWriterConfig.clone() now clones its MergeScheduler,
  IndexDeletionPolicy and InfoStream in order to make an IndexWriterConfig and
  its clone fully independent. (Adrien Grand)

* LUCENE-4893: Facet counts were multiplied as many times as
  FacetsCollector.getFacetResults() was called. (Shai Erera)

* LUCENE-4888: Fixed SloppyPhraseScorer, MultiDocs(AndPositions)Enum and
  MultiSpansWrapper, which sometimes called DocIdSetIterator.advance
  with target<=current (in this case the behavior of advance is undefined).
  (Adrien Grand)

* LUCENE-4899: FastVectorHighlighter failed with StringIndexOutOfBoundsException
  if a single highlight phrase or term was longer than the fragCharSize, producing
  negative string offsets. (Simon Willnauer)

* LUCENE-4877: Throw exception for invalid arguments in analysis factories.
  (Steve Rowe, Uwe Schindler, Robert Muir)

* LUCENE-4914: SpatialPrefixTree's Node/Cell.reset() forgot to reset the 'leaf'
  flag. This affected spatial RecursivePrefixTreeStrategy on non-point indexed
  shapes, as of Lucene 4.2. (David Smiley)

* LUCENE-4913: FacetResultNode.ordinal was always 0 when all children
  were returned. (Mike McCandless)

* LUCENE-4918: Highlighter closed the given IndexReader if QueryScorer
  was used with an external IndexReader. (Simon Willnauer, Sirvan Yahyaei)

* LUCENE-4880: Fix MemoryIndex to consume empty terms from the token stream
  consistently with IndexWriter. Previously it discarded them.
  (Timothy Allison via Robert Muir)

* LUCENE-4885: FacetsAccumulator did not set the correct value for
  FacetResult.numValidDescendants. (Mike McCandless, Shai Erera)

* LUCENE-4925: Fixed IndexSearcher.search when the argument list contains a Sort
  and one of the sort fields is the relevance score. Only IndexSearchers created
  with an ExecutorService are affected. (Adrien Grand)

* LUCENE-4738, LUCENE-2727, LUCENE-2812: Simplified
  DirectoryReader.indexExists so that it's more robust to transient
  IOExceptions (e.g. due to issues like file descriptor exhaustion),
  but this will also cause it to err towards returning true, for
  example if the directory contains a corrupted index or an incomplete
  initial commit. In addition, IndexWriter with OpenMode.CREATE will
  now succeed even if the directory contains a corrupted index. (Billow
  Gao, Robert Muir, Mike McCandless)

* LUCENE-4928: Stored fields and term vectors could become extremely slow in case
  of tiny documents (a few bytes). This is especially problematic when switching
  codecs since bulk-merge strategies can't be applied and the same chunk of
  documents can end up being decompressed thousands of times.
  A hard limit on the number of documents per chunk has been added to fix
  this issue. (Robert Muir, Adrien Grand)

* LUCENE-4934: Fix minor equals/hashcode problems in facet/DrillDownQuery,
  BoostingQuery, MoreLikeThisQuery, FuzzyLikeThisQuery, and block join queries.
  (Robert Muir, Uwe Schindler)

* LUCENE-4504: Fix broken sort comparator in ValueSource.getSortField,
  used when sorting by a function query. (Tom Shally via Robert Muir)

* LUCENE-4937: Fix incorrect sorting of float/double values (+/-0, NaN).
  (Robert Muir, Uwe Schindler)

Documentation

* LUCENE-4841: Added example SimpleSortedSetFacetsExample to show how
  to use the new SortedSetDocValues-backed facet implementation.
  (Shai Erera, Mike McCandless)

Build

* LUCENE-4879: Upgrade randomized testing to version 2.0.9:
  Filter stack traces on console output. (Dawid Weiss, Robert Muir)


======================= Lucene 4.2.1 =======================

Bug Fixes

* LUCENE-4713: The SPI components used to load custom codecs or analysis
  components were fixed to also scan the Lucene ClassLoader in addition
  to the context ClassLoader, so Lucene is always able to find its own
  codecs. The special case of a null context ClassLoader is now also
  supported. (Christian Kohlschütter, Uwe Schindler)

* LUCENE-4819: seekExact(BytesRef, boolean) did not work correctly with
  Sorted[Set]DocValuesTermsEnum. (Robert Muir)

* LUCENE-4826: PostingsHighlighter was not returning the top N best
  scoring passages. (Robert Muir, Mike McCandless)

* LUCENE-4854: Fix DocTermOrds.getOrdTermsEnum() to not return a negative
  ord on initial next(). (Robert Muir)

* LUCENE-4836: Fix SimpleRateLimiter#pause to return the actual time spent
  sleeping instead of the wakeup timestamp in nanoseconds.
  (Simon Willnauer)

* LUCENE-4828: BooleanQuery no longer extracts terms from its MUST_NOT
  clauses. (Mike McCandless)

* SOLR-4589: Fixed CPU spikes and poor performance in lazy field loading
  of multivalued fields. (hossman)

* LUCENE-4870: Fix bug where an entire index might be deleted by the IndexWriter
  due to incorrectly detecting whether an index exists in the directory when
  OpenMode.CREATE_OR_APPEND is used. This might also affect applications that set
  the open mode manually using DirectoryReader#indexExists. (Simon Willnauer)

* LUCENE-4878: Override getRegexpQuery in MultiFieldQueryParser to prevent
  NullPointerException when regular expression syntax is used with
  MultiFieldQueryParser. (Simon Willnauer, Adam Rauch)

Optimizations

* LUCENE-4819: Added Sorted[Set]DocValues.termsEnum(), and optimized the
  default codec for improved enumeration performance. (Robert Muir)

* LUCENE-4854: Speed up TermsEnum of FieldCache.getDocTermOrds.
  (Robert Muir)

* LUCENE-4857: Don't unnecessarily copy stem override map in
  StemmerOverrideFilter. (Simon Willnauer)

======================= Lucene 4.2.0 =======================

Changes in backwards compatibility policy

* LUCENE-4602: FacetFields now stores facet ordinals in a DocValues field,
  rather than a payload. This forces rebuilding existing indexes, or doing a
  one-time migration using FacetsPayloadMigratingReader. Since DocValues
  support in-memory caching, CategoryListCache was removed too.
  (Shai Erera, Michael McCandless)

* LUCENE-4697: FacetResultNode is now a concrete class with public members
  (instead of getter methods). (Shai Erera)

* LUCENE-4600: FacetsCollector is now an abstract class with two
  implementations: StandardFacetsCollector (the old version of
  FacetsCollector) and CountingFacetsCollector.
  FacetsCollector.create() returns the most optimized collector for the given
  parameters. (Shai Erera, Michael McCandless)

* LUCENE-4700: OrdinalPolicy is now per CategoryListParams, and is no longer
  an interface, but rather an enum with values NO_PARENTS and ALL_PARENTS.
  PathPolicy was removed; you should extend FacetFields and DrillDownStream
  to control which categories are added as drill-down terms. (Shai Erera)

* LUCENE-4547: DocValues improvements:
  - Simplified codec API: codecs are now only responsible for encoding and
    decoding docvalues; they do not need to do buffering or RAM accounting.
  - Per-Field support: added PerFieldDocValuesFormat, which allows you to
    use a different DocValuesFormat per field (like postings).
  - Unified with FieldCache API: DocValues can be accessed via FieldCache API,
    so it works automatically with grouping/join/sort/function queries, etc.
  - Simplified types: There are only 3 types (NUMERIC, BINARY, SORTED), so it's
    not necessary to specify, for example, that all of your binary values have
    the same length. Instead it's easy for the Codec API to optimize encoding
    based on any properties of the content.
  (Simon Willnauer, Adrien Grand, Mike McCandless, Robert Muir)

* LUCENE-4757: Cleanup and refactoring of FacetsAccumulator, FacetRequest,
  FacetsAggregator and FacetResultsHandler API. If your application used
  FacetsCollector.create(), you should not be affected, but if you wrote
  an Aggregator, then you should migrate it to the per-segment
  FacetsAggregator. You can still use StandardFacetsAccumulator, which works
  with the old API (for now). (Shai Erera)

* LUCENE-4761: Facet packages were reorganized. Fixing your import
  statements should be easy if you use an IDE such as Eclipse.
  (Shai Erera)

* LUCENE-4750: Convert DrillDown to DrillDownQuery, so you can initialize it
  and add drill-down categories to it. (Michael McCandless, Shai Erera)

* LUCENE-4759: Remove FacetRequest.SortBy; result categories are always
  sorted by value, while ties are broken by category ordinal. (Shai Erera)

* LUCENE-4772: Facet associations moved to the new FacetsAggregator API. You
  should override FacetsAccumulator and return the relevant aggregator
  for aggregating the association values. (Shai Erera)

* LUCENE-4748: A FacetRequest on a non-existent field now returns an
  empty FacetResult instead of skipping it. (Shai Erera, Mike McCandless)

* LUCENE-4806: The default category delimiter character was changed
  from U+F749 to U+001F, since in UTF-8 the latter takes 1 byte vs 3 bytes
  for the former. Existing facet indices must be reindexed. (Robert
  Muir, Shai Erera, Mike McCandless)

Optimizations

* LUCENE-4687: BloomFilterPostingsFormat now lazily initializes the delegate
  TermsEnum only if needed to do a seek or get a DocsEnum. (Simon Willnauer)

* LUCENE-4677, LUCENE-4682: Unpacked FSTs now use vInt to encode the node target,
  to reduce their size. (Mike McCandless)

* LUCENE-4678: FST now uses a paged byte[] structure instead of a
  single byte[] internally, to avoid large memory spikes during
  building. (James Dyer, Mike McCandless)

* LUCENE-3298: FST can now be larger than 2.1 GB / 2.1 B nodes.
  (James Dyer, Mike McCandless)

* LUCENE-4690: Performance improvements and non-hashing versions
  of NumericUtils.*ToPrefixCoded(). (yonik)

* LUCENE-4715: CategoryListParams.getOrdinalPolicy now allows returning a
  different OrdinalPolicy per dimension, to better tune how you index
  facets. Also added OrdinalPolicy.ALL_BUT_DIMENSION.
  (Shai Erera, Michael McCandless)

* LUCENE-4740: Don't track clones of MMapIndexInput if unmapping
  is disabled. This reduces GC overhead. (Kristofer Karlsson, Uwe Schindler)

* LUCENE-4733: The default Lucene 4.2 codec now uses a more compact
  TermVectorsFormat (Lucene42TermVectorsFormat) based on
  CompressingTermVectorsFormat. (Adrien Grand)

* LUCENE-3729: The default Lucene 4.2 codec now uses a more compact
  DocValuesFormat (Lucene42DocValuesFormat). Sorted values are stored in an
  FST, Numerics and Ordinals use a number of strategies (delta-compression,
  table-compression, etc), and memory addresses use MonotonicBlockPackedWriter.
  (Simon Willnauer, Adrien Grand, Mike McCandless, Robert Muir)

* LUCENE-4792: Reduction of the memory required to build the doc ID maps used
  when merging segments. (Adrien Grand)

* LUCENE-4794: Spatial RecursivePrefixTreeStrategy's search filter: Skip calls
  to termsEnum.seek() when the next term is known to follow the current cell.
  (David Smiley)

New Features

* LUCENE-4686: New specialized DGapVInt8IntEncoder for facets (now the
  default). (Shai Erera)

* LUCENE-4703: Add simple PrintTaxonomyStats tool to see summary
  information about the facets taxonomy index. (Mike McCandless)

* LUCENE-4599: New oal.codecs.compressing.CompressingTermVectorsFormat which
  compresses term vectors into chunks of documents similarly to
  CompressingStoredFieldsFormat. (Adrien Grand)

* LUCENE-4695: Added LiveFieldValues utility class, for getting the
  current (live, real-time) value for any indexed doc/field. The
  class buffers recently indexed doc/field values until a new
  near-real-time reader is opened that contains those changes.
  (Robert Muir, Mike McCandless)

* LUCENE-4723: Add AnalyzerFactoryTask to benchmark, and enable analyzer
  creation via the resulting factories using NewAnalyzerTask. (Steve Rowe)

* LUCENE-4728: Unknown and not explicitly mapped queries are now rewritten
  against the highlighting IndexReader to obtain primitive queries before
  discarding the query entirely. WeightedSpanTermExtractor now builds a
  MemoryIndex only once even if multiple fields are highlighted.
  (Simon Willnauer)

* LUCENE-4035: Added ICUCollationDocValuesField, more efficient
  support for Locale-sensitive sort and range queries for
  single-valued fields. (Robert Muir)

* LUCENE-4547: Added MonotonicBlockPacked(Reader/Writer), which provide
  efficient random access to large amounts of monotonically increasing
  positive values (e.g. file offsets). Each block stores the minimum value
  and the average gap, and values are encoded as signed deviations from
  the expected value. (Adrien Grand)

* LUCENE-4547: Added AppendingLongBuffer, an append-only buffer that packs
  signed long values in memory and provides an efficient iterator API.
  (Adrien Grand)

* LUCENE-4540: It is now possible for a codec to represent norms with
  fewer than 8 bits per value. For performance reasons this is not done
  by default, but you can customize your codec (e.g. pass PackedInts.DEFAULT
  to Lucene42DocValuesConsumer) if you want to make this tradeoff.
  (Adrien Grand, Robert Muir)

* LUCENE-4764: A new Facet42Codec and Facet42DocValuesFormat provide
  faster facet performance at the cost of more RAM. (Shai Erera, Mike
  McCandless)

* LUCENE-4769: Added OrdinalsCache and CachedOrdsCountingFacetsAggregator,
  which uses the cache to obtain a document's ordinals. This aggregator
  is faster than others, but consumes much more RAM.
  (Michael McCandless, Shai Erera)

* LUCENE-4778: Add a getter for the delegate in RateLimitedDirectoryWrapper.
  (Mark Miller)

* LUCENE-4765: Add a multi-valued docvalues type (SORTED_SET). This is equivalent
  to building a FieldCache.getDocTermOrds at index-time. (Robert Muir)

* LUCENE-4780: Add MonotonicAppendingLongBuffer: an append-only buffer for
  monotonically increasing values. (Adrien Grand)

* LUCENE-4748: Added DrillSideways utility class for computing both
  drill-down and drill-sideways counts for a DrillDownQuery. (Mike
  McCandless)

API Changes

* LUCENE-4709: FacetResultNode no longer has a residue field. (Shai Erera)

* LUCENE-4716: DrillDown.query now takes Occur, allowing you to specify whether
  categories should be OR'ed or AND'ed. (Shai Erera)

* LUCENE-4695: ReferenceManager.RefreshListener.afterRefresh now takes
  a boolean indicating whether a new reference was in fact opened, and
  a new beforeRefresh method notifies you when a refresh attempt is
  starting. (Robert Muir, Mike McCandless)

* LUCENE-4794: Spatial RecursivePrefixTreeFilter was replaced by
  IntersectsPrefixTreeFilter and some extensible base classes. (David Smiley)

Bug Fixes

* LUCENE-4705: Pass on FilterStrategy in FilteredQuery if the filtered query is
  rewritten. (Simon Willnauer)

* LUCENE-4712: MemoryIndex#normValues() threw an NPE if the field didn't exist.
  (Simon Willnauer, Ricky Pritchett)

* LUCENE-4550: Shapes wider than 180 degrees would use too much accuracy for the
  PrefixTree based SpatialStrategy. For a pathological case of nearly 360
  degrees and barely any height, it would generate so many indexed terms
  (> 500k) that it could even cause an OutOfMemoryError. Fixed. (David Smiley)

* LUCENE-4704: Make join queries override hashcode and equals methods.
  (Martijn van Groningen)

* LUCENE-4724: Fix bug in CategoryPath which allowed passing null or empty
  string components. This is now forbidden (throws an exception). Note that if
  you have a taxonomy index created with such strings, you should rebuild it.
  (Michael McCandless, Shai Erera)

* LUCENE-4732: Fixed TermsEnum.seekCeil/seekExact on term vectors.
  (Adrien Grand, Robert Muir)

* LUCENE-4739: Fixed bugs that prevented FSTs larger than ~1.1 GB from
  being saved and loaded. (Adrien Grand, Mike McCandless)

* LUCENE-4717: Fixed bug where Lucene40DocValuesFormat would sometimes write
  an extra unused ordinal for sorted types. The bug is detected and corrected
  on-the-fly for old indexes. (Robert Muir)

* LUCENE-4547: Fixed bug where Lucene40DocValuesFormat was unable to encode
  segments that would exceed 2 GB total data. This could happen in some surprising
  cases, for example if you had an index with more than 260M documents and a
  VAR_INT field. (Simon Willnauer, Adrien Grand, Mike McCandless, Robert Muir)

* LUCENE-4775: Remove SegmentInfo.sizeInBytes() and make
  MergePolicy.OneMerge.totalBytesSize thread-safe. (Josh Bronson via
  Robert Muir, Mike McCandless)

* LUCENE-4770: If spatial's TermQueryPrefixTreeStrategy was used to search
  indexed non-point shapes, then there was an edge case where a query should
  have found a shape but didn't. The fix is the removal of an optimization that
  simplifies some leaf cells into a parent. The index data for such a field is
  now ~20% larger. This optimization is still done for the query shape, and for
  indexed data for RecursivePrefixTreeStrategy. Furthermore, this optimization
  is enhanced to roll up beyond the bottom cell level.
  (David Smiley, Florian Schilling)

* LUCENE-4790: Fix FieldCacheImpl.getDocTermOrds to not bake deletes into the
  cached data structure. Otherwise this can cause inconsistencies with readers
  at different points in time. (Robert Muir)

* LUCENE-4791: A conjunction of terms (ConjunctionTermScorer) scanned the
  lowest-frequency term instead of skipping, leading to potentially
  large performance impacts for many non-random or non-uniform
  term distributions. (John Wang, yonik)

* LUCENE-4798: PostingsHighlighter's formatter sometimes didn't highlight
  matched terms. (Robert Muir)

* LUCENE-4796, SOLR-4373: Fix concurrency issue in NamedSPILoader and
  AnalysisSPILoader when reloading (e.g. from Solr).
  (Uwe Schindler, Hossman)

* LUCENE-4802: Don't compute norms for drill-down facet fields. (Mike McCandless)

* LUCENE-4804: PostingsHighlighter sometimes applied terms to the wrong passage
  if they started exactly on a passage boundary. (Robert Muir)

Documentation

* LUCENE-4718: Fixed documentation of oal.queryparser.classic.
  (Hayden Muhl via Adrien Grand)

* LUCENE-4784, LUCENE-4785, LUCENE-4786: Fixed references to deprecated classes
  SinkTokenizer, ValueSourceQuery and RangeQuery. (Hao Zhong via Adrien Grand)

Build

* LUCENE-4654: Test duration statistics from multiple test runs should be
  reused. (Dawid Weiss)

* LUCENE-4636: Upgrade Ivy to 2.3.0. (Shawn Heisey via Robert Muir)

* LUCENE-4570: Use the Policeman Forbidden API checker, released separately
  from Lucene and downloaded via Ivy. (Uwe Schindler, Robert Muir)

* LUCENE-4758: 'ant jar', 'ant compile', and 'ant compile-test' should
  recurse.
  (Steve Rowe)

======================= Lucene 4.1.0 =======================

Changes in backwards compatibility policy

* LUCENE-4514: Scorer's freq() method returns an integer value indicating
  the number of times the scorer matches the current document. Previously
  this was only sometimes the case; in some cases it returned a (meaningless)
  floating point value. Scorer now extends DocsEnum so it has attributes().
  (Robert Muir)

* LUCENE-4543: TFIDFSimilarity's index-time computeNorm is now final to
  match the fact that its query-time norm usage requires a FIXED_8 encoding.
  Override lengthNorm and/or encode/decodeNormValue to change the specifics,
  like Lucene 3.x. (Robert Muir)

* LUCENE-3441: The facet module now supports NRT. As a result, the following
  changes were made:
  - DirectoryTaxonomyReader has a new constructor which takes a
    DirectoryTaxonomyWriter. You should use that constructor in order to get
    the NRT support (or the old one for non-NRT).
  - TaxonomyReader.refresh() was replaced by the static
    TaxonomyReader.openIfChanged method. Similar to DirectoryReader, the method
    either returns null if no changes were made to the taxonomy, or a new TR
    instance otherwise. Instead of calling refresh(), you should write code
    similar to how you reopen a regular DirectoryReader.
  - TaxonomyReader.openIfChanged (previously refresh()) no longer throws
    InconsistentTaxonomyException, and supports recreate. InconsistentTaxoEx
    was removed.
  - ChildrenArrays was pulled out of TaxonomyReader into a top-level class.
  - TaxonomyReader was made an abstract class (instead of an interface), with
    methods such as close() and reference counting management pulled from
    DirectoryTaxonomyReader, and made final. The rest of the methods remained
    abstract.
  (Shai Erera, Gilad Barkai)

* LUCENE-4576: Remove CachingWrapperFilter(Filter, boolean). This recacheDeletes
  option gave less than 1% speedup at the expense of cache churn (filters were
  invalidated on reopen if even a single delete was posted against the segment).
  (Robert Muir)

* LUCENE-4575: Replace IndexWriter's commit/prepareCommit versions that take
  commitData with setCommitData(). That allows committing changes to IndexWriter
  even if the commitData is the only thing that changes.
  (Shai Erera, Michael McCandless)

* LUCENE-4565: TaxonomyReader.getParentArray and .getChildrenArrays were
  consolidated into one getParallelTaxonomyArrays(). You can obtain the 3 arrays
  that the previous two methods returned by calling parents(), children() or
  siblings() on the returned ParallelTaxonomyArrays. (Shai Erera)

* LUCENE-4585: Spatial PrefixTree based Strategies (either TermQuery or
  RecursivePrefix based) MAY want to re-index if used for point data. If a
  re-index is not done, then an indexed point is ~1/2 the smallest grid cell
  larger and as such is slightly more likely to match a query shape.
  (David Smiley)

* LUCENE-4604: DefaultOrdinalPolicy removed in favor of OrdinalPolicy.ALL_PARENTS.
  Same for DefaultPathPolicy (now PathPolicy.ALL_CATEGORIES). In addition, you
  can use OrdinalPolicy.NO_PARENTS to never write any parent category ordinal
  to the fulltree posting payload (but note that you need a special
  FacetsAccumulator; see the javadocs). (Shai Erera)

* LUCENE-4594: Spatial PrefixTreeStrategy no longer indexes center points of
  non-point shapes. If you want to call makeDistanceValueSource() based on
  shape centers, you need to do this yourself in another spatial field.
  (David Smiley)

* LUCENE-4615: Replace IntArrayAllocator and FloatArrayAllocator with ArraysPool.
  FacetArrays no longer takes those allocators; if you need to reuse the arrays,
  you should use ReusingFacetArrays. (Shai Erera, Gilad Barkai)

* LUCENE-4621: FacetIndexingParams is now a concrete class (instead of DefaultFIP).
  Also, the entire IndexingParams chain is now immutable. If you need to override
  a setting, you should extend the relevant class.
  Additionally, FacetSearchParams is now immutable, and requires all FacetRequests
  to be specified at initialization time. (Shai Erera)

* LUCENE-4647: CategoryDocumentBuilder and EnhancementsDocumentBuilder are replaced
  by FacetFields and AssociationsFacetFields respectively. CategoryEnhancement and
  AssociationEnhancement were removed in favor of a simplified CategoryAssociation
  interface, with CategoryIntAssociation and CategoryFloatAssociation
  implementations.
  NOTE: indexes that contain category enhancements/associations are not supported
  by the new code and should be recreated. (Shai Erera)

* LUCENE-4659: Massive cleanup of the CategoryPath API. Additionally, CategoryPath is
  now immutable, so you don't need to clone() it. (Shai Erera)

* LUCENE-4670: StoredFieldsWriter and TermVectorsWriter have new finish* callbacks
  which are called after a doc/field/term has been completely added.
  (Adrien Grand, Robert Muir)

* LUCENE-4620: IntEncoder/Decoder were changed to do bulk encoding/decoding. As a
  result, a few other classes such as Aggregator and CategoryListIterator were
  changed to handle bulk category ordinals. (Shai Erera)

* LUCENE-4683: CategoryListIterator and Aggregator are now per-segment. As such,
  their implementations no longer take a top-level IndexReader in the constructor
  but rather implement a setNextReader.
  (Shai Erera)

New Features

* LUCENE-4226: New experimental StoredFieldsFormat that compresses chunks of
  documents together in order to improve the compression ratio. (Adrien Grand)

* LUCENE-4426: New ValueSource implementations (in lucene/queries) for
  DocValues fields. (Adrien Grand)

* LUCENE-4410: FilteredQuery now exposes a FilterStrategy that determines
  how filters are applied during query execution. (Simon Willnauer)

* LUCENE-4404: New ListOfOutputs (in lucene/misc) for FSTs wraps
  another Outputs implementation, allowing you to store more than one
  output for a single input. UpToTwoPositiveIntsOutputs was moved
  from lucene/core to lucene/misc. (Mike McCandless)

* LUCENE-3842: New AnalyzingSuggester, for doing auto-suggest
  using an analyzer. This can create powerful suggesters: if the analyzer
  removes stop words then "ghost chr..." could suggest "The Ghost of
  Christmas Past"; if SynonymFilter is used to map wifi and wireless
  network to hotspot, then "wirele..." could suggest "wifi router";
  token normalization like stemmers, accent removal, etc. would allow
  the suggester to ignore such variations. (Robert Muir, Sudarshan
  Gaikaiwari, Mike McCandless)

* LUCENE-4446: Lucene 4.1 has a new default index format (Lucene41Codec)
  that incorporates the previously experimental "Block" postings format
  for better search performance.
  (Han Jiang, Adrien Grand, Robert Muir, Mike McCandless)

* LUCENE-3846: New FuzzySuggester, like AnalyzingSuggester except it
  also finds completions allowing for fuzzy edits in the input string.
  (Robert Muir, Simon Willnauer, Mike McCandless)

* LUCENE-4515: MemoryIndex now supports adding the same field multiple
  times.
  (Simon Willnauer)

* LUCENE-4489: Added consumeAllTokens option to LimitTokenCountFilter.
  (hossman, Robert Muir)

* LUCENE-4566: Add NRT/SearcherManager.RefreshListener/addListener to
  be notified whenever a new searcher is opened. (selckin via Shai
  Erera, Mike McCandless)

* SOLR-4123: Add per-script customizability to ICUTokenizerFactory via
  rule files in the ICU RuleBasedBreakIterator format.
  (Shawn Heisey, Robert Muir, Steve Rowe)

* LUCENE-4590: Added WriteEnwikiLineDocTask - a benchmark task for writing
  Wikipedia category pages and non-category pages into separate line files.
  extractWikipedia.alg was changed to use this task, so now it creates two
  files. (Doron Cohen)

* LUCENE-4290: Added PostingsHighlighter to the highlighter module. It uses
  offsets from the postings lists to highlight documents. (Robert Muir)

* LUCENE-4628: Added CommonTermsQuery that executes high-frequency terms
  in an optional sub-query to prevent slow queries due to "common" terms
  like stopwords. (Simon Willnauer)

API Changes

* LUCENE-4399: Deprecated AppendingCodec. Lucene's term dictionaries
  no longer seek when writing. (Adrien Grand, Robert Muir)

* LUCENE-4479: Rename TokenStream.getTokenStream(IndexReader, int, String)
  to TokenStream.getTokenStreamWithOffsets, and return null on failure
  rather than throwing IllegalArgumentException. (Alan Woodward)

* LUCENE-4472: MergePolicy now accepts a MergeTrigger that provides
  information about the trigger of the merge, i.e. whether the merge was
  triggered by a segment merge, a full flush, etc. (Simon Willnauer)

* LUCENE-4415: TermsFilter is now immutable. All terms need to be provided
  as constructor arguments.
  (Simon Willnauer)

* LUCENE-4520: ValueSource.getSortField no longer throws IOExceptions.
  (Alan Woodward)

* LUCENE-4537: RateLimiter is now separated from FSDirectory and exposed via
  RateLimitingDirectoryWrapper. Any Directory can now be rate-limited.
  (Simon Willnauer)

* LUCENE-4591: CompressingStoredFields{Writer,Reader} now accept a segment
  suffix as a constructor parameter. (Renaud Delbru via Adrien Grand)

* LUCENE-4605: Added DocsEnum.FLAG_NONE which can be passed instead of 0 as
  the flag to .docs() and .docsAndPositions(). (Shai Erera)

* LUCENE-4617: Remove FST.pack() method. Previously, to make a packed FST,
  you had to make a Builder with willPackFST=true (telling it you will later pack it),
  create your FST with finish(), and then call pack() to get another FST.
  Instead, just pass true for doPackFST to Builder and finish() returns a packed FST.
  (Robert Muir)

* LUCENE-4663: Deprecate IndexSearcher.document(int, Set). This was not intended
  to be final, nor named document(). Use IndexSearcher.doc(int, Set) instead.
  (Robert Muir)

* LUCENE-4684: Made DirectSpellChecker extensible.
  (Martijn van Groningen)

Bug Fixes

* LUCENE-1822: BaseFragListBuilder's hard-coded 6-char margin was too naive.
  (Alex Vigdor, Arcadius Ahouansou, Koji Sekiguchi)

* LUCENE-4468: Fix rare integer overflows in Lucene41 postings
  format. (Robert Muir)

* LUCENE-4486: Add support for ConstantScoreQuery in Highlighter.
  (Simon Willnauer)

* LUCENE-4485: When CheckIndex counts terms, term/doc pairs and tokens,
  these counts now all exclude deleted documents. (Mike McCandless)

* LUCENE-4479: Highlighter now works correctly for fields with term vector
  positions but no offsets.
  (Alan Woodward)

* SOLR-3906: JapaneseReadingFormFilter in romaji mode will return
  romaji even for out-of-vocabulary kana cases (e.g. half-width forms).
  (Robert Muir)

* LUCENE-4511: TermsFilter could return wrong results if a field is not
  indexed or doesn't exist in the index. (Simon Willnauer)

* LUCENE-4521: IndexWriter.tryDeleteDocument could return true
  (successfully deleting the document) but then on IndexWriter
  close/commit fail to write the new deletions, if no other changes
  happened in the IndexWriter instance. (Ivan Vasilev via Mike
  McCandless)

* LUCENE-4513: Fixed a bug where deleted nested docs were scored into the
  parent doc when using ToParentBlockJoinQuery. (Martijn van Groningen)

* LUCENE-4534: Fixed WFSTCompletionLookup and Analyzing/FuzzySuggester
  to allow 0 byte values in the lookup keys. (Mike McCandless)

* LUCENE-4532: DirectoryTaxonomyWriter used a timestamp to denote taxonomy
  index re-creation, which could cause a bug in case machine clocks were
  not synced. Instead, it now tracks an 'epoch' version, which is incremented
  whenever the taxonomy is re-created, or replaced. (Shai Erera)

* LUCENE-4544: Fixed off-by-1 in ConcurrentMergeScheduler that would
  allow 1+maxMergeCount merge threads to be created, instead of just
  maxMergeCount. (Radim Kolar, Mike McCandless)

* LUCENE-4567: Fixed NullPointerException in analyzing, fuzzy, and
  WFST suggesters when no suggestions were added. (selckin via Mike
  McCandless)

* LUCENE-4568: Fixed integer overflow in
  PagedBytes.PagedBytesData{In,Out}put.getPosition.
  (Adrien Grand)

* LUCENE-4581: GroupingSearch.setAllGroups(true) was failing to
  actually compute allMatchingGroups. (dizh@neusoft.com via Mike
  McCandless)

* LUCENE-4009: Improve TermsFilter.toString. (Tim Costermans via Chris
  Male, Mike McCandless)

* LUCENE-4588: Benchmark's EnwikiContentSource was discarding the last wiki
  document and had leaking threads in 'forever' mode. (Doron Cohen)

* LUCENE-4585: Spatial RecursivePrefixTreeFilter had some bugs that only
  occurred when shapes were indexed. In what appear to be rare circumstances,
  documents with shapes near a query shape were erroneously considered a match.
  In addition, it wasn't possible to index a shape representing the entire
  globe.

* LUCENE-4595: EnwikiContentSource had a thread safety problem (NPE) in
  'forever' mode. (Doron Cohen)

* LUCENE-4587: Fix WordBreakSpellChecker to not throw AIOOBE when presented
  with 2-char codepoints, and to correctly break/combine terms containing
  non-Latin characters. (James Dyer, Andreas Hubold)

* LUCENE-4596: Fix a concurrency bug in DirectoryTaxonomyWriter.
  (Shai Erera)

* LUCENE-4594: Spatial PrefixTreeStrategy would index center points in addition
  to the shape to index if it was non-point, in the same field. But sometimes
  the center point isn't actually in the shape (consider a LineString), and for
  highly precise shapes it could cause makeDistanceValueSource's cache to load
  parts of the shape's boundary erroneously too. So center points aren't
  indexed any more; you should use another spatial field. (David Smiley)

* LUCENE-4629: IndexWriter failed to delete documents if a document block was
  indexed and the Iterator threw an exception. Documents were only rolled back
  if the actual indexing process failed.
  (Simon Willnauer)

* LUCENE-4608: Handle a large number of requested fragments better.
  (Martijn van Groningen)

* LUCENE-4633: DirectoryTaxonomyWriter.replaceTaxonomy did not refresh its
  internal reader, which could cause an existing category to be added twice.
  (Shai Erera)

* LUCENE-4461: If you added the same FacetRequest more than once, you would get
  inconsistent results. (Gilad Barkai via Shai Erera)

* LUCENE-4656: Fix regression in IndexWriter to work with empty TokenStreams
  that have no TermToBytesRefAttribute (commonly provided by CharTermAttribute),
  e.g., oal.analysis.miscellaneous.EmptyTokenStream.
  (Uwe Schindler, Adrien Grand, Robert Muir)

* LUCENE-4660: ConcurrentMergeScheduler was taking too long to
  un-pause incoming threads it had paused when too many merges were
  queued up. (Mike McCandless)

* LUCENE-4662: Add missing elided articles and prepositions to FrenchAnalyzer's
  DEFAULT_ARTICLES list passed to ElisionFilter. (David Leunen via Steve Rowe)

* LUCENE-4671: Fix CharsRef.subSequence method. (Tim Smith via Robert Muir)

* LUCENE-4465: Let ConstantScoreQuery's Scorer return its child scorer.
  (selckin via Uwe Schindler)

Changes in Runtime Behavior

* LUCENE-4586: Change default ResultMode of FacetRequest to PER_NODE_IN_TREE.
  This only affects requests with depth>1. If you execute such requests and
  rely on the facet results being returned flat (i.e. no hierarchy), you should
  set the ResultMode to GLOBAL_FLAT. (Shai Erera, Gilad Barkai)

* LUCENE-1822: Improves the text window selection by recalculating the starting margin
  once all phrases in the fragment have been identified in FastVectorHighlighter. This
  way, if a single word is matched in a fragment, it will appear in the middle of the
  highlight, instead of 6 characters from the beginning.
  This way, one can also guarantee that short texts are represented in their entirety
  in a fragment by specifying a large enough fragCharSize.

Optimizations

* LUCENE-2221: oal.util.BitUtil was modified to use Long.bitCount and
  Long.numberOfTrailingZeros (which are intrinsics since Java 6u18) instead of
  pure Java bit twiddling routines in order to improve performance on modern
  JVMs/hardware. (Dawid Weiss, Adrien Grand)

* LUCENE-4509: Enable stored fields compression by default in the Lucene 4.1
  default codec. (Adrien Grand)

* LUCENE-4536: PackedInts on-disk format is now byte-aligned (it used to be
  long-aligned), saving up to 7 bytes per array of values.
  (Adrien Grand, Mike McCandless)

* LUCENE-4512: Additional memory savings for CompressingStoredFieldsFormat.
  (Adrien Grand, Robert Muir)

* LUCENE-4443: Lucene41PostingsFormat no longer writes unnecessary offsets
  into the skipdata. (Robert Muir)

* LUCENE-4459: Improve WeakIdentityMap.keyIterator() to remove GCed keys
  from the backing map early instead of waiting for reap(). This makes test
  failures in TestWeakIdentityMap disappear, too.
  (Uwe Schindler, Mike McCandless, Robert Muir)

* LUCENE-4473: Lucene41PostingsFormat encodes offsets more efficiently
  for low frequency terms (< 128 occurrences). (Robert Muir)

* LUCENE-4462: DocumentsWriter now flushes deletes and segment infos, and builds
  CFS files if necessary, during segment flush and not during publishing. The latter
  was a single-threaded process, whereas now all IO- and CPU-heavy computation is done
  concurrently in DocumentsWriterPerThread. (Simon Willnauer)

* LUCENE-4496: Optimize Lucene41PostingsFormat when requesting a subset of
  the postings data (via flags to TermsEnum.docs/docsAndPositions) to use
  ForUtil.skipBlock.
  (Robert Muir)

* LUCENE-4497: Don't write PosVIntCount to the positions file in
  Lucene41PostingsFormat, as it's always totalTermFreq % BLOCK_SIZE. (Robert Muir)

* LUCENE-4498: In Lucene41PostingsFormat, when a term appears in only one document,
  instead of writing a file pointer to a VIntBlock containing the doc id, just
  write the doc id. (Mike McCandless, Robert Muir)

* LUCENE-4515: MemoryIndex now uses Byte/IntBlockPool internally to hold terms and
  posting lists. All index data is represented as consecutive byte/int arrays to
  reduce GC cost and memory overhead. (Simon Willnauer)

* LUCENE-4538: DocValues now caches direct sources in a ThreadLocal exposed via SourceCache.
  Users of this API can now simply obtain an instance via DocValues#getDirectSource per thread.
  (Simon Willnauer)

* LUCENE-4580: DrillDown.query variants return a ConstantScoreQuery with boost set to 0.0f
  so that document scores are not affected by running a drill-down query. (Shai Erera)

* LUCENE-4598: PayloadIterator no longer uses a top-level IndexReader to iterate over the
  posting's payload. (Shai Erera, Michael McCandless)

* LUCENE-4661: Drop default maxThreadCount to 1 and maxMergeCount to 2
  in ConcurrentMergeScheduler, for faster merge performance on
  spinning-magnet drives. (Mike McCandless)

Documentation

* LUCENE-4483: Refer to BytesRef.deepCopyOf in Term's constructor that takes BytesRef.
  (Paul Elschot via Robert Muir)

Build

* LUCENE-4650: Upgrade randomized testing to version 2.0.8: make the
  test framework more robust under low-memory conditions.
  (Dawid Weiss)

* LUCENE-4603: Upgrade randomized testing to version 2.0.5: print forked
  JVM PIDs on heartbeat from hung tests. (Dawid Weiss)

* Upgrade randomized testing to version 2.0.4: avoid hangs caused by shutdown
  hooks hanging forever, by calling Runtime.halt() in addition to
  Runtime.exit() after a short delay that allows a graceful shutdown. (Dawid Weiss)

* LUCENE-4451: Memory leak per unique thread caused by
  RandomizedContext.contexts static map. Upgrade randomized testing
  to version 2.0.2. (Mike McCandless, Dawid Weiss)

* LUCENE-4589: Upgraded benchmark module's Nekohtml dependency to version
  1.9.17, removing the workaround in Lucene's HTML parser for the
  Turkish locale. (Uwe Schindler)

* LUCENE-4601: Fix Ivy availability check to use typefound, so it works
  if called from another build file. (Ryan Ernst via Robert Muir)


======================= Lucene 4.0.0 =======================

Changes in backwards compatibility policy

* LUCENE-4392: Class org.apache.lucene.util.SortedVIntList has been removed.
  (Adrien Grand)

* LUCENE-4393: RollingCharBuffer has been moved to the o.a.l.analysis.util
  package of lucene-analysis-common. (Adrien Grand)

New Features

* LUCENE-1888: Added the option to store payloads in the term
  vectors (IndexableFieldType.storeTermVectorPayloads()). Note
  that you must store term vector positions to store payloads.
  (Robert Muir)

* LUCENE-3892: Add a new BlockPostingsFormat that bulk-encodes docs,
  freqs and positions in large (size 128) packed-int blocks for faster
  search performance.
  This was from Han Jiang's 2012 Google Summer of
  Code project. (Han Jiang, Adrien Grand, Robert Muir, Mike McCandless)

* LUCENE-4323: Added support for an absolute maximum CFS segment size
  (in MiB) to LogMergePolicy and TieredMergePolicy.
  (Alexey Lef via Uwe Schindler)

* LUCENE-4339: Allow deletes against 3.x segments for easier upgrading.
  Lucene3x Codec is still otherwise read-only; you should not set it
  as the default Codec on IndexWriter, because it cannot write new segments.
  (Mike McCandless, Robert Muir)

* SOLR-3441: ElisionFilterFactory is now MultiTermAware.
  (Jack Krupansky via hossman)

API Changes

* LUCENE-4391, LUCENE-4440: All methods of Lucene40Codec but
  getPostingsFormatForField are now final. To reuse functionality
  of Lucene40, you should extend FilterCodec and delegate to Lucene40
  instead of extending Lucene40Codec. (Adrien Grand, Shai Erera,
  Robert Muir, Uwe Schindler)

* LUCENE-4299: Added Terms.hasPositions() and Terms.hasOffsets().
  Previously you had no real way to know that a term vector field
  had positions or offsets, since this can be configured on a
  per-field-per-document basis. (Robert Muir)

* Removed DocsAndPositionsEnum.hasPayload() and simplified the
  contract of getPayload(). It returns null if there is no payload,
  otherwise returns the current payload. You can now call it multiple
  times per position if you want. (Robert Muir)

* Removed FieldsEnum. The Fields API instead implements Iterable<String>
  and exposes an Iterator, so you can iterate over field names with
  for (String field : fields) instead. (Robert Muir)

* LUCENE-4152: Added IndexReader.leaves(), which lets you enumerate
  the leaf atomic reader contexts for all readers in the tree.
  (Uwe Schindler, Robert Muir)

* LUCENE-4304: Removed PayloadProcessorProvider.
  If you want to change
  payloads (or other things) when merging indexes, it's recommended
  to just use a FilterAtomicReader + IndexWriter.addIndexes. See the
  OrdinalMappingAtomicReader and TaxonomyMergeUtils in the facets
  module if you want an example of this.
  (Mike McCandless, Uwe Schindler, Shai Erera, Robert Muir)

* LUCENE-4304: Make CompositeReader.getSequentialSubReaders()
  protected. To get the atomic leaves of any IndexReader, use the new method
  leaves() (LUCENE-4152), which lists AtomicReaderContexts including
  the doc base of each leaf. (Uwe Schindler, Robert Muir)

* LUCENE-4307: Renamed IndexReader.getTopReaderContext to
  IndexReader.getContext. (Robert Muir)

* LUCENE-4316: Deprecate Fields.getUniqueTermCount and remove it from
  AtomicReader. If you really want the unique term count across all
  fields, just sum up Terms.size() across those fields. This method
  only exists so that this statistic can be accessed for Lucene 3.x
  segments, which don't support Terms.size(). (Uwe Schindler, Robert Muir)

* LUCENE-4321: Change CharFilter to extend Reader directly, as FilterReader
  overdelegates (read(), read(char[], int, int), skip, etc.). This made it
  hard to implement correct CharFilters. Instead, only close() is
  delegated by default: read(char[], int, int) and correct(int) are abstract
  so that it's obvious which methods you should implement. The protected
  inner Reader is 'input' like CharFilter in the 3.x series, instead of 'in'.
  (Dawid Weiss, Uwe Schindler, Robert Muir)

* LUCENE-3309: The expert FieldSelector API, used to load only certain
  fields in a stored document, has been replaced with the simpler
  StoredFieldVisitor API. (Mike McCandless)

* LUCENE-4343: Made Tokenizer.setReader final.
  This is a setter that should
  not be overridden by subclasses: per-stream initialization should happen
  in reset(). (Robert Muir)

* LUCENE-4377: Remove IndexInput.copyBytes(IndexOutput, long).
  Use DataOutput.copyBytes(DataInput, long) instead.
  (Mike McCandless, Robert Muir)

* LUCENE-4355: Simplify AtomicReader's sugar methods such as termDocsEnum,
  termPositionsEnum, docFreq, and totalTermFreq to only take Term as a
  parameter. If you want to do expert things such as pass a different
  Bits as liveDocs, then use the flex APIs (fields(), terms(), etc.) directly.
  (Mike McCandless, Robert Muir)

* LUCENE-4425: Clarify documentation of StoredFieldVisitor.binaryValue
  and simplify the API to binaryField(FieldInfo, byte[]).
  (Adrien Grand, Robert Muir)

Bug Fixes

* LUCENE-4423: DocumentStoredFieldVisitor.binaryField ignored offset and
  length. (Adrien Grand)

* LUCENE-4297: BooleanScorer2 would multiply the coord() factor
  twice for conjunctions: for most users this is no problem, but
  if you had a customized Similarity that returned something other
  than 1 when overlap == maxOverlap (always the case for conjunctions),
  then the score would be incorrect. (Pascal Chollet, Robert Muir)

* LUCENE-4298: MultiFields.getTermDocsEnum(IndexReader, Bits, String, BytesRef)
  did not work at all; it would recurse infinitely.
  (Alberto Paro via Robert Muir)

* LUCENE-4300: BooleanQuery's rewrite was not always safe: if you
  had a custom Similarity where coord(1,1) != 1F, then the rewritten
  query would be scored differently. (Robert Muir)

* Don't allow negative positions in the positions file. If you have an index
  from 2.4.0 or earlier with such negative positions, and you already
  upgraded to 3.x, then to Lucene 4.0-ALPHA or -BETA, you should run
  CheckIndex.
  If it fails, then you need to upgrade again to 4.0. (Robert Muir)

* LUCENE-4303: PhoneticFilterFactory and SnowballPorterFilterFactory now load their
  encoders/stemmers via the ResourceLoader instead of Class.forName().
  Solr users should no longer have to embed these in their war. (David Smiley)

* SOLR-3737: StempelPolishStemFilterFactory loaded its stemmer table incorrectly.
  Also, ensure immutability and use only one instance of this table in RAM (lazy
  loaded) since it's quite large. (sausarkar, Steven Rowe, Robert Muir)

* LUCENE-4310: MappingCharFilter was failing to match input strings
  containing non-BMP Unicode characters. (Dawid Weiss, Robert Muir,
  Mike McCandless)

* LUCENE-4224: Add an in-order scorer to query-time joining; the
  out-of-order scorer throws a UOE. (Martijn van Groningen, Robert Muir)

* LUCENE-4333: Fixed NPE in TermGroupFacetCollector when faceting on mv fields.
  (Jesse MacVicar, Martijn van Groningen)

* LUCENE-4218: Document.get(String) and Field.stringValue() again return
  values for numeric fields, like Lucene 3.x and consistent with the documentation.
  (Jamie, Uwe Schindler, Robert Muir)

* NRTCachingDirectory was always caching a newly flushed segment in
  RAM, instead of checking the estimated size of the segment
  to decide whether to cache it. (Mike McCandless)

* LUCENE-3720: Fix memory-consumption issues with BeiderMorseFilter.
  (Thomas Neidhart via Robert Muir)

* LUCENE-4401: Fix bug where DisjunctionSumScorer would sometimes call score()
  on a subscorer that had already returned NO_MORE_DOCS. (Liu Chao, Robert Muir)

* LUCENE-4411: When sampling was enabled for a FacetRequest, its depth
  parameter was reset to the default (1), even if set otherwise.
  (Gilad Barkai via Shai Erera)

* LUCENE-4455: Fix bug in SegmentInfoPerCommit.sizeInBytes() that was
  returning 2X the true size, inefficiently. Also fixed a bug in
  CheckIndex that would report no deletions when a segment had
  deletions, and vice versa. (Uwe Schindler, Robert Muir, Mike McCandless)

* LUCENE-4456: Fixed double-counting sizeInBytes for a segment
  (affects how merge policies pick merges); fixed CheckIndex's
  incorrect reporting of whether a segment has deletions; fixed case
  where on abort Lucene could remove files it didn't create; fixed
  many cases where IndexWriter could leave leftover files (on
  exception in various places, on reuse of a segment name after crash
  and recovery). (Uwe Schindler, Robert Muir, Mike McCandless)

Optimizations

* LUCENE-4322: Decrease lucene-core JAR size. The core JAR size had increased a
  lot because of generated code introduced in LUCENE-4161 and LUCENE-3892.
  (Adrien Grand)

* LUCENE-4317: Improve reuse of internal TokenStreams and StringReader
  in oal.document.Field. (Uwe Schindler, Chris Male, Robert Muir)

* LUCENE-4327: Support out-of-order scoring in FilteredQuery for higher
  performance. (Mike McCandless, Robert Muir)

* LUCENE-4364: Optimize MMapDirectory to not make a mapping per CFS slice,
  but instead one map per .cfs file. This reduces the total number of maps.
  Additionally, factor out a (package-private) generic
  ByteBufferIndexInput from MMapDirectory. (Uwe Schindler, Robert Muir)

Build

* LUCENE-4406, LUCENE-4407: Upgrade to randomizedtesting 2.0.1.
  Workaround for broken test output XMLs due to non-XML text Unicode
  chars in strings.
  Added printing of failed tests at the end of a
  test run. (Dawid Weiss)

* LUCENE-4252: Detect/fail tests when they leak RAM in static fields.
  (Robert Muir, Dawid Weiss)

* LUCENE-4360: Support running the same test suite multiple times in
  parallel. (Dawid Weiss)

* LUCENE-3985: Upgrade to randomizedtesting 2.0.0. Added support for
  thread leak detection. Added support for suite timeouts. (Dawid Weiss)

* LUCENE-4354: Corrected Maven dependencies to be consistent with
  the licenses/ folder and the binary release. Some had different
  versions or additional unnecessary dependencies. (selckin via Robert Muir)

* LUCENE-4340: Move all non-default codec, postings format and terms
  dictionary implementations to lucene/codecs. (Adrien Grand)

Documentation

* LUCENE-4302: Fix facet userguide to have HTML loose doctype like
  all other javadocs. (Karl Nicholas via Uwe Schindler)

======================= Lucene 4.0.0-BETA =======================

New Features

* LUCENE-4249: Changed the explanation of the PayloadTermWeight to use the
  underlying PayloadFunction's explanation as the explanation
  for the payload score. (Scott Smerchek via Robert Muir)

* LUCENE-4069: Added BloomFilteringPostingsFormat for use with low-frequency terms
  such as primary keys. (Mark Harwood, Mike McCandless)

* LUCENE-4201: Added JapaneseIterationMarkCharFilter to normalize Japanese
  iteration marks. (Robert Muir, Christian Moen)

* LUCENE-3832: Added BasicAutomata.makeStringUnion method to efficiently
  create automata from a fixed collection of UTF-8 encoded BytesRefs.
  (Dawid Weiss, Robert Muir)

* LUCENE-4153: Added an option to fast vector highlighting (via BaseFragmentsBuilder) to
  respect field boundaries when highlighting multivalued fields.
  (Martijn van Groningen)

* LUCENE-4227: Added DirectPostingsFormat, to hold all postings in
  memory as uncompressed simple arrays. This uses a tremendous amount
  of RAM but gives good search performance gains. (Mike McCandless)

* LUCENE-2510, LUCENE-4044: Migrated Solr's Tokenizer-, TokenFilter-, and
  CharFilterFactories to the lucene-analysis module. The API is still
  experimental. (Chris Male, Robert Muir, Uwe Schindler)

* LUCENE-4230: When pulling a DocsAndPositionsEnum, you can now
  specify whether or not you require payloads (in addition to
  offsets); turning one or both off may allow some codec
  implementations to optimize the enum implementation. (Robert Muir,
  Mike McCandless)

* LUCENE-4203: Add IndexWriter.tryDeleteDocument(AtomicReader reader,
  int docID), to attempt deletion by docID as long as the provided
  reader is an NRT reader, and the segment has not yet been merged
  away. (Mike McCandless)

* LUCENE-4286: Added option to CJKBigramFilter to always also output
  unigrams. This can be used for a unigram+bigram approach, or at
  index-time only for better support of short queries.
  (Tom Burton-West, Robert Muir)

API Changes

* LUCENE-4138: Update of morfologik (Polish morphological analyzer) to 1.5.3.
  The tag attribute class has been renamed to MorphosyntacticTagsAttribute and
  has a different API (carries a list of tags instead of a compound tag). Upgrade
  of embedded morfologik dictionaries to version 1.9. (Dawid Weiss)

* LUCENE-4178: Set 'tokenized' to true on FieldType by default, so that if you
  make a custom FieldType and set indexed = true, it's analyzed by the analyzer.
  (Robert Muir)

* LUCENE-4220: Removed the buggy JavaCC-based HTML parser in the benchmark
  module and replaced it with NekoHTML.
  The HTMLParser interface was cleaned up while
  changing method signatures. (Uwe Schindler, Robert Muir)

* LUCENE-2191: Rename Tokenizer.reset(Reader) to Tokenizer.setReader(Reader).
  The purpose of this method was always to set a new Reader on the Tokenizer,
  reusing the object. But the name was often confused with TokenStream.reset().
  (Robert Muir)

* LUCENE-4228: Refactored CharFilter to extend java.io.FilterReader. CharFilters
  filter another reader, and you override correct() for offset correction.
  (Robert Muir)

* LUCENE-4240: Analyzer API now just takes fieldName for getOffsetGap. If the
  field is not analyzed (e.g. StringField), then the analyzer is not invoked
  at all. If you want to tweak things like positionIncrementGap and offsetGap,
  analyze the field with KeywordTokenizer instead. (Grant Ingersoll, Robert Muir)

* LUCENE-4250: Pass fieldName to the PayloadFunction explain method, so it
  parallels docScore and the default implementation is correct.
  (Robert Muir)

* LUCENE-3747: Support Unicode 6.1.0. (Steve Rowe)

* LUCENE-3884: Moved ElisionFilter out of the org.apache.lucene.analysis.fr
  package into org.apache.lucene.analysis.util. (Robert Muir)

* LUCENE-4230: When pulling a DocsAndPositionsEnum you now pass an int
  flags argument instead of the previous boolean needOffsets. Currently
  recognized flags are DocsAndPositionsEnum.FLAG_PAYLOADS and
  DocsAndPositionsEnum.FLAG_OFFSETS. (Robert Muir, Mike McCandless)

* LUCENE-4273: When pulling a DocsEnum, you can pass an int flags argument
  instead of the previous boolean needsFreqs; consistent with the changes
  for DocsAndPositionsEnum in LUCENE-4230. Currently the only flag
  is DocsEnum.FLAG_FREQS.
  (Robert Muir, Mike McCandless)

* LUCENE-3616: TextField(String, Reader, Store) was reduced to TextField(String, Reader),
  as the Store parameter didn't make sense: if you supplied Store.YES, you would only
  receive an exception anyway. (Robert Muir)

Optimizations

* LUCENE-4171: Performance improvements to Packed64.
  (Toke Eskildsen via Adrien Grand)

* LUCENE-4184: Performance improvements to the aligned packed bits impl.
  (Toke Eskildsen, Adrien Grand)

* LUCENE-4235: Remove enforcing of Filter rewrite for NRQ queries.
  (Uwe Schindler)

* LUCENE-4279: Regenerated snowball Stemmers from snowball r554,
  making them substantially more lightweight. Behavior is unchanged.
  (Robert Muir)

* LUCENE-4291: Reduced internal buffer size for JFlex-based tokenizers
  such as StandardTokenizer from 32kb to 8kb.
  (Raintung Li, Steven Rowe, Robert Muir)

Bug Fixes

* LUCENE-4109: BooleanQueries were not parsed correctly with the
  flexible query parser. (Karsten Rauch via Robert Muir)

* LUCENE-4176: Fix AnalyzingQueryParser to analyze range endpoints as bytes,
  so that it works correctly with Analyzers that produce binary non-UTF-8 terms
  such as CollationAnalyzer. (Nattapong Sirilappanich via Robert Muir)

* LUCENE-4209: Fix FSTCompletionLookup to close its sorter, so that it won't
  leave temp files behind in /tmp. Fix SortedTermFreqIteratorWrapper to not
  leave temp files behind in /tmp on Windows. Fix Sort to not leave
  temp files behind when /tmp is a separate volume. (Uwe Schindler, Robert Muir)

* LUCENE-4221: Fix overeager CheckIndex validation for term vector offsets.
  (Robert Muir)

* LUCENE-4222: TieredMergePolicy.getFloorSegmentMB was returning the
  size in bytes, not MB. (Chris Fuller via Mike McCandless)

* LUCENE-3505: Fix bug (Lucene 4.0alpha only) where boolean conjunctions
  were sometimes scored incorrectly. Conjunctions of only TermQueries where
  at least one term omitted term frequencies (IndexOptions.DOCS_ONLY) would
  be scored as if all terms omitted term frequencies. (Robert Muir)

* LUCENE-2686, LUCENE-3505: Fixed BooleanQuery scorers to return correct
  freq(). Added support for the scorer navigation API (Scorer.getChildren) to
  all queries. Made Scorer.freq() abstract.
  (Koji Sekiguchi, Mike McCandless, Robert Muir)

* LUCENE-4234: Fixed an exception when FacetsCollector is used with
  ScoreFacetRequest and the number of matching documents is too large.
  (Gilad Barkai via Shai Erera)

* LUCENE-4245: Make IndexWriter#close() and MergeScheduler#close()
  non-interruptible. (Mark Miller, Uwe Schindler)

* LUCENE-4190: Restrict allowed filenames that a codec may create to
  the patterns recognized by IndexFileNames. This also fixes
  IndexWriter to only delete files matching this pattern from an index
  directory, to reduce risk when the wrong index path is accidentally
  passed to IndexWriter. (Robert Muir, Mike McCandless)

* LUCENE-4277: Fix IndexWriter deadlock during rollback if flushable DWPT
  instances are already checked out and queued up but not yet flushed.
  (Simon Willnauer)

* LUCENE-4282: Automaton FuzzyQuery didn't always deliver all results.
  (Johannes Christen, Uwe Schindler, Robert Muir)

* LUCENE-4289: Fix minor idf inconsistencies/inefficiencies in the highlighter.
  (Robert Muir)

Changes in Runtime Behavior

* LUCENE-4109: Enable position increments in the flexible queryparser by default.
  (Karsten Rauch via Robert Muir)

* LUCENE-3616: Field throws an exception if you try to set a boost on an
  unindexed field or one that omits norms. (Robert Muir)

Build

* LUCENE-4094: Support overriding file.encoding on forked test JVMs
  (force via -Drandomized.file.encoding=XXX). (Dawid Weiss)

* LUCENE-4189: Test output should include timestamps (start/end for each
  test/suite). Added -Dtests.timestamps=[off by default]. (Dawid Weiss)

* LUCENE-4110: Report long periods of forked JVM inactivity (hung tests/suites).
  Added -Dtests.heartbeat=[seconds] with the default of 60 seconds.
  (Dawid Weiss)

* LUCENE-4160: Added a property to quit the tests after a given
  number of failures has occurred. This is useful in combination
  with -Dtests.iters=N (you can start N iterations and wait for M
  failures, in particular M = 1). The property is -Dtests.maxfailures=M.
  Alternatively, specify -Dtests.failfast=true to skip all tests after
  the first failure.
  (Dawid Weiss)

* LUCENE-4115: JAR resolution/cleanup should be done automatically for ant
  clean/eclipse/resolve. (Dawid Weiss)

* LUCENE-4199, LUCENE-4202, LUCENE-4206: Add a new target "check-forbidden-apis"
  that parses all generated .class files for use of APIs that use the default
  charset, default locale, or default timezone and fails the build if violations
  are found. This ensures that Lucene/Solr is independent of local configuration
  options. (Uwe Schindler, Robert Muir, Dawid Weiss)

* LUCENE-4217: Add the possibility to run tests with Atlassian Clover
  loaded from Ivy. A development license solely for Apache code was added in
  the tools/ folder, but is not included in releases.
  (Uwe Schindler)

Documentation

* LUCENE-4195: Added package documentation and examples for
  org.apache.lucene.codecs. (Alan Woodward via Robert Muir)

======================= Lucene 4.0.0-ALPHA =======================

More information about this release, including any errata related to the
release notes, upgrade instructions, or other changes may be found online at:
  https://wiki.apache.org/lucene-java/Lucene4.0

For "contrib" changes prior to 4.0, please see:
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_6_0/lucene/contrib/CHANGES.txt

Changes in backwards compatibility policy

* LUCENE-1458, LUCENE-2111, LUCENE-2354: Changes from flexible indexing:

  - On upgrading to 4.0, if you do not fully reindex your documents,
    Lucene will emulate the new flex API on top of the old index,
    incurring some performance cost (up to ~10% slowdown, typically).
    To prevent this slowdown, use oal.index.IndexUpgrader
    to upgrade your indexes to the latest file format (LUCENE-3082).

  - Mixed flex/pre-flex indexes are perfectly fine -- the two
    emulation layers (flex API on pre-flex index, and pre-flex API on
    flex index) will remap the access as required. So on upgrading to
    4.0 you can start indexing new documents into an existing index.
    To get optimal performance, use oal.index.IndexUpgrader
    to upgrade your indexes to the latest file format (LUCENE-3082).

  - The postings APIs (TermEnum, TermDocs, TermPositions)
    have been removed in favor of the new flexible
    indexing (flex) APIs (Fields, FieldsEnum, Terms, TermsEnum,
    DocsEnum, DocsAndPositionsEnum). One big difference is that fields
    and terms are now enumerated separately: a TermsEnum provides a
    BytesRef (wraps a byte[]) per term within a single field, not a
    Term.
    Another is that when asking for a DocsEnum/DocsAndPositionsEnum, you
    now specify the skipDocs explicitly (typically this will be the
    deleted docs, but in general you can provide any Bits).

  - The term vectors APIs (TermFreqVector, TermPositionVector,
    TermVectorMapper) have been removed in favor of the above
    flexible indexing APIs, presenting a single-document inverted
    index of the document from the term vectors.

  - MultiReader ctor now throws IOException.

  - Directory.copy/Directory.copyTo now copies all files (not just
    index files), since what is and isn't an index file is now
    dependent on the codecs used.

  - UnicodeUtil now uses BytesRef for UTF-8 output, and some method
    signatures have changed to CharSequence. These are internal APIs
    and subject to change suddenly.

  - Positional queries (PhraseQuery, *SpanQuery) will now throw an
    exception if you use them on a field that omits positions during
    indexing (previously they silently returned no results).

  - FieldCache.{Byte,Short,Int,Long,Float,Double}Parser's API has
    changed -- each parse method now takes a BytesRef instead of a
    String. If you have an existing Parser, a simple way to fix it is
    to invoke BytesRef.utf8ToString, and pass that String to your
    existing parser. This will work, but performance would be better
    if you could fix your parser to instead operate directly on the
    byte[] in the BytesRef.

  - The internal (experimental) API of NumericUtils changed completely
    from String to BytesRef. Client code should never use this class,
    so the change will normally not affect you. If you used some of
    the methods to inspect terms or create TermQueries out of
    prefix-encoded terms, change to use BytesRef. Please note:
    Do not use TermQueries to search for single numeric terms.
    The recommended way is to create a corresponding NumericRangeQuery
    with upper and lower bounds equal and inclusive. TermQueries do not
    score correctly, so the constant score mode of NRQ is the only
    correct way to handle single value queries.

  - NumericTokenStream now works directly on byte[] terms. If you
    plug a TokenFilter on top of this stream, you will likely get
    an IllegalArgumentException, because the NTS does not support
    TermAttribute/CharTermAttribute. If you want to further filter
    or attach Payloads to NTS, use the new NumericTermAttribute.

  (Mike McCandless, Robert Muir, Uwe Schindler, Mark Miller, Michael Busch)

* LUCENE-2858, LUCENE-3733: IndexReader was refactored into abstract
  AtomicReader, CompositeReader, and DirectoryReader. To open
  Directory-based indexes, use DirectoryReader.open(); the corresponding
  method in IndexReader is now deprecated for easier migration. Only
  DirectoryReader supports commits, versions, and reopening with
  openIfChanged(). Terms, postings, docvalues, and norms can from now on
  only be retrieved using AtomicReader; DirectoryReader and MultiReader
  extend CompositeReader, only offering stored fields and access to the
  sub-readers (which may be composite or atomic). SlowCompositeReaderWrapper
  (LUCENE-2597) can be used to emulate atomic readers on top of composites.
  Please review MIGRATE.txt for information on how to migrate old code.
  (Uwe Schindler, Robert Muir, Mike McCandless)

* LUCENE-2265: FuzzyQuery and WildcardQuery now operate on Unicode codepoints,
  not UTF-16 code units. For example, a wildcard "?" represents any Unicode
  character. Furthermore, the rest of the automaton package and RegexpQuery use
  true Unicode codepoint representation.
  (Robert Muir, Mike McCandless)

* LUCENE-2380: The String-based FieldCache methods (getStrings,
  getStringIndex) have been replaced with BytesRef-based equivalents
  (getTerms, getTermsIndex). Also, the sort values (returned in
  FieldDoc.fields) when sorting by SortField.STRING or
  SortField.STRING_VAL are now BytesRef instances. See MIGRATE.txt
  for more details. (yonik, Mike McCandless)

* LUCENE-2480: Though not a change in backwards compatibility policy, pre-3.0
  indexes are no longer supported. You should upgrade to 3.x first, then run
  optimize(), or reindex. (Shai Erera, Earwin Burrfoot)

* LUCENE-2484: Removed deprecated TermAttribute. Use CharTermAttribute
  and TermToBytesRefAttribute instead. (Uwe Schindler)

* LUCENE-2600: Removed IndexReader.isDeleted in favor of
  AtomicReader.getDeletedDocs(). (Mike McCandless)

* LUCENE-2667: FuzzyQuery's defaults have changed for more performant
  behavior: the minimum similarity is now an edit distance of 2 from the word,
  and the priority queue size is 50. To support this, FuzzyQuery now allows
  specifying unscaled edit distances (foobar~2). If your application depends
  upon the old defaults of 0.5 (scaled) minimum similarity and Integer.MAX_VALUE
  priority queue size, you can use FuzzyQuery(Term, float, int, int) to specify
  those explicitly.

* LUCENE-2674: MultiTermQuery.TermCollector.collect now accepts the
  TermsEnum as well. (Robert Muir, Mike McCandless)

* LUCENE-588: WildcardQuery and QueryParser now allow escaping with
  the '\' character. Previously this was impossible (you could not escape * or ?,
  for example). If your code somehow depends on the old behavior, you will
  need to change it (e.g. using "\\" to escape '\' itself).
  (Sunil Kamath, Terry Yang via Robert Muir)

* LUCENE-2837: Collapsed Searcher and Searchable into IndexSearcher;
  removed contrib/remote and MultiSearcher (Mike McCandless); absorbed
  ParallelMultiSearcher into IndexSearcher as an optional
  ExecutorService passed to its ctor. (Mike McCandless)

* LUCENE-2908, LUCENE-4037: Removed serialization code from Lucene classes.
  It is recommended that you serialize user search needs at a higher level
  in your application.
  (Robert Muir, Benson Margulies)

* LUCENE-2831: Changed Weight#scorer, Weight#explain & Filter#getDocIdSet to
  operate on an AtomicReaderContext instead of directly on IndexReader to enable
  searches to be aware of IndexSearcher's context. (Simon Willnauer)

* LUCENE-2839: Scorer#score(Collector,int,int) is now public because it is
  called from other classes and is part of the public API. (Uwe Schindler)

* LUCENE-2865: Weight#scorer(AtomicReaderContext, boolean, boolean) now accepts
  a ScorerContext struct instead of booleans. (Simon Willnauer)

* LUCENE-2882: Cut over SpanQuery#getSpans to AtomicReaderContext to enforce
  per-segment semantics on SpanQuery & Spans. (Simon Willnauer)

* LUCENE-2236: Similarity can now be configured on a per-field basis. See the
  migration notes in MIGRATE.txt for more details. (Robert Muir, Doron Cohen)

* LUCENE-2315: AttributeSource's methods for accessing attributes are now final,
  since otherwise it's easy to corrupt the internal state. (Uwe Schindler)

* LUCENE-2814: The IndexWriter.flush method no longer takes a "boolean
  flushDocStores" argument, as we now always flush doc stores (index
  files holding stored fields and term vectors) while flushing a
  segment. (Mike McCandless)

* LUCENE-2548: Field names (e.g. in Term, FieldInfo) are no longer
  interned.
  (Mike McCandless)

* LUCENE-2883: The contents of o.a.l.search.function have been consolidated into
  the queries module and can be found at o.a.l.queries.function. See
  MIGRATE.txt for more information. (Chris Male)

* LUCENE-2392, LUCENE-3299: Decoupled vector space scoring from
  Query/Weight/Scorer. If you extended Similarity directly before, you should
  extend TFIDFSimilarity instead. Similarity is now a lower-level API to
  implement other scoring algorithms. See MIGRATE.txt for more details.
  (David Nemeskey, Simon Willnauer, Mike McCandless, Robert Muir)

* LUCENE-3330: The expert visitor API in Scorer has been simplified and
  extended to support arbitrary relationships. To navigate to a scorer's
  children, call Scorer.getChildren(). (Robert Muir)

* LUCENE-2308: Field is now instantiated with an instance of IndexableFieldType,
  of which there is a core implementation FieldType. Most properties
  describing a Field have been moved to IndexableFieldType. See MIGRATE.txt
  for more details. (Nikola Tankovic, Mike McCandless, Chris Male)

* LUCENE-3396: ReusableAnalyzerBase.TokenStreamComponents.reset(Reader) now
  returns void instead of boolean. If a component cannot be reset, it should
  throw an Exception. (Chris Male)

* LUCENE-3396: ReusableAnalyzerBase has been renamed to Analyzer. All Analyzer
  implementations must now use Analyzer.TokenStreamComponents, rather than
  overriding .tokenStream() and .reusableTokenStream() (which are now final).
  (Chris Male)

* LUCENE-3346: Analyzer.reusableTokenStream() has been renamed to tokenStream()
  with the old tokenStream() method removed. Consequently, it is now mandatory
  for all Analyzers to support reusability. (Chris Male)

* LUCENE-3473: AtomicReader.getUniqueTermCount() no longer throws
  UnsupportedOperationException when it cannot be easily determined.
  Instead, it returns -1 to be consistent with
  the behavior of other index statistics.
  (Robert Muir)

* LUCENE-1536: The abstract FilteredDocIdSet.match() method is no longer
  allowed to throw IOException. This change was required to make it conform
  to the Bits interface. This method should never do I/O for performance reasons.
  (Mike McCandless, Uwe Schindler, Robert Muir, Chris Male, Yonik Seeley,
  Jason Rutherglen, Paul Elschot)

* LUCENE-3559: The methods "docFreq" and "maxDoc" on IndexSearcher were removed,
  as these are no longer used by the scoring system. See MIGRATE.txt for more
  details. (Robert Muir)

* LUCENE-3533: Removed SpanFilters; they created large lists of objects and
  did not scale. (Robert Muir)

* LUCENE-3606: IndexReader and subclasses were made read-only. It is no longer
  possible to delete or undelete documents using IndexReader; you have to use
  IndexWriter now. As deleting by internal Lucene docID is no longer possible,
  this requires adding a unique identifier field to your index. Deleting by or
  relying upon Lucene docIDs is not recommended anyway, because they can
  change. Consequently, commit() was removed, and DirectoryReader.open() and
  openIfChanged() no longer take readOnly booleans or IndexDeletionPolicy
  instances. Furthermore, IndexReader.setNorm() was removed. If you need
  customized norm values, the recommended way to do this is by modifying
  Similarity to use an external byte[] or one of the new DocValues
  fields (LUCENE-3108). Alternatively, to dynamically change norms (boost
  *and* length norm) at query time, wrap your AtomicReader using
  FilterAtomicReader, overriding FilterAtomicReader.norms(). To persist the
  changes on disk, copy the filtered reader to a new index using
  IndexWriter.addIndexes().
  (Uwe Schindler, Robert Muir)

* LUCENE-3640: Removed IndexSearcher.close(), because IndexSearcher no longer
  takes a Directory and no longer "manages" IndexReaders; it was a no-op.
  (Robert Muir)

* LUCENE-3684: Add offsets into DocsAndPositionsEnum, and a new
  FieldInfo.IndexOption: DOCS_AND_POSITIONS_AND_OFFSETS.
  (Robert Muir, Mike McCandless)

* LUCENE-2858, LUCENE-3770: FilterIndexReader was renamed to
  FilterAtomicReader and now extends AtomicReader. If you want to filter
  composite readers like DirectoryReader or MultiReader, filter their
  atomic leaves and build a new CompositeReader (e.g. MultiReader) around
  them. (Uwe Schindler, Robert Muir)

* LUCENE-3736: ParallelReader was split into ParallelAtomicReader
  and ParallelCompositeReader. Lucene 3.x's ParallelReader is now
  ParallelAtomicReader, but the new composite variant has improved performance
  as it works on the atomic subreaders. It requires that all parallel
  composite readers have the same subreader structure. If you cannot provide this,
  you can use SlowCompositeReaderWrapper to make all parallel readers atomic
  and use ParallelAtomicReader. (Uwe Schindler, Mike McCandless, Robert Muir)

* LUCENE-2000: clone() now returns covariant types where possible. (ryan)

* LUCENE-3970: Rename Fields.getUniqueFieldCount -> .size() and
  Terms.getUniqueTermCount -> .size(). (Iulius Curt via Mike McCandless)

* LUCENE-3514: IndexSearcher.setDefaultFieldSortScoring was removed
  and replaced with per-search control via new expert search methods
  that take two booleans indicating whether hit scores and max
  score should be computed. (Mike McCandless)

* LUCENE-4055: You can't put foreign files into the index dir anymore.

* LUCENE-3866: CompositeReader.getSequentialSubReaders() now returns an
  unmodifiable List<? extends IndexReader>.
  ReaderUtil.Gather was
  removed, as IndexReaderContext.leaves() is now the preferred way
  to access sub-readers. (Uwe Schindler)

* LUCENE-4155: The oal.util.ReaderUtil, TwoPhaseCommit, and TwoPhaseCommitTool
  classes were moved to the oal.index package. The oal.util.CodecUtil class was
  moved to the oal.codecs package. oal.util.DummyConcurrentLock was removed
  (no longer used in Lucene 4.0). (Uwe Schindler)

Changes in Runtime Behavior

* LUCENE-2846: omitNorms now behaves like omitTermFrequencyAndPositions: if you
  set omitNorms(true) for field "a" for 1000 documents, but then add a document
  with omitNorms(false) for field "a", all documents for field "a" will have no
  norms. Previously, Lucene would fill the first 1000 documents with
  "fake norms" from Similarity.getDefault(). (Robert Muir, Mike McCandless)

* LUCENE-2846: When some documents contain field "a", and others do not, the
  documents that don't have the field get a norm byte value of 0. Previously,
  Lucene would populate "fake norms" with Similarity.getDefault() for these
  documents. (Robert Muir, Mike McCandless)

* LUCENE-2720: IndexWriter throws IndexFormatTooOldException on open, rather
  than later when e.g. a merge starts.
  (Shai Erera, Mike McCandless, Uwe Schindler)

* LUCENE-2881: FieldInfos is now tracked per segment. Before, it was tracked
  per IndexWriter session, which resulted in FieldInfos that had the FieldInfo
  properties from all previous segments combined. Field numbers are now tracked
  globally across IndexWriter sessions and persisted into a _X.fnx file on
  successful commit. The corresponding file format changes are
  backwards-compatible.
  (Michael Busch, Simon Willnauer)

* LUCENE-2956, LUCENE-2573, LUCENE-2324, LUCENE-2555: Changes from
  DocumentsWriterPerThread:

  - IndexWriter now uses a DocumentsWriter per thread when indexing documents.
    Each DocumentsWriterPerThread indexes documents in its own private segment,
    and the in-memory segments are no longer merged on flush. Instead, each
    segment is separately flushed to disk and subsequently merged with normal
    segment merging.

  - DocumentsWriterPerThread (DWPT) is now flushed concurrently based on a
    FlushPolicy. When a DWPT is flushed, a fresh DWPT is swapped in so that
    indexing may continue concurrently with flushing. The selected
    DWPT flushes all its RAM-resident documents to disk. Note: segment flushes
    don't flush all RAM-resident documents, but only the documents private to
    the DWPT selected for flushing.

  - Flushing is now controlled by a FlushPolicy that is called for every add,
    update or delete on IndexWriter. By default, DWPTs are flushed based either
    on maxBufferedDocs per DWPT or on the global active memory usage. Once the
    active memory exceeds ramBufferSizeMB, only the largest DWPT is selected for
    flushing, and the memory used by this DWPT is subtracted from the active
    memory and added to a flushing memory pool, which can lead to temporarily
    higher memory usage due to ongoing indexing.

  - IndexWriter can now use a ramBufferSize > 2048 MB. Each DWPT can address
    up to 2048 MB of memory, so the ramBufferSize is now bounded by the maximum
    number of DWPTs available in the DocumentsWriterPerThreadPool in use.
    IndexWriter's net memory consumption can grow far beyond the 2048 MB limit
    if the application can use all available DWPTs. To prevent a DWPT from
    exhausting its address space, IndexWriter will forcefully flush a DWPT if its
    hard memory limit is exceeded.
    The RAMPerThreadHardLimitMB can be controlled
    via IndexWriterConfig and defaults to 1945 MB.
    Since IndexWriter flushes DWPTs concurrently, not all memory is released
    immediately. Applications should still use a ramBufferSize significantly
    lower than the JVM's available heap memory, since under high load multiple
    flushing DWPTs can consume substantial transient memory when I/O performance
    is slow relative to the indexing rate.

  - IndexWriter#commit no longer blocks concurrent indexing while flushing all
    currently RAM-resident documents to disk. However, flushes that occur while
    a full flush is running are queued and will happen after all DWPTs involved
    in the full flush are done flushing. Applications that use multiple threads
    during indexing and trigger a full flush (e.g. by calling commit() or opening
    a new NRT reader) can use significantly more transient memory.

  - IndexWriter#addDocument and IndexWriter#updateDocument can block indexing
    threads if the number of active plus flushing DWPTs exceeds a
    safety limit. By default this happens if 2 * the maximum number of available
    thread states (DWPTPool) is exceeded. This safety limit prevents applications
    from exhausting their available memory if flushing can't keep up with
    concurrently indexing threads.

  - IndexWriter only applies and flushes deletes if the maxBufferedDelTerms
    limit is reached during indexing. No segment flushes will be triggered
    due to this setting.

  - IndexWriter#flush(boolean, boolean) no longer synchronizes on IndexWriter.
    A dedicated flushLock has been introduced to prevent multiple full
    flushes from happening concurrently.

  - DocumentsWriter no longer writes shared doc stores.

  (Mike McCandless, Michael Busch, Simon Willnauer)

* LUCENE-3309: Stored fields no longer record whether they were
  tokenized or not.
  In general you should not rely on stored fields
  to record any "metadata" from indexing (tokenized, omitNorms,
  IndexOptions, boost, etc.). (Mike McCandless)

* LUCENE-3309: The fast vector highlighter now inserts the
  MultiValuedSeparator for NOT_ANALYZED fields (in addition to
  ANALYZED fields). To ensure your offsets are correct, you should
  provide an analyzer that returns 1 from the offsetGap method.
  (Mike McCandless)

* LUCENE-2621: Removed contrib/instantiated. (Robert Muir)

* LUCENE-1768: StandardQueryTreeBuilder no longer uses RangeQueryNodeBuilder
  for RangeQueryNodes, since these two classes were removed;
  TermRangeQueryNodeProcessor now creates TermRangeQueryNode
  instead of RangeQueryNode; the same applies to numeric nodes.
  (Vinicius Barros via Uwe Schindler)

* LUCENE-3455: QueryParserBase.newFieldQuery() will throw a ParseException if
  any of the calls to the Analyzer throw an IOException. QueryParserBase.analyzeRangePart()
  will throw a RuntimeException if an IOException is thrown by the Analyzer.

* LUCENE-4127: IndexWriter will now throw IllegalArgumentException if
  the first token of an indexed field has a positionIncrement of 0
  (previously it silently corrected it to 1, possibly masking bugs).
  OffsetAttributeImpl will throw IllegalArgumentException if endOffset
  is less than startOffset, or if offsets are negative.
  (Robert Muir, Mike McCandless)

API Changes

* LUCENE-2302, LUCENE-1458, LUCENE-2111, LUCENE-2514: Terms are no longer
  required to be character-based. Lucene views a term as an arbitrary byte[]:
  during analysis, character-based terms are converted to UTF-8 byte[],
  but analyzers are free to directly create terms as byte[]
  (NumericField does this, for example).
  The term data is buffered as
  byte[] during indexing, written as byte[] into the terms dictionary,
  and iterated as byte[] (wrapped in a BytesRef) by IndexReader for
  searching.

* LUCENE-1458, LUCENE-2111: AtomicReader now directly exposes its
  deleted docs (getDeletedDocs), providing a new Bits interface to
  directly query by doc ID.

* LUCENE-2691: IndexWriter.getReader() has been made package-private and is now
  exposed via open and reopen methods on DirectoryReader. The semantics of the
  call are the same as they were prior to the API change.
  (Grant Ingersoll, Mike McCandless)

* LUCENE-2566: QueryParser: Unary operators +, -, and ! will not be treated as
  operators if they are followed by whitespace. (yonik)

* LUCENE-2831: Weight#scorer, Weight#explain, Filter#getDocIdSet,
  Collector#setNextReader & FieldComparator#setNextReader now expect an
  AtomicReaderContext instead of an IndexReader. (Simon Willnauer)

* LUCENE-2892: Add QueryParser.newFieldQuery (called by getFieldQuery by
  default), which takes an Analyzer as a parameter, for easier customization by
  subclasses. (Robert Muir)

* LUCENE-2953: In addition to changes in 3.x, the PriorityQueue#initialize(int)
  function was moved into the ctor. (Uwe Schindler, Yonik Seeley)

* LUCENE-3219: SortField type properties have been moved to an enum
  SortField.Type. To be consistent, CachedArrayCreator.getSortTypeID() has
  been changed to CachedArrayCreator.getSortType(). (Chris Male)

* LUCENE-3225: Add TermsEnum.seekExact for faster seeking when you
  don't need the ceiling term; renamed existing seek methods to either
  seekCeil or seekExact; changed seekExact(ord) to return no value.
  Fixed MemoryCodec and SimpleTextCodec to optimize the seekExact
  case, and fixed places in Lucene to use seekExact when possible.
  (Mike McCandless)

* LUCENE-1536: Filter.getDocIdSet() now takes an acceptDocs Bits interface (like
  Scorer) limiting the documents that can appear in the returned DocIdSet.
  Filters are now required to respect these acceptDocs; otherwise, deleted documents
  may get returned by searches. Most filters will pass these Bits down to DocsEnum,
  but filters that work on the FieldCache, for example, may need to use
  BitsFilteredDocIdSet.wrap() to exclude them.
  (Mike McCandless, Uwe Schindler, Robert Muir, Chris Male, Yonik Seeley,
  Jason Rutherglen, Paul Elschot)

* LUCENE-3722: Similarity methods and collection/term statistics now take
  long instead of int (to enable distributed scoring of > 2B docs).
  (Yonik Seeley, Andrzej Bialecki, Robert Muir)

* LUCENE-3761: Generalize SearcherManager into an abstract ReferenceManager.
  SearcherManager remains a concrete class, but due to the refactoring, the
  method maybeReopen has been deprecated in favor of maybeRefresh().
  (Shai Erera, Mike McCandless, Simon Willnauer)

* LUCENE-3859: AtomicReader.hasNorms(field) is deprecated; instead, you
  can inspect the FieldInfo yourself to see if norms are present, which
  also allows you to get the type. (Robert Muir)

* LUCENE-2606: Changed the RegexCapabilities interface to fix thread
  safety, serialization, and performance problems. If you have
  written a custom RegexCapabilities it will need to be updated
  to the new API. (Robert Muir, Uwe Schindler)

* LUCENE-2638: Made MakeHighFreqTerms.TermStats public to make it more useful
  for API use. (Andrzej Bialecki)

* LUCENE-2912: The field-specific hashmaps in SweetSpotSimilarity were removed.
  Instead, use PerFieldSimilarityWrapper to return different SweetSpotSimilarity
  instances for different fields; this way all parameters (such as TF factors)
  can be customized on a per-field basis.
  (Robert Muir)

* LUCENE-3308: DuplicateFilter keepMode and processingMode have been converted to
  enums DuplicateFilter.KeepMode and DuplicateFilter.ProcessingMode respectively.

* LUCENE-3483: Move function grouping collectors from Solr to the grouping module.
  (Martijn van Groningen)

* LUCENE-3606: FieldNormModifier was deprecated because IndexReader's
  setNorm() was deprecated. Furthermore, this class is broken, as it does
  not take position overlaps into account while recalculating norms.
  (Uwe Schindler, Robert Muir)

* LUCENE-3936: Renamed StringIndexDocValues to DocTermsIndexDocValues.
  (Martijn van Groningen)

* LUCENE-1768: The deprecated Parametric(Range)QueryNode, RangeQueryNode(Builder)
  and ParametricRangeQueryNodeProcessor classes were removed.
  (Vinicius Barros via Uwe Schindler)

* LUCENE-3820: Deprecated constructors accepting pattern matching bounds. The input
  is buffered and matched in one pass. (Dawid Weiss)

* LUCENE-2413: Deprecated PatternAnalyzer in common/miscellaneous, in favor
  of the pattern package (CharFilter, Tokenizer, TokenFilter). (Robert Muir)

* LUCENE-2413: Removed AnalyzerUtil in common/miscellaneous. (Robert Muir)

* LUCENE-1370: Added ShingleFilter option to output unigrams if no shingles
  can be generated. (Chris Harris via Steven Rowe)

* LUCENE-2514, LUCENE-2551: JDK and ICU CollationKeyAnalyzers were changed to
  use pure byte keys when Version >= 4.0. This cuts sort key size approximately
  in half. (Robert Muir)

* LUCENE-3400: Removed DutchAnalyzer.setStemDictionary. (Chris Male)

* LUCENE-3431: Removed QueryAutoStopWordAnalyzer.addStopWords* deprecated methods
  since they prevented reuse. Stopwords are now generated at instantiation through
  the Analyzer's constructors.
  (Chris Male)

* LUCENE-3434: Removed ShingleAnalyzerWrapper.set* and PerFieldAnalyzerWrapper.addAnalyzer
  since they prevent reuse. Both Analyzers should be configured at instantiation.
  (Chris Male)

* LUCENE-3765: Stopset constructors that previously took Set<?> or Map<?,String> now take
  CharArraySet and CharArrayMap respectively. Previously the behavior was confusing,
  and sometimes different depending on the type of set, and ultimately a CharArraySet
  or CharArrayMap was always used anyway. (Robert Muir)

* LUCENE-3830: Switched to NormalizeCharMap.Builder to create
  immutable instances of NormalizeCharMap. (Dawid Weiss, Mike
  McCandless)

* LUCENE-4063: FrenchLightStemmer no longer deletes repeated digits.
  (Tanguy Moal via Steve Rowe)

* LUCENE-4122: Replace Payload with BytesRef. (Andrzej Bialecki)

* LUCENE-4132: IndexWriter.getConfig() now returns a LiveIndexWriterConfig object
  which can be used to change the IndexWriter's live settings. IndexWriterConfig
  is used only for initializing the IndexWriter. (Shai Erera)

* LUCENE-3866: IndexReaderContext.leaves() is now the preferred way to access
  atomic sub-readers of any kind of IndexReader (an AtomicReader returns
  itself as the only leaf, with docBase=0). (Uwe Schindler)

New features

* LUCENE-2604: Added RegexpQuery support to QueryParser. Regular expressions
  are directly supported by the standard queryparser via
  fieldName:/expression/, or /expression/ against the default field.
  Users who wish to search for literal "/" characters are advised to
  backslash-escape or quote those characters as needed.
  (Simon Willnauer, Robert Muir)

* LUCENE-1606, LUCENE-2089: Adds AutomatonQuery, a MultiTermQuery that
  matches terms against a finite-state machine. Implements WildcardQuery
  and FuzzyQuery with finite-state methods.
  Adds RegexpQuery.
  (Robert Muir, Mike McCandless, Uwe Schindler, Mark Miller)

* LUCENE-3662: Add support for Levenshtein distance with transpositions
  to LevenshteinAutomata, FuzzyTermsEnum, and DirectSpellChecker.
  (Jean-Philippe Barrette-LaPierre, Robert Muir)

* LUCENE-2321: Cut over to a more RAM-efficient, packed-ints-based
  representation for the in-memory terms dict index. (Mike
  McCandless)

* LUCENE-2126: Add new classes for data (de)serialization: DataInput
  and DataOutput. IndexInput and IndexOutput extend these new classes.
  (Michael Busch)

* LUCENE-1458, LUCENE-2111: With flexible indexing it is now possible
  for an application to create its own postings codec, to alter how
  fields, terms, docs and positions are encoded into the index. The
  standard codec is the default codec. IndexWriter accepts a Codec
  class to obtain codecs for newly written segments.

* LUCENE-1458, LUCENE-2111: Some experimental codecs have been added
  for flexible indexing, including the pulsing codec (inlines
  low-frequency terms directly into the terms dict, avoiding seeking
  for some queries), the sep codec (stores docs, freqs, positions, skip
  data and payloads in 5 separate files instead of the 2 used by the
  standard codec), and int block (really a "base" for using
  block-based compressors like PForDelta for storing postings data).

* LUCENE-1458, LUCENE-2111: The in-memory terms index used by the standard
  codec is more RAM efficient: terms data is stored as block byte
  arrays and packed integers. Net RAM reduction for indexes that have
  many unique terms should be substantial, and initial open time for
  IndexReaders should be faster. These gains only apply to segments
  written after upgrading.
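
  The packed-ints representation mentioned in LUCENE-2321 and in the
  in-memory terms index entry stores each value in a fixed number of bits
  rather than in a full int or long. The Java code below is a minimal
  sketch of that idea only. It assumes a single fixed bitsPerValue for the
  whole array. The class PackedIntsSketch and its methods are hypothetical
  and are not Lucene's PackedInts API.

```java
// Hypothetical sketch: fixed-width bit packing of values into a long[].
// Illustrates the packed-ints idea only; it is not Lucene's PackedInts API.
final class PackedIntsSketch {
    private final long[] blocks;     // backing storage, 64 bits per long
    private final int size;          // number of logical values
    private final int bitsPerValue;  // width of every stored value, 1..64
    private final long mask;         // the low bitsPerValue bits set

    PackedIntsSketch(int size, int bitsPerValue) {
        if (size < 0 || bitsPerValue < 1 || bitsPerValue > 64) {
            throw new IllegalArgumentException("invalid size or bitsPerValue");
        }
        this.size = size;
        this.bitsPerValue = bitsPerValue;
        this.mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        long totalBits = (long) size * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)]; // round up to whole longs
    }

    void set(int index, long value) {
        java.util.Objects.checkIndex(index, size);
        if ((value & ~mask) != 0) {
            throw new IllegalArgumentException("value does not fit in " + bitsPerValue + " bits");
        }
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);  // long that holds the value's low bits
        int shift = (int) (bitPos & 63);   // bit offset inside that long
        // Clear the old bits in this long, then write the low part of the value.
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bitsPerValue - 64; // bits that continue in the next long
        if (spill > 0) {
            int consumed = bitsPerValue - spill;  // bits already written above
            long spillMask = mask >>> consumed;   // low `spill` bits set
            blocks[block + 1] = (blocks[block + 1] & ~spillMask) | (value >>> consumed);
        }
    }

    long get(int index) {
        java.util.Objects.checkIndex(index, size);
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (bitsPerValue - spill); // append the high bits
        }
        return value & mask;
    }

    int longsUsed() {
        return blocks.length;
    }
}
```

  Storing 1,000 values of 10 bits each this way takes 157 longs (10,000
  bits) instead of 1,000. The trade-off is the extra shifting and masking
  on every get and set.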

* LUCENE-1458, LUCENE-2111: Terms data are now buffered directly as
  byte[] during indexing, which uses half the RAM for ASCII terms (and
  also numeric fields). This can improve indexing throughput for
  applications that have many unique terms, since it reduces how often
  a new segment must be flushed given a fixed RAM buffer size.

* LUCENE-2489: Added PerFieldCodecWrapper (in oal.index.codecs), which
  lets you set the Codec per field. (Mike McCandless)

* LUCENE-2373: Extend Codec to use SegmentInfosWriter and
  SegmentInfosReader to allow customization of SegmentInfos data.
  (Andrzej Bialecki)

* LUCENE-2504: FieldComparator.setNextReader now returns a
  FieldComparator instance. You can "return this" to just reuse the
  same instance, or you can return a comparator optimized for the new
  segment. (yonik, Mike McCandless)

* LUCENE-2648: PackedInts.Iterator now supports advancing by more than a
  single ordinal. (Simon Willnauer)

* LUCENE-2649: Objects in the FieldCache can optionally store Bits
  that mark which docs have real values in the native[]. (ryan)

* LUCENE-2664: Add SimpleText codec, which stores all terms/postings
  data in a single text file for transparency (at the expense of poor
  performance). (Sahin Buyrukbilen via Mike McCandless)

* LUCENE-2589: Add a VariableSizedIntIndexInput, which, when used w/
  Sep*, makes it simple to take any variable-sized int block coder
  (like Simple9/16) and use it in a codec. (Mike McCandless)

* LUCENE-2597: Add oal.index.SlowCompositeReaderWrapper, to wrap a
  composite reader (e.g. MultiReader or DirectoryReader), making it
  pretend it's an atomic reader. This is a convenience class (you can
  use MultiFields static methods directly, instead) if you need to use
  the flex APIs directly on a composite reader.
  (Mike McCandless)

* LUCENE-2690: MultiTermQuery boolean rewrites are now done per segment.
  (Uwe Schindler, Robert Muir, Mike McCandless, Simon Willnauer)

* LUCENE-996: The QueryParser now accepts mixed inclusive and exclusive
  bounds for range queries. Example: "{3 TO 5]".
  QueryParser subclasses that overrode getRangeQuery will need to be changed
  to use the new getRangeQuery method. (Andrew Schurman, Mark Miller, yonik)

* LUCENE-2742: Add native per-field postings format support. Codec now lets you
  register a postings format for each field, which is in turn recorded
  into the index. Postings formats are maintained on a per-segment basis and can be
  resolved without knowing the actual postings format used for writing the segment.
  (Simon Willnauer)

* LUCENE-2741: Add support for multiple codecs that use the same file
  extensions within the same segment. Codecs now use their per-segment codec
  ID in the file names. (Simon Willnauer)

* LUCENE-2843: Added a new terms index impl,
  VariableGapTermsIndexWriter/Reader, that accepts a pluggable
  IndexTermSelector for picking which terms should be indexed in the
  terms dict. This impl stores the indexed terms in an FST, which is
  much more RAM efficient than FixedGapTermsIndex. (Mike McCandless)

* LUCENE-2862: Added TermsEnum.totalTermFreq() and
  Terms.getSumTotalTermFreq(). (Mike McCandless, Robert Muir)

* LUCENE-3290: Added Terms.getSumDocFreq(). (Mike McCandless, Robert Muir)

* LUCENE-3003: Added new expert class oal.index.DocTermsOrd,
  refactored from Solr's UnInvertedField, for accessing term ords for
  multi-valued fields, per document. This is similar to FieldCache in
  that it inverts the index to compute the ords, but differs in that
  it's able to handle multi-valued fields and does not hold the term
  bytes in RAM.
  (Mike McCandless)

* LUCENE-3108, LUCENE-2935, LUCENE-2168, LUCENE-1231: Changes from
  DocValues (ColumnStrideFields):

  - IndexWriter now supports type-safe, dense per-document values stored in
    column-like storage. DocValues are stored on a per-document
    basis where each document's field can hold exactly one value of a given
    type. DocValues are provided via Fieldable and can be used in
    conjunction with stored and indexed values.

  - DocValues provides an entirely RAM-resident document-ID-to-value
    mapping per field as well as a DocIdSetIterator-based, disk-resident
    sequential access API relying on filesystem caches.

  - Both APIs are exposed via IndexReader and the Codec/Flex API, allowing
    expert users to integrate customized DocValues reader and writer
    implementations by extending existing Codecs.

  - DocValues provides implementations for primitive datatypes like int,
    long, float, double and arrays of byte. Byte-based implementations further
    provide storage variants like straight or dereferenced stored bytes,
    fixed-length and variable-length bytes, as well as index-time sorted
    bytes based on user-provided comparators.

  (Mike McCandless, Simon Willnauer)

* LUCENE-3209: Added MemoryCodec, which stores all terms & postings in
  RAM as an FST; this is good for primary-key fields if you frequently
  need to look up by that field or perform deletions against it, for
  example in a near-real-time setting. (Mike McCandless)

* SOLR-2533: Added support for rewriting Sort and SortFields using an
  IndexSearcher. A SortField can have the type SortField.REWRITEABLE, which
  requires that it be rewritten before it is used.
  (Chris Male)

* LUCENE-3203: FSDirectory can now limit the max allowed write rate
  (MB/sec) of all running merges, to reduce the impact ongoing merging has
  on searching, NRT reopen time, etc. (Mike McCandless)

* LUCENE-2793: Directory#createOutput & Directory#openInput now accept an
  IOContext instead of a buffer size to allow low-level optimizations for
  different use cases like merging, flushing and reading.
  (Simon Willnauer, Mike McCandless, Varun Thacker)

* LUCENE-3354: FieldCache can cache DocTermOrds. (Martijn van Groningen)

* LUCENE-3376: ReusableAnalyzerBase has been moved from modules/analysis/common
  into lucene/src/java/org/apache/lucene/analysis. (Chris Male)

* LUCENE-3423: Add Terms.getDocCount(), which returns the number of documents
  that have at least one term for a field. (Yonik Seeley, Robert Muir)

* LUCENE-2959: Added a variety of different relevance ranking systems to Lucene.

  - Added Okapi BM25, Language Models, Divergence from Randomness, and
    Information-Based Models. The models are pluggable, support all of Lucene's
    features (boosts, slops, explanations, etc.) and queries (spans, etc.).

  - All models default to the same index-time norm encoding as
    DefaultSimilarity, so you can easily try these out, switch back and
    forth, and run experiments and comparisons without reindexing. Note: most of
    the models do rely upon index statistics that are new in Lucene 4.0, so
    for existing 3.x indexes it's a good idea to upgrade your index to the
    new format with IndexUpgrader first.

  - Added a new subclass, SimilarityBase, which provides a simplified API
    for plugging in new ranking algorithms without dealing with all of the
    nuances and implementation details of Lucene.

  - For example, to use BM25 for all fields:
      searcher.setSimilarity(new BM25Similarity());

    If you instead want to apply different similarities (e.g. ones with
    different parameter values or different algorithms entirely) to different
    fields, implement PerFieldSimilarityWrapper with your per-field logic.

  (David Mark Nemeskey via Robert Muir)

* LUCENE-3396: ReusableAnalyzerBase now provides a ReuseStrategy abstraction
  which controls how TokenStreamComponents are reused per request. Two
  implementations are provided: GlobalReuseStrategy, which implements the
  current behavior of sharing components between all fields, and
  PerFieldReuseStrategy, which shares components per field. (Chris Male)

* LUCENE-2309: Added IndexableField.tokenStream(Analyzer), which is now
  responsible for creating the TokenStreams for Fields when they are to
  be indexed. (Chris Male)

* LUCENE-3433: Added random access for non-RAM-resident IndexDocValues.
  RAM-resident and disk-resident IndexDocValues are now exposed via the
  Source interface. ValuesEnum has been removed in favor of Source.
  (Simon Willnauer)

* LUCENE-1536: Filters can now be applied down-low, if their DocIdSet implements
  a new bits() method, returning all documents in a random-access way. If the
  DocIdSet is not too sparse, it will be passed as acceptDocs down to the Scorer
  as a replacement for AtomicReader's live docs.
  In addition, FilteredQuery now backs IndexSearcher's filtering search methods.
  Using FilteredQuery, you can chain Filters efficiently
  [new FilteredQuery(new FilteredQuery(query, filter1), filter2)], which was not
  possible with IndexSearcher's methods. FilteredQuery also allows overriding
  the heuristics used to decide whether filtering should be done via random access
  or via a conjunction on DocIdSet's iterator().
  (Mike McCandless, Uwe Schindler, Robert Muir, Chris Male, Yonik Seeley,
  Jason Rutherglen, Paul Elschot)

* LUCENE-3638: Added sugar methods to IndexReader and IndexSearcher to
  load only certain fields when loading a document. (Peter Chang via
  Mike McCandless)

* LUCENE-3628: Norms are represented as DocValues. AtomicReader exposes
  a #normValues(String) method to obtain norms per field. (Simon Willnauer)

* LUCENE-3687: Similarity#computeNorm(FieldInvertState, Norm) allows computing
  norm values of arbitrary precision. Instead of returning a fixed single-byte
  value, custom similarities can now set an integer, float or byte value on the
  given Norm object. (Simon Willnauer)

* LUCENE-2604, LUCENE-4103: Added RegexpQuery support to contrib/queryparser.
  (Simon Willnauer, Robert Muir, Daniel Truemper)

* LUCENE-2373: Added a Codec implementation that works with append-only
  filesystems (such as Hadoop DFS). SegmentInfos writing/reading
  code is refactored to support append-only FS, and to allow for future
  customization of per-segment information. (Andrzej Bialecki)

* LUCENE-2479: Added the ability to provide a sort comparator for spelling
  suggestions, along with two implementations. The existing comparator
  (score, then frequency) is the default. (Grant Ingersoll)

* LUCENE-2608: Added the ability to specify the accuracy per method call in the
  SpellChecker. The per-class method is also still available. (Grant Ingersoll)

* LUCENE-2507: Added DirectSpellChecker, which retrieves correction candidates directly
  from the term dictionary using Levenshtein automata. (Robert Muir)

* LUCENE-3527: Add LuceneLevenshteinDistance, which computes string distance in a way
  compatible with DirectSpellChecker. This can be used to merge top-N results from
  more than one SpellChecker.
  (James Dyer via Robert Muir)

* LUCENE-3496: Support grouping by DocValues. (Martijn van Groningen)

* LUCENE-2795: Generified DirectIOLinuxDirectory to work across any
  Unix supporting the O_DIRECT flag when opening a file (tested on
  Linux and OS X, but other Unixes will likely work), and improved it
  so it can be used for indexing and searching. The directory uses
  direct IO for large merges to avoid unnecessarily evicting
  cached IO pages. (Varun Thacker, Mike McCandless)

* LUCENE-3827: DocsAndPositionsEnum from MemoryIndex implements
  start/endOffset, if offsets are indexed. (Alan Woodward via Mike
  McCandless)

* LUCENE-3802, LUCENE-3856: Support for grouped faceting. (Martijn van Groningen)

* LUCENE-3444: Added a second-pass grouping collector that keeps track of distinct
  values for a specified field for the top N groups. (Martijn van Groningen)

* LUCENE-3778: Added a grouping utility class that makes it easier to use result
  grouping for pure Lucene apps. (Martijn van Groningen)

* LUCENE-2341: A new filter in analysis/: Morfologik, a dictionary-driven lemmatizer
  (accurate stemmer) for Polish (includes morphosyntactic annotations).
  (Michał Dybizbański, Dawid Weiss)

* LUCENE-2413: Consolidated Lucene/Solr analysis components into analysis/common.
  New features from Solr now available to Lucene users include:
  - o.a.l.analysis.commongrams: Constructs n-grams for frequently occurring terms
    and phrases.
  - o.a.l.analysis.charfilter.HTMLStripCharFilter: CharFilter that strips HTML
    constructs.
  - o.a.l.analysis.miscellaneous.WordDelimiterFilter: TokenFilter that splits words
    into subwords and performs optional transformations on subword groups.
  - o.a.l.analysis.miscellaneous.RemoveDuplicatesTokenFilter: TokenFilter which
    filters out tokens with the same position and term text as the previous token.
  - o.a.l.analysis.miscellaneous.TrimFilter: Trims leading and trailing whitespace
    from Tokens in the stream.
  - o.a.l.analysis.miscellaneous.KeepWordFilter: A TokenFilter that only keeps tokens
    with text contained in the required words (inverse of StopFilter).
  - o.a.l.analysis.miscellaneous.HyphenatedWordsFilter: A TokenFilter that puts
    hyphenated words broken across two lines back together.
  - o.a.l.analysis.miscellaneous.CapitalizationFilter: A TokenFilter that applies
    capitalization rules to tokens.
  - o.a.l.analysis.pattern: Package for pattern-based analysis, containing a
    CharFilter, Tokenizer, and TokenFilter for transforming text with regexes.
  - o.a.l.analysis.synonym.SynonymFilter: A synonym filter that supports multi-word
    synonyms.
  - o.a.l.analysis.phonetic: Package for phonetic search, containing various
    phonetic encoders such as Double Metaphone.

  Some existing analysis components changed packages:
  - o.a.l.analysis.KeywordAnalyzer -> o.a.l.analysis.core.KeywordAnalyzer
  - o.a.l.analysis.KeywordTokenizer -> o.a.l.analysis.core.KeywordTokenizer
  - o.a.l.analysis.LetterTokenizer -> o.a.l.analysis.core.LetterTokenizer
  - o.a.l.analysis.LowerCaseFilter -> o.a.l.analysis.core.LowerCaseFilter
  - o.a.l.analysis.LowerCaseTokenizer -> o.a.l.analysis.core.LowerCaseTokenizer
  - o.a.l.analysis.SimpleAnalyzer -> o.a.l.analysis.core.SimpleAnalyzer
  - o.a.l.analysis.StopAnalyzer -> o.a.l.analysis.core.StopAnalyzer
  - o.a.l.analysis.StopFilter -> o.a.l.analysis.core.StopFilter
  - o.a.l.analysis.WhitespaceAnalyzer -> o.a.l.analysis.core.WhitespaceAnalyzer
  - o.a.l.analysis.WhitespaceTokenizer -> o.a.l.analysis.core.WhitespaceTokenizer
  - o.a.l.analysis.PorterStemFilter -> o.a.l.analysis.en.PorterStemFilter
  - o.a.l.analysis.ASCIIFoldingFilter -> o.a.l.analysis.miscellaneous.ASCIIFoldingFilter
  - o.a.l.analysis.ISOLatin1AccentFilter -> o.a.l.analysis.miscellaneous.ISOLatin1AccentFilter
  - o.a.l.analysis.KeywordMarkerFilter -> o.a.l.analysis.miscellaneous.KeywordMarkerFilter
  - o.a.l.analysis.LengthFilter -> o.a.l.analysis.miscellaneous.LengthFilter
  - o.a.l.analysis.PerFieldAnalyzerWrapper -> o.a.l.analysis.miscellaneous.PerFieldAnalyzerWrapper
  - o.a.l.analysis.TeeSinkTokenFilter -> o.a.l.analysis.sinks.TeeSinkTokenFilter
  - o.a.l.analysis.CharFilter -> o.a.l.analysis.charfilter.CharFilter
  - o.a.l.analysis.BaseCharFilter -> o.a.l.analysis.charfilter.BaseCharFilter
  - o.a.l.analysis.MappingCharFilter -> o.a.l.analysis.charfilter.MappingCharFilter
  - o.a.l.analysis.NormalizeCharMap -> o.a.l.analysis.charfilter.NormalizeCharMap
  - o.a.l.analysis.CharArraySet -> o.a.l.analysis.util.CharArraySet
  - o.a.l.analysis.CharArrayMap -> o.a.l.analysis.util.CharArrayMap
  - o.a.l.analysis.ReusableAnalyzerBase -> o.a.l.analysis.util.ReusableAnalyzerBase
  - o.a.l.analysis.StopwordAnalyzerBase -> o.a.l.analysis.util.StopwordAnalyzerBase
  - o.a.l.analysis.WordListLoader -> o.a.l.analysis.util.WordListLoader
  - o.a.l.analysis.CharTokenizer -> o.a.l.analysis.util.CharTokenizer
  - o.a.l.util.CharacterUtils -> o.a.l.analysis.util.CharacterUtils

  All analyzers in contrib/analyzers and contrib/icu were moved to the
  analysis/ module. The 'smartcn' and 'stempel' components now depend on 'common'.
  (Chris Male, Robert Muir)

* LUCENE-4004: Add DisjunctionMaxQuery support to the XML query parser.
  (Benson Margulies via Robert Muir)

* LUCENE-4025: Add maybeRefreshBlocking to ReferenceManager, to let a caller
  block until the refresh logic has been executed. (Shai Erera, Mike McCandless)

* LUCENE-4039: Add AddIndexesTask to benchmark, which uses IW.addIndexes.
  (Shai Erera)

* LUCENE-3514: Added IndexSearcher.searchAfter when Sort is used,
  returning results after a specified FieldDoc for deep
  paging. (Mike McCandless)

* LUCENE-4043: Added scoring support via score mode for query-time joining.
  (Martijn van Groningen, Mike McCandless)

* LUCENE-3523: Added oal.search.spell.WordBreakSpellChecker, which
  generates suggestions by combining two or more terms and/or
  breaking terms into multiple words. See Javadocs for usage. (James Dyer)

* LUCENE-4019: Added improved parsing of Hunspell dictionaries so that
  rules missing the required number of parameters are either ignored or
  cause a ParseException (depending on whether strict parsing is enabled).
  (Luca Cavanna via Chris Male)

* LUCENE-3440: Add ordered fragments feature with IDF-weighted terms for FVH.
  (Sebastian Lutze via Koji Sekiguchi)

* LUCENE-4082: Added explain to ToParentBlockJoinQuery.
  (Christoph Kaser, Martijn van Groningen)

* LUCENE-4108: Add replaceTaxonomy to DirectoryTaxonomyWriter, which replaces
  the taxonomy in place with the given one. (Shai Erera)

* LUCENE-3030: The new BlockTree terms dictionary (used by the default
  Lucene40 postings format) uses less RAM (for the terms index) and less
  disk space (for all terms and metadata), and gives sizable
  performance gains for terms-dictionary-intensive operations like
  FuzzyQuery, direct spell checker and primary-key lookup. (Mike
  McCandless)

Optimizations

* LUCENE-2588: Don't store unnecessary suffixes when writing the terms
  index, saving RAM in IndexReader; change the default terms index
  interval from 128 to 32, because the terms index now requires much
  less RAM. (Robert Muir, Mike McCandless)

* LUCENE-2669: Optimize NumericRangeQuery.NumericRangeTermsEnum to
  not seek backwards when a sub-range has no terms. It now only seeks
  when the current term is less than the next sub-range's lower end.
  (Uwe Schindler, Mike McCandless)

* LUCENE-2694: Optimize MultiTermQuery to be single-pass for Term lookups.
  MultiTermQuery now stores TermState per leaf reader during rewrite to
  re-seek the term dictionary in TermQuery / TermWeight.
  (Simon Willnauer, Mike McCandless, Robert Muir)

* LUCENE-3292: IndexWriter no longer shares the same SegmentReader
  instance for merging and NRT readers, which enables directory impls
  to separately tune IO flags for each. (Varun Thacker, Simon
  Willnauer, Mike McCandless)

* LUCENE-3328: BooleanQuery now uses a specialized ConjunctionScorer if all
  boolean clauses are required and are instances of TermQuery.
  (Simon Willnauer, Robert Muir)

* LUCENE-3643: FilteredQuery and IndexSearcher.search(Query, Filter,...)
  now optimize the special case of a query that is an instance of
  MatchAllDocsQuery by executing it as a ConstantScoreQuery. (Uwe Schindler)

* LUCENE-3509: Added fasterButMoreRam option for docvalues. This option controls
  whether the space for packed ints should be rounded up for better performance.
  This option only applies to the docvalues types bytes fixed sorted and bytes
  var sorted. (Simon Willnauer, Martijn van Groningen)

* LUCENE-3795: Replace contrib/spatial with modules/spatial. This includes
  a basic spatial strategy interface. (David Smiley, Chris Male, ryan)

* LUCENE-3932: The Lucene3x codec loads the terms index faster by
  pre-allocating the packed ints array based on the .tii file size.
  (Sean Bridges via Mike McCandless)

* LUCENE-3468: Replaced last() and remove() with pollLast() in
  FirstPassGroupingCollector. (Martijn van Groningen)

* LUCENE-3830: Changed MappingCharFilter/NormalizeCharMap to use an
  FST under the hood, which requires less RAM. NormalizeCharMap no
  longer accepts an empty-string match (it did previously, but ignored
  it). (Dawid Weiss, Mike McCandless)

* LUCENE-4061: Improve synchronization in DirectoryTaxonomyWriter.addCategory
  and make a few general improvements to DirectoryTaxonomyWriter.
  (Shai Erera, Gilad Barkai)

* LUCENE-4062: Add new aligned packed bits impls for faster lookup
  performance; add a float acceptableOverheadRatio to the getWriter and
  getMutable APIs to give packed ints the freedom to pick faster
  implementations. (Adrien Grand via Mike McCandless)

* LUCENE-2357: Reduce transient RAM usage when merging segments in
  IndexWriter. (Adrien Grand)

* LUCENE-4098: Add bulk get/set methods to PackedInts. (Adrien Grand
  via Mike McCandless)

* LUCENE-4156: DirectoryTaxonomyWriter.getSize is no longer synchronized.
  (Shai Erera, Sivan Yogev)

* LUCENE-4163: Improve concurrency of MMapIndexInput.clone() by using
  the new WeakIdentityMap on top of a ConcurrentHashMap to manage
  the cloned instances. WeakIdentityMap was extended to support
  iterating over its keys. (Uwe Schindler)

Bug fixes

* LUCENE-2803: The FieldCache can miss values if an entry for a reader
  with more document deletions is requested before a reader with fewer
  deletions, provided they share some segments. (yonik)

* LUCENE-2645: Fix false assertion error when the same token was added one
  after another with 0 posIncr. (David Smiley, Kurosaka Teruhiko via Mike
  McCandless)

* LUCENE-3348: Fix thread-safety hazards in IndexWriter that could
  rarely cause deletions to be incorrectly applied. (Yonik Seeley,
  Simon Willnauer, Mike McCandless)

* LUCENE-3515: Fix terrible merge performance versus 3.x, especially
  when the directory isn't MMapDirectory, due to failing to reuse
  DocsAndPositionsEnum while merging. (Marc Sturlese, Erick Erickson,
  Robert Muir, Simon Willnauer, Mike McCandless)

* LUCENE-3589: BytesRef copy(short) didn't set length.
  (Peter Chang via Robert Muir)

* LUCENE-3045: Fixed QueryNodeImpl.containsTag(String key), which was
  not lowercasing the key before checking for the tag. (Adriano Crestani)

* LUCENE-3890: Fixed NPE for grouped faceting on multi-valued fields.
  (Michael McCandless, Martijn van Groningen)

* LUCENE-2945: Fix hashCode/equals for queries generated by the surround query parser.
  (Paul Elschot, Simon Rosenthal, gsingers via ehatcher)

* LUCENE-3971: MappingCharFilter could return an invalid final token position.
  (Dawid Weiss, Robert Muir)

* LUCENE-3820: PatternReplaceCharFilter could return invalid token positions.
  (Dawid Weiss)

* LUCENE-3969: Throw IAE on bad arguments that could cause confusing errors in
  CompoundWordTokenFilterBase, PatternTokenizer, PositionFilter,
  SnowballFilter, PathHierarchyTokenizer, ReversePathHierarchyTokenizer,
  WikipediaTokenizer, and KeywordTokenizer. ShingleFilter and
  CommonGramsFilter now populate PositionLengthAttribute. Fixed
  PathHierarchyTokenizer to reset() all state. Protect against AIOOBE in
  ReversePathHierarchyTokenizer if skip is large. Fixed wrong final
  offset calculation in PathHierarchyTokenizer.
  (Mike McCandless, Uwe Schindler, Robert Muir)

* LUCENE-4060: Fix a synchronization bug in
  DirectoryTaxonomyWriter.addTaxonomies(). Also, the method has been renamed to
  addTaxonomy and now takes only one Directory and one OrdinalMap.
  (Shai Erera, Gilad Barkai)

* LUCENE-3590: Fix AIOOBE in BytesRef/CharsRef copyBytes/copyChars when
  offset is nonzero, fix off-by-one in CharsRef.subSequence, and fix
  CharsRef's CharSequence methods to throw exceptions in boundary cases
  to properly meet the specification. (Robert Muir)

* LUCENE-4084: Attempting to reuse a single IndexWriterConfig instance
  across more than one IndexWriter resulted in a cryptic exception.
  This is now fixed, but requires that certain members of
  IndexWriterConfig (MergePolicy, FlushPolicy,
  DocumentsWriterThreadPool) implement clone. (Robert Muir, Simon
  Willnauer, Mike McCandless)

* LUCENE-4079: Fixed loading of Hunspell dictionaries that use aliasing (AF rules)
  (Ludovic Boutros via Chris Male)

* LUCENE-4077: Expose the max score and per-group scores from
  ToParentBlockJoinCollector (Christoph Kaser, Mike McCandless)

* LUCENE-4114: Fix int overflow bugs in BYTES_FIXED_STRAIGHT and
  BYTES_FIXED_DEREF doc values implementations (Walt Elder via Mike McCandless).

* LUCENE-4147: Fixed thread safety issues when rollback() and commit()
  are called simultaneously. (Simon Willnauer, Mike McCandless)

* LUCENE-4165: Removed closing of the Reader used to read the affix file in
  HunspellDictionary. Consumers are now responsible for closing all InputStreams
  once the Dictionary has been instantiated. (Torsten Krah, Uwe Schindler, Chris Male)

Documentation

* LUCENE-3958: Javadocs corrections for IndexWriter.
  (Iulius Curt via Robert Muir)

Build

* LUCENE-4047: Cleanup of LuceneTestCase: moved blocks of initialization/cleanup
  code into JUnit instance and class rules. (Dawid Weiss)

* LUCENE-4016: Require ANT 1.8.2+ for the build.

* LUCENE-3808: Refactoring of testing infrastructure to use randomizedtesting
  package: http://labs.carrotsearch.com/randomizedtesting.html (Dawid Weiss)

* LUCENE-3964: Added target stage-maven-artifacts, which stages
  Maven release artifacts to a Maven staging repository in preparation
  for release. (Steve Rowe)

* LUCENE-2845: Moved contrib/benchmark to lucene/benchmark.

* LUCENE-2995: Moved contrib/spellchecker into lucene/suggest.

* LUCENE-3285: Moved contrib/queryparser into lucene/queryparser.

* LUCENE-3285: Moved contrib/xml-query-parser's demo into lucene/demo.

* LUCENE-3271: Moved contrib/queries BooleanFilter, BoostingQuery,
  ChainedFilter, FilterClause and TermsFilter into lucene/queries.

* LUCENE-3381: Moved contrib/queries regex.*, DuplicateFilter,
  FuzzyLikeThisQuery and SlowCollated* into lucene/sandbox.
  Removed contrib/queries.

* LUCENE-3286: Moved remainder of contrib/xml-query-parser to lucene/queryparser.
  Classes now found at org.apache.lucene.queryparser.xml.*

* LUCENE-4059: Improve ANT task prepare-webpages (used by documentation
  tasks) to correctly encode build file names as URIs for later processing by
  XSL. (Greg Bowyer, Uwe Schindler)


======================= Lucene 3.6.2 =======================

Bug Fixes

* LUCENE-4234: Fixed an exception when FacetsCollector is used with ScoreFacetRequest
  and the number of matching documents is too large. (Gilad Barkai via Shai Erera)

* LUCENE-2686, LUCENE-3505, LUCENE-4401: Fix BooleanQuery scorers to
  return correct freq().
  (Koji Sekiguchi, Mike McCandless, Liu Chao, Robert Muir)

* LUCENE-2501: Fixed rare thread-safety issue that could cause
  ArrayIndexOutOfBoundsException inside ByteBlockPool (Robert Muir,
  Mike McCandless)

* LUCENE-4297: BooleanScorer2 would multiply the coord() factor
  twice for conjunctions: for most users this is no problem, but
  if you had a customized Similarity that returned something other
  than 1 when overlap == maxOverlap (always the case for conjunctions),
  then the score would be incorrect. (Pascal Chollet, Robert Muir)

* LUCENE-4300: BooleanQuery's rewrite was not always safe: if you
  had a custom Similarity where coord(1,1) != 1F, then the rewritten
  query would be scored differently. (Robert Muir)

* LUCENE-4398: If you index many different field names in your
  documents, then due to a bug in how it measures its RAM
  usage, IndexWriter would flush each segment too early, eventually
  reaching the point where it flushes after every doc. (Tim Smith via
  Mike McCandless)

* LUCENE-4411: When sampling is enabled for a FacetRequest, its depth
  parameter is reset to the default (1), even if set otherwise.
  (Gilad Barkai via Shai Erera)

* LUCENE-4635: Fixed ArrayIndexOutOfBoundsException when in-memory
  terms index requires more than 2.1 GB RAM (indices with billions of
  terms). (Tom Burton-West via Mike McCandless)

Documentation

* LUCENE-4302: Fix facet userguide to have HTML loose doctype like
  all other javadocs. (Karl Nicholas via Uwe Schindler)


======================= Lucene 3.6.1 =======================
More information about this release, including any errata related to the
release notes, upgrade instructions, or other changes may be found online at:
  https://wiki.apache.org/lucene-java/Lucene3.6.1

Bug Fixes

* LUCENE-3969: Throw IAE on bad arguments that could cause confusing
  errors in KeywordTokenizer.
  (Uwe Schindler, Mike McCandless, Robert Muir)

* LUCENE-3971: MappingCharFilter could return invalid final token position.
  (Dawid Weiss, Robert Muir)

* LUCENE-4023: DisjunctionMaxScorer now implements visitSubScorers().
  (Uwe Schindler)

* LUCENE-2566: + - operators allow any amount of whitespace (yonik, janhoy)

* LUCENE-3590: Fix AIOOBE in BytesRef/CharsRef copyBytes/copyChars when
  offset is nonzero, fix off-by-one in CharsRef.subSequence, and fix
  CharsRef's CharSequence methods to throw exceptions in boundary cases
  to properly meet the specification. (Robert Muir)

* LUCENE-4222: TieredMergePolicy.getFloorSegmentMB was returning the
  size in bytes, not MB (Chris Fuller via Mike McCandless)

API Changes

* LUCENE-4023: Changed the visibility of Scorer#visitSubScorers() to
  public; otherwise it's impossible to implement Scorers outside
  the Lucene package.
  (Uwe Schindler)

Optimizations

* LUCENE-4163: Improve concurrency of MMapIndexInput.clone() by using
  the new WeakIdentityMap on top of a ConcurrentHashMap to manage
  the cloned instances. WeakIdentityMap was extended to support
  iterating over its keys. (Uwe Schindler)

Tests

* LUCENE-3873: Add MockGraphTokenFilter, testing analyzers with
  random graph tokens. (Mike McCandless)

* LUCENE-3968: Factor out LookaheadTokenFilter from
  MockGraphTokenFilter (Mike McCandless)


======================= Lucene 3.6.0 =======================
More information about this release, including any errata related to the
release notes, upgrade instructions, or other changes may be found online at:
  https://wiki.apache.org/lucene-java/Lucene3.6

Changes in backwards compatibility policy

* LUCENE-3594: The protected inner class (never intended to be visible)
  FieldCacheTermsFilter.FieldCacheTermsFilterDocIdSet was removed and
  replaced by another internal implementation. (Uwe Schindler)

* LUCENE-3620: FilterIndexReader now overrides all methods of IndexReader that
  it should (note that some are still not overridden, as they should be
  overridden by sub-classes only). In the process, some methods of IndexReader
  were made final. This is not expected to affect many apps, since these methods
  already delegate to abstract methods, which you already had to override
  anyway. (Shai Erera)

* LUCENE-3636: Added SearcherFactory, used by SearcherManager and NRTManager
  to create new IndexSearchers. You can provide your own implementation to
  warm new searchers, set an ExecutorService, set a custom Similarity, or
  even return your own subclass of IndexSearcher. The SearcherWarmer and
  ExecutorService parameters on these classes were removed, as they are
  subsumed by SearcherFactory.
  (Shai Erera, Mike McCandless, Robert Muir)

* LUCENE-3644: The expert ReaderFinishedListener api suffered problems (propagated
  down to subreaders, but was not called on SegmentReaders, unless they were
  the owner of the reader core, and other ambiguities). The API is revised:
  You can set ReaderClosedListeners on any IndexReader, and onClose is called
  when that reader is closed. SegmentReader has CoreClosedListeners that you
  can register to know when a shared reader core is closed.
  (Uwe Schindler, Mike McCandless, Robert Muir)

* LUCENE-3652: The package org.apache.lucene.messages was moved to
  contrib/queryparser. If you have used those classes in your code
  just add the lucene-queryparser.jar file to your classpath.
  (Uwe Schindler)

* LUCENE-3681: FST now stores labels for BYTE2 input type as 2 bytes
  instead of vInt; this can make FSTs smaller and faster, but it is a
  break in the binary format so if you had built and saved any FSTs
  then you need to rebuild them. (Robert Muir, Mike McCandless)

* LUCENE-3679: The expert IndexReader.getFieldNames(FieldOption) API
  has been removed and replaced with the experimental getFieldInfos
  API. All IndexReader subclasses must implement getFieldInfos.
  (Mike McCandless)

* LUCENE-3695: Move confusing add(X) methods out of FST.Builder into
  FST.Util. (Robert Muir, Mike McCandless)

* LUCENE-3701: Added an additional argument to the expert FST.Builder
  ctor to take FreezeTail, which you can use to (very-expertly) customize
  the FST construction process. Pass null if you want the default
  behavior. Added seekExact() to FSTEnum, and added FST.save/read
  from a File. (Mike McCandless, Dawid Weiss, Robert Muir)

* LUCENE-3712: Removed unused and untested ReaderUtil#subReader methods.
  (Uwe Schindler)

* LUCENE-3672: Deprecate Directory.fileModified,
  IndexCommit.getTimestamp and .getVersion and
  IndexReader.lastModified and getCurrentVersion (Andrzej Bialecki,
  Robert Muir, Mike McCandless)

* LUCENE-3760: In IndexReader/DirectoryReader, deprecate static
  methods getCurrentVersion and getCommitUserData, and non-static
  method getCommitUserData (use getIndexCommit().getUserData()
  instead). (Ryan McKinley, Robert Muir, Mike McCandless)

* LUCENE-3867: Deprecate instance creation of RamUsageEstimator; instead,
  the new static method sizeOf(Object) should be used. As the algorithm
  is now using Hotspot(TM) internals (reference size, header sizes,
  object alignment), the abstract o.a.l.util.MemoryModel class was
  completely removed (without replacement). The new static methods
  no longer support String intern-ness checking; interned strings
  now count toward memory usage like any other Java object.
  (Dawid Weiss, Uwe Schindler, Shai Erera)

* LUCENE-3738: All readXxx methods in BufferedIndexInput were made
  final. Subclasses should only override protected readInternal /
  seekInternal. (Uwe Schindler)

* LUCENE-2599: Deprecated the spatial contrib module, which was buggy and not
  well maintained. Lucene 4 includes a new spatial module that replaces this.
  (David Smiley, Ryan McKinley, Chris Male)

Changes in Runtime Behavior

* LUCENE-3796, SOLR-3241: Throw an exception if you try to set an index-time
  boost on a field that omits norms. Because the index-time boost
  is multiplied into the norm, previously your boost would be
  silently discarded. (Tomás Fernández Löbbe, Hoss Man, Robert Muir)

* LUCENE-3848: Fix tokenstreams to not produce a stream with an initial
  position increment of 0, which is out of bounds (overlapping with a
  non-existent previous term).
  Consumers such as IndexWriter and QueryParser
  still check for and silently correct this situation today, but at some point
  in the future they may throw an exception. (Mike McCandless, Robert Muir)

* LUCENE-3738: DataInput/DataOutput no longer allow negative vLongs. Negative
  vInts are still supported (for index backwards compatibility), but
  should not be used in new code. The read method for negative vLongs
  had already been broken since Lucene 3.1.
  (Uwe Schindler, Mike McCandless, Robert Muir)

Security fixes

* LUCENE-3588: Try harder to prevent SIGSEGV on cloned MMapIndexInputs:
  Previous versions of Lucene could SIGSEGV the JVM if you tried to access
  the clone of an IndexInput retrieved from MMapDirectory. This security fix
  prevents this as best it can by throwing AlreadyClosedException
  on clones as well. (Uwe Schindler, Robert Muir)

API Changes

* LUCENE-3606: IndexReader will be made read-only in Lucene 4.0, so all
  methods that delete or undelete documents using IndexReader were
  deprecated; you should use IndexWriter now. Consequently,
  IndexReader.commit() and all open(), openIfChanged(), clone() methods
  taking readOnly booleans (or IndexDeletionPolicy instances) were
  deprecated. IndexReader.setNorm() is superfluous and was deprecated.
  If you have to change per-document boost, use CustomScoreQuery.
  If you want to dynamically change norms (boost *and* length norm) at
  query time, wrap your IndexReader using FilterIndexReader, overriding
  FilterIndexReader.norms(). To persist the changes on disk, copy the
  FilterIndexReader to a new index using IndexWriter.addIndexes().
  In Lucene 4.0, SimilarityProvider will allow you to customize scoring
  using external norms, too.
  (Uwe Schindler, Robert Muir)

* LUCENE-3735: PayloadProcessorProvider was changed to return a
  ReaderPayloadProcessor instead of DirPayloadProcessor. The selection
  of the provider to return for the factory is now based on the IndexReader
  to be merged. To mimic the old behaviour, just use IndexReader.directory()
  for choosing the provider by Directory. (Uwe Schindler)

* LUCENE-3765: Deprecated StopFilter ctor that took ignoreCase, because
  in some cases (if the set is a CharArraySet), the argument is ignored.
  Deprecated StandardAnalyzer and ClassicAnalyzer ctors that take File;
  please use the Reader ctor instead. (Robert Muir)

* LUCENE-3766: Deprecate no-arg ctors of Tokenizer. Tokenizers are
  TokenStreams with Readers: tokenizers with null Readers will not be
  supported in Lucene 4.0; just use a TokenStream.
  (Mike McCandless, Robert Muir)

* LUCENE-3769: Simplified NRTManager by requiring applyDeletes to be
  passed to ctor only; if an app needs to mix and match, it's free to
  create two NRTManagers (one always applying deletes and the other
  never applying deletes). (MJB, Shai Erera, Mike McCandless)

* LUCENE-3761: Generalize SearcherManager into an abstract ReferenceManager.
  SearcherManager remains a concrete class, but due to the refactoring, the
  method maybeReopen has been deprecated in favor of maybeRefresh().
  (Shai Erera, Mike McCandless, Simon Willnauer)

* LUCENE-3776: You now acquire/release the IndexSearcher directly from
  NRTManager. (Mike McCandless)

New Features

* LUCENE-3593: Added a FieldValueFilter that accepts all documents that either
  have at least one value or have no value at all in a specific field.
  (Simon Willnauer, Uwe Schindler, Robert Muir)

* LUCENE-3586: CheckIndex and IndexUpgrader allow you to specify the
  FSDirectory implementation to use (with the new -dir-impl
  command-line option). (Luca Cavanna via Mike McCandless)

* LUCENE-3634: IndexReader's static main method was moved to a new
  tool, CompoundFileExtractor, in contrib/misc. (Robert Muir, Mike
  McCandless)

* LUCENE-995: The QueryParser now interprets * as an open end for range
  queries. Literal asterisks may be represented by quoting or escaping
  (e.g. \* or "*"). Custom QueryParser subclasses overriding getRangeQuery()
  will be passed null for any open endpoint. (Ingo Renner, Adriano
  Crestani, yonik, Mike McCandless)

* LUCENE-3121: Add sugar reverse lookup (given an output, find the
  input mapping to it) for FSTs that have strictly monotonic long
  outputs (such as an ord). (Mike McCandless)

* LUCENE-3671: Add TypeTokenFilter that filters tokens based on
  their TypeAttribute. (Tommaso Teofili via Uwe Schindler)

* LUCENE-3690, LUCENE-3913: Added HTMLStripCharFilter, a CharFilter that strips
  HTML markup. (Steve Rowe)

* LUCENE-3725: Added optional packing to FST building; this uses extra
  RAM during building but results in a smaller FST. (Mike McCandless)

* LUCENE-3714: Add top N shortest cost paths search for FST.
  (Robert Muir, Dawid Weiss, Mike McCandless)

* LUCENE-3789: Expose MTQ TermsEnum via RewriteMethod for non-package-private
  access (Simon Willnauer)

* LUCENE-3881: Added UAX29URLEmailAnalyzer: a standard analyzer that recognizes
  URLs and emails. (Steve Rowe)

Bug fixes

* LUCENE-3595: Fixed FieldCacheRangeFilter and FieldCacheTermsFilter
  to correctly respect deletions on reopened SegmentReaders. Factored out
  FieldCacheDocIdSet to be a top-level class.
  (Uwe Schindler, Simon Willnauer)

* LUCENE-3627: Don't let an errant 0-byte segments_N file corrupt the index.
  (Ken McCracken via Mike McCandless)

* LUCENE-3630: The internal method MultiReader.doOpenIfChanged(boolean doClone)
  was overriding IndexReader.doOpenIfChanged(boolean readOnly), thereby changing
  the contract of the overridden method. This method was renamed and made private.
  The bug did not exist in ParallelReader, but its implementation method
  was also made private. (Uwe Schindler)

* LUCENE-3641: Fixed MultiReader to correctly propagate readerFinishedListeners
  to clones/reopened readers. (Uwe Schindler)

* LUCENE-3642, SOLR-2891, LUCENE-3717: Fixed bugs in CharTokenizer, n-gram tokenizers/filters,
  compound token filters, thai word filter, icutokenizer, pattern analyzer,
  wikipediatokenizer, and smart chinese where they would create invalid offsets in
  some situations, leading to problems in highlighting.
  (Max Beutel, Edwin Steiner via Robert Muir)

* LUCENE-3639: When there were 0 hits, TopDocs.merge incorrectly set
  TopDocs.maxScore to Float.MIN_VALUE instead of Float.NaN. Improved age
  calculation in SearcherLifetimeManager to use double precision and to
  compute age as how long ago the searcher was replaced with a new
  searcher (Mike McCandless)

* LUCENE-3658: Corrected potential concurrency issues with
  NRTCachingDir, fixed createOutput to overwrite any previous file,
  and removed invalid asserts (Robert Muir, Mike McCandless)

* LUCENE-3605: Don't sleep in a retry loop when trying to locate the
  segments_N file (Robert Muir, Mike McCandless)

* LUCENE-3711: SentinelIntSet with a small initial size can go into
  an infinite loop when expanded.
  This can affect grouping using
  TermAllGroupsCollector or TermAllGroupHeadsCollector if instantiated with a
  non-default small size. (Martijn van Groningen, yonik)

* LUCENE-3727: When writing stored fields and term vectors, Lucene
  checks file sizes to detect a bug in some Sun JREs (LUCENE-1282);
  however, on some NFS filesystems File.length() could be stale,
  resulting in false errors like "fdx size mismatch while indexing".
  These checks now use getFilePointer instead to avoid this.
  (Jamir Shaikh, Mike McCandless, Robert Muir)

* LUCENE-3816: Fixed a problem in FilteredDocIdSet when null was returned
  from the delegate DocIdSet.iterator(), which is allowed to return
  null by the DocIdSet specification when no documents match.
  (Shay Banon via Uwe Schindler)

* LUCENE-3821: SloppyPhraseScorer missed documents that ExactPhraseScorer finds.
  When a phrase query had repeating terms (e.g. "yes no yes"), the
  sloppy query missed documents that the exact query matched.
  Fixed, except for repeating multiterms (e.g. "yes no yes|no").
  (Robert Muir, Doron Cohen)

* LUCENE-3841: Fix CloseableThreadLocal to also purge stale entries on
  get(); this fixes certain cases where we were holding onto objects
  for dead threads for too long (Matthew Bellew, Mike McCandless)

* LUCENE-3872: IndexWriter.close() now throws IllegalStateException if
  you call it after calling prepareCommit() without calling commit()
  first. (Tim Bogaert via Mike McCandless)

* LUCENE-3874: Throw IllegalArgumentException from IndexWriter (rather
  than producing a corrupt index) if a positionIncrement would cause
  integer overflow. This can happen, for example, when using a buggy
  TokenStream that forgets to call clearAttributes() in combination
  with a StopFilter.
  (Robert Muir)

* LUCENE-3876: Fix bug where positions for a document exceeding
  Integer.MAX_VALUE/2 would produce a corrupt index.
  (Simon Willnauer, Mike McCandless, Robert Muir)

* LUCENE-3880: UAX29URLEmailTokenizer now recognizes emails when the mailto:
  scheme is prepended. (Kai Gülzau, Steve Rowe)

Optimizations

* LUCENE-3653: Improve concurrency in VirtualMethod and AttributeSource by
  using a WeakIdentityMap based on a ConcurrentHashMap. (Uwe Schindler,
  Gerrit Jansen van Vuuren)

Documentation

* LUCENE-3597: Fixed incorrect grouping documentation. (Martijn van Groningen,
  Robert Muir)

* LUCENE-3926: Improve documentation of RAMDirectory, because this
  class is not intended to work with huge indexes. Everything beyond
  several hundred megabytes will waste resources (GC cycles), because
  it uses an internal buffer size of 1024 bytes, producing millions of
  byte[1024] arrays. This class is optimized for small memory-resident
  indexes. It also has bad concurrency in multithreaded environments.
  It is recommended to materialize large indexes on disk and use
  MMapDirectory, which is a high-performance directory implementation
  working directly on the file system cache of the operating system,
  so copying data to Java heap space is not useful.
  (Uwe Schindler, Mike McCandless, Robert Muir)

Build

* LUCENE-3857: exceptions from other threads in beforeclass/etc do not fail
  the test (Dawid Weiss)

* LUCENE-3847: LuceneTestCase will now check for modifications of System
  properties before and after each test (and suite). If changes are detected,
  the test will fail. A rule can be used to reset system properties to
  before-scope state (and this has been used to make Solr tests pass).
  (Dawid Weiss, Uwe Schindler)

* LUCENE-3228: Stop downloading external javadoc package-list files:

  - Added package-list files for Oracle Java javadocs and JUnit javadocs to
    Lucene/Solr subversion.

  - The Oracle Java javadocs package-list file is excluded from Lucene and
    Solr source release packages.

  - Regardless of network connectivity, javadocs built from a subversion
    checkout contain links to Oracle & JUnit javadocs.

  - Building javadocs from a source release package will download the Oracle
    Java package-list file if it isn't already present.

  - When the Oracle Java package-list file is not present and download fails,
    the javadocs targets will not fail the build, though an error will appear
    in the build log. In this case, the built javadocs will not contain links
    to Oracle Java javadocs.

  - Links from Solr javadocs to Lucene's javadocs are enabled. When building
    an X.Y.Z-SNAPSHOT version, the links are to the most recently built nightly
    Jenkins javadocs. When building a release version, links are to the
    Lucene release javadocs for the same version.

  (Steve Rowe, hossman)

* LUCENE-3753: Restructure the Lucene build system:
  - Created a new Lucene-internal module named "core" by moving the java/
    and test/ directories from lucene/src/ to lucene/core/src/.
  - Eliminated lucene/src/ by moving all its directories up one level.
  - Each internal module (core/, test-framework/, and tools/) now has its own
    build.xml, from which it is possible to run module-specific targets.
    lucene/build.xml delegates all build tasks (via
    <ant dir="internal-module-dir"> calls) to these modules' build.xml files.
  (Steve Rowe)

* LUCENE-3774: Optimized and streamlined license and notice file validation
  by refactoring the build task into an ANT task and modifying build scripts
  to perform top-level checks. (Dawid Weiss, Steve Rowe, Robert Muir)

* LUCENE-3762: Upgrade JUnit to 4.10, refactor the state machine that detects
  setUp/tearDown call chaining in LuceneTestCase. (Dawid Weiss, Robert Muir)

* LUCENE-3944: Make the 'generate-maven-artifacts' target use filtered POMs
  placed under lucene/build/poms/, rather than in each module's base
  directory. The 'clean' target now removes them.
  (Steve Rowe, Robert Muir)

* LUCENE-3930: Changed build system to use Apache Ivy for retrieval of third-party
  JAR files. Please review BUILD.txt for instructions.
  (Robert Muir, Chris Male, Uwe Schindler, Steven Rowe, Hossman)


======================= Lucene 3.5.0 =======================

Changes in backwards compatibility policy

* LUCENE-3390: The first approach in Lucene 3.4.0 for missing values
  support for sorting had a design problem that made the missing value
  be populated directly into the FieldCache arrays during sorting,
  leading to concurrency issues. To fix this behaviour, the method
  signatures had to be changed:
  - FieldCache.getUnValuedDocs() was renamed to FieldCache.getDocsWithField()
    returning a Bits interface (backported from Lucene 4.0).
  - FieldComparator.setMissingValue() was removed; the missing value is
    now passed to the constructor.
  As this is expert API, most code will not be affected.
  (Uwe Schindler, Doron Cohen, Mike McCandless)

* LUCENE-3541: Remove IndexInput's protected copyBuf. If you want to
  keep a buffer in your IndexInput, do this yourself in your implementation,
  and be sure to do the right thing on clone()!
  (Robert Muir)

* LUCENE-2822: TimeLimitingCollector now expects a counter clock instead of
  relying on a private daemon thread. The global time limiting clock thread
  has been exposed and is now lazily loaded and fully optional.
  TimeLimitingCollector now supports setting the clock baseline manually to include
  the prelude of a search. Previous versions set the baseline at construction time;
  now the baseline is set once the first IndexReader is passed to the collector,
  unless it was set earlier. (Simon Willnauer)

Changes in runtime behavior

* LUCENE-3520: IndexReader.openIfChanged, when passed a near-real-time
  reader, will now return null if there are no changes. The API has
  always reserved the right to do this; it's just that in the past for
  near-real-time readers it never did. (Mike McCandless)

Bug fixes

* LUCENE-3412: SloppyPhraseScorer was returning non-deterministic results
  for queries with many repeats (Doron Cohen)

* LUCENE-3421: PayloadTermQuery's explain was wrong when includeSpanScore=false.
  (Edward Drapkin via Robert Muir)

* LUCENE-3432: IndexWriter.expungeDeletes with TieredMergePolicy
  should ignore the maxMergedSegmentMB setting (v.sevel via Mike
  McCandless)

* LUCENE-3442: TermQuery.TermWeight.scorer() returns null for non-atomic
  IndexReaders (optimization bug, introduced by LUCENE-2829), preventing
  QueryWrapperFilter and similar classes from getting a top-level DocIdSet.
  (Dan C., Uwe Schindler)

* LUCENE-3390: Corrected handling of missing values when two parallel searches
  use different missing values for sorting: the missing value was populated
  directly into the FieldCache arrays during sorting, leading to concurrency
  issues.
  (Uwe Schindler, Doron Cohen, Mike McCandless)

* LUCENE-3439: Closing an NRT reader after the writer was closed was
  incorrectly invoking the DeletionPolicy (and then possibly deleting
  files) on the closed IndexWriter (Robert Muir, Mike McCandless)

* LUCENE-3215: SloppyPhraseScorer sometimes computed infinite freq
  (Robert Muir, Doron Cohen)

* LUCENE-3503: DisjunctionSumScorer would give slightly different scores
  for a document depending on whether you used nextDoc() or advance().
  (Mike McCandless, Robert Muir)

* LUCENE-3529: Properly support indexing an empty field with empty term text.
  Previously, if you had assertions enabled you would receive an error during
  flush; if you didn't, you would get an invalid index.
  (Mike McCandless, Robert Muir)

* LUCENE-2633: PackedInts Packed32 and Packed64 did not support internal
  structures larger than 256MB (Toke Eskildsen via Mike McCandless)

* LUCENE-3540: LUCENE-3255 dropped support for pre-1.9 indexes, but the
  error message in IndexFormatTooOldException was incorrect. (Uwe Schindler,
  Mike McCandless)

* LUCENE-3541: IndexInput's default copyBytes() implementation was not safe
  across multiple threads, because all clones shared the same buffer.
  (Robert Muir)

* LUCENE-3548: Fix CharsRef#append to extend length of the existing char[]
  and preserve existing chars. (Simon Willnauer)

* LUCENE-3582: Normalize NaN values in NumericUtils.floatToSortableInt() /
  NumericUtils.doubleToSortableLong(), so that they are consistent with stored
  fields. Also fix NumericRangeQuery to not falsely hit NaNs on half-open
  ranges (one bound is null). Because of normalization, NumericRangeQuery
  can now be used to hit NaN values by creating a query with
  upper == lower == NaN (inclusive).
  (Dawid Weiss, Uwe Schindler)

API Changes

* LUCENE-3454: Rename IndexWriter.optimize to forceMerge to discourage
  use of this method, since it is horribly costly and rarely justified
  anymore. MergePolicy.findMergesForOptimize was renamed to
  findForcedMerges. IndexReader.isOptimized was
  deprecated. IndexCommit.isOptimized was replaced with
  getSegmentCount. (Robert Muir, Mike McCandless)

* LUCENE-3205: Deprecated MultiTermQuery.getTotalNumberOfTerms() [and
  related methods], as the numbers returned are not useful
  for multi-segment indexes. They were only needed for tests of
  NumericRangeQuery. (Mike McCandless, Uwe Schindler)

* LUCENE-3574: Deprecate outdated constants in org.apache.lucene.util.Constants
  and add new ones for Java 6 and Java 7. (Uwe Schindler)

* LUCENE-3571: Deprecate IndexSearcher(Directory). Use the constructors
  that take IndexReader instead. (Robert Muir)

* LUCENE-3577: Renamed IndexWriter.expungeDeletes to forceMergeDeletes
  and revamped the javadocs to discourage use of this method, since it
  is horribly costly and rarely justified. MergePolicy.findMergesToExpungeDeletes
  was renamed to findForcedDeletesMerges. (Robert Muir, Mike McCandless)

* LUCENE-3464: IndexReader.reopen has been renamed to
  IndexReader.openIfChanged (a static method), and now returns null
  (instead of the old reader) if there are no changes in the index, to
  prevent the common pitfall of accidentally closing the old reader.

New Features

* LUCENE-3448: Added FixedBitSet.and(other/DISI), andNot(other/DISI).
  (Uwe Schindler)

* LUCENE-2215: Added IndexSearcher.searchAfter which returns results after a
  specified ScoreDoc (e.g. the last document on the previous page) to support
  deep paging use cases.
  (Aaron McCurry, Grant Ingersoll, Robert Muir)

* LUCENE-1990: Added an internal packed ints implementation, to be used
  for more efficient storage of int arrays when the values are
  bounded, for example for storing the terms dict index (Toke
  Eskildsen via Mike McCandless)

* LUCENE-3558: Moved SearcherManager, NRTManager & SearcherLifetimeManager into
  core. All classes are contained in o.a.l.search. (Simon Willnauer)

Optimizations

* LUCENE-3426: Add NGramPhraseQuery which extends PhraseQuery and tries to
  reduce the number of terms of the query when it is rewritten, in order to
  improve performance. (Robert Muir, Koji Sekiguchi)

* LUCENE-3494: Optimize FilteredQuery to remove a multiply in score()
  (Uwe Schindler, Robert Muir)

* LUCENE-3534: Remove filter logic from IndexSearcher and delegate to
  FilteredQuery's Scorer. This is a partial backport of a cleanup in
  FilteredQuery/IndexSearcher added by LUCENE-1536 to Lucene 4.0.
  (Uwe Schindler)

* LUCENE-2205: Very substantial (3-5X) reduction in the RAM required to hold
  the terms index when opening an IndexReader (Aaron McCurry via Mike McCandless)

* LUCENE-3443: FieldCache can now set docsWithField, and create an
  array, in a single pass. This results in faster init time for apps
  that need both (such as sorting by a field with a missing value).
  (Mike McCandless)

Test Cases

* LUCENE-3420: Disable the finalness checks in TokenStream and Analyzer
  for implementing subclasses in different packages, where assertions are not
  enabled. (Uwe Schindler)

* LUCENE-3506: Tests relying on assertions being enabled were no-ops because
  they ignored AssertionError. With this fix, the entire test framework
  (every test) fails if assertions are disabled, unless
  -Dtests.asserts.gracious=true is specified.
  (Doron Cohen)

Build

* SOLR-2849: Fix dependencies in Maven POMs. (David Smiley via Steve Rowe)

* LUCENE-3561: Fix maven xxx-src.jar files that were missing resources.
  (Uwe Schindler)

======================= Lucene 3.4.0 =======================

Bug fixes

* LUCENE-3251: Directory#copy failed to close target output if opening the
  source stream failed. (Simon Willnauer)

* LUCENE-3255: If segments_N file is all zeros (due to file
  corruption), don't read that to mean the index is empty. (Gregory
  Tarr, Mark Harwood, Simon Willnauer, Mike McCandless)

* LUCENE-3254: Fixed a minor bug in how deletes were written to disk,
  causing the file to sometimes be larger than it needed to be. (Mike
  McCandless)

* LUCENE-3224: Fixed a bug where CheckIndex would incorrectly report a
  corrupt index if a term with docfreq >= 16 was indexed more than once
  at the same position. (Robert Muir)

* LUCENE-3339: Fixed deadlock case when multiple threads use the new
  block-add (IndexWriter.add/updateDocuments) methods. (Robert Muir,
  Mike McCandless)

* LUCENE-3340: Fixed case where IndexWriter was not flushing at
  exactly maxBufferedDeleteTerms (Mike McCandless)

* LUCENE-3358, LUCENE-3361: StandardTokenizer and UAX29URLEmailTokenizer
  wrongly discarded combining marks attached to Han or Hiragana characters;
  this is fixed if you supply Version >= 3.4. If you supply a previous
  Lucene version, you get the old buggy behavior for backwards compatibility.
  (Trejkaz, Robert Muir)

* LUCENE-3368: IndexWriter committed segments without applying their buffered
  deletes when flushing concurrently. (Simon Willnauer, Mike McCandless)

* LUCENE-3365: Create or Append mode determined before obtaining the write lock
  could cause IndexWriter to overwrite an existing index.
  (Geoff Cooney via Simon Willnauer)

* LUCENE-3380: Fixed a bug where FileSwitchDirectory's listAll() would wrongly
  throw NoSuchDirectoryException when all files written so far have been
  written to one directory, but the other has not yet been created on the
  filesystem. (Robert Muir)

* LUCENE-3409: IndexWriter.deleteAll was failing to close pooled NRT
  SegmentReaders, leading to unused files accumulating in the
  Directory. (tal steier via Mike McCandless)

* LUCENE-3418: Lucene was failing to fsync index files on commit,
  meaning an operating system or hardware crash, or power loss, could
  easily corrupt the index. (Mark Miller, Robert Muir, Mike
  McCandless)

New Features

* LUCENE-3290: Added FieldInvertState.numUniqueTerms
  (Mike McCandless, Robert Muir)

* LUCENE-3280: Add FixedBitSet, like OpenBitSet but not elastic
  (it does not grow on demand if you set/get/clear too-large indices). (Mike
  McCandless)

* LUCENE-2048: Added the ability to omit positions but still index
  term frequencies; you can now control what is indexed into
  the postings via AbstractField.setIndexOptions:
    DOCS_ONLY: only documents are indexed: term frequencies and positions are omitted
    DOCS_AND_FREQS: only documents and term frequencies are indexed: positions are omitted
    DOCS_AND_FREQS_AND_POSITIONS: full postings: documents, frequencies, and positions
  AbstractField.setOmitTermFrequenciesAndPositions is deprecated;
  you should use DOCS_ONLY instead. (Robert Muir)

* LUCENE-3097: Added a new grouping collector that can be used to retrieve the
  most relevant document of every group. This can be useful in situations when
  one wants to compute grouping-based facets / statistics on the complete query
  result.
  (Martijn van Groningen)

* LUCENE-3334: If Java7 is detected, IOUtils.closeSafely() will log
  suppressed exceptions in the original exception, so the stack trace
  will contain them. (Uwe Schindler)

Optimizations

* LUCENE-3201, LUCENE-3218: CompoundFileSystem code has been consolidated
  into a Directory implementation. Reading is optimized for MMapDirectory,
  NIOFSDirectory and SimpleFSDirectory to only map requested parts of the
  CFS into an IndexInput. Writing to a CFS now tries to append to the CF
  directly if possible and merges separately written files on the fly instead
  of during close. (Simon Willnauer, Robert Muir)

* LUCENE-3289: When building an FST you can now tune how aggressively
  the FST should try to share common suffixes. Typically you can
  greatly reduce RAM required during building, and CPU consumed, at
  the cost of a somewhat larger FST. (Mike McCandless)

Test Cases

* LUCENE-3327: Fix AIOOBE when TestFSTs is run with -Dtests.verbose=true
  (James Dyer via Mike McCandless)

Build

* LUCENE-3406: Add ant target 'package-local-src-tgz' to Lucene and Solr
  to package sources from the local working copy.
  (Seung-Yeoul Yang via Steve Rowe)


======================= Lucene 3.3.0 =======================

Changes in backwards compatibility policy

* LUCENE-3140: IndexOutput.copyBytes now takes a DataInput (superclass
  of IndexInput) as its first argument. (Robert Muir, Dawid Weiss,
  Mike McCandless)

* LUCENE-3191: FieldComparator.value now returns an Object, not a
  Comparable; FieldDoc.fields also changed from Comparable[] to
  Object[] (Uwe Schindler, Mike McCandless)

* LUCENE-3208: Made deprecated methods Query.weight(Searcher) and
  Searcher.createWeight() final to prevent override.
  If you have overridden one of these methods, cut over to the non-deprecated
  implementation. (Uwe Schindler, Robert Muir, Yonik Seeley)

* LUCENE-3238: Made MultiTermQuery.rewrite() final, to prevent
  problems (such as not properly setting rewrite methods, or
  not working correctly with things like SpanMultiTermQueryWrapper).
  To rewrite to a simpler form, instead return a simpler enum
  from getEnum(IndexReader). For example, to rewrite to a single term,
  return a SingleTermEnum. (ludovic Boutros, Uwe Schindler, Robert Muir)

Changes in runtime behavior

* LUCENE-2834: The hash used to compute the lock file name when the
  lock file is not stored in the index has changed. This means you
  will see a different lucene-XXX-write.lock in your lock directory.
  (Robert Muir, Uwe Schindler, Mike McCandless)

* LUCENE-3146: IndexReader.setNorm throws IllegalStateException if the field
  does not store norms. (Shai Erera, Mike McCandless)

* LUCENE-3198: On Linux, if the JRE is 64 bit and supports unmapping,
  FSDirectory.open now defaults to MMapDirectory instead of
  NIOFSDirectory since MMapDirectory gives better performance. (Mike
  McCandless)

* LUCENE-3200: MMapDirectory now uses chunk sizes that are powers of 2.
  When setting the chunk size, it is rounded down to the nearest power of 2.
  The new default value for 64 bit platforms is 2^30 (1 GiB); for 32 bit
  platforms it stays unchanged at 2^28 (256 MiB).
  Internally, MMapDirectory now only uses one dedicated final IndexInput
  implementation supporting multiple chunks, which makes Hotspot's life
  easier. (Uwe Schindler, Robert Muir, Mike McCandless)

Bug fixes

* LUCENE-3147,LUCENE-3152: Fixed open file handle leaks in many places in the
  code.
  Now MockDirectoryWrapper (in test-framework) tracks all open files,
  including locks, and fails if the test fails to release all of them.
  (Mike McCandless, Robert Muir, Shai Erera, Simon Willnauer)

* LUCENE-3102: CachingCollector.replay was failing to call setScorer
  per-segment (Martijn van Groningen via Mike McCandless)

* LUCENE-3183: Fix rare corner case where seeking to empty term
  (field="", term="") with terms index interval 1 could hit
  ArrayIndexOutOfBoundsException (selckin, Robert Muir, Mike
  McCandless)

* LUCENE-3208: IndexSearcher had its own private similarity field
  and corresponding getter/setter overriding Searcher's implementation. If you
  set a different Similarity instance on IndexSearcher, methods implemented
  in the superclass Searcher were not using it, leading to strange bugs.
  (Uwe Schindler, Robert Muir)

* LUCENE-3197: Fix core merge policies to not over-merge during
  background optimize when documents are still being deleted
  concurrently with the optimize (Mike McCandless)

* LUCENE-3222: The RAM accounting for buffered delete terms was
  failing to measure the space required to hold the term's field and
  text character data. (Mike McCandless)

* LUCENE-3238: Fixed bug where using WildcardQuery("prefix*") inside
  of a SpanMultiTermQueryWrapper rewrote incorrectly and returned
  an error instead. (ludovic Boutros, Uwe Schindler, Robert Muir)

API Changes

* LUCENE-3208: Renamed protected IndexSearcher.createWeight() to expert
  public method IndexSearcher.createNormalizedWeight() as this better describes
  what this method does. The old method is still there for backwards
  compatibility. Query.weight() was deprecated and simply delegates to
  IndexSearcher. Both deprecated methods will be removed in Lucene 4.0.
  (Uwe Schindler, Robert Muir, Yonik Seeley)

* LUCENE-3197: MergePolicy.findMergesForOptimize now takes
  Map<SegmentInfo,Boolean> instead of Set<SegmentInfo> as the second
  argument, so the merge policy knows which segments were originally
  present vs. produced by an optimizing merge (Mike McCandless)

Optimizations

* LUCENE-1736: DateTools.java general improvements.
  (David Smiley via Steve Rowe)

New Features

* LUCENE-3140: Added experimental FST implementation to Lucene.
  (Robert Muir, Dawid Weiss, Mike McCandless)

* LUCENE-3193: A new TwoPhaseCommitTool allows running a 2-phase commit
  algorithm over objects that implement the new TwoPhaseCommit interface (such
  as IndexWriter). (Shai Erera)

* LUCENE-3191: Added TopDocs.merge, to facilitate merging results from
  different shards (Uwe Schindler, Mike McCandless)

* LUCENE-3179: Added OpenBitSet.prevSetBit (Paul Elschot via Mike McCandless)

* LUCENE-3210: Made TieredMergePolicy more aggressive in reclaiming
  segments with deletions; added new methods
  set/getReclaimDeletesWeight to control this. (Mike McCandless)

Build

* LUCENE-1344: Create OSGi bundle using dev-tools/maven.
  (Nicolas Lalevée, Luca Stancapiano via ryan)

* LUCENE-3204: The maven-ant-tasks jar is now included in the source tree;
  users of the generate-maven-artifacts target no longer have to manually
  place this jar in the Ant classpath. NOTE: when Ant looks for the
  maven-ant-tasks jar, it looks first in its pre-existing classpath, so
  any copies it finds will be used instead of the copy included in the
  Lucene/Solr source tree. For this reason, it is recommended to remove
  any copies of the maven-ant-tasks jar in the Ant classpath, e.g. under
  ~/.ant/lib/ or under the Ant installation's lib/ directory.
  (Steve Rowe)


======================= Lucene 3.2.0 =======================

Changes in backwards compatibility policy

* LUCENE-2953: PriorityQueue's internal heap was made private, as subclassing
  with generics can lead to ClassCastException. For advanced use (e.g. in Solr)
  a method getHeapArray() was added to retrieve the internal heap array as a
  non-generic Object[]. (Uwe Schindler, Yonik Seeley)

* LUCENE-1076: IndexWriter.setInfoStream now throws IOException
  (Mike McCandless, Shai Erera)

* LUCENE-3084: MergePolicy.OneMerge.segments was changed from
  SegmentInfos to a List<SegmentInfo>. SegmentInfos itself was changed
  to no longer extend Vector<SegmentInfo> (to update code that is using
  the Vector API, use the new asList() and asSet() methods returning
  unmodifiable collections; modifying SegmentInfos is now only possible
  through the explicitly declared methods). IndexWriter.segString() now takes
  Iterable<SegmentInfo> instead of List<SegmentInfo>. A simple recompile
  should fix this. MergePolicy and SegmentInfos are internal/experimental
  APIs not covered by the strict backwards compatibility policy.
  (Uwe Schindler, Mike McCandless)

Changes in runtime behavior

* LUCENE-3065: When a NumericField is retrieved from a Document loaded
  from IndexReader (or IndexSearcher), it will now come back as
  NumericField, not as a Field with a string-ified version of the
  numeric value you had indexed. Note that this only applies for
  newly-indexed Documents; older indices will still return Field
  with the string-ified numeric value. If you call Document.get(),
  the value still comes back as a String, but Document.getFieldable()
  returns NumericField instances.
  (Uwe Schindler, Ryan McKinley, Mike McCandless)

* LUCENE-1076: Changed the default merge policy from
  LogByteSizeMergePolicy to TieredMergePolicy, as of Version.LUCENE_32
  (passed to IndexWriterConfig), which is able to merge non-contiguous
  segments. This means docIDs no longer necessarily stay "in order"
  during indexing. If this is a problem then you can use either of
  the LogMergePolicy impls. (Mike McCandless)

New features

* LUCENE-3082: Added index upgrade tool oal.index.IndexUpgrader
  that upgrades all segments to the most recent supported index
  format without fully optimizing. (Uwe Schindler, Mike McCandless)

* LUCENE-1076: Added TieredMergePolicy which is able to merge non-contiguous
  segments, which means docIDs no longer necessarily stay "in order".
  (Mike McCandless, Shai Erera)

* LUCENE-3071: Added ReversePathHierarchyTokenizer, and added a skip parameter
  to PathHierarchyTokenizer (Olivier Favre via ryan)

* LUCENE-1421, LUCENE-3102: Added CachingCollector which allows you to cache
  document IDs and scores encountered during the search, and "replay" them to
  another Collector. (Mike McCandless, Shai Erera)

* LUCENE-3112: Added experimental IndexWriter.add/updateDocuments,
  enabling a block of documents to be indexed, atomically, with
  guaranteed sequential docIDs. (Mike McCandless)

API Changes

* LUCENE-3061: IndexWriter's getNextMerge() and merge(OneMerge) are now public
  (though @lucene.experimental), allowing for custom MergeScheduler
  implementations. (Shai Erera)

* LUCENE-3065: Document.getField() was deprecated, as it throws
  ClassCastException when loading lazy fields or NumericFields.
  (Uwe Schindler, Ryan McKinley, Mike McCandless)

* LUCENE-2027: Directory.touchFile is deprecated and will be removed
  in 4.0.
  (Mike McCandless)

Optimizations

* LUCENE-2990: ArrayUtil/CollectionUtil.*Sort() methods now exit early
  on empty or one-element lists/arrays. (Uwe Schindler)

* LUCENE-2897: Apply deleted terms while flushing a segment. We still
  buffer deleted terms to later apply to past segments. (Mike McCandless)

* LUCENE-3126: IndexWriter.addIndexes copies incoming segments into CFS if they
  aren't already and MergePolicy allows that. (Shai Erera)

Bug fixes

* LUCENE-2996: addIndexes(IndexReader) did not flush before adding the new
  indexes, causing existing deletions to be applied on the incoming indexes as
  well. (Shai Erera, Mike McCandless)

* LUCENE-3024: An index with more than 2.1B terms was hitting AIOOBE when
  seeking TermEnum (e.g. as used by Solr's faceting) (Tom Burton-West, Mike
  McCandless)

* LUCENE-3042: When a filter or consumer added Attributes to a TokenStream
  chain after it was already (partly) consumed [or clearAttributes(),
  captureState(), cloneAttributes(),... was called by the Tokenizer],
  the Tokenizer calling clearAttributes() or capturing state after addition
  may not do this on the newly added Attribute. This bug affected only
  very special use cases of the TokenStream API; most users would not
  have noticed it. (Uwe Schindler, Robert Muir)

* LUCENE-3054: PhraseQuery could in some cases overflow the stack in
  SorterTemplate.quickSort(). This fix also adds an optimization to
  PhraseQuery, as terms with lower doc freq will also have fewer positions.
  (Uwe Schindler, Robert Muir, Otis Gospodnetic)

* LUCENE-3068: Sloppy phrase query failed to match valid documents when multiple
  query terms had the same position in the query.
  (Doron Cohen)

* LUCENE-3012: Lucene now writes the header for separate norm files (*.sNNN)
  (Robert Muir)

Build

* LUCENE-3006: Building javadocs will fail on warnings by default.
  Override with -Dfailonjavadocwarning=false (sarowe, gsingers)

* LUCENE-3128: "ant eclipse" creates a .project file for easier Eclipse
  integration (unless one already exists). (Daniel Serodio via Shai Erera)

Test Cases

* LUCENE-3002: Added 'tests.iter.min' to control 'tests.iter' by allowing
  iteration to stop once at least 'tests.iter.min' iterations have run and a
  failure has occurred. (Shai Erera, Chris Hostetter)

======================= Lucene 3.1.0 =======================

Changes in backwards compatibility policy

* LUCENE-2719: Changed API of internal utility class
  org.apache.lucene.util.SorterTemplate to support faster quickSort using
  pivot values and also merge sort and insertion sort. If you have used
  this class, you have to implement two more methods for handling pivots.
  (Uwe Schindler, Robert Muir, Mike McCandless)

* LUCENE-1923: Renamed SegmentInfo & SegmentInfos segString method to
  toString. These are advanced APIs and subject to change suddenly.
  (Tim Smith via Mike McCandless)

* LUCENE-2190: Removed deprecated customScore() and customExplain()
  methods from experimental CustomScoreQuery. (Uwe Schindler)

* LUCENE-2286: Enabled DefaultSimilarity.setDiscountOverlaps by default.
  This means that terms with a position increment gap of zero do not
  affect the norms calculation by default. (Robert Muir)

* LUCENE-2320: MergePolicy.writer is now of type SetOnce, which allows setting
  the IndexWriter for a MergePolicy exactly once.
  You can change references to 'writer' from <code>writer.doXYZ()</code> to
  <code>writer.get().doXYZ()</code> (it is also advisable to add an
  <code>assert writer != null;</code> before you access the wrapped
  IndexWriter.)

  In addition, MergePolicy only exposes a default constructor, and the one that
  took IndexWriter as argument has been removed from all MergePolicy extensions.
  (Shai Erera via Mike McCandless)

* LUCENE-2328: SimpleFSDirectory.SimpleFSIndexInput was moved to
  FSDirectory.FSIndexInput. Anyone extending this class will have to
  fix their code on upgrading. (Earwin Burrfoot via Mike McCandless)

* LUCENE-2302: The new interface for term attributes, CharTermAttribute,
  now implements CharSequence. This requires the toString() methods of
  CharTermAttribute, deprecated TermAttribute, and Token to return only
  the term text and no other attribute contents. LUCENE-2374 implements
  an attribute reflection API to no longer rely on toString() for attribute
  inspection. (Uwe Schindler, Robert Muir)

* LUCENE-2372, LUCENE-2389: StandardAnalyzer, KeywordAnalyzer,
  PerFieldAnalyzerWrapper, WhitespaceTokenizer are now final. Also removed
  the now obsolete and deprecated Analyzer.setOverridesTokenStreamMethod().
  Analyzer and TokenStream base classes now have an assertion in their ctor
  that checks that subclasses are final or at least have final implementations
  of incrementToken(), tokenStream(), and reusableTokenStream().
  (Uwe Schindler, Robert Muir)

* LUCENE-2316: Directory.fileLength contract was clarified: it returns the
  actual file's length if the file exists, and throws FileNotFoundException
  otherwise. Returning length=0 for a non-existent file is no longer allowed. If
  you relied on that, make sure to catch the exception.
  (Shai Erera)

* LUCENE-2386: IndexWriter no longer performs an empty commit upon new index
  creation. Previously, if you passed an empty Directory and set OpenMode to
  CREATE*, IndexWriter would make a first empty commit. If you need that
  behavior you can call writer.commit()/close() immediately after you create it.
  (Shai Erera, Mike McCandless)

* LUCENE-2733: Removed public constructors of utility classes with only static
  methods to prevent instantiation. (Uwe Schindler)

* LUCENE-2602: The default (LogByteSizeMergePolicy) merge policy now
  takes deletions into account by default. You can disable this by
  calling setCalibrateSizeByDeletes(false) on the merge policy. (Mike
  McCandless)

* LUCENE-2529, LUCENE-2668: Position increment gap and offset gap of empty
  values in multi-valued fields have been changed for some cases in the index.
  If you index empty fields and use positions/offsets information on those
  fields, reindexing is recommended. (David Smiley, Koji Sekiguchi)

* LUCENE-2804: Directory.setLockFactory now declares throwing an IOException.
  (Shai Erera, Robert Muir)

* LUCENE-2837: Added deprecations noting that in 4.0, Searcher and
  Searchable are collapsed into IndexSearcher; contrib/remote and
  MultiSearcher have been removed. (Mike McCandless)

* LUCENE-2854: Deprecated SimilarityDelegator and
  Similarity.lengthNorm; the latter is now final, forcing any custom
  Similarity impls to cut over to the more general computeNorm (Robert
  Muir, Mike McCandless)

* LUCENE-2869: Deprecated Query.getSimilarity: instead of using
  "runtime" subclassing/delegation, subclass the Weight instead.
  (Robert Muir)

* LUCENE-2674: A new idfExplain method was added to Similarity that
  accepts an incoming docFreq.
  If you subclass Similarity, make sure you also override this method on
  upgrade. (Robert Muir, Mike McCandless)

Changes in runtime behavior

* LUCENE-1923: Made IndexReader.toString() produce something
  meaningful (Tim Smith via Mike McCandless)

* LUCENE-2179: CharArraySet.clear() is now functional.
  (Robert Muir, Uwe Schindler)

* LUCENE-2455: IndexWriter.addIndexes no longer optimizes the target index
  before it adds the new ones. Also, the existing segments are not merged and so
  the index will not end up with a single segment (unless it was empty before).
  In addition, addIndexesNoOptimize was renamed to addIndexes and no longer
  invokes a merge on the incoming and target segments, but instead copies the
  segments to the target index. You can call maybeMerge or optimize after this
  method completes, if you need to.

  In addition, Directory.copyTo* were removed in favor of copy which takes the
  target Directory, source and target files as arguments, and copies the source
  file to the target Directory under the target file name. (Shai Erera)

* LUCENE-2663: IndexWriter no longer forcefully clears any existing
  locks when create=true. This was a holdover from when
  SimpleFSLockFactory was the default locking implementation, and
  even then it was dangerous since it could mask bugs in IndexWriter's
  usage, allowing applications to accidentally open two writers on the
  same directory. (Mike McCandless)

* LUCENE-2701: maxMergeMBForOptimize and maxMergeDocs constraints set on
  LogMergePolicy now affect optimize() as well (as opposed to only regular
  merges). This means that you can run optimize() and segments that are too
  large won't be merged.
  (Shai Erera)

* LUCENE-2753: IndexReader and DirectoryReader .listCommits() now return a List,
  guaranteeing the commits are sorted from oldest to latest. (Shai Erera)

* LUCENE-2785: TopScoreDocCollector, TopFieldCollector and
  the IndexSearcher search methods that take an int nDocs will now
  throw IllegalArgumentException if nDocs is 0. Instead, you should
  use the newly added TotalHitCountCollector. (Mike McCandless)

* LUCENE-2790: LogMergePolicy.useCompoundFile's logic now factors in noCFSRatio
  to determine whether the passed-in segment should be compound.
  (Shai Erera, Earwin Burrfoot)

* LUCENE-2805: IndexWriter now increments the index version on every change to
  the index instead of for every commit. Committing or closing the IndexWriter
  without any changes to the index will not cause any index version increment.
  (Simon Willnauer, Mike McCandless)

* LUCENE-2650, LUCENE-2825: The behavior of FSDirectory.open has changed. On
  64-bit Windows and Solaris systems that support unmapping, FSDirectory.open
  returns MMapDirectory. Additionally, the behavior of MMapDirectory has been
  changed to enable unmapping by default if supported by the JRE.
  (Mike McCandless, Uwe Schindler, Robert Muir)

* LUCENE-2829: Improve the performance of "primary key" lookup use
  case (running a TermQuery that matches one document) on a
  multi-segment index. (Robert Muir, Mike McCandless)

* LUCENE-2010: Segments with 100% deleted documents are now removed on
  IndexReader or IndexWriter commit. (Uwe Schindler, Mike McCandless)

* LUCENE-2960: Allow some changes to IndexWriterConfig to take effect
  "live" (after an IW is instantiated), via
  IndexWriter.getConfig().setXXX(...) (Shay Banon, Mike McCandless)

API Changes

* LUCENE-2076: Rename FSDirectory.getFile -> getDirectory.
  (George Aroush via Mike McCandless)

* LUCENE-1260: Change norm encode (float->byte) and decode
  (byte->float) to be instance methods, not static methods. This way a
  custom Similarity can alter how norms are encoded, though they must
  still be encoded as a single byte (Johan Kindgren via Mike
  McCandless)

* LUCENE-2103: NoLockFactory should have a private constructor;
  until Lucene 4.0 the default one will be deprecated.
  (Shai Erera via Uwe Schindler)

* LUCENE-2177: Deprecate the Field ctors that take byte[] and Store.
  Since the removal of compressed fields, Store can only be YES, so
  it's not necessary to specify. (Erik Hatcher via Mike McCandless)

* LUCENE-2200: Several final classes had non-overriding protected
  members. These were converted to private, and unused protected
  constructors were removed. (Steven Rowe via Robert Muir)

* LUCENE-2240: SimpleAnalyzer and WhitespaceAnalyzer now have
  Version ctors. (Simon Willnauer via Uwe Schindler)

* LUCENE-2259: Add IndexWriter.deleteUnusedFiles, which attempts to remove
  unused files. This is only useful on Windows, which prevents
  deletion of open files. IndexWriter will eventually remove these
  files itself; this method just lets you do so when you know the
  files are no longer held open by IndexReaders. (luocanrao via Mike
  McCandless)

* LUCENE-2282: IndexFileNames is exposed as a public class allowing for easier
  use by external code. In addition it offers a matchExtension method which
  callers can use to query whether a certain file matches a certain extension.
  (Shai Erera via Mike McCandless)

* LUCENE-124: Add a TopTermsBoostOnlyBooleanQueryRewrite to MultiTermQuery.
  This rewrite method is similar to TopTermsScoringBooleanQueryRewrite, but
  only scores terms by their boost values.
  For example, this can be used with FuzzyQuery to ensure that exact
  matches are always scored higher, because only the boost will be used
  in scoring. (Robert Muir)

* LUCENE-2015: Add a static method foldToASCII to ASCIIFoldingFilter to
  expose its folding logic. (Cédrik Lime via Robert Muir)

* LUCENE-2294: IndexWriter constructors have been deprecated in favor of a
  single ctor which accepts IndexWriterConfig and a Directory. You can set all
  the parameters related to IndexWriter on IndexWriterConfig. The different
  setter/getter methods were deprecated as well. One should call
  writer.getConfig().getXYZ() to query for a parameter XYZ.
  Additionally, the setters/getters related to MergePolicy were deprecated as
  well. One should interact with the MergePolicy directly.
  (Shai Erera via Mike McCandless)

* LUCENE-2320: IndexWriter's MergePolicy configuration was moved to
  IndexWriterConfig and the respective methods on IndexWriter were deprecated.
  (Shai Erera via Mike McCandless)

* LUCENE-2328: Directory now itself keeps track of the files that are written
  but not yet fsynced. The old Directory.sync(String file) method is deprecated
  and replaced with Directory.sync(Collection<String> files). Take a look at
  FSDirectory for a sample of what such tracking might look like, if needed
  in your custom Directories. (Earwin Burrfoot via Mike McCandless)

* LUCENE-2302: Deprecated TermAttribute and replaced it with a new
  CharTermAttribute. The change is backwards compatible, so
  mixed new/old TokenStreams all work on the same char[] buffer
  independent of which interface they use. CharTermAttribute
  has shorter method names and implements CharSequence and
  Appendable. This allows usage like Java's StringBuilder in
  addition to direct char[] access.
  Also, terms can be used directly in places where CharSequence is
  allowed (e.g. regular expressions).
  (Uwe Schindler, Robert Muir)

* LUCENE-2402: IndexWriter.deleteUnusedFiles now deletes unreferenced commit
  points too. If you use an IndexDeletionPolicy which holds onto index commits
  (such as SnapshotDeletionPolicy), you can call this method to remove those
  commit points when they are not needed anymore (instead of waiting for the
  next commit). (Shai Erera)

* LUCENE-2481: SnapshotDeletionPolicy.snapshot() and release() were replaced
  with equivalent ones that take a String (id) as argument. You can pass
  whatever ID you want, as long as you use the same one when calling both.
  (Shai Erera)

* LUCENE-2356: Add IndexWriterConfig.set/getReaderTermIndexDivisor, to
  set what IndexWriter passes for termsIndexDivisor to the readers it
  opens internally when applying deletions or creating a near-real-time
  reader. (Earwin Burrfoot via Mike McCandless)

* LUCENE-2167,LUCENE-2699,LUCENE-2763,LUCENE-2847: StandardTokenizer/Analyzer
  in common/standard/ now implement the Word Break rules from the Unicode 6.0.0
  Text Segmentation algorithm (UAX#29), covering the full range of Unicode code
  points, including supplementary values from U+10000 to U+10FFFF.

  ClassicTokenizer/Analyzer retains the old (pre-Lucene 3.1) StandardTokenizer/
  Analyzer implementation and behavior. Only the Unicode Basic Multilingual
  Plane (code points from U+0000 to U+FFFF) is covered.

  UAX29URLEmailTokenizer tokenizes URLs and E-mail addresses according to the
  relevant RFCs, in addition to implementing the UAX#29 Word Break rules.
  (Steven Rowe, Robert Muir, Uwe Schindler)

* LUCENE-2778: RAMDirectory now exposes newRAMFile(), which subclasses can
  override to return a different RAMFile implementation.
  (Shai Erera)

* LUCENE-2785: Added TotalHitCountCollector whose sole purpose is to
  count the number of hits matching the query. (Mike McCandless)

* LUCENE-2846: Deprecated IndexReader.setNorm(int, String, float). This method
  is only syntactic sugar for setNorm(int, String, byte) that encodes using the
  global Similarity.getDefault().encodeNormValue(). Use the byte-based method
  instead to ensure that the norm is encoded with your Similarity.
  (Robert Muir, Mike McCandless)

* LUCENE-2374: Added Attribute reflection API: It's now possible to inspect the
  contents of AttributeImpl and AttributeSource using a well-defined API.
  This is used, e.g., by Solr's AnalysisRequestHandlers to display all attributes
  in a structured way.
  There are also some backwards-incompatible changes in toString() output,
  as LUCENE-2302 introduced the CharSequence interface to CharTermAttribute,
  leading to changed toString() return values. The new API provides a
  well-defined string representation through a new method,
  reflectAsString(). For backwards compatibility reasons, when toString()
  was implemented by implementation subclasses, the default implementation of
  AttributeImpl.reflectWith() uses toString()'s output instead to report the
  Attribute's properties. Otherwise, reflectWith() uses Java's reflection
  (like toString() did before) to get the attribute properties.
  In addition, the formerly mandatory equals() and hashCode() are no longer
  required for AttributeImpls, but can still be provided (if needed).
  (Uwe Schindler)

* LUCENE-2691: Deprecate IndexWriter.getReader in favor of
  IndexReader.open(IndexWriter) (Grant Ingersoll, Mike McCandless)

* LUCENE-2876: Deprecated Scorer.getSimilarity(). If your Scorer uses a Similarity,
  it should keep it itself.
  Fixed Scorers to pass their parent Weight, so that
  Scorer.visitSubScorers (LUCENE-2590) will work correctly.
  (Robert Muir, Doron Cohen)

* LUCENE-2900: When opening a near-real-time (NRT) reader
  (IndexReader.re/open(IndexWriter)) you can now specify whether
  deletes should be applied. Applying deletes can be costly, and some
  expert use cases can handle seeing deleted documents returned. The
  deletes remain buffered so that the next time you open an NRT reader
  and pass true, all deletes will be applied. (Mike McCandless)

* LUCENE-1253: LengthFilter (and Solr's KeepWordTokenFilter) now
  require up-front specification of enablePositionIncrement. Together with
  StopFilter they have a common base class (FilteringTokenFilter) that handles
  the position increments automatically. Implementors only need to override an
  accept() method that filters tokens. (Uwe Schindler, Robert Muir)

Bug fixes

* LUCENE-2249: ParallelMultiSearcher should shut down thread pool on
  close. (Martin Traverso via Uwe Schindler)

* LUCENE-2273: FieldCacheImpl.getCacheEntries() used WeakHashMap
  incorrectly and led to ConcurrentModificationException.
  (Uwe Schindler, Robert Muir)

* LUCENE-2328: Index files fsync tracking moved from
  IndexWriter/IndexReader to Directory, and it no longer leaks memory.
  (Earwin Burrfoot via Mike McCandless)

* LUCENE-2074: Reduce buffer size of lexer back to default on reset.
  (Ruben Laguna, Shai Erera via Uwe Schindler)

* LUCENE-2496: Don't throw NPE if IndexWriter is opened with CREATE on
  a prior (corrupt) index missing its segments_N file. (Mike
  McCandless)

* LUCENE-2458: QueryParser no longer automatically forms phrase queries,
  assuming whitespace tokenization. Previously all CJK queries, for example,
  would be turned into phrase queries.
  The old behavior is preserved with
  the matchVersion parameter for previous versions. Additionally, you can
  explicitly enable the old behavior with setAutoGeneratePhraseQueries(true).
  (Robert Muir)

* LUCENE-2537: FSDirectory.copy() implementation was unsafe and could result in
  OOM if a large file was copied. (Shai Erera)

* LUCENE-2580: MultiPhraseQuery threw AIOOBE if the number of positions
  exceeded the number of terms at one position (Jayendra Patil via Mike McCandless)

* LUCENE-2617: Optional clauses of a BooleanQuery were not factored
  into coord if the scorer for that segment returned null. This
  could cause the same document to score differently depending on
  which segment it resides in. (yonik)

* LUCENE-2272: Fix explain in PayloadNearQuery and also fix scoring issue (Peter Keegan via Grant Ingersoll)

* LUCENE-2732: Fix charset problems in XML loading in
  HyphenationCompoundWordTokenFilter. (Uwe Schindler)

* LUCENE-2802: NRT DirectoryReader returned incorrect values from
  getVersion, isOptimized, getCommitUserData, getIndexCommit and isCurrent due
  to a mutable reference to the IndexWriter's SegmentInfos.
  (Simon Willnauer, Earwin Burrfoot)

* LUCENE-2852: Fixed corner case in RAMInputStream that would hit a
  false EOF after seeking to EOF, then seeking back to the same block you
  were just in, and then calling readBytes (Robert Muir, Mike McCandless)

* LUCENE-2860: Fixed SegmentInfo.sizeInBytes to factor in includeDocStores when it
  decides whether to return the cached computed size or not. (Shai Erera)

* LUCENE-2584: SegmentInfo.files() could hit ConcurrentModificationException if
  called by multiple threads. (Alexander Kanarsky via Shai Erera)

* LUCENE-2809: Fixed IndexWriter.numDocs to take into account
  applied but not yet flushed deletes.
  (Mike McCandless)

* LUCENE-2879: MultiPhraseQuery previously calculated its phrase IDF by summing
  internally; it now calls Similarity.idfExplain(Collection, IndexSearcher).
  (Robert Muir)

* LUCENE-2693: RAM used by IndexWriter was slightly incorrectly computed.
  (Jason Rutherglen via Shai Erera)

* LUCENE-1846: DateTools now uses the US locale everywhere, so DateTools.round()
  is also safe in unusual locales. (Uwe Schindler)

* LUCENE-2891: IndexWriterConfig did not accept -1 in setReaderTermIndexDivisor,
  which can be used to prevent loading the terms index into memory. (Shai Erera)

* LUCENE-2937: Encoding a float into a byte (e.g. encoding field norms during
  indexing) had an underflow detection bug that caused floatToByte(f)==0 where
  f was greater than 0, but slightly less than byteToFloat(1). This meant that
  certain very small field norms (index_boost * length_norm) could have
  been rounded down to 0 instead of being rounded up to the smallest
  positive number. (yonik)

* LUCENE-2936: PhraseQuery score explanations were not correctly
  identifying matches vs non-matches. (hossman)

* LUCENE-2975: A hotspot bug corrupts IndexInput#readVInt()/readVLong() if
  the underlying readByte() is inlined (which happens e.g. in MMapDirectory).
  The loop was unrolled, which makes the hotspot bug disappear.
  (Uwe Schindler, Robert Muir, Mike McCandless)

New features

* LUCENE-2128: Parallelized fetching of document frequencies during weight
  creation. (Israel Tsadok, Simon Willnauer via Uwe Schindler)

* LUCENE-2069: Added Unicode 4 support to CharArraySet. Due to the switch
  to Java 5, supplementary characters are now lowercased correctly if the
  set is created as case-insensitive.
  CharArraySet now requires a Version argument to preserve
  backwards compatibility.
  If Version < 3.1 is passed to the constructor,
  CharArraySet yields the old behavior. (Simon Willnauer)

* LUCENE-2069: Added Unicode 4 support to LowerCaseFilter. Due to the switch
  to Java 5, supplementary characters are now lowercased correctly.
  LowerCaseFilter now requires a Version argument to preserve
  backwards compatibility. If Version < 3.1 is passed to the constructor,
  LowerCaseFilter yields the old behavior. (Simon Willnauer, Robert Muir)

* LUCENE-2034: Added ReusableAnalyzerBase, an abstract subclass of Analyzer
  that makes it easier to reuse TokenStreams correctly. This issue also added
  StopwordAnalyzerBase, which improves consistency of all Analyzers that use
  stopwords, and reimplemented many analyzers in contrib on top of it.
  (Simon Willnauer via Robert Muir)

* LUCENE-2198, LUCENE-2901: Support protected words in stemming TokenFilters using a
  new KeywordAttribute. (Simon Willnauer, Drew Farris via Uwe Schindler)

* LUCENE-2183, LUCENE-2240, LUCENE-2241: Added Unicode 4 support
  to CharTokenizer and its subclasses. CharTokenizer now has a new
  int-API which is conditionally preferred to the old char-API depending
  on the provided Version. Version < 3.1 will use the char-API.
  (Simon Willnauer via Uwe Schindler)

* LUCENE-2247: Added a CharArrayMap<V> for performance improvements
  in some stemmers and synonym filters. (Uwe Schindler)

* LUCENE-2320: Added SetOnce which wraps an object and allows it to be set
  exactly once. (Shai Erera via Mike McCandless)

* LUCENE-2314: Added AttributeSource.copyTo(AttributeSource), which
  allows using cloneAttributes() together with this method as a replacement
  for captureState()/restoreState(), if the state itself
  needs to be inspected/modified.
  (Uwe Schindler)

* LUCENE-2293: Expose control over max number of threads that
  IndexWriter will allow to run concurrently while indexing
  documents (previously this was hardwired to 5), using
  IndexWriterConfig.setMaxThreadStates. (Mike McCandless)

* LUCENE-2297: Enable turning on reader pooling inside IndexWriter
  even when getReader (near-real-time reader) is not in use, through
  IndexWriterConfig.enable/disableReaderPooling. (Mike McCandless)

* LUCENE-2331: Add NoMergePolicy which never returns any merges to execute. In
  addition, add NoMergeScheduler which never executes any merges. These two are
  convenient classes in case you want to disable segment merges by IndexWriter
  without tweaking a particular MergePolicy's parameters, such as mergeFactor.
  MergeScheduler's methods are now public. (Shai Erera via Mike McCandless)

* LUCENE-2339: Deprecate static method Directory.copy in favor of
  Directory.copyTo, and use nio's FileChannel.transferTo when copying
  files between FSDirectory instances. (Earwin Burrfoot via Mike
  McCandless)

* LUCENE-2074: Make StandardTokenizer fit for Unicode 4.0, if the
  matchVersion parameter is Version.LUCENE_31. (Uwe Schindler)

* LUCENE-2385: Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy
  can be used to prevent commits from ever getting deleted from the index.
  (Shai Erera)

* LUCENE-1585: IndexWriter now accepts a PayloadProcessorProvider which can
  return a DirPayloadProcessor for a given Directory, which returns a
  PayloadProcessor for a given Term. The PayloadProcessor will be used to
  process the payloads of the segments as they are merged (e.g. if one wants to
  rewrite payloads of external indexes as they are added, or of local ones).
  (Shai Erera, Michael Busch, Mike McCandless)

* LUCENE-2440: Add support for custom ExecutorService in
  ParallelMultiSearcher (Edward Drapkin via Mike McCandless)

* LUCENE-2295: Added a LimitTokenCountAnalyzer / LimitTokenCountFilter
  to wrap any other Analyzer and provide the same functionality as
  MaxFieldLength provided on IndexWriter. This patch also fixes a bug
  in the offset calculation in CharTokenizer. (Uwe Schindler, Shai Erera)

* LUCENE-2526: Don't throw NPE from MultiPhraseQuery.toString when
  it's empty. (Ross Woolf via Mike McCandless)

* LUCENE-2559: Added SegmentReader.reopen methods (John Wang via Mike
  McCandless)

* LUCENE-2590: Added Scorer.visitSubScorers, and Scorer.freq. Along
  with a custom Collector these experimental methods make it possible
  to gather the hit-count per sub-clause and per document while a
  search is running. (Simon Willnauer, Mike McCandless)

* LUCENE-2636: Added MultiCollector which allows running the search with several
  Collectors. (Shai Erera)

* LUCENE-2754, LUCENE-2757: Added a wrapper around MultiTermQueries
  to add span support: SpanMultiTermQueryWrapper<Q extends MultiTermQuery>.
  Using this wrapper it's easy to add fuzzy/wildcard to e.g. a SpanNearQuery.
  (Robert Muir, Uwe Schindler)

* LUCENE-2838: ConstantScoreQuery now directly supports wrapping a Query
  instance for stripping off scores. The use of a QueryWrapperFilter
  is no longer needed and discouraged for that use case. Directly wrapping
  Query improves performance, as out-of-order collection is now supported.
  (Uwe Schindler)

* LUCENE-2864: Add getMaxTermFrequency (maximum within-document TF) to
  FieldInvertState so that it can be used in Similarity.computeNorm.
  (Robert Muir)

* LUCENE-2720: Segments now record the code version which created them.
  (Shai Erera, Mike McCandless, Uwe Schindler)

* LUCENE-2474: Added expert ReaderFinishedListener API to
  IndexReader, to allow apps that maintain external per-segment caches
  to evict entries when a segment is finished. (Shay Banon, Yonik
  Seeley, Mike McCandless)

* LUCENE-2911: The new StandardTokenizer, UAX29URLEmailTokenizer, and
  the ICUTokenizer in contrib now all tag tokens with a consistent set
  of token types (defined in StandardTokenizer). Tokens in the major
  CJK types are explicitly marked to allow for custom downstream handling:
  <IDEOGRAPHIC>, <HANGUL>, <KATAKANA>, and <HIRAGANA>.
  (Robert Muir, Steven Rowe)

* LUCENE-2913: Add missing getters to Numeric* classes. (Uwe Schindler)

* LUCENE-1810: Added FieldSelectorResult.LATENT to avoid caching lazily loaded
  fields (Tim Smith, Grant Ingersoll)

* LUCENE-2692: Added several new SpanQuery classes for positional checking
  (match is in a range, payload is a specific value) (Grant Ingersoll)

Optimizations

* LUCENE-2494: Use CompletionService in ParallelMultiSearcher instead of
  simple polling for results. (Edward Drapkin, Simon Willnauer)

* LUCENE-2075: Terms dict cache is now shared across threads instead
  of being stored separately in thread local storage. Also fixed
  terms dict so that the cache is used when seeking the thread local
  term enum, which will be important for MultiTermQuery impls that do
  lots of seeking (Mike McCandless, Uwe Schindler, Robert Muir, Yonik
  Seeley)

* LUCENE-2136: If the multi reader (DirectoryReader or MultiReader)
  only has a single sub-reader, delegate all enum requests to it.
  This avoids the overhead of using a PQ unnecessarily.
  (Mike McCandless)

* LUCENE-2137: Switch to AtomicInteger for some ref counting (Earwin
  Burrfoot via Mike McCandless)

* LUCENE-2123, LUCENE-2261: Move the FuzzyQuery rewrite into a separate
  RewriteMode in MultiTermQuery. The number of fuzzy expansions can be
  specified with the maxExpansions parameter to FuzzyQuery.
  (Uwe Schindler, Robert Muir, Mike McCandless)

* LUCENE-2164: ConcurrentMergeScheduler has more control over merge
  threads. First, it gives smaller merges higher thread priority than
  larger ones. Second, a new set/getMaxMergeCount setting will pause
  the larger merges to allow smaller ones to finish. The defaults for
  these settings are now dynamic, depending on the number of CPU cores as
  reported by Runtime.getRuntime().availableProcessors() (Mike
  McCandless)

* LUCENE-2169: Improved CharArraySet.copy() when the source set is
  also a CharArraySet. (Simon Willnauer via Uwe Schindler)

* LUCENE-2084: Change IndexableBinaryStringTools to work on byte[] and char[]
  directly, instead of Byte/CharBuffers, and modify CollationKeyFilter to
  take advantage of this for faster performance.
  (Steven Rowe, Uwe Schindler, Robert Muir)

* LUCENE-2188: Add a utility class for tracking deprecated overridden
  methods in non-final subclasses.
  (Uwe Schindler, Robert Muir)

* LUCENE-2195: Speed up CharArraySet if set is empty.
  (Simon Willnauer via Robert Muir)

* LUCENE-2285: Code cleanup. (Shai Erera via Uwe Schindler)

* LUCENE-2303: Remove code duplication in Token class by subclassing
  TermAttributeImpl, move DEFAULT_TYPE constant to TypeInterface, improve
  null-handling for TypeAttribute. (Uwe Schindler)

* LUCENE-2329: Switch TermsHash* from using a PostingList object per unique
  term to parallel arrays, indexed by termID.
  This reduces garbage collection
  overhead significantly, which results in large indexing performance gains
  when the available JVM heap space is low. This will become even more
  important when the DocumentsWriter RAM buffer is searchable in the future,
  because then it will make sense to make the RAM buffers as large as
  possible. (Mike McCandless, Michael Busch)

* LUCENE-2380: The terms field cache methods (getTerms,
  getTermsIndex), which replace the older String equivalents
  (getStrings, getStringIndex), consume quite a bit less RAM in most
  cases. (Mike McCandless)

* LUCENE-2410: ~20% speedup on exact (slop=0) PhraseQuery matching.
  (Mike McCandless)

* LUCENE-2531: Fix issue when sorting by a String field that was
  causing too many fallbacks to compare-by-value (instead of by-ord).
  (Mike McCandless)

* LUCENE-2574: IndexInput exposes copyBytes(IndexOutput, long) to allow for
  efficient copying by sub-classes. Optimized copy is implemented for RAM and FS
  streams. (Shai Erera)

* LUCENE-2719: Improved TermsHashPerField's sorting to use a better
  quick sort algorithm that does not dereference the pivot element on
  every compare call. Also replaced lots of sorting code in Lucene
  with the improved SorterTemplate class.
  (Uwe Schindler, Robert Muir, Mike McCandless)

* LUCENE-2760: Optimize SpanFirstQuery and SpanPositionRangeQuery.
  (Robert Muir)

* LUCENE-2770: Make SegmentMerger always work on atomic subreaders,
  even when IndexWriter.addIndexes(IndexReader...) is used with
  DirectoryReaders or other MultiReaders. This saves lots of memory
  during merge of norms. (Uwe Schindler, Mike McCandless)

* LUCENE-2824: Optimize BufferedIndexInput to do fewer bounds checks.
  (Robert Muir)

* LUCENE-2010: Segments with 100% deleted documents are now removed on
  IndexReader or IndexWriter commit. (Uwe Schindler, Mike McCandless)

* LUCENE-1472: Removed synchronization from static DateTools methods
  by using a ThreadLocal. Also converted DateTools.Resolution to a
  Java 5 enum (this should not break backwards compatibility). (Uwe Schindler)

Build

* LUCENE-2124: Moved the JDK-based collation support from contrib/collation
  into core, and moved the ICU-based collation support into contrib/icu.
  (Robert Muir)

* LUCENE-2326: Removed SVN checkouts for backwards tests. The backwards
  branch is now included in the svn repository using "svn copy"
  after release. (Uwe Schindler)

* LUCENE-2074: Regenerating StandardTokenizerImpl files now needs
  JFlex 1.5 (currently only available on SVN). (Uwe Schindler)

* LUCENE-1709: Tests are now parallelized by default (except for benchmark). You
  can force them to run sequentially by passing -Drunsequential=1 on the command
  line. The number of threads that are spawned per CPU defaults to '1'. If you
  wish to change that, you can run the tests with -DthreadsPerProcessor=[num].
  (Robert Muir, Shai Erera, Peter Kofler)

* LUCENE-2516: Backwards tests are now compiled against the released lucene-core.jar
  from the tarball of the previous version. Backwards tests are now packaged together
  with the src distribution. (Uwe Schindler)

* LUCENE-2611: Added Ant target to install IntelliJ IDEA configuration:
  "ant idea". See http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ
  (Steven Rowe)

* LUCENE-2657: Switch from using Maven POM templates to full POMs when
  generating Maven artifacts (Steven Rowe)

* LUCENE-2609: Added jar-test-framework Ant target which packages Lucene's
  test framework classes.
  (Drew Farris, Grant Ingersoll, Shai Erera,
  Steven Rowe)

Test Cases

* LUCENE-2037: Allow JUnit4 tests in our environment (Erick Erickson
  via Mike McCandless)

* LUCENE-1844: Speed up the unit tests (Mark Miller, Erick Erickson,
  Mike McCandless)

* LUCENE-2065: Use Java 5 generics throughout our unit tests. (Kay
  Kay via Mike McCandless)

* LUCENE-2155: Fix time- and zone-dependent localization test failures
  in queryparser tests. (Uwe Schindler, Chris Male, Robert Muir)

* LUCENE-2170: Fix thread starvation problems. (Uwe Schindler)

* LUCENE-2248, LUCENE-2251, LUCENE-2285: Refactor tests to not use
  Version.LUCENE_CURRENT, but instead use a global static value
  from LuceneTestCase(J4), that contains the release version.
  (Uwe Schindler, Simon Willnauer, Shai Erera)

* LUCENE-2313, LUCENE-2322: Add VERBOSE to LuceneTestCase(J4) to control
  verbosity of tests. If VERBOSE==false (default) tests should not print
  anything other than errors to System.(out|err). The setting can be
  changed with -Dtests.verbose=true on test invocation.
  (Shai Erera, Paul Elschot, Uwe Schindler)

* LUCENE-2318: Remove inconsistent system property code for retrieving
  temp and data directories inside test cases. It is now centralized in
  LuceneTestCase(J4). Also changed lots of tests to use
  getClass().getResourceAsStream() to retrieve test data. Tests needing
  access to "real" files from the test folder itself can use
  LuceneTestCase(J4).getDataFile(). (Uwe Schindler)

* LUCENE-2398, LUCENE-2611: Improve tests to work better from IDEs such
  as Eclipse and IntelliJ.
  (Paolo Castagna, Steven Rowe via Robert Muir)

* LUCENE-2804: Add newFSDirectory to LuceneTestCase to create an FSDirectory at
  random.
  (Shai Erera, Robert Muir)

Documentation

* LUCENE-2579: Fix oal.search's package.html description of abstract
  methods. (Santiago M. Mola via Mike McCandless)

* LUCENE-2625: Add a note to IndexReader.termDocs() explaining that the
  TermEnum must be seeked since it is unpositioned.
  (Adriano Crestani via Robert Muir)

* LUCENE-2894: Use google-code-prettify for syntax highlighting in javadoc.
  (Shinichiro Abe, Koji Sekiguchi)

================== Release 2.9.4 / 3.0.3 ====================

Changes in runtime behavior

* LUCENE-2689: NativeFSLockFactory no longer attempts to acquire a
  test lock just before the real lock is acquired. (Surinder Pal
  Singh Bindra via Mike McCandless)

* LUCENE-2762: Fixed bug in IndexWriter causing it to hold open file
  handles against deleted files when compound-file was enabled (the
  default) and readers are pooled. As a result of this the peak
  worst-case free disk space required during optimize is now 3X the
  index size, when compound file is enabled (else 2X). (Mike
  McCandless)

* LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default =
  0.1), which means any time a merged segment is greater than 10% of
  the index size, it will be left in non-compound format even if
  compound format is on. This change was made to reduce peak
  transient disk usage during optimize which increased due to
  LUCENE-2762. (Mike McCandless)

Bug fixes

* LUCENE-2142 (correct fix): FieldCacheImpl.getStringIndex no longer
  throws an exception when term count exceeds doc count.
  (Mike McCandless, Uwe Schindler)

* LUCENE-2513: When opening a writable IndexReader on a not-current
  commit, do not overwrite "future" commits.
  (Mike McCandless)

* LUCENE-2536: IndexWriter.rollback was failing to properly roll back
  buffered deletions against segments that were flushed (Mark Harwood
  via Mike McCandless)

* LUCENE-2541: Fixed NumericRangeQuery that returned incorrect results
  with endpoints near Long.MIN_VALUE and Long.MAX_VALUE:
  NumericUtils.splitRange() overflowed, if
  - the range contained a LOWER bound
    that was greater than (Long.MAX_VALUE - (1L << precisionStep))
  - the range contained an UPPER bound
    that was less than (Long.MIN_VALUE + (1L << precisionStep))
  With standard precision steps around 4, this had no effect on
  most queries, only those that met the above conditions.
  Queries with large precision steps failed more easily. Queries with
  precision step >=64 were not affected. Also 32-bit data types int
  and float were not affected.
  (Yonik Seeley, Uwe Schindler)

* LUCENE-2593: Fixed certain rare cases where a full disk could lead
  to a corrupted index (Robert Muir, Mike McCandless)

* LUCENE-2620: Fixed a bug in WildcardQuery where too many asterisks
  would result in unbearably slow performance. (Nick Barkas via Robert Muir)

* LUCENE-2627: Fixed bug in MMapDirectory chunking when a file is an
  exact multiple of the chunk size. (Robert Muir)

* LUCENE-2634: isCurrent on an NRT reader was failing to return false
  if the writer had just committed (Nikolay Zamosenchuk via Mike McCandless)

* LUCENE-2650: Added extra safety to MMapIndexInput clones to prevent accessing
  an unmapped buffer if the input is closed (Mike McCandless, Uwe Schindler, Robert Muir)

* LUCENE-2384: Reset zzBuffer in StandardTokenizerImpl when lexer is reset.
  (Ruben Laguna via Uwe Schindler, sub-issue of LUCENE-2074)

* LUCENE-2658: Exceptions while processing term vectors enabled for multiple
  fields could lead to invalid ArrayIndexOutOfBoundsExceptions.
  (Robert Muir, Mike McCandless)

* LUCENE-2235: Implement missing PerFieldAnalyzerWrapper.getOffsetGap().
  (Javier Godoy via Uwe Schindler)

* LUCENE-2328: Fixed memory leak in how IndexWriter/Reader tracked
  already sync'd files. (Earwin Burrfoot via Mike McCandless)

* LUCENE-2549: Fix TimeLimitingCollector#TimeExceededException to record
  the absolute docid. (Uwe Schindler)

* LUCENE-2533: Fix FileSwitchDirectory.listAll to not return duplicates when
  primary & secondary dirs share the same underlying directory.
  (Michael McCandless)

* LUCENE-2365: IndexWriter.newestSegment (normally used for testing)
  is fixed to return null if there are no segments. (Karthick
  Sankarachary via Mike McCandless)

* LUCENE-2730: Fix two rare deadlock cases in IndexWriter (Mike McCandless)

* LUCENE-2744: CheckIndex was reporting the total number of fields,
  not the number that have norms enabled, on the "test: field
  norms..." output. (Mark Kristensson via Mike McCandless)

* LUCENE-2759: Fixed two near-real-time cases where doc store files
  may be opened for read even though they are still open for write.
  (Mike McCandless)

* LUCENE-2618: Fix rare thread safety issue whereby
  IndexWriter.optimize could sometimes return even though the index
  wasn't fully optimized (Mike McCandless)

* LUCENE-2767: Fix thread safety issue in addIndexes(IndexReader[])
  that could potentially result in index corruption.
  (Mike McCandless)

* LUCENE-2762: Fixed bug in IndexWriter causing it to hold open file
  handles against deleted files when compound-file was enabled (the
  default) and readers are pooled. As a result of this fix, the peak
  worst-case free disk space required during optimize is now 3X the
  index size when compound file is enabled (else 2X). (Mike
  McCandless)

* LUCENE-2216: OpenBitSet.hashCode returned different hash codes for
  sets that only differed by trailing zeros. (Dawid Weiss, yonik)

* LUCENE-2782: Fix rare potential thread hazard with
  IndexWriter.commit (Mike McCandless)

API Changes

* LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default =
  0.1), which means any time a merged segment is greater than 10% of
  the index size, it will be left in non-compound format even if
  compound format is on. This change was made to reduce peak
  transient disk usage during optimize, which increased due to
  LUCENE-2762. (Mike McCandless)

Optimizations

* LUCENE-2556: Improve memory usage after cloning TermAttribute.
  (Adriano Crestani via Uwe Schindler)

* LUCENE-2098: Improve the performance of BaseCharFilter, especially for
  large documents. (Robin Wojciki, Koji Sekiguchi, Robert Muir)

New features

* LUCENE-2675 (2.9.4 only): Add support for Lucene 3.0 stored field files
  to 2.9 as well. The file format did not change; only the version number was
  upgraded to mark segments that have no compression. FieldsWriter still only
  writes 2.9 segments, as they could contain compressed fields. This cross-version
  index format compatibility is provided here solely because Lucene 2.9 and 3.0
  have the same bugfix level, features, and the same index format with this slight
  compression difference.
  In general, Lucene does not support reading newer indexes with older
  library versions. (Uwe Schindler)

Documentation

* LUCENE-2239: Documented limitations in NIOFSDirectory and MMapDirectory due to
  Java NIO behavior when a Thread is interrupted while blocking on IO.
  (Simon Willnauer, Robert Muir)

================== Release 2.9.3 / 3.0.2 ====================

Changes in backwards compatibility policy

* LUCENE-2135: Added FieldCache.purge(IndexReader) method to the
  interface. Anyone implementing FieldCache externally will need to
  update their code to implement this method when upgrading. (Mike McCandless)

Changes in runtime behavior

* LUCENE-2421: NativeFSLockFactory does not throw LockReleaseFailedException if
  it cannot delete the lock file, since obtaining the lock does not fail if the
  file is there. (Shai Erera)

* LUCENE-2060 (2.9.3 only): Changed ConcurrentMergeScheduler's default for
  maxNumThreads from 3 to 1, because in practice we get the most gains
  from running a single merge in the background. More than one
  concurrent merge causes a lot of thrashing (though it's possible on
  SSD storage that there would be net gains). (Jason Rutherglen, Mike
  McCandless)

Bug fixes

* LUCENE-2046 (2.9.3 only): IndexReader should not see the index as changed
  after IndexWriter.prepareCommit has been called but before
  IndexWriter.commit is called. (Peter Keegan via Mike McCandless)

* LUCENE-2119: Don't throw NegativeArraySizeException if you pass
  Integer.MAX_VALUE as nDocs to IndexSearcher search methods. (Paul
  Taylor via Mike McCandless)

* LUCENE-2142: FieldCacheImpl.getStringIndex no longer throws an
  exception when term count exceeds doc count.
  (Mike McCandless)

* LUCENE-2104: NativeFSLock.release() would silently fail if the lock is held by
  another thread/process. (Shai Erera via Uwe Schindler)

* LUCENE-2283: Use shared memory pool for term vector and stored
  fields buffers. This memory will be reclaimed if needed according to
  the configured RAM Buffer Size for the IndexWriter. This also fixes
  potentially excessive memory usage when many threads are indexing a
  mix of small and large documents. (Tim Smith via Mike McCandless)

* LUCENE-2300: If IndexWriter is pooling readers (because an NRT reader
  has been obtained), and addIndexes* is run, do not pool the
  readers from the external directory. This is harmless (the NRT reader is
  correct), but a waste of resources. (Mike McCandless)

* LUCENE-2422: Don't reuse byte[] in IndexInput/Output -- it gains
  little performance, and ties up possibly large amounts of memory
  for apps that index large docs. (Ross Woolf via Mike McCandless)

* LUCENE-2387: Don't hang onto Fieldables from the last doc indexed
  in IndexWriter, nor the Reader in Tokenizer after close is
  called. (Ruben Laguna, Uwe Schindler, Mike McCandless)

* LUCENE-2417: IndexCommit did not implement hashCode() and equals()
  consistently. Now they both take Directory and version into consideration. In
  addition, all of IndexCommit's methods which threw
  UnsupportedOperationException are now abstract. (Shai Erera)

* LUCENE-2467: Fixed memory leaks in IndexWriter when large documents
  are indexed. (Mike McCandless)

* LUCENE-2473: Clicking on the "More Results" link in the luceneweb.war
  demo resulted in ArrayIndexOutOfBoundsException.
  (Sami Siren via Robert Muir)

* LUCENE-2476: If any exception is hit while initializing IndexWriter, release
  the write lock (previously we only released on IOException).
  (Tamas Cservenak via Mike McCandless)

* LUCENE-2478: Fix CachingWrapperFilter to not throw NPE when
  Filter.getDocIdSet() returns null. (Uwe Schindler, Daniel Noll)

* LUCENE-2468: Allow specifying how new deletions should be handled in
  CachingWrapperFilter and CachingSpanFilter. By default, new
  deletions are ignored in CachingWrapperFilter, since typically this
  filter is AND'd with a query that correctly takes new deletions into
  account. This should be a performance gain (higher cache hit rate)
  in apps that reopen readers or use near-real-time readers
  (IndexWriter.getReader()), but may introduce invalid search results
  (allowing deleted docs to be returned) for certain cases, so a new
  expert ctor was added to CachingWrapperFilter to enforce deletions
  at a performance cost. CachingSpanFilter by default recaches if
  there are new deletions. (Shay Banon via Mike McCandless)

* LUCENE-2299: If you open an NRT reader while addIndexes* is running,
  it may miss some segments (Earwin Burrfoot via Mike McCandless)

* LUCENE-2397: Don't throw NPE from SnapshotDeletionPolicy.snapshot if
  there are no commits yet (Shai Erera)

* LUCENE-2424: Fix FieldDoc.toString to actually return its fields
  (Stephen Green via Mike McCandless)

* LUCENE-2311: Always pass a "fully loaded" (terms index & doc stores)
  SegmentReader to IndexWriter's mergedSegmentWarmer (if set), so
  that warming is free to do whatever it needs to. (Earwin Burrfoot
  via Mike McCandless)

* LUCENE-3029: Fix corner case when MultiPhraseQuery is used with zero
  position-increment tokens that would sometimes assign different
  scores to identical docs. (Mike McCandless)

* LUCENE-2486: Fixed intermittent FileNotFoundException on doc store
  files when a mergedSegmentWarmer is set on IndexWriter.
  (Mike McCandless)

* LUCENE-2130: Fix performance issue when FuzzyQuery runs on a
  multi-segment index (Michael McCandless)

API Changes

* LUCENE-2281: Added doBeforeFlush to IndexWriter to allow extensions to perform
  operations before flush starts. Also exposed doAfterFlush as protected instead
  of package-private. (Shai Erera via Mike McCandless)

* LUCENE-2356: Add IndexWriter.set/getReaderTermsIndexDivisor, to set
  what IndexWriter passes for termsIndexDivisor to the readers it
  opens internally when applying deletions or creating a
  near-real-time reader. (Earwin Burrfoot via Mike McCandless)

Optimizations

* LUCENE-2494 (3.0.2 only): Use CompletionService in ParallelMultiSearcher
  instead of simple polling for results. (Edward Drapkin, Simon Willnauer)

* LUCENE-2135: On IndexReader.close, forcefully evict any entries from
  the FieldCache rather than waiting for the WeakHashMap to release
  the reference (Mike McCandless)

* LUCENE-2161: Improve concurrency of IndexReader, especially in the
  context of near real-time readers. (Mike McCandless)

* LUCENE-2360: Small speedup to recycling of reused per-doc RAM in
  IndexWriter (Robert Muir, Mike McCandless)

Build

* LUCENE-2488 (2.9.3 only): Support build with JDK 1.4 and exclude Java 1.5
  contrib modules on request (pass '-Dforce.jdk14.build=true') when
  compiling/testing/packaging. This also marks the benchmark contrib
  as Java 1.5, as it depends on fast-vector-highlighter. (Uwe Schindler)

================== Release 2.9.2 / 3.0.1 ====================

Changes in backwards compatibility policy

* LUCENE-2123 (3.0.1 only): Removed the protected inner class ScoreTerm
  from FuzzyQuery. The change was needed because the comparator of this
  class had to be changed in an incompatible way.
  The class was never intended to be public. (Uwe Schindler, Mike McCandless)

Bug fixes

 * LUCENE-2092: BooleanQuery was ignoring disableCoord in its hashCode
   and equals methods, causing bad things to happen when caching
   BooleanQueries. (Chris Hostetter, Mike McCandless)

 * LUCENE-2095: Fix: when two threads called IndexWriter.commit() at
   the same time, commit could return control to one of the threads
   before all changes were actually committed.
   (Sanne Grinovero via Mike McCandless)

 * LUCENE-2132 (3.0.1 only): Fix the demo result.jsp to use QueryParser
   with a Version argument. (Brian Li via Robert Muir)

 * LUCENE-2166: Don't incorrectly keep warning about the same immense
   term when IndexWriter.infoStream is on. (Mike McCandless)

 * LUCENE-2158: At high indexing rates, NRT reader could temporarily
   lose deletions. (Mike McCandless)

 * LUCENE-2182: DEFAULT_ATTRIBUTE_FACTORY was failing to load
   implementation class when interface was loaded by a different
   class loader. (Uwe Schindler, reported on java-user by Ahmed El-dawy)

 * LUCENE-2257: Increase max number of unique terms in one segment to
   termIndexInterval (default 128) * ~2.1 billion = ~274 billion.
   (Tom Burton-West via Mike McCandless)

 * LUCENE-2260: Fixed AttributeSource to not hold a strong
   reference to the Attribute/AttributeImpl classes, which prevented
   unloading of custom attributes loaded by other classloaders
   (e.g. in Solr plugins). (Uwe Schindler)

 * LUCENE-1941: Fix Min/MaxPayloadFunction returning 0 when
   only one payload is present. (Erik Hatcher, Mike McCandless
   via Uwe Schindler)

 * LUCENE-2270: Queries consisting of all zero-boost clauses
   (for example, text:foo^0) sorted incorrectly and produced
   invalid docids.
   (yonik)

API Changes

 * LUCENE-1609 (3.0.1 only): Restore IndexReader.getTermInfosIndexDivisor
   (it was accidentally removed in 3.0.0) (Mike McCandless)

 * LUCENE-1972 (3.0.1 only): Restore SortField.getComparatorSource
   (it was accidentally removed in 3.0.0) (John Wang via Uwe Schindler)

 * LUCENE-2190: Added a new class CustomScoreProvider to the function package
   that can be subclassed to provide custom scoring to CustomScoreQuery.
   The methods in CustomScoreQuery that did this before were deprecated
   and replaced by a method getCustomScoreProvider(IndexReader) that
   returns a custom score implementation using the above class. The change
   is necessary with per-segment searching, as CustomScoreQuery is
   a stateless class (like all other Queries) and does not know about
   the currently searched segment. This API works similarly to Filter's
   getDocIdSet(IndexReader). (Paul chez Jamespot via Mike McCandless,
   Uwe Schindler)

 * LUCENE-2080: Deprecate Version.LUCENE_CURRENT, as using this constant
   will cause backwards compatibility problems when upgrading Lucene. See
   the Version javadocs for additional information.
   (Robert Muir)

Optimizations

 * LUCENE-2086: When resolving deleted terms, do so in term sort order
   for better performance (Bogdan Ghidireac via Mike McCandless)

 * LUCENE-2123 (partly, 3.0.1 only): Fixes a slowdown / memory issue
   introduced by LUCENE-504. (Uwe Schindler, Robert Muir, Mike McCandless)

 * LUCENE-2258: Remove unneeded synchronization in FuzzyTermEnum.
   (Uwe Schindler, Robert Muir)

Test Cases

 * LUCENE-2114: Change TestFilteredSearch to test on multi-segment
   index as well.
   (Simon Willnauer via Mike McCandless)

 * LUCENE-2211: Improve BaseTokenStreamTestCase to use a fake attribute
   that checks if clearAttributes() was called correctly.
   (Uwe Schindler, Robert Muir)

 * LUCENE-2207, LUCENE-2219: Improve BaseTokenStreamTestCase to check if
   end() is implemented correctly. (Koji Sekiguchi, Robert Muir)

Documentation

 * LUCENE-2114: Improve javadocs of Filter to call out that the
   provided reader is per-segment (Simon Willnauer via Mike
   McCandless)

======================= Release 3.0.0 =======================

Changes in backwards compatibility policy

* LUCENE-1979: Change return type of SnapshotDeletionPolicy#snapshot()
  from IndexCommitPoint to IndexCommit. Code that uses this method
  needs to be recompiled against Lucene 3.0 in order to work. The
  previously deprecated IndexCommitPoint is also removed.
  (Michael Busch)

* o.a.l.Lock.isLocked() is now allowed to throw an IOException.
  (Mike McCandless)

* LUCENE-2030: CachingWrapperFilter and CachingSpanFilter now hide
  the internal cache implementation for thread safety; previously it was
  declared protected. (Peter Lenahan, Uwe Schindler, Simon Willnauer)

* LUCENE-2053: If you call Thread.interrupt() on a thread inside
  Lucene, Lucene will do its best to interrupt the thread. However,
  instead of throwing InterruptedException (which is a checked
  exception), you'll get an oal.util.ThreadInterruptedException (an
  unchecked exception, subclassing RuntimeException). The interrupt
  status on the thread is cleared when this exception is thrown.
  (Mike McCandless)

* LUCENE-2052: Some methods in Lucene core were changed to accept
  Java 5 varargs. This is not a backwards compatibility problem as
  long as you do not try to override such a method.
  We left commonly overridden methods unchanged and added varargs to
  constructors, static, or final methods (MultiSearcher, ...). (Uwe Schindler)

* LUCENE-1558: IndexReader.open(Directory) now opens a readOnly=true
  reader, and new IndexSearcher(Directory) does the same. Note that
  this is a change in the default from 2.9, when these methods were
  previously deprecated. (Mike McCandless)

* LUCENE-1753: Make not-yet-final TokenStreams final to enforce the
  decorator pattern. (Uwe Schindler)

Changes in runtime behavior

* LUCENE-1677: Remove the system property to set SegmentReader class
  implementation. (Uwe Schindler)

* LUCENE-1960: As a consequence of the removal of Field.Store.COMPRESS,
  support for this type of field was removed. Lucene 3.0 is still able
  to read indexes with compressed fields, but as soon as merges occur
  or the index is optimized, all compressed fields are decompressed
  and converted to Field.Store.YES. Because of this, indexes with
  compressed fields can suddenly get larger. Also, the first merge with
  decompression cannot be done in raw mode and is therefore slower.
  This change has no effect on code that uses such old indexes;
  they behave as before (fields are automatically decompressed
  during read). Indexes converted to Lucene 3.0 format can no longer
  be read by previous versions.
  It is recommended to optimize your indexes after upgrading to convert
  to the new format and decompress all fields.
  If you want compressed fields, you can use CompressionTools, which
  creates a compressed byte[] to be added as a binary stored field. This
  cannot be done automatically, as you also have to decompress such
  fields when reading. You have to reindex to do that.
  (Michael Busch, Uwe Schindler)

* LUCENE-2060: Changed ConcurrentMergeScheduler's default for
  maxNumThreads from 3 to 1, because in practice we get the most
  gains from running a single merge in the background. More than one
  concurrent merge causes a lot of thrashing (though it's possible on
  SSD storage that there would be net gains). (Jason Rutherglen,
  Mike McCandless)

API Changes

* LUCENE-1257, LUCENE-1984, LUCENE-1985, LUCENE-2057, LUCENE-1833, LUCENE-2012,
  LUCENE-1998: Port to Java 1.5:

  - Add generics to public and internal APIs (see below).
  - Replace new Integer(int), new Double(double),... by static valueOf() calls.
  - Replace for-loops with Iterator by foreach loops.
  - Replace StringBuffer with StringBuilder.
  - Replace o.a.l.util.Parameter by Java 5 enums (see below).
  - Add @Override annotations.
  (Uwe Schindler, Robert Muir, Karl Wettin, Paul Elschot, Kay Kay, Shai Erera,
  DM Smith)

* Generify Lucene API:

  - TokenStream/AttributeSource: Now addAttribute()/getAttribute() return an
    instance of the requested attribute interface, so no cast is needed anymore
    (LUCENE-1855).
  - NumericRangeQuery, NumericRangeFilter, and FieldCacheRangeFilter
    now have Integer, Long, Float, Double as type param (LUCENE-1857).
  - Document.getFields() returns List<Fieldable>.
  - Query.extractTerms(Set<Term>)
  - CharArraySet and stop word sets in core/contrib
  - PriorityQueue (LUCENE-1935)
  - TopDocsCollector
  - DisjunctionMaxQuery (LUCENE-1984)
  - MultiTermQueryWrapperFilter
  - CloseableThreadLocal
  - MapOfSets
  - o.a.l.util.cache package
  - lots of internal APIs of IndexWriter
  (Uwe Schindler, Michael Busch, Kay Kay, Robert Muir, Adriano Crestani)

* LUCENE-1944, LUCENE-1856, LUCENE-1957, LUCENE-1960, LUCENE-1961,
  LUCENE-1968, LUCENE-1970, LUCENE-1946, LUCENE-1971, LUCENE-1975,
  LUCENE-1972, LUCENE-1978, LUCENE-944, LUCENE-1979, LUCENE-1973, LUCENE-2011:
  Remove deprecated methods/constructors/classes:

  - Remove all String/File directory paths in IndexReader /
    IndexSearcher / IndexWriter.
  - Remove FSDirectory.getDirectory()
  - Make FSDirectory abstract.
  - Remove Field.Store.COMPRESS (see above).
  - Remove Filter.bits(IndexReader) method and make
    Filter.getDocIdSet(IndexReader) abstract.
  - Remove old DocIdSetIterator methods and make the new ones abstract.
  - Remove some methods in PriorityQueue.
  - Remove old TokenStream API and backwards compatibility layer.
  - Remove RangeQuery, RangeFilter and ConstantScoreRangeQuery.
  - Remove SpanQuery.getTerms().
  - Remove ExtendedFieldCache, custom and auto caches, SortField.AUTO.
  - Remove old-style custom sort.
  - Remove legacy search setting in SortField.
  - Remove Hits and all references from core and contrib.
  - Remove HitCollector and its TopDocs support implementations.
  - Remove term field and accessors in MultiTermQuery
    (and fix Highlighter).
  - Remove deprecated methods in BooleanQuery.
  - Remove deprecated methods in Similarity.
  - Remove BoostingTermQuery.
  - Remove MultiValueSource.
  - Remove Scorer.explain(int).
  ...and some other minor ones (Uwe Schindler, Michael Busch, Mark Miller)

* LUCENE-1925: Make IndexSearcher's subReaders and docStarts members
  protected; add expert ctor to directly specify reader, subReaders
  and docStarts. (John Wang, Tim Smith via Mike McCandless)

* LUCENE-1945: All public classes that have a close() method now
  also implement java.io.Closeable (IndexReader, IndexWriter, Directory,...).
  (Uwe Schindler)

* LUCENE-1998: Change all Parameter instances to Java 5 enums. This
  is not a backwards break, only a change of the super class. Parameter
  was deprecated and will be removed in a later version.
  (DM Smith, Uwe Schindler)

Bug fixes

* LUCENE-1951: When the text provided to WildcardQuery has no wildcard
  characters (i.e. it matches a single term), don't lose the boost and
  rewrite method settings. Also, rewrite to PrefixQuery if the
  wildcard is of the form "foo*", for slightly faster performance. (Robert
  Muir via Mike McCandless)

* LUCENE-2013: SpanRegexQuery does not work with QueryScorer.
  (Benjamin Keil via Mark Miller)

* LUCENE-2088: addAttribute() should only accept interfaces that
  extend Attribute. (Shai Erera, Uwe Schindler)

* LUCENE-2045: Fix silly FileNotFoundException hit if you enable
  infoStream on IndexWriter and then add an empty document and commit
  (Shai Erera via Mike McCandless)

* LUCENE-2046: IndexReader should not see the index as changed
  after IndexWriter.prepareCommit has been called but before
  IndexWriter.commit is called. (Peter Keegan via Mike McCandless)

New features

* LUCENE-1933: Provide a convenience AttributeFactory that creates a
  Token instance for all basic attributes. (Uwe Schindler)

* LUCENE-2041: Parallelize the rest of ParallelMultiSearcher.
  Lots of code refactoring and Java 5 concurrent support in MultiSearcher.
  (Joey Surls, Simon Willnauer via Uwe Schindler)

* LUCENE-2051: Add CharArraySet.copy() as a simple method to copy
  any Set<?> to a CharArraySet; the copy is optimized if the Set<?> is
  already a CharArraySet. (Simon Willnauer)

Optimizations

* LUCENE-1183: Optimize Levenshtein Distance computation in
  FuzzyQuery. (Cédrik Lime via Mike McCandless)

* LUCENE-2006: Optimize FieldDocSortedHitQueue to always
  use the Comparable<?> interface. (Uwe Schindler, Mark Miller)

* LUCENE-2087: Remove recursion in NumericRangeTermEnum.
  (Uwe Schindler)

Build

* LUCENE-486: Remove test->demo dependencies. (Michael Busch)

* LUCENE-2024: Raise build requirements to Java 1.5 and ANT 1.7.0
  (Uwe Schindler, Mike McCandless)

======================= Release 2.9.1 =======================

Changes in backwards compatibility policy

 * LUCENE-2002: Add required Version matchVersion argument when
   constructing QueryParser or MultiFieldQueryParser and, as of 2.9,
   default enablePositionIncrements to true to match
   StandardAnalyzer's 2.9 default (Uwe Schindler, Mike McCandless)

Bug fixes

 * LUCENE-1974: Fixed nasty bug in BooleanQuery (when it used
   BooleanScorer for scoring), whereby some matching documents failed to
   be collected. (Fulin Tang via Mike McCandless)

 * LUCENE-1124: Make sure FuzzyQuery always matches the precise term.
   (stefatwork@gmail.com via Mike McCandless)

 * LUCENE-1976: Fix IndexReader.isCurrent() to return the right thing
   when the reader is a near real-time reader.
   (Jake Mannix via Mike McCandless)

 * LUCENE-1986: Fix NPE when scoring PayloadNearQuery (Peter Keegan,
   Mark Miller via Mike McCandless)

 * LUCENE-1992: Fix thread hazard if a merge is committing just as an
   exception occurs during sync (Uwe Schindler, Mike McCandless)

 * LUCENE-1995: Note in javadocs that IndexWriter.setRAMBufferSizeMB
   cannot exceed 2048 MB, and throw IllegalArgumentException if it
   does. (Aaron McKee, Yonik Seeley, Mike McCandless)

 * LUCENE-2004: Fix Constants.LUCENE_MAIN_VERSION to not be inlined
   by client code. (Uwe Schindler)

 * LUCENE-2016: Replace illegal U+FFFF character with the replacement
   char (U+FFFD) during indexing, to prevent silent index corruption.
   (Peter Keegan, Mike McCandless)

API Changes

 * Un-deprecate search(Weight weight, Filter filter, int n) in the
   Searchable interface (deprecated by accident). (Uwe Schindler)

 * Un-deprecate o.a.l.util.Version constants. (Mike McCandless)

 * LUCENE-1987: Un-deprecate some ctors of Token, as they will not
   be removed in 3.0 and are still useful. Also add some missing
   o.a.l.util.Version constants for enabling invalid acronym
   settings in StandardAnalyzer to be compatible with the upcoming
   Lucene 3.0. (Uwe Schindler)

 * LUCENE-1973: Un-deprecate IndexSearcher.setDefaultFieldSortScoring,
   to allow controlling per-IndexSearcher whether scores are computed
   when sorting by field. (Uwe Schindler, Mike McCandless)

 * LUCENE-2043: Make IndexReader.commit(Map<String,String>) public.
   (Mike McCandless)

Documentation

 * LUCENE-1955: Fix Hits deprecation notice to point users in the right
   direction. (Mike McCandless, Mark Miller)

 * Fix javadoc about score tracking done by search methods in Searcher
   and IndexSearcher.
   (Mike McCandless)

 * LUCENE-2008: Javadoc improvements for TokenStream/Tokenizer/Token
   (Luke Nezda via Mike McCandless)

======================= Release 2.9.0 =======================

Changes in backwards compatibility policy

 * LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no
   longer computes a document score for each hit by default. If
   document score tracking is still needed, you can call
   IndexSearcher.setDefaultFieldSortScoring(true, true) to enable
   both per-hit and maxScore tracking; however, this is deprecated
   and will be removed in 3.0.

   Alternatively, use Searchable.search(Weight, Filter, Collector)
   and pass in a TopFieldCollector instance, using the following code
   sample:

   <code>
     TopFieldCollector tfc = TopFieldCollector.create(sort, numHits, fillFields,
                                                      true /* trackDocScores */,
                                                      true /* trackMaxScore */,
                                                      false /* docsInOrder */);
     searcher.search(query, tfc);
     TopDocs results = tfc.topDocs();
   </code>

   Note that your Sort object cannot use SortField.AUTO when you
   directly instantiate TopFieldCollector.

   Also, the method search(Weight, Filter, Collector) was added to
   the Searchable interface and the Searcher abstract class to
   replace the deprecated HitCollector versions. If you either
   implement Searchable or extend Searcher, you should change your
   code to implement this method. If you already extend
   IndexSearcher, no further changes are needed to use Collector.

   Finally, the values Float.NaN and Float.NEGATIVE_INFINITY are not
   valid scores. Lucene uses these values internally in certain
   places, so if you have hits with such scores, it will cause
   problems.
   (Shai Erera via Mike McCandless)

 * LUCENE-1687: All methods and parsers from the interface ExtendedFieldCache
   have been moved into FieldCache. ExtendedFieldCache is now deprecated and
   contains only a few declarations for binary backwards compatibility.
   ExtendedFieldCache will be removed in version 3.0. Users of FieldCache and
   ExtendedFieldCache will be able to plug in Lucene 2.9 without recompilation.
   The auto cache (FieldCache.getAuto) is now deprecated. Due to the merge of
   ExtendedFieldCache and FieldCache, FieldCache can now also return
   long[] and double[] arrays in addition to int[], float[] and StringIndex.

   The interface changes are only notable for users implementing the interfaces,
   which is unlikely, because there is no way to change
   Lucene's FieldCache implementation. (Grant Ingersoll, Uwe Schindler)

 * LUCENE-1630, LUCENE-1771: Weight, previously an interface, is now an abstract
   class. Some of the method signatures have changed, but it should be fairly
   easy to see what adjustments must be made to existing code to sync up
   with the new API. You can find more detail in the API Changes section.

   Going forward, Searchable will be kept for convenience only and may
   be changed between minor releases without any deprecation
   process. It is not recommended that you implement it; extend
   Searcher instead.
   (Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)

 * LUCENE-1422, LUCENE-1693: The new Attribute-based TokenStream API (see below)
   has some backwards breaks in rare cases. We did our best to make the
   transition as easy as possible, and you are not likely to run into any problems.
   If your tokenizers still implement next(Token) or next(), the calls are
   automatically wrapped.
   The indexer and query parser use the new API
   (e.g. incrementToken() calls). All core TokenStreams are implemented using
   the new API. You can mix old and new API style TokenFilters/TokenStreams.
   Problems only occur if you have overridden next(Token) or next() in one
   of the non-abstract core TokenStreams/-Filters. These classes should
   normally be final, but some of them are not. In this case,
   next(Token)/next() would never be called.
   To fail early with a hard compile/runtime error, the next(Token)/next()
   methods in these TokenStreams/-Filters were made final in this release.
   (Michael Busch, Uwe Schindler)

 * LUCENE-1763: MergePolicy now requires an IndexWriter instance to
   be passed upon instantiation. As a result, IndexWriter was removed
   as a method argument from all MergePolicy methods. (Shai Erera via
   Mike McCandless)

 * LUCENE-1748: LUCENE-1001 introduced PayloadSpans, but this was a back
   compat break and caused custom SpanQuery implementations to fail at runtime
   in a variety of ways. This issue attempts to remedy things by causing
   a compile time break on custom SpanQuery implementations and removing
   the PayloadSpans class, with its functionality now moved to Spans. To
   help alleviate future back compat pain, Spans has been changed from
   an interface to an abstract class.
   (Hugh Cayless, Mark Miller)

 * LUCENE-1808: Query.createWeight has been changed from protected to
   public. This will be a back compat break if you have overridden this
   method, but you are likely already affected by the LUCENE-1693 (make Weight
   abstract rather than an interface) back compat break if you have overridden
   Query.createWeight, so we have taken the opportunity to make this change.
   (Tim Smith, Shai Erera via Mark Miller)

 * LUCENE-1708: IndexReader.document() no longer checks if the document is
   deleted. You can call IndexReader.isDeleted(n) prior to calling document(n).
   (Shai Erera via Mike McCandless)

Changes in runtime behavior

 * LUCENE-1424: QueryParser now by default uses constant score auto
   rewriting when it generates a WildcardQuery or PrefixQuery (it
   already does so for TermRangeQuery, as well). Call
   setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE)
   to revert to the slower BooleanQuery rewriting method. (Mark Miller via Mike
   McCandless)

 * LUCENE-1575: As of 2.9, the core collectors, as well as
   IndexSearcher's search methods that return top N results, no
   longer filter documents with scores <= 0.0. If you rely on this
   functionality you can use PositiveScoresOnlyCollector like this:

    <code>
      // TopScoreDocCollector is obtained from its static factory;
      // 'true' tells it that docs will be scored in order.
      TopDocsCollector tdc = TopScoreDocCollector.create(10, true);
      Collector c = new PositiveScoresOnlyCollector(tdc);
      searcher.search(query, c);
      TopDocs hits = tdc.topDocs();
      ...
    </code>

 * LUCENE-1604: IndexReader.norms(String field) is now allowed to
   return null if the field has no norms, as long as you've
   previously called IndexReader.setDisableFakeNorms(true). This
   setting now defaults to false (to preserve the backwards-compatible
   fake norms behavior) but in 3.0 will be hardwired to true. (Shon
   Vella via Mike McCandless)

 * LUCENE-1624: If you open IndexWriter with create=true and
   autoCommit=false on an existing index, IndexWriter no longer
   writes an empty commit when it's created.
   (Paul Taylor via Mike McCandless)

 * LUCENE-1593: When you call Sort() or Sort.setSort(String field,
   boolean reverse), the resulting SortField array no longer ends
   with SortField.FIELD_DOC (it was unnecessary as Lucene breaks ties
   internally by docID). (Shai Erera via Michael McCandless)

 * LUCENE-1542: When the first token(s) have 0 position increment,
   IndexWriter used to incorrectly record the position as -1, if no
   payload is present, or Integer.MAX_VALUE if a payload is present.
   This caused positional queries to fail to match. The bug is now
   fixed, but if your app relies on the buggy behavior then you must
   call IndexWriter.setAllowMinus1Position(). That API is deprecated
   so you must fix your application, and rebuild your index, to not
   rely on this behavior by the 3.0 release of Lucene. (Jonathan
   Mamou, Mark Miller via Mike McCandless)

 * LUCENE-1715: Finalizers have been removed from the 4 core classes
   that still had them, since they cause GC to take longer, thus
   tying up memory for longer, and at best they mask buggy app code.
   DirectoryReader (returned from IndexReader.open) & IndexWriter
   previously released the write lock during finalize.
   SimpleFSDirectory.FSIndexInput closed the descriptor in its
   finalizer, and NativeFSLock released the lock. It's possible
   applications will be affected by this, but only if the application
   is failing to close readers/writers. (Brian Groose via Mike
   McCandless)

 * LUCENE-1717: Fixed IndexWriter to account for RAM usage of
   buffered deletions. (Mike McCandless)

 * LUCENE-1727: Ensure that fields are stored & retrieved in the
   exact order in which they were added to the document. This was
   true in all Lucene releases before 2.3, but was broken in 2.3 and
   2.4, and is now fixed in 2.9.
   (Mike McCandless)

 * LUCENE-1678: The addition of Analyzer.reusableTokenStream
   accidentally broke back compatibility of external analyzers that
   subclassed core analyzers that implemented tokenStream but not
   reusableTokenStream. This is now fixed, such that if
   reusableTokenStream is invoked on such a subclass, that method
   will fall back to tokenStream. (Mike McCandless)

 * LUCENE-1801: Token.clear() and Token.clearNoTermBuffer() now also clear
   startOffset, endOffset and type. This is not likely to affect any
   Tokenizer chains, as Tokenizers normally always set these three values.
   This change was made to conform to the new AttributeImpl.clear() and
   AttributeSource.clearAttributes(), so that clearing works identically
   for Token (as one combined AttributeImpl) and for the 6 separate
   AttributeImpls. (Uwe Schindler, Michael Busch)

 * LUCENE-1483: When searching over multiple segments, a new Scorer is now created
   for each segment. Searching has been telescoped out a level and IndexSearcher now
   operates much like MultiSearcher does. The Weight is created only once for the top
   level Searcher, but each Scorer is passed a per-segment IndexReader. This will
   result in doc ids in the Scorer being internal to the per-segment IndexReader. It
   has always been outside of the API to count on a given IndexReader to contain every
   doc id in the index, and if you have been ignoring MultiSearcher in your custom code
   and counting on this fact, you will find your code no longer works correctly. If a
   custom Scorer implementation uses any caches/filters that rely on being based on the
   top level IndexReader, it will need to be updated to correctly use contextless
   caches/filters; e.g., you can't count on the IndexReader to contain any given doc id
   or all of the doc ids.
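   To make the per-segment doc ID change concrete, here is a minimal sketch of
   the arithmetic involved, assuming segments are laid out consecutively in the
   top-level reader. The SegmentDocIds class and its methods are hypothetical
   illustrations, not Lucene API: a per-segment doc ID plus the segment's
   docBase (the sum of maxDoc() over all earlier segments) gives the top-level
   doc ID.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not Lucene API) mapping between top-level doc IDs and
 * per-segment doc IDs. Segment i starts at docBases[i], the sum of maxDoc()
 * over all earlier segments.
 */
final class SegmentDocIds {
    private final int[] docBases; // first top-level doc ID of each segment
    private final int maxDoc;     // total doc count across all segments

    SegmentDocIds(int[] segmentMaxDocs) {
        docBases = new int[segmentMaxDocs.length];
        int base = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            docBases[i] = base;
            base += segmentMaxDocs[i];
        }
        maxDoc = base;
    }

    /** Converts a doc ID local to segment seg into a top-level doc ID. */
    int toTopLevel(int seg, int localDoc) {
        return docBases[seg] + localDoc;
    }

    /** Returns the index of the segment that contains a top-level doc ID. */
    int segmentOf(int topLevelDoc) {
        if (topLevelDoc < 0 || topLevelDoc >= maxDoc) {
            throw new IllegalArgumentException("doc out of range: " + topLevelDoc);
        }
        int idx = Arrays.binarySearch(docBases, topLevelDoc);
        if (idx < 0) {
            // Not an exact base: the owner is the segment before the insertion point.
            return -idx - 2;
        }
        // Empty segments share a base with their successor; skip past them.
        while (idx + 1 < docBases.length && docBases[idx + 1] == topLevelDoc) {
            idx++;
        }
        return idx;
    }
}
```

   For segments with maxDoc() values 3, 0 and 4, local doc 1 of the third
   segment is top-level doc 4, and top-level doc 3 belongs to the third segment.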
   (Mark Miller, Mike McCandless)

 * LUCENE-1846: DateTools now uses the US locale to format the numbers in its
   date/time strings instead of the default locale. For most locales there will
   be no change in the index format, as DateFormatSymbols uses ASCII digits.
   The usage of the US locale is important to guarantee correct ordering of
   generated terms. (Uwe Schindler)

 * LUCENE-1860: MultiTermQuery now defaults to the
   CONSTANT_SCORE_AUTO_REWRITE_DEFAULT rewrite method (previously it
   was SCORING_BOOLEAN_QUERY_REWRITE). This means that PrefixQuery
   and WildcardQuery will now produce a constant score for all matching
   docs, equal to the boost of the query. (Mike McCandless)

API Changes

 * LUCENE-1419: Add expert API to set a custom indexing chain. This API is
   package-protected for now, so we don't have to officially support it.
   Yet, it will give us the possibility to try out different consumers
   in the chain. (Michael Busch)

 * LUCENE-1427: DocIdSet.iterator() is now allowed to throw
   IOException. (Paul Elschot, Mike McCandless)

 * LUCENE-1422, LUCENE-1693: New TokenStream API that uses a new class called
   AttributeSource instead of the Token class, which is now a utility class that
   holds common Token attributes. All attributes that the Token class had have
   been moved into separate classes: TermAttribute, OffsetAttribute,
   PositionIncrementAttribute, PayloadAttribute, TypeAttribute and FlagsAttribute.
   The new API is much more flexible; it allows Attributes to be combined
   arbitrarily and custom Attributes to be defined. The new API has the same
   performance as the old next(Token) approach. For conformance with this new
   API, Tee-/SinkTokenizer was deprecated and replaced by a new TeeSinkTokenFilter.
   (Michael Busch, Uwe Schindler; additional contributions and bug fixes by
   Daniel Shane, Doron Cohen)

 * LUCENE-1467: Add nextDoc() and next(int) methods to OpenBitSetIterator.
   These methods can be used to avoid additional calls to doc().
   (Michael Busch)

 * LUCENE-1468: Deprecate Directory.list(), which sometimes (in
   FSDirectory) filters out files that don't look like index files, in
   favor of the new Directory.listAll(), which does no filtering. Also,
   listAll() will never return null; instead, it throws an IOException
   (or subclass). Specifically, FSDirectory.listAll() will throw the
   newly added NoSuchDirectoryException if the directory does not
   exist. (Marcel Reutegger, Mike McCandless)

 * LUCENE-1546: Add IndexReader.flush(Map commitUserData), allowing
   you to record an opaque commitUserData (maps String -> String) into
   the commit written by IndexReader. This matches IndexWriter's
   commit methods. (Jason Rutherglen via Mike McCandless)

 * LUCENE-652: Added org.apache.lucene.document.CompressionTools, to
   enable compressing & decompressing binary content, external to
   Lucene's indexing. Deprecated Field.Store.COMPRESS.

 * LUCENE-1561: Renamed Field.omitTf to Field.omitTermFreqAndPositions.
   (Otis Gospodnetic via Mike McCandless)

 * LUCENE-1500: Added new InvalidTokenOffsetsException to Highlighter methods
   to denote issues when offsets in TokenStream tokens exceed the length of the
   provided text. (Mark Harwood)

 * LUCENE-1575, LUCENE-1483: HitCollector is now deprecated in favor of
   a new Collector abstract class. For easy migration, people can use
   HitCollectorWrapper which translates (wraps) HitCollector into
   Collector. Note that this class is also deprecated and will be
   removed when HitCollector is removed.
   Also, TimeLimitedCollector is deprecated in favor of the new
   TimeLimitingCollector, which extends Collector. (Shai Erera, Mark Miller,
   Mike McCandless)

 * LUCENE-1592: The method TermEnum.skipTo() was deprecated, because
   it is used nowhere in core/contrib and there is only a very inefficient
   default implementation available. If you want to position a TermEnum
   to another Term, create a new one using IndexReader.terms(Term).
   (Uwe Schindler)

 * LUCENE-1621: MultiTermQuery.getTerm() has been deprecated as it does
   not make sense for all subclasses of MultiTermQuery. Check individual
   subclasses to see if they support getTerm(). (Mark Miller)

 * LUCENE-1636: Make TokenFilter.input final so it's set only
   once. (Wouter Heijke, Uwe Schindler via Mike McCandless)

 * LUCENE-1658, LUCENE-1451: Renamed FSDirectory to SimpleFSDirectory
   (but left an FSDirectory base class). Added an FSDirectory.open
   static method to pick a good default FSDirectory implementation
   given the OS. FSDirectories should now be instantiated using
   FSDirectory.open or with public constructors rather than
   FSDirectory.getDirectory(), which has been deprecated.
   (Michael McCandless, Uwe Schindler, yonik)

 * LUCENE-1665: Deprecate SortField.AUTO, to be removed in 3.0.
   Instead, when sorting by field, the application should explicitly
   state the type of the field. (Mike McCandless)

 * LUCENE-1660: StopFilter, StandardAnalyzer and StopAnalyzer now
   require up-front specification of enablePositionIncrement. (Mike
   McCandless)

 * LUCENE-1614: DocIdSetIterator's next() and skipTo() were deprecated in favor
   of the new nextDoc() and advance(). The new methods return the doc ID they
   landed on, saving an extra call to doc() in most cases.
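   A minimal sketch of the nextDoc()/advance()/docID() contract described here,
   using a hypothetical array-backed iterator (ArrayDocIdIterator is not a
   Lucene class; its NO_MORE_DOCS sentinel is assumed to equal
   Integer.MAX_VALUE, as in DocIdSetIterator). The consumer loop uses the doc
   ID returned by nextDoc() instead of asking for it again.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical iterator over a sorted int[] following the 2.9 iteration contract. */
final class ArrayDocIdIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // sentinel: iterator exhausted

    private final int[] docs; // ascending, distinct doc IDs
    private int index = -1;
    private int doc = -1;     // -1 until nextDoc()/advance() is first called

    ArrayDocIdIterator(int... sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Current doc ID: -1 before iteration, NO_MORE_DOCS once exhausted. */
    int docID() {
        return doc;
    }

    /** Moves to the next doc and returns it, or NO_MORE_DOCS at the end. */
    int nextDoc() {
        if (doc == NO_MORE_DOCS) {
            return doc; // stay exhausted
        }
        index++;
        doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
        return doc;
    }

    /** Moves beyond the current doc to the first doc >= target (linear scan for brevity). */
    int advance(int target) {
        do {
            nextDoc();
        } while (doc < target);
        return doc;
    }

    /** Typical consumer loop: one call per step, using the returned doc ID directly. */
    static List<Integer> collect(ArrayDocIdIterator it) {
        List<Integer> out = new ArrayList<>();
        for (int d = it.nextDoc(); d != NO_MORE_DOCS; d = it.nextDoc()) {
            out.add(d);
        }
        return out;
    }
}
```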
   For easy migration of the code, you can change the calls to next() to
   nextDoc() != DocIdSetIterator.NO_MORE_DOCS and similarly for skipTo().
   However, it is advised that you take advantage of the returned doc ID and not
   call doc() following those two.
   Also, doc() was deprecated in favor of docID(). docID() should return -1 or
   NO_MORE_DOCS if nextDoc/advance were not called yet, or NO_MORE_DOCS if the
   iterator is exhausted. Otherwise it should return the current doc ID.
   (Shai Erera via Mike McCandless)

 * LUCENE-1672: All ctors/opens and other methods using String/File to
   specify the directory in IndexReader, IndexWriter, and IndexSearcher
   were deprecated. You should instantiate the Directory manually first
   and pass it to these classes (LUCENE-1451, LUCENE-1658).
   (Uwe Schindler)

 * LUCENE-1407: Move RemoteSearchable, RemoteCachingWrapperFilter out
   of Lucene's core into the new contrib/remote package. Searchable no
   longer extends java.rmi.Remote. (Simon Willnauer via Mike
   McCandless)

 * LUCENE-1677: The global properties
   org.apache.lucene.SegmentReader.class and
   ReadOnlySegmentReader.class are now deprecated, to be removed in
   3.0. src/gcj/* has been removed. (Earwin Burrfoot via Mike
   McCandless)

 * LUCENE-1673: Deprecated NumberTools in favour of the new
   NumericRangeQuery and its new indexing format for numeric or
   date values. (Uwe Schindler)

 * LUCENE-1630, LUCENE-1771: Weight is now an abstract class, and adds
   a scorer(IndexReader, boolean /* scoreDocsInOrder */, boolean /*
   topScorer */) method instead of scorer(IndexReader). IndexSearcher uses
   this method to obtain a scorer matching the capabilities of the Collector
   with respect to docID ordering. Some Scorers (like BooleanScorer) are much
   more efficient if out-of-order document scoring is allowed by a Collector.
   Collector must now implement acceptsDocsOutOfOrder. If you write a
   Collector which does not care about doc ID order, it is recommended
   that you return true. Weight has a scoresDocsOutOfOrder method, which by
   default returns false. If you create a Weight which will score documents
   out of order if requested, you should override that method to return true.
   BooleanQuery's setAllowDocsOutOfOrder and getAllowDocsOutOfOrder have been
   deprecated as they are not needed anymore. BooleanQuery will now score docs
   out of order when used with a Collector that can accept docs out of order.
   Finally, Weight#explain now takes a sub-reader and sub-docID, rather than
   a top level reader and docID.
   (Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)

 * LUCENE-1466, LUCENE-1906: Added CharFilter and MappingCharFilter, which allow
   chaining & mapping of characters before tokenizers run. CharStream (a subclass
   of Reader) is the base class for custom java.io.Readers that support offset
   correction. Tokenizers got an additional method correctOffset() that is passed
   down to the underlying CharStream if input is a subclass of CharStream/-Filter.
   (Koji Sekiguchi via Mike McCandless, Uwe Schindler)

 * LUCENE-1703: Add IndexWriter.waitForMerges. (Tim Smith via Mike
   McCandless)

 * LUCENE-1625: CheckIndex's programmatic API now returns separate
   classes detailing the status of each component in the index, and
   includes more detailed status than previously. (Tim Smith via
   Mike McCandless)

 * LUCENE-1713: Deprecated RangeQuery and RangeFilter and renamed them to
   TermRangeQuery and TermRangeFilter. TermRangeQuery is in constant
   score auto rewrite mode by default. The new classes also have new
   ctors taking field and term ranges as Strings (see also
   LUCENE-1424).
   (Uwe Schindler)

 * LUCENE-1609: The termInfosIndexDivisor must now be specified
   up-front when opening the IndexReader. Attempts to call
   IndexReader.setTermInfosIndexDivisor will hit an
   UnsupportedOperationException. This was done to enable removal of
   all synchronization in TermInfosReader, which previously could
   cause threads to pile up in certain cases. (Dan Rosher via Mike
   McCandless)

 * LUCENE-1688: Deprecate the static final String stop word array in
   StopAnalyzer and replace it with an immutable implementation of
   CharArraySet. (Simon Willnauer via Mark Miller)

 * LUCENE-1742: SegmentInfos, SegmentInfo and SegmentReader have been
   made public as expert, experimental APIs. These APIs may suddenly
   change from release to release. (Jason Rutherglen via Mike
   McCandless)

 * LUCENE-1754: QueryWeight.scorer() can return null if no documents
   are going to be matched by the query. Similarly,
   Filter.getDocIdSet() can return null if no documents are going to
   be accepted by the Filter. Note that these methods may return null
   but are not required to; they can instead return a Scorer/DocIdSet
   that matches no documents / rejects all documents. This is already the
   behavior of some QueryWeight/Filter implementations, and is
   documented here just for emphasis. (Shai Erera via Mike
   McCandless)

 * LUCENE-1705: Added IndexWriter.deleteAllDocuments. (Tim Smith via
   Mike McCandless)

 * LUCENE-1460: Changed TokenStreams/TokenFilters in contrib to
   use the new TokenStream API. (Robert Muir, Michael Busch)

 * LUCENE-1748: LUCENE-1001 introduced PayloadSpans, but this was a back
   compat break and caused custom SpanQuery implementations to fail at runtime
   in a variety of ways.
   This issue attempts to remedy things by causing
   a compile time break on custom SpanQuery implementations and removing
   the PayloadSpans class, with its functionality now moved to Spans. To
   help alleviate future back compat pain, Spans has been changed from
   an interface to an abstract class.
   (Hugh Cayless, Mark Miller)

 * LUCENE-1808: Query.createWeight has been changed from protected to
   public. (Tim Smith, Shai Erera via Mark Miller)

 * LUCENE-1826: Add constructors that take AttributeSource and
   AttributeFactory to all Tokenizer implementations.
   (Michael Busch)

 * LUCENE-1847: The Similarity#idf methods for both a Term and a Term
   collection have been deprecated. New versions that return an
   IDFExplanation have been added. (Yasoja Seneviratne, Mike McCandless,
   Mark Miller)

 * LUCENE-1877: Made NativeFSLockFactory the default for
   the new FSDirectory API (open(), FSDirectory subclass ctors).
   All FSDirectory system properties were deprecated and all lock
   implementations use no lock prefix if the locks are stored inside
   the index directory. Because the deprecated String/File ctors of
   IndexWriter and IndexReader (LUCENE-1672) and FSDirectory.getDirectory()
   still use the old SimpleFSLockFactory, while the new API uses
   NativeFSLockFactory, we strongly recommend not mixing the deprecated
   and new APIs. (Uwe Schindler, Mike McCandless)

 * LUCENE-1911: Added a new method isCacheable() to DocIdSet. This method
   should return true if the underlying implementation does not use disk
   I/O and is fast enough to be directly cached by CachingWrapperFilter.
   OpenBitSet, SortedVIntList, and DocIdBitSet are such candidates.
   The default implementation of the abstract DocIdSet class returns false.
   In this case, CachingWrapperFilter copies the DocIdSetIterator into an
   OpenBitSet for caching.
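   A minimal sketch of the caching decision described above, with hypothetical
   stand-ins: DocSet, BitSetDocSet and DocSetCache are not Lucene classes, and
   java.util.BitSet plays the role of OpenBitSet. A set that reports itself
   as cacheable is kept as-is; any other set is copied into a bit set first.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

/** Hypothetical stand-in for DocIdSet: iterable doc IDs plus a cacheability hint. */
interface DocSet {
    PrimitiveIterator.OfInt iterator();

    /** Conservative default, mirroring the abstract DocIdSet behavior. */
    default boolean isCacheable() {
        return false;
    }
}

/** In-memory bit set: cheap to hold on to, so it reports itself as cacheable. */
final class BitSetDocSet implements DocSet {
    final BitSet bits;

    BitSetDocSet(BitSet bits) {
        this.bits = bits;
    }

    public PrimitiveIterator.OfInt iterator() {
        return bits.stream().iterator();
    }

    public boolean isCacheable() {
        return true;
    }
}

final class DocSetCache {
    /** Returns a set that is safe to cache, copying non-cacheable sets into a BitSet. */
    static DocSet toCacheable(DocSet set) {
        if (set.isCacheable()) {
            return set; // cache the instance directly
        }
        BitSet bits = new BitSet();
        PrimitiveIterator.OfInt it = set.iterator();
        while (it.hasNext()) {
            bits.set(it.nextInt()); // materialize every matching doc once
        }
        return new BitSetDocSet(bits);
    }
}
```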
   (Uwe Schindler, Thomas Becker)

Bug fixes

 * LUCENE-1415: MultiPhraseQuery had incorrect hashCode() and equals()
   implementations, leading to Solr cache misses.
   (Todd Feak, Mark Miller via yonik)

 * LUCENE-1327: Fix TermSpans#skipTo() to behave as specified in the javadocs
   of Terms#skipTo(). (Michael Busch)

 * LUCENE-1573: Do not ignore InterruptedException (caused by
   Thread.interrupt()) or enter a deadlock/spin loop. Now, an interrupt
   will cause a RuntimeException to be thrown. In 3.0 we will change
   public APIs to throw InterruptedException. (Jeremy Volkman via
   Mike McCandless)

 * LUCENE-1590: Fixed stored-only Field instances so that they do not change
   the value of omitNorms, omitTermFreqAndPositions in FieldInfo; when you
   retrieve such fields they will now have omitNorms=true and
   omitTermFreqAndPositions=false (though these values are unused).
   (Uwe Schindler via Mike McCandless)

 * LUCENE-1587: RangeQuery#equals() could consider a RangeQuery
   without a collator equal to one with a collator.
   (Mark Platvoet via Mark Miller)

 * LUCENE-1600: Don't call String.intern unnecessarily in some cases
   when loading documents from the index. (P Eger via Mike
   McCandless)

 * LUCENE-1611: Fix case where an OutOfMemoryError in IndexWriter
   could cause "infinite merging" to happen. (Christiaan Fluit via
   Mike McCandless)

 * LUCENE-1623: Properly handle back-compatibility of 2.3.x indexes that
   contain field names with non-ASCII characters. (Mike Streeton via
   Mike McCandless)

 * LUCENE-1593: MultiSearcher and ParallelMultiSearcher did not break ties (in
   sort) by doc ID consistently (i.e., the order differed depending on whether
   Sort.FIELD_DOC was used or not).
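   A minimal sketch of consistent tie-breaking, assuming each hit carries a doc
   ID and a String sort value (Hit and TieBreak are hypothetical names, not
   Lucene classes). Comparing by doc ID after the sort field makes the order of
   equal values deterministic, independent of any explicit doc-ID sort
   criterion.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical hit: a doc ID plus the value of the sort field. */
final class Hit {
    final int doc;
    final String sortValue;

    Hit(int doc, String sortValue) {
        this.doc = doc;
        this.sortValue = sortValue;
    }
}

final class TieBreak {
    /** Orders by sort field, then by ascending doc ID so equal values get a stable order. */
    static final Comparator<Hit> BY_FIELD_THEN_DOC =
        Comparator.comparing((Hit h) -> h.sortValue).thenComparingInt(h -> h.doc);

    /** Returns the doc IDs of the hits in sorted order, leaving the input untouched. */
    static List<Integer> sortedDocs(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(BY_FIELD_THEN_DOC);
        List<Integer> docs = new ArrayList<>();
        for (Hit h : copy) {
            docs.add(h.doc);
        }
        return docs;
    }
}
```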
   (Shai Erera via Michael McCandless)

 * LUCENE-1647: Fix case where IndexReader.undeleteAll would cause
   the segment's deletion count to be incorrect. (Mike McCandless)

 * LUCENE-1542: When the first token(s) have 0 position increment,
   IndexWriter used to incorrectly record the position as -1, if no
   payload is present, or Integer.MAX_VALUE if a payload is present.
   This caused positional queries to fail to match. The bug is now
   fixed, but if your app relies on the buggy behavior then you must
   call IndexWriter.setAllowMinus1Position(). That API is deprecated
   so you must fix your application, and rebuild your index, to not
   rely on this behavior by the 3.0 release of Lucene. (Jonathan
   Mamou, Mark Miller via Mike McCandless)

 * LUCENE-1658: Fixed MMapDirectory to correctly throw IOExceptions
   on EOF, removed numeric overflow possibilities and added support
   for a hack to unmap the buffers on closing IndexInput.
   (Uwe Schindler)

 * LUCENE-1681: Fix infinite loop caused by a call to DocValues methods
   getMinValue, getMaxValue, getAverageValue. (Simon Willnauer via Mark Miller)

 * LUCENE-1599: Add clone support for SpanQueries. SpanRegexQuery counts
   on this functionality and does not work correctly without it.
   (Billow Gao, Mark Miller)

 * LUCENE-1718: Fix termInfosIndexDivisor to carry over to reopened
   readers. (Mike McCandless)

 * LUCENE-1583: SpanOrQuery.skipTo() did not always move forward as the Spans
   documentation indicates it should. (Moti Nisenson via Mark Miller)

 * LUCENE-1566: Sun JVM Bug
   http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546 causes a
   spurious OutOfMemoryError when reading too many bytes at once from
   a file on 32bit JVMs that have a large maximum heap size.
   This fix adds set/getReadChunkSize to FSDirectory so that large reads
   are broken into chunks, to work around this JVM bug. On 32bit
   JVMs the default chunk size is 100 MB; on 64bit JVMs, which don't
   show the bug, the default is Integer.MAX_VALUE. (Simon Willnauer
   via Mike McCandless)

 * LUCENE-1448: Added TokenStream.end() to perform end-of-stream
   operations (e.g. to return the end offset of the tokenization).
   This is important when multiple fields with the same name are added
   to a document, to ensure offsets recorded in term vectors for all
   of the instances are correct.
   (Mike McCandless, Mark Miller, Michael Busch)

 * LUCENE-1805: CloseableThreadLocal did not allow a null Object in get(),
   although it does allow it in set(Object). Fixed get() so that it no longer
   asserts that the object is non-null. (Shai Erera via Mike McCandless)

 * LUCENE-1801: Changed all Tokenizers and TokenStreams (in core/contrib)
   that are the source of Tokens to always call
   AttributeSource.clearAttributes() first. (Uwe Schindler)

 * LUCENE-1819: MatchAllDocsQuery.toString(field) now produces output
   that is parsable by the QueryParser. (John Wang, Mark Miller)

 * LUCENE-1836: Fix a localization bug in the new query parser and add a
   new LocalizedTestCase as the base class for localization JUnit tests.
   (Robert Muir, Uwe Schindler via Michael Busch)

 * LUCENE-1847: PhraseQuery/TermQuery/SpanQuery used IndexReader-specific stats
   in their Weight#explain methods; these stats should be corpus-wide.
   (Yasoja Seneviratne, Mike McCandless, Mark Miller)

 * LUCENE-1885: Fix a bug where NativeFSLock.isLocked() did not work
   if the lock was obtained by another NativeFSLock(Factory) instance.
   Because of this, IndexReader.isLocked() and IndexWriter.isLocked() did
   not work correctly.
   (Uwe Schindler)

 * LUCENE-1899: Fix O(N^2) CPU cost when setting docIDs in order in an
   OpenBitSet, due to an inefficiency in how the underlying storage is
   reallocated. (Nadav Har'El via Mike McCandless)

 * LUCENE-1918: Fixed cases where a ParallelReader would
   generate exceptions on being passed to
   IndexWriter.addIndexes(IndexReader[]). The first case was when the
   ParallelReader was empty. The second case was when the ParallelReader
   used to contain documents with TermVectors, but all such documents
   have been deleted. (Christian Kohlschütter via Mike McCandless)

New features

 * LUCENE-1411: Added expert API to open an IndexWriter on a prior
   commit, obtained from IndexReader.listCommits. This makes it
   possible to roll back changes to an index even after you've closed
   the IndexWriter that made the changes, assuming you are using an
   IndexDeletionPolicy that keeps past commits around. This is useful
   when building transactional support on top of Lucene. (Mike
   McCandless)

 * LUCENE-1382: Add an optional arbitrary Map (String -> String)
   "commitUserData" to IndexWriter.commit(), which is stored in the
   segments file and is then retrievable via
   IndexReader.getCommitUserData instance and static methods.
   (Shalin Shekhar Mangar via Mike McCandless)

 * LUCENE-1420: Similarity now has a computeNorm method that allows
   custom Similarity classes to override how the norm is computed. It is
   passed a FieldInvertState instance that contains details from
   inverting the field. The default implementation is boost *
   lengthNorm(numTerms), to be backwards compatible. Also added
   {set/get}DiscountOverlaps to DefaultSimilarity, to control whether
   overlapping tokens (tokens with 0 position increment) should be
   counted in lengthNorm.
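   A minimal sketch of the default norm described above, assuming
   DefaultSimilarity's lengthNorm of 1/sqrt(numTerms). The real computeNorm
   receives a FieldInvertState; here its boost, length and overlap count are
   passed as plain parameters, and NormSketch is a hypothetical name.

```java
/**
 * Hypothetical sketch of boost * lengthNorm(numTerms). When discountOverlaps
 * is enabled, tokens with a position increment of 0 are not counted.
 */
final class NormSketch {
    /** Shorter fields get larger norms: 1 / sqrt(numTerms). */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    static float computeNorm(float boost, int length, int numOverlap, boolean discountOverlaps) {
        int numTerms = discountOverlaps ? length - numOverlap : length;
        return boost * lengthNorm(numTerms);
    }
}
```

   With discountOverlaps enabled, a 5-token field containing one synonym
   stacked at the same position gets the same norm as a plain 4-token field.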
   (Andrzej Bialecki via Mike McCandless)

 * LUCENE-1424: Moved constant score query rewrite capability into
   MultiTermQuery, allowing TermRangeQuery, PrefixQuery and WildcardQuery
   to switch between constant-score rewriting and BooleanQuery
   expansion rewriting via a new setRewriteMethod method.
   Deprecated ConstantScoreRangeQuery. (Mark Miller via Mike
   McCandless)

 * LUCENE-1461: Added FieldCacheRangeFilter, a RangeFilter for
   single-term fields that uses FieldCache to compute the filter. If
   your documents all have a single term for a given field, and you
   need to create many RangeFilters with varying lower/upper bounds,
   then this is likely a much faster way to create the filters than
   RangeFilter. FieldCacheRangeFilter allows ranges on all data types
   that FieldCache supports (term ranges, byte, short, int, long, float,
   double). However, it comes at the expense of added RAM consumption and
   slower first-time usage due to populating the FieldCache. It also does
   not support collation. (Tim Sturge, Matt Ericson via Mike McCandless and
   Uwe Schindler)

 * LUCENE-1296: Added protected method CachingWrapperFilter.docIdSetToCache
   to allow subclasses to choose which DocIdSet implementation to use.
   (Paul Elschot via Mike McCandless)

 * LUCENE-1390: Added ASCIIFoldingFilter, a filter that converts
   alphabetic, numeric, and symbolic Unicode characters which are not in
   the first 128 characters (the "Basic Latin" Unicode block) into
   their ASCII equivalents, if they exist. ISOLatin1AccentFilter, which
   handles a subset of this filter, has been deprecated.
   (Andi Vajda, Steven Rowe via Mark Miller)

 * LUCENE-1478: Added new SortField constructor allowing you to
   specify a custom FieldCache parser to generate numeric values from
   terms for a field.
   (Uwe Schindler via Mike McCandless)

 * LUCENE-1528: Add support for Ideographic Space to the QueryParser.
   (Luis Alves via Michael Busch)

 * LUCENE-1487: Added FieldCacheTermsFilter, to filter by multiple
   terms on single-valued fields. The filter loads the FieldCache
   for the field the first time it's called, and subsequent usage of
   that field, even with different Terms in the filter, is fast.
   (Tim Sturge, Shalin Shekhar Mangar via Mike McCandless)

 * LUCENE-1314: Add clone(), clone(boolean readOnly) and
   reopen(boolean readOnly) to IndexReader. Cloning an IndexReader
   gives you a new reader which you can make changes to (deletions,
   norms) without affecting the original reader. Now, with clone or
   reopen, the new reader's readOnly setting can differ from the
   original reader's. (Jason Rutherglen, Mike McCandless)

 * LUCENE-1506: Added FilteredDocIdSet, an abstract class which you
   subclass to implement the "match" method to accept or reject each
   docID. Unlike ChainedFilter (under contrib/misc),
   FilteredDocIdSet never requires you to materialize the full
   bitset. Instead, match() is called on demand per docID. (John
   Wang via Mike McCandless)

 * LUCENE-1398: Added ReverseStringFilter, a filter that reverses the
   characters in each token, to contrib/analyzers. (Koji Sekiguchi via yonik)

 * LUCENE-1551: Add expert IndexReader.reopen(IndexCommit) to allow
   efficiently opening a new reader on a specific commit, sharing
   resources with the original reader. (Torin Danil via Mike
   McCandless)

 * LUCENE-1434: Added org.apache.lucene.util.IndexableBinaryStringTools,
   to encode byte[] as String values that are valid terms and that
   maintain the sort order of the original byte[] when the bytes are
   interpreted as unsigned.
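   A simplified sketch of the order-preserving idea, not the class's actual
   format: each byte becomes one char holding its unsigned value (0..255).
   Since String.compareTo compares chars as unsigned 16-bit values, encoded
   strings sort like the unsigned bytes. IndexableBinaryStringTools itself is
   documented to pack 15 bits into each char for a denser encoding;
   OrderPreservingBytes is a hypothetical name.

```java
/**
 * Hypothetical one-char-per-byte encoding that preserves unsigned byte order.
 * Illustrates the ordering property only; it is not IndexableBinaryStringTools.
 */
final class OrderPreservingBytes {
    static String encode(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) (bytes[i] & 0xFF); // treat each byte as unsigned 0..255
        }
        return new String(chars);
    }

    static byte[] decode(String s) {
        byte[] bytes = new byte[s.length()];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) s.charAt(i); // low 8 bits restore the original byte
        }
        return bytes;
    }
}
```

   With a signed comparison, 0x80 (-128) would sort before 0x7F; after
   encoding, "\u007F" correctly sorts before "\u0080".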
   (Steven Rowe via Mike McCandless)

 * LUCENE-1543: Allow MatchAllDocsQuery to optionally use norms from
   a specific field to set the score for a document. (Karl Wettin
   via Mike McCandless)

 * LUCENE-1586: Add IndexReader.getUniqueTermCount(). (Mike
   McCandless via Derek)

 * LUCENE-1516: Added "near real-time search" to IndexWriter, via a
   new expert getReader() method. This method returns a reader that
   searches the full index, including any uncommitted changes in the
   current IndexWriter session. This should result in a faster
   turnaround than the normal approach of committing the changes and
   then reopening a reader. (Jason Rutherglen via Mike McCandless)

 * LUCENE-1603: Added new MultiTermQueryWrapperFilter, to wrap any
   MultiTermQuery as a Filter. Also made some improvements to
   MultiTermQuery: return DocIdSet.EMPTY_DOCIDSET if there are no
   terms in the enum; track the total number of terms it visited
   during rewrite (getTotalNumberOfTerms). FilteredTermEnum is also
   more friendly to subclassing. (Uwe Schindler via Mike McCandless)

 * LUCENE-1605: Added BitVector.subset(). (Jeremy Volkman via Mike
   McCandless)

 * LUCENE-1618: Added FileSwitchDirectory, which enables files with
   specified extensions to be stored in a primary directory and the
   rest of the files to be stored in the secondary directory. For
   example, this can be useful for the large doc-store (stored
   fields, term vectors) files in FSDirectory and the rest of the
   index files in a RAMDirectory. (Jason Rutherglen via Mike
   McCandless)

 * LUCENE-1494: Added FieldMaskingSpanQuery, which can be used to
   cross-correlate Spans from different fields.
   (Paul Cowan and Chris Hostetter)

 * LUCENE-1634: Add calibrateSizeByDeletes to LogMergePolicy, to take
   deletions into account when considering merges.
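   A minimal sketch of how deletions could be folded into a segment's size for
   merge selection: scale the byte size by the fraction of live documents and
   subtract deletions from the doc count. The formula and the CalibratedSize
   class are assumptions for illustration, not taken from LogMergePolicy.

```java
/**
 * Hypothetical deletion-calibrated sizes: a segment with many deleted docs
 * is treated as smaller, so it becomes a merge candidate sooner.
 */
final class CalibratedSize {
    static long calibratedBytes(long sizeInBytes, int docCount, int deletedDocs) {
        if (docCount <= 0) {
            return sizeInBytes; // nothing to scale by
        }
        double liveRatio = 1.0 - (double) deletedDocs / docCount;
        return (long) (sizeInBytes * liveRatio);
    }

    static int calibratedDocs(int docCount, int deletedDocs) {
        return docCount - deletedDocs; // count only live documents
    }
}
```

   Under this assumption, a 1000-byte segment of 10 docs with 5 deleted is
   treated as a 500-byte segment when merges are selected.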
   (Yasuhiro Matsuda via Mike McCandless)

 * LUCENE-1550: Added new n-gram based String distance measure for spell checking.
   See the Javadocs for NGramDistance.java for a reference paper on why
   this is helpful. (Tom Morton via Grant Ingersoll)

 * LUCENE-1470, LUCENE-1582, LUCENE-1602, LUCENE-1673, LUCENE-1701, LUCENE-1712:
   Added NumericRangeQuery and NumericRangeFilter, a fast alternative to
   RangeQuery/RangeFilter for numeric searches. They depend on a specific
   structure of terms in the index that can be created by indexing
   using the new NumericField or NumericTokenStream classes. NumericField
   can only be used for indexing and optionally stores the values as a
   string representation in the doc store. Documents returned from
   IndexReader/IndexSearcher will return only the String value using
   the standard Fieldable interface. NumericFields can be sorted on
   and loaded into the FieldCache. (Uwe Schindler, Yonik Seeley,
   Mike McCandless)

 * LUCENE-1405: Added support for Ant resource collections in contrib/ant
   <index> task. (Przemyslaw Sztoch via Erik Hatcher)

 * LUCENE-1699: Allow setting a TokenStream on Field/Fieldable for indexing
   in conjunction with any other ways to specify stored field values,
   currently binary or string values. (yonik)

 * LUCENE-1701: Made the standard FieldCache.Parsers public and added
   parsers for fields generated using NumericField/NumericTokenStream.
   All standard parsers now also implement Serializable and enforce
   their singleton status. (Uwe Schindler, Mike McCandless)

 * LUCENE-1741: User-configurable maximum chunk size in MMapDirectory.
   On 32-bit platforms, the address space can be very fragmented, so
   one big ByteBuffer for the whole file may not fit into the address space.
   (Eks Dev via Uwe Schindler)

 * LUCENE-1644: Enable 4 rewrite modes for queries deriving from
   MultiTermQuery (WildcardQuery, PrefixQuery, TermRangeQuery,
   NumericRangeQuery): CONSTANT_SCORE_FILTER_REWRITE first creates a
   filter and then assigns constant score (boost) to docs;
   CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE creates a BooleanQuery but
   uses a constant score (boost); SCORING_BOOLEAN_QUERY_REWRITE also
   creates a BooleanQuery but keeps the BooleanQuery's scores;
   CONSTANT_SCORE_AUTO_REWRITE tries to pick the most performant
   constant-score rewrite method. (Mike McCandless)

 * LUCENE-1448: Added TokenStream.end(), to perform end-of-stream
   operations. This is currently used to fix offset problems when
   multiple fields with the same name are added to a document.
   (Mike McCandless, Mark Miller, Michael Busch)

 * LUCENE-1776: Add an option to not collect payloads for an ordered
   SpanNearQuery. Payloads were not lazily loaded in this case as
   the javadocs implied. If you have payloads and want to use an ordered
   SpanNearQuery that does not need to use the payloads, you can
   disable loading them with a new constructor switch. (Mark Miller)

 * LUCENE-1341: Added PayloadNearQuery to enable SpanNearQuery functionality
   with payloads. (Peter Keegan, Grant Ingersoll, Mark Miller)

 * LUCENE-1790: Added PayloadTermQuery to enable scoring of payloads
   based on the maximum payload seen for a document.
   Slight refactoring of Similarity and other payload queries. (Grant Ingersoll, Mark Miller)

 * LUCENE-1749: Addition of FieldCacheSanityChecker utility, and
   hooks to use it in all existing Lucene Tests. This class can
   be used by any application to inspect the FieldCache and provide
   diagnostic information about the possibility of inconsistent
   FieldCache usage.
   Namely: FieldCache entries for the same field
   with different datatypes or parsers; and FieldCache entries for
   the same field in both a reader and one of its (descendant) sub
   readers.
   (Chris Hostetter, Mark Miller)

 * LUCENE-1789: Added utility class
   oal.search.function.MultiValueSource to ease the transition to
   segment based searching for any apps that directly call
   oal.search.function.* APIs. This class wraps any other
   ValueSource, but takes care when composite (multi-segment) readers
   are passed to not double RAM usage in the FieldCache. (Chris
   Hostetter, Mark Miller, Mike McCandless)

Optimizations

 * LUCENE-1427: Fixed QueryWrapperFilter to not waste time computing
   scores of the query, since they are just discarded. Also, made it
   more efficient (single pass) by not creating & populating an
   intermediate OpenBitSet. (Paul Elschot, Mike McCandless)

 * LUCENE-1443: Performance improvement for OpenBitSetDISI.inPlaceAnd()
   (Paul Elschot via yonik)

 * LUCENE-1484: Remove synchronization of IndexReader.document() by
   using CloseableThreadLocal internally. (Jason Rutherglen via Mike
   McCandless).

 * LUCENE-1124: Short circuit FuzzyQuery.rewrite when input token length
   is small compared to minSimilarity. (Timo Nentwig, Mark Miller)

 * LUCENE-1316: MatchAllDocsQuery now avoids the synchronized
   IndexReader.isDeleted() call per document, by directly accessing
   the underlying deleteDocs BitVector. This improves performance
   with non-readOnly readers, especially in a multi-threaded
   environment. (Todd Feak, Yonik Seeley, Jason Rutherglen via Mike
   McCandless)

 * LUCENE-1483: When searching over multiple segments we now visit
   each sub-reader one at a time.
   This speeds up warming, since
   FieldCache entries (if required) can be shared across reopens for
   those segments that did not change, and also speeds up searches
   that sort by relevance or by field values. (Mark Miller, Mike
   McCandless)

 * LUCENE-1575: The new Collector class decouples collect() from
   score computation. Collector.setScorer is called to establish the
   current Scorer in use for each segment. Collectors that require the
   score should then call Scorer.score() per hit inside
   collect(). (Shai Erera via Mike McCandless)

 * LUCENE-1596: MultiTermDocs speedup when set with
   MultiTermDocs.seek(MultiTermEnum) (yonik)

 * LUCENE-1653: Avoid creating a Calendar in every call to
   DateTools#dateToString, DateTools#timeToString and
   DateTools#round. (Shai Erera via Mark Miller)

 * LUCENE-1688: Deprecate static final String stop word array and
   replace it with an immutable implementation of CharArraySet.
   Removes conversions between Set and array.
   (Simon Willnauer via Mark Miller)

 * LUCENE-1754: BooleanQuery.queryWeight.scorer() will return null if
   it won't match any documents (e.g. if there are no required and
   optional scorers, or not enough optional scorers to satisfy
   minShouldMatch). (Shai Erera via Mike McCandless)

 * LUCENE-1607: To speed up string interning for commonly used
   strings, the StringHelper.intern() interface was added with a
   default implementation that uses a lockless cache.
   (Earwin Burrfoot, yonik)

 * LUCENE-1800: QueryParser should use reusable TokenStreams. (yonik)


Documentation

 * LUCENE-1908: Scoring documentation improvements in Similarity javadocs.
   (Mark Miller, Shai Erera, Ted Dunning, Jiri Kuhn, Marvin Humphrey, Doron Cohen)

 * LUCENE-1872: NumericField javadoc improvements
   (Michael McCandless, Uwe Schindler)

 * LUCENE-1875: Make TokenStream.end javadoc less confusing.
   (Uwe Schindler)

 * LUCENE-1862: Rectified duplicate package level javadocs for
   o.a.l.queryParser and o.a.l.analysis.cn.
   (Chris Hostetter)

 * LUCENE-1886: Improved hyperlinking in key Analysis javadocs
   (Bernd Fondermann via Chris Hostetter)

 * LUCENE-1884: massive javadoc and comment cleanup, primarily dealing with
   typos.
   (Robert Muir via Chris Hostetter)

 * LUCENE-1898: Switch changes to use bullets rather than numbers and
   update changes-to-html script to handle the new format.
   (Steven Rowe, Mark Miller)

 * LUCENE-1900: Improve Searchable Javadoc.
   (Nadav Har'El, Doron Cohen, Marvin Humphrey, Mark Miller)

 * LUCENE-1896: Improve Similarity#queryNorm javadocs.
   (Jiri Kuhn, Mark Miller)

Build

 * LUCENE-1440: Add new targets to build.xml that allow downloading
   and executing the junit testcases from an older release for
   backwards-compatibility testing. (Michael Busch)

 * LUCENE-1446: Add compatibility tag to common-build.xml and run
   backwards-compatibility tests in the nightly build. (Michael Busch)

 * LUCENE-1529: Properly test "drop-in" replacement of jar with
   backwards-compatibility tests. (Mike McCandless, Michael Busch)

 * LUCENE-1851: Change 'javacc' and 'clean-javacc' targets to build
   and clean contrib/surround files. (Luis Alves via Michael Busch)

 * LUCENE-1854: tar task should use longfile="gnu" to avoid false file
   name length warnings.
   (Mark Miller)

Test Cases

 * LUCENE-1791: Enhancements to the QueryUtils and CheckHits utility
   classes to wrap IndexReaders and Searchers in MultiReaders or
   MultiSearcher when possible to help exercise more edge cases.
   (Chris Hostetter, Mark Miller)

 * LUCENE-1852: Fix localization test failures.
   (Robert Muir via Michael Busch)

 * LUCENE-1843: Refactored all tests that use assertAnalyzesTo() & others
   in core and contrib to use a new BaseTokenStreamTestCase
   base class. Also rewrote some tests to use these general analysis assert
   functions instead of their own ones (e.g. TestMappingCharFilter).
   The new base class also tests tokenization with the TokenStream.next()
   backwards layer enabled (using Token/TokenWrapper as attribute
   implementation) and disabled (default for Lucene 3.0).
   (Uwe Schindler, Robert Muir)

 * LUCENE-1836: Added a new LocalizedTestCase as base class for localization
   junit tests. (Robert Muir, Uwe Schindler via Michael Busch)

======================= Release 2.4.1 =======================

API Changes

1. LUCENE-1186: Add Analyzer.close() to free internal ThreadLocal
   resources. (Christian Kohlschütter via Mike McCandless)

Bug fixes

1. LUCENE-1452: Fixed silent data-loss case whereby binary fields are
   truncated to 0 bytes during merging if the segments being merged
   are non-congruent (same field name maps to different field
   numbers). This bug was introduced with LUCENE-1219. (Andrzej
   Bialecki via Mike McCandless).

2. LUCENE-1429: Don't throw incorrect IllegalStateException from
   IndexWriter.close() if you've hit an OOM when autoCommit is true.
   (Mike McCandless)

3. LUCENE-1474: If IndexReader.flush() is called twice when there were
   pending deletions, it could lead to later false AssertionError
   during IndexReader.open. (Mike McCandless)

4. LUCENE-1430: Fix false AlreadyClosedException (masking an actual
   IOException) from the IndexReader.open methods that take a String
   or File path. (Mike McCandless)

5. LUCENE-1442: Multiple-valued NOT_ANALYZED fields can double-count
   token offsets. (Mike McCandless)

6. LUCENE-1453: Ensure IndexReader.reopen()/clone() does not result in
   incorrectly closing the shared FSDirectory. This bug would only
   happen if you use IndexReader.open() with a File or String argument.
   The returned readers are wrapped by a FilterIndexReader that
   correctly handles closing of directory after reopen()/clone().
   (Mark Miller, Uwe Schindler, Mike McCandless)

7. LUCENE-1457: Fix possible overflow bugs during binary
   searches. (Mark Miller via Mike McCandless)

8. LUCENE-1459: Fix CachingWrapperFilter to not throw exception if
   both bits() and getDocIdSet() methods are called. (Matt Jones via
   Mike McCandless)

9. LUCENE-1519: Fix int overflow bug during segment merging. (Deepak
   via Mike McCandless)

10. LUCENE-1521: Fix int overflow bug when flushing segment.
    (Shon Vella via Mike McCandless).

11. LUCENE-1544: Fix deadlock in IndexWriter.addIndexes(IndexReader[]).
    (Mike McCandless via Doug Sale)

12. LUCENE-1547: Fix rare thread safety issue if two threads call
    IndexWriter commit() at the same time. (Mike McCandless)

13. LUCENE-1465: NearSpansOrdered returns payloads from first possible match
    rather than the correct, shortest match; payloads could be returned even
    if the max slop was exceeded; the wrong payload could be returned in
    certain situations. (Jonathan Mamou, Greg Shackles, Mark Miller)

14. LUCENE-1186: Add Analyzer.close() to free internal ThreadLocal
    resources. (Christian Kohlschütter via Mike McCandless)

15. LUCENE-1552: Fix IndexWriter.addIndexes(IndexReader[]) to properly
    roll back IndexWriter's internal state on hitting an
    exception. (Scott Garland via Mike McCandless)

======================= Release 2.4.0 =======================

Changes in backwards compatibility policy

1. LUCENE-1340: In a minor change to Lucene's backward compatibility
   policy, we are now allowing the Fieldable interface to have
   changes, within reason, and made on a case-by-case basis. If an
   application implements its own Fieldable, please be aware of
   this. Otherwise, no need to be concerned. This is in effect for
   all 2.X releases, starting with 2.4. Also note that in all
   likelihood, Fieldable will be changed in 3.0.


Changes in runtime behavior

 1. LUCENE-1151: Fix StandardAnalyzer to not mis-identify host names
    (e.g. lucene.apache.org) as an ACRONYM. To get back to the pre-2.4
    backwards compatible, but buggy, behavior, you can either call
    StandardAnalyzer.setDefaultReplaceInvalidAcronym(false) (static
    method), or set the system property
    org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym
    to "false" on JVM startup. All StandardAnalyzer instances created
    after that will then show the pre-2.4 behavior. Alternatively,
    you can call setReplaceInvalidAcronym(false) to change the
    behavior per instance of StandardAnalyzer. This backwards
    compatibility will be removed in 3.0 (hardwiring the value to
    true). (Mike McCandless)

 2. LUCENE-1044: IndexWriter with autoCommit=true now commits (such
    that a reader can see the changes) far less often than it used to.
    Previously, every flush was also a commit.
    You can always force a
    commit by calling IndexWriter.commit(). Furthermore, in 3.0,
    autoCommit will be hardwired to false (IndexWriter constructors
    that take an autoCommit argument have been deprecated). (Mike
    McCandless)

 3. LUCENE-1335: IndexWriter.addIndexes(Directory[]) and
    addIndexesNoOptimize no longer allow the same Directory instance
    to be passed in more than once. Internally, IndexWriter uses
    Directory and segment name to uniquely identify segments, so
    adding the same Directory more than once was causing duplicates
    which led to problems. (Mike McCandless)

 4. LUCENE-1396: Improve PhraseQuery.toString() so that gaps in the
    positions are indicated with a ? and multiple terms at the same
    position are joined with a |. (Andrzej Bialecki via Mike
    McCandless)

API Changes

 1. LUCENE-1084: Changed all IndexWriter constructors to take an
    explicit parameter for maximum field size. Deprecated all the
    pre-existing constructors; these will be removed in release 3.0.
    NOTE: these new constructors set autoCommit to false. (Steven
    Rowe via Mike McCandless)

 2. LUCENE-584: Changed Filter API to return a DocIdSet instead of a
    java.util.BitSet. This allows using more efficient data structures
    for Filters and makes them more flexible. This deprecates
    Filter.bits(), so all filters that implement this outside
    the Lucene code base will need to be adapted. See also the javadocs
    of the Filter class. (Paul Elschot, Michael Busch)

 3. LUCENE-1044: Added IndexWriter.commit() which flushes any buffered
    adds/deletes and then commits a new segments file so readers will
    see the changes. Deprecate IndexWriter.flush() in favor of
    IndexWriter.commit(). (Mike McCandless)

 4. LUCENE-325: Added IndexWriter.expungeDeletes methods, which
    consult the MergePolicy to find merges necessary to merge away all
    deletes from the index. This should be a somewhat lower-cost
    operation than optimize. (John Wang via Mike McCandless)

 5. LUCENE-1233: Return empty array instead of null when no fields
    match the specified name in these methods in Document:
    getFieldables, getFields, getValues, getBinaryValues. (Stefan
    Trcek via Mike McCandless)

 6. LUCENE-1234: Make BoostingSpanScorer protected. (Andi Vajda via Grant Ingersoll)

 7. LUCENE-510: The index now stores strings as true UTF-8 bytes
    (previously it was Java's modified UTF-8). If any text, either
    stored fields or a token, has illegal UTF-16 surrogate characters,
    these characters are now silently replaced with the Unicode
    replacement character U+FFFD. This is a change to the index file
    format. (Marvin Humphrey via Mike McCandless)

 8. LUCENE-852: Let the SpellChecker caller specify IndexWriter mergeFactor
    and RAM buffer size. (Otis Gospodnetic)

 9. LUCENE-1290: Deprecate org.apache.lucene.search.Hits, Hit and HitIterator
    and remove all references to these classes from the core. Also update demos
    and tutorials. (Michael Busch)

10. LUCENE-1288: Add getVersion() and getGeneration() to IndexCommit.
    getVersion() returns the same value that IndexReader.getVersion()
    returns when the reader is opened on the same commit. (Jason
    Rutherglen via Mike McCandless)

11. LUCENE-1311: Added IndexReader.listCommits(Directory) static
    method to list all commits in a Directory, plus IndexReader.open
    methods that accept an IndexCommit and open the index as of that
    commit. These methods are only useful if you implement a custom
    DeletionPolicy that keeps more than the last commit around.
    (Jason Rutherglen via Mike McCandless)

12. LUCENE-1325: Added IndexCommit.isOptimized(). (Shalin Shekhar
    Mangar via Mike McCandless)

13. LUCENE-1324: Added TokenFilter.reset(). (Shai Erera via Mike
    McCandless)

14. LUCENE-1340: Added Fieldable.omitTf() method to skip indexing term
    frequency, positions and payloads. This saves index space and
    indexing/searching time. (Eks Dev via Mike McCandless)

15. LUCENE-1219: Add basic reuse API to Fieldable for binary fields:
    getBinaryValue/Offset/Length(); currently only lazy fields reuse
    the provided byte[] result to getBinaryValue. (Eks Dev via Mike
    McCandless)

16. LUCENE-1334: Add new constructor for Term: Term(String fieldName)
    which defaults term text to "". (DM Smith via Mike McCandless)

17. LUCENE-1333: Added Token.reinit(*) APIs to re-initialize (reuse) a
    Token. Also added term() method to return a String, with a
    performance penalty clearly documented. Also implemented
    hashCode() and equals() in Token, and fixed all core and contrib
    analyzers to use the re-use APIs. (DM Smith via Mike McCandless)

18. LUCENE-1329: Add optional readOnly boolean when opening an
    IndexReader. A readOnly reader is not allowed to make changes
    (deletions, norms) to the index; in exchange, the isDeleted
    method, often a bottleneck when searching with many threads, is
    not synchronized. The default for readOnly is still false, but in
    3.0 the default will become true. (Jason Rutherglen via Mike
    McCandless)

19. LUCENE-1367: Add IndexCommit.isDeleted(). (Shalin Shekhar Mangar
    via Mike McCandless)

20. LUCENE-1061: Factored out all "new XXXQuery(...)" in
    QueryParser.java into protected methods newXXXQuery(...) so that
    subclasses can create their own subclasses of each Query type.
    (John Wang via Mike McCandless)

21. LUCENE-753: Added new Directory implementation
    org.apache.lucene.store.NIOFSDirectory, which uses java.nio's
    FileChannel to do file reads. On most non-Windows platforms, with
    many threads sharing a single searcher, this may yield a sizable
    improvement to query throughput when compared to FSDirectory,
    which only allows a single thread to read from an open file at a
    time. (Jason Rutherglen via Mike McCandless)

22. LUCENE-1371: Added convenience method TopDocs Searcher.search(Query query, int n).
    (Mike McCandless)

23. LUCENE-1356: Allow easy extension of TopDocCollector by turning
    constructor and fields from package to protected. (Shai Erera
    via Doron Cohen)

24. LUCENE-1375: Added convenience method IndexCommit.getTimestamp,
    which is equivalent to
    getDirectory().fileModified(getSegmentsFileName()). (Mike McCandless)

25. LUCENE-1366: Rename Field.Index options to be more accurate:
    TOKENIZED becomes ANALYZED; UN_TOKENIZED becomes NOT_ANALYZED;
    NO_NORMS becomes NOT_ANALYZED_NO_NORMS and a new ANALYZED_NO_NORMS
    is added. (Mike McCandless)

26. LUCENE-1131: Added numDeletedDocs method to IndexReader. (Otis Gospodnetic)

Bug fixes

 1. LUCENE-1134: Fixed BooleanQuery.rewrite to only optimize a single
    clause query if minNumShouldMatch<=0. (Shai Erera via Michael Busch)

 2. LUCENE-1169: Fixed bug in IndexSearcher.search(): searching with
    a filter might miss some hits because scorer.skipTo() is called
    without checking if the scorer is already at the right position.
    scorer.skipTo(scorer.doc()) is not a NOOP; it behaves as
    scorer.next(). (Eks Dev, Michael Busch)

 3. LUCENE-1182: Added scorePayload to SimilarityDelegator. (Andi Vajda via Grant Ingersoll)

 4. LUCENE-1213: MultiFieldQueryParser was ignoring slop in case
    of a single-field phrase. (Trejkaz via Doron Cohen)

 5. LUCENE-1228: IndexWriter.commit() was not updating the index version and as
    a result IndexReader.reopen() failed to sense index changes. (Doron Cohen)

 6. LUCENE-1267: Added numDocs() and maxDoc() to IndexWriter;
    deprecated docCount(). (Mike McCandless)

 7. LUCENE-1274: Added new prepareCommit() method to IndexWriter,
    which does phase 1 of a 2-phase commit (commit() does phase 2).
    This is needed when you want to update an index as part of a
    transaction involving external resources (e.g. a database). Also
    deprecated abort(), renaming it to rollback(). (Mike McCandless)

 8. LUCENE-1003: Stop RussianAnalyzer from removing numbers.
    (TUSUR OpenTeam, Dmitry Lihachev via Otis Gospodnetic)

 9. LUCENE-1152: SpellChecker fix around clearIndex and indexDictionary
    methods, plus removal of IndexReader reference.
    (Naveen Belkale via Otis Gospodnetic)

10. LUCENE-1046: Removed dead code in SpellChecker.
    (Daniel Naber via Otis Gospodnetic)

11. LUCENE-1189: Fixed the QueryParser to handle escaped characters within
    quoted terms correctly. (Tomer Gabel via Michael Busch)

12. LUCENE-1299: Fixed NPE in SpellChecker when IndexReader is not null and field is null.
    (Grant Ingersoll)

13. LUCENE-1303: Fixed BoostingTermQuery's explanation to be marked as a Match
    depending only upon the non-payload score part, regardless of the effect of
    the payload on the score. Prior to this, the score of a query containing a BTQ
    differed from its explanation. (Doron Cohen)

14. LUCENE-1310: Fixed SloppyPhraseScorer to work also for terms repeating more
    than twice in the query. (Doron Cohen)

15. LUCENE-1351: ISOLatin1AccentFilter now cleans additional ligatures. (Cedrik Lime via Grant Ingersoll)

16. LUCENE-1383: Work around a nasty "leak" in Java's built-in
    ThreadLocal, to prevent Lucene from causing unexpected
    OutOfMemoryError in certain situations (notably J2EE
    applications). (Chris Lu via Mike McCandless)

New features

 1. LUCENE-1137: Added Token.set/getFlags() accessors for passing more information about a Token through the analysis
    process. The flag is not indexed/stored and is thus only used by analysis.

 2. LUCENE-1147: Add -segment option to CheckIndex tool so you can
    check only a specific segment or segments in your index. (Mike
    McCandless)

 3. LUCENE-1045: Reopened this issue to add support for short and bytes.

 4. LUCENE-584: Added new data structures to o.a.l.util, such as
    OpenBitSet and SortedVIntList. These extend DocIdSet and can
    directly be used for Filters with the new Filter API. Also changed
    the core Filters to use OpenBitSet instead of java.util.BitSet.
    (Paul Elschot, Michael Busch)

 5. LUCENE-494: Added QueryAutoStopWordAnalyzer to allow for the automatic removal of frequently occurring terms from a query.
    This Analyzer is not intended for use during indexing. (Mark Harwood via Grant Ingersoll)

 6. LUCENE-1044: Change Lucene to properly "sync" files after
    committing, to ensure that on a machine or OS crash or power cut, even
    with cached writes, the index remains consistent. Also added
    explicit commit() method to IndexWriter to force a commit without
    having to close. (Mike McCandless)

 7. LUCENE-997: Add search timeout (partial) support.
    A TimeLimitedCollector was added to allow limiting search time.
16069 It is a partial solution since timeout is checked only when 16070 collecting a hit, and therefore a search for rare words in a 16071 huge index might not stop within the specified time. 16072 (Sean Timm via Doron Cohen) 16073 16074 8. LUCENE-1184: Allow SnapshotDeletionPolicy to be re-used across 16075 close/re-open of IndexWriter while still protecting an open 16076 snapshot (Tim Brennan via Mike McCandless) 16077 16078 9. LUCENE-1194: Added IndexWriter.deleteDocuments(Query) to delete 16079 documents matching the specified query. Also added static unlock 16080 and isLocked methods (deprecating the ones in IndexReader). (Mike 16081 McCandless) 16082 1608310. LUCENE-1201: Add IndexReader.getIndexCommit() method. (Tim Brennan 16084 via Mike McCandless) 16085 1608611. LUCENE-550: Added InstantiatedIndex implementation. Experimental 16087 Index store similar to MemoryIndex but allows for multiple documents 16088 in memory. (Karl Wettin via Grant Ingersoll) 16089 1609012. LUCENE-400: Added word based n-gram filter (in contrib/analyzers) called ShingleFilter and an Analyzer wrapper 16091 that wraps another Analyzer's token stream with a ShingleFilter (Sebastian Kirsch, Steve Rowe via Grant Ingersoll) 16092 1609313. LUCENE-1166: Decomposition tokenfilter for languages like German and Swedish (Thomas Peuss via Grant Ingersoll) 16094 1609514. LUCENE-1187: ChainedFilter and BooleanFilter now work with new Filter API 16096 and DocIdSetIterator-based filters. Backwards-compatibility with old 16097 BitSet-based filters is ensured. (Paul Elschot via Michael Busch) 16098 1609915. LUCENE-1295: Added new method to MoreLikeThis for retrieving interesting terms and made retrieveTerms(int) public. (Grant Ingersoll) 16100 1610116. LUCENE-1298: MoreLikeThis can now accept a custom Similarity (Grant Ingersoll) 16102 1610317. LUCENE-1297: Allow other string distance measures for the SpellChecker 16104 (Thomas Morton via Otis Gospodnetic) 16105 1610618. 
LUCENE-1001: Provide access to Payloads via Spans. All existing Span Query implementations in Lucene implement. (Mark Miller, Grant Ingersoll) 16107 1610819. LUCENE-1354: Provide programmatic access to CheckIndex (Grant Ingersoll, Mike McCandless) 16109 1611020. LUCENE-1279: Add support for Collators to RangeFilter/Query and Query Parser. (Steve Rowe via Grant Ingersoll) 16111 16112Optimizations 16113 16114 1. LUCENE-705: When building a compound file, use 16115 RandomAccessFile.setLength() to tell the OS/filesystem to 16116 pre-allocate space for the file. This may improve fragmentation 16117 in how the CFS file is stored, and allows us to detect an upcoming 16118 disk full situation before actually filling up the disk. (Mike 16119 McCandless) 16120 16121 2. LUCENE-1120: Speed up merging of term vectors by bulk-copying the 16122 raw bytes for each contiguous range of non-deleted documents. 16123 (Mike McCandless) 16124 16125 3. LUCENE-1185: Avoid checking if the TermBuffer 'scratch' in 16126 SegmentTermEnum is null for every call of scanTo(). 16127 (Christian Kohlschuetter via Michael Busch) 16128 16129 4. LUCENE-1217: Internal to Field.java, use isBinary instead of 16130 runtime type checking for possible speedup of binaryValue(). 16131 (Eks Dev via Mike McCandless) 16132 16133 5. LUCENE-1183: Optimized TRStringDistance class (in contrib/spell) that uses 16134 less memory than the previous version. (Cédrik LIME via Otis Gospodnetic) 16135 16136 6. LUCENE-1195: Improve term lookup performance by adding a LRU cache to the 16137 TermInfosReader. In performance experiments the speedup was about 25% on 16138 average on mid-size indexes with ~500,000 documents for queries with 3 16139 terms and about 7% on larger indexes with ~4.3M documents. (Michael Busch) 16140 16141Documentation 16142 16143 1. LUCENE-1236: Added some clarifying remarks to EdgeNGram*.java (Hiroaki Kawai via Grant Ingersoll) 16144 16145 2. 
LUCENE-1157 and LUCENE-1256: HTML changes log, created automatically 16146 from CHANGES.txt. This HTML file is currently visible only via developers page. 16147 (Steven Rowe via Doron Cohen) 16148 16149 3. LUCENE-1349: Fieldable can now be changed without breaking backward compatibility rules (within reason. See the note at 16150 the top of this file and also on Fieldable.java). (Grant Ingersoll) 16151 16152 4. LUCENE-1873: Update documentation to reflect current Contrib area status. 16153 (Steven Rowe, Mark Miller) 16154 16155Build 16156 16157 1. LUCENE-1153: Added JUnit JAR to new lib directory. Updated build to rely on local JUnit instead of ANT/lib. 16158 16159 2. LUCENE-1202: Small fixes to the way Clover is used to work better 16160 with contribs. Of particular note: a single clover db is used 16161 regardless of whether tests are run globally or in the specific 16162 contrib directories. 16163 16164 3. LUCENE-1353: Javacc target in contrib/miscellaneous for 16165 generating the precedence query parser. 16166 16167Test Cases 16168 16169 1. LUCENE-1238: Fixed intermittent failures of TestTimeLimitedCollector.testTimeoutMultiThreaded. 16170 Within this fix, "greedy" flag was added to TimeLimitedCollector, to allow the wrapped 16171 collector to collect also the last doc, after allowed-tTime passed. (Doron Cohen) 16172 16173 2. LUCENE-1348: relax TestTimeLimitedCollector to not fail due to 16174 timeout exceeded (just because test machine is very busy). 16175 16176======================= Release 2.3.2 ======================= 16177 16178Bug fixes 16179 16180 1. LUCENE-1191: On hitting OutOfMemoryError in any index-modifying 16181 methods in IndexWriter, do not commit any further changes to the 16182 index to prevent risk of possible corruption. (Mike McCandless) 16183 16184 2. LUCENE-1197: Fixed issue whereby IndexWriter would flush by RAM 16185 too early when TermVectors were in use. (Mike McCandless) 16186 16187 3. 
LUCENE-1198: Don't corrupt index if an exception happens inside
    DocumentsWriter.init (Mike McCandless)

 4. LUCENE-1199: Added defensive check for null indexReader before
    calling close in IndexModifier.close() (Mike McCandless)

 5. LUCENE-1200: Fix rare deadlock case in addIndexes* when
    ConcurrentMergeScheduler is in use (Mike McCandless)

 6. LUCENE-1208: Fix deadlock case on hitting an exception while
    processing a document that had triggered a flush (Mike McCandless)

 7. LUCENE-1210: Fix deadlock case on hitting an exception while
    starting a merge when using ConcurrentMergeScheduler (Mike McCandless)

 8. LUCENE-1222: Fix IndexWriter.doAfterFlush to always be called on
    flush (Mark Ferguson via Mike McCandless)

 9. LUCENE-1226: Fixed IndexWriter.addIndexes(IndexReader[]) to commit
    successfully created compound files. (Michael Busch)

10. LUCENE-1150: Re-expose StandardTokenizer's constants publicly;
    this was accidentally lost with LUCENE-966. (Nicolas Lalevée via
    Mike McCandless)

11. LUCENE-1262: Fixed bug in BufferedIndexReader.refill whereby on
    hitting an exception in readInternal, the buffer is incorrectly
    filled with stale bytes such that subsequent calls to readByte()
    return incorrect results. (Trejkaz via Mike McCandless)

12. LUCENE-1270: Fixed intermittent case where IndexWriter.close()
    would hang after IndexWriter.addIndexesNoOptimize had been
    called. (Stu Hood via Mike McCandless)

Build

 1. LUCENE-1230: Include *pom.xml* in source release files. (Michael Busch)


======================= Release 2.3.1 =======================

Bug fixes

 1. LUCENE-1168: Fixed corruption cases when autoCommit=false and
    documents have mixed term vectors (Suresh Guvvala via Mike
    McCandless).

 2.
LUCENE-1171: Fixed some cases where OOM errors could cause
    deadlock in IndexWriter (Mike McCandless).

 3. LUCENE-1173: Fixed corruption case when autoCommit=false and bulk
    merging of stored fields is used (Yonik via Mike McCandless).

 4. LUCENE-1163: Fixed bug in CharArraySet.contains(char[] buffer, int
    offset, int len) that was ignoring offset and thus giving the
    wrong answer. (Thomas Peuss via Mike McCandless)

 5. LUCENE-1177: Fix rare case where IndexWriter.optimize might do too
    many merges at the end. (Mike McCandless)

 6. LUCENE-1176: Fix corruption case when documents with no term
    vector fields are added before documents with term vector fields.
    (Mike McCandless)

 7. LUCENE-1179: Fixed assert statement that was incorrectly
    preventing Fields with empty-string field name from working.
    (Sergey Kabashnyuk via Mike McCandless)

======================= Release 2.3.0 =======================

Changes in runtime behavior

 1. LUCENE-994: Defaults for IndexWriter have been changed to maximize
    out-of-the-box indexing speed. First, IndexWriter now flushes by
    RAM usage (16 MB by default) instead of a fixed doc count (call
    IndexWriter.setMaxBufferedDocs to get backwards compatible
    behavior). Second, ConcurrentMergeScheduler is used to run merges
    using background threads (call IndexWriter.setMergeScheduler(new
    SerialMergeScheduler()) to get backwards compatible behavior).
    Third, merges are chosen based on size in bytes of each segment
    rather than document count of each segment (call
    IndexWriter.setMergePolicy(new LogDocMergePolicy()) to get
    backwards compatible behavior).

    NOTE: users of ParallelReader must change back all of these
    defaults in order to ensure the docIDs "align" across all parallel
    indices.

    (Mike McCandless)

 2.
LUCENE-1045: SortField.AUTO didn't work with long. When detecting
    the field type for sorting automatically, numbers used to be
    interpreted as int, then as float, if parsing the number as an int
    failed. Now the detection checks for int, then for long,
    then for float. (Daniel Naber)

API Changes

 1. LUCENE-843: Added IndexWriter.setRAMBufferSizeMB(...) to have
    IndexWriter flush whenever the buffered documents are using more
    than the specified amount of RAM. Also added new APIs to Token
    that allow one to set a char[] plus offset and length to specify a
    token (to avoid creating a new String() for each Token). (Mike
    McCandless)

 2. LUCENE-963: Add setters to Field to allow for re-using a single
    Field instance during indexing. This is a sizable performance
    gain, especially for small documents. (Mike McCandless)

 3. LUCENE-969: Add new APIs to Token, TokenStream and Analyzer to
    permit re-use of Token and TokenStream instances during
    indexing. Changed Token to use a char[] as the store for the
    termText instead of String. This gives faster tokenization
    performance (~10-15%). (Mike McCandless)

 4. LUCENE-847: Factored MergePolicy, which determines which merges
    should take place and when, as well as MergeScheduler, which
    determines when the selected merges should actually run, out of
    IndexWriter. The default merge policy is now
    LogByteSizeMergePolicy (see LUCENE-845) and the default merge
    scheduler is now ConcurrentMergeScheduler (see
    LUCENE-870). (Steven Parkes via Mike McCandless)

 5. LUCENE-1052: Add IndexReader.setTermInfosIndexDivisor(int) method
    that allows you to reduce memory usage of the termInfos by further
    sub-sampling (over the termIndexInterval that was used during
    indexing) which terms are loaded into memory.
    (Chuck Williams, Doug Cutting via Mike McCandless)

 6. LUCENE-743: Add IndexReader.reopen() method that re-opens an
    existing IndexReader (see New features -> 8.) (Michael Busch)

 7. LUCENE-1062: Add setData(byte[] data),
    setData(byte[] data, int offset, int length), getData(), getOffset()
    and clone() methods to o.a.l.index.Payload. Also add the field name
    as arg to Similarity.scorePayload(). (Michael Busch)

 8. LUCENE-982: Add IndexWriter.optimize(int maxNumSegments) method to
    "partially optimize" an index down to maxNumSegments segments.
    (Mike McCandless)

 9. LUCENE-1080: Changed Token.DEFAULT_TYPE to be public.

10. LUCENE-1064: Changed TopDocs constructor to be public.
    (Shai Erera via Michael Busch)

11. LUCENE-1079: DocValues cleanup: constructor now has no params,
    and getInnerArray() now throws UnsupportedOperationException (Doron Cohen)

12. LUCENE-1089: Added PriorityQueue.insertWithOverflow, which returns
    the Object (if any) that was bumped from the queue to allow
    re-use. (Shai Erera via Mike McCandless)

13. LUCENE-1101: Token reuse 'contract' (defined in LUCENE-969)
    modified so it is the token producer's responsibility
    to call Token.clear(). (Doron Cohen)

14. LUCENE-1118: Changed StandardAnalyzer to skip too-long (default >
    255 characters) tokens. You can increase this limit by calling
    StandardAnalyzer.setMaxTokenLength(...). (Michael McCandless)


Bug fixes

 1. LUCENE-933: QueryParser fixed to not produce empty sub
    BooleanQueries "()" even if the Analyzer produced no
    tokens for input. (Doron Cohen)

 2. LUCENE-955: Fixed SegmentTermPositions to work correctly with the
    first term in the dictionary. (Michael Busch)

 3.
LUCENE-951: Fixed NullPointerException in MultiLevelSkipListReader
    that was thrown after a call of TermPositions.seek().
    (Rich Johnson via Michael Busch)

 4. LUCENE-938: Fixed cases where an unhandled exception in
    IndexWriter's methods could cause deletes to be lost.
    (Steven Parkes via Mike McCandless)

 5. LUCENE-962: Fixed case where an unhandled exception in
    IndexWriter.addDocument or IndexWriter.updateDocument could cause
    unreferenced files in the index to not be deleted.
    (Steven Parkes via Mike McCandless)

 6. LUCENE-957: RAMDirectory fixed to properly handle directories
    larger than Integer.MAX_VALUE. (Doron Cohen)

 7. LUCENE-781: MultiReader fixed to not throw NPE if isCurrent(),
    isOptimized() or getVersion() is called. Separated MultiReader
    into two classes: MultiSegmentReader extends IndexReader, is
    package-protected and is created automatically by IndexReader.open()
    in case the index has multiple segments. The public MultiReader
    now extends MultiSegmentReader and is intended to be used by users
    who want to add their own subreaders. (Daniel Naber, Michael Busch)

 8. LUCENE-970: FilterIndexReader now implements isOptimized(). Before,
    a call of isOptimized() would throw an NPE. (Michael Busch)

 9. LUCENE-832: ParallelReader fixed to not throw NPE if isCurrent(),
    isOptimized() or getVersion() is called. (Michael Busch)

10. LUCENE-948: Fix FileNotFoundException caused by stale NFS client
    directory listing caches when writers on different machines are
    sharing an index over NFS and using a custom deletion policy (Mike
    McCandless)

11. LUCENE-978: Ensure TermInfosReader, FieldsReader, and FieldsReader
    close any streams they had opened if an exception is hit in the
    constructor. (Ning Li via Mike McCandless)

12.
LUCENE-985: If an extremely long term is in a doc (> 16383 chars),
    we now throw an IllegalArgumentException saying the term is too
    long, instead of a cryptic ArrayIndexOutOfBoundsException. (Karl
    Wettin via Mike McCandless)

13. LUCENE-991: The explain() method of BoostingTermQuery had errors
    when no payloads were present on a document. (Peter Keegan via
    Grant Ingersoll)

14. LUCENE-992: Fixed IndexWriter.updateDocument to be atomic again
    (this was broken by LUCENE-843). (Ning Li via Mike McCandless)

15. LUCENE-1008: Fixed corruption case when a document with no term
    vector fields is added after documents with term vector fields.
    This bug was introduced with LUCENE-843. (Grant Ingersoll via
    Mike McCandless)

16. LUCENE-1006: Fixed QueryParser to accept a "" field value (zero-length
    quoted string). (yonik)

17. LUCENE-1010: Fixed corruption case when a document with no term
    vector fields is added after documents with term vector fields.
    This case is hit during merge and would cause an EOFException.
    This bug was introduced with LUCENE-984. (Andi Vajda via Mike
    McCandless)

19. LUCENE-1009: Fix merge slowdown with LogByteSizeMergePolicy when
    autoCommit=false and documents are using stored fields and/or term
    vectors. (Mark Miller via Mike McCandless)

20. LUCENE-1011: Fixed corruption case when two or more machines,
    sharing an index over NFS, can be writers in quick succession.
    (Patrick Kimber via Mike McCandless)

21. LUCENE-1028: Fixed Weight serialization for a few queries:
    DisjunctionMaxQuery, ValueSourceQuery, CustomScoreQuery.
    Serialization check added for all queries.
    (Kyle Maxwell via Doron Cohen)

22. LUCENE-1048: Fixed incorrect behavior in Lock.obtain(...) when the
    timeout argument is very large (eg Long.MAX_VALUE).
    Also added the Lock.LOCK_OBTAIN_WAIT_FOREVER constant to never time out. (Nikolay
    Diakov via Mike McCandless)

23. LUCENE-1050: Throw LockReleaseFailedException in
    Simple/NativeFSLockFactory if we fail to delete the lock file when
    releasing the lock. (Nikolay Diakov via Mike McCandless)

24. LUCENE-1071: Fixed SegmentMerger to correctly set payload bit in
    the merged segment. (Michael Busch)

25. LUCENE-1042: Remove throwing of IOException in getTermFreqVector(int, String, TermVectorMapper) to be consistent
    with other getTermFreqVector calls. Also removed the throwing of the other IOException in that method to be consistent. (Karl Wettin via Grant Ingersoll)

26. LUCENE-1096: Fixed Hits behavior when hits' docs are deleted
    while iterating the hits. Deleting docs already retrieved
    now works seamlessly. If docs not yet retrieved are deleted
    (e.g. from another thread), and then, relying on the initial
    Hits.length(), an application attempts to retrieve more hits
    than actually exist, a ConcurrentModificationException
    is thrown. (Doron Cohen)

27. LUCENE-1068: Changed StandardTokenizer to fix an issue with it marking
    the type of some tokens incorrectly. This is done by adding a new flag named
    replaceInvalidAcronym which defaults to false, the current, incorrect behavior. Setting
    this flag to true fixes the problem. This flag is a temporary fix and is already
    marked as being deprecated. 3.x will implement the correct approach. (Shai Erera via Grant Ingersoll)
    LUCENE-1140: Fixed NPE caused by LUCENE-1068 (Alexei Dets via Grant Ingersoll)

28. LUCENE-749: ChainedFilter behavior fixed when logic of
    first filter is ANDNOT. (Antonio Bruno via Doron Cohen)

29. LUCENE-508: Make sure SegmentTermEnum.prev() is accurate (= last
    term) after next() returns false.
    (Steven Tamm via Mike McCandless)


New features

 1. LUCENE-906: Elision filter for French.
    (Mathieu Lecarme via Otis Gospodnetic)

 2. LUCENE-960: Added a SpanQueryFilter and related classes to allow for
    not only filtering, but knowing where in a Document a Filter matches
    (Grant Ingersoll)

 3. LUCENE-868: Added new Term Vector access features. A new callback
    mechanism allows applications to define how and where to read Term
    Vectors from disk. This implementation contains several extensions
    of the new abstract TermVectorMapper class. The new API should be
    back-compatible. No changes in the actual storage of Term Vectors
    have taken place.
    3.1 LUCENE-1038: Added setDocumentNumber() method to TermVectorMapper
        to provide information about what document is being accessed.
        (Karl Wettin via Grant Ingersoll)

 4. LUCENE-975: Added PositionBasedTermVectorMapper that allows for
    position based lookup of term vector information.
    See item #3 above (LUCENE-868).

 5. LUCENE-1011: Added simple tools (all in org.apache.lucene.store)
    to verify that locking is working properly. LockVerifyServer runs
    a separate server to verify locks. LockStressTest runs a simple
    tool that rapidly obtains and releases locks.
    VerifyingLockFactory is a LockFactory that wraps any other
    LockFactory and consults the LockVerifyServer whenever a lock is
    obtained or released, throwing an exception if an illegal lock
    obtain occurred. (Patrick Kimber via Mike McCandless)

 6. LUCENE-1015: Added FieldCache extension (ExtendedFieldCache) to
    support doubles and longs. Added support into SortField for sorting
    on doubles and longs as well. (Grant Ingersoll)

 7. LUCENE-1020: Created basic index checking & repair tool
    (o.a.l.index.CheckIndex).
    When run without -fix it does a
    detailed test of all segments in the index and reports summary
    information and any errors it hit. With -fix it will remove
    segments that had errors. (Mike McCandless)

 8. LUCENE-743: Add IndexReader.reopen() method that re-opens an
    existing IndexReader by only loading those portions of an index
    that have changed since the reader was (re)opened. reopen() can
    be significantly faster than open(), depending on the amount of
    index changes. SegmentReader, MultiSegmentReader, MultiReader,
    and ParallelReader implement reopen(). (Michael Busch)

 9. LUCENE-1040: Added CharArraySet, useful for efficiently checking
    set membership of text specified by char[]. (yonik)

10. LUCENE-1073: Created SnapshotDeletionPolicy to facilitate taking a
    live backup of an index without pausing indexing. (Mike
    McCandless)

11. LUCENE-1019: CustomScoreQuery enhanced to support multiple
    ValueSource queries. (Kyle Maxwell via Doron Cohen)

12. LUCENE-1095: Added an option to StopFilter to increase
    positionIncrement of the token succeeding a stopped token.
    Disabled by default. Similar option added to QueryParser
    to consider token positions when creating PhraseQuery
    and MultiPhraseQuery. Disabled by default (so by default
    the query parser ignores position increments).
    (Doron Cohen)

13. LUCENE-1380: Added TokenFilter for setting position increment in special cases related to the ShingleFilter (Mck SembWever, Steve Rowe, Karl Wettin via Grant Ingersoll)



Optimizations

 1. LUCENE-937: CachingTokenFilter now uses an iterator to access the
    Tokens that are cached in the LinkedList. This increases performance
    significantly, especially when the number of Tokens is large.
    (Mark Miller via Michael Busch)

 2.
LUCENE-843: Substantial optimizations to improve how IndexWriter
    uses RAM for buffering documents and to speed up indexing (2X-8X
    faster). A single shared hash table now records the in-memory
    postings per unique term and is directly flushed into a single
    segment. (Mike McCandless)

 3. LUCENE-892: Fixed extra "buffer to buffer copy" that sometimes
    takes place when using compound files. (Mike McCandless)

 4. LUCENE-959: Remove synchronization in Document (yonik)

 5. LUCENE-963: Add setters to Field to allow for re-using a single
    Field instance during indexing. This is a sizable performance
    gain, especially for small documents. (Mike McCandless)

 6. LUCENE-939: Check explicitly for boundary conditions in FieldInfos
    and don't rely on exceptions. (Michael Busch)

 7. LUCENE-966: Very substantial speedups (~6X faster) for
    StandardTokenizer (StandardAnalyzer) by using JFlex instead of
    JavaCC to generate the tokenizer.
    (Stanislaw Osinski via Mike McCandless)

 8. LUCENE-969: Changed core tokenizers & filters to re-use Token and
    TokenStream instances when possible to improve tokenization
    performance (~10-15%). (Mike McCandless)

 9. LUCENE-871: Speed up ISOLatin1AccentFilter (Ian Boston via Mike
    McCandless)

10. LUCENE-986: Refactored SegmentInfos from IndexReader into the new
    subclass DirectoryIndexReader. SegmentReader and MultiSegmentReader
    now extend DirectoryIndexReader and are the only IndexReader
    implementations that use SegmentInfos to access an index and
    acquire a write lock for index modifications. (Michael Busch)

11. LUCENE-1007: Allow flushing in IndexWriter to be triggered by
    either RAM usage or document count or both (whichever comes
    first), by adding symbolic constant DISABLE_AUTO_FLUSH to disable
    one of the flush triggers.
    (Ning Li via Mike McCandless)

12. LUCENE-1043: Speed up merging of stored fields by bulk-copying the
    raw bytes for each contiguous range of non-deleted documents.
    (Robert Engels via Mike McCandless)

13. LUCENE-693: Speed up nested conjunctions (~2x) that match many
    documents, and a slight performance increase for top level
    conjunctions. (yonik)

14. LUCENE-1098: Make inner class StandardAnalyzer.SavedStreams static
    and final. (Nathan Beyer via Michael Busch)

Documentation

 1. LUCENE-1051: Generate separate javadocs for core, demo and contrib
    classes, as well as a unified view. Also add an appropriate menu
    structure to the website. (Michael Busch)

 2. LUCENE-746: Fix error message in AnalyzingQueryParser.getPrefixQuery.
    (Ronnie Kolehmainen via Michael Busch)

Build

 1. LUCENE-908: Improvements and simplifications for how the MANIFEST
    file and the META-INF dir are created. (Michael Busch)

 2. LUCENE-935: Various improvements for the maven artifacts. Now the
    artifacts also include the sources as .jar files. (Michael Busch)

 3. Added apply-patch target to top-level build. Defaults to looking for
    a patch in ${basedir}/../patches with name specified by -Dpatch.name.
    Can also specify any location by -Dpatch.file property on the command
    line. This should be helpful for easy application of patches, but it
    is also a step towards integrating automatic patch application with
    JIRA and Hudson, and is thus subject to change. (Grant Ingersoll)

 4. LUCENE-935: Defined property "m2.repository.url" to allow setting
    the url to a maven remote repository to deploy to. (Michael Busch)

 5. LUCENE-1051: Include javadocs in the maven artifacts. (Michael Busch)

 6. LUCENE-1055: Remove gdata-server from build files and its sources
    from trunk.
    (Michael Busch)

 7. LUCENE-935: Allow deploying maven artifacts to a remote m2 repository
    via scp and ssh authentication. (Michael Busch)

 8. LUCENE-1123: Allow overriding the specification version for
    MANIFEST.MF (Michael Busch)

Test Cases

 1. LUCENE-766: Test adding two fields with the same name but different
    term vector settings. (Nicolas Lalevée via Doron Cohen)

======================= Release 2.2.0 =======================

Changes in runtime behavior

API Changes

 1. LUCENE-793: created new exceptions and added them to throws clause
    for many methods (all subclasses of IOException for backwards
    compatibility): index.StaleReaderException,
    index.CorruptIndexException, store.LockObtainFailedException.
    This was done to better call out the possible root causes of an
    IOException from these methods. (Mike McCandless)

 2. LUCENE-811: make SegmentInfos class, plus a few methods from related
    classes, package-private again (they were unnecessarily made public
    as part of LUCENE-701). (Mike McCandless)

 3. LUCENE-710: added optional autoCommit boolean to IndexWriter
    constructors. When this is false, index changes are not committed
    until the writer is closed. This gives explicit control over when
    a reader will see the changes. Also added optional custom
    deletion policy to explicitly control when prior commits are
    removed from the index. This is intended to allow applications to
    share an index over NFS by customizing when prior commits are
    deleted. (Mike McCandless)

 4. LUCENE-818: changed most public methods of IndexWriter,
    IndexReader (and its subclasses), FieldsReader and RAMDirectory to
    throw AlreadyClosedException if they are accessed after being
    closed. (Mike McCandless)

 5.
LUCENE-834: Changed some access levels for certain Span classes to allow them
    to be overridden. They have been marked expert only and not for public
    consumption. (Grant Ingersoll)

 6. LUCENE-796: Removed calls to super.* from various get*Query methods in
    MultiFieldQueryParser, in order to allow sub-classes to override them.
    (Steven Parkes via Otis Gospodnetic)

 7. LUCENE-857: Removed caching from QueryFilter and deprecated QueryFilter
    in favour of QueryWrapperFilter or QueryWrapperFilter + CachingWrapperFilter
    combination when caching is desired.
    (Chris Hostetter, Otis Gospodnetic)

 8. LUCENE-869: Changed FSIndexInput and FSIndexOutput to inner classes of FSDirectory
    to enable extensibility of these classes. (Michael Busch)

 9. LUCENE-580: Added the public method reset() to TokenStream. This method does
    nothing by default, but may be overridden by subclasses to support consuming
    the TokenStream more than once. (Michael Busch)

10. LUCENE-580: Added a new constructor to Field that takes a TokenStream as
    argument, available as tokenStreamValue(). This is useful to avoid the need for
    "dummy analyzers" for pre-analyzed fields. (Karl Wettin, Michael Busch)

11. LUCENE-730: Added the new methods setAllowDocsOutOfOrder() and
    getAllowDocsOutOfOrder() to BooleanQuery. Deprecated the methods setUseScorer14() and
    getUseScorer14(). The optimization patch LUCENE-730 (see Optimizations->3.)
    improves performance for certain queries but results in scoring out of docid
    order. This patch reverses this change, so now by default hit docs are scored
    in docid order unless setAllowDocsOutOfOrder(true) is explicitly called.
    This patch also re-enables the tests in QueryUtils that check for docid
    order. (Paul Elschot, Doron Cohen, Michael Busch)

12.
LUCENE-888: Added Directory.openInput(File path, int bufferSize)
    to optionally specify the size of the read buffer. Also added
    BufferedIndexInput.setBufferSize(int) to change the buffer size.
    (Mike McCandless)

13. LUCENE-923: Make SegmentTermPositionVector package-private. It does not need
    to be public because it implements the public interface TermPositionVector.
    (Michael Busch)

Bug fixes

 1. LUCENE-804: Fixed build.xml to pack a fully compilable src dist. (Doron Cohen)

 2. LUCENE-813: Leading wildcard fixed to work with trailing wildcard.
    Query parser modified to create a prefix query only for the case
    that there is a single trailing wildcard (and no additional wildcard
    or '?' in the query text). (Doron Cohen)

 3. LUCENE-812: Add no-argument constructors to NativeFSLockFactory
    and SimpleFSLockFactory. This enables all 4 built-in LockFactory
    implementations to be specified via the System property
    org.apache.lucene.store.FSDirectoryLockFactoryClass. (Mike McCandless)

 4. LUCENE-821: The new single-norm-file introduced by LUCENE-756
    failed to reduce the number of open descriptors since it was still
    opened once per field with norms. (yonik)

 5. LUCENE-823: Make sure internal file handles are closed when
    hitting an exception (eg disk full) while flushing deletes in
    IndexWriter's mergeSegments, and also during
    IndexWriter.addIndexes. (Mike McCandless)

 6. LUCENE-825: If the directory is removed after
    FSDirectory.getDirectory() but before IndexReader.open you now get
    a FileNotFoundException like Lucene pre-2.1 (before this fix you
    got an NPE). (Mike McCandless)

 7. LUCENE-800: Removed backslash from the TERM_CHAR list in the queryparser,
    because the backslash is the escape character.
    Also changed the ESCAPED_CHAR
    list to contain all possible characters, because every character that
    follows a backslash should be considered as escaped. (Michael Busch)

 8. LUCENE-372: QueryParser.parse() now ensures that the entire input string
    is consumed. Now a ParseException is thrown if a query contains too many
    closing parentheses. (Andreas Neumann via Michael Busch)

 9. LUCENE-814: javacc build targets now fix line-end-style of generated files.
    They now also delete all javacc-generated files before calling javacc.
    (Steven Parkes, Doron Cohen)

10. LUCENE-829: close readers in contrib/benchmark. (Karl Wettin, Doron Cohen)

11. LUCENE-828: Minor fix for Term's equals().
    (Paul Cowan via Otis Gospodnetic)

12. LUCENE-846: Fixed: if IndexWriter is opened with autoCommit=false,
    and you call addIndexes, and hit an exception (eg disk full) then
    when IndexWriter rolls back its internal state this could corrupt
    the instance of IndexWriter (but not the index itself) by
    referencing already deleted segments. This bug was only present
    in 2.2 (trunk), ie was never released. (Mike McCandless)

13. LUCENE-736: Sloppy phrase query with repeating terms matches wrong docs.
    For example query "B C B"~2 matches the doc "A B C D E". (Doron Cohen)

14. LUCENE-789: Fixed: custom similarity is ignored when using MultiSearcher (problem reported
    by Alexey Lef). Now the similarity applied by MultiSearcher.setSimilarity(sim) is used.
    Note that, as before this fix, creating a MultiSearcher from Searchers for which custom similarity
    was set has no effect; it is masked by the similarity of the MultiSearcher. This is as
    designed, because MultiSearcher operates on Searchables (not Searchers). (Doron Cohen)

15. LUCENE-880: Fixed DocumentWriter to close the TokenStreams after it
    has written the postings.
    Then the resources associated with the
    TokenStreams can safely be released. (Michael Busch)

16. LUCENE-883: consecutive calls to Spellchecker.indexDictionary()
    won't insert terms twice anymore. (Daniel Naber)

17. LUCENE-881: QueryParser.escape() now also escapes the characters
    '|' and '&' which are part of the queryparser syntax. (Michael Busch)

18. LUCENE-886: Spellchecker clean up: exceptions are no longer printed to STDERR
    and ignored, but re-thrown. Some javadoc improvements.
    (Daniel Naber)

19. LUCENE-698: FilteredQuery now takes the query boost into account for
    scoring. (Michael Busch)

20. LUCENE-763: Spellchecker: LuceneDictionary used to skip the first word in the
    enumeration. (Christian Mallwitz via Daniel Naber)

21. LUCENE-903: FilteredQuery explanation inaccuracy with boost.
    Explanation tests now "deep" check the explanation details.
    (Chris Hostetter, Doron Cohen)

22. LUCENE-912: DisjunctionMaxScorer first skipTo(target) call ignores the
    skip target param and ends up at the first match.
    (Sudaakeran B. via Chris Hostetter & Doron Cohen)

23. LUCENE-913: Two consecutive score() calls return different
    scores for Boolean Queries. (Michael Busch, Doron Cohen)

24. LUCENE-1013: Fix IndexWriter.setMaxMergeDocs to work "out of the
    box", again, by moving set/getMaxMergeDocs up from
    LogDocMergePolicy into LogMergePolicy. This fixes the API
    breakage (non backwards compatible change) caused by LUCENE-994.
    (Yonik Seeley via Mike McCandless)

New features

 1. LUCENE-759: Added two n-gram-producing TokenFilters.
    (Otis Gospodnetic)

 2. LUCENE-822: Added FieldSelector capabilities to Searchable for use with
    RemoteSearcher, and other Searchable implementations. (Mark Miller, Grant Ingersoll)

 3.
LUCENE-755: Added the ability to store arbitrary binary metadata in the posting list.
    These metadata are called Payloads. For every position of a Token one Payload in the form
    of a variable-length byte array can be stored in the prox file.
    Remark: The APIs introduced with this feature are in an experimental state and thus
    contain appropriate warnings in the javadocs.
    (Michael Busch)

 4. LUCENE-834: Added BoostingTermQuery which can boost scores based on the
    values of a payload (see #3 above.) (Grant Ingersoll)

 5. LUCENE-834: Similarity has a new method for scoring payloads called
    scorePayload that can be overridden to take advantage of payload
    storage (see #3 above)

 6. LUCENE-834: Added isPayloadAvailable() onto TermPositions interface and
    implemented it in the appropriate places (Grant Ingersoll)

 7. LUCENE-853: Added RemoteCachingWrapperFilter to enable caching of Filters
    on the remote side of the RMI connection.
    (Matt Ericson via Otis Gospodnetic)

 8. LUCENE-446: Added Solr's search.function for scores based on field
    values, plus CustomScoreQuery for simple score (post) customization.
    (Yonik Seeley, Doron Cohen)

 9. LUCENE-1058: Added new TeeTokenFilter (like the UNIX 'tee' command) and SinkTokenizer which can be used to share tokens between two or more
    Fields such that the other Fields do not have to go through the whole Analysis process over again. For instance, if you have two
    Fields that share all the same analysis steps except one lowercases tokens and the other does not, you can coordinate the operations
    between the two using the TeeTokenFilter and the SinkTokenizer. See TeeSinkTokenTest.java for examples.
    (Grant Ingersoll, Michael Busch, Yonik Seeley)

Optimizations

 1.
LUCENE-761: The proxStream is now cloned lazily in SegmentTermPositions
    when nextPosition() is called for the first time. This allows using instances
    of SegmentTermPositions instead of SegmentTermDocs without additional costs.
    (Michael Busch)

 2. LUCENE-431: RAMInputStream and RAMOutputStream extend IndexInput and
    IndexOutput directly now. This avoids further buffering and thus avoids
    unnecessary array copies. (Michael Busch)

 3. LUCENE-730: Updated BooleanScorer2 to make use of BooleanScorer in some
    cases and possibly improve scoring performance. Documents can now be
    delivered out-of-order as they are scored (e.g. to HitCollector).
    N.B. A bit of code had to be disabled in QueryUtils in order for
    TestBoolean2 test to keep passing.
    (Paul Elschot via Otis Gospodnetic)

 4. LUCENE-882: Spellchecker doesn't store the ngrams anymore but only indexes
    them to keep the spell index small. (Daniel Naber)

 5. LUCENE-430: Delay allocation of the buffer after a clone of BufferedIndexInput.
    Together with LUCENE-888 this will allow the buffer size to be adjusted
    dynamically. (Paul Elschot, Michael Busch)

 6. LUCENE-888: Increase buffer sizes inside CompoundFileWriter and
    BufferedIndexOutput. Also increase buffer size in
    BufferedIndexInput, but only when used during merging. Together,
    these increases yield 10-18% overall performance gain vs the
    previous 1K defaults. (Mike McCandless)

 7. LUCENE-866: Adds multi-level skip lists to the posting lists. This speeds
    up most queries that use skipTo(), especially on big indexes with large posting
    lists. For average AND queries the speedup is about 20%; for queries that
    contain very frequent and very unique terms the speedup can be over 80%.
    (Michael Busch)

Documentation

 1.
LUCENE-791 && INFRA-1173: Infrastructure moved the Wiki to
    http://wiki.apache.org/lucene-java/; updated the links in the docs and
    wherever else I found references. (Grant Ingersoll, Joe Schaefer)

 2. LUCENE-807: Fixed the javadoc for ScoreDocComparator.compare() to be
    consistent with java.util.Comparator.compare(): Any integer is allowed to
    be returned instead of only -1/0/1.
    (Paul Cowan via Michael Busch)

 3. LUCENE-875: Solved javadoc warnings & errors under jdk1.4.
    Solved javadoc errors under jdk5 (jars in path for gdata).
    Made "javadocs" target depend on "build-contrib" so that contrib jars
    configured for dynamic download are downloaded first. (Note: when running
    behind a firewall, a firewall prompt might pop up.) (Doron Cohen)

 4. LUCENE-740: Added SNOWBALL-LICENSE.txt to the snowball package and a
    remark about the license to NOTICE.TXT. (Steven Parkes via Michael Busch)

 5. LUCENE-925: Added analysis package javadocs. (Grant Ingersoll and Doron Cohen)

 6. LUCENE-926: Added document package javadocs. (Grant Ingersoll)

Build

 1. LUCENE-802: Added LICENSE.TXT and NOTICE.TXT to Lucene jars.
    (Steven Parkes via Michael Busch)

 2. LUCENE-885: "ant test" now includes all contrib tests. The new
    "ant test-core" target can be used to run only the Core (non
    contrib) tests.
    (Chris Hostetter)

 3. LUCENE-900: "ant test" now enables Java assertions (in Lucene packages).
    (Doron Cohen)

 4. LUCENE-894: Add custom build file for binary distributions that includes
    targets to build the demos. (Chris Hostetter, Michael Busch)

 5. LUCENE-904: The "package" targets in build.xml now also generate .md5
    checksum files. (Chris Hostetter, Michael Busch)

 6.
LUCENE-907: Include LICENSE.TXT and NOTICE.TXT in the META-INF dirs of
    the demo war, the demo jar, and the contrib jars. (Michael Busch)

 7. LUCENE-909: Demo targets for running the demo. (Doron Cohen)

 8. LUCENE-908: Improves content of MANIFEST file and makes it customizable
    for the contribs. Adds SNOWBALL-LICENSE.txt to META-INF of the snowball
    jar and makes sure that the lucli jar contains LICENSE.txt and NOTICE.txt.
    (Chris Hostetter, Michael Busch)

 9. LUCENE-930: Various contrib building improvements to ensure contrib
    dependencies are met, and test compilation errors fail the build.
    (Steven Parkes, Chris Hostetter)

10. LUCENE-622: Add ant target and pom.xml files for building maven artifacts
    of the Lucene core and the contrib modules.
    (Sami Siren, Karl Wettin, Michael Busch)

======================= Release 2.1.0 =======================

Changes in runtime behavior

 1. 's' and 't' have been removed from the list of default stopwords
    in StopAnalyzer (also used by StandardAnalyzer). Having e.g. 's'
    as a stopword meant that 's-class' led to the same results as 'class'.
    Note that this problem still exists for 'a', e.g. in 'a-class' as
    'a' continues to be a stopword.
    (Daniel Naber)

 2. LUCENE-478: Updated the list of Unicode code point ranges for CJK
    (now split into CJ and K) in StandardAnalyzer. (John Wang and
    Steven Rowe via Otis Gospodnetic)

 3. Modified some CJK Unicode code point ranges in StandardTokenizer.jj,
    and added a few more of them to increase CJK character coverage.
    Also documented some of the ranges.
    (Otis Gospodnetic)

 4. LUCENE-489: Add support for leading wildcard characters (*, ?) to
    QueryParser. Default is to disallow them, as before.
    (Steven Parkes via Otis Gospodnetic)

 5.
LUCENE-703: QueryParser now defaults to using ConstantScoreRangeQuery
    for range queries. Added useOldRangeQuery property to QueryParser to allow
    selection of old RangeQuery class if required.
    (Mark Harwood)

 6. LUCENE-543: WildcardQuery now performs a TermQuery if the provided term
    does not contain a wildcard character (? or *), whereas previously a
    StringIndexOutOfBoundsException was thrown.
    (Michael Busch via Erik Hatcher)

 7. LUCENE-726: Removed the use of deprecated doc.fields() method and
    Enumeration.
    (Michael Busch via Otis Gospodnetic)

 8. LUCENE-436: Removed finalize() in TermInfosReader and SegmentReader,
    and added a call to enumerators.remove() in TermInfosReader.close().
    The finalize() overrides were added to help with a pre-1.4.2 JVM bug
    that has since been fixed, plus we no longer support pre-1.4.2 JVMs.
    (Otis Gospodnetic)

 9. LUCENE-771: The default location of the write lock is now the
    index directory, and is named simply "write.lock" (without a big
    digest prefix). Neither the system property "org.apache.lucene.lockDir"
    nor "java.io.tmpdir" is used any longer as the global directory
    for storing lock files, and the LOCK_DIR field of FSDirectory is
    now deprecated. (Mike McCandless)

New features

 1. LUCENE-503: New ThaiAnalyzer and ThaiWordFilter in contrib/analyzers
    (Samphan Raruenrom via Chris Hostetter)

 2. LUCENE-545: New FieldSelector API and associated changes to
    IndexReader and implementations. New Fieldable interface for use
    with the lazy field loading mechanism. (Grant Ingersoll and Chuck
    Williams via Grant Ingersoll)

 3. LUCENE-676: Move Solr's PrefixFilter to Lucene core. (Yura
    Smolsky, Yonik Seeley)

 4. LUCENE-678: Added NativeFSLockFactory, which implements locking
    using OS native locking (via java.nio.*).
(Michael McCandless via Yonik Seeley)

 5. LUCENE-544: Added the ability to specify different boosts for
    different fields when using MultiFieldQueryParser (Matt Ericson
    via Otis Gospodnetic)

 6. LUCENE-528: New IndexWriter.addIndexesNoOptimize() that doesn't
    optimize the index when adding new segments, only performing
    merges as needed. (Ning Li via Yonik Seeley)

 7. LUCENE-573: QueryParser now allows backslash escaping in
    quoted terms and phrases. (Michael Busch via Yonik Seeley)

 8. LUCENE-716: QueryParser now allows specification of Unicode
    characters in terms via a unicode escape of the form \uXXXX
    (Michael Busch via Yonik Seeley)

 9. LUCENE-709: Added RAMDirectory.sizeInBytes(), IndexWriter.ramSizeInBytes()
    and IndexWriter.flushRamSegments(), allowing applications to
    control the amount of memory used to buffer documents.
    (Chuck Williams via Yonik Seeley)

10. LUCENE-723: QueryParser now parses *:* as MatchAllDocsQuery
    (Yonik Seeley)

11. LUCENE-741: Command-line utility for modifying or removing norms
    on fields in an existing index. This is mostly based on LUCENE-496
    and lives in contrib/miscellaneous.
    (Chris Hostetter, Otis Gospodnetic)

12. LUCENE-759: Added NGramTokenizer and EdgeNGramTokenizer classes and
    their passing unit tests.
    (Otis Gospodnetic)

13. LUCENE-565: Added methods to IndexWriter to more efficiently
    handle updating documents (the "delete then add" use case). This
    is intended to be an eventual replacement for the existing
    IndexModifier. Added IndexWriter.flush() (renamed from
    flushRamSegments()) to flush all pending updates (held in RAM) to
    the Directory. (Ning Li via Mike McCandless)

14.
LUCENE-762: Added SIZE and SIZE_AND_BREAK FieldSelectorResult options
    which allow one to retrieve the size of a field without retrieving the
    actual field. (Chuck Williams via Grant Ingersoll)

15. LUCENE-799: Properly handle lazy, compressed fields.
    (Mike Klaas via Grant Ingersoll)

API Changes

 1. LUCENE-438: Remove "final" from Token, implement Cloneable, allow
    changing of termText via setTermText(). (Yonik Seeley)

 2. org.apache.lucene.analysis.nl.WordlistLoader has been deprecated
    and should be replaced with the WordlistLoader class in
    package org.apache.lucene.analysis (Daniel Naber)

 3. LUCENE-609: Revert return type of Document.getField(s) to Field
    for backward compatibility, added new Document.getFieldable(s)
    for access to new lazy loaded fields. (Yonik Seeley)

 4. LUCENE-608: Document.fields() has been deprecated and a new method
    Document.getFields() has been added that returns a List instead of
    an Enumeration (Daniel Naber)

 5. LUCENE-605: New Explanation.isMatch() method and new ComplexExplanation
    subclass allow explain methods to produce Explanations which model
    "matching" independent of having a positive value.
    (Chris Hostetter)

 6. LUCENE-621: New static methods IndexWriter.setDefaultWriteLockTimeout
    and IndexWriter.setDefaultCommitLockTimeout for overriding default
    timeout values for all future instances of IndexWriter (as well
    as for any other classes that may reference the static values,
    ie: IndexReader).
    (Michael McCandless via Chris Hostetter)

 7. LUCENE-638: FSDirectory.list() now only returns the directory's
    Lucene-related files. Thanks to this change one can now construct
    a RAMDirectory from a file system directory that contains files
    not related to Lucene.
    (Simon Willnauer via Daniel Naber)

 8.
LUCENE-635: Decoupled locking implementation from Directory
    implementation. Added set/getLockFactory to Directory and moved
    all locking code into subclasses of abstract class LockFactory.
    FSDirectory and RAMDirectory still default to their prior locking
    implementations, but now you can mix & match, for example using
    SingleInstanceLockFactory (ie, in-memory locking) with an
    FSDirectory. Note that now you must call setDisableLocks before
    instantiating an FSDirectory if you wish to disable locking
    for that Directory.
    (Michael McCandless, Jeff Patterson via Yonik Seeley)

 9. LUCENE-657: Made FuzzyQuery non-final and inner ScoreTerm protected.
    (Steven Parkes via Otis Gospodnetic)

10. LUCENE-701: Lockless commits: a commit lock is no longer required
    when a writer commits and a reader opens the index. This includes
    a change to the index file format (see docs/fileformats.html for
    details). It also removes all APIs associated with the commit
    lock & its timeout. Readers are now truly read-only and do not
    block one another on startup. This is the first step to getting
    Lucene to work correctly over NFS (second step is
    LUCENE-710). (Mike McCandless)

11. LUCENE-722: DEFAULT_MIN_DOC_FREQ was misspelled DEFALT_MIN_DOC_FREQ
    in Similarity's MoreLikeThis class. The misspelling has been
    replaced by the correct spelling.
    (Andi Vajda via Daniel Naber)

12. LUCENE-738: Reduce the size of the file that keeps track of which
    documents are deleted when the number of deleted documents is
    small. This changes the index file format, which cannot be
    read by previous versions of Lucene. (Doron Cohen via Yonik Seeley)

13. LUCENE-756: Maintain all norms in a single .nrm file to reduce the
    number of open files and file descriptors for the non-compound index
    format.
This changes the index file format, but maintains the
    ability to read and update older indices. The first segment merge
    on an older format index will create a single .nrm file for the new
    segment. (Doron Cohen via Yonik Seeley)

14. LUCENE-732: DateTools support has been added to QueryParser, with
    setters for both the default Resolution and per-field Resolution.
    For backwards compatibility, DateField is still used if no Resolutions
    are specified. (Michael Busch via Chris Hostetter)

15. Added isOptimized() method to IndexReader.
    (Otis Gospodnetic)

16. LUCENE-773: Deprecate the FSDirectory.getDirectory(*) methods that
    take a boolean "create" argument. Instead you should use
    IndexWriter's "create" argument to create a new index.
    (Mike McCandless)

17. LUCENE-780: Add a static Directory.copy() method to copy files
    from one Directory to another. (Jiri Kuhn via Mike McCandless)

18. LUCENE-773: Added Directory.clearLock(String name) to forcefully
    remove an old lock. The default implementation is to ask the
    lockFactory (if non-null) to clear the lock. (Mike McCandless)

19. LUCENE-795: Directory.renameFile() has been deprecated as it is
    not used anymore inside Lucene. (Daniel Naber)

Bug fixes

 1. Fixed the web application demo (built with "ant war-demo") which
    didn't work because it used a QueryParser method that had
    been removed (Daniel Naber)

 2. LUCENE-583: ISOLatin1AccentFilter fails to preserve positionIncrement
    (Yonik Seeley)

 3. LUCENE-575: SpellChecker min score is incorrectly changed by suggestSimilar
    (Karl Wettin via Yonik Seeley)

 4. LUCENE-587: Explanation.toHtml was producing malformed HTML
    (Chris Hostetter)

 5. Fix to allow MatchAllDocsQuery to be used with RemoteSearcher (Yonik Seeley)

 6.
LUCENE-601: RAMDirectory and RAMFile made Serializable
    (Karl Wettin via Otis Gospodnetic)

 7. LUCENE-557: Fixes to BooleanQuery and FilteredQuery so that the score
    Explanations match up with the real scores.
    (Chris Hostetter)

 8. LUCENE-607: ParallelReader's TermEnum fails to advance properly to
    new fields (Chuck Williams, Christian Kohlschuetter via Yonik Seeley)

 9. LUCENE-610, LUCENE-611: Simple syntax changes to allow compilation with ecj:
    disambiguate inner class scorer's use of doc() in BooleanScorer2,
    other test code changes. (DM Smith via Yonik Seeley)

10. LUCENE-451: All core query types now use ComplexExplanations so that
    boosts of zero don't confuse the BooleanWeight explain method.
    (Chris Hostetter)

11. LUCENE-593: Fixed LuceneDictionary's inner Iterator
    (Kåre Fiedler Christiansen via Otis Gospodnetic)

12. LUCENE-641: fixed an off-by-one bug with IndexWriter.setMaxFieldLength()
    (Daniel Naber)

13. LUCENE-659: Make PerFieldAnalyzerWrapper delegate getPositionIncrementGap()
    to the correct analyzer for the field. (Chuck Williams via Yonik Seeley)

14. LUCENE-650: Fixed NPE in Locale specific String Sort when Document
    has no value.
    (Oliver Hutchison via Chris Hostetter)

15. LUCENE-683: Fixed data corruption when reading lazy loaded fields.
    (Yonik Seeley)

16. LUCENE-678: Fixed bug in NativeFSLockFactory which caused the same
    lock to be shared between different directories.
    (Michael McCandless via Yonik Seeley)

17. LUCENE-690: Fixed thread unsafe use of IndexInput by lazy loaded fields.
    (Yonik Seeley)

18. LUCENE-696: Fix bug when scorer for DisjunctionMaxQuery has skipTo()
    called on it before next(). (Yonik Seeley)

19.
LUCENE-569: Fixed SpanNearQuery bug: for 'inOrder' queries it would fail
    to recognize ordered spans if they overlapped with unordered spans.
    (Paul Elschot via Chris Hostetter)

20. LUCENE-706: Updated fileformats.xml|html concerning the docdelta value
    in the frequency file. (Johan Stuyts, Doron Cohen via Grant Ingersoll)

21. LUCENE-715: Fixed private constructor in IndexWriter.java to
    properly release the acquired write lock if there is an
    IOException after acquiring the write lock but before finishing
    instantiation. (Matthew Bogosian via Mike McCandless)

22. LUCENE-651: Multiple different threads requesting the same
    FieldCache entry (often for Sorting by a field) at the same
    time caused multiple generations of that entry, which was
    detrimental to performance and memory use.
    (Oliver Hutchison via Otis Gospodnetic)

23. LUCENE-717: Fixed build.xml so that it does not fail when there is no lib dir.
    (Doron Cohen via Otis Gospodnetic)

24. LUCENE-728: Removed duplicate/old MoreLikeThis and SimilarityQueries
    classes from contrib/similarity, as their new home is under
    contrib/queries.
    (Otis Gospodnetic)

25. LUCENE-669: Do not double-close the RandomAccessFile in
    FSIndexInput/Output during finalize(). Besides sending an
    IOException up to the GC, this may also be the cause of intermittent
    "The handle is invalid" IOExceptions on Windows when trying to
    close readers or writers. (Michael Busch via Mike McCandless)

26. LUCENE-702: Fix IndexWriter.addIndexes(*) to not corrupt the index
    on any exceptions (eg disk full). The semantics of these methods
    is now transactional: either all indices are merged or none are.
    Also fixed IndexWriter.mergeSegments (called outside of
    addIndexes(*) by addDocument, optimize, flushRamSegments) and
    IndexReader.commit() (called by close) to clean up and keep the
    instance state consistent with what's actually in the index (Mike
    McCandless).

27. LUCENE-129: Change finalizers to do "try {...} finally
    {super.finalize();}" to make sure we don't miss finalizers in
    superclasses. (Esmond Pitt via Mike McCandless)

28. LUCENE-754: Fix a problem introduced by LUCENE-651, causing
    IndexReaders to hang around forever, in addition to not
    fixing the original FieldCache performance problem.
    (Chris Hostetter, Yonik Seeley)

29. LUCENE-140: Fix IndexReader.deleteDocument(int docNum) to
    correctly raise ArrayIndexOutOfBoundsException when docNum is too
    large. Previously, if docNum was only slightly too large (within
    the same multiple of 8, ie, up to 7 ints beyond maxDoc), no
    exception would be raised and instead the index would become
    silently corrupted. The corruption then only appears much later,
    in mergeSegments, when the corrupted segment is merged with
    segment(s) after it. (Mike McCandless)

30. LUCENE-768: Fix case where an Exception during deleteDocument,
    undeleteAll or setNorm in IndexReader could leave the reader in a
    state where close() fails to release the write lock.
    (Mike McCandless)

31. Remove "tvp" from known index file extensions because it is
    never used. (Nicolas Lalevée via Bernhard Messer)

32. LUCENE-767: Change how SegmentReader.maxDoc() is computed so that it
    does not rely on a file length check and instead uses the SegmentInfo's
    docCount that's already stored explicitly in the index. This is a
    defensive bug fix (ie, there is no known problem seen "in real
    life" due to this, just a possible future problem).
(Chuck Williams via Mike McCandless)

Optimizations

 1. LUCENE-586: TermDocs.skipTo() is now more efficient for
    multi-segment indexes. This will improve the performance of many
    types of queries against a non-optimized index. (Andrew Hudson
    via Yonik Seeley)

 2. LUCENE-623: RAMDirectory.close now nulls out its reference to all
    internal "files", allowing them to be GCed even if references to the
    RAMDirectory itself still exist. (Nadav Har'El via Chris Hostetter)

 3. LUCENE-629: Compressed fields are no longer uncompressed and
    recompressed during segment merges (e.g. during indexing or
    optimizing), thus improving performance. (Michael Busch via Otis
    Gospodnetic)

 4. LUCENE-388: Improve indexing performance when maxBufferedDocs is
    large by keeping a count of buffered documents rather than
    counting after each document addition. (Doron Cohen, Paul Smith,
    Yonik Seeley)

 5. Modified TermScorer.explain to use TermDocs.skipTo() instead of
    looping through docs. (Grant Ingersoll)

 6. LUCENE-672: New indexing segment merge policy flushes all
    buffered docs to their own segment and delays a merge until
    mergeFactor segments of a certain level have been accumulated.
    This increases indexing performance in the presence of deleted
    docs or partially full segments as well as enabling future
    optimizations.

    NOTE: this also fixes an "under-merging" bug whereby it is
    possible to get far too many segments in your index (which will
    drastically slow down search, risk exhausting the file descriptor
    limit, etc.). This can happen when the number of buffered docs
    at close plus the number of docs in the last non-RAM segment is
    greater than mergeFactor. (Ning Li, Yonik Seeley)

 7. Lazy loaded fields unnecessarily retained an extra copy of loaded
    String data.
(Yonik Seeley)

 8. LUCENE-443: ConjunctionScorer performance increase. Speeds up
    any BooleanQuery with more than one mandatory clause.
    (Abdul Chaudhry, Paul Elschot via Yonik Seeley)

 9. LUCENE-365: DisjunctionSumScorer performance increase of
    ~30%. Speeds up queries with optional clauses. (Paul Elschot via
    Yonik Seeley)

 10. LUCENE-695: Optimized BufferedIndexInput.readBytes() for medium
     size buffers, which will speed up merging and retrieving binary
     and compressed fields. (Nadav Har'El via Yonik Seeley)

 11. LUCENE-687: Lazy skipping on proximity file speeds up most
     queries involving term positions, including phrase queries.
     (Michael Busch via Yonik Seeley)

 12. LUCENE-714: Replaced 2 cases of manual for-loop array copying
     with calls to System.arraycopy instead, in DocumentWriter.java.
     (Nicolas Lalevee via Mike McCandless)

 13. LUCENE-729: Non-recursive skipTo and next implementation of
     TermDocs for a MultiReader. The old implementation could
     recurse up to the number of segments in the index. (Yonik Seeley)

 14. LUCENE-739: Improve segment merging performance by reusing
     the norm array across different fields and doing bulk writes
     of norms of segments with no deleted docs.
     (Michael Busch via Yonik Seeley)

 15. LUCENE-745: Added BooleanQuery.clauses(), allowing direct access
     to the List of clauses, and replaced the internal synchronized Vector
     with an unsynchronized List. (Yonik Seeley)

 16. LUCENE-750: Remove finalizers from FSIndexOutput and move the
     FSIndexInput finalizer to the actual file so that clones don't each
     register a new finalizer. (Yonik Seeley)

Test Cases

 1. Added TestTermScorer.java (Grant Ingersoll)

 2. Added TestWindowsMMap.java (Benson Margulies via Mike McCandless)

 3.
LUCENE-744: Append the user.name property onto the temporary directory
    that is created so it doesn't interfere with other users. (Grant Ingersoll)

Documentation

 1. Added a style sheet named lucene.css to xdocs and included it in the
    Anakia VSL descriptor. (Grant Ingersoll)

 2. Added scoring.xml document into xdocs. Updated Similarity.java
    scoring formula. (Grant Ingersoll and Steve Rowe. Updates from:
    Michael McCandless, Doron Cohen, Chris Hostetter, Doug Cutting).
    Issue 664.

 3. Added javadocs for FieldSelectorResult.java. (Grant Ingersoll)

 4. Moved xdocs directory to src/site/src/documentation/content/xdocs per
    Issue 707. Site now builds using Forrest, just like the other Lucene
    siblings. See http://wiki.apache.org/jakarta-lucene/HowToUpdateTheWebsite
    for info on updating the website. (Grant Ingersoll with help from Steve Rowe,
    Chris Hostetter, Doug Cutting, Otis Gospodnetic, Yonik Seeley)

 5. Added in Developer and System Requirements sections under Resources (Grant Ingersoll)

 6. LUCENE-713: Updated the Term Vector section of File Formats to include
    documentation on how Offset and Position info are stored in the TVF file.
    (Grant Ingersoll, Samir Abdou)

 7. Added a link to Clover Test Code Coverage Reports under the Develop
    section in Resources (Grant Ingersoll)

 8. LUCENE-748: Added details for semantics of IndexWriter.close on
    hitting an Exception. (Jed Wesley-Smith via Mike McCandless)

 9. Added some text about what is contained in releases.
    (Eric Haszlakiewicz via Grant Ingersoll)

 10. LUCENE-758: Fix javadoc to clarify that RAMDirectory(Directory)
     makes a full copy of the starting Directory. (Mike McCandless)

 11.
LUCENE-764: Fix javadocs to detail temporary space requirements
     for IndexWriter's optimize(), addIndexes(*) and addDocument(...)
     methods. (Mike McCandless)

Build

 1. Added Clover test code coverage (see http://issues.apache.org/jira/browse/LUCENE-721).
    To enable clover code coverage, you must have clover.jar in the ANT
    classpath and specify -Drun.clover=true on the command line.
    (Michael Busch and Grant Ingersoll)

 2. Added a sysproperty in common-build.xml per LUCENE-752 to map java.io.tmpdir to
    ${build.dir}/test, just like the tempDir sysproperty.

 3. LUCENE-757: Added a new target named init-dist that does the setup for
    both binary and source distributions. Called by package
    and package-*-src.

======================= Release 2.0.0 =======================

API Changes

 1. All deprecated methods and fields have been removed, except
    DateField, which will still be supported for some time
    so Lucene can read its date fields from old indexes
    (Yonik Seeley & Grant Ingersoll)

 2. DisjunctionSumScorer is no longer public.
    (Paul Elschot via Otis Gospodnetic)

 3. Creating a Field with both an empty name and an empty value
    now throws an IllegalArgumentException
    (Daniel Naber)

 4. LUCENE-301: Added new IndexWriter({String,File,Directory},
    Analyzer) constructors that do not take a boolean "create"
    argument. These new constructors will create a new index if
    necessary, else append to the existing one. (Dan Armbrust via
    Mike McCandless)

New features

 1. LUCENE-496: Command line tool for modifying the field norms of an
    existing index; added to contrib/miscellaneous. (Chris Hostetter)

 2. LUCENE-577: SweetSpotSimilarity added to contrib/miscellaneous.
    (Chris Hostetter)

Bug fixes

 1.
LUCENE-330: Fix issue of FilteredQuery not working properly within
    BooleanQuery. (Paul Elschot via Erik Hatcher)

 2. LUCENE-515: Make ConstantScoreRangeQuery and ConstantScoreQuery work
    with RemoteSearchable. (Philippe Laflamme via Yonik Seeley)

 3. Added methods to get/set writeLockTimeout and commitLockTimeout in
    IndexWriter. These could be set in Lucene 1.4 using a system property.
    This feature had been removed without adding the corresponding
    getter/setter methods. (Daniel Naber)

 4. LUCENE-413: Fixed ArrayIndexOutOfBoundsExceptions
    when using SpanQueries. (Paul Elschot via Yonik Seeley)

 5. Implemented FilterIndexReader.getVersion() and isCurrent()
    (Yonik Seeley)

 6. LUCENE-540: Fixed a bug with IndexWriter.addIndexes(Directory[])
    that sometimes caused the index order of documents to change.
    (Yonik Seeley)

 7. LUCENE-526: Fixed a bug in FieldSortedHitQueue that caused
    subsequent String sorts with different locales to sort identically.
    (Paul Cowan via Yonik Seeley)

 8. LUCENE-541: Add missing extractTerms() to DisjunctionMaxQuery
    (Stefan Will via Yonik Seeley)

 9. LUCENE-514: Added getTermArrays() and extractTerms() to
    MultiPhraseQuery (Eric Jain & Yonik Seeley)

10. LUCENE-512: Fixed ClassCastException in ParallelReader.getTermFreqVectors
    (frederic via Yonik)

11. LUCENE-352: Fixed bug in SpanNotQuery that manifested as
    NullPointerException when "exclude" query was not a SpanTermQuery.
    (Chris Hostetter)

12. LUCENE-572: Fixed bug in SpanNotQuery hashCode, which was ignoring the exclude clause
    (Chris Hostetter)

13. LUCENE-561: Fixed some ParallelReader bugs.
NullPointerException if the reader
    didn't know about the field yet, the reader didn't keep track of whether it had deletions,
    and deleteDocument calls could circumvent synchronization on the subreaders.
    (Chuck Williams via Yonik Seeley)

14. LUCENE-556: Added empty extractTerms() implementation to MatchAllDocsQuery and
    ConstantScoreQuery in order to allow their use with a MultiSearcher.
    (Yonik Seeley)

15. LUCENE-546: Removed 2GB file size limitations for RAMDirectory.
    (Peter Royal, Michael Chan, Yonik Seeley)

16. LUCENE-485: Don't hold commit lock while removing obsolete index
    files. (Luc Vanlerberghe via cutting)


1.9.1

Bug fixes

 1. LUCENE-511: Fix a bug in the BufferedIndexOutput optimization
    introduced in 1.9-final. (Shay Banon & Steven Tamm via cutting)

1.9 final

Note that this release is mostly but not 100% source compatible with
the previous release of Lucene (1.4.3). In other words, you should
make sure your application compiles with this version of Lucene before
you replace the old Lucene JAR with the new one. Many methods have
been deprecated in anticipation of release 2.0, so deprecation
warnings are to be expected when upgrading from 1.4.3 to 1.9.

Bug fixes

 1. The fix that made IndexWriter.setMaxBufferedDocs(1) work had negative
    effects on indexing performance and has thus been reverted. The
    argument for setMaxBufferedDocs(int) must now be at least 2, otherwise
    an exception is thrown. (Daniel Naber)

Optimizations

 1. Optimized BufferedIndexOutput.writeBytes() to use
    System.arraycopy() in more cases, rather than copying byte-by-byte.
    (Lukas Zapletal via Cutting)

1.9 RC1

Requirements

 1. To compile and use Lucene you now need Java 1.4 or later.

Changes in runtime behavior

 1.
FuzzyQuery can no longer throw a TooManyClauses exception. If a 17571 FuzzyQuery expands to more than BooleanQuery.maxClauseCount 17572 terms only the BooleanQuery.maxClauseCount most similar terms 17573 go into the rewritten query and thus the exception is avoided. 17574 (Christoph) 17575 17576 2. Changed system property from "org.apache.lucene.lockdir" to 17577 "org.apache.lucene.lockDir", so that its casing follows the existing 17578 pattern used in other Lucene system properties. (Bernhard) 17579 17580 3. The terms of RangeQueries and FuzzyQueries are now converted to 17581 lowercase by default (as it has been the case for PrefixQueries 17582 and WildcardQueries before). Use setLowercaseExpandedTerms(false) 17583 to disable that behavior but note that this also affects 17584 PrefixQueries and WildcardQueries. (Daniel Naber) 17585 17586 4. Document frequency that is computed when MultiSearcher is used is now 17587 computed correctly and "globally" across subsearchers and indices, while 17588 before it used to be computed locally to each index, which caused 17589 ranking across multiple indices not to be equivalent. 17590 (Chuck Williams, Wolf Siberski via Otis, bug #31841) 17591 17592 5. When opening an IndexWriter with create=true, Lucene now only deletes 17593 its own files from the index directory (looking at the file name suffixes 17594 to decide if a file belongs to Lucene). The old behavior was to delete 17595 all files. (Daniel Naber and Bernhard Messer, bug #34695) 17596 17597 6. The version of an IndexReader, as returned by getCurrentVersion() 17598 and getVersion() doesn't start at 0 anymore for new indexes. Instead, it 17599 is now initialized by the system time in milliseconds. 17600 (Bernhard Messer via Daniel Naber) 17601 17602 7. Several default values cannot be set via system properties anymore, as 17603 this has been considered inappropriate for a library like Lucene. 
    For most properties there are set/get methods available in IndexWriter
    which you should use instead. This affects the following properties:
    See IndexWriter for getter/setter methods:
      org.apache.lucene.writeLockTimeout, org.apache.lucene.commitLockTimeout,
      org.apache.lucene.minMergeDocs, org.apache.lucene.maxMergeDocs,
      org.apache.lucene.maxFieldLength, org.apache.lucene.termIndexInterval,
      org.apache.lucene.mergeFactor
    See BooleanQuery for getter/setter methods:
      org.apache.lucene.maxClauseCount
    See FSDirectory for getter/setter methods:
      disableLuceneLocks
    (Daniel Naber)

 8. Fixed FieldCacheImpl to use user-provided IntParser and FloatParser,
    instead of using Integer and Float classes for parsing.
    (Yonik Seeley via Otis Gospodnetic)

 9. Expert level search routines returning TopDocs and TopFieldDocs
    no longer normalize scores. This also fixes bugs related to
    MultiSearchers and score sorting/normalization.
    (Luc Vanlerberghe via Yonik Seeley, LUCENE-469)

New features

 1. Added support for stored compressed fields (patch #31149)
    (Bernhard Messer via Christoph)

 2. Added support for binary stored fields (patch #29370)
    (Drew Farris and Bernhard Messer via Christoph)

 3. Added support for position and offset information in term vectors
    (patch #18927). (Grant Ingersoll & Christoph)

 4. A new class DateTools has been added. It allows you to format dates
    in a readable format adequate for indexing. Unlike the existing
    DateField class, DateTools can cope with dates before 1970 and it
    forces you to specify the desired date resolution (e.g. month, day,
    second, ...), which can make RangeQuerys on those fields more efficient.
    (Daniel Naber)

 5. QueryParser now correctly works with Analyzers that can return more
    than one token per position.
    For example, a query "+fast +car"
    would be parsed as "+fast +(car automobile)" if the Analyzer
    returns "car" and "automobile" at the same position whenever it
    finds "car" (Patch #23307).
    (Pierrick Brihaye, Daniel Naber)

 6. Permit unbuffered Directory implementations (e.g., using mmap).
    InputStream is replaced by the new classes IndexInput and
    BufferedIndexInput. OutputStream is replaced by the new classes
    IndexOutput and BufferedIndexOutput. InputStream and OutputStream
    are now deprecated and FSDirectory is now subclassable. (cutting)

 7. Add native Directory and TermDocs implementations that work under
    GCJ. These require GCC 3.4.0 or later and have only been tested
    on Linux. Use 'ant gcj' to build demo applications. (cutting)

 8. Add MMapDirectory, which uses nio to mmap input files. This is
    still somewhat slower than FSDirectory. However, it uses less
    memory per query term, since a new buffer is not allocated per
    term, which may help applications which use, e.g., wildcard
    queries. It may also someday be faster. (cutting & Paul Elschot)

 9. Added javadocs-internal to build.xml - bug #30360
    (Paul Elschot via Otis)

10. Added RangeFilter, a more generically useful filter than DateFilter.
    (Chris M Hostetter via Erik)

11. Added NumberTools, a utility class for indexing numeric fields.
    (adapted from code contributed by Matt Quail; committed by Erik)

12. Added public static IndexReader.main(String[] args) method.
    IndexReader can now be used directly at command line level
    to list and optionally extract the individual files from an existing
    compound index file.
    (adapted from code contributed by Garrett Rooney; committed by Bernhard)

13. Add IndexWriter.setTermIndexInterval() method. See javadocs.
    (Doug Cutting)

14. Added LucenePackage, whose static get() method returns java.util.Package,
    which lets the caller get the Lucene version information specified in
    the Lucene Jar.
    (Doug Cutting via Otis)

15. Added Hits.iterator() method and corresponding HitIterator and Hit objects.
    This provides standard java.util.Iterator iteration over Hits.
    Each call to the iterator's next() method returns a Hit object.
    (Jeremy Rayner via Erik)

16. Add ParallelReader, an IndexReader that combines separate indexes
    over different fields into a single virtual index. (Doug Cutting)

17. Add IntParser and FloatParser interfaces to FieldCache, so that
    fields in arbitrary formats can be cached as ints and floats.
    (Doug Cutting)

18. Added class org.apache.lucene.index.IndexModifier which combines
    IndexWriter and IndexReader, so you can add and delete documents without
    worrying about synchronization/locking issues.
    (Daniel Naber)

19. Lucene can now be used inside an unsigned applet, as Lucene's access
    to system properties will not cause a SecurityException anymore.
    (Jon Schuster via Daniel Naber, bug #34359)

20. Added a new class MatchAllDocsQuery that matches all documents.
    (John Wang via Daniel Naber, bug #34946)

21. Added ability to omit norms on a per-field basis to decrease
    index size and memory consumption when there are many indexed fields.
    See Field.setOmitNorms()
    (Yonik Seeley, LUCENE-448)

22. Added NullFragmenter to contrib/highlighter, which is useful for
    highlighting entire documents or fields.
    (Erik Hatcher)

23. Added regular expression queries, RegexQuery and SpanRegexQuery.
    Note the same term enumeration caveats apply with these queries as
    apply to WildcardQuery and other term expanding queries.
    These two new queries are not currently supported via QueryParser.
    (Erik Hatcher)

24. Added ConstantScoreQuery which wraps a filter and produces a score
    equal to the query boost for every matching document.
    (Yonik Seeley, LUCENE-383)

25. Added ConstantScoreRangeQuery which produces a constant score for
    every document in the range. One advantage over a normal RangeQuery
    is that it doesn't expand to a BooleanQuery and thus doesn't have a maximum
    number of terms the range can cover. Both endpoints may also be open.
    (Yonik Seeley, LUCENE-383)

26. Added ability to specify a minimum number of optional clauses that
    must match in a BooleanQuery. See BooleanQuery.setMinimumNumberShouldMatch().
    (Paul Elschot, Chris Hostetter via Yonik Seeley, LUCENE-395)

27. Added DisjunctionMaxQuery which provides the maximum score across its clauses.
    It's very useful for searching across multiple fields.
    (Chuck Williams via Yonik Seeley, LUCENE-323)

28. New class ISOLatin1AccentFilter that replaces accented characters in the ISO
    Latin 1 character set by their unaccented equivalent.
    (Sven Duzont via Erik Hatcher)

29. New class KeywordAnalyzer. "Tokenizes" the entire stream as a single token.
    This is useful for data like zip codes, ids, and some product names.
    (Erik Hatcher)

30. Copied LengthFilter from contrib area to core. Removes words that are too
    long or too short from the stream.
    (David Spencer via Otis and Daniel)

31. Added getPositionIncrementGap(String fieldName) to Analyzer. This allows
    custom analyzers to put gaps between Field instances with the same field
    name, preventing phrase or span queries from crossing these boundaries.
    The default implementation issues a gap of 0, allowing the default token
    position increment of 1 to put the next field's first token into a
    successive position.
    (Erik Hatcher, with advice from Yonik)

32. StopFilter can now ignore case when checking for stop words.
    (Grant Ingersoll via Yonik, LUCENE-248)

33. Add TopDocCollector and TopFieldDocCollector. These simplify the
    implementation of hit collectors that collect only the
    top-scoring or top-sorting hits.

API Changes

 1. Several methods and fields have been deprecated. The API documentation
    contains information about the recommended replacements. It is planned
    that most of the deprecated methods and fields will be removed in
    Lucene 2.0. (Daniel Naber)

 2. The Russian and the German analyzers have been moved to contrib/analyzers.
    Also, the WordlistLoader class has been moved one level up in the
    hierarchy and is now org.apache.lucene.analysis.WordlistLoader
    (Daniel Naber)

 3. The API contained methods that were declared to throw an IOException
    but never did. These declarations have been removed. If
    your code tries to catch these exceptions you might need to remove
    those catch clauses to avoid compile errors. (Daniel Naber)

 4. Add a serializable Parameter Class to standardize parameter enum
    classes in BooleanClause and Field. (Christoph)

 5. Added rewrite methods to all SpanQuery subclasses that nest other SpanQuerys.
    This allows custom SpanQuery subclasses that rewrite (for term expansion, for
    example) to nest within the built-in SpanQuery classes successfully.

Bug fixes

 1. The JSP demo page (src/jsp/results.jsp) now properly closes the
    IndexSearcher it opens. (Daniel Naber)

 2. Fixed a bug in IndexWriter.addIndexes(IndexReader[] readers) that
    prevented deletion of obsolete segments. (Christoph Goller)

 3. Fix in FieldInfos to avoid the return of an extra blank field in
    IndexReader.getFieldNames() (Patch #19058). (Mark Harwood via Bernhard)

 4. Some combinations of BooleanQuery and MultiPhraseQuery (formerly
    PhrasePrefixQuery) could provoke UnsupportedOperationException
    (bug #33161). (Rhett Sutphin via Daniel Naber)

 5. Fixed a small bug in skipTo of ConjunctionScorer that caused a
    NullPointerException if skipTo() was called without a prior call
    to next(). (Christoph)

 6. Disable Similarity.coord() in the scoring of most automatically
    generated boolean queries. The coord() score factor is
    appropriate when clauses are independently specified by a user,
    but is usually not appropriate when clauses are generated
    automatically, e.g., by a fuzzy, wildcard or range query. Matches
    on such automatically generated queries are no longer penalized
    for not matching all terms. (Doug Cutting, Patch #33472)

 7. Getting a lock file with Lock.obtain(long) was supposed to wait for
    a given number of milliseconds, but this didn't work.
    (John Wang via Daniel Naber, Bug #33799)

 8. Fix FSDirectory.createOutput() to always create new files.
    Previously, existing files were overwritten, and an index could be
    corrupted when the old version of a file was longer than the new.
    Now any existing file is first removed. (Doug Cutting)

 9. Fix BooleanQuery containing nested SpanTermQuerys, which previously
    could return an incorrect number of hits.
    (Reece Wilton via Erik Hatcher, Bug #35157)

10. Fix NullPointerException that could occur with a MultiPhraseQuery
    inside a BooleanQuery.
    (Hans Hjelm and Scotty Allen via Daniel Naber, Bug #35626)

11. Fixed SnowballFilter to pass through the position increment from
    the original token.
    (Yonik Seeley via Erik Hatcher, LUCENE-437)

12. Added Unicode range of Korean characters to StandardTokenizer,
    grouping contiguous characters into a token rather than one token
    per character. This change also changes the token type to "<CJ>"
    for Chinese and Japanese character tokens (previously it was "<CJK>").
    (Cheolgoo Kang via Otis and Erik, LUCENE-444 and LUCENE-461)

13. FieldsReader now looks at FieldInfo.storeOffsetWithTermVector and
    FieldInfo.storePositionWithTermVector and creates the Field with
    the correct TermVector parameter.
    (Frank Steinmann via Bernhard, LUCENE-455)

14. Fixed WildcardQuery to prevent "cat" from matching "ca??".
    (Xiaozheng Ma via Bernhard, LUCENE-306)

15. Fixed a bug where MultiSearcher and ParallelMultiSearcher could
    change the sort order when sorting by string for documents without
    a value for the sort field.
    (Luc Vanlerberghe via Yonik, LUCENE-453)

16. Fixed a sorting problem with MultiSearchers that can lead to
    missing or duplicate docs due to equal docs sorting in an arbitrary order.
    (Yonik Seeley, LUCENE-456)

17. A single hit using the expert level sorted search methods
    resulted in the score not being normalized.
    (Yonik Seeley, LUCENE-462)

18. Fixed inefficient memory usage when loading an index into RAMDirectory.
    (Volodymyr Bychkoviak via Bernhard, LUCENE-475)

19. Corrected term offsets returned by ChineseTokenizer.
    (Ray Tsang via Erik Hatcher, LUCENE-324)

20. Fixed MultiReader.undeleteAll() to correctly update numDocs.
    (Robert Kirchgessner via Doug Cutting, LUCENE-479)

21. Race condition in IndexReader.getCurrentVersion() and isCurrent()
    fixed by acquiring the commit lock.
    (Luc Vanlerberghe via Yonik Seeley, LUCENE-481)

22. IndexWriter.setMaxBufferedDocs(1) didn't have the expected effect;
    this has now been fixed. (Daniel Naber)

23. Fixed QueryParser when called with a date in local form like
    "[1/16/2000 TO 1/18/2000]". This query did not include the documents
    of 1/18/2000, i.e. the last day was not included. (Daniel Naber)

24. Removed sorting constraint that threw an exception if there were
    not yet any values for the sort field (Yonik Seeley, LUCENE-374)

Optimizations

 1. Disk usage (peak requirements during indexing and optimization)
    in case of compound file format has been improved.
    (Bernhard, Dmitry, and Christoph)

 2. Optimize the performance of certain uses of BooleanScorer,
    TermScorer and IndexSearcher. In particular, a BooleanQuery
    composed of TermQuery, with not all terms required, that returns a
    TopDocs (e.g., through a Hits with no Sort specified) runs much
    faster. (cutting)

 3. Removed synchronization from reading of term vectors with an
    IndexReader (Patch #30736). (Bernhard Messer via Christoph)

 4. Optimize term-dictionary lookup to allocate far fewer terms when
    scanning for the matching term. This speeds searches involving
    low-frequency terms, where the cost of dictionary lookup can be
    significant. (cutting)

 5. Optimize fuzzy queries so the standard fuzzy queries with a prefix
    of 0 now run 20-50% faster (Patch #31882).
    (Jonathan Hager via Daniel Naber)

 6. A version of BooleanScorer (BooleanScorer2) has been added that delivers
    documents in increasing order and implements skipTo. For queries
    with required or forbidden clauses it may be faster than the old
    BooleanScorer; for BooleanQueries consisting only of optional
    clauses it is probably slower. The new BooleanScorer is now the
    default.
    (Patch 31785 by Paul Elschot via Christoph)

 7. Use uncached access to norms when merging to reduce RAM usage.
    (Bug #32847). (Doug Cutting)

 8. Don't read term index when random-access is not required. This
    reduces time to open IndexReaders and they use less memory when
    random access is not required, e.g., when merging segments. The
    term index is now read into memory lazily at the first
    random-access. (Doug Cutting)

 9. Optimize IndexWriter.addIndexes(Directory[]) when the number of
    added indexes is larger than mergeFactor. Previously this could
    result in quadratic performance. Now performance is n log(n).
    (Doug Cutting)

10. Speed up the creation of TermEnum for indices with multiple
    segments and deleted documents, and thus speed up PrefixQuery,
    RangeQuery, WildcardQuery, FuzzyQuery, RangeFilter, DateFilter,
    and sorting the first time on a field.
    (Yonik Seeley, LUCENE-454)

11. Optimized and generalized 32 bit floating point to byte
    (custom 8 bit floating point) conversions. Increased the speed of
    Similarity.encodeNorm() anywhere from 10% to 250%, depending on the JVM.
    (Yonik Seeley, LUCENE-467)

Infrastructure

 1. Lucene's source code repository has been converted from CVS to
    Subversion. The new repository is at
    http://svn.apache.org/repos/asf/lucene/java/trunk

 2. Lucene's issue tracker has migrated from Bugzilla to JIRA.
    Lucene's JIRA is at http://issues.apache.org/jira/browse/LUCENE
    The old issues are still available at
    http://issues.apache.org/bugzilla/show_bug.cgi?id=xxxx
    (use the bug number instead of xxxx)


1.4.3

 1. The JSP demo page (src/jsp/results.jsp) now properly escapes error
    messages which might contain user input (e.g. error messages about
    query parsing).
    If you used that page as a starting point for your
    own code, please make sure your code also properly escapes HTML
    characters from user input in order to avoid so-called cross-site
    scripting attacks. (Daniel Naber)

 2. QueryParser changes in 1.4.2 broke the QueryParser API. Now the old
    API is supported again. (Christoph)


1.4.2

 1. Fixed bug #31241: Sorting could lead to incorrect results (documents
    missing, others duplicated) if the sort keys were not unique and there
    were more than 100 matches. (Daniel Naber)

 2. Memory leak in Sort code (bug #31240) eliminated.
    (Rafal Krzewski via Christoph and Daniel)

 3. FuzzyQuery now takes an additional parameter that specifies the
    minimum similarity that is required for a term to match the query.
    The QueryParser syntax for this is term~x, where x is a floating
    point number >= 0 and < 1 (a bigger number means that a higher
    similarity is required). Furthermore, a prefix can be specified
    for FuzzyQuerys so that only those terms are considered similar that
    start with this prefix. This can speed up FuzzyQuery greatly.
    (Daniel Naber, Christoph Goller)

 4. PhraseQuery and PhrasePrefixQuery now allow the explicit specification
    of relative positions. (Christoph Goller)

 5. QueryParser changes: Fix for ArrayIndexOutOfBoundsExceptions
    (patch #9110); some unused method parameters removed; the ability
    to specify a minimum similarity for FuzzyQuery has been added.
    (Christoph Goller)

 6. IndexSearcher optimization: a new ScoreDoc is no longer allocated
    for every non-zero-scoring hit. This makes 'OR' queries that
    contain common terms substantially faster. (cutting)


1.4.1

 1. Fixed a performance bug in hit sorting code, where values were not
    correctly cached. (Aviran via cutting)

 2. Fixed errors in file format documentation. (Daniel Naber)


1.4 final

 1. Added "an" to the list of stop words in StopAnalyzer, to complement
    the existing "a" there. Fix for bug 28960
    (http://issues.apache.org/bugzilla/show_bug.cgi?id=28960). (Otis)

 2. Added new class FieldCache to manage in-memory caches of field term
    values. (Tim Jones)

 3. Added overloaded getFieldQuery method to QueryParser which
    accepts the slop factor specified for the phrase (or the default
    phrase slop for the QueryParser instance). This allows overriding
    methods to replace a PhraseQuery with a SpanNearQuery instead,
    keeping the proper slop factor. (Erik Hatcher)

 4. Changed the encoding of GermanAnalyzer.java and GermanStemmer.java to
    UTF-8 and changed the build encoding to UTF-8, to make changed files
    compile. (Otis Gospodnetic)

 5. Removed synchronization from term lookup under IndexReader methods
    termFreq(), termDocs() or termPositions() to improve
    multi-threaded performance. (cutting)

 6. Fix a bug where obsolete segment files were not deleted on Win32.


1.4 RC3

 1. Fixed several search bugs introduced by the skipTo() changes in
    release 1.4RC1. The index file format was changed a bit, so
    collections must be re-indexed to take advantage of the skipTo()
    optimizations. (Christoph Goller)

 2. Added new Document methods, removeField() and removeFields().
    (Christoph Goller)

 3. Fixed inconsistencies with index closing. Indexes and directories
    are now only closed automatically by Lucene when Lucene opened
    them automatically. (Christoph Goller)

 4. Added new class: FilteredQuery. (Tim Jones)

 5. Added a new SortField type for custom comparators. (Tim Jones)

 6. The "Lock obtain timed out" message now displays the full path to the
    lock file. (Daniel Naber via Erik)

 7. Fixed a bug in SpanNearQuery when ordered. (Paul Elschot via cutting)

 8. Fixed FSDirectory's locks so that they still work when the
    java.io.tmpdir system property is null. (cutting)

 9. Changed FilteredTermEnum's constructor to take no parameters,
    as the parameters were ignored anyway (bug #28858)

1.4 RC2

 1. GermanAnalyzer now throws an exception if the stopword file
    cannot be found (bug #27987). It now uses LowerCaseFilter
    (bug #18410) (Daniel Naber via Otis, Erik)

 2. Fixed a few bugs in the file format documentation. (cutting)


1.4 RC1

 1. Changed the format of the .tis file, so that:

    - it has a format version number, which makes it easier to
      back-compatibly change file formats in the future.

    - the term count is now stored as a long. This was the one aspect
      of Lucene's file formats which limited index size.

    - a few internal index parameters are now stored in the index, so
      that they can (in theory) now be changed from index to index,
      although there is not yet an API to do so.

    These changes are back compatible. The new code can read old
    indexes. But old code will not be able to read new indexes. (cutting)

 2. Added an optimized implementation of TermDocs.skipTo(). A skip
    table is now stored for each term in the .frq file. This only
    adds a percent or two to overall index size, but can substantially
    speed up many searches. (cutting)

 3. Restructured the Scorer API and all Scorer implementations to take
    advantage of an optimized TermDocs.skipTo() implementation. In
    particular, PhraseQuerys and conjunctive BooleanQuerys are
    faster when one clause has substantially fewer matches than the
    others.
    (A conjunctive BooleanQuery is a BooleanQuery where all
    clauses are required.) (cutting)

 4. Added new class ParallelMultiSearcher. Combined with
    RemoteSearchable this makes it easy to implement distributed
    search systems. (Jean-Francois Halleux via cutting)

 5. Added support for hit sorting. Results may now be sorted by any
    indexed field. For details see the javadoc for
    Searcher#search(Query, Sort). (Tim Jones via Cutting)

 6. Changed FSDirectory to auto-create a full directory tree that it
    needs by using mkdirs() instead of mkdir(). (Mladen Turk via Otis)

 7. Added a new span-based query API. This implements, among other
    things, nested phrases. See javadocs for details. (Doug Cutting)

 8. Added new method Query.getSimilarity(Searcher), and changed
    scorers to use it. This permits one to subclass a Query class so
    that it can specify its own Similarity implementation, perhaps
    one that delegates through that of the Searcher. (Julien Nioche
    via Cutting)

 9. Added MultiReader, an IndexReader that combines multiple other
    IndexReaders. (Cutting)

10. Added support for term vectors. See Field#isTermVectorStored().
    (Grant Ingersoll, Cutting & Dmitry)

11. Fixed the old bug with escaping of special characters in query
    strings: http://issues.apache.org/bugzilla/show_bug.cgi?id=24665
    (Jean-Francois Halleux via Otis)

12. Added support for overriding default values for the following,
    using system properties:
      - default commit lock timeout
      - default maxFieldLength
      - default maxMergeDocs
      - default mergeFactor
      - default minMergeDocs
      - default write lock timeout
    (Otis)

13. Changed QueryParser.jj to allow '-' and '+' within tokens:
    http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
    (Morus Walter via Otis)

14. The compound index format is now used by default.
    This makes indexing a bit slower, but vastly reduces the chances
    of file handle problems. (Cutting)


1.3 final

 1. Added catch of BooleanQuery$TooManyClauses in QueryParser to
    throw ParseException instead. (Erik Hatcher)

 2. Fixed a NullPointerException in Query.explain(). (Doug Cutting)

 3. Added a new method IndexReader.setNorm(), that permits one to
    alter the boosting of fields after an index is created.

 4. Distinguish between the final position and length when indexing a
    field. The length is now defined as the total number of tokens,
    instead of the final position, as it was previously. Length is
    used for score normalization (Similarity.lengthNorm()) and for
    controlling memory usage (IndexWriter.maxFieldLength). In both of
    these cases, the total number of tokens is a better value to use
    than the final token position. Position is used in phrase
    searching (see PhraseQuery and Token.setPositionIncrement()).

 5. Fix StandardTokenizer's handling of CJK characters (Chinese,
    Japanese and Korean ideograms). Previously contiguous sequences
    were combined in a single token, which is not very useful. Now
    each ideogram generates a separate token, which is more useful.


1.3 RC3

 1. Added minMergeDocs in IndexWriter. This can be raised to speed
    indexing without altering the number of files, at the cost of
    using more memory. (Julien Nioche via Otis)

 2. Fix bug #24786, in query rewriting. (bschneeman via Cutting)

 3. Fix bug #16952, in demo HTML parser, skip comments in
    javascript. (Christoph Goller)

 4. Fix bug #19253, in demo HTML parser, add whitespace as needed to
    output (Daniel Naber via Christoph Goller)

 5. Fix bug #24301, in demo HTML parser, long titles no longer
    hang things. (Christoph Goller)

 6. Fix bug #23534, replace use of file timestamp of segments file
    with an index version number stored in the segments file. This
    resolves problems when running on file systems with low-resolution
    timestamps, e.g., HFS under MacOS X. (Christoph Goller)

 7. Fix QueryParser so that TokenMgrError is not thrown, only
    ParseException. (Erik Hatcher)

 8. Fix some bugs introduced by change 11 of RC2. (Christoph Goller)

 9. Fixed a problem compiling TestRussianStem. (Christoph Goller)

10. Cleaned up some build stuff. (Erik Hatcher)


1.3 RC2

 1. Added getFieldNames(boolean) to IndexReader, SegmentReader, and
    SegmentsReader. (Julien Nioche via otis)

 2. Changed file locking to place lock files in
    System.getProperty("java.io.tmpdir"), where all users are
    permitted to write files. This way folks can open and correctly
    lock indexes which are read-only to them.

 3. IndexWriter: added a new method, addDocument(Document, Analyzer),
    permitting one to easily use different analyzers for different
    documents in the same index.

 4. Minor enhancements to FuzzyTermEnum.
    (Christoph Goller via Otis)

 5. PriorityQueue: added insert(Object) method and adjusted IndexSearcher
    and MultiIndexSearcher to use it.
    (Christoph Goller via Otis)

 6. Fixed a bug in IndexWriter that returned incorrect docCount().
    (Christoph Goller via Otis)

 7. Fixed SegmentsReader to eliminate the confusing and slightly different
    behaviour of TermEnum when dealing with an enumeration of all terms,
    versus an enumeration starting from a specific term.
    This patch also fixes incorrect term document frequencies when the same term
    is present in multiple segments.
    (Christoph Goller via Otis)

 8. Added CachingWrapperFilter and PerFieldAnalyzerWrapper. (Erik Hatcher)

 9. Added support for the new "compound file" index format (Dmitry
    Serebrennikov)

10. Added Locale setting to QueryParser, for use by date range parsing.

11. Changed IndexReader so that it can be subclassed by classes
    outside of its package. Previously it had package-private
    abstract methods. Also modified the index merging code so that it
    can work on an arbitrary IndexReader implementation, and added a
    new method, IndexWriter.addIndexes(IndexReader[]), to take
    advantage of this. (cutting)

12. Added a limit to the number of clauses which may be added to a
    BooleanQuery. The default limit is 1024 clauses. This should
    stop most OutOfMemoryErrors caused by prefix, wildcard and fuzzy
    queries which run amok. (cutting)

13. Add new method: IndexReader.undeleteAll(). This undeletes all
    deleted documents which still remain in the index. (cutting)


1.3 RC1

 1. Fixed PriorityQueue's clear() method.
    Fix for bug 9454, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9454
    (Matthijs Bomhoff via otis)

 2. Changed StandardTokenizer.jj grammar for EMAIL tokens.
    Fix for bug 9015, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9015
    (Dale Anson via otis)

 3. Added the ability to disable lock creation by using disableLuceneLocks
    system property. This is useful for read-only media, such as CD-ROMs.
    (otis)

 4. Added id method to Hits to be able to access the index global id.
    Required for sorting options.
    (carlson)

 5. Added support for new range query syntax to QueryParser.jj.
    (briangoetz)

 6. Added the ability to retrieve HTML documents' META tag values to
    HTMLParser.jj.
    (Mark Harwood via otis)

 7. Modified QueryParser to make it possible to programmatically specify the
    default Boolean operator (OR or AND).
    (Péter Halácsy via otis)

 8. Made many search methods and classes non-final, per requests.
    This includes IndexWriter and IndexSearcher, among others.
    (cutting)

 9. Added class RemoteSearchable, providing support for remote
    searching via RMI. The test class RemoteSearchableTest.java
    provides an example of how this can be used. (cutting)

 10. Added PhrasePrefixQuery (and supporting MultipleTermPositions). The
     test class TestPhrasePrefixQuery provides the usage example.
     (Anders Nielsen via otis)

 11. Changed the German stemming algorithm to ignore case while
     stripping. The new algorithm is faster and produces more equal
     stems from nouns and verbs derived from the same word.
     (gschwarz)

 12. Added support for boosting the score of documents and fields via
     the new methods Document.setBoost(float) and Field.setBoost(float).

     Note: This changes the encoding of an indexed value. Indexes
     should be re-created from scratch in order for search scores to
     be correct. With the new code and an old index, searches will
     yield very large scores for shorter fields, and very small scores
     for longer fields. Once the index is re-created, scores will be
     as before. (cutting)

 13. Added new method Token.setPositionIncrement().

     This permits, for the purpose of phrase searching, placing
     multiple terms in a single position. This is useful with
     stemmers that produce multiple possible stems for a word.

     This also permits the introduction of gaps between terms, so that
     terms which are adjacent in a token stream will not be matched by
     an exact phrase query.
This makes it possible, e.g., to build 18334 an analyzer where phrases are not matched over stop words which 18335 have been removed. 18336 18337 Finally, repeating a token with an increment of zero can also be 18338 used to boost scores of matches on that token. (cutting) 18339 18340 14. Added new Filter class, QueryFilter. This constrains search 18341 results to only match those which also match a provided query. 18342 Results are cached, so that searches after the first on the same 18343 index using this filter are very fast. 18344 18345 This could be used, for example, with a RangeQuery on a formatted 18346 date field to implement date filtering. One could re-use a 18347 single QueryFilter that matches, e.g., only documents modified 18348 within the last week. The QueryFilter and RangeQuery would only 18349 need to be reconstructed once per day. (cutting) 18350 18351 15. Added a new IndexWriter method, getAnalyzer(). This returns the 18352 analyzer used when adding documents to this index. (cutting) 18353 18354 16. Fixed a bug with IndexReader.lastModified(). Before, document 18355 deletion did not update this. Now it does. (cutting) 18356 18357 17. Added Russian Analyzer. 18358 (Boris Okner via otis) 18359 18360 18. Added a public, extensible scoring API. For details, see the 18361 javadoc for org.apache.lucene.search.Similarity. 18362 18363 19. Fixed return of Hits.id() from float to int. (Terry Steichen via Peter). 18364 18365 20. Added getFieldNames() to IndexReader and Segment(s)Reader classes. 18366 (Peter Mularien via otis) 18367 18368 21. Added getFields(String) and getValues(String) methods. 18369 Contributed by Rasik Pandey on 2002-10-09 18370 (Rasik Pandey via otis) 18371 18372 22. Revised internal search APIs. Changes include: 18373 18374 a. Queries are no longer modified during a search. This makes 18375 it possible, e.g., to reuse the same query instance with 18376 multiple indexes from multiple threads. 18377 18378 b. Term-expanding queries (e.g. 
PrefixQuery, WildcardQuery, 18379 etc.) now work correctly with MultiSearcher, fixing bugs 12619 18380 and 12667. 18381 18382 c. Boosting BooleanQuery's now works, and is supported by the 18383 query parser (problem reported by Lee Mallabone). Thus a query 18384 like "(+foo +bar)^2 +baz" is now supported and equivalent to 18385 "(+foo^2 +bar^2) +baz". 18386 18387 d. New method: Query.rewrite(IndexReader). This permits a 18388 query to re-write itself as an alternate, more primitive query. 18389 Most of the term-expanding query classes (PrefixQuery, 18390 WildcardQuery, etc.) are now implemented using this method. 18391 18392 e. New method: Searchable.explain(Query q, int doc). This 18393 returns an Explanation instance that describes how a particular 18394 document is scored against a query. An explanation can be 18395 displayed as either plain text, with the toString() method, or 18396 as HTML, with the toHtml() method. Note that computing an 18397 explanation is as expensive as executing the query over the 18398 entire index. This is intended to be used in developing 18399 Similarity implementations, and, for good performance, should 18400 not be displayed with every hit. 18401 18402 f. Scorer and Weight are public, not package protected. It now 18403 possible for someone to write a Scorer implementation that is 18404 not in the org.apache.lucene.search package. This is still 18405 fairly advanced programming, and I don't expect anyone to do 18406 this anytime soon, but at least now it is possible. 18407 18408 g. Added public accessors to the primitive query classes 18409 (TermQuery, PhraseQuery and BooleanQuery), permitting access to 18410 their terms and clauses. 18411 18412 Caution: These are extensive changes and they have not yet been 18413 tested extensively. Bug reports are appreciated. 18414 (cutting) 18415 18416 23. Added convenience RAMDirectory constructors taking File and String 18417 arguments, for easy FSDirectory to RAMDirectory conversion. 
     (otis)

 24. Added code for manual renaming of files in FSDirectory, since it
     has been reported that java.io.File's renameTo(File) method sometimes
     fails on Windows JVMs.
     (Matt Tucker via otis)

 25. Refactored QueryParser to make it easier for people to extend it.
     Added the ability to automatically lower-case Wildcard terms in
     the QueryParser.
     (Tatu Saloranta via otis)


1.2 RC6

 1. Changed QueryParser.jj to make "?" a special character so that it
    can be used as a wildcard. Updated the TestWildcard unit test
    as well. (Ralf Hettesheimer via carlson)

1.2 RC5

 1. Renamed build.properties to default.properties and updated
    the BUILD.txt document to describe how to override the
    default.properties settings without having to edit the file. This
    brings the build process closer to Scarab's build process.
    (jon)

 2. Added MultiFieldQueryParser class. (Kelvin Tan, via otis)

 3. Updated "powered by" links. (otis)

 4. Fixed instructions for setting up JavaCC - Bug #7017 (otis)

 5. FSDirectory now throws an exception if it cannot create the directory
    - Bug #6914 (Eugene Gluzberg via otis)

 6. Updated MultiSearcher, MultiFieldParse, Constants, DateFilter,
    LowerCaseTokenizer javadoc (otis)

 7. Added fix to avoid NullPointerException in results.jsp
    (Mark Hayes via otis)

 8. Changed Wildcard search to match 0 or more characters instead of 1 or more
    (Lee Mallobone, via otis)

 9. Fixed an offset error in GermanStemFilter - Bug #7412
    (Rodrigo Reyes, via otis)

 10. Added unit tests for wildcard search and DateFilter (otis)

 11. Allow co-existence of indexed and non-indexed fields with the same name
     (cutting/casper, via otis)

 12. Added an escape character to the query parser.
     (briangoetz)

 13. Applied a patch that ensures that searches that use DateFilter
     don't throw an exception when no matches are found.
     (David Smiley, via otis)

 14. Fixed bugs in DateFilter and wildcardquery unit tests. (cutting, otis, carlson)


1.2 RC4

 1. Updated contributions section of website.
    Added XML Document #3 implementation to Document Section.
    Also added Term Highlighting to Misc Section. (carlson)

 2. Fixed NullPointerException for phrase searches containing
    unindexed terms, introduced in 1.2RC3. (cutting)

 3. Changed document deletion code to obtain the index write lock,
    enforcing the fact that document addition and deletion cannot be
    performed concurrently. (cutting)

 4. Various documentation cleanups. (otis, acoliver)

 5. Updated "powered by" links. (cutting, jon)

 6. Fixed a bug in the GermanStemmer. (Bernhard Messer, via otis)

 7. Changed Term and Query to implement Serializable. (scottganyo)

 8. Fixed so that indexes added with IndexWriter.addIndexes() are never
    deleted. (cutting)

 9. Upgraded to JUnit 3.7. (otis)

1.2 RC3

 1. IndexWriter: fixed a bug where adding an optimized index to an
    empty index failed. This was encountered using addIndexes to copy
    a RAMDirectory index to an FSDirectory.

 2. RAMDirectory: fixed a bug where RAMInputStream could not read
    across more than a single buffer boundary.

 3. Fix query parser so it accepts queries with Unicode characters.
    (briangoetz)

 4. Fix query parser so that PrefixQuery is used in preference to
    WildcardQuery when there's only an asterisk at the end of the
    term. Previously PrefixQuery would never be used.

 5. Fix tests so they compile; fix ant file so it compiles tests
    properly. Added test cases for Analyzers and PriorityQueue.

 6. Updated demos, added Getting Started documentation. (acoliver)

 7. Added 'contributions' section to website & docs. (carlson)

 8. Removed JavaCC from source distribution for copyright reasons.
    Folks must now download this separately from Metamata in order to
    compile Lucene. (cutting)

 9. Substantially improved the performance of DateFilter by adding the
    ability to reuse TermDocs objects. (cutting)

10. Added IndexReader methods:
        public static boolean indexExists(String directory);
        public static boolean indexExists(File directory);
        public static boolean indexExists(Directory directory);
        public static boolean isLocked(Directory directory);
        public static void unlock(Directory directory);
    (cutting, otis)

11. Fixed bugs in GermanAnalyzer (gschwarz)


1.2 RC2
 - added sources to distribution
 - removed broken build scripts and libraries from distribution
 - SegmentsReader: fixed potential race condition
 - FSDirectory: fixed so that getDirectory(xxx,true) correctly
   erases the directory contents, even when the directory
   has already been accessed in this JVM.
 - RangeQuery: Fix issue where an inclusive range query would
   include the nearest term in the index above a non-existent
   specified upper term.
 - SegmentTermEnum: Fix NullPointerException in clone() method
   when the Term is null.
 - JDK 1.1 compatibility fix: disabled lock files for JDK 1.1,
   since they rely on a feature added in JDK 1.2.
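
The RangeQuery entry in 1.2 RC2 concerns where term enumeration stops
when the specified upper term does not exist in the index. The following
is a minimal sketch, not Lucene's RangeQuery code: it assumes a
java.util.TreeSet of strings as a stand-in for the sorted term dictionary,
and the class InclusiveRangeSketch and its inclusiveRange method are
hypothetical names used only for illustration. Every enumerated term is
compared against the upper bound before it is collected, so the nearest
term above a missing upper bound is never included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration: a TreeSet stands in for Lucene's sorted term dictionary.
public class InclusiveRangeSketch {

    // Returns every term t with lower <= t <= upper, in sorted order.
    // Each candidate is compared against the upper bound before it is added,
    // so a term sorting after a non-existent upper bound is never collected.
    static List<String> inclusiveRange(TreeSet<String> terms, String lower, String upper) {
        List<String> matches = new ArrayList<>();
        for (String term : terms.tailSet(lower, true)) { // start at lower, inclusive
            if (term.compareTo(upper) > 0) {
                break; // first term past the upper bound: stop without adding it
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(List.of("apple", "banana", "cherry", "date"));
        // "blueberry" is not an indexed term; "cherry" (the nearest term above it) must be excluded.
        System.out.println(inclusiveRange(terms, "apple", "blueberry")); // prints [apple, banana]
        // An upper bound that does exist is kept, because the range is inclusive.
        System.out.println(inclusiveRange(terms, "banana", "cherry"));   // prints [banana, cherry]
    }
}
```

java.util.NavigableSet.subSet(lower, true, upper, true) would return the
same terms; the explicit loop is used here only to make the upper-bound
comparison visible.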

1.2 RC1
 - first Apache release
 - packages renamed from com.lucene to org.apache.lucene
 - license switched from LGPL to Apache
 - ant-only build -- no more makefiles
 - addition of lock files--now fully thread & process safe
 - addition of German stemmer
 - MultiSearcher now supports low-level search API
 - added RangeQuery, for term-range searching
 - Analyzers can choose tokenizer based on field name
 - misc bug fixes.

1.01b
 . last Sourceforge release
 . a few bug fixes
 . new Query Parser
 . new prefix query (search for "foo*" matches "food")

1.0

This release fixes a few serious bugs and also includes some
performance optimizations, a stemmer, and a few other minor
enhancements.

0.04

Lucene now includes a grammar-based tokenizer, StandardTokenizer.

The only tokenizer included in the previous release (LetterTokenizer)
identified terms consisting entirely of alphabetic characters. The
new tokenizer uses a regular-expression grammar to identify more
complex classes of terms, including numbers, acronyms, email
addresses, etc.

StandardTokenizer serves two purposes:

 1. It is a much better, general purpose tokenizer for use by
    applications as is.

    The easiest way for applications to start using
    StandardTokenizer is to use StandardAnalyzer.

 2. It provides a good example of grammar-based tokenization.

    If an application has special tokenization requirements, it can
    implement a custom tokenizer by copying the directory containing
    the new tokenizer into the application and modifying it
    accordingly.

0.01

First open source release.

The code has been re-organized into a new package and directory
structure for this release. It builds OK, but has not been tested
beyond that since the re-organization.