Lucene Change Log

For more information on past and future Lucene versions, please see:
http://s.apache.org/luceneversions

======================= Lucene 10.0.0 =======================

API Changes
---------------------

* LUCENE-10010: AutomatonQuery, CompiledAutomaton, RunAutomaton, RegExp
  classes no longer determinize NFAs. Instead it is the responsibility
  of the caller to determinize.  (Robert Muir)

* LUCENE-10368: IntTaxonomyFacets has been made pkg-private and serves only as an internal
  implementation detail of taxonomy-faceting. (Greg Miller)

* LUCENE-10400: Remove deprecated dictionary constructors in Kuromoji and Nori (Tomoko Uchida)

* LUCENE-10440: TaxonomyFacets and FloatTaxonomyFacets have been made pkg-private and only serve
  as internal implementation details of taxonomy-faceting. (Greg Miller)

* LUCENE-10431: MultiTermQuery.setRewriteMethod() has been removed. (Alan Woodward)

* LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and
  KnnVectorFieldExistsQuery. (Zach Chen, Adrien Grand)

* LUCENE-10561: Reduce class/member visibility of all normalizer and stemmer classes. (Rushabh Shah)

* LUCENE-10266: Move nearest-neighbor search on points to core. (Rushabh Shah)

New Features
---------------------

* LUCENE-10010: Introduce NFARunAutomaton to run NFA directly. (Patrick Zhai)

Improvements
---------------------

* LUCENE-10416: Update Korean Dictionary to mecab-ko-dic-2.1.1-20180720 for Nori.
  (Uihyun Kim)

Optimizations
---------------------
(No changes)

Bug Fixes
---------------------

* LUCENE-10599: LogMergePolicy is more likely to keep merging segments until
  they reach the maximum merge size. (Adrien Grand)

Other
---------------------
* LUCENE-10283: The minimum required Java version was bumped from 11 to 17.
  (Adrien Grand, Uwe Schindler, Dawid Weiss, Robert Muir)

* LUCENE-10253: The @BadApple annotation has been removed from the test
  framework. (Adrien Grand)

* LUCENE-10393: Unify binary dictionary and dictionary writer in Kuromoji and Nori.
  (Tomoko Uchida, Robert Muir)

* LUCENE-10475: Merge dictionary builders in `util` package into `dict` package in Kuromoji and Nori.
  All classes in `org.apache.lucene.analysis.[ja|ko].util` were moved to `org.apache.lucene.analysis.[ja|ko].dict`.
  (Tomoko Uchida)

* LUCENE-10493: Factor out Viterbi algorithm in Kuromoji and Nori to analysis-common. (Tomoko Uchida)

======================= Lucene 9.3.0 =======================

API Changes
---------------------
(No changes)

New Features
---------------------
(No changes)

Improvements
---------------------

* LUCENE-10078: Merge on full flush is now enabled by default with a timeout of
  500ms. (Adrien Grand)

* LUCENE-10585: Facet module code cleanup (copy/paste scrubbing, simplification and some very minor
  optimization tweaks). (Greg Miller)

Optimizations
---------------------
* LUCENE-8519: MultiDocValues.getNormValues should not call getMergedFieldInfos (Rushabh Shah)

Bug Fixes
---------------------

* LUCENE-10574: Prevent pathological O(N^2) merging. (Adrien Grand)

* LUCENE-10582: Fix merging of overridden CollectionStatistics in CombinedFieldQuery (Yannick Welsch)

* LUCENE-10598: SortedSetDocValues#docValueCount() should be always greater than zero. (Lu Xugang)

* LUCENE-10563: Fix failure to tessellate complex polygon (Craig Taverner)

* LUCENE-10605: Fix error in 32-bit JVM object alignment gap calculation (Sun Wuqiang)

* GITHUB#956: Make sure KnnVectorQuery applies search boost. (Julie Tibshirani)

Other
---------------------

* LUCENE-10370: pass proper classpath/module arguments for forking jvms from within tests. (Dawid Weiss)

* LUCENE-10604: Improve ability to test and debug triangulation algorithm in Tessellator.
  (Craig Taverner)

======================= Lucene 9.2.0 =======================

API Changes
---------------------

* LUCENE-10325: Facets API extended to support getTopFacets. (Yuting Gan)

* LUCENE-10482: Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the
  taxoEpoch decide. Add a test case that demonstrates the inconsistencies caused when you reuse taxoArrays on older
  checkpoints. (Gautam Worah)

* LUCENE-10558: Add new constructors to Kuromoji and Nori dictionary classes to support classpath /
  module system usage. It is now possible to use JDK's Class/ClassLoader/Module#getResource(...) apis
  and pass their returned URL to dictionary constructors to load resources from Classpath or Module
  resources. (Uwe Schindler, Tomoko Uchida, Mike Sokolov)

New Features
---------------------

* LUCENE-10312: Add PersianStemmer based on the Arabic stemmer. (Ramin Alirezaee)

* LUCENE-10539: Return a stream of completions from FSTCompletion. (Dawid Weiss)

* LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery
  to speed up computing the number of hits when possible. (Lu Xugang, Luca Cavanna, Adrien Grand)

* LUCENE-10422: Monitor Improvements: `Monitor` can use a custom `Directory`
  implementation. `Monitor` can be created with a readonly `QueryIndex` in order to
  have readonly `Monitor` instances. (Niko Usai)

* LUCENE-10456: Implement rewrite and Weight#count for MultiRangeQuery
  by merging overlapping ranges. (Jianping Weng)

* LUCENE-10444: Support alternate aggregation functions in association facets. (Greg Miller)

Improvements
---------------------

* LUCENE-10229: return -1 for unknown offsets in ExtendedIntervalsSource. Modify highlighting to
  work properly with or without offsets. (Dawid Weiss)

* LUCENE-10494: Implement method to bulk add all collection elements to a PriorityQueue.
  (Bauyrzhan Sakhariyev)

* LUCENE-10484: Add support for concurrent random sampling by calling
  RandomSamplingFacetsCollector#createManager. (Luca Cavanna)

* LUCENE-10467: Throw IllegalArgumentException in Facets#getAllDims and Facets#getTopChildren
  if topN <= 0. (Yuting Gan)

* LUCENE-9848: Correctly sort HNSW graph neighbors when applying diversity criterion (Mayya
  Sharipova, Michael Sokolov)

* LUCENE-10527: Use 2*maxConn for the last layer in HNSW (Mayya Sharipova)

Optimizations
---------------------

* LUCENE-10555: avoid NumericLeafComparator#iteratorCost repeated initialization
  when NumericLeafComparator#setScorer is called. (Jianping Weng)

* LUCENE-10452: Hunspell: call checkCanceled less frequently to reduce the overhead (Peter Gromov)

* LUCENE-10451: Hunspell: don't perform potentially expensive spellchecking after timeout (Peter Gromov)

* LUCENE-10418: More `Query#rewrite` optimizations for the non-scoring case.
  (Adrien Grand)

* LUCENE-10436: Deprecate DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery
  in favor of FieldExistsQuery. (Zach Chen, Michael McCandless, Adrien Grand)

* LUCENE-10481: FacetsCollector will not request scores if it does not use them. (Mike Drob)

* LUCENE-10503: Potential speedup for pure disjunctions whose clauses produce
  scores that are very close to each other. (Adrien Grand)

* LUCENE-10315: Use SIMD instructions to decode BKD doc IDs. (Guo Feng, Adrien Grand, Ignacio Vera)

* LUCENE-8836: Speed up calls to TermsEnum#lookupOrd on doc values terms enums
  and sequences of increasing ords. (Bruno Roustant, Adrien Grand)

* LUCENE-10536: Doc values terms dictionaries now use the first (uncompressed)
  term of each block as a dictionary when compressing suffixes of the other 63
  terms of the block. (Adrien Grand)

* LUCENE-10411: Add nearest neighbors vectors support to ExitableDirectoryReader.
  (Zach Chen, Adrien Grand, Julie Tibshirani, Tomoko Uchida)

* LUCENE-10542: FieldSource exists implementations can avoid value retrieval (Kevin Risden)

* LUCENE-10534: MinFloatFunction / MaxFloatFunction exists check can be slow (Kevin Risden)

* LUCENE-10496: Queries sorted by field now better handle the degenerate case
  when the search order and the index order are in opposite directions.
  (Jianping Weng)

* LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle
  ordToDoc in HNSW vectors (Lu Xugang)

* LUCENE-10488: Facets#getTopDims optimized for taxonomy faceting and
  ConcurrentSortedSetDocValuesFacetCounts. (Yuting Gan)

Bug Fixes
---------------------
* LUCENE-10477: Highlighter: WeightedSpanTermExtractor.extractWeightedSpanTerms now calls Query#rewrite
  multiple times if necessary. (Christine Poerschke, Adrien Grand)

* LUCENE-10491: A correctness bug in the way scores are provided within TaxonomyFacetSumValueSource
  was fixed. (Michael McCandless, Greg Miller)

* LUCENE-10466: Ensure IndexSortSortedNumericDocValuesRangeQuery handles sort field
  types besides LONG (Andriy Redko)

* LUCENE-10292: Suggest: Fix AnalyzingInfixSuggester / BlendedInfixSuggester to correctly return
  existing lookup() results during concurrent build().  Fix other FST based suggesters so that
  getCount() returns results consistent with lookup() during concurrent build().  (hossman)

* LUCENE-10508: Fix some edge cases where a GeoArea was built in a way that vertical planes
  could not evaluate their sign, either because the planes were the same or because the center between
  those planes was lying in one of the planes. (Ignacio Vera)

* LUCENE-10495: Fix return statement of siblingsLoaded() in TaxonomyFacets. (Yuting Gan)

* LUCENE-10533: SpellChecker.formGrams is missing bounds check (Kevin Risden)

* LUCENE-10529: Properly handle when TestTaxonomyFacetAssociations test case randomly indexes
  no documents instead of throwing an NPE. (Greg Miller)

* LUCENE-10470: Check if polygon has been successfully tessellated before we fail (we are failing some valid
  tessellations) and allow filtering edges that fold on top of the previous one. (Ignacio Vera)

* LUCENE-10530: Avoid floating point precision test case bug in TestTaxonomyFacetAssociations.
  (Greg Miller)

* LUCENE-10552: KnnVectorQuery has incorrect equals/hashCode. (Lu Xugang)

* LUCENE-10558: Restore behaviour of deprecated Kuromoji and Nori dictionary constructors for
  custom dictionary support. Please also use new URL-based constructors for classpath/module
  system resources.  (Uwe Schindler, Tomoko Uchida, Mike Sokolov)

* LUCENE-10564: Make sure SparseFixedBitSet#or updates ramBytesUsed. (Julie Tibshirani)

Build
---------------------

* GITHUB#768: Upgrade forbiddenapis to version 3.3.  (Uwe Schindler)

* GITHUB#890: Detect CI builds on GitHub or Jenkins and enable errorprone.  (Uwe Schindler, Dawid Weiss)

* LUCENE-10532: Remove LuceneTestCase.Slow annotation. All tests can be fast. (Robert Muir)

Other
---------------------
* LUCENE-10526: Test-framework: Add FilterFileSystemProvider.wrapPath(Path) method for mock filesystems
  to override if they need to extend the Path implementation. (Gautam Worah, Robert Muir)

* LUCENE-10525: Test-framework: Add detection of illegal windows filenames to WindowsFS. (Gautam Worah)

* LUCENE-10541: Test-framework: limit the default length of MockTokenizer tokens to 255.
  (Robert Muir, Uwe Schindler, Tomoko Uchida, Dawid Weiss)

* GITHUB#854: Allow linking to GitHub pull requests from CHANGES. (Tomoko Uchida, Jan Høydahl)

======================= Lucene 9.1.0 =======================

API Changes
---------------------

* LUCENE-10244: MultiCollector::getCollectors is now public, allowing users to access the wrapped
  collectors. (Andriy Redko)

* LUCENE-10197: UnifiedHighlighter now has a Builder to construct it.  The UH's setters are now
  deprecated.  (Animesh Pandey, David Smiley)

* LUCENE-10301: the test framework is now a module. All the classes have been moved from
  org.apache.lucene.* to org.apache.lucene.tests.* to avoid package name conflicts with the
  core module. (Dawid Weiss)

* LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues.
  (Zach Chen, Michael Sokolov, Julie Tibshirani, Adrien Grand)

* LUCENE-10335: Deprecate helper methods for resource loading in IOUtils and StopwordAnalyzerBase
  that are not compatible with module system (Class#getResourceAsStream() and Class#getResource()
  are caller sensitive in Java 11). Instead add utility method IOUtils#requireResourceNonNull(T)
  to test existence of resource based on null return value.  (Uwe Schindler, Dawid Weiss)

* LUCENE-10349: WordListLoader methods now return unmodifiable CharArraySets.  (Uwe Schindler)

* LUCENE-10377: SortField.getComparator() has changed signature. The second parameter is now
  a boolean indicating whether or not skipping should be enabled on the comparator.
  (Alan Woodward)

* LUCENE-10381: Require users to provide FacetsConfig for SSDV faceting. (Greg Miller)

* LUCENE-10368: IntTaxonomyFacets has been deprecated and is no longer a supported extension point
  for user-created faceting implementations. (Greg Miller)

* LUCENE-10400: Add constructors that take external resource Paths to dictionary classes in Kuromoji and Nori:
  ConnectionCosts, TokenInfoDictionary, and UnknownDictionary. Old constructors that take resource scheme and
  resource path in those classes are deprecated; These are replaced with the new constructors and planned to be
  removed in a future release. (Tomoko Uchida, Uwe Schindler, Mike Sokolov)

* LUCENE-10050: Deprecate DrillSideways#search(Query, Collector) in favor of
  DrillSideways#search(Query, CollectorManager). This reflects the change (LUCENE-10002) being made in
  IndexSearcher#search that trends towards using CollectorManagers over Collectors. (Gautam Worah)

* LUCENE-10420: Move functional interfaces in IOUtils to top-level interfaces.
  (David Smiley, Uwe Schindler, Dawid Weiss, Tomoko Uchida)

* LUCENE-10398: Add static method for getting Terms from LeafReader. (Spike Liu)

* LUCENE-10440: TaxonomyFacets and FloatTaxonomyFacets have been deprecated and are no longer
  supported extension points for user-created faceting implementations. (Greg Miller)

* LUCENE-10431: MultiTermQuery.setRewriteMethod() has been deprecated, and constructor
  parameters for the various implementations added. (Alan Woodward)

* LUCENE-10171: OpenNLPOpsFactory.getLemmatizerDictionary(String, ResourceLoader) now returns a
  DictionaryLemmatizer object instead of a raw String serialization of the dictionary.
  (Spyros Kapnissis via Michael Gibney, Alessandro Benedetti)

New Features
---------------------

* LUCENE-10255: Lucene JARs are now proper modules, with module descriptors and dependency information.
  (Chris Hegarty, Uwe Schindler, Tomoko Uchida, Dawid Weiss)

* LUCENE-10342: Lucene Core now depends on java.logging (JUL) module and reports
  if MMapDirectory cannot unmap mapped ByteBuffers or RamUsageEstimator's object size
  calculations may be off. This was added especially for users running Lucene with the
  Java Module System where some optional features are not available by default or supported.
  For all apps using Lucene it is strongly recommended to explicitly require non-standard
  JDK modules: jdk.unsupported (unmapping) and jdk.management (OOP size for RAM usage calculations).
  It is also recommended to install JUL logging adapters to feed the log events into your app's
  logging system.  (Uwe Schindler, Dawid Weiss, Tomoko Uchida, Robert Muir)

* LUCENE-10330: Make MMapDirectory tests fail by default, if unmapping does not work.
  (Uwe Schindler, Dawid Weiss)

* LUCENE-10223: Add interval function support to StandardQueryParser. Add min-should-match operator
  support to StandardQueryParser. Update and clean up package documentation in flexible query parser
  module. (Dawid Weiss, Alan Woodward)

* LUCENE-10220: Add a utility method to get IntervalSource from analyzed text (or token stream).
  (Uwe Schindler, Dawid Weiss, Alan Woodward)

* LUCENE-10085: Added Weight#count on DocValuesFieldExistsQuery to speed up the query if terms or
  points are indexed.
  (Quentin Pradet, Adrien Grand)

* LUCENE-10263: Added Weight#count to NormsFieldExistsQuery to speed up the query if all
  documents have the field. (Alan Woodward)

* LUCENE-10248: Add SpanishPluralStemFilter, for precise stemming of Spanish plurals.
  For more information, see https://s.apache.org/spanishplural  (Xavier Sanchez Loro)

* LUCENE-10243: StandardTokenizer, UAX29URLEmailTokenizer, and HTMLStripCharFilter have
  been upgraded to Unicode 12.1  (Robert Muir)

* LUCENE-10335: Add ModuleResourceLoader as complement to ClasspathResourceLoader.
  (Uwe Schindler)

* LUCENE-10245: MultiDoubleValues(Source) and MultiLongValues(Source) were added as multi-valued
  versions of DoubleValues(Source) and LongValues(Source) to the facets module. LongValueFacetCounts,
  LongRangeFacetCounts and DoubleRangeFacetCounts were augmented to support these new multi-valued
  abstractions. DoubleRange and LongRange also support creating queries from these multi-valued
  sources. (Greg Miller)

* LUCENE-10250: Add support for arbitrary length hierarchical SSDV facets. (Marc D'mello)

* LUCENE-10395: Add support for TotalHitCountCollectorManager, a collector manager
  based on TotalHitCountCollector that allows users to parallelize counting the
  number of hits. (Luca Cavanna, Adrien Grand)

* LUCENE-10403: Add ArrayUtil#grow(T[]). (Greg Miller)

* LUCENE-10414: Add fn:fuzzyTerm interval function to flexible query parser (Dawid Weiss,
  Alan Woodward)

* LUCENE-10378: Implement Weight#count for PointRangeQuery to provide a faster way to calculate
  the number of matching range docs when each doc has at-most one point and the points are 1-dimensional.
  (Gautam Worah, Ignacio Vera, Adrien Grand)

* LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count. (Ignacio Vera)

* LUCENE-10382: Add support for filtering in KnnVectorQuery. This allows for finding the
  nearest k documents that also match a query. (Julie Tibshirani, Joel Bernstein)

* LUCENE-10237: Add MergeOnFlushMergePolicy to sandbox.
  (Michael Froh, Anand Kotriwal)

Improvements
---------------------

* LUCENE-10313: use java util logging in Luke. Add dynamic log filtering. Drop
  the persistent log previously written to ~/.luke.d/luke.log. Configure Java's default
  logging handlers to persist Luke logs according to your needs. (Tomoko Uchida, Dawid Weiss)

* LUCENE-10238: Upgrade icu4j dependency to 70.1. (Dawid Weiss)

* LUCENE-9820: Extract BKD tree interface and move intersecting logic to the
  PointValues abstract class. (Ignacio Vera, Adrien Grand)

* LUCENE-10262: Lift up restrictions for navigating PointValues#PointTree
  added in LUCENE-9820 (Ignacio Vera)

* LUCENE-9538: Detect polygon self-intersections in the Tessellator. (Ignacio Vera)

* LUCENE-10275: Speed up MultiRangeQuery by using an interval tree. (Ignacio Vera)

* LUCENE-10229: Unify behaviour of match offsets for interval queries on fields
  with or without offsets enabled. (Patrick Zhai)

* LUCENE-10054: Make HnswGraph hierarchical (Mayya Sharipova, Julie Tibshirani, Mike Sokolov,
  Adrien Grand)

* LUCENE-10371: Make IndexRearranger able to arrange segments in a determined order.
  (Patrick Zhai)

Optimizations
---------------------

* LUCENE-10329: Use computed block mask for DirectMonotonicReader#get. (Guo Feng)

* LUCENE-10280: Optimize BKD leaves' doc IDs codec when they are continuous. (Guo Feng)

* LUCENE-10233: Store BKD leaves' doc IDs as bitset in some cases (typically for low cardinality fields
  or sorted indices) to speed up addAll. (Guo Feng, Adrien Grand)

* LUCENE-10225: Improve IntroSelector with 3-ways partitioning. (Bruno Roustant, Adrien Grand)

* LUCENE-10321: Tweak MultiRangeQuery interval tree creation to skip "pulling up" mins. (Greg Miller)

* LUCENE-10252: ValueSource.asDoubleValues and asLongValues should not compute the score unless
  asked to -- typically never.  This fixes a performance regression since 7.3 LUCENE-8099 when some
  older boosting queries were replaced with this. (David Smiley)

* LUCENE-10346: Optimize facet counting for single-valued TaxonomyFacetCounts. (Guo Feng)

* LUCENE-10356: Further optimize facet counting for single-valued TaxonomyFacetCounts. (Greg Miller)

* LUCENE-10379: Count directly into the dense values array in FastTaxonomyFacetCounts#countAll.
  (Guo Feng, Greg Miller)

* LUCENE-10375: Speed up HNSW vectors merge by first writing combined vector
  data to a file. (Julie Tibshirani, Adrien Grand)

* LUCENE-10388: Remove MultiLevelSkipListReader#SkipBuffer to make JVM less confused. (Guo Feng)

* LUCENE-10367: Optimize CoveringQuery for the case when the minimum number of
  matching clauses is a constant. (LuYunCheng via Adrien Grand)

* LUCENE-10412: More `Query#rewrite` optimizations for MatchNoDocsQuery.
  (Adrien Grand)

* LUCENE-10408: Better encoding of doc Ids in vectors. (Mayya Sharipova, Julie Tibshirani, Adrien Grand)

* LUCENE-10424, LUCENE-10439: Optimize the "everything matches" case for count query in PointRangeQuery. (Ignacio Vera, Lu Xugang)

* LUCENE-10084, LUCENE-10435: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery whenever
  terms or points have a docCount that is equal to maxDoc. (Vigya Sharma, Lu Xugang)

* LUCENE-10442: When indexQuery and/or dvQuery is a MatchAllDocsQuery,
  IndexOrDocValuesQuery should be rewritten to MatchAllDocsQuery. (Lu Xugang)

* LUCENE-10450: IndexSortSortedNumericDocValuesRangeQuery can be rewritten to MatchAllDocsQuery. (Lu Xugang)

* LUCENE-10453: Indexing and search speedup with KNN vectors when using
  euclidean distance. (Adrien Grand)

* LUCENE-10455: IndexSortSortedNumericDocValuesRangeQuery now implements the scorerSupplier API. (Lu Xugang)

Changes in runtime behavior
---------------------

* LUCENE-10291: Lucene now only writes files for terms and postings if at least
  one field is indexed with postings. (Yannick Welsch)

* LUCENE-10311: FixedBitSet#approximateCardinality now trades accuracy for
  speed instead of delegating to FixedBitSet#cardinality.
  (Robert Muir, Adrien Grand)

Bug Fixes
---------------------

* LUCENE-10316: fix TestLRUQueryCache.testCachingAccountableQuery failure. (Patrick Zhai)

* LUCENE-10279: Fix equals in MultiRangeQuery. (Ignacio Vera)

* LUCENE-10349: Fix all analyzers to behave according to their documentation:
  getDefaultStopSet() methods now return unmodifiable CharArraySets.  (Uwe Schindler)

* LUCENE-10352: Add missing service provider entries: KoreanNumberFilterFactory,
  DaitchMokotoffSoundexFilterFactory (Uwe Schindler, Robert Muir)

* LUCENE-10352: Fixed ctor argument checks: JapaneseKatakanaStemFilter,
  DoubleMetaphoneFilter (Uwe Schindler, Robert Muir)

* LUCENE-10236: Stop duplicating norms when scoring in CombinedFieldQuery.
  (Zach Chen, Jim Ferenczi, Julie Tibshirani)

* LUCENE-10353: Add random null injection to TestRandomChains. (Robert Muir,
  Uwe Schindler)

* LUCENE-10377: CheckIndex could incorrectly throw an error when checking index sorts
  defined on older indexes. (Alan Woodward)

* LUCENE-9952: Address inaccurate dim counts for SSDV faceting in cases where a dim is configured
  as multi-valued. (Greg Miller)

* LUCENE-10401: Fix lookups on empty doc-value terms dictionaries to no longer
  throw an ArrayIndexOutOfBoundsException. (Adrien Grand)

* LUCENE-10402: Prefix intervals should declare their automaton as binary, otherwise prefixes
  containing multibyte characters will not correctly match. (Alan Woodward)

* LUCENE-10407: Containing intervals could sometimes yield incorrect matches when wrapped
  in a disjunction. (Alan Woodward, Dawid Weiss)

* LUCENE-10405: When using the MemoryIndex, binary and Sorted doc values are stored
  as BytesRef instead of BytesRefHash so they don't have a limit on size. (Ignacio Vera)

* LUCENE-10428: Queries with a misbehaving score function may no longer cause
  infinite loops in their parent BooleanQuery.
  (Ankit Jain, Daniel Doubrovkine, Adrien Grand)

* LUCENE-10431: MultiTermQuery no longer includes its rewrite method in its hashcode
  calculation, as this could cause problems with wrapper queries like BooleanQuery which
  expect their child queries hashcodes to be stable. (Alan Woodward)

* LUCENE-10469: Fix ScoreMode propagation by ConstantScoreQuery. (Adrien Grand)

Other
---------------------

* LUCENE-10273: Deprecate SpanishMinimalStemFilter in favor of SpanishPluralStemFilter. (Robert Muir)

* LUCENE-10284: Upgrade morfologik-stemming to 2.1.8. (Dawid Weiss)

* LUCENE-10310: TestXYDocValuesQueries#doRandomDistanceTest no longer produces random circles
  with a radius of 0.

* LUCENE-10352: Removed duplicate instances of StringMockResourceLoader and migrated class to
  test-framework.  (Uwe Schindler, Robert Muir)

* LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test
  and discover classes to check from module system. The test now checks all analyzer modules,
  so it may discover new bugs outside of analysis:common module.  (Uwe Schindler, Robert Muir)

* LUCENE-10413: Make Ukrainian default stop words list available as a public getter. (Alan Woodward)

* LUCENE-10437: Polygon tessellator throws a more informative error message when the provided polygon
  does not contain enough non-collinear points. (Ignacio Vera)

======================= Lucene 9.0.0 =======================

New Features
---------------------

* LUCENE-9322, LUCENE-9855: Vector-valued fields, Lucene90 Codec (Mike Sokolov, Julie Tibshirani, Tomoko Uchida)

* LUCENE-9004, LUCENE-10040: Approximate nearest vector search via NSW graphs (Mike Sokolov, Tomoko Uchida et al.)

* LUCENE-9659: SpanPayloadCheckQuery now supports inequalities. (Kevin Watters, Gus Heck)

* LUCENE-9589: Swedish Minimal Stemmer (janhoy)

* LUCENE-9313: Add SerbianAnalyzer based on the snowball stemmer. (Dragan Ivanovic)

* LUCENE-10095: Add NepaliAnalyzer based on the snowball stemmer. (Robert Muir)

* LUCENE-10096: Add TamilAnalyzer based on the snowball stemmer. (Robert Muir)

* LUCENE-10102: Add JapaneseCompletionFilter for Input Method-aware auto-completion (Tomoko Uchida, Robert Muir, Jun Ohtani)

System Requirements
---------------------

* LUCENE-8738: Move to Java 11 as minimum Java version.
  (Adrien Grand, Uwe Schindler)

API Changes
598---------------------
599
600* LUCENE-8638: Remove many deprecated methods and classes including FST.lookupByOutput(),
601  LegacyBM25Similarity and Jaspell suggester.
602
603* LUCENE-8982: Separate out native code to another module to allow cpp
604  build with gradle. This also changes the name of the native "posix-support"
605  library to LuceneNativeIO. (Zachary Chen, Dawid Weiss)
606
607* LUCENE-9562: All binary analysis packages (and corresponding
608  Maven artifacts) with names containing '-analyzers-' have been renamed
609  to '-analysis-'. (Dawid Weiss)
610
611* LUCENE-8474: RAMDirectory and associated deprecated classes have been
612  removed. (Dawid Weiss)
613
614* LUCENE-3041: The deprecated Weight#extractTerms() method has been
615  removed (Alan Woodward, Simon Willnauer, David Smiley, Luca Cavanna)
616
617* LUCENE-8805: StoredFieldVisitor#stringField now takes a String rather than a
618  byte[] that stores the UTF-8 bytes of the stored string.
619  (Namgyu Kim via Adrien Grand)
620
621* LUCENE-8811: BooleanQuery#setMaxClauseCount() and #getMaxClauseCount() have
622  moved to IndexSearcher. The checks are now implemented using a QueryVisitor
623  and apply to all queries, rather than only booleans. (Atri Sharma, Adrien
624  Grand, Alan Woodward)
625
626* LUCENE-8909: The deprecated IndexWriter#getFieldNames() method has been removed.
627  (Adrien Grand, Munendra S N)
628
629* LUCENE-8948: Change "name" argument in ICU factories to "form". Here, "form" is
630  named after "Unicode Normalization Form". (Tomoko Uchida)
631
632* LUCENE-8933: Validate JapaneseTokenizer user dictionary entry. (Tomoko Uchida)
633
634* LUCENE-8905: Better defence against malformed arguments in TopDocsCollector
635  (Atri Sharma)
636
637* LUCENE-9089: FST Builder renamed FSTCompiler with fluent-style Builder.
638  (Bruno Roustant)
639
640* LUCENE-9212: Deprecated Intervals.multiterm() methods that take a bare Automaton
641  have been removed (Alan Woodward)
642
643* LUCENE-9264: SimpleFSDirectory has been removed in favor of NIOFSDirectory.
644  (Yannick Welsch)
645
646* LUCENE-9281: Use java.util.ServiceLoader to load codec components and analysis
647  factories to be compatible with Java Module System. This allows to load factories
648  without META-INF/service from a Java module exposing the factory in the module
649  descriptor. This breaks backwards compatibility as custom analysis factories
650  must now also implement the default constructor (see MIGRATE.md).
651  (Uwe Schindler, Dawid Weiss)
652
653* LUCENE-9307: BufferedIndexInput#setBufferSize has been removed. (Adrien Grand)
654
655* LUCENE-9340: SimpleBindings#add(SortField) has been removed. (Alan Woodward)
656
657* LUCENE-9462: Fields without positions should still return MatchIterator.
658  (Alan Woodward, Dawid Weiss)
659
660* LUCENE-9516: Removed the ability to replace the IndexingChain / DocConsumer
661  in Lucenes IndexWriter. The interface is not sufficient to efficiently
662  replace the functionality with reasonable efforts. (Simon Willnauer)
663
664* LUCENE-9317 LUCENE-9318 LUCENE-9319 LUCENE-9558 LUCENE-9600 : Clean up package name conflicts
665  between modules. See MIGRATE.md for details. (David Ryan, Tomoko Uchida, Uwe Schindler, Dawid Weiss)
666
667* LUCENE-9646: Set BM25Similarity discountOverlaps via the constructor (Patrick Marty via Bruno Roustant)
668
669* LUCENE-9480: Make DataInput's skipBytes(long) abstract as the implementation was not performant.
670  IndexInput's api is unaffected: skipBytes() is implemented via seek(). (Greg Miller)
671
672* LUCENE-9796: SortedDocValues no longer extends BinaryDocValues, as binaryValue() was not performant.
673  See MIGRATE.md for details. (Robert Muir)
674
675* LUCENE-9853: JapaneseAnalyzer should use CJKWidthCharFilter for full-width and half-width character normalization.
676  (Tomoko Uchida)
677
678* LUCENE-9387: Removed CodecReader#ramBytesUsed. (Adrien Grand)
679
680* LUCENE-9334: Require consistency between data-structures on a per-field basis.
681  A field across all documents within an index must be indexed with the same index
682  options and data-structures. As a consequence of this, doc values updates are
683  only applicable for fields that are indexed with doc values only. (Mayya Sharipova,
684  Adrien Grand, Simon Willnauer)
685
686* LUCENE-9047: Directory API is now little endian. (Ignacio Vera, Adrien Grand)
687
688* LUCENE-9948: No longer require the user to specify whether or not a field is multi-valued in
689  LongValueFacetCounts (detect automatically based on what is indexed). (Greg Miller)
690
691* LUCENE-9843: Remove compression option on default codec's docvalues. (Jack Conradson)
692
693* LUCENE-9204: SpanQuery and its subclasses have been moved from core/ into the
694  queries/ module. (Alan Woodward)
695
696* LUCENE-9454: Analyzer no longer has a mutable version field. (Alan Woodward)
697
698* LUCENE-9956: Expose the getBaseQuery, getDrillDownQueries APIs from DrillDownQuery (Gautam Worah)
699
700* LUCENE-8143: SpanBoostQuery has been removed. (Alan Woodward)
701
702* LUCENE-9998: Remove unused parameter fis in StoredFieldsWriter.finish() and TermVectorsWriter.finish(),
703  including those subclasses. (kkewwei)
704
705* LUCENE-7020: TieredMergePolicy#setMaxMergeAtOnceExplicit has been removed.
706  TieredMergePolicy no longer sets a limit on the maximum number of segments
707  that can be merged at once via a forced merge. (Adrien Grand, Shawn Heisey)
708
709* LUCENE-10027: Directory reader open API from indexCommit and leafSorter has been modified
710  to add an extra parameter - minSupportedMajorVersion. (Mayya Sharipova)
711
712* LUCENE-9620: Added a (sometimes) faster implementation for IndexSearcher#count that relies on the new Weight#count API.
713  The Weight#count API represents a cleaner way for Query classes to optimize their counting method.
714  (Gautam Worah, Adrien Grand)
715
716* LUCENE-10089: Add a method to SortField that allows enabling or disabling the numeric sort
717  optimization that uses the points index to skip over non-competitive documents,
718  which is enabled by default from 9.0 (Mayya Sharipova, Adrien Grand)
719
720* LUCENE-10115: Add an extension point, BaseQueryParser#getFuzzyDistance, to allow custom
721  query parsers to determine the similarity distance for fuzzy queries. (Chris Hegarty)
722
723* LUCENE-10132: Support addition of diagnostics by custom merge policies (Chris Hegarty)
724
725* LUCENE-9325: Sort is now final, and the `setSort()` method has been removed (Alan Woodward)
726
727* LUCENE-9431: The UnifiedHighlighter's WEIGHT_MATCHES flag is now set by default, provided its
728  requirements are met.  It can be disabled by overriding getFlags (Animesh Pandey, David Smiley)
729
730* LUCENE-10158: Add a new interface Unwrappable to the utils package to allow code to
731  unwrap wrappers/delegators that are added by Lucene's testing framework. This will allow
732  testing new MMapDirectory implementation based on JDK Project Panama. (Uwe Schindler)
733
734* LUCENE-10260: LucenePackage class has been removed. The implementation string can be
735  retrieved from Version.getPackageImplementationVersion(). (Uwe Schindler, Dawid Weiss)
736
737Improvements
738---------------------
739
740* LUCENE-10234: Added Automatic-Module-Name to all JARs. This is the first step to enable full Java
741  module system (JMS) support in later Lucene versions. At the moment, the automatic names should
742  not be considered stable. (Dawid Weiss, Uwe Schindler)
743
744* LUCENE-10182: TestRamUsageEstimator used RamUsageTester.sizeOf throughout, making some of the
745  tests trivial. Now, it compares results from RamUsageEstimator with those from RamUsageTester.
746  To prevent this error in the future, RamUsageTester.sizeOf was renamed to ramUsed.
747  (Uwe Schindler, Dawid Weiss, Stefan Vodita)
748
749* LUCENE-10129: RamUsageEstimator overloads the shallowSizeOf method for primitive arrays
750  to avoid falling back on shallowSizeOf(Object), which could lead to performance traps.
751  (Robert Muir, Uwe Schindler, Stefan Vodita)
752
753* LUCENE-10139: ExternalRefSorter returns a covariant subtype of BytesRefIterator
754  that is Closeable. (Dawid Weiss).
755
756* LUCENE-10135: Correct passage selector behavior for long matching snippets (Dawid Weiss).
757
758* LUCENE-9960: Avoid unnecessary top element replacement for equal elements in PriorityQueue. (Dawid Weiss)
759
760* LUCENE-9633: Improve match highlighter behavior for degenerate intervals (on non-existing positions).
761  (Dawid Weiss)
762
763* LUCENE-9618: Do not call IntervalIterator.nextInterval after NO_MORE_DOCS is returned. (Patrick Zhai)
764
765* LUCENE-9576: Improve ConcurrentMergeScheduler settings by default, assuming modern I/O.
766  Previously Lucene was too conservative, jumping through hoops to detect if disks were SSD-backed.
767  In many common modern cases (VMs, RAID arrays, containers, encrypted mounts, non-Linux OS),
768  the pessimistic heuristics were wrong, resulting in slower indexing performance. Heuristics were
769  also complex and would trigger JDK issues even on unrelated mount points. Merge scheduler defaults
770  are now modernized and the heuristics removed. Users with spinning disks that want to maximize I/O
771  performance should tweak ConcurrentMergeScheduler. (Robert Muir)
772
773* LUCENE-9463: Query match region retrieval component, passage scoring and formatting
774  for building custom highlighters. (Alan Woodward, Dawid Weiss)
775
776* LUCENE-9370: RegExp query is no longer lenient about inappropriate backslashes and
777  follows the Java Pattern policy for rejecting illegal syntax.  (Mark Harwood)
778
779* LUCENE-9336: RegExp query now supports \w \W \d \D \s \S expressions.
780  This is a break with previous behaviour where these were (mis)interpreted
781  as literally the characters w W d etc. (Mark Harwood)
782
783* LUCENE-8757: When provided with an ExecutorService to run queries across
784  multiple threads, IndexSearcher now groups small segments together, up to
785  250k docs per slice. (Atri Sharma via Adrien Grand)
786
787* LUCENE-8857: Introduce custom tiebreakers in TopDocs.merge for tie-breaking
788  docs with equal scores. Also, remove the ability of TopDocs.merge to set shard
789  indices (Atri Sharma, Adrien Grand, Simon Willnauer)
790
791* LUCENE-8958: Shared count early termination for relevance sorted indices (Atri Sharma)
792
793* LUCENE-8937: Avoid aggressive stemming on numbers in the FrenchMinimalStemmer.
794  (Adrien Gallou via Tomoko Uchida)
795
796* LUCENE-8596: Kuromoji user dictionary now accepts entries containing a hash mark (#), which was
797  previously treated as beginning a line-ending comment (Satoshi Kato and Masaru Hasegawa via
798  Michael Sokolov)
799
800* LUCENE-9109: Use StackWalker to implement TestSecurityManager's detection
801  of JVM exit (Uwe Schindler)
802
803* LUCENE-9110: Refactor stack analysis in tests to use generalized LuceneTestCase
804  methods that use StackWalker (Uwe Schindler)
805
806* LUCENE-9206: IndexMergeTool gets additional options to control the merging.
807  This tool no longer forceMerge(1)s to a single segment by default. If you
808  rely upon this behavior, pass -max-segments 1 instead. (Robert Muir)
809
810* LUCENE-9220: Upgrade snowball to 2.0. New snowball stemmers: Hindi, Indonesian,
811  Nepali, Serbian, and Tamil. New stoplist: Indonesian. Adds gradle 'snowball'
812  task to regenerate and ease future upgrades. (Robert Muir, Dawid Weiss)
813
814* LUCENE-9354: Improvements to the Snowball French stopwords list, so that it is less
815  aggressive. (Philippe Ouellet)
816
817* LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation (Atri Sharma, David Smiley)
818
819* LUCENE-9074: Introduce Slice Executor For Dynamic Runtime Execution Of Slices (Atri Sharma)
820
821* LUCENE-9280: Add an ability for field comparators to skip non-competitive documents.
822  Creating a TopFieldCollector with totalHitsThreshold less than Integer.MAX_VALUE
823  instructs Lucene to skip non-competitive documents whenever possible. For numeric
824  sort fields the skipping functionality works when the same field is indexed both
825  with doc values and points. In this case, there is an assumption that the same data is
826  stored in these points and doc values (Mayya Sharipova, Jim Ferenczi, Adrien Grand)
827
828* LUCENE-9449: Enhance DocComparator to provide an iterator over competitive
829  documents when searching with "after". This iterator can quickly position
830  on the desired "after" document skipping all documents and segments before
831  "after". Also redesign numeric comparators to provide skipping functionality
832  by default. (Mayya Sharipova, Jim Ferenczi)
833
834* LUCENE-9527: Upgrade javacc to 7.0.4, regenerate query parsers. (Dawid Weiss)
835
836* LUCENE-9531: Consolidated CharStream and FastCharStream classes: these have been moved
837  from each query parser package to org.apache.lucene.queryparser.charstream (Dawid Weiss).
838
839* LUCENE-9450: Use BinaryDocValues for the taxonomy index instead of StoredFields.
840  Add backwards compatibility tests for the taxonomy index. (Gautam Worah, Michael McCandless)
841
842* LUCENE-9605: Update snowball to d8cf01ddf37a, adds Yiddish stemmer. (Robert Muir)
843
844* LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag,
845  and rename to DirectIODirectory (Zach Chen, Uwe Schindler, Mike McCandless, Dawid Weiss).
846
847* LUCENE-9674: Implement faster advance on VectorValues using binary search.
848  (Anand Kotriwal, Mike Sokolov)
849
850* LUCENE-9794: Speed up implementations of DataInput.skipBytes(). (Greg Miller)
851
852* LUCENE-9898: Removes no longer used scorePayload method from BM25Similarity
853  (Pieter van Boxtel)
854
855* LUCENE-9850: Switch to PFOR encoding for doc IDs (instead of FOR). (Greg Miller)
856
857* LUCENE-9929: Add NorwegianNormalizationFilter, which does the same as ScandinavianNormalizationFilter except
858  it does not fold oo->ø and ao->å. (janhoy, Robert Muir, Adrien Grand)
859
860* LUCENE-9535: Improve DocumentsWriterPerThreadPool to prefer larger instances.
861  (Adrien Grand)
862
863* LUCENE-10000: MultiCollectorManager now has parity with MultiCollector with respect to how it
864  handles CollectionTerminationException and setMinCompetitiveScore calls. (Greg Miller)
865
866* LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)
867  (Uwe Schindler)
868
869* LUCENE-9662: Make CheckIndex concurrent by parallelizing index check across segments.
870  (Zach Chen, Mike McCandless, Dawid Weiss, Robert Muir)
871
872* LUCENE-9476: Add new getBulkPath API to DirectoryTaxonomyReader to more efficiently retrieve FacetLabels for multiple
873  facet ordinals at once. This API is 2-4% faster than iteratively calling getPath.
874  The getPath API now throws an IAE instead of returning null if the ordinal is out of bounds.
875  (Gautam Worah, Mike McCandless)
876
877* LUCENE-10113: Use VarHandles to access int/long/short primitive types in byte arrays.
878  This improves readability and performance of encoding/decoding of primitives to index
879  file format in input/output classes like DataInput / DataOutput and codecs.
880  (Uwe Schindler, Robert Muir)
881
882* LUCENE-10112: Improve LZ4 Compression performance with direct primitive read/writes.
883  (Tim Brooks, Uwe Schindler, Robert Muir, Adrien Grand)
884
885* LUCENE-10125: Optimize primitive writes in OutputStreamIndexOutput.
886  (Uwe Schindler, Robert Muir, Adrien Grand)
887
888* LUCENE-10143: Delegate primitive writes in RateLimitedIndexOutput.
889  (Uwe Schindler, Robert Muir, Adrien Grand)
890
891* LUCENE-10145, LUCENE-10153: Faster flushes and merges of points by leveraging
892  VarHandles. (Adrien Grand)
893
894* LUCENE-10201: Spatial-Extras: Upgrade Spatial4j to 0.8, which improves a variety of minor things.
895  See release notes. https://github.com/locationtech/spatial4j/releases/tag/spatial4j-0.8
896  (David Smiley)
897
898* LUCENE-10062: Switch taxonomy faceting to use numeric doc values for storing ordinals instead of binary doc values
899  with its own custom encoding. (Greg Miller)
900
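For LUCENE-10113 (and the little-endian Directory API of LUCENE-9047), the following is a minimal
sketch of VarHandle-based primitive access in a byte array. It uses only JDK APIs; the class and
method names are hypothetical illustrations and are not Lucene's actual code.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical sketch: not a Lucene class.
public class LittleEndianVarHandleSketch {
  // Views over byte[] that read and write primitives in little-endian order.
  private static final VarHandle INT_LE =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
  private static final VarHandle LONG_LE =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

  // Writes a 4-byte little-endian int starting at the given byte offset.
  public static void writeInt(byte[] buf, int offset, int value) {
    INT_LE.set(buf, offset, value);
  }

  // Reads a 4-byte little-endian int starting at the given byte offset.
  public static int readInt(byte[] buf, int offset) {
    return (int) INT_LE.get(buf, offset);
  }

  // Writes an 8-byte little-endian long starting at the given byte offset.
  public static void writeLong(byte[] buf, int offset, long value) {
    LONG_LE.set(buf, offset, value);
  }

  // Reads an 8-byte little-endian long starting at the given byte offset.
  public static long readLong(byte[] buf, int offset) {
    return (long) LONG_LE.get(buf, offset);
  }

  public static void main(String[] args) {
    byte[] buf = new byte[12];
    writeInt(buf, 0, 0x01020304);
    writeLong(buf, 4, 0x1122334455667788L);
    // The low-order byte (0x04) is stored first in little-endian order.
    System.out.println(Integer.toHexString(buf[0]));
    System.out.println(Integer.toHexString(readInt(buf, 0)));
    System.out.println(Long.toHexString(readLong(buf, 4)));
  }
}
```

A single VarHandle call replaces a hand-written sequence of shifts and masks per byte, which is
where the readability gain mentioned in the entry comes from.
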
901Bug fixes
902---------------------
903
904* LUCENE-9686: Fix read past EOF handling in DirectIODirectory. (Zach Chen,
905  Julie Tibshirani)
906
907* LUCENE-8663: NRTCachingDirectory.slowFileExists may open a file while
908  it's inaccessible. (Dawid Weiss)
909
910* LUCENE-9117: RamUsageEstimator hangs with AOT compilation. Removed any attempt to
911  estimate Long.valueOf cache size. (Cleber Muramoto, Dawid Weiss)
912
913* LUCENE-9290: Don't assume that different XYPoint have different hash code
914  (Ignacio Vera via Mike Drob)
915
916* LUCENE-9372: Fix paths for cygwin/msys before gradle wrapper jar lookup.
917  (Peter Barna)
918
919* LUCENE-9365: FuzzyQuery was missing matches when prefix length was equal to the term length
920  (Mark Harwood, Mike Drob)
921
922* LUCENE-9580: Fix bug in the polygon tessellator when introducing collinear edges during polygon
923  splitting. (Ignacio Vera)
924
925* LUCENE-9930: The Ukrainian analyzer was reloading its dictionary for every new
926  TokenStreamComponents, which could lead to memory leaks. (Alan Woodward)
927
928* LUCENE-9940: The order of disjuncts in DisjunctionMaxQuery does not matter
929  for equality checks (Alan Woodward)
930
931* LUCENE-9971: Requesting facet counts for unseen dimensions in SortedSetDocValuesFacetCounts and
932  ConcurrentSortedSetDocValuesFacetCounts now returns null / -1 instead of throwing
933  IllegalArgumentException as per Javadoc spec in Facets. (Alexander Lukyanchikov)
934
935* LUCENE-9823: Prevent unsafe rewrites for SynonymQuery and CombinedFieldQuery. Before, rewriting
936  could slightly change the scoring when weights were specified. (Naoto Minami via Julie Tibshirani)
937
938* LUCENE-10047: Fix a value de-duping bug in LongValueFacetCounts and RangeFacetCounts
939  (Greg Miller)
940
941* LUCENE-10101, LUCENE-9281: Use getField() instead of getDeclaredField() to
942  minimize security impact by analysis SPI discovery. (Uwe Schindler)
943
944* LUCENE-10114: Remove unused byte order mark in Lucene90PostingsWriter. This
945  was initially introduced by accident in Lucene 8.4. (Uwe Schindler)
946
947* LUCENE-10140: Fix cases where minimizing interval iterators could return
948  incorrect matches (Nikolay Khitrin, Alan Woodward)
949
950Changes in Backwards Compatibility Policy
951
952* LUCENE-9904: regenerated UAX29URLEmailTokenizer and the corresponding analyzer with up-to-date top
953  level domains. This may change the token sequence compared to previous Lucene versions. (Dawid Weiss)
954
955* LUCENE-9669: DirectoryReader#open now accepts an argument to open indices created with versions
956  older than N-1. Lucene can now open indices created with a major version of N-2 in read-only mode.
957  Opening an index created with a major version of N-2 with an IndexWriter is not supported.
958  Furthermore, Lucene only supports file-format compatibility, which enables reading of old indices, while
959  semantic changes like analysis or certain encoding on top of the file format are only supported on
960  a best-effort basis. (Simon Willnauer)
961
962* LUCENE-10232: Fix MultiRangeQuery to confirm all dimensions for a given range match. (Greg Miller)
963
964Build
965---------------------
966
967* LUCENE-9077 LUCENE-9433: Support Gradle build, remove Ant support from trunk (Dawid Weiss, Erick Erickson, Uwe Schindler et al.)
968
969* LUCENE-8768: Fix Javadocs build in Java 11. (Namgyu Kim)
970
971* LUCENE-9544: add regenerate gradle script for nori dictionary (Namgyu Kim)
972
973* LUCENE-10195: Add gradle cache option and make some tasks cacheable. (Jerome Prinet, Dawid Weiss)
974
975* LUCENE-10198: Allow external JAVA_OPTS in gradlew scripts; use sane defaults
976  (balmukund.mandal@intel.com, Dawid Weiss)
977
978* LUCENE-10163: Move LICENSE and NOTICE files to top level to satisfy src artifact requirements (janhoy)
979
980Other
981---------------------
982
983* LUCENE-10122: Use NumericDocValues to store taxonomy parent array (Patrick Zhai)
984
985* LUCENE-10136: allow 'var' declarations in source code (Dawid Weiss)
986
987* LUCENE-9570, LUCENE-9564: Apply google java format and enforce it on source Java files.
988  Review diffs and correct automatic formatting oddities. (Erick Erickson,
989  Bruno Roustant, Dawid Weiss)
990
991* LUCENE-9631: Properly override slice() on subclasses of OffsetRange. (Dawid Weiss)
992
993* LUCENE-9391: Upgrade HPPC to 0.8.2. (Patrick Zhai)
994
995* LUCENE-10021: Upgrade HPPC to 0.9.0. Replace usage of ...ScatterMap to ...HashMap. (Patrick Zhai)
996
997* LUCENE-9092: upgrade randomizedtesting to 2.7.5 (Dawid Weiss)
998
999* LUCENE-8656: Deprecations in FuzzyQuery and get compiler warnings out of
1000  queryparser code (Alan Woodward, Erick Erickson)
1001
1002* LUCENE-9344: Convert .txt files to properly formatted .md files. (Tomoko Uchida, Uwe Schindler)
1003
1004* LUCENE-9267: Update MatchingQueries documentation to correct
1005  time unit. (Pierre-Luc Perron via Mike Drob)
1006
1007* LUCENE-9411: Fail compilation on warnings, 9x gradle-only (Erick Erickson, Dawid Weiss)
1008  Deserves mention here as well as Lucene CHANGES.txt since it affects both.
1009
1010* LUCENE-9215: Replace checkJavaDocs.py with doclet (Robert Muir, Dawid Weiss, Uwe Schindler)
1011
1012* LUCENE-9497: Integrate Error Prone, a static analysis tool during compilation (Dawid Weiss, Varun Thacker)
1013
1014* LUCENE-9627: Remove unused Lucene50FieldInfosFormat codec and slightly refactor some codecs
1015  to separate reading header/footer from reading content of the file. (Ignacio Vera)
1016
1017* LUCENE-9773: Upgrade icu to 68.2 (Robert Muir)
1018
1019* LUCENE-9822: Add assertion to PFOR exception encoding, documenting the BLOCK_SIZE assumption. (Greg Miller)
1020
1021* LUCENE-9883: Turn on ecj missingEnumCaseDespiteDefault setting. (Zach Chen)
1022
1023* LUCENE-9705: Make new versions of all index formats for the Lucene90 codec and move
1024  the existing ones to the backwards codecs. (Julie Tibshirani, Ignacio Vera)
1025
1026* LUCENE-9907: Remove dependency on PackedInts#getReader() from the current codecs and move the
1027  method to backwards codec. (Ignacio Vera)
1028
1029* LUCENE-10024: Catch NoSuchFileException when opening index directory with Luke.
1030  (Michael Wechner, Tomoko Uchida)
1031
1032======================= Lucene 8.11.1 =======================
1033
1034Bug Fixes
1035---------------------
1036* SOLR-15843: Update Log4J to 2.16 (Mike Drob, janhoy)
1037
1038======================= Lucene 8.11.0 =======================
1039
1040API Changes
1041---------------------
1042(No changes)
1043
1044New Features
1045---------------------
1046(No changes)
1047
1048Improvements
1049---------------------
1050
1051* LUCENE-9662: Make CheckIndex concurrent by parallelizing index check across segments.
1052  (Zach Chen, Mike McCandless, Dawid Weiss, Robert Muir)
1053
1054* LUCENE-10103: Make QueryCache respect Accountable queries. (Patrick Zhai)
1055
1056Optimizations
1057---------------------
1058
1059* LUCENE-9673: Substantially improve RAM efficiency of how MemoryIndex stores
1060  postings in memory, and reduced a bit of RAM overhead in
1061  IndexWriter's internal postings book-keeping (mashudong)
1062
1063* LUCENE-10196: Improve IntroSorter with 3-ways partitioning. (Bruno Roustant)
1064
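As a rough illustration of the 3-way partitioning named in LUCENE-10196, the sketch below splits a
range into "less than", "equal to" and "greater than" the pivot, so that runs of duplicate values
are never recursed into. This is a plain quicksort written for illustration only; it is not
Lucene's IntroSorter, and the class name, method names and pivot choice are assumptions.

```java
// Hypothetical sketch: not Lucene's IntroSorter.
public class ThreeWayPartitionSketch {
  // Partitions a[from..to) around pivot and returns {lt, gt} such that
  // a[from..lt) < pivot, a[lt..gt) == pivot and a[gt..to) > pivot.
  public static int[] partition(int[] a, int from, int to, int pivot) {
    int lt = from;
    int i = from;
    int gt = to;
    while (i < gt) {
      if (a[i] < pivot) {
        swap(a, lt++, i++);
      } else if (a[i] > pivot) {
        swap(a, i, --gt); // do not advance i: the swapped-in element is unexamined
      } else {
        i++;
      }
    }
    return new int[] {lt, gt};
  }

  // Sorts a[from..to) by recursing only into the strictly smaller and larger parts.
  public static void sort(int[] a, int from, int to) {
    if (to - from < 2) {
      return;
    }
    int pivot = a[from + (to - from) / 2]; // assumed pivot choice: middle element
    int[] bounds = partition(a, from, to, pivot);
    sort(a, from, bounds[0]);
    sort(a, bounds[1], to);
  }

  private static void swap(int[] a, int i, int j) {
    int tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }

  public static void main(String[] args) {
    int[] values = {3, 1, 3, 2, 3, 5, 3};
    sort(values, 0, values.length);
    System.out.println(java.util.Arrays.toString(values));
  }
}
```

The benefit shows up on inputs with many equal keys: the whole "equal" block is already in its
final position after one pass and is excluded from both recursive calls.
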
1065Bug Fixes
1066---------------------
1067
1068* LUCENE-10111: NormValuesWriter did not account for the bytes used by DocsWithFieldSet.
1069  (Lu Xugang)
1070
1071* LUCENE-10116: SortedSetDocValuesWriter did not account for the bytes used by DocsWithFieldSet and currentValues.
1072  (Lu Xugang)
1073
1074* LUCENE-10070: Skip deleted docs when accumulating facet counts for all docs. (Ankur Goel, Greg Miller)
1075
1076* LUCENE-10134: ConcurrentSortedSetDocValuesFacetCounts shouldn't share liveDocs Bits across threads.
1077  (Ankur Goel)
1078
1079* LUCENE-10154: NumericLeafComparator to define getPointValues. (Mayya Sharipova, Adrien Grand)
1080
1081* LUCENE-10208: Ensure that the minimum competitive score does not decrease in concurrent search. (Jim Ferenczi, Adrien Grand)
1082
1083Build
1084---------------------
1085
1086* LUCENE-10104, SOLR-15631: Upgrade forbiddenapis to version 3.2.  (Uwe Schindler)
1087
1088Other
1089---------------------
1090
1091* LUCENE-10098: Add docs/links to GermanAnalyzer describing how to decompound nouns. (Robert Muir)
1092
1093======================= Lucene 8.10.1 =======================
1094
1095Bug Fixes
1096---------------------
1097
1098* LUCENE-10110: MultiCollector now handles single leaf collector that wants to skip low-scoring hits
1099  but the combined score mode doesn't allow it. (Jim Ferenczi)
1100
1101* LUCENE-10119: Sort optimization with search_after can wrongly skip documents
1102  whose values are equal to the last value of the previous page (Nhat Nguyen)
1103
1104* LUCENE-10126: Sort optimization with a chunked bulk scorer
1105  can wrongly skip documents (Nhat Nguyen, Mayya Sharipova)
1106
1107======================= Lucene 8.10.0 =======================
1108
1109API Changes
1110---------------------
1111* LUCENE-9962: DrillSideways allows sub-classes to provide "drill down" FacetsCollectors. They
1112  may provide a null collector if they choose to bypass "drill down" facet collection. (Greg Miller)
1113
1114* LUCENE-9902: Change the getValue method from IntTaxonomyFacets to be protected instead of private.
1115  Users can now access the count of an ordinal directly without constructing an extra FacetLabel.
1116  Also use variable length arguments for the getOrdinal call in TaxonomyReader. (Gautam Worah)
1117
1118* LUCENE-10036: Replaced the ScoreCachingWrappingScorer ctor with a static factory method that
1119  ensures unnecessary wrapping doesn't occur. (Greg Miller)
1120
1121* LUCENE-10027: Add a new Directory reader open API from indexCommit and
1122  a custom comparator for sorting leaf readers. (Mayya Sharipova)
1123
1124* LUCENE-7020: TieredMergePolicy#setMaxMergeAtOnceExplicit is deprecated
1125  and the number of segments that get merged via explicit merges is unlimited
1126  by default. (Adrien Grand, Shawn Heisey)
1127
1128New Features
1129---------------------
1130* LUCENE-10083: Analyzer and stemmer for Telugu language (Vinod Singh)
1131
1132* LUCENE-10035: The SimpleText codec now writes skip lists.
1133  (wuda via Adrien Grand)
1134
1135Improvements
1136---------------------
1137* LUCENE-9944: Allow DrillSideways users to provide their own CollectorManager without also requiring
1138  them to provide an ExecutorService. (Greg Miller)
1139
1140* LUCENE-9946: Support for multi-value fields in LongRangeFacetCounts and
1141  DoubleRangeFacetCounts. (Greg Miller)
1142
1143* LUCENE-9965: Added QueryProfilerIndexSearcher and ProfilerCollector to support debugging
1144  query execution strategy and timing. (Jack Conradson, Julie Tibshirani)
1145
1146* LUCENE-9981: Operations.getCommonSuffix/Prefix(Automaton) is now much more
1147  efficient, from a worst case exponential down to quadratic cost in the
1148  number of states + transitions in the Automaton.  These methods no longer
1149  use the costly determinize method, removing the risk of
1150  TooComplexToDeterminizeException (Robert Muir, Mike McCandless)
1151
1152* LUCENE-9981: Operations.determinize now throws TooComplexToDeterminizeException
1153  based on too much "effort" spent determinizing rather than a precise state
1154  count on the resulting returned automaton, to better handle adversarial
1155  cases like det(rev(regexp("(.*a){2000}"))) that spend lots of effort but
1156  result in smallish eventual returned automata.  (Robert Muir, Mike McCandless)
1157
1158* LUCENE-9983: Stop sorting determinize powersets unnecessarily. (Patrick Zhai)
1159
1160* LUCENE-9177: ICUNormalizer2CharFilter no longer requires normalization-inert
1161  characters as boundaries for incremental processing, vastly improving worst-case
1162  performance. (Michael Gibney)
1163
1164* LUCENE-10030: Lazily evaluate score in DrillSidewaysScorer.doQueryFirstScoring
1165  (Grigoriy Troitskiy)
1166
1167* LUCENE-9945: Extend DrillSideways to support exposing FacetCollectors directly.
1168  (Greg Miller, Sejal Pawar)
1169
1170* LUCENE-10043: Decrease default for LRUQueryCache's skipCacheFactor to 10.
1171  This prevents caching a query clause when it is much more expensive than
1172  running the top-level query. (Julie Tibshirani)
1173
1174* LUCENE-5309: Optimize facet counting for single-valued SSDV / StringValueFacetCounts. (Greg Miller)
1175
1176* LUCENE-9917: The BEST_SPEED compression mode now trades more compression ratio
1177  in exchange for faster reads. (Adrien Grand)
1178
1179Optimizations
1180---------------------
1181* LUCENE-9996: Improved memory efficiency of IndexWriter's RAM buffer, in
1182  particular in the case of many fields and many indexing threads.
1183  (Adrien Grand)
1184
1185* LUCENE-10022: Rewrite empty DisjunctionMaxQuery to MatchNoDocsQuery.
1186  (David Harsha via Julie Tibshirani)
1187
1188* LUCENE-10031: Slightly faster segment merging for sorted indices.
1189  (Adrien Grand)
1190
1191* LUCENE-10014: Lucene90DocValuesFormat was using too many bits per
1192  value when compressing via gcd, unnecessarily wasting index storage.
1193  (weizijun)
1194
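To make the cost at stake in LUCENE-10014 concrete, here is a minimal sketch of how gcd compression
reduces the bits needed per value: each value is stored as (value - min) / gcd, so the bit width
should be derived from the largest quotient rather than from the raw range. This is a general
illustration only; it is not the Lucene90DocValuesFormat code and does not reproduce the actual
bug. The class and method names are hypothetical, and the sketch assumes the value range does not
overflow a long.

```java
// Hypothetical sketch: not Lucene's doc values encoder.
public class GcdBitsSketch {
  // Euclid's algorithm; gcd(0, x) == x.
  static long gcd(long a, long b) {
    while (b != 0) {
      long t = a % b;
      a = b;
      b = t;
    }
    return a;
  }

  // Number of bits needed to represent a non-negative value, at least 1.
  public static int bitsRequired(long maxValue) {
    return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
  }

  // Bits per value when storing (value - min) directly.
  public static int bitsWithoutGcd(long[] values) {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long v : values) {
      min = Math.min(min, v);
      max = Math.max(max, v);
    }
    return bitsRequired(max - min);
  }

  // Bits per value when storing (value - min) / gcd of all deltas.
  public static int bitsWithGcd(long[] values) {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long v : values) {
      min = Math.min(min, v);
      max = Math.max(max, v);
    }
    long g = 0;
    for (long v : values) {
      g = gcd(g, v - min);
    }
    if (g == 0) {
      g = 1; // all values are equal
    }
    return bitsRequired((max - min) / g);
  }

  public static void main(String[] args) {
    long[] values = {1000, 3000, 7000, 9000};
    System.out.println(bitsWithoutGcd(values)); // range 8000 needs 13 bits
    System.out.println(bitsWithGcd(values)); // 8000 / 2000 = 4 needs 3 bits
  }
}
```

Sizing the packed values from the raw range instead of the quotient would still decode correctly
but would waste storage on every value, which is the kind of overhead the entry describes.
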
1195Bug Fixes
1196---------------------
1197* LUCENE-9988: Fix DrillSideways correctness bug introduced in LUCENE-9944 (Greg Miller)
1198
1199* LUCENE-9964: Duplicate long values in a document field should only be counted once when using SortedNumericDocValuesFields
1200  (Gautam Worah)
1201
1202* LUCENE-9999: CombinedFieldQuery can fail with an exception when document
1203  is missing some fields. (Jim Ferenczi, Julie Tibshirani)
1204
1205* LUCENE-10020: DocComparator should not skip docs with the same docID on
1206  multiple sorts with search after (Mayya Sharipova, Julie Tibshirani)
1207
1208* LUCENE-10026: Fix CombinedFieldQuery equals and hashCode, which ensures
1209  query rewrites don't drop CombinedFieldQuery clauses. (Julie Tibshirani)
1210
1211* LUCENE-10039: Correct CombinedFieldQuery scoring when there is a single
1212  field. (Julie Tibshirani)
1213
1214* LUCENE-10046: Counting bug fixed in StringValueFacetCounts. (Greg Miller)
1215
1216* LUCENE-9963: FlattenGraphFilter is now more robust when handling
1217  incoming holes in the input token graph (Geoff Lawson)
1218
1219* LUCENE-10008: Respect ignoreCase in CommonGramsFilterFactory (Vigya Sharma)
1220
1221* LUCENE-10060: Ensure DrillSidewaysQuery instances never get cached. (Greg Miller, Zachary Chen)
1222
1223* LUCENE-10081: KoreanTokenizer should check the max backtrace gap on whitespaces.
1224  (Jim Ferenczi)
1225
1226* LUCENE-10106: Sort optimization can wrongly skip the first document of
1227  each segment (Nhat Nguyen)
1228
1229Other
1230---------------------
1231(No changes)
1232
1233======================= Lucene 8.9.0 =======================
1234
1235API Changes
1236---------------------
1237
1238* LUCENE-9680: IndexWriter#getFieldNames() method added to get fields present in index.
1239  This method was removed in LUCENE-8909. (Oren Ovadia)
1240
1241New Features
1242---------------------
1243* LUCENE-9507: Custom order for leaves in IndexReader and IndexWriter
1244  (Mayya Sharipova, Mike McCandless, Jim Ferenczi)
1245
1246* LUCENE-9575: PatternTypingFilter has been added to allow setting a type attribute on tokens based on
1247  a configured set of regular expressions (Gus Heck).
1248
1249* LUCENE-9572: TypeAsSynonymFilter has been enhanced to support ignoring some types, and to allow
1250  the generated synonyms to copy some or all flags from the original token (Gus Heck).
1251
1252* LUCENE-9574: A token filter to drop tokens that match all specified flags. (Gus Heck, Uwe Schindler)
1253
1254* LUCENE-9537:  Added smoothingScore method and default implementation to
1255  Scorable abstract class.  The smoothing score allows scorers to calculate a
1256  score for a document where the search term or subquery is not present.  The
1257  smoothing score acts like an idf so that documents that do not have terms or
1258  subqueries that are more frequent in the index are not penalized as much as
1259  documents that do not have less frequent terms or subqueries and prevents
1260  scores which are the product of terms or subqueries from going to zero. Added
1261  the implementation of the Indri AND and the IndriDirichletSimilarity from the
1262  academic Indri search engine: http://www.lemurproject.org/indri.php.
1263  (Cameron VandenBerg)
1264
1265* LUCENE-9694: New tool for creating a deterministic index to enable benchmarking changes
1266  on a consistent multi-segment index even when they require re-indexing. (Patrick Zhai)
1267
1268* LUCENE-9385: Add FacetsConfig option to control which drill-down
1269  terms are indexed for a FacetLabel (Zachary Chen)
1270
1271* LUCENE-9950: New facet counting implementation for general string doc value fields
1272  (SortedSetDocValues / SortedDocValues) not created through FacetsConfig (Greg Miller)
1273
1274Improvements
1275---------------------
1276
1277* LUCENE-9725: BM25FQuery was extended to handle similarities beyond BM25Similarity. It
1278  was renamed to CombinedFieldQuery to reflect its more general scope. (Julie Tibshirani)
1279
1280* LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues.
1281  (Jaison Bi via Bruno Roustant)
1282
1283* LUCENE-9687: Hunspell support improvements: add API for spell-checking and suggestions, support compound words,
1284  fix various behavior differences between Java and C++ implementations, improve performance (Peter Gromov, Dawid Weiss)
1285
1286* LUCENE-9877: Reduce index size by increasing allowable exceptions in PForUtil from 3 to 7. (Greg Miller)
1287
1288* LUCENE-9935: Enable bulk merge for stored fields with index sort. (Robert Muir, Adrien Grand, Nhat Nguyen)
1289
1290Optimizations
1291---------------------
1292
1293* LUCENE-9932: Performance improvement for BKD index building (neoremind)
1294
1295* LUCENE-9827: Speed up merging of stored fields and term vectors for smaller segments.
1296  (Daniel Mitterdorfer, Dimitrios Liapis, Adrien Grand, Robert Muir)
1297
1298Bug Fixes
1299---------------------
1300
1301* LUCENE-9791: BytesRefHash.equals/find is now thread safe, fixing a
1302  Luwak/Monitor bug causing registered queries to sometimes fail to
1303  match. (Paweł Bugalski)
1304
1305* LUCENE-9887: Fixed parameter use in RadixSelector.
1306  (liupanfeng via Adrien Grand)
1307
1308* LUCENE-9958: Fixed performance regression for boolean queries that configure a
1309  minimum number of matching clauses. (Adrien Grand, Matt Weber)
1310
1311* LUCENE-9953: LongValueFacetCounts should count each document at most once when determining
1312  the total count for a dimension. Prior to this fix, multi-value docs could contribute a > 1
1313  count to the dimension count. (Greg Miller)
1314
1315* LUCENE-9967: Do not throw NullPointerException while trying to handle another exception in
1316  ReplicaNode.start (Steven Schlansker)
1317
1318* LUCENE-9991: Fix edge case failure in TestStringValueFacetCounts (Greg Miller)
1319
1320Other
1321---------------------
1322
1323* LUCENE-9836: Removed the pure Maven build. It is no longer possible to build
1324  artifacts using Maven (this feature was no longer working correctly). Due to
1325  migration to Gradle for Lucene/Solr 9.0, the maintenance of the Maven build
1326  was no longer reasonable. POM files are generated for deployment to Maven
1327  Central only. Please use "ant generate-maven-artifacts" to produce and deploy
1328  artifacts to any repository.  (Uwe Schindler, Dawid Weiss)
1329
1330* LUCENE-9836: Migrate Maven tasks to use "maven-resolver-ant-tasks"
1331  instead of the no longer maintained "maven-ant-tasks".  (Uwe Schindler)
1332
1333* LUCENE-9985: Upgrade jetty to 9.4.41 (janhoy)
1334
1335* LUCENE-9976: Fix WANDScorer assertion error. (Zach Chen, Adrien Grand, Dawid Weiss)
1336======================= Lucene 8.8.2 =======================
1337
1338Bug Fixes
1339---------------------
1340
1341* LUCENE-9870: Fix Circle2D intersectsLine t-value (distance) range clamp (Jørgen Nystad)
1342
1343* LUCENE-9744: NPE on a degenerate query in MinimumShouldMatchIntervalsSource
1344  $MinimumMatchesIterator.getSubMatches(). (Alan Woodward)
1345
1346* LUCENE-9762: DoubleValuesSource.fromQuery (also used by FunctionScoreQuery.boostByQuery) could
1347  throw an exception when the query implements TwoPhaseIterator and when the score is requested
1348  repeatedly. (David Smiley, hossman)
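
Illustration for LUCENE-9870 (a generic sketch, not the Circle2D implementation): when
testing a line segment against a circle, the projection parameter t of the circle
centre onto the segment is usually clamped to [0, 1] so that the closest point stays
on the segment. All names below are hypothetical and the geometry is plain Cartesian.

```java
// Hypothetical sketch of a clamped point-to-segment distance test.
final class SegmentDistanceSketch {

  /** Squared distance from (cx, cy) to the segment (ax, ay)-(bx, by). */
  static double distanceSquared(
      double cx, double cy, double ax, double ay, double bx, double by) {
    double dx = bx - ax;
    double dy = by - ay;
    double lenSq = dx * dx + dy * dy;
    // t is the position of the projection along the segment; 0 for a degenerate segment.
    double t = lenSq == 0 ? 0 : ((cx - ax) * dx + (cy - ay) * dy) / lenSq;
    // Clamp so the closest point cannot fall outside the segment's end points.
    t = Math.max(0, Math.min(1, t));
    double ex = cx - (ax + t * dx);
    double ey = cy - (ay + t * dy);
    return ex * ex + ey * ey;
  }

  /** True if the segment touches or crosses the circle of the given radius. */
  static boolean intersectsCircle(
      double cx, double cy, double radius, double ax, double ay, double bx, double by) {
    return distanceSquared(cx, cy, ax, ay, bx, by) <= radius * radius;
  }
}
```

Without the clamp, a segment whose supporting line passes through the circle but whose
end points both lie outside on the same side would be reported as intersecting.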
1349
1350======================= Lucene 8.8.1 =======================
1351
1352Bug Fixes
1353---------------------
1354(No changes)
1355
1356======================= Lucene 8.8.0 =======================
1357
1358New Features
1359---------------------
1360
1361* LUCENE-9552: New LatLonPoint query that accepts an array of LatLonGeometries. (Ignacio Vera)
1362
1363* LUCENE-9641: LatLonPoint query support for spatial relationships. (Ignacio Vera)
1364
1365* LUCENE-9553: New XYPoint query that accepts an array of XYGeometries. (Ignacio Vera)
1366
1367* LUCENE-9378: Doc values now allow configuring how to trade compression for
1368  retrieval speed. (Adrien Grand)
1369
1370* LUCENE-9413: Add CJKWidthCharFilter and its factory (Tomoko Uchida)
1371
1372Improvements
1373---------------------
1374
1375* LUCENE-9455: ExitableTermsEnum should sample timeout and interruption
1376  check before calling next(). (Zach Chen via Bruno Roustant)
1377
1378* LUCENE-9023: GlobalOrdinalsWithScore should not compute occurrences when the
1379  provided min is 1. (Jim Ferenczi)
1380
1381* LUCENE-9675: Binary doc values fields now expose their configured compression mode
1382  in the attributes of the field info. (Jim Ferenczi)
1383
1384Optimizations
1385---------------------
1386
1387* LUCENE-9536: Reduced memory usage for OrdinalMap when a segment has all
1388  values. (Julie Tibshirani via Adrien Grand)
1389
1390* LUCENE-9021: QueryParser: re-use the LookaheadSuccess exception. (Przemek Bruski via Mikhail Khludnev)
1391
1392* LUCENE-9636: Faster decoding of postings for some numbers of bits per value.
1393  (Guo Feng via Adrien Grand)
1394
1395* LUCENE-9346: WANDScorer now supports queries that have a
1396  `minimumNumberShouldMatch` configured. (Xi Zachary Chen via Adrien Grand)
1397
1398Bug Fixes
1399---------------------
1400
1401* LUCENE-9508: DocumentsWriter was only stalling threads for 1 second allowing
1402  documents to be indexed even when the DocumentsWriter wasn't able to keep up with flushing.
1403  Unless IW can't make progress due to an ill-behaving DWPT this issue was barely
1404  noticeable. (Simon Willnauer)
1405
1406* LUCENE-9581: Japanese tokenizer should discard the compound token instead of disabling the decomposition
1407  of long tokens when discardCompoundToken is activated. (Jim Ferenczi)
1408
1409* LUCENE-9595: Make Component2D#withinPoint implementations consistent with ShapeQuery logic.
1410  (Ignacio Vera)
1411
1412* LUCENE-9606: Wrap boolean queries generated by shape fields with a Constant score query. (Ignacio Vera)
1413
1414* LUCENE-9635: BM25FQuery - Mask encoded norm long value in array lookup.
1415  (Yilun Cui)
1416
1417* LUCENE-9617: Fix per-field memory leak in IndexWriter.deleteAll(). Reset next available internal
1418  field number to 0 on FieldInfos.clear(), to avoid wasting FieldInfo references. (Michael Froh)
1419
1420* LUCENE-9642: When encoding triangles in ShapeField, make sure generated triangles are CCW by rotating
1421  triangle points before checking triangle orientation. (Ignacio Vera)
1422
1423* LUCENE-9661: Fix deadlock in TermsEnum.EMPTY that occurs when trying to initialize TermsEnum and BaseTermsEnum
1424  at the same time (Namgyu Kim)
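
Illustration for LUCENE-9635 (a minimal sketch, not the BM25FQuery code): a norm that was
encoded as a single byte is sign-extended when it is read back as a long, so it must be
masked before it is used as an array index. The table and method names below are
hypothetical, and the table contents are placeholders.

```java
// Hypothetical sketch of masking a byte-encoded norm before a table lookup.
final class NormLookupSketch {
  // One slot per possible single-byte norm; real code would hold decoded lengths.
  private static final float[] LENGTH_TABLE = new float[256];

  static {
    for (int i = 0; i < LENGTH_TABLE.length; i++) {
      LENGTH_TABLE[i] = i; // placeholder decoding, for illustration only
    }
  }

  /** Index that results from using the long value directly (can be negative). */
  static int unmaskedIndex(long encodedNorm) {
    return (int) encodedNorm;
  }

  /** Safe lookup: keep only the low 8 bits so the index is always in [0, 255]. */
  static float lookup(long encodedNorm) {
    return LENGTH_TABLE[(int) (encodedNorm & 0xFF)];
  }
}
```

For example, the byte value (byte) 200 widens to the long -56; the unmasked index is
negative, while the masked index is 200.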
1425
1426Other
1427---------------------
1428
1429* SOLR-14995: Update Jetty to 9.4.34 (Mike Drob)
1430
1431* LUCENE-9637: Removes some unused code and replaces the Point implementation on ShapeField/ShapeQuery
1432  random tests. (Ignacio Vera)
1433
1434======================= Lucene 8.7.0 =======================
1435
1436API Changes
1437---------------------
1438
1439* LUCENE-9437: Lucene's facet module's DocValuesOrdinalsReader.decode method
1440  is now public, making it easier for applications to decode facet
1441  ordinals into their corresponding labels (Ankur Goel)
1442
1443* LUCENE-9515: IndexingChain now accepts individual primitives rather than a
1444  DocumentsWriterPerThread instance in order to create a new DocConsumer.
1445  (Simon Willnauer)
1446
1447New Features
1448---------------------
1449
1450* LUCENE-9386: RegExpQuery added case insensitive matching option. (Mark Harwood)
1451
1452* LUCENE-8962: Add IndexWriter merge-on-refresh feature to selectively merge
1453  small segments on getReader, subject to a configurable timeout, to improve
1454  search performance by reducing the number of small segments for searching. (Simon Willnauer)
1455
1456* LUCENE-9484: Allow sorting an index after it was created. With SortingCodecReader, existing
1457  unsorted segments can be wrapped and merged into a fresh index using IndexWriter#addIndices
1458  API. (Simon Willnauer, Adrien Grand)
1459
1460* LUCENE-9444: Add utility class to retrieve facet labels from the
1461  taxonomy index for a facet field so such fields do not also have to
1462  be redundantly stored (Ankur Goel)
1463
1464Improvements
1465---------------------
1466
1467* LUCENE-8574: Add a new ExpressionValueSource which will enforce only one value per name
1468  per hit in dependencies; ExpressionFunctionValues will no longer
1469  recompute already computed values (Patrick Zhai)
1470
1471* LUCENE-9416: Fix CheckIndex to print an invalid non-zero norm as
1472  unsigned long when detecting corruption.
1473
1474* LUCENE-9440: FieldInfo#checkConsistency called twice from Lucene50(60)FieldInfosFormat#read;
1475  Removed the (redundant?) assert and do these checks for real. (Yauheni Putsykovich)
1476
1477* LUCENE-9446: In BooleanQuery rewrite, always remove MatchAllDocsQuery filter clauses
1478  when possible. (Julie Tibshirani)
1479
1480* LUCENE-9501: Improve coverage for Asserting* test classes: make sure to handle singleton doc
1481  values, and sometimes exercise Weight#scorer instead of Weight#bulkScorer for top-level
1482  queries. (Julie Tibshirani)
1483
1484* LUCENE-9511: Include StoredFieldsWriter in DWPT accounting to ensure that its
1485  heap consumption is taken into account when IndexWriter stalls or should flush
1486  DWPTs. (Simon Willnauer)
1487
1488* LUCENE-9514: Include TermVectorsWriter in DWPT accounting to ensure that its
1489  heap consumption is taken into account when IndexWriter stalls or should flush
1490  DWPTs. (Simon Willnauer)
1491
1492* LUCENE-9523: In query shapes over shape fields, skip points while traversing the
1493  BKD tree when the relationship with the document is already known. (Ignacio Vera)
1494
1495* LUCENE-9539: Use more compact datastructures to represent sorted doc-values in memory when
1496  sorting a segment before flush and in SortingCodecReader. (Simon Willnauer)
1497
1498* LUCENE-9458: WordDelimiterGraphFilter should order tokens at the same position by endOffset to
1499  emit longer tokens first.  The same graph is produced. (David Smiley)
1500
1501Optimizations
1502---------------------
1503
1504* LUCENE-9395: ConstantValuesSource now shares a single DoubleValues
1505  instance across all segments (Tony Xu)
1506
1507* LUCENE-9447, LUCENE-9486: Stored fields now get higher compression ratios on
1508  highly compressible data. (Adrien Grand)
1509
1510* LUCENE-9373: FunctionMatchQuery now accepts a "matchCost" optimization hint.
1511  (Maxim Glazkov, David Smiley)
1512
1513* LUCENE-9510: Indexing with an index sort is now faster by not compressing
1514  temporary representations of the data. (Adrien Grand)
1515
1516Bug Fixes
1517---------------------
1518
1519* LUCENE-9427: Fix a regression where the unified highlighter didn't produce
1520  highlights on fuzzy queries that correspond to exact matches. (Julie Tibshirani)
1521
1522* LUCENE-9467: Fix NRTCachingDirectory to use Directory#fileLength to check if a file
1523  already exists instead of opening an IndexInput on the file which might throw an AccessDeniedException
1524  in some Directory implementations. (Simon Willnauer)
1525
1526* LUCENE-9501: Fix a bug in IndexSortSortedNumericDocValuesRangeQuery where it could violate the
1527  DocIdSetIterator contract. (Julie Tibshirani)
1528
1529* LUCENE-9401: Include field in ComplexPhraseQuery's toString() (Thomas Hecker via Munendra S N)
1530
1531* LUCENE-9578: Fix TermRangeQuery when there is no upper bound and the lower
1532  bound is the empty string excluded. This would previously match no strings at
1533  all while it should match all non-empty strings.
1534  (Christoph Buescher via Adrien Grand)
1535
1536* LUCENE-9524: Fix NPE in SpanWeight#explain when no scoring is required and
1537  SpanWeight has null Similarity.SimScorer. (Zach Chen)
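
Illustration for LUCENE-9578 (a minimal sketch of the intended range semantics, not the
TermRangeQuery code): with no upper bound and an excluded empty-string lower bound,
every non-empty term should match. The helper below is hypothetical; it compares Java
strings, whereas Lucene compares term bytes, and a null bound stands for "unbounded".

```java
// Hypothetical sketch of term-range membership with optional bounds.
final class TermRangeSketch {

  /** Returns true if term lies in the range; a null bound means unbounded. */
  static boolean matches(
      String term, String lower, boolean includeLower, String upper, boolean includeUpper) {
    if (lower != null) {
      int c = term.compareTo(lower);
      if (c < 0 || (c == 0 && !includeLower)) {
        return false; // below the lower bound, or equal to an excluded bound
      }
    }
    if (upper != null) {
      int c = term.compareTo(upper);
      if (c > 0 || (c == 0 && !includeUpper)) {
        return false; // above the upper bound, or equal to an excluded bound
      }
    }
    return true;
  }
}
```

With lower = "" excluded and upper = null, only the empty term is rejected.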
1538
1539Documentation
1540---------------------
1541
1542* LUCENE-9424: Add a performance warning to AttributeSource.captureState javadocs (Patrick Zhai)
1543
1544Changes in Runtime Behavior
1545---------------------
1546
1547* LUCENE-9539: SortingCodecReader no longer caches doc values fields. Previously, SortingCodecReader
1548  cached all doc values fields after they were loaded into memory. This reader should only be used
1549  to sort segments after the fact using IndexWriter#addIndices. (Simon Willnauer)
1550
1551
1552Other
1553---------------------
1554
1555* LUCENE-9292: Refactor BKD point configuration into its own class. (Ignacio Vera)
1556
1557* LUCENE-9470: Make TestXYMultiPolygonShapeQueries more resilient for CONTAINS queries. (Ignacio Vera)
1558
1559* LUCENE-9512: Move LockFactory stress test to be a unit/integration
1560  test. (Uwe Schindler, Dawid Weiss, Robert Muir)
1561
1562Build
1563
1564* Upgrade forbiddenapis to version 3.1.  (Uwe Schindler)
1565
1566======================= Lucene 8.6.3 =======================
1567
1568Bug Fixes
1569---------------------
1570(No changes)
1571
1572======================= Lucene 8.6.2 =======================
1573
1574Bug Fixes
1575---------------------
1576* LUCENE-9478: Prevent DWPTDeleteQueue from referencing itself and leaking memory. The queue
1577  passed an implicit this reference to the next queue instance on flush which leaked about 500 bytes
1578  of memory on each full flush, commit or getReader call. (Simon Willnauer)
1579
1580======================= Lucene 8.6.1 =======================
1581
1582Bug Fixes
1583---------------------
1584* LUCENE-9443: The UnifiedHighlighter was closing the underlying reader when there were multiple term-vector fields.
1585  This was a regression in 8.6.0.  (David Smiley, Chris Beer)
1586
1587======================= Lucene 8.6.0 =======================
1588
1589API Changes
1590---------------------
1591
1592* LUCENE-9265: SimpleFSDirectory is deprecated in favor of NIOFSDirectory. (Yannick Welsch)
1593
1594* LUCENE-9304: Removed ability to set DocumentsWriterPerThreadPool on IndexWriterConfig.
1595  The DocumentsWriterPerThreadPool is a package-private final class which made it impossible
1596  to customize. (Simon Willnauer)
1597
1598* LUCENE-9339: MergeScheduler#merge no longer accepts a parameter indicating whether a new merge was found.
1599  (Simon Willnauer)
1600
1601* LUCENE-9330: SortFields are now responsible for writing themselves into index headers if they
1602  are used as index sorts.  (Alan Woodward, Uwe Schindler, Adrien Grand)
1603
1604* LUCENE-9340: Deprecate SimpleBindings#add(SortField). (Alan Woodward)
1605
1606* LUCENE-9345: MergeScheduler is now decoupled from IndexWriter. Instead it accepts a MergeSource
1607  interface that offers the basic methods to acquire pending merges, run the merge and do accounting
1608  around it. (Simon Willnauer)
1609
1610* LUCENE-9349: QueryVisitor.consumeTermsMatching() now takes a
1611  Supplier<ByteRunAutomaton> to enable queries that build large automata to
1612  provide them lazily.  TermsInSetQuery switches to using this method
1613  to report matching terms. (Alan Woodward)
1614
1615* LUCENE-9366: DocValues.emptySortedNumeric() no longer takes a maxDoc parameter
1616  (Alan Woodward)
1617
1618* LUCENE-7822: CodecUtil#checkFooter(IndexInput, Throwable) now throws a
1619  CorruptIndexException if checksums mismatch or if checksums can't be verified.
1620  (Martin Amirault, Adrien Grand)
1621
1622New Features
1623---------------------
1624
1625* LUCENE-7889: Grouping by range based on values from DoubleValuesSource and LongValuesSource
1626  (Alan Woodward)
1627
1628* LUCENE-8962: Add IndexWriter merge-on-commit feature to selectively merge small segments on commit,
1629  subject to a configurable timeout, to improve search performance by reducing the number of small
1630  segments for searching (Michael Froh, Mike Sokolov, Mike Mccandless, Simon Willnauer)
1631
1632Improvements
1633---------------------
1634* LUCENE-9276: Use same code-path for updateDocuments and updateDocument in IndexWriter and
1635  DocumentsWriter. (Simon Willnauer)
1636
1637* LUCENE-9279: Update dictionary version for Ukrainian analyzer to 4.9.1 (Andriy Rysin via Dawid Weiss)
1638
1639* LUCENE-8050: PerFieldDocValuesFormat should not get the DocValuesFormat on a field that has no doc values.
1640  (David Smiley, Juan Rodriguez)
1641
1642* LUCENE-9304: Removed ThreadState abstraction from DocumentsWriter which allows pooling of DWPT directly and
1643  improves the approachability of the IndexWriter code. (Simon Willnauer)
1644
1645* LUCENE-9324: Add an ID to SegmentCommitInfo in order to compare commits for equality and make
1646  snapshots incremental on generational files. (Simon Willnauer, Mike Mccandless, Adrien Grand)
1647
1648* LUCENE-9342: TotalHits' relation will be EQUAL_TO when the number of hits is lower than TopDocsCollector's numHits
1649  (Tomás Fernández Löbbe)
1650
1651* LUCENE-9353: Metadata of the terms dictionary moved to its own file, with the
1652  `.tmd` extension. This allows checksums of metadata to be verified when
1653  opening indices and helps save seeks when opening an index. (Adrien Grand)
1654
1655* LUCENE-9359: SegmentInfos#readCommit now always returns a
1656  CorruptIndexException if the content of the file is invalid. (Adrien Grand)
1657
1658* LUCENE-9393: Make FunctionScoreQuery use ScoreMode.COMPLETE for creating the inner query weight when
1659  ScoreMode.TOP_DOCS is requested. (Tomás Fernández Löbbe)
1660
1661* LUCENE-9392: Make FacetsConfig.DELIM_CHAR publicly accessible (Ankur Goel)
1662
1663* LUCENE-9397: UniformSplit supports encodable fields metadata. (Bruno Roustant)
1664
1665* LUCENE-9396: Improved truncation detection for points. (Adrien Grand, Robert Muir)
1666
1667* LUCENE-9402: Let MultiCollector handle minCompetitiveScore (Tomás Fernández Löbbe, Adrien Grand)
1668
1669Optimizations
1670---------------------
1671
1672* LUCENE-9254: UniformSplit keeps FST off-heap. (Bruno Roustant)
1673
1674* LUCENE-8103: DoubleValuesSource and QueryValueSource now use a TwoPhaseIterator if one is provided by the Query.
1675  (Michele Palmia, David Smiley)
1676
1677* LUCENE-9287: UsageTrackingQueryCachingPolicy no longer caches DocValuesFieldExistsQuery. (Ignacio Vera)
1678
1679* LUCENE-9286: FST.Arc.BitTable reads FST bytes directly. Arc is lightweight again and FSTEnum traversal faster.
1680  (Bruno Roustant)
1681
1682* LUCENE-7788: fail precommit on unparameterised log messages and examine for wasted work/objects (Erick Erickson)
1683
1684* LUCENE-9273: Speed up geometry queries by specialising Component2D spatial operations. Instead of using a generic
1685  relate method for all relations, we use specialized methods for each one. In addition, the type of triangle is
1686  computed at deserialization time, therefore we can be more selective when decoding points of a triangle.
1687  (Ignacio Vera)
1688
1689* LUCENE-9087: Always build trees with full leaves and lower the default value for maxPointsPerLeafNode to 512.
1690  (Ignacio Vera)
1691
1692* LUCENE-9148: Points now write their index in a separate file. (Adrien Grand)
1693
1694Bug Fixes
1695---------------------
1696* LUCENE-9259: Fix wrong NGramFilterFactory argument name for preserveOriginal option (Paul Pazderski)
1697
1698* LUCENE-8849: DocValuesRewriteMethod.visit wasn't visiting its embedded query (Michele Palmia, David Smiley)
1699
1700* LUCENE-9258: DocTermsIndexDocValues assumed it was operating on a SortedDocValues (single valued) field when
1701  it could be multi-valued used with a SortedSetSelector (Michele Palmia)
1702
1703* LUCENE-9164: Ensure IW processes all internal events before it closes itself on a rollback.
1704  (Simon Willnauer, Nhat Nguyen, Dawid Weiss, Mike Mccandless)
1705
1706* LUCENE-8908: Return default value from objectVal when doc doesn't match the query in QueryValueSource
1707  (Bill Bell, hossman, Munendra S N, Michele Palmia)
1708
1709* LUCENE-9133: Fix for potential NPE in TermFilteredPresearcher for empty fields (Marvin Justice via Mike Drob)
1710
1711* LUCENE-9309: Wait for #addIndexes merges when aborting merges. (Simon Willnauer)
1712
1713* LUCENE-9337: Ensure CMS updates its thread accounting datastructures consistently.
1714  CMS today releases its lock after finishing a merge before it re-acquires it to update
1715  the thread accounting datastructures. This causes threading issues where concurrently
1716  finishing threads fail to pick up pending merges causing potential thread starvation on
1717  forceMerge calls. (Simon Willnauer)
1718
1719* LUCENE-9314: Single-document monitor runs were using the less efficient MultiDocumentBatch
1720  implementation. (Pierre-Luc Perron, Alan Woodward)
1721
1722* LUCENE-9362: Fix equality check in ExpressionValueSource#rewrite. This fixes rewriting of inner value sources.
1723  (Dmitry Emets)
1724
1725* LUCENE-9405: IndexWriter incorrectly calls closeMergeReaders twice when the merged segment is 100% deleted.
1726  (Michael Froh, Simon Willnauer, Mike Mccandless, Mike Sokolov)
1727
1728* LUCENE-9400: Tessellator might build illegal polygons when several holes share the same vertex. (Ignacio Vera)
1729
1730* LUCENE-9417: Tessellator might build illegal polygons when several holes are connected to the same
1731  vertex. (Ignacio Vera)
1732
1733* LUCENE-9418: Fix ordered intervals over interleaved terms (Alan Woodward)
1734
1735Other
1736---------------------
1737
1738* LUCENE-9257: Always keep FST off-heap. FSTLoadMode, Reader attributes and openedFromWriter removed. (Bruno Roustant)
1739
1740* LUCENE-9272: Checksums of the terms index are now verified when
1741  LeafReader#checkIntegrity is called rather than when opening the index.
1742  (Adrien Grand)
1743
1744* LUCENE-9270: Update Javadoc about normalizeEntry in the Kuromoji DictionaryBuilder. (Namgyu Kim)
1745
1746* LUCENE-9275: Make TestLatLonMultiPolygonShapeQueries more resilient for CONTAINS queries. (Ignacio Vera)
1747
1748* LUCENE-9244: Adjust TestLucene60PointsFormat#testEstimatePointCount2Dims so it does not fail when a point
1749  is shared by multiple leaves. (Ignacio Vera)
1750
1751* LUCENE-9271: ByteBufferIndexInput was refactored to work on top of the
1752  ByteBuffer API. (Adrien Grand)
1753
1754* LUCENE-9191: Make LineFileDocs's random seeking more efficient, making tests using LineFileDocs faster (Robert Muir,
1755  Mike McCandless)
1756
1757* LUCENE-9338: Refactors SimpleBindings to improve type safety and cycle detection (Alan Woodward,
1758  Adrien Grand)
1759
1760* LUCENE-9358: Change the way the multi-dimensional BKD tree builder generates the intermediate tree representation to be
1761  equal to the one dimensional case to avoid unnecessary tree and leaves rotation. (Ignacio Vera)
1762
1763* LUCENE-9288: poll_mirrors.py release script can handle HTTPS mirrors. (Ignacio Vera)
1764
1765* LUCENE-9232: Fix or suppress 13 resource leak precommit warnings in lucene/replicator (Andras Salamon via Erick Erickson)
1766
1767* LUCENE-9398: Always keep BKD index off-heap. BKD reader does not implement Accountable any more. (Ignacio Vera)
1768
1769Build
1770
1771* Upgrade forbiddenapis to version 3.0.1.  (Uwe Schindler)
1772
1773* LUCENE-9376: Fix or suppress 20 resource leak precommit warnings in lucene/search
1774  (Andras Salamon via Erick Erickson)
1775
1776* LUCENE-9380: Fix auxiliary class warnings in Lucene (Erick Erickson)
1777
1778* LUCENE-9389: Enhance gradle logging calls validation: eliminate getMessage() (Andras Salamon via Erick Erickson)
1779
1780======================= Lucene 8.5.2 =======================
1781
1782Optimizations
1783---------------------
1784
1785* LUCENE-9350: Partial reversion of LUCENE-9068; holding Levenshtein automata on FuzzyQuery can end
1786  up blowing up query caches which use query objects as cache keys, so building the automata is
1787  now delayed to search time again.  (Alan Woodward, Mike Drob)
1788
1789======================= Lucene 8.5.1 =======================
1790
1791Bug Fixes
1792---------------------
1793
1794* LUCENE-9300: Fix corruption of the new gen field infos when doc values updates are applied on a segment created
1795  externally and added to the index with IndexWriter#addIndexes(Directory). (Jim Ferenczi, Adrien Grand)
1796
1797======================= Lucene 8.5.0 =======================
1798
1799API Changes
1800---------------------
1801
1802* LUCENE-9093: Not an API change but a change in behavior of the UnifiedHighlighter's LengthGoalBreakIterator that will
1803  yield Passages sized a little differently due to the fact that the sizing pivot is now the center of the first match and
1804  not its left edge.
1805
1806* LUCENE-9116: PostingsWriterBase and PostingsReaderBase no longer support
1807  setting a field's metadata via a `long[]`. (Adrien Grand)
1808
1809* LUCENE-9116: The FSTOrd postings format has been removed.
1810  (Adrien Grand)
1811
1812* LUCENE-8369: Remove obsolete spatial module. (Nick Knize, David Smiley)
1813
1814* LUCENE-8621: Refactor LatLonShape, XYShape, and all query and utility classes to core. (Nick Knize)
1815
1816* LUCENE-9218: XY geometries API works in float space. (Ignacio Vera)
1817
1818* LUCENE-9212: Intervals.multiterm() takes CompiledAutomaton rather than plain Automaton
1819  (Alan Woodward)
1820
1821* LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d. (Nick Knize)
1822
1823* LUCENE-9171: QueryBuilder.newTermQuery() and .newSynonymQuery() now take boost parameters.
1824  (Alessandro Benedetti, Alan Woodward)
1825
1826New Features
1827---------------------
1828
1829* LUCENE-8903: Add LatLonShape and XYShape point query. (Ignacio Vera)
1830
1831* LUCENE-8707: Add LatLonShape and XYShape distance query. (Ignacio Vera)
1832
1833* LUCENE-9238: New XYPointField field and Queries for indexing, searching and sorting
1834  cartesian points. (Ignacio Vera)
1835
1836Improvements
1837---------------------
1838
1839* LUCENE-9149: Increase data dimension limit in BKD. (Nick Knize)
1840
1841* LUCENE-9102: Add maxQueryLength option to DirectSpellchecker. (Andy Webb via Bruno Roustant)
1842
1843* LUCENE-9091: UnifiedHighlighter HTML escaping should only escape essentials (Nándor Mátravölgyi)
1844
1845* LUCENE-9105: UniformSplit postings format detects corrupted index and better handles IO exceptions. (Bruno Roustant)
1846
1847* LUCENE-9106: UniformSplit postings format allows extension of block/line serializers. (Bruno Roustant)
1848
1849* LUCENE-9093: UnifiedHighlighter's LengthGoalBreakIterator has a new fragmentAlignment option to better center the
1850  first match in the passage. Also the sizing point now pivots at the center of the first match term and not its left
1851  edge. This yields Passages that won't be identical to the previous behavior. (Nándor Mátravölgyi, David Smiley)
1852
1853* LUCENE-9153: Allow WhitespaceAnalyzer to set a maxTokenLength other than the default of 255
1854  (Alan Woodward)
1855
1856* LUCENE-9152: Improve line intersections with polygons when they are touching from the outside. (Ignacio Vera)
1857
1858* LUCENE-9123: Add new JapaneseTokenizer constructors with discardCompoundToken option that controls whether
1859  the tokenizer emits original (compound) tokens when the mode is not NORMAL. (Kazuaki Hiraga via Tomoko Uchida)
1860
1861* LUCENE-9253: KoreanTokenizer now supports custom dictionaries (system, unknown). (Namgyu Kim)
1862
1863* LUCENE-9171: QueryBuilder can now use BoostAttributes on input token streams to selectively
1864  boost particular terms or synonyms in parsed queries. (Alessandro Benedetti, Alan Woodward)
1865
1866* LUCENE-9298: Improve RAM accounting in BufferedUpdates when deleted doc IDs and terms are cleared. (Yu Binglei, Simon Willnauer)
1867
1868Optimizations
1869---------------------
1870
1871* LUCENE-9211: Add compression for Binary doc value fields. (Mark Harwood)
1872
1873* LUCENE-4702: Better compression of terms dictionaries. (Adrien Grand)
1874
1875* LUCENE-9228: Sort dvUpdates in the term order before applying if they all update a
1876  single field to the same value. This optimization can reduce the flush time by around
1877  20% for the docValues update use cases. (Nhat Nguyen, Adrien Grand, Simon Willnauer)
1878
1879* LUCENE-9245: Reduce AutomatonTermsEnum memory usage. (Bruno Roustant, Robert Muir)
1880
1881* LUCENE-9237: Faster UniformSplit intersect TermsEnum. (Bruno Roustant)
1882
1883* LUCENE-9260: LeafReader#checkIntegrity verifies checksums of CFS files.
1884  (Adrien Grand)
1885
1886* LUCENE-9068: FuzzyQuery builds its Automaton up-front (Alan Woodward, Mike Drob)
1887
1888* LUCENE-9113: Faster merging of SORTED/SORTED_SET doc values. (Adrien Grand)
1889
1890* LUCENE-9125: Optimize Automaton.step() with binary search and introduce Automaton.next(). (Bruno Roustant)
1891
1892* LUCENE-9147: The index of stored fields and term vectors is now off-heap.
1893  (Adrien Grand)
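
Illustration for LUCENE-9125 (a minimal sketch, not the Automaton implementation): when a
state's transitions are stored as sorted, non-overlapping label ranges, the destination
for a label can be found by binary search instead of a linear scan. The class and array
layout below are hypothetical.

```java
// Hypothetical sketch: one state's transitions as parallel, sorted arrays.
final class TransitionSearchSketch {
  private final int[] mins;  // inclusive lower label of each transition, ascending
  private final int[] maxs;  // inclusive upper label of each transition
  private final int[] dests; // destination state of each transition

  TransitionSearchSketch(int[] mins, int[] maxs, int[] dests) {
    this.mins = mins;
    this.maxs = maxs;
    this.dests = dests;
  }

  /** Returns the destination state for label, or -1 if no transition accepts it. */
  int step(int label) {
    int lo = 0;
    int hi = mins.length - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (label < mins[mid]) {
        hi = mid - 1; // label sorts before this range
      } else if (label > maxs[mid]) {
        lo = mid + 1; // label sorts after this range
      } else {
        return dests[mid]; // label falls inside [mins[mid], maxs[mid]]
      }
    }
    return -1;
  }
}
```

The search is O(log n) in the number of transitions of the state, which matters mostly
for states with many outgoing ranges.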
1894
1895Bug Fixes
1896---------------------
1897
1898* LUCENE-9084: Fix potential deadlock due to circular synchronization in AnalyzingInfixSuggester (Paul Ward)
1899
1900* LUCENE-9115: NRTCachingDirectory no longer caches files of unknown size.
1901  (Adrien Grand)
1902
1903* LUCENE-9144: Fix error message on OneDimensionBKDWriter when too many points are added to the writer.
1904  (Ignacio Vera)
1905
1906* LUCENE-9135: Make UniformSplit FieldMetadata counters long. (Bruno Roustant)
1907
1908* LUCENE-9200: Fix TieredMergePolicy to use double (not float) math to make its merging decisions, fixing
1909  a corner-case bug uncovered by fun randomized tests (Robert Muir, Mike McCandless)
1910
1911* LUCENE-9099: Unordered and Ordered interval queries now correctly handle
1912  repeated subterms - ordered intervals could supply an 'extra' minimized
1913  interval, resulting in odd matches when combined with eg CONTAINS queries;
1914  and unordered intervals would match duplicate subterms on the same position,
1915  so a query for UNORDERED(foo, foo) would match a document containing 'foo'
1916  only once.  (Alan Woodward)
1917
1918* LUCENE-9250: Add support for Circle2d#intersectsLine around the dateline. (Ignacio Vera)
1919
1920* LUCENE-9243: Add fudge factor when creating a bounding box of a XYCircle. (Ignacio Vera)
1921
1922* LUCENE-9239: Circle2D#WithinTriangle detects properly if a triangle is Within distance. (Ignacio Vera)
1923
1924* LUCENE-9251: Fix bug in the polygon tessellator where edges with different value on #isEdgeFromPolygon
1925  were not filtered out properly. (Ignacio Vera)
1926
1927* LUCENE-9263: Fix wrong transformation of distance in meters to radians in Geo3DPoint. (Ignacio Vera)
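
Illustration for LUCENE-9200 (a generic sketch, not the TieredMergePolicy code): float has
a 24-bit significand, so integer quantities above 2^24, such as byte sizes, can collapse
onto the same float value while remaining distinct as doubles. The helper names below
are hypothetical.

```java
// Hypothetical sketch showing where float loses integer precision and double does not.
final class FloatVsDoubleSketch {

  /** True if the two sizes remain distinguishable after conversion to float. */
  static boolean floatSeesDifference(long a, long b) {
    return (float) a != (float) b;
  }

  /** True if the two sizes remain distinguishable after conversion to double. */
  static boolean doubleSeesDifference(long a, long b) {
    return (double) a != (double) b;
  }
}
```

For instance, 16777216 (2^24) and 16777217 convert to the same float but to different
doubles, so a comparison done in float math cannot tell them apart.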
1928
1929Other
1930---------------------
1931
1932* LUCENE-9109: Backport some changes from master (except StackWalker) to improve
1933  TestSecurityManager (Uwe Schindler)
1934
1935* LUCENE-9110: Backport refactored stack analysis in tests to use generalized
1936  LuceneTestCase methods (Uwe Schindler)
1937
1938* LUCENE-9141: Simplify LatLonShapeXQuery API by adding a new abstract class called LatLonGeometry. Queries are
1939  executed with input objects that extend this class. (Ignacio Vera)
1940
1941* LUCENE-9194: Simplify XYShapeXQuery API by adding a new abstract class called XYGeometry. Queries are
1942  executed with input objects that extend this class. (Ignacio Vera)
1943
1944* LUCENE-9096: Simplification of CompressingTermVectorsWriter#flushOffsets.
1945  (kkewwei via Adrien Grand)
1946
1947* LUCENE-9225: Rectangle extends LatLonGeometry so it can be used in a geometry collection. (Ignacio Vera)
1948
1949======================= Lucene 8.4.1 =======================
1950
1951Bug Fixes
1952---------------------
1953(No changes)
1954
1955======================= Lucene 8.4.0 =======================
1956
1957API Changes
1958
1959* LUCENE-9029: Deprecate SloppyMath toRadians/toDegrees in favor of Java Math.
1960  (Jack Conradson via Adrien Grand)
1961
1962New Features
1963
1964* LUCENE-8620: Add CONTAINS support for LatLonShape and XYShape. (Ignacio Vera)
1965
1966Improvements
1967
1968* LUCENE-9002: Skip costly caching clause in LRUQueryCache if it makes the query
1969  many times slower. (Guoqiang Jiang)
1970
1971* LUCENE-9006: WordDelimiterGraphFilter's catenateAll token is now ordered before any token parts, like WDF did.
1972  (David Smiley)
1973
1974* LUCENE-9028: introducing Intervals.multiterm() (Mikhail Khludnev)
1975
1976* LUCENE-9018: ConcatenateGraphFilter now has a configurable separator. (Stanislav Mikulchik, David Smiley)
1977
1978* LUCENE-9036: ExitableDirectoryReader may interrupt scanning over DocValues (Mikhail Khludnev)
1979
1980* LUCENE-9062: QueryVisitor now has a consumeTermsMatching() method, allowing queries
1981  that match a class of terms to pass a ByteRunAutomaton matching that class
1982  back to the visitor. (Alan Woodward, David Smiley)
1983
1984* LUCENE-9073: IntervalQuery now includes its field in toString() and explain() (Mikhail Khludnev)
1985
1986Optimizations
1987
1988* LUCENE-8928: When building a kd-tree for dimensions n > 2, compute exact bounds for an inner node every N splits
1989  to improve the quality of the tree. N is defined by SPLITS_BEFORE_EXACT_BOUNDS which is set to 4.
1990  (Ignacio Vera, Adrien Grand)
1991
1992* BaseDirectoryReader no longer sums up the `LeafReader#numDocs` of its leaves
1993  eagerly. This especially helps when creating views of readers that hide
1994  documents, since computing the number of live documents is an expensive
1995  operation. (Adrien Grand)
1996
1997* LUCENE-8992: TopFieldCollector and TopScoreDocCollector can now share minimum scores across leaves
1998  concurrently. (Adrien Grand, Atri Sharma, Jim Ferenczi)
1999
2000* LUCENE-8932: BKDReader's index is now stored off-heap when the IndexInput is
2001  an instance of ByteBufferIndexInput. (Jack Conradson via Adrien Grand)
2002
2003* LUCENE-9024: IntroSelector now falls back to the median of medians algorithm
2004  instead of sorting when the maximum recursion level is exceeded, providing
2005  better worst-case runtime. (Paul Sanwald via Adrien Grand)
2006
2007* LUCENE-8920: The denser arcs of FST now index labels with a bitset in order
2008  to provide near constant time access. (Bruno Roustant, Mike Sokolov via Adrien Grand)
2009
2010* LUCENE-9027: Use SIMD instructions to decode postings. (Adrien Grand)
2011
2012* LUCENE-9049: Remove FST cached root arcs now redundant with labels indexed by bitset.
2013  This frees some on-heap FST space. (Jack Conradson via Bruno Roustant)
2014
2015* LUCENE-9045: Do not use TreeMap/TreeSet in BlockTree and PerFieldPostingsFormat. (Bruno Roustant)
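
Illustration for LUCENE-8920 (a minimal sketch, not the FST encoding): when a node's arc
labels are dense, a bitset can record which labels are present, and the position of an
arc is the number of set bits below its label. The class below is hypothetical and
assumes a non-empty, ascending label array.

```java
// Hypothetical sketch of a presence bitset used to locate arcs by label.
final class BitTableSketch {
  private final int firstLabel;
  private final long[] bits;

  /** sortedLabels must be non-empty and strictly ascending. */
  BitTableSketch(int[] sortedLabels) {
    this.firstLabel = sortedLabels[0];
    int range = sortedLabels[sortedLabels.length - 1] - firstLabel + 1;
    this.bits = new long[(range + 63) >>> 6];
    for (int label : sortedLabels) {
      int i = label - firstLabel;
      bits[i >>> 6] |= 1L << (i & 63); // mark the label as present
    }
  }

  /** Returns the index of the arc with this label among present arcs, or -1. */
  int arcIndex(int label) {
    int i = label - firstLabel;
    if (i < 0 || (i >>> 6) >= bits.length) {
      return -1; // outside the covered label range
    }
    long mask = 1L << (i & 63);
    long word = bits[i >>> 6];
    if ((word & mask) == 0) {
      return -1; // label absent
    }
    int index = Long.bitCount(word & (mask - 1)); // set bits below the label in its word
    for (int w = 0; w < (i >>> 6); w++) {
      index += Long.bitCount(bits[w]); // plus all set bits in earlier words
    }
    return index;
  }
}
```

The presence test is a single bit probe; the rank computation touches one word per 64
labels, which stays small for the label ranges of a single node.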
2016
2017Bug Fixes
2018
2019* LUCENE-9001: Fix race condition in SetOnce. (Przemko Robakowski)
2020
2021* LUCENE-9030: Fix WordnetSynonymParser behaviour so it behaves similarly to
2022  SolrSynonymParser. (Christoph Buescher via Alan Woodward)
2023
2024* LUCENE-9054: Fix reproduceJenkinsFailures.py to not overwrite junit XML files when retrying (hossman)
2025
2026* LUCENE-9031: UnsupportedOperationException on MatchesIterator.getQuery() (Alan Woodward, Mikhail Khludnev)
2027
2028* LUCENE-8996: maxScore was sometimes missing from distributed grouped responses.
2029  (Julien Massenet, Diego Ceccarelli, Munendra S N, Christine Poerschke)
2030
2031* LUCENE-9055: Fix the detection of lines crossing triangles through edge points.
2032  (Ignacio Vera)
2033
2034* LUCENE-9103: Disjunctions can miss some hits in some rare conditions. (Adrien Grand)
2035
2036Other
2037
2038* LUCENE-8979: Code Cleanup: Use entryset for map iteration wherever possible. - Part 2 (Koen De Groote)
2039
2040* LUCENE-8994: Code Cleanup - Pass values to list constructor instead of empty constructor followed by addAll(). (Koen De Groote)
2041
2042* LUCENE-8746: Refactor EdgeTree - Introduce a Component tree that represents the tree of components (e.g. polygons).
2043  Edge tree is now just a tree of edges. (Ignacio Vera)
2044
2045* LUCENE-9046: Fix wrong example in Javadoc of TermInSetQuery (Namgyu Kim)
2046
2047* LUCENE-8983: Add sandbox PhraseWildcardQuery to control multi-terms expansions in a phrase. (Bruno Roustant)
2048
2049* LUCENE-9067: Polygon2D#contains() is now thread safe. (Ignacio Vera)
2050
2051Build
2052
2053* Upgrade forbiddenapis to version 2.7; upgrade Groovy to 2.4.17.  (Uwe Schindler)
2054
2055* LUCENE-9041: Upgrade ecj to 3.19.0 to fix sporadic precommit javadoc issues (Kevin Risden)
2056
2057======================= Lucene 8.3.1 =======================
2058
2059Bug Fixes
2060
2061* LUCENE-9050: MultiTermIntervalsSource.visit() was not calling back to its
2062  visitor. (Alan Woodward)
2063
2064======================= Lucene 8.3.0 =======================
2065
2066API Changes
2067
2068* LUCENE-8909: IndexWriter#getFieldNames() method is used to get fields present in index. After LUCENE-8316, this
2069  method is no longer required. Hence, deprecate IndexWriter#getFieldNames() method. (Adrien Grand, Munendra S N)
2070
2071* LUCENE-8755: SpatialPrefixTreeFactory now consumes the "version" parsed with Lucene's Version class.  The quad
2072  and packed quad prefix trees are sensitive to this.  It's recommended to pass the version, as you
2073  should likewise do for analysis components for tokenized text, or else changes to the encoding in future versions
2074  may be incompatible with older indexes.  (Chongchen Chen, David Smiley)
2075
2076* LUCENE-8956: QueryRescorer now only sorts the first topN hits instead of all
2077  initial hits. (Paul Sanwald via Adrien Grand)
2078
2079* LUCENE-8921: IndexSearcher.termStatistics() no longer takes a TermStates; it takes the docFreq and totalTermFreq.
2080  And don't call if docFreq <= 0.  The previous implementation survives as deprecated and final.  It's removed in 9.0.
2081  (Bruno Roustant, David Smiley, Alan Woodward)
2082
2083* LUCENE-8990: PointValues#estimateDocCount(visitor) estimates the number of documents that would be matched by
2084  the given IntersectVisitor. The method is used to compute the cost() of ScorerSuppliers instead of
2085  PointValues#estimatePointCount(visitor). (Ignacio Vera, Adrien Grand)
2086
2087New Features
2088
2089* LUCENE-8936: Add SpanishMinimalStemFilter (vinod kumar via Tomoko Uchida)
2090
2091* LUCENE-8764 LUCENE-8945: Add "export all terms and doc freqs" feature to Luke with delimiters. (Leonardo Menezes, Amish Shah via Tomoko Uchida)
2092
2093* LUCENE-8747: Composite Matches from multiple subqueries now allow access to
2094  their submatches, and a new NamedMatches API allows marking of subqueries
2095  and a simple way to find which subqueries have matched on a given document
2096  (Alan Woodward, Jim Ferenczi)
2097
2098* LUCENE-8769: Introduce Range Query For Multiple Connected Ranges (Atri Sharma)
2099
2100* LUCENE-8960: Introduce LatLonDocValuesPointInPolygonQuery for LatLonDocValuesField (Ignacio Vera)
2101
2102* LUCENE-8753: New UniformSplitPostingsFormat (name "UniformSplit") primarily benefiting in simplicity and
2103  extensibility.  New STUniformSplitPostingsFormat (name "SharedTermsUniformSplit") that shares a single internal
2104  term dictionary across fields.  (Bruno Roustant, Juan Rodriguez, David Smiley)
2105
2106Improvements
2107
2108* LUCENE-8874: Show SPI names instead of class names in Luke Analysis tab. (Tomoko Uchida)
2109
2110* LUCENE-8894: Add APIs to find SPI names for Tokenizer/CharFilter/TokenFilter factory classes. (Tomoko Uchida)
2111
2112* LUCENE-8914: move the logic for discarding inner nodes in FloatPointNearestNeighbor to the IntersectVisitor
2113  so we take advantage of the change introduced in LUCENE-7862. (Ignacio Vera)
2114
2115* LUCENE-8955: move the logic for discarding inner nodes in LatLonPoint NearestNeighbor to the IntersectVisitor
2116  so we take advantage of the change introduced in LUCENE-7862. (Ignacio Vera)
2117
2118* LUCENE-8918: PhraseQuery throws exceptions at construction time if it is passed
2119  null arguments. (Alan Woodward)
2120
2121* LUCENE-8916: GraphTokenStreamFiniteStrings preserves all Token attributes
2122  through its finite strings TokenStreams (Alan Woodward)
2123
2124* LUCENE-8906: Expose Lucene50PostingsFormat.IntBlockTermState as public so that other postings formats can re-use it.
2125  (Bruno Roustant)
2126
2127* LUCENE-8942: Remove redundant parameters and improve visibility strictness in
2128  LRUQueryCache (Atri Sharma)
2129
2130* SOLR-13663: Introduce <SpanPositionRange> into XML Query Parser (Alessandro Benedetti via Mikhail Khludnev)
2131
2132* LUCENE-8952: Use a sort key instead of true distance in NearestNeighbor (Julie Tibshirani).
2133
2134* LUCENE-8620: Tessellator labels the edges of the generated triangles according to whether they belong to
2135  the original polygon. This information is added to the triangle encoding. (Ignacio Vera)
2136
2137* LUCENE-8964: Fix geojson shape parsing on string arrays in properties
2138  (Alexander Reelsen)
2139
2140* LUCENE-8976: Use exact distance between point and bounding rectangle in FloatPointNearestNeighbor. (Ignacio Vera)
2141
2142* LUCENE-8966: The Korean analyzer now splits tokens on boundaries between digits and alphabetic characters. (Jim Ferenczi)
2143
2144* LUCENE-8984: MoreLikeThis MLT is biased for uncommon fields (Andy Hind via Anshum Gupta)
2145
2146Optimizations
2147
2148* LUCENE-8922: DisjunctionMaxQuery more efficiently leverages impacts to skip
2149  non-competitive hits. (Adrien Grand)
2150
2151* LUCENE-8935: BooleanQuery with no scoring clause can now early terminate the query when
2152  the total hit count is not requested. (Jim Ferenczi)
2153
2154* LUCENE-8941: Matches on wildcard queries will defer building their full
2155  disjunction until a MatchesIterator is pulled (Alan Woodward)
2156
2157* LUCENE-8755: spatial-extras quad and packed quad prefix trees now index points faster.
2158  (Chongchen Chen, David Smiley)
2159
2160* LUCENE-8860: add additional leaf node level optimizations in LatLonShapeBoundingBoxQuery.
2161  (Igor Motov via Ignacio Vera)
2162
2163* LUCENE-8968: Improve performance of WITHIN and DISJOINT queries for Shape queries by
2164  doing just one pass whenever possible. (Ignacio Vera)
2165
2166* LUCENE-8939: Introduce shared count based early termination across multiple slices
2167  (Atri Sharma)
2168
2169* LUCENE-8980: Blocktree's seekExact now short-circuits false if the term isn't in the min-max range of the segment.
2170  Large perf gain for ID/time like data when populated sequentially.  (Guoqiang Jiang)
2171
2172Bug Fixes
2173
2174* LUCENE-8755: spatial-extras quad and packed quad prefix trees could throw a
2175  NullPointerException for certain cell edge coordinates (Chongchen Chen, David Smiley)
2176
2177* LUCENE-9005: BooleanQuery.visit() would pull subVisitors from its parent visitor, rather
2178  than from a visitor for its own specific query.  This could cause problems when BQ was
2179  nested under another BQ. Instead, we now pull a MUST subvisitor, pass it to any MUST
2180  subclauses, and then pull SHOULD, MUST_NOT and FILTER visitors from it rather than from
2181  the parent.  (Alan Woodward)
2182
2183Other
2184
2185* LUCENE-8778 LUCENE-8911 LUCENE-8957: Define analyzer SPI names as static final fields and document the names in Javadocs.
2186  (Tomoko Uchida, Uwe Schindler)
2187
2188* LUCENE-8758: QuadPrefixTree: removed levelS and levelN fields which weren't used. (Amish Shah)
2189
2190* LUCENE-8975: Code Cleanup: Use entryset for map iteration wherever possible. (Koen De Groote)
2191
2192* LUCENE-8993, LUCENE-8807: Changed all repository and download references in build files
2193  to HTTPS. (Uwe Schindler)
2194
2195* LUCENE-8998: Fix OverviewImplTest.testIsOptimized reproducible failure. (Tomoko Uchida)
2196
2197* LUCENE-8999: LuceneTestCase.expectThrows now propagates assert/assumption failures up to the test
2198  w/o wrapping in a new assertion failure unless the caller has explicitly expected them (hossman)
2199
2200* LUCENE-8062: GlobalOrdinalsWithScoreQuery is no longer eligible for query caching. (Jim Ferenczi)
2201
2202======================= Lucene 8.2.0 =======================
2203
2204API Changes
2205
2206* LUCENE-8865: IndexSearcher now uses Executor instead of ExecutorService.
2207  This change is fully backwards compatible since ExecutorService directly
2208  extends Executor. (Simon Willnauer)
2209
2210* LUCENE-8856: Intervals queries have moved from the sandbox to the queries
2211  module. (Alan Woodward)
2212
2213* LUCENE-8893: Intervals.wildcard() and Intervals.prefix() methods now take
2214  BytesRef rather than String. (Alan Woodward)
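
The compatibility claim in LUCENE-8865 above rests on plain JDK typing: ExecutorService is a sub-interface of Executor, so code written against Executor accepts either. The following is a minimal sketch of that idea only. The class ExecutorCompatSketch and its runAll method are hypothetical stand-ins and are not part of Lucene or of IndexSearcher.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not a Lucene class): an API that only needs Executor,
// in the same spirit as a constructor that accepts an Executor.
final class ExecutorCompatSketch {

  // Submits `tasks` trivial jobs to the given executor and waits for all of them.
  // Returns the number of jobs that ran.
  static int runAll(Executor executor, int tasks) {
    AtomicInteger done = new AtomicInteger();
    CountDownLatch latch = new CountDownLatch(tasks);
    for (int i = 0; i < tasks; i++) {
      executor.execute(() -> {
        done.incrementAndGet();
        latch.countDown();
      });
    }
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for tasks", e);
    }
    return done.get();
  }
}
```

A caller that already holds an ExecutorService (for example from Executors.newFixedThreadPool) can pass it unchanged, and a caller-thread executor such as Runnable::run also satisfies the narrower type.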
2215
2216New Features
2217
2218* LUCENE-8632: New XYShape Field and Queries for indexing and searching general cartesian
2219  geometries. (Nick Knize)
2220
2221* LUCENE-8891: Snowball stemmer/analyzer for the Estonian language.
2222  (Gert Morten Paimla via Tomoko Uchida)
2223
2224* LUCENE-8815: Provide a DoubleValues implementation for retrieving the value of features without
2225  requiring a separate numeric field. Note that as feature values are stored with only 8 bits of
2226  mantissa the values returned may have a delta from the original values indexed.
2227  (Colin Goodheart-Smithe via Adrien Grand)
2228
2229* LUCENE-8803: Provide a FeatureSortField to allow sorting search hits by descending value of a
2230  feature. This is exposed via the factory method FeatureField#newFeatureSort.
2231  (Colin Goodheart-Smithe via Adrien Grand)
2232
2233* LUCENE-8784: The KoreanTokenizer now preserves punctuation if discardPunctuation is set
2234  to false (defaults to true).
2235  (Namgyu Kim via Jim Ferenczi)
2236
2237* LUCENE-8812: Add new KoreanNumberFilter that can convert Hangul characters to numbers
2238  and process decimal points. It is similar to the JapaneseNumberFilter.
2239  (Namgyu Kim)
2240
2241* LUCENE-8362: Add doc-value support to range fields. (Atri Sharma via Adrien Grand)
2242
2243* LUCENE-8766: Add monitor subproject (previously Luwak monitoring library). This
2244  allows a stream of documents to be matched against a set of registered queries
2245  in an efficient manner, for use as a monitoring or classification tool.
2246  (Alan Woodward)
2247
2248* LUCENE-7714: Add a numeric range query in sandbox that takes advantage of index sorting.
2249  (Julie Tibshirani via Jim Ferenczi)
2250
2251* LUCENE-8859: The completion suggester's postings format now has an option to
2252  load its internal FST off-heap. (Jim Ferenczi)
2253
2254Bug Fixes
2255
2256* LUCENE-8831: Fixed LatLonShapeBoundingBoxQuery .hashCode methods. (Ignacio Vera)
2257
2258* LUCENE-8775: Improve tessellator to better handle cases where a hole shares a vertex
2259  with the polygon. (Ignacio Vera)
2260
2261* LUCENE-8785: Ensure new threadstates are locked before retrieving the number of active threadstates.
2262  Not doing so caused assertion errors and potentially broken field attributes in the IndexWriter when
2263  IndexWriter#deleteAll was called while actively indexing. (Simon Willnauer)
2264
2265* LUCENE-8804: Forbid calls to putAttribute on frozen FieldType instances.
2266  (Vamshi Vijay Nakkirtha via Adrien Grand)
2267
2268* LUCENE-8828: Removes the buggy 'disallow overlaps' boolean from Intervals.unordered(),
2269  and replaces it with a new Intervals.unorderedNoOverlaps() method (Alan Woodward)
2270
2271* LUCENE-8843: Don't ignore exceptions that are thrown when trying to open a
2272  file in IOUtils#fsync. (Jason Tedor via Adrien Grand)
2273
2274* LUCENE-8835: FileSwitchDirectory now respects the file extension when listing directory
2275  contents to ensure we don't expose pending deletes if both directories point to the same
2276  underlying filesystem directory. (Simon Willnauer)
2277
2278* LUCENE-8853: FileSwitchDirectory now applies best effort to place tmp files in the same
2279  directory as the target files. (Simon Willnauer)
2280
2281* LUCENE-8892: Add missing closing parentheses in MultiBoolFunction's description() (Florian Diebold, Munendra S N)
2282
2283Improvements
2284
2285* LUCENE-7840: Non-scoring BooleanQuery now removes SHOULD clauses before building the scorer supplier
2286  as opposed to eliminating them during scoring construction. (Atri Sharma via Jim Ferenczi)
2287
2288* LUCENE-8770: BlockMaxConjunctionScorer now leverages two-phase iterators in order to avoid
2289  executing the second phase when scorers don't intersect. (Adrien Grand, Jim Ferenczi)
2290
2291* LUCENE-8818: Fix smokeTestRelease.py encoding bug (janhoy)
2292
2293* LUCENE-8845: Allow Intervals.prefix() and Intervals.wildcard() to specify
2294  their maximum allowed expansions (Alan Woodward)
2295
2296* LUCENE-8875: Introduce a Collector optimized for use cases when a large
2297  number of hits is requested (Atri Sharma)
2298
2299* LUCENE-8848 LUCENE-7757 LUCENE-8492: The UnifiedHighlighter now detects that parts of the query are not understood by
2300  it, and thus it should not make optimizations that result in no highlights or slow highlighting.  This generally works
2301  best for WEIGHT_MATCHES mode.  Consequently queries produced by ComplexPhraseQueryParser and the surround QueryParser
2302  will now highlight correctly. (David Smiley)
2303
2304* LUCENE-8793: Luke enhanced UI for CustomAnalyzer: show detailed analysis steps. (Jun Ohtani via Tomoko Uchida)
2305
2306* LUCENE-8855: Add Accountable to some Query implementations (ab, Adrien Grand)
2307
2308Optimizations
2309
2310* LUCENE-8796: Use exponential search instead of binary search in
2311  IntArrayDocIdSet#advance method (Luca Cavanna via Adrien Grand)
2312
2313* LUCENE-8865: Use incoming thread for execution if IndexSearcher has an executor.
2314  Now caller threads execute at least one search on an index even if there is
2315  an executor provided to minimize thread context switching. (Simon Willnauer)
2316
2317* LUCENE-8868: New storing strategy for BKD tree leaves with low cardinality.
2318  It stores the distinct values once with the cardinality value reducing the
2319  storage cost. (Ignacio Vera)
2320
2321* LUCENE-8885: Optimise BKD reader by exploiting cardinality information stored
2322  on leaves. (Ignacio Vera)
2323
2324* LUCENE-8896: Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, byte[])
2325  for several queries. (Ignacio Vera)
2326
2327* LUCENE-8901: Load frequencies lazily only when needed in BlockDocsEnum and
2328  BlockImpactsEverythingEnum (Mayya Sharipova).
2329
2330* LUCENE-8888: Optimize distribution of points with data dimensions in
2331  BKD tree leaves. (Ignacio Vera)
2332
2333* LUCENE-8311: Phrase queries now leverage impacts. (Adrien Grand)
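
LUCENE-8796 above replaces binary search with exponential search when advancing within a sorted doc ID array. The following is a minimal sketch of that general technique, not the actual IntArrayDocIdSet code. The class and method names are hypothetical, and the sketch assumes a sorted int array and a half-open range [from, to).

```java
// Hypothetical illustration of exponential (galloping) search; not Lucene code.
final class ExponentialAdvanceSketch {

  // Returns the index of the first element >= target in the sorted range
  // docs[from, to), or `to` if every element in the range is smaller.
  static int advance(int[] docs, int from, int to, int target) {
    // Gallop: probe offsets 1, 2, 4, 8, ... until we pass the target or the end.
    int bound = 1;
    while (from + bound < to && docs[from + bound] < target) {
      bound *= 2;
    }
    // The answer lies in [from + bound / 2, min(from + bound, to)].
    int lo = from + bound / 2;
    int hi = Math.min(from + bound, to);
    // Plain binary search for the first element >= target inside that window.
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (docs[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }
}
```

The appeal of this approach is that the cost grows with the logarithm of the distance between the current position and the target rather than with the logarithm of the whole array, which suits advance() calls that usually move only a short way forward.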
2334
2335Test Framework
2336
2337* LUCENE-8825: CheckHits now displays the shard index in case of mismatch
2338  between top hits. (Atri Sharma via Adrien Grand)
2339
2340Other
2341
2342* LUCENE-8847: Code Cleanup: Remove StringBuilder.append with concatenated
2343  strings. (Koen De Groote via Uwe Schindler)
2344
2345* LUCENE-8861: Script to find open GitHub PRs that need attention (janhoy)
2346
2347* LUCENE-8852: ReleaseWizard tool for release managers (janhoy)
2348
2349* LUCENE-8838: Remove support for Steiner points on Tessellator. (Ignacio Vera)
2350
2351* LUCENE-8879: Improve BKDRadixSelector tests. (Ignacio Vera)
2352
2353* LUCENE-8886: Fix TestMutablePointsReaderUtils tests. (Ignacio Vera)
2354
2355======================= Lucene 8.1.1 =======================
2356Improvements
2357
2358* LUCENE-8781: FST lookup performance has been improved in many cases by
2359  encoding Arcs using full-sized arrays with gaps. The new encoding is
2360  enabled for postings in the default codec and for suggesters. (Mike Sokolov)
2361
2362
2363======================= Lucene 8.1.0 =======================
2364
2365API Changes
2366
2367* LUCENE-3041: A query introspection API has been added.  Queries should
2368  implement a visit() method, taking a QueryVisitor, and either pass the
2369  visitor down to any child queries, or call a visitX() or consumeX() method
2370  on it.  All locations in the code that called Weight.extractTerms()
2371  have been changed to use this API, and the extractTerms() method has
2372  been deprecated. (Alan Woodward, Simon Willnauer, David Smiley, Luca
2373  Cavanna)
2374
2375* LUCENE-8735: Directory.getPendingDeletions is now abstract to ensure
2376  subclasses override it. FilterDirectory now delegates the call, ensuring
2377  correct default behaviour for subclasses. (Henning Andersen)
2378
2379New Features
2380
2381* LUCENE-2562: The well-known graphical user interface for inspecting Lucene
2382  indexes "Luke" was added as a Lucene module. It can be started from the
2383  binary distribution by calling the shell scripts in the module folder
2384  or from the source checkout by using `ant -f lucene/luke/build.xml run`.
2385  Luke provides a Swing-based user interface and can be used to open
2386  Lucene or Solr (or Elasticsearch) indexes, inspect documents, check index
2387  commits and segments, or test (custom) analyzers. It also has maintenance
2388  functions to check index structures and force merge indexes for archival.
2389  Luke was originally developed by Andrzej Bialecki, later maintained by
2390  Dmitry Kan and finally rewritten by Tomoko Uchida to use the ASF licensing
2391  compatible Swing framework (as shipped with JDKs).
2392  (Tomoko Uchida, Uwe Schindler)
2393
2394Bug Fixes
2395
2396* LUCENE-8736: LatLonShapePolygonQuery returns incorrect WITHIN results
2397  with shared boundaries. Point in Polygon now correctly includes boundary
2398  points. Box and Polygon relations with triangles have also been improved to
2399  correctly include boundary points. (Nick Knize)
2400
2401* LUCENE-8712: Polygon2D does not detect crossings through segment edges.
2402  (Ignacio Vera)
2403
2404* LUCENE-8720: NameIntCacheLRU (in the facets module) had an int
2405  overflow bug that disabled cleaning of the cache (Russell A Brown)
2406
2407* LUCENE-8726: ValueSource.asDoubleValuesSource() could leak a reference to
2408  IndexSearcher (Alan Woodward, Yury Pakhomov)
2409
2410* LUCENE-8719: FixedShingleFilter can miss shingles at the end of a token stream if
2411  there are multiple paths with different lengths. (Alan Woodward)
2412
2413* LUCENE-8688: TieredMergePolicy#findForcedMerges now tries to create the
2414  cheapest merges that allow the index to go down to `maxSegmentCount` segments
2415  or fewer. (Armin Braun via Adrien Grand)
2416
2417* LUCENE-8477: Interval disjunctions could miss valid hits if some of the
2418  clauses of the disjunction are minimized away.  We now rewrite intervals
2419  if a source contains a disjunction and the internal gaps matter for
2420  matching.  This behaviour can be disabled if users are more interested
2421  in speed rather than accuracy of matching. (Alan Woodward, Jim Ferenczi)
2422
2423* LUCENE-8741: ValueSource.fromDoubleValuesSource() was casting to
2424  Scorer instead of Scorable, leading to ClassCastExceptions (Markus Jelsma,
2425  Alan Woodward)
2426
2427* LUCENE-8754: Fix ConcurrentModificationException in SegmentInfo if
2428  attributes are accessed in MergePolicy while the merge is running (Simon Willnauer)
2429
2430* LUCENE-8765: Fixed validation of the number of added points in KD trees.
2431  (Zhao Yang via Adrien Grand)
2432
2433Improvements
2434
2435* LUCENE-8673: Use radix partitioning when merging dimensional points instead
2436  of sorting all dimensions beforehand. (Ignacio Vera, Adrien Grand)
2437
2438* LUCENE-8687: Optimise radix partitioning for points on heap. (Ignacio Vera)
2439
2440* LUCENE-8699: Change HeapPointWriter to use a single byte array instead of a list
2441  of byte arrays. In addition a new interface PointValue is added to abstract out
2442  the different formats between offline and on-heap writers. (Ignacio Vera)
2443
2444* LUCENE-8703: Build point writers in the BKD tree only when they are needed.
2445  (Ignacio Vera)
2446
2447* LUCENE-8652: SynonymQuery can now deboost the document frequency of each term when
2448  blending the score of the synonym. (Jim Ferenczi)
2449
2450* LUCENE-8631: The Korean analyzer's user dictionary now picks the longest-matching word and discards
2451  the other matches. (Yeongsu Kim via Jim Ferenczi)
2452
2453* LUCENE-8732: ConstantScoreQuery can now early terminate the query if the minimum score is
2454  greater than the constant score and total hits are not requested. (Jim Ferenczi)
2455
2456* LUCENE-8750: Implements setMissingValue() on sort fields produced from
2457  DoubleValuesSource and LongValuesSource (Mike Sokolov via Alan Woodward)
2458
2459* LUCENE-8701: ToParentBlockJoinQuery now creates a child scorer that disallows skipping over
2460  non-competitive documents if the score of a parent depends on the score of multiple
2461  children (avg, max, min). Additionally the score mode `none` that assigns a constant score to
2462  each parent can early terminate the collection of top scores. (Jim Ferenczi)
2463
2464* LUCENE-8751: Weight#matches now uses the ScorerSupplier to build scorers with a lead cost of 1
2465  (single document). (Jim Ferenczi)
2466
2467* LUCENE-8752: Japanese new era name '令和' (Reiwa) is added to the dictionary used in
2468  JapaneseTokenizer so that the analyzer handles the era name correctly.
2469  Reiwa is set to replace the Heisei Era on May 1, 2019. (Tomoko Uchida)
2470
2471* LUCENE-8671: Introduced reader attributes, which allow a per-IndexReader configuration
2472  of codec internals. This enables per-reader configuration of whether FSTs are on- or off-heap on a
2473  per-field basis. (Simon Willnauer)
2474
2475* LUCENE-8787: spatial-extras DateRangePrefixTree used to only parse ISO-8601 timestamps with 0 or 3
2476  digits of milliseconds precision but now parses other lengths (although > 3 not used).
2477  (Thomas Lemmé via David Smiley)
2478
2479Changes in Runtime Behavior
2480
2481* LUCENE-8671: Load FST off-heap also for ID-like fields if reader is not opened
2482  from an IndexWriter. (Simon Willnauer)
2483
2484* LUCENE-8730: WordDelimiterGraphFilter always emits its original token first.  This
2485  brings its behaviour into line with the deprecated WordDelimiterFilter, so that
2486  the only difference in output between the two is in the position length
2487  attribute.  (Alan Woodward, Jim Ferenczi)
2488
2489* LUCENE-7386: Disjunctions nested in disjunctions are now flattened. This might
2490  trigger changes in the produced scores due to changes to the order in which
2491  scores of sub clauses are summed up. (Adrien Grand)
2492
2493* LUCENE-8756: MoreLikeThisQuery now respects custom term frequencies
2494  (TermFrequencyAttribute) at search time (Olli Kuonanoja)
2495
2496Other
2497
2498* LUCENE-8680: Refactor EdgeTree#relateTriangle method. (Ignacio Vera)
2499
2500* LUCENE-8685: Refactor LatLonShape tests. (Ignacio Vera)
2501
2502* LUCENE-8713: Add Line2D tests. (Ignacio Vera)
2503
2504* LUCENE-8729: Workaround: Disable accessibility doclints (Java 13+),
2505  so compilation with recent JDK succeeds.  (Uwe Schindler)
2506
2507* LUCENE-8725: Make TermsQuery.SeekingTermSetTermsEnum a top level class and public (noble)
2508
2509======================= Lucene 8.0.0 =======================
2510
2511API Changes
2512
2513* LUCENE-8662: Change TermsEnum.seekExact(BytesRef) to abstract and delegate seekExact(BytesRef)
2514  in FilterLeafReader.FilterTermsEnum. (Jeffery Yuan via Tomás Fernández Löbbe, Simon Willnauer)
2515
2516* LUCENE-8469: Deprecated StringHelper.compare has been removed. (Dawid Weiss)
2517
2518* LUCENE-8039: Introduce a "delta distance" method set to GeoDistance.  This
2519  allows distance calculations, especially for paths, to take into account an
2520  "excursion" to include the specified point.
2521
2522* LUCENE-8007: Index statistics Terms.getSumDocFreq(), Terms.getDocCount() are
2523  now required to be stored by codecs. Additionally, TermsEnum.totalTermFreq()
2524  and Terms.getSumTotalTermFreq() are now required: if frequencies are not
2525  stored they are equal to TermsEnum.docFreq() and Terms.getSumDocFreq(),
2526  respectively, because all freq() values equal 1. (Adrien Grand, Robert Muir)
2527
2528* LUCENE-8038: Deprecated PayloadScoreQuery constructors have been removed (Alan
2529  Woodward)
2530
2531* LUCENE-8014: Similarity.computeSlopFactor() and
2532  Similarity.computePayloadFactor() have been removed (Alan Woodward)
2533
2534* LUCENE-7996: Queries are now required to produce positive scores.
2535  (Adrien Grand)
2536
2537* LUCENE-8099: CustomScoreQuery, BoostedQuery and BoostingQuery have been
2538  removed (Alan Woodward)
2539
2540* LUCENE-8012: Explanation now takes Number rather than float (Alan Woodward,
2541  Robert Muir)
2542
2543* LUCENE-8116: SimScorer now only takes a frequency and a norm as per-document
2544  scoring factors. (Adrien Grand)
2545
2546* LUCENE-8113: TermContext has been renamed to TermStates, and can now be
2547  constructed lazily if term statistics are not required (Alan Woodward)
2548
2549* LUCENE-8242: Deprecated method IndexSearcher#createNormalizedWeight() has
2550  been removed (Alan Woodward)
2551
2552* LUCENE-8267: Memory codecs removed from the codebase (MemoryPostings,
2553  MemoryDocValues). (Dawid Weiss)
2554
2555* LUCENE-8144: Moved QueryCachingPolicy.ALWAYS_CACHE to the test framework.
2556  (Nhat Nguyen via Adrien Grand)
2557
2558* LUCENE-8356: StandardFilter and StandardFilterFactory have been removed
2559  (Alan Woodward)
2560
2561* LUCENE-8373: StandardAnalyzer.ENGLISH_STOP_WORD_SET has been removed
2562  (Alan Woodward)
2563
2564* LUCENE-8388: Unused PostingsEnum#attributes() method has been removed
2565  (Alan Woodward)
2566
2567* LUCENE-8405: TopDocs.maxScore is removed. IndexSearcher and TopFieldCollector
2568  no longer have an option to compute the maximum score when sorting by field.
2569  (Adrien Grand)
2570
2571* LUCENE-8411: TopFieldCollector no longer takes a fillFields option, it now
2572  always fills fields. (Adrien Grand)
2573
2574* LUCENE-8412: TopFieldCollector no longer takes a trackDocScores option. Scores
2575  need to be set on top hits via TopFieldCollector#populateScores instead.
2576  (Adrien Grand)
2577
2578* LUCENE-6228: A new Scorable abstract class has been added, containing only those
2579  methods from Scorer that should be called from Collectors.  LeafCollector.setScorer()
2580  now takes a Scorable rather than a Scorer. (Alan Woodward, Adrien Grand)
2581
2582* LUCENE-8475: Deprecated constants have been removed from RamUsageEstimator.
2583  (Dimitrios Athanasiou)
2584
2585* LUCENE-8483: Scorers may no longer take null as a Weight (Alan Woodward)
2586
2587* LUCENE-8352: TokenStreamComponents is now final, and can take a Consumer<Reader>
2588  in its constructor (Mark Harwood, Alan Woodward, Adrien Grand)
2589
2590* LUCENE-8498: LowerCaseTokenizer has been removed, and CharTokenizer no longer
2591  takes a normalizer function. (Alan Woodward)
2592
2593* LUCENE-7875: Moved MultiFields static methods out of the class.  getLiveDocs is now
2594  in MultiBits which is now public.  getMergedFieldInfos and getIndexedFields are now in
2595  FieldInfos.  getTerms is now in MultiTerms.  getTermPositionsEnum and getTermDocsEnum
2596  were collapsed and renamed to just getTermPostingsEnum and moved to MultiTerms.
2597  (David Smiley)
2598
2599* LUCENE-8513: MultiFields.getFields is now removed.  Please avoid this class,
2600  and Fields in general, when possible. (David Smiley)
2601
2602* LUCENE-8497: MultiTermAwareComponent has been removed, and in its place
2603  TokenFilterFactory and CharFilterFactory now expose type-safe normalize()
2604  methods.  This decouples normalization from tokenization entirely.
2605  (Mayya Sharipova, Alan Woodward)
2606
2607* LUCENE-8597: IntervalIterator now exposes a gaps() method that reports the
2608  number of gaps between its component sub-intervals.  This can be used in a
2609  new filter available via Intervals.maxgaps().  (Alan Woodward)
2610
2611* LUCENE-8609: Remove IndexWriter#numDocs() and IndexWriter#maxDoc() in favor
2612  of IndexWriter#getDocStats(). (Simon Willnauer)
2613
2614* LUCENE-8292: Make TermsEnum fully abstract. (Simon Willnauer)
2615
2616Changes in Runtime Behavior
2617
2618* LUCENE-8333: Switch MoreLikeThis.setMaxDocFreqPct to use maxDoc instead of
2619  numDocs. (Robert Muir, Dawid Weiss).
2620
2621* LUCENE-7837: Indices that were created before the previous major version
2622  will now fail to open even if they have been merged with the previous major
2623  version. (Adrien Grand)
2624
2625* LUCENE-8020: Similarities are no longer passed terms that don't exist by
2626  queries such as SpanOrQuery, so scoring formulas no longer require
2627  divide-by-zero hacks.  IndexSearcher.termStatistics/collectionStatistics return null
2628  instead of returning bogus values for a non-existent term or field. (Robert Muir)
2629
2630* LUCENE-7996: FunctionQuery and FunctionScoreQuery now return a score of 0
2631  when the function produces a negative value. (Adrien Grand)
2632
2633* LUCENE-8116: Similarities now score fields that omit norms as if the norm was
2634  1. This might change score values on fields that omit norms. (Adrien Grand)
2635
2636* LUCENE-8134: Index options are no longer automatically downgraded.
2637  (Adrien Grand)
2638
2639* LUCENE-8031: Length normalization correctly reflects omission of term frequencies.
2640  (Robert Muir, Adrien Grand)
2641
2642* LUCENE-7444: StandardAnalyzer no longer defaults to removing English stopwords
2643  (Alan Woodward)
2644
2645* LUCENE-8060: IndexSearcher's search and searchAfter methods now only compute
2646  total hit counts accurately up to 1,000 in order to enable top-hits
2647  optimizations such as block-max WAND (LUCENE-8135). (Adrien Grand)
2648
2649* LUCENE-8505: IndexWriter#addIndexes will now fail if the target index is sorted but
2650  the candidate is not. (Jim Ferenczi)
2651
2652* LUCENE-8535: Highlighter and FVH no longer support ToParent and ToChildBlockJoinQuery out of the
2653  box anymore. In order to highlight on Block-Join Queries a custom WeightedSpanTermExtractor / FieldQuery
2654  should be used. (Simon Willnauer, Jim Ferenczi, Julie Tibshirani)
2655
2656* LUCENE-8563: BM25 scores don't include the (k1+1) factor in their numerator
2657  anymore. This doesn't affect ordering as this is a constant factor which is
2658  the same for every document. (Luca Cavanna via Adrien Grand)
2659
2660* LUCENE-8509: WordDelimiterGraphFilter will no longer set the offsets of internal
2661  tokens by default, preventing a number of bugs when the filter is chained with
2662  tokenfilters that change the length of their tokens (Alan Woodward)
2663
2664* LUCENE-8633: IntervalQuery scores do not use term weighting any more, the score
2665  is instead calculated as a function of the sloppy frequency of the matching
2666  intervals. (Alan Woodward, Jim Ferenczi)
2667
2668* LUCENE-8635: FSTs can now remain off-heap, accessed via
2669  IndexInput, and the default codec's term dictionary
2670  (BlockTreeTermsReader) will now leave the FST for the terms index
2671  off-heap for non-primary-key fields using MMapDirectory, reducing
2672  heap usage for such fields. (Ankit Jain)
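
Note on LUCENE-8563 above: the following is a minimal sketch, not Lucene's
BM25Similarity code, of why dropping the (k1+1) factor leaves the ranking
unchanged. It assumes the textbook BM25 term-frequency saturation and omits
the IDF part, which is identical in both variants. The class and method
names are hypothetical.

```java
public class Bm25FactorSketch {
    // Length-normalization term of the BM25 denominator (textbook form, assumed here).
    static double norm(double k1, double b, double docLen, double avgDocLen) {
        return k1 * (1 - b + b * docLen / avgDocLen);
    }

    // Older style: the numerator carries the constant (k1 + 1) factor.
    static double withFactor(double freq, double k1, double b, double docLen, double avgDocLen) {
        return freq * (k1 + 1) / (freq + norm(k1, b, docLen, avgDocLen));
    }

    // Newer style: the same expression without (k1 + 1).
    static double withoutFactor(double freq, double k1, double b, double docLen, double avgDocLen) {
        return freq / (freq + norm(k1, b, docLen, avgDocLen));
    }

    public static void main(String[] args) {
        double k1 = 1.2, b = 0.75, avgDocLen = 100;
        // Each row is {term frequency, document length}; the values are made up.
        double[][] docs = {{1, 50}, {3, 200}, {2, 100}};
        for (double[] d : docs) {
            double oldScore = withFactor(d[0], k1, b, d[1], avgDocLen);
            double newScore = withoutFactor(d[0], k1, b, d[1], avgDocLen);
            // The ratio is k1 + 1 for every document (up to rounding),
            // so sorting by either score gives the same order.
            System.out.printf("freq=%.0f len=%.0f old=%.4f new=%.4f ratio=%.4f%n",
                d[0], d[1], oldScore, newScore, oldScore / newScore);
        }
    }
}
```

Absolute score values do change, so code that compares scores against fixed
thresholds may need to be revisited.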
2673
2674New Features
2675
2676* LUCENE-8340: LongPoint#newDistanceFeatureQuery may be used to boost scores based on
2677  how close a value of a long field is to a configurable origin. This is
2678  typically useful to boost by recency. (Adrien Grand)
2679
2680* LUCENE-8482: LatLonPoint#newDistanceFeatureQuery may be used to boost scores
2681  based on the haversine distance of a LatLonPoint field to a provided point. This is
2682  typically useful to boost by distance. (Ignacio Vera)
2683
2684* LUCENE-8216: Added a new BM25FQuery in sandbox to blend statistics across several fields
2685  using the BM25F formula. (Adrien Grand, Jim Ferenczi)
2686
2687* LUCENE-8564: GraphTokenFilter is an abstract class useful for token filters that need
2688  to read-ahead in the token stream and take into account graph structures.  This
2689  also changes FixedShingleFilter to extend GraphTokenFilter (Alan Woodward)
2690
2691* LUCENE-8612: Intervals.extend() treats an interval as if it covered a wider
2692  span than it actually does, allowing users to force minimum gaps between
2693  intervals in a phrase. (Alan Woodward)
2694
2695* LUCENE-8629: New interval functions: Intervals.before(), Intervals.after(),
2696  Intervals.within() and Intervals.overlapping(). (Alan Woodward)
2697
2698* LUCENE-8622: Adds a minimum-should-match interval function that produces intervals
2699  spanning a subset of a set of sources. (Alan Woodward)
2700
2701* LUCENE-8645: Intervals.fixField() allows you to report intervals from one field
2702  as if they came from another. (Alan Woodward)
2703
2704* LUCENE-8646: New interval functions: Intervals.prefix() and Intervals.wildcard()
2705  (Alan Woodward)
2706
2707* LUCENE-8655: Add a getter to the FunctionScoreQuery class to access the
2708  underlying DoubleValuesSource. (Gérald Quaire via Alan Woodward)
2709
2710* LUCENE-8697: GraphTokenStreamFiniteStrings correctly handles side paths
2711  containing gaps (Alan Woodward)
2712
2713* LUCENE-8702: Simplify intervals returned from vararg Intervals factory methods
2714  (Alan Woodward)
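
Note on LUCENE-8340 and LUCENE-8482 above: a rough sketch of how a
distance-based boost can behave. It assumes a saturation-style formula,
weight * pivotDistance / (pivotDistance + distance), and the parameter names
"weight" and "pivotDistance" are assumptions made for this illustration.
Consult the newDistanceFeatureQuery javadocs for the authoritative
definition. The class below is hypothetical and does not use Lucene.

```java
public class DistanceFeatureSketch {
    // Assumed scoring shape: equals `weight` at the origin and `weight / 2`
    // when the value is exactly `pivotDistance` away from the origin.
    static double score(double weight, long pivotDistance, long origin, long value) {
        double distance = Math.abs((double) value - (double) origin);
        return weight * pivotDistance / (pivotDistance + distance);
    }

    public static void main(String[] args) {
        long now = 1_000_000L;   // made-up "origin", e.g. the current timestamp
        long pivot = 3_600L;     // made-up pivot distance, e.g. one hour
        long[] timestamps = {now, now - 3_600L, now - 86_400L};
        for (long ts : timestamps) {
            // More recent values (closer to the origin) receive a larger boost.
            System.out.println("age=" + (now - ts) + " boost=" + score(2.0, pivot, now, ts));
        }
    }
}
```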
2715
2716Improvements
2717
2718* LUCENE-7997: Add BaseSimilarityTestCase to sanity check similarities.
2719  SimilarityBase switches to 64-bit doubles internally to help avoid common numeric issues.
2720  Add missing range checks for similarity parameters.
2721  Improve BM25 and ClassicSimilarity's explanations. (Robert Muir)
2722
2723* LUCENE-8011: Improved similarity explanations.
2724  (Mayya Sharipova via Adrien Grand)
2725
2726* LUCENE-4198: Codecs now have the ability to index score impacts.
2727  (Adrien Grand)
2728
2729* LUCENE-8135: Boolean queries now implement the block-max WAND algorithm in
2730  order to speed up selection of top scored documents. (Adrien Grand)
2731
2732* LUCENE-8279: CheckIndex now cross-checks terms with norms. (Adrien Grand)
2733
2734* LUCENE-8660: TopDocsCollectors now return an accurate count (instead of a lower bound)
2735  if the total hit count is equal to the provided threshold. (Adrien Grand, Jim Ferenczi)
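
Note on LUCENE-8060 and LUCENE-8660 above: a simplified model, not Lucene's
collector code, of counting hits accurately only up to a threshold. All names
below are hypothetical. The count is exact when the number of matches is
less than or equal to the threshold and is reported as a lower bound
otherwise.

```java
public class HitCountSketch {
    enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

    static final class Result {
        final int count;
        final Relation relation;

        Result(int count, Relation relation) {
            this.count = count;
            this.relation = relation;
        }
    }

    // Counts matches one by one and gives up once the threshold is exceeded.
    static Result count(int matchingDocs, int threshold) {
        int counted = 0;
        for (int doc = 0; doc < matchingDocs; doc++) {
            counted++;
            if (counted > threshold) {
                // From here on a real searcher may skip non-competitive hits,
                // so only a lower bound on the total is known.
                return new Result(counted, Relation.GREATER_THAN_OR_EQUAL_TO);
            }
        }
        // Never exceeded the threshold: every match was counted.
        return new Result(counted, Relation.EQUAL_TO);
    }

    public static void main(String[] args) {
        Result exact = count(1000, 1000);
        Result bound = count(5000, 1000);
        System.out.println(exact.count + " " + exact.relation);
        System.out.println(bound.count + " " + bound.relation);
    }
}
```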
2736
2737Optimizations
2738
2739* LUCENE-8040: Optimize IndexSearcher.collectionStatistics, avoiding MultiFields/MultiTerms
2740  (David Smiley, Robert Muir)
2741
2742* LUCENE-4100: Disjunctions now support faster collection of top hits when the
2743  total hit count is not required. (Stefan Pohl, Adrien Grand, Robert Muir)
2744
2745* LUCENE-7993: Phrase queries are now faster if total hit counts are not
2746  required. (Adrien Grand)
2747
2748* LUCENE-8109: Boolean queries propagate information about the minimum
2749  competitive score in order to make collection faster if there are disjunctions
2750  or phrase queries as sub queries, which know how to leverage this information
2751  to run faster. (Adrien Grand)
2752
2753* LUCENE-8439: Disjunction max queries can skip blocks to select the top documents
2754  if the total hit count is not required. (Jim Ferenczi, Adrien Grand)
2755
2756* LUCENE-8204: Boolean queries with a mix of required and optional clauses are
2757  now faster if the total hit count is not required. (Jim Ferenczi, Adrien Grand)
2758
2759* LUCENE-8448: Boolean queries now propagate the minimum score to their sub-scorers.
2760  (Jim Ferenczi, Adrien Grand)
2761
2762* LUCENE-8511: MultiFields.getIndexedFields is now optimized; does not call getMergedFieldInfos
2763  (David Smiley)
2764
2765* LUCENE-8507: TopFieldCollector can now update the minimum competitive score if the primary sort
2766  is by relevancy and the total hit count is not required. (Jim Ferenczi)
2767
2768* LUCENE-8464: ConstantScoreScorer now implements setMinCompetitiveScore in order
2769  to early terminate the iterator if the minimum score is greater than the constant
2770  score. (Christophe Bismuth via Jim Ferenczi)
2771
2772* LUCENE-8607: MatchAllDocsQuery can shortcut when total hit count is not
2773  required (Alan Woodward, Adrien Grand)
2774
2775* LUCENE-8585: Index-time jump-tables for DocValues, for O(1) advance when retrieving doc values.
2776  (Toke Eskildsen, Adrien Grand)
2777
2778======================= Lucene 7.7.2 =======================
2779
2780Bug fixes
2781
2782* LUCENE-8726: ValueSource.asDoubleValuesSource() could leak a reference to
2783  IndexSearcher (Alan Woodward, Yury Pakhomov)
2784
2785* LUCENE-8735: FilterDirectory.getPendingDeletions now forwards to the delegate
2786  even though the method is not abstract in the superclass. This prevents issues
2787  where IndexWriter's best effort at carrying generations forward fails because pending
2788  deletions are swallowed by the FilterDirectory. (Henning Andersen, Simon Willnauer)
2789
2790* LUCENE-8688: TieredMergePolicy#findForcedMerges now tries to create the
2791  cheapest merges that allow the index to go down to `maxSegmentCount` segments
2792  or less. (Armin Braun via Adrien Grand)
2793
2794* LUCENE-8785: Ensure new threadstates are locked before retrieving the number of active threadstates.
2795  Not doing so caused assertion errors and potentially broken field attributes in the IndexWriter when
2796  IndexWriter#deleteAll was called while actively indexing. (Simon Willnauer)
2797
2798* LUCENE-8720: NameIntCacheLRU (in the facets module) had an int
2799  overflow bug that disabled cleaning of the cache (Russell A Brown)
2800
2801* LUCENE-8809: Refresh and rollback concurrently can leave segment states unclosed (Nhat Nguyen)
2802
2803======================= Lucene 7.7.1 =======================
2804(No Changes)
2805
2806======================= Lucene 7.7.0 =======================
2807
2808Changes in Runtime Behavior
2809
2810* LUCENE-8527: StandardTokenizer and UAX29URLEmailTokenizer now support Unicode 9.0,
2811  and provide Unicode UTS#51 v11.0 Emoji tokenization with the "<EMOJI>" token type.
2812
2813Build
2814
2815* LUCENE-8611: Update randomizedtesting to 2.7.2, JUnit to 4.12, add hamcrest-core
2816  dependency. (Dawid Weiss)
2817
2818* LUCENE-8537: ant test command fails under lucene/tools (Peter Somogyi)
2819
2820Bug fixes
2821
2822* LUCENE-8669: Fix LatLonShape WITHIN queries that fail with multiple search polygons
2823  that share the dateline. (Nick Knize)
2824
2825* LUCENE-8603: Fix the inversion of right ids for additional nouns in the Korean user dictionary.
2826  (Yoo Jeongin via Jim Ferenczi)
2827
2828* LUCENE-8624: int overflow in ByteBuffersDataOutput.size(). (Mulugeta Mammo,
2829  Dawid Weiss)
2830
2831* LUCENE-8625: int overflow in ByteBuffersDataInput.sliceBufferList. (Mulugeta Mammo,
2832  Dawid Weiss)
2833
2834* LUCENE-8639: Newly created threadstates while flushing / refreshing can cause duplicated
2835  sequence IDs on IndexWriter. (Simon Willnauer)
2836
2837* LUCENE-8649: LatLonShape's within and disjoint queries can return false positives with
2838  indexed multi-shapes. (Ignacio Vera)
2839
2840* LUCENE-8654: Polygon2D#relateTriangle returns the wrong answer if polygon is inside
2841  the triangle. (Ignacio Vera)
2842
2843* LUCENE-8650: ConcatenatingTokenStream did not correctly clear its state in reset(), and
2844  was not propagating final position increments from its child streams correctly.
2845  (Dan Meehl, Alan Woodward)
2846
2847* LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused
2848  by a big buffer (1024 chars). (Jim Ferenczi)
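
Note on LUCENE-8624 and LUCENE-8625 above: a generic illustration of the
kind of int overflow that can occur when a total size is computed from block
counts. This is not the code from ByteBuffersDataOutput or
ByteBuffersDataInput; the class, methods and numbers are hypothetical.

```java
public class SizeOverflowSketch {
    // Arithmetic done in int: the product wraps around once it exceeds Integer.MAX_VALUE.
    static int sizeAsInt(int fullBlocks, int blockSize, int tailBytes) {
        return fullBlocks * blockSize + tailBytes;
    }

    // Widening one operand to long before multiplying keeps the full result.
    static long sizeAsLong(int fullBlocks, int blockSize, int tailBytes) {
        return (long) fullBlocks * blockSize + tailBytes;
    }

    public static void main(String[] args) {
        int blockSize = 1 << 30; // 1 GiB blocks, chosen only to trigger the overflow
        System.out.println(sizeAsInt(3, blockSize, 0));  // negative: wrapped around
        System.out.println(sizeAsLong(3, blockSize, 0)); // 3221225472
    }
}
```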
2849
2850New Features
2851
2852* LUCENE-8026: ExitableDirectoryReader may now time out queries that run on
2853  points such as range queries or geo queries.
2854  (Christophe Bismuth via Adrien Grand)
2855
2856* LUCENE-8508: IndexWriter can now set the created version via
2857  IndexWriterConfig#setIndexCreatedVersionMajor. This is an expert feature.
2858  (Adrien Grand)
2859
2860* LUCENE-8601: Attributes set in the IndexableFieldType for each field during indexing will
2861  now be recorded into the corresponding FieldInfo's attributes, accessible at search
2862  time (Murali Krishna P)
2863
2864Improvements
2865
2866* LUCENE-8463: TopFieldCollector can now early-terminate queries when sorting by SortField.DOC.
2867  (Christophe Bismuth via Jim Ferenczi)
2868
2869* LUCENE-8562: Speed up merging segments of points with data dimensions by only sorting on the indexed
2870  dimensions. (Ignacio Vera)
2871
2872* LUCENE-8529: TopSuggestDocsCollector will now use the completion key to tiebreak completion
2873  suggestion with identical scores. (Jim Ferenczi)
2874
2875* LUCENE-8575: SegmentInfos#toString now includes attributes and diagnostics.
2876  (Namgyu Kim via Adrien Grand)
2877
2878* LUCENE-8548: The KoreanTokenizer no longer splits unknown words on combining diacritics and
2879  detects script boundaries more accurately with Character#UnicodeScript#of.
2880  (Christophe Bismuth, Jim Ferenczi)
2881
2882* LUCENE-8581: Change LatLonShape encoding to use 4 bytes per dimension.
2883  (Ignacio Vera, Nick Knize, Adrien Grand)
2884
2885* LUCENE-8527: Upgrade JFlex dependency to 1.7.0; in StandardTokenizer and UAX29URLEmailTokenizer,
2886  increase supported Unicode version from 6.3 to 9.0, and support Unicode UTS#51 v11.0 Emoji tokenization.
2887
2888* LUCENE-8640: Date Range format validation (Lucky Sharma, David Smiley via Mikhail Khludnev)
2889
2890Optimizations
2891
2892* LUCENE-8552: FieldInfos.getMergedFieldInfos no longer does any merging if there is <= 1 segment.
2893  (Christophe Bismuth via David Smiley)
2894
2895* LUCENE-8590: BufferedUpdates now uses an optimized storage for buffering docvalues updates that
2896  can save up to 80% of the heap used compared to the previous implementation and uses non-object
2897  based data structures. (Simon Willnauer, Mike McCandless, Shai Erera, Adrien Grand)
2898
2899* LUCENE-8598: Moving to the default accepted overhead ratio for packed ints in DocValuesFieldUpdates
2900  yields an up-to 4x performance improvement when applying doc values updates. (Simon Willnauer, Adrien Grand)
2901
2902* LUCENE-8599: Use sparse bitset to store docs in SingleValueDocValuesFieldUpdates.
2903  (Simon Willnauer, Adrien Grand)
2904
2905* LUCENE-8600: Doc-value updates get applied faster by sorting with quicksort,
2906  rather than an in-place mergesort, which needs to perform fewer swaps.
2907  (Adrien Grand)
2908
2909* LUCENE-8623: Decrease I/O pressure when merging high dimensional points. (Ignacio Vera)
2910
2911Test Framework
2912
2913* LUCENE-8604: TestRuleLimitSysouts now has an optional "hard limit" of bytes that can be written
2914  to stderr and stdout (anything beyond the hard limit is ignored). The default hard limit is 2 GB of
2915  logs per test class. (Dawid Weiss)
2916
2917Other
2918
2919* LUCENE-8573: BKDWriter now uses FutureArrays#mismatch to compute shared prefixes.
2920  (Christoph Büscher via Adrien Grand)
2921
2922* LUCENE-8605: Separate bounding box spatial logic from query logic on LatLonShapeBoundingBoxQuery.
2923  (Ignacio Vera)
2924
2925* LUCENE-8609: Deprecated IndexWriter#numDocs() and IndexWriter#maxDoc() in favor of IndexWriter#getDocStats()
2926  which provides consistent numDocs and maxDoc stats that are not subject to concurrent changes.
2927  (Simon Willnauer, Nhat Nguyen)
2928
2929======================= Lucene 7.6.0 =======================
2930
2931Build
2932
2933* LUCENE-8504: Upgrade forbiddenapis to version 2.6.  (Uwe Schindler)
2934
2935* LUCENE-8493: Stop publishing insecure .sha1 files with releases (janhoy)
2936
2937Bug fixes
2938
2939* LUCENE-8479: QueryBuilder#analyzeGraphPhrase now throws a TooManyClauses exception
2940  if the number of expanded paths reaches the BooleanQuery#maxClauseCount limit. (Jim Ferenczi)
2941
2942* LUCENE-8522: throw InvalidShapeException when constructing a polygon and
2943  all points are coplanar. (Ignacio Vera)
2944
2945* LUCENE-8531: QueryBuilder#analyzeGraphPhrase now creates one phrase query per finite string
2946  in the graph if the slop is greater than 0. Span queries cannot be used in this case because
2947  they don't handle slop the same way as phrase queries. (Steve Rowe, Uwe Schindler, Jim Ferenczi)
2948
2949* LUCENE-8524: Add the Hangul Letter Araea (interpunct) as a separator in Nori's tokenizer.
2950  This change also removes empty terms and trims surface forms in Nori's Korean dictionary. (Trey Jones, Jim Ferenczi)
2951
2952* LUCENE-8550: Fix filtering of coplanar points when creating linked list on
2953  polygon tessellator. (Ignacio Vera)
2954
2955* LUCENE-8549: Polygon tessellator throws an error if some parts of the shape
2956   could not be processed. (Ignacio Vera)
2957
2958* LUCENE-8540: Better handling of min/max values for Geo3d encoding. (Ignacio Vera)
2959
2960* LUCENE-8534: Fix incorrect computation for triangles intersecting polygon edges in
2961  shape tessellation. (Ignacio Vera)
2962
2963* LUCENE-8559: Fix bug where polygon edges were skipped when checking for intersections.
2964  (Ignacio Vera)
2965
2966* LUCENE-8556: Use latitude and longitude instead of encoding values to check if triangle is ear
2967  when using morton optimisation. (Ignacio Vera)
2968
2969* LUCENE-8586: Intervals.or() could get stuck in an infinite loop on certain indexes
2970  (Alan Woodward)
2971
2972* LUCENE-8595: Fix interleaved DV update and reset. Interleaving an update and a reset of a value
2973  for the same doc in the same updates package lost the update if the reset came before
2974  the update, and lost the reset if the update came first. (Simon Willnauer, Adrien Grand)
2975
2976* LUCENE-8592: Fix index sorting corruption due to numeric overflow. The merge of sorted segments
2977  can produce an invalid sort if the sort field is an Integer/Long that uses reverse order and contains
2978  values equal to Integer/Long#MIN_VALUE. These values are always sorted first during a merge
2979  (instead of last because of the reverse order) due to this bug. Indices affected by the bug can be
2980  detected by running the CheckIndex command on a distribution that contains the fix (7.6+).
2981  (Jim Ferenczi, Adrien Grand, Mike McCandless, Simon Willnauer)
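
Note on LUCENE-8592 above: a hedged sketch of one common way a
reverse-order comparison can overflow on MIN_VALUE. It is not the code from
Lucene's merge sorter, and the class and method names are hypothetical.
Negating Long.MIN_VALUE yields Long.MIN_VALUE again, so a comparator that
reverses order by negating its inputs places that value first instead of
last.

```java
public class ReverseSortOverflowSketch {
    // Reverses order by negating both values; breaks for Long.MIN_VALUE
    // because -Long.MIN_VALUE overflows back to Long.MIN_VALUE.
    static int negatingReverseCompare(long a, long b) {
        return Long.compare(-a, -b);
    }

    // Reverses order by swapping the arguments; no arithmetic, so no overflow.
    static int swappingReverseCompare(long a, long b) {
        return Long.compare(b, a);
    }

    public static void main(String[] args) {
        long min = Long.MIN_VALUE;
        // In descending order MIN_VALUE should sort after 5, i.e. compare > 0.
        System.out.println(negatingReverseCompare(min, 5)); // negative: MIN_VALUE wrongly sorts first
        System.out.println(swappingReverseCompare(min, 5)); // positive: MIN_VALUE sorts last
    }
}
```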
2982
2983New Features
2984
2985* LUCENE-8496: Selective indexing - modify BKDReader/BKDWriter to allow users
2986  to select fewer dimensions to be used for creating the index than
2987  the total number of dimensions used for field encoding. i.e., dimensions 0 to N
2988  may be used to determine how to split the inner nodes, and dimensions N+1 to D
2989  are ignored and stored as data dimensions at the leaves. (Nick Knize)
2990
2991* LUCENE-8538: Add a Simple WKT Shape Parser for creating Lucene Geometries (Polygon, Line,
2992  Rectangle) from WKT format. (Nick Knize)
2993
2994* LUCENE-8462: Adds an Arabic snowball stemmer based on
2995  https://github.com/snowballstem/snowball/blob/master/algorithms/arabic.sbl
2996  (Ryadh Dahimene via Jim Ferenczi)
2997
2998* LUCENE-8554: Add new LatLonShapeLineQuery that queries indexed LatLonShape fields
2999  by arbitrary lines. (Nick Knize)
3000
3001* LUCENE-8555: Add dateline crossing support to LatLonShapeBoundingBoxQuery. (Nick Knize)
3002
3003Improvements
3004
3005* LUCENE-8521: Change LatLonShape encoding to 7 dimensions instead of 6; where the
3006  first 4 are index dimensions defining the bounding box of the Triangle and the
3007  remaining 3 data dimensions define the vertices of the triangle. (Nick Knize)
3008
3009* LUCENE-8557: LeafReader.getFieldInfos is now documented and tested to return
3010  the same cached instance.  MemoryIndex's impl now pre-creates the FieldInfos instead of
3011  re-calculating a new instance each time.  (Tim Underwood, David Smiley)
3012
3013* LUCENE-8558: Replace O(N) lookup with O(1) lookup in PerFieldMergeState#FilterFieldInfos.
3014  (Kranthi via Simon Willnauer)
3015
3016Other
3017
3018* LUCENE-8523: Correct typo in JapaneseNumberFilterFactory javadocs (Ankush Jhalani
3019  via Alan Woodward)
3020
3021* LUCENE-8533: Fix Javadocs of DataInput#readVInt(): Negative numbers are
3022  supported, but should be avoided. (Vladimir Dolzhenko via Uwe Schindler)
3023
3024======================= Lucene 7.5.1 =======================
3025
3026Bug Fixes
3027
3028* LUCENE-8454: Fix incorrect vertex indexing and other computation errors in
3029  shape tessellation that would sometimes cause an infinite loop. (Nick Knize)
3030
3031======================= Lucene 7.5.0 =======================
3032
3033API Changes
3034
3035* LUCENE-8467: RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream are deprecated
3036  (Dawid Weiss)
3037
3038* LUCENE-8356: StandardFilter is deprecated (Alan Woodward)
3039
3040* LUCENE-8373: ENGLISH_STOP_WORD_SET on StandardAnalyzer is deprecated.  Instead
3041  use EnglishAnalyzer.ENGLISH_STOP_WORD_SET.  The default constructor for
3042  StopAnalyzer is also deprecated, and a stop word set should be explicitly
3043  passed to the constructor.  (Alan Woodward)
3044
3045* LUCENE-8378: Add DocIdSetIterator.range static method to return an iterator
3046  matching a range of docids (Mike McCandless)
3047
3048* LUCENE-8379: Add experimental TermQuery.getTermStates method (Mike McCandless)
3049
3050* LUCENE-8407: Add experimental SpanTermQuery.getTermStates method (David Smiley)
3051
3052* LUCENE-8390: MatchesIteratorSupplier replaced by IOSupplier (Alan Woodward,
3053  David Smiley)
3054
3055* LUCENE-8397: Add DirectoryTaxonomyWriter.getCache (Mike McCandless)
3056
3057* LUCENE-8387: Add experimental IndexSearcher.getSlices API to see which slices
3058  IndexSearcher is searching concurrently when it's created with an ExecutorService
3059  (Mike McCandless)
3060
3061* LUCENE-8263: TieredMergePolicy's reclaimDeletesWeight has been replaced with a
3062  new deletesPctAllowed setting to control how aggressively deletes should be
3063  reclaimed. (Erick Erickson, Adrien Grand)
3064
3065* LUCENE-7314: Graduate LatLonPoint and query classes to core (Nick Knize)
3066
3067* LUCENE-8428: The way that oal.util.PriorityQueue creates sentinel objects has
3068  been changed from a protected method to a java.util.function.Supplier as a
3069  constructor argument. (Adrien Grand)
3070
3071* LUCENE-8437: CheckIndex.Status.cantOpenSegments and missingSegmentVersion
3072  have been removed as they were not computed correctly. (Adrien Grand)
3073
3074* LUCENE-8286: The UnifiedHighlighter has a new HighlightFlag.WEIGHT_MATCHES flag that
3075  will tell this highlighter to use the new MatchesIterator API as the underlying
3076  approach to navigate matching hits for a query.  This mode will highlight more
3077  accurately than any other highlighter, and can mark up phrases as one span instead of
3078  word-by-word.  The UH's public internal APIs changed a bit in the process.
3079  (David Smiley)
3080
3081* LUCENE-8471: IndexWriter.getFlushingBytes() returns how many bytes are currently
3082  being flushed to disk. (Alan Woodward)
3083
3084* LUCENE-8422: Static helper functions for Matches and MatchesIterator implementations
3085  have been moved from Matches to MatchesUtils (Alan Woodward)
3086
3087* LUCENE-8343: Suggesters now require Long (versus long, previously) from weight() method
3088  while indexing, and provide double (versus long, previously) scores at lookup time
3089  (Alessandro Benedetti)
3090
3091* LUCENE-8459: SearcherTaxonomyManager now has a constructor taking already opened
3092  IndexReaders, allowing the caller to pass a FilterDirectoryReader, for example.
3093  (Mike McCandless)
3094
3095Bug Fixes
3096
3097* LUCENE-8445: Tighten condition when two planes are identical to prevent constructing
3098  bogus tiles when building GeoPolygons. (Ignacio Vera)
3099
3100* LUCENE-8444: Prevent building functionally identical plane bounds when constructing
3101  DualCrossingEdgeIterator. (Ignacio Vera)
3102
3103* LUCENE-8380: UTF8TaxonomyWriterCache inconsistency. (Ruslan Torobaev, Dawid Weiss)
3104
3105* LUCENE-8164: IndexWriter silently accepts broken payload. This has been fixed
3106  via LUCENE-8165 since we are now checking for offset+length going out of bounds.
3107  (Robert Muir, Nhat Nguyen, Simon Willnauer)
3108
3109* LUCENE-8370: Reproducing
3110  TestLucene{54,70}DocValuesFormat.testSortedSetVariableLengthBigVsStoredFields()
3111  failures (Erick Erickson)
3112
3113* LUCENE-8376, LUCENE-8371: ConditionalTokenFilter.end() would not propagate correctly
3114  if the last token in the stream was subsequently dropped; FixedShingleFilter did
3115  not set position increment in end() (Alan Woodward)
3116
3117* LUCENE-8395: WordDelimiterGraphFilter would incorrectly insert a hole into a
3118  TokenStream if a token consisting entirely of delimiter characters was
3119  encountered, but preserve_original was set. (Alan Woodward)
3120
3121* LUCENE-8398: TieredMergePolicy.getMaxMergedSegmentMB has rounding error (Erick Erickson)
3122
3123* LUCENE-8429: DaciukMihovAutomatonBuilder is no longer prone to stack
3124  overflows by enforcing a maximum term length. (Adrien Grand)
3125
3126* LUCENE-8441: IndexWriter now checks doc value type for index sort fields
3127  and fails the document if they are not compatible. (Jim Ferenczi, Mike McCandless)
3128
3129* LUCENE-8458: Adjust initialization condition of PendingSoftDeletes and ensure
3130  it is initialized before accepting deletes (Simon Willnauer, Nhat Nguyen)
3131
3132* LUCENE-8466: IndexWriter.deleteDocs(Query... query) incorrectly applies deletes on flush
3133  if the index is sorted. (Adrien Grand, Jim Ferenczi, Vish Ramachandran)
3134
3135* LUCENE-8502: Allow access to delegate in FilterCodecReader. FilterCodecReader didn't
3136  allow access to its delegate like other filter readers. This adds a new #getDelegate method
3137  to access the wrapped reader. (Simon Willnauer)
3138
3139Changes in Runtime Behavior
3140
3141* LUCENE-7976: TieredMergePolicy now respects maxSegmentSizeMB by default when executing
3142  findForcedMerges and findForcedDeletesMerges (Erick Erickson)
3143
3144* LUCENE-8263: TieredMergePolicy now reclaims deleted documents more
3145  aggressively by default ensuring that no more than ~1/3 of the index size is
3146  used by deleted documents. (Adrien Grand)
3147
3148* LUCENE-8503: Call #getDelegate instead of direct member access during unwrap.
3149  Filter*Reader instances access the member or the delegate directly instead of
3150  calling getDelegate(). In order to track access of the delegate these methods
3151  should call #getDelegate() (Simon Willnauer)
3152
3153Improvements
3154
3155* LUCENE-8468: A ByteBuffer based Directory implementation. (Dawid Weiss)
3156
3157* LUCENE-8447: Add DISJOINT and WITHIN support to LatLonShape queries. (Nick Knize)
3158
3159* LUCENE-8440: Add support for indexing and searching Line and Point shapes using LatLonShape encoding (Nick Knize)
3160
3161* LUCENE-8435: Add new LatLonShapePolygonQuery for querying indexed LatLonShape fields by arbitrary polygons (Nick Knize)
3162
3163* LUCENE-8367: Make per-dimension drill down optional for each facet dimension (Mike McCandless)
3164
3165* LUCENE-8396: Add Points Based Shape Indexing and Search that decomposes shapes
3166  into a triangular mesh and indexes individual triangles as a 6 dimension point (Nick Knize)
3167
3168* LUCENE-8345, GitHub PR #392: Remove instantiation of redundant wrapper classes for primitives;
3169  add wrapper class constructors to forbiddenapis.  (Michael Braun via Uwe Schindler)
3170
3171* LUCENE-8415: Clean up Directory contracts and JavaDoc comments. (Dawid Weiss)
3172
3173* LUCENE-8414: Make segmentInfos private in IndexWriter (Simon Willnauer, Nhat Nguyen)
3174
3175* LUCENE-8446: The UnifiedHighlighter's DefaultPassageFormatter now treats overlapping matches in
3176  the passage as merged (as if one larger match).  (David Smiley)
3177
3178* LUCENE-8460: Better argument validation in StoredField. (Namgyu Kim)
3179
3180* LUCENE-8432: TopFieldComparator stops comparing documents if the index is
3181  sorted, even if hits still need to be visited to compute the hit count.
3182  (Nikolay Khitrin)
3183
3184* LUCENE-8422: IntervalQuery now returns useful Matches (Alan Woodward)
3185
3186* LUCENE-7862: Store the real bounds of the leaf cells in the BKD index when the
3187  number of dimensions is bigger than 1. It improves performance when there is
3188  correlation between the dimensions, for example ranges. (Ignacio Vera, Adrien Grand)
3189
3190Build
3191
3192* LUCENE-5143: Stop publishing KEYS file with each version, use topmost lucene/KEYS file only.
3193  The buildAndPushRelease.py script validates that RM's PGP key is in the KEYS file.
3194  Remove unused 'copy-to-stage' and '-dist-keys' targets from ant build. (janhoy)
3195
3196Other
3197
3198* LUCENE-8485: Update randomizedtesting to version 2.6.4. (Dawid Weiss)
3199
3200* LUCENE-8366: Upgrade to ICU 62.1. Emoji handling now uses Unicode 11's
3201  Extended_Pictographic property. (Robert Muir)
3202
3203* LUCENE-8408: original Highlighter:  Remove obsolete static AttributeFactory instance
3204  in TokenStreamFromTermVector.  (Michael Braun, David Smiley)
3205
3206* LUCENE-8420: Upgrade OpenNLP to 1.9.0 so OpenNLP tool can read the new model format which 1.8.x
3207  cannot read. 1.9.0 can read the old format. (Koji Sekiguchi)
3208
3209* LUCENE-8453: Add documentation to analysis factories of Korean (Nori) analyzer
3210  module.  (Tomoko Uchida via Uwe Schindler)
3211
3212* LUCENE-8455: Upgrade ECJ compiler to 4.6.1 in lucene/common-build.xml (Erick Erickson)
3213
3214* LUCENE-8456: Upgrade Apache Commons Compress to v1.18 (Steve Rowe)
3215
3216* LUCENE-765: Improved org.apache.lucene.index javadocs. (Mike Sokolov)
3217
3218* LUCENE-8476: Remove redundant nullity check and switch to optimized List.sort in the
3219  Korean user dictionary. (Namgyu Kim)
3220
3221======================= Lucene 7.4.1 =======================
3222
3223Bug Fixes
3224
3225 * LUCENE-8365: Fix ArrayIndexOutOfBoundsException in UnifiedHighlighter. This fixes
3226   an "off-by-one" error in the UnifiedHighlighter's code that is only triggered when
3227   two nested SpanNearQueries contain the same term. (Marc-Andre Morissette via Simon Willnauer)
3228
3229 * LUCENE-8381: Fix IndexWriter incorrectly interpreting hard-deletes as soft-deletes
3230   while wrapping reader for merges. (Simon Willnauer, Nhat Nguyen)
3231
3232 * LUCENE-8384: Fix missing advance docValues generation while handling docValues
3233   update in PendingSoftDeletes. (Simon Willnauer, Nhat Nguyen)
3234
3235 * LUCENE-8472: Always rewrite the soft-deletes merge retention query. (Adrien Grand, Nhat Nguyen)
3236
3237======================= Lucene 7.4.0 =======================
3238
3239Upgrading
3240
3241* LUCENE-8344: If you are using the AnalyzingSuggester or FuzzySuggester subclass, and if you
3242  explicitly use the preservePositionIncrements=false setting (not the default), then you ought
3243  to rebuild your suggester index. If you don't, queries or indexed data with trailing position
3244  gaps (e.g. stop words) may not work correctly. (David Smiley, Jim Ferenczi)
3245
3246API Changes
3247
3248* LUCENE-8242: IndexSearcher.createNormalizedWeight() has been deprecated.
3249  Instead use IndexSearcher.createWeight(), rewriting the query first.
3250  (Alan Woodward)
3251
3252* LUCENE-8248: MergePolicyWrapper is renamed to FilterMergePolicy and now
3253  also overrides getMaxCFSSegmentSizeMB (Mike Sokolov via Mike McCandless)
3254
3255* LUCENE-8303: LiveDocsFormat is now only responsible for (de)serialization of
3256  live docs. (Adrien Grand)
3257
3258Changes in Runtime Behavior
3259
3260* LUCENE-8309: Live docs are no longer backed by a FixedBitSet. (Adrien Grand)
3261
3262* LUCENE-8330: Detach IndexWriter from MergePolicy. Instead of requiring IndexWriter
3263  as a hard dependency, MergePolicy now expects a MergeContext, which
3264  IndexWriter implements. (Simon Willnauer, Robert Muir, Dawid Weiss, Mike McCandless)
3265
3266New Features
3267
3268* LUCENE-8200: Allow doc-values to be updated atomically together
3269  with a document. Doc-Values updates now can be used as a soft-delete
3270  mechanism to allow keeping several versions of a document or already
3271  deleted documents around for later reuse. See "IW.softUpdateDocument(...)"
3272  for reference. (Simon Willnauer)
3273
3274* LUCENE-8197: A new FeatureField makes it easy and efficient to integrate
3275  static relevance signals into the final score. (Adrien Grand, Robert Muir)
3276
3277* LUCENE-8202: Add a FixedShingleFilter (Alan Woodward, Adrien Grand, Jim
3278  Ferenczi)
3279
3280* LUCENE-8125: ICUTokenizer support for emoji/emoji sequence tokens. (Robert Muir)
3281
3282* LUCENE-8196, LUCENE-8300: A new IntervalQuery in the sandbox allows efficient proximity
3283  searches based on minimum-interval semantics. (Alan Woodward, Adrien Grand,
3284  Jim Ferenczi, Simon Willnauer, Matt Weber)
3285
3286* LUCENE-8233: Add support for soft deletes to IndexWriter delete accounting.
3287  Soft deletes are accounted for inside the index writer and therefore also
3288  by merge policies. A SoftDeletesRetentionMergePolicy is added that allows
3289  soft-deleted documents to be selectively carried over across merges for retention
3290  policies (Simon Willnauer, Mike McCandless, Robert Muir)
3291
3292* LUCENE-8237: Add a SoftDeletesDirectoryReaderWrapper that allows soft deletes
3293  to be respected if the reader is opened from a directory. (Simon Willnauer,
3294  Mike McCandless, Uwe Schindler, Adrien Grand)
3295
3296* LUCENE-8229, LUCENE-8270: Add a method Weight.matches(LeafReaderContext, doc)
3297  that returns an iterator over matching positions for a given query and document.
3298  This allows exact hit extraction and will enable implementation of accurate
3299  highlighters. (Alan Woodward, Adrien Grand, David Smiley)
3300
3301* LUCENE-8249: Implement Matches API for phrase queries (Alan Woodward, Adrien
3302  Grand)
3303
3304* LUCENE-8246: Allow to customize the number of deletes a merge claims. This
3305  helps merge policies in the soft-delete case to correctly implement retention
3306  policies without triggering unnecessary merges. (Simon Willnauer, Mike McCandless)
3307
3308* LUCENE-8231: A new analysis module (nori) similar to Kuromoji
3309  but to handle Korean using mecab-ko-dic and morphological analysis.
3310  (Robert Muir, Jim Ferenczi)
3311
3312* LUCENE-8265: WordDelimiterFilter and WordDelimiterGraphFilter now have an option to skip tokens
3313  marked with KeywordAttribute (Mike Sokolov via Mike McCandless)
3314
3315* LUCENE-8297: Add IW#tryUpdateDocValues(Reader, int, Fields...). IndexWriter can
3316  update doc values for a specific term but this might affect all documents
3317  containing the term. With tryUpdateDocValues users can update doc-values
3318  fields for individual documents. This allows, for instance, soft-deleting
3319  individual documents. (Simon Willnauer)
3320
3321* LUCENE-8298: Allow DocValues updates to reset a value. Passing a DV field with a null
3322  value to IW#updateDocValues or IW#tryUpdateDocValues will now remove the value from the
3323  provided document. This allows undeleting a soft-deleted document unless it's been claimed
3324  by a merge. (Simon Willnauer)
3325
3326* LUCENE-8273: ConditionalTokenFilter allows analysis chains to skip particular token
3327  filters based on the attributes of the current token. This generalises the keyword
3328  token logic currently used for stemmers and WDF.  It is integrated into
3329  CustomAnalyzer by using the `when` and `whenTerm` builder methods, and a new
3330  ProtectedTermFilter is added as an example.  (Alan Woodward, Robert Muir,
3331  David Smiley, Steve Rowe, Mike Sokolov)
3332
3333* LUCENE-8310: Ensure IndexFileDeleter accounts for pending deletes. Today we fail
3334  creating the IndexWriter when the directory has a pending delete. Yet, this
3335  is mainly done to prevent writing still existing files more than once.
3336  IndexFileDeleter already accounts for that for existing files which we can
3337  now use to also take pending deletes into account which ensures that all file
3338  generations per segment always go forward. (Simon Willnauer)
3339
3340* LUCENE-7960: Add preserveOriginal option to the NGram and EdgeNGram filters.
3341  (Ingomar Wesp, Shawn Heisey via Robert Muir)
3342
3343* LUCENE-8335: Enforce soft-deletes field up-front. Soft deletes field must be marked
3344  as such once it's introduced and can't be changed after the fact.
3345  (Nhat Nguyen via Simon Willnauer)
3346
3347* LUCENE-8332: New ConcatenateGraphFilter for concatenating all tokens into one (or more
3348  in the event of a graph input).  This is useful for fast analyzed exact-match lookup,
3349  suggesters, and as a component of a named entity recognition system.  This was excised
3350  out of CompletionTokenStream in the NRT doc suggester.  (David Smiley, Jim Ferenczi)
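
As a rough illustration of the preserveOriginal option described in LUCENE-7960, the
plain-Java sketch below builds edge n-grams for a single term. It does not use the
Lucene API: the class and method names are hypothetical, and it assumes that the
original term is kept only when it is shorter than minGram or longer than maxGram,
and that it is emitted after the n-grams.

```java
import java.util.ArrayList;
import java.util.List;

final class EdgeNGramSketch {
  /** Returns the prefixes of term whose length (in code points) is in minGram..maxGram. */
  static List<String> edgeNGrams(String term, int minGram, int maxGram, boolean preserveOriginal) {
    List<String> out = new ArrayList<>();
    int length = term.codePointCount(0, term.length());
    for (int n = minGram; n <= Math.min(maxGram, length); n++) {
      // offsetByCodePoints keeps surrogate pairs together when cutting the prefix.
      out.add(term.substring(0, term.offsetByCodePoints(0, n)));
    }
    // Assumption: the original is only added when no n-gram already equals it.
    if (preserveOriginal && (length < minGram || length > maxGram)) {
      out.add(term);
    }
    return out;
  }
}
```

With minGram=2 and maxGram=3, the term "lucene" yields "lu" and "luc", plus "lucene"
itself when preserveOriginal is true.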
3351
3352Bug Fixes
3353
3354* LUCENE-8221: MoreLikeThis.setMaxDocFreqPct can easily int-overflow on larger
3355  indexes.
3356
3357* LUCENE-8266: Detect bogus tiles when creating a standard polygon and
3358  throw a TileException. (Ignacio Vera)
3359
3360* LUCENE-8234: Fixed bug in how spatial relationship is computed for
3361  GeoStandardCircle when it covers the whole world. (Ignacio Vera)
3362
3363* LUCENE-8236: Filter duplicated points when creating GeoPath shapes to
3364  avoid creation of bogus planes. (Ignacio Vera)
3365
3366* LUCENE-8243: IndexWriter.addIndexes(Directory[]) did not properly preserve
3367  index file names for updated doc values fields (Simon Willnauer,
3368  Michael McCandless, Nhat Nguyen)
3369
3370* LUCENE-8275: Push up #checkPendingDeletes to Directory to ensure IW fails if
3371  the directory has pending deletes files even if the directory is filtered or
3372  a FileSwitchDirectory (Simon Willnauer, Robert Muir)
3373
3374* LUCENE-8244: Do not leak open file descriptors in SearcherTaxonomyManager's
3375  refresh on exception (Mike McCandless)
3376
3377* LUCENE-8305: ComplexPhraseQuery.rewrite now handles an embedded MultiTermQuery
3378  that rewrites to a MatchNoDocsQuery instead of throwing an exception.
3379  (Bjarke Mortensen, Andy Tran via David Smiley)
3380
3381* LUCENE-8287: Ensure that empty regex completion queries always return no results.
3382  (Julie Tibshirani via Jim Ferenczi)
3383
3384* LUCENE-8317: Prevent concurrent deletes from being applied during full flush.
3385  Future deletes could potentially be exposed to flushes/commits/refreshes if the
3386  amount of RAM used by deletes is greater than half of the IW RAM buffer. (Simon Willnauer)
3387
3388* LUCENE-8320: Fix WindowsFS to correctly account for rename and hardlinks.
3389  (Simon Willnauer, Nhat Nguyen)
3390
3391* LUCENE-8328: Ensure ReadersAndUpdates consistently executes under lock.
3392  (Nhat Nguyen via Simon Willnauer)
3393
3394* LUCENE-8325: Fixed the smartcn tokenizer to not split UTF-16 surrogate pairs.
3395  (chengpohi via Jim Ferenczi)
3396
3397* LUCENE-8186: LowerCaseTokenizerFactory now lowercases text in multi-term
3398  queries. (Tim Allison via Adrien Grand)
3399
3400* LUCENE-8278: Some end-of-input no-scheme domain-only URL tokens are typed as
3401  <ALPHANUM> rather than <URL>.  (Junte Zhang, Steve Rowe)
3402
3403* LUCENE-8355: Prevent IW from opening an already dropped segment while DV updates
3404  are written. (Nhat Nguyen via Simon Willnauer)
3405
3406* LUCENE-8344: TokenStreamToAutomaton (used by some suggesters) was not ignoring a trailing
3407  position increment when the preservePositionIncrement setting is false.
3408  (David Smiley, Jim Ferenczi)
3409
3410* LUCENE-8357: FunctionScoreQuery.boostByQuery() and boostByValue() were
3411  producing truncated Explanations (Markus Jelsma, Alan Woodward)
3412
3413* LUCENE-8360: NGramTokenFilter and EdgeNGramTokenFilter did not correctly
3414  set position increments in end() (Alan Woodward)
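
The int overflow mentioned in LUCENE-8221 is the kind that appears when a percentage
of a document count is computed in 32-bit arithmetic. The sketch below is plain Java
with hypothetical names, not the MoreLikeThis code itself; it only contrasts the
overflow-prone form with one that widens to long first.

```java
final class PercentOfDocsSketch {
  /** Overflow-prone: pct * numDocs is evaluated in 32-bit arithmetic. */
  static int naive(int pct, int numDocs) {
    return pct * numDocs / 100;
  }

  /** Widens to long before multiplying, then checks that the result fits in an int. */
  static int safe(int pct, int numDocs) {
    return Math.toIntExact((long) pct * numDocs / 100);
  }
}
```

For 50 percent of 100,000,000 documents the intermediate product exceeds
Integer.MAX_VALUE, so the two methods return different values.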
3415
3416Other
3417
3418* LUCENE-8301: Update randomizedtesting to 2.6.0. (Dawid Weiss)
3419
3420* LUCENE-8299: Geo3D wrapper uses new polygon method factory that gives better
3421  support for polygons with many points (>100). (Ignacio Vera)
3422
3423* LUCENE-8261: InterpolatedProperties.interpolate and recursive property
3424  references. (Steve Rowe, Dawid Weiss)
3425
3426* LUCENE-8228: removed obsolete IndexDeletionPolicy clone() requirements from
3427  the javadoc. (Dawid Weiss)
3428
3429* LUCENE-8219: Use a realistic estimate of the number of nodes and links in
3430   LevensteinAutomaton.java, to save reallocation of arrays.
3431   (Christian Ziech)
3432
3433* LUCENE-8214: Improve selection of testPoint for GeoComplexPolygon.
3434  (Ignacio Vera)
3435
3436* SOLR-10912: Add automatic patch validation. (Mano Kovacs, Steve Rowe)
3437
3438* LUCENE-8122, LUCENE-8175: Upgrade analysis/icu to ICU 61.1.
3439  (Robert Muir, Adrien Grand, Uwe Schindler)
3440
3441* LUCENE-8291: Remove QueryTemplateManager utility class from XML queryparser.
3442  This class is just a general XML transforming tool (using property files and
3443  XSLT) and has nothing to do with query parsing. It can easily be implemented
3444  using more sophisticated libraries or using XSL transformers from the JDK.
3445  This change also removes the Lucene demo webapp to prevent XSS issues in
3446  untested/unmaintained code. (Uwe Schindler)
3447
3448Build
3449
3450* LUCENE-7935: Publish .sha512 hash files with the release artifacts and stop
3451  publishing .md5 hashes since the algorithm is broken (janhoy)
3452
3453* LUCENE-8230: Upgrade forbiddenapis to version 2.5.  (Uwe Schindler)
3454
3455Documentation
3456
3457* LUCENE-8238: Improve WordDelimiterFilter and WordDelimiterGraphFilter javadocs
3458  (Mike Sokolov via Mike McCandless)
3459
3460======================= Lucene 7.3.1 =======================
3461
3462Bug fixes
3463
3464* LUCENE-8254: LRUQueryCache could cause IndexReader to hang on close, when
3465  shared with another reader with no CacheHelper (Alan Woodward, Simon Willnauer,
3466  Adrien Grand)
3467
3468======================= Lucene 7.3.0 =======================
3469
3470API Changes
3471
3472* LUCENE-8051: LevensteinDistance renamed to LevenshteinDistance.
3473  (Pulak Ghosh via Adrien Grand)
3474
3475* LUCENE-8099: Deprecate CustomScoreQuery, BoostedQuery and BoostingQuery.
3476  Users should instead use FunctionScoreQuery, possibly combined with
3477  a lucene expression (Alan Woodward)
3478
3479* LUCENE-8104: Remove facets module compile-time dependency on queries
3480  (Alan Woodward)
3481
3482* LUCENE-8145: UnifiedHighlighter now uses a unitary OffsetsEnum rather
3483  than a list of enums (Alan Woodward, David Smiley, Jim Ferenczi, Timothy
3484  Rodriguez)
3485
3486New Features
3487
3488* LUCENE-2899: Add new module analysis/opennlp, with analysis components
3489  to perform tokenization, part-of-speech tagging, lemmatization and phrase
3490  chunking by invoking the corresponding OpenNLP tools. Named entity
3491  recognition is also provided as a Solr update request processor.
3492  (Lance Norskog, Grant Ingersoll, Joern Kottmann, Em, Kai Gülzau,
3493  Rene Nederhand, Robert Muir, Steven Bower, Steve Rowe)
3494
3495* LUCENE-8126: Add new spatial prefix tree (SPT) based on Google S2 geometry.
3496  It can only be used currently with Geo3D spatial context and it provides
3497  improvements on indexing time for non-points shapes and on query performance.
3498  (Ignacio Vera, David Smiley).
3499
3500Improvements
3501
3502* LUCENE-8081: Allow IndexWriter to opt out of flushing on indexing threads.
3503  Index/Update Threads try to help out flushing pending document buffers to
3504  disk. This change adds an expert setting to opt out of this behavior unless
3505  flushing is falling behind. (Simon Willnauer)
3506
3507* LUCENE-8086: spatial-extras Geo3dFactory: Use GeoExactCircle with
3508  configurable precision for non-spherical planet models.
3509  (Ignacio Vera via David Smiley)
3510
3511* LUCENE-8093: TrimFilterFactory implements MultiTermAwareComponent (Alan Woodward)
3512
3513* LUCENE-8094: TermInSetQuery.toString now returns "field:(A B C)" (Mike McCandless)
3514
3515* LUCENE-8121: UnifiedHighlighter passage relevancy is improved for terms that are
3516  position sensitive (e.g. part of a phrase) by having an accurate freq.
3517  (David Smiley)
3518
3519* LUCENE-8129: A Unicode set filter can now be specified when using ICUFoldingFilter.
3520  (Ere Maijala)
3521
3522* LUCENE-7966: Build Multi-Release JARs to enable usage of optimized intrinsic methods
3523  from Java 9 for index bounds checking and array comparison/mismatch. This change
3524  introduces Java 8 replacements for those Java 9 methods and patches the compiled
3525  classes to use the optimized variants through the MR-JAR mechanism.
3526  (Uwe Schindler, Robert Muir, Adrien Grand, Mike McCandless)
3527
3528* LUCENE-8127: Speed up rewriteNoScoring when there are no MUST clauses.
3529  (Michael Braun via Adrien Grand)
3530
3531* LUCENE-8152: Improve consumption of doc-value iterators. (Horatiu Lazu via
3532  Adrien Grand)
3533
3534* LUCENE-8033: FieldInfos now always use a dense encoding. (Mayya Sharipova
3535  via Adrien Grand)
3536
3537* LUCENE-8190: Specialized cell interface to allow any spatial prefix tree to
3538  benefit from the setting setPruneLeafyBranches on RecursivePrefixTreeStrategy.
3539  (Ignacio Vera)
3540
3541Bug Fixes
3542
3543* LUCENE-8077: Fixed bug in how CheckIndex verifies doc-value iterators.
3544  (Xiaoshan Sun via Adrien Grand)
3545
3546* SOLR-11758: Fixed FloatDocValues.boolVal to correctly return true for all values != 0.0F
3547  (Munendra S N via hossman)
3548
3549* LUCENE-8121: The UnifiedHighlighter would highlight some terms within some nested
3550  SpanNearQueries at positions where it should not have.  It's fixed in the UH by
3551  switching to the SpanCollector API.  The original Highlighter still has this
3552  problem (LUCENE-2287, LUCENE-5455, LUCENE-6796).  Some public but internal parts of
3553  the UH were refactored. (David Smiley, Steve Davids)
3554
3555* LUCENE-8120: Fix LatLonBoundingBox's toString() method (Martijn van Groningen, Adrien Grand)
3556
3557* LUCENE-8130: Fix NullPointerException from TermStates.toString() (Mike McCandless)
3558
3559* LUCENE-8124: Fixed HyphenationCompoundWordTokenFilter to correctly handle
3560  hyphenation patterns with indicator >= 7. (Holger Bruch via Adrien Grand)
3561
3562* LUCENE-8163: BaseDirectoryTestCase could produce random filenames that fail
3563  on Windows (Alan Woodward)
3564
3565* LUCENE-8174: Fixed {Float,Double,Int,Long}Range.toString(). (Oliver Kaleske
3566  via Adrien Grand)
3567
3568* LUCENE-8182: Fixed BoostingQuery to apply the context boost instead of the parent query
3569  boost (Jim Ferenczi)
3570
3571* LUCENE-8188: Fixed bugs in OpenNLPOpsFactory that were causing InputStreams fetched from the
3572  ResourceLoader to be leaked (hossman)
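
To make the SOLR-11758 entry concrete, the sketch below shows one way a float-to-boolean
conversion can go wrong. It is plain Java with hypothetical names, and it assumes (this
is not confirmed by the entry) that the faulty path truncated the float to an int before
testing it against zero.

```java
final class BoolValSketch {
  /** Truncating to int first loses fractional values: 0.5f becomes 0. */
  static boolean viaIntTruncation(float value) {
    return (int) value != 0;
  }

  /** Compares the float directly, so every value that is not equal to zero is true. */
  static boolean direct(float value) {
    return value != 0.0f;
  }
}
```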
3573
3574
3575Other
3576
3577* LUCENE-8111: IndexOrDocValuesQuery Javadoc references outdated method name.
3578  (Kai Chan via Adrien Grand)
3579
3580* LUCENE-8106: Add script (reproduceJenkinsFailures.py) to attempt to reproduce
3581  failing tests from a Jenkins log. (Steve Rowe)
3582
3583* LUCENE-8075: Removed unnecessary null check in IntersectTermsEnum.
3584  (Pulak Ghosh via Adrien Grand)
3585
3586* LUCENE-8156: Require users to not have ASM on the Ant classpath during build.
3587  This is required by LUCENE-7966. (Adrien Grand, Uwe Schindler)
3588
3589* LUCENE-8161: spatial-extras: the Spatial4j dependency has been updated from 0.6 to 0.7,
3590  which is drop-in compatible (Lucene doesn't expressly use any of the few API differences).
3591  Spatial4j 0.7 is compatible with JTS 1.15.0 and not any prior version.  JTS 1.15.0 is
3592  dual-licensed to include BSD; prior versions were LGPL.  (David Smiley)
3593
3594* LUCENE-8155: Add back support in smoke tester to run against later Java versions.
3595  (Uwe Schindler)
3596
3597* LUCENE-8169: Migrated build to use OpenClover 4.2.1 for checking code coverage.
3598  (Uwe Schindler)
3599
3600* LUCENE-8170: Improve OpenClover reports (separate test from production code);
3601  enable coverage reports inside test-frameworks.  (Uwe Schindler)
3602
3603Build
3604
3605* LUCENE-8168: Moved Groovy scripts in build files to separate files.
3606  Update Groovy to 2.4.13.  (Uwe Schindler)
3607
3608* LUCENE-8176: HttpReplicatorTest awaits more than a minute for stopping Jetty threads
3609  (Mikhail Khludnev)
3610
3611======================= Lucene 7.2.1 =======================
3612
3613Bug Fixes
3614
3615* LUCENE-8117: Fix advanceExact on SortedNumericDocValues produced by Lucene54DocValues. (Jim Ferenczi).
3616
3617======================= Lucene 7.2.0 =======================
3618
3619API Changes
3620
3621* LUCENE-8017, LUCENE-8042: Weight, DoubleValuesSource and related objects
3622  now implement a SegmentCacheable interface, with a single method
3623  isCacheable(LeafReaderContext) determining whether or not the object may
3624  be cached against a LeafReader. (Alan Woodward, Robert Muir)
3625
3626* LUCENE-8038: Payload factors for scoring in PayloadScoreQuery are now
3627  calculated by a PayloadDecoder, instead of delegating to the Similarity.
3628  (Alan Woodward)
3629
3630* LUCENE-8014: Similarity.computeSlopFactor() and
3631  Similarity.computePayloadFactor() have been deprecated. (Alan Woodward)
3632
3633* LUCENE-6278: Scorer.freq() has been removed (Alan Woodward)
3634
3635* LUCENE-7736: DoubleValuesSource and LongValuesSource now expose a
3636  rewrite(IndexSearcher) function. (Alan Woodward)
3637
3638* LUCENE-7998: DoubleValuesSource.fromQuery() allows you to use the scores
3639  from a Query as a DoubleValuesSource. (Alan Woodward)
3640
3641* LUCENE-8049: IndexWriter.getMergingSegments()'s return type was changed from
3642  Collection to Set to more accurately reflect its nature. (David Smiley)
3643
3644* LUCENE-8059: TopFieldCollector can now early terminate collection when
3645  the sort order is compatible with the index order. As a consequence,
3646  EarlyTerminatingSortingCollector is now deprecated. (Adrien Grand)
3647
3648New Features
3649
3650* LUCENE-8061: Add convenience factory methods to create BBoxes and XYZSolids
3651  directly from bounds objects.
3652
3653* LUCENE-7736: IndexReaderFunctions expose various IndexReader statistics as
3654  DoubleValuesSources. (Alan Woodward)
3655
3656* LUCENE-8068: Allow IndexWriter to write a single DWPT to disk. Adds a
3657  flushNextBuffer method to IndexWriter that allows the caller to
3658  synchronously move the next pending or the biggest non-pending index buffer to
3659  disk. This enables flushing a selected buffer to disk without hijacking an
3660  indexing thread. This is for instance useful if more than one IW (shards) must
3661  be maintained in a single JVM / system. (Simon Willnauer)
3662
3663Bug Fixes
3664
3665* LUCENE-8076: Normalize Vincenty distance calculation for planet models that aren't normalized.
3666  (Ignacio Vera)
3667
3668* LUCENE-8057: Exact circle bounds computation was incorrect.
3669  (Ignacio Vera)
3670
3671* LUCENE-8056: Exact circle segment bounding suffered from precision errors.
3672  (Karl Wright)
3673
3674* LUCENE-8054: Fix the exact circle case where relationships fail when the
3675  planet model has c <= ab, because the planes are constructed incorrectly.
3676  (Ignacio Vera)
3677
3678* LUCENE-7991: KNearestNeighborDocumentClassifier.knnSearch no longer applies
3679  a previous boosted field's factor to subsequent unboosted fields.
3680  (Christine Poerschke)
3681
3682* LUCENE-7999: Switch from int to long to track the name for the next
3683  segment to write, so that very long lived indices with very frequent
3684  refreshes or commits, and high indexing thread counts, do not
3685  overflow an int (Mykhailo Demianenko via Mike McCandless)
3686
3687* LUCENE-8025: Use sumTotalTermFreq=sumDocFreq when scoring DOCS_ONLY fields
3688  that omit term frequency information, as it is equivalent in that case.
3689  Previously bogus numbers were used, and many similarities would
3690  completely degrade. (Robert Muir, Adrien Grand)
3691
3692* LUCENE-8045: ParallelLeafReader did not correctly report FieldInfo.dvGen
3693  (Alan Woodward)
3694
3695* LUCENE-8034: Use subtraction instead of addition to sidestep int
3696  overflow in SpanNotQuery.  (Hari Menon via Mike McCandless)
3697
3698* LUCENE-8078: The query cache should not cache instances of
3699  MatchNoDocsQuery. (Jon Harper via Adrien Grand)
3700
3701* LUCENE-8048: Filesystems do not guarantee order of directory updates
3702  (Nikolay Martynov, Simon Willnauer, Erick Erickson)
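
The technique named in LUCENE-8034, using subtraction instead of addition to sidestep
int overflow, can be sketched in plain Java as follows. The names are hypothetical and
this is not the SpanNotQuery code; it assumes non-negative inputs, as positions are.

```java
final class OverflowSafeCompareSketch {
  /** Overflow-prone: position + slack can wrap around to a negative number. */
  static boolean reachesWithAddition(int position, int slack, int target) {
    return position + slack >= target;
  }

  /** Same test for non-negative inputs, but target - position cannot overflow. */
  static boolean reachesWithSubtraction(int position, int slack, int target) {
    return slack >= target - position;
  }
}
```

The two forms agree for small values; they diverge when slack is close to
Integer.MAX_VALUE, where only the subtraction form gives the intended answer.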
3703
3704Optimizations
3705
3706* LUCENE-8018: Smaller FieldInfos memory footprint by not retaining unnecessary
3707  references to TreeMap entries. (Julian Vassev via Adrien Grand)
3708
3709* LUCENE-7994: Use int/int scatter map to gather facet counts when the
3710  number of hits is small relative to the number of unique facet labels
3711  (Dawid Weiss, Robert Muir, Mike McCandless)
3712
3713* LUCENE-8062: GlobalOrdinalsQuery is no longer eligible for caching. (Jim Ferenczi)
3714
3715* LUCENE-8058: Large instances of TermInSetQuery are no longer eligible for
3716  caching as they could break memory accounting of the query cache.
3717  (Adrien Grand)
3718
3719* LUCENE-8055: MemoryIndex.MemoryDocValuesIterator returns 2 documents
3720  instead of 1. (Simon Willnauer)
3721
3722* LUCENE-8043: Fix document accounting in IndexWriter to prevent writing too many
3723  documents. Once this happens, Lucene refuses to open the index and throws a
3724  CorruptIndexException. (Simon Willnauer, Yonik Seeley, Mike McCandless)
3725
3726Tests
3727
3728* LUCENE-8035: Run tests with JDK-specific options: --illegal-access=deny
3729  on Java 9+.  (Uwe Schindler)
3730
3731Build
3732
3733* LUCENE-6144: Upgrade Ivy to 2.4.0; 'ant ivy-bootstrap' now removes old Ivy
3734  jars in ~/.ant/lib/.  (Shawn Heisey, Steve Rowe)
3735
3736
3737======================= Lucene 7.1.0 =======================
3738
3739Changes in Runtime Behavior
3740
3741* Resolving of external entities in queryparser/xml/CoreParser is disallowed
3742  by default. See SOLR-11477 for details.
3743
3744New Features
3745
3746* LUCENE-7970: Add a shape to Geo3D that consists of multiple planes that
3747  approximate a true circle, rather than an ellipse, for non-spherical planet models.
3748  (Karl Wright, Ignacio Vera)
3749
3750* LUCENE-7955: Add support for the concept of "nearest distance" to Geo3D's
3751  GeoPath abstraction, which is the distance along the path to the point that is
3752  closest to the provided point. (Karl Wright)
3753
3754* LUCENE-7906: Add spatial relationships between all currently-defined Geo shapes.
3755  (Ignacio Vera)
3756
3757* LUCENE-7955: Add support for zero-width paths. (Karl Wright)
3758
3759* LUCENE-7936: Add serialization and deserialization support to Geo3D. (Karl Wright,
3760  Ignacio Vera)
3761
3762* LUCENE-7942: Distance computations now have the ability to accurately aggregate
3763  distances, rather than just doing sums. (Karl Wright)
3764
3765* LUCENE-7934: Add a planet model interface. (Karl Wright)
3766
3767* LUCENE-7918: Revamp the API for composites so that it's generic and can be used
3768  for many kinds of shapes. (Ignacio Vera)
3769
3770* LUCENE-7621: Add CoveringQuery, a query whose required number of matching
3771  clauses can be defined per document. (Adrien Grand)
3772
3773* LUCENE-7927: Add LongValueFacetCounts, to compute facet counts for individual
3774  numeric values (Mike McCandless)
3775
3776* LUCENE-7940: Add BengaliAnalyzer. (Md. Abdulla-Al-Sun via Robert Muir)
3777
3778* LUCENE-7392: Add point based LatLonBoundingBox as new RangeField Type.
3779  (Nick Knize)
3780
3781* LUCENE-7951: Spatial-extras has much better Geo3d support by implementing Spatial4j
3782  abstractions: SpatialContextFactory, ShapeFactory, BinaryCodec, DistanceCalculator.
3783  (Ignacio Vera, David Smiley)
3784
3785* LUCENE-7973: Update dictionary version for Ukrainian analyzer to 3.9.0 (Andriy
3786  Rysin via Dawid Weiss)
3787
3788* LUCENE-7974: Add FloatPointNearestNeighbor, an N-dimensional FloatPoint
3789  K-nearest-neighbor search implementation.  (Steve Rowe)
3790
3791* LUCENE-7975: Change the default taxonomy facets cache to a faster
3792  byte[] (UTF-8) based cache.  (Mike McCandless)
3793
3794* LUCENE-7972: DirectoryTaxonomyReader, in Lucene's facet module, now
3795  implements Accountable, so you can more easily track how much heap
3796  it's using.  (Mike McCandless)
3797
3798* LUCENE-7982: A new NormsFieldExistsQuery matches documents that have
3799  norms in a specified field (Colin Goodheart-Smithe via Mike McCandless)
3800
3801Optimizations
3802
3803* LUCENE-7905: Optimize how OrdinalMap (used by
3804  SortedSetDocValuesFacetCounts and others) builds its map (Robert
3805  Muir, Adrien Grand, Mike McCandless)
3806
3807* LUCENE-7655: Speed up geo-distance queries in case of dense single-valued
3808  fields when most documents match. (Maciej Zasada via Adrien Grand)
3809
3810* LUCENE-7897: IndexOrDocValuesQuery now requires the range cost to be more
3811  than 8x greater than the cost of the lead iterator in order to use doc values.
3812  (Murali Krishna P via Adrien Grand)
3813
3814* LUCENE-7925: Collapse duplicate SHOULD or MUST clauses by summing up their
3815  boosts. (Adrien Grand)
3816
3817* LUCENE-7939: MinShouldMatchSumScorer now leverages two-phase iteration in
3818  order to be faster when used in conjunctions. (Adrien Grand)
3819
3820* LUCENE-7827: AnalyzingInfixSuggester doesn't create "textgrams"
3821  when minPrefixChar=0 (Mikhail Khludnev)
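
The idea behind LUCENE-7925 can be sketched without the Lucene API: if the same clause
appears several times, keep one copy and add up the boosts. In the plain-Java sketch
below, strings stand in for queries and all names are hypothetical; it is not the
BooleanQuery rewrite code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CollapseClausesSketch {
  /** Sums the boosts of clauses with the same key, keeping first-seen order. */
  static Map<String, Float> collapse(String[] clauses, float[] boosts) {
    Map<String, Float> merged = new LinkedHashMap<>();
    for (int i = 0; i < clauses.length; i++) {
      // merge() inserts the boost, or adds it to the boost already stored for this clause.
      merged.merge(clauses[i], boosts[i], Float::sum);
    }
    return merged;
  }
}
```

Collapsing the clauses a, b, a with boosts 1, 2, 3 leaves a with boost 4 and b with
boost 2.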
3822
3823Bug Fixes
3824
3825* LUCENE-8066: It was still possible to construct a concave GeoExactCircle, so use
3826   a sector approach to prevent that. (Ignacio Vera)
3827
3828* LUCENE-7967: The GeoDegeneratePoint isWithin() method needed allowance for
3829   numerical precision. (Karl Wright)
3830
3831* LUCENE-7965: GeoBBoxFactory was constructing the wrong shape at the poles
3832  if the longitude span was greater than 180 degrees. (Karl Wright)
3833
3834* LUCENE-7916: Prevent ArrayIndexOutOfBoundsException if ICUTokenizer is used
3835  with a different ICU JAR version than it is compiled against. Note, this is
3836  not recommended, lucene-analyzers-icu contains binary data structures
3837  specific to ICU/Unicode versions it is built against. (Chris Koenig, Robert Muir)
3838
3839* LUCENE-7891: Lucene's taxonomy facets now use a non-buggy LRU cache
3840  by default.  (Jan-Willem van den Broek via Mike McCandless)
3841
3842* LUCENE-7959: Improve NativeFSLockFactory's exception message if it cannot create
3843  write.lock for an empty index due to bad permissions/read-only filesystem/etc.
3844  (Erick Erickson, Shawn Heisey, Robert Muir)
3845
3846* LUCENE-7968: AnalyzingSuggester would sometimes order suggestions incorrectly,
3847  it did not properly break ties on the surface forms when both the weights and
3848  the analyzed forms were equal. (Robert Muir)
3849
3850* LUCENE-7957: ConjunctionScorer.getChildren was failing to return all
3851  child scorers (Adrien Grand, Mike McCandless)
3852
3853* SOLR-11477: Disallow resolving of external entities in queryparser/xml/CoreParser
3854  by default. (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
3855
3856Build
3857
3858* SOLR-11181: Switch order of maven artifact publishing procedure: deploy first
3859  instead of locally installing first, to work around a double repository push of
3860  *-sources.jar and *-javadoc.jar files.  (Lynn Monson via Steve Rowe)
3861
3862* LUCENE-6673: Maven build fails for target javadoc:jar.
3863  (Ramkumar Aiyengar, Daniel Collins via Steve Rowe)
3864
3865* LUCENE-7985: Upgrade forbiddenapis to 2.4.1.  (Uwe Schindler)
3866
3867Other
3868
3869* LUCENE-7948, LUCENE-7937: Upgrade randomizedtesting to 2.5.3 (minor fixes
3870  in test filtering for IDEs). (Mike Sokolov, Dawid Weiss)
3871
3872* LUCENE-7933: LongBitSet now validates the numBits parameter (Won
3873  Jonghoon, Mike McCandless)
3874
3875* LUCENE-7978: Add some more documentation about setting up build
3876  environment.  (Anton R. Yuste via Uwe Schindler)
3877
3878* LUCENE-7983: IndexWriter.IndexReaderWarmer is now a functional interface
3879  instead of an abstract class with a single method (Dawid Weiss)
3880
3881* LUCENE-5753: Update TLDs recognized by UAX29URLEmailTokenizer. (Steve Rowe)
3882
3883
3884======================= Lucene 7.0.1 =======================
3885
3886Bug Fixes
3887
3888* LUCENE-7957: ConjunctionScorer.getChildren was failing to return all
3889  child scorers (Adrien Grand, Mike McCandless)
3890
3891======================= Lucene 7.0.0 =======================
3892
3893New Features
3894
3895* LUCENE-7703: SegmentInfos now record the major Lucene version at index
3896  creation time. (Adrien Grand)
3897
3898* LUCENE-7756: LeafReader.getMetaData now exposes the index created version as
3899  well as the oldest Lucene version that contributed to the segment.
3900  (Adrien Grand)
3901
3902* LUCENE-7854: The new TermFrequencyAttribute used during analysis
3903  with a custom token stream allows indexing custom term frequencies
3904  (Mike McCandless)
3905
3906* LUCENE-7866: Add a new DelimitedTermFrequencyTokenFilter that allows
3907  marking tokens with a custom term frequency (LUCENE-7854). It parses a numeric
3908  value after a separator char ('|') at the end of each token and changes
3909  the term frequency to this value.  (Uwe Schindler, Robert Muir, Mike
3910  McCandless)
3911
3912* LUCENE-7868: Multiple threads can now resolve deletes and doc values
3913  updates concurrently, giving sizable speedups in update-heavy
3914  indexing use cases (Simon Willnauer, Mike McCandless)
3915
3916* LUCENE-7823: Pure query based naive bayes classifier using BM25 scores (Tommaso Teofili)
3917
3918* LUCENE-7838: Knn classifier based on fuzzified term queries (Tommaso Teofili)
3919
3920* LUCENE-7855: Added advanced options of the Wikipedia tokenizer to its factory.
3921  (Juan Pedro via Adrien Grand)
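
The token format handled by the DelimitedTermFrequencyTokenFilter (LUCENE-7866) can be
sketched in plain Java as below. This is not the filter's implementation: the class
name is hypothetical, and splitting at the last delimiter, defaulting to a frequency of
1, and rejecting values below 1 are assumptions made for the illustration.

```java
final class DelimitedTermFreqSketch {
  final String term;
  final int freq;

  DelimitedTermFreqSketch(String term, int freq) {
    this.term = term;
    this.freq = freq;
  }

  /** Splits "term|freq" at the last delimiter; tokens without one get freq 1. */
  static DelimitedTermFreqSketch parse(String token, char delimiter) {
    int idx = token.lastIndexOf(delimiter);
    if (idx < 0) {
      return new DelimitedTermFreqSketch(token, 1);
    }
    int freq = Integer.parseInt(token.substring(idx + 1));
    if (freq < 1) {
      throw new IllegalArgumentException("term frequency must be >= 1: " + token);
    }
    return new DelimitedTermFreqSketch(token.substring(0, idx), freq);
  }
}
```

Parsing "foo|5" with '|' as the delimiter gives the term "foo" with a frequency of 5.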
3922
3923API Changes
3924
3925* LUCENE-2605: Classic QueryParser no longer splits on whitespace by default.
3926  Use setSplitOnWhitespace(true) to get the old behavior.  (Steve Rowe)
3927
3928* LUCENE-7369: Similarity.coord and BooleanQuery.disableCoord are removed.
3929  (Adrien Grand)
3930
3931* LUCENE-7368: Removed query normalization. (Adrien Grand)
3932
3933* LUCENE-7355: AnalyzingQueryParser has been removed as its functionality has
3934  been folded into the classic QueryParser. (Adrien Grand)
3935
3936* LUCENE-7407: Doc values APIs have been switched from random access
3937  to iterators, enabling future codec compression improvements. (Mike
3938  McCandless)
3939
3940* LUCENE-7475: Norms now support sparsity, so that you only pay for what is
3941  actually used. (Adrien Grand)
3942
3943* LUCENE-7494: Points now have a per-field API, like doc values. (Adrien Grand)
3944
3945* LUCENE-7410: Cache keys and close listeners have been refactored in order
3946  to be less trappy. See IndexReader.getReaderCacheHelper and
3947  LeafReader.getCoreCacheHelper. (Adrien Grand)
3948
3949* LUCENE-6819: Index-time boosts are not supported anymore. As a replacement,
3950  index-time scoring factors should be indexed into a doc value field and
3951  combined at query time using e.g. FunctionScoreQuery. (Adrien Grand)
3952
3953* LUCENE-7734: FieldType's copy constructor was widened to accept any IndexableFieldType.
3954  (David Smiley)
3955
3956* LUCENE-7701: Grouping collectors have been refactored, such that groups are
3957  now defined by a GroupSelector implementation. (Alan Woodward)
3958
3959* LUCENE-7741: DoubleValuesSource now has an explain() method (Alan Woodward,
3960  Adrien Grand)
3961
3962* LUCENE-7815: Removed the PostingsHighlighter; you should use the UnifiedHighlighter
3963  instead, which derived from the PostingsHighlighter.  WholeBreakIterator and
3964  CustomSeparatorBreakIterator were moved to UH's package. (David Smiley)
3965
3966* LUCENE-7850: Removed support for legacy numerics. (Adrien Grand)
3967
3968* LUCENE-7500: Removed abstract LeafReader.fields(); instead terms(fieldName)
3969  has been made abstract, formerly was final.  Also, MultiFields.getTerms
3970  was optimized to work directly instead of being implemented on getFields.
3971  (David Smiley)
3972
3973* LUCENE-7872: TopDocs.totalHits is now a long. (Adrien Grand, hossman)
3974
3975* LUCENE-7868: IndexWriterConfig.setMaxBufferedDeleteTerms is
3976  removed. (Simon Willnauer, Mike McCandless)
3977
3978* LUCENE-7877: PrefixAwareTokenStream is replaced with ConcatenatingTokenStream
3979  (Alan Woodward, Uwe Schindler, Adrien Grand)
3980
3981* LUCENE-7867: The deprecated Token class is now only available in the test
3982  framework (Alan Woodward, Adrien Grand)
3983
3984* LUCENE-7723: DoubleValuesSource enforces implementation of equals() and
3985  hashCode() (Alan Woodward)
3986
3987* LUCENE-7737: The spatial-extras module no longer has a dependency on the
3988  queries module.  All uses of ValueSource are either replaced with core
3989  DoubleValuesSource extensions, or with the new ShapeValuesSource and
3990  ShapeValuesPredicate classes (Alan Woodward, David Smiley)
3991
3992* LUCENE-7892: Doc-values query factory methods have been renamed so that their
3993  name contains "slow" in order to clearly indicate that they would usually be a
3994  bad choice. (Adrien Grand)
3995
3996* LUCENE-7899: FieldValueQuery is renamed to DocValuesFieldExistsQuery
3997  (Adrien Grand, Mike McCandless)
3998
3999Bug Fixes
4000
4001* LUCENE-7626: IndexWriter will no longer accept broken token offsets
4002  (Mike McCandless)
4003
4004* LUCENE-7859: Spatial-extras PackedQuadPrefixTree bug that only revealed itself
4005  with the new pointsOnly optimizations in LUCENE-7845. (David Smiley)
4006
4007* LUCENE-7871: fix false positive match in BlockJoinSelector when children have no value, introducing
4008  wrap methods accepting children as DISI. Extracting ToParentDocValues (Mikhail Khludnev)
4009
4010* LUCENE-7914: Add a maximum recursion level in automaton recursive
4011  functions (Operations.isFinite and Operations.topsortState) to prevent
4012  large automata from overflowing the stack (Robert Muir, Adrien Grand, Jim Ferenczi)
4013
4014* LUCENE-7864: IndexMergeTool is not using intermediate hard links (even
4015  if possible). (Dawid Weiss)
4016
4017* LUCENE-7956: Fixed potential stack overflow error in ICUNormalizer2CharFilter.
4018  (Adrien Grand)
4019
4020* LUCENE-7963: Remove useless getAttribute() in DefaultIndexingChain that
4021  causes performance drop, introduced by LUCENE-7626.  (Daniel Mitterdorfer
4022  via Uwe Schindler)
4023
4024Improvements
4025
4026* LUCENE-7489: Better storage of sparse doc-values fields with the default
4027  codec. (Adrien Grand)
4028
4029* LUCENE-7730: More accurate encoding of the length normalization factor
4030  thanks to the removal of index-time boosts. (Adrien Grand)
4031
4032* LUCENE-7901: Original Highlighter now eagerly throws an exception if you
4033  provide components that are null. (Jason Gerlowski, David Smiley)
4034
4035* LUCENE-7841: Normalize ґ to г in Ukrainian analyzer. (Andriy Rysin via Dawid Weiss)
4036
4037Optimizations
4038
4039* LUCENE-7416: BooleanQuery optimizes queries that have clauses that occur both
4040  in the sets of SHOULD and FILTER clauses, or both in MUST/FILTER and MUST_NOT
4041  clauses. (Spyros Kapnissis via Adrien Grand, Uwe Schindler)
4042
4043* LUCENE-7506: FastTaxonomyFacetCounts should use CPU in proportion to
4044  the size of the intersected set of hits from the query and documents
4045  that have a facet value, so sparse faceting works as expected
4046  (Adrien Grand via Mike McCandless)
4047
4048* LUCENE-7519: Add optimized APIs to compute browse-only top level
4049  facets (Mike McCandless)
4050
4051* LUCENE-7589: Numeric doc values now have the ability to encode blocks of
4052  values using different numbers of bits per value if this proves to save
4053  storage. (Adrien Grand)
4054
4055* LUCENE-7845: Enhance spatial-extras RecursivePrefixTreeStrategy queries when the
4056  query is a point (for 2D) or is a simple date interval (e.g. 1 month).  When
4057  the strategy is marked as pointsOnly, the result is a TermQuery. (David Smiley)
4058
4059* LUCENE-7874: DisjunctionMaxQuery rewrites to a BooleanQuery when tiebreaker is set to 1. (Jim Ferenczi)
4060
4061* LUCENE-7828: Speed up range queries on range fields by improving how we
4062  compute the relation between the query and inner nodes of the BKD tree.
4063  (Adrien Grand)
4064
4065Other
4066
4067* LUCENE-7923: Removed FST.Arc.node field (unused). (Dawid Weiss)
4068
4069* LUCENE-7328: Remove LegacyNumericEncoding from GeoPointField. (Nick Knize)
4070
4071* LUCENE-7360: Remove Explanation.toHtml() (Alan Woodward)
4072
4073* LUCENE-7681: MemoryIndex uses new DocValues API (Alan Woodward)
4074
4075* LUCENE-7753: Make fields static when possible.
4076  (Daniel Jelinski via Adrien Grand)
4077
4078* LUCENE-7540: Upgrade ICU to 59.1 (Mike McCandless, Jim Ferenczi)
4079
4080* LUCENE-7852: Correct copyright year(s) in lucene/LICENSE.txt file.
4081  (Christine Poerschke, Steve Rowe)
4082
4083* LUCENE-7719: Generalized the UnifiedHighlighter's support for AutomatonQuery
4084  for character & binary automata. Added AutomatonQuery.isBinary. (David Smiley)
4085
4086* LUCENE-7873: Due to serious problems with context class loaders in several
4087  frameworks (OSGI, Java 9 Jigsaw), the lookup of Codecs, PostingsFormats,
4088  DocValuesFormats and all analysis factories was changed to only inspect the
4089  current classloader that defined the interface class (lucene-core.jar).
4090  See MIGRATE.txt for more information!  (Uwe Schindler, Dawid Weiss)
4091
4092* LUCENE-7883: Lucene no longer uses the context class loader when resolving
4093  resources in CustomAnalyzer or ClassPathResourceLoader. Resources are only
4094  resolved against Lucene's class loader by default. Please use another builder
4095  method to change to a custom classloader.  (Uwe Schindler)
4096
4097* LUCENE-5822: Convert README to Markdown (Jason Gerlowski via Mike Drob)
4098
4099* LUCENE-7773: Remove unused/deprecated token types from StandardTokenizer.
4100  (Ahmet Arslan via Steve Rowe)
4101
4102* LUCENE-7800: Remove code that potentially rethrows checked exceptions
4103  from methods that don't declare them ("sneaky throw" hack). (Robert Muir,
4104  Uwe Schindler, Dawid Weiss)
4105
4106* LUCENE-7876: Avoid calls to LeafReader.fields() and MultiFields.getFields()
4107  that are trivially replaced by LeafReader.terms() and MultiFields.getTerms()
4108  (David Smiley)
4109
4110======================= Lucene 6.6.5 =======================
4111(No Changes)
4112
4113======================= Lucene 6.6.4 =======================
4114(No Changes)
4115
4116======================= Lucene 6.6.3 =======================
4117
4118Build
4119
4120* LUCENE-6144: Upgrade Ivy to 2.4.0; 'ant ivy-bootstrap' now removes old Ivy
4121  jars in ~/.ant/lib/.  (Shawn Heisey, Steve Rowe)
4122
4123======================= Lucene 6.6.2 =======================
4124
4125Changes in Runtime Behavior
4126
4127* Resolving of external entities in queryparser/xml/CoreParser is disallowed
4128  by default. See SOLR-11477 for details.
4129
4130Bug Fixes
4131
4132* SOLR-11477: Disallow resolving of external entities in queryparser/xml/CoreParser
4133  by default. (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
4134
4135======================= Lucene 6.6.1 =======================
4136
4137Bug Fixes
4138
4139* LUCENE-7869: Changed MemoryIndex to sort 1d points. In case of 1d points, the PointInSetQuery.MergePointVisitor expects
4140  that these points are visited in ascending order. The memory index didn't do this, which could cause documents
4141  with multiple points that should match to not match. (Martijn van Groningen)
4142
4143* LUCENE-7878: Fix query builder to keep the SHOULD clause that wraps multi-word synonyms. (Jim Ferenczi)
4144
4145======================= Lucene 6.6.0 =======================
4146
4147New Features
4148
4149* LUCENE-7811: Add a concurrent SortedSet facets implementation.
4150  (Mike McCandless)
4151
4152Bug Fixes
4153
4154* LUCENE-7777: ByteBlockPool.readBytes sometimes throws
4155  ArrayIndexOutOfBoundsException when byte blocks larger than 32 KB
4156  were added (Mike McCandless)
4157
4158* LUCENE-7797: The static FSDirectory.listAll(Path) method was always
4159  returning an empty array.  (Atkins Chang via Mike McCandless)
4160
4161* LUCENE-7481: Fixed missing rewrite methods for SpanPayloadCheckQuery
4162  and PayloadScoreQuery. (Erik Hatcher)
4163
4164* LUCENE-7808: Fixed PayloadScoreQuery and SpanPayloadCheckQuery
4165  .equals and .hashCode methods.  (Erik Hatcher)
4166
4167* LUCENE-7798: Add .equals and .hashCode to ToParentBlockJoinSortField
4168  (Mikhail Khludnev)
4169
4170* LUCENE-7814: DateRangePrefixTree (in spatial-extras) had edge-case bugs for
4171  years >= 292,000,000. (David Smiley)
4172
4173* LUCENE-5365, LUCENE-7818: Fix incorrect condition in queryparser's
4174  QueryNodeOperation#logicalAnd().  (Olivier Binda, Amrit Sarkar,
4175  AppChecker via Uwe Schindler)
4176
4177* LUCENE-7821: The classic and flexible query parsers, as well as Solr's
4178  "lucene"/standard query parser, should require " TO " in range queries,
4179  and accept "TO" as endpoints in range queries. (hossman, Steve Rowe)
4180
4181* LUCENE-7824: Fix graph query analysis for multi-word synonym rules with common terms (e.g. new york, new york city).
4182  (Jim Ferenczi)
4183
4184* LUCENE-7817: Pass cached query to onQueryCache instead of null.
4185  (Christoph Kaser via Adrien Grand)
4186
4187* LUCENE-7831: CodecUtil should not seek to negative offsets. (Adrien Grand)
4188
4189* LUCENE-7833: ToParentBlockJoinQuery computed the min score instead of the max
4190  score with ScoreMode.MAX. (Adrien Grand)
4191
4192* LUCENE-7847: Fixed all-docs-match optimization of range queries on range
4193  fields. (Adrien Grand)
4194
4195* LUCENE-7810: Fix equals() and hashCode() methods of several join queries.
4196  (Hossman, Adrien Grand, Martijn van Groningen)
4197
4198Improvements
4199
4200* LUCENE-7782: OfflineSorter now passes the total number of items it
4201  will write to getWriter (Mike McCandless)
4202
4203* LUCENE-7785: Move dictionary for Ukrainian analyzer to external dependency.
4204  (Andriy Rysin via Steve Rowe, Dawid Weiss)
4205
4206* LUCENE-7801: SortedSetDocValuesReaderState now implements
4207  Accountable so you can see how much RAM it's using (Robert Muir,
4208  Mike McCandless)
4209
4210* LUCENE-7792: OfflineSorter can now run concurrently if you pass it
4211  an optional ExecutorService (Dawid Weiss, Mike McCandless)
4212
4213* LUCENE-7811: Sorted set facets now use sparse storage when
4214  collecting hits, when appropriate.  (Mike McCandless)
4215
4216Optimizations
4217
4218* LUCENE-7787: spatial-extras HeatmapFacetCounter will now short-circuit its
4219  work when Bits.MatchNoBits is passed. (David Smiley)
4220
4221Other
4222
4223* LUCENE-7796: Make IOUtils.reThrow idiom declare Error return type so
4224  callers may use it in a way that compiler knows subsequent code is
4225  unreachable. reThrow is now deprecated in favor of IOUtils.rethrowAlways
4226  with slightly different semantics (see javadoc). (Hossman, Robert Muir,
4227  Dawid Weiss)
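
  The idiom from LUCENE-7796 can be sketched without Lucene on the classpath.
  The helper below is a hypothetical stand-in written for illustration, not
  the IOUtils source; the class and method bodies are assumptions. It shows
  why declaring an Error return type helps: the call site can write "throw",
  and the compiler then treats the code after it as unreachable.

```java
import java.io.IOException;

final class RethrowSketch {
  // Hypothetical helper. It never returns normally; the Error return type only
  // exists so that call sites can write "throw rethrowAlways(t);".
  static Error rethrowAlways(Throwable t) throws IOException {
    if (t == null) {
      throw new IllegalArgumentException("rethrow argument must not be null");
    }
    if (t instanceof IOException) {
      throw (IOException) t;
    }
    if (t instanceof RuntimeException) {
      throw (RuntimeException) t;
    }
    if (t instanceof Error) {
      throw (Error) t;
    }
    throw new RuntimeException(t);
  }

  static int parse(String s) throws IOException {
    try {
      return Integer.parseInt(s);
    } catch (Throwable t) {
      // Because of the leading "throw", the compiler knows this branch cannot
      // complete normally, so no dummy return statement is needed after it.
      throw rethrowAlways(t);
    }
  }

  // Small driver that reports what parse() did, for demonstration only.
  static String describe(String s) {
    try {
      return "ok:" + parse(s);
    } catch (NumberFormatException e) {
      return "NumberFormatException";
    } catch (IOException e) {
      return "IOException";
    }
  }
}
```

  Without the "throw" keyword at the call site, parse() would need an
  artificial return statement after the helper call in order to compile.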
4228
4229* LUCENE-7754: Inner classes should be static whenever possible.
4230  (Daniel Jelinski via Adrien Grand)
4231
4232* LUCENE-7751: Avoid boxing primitives only to call compareTo.
4233  (Daniel Jelinski via Adrien Grand)
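
  As a rough illustration of the pattern LUCENE-7751 removes (the class and
  method names here are made up for the example), both methods below return
  the same ordering, but only the first one creates wrapper objects to get it:

```java
final class CompareSketch {
  // Boxes the first operand (and auto-boxes the second) only to compare.
  static int boxedCompare(long a, long b) {
    return Long.valueOf(a).compareTo(b);
  }

  // Same ordering, no wrapper objects involved.
  static int primitiveCompare(long a, long b) {
    return Long.compare(a, b);
  }
}
```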
4234
4235* LUCENE-7743: Never call new String(String).
4236  (Daniel Jelinski via Adrien Grand)
4237
4238* LUCENE-7761: Fixed comment in ReqExclScorer.
4239  (Pablo Pita Leira via Adrien Grand)
4240
4241======================= Lucene 6.5.1 =======================
4242
4243Bug Fixes
4244
4245* LUCENE-7755: Fixed join queries to not reference IndexReaders, as it could
4246  cause leaks if they are cached. (Adrien Grand)
4247
4248* LUCENE-7749: Made LRUQueryCache delegate the scoreSupplier method.
4249  (Martin Amirault via Adrien Grand)
4250
4251* LUCENE-7769: The UnifiedHighlighter wasn't highlighting portions of the query
4252  wrapped in BoostQuery or SpanBoostQuery. (David Smiley, Dmitry Malinin)
4253
4254Other
4255
4256* LUCENE-7763: Remove outdated comment in IndexWriterConfig.setIndexSort javadocs.
4257  (马可阳 via Christine Poerschke)
4258
4259======================= Lucene 6.5.0 =======================
4260
4261API Changes
4262
4263* LUCENE-7740: Refactor Range Fields to remove Field suffix (e.g., DoubleRange),
4264  move InetAddressRange and InetAddressPoint from sandbox to misc module, and
4265  refactor all other range fields from sandbox to core. (Nick Knize)
4266
4267* LUCENE-7624: TermsQuery has been renamed as TermInSetQuery and moved to core.
4268  (Alan Woodward)
4269
4270* LUCENE-7637: TermInSetQuery requires that all terms come from the same field.
4271  (Adrien Grand)
4272
4273* LUCENE-7644: FieldComparatorSource.newComparator() and
4274  SortField.getComparator() no longer throw IOException (Alan Woodward)
4275
4276* LUCENE-7643: Replaced doc-values queries in lucene/sandbox with factory
4277  methods on the *DocValuesField classes. (Adrien Grand)
4278
4279* LUCENE-7659: Added an IndexWriter#getFieldNames() method (experimental) to return
4280  all field names as visible from the IndexWriter. This would be useful for
4281  IndexWriter#updateDocValues() calls, to prevent calling with non-existent
4282  docValues fields (Ishan Chattopadhyaya, Adrien Grand, Mike McCandless)
4283
4284* LUCENE-6959: Removed ToParentBlockJoinCollector in favour of
4285  ParentChildrenBlockJoinQuery, that can return the matching children documents per
4286  parent document. This query should be executed for each matching parent document
4287  after the main query has been executed. (Adrien Grand, Martijn van Groningen,
4288  Mike McCandless)
4289
4290* LUCENE-7628: Scorer.getChildren() now only returns Scorers that are
4291  positioned on the current document, and can throw an IOException.
4292  AssertingScorer checks that getChildren() is not called on an unpositioned
4293  Scorer.  (Alan Woodward, Adrien Grand)
4294
4295* LUCENE-7702: Removed GraphQuery in favour of simple boolean query. (Matt Weber via Jim Ferenczi)
4296
4297* LUCENE-7707: TopDocs.merge now takes a boolean option telling it
4298  when to use the incoming shard index versus when to assign the shard
4299  index itself, allowing users to merge shard responses incrementally
4300  instead of once all shard responses are present. (Simon Willnauer,
4301  Mike McCandless)
4302
4303* LUCENE-7700: A cleanup of merge throughput control logic. Refactored all the
4304  code previously scattered throughout the IndexWriter and
4305  ConcurrentMergeScheduler into a more accessible set of public methods (see
4306  MergePolicy.OneMergeProgress, MergeScheduler.wrapForMerge and
4307  OneMerge.mergeInit). (Dawid Weiss, Mike McCandless).
4308
4309* LUCENE-7734: FieldType's copy constructor was widened to accept any IndexableFieldType.
4310  (David Smiley)
4311
4312New Features
4313
4314* LUCENE-7738: Add new InetAddressRange for indexing and querying InetAddress
4315  ranges. (Nick Knize)
4316
4317* LUCENE-7449: Add CROSSES relation support to RangeFieldQuery. (Nick Knize)
4318
4319* LUCENE-7623: Add FunctionScoreQuery and FunctionMatchQuery (Alan Woodward,
4320  Adrien Grand, David Smiley)
4321
4322* LUCENE-7619: Add WordDelimiterGraphFilter, just like
4323  WordDelimiterFilter except it produces correct token graphs so that
4324  proximity queries at search time will produce correct results (Mike
4325  McCandless)
4326
4327* LUCENE-7656: Added the LatLonDocValuesField.new(Box/Distance)Query() factory
4328  methods that are the equivalent of factory methods on LatLonPoint but operate
4329  on doc values. These new methods should be wrapped in an IndexOrDocValuesQuery
4330  for best performance. (Adrien Grand)
4331
4332* LUCENE-7673: Added MultiValued[Int/Long/Float/Double]FieldSource that given a
4333  SortedNumericSelector.Type can give a ValueSource view of a
4334  SortedNumericDocValues field. (Tomás Fernández Löbbe)
4335
4336* LUCENE-7465: Add SimplePatternTokenizer and
4337  SimplePatternSplitTokenizer, using Lucene's regexp/automaton
4338  implementation for analysis/tokenization (Clinton Gormley, Mike
4339  McCandless)
4340
4341* LUCENE-7688: Add OneMergeWrappingMergePolicy class.
4342  (Keith Laban, Christine Poerschke)
4343
4344* LUCENE-7686: The near-real-time document suggester can now
4345  efficiently filter out duplicate suggestions (Uwe Schindler, Mike
4346  McCandless)
4347
4348* LUCENE-7712: SimpleQueryParser now supports default fuzziness
4349  syntax, mapping foo~ to a FuzzyQuery with edit distance 2.  (Lee
4350  Hinman, David Pilato via Mike McCandless)
4351
4352Bug Fixes
4353
4354* LUCENE-7630: Fix (Edge)NGramTokenFilter to no longer drop payloads
4355  and preserve all attributes. (Nathan Gass via Uwe Schindler)
4356
4357* LUCENE-7679: MemoryIndex was ignoring omitNorms settings on passed-in
4358  IndexableFields. (Alan Woodward)
4359
4360* LUCENE-7692: PatternReplaceCharFilterFactory now implements MultiTermAware.
4361  (Adrien Grand)
4362
4363* LUCENE-7685: ToParentBlockJoinQuery and ToChildBlockJoinQuery now use the
4364  rewritten child query in their equals and hashCode implementations.
4365  (Adrien Grand)
4366
4367* LUCENE-7698: CommonGramsQueryFilter was producing a disconnected
4368  token graph, messing up phrase queries when it was used during query
4369  parsing (Ere Maijala via Mike McCandless)
4370
4371* LUCENE-7708: ShingleFilter without unigram was producing a disconnected
4372  token graph, messing up queries when it was used during query
4373  parsing (Jim Ferenczi)
4374
4375Improvements
4376
4377* LUCENE-7055: Added Weight#scorerSupplier, which allows estimating the cost
4378  of a Scorer before actually building it, in order to optimize how the query
4379  should be run, e.g. using points or doc values depending on costs of other
4380  parts of the query. (Adrien Grand)
4381
4382* LUCENE-7643: IndexOrDocValuesQuery allows executing range queries using
4383  either points or doc values depending on which one is more efficient.
4384  (Adrien Grand)
4385
4386* LUCENE-7662: If index files are missing, throw CorruptIndexException instead
4387  of the less descriptive FileNotFound or NoSuchFileException (Mike Drob via
4388  Mike McCandless, Erick Erickson)
4389
4390* LUCENE-7680: UsageTrackingQueryCachingPolicy never caches term filters anymore
4391  since they are plenty fast. This also has the side-effect of leaving more
4392  space in the history for costly filters. (Adrien Grand)
4393
4394* LUCENE-7677: UsageTrackingQueryCachingPolicy now caches compound queries a bit
4395  earlier than regular queries in order to improve cache efficiency.
4396  (Adrien Grand)
4397
4398* LUCENE-7710: BlockPackedReader throws CorruptIndexException and includes
4399  IndexInput description instead of plain IOException (Mike Drob via
4400  Mike McCandless)
4401
4402* LUCENE-7695: ComplexPhraseQueryParser to support query time synonyms (Markus Jelsma
4403  via Mikhail Khludnev)
4404
4405* LUCENE-7747: QueryBuilder now iterates lazily over the possible paths when building a graph query
4406  (Jim Ferenczi)
4407
4408Optimizations
4409
4410* LUCENE-7641: Optimized point range queries to compute documents that do not
4411  match the range on single-valued fields when more than half the documents in
4412  the index would match. (Adrien Grand)
4413
4414* LUCENE-7656: Speed up LatLonPointDistanceQuery by computing distances even
4415  less often. (Adrien Grand)
4416
4417* LUCENE-7661: Speed up LatLonPointInPolygonQuery by pre-computing the
4418  relation of the polygon with a grid. (Adrien Grand)
4419
4420* LUCENE-7660: Speed up LatLonPointDistanceQuery by improving the detection of
4421  whether BKD cells are entirely within the distance close to the dateline.
4422  (Adrien Grand)
4423
4424* LUCENE-7654: ToParentBlockJoinQuery now implements two-phase iteration and
4425  computes scores lazily in order to be faster when used in conjunctions.
4426  (Adrien Grand)
4427
4428* LUCENE-7667: BKDReader now calls `IntersectVisitor.grow()` on larger
4429  increments. (Adrien Grand)
4430
4431* LUCENE-7638: Query parsers now analyze the token graph for articulation
4432  points (or cut vertices) in order to create more efficient queries for
4433  multi-token synonyms. (Jim Ferenczi)
4434
4435* LUCENE-7699: Query parsers now use span queries to produce more efficient
4436  phrase queries for multi-token synonyms. (Matt Weber via Jim Ferenczi)
4437
4438* LUCENE-7742: Fix places where we were unboxing and then re-boxing
4439  according to FindBugs (Daniel Jelinski via Mike McCandless)
4440
4441* LUCENE-7739: Fix places where we unnecessarily boxed while parsing
4442  a numeric value according to FindBugs (Daniel Jelinski via Mike
4443  McCandless)
4444
4445Build
4446
4447* LUCENE-7653: Update randomizedtesting to version 2.5.0. (Dawid Weiss)
4448
4449* LUCENE-7665: Remove grouping dependency from the join module.
4450  (Martijn van Groningen)
4451
4452* SOLR-10023: Add non-recursive 'test-nocompile' target: Only runs unit tests.
4453  Jars are not downloaded; compilation is not updated; and Clover is not enabled.
4454  (Steve Rowe)
4455
4456* LUCENE-7694: Update forbiddenapis to version 2.3. (Uwe Schindler)
4457
4458* LUCENE-7693: Replace "org.apache." logic in GetMavenDependenciesTask.
4459  (Daniel Collins, Christine Poerschke)
4460
4461* LUCENE-7726: Fix HTML entity bugs in Javadocs to be able to build with
4462  Java 9. (Uwe Schindler, Hossman)
4463
4464* LUCENE-7727: Replace end-of-life Markdown parser "Pegdown" by "Flexmark"
4465  for compatibility with Java 9. (Uwe Schindler)
4466
4467Other
4468
4469* LUCENE-7666: Fix typos in lucene-join package info javadoc.
4470  (Tom Saleeba via Christine Poerschke)
4471
4472* LUCENE-7658: queryparser/xml CoreParser now implements SpanQueryBuilder interface.
4473  (Daniel Collins, Christine Poerschke)
4474
4475* LUCENE-7715: NearSpansUnordered simplifications.
4476  (Paul Elschot via Adrien Grand)
4477
4478======================= Lucene 6.4.2 =======================
4479
4480Bug Fixes
4481
4482* LUCENE-7676: Fixed FilterCodecReader to override more super-class methods.
4483  Also added TestFilterCodecReader class. (Christine Poerschke)
4484
4485* LUCENE-7717: The UnifiedHighlighter and PostingsHighlighter were not highlighting
4486  prefix queries with multi-byte characters. TermRangeQuery is affected too.
4487  (Dmitry Malinin, David Smiley)
4488
4489======================= Lucene 6.4.1 =======================
4490
4491Build
4492
4493* LUCENE-7651: Fix Javadocs build for Java 8u121 by injecting "Google Code
4494  Prettify" without adding Javascript to Javadocs's -bottom parameter.
4495  Also update Prettify to latest version to fix Google Chrome issue.
4496  (Uwe Schindler)
4497
4498Bug Fixes
4499
4500* LUCENE-7657: Fixed potential memory leak in the case that a (Span)TermQuery
4501  with a TermContext is cached. (Adrien Grand)
4502
4503* LUCENE-7647: Made stored fields reclaim native memory more aggressively when
4504  configured with BEST_COMPRESSION. This could otherwise result in out-of-memory
4505  issues. (Adrien Grand)
4506
4507* LUCENE-7670: AnalyzingInfixSuggester should not immediately open an
4508  IndexWriter over an already-built index. (Steve Rowe)
4509
4510======================= Lucene 6.4.0 =======================
4511
4512API Changes
4513
4514* LUCENE-7533: Classic query parser no longer allows autoGeneratePhraseQueries
4515  to be set to true when splitOnWhitespace is false (and vice-versa).
4516
4517* LUCENE-7607: LeafFieldComparator.setScorer and SimpleFieldComparator.setScorer
4518  are declared as throwing IOException (Alan Woodward)
4519
4520* LUCENE-7617: Collector construction for two-pass grouping queries is
4521  abstracted into a new Grouper class, which can be passed as a constructor
4522  parameter to GroupingSearch.  The abstract base classes for the different
4523  grouping Collectors are renamed to remove the Abstract* prefix.
4524  (Alan Woodward, Martijn van Groningen)
4525
4526* LUCENE-7609: The expressions module now uses the DoubleValuesSource API, and
4527  no longer depends on the queries module.  Expression#getValueSource() is
4528  replaced with Expression#getDoubleValuesSource(). (Alan Woodward, Adrien
4529  Grand)
4530
4531* LUCENE-7610: The facets module now uses the DoubleValuesSource API, and
4532  methods that take ValueSource parameters are deprecated (Alan Woodward)
4533
4534* LUCENE-7611: DocumentValueSourceDictionary now takes a LongValuesSource
4535  as a parameter, and the ValueSource equivalent is deprecated (Alan Woodward)
4536
4537New Features
4538
4539* LUCENE-5867: Added BooleanSimilarity. (Robert Muir, Adrien Grand)
4540
4541* LUCENE-7466: Added AxiomaticSimilarity. (Peilin Yang via Tommaso Teofili)
4542
4543* LUCENE-7590: Added DocValuesStatsCollector to compute statistics on DocValues
4544  fields. (Shai Erera)
4545
4546* LUCENE-7587: The new FacetQuery and MultiFacetQuery helper classes
4547  make it simpler to execute drill down when drill sideways counts are
4548  not needed (Emmanuel Keller via Mike McCandless)
4549
4550* LUCENE-6664: A new SynonymGraphFilter outputs a correct graph
4551  structure for multi-token synonyms, separating out a
4552  FlattenGraphFilter that is hardwired into the current
4553  SynonymFilter.  This finally makes it possible to implement
4554  correct multi-token synonyms at search time.  See
4555  http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
4556  for details. (Mike McCandless)
4557
4558* LUCENE-5325: Added LongValuesSource and DoubleValuesSource, intended as
4559  type-safe replacements for ValueSource in the queries module.  These
4560  expose per-segment LongValues or DoubleValues iterators. (Alan Woodward, Adrien Grand)
4561
4562* LUCENE-7603: Graph token streams are now handled accurately by query
4563  parsers, by enumerating all paths and creating the corresponding
4564  query/ies as sub-clauses (Matt Weber via Mike McCandless)
4565
4566* LUCENE-7588: DrillSideways can now run queries concurrently, and
4567  supports an IndexSearcher using an executor service to run each query
4568  concurrently across all segments in the index (Emmanuel Keller via
4569  Mike McCandless)
4570
4571* LUCENE-7627: Added .intersect methods to SortedDocValues and
4572  SortedSetDocValues to allow filtering their TermsEnums with a
4573  CompiledAutomaton (Alan Woodward, Mike McCandless)
4574
4575Bug Fixes
4576
4577* LUCENE-7547: JapaneseTokenizerFactory was failing to close the
4578  dictionary file it opened (Markus via Mike McCandless)
4579
4580* LUCENE-7562: CompletionFieldsConsumer sometimes throws
4581  NullPointerException on ghost fields (Oliver Eilhard via Mike McCandless)
4582
4583* LUCENE-7533: Classic query parser: disallow autoGeneratePhraseQueries=true
4584  when splitOnWhitespace=false (and vice-versa). (Steve Rowe)
4585
4586* LUCENE-7536: ASCIIFoldingFilterFactory used to return an illegal multi-term
4587  component when preserveOriginal was set to true. (Adrien Grand)
4588
4589* LUCENE-7576: Fix Terms.intersect in the default codec to detect when
4590  the incoming automaton is a special case and throw a clearer
4591  exception than NullPointerException (Tom Mortimer via Mike McCandless)
4592
4593* LUCENE-6989: Fix Exception handling in MMapDirectory's unmap hack
4594  support code to work with Java 9's new InaccessibleObjectException,
4595  which does not extend ReflectiveOperationException.
4596  (Uwe Schindler)
4597
4598* LUCENE-7581: Lucene now prevents updating a doc values field that is used
4599  in the index sort, since this would lead to corruption.  (Jim
4600  Ferenczi via Mike McCandless)
4601
4602* LUCENE-7570: IndexWriter may deadlock if a commit is running while
4603  there are too many merges running and one of the merges hits a
4604  tragic exception (Joey Echeverria via Mike McCandless)
4605
4606* LUCENE-7594: Fixed point range queries on floating-point types to recommend
4607  using helpers for exclusive bounds that are consistent with Double.compare.
4608  (Adrien Grand, Dawid Weiss)
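
  The subtlety behind LUCENE-7594 is that Double.compare orders -0.0 strictly
  before +0.0, whereas Math.nextUp(-0.0) jumps straight to Double.MIN_VALUE
  and so skips +0.0. A minimal sketch of helpers that stay consistent with
  Double.compare follows; the class and method names are assumptions for
  illustration and are not taken from this entry.

```java
final class ExclusiveBoundSketch {
  // Smallest double that is greater than d according to Double.compare.
  static double nextUp(double d) {
    if (Double.doubleToLongBits(d) == 0x8000_0000_0000_0000L) { // d is -0.0
      return +0d;
    }
    return Math.nextUp(d);
  }

  // Largest double that is smaller than d according to Double.compare.
  static double nextDown(double d) {
    if (Double.doubleToLongBits(d) == 0L) { // d is +0.0
      return -0d;
    }
    return Math.nextDown(d);
  }
}
```

  With helpers of this shape, an exclusive lower bound x can be turned into
  the inclusive bound nextUp(x) without accidentally excluding +0.0 when x
  is -0.0.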
4609
4610* LUCENE-7606: Normalization with CustomAnalyzer would only apply the last
4611  token filter. (Adrien Grand)
4612
4613* LUCENE-7612: Removed an unused dependency from the suggester to the misc
4614  module. (Alan Woodward)
4615
4616Improvements
4617
4618* LUCENE-7532: Add back lost codec file format documentation
4619  (Shinichiro Abe via Mike McCandless)
4620
4621* LUCENE-6824: TermAutomatonQuery now rewrites to TermQuery,
4622  PhraseQuery or MultiPhraseQuery when the word automaton is simple
4623  (Mike McCandless)
4624
4625* LUCENE-7431: Allow a certain amount of overlap to be specified between the include
4626  and exclude arguments of SpanNotQuery via negative pre and/or post arguments.
4627  (Marc Morissette via David Smiley)
4628
4629* LUCENE-7544: UnifiedHighlighter: add extension points for handling custom queries.
4630  (Michael Braun, David Smiley)
4631
4632* LUCENE-7538: Asking IndexWriter to store a too-massive text field
4633  now throws IllegalArgumentException instead of a cryptic exception
4634  that closes your IndexWriter (Steve Chen via Mike McCandless)
4635
4636* LUCENE-7524: Added more detailed explanation of how IDF is computed in
4637  ClassicSimilarity and BM25Similarity. (Adrien Grand)
4638
4639* LUCENE-7564: AnalyzingInfixSuggester should close its IndexWriter by default
4640  at the end of build(). (Steve Rowe)
4641
4642* LUCENE-7526: Enhanced UnifiedHighlighter's passage relevancy for queries with
4643  wildcards and sometimes just terms. Added shouldPreferPassageRelevancyOverSpeed()
4644  which can be overridden to return false to eke out more speed in some cases.
4645  (Timothy M. Rodriguez, David Smiley)
4646
4647* LUCENE-7560: QueryBuilder.createFieldQuery is no longer final,
4648  giving custom query parsers subclassing QueryBuilder more freedom to
4649  control how text is analyzed and converted into a query (Matt Weber
4650  via Mike McCandless)
4651
4652* LUCENE-7537: Index time sorting now supports multi-valued sorts
4653  using selectors (MIN, MAX, etc.) (Jim Ferenczi via Mike McCandless)
4654
4655* LUCENE-7575: UnifiedHighlighter can now highlight fields with queries that don't
4656  necessarily refer to that field (AKA requireFieldMatch==false). Disabled by default.
4657  See UH get/setFieldMatcher. (Jim Ferenczi via David Smiley)
4658
4659* LUCENE-7592: If the segments file is truncated, we now throw
4660  CorruptIndexException instead of the more confusing EOFException
4661  (Mike Drob via Mike McCandless)
4662
4663* LUCENE-6989: Make MMapDirectory's unmap hack work with Java 9 EA (b150+):
4664  Unmapping uses new sun.misc.Unsafe#invokeCleaner(ByteBuffer).
4665  Java 9 now needs the same permissions as Java 8;
4666  RuntimePermission("accessClassInPackage.jdk.internal.ref")
4667  is no longer needed. Support for older Java 9 builds was removed.
4668  (Uwe Schindler)
4669
4670* LUCENE-7401: Changed the way BKD trees pick the split dimension in order to
4671  ensure all dimensions are indexed. (Adrien Grand)
4672
4673* LUCENE-7614: Complex Phrase Query parser ignores double quotes around single token
4674  prefix, wildcard, range queries (Mikhail Khludnev)
4675
4676* LUCENE-7620: Added LengthGoalBreakIterator, a wrapper around another BreakIterator to skip breaks
4677  that would create Passages that are too short.  Only for use with the UnifiedHighlighter
4678  (and probably PostingsHighlighter).  (David Smiley)
4679
4680Optimizations
4681
4682* LUCENE-7568: Optimize merging when index sorting is used but the
4683  index is already sorted (Jim Ferenczi via Mike McCandless)
4684
4685* LUCENE-7563: The BKD in-memory index for dimensional points now uses
4686  a compressed format, using substantially less RAM in some cases
4687  (Adrien Grand, Mike McCandless)
4688
4689* LUCENE-7583: BKD writing now buffers each leaf block in heap before
4690  writing to disk, giving a small speedup in points-heavy use cases.
4691  (Mike McCandless)
4692
4693* LUCENE-7572: Doc values queries now cache their hash code. (Adrien Grand)
4694
4695Other
4696
4697* LUCENE-7546: Fixed references to benchmark wikipedia data and the Jenkins line-docs file
4698  (David Smiley)
4699
4700* LUCENE-7534: fix smokeTestRelease.py to run on Cygwin (Mikhail Khludnev)
4701
4702* LUCENE-7559: UnifiedHighlighter: Make Passage and OffsetsEnum more exposed to allow
4703  passage creation to be customized. (David Smiley)
4704
4705* LUCENE-7599: Simplify TestRandomChains using Java's built-in Predicate and
4706  Function interfaces. (Ahmet Arslan via Adrien Grand)
4707
4708* LUCENE-7595: Improve RAMUsageTester in test-framework to estimate memory usage of
4709  runtime classes and work with Java 9 EA (b148+). Disable static field heap usage
4710  checker in LuceneTestCase.  (Uwe Schindler, Dawid Weiss)
4711
4712Build
4713
4714* LUCENE-7387: fix defaultCodec in build.xml to account for the line ending (hossman)
4715
4716* LUCENE-7543: Make changes-to-html target an offline operation, by moving the
4717  Lucene and Solr DOAP RDF files into the Git source repository under
4718  dev-tools/doap/ and then pulling release dates from those files, rather than
4719  from JIRA. (Mano Kovacs, hossman, Steve Rowe)
4720
4721* LUCENE-7596: Update Groovy to version 2.4.8 to allow building with Java 9
4722  build 148+. Also update JGit version for working-copy checks. (Uwe Schindler)
4723
4724======================= Lucene 6.3.0 =======================
4725
4726API Changes
4727
4728New Features
4729
4730* LUCENE-7438: New "UnifiedHighlighter" derivative of the PostingsHighlighter that
4731  can consume offsets from postings, term vectors, or analysis.  It can highlight phrases
4732  as accurately as the standard Highlighter. Light term vectors can be used with offsets
4733  in postings for fast wildcard (MultiTermQuery) highlighting.
4734  (David Smiley, Timothy Rodriguez)
4735
4736* LUCENE-7490: SimpleQueryParser now parses '*' to MatchAllDocsQuery
4737  (Lee Hinman via Mike McCandless)
4738
4739Bug Fixes
4740
4741* LUCENE-7507: Upgrade morfologik-stemming to version 2.1.1 (fixes security
4742  manager issue with Polish dictionary lookup). (Dawid Weiss)
4743
4744* LUCENE-7472: MultiFieldQueryParser.getFieldQuery() drops queries that are
4745  neither BooleanQuery nor TermQuery.  (Steve Rowe)
4746
4747* LUCENE-7456: PerFieldPostings/DocValues was failing to delegate the
4748  merge method (Julien MASSENET via Mike McCandless)
4749
4750* LUCENE-7468: ASCIIFoldingFilter should not emit duplicated tokens when
4751  preserve original is on. (David Causse via Adrien Grand)
4752
4753* LUCENE-7484: FastVectorHighlighter failed to highlight SynonymQuery
4754  (Jim Ferenczi via Mike McCandless)
4755
4756* LUCENE-7476: JapaneseNumberFilter should not invoke incrementToken
4757  on its input after it's exhausted (Andy Hind via Mike McCandless)
4758
4759* LUCENE-7486: DisjunctionMaxQuery does not work correctly with queries that
4760  return negative scores.  (Ivan Provalov, Uwe Schindler, Adrien Grand)
4761
4762* LUCENE-7491: Suddenly turning on dimensional points for some fields
4763  that already exist in an index but didn't previously index
4764  dimensional points could cause unexpected merge exceptions (Hans
4765  Lund, Mike McCandless)
4766
4767* LUCENE-6914: Fixed DecimalDigitFilter in case of supplementary code points.
4768  (Hossman)
4769
4770* LUCENE-7493: FacetCollector.search threw an unexpected exception if
4771  you asked for zero hits but wanted facets (Mahesh via Mike McCandless)
4772
4773* LUCENE-7505: AnalyzingInfixSuggester returned invalid results when
4774  allTermsRequired is false and context filters are specified (Mike
4775  McCandless)
4776
4777* LUCENE-7429: AnalyzerWrapper can now modify the normalization chain too and
4778  DelegatingAnalyzerWrapper does the right thing automatically. (Adrien Grand)
4779
4780* LUCENE-7135: Lucene's check for 32 or 64 bit JVM now works around security
4781  manager blocking access to some properties (Aaron Madlon-Kay via
4782  Mike McCandless)
4783
4784Improvements
4785
4786* LUCENE-7439: FuzzyQuery now matches all terms within the specified
4787  edit distance, even if they are short terms (Mike McCandless)
4788
4789* LUCENE-7496: Better toString for SweetSpotSimilarity (janhoy)
4790
4791* LUCENE-7520: Highlighter's WeightedSpanTermExtractor shouldn't attempt to expand a MultiTermQuery
4792  when its field doesn't match the field the extraction is scoped to.
4793  (Cao Manh Dat via David Smiley)
4794
4795Optimizations
4796
4797* LUCENE-7501: BKDReader should not store the split dimension explicitly in the
4798  1D case. (Adrien Grand)
4799
4800Other
4801
4802* LUCENE-7513: Upgrade randomizedtesting to 2.4.0. (Dawid Weiss)
4803
4804* LUCENE-7452: Block join query exception suggests how to find a doc, which
4805  violates orthogonality requirement. (Mikhail Khludnev)
4806
4807* LUCENE-7438: Renovate the Benchmark module's support for benchmarking highlighting. All
4808  highlighters are supported via SearchTravRetHighlight. (David Smiley)
4809
4810Build
4811
4812* LUCENE-7292: Fix build to use "--release 8" instead of "-release 8" on
4813  Java 9 (this changed with recent EA build b135).  (Uwe Schindler)
4814
4815======================= Lucene 6.2.1 =======================
4816
4817API Changes
4818
4819* LUCENE-7436: MinHashFilter's constructor, and some of its default
4820  settings, should be public.  (Doug Turnbull via Mike McCandless)
4821
4822Bug Fixes
4823
4824* LUCENE-7417: The standard Highlighter could throw an IllegalArgumentException when
4825  trying to highlight a query containing a degenerate case of a MultiPhraseQuery with one
4826  term.  (Thomas Kappler via David Smiley)
4827
4828* LUCENE-7440: Document id skipping (PostingsEnum.advance) could throw an
4829  ArrayIndexOutOfBoundsException on large index segments (>1.8B docs)
4830  with large skips. (yonik)
4831
4832* LUCENE-7442: MinHashFilter's ctor should validate its args.
4833  (Cao Manh Dat via Steve Rowe)
4834
4835* LUCENE-7318: Fix backwards compatibility issues around StandardAnalyzer
4836  and its components, introduced with Lucene 6.2.0. The moved classes
4837  were restored in their original packages: LowercaseFilter and StopFilter,
4838  as well as several utility classes.  (Uwe Schindler, Mike McCandless)
4839
4840======================= Lucene 6.2.0 =======================
4841
4842API Changes
4843
4844* ScoringWrapperSpans was removed since it had no purpose or effect as of Lucene 5.5.
4845
4846New Features
4847
4848* LUCENE-7388: Add point based IntRangeField, FloatRangeField, LongRangeField along with
4849  supporting queries and tests (Nick Knize)
4850
4851* LUCENE-7381: Add point based DoubleRangeField and RangeFieldQuery for
4852  indexing and querying on Ranges up to 4 dimensions (Nick Knize)
4853
4854* LUCENE-6968: LSH Filter (Tommaso Teofili, Andy Hind, Cao Manh Dat)
4855
4856* LUCENE-7302: IndexWriter methods that change the index now return a
4857  long "sequence number" indicating the effective equivalent
4858  single-threaded execution order (Mike McCandless)
4859
4860* LUCENE-7335: IndexWriter's commit data is now late binding,
4861  recording key/values from a provided iterable based on when the
4862  commit actually takes place (Mike McCandless)
4863
4864* LUCENE-7287: UkrainianMorfologikAnalyzer is a new dictionary-based
4865  analyzer for the Ukrainian language (Andriy Rysin via Mike
4866  McCandless)
4867
4868* LUCENE-7373: Directory.renameFile, which did both renaming and fsync
4869  of the directory metadata, has been deprecated; use the new separate
4870  methods Directory.rename and Directory.syncMetaData instead (Robert Muir,
4871  Uwe Schindler, Mike McCandless)
4872
4873* LUCENE-7355: Added Analyzer#normalize(), which only applies normalization to
4874  an input string. (Adrien Grand)
4875
4876* LUCENE-7380: Add Polygon.fromGeoJSON for more easily creating
4877  Polygon instances from a standard GeoJSON string (Robert Muir, Mike
4878  McCandless)
4879
4880* LUCENE-7395: PerFieldSimilarityWrapper requires a default similarity
4881  for calculating query norm and coordination factor in Lucene 6.x.
4882  Lucene 7 will no longer have those factors.  (Uwe Schindler, Sascha Markus)
4883
4884* SOLR-9279: Queries module: new ComparisonBoolFunction base class
4885  (Doug Turnbull via David Smiley)
4886
4887Bug Fixes
4888
4889* LUCENE-6662: Fixed potential resource leaks. (Rishabh Patel via Adrien Grand)
4890
4891* LUCENE-7340: MemoryIndex.toString() could throw NPE; fixed. Renamed to toStringDebug().
4892  (Daniel Collins, David Smiley)
4893
4894* LUCENE-7382: Fix bug introduced by LUCENE-7355 that used the
4895  wrong default AttributeFactory for new Tokenizers.
4896  (Terry Smith, Uwe Schindler)
4897
4898* LUCENE-7389: Fix FieldType.setDimensions(...) validation for the dimensionNumBytes
4899  parameter. (Martijn van Groningen)
4900
4901* LUCENE-7391: Fix performance regression in MemoryIndex's fields() introduced
4902  in Lucene 6. (Steve Mason via David Smiley)
4903
4904* LUCENE-7395, SOLR-9315: Fix PerFieldSimilarityWrapper to also delegate query
4905  norm and coordination factor using a default similarity added as ctor param.
4906  (Uwe Schindler, Sascha Markus)
4907
4908* SOLR-9413: Fix analysis/kuromoji's CSVUtil.quoteEscape logic, add TestCSVUtil test.
4909  (AppChecker, Christine Poerschke)
4910
4911* LUCENE-7419: Fix performance bug with TokenStream.end(), where it would look up
4912  PositionIncrementAttribute every time. (Mike McCandless, Robert Muir)
4913
4914Improvements
4915
4916* LUCENE-7323: Compound file writing now verifies the incoming
4917  sub-files' checksums and segment IDs, to catch hardware issues or
4918  filesystem bugs earlier (Robert Muir, Mike McCandless)
4919
4920* LUCENE-6766: Index time sorting has graduated from the misc module
4921  to core, is much simpler to use, via
4922  IndexWriter.setIndexSort, and now works with dimensional points.
4923  (Adrien Grand, Mike McCandless)
4924
4925* LUCENE-5931: Detect when an application tries to reopen an
4926  IndexReader after (illegally) removing the old index and
4927  reindexing (Vitaly Funstein, Robert Muir, Mike McCandless)
4928
4929* LUCENE-6171: Lucene now passes the StandardOpenOption.CREATE_NEW
4930  option when writing new files so the filesystem enforces our
4931  write-once architecture, possibly catching externally caused
4932  issues sooner (Robert Muir, Mike McCandless)
4933
4934* LUCENE-7318: StandardAnalyzer has been moved from the analysis
4935  module into core and is now the default analyzer in
4936  IndexWriterConfig (Robert Muir, Mike McCandless)
4937
4938* LUCENE-7345: RAMDirectory now enforces write-once files as well
4939  (Robert Muir, Mike McCandless)
4940
4941* LUCENE-7337: MatchNoDocsQuery now scores with 0 normalization factor
4942  and empty boolean queries now rewrite to MatchNoDocsQuery instead of
4943  vice versa (Jim Ferenczi via Mike McCandless)
4944
4945* LUCENE-7359: Add equals() and hashCode() to Explanation (Alan Woodward)
4946
4947* LUCENE-7353: ScandinavianFoldingFilterFactory and
4948  ScandinavianNormalizationFilterFactory now implement MultiTermAwareComponent.
4949  (Adrien Grand)
4950
4951* LUCENE-2605: Add classic QueryParser option setSplitOnWhitespace() to
4952  control whether to split on whitespace prior to text analysis.  Default
4953  behavior remains unchanged: split-on-whitespace=true. (Steve Rowe)
4954
4955* LUCENE-7276: MatchNoDocsQuery now includes an optional reason for
4956  why it was used (Jim Ferenczi via Mike McCandless)
4957
4958* LUCENE-7355: AnalyzingQueryParser now only applies the subset of the analysis
4959  chain that is about normalization for range/fuzzy/wildcard queries.
4960  (Adrien Grand)
4961
4962* LUCENE-7376: Add support for ToParentBlockJoinQuery to fast vector highlighter's
4963  FieldQuery. (Martijn van Groningen)
4964
4965* LUCENE-7385: Improve/fix assert messages in SpanScorer. (David Smiley)
4966
4967* LUCENE-7393: Add ICUTokenizer option to parse Myanmar text as syllables instead of words,
4968  because the ICU word-breaking algorithm has some issues. This allows for the previous
4969  tokenization used before Lucene 5. (AM, Robert Muir)
4970
4971* LUCENE-7409: Changed MMapDirectory's unmapping to work more safely, but still with
4972  no guarantees. This uses a store-store barrier and yields the current thread
4973  before unmapping to allow in-flight requests to finish. The new code no longer
4974  uses WeakIdentityMap as it delegates all ByteBuffer reads through a new
4975  ByteBufferGuard wrapper that is shared between all ByteBufferIndexInput clones.
4976  (Robert Muir, Uwe Schindler)
4977
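The write-once enforcement mentioned in LUCENE-6171 above can be illustrated
with the JDK alone. This is a minimal sketch, not Lucene's code: the class
WriteOnceDemo, the method writeOnce and the file name are hypothetical. It
only shows that StandardOpenOption.CREATE_NEW makes the filesystem reject a
second attempt to create the same file.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WriteOnceDemo {
  /** Writes data to a new file; returns false if the file already existed. */
  public static boolean writeOnce(Path file, byte[] data) throws IOException {
    // CREATE_NEW fails with FileAlreadyExistsException instead of overwriting.
    try (OutputStream out =
        Files.newOutputStream(file, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
      out.write(data);
      return true;
    } catch (FileAlreadyExistsException e) {
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("write-once");
    Path file = dir.resolve("example.bin"); // hypothetical file name
    System.out.println(writeOnce(file, "first".getBytes(StandardCharsets.UTF_8)));
    System.out.println(writeOnce(file, "second".getBytes(StandardCharsets.UTF_8)));
  }
}
```

In this sketch the second call leaves the first file's contents untouched,
which is the property a write-once design depends on.
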
4978Optimizations
4979
4980* LUCENE-7330, LUCENE-7339: Speed up conjunction queries. (Adrien Grand)
4981
4982* LUCENE-7356: SearchGroup tweaks. (Christine Poerschke)
4983
4984* LUCENE-7351: Doc id compression for points. (Adrien Grand)
4985
4986* LUCENE-7371: Point values are now better compressed using run-length
4987  encoding. (Adrien Grand)
4988
4989* LUCENE-7311: Cached term queries do not seek the terms dictionary anymore.
4990  (Adrien Grand)
4991
4992* LUCENE-7396, LUCENE-7399: Faster flush of points.
4993  (Adrien Grand, Mike McCandless)
4994
4995* LUCENE-7406: Automaton and PrefixQuery tweaks (fewer object (re)allocations).
4996  (Christine Poerschke)
4997
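LUCENE-7371 above does not describe the encoding used for point values, so
the following is only a generic illustration of run-length encoding, not the
codec's format. The class RunLengthSketch and its (run length, value) byte
pair layout are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class RunLengthSketch {
  /** Encodes input as (runLength, value) byte pairs; each run is capped at 255. */
  public static byte[] encode(byte[] input) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int i = 0;
    while (i < input.length) {
      byte value = input[i];
      int run = 1;
      while (i + run < input.length && input[i + run] == value && run < 255) {
        run++;
      }
      out.write(run);   // run length, stored as an unsigned byte
      out.write(value); // the repeated value
      i += run;
    }
    return out.toByteArray();
  }

  /** Reverses encode(): expands each (runLength, value) pair. */
  public static byte[] decode(byte[] encoded) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i + 1 < encoded.length; i += 2) {
      int run = encoded[i] & 0xFF;
      for (int j = 0; j < run; j++) {
        out.write(encoded[i + 1]);
      }
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] values = {7, 7, 7, 7, 3, 3};
    byte[] encoded = encode(values);
    System.out.println(Arrays.toString(encoded));
    System.out.println(Arrays.equals(values, decode(encoded)));
  }
}
```
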
4998Other
4999
5000* LUCENE-4787: Fixed some highlighting javadocs. (Michael Dodsworth via Adrien
5001  Grand)
5002
5003* LUCENE-7334: Update ASM dependency to 5.1.  (Uwe Schindler)
5004
5005* LUCENE-7346: Update forbiddenapis to version 2.2.
5006  (Uwe Schindler)
5007
5008* LUCENE-7360: Explanation.toHtml() is deprecated. (Alan Woodward)
5009
5010* LUCENE-7372: Factor out an org.apache.lucene.search.FilterWeight class.
5011  (Christine Poerschke, Adrien Grand, David Smiley)
5012
5013* LUCENE-7384: Removed ScoringWrapperSpans. And tweaked SpanWeight.buildSimWeight() to
5014  reuse the existing Similarity instead of creating a new one. (David Smiley)
5015
5016======================= Lucene 6.1.0 =======================
5017
5018New Features
5019
5020* LUCENE-7099: Add LatLonDocValuesField.newDistanceSort to the sandbox.
5021  (Robert Muir)
5022
5023* LUCENE-7140: Add PlanetModel.bisection to spatial3d (Karl Wright via
5024  Mike McCandless)
5025
5026* LUCENE-7069: Add LatLonPoint.nearest, to find nearest N points to a
5027  provided query point (Mike McCandless)
5028
5029* LUCENE-7234: Added InetAddressPoint.nextDown/nextUp to easily generate range
5030  queries with excluded bounds. (Adrien Grand)
5031
5032* LUCENE-7300: The misc module now has a directory wrapper that uses hard-links if
5033  applicable and supported when copying files from another FSDirectory in
5034  Directory#copyFrom. (Simon Willnauer)
5035
5036API Changes
5037
5038* LUCENE-7184: Refactor LatLonPoint encoding methods to new GeoEncodingUtils
5039  helper class in core geo package. Also refactors LatLonPointTests to
5040  TestGeoEncodingUtils (Nick Knize)
5041
5042* LUCENE-7163: refactor GeoRect, Polygon, and GeoUtils tests to geo
5043  package in core (Nick Knize)
5044
5045* LUCENE-7152: Refactor GeoUtils from lucene-spatial package to
5046  core (Nick Knize)
5047
5048* LUCENE-7141: Switch OfflineSorter's ByteSequencesReader to
5049  BytesRefIterator (Mike McCandless)
5050
5051* LUCENE-7150: Spatial3d gets useful APIs to create common shape
5052  queries, matching LatLonPoint.  (Karl Wright via Mike McCandless)
5053
5054* LUCENE-7243: Removed the LeafReaderContext parameter from
5055  QueryCachingPolicy#shouldCache. (Adrien Grand)
5056
5057Optimizations
5058
5059* LUCENE-7071: Reduce bytes copying in OfflineSorter, giving ~10%
5060  speedup on merging 2D LatLonPoint values (Mike McCandless)
5061
5062* LUCENE-7105, LUCENE-7215: Optimize LatLonPoint's newDistanceQuery.
5063  (Robert Muir)
5064
5065* LUCENE-7097: IntroSorter now recurses to 2 * log_2(count) quicksort
5066  stack depth before switching to heapsort (Adrien Grand, Mike McCandless)
5067
5068* LUCENE-7115: Speed up FieldCache.CacheEntry toString by setting initial
5069  StringBuilder capacity (Gregory Chanan)
5070
5071* LUCENE-7147: Improve disjoint check for geo distance query traversal
5072  (Ryan Ernst, Robert Muir, Mike McCandless)
5073
5074* LUCENE-7153: GeoPointField and LatLonPoint polygon queries now support
5075  multiple polygons and holes, with memory usage independent of
5076  polygon complexity. (Karl Wright, Mike McCandless, Robert Muir)
5077
5078* LUCENE-7159: Speed up LatLonPoint polygon performance. (Robert Muir, Ryan Ernst)
5079
5080* LUCENE-7211: Reduce memory & GC for spatial RPT Intersects when the number of
5081  matching docs is small. (Jeff Wartes, David Smiley)
5082
5083* LUCENE-7235: LRUQueryCache should not take a lock for segments that it will
5084  not cache on anyway. (Adrien Grand)
5085
5086* LUCENE-7238: Explicitly disable the query cache in MemoryIndex#createSearcher.
5087  (Adrien Grand)
5088
5089* LUCENE-7237: LRUQueryCache now prefers returning an uncached Scorer to
5090  waiting on a lock. (Adrien Grand)
5091
5092* LUCENE-7261, LUCENE-7262, LUCENE-7264, LUCENE-7258: Speed up DocIdSetBuilder
5093  (which is used by TermsQuery, multi-term queries and several point queries).
5094  (Adrien Grand, Jeff Wartes, David Smiley)
5095
5096* LUCENE-7299: Speed up BytesRefHash.sort() using radix sort. (Adrien Grand)
5097
5098* LUCENE-7306: Speed up points indexing and merging using radix sort.
5099  (Adrien Grand)
5100
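The depth limit described in LUCENE-7097 above can be sketched as follows.
This is a simplified introsort over int[], not Lucene's IntroSorter: the
class IntroSortSketch, the middle-element pivot and the absence of a
small-range insertion sort are all assumptions of the example. The only part
taken from the entry is the budget of 2 * log_2(count) quicksort levels
before falling back to heapsort.

```java
import java.util.Arrays;

public class IntroSortSketch {
  /** Sorts with a quicksort depth budget of 2 * floor(log2(n)). */
  public static void sort(int[] a) {
    if (a.length < 2) return;
    int log2 = 31 - Integer.numberOfLeadingZeros(a.length);
    sort(a, 2 * log2);
  }

  /** Sorts with an explicit depth budget; a budget of 0 goes straight to heapsort. */
  public static void sort(int[] a, int maxDepth) {
    quicksort(a, 0, a.length, maxDepth);
  }

  // Sorts the half-open range a[from, to).
  private static void quicksort(int[] a, int from, int to, int maxDepth) {
    int len = to - from;
    if (len < 2) return;
    if (maxDepth == 0) { // budget exhausted: heapsort bounds the worst case
      heapSort(a, from, to);
      return;
    }
    int pivot = a[from + (len >>> 1)];
    int i = from, j = to - 1;
    while (i <= j) { // Hoare-style partition around the pivot value
      while (a[i] < pivot) i++;
      while (a[j] > pivot) j--;
      if (i <= j) {
        swap(a, i, j);
        i++;
        j--;
      }
    }
    quicksort(a, from, j + 1, maxDepth - 1);
    quicksort(a, i, to, maxDepth - 1);
  }

  private static void heapSort(int[] a, int from, int to) {
    int n = to - from;
    for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, from, i, n);
    for (int end = n - 1; end > 0; end--) {
      swap(a, from, from + end); // move the current maximum to its final slot
      siftDown(a, from, 0, end);
    }
  }

  private static void siftDown(int[] a, int base, int root, int size) {
    while (true) {
      int child = 2 * root + 1;
      if (child >= size) return;
      if (child + 1 < size && a[base + child + 1] > a[base + child]) child++;
      if (a[base + root] >= a[base + child]) return;
      swap(a, base + root, base + child);
      root = child;
    }
  }

  private static void swap(int[] a, int i, int j) {
    int tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }

  public static void main(String[] args) {
    int[] data = {5, 3, 9, 1, 5, 8, 2};
    sort(data);
    System.out.println(Arrays.toString(data));
  }
}
```
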
5101Bug Fixes
5102
5103* LUCENE-7127: Fix corner case bugs in GeoPointDistanceQuery. (Robert Muir)
5104
5105* LUCENE-7166: Fix corner case bugs in LatLonPoint/GeoPointField bounding box
5106  queries. (Robert Muir)
5107
5108* LUCENE-7168: Switch to stable encode for geo3d, remove quantization
5109  test leniency, remove dead code (Mike McCandless)
5110
5111* LUCENE-7301: Multiple doc values updates to the same document within
5112  one update batch could be applied in the wrong order resulting in
5113  the wrong updated value (Ishan Chattopadhyaya, hossman, Mike McCandless)
5114
5115* LUCENE-7312: Fix geo3d's x/y/z double to int encoding to ensure it always
5116  rounds down (Karl Wright, Mike McCandless)
5117
5118* LUCENE-7132: BooleanQuery sometimes assigned too-low scores in cases
5119  where ranges of documents had only a single clause matching while
5120  other ranges had more than one clause matching (Ahmet Arslan,
5121  hossman, Mike McCandless)
5122
5123* LUCENE-7286: Added support for highlighting SynonymQuery. (Adrien Grand)
5124
5125* LUCENE-7291: Spatial heatmap faceting could mis-count when the heatmap crosses the
5126  dateline and indexed non-point shapes are much bigger than the heatmap region.
5127  (David Smiley)
5128
5129* LUCENE-7333: Fix test bug where randomSimpleString() generated a filename
5130  that is a reserved device name on Windows.  (Uwe Schindler, Mike McCandless)
5131
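LUCENE-7312 above requires the double to int encoding to always round down.
The sketch below shows one common way such an encoding can fail to do so: a
plain (int) cast truncates toward zero, so negative values are rounded up.
It is not the geo3d encoding itself; the class RoundDownEncodingSketch and
the step parameter are hypothetical.

```java
public class RoundDownEncodingSketch {
  /** Truncates toward zero: rounds negative quotients up. */
  public static int encodeTruncating(double value, double step) {
    return (int) (value / step);
  }

  /** Rounds toward negative infinity for both positive and negative values. */
  public static int encodeFloor(double value, double step) {
    return (int) Math.floor(value / step);
  }

  public static void main(String[] args) {
    double step = 0.5; // hypothetical quantization step
    System.out.println(encodeTruncating(-0.75, step)); // -1.5 is truncated to -1
    System.out.println(encodeFloor(-0.75, step));      // -1.5 is floored to -2
  }
}
```
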
5132Other
5133
5134* LUCENE-7295: TermAutomatonQuery.hashCode calculates Automaton.toDot().hash,
5135  equivalence relationship replaced with object identity. (Dawid Weiss)
5136
5137* LUCENE-7277: Make Query.hashCode and Query.equals abstract. (Paul Elschot,
5138  Dawid Weiss)
5139
5140* LUCENE-7174: Upgrade randomizedtesting to 2.3.4. (Uwe Schindler, Dawid Weiss)
5141
5142* LUCENE-7205: Remove repeated nl.getLength() calls in
5143  (Boolean|DisjunctionMax|FuzzyLikeThis)QueryBuilder. (Christine Poerschke)
5144
5145* LUCENE-7210: Make TestCore*Parser's analyzer choice override-able
5146  (Christine Poerschke, Daniel Collins)
5147
5148* LUCENE-7263: Make queryparser/xml/CoreParser's SpanQueryBuilderFactory
5149  accessible to deriving classes. (Daniel Collins via Christine Poerschke)
5150
5151* SOLR-9109/SOLR-9121: Allow specification of a custom Ivy settings file via system
5152  property "ivysettings.xml". (Misha Dmitriev, Christine Poerschke, Uwe Schindler, Steve Rowe)
5153
5154* LUCENE-7206: Improve the ToParentBlockJoinQuery's explain by including the explain
5155  of the best matching child doc. (Ilya Kasnacheev, Jeff Evans via Martijn van Groningen)
5156
5157* LUCENE-7307: Add getters to the PointInSetQuery and PointRangeQuery queries.
5158  (Martijn van Groningen, Adrien Grand)
5159
5160Build
5161
5162* LUCENE-7292: Use '-release' instead of '-source/-target' during
5163  compilation on Java 9+ to ensure real cross-compilation.
5164  (Uwe Schindler)
5165
5166* LUCENE-7296: Update forbiddenapis to version 2.1.
5167  (Uwe Schindler)
5168
5169======================= Lucene 6.0.1 =======================
5170
5171New Features
5172
5173* LUCENE-7278: Spatial-extras DateRangePrefixTree's Calendar is now configurable, to
5174  e.g. clear the Gregorian Change Date.  Also, toString(cal) is now identical to
5175  DateTimeFormatter.ISO_INSTANT. (David Smiley)
5176
5177Bug Fixes
5178
5179* LUCENE-7187: Block join queries' Weight#extractTerms(...) implementations
5180  should delegate to the wrapped weight. (Martijn van Groningen)
5181
5182* LUCENE-7209: Fixed explanations of FunctionScoreQuery. (Adrien Grand)
5183
5184* LUCENE-7232: Fixed InetAddressPoint.newPrefixQuery, which was generating an
5185  incorrect query when the prefix length was not a multiple of 8. (Adrien Grand)
5186
5187* LUCENE-7279: JapaneseTokenizer throws ArrayIndexOutOfBoundsException
5188  on some valid inputs (Mike McCandless)
5189
5190* LUCENE-7188: remove incorrect sanity check in NRTCachingDirectory.listAll()
5191  that led to IllegalStateException being thrown when nothing was wrong.
5192  (David Smiley, yonik)
5193
5194* LUCENE-7219: Make queryparser/xml (Point|LegacyNumeric)RangeQuery builders
5195  match the underlying queries' (lower|upper)Term optionality logic.
5196  (Kaneshanathan Srivisagan, Christine Poerschke)
5197
5198* LUCENE-7257: Fixed PointValues#size(IndexReader, String), docCount,
5199  minPackedValue and maxPackedValue to skip leaves that do not have points
5200  rather than raising an IllegalStateException. (Adrien Grand)
5201
5202* LUCENE-7284: GapSpans needs to implement positionsCost(). (Daniel Bigham, Alan
5203  Woodward)
5204
5205* LUCENE-7231: WeightedSpanTermExtractor didn't deal correctly with single-term
5206  phrase queries. (Eva Popenda, Alan Woodward)
5207
5208* LUCENE-7293: Don't try to highlight GeoPoint queries (Britta Weber,
5209  Nick Knize, Mike McCandless, Uwe Schindler)
5210
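For LUCENE-7232 above, the case where the prefix length is not a multiple of
8 is the one where the boundary falls inside a byte. The sketch below computes
the lowest and highest address sharing a given bit prefix by clearing or
setting every bit after the prefix. It is an illustration under assumptions,
not the InetAddressPoint implementation; the class PrefixRangeSketch and the
method prefixBounds are hypothetical names.

```java
import java.net.InetAddress;

public class PrefixRangeSketch {
  /** Returns {lower, upper}: the smallest and largest addresses sharing the first prefixLength bits. */
  public static byte[][] prefixBounds(byte[] address, int prefixLength) {
    if (prefixLength < 0 || prefixLength > 8 * address.length) {
      throw new IllegalArgumentException("illegal prefixLength: " + prefixLength);
    }
    byte[] lower = address.clone();
    byte[] upper = address.clone();
    // Work bit by bit so that a prefix ending mid-byte is handled correctly.
    for (int bit = prefixLength; bit < 8 * address.length; bit++) {
      int mask = 1 << (7 - (bit & 7));
      lower[bit >> 3] &= ~mask; // clear the bit in the lower bound
      upper[bit >> 3] |= mask;  // set the bit in the upper bound
    }
    return new byte[][] {lower, upper};
  }

  public static void main(String[] args) throws Exception {
    byte[] address = InetAddress.getByName("192.168.1.77").getAddress();
    byte[][] bounds = prefixBounds(address, 26);
    System.out.println(InetAddress.getByAddress(bounds[0]).getHostAddress());
    System.out.println(InetAddress.getByAddress(bounds[1]).getHostAddress());
  }
}
```
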
5211Documentation
5212
5213* LUCENE-7223: Improve XXXPoint javadocs to make it clear that you
5214  should separately add StoredField if you want to retrieve these
5215  field values at search time (Greg Huber, Robert Muir, Mike McCandless)
5216
5217======================= Lucene 6.0.0 =======================
5218
5219System Requirements
5220
5221* LUCENE-5950: Move to Java 8 as minimum Java version.
5222  (Ryan Ernst, Uwe Schindler)
5223
5224* LUCENE-6069: Lucene Core now gets compiled with Java 8 "compact1" profile,
5225  all other modules with "compact2".  (Robert Muir, Uwe Schindler)
5226
5227New Features
5228
5229* LUCENE-6631: Lucene Document classification (Tommaso Teofili, Alessandro Benedetti)
5230
5231* LUCENE-6747: FingerprintFilter is a TokenFilter that outputs a single
5232  token which is a concatenation of the sorted and de-duplicated set of
5233  input tokens. Useful for normalizing short text in clustering/linking
5234  tasks. (Mark Harwood, Adrien Grand)
5235
5236* LUCENE-5735: NumberRangePrefixTreeStrategy now includes interval/range faceting
5237  for counting ranges that align with the underlying terms as defined by the
5238  NumberRangePrefixTree (e.g. familiar date units like days).  (David Smiley)
5239
5240* LUCENE-6711: Use CollectionStatistics.docCount() for IDF and average field
5241  length computations, to avoid skew from documents that don't have the field.
5242  (Ahmet Arslan via Robert Muir)
5243
5244* LUCENE-6758: Use docCount+1 for DefaultSimilarity's IDF, so that queries
5245  containing nonexistent fields won't screw up querynorm. (Terry Smith, Robert Muir)
5246
5247* SOLR-7876: The QueryTimeout interface now has an isTimeoutEnabled method
5248  that can return false to exit from ExitableDirectoryReader wrapping at
5249  the point fields() is called. (yonik)
5250
5251* LUCENE-6825: Add low-level support for block-KD trees (Mike McCandless)
5252
5253* LUCENE-6852, LUCENE-6975: Add support for points (dimensionally
5254  indexed values) to index, document and codec APIs, including a
5255  simple text implementation.  (Mike McCandless)
5256
5257* LUCENE-6861: Create Lucene60Codec, supporting points.
5258  (Mike McCandless)
5259
5260* LUCENE-6879: Allow defining custom CharTokenizer instances without
5261  subclassing using Java 8 lambdas or method references. (Uwe Schindler)
5262
5263* LUCENE-6881: Cutover all BKD implementations to points
5264  (Mike McCandless)
5265
5266* LUCENE-6837: Add N-best output support to JapaneseTokenizer.
5267  (Hiroharu Konno via Christian Moen)
5268
5269* LUCENE-6962: Add per-dimension min/max to points
5270  (Mike McCandless)
5271
5272* LUCENE-6975: Add ExactPointQuery, to match a single N-dimensional
5273  point (Robert Muir, Mike McCandless)
5274
5275* LUCENE-6989: Add preliminary support for MMapDirectory unmapping in Java 9.
5276  (Uwe Schindler, Chris Hegarty, Peter Levart)
5277
5278* LUCENE-7040: Upgrade morfologik-stemming to version 2.1.0.
5279  (Dawid Weiss)
5280
5281* LUCENE-7048: Add XXXPoint.newSetQuery, to create a query that
5282  efficiently matches all documents containing any of the specified
5283  point values.  This is the analog of TermsQuery, but for points
5284  instead.   (Adrien Grand, Robert Muir, Mike McCandless)
5285
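The fingerprint described in LUCENE-6747 above (a single token built from the
sorted, de-duplicated input tokens) can be sketched with plain collections.
This is not the FingerprintFilter code: the class FingerprintSketch, the
separator parameter and the use of String's natural ordering are assumptions
of the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class FingerprintSketch {
  /** Joins the sorted, de-duplicated tokens into one string. */
  public static String fingerprint(List<String> tokens, char separator) {
    // TreeSet both removes duplicates and keeps the tokens in sorted order.
    TreeSet<String> unique = new TreeSet<>(tokens);
    return String.join(String.valueOf(separator), unique);
  }

  public static void main(String[] args) {
    List<String> tokens = Arrays.asList("the", "quick", "the", "brown");
    System.out.println(fingerprint(tokens, ' '));
  }
}
```

Two inputs that contain the same words in a different order, or with
repetitions, map to the same fingerprint, which is what makes it useful for
the clustering and linking tasks the entry mentions.
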
5286API Changes
5287
5288* LUCENE-7094: BBoxStrategy and PointVectorStrategy now support
5289  PointValues (in addition to legacy numeric trie).  Their APIs
5290  were changed a little and also made more consistent.  PointValues/Trie
5291  is optional, DocValues is optional, stored value is optional.
5292  (Nick Knize, David Smiley)
5293
5294* LUCENE-6067: Accountable.getChildResources has a default
5295  implementation returning the empty list.  (Robert Muir)
5296
5297* LUCENE-6583: FilteredQuery has been removed. Instead, you can construct a
5298  BooleanQuery with one MUST clause for the query, and one FILTER clause for
5299  the filter. (Adrien Grand)
5300
5301* LUCENE-6651: AttributeImpl#reflectWith(AttributeReflector) was made
5302  abstract and has no reflection-based default implementation anymore.
5303  (Uwe Schindler)
5304
5305* LUCENE-6706: PayloadTermQuery and PayloadNearQuery have been removed.
5306  Instead, use PayloadScoreQuery to wrap any SpanQuery. (Alan Woodward)
5307
5308* LUCENE-6829: OfflineSorter, and the classes that use it (suggesters,
5309  hunspell) now do all temporary file IO via Directory instead of
5310  directly through java's temp dir.  Directory.createTempOutput
5311  creates a uniquely named IndexOutput, and the new
5312  IndexOutput.getName returns its name (Dawid Weiss, Robert Muir, Mike
5313  McCandless)
5314
5315* LUCENE-6917: Deprecate and rename NumericXXX classes to
5316  LegacyNumericXXX in favor of points (Mike McCandless)
5317
5318* LUCENE-6947: SortField.missingValue is now protected. You can read its
5319  value using the new SortField.getMissingValue getter. (Adrien Grand)
5320
5321* LUCENE-7028: Remove duplicate method in LegacyNumericUtils.
5322  (Uwe Schindler)
5323
5324* LUCENE-7052, LUCENE-7053: Remove custom comparators from BytesRef
5325  class and solely use natural byte[] comparator throughout codebase.
5326  This also simplifies API of BytesRefHash. It also replaces the natural
5327  comparator in ArrayUtil by Java 8's Comparator#naturalOrder().
5328  (Mike McCandless, Uwe Schindler, Robert Muir)
5329
5330* LUCENE-7060: Update Spatial4j to 0.6.  The package com.spatial4j.core
5331  is now org.locationtech.spatial4j. (David Smiley)
5332
5333* LUCENE-7058: Add getters to various Query implementations (Guillaume Smet via
5334  Alan Woodward)
5335
5336* LUCENE-7064: MultiPhraseQuery is now immutable and should be constructed
5337  with MultiPhraseQuery.Builder. (Luc Vanlerberghe via Adrien Grand)
5338
5339* LUCENE-7072: Geo3DPoint always uses WGS84 planet model.
5340  (Robert Muir, Mike McCandless)
5341
5342* LUCENE-7056: Geo3D classes are in different packages now. (David Smiley)
5343
5344* LUCENE-6952: These classes are now abstract: FilterCodecReader, FilterLeafReader,
5345  FilterCollector, FilterDirectory.  And some Filter* classes in
5346  lucene-test-framework too. (David Smiley)
5347
5348* SOLR-8867: FunctionValues.getRangeScorer now takes a LeafReaderContext instead
5349  of an IndexReader, and avoids matching documents without a value in the field
5350  for numeric fields. (yonik)
5351
5352Optimizations
5353
5354* LUCENE-6891: Use prefix coding when writing points in
5355  each leaf block in the default codec, to reduce the index
5356  size (Mike McCandless)
5357
5358* LUCENE-6901: Optimize points indexing: use faster
5359  IntroSorter instead of InPlaceMergeSorter, and specialize 1D
5360  merging to merge sort the already sorted segments instead of
5361  re-indexing (Mike McCandless)
5362
5363* LUCENE-6793: LegacyNumericRangeQuery.hashCode() is now less subject to hash
5364  collisions. (J.B. Langston via Adrien Grand)
5365
5366* LUCENE-7050: TermsQuery is now cached more aggressively by the default
5367  query caching policy. (Adrien Grand)
5368
5369* LUCENE-7066: PointRangeQuery got optimized for the case that all documents
5370  have a value and all points from the segment match. (Adrien Grand)
5371
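Prefix coding, as mentioned in LUCENE-6891 above, depends on finding the
leading bytes that all values in a block share, so that they need to be
written only once. The sketch below computes that shared length for a block
of byte[] values. It illustrates the idea only and is not the codec's
implementation; the class PrefixCodingSketch is a hypothetical name.

```java
public class PrefixCodingSketch {
  /** Returns the number of leading bytes shared by every value in the block. */
  public static int commonPrefixLength(byte[][] values) {
    if (values.length == 0) return 0;
    int prefix = values[0].length;
    for (int i = 1; i < values.length && prefix > 0; i++) {
      int j = 0;
      // Compare against the first value, never looking past the current prefix.
      while (j < prefix && j < values[i].length && values[i][j] == values[0][j]) {
        j++;
      }
      prefix = j;
    }
    return prefix;
  }

  public static void main(String[] args) {
    byte[][] block = {{1, 2, 3, 4}, {1, 2, 9, 9}, {1, 2, 3, 5}};
    System.out.println(commonPrefixLength(block));
  }
}
```
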
5372Changes in Runtime Behavior
5373
5374* LUCENE-6789: IndexSearcher's default Similarity is changed to BM25Similarity.
5375  Use ClassicSimilarity to get the old vector space DefaultSimilarity. (Robert Muir)
5376
5377* LUCENE-6886: Reserve the .tmp file name extension for temp files,
5378  and codec components are no longer allowed to use this extension
5379  (Robert Muir, Mike McCandless)
5380
5381* LUCENE-6835: Directory.listAll now returns entries in sorted order,
5382  to not leak platform-specific behavior, and "retrying file deletion"
5383  is now the responsibility of Directory.deleteFile, not the caller.
5384  (Robert Muir, Mike McCandless)
5385
5386Tests
5387
5388* LUCENE-7009: Add expectThrows utility to LuceneTestCase. This uses a lambda
5389  expression to encapsulate a statement that is expected to throw an exception.
5390  (Ryan Ernst)
5391
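A utility of the kind described in LUCENE-7009 above can be sketched as
follows. The signature is modeled on the entry's description (a lambda
wrapping a statement that is expected to throw) and is an assumption, not a
copy of LuceneTestCase; the class ExpectThrowsSketch and the interface
ThrowingRunnable are hypothetical names.

```java
public class ExpectThrowsSketch {
  /** A statement that may throw anything, so that lambdas need no try/catch. */
  @FunctionalInterface
  public interface ThrowingRunnable {
    void run() throws Throwable;
  }

  /** Runs the statement and returns the exception if it has the expected type. */
  public static <T extends Throwable> T expectThrows(
      Class<T> expectedType, ThrowingRunnable runnable) {
    try {
      runnable.run();
    } catch (Throwable t) {
      if (expectedType.isInstance(t)) {
        return expectedType.cast(t);
      }
      throw new AssertionError(
          "Expected " + expectedType.getSimpleName() + " but got " + t, t);
    }
    throw new AssertionError(
        "Expected " + expectedType.getSimpleName() + " but nothing was thrown");
  }

  public static void main(String[] args) {
    NumberFormatException e =
        expectThrows(NumberFormatException.class, () -> Integer.parseInt("abc"));
    System.out.println(e.getMessage());
  }
}
```

Returning the caught exception lets the test go on to check its message or
cause, which a bare try/fail/catch block makes more verbose.
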
5392Bug Fixes
5393
5394* LUCENE-7065: Fix the explain for the global ordinals join query. Previously, the
5395  explain would also indicate that non-matching documents matched.
5396  In addition, with score mode average, the explain would fail with an NPE.
5397  (Martijn van Groningen)
5398
5399* LUCENE-7101: OfflineSorter had O(N^2) merge cost, and used too many
5400  temporary file descriptors, for large sorts (Mike McCandless)
5401
5402* LUCENE-7111: DocValuesRangeQuery.newLongRange behaves incorrectly for
5403  Long.MAX_VALUE and Long.MIN_VALUE (Ishan Chattopadhyaya via Steve Rowe)
5404
5405* LUCENE-7139: Fix bugs in geo3d's Vincenty surface distance
5406  implementation (Karl Wright via Mike McCandless)
5407
5408* LUCENE-7112: WeightedSpanTermExtractor.extractUnknownQuery is only called
5409  on queries that could not be extracted. (Adrien Grand)
5410
5411* LUCENE-7126: Remove GeoPointDistanceRangeQuery. This query was implemented
5412  with boolean NOT, and incorrect for multi-valued documents. (Robert Muir)
5413
5414* LUCENE-7158: Consistently use earth's WGS84 mean radius wherever our
5415  geo search implementations approximate the earth as a sphere (Karl
5416  Wright via Mike McCandless)
5417
5418Other
5419
5420* LUCENE-7035: Upgrade icu4j to 56.1/unicode 8. (Robert Muir)
5421
5422* LUCENE-7087: Let MemoryIndex#fromDocument(...) accept 'Iterable<? extends IndexableField>'
5423  as document instead of 'Document'. (Martijn van Groningen)
5424
5425* LUCENE-7091: Add doc values support to MemoryIndex
5426  (Martijn van Groningen, David Smiley)
5427
5428* LUCENE-7093: Add point values support to MemoryIndex
5429  (Martijn van Groningen, Mike McCandless)
5430
5431* LUCENE-7095: Add point values support to the numeric field query time join.
5432  (Martijn van Groningen, Mike McCandless)
5433
5434======================= Lucene 5.5.5 =======================
5435
5436Changes in Runtime Behavior
5437
5438* Resolving of external entities in queryparser/xml/CoreParser is disallowed
5439  by default. See SOLR-11477 for details.
5440
5441Bug Fixes
5442
5443* LUCENE-7419: Fix performance bug with TokenStream.end(), where it would look up
5444  PositionIncrementAttribute every time. (Mike McCandless, Robert Muir)
5445
5446* SOLR-11477: Disallow resolving of external entities in queryparser/xml/CoreParser
5447  by default. (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
5448
5449======================= Lucene 5.5.4 =======================
5450
5451Bug Fixes
5452
5453* LUCENE-7417: The standard Highlighter could throw an IllegalArgumentException when
5454  trying to highlight a query containing a degenerate case of a MultiPhraseQuery with one
5455  term.  (Thomas Kappler via David Smiley)
5456
5457* LUCENE-7657: Fixed potential memory leak in the case that a (Span)TermQuery
5458  with a TermContext is cached. (Adrien Grand)
5459
5460* LUCENE-7647: Made stored fields reclaim native memory more aggressively when
5461  configured with BEST_COMPRESSION. This could otherwise result in out-of-memory
5462  issues. (Adrien Grand)
5463
5464* LUCENE-7562: CompletionFieldsConsumer sometimes throws
5465  NullPointerException on ghost fields (Oliver Eilhard via Mike McCandless)
5466
5467* LUCENE-7547: JapaneseTokenizerFactory was failing to close the
5468  dictionary file it opened (Markus via Mike McCandless)
5469
5470* LUCENE-6914: Fixed DecimalDigitFilter in case of supplementary code points.
5471  (Hossman)
5472
5473* LUCENE-7440: Document id skipping (PostingsEnum.advance) could throw an
5474  ArrayIndexOutOfBoundsException on large index segments (>1.8B docs)
5475  with large skips. (yonik)
5476
5477* LUCENE-7570: IndexWriter may deadlock if a commit is running while
5478  there are too many merges running and one of the merges hits a
5479  tragic exception (Joey Echeverria via Mike McCandless)
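
  For LUCENE-6914 above, the sketch below shows why digit folding has to walk
  the text by code point rather than by char: some decimal digits live outside
  the Basic Multilingual Plane and occupy two chars. This is not the
  DecimalDigitFilter implementation; the class DigitFolding and its method are
  hypothetical and rely only on java.lang.Character.

```java
// Hypothetical illustration; not Lucene's DecimalDigitFilter.
class DigitFolding {

  /** Replaces every non-ASCII Unicode decimal digit with its ASCII equivalent. */
  static String foldDigits(String input) {
    StringBuilder out = new StringBuilder(input.length());
    for (int i = 0; i < input.length(); ) {
      int cp = input.codePointAt(i);   // reads a full code point, even a surrogate pair
      i += Character.charCount(cp);    // advances by 1 or 2 chars accordingly
      if (cp > 0x7F && Character.isDigit(cp)) {
        out.append((char) ('0' + Character.digit(cp, 10)));
      } else {
        out.appendCodePoint(cp);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // U+1D7CF MATHEMATICAL BOLD DIGIT ONE is a supplementary code point.
    String boldOne = new String(Character.toChars(0x1D7CF));
    // U+0663 is ARABIC-INDIC DIGIT THREE.
    System.out.println(foldDigits("a" + boldOne + "\u0663"));
  }
}
```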
5480
5481Other
5482
5483* LUCENE-6989: Backport MMapDirectory's unmapping code from Lucene 6.4 to use
5484  MethodHandles. This allows it to work with Java 9 (EA build 150 and later).
5485  (Uwe Schindler)
5486
5487Build
5488
5489* LUCENE-7543: Make changes-to-html target an offline operation, by moving the
5490  Lucene and Solr DOAP RDF files into the Git source repository under
5491  dev-tools/doap/ and then pulling release dates from those files, rather than
5492  from JIRA. (Mano Kovacs, hossman, Steve Rowe)
5493
5494* LUCENE-7596: Update Groovy to version 2.4.8 to allow building with Java 9
5495  build 148+. Also update JGit version for working-copy checks. This does not
5496  fix all issues with Java 9, but allows building the distribution.
5497  (Uwe Schindler)
5498
5499* LUCENE-7651: Backport (Lucene 6.4.1) fix for Java 8u121 to allow documentation
5500  build to inject "Google Code Prettify" without adding Javascript to Javadoc's
5501  -bottom parameter. Unfortunately, this fix disables Prettify if Javadocs are
5502  built with Java 7, as there is no generic way in Java 7 to inject Javascript
5503  without breaking Java 8 (and possible paid Java 7 security updates). This
5504  fix also updates Prettify to latest version to work around a Google Chrome
5505  issue. (Uwe Schindler)
5506
5507======================= Lucene 5.5.3 =======================
5508(No Changes)
5509
5510======================= Lucene 5.5.2 =======================
5511
5512Bug Fixes
5513
5514* LUCENE-7065: Fix the explain for the global ordinals join query. Previously, the
5515  explain would also indicate that non-matching documents would match.
5516  On top of that, with score mode average, the explain would fail with an NPE.
5517  (Martijn van Groningen)
5518
5519* LUCENE-7111: DocValuesRangeQuery.newLongRange behaves incorrectly for
5520  Long.MAX_VALUE and Long.MIN_VALUE (Ishan Chattopadhyaya via Steve Rowe)
5521
5522* LUCENE-7139: Fix bugs in geo3d's Vincenty surface distance
5523  implementation (Karl Wright via Mike McCandless)
5524
5525* LUCENE-7187: Block join queries' Weight#extractTerms(...) implementations
5526  should delegate to the wrapped weight. (Martijn van Groningen)
5527
5528* LUCENE-7279: JapaneseTokenizer throws ArrayIndexOutOfBoundsException
5529  on some valid inputs (Mike McCandless)
5530
5531* LUCENE-7219: Make queryparser/xml (Point|LegacyNumeric)RangeQuery builders
5532  match the underlying queries' (lower|upper)Term optionality logic.
5533  (Kaneshanathan Srivisagan, Christine Poerschke)
5534
5535* LUCENE-7284: GapSpans needs to implement positionsCost(). (Daniel Bigham, Alan
5536  Woodward)
5537
5538* LUCENE-7231: WeightedSpanTermExtractor didn't deal correctly with single-term
5539  phrase queries. (Eva Popenda, Alan Woodward)
5540
5541* LUCENE-7301: Multiple doc values updates to the same document within
5542  one update batch could be applied in the wrong order resulting in
5543  the wrong updated value (Ishan Chattopadhyaya, hossman, Mike McCandless)
5544
5545* LUCENE-7132: BooleanQuery sometimes assigned too-low scores in cases
5546  where ranges of documents had only a single clause matching while
5547  other ranges had more than one clause matching (Ahmet Arslan,
5548  hossman, Mike McCandless)
5549
5550* LUCENE-7291: Spatial heatmap faceting could mis-count when the heatmap crosses the
5551  dateline and indexed non-point shapes are much bigger than the heatmap region.
5552  (David Smiley)
5553
5554======================= Lucene 5.5.1 =======================
5555
5556Bug Fixes
5557
5558* LUCENE-7112: WeightedSpanTermExtractor.extractUnknownQuery is only called
5559  on queries that could not be extracted. (Adrien Grand)
5560
5561* LUCENE-7188: remove incorrect sanity check in NRTCachingDirectory.listAll()
5562  that led to IllegalStateException being thrown when nothing was wrong.
5563  (David Smiley, yonik)
5564
5565* LUCENE-7209: Fixed explanations of FunctionScoreQuery. (Adrien Grand)
5566
5567======================= Lucene 5.5.0 =======================
5568
5569New Features
5570
5571* LUCENE-5868: JoinUtil.createJoinQuery(..,NumericType,..) query-time join
5572  for LONG and INT fields with NUMERIC and SORTED_NUMERIC doc values.
5573  (Alexey Zelin via Mikhail Khludnev)
5574
5575* LUCENE-6939: Add exponential reciprocal scoring to
5576  BlendedInfixSuggester, to even more strongly favor suggestions that
5577  match closer to the beginning (Arcadius Ahouansou via Mike McCandless)
5578
5579* LUCENE-6958: Improved CustomAnalyzer to take class references to factories
5580  as alternative to their SPI name. This enables compile-time safety when
5581  defining analyzer's components.  (Uwe Schindler, Shai Erera)
5582
5583* LUCENE-6818, LUCENE-6986: Add DFISimilarity implementing the divergence
5584  from independence model. (Ahmet Arslan via Robert Muir)
5585
5586* SOLR-4619: Added removeAllAttributes() to AttributeSource, which removes
5587  all previously added attributes.
5588
5589* LUCENE-7010: Added MergePolicyWrapper to allow easy wrapping of other policies.
5590  (Shai Erera)
5591
5592API Changes
5593
5594* LUCENE-6997: refactor sandboxed GeoPointField and query classes to lucene-spatial
5595  module under new lucene.spatial.geopoint package (Nick Knize)
5596
5597* LUCENE-6908: GeoUtils static relational methods have been refactored to new
5598  GeoRelationUtils and now correctly handle large irregular rectangles, and
5599  pole crossing distance queries. (Nick Knize)
5600
5601* LUCENE-6900: Grouping sortWithinGroup variables used to allow null to mean
5602  Sort.RELEVANCE.  Null is no longer permitted.  (David Smiley)
5603
5604* LUCENE-6919: The Scorer class has been refactored to expose an iterator
5605  instead of extending DocIdSetIterator. asTwoPhaseIterator() has been renamed
5606  to twoPhaseIterator() for consistency. (Adrien Grand)
5607
5608* LUCENE-6973: TeeSinkTokenFilter no longer accepts a SinkFilter (the latter
5609  has been removed). If you wish to filter the sinks, you can wrap them with
5610  any other TokenFilter (e.g. a FilteringTokenFilter). Also, you can no longer
5611  add a SinkTokenStream to an existing TeeSinkTokenFilter. If you need to
5612  share multiple streams with a single sink, chain them with multiple
5613  TeeSinkTokenFilters.
5614  DateRecognizerSinkFilter was renamed to DateRecognizerFilter and moved under
5615  analysis/common. TokenTypeSinkFilter was removed (use TypeTokenFilter instead).
5616  TokenRangeSinkFilter was removed. (Shai Erera, Uwe Schindler)
5617
5618* LUCENE-6980: Default applyAllDeletes to true when opening
5619  near-real-time readers (Mike McCandless)
5620
5621* LUCENE-6981: SpanQuery.getTermContexts() helper methods are now public, and
5622  SpanScorer has a public getSpans() method. (Alan Woodward)
5623
5624* LUCENE-6932: IndexInput.seek implementations now throw EOFException
5625  if you seek beyond the end of the file (Adrien Grand, Mike McCandless)
5626
5627* LUCENE-6988: IndexableField.tokenStream() no longer throws IOException
5628  (Alan Woodward)
5629
5630* LUCENE-7028: Deprecate a duplicate method in NumericUtils.
5631  (Uwe Schindler)
5632
5633Optimizations
5634
5635* LUCENE-6930: Decouple GeoPointField from NumericType by using a custom
5636  and efficient GeoPointTokenStream and TermEnum designed for GeoPoint prefix
5637  terms. (Nick Knize)
5638
5639* LUCENE-6951: Improve GeoPointInPolygonQuery using point orientation based
5640  line crossing algorithm, and adding result for multi-value docs when at least
5641  1 point satisfies polygon criteria. (Nick Knize)
5642
5643* LUCENE-6889: BooleanQuery.rewrite now performs some query optimization, in
5644  particular to rewrite queries that look like: "+*:* #filter" to a
5645  "ConstantScore(filter)". (Adrien Grand)
5646
5647* LUCENE-6912: Grouping's Collectors now calculate a response to needsScores()
5648  instead of always 'true'. (David Smiley)
5649
5650* LUCENE-6815: DisjunctionScorer now advances two-phased iterators lazily,
5651  stopping to evaluate them as soon as a single one matches. The other iterators
5652  will be confirmed lazily when computing score() or freq(). (Adrien Grand)
5653
5654* LUCENE-6926: MUST_NOT clauses now use the match cost API to run the slow bits
5655  last whenever possible. (Adrien Grand)
5656
5657* LUCENE-6944: BooleanWeight no longer creates sub-scorers if BS1 is not
5658  applicable. (Adrien Grand)
5659
5660* LUCENE-6940: MUST_NOT clauses execute faster, especially when they are sparse.
5661  (Adrien Grand)
5662
5663* LUCENE-6470: Improve efficiency of TermsQuery constructors. (Robert Muir)
5664
5665Bug Fixes
5666
5667* LUCENE-6976: BytesRefTermAttributeImpl.copyTo NPE'ed if BytesRef was null.
5668  Added equals & hashCode, and a new test for these things. (David Smiley)
5669
5670* LUCENE-6932: RAMDirectory's IndexInput was failing to throw
5671  EOFException in some cases (Stéphane Campinas, Adrien Grand via Mike
5672  McCandless)
5673
5674* LUCENE-6896: Don't treat the smallest possible norm value as an infinitely
5675  long document in SimilarityBase or BM25Similarity. Add more warnings to sims
5676  that will not work well with extreme tf values. (Ahmet Arslan, Robert Muir)
5677
5678* LUCENE-6984: SpanMultiTermQueryWrapper no longer modifies its wrapped query.
5679  (Alan Woodward, Adrien Grand)
5680
5681* LUCENE-6998: Fix a couple places to better detect truncated index files
5682  as corruption.  (Robert Muir, Mike McCandless)
5683
5684* LUCENE-7002: Fixed MultiCollector to not throw an NPE if setScorer is called
5685  after one of the sub collectors is done collecting. (John Wang, Adrien Grand)
5686
5687* LUCENE-7027: Fixed NumericTermAttribute to not throw IllegalArgumentException
5688  after NumericTokenStream was exhausted.  (Uwe Schindler, Lee Hinman,
5689  Mike McCandless)
5690
5691* LUCENE-7018: Fix GeoPointTermQueryConstantScoreWrapper to add document on
5692  first GeoPointField match. (Nick Knize)
5693
5694* LUCENE-7019: Add two-phase iteration to GeoPointTermQueryConstantScoreWrapper.
5695  (Robert Muir via Nick Knize)
5696
5697* LUCENE-6989: Improve MMapDirectory's unmapping checks to catch more non-working
5698  cases. The unmap-hack does not yet work with recent Java 9. Official support
5699  will come with Lucene 6.  (Uwe Schindler)
5700
5701Other
5702
5703* LUCENE-6924: Upgrade randomizedtesting to 2.3.2. (Dawid Weiss)
5704
5705* LUCENE-6920: Improve custom function checks in expressions module
5706  to use MethodHandles and work without extra security privileges.
5707  (Uwe Schindler, Robert Muir)
5708
5709* LUCENE-6921: Fix SPIClassIterator#isParentClassLoader to not
5710  require extra permissions.  (Uwe Schindler)
5711
5712* LUCENE-6923: Fix RamUsageEstimator to access private fields inside
5713  AccessController block for computing size. (Robert Muir)
5714
5715* LUCENE-6907: make TestParser extendable, rename test/.../xml/
5716  NumericRangeQueryQuery.xml to NumericRangeQuery.xml
5717  (Christine Poerschke)
5718
5719* LUCENE-6925: add ForceMergePolicy class in test-framework
5720  (Christine Poerschke)
5721
5722* LUCENE-6945: factor out TestCorePlus(Queries|Extensions)Parser from
5723  TestParser, rename TestParser to TestCoreParser (Christine Poerschke)
5724
5725* LUCENE-6949: fix (potential) resource leak in SynonymFilterFactory
5726  (https://scan.coverity.com/projects/5620 CID 120656)
5727  (Christine Poerschke, Coverity Scan (via Rishabh Patel))
5728
5729* LUCENE-6961: Improve Exception handling in AnalysisFactories /
5730  AnalysisSPILoader: Don't wrap exceptions occurring in factory's
5731  ctor inside InvocationTargetException.  (Uwe Schindler)
5732
5733* LUCENE-6965: Expression's JavascriptCompiler now throws ParseException
5734  with bad function names or bad arity instead of IllegalArgumentException.
5735  (Tomás Fernández Löbbe, Uwe Schindler, Ryan Ernst)
5736
5737* LUCENE-6964: String-based signatures in JavascriptCompiler replaced
5738  with better compile-time-checked MethodType; generated class files
5739  are no longer marked as synthetic.  (Uwe Schindler)
5740
5741* LUCENE-6978: Refactor several code places that look up locales
5742  by string name to use BCP47 locale tag instead. LuceneTestCase
5743  now also prints locales on failing tests this way.
5744  Locale#forLanguageTag() and Locale#toString() were placed on the list
5745  of forbidden signatures.  (Uwe Schindler, Robert Muir)
5746
5747* LUCENE-6988: You can now add IndexableFields directly to a MemoryIndex,
5748  and create a MemoryIndex from a lucene Document.  (Alan Woodward)
5749
5750* LUCENE-7005: TieredMergePolicy tweaks (>= vs. >, @see get vs. set)
5751  (Christine Poerschke)
5752
5753* LUCENE-7006: increase BaseMergePolicyTestCase use (TestNoMergePolicy and
5754  TestSortingMergePolicy now extend it, TestUpgradeIndexMergePolicy added)
5755  (Christine Poerschke)
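
  As a sketch for LUCENE-6978 above: BCP47 tags such as "en-US" differ from the
  "en_US" form produced by Locale#toString(), and Locale#forLanguageTag()
  silently ignores ill-formed subtags. One plausible strict replacement pair in
  the JDK is Locale.Builder#setLanguageTag and Locale#toLanguageTag. The class
  LocaleTags is hypothetical, and whether Lucene uses exactly these calls is an
  assumption.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Lucene.
class LocaleTags {

  /** Parses a BCP47 tag strictly; throws IllformedLocaleException on bad input. */
  static Locale fromTag(String tag) {
    return new Locale.Builder().setLanguageTag(tag).build();
  }

  /** Formats a locale as a BCP47 tag. */
  static String toTag(Locale locale) {
    return locale.toLanguageTag();
  }

  public static void main(String[] args) {
    System.out.println(toTag(Locale.US));       // BCP47 form
    System.out.println(Locale.US.toString());   // legacy underscore form
    try {
      fromTag("not a tag");
    } catch (IllformedLocaleException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```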
5756
5757======================= Lucene 5.4.1 =======================
5758
5759Bug Fixes
5760
5761* LUCENE-6910: fix 'if ... > Integer.MAX_VALUE' check in
5762  (Binary|Numeric)DocValuesFieldUpdates.merge
5763  (https://scan.coverity.com/projects/5620 CID 119973 and CID 120081)
5764  (Christine Poerschke, Coverity Scan (via Rishabh Patel))
5765
5766* LUCENE-6946: SortField.equals now takes the missingValue parameter into
5767  account. (Adrien Grand)
5768
5769* LUCENE-6918: LRUQueryCache.onDocIdSetEviction is only called when at least
5770  one DocIdSet is being evicted. (Adrien Grand)
5771
5772* LUCENE-6929: Fix SpanNotQuery rewriting to not drop the pre/post parameters.
5773  (Tim Allison via Adrien Grand)
5774
5775* LUCENE-6950: Fix FieldInfos handling of UninvertingReader, e.g. do not
5776  hide the true docvalues update generation or other properties.
5777  (Ishan Chattopadhyaya via Robert Muir)
5778
5779* LUCENE-6948: Fix ArrayIndexOutOfBoundsException in PagedBytes$Reader.fill
5780  by removing an unnecessary long-to-int cast.
5781  (Michael Lawley via Christine Poerschke)
5782
5783* SOLR-7865: BlendedInfixSuggester was returning too many results
5784  (Arcadius Ahouansou via Mike McCandless)
5785
5786* LUCENE-6970: Fixed off-by-one error in Lucene54DocValuesProducer that could
5787  potentially corrupt doc values. (Adrien Grand)
5788
5789* LUCENE-2229: Fix Highlighter's SimpleSpanFragmenter when multiple adjacent
5790  stop words following a span can unduly make the fragment way too long.
5791  (Elmer Garduno, Lukhnos Liu via David Smiley)
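
  The class of bug named in LUCENE-6948 above can be reproduced with plain
  arithmetic. The sketch below is not the PagedBytes code: the class
  NarrowingCast, the constant BLOCK_BITS and both methods are hypothetical.
  They only show how casting a long offset to int before using it can yield a
  negative array index.

```java
// Hypothetical illustration; not the actual PagedBytes code.
class NarrowingCast {

  static final int BLOCK_BITS = 15; // assumed block size of 2^15 bytes

  /** Buggy variant: narrows to int first, so offsets >= 2^31 wrap to negative values. */
  static int blockIndexBuggy(long offset) {
    return ((int) offset) >> BLOCK_BITS;
  }

  /** Fixed variant: shifts in long arithmetic and narrows only the small result. */
  static int blockIndexFixed(long offset) {
    return (int) (offset >> BLOCK_BITS);
  }

  public static void main(String[] args) {
    long offset = 1L << 31; // first offset that no longer fits in an int
    System.out.println("buggy: " + blockIndexBuggy(offset));
    System.out.println("fixed: " + blockIndexFixed(offset));
  }
}
```

  Using the buggy result as an array index would raise
  ArrayIndexOutOfBoundsException, since it is negative for this offset.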
5792
5793======================= Lucene 5.4.0 =======================
5794
5795New Features
5796
5797* LUCENE-6875: New Serbian Filter. (Nikola Smolenski via Robert Muir,
5798  Dawid Weiss)
5799
5800* LUCENE-6720: New FunctionRangeQuery wrapper around ValueSourceScorer
5801  (returned from ValueSource/FunctionValues.getRangeScorer()). (David Smiley)
5802
5803* LUCENE-6724: Add utility APIs to GeoHashUtils to compute neighbor
5804  geohash cells (Nick Knize via Mike McCandless).
5805
5806* LUCENE-6737: Add DecimalDigitFilter which folds unicode digits to basic latin.
5807  (Robert Muir)
5808
5809* LUCENE-6699: Add integration of BKD tree and geo3d APIs to give
5810  fast, very accurate query to find all indexed points within an
5811  earth-surface shape (Karl Wright, Mike McCandless)
5812
5813* LUCENE-6838: Added IndexSearcher#getQueryCache and #getQueryCachingPolicy.
5814  (Adrien Grand)
5815
5816* LUCENE-6844: PayloadScoreQuery can include or exclude underlying span scores
5817  from its score calculations (Bill Bell, Alan Woodward)
5818
5819* LUCENE-6778: Add GeoPointDistanceRangeQuery, to search for points
5820  within a "ring" (beyond a minimum distance and below a maximum
5821  distance) (Nick Knize via Mike McCandless)
5822
5823* LUCENE-6874: Add a new UnicodeWhitespaceTokenizer to analysis/common
5824  that uses Unicode character properties extracted from ICU4J to tokenize
5825  text on whitespace. This tokenizer will split on non-breaking
5826  space (NBSP), too.  (David Smiley, Uwe Schindler, Steve Rowe)
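
  The NBSP remark in LUCENE-6874 matters because the JDK's
  Character.isWhitespace() deliberately excludes non-breaking spaces. The
  sketch below approximates Unicode-aware splitting with JDK methods only. It
  is not the UnicodeWhitespaceTokenizer, which per the entry uses character
  properties extracted from ICU4J; the class WhitespaceSplit is hypothetical
  and its notion of whitespace is an approximation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not Lucene's UnicodeWhitespaceTokenizer.
class WhitespaceSplit {

  /** Treats JDK whitespace plus Unicode space separators (including NBSP) as delimiters. */
  static boolean isUnicodeSpace(int cp) {
    return Character.isWhitespace(cp) || Character.isSpaceChar(cp);
  }

  /** Splits text into tokens on any code point accepted by isUnicodeSpace. */
  static List<String> split(String text) {
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < text.length(); ) {
      int cp = text.codePointAt(i);
      i += Character.charCount(cp);
      if (isUnicodeSpace(cp)) {
        if (current.length() > 0) {
          tokens.add(current.toString());
          current.setLength(0);
        }
      } else {
        current.appendCodePoint(cp);
      }
    }
    if (current.length() > 0) {
      tokens.add(current.toString());
    }
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(Character.isWhitespace('\u00A0')); // NBSP is not JDK whitespace
    System.out.println(split("a\u00A0b c"));
  }
}
```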
5827
5828API Changes
5829
5830* LUCENE-6590: Query.setBoost(), Query.getBoost() and Query.clone() are gone.
5831  In order to apply boosts, you now need to wrap queries in a BoostQuery.
5832  (Adrien Grand)
5833
5834* LUCENE-6716: SpanPayloadCheckQuery now takes a List<BytesRef> rather than
5835  a Collection<byte[]>. (Alan Woodward)
5836
5837* LUCENE-6489: The various span payload queries have been moved to the queries
5838  submodule, and PayloadSpanUtil is now in sandbox. (Alan Woodward)
5839
5840* LUCENE-6650: The spatial module no longer uses Filter in any way.  All
5841  spatial Filters now subclass Query.  The spatial heatmap/facet API
5842  now accepts a Bits parameter to filter counts. (David Smiley, Adrien Grand)
5843
5844* LUCENE-6803: Deprecate sandbox Regexp Query. (Uwe Schindler)
5845
5846* LUCENE-6301: org.apache.lucene.search.Filter is now deprecated. You should use
5847  Query objects instead of Filters, and the BooleanClause.Occur.FILTER clause in
5848  order to let Lucene know that a Query should be used for filtering but not
5849  scoring.
5850
5851* LUCENE-6939: SpanOrQuery.addClause is now deprecated, clauses should all be
5852  provided at construction time. (Paul Elschot via Adrien Grand)
5853
5854* LUCENE-6855: CachingWrapperQuery is deprecated and will be removed in 6.0.
5855  (Adrien Grand)
5856
5857* LUCENE-6870: DisjunctionMaxQuery#add is now deprecated, clauses should all be
5858  provided at construction time. (Adrien Grand)
5859
5860* LUCENE-6884: Analyzer.tokenStream() and Tokenizer.setReader() are no longer
5861  declared as throwing IOException. (Alan Woodward)
5862
5863* LUCENE-6849: Expose IndexWriter.flush() method, to move all
5864  in-memory segments to disk without opening a near-real-time reader
5865  nor calling fsync (Robert Muir, Simon Willnauer, Mike McCandless)
5866
5867* LUCENE-6911: Add correct StandardQueryParser.getMultiFields() method,
5868  deprecate no-op StandardQueryParser.getMultiFields(CharSequence[]) method.
5869  (Christine Poerschke, Mikhail Khludnev, Coverity Scan (via Rishabh Patel))
5870
5871Optimizations
5872
5873* LUCENE-6708: TopFieldCollector does not compute the score several times on the
5874  same document anymore. (Adrien Grand)
5875
5876* LUCENE-6720: ValueSourceScorer, returned from
5877  FunctionValues.getRangeScorer(), now uses TwoPhaseIterator. (David Smiley)
5878
5879* LUCENE-6756: MatchAllDocsQuery now has a dedicated BulkScorer for better
5880  performance when used as a top-level query. (Adrien Grand)
5881
5882* LUCENE-6746: DisjunctionMaxQuery, BoostingQuery and BoostedQuery now create
5883  sub weights through IndexSearcher so that they can be cached. (Adrien Grand)
5884
5885* LUCENE-6754: Optimized IndexSearcher.count for the cases when it can use
5886  index statistics instead of collecting all matches. (Adrien Grand)
5887
5888* LUCENE-6773: Nested conjunctions now iterate over documents as if clauses
5889  were all at the same level. (Adrien Grand)
5890
5891* LUCENE-6777: Reuse BytesRef when visiting term ranges in
5892  GeoPointTermsEnum to reduce GC pressure (Nick Knize via Mike
5893  McCandless)
5894
5895* LUCENE-6779: Reduce memory allocated by CompressingStoredFieldsWriter to write
5896  strings larger than 64kb by an amount equal to string's utf8 size.
5897  (Dawid Weiss, Robert Muir, shalin)
5898
5899* LUCENE-6850: Optimize BooleanScorer for sparse clauses. (Adrien Grand)
5900
5901* LUCENE-6840: Ordinal indexes for SORTED_SET/SORTED_NUMERIC fields and
5902  addresses for BINARY fields are now stored on disk instead of in memory.
5903  (Adrien Grand)
5904
5905* LUCENE-6878: Speed up TopDocs.merge. (Daniel Jelinski via Adrien Grand)
5906
5907* LUCENE-6885: StandardDirectoryReader (initialCapacity) tweaks
5908  (Christine Poerschke)
5909
5910* LUCENE-6863: Optimized storage requirements of doc values fields when less
5911  than 1% of documents have a value. (Adrien Grand)
5912
5913* LUCENE-6892: various lucene.index initialCapacity tweaks
5914  (Christine Poerschke)
5915
5916* LUCENE-6276: Added TwoPhaseIterator.matchCost() which allows confirming the
5917  least costly TwoPhaseIterators first. (Paul Elschot via Adrien Grand)
5918
5919* LUCENE-6898: In the default codec, the last stored field value will not
5920  be fully read from disk if the supplied StoredFieldVisitor doesn't want it.
5921  So put your largest text field value last to benefit. (David Smiley)
5922
5923* LUCENE-6909: Remove unnecessary synchronized from
5924  FacetsConfig.getDimConfig for better concurrency (Sanne Grinovero
5925  via Mike McCandless)
5926
5927* SOLR-7730: Speed up SlowCompositeReaderWrapper.getSortedSetDocValues() by
5928  avoiding merging FieldInfos just to check doc value type.
5929  (Paul Vasilyev, Yuriy Pakhomov, Mikhail Khludnev, yonik)
5930
5931Bug Fixes
5932
5933* LUCENE-6905: Unwrap center longitude for dateline crossing
5934  GeoPointDistanceQuery. (Nick Knize)
5935
5936* LUCENE-6817: ComplexPhraseQueryParser.ComplexPhraseQuery does not display
5937  slop in toString(). (Ahmet Arslan via Dawid Weiss)
5938
5939* LUCENE-6730: Hyper-parameter c is ignored in term frequency NormalizationH1.
5940  (Ahmet Arslan via Robert Muir)
5941
5942* LUCENE-6742: Lovins & Finnish implementation of SnowballFilter was
5943  fixed to behave exactly as specified. A bug in the snowball compiler
5944  caused differences in output of the filter in comparison to the original
5945  test data.  In addition, the performance of those filters was improved
5946  significantly.  (Uwe Schindler, Robert Muir)
5947
5948* LUCENE-6783: Removed side effects from FuzzyLikeThisQuery.rewrite.
5949  (Adrien Grand)
5950
5951* LUCENE-6776: Fix geo3d math to handle randomly squashed planet
5952  models (Karl Wright via Mike McCandless)
5953
5954* LUCENE-6792: Fix TermsQuery.toString() to work with binary terms.
5955  (Ruslan Muzhikov, Robert Muir)
5956
5957* LUCENE-5503: When Highlighter's WeightedSpanTermExtractor converts a
5958  PhraseQuery to an equivalent SpanQuery, it would sometimes use a slop that is
5959  too low (no highlight) or determine inOrder wrong.
5960  (Tim Allison via David Smiley)
5961
5962* LUCENE-6790: Fix IndexWriter thread safety when one thread is
5963  handling a tragic exception but another is still committing (Mike
5964  McCandless)
5965
5966* LUCENE-6810: Upgrade to Spatial4j 0.5 -- fixes some edge-case bugs in the
5967  spatial module. See https://github.com/locationtech/spatial4j/blob/master/CHANGES.md
5968  (David Smiley)
5969
5970* LUCENE-6813: OfflineSorter no longer removes its output Path up
5971  front, and instead opens it for write with the
5972  StandardCopyOption.REPLACE_EXISTING to overwrite any prior file, so
5973  that callers can safely use Files.createTempFile for the output.
5974  This change also fixes OfflineSorter's default temp directory when
5975  running tests to use mock filesystems so e.g. we detect file handle
5976  leaks (Dawid Weiss, Robert Muir, Mike McCandless)
5977
5978* LUCENE-6813: RangeTreeWriter was failing to close all file handles
5979  it opened, leading to intermittent failures on Windows (Dawid Weiss,
5980  Robert Muir, Mike McCandless)
5981
5982* LUCENE-6826: Fix ClassCastException when merging a field that has no
5983  terms because they were filtered out by e.g. a FilterCodecReader
5984  (Trejkaz via Mike McCandless)
5985
5986* LUCENE-6823: LocalReplicator should use System.nanoTime as its clock
5987  source for checking for expiration (Ishan Chattopadhyaya via Mike
5988  McCandless)
5989
5990* LUCENE-6856: The Weight wrapper used by LRUQueryCache now delegates to the
5991  original Weight's BulkScorer when applicable. (Adrien Grand)
5992
5993* LUCENE-6858: Fix ContextSuggestField to correctly wrap token stream
5994  when using CompletionAnalyzer. (Areek Zillur)
5995
5996* LUCENE-6872: IndexWriter handles any VirtualMachineError, not just OOM,
5997  as tragic. (Robert Muir)
5998
5999* LUCENE-6814: PatternTokenizer no longer hangs onto heap sized to the
6000  maximum input string it's ever seen, which can be a large memory
6001  "leak" if you tokenize large strings with many threads across many
6002  indices (Alex Chow via Mike McCandless)
6003
6004* LUCENE-6888: Explain output of map() function now also prints default value (janhoy)
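
  As background for LUCENE-6823 above: System.nanoTime() is meant for measuring
  elapsed time and is unaffected by wall-clock adjustments, which makes it a
  natural fit for expiration checks. The sketch below is not LocalReplicator's
  code. The class ExpirationTimer and its injectable clock are hypothetical,
  with the clock parameter added only so the behavior can be exercised
  deterministically.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical illustration; not the LocalReplicator implementation.
class ExpirationTimer {

  private final LongSupplier nanoClock;
  private final long timeoutNanos;
  private long lastAccessNanos;

  ExpirationTimer(LongSupplier nanoClock, long timeoutNanos) {
    this.nanoClock = nanoClock;
    this.timeoutNanos = timeoutNanos;
    this.lastAccessNanos = nanoClock.getAsLong();
  }

  /** Records an access, restarting the expiration window. */
  void touch() {
    lastAccessNanos = nanoClock.getAsLong();
  }

  /** Compares elapsed time by subtraction, which stays correct even if nanoTime wraps. */
  boolean isExpired() {
    return nanoClock.getAsLong() - lastAccessNanos >= timeoutNanos;
  }

  public static void main(String[] args) {
    ExpirationTimer timer =
        new ExpirationTimer(System::nanoTime, TimeUnit.SECONDS.toNanos(30));
    System.out.println("expired right away: " + timer.isExpired());
  }
}
```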
6005
6006Other
6007
6008* LUCENE-6899: Upgrade randomizedtesting to 2.3.1. (Dawid Weiss)
6009
6010* LUCENE-6478: Test execution can hang with java.security.debug. (Dawid Weiss)
6011
6012* LUCENE-6862: Upgrade of RandomizedRunner to version 2.2.0. (Dawid Weiss)
6013
6014* LUCENE-6857: Validate StandardQueryParser with NOT operator
6015  within parentheses. (Jigar Shah via Dawid Weiss)
6016
6017* LUCENE-6827: Use explicit capacity ArrayList instead of a LinkedList
6018  in MultiFieldQueryNodeProcessor. (Dawid Weiss).
6019
6020* LUCENE-6812: Upgrade RandomizedTesting to 2.1.17. (Dawid Weiss)
6021
6022* LUCENE-6174: Improve "ant eclipse" to select right JRE for building.
6023  (Uwe Schindler, Dawid Weiss)
6024
6025* LUCENE-6417, LUCENE-6830: Upgrade ANTLR used in expressions module
6026  to version 4.5.1-1.  (Jack Conradson, Uwe Schindler)
6027
6028* LUCENE-6729: Upgrade ASM used in expressions module to version 5.0.4.
6029  (Uwe Schindler)
6030
6031* LUCENE-6738: remove IndexWriterConfig.[gs]etIndexingChain
6032  (Christine Poerschke)
6033
6034* LUCENE-6755: more tests of ToChildBlockJoinScorer.advance (hossman)
6035
6036* LUCENE-6571: fix some private access level javadoc errors and warnings
6037  (Cao Manh Dat, Christine Poerschke)
6038
6039* LUCENE-6768: AbstractFirstPassGroupingCollector.groupSort private member
6040  is not needed. (Christine Poerschke)
6041
6042* LUCENE-6761: MatchAllDocsQuery's Scorers do not expose approximations
6043  anymore. (Adrien Grand)
6044
6045* LUCENE-6775, LUCENE-6833: Improved MorfologikFilterFactory to allow
6046  loading of custom dictionaries from ResourceLoader. Upgraded
6047  Morfologik to version 2.0.1. The 'dictionary' attribute has been
6048  reverted back and now points at the dictionary resource to be
6049  loaded instead of the default Polish dictionary.
6050  (Uwe Schindler, Dawid Weiss)
6051
6052* LUCENE-6797: Make GeoCircle an interface and use a factory to create
6053  it, to eventually handle degenerate cases (Karl Wright via Mike
6054  McCandless)
6055
6056* LUCENE-6800: Use XYZSolidFactory to create XYZSolids (Karl Wright
6057  via Mike McCandless)
6058
6059* LUCENE-6798: Geo3d now models degenerate (too tiny) circles as a
6060  single point (Karl Wright via Mike McCandless)
6061
6062* LUCENE-6770: Add javadocs that FSDirectory canonicalizes the path.
6063  (Uwe Schindler, Vladimir Kuzmin)
6064
6065* LUCENE-6795: Fix various places where code used
6066  AccessibleObject#setAccessible() without a privileged block. Code
6067  without a hard requirement to do reflection was rewritten. This
6068  makes Lucene and Solr ready for Java 9 Jigsaw's module system, where
6069  reflection on Java's runtime classes is very restricted.
6070  (Robert Muir, Uwe Schindler)
6071
6072* LUCENE-6467: Simplify Query.equals. (Paul Elschot via Adrien Grand)
6073
6074* LUCENE-6845: SpanScorer is now merged into Spans (Alan Woodward, David Smiley)
6075
6076* LUCENE-6887: DefaultSimilarity is deprecated, use ClassicSimilarity for equivalent behavior,
6077  or consider switching to BM25Similarity which will become the new default in Lucene 6.0 (hossman)
6078
6079* LUCENE-6893: factor out CorePlusQueriesParser from CorePlusExtensionsParser
6080  (Christine Poerschke)
6081
6082* LUCENE-6902: Don't retry to fsync files / directories; fail
6083  immediately. (Daniel Mitterdorfer, Uwe Schindler)
6084
6085* LUCENE-6801: Clarify JavaDocs of PhraseQuery that it in fact supports terms
6086  at the same position (as does MultiPhraseQuery), treated like a conjunction.
6087  Added test. (David Smiley, Adrien Grand)
6088
6089Build
6090
6091* LUCENE-6732: Improve checker for invalid source patterns to also
6092  detect javadoc-style license headers. Use Groovy to implement the
6093  checks instead of plain Ant.  (Uwe Schindler)
6094
6095* LUCENE-6594: Update forbiddenapis to 2.0.  (Uwe Schindler)
6096
6097Tests
6098
6099* LUCENE-6752: Add Math#random() to forbiddenapis.  (Uwe Schindler,
6100  Mikhail Khludnev, Andrei Beliakov)
6101
6102Changes in Backwards Compatibility Policy
6103
6104* LUCENE-6742: The Lovins & Finnish implementations of SnowballFilter
6105  were fixed to now behave exactly like the original Snowball stemmer.
6106  If you have indexed text using those stemmers you may need to reindex.
6107  (Uwe Schindler, Robert Muir)
6108
6109Changes in Runtime Behavior
6110
6111* LUCENE-6772: MultiCollector now catches CollectionTerminatedException and
6112  removes the collector that threw this exception from the list of sub
6113  collectors to collect. (Adrien Grand)
6114
6115* LUCENE-6784: IndexSearcher's query caching is enabled by default. Run
6116  indexSearcher.setQueryCache(null) to disable. (Adrien Grand)
6117
6118* LUCENE-6305: BooleanQuery.equals and hashcode do not depend on the order of
6119  clauses anymore. (Adrien Grand)
6120
6121======================= Lucene 5.3.2 =======================
6122
6123Bug Fixes
6124
6125* SOLR-7865: BlendedInfixSuggester was returning too many results
6126  (Arcadius Ahouansou via Mike McCandless)
6127
6128======================= Lucene 5.3.1 =======================
6129
6130Bug Fixes
6131
6132* LUCENE-6774: Remove classloader hack in MorfologikFilter. (Robert Muir,
6133  Uwe Schindler)
6134
6135* LUCENE-6748: UsageTrackingQueryCachingPolicy no longer caches trivial queries
6136  like MatchAllDocsQuery. (Adrien Grand)
6137
6138* LUCENE-6781: Fixed BoostingQuery to rewrite wrapped queries. (Adrien Grand)
6139
6140Tests
6141
6142* LUCENE-6760, SOLR-7958: Move TestUtil#randomWhitespace to the only
6143  Solr test that is using it. The method is not useful for Lucene tests
6144  (and easily breaks, e.g., in Java 9 caused by Unicode version updates).
6145  (Uwe Schindler)
6146
6147
6148======================= Lucene 5.3.0 =======================
6149
6150New Features
6151
6152* LUCENE-6485: Add CustomSeparatorBreakIterator to postings
6153  highlighter which splits on any character. For example, it
6154  can be used with getMultiValueSeparator to render whole field
6155  values.  (Luca Cavanna via Robert Muir)
6156
6157* LUCENE-6459: Add common suggest API that mirrors Lucene's
6158  Query/IndexSearcher APIs for Document based suggester.
6159  Adds PrefixCompletionQuery, RegexCompletionQuery,
6160  FuzzyCompletionQuery and ContextQuery.
6161  (Areek Zillur via Mike McCandless)
6162
6163* LUCENE-6487: Spatial Geo3D API now has a WGS84 ellipsoid world model option.
6164  (Karl Wright via David Smiley)
6165
6166* LUCENE-6477: Add experimental BKD geospatial tree doc values format
6167  and queries, for fast "bbox/polygon contains lat/lon points" (Mike
6168  McCandless)
6169
6170* LUCENE-6526: Asserting(Query|Weight|Scorer) now ensure scores are not computed
6171  if they are not needed. (Adrien Grand)
6172
6173* LUCENE-6481: Add GeoPointField, GeoPointInBBoxQuery,
6174  GeoPointInPolygonQuery for simple "indexed lat/lon point in
6175  bbox/shape" searching.  (Nick Knize via Mike McCandless)
6176
6177* LUCENE-5954: The segments_N commit point now stores the Lucene
6178  version that wrote the commit as well as the lucene version that
6179  wrote the oldest segment in the index, for faster checking of "too
6180  old" indices (Ryan Ernst, Robert Muir, Mike McCandless)
6181
6182* LUCENE-6519: BKDPointInPolygonQuery is much faster by avoiding
6183  the per-hit polygon check when a leaf cell is fully contained by the
6184  polygon.  (Nick Knize, Mike McCandless)
6185
6186* LUCENE-6549: Add preload option to MMapDirectory. (Robert Muir)
6187
6188* LUCENE-6504: Add Lucene53Codec, with norms implemented directly
6189  via the Directory's RandomAccessInput api. (Robert Muir)
6190
6191* LUCENE-6539: Add new DocValuesNumbersQuery, to match any document
6192  containing one of the specified long values.  This change also
6193  moves the existing DocValuesTermsQuery and DocValuesRangeQuery
6194  to Lucene's sandbox module, since in general these queries are
6195  quite slow and are only fast in specific cases.  (Adrien Grand,
6196  Robert Muir, Mike McCandless)
6197
6198* LUCENE-6577: Give earlier and better error message for invalid CRC.
6199  (Robert Muir)
6200
6201* LUCENE-6544: Geo3D: (1) Regularize path & polygon construction, (2) add
6202  PlanetModel.surfaceDistance() (ellipsoidal calculation), (3) cache lat & lon
6203  in GeoPoint, (4) add thread-safety where missing -- Geo3dShape. (Karl Wright,
6204  David Smiley)
6205
6206* LUCENE-6606: SegmentInfo.toString now confesses how the documents
6207  were sorted, when SortingMergePolicy was used (Christine Poerschke
6208  via Mike McCandless)
6209
6210* LUCENE-6524: IndexWriter can now be initialized from an already open
6211  near-real-time or non-NRT reader.  (Boaz Leskes, Robert Muir, Mike
6212  McCandless)
6213
6214* LUCENE-6578: Geo3D can now compute the distance from a point to a shape, both
6215  inner distance and to an outside edge. Multiple distance algorithms are
6216  available.  (Karl Wright, David Smiley)
6217
6218* LUCENE-6632: Geo3D: Compute circle planes more accurately.
6219  (Karl Wright via David Smiley)
6220
6221* LUCENE-6653: Added general purpose BytesTermAttribute to basic token
6222  attributes package that can be used for TokenStreams that solely produce
6223  binary terms.  (Uwe Schindler)
6224
6225* LUCENE-6365: Add Operations.topoSort, to run topological sort of the
6226  states in an Automaton (Markus Heiden via Mike McCandless)
6227
6228* LUCENE-6365: Replace Operations.getFiniteStrings with a
6229  more scalable iterator API (FiniteStringsIterator) (Markus Heiden
6230  via Mike McCandless)
6231
6232* LUCENE-6589: Add a new org.apache.lucene.search.join.CheckJoinIndex class
6233  that can be used to validate that an index has an appropriate structure to
6234  run join queries. (Adrien Grand)
6235
6236* LUCENE-6659: Remove IndexWriter's unnecessary hard limit on max concurrency
6237  (Robert Muir, Mike McCandless)
6238
6239* LUCENE-6547: Add GeoPointDistanceQuery, matching all points within
6240  the specified distance from the center point.  Fix
6241  GeoPointInBBoxQuery to handle dateline crossing.
6242
6243* LUCENE-6694: Add LithuanianAnalyzer and LithuanianStemmer.
6244  (Dainius Jocas via Robert Muir)
6245
6246* LUCENE-6695: Added a new BlendedTermQuery to blend statistics across several
6247  terms. (Simon Willnauer, Adrien Grand)
6248
6249* LUCENE-6706: Added a new PayloadScoreQuery that generalises the behaviour of
6250  PayloadTermQuery and PayloadNearQuery to all Span queries. (Alan Woodward)
6251
6252* LUCENE-6697: Add experimental range tree doc values format and
6253  queries, based on a 1D version of the spatial BKD tree, for a faster
6254  and smaller alternative to postings-based numeric and binary term
6255  filtering.  Range trees can also handle values larger than 64 bits.
6256  (Adrien Grand, Mike McCandless)
6257
6258* LUCENE-6647: Add GeoHash string utility APIs (Nick Knize via Mike
6259  McCandless).
6260
6261* LUCENE-6710: GeoPointField now uses full 64 bits (up from 62) to encode
6262  lat/lon (Nick Knize via Mike McCandless).
6263
6264* LUCENE-6580: SpanNearQuery now allows defined-width gaps in its subqueries
6265  (Alan Woodward, Adrien Grand).
6266
6267* LUCENE-6712: Use doc values to post-filter GeoPointField hits that
6268  fall in boundary cells, resulting in smaller index, faster searches
6269  and less heap used for each query (Nick Knize via Mike McCandless).
6270
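As a rough illustration of the LUCENE-6365 entry above (Operations.topoSort),
the sketch below orders the states of an acyclic automaton so that every state
comes before the states it transitions to. It is not Lucene's implementation:
the int[][] transition table and the TopoSortSketch class are hypothetical
stand-ins, and Kahn's algorithm is used here only because it is short.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for an automaton: transitions[s] lists the target states of state s. */
final class TopoSortSketch {
  /** Returns the states in topological order (Kahn's algorithm); throws if a cycle exists. */
  static int[] topoSort(int[][] transitions) {
    int n = transitions.length;
    int[] inDegree = new int[n];
    for (int[] targets : transitions) {
      for (int t : targets) {
        inDegree[t]++;
      }
    }
    // States with no incoming transitions can be emitted immediately.
    Deque<Integer> ready = new ArrayDeque<>();
    for (int s = 0; s < n; s++) {
      if (inDegree[s] == 0) {
        ready.add(s);
      }
    }
    int[] order = new int[n];
    int upto = 0;
    while (!ready.isEmpty()) {
      int s = ready.poll();
      order[upto++] = s;
      // Emitting s removes its outgoing transitions; targets may become ready.
      for (int t : transitions[s]) {
        if (--inDegree[t] == 0) {
          ready.add(t);
        }
      }
    }
    if (upto != n) {
      throw new IllegalArgumentException("automaton has a cycle");
    }
    return order;
  }
}
```

For the table {{1, 2}, {3}, {3}, {}} the start state 0 is ordered first and the
sink state 3 last.
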
API Changes

* LUCENE-6508: Simplify Lock API: there is now just
  Directory.obtainLock() which returns a Lock that can be
  released (or fails with exception). Add lock verification
  to IndexWriter. Improve exception messages when locking fails.
  (Uwe Schindler, Mike McCandless, Robert Muir)

* LUCENE-6371, LUCENE-6490: Payload collection from Spans is moved to a more generic
  SpanCollector framework.  Spans no longer implements .hasPayload() and
  .getPayload() methods, and instead exposes a collect() method that allows
  the collection of arbitrary postings information. SpanPayloadCheckQuery and
  SpanNearPayloadCheckQuery have moved from the .spans package to the .payloads
  package. (Alan Woodward, David Smiley, Paul Elschot, Robert Muir)

* LUCENE-6529: Removed an optimization in UninvertingReader that was causing
  incorrect results for Numeric fields using precisionStep
  (hossman, Robert Muir)

* LUCENE-6551: Add missing ConcurrentMergeScheduler.getAutoIOThrottle
  getter (Simon Willnauer, Mike McCandless)

* LUCENE-6552: Add MergePolicy.OneMerge.getMergeInfo and rename
  setInfo to setMergeInfo (Simon Willnauer, Mike McCandless)

* LUCENE-6525: Deprecate IndexWriterConfig's writeLockTimeout.
  (Robert Muir)

* LUCENE-6583: FilteredQuery is deprecated and will be removed in 6.0. It should
  be replaced with a BooleanQuery which handles the query as a MUST clause and
  the filter as a FILTER clause. (Adrien Grand)

* LUCENE-6553: The postings, spans and scorer APIs no longer take an acceptDocs
  parameter. Live docs are now always checked on top of these APIs.
  (Adrien Grand)

* LUCENE-6634: PKIndexSplitter now takes a Query instead of a Filter to decide
  how to split an index. (Adrien Grand)

* LUCENE-6643: GroupingSearch from lucene/grouping was changed to take a Query
  object to define groups instead of a Filter. (Adrien Grand)

* LUCENE-6554: ToParentBlockJoinFieldComparator was removed because of a bug
  with missing values that could not be fixed. ToParentBlockJoinSortField now
  works with string or numeric doc values selectors. Sorting on anything other
  than a string or numeric field requires implementing a custom selector.
  (Adrien Grand)

* LUCENE-6648: All lucene/facet APIs now take Query objects where they used to
  take Filter objects. (Adrien Grand)

* LUCENE-6640: Suggesters now take a BitsProducer object instead of a Filter
  object to reduce the scope of doc IDs that may be returned, emphasizing the
  fact that these objects need to support random-access. (Adrien Grand)

* LUCENE-6646: Make EarlyTerminatingCollector take a Sort object directly
  instead of a SortingMergePolicy. (Christine Poerschke via Adrien Grand)

* LUCENE-6649: BitDocIdSetFilter and BitDocIdSetCachingWrapperFilter are now
  deprecated in favour of BitSetProducer and QueryBitSetProducer, which do not
  extend oal.search.Filter. (Adrien Grand)

* LUCENE-6607: Factor out geo3d into its own spatial3d module.  (Karl
  Wright, Nick Knize, David Smiley, Mike McCandless)

* LUCENE-6531: PhraseQuery is now immutable and can be built using the
  PhraseQuery.Builder class. (Adrien Grand)

* LUCENE-6570: BooleanQuery is now immutable and can be built using the
  BooleanQuery.Builder class. (Adrien Grand)

* LUCENE-6702: NRTSuggester: Add a method to inject context values at index time
  in ContextSuggestField. Simplify ContextQuery logic for extracting contexts and
  add dedicated method to consider all context values at query time.
  (Areek Zillur, Mike McCandless)

* LUCENE-6719: NumericUtils getMinInt, getMaxInt, getMinLong, getMaxLong now
  return null if there are no terms for the specified field; previously these
  methods returned primitive values and raised an undocumented NullPointerException
  if there were no terms for the field. (hossman, Timothy Potter)

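Callers affected by LUCENE-6719 above now have to handle a null result where
they previously received a primitive. A minimal sketch of that calling pattern
follows; minOrNull and minOrDefault are hypothetical helpers over a plain list
of values, not the NumericUtils methods themselves.

```java
import java.util.List;

final class MinMaxNullSketch {
  /** Hypothetical helper mirroring the new contract: null means "no terms for this field". */
  static Long minOrNull(List<Long> terms) {
    Long min = null;
    for (long v : terms) {
      if (min == null || v < min) {
        min = v;
      }
    }
    return min;
  }

  /** Caller-side pattern: test for null before unboxing instead of relying on an exception. */
  static long minOrDefault(List<Long> terms, long defaultValue) {
    Long min = minOrNull(terms);
    return min == null ? defaultValue : min;
  }
}
```
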
Bug fixes

* LUCENE-6500: ParallelCompositeReader did not always call
  closed listeners. This was fixed by LUCENE-6501.
  (Adrien Grand, Uwe Schindler)

* LUCENE-6520: Geo3D GeoPath.done() would throw an NPE if adjacent path
  segments were co-linear. (Karl Wright via David Smiley)

* LUCENE-5805: QueryNodeImpl.removeFromParent was doing nothing in a
  costly manner (Christoph Kaser, Cao Manh Dat via Mike McCandless)

* LUCENE-6533: SlowCompositeReaderWrapper no longer caches its live docs
  instance since this can prevent future improvements like a
  disk-backed live docs (Adrien Grand, Mike McCandless)

* LUCENE-6558: Highlighters now work with CustomScoreQuery (Cao Manh
  Dat via Mike McCandless)

* LUCENE-6560: BKDPointInBBoxQuery now handles "dateline crossing"
  correctly (Nick Knize, Mike McCandless)

* LUCENE-6564: Change PrintStreamInfoStream to use thread safe Java 8
  ISO-8601 date formatting (in Lucene 5.x use Java 7 FileTime#toString
  as workaround); fix output of tests to use same format.  (Uwe Schindler,
  Ramkumar Aiyengar)

* LUCENE-6593: Fixed ToChildBlockJoinQuery's scorer to not refuse to advance
  to a document that belongs to the parent space. (Adrien Grand)

* LUCENE-6591: Never write a negative vLong (Robert Muir, Ryan Ernst,
  Adrien Grand, Mike McCandless)

* LUCENE-6588: Fix how ToChildBlockJoinQuery deals with acceptDocs.
  (Christoph Kaser via Adrien Grand)

* LUCENE-6597: Geo3D's GeoCircle now supports a world-globe diameter.
  (Karl Wright via David Smiley)

* LUCENE-6608: Fix potential resource leak in BigramDictionary.
  (Rishabh Patel via Uwe Schindler)

* LUCENE-6614: Improve partition detection in IOUtils#spins() so it
  works with NVMe drives.  (Uwe Schindler, Mike McCandless)

* LUCENE-6586: Fix typo in GermanStemmer, causing possible wrong value
  for substCount.  (Christoph Kaser via Mike McCandless)

* LUCENE-6658: Fix IndexUpgrader to also upgrade indexes without any
  segments.  (Trejkaz, Uwe Schindler)

* LUCENE-6677: QueryParserBase failed to enforce maxDeterminizedStates when
  creating a WildcardQuery (David Causse via Mike McCandless)

* LUCENE-6680: Preserve two suggestions that have same key and weight but
  different payloads (Arcadius Ahouansou via Mike McCandless)

* LUCENE-6681: SortingMergePolicy must override MergePolicy.size(...).
  (Christine Poerschke via Adrien Grand)

* LUCENE-6682: StandardTokenizer performance bug: scanner buffer is
  unnecessarily copied when maxTokenLength doesn't change.  Also stop silently
  maxing out buffer size (and effectively also max token length) at 1M chars,
  but instead throw an exception from setMaxTokenLength() when the given
  length is greater than 1M chars.  (Piotr Idzikowski, Steve Rowe)

* LUCENE-6696: Fix FilterDirectoryReader.close() to never close the
  underlying reader several times. (Adrien Grand)

* LUCENE-6334: FastVectorHighlighter failed to highlight phrases across
  more than one value in a multi-valued field. (Chris Earle, Nik Everett
  via Mike McCandless)

* LUCENE-6704: GeoPointDistanceQuery was visiting too many term ranges,
  consuming too much heap for a large radius (Nick Knize via Mike McCandless)

* SOLR-5882: fix ScoreMode.Min at ToParentBlockJoinQuery (Mikhail Khludnev)

* LUCENE-6718: JoinUtil.createJoinQuery failed to rewrite queries before
  creating a Weight. (Adrien Grand)

* LUCENE-6713: TooComplexToDeterminizeException claimed to be serializable
  but wasn't (Simon Willnauer, Mike McCandless)

* LUCENE-6723: Fix date parsing problems in Java 9 with date formats using
  English weekday/month names.  (Uwe Schindler)

* LUCENE-6618: Properly set MMapDirectory.UNMAP_SUPPORTED when it is not allowed
  by security policy. (Robert Muir)

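The LUCENE-6591 fix above concerns the variable-length long encoding. The
sketch below shows one common form of that encoding (7 data bits per byte,
high bit set while more bytes follow) with an up-front check that rejects
negative input. VLongSketch and its in-memory output are assumptions made for
illustration, not Lucene's DataOutput code.

```java
import java.io.ByteArrayOutputStream;

final class VLongSketch {
  /** Encodes a non-negative long, 7 bits per byte, low-order group first. */
  static byte[] writeVLong(long value) {
    if (value < 0) {
      // A negative value has its top bit set and would need the maximum byte count.
      throw new IllegalArgumentException("cannot write negative vLong (got: " + value + ")");
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((value & ~0x7FL) != 0L) {
      // More than 7 bits remain: emit 7 bits and set the continuation bit.
      out.write((int) ((value & 0x7FL) | 0x80L));
      value >>>= 7;
    }
    out.write((int) value); // final byte, continuation bit clear
    return out.toByteArray();
  }
}
```

With this scheme small values stay compact, which is the point of the format:
any value below 128 fits in a single byte.
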
Changes in Runtime Behavior

* LUCENE-6501: The subreader structure in ParallelCompositeReader
  was flattened, because the current implementation had too many
  hidden bugs regarding refcounting and close listeners.
  If you create a new ParallelCompositeReader, it will just take
  all leaves of the passed readers and form a flat structure of
  ParallelLeafReaders instead of trying to assemble the original
  structure of composite and leaf readers.  (Adrien Grand,
  Uwe Schindler)

* LUCENE-6537: NearSpansOrdered no longer tries to minimize its
  Span matches.  This means that the matching algorithm is entirely
  lazy.  All spans returned by the previous implementation are still
  reported, but matching documents may now also return additional
  spans that were previously discarded in preference to shorter
  overlapping ones. (Alan Woodward, Adrien Grand, Paul Elschot)

* LUCENE-6538: Also include java.vm.version and java.runtime.version
  in per-segment diagnostics (Robert Muir, Mike McCandless)

* LUCENE-6569: Optimize MultiFunction.anyExists and allExists to eliminate
  excessive array creation in common 2 argument usage (Jacob Graves, hossman)

* LUCENE-2880: Span queries now score more consistently with regular queries.
  (Robert Muir, Adrien Grand)

* LUCENE-6601: FilteredQuery now always rewrites to a BooleanQuery which handles
  the query as a MUST clause and the filter as a FILTER clause.
  LEAP_FROG_QUERY_FIRST_STRATEGY and LEAP_FROG_FILTER_FIRST_STRATEGY no longer
  guarantee which iterator will be advanced first; it will depend on the
  respective costs of the iterators. QUERY_FIRST_FILTER_STRATEGY and
  RANDOM_ACCESS_FILTER_STRATEGY still consume the filter using its random-access
  API; however, the returned bits may be called on different documents compared
  to before. (Adrien Grand)

* LUCENE-6542: FSDirectory's ctor now works with security policies or file systems
  that restrict write access.  (Trejkaz, hossman, Uwe Schindler)

* LUCENE-6651: The default implementation of AttributeImpl#reflectWith(AttributeReflector)
  now uses AccessController#doPrivileged() to do the reflection. Please consider
  implementing this method in all your custom attributes, because the method will be
  made abstract in Lucene 6.  (Uwe Schindler)

* LUCENE-6639: LRUQueryCache and CachingWrapperQuery now consider a query as
  "used" when the first Scorer is pulled instead of when a Scorer is pulled on
  the first segment of an index. (Terry Smith, Adrien Grand)

* LUCENE-6579: IndexWriter now sacrifices (closes) itself to protect the index
  when an unexpected, tragic exception strikes while merging. (Robert
  Muir, Mike McCandless)

* LUCENE-6691: SortingMergePolicy.isSorted now considers FilterLeafReader instances.
  EarlyTerminatingSortingCollector.terminatedEarly accessor added.
  TestEarlyTerminatingSortingCollector.testTerminatedEarly test added.
  (Christine Poerschke)

* LUCENE-6609: Add getSortField impls to many subclasses of FieldCacheSource which return
  the most direct SortField implementation.  In many trivial sort by ValueSource usages, this
  will result in less RAM, and more precise sorting of extreme values due to no longer
  converting to double. (hossman)

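LUCENE-6609 above mentions more precise sorting of extreme values once values
are no longer converted to double. The sketch below illustrates why, using
plain Java comparisons rather than any Lucene SortField: a double has a 53-bit
significand, so distinct long values of large magnitude can collapse to the
same double. The class and method names are hypothetical.

```java
final class LongSortPrecisionSketch {
  /** Compares via double, as a sort that widens every value to double would. */
  static int compareAsDouble(long a, long b) {
    return Double.compare((double) a, (double) b);
  }

  /** Compares the long values directly, with no loss of precision. */
  static int compareAsLong(long a, long b) {
    return Long.compare(a, b);
  }
}
```

Comparing Long.MAX_VALUE with Long.MAX_VALUE - 1 reports a tie through the
double path, while the direct long comparison still orders them.
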
Optimizations

* LUCENE-6548: Some optimizations for BlockTree's intersect with very
  finite automata (Mike McCandless)

* LUCENE-6585: Flatten conjunctions and conjunction approximations into
  parent conjunctions. For example a sloppy phrase query of "foo bar"~5
  with a filter of "baz" will internally leapfrog foo,bar,baz as one
  conjunction. (Ryan Ernst, Robert Muir, Adrien Grand)

* LUCENE-6325: Reduce RAM usage of FieldInfos, and speed up lookup by
  number, by using an array instead of TreeMap except in very sparse
  cases (Robert Muir, Mike McCandless)

* LUCENE-6617: Reduce heap usage for small FSTs (Mike McCandless)

* LUCENE-6616: IndexWriter now lists the files in the index directory
  only once on init, and IndexFileDeleter no longer suppresses
  FileNotFoundException and NoSuchFileException.  This also improves
  IndexFileDeleter to delete segments_N files last, so that in the
  presence of a virus checker, the index is never left in a state
  where an expired segments_N references non-existing files (Robert
  Muir, Mike McCandless)

* LUCENE-6645: Optimized the way we merge postings lists in multi-term queries
  and TermsQuery. This should especially help when there are lots of small
  postings lists. (Adrien Grand, Mike McCandless)

* LUCENE-6668: Optimized storage for sorted set and sorted numeric doc values
  in the case that there are few unique sets of values.
  (Adrien Grand, Robert Muir)

* LUCENE-6690: Sped up MultiTermsEnum.next() on high-cardinality fields.
  (Adrien Grand)

* LUCENE-6621: Removed two unused variables in analysis/stempel/src/java/org/
  egothor/stemmer/Compile.java
  (Rishabh Patel via Christine Poerschke)

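The "leapfrog" mentioned in LUCENE-6585 above can be pictured with sorted doc ID
lists: each list in turn is advanced to the current candidate, and a document
matches once every list has landed on it. The sketch below works on plain int
arrays (assumed sorted and duplicate-free); LeapfrogSketch is a hypothetical
name and this is not Lucene's conjunction scorer.

```java
import java.util.ArrayList;
import java.util.List;

final class LeapfrogSketch {
  /** Intersects sorted, duplicate-free doc ID lists by leapfrogging between them. */
  static List<Integer> intersect(int[]... lists) {
    List<Integer> hits = new ArrayList<>();
    if (lists.length == 0) {
      return hits;
    }
    int[] pos = new int[lists.length]; // cursor into each list
    int candidate = Integer.MIN_VALUE; // doc IDs are assumed to be non-negative
    int agreeing = 0;                  // lists currently positioned on candidate
    int i = 0;
    while (true) {
      // Advance list i to its first doc >= candidate.
      while (pos[i] < lists[i].length && lists[i][pos[i]] < candidate) {
        pos[i]++;
      }
      if (pos[i] == lists[i].length) {
        return hits; // one list is exhausted, so no further match is possible
      }
      int doc = lists[i][pos[i]];
      if (doc == candidate) {
        agreeing++;
      } else {
        candidate = doc; // this list jumped ahead: it sets the new candidate
        agreeing = 1;
      }
      if (agreeing == lists.length) {
        hits.add(candidate);
        for (int j = 0; j < lists.length; j++) {
          pos[j]++; // every list sits on the match; move them all past it
        }
        agreeing = 0;
      }
      i = (i + 1) % lists.length;
    }
  }
}
```

Flattening nested conjunctions, as the entry describes, means all clauses take
part in a single loop like this one instead of one loop nested inside another.
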
Build

* LUCENE-6518: Don't report false thread leaks from IBM J9
  ClassCache Reaper in test framework. (Dawid Weiss)

* LUCENE-6567: Simplify payload checking in SpanPayloadCheckQuery (Alan
  Woodward)

* LUCENE-6568: Make rat invocation depend on ivy configuration being set up
  (Ramkumar Aiyengar)

* LUCENE-6683: ivy-fail goal directs people to non-existent page
  (Mike Drob via Steve Rowe)

* LUCENE-6693: Updated Groovy to 2.4.4, Pegdown to 1.5, Svnkit to 1.8.10.
  Also fixed some PermGen errors while running full build caused by
  these updates: Tasks are now installed from root's build.xml.
  (Uwe Schindler)

* LUCENE-6741: Fix jflex files to regenerate the java files correctly.
  (Uwe Schindler)

Test Framework

* LUCENE-6637: Fix FSTTester to not violate file permissions
  on -Dtests.verbose=true.  (Mesbah M. Alam, Uwe Schindler)

* LUCENE-6542: LuceneTestCase now has runWithRestrictedPermissions() to run
  an action with reduced permissions. This can be used to simulate special
  environments (e.g., read-only dirs). If tests are running without a security
  manager, an assume cancels test execution automatically.  (Uwe Schindler)

* LUCENE-6652: Removed lots of useless Byte(s)TermAttributes all over test
  infrastructure.  (Uwe Schindler)

* LUCENE-6563: Improve MockFileSystemTestCase.testURI to check if a path
  can be encoded according to local filesystem requirements. Otherwise
  stop test execution.  (Christine Poerschke via Uwe Schindler)

Changes in Backwards Compatibility Policy

* LUCENE-6553: The iterator returned by the LeafReader.postings method now
  always includes deleted docs, so you have to check for deleted documents on
  top of the iterator. (Adrien Grand)

* LUCENE-6633: DuplicateFilter has been deprecated and will be removed in 6.0.
  DiversifiedTopDocsCollector can be used instead with a maximum number of hits
  per key equal to 1. (Adrien Grand)

* LUCENE-6653: The workflow for consuming the TermToBytesRefAttribute was changed:
  getBytesRef() now does all work and is called on each token, fillBytesRef()
  was removed. The implementation is free to reuse the internal BytesRef
  or return a new one on each call.  (Uwe Schindler)

* LUCENE-6682: StandardTokenizer.setMaxTokenLength() now throws an exception if
  a length greater than 1M chars is given.  Previously the effective max token
  length (the scanner's buffer) was capped at 1M chars, but getMaxTokenLength()
  incorrectly returned the previously requested length, even when it exceeded 1M.
  (Piotr Idzikowski, Steve Rowe)


======================= Lucene 5.2.1 =======================

Bug Fixes

* LUCENE-6482: Fix class loading deadlock relating to Codec initialization,
  default codec and SPI discovery.  (Shikhar Bhushan, Uwe Schindler)

* LUCENE-6523: NRT readers now reflect a new commit even if there is
  no change to the commit user data (Mike McCandless)

* LUCENE-6527: Queries now get a dummy Similarity when scores are not needed
  in order to not load unnecessary information like norms. (Adrien Grand)

* LUCENE-6559: TimeLimitingCollector now also checks for timeout when a new
  leaf reader is pulled, i.e. if we move from one segment to another even without
  collecting a hit. (Simon Willnauer)

======================= Lucene 5.2.0 =======================

New Features

* LUCENE-6308, LUCENE-6385, LUCENE-6391: Span queries now share
  document conjunction/intersection code with boolean queries, and
  use two-phased iterators for faster intersection by avoiding
  loading positions in certain cases.
  (Paul Elschot, Terry Smith, Robert Muir via Mike McCandless)

* LUCENE-6393: Add two-phase support to SpanPositionCheckQuery
  and its subclasses: SpanPositionRangeQuery, SpanPayloadCheckQuery,
  SpanNearPayloadCheckQuery, SpanFirstQuery. (Paul Elschot, Robert Muir)

* LUCENE-6394: Add two-phase support to SpanNotQuery and refactor
  FilterSpans to just have an accept(Spans candidate) method for
  subclasses. (Robert Muir)

* LUCENE-6373: SpanOrQuery shares disjunction logic with boolean
  queries, and supports two-phased iterators to avoid loading
  positions when possible. (Paul Elschot via Robert Muir)

* LUCENE-6352, LUCENE-6472: Added a new query time join to the join module
  that uses global ordinals, which is faster for subsequent joins between
  reopens. (Martijn van Groningen, Adrien Grand)

* LUCENE-5879: Added experimental auto-prefix terms to BlockTree terms
  dictionary, exposed as AutoPrefixPostingsFormat (Adrien Grand,
  Uwe Schindler, Robert Muir, Mike McCandless)

* LUCENE-5579: New CompositeSpatialStrategy combines speed of RPT with
  accuracy of SDV. Includes optimized Intersect predicate to avoid many
  geometry checks. Uses TwoPhaseIterator. (David Smiley)

* LUCENE-5989: Allow passing BytesRef to StringField to make it easier
  to index arbitrary binary tokens, and change the experimental
  StoredFieldVisitor.stringField API to take UTF-8 byte[] instead of
  String (Mike McCandless)

* LUCENE-6389: Added ScoreMode.Min that aggregates the lowest child score
  to the parent hit. (Martijn van Groningen, Adrien Grand)

* LUCENE-6423: New LimitTokenOffsetFilter that limits tokens to those before
  a configured maximum start offset. (David Smiley)

* LUCENE-6422: New spatial PackedQuadPrefixTree, a generally more efficient
  choice than QuadPrefixTree, especially for high precision shapes.
  When used, you should typically disable RPT's pruneLeafyBranches option.
  (Nick Knize, David Smiley)

* LUCENE-6451: Expressions now support bindings keys that look like
  zero arg functions (Jack Conradson via Ryan Ernst)

* LUCENE-6083: Add SpanWithinQuery and SpanContainingQuery that return
  spans inside of / containing other spans. (Paul Elschot via Robert Muir)

* LUCENE-6454: Added distinction between member variable and method in
  expression helper VariableContext
  (Jack Conradson via Ryan Ernst)

* LUCENE-6196: New Spatial "Geo3d" API with partial Spatial4j integration.
  It is a set of shapes implemented using 3D planar geometry for calculating
  spatial relations on the surface of a sphere. Shapes include Point, BBox,
  Circle, Path (buffered line string), and Polygon.
  (Karl Wright via David Smiley)

* LUCENE-6464: Add a new expert lookup method to
  AnalyzingInfixSuggester to accept an arbitrary BooleanQuery to
  express how contexts should be filtered. (Arcadius Ahouansou via
  Mike McCandless)

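Several entries above (LUCENE-6308, LUCENE-6393, LUCENE-6394, LUCENE-6373)
refer to two-phase iteration. The idea, sketched here under simplifying
assumptions, is to iterate a cheap approximation first, let other cheap clauses
reject candidates, and only then run the costly confirmation (for span queries,
the step that reads positions). The arrays, BitSet and predicate below are
hypothetical stand-ins, not Lucene's TwoPhaseIterator API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.IntPredicate;

final class TwoPhaseSketch {
  /**
   * approximation: sorted candidate docs (a superset of the real matches).
   * filter: another cheap clause of the same conjunction.
   * matches: the expensive per-document confirmation.
   */
  static List<Integer> search(int[] approximation, BitSet filter, IntPredicate matches) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : approximation) {
      if (!filter.get(doc)) {
        continue; // rejected cheaply: the expensive check never runs for this doc
      }
      if (matches.test(doc)) {
        hits.add(doc);
      }
    }
    return hits;
  }
}
```

With candidates {1, 2, 5, 7} and a filter that only accepts docs 2 and 7, the
expensive confirmation runs twice instead of four times.
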
Optimizations

* LUCENE-6379: IndexWriter.deleteDocuments(Query...) now detects if
  one of the queries is MatchAllDocsQuery and just invokes the much
  faster IndexWriter.deleteAll in that case (Robert Muir, Adrien
  Grand, Mike McCandless)

* LUCENE-6388: Optimize SpanNearQuery when payloads are not present.
  (Robert Muir)

* LUCENE-6421: Defer reading of positions in MultiPhraseQuery until
  they are needed. (Robert Muir)

* LUCENE-6392: Highlighter: reduce memory of tokens in
  TokenStreamFromTermVector, and add maxStartOffset limit. (David Smiley)

* LUCENE-6456: Queries that generate doc id sets that are too large for the
  query cache are not cached instead of evicting everything. (Adrien Grand)

* LUCENE-6455: Require a minimum index size to enable query caching in order
  not to cache e.g. on MemoryIndex. (Adrien Grand)

* LUCENE-6330: BooleanScorer (used for top-level disjunctions) no longer decodes
  norms when they are not needed. (Adrien Grand)

* LUCENE-6350: TermsQuery is now compressed with PrefixCodedTerms.
  (Robert Muir, Mike McCandless, Adrien Grand)

* LUCENE-6458: Multi-term queries matching few terms per segment now execute
  like a disjunction. (Adrien Grand)

* LUCENE-6360: TermsQuery rewrites to a disjunction when there are 16 matching
  terms or fewer. (Adrien Grand)

Bug Fixes

* LUCENE-329: Fix FuzzyQuery defaults to rank exact matches highest.
  (Mark Harwood, Adrien Grand)

* LUCENE-6378: Fix all RuntimeExceptions to throw the underlying root cause.
  (Varun Thacker, Adrien Grand, Mike McCandless)

* LUCENE-6415: TermsQuery.extractTerms is a no-op (used to throw an
  UnsupportedOperationException). (Adrien Grand)

* LUCENE-6416: BooleanQuery.extractTerms now only extracts terms from scoring
  clauses. (Adrien Grand)

* LUCENE-6409: Fixed integer overflow in LongBitSet.ensureCapacity.
  (Luc Vanlerberghe via Adrien Grand)

* LUCENE-6424, LUCENE-6430: Fix many bugs with mockfs filesystems in the
  test-framework: always consistently wrap Path, fix buggy behavior for
  globs, implement equals/hashcode for filtered Paths, etc.
  (Ryan Ernst, Simon Willnauer, Robert Muir)

* LUCENE-6426: Fix FieldType's copy constructor to also copy over the numeric
  precision step. (Adrien Grand)

* LUCENE-6345: Null check terms/fields in Lucene queries (Lee
  Hinman via Mike McCandless)

* LUCENE-6400: SolrSynonymParser should preserve original token instead
  of replacing it with a synonym, when expand=true and there is no
  explicit mapping (Ian Ribas, Robert Muir, Mike McCandless)

* LUCENE-6449: Don't throw NullPointerException if some segments are
  missing the field being highlighted, in PostingsHighlighter (Roman
  Khmelichek via Mike McCandless)

* LUCENE-6427: Added assertion about the presence of ghost bits in
  (Fixed|Long)BitSet. (Luc Vanlerberghe via Adrien Grand)

* LUCENE-6468: Fixed NPE with empty Kuromoji user dictionary.
  (Jun Ohtani via Christian Moen)

* LUCENE-6483: Ensure core closed listeners are called on the same cache key as
  the reader which has been used to register the listener. (Adrien Grand)

* LUCENE-6486: DocumentDictionary iterator no longer skips
  documents with no payloads and now returns an empty BytesRef instead
  (Marius Grama via Michael McCandless)

* LUCENE-6505: NRT readers now reflect segments_N filename and commit
  user data from previous commits (Mike McCandless)

* LUCENE-6507: Don't let NativeFSLock.close() release other locks
  (Simon Willnauer, Robert Muir, Uwe Schindler, Mike McCandless)

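LUCENE-6409 above fixed an integer overflow in a bit set sized with a long. The
entry does not spell out the faulty expression, so the sketch below only
illustrates the general class of bug, under the assumption that a bit count is
derived from a count of 64-bit words: shifting an int wraps once the result
exceeds 31 bits, while widening to long first does not. All names are
hypothetical.

```java
final class BitCountOverflowSketch {
  /** Buggy pattern: the shift happens in 32-bit arithmetic and wraps once words >= 2^25. */
  static long bitsOverflowing(int words) {
    return words << 6;
  }

  /** Safe pattern: widen to long before shifting. */
  static long bits(int words) {
    return (long) words << 6;
  }

  /** Number of 64-bit words needed to hold numBits bits, computed in long arithmetic. */
  static int bits2words(long numBits) {
    return (int) ((numBits - 1) >> 6) + 1;
  }
}
```
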
API Changes

* LUCENE-6377: SearcherFactory#newSearcher now accepts the previous reader
  to simplify warming logic when opening new searchers. (Simon Willnauer)

* LUCENE-6410: Removed unused "reuse" parameter to
  Terms.iterator. (Robert Muir, Mike McCandless)

* LUCENE-6425: Replaced Query.extractTerms with Weight.extractTerms.
  (Adrien Grand)

* LUCENE-6446: Simplified Explanation API. (Adrien Grand)

* LUCENE-6445: Two new methods in Highlighter's TokenSources; the existing
  methods are now marked deprecated. (David Smiley)

* LUCENE-6484: Removed EliasFanoDocIdSet, which was unused.
  (Paul Elschot via Adrien Grand)

* LUCENE-6466: Moved SpanQuery.getSpans() and .extractTerms() to SpanWeight
  (Alan Woodward, Robert Muir)

* LUCENE-6497: Allow subclasses of FieldType to check frozen state
  (Ryan Ernst)

Other

* LUCENE-6413: Test runner should report the number of suites completed/
  remaining. (Dawid Weiss)

* LUCENE-5439: Add 'ant jacoco' build target. (Robert Muir)

* LUCENE-6315: Simplify the private iterator Lucene uses internally
  when resolving deleted terms to matched docids. (Robert Muir, Adrien
  Grand, Mike McCandless)

* LUCENE-6399: Benchmark module's QueryMaker.resetInputs should call setConfig
  so queries can react to property changes in new rounds. (David Smiley)

* LUCENE-6382: Lucene now enforces that positions never exceed the
  maximum value IndexWriter.MAX_POSITION.  (Robert Muir, Mike McCandless)

* LUCENE-6372: Simplified and improved equals/hashcode of span queries.
  (Paul Elschot via Adrien Grand)

Build

* LUCENE-6420: Update forbiddenapis to v1.8  (Uwe Schindler)

Test Framework

* LUCENE-6419: Added two-phase iteration assertions to AssertingQuery.
  (Adrien Grand)

* LUCENE-6437: Randomly set CPU core count and spins, derived from
  test's master seed, used by ConcurrentMergeScheduler to set dynamic
  defaults, for better test randomization and to help tests reproduce
  (Robert Muir, Mike McCandless)

======================= Lucene 5.1.0 =======================

New Features

* LUCENE-6066: Added DiversifiedTopDocsCollector to misc for collecting no more
  than a given number of results under a choice of key. Introduces new remove
  method to core's PriorityQueue. (Mark Harwood)

* LUCENE-6191: New spatial 2D heatmap faceting for PrefixTreeStrategy. (David Smiley)

* LUCENE-6227: Added BooleanClause.Occur.FILTER to filter documents without
  participating in scoring (unlike MUST). (Adrien Grand)

* LUCENE-6294: Added oal.search.CollectorManager to allow for parallelization
  of the document collection process on IndexSearcher. (Adrien Grand)

* LUCENE-6303: Added filter caching baked into IndexSearcher, disabled by
  default. (Adrien Grand)

* LUCENE-6304: Added a new MatchNoDocsQuery that matches no documents.
  (Lee Hinman via Adrien Grand)

* LUCENE-6341: Add a -fast option to CheckIndex. (Robert Muir)

* LUCENE-6355: IndexWriter's infoStream now also logs time to write FieldInfos
  during merge (Lee Hinman via Mike McCandless)

* LUCENE-6339: Added near-real-time document suggester via custom postings format
  (Areek Zillur, Mike McCandless, Simon Willnauer)

6870Bug Fixes
6871
6872* LUCENE-6368: FST.save can truncate output (BufferedOutputStream may be closed
6873  after the underlying stream). (Ippei Matsushima via Dawid Weiss)
6874
6875* LUCENE-6249: StandardQueryParser doesn't support pure negative clauses.
6876  (Dawid Weiss)
6877
6878* LUCENE-6190: Spatial pointsOnly flag on PrefixTreeStrategy shouldn't switch all predicates to
6879  Intersects. (David Smiley)
6880
* LUCENE-6242: RAM usage estimation was incorrect for SparseFixedBitSet when
  object alignment was different from 8. (Uwe Schindler, Adrien Grand)
6883
6884* LUCENE-6293: Fixed TimSorter bug. (Adrien Grand)
6885
6886* LUCENE-6001: DrillSideways hits NullPointerException for certain
6887  BooleanQuery searches.  (Dragan Jotannovic, jane chang via Mike
6888  McCandless)
6889
* LUCENE-6311: Fix NIOFSDirectory and SimpleFSDirectory so that the
  toString method of their IndexInputs confesses when they are from a
  compound file. (Robert Muir, Mike McCandless)
6893
6894* LUCENE-6381: Add defensive wait time limit in
6895  DocumentsWriterStallControl to prevent hangs during indexing if we
6896  miss a .notify/All somewhere (Mike McCandless)
6897
6898* LUCENE-6386: Correct IndexWriter.forceMerge documentation to state
6899  that up to 3X (X = current index size) spare disk space may be needed
6900  to complete forceMerge(1).  (Robert Muir, Shai Erera, Mike McCandless)
6901
6902* LUCENE-6395: Seeking by term ordinal was failing to set the term's
6903  bytes in MemoryIndex (Mike McCandless)
6904
6905* LUCENE-6429: Removed the TermQuery(Term,int) constructor which could lead to
6906  inconsistent term statistics. (Adrien Grand, Robert Muir)
6907
6908Optimizations
6909
6910* LUCENE-6183, LUCENE-5647: Avoid recompressing stored fields
6911  and term vectors when merging segments without deletions.
6912  Lucene50Codec's BEST_COMPRESSION mode uses a higher deflate
6913  level for more compact storage.  (Robert Muir)
6914
6915* LUCENE-6184: Make BooleanScorer only score windows that contain
6916  matches. (Adrien Grand)
6917
6918* LUCENE-6161: Speed up resolving of deleted terms to docIDs by doing
6919  a combined merge sort between deleted terms and segment terms
6920  instead of a separate merge sort for each segment.  In delete-heavy
6921  use cases this can be a sizable speedup. (Mike McCandless)
6922
6923* LUCENE-6201: BooleanScorer can now deal with values of minShouldMatch that
6924  are greater than one and is used when queries produce dense result sets.
6925  (Adrien Grand)
6926
6927* LUCENE-6218: Don't decode frequencies or match all positions when scoring
6928  is not needed. (Robert Muir)
6929
* LUCENE-6233: Speed up CheckIndex when the index has term vectors
  (Robert Muir, Mike McCandless)
6932
6933* LUCENE-6198: Added the TwoPhaseIterator API, exposed on scorers which
6934  is for now only used on phrase queries and conjunctions in order to check
6935  positions lazily if the phrase query is in a conjunction with other queries.
6936  (Robert Muir, Adrien Grand, David Smiley)
6937
6938* LUCENE-6244, LUCENE-6251: All boolean queries but those that have a
6939  minShouldMatch > 1 now either propagate or take advantage of the two-phase
6940  iteration capabilities added in LUCENE-6198. (Adrien Grand, Robert Muir)
6941
* LUCENE-6241: FSDirectory.listAll() doesn't filter out subdirectories anymore,
  for faster performance. Subdirectories don't matter to Lucene. If you need to
  filter out non-index files with some custom usage, you may want to look at
  the IndexFileNames class. (Robert Muir)
6946
6947* LUCENE-6262: ConstantScoreQuery does not wrap the inner weight anymore when
6948  scores are not required. (Adrien Grand)
6949
6950* LUCENE-6263: MultiCollector automatically caches scores when several
6951  collectors need them. (Adrien Grand)
6952
6953* LUCENE-6275: SloppyPhraseScorer now uses the same logic as ConjunctionScorer
6954  in order to advance doc IDs, which takes advantage of the cost() API.
6955  (Adrien Grand)
6956
6957* LUCENE-6290: QueryWrapperFilter propagates approximations and FilteredQuery
6958  rewrites to a BooleanQuery when the filter is a QueryWrapperFilter in order
6959  to leverage approximations. (Adrien Grand)
6960
6961* LUCENE-6318: Reduce RAM usage of FieldInfos when there are many fields.
6962  (Mike McCandless, Robert Muir)
6963
6964* LUCENE-6320: Speed up CheckIndex. (Robert Muir)
6965
6966* LUCENE-4942: Optimized the encoding of PrefixTreeStrategy indexes for
6967  non-point data: 33% smaller index, 68% faster indexing, and 44% faster
6968  searching. YMMV (David Smiley)
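
  Illustration for LUCENE-6198 (not part of the Lucene sources): a minimal,
  library-free sketch of the two-phase idea, assuming a cheap approximation
  that may over-match and a costly confirmation that only runs on documents
  that survive the rest of the conjunction. The names TwoPhaseSketch,
  conjunction and expensiveChecks are hypothetical; this is not the
  TwoPhaseIterator API itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical sketch of two-phase iteration; not Lucene's TwoPhaseIterator. */
final class TwoPhaseSketch {

  /** Number of times the expensive check ran in the last conjunction() call. */
  static int expensiveChecks = 0;

  /**
   * Walks a sorted candidate list (the cheap "approximation" of a phrase
   * query), applies a cheap filter first, and only then runs the costly
   * confirmation on the candidates that are left.
   */
  static List<Integer> conjunction(int[] approximation, IntPredicate cheapFilter,
                                   IntPredicate expensiveMatches) {
    expensiveChecks = 0;
    List<Integer> hits = new ArrayList<>();
    for (int doc : approximation) {
      if (!cheapFilter.test(doc)) {
        continue; // phase 1: rejected without ever checking positions
      }
      expensiveChecks++;
      if (expensiveMatches.test(doc)) { // phase 2: confirm the match
        hits.add(doc);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    int[] approx = {1, 4, 6, 8, 9}; // docs that contain all phrase terms
    List<Integer> hits = conjunction(approx, d -> d % 2 == 0, d -> d != 6);
    System.out.println(hits + " after " + expensiveChecks + " position checks");
  }
}
```

  In this sketch only three of the five candidates reach the costly check,
  which is the saving the entry describes for phrase queries inside a
  conjunction.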
6969
6970API Changes
6971
6972* LUCENE-6204, LUCENE-6208: Simplify CompoundFormat: remove files()
6973  and remove files parameter to write(). (Robert Muir)
6974
6975* LUCENE-6217: Add IndexWriter.isOpen and getTragicException.  (Simon
6976  Willnauer, Mike McCandless)
6977
6978* LUCENE-6218, LUCENE-6220: Add Collector.needsScores() and needsScores
6979  parameter to Query.createWeight(). (Robert Muir, Adrien Grand)
6980
6981* LUCENE-4524, LUCENE-6246, LUCENE-6256, LUCENE-6271: Merge DocsEnum and DocsAndPositionsEnum
6982  into a single PostingsEnum iterator.  TermsEnum.docs() and TermsEnum.docsAndPositions()
6983  are replaced by TermsEnum.postings().
6984  (Alan Woodward, Simon Willnauer, Robert Muir, Ryan Ernst)
6985
6986* LUCENE-6222: Removed TermFilter, use a QueryWrapperFilter(TermQuery)
6987  instead. This will be as efficient now that queries can opt out from
6988  scoring. (Adrien Grand)
6989
6990* LUCENE-6269: Removed BooleanFilter, use a QueryWrapperFilter(BooleanQuery)
6991  instead. (Adrien Grand)
6992
6993* LUCENE-6270: Replaced TermsFilter with TermsQuery, use a
6994  QueryWrapperFilter(TermsQuery) instead. (Adrien Grand)
6995
6996* LUCENE-6223: Move BooleanQuery.BooleanWeight to BooleanWeight.
6997  (Robert Muir)
6998
6999* LUCENE-1518: Make Filter extend Query and return 0 as score.
7000  (Uwe Schindler, Adrien Grand)
7001
7002* LUCENE-6245: Force Filter subclasses to implement toString API from Query.
7003  (Ryan Ernst)
7004
7005* LUCENE-6268: Replace FieldValueFilter and DocValuesRangeFilter with equivalent
7006  queries that support approximations. (Adrien Grand)
7007
7008* LUCENE-6289: Replace DocValuesRangeFilter with DocValuesRangeQuery which
7009  supports approximations. (Adrien Grand)
7010
7011* LUCENE-6266: Remove unnecessary Directory params from SegmentInfo.toString,
7012  SegmentInfos.files/toString, and SegmentCommitInfo.toString. (Robert Muir)
7013
* LUCENE-6272: Scorer extends DocIdSetIterator rather than DocsEnum (Alan
  Woodward)
7016
7017* LUCENE-6281: Removed support for slow collations from lucene/sandbox. Better
7018  performance would be achieved through CollationKeyAnalyzer or
7019  ICUCollationKeyAnalyzer. (Adrien Grand)
7020
7021* LUCENE-6286: Removed IndexSearcher methods that take a Filter object.
7022  A BooleanQuery with a filter clause must be used instead. (Adrien Grand)
7023
7024* LUCENE-6300: PrefixFilter, TermRangeFilter and NumericRangeFilter have been
7025  removed. Use PrefixQuery, TermRangeQuery and NumericRangeQuery instead.
7026  (Adrien Grand)
7027
7028* LUCENE-6303: Replaced FilterCache with QueryCache and CachingWrapperFilter
7029  with CachingWrapperQuery. (Adrien Grand)
7030
7031* LUCENE-6317: Deprecate DataOutput.writeStringSet and writeStringStringMap.
7032  Use writeSetOfStrings/Maps instead. (Mike McCandless, Robert Muir)
7033
7034* LUCENE-6307: Rename SegmentInfo.getDocCount -> .maxDoc,
7035  SegmentInfos.totalDocCount -> .totalMaxDoc, MergeInfo.totalDocCount
7036  -> .totalMaxDoc and MergePolicy.OneMerge.totalDocCount ->
7037  .totalMaxDoc (Adrien Grand, Robert Muir, Mike McCandless)
7038
7039* LUCENE-6367: PrefixQuery now subclasses AutomatonQuery, removing the
7040  specialized PrefixTermsEnum.  (Robert Muir, Mike McCandless)
7041
7042Other
7043
7044* LUCENE-6248: Remove unused odd constants from StandardSyntaxParser.jj
7045  (Dawid Weiss)
7046
7047* LUCENE-6193: Collapse identical catch branches in try-catch statements.
7048  (shalin)
7049
7050* LUCENE-6239: Removed RAMUsageEstimator's sun.misc.Unsafe calls.
7051  (Robert Muir, Dawid Weiss, Uwe Schindler)
7052
7053* LUCENE-6292: Seed StringHelper better. (Robert Muir)
7054
7055* LUCENE-6333: Refactored queries to delegate their equals and hashcode
7056  impls to the super class. (Lee Hinman via Adrien Grand)
7057
7058* LUCENE-6343: DefaultSimilarity javadocs had the wrong float value to
7059  demonstrate precision of encoded norms (András Péteri via Mike McCandless)
7060
7061Changes in Runtime Behavior
7062
7063* LUCENE-6255: PhraseQuery now ignores leading holes and requires that
7064  positions are positive and added in order. (Adrien Grand)
7065
7066* LUCENE-6298: SimpleQueryParser returns an empty query rather than
7067  null, if e.g. the terms were all stopwords. (Lee Hinman via Robert Muir)
7068
7069======================= Lucene 5.0.0 =======================
7070
7071New Features
7072
7073* LUCENE-5945: All file handling converted to NIO.2 apis. (Robert Muir)
7074
7075* LUCENE-5946: SimpleFSDirectory now uses Files.newByteChannel, for
7076  portability with custom FileSystemProviders. If you want the old
7077  non-interruptible behavior of RandomAccessFile, use RAFDirectory
7078  in the misc/ module. (Uwe Schindler, Robert Muir)
7079
7080* SOLR-3359: Added analyzer attribute/property to SynonymFilterFactory.
7081  (Ryo Onodera via Koji Sekiguchi)
7082
7083* LUCENE-5648: Index and search date ranges, particularly multi-valued ones. It's
7084  implemented in the spatial module as DateRangePrefixTree used with
7085  NumberRangePrefixTreeStrategy. (David Smiley)
7086
7087* LUCENE-5895: Lucene now stores a unique id per-segment and per-commit to aid
7088  in accurate replication of index files (Robert Muir, Mike McCandless)
7089
7090* LUCENE-5889: Add commit method to AnalyzingInfixSuggester, and allow just using .add
7091  to build up the suggester.  (Varun Thacker via Mike McCandless)
7092
7093* LUCENE-5123: Add a "pull" option to the postings writing API, so
7094  that a PostingsFormat now receives a Fields instance and it is
7095  responsible for iterating through all fields, terms, documents and
7096  positions.  (Robert Muir, Mike McCandless)
7097
7098* LUCENE-5268: Full cutover of all postings formats to the "pull"
7099  FieldsConsumer API, removing PushFieldsConsumer.  Added new
7100  PushPostingsWriterBase for single-pass push of docs/positions to the
7101  postings format.  (Mike McCandless)
7102
7103* LUCENE-5906: Use Files.delete everywhere instead of File.delete, so that
7104  when things go wrong, you get a real exception message why.
7105  (Uwe Schindler, Robert Muir)
7106
7107* LUCENE-5933: Added FilterSpans for easier wrapping of Spans instance. (Shai Erera)
7108
7109* LUCENE-5925: Remove fallback logic from opening commits, instead use
7110  Directory.renameFile so that in-progress commits are never visible.
7111  (Robert Muir)
7112
7113* LUCENE-5820: SuggestStopFilter should have a factory.
7114  (Varun Thacker via Steve Rowe)
7115
7116* LUCENE-5949: Add Accountable.getChildResources(). (Robert Muir)
7117
7118* SOLR-5986: Added ExitableDirectoryReader that extends FilterDirectoryReader and enables
7119  exiting requests that take too long to enumerate over terms. (Anshum Gupta, Steve Rowe,
7120  Robert Muir)
7121
7122* LUCENE-5911: Add MemoryIndex.freeze() to allow thread-safe searching over a
7123  MemoryIndex. (Alan Woodward, David Smiley, Robert Muir)
7124
7125* LUCENE-5969: Lucene 5.0 has a new index format with mismatched file detection,
7126  improved exception handling, and indirect norms encoding for sparse fields.
7127  (Mike McCandless, Ryan Ernst, Robert Muir)
7128
7129* LUCENE-6053: Add Serbian analyzer.  (Nikola Smolenski via Robert Muir, Mike McCandless)
7130
7131* LUCENE-4400: Add support for new NYSIIS Apache commons phonetic
7132  codec (Thomas Neidhart via Mike McCandless)
7133
7134* LUCENE-6059: Add Daitch-Mokotoff Soundex phonetic Apache commons
7135  phonetic codec, and upgrade to Apache commons codec 1.10. (Thomas
7136  Neidhart via Mike McCandless)
7137
7138* LUCENE-6058: With the upgrade to Apache commons codec 1.10, the
7139  experimental BeiderMorseFilter has changed its behavior, so any
7140  index using it will need to be rebuilt.  (Thomas
7141  Neidhart via Mike McCandless)
7142
7143* LUCENE-6050: Accept MUST and MUST_NOT (in addition to SHOULD) for
7144  each context passed to Analyzing/BlendedInfixSuggester (Arcadius
7145  Ahouansou, jane chang via Mike McCandless)
7146
7147* LUCENE-5929: Also extract terms to highlight from block join
7148  queries. (Julie Tibshirani via Mike McCandless)
7149
7150* LUCENE-6063: Allow overriding whether/how ConcurrentMergeScheduler
7151  stalls incoming threads when merges are falling behind (Mike
7152  McCandless)
7153
7154* LUCENE-5833: DocumentDictionary now enumerates each value separately
7155  in a multi-valued field (not just the first value), so you can build
7156  suggesters from multi-valued fields.  (Varun Thacker via Mike
7157  McCandless)
7158
7159* LUCENE-6077: Added a filter cache. (Adrien Grand, Robert Muir)
7160
7161* LUCENE-6088: TermsFilter implements Accountable. (Adrien Grand)
7162
7163* LUCENE-6034: The default highlighter when used with QueryScorer will highlight payload-sensitive
7164  queries provided that term vectors with positions, offsets, and payloads are present. This is the
7165  only highlighter that can highlight such queries accurately. (David Smiley)
7166
7167* LUCENE-5914: Add an option to Lucene50Codec to support either BEST_SPEED
7168  or BEST_COMPRESSION for stored fields. (Adrien Grand, Robert Muir)
7169
7170* LUCENE-6119: Add auto-IO-throttling to ConcurrentMergeScheduler, to
7171  rate limit IO writes for each merge depending on incoming merge
7172  rate.  (Mike McCandless)
7173
* LUCENE-6155: Add payload support to MemoryIndex. The default highlighter's
  QueryScorer and WeightedSpanTermExtractor now have setUsePayloads(bool).
  (David Smiley)
7177
7178* LUCENE-6166: Deletions (alone) can now trigger new merges.  (Mike McCandless)
7179
* LUCENE-6177: Add CustomAnalyzer, which allows configuring analyzers
  as you do in Solr's index schema. This class has a builder API to configure
  Tokenizers, TokenFilters, and CharFilters based on their SPI names
  and parameters as documented by the corresponding factories.
  (Uwe Schindler)
7185
7186Optimizations
7187
7188* LUCENE-5960: Use a more efficient bitset, not a Set<Integer>, to
7189  track visited states.  (Markus Heiden via Mike McCandless)
7190
7191* LUCENE-5959: Don't allocate excess memory when building automaton in
7192  finish. (Markus Heiden via Mike McCandless)
7193
7194* LUCENE-5963: Reduce memory allocations in
7195  AnalyzingSuggester. (Markus Heiden via Mike McCandless)
7196
7197* LUCENE-5938: MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE is now faster on
7198  queries that match few documents by using a sparse bit set implementation.
7199  (Adrien Grand)
7200
7201* LUCENE-5969: Refactor merging to be more efficient, checksum calculation is
7202  per-segment/per-producer, and norms and doc values merging no longer cause
7203  RAM spikes for latent fields. (Mike McCandless, Robert Muir)
7204
7205* LUCENE-5983: CachingWrapperFilter now uses a new DocIdSet implementation
7206  called RoaringDocIdSet instead of WAH8DocIdSet. (Adrien Grand)
7207
7208* LUCENE-6022: DocValuesDocIdSet checks live docs before doc values.
7209  (Adrien Grand)
7210
7211* LUCENE-6030: Add norms patched compression for a small number of common values
7212  (Ryan Ernst)
7213
7214* LUCENE-6040: Speed up EliasFanoDocIdSet through broadword bit selection.
7215  (Paul Elschot)
7216
7217* LUCENE-6033: CachingTokenFilter now uses ArrayList not LinkedList, and has new
7218  isCached() method. (David Smiley)
7219
* LUCENE-6031: TokenSources (in the default highlighter) converts term vectors into a
  TokenStream much faster, in linear time (not N*log(N)), using less memory, and with reset()
  implemented.  Only one of offsets or positions is required of the term vector.
  (David Smiley)
7224
7225* LUCENE-6089, LUCENE-6090: Tune CompressionMode.HIGH_COMPRESSION for
7226  better compression and less cpu usage. (Adrien Grand, Robert Muir)
7227
7228* LUCENE-6034: QueryScorer, used by the default highlighter, needn't re-index the provided
7229  TokenStream with MemoryIndex when it comes from TokenSources (term vectors) with offsets and
7230  positions. (David Smiley)
7231
7232* LUCENE-5951: ConcurrentMergeScheduler detects whether the index is on SSD or not
7233  and does a better job defaulting its settings.  This only works on Linux for now;
7234  other OS's will continue to use the previous defaults (tuned for spinning disks).
7235  (Robert Muir, Uwe Schindler, hossman, Mike McCandless)
7236
7237* LUCENE-6131: Optimize SortingMergePolicy. (Robert Muir)
7238
7239* LUCENE-6133: Improve default StoredFieldsWriter.merge() to be more efficient.
7240  (Robert Muir)
7241
7242* LUCENE-6145: Make EarlyTerminatingSortingCollector able to early-terminate
7243  when the sort order is a prefix of the index-time order. (Adrien Grand)
7244
7245* LUCENE-6178: Score boolean queries containing MUST_NOT clauses with BooleanScorer2,
7246  to use skip list data and avoid unnecessary scoring. (Adrien Grand, Robert Muir)
7247
7248API Changes
7249
7250* LUCENE-5900: Deprecated more constructors taking Version in *InfixSuggester and
7251  ICUCollationKeyAnalyzer, and removed TEST_VERSION_CURRENT from the test framework.
7252  (Ryan Ernst)
7253
7254* LUCENE-4535: oal.util.FilterIterator is now an internal API.
7255  (Adrien Grand)
7256
7257* LUCENE-4924: DocIdSetIterator.docID() must now return -1 when the iterator is
7258  not positioned. This change affects all classes that inherit from
7259  DocIdSetIterator, including DocsEnum and DocsAndPositionsEnum. (Adrien Grand)
7260
7261* LUCENE-5127: Reduce RAM usage of FixedGapTermsIndex. Remove
7262  IndexWriterConfig.setTermIndexInterval, IndexWriterConfig.setReaderTermsIndexDivisor,
7263  and termsIndexDivisor from StandardDirectoryReader. These options have been no-ops
7264  with the default codec since Lucene 4.0. If you want to configure the interval for
7265  this term index, pass it directly in your codec, where it can also be configured
7266  per-field. (Robert Muir)
7267
7268* LUCENE-5388: Remove Reader from Tokenizer's constructor and from
7269  Analyzer's createComponents. TokenStreams now always get their input
7270  via setReader.
7271  (Benson Margulies via Robert Muir - pull request #16)
7272
7273* LUCENE-5527: The Collector API has been refactored to use a dedicated Collector
7274  per leaf. (Shikhar Bhushan, Adrien Grand)
7275
* LUCENE-5702: The FieldComparator API has been refactored to a per-leaf API, just
  like Collectors. (Adrien Grand)
7278
7279* LUCENE-4246: IndexWriter.close now always closes, even if it throws
7280  an exception.  The new IndexWriterConfig.setCommitOnClose (default
7281  true) determines whether close() should commit before closing.
7282
7283* LUCENE-5608, LUCENE-5565: Refactor SpatialPrefixTree/Cell API. Doesn't use Strings
7284  as tokens anymore, and now iterates cells on-demand during indexing instead of
7285  building a collection.  RPT now has more setters. (David Smiley)
7286
7287* LUCENE-5666: Change uninverted access (sorting, faceting, grouping, etc)
7288  to use the DocValues API instead of FieldCache. For FieldCache functionality,
7289  use UninvertingReader in lucene/misc (or implement your own FilterReader).
7290  UninvertingReader is more efficient: supports multi-valued numeric fields,
7291  detects when a multi-valued field is single-valued, reuses caches
7292  of compatible types (e.g. SORTED also supports BINARY and SORTED_SET access
7293  without insanity).  "Insanity" is no longer possible unless you explicitly want it.
7294  Rename FieldCache* and DocTermOrds* classes in the search package to DocValues*.
7295  Move SortedSetSortField to core and add SortedSetFieldSource to queries/, which
7296  takes the same selectors. Add helper methods to DocValues.java that are better
7297  suited for search code (never return null, etc).  (Mike McCandless, Robert Muir)
7298
7299* LUCENE-5871: Remove Version from IndexWriterConfig. Use
7300  IndexWriterConfig.setCommitOnClose to change the behavior of IndexWriter.close().
7301  The default has been changed to match that of 4.x.
7302  (Ryan Ernst, Mike McCandless)
7303
7304* LUCENE-5965: CorruptIndexException requires a String or DataInput resource.
7305  (Robert Muir)
7306
* LUCENE-5972: IndexFormatTooOldException and IndexFormatTooNewException now
  extend from IOException.
  (Ryan Ernst, Robert Muir)
7310
7311* LUCENE-5569: *AtomicReader/AtomicReaderContext have been renamed to *LeafReader/LeafReaderContext.
7312  (Ryan Ernst)
7313
7314* LUCENE-5938: Removed MultiTermQuery.ConstantScoreAutoRewrite as
7315  MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE is usually better. (Adrien Grand)
7316
7317* LUCENE-5924: Rename CheckIndex -fix option to -exorcise. This option does not
7318  actually fix the index, it just drops data.  (Robert Muir)
7319
7320* LUCENE-5969: Add Codec.compoundFormat, which handles the encoding of compound
7321  files. Add getMergeInstance() to codec producer APIs, which can be overridden
7322  to return an instance optimized for merging instead of searching. Add
7323  Terms.getStats() which can return additional codec-specific statistics about a field.
7324  Change instance method SegmentInfos.read() to two static methods: SegmentInfos.readCommit()
7325  and SegmentInfos.readLatestCommit().
7326  (Mike McCandless, Robert Muir)
7327
7328* LUCENE-5992: Remove FieldInfos from SegmentInfosWriter.write API. (Robert Muir, Mike McCandless)
7329
7330* LUCENE-5998: Simplify Field/SegmentInfoFormat to read+write methods.
7331  (Robert Muir)
7332
7333* LUCENE-6000: Removed StandardTokenizerInterface.  Tokenizers now use
7334  their jflex impl directly.
7335  (Ryan Ernst)
7336
* LUCENE-6006: Removed FieldInfo.normType since it's redundant: it
  will be DocValuesType.NUMERIC if the field is indexed and does not omit
  norms, else null.  (Robert Muir, Mike McCandless)
7340
7341* LUCENE-6013: Removed indexed boolean from IndexableFieldType and
7342  FieldInfo, since it's redundant with IndexOptions != null. (Robert
7343  Muir, Mike McCandless)
7344
7345* LUCENE-6021: FixedBitSet.nextSetBit now returns DocIdSetIterator.NO_MORE_DOCS
7346  instead of -1 when there are no more bits which are set. (Adrien Grand)
7347
* LUCENE-5953: Directory and LockFactory APIs were restructured: Locking is
  now the responsibility of the Directory implementation. LockFactory is
  only used by subclasses of BaseDirectory to delegate locking to an impl
  class. LockFactories are now singletons and are responsible for creating a Lock
  instance based on a Directory implementation passed to the factory method.
  See MIGRATE.txt for more details.  (Uwe Schindler, Robert Muir)
7354
7355* LUCENE-6062: Throw exception instead of silently doing nothing if you try to
7356  sort/group/etc on a misconfigured field (e.g. no docvalues, no UninvertingReader, etc).
7357  (Robert Muir)
7358
7359* LUCENE-6068: LeafReader.fields() never returns null. (Robert Muir)
7360
7361* LUCENE-6082: Remove abort() from codec apis. (Robert Muir)
7362
7363* LUCENE-6084: IndexOutput's constructor now requires a String
7364  resourceDescription so its toString is sane (Robert Muir, Mike
7365  McCandless)
7366
7367* LUCENE-6087: Allow passing custom DirectoryReader to SearcherManager
7368  (Mike McCandless)
7369
7370* LUCENE-6085: Undeprecate SegmentInfo attributes, but add safety so they
7371  won't be trappy if codec tries to use them during docvalues updates.
7372  (Robert Muir)
7373
7374* LUCENE-6097: Remove dangerous / overly expert
7375  IndexWriter.abortMerges and waitForMerges methods.  (Robert Muir,
7376  Mike McCandless)
7377
7378* LUCENE-6099: Add FilterDirectory.unwrap and
7379  FilterDirectoryReader.unwrap (Simon Willnauer, Mike McCandless)
7380
7381* LUCENE-6121: CachingTokenFilter.reset() now propagates to its input if called before
7382  incrementToken().  You must call reset() now on this filter instead of doing it a-priori on the
7383  input(), which previously didn't work.  (David Smiley, Robert Muir)
7384
7385* LUCENE-6147: Make the core Accountables.namedAccountable function public
7386  (Ryan Ernst)
7387
7388* LUCENE-6150: Remove staleFiles set and onIndexOutputClosed() from FSDirectory.
7389  (Uwe Schindler, Robert Muir, Mike McCandless)
7390
7391* LUCENE-6146: Replaced Directory.copy() with Directory.copyFrom().
7392  (Robert Muir)
7393
7394* LUCENE-6149: Infix suggesters' highlighting and allTermsRequired can
7395  be set at the constructor for non-contextual lookup.
7396  (Boon Low, Tomás Fernández Löbbe)
7397
7398* LUCENE-6158, LUCENE-6165: IndexWriter.addIndexes(IndexReader...) changed to
7399  addIndexes(CodecReader...) (Robert Muir)
7400
7401* LUCENE-6179: Out-of-order scoring is not allowed anymore, so
7402  Weight.scoresDocsOutOfOrder and LeafCollector.acceptsDocsOutOfOrder have been
7403  removed and boolean queries now always score in order.
7404
7405* LUCENE-6212: IndexWriter no longer accepts per-document Analyzer to
7406  add/updateDocument.  These methods were trappy as they made it
7407  easy to accidentally index tokens that were not easily
7408  searchable. (Mike McCandless)
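
  Illustration for LUCENE-6021 (not part of the Lucene sources): a small
  sketch of how an iteration loop changes when the end-of-bits sentinel is
  NO_MORE_DOCS rather than -1. It uses java.util.BitSet as a stand-in, and
  the class SentinelBitSetSketch is hypothetical. The real FixedBitSet has
  its own argument checks that are not reproduced here.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical stand-in built on java.util.BitSet; not Lucene's FixedBitSet. */
final class SentinelBitSetSketch {

  /** Same value as DocIdSetIterator.NO_MORE_DOCS. */
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final BitSet bits = new BitSet();

  void set(int index) {
    bits.set(index);
  }

  /** Returns the next set bit at or after index, or NO_MORE_DOCS (never -1). */
  int nextSetBit(int index) {
    int next = bits.nextSetBit(index);
    return next < 0 ? NO_MORE_DOCS : next;
  }

  /** Collects all set bits; the loop terminates on NO_MORE_DOCS, not on -1. */
  List<Integer> docs() {
    List<Integer> out = new ArrayList<>();
    for (int doc = nextSetBit(0); doc != NO_MORE_DOCS; doc = nextSetBit(doc + 1)) {
      out.add(doc);
    }
    return out;
  }

  public static void main(String[] args) {
    SentinelBitSetSketch s = new SentinelBitSetSketch();
    s.set(3);
    s.set(10);
    System.out.println(s.docs());
  }
}
```

  Callers that previously compared the result against -1 need to compare
  against the sentinel instead, which matches the convention used by doc ID
  iterators.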
7409
7410Bug Fixes
7411
7412* LUCENE-5650: Enforce read-only access to any path outside the temporary
7413  folder via security manager, and make test temp dirs absolute.
7414  (Ryan Ernst, Dawid Weiss)
7415
7416* LUCENE-5948: RateLimiter now fully inits itself on init.  (Varun
7417  Thacker via Mike McCandless)
7418
7419* LUCENE-5981: CheckIndex obtains write.lock, since with some parameters it
7420  may modify the index, and to prevent false corruption reports, as it does
7421  not have the regular "spinlock" of DirectoryReader.open. It now implements
7422  Closeable and you must close it to release the lock.  (Mike McCandless, Robert Muir)
7423
7424* LUCENE-6004: Don't highlight the LookupResult.key returned from
7425  AnalyzingInfixSuggester (Christian Reuschling, jane chang via Mike McCandless)
7426
7427* LUCENE-5980: Don't let document length overflow. (Robert Muir)
7428
7429* LUCENE-5961: Fix the exists() method for FunctionValues returned by many ValueSources to
7430  behave properly when wrapping other ValueSources which do not exist for the specified document
7431  (hossman)
7432
7433* LUCENE-6039: Add IndexOptions.NONE and DocValuesType.NONE instead of
7434  using null to mean not index and no doc values, renamed
7435  IndexOptions.DOCS_ONLY to DOCS, and pulled IndexOptions and
7436  DocValues out of FieldInfo into their own classes in
7437  org.apache.lucene.index (Simon Willnauer, Robert Muir, Mike
7438  McCandless)
7439
7440* LUCENE-6041: Remove sugar methods FieldInfo.isIndexed and
7441  FieldInfo.hasDocValues.  (Robert Muir, Mike McCandless)
7442
7443* LUCENE-6044: Fix backcompat support for token filters with enablePositionIncrements=false.
7444  Also fixed backcompat for TrimFilter with updateOffsets=true.  These options
7445  are supported with a match version before 4.4, and no longer valid at all with 5.0.
7446  (Ryan Ernst)
7447
7448* LUCENE-6042: CustomScoreQuery explain was incorrect in some cases,
7449  such as when nested inside a boolean query. (Denis Lantsman via Robert Muir)
7450
7451* LUCENE-6046: Add maxDeterminizedStates safety to determinize (which has
7452  an exponential worst case) so that if it would create too many states, it
7453  now throws an exception instead of exhausting CPU/RAM.  (Nik
7454  Everett via Mike McCandless)
7455
7456* LUCENE-6054: Allow repeating the empty automaton (Nik Everett via
7457  Mike McCandless)
7458
7459* LUCENE-6049: Don't throw cryptic exception writing a segment when
7460  the only docs in it had fields that hit non-aborting exceptions
7461  during indexing but also had doc values.  (Mike McCandless)
7462
7463* LUCENE-6055: PayloadAttribute.clone() now does a deep clone of the underlying
7464  bytes. (Shai Erera)
7465
7466* LUCENE-6060: Remove dangerous IndexWriter.unlock method (Simon
7467  Willnauer, Mike McCandless)
7468
7469* LUCENE-6062: Pass correct fieldinfos to docvalues producer when the
7470  segment has updates. (Mike McCandless, Shai Erera, Robert Muir)
7471
7472* LUCENE-6075: Don't overflow int in SimpleRateLimiter (Boaz Leskes
7473  via Mike McCandless)
7474
7475* LUCENE-5987: IndexWriter will now forcefully close itself on
7476  aborting exception (an exception that would otherwise cause silent
7477  data loss).  (Robert Muir, Mike McCandless)
7478
7479* LUCENE-6094: Allow IW.rollback to stop ConcurrentMergeScheduler even
7480  when it's stalling because there are too many merges. (Mike McCandless)
7481
7482* LUCENE-6105: Don't cache FST root arcs if the number of root arcs is
7483  small, or if the cache would be > 20% of the size of the FST.
7484  (Robert Muir, Mike McCandless)
7485
7486* LUCENE-6124: Fix double-close() problems in codec and store APIs.
7487  (Robert Muir)
7488
7489* LUCENE-6152: Fix double close problems in OutputStreamIndexOutput.
7490  (Uwe Schindler)
7491
7492* LUCENE-6139: Highlighter: TokenGroup start & end offset getters should have
7493  been returning the offsets of just the matching tokens in the group when
7494  there's a distinction. (David Smiley)
7495
7496* LUCENE-6173: NumericTermAttribute and spatial/CellTokenStream do not clone
7497  their BytesRef(Builder)s. Also equals/hashCode was missing.  (Uwe Schindler)
7498
7499* LUCENE-6205: Fixed intermittent concurrency issue that could cause
7500  FileNotFoundException when writing doc values updates at the same
7501  time that a merge kicks off.  (Mike McCandless)
7502
7503* LUCENE-6192: Fix int overflow corruption case in skip data for
7504  high frequency terms in extremely large indices (Robert Muir, Mike
7505  McCandless)
7506
7507* LUCENE-6093: Don't throw NullPointerException from
7508  BlendedInfixSuggester for lookups that do not end in a prefix
7509  token.  (jane chang via Mike McCandless)
7510
7511* LUCENE-6214: Fixed IndexWriter deadlock when one thread is
7512  committing while another opens a near-real-time reader and an
7513  unrecoverable (tragic) exception is hit.  (Simon Willnauer, Mike
7514  McCandless)
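
  Illustration for LUCENE-6055 (not part of the Lucene sources): a minimal
  sketch of why a deep clone of the underlying bytes matters when the
  original buffer is later reused. The class PayloadCloneSketch and its
  methods are hypothetical and only model the shallow-versus-deep
  distinction; they are not the PayloadAttribute implementation.

```java
import java.util.Arrays;

/** Hypothetical payload holder; not Lucene's PayloadAttribute implementation. */
final class PayloadCloneSketch {

  final byte[] bytes;

  PayloadCloneSketch(byte[] bytes) {
    this.bytes = bytes;
  }

  /** Shallow copy: the copy shares the original's byte[]. */
  PayloadCloneSketch shallowCopy() {
    return new PayloadCloneSketch(bytes);
  }

  /** Deep copy: the copy owns a fresh byte[] with the same contents. */
  PayloadCloneSketch deepCopy() {
    return new PayloadCloneSketch(Arrays.copyOf(bytes, bytes.length));
  }

  public static void main(String[] args) {
    PayloadCloneSketch original = new PayloadCloneSketch(new byte[] {1, 2, 3});
    PayloadCloneSketch shallow = original.shallowCopy();
    PayloadCloneSketch deep = original.deepCopy();

    original.bytes[0] = 42; // e.g. a producer overwrites its reused buffer

    System.out.println("shallow sees " + shallow.bytes[0]);
    System.out.println("deep sees " + deep.bytes[0]);
  }
}
```

  In the sketch the shallow copy observes the later write while the deep
  copy keeps the value captured at clone time.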
7515
7516Documentation
7517
7518* LUCENE-5392: Add/improve analysis package documentation to reflect
7519  analysis API changes.  (Benson Margulies via Robert Muir - pull request #17)
7520
7521* LUCENE-6057: Improve Sort(SortField) docs (Martin Braun via Mike McCandless)
7522
7523* LUCENE-6112: Fix compile error in FST package example code
7524  (Tomoko Uchida via Koji Sekiguchi)
7525
7526Tests
7527
7528* LUCENE-5957: Add option for tests to not randomize codec
7529  (Ryan Ernst)
7530
7531* LUCENE-5974: Add check that backcompat indexes use default codecs
7532  (Ryan Ernst)
7533
7534* LUCENE-5971: Create addBackcompatIndexes.py script to build and add
7535  backcompat test indexes for a given lucene version. Also renamed backcompat
7536  index files to use Version.toString() in filename.
7537  (Ryan Ernst)
7538
7539* LUCENE-6002: Monster tests no longer fail.  Most of them now have an 80 hour
7540  timeout, effectively removing the timeout.  The tests that operate near the 2
7541  billion limit now use IndexWriter.MAX_DOCS instead of Integer.MAX_VALUE.
7542  Some of the slow Monster tests now explicitly choose the default codec.
7543  (Mike McCandless, Shawn Heisey)
7544
7545* LUCENE-5968: Improve error message when 'ant beast' is run on top-level
7546  modules.  (Ramkumar Aiyengar, Uwe Schindler)
7547
7548* LUCENE-6120: Fix MockDirectoryWrapper's close() handling.
7549  (Mike McCandless, Robert Muir)
7550
7551Build
7552
7553* LUCENE-5909: Smoke tester now has better command line parsing and
7554  optionally also runs on Java 8.  (Ryan Ernst, Uwe Schindler)
7555
7556* LUCENE-5902: Add bumpVersion.py script to manage version increase after release branch is cut.
7557
7558* LUCENE-5962: Rename diffSources.py to createPatch.py and make it work with all text file types.
7559  (Ryan Ernst)
7560
7561* LUCENE-5995: Upgrade ICU to 54.1 (Robert Muir)
7562
7563* LUCENE-6070: Upgrade forbidden-apis to 1.7 (Uwe Schindler)
7564
7565Other
7566
7567* LUCENE-5563: Removed sep layout, which has fallen behind on features and doesn't
7568  perform as well as other options.  (Robert Muir)
7569
7570* LUCENE-4086: Removed support for Lucene 3.x indexes. See migration guide for
7571  more information.  (Robert Muir)
7572
7573* LUCENE-5858: Moved Lucene 4 compatibility codecs to 'lucene-backward-codecs.jar'.
7574  (Adrien Grand, Robert Muir)
7575
7576* LUCENE-5915: Remove Pulsing postings format. (Robert Muir)
7577
7578* LUCENE-6213: Add useful exception message when commit contains segments from legacy codecs.
7579  (Ryan Ernst)
7580
7581======================= Lucene 4.10.4 ======================
7582
7583Bug fixes
7584
7585* LUCENE-6019, LUCENE-6117: Remove -Dtests.assert to make IndexWriter
7586  infoStream sane.  (Robert Muir, Mike McCandless)
7587
7588* LUCENE-6161: Resolving deletes was failing to reuse DocsEnum, likely
7589  causing substantial performance cost for use cases that frequently
7590  delete old documents (Mike McCandless)
7591
7592* LUCENE-6192: Fix int overflow corruption case in skip data for
7593  high frequency terms in extremely large indices (Robert Muir, Mike
7594  McCandless)
7595
7596* LUCENE-6207: Fixed consumption of several terms enums on the same
7597  sorted (set) doc values instance at the same time.
7598  (Tom Shally, Robert Muir, Adrien Grand)
7599
7600* LUCENE-6093: Don't throw NullPointerException from
7601  BlendedInfixSuggester for lookups that do not end in a prefix
7602  token.  (jane chang via Mike McCandless)
7603
7604* LUCENE-6279: Don't let an abusive leftover _N_upgraded.si in the
7605  index directory cause index corruption on upgrade (Robert Muir, Mike
7606  McCandless)
7607
7608* LUCENE-6287: Fix concurrency bug in IndexWriter that could cause
7609  index corruption (missing _N.si files) the first time 4.x kisses a
7610  3.x index if merges are also running.  (Simon Willnauer, Mike
7611  McCandless)
7612
7613* LUCENE-6205: Fixed intermittent concurrency issue that could cause
7614  FileNotFoundException when writing doc values updates at the same
7615  time that a merge kicks off.  (Mike McCandless)
7616
7617* LUCENE-6214: Fixed IndexWriter deadlock when one thread is
7618  committing while another opens a near-real-time reader and an
7619  unrecoverable (tragic) exception is hit.  (Simon Willnauer, Mike
7620  McCandless)
7621
7622* LUCENE-6105: Don't cache FST root arcs if the number of root arcs is
7623  small, or if the cache would be > 20% of the size of the FST.
7624  (Robert Muir, Mike McCandless)
7625
7626* LUCENE-6001: DrillSideways hits NullPointerException for certain
7627  BooleanQuery searches.  (Dragan Jotannovic, jane chang via Mike
7628  McCandless)
7629
7630* LUCENE-6306: Merging of doc values and norms now checks whether the
7631  merge was aborted so IndexWriter.rollback can more promptly abort a
7632  running merge. (Robert Muir, Mike McCandless)
7633
7634API Changes
7635
7636* LUCENE-6212: Deprecate IndexWriter APIs that accept per-document Analyzer.
7637  These methods were trappy as they made it easy to accidentally index
7638  tokens that were not easily searchable and will be removed in 5.0.0.
7639  (Mike McCandless)
7640
7641======================= Lucene 4.10.3 ======================
7642
7643Bug fixes
7644
7645* LUCENE-6046: Add maxDeterminizedStates safety to determinize (which has
7646  an exponential worst case) so that if it would create too many states, it
7647  now throws an exception instead of exhausting CPU/RAM.  (Nik
7648  Everett via Mike McCandless)
7649
7650* LUCENE-6054: Allow repeating the empty automaton (Nik Everett via
7651  Mike McCandless)
7652
7653* LUCENE-6049: Don't throw cryptic exception writing a segment when
7654  the only docs in it had fields that hit non-aborting exceptions
7655  during indexing but also had doc values.  (Mike McCandless)
7656
7657* LUCENE-6060: Deprecate IndexWriter.unlock (Simon Willnauer, Mike
7658  McCandless)
7659
7660* LUCENE-3229: Overlapping ordered SpanNearQuery spans should not match.
7661  (Ludovic Boutros, Paul Elschot, Greg Dearing, ehatcher)
7662
7663* LUCENE-6004: Don't highlight the LookupResult.key returned from
7664  AnalyzingInfixSuggester (Christian Reuschling, jane chang via Mike McCandless)
7665
7666* LUCENE-6075: Don't overflow int in SimpleRateLimiter (Boaz Leskes
7667  via Mike McCandless)
7668
7669* LUCENE-5980: Don't let document length overflow. (Robert Muir)
7670
7671* LUCENE-6042: CustomScoreQuery explain was incorrect in some cases,
7672  such as when nested inside a boolean query. (Denis Lantsman via Robert Muir)
7673
7674* LUCENE-5948: RateLimiter now fully inits itself on init.  (Varun
7675  Thacker via Mike McCandless)
7676
7677* LUCENE-6055: PayloadAttribute.clone() now does a deep clone of the underlying
7678  bytes. (Shai Erera)
7679
7680* LUCENE-6094: Allow IW.rollback to stop ConcurrentMergeScheduler even
7681  when it's stalling because there are too many merges. (Mike McCandless)
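
For illustration, the state cap described in LUCENE-6046 above can be sketched
with a textbook subset construction that aborts once a caller-supplied budget
is exceeded. This is a minimal sketch, not Lucene's implementation: the class
BoundedDeterminize, the exception type, the NFA representation and the
lastButN helper are hypothetical names chosen for this example, and the method
only counts DFA states rather than building the automaton.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: subset construction with a hard limit on DFA states.
final class BoundedDeterminize {

  /** Hypothetical exception signalling that the state budget was exhausted. */
  static final class TooManyStatesException extends RuntimeException {
    TooManyStatesException(int maxStates) {
      super("determinizing would create more than " + maxStates + " states");
    }
  }

  /**
   * Counts the DFA states reachable from {@code start}. Each NFA state maps
   * an input symbol to the set of states it may move to.
   */
  static int determinize(List<Map<Character, Set<Integer>>> nfa, int start, int maxStates) {
    Map<Set<Integer>, Integer> dfaIds = new HashMap<>();
    Deque<Set<Integer>> pending = new ArrayDeque<>();
    Set<Integer> initial = new TreeSet<>(List.of(start));
    dfaIds.put(initial, 0);
    pending.add(initial);

    while (!pending.isEmpty()) {
      Set<Integer> current = pending.poll();
      // Union the targets of every NFA state in this subset, per symbol.
      Map<Character, Set<Integer>> targets = new HashMap<>();
      for (int state : current) {
        for (Map.Entry<Character, Set<Integer>> t : nfa.get(state).entrySet()) {
          targets.computeIfAbsent(t.getKey(), k -> new TreeSet<>()).addAll(t.getValue());
        }
      }
      for (Set<Integer> target : targets.values()) {
        if (!dfaIds.containsKey(target)) {
          // Fail fast instead of exhausting CPU/RAM on exponential blow-up.
          if (dfaIds.size() >= maxStates) {
            throw new TooManyStatesException(maxStates);
          }
          dfaIds.put(target, dfaIds.size());
          pending.add(target);
        }
      }
    }
    return dfaIds.size();
  }

  /** Builds an NFA for (a|b)*a(a|b)^n, whose DFA needs 2^(n+1) states. */
  static List<Map<Character, Set<Integer>>> lastButN(int n) {
    List<Map<Character, Set<Integer>>> nfa = new ArrayList<>();
    for (int i = 0; i <= n + 1; i++) {
      nfa.add(new HashMap<>());
    }
    nfa.get(0).put('a', new TreeSet<>(List.of(0, 1)));
    nfa.get(0).put('b', new TreeSet<>(List.of(0)));
    for (int i = 1; i <= n; i++) {
      nfa.get(i).put('a', new TreeSet<>(List.of(i + 1)));
      nfa.get(i).put('b', new TreeSet<>(List.of(i + 1)));
    }
    return nfa;
  }
}
```

Calling determinize(lastButN(10), 0, 100) in this sketch throws, because the
equivalent DFA needs 2048 states; a small automaton with a budget of 8 or
more states completes normally.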
7682
7683Documentation
7684
7685* LUCENE-6057: Improve Sort(SortField) docs (Martin Braun via Mike McCandless)
7686
7687======================= Lucene 4.10.2 ======================
7688
7689Bug fixes
7690
7691* LUCENE-5977: Fix tokenstream safety checks in IndexWriter to properly
7692  work across multi-valued fields. Previously some cases across multi-valued
7693  fields would happily create a corrupt index. (Dawid Weiss, Robert Muir)
7694
7695* LUCENE-6019: Detect when DocValuesType illegally changes for the
7696  same field name.  Also added -Dtests.asserts=true|false so we can
7697  run tests with and without assertions. (Simon Willnauer, Robert
7698  Muir, Mike McCandless).
7699
7700======================= Lucene 4.10.1 ======================
7701
7702Bug fixes
7703
7704* LUCENE-5934: Fix backwards compatibility for 4.0 indexes.
7705  (Ian Lea, Uwe Schindler, Robert Muir, Ryan Ernst)
7706
7707* LUCENE-5939: Regenerate old backcompat indexes to ensure they were built with
7708  the exact release
7709  (Ryan Ernst, Uwe Schindler)
7710
7711* LUCENE-5952: Improve error messages when version cannot be parsed;
7712  don't check for too old or too new major version (it's too low level
7713  to enforce here); use simple string tokenizer.  (Ryan Ernst, Uwe Schindler,
7714  Robert Muir, Mike McCandless)
7715
7716* LUCENE-5958: Don't let exceptions during checkpoint corrupt the index.
7717  Refactor existing OOM handling too, so you don't need to handle OOM specially
7718  for every IndexWriter method: instead such disasters will cause IW to close itself
7719  defensively. (Robert Muir, Mike McCandless)
7720
7721* LUCENE-5904: Fixed a corruption case that can happen when 1)
7722  IndexWriter is uncleanly shut-down (OS crash, power loss, etc.), 2)
7723  on startup, when a new IndexWriter is created, a virus checker is
7724  holding some of the previously written but unused files open and
7725  preventing deletion, 3) IndexWriter writes these files again during
7726  the course of indexing, then the files can later be deleted, causing
7727  corruption.  This case was detected by adding evilness to
7728  MockDirectoryWrapper to have it simulate a virus checker holding a
7729  file open and preventing deletion (Robert Muir, Mike McCandless)
7730
7731* LUCENE-5916: Static scope test components should be consistent between
7732  tests (and test iterations). Fix for FaultyIndexInput in particular.
7733  (Dawid Weiss)
7734
7735* LUCENE-5975: Fix reading of 3.0-3.3 indexes, where bugs in these old
7736  index formats would result in CorruptIndexException "did not read all
7737  bytes from file" when reading the deleted docs file. (Patrick Mi, Robert Muir)
7738
7739Tests
7740
7741* LUCENE-5936: Add backcompat checks to verify what is tested matches known versions
7742  (Ryan Ernst)
7743
7744======================= Lucene 4.10.0 ======================
7745
7746New Features
7747
7748* LUCENE-5778: Support hunspell morphological description fields/aliases.
7749  (Robert Muir)
7750
7751* LUCENE-5801: Added (back) OrdinalMappingAtomicReader for merging search
7752  indexes that contain category ordinals from separate taxonomy indexes.
7753  (Nicola Buso via Shai Erera)
7754
7755* LUCENE-4175, LUCENE-5714, LUCENE-5779: Index and search rectangles with spatial
7756  BBoxSpatialStrategy using most predicates.  Sort documents by relative overlap
7757  of query areas or just by indexed shape area. (Ryan McKinley, David Smiley)
7758
7759* LUCENE-5806: Extend expressions grammar to support array access in variables.
7760  Added helper class VariableContext to parse complex variable into pieces.
7761  (Ryan Ernst)
7762
7763* LUCENE-5826: Support proper hunspell case handling, LANG, KEEPCASE, NEEDAFFIX,
7764  and ONLYINCOMPOUND flags.  (Robert Muir)
7765
7766* LUCENE-5815: Add TermAutomatonQuery, a proximity query allowing you
7767  to create an arbitrary automaton, using terms on the transitions,
7768  expressing which sequences of consecutive terms (including a special
7769  "any" term) are allowed.  This is a generalization of
7770  MultiPhraseQuery and span queries, and enables "correct" (including
7771  position length) search-time graph synonyms.  (Mike McCandless)
7772
7773* LUCENE-5819: Add OrdsLucene41 block tree terms dict and postings
7774  format, to include term ordinals in the index so the optional
7775  TermsEnum.ord() and TermsEnum.seekExact(long ord) APIs work.  (Mike
7776  McCandless)
7777
7778* LUCENE-5835: TermValComparator can sort missing values last. (Adrien Grand)
7779
7780* LUCENE-5825: Benchmark module can use custom postings format, e.g.:
7781  codec.postingsFormat=Memory (Varun Shenoy, David Smiley)
7782
7783* LUCENE-5842: When opening large files (where it's too expensive to compare
7784  checksum against all the bytes), retrieve checksum to validate structure
7785  of footer; this can detect some forms of corruption such as truncation.
7786  (Robert Muir)
7787
7788* LUCENE-5739: Added DataInput.readZ(Int|Long) and DataOutput.writeZ(Int|Long)
7789  to read and write small signed integers. (Adrien Grand)
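
As a rough illustration of why a "Z" encoding helps for small signed integers
(LUCENE-5739 above), the sketch below pairs zig-zag encoding with a
7-bits-per-byte length calculation. It is a standalone sketch under stated
assumptions, not the DataInput/DataOutput source: the class ZigZagSketch and
all of its method names are hypothetical.

```java
// Hypothetical sketch of zig-zag plus variable-length sizing; not Lucene's source.
final class ZigZagSketch {

  /** Maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... */
  static int encodeInt(int i) {
    return (i >> 31) ^ (i << 1);
  }

  /** Inverse of {@link #encodeInt(int)}. */
  static int decodeInt(int z) {
    return (z >>> 1) ^ -(z & 1);
  }

  /** Same mapping for 64-bit values. */
  static long encodeLong(long l) {
    return (l >> 63) ^ (l << 1);
  }

  /** Inverse of {@link #encodeLong(long)}. */
  static long decodeLong(long z) {
    return (z >>> 1) ^ -(z & 1L);
  }

  /** Bytes needed when an int is written 7 bits at a time, treated as unsigned. */
  static int varIntBytes(int v) {
    int bytes = 1;
    while ((v & ~0x7F) != 0) {
      v >>>= 7;
      bytes++;
    }
    return bytes;
  }

  public static void main(String[] args) {
    for (int value : new int[] {0, -1, 1, -64, 64, Integer.MIN_VALUE}) {
      int z = encodeInt(value);
      System.out.println(value + " -> " + z
          + " (" + varIntBytes(value) + " raw bytes, "
          + varIntBytes(z) + " zig-zag bytes)");
    }
  }
}
```

Without the zig-zag step, any negative int has its high bit set and needs the
maximum number of bytes in a 7-bit variable-length encoding; after the
mapping, values close to zero stay short regardless of sign.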
7790
7791API Changes
7792
7793* LUCENE-5752: Simplified Automaton API to be immutable. (Mike McCandless)
7794
7795* LUCENE-5793: Add equals/hashCode to FieldType. (Shay Banon, Robert Muir)
7796
7797* LUCENE-5692: DisjointSpatialFilter is deprecated (used by RecursivePrefixTreeStrategy)
7798  (David Smiley)
7799
7800* LUCENE-5771: SpatialOperation's predicate names are now aliased to OGC standard names.
7801  Thus you can use: Disjoint, Equals, Intersects, Overlaps, Within, Contains, Covers,
7802  CoveredBy. The area requirement on the predicates was removed, and Overlaps' definition
7803  was fixed. (David Smiley)
7804
7805* LUCENE-5850: Made Version handling more robust and extensible. Deprecated
7806  Constants.LUCENE_MAIN_VERSION, Constants.LUCENE_VERSION and current Version
7807  constants of the form LUCENE_X_Y. Added version constants that include bugfix
7808  number of form LUCENE_X_Y_Z.  Changed Version.LUCENE_CURRENT to Version.LATEST.
7809  CheckIndex now prints the Lucene version used to write each segment.
7810  (Ryan Ernst, Uwe Schindler, Robert Muir, Mike McCandless)
7811
7812* LUCENE-5836: BytesRef has been split into BytesRef, whose intended usage is
7813  to be just a reference to a section of a larger byte[] and BytesRefBuilder
7814  which is a StringBuilder-like class for BytesRef instances. (Adrien Grand)
7815
7816* LUCENE-5883: You can now change the MergePolicy instance on a live IndexWriter,
7817  without first closing and reopening the writer. This allows you to, e.g., run a special
7818  merge with UpgradeIndexMergePolicy without reopening the writer. Also, MergePolicy
7819  no longer implements Closeable; if you need to release your custom MergePolicy's
7820  resources, you need to implement close() and call it explicitly. (Shai Erera)
7821
7822* LUCENE-5859: Deprecate Analyzer constructors taking Version.  Use Analyzer.setVersion()
7823  to set the version of an analyzer to replicate behavior from a specific release.
7824  (Ryan Ernst, Robert Muir)
7825
7826
7827Optimizations
7828
7829* LUCENE-5780: Make OrdinalMap more memory-efficient, especially in case the
7830  first segment has all values. (Adrien Grand, Robert Muir)
7831
7832* LUCENE-5782: OrdinalMap now sorts enums before being built in order to
7833  improve compression. (Adrien Grand)
7834
7835* LUCENE-5798: Optimize MultiDocsEnum reuse. (Robert Muir)
7836
7837* LUCENE-5799: Optimize numeric docvalues merging. (Robert Muir)
7838
7839* LUCENE-5797: Optimize norms merging (Adrien Grand, Robert Muir)
7840
7841* LUCENE-5803: Add DelegatingAnalyzerWrapper, an optimized variant
7842  of AnalyzerWrapper that doesn't allow wrapping components or readers.
7843  This wrapper class is the base class of all analyzers that just delegate
7844  to another analyzer, e.g. per field name: PerFieldAnalyzerWrapper and
7845  Solr's schema support.  (Shay Banon, Uwe Schindler, Robert Muir)
7846
7847* LUCENE-5795: MoreLikeThisQuery now only collects the top N terms instead
7848  of collecting all terms from the like text when building the query.
7849  (Alex Ksikes, Simon Willnauer)
7850
7851* LUCENE-5681: Fix RAMDirectory's IndexInput to not do double buffering
7852  on slices (causes useless data copying, especially on random access slices).
7853  This also improves slices of NRTCachingDirectory, because the cache
7854  is based on RAMDirectory. BufferedIndexInput.wrap() was marked with a
7855  warning in javadocs. It is almost always a better idea to implement
7856  slicing on your own!  (Uwe Schindler, Robert Muir)
7857
7858* LUCENE-5834: Empty sorted set and numeric doc values are now singletons.
7859  (Adrien Grand)
7860
7861* LUCENE-5841: Improve performance of block tree terms dictionary when
7862  assigning terms to blocks.  (Mike McCandless)
7863
7864* LUCENE-5856: Optimize Fixed/Open/LongBitSet to remove unnecessary AND.
7865  (Robert Muir)
7866
7867* LUCENE-5884: Optimize FST.ramBytesUsed.  (Adrien Grand, Robert Muir,
7868  Mike McCandless)
7869
7870* LUCENE-5882: Add Lucene410DocValuesFormat, with faster term lookups
7871  for SORTED/SORTED_SET fields.  (Robert Muir)
7872
7873* LUCENE-5887: Remove WeakIdentityMap caching in AttributeFactory,
7874  AttributeSource, and VirtualMethod in favour of Java 7's ClassValue.
7875  Always use MethodHandles to create AttributeImpl classes.
7876  (Uwe Schindler)
7877
7878Bug Fixes
7879
7880* LUCENE-5796: Fixes the Scorer.getChildren() method for two combinations
7881  of BooleanQuery. (Terry Smith via Robert Muir)
7882
7883* LUCENE-5790: Fix compareTo in MutableValueDouble and MutableValueBool; this caused
7884  incorrect results when grouping on fields with missing values.
7885  (海老澤 志信, hossman)
7886
7887* LUCENE-5817: Fix hunspell zero-affix handling: previously only zero-strips worked
7888  correctly.  (Robert Muir)
7889
7890* LUCENE-5818, LUCENE-5823: Fix hunspell overgeneration for short strings that also
7891  match affixes; words are only stripped to a zero-length string if the FULLSTRIP option
7892  is specified in the dictionary.  (Robert Muir)
7893
7894* LUCENE-5824: Fix hunspell 'long' flag handling. (Robert Muir)
7895
7896* LUCENE-5838: Fix hunspell when the .aff file has over 64k affixes. (Robert Muir)
7897
7898* LUCENE-5869: Added restriction to positive values for maxExpansions in
7899  FuzzyQuery.  (Ryan Ernst)
7900
7901* LUCENE-5672: IndexWriter.addIndexes() calls maybeMerge(), to ensure the index stays
7902  healthy. If you don't want merging use NoMergePolicy instead. (Robert Muir)
7903
7904* LUCENE-5908: Fix Lucene43NGramTokenizer to be final
7905
7906Test Framework
7907
7908* LUCENE-5786: Unflushed/truncated events file (hung testing subprocess).
7909  (Dawid Weiss)
7910
7911* LUCENE-5881: Add "beasting" of tests: repeats the whole "test" Ant target
7912  N times with "ant beast -Dbeast.iters=N".  (Uwe Schindler, Robert Muir,
7913  Ryan Ernst, Dawid Weiss)
7914
7915Build
7916
7917* LUCENE-5770: Upgrade to JFlex 1.6, which has direct support for
7918  supplementary code points - as a result, ICU4J is no longer used
7919  to generate surrogate pairs to augment JFlex scanner specifications.
7920  (Steve Rowe)
7921
7922* SOLR-6358: Remove VcsDirectoryMappings from idea configuration
7923  vcs.xml (Ramkumar Aiyengar via Steve Rowe)
7924
7925======================= Lucene 4.9.1 ======================
7926
7927Bug fixes
7928
7929* LUCENE-5907: Fix corruption case when opening a pre-4.x index with
7930  IndexWriter, then opening an NRT reader from that writer, then
7931  calling commit from the writer, then closing the NRT reader.  This
7932  case would remove the wrong files from the index leading to a
7933  corrupt index.  (Mike McCandless)
7934
7935* LUCENE-5919: Fix exception handling inside IndexWriter when
7936  deleteFile throws an exception, to not over-decRef index files,
7937  possibly deleting a file that's still in use in the index, leading
7938  to corruption.  (Mike McCandless)
7939
7940* LUCENE-5922: DocValuesDocIdSet on 5.x and FieldCacheDocIdSet on 4.x
7941  are not cacheable. (Adrien Grand)
7942
7943* LUCENE-5843: Added IndexWriter.MAX_DOCS which is the maximum number
7944  of documents allowed in a single index, and any operations that add
7945  documents will now throw IllegalStateException if the max count
7946  would be exceeded, instead of silently creating an unusable
7947  index.  (Mike McCandless)
7948
7949* LUCENE-5844: ArrayUtil.grow/oversize now returns a maximum of
7950  Integer.MAX_VALUE - 8 for the maximum array size.  (Robert Muir,
7951  Mike McCandless)
7952
7953* LUCENE-5827: Make all Directory implementations correctly fail with
7954  IllegalArgumentException if slices are out of bounds.  (Uwe Schindler)
7955
7956* LUCENE-5897, LUCENE-5400: JFlex-based tokenizers StandardTokenizer and
7957  UAX29URLEmailTokenizer tokenize extremely slowly over long sequences of
7958  text partially matching certain grammar rules.  The scanner default
7959  buffer size was reduced, and scanner buffer growth was disabled, resulting
7960  in much, much faster tokenization for these text sequences.
7961  (Chris Geeringh, Robert Muir, Steve Rowe)
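
The fail-fast behavior described in LUCENE-5843 above (reject the operation
with IllegalStateException rather than silently producing an unusable index)
can be sketched as a simple reservation counter. This is a hypothetical
illustration, not IndexWriter's code: the class DocCountGuard, its methods,
and the constant value shown are assumptions made for the example.

```java
// Hypothetical sketch of a document-count guard; not IndexWriter's actual code.
final class DocCountGuard {

  // Assumed limit for illustration only; consult IndexWriter.MAX_DOCS for the real value.
  static final int MAX_DOCS = Integer.MAX_VALUE - 128;

  private final int limit;
  private long docCount; // long, so the bounds check itself cannot overflow

  DocCountGuard(int limit) {
    this.limit = limit;
  }

  /** Reserves room for {@code numDocs} documents or fails without changing state. */
  void reserve(int numDocs) {
    if (numDocs < 0) {
      throw new IllegalArgumentException("numDocs must be >= 0, got " + numDocs);
    }
    if (docCount + numDocs > limit) {
      throw new IllegalStateException(
          "number of documents in the index cannot exceed " + limit);
    }
    docCount += numDocs;
  }

  long docCount() {
    return docCount;
  }
}
```

Checking before incrementing keeps the counter consistent when the limit is
hit, so a caller can catch the exception and continue using the guard.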
7962
7963======================= Lucene 4.9.0 =======================
7964
7965Changes in Runtime Behavior
7966
7967* LUCENE-5611: Changing the term vector options for multiple field
7968  instances by the same name in one document is no longer accepted;
7969  IndexWriter will now throw IllegalArgumentException.  (Robert Muir,
7970  Mike McCandless)
7971
7972* LUCENE-5646: Remove rare/undertested bulk merge algorithm in
7973  CompressingStoredFieldsWriter. (Robert Muir, Adrien Grand)
7974
7975New Features
7976
7977* LUCENE-5610: Add Terms.getMin and Terms.getMax to get the lowest and
7978  highest terms, and NumericUtils.get{Min/Max}{Int/Long} to get the
7979  minimum numeric values from the provided Terms.  (Robert Muir, Mike
7980  McCandless)
7981
7982* LUCENE-5675: Add IDVersionPostingsFormat, a postings format
7983  optimized for primary-key (ID) fields that also record a version
7984  (long) for each ID.  (Robert Muir, Mike McCandless)
7985
7986* LUCENE-5680: Add ability to atomically update a set of DocValues
7987  fields. (Shai Erera)
7988
7989* LUCENE-5717: Add support for multiterm queries nested inside
7990  filtered and constant-score queries to postings highlighter.
7991  (Luca Cavanna via Robert Muir)
7992
7993* LUCENE-5731, LUCENE-5760: Add RandomAccessInput, a random access API for directory.
7994  Add DirectReader/Writer, optimized for reading packed integers directly
7995  from Directory. Add Lucene49Codec and Lucene49DocValuesFormat that make
7996  use of these.  (Robert Muir)
7997
7998* LUCENE-5743: Add Lucene49NormsFormat, which can compress in some cases
7999  such as very short fields.  (Ryan Ernst, Adrien Grand, Robert Muir)
8000
8001* LUCENE-5748: Add SORTED_NUMERIC docvalues type, which is efficient
8002  for processing numeric fields with multiple values.  (Robert Muir)
8003
8004* LUCENE-5754: Allow "$" as part of variable and function names in
8005  expressions module.  (Uwe Schindler)
8006
8007Changes in Backwards Compatibility Policy
8008
8009* LUCENE-5634: Add reuse argument to IndexableField.tokenStream. This
8010  can be used by custom fieldtypes, which don't use the Analyzer, but
8011  implement their own TokenStream.  (Uwe Schindler, Robert Muir)
8012
8013* LUCENE-5640: AttributeSource.AttributeFactory was moved to a
8014  top-level class: org.apache.lucene.util.AttributeFactory
8015  (Uwe Schindler, Robert Muir)
8016
8017* LUCENE-4371: Removed IndexInputSlicer and Directory.createSlicer() and replaced
8018  with IndexInput.slice(). (Robert Muir)
8019
8020* LUCENE-5727, LUCENE-5678: Remove IndexOutput.seek, IndexOutput.setLength().
8021  (Robert Muir, Uwe Schindler)
8022
8023API Changes
8024
8025* LUCENE-5756: IndexWriter now implements Accountable and IW#ramSizeInBytes()
8026  has been deprecated in favor of IW#ramBytesUsed() (Simon Willnauer)
8027
8028* LUCENE-5725: MoreLikeThis#like now accepts multiple values per field.
8029  The pre-existing method has been deprecated in favor of a varargs variant
8030  for the like text. (Alex Ksikes via Simon Willnauer)
8031
8032* LUCENE-5711: MergePolicy accepts an IndexWriter instance
8033  on each method rather than holding state against a single
8034  IndexWriter instance. (Simon Willnauer)
8035
8036* LUCENE-5582: Deprecate IndexOutput.length (just use
8037  IndexOutput.getFilePointer instead) and IndexOutput.setLength.
8038  (Mike McCandless)
8039
8040* LUCENE-5621: Deprecate IndexOutput.flush: this is not used by Lucene.
8041  (Robert Muir)
8042
8043* LUCENE-5611: Simplified Lucene's default indexing chain / APIs.
8044  AttributeSource/TokenStream.getAttribute now returns null if the
8045  attribute is not present (previously it threw
8046  IllegalArgumentException).  StoredFieldsWriter.startDocument no
8047  longer receives the number of fields that will be added (Robert
8048  Muir, Mike McCandless)
8049
8050* LUCENE-5632: In preparation for coming Lucene versions, the Version
8051  enum constants were renamed to make them more readable. The constant
8052  for Lucene 4.9 is now "LUCENE_4_9". Version.parseLeniently() is still
8053  able to parse the old strings ("LUCENE_49"). The old identifiers got
8054  deprecated and will be removed in Lucene 5.0.  (Uwe Schindler,
8055  Robert Muir)
8056
8057* LUCENE-5633: Change NoMergePolicy to a singleton with no distinction between
8058  compound and non-compound types. (Shai Erera)
8059
8060* LUCENE-5640: The Token class was deprecated. Since Lucene 2.9, TokenStreams
8061  have used Attributes, so Token is no longer used.  (Uwe Schindler, Robert Muir)
8062
8063* LUCENE-5679: Consolidated IndexWriter.deleteDocuments(Term) and
8064  IndexWriter.deleteDocuments(Query) with their varargs counterparts.
8065  (Shai Erera)
8066
8067* LUCENE-5701: Core closed listeners are now available in the AtomicReader API,
8068  they used to sit only in SegmentReader. (Adrien Grand, Robert Muir)
8069
8070* LUCENE-5706: Removed the option to unset a DocValues field through DocValues
8071  updates. (Shai Erera)
8072
8073* LUCENE-5700: Added oal.util.Accountable that is now implemented by all
8074  classes whose memory usage can be estimated. (Robert Muir, Adrien Grand)
8075
8076* LUCENE-5708: Remove IndexWriterConfig.clone, so now IndexWriter
8077  simply uses the IndexWriterConfig you pass it, and you must create a
8078  new IndexWriterConfig for each IndexWriter.  (Mike McCandless)
8079
8080* LUCENE-5678: IndexOutput no longer allows seeking, so it is no longer required
8081  to use RandomAccessFile to write Indexes. Lucene now uses standard FileOutputStream
8082  wrapped with OutputStreamIndexOutput to write index data. BufferedIndexOutput was
8083  removed, because buffering and checksumming are provided by the JDK's
8084  FilterOutputStreams.  (Uwe Schindler, Mike McCandless)
8085
8086* LUCENE-5703: BinaryDocValues API changed to work like TermsEnum and not allocate or
8087  copy bytes on each access; you are responsible for cloning if you want to keep
8088  data around. (Adrien Grand)
8089
8090* LUCENE-5695: DocIdSet implements Accountable. (Adrien Grand)
8091
8092* LUCENE-5757: Moved RamUsageEstimator's reflection-based processing to RamUsageTester
8093  in the test-framework module. (Robert Muir)
8094
8095* LUCENE-5761: Removed DiskDocValuesFormat; it was very inefficient and saved very little
8096  RAM over the default codec. (Robert Muir)
8097
8098* LUCENE-5775: Deprecate JaspellLookup. (Mike McCandless)
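
The cloning responsibility mentioned in LUCENE-5703 above follows from buffer
reuse: when a source returns the same backing array on every access, a caller
that holds on to the returned reference sees it overwritten by the next
access. The sketch below shows that pitfall with a toy class; it is not the
BinaryDocValues API, and SharedBufferSource and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the reuse contract; not the BinaryDocValues API.
final class ReusedBufferSketch {

  /** Toy per-document source that hands out one shared buffer on every call. */
  static final class SharedBufferSource {
    private final byte[][] stored;
    private final byte[] scratch;
    private int length;

    SharedBufferSource(String... values) {
      stored = new byte[values.length][];
      int max = 0;
      for (int i = 0; i < values.length; i++) {
        stored[i] = values[i].getBytes(StandardCharsets.UTF_8);
        max = Math.max(max, stored[i].length);
      }
      scratch = new byte[max]; // sized once, then reused for every lookup
    }

    /** Returns the shared buffer; its contents are valid only until the next call. */
    byte[] get(int doc) {
      length = stored[doc].length;
      System.arraycopy(stored[doc], 0, scratch, 0, length);
      return scratch;
    }

    /** Number of valid bytes in the buffer returned by the last get(). */
    int length() {
      return length;
    }
  }

  static String text(byte[] bytes, int length) {
    return new String(bytes, 0, length, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    SharedBufferSource source = new SharedBufferSource("red", "blue");
    byte[] aliased = source.get(0);
    int len = source.length();
    byte[] copied = Arrays.copyOf(aliased, len); // defensive copy keeps the value
    source.get(1);                               // overwrites the shared buffer
    System.out.println("aliased: " + text(aliased, len));
    System.out.println("copied:  " + text(copied, len));
  }
}
```

The trade-off is the one the entry describes: no allocation or copy per
access, at the cost of requiring callers to copy any value they want to keep.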
8099
8100Optimizations
8101
8102* LUCENE-5603: hunspell stemmer more efficiently strips prefixes
8103  and suffixes.  (Robert Muir)
8104
8105* LUCENE-5599: HttpReplicator did not properly delegate bulk read() to wrapped
8106  InputStream. (Christoph Kaser via Shai Erera)
8107
8108* LUCENE-5591: pass an IOContext with estimated flush size when applying DV
8109  updates. (Shai Erera)
8110
8111* LUCENE-5634: IndexWriter reuses TokenStream instances for String and Numeric
8112  fields by default. (Uwe Schindler, Shay Banon, Mike McCandless, Robert Muir)
8113
8114* LUCENE-5638, LUCENE-5640: TokenStream uses a more performant AttributeFactory
8115  by default, that packs the core attributes into one implementation
8116  (PackedTokenAttributeImpl), for faster clearAttributes(), saveState(), and
8117  restoreState(). In addition, AttributeFactory uses Java 7 MethodHandles for
8118  instantiating Attribute implementations.  (Uwe Schindler, Robert Muir)
8119
8120* LUCENE-5609: Changed the default NumericField precisionStep from 4
8121  to 8 (for int/float) and 16 (for long/double), for faster indexing
8122  time and smaller indices. (Robert Muir, Uwe Schindler, Mike McCandless)
8123
8124* LUCENE-5670: Add skip/FinalOutput to FST Outputs.  (Christian
8125  Ziech via Mike McCandless).
8126
8127* LUCENE-4236: Optimize BooleanQuery's in-order scoring. This speeds up
8128  some types of boolean queries.  (Robert Muir)
8129
8130* LUCENE-5694: Don't score() subscorers in DisjunctionSumScorer or
8131  DisjunctionMaxScorer unless score() is called.  (Robert Muir)
8132
8133* LUCENE-5720: Optimize DirectPackedReader's decompression. (Robert Muir)
8134
8135* LUCENE-5722: Optimize ByteBufferIndexInput#seek() by specializing
8136  implementations. This improves random access as used by docvalues codecs
8137  if used with MMapDirectory.  (Robert Muir, Uwe Schindler)
8138
8139* LUCENE-5730: FSDirectory.open returns MMapDirectory for 64-bit operating
8140  systems, not just Linux and Windows. (Robert Muir)
8141
8142* LUCENE-5703: BinaryDocValues producers don't allocate or copy bytes on
8143  each access anymore.  (Adrien Grand)
8144
8145* LUCENE-5721: Monotonic compression doesn't use zig-zag encoding anymore.
8146  (Robert Muir, Adrien Grand)
8147
8148* LUCENE-5750: Speed up monotonic addressing for BINARY and SORTED_SET
8149  docvalues. (Robert Muir)
8150
8151* LUCENE-5751: Speed up MemoryDocValues. (Adrien Grand, Robert Muir)
8152
8153* LUCENE-5767: OrdinalMap optimizations, that mostly help on low cardinalities.
8154  (Martijn van Groningen, Adrien Grand)
8155
8156* LUCENE-5769: SingletonSortedSetDocValues now supports random access ordinals.
8157  (Robert Muir)
8158
8159Bug fixes
8160
8161* LUCENE-5738: Ensure NativeFSLock prevents opening the file channel for the
8162  lock if the lock is already obtained by the JVM. Trying to obtain an already
8163  obtained lock in the same JVM can unlock the file, which might allow other processes
8164  to lock the file even without explicitly unlocking the FileLock. This behavior
8165  is operating system dependent. (Simon Willnauer)
8166
8167* LUCENE-5673: MMapDirectory: Work around a "bug" in the JDK that throws
8168  a confusing OutOfMemoryError wrapped inside IOException if the FileChannel
8169  mapping failed because of lack of virtual address space. The IOException is
8170  rethrown with more useful information about the problem, omitting the
8171  incorrect OutOfMemoryError.  (Robert Muir, Uwe Schindler)
8172
8173* LUCENE-5682: NPE in QueryRescorer when Scorer is null
8174  (Joel Bernstein, Mike McCandless)
8175
8176* LUCENE-5691: DocTermOrds lookupTerm(BytesRef) would return incorrect results
8177  if the underlying TermsEnum supports ord() and the insertion point would
8178  be at the end. (Robert Muir)
8179
8180* LUCENE-5618, LUCENE-5636: SegmentReader referenced unneeded files following
8181  doc-values updates. Now doc-values field updates are written in separate file
8182  per field. (Shai Erera, Robert Muir)
8183
8184* LUCENE-5684: Make best effort to detect invalid usage of Lucene,
8185  when IndexReader is reopened after all files in its index were
8186  removed and recreated by the application (the proper way to do
8187  this is IndexWriter.deleteAll, or opening an IndexWriter with
8188  OpenMode.CREATE)  (Mike McCandless)
8189
8190* LUCENE-5704: Fix compilation error with Java 8u20.  (Uwe Schindler)
8191
8192* LUCENE-5710: Include the inner exception as the cause and in the
8193  exception message when an immense term is hit during indexing (Lee
8194  Hinman via Mike McCandless)
8195
8196* LUCENE-5724: CompoundFileWriter was failing to pass through the
8197  IOContext in some cases, causing NRTCachingDirectory to cache
8198  compound files when it shouldn't, then causing OOMEs.  (Mike
8199  McCandless)
8200
8201* LUCENE-5747: Project-specific settings for the Eclipse development
8202  environment will prevent automatic code reformatting. (Shawn Heisey)
8203
8204* LUCENE-5768, LUCENE-5777: Hunspell condition checks containing character classes
8205  were buggy. (Clinton Gormley, Robert Muir)
8206
8207Test Framework
8208
8209* LUCENE-5622: Fail tests if they print over the given limit of bytes to
8210  System.out or System.err. (Robert Muir, Dawid Weiss)
8211
8212* LUCENE-5619: Added backwards compatibility tests to ensure we can update existing
8213  indexes with doc-values updates. (Shai Erera, Robert Muir)
8214
8215Build
8216
8217* LUCENE-5442: The Ant check-lib-versions target now runs Ivy resolution
8218  transitively, then fails the build when it finds a version conflict: when a
8219  transitive dependency's version is more recent than the direct dependency's
8220  version specified in lucene/ivy-versions.properties.  Exceptions are
8221  specifiable in lucene/ivy-ignore-conflicts.properties.
8222  (Steve Rowe)
8223
8224* LUCENE-5715: Upgrade direct dependencies known to be older than transitive
8225  dependencies: com.sun.jersey.version:1.8->1.9; com.sun.xml.bind:jaxb-impl:2.2.2->2.2.3-1;
8226  commons-beanutils:commons-beanutils:1.7.0->1.8.3; commons-digester:commons-digester:2.0->2.1;
8227  commons-io:commons-io:2.1->2.3; commons-logging:commons-logging:1.1.1->1.1.3;
8228  io.netty:netty:3.6.2.Final->3.7.0.Final; javax.activation:activation:1.1->1.1.1;
8229  javax.mail:mail:1.4.1->1.4.3; log4j:log4j:1.2.16->1.2.17; org.apache.avro:avro:1.7.4->1.7.5;
8230  org.tukaani:xz:1.2->1.4; org.xerial.snappy:snappy-java:1.0.4.1->1.0.5 (Steve Rowe)
8231
8232======================= Lucene 4.8.1 =======================
8233
8234Bug fixes
8235
8236* LUCENE-5639: Fix PositionLengthAttribute implementation in Token class.
8237  (Uwe Schindler, Robert Muir)
8238
8239* LUCENE-5635: IndexWriter didn't properly handle IOException on TokenStream.reset(),
8240  which could leave the analyzer in an inconsistent state.  (Robert Muir)
8241
8242* LUCENE-5599: HttpReplicator did not properly delegate bulk read() to wrapped
8243  InputStream. (Christoph Kaser via Shai Erera)
8244
8245* LUCENE-5600: HttpClientBase did not properly consume a connection if a server
8246  error occurred. (Christoph Kaser via Shai Erera)
8247
8248* LUCENE-5628: Change getFiniteStrings to iterative not recursive
8249  implementation, so that building suggesters on a long suggestion
8250  doesn't risk overflowing the stack; previously it consumed one Java
8251  stack frame per character in the expanded suggestion.  If you are building
8252  a suggester this is a nasty trap. (Robert Muir, Simon Willnauer,
8253  Mike McCandless)
8254
8255* LUCENE-5559: Add additional argument validation for CapitalizationFilter
8256  and CodepointCountFilter. (Ahmet Arslan via Robert Muir)
8257
8258* LUCENE-5641: SimpleRateLimiter would silently rate limit at 8 MB/sec
8259  even if you asked for higher rates.  (Mike McCandless)
8260
8261* LUCENE-5644: IndexWriter clears which threads use which internal
8262  thread states on flush, so that if an application reduces how many
8263  threads it uses for indexing, that results in a reduction of how
8264  many segments are flushed on a full-flush (e.g. to obtain a
8265  near-real-time reader).  (Simon Willnauer, Mike McCandless)
8266
8267* LUCENE-5653: JoinUtil with ScoreMode.Avg on a multi-valued field
8268  with more than 256 values would throw an exception.
8269  (Mikhail Khludnev via Robert Muir)
8270
8271* LUCENE-5654: Fix various close() methods that could suppress
8272  throwables such as OutOfMemoryError, instead returning scary messages
8273  that look like index corruption.  (Mike McCandless, Robert Muir)
8274
8275* LUCENE-5656: Fix rare fd leak in SegmentReader when multiple docvalues
8276  fields have been updated with IndexWriter.updateXXXDocValue and one
8277  hits an exception. (Shai Erera, Robert Muir)
8278
8279* LUCENE-5660: AnalyzingSuggester.build will now throw IllegalArgumentException if
8280  you give it a longer suggestion than it can handle (Robert Muir, Mike McCandless)
8281
8282* LUCENE-5662: Add missing checks to Field to prevent IndexWriter.abort
8283  if a stored value is null. (Robert Muir)
8284
8285* LUCENE-5668: Fix off-by-one in TieredMergePolicy (Mike McCandless)
8286
8287* LUCENE-5671: Upgrade ICU version to fix an ICU concurrency problem that
8288  could cause exceptions when indexing. (feedly team, Robert Muir)
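
The stack-depth trap described under LUCENE-5628 above can be illustrated with
a minimal sketch. This is not Lucene's getFiniteStrings implementation: the
class FiniteStringsSketch, its Frame helper, and the list-of-maps automaton
encoding are all hypothetical, chosen only to show how an explicit work stack
on the heap replaces one Java stack frame per character. The sketch assumes an
acyclic automaton whose initial state is 0.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of the Lucene API.
class FiniteStringsSketch {

    // One pending unit of work: a state plus the string spelled out to reach it.
    private static final class Frame {
        final int state;
        final String prefix;

        Frame(int state, String prefix) {
            this.state = state;
            this.prefix = prefix;
        }
    }

    // Enumerates every accepted string of an acyclic automaton without recursion.
    // transitions.get(s) maps a label to the target state; state 0 is the start.
    static List<String> finiteStrings(List<Map<Character, Integer>> transitions,
                                      Set<Integer> accept) {
        List<String> out = new ArrayList<>();
        Deque<Frame> stack = new ArrayDeque<>(); // lives on the heap, not the call stack
        stack.push(new Frame(0, ""));
        while (!stack.isEmpty()) {
            Frame f = stack.pop();
            if (accept.contains(f.state)) {
                out.add(f.prefix);
            }
            for (Map.Entry<Character, Integer> e : transitions.get(f.state).entrySet()) {
                stack.push(new Frame(e.getValue(), f.prefix + e.getKey()));
            }
        }
        return out;
    }

    // Builds a linear automaton that accepts exactly s when state s.length() is accepting.
    static List<Map<Character, Integer>> chain(String s) {
        List<Map<Character, Integer>> t = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            Map<Character, Integer> m = new HashMap<>();
            m.put(s.charAt(i), i + 1);
            t.add(m);
        }
        t.add(new HashMap<>()); // final state has no outgoing transitions
        return t;
    }
}
```

With this shape, the depth of the traversal is bounded by heap size rather than
by the thread's stack size, which is the property the entry above relies on.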
8289
8290======================= Lucene 4.8.0 =======================
8291
8292System Requirements
8293
8294* LUCENE-4747, LUCENE-5514: Move to Java 7 as minimum Java version.
8295  (Robert Muir, Uwe Schindler)
8296
8297Changes in Runtime Behavior
8298
8299* LUCENE-5472: IndexWriter.addDocument will now throw an IllegalArgumentException
8300  if a Term to be indexed exceeds IndexWriter.MAX_TERM_LENGTH.  To recreate previous
8301  behavior of silently ignoring these terms, use LengthFilter in your Analyzer.
8302  (hossman, Mike McCandless, Varun Thacker)
8303
8304New Features
8305
8306* LUCENE-5356: Morfologik filter can accept custom dictionary resources.
8307  (Michal Hlavac, Dawid Weiss)
8308
8309* LUCENE-5454: Add SortedSetSortField to lucene/sandbox, to allow sorting
8310  on multi-valued field. (Robert Muir)
8311
8312* LUCENE-5478: CommonTermsQuery now allows creating custom term queries
8313  similar to the query parser by overriding a newTermQuery method.
8314  (Simon Willnauer)
8315
8316* LUCENE-5477: AnalyzingInfixSuggester now supports near-real-time
8317  additions and updates (to change weight or payload of an existing
8318  suggestion).  (Mike McCandless)
8319
8320* LUCENE-5482: Improve default TurkishAnalyzer by adding apostrophe
8321  handling suitable for Turkish.  (Ahmet Arslan via Robert Muir)
8322
8323* LUCENE-5479: FacetsConfig subclass can now customize the default
8324  per-dim facets configuration.  (Rob Audenaerde via Mike McCandless)
8325
8326* LUCENE-5485: Add circumfix support to HunspellStemFilter. (Robert Muir)
8327
8328* LUCENE-5224: Add iconv, oconv, and ignore support to HunspellStemFilter.
8329  (Robert Muir)
8330
8331* LUCENE-5493: SortingMergePolicy, and EarlyTerminatingSortingCollector
8332  support arbitrary Sort specifications.
8333  (Robert Muir, Mike McCandless, Adrien Grand)
8334
8335* LUCENE-3758: Allow the ComplexPhraseQueryParser to search ordered or
8336  unordered proximity queries. (Ahmet Arslan via Erick Erickson)
8337
8338* LUCENE-5530: ComplexPhraseQueryParser throws ParseException for fielded queries.
8339  (Erick Erickson via Tomas Fernandez Lobbe and Ahmet Arslan)
8340
8341* LUCENE-5513: Add IndexWriter.updateBinaryDocValue which lets
8342  you update the value of a BinaryDocValuesField without reindexing the
8343  document(s). (Shai Erera)
8344
8345* LUCENE-4072: Add ICUNormalizer2CharFilter, which lets you do Unicode normalization
8346  with offset correction before the tokenizer. (David Goldfarb, Ippei UKAI via Robert Muir)
8347
8348* LUCENE-5476: Add RandomSamplingFacetsCollector for computing facets on a sampled
8349  set of matching hits, in cases where there are millions of hits.
8350  (Rob Audenaerde, Gilad Barkai, Shai Erera)
8351
8352* LUCENE-4984: Add SegmentingTokenizerBase, abstract class for tokenizers
8353  that want to do two-pass tokenization such as by sentence and then by word.
8354  (Robert Muir)
8355
8356* LUCENE-5489: Add Rescorer/QueryRescorer, to resort the hits from a
8357  first pass search using scores from a more costly second pass
8358  search. (Simon Willnauer, Robert Muir, Mike McCandless)
8359
8360* LUCENE-5528: Add context to suggesters (InputIterator and Lookup
8361  classes), and fix AnalyzingInfixSuggester to handle contexts.
8362  Suggester contexts allow you to filter suggestions.  (Areek Zillur,
8363  Mike McCandless)
8364
8365* LUCENE-5545: Add SortRescorer and Expression.getRescorer, to
8366  resort the hits from a first pass search using a Sort or an
8367  Expression. (Simon Willnauer, Robert Muir, Mike McCandless)
8368
8369* LUCENE-5558: Add TruncateTokenFilter which truncates terms to
8370  the specified length.  (Ahmet Arslan via Robert Muir)
8371
8372* LUCENE-2446: Added checksums to Lucene index files. As of 4.8, the last 8
8373  bytes of each file contain a zlib-crc32 checksum. Small metadata files are
8374  verified on load. Larger files can be checked on demand via
8375  AtomicReader.checkIntegrity. You can configure this to happen automatically
8376  before merges by enabling IndexWriterConfig.setCheckIntegrityAtMerge.
8377  (Robert Muir)
8378
8379* LUCENE-5580: Checksums are automatically verified on the default stored
8380  fields format when performing a bulk merge. (Adrien Grand)
8381
8382* LUCENE-5602: Checksums are automatically verified on the default term
8383  vectors format when performing a bulk merge. (Adrien Grand, Robert Muir)
8384
8385* LUCENE-5583: Added DataInput.skipBytes. ChecksumIndexInput can now seek, but
8386  only forward. (Adrien Grand, Mike McCandless, Simon Willnauer, Uwe Schindler)
8387
8388* LUCENE-5588: Lucene now calls fsync() on the index directory, ensuring
8389  that all file metadata is persisted on disk in case of power failure.
8390  This does not work on all file systems and operating systems, but Linux
8391  and MacOSX are known to work. On Windows, fsyncing a directory is not
8392  possible with Java APIs.  (Mike McCandless, Uwe Schindler)
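
The checksum trailer described under LUCENE-2446 above can be sketched in a
few lines. This is a simplified illustration, not Lucene's on-disk format: it
only assumes what the entry states (a zlib-crc32 value stored in the last 8
bytes), and the class ChecksumFooterSketch, its method names, and the
big-endian encoding are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical illustration only; not part of the Lucene API.
class ChecksumFooterSketch {

    // Returns payload followed by an 8-byte CRC32 of the payload (big-endian long).
    static byte[] appendChecksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 8)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    // Recomputes the CRC32 over everything except the last 8 bytes and
    // compares it with the stored value.
    static boolean verify(byte[] file) {
        if (file.length < 8) {
            return false; // too short to contain a trailer
        }
        int bodyLength = file.length - 8;
        CRC32 crc = new CRC32();
        crc.update(file, 0, bodyLength);
        long stored = ByteBuffer.wrap(file, bodyLength, 8).getLong();
        return stored == crc.getValue();
    }
}
```

Reading the whole file to verify it is what makes the check expensive for large
files, which is consistent with the entry's split between small files verified
on load and larger files checked on demand.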
8393
8394API Changes
8395
8396* LUCENE-5454: Add RandomAccessOrds, an optional extension of SortedSetDocValues
8397  that supports random access to the ordinals in a document. (Robert Muir)
8398
8399* LUCENE-5468: Move offline Sort (from suggest module) to OfflineSort. (Robert Muir)
8400
8401* LUCENE-5493: SortingMergePolicy and EarlyTerminatingSortingCollector take
8402  Sort instead of Sorter. BlockJoinSorter is removed, replaced with
8403  BlockJoinComparatorSource, which can take a Sort for ordering of parents
8404  and a separate Sort for ordering of children within a block.
8405  (Robert Muir, Mike McCandless, Adrien Grand)
8406
8407* LUCENE-5516: MergeScheduler#merge() now accepts a MergeTrigger as well as
8408  a boolean that indicates if a new merge was found in the caller thread before
8409  the scheduler was called. (Simon Willnauer)
8410
8411* LUCENE-5487: Separated bulk scorer (new Weight.bulkScorer method) from
8412  normal scoring (Weight.scorer) for those queries that can do bulk
8413  scoring more efficiently, e.g. BooleanQuery in some cases.  This
8414  also simplified the Weight.scorer API by removing the two confusing
8415  booleans.  (Robert Muir, Uwe Schindler, Mike McCandless)
8416
8417* LUCENE-5519: TopNSearcher now allows retrieving incomplete results if the max
8418  size of the candidate queue is unknown. The queue can still be bounded in order
8419  to apply pruning while retrieving the top N but will not throw an exception if
8420  too many results are rejected to guarantee an absolutely correct top N result.
8421  The TopNSearcher now returns a struct-like class that indicates if the result
8422  is complete in the sense of the top N or not. Consumers of this API should assert
8423  on the completeness if the bounded queue size is known ahead of time. (Simon Willnauer)
8424
8425* LUCENE-4984: Deprecate ThaiWordFilter and smartcn SentenceTokenizer and WordTokenFilter.
8426  These filters would not work correctly with CharFilters and could not be safely placed
8427  at an arbitrary position in the analysis chain. Use ThaiTokenizer and HMMChineseTokenizer
8428  instead. (Robert Muir)
8429
8430* LUCENE-5543: Remove/deprecate Directory.fileExists (Mike McCandless)
8431
8432* LUCENE-5573: Move docvalues constants and helper methods to o.a.l.index.DocValues.
8433  (Dawid Weiss, Robert Muir)
8434
8435* LUCENE-5604: Switched BytesRef.hashCode to MurmurHash3 (32 bit).
8436  TermToBytesRefAttribute.fillBytesRef no longer returns the hash
8437  code.  BytesRefHash now uses MurmurHash3 for its hashing.  (Robert
8438  Muir, Mike McCandless)
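
For readers unfamiliar with the hash named under LUCENE-5604 above, the
following is a minimal sketch of the generic 32-bit MurmurHash3 algorithm
(the x86_32 variant). It is written from the public algorithm description, not
copied from Lucene; the class name Murmur3Sketch is hypothetical, and the seed
Lucene itself uses is not stated in this entry, so the seed is left as a
parameter.

```java
// Hypothetical illustration only; not part of the Lucene API.
class Murmur3Sketch {

    // Generic MurmurHash3 x86_32 over a whole byte array.
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;

        // Body: mix in 4 bytes at a time, read as little-endian ints.
        int nblocks = data.length / 4;
        for (int i = 0; i < nblocks; i++) {
            int off = i * 4;
            int k = (data[off] & 0xff)
                    | ((data[off + 1] & 0xff) << 8)
                    | ((data[off + 2] & 0xff) << 16)
                    | ((data[off + 3] & 0xff) << 24);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }

        // Tail: the remaining 0-3 bytes (case fall-through is intentional).
        int tail = nblocks * 4;
        int k = 0;
        switch (data.length & 3) {
            case 3:
                k ^= (data[tail + 2] & 0xff) << 16;
            case 2:
                k ^= (data[tail + 1] & 0xff) << 8;
            case 1:
                k ^= (data[tail] & 0xff);
                k *= c1;
                k = Integer.rotateLeft(k, 15);
                k *= c2;
                h ^= k;
        }

        // Finalization: fold in the length and force avalanche of the bits.
        h ^= data.length;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```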
8439
8440Optimizations
8441
8442* LUCENE-5468: HunspellStemFilter uses 10 to 100x less RAM. It also loads
8443  all known openoffice dictionaries without error, and supports an additional
8444  longestOnly option for a less aggressive approach.  (Robert Muir)
8445
8446* LUCENE-4848: Use Java 7 NIO2-FileChannel instead of RandomAccessFile
8447  for NIOFSDirectory and MMapDirectory. This allows deleting open files
8448  on Windows if NIOFSDirectory is used; mmapped files are still locked.
8449  (Michael Poindexter, Robert Muir, Uwe Schindler)
8450
8451* LUCENE-5515: Improved TopDocs#merge to create a merged ScoreDoc
8452  array with a length of at most the specified size, instead of a length
8453  of at most from + size as before. (Martijn van Groningen)
8454
8455* LUCENE-5529: Spatial search of non-point indexed shapes should be a little
8456  faster due to skipping intersection tests on redundant cells. (David Smiley)
8457
8458Bug fixes
8459
8460* LUCENE-5483: Fix inaccuracies in HunspellStemFilter. Multi-stage affix-stripping,
8461  prefix-suffix dependencies, and COMPLEXPREFIXES now work correctly according
8462  to the hunspell algorithm. Removed the recursionCap parameter, as it's no longer needed: rules for
8463  recursive affix application are driven correctly by continuation classes in the affix file.
8464  (Robert Muir)
8465
8466* LUCENE-5497: HunspellStemFilter properly handles escaped terms and affixes without conditions.
8467  (Robert Muir)
8468
8469* LUCENE-5505: HunspellStemFilter ignores BOM markers in dictionaries and handles varying
8470  types of whitespace in SET/FLAG commands. (Robert Muir)
8471
8472* LUCENE-5507: Fix HunspellStemFilter loading of dictionaries with large amounts of aliases
8473  etc before the encoding declaration.  (Robert Muir)
8474
8475* LUCENE-5111: Fix WordDelimiterFilter to return offsets in correct order.  (Robert Muir)
8476
8477* LUCENE-5555: Fix SortedInputIterator to correctly encode/decode contexts in presence of payload (Areek Zillur)
8478
8479* LUCENE-5559: Add missing argument checks to tokenfilters taking
8480  numeric arguments.  (Ahmet Arslan via Robert Muir)
8481
8482* LUCENE-5568: Benchmark module's "default.codec" option didn't work. (David Smiley)
8483
8484* SOLR-5983: HTMLStripCharFilter is treating CDATA sections incorrectly.
8485  (Dan Funk, Steve Rowe)
8486
8487* LUCENE-5615: Validate per-segment delete counts at write time, to
8488  help catch bugs that might otherwise cause corruption (Mike McCandless)
8489
8490* LUCENE-5612: NativeFSLockFactory no longer deletes its lock file. This cannot be done
8491  safely without the risk of deleting someone else's lock file. If you use NativeFSLockFactory,
8492  you may see write.lock hanging around from time to time: it's harmless.
8493  (Uwe Schindler, Mike McCandless, Robert Muir)
8494
8495* LUCENE-5624: Ensure NativeFSLockFactory does not leak file handles if it is unable
8496  to obtain the lock. (Uwe Schindler, Robert Muir)
8497
8498* LUCENE-5626: Fix bug in SimpleFSLockFactory's obtain() that sometimes threw
8499  IOException (ERROR_ACCESS_DENIED) on Windows if the lock file was created
8500  concurrently. This error is now handled the same way as in NativeFSLockFactory,
8501  by returning false.  (Uwe Schindler, Robert Muir, Dawid Weiss)
8502
8503* LUCENE-5630: Add missing META-INF entry for UpperCaseFilterFactory.
8504  (Robert Muir)
8505
8506Tests
8507
8508* LUCENE-5630: Fix TestAllAnalyzersHaveFactories to correctly check for existence
8509  of class and corresponding Map<String,String> ctor.  (Uwe Schindler, Robert Muir)
8510
8511Test Framework
8512
8513* LUCENE-5592: Incorrectly reported uncloseable files. (Dawid Weiss)
8514
8515* LUCENE-5577: Temporary folder and file management (and cleanup facilities)
8516  (Mark Miller, Uwe Schindler, Dawid Weiss)
8517
8518* LUCENE-5567: When a suite fails with zombie threads, the failure marker and
8519  count are not propagated properly. (Dawid Weiss)
8520
8521* LUCENE-5449: Rename _TestUtil and _TestHelper to remove the leading _.
8522
8523* LUCENE-5501: Added random out-of-order collection testing (when the collector
8524  supports it) to AssertingIndexSearcher. (Adrien Grand)
8525
8526Build
8527
8528* LUCENE-5463: RamUsageEstimator.(human)sizeOf(Object) is now a forbidden API.
8529  (Adrien Grand, Robert Muir)
8530
8531* LUCENE-5512: Remove redundant typing (use diamond operator) throughout
8532  the codebase.  (Furkan KAMACI via Robert Muir)
8533
8534* LUCENE-5614: Enable building on Java 8 using Apache Ant 1.8.3 or 1.8.4
8535  by adding a workaround for the Ant bug.  (Uwe Schindler)
8536
8537* LUCENE-5612: Add a new Ant target in lucene/core to test LockFactory
8538  implementations: "ant test-lock-factory".  (Uwe Schindler, Mike McCandless,
8539  Robert Muir)
8540
8541Documentation
8542
8543* LUCENE-5534: Add javadocs to GreekStemmer methods.
8544  (Stamatis Pitsios via Robert Muir)
8545
8546======================= Lucene 4.7.2 =======================
8547
8548Bug Fixes
8549
8550* LUCENE-5574: Closing a near-real-time reader no longer attempts to
8551  delete unreferenced files if the original writer has been closed;
8552  this could cause index corruption in certain cases where index files
8553  were directly changed (deleted, overwritten, etc.) in the index
8554  directory outside of Lucene.  (Simon Willnauer, Shai Erera, Robert
8555  Muir, Mike McCandless)
8556
8557* LUCENE-5570: Don't let FSDirectory.sync() create new zero-byte files, instead throw
8558  exception if a file is missing.  (Uwe Schindler, Mike McCandless, Robert Muir)
8559
8560======================= Lucene 4.7.1 =======================
8561
8562Changes in Runtime Behavior
8563
8564* LUCENE-5532: AutomatonQuery.equals is no longer implemented as "accepts same language".
8565  This was inconsistent with hashCode, and unnecessary for any subclasses in Lucene.
8566  If you desire this in a custom subclass, minimize the automaton.  (Robert Muir)
8567
8568Bug Fixes
8569
8570* LUCENE-5450: Fix getField() NPE issues with SpanOr/SpanNear when they have an
8571  empty list of clauses. This can happen, for example, when a wildcard matches
8572  no terms.  (Tim Allison via Robert Muir)
8573
8574* LUCENE-5473: Throw IllegalArgumentException, not
8575  NullPointerException, if the synonym map is empty when creating
8576  SynonymFilter (帅广应 via Mike McCandless)
8577
8578* LUCENE-5432: EliasFanoDocIdSet: Fix number of index entry bits when the maximum
8579  entry is a power of 2. (Paul Elschot via Adrien Grand)
8580
8581* LUCENE-5466: query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier.
8582  (Koji Sekiguchi)
8583
8584* LUCENE-5502: Fixed TermsFilter.equals that could return true for different
8585  filters. (Igor Motov via Adrien Grand)
8586
8587* LUCENE-5522: FacetsConfig didn't add drill-down terms for association facet
8588  fields labels. (Shai Erera)
8589
8590* LUCENE-5520: ToChildBlockJoinQuery would hit
8591  ArrayIndexOutOfBoundsException if a parent document had no children
8592  (Sally Ang via Mike McCandless)
8593
8594* LUCENE-5532: AutomatonQuery.hashCode was not thread-safe. (Robert Muir)
8595
8596* LUCENE-5525: Implement MultiFacets.getAllDims, so you can do sparse
8597  facets through DrillSideways, for example.  (Jose Peleteiro, Mike
8598  McCandless)
8599
8600* LUCENE-5481: IndexWriter.forceMerge used to run a merge even if there was a
8601  single segment in the index. (Adrien Grand, Mike McCandless)
8602
8603* LUCENE-5538: Fix FastVectorHighlighter bug with index-time synonyms when the
8604  query is more complex than a single phrase.  (Robert Muir)
8605
8606* LUCENE-5544: Exceptions during IndexWriter.rollback could leak file handles
8607  and the write lock. (Robert Muir)
8608
8609* LUCENE-4978: Spatial RecursivePrefixTree queries could result in false-negatives for
8610  indexed shapes within 1/2 maxDistErr from the edge of the query shape.  This meant
8611  searching for a point by the same point as a query rarely worked.  (David Smiley)
8612
8613* LUCENE-5553: IndexReader#ReaderClosedListener is not always invoked when
8614  IndexReader#close() is called or if refCount is 0. If an exception is
8615  thrown during internal close or on any of the close listeners some or all
8616  listeners might be missed. This can cause memory leaks if the core listeners
8617  are used to clear caches. (Simon Willnauer)
8618
8619Build
8620
8621* LUCENE-5511: "ant precommit" / "ant check-svn-working-copy" now work again
8622  with any working copy format (thanks to svnkit 1.8.4).  (Uwe Schindler)
8623
8624======================= Lucene 4.7.0 =======================
8625
8626New Features
8627
8628* LUCENE-5336: Add SimpleQueryParser: parser for human-entered queries.
8629  (Jack Conradson via Robert Muir)
8630
8631* LUCENE-5337: Add Payload support to FileDictionary (Suggest) and make it more
8632  configurable (Areek Zillur via Erick Erickson)
8633
8634* LUCENE-5329: suggest: DocumentDictionary and
8635  DocumentExpressionDictionary are now lenient for dirty documents
8636  (missing the term, weight or payload).  (Areek Zillur via
8637  Mike McCandless)
8638
8639* LUCENE-5404: Add .getCount method to all suggesters (Lookup); persist count
8640  metadata on .store(); Dictionary returns InputIterator; Dictionary.getWordIterator
8641  renamed to .getEntryIterator. (Areek Zillur)
8642
8643* SOLR-1871: The RangeMapFloatFunction accepts an arbitrary ValueSource
8644  as target and default values. (Chris Harris, shalin)
8645
8646* LUCENE-5371: Speed up Lucene range faceting from O(N) per hit to
8647  O(log(N)) per hit using segment trees; this only really starts to
8648  matter in practice if the number of ranges is over 10 or so.  (Mike
8649  McCandless)
8650
8651* LUCENE-5379: Add Analyzer for Kurdish.  (Robert Muir)
8652
8653* LUCENE-5369: Added an UpperCaseFilter to make UPPERCASE tokens. (ryan)
8654
8655* LUCENE-5345: Add a new BlendedInfixSuggester, which is like
8656  AnalyzingInfixSuggester but boosts suggestions that matched tokens
8657  with lower positions.  (Remi Melisson via Mike McCandless)
8658
8659* LUCENE-5399: When sorting by String (SortField.STRING), you can now
8660  specify whether missing values should be sorted first (the default),
8661  using SortField.setMissingValue(SortField.STRING_FIRST), or last,
8662  using SortField.setMissingValue(SortField.STRING_LAST). (Rob Muir,
8663  Mike McCandless)
8664
8665* LUCENE-5099: QueryNode should have the ability to detach from its node
8666  parent. Added QueryNode.removeFromParent() that allows a node to be
8667  detached from its parent node. (Adriano Crestani)
8668
8669* LUCENE-5395 LUCENE-5451: Upgrade to Spatial4j 0.4.1: Parses WKT (including
8670  ENVELOPE) with extension "BUFFER"; buffering a point results in a Circle.
8671  JTS isn't needed for WKT any more but remains required for Polygons. New
8672  Shapes: ShapeCollection and BufferedLineString. Various other improvements and
8673  bug fixes too. More info:
8674  https://github.com/spatial4j/spatial4j/blob/master/CHANGES.md  (David Smiley)
8675
8676* LUCENE-5415: Add multitermquery (wildcards,prefix,etc) to PostingsHighlighter.
8677  (Mike McCandless, Robert Muir)
8678
8679* LUCENE-3069: Add two memory resident dictionaries (FST terms dictionary and
8680  FSTOrd terms dictionary) to improve primary key lookups. The PostingsBaseFormat
8681  API is also changed so that term dictionaries get the ability to block
8682  encode term metadata, and all dictionary implementations can now plug in any
8683  PostingsBaseFormat. (Han Jiang, Mike McCandless)
8684
8685* LUCENE-5353: ShingleFilter's filler token should be configurable.
8686  (Ahmet Arslan, Simon Willnauer, Steve Rowe)
8687
8688* LUCENE-5320: Add SearcherTaxonomyManager over search and taxonomy index
8689  directories (i.e. not only NRT). (Shai Erera)
8690
8691* LUCENE-5410: Add fuzzy and near support via '~' operator to SimpleQueryParser.
8692  (Lee Hinman via Robert Muir)
8693
8694* LUCENE-5426: Make SortedSetDocValuesReaderState abstract to allow
8695  custom implementations for Lucene doc values faceting (John Wang via
8696  Mike McCandless)
8697
8698* LUCENE-5434: NRT support for file systems that do not have delete-on-last-close
8699  or cannot-delete-while-referenced semantics.
8700  (Mark Miller, Mike McCandless)
8701
8702* LUCENE-5418: Drilling down or sideways on a Lucene facet range
8703  (using Range.getFilter()) is now faster for costly filters (uses
8704  random access, not iteration); range facet counts now accept a
8705  fast-match filter to avoid computing the value for documents that
8706  are out of bounds, e.g. using a bounding box filter with distance
8707  range faceting.  (Mike McCandless)
8708
8709* LUCENE-5440: Add LongBitSet for managing more than 2.1B bits (otherwise use
8710  FixedBitSet). (Shai Erera)
8711
8712* LUCENE-5437: ASCIIFoldingFilter now has an option to preserve the original token
8713  and emit it on the same position as the folded token only if the actual token was
8714  folded. (Simon Willnauer, Nik Everett)
8715
8716* LUCENE-5408: Add spatial SerializedDVStrategy that serializes a binary
8717  representation of a shape into BinaryDocValues. It supports exact geometry
8718  relationship calculations. (David Smiley)
8719
8720* LUCENE-5457: Add SloppyMath.earthDiameter(double latitude) that returns an
8721  approximate value of the diameter of the earth at the given latitude.
8722  (Adrien Grand)
8723
8724* LUCENE-5979: FilteredQuery uses the cost API to decide on whether to use
8725  random-access or leap-frog to intersect the filter with the query.
8726  (Adrien Grand)
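
The per-hit cost reduction described under LUCENE-5371 above can be sketched
as follows. Note the swap: the entry says Lucene uses segment trees, whereas
this simplified sketch only splits the number line into elementary intervals,
locates each value with a binary search (O(log N) per hit), and rolls the
interval counts up into the requested ranges with a plain loop at the end. The
class RangeCountSketch and its method are hypothetical; ranges are assumed to
be inclusive [min, max] pairs with max < Long.MAX_VALUE.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical illustration only; not part of the Lucene API.
class RangeCountSketch {

    // ranges[i] = {min, max}, both inclusive; returns one count per range.
    static int[] countRanges(long[][] ranges, long[] values) {
        // Every range start, and every position just past a range end, is a boundary.
        TreeSet<Long> points = new TreeSet<>();
        for (long[] r : ranges) {
            points.add(r[0]);
            points.add(r[1] + 1);
        }
        long[] b = new long[points.size()];
        int n = 0;
        for (long p : points) {
            b[n++] = p;
        }

        // elementary[i] counts values falling in [b[i], b[i + 1]).
        int[] elementary = new int[b.length];
        for (long v : values) {
            int pos = Arrays.binarySearch(b, v);
            if (pos < 0) {
                pos = -pos - 2; // index of the greatest boundary <= v, or -1
            }
            if (pos >= 0 && pos < b.length - 1) {
                elementary[pos]++;
            }
        }

        // Roll up: a range covers a contiguous run of elementary intervals.
        int[] counts = new int[ranges.length];
        for (int i = 0; i < ranges.length; i++) {
            int lo = Arrays.binarySearch(b, ranges[i][0]);
            int hi = Arrays.binarySearch(b, ranges[i][1] + 1);
            for (int j = lo; j < hi; j++) {
                counts[i] += elementary[j];
            }
        }
        return counts;
    }
}
```

Overlapping ranges are handled because each hit increments exactly one
elementary interval, and every range that covers that interval picks the hit
up during the final roll-up.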
8727
8728Build
8729
8730* LUCENE-5217,LUCENE-5420: Maven config: get dependencies from Ant+Ivy config;
8731  disable transitive dependency resolution for all depended-on artifacts by
8732  putting an exclusion for each transitive dependency in the
8733  <dependencyManagement> section of the grandparent POM. (Steve Rowe)
8734
8735* LUCENE-5322: Clean up / simplify Maven-related Ant targets.
8736  (Steve Rowe)
8737
8738* LUCENE-5347: Upgrade forbidden-apis checker to version 1.4.
8739  (Uwe Schindler)
8740
8741* LUCENE-4381: Upgrade analysis/icu to 52.1. (Robert Muir)
8742
8743* LUCENE-5357: Upgrade StandardTokenizer and UAX29URLEmailTokenizer to
8744  Unicode 6.3; update UAX29URLEmailTokenizer's recognized top level
8745  domains in URLs and Emails from the IANA Root Zone Database.
8746  (Steve Rowe)
8747
8748* LUCENE-5360: Add support for developing in Netbeans IDE.
8749  (Michal Hlavac, Uwe Schindler, Steve Rowe)
8750
8751* SOLR-5590: Upgrade HttpClient/HttpComponents to 4.3.x.
8752  (Karl Wright via Shawn Heisey)
8753
8754* LUCENE-5385: "ant precommit" / "ant check-svn-working-copy" now work
8755  for SVN 1.8 or GIT checkouts. The ANT target prints a warning instead
8756  of failing. It also instructs the user how to run on SVN 1.8 working
8757  copies.  (Robert Muir, Uwe Schindler)
8758
8759* LUCENE-5383: fix changes2html to link pull requests (Steve Rowe)
8760
8761* LUCENE-5411: Upgrade to released JFlex 1.5.0; stop requiring
8762  a locally built JFlex snapshot jar. (Steve Rowe)
8763
8764* LUCENE-5465: Solr Contrib "map-reduce" breaks Manifest of all other
8765  JAR files by adding a broken Main-Class attribute.
8766  (Uwe Schindler, Steve Rowe)
8767
8768Bug fixes
8769
8770* LUCENE-5285: Improved highlighting of multi-valued fields with
8771  FastVectorHighlighter. (Nik Everett via Adrien Grand)
8772
8773* LUCENE-5391: UAX29URLEmailTokenizer should not tokenize no-scheme
8774  domain-only URLs that are followed by an alphanumeric character.
8775  (Chris Geeringh, Steve Rowe)
8776
8777* LUCENE-5405: If an analysis component throws an exception, Lucene
8778  logs the field name to the info stream to assist in
8779  diagnosis. (Benson Margulies)
8780
8781* SOLR-5661: PriorityQueue now refuses to allocate itself if the
8782  incoming maxSize is too large (Raintung Li via Mike McCandless)
8783
8784* LUCENE-5228: IndexWriter.addIndexes(Directory[]) now acquires a
8785  write lock in each Directory, to ensure that no open IndexWriter is
8786  changing the incoming indices.  This also means that you cannot pass
8787  the same Directory to multiple concurrent addIndexes calls (which is
8788  anyways unusual).  (Robert Muir, Mike McCandless)
8789
8790* LUCENE-5415: SpanMultiTermQueryWrapper didn't handle its boost in
8791  hashcode/equals/tostring/rewrite.  (Robert Muir)
8792
8793* LUCENE-5409: ToParentBlockJoinCollector.getTopGroups would fail to
8794  return any groups when the joined query required more than one
8795  rewrite step (Peng Cheng via Mike McCandless)
8796
8797* LUCENE-5398: NormValueSource was incorrectly casting the long value
8798  to byte, before calling Similarity.decodeNormValue.  (Peng Cheng via
8799  Mike McCandless)
8800
8801* LUCENE-5436: ReferenceManager#acquire can result in an infinite loop if the
8802  managed resource is abused outside of the ReferenceManager. Decrementing
8803  the reference without a corresponding incRef() call can cause an infinite
8804  loop. ReferenceManager now throws IllegalStateException if the currently managed
8805  resource's ref count is 0. (Simon Willnauer)
8806
8807* LUCENE-5443: Lucene45DocValuesProducer.ramBytesUsed() may throw
8808  ConcurrentModificationException. (Shai Erera, Simon Willnauer)
8809
8810* LUCENE-5444: MemoryIndex didn't respect the analyzer's offset gap and
8811  offsets were corrupted if multiple fields with the same name were
8812  added to the memory index. (Britta Weber, Simon Willnauer)
8813
8814* LUCENE-5447: StandardTokenizer should break at consecutive chars matching
8815  Word_Break = MidLetter, MidNum and/or MidNumLet (Steve Rowe)
8816
8817* LUCENE-5462: RamUsageEstimator.sizeOf(Object) is not used anymore to
8818  estimate memory usage of segments. This used to make
8819  SegmentReader.ramBytesUsed very CPU-intensive. (Adrien Grand)
8820
8821* LUCENE-5461: ControlledRealTimeReopenThread would sometimes wait too
8822  long (up to targetMaxStaleSec) when a searcher is waiting for a
8823  specific generation, when it should have waited for at most
8824  targetMinStaleSec. (Hans Lund via Mike McCandless)
8825
8826API Changes
8827
8828* LUCENE-5339: The facet module was simplified/reworked to make the
8829  APIs more approachable to new users. Note: when migrating to the new
8830  API, you must pass the Document that is returned from FacetsConfig.build()
8831  to IndexWriter.addDocument(). (Shai Erera, Gilad Barkai, Rob
8832  Muir, Mike McCandless)
8833
8834* LUCENE-5405: Make ShingleAnalyzerWrapper.getWrappedAnalyzer() public final (gsingers)
8835
8836* LUCENE-5395: The SpatialArgsParser now only reads WKT, no more "lat, lon"
8837  etc. but it's easy to override the parseShape method if you wish. (David
8838  Smiley)
8839
8840* LUCENE-5414: DocumentExpressionDictionary was renamed to
8841  DocumentValueSourceDictionary and all dependencies to the lucene-expression
8842  module were removed from lucene-suggest. DocumentValueSourceDictionary now
8843  only accepts a ValueSource instead of a convenience ctor for an expression
8844  string. (Simon Willnauer)
8845
8846* LUCENE-3069: PostingsWriterBase and PostingsReaderBase are no longer
8847  responsible for encoding/decoding a block of terms.  Instead, they
8848  should encode/decode each term to/from a long[] and byte[].  (Han
8849  Jiang, Mike McCandless)
8850
8851* LUCENE-5425: FacetsCollector and MatchingDocs use a general DocIdSet,
8852  allowing for custom implementations to be used when faceting.
8853  (John Wang, Lei Wang, Shai Erera)
8854
8855Optimizations
8856
8857* LUCENE-5372: Replace StringBuffer by StringBuilder, where possible.
8858  (Joshua Hartman via Uwe Schindler, Dawid Weiss, Mike McCandless)
8859
8860* LUCENE-5271: A slightly more accurate SloppyMath distance.
8861  (Gilad Barkai via Ryan Ernst)
8862
8863* LUCENE-5399: Deep paging using IndexSearcher.searchAfter when
8864  sorting by fields is faster (Rob Muir, Mike McCandless)
8865
8866Changes in Runtime Behavior
8867
8868* LUCENE-5362: IndexReader and SegmentCoreReaders now throw
8869  AlreadyClosedException if the refCount is incremented but
8870  is less than 1. (Simon Willnauer)
8871
8872Documentation
8873
8874* LUCENE-5384: Add some tips for making tokenfilters and tokenizers
8875  to the analysis package overview.
8876  (Benson Margulies via Robert Muir - pull request #12)
8877
8878* LUCENE-5389: Add more guidance in the analysis documentation
8879  package overview.
8880  (Benson Margulies via Robert Muir - pull request #14)
8881
8882======================= Lucene 4.6.1 =======================
8883
8884Bug fixes
8885
8886* LUCENE-5373: Memory usage of
8887  [Lucene40/Lucene42/Memory/Direct]DocValuesFormat was over-estimated.
8888  (Shay Banon, Adrien Grand, Robert Muir)
8889
8890* LUCENE-5361: Fixed handling of query boosts in FastVectorHighlighter.
8891  (Nik Everett via Adrien Grand)
8892
8893* LUCENE-5374: IndexWriter processes internal events after it has
8894  closed itself internally. This rare condition can happen if an
8895  IndexWriter has internal changes that were not fully applied yet,
8896  such as when index / flush requests happen concurrently with the close or
8897  rollback call. (Simon Willnauer)
8898
8899* LUCENE-5394: Fix TokenSources.getTokenStream to return payloads if
8900  they were indexed with the term vectors. (Mike McCandless)
8901
8902* LUCENE-5344: Flexible StandardQueryParser behaves differently than
8903  ClassicQueryParser. (Adriano Crestani)
8904
8905* LUCENE-5375: ToChildBlockJoinQuery works harder to detect mis-use,
8906  when the parent query incorrectly returns child documents, and throw
8907  a clear exception saying so. (Dr. Oleg Savrasov via Mike McCandless)
8908
8909* LUCENE-5401: Field.StringTokenStream#end() calls super.end() now,
8910  preventing wrong term positions for fields that use
8911  StringTokenStream. (Michael Busch)
8912
8913* LUCENE-5377: IndexWriter.addIndexes(Directory[]) would cause corruption
8914  on Lucene 4.6 if any index segments were Lucene 4.0-4.5.
8915  (Littlestar, Mike McCandless, Shai Erera, Robert Muir)
8916
8917======================= Lucene 4.6.0 =======================
8918
8919New Features
8920
8921* LUCENE-4906: PostingsHighlighter can now render to custom Object,
8922  for advanced use cases where String is too restrictive (Luca
8923  Cavanna, Robert Muir, Mike McCandless)
8924
8925* LUCENE-5133: Changed AnalyzingInfixSuggester.highlight to return
8926  Object instead of String, to allow for advanced use cases where
8927  String is too restrictive (Robert Muir, Shai Erera, Mike
8928  McCandless)
8929
8930* LUCENE-5207, LUCENE-5334: Added expressions module for customizing ranking
8931  with script-like syntax.
8932  (Jack Conradson, Ryan Ernst, Uwe Schindler via Robert Muir)
8933
8934* LUCENE-5180: ShingleFilter now creates shingles with trailing holes,
8935  for example if a StopFilter had removed the last token.  (Mike
8936  McCandless)
8937
8938* LUCENE-5219: Add support to SynonymFilterFactory for custom
8939  parsers.  (Ryan Ernst via Robert Muir)
8940
8941* LUCENE-5235: Tokenizers now throw an IllegalStateException if the
8942  consumer does not call reset() before consuming the stream. Previous
8943  versions threw NullPointerException or ArrayIndexOutOfBoundsException
8944  on best effort which was not user-friendly.
8945  (Uwe Schindler, Robert Muir)
8946
8947* LUCENE-5240: Tokenizers now throw an IllegalStateException if the
8948  consumer neglects to call close() on the previous stream before consuming
8949  the next one. (Uwe Schindler, Robert Muir)
8950
8951* LUCENE-5214: Add new FreeTextSuggester, to predict the next word
8952  using a simple ngram language model.  This is useful for the "long
8953  tail" suggestions, when a primary suggester fails to find a
8954  suggestion.  (Mike McCandless)
8955
8956* LUCENE-5251: New DocumentDictionary allows building suggesters via
8957  contents of existing field, weight and optionally payload stored
8958  fields in an index (Areek Zillur via Mike McCandless)
8959
8960* LUCENE-5261: Add QueryBuilder, a simple API to build queries from
8961  the analysis chain directly, or to make it easier to implement
8962  query parsers.  (Robert Muir, Uwe Schindler)
8963
8964* LUCENE-5270: Add Terms.hasFreqs, to determine whether a given field
8965  indexed per-doc term frequencies.  (Mike McCandless)
8966
8967* LUCENE-5269: Add CodepointCountFilter. (Robert Muir)
8968
8969* LUCENE-5294: Suggest module: add DocumentExpressionDictionary to
8970  compute each suggestion's weight using a javascript expression.
8971  (Areek Zillur via Mike McCandless)
8972
8973* LUCENE-5274: FastVectorHighlighter now supports highlighting against several
8974  indexed fields. (Nik Everett via Adrien Grand)
8975
8976* LUCENE-5304: SingletonSortedSetDocValues can now return the wrapped
8977  SortedDocValues (Robert Muir, Adrien Grand)
8978
8979* LUCENE-2844: The benchmark module can now test the spatial module. See
8980  spatial.alg  (David Smiley, Liviy Ambrose)
8981
8982* LUCENE-5302: Make StemmerOverrideMap's methods public (Alan Woodward)
8983
8984* LUCENE-5296: Add DirectDocValuesFormat, which holds all doc values
8985  in heap as uncompressed java native arrays.  (Mike McCandless)
8986
8987* LUCENE-5189: Add IndexWriter.updateNumericDocValues, to update
8988  numeric DocValues fields of documents, without re-indexing them.
8989  (Shai Erera, Mike McCandless, Robert Muir)
8990
8991* LUCENE-5298: Add SumValueSourceFacetRequest for aggregating facets by
8992  a ValueSource, such as a NumericDocValuesField or an expression.
8993  (Shai Erera)
8994
8995* LUCENE-5323: Add .sizeInBytes method to all suggesters (Lookup).
8996  (Areek Zillur via Mike McCandless)
8997
8998* LUCENE-5312: Add BlockJoinSorter, a new Sorter implementation that makes sure
8999  to never split up blocks of documents indexed with IndexWriter.addDocuments.
9000  (Adrien Grand)
9001
9002* LUCENE-5297: Allow range-faceting on any ValueSource, not just
9003  NumericDocValues fields. (Shai Erera)
9004
9005Bug Fixes
9006
9007* LUCENE-5272: OpenBitSet.ensureCapacity did not modify numBits, causing
9008  false assertion errors in fastSet. (Shai Erera)
9009
9010* LUCENE-5303: OrdinalsCache did not use coreCacheKey, resulting in
9011  over caching across multiple threads. (Mike McCandless, Shai Erera)
9012
9013* LUCENE-5307: Fix topScorer inconsistency in handling QueryWrapperFilter
9014  inside ConstantScoreQuery, which now rewrites to a query removing the
9015  obsolete QueryWrapperFilter.  (Adrien Grand, Uwe Schindler)
9016
9017* LUCENE-5330: IndexWriter didn't process all internal events on
9018  #getReader(), #close() and #rollback(), which caused files to be
9019  deleted at a later point in time. This could cause short-term disk
9020  pollution or OOM if in-memory directories are used. (Simon Willnauer)
9021
9022* LUCENE-5342: Fixed bulk-merge issue in CompressingStoredFieldsFormat which
9023  created corrupted segments when mixing chunk sizes.
9024  Lucene41StoredFieldsFormat is not impacted. (Adrien Grand, Robert Muir)
9025
9026API Changes
9027
9028* LUCENE-5222: Add SortField.needsScores(). Previously it was not possible
9029  for a custom Sort that makes use of the relevance score to work correctly
9030  with IndexSearcher when an ExecutorService is specified.
9031  (Ryan Ernst, Mike McCandless, Robert Muir)
9032
9033* LUCENE-5275: Change AttributeSource.toString() to display the current
9034  state of attributes. (Robert Muir)
9035
9036* LUCENE-5277: Modify FixedBitSet copy constructor to take an additional
9037  numBits parameter to allow growing/shrinking the copied bitset. You can
9038  use FixedBitSet.clone() if you only need to clone the bitset. (Shai Erera)
9039
9040* LUCENE-5260: Use TermFreqPayloadIterator for all suggesters; those
9041  suggesters that can't support payloads will throw an exception if
9042  hasPayloads() is true.  (Areek Zillur via Mike McCandless)
9043
9044* LUCENE-5280: Rename TermFreqPayloadIterator -> InputIterator, along
9045  with associated suggest/spell classes.  (Areek Zillur via Mike
9046  McCandless)
9047
9048* LUCENE-5157: Rename OrdinalMap methods to clarify API and internal structure.
9049  (Boaz Leskes via Adrien Grand)
9050
9051* LUCENE-5313: Move preservePositionIncrements from setter to ctor in
9052  Analyzing/FuzzySuggester.  (Areek Zillur via Mike McCandless)
9053
9054* LUCENE-5321: Remove Facet42DocValuesFormat. Use DirectDocValuesFormat if you
9055  want to load the category list into memory. (Shai Erera, Mike McCandless)
9056
9057* LUCENE-5324: AnalyzerWrapper.getPositionIncrementGap and getOffsetGap can now
9058  be overridden. (Adrien Grand)
9059
9060Optimizations
9061
9062* LUCENE-5225: The ToParentBlockJoinQuery only keeps track of the child
9063  doc ids and child scores if the ToParentBlockJoinCollector is used.
9064  (Martijn van Groningen)
9065
9066* LUCENE-5236: EliasFanoDocIdSet now has an index and uses broadword bit
9067  selection to speed-up advance(). (Paul Elschot via Adrien Grand)
9068
9069* LUCENE-5266: Improved number of read calls and branches in DirectPackedReader. (Ryan Ernst)
9070
9071* LUCENE-5300: Optimized SORTED_SET storage for fields which are single-valued.
9072  (Adrien Grand)
9073
9074Documentation
9075
9076* LUCENE-5211: Better javadocs and error checking of 'format' option in
9077  StopFilterFactory, as well as comments in all snowball formatted files
9078  about specifying format option.  (hossman)
9079
9080Changes in backwards compatibility policy
9081
9082* LUCENE-5235: Sub classes of Tokenizer have to call super.reset()
9083  when implementing reset(). Otherwise the consumer will get an
9084  IllegalStateException because the Reader is not correctly assigned.
9085  It is important to never change the "input" field on Tokenizer
9086  without using setReader(). The "input" field must not be used
9087  outside reset(), incrementToken(), or end() - especially not in
9088  the constructor.  (Uwe Schindler, Robert Muir)
9089
9090* LUCENE-5204: Directory doesn't have default implementations for
9091  LockFactory-related methods, which have been moved to BaseDirectory. If you
9092  had a custom Directory implementation that extended Directory, you need to
9093  extend BaseDirectory instead. (Adrien Grand)
9094
9095Build
9096
9097* LUCENE-5283: Fail the build if ant test didn't execute any tests
9098  (everything filtered out). (Dawid Weiss, Uwe Schindler)
9099
9100* LUCENE-5249, LUCENE-5257: All Lucene/Solr modules should use the same
9101  dependency versions. (Steve Rowe)
9102
9103* LUCENE-5273: Binary artifacts in Lucene and Solr convenience binary
9104  distributions accompanying a release, including on Maven Central,
9105  should be identical across all distributions. (Steve Rowe, Uwe Schindler,
9106  Shalin Shekhar Mangar)
9107
9108* LUCENE-4753: Run forbidden-apis Ant task per module. This allows more
9109  improvements and prevents OOMs after the number of class files
9110  rose recently.  (Uwe Schindler)
9111
9112Tests
9113
9114* LUCENE-5278: Fix MockTokenizer to work better with more regular expression
9115  patterns. Previously it could only behave like CharTokenizer (where a character
9116  is either a "word" character or not), but now it gives a general longest-match
9117  behavior.  (Nik Everett via Robert Muir)
9118
9119======================= Lucene 4.5.1 =======================
9120
9121Bug Fixes
9122
9123* LUCENE-4998: Fixed a few places to pass IOContext.READONCE instead
9124  of IOContext.READ (Shikhar Bhushan via Mike McCandless)
9125
9126* LUCENE-5242: DirectoryTaxonomyWriter.replaceTaxonomy did not fully reset
9127  its state, which could result in exceptions being thrown, as well as
9128  incorrect ordinals returned from getParent. (Shai Erera)
9129
9130* LUCENE-5254: Fixed bounded memory leak, where objects like live
9131  docs bitset were not freed from a starting reader after reopening
9132  to a new reader and closing the original one.  (Shai Erera, Mike
9133  McCandless)
9134
9135* LUCENE-5262: Fixed file handle leaks when multiple attempts to open an
9136  NRT reader hit exceptions. (Shai Erera)
9137
9138* LUCENE-5263: Transient IOExceptions, e.g. due to disk full or file
9139  descriptor exhaustion, hit at unlucky times inside IndexWriter could
9140  lead to silently losing deletions. (Shai Erera, Mike McCandless)
9141
9142* LUCENE-5264: CommonTermsQuery ignored minMustMatch if only high-frequent
9143  terms were present in the query and the high-frequent operator was set
9144  to SHOULD. (Simon Willnauer)
9145
9146* LUCENE-5269: Fix bug in NGramTokenFilter where it would sometimes count
9147  unicode characters incorrectly. (Mike McCandless, Robert Muir)
9148
9149* LUCENE-5289: IndexWriter.hasUncommittedChanges was returning false
9150  when there were buffered delete-by-Term.  (Shalin Shekhar Mangar,
9151  Mike McCandless)
9152
9153======================= Lucene 4.5.0 =======================
9154
9155New features
9156
9157* LUCENE-5084: Added new Elias-Fano encoder, decoder and DocIdSet
9158  implementations. (Paul Elschot via Adrien Grand)
9159
9160* LUCENE-5081: Added WAH8DocIdSet, an in-memory doc id set implementation based
9161  on word-aligned hybrid encoding. (Adrien Grand)
9162
9163* LUCENE-5098: New broadword utility methods in oal.util.BroadWord.
9164  (Paul Elschot via Adrien Grand, Dawid Weiss)
9165
9166* LUCENE-5030: FuzzySuggester now supports optional unicodeAware
9167  (default is false).  If true then edits are measured in Unicode code
9168  points instead of UTF8 bytes.  (Artem Lukanin via Mike McCandless)
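
  A minimal sketch of why the unit of measure matters. This is not
  FuzzySuggester's code; the class and method names are hypothetical and
  only the JDK is used. The same word has a different length depending on
  whether it is counted in UTF-8 bytes or in Unicode code points, so an
  edit touching a non-ASCII character spans a different number of units:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene: compares the two units of measure.
class EditUnitSketch {
    // Length of the string when every UTF-8 byte counts as one unit.
    static int utf8Units(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Length of the string when every Unicode code point counts as one unit.
    static int codePointUnits(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String word = "caf\u00e9"; // "cafe" with an acute accent on the last letter
        System.out.println(utf8Units(word));      // 5
        System.out.println(codePointUnits(word)); // 4
    }
}
```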
9169
9170* LUCENE-5118: SpatialStrategy.makeDistanceValueSource() now has an optional
9171  multiplier for scaling degrees to another unit. (David Smiley)
9172
9173* LUCENE-5091: SpanNotQuery can now be configured with pre and post slop to act
9174  as a hypothetical SpanNotNearQuery. (Tim Allison via David Smiley)
9175
9176* LUCENE-4985: FacetsAccumulator.create() is now able to create a
9177  MultiFacetsAccumulator over a mixed set of facet requests. MultiFacetsAccumulator
9178  allows wrapping multiple FacetsAccumulators, making it easy to mix
9179  existing and custom ones. TaxonomyFacetsAccumulator supports any
9180  FacetRequest which implements createFacetsAggregator and was indexed
9181  using the taxonomy index. (Shai Erera)
9182
9183* LUCENE-5153: AnalyzerWrapper.wrapReader allows wrapping the Reader given to
9184  inputReader. (Shai Erera)
9185
9186* LUCENE-5155: FacetRequest.getValueOf and .getFacetArraysSource replaced by
9187  FacetsAggregator.createOrdinalValueResolver. This gives better options for
9188  resolving an ordinal's value by FacetAggregators. (Shai Erera)
9189
9190* LUCENE-5165: Add SuggestStopFilter, to be used with analyzing
9191  suggesters, so that a stop word at the very end of the lookup query,
9192  and without any trailing token characters, will be preserved.  This
9193  enables query "a" to suggest apple; see
9194  http://blog.mikemccandless.com/2013/08/suggeststopfilter-carefully-removes.html
9195  for details.
9196
9197* LUCENE-5178: Added support for missing values to DocValues fields.
9198  AtomicReader.getDocsWithField returns a Bits of documents with a value,
9199  and FieldCache.getDocsWithField forwards to that for DocValues fields. Things like
9200  SortField.setMissingValue, FunctionValues.exists, and FieldValueFilter now
9201  work with DocValues fields.  (Robert Muir)
9202
9203* LUCENE-5124: Lucene 4.5 has a new Lucene45Codec with Lucene45DocValues,
9204  supporting missing values and with most datastructures residing off-heap.
9205  Added "Memory" docvalues format that works entirely in heap, and "Disk"
9206  loads no datastructures into RAM. Both of these also support missing values.
9207  Added DiskNormsFormat (in case you want norms entirely on disk).  (Robert Muir)
9208
9209* LUCENE-2750: Added PForDeltaDocIdSet, an in-memory doc id set implementation
9210  based on the PFOR encoding. (Adrien Grand)
9211
9212* LUCENE-5186: Added CachingWrapperFilter.getFilter in order to be able to get
9213  the wrapped filter. (Trejkaz via Adrien Grand)
9214
9215* LUCENE-5197: Added SegmentReader.ramBytesUsed to return approximate heap RAM
9216  used by index datastructures. (Areek Zillur via Robert Muir)
9217
9218Bug Fixes
9219
9220* LUCENE-5116: IndexWriter.addIndexes(IndexReader...) should drop empty (or all
9221  deleted) segments. (Robert Muir, Shai Erera)
9222
9223* LUCENE-5132: Spatial RecursivePrefixTree Contains predicate will throw an NPE
9224  when there's no indexed data and maybe in other circumstances too. (David Smiley)
9225
9226* LUCENE-5146: AnalyzingSuggester sort comparator read part of the input key as the
9227  weight, which caused the sorter to never sort by weight first: the weight is only
9228  considered if the inputs are equal, in which case the malformed weights are identical too.
9229  (Simon Willnauer)
9230
9231* LUCENE-5151: Associations FacetsAggregators could enter an infinite loop when
9232  some result documents were missing category associations. (Shai Erera)
9233
9234* LUCENE-5152: Fix MemoryPostingsFormat to not modify borrowed BytesRef from FSTEnum
9235  seek/lookup which can cause side effects if done on a cached FST root arc.
9236  (Simon Willnauer)
9237
9238* LUCENE-5160: Handle the case where reading from a file or FileChannel returns -1,
9239  which could happen in rare cases where something happens to the file between the
9240  time we start the read loop (where we check the length) and when we actually do
9241  the read. (gsingers, yonik, Robert Muir, Uwe Schindler)
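
  A minimal sketch of the defensive pattern described above, using a plain
  java.io.InputStream as a stand-in for the file or FileChannel. This is an
  illustration under that assumption, not Lucene's Directory code; the class
  and method names are hypothetical. The loop treats a -1 return as an
  error instead of assuming the length checked earlier still holds:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Lucene.
class ReadLoopSketch {
    // Reads exactly len bytes into dest, failing loudly if the source ends early.
    static void readFully(InputStream in, byte[] dest, int len) throws IOException {
        int done = 0;
        while (done < len) {
            int n = in.read(dest, done, len - done);
            if (n < 0) {
                // The source shrank or ended after the length was checked.
                throw new EOFException(
                    "wanted " + len + " bytes but only " + done + " were available");
            }
            done += n;
        }
    }

    // Convenience wrapper for the demo: true if the read ran past the end.
    static boolean hitsEof(byte[] source, int wanted) {
        try {
            readFully(new ByteArrayInputStream(source), new byte[wanted], wanted);
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hitsEof(new byte[8], 8));  // false
        System.out.println(hitsEof(new byte[8], 16)); // true
    }
}
```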
9242
9243* LUCENE-5166: PostingsHighlighter would throw IOOBE if a term spanned the maxLength
9244  boundary, made it into the top-N and went to the formatter.
9245  (Manuel Amoabeng, Michael McCandless, Robert Muir)
9246
9247* LUCENE-4583: Indexing core no longer enforces a limit on maximum
9248  length binary doc values fields, but individual codecs (including
9249  the default one) have their own limits (David Smiley, Robert Muir,
9250  Mike McCandless)
9251
9252* LUCENE-3849: TokenStreams now set the position increment in end(),
9253  so we can handle trailing holes.  If you have a custom TokenStream
9254  implementing end() then be sure it calls super.end().  (Robert Muir,
9255  Mike McCandless)
9256
9257* LUCENE-5192: IndexWriter could allow adding same field name with different
9258  DocValueTypes under some circumstances. (Shai Erera)
9259
9260* LUCENE-5191: SimpleHTMLEncoder in Highlighter module broke Unicode
9261  outside BMP because it encoded UTF-16 chars instead of codepoints.
9262  The escaping of codepoints > 127 was removed (not needed for valid HTML)
9263  and missing escaping for ' and / was added.  (Uwe Schindler)
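
  A minimal sketch of code-point-based escaping, to illustrate the fix
  above. This is not SimpleHTMLEncoder's source; the class name and the
  exact entity strings chosen for ' and / are assumptions. Iterating by
  code point keeps a surrogate pair together, so characters outside the
  BMP pass through intact:

```java
// Hypothetical helper, not part of Lucene.
class HtmlEscapeSketch {
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Read a whole code point, which may occupy two UTF-16 chars.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '"':  sb.append("&quot;"); break;
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '\'': sb.append("&#x27;"); break; // entity choice is an assumption
                case '/':  sb.append("&#x2F;"); break; // entity choice is an assumption
                default:   sb.appendCodePoint(cp);     // no escaping of code points > 127
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & it's 1/2"));
    }
}
```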
9264
9265* LUCENE-5201: Fixed compression bug in LZ4.compressHC when the input is highly
9266  compressible and the start offset of the array to compress is > 0.
9267  (Adrien Grand)
9268
9269* LUCENE-5221: SimilarityBase did not write norms the same way as DefaultSimilarity
9270  if discountOverlaps == false and index-time boosts are present for the field.
9271  (Yubin Kim via Robert Muir)
9272
9273* LUCENE-5223: Fixed IndexUpgrader command line parsing: -verbose is not required
9274  and -dir-impl option now works correctly.  (hossman)
9275
9276* LUCENE-5245: Fix MultiTermQuery's constant score rewrites to always
9277  return a ConstantScoreQuery to make scoring consistent. Previously it
9278  returned an empty unwrapped BooleanQuery, if no terms were available,
9279  which has a different query norm.  (Nik Everett, Uwe Schindler)
9280
9281* LUCENE-5218: In some cases, trying to retrieve or merge a 0-length
9282  binary doc value would hit an ArrayIndexOutOfBoundsException.
9283  (Littlestar via Mike McCandless)
9284
9285API Changes
9286
9287* LUCENE-5094: Add ramBytesUsed() to MultiDocValues.OrdinalMap.
9288  (Robert Muir)
9289
9290* LUCENE-5114: Remove unused boolean useCache parameter from
9291  TermsEnum.seekCeil and .seekExact (Mike McCandless)
9292
9293* LUCENE-5128: IndexSearcher.searchAfter throws IllegalArgumentException if
9294  searchAfter exceeds the number of documents in the reader.
9295  (Crocket via Shai Erera)
9296
9297* LUCENE-5129: CategoryAssociationsContainer no longer supports null
9298  association values for categories. If you want to index categories without
9299  associations, you should add them using FacetFields. (Shai Erera)
9300
9301* LUCENE-4876: IndexWriter no longer clones the given IndexWriterConfig. If you
9302  need to use the same config more than once, e.g. when sharing between multiple
9303  writers, make sure to clone it before passing to each writer.
9304  (Shai Erera, Mike McCandless)
9305
9306* LUCENE-5144: StandardFacetsAccumulator renamed to OldFacetsAccumulator, and all
9307  associated classes were moved under o.a.l.facet.old. The intention is to remove it
9308  one day, once the features it covers (complements, partitions, sampling) have been
9309  migrated to the new FacetsAggregator and FacetsAccumulator API. Also,
9310  FacetRequest.createAggregator was replaced by OldFacetsAccumulator.createAggregator.
9311  (Shai Erera)
9312
9313* LUCENE-5149: CommonTermsQuery now allows setting the minimum number of terms that
9314  should match for its high and low frequent sub-queries. Previously this was only
9315  supported on the low frequent terms query. (Simon Willnauer)
9316
9317* LUCENE-5156: CompressingTermVectors TermsEnum no longer supports ord().
9318  (Robert Muir)
9319
9320* LUCENE-5161, LUCENE-5164: Fix default chunk sizes in FSDirectory to not be
9321  unnecessarily large (now 8192 bytes); also use chunking when writing to index
9322  files. FSDirectory#setReadChunkSize() is now deprecated and will be removed
9323  in Lucene 5.0.  (Uwe Schindler, Robert Muir, gsingers)
9324
9325* LUCENE-5170: Analyzer.ReuseStrategy instances are now stateless and can
9326  be reused in other Analyzer instances, which was not possible before.
9327  Lucene now ships with stateless singletons for per field and global reuse.
9328  Legacy code can still instantiate the deprecated implementation classes,
9329  but new code should use the constants. Implementors of custom strategies
9330  have to take care of new method signatures. AnalyzerWrapper can now be
9331  configured to use a custom strategy, too, ideally the one from the wrapped
9332  Analyzer. Analyzer adds a getter to retrieve the strategy for this use-case.
9333  (Uwe Schindler, Robert Muir, Shay Banon)
9334
9335* LUCENE-5173: Lucene never writes segments with 0 documents anymore.
9336  (Shai Erera, Uwe Schindler, Robert Muir)
9337
9338* LUCENE-5178: SortedDocValues always returns -1 ord when a document is missing
9339  a value for the field. Previously it only did this if the SortedDocValues
9340  was produced by uninversion on the FieldCache.  (Robert Muir)
9341
9342* LUCENE-5183: remove BinaryDocValues.MISSING. In order to determine a document
9343  is missing a field, use getDocsWithField instead.  (Robert Muir)
9344
9345Changes in Runtime Behavior
9346
9347* LUCENE-5178: DocValues codec consumer APIs (iterables) return null values
9348  when the document has no value for the field. (Robert Muir)
9349
9350* LUCENE-5200: The HighFreqTerms command-line tool returns the true top-N
9351  by totalTermFreq when using the -t option; it uses the term statistics (faster)
9352  and now always shows totalTermFreq in the output.  (Robert Muir)
9353
9354Optimizations
9355
9356* LUCENE-5088: Added TermFilter to filter docs by a specific term.
9357  (Martijn van Groningen)
9358
9359* LUCENE-5119: DiskDV keeps the document-to-ordinal mapping on disk for
9360  SortedDocValues.  (Robert Muir)
9361
9362* LUCENE-5145: New AppendingPackedLongBuffer, a new variant of the former
9363  AppendingLongBuffer which assumes values are 0-based.
9364  (Boaz Leskes via Adrien Grand)
9365
9366* LUCENE-5145: All Appending*Buffer now support bulk get.
9367  (Boaz Leskes via Adrien Grand)
9368
9369* LUCENE-5140: Fixed a performance regression of span queries caused by
9370  LUCENE-4946. (Alan Woodward, Adrien Grand)
9371
9372* LUCENE-5150: Make WAH8DocIdSet able to inverse its encoding in order to
9373  compress dense sets efficiently as well. (Adrien Grand)
9374
9375* LUCENE-5159: Prefix-code the sorted/sortedset value dictionaries in DiskDV.
9376  (Robert Muir)
9377
9378* LUCENE-5170: Fixed several wrapper analyzers to inherit the reuse strategy
9379  of the wrapped Analyzer.  (Uwe Schindler, Robert Muir, Shay Banon)
9380
9381* LUCENE-5006: Simplified DocumentsWriter and DocumentsWriterPerThread
9382  synchronization and concurrent interaction with IndexWriter. DWPT is now
9383  only setup once and has no reset logic. All segment publishing and state
9384  transition from DWPT into IndexWriter is now done via an Event-Queue
9385  processed from within the IndexWriter in order to prevent situations
9386  where DWPT or DW calling into IW causes deadlocks. (Simon Willnauer)
9387
9388* LUCENE-5182: Terminate phrase searches early if max phrase window is
9389  exceeded in FastVectorHighlighter to prevent very long running phrase
9390  extraction if phrase terms are high frequent. (Simon Willnauer)
9391
9392* LUCENE-5188: CompressingStoredFieldsFormat now slices chunks containing big
9393  documents into fixed-size blocks so that requesting a single field does not
9394  necessarily force decompressing the whole chunk. (Adrien Grand)
9395
9396* LUCENE-5101: CachingWrapperFilter makes it easier to plug in a custom cacheable
9397  DocIdSet implementation and uses WAH8DocIdSet by default, which should be
9398  more memory efficient than FixedBitSet on average as well as faster on small
9399  sets. (Robert Muir)
9400
9401Documentation
9402
9403* LUCENE-4894: remove facet userguide as it was outdated. Partially absorbed into
9404  package's documentation and classes javadocs. (Shai Erera)
9405
9406* LUCENE-5206: Clarify FuzzyQuery's unexpected behavior on short
9407  terms. (Tim Allison via Mike McCandless)
9408
9409Changes in backwards compatibility policy
9410
9411* LUCENE-5141: CheckIndex.fixIndex(Status,Codec) is now
9412  CheckIndex.fixIndex(Status). If you used to pass a codec to this method, just
9413  remove it from the arguments. (Adrien Grand)
9414
9415* LUCENE-5089, SOLR-5126: Update to Morfologik 1.7.1. MorfologikAnalyzer and MorfologikFilter
9416  no longer support multiple "dictionaries" as there is only one dictionary available.
9417  (Dawid Weiss)
9418
9419* LUCENE-5170: Changed method signatures of Analyzer.ReuseStrategy to take
9420  Analyzer. Closeable interface was removed because the class was changed to
9421  be stateless.  (Uwe Schindler, Robert Muir, Shay Banon)
9422
9423* LUCENE-5187: SlowCompositeReaderWrapper constructor is now private,
9424  SlowCompositeReaderWrapper.wrap should be used instead. (Adrien Grand)
9425
9426* LUCENE-5101: CachingWrapperFilter doesn't always return FixedBitSet instances
9427  anymore. Users of the join module can use
9428  oal.search.join.FixedBitSetCachingWrapperFilter instead. (Adrien Grand)
9429
9430Build
9431
9432* SOLR-5159: Manifest includes non-parsed maven variables.
9433  (Artem Karpenko via Steve Rowe)
9434
9435* LUCENE-5193: Add jar-src as top-level target to generate all Lucene and Solr
9436  *-src.jar. (Steve Rowe, Shai Erera)
9437
9438======================= Lucene 4.4.0 =======================
9439
9440Changes in backwards compatibility policy
9441
9442* LUCENE-5085: MorfologikFilter will no longer stem words marked as keywords
9443  (Dawid Weiss, Grzegorz Sobczyk)
9444
9445* LUCENE-4955: NGramTokenFilter now emits all n-grams for the same token at the
9446  same position and preserves the position length and the offsets of the
9447  original token. (Simon Willnauer, Adrien Grand)
9448
9449* LUCENE-4955: NGramTokenizer now emits n-grams in a different order
9450  (a, ab, b, bc, c) instead of (a, b, c, ab, bc) and doesn't trim trailing
9451  whitespaces. (Adrien Grand)
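
  A minimal sketch of the two emission orders mentioned above. This is not
  NGramTokenizer's implementation; the class and method names are
  hypothetical, and it works on plain Java chars, so it ignores
  supplementary characters. The new order groups grams by start offset,
  the old one grouped them by gram length:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class NGramOrderSketch {
    // New order: for each start offset, emit every gram length before moving on.
    static List<String> byPosition(String s, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < s.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= s.length(); len++) {
                out.add(s.substring(start, start + len));
            }
        }
        return out;
    }

    // Old order: emit all grams of one length, then all grams of the next length.
    static List<String> byLength(String s, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int len = minGram; len <= maxGram; len++) {
            for (int start = 0; start + len <= s.length(); start++) {
                out.add(s.substring(start, start + len));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(byPosition("abc", 1, 2)); // [a, ab, b, bc, c]
        System.out.println(byLength("abc", 1, 2));   // [a, b, c, ab, bc]
    }
}
```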
9452
9453* LUCENE-5042: The n-gram and edge n-gram tokenizers and filters now correctly
9454  handle supplementary characters, and the tokenizers have the ability to
9455  pre-tokenize the input stream similarly to CharTokenizer. (Adrien Grand)
9456
9457* LUCENE-4967: NRTManager is replaced by
9458  ControlledRealTimeReopenThread, for controlling which requests must
9459  see which indexing changes, so that it can work with any
9460  ReferenceManager (Mike McCandless)
9461
9462* LUCENE-4973: SnapshotDeletionPolicy no longer requires a unique
9463  String id (Mike McCandless, Shai Erera)
9464
9465* LUCENE-4946: The internal sorting API (SorterTemplate, now Sorter) has been
9466  completely refactored to allow for a better implementation of TimSort.
9467  (Adrien Grand, Uwe Schindler, Dawid Weiss)
9468
9469* LUCENE-4963: Some TokenFilter options that generate broken TokenStreams have
9470  been deprecated: updateOffsets=true on TrimFilter and
9471  enablePositionIncrements=false on all classes that inherit from
9472  FilteringTokenFilter: JapanesePartOfSpeechStopFilter, KeepWordFilter,
9473  LengthFilter, StopFilter and TypeTokenFilter. (Adrien Grand)
9474
9475* LUCENE-4963: In order not to take position increments into account in
9476  suggesters, you now need to call setPreservePositionIncrements(false) instead
9477  of configuring the token filters to not increment positions. (Adrien Grand)
9478
9479* LUCENE-3907: EdgeNGramTokenizer now supports maxGramSize > 1024, doesn't trim
9480  the input, sets position increment = 1 for all tokens and doesn't support
9481  backward grams anymore. (Adrien Grand)
9482
9483* LUCENE-3907: EdgeNGramTokenFilter does not support backward grams and does
9484  not update offsets anymore. (Adrien Grand)
9485
9486* LUCENE-4981: PositionFilter is now deprecated as it can corrupt token stream
9487  graphs. Since its main use-case was to make query parsers generate boolean
9488  queries instead of phrase queries, it is now advised to use
9489  QueryParser.setAutoGeneratePhraseQueries(false) (for simple cases) or to
9490  override QueryParser.newFieldQuery. (Adrien Grand, Steve Rowe)
9491
9492* LUCENE-5018: CompoundWordTokenFilterBase and its children
9493  DictionaryCompoundWordTokenFilter and HyphenationCompoundWordTokenFilter don't
9494  update offsets anymore. (Adrien Grand)
9495
9496* LUCENE-5015: SamplingAccumulator no longer corrects the counts of the sampled
9497  categories. You should set TakmiSampleFixer on SamplingParams if required (but
9498  notice that this means slower search). (Rob Audenaerde, Gilad Barkai, Shai Erera)
9499
9500* LUCENE-4933: Replace ExactSimScorer/SloppySimScorer with just SimScorer. Previously
9501  there were 2 implementations as a performance hack to support tableization of
9502  sqrt(), but this caching is removed, as sqrt is implemented in hardware with modern
9503  jvms and it's faster not to cache.  (Robert Muir)
9504
9505* LUCENE-5038: MergePolicy now has a default implementation for useCompoundFile based
9506  on segment size and noCFSRatio. The default implementation was pulled up from
9507  TieredMergePolicy. (Simon Willnauer)
9508
9509* LUCENE-5063: FieldCache.get(Bytes|Shorts), SortField.Type.(BYTE|SHORT) and
9510  FieldCache.DEFAULT_(BYTE|SHORT|INT|LONG|FLOAT|DOUBLE)_PARSER are now
9511  deprecated. These methods/types assume that data is stored as strings although
9512  Lucene has much better support for numeric data through (Int|Long)Field,
9513  NumericRangeQuery and FieldCache.get(Int|Long)s. (Adrien Grand)
9514
9515* LUCENE-5078: TFIDFSimilarity lets you encode the norm value as any arbitrary long.
9516  As a result, encode/decodeNormValue were made abstract with their signatures changed.
9517  The default implementation was moved to DefaultSimilarity, which encodes the norm as
9518  a single-byte value. (Shai Erera)
9519
9520Bug Fixes
9521
9522* LUCENE-4890: QueryTreeBuilder.getBuilder() only finds interfaces on the
9523  most derived class. (Adriano Crestani)
9524
9525* LUCENE-4997: Internal test framework's tests are sensitive to previous
9526  test failures and tests.failfast. (Dawid Weiss, Shai Erera)
9527
9528* LUCENE-4955: NGramTokenizer now supports inputs larger than 1024 chars.
9529  (Adrien Grand)
9530
9531* LUCENE-4959: Fix incorrect return value in
9532  SimpleNaiveBayesClassifier.assignClass. (Alexey Kutin via Adrien Grand)
9533
9534* LUCENE-4972: DirectoryTaxonomyWriter created empty commits even if no changes
9535  were made. (Shai Erera, Michael McCandless)
9536
9537* LUCENE-949: AnalyzingQueryParser can't work with leading wildcards.
9538  (Tim Allison, Robert Muir, Steve Rowe)
9539
9540* LUCENE-4980: Fix issues preventing mixing of RangeFacetRequest and
9541  non-RangeFacetRequest when using DrillSideways.  (Mike McCandless,
9542  Shai Erera)
9543
9544* LUCENE-4996: Ensure DocInverterPerField always includes field name
9545  in exception messages.  (Markus Jelsma via Robert Muir)
9546
9547* LUCENE-4992: Fix constructor of CustomScoreQuery to take FunctionQuery
9548  for scoringQueries. Instead use QueryValueSource to safely wrap arbitrary
9549  queries and use them with CustomScoreQuery.  (John Wang, Robert Muir)
9550
9551* LUCENE-5016: SamplingAccumulator returned inconsistent label if asked to
9552  aggregate a non-existing category. Also fixed a bug in RangeAccumulator if
9553  some readers did not have the requested numeric DV field.
9554  (Rob Audenaerde, Shai Erera)
9555
9556* LUCENE-5028: Remove pointless and confusing doShare option in FST's
9557  PositiveIntOutputs (Han Jiang via Mike McCandless)
9558
9559* LUCENE-5032: Fix IndexOutOfBoundsExc in PostingsHighlighter when
9560  multi-valued fields exceed maxLength (Tomás Fernández Löbbe
9561  via Mike McCandless)
9562
9563* LUCENE-4933: SweetSpotSimilarity didn't apply its tf function to some
9564  queries (SloppyPhraseQuery, SpanQueries).  (Robert Muir)
9565
* LUCENE-5033: SlowFuzzyQuery was accepting too many terms (documents) when
  the provided minSimilarity is an int > 1 (Tim Allison via Mike McCandless)
9568
9569* LUCENE-5045: DrillSideways.search did not work on an empty index. (Shai Erera)
9570
* LUCENE-4995: CompressingStoredFieldsReader now only reuses an internal buffer
  when there is no more than 32kb to decompress. This prevents
  out-of-memory errors when working with large stored fields.
  (Adrien Grand)
9575
* LUCENE-5062: If the spatial data for a document was composed of multiple
  overlapping or adjacent parts, then a CONTAINS predicate query might not match
  when the sum of those shapes contains the query shape but none does individually.
  A flag was added to use the original faster algorithm. (David Smiley)
9580
9581* LUCENE-4971: Fixed NPE in AnalyzingSuggester when there are too many
9582  graph expansions.  (Alexey Kudinov via Mike McCandless)
9583
* LUCENE-5080: Combined setMaxMergeCount and setMaxThreadCount into one
  setter in ConcurrentMergeScheduler: setMaxMergesAndThreads.  Previously these
  setters would not work unless you invoked them very carefully.
  (Robert Muir, Shai Erera)
9588
9589* LUCENE-5068: QueryParserUtil.escape() does not escape forward slash.
9590  (Matias Holte via Steve Rowe)
9591
* LUCENE-5103: A join on a single-valued field with deleted docs scored too few
  docs. (David Smiley)
9594
9595* LUCENE-5090: Detect mismatched readers passed to
9596  SortedSetDocValuesReaderState and SortedSetDocValuesAccumulator.
9597  (Robert Muir, Mike McCandless)
9598
9599* LUCENE-5120: AnalyzingSuggester modified its FST's cached root arc if payloads
9600  are used and the entire output resided on the root arc on the first access. This
9601  caused subsequent suggest calls to fail. (Simon Willnauer)
9602
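The escaping fix noted under LUCENE-5068 above can be illustrated with a
minimal sketch. This is not QueryParserUtil's code: the class name, the
method, and the exact set of special characters are assumptions made for
illustration. The point is only that '/' needs a backslash like the other
query-syntax characters, since it delimits regular expressions.

```java
final class QueryEscapeSketch {
    // Assumed set of query-syntax characters (hypothetical, for illustration).
    // '/' is included because it delimits regular expressions.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Returns the input with a backslash inserted before every special character. */
    public static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```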
9603Optimizations
9604
9605* LUCENE-4936: Improve numeric doc values compression in case all values share
9606  a common divisor. In particular, this improves the compression ratio of dates
9607  without time when they are encoded as milliseconds since Epoch. Also support
9608  TABLE compressed numerics in the Disk codec.  (Robert Muir, Adrien Grand)
9609
9610* LUCENE-4951: DrillSideways uses the new Scorer.cost() method to make
9611  better decisions about which scorer to use internally.  (Mike McCandless)
9612
9613* LUCENE-4976: PersistentSnapshotDeletionPolicy writes its state to a
9614  single snapshots_N file, and no longer requires closing (Mike
9615  McCandless, Shai Erera)
9616
9617* LUCENE-5035: Compress addresses in FieldCacheImpl.SortedDocValuesImpl more
9618  efficiently. (Adrien Grand, Robert Muir)
9619
9620* LUCENE-4941: Sort "from" terms only once when using JoinUtil.
9621  (Martijn van Groningen)
9622
9623* LUCENE-5050: Close the stored fields and term vectors index files as soon as
9624  the index has been loaded into memory to save file descriptors. (Adrien Grand)
9625
* LUCENE-5086: RamUsageEstimator now uses the official Java 7 API or a proprietary
  Oracle Java 6 API to get the Hotspot MX bean, preventing AWT classes from being
  loaded on Mac OS X.  (Shay Banon, Dawid Weiss, Uwe Schindler)
9629
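The common-divisor idea behind LUCENE-4936 above can be sketched as follows.
This is an illustration only, not the codec's implementation: the class and
method names are hypothetical, the input is assumed to be non-empty, and the
values are assumed small enough that (value - min) does not overflow. Dates
without a time component, encoded as milliseconds since Epoch, are all
multiples of 86400000, so storing the quotients needs far fewer bits.

```java
final class GcdCompressionSketch {
    /** Smallest value in a non-empty array. */
    public static long min(long[] values) {
        long min = values[0];
        for (long v : values) {
            min = Math.min(min, v);
        }
        return min;
    }

    /** GCD of (value - min) over all values; 0 if all values are equal. */
    public static long commonDivisor(long[] values) {
        long min = min(values);
        long gcd = 0;
        for (long v : values) {
            gcd = gcd(gcd, v - min);
        }
        return gcd;
    }

    /** Bits needed per value when storing the quotients (value - min) / gcd. */
    public static int bitsPerValue(long[] values) {
        long min = min(values);
        long gcd = commonDivisor(values);
        long maxQuotient = 0;
        for (long v : values) {
            long q = gcd == 0 ? 0 : (v - min) / gcd;
            maxQuotient = Math.max(maxQuotient, q);
        }
        return maxQuotient == 0 ? 0 : 64 - Long.numberOfLeadingZeros(maxQuotient);
    }

    // Euclid's algorithm on non-negative inputs; gcd(0, x) == x.
    private static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }
}
```

Decoding a stored quotient q would then be min + q * gcd.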
9630New Features
9631
9632* LUCENE-5085: MorfologikFilter will no longer stem words marked as keywords
9633  (Dawid Weiss, Grzegorz Sobczyk)
9634
9635* LUCENE-5064: Added PagedMutable (internal), a paged extension of
9636  PackedInts.Mutable which allows for storing more than 2B values. (Adrien Grand)
9637
* LUCENE-4766: Added a PatternCaptureGroupTokenFilter that uses Java regexes to
  emit multiple tokens, one for each capture group in one or more patterns.
  (Simon Willnauer, Clinton Gormley)
9641
9642* LUCENE-4952: Expose control (protected method) in DrillSideways to
9643  force all sub-scorers to be on the same document being collected.
9644  This is necessary when using collectors like
9645  ToParentBlockJoinCollector with DrillSideways.  (Mike McCandless)
9646
9647* SOLR-4761: Add SimpleMergedSegmentWarmer, which just initializes terms,
9648  norms, docvalues, and so on. (Mark Miller, Mike McCandless, Robert Muir)
9649
9650* LUCENE-4964: Allow arbitrary Query for per-dimension drill-down to
9651  DrillDownQuery and DrillSideways, to support future dynamic faceting
9652  methods (Mike McCandless)
9653
9654* LUCENE-4966: Add CachingWrapperFilter.sizeInBytes() (Mike McCandless)
9655
9656* LUCENE-4965: Add dynamic (no taxonomy index used) numeric range
9657  faceting to Lucene's facet module (Mike McCandless, Shai Erera)
9658
* LUCENE-4979: LiveFieldValues can work with any ReferenceManager, not
  just ReferenceManager<IndexSearcher> (Mike McCandless)
9661
9662* LUCENE-4975: Added a new Replicator module which can replicate index
9663  revisions between server and client. (Shai Erera, Mike McCandless)
9664
9665* LUCENE-5022: Added FacetResult.mergeHierarchies to merge multiple
9666  FacetResult of the same dimension into a single one with the reconstructed
9667  hierarchy. (Shai Erera)
9668
9669* LUCENE-5026: Added PagedGrowableWriter, a new internal packed-ints structure
9670  that grows the number of bits per value on demand, can store more than 2B
9671  values and supports random write and read access. (Adrien Grand)
9672
9673* LUCENE-5025: FST's Builder can now handle more than 2.1 billion
9674  "tail nodes" while building a minimal FST.  (Aaron Binns, Adrien
9675  Grand, Mike McCandless)
9676
9677* LUCENE-5063: FieldCache.DEFAULT.get(Ints|Longs) now uses bit-packing to save
9678  memory. (Adrien Grand)
9679
9680* LUCENE-5079: IndexWriter.hasUncommittedChanges() returns true if there are
9681  changes that have not been committed. (yonik, Mike McCandless, Uwe Schindler)
9682
9683* SOLR-4565: Extend NorwegianLightStemFilter and NorwegianMinimalStemFilter
9684  to handle "nynorsk" (Erlend Garåsen, janhoy via Robert Muir)
9685
* LUCENE-5087: Add getMultiValuedSeparator to PostingsHighlighter, for cases
  where you want a different logical separator between field values. This can
  be set to e.g. U+2029 PARAGRAPH SEPARATOR if you never want passages to span
  values. (Mike McCandless, Robert Muir)
9690
9691* LUCENE-5013: Added ScandinavianFoldingFilterFactory and
9692  ScandinavianNormalizationFilterFactory (Karl Wettin via janhoy)
9693
9694* LUCENE-4845: AnalyzingInfixSuggester finds suggestions based on
9695  matches to any tokens in the suggestion, not just based on pure
9696  prefix matching.  (Mike McCandless, Robert Muir)
9697
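The capture-group behavior described under LUCENE-4766 above can be sketched
with plain java.util.regex. This is not the filter's implementation: the class
and method names are hypothetical, and the sketch works on a single string
rather than a token stream. It only shows the idea of emitting one token per
non-empty capture group across one or more patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaptureGroupSketch {
    /** Collects every non-empty capture group matched by any of the patterns. */
    public static List<String> captureGroupTokens(String token, Pattern... patterns) {
        List<String> out = new ArrayList<>();
        for (Pattern p : patterns) {
            Matcher m = p.matcher(token);
            while (m.find()) {
                // Group 0 is the whole match; capture groups start at 1.
                for (int g = 1; g <= m.groupCount(); g++) {
                    String s = m.group(g);
                    if (s != null && !s.isEmpty()) {
                        out.add(s);
                    }
                }
            }
        }
        return out;
    }
}
```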
9698API Changes
9699
* LUCENE-5077: Make it easier to use compressed norms. Lucene42NormsFormat takes
  an overhead parameter, so you can easily pass a value other than
  PackedInts.FASTEST from your own codec.  (Robert Muir)
9703
9704* LUCENE-5097: Analyzer now has an additional tokenStream(String fieldName,
9705  String text) method, so wrapping by StringReader for common use is no
9706  longer needed. This method uses an internal reusable reader, which was
9707  previously only used by the Field class.  (Uwe Schindler, Robert Muir)
9708
9709* LUCENE-4542: HunspellStemFilter's maximum recursion level is now configurable.
9710  (Piotr, Rafał Kuć via Adrien Grand)
9711
9712Build
9713
9714* LUCENE-4987: Upgrade randomized testing to version 2.0.10:
9715  Test framework may fail internally due to overly aggressive J9 optimizations.
9716  (Dawid Weiss, Shai Erera)
9717
9718* LUCENE-5043: The eclipse target now uses the containing directory for the
9719  project name.  This also enforces UTF-8 encoding when files are copied with
9720  filtering.
9721
* LUCENE-5055: "rat-sources" target now also checks build.xml, ivy.xml,
  forbidden-api signatures, and parts of resources folders.  (Ryan Ernst,
  Uwe Schindler)
9725
9726* LUCENE-5072: Automatically patch javadocs generated by JDK versions
9727  before 7u25 to work around the frame injection vulnerability (CVE-2013-1571,
9728  VU#225657).  (Uwe Schindler)
9729
9730Tests
9731
9732* LUCENE-4901: TestIndexWriterOnJRECrash should work on any
9733  JRE vendor via Runtime.halt().
9734  (Mike McCandless, Robert Muir, Uwe Schindler, Rodrigo Trujillo, Dawid Weiss)
9735
9736Changes in runtime behavior
9737
* LUCENE-5038: New segments written by IndexWriter are now wrapped into CFS
  by default. DocumentsWriterPerThread no longer consults MergePolicy
  to decide if a CFS must be written; instead, IndexWriterConfig now has a
  property to enable / disable CFS for newly created segments. (Simon Willnauer)
9742
* LUCENE-5107: Properties files written by Lucene now use UTF-8 encoding;
  Unicode is no longer escaped. Reading of legacy properties files with
  \u escapes is still possible.  (Uwe Schindler, Robert Muir)
9746
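The difference between escaped and unescaped properties files mentioned under
LUCENE-5107 above can be seen with java.util.Properties alone. The sketch
below is not Lucene's code and the class and method names are hypothetical.
It shows standard JDK behavior: storing through an OutputStream produces
\uXXXX escapes, storing through a Writer keeps characters as-is, and loading
through a Reader still understands the escapes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class PropertiesEncodingSketch {
    /** Stores via an OutputStream: characters outside printable ASCII become \\uXXXX escapes. */
    public static String storeEscaped(Properties props) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.store(out, null);
            return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stores via a Writer: characters are written unescaped; the caller picks the encoding. */
    public static String storeUnescaped(Properties props) {
        try {
            StringWriter w = new StringWriter();
            props.store(w, null);
            return w.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads from text; \\uXXXX escapes and literal characters are both accepted. */
    public static Properties load(String text) {
        try {
            Properties p = new Properties();
            p.load(new StringReader(text));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```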
9747======================= Lucene 4.3.1 =======================
9748
9749Bug Fixes
9750
9751* SOLR-4813: Fix SynonymFilterFactory to allow init parameters for
9752  tokenizer factory used when parsing synonyms file.  (Shingo Sasaki, hossman)
9753
9754* LUCENE-4935: CustomScoreQuery wrongly applied its query boost twice
9755  (boost^2).  (Robert Muir)
9756
9757* LUCENE-4948: Fixed ArrayIndexOutOfBoundsException in PostingsHighlighter
9758  if you had a 64-bit JVM without compressed OOPS: IBM J9, or Oracle with
9759  large heap/explicitly disabled.  (Mike McCandless, Uwe Schindler, Robert Muir)
9760
9761* LUCENE-4953: Fixed ParallelCompositeReader to inform ReaderClosedListeners of
9762  its synthetic subreaders. FieldCaches keyed on the atomic children will be purged
9763  earlier and FC insanity prevented.  In addition, ParallelCompositeReader's
9764  toString() was changed to better reflect the reader structure.
9765  (Mike McCandless, Uwe Schindler)
9766
9767* LUCENE-4968: Fixed ToParentBlockJoinQuery/Collector: correctly handle parent
9768  hits that had no child matches, don't throw IllegalArgumentEx when
9769  the child query has no hits, more aggressively catch cases where childQuery
9770  incorrectly matches parent documents (Mike McCandless)
9771
9772* LUCENE-4970: Fix boost value of rewritten NGramPhraseQuery.
9773  (Shingo Sasaki via Adrien Grand)
9774
9775* LUCENE-4974: CommitIndexTask was broken if no params were set. (Shai Erera)
9776
9777* LUCENE-4986: Fixed case where a newly opened near-real-time reader
9778  fails to reflect a delete from IndexWriter.tryDeleteDocument (Reg,
9779  Mike McCandless)
9780
9781* LUCENE-4994: Fix PatternKeywordMarkerFilter to have public constructor.
9782  (Uwe Schindler)
9783
9784* LUCENE-4993: Fix BeiderMorseFilter to preserve custom attributes when
9785  inserting tokens with position increment 0.  (Uwe Schindler)
9786
9787* LUCENE-4991: Fix handling of synonyms in classic QueryParser.getFieldQuery for
9788  terms not separated by whitespace. PositionIncrementAttribute was ignored, so with
9789  default AND synonyms wrongly became mandatory clauses, and with OR, the
9790  coordination factor was wrong.  (李威, Robert Muir)
9791
* LUCENE-5002: IndexWriter#deleteAll() caused a deadlock in DWPT / DWSC if a
  DWPT was flushing concurrently while deleteAll() aborted all DWPTs. The IW
  should never wait on DWPT via the flush control while holding on to the IW
  Lock. (Simon Willnauer)
9796
9797Optimizations
9798
9799* LUCENE-4938: Don't use an unnecessarily large priority queue in IndexSearcher
9800  methods that take top-N.  (Uwe Schindler, Mike McCandless, Robert Muir)
9801
9802
9803======================= Lucene 4.3.0 =======================
9804
9805Changes in backwards compatibility policy
9806
9807* LUCENE-4810: EdgeNGramTokenFilter no longer increments position for
9808  multiple ngrams derived from the same input token. (Walter Underwood
9809  via Mike McCandless)
9810
9811* LUCENE-4822: KeywordTokenFilter is now an abstract class. Subclasses
9812  need to implement #isKeyword() in order to mark terms as keywords.
9813  The existing functionality has been factored out into a new
9814  SetKeywordTokenFilter class. (Simon Willnauer, Uwe Schindler)
9815
9816* LUCENE-4642: Remove Tokenizer's and subclasses' ctors taking
9817  AttributeSource. (Renaud Delbru, Uwe Schindler, Steve Rowe)
9818
9819* LUCENE-4833: IndexWriterConfig used to use LogByteSizeMergePolicy when
9820  calling setMergePolicy(null) although the default merge policy is
9821  TieredMergePolicy. IndexWriterConfig setters now throw an exception when
9822  passed null if null is not a valid value. (Adrien Grand)
9823
9824* LUCENE-4849: Made ParallelTaxonomyArrays abstract with a concrete
9825  implementation for DirectoryTaxonomyWriter/Reader. Also moved it under
9826  o.a.l.facet.taxonomy. (Shai Erera)
9827
9828* LUCENE-4876: IndexDeletionPolicy is now an abstract class instead of an
9829  interface. IndexDeletionPolicy, MergeScheduler and InfoStream now implement
9830  Cloneable. (Adrien Grand)
9831
9832* LUCENE-4874: FilterAtomicReader and related classes (FilterTerms,
9833  FilterDocsEnum, ...) don't forward anymore to the filtered instance when the
9834  method has a default implementation through other abstract methods.
9835  (Adrien Grand, Robert Muir)
9836
9837* LUCENE-4642, LUCENE-4877: Implementors of TokenizerFactory, TokenFilterFactory,
9838  and CharFilterFactory now need to provide at least one constructor taking
9839  Map<String,String> to be able to be loaded by the SPI framework (e.g., from Solr).
9840  In addition, TokenizerFactory needs to implement the abstract
9841  create(AttributeFactory,Reader) method.  (Renaud Delbru, Uwe Schindler,
9842  Steve Rowe, Robert Muir)
9843
9844API Changes
9845
9846* LUCENE-4896: Made PassageFormatter abstract in PostingsHighlighter, made
9847  members of DefaultPassageFormatter protected.  (Luca Cavanna via Robert Muir)
9848
* LUCENE-4844: Removed TaxonomyReader.getParent(); use
  TaxonomyReader.getParallelArrays().parents() instead. (Shai Erera)
9851
9852* LUCENE-4742: Renamed spatial 'Node' to 'Cell', along with any method names
9853  and variables using this terminology. (David Smiley)
9854
9855New Features
9856
9857* LUCENE-4815: DrillSideways now allows more than one FacetRequest per
9858  dimension (Mike McCandless)
9859
* LUCENE-3918: IndexSorter has been ported to 4.3 API and now supports
  sorting documents by a numeric DocValues field, or reversing the order of
  the documents in the index. Additionally, apps can implement their own
  sort criteria. (Anat Hashavit, Shai Erera)
9864
* LUCENE-4817: Added KeywordRepeatFilter, which emits each token twice,
  once as a keyword and once as an ordinary token, to allow stemmers to emit
  a stemmed version along with the un-stemmed version. (Simon Willnauer)
9868
9869* LUCENE-4822: PatternKeywordTokenFilter can mark tokens as keywords based
9870  on regular expressions. (Simon Willnauer, Uwe Schindler)
9871
9872* LUCENE-4821: AnalyzingSuggester now uses the ending offset to
9873  determine whether the last token was finished or not, so that a
9874  query "i " will no longer suggest "Isla de Muerta" for example.
9875  (Mike McCandless)
9876
9877* LUCENE-4642: Add create(AttributeFactory) to TokenizerFactory and
9878  subclasses with ctors taking AttributeFactory.
9879  (Renaud Delbru, Uwe Schindler, Steve Rowe)
9880
9881* LUCENE-4820: Add payloads to Analyzing/FuzzySuggester, to record an
9882  arbitrary byte[] per suggestion (Mike McCandless)
9883
9884* LUCENE-4816: Add WholeBreakIterator to PostingsHighlighter
9885  for treating the entire content as a single Passage.  (Robert
9886  Muir, Mike McCandless)
9887
9888* LUCENE-4827: Add additional ctor to PostingsHighlighter PassageScorer
9889  to provide bm25 k1,b,avgdl parameters. (Robert Muir)
9890
9891* LUCENE-4607: Add DocIDSetIterator.cost() and Spans.cost() for optimizing
9892  scoring.  (Simon Willnauer, Robert Muir)
9893
9894* LUCENE-4795: Add SortedSetDocValuesFacetFields and
9895  SortedSetDocValuesAccumulator, to compute topK facet counts from a
9896  field's SortedSetDocValues.  This method only supports flat
9897  (dim/label) facets, is a bit (~25%) slower, has added cost
9898  per-IndexReader-open to compute its ordinal map, but it requires no
9899  taxonomy index and it tie-breaks facet labels in an understandable
9900  (by Unicode sort order) way.  (Robert Muir, Mike McCandless)
9901
9902* LUCENE-4843: Add LimitTokenPositionFilter: don't emit tokens with
9903  positions that exceed the configured limit.  (Steve Rowe)
9904
9905* LUCENE-4832: Add ToParentBlockJoinCollector.getTopGroupsWithAllChildDocs, to retrieve
9906  all children in each group.  (Aleksey Aleev via Mike McCandless)
9907
9908* LUCENE-4846: PostingsHighlighter subclasses can override where the
9909  String values come from (it still defaults to pulling from stored
9910  fields).  (Robert Muir, Mike McCandless)
9911
9912* LUCENE-4853: Add PostingsHighlighter.highlightFields method that
9913  takes int[] docIDs instead of TopDocs.  (Robert Muir, Mike
9914  McCandless)
9915
9916* LUCENE-4856: If there are no matches for a given field, return the
9917  first maxPassages sentences (Robert Muir, Mike McCandless)
9918
9919* LUCENE-4859: IndexReader now exposes Terms statistics: getDocCount,
9920  getSumDocFreq, getSumTotalTermFreq. (Shai Erera)
9921
9922* LUCENE-4862: It is now possible to terminate collection of a single
9923  IndexReader leaf by throwing a CollectionTerminatedException in
9924  Collector.collect. (Adrien Grand, Shai Erera)
9925
9926* LUCENE-4752: New SortingMergePolicy (in lucene/misc) that sorts documents
9927  before merging segments. (Adrien Grand, Shai Erera, David Smiley)
9928
9929* LUCENE-4860: Customize scoring and formatting per-field in
9930  PostingsHighlighter by subclassing and overriding the getFormatter
9931  and/or getScorer methods.  This also changes Passage.getMatchTerms()
9932  to return BytesRef[] instead of Term[].  (Robert Muir, Mike
9933  McCandless)
9934
* LUCENE-4839: Added SorterTemplate.timSort, an O(n log n) stable sort algorithm
  that performs well on partially sorted data. (Adrien Grand)
9937
9938* LUCENE-4644: Added support for the "IsWithin" spatial predicate for
9939  RecursivePrefixTreeStrategy. It's for matching non-point indexed shapes; if
9940  you only have points (1/doc) then "Intersects" is equivalent and faster.
9941  See the javadocs.  (David Smiley)
9942
9943* LUCENE-4861: Make BreakIterator per-field in PostingsHighlighter. This means
9944  you can override getBreakIterator(String field) to use different mechanisms
9945  for e.g. title vs. body fields.  (Mike McCandless, Robert Muir)
9946
9947* LUCENE-4645: Added support for the "Contains" spatial predicate for
9948  RecursivePrefixTreeStrategy.  (David Smiley)
9949
9950* LUCENE-4898: DirectoryReader.openIfChanged now allows opening a reader
9951  on an IndexCommit starting from a near-real-time reader (previously
9952  this would throw IllegalArgumentException).  (Mike McCandless)
9953
9954* LUCENE-4905: Made the maxPassages parameter per-field in PostingsHighlighter.
9955  (Robert Muir)
9956
9957* LUCENE-4897: Added TaxonomyReader.getChildren for traversing a category's
9958  children. (Shai Erera)
9959
9960* LUCENE-4902: Added FilterDirectoryReader to allow easy filtering of a
9961  DirectoryReader's subreaders. (Alan Woodward, Adrien Grand, Uwe Schindler)
9962
* LUCENE-4858: Added EarlyTerminatingSortingCollector to be used in conjunction
  with SortingMergePolicy, which allows early termination of queries on sorted
  indexes when the sort order matches the index order. (Adrien Grand, Shai Erera)
9966
9967* LUCENE-4904: Added descending sort order to NumericDocValuesSorter. (Shai Erera)
9968
9969* LUCENE-3786: Added SearcherTaxonomyManager, to manage access to both
9970  IndexSearcher and DirectoryTaxonomyReader for near-real-time
9971  faceting.  (Shai Erera, Mike McCandless)
9972
9973* LUCENE-4915: DrillSideways now allows drilling down on fields that
9974  are not faceted. (Mike McCandless)
9975
9976* LUCENE-4895: Added support for the "IsDisjointTo" spatial predicate for
9977  RecursivePrefixTreeStrategy.  (David Smiley)
9978
9979* LUCENE-4774: Added FieldComparator that allows sorting parent documents based on
9980  fields on the child / nested document level. (Martijn van Groningen)
9981
9982Optimizations
9983
9984* LUCENE-4839: SorterTemplate.merge can now be overridden in order to replace
9985  the default implementation which merges in-place by a faster implementation
9986  that could require fewer swaps at the expense of some extra memory.
9987  ArrayUtil and CollectionUtil override it so that their mergeSort and timSort
9988  methods are faster but only require up to 1% of extra memory. (Adrien Grand)
9989
9990* LUCENE-4571: Speed up BooleanQuerys with minNrShouldMatch to use
9991  skipping.  (Stefan Pohl via Robert Muir)
9992
9993* LUCENE-4863: StemmerOverrideFilter now uses a FST to represent its overrides
9994  in memory. (Simon Willnauer)
9995
9996* LUCENE-4889: UnicodeUtil.codePointCount implementation replaced with a
9997  non-array-lookup version. (Dawid Weiss)
9998
* LUCENE-4923: Speed up BooleanQuery's processing of in-order disjunctions.
  (Robert Muir)
10001
* LUCENE-4926: Speed up DisjunctionMaxQuery.  (Robert Muir)
10003
10004* LUCENE-4930: Reduce contention in older/buggy JVMs when using
10005  AttributeSource#addAttribute() because java.lang.ref.ReferenceQueue#poll()
10006  is implemented using synchronization.  (Christian Ziech, Karl Wright,
10007  Uwe Schindler)
10008
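As a rough illustration of counting code points without a lookup table (the
topic of LUCENE-4889 above), the sketch below counts the bytes of a UTF-8
sequence that are not continuation bytes. It is not UnicodeUtil's
implementation: the class name is hypothetical, and the input is assumed to be
well-formed UTF-8, so no validation is performed.

```java
final class CodePointCountSketch {
    /**
     * Counts code points in well-formed UTF-8 by counting every byte that is
     * not a continuation byte (continuation bytes have the form 10xxxxxx).
     */
    public static int codePointCount(byte[] utf8, int offset, int length) {
        int count = 0;
        for (int i = offset; i < offset + length; i++) {
            if ((utf8[i] & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }
}
```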
10009Bug Fixes
10010
10011* LUCENE-4868: SumScoreFacetsAggregator used an incorrect index into
10012  the scores array. (Shai Erera)
10013
* LUCENE-4882: FacetsAccumulator did not allow counting the ROOT category (i.e.
  counting dimensions). (Shai Erera)
10016
10017* LUCENE-4876: IndexWriterConfig.clone() now clones its MergeScheduler,
10018  IndexDeletionPolicy and InfoStream in order to make an IndexWriterConfig and
10019  its clone fully independent. (Adrien Grand)
10020
* LUCENE-4893: Facet counts were multiplied by the number of times
  FacetsCollector.getFacetResults() was called. (Shai Erera)
10023
10024* LUCENE-4888: Fixed SloppyPhraseScorer, MultiDocs(AndPositions)Enum and
10025  MultiSpansWrapper which happened to sometimes call DocIdSetIterator.advance
10026  with target<=current (in this case the behavior of advance is undefined).
10027  (Adrien Grand)
10028
10029* LUCENE-4899: FastVectorHighlighter failed with StringIndexOutOfBoundsException
10030  if a single highlight phrase or term was greater than the fragCharSize producing
10031  negative string offsets. (Simon Willnauer)
10032
10033* LUCENE-4877: Throw exception for invalid arguments in analysis factories.
10034  (Steve Rowe, Uwe Schindler, Robert Muir)
10035
10036* LUCENE-4914: SpatialPrefixTree's Node/Cell.reset() forgot to reset the 'leaf'
10037  flag.  It affects SpatialRecursivePrefixTreeStrategy on non-point indexed
10038  shapes, as of Lucene 4.2. (David Smiley)
10039
10040* LUCENE-4913: FacetResultNode.ordinal was always 0 when all children
10041  are returned. (Mike McCandless)
10042
10043* LUCENE-4918: Highlighter closes the given IndexReader if QueryScorer
10044  is used with an external IndexReader. (Simon Willnauer, Sirvan Yahyaei)
10045
10046* LUCENE-4880: Fix MemoryIndex to consume empty terms from the tokenstream consistent
10047  with IndexWriter. Previously it discarded them.  (Timothy Allison via Robert Muir)
10048
10049* LUCENE-4885: FacetsAccumulator did not set the correct value for
10050  FacetResult.numValidDescendants. (Mike McCandless, Shai Erera)
10051
* LUCENE-4925: Fixed IndexSearcher.search when the argument list contains a Sort
  and one of the sort fields is the relevance score. Only IndexSearchers created
  with an ExecutorService are affected. (Adrien Grand)
10055
10056* LUCENE-4738, LUCENE-2727, LUCENE-2812: Simplified
10057  DirectoryReader.indexExists so that it's more robust to transient
10058  IOExceptions (e.g. due to issues like file descriptor exhaustion),
10059  but this will also cause it to err towards returning true for
10060  example if the directory contains a corrupted index or an incomplete
10061  initial commit.  In addition, IndexWriter with OpenMode.CREATE will
10062  now succeed even if the directory contains a corrupted index (Billow
10063  Gao, Robert Muir, Mike McCandless)
10064
10065* LUCENE-4928: Stored fields and term vectors could become super slow in case
10066  of tiny documents (a few bytes). This is especially problematic when switching
10067  codecs since bulk-merge strategies can't be applied and the same chunk of
10068  documents can end up being decompressed thousands of times. A hard limit on
10069  the number of documents per chunk has been added to fix this issue.
10070  (Robert Muir, Adrien Grand)
10071
10072* LUCENE-4934: Fix minor equals/hashcode problems in facet/DrillDownQuery,
10073  BoostingQuery, MoreLikeThisQuery, FuzzyLikeThisQuery, and block join queries.
10074  (Robert Muir, Uwe Schindler)
10075
10076* LUCENE-4504: Fix broken sort comparator in ValueSource.getSortField,
10077  used when sorting by a function query.  (Tom Shally via Robert Muir)
10078
10079* LUCENE-4937: Fix incorrect sorting of float/double values (+/-0, NaN).
10080  (Robert Muir, Uwe Schindler)
10081
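The kind of edge case behind LUCENE-4937 above (+/-0 and NaN) can be
reproduced in plain Java. The sketch below does not claim to show the code
that was fixed; the class and method names are hypothetical. It only contrasts
a comparison built from relational operators, which does not define a total
order, with Double.compare, which does.

```java
final class FloatOrderSketch {
    /**
     * Comparison built from relational operators. It reports -0.0 and +0.0 as
     * equal and reports NaN as "equal" to every value, so it is not a total order.
     */
    public static int naiveCompare(double a, double b) {
        return a < b ? -1 : (a > b ? 1 : 0);
    }

    /** Total order: -0.0 sorts before +0.0, and NaN sorts after every other value. */
    public static int totalOrderCompare(double a, double b) {
        return Double.compare(a, b);
    }
}
```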
10082Documentation
10083
10084* LUCENE-4841: Added example SimpleSortedSetFacetsExample to show how
10085  to use the new SortedSetDocValues backed facet implementation.
10086  (Shai Erera, Mike McCandless)
10087
10088Build
10089
10090* LUCENE-4879: Upgrade randomized testing to version 2.0.9:
10091  Filter stack traces on console output. (Dawid Weiss, Robert Muir)
10092
10093
10094======================= Lucene 4.2.1 =======================
10095
10096Bug Fixes
10097
10098* LUCENE-4713: The SPI components used to load custom codecs or analysis
10099  components were fixed to also scan the Lucene ClassLoader in addition
10100  to the context ClassLoader, so Lucene is always able to find its own
10101  codecs. The special case of a null context ClassLoader is now also
10102  supported.  (Christian Kohlschütter, Uwe Schindler)
10103
10104* LUCENE-4819: seekExact(BytesRef, boolean) did not work correctly with
10105  Sorted[Set]DocValuesTermsEnum.  (Robert Muir)
10106
10107* LUCENE-4826: PostingsHighlighter was not returning the top N best
10108  scoring passages. (Robert Muir, Mike McCandless)
10109
10110* LUCENE-4854: Fix DocTermOrds.getOrdTermsEnum() to not return negative
10111  ord on initial next().  (Robert Muir)
10112
* LUCENE-4836: Fix SimpleRateLimiter#pause to return the actual time spent
  sleeping instead of the wakeup timestamp in nanoseconds. (Simon Willnauer)
10115
10116* LUCENE-4828: BooleanQuery no longer extracts terms from its MUST_NOT
10117  clauses.  (Mike McCandless)
10118
10119* SOLR-4589: Fixed CPU spikes and poor performance in lazy field loading
10120  of multivalued fields. (hossman)
10121
* LUCENE-4870: Fix bug where an entire index might be deleted by the IndexWriter
  due to false detection of whether an index exists in the directory when
  OpenMode.CREATE_OR_APPEND is used. This might also affect applications that set
  the open mode manually using DirectoryReader#indexExists. (Simon Willnauer)
10126
10127* LUCENE-4878: Override getRegexpQuery in MultiFieldQueryParser to prevent
10128  NullPointerException when regular expression syntax is used with
10129  MultiFieldQueryParser. (Simon Willnauer, Adam Rauch)
10130
10131Optimizations
10132
10133* LUCENE-4819: Added Sorted[Set]DocValues.termsEnum(), and optimized the
10134  default codec for improved enumeration performance.  (Robert Muir)
10135
10136* LUCENE-4854: Speed up TermsEnum of FieldCache.getDocTermOrds.
10137  (Robert Muir)
10138
10139* LUCENE-4857: Don't unnecessarily copy stem override map in
10140  StemmerOverrideFilter. (Simon Willnauer)
10141
10142======================= Lucene 4.2.0 =======================
10143
10144Changes in backwards compatibility policy
10145
* LUCENE-4602: FacetFields now stores facet ordinals in a DocValues field,
  rather than a payload. This forces rebuilding existing indexes or doing a
  one-time migration using FacetsPayloadMigratingReader. Since DocValues
  support in-memory caching, CategoryListCache was removed too.
  (Shai Erera, Michael McCandless)
10151
10152* LUCENE-4697: FacetResultNode is now a concrete class with public members
10153  (instead of getter methods). (Shai Erera)
10154
10155* LUCENE-4600: FacetsCollector is now an abstract class with two
10156  implementations: StandardFacetsCollector (the old version of
10157  FacetsCollector) and CountingFacetsCollector. FacetsCollector.create()
10158  returns the most optimized collector for the given parameters.
10159  (Shai Erera, Michael McCandless)
10160
* LUCENE-4700: OrdinalPolicy is now per CategoryListParams, and is no longer
  an interface, but rather an enum with values NO_PARENTS and ALL_PARENTS.
  PathPolicy was removed; you should extend FacetFields and DrillDownStream
  to control which categories are added as drill-down terms. (Shai Erera)
10165
10166* LUCENE-4547: DocValues improvements:
10167  - Simplified codec API: codecs are now only responsible for encoding and
10168    decoding docvalues, they do not need to do buffering or RAM accounting.
10169  - Per-Field support: added PerFieldDocValuesFormat, which allows you to
10170    use a different DocValuesFormat per field (like postings).
10171  - Unified with FieldCache api: DocValues can be accessed via FieldCache API,
10172    so it works automatically with grouping/join/sort/function queries, etc.
10173  - Simplified types: There are only 3 types (NUMERIC, BINARY, SORTED), so it's
10174    not necessary to specify for example that all of your binary values have
10175    the same length. Instead it's easy for the Codec API to optimize encoding
10176    based on any properties of the content.
10177  (Simon Willnauer, Adrien Grand, Mike McCandless, Robert Muir)
10178
10179* LUCENE-4757: Cleanup and refactoring of FacetsAccumulator, FacetRequest,
10180  FacetsAggregator and FacetResultsHandler API. If your application did
10181  FacetsCollector.create(), you should not be affected, but if you wrote
10182  an Aggregator, then you should migrate it to the per-segment
10183  FacetsAggregator. You can still use StandardFacetsAccumulator, which works
10184  with the old API (for now). (Shai Erera)
10185
10186* LUCENE-4761: Facet packages reorganized. Should be easy to fix your import
10187  statements, if you use an IDE such as Eclipse. (Shai Erera)
10188
10189* LUCENE-4750: Convert DrillDown to DrillDownQuery, so you can initialize it
10190  and add drill-down categories to it. (Michael McCandless, Shai Erera)
10191
10192* LUCENE-4759: remove FacetRequest.SortBy; result categories are always
10193  sorted by value, while ties are broken by category ordinal. (Shai Erera)
10194
10195* LUCENE-4772: Facet associations moved to new FacetsAggregator API. You
10196  should override FacetsAccumulator and return the relevant aggregator,
10197  for aggregating the association values. (Shai Erera)
10198
10199* LUCENE-4748: A FacetRequest on a non-existent field now returns an
10200  empty FacetResult instead of skipping it.  (Shai Erera, Mike McCandless)
10201
10202* LUCENE-4806: The default category delimiter character was changed
10203  from U+F749 to U+001F, since the latter uses 1 byte vs 3 bytes for
10204  the former.  Existing facet indices must be reindexed.  (Robert
10205  Muir, Shai Erera, Mike McCandless)
10206
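The byte counts quoted under LUCENE-4806 above follow from UTF-8 encoding
rules and can be checked with a small sketch. The class and method names are
hypothetical, and the sketch assumes the delimiter ends up UTF-8 encoded, as
the "1 byte vs 3 bytes" wording of the entry implies.

```java
import java.nio.charset.StandardCharsets;

final class DelimiterSizeSketch {
    /** Number of bytes a single (non-surrogate) char occupies when encoded as UTF-8. */
    public static int utf8Length(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_8).length;
    }
}
```

With this helper, utf8Length('\uF749') is 3 and utf8Length('\u001F') is 1, so
every delimiter in a category path saves two bytes.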
10207Optimizations
10208
10209* LUCENE-4687: BloomFilterPostingsFormat now lazily initializes delegate
10210  TermsEnum only if needed to do a seek or get a DocsEnum. (Simon Willnauer)
10211
10212* LUCENE-4677, LUCENE-4682: unpacked FSTs now use vInt to encode the node target,
10213  to reduce their size (Mike McCandless)
10214
10215* LUCENE-4678: FST now uses a paged byte[] structure instead of a
10216  single byte[] internally, to avoid large memory spikes during
10217  building (James Dyer, Mike McCandless)
10218
10219* LUCENE-3298: FST can now be larger than 2.1 GB / 2.1 B nodes.
10220  (James Dyer, Mike McCandless)
10221
10222* LUCENE-4690: Performance improvements and non-hashing versions
10223  of NumericUtils.*ToPrefixCoded() (yonik)
10224
10225* LUCENE-4715: CategoryListParams.getOrdinalPolicy now allows returning a
10226  different OrdinalPolicy per dimension, to better tune how you index
10227  facets. Also added OrdinalPolicy.ALL_BUT_DIMENSION.
10228  (Shai Erera, Michael McCandless)
10229
10230* LUCENE-4740: Don't track clones of MMapIndexInput if unmapping
10231  is disabled. This reduces GC overhead. (Kristofer Karlsson, Uwe Schindler)
10232
10233* LUCENE-4733: The default Lucene 4.2 codec now uses a more compact
10234  TermVectorsFormat (Lucene42TermVectorsFormat) based on
10235  CompressingTermVectorsFormat. (Adrien Grand)
10236
10237* LUCENE-3729: The default Lucene 4.2 codec now uses a more compact
10238  DocValuesFormat (Lucene42DocValuesFormat). Sorted values are stored in an
10239  FST, Numerics and Ordinals use a number of strategies (delta-compression,
10240  table-compression, etc), and memory addresses use MonotonicBlockPackedWriter.
10241  (Simon Willnauer, Adrien Grand, Mike McCandless, Robert Muir)
10242
10243* LUCENE-4792: Reduction of the memory required to build the doc ID maps used
10244  when merging segments. (Adrien Grand)
10245
10246* LUCENE-4794: Spatial RecursivePrefixTreeStrategy's search filter: Skip calls
10247  to termsEnum.seek() when the next term is known to follow the current cell.
10248  (David Smiley)
10249
10250New Features
10251
10252* LUCENE-4686: New specialized DGapVInt8IntEncoder for facets (now the
10253  default). (Shai Erera)
10254
10255* LUCENE-4703: Add simple PrintTaxonomyStats tool to see summary
10256  information about the facets taxonomy index.  (Mike McCandless)
10257
10258* LUCENE-4599: New oal.codecs.compressing.CompressingTermVectorsFormat which
10259  compresses term vectors into chunks of documents similarly to
10260  CompressingStoredFieldsFormat. (Adrien Grand)
10261
10262* LUCENE-4695: Added LiveFieldValues utility class, for getting the
10263  current (live, real-time) value for any indexed doc/field.  The
10264  class buffers recently indexed doc/field values until a new
10265  near-real-time reader is opened that contains those changes.
10266  (Robert Muir, Mike McCandless)
10267
10268* LUCENE-4723: Add AnalyzerFactoryTask to benchmark, and enable analyzer
10269  creation via the resulting factories using NewAnalyzerTask.  (Steve Rowe)
10270
10271* LUCENE-4728: Unknown and not explicitly mapped queries are now rewritten
10272  against the highlighting IndexReader to obtain primitive queries before
10273  discarding the query entirely. WeightedSpanTermExtractor now builds a
10274  MemoryIndex only once even if multiple fields are highlighted.
10275  (Simon Willnauer)
10276
10277* LUCENE-4035: Added ICUCollationDocValuesField, more efficient
10278  support for Locale-sensitive sort and range queries for
10279  single-valued fields.  (Robert Muir)
10280
10281* LUCENE-4547: Added MonotonicBlockPacked(Reader/Writer), which provide
10282  efficient random access to large amounts of monotonically increasing
10283  positive values (e.g. file offsets). Each block stores the minimum value
10284  and the average gap, and values are encoded as signed deviations from
10285  the expected value.  (Adrien Grand)
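
  Illustration for LUCENE-4547: a simplified, in-memory sketch of the
  encoding idea described above (minimum value, average gap, signed
  deviations). This is not the MonotonicBlockPackedWriter implementation;
  the class name, the float average, and the unpacked long[] storage are
  assumptions made for readability.

```java
public class MonotonicBlockSketch {
    private final long min;          // first (smallest) value of the block
    private final float avgGap;      // average gap between consecutive values
    private final long[] deviations; // signed difference from the expected value

    public MonotonicBlockSketch(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("a block needs at least one value");
        }
        min = values[0];
        avgGap = values.length == 1
            ? 0f
            : (float) (values[values.length - 1] - min) / (values.length - 1);
        deviations = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            deviations[i] = values[i] - expected(i);
        }
    }

    // Value predicted for index i from the minimum and the average gap alone.
    private long expected(int i) {
        return min + (long) (avgGap * i);
    }

    // Random access: prediction plus the stored signed deviation.
    public long get(int i) {
        return expected(i) + deviations[i];
    }

    // Largest absolute deviation; a real encoder would size its packed bits from this.
    public long maxAbsDeviation() {
        long max = 0;
        for (long d : deviations) {
            max = Math.max(max, Math.abs(d));
        }
        return max;
    }

    public static void main(String[] args) {
        long[] offsets = {100, 205, 310, 420, 515}; // e.g. file offsets
        MonotonicBlockSketch block = new MonotonicBlockSketch(offsets);
        for (int i = 0; i < offsets.length; i++) {
            System.out.println(i + " -> " + block.get(i));
        }
        System.out.println("max |deviation| = " + block.maxAbsDeviation());
    }
}
```

  Because the deviations stay small when the values grow at a roughly
  constant rate, they need far fewer bits than the raw values would.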
10286
10287* LUCENE-4547: Added AppendingLongBuffer, an append-only buffer that packs
10288  signed long values in memory and provides an efficient iterator API.
10289  (Adrien Grand)
10290
10291* LUCENE-4540: It is now possible for a codec to represent norms with
10292  less than 8 bits per value. For performance reasons this is not done
10293  by default, but you can customize your codec (e.g. pass PackedInts.DEFAULT
10294  to Lucene42DocValuesConsumer) if you want to make this tradeoff.
10295  (Adrien Grand, Robert Muir)
10296
10297* LUCENE-4764: A new Facet42Codec and Facet42DocValuesFormat provide
10298  faster but more RAM-consuming facet performance.  (Shai Erera, Mike
10299  McCandless)
10300
10301* LUCENE-4769: Added OrdinalsCache and CachedOrdsCountingFacetsAggregator
10302  which uses the cache to obtain a document's ordinals. This aggregator
10303  is faster than others, but consumes much more RAM.
10304  (Michael McCandless, Shai Erera)
10305
10306* LUCENE-4778: Add a getter for the delegate in RateLimitedDirectoryWrapper.
10307  (Mark Miller)
10308
10309* LUCENE-4765: Add a multi-valued docvalues type (SORTED_SET). This is equivalent
10310  to building a FieldCache.getDocTermOrds at index-time.  (Robert Muir)
10311
10312* LUCENE-4780: Add MonotonicAppendingLongBuffer: an append-only buffer for
10313  monotonically increasing values.  (Adrien Grand)
10314
10315* LUCENE-4748: Added DrillSideways utility class for computing both
10316  drill-down and drill-sideways counts for a DrillDownQuery.  (Mike
10317  McCandless)
10318
10319API Changes
10320
10321* LUCENE-4709: FacetResultNode no longer has a residue field. (Shai Erera)
10322
10323* LUCENE-4716: DrillDown.query now takes Occur, allowing you to specify
10324  whether categories should be OR'ed or AND'ed. (Shai Erera)
10325
10326* LUCENE-4695: ReferenceManager.RefreshListener.afterRefresh now takes
10327  a boolean indicating whether a new reference was in fact opened, and
10328  a new beforeRefresh method notifies you when a refresh attempt is
10329  starting.  (Robert Muir, Mike McCandless)
10330
10331* LUCENE-4794: Spatial RecursivePrefixTreeFilter replaced by
10332  IntersectsPrefixTreeFilter and some extensible base classes. (David Smiley)
10333
10334Bug Fixes
10335
10336* LUCENE-4705: Pass on FilterStrategy in FilteredQuery if the filtered query is
10337  rewritten. (Simon Willnauer)
10338
10339* LUCENE-4712: MemoryIndex#normValues() throws NPE if field doesn't exist.
10340  (Simon Willnauer, Ricky Pritchett)
10341
10342* LUCENE-4550: Shapes wider than 180 degrees would use too much accuracy for the
10343  PrefixTree based SpatialStrategy. For a pathological case of nearly 360
10344  degrees and barely any height, it would generate so many indexed terms
10345  (> 500k) that it could even cause an OutOfMemoryError. Fixed. (David Smiley)
10346
10347* LUCENE-4704: Make join queries override hashcode and equals methods.
10348  (Martijn van Groningen)
10349
10350* LUCENE-4724: Fix bug in CategoryPath which allowed passing null or empty
10351  string components. This is forbidden now (throws an exception). Note that if
10352  you have a taxonomy index created with such strings, you should rebuild it.
10353  (Michael McCandless, Shai Erera)
10354
10355* LUCENE-4732: Fixed TermsEnum.seekCeil/seekExact on term vectors.
10356  (Adrien Grand, Robert Muir)
10357
10358* LUCENE-4739: Fixed bugs that prevented FSTs more than ~1.1GB from
10359  being saved and loaded (Adrien Grand, Mike McCandless)
10360
10361* LUCENE-4717: Fixed bug where Lucene40DocValuesFormat would sometimes write
10362  an extra unused ordinal for sorted types. The bug is detected and corrected
10363  on-the-fly for old indexes.  (Robert Muir)
10364
10365* LUCENE-4547: Fixed bug where Lucene40DocValuesFormat was unable to encode
10366  segments that would exceed 2GB total data. This could happen in some surprising
10367  cases, for example if you had an index with more than 260M documents and a
10368  VAR_INT field.  (Simon Willnauer, Adrien Grand, Mike McCandless, Robert Muir)
10369
10370* LUCENE-4775: Remove SegmentInfo.sizeInBytes() and make
10371  MergePolicy.OneMerge.totalBytesSize thread safe (Josh Bronson via
10372  Robert Muir, Mike McCandless)
10373
10374* LUCENE-4770: If spatial's TermQueryPrefixTreeStrategy was used to search
10375  indexed non-point shapes, then there was an edge case where a query should
10376  find a shape but it didn't. The fix is the removal of an optimization that
10377  simplifies some leaf cells into a parent. The index data for such a field is
10378  now ~20% larger. This optimization is still done for the query shape, and for
10379  indexed data for RecursivePrefixTreeStrategy. Furthermore, this optimization
10380  is enhanced to roll up beyond the bottom cell level. (David Smiley,
10381  Florian Schilling)
10382
10383* LUCENE-4790: Fix FieldCacheImpl.getDocTermOrds to not bake deletes into the
10384  cached datastructure. Otherwise this can cause inconsistencies with readers
10385  at different points in time.  (Robert Muir)
10386
10387* LUCENE-4791: A conjunction of terms (ConjunctionTermScorer) scanned on
10388  the lowest frequency term instead of skipping, leading to potentially
10389  large performance impacts for many non-random or non-uniform
10390  term distributions.  (John Wang, yonik)
10391
10392* LUCENE-4798: PostingsHighlighter's formatter sometimes didn't highlight
10393  matched terms.  (Robert Muir)
10394
10395* LUCENE-4796, SOLR-4373: Fix concurrency issue in NamedSPILoader and
10396  AnalysisSPILoader when doing reload (e.g. from Solr).
10397  (Uwe Schindler, Hossman)
10398
10399* LUCENE-4802: Don't compute norms for drill-down facet fields. (Mike McCandless)
10400
10401* LUCENE-4804: PostingsHighlighter sometimes applied terms to the wrong passage,
10402  if they started exactly on a passage boundary.  (Robert Muir)
10403
10404Documentation
10405
10406* LUCENE-4718: Fixed documentation of oal.queryparser.classic.
10407  (Hayden Muhl via Adrien Grand)
10408
10409* LUCENE-4784, LUCENE-4785, LUCENE-4786: Fixed references to deprecated classes
10410  SinkTokenizer, ValueSourceQuery and RangeQuery. (Hao Zhong via Adrien Grand)
10411
10412Build
10413
10414* LUCENE-4654: Test duration statistics from multiple test runs should be
10415  reused. (Dawid Weiss)
10416
10417* LUCENE-4636: Upgrade ivy to 2.3.0 (Shawn Heisey via Robert Muir)
10418
10419* LUCENE-4570: Use the Policeman Forbidden API checker, released separately
10420  from Lucene and downloaded via Ivy.  (Uwe Schindler, Robert Muir)
10421
10422* LUCENE-4758: 'ant jar', 'ant compile', and 'ant compile-test' should
10423  recurse.  (Steve Rowe)
10424
10425======================= Lucene 4.1.0 =======================
10426
10427Changes in backwards compatibility policy
10428
10429* LUCENE-4514: Scorer's freq() method returns an integer value indicating
10430  the number of times the scorer matches the current document. Previously
10431  this was only sometimes the case; in some cases it returned a (meaningless)
10432  floating point value.  Scorer now extends DocsEnum so it has attributes().
10433  (Robert Muir)
10434
10435* LUCENE-4543: TFIDFSimilarity's index-time computeNorm is now final to
10436  match the fact that its query-time norm usage requires a FIXED_8 encoding.
10437  Override lengthNorm and/or encode/decodeNormValue to change the specifics,
10438  like Lucene 3.x. (Robert Muir)
10439
10440* LUCENE-3441: The facet module now supports NRT. As a result, the following
10441  changes were made:
10442  - DirectoryTaxonomyReader has a new constructor which takes a
10443    DirectoryTaxonomyWriter. You should use that constructor in order to get
10444    the NRT support (or the old one for non-NRT).
10445  - TaxonomyReader.refresh() removed in exchange for TaxonomyReader.openIfChanged
10446    static method. Similar to DirectoryReader, the method either returns null
10447    if no changes were made to the taxonomy, or a new TR instance otherwise.
10448    Instead of calling refresh(), you should write similar code to how you reopen
10449    a regular DirectoryReader.
10450  - TaxonomyReader.openIfChanged (previously refresh()) no longer throws
10451    InconsistentTaxonomyException, and supports recreate. InconsistentTaxoEx
10452    was removed.
10453  - ChildrenArrays was pulled out of TaxonomyReader into a top-level class.
10454  - TaxonomyReader was made an abstract class (instead of an interface), with
10455    methods such as close() and reference counting management pulled from
10456    DirectoryTaxonomyReader, and made final. The rest of the methods remained
10457    abstract.
10458  (Shai Erera, Gilad Barkai)
10459
10460* LUCENE-4576: Remove CachingWrapperFilter(Filter, boolean). This recacheDeletes
10461  option gave less than 1% speedup at the expense of cache churn (filters were
10462  invalidated on reopen if even a single delete was posted against the segment).
10463  (Robert Muir)
10464
10465* LUCENE-4575: Replace IndexWriter's commit/prepareCommit versions that take
10466  commitData with setCommitData(). That allows committing changes to IndexWriter
10467  even if the commitData is the only thing that changes.
10468  (Shai Erera, Michael McCandless)
10469
10470* LUCENE-4565: TaxonomyReader.getParentArray and .getChildrenArrays consolidated
10471  into one getParallelTaxonomyArrays(). You can obtain the 3 arrays that the
10472  previous two methods returned by calling parents(), children() or siblings()
10473  on the returned ParallelTaxonomyArrays. (Shai Erera)
10474
10475* LUCENE-4585: Spatial PrefixTree based Strategies (either TermQuery or
10476  RecursivePrefix based) MAY want to re-index if used for point data. If a
10477  re-index is not done, then an indexed point is ~1/2 the smallest grid cell
10478  larger and as such is slightly more likely to match a query shape.
10479  (David Smiley)
10480
10481* LUCENE-4604: DefaultOrdinalPolicy removed in favor of OrdinalPolicy.ALL_PARENTS.
10482  Same for DefaultPathPolicy (now PathPolicy.ALL_CATEGORIES). In addition, you
10483  can use OrdinalPolicy.NO_PARENTS to never write any parent category ordinal
10484  to the fulltree posting payload (but note that you need a special
10485  FacetsAccumulator - see javadocs). (Shai Erera)
10486
10487* LUCENE-4594: Spatial PrefixTreeStrategy no longer indexes center points of
10488  non-point shapes.  If you want to call makeDistanceValueSource() based on
10489  shape centers, you need to do this yourself in another spatial field.
10490  (David Smiley)
10491
10492* LUCENE-4615: Replace IntArrayAllocator and FloatArrayAllocator by ArraysPool.
10493  FacetArrays no longer takes those allocators; if you need to reuse the arrays,
10494  you should use ReusingFacetArrays. (Shai Erera, Gilad Barkai)
10495
10496* LUCENE-4621: FacetIndexingParams is now a concrete class (instead of DefaultFIP).
10497  Also, the entire IndexingParams chain is now immutable. If you need to override
10498  a setting, you should extend the relevant class.
10499  Additionally, FacetSearchParams is now immutable, and requires all FacetRequests
10500  to be specified at initialization time. (Shai Erera)
10501
10502* LUCENE-4647: CategoryDocumentBuilder and EnhancementsDocumentBuilder are replaced
10503  by FacetFields and AssociationsFacetFields respectively. CategoryEnhancement and
10504  AssociationEnhancement were removed in favor of a simplified CategoryAssociation
10505  interface, with CategoryIntAssociation and CategoryFloatAssociation
10506  implementations.
10507  NOTE: indexes that contain category enhancements/associations are not supported
10508  by the new code and should be recreated. (Shai Erera)
10509
10510* LUCENE-4659: Massive cleanup to CategoryPath API. Additionally, CategoryPath is
10511  now immutable, so you don't need to clone() it. (Shai Erera)
10512
10513* LUCENE-4670: StoredFieldsWriter and TermVectorsWriter have new finish* callbacks
10514  which are called after a doc/field/term has been completely added.
10515  (Adrien Grand, Robert Muir)
10516
10517* LUCENE-4620: IntEncoder/Decoder were changed to do bulk encoding/decoding. As a
10518  result, a few other classes such as Aggregator and CategoryListIterator were
10519  changed to handle bulk category ordinals. (Shai Erera)
10520
10521* LUCENE-4683: CategoryListIterator and Aggregator are now per-segment. As such
10522  their implementations no longer take a top-level IndexReader in the constructor
10523  but rather implement a setNextReader. (Shai Erera)
10524
10525New Features
10526
10527* LUCENE-4226: New experimental StoredFieldsFormat that compresses chunks of
10528  documents together in order to improve the compression ratio. (Adrien Grand)
10529
10530* LUCENE-4426: New ValueSource implementations (in lucene/queries) for
10531  DocValues fields. (Adrien Grand)
10532
10533* LUCENE-4410: FilteredQuery now exposes a FilterStrategy that exposes
10534  how filters are applied during query execution. (Simon Willnauer)
10535
10536* LUCENE-4404: New ListOfOutputs (in lucene/misc) for FSTs wraps
10537  another Outputs implementation, allowing you to store more than one
10538  output for a single input.  UpToTwoPositiveIntsOutputs was moved
10539  from lucene/core to lucene/misc.  (Mike McCandless)
10540
10541* LUCENE-3842: New AnalyzingSuggester, for doing auto-suggest
10542  using an analyzer.  This can create powerful suggesters: if the analyzer
10543  removes stop words then "ghost chr..." could suggest "The Ghost of
10544  Christmas Past"; if SynonymFilter is used to map wifi and wireless
10545  network to hotspot, then "wirele..." could suggest "wifi router";
10546  token normalization like stemmers, accent removal, etc. would allow
10547  the suggester to ignore such variations. (Robert Muir, Sudarshan
10548  Gaikaiwari, Mike McCandless)
10549
10550* LUCENE-4446: Lucene 4.1 has a new default index format (Lucene41Codec)
10551  that incorporates the previously experimental "Block" postings format
10552  for better search performance.
10553  (Han Jiang, Adrien Grand, Robert Muir, Mike McCandless)
10554
10555* LUCENE-3846: New FuzzySuggester, like AnalyzingSuggester except it
10556  also finds completions allowing for fuzzy edits in the input string.
10557  (Robert Muir, Simon Willnauer, Mike McCandless)
10558
10559* LUCENE-4515: MemoryIndex now supports adding the same field multiple
10560  times. (Simon Willnauer)
10561
10562* LUCENE-4489: Added consumeAllTokens option to LimitTokenCountFilter
10563  (hossman, Robert Muir)
10564
10565* LUCENE-4566: Add NRT/SearcherManager.RefreshListener/addListener to
10566  be notified whenever a new searcher was opened. (selckin via Shai
10567  Erera, Mike McCandless)
10568
10569* SOLR-4123: Add per-script customizability to ICUTokenizerFactory via
10570  rule files in the ICU RuleBasedBreakIterator format.
10571  (Shawn Heisey, Robert Muir, Steve Rowe)
10572
10573* LUCENE-4590: Added WriteEnwikiLineDocTask - a benchmark task for writing
10574  Wikipedia category pages and non-category pages into separate line files.
10575  extractWikipedia.alg was changed to use this task, so now it creates two
10576  files. (Doron Cohen)
10577
10578* LUCENE-4290: Added PostingsHighlighter to the highlighter module. It uses
10579  offsets from the postings lists to highlight documents. (Robert Muir)
10580
10581* LUCENE-4628: Added CommonTermsQuery that executes high-frequency terms
10582  in an optional sub-query to prevent slow queries due to "common" terms
10583  like stopwords. (Simon Willnauer)
10584
10585API Changes
10586
10587* LUCENE-4399: Deprecated AppendingCodec. Lucene's term dictionaries
10588  no longer seek when writing.  (Adrien Grand, Robert Muir)
10589
10590* LUCENE-4479: Rename TokenSources.getTokenStream(IndexReader, int, String)
10591  to TokenSources.getTokenStreamWithOffsets, and return null on failure
10592  rather than throwing IllegalArgumentException.  (Alan Woodward)
10593
10594* LUCENE-4472: MergePolicy now accepts a MergeTrigger that provides
10595  information about the trigger of the merge, i.e. a merge triggered due
10596  to a segment merge or a full flush etc. (Simon Willnauer)
10597
10598* LUCENE-4415: TermsFilter is now immutable. All terms need to be provided
10599  as constructor arguments. (Simon Willnauer)
10600
10601* LUCENE-4520: ValueSource.getSortField no longer throws IOExceptions
10602  (Alan Woodward)
10603
10604* LUCENE-4537: RateLimiter is now separated from FSDirectory and exposed via
10605  RateLimitedDirectoryWrapper. Any Directory can now be rate-limited.
10606  (Simon Willnauer)
10607
10608* LUCENE-4591: CompressingStoredFields{Writer,Reader} now accept a segment
10609  suffix as a constructor parameter. (Renaud Delbru via Adrien Grand)
10610
10611* LUCENE-4605: Added DocsEnum.FLAG_NONE which can be passed instead of 0 as
10612  the flag to .docs() and .docsAndPositions(). (Shai Erera)
10613
10614* LUCENE-4617: Remove FST.pack() method. Previously to make a packed FST,
10615  you had to make a Builder with willPackFST=true (telling it you will later pack it),
10616  create your fst with finish(), and then call pack() to get another FST.
10617  Instead just pass true for doPackFST to Builder and finish() returns a packed FST.
10618  (Robert Muir)
10619
10620* LUCENE-4663: Deprecate IndexSearcher.document(int, Set). This was not intended
10621  to be final, nor named document(). Use IndexSearcher.doc(int, Set) instead.
10622  (Robert Muir)
10623
10624* LUCENE-4684: Made DirectSpellChecker extendable.
10625  (Martijn van Groningen)
10626
10627Bug Fixes
10628
10629* LUCENE-1822: BaseFragListBuilder hard-coded 6 char margin is too naive.
10630  (Alex Vigdor, Arcadius Ahouansou, Koji Sekiguchi)
10631
10632* LUCENE-4468: Fix rareish integer overflows in Lucene41 postings
10633  format. (Robert Muir)
10634
10635* LUCENE-4486: Add support for ConstantScoreQuery in Highlighter.
10636  (Simon Willnauer)
10637
10638* LUCENE-4485: When CheckIndex reports terms, terms/docs pairs and tokens,
10639  these counts now all exclude deleted documents.  (Mike McCandless)
10640
10641* LUCENE-4479: Highlighter works correctly for fields with term vector
10642  positions, but no offsets.  (Alan Woodward)
10643
10644* SOLR-3906: JapaneseReadingFormFilter in romaji mode will return
10645  romaji even for out-of-vocabulary kana cases (e.g. half-width forms).
10646  (Robert Muir)
10647
10648* LUCENE-4511: TermsFilter might return wrong results if a field is not
10649  indexed or doesn't exist in the index. (Simon Willnauer)
10650
10651* LUCENE-4521: IndexWriter.tryDeleteDocument could return true
10652  (successfully deleting the document) but then on IndexWriter
10653  close/commit fail to write the new deletions, if no other changes
10654  happened in the IndexWriter instance.  (Ivan Vasilev via Mike
10655  McCandless)
10656
10657* LUCENE-4513: Fixed that deleted nested docs are scored into the
10658  parent doc when using ToParentBlockJoinQuery. (Martijn van Groningen)
10659
10660* LUCENE-4534: Fixed WFSTCompletionLookup and Analyzing/FuzzySuggester
10661  to allow 0 byte values in the lookup keys.  (Mike McCandless)
10662
10663* LUCENE-4532: DirectoryTaxonomyWriter used a timestamp to denote taxonomy
10664  index re-creation, which could cause a bug in case machine clocks were
10665  not synced. Instead, it now tracks an 'epoch' version, which is incremented
10666  whenever the taxonomy is re-created, or replaced. (Shai Erera)
10667
10668* LUCENE-4544: Fixed off-by-1 in ConcurrentMergeScheduler that would
10669  allow 1+maxMergeCount merge threads to be created, instead of just
10670  maxMergeCount (Radim Kolar, Mike McCandless)
10671
10672* LUCENE-4567: Fixed NullPointerException in analyzing, fuzzy, and
10673  WFST suggesters when no suggestions were added (selckin via Mike
10674  McCandless)
10675
10676* LUCENE-4568: Fixed integer overflow in
10677  PagedBytes.PagedBytesData{In,Out}put.getPosition. (Adrien Grand)
10678
10679* LUCENE-4581: GroupingSearch.setAllGroups(true) was failing to
10680  actually compute allMatchingGroups (dizh@neusoft.com via Mike
10681  McCandless)
10682
10683* LUCENE-4009: Improve TermsFilter.toString (Tim Costermans via Chris
10684  Male, Mike McCandless)
10685
10686* LUCENE-4588: Benchmark's EnwikiContentSource was discarding last wiki
10687  document and had leaking threads in 'forever' mode. (Doron Cohen)
10688
10689* LUCENE-4585: Spatial RecursivePrefixTreeFilter had some bugs that only
10690  occurred when shapes were indexed.  In what appear to be rare circumstances,
10691  documents with shapes near a query shape were erroneously considered a match.
10692  In addition, it wasn't possible to index a shape representing the entire
10693  globe.
10694
10695* LUCENE-4595: EnwikiContentSource had a thread safety problem (NPE) in
10696  'forever' mode (Doron Cohen)
10697
10698* LUCENE-4587: fix WordBreakSpellChecker to not throw AIOOBE when presented
10699  with 2-char codepoints, and to correctly break/combine terms containing
10700  non-latin characters. (James Dyer, Andreas Hubold)
10701
10702* LUCENE-4596: fix a concurrency bug in DirectoryTaxonomyWriter.
10703  (Shai Erera)
10704
10705* LUCENE-4594: Spatial PrefixTreeStrategy would index center-points in addition
10706  to the shape to index if it was non-point, in the same field.  But sometimes
10707  the center-point isn't actually in the shape (consider a LineString), and for
10708  highly precise shapes it could cause makeDistanceValueSource's cache to load
10709  parts of the shape's boundary erroneously too.  So center points aren't
10710  indexed any more; you should use another spatial field. (David Smiley)
10711
10712* LUCENE-4629: IndexWriter failed to delete documents if a document block was
10713  indexed and the Iterator throws an exception. Documents were only rolled back
10714  if the actual indexing process failed. (Simon Willnauer)
10715
10716* LUCENE-4608: Handle large number of requested fragments better.
10717  (Martijn van Groningen)
10718
10719* LUCENE-4633: DirectoryTaxonomyWriter.replaceTaxonomy did not refresh its
10720  internal reader, which could cause an existing category to be added twice.
10721  (Shai Erera)
10722
10723* LUCENE-4461: If you added the same FacetRequest more than once, you would get
10724  inconsistent results. (Gilad Barkai via Shai Erera)
10725
10726* LUCENE-4656: Fix regression in IndexWriter to work with empty TokenStreams
10727  that have no TermToBytesRefAttribute (commonly provided by CharTermAttribute),
10728  e.g., oal.analysis.miscellaneous.EmptyTokenStream.
10729  (Uwe Schindler, Adrien Grand, Robert Muir)
10730
10731* LUCENE-4660: ConcurrentMergeScheduler was taking too long to
10732  un-pause incoming threads it had paused when too many merges were
10733  queued up. (Mike McCandless)
10734
10735* LUCENE-4662: Add missing elided articles and prepositions to FrenchAnalyzer's
10736  DEFAULT_ARTICLES list passed to ElisionFilter.  (David Leunen via Steve Rowe)
10737
10738* LUCENE-4671: Fix CharsRef.subSequence method.  (Tim Smith via Robert Muir)
10739
10740* LUCENE-4465: Let ConstantScoreQuery's Scorer return its child scorer.
10741  (selckin via Uwe Schindler)
10742
10743Changes in Runtime Behavior
10744
10745* LUCENE-4586: Change default ResultMode of FacetRequest to PER_NODE_IN_TREE.
10746  This only affects requests with depth>1. If you execute such requests and
10747  rely on the facet results being returned flat (i.e. no hierarchy), you should
10748  set the ResultMode to GLOBAL_FLAT. (Shai Erera, Gilad Barkai)
10749
10750* LUCENE-1822: Improves the text window selection by recalculating the starting margin
10751  once all phrases in the fragment have been identified in FastVectorHighlighter. This
10752  way if a single word is matched in a fragment, it will appear in the middle of the highlight,
10753  instead of 6 characters from the beginning. This way one can also guarantee that
10754  the entirety of short texts is represented in a fragment by specifying a large
10755  enough fragCharSize.
10756
10757Optimizations
10758
10759* LUCENE-2221: oal.util.BitUtil was modified to use Long.bitCount and
10760  Long.numberOfTrailingZeros (which are intrinsics since Java 6u18) instead of
10761  pure Java bit twiddling routines in order to improve performance on modern
10762  JVMs/hardware. (Dawid Weiss, Adrien Grand)
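
  Illustration for LUCENE-2221: a generic hand-written population count
  next to the JDK methods named above. The loop below is a textbook
  routine chosen for clarity, not the code that was removed from BitUtil;
  the class and method names are hypothetical.

```java
public class PopCountSketch {
    // Textbook "clear the lowest set bit" loop: one iteration per set bit.
    public static int popCountLoop(long x) {
        int count = 0;
        while (x != 0) {
            x &= x - 1; // clears the lowest set bit
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long word = 0xF0F0L;
        // Both calls compute the number of set bits; the JDK method can be
        // compiled to a single instruction on hardware that supports it.
        System.out.println(popCountLoop(word) == Long.bitCount(word));
        // Index of the lowest set bit, counted from bit 0.
        System.out.println(Long.numberOfTrailingZeros(word));
    }
}
```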
10763
10764* LUCENE-4509: Enable stored fields compression by default in the Lucene 4.1
10765  default codec. (Adrien Grand)
10766
10767* LUCENE-4536: PackedInts on-disk format is now byte-aligned (it used to be
10768  long-aligned), saving up to 7 bytes per array of values.
10769  (Adrien Grand, Mike McCandless)
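
  Illustration for LUCENE-4536: the arithmetic behind "up to 7 bytes per
  array", as a minimal sketch. It counts only the packed payload and
  ignores any header the real format writes; the class and method names
  are hypothetical.

```java
public class PackedAlignmentSketch {
    // Payload size when the packed bits are rounded up to whole bytes.
    public static long byteAlignedSize(long valueCount, int bitsPerValue) {
        long bits = valueCount * bitsPerValue;
        return (bits + 7) / 8;
    }

    // Payload size when the packed bits are rounded up to whole 64-bit longs.
    public static long longAlignedSize(long valueCount, int bitsPerValue) {
        long bits = valueCount * bitsPerValue;
        return ((bits + 63) / 64) * 8;
    }

    public static void main(String[] args) {
        // Three 7-bit values need 21 bits of payload.
        System.out.println(byteAlignedSize(3, 7) + " bytes vs " + longAlignedSize(3, 7) + " bytes");
        // Worst case for long alignment: a single 1-bit value.
        System.out.println(longAlignedSize(1, 1) - byteAlignedSize(1, 1) + " bytes saved");
    }
}
```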
10770
10771* LUCENE-4512: Additional memory savings for CompressingStoredFieldsFormat.
10772  (Adrien Grand, Robert Muir)
10773
10774* LUCENE-4443: Lucene41PostingsFormat no longer writes unnecessary offsets
10775  into the skipdata. (Robert Muir)
10776
10777* LUCENE-4459: Improve WeakIdentityMap.keyIterator() to remove GCed keys
10778  from backing map early instead of waiting for reap(). This makes test
10779  failures in TestWeakIdentityMap disappear, too.
10780  (Uwe Schindler, Mike McCandless, Robert Muir)
10781
10782* LUCENE-4473: Lucene41PostingsFormat encodes offsets more efficiently
10783  for low frequency terms (< 128 occurrences).  (Robert Muir)
10784
10785* LUCENE-4462: DocumentsWriter now flushes deletes, segment infos and builds
10786  CFS files if necessary during segment flush and not during publishing. The latter
10787  was a single threaded process while now all IO and CPU heavy computation is done
10788  concurrently in DocumentsWriterPerThread. (Simon Willnauer)
10789
10790* LUCENE-4496: Optimize Lucene41PostingsFormat when requesting a subset of
10791  the postings data (via flags to TermsEnum.docs/docsAndPositions) to use
10792  ForUtil.skipBlock.  (Robert Muir)
10793
10794* LUCENE-4497: Don't write PosVIntCount to the positions file in
10795  Lucene41PostingsFormat, as it's always totalTermFreq % BLOCK_SIZE. (Robert Muir)
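
  Illustration for LUCENE-4497: why the count is redundant. Positions are
  written in full blocks, and whatever does not fill a block is the
  remainder. The block size of 128 is an assumption taken from the
  LUCENE-3892 entry; the class and method names are hypothetical.

```java
public class PositionTailSketch {
    // Assumed block size (the LUCENE-3892 entry mentions "size 128" blocks).
    static final int BLOCK_SIZE = 128;

    // Number of positions that fill complete packed blocks.
    public static long positionsInFullBlocks(long totalTermFreq) {
        return (totalTermFreq / BLOCK_SIZE) * BLOCK_SIZE;
    }

    // Number of leftover positions; derivable, so it need not be stored.
    public static int tailCount(long totalTermFreq) {
        return (int) (totalTermFreq % BLOCK_SIZE);
    }

    public static void main(String[] args) {
        long totalTermFreq = 300;
        System.out.println(positionsInFullBlocks(totalTermFreq) + " in full blocks, "
            + tailCount(totalTermFreq) + " in the tail");
    }
}
```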
10796
10797* LUCENE-4498: In Lucene41PostingsFormat, when a term appears in only one document,
10798  instead of writing a file pointer to a VIntBlock containing the doc id, just
10799  write the doc id.  (Mike McCandless, Robert Muir)
10800
10801* LUCENE-4515: MemoryIndex now uses Byte/IntBlockPool internally to hold terms and
10802  posting lists. All index data is represented as consecutive byte/int arrays to
10803  reduce GC cost and memory overhead. (Simon Willnauer)
10804
10805* LUCENE-4538: DocValues now caches direct sources in a ThreadLocal exposed via SourceCache.
10806  Users of this API can now simply obtain an instance via DocValues#getDirectSource per thread.
10807  (Simon Willnauer)
10808
10809* LUCENE-4580: DrillDown.query variants return a ConstantScoreQuery with boost set to 0.0f
10810  so that document scores are not affected by running a drill-down query. (Shai Erera)
10811
10812* LUCENE-4598: PayloadIterator no longer uses top-level IndexReader to iterate on the
10813  posting's payload. (Shai Erera, Michael McCandless)
10814
10815* LUCENE-4661: Drop default maxThreadCount to 1 and maxMergeCount to 2
10816  in ConcurrentMergeScheduler, for faster merge performance on
10817  spinning-magnet drives (Mike McCandless)
10818
10819Documentation
10820
10821* LUCENE-4483: Refer to BytesRef.deepCopyOf in Term's constructor that takes BytesRef.
10822  (Paul Elschot via Robert Muir)
10823
10824Build
10825
10826* LUCENE-4650: Upgrade randomized testing to version 2.0.8: make the
10827  test framework more robust under low memory conditions. (Dawid Weiss)
10828
10829* LUCENE-4603: Upgrade randomized testing to version 2.0.5: print forked
10830  JVM PIDs on heartbeat from hung tests (Dawid Weiss)
10831
10832* Upgrade randomized testing to version 2.0.4: avoid hangs on shutdown
10833  hooks hanging forever by calling Runtime.halt() in addition to
10834  Runtime.exit() after a short delay to allow graceful shutdown (Dawid Weiss)
10835
10836* LUCENE-4451: Memory leak per unique thread caused by
10837  RandomizedContext.contexts static map. Upgrade randomized testing
10838  to version 2.0.2 (Mike McCandless, Dawid Weiss)
10839
10840* LUCENE-4589: Upgraded benchmark module's Nekohtml dependency to version
10841  1.9.17, removing the workaround in Lucene's HTML parser for the
10842  Turkish locale.  (Uwe Schindler)
10843
10844* LUCENE-4601: Fix ivy availability check to use typefound, so it works
10845  if called from another build file.  (Ryan Ernst via Robert Muir)
10846
10847
10848======================= Lucene 4.0.0 =======================
10849
10850Changes in backwards compatibility policy
10851
10852* LUCENE-4392: Class org.apache.lucene.util.SortedVIntList has been removed.
10853  (Adrien Grand)
10854
10855* LUCENE-4393: RollingCharBuffer has been moved to the o.a.l.analysis.util
10856  package of lucene-analysis-common. (Adrien Grand)
10857
10858New Features
10859
10860* LUCENE-1888: Added the option to store payloads in the term
10861  vectors (IndexableFieldType.storeTermVectorPayloads()). Note
10862  that you must store term vector positions to store payloads.
10863  (Robert Muir)
10864
10865* LUCENE-3892: Add a new BlockPostingsFormat that bulk-encodes docs,
10866  freqs and positions in large (size 128) packed-int blocks for faster
10867  search performance.  This was from Han Jiang's 2012 Google Summer of
10868  Code project (Han Jiang, Adrien Grand, Robert Muir, Mike McCandless)
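
  As a rough illustration of the block layout described in LUCENE-3892, the
  sketch below splits a count of postings into full 128-entry blocks plus a
  shorter tail. The class and method names are hypothetical and are not taken
  from Lucene; only the block size of 128 comes from the entry above.

```java
// Hypothetical helper, not part of Lucene: shows how a count of postings
// divides into fixed-size blocks of 128 plus a shorter tail.
final class BlockLayoutSketch {
    static final int BLOCK_SIZE = 128; // block size named in LUCENE-3892

    /** Number of completely filled blocks for {@code count} entries. */
    static int fullBlocks(int count) {
        return count / BLOCK_SIZE;
    }

    /** Entries left over after the full blocks (0 to BLOCK_SIZE - 1). */
    static int tailLength(int count) {
        return count % BLOCK_SIZE;
    }
}
```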
10869
10870* LUCENE-4323: Added support for an absolute maximum CFS segment size
10871  (in MiB) to LogMergePolicy and TieredMergePolicy.
10872  (Alexey Lef via Uwe Schindler)
10873
10874* LUCENE-4339: Allow deletes against 3.x segments for easier upgrading.
10875  Lucene3x Codec is still otherwise read-only, you should not set it
10876  as the default Codec on IndexWriter, because it cannot write new segments.
10877  (Mike McCandless, Robert Muir)
10878
10879* SOLR-3441: ElisionFilterFactory is now MultiTermAware
10880  (Jack Krupansky via hossman)
10881
10882API Changes
10883
10884* LUCENE-4391, LUCENE-4440: All methods of Lucene40Codec but
10885  getPostingsFormatForField are now final. To reuse functionality
10886  of Lucene40, you should extend FilterCodec and delegate to Lucene40
10887  instead of extending Lucene40Codec.  (Adrien Grand, Shai Erera,
10888  Robert Muir, Uwe Schindler)
10889
10890* LUCENE-4299: Added Terms.hasPositions() and Terms.hasOffsets().
10891  Previously you had no real way to know that a term vector field
10892  had positions or offsets, since this can be configured on a
10893  per-field-per-document basis. (Robert Muir)
10894
10895* Removed DocsAndPositionsEnum.hasPayload() and simplified the
10896  contract of getPayload(). It returns null if there is no payload,
10897  otherwise returns the current payload. You can now call it multiple
10898  times per position if you want. (Robert Muir)
10899
10900* Removed FieldsEnum. Fields API instead implements Iterable<String>
10901  and exposes Iterator, so you can iterate over field names with
10902  for (String field : fields) instead.  (Robert Muir)
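
  The for-each form mentioned above works for any class that implements
  Iterable<String>. A minimal sketch, assuming a hypothetical stand-in class
  (FieldNamesSketch is not the Lucene Fields class):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a Fields-like class; not the Lucene class itself.
final class FieldNamesSketch implements Iterable<String> {
    private final List<String> names;

    FieldNamesSketch(String... names) {
        this.names = Arrays.asList(names);
    }

    @Override
    public Iterator<String> iterator() {
        return names.iterator();
    }

    /** Joins the field names using the for-each form quoted in the entry. */
    static String join(Iterable<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (String field : fields) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(field);
        }
        return sb.toString();
    }
}
```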
10903
10904* LUCENE-4152: added IndexReader.leaves(), which lets you enumerate
10905  the leaf atomic reader contexts for all readers in the tree.
10906  (Uwe Schindler, Robert Muir)
10907
10908* LUCENE-4304: removed PayloadProcessorProvider. If you want to change
10909  payloads (or other things) when merging indexes, it's recommended
10910  to just use a FilterAtomicReader + IndexWriter.addIndexes. See the
10911  OrdinalMappingAtomicReader and TaxonomyMergeUtils in the facets
10912  module if you want an example of this.
10913  (Mike McCandless, Uwe Schindler, Shai Erera, Robert Muir)
10914
10915* LUCENE-4304: Make CompositeReader.getSequentialSubReaders()
10916  protected. To get atomic leaves of any IndexReader use the new method
10917  leaves() (LUCENE-4152), which lists AtomicReaderContexts including
10918  the doc base of each leaf.  (Uwe Schindler, Robert Muir)
10919
10920* LUCENE-4307: Renamed IndexReader.getTopReaderContext to
10921  IndexReader.getContext.  (Robert Muir)
10922
10923* LUCENE-4316: Deprecate Fields.getUniqueTermCount and remove it from
10924  AtomicReader. If you really want the unique term count across all
10925  fields, just sum up Terms.size() across those fields. This method
10926  only exists so that this statistic can be accessed for Lucene 3.x
10927  segments, which don't support Terms.size().  (Uwe Schindler, Robert Muir)
10928
10929* LUCENE-4321: Change CharFilter to extend Reader directly, as FilterReader
10930  overdelegates (read(), read(char[], int, int), skip, etc). This made it
10931  hard to implement CharFilters that were correct. Instead only close() is
10932  delegated by default: read(char[], int, int) and correct(int) are abstract
10933  so that it's obvious which methods you should implement.  The protected
10934  inner Reader is 'input' like CharFilter in the 3.x series, instead of 'in'.
10935  (Dawid Weiss, Uwe Schindler, Robert Muir)
10936
10937* LUCENE-3309: The expert FieldSelector API, used to load only certain
10938  fields in a stored document, has been replaced with the simpler
10939  StoredFieldVisitor API.  (Mike McCandless)
10940
10941* LUCENE-4343: Made Tokenizer.setReader final. This is a setter that should
10942  not be overridden by subclasses: per-stream initialization should happen
10943  in reset().  (Robert Muir)
10944
10945* LUCENE-4377: Remove IndexInput.copyBytes(IndexOutput, long).
10946  Use DataOutput.copyBytes(DataInput, long) instead.
10947  (Mike McCandless, Robert Muir)
10948
10949* LUCENE-4355: Simplify AtomicReader's sugar methods such as termDocsEnum,
10950  termPositionsEnum, docFreq, and totalTermFreq to only take Term as a
10951  parameter. If you want to do expert things such as pass a different
10952  Bits as liveDocs, then use the flex apis (fields(), terms(), etc) directly.
10953  (Mike McCandless, Robert Muir)
10954
10955* LUCENE-4425: clarify documentation of StoredFieldVisitor.binaryValue
10956  and simplify the api to binaryField(FieldInfo, byte[]).
10957  (Adrien Grand, Robert Muir)
10958
10959Bug Fixes
10960
10961* LUCENE-4423: DocumentStoredFieldVisitor.binaryField ignored offset and
10962  length. (Adrien Grand)
10963
10964* LUCENE-4297: BooleanScorer2 would multiply the coord() factor
10965  twice for conjunctions: for most users this is no problem, but
10966  if you had a customized Similarity that returned something other
10967  than 1 when overlap == maxOverlap (always the case for conjunctions),
10968  then the score would be incorrect.  (Pascal Chollet, Robert Muir)
10969
10970* LUCENE-4298: MultiFields.getTermDocsEnum(IndexReader, Bits, String, BytesRef)
10971  did not work at all, it would infinitely recurse.
10972  (Alberto Paro via Robert Muir)
10973
10974* LUCENE-4300: BooleanQuery's rewrite was not always safe: if you
10975  had a custom Similarity where coord(1,1) != 1F, then the rewritten
10976  query would be scored differently.  (Robert Muir)
10977
10978* Don't allow negatives in the positions file. If you have an index
10979  from 2.4.0 or earlier with such negative positions, and you already
10980  upgraded to 3.x, then to Lucene 4.0-ALPHA or -BETA, you should run
  CheckIndex. If it fails, then you need to upgrade again to 4.0.  (Robert Muir)
10982
10983* LUCENE-4303: PhoneticFilterFactory and SnowballPorterFilterFactory load their
10984  encoders / stemmers via the ResourceLoader now instead of Class.forName().
  Solr users should no longer have to embed these in their war. (David Smiley)
10986
10987* SOLR-3737: StempelPolishStemFilterFactory loaded its stemmer table incorrectly.
10988  Also, ensure immutability and use only one instance of this table in RAM (lazy
10989  loaded) since it's quite large. (sausarkar, Steven Rowe, Robert Muir)
10990
10991* LUCENE-4310: MappingCharFilter was failing to match input strings
10992  containing non-BMP Unicode characters.  (Dawid Weiss, Robert Muir,
10993  Mike McCandless)
10994
10995* LUCENE-4224: Add in-order scorer to query time joining and the
10996  out-of-order scorer throws an UOE. (Martijn van Groningen, Robert Muir)
10997
10998* LUCENE-4333: Fixed NPE in TermGroupFacetCollector when faceting on mv fields.
10999  (Jesse MacVicar, Martijn van Groningen)
11000
11001* LUCENE-4218: Document.get(String) and Field.stringValue() again return
11002  values for numeric fields, like Lucene 3.x and consistent with the documentation.
11003  (Jamie, Uwe Schindler, Robert Muir)
11004
11005* NRTCachingDirectory was always caching a newly flushed segment in
11006  RAM, instead of checking the estimated size of the segment
11007  to decide whether to cache it. (Mike McCandless)
11008
11009* LUCENE-3720: fix memory-consumption issues with BeiderMorseFilter.
11010  (Thomas Neidhart via Robert Muir)
11011
11012* LUCENE-4401: Fix bug where DisjunctionSumScorer would sometimes call score()
11013  on a subscorer that had already returned NO_MORE_DOCS.  (Liu Chao, Robert Muir)
11014
11015* LUCENE-4411: when sampling is enabled for a FacetRequest, its depth
11016  parameter is reset to the default (1), even if set otherwise.
11017  (Gilad Barkai via Shai Erera)
11018
11019* LUCENE-4455: Fix bug in SegmentInfoPerCommit.sizeInBytes() that was
11020  returning 2X the true size, inefficiently.  Also fixed bug in
11021  CheckIndex that would report no deletions when a segment has
  deletions, and vice versa.  (Uwe Schindler, Robert Muir, Mike McCandless)
11023
11024* LUCENE-4456: Fixed double-counting sizeInBytes for a segment
11025  (affects how merge policies pick merges); fixed CheckIndex's
11026  incorrect reporting of whether a segment has deletions; fixed case
11027  where on abort Lucene could remove files it didn't create; fixed
  many cases where IndexWriter could leave leftover files (on
  exception in various places, on reuse of a segment name after crash
  and recovery).  (Uwe Schindler, Robert Muir, Mike McCandless)
11031
11032Optimizations
11033
11034* LUCENE-4322: Decrease lucene-core JAR size. The core JAR size had increased a
11035  lot because of generated code introduced in LUCENE-4161 and LUCENE-3892.
11036  (Adrien Grand)
11037
11038* LUCENE-4317: Improve reuse of internal TokenStreams and StringReader
11039  in oal.document.Field.  (Uwe Schindler, Chris Male, Robert Muir)
11040
11041* LUCENE-4327: Support out-of-order scoring in FilteredQuery for higher
11042  performance.  (Mike McCandless, Robert Muir)
11043
11044* LUCENE-4364: Optimize MMapDirectory to not make a mapping per-cfs-slice,
11045  instead one map per .cfs file. This reduces the total number of maps.
11046  Additionally factor out a (package-private) generic
11047  ByteBufferIndexInput from MMapDirectory.  (Uwe Schindler, Robert Muir)
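
  The idea of serving several slices from one mapping can be sketched with
  plain java.nio views. This is only an illustration under assumptions: a heap
  buffer stands in for the memory map, and SliceSketch is a hypothetical name,
  not the MMapDirectory or ByteBufferIndexInput code.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration using a heap buffer in place of a real mmap;
// not the actual MMapDirectory or ByteBufferIndexInput code.
final class SliceSketch {
    /** Returns a view of {@code length} bytes starting at {@code offset}. */
    static ByteBuffer slice(ByteBuffer whole, int offset, int length) {
        ByteBuffer view = whole.duplicate(); // shares content, own position/limit
        view.limit(offset + length);
        view.position(offset);
        return view.slice(); // index 0 of the result maps to 'offset' in the parent
    }
}
```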
11048
11049Build
11050
11051* LUCENE-4406, LUCENE-4407: Upgrade to randomizedtesting 2.0.1.
11052  Workaround for broken test output XMLs due to non-XML text unicode
11053  chars in strings. Added printing of failed tests at the end of a
11054  test run (Dawid Weiss)
11055
11056* LUCENE-4252: Detect/Fail tests when they leak RAM in static fields
11057  (Robert Muir, Dawid Weiss)
11058
11059* LUCENE-4360: Support running the same test suite multiple times in
11060  parallel (Dawid Weiss)
11061
11062* LUCENE-3985: Upgrade to randomizedtesting 2.0.0. Added support for
11063  thread leak detection. Added support for suite timeouts. (Dawid Weiss)
11064
11065* LUCENE-4354: Corrected maven dependencies to be consistent with
11066  the licenses/ folder and the binary release. Some had different
11067  versions or additional unnecessary dependencies. (selckin via Robert Muir)
11068
11069* LUCENE-4340: Move all non-default codec, postings format and terms
11070  dictionary implementations to lucene/codecs. (Adrien Grand)
11071
11072Documentation
11073
11074* LUCENE-4302: Fix facet userguide to have HTML loose doctype like
11075  all other javadocs.  (Karl Nicholas via Uwe Schindler)
11076
11077======================= Lucene 4.0.0-BETA =======================
11078
11079New features
11080
11081* LUCENE-4249: Changed the explanation of the PayloadTermWeight to use the
11082  underlying PayloadFunction's explanation as the explanation
11083  for the payload score. (Scott Smerchek via Robert Muir)
11084
11085* LUCENE-4069: Added BloomFilteringPostingsFormat for use with low-frequency terms
11086  such as primary keys (Mark Harwood, Mike McCandless)
11087
11088* LUCENE-4201: Added JapaneseIterationMarkCharFilter to normalize Japanese
11089  iteration marks. (Robert Muir, Christian Moen)
11090
11091* LUCENE-3832: Added BasicAutomata.makeStringUnion method to efficiently
11092  create automata from a fixed collection of UTF-8 encoded BytesRef
11093  (Dawid Weiss, Robert Muir)
11094
11095* LUCENE-4153: Added option to fast vector highlighting via BaseFragmentsBuilder to
11096  respect field boundaries in the case of highlighting for multivalued fields.
11097  (Martijn van Groningen)
11098
11099* LUCENE-4227: Added DirectPostingsFormat, to hold all postings in
11100  memory as uncompressed simple arrays.  This uses a tremendous amount
11101  of RAM but gives good search performance gains.  (Mike McCandless)
11102
11103* LUCENE-2510, LUCENE-4044: Migrated Solr's Tokenizer-, TokenFilter-, and
11104  CharFilterFactories to the lucene-analysis module. The API is still
11105  experimental.  (Chris Male, Robert Muir, Uwe Schindler)
11106
11107* LUCENE-4230: When pulling a DocsAndPositionsEnum you can now
11108  specify whether or not you require payloads (in addition to
11109  offsets); turning one or both off may allow some codec
11110  implementations to optimize the enum implementation.  (Robert Muir,
11111  Mike McCandless)
11112
11113* LUCENE-4203: Add IndexWriter.tryDeleteDocument(AtomicReader reader,
11114  int docID), to attempt deletion by docID as long as the provided
11115  reader is an NRT reader, and the segment has not yet been merged
11116  away (Mike McCandless).
11117
11118* LUCENE-4286: Added option to CJKBigramFilter to always also output
11119  unigrams. This can be used for a unigram+bigram approach, or at
11120  index-time only for better support of short queries.
11121  (Tom Burton-West, Robert Muir)
11122
11123API Changes
11124
11125* LUCENE-4138: update of morfologik (Polish morphological analyzer) to 1.5.3.
11126  The tag attribute class has been renamed to MorphosyntacticTagsAttribute and
11127  has a different API (carries a list of tags instead of a compound tag). Upgrade
11128  of embedded morfologik dictionaries to version 1.9. (Dawid Weiss)
11129
11130* LUCENE-4178: set 'tokenized' to true on FieldType by default, so that if you
11131  make a custom FieldType and set indexed = true, it's analyzed by the analyzer.
11132  (Robert Muir)
11133
11134* LUCENE-4220: Removed the buggy JavaCC-based HTML parser in the benchmark
11135  module and replaced by NekoHTML. HTMLParser interface was cleaned up while
11136  changing method signatures.  (Uwe Schindler, Robert Muir)
11137
11138* LUCENE-2191: Rename Tokenizer.reset(Reader) to Tokenizer.setReader(Reader).
11139  The purpose of this method was always to set a new Reader on the Tokenizer,
11140  reusing the object. But the name was often confused with TokenStream.reset().
11141  (Robert Muir)
11142
11143* LUCENE-4228: Refactored CharFilter to extend java.io.FilterReader. CharFilters
11144  filter another reader and you override correct() for offset correction.
11145  (Robert Muir)
11146
11147* LUCENE-4240: Analyzer api now just takes fieldName for getOffsetGap. If the
11148  field is not analyzed (e.g. StringField), then the analyzer is not invoked
11149  at all. If you want to tweak things like positionIncrementGap and offsetGap,
11150  analyze the field with KeywordTokenizer instead.  (Grant Ingersoll, Robert Muir)
11151
11152* LUCENE-4250: Pass fieldName to the PayloadFunction explain method, so it
11153  parallels with docScore and the default implementation is correct.
11154  (Robert Muir)
11155
11156* LUCENE-3747: Support Unicode 6.1.0. (Steve Rowe)
11157
11158* LUCENE-3884: Moved ElisionFilter out of org.apache.lucene.analysis.fr
11159  package into org.apache.lucene.analysis.util.  (Robert Muir)
11160
11161* LUCENE-4230: When pulling a DocsAndPositionsEnum you now pass an int
11162  flags instead of the previous boolean needOffsets.  Currently
11163  recognized flags are DocsAndPositionsEnum.FLAG_PAYLOADS and
11164  DocsAndPositionsEnum.FLAG_OFFSETS (Robert Muir, Mike McCandless)
11165
11166* LUCENE-4273: When pulling a DocsEnum, you can pass an int flags
  instead of the previous boolean needsFreqs; consistent with the changes
11168  for DocsAndPositionsEnum in LUCENE-4230. Currently the only flag
11169  is DocsEnum.FLAG_FREQS. (Robert Muir, Mike McCandless)
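
  Replacing a boolean parameter with an int of flags lets callers combine
  several requests in one argument. A minimal sketch of the pattern follows;
  the constant names and bit values are hypothetical and are not copied from
  DocsEnum or DocsAndPositionsEnum.

```java
// Hypothetical constants; the names and bit values are illustrative and
// are not copied from DocsAndPositionsEnum.
final class EnumFlagsSketch {
    static final int FLAG_OFFSETS = 0x1;
    static final int FLAG_PAYLOADS = 0x2;

    /** True if the caller asked for offsets. */
    static boolean wantsOffsets(int flags) {
        return (flags & FLAG_OFFSETS) != 0;
    }

    /** True if the caller asked for payloads. */
    static boolean wantsPayloads(int flags) {
        return (flags & FLAG_PAYLOADS) != 0;
    }
}
```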
11170
11171* LUCENE-3616: TextField(String, Reader, Store) was reduced to TextField(String, Reader),
11172  as the Store parameter didn't make sense: if you supplied Store.YES, you would only
11173  receive an exception anyway. (Robert Muir)
11174
11175Optimizations
11176
11177* LUCENE-4171: Performance improvements to Packed64.
11178  (Toke Eskildsen via Adrien Grand)
11179
11180* LUCENE-4184: Performance improvements to the aligned packed bits impl.
11181  (Toke Eskildsen, Adrien Grand)
11182
11183* LUCENE-4235: Remove enforcing of Filter rewrite for NRQ queries.
11184  (Uwe Schindler)
11185
11186* LUCENE-4279: Regenerated snowball Stemmers from snowball r554,
11187  making them substantially more lightweight. Behavior is unchanged.
11188  (Robert Muir)
11189
* LUCENE-4291: Reduced internal buffer size for JFlex-based tokenizers
11191  such as StandardTokenizer from 32kb to 8kb.
11192  (Raintung Li, Steven Rowe, Robert Muir)
11193
11194Bug Fixes
11195
11196* LUCENE-4109: BooleanQueries are not parsed correctly with the
11197  flexible query parser. (Karsten Rauch via Robert Muir)
11198
11199* LUCENE-4176: Fix AnalyzingQueryParser to analyze range endpoints as bytes,
11200  so that it works correctly with Analyzers that produce binary non-UTF-8 terms
11201  such as CollationAnalyzer. (Nattapong Sirilappanich via Robert Muir)
11202
11203* LUCENE-4209: Fix FSTCompletionLookup to close its sorter, so that it won't
11204  leave temp files behind in /tmp. Fix SortedTermFreqIteratorWrapper to not
11205  leave temp files behind in /tmp on Windows. Fix Sort to not leave
11206  temp files behind when /tmp is a separate volume. (Uwe Schindler, Robert Muir)
11207
11208* LUCENE-4221: Fix overeager CheckIndex validation for term vector offsets.
11209  (Robert Muir)
11210
11211* LUCENE-4222: TieredMergePolicy.getFloorSegmentMB was returning the
11212  size in bytes not MB (Chris Fuller via Mike McCandless)
11213
11214* LUCENE-3505: Fix bug (Lucene 4.0alpha only) where boolean conjunctions
11215  were sometimes scored incorrectly. Conjunctions of only termqueries where
11216  at least one term omitted term frequencies (IndexOptions.DOCS_ONLY) would
11217  be scored as if all terms omitted term frequencies.  (Robert Muir)
11218
11219* LUCENE-2686, LUCENE-3505: Fixed BooleanQuery scorers to return correct
11220  freq().  Added support for scorer navigation API (Scorer.getChildren) to
11221  all queries.  Made Scorer.freq() abstract.
11222  (Koji Sekiguchi, Mike McCandless, Robert Muir)
11223
11224* LUCENE-4234: Exception when FacetsCollector is used with ScoreFacetRequest,
11225  and the number of matching documents is too large. (Gilad Barkai via Shai Erera)
11226
11227* LUCENE-4245: Make IndexWriter#close() and MergeScheduler#close()
11228  non-interruptible.  (Mark Miller, Uwe Schindler)
11229
11230* LUCENE-4190: restrict allowed filenames that a codec may create to
11231  the patterns recognized by IndexFileNames.  This also fixes
11232  IndexWriter to only delete files matching this pattern from an index
11233  directory, to reduce risk when the wrong index path is accidentally
11234  passed to IndexWriter (Robert Muir, Mike McCandless)
11235
11236* LUCENE-4277: Fix IndexWriter deadlock during rollback if flushable DWPT
  instances are already checked out and queued up but not yet flushed.
11238  (Simon Willnauer)
11239
11240* LUCENE-4282: Automaton FuzzyQuery didn't always deliver all results.
11241  (Johannes Christen, Uwe Schindler, Robert Muir)
11242
11243* LUCENE-4289: Fix minor idf inconsistencies/inefficiencies in highlighter.
11244  (Robert Muir)
11245
11246Changes in Runtime Behavior
11247
11248* LUCENE-4109: Enable position increments in the flexible queryparser by default.
11249  (Karsten Rauch via Robert Muir)
11250
11251* LUCENE-3616: Field throws exception if you try to set a boost on an
11252  unindexed field or one that omits norms. (Robert Muir)
11253
11254Build
11255
11256* LUCENE-4094: Support overriding file.encoding on forked test JVMs
11257  (force via -Drandomized.file.encoding=XXX). (Dawid Weiss)
11258
11259* LUCENE-4189: Test output should include timestamps (start/end for each
11260  test/ suite). Added -Dtests.timestamps=[off by default]. (Dawid Weiss)
11261
11262* LUCENE-4110: Report long periods of forked jvm inactivity (hung tests/ suites).
11263  Added -Dtests.heartbeat=[seconds] with the default of 60 seconds.
11264  (Dawid Weiss)
11265
* LUCENE-4160: Added a property (-Dtests.maxfailures=M) to quit the tests
  after a given number of failures has occurred. This is useful in combination
  with -Dtests.iters=N (you can start N iterations and wait for M
  failures, in particular M = 1). Alternatively,
  specify -Dtests.failfast=true to skip all tests after the first failure.
  (Dawid Weiss)
11272
11273* LUCENE-4115: JAR resolution/ cleanup should be done automatically for ant
11274  clean/ eclipse/ resolve (Dawid Weiss)
11275
11276* LUCENE-4199, LUCENE-4202, LUCENE-4206: Add a new target "check-forbidden-apis"
11277  that parses all generated .class files for use of APIs that use default
  charset, default locale, or default timezone and fails the build if violations
  are found. This ensures that Lucene / Solr is independent of local configuration
  options.  (Uwe Schindler, Robert Muir, Dawid Weiss)
11281
11282* LUCENE-4217: Add the possibility to run tests with Atlassian Clover
11283  loaded from IVY. A development License solely for Apache code was added in
11284  the tools/ folder, but is not included in releases.  (Uwe Schindler)
11285
11286Documentation
11287
11288* LUCENE-4195: Added package documentation and examples for
11289  org.apache.lucene.codecs (Alan Woodward via Robert Muir)
11290
11291======================= Lucene 4.0.0-ALPHA =======================
11292
11293More information about this release, including any errata related to the
11294release notes, upgrade instructions, or other changes may be found online at:
11295   https://wiki.apache.org/lucene-java/Lucene4.0
11296
11297For "contrib" changes prior to 4.0, please see:
11298http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_6_0/lucene/contrib/CHANGES.txt
11299
11300Changes in backwards compatibility policy
11301
11302* LUCENE-1458, LUCENE-2111, LUCENE-2354: Changes from flexible indexing:
11303
11304  - On upgrading to 4.0, if you do not fully reindex your documents,
11305    Lucene will emulate the new flex API on top of the old index,
11306    incurring some performance cost (up to ~10% slowdown, typically).
11307    To prevent this slowdown, use oal.index.IndexUpgrader
11308    to upgrade your indexes to latest file format (LUCENE-3082).
11309
11310  - Mixed flex/pre-flex indexes are perfectly fine -- the two
11311    emulation layers (flex API on pre-flex index, and pre-flex API on
11312    flex index) will remap the access as required.  So on upgrading to
11313    4.0 you can start indexing new documents into an existing index.
11314    To get optimal performance, use oal.index.IndexUpgrader
11315    to upgrade your indexes to latest file format (LUCENE-3082).
11316
11317  - The postings APIs (TermEnum, TermDocsEnum, TermPositionsEnum)
11318    have been removed in favor of the new flexible
11319    indexing (flex) APIs (Fields, FieldsEnum, Terms, TermsEnum,
11320    DocsEnum, DocsAndPositionsEnum). One big difference is that field
11321    and terms are now enumerated separately: a TermsEnum provides a
11322    BytesRef (wraps a byte[]) per term within a single field, not a
11323    Term.  Another is that when asking for a Docs/AndPositionsEnum, you
11324    now specify the skipDocs explicitly (typically this will be the
11325    deleted docs, but in general you can provide any Bits).
11326
11327  - The term vectors APIs (TermFreqVector, TermPositionVector,
11328    TermVectorMapper) have been removed in favor of the above
11329    flexible indexing APIs, presenting a single-document inverted
11330    index of the document from the term vectors.
11331
11332  - MultiReader ctor now throws IOException
11333
11334  - Directory.copy/Directory.copyTo now copies all files (not just
    index files), since what is and isn't an index file is now
11336    dependent on the codecs used.
11337
11338  - UnicodeUtil now uses BytesRef for UTF-8 output, and some method
11339    signatures have changed to CharSequence.  These are internal APIs
11340    and subject to change suddenly.
11341
11342  - Positional queries (PhraseQuery, *SpanQuery) will now throw an
    exception if you use them on a field that omits positions during
11344    indexing (previously they silently returned no results).
11345
11346  - FieldCache.{Byte,Short,Int,Long,Float,Double}Parser's API has
11347    changed -- each parse method now takes a BytesRef instead of a
11348    String.  If you have an existing Parser, a simple way to fix it is
    to invoke BytesRef.utf8ToString, and pass that String to your
11350    existing parser.  This will work, but performance would be better
11351    if you could fix your parser to instead operate directly on the
11352    byte[] in the BytesRef.
11353
11354  - The internal (experimental) API of NumericUtils changed completely
11355    from String to BytesRef. Client code should never use this class,
11356    so the change would normally not affect you. If you used some of
11357    the methods to inspect terms or create TermQueries out of
11358    prefix encoded terms, change to use BytesRef. Please note:
11359    Do not use TermQueries to search for single numeric terms.
11360    The recommended way is to create a corresponding NumericRangeQuery
11361    with upper and lower bound equal and included. TermQueries do not
    score correctly, so the constant score mode of NRQ is the only
11363    correct way to handle single value queries.
11364
11365  - NumericTokenStream now works directly on byte[] terms. If you
11366    plug a TokenFilter on top of this stream, you will likely get
11367    an IllegalArgumentException, because the NTS does not support
11368    TermAttribute/CharTermAttribute. If you want to further filter
11369    or attach Payloads to NTS, use the new NumericTermAttribute.
11370
11371  (Mike McCandless, Robert Muir, Uwe Schindler, Mark Miller, Michael Busch)
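
  The two parser migration paths described in the FieldCache item above can be
  sketched as follows. This is an illustration under assumptions: the value is
  taken to be stored as decimal text, a BytesRef is modelled as a plain
  (byte[], offset, length) triple, and ByteParserSketch is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical parser helpers; BytesRef is modelled here as a plain
// (byte[], offset, length) triple rather than the Lucene class.
final class ByteParserSketch {
    /** Simple migration path: decode to a String, then reuse the old parser. */
    static int parseViaString(byte[] bytes, int offset, int length) {
        return Integer.parseInt(new String(bytes, offset, length, StandardCharsets.UTF_8));
    }

    /** Direct path: read ASCII decimal digits straight from the bytes. */
    static int parseDirect(byte[] bytes, int offset, int length) {
        if (length == 0) {
            throw new NumberFormatException("empty input");
        }
        int i = offset;
        int end = offset + length;
        boolean negative = bytes[i] == '-';
        if (negative) {
            i++;
        }
        if (i == end) {
            throw new NumberFormatException("no digits");
        }
        int value = 0;
        for (; i < end; i++) {
            int digit = bytes[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("bad digit");
            }
            value = value * 10 + digit; // overflow is not checked in this sketch
        }
        return negative ? -value : value;
    }
}
```

  The direct path avoids allocating a String per value, which is the
  performance benefit the item above alludes to.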
11372
11373* LUCENE-2858, LUCENE-3733: IndexReader was refactored into abstract
11374  AtomicReader, CompositeReader, and DirectoryReader. To open Directory-
11375  based indexes use DirectoryReader.open(), the corresponding method in
11376  IndexReader is now deprecated for easier migration. Only DirectoryReader
11377  supports commits, versions, and reopening with openIfChanged(). Terms,
11378  postings, docvalues, and norms can from now on only be retrieved using
11379  AtomicReader; DirectoryReader and MultiReader extend CompositeReader,
11380  only offering stored fields and access to the sub-readers (which may be
11381  composite or atomic). SlowCompositeReaderWrapper (LUCENE-2597) can be
11382  used to emulate atomic readers on top of composites.
  Please review MIGRATE.txt for information on how to migrate old code.
11384  (Uwe Schindler, Robert Muir, Mike McCandless)
11385
* LUCENE-2265: FuzzyQuery and WildcardQuery now operate on Unicode codepoints,
  not Unicode code units. For example, a Wildcard "?" represents any Unicode
11388  character. Furthermore, the rest of the automaton package and RegexpQuery use
11389  true Unicode codepoint representation.  (Robert Muir, Mike McCandless)
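
  The difference between code units and code points shows up for characters
  outside the Basic Multilingual Plane. A small sketch using only
  java.lang.String (CodePointSketch is a hypothetical name and no Lucene
  classes are involved):

```java
// Illustration using only java.lang.String; no Lucene classes involved.
final class CodePointSketch {
    /** Number of UTF-16 code units (what String.length() reports). */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }
}
```

  For a string holding the single character U+1D538, codeUnits reports 2 while
  codePoints reports 1.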
11390
11391* LUCENE-2380: The String-based FieldCache methods (getStrings,
11392  getStringIndex) have been replaced with BytesRef-based equivalents
11393  (getTerms, getTermsIndex).  Also, the sort values (returned in
11394  FieldDoc.fields) when sorting by SortField.STRING or
11395  SortField.STRING_VAL are now BytesRef instances.  See MIGRATE.txt
11396  for more details. (yonik, Mike McCandless)
11397
11398* LUCENE-2480: Though not a change in backwards compatibility policy, pre-3.0
11399  indexes are no longer supported. You should upgrade to 3.x first, then run
11400  optimize(), or reindex. (Shai Erera, Earwin Burrfoot)
11401
11402* LUCENE-2484: Removed deprecated TermAttribute. Use CharTermAttribute
11403  and TermToBytesRefAttribute instead.  (Uwe Schindler)
11404
11405* LUCENE-2600: Remove IndexReader.isDeleted in favor of
11406  AtomicReader.getDeletedDocs().  (Mike McCandless)
11407
11408* LUCENE-2667: FuzzyQuery's defaults have changed for more performant
11409  behavior: the minimum similarity is 2 edit distances from the word,
11410  and the priority queue size is 50. To support this, FuzzyQuery now allows
11411  specifying unscaled edit distances (foobar~2). If your application depends
11412  upon the old defaults of 0.5 (scaled) minimum similarity and Integer.MAX_VALUE
11413  priority queue size, you can use FuzzyQuery(Term, float, int, int) to specify
11414  those explicitly.
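
  To make "2 edit distances" concrete, the sketch below computes a plain
  Levenshtein distance over code points. It is an assumption-laden
  illustration, not FuzzyQuery's implementation: EditDistanceSketch is a
  hypothetical name, and the entry above does not say whether FuzzyQuery
  counts every kind of edit the same way.

```java
// Hypothetical helper; a textbook Levenshtein distance, not Lucene code.
final class EditDistanceSketch {
    /** Classic Levenshtein distance computed over Unicode code points. */
    static int distance(String a, String b) {
        int[] s = a.codePoints().toArray();
        int[] t = b.codePoints().toArray();
        int[] prev = new int[t.length + 1];
        int[] curr = new int[t.length + 1];
        for (int j = 0; j <= t.length; j++) {
            prev[j] = j; // distance from the empty prefix of s
        }
        for (int i = 1; i <= s.length; i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length; j++) {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length];
    }

    /** True if the two strings are at most two edits apart. */
    static boolean withinTwoEdits(String a, String b) {
        return distance(a, b) <= 2;
    }
}
```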
11415
11416* LUCENE-2674: MultiTermQuery.TermCollector.collect now accepts the
11417  TermsEnum as well.  (Robert Muir, Mike McCandless)
11418
* LUCENE-588: WildcardQuery and QueryParser now allow escaping with
11420  the '\' character. Previously this was impossible (you could not escape */?,
11421  for example).  If your code somehow depends on the old behavior, you will
11422  need to change it (e.g. using "\\" to escape '\' itself).
11423  (Sunil Kamath, Terry Yang via Robert Muir)
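
  A minimal sketch of wildcard matching with backslash escaping follows. It is
  not the WildcardQuery implementation: WildcardEscapeSketch is a hypothetical
  name, the matcher uses simple backtracking, and treating a trailing lone
  backslash as a literal is an assumption of this sketch.

```java
// Hypothetical matcher, not Lucene code: '*' matches any run of characters,
// '?' matches one character, and '\' makes the next character literal.
final class WildcardEscapeSketch {
    static boolean matches(String pattern, String text) {
        return match(pattern.codePoints().toArray(), 0, text.codePoints().toArray(), 0);
    }

    private static boolean match(int[] p, int pi, int[] t, int ti) {
        if (pi == p.length) {
            return ti == t.length;
        }
        int c = p[pi];
        if (c == '*') {
            // Try every possible length for the run, including the empty run.
            for (int k = ti; k <= t.length; k++) {
                if (match(p, pi + 1, t, k)) {
                    return true;
                }
            }
            return false;
        }
        if (c == '?') {
            // Exactly one character, counted as one code point.
            return ti < t.length && match(p, pi + 1, t, ti + 1);
        }
        if (c == '\\' && pi + 1 < p.length) {
            // The escaped character is compared literally below.
            pi++;
            c = p[pi];
        }
        return ti < t.length && t[ti] == c && match(p, pi + 1, t, ti + 1);
    }
}
```

  In Java source the pattern a\*b is written "a\\*b"; it matches only the
  literal text a*b, whereas the unescaped pattern "a*b" also matches "axxb".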
11424
11425* LUCENE-2837: Collapsed Searcher, Searchable into IndexSearcher;
11426  removed contrib/remote and MultiSearcher (Mike McCandless); absorbed
11427  ParallelMultiSearcher into IndexSearcher as an optional
  ExecutorService passed to its ctor.  (Mike McCandless)
11429
11430* LUCENE-2908, LUCENE-4037: Removed serialization code from lucene classes.
11431  It is recommended that you serialize user search needs at a higher level
11432  in your application.
11433  (Robert Muir, Benson Margulies)
11434
11435* LUCENE-2831: Changed Weight#scorer, Weight#explain & Filter#getDocIdSet to
  operate on an AtomicReaderContext instead of directly on IndexReader to enable
11437  searches to be aware of IndexSearcher's context. (Simon Willnauer)
11438
11439* LUCENE-2839: Scorer#score(Collector,int,int) is now public because it is
11440  called from other classes and part of public API. (Uwe Schindler)
11441
11442* LUCENE-2865: Weight#scorer(AtomicReaderContext, boolean, boolean) now accepts
  a ScorerContext struct instead of booleans. (Simon Willnauer)
11444
11445* LUCENE-2882: Cut over SpanQuery#getSpans to AtomicReaderContext to enforce
11446  per segment semantics on SpanQuery & Spans. (Simon Willnauer)
11447
11448* LUCENE-2236: Similarity can now be configured on a per-field basis. See the
11449  migration notes in MIGRATE.txt for more details.  (Robert Muir, Doron Cohen)
11450
11451* LUCENE-2315: AttributeSource's methods for accessing attributes are now final,
11452  else it's easy to corrupt the internal states.  (Uwe Schindler)
11453
11454* LUCENE-2814: The IndexWriter.flush method no longer takes "boolean
11455  flushDocStores" argument, as we now always flush doc stores (index
11456  files holding stored fields and term vectors) while flushing a
11457  segment.  (Mike McCandless)
11458
11459* LUCENE-2548: Field names (eg in Term, FieldInfo) are no longer
11460  interned.  (Mike McCandless)
11461
11462* LUCENE-2883: The contents of o.a.l.search.function have been consolidated into
11463  the queries module and can be found at o.a.l.queries.function.  See
11464  MIGRATE.txt for more information (Chris Male)
11465
11466* LUCENE-2392, LUCENE-3299: Decoupled vector space scoring from
11467  Query/Weight/Scorer. If you extended Similarity directly before, you should
11468  extend TFIDFSimilarity instead.  Similarity is now a lower-level API to
11469  implement other scoring algorithms.  See MIGRATE.txt for more details.
11470  (David Nemeskey, Simon Willnauer, Mike McCandless, Robert Muir)
11471
11472* LUCENE-3330: The expert visitor API in Scorer has been simplified and
11473  extended to support arbitrary relationships. To navigate to a scorer's
11474  children, call Scorer.getChildren().  (Robert Muir)
11475
11476* LUCENE-2308: Field is now instantiated with an instance of IndexableFieldType,
11477  of which there is a core implementation FieldType.  Most properties
11478  describing a Field have been moved to IndexableFieldType.  See MIGRATE.txt
11479  for more details.  (Nikola Tankovic, Mike McCandless, Chris Male)
11480
11481* LUCENE-3396: ReusableAnalyzerBase.TokenStreamComponents.reset(Reader) now
11482  returns void instead of boolean.  If a Component cannot be reset, it should
11483  throw an Exception.  (Chris Male)
11484
11485* LUCENE-3396: ReusableAnalyzerBase has been renamed to Analyzer.  All Analyzer
11486  implementations must now use Analyzer.TokenStreamComponents, rather than
11487  overriding .tokenStream() and .reusableTokenStream() (which are now final).
11488  (Chris Male)
11489
11490* LUCENE-3346: Analyzer.reusableTokenStream() has been renamed to tokenStream()
11491  with the old tokenStream() method removed.  Consequently it is now mandatory
11492  for all Analyzers to support reusability. (Chris Male)
11493
11494* LUCENE-3473: AtomicReader.getUniqueTermCount() no longer throws UOE when
11495  it cannot be easily determined. Instead, it returns -1 to be consistent with
11496  this behavior across other index statistics.
11497  (Robert Muir)
11498
11499* LUCENE-1536: The abstract FilteredDocIdSet.match() method is no longer
11500  allowed to throw IOException. This change was required to make it conform
11501  to the Bits interface. This method should never do I/O for performance reasons.
11502  (Mike McCandless, Uwe Schindler, Robert Muir, Chris Male, Yonik Seeley,
11503  Jason Rutherglen, Paul Elschot)
11504
11505* LUCENE-3559: The methods "docFreq" and "maxDoc" on IndexSearcher were removed,
11506  as these are no longer used by the scoring system. See MIGRATE.txt for more
11507  details.  (Robert Muir)
11508
11509* LUCENE-3533: Removed SpanFilters, they created large lists of objects and
11510  did not scale. (Robert Muir)
11511
11512* LUCENE-3606: IndexReader and subclasses were made read-only. It is no longer
11513  possible to delete or undelete documents using IndexReader; you have to use
11514  IndexWriter now. As deleting by internal Lucene docID is no longer possible,
11515  this requires adding a unique identifier field to your index. Deleting/
11516  relying upon Lucene docIDs is not recommended anyway, because they can
11517  change. Consequently commit() was removed and DirectoryReader.open(),
11518  openIfChanged() no longer take readOnly booleans or IndexDeletionPolicy
11519  instances. Furthermore, IndexReader.setNorm() was removed. If you need
11520  customized norm values, the recommended way to do this is by modifying
11521  Similarity to use an external byte[] or one of the new DocValues
11522  fields (LUCENE-3108). Alternatively, to dynamically change norms (boost
11523  *and* length norm) at query time, wrap your AtomicReader using
11524  FilterAtomicReader, overriding FilterAtomicReader.norms(). To persist the
11525  changes on disk, copy the FilteredIndexReader to a new index using
11526  IndexWriter.addIndexes().  (Uwe Schindler, Robert Muir)
11527
11528* LUCENE-3640: Removed IndexSearcher.close(), because IndexSearcher no longer
11529  takes a Directory and no longer "manages" IndexReaders; it is a no-op.
11530  (Robert Muir)
11531
11532* LUCENE-3684: Add offsets into DocsAndPositionsEnum, and a new
11533  FieldInfo.IndexOption: DOCS_AND_POSITIONS_AND_OFFSETS.  (Robert
11534  Muir, Mike McCandless)
11535
11536* LUCENE-2858, LUCENE-3770: FilterIndexReader was renamed to
11537  FilterAtomicReader and now extends AtomicReader. If you want to filter
11538  composite readers like DirectoryReader or MultiReader, filter their
11539  atomic leaves and build a new CompositeReader (e.g. MultiReader) around
11540  them. (Uwe Schindler, Robert Muir)
11541
11542* LUCENE-3736: ParallelReader was split into ParallelAtomicReader
11543  and ParallelCompositeReader. Lucene 3.x's ParallelReader is now
11544  ParallelAtomicReader; but the new composite variant has improved performance
11545  as it works on the atomic subreaders. It requires that all parallel
11546  composite readers have the same subreader structure. If you cannot provide this,
11547  you can use SlowCompositeReaderWrapper to make all parallel readers atomic
11548  and use ParallelAtomicReader.  (Uwe Schindler, Mike McCandless, Robert Muir)
11549
11550* LUCENE-2000: clone() now returns covariant types where possible. (ryan)
11551
11552* LUCENE-3970: Rename Fields.getUniqueFieldCount -> .size() and
11553  Terms.getUniqueTermCount -> .size().  (Iulius Curt via Mike McCandless)
11554
11555* LUCENE-3514: IndexSearcher.setDefaultFieldSortScoring was removed
11556  and replaced with per-search control via new expert search methods
11557  that take two booleans indicating whether hit scores and max
11558  score should be computed.  (Mike McCandless)
11559
11560* LUCENE-4055: You can't put foreign files into the index dir anymore.
11561
11562* LUCENE-3866: CompositeReader.getSequentialSubReaders() now returns
11563  unmodifiable List<? extends IndexReader>. ReaderUtil.Gather was
11564  removed, as IndexReaderContext.leaves() is now the preferred way
11565  to access sub-readers.  (Uwe Schindler)
11566
11567* LUCENE-4155: oal.util.ReaderUtil, TwoPhaseCommit, TwoPhaseCommitTool
11568  classes were moved to oal.index package. oal.util.CodecUtil class was moved
11569  to oal.codecs package. oal.util.DummyConcurrentLock was removed
11570  (no longer used in Lucene 4.0).  (Uwe Schindler)
11571
11572Changes in Runtime Behavior
11573
11574* LUCENE-2846: omitNorms now behaves like omitTermFrequencyAndPositions: if you
11575  omitNorms(true) for field "a" for 1000 documents, but then add a document with
11576  omitNorms(false) for field "a", all documents for field "a" will have no
11577  norms.  Previously, Lucene would fill the first 1000 documents with
11578  "fake norms" from Similarity.getDefault(). (Robert Muir, Mike McCandless)
11579
11580* LUCENE-2846: When some documents contain field "a", and others do not, the
11581  documents that don't have the field get a norm byte value of 0. Previously,
11582  Lucene would populate "fake norms" with Similarity.getDefault() for these
11583  documents.  (Robert Muir, Mike McCandless)
11584
11585* LUCENE-2720: IndexWriter throws IndexFormatTooOldException on open, rather
11586  than later when e.g. a merge starts.
11587  (Shai Erera, Mike McCandless, Uwe Schindler)
11588
11589* LUCENE-2881: FieldInfos is now tracked per segment.  Before it was tracked
11590  per IndexWriter session, which resulted in FieldInfos that had the FieldInfo
11591  properties from all previous segments combined. Field numbers are now tracked
11592  globally across IndexWriter sessions and persisted into a _X.fnx file on
11593  successful commit. The corresponding file format changes are backwards-
11594  compatible. (Michael Busch, Simon Willnauer)
11595
11596* LUCENE-2956, LUCENE-2573, LUCENE-2324, LUCENE-2555: Changes from
11597  DocumentsWriterPerThread:
11598
11599  - IndexWriter now uses a DocumentsWriter per thread when indexing documents.
11600    Each DocumentsWriterPerThread indexes documents in its own private segment,
11601    and the in memory segments are no longer merged on flush.  Instead, each
11602    segment is separately flushed to disk and subsequently merged with normal
11603    segment merging.
11604
11605  - DocumentsWriterPerThread (DWPT) is now flushed concurrently based on a
11606    FlushPolicy.  When a DWPT is flushed, a fresh DWPT is swapped in so that
11607    indexing may continue concurrently with flushing.  The selected
11608    DWPT flushes all its RAM resident documents to disk.  Note: Segment flushes
11609    don't flush all RAM resident documents but only the documents private to
11610    the DWPT selected for flushing.
11611
11612  - Flushing is now controlled by FlushPolicy that is called for every add,
11613    update or delete on IndexWriter. By default DWPTs are flushed either on
11614    maxBufferedDocs per DWPT or the global active used memory. Once the active
11615    memory exceeds ramBufferSizeMB only the largest DWPT is selected for
11616    flushing and the memory used by this DWPT is subtracted from the active
11617    memory and added to a flushing memory pool, which can lead to temporarily
11618    higher memory usage due to ongoing indexing.
11619
11620  - IndexWriter now can utilize ramBufferSize > 2048 MB. Each DWPT can address
11621    up to 2048 MB memory such that the ramBufferSize is now bounded by the max
11622    number of DWPT available in the used DocumentsWriterPerThreadPool.
11623    IndexWriter's net memory consumption can grow far beyond the 2048 MB limit if
11624    the application can use all available DWPTs. To prevent a DWPT from
11625    exhausting its address space IndexWriter will forcefully flush a DWPT if its
11626    hard memory limit is exceeded. The RAMPerThreadHardLimitMB can be controlled
11627    via IndexWriterConfig and defaults to 1945 MB.
11628    Since IndexWriter flushes DWPT concurrently not all memory is released
11629    immediately. Applications should still use a ramBufferSize significantly
11630    lower than the JVM's available heap memory since under high load multiple
11631    flushing DWPT can consume substantial transient memory when IO performance
11632    is slow relative to indexing rate.
11633
11634  - IndexWriter#commit now doesn't block concurrent indexing while flushing all
11635    'currently' RAM resident documents to disk. Yet, flushes that occur while
11636    a full flush is running are queued and will happen after all DWPT involved
11637    in the full flush are done flushing. Applications that use multiple threads
11638    during indexing and trigger a full flush (eg call commit() or open a new
11639    NRT reader) can use significantly more transient memory.
11640
11641  - IndexWriter#addDocument and IndexWriter.updateDocument can block indexing
11642    threads if the number of active + number of flushing DWPT exceed a
11643    safety limit. By default this happens if 2 * max number available thread
11644    states (DWPTPool) is exceeded. This safety limit prevents applications from
11645    exhausting their available memory if flushing can't keep up with
11646    concurrently indexing threads.
11647
11648  - IndexWriter only applies and flushes deletes if the maxBufferedDelTerms
11649    limit is reached during indexing. No segment flushes will be triggered
11650    due to this setting.
11651
11652  - IndexWriter#flush(boolean, boolean) doesn't synchronize on IndexWriter
11653    anymore. A dedicated flushLock has been introduced to prevent multiple full-
11654    flushes happening concurrently.
11655
11656  - DocumentsWriter doesn't write shared doc stores anymore.
11657
11658  (Mike McCandless, Michael Busch, Simon Willnauer)
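
  The memory-based flush rule described in the list above (once active
  memory exceeds ramBufferSizeMB, only the largest DWPT is selected) can be
  sketched as follows. This is an illustration under simplified assumptions,
  not Lucene's FlushPolicy code: the class, the method name and the plain
  long[] of per-writer byte counts are all hypothetical.

```java
final class FlushSelectionSketch {
    // Hypothetical model: returns the index of the writer holding the most
    // active bytes once the total exceeds the RAM buffer limit, or -1 if
    // the total is still within the limit and nothing needs flushing.
    static int selectForFlush(long[] activeBytesPerWriter, long ramBufferBytes) {
        long total = 0;
        int largest = -1;
        for (int i = 0; i < activeBytesPerWriter.length; i++) {
            total += activeBytesPerWriter[i];
            if (largest < 0 || activeBytesPerWriter[i] > activeBytesPerWriter[largest]) {
                largest = i;
            }
        }
        return total > ramBufferBytes ? largest : -1;
    }
}
```

  In this model the selected writer's bytes would then move from the active
  pool to the flushing pool, which is why total memory can temporarily
  exceed the configured buffer size while indexing continues.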
11659
11660* LUCENE-3309: Stored fields no longer record whether they were
11661  tokenized or not.  In general you should not rely on stored fields
11662  to record any "metadata" from indexing (tokenized, omitNorms,
11663  IndexOptions, boost, etc.)  (Mike McCandless)
11664
11665* LUCENE-3309: Fast vector highlighter now inserts the
11666  MultiValuedSeparator for NOT_ANALYZED fields (in addition to
11667  ANALYZED fields).  To ensure your offsets are correct you should
11668  provide an analyzer that returns 1 from the offsetGap method.
11669  (Mike McCandless)
11670
11671* LUCENE-2621: Removed contrib/instantiated.  (Robert Muir)
11672
11673* LUCENE-1768: StandardQueryTreeBuilder no longer uses RangeQueryNodeBuilder
11674  for RangeQueryNodes, since these two classes were removed;
11675  TermRangeQueryNodeProcessor now creates TermRangeQueryNode,
11676  instead of RangeQueryNode; the same applies for numeric nodes;
11677  (Vinicius Barros via Uwe Schindler)
11678
11679* LUCENE-3455: QueryParserBase.newFieldQuery() will throw a ParseException if
11680  any of the calls to the Analyzer throw an IOException.  QueryParserBase.analyzeRangePart()
11681  will throw a RuntimeException if an IOException is thrown by the Analyzer.
11682
11683* LUCENE-4127: IndexWriter will now throw IllegalArgumentException if
11684  the first token of an indexed field has 0 positionIncrement
11685  (previously it silently corrected it to 1, possibly masking bugs).
11686  OffsetAttributeImpl will throw IllegalArgumentException if endOffset
11687  is less than startOffset, or if offsets are negative.
11688  (Robert Muir, Mike McCandless)
11689
11690API Changes
11691
11692* LUCENE-2302, LUCENE-1458, LUCENE-2111, LUCENE-2514: Terms are no longer
11693  required to be character based. Lucene views a term as an arbitrary byte[]:
11694  during analysis, character-based terms are converted to UTF8 byte[],
11695  but analyzers are free to directly create terms as byte[]
11696  (NumericField does this, for example).  The term data is buffered as
11697  byte[] during indexing, written as byte[] into the terms dictionary,
11698  and iterated as byte[] (wrapped in a BytesRef) by IndexReader for
11699  searching.
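
  To illustrate the conversion described above, the sketch below encodes
  character-based terms as UTF-8 bytes using only the JDK. The class and
  method names are hypothetical; Lucene wraps such bytes in a BytesRef, which
  is not shown here.

```java
import java.nio.charset.StandardCharsets;

final class TermBytesSketch {
    // Encodes a character-based term as UTF-8 bytes.
    static byte[] toUtf8(String term) {
        return term.getBytes(StandardCharsets.UTF_8);
    }

    // Number of bytes the term occupies once encoded as UTF-8.
    static int utf8Length(String term) {
        return toUtf8(term).length;
    }
}
```

  An ASCII term such as "lucene" needs one byte per character, while a
  non-ASCII character such as the accented e in "caf\u00e9" needs two.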
11700
11701* LUCENE-1458, LUCENE-2111: AtomicReader now directly exposes its
11702  deleted docs (getDeletedDocs), providing a new Bits interface to
11703  directly query by doc ID.
11704
11705* LUCENE-2691: IndexWriter.getReader() has been made package local and is now
11706  exposed via open and reopen methods on DirectoryReader.  The semantics of the
11707  call is the same as it was prior to the API change.
11708  (Grant Ingersoll, Mike McCandless)
11709
11710* LUCENE-2566: QueryParser: Unary operators +,-,! will not be treated as
11711  operators if they are followed by whitespace. (yonik)
11712
11713* LUCENE-2831: Weight#scorer, Weight#explain, Filter#getDocIdSet,
11714  Collector#setNextReader & FieldComparator#setNextReader now expect an
11715  AtomicReaderContext instead of an IndexReader. (Simon Willnauer)
11716
11717* LUCENE-2892: Add QueryParser.newFieldQuery (called by getFieldQuery by
11718  default) which takes Analyzer as a parameter, for easier customization by
11719  subclasses.  (Robert Muir)
11720
11721* LUCENE-2953: In addition to changes in 3.x, PriorityQueue#initialize(int)
11722  function was moved into the ctor. (Uwe Schindler, Yonik Seeley)
11723
11724* LUCENE-3219: SortField type properties have been moved to an enum
11725  SortField.Type.  To be consistent, CachedArrayCreator.getSortTypeID() has
11726  been changed to CachedArrayCreator.getSortType(). (Chris Male)
11727
11728* LUCENE-3225: Add TermsEnum.seekExact for faster seeking when you
11729  don't need the ceiling term; renamed existing seek methods to either
11730  seekCeil or seekExact; changed seekExact(ord) to return no value.
11731  Fixed MemoryCodec and SimpleTextCodec to optimize the seekExact
11732  case, and fixed places in Lucene to use seekExact when possible.
11733  (Mike McCandless)
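
  The difference between the two seek styles can be sketched with a sorted
  set from the JDK. This is only an analogy: the class and methods below are
  hypothetical and do not use the TermsEnum API or its return types.

```java
import java.util.TreeSet;

final class SeekSketch {
    // Ceiling-style seek: returns the smallest term >= target, or null if
    // every term sorts before the target.
    static String seekCeil(TreeSet<String> terms, String target) {
        return terms.ceiling(target);
    }

    // Exact-style seek: only reports whether the target itself exists, so
    // no work is spent locating the next larger term.
    static boolean seekExact(TreeSet<String> terms, String target) {
        return terms.contains(target);
    }
}
```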
11734
11735* LUCENE-1536: Filter.getDocIdSet() now takes an acceptDocs Bits interface (like
11736  Scorer) limiting the documents that can appear in the returned DocIdSet.
11737  Filters are now required to respect these acceptDocs, otherwise deleted documents
11738  may get returned by searches. Most filters will pass these Bits down to DocsEnum,
11739  but those, e.g. working on FieldCache, may need to use BitsFilteredDocIdSet.wrap()
11740  to exclude them.
11741  (Mike McCandless, Uwe Schindler, Robert Muir, Chris Male, Yonik Seeley,
11742  Jason Rutherglen, Paul Elschot)
11743
11744* LUCENE-3722: Similarity methods and collection/term statistics now take
11745  long instead of int (to enable distributed scoring of > 2B docs).
11746  (Yonik Seeley, Andrzej Bialecki, Robert Muir)
11747
11748* LUCENE-3761: Generalize SearcherManager into an abstract ReferenceManager.
11749  SearcherManager remains a concrete class, but due to the refactoring, the
11750  method maybeReopen has been deprecated in favor of maybeRefresh().
11751  (Shai Erera, Mike McCandless, Simon Willnauer)
11752
11753* LUCENE-3859: AtomicReader.hasNorms(field) is deprecated, instead you
11754  can inspect the FieldInfo yourself to see if norms are present, which
11755  also allows you to get the type.  (Robert Muir)
11756
11757* LUCENE-2606: Changed RegexCapabilities interface to fix thread
11758  safety, serialization, and performance problems. If you have
11759  written a custom RegexCapabilities it will need to be updated
11760  to the new API.  (Robert Muir, Uwe Schindler)
11761
11762* LUCENE-2638: Make HighFreqTerms.TermStats public to make it more useful
11763  for API use. (Andrzej Bialecki)
11764
11765* LUCENE-2912: The field-specific hashmaps in SweetSpotSimilarity were removed.
11766  Instead, use PerFieldSimilarityWrapper to return different SweetSpotSimilaritys
11767  for different fields; this way all parameters (such as TF factors) can be
11768  customized on a per-field basis.  (Robert Muir)
11769
11770* LUCENE-3308: DuplicateFilter keepMode and processingMode have been converted to
11771  enums DuplicateFilter.KeepMode and DuplicateFilter.ProcessingMode respectively.
11772
11773* LUCENE-3483: Move Function grouping collectors from Solr to grouping module.
11774  (Martijn van Groningen)
11775
11776* LUCENE-3606: FieldNormModifier was deprecated, because IndexReader's
11777  setNorm() was deprecated. Furthermore, this class is broken, as it does
11778  not take position overlaps into account while recalculating norms.
11779  (Uwe Schindler, Robert Muir)
11780
11781* LUCENE-3936: Renamed StringIndexDocValues to DocTermsIndexDocValues.
11782  (Martijn van Groningen)
11783
11784* LUCENE-1768: Deprecated Parametric(Range)QueryNode, RangeQueryNode(Builder),
11785  ParametricRangeQueryNodeProcessor were removed. (Vinicius Barros via Uwe Schindler)
11786
11787* LUCENE-3820: Deprecated constructors accepting pattern matching bounds. The input
11788  is buffered and matched in one pass. (Dawid Weiss)
11789
11790* LUCENE-2413: Deprecated PatternAnalyzer in common/miscellaneous, in favor
11791  of the pattern package (CharFilter, Tokenizer, TokenFilter).  (Robert Muir)
11792
11793* LUCENE-2413: Removed the AnalyzerUtil in common/miscellaneous.  (Robert Muir)
11794
11795* LUCENE-1370: Added ShingleFilter option to output unigrams if no shingles
11796  can be generated. (Chris Harris via Steven Rowe)
11797
11798* LUCENE-2514, LUCENE-2551: JDK and ICU CollationKeyAnalyzers were changed to
11799  use pure byte keys when Version >= 4.0. This cuts sort key size approximately
11800  in half. (Robert Muir)
11801
11802* LUCENE-3400: Removed DutchAnalyzer.setStemDictionary (Chris Male)
11803
11804* LUCENE-3431: Removed QueryAutoStopWordAnalyzer.addStopWords* deprecated methods
11805  since they prevented reuse.  Stopwords are now generated at instantiation through
11806  the Analyzer's constructors. (Chris Male)
11807
11808* LUCENE-3434: Removed ShingleAnalyzerWrapper.set* and PerFieldAnalyzerWrapper.addAnalyzer
11809  since they prevent reuse.  Both Analyzers should be configured at instantiation.
11810  (Chris Male)
11811
11812* LUCENE-3765: Stopset ctors that previously took Set<?> or Map<?,String> now take
11813  CharArraySet and CharArrayMap respectively. Previously the behavior was confusing,
11814  and sometimes different depending on the type of set, and ultimately a CharArraySet
11815  or CharArrayMap was always used anyway.  (Robert Muir)
11816
11817* LUCENE-3830: Switched to NormalizeCharMap.Builder to create
11818  immutable instances of NormalizeCharMap. (Dawid Weiss, Mike
11819  McCandless)
11820
11821* LUCENE-4063: FrenchLightStemmer no longer deletes repeated digits.
11822  (Tanguy Moal via Steve Rowe)
11823
11824* LUCENE-4122: Replace Payload with BytesRef. (Andrzej Bialecki)
11825
11826* LUCENE-4132: IndexWriter.getConfig() now returns a LiveIndexWriterConfig object
11827  which can be used to change the IndexWriter's live settings. IndexWriterConfig
11828  is used only for initializing the IndexWriter. (Shai Erera)
11829
11830* LUCENE-3866: IndexReaderContext.leaves() is now the preferred way to access
11831  atomic sub-readers of any kind of IndexReader (for AtomicReaders it returns
11832  itself as only leaf with docBase=0).  (Uwe Schindler)
11833
11834New features
11835
11836* LUCENE-2604: Added RegexpQuery support to QueryParser. Regular expressions
11837  are directly supported by the standard queryparser via
11838     fieldName:/expression/ OR /expression against default field/
11839  Users who wish to search for literal "/" characters are advised to
11840  backslash-escape or quote those characters as needed.
11841  (Simon Willnauer, Robert Muir)
11842
11843* LUCENE-1606, LUCENE-2089: Adds AutomatonQuery, a MultiTermQuery that
11844  matches terms against a finite-state machine. Implement WildcardQuery
11845  and FuzzyQuery with finite-state methods. Adds RegexpQuery.
11846  (Robert Muir, Mike McCandless, Uwe Schindler, Mark Miller)
11847
11848* LUCENE-3662: Add support for levenshtein distance with transpositions
11849  to LevenshteinAutomata, FuzzyTermsEnum, and DirectSpellChecker.
11850  (Jean-Philippe Barrette-LaPierre, Robert Muir)
11851
11852* LUCENE-2321: Cutover to a more RAM efficient packed-ints based
11853  representation for the in-memory terms dict index.  (Mike
11854  McCandless)
11855
11856* LUCENE-2126: Add new classes for data (de)serialization: DataInput
11857  and DataOutput.  IndexInput and IndexOutput extend these new classes.
11858  (Michael Busch)
11859
11860* LUCENE-1458, LUCENE-2111: With flexible indexing it is now possible
11861  for an application to create its own postings codec, to alter how
11862  fields, terms, docs and positions are encoded into the index.  The
11863  standard codec is the default codec. IndexWriter accepts a Codec
11864  class to obtain codecs for newly written segments.
11865
11866* LUCENE-1458, LUCENE-2111: Some experimental codecs have been added
11867  for flexible indexing, including pulsing codec (inlines
11868  low-frequency terms directly into the terms dict, avoiding seeking
11869  for some queries), sep codec (stores docs, freqs, positions, skip
11870  data and payloads in 5 separate files instead of the 2 used by
11871  standard codec), and int block (really a "base" for using
11872  block-based compressors like PForDelta for storing postings data).
11873
11874* LUCENE-1458, LUCENE-2111: The in-memory terms index used by standard
11875  codec is more RAM efficient: terms data is stored as block byte
11876  arrays and packed integers.  Net RAM reduction for indexes that have
11877  many unique terms should be substantial, and initial open time for
11878  IndexReaders should be faster.  These gains only apply for newly
11879  written segments after upgrading.
11880
11881* LUCENE-1458, LUCENE-2111: Terms data are now buffered directly as
11882  byte[] during indexing, which uses half the RAM for ascii terms (and
11883  also numeric fields).  This can improve indexing throughput for
11884  applications that have many unique terms, since it reduces how often
11885  a new segment must be flushed given a fixed RAM buffer size.
11886
11887* LUCENE-2489: Added PerFieldCodecWrapper (in oal.index.codecs) which
11888  lets you set the Codec per field (Mike McCandless)
11889
11890* LUCENE-2373: Extend Codec to use SegmentInfosWriter and
11891  SegmentInfosReader to allow customization of SegmentInfos data.
11892  (Andrzej Bialecki)
11893
11894* LUCENE-2504: FieldComparator.setNextReader now returns a
11895  FieldComparator instance.  You can "return this", to just reuse the
11896  same instance, or you can return a comparator optimized to the new
11897  segment.  (yonik, Mike McCandless)
11898
11899* LUCENE-2648: PackedInts.Iterator now supports advancing by more than a
11900  single ordinal. (Simon Willnauer)
11901
11902* LUCENE-2649: Objects in the FieldCache can optionally store Bits
11903  that mark which docs have real values in the native[] (ryan)
11904
11905* LUCENE-2664: Add SimpleText codec, which stores all terms/postings
11906  data in a single text file for transparency (at the expense of poor
11907  performance).  (Sahin Buyrukbilen via Mike McCandless)
11908
11909* LUCENE-2589: Add a VariableSizedIntIndexInput, which, when used w/
11910  Sep*, makes it simple to take any variable sized int block coders
11911  (like Simple9/16) and use them in a codec.  (Mike McCandless)
11912
11913* LUCENE-2597: Add oal.index.SlowCompositeReaderWrapper, to wrap a
11914  composite reader (eg MultiReader or DirectoryReader), making it
11915  pretend it's an atomic reader.  This is a convenience class (you can
11916  use MultiFields static methods directly, instead) if you need to use
11917  the flex APIs directly on a composite reader.  (Mike McCandless)
11918
11919* LUCENE-2690: MultiTermQuery boolean rewrites per segment.
11920  (Uwe Schindler, Robert Muir, Mike McCandless, Simon Willnauer)
11921
11922* LUCENE-996: The QueryParser now accepts mixed inclusive and exclusive
11923  bounds for range queries.  Example: "{3 TO 5]"
11924  QueryParser subclasses that overrode getRangeQuery will need to be changed
11925  to use the new getRangeQuery method.  (Andrew Schurman, Mark Miller, yonik)
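
  The bound semantics of a mixed range such as "{3 TO 5]" (exclusive lower
  bound, inclusive upper bound) can be sketched as below. The helper is
  hypothetical and compares terms as plain strings; it is not the
  QueryParser or getRangeQuery implementation.

```java
final class RangeBoundsSketch {
    // Hypothetical helper: tests a term against a range whose two ends can
    // be inclusive or exclusive independently of each other.
    static boolean inRange(String term, String lower, String upper,
                           boolean includeLower, boolean includeUpper) {
        int lo = term.compareTo(lower);
        int hi = term.compareTo(upper);
        boolean aboveLower = includeLower ? lo >= 0 : lo > 0;
        boolean belowUpper = includeUpper ? hi <= 0 : hi < 0;
        return aboveLower && belowUpper;
    }
}
```

  For "{3 TO 5]" this model excludes the term "3" and accepts "4" and "5".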
11926
11927* LUCENE-2742: Add native per-field postings format support. Codec now lets you
11928  register a postings format for each field, which is in turn recorded
11929  into the index. Postings formats are maintained on a per-segment basis and can be
11930  resolved without knowing the actual postings format used for writing the segment.
11931  (Simon Willnauer)
11932
11933* LUCENE-2741: Add support for multiple codecs that use the same file
11934  extensions within the same segment. Codecs now use their per-segment codec
11935  ID in the file names. (Simon Willnauer)
11936
11937* LUCENE-2843: Added a new terms index impl,
11938  VariableGapTermsIndexWriter/Reader, that accepts a pluggable
11939  IndexTermSelector for picking which terms should be indexed in the
11940  terms dict.  This impl stores the indexed terms in an FST, which is
11941  much more RAM efficient than FixedGapTermsIndex.  (Mike McCandless)
11942
11943* LUCENE-2862: Added TermsEnum.totalTermFreq() and
11944  Terms.getSumTotalTermFreq().  (Mike McCandless, Robert Muir)
11945
11946* LUCENE-3290: Added Terms.getSumDocFreq()  (Mike McCandless, Robert Muir)
11947
11948* LUCENE-3003: Added new expert class oal.index.DocTermsOrd,
11949  refactored from Solr's UnInvertedField, for accessing term ords for
11950  multi-valued fields, per document.  This is similar to FieldCache in
11951  that it inverts the index to compute the ords, but differs in that
11952  it's able to handle multi-valued fields and does not hold the term
11953  bytes in RAM. (Mike McCandless)
11954
11955* LUCENE-3108, LUCENE-2935, LUCENE-2168, LUCENE-1231: Changes from
11956  DocValues (ColumnStrideFields):
11957
11958  - IndexWriter now supports typesafe dense per-document values stored in
11959    a column like storage. DocValues are stored on a per-document
11960    basis where each document's field can hold exactly one value of a given
11961    type. DocValues are provided via Fieldable and can be used in
11962    conjunction with stored and indexed values.
11963
11964  - DocValues provides an entirely RAM resident document id to value
11965    mapping per field as well as a DocIdSetIterator based disk-resident
11966    sequential access API relying on filesystem-caches.
11967
11968  - Both APIs are exposed via IndexReader and the Codec / Flex API allowing
11969    expert users to integrate customized DocValues reader and writer
11970    implementations by extending existing Codecs.
11971
11972  - DocValues provides implementations for primitive datatypes like int,
11973    long, float, double and arrays of byte. Byte based implementations further
11974    provide storage variants like straight or dereferenced stored bytes, fixed
11975    and variable length bytes as well as index time sorted based on
11976    user-provided comparators.
11977
11978  (Mike McCandless, Simon Willnauer)
11979
11980* LUCENE-3209: Added MemoryCodec, which stores all terms & postings in
11981  RAM as an FST; this is good for primary-key fields if you frequently
11982  need to lookup by that field or perform deletions against it, for
11983  example in a near-real-time setting. (Mike McCandless)
11984
11985* SOLR-2533: Added support for rewriting Sort and SortFields using an
11986  IndexSearcher.  SortFields can have SortField.REWRITEABLE type which
11987  requires they are rewritten before they are used. (Chris Male)
11988
11989* LUCENE-3203: FSDirectory can now limit the max allowed write rate
11990  (MB/sec) of all running merges, to reduce impact ongoing merging has
11991  on searching, NRT reopen time, etc.  (Mike McCandless)
11992
11993* LUCENE-2793: Directory#createOutput & Directory#openInput now accept an
11994  IOContext instead of a buffer size to allow low level optimizations for
11995  different usecases like merging, flushing and reading.
11996  (Simon Willnauer, Mike McCandless, Varun Thacker)
11997
11998* LUCENE-3354: FieldCache can cache DocTermOrds. (Martijn van Groningen)
11999
12000* LUCENE-3376: ReusableAnalyzerBase has been moved from modules/analysis/common
12001  into lucene/src/java/org/apache/lucene/analysis (Chris Male)
12002
12003* LUCENE-3423: add Terms.getDocCount(), which returns the number of documents
12004  that have at least one term for a field.  (Yonik Seeley, Robert Muir)
12005
12006* LUCENE-2959: Added a variety of different relevance ranking systems to Lucene.
12007
12008  - Added Okapi BM25, Language Models, Divergence from Randomness, and
12009    Information-Based Models. The models are pluggable, support all of lucene's
12010    features (boosts, slops, explanations, etc) and queries (spans, etc).
12011
12012  - All models default to the same index-time norm encoding as
12013    DefaultSimilarity, so you can easily try these out/switch back and
12014    forth/run experiments and comparisons without reindexing. Note: most of
12015    the models do rely upon index statistics that are new in Lucene 4.0, so
12016    for existing 3.x indexes it's a good idea to upgrade your index to the
12017    new format with IndexUpgrader first.
12018
12019  - Added a new subclass SimilarityBase which provides a simplified API
12020    for plugging in new ranking algorithms without dealing with all of the
12021    nuances and implementation details of Lucene.
12022
12023  - For example, to use BM25 for all fields:
12024     searcher.setSimilarity(new BM25Similarity());
12025
12026    If you instead want to apply different similarities (e.g. ones with
12027    different parameter values or different algorithms entirely) to different
12028    fields, implement PerFieldSimilarityWrapper with your per-field logic.
12029
12030  (David Mark Nemeskey via Robert Muir)
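
  For readers unfamiliar with Okapi BM25, here is a minimal sketch of the
  textbook per-term formula. It is not the code of BM25Similarity; the class
  name Bm25Sketch, the constants k1 = 1.2 and b = 0.75, and the particular
  idf variant are assumptions chosen for illustration.

```java
final class Bm25Sketch {
    // Commonly cited defaults; assumptions here, not values taken from this changelog.
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** Inverse document frequency: rarer terms (small docFreq) get larger weights. */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Saturating term-frequency component, normalized by relative document length. */
    static double tfNorm(double tf, double docLen, double avgDocLen) {
        double lengthNorm = 1.0 - B + B * (docLen / avgDocLen);
        return (tf * (K1 + 1.0)) / (tf + K1 * lengthNorm);
    }

    /** Score contribution of a single query term for a single document. */
    static double score(double tf, long docCount, long docFreq, double docLen, double avgDocLen) {
        return idf(docCount, docFreq) * tfNorm(tf, docLen, avgDocLen);
    }
}
```

  In this sketch a higher term frequency raises the score with diminishing
  returns, and a document longer than average is penalized in proportion
  to B.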
12031
12032* LUCENE-3396: ReusableAnalyzerBase now provides a ReuseStrategy abstraction
12033  which controls how TokenStreamComponents are reused per request.  Two
12034  implementations are provided - GlobalReuseStrategy which implements the
12035  current behavior of sharing components between all fields, and
12036  PerFieldReuseStrategy which shares per field.  (Chris Male)
12037
12038* LUCENE-2309: Added IndexableField.tokenStream(Analyzer) which is now
12039  responsible for creating the TokenStreams for Fields when they are to
12040  be indexed.  (Chris Male)
12041
12042* LUCENE-3433: Added random access for non RAM resident IndexDocValues. RAM
12043  resident and disk resident IndexDocValues are now exposed via the Source
12044  interface. ValuesEnum has been removed in favour of Source. (Simon Willnauer)
12045
12046* LUCENE-1536: Filters can now be applied down-low, if their DocIdSet implements
12047  a new bits() method that exposes all documents via random access. If the
12048  DocIdSet is not too sparse, it will be passed as acceptDocs down to the Scorer
12049  as a replacement for AtomicReader's live docs.
12050  In addition, FilteredQuery now backs IndexSearcher's filtering search methods.
12051  Using FilteredQuery you can chain Filters efficiently
12052  [new FilteredQuery(new FilteredQuery(query, filter1), filter2)], which was not
12053  possible with IndexSearcher's methods. FilteredQuery also allows overriding
12054  the heuristics used to decide whether filtering is done by random access or
12055  by a conjunction on DocIdSet's iterator().
12056  (Mike McCandless, Uwe Schindler, Robert Muir, Chris Male, Yonik Seeley,
12057  Jason Rutherglen, Paul Elschot)
12058
12059* LUCENE-3638: Added sugar methods to IndexReader and IndexSearcher to
12060  load only certain fields when loading a document.  (Peter Chang via
12061  Mike McCandless)
12062
12063* LUCENE-3628: Norms are represented as DocValues. AtomicReader exposes
12064  a #normValues(String) method to obtain norms per field. (Simon Willnauer)
12065
12066* LUCENE-3687: Similarity#computeNorm(FieldInvertState, Norm) allows computing
12067  norm values of arbitrary precision. Instead of returning a fixed single-byte
12068  value, custom similarities can now set an integer, float or byte value on the
12069  given Norm object. (Simon Willnauer)
12070
12071* LUCENE-2604, LUCENE-4103: Added RegexpQuery support to contrib/queryparser.
12072  (Simon Willnauer, Robert Muir, Daniel Truemper)
12073
12074* LUCENE-2373: Added a Codec implementation that works with append-only
12075  filesystems (such as Hadoop DFS). SegmentInfos writing/reading
12076  code is refactored to support append-only FS, and to allow for future
12077  customization of per-segment information. (Andrzej Bialecki)
12078
12079* LUCENE-2479: Added ability to provide a sort comparator for spelling suggestions along
12080  with two implementations.  The existing comparator (score, then frequency) is the default (Grant Ingersoll)
12081
12082* LUCENE-2608: Added the ability to specify the accuracy per method call in the SpellChecker.  The per-class
12083  method is also still available.  (Grant Ingersoll)
12084
12085* LUCENE-2507: Added DirectSpellChecker, which retrieves correction candidates directly
12086  from the term dictionary using levenshtein automata.  (Robert Muir)
12087
12088* LUCENE-3527: Add LuceneLevenshteinDistance, which computes string distance in a way
12089  compatible with DirectSpellChecker. This can be used to merge top-N results from more than one
12090  SpellChecker.  (James Dyer via Robert Muir)
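
  To make "string distance" concrete, the sketch below computes the classic
  Levenshtein edit distance with dynamic programming. This is the textbook
  algorithm under a hypothetical class name, EditDistanceSketch; the
  LuceneLevenshteinDistance class named above may differ in its details.

```java
final class EditDistanceSketch {
    /** Classic Levenshtein distance using two rolling rows of the DP table. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows so prev always holds the last completed row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

  Merging suggestions from several spell checkers only makes sense when
  every checker ranks candidates with the same distance measure, which is
  the compatibility this entry refers to.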
12091
12092* LUCENE-3496: Support grouping by DocValues. (Martijn van Groningen)
12093
12094* LUCENE-2795: Generified DirectIOLinuxDirectory to work across any
12095  unix supporting the O_DIRECT flag when opening a file (tested on
12096  Linux and OS X but likely other Unixes will work), and improved it
12097  so it can be used for indexing and searching.  The directory uses
12098  direct IO when doing large merges to avoid unnecessarily evicting
12099  cached IO pages due to large merges.  (Varun Thacker, Mike
12100  McCandless)
12101
12102* LUCENE-3827: DocsAndPositionsEnum from MemoryIndex implements
12103  start/endOffset, if offsets are indexed. (Alan Woodward via Mike
12104  McCandless)
12105
12106* LUCENE-3802, LUCENE-3856: Support for grouped faceting. (Martijn van Groningen)
12107
12108* LUCENE-3444: Added a second pass grouping collector that keeps track of distinct
12109  values for a specified field for the top N groups. (Martijn van Groningen)
12110
12111* LUCENE-3778: Added a grouping utility class that makes it easier to use result
12112  grouping for pure Lucene apps. (Martijn van Groningen)
12113
12114* LUCENE-2341: A new analysis/ filter: Morfologik - a dictionary-driven lemmatizer
12115  (accurate stemmer) for Polish (includes morphosyntactic annotations).
12116  (Michał Dybizbański, Dawid Weiss)
12117
12118* LUCENE-2413: Consolidated Lucene/Solr analysis components into analysis/common.
12119  New features from Solr now available to Lucene users include:
12120   - o.a.l.analysis.commongrams: Constructs n-grams for frequently occurring terms
12121     and phrases.
12122   - o.a.l.analysis.charfilter.HTMLStripCharFilter: CharFilter that strips HTML
12123     constructs.
12124   - o.a.l.analysis.miscellaneous.WordDelimiterFilter: TokenFilter that splits words
12125     into subwords and performs optional transformations on subword groups.
12126   - o.a.l.analysis.miscellaneous.RemoveDuplicatesTokenFilter: TokenFilter which
12127     filters out Tokens with the same position and Term text as the previous token.
12128   - o.a.l.analysis.miscellaneous.TrimFilter: Trims leading and trailing whitespace
12129     from Tokens in the stream.
12130   - o.a.l.analysis.miscellaneous.KeepWordFilter: A TokenFilter that only keeps tokens
12131     with text contained in the required words (inverse of StopFilter).
12132   - o.a.l.analysis.miscellaneous.HyphenatedWordsFilter: A TokenFilter that puts
12133     hyphenated words broken into two lines back together.
12134   - o.a.l.analysis.miscellaneous.CapitalizationFilter: A TokenFilter that applies
12135     capitalization rules to tokens.
12136   - o.a.l.analysis.pattern: Package for pattern-based analysis, containing a
12137     CharFilter, Tokenizer, and TokenFilter for transforming text with regexes.
12138   - o.a.l.analysis.synonym.SynonymFilter: A synonym filter that supports multi-word
12139     synonyms.
12140   - o.a.l.analysis.phonetic: Package for phonetic search, containing various
12141     phonetic encoders such as Double Metaphone.
12142
12143   Some existing analysis components changed packages:
12144    - o.a.l.analysis.KeywordAnalyzer -> o.a.l.analysis.core.KeywordAnalyzer
12145    - o.a.l.analysis.KeywordTokenizer -> o.a.l.analysis.core.KeywordTokenizer
12146    - o.a.l.analysis.LetterTokenizer -> o.a.l.analysis.core.LetterTokenizer
12147    - o.a.l.analysis.LowerCaseFilter -> o.a.l.analysis.core.LowerCaseFilter
12148    - o.a.l.analysis.LowerCaseTokenizer -> o.a.l.analysis.core.LowerCaseTokenizer
12149    - o.a.l.analysis.SimpleAnalyzer -> o.a.l.analysis.core.SimpleAnalyzer
12150    - o.a.l.analysis.StopAnalyzer -> o.a.l.analysis.core.StopAnalyzer
12151    - o.a.l.analysis.StopFilter -> o.a.l.analysis.core.StopFilter
12152    - o.a.l.analysis.WhitespaceAnalyzer -> o.a.l.analysis.core.WhitespaceAnalyzer
12153    - o.a.l.analysis.WhitespaceTokenizer -> o.a.l.analysis.core.WhitespaceTokenizer
12154    - o.a.l.analysis.PorterStemFilter -> o.a.l.analysis.en.PorterStemFilter
12155    - o.a.l.analysis.ASCIIFoldingFilter -> o.a.l.analysis.miscellaneous.ASCIIFoldingFilter
12156    - o.a.l.analysis.ISOLatin1AccentFilter -> o.a.l.analysis.miscellaneous.ISOLatin1AccentFilter
12157    - o.a.l.analysis.KeywordMarkerFilter -> o.a.l.analysis.miscellaneous.KeywordMarkerFilter
12158    - o.a.l.analysis.LengthFilter -> o.a.l.analysis.miscellaneous.LengthFilter
12159    - o.a.l.analysis.PerFieldAnalyzerWrapper -> o.a.l.analysis.miscellaneous.PerFieldAnalyzerWrapper
12160    - o.a.l.analysis.TeeSinkTokenFilter -> o.a.l.analysis.sinks.TeeSinkTokenFilter
12161    - o.a.l.analysis.CharFilter -> o.a.l.analysis.charfilter.CharFilter
12162    - o.a.l.analysis.BaseCharFilter -> o.a.l.analysis.charfilter.BaseCharFilter
12163    - o.a.l.analysis.MappingCharFilter -> o.a.l.analysis.charfilter.MappingCharFilter
12164    - o.a.l.analysis.NormalizeCharMap -> o.a.l.analysis.charfilter.NormalizeCharMap
12165    - o.a.l.analysis.CharArraySet -> o.a.l.analysis.util.CharArraySet
12166    - o.a.l.analysis.CharArrayMap -> o.a.l.analysis.util.CharArrayMap
12167    - o.a.l.analysis.ReusableAnalyzerBase -> o.a.l.analysis.util.ReusableAnalyzerBase
12168    - o.a.l.analysis.StopwordAnalyzerBase -> o.a.l.analysis.util.StopwordAnalyzerBase
12169    - o.a.l.analysis.WordListLoader -> o.a.l.analysis.util.WordListLoader
12170    - o.a.l.analysis.CharTokenizer -> o.a.l.analysis.util.CharTokenizer
12171    - o.a.l.util.CharacterUtils -> o.a.l.analysis.util.CharacterUtils
12172
12173   All analyzers in contrib/analyzers and contrib/icu were moved to the
12174   analysis/ module.  The 'smartcn' and 'stempel' components now depend on 'common'.
12175   (Chris Male, Robert Muir)
12176
12177* LUCENE-4004: Add DisjunctionMaxQuery support to the xml query parser.
12178  (Benson Margulies via Robert Muir)
12179
12180* LUCENE-4025: Add maybeRefreshBlocking to ReferenceManager, to let a caller
12181  block until the refresh logic has been executed. (Shai Erera, Mike McCandless)
12182
12183* LUCENE-4039: Add AddIndexesTask to benchmark, which uses IW.addIndexes.
12184  (Shai Erera)
12185
12186* LUCENE-3514: Added IndexSearcher.searchAfter when Sort is used,
12187  returning results after a specified FieldDoc for deep
12188  paging.  (Mike McCandless)
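
  The idea behind "search after" paging can be sketched without Lucene:
  instead of skipping an ever-growing offset, each page starts strictly
  after the last hit of the previous page. In the sketch below, hits are
  modelled as unique, already-sorted strings and the cursor is simply the
  last string seen; the class SearchAfterSketch and its method are
  hypothetical stand-ins for IndexSearcher.searchAfter and FieldDoc.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

final class SearchAfterSketch {
    /** Returns up to pageSize hits that sort strictly after the cursor (null means first page). */
    static List<String> pageAfter(NavigableSet<String> sortedHits, String after, int pageSize) {
        NavigableSet<String> remaining =
            after == null ? sortedHits : sortedHits.tailSet(after, false);
        List<String> page = new ArrayList<>();
        for (String hit : remaining) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(hit);
        }
        return page;
    }
}
```

  A caller would pass the last element of each returned page as the cursor
  for the next call, and stop when an empty page comes back.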
12189
12190* LUCENE-4043: Added scoring support via score mode for query time joining.
12191  (Martijn van Groningen, Mike McCandless)
12192
12193* LUCENE-3523: Added oal.search.spell.WordBreakSpellChecker, which
12194  generates suggestions by combining two or more terms and/or
12195  breaking terms into multiple words.  See Javadocs for usage. (James Dyer)
12196
12197* LUCENE-4019: Improved parsing of Hunspell dictionaries so that
12198  rules missing the required number of parameters are either ignored or
12199  cause a ParseException (depending on whether strict parsing is enabled).
12200  (Luca Cavanna via Chris Male)
12201
12202* LUCENE-3440: Add ordered fragments feature with IDF-weighted terms for FVH.
12203  (Sebastian Lutze via Koji Sekiguchi)
12204
12205* LUCENE-4082: Added explain to ToParentBlockJoinQuery.
12206  (Christoph Kaser, Martijn van Groningen)
12207
12208* LUCENE-4108: add replaceTaxonomy to DirectoryTaxonomyWriter, which replaces
12209  the taxonomy in place with the given one. (Shai Erera)
12210
12211* LUCENE-3030: new BlockTree terms dictionary (used by the default
12212  Lucene40 postings format) uses less RAM (for the terms index) and
12213  disk space (for all terms and metadata) and gives sizable
12214  performance gains for terms dictionary intensive operations like
12215  FuzzyQuery, direct spell checker and primary-key lookup (Mike
12216  McCandless).
12217
12218Optimizations
12219
12220* LUCENE-2588: Don't store unnecessary suffixes when writing the terms
12221  index, saving RAM in IndexReader; change default terms index
12222  interval from 128 to 32, because the terms index now requires much
12223  less RAM.  (Robert Muir, Mike McCandless)
12224
12225* LUCENE-2669: Optimize NumericRangeQuery.NumericRangeTermsEnum to
12226  not seek backwards when a sub-range has no terms. It now only seeks
12227  when the current term is less than the next sub-range's lower end.
12228  (Uwe Schindler, Mike McCandless)
12229
12230* LUCENE-2694: Optimize MultiTermQuery to be single pass for Term lookups.
12231  MultiTermQuery now stores TermState per leaf reader during rewrite to re-
12232  seek the term dictionary in TermQuery / TermWeight.
12233  (Simon Willnauer, Mike McCandless, Robert Muir)
12234
12235* LUCENE-3292: IndexWriter no longer shares the same SegmentReader
12236  instance for merging and NRT readers, which enables directory impls
12237  to separately tune IO flags for each.  (Varun Thacker, Simon
12238  Willnauer, Mike McCandless)
12239
12240* LUCENE-3328: BooleanQuery now uses a specialized ConjunctionScorer if all
12241  boolean clauses are required and instances of TermQuery.
12242  (Simon Willnauer, Robert Muir)
12243
12244* LUCENE-3643: FilteredQuery and IndexSearcher.search(Query, Filter,...)
12245  now optimize the special case query instanceof MatchAllDocsQuery to
12246  execute as ConstantScoreQuery.  (Uwe Schindler)
12247
12248* LUCENE-3509: Added fasterButMoreRam option for docvalues. This option controls whether the space for packed ints
12249  should be rounded up for better performance. This option only applies for docvalues types bytes fixed sorted
12250  and bytes var sorted. (Simon Willnauer, Martijn van Groningen)
12251
12252* LUCENE-3795: Replace contrib/spatial with modules/spatial.  This includes
12253  a basic spatial strategy interface.  (David Smiley, Chris Male, ryan)
12254
12255* LUCENE-3932: Lucene3x codec loads terms index faster, by
12256  pre-allocating the packed ints array based on the .tii file size
12257  (Sean Bridges via Mike McCandless)
12258
12259* LUCENE-3468: Replaced last() and remove() with pollLast() in
12260  FirstPassGroupingCollector (Martijn van Groningen)
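
  The difference between the two approaches can be shown on a plain
  java.util.TreeSet. This is a sketch only; the class PollLastSketch is
  hypothetical and does not reproduce the collector's actual code.

```java
import java.util.TreeSet;

final class PollLastSketch {
    /** Two tree operations: find the greatest element, then search for it again to remove it. */
    static <T> T removeLastTwoStep(TreeSet<T> set) {
        T last = set.last(); // throws NoSuchElementException if the set is empty
        set.remove(last);
        return last;
    }

    /** One tree operation: retrieve and remove the greatest element. */
    static <T> T removeLastOneStep(TreeSet<T> set) {
        return set.pollLast(); // returns null if the set is empty
    }
}
```

  Both methods leave a non-empty set in the same state; pollLast() simply
  avoids the second lookup.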
12261
12262* LUCENE-3830: Changed MappingCharFilter/NormalizeCharMap to use an
12263  FST under the hood, which requires less RAM.  NormalizeCharMap no
12264  longer accepts empty string match (it did previously, but ignored
12265  it).  (Dawid Weiss, Mike McCandless)
12266
12267* LUCENE-4061: improve synchronization in DirectoryTaxonomyWriter.addCategory
12268  and a few general improvements to DirectoryTaxonomyWriter.
12269  (Shai Erera, Gilad Barkai)
12270
12271* LUCENE-4062: Add new aligned packed bits impls for faster lookup
12272  performance; add float acceptableOverheadRatio to getWriter and
12273  getMutable API to give packed ints freedom to pick faster
12274  implementations (Adrien Grand via Mike McCandless)
12275
12276* LUCENE-2357: Reduce transient RAM usage when merging segments in
12277  IndexWriter. (Adrien Grand)
12278
12279* LUCENE-4098: Add bulk get/set methods to PackedInts (Adrien Grand
12280  via Mike McCandless)
12281
12282* LUCENE-4156: DirectoryTaxonomyWriter.getSize is no longer synchronized.
12283  (Shai Erera, Sivan Yogev)
12284
12285* LUCENE-4163: Improve concurrency of MMapIndexInput.clone() by using
12286  the new WeakIdentityMap on top of a ConcurrentHashMap to manage
12287  the cloned instances. WeakIdentityMap was extended to support
12288  iterating over its keys.  (Uwe Schindler)
12289
12290Bug fixes
12291
12292* LUCENE-2803: The FieldCache can miss values if an entry for a reader
12293  with more document deletions is requested before a reader with fewer
12294  deletions, provided they share some segments. (yonik)
12295
12296* LUCENE-2645: Fix false assertion error when same token was added one
12297  after another with 0 posIncr.  (David Smiley, Kurosaka Teruhiko via Mike
12298  McCandless)
12299
12300* LUCENE-3348: Fix thread safety hazards in IndexWriter that could
12301  rarely cause deletions to be incorrectly applied.  (Yonik Seeley,
12302  Simon Willnauer, Mike McCandless)
12303
12304* LUCENE-3515: Fix terrible merge performance versus 3.x, especially
12305  when the directory isn't MMapDirectory, due to failing to reuse
12306  DocsAndPositionsEnum while merging (Marc Sturlese, Erick Erickson,
12307  Robert Muir, Simon Willnauer, Mike McCandless)
12308
12309* LUCENE-3589: BytesRef copy(short) didn't set length.
12310  (Peter Chang via Robert Muir)
12311
12312* LUCENE-3045: fixed QueryNodeImpl.containsTag(String key) that was
12313  not lowercasing the key before checking for the tag (Adriano Crestani)
12314
12315* LUCENE-3890: Fixed NPE for grouped faceting on multi-valued fields.
12316  (Michael McCandless, Martijn van Groningen)
12317
12318* LUCENE-2945: Fix hashCode/equals for surround query parser generated queries.
12319  (Paul Elschot, Simon Rosenthal, gsingers via ehatcher)
12320
12321* LUCENE-3971: MappingCharFilter could return invalid final token position.
12322  (Dawid Weiss, Robert Muir)
12323
12324* LUCENE-3820: PatternReplaceCharFilter could return invalid token positions.
12325  (Dawid Weiss)
12326
12327* LUCENE-3969: Throw IAE on bad arguments that could cause confusing errors in
12328  CompoundWordTokenFilterBase, PatternTokenizer, PositionFilter,
12329  SnowballFilter, PathHierarchyTokenizer, ReversePathHierarchyTokenizer,
12330  WikipediaTokenizer, and KeywordTokenizer. ShingleFilter and
12331  CommonGramsFilter now populate PositionLengthAttribute. Fixed
12332  PathHierarchyTokenizer to reset() all state. Protect against AIOOBE in
12333  ReversePathHierarchyTokenizer if skip is large. Fixed wrong final
12334  offset calculation in PathHierarchyTokenizer.
12335  (Mike McCandless, Uwe Schindler, Robert Muir)
12336
12337* LUCENE-4060: Fix a synchronization bug in
12338  DirectoryTaxonomyWriter.addTaxonomies(). Also, the method has been renamed to
12339  addTaxonomy and now takes only one Directory and one OrdinalMap.
12340  (Shai Erera, Gilad Barkai)
12341
12342* LUCENE-3590: Fix AIOOBE in BytesRef/CharsRef copyBytes/copyChars when
12343  offset is nonzero, fix off-by-one in CharsRef.subSequence, and fix
12344  CharsRef's CharSequence methods to throw exceptions in boundary cases
12345  to properly meet the specification.  (Robert Muir)
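
  The boundary rules of java.lang.CharSequence for a slice with a nonzero
  offset can be sketched as follows. CharSliceSketch is a hypothetical
  stand-in written for illustration; it is not the CharsRef source.

```java
final class CharSliceSketch implements CharSequence {
    private final char[] chars;
    private final int offset;
    private final int length;

    CharSliceSketch(char[] chars, int offset, int length) {
        if (offset < 0 || length < 0 || offset > chars.length - length) {
            throw new IndexOutOfBoundsException("offset=" + offset + " length=" + length);
        }
        this.chars = chars;
        this.offset = offset;
        this.length = length;
    }

    @Override public int length() { return length; }

    @Override public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        return chars[offset + index]; // index is relative to the slice, not the backing array
    }

    @Override public CharSequence subSequence(int start, int end) {
        // CharSequence requires an exception for negative bounds, end > length(), or start > end.
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + " end=" + end);
        }
        return new CharSliceSketch(chars, offset + start, end - start);
    }

    @Override public String toString() { return new String(chars, offset, length); }
}
```

  Forgetting to add the offset, or checking against the backing array
  length instead of the slice length, produces the kind of off-by-one and
  out-of-bounds errors this entry describes.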
12346
12347* LUCENE-4084: Attempting to reuse a single IndexWriterConfig instance
12348  across more than one IndexWriter resulted in a cryptic exception.
12349  This is now fixed, but requires that certain members of
12350  IndexWriterConfig (MergePolicy, FlushPolicy,
12351  DocumentsWriterThreadPool) implement clone.  (Robert Muir, Simon
12352  Willnauer, Mike McCandless)
12353
12354* LUCENE-4079: Fixed loading of Hunspell dictionaries that use aliasing (AF rules)
12355  (Ludovic Boutros via Chris Male)
12356
12357* LUCENE-4077: Expose the max score and per-group scores from
12358  ToParentBlockJoinCollector (Christoph Kaser, Mike McCandless)
12359
12360* LUCENE-4114: Fix int overflow bugs in BYTES_FIXED_STRAIGHT and
12361  BYTES_FIXED_DEREF doc values implementations (Walt Elder via Mike McCandless).
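
  The entry does not say where the overflow occurred. One common form of
  this bug class, shown here purely as an assumption, is computing a byte
  offset as docID * bytesPerValue in 32-bit arithmetic. The class
  OffsetOverflowSketch and both method names are hypothetical.

```java
final class OffsetOverflowSketch {
    /** Multiplies in 32-bit arithmetic; silently wraps once the product exceeds Integer.MAX_VALUE. */
    static long naiveOffset(int docID, int bytesPerValue) {
        return docID * bytesPerValue; // int * int overflows before it is widened to long
    }

    /** Widens one operand first, so the multiplication happens in 64-bit arithmetic. */
    static long safeOffset(int docID, int bytesPerValue) {
        return (long) docID * bytesPerValue;
    }
}
```

  For small inputs both methods agree, which is why such bugs tend to
  surface only on very large segments.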
12362
12363* LUCENE-4147: Fixed thread safety issues when rollback() and commit()
12364  are called simultaneously.  (Simon Willnauer, Mike McCandless)
12365
12366* LUCENE-4165: Removed closing of the Reader used to read the affix file in
12367  HunspellDictionary.  Consumers are now responsible for closing all InputStreams
12368  once the Dictionary has been instantiated. (Torsten Krah, Uwe Schindler, Chris Male)
12369
12370Documentation
12371
12372* LUCENE-3958: Javadocs corrections for IndexWriter.
12373  (Iulius Curt via Robert Muir)
12374
12375Build
12376
12377* LUCENE-4047: Cleanup of LuceneTestCase: moved blocks of initialization/cleanup
12378  code into JUnit instance and class rules. (Dawid Weiss)
12379
12380* LUCENE-4016: Require ANT 1.8.2+ for the build.
12381
12382* LUCENE-3808: Refactoring of testing infrastructure to use randomizedtesting
12383  package: http://labs.carrotsearch.com/randomizedtesting.html (Dawid Weiss)
12384
12385* LUCENE-3964: Added target stage-maven-artifacts, which stages
12386  Maven release artifacts to a Maven staging repository in preparation
12387  for release.  (Steve Rowe)
12388
12389* LUCENE-2845: Moved contrib/benchmark to lucene/benchmark.
12390
12391* LUCENE-2995: Moved contrib/spellchecker into lucene/suggest.
12392
12393* LUCENE-3285: Moved contrib/queryparser into lucene/queryparser
12394
12395* LUCENE-3285: Moved contrib/xml-query-parser's demo into lucene/demo
12396
12397* LUCENE-3271: Moved contrib/queries BooleanFilter, BoostingQuery,
12398  ChainedFilter, FilterClause and TermsFilter into lucene/queries
12399
12400* LUCENE-3381: Moved contrib/queries regex.*, DuplicateFilter,
12401  FuzzyLikeThisQuery and SlowCollated* into lucene/sandbox.
12402  Removed contrib/queries.
12403
12404* LUCENE-3286: Moved remainder of contrib/xml-query-parser to lucene/queryparser.
12405  Classes now found at org.apache.lucene.queryparser.xml.*
12406
12407* LUCENE-4059: Improve ANT task prepare-webpages (used by documentation
12408  tasks) to correctly encode build file names as URIs for later processing by
12409  XSL.  (Greg Bowyer, Uwe Schindler)
12410
12411
12412======================= Lucene 3.6.2 =======================
12413
12414Bug Fixes
12415
12416* LUCENE-4234: Exception when FacetsCollector is used with ScoreFacetRequest,
12417  and the number of matching documents is too large. (Gilad Barkai via Shai Erera)
12418
12419* LUCENE-2686, LUCENE-3505, LUCENE-4401: Fix BooleanQuery scorers to
12420  return correct freq().
12421  (Koji Sekiguchi, Mike McCandless, Liu Chao, Robert Muir)
12422
12423* LUCENE-2501: Fixed rare thread-safety issue that could cause
12424  ArrayIndexOutOfBoundsException inside ByteBlockPool (Robert Muir,
12425  Mike McCandless)
12426
12427* LUCENE-4297: BooleanScorer2 would multiply the coord() factor
12428  twice for conjunctions: for most users this is no problem, but
12429  if you had a customized Similarity that returned something other
12430  than 1 when overlap == maxOverlap (always the case for conjunctions),
12431  then the score would be incorrect.  (Pascal Chollet, Robert Muir)
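
  The arithmetic behind this bug can be sketched as follows. The class
  CoordSketch and the customCoord function are hypothetical; they only
  illustrate why the double multiplication was invisible unless coord()
  returned something other than 1 when every clause matched.

```java
final class CoordSketch {
    /** Default-style coord(): the fraction of clauses that matched. */
    static float fractionCoord(int overlap, int maxOverlap) {
        return (float) overlap / maxOverlap;
    }

    /** A hypothetical custom coord() that returns 0.5 even when all clauses match. */
    static float customCoord(int overlap, int maxOverlap) {
        return 0.5f * overlap / maxOverlap;
    }

    /** Intended behavior: the summed clause scores are scaled by coord once. */
    static float scoreOnce(float sum, float coord) {
        return sum * coord;
    }

    /** Buggy behavior described above: coord is multiplied in twice. */
    static float scoreTwice(float sum, float coord) {
        return sum * coord * coord;
    }
}
```

  With fractionCoord, a conjunction always has overlap == maxOverlap, so
  coord is 1 and both paths agree. With customCoord the two paths differ by
  a factor of 0.5.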
12432
12433* LUCENE-4300: BooleanQuery's rewrite was not always safe: if you
12434  had a custom Similarity where coord(1,1) != 1F, then the rewritten
12435  query would be scored differently.  (Robert Muir)
12436
12437* LUCENE-4398: If you indexed many different field names in your
12438  documents then, due to a bug in how IndexWriter measured its RAM
12439  usage, it would flush each segment too early, eventually
12440  reaching the point where it flushed after every doc.  (Tim Smith via
12441  Mike McCandless)
12442
12443* LUCENE-4411: when sampling is enabled for a FacetRequest, its depth
12444  parameter is reset to the default (1), even if set otherwise.
12445  (Gilad Barkai via Shai Erera)
12446
12447* LUCENE-4635: Fixed ArrayIndexOutOfBoundsException when in-memory
12448  terms index requires more than 2.1 GB RAM (indices with billions of
12449  terms).  (Tom Burton-West via Mike McCandless)
12450
12451Documentation
12452
12453* LUCENE-4302: Fix facet userguide to have HTML loose doctype like
12454  all other javadocs.  (Karl Nicholas via Uwe Schindler)
12455
12456
12457======================= Lucene 3.6.1 =======================
12458More information about this release, including any errata related to the
12459release notes, upgrade instructions, or other changes may be found online at:
12460   https://wiki.apache.org/lucene-java/Lucene3.6.1
12461
12462Bug Fixes
12463
12464* LUCENE-3969: Throw IAE on bad arguments that could cause confusing
12465  errors in KeywordTokenizer.
12466  (Uwe Schindler, Mike McCandless, Robert Muir)
12467
12468* LUCENE-3971: MappingCharFilter could return invalid final token position.
12469  (Dawid Weiss, Robert Muir)
12470
12471* LUCENE-4023: DisjunctionMaxScorer now implements visitSubScorers().
12472  (Uwe Schindler)
12473
12474* LUCENE-2566: + - operators allow any amount of whitespace (yonik, janhoy)
12475
12476* LUCENE-3590: Fix AIOOBE in BytesRef/CharsRef copyBytes/copyChars when
12477  offset is nonzero, fix off-by-one in CharsRef.subSequence, and fix
12478  CharsRef's CharSequence methods to throw exceptions in boundary cases
12479  to properly meet the specification.  (Robert Muir)
12480
12481* LUCENE-4222: TieredMergePolicy.getFloorSegmentMB was returning the
12482  size in bytes not MB (Chris Fuller via Mike McCandless)
12483
12484API Changes
12485
12486* LUCENE-4023: Changed the visibility of Scorer#visitSubScorers() to
12487  public, otherwise it's impossible to implement Scorers outside
12488  the Lucene package.  (Uwe Schindler)
12489
12490Optimizations
12491
12492* LUCENE-4163: Improve concurrency of MMapIndexInput.clone() by using
12493  the new WeakIdentityMap on top of a ConcurrentHashMap to manage
12494  the cloned instances. WeakIdentityMap was extended to support
12495  iterating over its keys.  (Uwe Schindler)
12496
12497Tests
12498
12499* LUCENE-3873: add MockGraphTokenFilter, testing analyzers with
12500  random graph tokens.  (Mike McCandless)
12501
12502* LUCENE-3968: factor out LookaheadTokenFilter from
12503  MockGraphTokenFilter (Mike McCandless)
12504
12505
12506======================= Lucene 3.6.0 =======================
12507More information about this release, including any errata related to the
12508release notes, upgrade instructions, or other changes may be found online at:
12509   https://wiki.apache.org/lucene-java/Lucene3.6
12510
12511Changes in backwards compatibility policy
12512
12513* LUCENE-3594: The protected inner class (never intended to be visible)
12514  FieldCacheTermsFilter.FieldCacheTermsFilterDocIdSet was removed and
12515  replaced by another internal implementation.  (Uwe Schindler)
12516
12517* LUCENE-3620: FilterIndexReader now overrides all methods of IndexReader that
12518  it should (note that some are still not overridden, as they should be
12519  overridden by sub-classes only). In the process, some methods of IndexReader
12520  were made final. This is not expected to affect many apps, since these methods
12521  already delegate to abstract methods, which you had to already override
12522  anyway. (Shai Erera)
12523
12524* LUCENE-3636: Added SearcherFactory, used by SearcherManager and NRTManager
12525  to create new IndexSearchers. You can provide your own implementation to
12526  warm new searchers, set an ExecutorService, set a custom Similarity, or
12527  even return your own subclass of IndexSearcher. The SearcherWarmer and
12528  ExecutorService parameters on these classes were removed, as they are
12529  subsumed by SearcherFactory.  (Shai Erera, Mike McCandless, Robert Muir)
12530
12531* LUCENE-3644: The expert ReaderFinishedListener api suffered problems (propagated
12532  down to subreaders, but was not called on SegmentReaders, unless they were
12533  the owner of the reader core, and other ambiguities). The API is revised:
12534  You can set ReaderClosedListeners on any IndexReader, and onClose is called
12535  when that reader is closed.  SegmentReader has CoreClosedListeners that you
12536  can register to know when a shared reader core is closed.
12537  (Uwe Schindler, Mike McCandless, Robert Muir)
12538
12539* LUCENE-3652: The package org.apache.lucene.messages was moved to
12540  contrib/queryparser. If you have used those classes in your code
12541  just add the lucene-queryparser.jar file to your classpath.
12542  (Uwe Schindler)
12543
12544* LUCENE-3681: FST now stores labels for BYTE2 input type as 2 bytes
12545  instead of vInt; this can make FSTs smaller and faster, but it is a
12546  break in the binary format so if you had built and saved any FSTs
12547  then you need to rebuild them. (Robert Muir, Mike McCandless)
12548
12549* LUCENE-3679: The expert IndexReader.getFieldNames(FieldOption) API
12550  has been removed and replaced with the experimental getFieldInfos
12551  API.  All IndexReader subclasses must implement getFieldInfos.
12552  (Mike McCandless)
12553
12554* LUCENE-3695: Move confusing add(X) methods out of FST.Builder into
12555  FST.Util.  (Robert Muir, Mike McCandless)
12556
12557* LUCENE-3701: Added an additional argument to the expert FST.Builder
12558  ctor to take FreezeTail, which you can use to (very-expertly) customize
12559  the FST construction process. Pass null if you want the default
12560  behavior.  Added seekExact() to FSTEnum, and added FST.save/read
12561  from a File. (Mike McCandless, Dawid Weiss, Robert Muir)
12562
12563* LUCENE-3712: Removed unused and untested ReaderUtil#subReader methods.
12564  (Uwe Schindler)
12565
12566* LUCENE-3672: Deprecate Directory.fileModified,
12567  IndexCommit.getTimestamp and .getVersion and
12568  IndexReader.lastModified and getCurrentVersion (Andrzej Bialecki,
12569  Robert Muir, Mike McCandless)
12570
12571* LUCENE-3760: In IndexReader/DirectoryReader, deprecate static
12572  methods getCurrentVersion and getCommitUserData, and non-static
12573  method getCommitUserData (use getIndexCommit().getUserData()
12574  instead).  (Ryan McKinley, Robert Muir, Mike McCandless)
12575
12576* LUCENE-3867: Deprecate instance creation of RamUsageEstimator, instead
12577  the new static method sizeOf(Object) should be used. As the algorithm
12578  is now using Hotspot(TM) internals (reference size, header sizes,
12579  object alignment), the abstract o.a.l.util.MemoryModel class was
12580  completely removed (without replacement). The new static methods
12581  no longer support String intern-ness checking; interned strings
12582  now count toward memory usage like any other Java object.
12583  (Dawid Weiss, Uwe Schindler, Shai Erera)
12584
12585* LUCENE-3738: All readXxx methods in BufferedIndexInput were made
12586  final. Subclasses should only override protected readInternal /
12587  seekInternal.  (Uwe Schindler)
12588
12589* LUCENE-2599: Deprecated the spatial contrib module, which was buggy and not
12590  well maintained.  Lucene 4 includes a new spatial module that replaces this.
12591  (David Smiley, Ryan McKinley, Chris Male)
12592
12593Changes in Runtime Behavior
12594
12595* LUCENE-3796, SOLR-3241: Throw an exception if you try to set an index-time
12596  boost on a field that omits norms. Because the index-time boost
12597  is multiplied into the norm, previously your boost would be
12598  silently discarded.  (Tomás Fernández Löbbe, Hoss Man, Robert Muir)
12599
12600* LUCENE-3848: Fix tokenstreams to not produce a stream with an initial
12601  position increment of 0, which is out of bounds (overlapping with a
12602  non-existent previous term). Consumers such as IndexWriter and QueryParser
12603  still check for and silently correct this situation today, but at some point
12604  in the future they may throw an exception.  (Mike McCandless, Robert Muir)
12605
12606* LUCENE-3738: DataInput/DataOutput no longer allow negative vLongs. Negative
12607  vInts are still supported (for index backwards compatibility), but
12608  should not be used in new code. The read method for negative vLongs
12609  was already broken since Lucene 3.1.
12610  (Uwe Schindler, Mike McCandless, Robert Muir)
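
  A variable-length encoding of this kind stores 7 payload bits per byte
  and uses the high bit as a continuation flag. The sketch below is a
  self-contained illustration under that assumption, with a hypothetical
  class name VLongSketch; it is not the DataOutput/DataInput source. A
  non-negative long fits in at most 9 bytes (63 bits / 7), whereas a
  negative value has its sign bit set and would need a tenth byte, so the
  sketch rejects it.

```java
import java.io.ByteArrayOutputStream;

final class VLongSketch {
    /** Encodes a non-negative long, 7 bits per byte, low-order group first. */
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative vLong: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7FL) | 0x80L)); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.write((int) value); // final byte has the high bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by encode(). */
    static long decode(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value; // continuation bit clear: this was the last byte
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated vLong");
    }
}
```

  Small values stay compact (one byte up to 127), which is the point of the
  format.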
12611
12612Security fixes
12613
12614* LUCENE-3588: Try harder to prevent SIGSEGV on cloned MMapIndexInputs:
12615  Previous versions of Lucene could SIGSEGV the JVM if you try to access
12616  the clone of an IndexInput retrieved from MMapDirectory. This security fix
12617  prevents this as best as it can by throwing AlreadyClosedException
12618  also on clones.  (Uwe Schindler, Robert Muir)
12619
12620API Changes
12621
12622* LUCENE-3606: IndexReader will be made read-only in Lucene 4.0, so all
12623  methods that allow deleting or undeleting documents using IndexReader were
12624  deprecated; you should use IndexWriter now. Consequently
12625  IndexReader.commit() and all open(), openIfChanged(), clone() methods
12626  taking readOnly booleans (or IndexDeletionPolicy instances) were
12627  deprecated. IndexReader.setNorm() is superfluous and was deprecated.
12628  If you have to change per-document boost use CustomScoreQuery.
12629  If you want to dynamically change norms (boost *and* length norm) at
12630  query time, wrap your IndexReader using FilterIndexReader, overriding
12631  FilterIndexReader.norms(). To persist the changes on disk, copy the
12632  FilterIndexReader to a new index using IndexWriter.addIndexes().
12633  In Lucene 4.0, SimilarityProvider will allow you to customize scoring
12634  using external norms, too.  (Uwe Schindler, Robert Muir)
12635
12636* LUCENE-3735: PayloadProcessorProvider was changed to return a
12637  ReaderPayloadProcessor instead of DirPayloadProcessor. The selection
12638  of the provider to return for the factory is now based on the IndexReader
12639  to be merged. To mimic the old behaviour, just use IndexReader.directory()
12640  for choosing the provider by Directory.  (Uwe Schindler)
12641
12642* LUCENE-3765: Deprecated StopFilter ctor that took ignoreCase, because
12643  in some cases (if the set is a CharArraySet), the argument is ignored.
12644  Deprecated StandardAnalyzer and ClassicAnalyzer ctors that take File;
12645  please use the Reader ctor instead.  (Robert Muir)
12646
12647* LUCENE-3766: Deprecate no-arg ctors of Tokenizer. Tokenizers are
12648  TokenStreams with Readers: tokenizers with null Readers will not be
12649  supported in Lucene 4.0; just use a TokenStream.
12650  (Mike McCandless, Robert Muir)
12651
12652* LUCENE-3769: Simplified NRTManager by requiring applyDeletes to be
12653  passed to ctor only; if an app needs to mix and match it's free to
12654  create two NRTManagers (one always applying deletes and the other
12655  never applying deletes).  (MJB, Shai Erera, Mike McCandless)
12656
12657* LUCENE-3761: Generalize SearcherManager into an abstract ReferenceManager.
12658  SearcherManager remains a concrete class, but due to the refactoring, the
12659  method maybeReopen has been deprecated in favor of maybeRefresh().
12660  (Shai Erera, Mike McCandless, Simon Willnauer)
12661
12662* LUCENE-3776: You now acquire/release the IndexSearcher directly from
12663  NRTManager.  (Mike McCandless)
12664
12665New Features
12666
12667* LUCENE-3593: Added a FieldValueFilter that accepts all documents that either
12668  have at least one or no value at all in a specific field. (Simon Willnauer,
12669  Uwe Schindler, Robert Muir)
12670
12671* LUCENE-3586: CheckIndex and IndexUpgrader allow you to specify the
12672  specific FSDirectory implementation to use (with the new -dir-impl
12673  command-line option).  (Luca Cavanna via Mike McCandless)
12674
12675* LUCENE-3634: IndexReader's static main method was moved to a new
12676  tool, CompoundFileExtractor, in contrib/misc.  (Robert Muir, Mike
12677  McCandless)
12678
12679* LUCENE-995: The QueryParser now interprets * as an open end for range
12680  queries.  Literal asterisks may be represented by quoting or escaping
12681  (i.e. \* or "*").  Custom QueryParser subclasses overriding getRangeQuery()
12682  will be passed null for any open endpoint. (Ingo Renner, Adriano
12683  Crestani, yonik, Mike McCandless)
12684
12685* LUCENE-3121: Add sugar reverse lookup (given an output, find the
12686  input mapping to it) for FSTs that have strictly monotonic long
12687  outputs (such as an ord).  (Mike McCandless)
12688
12689* LUCENE-3671: Add TypeTokenFilter that filters tokens based on
12690  their TypeAttribute.  (Tommaso Teofili via Uwe Schindler)
12691
12692* LUCENE-3690,LUCENE-3913: Added HTMLStripCharFilter, a CharFilter that strips
12693  HTML markup. (Steve Rowe)
12694
12695* LUCENE-3725: Added optional packing to FST building; this uses extra
12696  RAM during building but results in a smaller FST.  (Mike McCandless)
12697
12698* LUCENE-3714: Add top N shortest cost paths search for FST.
12699  (Robert Muir, Dawid Weiss, Mike McCandless)
12700
12701* LUCENE-3789: Expose MTQ TermsEnum via RewriteMethod for non package private
12702  access (Simon Willnauer)
12703
12704* LUCENE-3881: Added UAX29URLEmailAnalyzer: a standard analyzer that recognizes
12705  URLs and emails. (Steve Rowe)
12706
12707Bug fixes
12708
12709* LUCENE-3595: Fixed FieldCacheRangeFilter and FieldCacheTermsFilter
12710  to correctly respect deletions on reopened SegmentReaders. Factored out
12711  FieldCacheDocIdSet to be a top-level class.  (Uwe Schindler, Simon Willnauer)
12712
12713* LUCENE-3627: Don't let an errant 0-byte segments_N file corrupt the index.
12714  (Ken McCracken via Mike McCandless)
12715
12716* LUCENE-3630: The internal method MultiReader.doOpenIfChanged(boolean doClone)
12717  was overriding IndexReader.doOpenIfChanged(boolean readOnly), so changing the
12718  contract of the overridden method. This method was renamed and made private.
12719  In ParallelReader the bug did not exist, but the implementation method
12720  was also made private.  (Uwe Schindler)
12721
12722* LUCENE-3641: Fixed MultiReader to correctly propagate readerFinishedListeners
12723  to clones/reopened readers.  (Uwe Schindler)
12724
12725* LUCENE-3642, SOLR-2891, LUCENE-3717: Fixed bugs in CharTokenizer, n-gram tokenizers/filters,
12726  compound token filters, thai word filter, icutokenizer, pattern analyzer,
12727  wikipediatokenizer, and smart chinese where they would create invalid offsets in
12728  some situations, leading to problems in highlighting.
12729  (Max Beutel, Edwin Steiner via Robert Muir)
12730
12731* LUCENE-3639: TopDocs.merge was incorrectly setting TopDocs.maxScore to
12732  Float.MIN_VALUE when it should be Float.NaN, when there were 0
12733  hits.  Improved age calculation in SearcherLifetimeManager, to have
12734  double precision and to compute age to be how long ago the searcher
12735  was replaced with a new searcher (Mike McCandless)
12736
12737* LUCENE-3658: Corrected potential concurrency issues with
12738  NRTCachingDir, fixed createOutput to overwrite any previous file,
12739  and removed invalid asserts (Robert Muir, Mike McCandless)
12740
12741* LUCENE-3605: don't sleep in a retry loop when trying to locate the
12742  segments_N file (Robert Muir, Mike McCandless)
12743
12744* LUCENE-3711: SentinelIntSet with a small initial size can go into
12745  an infinite loop when expanded.  This can affect grouping using
12746  TermAllGroupsCollector or TermAllGroupHeadsCollector if instantiated with a
12747  non default small size. (Martijn van Groningen, yonik)
12748
12749* LUCENE-3727: When writing stored fields and term vectors, Lucene
12750  checks file sizes to detect a bug in some Sun JREs (LUCENE-1282),
12751  however, on some NFS filesystems File.length() could be stale,
12752  resulting in false errors like "fdx size mismatch while indexing".
12753  These checks now use getFilePointer instead to avoid this.
12754  (Jamir Shaikh, Mike McCandless, Robert Muir)
12755
12756* LUCENE-3816: Fixed problem in FilteredDocIdSet, if null was returned
12757  from the delegate DocIdSet.iterator(), which is allowed to return
12758  null by DocIdSet specification when no documents match.
12759  (Shay Banon via Uwe Schindler)
12760
12761* LUCENE-3821: SloppyPhraseScorer missed documents that ExactPhraseScorer finds.
12762  When a phrase query had repeating terms (e.g. "yes no yes"),
12763  the sloppy query missed documents that the exact query matched.
12764  Fixed except for repeating multiterms (e.g. "yes no yes|no").
12765  (Robert Muir, Doron Cohen)
12766
12767* LUCENE-3841: Fix CloseableThreadLocal to also purge stale entries on
12768  get(); this fixes certain cases where we were holding onto objects
12769  for dead threads for too long (Matthew Bellew, Mike McCandless)
12770
12771* LUCENE-3872: IndexWriter.close() now throws IllegalStateException if
12772  you call it after calling prepareCommit() without calling commit()
12773  first.  (Tim Bogaert via Mike McCandless)
12774
12775* LUCENE-3874: Throw IllegalArgumentException from IndexWriter (rather
12776  than producing a corrupt index), if a positionIncrement would cause
12777  integer overflow. This can happen, for example when using a buggy
12778  TokenStream that forgets to call clearAttributes() in combination
12779  with a StopFilter. (Robert Muir)
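
  The kind of guard described above can be sketched as follows. This is a
  hedged illustration only, not IndexWriter's code: the class and method
  names are hypothetical, and the sketch assumes positions and increments
  are non-negative ints.

```java
// Hypothetical stand-alone sketch; not the IndexWriter source.
final class PositionOverflowSketch {

    /**
     * Returns position + increment, or throws if the sum would wrap around.
     * Assumes position >= 0, so any negative sum indicates int overflow.
     */
    static int advance(int position, int increment) {
        if (increment < 0) {
            throw new IllegalArgumentException("negative position increment: " + increment);
        }
        int next = position + increment;
        if (next < 0) {
            throw new IllegalArgumentException(
                "position increment " + increment + " overflows position " + position);
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(advance(5, 1));
        try {
            advance(Integer.MAX_VALUE, 1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

  Failing fast in this way trades a hard error at indexing time for the
  guarantee that no wrapped (negative) position is ever written.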
12780
12781* LUCENE-3876: Fix bug where positions for a document exceeding
12782  Integer.MAX_VALUE/2 would produce a corrupt index.
12783  (Simon Willnauer, Mike McCandless, Robert Muir)
12784
12785* LUCENE-3880: UAX29URLEmailTokenizer now recognizes emails when the mailto:
12786  scheme is prepended. (Kai Gülzau, Steve Rowe)
12787
12788Optimizations
12789
12790* LUCENE-3653: Improve concurrency in VirtualMethod and AttributeSource by
12791  using a WeakIdentityMap based on a ConcurrentHashMap.  (Uwe Schindler,
12792  Gerrit Jansen van Vuuren)
12793
12794Documentation
12795
12796* LUCENE-3597: Fixed incorrect grouping documentation. (Martijn van Groningen,
12797  Robert Muir)
12798
12799* LUCENE-3926: Improve documentation of RAMDirectory, because this
12800  class is not intended to work with huge indexes. Everything beyond
12801  several hundred megabytes will waste resources (GC cycles), because
12802  it uses an internal buffer size of 1024 bytes, producing millions of
12803  byte[1024] arrays. This class is optimized for small memory-resident
12804  indexes. It also has bad concurrency in multithreaded environments.
12805  It is recommended to materialize large indexes on disk and use
12806  MMapDirectory, which is a high-performance directory implementation
12807  working directly on the file system cache of the operating system,
12808  so copying data to Java heap space is not useful.  (Uwe Schindler,
12809  Mike McCandless, Robert Muir)
12810
12811Build
12812
12813* LUCENE-3857: exceptions from other threads in beforeclass/etc do not fail
12814  the test (Dawid Weiss)
12815
12816* LUCENE-3847: LuceneTestCase will now check for modifications of System
12817  properties before and after each test (and suite). If changes are detected,
12818  the test will fail. A rule can be used to reset system properties to
12819  before-scope state (and this has been used to make Solr tests pass).
12820  (Dawid Weiss, Uwe Schindler).
12821
12822* LUCENE-3228: Stop downloading external javadoc package-list files:
12823
12824  - Added package-list files for Oracle Java javadocs and JUnit javadocs to
12825    Lucene/Solr subversion.
12826
12827  - The Oracle Java javadocs package-list file is excluded from Lucene and
12828    Solr source release packages.
12829
12830  - Regardless of network connectivity, javadocs built from a subversion
12831    checkout contain links to Oracle & JUnit javadocs.
12832
12833  - Building javadocs from a source release package will download the Oracle
12834    Java package-list file if it isn't already present.
12835
12836  - When the Oracle Java package-list file is not present and download fails,
12837    the javadocs targets will not fail the build, though an error will appear
12838    in the build log.  In this case, the built javadocs will not contain links
12839    to Oracle Java javadocs.
12840
12841  - Links from Solr javadocs to Lucene's javadocs are enabled. When building
12842    a X.Y.Z-SNAPSHOT version, the links are to the most recently built nightly
12843    Jenkins javadocs. When building a release version, links are to the
12844    Lucene release javadocs for the same version.
12845
12846  (Steve Rowe, hossman)
12847
12848* LUCENE-3753: Restructure the Lucene build system:
12849  - Created a new Lucene-internal module named "core" by moving the java/
12850    and test/ directories from lucene/src/ to lucene/core/src/.
12851  - Eliminated lucene/src/ by moving all its directories up one level.
12852  - Each internal module (core/, test-framework/, and tools/) now has its own
12853    build.xml, from which it is possible to run module-specific targets.
12854    lucene/build.xml delegates all build tasks (via
12855    <ant dir="internal-module-dir"> calls) to these modules' build.xml files.
12856  (Steve Rowe)
12857
12858* LUCENE-3774: Optimized and streamlined license and notice file validation
12859  by refactoring the build task into an ANT task and modifying build scripts
12860  to perform top-level checks. (Dawid Weiss, Steve Rowe, Robert Muir)
12861
12862* LUCENE-3762: Upgrade JUnit to 4.10, refactor state-machine of detecting
12863  setUp/tearDown call chaining in LuceneTestCase. (Dawid Weiss, Robert Muir)
12864
12865* LUCENE-3944: Make the 'generate-maven-artifacts' target use filtered POMs
12866  placed under lucene/build/poms/, rather than in each module's base
12867  directory.  The 'clean' target now removes them.
12868  (Steve Rowe, Robert Muir)
12869
12870* LUCENE-3930: Changed build system to use Apache Ivy for retrieval of 3rd
12871  party JAR files.  Please review BUILD.txt for instructions.
12872  (Robert Muir, Chris Male, Uwe Schindler, Steven Rowe, Hossman)
12873
12874
12875======================= Lucene 3.5.0 =======================
12876
12877Changes in backwards compatibility policy
12878
12879* LUCENE-3390: The first approach in Lucene 3.4.0 for missing values
12880  support for sorting had a design problem that made the missing value
12881  be populated directly into the FieldCache arrays during sorting,
12882  leading to concurrency issues. To fix this behaviour, the method
12883  signatures had to be changed:
12884  - FieldCache.getUnValuedDocs() was renamed to FieldCache.getDocsWithField()
12885    returning a Bits interface (backported from Lucene 4.0).
12886  - FieldComparator.setMissingValue() was removed; the missing value is now
12887    passed to the constructor.
12888  As this is expert API, most code will not be affected.
12889  (Uwe Schindler, Doron Cohen, Mike McCandless)
12890
12891* LUCENE-3541: Remove IndexInput's protected copyBuf. If you want to
12892  keep a buffer in your IndexInput, do this yourself in your implementation,
12893  and be sure to do the right thing on clone()!  (Robert Muir)
12894
12895* LUCENE-2822: TimeLimitingCollector now expects a counter clock instead of
12896  relying on a private daemon thread. The global time limiting clock thread
12897  has been exposed and is now lazily loaded and fully optional.
12898  TimeLimitingCollector now supports setting clock baseline manually to include
12899  prelude of a search. Previous versions set the baseline at construction time;
12900  now the baseline is set once the first IndexReader is passed to the collector
12901  unless set before. (Simon Willnauer)
12902
12903Changes in runtime behavior
12904
12905* LUCENE-3520: IndexReader.openIfChanged, when passed a near-real-time
12906  reader, will now return null if there are no changes.  The API has
12907  always reserved the right to do this; it's just that in the past for
12908  near-real-time readers it never did. (Mike McCandless)
12909
12910Bug fixes
12911
12912* LUCENE-3412: SloppyPhraseScorer was returning non-deterministic results
12913  for queries with many repeats (Doron Cohen)
12914
12915* LUCENE-3421: PayloadTermQuery's explain was wrong when includeSpanScore=false.
12916  (Edward Drapkin via Robert Muir)
12917
12918* LUCENE-3432: IndexWriter.expungeDeletes with TieredMergePolicy
12919  should ignore the maxMergedSegmentMB setting (v.sevel via Mike
12920  McCandless)
12921
12922* LUCENE-3442: TermQuery.TermWeight.scorer() returns null for non-atomic
12923  IndexReaders (optimization bug, introduced by LUCENE-2829), preventing
12924  QueryWrapperFilter and similar classes from getting a top-level DocIdSet.
12925  (Dan C., Uwe Schindler)
12926
12927* LUCENE-3390: Corrected handling of missing values when two parallel searches
12928  use different missing values for sorting: the missing value was populated
12929  directly into the FieldCache arrays during sorting, leading to concurrency
12930  issues.  (Uwe Schindler, Doron Cohen, Mike McCandless)
12931
12932* LUCENE-3439: Closing an NRT reader after the writer was closed was
12933  incorrectly invoking the DeletionPolicy and (then possibly deleting
12934  files) on the closed IndexWriter (Robert Muir, Mike McCandless)
12935
12936* LUCENE-3215: SloppyPhraseScorer sometimes computed Infinite freq
12937  (Robert Muir, Doron Cohen)
12938
12939* LUCENE-3503: DisjunctionSumScorer would give slightly different scores
12940  for a document depending on whether you used nextDoc() or advance().
12941  (Mike McCandless, Robert Muir)
12942
12943* LUCENE-3529: Properly support indexing an empty field with empty term text.
12944  Previously, if you had assertions enabled you would receive an error during
12945  flush, if you didn't, you would get an invalid index.
12946  (Mike McCandless, Robert Muir)
12947
12948* LUCENE-2633: PackedInts Packed32 and Packed64 did not support internal
12949  structures larger than 256MB (Toke Eskildsen via Mike McCandless)
12950
12951* LUCENE-3540: LUCENE-3255 dropped support for pre-1.9 indexes, but the
12952  error message in IndexFormatTooOldException was incorrect. (Uwe Schindler,
12953  Mike McCandless)
12954
12955* LUCENE-3541: IndexInput's default copyBytes() implementation was not safe
12956  across multiple threads, because all clones shared the same buffer.
12957  (Robert Muir)
12958
12959* LUCENE-3548: Fix CharsRef#append to extend length of the existing char[]
12960  and preserve existing chars. (Simon Willnauer)
12961
12962* LUCENE-3582: Normalize NaN values in NumericUtils.floatToSortableInt() /
12963  NumericUtils.doubleToSortableLong(), so this is consistent with stored
12964  fields. Also fix NumericRangeQuery to not falsely hit NaNs on half-open
12965  ranges (one bound is null). Because of normalization, NumericRangeQuery
12966  can now be used to hit NaN values by creating a query with
12967  upper == lower == NaN (inclusive).  (Dawid Weiss, Uwe Schindler)
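
  The normalization idea can be sketched in plain Java. This is an
  illustration under assumptions, not the NumericUtils source: the class
  name is hypothetical, and the sketch relies on Float.floatToIntBits()
  collapsing every NaN bit pattern into the single canonical NaN.

```java
// Hypothetical stand-alone sketch; not the NumericUtils source.
final class SortableFloatSketch {

    /** Maps a float to an int whose signed order matches the float's numeric order. */
    static int floatToSortableInt(float value) {
        // floatToIntBits (unlike floatToRawIntBits) returns 0x7fc00000 for every NaN.
        int bits = Float.floatToIntBits(value);
        if (bits < 0) {
            // Negative floats: flip the magnitude bits so larger magnitudes sort lower.
            bits ^= 0x7fffffff;
        }
        return bits;
    }

    public static void main(String[] args) {
        float oddNaN = Float.intBitsToFloat(0xffc00001); // a NaN with a non-canonical bit pattern
        System.out.println(floatToSortableInt(oddNaN) == floatToSortableInt(Float.NaN));
    }
}
```

  Because every NaN maps to one int that sorts above positive infinity, a
  half-open range cannot match it by accident, while an inclusive range with
  both bounds set to NaN matches exactly that value.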
12968
12969API Changes
12970
12971* LUCENE-3454: Rename IndexWriter.optimize to forceMerge to discourage
12972  use of this method since it is horribly costly and rarely justified
12973  anymore.  MergePolicy.findMergesForOptimize was renamed to
12974  findForcedMerges.  IndexReader.isOptimized was
12975  deprecated. IndexCommit.isOptimized was replaced with
12976  getSegmentCount. (Robert Muir, Mike McCandless)
12977
12978* LUCENE-3205: Deprecated MultiTermQuery.getTotalNumberOfTerms() [and
12979  related methods], as the numbers returned are not useful
12980  for multi-segment indexes. They were only needed for tests of
12981  NumericRangeQuery.  (Mike McCandless, Uwe Schindler)
12982
12983* LUCENE-3574: Deprecate outdated constants in org.apache.lucene.util.Constants
12984  and add new ones for Java 6 and Java 7.  (Uwe Schindler)
12985
12986* LUCENE-3571: Deprecate IndexSearcher(Directory). Use the constructors
12987  that take IndexReader instead.  (Robert Muir)
12988
12989* LUCENE-3577: Rename IndexWriter.expungeDeletes to forceMergeDeletes,
12990  and revamped the javadocs, to discourage
12991  use of this method since it is horribly costly and rarely
12992  justified.  MergePolicy.findMergesToExpungeDeletes was renamed to
12993  findForcedDeletesMerges. (Robert Muir, Mike McCandless)
12994
12995* LUCENE-3464: IndexReader.reopen has been renamed to
12996  IndexReader.openIfChanged (a static method), and now returns null
12997  (instead of the old reader) if there are no changes in the index, to
12998  prevent the common pitfall of accidentally closing the old reader.
12999
13000New Features
13001
13002* LUCENE-3448: Added FixedBitSet.and(other/DISI), andNot(other/DISI).
13003  (Uwe Schindler)
13004
13005* LUCENE-2215: Added IndexSearcher.searchAfter which returns results after a
13006  specified ScoreDoc (e.g. last document on the previous page) to support deep
13007  paging use cases.  (Aaron McCurry, Grant Ingersoll, Robert Muir)
13008
13009* LUCENE-1990: Adds internal packed ints implementation, to be used
13010  for more efficient storage of int arrays when the values are
13011  bounded, for example for storing the terms dict index (Toke
13012  Eskildsen via Mike McCandless)
13013
13014* LUCENE-3558: Moved SearcherManager, NRTManager & SearcherLifetimeManager into
13015  core. All classes are contained in o.a.l.search. (Simon Willnauer)
13016
13017Optimizations
13018
13019* LUCENE-3426: Add NGramPhraseQuery which extends PhraseQuery and tries to
13020  reduce the number of terms of the query when rewrite(), in order to improve
13021  performance.  (Robert Muir, Koji Sekiguchi)
13022
13023* LUCENE-3494: Optimize FilteredQuery to remove a multiply in score()
13024  (Uwe Schindler, Robert Muir)
13025
13026* LUCENE-3534: Remove filter logic from IndexSearcher and delegate to
13027  FilteredQuery's Scorer. This is a partial backport of a cleanup in
13028  FilteredQuery/IndexSearcher added by LUCENE-1536 to Lucene 4.0.
13029  (Uwe Schindler)
13030
13031* LUCENE-2205: Very substantial (3-5X) RAM reduction required to hold
13032  the terms index on opening an IndexReader (Aaron McCurry via Mike McCandless)
13033
13034* LUCENE-3443: FieldCache can now set docsWithField, and create an
13035  array, in a single pass.  This results in faster init time for apps
13036  that need both (such as sorting by a field with a missing value).
13037  (Mike McCandless)
13038
13039Test Cases
13040
13041* LUCENE-3420: Disable the finalness checks in TokenStream and Analyzer
13042  for implementing subclasses in different packages, where assertions are not
13043  enabled. (Uwe Schindler)
13044
13045* LUCENE-3506: tests relying on assertions being enabled were no-op because
13046  they ignored AssertionError. With this fix, the entire test framework
13047  (every test) fails if assertions are disabled, unless
13048  -Dtests.asserts.gracious=true is specified. (Doron Cohen)
13049
13050Build
13051
13052* SOLR-2849: Fix dependencies in Maven POMs. (David Smiley via Steve Rowe)
13053
13054* LUCENE-3561: Fix maven xxx-src.jar files that were missing resources.
13055  (Uwe Schindler)
13056
13057======================= Lucene 3.4.0 =======================
13058
13059Bug fixes
13060
13061* LUCENE-3251: Directory#copy failed to close target output if opening the
13062  source stream failed. (Simon Willnauer)
13063
13064* LUCENE-3255: If segments_N file is all zeros (due to file
13065  corruption), don't read that to mean the index is empty.  (Gregory
13066  Tarr, Mark Harwood, Simon Willnauer, Mike McCandless)
13067
13068* LUCENE-3254: Fixed minor bug in how deletes were written to disk,
13069  causing the file to sometimes be larger than it needed to be.  (Mike
13070  McCandless)
13071
13072* LUCENE-3224: Fixed a bug where CheckIndex would incorrectly report a
13073  corrupt index if a term with docfreq >= 16 was indexed more than once
13074  at the same position.  (Robert Muir)
13075
13076* LUCENE-3339: Fixed deadlock case when multiple threads use the new
13077  block-add (IndexWriter.add/updateDocuments) methods.  (Robert Muir,
13078  Mike McCandless)
13079
13080* LUCENE-3340: Fixed case where IndexWriter was not flushing at
13081  exactly maxBufferedDeleteTerms (Mike McCandless)
13082
13083* LUCENE-3358, LUCENE-3361: StandardTokenizer and UAX29URLEmailTokenizer
13084  wrongly discarded combining marks attached to Han or Hiragana characters;
13085  this is fixed if you supply Version >= 3.4. If you supply a previous
13086  Lucene version, you get the old buggy behavior for backwards compatibility.
13087  (Trejkaz, Robert Muir)
13088
13089* LUCENE-3368: IndexWriter commits segments without applying their buffered
13090  deletes when flushing concurrently. (Simon Willnauer, Mike McCandless)
13091
13092* LUCENE-3365: Create or Append mode determined before obtaining write lock
13093  can cause IndexWriter to overwrite an existing index.
13094  (Geoff Cooney via Simon Willnauer)
13095
13096* LUCENE-3380: Fixed a bug where FileSwitchDirectory's listAll() would wrongly
13097  throw NoSuchDirectoryException when all files written so far have been
13098  written to one directory, but the other still has not yet been created on the
13099  filesystem.  (Robert Muir)
13100
13101* LUCENE-3409: IndexWriter.deleteAll was failing to close pooled NRT
13102  SegmentReaders, leading to unused files accumulating in the
13103  Directory.  (tal steier via Mike McCandless)
13104
13105* LUCENE-3418: Lucene was failing to fsync index files on commit,
13106  meaning an operating system or hardware crash, or power loss, could
13107  easily corrupt the index.  (Mark Miller, Robert Muir, Mike
13108  McCandless)
13109
13110New Features
13111
13112* LUCENE-3290: Added FieldInvertState.numUniqueTerms
13113  (Mike McCandless, Robert Muir)
13114
13115* LUCENE-3280: Add FixedBitSet, like OpenBitSet but not elastic (it does
13116  not grow on demand if you set/get/clear too-large indices).  (Mike
13117  McCandless)
13118
13119* LUCENE-2048: Added the ability to omit positions but still index
13120  term frequencies, you can now control what is indexed into
13121  the postings via AbstractField.setIndexOptions:
13122   DOCS_ONLY: only documents are indexed: term frequencies and positions are omitted
13123   DOCS_AND_FREQS: only documents and term frequencies are indexed: positions are omitted
13124   DOCS_AND_FREQS_AND_POSITIONS: full postings: documents, frequencies, and positions
13125  AbstractField.setOmitTermFrequenciesAndPositions is deprecated,
13126  you should use DOCS_ONLY instead.  (Robert Muir)
13127
13128* LUCENE-3097: Added a new grouping collector that can be used to retrieve all most relevant
13129  documents per group. This can be useful in situations when one wants to compute grouping
13130  based facets / statistics on the complete query result. (Martijn van Groningen)
13131
13132* LUCENE-3334: If Java7 is detected, IOUtils.closeSafely() will log
13133  suppressed exceptions in the original exception, so stack trace
13134  will contain them.  (Uwe Schindler)
13135
13136Optimizations
13137
13138* LUCENE-3201, LUCENE-3218: CompoundFileSystem code has been consolidated
13139  into a Directory implementation. Reading is optimized for MMapDirectory,
13140  NIOFSDirectory and SimpleFSDirectory to only map requested parts of the
13141  CFS into an IndexInput. Writing to a CFS now tries to append to the CF
13142  directly if possible and merges separately written files on the fly instead
13143  of during close. (Simon Willnauer, Robert Muir)
13144
13145* LUCENE-3289: When building an FST you can now tune how aggressively
13146  the FST should try to share common suffixes.  Typically you can
13147  greatly reduce RAM required during building, and CPU consumed, at
13148  the cost of a somewhat larger FST.  (Mike McCandless)
13149
13150Test Cases
13151
13152* LUCENE-3327: Fix AIOOBE when TestFSTs is run with -Dtests.verbose=true
13153  (James Dyer via Mike McCandless)
13154
13155Build
13156
13157* LUCENE-3406: Add ant target 'package-local-src-tgz' to Lucene and Solr
13158  to package sources from the local working copy.
13159  (Seung-Yeoul Yang via Steve Rowe)
13160
13161
13162======================= Lucene 3.3.0 =======================
13163
13164Changes in backwards compatibility policy
13165
13166* LUCENE-3140: IndexOutput.copyBytes now takes a DataInput (superclass
13167  of IndexInput) as its first argument.  (Robert Muir, Dawid Weiss,
13168  Mike McCandless)
13169
13170* LUCENE-3191: FieldComparator.value now returns an Object not
13171  Comparable; FieldDoc.fields also changed from Comparable[] to
13172  Object[] (Uwe Schindler, Mike McCandless)
13173
13174* LUCENE-3208: Made deprecated methods Query.weight(Searcher) and
13175  Searcher.createWeight() final to prevent override. If you have
13176  overridden one of these methods, cut over to the non-deprecated
13177  implementation. (Uwe Schindler, Robert Muir, Yonik Seeley)
13178
13179* LUCENE-3238: Made MultiTermQuery.rewrite() final, to prevent
13180  problems (such as not properly setting rewrite methods, or
13181  not working correctly with things like SpanMultiTermQueryWrapper).
13182  To rewrite to a simpler form, instead return a simpler enum
13183  from getEnum(IndexReader). For example, to rewrite to a single term,
13184  return a SingleTermEnum.  (ludovic Boutros, Uwe Schindler, Robert Muir)
13185
13186Changes in runtime behavior
13187
13188* LUCENE-2834: the hash used to compute the lock file name when the
13189  lock file is not stored in the index has changed.  This means you
13190  will see a different lucene-XXX-write.lock in your lock directory.
13191  (Robert Muir, Uwe Schindler, Mike McCandless)
13192
13193* LUCENE-3146: IndexReader.setNorm throws IllegalStateException if the field
13194  does not store norms. (Shai Erera, Mike McCandless)
13195
13196* LUCENE-3198: On Linux, if the JRE is 64 bit and supports unmapping,
13197  FSDirectory.open now defaults to MMapDirectory instead of
13198  NIOFSDirectory since MMapDirectory gives better performance.  (Mike
13199  McCandless)
13200
13201* LUCENE-3200: MMapDirectory now uses chunk sizes that are powers of 2.
13202  When setting the chunk size, it is rounded down to the next possible
13203  value. The new default value for 64 bit platforms is 2^30 (1 GiB),
13204  for 32 bit platforms it stays unchanged at 2^28 (256 MiB).
13205  Internally, MMapDirectory now only uses one dedicated final IndexInput
13206  implementation supporting multiple chunks, which makes Hotspot's life
13207  easier.  (Uwe Schindler, Robert Muir, Mike McCandless)
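
  The rounding rule can be sketched with the JDK alone. The class and method
  names below are hypothetical, and the shift/mask addressing is an assumed
  motivation for power-of-2 chunks rather than something stated in this
  entry; this is not MMapDirectory's code.

```java
// Hypothetical stand-alone sketch; not the MMapDirectory source.
final class ChunkSizeSketch {

    /** Rounds a requested chunk size down to the nearest power of 2. */
    static int roundDownToPowerOfTwo(int requested) {
        if (requested <= 0) {
            throw new IllegalArgumentException("chunk size must be positive: " + requested);
        }
        return Integer.highestOneBit(requested);
    }

    /** Index of the chunk containing pos, for chunks of size 2^chunkSizePower. */
    static int chunkIndex(long pos, int chunkSizePower) {
        return (int) (pos >>> chunkSizePower);
    }

    /** Offset of pos inside its chunk, for chunks of size 2^chunkSizePower. */
    static long chunkOffset(long pos, int chunkSizePower) {
        return pos & ((1L << chunkSizePower) - 1);
    }

    public static void main(String[] args) {
        System.out.println(roundDownToPowerOfTwo(1000)); // prints 512
        long pos = (1L << 30) + 5; // 5 bytes into the second 1 GiB chunk
        System.out.println(chunkIndex(pos, 30) + " / " + chunkOffset(pos, 30)); // prints 1 / 5
    }
}
```

  With a power-of-2 chunk size, locating a file position needs only a shift
  and a mask instead of a division and a remainder.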
13208
13209Bug fixes
13210
13211* LUCENE-3147,LUCENE-3152: Fixed open file handles leaks in many places in the
13212  code. Now MockDirectoryWrapper (in test-framework) tracks all open files,
13213  including locks, and fails if the test fails to release all of them.
13214  (Mike McCandless, Robert Muir, Shai Erera, Simon Willnauer)
13215
13216* LUCENE-3102: CachingCollector.replay was failing to call setScorer
13217  per-segment (Martijn van Groningen via Mike McCandless)
13218
13219* LUCENE-3183: Fix rare corner case where seeking to empty term
13220  (field="", term="") with terms index interval 1 could hit
13221  ArrayIndexOutOfBoundsException (selckin, Robert Muir, Mike
13222  McCandless)
13223
13224* LUCENE-3208: IndexSearcher had its own private similarity field
13225  and corresponding getter/setter overriding Searcher's implementation. If you
13226  set a different Similarity instance on IndexSearcher, methods implemented
13227  in the superclass Searcher were not using it, leading to strange bugs.
13228  (Uwe Schindler, Robert Muir)
13229
13230* LUCENE-3197: Fix core merge policies to not over-merge during
13231  background optimize when documents are still being deleted
13232  concurrently with the optimize (Mike McCandless)
13233
13234* LUCENE-3222: The RAM accounting for buffered delete terms was
13235  failing to measure the space required to hold the term's field and
13236  text character data.  (Mike McCandless)
13237
13238* LUCENE-3238: Fixed bug where using WildcardQuery("prefix*") inside
13239  of a SpanMultiTermQueryWrapper rewrote incorrectly and returned
13240  an error instead.  (ludovic Boutros, Uwe Schindler, Robert Muir)
13241
13242API Changes
13243
13244* LUCENE-3208: Renamed protected IndexSearcher.createWeight() to expert
13245  public method IndexSearcher.createNormalizedWeight() as this better describes
13246  what this method does. The old method is still there for backwards
13247  compatibility. Query.weight() was deprecated and simply delegates to
13248  IndexSearcher. Both deprecated methods will be removed in Lucene 4.0.
13249  (Uwe Schindler, Robert Muir, Yonik Seeley)
13250
13251* LUCENE-3197: MergePolicy.findMergesForOptimize now takes
13252  Map<SegmentInfo,Boolean> instead of Set<SegmentInfo> as the second
13253  argument, so the merge policy knows which segments were originally
13254  present vs produced by an optimizing merge (Mike McCandless)
13255
13256Optimizations
13257
13258* LUCENE-1736: DateTools.java general improvements.
13259  (David Smiley via Steve Rowe)
13260
13261New Features
13262
13263* LUCENE-3140: Added experimental FST implementation to Lucene.
13264  (Robert Muir, Dawid Weiss, Mike McCandless)
13265
13266* LUCENE-3193: A new TwoPhaseCommitTool allows running a 2-phase commit
13267  algorithm over objects that implement the new TwoPhaseCommit interface (such
13268  as IndexWriter). (Shai Erera)
13269
13270* LUCENE-3191: Added TopDocs.merge, to facilitate merging results from
13271  different shards (Uwe Schindler, Mike McCandless)
13272
13273* LUCENE-3179: Added OpenBitSet.prevSetBit (Paul Elschot via Mike McCandless)
13274
13275* LUCENE-3210: Made TieredMergePolicy more aggressive in reclaiming
13276  segments with deletions; added new methods
13277  set/getReclaimDeletesWeight to control this.  (Mike McCandless)
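
The shard-merging idea behind LUCENE-3191 can be sketched without Lucene on
the classpath. The following is a minimal illustration, not the
TopDocs.merge implementation: ShardHit and ShardMergeSketch are hypothetical
names, each shard's list is assumed to be sorted by descending score already,
and breaking ties by lower shard index is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a scored hit; not a Lucene class.
final class ShardHit {
    final int shard;   // index of the shard that produced the hit
    final int doc;     // shard-local document id
    final float score;

    ShardHit(int shard, int doc, float score) {
        this.shard = shard;
        this.doc = doc;
        this.score = score;
    }
}

final class ShardMergeSketch {
    // Merges per-shard hit lists (each sorted by descending score) and
    // returns at most topN hits, best first.
    static List<ShardHit> merge(int topN, List<List<ShardHit>> shards) {
        // Each queue entry is {shard index, position within that shard's list}.
        PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) -> {
            ShardHit x = shards.get(a[0]).get(a[1]);
            ShardHit y = shards.get(b[0]).get(b[1]);
            int byScore = Float.compare(y.score, x.score);  // higher score first
            return byScore != 0 ? byScore : Integer.compare(a[0], b[0]);
        });
        for (int s = 0; s < shards.size(); s++) {
            if (!shards.get(s).isEmpty()) {
                queue.add(new int[] {s, 0});
            }
        }
        List<ShardHit> merged = new ArrayList<>();
        while (merged.size() < topN && !queue.isEmpty()) {
            int[] top = queue.poll();
            List<ShardHit> list = shards.get(top[0]);
            merged.add(list.get(top[1]));
            if (top[1] + 1 < list.size()) {
                // Advance to the next hit of the shard we just consumed from.
                queue.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<ShardHit> shard0 = Arrays.asList(new ShardHit(0, 7, 0.9f), new ShardHit(0, 2, 0.4f));
        List<ShardHit> shard1 = Arrays.asList(new ShardHit(1, 5, 0.7f), new ShardHit(1, 1, 0.6f));
        for (ShardHit h : merge(3, Arrays.asList(shard0, shard1))) {
            System.out.println("shard=" + h.shard + " doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

Only the head of each shard's list sits in the queue at any time, so the
queue never holds more entries than there are shards.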
13278
13279Build
13280
13281* LUCENE-1344: Create OSGi bundle using dev-tools/maven.
13282  (Nicolas Lalevée, Luca Stancapiano via ryan)
13283
* LUCENE-3204: The maven-ant-tasks jar is now included in the source tree;
  users of the generate-maven-artifacts target no longer have to manually
  place this jar in the Ant classpath.  NOTE: when Ant looks for the
  maven-ant-tasks jar, it looks first in its pre-existing classpath, so
  any copies it finds will be used instead of the copy included in the
  Lucene/Solr source tree.  For this reason, it is recommended to remove
  any copies of the maven-ant-tasks jar in the Ant classpath, e.g. under
  ~/.ant/lib/ or under the Ant installation's lib/ directory. (Steve Rowe)
13292
13293
13294======================= Lucene 3.2.0 =======================
13295
13296Changes in backwards compatibility policy
13297
13298* LUCENE-2953: PriorityQueue's internal heap was made private, as subclassing
13299  with generics can lead to ClassCastException. For advanced use (e.g. in Solr)
13300  a method getHeapArray() was added to retrieve the internal heap array as a
13301  non-generic Object[].  (Uwe Schindler, Yonik Seeley)
13302
13303* LUCENE-1076: IndexWriter.setInfoStream now throws IOException
13304  (Mike McCandless, Shai Erera)
13305
* LUCENE-3084: MergePolicy.OneMerge.segments was changed from
  SegmentInfos to a List<SegmentInfo>. SegmentInfos itself was changed
  to no longer extend Vector<SegmentInfo> (to update code that is using
  Vector-API, use the new asList() and asSet() methods returning unmodifiable
  collections; modifying SegmentInfos is now only possible through
  the explicitly declared methods). IndexWriter.segString() now takes
  Iterable<SegmentInfo> instead of List<SegmentInfo>. A simple recompile
  should fix this. MergePolicy and SegmentInfos are internal/experimental
  APIs not covered by the strict backwards compatibility policy.
  (Uwe Schindler, Mike McCandless)
13316
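The ClassCastException mentioned under LUCENE-2953 comes from how Java erases
generic array types. The sketch below reproduces the effect with hypothetical
classes (GenericHeap, StringHeap, HeapCastDemo); it is not Lucene's
PriorityQueue, and heapArray() only stands in for the role the entry gives to
getHeapArray().

```java
// Hypothetical classes for illustration; not Lucene's PriorityQueue.
class GenericHeap<T> {
    // Erased to Object[] at runtime, whatever T is.
    protected final T[] heap;

    @SuppressWarnings("unchecked")
    GenericHeap(int size) {
        heap = (T[]) new Object[size];  // unchecked cast: nothing is verified here
    }

    // Non-generic accessor: callers only ever see the array as Object[].
    final Object[] heapArray() {
        return heap;
    }
}

class StringHeap extends GenericHeap<String> {
    StringHeap(int size) {
        super(size);
    }

    int sizeViaField() {
        String[] h = heap;  // compiler inserts a cast to String[]; fails at runtime
        return h.length;
    }

    int sizeViaAccessor() {
        return heapArray().length;  // no cast to String[] is needed
    }
}

class HeapCastDemo {
    public static void main(String[] args) {
        StringHeap q = new StringHeap(4);
        System.out.println(q.sizeViaAccessor());
        try {
            q.sizeViaField();
        } catch (ClassCastException e) {
            System.out.println("ClassCastException from direct field access");
        }
    }
}
```

Keeping the array private and exposing it only as Object[] removes the place
where a subclass could trigger the hidden cast.
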
13317Changes in runtime behavior
13318
* LUCENE-3065: When a NumericField is retrieved from a Document loaded
  from IndexReader (or IndexSearcher), it will now come back as
  NumericField not as a Field with a string-ified version of the
  numeric value you had indexed.  Note that this only applies for
  newly-indexed Documents; older indices will still return Field
  with the string-ified numeric value. If you call Document.get(),
  the value still comes back as String, but Document.getFieldable()
  returns NumericField instances. (Uwe Schindler, Ryan McKinley,
  Mike McCandless)
13328
13329* LUCENE-1076: Changed the default merge policy from
13330  LogByteSizeMergePolicy to TieredMergePolicy, as of Version.LUCENE_32
13331  (passed to IndexWriterConfig), which is able to merge non-contiguous
13332  segments. This means docIDs no longer necessarily stay "in order"
13333  during indexing.  If this is a problem then you can use either of
13334  the LogMergePolicy impls.  (Mike McCandless)
13335
13336New features
13337
* LUCENE-3082: Added index upgrade tool oal.index.IndexUpgrader
  that allows upgrading all segments to the most recent supported index
  format without fully optimizing.  (Uwe Schindler, Mike McCandless)
13341
13342* LUCENE-1076: Added TieredMergePolicy which is able to merge non-contiguous
13343  segments, which means docIDs no longer necessarily stay "in order".
13344  (Mike McCandless, Shai Erera)
13345
* LUCENE-3071: Added ReversePathHierarchyTokenizer and added a skip parameter
  to PathHierarchyTokenizer (Olivier Favre via ryan)
13348
* LUCENE-1421, LUCENE-3102: Added CachingCollector, which allows you to cache
  document IDs and scores encountered during the search, and "replay" them to
  another Collector. (Mike McCandless, Shai Erera)
13352
13353* LUCENE-3112: Added experimental IndexWriter.add/updateDocuments,
13354  enabling a block of documents to be indexed, atomically, with
13355  guaranteed sequential docIDs.  (Mike McCandless)
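
The cache-and-replay behavior described for CachingCollector (LUCENE-1421,
LUCENE-3102) can be sketched as follows. HitCollector and
CachingCollectorSketch are hypothetical names; Lucene's actual Collector API
has more methods than the single collect() assumed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal collector interface; not Lucene's Collector.
interface HitCollector {
    void collect(int doc, float score);
}

// Records every hit while forwarding it to a delegate; the recording can
// later be replayed into any other collector without searching again.
final class CachingCollectorSketch implements HitCollector {
    private final HitCollector delegate;
    private final List<Integer> docs = new ArrayList<>();
    private final List<Float> scores = new ArrayList<>();

    CachingCollectorSketch(HitCollector delegate) {
        this.delegate = delegate;
    }

    @Override
    public void collect(int doc, float score) {
        docs.add(doc);
        scores.add(score);
        delegate.collect(doc, score);
    }

    // Feeds the cached hits to another collector in the original order.
    void replay(HitCollector other) {
        for (int i = 0; i < docs.size(); i++) {
            other.collect(docs.get(i), scores.get(i));
        }
    }

    public static void main(String[] args) {
        CachingCollectorSketch cache =
            new CachingCollectorSketch((doc, score) -> System.out.println("first pass: doc " + doc));
        cache.collect(3, 1.5f);
        cache.collect(8, 0.5f);
        cache.replay((doc, score) -> System.out.println("replayed: doc " + doc + " score " + score));
    }
}
```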
13356
13357API Changes
13358
13359* LUCENE-3061: IndexWriter's getNextMerge() and merge(OneMerge) are now public
13360  (though @lucene.experimental), allowing for custom MergeScheduler
13361  implementations. (Shai Erera)
13362
13363* LUCENE-3065: Document.getField() was deprecated, as it throws
13364  ClassCastException when loading lazy fields or NumericFields.
13365  (Uwe Schindler, Ryan McKinley, Mike McCandless)
13366
13367* LUCENE-2027: Directory.touchFile is deprecated and will be removed
13368  in 4.0.  (Mike McCandless)
13369
13370Optimizations
13371
13372* LUCENE-2990: ArrayUtil/CollectionUtil.*Sort() methods now exit early
13373  on empty or one-element lists/arrays.  (Uwe Schindler)
13374
13375* LUCENE-2897: Apply deleted terms while flushing a segment.  We still
13376  buffer deleted terms to later apply to past segments.  (Mike McCandless)
13377
13378* LUCENE-3126: IndexWriter.addIndexes copies incoming segments into CFS if they
13379  aren't already and MergePolicy allows that. (Shai Erera)
13380
13381Bug fixes
13382
13383* LUCENE-2996: addIndexes(IndexReader) did not flush before adding the new
13384  indexes, causing existing deletions to be applied on the incoming indexes as
13385  well. (Shai Erera, Mike McCandless)
13386
13387* LUCENE-3024: Index with more than 2.1B terms was hitting AIOOBE when
13388  seeking TermEnum (eg used by Solr's faceting) (Tom Burton-West, Mike
13389  McCandless)
13390
13391* LUCENE-3042: When a filter or consumer added Attributes to a TokenStream
13392  chain after it was already (partly) consumed [or clearAttributes(),
13393  captureState(), cloneAttributes(),... was called by the Tokenizer],
13394  the Tokenizer calling clearAttributes() or capturing state after addition
13395  may not do this on the newly added Attribute. This bug affected only
13396  very special use cases of the TokenStream-API, most users would not
13397  have recognized it.  (Uwe Schindler, Robert Muir)
13398
* LUCENE-3054: PhraseQuery can in some cases stack overflow in
  SorterTemplate.quickSort(). This fix also adds an optimization to
  PhraseQuery, as a term with a lower doc freq will also have fewer positions.
  (Uwe Schindler, Robert Muir, Otis Gospodnetic)
13403
13404* LUCENE-3068: sloppy phrase query failed to match valid documents when multiple
13405  query terms had same position in the query. (Doron Cohen)
13406
13407* LUCENE-3012: Lucene writes the header now for separate norm files (*.sNNN)
13408  (Robert Muir)
13409
13410Build
13411
13412* LUCENE-3006: Building javadocs will fail on warnings by default.
13413  Override with -Dfailonjavadocwarning=false (sarowe, gsingers)
13414
13415* LUCENE-3128: "ant eclipse" creates a .project file for easier Eclipse
13416  integration (unless one already exists). (Daniel Serodio via Shai Erera)
13417
13418Test Cases
13419
* LUCENE-3002: Added 'tests.iter.min' to control 'tests.iter' by allowing
  iteration to stop once at least 'tests.iter.min' iterations ran and a
  failure occurred.  (Shai Erera, Chris Hostetter)
13423
13424======================= Lucene 3.1.0 =======================
13425
13426Changes in backwards compatibility policy
13427
13428* LUCENE-2719: Changed API of internal utility class
13429  org.apache.lucene.util.SorterTemplate to support faster quickSort using
13430  pivot values and also merge sort and insertion sort. If you have used
13431  this class, you have to implement two more methods for handling pivots.
13432  (Uwe Schindler, Robert Muir, Mike McCandless)
13433
13434* LUCENE-1923: Renamed SegmentInfo & SegmentInfos segString method to
13435  toString.  These are advanced APIs and subject to change suddenly.
13436  (Tim Smith via Mike McCandless)
13437
13438* LUCENE-2190: Removed deprecated customScore() and customExplain()
13439  methods from experimental CustomScoreQuery.  (Uwe Schindler)
13440
13441* LUCENE-2286: Enabled DefaultSimilarity.setDiscountOverlaps by default.
13442  This means that terms with a position increment gap of zero do not
13443  affect the norms calculation by default.  (Robert Muir)
13444
13445* LUCENE-2320: MergePolicy.writer is now of type SetOnce, which allows setting
13446  the IndexWriter for a MergePolicy exactly once. You can change references to
13447  'writer' from <code>writer.doXYZ()</code> to <code>writer.get().doXYZ()</code>
13448  (it is also advisable to add an <code>assert writer != null;</code> before you
13449  access the wrapped IndexWriter.)
13450
13451  In addition, MergePolicy only exposes a default constructor, and the one that
13452  took IndexWriter as argument has been removed from all MergePolicy extensions.
13453  (Shai Erera via Mike McCandless)
13454
13455* LUCENE-2328: SimpleFSDirectory.SimpleFSIndexInput is moved to
13456  FSDirectory.FSIndexInput. Anyone extending this class will have to
13457  fix their code on upgrading. (Earwin Burrfoot via Mike McCandless)
13458
13459* LUCENE-2302: The new interface for term attributes, CharTermAttribute,
13460  now implements CharSequence. This requires the toString() methods of
13461  CharTermAttribute, deprecated TermAttribute, and Token to return only
13462  the term text and no other attribute contents. LUCENE-2374 implements
13463  an attribute reflection API to no longer rely on toString() for attribute
13464  inspection. (Uwe Schindler, Robert Muir)
13465
* LUCENE-2372, LUCENE-2389: StandardAnalyzer, KeywordAnalyzer,
  PerFieldAnalyzerWrapper, WhitespaceTokenizer are now final.  Also removed
  the now obsolete and deprecated Analyzer.setOverridesTokenStreamMethod().
  Analyzer and TokenStream base classes now have an assertion in their ctor
  that checks that subclasses are final or at least have final implementations
  of incrementToken(), tokenStream(), and reusableTokenStream().
  (Uwe Schindler, Robert Muir)
13473
13474* LUCENE-2316: Directory.fileLength contract was clarified - it returns the
13475  actual file's length if the file exists, and throws FileNotFoundException
13476  otherwise. Returning length=0 for a non-existent file is no longer allowed. If
13477  you relied on that, make sure to catch the exception. (Shai Erera)
13478
13479* LUCENE-2386: IndexWriter no longer performs an empty commit upon new index
13480  creation. Previously, if you passed an empty Directory and set OpenMode to
13481  CREATE*, IndexWriter would make a first empty commit. If you need that
13482  behavior you can call writer.commit()/close() immediately after you create it.
13483  (Shai Erera, Mike McCandless)
13484
13485* LUCENE-2733: Removed public constructors of utility classes with only static
13486  methods to prevent instantiation.  (Uwe Schindler)
13487
13488* LUCENE-2602: The default (LogByteSizeMergePolicy) merge policy now
13489  takes deletions into account by default.  You can disable this by
13490  calling setCalibrateSizeByDeletes(false) on the merge policy.  (Mike
13491  McCandless)
13492
* LUCENE-2529, LUCENE-2668: Position increment gap and offset gap of empty
  values in multi-valued fields have been changed for some cases in the index.
  If you index empty fields and use position/offset information on those
  fields, reindexing is recommended. (David Smiley, Koji Sekiguchi)
13497
* LUCENE-2804: Directory.setLockFactory now declares throwing an IOException.
  (Shai Erera, Robert Muir)
13500
13501* LUCENE-2837: Added deprecations noting that in 4.0, Searcher and
13502  Searchable are collapsed into IndexSearcher; contrib/remote and
13503  MultiSearcher have been removed.  (Mike McCandless)
13504
13505* LUCENE-2854: Deprecated SimilarityDelegator and
13506  Similarity.lengthNorm; the latter is now final, forcing any custom
13507  Similarity impls to cutover to the more general computeNorm (Robert
13508  Muir, Mike McCandless)
13509
13510* LUCENE-2869: Deprecated Query.getSimilarity: instead of using
13511  "runtime" subclassing/delegation, subclass the Weight instead.
13512  (Robert Muir)
13513
13514* LUCENE-2674: A new idfExplain method was added to Similarity, that
13515  accepts an incoming docFreq.  If you subclass Similarity, make sure
13516  you also override this method on upgrade.  (Robert Muir, Mike
13517  McCandless)
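
The set-exactly-once contract that LUCENE-2320 introduces for
MergePolicy.writer can be illustrated with a small stand-alone holder. This
is a sketch of the idea, not Lucene's SetOnce source: SetOnceSketch is a
hypothetical name, and throwing IllegalStateException on a second set() is an
assumption made here.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical re-creation of a set-once holder; not Lucene's SetOnce.
final class SetOnceSketch<T> {
    private final AtomicBoolean assigned = new AtomicBoolean(false);
    private volatile T value;

    // Stores the value on the first call; every later call is rejected.
    void set(T newValue) {
        if (!assigned.compareAndSet(false, true)) {
            throw new IllegalStateException("value was already set");
        }
        value = newValue;
    }

    // Returns the stored value, or null if set() has not been called yet.
    T get() {
        return value;
    }

    public static void main(String[] args) {
        SetOnceSketch<StringBuilder> writer = new SetOnceSketch<>();
        writer.set(new StringBuilder("writer"));
        // Callers go through get() to reach the wrapped object.
        System.out.println(writer.get().length());
        try {
            writer.set(new StringBuilder("second"));
        } catch (IllegalStateException e) {
            System.out.println("second set() rejected");
        }
    }
}
```

Because get() can return null before the first set(), a null check before
use (as the entry recommends with its assert) remains worthwhile.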
13518
13519Changes in runtime behavior
13520
13521* LUCENE-1923: Made IndexReader.toString() produce something
13522  meaningful (Tim Smith via Mike McCandless)
13523
13524* LUCENE-2179: CharArraySet.clear() is now functional.
13525  (Robert Muir, Uwe Schindler)
13526
13527* LUCENE-2455: IndexWriter.addIndexes no longer optimizes the target index
13528  before it adds the new ones. Also, the existing segments are not merged and so
13529  the index will not end up with a single segment (unless it was empty before).
13530  In addition, addIndexesNoOptimize was renamed to addIndexes and no longer
13531  invokes a merge on the incoming and target segments, but instead copies the
13532  segments to the target index. You can call maybeMerge or optimize after this
13533  method completes, if you need to.
13534
13535  In addition, Directory.copyTo* were removed in favor of copy which takes the
13536  target Directory, source and target files as arguments, and copies the source
13537  file to the target Directory under the target file name. (Shai Erera)
13538
13539* LUCENE-2663: IndexWriter no longer forcefully clears any existing
13540  locks when create=true.  This was a holdover from when
13541  SimpleFSLockFactory was the default locking implementation, and,
13542  even then it was dangerous since it could mask bugs in IndexWriter's
13543  usage, allowing applications to accidentally open two writers on the
13544  same directory.  (Mike McCandless)
13545
13546* LUCENE-2701: maxMergeMBForOptimize and maxMergeDocs constraints set on
13547  LogMergePolicy now affect optimize() as well (as opposed to only regular
13548  merges). This means that you can run optimize() and too large segments won't
13549  be merged. (Shai Erera)
13550
13551* LUCENE-2753: IndexReader and DirectoryReader .listCommits() now return a List,
13552  guaranteeing the commits are sorted from oldest to latest. (Shai Erera)
13553
13554* LUCENE-2785: TopScoreDocCollector, TopFieldCollector and
13555  the IndexSearcher search methods that take an int nDocs will now
13556  throw IllegalArgumentException if nDocs is 0.  Instead, you should
13557  use the newly added TotalHitCountCollector.  (Mike McCandless)
13558
13559* LUCENE-2790: LogMergePolicy.useCompoundFile's logic now factors in noCFSRatio
13560  to determine whether the passed in segment should be compound.
13561  (Shai Erera, Earwin Burrfoot)
13562
13563* LUCENE-2805: IndexWriter now increments the index version on every change to
13564  the index instead of for every commit. Committing or closing the IndexWriter
13565  without any changes to the index will not cause any index version increment.
13566  (Simon Willnauer, Mike McCandless)
13567
13568* LUCENE-2650, LUCENE-2825: The behavior of FSDirectory.open has changed. On 64-bit
13569  Windows and Solaris systems that support unmapping, FSDirectory.open returns
13570  MMapDirectory. Additionally the behavior of MMapDirectory has been
13571  changed to enable unmapping by default if supported by the JRE.
13572  (Mike McCandless, Uwe Schindler, Robert Muir)
13573
13574* LUCENE-2829: Improve the performance of "primary key" lookup use
13575  case (running a TermQuery that matches one document) on a
13576  multi-segment index.  (Robert Muir, Mike McCandless)
13577
13578* LUCENE-2010: Segments with 100% deleted documents are now removed on
13579  IndexReader or IndexWriter commit.   (Uwe Schindler, Mike McCandless)
13580
13581* LUCENE-2960: Allow some changes to IndexWriterConfig to take effect
13582  "live" (after an IW is instantiated), via
13583  IndexWriter.getConfig().setXXX(...) (Shay Banon, Mike McCandless)
13584
13585API Changes
13586
13587* LUCENE-2076: Rename FSDirectory.getFile -> getDirectory.  (George
13588  Aroush via Mike McCandless)
13589
13590* LUCENE-1260: Change norm encode (float->byte) and decode
13591  (byte->float) to be instance methods not static methods.  This way a
13592  custom Similarity can alter how norms are encoded, though they must
13593  still be encoded as a single byte (Johan Kindgren via Mike
13594  McCandless)
13595
13596* LUCENE-2103: NoLockFactory should have a private constructor;
13597  until Lucene 4.0 the default one will be deprecated.
13598  (Shai Erera via Uwe Schindler)
13599
13600* LUCENE-2177: Deprecate the Field ctors that take byte[] and Store.
13601  Since the removal of compressed fields, Store can only be YES, so
13602  it's not necessary to specify.  (Erik Hatcher via Mike McCandless)
13603
13604* LUCENE-2200: Several final classes had non-overriding protected
13605  members. These were converted to private and unused protected
13606  constructors removed.  (Steven Rowe via Robert Muir)
13607
13608* LUCENE-2240: SimpleAnalyzer and WhitespaceAnalyzer now have
13609  Version ctors.  (Simon Willnauer via Uwe Schindler)
13610
13611* LUCENE-2259: Add IndexWriter.deleteUnusedFiles, to attempt removing
13612  unused files.  This is only useful on Windows, which prevents
13613  deletion of open files. IndexWriter will eventually remove these
13614  files itself; this method just lets you do so when you know the
13615  files are no longer open by IndexReaders. (luocanrao via Mike
13616  McCandless)
13617
13618* LUCENE-2282: IndexFileNames is exposed as a public class allowing for easier
13619  use by external code. In addition it offers a matchExtension method which
13620  callers can use to query whether a certain file matches a certain extension.
13621  (Shai Erera via Mike McCandless)
13622
13623* LUCENE-124: Add a TopTermsBoostOnlyBooleanQueryRewrite to MultiTermQuery.
13624  This rewrite method is similar to TopTermsScoringBooleanQueryRewrite, but
13625  only scores terms by their boost values. For example, this can be used
13626  with FuzzyQuery to ensure that exact matches are always scored higher,
13627  because only the boost will be used in scoring.  (Robert Muir)
13628
13629* LUCENE-2015: Add a static method foldToASCII to ASCIIFoldingFilter to
13630  expose its folding logic.  (Cédrik Lime via Robert Muir)
13631
13632* LUCENE-2294: IndexWriter constructors have been deprecated in favor of a
13633  single ctor which accepts IndexWriterConfig and a Directory. You can set all
13634  the parameters related to IndexWriter on IndexWriterConfig. The different
13635  setter/getter methods were deprecated as well. One should call
13636  writer.getConfig().getXYZ() to query for a parameter XYZ.
13637  Additionally, the setter/getter related to MergePolicy were deprecated as
13638  well. One should interact with the MergePolicy directly.
13639  (Shai Erera via Mike McCandless)
13640
13641* LUCENE-2320: IndexWriter's MergePolicy configuration was moved to
13642  IndexWriterConfig and the respective methods on IndexWriter were deprecated.
13643  (Shai Erera via Mike McCandless)
13644
* LUCENE-2328: Directory now keeps track itself of the files that are written
  but not yet fsynced. The old Directory.sync(String file) method is deprecated
  and replaced with Directory.sync(Collection<String> files). Take a look at
  FSDirectory to see a sample of what such tracking might look like, if needed
  in your custom Directories.  (Earwin Burrfoot via Mike McCandless)
13650
13651* LUCENE-2302: Deprecated TermAttribute and replaced by a new
13652  CharTermAttribute. The change is backwards compatible, so
13653  mixed new/old TokenStreams all work on the same char[] buffer
13654  independent of which interface they use. CharTermAttribute
13655  has shorter method names and implements CharSequence and
13656  Appendable. This allows usage like Java's StringBuilder in
13657  addition to direct char[] access. Also terms can directly be
13658  used in places where CharSequence is allowed (e.g. regular
13659  expressions).
13660  (Uwe Schindler, Robert Muir)
13661
13662* LUCENE-2402: IndexWriter.deleteUnusedFiles now deletes unreferenced commit
13663  points too. If you use an IndexDeletionPolicy which holds onto index commits
13664  (such as SnapshotDeletionPolicy), you can call this method to remove those
13665  commit points when they are not needed anymore (instead of waiting for the
13666  next commit). (Shai Erera)
13667
13668* LUCENE-2481: SnapshotDeletionPolicy.snapshot() and release() were replaced
13669  with equivalent ones that take a String (id) as argument. You can pass
13670  whatever ID you want, as long as you use the same one when calling both.
13671  (Shai Erera)
13672
* LUCENE-2356: Add IndexWriterConfig.set/getReaderTermIndexDivisor, to
  set what IndexWriter passes for termsIndexDivisor to the readers it
  opens internally when applying deletions or creating a near-real-time
  reader.  (Earwin Burrfoot via Mike McCandless)
13677
13678* LUCENE-2167,LUCENE-2699,LUCENE-2763,LUCENE-2847: StandardTokenizer/Analyzer
13679  in common/standard/ now implement the Word Break rules from the Unicode 6.0.0
13680  Text Segmentation algorithm (UAX#29), covering the full range of Unicode code
13681  points, including values from U+FFFF to U+10FFFF
13682
13683  ClassicTokenizer/Analyzer retains the old (pre-Lucene 3.1) StandardTokenizer/
13684  Analyzer implementation and behavior.  Only the Unicode Basic Multilingual
13685  Plane (code points from U+0000 to U+FFFF) is covered.
13686
13687  UAX29URLEmailTokenizer tokenizes URLs and E-mail addresses according to the
13688  relevant RFCs, in addition to implementing the UAX#29 Word Break rules.
13689  (Steven Rowe, Robert Muir, Uwe Schindler)
13690
* LUCENE-2778: RAMDirectory now exposes newRAMFile(), which subclasses can
  override to return a different RAMFile implementation. (Shai Erera)
13693
13694* LUCENE-2785: Added TotalHitCountCollector whose sole purpose is to
13695  count the number of hits matching the query.  (Mike McCandless)
13696
* LUCENE-2846: Deprecated IndexReader.setNorm(int, String, float). This method
  is only syntactic sugar for setNorm(int, String, byte), but using the global
  Similarity.getDefault().encodeNormValue().  Use the byte-based method instead
  to ensure that the norm is encoded with your Similarity.
  (Robert Muir, Mike McCandless)
13702
* LUCENE-2374: Added Attribute reflection API: It's now possible to inspect the
  contents of AttributeImpl and AttributeSource using a well-defined API.
  This is e.g. used by Solr's AnalysisRequestHandlers to display all attributes
  in a structured way.
  There are also some backwards incompatible changes in toString() output,
  as LUCENE-2302 introduced the CharSequence interface to CharTermAttribute
  leading to changed toString() return values. The new API allows getting a
  string representation in a well-defined way using a new method
  reflectAsString(). For backwards compatibility reasons, when toString()
  was implemented by implementation subclasses, the default implementation of
  AttributeImpl.reflectWith() uses toString()'s output instead to report the
  Attribute's properties. Otherwise, reflectWith() uses Java's reflection
  (like toString() did before) to get the attribute properties.
  In addition, the mandatory equals() and hashCode() are no longer required
  for AttributeImpls, but can still be provided (if needed).
  (Uwe Schindler)
13719
13720* LUCENE-2691: Deprecate IndexWriter.getReader in favor of
13721  IndexReader.open(IndexWriter) (Grant Ingersoll, Mike McCandless)
13722
13723* LUCENE-2876: Deprecated Scorer.getSimilarity(). If your Scorer uses a Similarity,
13724  it should keep it itself. Fixed Scorers to pass their parent Weight, so that
13725  Scorer.visitSubScorers (LUCENE-2590) will work correctly.
13726  (Robert Muir, Doron Cohen)
13727
* LUCENE-2900: When opening a near-real-time (NRT) reader
  (IndexReader.re/open(IndexWriter)) you can now specify whether
  deletes should be applied.  Applying deletes can be costly, and some
  expert use cases can handle seeing deleted documents returned.  The
  deletes remain buffered so that the next time you open an NRT reader
  and pass true, all deletes will be applied.  (Mike McCandless)
13734
13735* LUCENE-1253: LengthFilter (and Solr's KeepWordTokenFilter) now
13736  require up front specification of enablePositionIncrement. Together with
13737  StopFilter they have a common base class (FilteringTokenFilter) that handles
13738  the position increments automatically. Implementors only need to override an
13739  accept() method that filters tokens.  (Uwe Schindler, Robert Muir)
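
The position-increment handling that LUCENE-1253 attributes to the common
filtering base class can be sketched as below. Tok and FilteringSketch are
hypothetical names and the token model is reduced to a term plus its position
increment; this illustrates the bookkeeping only and is not the
FilteringTokenFilter source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical token: a term and its position increment.
final class Tok {
    final String term;
    final int posInc;

    Tok(String term, int posInc) {
        this.term = term;
        this.posInc = posInc;
    }
}

final class FilteringSketch {
    // Drops tokens rejected by accept. When enablePositionIncrements is true,
    // the increments of dropped tokens are added to the next kept token so
    // that the gap stays visible to position-sensitive queries.
    static List<Tok> filter(List<Tok> in, Predicate<String> accept,
                            boolean enablePositionIncrements) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0;
        for (Tok t : in) {
            if (accept.test(t.term)) {
                int inc = enablePositionIncrements ? t.posInc + skipped : t.posInc;
                out.add(new Tok(t.term, inc));
                skipped = 0;
            } else {
                skipped += t.posInc;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> in = Arrays.asList(
            new Tok("a", 1), new Tok("quick", 1), new Tok("of", 1), new Tok("fox", 1));
        // A length-based accept(), in the spirit of LengthFilter.
        for (Tok t : filter(in, term -> term.length() >= 3, true)) {
            System.out.println(t.term + " +" + t.posInc);
        }
    }
}
```

A concrete filter then only has to supply the accept() predicate; the
increment arithmetic lives in one place.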
13740
13741Bug fixes
13742
13743* LUCENE-2249: ParallelMultiSearcher should shut down thread pool on
13744  close.  (Martin Traverso via Uwe Schindler)
13745
* LUCENE-2273: FieldCacheImpl.getCacheEntries() used WeakHashMap
  incorrectly and led to ConcurrentModificationException.
  (Uwe Schindler, Robert Muir)
13749
13750* LUCENE-2328: Index files fsync tracking moved from
13751  IndexWriter/IndexReader to Directory, and it no longer leaks memory.
13752  (Earwin Burrfoot via Mike McCandless)
13753
13754* LUCENE-2074: Reduce buffer size of lexer back to default on reset.
13755  (Ruben Laguna, Shai Erera via Uwe Schindler)
13756
13757* LUCENE-2496: Don't throw NPE if IndexWriter is opened with CREATE on
13758  a prior (corrupt) index missing its segments_N file.  (Mike
13759  McCandless)
13760
13761* LUCENE-2458: QueryParser no longer automatically forms phrase queries,
13762  assuming whitespace tokenization. Previously all CJK queries, for example,
13763  would be turned into phrase queries. The old behavior is preserved with
13764  the matchVersion parameter for previous versions. Additionally, you can
13765  explicitly enable the old behavior with setAutoGeneratePhraseQueries(true)
13766  (Robert Muir)
13767
13768* LUCENE-2537: FSDirectory.copy() implementation was unsafe and could result in
13769  OOM if a large file was copied. (Shai Erera)
13770
13771* LUCENE-2580: MultiPhraseQuery throws AIOOBE if number of positions
13772  exceeds number of terms at one position (Jayendra Patil via Mike McCandless)
13773
* LUCENE-2617: Optional clauses of a BooleanQuery were not factored
  into coord if the scorer for that segment returned null.  This
  can cause the same document to score differently depending on
  what segment it resides in. (yonik)
13778
13779* LUCENE-2272: Fix explain in PayloadNearQuery and also fix scoring issue (Peter Keegan via Grant Ingersoll)
13780
13781* LUCENE-2732: Fix charset problems in XML loading in
13782  HyphenationCompoundWordTokenFilter.  (Uwe Schindler)
13783
* LUCENE-2802: NRT DirectoryReader returned incorrect values from
  getVersion, isOptimized, getCommitUserData, getIndexCommit and isCurrent due
  to a mutable reference to the IndexWriter's SegmentInfos.
  (Simon Willnauer, Earwin Burrfoot)
13788
13789* LUCENE-2852: Fixed corner case in RAMInputStream that would hit a
13790  false EOF after seeking to EOF then seeking back to same block you
13791  were just in and then calling readBytes (Robert Muir, Mike McCandless)
13792
13793* LUCENE-2860: Fixed SegmentInfo.sizeInBytes to factor includeDocStores when it
13794  decides whether to return the cached computed size or not. (Shai Erera)
13795
13796* LUCENE-2584: SegmentInfo.files() could hit ConcurrentModificationException if
13797  called by multiple threads. (Alexander Kanarsky via Shai Erera)
13798
13799* LUCENE-2809: Fixed IndexWriter.numDocs to take into account
13800  applied but not yet flushed deletes.  (Mike McCandless)
13801
13802* LUCENE-2879: MultiPhraseQuery previously calculated its phrase IDF by summing
13803  internally, it now calls Similarity.idfExplain(Collection, IndexSearcher).
13804  (Robert Muir)
13805
13806* LUCENE-2693: RAM used by IndexWriter was slightly incorrectly computed.
13807  (Jason Rutherglen via Shai Erera)
13808
13809* LUCENE-1846: DateTools now uses the US locale everywhere, so DateTools.round()
13810  is safe also in strange locales.  (Uwe Schindler)
13811
13812* LUCENE-2891: IndexWriterConfig did not accept -1 in setReaderTermIndexDivisor,
13813  which can be used to prevent loading the terms index into memory. (Shai Erera)
13814
13815* LUCENE-2937: Encoding a float into a byte (e.g. encoding field norms during
13816  indexing) had an underflow detection bug that caused floatToByte(f)==0 where
13817  f was greater than 0, but slightly less than byteToFloat(1).  This meant that
13818  certain very small field norms (index_boost * length_norm) could have
13819  been rounded down to 0 instead of being rounded up to the smallest
13820  positive number.  (yonik)
13821
13822* LUCENE-2936: PhraseQuery score explanations were not correctly
13823  identifying matches vs non-matches.  (hossman)
13824
* LUCENE-2975: A hotspot bug corrupts IndexInput#readVInt()/readVLong() if
  the underlying readByte() is inlined (which happens e.g. in MMapDirectory).
  The loop was unrolled, which makes the hotspot bug disappear.
  (Uwe Schindler, Robert Muir, Mike McCandless)
13829
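The underflow case described under LUCENE-2937 can be reproduced with a
stand-alone float-to-byte encoder. The sketch below assumes a layout with 3
mantissa bits and 5 exponent bits; NormByteSketch and its method names are
hypothetical, and encodeBuggy only models the kind of boundary mistake the
entry describes rather than quoting the old source.

```java
final class NormByteSketch {
    // Smallest shifted bit pattern that maps to a non-zero byte
    // (assumed layout: 3 mantissa bits, exponent zero point 15).
    private static final int ZERO_POINT = (63 - 15) << 3;

    // Boundary check uses '<', so a value whose shifted bits equal
    // ZERO_POINT falls through and is encoded as 0.
    static byte encodeBuggy(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small < ZERO_POINT) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_POINT + 0x100) {
            return -1;  // overflow: largest representable value
        }
        return (byte) (small - ZERO_POINT);
    }

    // Boundary check uses '<=', so every positive underflow rounds up to 1.
    static byte encodeFixed(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ZERO_POINT) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_POINT + 0x100) {
            return -1;
        }
        return (byte) (small - ZERO_POINT);
    }

    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float tiny = 0x1p-31f;  // positive, but slightly below decode((byte) 1)
        System.out.println("decode(1)   = " + decode((byte) 1));
        System.out.println("buggy(tiny) = " + encodeBuggy(tiny));
        System.out.println("fixed(tiny) = " + encodeFixed(tiny));
    }
}
```

In this layout, any positive float from 2^-31 up to (but not including)
decode((byte) 1) hits the boundary and shows the difference between the two
encoders.
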
13830New features
13831
13832* LUCENE-2128: Parallelized fetching document frequencies during weight
13833  creation. (Israel Tsadok, Simon Willnauer via Uwe Schindler)
13834
13835* LUCENE-2069: Added Unicode 4 support to CharArraySet. Due to the switch
13836  to Java 5, supplementary characters are now lowercased correctly if the
13837  set is created as case insensitive.
13838  CharArraySet now requires a Version argument to preserve
13839  backwards compatibility. If Version < 3.1 is passed to the constructor,
13840  CharArraySet yields the old behavior. (Simon Willnauer)
13841
13842* LUCENE-2069: Added Unicode 4 support to LowerCaseFilter. Due to the switch
13843  to Java 5, supplementary characters are now lowercased correctly.
13844  LowerCaseFilter now requires a Version argument to preserve
13845  backwards compatibility. If Version < 3.1 is passed to the constructor,
13846  LowerCaseFilter yields the old behavior. (Simon Willnauer, Robert Muir)
13847
* LUCENE-2034: Added ReusableAnalyzerBase, an abstract subclass of Analyzer
  that makes it easier to reuse TokenStreams correctly. This issue also added
  StopwordAnalyzerBase, which improves consistency of all Analyzers that use
  stopwords, and implemented many analyzers in contrib with it.
  (Simon Willnauer via Robert Muir)
13853
13854* LUCENE-2198, LUCENE-2901: Support protected words in stemming TokenFilters using a
13855  new KeywordAttribute.  (Simon Willnauer, Drew Farris via Uwe Schindler)
13856
13857* LUCENE-2183, LUCENE-2240, LUCENE-2241: Added Unicode 4 support
13858  to CharTokenizer and its subclasses. CharTokenizer now has new
13859  int-API which is conditionally preferred to the old char-API depending
13860  on the provided Version. Version < 3.1 will use the char-API.
13861  (Simon Willnauer via Uwe Schindler)
13862
13863* LUCENE-2247: Added a CharArrayMap<V> for performance improvements
13864  in some stemmers and synonym filters. (Uwe Schindler)
13865
13866* LUCENE-2320: Added SetOnce which wraps an object and allows it to be set
13867  exactly once. (Shai Erera via Mike McCandless)
13868
13869* LUCENE-2314: Added AttributeSource.copyTo(AttributeSource) that
13870  allows using cloneAttributes() and this method as a replacement
13871  for captureState()/restoreState(), if the state itself
13872  needs to be inspected/modified.  (Uwe Schindler)
13873
13874* LUCENE-2293: Expose control over max number of threads that
13875  IndexWriter will allow to run concurrently while indexing
13876  documents (previously this was hardwired to 5), using
13877  IndexWriterConfig.setMaxThreadStates.  (Mike McCandless)
13878
13879* LUCENE-2297: Enable turning on reader pooling inside IndexWriter
13880  even when getReader (near-real-time reader) is not in use, through
13881  IndexWriterConfig.enable/disableReaderPooling.  (Mike McCandless)
13882
13883* LUCENE-2331: Add NoMergePolicy which never returns any merges to execute. In
13884  addition, add NoMergeScheduler which never executes any merges. These two are
13885  convenient classes in case you want to disable segment merges by IndexWriter
13886  without tweaking a particular MergePolicy's parameters, such as mergeFactor.
13887  MergeScheduler's methods are now public. (Shai Erera via Mike McCandless)
13888
13889* LUCENE-2339: Deprecate static method Directory.copy in favor of
13890  Directory.copyTo, and use nio's FileChannel.transferTo when copying
13891  files between FSDirectory instances.  (Earwin Burrfoot via Mike
13892  McCandless).
13893
13894* LUCENE-2074: Make StandardTokenizer fit for Unicode 4.0, if the
13895  matchVersion parameter is Version.LUCENE_31. (Uwe Schindler)
13896
13897* LUCENE-2385: Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy
13898  can be used to prevent commits from ever getting deleted from the index.
13899  (Shai Erera)
13900
13901* LUCENE-1585: IndexWriter now accepts a PayloadProcessorProvider which can
13902  return a DirPayloadProcessor for a given Directory, which returns a
13903  PayloadProcessor for a given Term. The PayloadProcessor will be used to
13904  process the payloads of the segments as they are merged (e.g. if one wants to
13905  rewrite payloads of external indexes as they are added, or of local ones).
13906  (Shai Erera, Michael Busch, Mike McCandless)
13907
13908* LUCENE-2440: Add support for custom ExecutorService in
13909  ParallelMultiSearcher (Edward Drapkin via Mike McCandless)
13910
13911* LUCENE-2295: Added a LimitTokenCountAnalyzer / LimitTokenCountFilter
13912  to wrap any other Analyzer and provide the same functionality as
13913  MaxFieldLength provided on IndexWriter.  This patch also fixes a bug
13914  in the offset calculation in CharTokenizer. (Uwe Schindler, Shai Erera)
13915
13916* LUCENE-2526: Don't throw NPE from MultiPhraseQuery.toString when
13917  it's empty.  (Ross Woolf via Mike McCandless)
13918
13919* LUCENE-2559: Added SegmentReader.reopen methods (John Wang via Mike
13920  McCandless)
13921
13922* LUCENE-2590: Added Scorer.visitSubScorers, and Scorer.freq.  Along
13923  with a custom Collector these experimental methods make it possible
13924  to gather the hit-count per sub-clause and per document while a
13925  search is running.  (Simon Willnauer, Mike McCandless)
13926
13927* LUCENE-2636: Added MultiCollector which allows running the search with several
13928  Collectors. (Shai Erera)
13929
13930* LUCENE-2754, LUCENE-2757: Added a wrapper around MultiTermQueries
13931  to add span support: SpanMultiTermQueryWrapper<Q extends MultiTermQuery>.
13932  Using this wrapper it's easy to add fuzzy/wildcard to e.g. a SpanNearQuery.
13933  (Robert Muir, Uwe Schindler)
13934
13935* LUCENE-2838: ConstantScoreQuery now directly supports wrapping a Query
13936  instance for stripping off scores. The use of a QueryWrapperFilter
13937  is no longer needed and discouraged for that use case. Directly wrapping
13938  Query improves performance, as out-of-order collection is now supported.
13939  (Uwe Schindler)
13940
13941* LUCENE-2864: Add getMaxTermFrequency (maximum within-document TF) to
13942  FieldInvertState so that it can be used in Similarity.computeNorm.
13943  (Robert Muir)
13944
13945* LUCENE-2720: Segments now record the code version which created them.
13946  (Shai Erera, Mike McCandless, Uwe Schindler)
13947
13948* LUCENE-2474: Added expert ReaderFinishedListener API to
13949  IndexReader, to allow apps that maintain external per-segment caches
13950  to evict entries when a segment is finished.  (Shay Banon, Yonik
13951  Seeley, Mike McCandless)
13952
13953* LUCENE-2911: The new StandardTokenizer, UAX29URLEmailTokenizer, and
13954  the ICUTokenizer in contrib now all tag types with a consistent set
13955  of token types (defined in StandardTokenizer). Tokens in the major
13956  CJK types are explicitly marked to allow for custom downstream handling:
13957  <IDEOGRAPHIC>, <HANGUL>, <KATAKANA>, and <HIRAGANA>.
13958  (Robert Muir, Steven Rowe)
13959
13960* LUCENE-2913: Add missing getters to Numeric* classes. (Uwe Schindler)
13961
13962* LUCENE-1810: Added FieldSelectorResult.LATENT to not cache lazy loaded fields
13963  (Tim Smith, Grant Ingersoll)
13964
13965* LUCENE-2692: Added several new SpanQuery classes for positional checking
13966  (match is in a range, payload is a specific value) (Grant Ingersoll)
13967
13968Optimizations
13969
13970* LUCENE-2494: Use CompletionService in ParallelMultiSearcher instead of
13971  simple polling for results. (Edward Drapkin, Simon Willnauer)
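
The general pattern of consuming results in completion order, rather than polling futures, can be sketched with the JDK alone. This is not the ParallelMultiSearcher code; the class and method names are hypothetical and the tasks simply return integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class CompletionServiceSketch {
    // Submits all tasks and sums their results, taking each result as soon
    // as its task finishes instead of polling futures in submission order.
    static int sumAll(List<Callable<Integer>> tasks, ExecutorService executor)
            throws InterruptedException, ExecutionException {
        CompletionService<Integer> service = new ExecutorCompletionService<>(executor);
        for (Callable<Integer> task : tasks) {
            service.submit(task);
        }
        int total = 0;
        for (int i = 0; i < tasks.size(); i++) {
            // take() blocks until the next completed task is available.
            total += service.take().get();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 1; i <= 3; i++) {
                final int hits = i;
                tasks.add(() -> hits);
            }
            System.out.println("total hits: " + sumAll(tasks, executor));
        } finally {
            executor.shutdown();
        }
    }
}
```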
13972
13973* LUCENE-2075: Terms dict cache is now shared across threads instead
13974  of being stored separately in thread local storage.  Also fixed
13975  terms dict so that the cache is used when seeking the thread local
13976  term enum, which will be important for MultiTermQuery impls that do
13977  lots of seeking (Mike McCandless, Uwe Schindler, Robert Muir, Yonik
13978  Seeley)
13979
13980* LUCENE-2136: If the multi reader (DirectoryReader or MultiReader)
13981  only has a single sub-reader, delegate all enum requests to it.
13982  This avoids the overhead of using a PQ unnecessarily.  (Mike
13983  McCandless)
13984
13985* LUCENE-2137: Switch to AtomicInteger for some ref counting (Earwin
13986  Burrfoot via Mike McCandless)
13987
13988* LUCENE-2123, LUCENE-2261: Move FuzzyQuery rewrite to separate RewriteMode
13989  into MultiTermQuery. The number of fuzzy expansions can be specified with
13990  the maxExpansions parameter to FuzzyQuery.
13991  (Uwe Schindler, Robert Muir, Mike McCandless)
13992
13993* LUCENE-2164: ConcurrentMergeScheduler has more control over merge
13994  threads.  First, it gives smaller merges higher thread priority than
13995  larger ones.  Second, a new set/getMaxMergeCount setting will pause
13996  the larger merges to allow smaller ones to finish.  The defaults for
13997  these settings are now dynamic, depending on the number of CPU cores as
13998  reported by Runtime.getRuntime().availableProcessors() (Mike
13999  McCandless)
14000
14001* LUCENE-2169: Improved CharArraySet.copy(), if source set is
14002  also a CharArraySet.  (Simon Willnauer via Uwe Schindler)
14003
14004* LUCENE-2084: Change IndexableBinaryStringTools to work on byte[] and char[]
14005  directly, instead of Byte/CharBuffers, and modify CollationKeyFilter to
14006  take advantage of this for faster performance.
14007  (Steven Rowe, Uwe Schindler, Robert Muir)
14008
14009* LUCENE-2188: Add a utility class for tracking deprecated overridden
14010  methods in non-final subclasses.
14011  (Uwe Schindler, Robert Muir)
14012
14013* LUCENE-2195: Speedup CharArraySet if set is empty.
14014  (Simon Willnauer via Robert Muir)
14015
14016* LUCENE-2285: Code cleanup. (Shai Erera via Uwe Schindler)
14017
14018* LUCENE-2303: Remove code duplication in Token class by subclassing
14019  TermAttributeImpl, move DEFAULT_TYPE constant to TypeInterface, improve
14020  null-handling for TypeAttribute.  (Uwe Schindler)
14021
14022* LUCENE-2329: Switch TermsHash* from using a PostingList object per unique
14023  term to parallel arrays, indexed by termID. This reduces garbage collection
14024  overhead significantly, which results in great indexing performance wins
14025  when the available JVM heap space is low. This will become even more
14026  important when the DocumentsWriter RAM buffer is searchable in the future,
14027  because then it will make sense to make the RAM buffers as large as
14028  possible. (Mike McCandless, Michael Busch)
14029
14030* LUCENE-2380: The terms field cache methods (getTerms,
14031  getTermsIndex), which replace the older String equivalents
14032  (getStrings, getStringIndex), consume quite a bit less RAM in most
14033  cases.  (Mike McCandless)
14034
14035* LUCENE-2410: ~20% speedup on exact (slop=0) PhraseQuery matching.
14036  (Mike McCandless)
14037
14038* LUCENE-2531: Fix issue when sorting by a String field that was
14039  causing too many fallbacks to compare-by-value (instead of by-ord).
14040  (Mike McCandless)
14041
14042* LUCENE-2574: IndexInput exposes copyBytes(IndexOutput, long) to allow for
14043  efficient copying by sub-classes. Optimized copy is implemented for RAM and FS
14044  streams. (Shai Erera)
14045
14046* LUCENE-2719: Improved TermsHashPerField's sorting to use a better
14047  quick sort algorithm that dereferences the pivot element not on
14048  every compare call. Also replaced lots of sorting code in Lucene
14049  by the improved SorterTemplate class.
14050  (Uwe Schindler, Robert Muir, Mike McCandless)
14051
14052* LUCENE-2760: Optimize SpanFirstQuery and SpanPositionRangeQuery.
14053  (Robert Muir)
14054
14055* LUCENE-2770: Make SegmentMerger always work on atomic subreaders,
14056  even when IndexWriter.addIndexes(IndexReader...) is used with
14057  DirectoryReaders or other MultiReaders. This saves lots of memory
14058  during merge of norms.  (Uwe Schindler, Mike McCandless)
14059
14060* LUCENE-2824: Optimize BufferedIndexInput to do less bounds checks.
14061  (Robert Muir)
14062
14063* LUCENE-2010: Segments with 100% deleted documents are now removed on
14064  IndexReader or IndexWriter commit.  (Uwe Schindler, Mike McCandless)
14065
14066* LUCENE-1472: Removed synchronization from static DateTools methods
14067  by using a ThreadLocal. Also converted DateTools.Resolution to a
14068  Java 5 enum (this should not break backwards compatibility).  (Uwe Schindler)
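
The ThreadLocal technique named in LUCENE-1472 can be sketched with the JDK alone. This is not the DateTools implementation: the class name, the "yyyyMMdd" pattern, and the GMT time zone are assumptions chosen for illustration. Each thread gets its own SimpleDateFormat, so the static method needs no synchronization.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class ThreadLocalDateFormatSketch {
    // SimpleDateFormat is not thread-safe, so keep one instance per thread.
    private static final ThreadLocal<SimpleDateFormat> DAY_FORMAT =
        new ThreadLocal<SimpleDateFormat>() {
            @Override
            protected SimpleDateFormat initialValue() {
                SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd", Locale.ROOT);
                format.setTimeZone(TimeZone.getTimeZone("GMT"));
                return format;
            }
        };

    // Unsynchronized static method: safe because the formatter is per-thread.
    static String formatDay(Date date) {
        return DAY_FORMAT.get().format(date);
    }
}
```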
14069
14070Build
14071
14072* LUCENE-2124: Moved the JDK-based collation support from contrib/collation
14073  into core, and moved the ICU-based collation support into contrib/icu.
14074  (Robert Muir)
14075
14076* LUCENE-2326: Removed SVN checkouts for backwards tests. The backwards
14077  branch is now included in the svn repository using "svn copy"
14078  after release. (Uwe Schindler)
14079
14080* LUCENE-2074: Regenerating StandardTokenizerImpl files now needs
14081  JFlex 1.5 (currently only available on SVN). (Uwe Schindler)
14082
14083* LUCENE-1709: Tests are now parallelized by default (except for benchmark). You
14084  can force them to run sequentially by passing -Drunsequential=1 on the command
14085  line. The number of threads that are spawned per CPU defaults to '1'. If you
14086  wish to change that, you can run the tests with -DthreadsPerProcessor=[num].
14087  (Robert Muir, Shai Erera, Peter Kofler)
14088
14089* LUCENE-2516: Backwards tests are now compiled against released lucene-core.jar
14090  from tarball of previous version. Backwards tests are now packaged together
14091  with src distribution.  (Uwe Schindler)
14092
14093* LUCENE-2611: Added Ant target to install IntelliJ IDEA configuration:
14094  "ant idea".  See http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ
14095  (Steven Rowe)
14096
14097* LUCENE-2657: Switch from using Maven POM templates to full POMs when
14098  generating Maven artifacts (Steven Rowe)
14099
14100* LUCENE-2609: Added jar-test-framework Ant target which packages Lucene's
14101  tests' framework classes. (Drew Farris, Grant Ingersoll, Shai Erera,
14102  Steven Rowe)
14103
14104Test Cases
14105
14106* LUCENE-2037 Allow Junit4 tests in our environment (Erick Erickson
14107  via Mike McCandless)
14108
14109* LUCENE-1844: Speed up the unit tests (Mark Miller, Erick Erickson,
14110  Mike McCandless)
14111
14112* LUCENE-2065: Use Java 5 generics throughout our unit tests.  (Kay
14113  Kay via Mike McCandless)
14114
14115* LUCENE-2155: Fix time and zone dependent localization test failures
14116  in queryparser tests. (Uwe Schindler, Chris Male, Robert Muir)
14117
14118* LUCENE-2170: Fix thread starvation problems.  (Uwe Schindler)
14119
14120* LUCENE-2248, LUCENE-2251, LUCENE-2285: Refactor tests to not use
14121  Version.LUCENE_CURRENT, but instead use a global static value
14122  from LuceneTestCase(J4), that contains the release version.
14123  (Uwe Schindler, Simon Willnauer, Shai Erera)
14124
14125* LUCENE-2313, LUCENE-2322: Add VERBOSE to LuceneTestCase(J4) to control
14126  verbosity of tests. If VERBOSE==false (default) tests should not print
14127  anything other than errors to System.(out|err). The setting can be
14128  changed with -Dtests.verbose=true on test invocation.
14129  (Shai Erera, Paul Elschot, Uwe Schindler)
14130
14131* LUCENE-2318: Remove inconsistent system property code for retrieving
14132  temp and data directories inside test cases. It is now centralized in
14133  LuceneTestCase(J4). Also changed lots of tests to use
14134  getClass().getResourceAsStream() to retrieve test data. Tests needing
14135  access to "real" files from the test folder itself, can use
14136  LuceneTestCase(J4).getDataFile().  (Uwe Schindler)
14137
14138* LUCENE-2398, LUCENE-2611: Improve tests to work better from IDEs such
14139  as Eclipse and IntelliJ.
14140  (Paolo Castagna, Steven Rowe via Robert Muir)
14141
14142* LUCENE-2804: add newFSDirectory to LuceneTestCase to create a FSDirectory at
14143  random. (Shai Erera, Robert Muir)
14144
14145Documentation
14146
14147* LUCENE-2579: Fix oal.search's package.html description of abstract
14148  methods.  (Santiago M. Mola via Mike McCandless)
14149
14150* LUCENE-2625: Add a note to IndexReader.termDocs() with additional verbiage
14151  that the TermEnum must be seeked since it is unpositioned.
14152  (Adriano Crestani via Robert Muir)
14153
14154* LUCENE-2894: Use google-code-prettify for syntax highlighting in javadoc.
14155  (Shinichiro Abe, Koji Sekiguchi)
14156
14157================== Release 2.9.4 / 3.0.3 ====================
14158
14159Changes in runtime behavior
14160
14161* LUCENE-2689: NativeFSLockFactory no longer attempts to acquire a
14162  test lock just before the real lock is acquired.  (Surinder Pal
14163  Singh Bindra via Mike McCandless)
14164
14165* LUCENE-2762: Fixed bug in IndexWriter causing it to hold open file
14166  handles against deleted files when compound-file was enabled (the
14167  default) and readers are pooled.  As a result of this the peak
14168  worst-case free disk space required during optimize is now 3X the
14169  index size, when compound file is enabled (else 2X).  (Mike
14170  McCandless)
14171
14172* LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default =
14173  0.1), which means any time a merged segment is greater than 10% of
14174  the index size, it will be left in non-compound format even if
14175  compound format is on.  This change was made to reduce peak
14176  transient disk usage during optimize which increased due to
14177  LUCENE-2762.  (Mike McCandless)
14178
14179Bug fixes
14180
14181* LUCENE-2142 (correct fix): FieldCacheImpl.getStringIndex no longer
14182  throws an exception when term count exceeds doc count.
14183  (Mike McCandless, Uwe Schindler)
14184
14185* LUCENE-2513: when opening writable IndexReader on a not-current
14186  commit, do not overwrite "future" commits.  (Mike McCandless)
14187
14188* LUCENE-2536: IndexWriter.rollback was failing to properly rollback
14189  buffered deletions against segments that were flushed (Mark Harwood
14190  via Mike McCandless)
14191
14192* LUCENE-2541: Fixed NumericRangeQuery that returned incorrect results
14193  with endpoints near Long.MIN_VALUE and Long.MAX_VALUE:
14194  NumericUtils.splitRange() overflowed, if
14195  - the range contained a LOWER bound
14196    that was greater than (Long.MAX_VALUE - (1L << precisionStep))
14197  - the range contained an UPPER bound
14198    that was less than (Long.MIN_VALUE + (1L << precisionStep))
14199  With standard precision steps around 4, this had no effect on
14200  most queries, only those that met the above conditions.
14201  Queries with large precision steps failed more easily. Queries with
14202  precision step >=64 were not affected. Also 32 bit data types int
14203  and float were not affected.
14204  (Yonik Seeley, Uwe Schindler)
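
The two overflow conditions listed in LUCENE-2541 can be checked with plain long arithmetic. This is a sketch of the conditions as stated in the entry, not the NumericUtils.splitRange() code; the class and method names are hypothetical, and precisionStep is assumed to lie between 1 and 62 so that the shift itself stays positive.

```java
final class RangeOverflowSketch {
    // True if lower + (1L << precisionStep) would exceed Long.MAX_VALUE.
    static boolean lowerBoundOverflows(long lower, int precisionStep) {
        long step = 1L << precisionStep;
        return lower > Long.MAX_VALUE - step;
    }

    // True if upper - (1L << precisionStep) would fall below Long.MIN_VALUE.
    static boolean upperBoundOverflows(long upper, int precisionStep) {
        long step = 1L << precisionStep;
        return upper < Long.MIN_VALUE + step;
    }

    public static void main(String[] args) {
        long nearMax = Long.MAX_VALUE - 10;
        // With precisionStep 4 the step is 16, so nearMax + 16 wraps around.
        System.out.println(lowerBoundOverflows(nearMax, 4));
        System.out.println(nearMax + (1L << 4) < 0);
    }
}
```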
14205
14206* LUCENE-2593: Fixed certain rare cases where a disk full could lead
14207  to a corrupted index (Robert Muir, Mike McCandless)
14208
14209* LUCENE-2620: Fixed a bug in WildcardQuery where too many asterisks
14210  would result in unbearably slow performance.  (Nick Barkas via Robert Muir)
14211
14212* LUCENE-2627: Fixed bug in MMapDirectory chunking when a file is an
14213  exact multiple of the chunk size.  (Robert Muir)
14214
14215* LUCENE-2634: isCurrent on an NRT reader was failing to return false
14216  if the writer had just committed (Nikolay Zamosenchuk via Mike McCandless)
14217
14218* LUCENE-2650: Added extra safety to MMapIndexInput clones to prevent accessing
14219  an unmapped buffer if the input is closed (Mike McCandless, Uwe Schindler, Robert Muir)
14220
14221* LUCENE-2384: Reset zzBuffer in StandardTokenizerImpl when lexer is reset.
14222  (Ruben Laguna via Uwe Schindler, sub-issue of LUCENE-2074)
14223
14224* LUCENE-2658: Exceptions while processing term vectors enabled for multiple
14225  fields could lead to invalid ArrayIndexOutOfBoundsExceptions.
14226  (Robert Muir, Mike McCandless)
14227
14228* LUCENE-2235: Implement missing PerFieldAnalyzerWrapper.getOffsetGap().
14229  (Javier Godoy via Uwe Schindler)
14230
14231* LUCENE-2328: Fixed memory leak in how IndexWriter/Reader tracked
14232  already sync'd files. (Earwin Burrfoot via Mike McCandless)
14233
14234* LUCENE-2549: Fix TimeLimitingCollector#TimeExceededException to record
14235  the absolute docid.  (Uwe Schindler)
14236
14237* LUCENE-2533: fix FileSwitchDirectory.listAll to not return dups when
14238  primary & secondary dirs share the same underlying directory.
14239  (Michael McCandless)
14240
14241* LUCENE-2365: IndexWriter.newestSegment (used normally for testing)
14242  is fixed to return null if there are no segments.  (Karthick
14243  Sankarachary via Mike McCandless)
14244
14245* LUCENE-2730: Fix two rare deadlock cases in IndexWriter (Mike McCandless)
14246
14247* LUCENE-2744: CheckIndex was stating total number of fields,
14248  not the number that have norms enabled, on the "test: field
14249  norms..." output.  (Mark Kristensson via Mike McCandless)
14250
14251* LUCENE-2759: Fixed two near-real-time cases where doc store files
14252  may be opened for read even though they are still open for write.
14253  (Mike McCandless)
14254
14255* LUCENE-2618: Fix rare thread safety issue whereby
14256  IndexWriter.optimize could sometimes return even though the index
14257  wasn't fully optimized (Mike McCandless)
14258
14259* LUCENE-2767: Fix thread safety issue in addIndexes(IndexReader[])
14260  that could potentially result in index corruption.  (Mike
14261  McCandless)
14262
14263* LUCENE-2762: Fixed bug in IndexWriter causing it to hold open file
14264  handles against deleted files when compound-file was enabled (the
14265  default) and readers are pooled.  As a result of this the peak
14266  worst-case free disk space required during optimize is now 3X the
14267  index size, when compound file is enabled (else 2X).  (Mike
14268  McCandless)
14269
14270* LUCENE-2216: OpenBitSet.hashCode returned different hash codes for
14271  sets that only differed by trailing zeros. (Dawid Weiss, yonik)
14272
14273* LUCENE-2782: Fix rare potential thread hazard with
14274  IndexWriter.commit (Mike McCandless)
14275
14276API Changes
14277
14278* LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default =
14279  0.1), which means any time a merged segment is greater than 10% of
14280  the index size, it will be left in non-compound format even if
14281  compound format is on.  This change was made to reduce peak
14282  transient disk usage during optimize which increased due to
14283  LUCENE-2762.  (Mike McCandless)
14284
14285Optimizations
14286
14287* LUCENE-2556: Improve memory usage after cloning TermAttribute.
14288  (Adriano Crestani via Uwe Schindler)
14289
14290* LUCENE-2098: Improve the performance of BaseCharFilter, especially for
14291  large documents.  (Robin Wojciki, Koji Sekiguchi, Robert Muir)
14292
14293New features
14294
14295* LUCENE-2675 (2.9.4 only): Add support for Lucene 3.0 stored field files
14296  also in 2.9. The file format did not change, only the version number was
14297  upgraded to mark segments that have no compression. FieldsWriter still only
14298  writes 2.9 segments as they could contain compressed fields. This cross-version
14299  index format compatibility is provided here solely because Lucene 2.9 and 3.0
14300  have the same bugfix level, features, and the same index format with this slight
14301  compression difference. In general, Lucene does not support reading newer
14302  indexes with older library versions. (Uwe Schindler)
14303
14304Documentation
14305
14306* LUCENE-2239: Documented limitations in NIOFSDirectory and MMapDirectory due to
14307  Java NIO behavior when a Thread is interrupted while blocking on IO.
14308  (Simon Willnauer, Robert Muir)
14309
14310================== Release 2.9.3 / 3.0.2 ====================
14311
14312Changes in backwards compatibility policy
14313
14314* LUCENE-2135: Added FieldCache.purge(IndexReader) method to the
14315  interface.  Anyone implementing FieldCache externally will need to
14316  fix their code to implement this, on upgrading.  (Mike McCandless)
14317
14318Changes in runtime behavior
14319
14320* LUCENE-2421: NativeFSLockFactory does not throw LockReleaseFailedException if
14321  it cannot delete the lock file, since obtaining the lock does not fail if the
14322  file is there. (Shai Erera)
14323
14324* LUCENE-2060 (2.9.3 only): Changed ConcurrentMergeScheduler's default for
14325  maxNumThreads from 3 to 1, because in practice we get the most gains
14326  from running a single merge in the background.  More than one
14327  concurrent merge causes a lot of thrashing (though it's possible on
14328  SSD storage that there would be net gains).  (Jason Rutherglen, Mike
14329  McCandless)
14330
14331Bug fixes
14332
14333* LUCENE-2046 (2.9.3 only): IndexReader should not see the index as changed, after
14334  IndexWriter.prepareCommit has been called but before
14335  IndexWriter.commit is called. (Peter Keegan via Mike McCandless)
14336
14337* LUCENE-2119: Don't throw NegativeArraySizeException if you pass
14338  Integer.MAX_VALUE as nDocs to IndexSearcher search methods.  (Paul
14339  Taylor via Mike McCandless)
14340
14341* LUCENE-2142: FieldCacheImpl.getStringIndex no longer throws an
14342  exception when term count exceeds doc count.  (Mike McCandless)
14343
14344* LUCENE-2104: NativeFSLock.release() would silently fail if the lock is held by
14345  another thread/process.  (Shai Erera via Uwe Schindler)
14346
14347* LUCENE-2283: Use shared memory pool for term vector and stored
14348  fields buffers. This memory will be reclaimed if needed according to
14349  the configured RAM Buffer Size for the IndexWriter.  This also fixes
14350  potentially excessive memory usage when many threads are indexing a
14351  mix of small and large documents.  (Tim Smith via Mike McCandless)
14352
14353* LUCENE-2300: If IndexWriter is pooling readers (because an NRT reader
14354  has been obtained), and addIndexes* is run, do not pool the
14355  readers from the external directory.  This is harmless (NRT reader is
14356  correct), but a waste of resources.  (Mike McCandless)
14357
14358* LUCENE-2422: Don't reuse byte[] in IndexInput/Output -- it gains
14359  little performance, and ties up possibly large amounts of memory
14360  for apps that index large docs.  (Ross Woolf via Mike McCandless)
14361
14362* LUCENE-2387: Don't hang onto Fieldables from the last doc indexed,
14363  in IndexWriter, nor the Reader in Tokenizer after close is
14364  called.  (Ruben Laguna, Uwe Schindler, Mike McCandless)
14365
14366* LUCENE-2417: IndexCommit did not implement hashCode() and equals()
14367  consistently. Now they both take Directory and version into consideration. In
14368  addition, all of IndexCommit's methods which threw
14369  UnsupportedOperationException are now abstract. (Shai Erera)
14370
14371* LUCENE-2467: Fixed memory leaks in IndexWriter when large documents
14372  are indexed.  (Mike McCandless)
14373
14374* LUCENE-2473: Clicking on the "More Results" link in the luceneweb.war
14375  demo resulted in ArrayIndexOutOfBoundsException.
14376  (Sami Siren via Robert Muir)
14377
14378* LUCENE-2476: If any exception is hit init'ing IW, release the write
14379  lock (previously we only released on IOException).  (Tamas Cservenak
14380  via Mike McCandless)
14381
14382* LUCENE-2478: Fix CachingWrapperFilter to not throw NPE when
14383  Filter.getDocIdSet() returns null.  (Uwe Schindler, Daniel Noll)
14384
14385* LUCENE-2468: Allow specifying how new deletions should be handled in
14386  CachingWrapperFilter and CachingSpanFilter.  By default, new
14387  deletions are ignored in CachingWrapperFilter, since typically this
14388  filter is AND'd with a query that correctly takes new deletions into
14389  account.  This should be a performance gain (higher cache hit rate)
14390  in apps that reopen readers, or use near-real-time reader
14391  (IndexWriter.getReader()), but may introduce invalid search results
14392  (allowing deleted docs to be returned) for certain cases, so a new
14393  expert ctor was added to CachingWrapperFilter to enforce deletions
14394  at a performance cost.  CachingSpanFilter by default recaches if
14395  there are new deletions (Shay Banon via Mike McCandless)
14396
14397* LUCENE-2299: If you open an NRT reader while addIndexes* is running,
14398  it may miss some segments (Earwin Burrfoot via Mike McCandless)
14399
14400* LUCENE-2397: Don't throw NPE from SnapshotDeletionPolicy.snapshot if
14401  there are no commits yet (Shai Erera)
14402
14403* LUCENE-2424: Fix FieldDoc.toString to actually return its fields
14404  (Stephen Green via Mike McCandless)
14405
14406* LUCENE-2311: Always pass a "fully loaded" (terms index & doc stores)
14407  SegmentReader to IndexWriter's mergedSegmentWarmer (if set), so
14408  that warming is free to do whatever it needs to.  (Earwin Burrfoot
14409  via Mike McCandless)
14410
14411* LUCENE-3029: Fix corner case when MultiPhraseQuery is used with zero
14412  position-increment tokens that would sometimes assign different
14413  scores to identical docs.  (Mike McCandless)
14414
14415* LUCENE-2486: Fixed intermittent FileNotFoundException on doc store
14416  files when a mergedSegmentWarmer is set on IndexWriter.  (Mike
14417  McCandless)
14418
14419* LUCENE-2130: Fix performance issue when FuzzyQuery runs on a
14420  multi-segment index (Michael McCandless)
14421
14422API Changes
14423
14424* LUCENE-2281: added doBeforeFlush to IndexWriter to allow extensions to perform
14425  operations before flush starts. Also exposed doAfterFlush as protected instead
14426  of package-private. (Shai Erera via Mike McCandless)
14427
14428* LUCENE-2356: Add IndexWriter.set/getReaderTermsIndexDivisor, to set
14429  what IndexWriter passes for termsIndexDivisor to the readers it
14430  opens internally when applying deletions or creating a
14431  near-real-time reader.  (Earwin Burrfoot via Mike McCandless)
14432
14433Optimizations
14434
14435* LUCENE-2494 (3.0.2 only): Use CompletionService in ParallelMultiSearcher
14436  instead of simple polling for results. (Edward Drapkin, Simon Willnauer)
14437
14438* LUCENE-2135: On IndexReader.close, forcefully evict any entries from
14439  the FieldCache rather than waiting for the WeakHashMap to release
14440  the reference (Mike McCandless)
14441
14442* LUCENE-2161: Improve concurrency of IndexReader, especially in the
14443  context of near real-time readers.  (Mike McCandless)
14444
14445* LUCENE-2360: Small speedup to recycling of reused per-doc RAM in
14446  IndexWriter (Robert Muir, Mike McCandless)
14447
14448Build
14449
14450* LUCENE-2488 (2.9.3 only): Support build with JDK 1.4 and exclude Java 1.5
14451  contrib modules on request (pass '-Dforce.jdk14.build=true') when
14452  compiling/testing/packaging. This marks the benchmark contrib also
14453  as Java 1.5, as it depends on fast-vector-highlighter. (Uwe Schindler)
14454
14455================== Release 2.9.2 / 3.0.1 ====================
14456
14457Changes in backwards compatibility policy
14458
14459* LUCENE-2123 (3.0.1 only): Removed the protected inner class ScoreTerm
14460  from FuzzyQuery. The change was needed because the comparator of this
14461  class had to be changed in an incompatible way. The class was never
14462  intended to be public.  (Uwe Schindler, Mike McCandless)
14463
14464Bug fixes
14465
14466 * LUCENE-2092: BooleanQuery was ignoring disableCoord in its hashCode
14467   and equals methods, causing bad things to happen when caching
14468   BooleanQueries.  (Chris Hostetter, Mike McCandless)
14469
14470 * LUCENE-2095: Fixes: when two threads call IndexWriter.commit() at
14471   the same time, it's possible for commit to return control back to
14472   one of the threads before all changes are actually committed.
14473   (Sanne Grinovero via Mike McCandless)
14474
14475 * LUCENE-2132 (3.0.1 only): Fix the demo result.jsp to use QueryParser
14476   with a Version argument.  (Brian Li via Robert Muir)
14477
14478 * LUCENE-2166: Don't incorrectly keep warning about the same immense
14479   term, when IndexWriter.infoStream is on.  (Mike McCandless)
14480
14481 * LUCENE-2158: At high indexing rates, NRT reader could temporarily
14482   lose deletions.  (Mike McCandless)
14483
14484 * LUCENE-2182: DEFAULT_ATTRIBUTE_FACTORY was failing to load
14485   implementation class when interface was loaded by a different
14486   class loader.  (Uwe Schindler, reported on java-user by Ahmed El-dawy)
14487
14488 * LUCENE-2257: Increase max number of unique terms in one segment to
14489   termIndexInterval (default 128) * ~2.1 billion = ~274 billion.
14490   (Tom Burton-West via Mike McCandless)
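
The figure quoted in LUCENE-2257 can be reproduced as a small worked calculation. The sketch assumes that "~2.1 billion" refers to Integer.MAX_VALUE; the entry does not say so explicitly, and the class and method names are hypothetical.

```java
final class TermLimitSketch {
    // Multiplies in 64-bit arithmetic so the product cannot overflow an int.
    static long maxUniqueTerms(int termIndexInterval) {
        return (long) termIndexInterval * Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // 128 * 2,147,483,647 = 274,877,906,816, i.e. roughly 274 billion.
        System.out.println(maxUniqueTerms(128));
    }
}
```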
14491
14492 * LUCENE-2260: Fixed AttributeSource to not hold a strong
14493   reference to the Attribute/AttributeImpl classes which prevents
14494   unloading of custom attributes loaded by other classloaders
14495   (e.g. in Solr plugins).  (Uwe Schindler)
14496
14497 * LUCENE-1941: Fix Min/MaxPayloadFunction returning 0 when
14498   only one payload is present.  (Erik Hatcher, Mike McCandless
14499   via Uwe Schindler)
14500
14501 * LUCENE-2270: Queries consisting of all zero-boost clauses
14502   (for example, text:foo^0) sorted incorrectly and produced
14503   invalid docids. (yonik)
14504
14505API Changes
14506
14507 * LUCENE-1609 (3.0.1 only): Restore IndexReader.getTermInfosIndexDivisor
14508   (it was accidentally removed in 3.0.0)  (Mike McCandless)
14509
14510 * LUCENE-1972 (3.0.1 only): Restore SortField.getComparatorSource
14511   (it was accidentally removed in 3.0.0)  (John Wang via Uwe Schindler)
14512
14513 * LUCENE-2190: Added a new class CustomScoreProvider to function package
14514   that can be subclassed to provide custom scoring to CustomScoreQuery.
14515   The methods in CustomScoreQuery that did this before were deprecated
14516   and replaced by a method getCustomScoreProvider(IndexReader) that
14517   returns a custom score implementation using the above class. The change
14518   is necessary with per-segment searching, as CustomScoreQuery is
14519   a stateless class (like all other Queries) and does not know about
14520   the currently searched segment. This API works similarly to Filter's
14521   getDocIdSet(IndexReader).  (Paul chez Jamespot via Mike McCandless,
14522   Uwe Schindler)
14523
14524 * LUCENE-2080: Deprecate Version.LUCENE_CURRENT, as using this constant
14525   will cause backwards compatibility problems when upgrading Lucene. See
14526   the Version javadocs for additional information.
14527   (Robert Muir)
14528
14529Optimizations
14530
14531 * LUCENE-2086: When resolving deleted terms, do so in term sort order
14532   for better performance (Bogdan Ghidireac via Mike McCandless)
14533
14534 * LUCENE-2123 (partly, 3.0.1 only): Fixes a slowdown / memory issue
14535   added by LUCENE-504.  (Uwe Schindler, Robert Muir, Mike McCandless)
14536
14537 * LUCENE-2258: Remove unneeded synchronization in FuzzyTermEnum.
14538   (Uwe Schindler, Robert Muir)
14539
14540Test Cases
14541
14542 * LUCENE-2114: Change TestFilteredSearch to test on multi-segment
14543   index as well. (Simon Willnauer via Mike McCandless)
14544
14545 * LUCENE-2211: Improves BaseTokenStreamTestCase to use a fake attribute
14546   that checks if clearAttributes() was called correctly.
14547   (Uwe Schindler, Robert Muir)
14548
14549 * LUCENE-2207, LUCENE-2219: Improve BaseTokenStreamTestCase to check if
14550   end() is implemented correctly.  (Koji Sekiguchi, Robert Muir)
14551
14552Documentation
14553
14554 * LUCENE-2114: Improve javadocs of Filter to call out that the
14555   provided reader is per-segment (Simon Willnauer via Mike
14556   McCandless)
14557
14558======================= Release 3.0.0 =======================
14559
14560Changes in backwards compatibility policy
14561
14562* LUCENE-1979: Change return type of SnapshotDeletionPolicy#snapshot()
14563  from IndexCommitPoint to IndexCommit. Code that uses this method
14564  needs to be recompiled against Lucene 3.0 in order to work. The
14565  previously deprecated IndexCommitPoint is also removed.
14566  (Michael Busch)
14567
14568* o.a.l.Lock.isLocked() is now allowed to throw an IOException.
14569  (Mike McCandless)
14570
14571* LUCENE-2030: CachingWrapperFilter and CachingSpanFilter now hide
14572  the internal cache implementation for thread safety; previously it was
14573  declared protected.  (Peter Lenahan, Uwe Schindler, Simon Willnauer)
14574
14575* LUCENE-2053: If you call Thread.interrupt() on a thread inside
14576  Lucene, Lucene will do its best to interrupt the thread.  However,
14577  instead of throwing InterruptedException (which is a checked
14578  exception), you'll get an oal.util.ThreadInterruptedException (an
14579  unchecked exception, subclassing RuntimeException).  The interrupt
14580  status on the thread is cleared when this exception is thrown.
14581  (Mike McCandless)
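
The LUCENE-2053 behavior can be illustrated without Lucene. The sketch below
uses a hypothetical SketchInterruptedException as a stand-in for
oal.util.ThreadInterruptedException and Thread.sleep as a stand-in for a
blocking Lucene call; it is not Lucene's implementation.

```java
class InterruptSketch {
    // Hypothetical stand-in for oal.util.ThreadInterruptedException (unchecked).
    static class SketchInterruptedException extends RuntimeException {
        SketchInterruptedException(InterruptedException cause) {
            super(cause);
        }
    }

    // Blocks for up to the given time and converts the checked exception
    // into an unchecked one.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            // Thread.sleep has already cleared the interrupt status here.
            throw new SketchInterruptedException(ie);
        }
    }

    // Returns true if pause() was interrupted, and re-asserts the interrupt
    // status so that code further up the stack can still observe it.
    static boolean pauseAndRestore(long millis) {
        try {
            pause(millis);
            return false;
        } catch (SketchInterruptedException e) {
            Thread.currentThread().interrupt();
            return true;
        }
    }
}
```

Because the status is cleared when the exception is thrown, a caller that
still needs it has to re-assert it, as pauseAndRestore does.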
14582
14583* LUCENE-2052: Some methods in Lucene core were changed to accept
14584  Java 5 varargs. This is not a backwards compatibility problem as
14585  long as you do not try to override such a method. We left common
14586  overridden methods unchanged and added varargs to constructors,
14587  static, or final methods (MultiSearcher,...).  (Uwe Schindler)
14588
14589* LUCENE-1558: IndexReader.open(Directory) now opens a readOnly=true
14590  reader, and new IndexSearcher(Directory) does the same.  Note that
14591  this is a change in the default from 2.9, when these methods were
14592  previously deprecated.  (Mike McCandless)
14593
14594* LUCENE-1753: Make not-yet-final TokenStreams final to enforce
14595  decorator pattern. (Uwe Schindler)
14596
14597Changes in runtime behavior
14598
14599* LUCENE-1677: Remove the system property to set SegmentReader class
14600  implementation.  (Uwe Schindler)
14601
14602* LUCENE-1960: As a consequence of the removal of Field.Store.COMPRESS,
14603  support for this type of field was removed. Lucene 3.0 is still able
14604  to read indexes with compressed fields, but as soon as merges occur
14605  or the index is optimized, all compressed fields are decompressed
14606  and converted to Field.Store.YES. Because of this, indexes with
14607  compressed fields can suddenly get larger. Also the first merge with
14608  decompression cannot be done in raw mode, so it is slower.
14609  This change has no effect on code that uses such old indexes;
14610  they behave as before (fields are automatically decompressed
14611  during read). Indexes converted to Lucene 3.0 format cannot be read
14612  anymore with previous versions.
14613  It is recommended to optimize your indexes after upgrading to convert
14614  to the new format and decompress all fields.
14615  If you want compressed fields, you can use CompressionTools, which
14616  creates a compressed byte[] to be added as a binary stored field. This
14617  cannot be done automatically, as you also have to decompress such
14618  fields when reading. You have to reindex to do that.
14619  (Michael Busch, Uwe Schindler)
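
The application-side round trip that LUCENE-1960 describes (compress before
storing a binary field, decompress after reading it) might look roughly like
this. The sketch uses java.util.zip directly as a stand-in; it does not call
CompressionTools, and CompressedFieldSketch and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class CompressedFieldSketch {
    // Compress the bytes that would be stored as a binary field value.
    static byte[] compress(byte[] value) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(value);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress a stored value on the read path; the caller must do this
    // explicitly, since nothing marks the field as compressed.
    static byte[] decompress(byte[] stored) {
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()) {
                    throw new IllegalArgumentException("truncated or unsupported compressed data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid compressed data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```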
14620
14621* LUCENE-2060: Changed ConcurrentMergeScheduler's default for
14622  maxNumThreads from 3 to 1, because in practice we get the most
14623  gains from running a single merge in the background.  More than one
14624  concurrent merge causes a lot of thrashing (though it's possible on
14625  SSD storage that there would be net gains).  (Jason Rutherglen,
14626  Mike McCandless)
14627
14628API Changes
14629
14630* LUCENE-1257, LUCENE-1984, LUCENE-1985, LUCENE-2057, LUCENE-1833, LUCENE-2012,
14631  LUCENE-1998: Port to Java 1.5:
14632
14633  - Add generics to public and internal APIs (see below).
14634  - Replace new Integer(int), new Double(double),... by static valueOf() calls.
14635  - Replace for-loops with Iterator by foreach loops.
14636  - Replace StringBuffer with StringBuilder.
14637  - Replace o.a.l.util.Parameter by Java 5 enums (see below).
14638  - Add @Override annotations.
14639  (Uwe Schindler, Robert Muir, Karl Wettin, Paul Elschot, Kay Kay, Shai Erera,
14640  DM Smith)
14641
14642* Generify Lucene API:
14643
14644  - TokenStream/AttributeSource: Now addAttribute()/getAttribute() return an
14645    instance of the requested attribute interface and no cast is needed anymore
14646    (LUCENE-1855).
14647  - NumericRangeQuery, NumericRangeFilter, and FieldCacheRangeFilter
14648    now have Integer, Long, Float, Double as type param (LUCENE-1857).
14649  - Document.getFields() returns List<Fieldable>.
14650  - Query.extractTerms(Set<Term>)
14651  - CharArraySet and stop word sets in core/contrib
14652  - PriorityQueue (LUCENE-1935)
14653  - TopDocCollector
14654  - DisjunctionMaxQuery (LUCENE-1984)
14655  - MultiTermQueryWrapperFilter
14656  - CloseableThreadLocal
14657  - MapOfSets
14658  - o.a.l.util.cache package
14659  - lots of internal APIs of IndexWriter
14660  (Uwe Schindler, Michael Busch, Kay Kay, Robert Muir, Adriano Crestani)
14661
14662* LUCENE-1944, LUCENE-1856, LUCENE-1957, LUCENE-1960, LUCENE-1961,
14663  LUCENE-1968, LUCENE-1970, LUCENE-1946, LUCENE-1971, LUCENE-1975,
14664  LUCENE-1972, LUCENE-1978, LUCENE-944, LUCENE-1979, LUCENE-1973, LUCENE-2011:
14665  Remove deprecated methods/constructors/classes:
14666
14667  - Remove all String/File directory paths in IndexReader /
14668    IndexSearcher / IndexWriter.
14669  - Remove FSDirectory.getDirectory()
14670  - Make FSDirectory abstract.
14671  - Remove Field.Store.COMPRESS (see above).
14672  - Remove Filter.bits(IndexReader) method and make
14673    Filter.getDocIdSet(IndexReader) abstract.
14674  - Remove old DocIdSetIterator methods and make the new ones abstract.
14675  - Remove some methods in PriorityQueue.
14676  - Remove old TokenStream API and backwards compatibility layer.
14677  - Remove RangeQuery, RangeFilter and ConstantScoreRangeQuery.
14678  - Remove SpanQuery.getTerms().
14679  - Remove ExtendedFieldCache, custom and auto caches, SortField.AUTO.
14680  - Remove old-style custom sort.
14681  - Remove legacy search setting in SortField.
14682  - Remove Hits and all references from core and contrib.
14683  - Remove HitCollector and its TopDocs support implementations.
14684  - Remove term field and accessors in MultiTermQuery
14685    (and fix Highlighter).
14686  - Remove deprecated methods in BooleanQuery.
14687  - Remove deprecated methods in Similarity.
14688  - Remove BoostingTermQuery.
14689  - Remove MultiValueSource.
14690  - Remove Scorer.explain(int).
14691  ...and some other minor ones (Uwe Schindler, Michael Busch, Mark Miller)
14692
14693* LUCENE-1925: Make IndexSearcher's subReaders and docStarts members
14694  protected; add expert ctor to directly specify reader, subReaders
14695  and docStarts.  (John Wang, Tim Smith via Mike McCandless)
14696
14697* LUCENE-1945: All public classes that have a close() method now
14698  also implement java.io.Closeable (IndexReader, IndexWriter, Directory,...).
14699  (Uwe Schindler)
14700
14701* LUCENE-1998: Change all Parameter instances to Java 5 enums. This
14702  is not a backwards break, only a change of the super class. Parameter
14703  was deprecated and will be removed in a later version.
14704  (DM Smith, Uwe Schindler)
14705
14706Bug fixes
14707
14708* LUCENE-1951: When the text provided to WildcardQuery has no wildcard
14709  characters (i.e. matches a single term), don't lose the boost and
14710  rewrite method settings.  Also, rewrite to PrefixQuery if the
14711  wildcard is of the form "foo*", for slightly faster performance. (Robert
14712  Muir via Mike McCandless)
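
The case distinction in LUCENE-1951 can be sketched as a small classifier. It
assumes '*' and '?' are the only wildcard characters and ignores escaping;
WildcardRewriteSketch and Kind are hypothetical names, not Lucene classes.

```java
class WildcardRewriteSketch {
    enum Kind { TERM, PREFIX, WILDCARD }

    // Decide which simpler query type, if any, a wildcard pattern reduces to.
    static Kind classify(String pattern) {
        int star = pattern.indexOf('*');      // first '*', or -1
        int question = pattern.indexOf('?');  // first '?', or -1
        if (star == -1 && question == -1) {
            return Kind.TERM;                 // no wildcards: matches a single term
        }
        if (question == -1 && star == pattern.length() - 1) {
            return Kind.PREFIX;               // only a trailing '*', e.g. "foo*"
        }
        return Kind.WILDCARD;                 // anything else needs full matching
    }
}
```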
14713
14714* LUCENE-2013: SpanRegexQuery does not work with QueryScorer.
14715  (Benjamin Keil via Mark Miller)
14716
14717* LUCENE-2088: addAttribute() should only accept interfaces that
14718  extend Attribute. (Shai Erera, Uwe Schindler)
14719
14720* LUCENE-2045: Fix silly FileNotFoundException hit if you enable
14721  infoStream on IndexWriter and then add an empty document and commit
14722  (Shai Erera via Mike McCandless)
14723
14724* LUCENE-2046: IndexReader should not see the index as changed, after
14725  IndexWriter.prepareCommit has been called but before
14726  IndexWriter.commit is called. (Peter Keegan via Mike McCandless)
14727
14728New features
14729
14730* LUCENE-1933: Provide a convenience AttributeFactory that creates a
14731  Token instance for all basic attributes.  (Uwe Schindler)
14732
14733* LUCENE-2041: Parallelize the rest of ParallelMultiSearcher. Lots of
14734  code refactoring and Java 5 concurrent support in MultiSearcher.
14735  (Joey Surls, Simon Willnauer via Uwe Schindler)
14736
14737* LUCENE-2051: Add CharArraySet.copy() as a simple method to copy
14738  any Set<?> to a CharArraySet; the copy is optimized if the Set<?> is already
14739  a CharArraySet.  (Simon Willnauer)
14740
14741Optimizations
14742
14743* LUCENE-1183: Optimize Levenshtein Distance computation in
14744  FuzzyQuery.  (Cédrik Lime via Mike McCandless)
14745
14746* LUCENE-2006: Optimization of FieldDocSortedHitQueue to always
14747  use Comparable<?> interface.  (Uwe Schindler, Mark Miller)
14748
14749* LUCENE-2087: Remove recursion in NumericRangeTermEnum.
14750  (Uwe Schindler)
14751
14752Build
14753
14754* LUCENE-486: Remove test->demo dependencies. (Michael Busch)
14755
14756* LUCENE-2024: Raise build requirements to Java 1.5 and ANT 1.7.0
14757  (Uwe Schindler, Mike McCandless)
14758
14759======================= Release 2.9.1 =======================
14760
14761Changes in backwards compatibility policy
14762
14763 * LUCENE-2002: Add required Version matchVersion argument when
14764   constructing QueryParser or MultiFieldQueryParser, and default (as
14765   of 2.9) enablePositionIncrements to true to match
14766   StandardAnalyzer's 2.9 default (Uwe Schindler, Mike McCandless)
14767
14768Bug fixes
14769
14770 * LUCENE-1974: Fixed nasty bug in BooleanQuery (when it used
14771   BooleanScorer for scoring), whereby some matching documents fail to
14772   be collected.  (Fulin Tang via Mike McCandless)
14773
14774 * LUCENE-1124: Make sure FuzzyQuery always matches the precise term.
14775   (stefatwork@gmail.com via Mike McCandless)
14776
14777 * LUCENE-1976: Fix IndexReader.isCurrent() to return the right thing
14778   when the reader is a near real-time reader.  (Jake Mannix via Mike
14779   McCandless)
14780
14781 * LUCENE-1986: Fix NPE when scoring PayloadNearQuery (Peter Keegan,
14782   Mark Miller via Mike McCandless)
14783
14784 * LUCENE-1992: Fix thread hazard if a merge is committing just as an
14785   exception occurs during sync (Uwe Schindler, Mike McCandless)
14786
14787 * LUCENE-1995: Note in javadocs that IndexWriter.setRAMBufferSizeMB
14788   cannot exceed 2048 MB, and throw IllegalArgumentException if it
14789   does.  (Aaron McKee, Yonik Seeley, Mike McCandless)
14790
14791 * LUCENE-2004: Fix Constants.LUCENE_MAIN_VERSION to not be inlined
14792   by client code.  (Uwe Schindler)
14793
14794 * LUCENE-2016: Replace illegal U+FFFF character with the replacement
14795   char (U+FFFD) during indexing, to prevent silent index corruption.
14796   (Peter Keegan, Mike McCandless)
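
The substitution described in LUCENE-2016 amounts to the following. This is a
minimal sketch on a plain String; InvalidCharSketch is a hypothetical name and
the real fix is applied inside the indexing code, not by the application.

```java
class InvalidCharSketch {
    // Replace every U+FFFF with the Unicode replacement character U+FFFD.
    static String sanitize(String text) {
        return text.replace('\uFFFF', '\uFFFD');
    }
}
```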
14797
14798API Changes
14799
14800 * Un-deprecate search(Weight weight, Filter filter, int n) from
14801   Searchable interface (deprecated by accident).  (Uwe Schindler)
14802
14803 * Un-deprecate o.a.l.util.Version constants.  (Mike McCandless)
14804
14805 * LUCENE-1987: Un-deprecate some ctors of Token, as they will not
14806   be removed in 3.0 and are still useful. Also add some missing
14807   o.a.l.util.Version constants for enabling invalid acronym
14808   settings in StandardAnalyzer to be compatible with the coming
14809   Lucene 3.0.  (Uwe Schindler)
14810
14811 * LUCENE-1973: Un-deprecate IndexSearcher.setDefaultFieldSortScoring,
14812   to allow controlling per-IndexSearcher whether scores are computed
14813   when sorting by field.  (Uwe Schindler, Mike McCandless)
14814
14815 * LUCENE-2043: Make IndexReader.commit(Map<String,String>) public.
14816   (Mike McCandless)
14817
14818Documentation
14819
14820 * LUCENE-1955: Fix Hits deprecation notice to point users in right
14821   direction. (Mike McCandless, Mark Miller)
14822
14823 * Fix javadoc about score tracking done by search methods in Searcher
14824   and IndexSearcher.  (Mike McCandless)
14825
14826 * LUCENE-2008: Javadoc improvements for TokenStream/Tokenizer/Token
14827   (Luke Nezda via Mike McCandless)
14828
14829======================= Release 2.9.0 =======================
14830
14831Changes in backwards compatibility policy
14832
14833 * LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no
14834    longer computes a document score for each hit by default.  If
14835    document score tracking is still needed, you can call
14836    IndexSearcher.setDefaultFieldSortScoring(true, true) to enable
14837    both per-hit and maxScore tracking; however, this is deprecated
14838    and will be removed in 3.0.
14839
14840    Alternatively, use Searchable.search(Weight, Filter, Collector)
14841    and pass in a TopFieldCollector instance, using the following code
14842    sample:
14843
14844    <code>
14845      TopFieldCollector tfc = TopFieldCollector.create(sort, numHits, fillFields,
14846                                                       true /* trackDocScores */,
14847                                                       true /* trackMaxScore */,
14848                                                       false /* docsInOrder */);
14849      searcher.search(query, tfc);
14850      TopDocs results = tfc.topDocs();
14851    </code>
14852
14853    Note that your Sort object cannot use SortField.AUTO when you
14854    directly instantiate TopFieldCollector.
14855
14856    Also, the method search(Weight, Filter, Collector) was added to
14857    the Searchable interface and the Searcher abstract class to
14858    replace the deprecated HitCollector versions.  If you either
14859    implement Searchable or extend Searcher, you should change your
14860    code to implement this method.  If you already extend
14861    IndexSearcher, no further changes are needed to use Collector.
14862
14863    Finally, the values Float.NaN and Float.NEGATIVE_INFINITY are not
14864    valid scores.  Lucene uses these values internally in certain
14865    places, so if you have hits with such scores, it will cause
14866    problems. (Shai Erera via Mike McCandless)
14867
14868 * LUCENE-1687: All methods and parsers from the interface ExtendedFieldCache
14869    have been moved into FieldCache. ExtendedFieldCache is now deprecated and
14870    contains only a few declarations for binary backwards compatibility.
14871    ExtendedFieldCache will be removed in version 3.0. Users of FieldCache and
14872    ExtendedFieldCache will be able to plug in Lucene 2.9 without recompilation.
14873    The auto cache (FieldCache.getAuto) is now deprecated. Due to the merge of
14874    ExtendedFieldCache and FieldCache, FieldCache can now return
14875    long[] and double[] arrays in addition to int[] and float[] and StringIndex.
14876
14877    The interface changes are only notable for users implementing the interfaces,
14878    which is unlikely, because there is no way to change
14879    Lucene's FieldCache implementation.  (Grant Ingersoll, Uwe Schindler)
14880
14881 * LUCENE-1630, LUCENE-1771: Weight, previously an interface, is now an abstract
14882    class. Some of the method signatures have changed, but it should be fairly
14883    easy to see what adjustments must be made to existing code to sync up
14884    with the new API. You can find more detail in the API Changes section.
14885
14886    Going forward Searchable will be kept for convenience only and may
14887    be changed between minor releases without any deprecation
14888    process. It is not recommended that you implement it, but rather extend
14889    Searcher.
14890    (Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)
14891
14892 * LUCENE-1422, LUCENE-1693: The new Attribute based TokenStream API (see below)
14893    has some backwards breaks in rare cases. We did our best to make the
14894    transition as easy as possible and you are not likely to run into any problems.
14895    If your tokenizers still implement next(Token) or next(), the calls are
14896    automatically wrapped. The indexer and query parser use the new API
14897    (e.g. they call incrementToken()). All core TokenStreams are implemented using
14898    the new API. You can mix old and new API style TokenFilters/TokenStream.
14899    Problems only occur when you have done the following:
14900    You have overridden next(Token) or next() in one of the non-abstract core
14901    TokenStreams/-Filters. These classes should normally be final, but some
14902    of them are not. In this case, next(Token)/next() would never be called.
14903    To fail early with a hard compile/runtime error, the next(Token)/next()
14904    methods in these TokenStreams/-Filters were made final in this release.
14905    (Michael Busch, Uwe Schindler)
14906
14907 * LUCENE-1763: MergePolicy now requires an IndexWriter instance to
14908    be passed upon instantiation. As a result, IndexWriter was removed
14909    as a method argument from all MergePolicy methods. (Shai Erera via
14910    Mike McCandless)
14911
14912 * LUCENE-1748: LUCENE-1001 introduced PayloadSpans, but this was a back
14913    compat break and caused custom SpanQuery implementations to fail at runtime
14914    in a variety of ways. This issue attempts to remedy things by causing
14915    a compile time break on custom SpanQuery implementations and removing
14916    the PayloadSpans class, with its functionality now moved to Spans. To
14917    help in alleviating future back compat pain, Spans has been changed from
14918    an interface to an abstract class.
14919    (Hugh Cayless, Mark Miller)
14920
14921 * LUCENE-1808: Query.createWeight has been changed from protected to
14922    public. This will be a back compat break if you have overridden this
14923    method - but you are likely already affected by the LUCENE-1630 (make Weight
14924    abstract rather than an interface) back compat break if you have overridden
14925    Query.createWeight, so we have taken the opportunity to make this change.
14926    (Tim Smith, Shai Erera via Mark Miller)
14927
14928 * LUCENE-1708 - IndexReader.document() no longer checks if the document is
14929    deleted. You can call IndexReader.isDeleted(n) prior to calling document(n).
14930    (Shai Erera via Mike McCandless)
14931
14932
14933Changes in runtime behavior
14934
14935 * LUCENE-1424: QueryParser now by default uses constant score auto
14936    rewriting when it generates a WildcardQuery and PrefixQuery (it
14937    already does so for TermRangeQuery, as well).  Call
14938    setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE)
14939    to revert to slower BooleanQuery rewriting method.  (Mark Miller via Mike
14940    McCandless)
14941
14942 * LUCENE-1575: As of 2.9, the core collectors as well as
14943    IndexSearcher's search methods that return top N results, no
14944    longer filter documents with scores <= 0.0. If you rely on this
14945    functionality you can use PositiveScoresOnlyCollector like this:
14946
14947    <code>
14948      TopDocsCollector tdc = new TopScoreDocCollector(10);
14949      Collector c = new PositiveScoresOnlyCollector(tdc);
14950      searcher.search(query, c);
14951      TopDocs hits = tdc.topDocs();
14952      ...
14953    </code>
14954
14955 * LUCENE-1604: IndexReader.norms(String field) is now allowed to
14956    return null if the field has no norms, as long as you've
14957    previously called IndexReader.setDisableFakeNorms(true).  This
14958    setting now defaults to false (to preserve the fake norms back
14959    compatible behavior) but in 3.0 will be hardwired to true.  (Shon
14960    Vella via Mike McCandless).
14961
14962 * LUCENE-1624: If you open IndexWriter with create=true and
14963    autoCommit=false on an existing index, IndexWriter no longer
14964    writes an empty commit when it's created.  (Paul Taylor via Mike
14965    McCandless)
14966
14967 * LUCENE-1593: When you call Sort() or Sort.setSort(String field,
14968    boolean reverse), the resulting SortField array no longer ends
14969    with SortField.FIELD_DOC (it was unnecessary as Lucene breaks ties
14970    internally by docID). (Shai Erera via Michael McCandless)
14971
14972 * LUCENE-1542: When the first token(s) have 0 position increment,
14973    IndexWriter used to incorrectly record the position as -1, if no
14974    payload is present, or Integer.MAX_VALUE if a payload is present.
14975    This causes positional queries to fail to match.  The bug is now
14976    fixed, but if your app relies on the buggy behavior then you must
14977    call IndexWriter.setAllowMinus1Position().  That API is deprecated
14978    so you must fix your application, and rebuild your index, to not
14979    rely on this behavior by the 3.0 release of Lucene. (Jonathan
14980    Mamou, Mark Miller via Mike McCandless)
14981
14982
14983 * LUCENE-1715: Finalizers have been removed from the 4 core classes
14984    that still had them, since they will cause GC to take longer, thus
14985    tying up memory for longer, and at best they mask buggy app code.
14986    DirectoryReader (returned from IndexReader.open) & IndexWriter
14987    previously released the write lock during finalize.
14988    SimpleFSDirectory.FSIndexInput closed the descriptor in its
14989    finalizer, and NativeFSLock released the lock.  It's possible
14990    applications will be affected by this, but only if the application
14991    is failing to close reader/writers.  (Brian Groose via Mike
14992    McCandless)
14993
14994 * LUCENE-1717: Fixed IndexWriter to account for RAM usage of
14995    buffered deletions.  (Mike McCandless)
14996
14997 * LUCENE-1727: Ensure that fields are stored & retrieved in the
14998    exact order in which they were added to the document.  This was
14999    true in all Lucene releases before 2.3, but was broken in 2.3 and
15000    2.4, and is now fixed in 2.9.  (Mike McCandless)
15001
15002 * LUCENE-1678: The addition of Analyzer.reusableTokenStream
15003    accidentally broke back compatibility of external analyzers that
15004    subclassed core analyzers that implemented tokenStream but not
15005    reusableTokenStream.  This is now fixed, such that if
15006    reusableTokenStream is invoked on such a subclass, that method
15007    will forcefully fall back to tokenStream.  (Mike McCandless)
15008
15009 * LUCENE-1801: Token.clear() and Token.clearNoTermBuffer() now also clear
15010    startOffset, endOffset and type. This is not likely to affect any
15011    Tokenizer chains, as Tokenizers normally always set these three values.
15012    This change was made to conform to the new AttributeImpl.clear() and
15013    AttributeSource.clearAttributes(), so that they work identically for Token (as the
15014    one-for-all AttributeImpl) and the 6 separate AttributeImpls. (Uwe Schindler, Michael Busch)
15015
15016 * LUCENE-1483: When searching over multiple segments, a new Scorer is now created
15017    for each segment. Searching has been telescoped out a level and IndexSearcher now
15018    operates much like MultiSearcher does. The Weight is created only once for the top
15019    level Searcher, but each Scorer is passed a per-segment IndexReader. This will
15020    result in doc ids in the Scorer being internal to the per-segment IndexReader. It
15021    has always been outside of the API to count on a given IndexReader to contain every
15022    doc id in the index - and if you have been ignoring MultiSearcher in your custom code
15023    and counting on this fact, you will find your code no longer works correctly. If a
15024    custom Scorer implementation uses any caches/filters that rely on being based on the
15025    top level IndexReader, it will need to be updated to correctly use contextless
15026    caches/filters, e.g. you can't count on the IndexReader to contain any given doc id or
15027    all of the doc ids. (Mark Miller, Mike McCandless)
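
The relationship between per-segment and top-level doc ids in LUCENE-1483 can
be sketched as follows. The docStarts array (first top-level doc id of each
segment, in ascending order, with no empty segments) and the class
DocBaseSketch are assumptions for illustration only.

```java
import java.util.Arrays;

class DocBaseSketch {
    // Convert a doc id that is local to one segment into a top-level doc id.
    static int toTopLevel(int[] docStarts, int segment, int localDoc) {
        return docStarts[segment] + localDoc;
    }

    // Find the segment that contains a given top-level doc id.
    static int segmentOf(int[] docStarts, int topLevelDoc) {
        int idx = Arrays.binarySearch(docStarts, topLevelDoc);
        // Not found: binarySearch returns -(insertionPoint) - 1, and the
        // owning segment is the one just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }
}
```

A Scorer only ever sees the local id; code that needs the top-level id has to
add the segment's start offset itself.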
15028
15029 * LUCENE-1846: DateTools now uses the US locale to format the numbers in its
15030    date/time strings instead of the default locale. For most locales there will
15031    be no change in the index format, as DateFormatSymbols is using ASCII digits.
15032    The usage of the US locale is important to guarantee correct ordering of
15033    generated terms.  (Uwe Schindler)
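
Why a fixed locale matters for LUCENE-1846 can be shown with plain JDK
classes. The sketch below assumes a day-resolution "yyyyMMdd" pattern and the
GMT time zone; DateTermSketch is a hypothetical name and this is not the
DateTools source.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateTermSketch {
    // Format a timestamp as a sortable term. Locale.US keeps the digits ASCII
    // regardless of the JVM's default locale.
    static String dayTerm(long millis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(new Date(millis));
    }
}
```

With ASCII digits and fixed-width fields, comparing the strings
lexicographically gives the same order as comparing the timestamps.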
15034
15035 * LUCENE-1860: MultiTermQuery now defaults to
15036    CONSTANT_SCORE_AUTO_REWRITE_DEFAULT rewrite method (previously it
15037    was SCORING_BOOLEAN_QUERY_REWRITE).  This means that PrefixQuery
15038    and WildcardQuery will now produce constant score for all matching
15039    docs, equal to the boost of the query.  (Mike McCandless)
15040
15041API Changes
15042
15043 * LUCENE-1419: Add expert API to set custom indexing chain. This API is
15044   package-protected for now, so we don't have to officially support it.
15045   Yet, it will give us the possibility to try out different consumers
15046   in the chain. (Michael Busch)
15047
15048 * LUCENE-1427: DocIdSet.iterator() is now allowed to throw
15049   IOException.  (Paul Elschot, Mike McCandless)
15050
15051 * LUCENE-1422, LUCENE-1693: New TokenStream API that uses a new class called
15052   AttributeSource instead of the Token class, which is now a utility class that
15053   holds common Token attributes. All attributes that the Token class had have
15054   been moved into separate classes: TermAttribute, OffsetAttribute,
15055   PositionIncrementAttribute, PayloadAttribute, TypeAttribute and FlagsAttribute.
15056   The new API is much more flexible; it allows Attributes to be combined
15057   arbitrarily and custom Attributes to be defined. The new API has the same
15058   performance as the old next(Token) approach. For conformance with this new
15059   API Tee-/SinkTokenizer was deprecated and replaced by a new TeeSinkTokenFilter.
15060   (Michael Busch, Uwe Schindler; additional contributions and bug fixes by
15061   Daniel Shane, Doron Cohen)
15062
15063 * LUCENE-1467: Add nextDoc() and next(int) methods to OpenBitSetIterator.
15064   These methods can be used to avoid additional calls to doc().
15065   (Michael Busch)
15066
15067 * LUCENE-1468: Deprecate Directory.list(), which sometimes (in
15068   FSDirectory) filters out files that don't look like index files, in
15069   favor of new Directory.listAll(), which does no filtering.  Also,
15070   listAll() will never return null; instead, it throws an IOException
15071   (or subclass).  Specifically, FSDirectory.listAll() will throw the
15072   newly added NoSuchDirectoryException if the directory does not
15073   exist.  (Marcel Reutegger, Mike McCandless)
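
The contract change in LUCENE-1468 (throw instead of returning null) can be
sketched with java.io.File. ListAllSketch is hypothetical, and
FileNotFoundException is used here as a stand-in for NoSuchDirectoryException.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;

class ListAllSketch {
    // Return all file names in dir without filtering; never return null.
    static String[] listAll(File dir) throws IOException {
        // File.list() returns null if dir does not exist or is not a directory.
        String[] names = dir.list();
        if (names == null) {
            throw new FileNotFoundException(
                "directory '" + dir + "' does not exist or is not a directory");
        }
        return names;
    }
}
```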
15074
15075 * LUCENE-1546: Add IndexReader.flush(Map commitUserData), allowing
15076   you to record an opaque commitUserData (maps String -> String) into
15077   the commit written by IndexReader.  This matches IndexWriter's
15078   commit methods.  (Jason Rutherglen via Mike McCandless)
15079
15080 * LUCENE-652: Added org.apache.lucene.document.CompressionTools, to
15081   enable compressing & decompressing binary content, external to
15082   Lucene's indexing.  Deprecated Field.Store.COMPRESS.
15083
15084 * LUCENE-1561: Renamed Field.omitTf to Field.omitTermFreqAndPositions
15085    (Otis Gospodnetic via Mike McCandless)
15086
15087 * LUCENE-1500: Added new InvalidTokenOffsetsException to Highlighter methods
15088    to denote issues when offsets in TokenStream tokens exceed the length of the
15089    provided text.  (Mark Harwood)
15090
15091 * LUCENE-1575, LUCENE-1483: HitCollector is now deprecated in favor of
15092    a new Collector abstract class. For easy migration, people can use
15093    HitCollectorWrapper which translates (wraps) HitCollector into
15094    Collector. Note that this class is also deprecated and will be
15095    removed when HitCollector is removed.  Also TimeLimitedCollector
15096    is deprecated in favor of the new TimeLimitingCollector which
15097    extends Collector.  (Shai Erera, Mark Miller, Mike McCandless)
15098
15099 * LUCENE-1592: The method TermEnum.skipTo() was deprecated, because
15100    it is used nowhere in core/contrib and there is only a very inefficient
15101    default implementation available. If you want to position a TermEnum
15102    to another Term, create a new one using IndexReader.terms(Term).
15103    (Uwe Schindler)
15104
15105 * LUCENE-1621: MultiTermQuery.getTerm() has been deprecated as it does
15106    not make sense for all subclasses of MultiTermQuery. Check individual
15107    subclasses to see if they support getTerm().  (Mark Miller)
15108
15109 * LUCENE-1636: Make TokenFilter.input final so it's set only
15110    once. (Wouter Heijke, Uwe Schindler via Mike McCandless).
15111
15112 * LUCENE-1658, LUCENE-1451: Renamed FSDirectory to SimpleFSDirectory
15113    (but left an FSDirectory base class).  Added an FSDirectory.open
15114    static method to pick a good default FSDirectory implementation
15115    given the OS. FSDirectories should now be instantiated using
15116    FSDirectory.open or with public constructors rather than
15117    FSDirectory.getDirectory(), which has been deprecated.
15118    (Michael McCandless, Uwe Schindler, yonik)
15119
15120 * LUCENE-1665: Deprecate SortField.AUTO, to be removed in 3.0.
15121    Instead, when sorting by field, the application should explicitly
15122    state the type of the field.  (Mike McCandless)
15123
15124 * LUCENE-1660: StopFilter, StandardAnalyzer, StopAnalyzer now
15125    require up front specification of enablePositionIncrements (Mike
15126    McCandless)
15127
15128 * LUCENE-1614: DocIdSetIterator's next() and skipTo() were deprecated in favor
15129    of the new nextDoc() and advance(). The new methods return the doc Id they
15130    landed on, saving an extra call to doc() in most cases.
15131    For easy migration of the code, you can change the calls to next() to
15132    nextDoc() != DocIdSetIterator.NO_MORE_DOCS and similarly for skipTo().
15133    However it is advised that you take advantage of the returned doc ID and not
15134    call doc() following those two.
15135    Also, doc() was deprecated in favor of docID(). docID() should return -1 or
15136    NO_MORE_DOCS if nextDoc/advance were not called yet, or NO_MORE_DOCS if the
15137    iterator is exhausted. Otherwise it should return the current doc ID.
15138    (Shai Erera via Mike McCandless)
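
    The iteration contract described in LUCENE-1614 can be sketched with a
    small self-contained model. SketchDocIdIterator and collectAll are
    hypothetical names used only for illustration; this is not Lucene's
    DocIdSetIterator, only the nextDoc()/advance()/docID() behavior as the
    entry above describes it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the contract described above; not a Lucene class.
class SketchDocIdIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // sorted, distinct doc IDs
    private int idx = -1;     // -1 means nextDoc()/advance() was not called yet

    SketchDocIdIterator(int... sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Current doc ID: -1 before iteration starts, NO_MORE_DOCS once exhausted. */
    int docID() {
        if (idx < 0) return -1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    /** Moves to the next doc and returns its ID, or NO_MORE_DOCS. */
    int nextDoc() {
        if (idx < docs.length) idx++;
        return docID();
    }

    /** Moves to the first later doc whose ID is >= target and returns it. */
    int advance(int target) {
        int doc;
        do {
            doc = nextDoc();
        } while (doc < target);
        return doc;
    }

    /** Drains the iterator using the returned doc ID, with no separate doc() call. */
    static List<Integer> collectAll(SketchDocIdIterator it) {
        List<Integer> out = new ArrayList<>();
        int doc;
        while ((doc = it.nextDoc()) != NO_MORE_DOCS) {
            out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectAll(new SketchDocIdIterator(2, 5, 9))); // prints [2, 5, 9]
    }
}
```

    The loop in collectAll is the migration pattern the entry recommends:
    compare the value returned by nextDoc() against NO_MORE_DOCS and reuse
    that value instead of calling doc() afterwards.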
15139
15140 * LUCENE-1672: All ctors/opens and other methods using String/File to
15141    specify the directory in IndexReader, IndexWriter, and IndexSearcher
15142    were deprecated. You should instantiate the Directory manually before
15143    and pass it to these classes (LUCENE-1451, LUCENE-1658).
15144    (Uwe Schindler)
15145
15146 * LUCENE-1407: Move RemoteSearchable, RemoteCachingWrapperFilter out
15147    of Lucene's core into new contrib/remote package.  Searchable no
15148    longer extends java.rmi.Remote (Simon Willnauer via Mike
15149    McCandless)
15150
15151 * LUCENE-1677: The global property
15152    org.apache.lucene.SegmentReader.class, and
15153    ReadOnlySegmentReader.class are now deprecated, to be removed in
15154    3.0.  src/gcj/* has been removed. (Earwin Burrfoot via Mike
15155    McCandless)
15156
15157 * LUCENE-1673: Deprecated NumberTools in favour of the new
15158    NumericRangeQuery and its new indexing format for numeric or
15159    date values.  (Uwe Schindler)
15160
15161 * LUCENE-1630, LUCENE-1771: Weight is now an abstract class, and adds
15162    a scorer(IndexReader, boolean /* scoreDocsInOrder */, boolean /*
15163    topScorer */) method instead of scorer(IndexReader). IndexSearcher uses
15164    this method to obtain a scorer matching the capabilities of the Collector
15165    wrt orderedness of docIDs. Some Scorers (like BooleanScorer) are much more
15166    efficient if out-of-order document scoring is allowed by a Collector.
15167    Collector must now implement acceptsDocsOutOfOrder. If you write a
15168    Collector which does not care about doc ID order, it is recommended
15169    that you return true.  Weight has a scoresDocsOutOfOrder method, which by
15170    default returns false.  If you create a Weight which will score documents
15171    out of order if requested, you should override that method to return true.
15172    BooleanQuery's setAllowDocsOutOfOrder and getAllowDocsOutOfOrder have been
15173    deprecated as they are not needed anymore. BooleanQuery will now score docs
15174    out of order when used with a Collector that can accept docs out of order.
15175    Finally, Weight#explain now takes a sub-reader and sub-docID, rather than
15176    a top level reader and docID.
15177    (Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)
15178
15179 * LUCENE-1466, LUCENE-1906: Added CharFilter and MappingCharFilter, which allow
15180    chaining & mapping of characters before tokenizers run. CharStream (subclass of
15181    Reader) is the base class for custom java.io.Readers that support offset
15182    correction. Tokenizers got an additional method correctOffset() that is passed
15183    down to the underlying CharStream if input is a subclass of CharStream/-Filter.
15184    (Koji Sekiguchi via Mike McCandless, Uwe Schindler)
15185
15186 * LUCENE-1703: Add IndexWriter.waitForMerges.  (Tim Smith via Mike
15187    McCandless)
15188
15189 * LUCENE-1625: CheckIndex's programmatic API now returns separate
15190    classes detailing the status of each component in the index, and
15191    includes more detailed status than previously.  (Tim Smith via
15192    Mike McCandless)
15193
15194 * LUCENE-1713: Deprecated RangeQuery and RangeFilter and renamed to
15195    TermRangeQuery and TermRangeFilter. TermRangeQuery is in constant
15196    score auto rewrite mode by default. The new classes also have new
15197    ctors taking field and term ranges as Strings (see also
15198    LUCENE-1424).  (Uwe Schindler)
15199
15200 * LUCENE-1609: The termInfosIndexDivisor must now be specified
15201    up-front when opening the IndexReader.  Attempts to call
15202    IndexReader.setTermInfosIndexDivisor will hit an
15203    UnsupportedOperationException.  This was done to enable removal of
15204    all synchronization in TermInfosReader, which previously could
15205    cause threads to pile up in certain cases. (Dan Rosher via Mike
15206    McCandless)
15207
15208 * LUCENE-1688: Deprecate static final String stop word array in
15209    StopAnalyzer and replace it with an immutable implementation of
15210    CharArraySet.  (Simon Willnauer via Mark Miller)
15211
15212 * LUCENE-1742: SegmentInfos, SegmentInfo and SegmentReader have been
15213    made public as expert, experimental APIs.  These APIs may suddenly
15214    change from release to release (Jason Rutherglen via Mike
15215    McCandless).
15216
15217 * LUCENE-1754: QueryWeight.scorer() can return null if no documents
15218    are going to be matched by the query. Similarly,
15219    Filter.getDocIdSet() can return null if no documents are going to
15220    be accepted by the Filter. Note that these methods may return null
15221    but are not required to: they can instead return a Scorer that
15222    matches no documents or a DocIdSet that rejects all documents.  This is
15223    already the behavior of some QueryWeight/Filter implementations, and is
15224    documented here just for emphasis. (Shai Erera via Mike
15225    McCandless)
15226
15227 * LUCENE-1705: Added IndexWriter.deleteAllDocuments.  (Tim Smith via
15228    Mike McCandless)
15229
15230 * LUCENE-1460: Changed TokenStreams/TokenFilters in contrib to
15231    use the new TokenStream API. (Robert Muir, Michael Busch)
15232
15233 * LUCENE-1748: LUCENE-1001 introduced PayloadSpans, but this was a back
15234    compat break and caused custom SpanQuery implementations to fail at runtime
15235    in a variety of ways. This issue attempts to remedy things by causing
15236    a compile time break on custom SpanQuery implementations and removing
15237    the PayloadSpans class, with its functionality now moved to Spans. To
15238    help in alleviating future back compat pain, Spans has been changed from
15239    an interface to an abstract class.
15240    (Hugh Cayless, Mark Miller)
15241
15242 * LUCENE-1808: Query.createWeight has been changed from protected to
15243    public. (Tim Smith, Shai Erera via Mark Miller)
15244
15245 * LUCENE-1826: Add constructors that take AttributeSource and
15246    AttributeFactory to all Tokenizer implementations.
15247    (Michael Busch)
15248
15249 * LUCENE-1847: Similarity#idf for both a Term and Term Collection have
15250    been deprecated. New versions that return an IDFExplanation have been
15251    added.  (Yasoja Seneviratne, Mike McCandless, Mark Miller)
15252
15253 * LUCENE-1877: Made NativeFSLockFactory the default for
15254    the new FSDirectory API (open(), FSDirectory subclass ctors).
15255    All FSDirectory system properties were deprecated and all lock
15256    implementations use no lock prefix if the locks are stored inside
15257    the index directory. Because the deprecated String/File ctors of
15258    IndexWriter and IndexReader (LUCENE-1672) and FSDirectory.getDirectory()
15259    still use the old SimpleFSLockFactory while the new API uses
15260    NativeFSLockFactory, we strongly recommend not mixing the deprecated
15261    and new APIs. (Uwe Schindler, Mike McCandless)
15262
15263 * LUCENE-1911: Added a new method isCacheable() to DocIdSet. This method
15264    should return true, if the underlying implementation does not use disk
15265    I/O and is fast enough to be directly cached by CachingWrapperFilter.
15266    OpenBitSet, SortedVIntList, and DocIdBitSet are such candidates.
15267    The default implementation of the abstract DocIdSet class returns false.
15268    In this case, CachingWrapperFilter copies the DocIdSetIterator into an
15269    OpenBitSet for caching.  (Uwe Schindler, Thomas Becker)
15270
15271Bug fixes
15272
15273 * LUCENE-1415: MultiPhraseQuery has incorrect hashCode() and equals()
15274   implementation - Leads to Solr Cache misses.
15275   (Todd Feak, Mark Miller via yonik)
15276
15277 * LUCENE-1327: Fix TermSpans#skipTo() to behave as specified in javadocs
15278   of Spans#skipTo(). (Michael Busch)
15279
15280 * LUCENE-1573: Do not ignore InterruptedException (caused by
15281   Thread.interrupt()) nor enter deadlock/spin loop. Now, an interrupt
15282   will cause a RuntimeException to be thrown.  In 3.0 we will change
15283   public APIs to throw InterruptedException.  (Jeremy Volkman via
15284   Mike McCandless)
15285
15286 * LUCENE-1590: Stored-only Field instances no longer change the
15287   value of omitNorms, omitTermFreqAndPositions in FieldInfo; when you
15288   retrieve such fields they will now have omitNorms=true and
15289   omitTermFreqAndPositions=false (though these values are unused).
15290   (Uwe Schindler via Mike McCandless)
15291
15292 * LUCENE-1587: RangeQuery#equals() could consider a RangeQuery
15293   without a collator equal to one with a collator.
15294   (Mark Platvoet via Mark Miller)
15295
15296 * LUCENE-1600: Don't call String.intern unnecessarily in some cases
15297   when loading documents from the index.  (P Eger via Mike
15298   McCandless)
15299
15300 * LUCENE-1611: Fix case where OutOfMemoryError in IndexWriter
15301   could cause "infinite merging" to happen.  (Christiaan Fluit via
15302   Mike McCandless)
15303
15304 * LUCENE-1623: Properly handle back-compatibility of 2.3.x indexes that
15305   contain field names with non-ascii characters.  (Mike Streeton via
15306   Mike McCandless)
15307
15308 * LUCENE-1593: MultiSearcher and ParallelMultiSearcher did not break ties (in
15309   sort) by doc Id in a consistent manner (i.e., if Sort.FIELD_DOC was used vs.
15310   when it wasn't). (Shai Erera via Michael McCandless)
15311
15312 * LUCENE-1647: Fix case where IndexReader.undeleteAll would cause
15313    the segment's deletion count to be incorrect. (Mike McCandless)
15314
15315 * LUCENE-1542: When the first token(s) have 0 position increment,
15316    IndexWriter used to incorrectly record the position as -1, if no
15317    payload is present, or Integer.MAX_VALUE if a payload is present.
15318    This causes positional queries to fail to match.  The bug is now
15319    fixed, but if your app relies on the buggy behavior then you must
15320    call IndexWriter.setAllowMinus1Position().  That API is deprecated
15321    so you must fix your application, and rebuild your index, to not
15322    rely on this behavior by the 3.0 release of Lucene. (Jonathan
15323    Mamou, Mark Miller via Mike McCandless)
15324
15325 * LUCENE-1658: Fixed MMapDirectory to correctly throw IOExceptions
15326    on EOF, removed numeric overflow possibilities and added support
15327    for a hack to unmap the buffers on closing IndexInput.
15328    (Uwe Schindler)
15329
15330 * LUCENE-1681: Fix infinite loop caused by a call to DocValues methods
15331    getMinValue, getMaxValue, getAverageValue. (Simon Willnauer via Mark Miller)
15332
15333 * LUCENE-1599: Add clone support for SpanQuerys. SpanRegexQuery counts
15334    on this functionality and does not work correctly without it.
15335    (Billow Gao, Mark Miller)
15336
15337 * LUCENE-1718: Fix termInfosIndexDivisor to carry over to reopened
15338    readers (Mike McCandless)
15339
15340 * LUCENE-1583: SpanOrQuery skipTo() doesn't always move forwards as Spans
15341  documentation indicates it should.  (Moti Nisenson via Mark Miller)
15342
15343 * LUCENE-1566: Sun JVM Bug
15344    http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546 causes
15345    invalid OutOfMemoryError when reading too many bytes at once from
15346    a file on 32bit JVMs that have a large maximum heap size.  This
15347    fix adds set/getReadChunkSize to FSDirectory so that large reads
15348    are broken into chunks, to work around this JVM bug.  On 32bit
15349    JVMs the default chunk size is 100 MB; on 64bit JVMs, which don't
15350    show the bug, the default is Integer.MAX_VALUE. (Simon Willnauer
15351    via Mike McCandless)
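
    The chunking workaround in LUCENE-1566 amounts to capping the size of
    each individual read. The sketch below shows that idea on a plain
    java.io.InputStream. ChunkedReadSketch and readInChunks are hypothetical
    names; this is not FSDirectory's code, and chunkSize stands in for the
    value configured through set/getReadChunkSize.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Illustrative only: breaks one large read into reads of at most chunkSize bytes.
class ChunkedReadSketch {

    /** Fills dest completely and returns how many read() calls were needed. */
    static int readInChunks(InputStream in, byte[] dest, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be > 0");
        }
        int calls = 0;
        int done = 0;
        try {
            while (done < dest.length) {
                // Never ask the underlying stream for more than chunkSize bytes at once.
                int toRead = Math.min(chunkSize, dest.length - done);
                int n = in.read(dest, done, toRead);
                if (n < 0) {
                    throw new EOFException("read past EOF");
                }
                done += n;
                calls++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        byte[] src = new byte[10];
        byte[] dest = new byte[10];
        int calls = readInChunks(new ByteArrayInputStream(src), dest, 4);
        System.out.println("read calls: " + calls); // prints "read calls: 3"
    }
}
```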
15352
15353 * LUCENE-1448: Added TokenStream.end() to perform end-of-stream
15354    operations (i.e., to return the end offset of the tokenization).
15355    This is important when multiple fields with the same name are added
15356    to a document, to ensure offsets recorded in term vectors for all
15357    of the instances are correct.
15358    (Mike McCandless, Mark Miller, Michael Busch)
15359
15360 * LUCENE-1805: CloseableThreadLocal did not allow a null Object in get(),
15361    although it does allow it in set(Object). Fix get() to not assert the object
15362    is not null. (Shai Erera via Mike McCandless)
15363
15364 * LUCENE-1801: Changed all Tokenizers or TokenStreams (in core/contrib)
15365    that are the source of Tokens to always call
15366    AttributeSource.clearAttributes() first. (Uwe Schindler)
15367
15368 * LUCENE-1819: MatchAllDocsQuery.toString(field) should produce output
15369    that is parsable by the QueryParser.  (John Wang, Mark Miller)
15370
15371 * LUCENE-1836: Fix localization bug in the new query parser and add
15372    new LocalizedTestCase as base class for localization junit tests.
15373    (Robert Muir, Uwe Schindler via Michael Busch)
15374
15375 * LUCENE-1847: PhraseQuery/TermQuery/SpanQuery use IndexReader specific stats
15376    in their Weight#explain methods - these stats should be corpus wide.
15377    (Yasoja Seneviratne, Mike McCandless, Mark Miller)
15378
15379 * LUCENE-1885: Fix the bug that NativeFSLock.isLocked() did not work,
15380    if the lock was obtained by another NativeFSLock(Factory) instance.
15381    Because of this IndexReader.isLocked() and IndexWriter.isLocked() did
15382    not work correctly.  (Uwe Schindler)
15383
15384 * LUCENE-1899: Fix O(N^2) CPU cost when setting docIDs in order in an
15385    OpenBitSet, due to an inefficiency in how the underlying storage is
15386    reallocated.  (Nadav Har'El via Mike McCandless)
15387
15388 * LUCENE-1918: Fixed cases where a ParallelReader would
15389   generate exceptions on being passed to
15390   IndexWriter.addIndexes(IndexReader[]).  First case was when the
15391   ParallelReader was empty.  Second case was when the ParallelReader
15392   used to contain documents with TermVectors, but all such documents
15393   have been deleted. (Christian Kohlschütter via Mike McCandless)
15394
15395New features
15396
15397 * LUCENE-1411: Added expert API to open an IndexWriter on a prior
15398    commit, obtained from IndexReader.listCommits.  This makes it
15399    possible to rollback changes to an index even after you've closed
15400    the IndexWriter that made the changes, assuming you are using an
15401    IndexDeletionPolicy that keeps past commits around.  This is useful
15402    when building transactional support on top of Lucene.  (Mike
15403    McCandless)
15404
15405 * LUCENE-1382: Add an optional arbitrary Map (String -> String)
15406    "commitUserData" to IndexWriter.commit(), which is stored in the
15407    segments file and is then retrievable via
15408    IndexReader.getCommitUserData instance and static methods.
15409    (Shalin Shekhar Mangar via Mike McCandless)
15410
15411 * LUCENE-1420: Similarity now has a computeNorm method that allows
15412    custom Similarity classes to override how norm is computed.  It's
15413    provided a FieldInvertState instance that contains details from
15414    inverting the field.  The default impl is boost *
15415    lengthNorm(numTerms), to be backwards compatible.  Also added
15416    {set/get}DiscountOverlaps to DefaultSimilarity, to control whether
15417    overlapping tokens (tokens with 0 position increment) should be
15418    counted in lengthNorm.  (Andrzej Bialecki via Mike McCandless)
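
    The default norm computation in LUCENE-1420 can be written out as
    plain arithmetic. In this sketch the parameters length and numOverlap
    stand in for values carried by FieldInvertState, and lengthNorm is
    assumed to be 1/sqrt(numTerms) as in DefaultSimilarity. NormSketch is a
    hypothetical class, not the Similarity API itself.

```java
// Illustrative arithmetic only; the real method receives a FieldInvertState.
class NormSketch {

    /** Assumed DefaultSimilarity-style length normalization: 1 / sqrt(numTerms). */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /**
     * boost * lengthNorm(numTerms). With discountOverlaps enabled, tokens with
     * a position increment of 0 are not counted toward the field length.
     */
    static float computeNorm(float boost, int length, int numOverlap, boolean discountOverlaps) {
        int numTerms = discountOverlaps ? length - numOverlap : length;
        return boost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // 20 tokens, 4 of which overlap (for example, injected synonyms).
        System.out.println(computeNorm(2.0f, 20, 4, false));
        System.out.println(computeNorm(2.0f, 20, 4, true)); // prints 0.5
    }
}
```

    Discounting overlaps yields a larger norm for the same field, because
    injected tokens no longer make the field look longer than it is.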
15419
15420 * LUCENE-1424: Moved constant score query rewrite capability into
15421    MultiTermQuery, allowing TermRangeQuery, PrefixQuery and WildcardQuery
15422    to switch between constant-score rewriting or BooleanQuery
15423    expansion rewriting via a new setRewriteMethod method.
15424    Deprecated ConstantScoreRangeQuery (Mark Miller via Mike
15425    McCandless)
15426
15427 * LUCENE-1461: Added FieldCacheRangeFilter, a RangeFilter for
15428    single-term fields that uses FieldCache to compute the filter.  If
15429    your documents all have a single term for a given field, and you
15430    need to create many RangeFilters with varying lower/upper bounds,
15431    then this is likely a much faster way to create the filters than
15432    RangeFilter.  FieldCacheRangeFilter allows ranges on all data types
15433    that FieldCache supports (term ranges, byte, short, int, long, float, double).
15434    However, it comes at the expense of added RAM consumption and slower
15435    first-time usage due to populating the FieldCache.  It also does not
15436    support collation.  (Tim Sturge, Matt Ericson via Mike McCandless and
15437    Uwe Schindler)
15438
15439 * LUCENE-1296: add protected method CachingWrapperFilter.docIdSetToCache
15440    to allow subclasses to choose which DocIdSet implementation to use
15441    (Paul Elschot via Mike McCandless)
15442
15443 * LUCENE-1390: Added ASCIIFoldingFilter, a Filter that converts
15444    alphabetic, numeric, and symbolic Unicode characters which are not in
15445    the first 127 ASCII characters (the "Basic Latin" Unicode block) into
15446    their ASCII equivalents, if one exists. ISOLatin1AccentFilter, which
15447    handles a subset of this filter, has been deprecated.
15448    (Andi Vajda, Steven Rowe via Mark Miller)
15449
15450 * LUCENE-1478: Added new SortField constructor allowing you to
15451    specify a custom FieldCache parser to generate numeric values from
15452    terms for a field.  (Uwe Schindler via Mike McCandless)
15453
15454 * LUCENE-1528: Add support for Ideographic Space to the queryparser.
15455    (Luis Alves via Michael Busch)
15456
15457 * LUCENE-1487: Added FieldCacheTermsFilter, to filter by multiple
15458    terms on single-valued fields.  The filter loads the FieldCache
15459    for the field the first time it's called, and subsequent usage of
15460    that field, even with different Terms in the filter, are fast.
15461    (Tim Sturge, Shalin Shekhar Mangar via Mike McCandless).
15462
15463 * LUCENE-1314: Add clone(), clone(boolean readOnly) and
15464    reopen(boolean readOnly) to IndexReader.  Cloning an IndexReader
15465    gives you a new reader which you can make changes to (deletions,
15466    norms) without affecting the original reader.  With clone or reopen
15467    you can now also choose the readOnly setting of the new reader.  (Jason
15468    Rutherglen, Mike McCandless)
15469
15470 * LUCENE-1506: Added FilteredDocIdSet, an abstract class which you
15471    subclass to implement the "match" method to accept or reject each
15472    docID.  Unlike ChainedFilter (under contrib/misc),
15473    FilteredDocIdSet never requires you to materialize the full
15474    bitset.  Instead, match() is called on demand per docID.  (John
15475    Wang via Mike McCandless)
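
    The on-demand matching described in LUCENE-1506 can be modeled
    without any bitset. LazyFilterSketch below is a hypothetical,
    self-contained stand-in (not FilteredDocIdSet itself): it wraps an
    ascending list of doc IDs and consults the match predicate only for the
    doc IDs the caller actually iterates over.

```java
import java.util.function.IntPredicate;

// Illustrative lazy filter over ascending doc IDs; not a Lucene class.
class LazyFilterSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] inner;        // ascending doc IDs from the wrapped set
    private final IntPredicate match; // per-doc accept/reject decision
    private int pos = 0;
    int matchCalls = 0;               // how often match was consulted so far

    LazyFilterSketch(int[] inner, IntPredicate match) {
        this.inner = inner;
        this.match = match;
    }

    /** Returns the next accepted doc ID, calling match only as far as needed. */
    int nextDoc() {
        while (pos < inner.length) {
            int doc = inner[pos++];
            matchCalls++;
            if (match.test(doc)) {
                return doc;
            }
        }
        return NO_MORE_DOCS;
    }

    public static void main(String[] args) {
        LazyFilterSketch evens = new LazyFilterSketch(new int[] {1, 2, 3, 4, 5, 6}, d -> d % 2 == 0);
        System.out.println(evens.nextDoc() + " after " + evens.matchCalls + " match calls");
    }
}
```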
15476
15477 * LUCENE-1398: Add ReverseStringFilter to contrib/analyzers, a filter
15478    to reverse the characters in each token.  (Koji Sekiguchi via yonik)
15479
15480 * LUCENE-1551: Add expert IndexReader.reopen(IndexCommit) to allow
15481    efficiently opening a new reader on a specific commit, sharing
15482    resources with the original reader.  (Torin Danil via Mike
15483    McCandless)
15484
15485 * LUCENE-1434: Added org.apache.lucene.util.IndexableBinaryStringTools,
15486    to encode byte[] as String values that are valid terms, and
15487    maintain sort order of the original byte[] when the bytes are
15488    interpreted as unsigned.  (Steven Rowe via Mike McCandless)
15489
15490 * LUCENE-1543: Allow MatchAllDocsQuery to optionally use norms from
15491    a specific field to set the score for a document.  (Karl Wettin
15492    via Mike McCandless)
15493
15494 * LUCENE-1586: Add IndexReader.getUniqueTermCount().  (Mike
15495    McCandless via Derek)
15496
15497 * LUCENE-1516: Added "near real-time search" to IndexWriter, via a
15498    new expert getReader() method.  This method returns a reader that
15499    searches the full index, including any uncommitted changes in the
15500    current IndexWriter session.  This should result in a faster
15501    turnaround than the normal approach of committing the changes and
15502    then reopening a reader.  (Jason Rutherglen via Mike McCandless)
15503
15504 * LUCENE-1603: Added new MultiTermQueryWrapperFilter, to wrap any
15505    MultiTermQuery as a Filter.  Also made some improvements to
15506    MultiTermQuery: return DocIdSet.EMPTY_DOCIDSET if there are no
15507    terms in the enum; track the total number of terms it visited
15508    during rewrite (getTotalNumberOfTerms).  FilteredTermEnum is also
15509    more friendly to subclassing.  (Uwe Schindler via Mike McCandless)
15510
15511 * LUCENE-1605: Added BitVector.subset().  (Jeremy Volkman via Mike
15512    McCandless)
15513
15514 * LUCENE-1618: Added FileSwitchDirectory that enables files with
15515    specified extensions to be stored in a primary directory and the
15516    rest of the files to be stored in the secondary directory.  For
15517    example, this can be useful for the large doc-store (stored
15518    fields, term vectors) files in FSDirectory and the rest of the
15519    index files in a RAMDirectory. (Jason Rutherglen via Mike
15520    McCandless)
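
    The routing rule in LUCENE-1618 comes down to a lookup on the file
    extension. ExtensionRouterSketch is a hypothetical model of that
    decision, not FileSwitchDirectory's API. The extensions used in main
    (fdt, fdx, tvx, tvd, tvf) are assumed here to be the stored-field and
    term-vector files of this index format.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative routing of index files to a primary or secondary directory.
class ExtensionRouterSketch {
    private final Set<String> primaryExtensions;

    ExtensionRouterSketch(String... primaryExtensions) {
        this.primaryExtensions = new HashSet<>(Arrays.asList(primaryExtensions));
    }

    /** Text after the last '.', or the empty string if the name has no dot. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    /** Files with a listed extension go to the primary directory, all others to the secondary. */
    String route(String fileName) {
        return primaryExtensions.contains(extensionOf(fileName)) ? "primary" : "secondary";
    }

    public static void main(String[] args) {
        ExtensionRouterSketch router = new ExtensionRouterSketch("fdt", "fdx", "tvx", "tvd", "tvf");
        System.out.println(router.route("_0.fdt"));     // prints primary
        System.out.println(router.route("_0.tis"));     // prints secondary
        System.out.println(router.route("segments_2")); // prints secondary
    }
}
```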
15521
15522 * LUCENE-1494: Added FieldMaskingSpanQuery which can be used to
15523    cross-correlate Spans from different fields.
15524    (Paul Cowan and Chris Hostetter)
15525
15526 * LUCENE-1634: Add calibrateSizeByDeletes to LogMergePolicy, to take
15527    deletions into account when considering merges.  (Yasuhiro Matsuda
15528    via Mike McCandless)
15529
15530 * LUCENE-1550: Added new n-gram based String distance measure for spell checking.
15531    See the Javadocs for NGramDistance.java for a reference paper on why
15532    this is helpful (Tom Morton via Grant Ingersoll)
15533
15534 * LUCENE-1470, LUCENE-1582, LUCENE-1602, LUCENE-1673, LUCENE-1701, LUCENE-1712:
15535    Added NumericRangeQuery and NumericRangeFilter, a fast alternative to
15536    RangeQuery/RangeFilter for numeric searches. They depend on a specific
15537    structure of terms in the index that can be created by indexing
15538    using the new NumericField or NumericTokenStream classes. NumericField
15539    can only be used for indexing and optionally stores the values as
15540    a string representation in the doc store. Documents returned from
15541    IndexReader/IndexSearcher will return only the String value using
15542    the standard Fieldable interface. NumericFields can be sorted on
15543    and loaded into the FieldCache.  (Uwe Schindler, Yonik Seeley,
15544    Mike McCandless)
15545
15546 * LUCENE-1405: Added support for Ant resource collections in contrib/ant
15547    <index> task.  (Przemyslaw Sztoch via Erik Hatcher)
15548
15549 * LUCENE-1699: Allow setting a TokenStream on Field/Fieldable for indexing
15550    in conjunction with any other ways to specify stored field values,
15551    currently binary or string values.  (yonik)
15552
15553 * LUCENE-1701: Made the standard FieldCache.Parsers public and added
15554    parsers for fields generated using NumericField/NumericTokenStream.
15555    All standard parsers now also implement Serializable and enforce
15556    their singleton status.  (Uwe Schindler, Mike McCandless)
15557
15558 * LUCENE-1741: User configurable maximum chunk size in MMapDirectory.
15559    On 32 bit platforms, the address space can be very fragmented, so
15560    one big ByteBuffer for the whole file may not fit into address space.
15561    (Eks Dev via Uwe Schindler)
15562
15563 * LUCENE-1644: Enable 4 rewrite modes for queries deriving from
15564    MultiTermQuery (WildcardQuery, PrefixQuery, TermRangeQuery,
15565    NumericRangeQuery): CONSTANT_SCORE_FILTER_REWRITE first creates a
15566    filter and then assigns constant score (boost) to docs;
15567    CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE creates a BooleanQuery but
15568    uses a constant score (boost); SCORING_BOOLEAN_QUERY_REWRITE also
15569    creates a BooleanQuery but keeps the BooleanQuery's scores;
15570    CONSTANT_SCORE_AUTO_REWRITE tries to pick the most performant
15571    constant-score rewrite method.  (Mike McCandless)
15572
15573 * LUCENE-1448: Added TokenStream.end(), to perform end-of-stream
15574    operations.  This is currently used to fix offset problems when
15575    multiple fields with the same name are added to a document.
15576    (Mike McCandless, Mark Miller, Michael Busch)
15577
15578 * LUCENE-1776: Add an option to not collect payloads for an ordered
15579    SpanNearQuery. Payloads were not lazily loaded in this case as
15580    the javadocs implied. If you have payloads and want to use an ordered
15581    SpanNearQuery that does not need to use the payloads, you can
15582    disable loading them with a new constructor switch.  (Mark Miller)
15583
15584 * LUCENE-1341: Added PayloadNearQuery to enable SpanNearQuery functionality
15585    with payloads (Peter Keegan, Grant Ingersoll, Mark Miller)
15586
15587 * LUCENE-1790: Added PayloadTermQuery to enable scoring of payloads
15588    based on the maximum payload seen for a document.
15589    Slight refactoring of Similarity and other payload queries (Grant Ingersoll, Mark Miller)
15590
15591 * LUCENE-1749: Addition of FieldCacheSanityChecker utility, and
15592    hooks to use it in all existing Lucene Tests.  This class can
15593    be used by any application to inspect the FieldCache and provide
15594    diagnostic information about the possibility of inconsistent
15595    FieldCache usage.  Namely: FieldCache entries for the same field
15596    with different datatypes or parsers; and FieldCache entries for
15597    the same field in both a reader, and one of its (descendant) sub
15598    readers.
15599    (Chris Hostetter, Mark Miller)
15600
15601 * LUCENE-1789: Added utility class
15602    oal.search.function.MultiValueSource to ease the transition to
15603    segment based searching for any apps that directly call
15604    oal.search.function.* APIs.  This class wraps any other
15605    ValueSource, but takes care when composite (multi-segment) readers are
15606    passed to not double RAM usage in the FieldCache.  (Chris
15607    Hostetter, Mark Miller, Mike McCandless)
15608
15609Optimizations
15610
15611 * LUCENE-1427: Fixed QueryWrapperFilter to not waste time computing
15612    scores of the query, since they are just discarded.  Also, made it
15613    more efficient (single pass) by not creating & populating an
15614    intermediate OpenBitSet (Paul Elschot, Mike McCandless)
15615
15616 * LUCENE-1443: Performance improvement for OpenBitSetDISI.inPlaceAnd()
15617    (Paul Elschot via yonik)
15618
15619 * LUCENE-1484: Remove synchronization of IndexReader.document() by
15620    using CloseableThreadLocal internally.  (Jason Rutherglen via Mike
15621    McCandless).
15622
15623 * LUCENE-1124: Short circuit FuzzyQuery.rewrite when input token length
15624    is small compared to minSimilarity. (Timo Nentwig, Mark Miller)
15625
15626 * LUCENE-1316: MatchAllDocsQuery now avoids the synchronized
15627    IndexReader.isDeleted() call per document, by directly accessing
15628    the underlying deleteDocs BitVector.  This improves performance
15629    with non-readOnly readers, especially in a multi-threaded
15630    environment.  (Todd Feak, Yonik Seeley, Jason Rutherglen via Mike
15631    McCandless)
15632
15633 * LUCENE-1483: When searching over multiple segments we now visit
15634    each sub-reader one at a time.  This speeds up warming, since
15635    FieldCache entries (if required) can be shared across reopens for
15636    those segments that did not change, and also speeds up searches
15637    that sort by relevance or by field values.  (Mark Miller, Mike
15638    McCandless)
15639
15640 * LUCENE-1575: The new Collector class decouples collect() from
15641    score computation.  Collector.setScorer is called to establish the
15642    current Scorer in-use per segment.  Collectors that require the
15643    score should then call Scorer.score() per hit inside
15644    collect(). (Shai Erera via Mike McCandless)
15645
15646 * LUCENE-1596: MultiTermDocs speedup when set with
15647    MultiTermDocs.seek(MultiTermEnum) (yonik)
15648
15649 * LUCENE-1653: Avoid creating a Calendar in every call to
15650    DateTools#dateToString, DateTools#timeToString and
15651    DateTools#round.  (Shai Erera via Mark Miller)
15652
15653 * LUCENE-1688: Deprecate static final String stop word array and
15654    replace it with an immutable implementation of CharArraySet.
15655    Removes conversions between Set and array.
15656    (Simon Willnauer via Mark Miller)
15657
15658 * LUCENE-1754: BooleanQuery.queryWeight.scorer() will return null if
15659    it won't match any documents (e.g. if there are no required and
15660    optional scorers, or not enough optional scorers to satisfy
15661    minShouldMatch).  (Shai Erera via Mike McCandless)
15662
15663 * LUCENE-1607: To speed up string interning for commonly used
15664    strings, the StringHelper.intern() interface was added with a
15665    default implementation that uses a lockless cache.
15666    (Earwin Burrfoot, yonik)
15667
15668 * LUCENE-1800: QueryParser should use reusable TokenStreams. (yonik)
15669
15670
15671Documentation
15672
15673 * LUCENE-1908: Scoring documentation improvements in Similarity javadocs.
15674   (Mark Miller, Shai Erera, Ted Dunning, Jiri Kuhn, Marvin Humphrey, Doron Cohen)
15675
15676 * LUCENE-1872: NumericField javadoc improvements
15677    (Michael McCandless, Uwe Schindler)
15678
15679 * LUCENE-1875: Make TokenStream.end javadoc less confusing.
15680    (Uwe Schindler)
15681
15682 * LUCENE-1862: Rectified duplicate package level javadocs for
15683    o.a.l.queryParser and o.a.l.analysis.cn.
15684    (Chris Hostetter)
15685
15686 * LUCENE-1886: Improved hyperlinking in key Analysis javadocs
15687    (Bernd Fondermann via Chris Hostetter)
15688
15689 * LUCENE-1884: massive javadoc and comment cleanup, primarily dealing with
15690    typos.
15691    (Robert Muir via Chris Hostetter)
15692
15693 * LUCENE-1898: Switch changes to use bullets rather than numbers and
15694    update changes-to-html script to handle the new format.
15695    (Steven Rowe, Mark Miller)
15696
15697 * LUCENE-1900: Improve Searchable Javadoc.
15698    (Nadav Har'El, Doron Cohen, Marvin Humphrey, Mark Miller)
15699
15700 * LUCENE-1896: Improve Similarity#queryNorm javadocs.
15701    (Jiri Kuhn, Mark Miller)
15702
15703Build
15704
15705 * LUCENE-1440: Add new targets to build.xml that allow downloading
15706    and executing the junit testcases from an older release for
15707    backwards-compatibility testing. (Michael Busch)
15708
15709 * LUCENE-1446: Add compatibility tag to common-build.xml and run
15710    backwards-compatibility tests in the nightly build. (Michael Busch)
15711
15712 * LUCENE-1529: Properly test "drop-in" replacement of jar with
15713    backwards-compatibility tests. (Mike McCandless, Michael Busch)
15714
15715 * LUCENE-1851: Change 'javacc' and 'clean-javacc' targets to build
15716    and clean contrib/surround files. (Luis Alves via Michael Busch)
15717
15718 * LUCENE-1854: tar task should use longfile="gnu" to avoid false file
15719    name length warnings.  (Mark Miller)
15720
Test Cases

 * LUCENE-1791: Enhancements to the QueryUtils and CheckHits utility
    classes to wrap IndexReaders and Searchers in MultiReaders or
    MultiSearcher when possible to help exercise more edge cases.
    (Chris Hostetter, Mark Miller)

 * LUCENE-1852: Fix localization test failures.
    (Robert Muir via Michael Busch)

 * LUCENE-1843: Refactored all tests that use assertAnalyzesTo() & others
    in core and contrib to use a new BaseTokenStreamTestCase
    base class. Also rewrote some tests to use these general analysis assert
    functions instead of their own (e.g. TestMappingCharFilter).
    The new base class also tests tokenization with the TokenStream.next()
    backwards layer enabled (using Token/TokenWrapper as attribute
    implementation) and disabled (default for Lucene 3.0)
    (Uwe Schindler, Robert Muir)

 * LUCENE-1836: Added a new LocalizedTestCase as base class for localization
    junit tests.  (Robert Muir, Uwe Schindler via Michael Busch)

======================= Release 2.4.1 =======================

API Changes

1. LUCENE-1186: Add Analyzer.close() to free internal ThreadLocal
   resources.  (Christian Kohlschütter via Mike McCandless)

Bug fixes

1. LUCENE-1452: Fixed silent data-loss case whereby binary fields are
   truncated to 0 bytes during merging if the segments being merged
   are non-congruent (same field name maps to different field
   numbers).  This bug was introduced with LUCENE-1219.  (Andrzej
   Bialecki via Mike McCandless).

2. LUCENE-1429: Don't throw incorrect IllegalStateException from
   IndexWriter.close() if you've hit an OOM when autoCommit is true.
   (Mike McCandless)

3. LUCENE-1474: If IndexReader.flush() is called twice when there were
   pending deletions, it could lead to later false AssertionError
   during IndexReader.open.  (Mike McCandless)

4. LUCENE-1430: Fix false AlreadyClosedException from IndexReader.open
   (masking an actual IOException) that takes String or File path.
   (Mike McCandless)

5. LUCENE-1442: Multiple-valued NOT_ANALYZED fields can double-count
   token offsets.  (Mike McCandless)

6. LUCENE-1453: Ensure IndexReader.reopen()/clone() does not result in
   incorrectly closing the shared FSDirectory. This bug would only
   happen if you use IndexReader.open() with a File or String argument.
   The returned readers are wrapped by a FilterIndexReader that
   correctly handles closing of directory after reopen()/clone().
   (Mark Miller, Uwe Schindler, Mike McCandless)

7. LUCENE-1457: Fix possible overflow bugs during binary
   searches. (Mark Miller via Mike McCandless)

8. LUCENE-1459: Fix CachingWrapperFilter to not throw exception if
   both bits() and getDocIdSet() methods are called. (Matt Jones via
   Mike McCandless)

9. LUCENE-1519: Fix int overflow bug during segment merging.  (Deepak
   via Mike McCandless)

10. LUCENE-1521: Fix int overflow bug when flushing segment.
    (Shon Vella via Mike McCandless).

11. LUCENE-1544: Fix deadlock in IndexWriter.addIndexes(IndexReader[]).
    (Mike McCandless via Doug Sale)

12. LUCENE-1547: Fix rare thread safety issue if two threads call
    IndexWriter commit() at the same time.  (Mike McCandless)

13. LUCENE-1465: NearSpansOrdered returns payloads from first possible match
    rather than the correct, shortest match; Payloads could be returned even
    if the max slop was exceeded; The wrong payload could be returned in
    certain situations. (Jonathan Mamou, Greg Shackles, Mark Miller)

14. LUCENE-1186: Add Analyzer.close() to free internal ThreadLocal
    resources.  (Christian Kohlschütter via Mike McCandless)

15. LUCENE-1552: Fix IndexWriter.addIndexes(IndexReader[]) to properly
    rollback IndexWriter's internal state on hitting an
    exception. (Scott Garland via Mike McCandless)

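As an illustration of the kind of overflow addressed by LUCENE-1457 (and, for
plain int arithmetic, LUCENE-1519 and LUCENE-1521), here is a minimal Java
sketch. It is not the actual Lucene patch; the class and method names
(SafeBinarySearch, midpoint, search) are hypothetical. It shows why a midpoint
computed as (lo + hi) / 2 can turn negative for large indexes, and one common
overflow-safe alternative.

```java
// Illustrative sketch only; names are hypothetical and this is not Lucene code.
final class SafeBinarySearch {

    /** Overflow-safe midpoint: the unsigned shift reads the wrapped int sum as unsigned. */
    static int midpoint(int lo, int hi) {
        return (lo + hi) >>> 1;
    }

    /** Returns the index of key in the ascending-sorted array, or -1 if it is absent. */
    static int search(int[] sorted, int key) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = midpoint(lo, hi);
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else if (sorted[mid] > key) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int lo = 1_500_000_000;
        int hi = 2_000_000_000;
        System.out.println((lo + hi) / 2);    // negative, because the int sum wrapped around
        System.out.println(midpoint(lo, hi)); // 1750000000
    }
}
```

The same reasoning applies to any int sum or product of sizes and offsets:
either widen to long before the arithmetic, or use a form that cannot wrap.
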
======================= Release 2.4.0 =======================

Changes in backwards compatibility policy

1. LUCENE-1340: In a minor change to Lucene's backward compatibility
   policy, we are now allowing the Fieldable interface to have
   changes, within reason, and made on a case-by-case basis.  If an
   application implements its own Fieldable, please be aware of
   this.  Otherwise, no need to be concerned.  This is in effect for
   all 2.X releases, starting with 2.4.  Also note, that in all
   likelihood, Fieldable will be changed in 3.0.


Changes in runtime behavior

 1. LUCENE-1151: Fix StandardAnalyzer to not mis-identify host names
    (eg lucene.apache.org) as an ACRONYM.  To get back to the pre-2.4
    backwards compatible, but buggy, behavior, you can either call
    StandardAnalyzer.setDefaultReplaceInvalidAcronym(false) (static
    method), or, set system property
    org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym
    to "false" on JVM startup.  All StandardAnalyzer instances created
    after that will then show the pre-2.4 behavior.  Alternatively,
    you can call setReplaceInvalidAcronym(false) to change the
    behavior per instance of StandardAnalyzer.  This backwards
    compatibility will be removed in 3.0 (hardwiring the value to
    true).  (Mike McCandless)

 2. LUCENE-1044: IndexWriter with autoCommit=true now commits (such
    that a reader can see the changes) far less often than it used to.
    Previously, every flush was also a commit.  You can always force a
    commit by calling IndexWriter.commit().  Furthermore, in 3.0,
    autoCommit will be hardwired to false (IndexWriter constructors
    that take an autoCommit argument have been deprecated) (Mike
    McCandless)

 3. LUCENE-1335: IndexWriter.addIndexes(Directory[]) and
    addIndexesNoOptimize no longer allow the same Directory instance
    to be passed in more than once.  Internally, IndexWriter uses
    Directory and segment name to uniquely identify segments, so
    adding the same Directory more than once was causing duplicates
    which led to problems (Mike McCandless)

 4. LUCENE-1396: Improve PhraseQuery.toString() so that gaps in the
    positions are indicated with a ? and multiple terms at the same
    position are joined with a |.  (Andrzej Bialecki via Mike
    McCandless)

API Changes

 1. LUCENE-1084: Changed all IndexWriter constructors to take an
    explicit parameter for maximum field size.  Deprecated all the
    pre-existing constructors; these will be removed in release 3.0.
    NOTE: these new constructors set autoCommit to false.  (Steven
    Rowe via Mike McCandless)

 2. LUCENE-584: Changed Filter API to return a DocIdSet instead of a
    java.util.BitSet. This allows using more efficient data structures
    for Filters and makes them more flexible. This deprecates
    Filter.bits(), so all filters that implement this outside
    the Lucene code base will need to be adapted. See also the javadocs
    of the Filter class. (Paul Elschot, Michael Busch)

 3. LUCENE-1044: Added IndexWriter.commit() which flushes any buffered
    adds/deletes and then commits a new segments file so readers will
    see the changes.  Deprecate IndexWriter.flush() in favor of
    IndexWriter.commit().  (Mike McCandless)

 4. LUCENE-325: Added IndexWriter.expungeDeletes methods, which
    consult the MergePolicy to find merges necessary to merge away all
    deletes from the index.  This should be a somewhat lower cost
    operation than optimize.  (John Wang via Mike McCandless)

 5. LUCENE-1233: Return empty array instead of null when no fields
    match the specified name in these methods in Document:
    getFieldables, getFields, getValues, getBinaryValues.  (Stefan
    Trcek via Mike McCandless)

 6. LUCENE-1234: Make BoostingSpanScorer protected.  (Andi Vajda via Grant Ingersoll)

 7. LUCENE-510: The index now stores strings as true UTF-8 bytes
    (previously it was Java's modified UTF-8).  If any text, either
    stored fields or a token, has illegal UTF-16 surrogate characters,
    these characters are now silently replaced with the Unicode
    replacement character U+FFFD.  This is a change to the index file
    format.  (Marvin Humphrey via Mike McCandless)

 8. LUCENE-852: Let the SpellChecker caller specify IndexWriter mergeFactor
    and RAM buffer size.  (Otis Gospodnetic)

 9. LUCENE-1290: Deprecate org.apache.lucene.search.Hits, Hit and HitIterator
    and remove all references to these classes from the core. Also update demos
    and tutorials. (Michael Busch)

10. LUCENE-1288: Add getVersion() and getGeneration() to IndexCommit.
    getVersion() returns the same value that IndexReader.getVersion()
    returns when the reader is opened on the same commit.  (Jason
    Rutherglen via Mike McCandless)

11. LUCENE-1311: Added IndexReader.listCommits(Directory) static
    method to list all commits in a Directory, plus IndexReader.open
    methods that accept an IndexCommit and open the index as of that
    commit.  These methods are only useful if you implement a custom
    DeletionPolicy that keeps more than the last commit around.
    (Jason Rutherglen via Mike McCandless)

12. LUCENE-1325: Added IndexCommit.isOptimized().  (Shalin Shekhar
    Mangar via Mike McCandless)

13. LUCENE-1324: Added TokenFilter.reset(). (Shai Erera via Mike
    McCandless)

14. LUCENE-1340: Added Fieldable.omitTf() method to skip indexing term
    frequency, positions and payloads.  This saves index space, and
    indexing/searching time.  (Eks Dev via Mike McCandless)

15. LUCENE-1219: Add basic reuse API to Fieldable for binary fields:
    getBinaryValue/Offset/Length(); currently only lazy fields reuse
    the provided byte[] result to getBinaryValue.  (Eks Dev via Mike
    McCandless)

16. LUCENE-1334: Add new constructor for Term: Term(String fieldName)
    which defaults term text to "".  (DM Smith via Mike McCandless)

17. LUCENE-1333: Added Token.reinit(*) APIs to re-initialize (reuse) a
    Token.  Also added term() method to return a String, with a
    performance penalty clearly documented.  Also implemented
    hashCode() and equals() in Token, and fixed all core and contrib
    analyzers to use the re-use APIs.  (DM Smith via Mike McCandless)

18. LUCENE-1329: Add optional readOnly boolean when opening an
    IndexReader.  A readOnly reader is not allowed to make changes
    (deletions, norms) to the index; in exchange, the isDeleted
    method, often a bottleneck when searching with many threads, is
    not synchronized.  The default for readOnly is still false, but in
    3.0 the default will become true.  (Jason Rutherglen via Mike
    McCandless)

19. LUCENE-1367: Add IndexCommit.isDeleted().  (Shalin Shekhar Mangar
    via Mike McCandless)

20. LUCENE-1061: Factored out all "new XXXQuery(...)" in
    QueryParser.java into protected methods newXXXQuery(...) so that
    subclasses can create their own subclasses of each Query type.
    (John Wang via Mike McCandless)

21. LUCENE-753: Added new Directory implementation
    org.apache.lucene.store.NIOFSDirectory, which uses java.nio's
    FileChannel to do file reads.  On most non-Windows platforms, with
    many threads sharing a single searcher, this may yield sizable
    improvement to query throughput when compared to FSDirectory,
    which only allows a single thread to read from an open file at a
    time.  (Jason Rutherglen via Mike McCandless)

22. LUCENE-1371: Added convenience method TopDocs Searcher.search(Query query, int n).
    (Mike McCandless)

23. LUCENE-1356: Allow easy extensions of TopDocCollector by turning
    constructor and fields from package to protected. (Shai Erera
    via Doron Cohen)

24. LUCENE-1375: Added convenience method IndexCommit.getTimestamp,
    which is equivalent to
    getDirectory().fileModified(getSegmentsFileName()).  (Mike McCandless)

25. LUCENE-1366: Rename Field.Index options to be more accurate:
    TOKENIZED becomes ANALYZED;  UN_TOKENIZED becomes NOT_ANALYZED;
    NO_NORMS becomes NOT_ANALYZED_NO_NORMS and a new ANALYZED_NO_NORMS
    is added.  (Mike McCandless)

26. LUCENE-1131: Added numDeletedDocs method to IndexReader (Otis Gospodnetic)

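To make the LUCENE-510 entry concrete, the following Java sketch contrasts
Java's modified UTF-8 with standard UTF-8 and shows one way to replace unpaired
surrogates with U+FFFD. It uses only JDK classes and is an assumption-laden
illustration, not Lucene's encoder: the class and method names (Utf8Sketch,
replaceUnpairedSurrogates, modifiedUtf8Length, utf8Length) are hypothetical,
and DataOutputStream.writeUTF stands in here for "Java's modified UTF-8".

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; names are hypothetical and this is not Lucene code.
final class Utf8Sketch {

    /** Returns a copy of s in which every unpaired UTF-16 surrogate is replaced by U+FFFD. */
    static String replaceUnpairedSurrogates(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                sb.append(c).append(s.charAt(++i)); // valid pair: keep both halves
            } else if (Character.isSurrogate(c)) {
                sb.append('\uFFFD');                // lone surrogate: replace it
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Encoded length in Java's modified UTF-8, excluding writeUTF's 2-byte length prefix. */
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size() - 2;
    }

    /** Encoded length in standard UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For a supplementary character such as U+1F600, modified UTF-8 encodes each
surrogate half separately (6 bytes) while standard UTF-8 uses a single 4-byte
sequence, which is why the switch changes the bytes stored in the index.
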
Bug fixes

 1. LUCENE-1134: Fixed BooleanQuery.rewrite to only optimize a single
    clause query if minNumShouldMatch<=0. (Shai Erera via Michael Busch)

 2. LUCENE-1169: Fixed bug in IndexSearcher.search(): searching with
    a filter might miss some hits because scorer.skipTo() is called
    without checking if the scorer is already at the right position.
    scorer.skipTo(scorer.doc()) is not a NOOP, it behaves as
    scorer.next(). (Eks Dev, Michael Busch)

 3. LUCENE-1182: Added scorePayload to SimilarityDelegator (Andi Vajda via Grant Ingersoll)

 4. LUCENE-1213: MultiFieldQueryParser was ignoring slop in case
    of a single field phrase. (Trejkaz via Doron Cohen)

 5. LUCENE-1228: IndexWriter.commit() was not updating the index version and as
    result IndexReader.reopen() failed to sense index changes. (Doron Cohen)

 6. LUCENE-1267: Added numDocs() and maxDoc() to IndexWriter;
    deprecated docCount().  (Mike McCandless)

 7. LUCENE-1274: Added new prepareCommit() method to IndexWriter,
    which does phase 1 of a 2-phase commit (commit() does phase 2).
    This is needed when you want to update an index as part of a
    transaction involving external resources (eg a database).  Also
    deprecated abort(), renaming it to rollback().  (Mike McCandless)

 8. LUCENE-1003: Stop RussianAnalyzer from removing numbers.
    (TUSUR OpenTeam, Dmitry Lihachev via Otis Gospodnetic)

 9. LUCENE-1152: SpellChecker fix around clearIndex and indexDictionary
    methods, plus removal of IndexReader reference.
    (Naveen Belkale via Otis Gospodnetic)

10. LUCENE-1046: Removed dead code in SpellChecker
    (Daniel Naber via Otis Gospodnetic)

11. LUCENE-1189: Fixed the QueryParser to handle escaped characters within
    quoted terms correctly. (Tomer Gabel via Michael Busch)

12. LUCENE-1299: Fixed NPE in SpellChecker when IndexReader is not null and field is null (Grant Ingersoll)

13. LUCENE-1303: Fixed BoostingTermQuery's explanation to be marked as a Match
    depending only upon the non-payload score part, regardless of the effect of
    the payload on the score. Prior to this, score of a query containing a BTQ
    differed from its explanation. (Doron Cohen)

14. LUCENE-1310: Fixed SloppyPhraseScorer to work also for terms repeating more
    than twice in the query. (Doron Cohen)

15. LUCENE-1351: ISOLatin1AccentFilter now cleans additional ligatures (Cedrik Lime via Grant Ingersoll)

16. LUCENE-1383: Workaround a nasty "leak" in Java's builtin
    ThreadLocal, to prevent Lucene from causing unexpected
    OutOfMemoryError in certain situations (notably J2EE
    applications).  (Chris Lu via Mike McCandless)

New features

 1. LUCENE-1137: Added Token.set/getFlags() accessors for passing more information about a Token through the analysis
    process.  The flag is not indexed/stored and is thus only used by analysis.

 2. LUCENE-1147: Add -segment option to CheckIndex tool so you can
    check only a specific segment or segments in your index.  (Mike
    McCandless)

 3. LUCENE-1045: Reopened this issue to add support for short and bytes.

 4. LUCENE-584: Added new data structures to o.a.l.util, such as
    OpenBitSet and SortedVIntList. These extend DocIdSet and can
    directly be used for Filters with the new Filter API. Also changed
    the core Filters to use OpenBitSet instead of java.util.BitSet.
    (Paul Elschot, Michael Busch)

 5. LUCENE-494: Added QueryAutoStopWordAnalyzer to allow for the automatic removal, from a query, of frequently occurring terms.
    This Analyzer is not intended for use during indexing. (Mark Harwood via Grant Ingersoll)

 6. LUCENE-1044: Change Lucene to properly "sync" files after
    committing, to ensure on a machine or OS crash or power cut, even
    with cached writes, the index remains consistent.  Also added
    explicit commit() method to IndexWriter to force a commit without
    having to close.  (Mike McCandless)

 7. LUCENE-997: Add search timeout (partial) support.
    A TimeLimitedCollector was added to allow limiting search time.
    It is a partial solution since timeout is checked only when
    collecting a hit, and therefore a search for rare words in a
    huge index might not stop within the specified time.
    (Sean Timm via Doron Cohen)

 8. LUCENE-1184: Allow SnapshotDeletionPolicy to be re-used across
    close/re-open of IndexWriter while still protecting an open
    snapshot (Tim Brennan via Mike McCandless)

 9. LUCENE-1194: Added IndexWriter.deleteDocuments(Query) to delete
    documents matching the specified query.  Also added static unlock
    and isLocked methods (deprecating the ones in IndexReader).  (Mike
    McCandless)

10. LUCENE-1201: Add IndexReader.getIndexCommit() method. (Tim Brennan
    via Mike McCandless)

11. LUCENE-550:  Added InstantiatedIndex implementation.  Experimental
    Index store similar to MemoryIndex but allows for multiple documents
    in memory.  (Karl Wettin via Grant Ingersoll)

12. LUCENE-400: Added word based n-gram filter (in contrib/analyzers) called ShingleFilter and an Analyzer wrapper
    that wraps another Analyzer's token stream with a ShingleFilter (Sebastian Kirsch, Steve Rowe via Grant Ingersoll)

13. LUCENE-1166: Decomposition tokenfilter for languages like German and Swedish (Thomas Peuss via Grant Ingersoll)

14. LUCENE-1187: ChainedFilter and BooleanFilter now work with new Filter API
    and DocIdSetIterator-based filters. Backwards-compatibility with old
    BitSet-based filters is ensured. (Paul Elschot via Michael Busch)

15. LUCENE-1295: Added new method to MoreLikeThis for retrieving interesting terms and made retrieveTerms(int) public. (Grant Ingersoll)

16. LUCENE-1298: MoreLikeThis can now accept a custom Similarity (Grant Ingersoll)

17. LUCENE-1297: Allow other string distance measures for the SpellChecker
    (Thomas Morton via Otis Gospodnetic)

18. LUCENE-1001: Provide access to Payloads via Spans.  All existing Span Query implementations in Lucene implement this. (Mark Miller, Grant Ingersoll)

19. LUCENE-1354: Provide programmatic access to CheckIndex (Grant Ingersoll, Mike McCandless)

20. LUCENE-1279: Add support for Collators to RangeFilter/Query and Query Parser.  (Steve Rowe via Grant Ingersoll)

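The limitation noted under LUCENE-997 (the timeout is checked only when a hit is
collected) can be pictured with a small Java sketch. This is not the
TimeLimitedCollector source: HitSink, TimeLimitedSink, TimeExceeded and the
injectable clock are all hypothetical names chosen for the illustration.

```java
import java.util.function.LongSupplier;

// Illustrative sketch only; none of these types are Lucene classes.
final class TimeLimitSketch {

    /** Hypothetical stand-in for a hit collector. */
    interface HitSink {
        void collect(int doc, float score);
    }

    /** Thrown when a hit arrives after the time budget is used up. */
    static final class TimeExceeded extends RuntimeException {
        TimeExceeded(String message) {
            super(message);
        }
    }

    /** Wraps another sink and looks at the clock only when a hit is collected. */
    static final class TimeLimitedSink implements HitSink {
        private final HitSink delegate;
        private final LongSupplier clock;
        private final long deadline;

        TimeLimitedSink(HitSink delegate, long budget, LongSupplier clock) {
            this.delegate = delegate;
            this.clock = clock;
            this.deadline = clock.getAsLong() + budget;
        }

        @Override
        public void collect(int doc, float score) {
            // No hit means no check: a query that matches nothing for a long
            // stretch can run past the deadline without being interrupted.
            if (clock.getAsLong() - deadline > 0) {
                throw new TimeExceeded("time budget exceeded at doc " + doc);
            }
            delegate.collect(doc, score);
        }
    }
}
```

In practice the clock could be System::nanoTime with the budget in nanoseconds.
The supplier is injected here only so the behavior can be exercised without
sleeping.
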
Optimizations

 1. LUCENE-705: When building a compound file, use
    RandomAccessFile.setLength() to tell the OS/filesystem to
    pre-allocate space for the file.  This may improve fragmentation
    in how the CFS file is stored, and allows us to detect an upcoming
    disk full situation before actually filling up the disk.  (Mike
    McCandless)

 2. LUCENE-1120: Speed up merging of term vectors by bulk-copying the
    raw bytes for each contiguous range of non-deleted documents.
    (Mike McCandless)

 3. LUCENE-1185: Avoid checking if the TermBuffer 'scratch' in
    SegmentTermEnum is null for every call of scanTo().
    (Christian Kohlschuetter via Michael Busch)

 4. LUCENE-1217: Internal to Field.java, use isBinary instead of
    runtime type checking for possible speedup of binaryValue().
    (Eks Dev via Mike McCandless)

 5. LUCENE-1183: Optimized TRStringDistance class (in contrib/spell) that uses
    less memory than the previous version.  (Cédrik LIME via Otis Gospodnetic)

 6. LUCENE-1195: Improve term lookup performance by adding an LRU cache to the
    TermInfosReader. In performance experiments the speedup was about 25% on
    average on mid-size indexes with ~500,000 documents for queries with 3
    terms and about 7% on larger indexes with ~4.3M documents. (Michael Busch)

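For readers unfamiliar with the technique named in LUCENE-1195, this is a
minimal LRU cache sketch in Java built on java.util.LinkedHashMap. It is not the
cache used inside TermInfosReader; the class name LruCache and its capacity
parameter are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; this is not the cache implementation used by Lucene.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insertion; true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

A LinkedHashMap is not thread-safe, so a cache shared between search threads
would need external synchronization or a per-thread instance.
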
Documentation

  1. LUCENE-1236:  Added some clarifying remarks to EdgeNGram*.java (Hiroaki Kawai via Grant Ingersoll)

  2. LUCENE-1157 and LUCENE-1256: HTML changes log, created automatically
     from CHANGES.txt. This HTML file is currently visible only via developers page.
     (Steven Rowe via Doron Cohen)

  3. LUCENE-1349: Fieldable can now be changed without breaking backward compatibility rules (within reason.  See the note at
  the top of this file and also on Fieldable.java).  (Grant Ingersoll)

  4. LUCENE-1873: Update documentation to reflect current Contrib area status.
     (Steven Rowe, Mark Miller)

Build

  1. LUCENE-1153: Added JUnit JAR to new lib directory.  Updated build to rely on local JUnit instead of ANT/lib.

  2. LUCENE-1202: Small fixes to the way Clover is used to work better
     with contribs.  Of particular note: a single clover db is used
     regardless of whether tests are run globally or in the specific
     contrib directories.

  3. LUCENE-1353: Javacc target in contrib/miscellaneous for
     generating the precedence query parser.

Test Cases

 1. LUCENE-1238: Fixed intermittent failures of TestTimeLimitedCollector.testTimeoutMultiThreaded.
    Within this fix, "greedy" flag was added to TimeLimitedCollector, to allow the wrapped
    collector to collect also the last doc, after the allowed time has passed. (Doron Cohen)

 2. LUCENE-1348: relax TestTimeLimitedCollector to not fail due to
    timeout exceeded (just because test machine is very busy).

======================= Release 2.3.2 =======================

Bug fixes

 1. LUCENE-1191: On hitting OutOfMemoryError in any index-modifying
    methods in IndexWriter, do not commit any further changes to the
    index to prevent risk of possible corruption.  (Mike McCandless)

 2. LUCENE-1197: Fixed issue whereby IndexWriter would flush by RAM
    too early when TermVectors were in use.  (Mike McCandless)

 3. LUCENE-1198: Don't corrupt index if an exception happens inside
    DocumentsWriter.init (Mike McCandless)

 4. LUCENE-1199: Added defensive check for null indexReader before
    calling close in IndexModifier.close() (Mike McCandless)

 5. LUCENE-1200: Fix rare deadlock case in addIndexes* when
    ConcurrentMergeScheduler is in use (Mike McCandless)

 6. LUCENE-1208: Fix deadlock case on hitting an exception while
    processing a document that had triggered a flush (Mike McCandless)

 7. LUCENE-1210: Fix deadlock case on hitting an exception while
    starting a merge when using ConcurrentMergeScheduler (Mike McCandless)

 8. LUCENE-1222: Fix IndexWriter.doAfterFlush to always be called on
    flush (Mark Ferguson via Mike McCandless)

 9. LUCENE-1226: Fixed IndexWriter.addIndexes(IndexReader[]) to commit
    successfully created compound files. (Michael Busch)

10. LUCENE-1150: Re-expose StandardTokenizer's constants publicly;
    this was accidentally lost with LUCENE-966.  (Nicolas Lalevée via
    Mike McCandless)

11. LUCENE-1262: Fixed bug in BufferedIndexReader.refill whereby on
    hitting an exception in readInternal, the buffer is incorrectly
    filled with stale bytes such that subsequent calls to readByte()
    return incorrect results.  (Trejkaz via Mike McCandless)

12. LUCENE-1270: Fixed intermittent case where IndexWriter.close()
    would hang after IndexWriter.addIndexesNoOptimize had been
    called.  (Stu Hood via Mike McCandless)

Build

 1. LUCENE-1230: Include *pom.xml* in source release files. (Michael Busch)


======================= Release 2.3.1 =======================

Bug fixes

 1. LUCENE-1168: Fixed corruption cases when autoCommit=false and
    documents have mixed term vectors (Suresh Guvvala via Mike
    McCandless).

 2. LUCENE-1171: Fixed some cases where OOM errors could cause
    deadlock in IndexWriter (Mike McCandless).

 3. LUCENE-1173: Fixed corruption case when autoCommit=false and bulk
    merging of stored fields is used (Yonik via Mike McCandless).

 4. LUCENE-1163: Fixed bug in CharArraySet.contains(char[] buffer, int
    offset, int len) that was ignoring offset and thus giving the
    wrong answer.  (Thomas Peuss via Mike McCandless)

 5. LUCENE-1177: Fix rare case where IndexWriter.optimize might do too
    many merges at the end.  (Mike McCandless)

 6. LUCENE-1176: Fix corruption case when documents with no term
    vector fields are added before documents with term vector fields.
    (Mike McCandless)

 7. LUCENE-1179: Fixed assert statement that was incorrectly
    preventing Fields with empty-string field name from working.
    (Sergey Kabashnyuk via Mike McCandless)

======================= Release 2.3.0 =======================

Changes in runtime behavior

 1. LUCENE-994: Defaults for IndexWriter have been changed to maximize
    out-of-the-box indexing speed.  First, IndexWriter now flushes by
    RAM usage (16 MB by default) instead of a fixed doc count (call
    IndexWriter.setMaxBufferedDocs to get backwards compatible
    behavior).  Second, ConcurrentMergeScheduler is used to run merges
    using background threads (call IndexWriter.setMergeScheduler(new
    SerialMergeScheduler()) to get backwards compatible behavior).
    Third, merges are chosen based on size in bytes of each segment
    rather than document count of each segment (call
    IndexWriter.setMergePolicy(new LogDocMergePolicy()) to get
    backwards compatible behavior).

    NOTE: users of ParallelReader must change back all of these
    defaults in order to ensure the docIDs "align" across all parallel
    indices.

    (Mike McCandless)

 2. LUCENE-1045: SortField.AUTO didn't work with long. When detecting
    the field type for sorting automatically, numbers used to be
    interpreted as int, then as float, if parsing the number as an int
    failed. Now the detection checks for int, then for long,
    then for float. (Daniel Naber)

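The detection order described in LUCENE-1045 (int, then long, then float) can be
sketched in plain Java as follows. This is an illustration under assumptions, not
the SortField.AUTO implementation: the names AutoSortTypeSketch, Kind and detect
are hypothetical, and the final fallback to STRING is an assumption that the
entry itself does not state.

```java
// Illustrative sketch only; names are hypothetical and this is not Lucene code.
final class AutoSortTypeSketch {

    enum Kind { INT, LONG, FLOAT, STRING }

    /** Tries the narrowest numeric type first and widens only when parsing fails. */
    static Kind detect(String text) {
        try {
            Integer.parseInt(text);
            return Kind.INT;
        } catch (NumberFormatException notAnInt) {
            // fall through to the next, wider type
        }
        try {
            Long.parseLong(text);
            return Kind.LONG;
        } catch (NumberFormatException notALong) {
            // fall through to float
        }
        try {
            Float.parseFloat(text);
            return Kind.FLOAT;
        } catch (NumberFormatException notAFloat) {
            return Kind.STRING; // assumed fallback, not stated in the entry
        }
    }
}
```

With the old int-then-float order, a value such as 3000000000 would have been
treated as a float and lost precision; checking long before float avoids that.
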
API Changes

 1. LUCENE-843: Added IndexWriter.setRAMBufferSizeMB(...) to have
    IndexWriter flush whenever the buffered documents are using more
    than the specified amount of RAM.  Also added new APIs to Token
    that allow one to set a char[] plus offset and length to specify a
    token (to avoid creating a new String() for each Token).  (Mike
    McCandless)

 2. LUCENE-963: Add setters to Field to allow for re-using a single
    Field instance during indexing.  This is a sizable performance
    gain, especially for small documents.  (Mike McCandless)

 3. LUCENE-969: Add new APIs to Token, TokenStream and Analyzer to
    permit re-using of Token and TokenStream instances during
    indexing.  Changed Token to use a char[] as the store for the
    termText instead of String.  This gives faster tokenization
    performance (~10-15%).  (Mike McCandless)

 4. LUCENE-847: Factored MergePolicy, which determines which merges
    should take place and when, as well as MergeScheduler, which
    determines when the selected merges should actually run, out of
    IndexWriter.  The default merge policy is now
    LogByteSizeMergePolicy (see LUCENE-845) and the default merge
    scheduler is now ConcurrentMergeScheduler (see
    LUCENE-870). (Steven Parkes via Mike McCandless)

 5. LUCENE-1052: Add IndexReader.setTermInfosIndexDivisor(int) method
    that allows you to reduce memory usage of the termInfos by further
    sub-sampling (over the termIndexInterval that was used during
    indexing) which terms are loaded into memory.  (Chuck Williams,
    Doug Cutting via Mike McCandless)

 6. LUCENE-743: Add IndexReader.reopen() method that re-opens an
    existing IndexReader (see New features -> 8.) (Michael Busch)

 7. LUCENE-1062: Add setData(byte[] data),
    setData(byte[] data, int offset, int length), getData(), getOffset()
    and clone() methods to o.a.l.index.Payload. Also add the field name
    as arg to Similarity.scorePayload(). (Michael Busch)

 8. LUCENE-982: Add IndexWriter.optimize(int maxNumSegments) method to
    "partially optimize" an index down to maxNumSegments segments.
    (Mike McCandless)

 9. LUCENE-1080: Changed Token.DEFAULT_TYPE to be public.

10. LUCENE-1064: Changed TopDocs constructor to be public.
     (Shai Erera via Michael Busch)

11. LUCENE-1079: DocValues cleanup: constructor now has no params,
    and getInnerArray() now throws UnsupportedOperationException (Doron Cohen)

12. LUCENE-1089: Added PriorityQueue.insertWithOverflow, which returns
    the Object (if any) that was bumped from the queue to allow
    re-use.  (Shai Erera via Mike McCandless)

13. LUCENE-1101: Token reuse 'contract' (defined LUCENE-969)
    modified so it is token producer's responsibility
    to call Token.clear(). (Doron Cohen)

14. LUCENE-1118: Changed StandardAnalyzer to skip too-long (default >
    255 characters) tokens.  You can increase this limit by calling
    StandardAnalyzer.setMaxTokenLength(...).  (Michael McCandless)


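The idea behind the insertWithOverflow method of LUCENE-1089 can be sketched
with a bounded queue built on java.util.PriorityQueue. This is not Lucene's own
PriorityQueue class: BoundedQueue and its constructor parameters are hypothetical,
and returning the rejected argument itself when it is not competitive is an
assumption about the contract made for this example.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Illustrative sketch only; this is not org.apache.lucene.util.PriorityQueue.
final class BoundedQueue<T> {
    private final PriorityQueue<T> heap; // smallest element sits at the head
    private final Comparator<? super T> order;
    private final int maxSize;

    /** maxSize must be at least 1. */
    BoundedQueue(int maxSize, Comparator<? super T> order) {
        this.heap = new PriorityQueue<>(maxSize, order);
        this.order = order;
        this.maxSize = maxSize;
    }

    /**
     * Adds element while keeping at most maxSize entries. Returns the object that
     * did not make it into the queue (the old smallest entry, or the argument
     * itself), or null if there was still room. The caller can re-use that object.
     */
    T insertWithOverflow(T element) {
        if (heap.size() < maxSize) {
            heap.add(element);
            return null;
        }
        T smallest = heap.peek();
        if (order.compare(element, smallest) <= 0) {
            return element; // not competitive: rejected without touching the heap
        }
        heap.poll();
        heap.add(element);
        return smallest;
    }

    int size() {
        return heap.size();
    }
}
```

Handing the bumped object back lets a top-N collector recycle a single
allocation per rejected hit instead of creating a new object each time.
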
16349Bug fixes
16350
16351 1. LUCENE-933: QueryParser fixed to not produce empty sub
16352    BooleanQueries "()" even if the Analyzer produced no
16353    tokens for input. (Doron Cohen)
16354
16355 2. LUCENE-955: Fixed SegmentTermPositions to work correctly with the
16356    first term in the dictionary. (Michael Busch)
16357
16358 3. LUCENE-951: Fixed NullPointerException in MultiLevelSkipListReader
16359    that was thrown after a call of TermPositions.seek().
16360    (Rich Johnson via Michael Busch)
16361
16362 4. LUCENE-938: Fixed cases where an unhandled exception in
16363    IndexWriter's methods could cause deletes to be lost.
16364    (Steven Parkes via Mike McCandless)
16365
16366 5. LUCENE-962: Fixed case where an unhandled exception in
16367    IndexWriter.addDocument or IndexWriter.updateDocument could cause
16368    unreferenced files in the index to not be deleted
16369    (Steven Parkes via Mike McCandless)
16370
16371 6. LUCENE-957: RAMDirectory fixed to properly handle directories
16372    larger than Integer.MAX_VALUE. (Doron Cohen)
16373
16374 7. LUCENE-781: MultiReader fixed to not throw NPE if isCurrent(),
16375    isOptimized() or getVersion() is called. Separated MultiReader
16376    into two classes: MultiSegmentReader extends IndexReader, is
16377    package-protected and is created automatically by IndexReader.open()
16378    in case the index has multiple segments. The public MultiReader
16379    now extends MultiSegmentReader and is intended to be used by users
16380    who want to add their own subreaders. (Daniel Naber, Michael Busch)
16381
16382 8. LUCENE-970: FilterIndexReader now implements isOptimized(). Before
16383    a call of isOptimized() would throw a NPE. (Michael Busch)
16384
16385 9. LUCENE-832: ParallelReader fixed to not throw NPE if isCurrent(),
16386    isOptimized() or getVersion() is called. (Michael Busch)
16387
1638810. LUCENE-948: Fix FNFE exception caused by stale NFS client
16389    directory listing caches when writers on different machines are
16390    sharing an index over NFS and using a custom deletion policy (Mike
16391    McCandless)
16392
1639311. LUCENE-978: Ensure TermInfosReader and FieldsReader
16394    close any streams they had opened if an exception is hit in the
16395    constructor.  (Ning Li via Mike McCandless)
16396
1639712. LUCENE-985: If an extremely long term is in a doc (> 16383 chars),
16398    we now throw an IllegalArgumentException saying the term is too
16399    long, instead of cryptic ArrayIndexOutOfBoundsException.  (Karl
16400    Wettin via Mike McCandless)
16401
1640213. LUCENE-991: The explain() method of BoostingTermQuery had errors
16403    when no payloads were present on a document.  (Peter Keegan via
16404    Grant Ingersoll)
16405
1640614. LUCENE-992: Fixed IndexWriter.updateDocument to be atomic again
16407    (this was broken by LUCENE-843).  (Ning Li via Mike McCandless)
16408
1640915. LUCENE-1008: Fixed corruption case when document with no term
16410    vector fields is added after documents with term vector fields.
16411    This bug was introduced with LUCENE-843.  (Grant Ingersoll via
16412    Mike McCandless)
16413
1641416. LUCENE-1006: Fixed QueryParser to accept a "" field value (zero
16415    length quoted string.)  (yonik)
16416
1641717. LUCENE-1010: Fixed corruption case when document with no term
16418    vector fields is added after documents with term vector fields.
16419    This case is hit during merge and would cause an EOFException.
16420    This bug was introduced with LUCENE-984.  (Andi Vajda via Mike
16421    McCandless)
16422
1642319. LUCENE-1009: Fix merge slowdown with LogByteSizeMergePolicy when
16424    autoCommit=false and documents are using stored fields and/or term
16425    vectors.  (Mark Miller via Mike McCandless)
16426
1642720. LUCENE-1011: Fixed corruption case when two or more machines,
16428    sharing an index over NFS, can be writers in quick succession.
16429    (Patrick Kimber via Mike McCandless)
16430
1643121. LUCENE-1028: Fixed Weight serialization for a few queries:
16432    DisjunctionMaxQuery, ValueSourceQuery, CustomScoreQuery.
16433    Serialization check added for all queries.
16434    (Kyle Maxwell via Doron Cohen)
16435
1643622. LUCENE-1048: Fixed incorrect behavior in Lock.obtain(...) when the
16437    timeout argument is very large (eg Long.MAX_VALUE).  Also added
16438    Lock.LOCK_OBTAIN_WAIT_FOREVER constant to never timeout.  (Nikolay
16439    Diakov via Mike McCandless)
16440
1644123. LUCENE-1050: Throw LockReleaseFailedException in
16442    Simple/NativeFSLockFactory if we fail to delete the lock file when
16443    releasing the lock.  (Nikolay Diakov via Mike McCandless)
16444
1644524. LUCENE-1071: Fixed SegmentMerger to correctly set payload bit in
16446    the merged segment. (Michael Busch)
16447
1644825. LUCENE-1042: Remove throwing of IOException in getTermFreqVector(int, String, TermVectorMapper) to be consistent
16449    with other getTermFreqVector calls.  Also removed the throwing of the other IOException in that method to be consistent.  (Karl Wettin via Grant Ingersoll)
16450
1645126. LUCENE-1096: Fixed Hits behavior when hits' docs are deleted
16452    along with iterating the hits. Deleting docs already retrieved
16453    now works seamlessly. If docs not yet retrieved are deleted
16454    (e.g. from another thread), and then, relying on the initial
16455    Hits.length(), an application attempts to retrieve more hits
16456    than actually exist, a ConcurrentModificationException
16457    is thrown.  (Doron Cohen)
16458
1645927. LUCENE-1068: Changed StandardTokenizer to fix an issue with it marking
16460  the type of some tokens incorrectly.  This is done by adding a new flag named
16461  replaceInvalidAcronym which defaults to false, the current, incorrect behavior.  Setting
16462  this flag to true fixes the problem.  This flag is a temporary fix and is already
16463  marked as being deprecated.  3.x will implement the correct approach.  (Shai Erera via Grant Ingersoll)
16464  LUCENE-1140: Fixed NPE caused by 1068 (Alexei Dets via Grant Ingersoll)
16465
1646628. LUCENE-749: ChainedFilter behavior fixed when logic of
16467    first filter is ANDNOT.  (Antonio Bruno via Doron Cohen)
16468
1646929. LUCENE-508: Make sure SegmentTermEnum.prev() is accurate (= last
16470    term) after next() returns false.  (Steven Tamm via Mike
16471    McCandless)
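
Item 22 above (LUCENE-1048) concerns very large timeout values and a wait-forever constant. The sketch below shows one way a polling loop's attempt count can be computed without breaking for Long.MAX_VALUE. The class LockPolling, the one-second poll interval, the sentinel value -1, and the idea that integer narrowing is the hazard are all assumptions for illustration. The entry does not describe the original failure mode.

```java
// Hypothetical sketch of the arithmetic behind a polling lock-obtain loop;
// not Lucene's Lock class.
final class LockPolling {
    /** Assumed sentinel meaning "never time out". */
    static final long LOCK_OBTAIN_WAIT_FOREVER = -1;
    /** Assumed delay between attempts, in milliseconds. */
    static final long POLL_INTERVAL_MS = 1000;

    /** Returns the number of polls to attempt, or -1 to poll forever. */
    static long maxPolls(long timeoutMs) {
        if (timeoutMs == LOCK_OBTAIN_WAIT_FOREVER) {
            return -1;
        }
        if (timeoutMs < 0) {
            throw new IllegalArgumentException("timeout must be >= 0 or LOCK_OBTAIN_WAIT_FOREVER");
        }
        // Keep the count as a long: narrowing a huge quotient to int
        // would silently discard the high bits.
        return timeoutMs / POLL_INTERVAL_MS;
    }
}
```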
16472
16473
16474New features
16475
16476 1. LUCENE-906: Elision filter for French.
16477    (Mathieu Lecarme via Otis Gospodnetic)
16478
16479 2. LUCENE-960: Added a SpanQueryFilter and related classes to allow for
16480    not only filtering, but knowing where in a Document a Filter matches
16481    (Grant Ingersoll)
16482
16483 3. LUCENE-868: Added new Term Vector access features.  New callback
16484    mechanism allows application to define how and where to read Term
16485    Vectors from disk. This implementation contains several extensions
16486    of the new abstract TermVectorMapper class.  The new API should be
16487    back-compatible.  No changes in the actual storage of Term Vectors
16488    has taken place.
16489 3.1 LUCENE-1038: Added setDocumentNumber() method to TermVectorMapper
16490     to provide information about what document is being accessed.
16491     (Karl Wettin via Grant Ingersoll)
16492
16493 4. LUCENE-975: Added PositionBasedTermVectorMapper that allows for
16494    position based lookup of term vector information.
16495    See item #3 above (LUCENE-868).
16496
16497 5. LUCENE-1011: Added simple tools (all in org.apache.lucene.store)
16498    to verify that locking is working properly.  LockVerifyServer runs
16499    a separate server to verify locks.  LockStressTest runs a simple
16500    tool that rapidly obtains and releases locks.
16501    VerifyingLockFactory is a LockFactory that wraps any other
16502    LockFactory and consults the LockVerifyServer whenever a lock is
16503    obtained or released, throwing an exception if an illegal lock
16504    obtain occurred.  (Patrick Kimber via Mike McCandless)
16505
16506 6. LUCENE-1015: Added FieldCache extension (ExtendedFieldCache) to
16507    support doubles and longs.  Added support into SortField for sorting
16508    on doubles and longs as well.  (Grant Ingersoll)
16509
16510 7. LUCENE-1020: Created basic index checking & repair tool
16511    (o.a.l.index.CheckIndex).  When run without -fix it does a
16512    detailed test of all segments in the index and reports summary
16513    information and any errors it hit.  With -fix it will remove
16514    segments that had errors.  (Mike McCandless)
16515
16516 8. LUCENE-743: Add IndexReader.reopen() method that re-opens an
16517    existing IndexReader by only loading those portions of an index
16518    that have changed since the reader was (re)opened. reopen() can
16519    be significantly faster than open(), depending on the amount of
16520    index changes. SegmentReader, MultiSegmentReader, MultiReader,
16521    and ParallelReader implement reopen(). (Michael Busch)
16522
16523 9. LUCENE-1040: CharArraySet useful for efficiently checking
16524    set membership of text specified by char[]. (yonik)
16525
1652610. LUCENE-1073: Created SnapshotDeletionPolicy to facilitate taking a
16527    live backup of an index without pausing indexing.  (Mike
16528    McCandless)
16529
1653011. LUCENE-1019: CustomScoreQuery enhanced to support multiple
16531    ValueSource queries. (Kyle Maxwell via Doron Cohen)
16532
1653312. LUCENE-1095: Added an option to StopFilter to increase
16534    positionIncrement of the token succeeding a stopped token.
16535    Disabled by default. Similar option added to QueryParser
16536    to consider token positions when creating PhraseQuery
16537    and MultiPhraseQuery. Disabled by default (so by default
16538    the query parser ignores position increments).
16539    (Doron Cohen)
16540
1654113. LUCENE-1380: Added TokenFilter for setting position increment in special cases related to the ShingleFilter (Mck SembWever, Steve Rowe, Karl Wettin via Grant Ingersoll)
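
Item 12 above (LUCENE-1095) adds an option to increase the positionIncrement of the token that follows a stopped token. A minimal sketch of that bookkeeping follows. The class StopPositions, its nested Tok type, and the parameter name enablePositionIncrements are hypothetical and are not Lucene's StopFilter API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of position increments across removed stop words.
final class StopPositions {
    /** A surviving token and its distance from the previous surviving token. */
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    static List<Tok> filter(List<String> tokens, Set<String> stopWords,
                            boolean enablePositionIncrements) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0; // stop words dropped since the last emitted token
        for (String t : tokens) {
            if (stopWords.contains(t)) {
                skipped++;
                continue;
            }
            // With the option on, the gap left by stop words is preserved;
            // with it off, every surviving token advances by exactly one.
            out.add(new Tok(t, enablePositionIncrements ? 1 + skipped : 1));
            skipped = 0;
        }
        return out;
    }
}
```

With the option enabled, a phrase query can tell that "quick" and "fox" in "quick the fox" were not adjacent in the original text.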
16542
16543
16544
16545Optimizations
16546
16547 1. LUCENE-937: CachingTokenFilter now uses an iterator to access the
16548    Tokens that are cached in the LinkedList. This increases performance
16549    significantly, especially when the number of Tokens is large.
16550    (Mark Miller via Michael Busch)
16551
16552 2. LUCENE-843: Substantial optimizations to improve how IndexWriter
16553    uses RAM for buffering documents and to speed up indexing (2X-8X
16554    faster).  A single shared hash table now records the in-memory
16555    postings per unique term and is directly flushed into a single
16556    segment.  (Mike McCandless)
16557
16558 3. LUCENE-892: Fixed extra "buffer to buffer copy" that sometimes
16559    takes place when using compound files.  (Mike McCandless)
16560
16561 4. LUCENE-959: Remove synchronization in Document (yonik)
16562
16563 5. LUCENE-963: Add setters to Field to allow for re-using a single
16564    Field instance during indexing.  This is a sizable performance
16565    gain, especially for small documents.  (Mike McCandless)
16566
16567 6. LUCENE-939: Check explicitly for boundary conditions in FieldInfos
16568    and don't rely on exceptions. (Michael Busch)
16569
16570 7. LUCENE-966: Very substantial speedups (~6X faster) for
16571    StandardTokenizer (StandardAnalyzer) by using JFlex instead of
16572    JavaCC to generate the tokenizer.
16573    (Stanislaw Osinski via Mike McCandless)
16574
16575 8. LUCENE-969: Changed core tokenizers & filters to re-use Token and
16576    TokenStream instances when possible to improve tokenization
16577    performance (~10-15%). (Mike McCandless)
16578
16579 9. LUCENE-871: Speedup ISOLatin1AccentFilter (Ian Boston via Mike
16580    McCandless)
16581
1658210. LUCENE-986: Refactored SegmentInfos from IndexReader into the new
16583    subclass DirectoryIndexReader. SegmentReader and MultiSegmentReader
16584    now extend DirectoryIndexReader and are the only IndexReader
16585    implementations that use SegmentInfos to access an index and
16586    acquire a write lock for index modifications. (Michael Busch)
16587
1658811. LUCENE-1007: Allow flushing in IndexWriter to be triggered by
16589    either RAM usage or document count or both (whichever comes
16590    first), by adding symbolic constant DISABLE_AUTO_FLUSH to disable
16591    one of the flush triggers.  (Ning Li via Mike McCandless)
16592
1659312. LUCENE-1043: Speed up merging of stored fields by bulk-copying the
16594    raw bytes for each contiguous range of non-deleted documents.
16595    (Robert Engels via Mike McCandless)
16596
1659713. LUCENE-693: Speed up nested conjunctions (~2x) that match many
16598    documents, and a slight performance increase for top level
16599    conjunctions.  (yonik)
16600
1660114. LUCENE-1098: Make inner class StandardAnalyzer.SavedStreams static
16602    and final. (Nathan Beyer via Michael Busch)
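
Item 12 above (LUCENE-1043) speeds up stored-field merging by copying each contiguous range of non-deleted documents in bulk. The sketch below covers only the first half of that idea: finding the contiguous live ranges from a deletion bitset. The class LiveDocRuns and the use of java.util.BitSet are assumptions, and the raw byte copy itself is not shown.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper; Lucene's merger does not necessarily work this way.
final class LiveDocRuns {
    /**
     * Returns {start, endExclusive} pairs, one per maximal run of
     * non-deleted document numbers in [0, maxDoc).
     */
    static List<int[]> liveRuns(BitSet deleted, int maxDoc) {
        List<int[]> runs = new ArrayList<>();
        int doc = deleted.nextClearBit(0);
        while (doc < maxDoc) {
            int end = deleted.nextSetBit(doc);
            if (end < 0 || end > maxDoc) {
                end = maxDoc; // run extends to the end of the segment
            }
            runs.add(new int[] {doc, end});
            doc = deleted.nextClearBit(end);
        }
        return runs;
    }
}
```

Each returned pair could then be served by a single bulk read and write instead of one operation per document.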
16603
16604Documentation
16605
16606 1. LUCENE-1051: Generate separate javadocs for core, demo and contrib
16607    classes, as well as a unified view. Also add an appropriate menu
16608    structure to the website. (Michael Busch)
16609
16610 2. LUCENE-746: Fix error message in AnalyzingQueryParser.getPrefixQuery.
16611    (Ronnie Kolehmainen via Michael Busch)
16612
16613Build
16614
16615 1. LUCENE-908: Improvements and simplifications for how the MANIFEST
16616    file and the META-INF dir are created. (Michael Busch)
16617
16618 2. LUCENE-935: Various improvements for the maven artifacts. Now the
16619    artifacts also include the sources as .jar files. (Michael Busch)
16620
16621 3. Added apply-patch target to top-level build.  Defaults to looking for
16622    a patch in ${basedir}/../patches with name specified by -Dpatch.name.
16623    Can also specify any location by -Dpatch.file property on the command
16624    line.  This should be helpful for easy application of patches, but it
16625    is also a step towards integrating automatic patch application with
16626    JIRA and Hudson, and is thus subject to change.  (Grant Ingersoll)
16627
16628 4. LUCENE-935: Defined property "m2.repository.url" to allow setting
16629    the url to a maven remote repository to deploy to. (Michael Busch)
16630
16631 5. LUCENE-1051: Include javadocs in the maven artifacts. (Michael Busch)
16632
16633 6. LUCENE-1055: Remove gdata-server from build files and its sources
16634    from trunk. (Michael Busch)
16635
16636 7. LUCENE-935: Allow deploying maven artifacts to a remote m2 repository
16637    via scp and ssh authentication. (Michael Busch)
16638
16639 8. LUCENE-1123: Allow overriding the specification version for
16640    MANIFEST.MF (Michael Busch)
16641
16642Test Cases
16643
16644 1. LUCENE-766: Test adding two fields with the same name but different
16645    term vector setting.  (Nicolas Lalevée via Doron Cohen)
16646
16647======================= Release 2.2.0 =======================
16648
16649Changes in runtime behavior
16650
16651API Changes
16652
16653 1. LUCENE-793: created new exceptions and added them to throws clause
16654    for many methods (all subclasses of IOException for backwards
16655    compatibility): index.StaleReaderException,
16656    index.CorruptIndexException, store.LockObtainFailedException.
16657    This was done to better call out the possible root causes of an
16658    IOException from these methods.  (Mike McCandless)
16659
16660 2. LUCENE-811: make SegmentInfos class, plus a few methods from related
16661    classes, package-private again (they were unnecessarily made public
16662    as part of LUCENE-701).  (Mike McCandless)
16663
16664 3. LUCENE-710: added optional autoCommit boolean to IndexWriter
16665    constructors.  When this is false, index changes are not committed
16666    until the writer is closed.  This gives explicit control over when
16667    a reader will see the changes.  Also added optional custom
16668    deletion policy to explicitly control when prior commits are
16669    removed from the index.  This is intended to allow applications to
16670    share an index over NFS by customizing when prior commits are
16671    deleted. (Mike McCandless)
16672
16673 4. LUCENE-818: changed most public methods of IndexWriter,
16674    IndexReader (and its subclasses), FieldsReader and RAMDirectory to
16675    throw AlreadyClosedException if they are accessed after being
16676    closed.  (Mike McCandless)
16677
16678 5. LUCENE-834: Changed some access levels for certain Span classes to allow them
16679    to be overridden.  They have been marked expert only and not for public
16680    consumption. (Grant Ingersoll)
16681
16682 6. LUCENE-796: Removed calls to super.* from various get*Query methods in
16683    MultiFieldQueryParser, in order to allow sub-classes to override them.
16684    (Steven Parkes via Otis Gospodnetic)
16685
16686 7. LUCENE-857: Removed caching from QueryFilter and deprecated QueryFilter
16687    in favour of QueryWrapperFilter or QueryWrapperFilter + CachingWrapperFilter
16688    combination when caching is desired.
16689    (Chris Hostetter, Otis Gospodnetic)
16690
16691 8. LUCENE-869: Changed FSIndexInput and FSIndexOutput to inner classes of FSDirectory
16692    to enable extensibility of these classes. (Michael Busch)
16693
16694 9. LUCENE-580: Added the public method reset() to TokenStream. This method does
16695    nothing by default, but may be overridden by subclasses to support consuming
16696    the TokenStream more than once. (Michael Busch)
16697
1669810. LUCENE-580: Added a new constructor to Field that takes a TokenStream as
16699    argument, available as tokenStreamValue(). This is useful to avoid the need of
16700    "dummy analyzers" for pre-analyzed fields. (Karl Wettin, Michael Busch)
16701
1670211. LUCENE-730: Added the new methods to BooleanQuery setAllowDocsOutOfOrder() and
16703    getAllowDocsOutOfOrder(). Deprecated the methods setUseScorer14() and
16704    getUseScorer14(). The optimization patch LUCENE-730 (see Optimizations->3.)
16705    improves performance for certain queries but results in scoring out of docid
16706    order. This patch reverses this change, so by default hit docs are now scored
16707    in docid order unless setAllowDocsOutOfOrder(true) is explicitly called.
16708    This patch also enables the tests in QueryUtils again that check for docid
16709    order. (Paul Elschot, Doron Cohen, Michael Busch)
16710
1671112. LUCENE-888: Added Directory.openInput(File path, int bufferSize)
16712    to optionally specify the size of the read buffer.  Also added
16713    BufferedIndexInput.setBufferSize(int) to change the buffer size.
16714    (Mike McCandless)
16715
1671613. LUCENE-923: Make SegmentTermPositionVector package-private. It does not need
16717    to be public because it implements the public interface TermPositionVector.
16718    (Michael Busch)
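
Item 4 above (LUCENE-818) makes public methods throw AlreadyClosedException when they are called after close. A common way to get that behavior is a guard method called at the top of each public method. The sketch below uses hypothetical stand-in names (DemoAlreadyClosedException, GuardedResource, ensureOpen) and is not taken from Lucene's source.

```java
import java.io.Closeable;

// Hypothetical stand-in for Lucene's exception type.
final class DemoAlreadyClosedException extends IllegalStateException {
    private static final long serialVersionUID = 1L;

    DemoAlreadyClosedException(String message) {
        super(message);
    }
}

// Hypothetical resource that refuses to be used after close().
final class GuardedResource implements Closeable {
    private volatile boolean closed;

    /** Called first by every public method that needs a live resource. */
    private void ensureOpen() {
        if (closed) {
            throw new DemoAlreadyClosedException("this resource is closed");
        }
    }

    int read() {
        ensureOpen();
        return 42; // placeholder for real work
    }

    @Override
    public void close() {
        closed = true; // closing twice is harmless
    }
}
```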
16719
16720Bug fixes
16721
16722 1. LUCENE-804: Fixed build.xml to pack a fully compilable src dist.  (Doron Cohen)
16723
16724 2. LUCENE-813: Leading wildcard fixed to work with trailing wildcard.
16725    Query parser modified to create a prefix query only for the case
16726    that there is a single trailing wildcard (and no additional wildcard
16727    or '?' in the query text).  (Doron Cohen)
16728
16729 3. LUCENE-812: Add no-argument constructors to NativeFSLockFactory
16730    and SimpleFSLockFactory.  This enables all 4 builtin LockFactory
16731    implementations to be specified via the System property
16732    org.apache.lucene.store.FSDirectoryLockFactoryClass.  (Mike McCandless)
16733
16734 4. LUCENE-821: The new single-norm-file introduced by LUCENE-756
16735    failed to reduce the number of open descriptors since it was still
16736    opened once per field with norms. (yonik)
16737
16738 5. LUCENE-823: Make sure internal file handles are closed when
16739    hitting an exception (eg disk full) while flushing deletes in
16740    IndexWriter's mergeSegments, and also during
16741    IndexWriter.addIndexes.  (Mike McCandless)
16742
16743 6. LUCENE-825: If directory is removed after
16744    FSDirectory.getDirectory() but before IndexReader.open you now get
16745    a FileNotFoundException like Lucene pre-2.1 (before this fix you
16746    got an NPE).  (Mike McCandless)
16747
16748 7. LUCENE-800: Removed backslash from the TERM_CHAR list in the queryparser,
16749    because the backslash is the escape character. Also changed the ESCAPED_CHAR
16750    list to contain all possible characters, because every character that
16751    follows a backslash should be considered as escaped. (Michael Busch)
16752
16753 8. LUCENE-372: QueryParser.parse() now ensures that the entire input string
16754    is consumed. Now a ParseException is thrown if a query contains too many
16755    closing parentheses. (Andreas Neumann via Michael Busch)
16756
16757 9. LUCENE-814: javacc build targets now fix line-end-style of generated files.
16758    Now also deleting all javacc generated files before calling javacc.
16759    (Steven Parkes, Doron Cohen)
16760
1676110. LUCENE-829: close readers in contrib/benchmark. (Karl Wettin, Doron Cohen)
16762
1676311. LUCENE-828: Minor fix for Term's equals().
16764    (Paul Cowan via Otis Gospodnetic)
16765
1676612. LUCENE-846: Fixed: if IndexWriter is opened with autoCommit=false,
16767    and you call addIndexes, and hit an exception (eg disk full) then
16768    when IndexWriter rolls back its internal state this could corrupt
16769    the instance of IndexWriter (but, not the index itself) by
16770    referencing already deleted segments.  This bug was only present
16771    in 2.2 (trunk), ie was never released.  (Mike McCandless)
16772
1677313. LUCENE-736: Sloppy phrase query with repeating terms matches wrong docs.
16774    For example query "B C B"~2 matches the doc "A B C D E". (Doron Cohen)
16775
1677614. LUCENE-789: Fixed: custom similarity is ignored when using MultiSearcher (problem reported
16777    by Alexey Lef). Now the similarity applied by MultiSearcher.setSimilarity(sim) is being used.
16778    Note that as before this fix, creating a MultiSearcher from Searchers for whom custom similarity
16779    was set has no effect - it is masked by the similarity of the MultiSearcher. This is as
16780    designed, because MultiSearcher operates on Searchables (not Searchers). (Doron Cohen)
16781
1678215. LUCENE-880: Fixed DocumentWriter to close the TokenStreams after it
16783    has written the postings. Then the resources associated with the
16784    TokenStreams can safely be released. (Michael Busch)
16785
1678616. LUCENE-883: consecutive calls to Spellchecker.indexDictionary()
16787    won't insert terms twice anymore. (Daniel Naber)
16788
1678917. LUCENE-881: QueryParser.escape() now also escapes the characters
16790    '|' and '&' which are part of the queryparser syntax. (Michael Busch)
16791
1679218. LUCENE-886: Spellchecker clean up: exceptions aren't printed to STDERR
16793    anymore and ignored, but re-thrown. Some javadoc improvements.
16794    (Daniel Naber)
16795
1679619. LUCENE-698: FilteredQuery now takes the query boost into account for
16797    scoring. (Michael Busch)
16798
1679920. LUCENE-763: Spellchecker: LuceneDictionary used to skip first word in
16800    enumeration. (Christian Mallwitz via Daniel Naber)
16801
1680221. LUCENE-903: FilteredQuery explanation inaccuracy with boost.
16803    Explanation tests now "deep" check the explanation details.
16804    (Chris Hostetter, Doron Cohen)
16805
1680622. LUCENE-912: DisjunctionMaxScorer first skipTo(target) call ignores the
16807    skip target param and ends up at the first match.
16808    (Sudaakeran B. via Chris Hostetter & Doron Cohen)
16809
1681023. LUCENE-913: Two consecutive score() calls return different
16811    scores for Boolean Queries. (Michael Busch, Doron Cohen)
16812
1681324. LUCENE-1013: Fix IndexWriter.setMaxMergeDocs to work "out of the
16814    box", again, by moving set/getMaxMergeDocs up from
16815    LogDocMergePolicy into LogMergePolicy.  This fixes the API
16816    breakage (non backwards compatible change) caused by LUCENE-994.
16817    (Yonik Seeley via Mike McCandless)
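
Item 2 above (LUCENE-813) states that a prefix query is created only when the text has a single trailing wildcard and no other '*' or '?'. That rule can be sketched as a small classifier. The class WildcardKind and its Kind enum are hypothetical, and backslash escaping is deliberately ignored to keep the example short.

```java
// Hypothetical classifier; not the QueryParser's actual code.
final class WildcardKind {
    enum Kind { TERM, PREFIX, WILDCARD }

    static Kind classify(String text) {
        int stars = 0;
        int questions = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '*') {
                stars++;
            } else if (c == '?') {
                questions++;
            }
        }
        if (stars == 0 && questions == 0) {
            return Kind.TERM;
        }
        // Prefix only when the sole wildcard is one '*' at the very end.
        boolean singleTrailingStar = stars == 1 && questions == 0
                && text.length() > 1
                && text.charAt(text.length() - 1) == '*';
        return singleTrailingStar ? Kind.PREFIX : Kind.WILDCARD;
    }
}
```

Under this rule "foo*" is a prefix, while "*foo*" and "fo?o*" are general wildcard patterns.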
16818
16819New features
16820
16821 1. LUCENE-759: Added two n-gram-producing TokenFilters.
16822    (Otis Gospodnetic)
16823
16824 2. LUCENE-822: Added FieldSelector capabilities to Searchable for use with
16825    RemoteSearcher, and other Searchable implementations. (Mark Miller, Grant Ingersoll)
16826
16827 3. LUCENE-755: Added the ability to store arbitrary binary metadata in the posting list.
16828    These metadata are called Payloads. For every position of a Token one Payload in the form
16829    of a variable length byte array can be stored in the prox file.
16830    Remark: The APIs introduced with this feature are in experimental state and thus
16831            contain appropriate warnings in the javadocs.
16832    (Michael Busch)
16833
16834 4. LUCENE-834: Added BoostingTermQuery which can boost scores based on the
16835    values of a payload (see #3 above.) (Grant Ingersoll)
16836
16837 5. LUCENE-834: Similarity has a new method for scoring payloads called
16838    scorePayload that can be overridden to take advantage of payload
16839    storage (see #3 above)
16840
16841 6. LUCENE-834: Added isPayloadAvailable() onto TermPositions interface and
16842    implemented it in the appropriate places (Grant Ingersoll)
16843
16844 7. LUCENE-853: Added RemoteCachingWrapperFilter to enable caching of Filters
16845    on the remote side of the RMI connection.
16846    (Matt Ericson via Otis Gospodnetic)
16847
16848 8. LUCENE-446: Added Solr's search.function for scores based on field
16849    values, plus CustomScoreQuery for simple score (post) customization.
16850    (Yonik Seeley, Doron Cohen)
16851
16852 9. LUCENE-1058: Added new TeeTokenFilter (like the UNIX 'tee' command) and SinkTokenizer which can be used to share tokens between two or more
16853    Fields such that the other Fields do not have to go through the whole Analysis process over again.  For instance, if you have two
16854    Fields that share all the same analysis steps except one lowercases tokens and the other does not, you can coordinate the operations
16855    between the two using the TeeTokenFilter and the SinkTokenizer.  See TeeSinkTokenTest.java for examples.
16856    (Grant Ingersoll, Michael Busch, Yonik Seeley)
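
Item 1 above (LUCENE-759) adds n-gram-producing TokenFilters. For readers unfamiliar with the term, the sketch below produces character n-grams from a single term. The class NGrams, the parameter names, and the output order (by gram size, then by start offset) are assumptions for illustration and do not describe the filters' actual behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing what "n-grams of a term" means.
final class NGrams {
    /** Returns every substring of term whose length is in [minGram, maxGram]. */
    static List<String> ngrams(String term, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
        }
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= term.length(); start++) {
                out.add(term.substring(start, start + n));
            }
        }
        return out;
    }
}
```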
16857
16858Optimizations
16859
16860 1. LUCENE-761: The proxStream is now cloned lazily in SegmentTermPositions
16861    when nextPosition() is called for the first time. This allows using instances
16862    of SegmentTermPositions instead of SegmentTermDocs without additional costs.
16863    (Michael Busch)
16864
16865 2. LUCENE-431: RAMInputStream and RAMOutputStream extend IndexInput and
16866    IndexOutput directly now. This avoids further buffering and thus avoids
16867    unnecessary array copies. (Michael Busch)
16868
16869 3. LUCENE-730: Updated BooleanScorer2 to make use of BooleanScorer in some
16870    cases and possibly improve scoring performance.  Documents can now be
16871    delivered out-of-order as they are scored (e.g. to HitCollector).
16872    N.B. A bit of code had to be disabled in QueryUtils in order for
16873    TestBoolean2 test to keep passing.
16874    (Paul Elschot via Otis Gospodnetic)
16875
16876 4. LUCENE-882: Spellchecker doesn't store the ngrams anymore but only indexes
16877    them to keep the spell index small. (Daniel Naber)
16878
16879 5. LUCENE-430: Delay allocation of the buffer after a clone of BufferedIndexInput.
16880    Together with LUCENE-888 this will allow adjusting the buffer size
16881    dynamically. (Paul Elschot, Michael Busch)
16882
16883 6. LUCENE-888: Increase buffer sizes inside CompoundFileWriter and
16884    BufferedIndexOutput.  Also increase buffer size in
16885    BufferedIndexInput, but only when used during merging.  Together,
16886    these increases yield 10-18% overall performance gain vs the
16887    previous 1K defaults.  (Mike McCandless)
16888
16889 7. LUCENE-866: Adds multi-level skip lists to the posting lists. This speeds
16890    up most queries that use skipTo(), especially on big indexes with large posting
16891    lists. For average AND queries the speedup is about 20%, for queries that
16892    contain both very frequent and very rare terms the speedup can be over 80%.
16893    (Michael Busch)
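
Item 7 above (LUCENE-866) adds multi-level skip lists to speed up skipTo(). The sketch below illustrates the idea on an in-memory sorted array: higher levels take larger jumps, and each level hands over to the next finer one. The class SkipSketch is hypothetical, and skip entries are simulated with index arithmetic instead of being stored alongside the postings, so this is not the on-disk format.

```java
// Hypothetical in-memory illustration of multi-level skipping.
final class SkipSketch {
    /**
     * Returns the index of the first doc >= target at or after index from,
     * or docs.length if there is none. docs must be sorted ascending.
     * Level L conceptually has a skip entry at every interval^L-th posting.
     */
    static int skipTo(int[] docs, int from, int target, int interval, int levels) {
        if (interval < 2 || levels < 1) {
            throw new IllegalArgumentException("need interval >= 2 and levels >= 1");
        }
        int pos = from;
        for (int level = levels; level >= 1; level--) {
            int step = 1;
            for (int i = 0; i < level; i++) {
                step *= interval;
            }
            while (true) {
                int next = (pos / step + 1) * step; // next skip point on this level
                if (next < docs.length && docs[next] < target) {
                    pos = next; // safe jump: everything up to next is < target
                } else {
                    break;
                }
            }
        }
        // Finish with a short linear scan inside the last interval.
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos;
    }
}
```

Only jumps to entries whose doc is still below the target are taken, so the final scan never has to move backwards.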
16894
16895Documentation
16896
16897 1. LUCENE-791 and INFRA-1173: Infrastructure moved the Wiki to
16898    http://wiki.apache.org/lucene-java/   Updated the links in the docs and
16899    wherever else I found references.  (Grant Ingersoll, Joe Schaefer)
16900
16901 2. LUCENE-807: Fixed the javadoc for ScoreDocComparator.compare() to be
16902    consistent with java.util.Comparator.compare(): Any integer is allowed to
16903    be returned instead of only -1/0/1.
16904    (Paul Cowan via Michael Busch)
16905
16906 3. LUCENE-875: Solved javadoc warnings & errors under jdk1.4.
16907    Solved javadoc errors under jdk5 (jars in path for gdata).
16908    Made "javadocs" target depend on "build-contrib" for first downloading
16909    contrib jars configured for dynamic download. (Note: when running
16910    behind firewall, a firewall prompt might pop up) (Doron Cohen)
16911
16912 4. LUCENE-740: Added SNOWBALL-LICENSE.txt to the snowball package and a
16913    remark about the license to NOTICE.TXT. (Steven Parkes via Michael Busch)
16914
16915 5. LUCENE-925: Added analysis package javadocs. (Grant Ingersoll and Doron Cohen)
16916
16917 6. LUCENE-926: Added document package javadocs. (Grant Ingersoll)
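
Item 2 above (LUCENE-807) aligns the ScoreDocComparator.compare() javadoc with java.util.Comparator.compare(), which may return any integer. The practical consequence is that callers should test only the sign of the result. The sketch below shows this with a plain Comparator. LengthComparator and CompareUsage are hypothetical names.

```java
import java.util.Comparator;

// Returns an arbitrary negative, zero, or positive int, which Comparator permits.
final class LengthComparator implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        // Lengths are non-negative, so this subtraction cannot overflow.
        return a.length() - b.length();
    }
}

final class CompareUsage {
    /** Checks the sign of the result instead of comparing against -1. */
    static boolean isBefore(Comparator<String> c, String a, String b) {
        return c.compare(a, b) < 0;
    }
}
```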
16918
16919Build
16920
16921 1. LUCENE-802: Added LICENSE.TXT and NOTICE.TXT to Lucene jars.
16922    (Steven Parkes via Michael Busch)
16923
16924 2. LUCENE-885: "ant test" now includes all contrib tests.  The new
16925    "ant test-core" target can be used to run only the Core (non
16926    contrib) tests.
16927    (Chris Hostetter)
16928
16929 3. LUCENE-900: "ant test" now enables Java assertions (in Lucene packages).
16930    (Doron Cohen)
16931
16932 4. LUCENE-894: Add custom build file for binary distributions that includes
16933    targets to build the demos. (Chris Hostetter, Michael Busch)
16934
16935 5. LUCENE-904: The "package" targets in build.xml now also generate .md5
16936    checksum files. (Chris Hostetter, Michael Busch)
16937
16938 6. LUCENE-907: Include LICENSE.TXT and NOTICE.TXT in the META-INF dirs of
16939    demo war, demo jar, and the contrib jars. (Michael Busch)
16940
16941 7. LUCENE-909: Demo targets for running the demo. (Doron Cohen)
16942
16943 8. LUCENE-908: Improves content of MANIFEST file and makes it customizable
16944    for the contribs. Adds SNOWBALL-LICENSE.txt to META-INF of the snowball
16945    jar and makes sure that the lucli jar contains LICENSE.txt and NOTICE.txt.
16946    (Chris Hostetter, Michael Busch)
16947
16948 9. LUCENE-930: Various contrib building improvements to ensure contrib
16949    dependencies are met, and test compilation errors fail the build.
16950    (Steven Parkes, Chris Hostetter)
16951
1695210. LUCENE-622: Add ant target and pom.xml files for building maven artifacts
16953    of the Lucene core and the contrib modules.
16954    (Sami Siren, Karl Wettin, Michael Busch)
16955
16956======================= Release 2.1.0 =======================
16957
16958Changes in runtime behavior
16959
16960 1. 's' and 't' have been removed from the list of default stopwords
16961    in StopAnalyzer (also used by StandardAnalyzer). Having e.g. 's'
16962    as a stopword meant that 's-class' led to the same results as 'class'.
16963    Note that this problem still exists for 'a', e.g. in 'a-class' as
16964    'a' continues to be a stopword.
16965    (Daniel Naber)
16966
16967 2. LUCENE-478: Updated the list of Unicode code point ranges for CJK
16968    (now split into CJ and K) in StandardAnalyzer.  (John Wang and
16969    Steven Rowe via Otis Gospodnetic)
16970
16971 3. Modified some CJK Unicode code point ranges in StandardTokenizer.jj,
16972    and added a few more of them to increase CJK character coverage.
16973    Also documented some of the ranges.
16974    (Otis Gospodnetic)
16975
16976 4. LUCENE-489: Add support for leading wildcard characters (*, ?) to
16977    QueryParser.  Default is to disallow them, as before.
16978    (Steven Parkes via Otis Gospodnetic)
16979
16980 5. LUCENE-703: QueryParser changed to default to use of ConstantScoreRangeQuery
16981    for range queries. Added useOldRangeQuery property to QueryParser to allow
16982    selection of old RangeQuery class if required.
16983    (Mark Harwood)
16984
16985 6. LUCENE-543: WildcardQuery now performs a TermQuery if the provided term
16986    does not contain a wildcard character (? or *), when previously a
16987    StringIndexOutOfBoundsException was thrown.
16988    (Michael Busch via Erik Hatcher)
16989
16990 7. LUCENE-726: Removed the use of deprecated doc.fields() method and
16991    Enumeration.
16992    (Michael Busch via Otis Gospodnetic)
16993
16994 8. LUCENE-436: Removed finalize() in TermInfosReader and SegmentReader,
16995    and added a call to enumerators.remove() in TermInfosReader.close().
16996    The finalize() overrides were added to help with a pre-1.4.2 JVM bug
16997    that has since been fixed, plus we no longer support pre-1.4.2 JVMs.
16998    (Otis Gospodnetic)
16999
17000 9. LUCENE-771: The default location of the write lock is now the
17001    index directory, and is named simply "write.lock" (without a big
17002    digest prefix).  Neither the "org.apache.lucene.lockDir" nor the
17003    "java.io.tmpdir" system property is used any longer as the global
17004    directory for storing lock files, and the LOCK_DIR field of FSDirectory is
17005    now deprecated.  (Mike McCandless)
17006
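Item 6 above (LUCENE-543) can be pictured with a minimal sketch. This is
not Lucene's code: the class WildcardCheckSketch and its methods are
hypothetical, and the returned strings only name which kind of query would
be built for a given term.

```java
/** Hypothetical helper, not part of Lucene: decides which query kind a term needs. */
final class WildcardCheckSketch {

    /** True if the text contains one of the wildcard characters named in the entry. */
    static boolean hasWildcard(String text) {
        return text.indexOf('*') >= 0 || text.indexOf('?') >= 0;
    }

    /** A term without wildcards is handled as an exact term instead of failing. */
    static String queryKindFor(String text) {
        return hasWildcard(text) ? "wildcard" : "term";
    }
}
```

Under these assumptions, queryKindFor("class") selects the exact-term path,
while queryKindFor("cl?ss") and queryKindFor("cla*") select the wildcard path.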
17007New features
17008
17009 1. LUCENE-503: New ThaiAnalyzer and ThaiWordFilter in contrib/analyzers
17010    (Samphan Raruenrom via Chris Hostetter)
17011
17012 2. LUCENE-545: New FieldSelector API and associated changes to
17013    IndexReader and implementations.  New Fieldable interface for use
17014    with the lazy field loading mechanism.  (Grant Ingersoll and Chuck
17015    Williams via Grant Ingersoll)
17016
17017 3. LUCENE-676: Move Solr's PrefixFilter to Lucene core. (Yura
17018    Smolsky, Yonik Seeley)
17019
17020 4. LUCENE-678: Added NativeFSLockFactory, which implements locking
17021    using OS native locking (via java.nio.*).  (Michael McCandless via
17022    Yonik Seeley)
17023
17024 5. LUCENE-544: Added the ability to specify different boosts for
17025    different fields when using MultiFieldQueryParser (Matt Ericson
17026    via Otis Gospodnetic)
17027
17028 6. LUCENE-528: New IndexWriter.addIndexesNoOptimize() that doesn't
17029    optimize the index when adding new segments, only performing
17030    merges as needed.  (Ning Li via Yonik Seeley)
17031
17032 7. LUCENE-573: QueryParser now allows backslash escaping in
17033    quoted terms and phrases. (Michael Busch via Yonik Seeley)
17034
17035 8. LUCENE-716: QueryParser now allows specification of Unicode
17036    characters in terms via a unicode escape of the form \uXXXX
17037    (Michael Busch via Yonik Seeley)
17038
17039 9. LUCENE-709: Added RAMDirectory.sizeInBytes(), IndexWriter.ramSizeInBytes()
17040    and IndexWriter.flushRamSegments(), allowing applications to
17041    control the amount of memory used to buffer documents.
17042    (Chuck Williams via Yonik Seeley)
17043
1704410. LUCENE-723: QueryParser now parses *:* as MatchAllDocsQuery
17045    (Yonik Seeley)
17046
1704711. LUCENE-741: Command-line utility for modifying or removing norms
17048    on fields in an existing index.  This is mostly based on LUCENE-496
17049    and lives in contrib/miscellaneous.
17050    (Chris Hostetter, Otis Gospodnetic)
17051
1705212. LUCENE-759: Added NGramTokenizer and EdgeNGramTokenizer classes and
17053    their passing unit tests.
17054    (Otis Gospodnetic)
17055
1705613. LUCENE-565: Added methods to IndexWriter to more efficiently
17057    handle updating documents (the "delete then add" use case).  This
17058    is intended to be an eventual replacement for the existing
17059    IndexModifier.  Added IndexWriter.flush() (renamed from
17060    flushRamSegments()) to flush all pending updates (held in RAM), to
17061    the Directory.  (Ning Li via Mike McCandless)
17062
1706314. LUCENE-762: Added in SIZE and SIZE_AND_BREAK FieldSelectorResult options
17064    which allow one to retrieve the size of a field without retrieving the
17065    actual field. (Chuck Williams via Grant Ingersoll)
17066
1706715. LUCENE-799: Properly handle lazy, compressed fields.
17068    (Mike Klaas via Grant Ingersoll)
17069
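Items 7 and 8 above (LUCENE-573, LUCENE-716) describe backslash escaping and
escapes of the form \uXXXX. The sketch below shows one way such escapes could
be resolved. It is an illustration only: EscapeSketch and unescape are
hypothetical names, and the real QueryParser grammar may differ in details
such as error handling.

```java
/** Hypothetical sketch, not Lucene's QueryParser: resolves backslash escapes in a term. */
final class EscapeSketch {

    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
            } else if (i + 1 >= input.length()) {
                throw new IllegalArgumentException("term ends with a lone backslash");
            } else if (input.charAt(i + 1) == 'u') {
                // Backslash, 'u', then exactly four hex digits name one UTF-16 code unit.
                if (i + 6 > input.length()) {
                    throw new IllegalArgumentException("truncated unicode escape");
                }
                int codeUnit = 0;
                for (int k = i + 2; k < i + 6; k++) {
                    int digit = Character.digit(input.charAt(k), 16);
                    if (digit < 0) {
                        throw new IllegalArgumentException("non-hex digit in unicode escape");
                    }
                    codeUnit = codeUnit * 16 + digit;
                }
                out.append((char) codeUnit);
                i += 6;
            } else {
                // Any other escaped character is taken literally.
                out.append(input.charAt(i + 1));
                i += 2;
            }
        }
        return out.toString();
    }
}
```

In this sketch an escaped colon such as a\:b yields the three characters a:b,
and the six characters \u00e9 yield the single character U+00E9.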
17070API Changes
17071
17072 1. LUCENE-438: Remove "final" from Token, implement Cloneable, allow
17073    changing of termText via setTermText().  (Yonik Seeley)
17074
17075 2. org.apache.lucene.analysis.nl.WordlistLoader has been deprecated
17076    and is supposed to be replaced with the WordlistLoader class in
17077    package org.apache.lucene.analysis (Daniel Naber)
17078
17079 3. LUCENE-609: Revert return type of Document.getField(s) to Field
17080    for backward compatibility, added new Document.getFieldable(s)
17081    for access to new lazy loaded fields. (Yonik Seeley)
17082
17083 4. LUCENE-608: Document.fields() has been deprecated and a new method
17084    Document.getFields() has been added that returns a List instead of
17085    an Enumeration (Daniel Naber)
17086
17087 5. LUCENE-605: New Explanation.isMatch() method and new ComplexExplanation
17088    subclass allows explain methods to produce Explanations which model
17089    "matching" independent of having a positive value.
17090    (Chris Hostetter)
17091
17092 6. LUCENE-621: New static methods IndexWriter.setDefaultWriteLockTimeout
17093    and IndexWriter.setDefaultCommitLockTimeout for overriding default
17094    timeout values for all future instances of IndexWriter (as well
17095    as for any other classes that may reference the static values,
17096    ie: IndexReader).
17097    (Michael McCandless via Chris Hostetter)
17098
17099 7. LUCENE-638: FSDirectory.list() now only returns the directory's
17100    Lucene-related files. Thanks to this change one can now construct
17101    a RAMDirectory from a file system directory that contains files
17102    not related to Lucene.
17103    (Simon Willnauer via Daniel Naber)
17104
17105 8. LUCENE-635: Decoupling locking implementation from Directory
17106    implementation.  Added set/getLockFactory to Directory and moved
17107    all locking code into subclasses of abstract class LockFactory.
17108    FSDirectory and RAMDirectory still default to their prior locking
17109    implementations, but now you can mix & match, for example using
17110    SingleInstanceLockFactory (ie, in-memory locking) with an
17111    FSDirectory.  Note that now you must call setDisableLocks before
17112    instantiating an FSDirectory if you wish to disable locking
17113    for that Directory.
17114    (Michael McCandless, Jeff Patterson via Yonik Seeley)
17115
17116 9. LUCENE-657: Made FuzzyQuery non-final and inner ScoreTerm protected.
17117    (Steven Parkes via Otis Gospodnetic)
17118
1711910. LUCENE-701: Lockless commits: a commit lock is no longer required
17120    when a writer commits and a reader opens the index.  This includes
17121    a change to the index file format (see docs/fileformats.html for
17122    details).  It also removes all APIs associated with the commit
17123    lock & its timeout.  Readers are now truly read-only and do not
17124    block one another on startup.  This is the first step to getting
17125    Lucene to work correctly over NFS (second step is
17126    LUCENE-710). (Mike McCandless)
17127
1712811. LUCENE-722: DEFAULT_MIN_DOC_FREQ was misspelled DEFALT_MIN_DOC_FREQ
17129    in Similarity's MoreLikeThis class. The misspelling has been
17130    replaced by the correct spelling.
17131    (Andi Vajda via Daniel Naber)
17132
1713312. LUCENE-738: Reduce the size of the file that keeps track of which
17134    documents are deleted when the number of deleted documents is
17135    small.  This changes the index file format and cannot be
17136    read by previous versions of Lucene.  (Doron Cohen via Yonik Seeley)
17137
1713813. LUCENE-756: Maintain all norms in a single .nrm file to reduce the
17139    number of open files and file descriptors for the non-compound index
17140    format.  This changes the index file format, but maintains the
17141    ability to read and update older indices. The first segment merge
17142    on an older format index will create a single .nrm file for the new
17143    segment.  (Doron Cohen via Yonik Seeley)
17144
1714514. LUCENE-732: DateTools support has been added to QueryParser, with
17146    setters for both the default Resolution, and per-field Resolution.
17147    For backwards compatibility, DateField is still used if no Resolutions
17148    are specified. (Michael Busch via Chris Hostetter)
17149
1715015. Added isOptimized() method to IndexReader.
17151    (Otis Gospodnetic)
17152
1715316. LUCENE-773: Deprecate the FSDirectory.getDirectory(*) methods that
17154    take a boolean "create" argument.  Instead you should use
17155    IndexWriter's "create" argument to create a new index.
17156    (Mike McCandless)
17157
1715817. LUCENE-780: Add a static Directory.copy() method to copy files
17159    from one Directory to another.  (Jiri Kuhn via Mike McCandless)
17160
1716118. LUCENE-773: Added Directory.clearLock(String name) to forcefully
17162    remove an old lock.  The default implementation is to ask the
17163    lockFactory (if non null) to clear the lock.  (Mike McCandless)
17164
1716519. LUCENE-795: Directory.renameFile() has been deprecated as it is
17166    not used anymore inside Lucene.  (Daniel Naber)
17167
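The decoupling described in item 8 above (LUCENE-635) can be sketched as
follows. All types here (SketchLock, SketchLockFactory,
InMemoryLockFactorySketch, SketchDirectory) are hypothetical stand-ins
written for this illustration; they are not the Lucene classes and make no
claim about their actual signatures.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a lock handle. */
interface SketchLock {
    boolean obtain();
    void release();
}

/** Hypothetical stand-in for the abstract factory that owns all locking code. */
abstract class SketchLockFactory {
    abstract SketchLock makeLock(String lockName);
}

/** In-memory locking: names held through this factory instance live in a Set. */
final class InMemoryLockFactorySketch extends SketchLockFactory {
    private final Set<String> held = new HashSet<>();

    @Override
    SketchLock makeLock(final String lockName) {
        return new SketchLock() {
            @Override
            public boolean obtain() {
                synchronized (held) {
                    return held.add(lockName); // false if the name is already held
                }
            }

            @Override
            public void release() {
                synchronized (held) {
                    held.remove(lockName);
                }
            }
        };
    }
}

/** A directory that delegates locking to whichever factory it was given. */
final class SketchDirectory {
    private SketchLockFactory lockFactory;

    void setLockFactory(SketchLockFactory lockFactory) {
        this.lockFactory = lockFactory;
    }

    SketchLockFactory getLockFactory() {
        return lockFactory;
    }

    SketchLock makeLock(String name) {
        return lockFactory.makeLock(name);
    }
}
```

The point of the design is that the directory never contains locking logic
itself, so any factory can be paired with any directory. For brevity the
sketch does not track which handle obtained a lock before releasing it.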
17168Bug fixes
17169
17170 1. Fixed the web application demo (built with "ant war-demo") which
17171    didn't work because it used a QueryParser method that had
17172    been removed (Daniel Naber)
17173
17174 2. LUCENE-583: ISOLatin1AccentFilter fails to preserve positionIncrement
17175    (Yonik Seeley)
17176
17177 3. LUCENE-575: SpellChecker min score is incorrectly changed by suggestSimilar
17178    (Karl Wettin via Yonik Seeley)
17179
17180 4. LUCENE-587: Explanation.toHtml was producing malformed HTML
17181    (Chris Hostetter)
17182
17183 5. Fix to allow MatchAllDocsQuery to be used with RemoteSearcher (Yonik Seeley)
17184
17185 6. LUCENE-601: RAMDirectory and RAMFile made Serializable
17186    (Karl Wettin via Otis Gospodnetic)
17187
17188 7. LUCENE-557: Fixes to BooleanQuery and FilteredQuery so that the score
17189    Explanations match up with the real scores.
17190    (Chris Hostetter)
17191
17192 8. LUCENE-607: ParallelReader's TermEnum fails to advance properly to
17193    new fields (Chuck Williams, Christian Kohlschuetter via Yonik Seeley)
17194
17195 9. LUCENE-610,LUCENE-611: Simple syntax changes to allow compilation with ecj:
17196    disambiguate inner class scorer's use of doc() in BooleanScorer2,
17197    other test code changes.  (DM Smith via Yonik Seeley)
17198
1719910. LUCENE-451: All core query types now use ComplexExplanations so that
17200    boosts of zero don't confuse the BooleanWeight explain method.
17201    (Chris Hostetter)
17202
1720311. LUCENE-593: Fixed LuceneDictionary's inner Iterator
17204    (Kåre Fiedler Christiansen via Otis Gospodnetic)
17205
1720612. LUCENE-641: fixed an off-by-one bug with IndexWriter.setMaxFieldLength()
17207    (Daniel Naber)
17208
1720913. LUCENE-659: Make PerFieldAnalyzerWrapper delegate getPositionIncrementGap()
17210    to the correct analyzer for the field. (Chuck Williams via Yonik Seeley)
17211
1721214. LUCENE-650: Fixed NPE in Locale specific String Sort when Document
17213    has no value.
17214    (Oliver Hutchison via Chris Hostetter)
17215
1721615. LUCENE-683: Fixed data corruption when reading lazy loaded fields.
17217    (Yonik Seeley)
17218
1721916. LUCENE-678: Fixed bug in NativeFSLockFactory which caused the same
17220    lock to be shared between different directories.
17221    (Michael McCandless via Yonik Seeley)
17222
1722317. LUCENE-690: Fixed thread unsafe use of IndexInput by lazy loaded fields.
17224    (Yonik Seeley)
17225
1722618. LUCENE-696: Fix bug when scorer for DisjunctionMaxQuery has skipTo()
17227    called on it before next().  (Yonik Seeley)
17228
1722919. LUCENE-569: Fixed SpanNearQuery bug, for 'inOrder' queries it would fail
17230    to recognize ordered spans if they overlapped with unordered spans.
17231    (Paul Elschot via Chris Hostetter)
17232
1723320. LUCENE-706: Updated fileformats.xml|html concerning the docdelta value
17234    in the frequency file. (Johan Stuyts, Doron Cohen via Grant Ingersoll)
17235
1723621. LUCENE-715: Fixed private constructor in IndexWriter.java to
17237    properly release the acquired write lock if there is an
17238    IOException after acquiring the write lock but before finishing
17239    instantiation. (Matthew Bogosian via Mike McCandless)
17240
1724122. LUCENE-651: Multiple different threads requesting the same
17242    FieldCache entry (often for Sorting by a field) at the same
17243    time caused multiple generations of that entry, which was
17244    detrimental to performance and memory use.
17245    (Oliver Hutchison via Otis Gospodnetic)
17246
1724723. LUCENE-717: Fixed build.xml not to fail when there is no lib dir.
17248    (Doron Cohen via Otis Gospodnetic)
17249
1725024. LUCENE-728: Removed duplicate/old MoreLikeThis and SimilarityQueries
17251    classes from contrib/similarity, as their new home is under
17252    contrib/queries.
17253    (Otis Gospodnetic)
17254
1725525. LUCENE-669: Do not double-close the RandomAccessFile in
17256    FSIndexInput/Output during finalize().  Besides sending an
17257    IOException up to the GC, this may also be the cause of intermittent
17258    "The handle is invalid" IOExceptions on Windows when trying to
17259    close readers or writers. (Michael Busch via Mike McCandless)
17260
1726126. LUCENE-702: Fix IndexWriter.addIndexes(*) to not corrupt the index
17262    on any exceptions (eg disk full).  The semantics of these methods
17263    is now transactional: either all indices are merged or none are.
17264    Also fixed IndexWriter.mergeSegments (called outside of
17265    addIndexes(*) by addDocument, optimize, flushRamSegments) and
17266    IndexReader.commit() (called by close) to clean up and keep the
17267    instance state consistent to what's actually in the index (Mike
17268    McCandless).
17269
1727027. LUCENE-129: Change finalizers to do "try {...} finally
17271    {super.finalize();}" to make sure we don't miss finalizers in
17272    classes above us. (Esmond Pitt via Mike McCandless)
17273
1727428. LUCENE-754: Fix a problem introduced by LUCENE-651, causing
17275    IndexReaders to hang around forever, in addition to not
17276    fixing the original FieldCache performance problem.
17277    (Chris Hostetter, Yonik Seeley)
17278
1727929. LUCENE-140: Fix IndexReader.deleteDocument(int docNum) to
17280    correctly raise ArrayIndexOutOfBoundsException when docNum is too
17281    large.  Previously, if docNum was only slightly too large (within
17282    the same multiple of 8, ie, up to 7 ints beyond maxDoc), no
17283    exception would be raised and instead the index would become
17284    silently corrupted.  The corruption then only appears much later,
17285    in mergeSegments, when the corrupted segment is merged with
17286    segment(s) after it. (Mike McCandless)
17287
1728830. LUCENE-768: Fix case where an Exception during deleteDocument,
17289    undeleteAll or setNorm in IndexReader could leave the reader in a
17290    state where close() fails to release the write lock.
17291    (Mike McCandless)
17292
1729331. Remove "tvp" from known index file extensions because it is
17294    never used. (Nicolas Lalevée via Bernhard Messer)
17295
1729632. LUCENE-767: Change how SegmentReader.maxDoc() is computed to not
17297    rely on file length check and instead use the SegmentInfo's
17298    docCount that's already stored explicitly in the index.  This is a
17299    defensive bug fix (ie, there is no known problem seen "in real
17300    life" due to this, just a possible future problem).  (Chuck
17301    Williams via Mike McCandless)
17302
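Item 29 above (LUCENE-140) is easier to follow with a small model of a
bit set packed into bytes. The class DeletedDocsSketch below is a
hypothetical illustration, not Lucene's implementation. It assumes one bit
per document, so the last byte can contain slack bits beyond the logical size.

```java
/** Hypothetical sketch, not Lucene's code: one bit per document, packed into bytes. */
final class DeletedDocsSketch {
    private final byte[] bits;
    private final int size; // plays the role of maxDoc

    DeletedDocsSketch(int size) {
        this.size = size;
        this.bits = new byte[(size + 7) / 8]; // the last byte may hold unused slack bits
    }

    /** No explicit range check: only the array access itself can fail. */
    void setUnchecked(int docNum) {
        bits[docNum >> 3] |= (byte) (1 << (docNum & 7));
    }

    /** Rejects every docNum outside [0, size), including the slack bits. */
    void setChecked(int docNum) {
        if (docNum < 0 || docNum >= size) {
            throw new ArrayIndexOutOfBoundsException(docNum);
        }
        setUnchecked(docNum);
    }

    boolean get(int docNum) {
        return (bits[docNum >> 3] & (1 << (docNum & 7))) != 0;
    }
}
```

With a size of 10 the sketch allocates two bytes, so setUnchecked(12) lands
in a slack bit and raises nothing, whereas setChecked(12) throws. That is
the "slightly too large" case the entry describes.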
17303Optimizations
17304
17305  1. LUCENE-586: TermDocs.skipTo() is now more efficient for
17306     multi-segment indexes.  This will improve the performance of many
17307     types of queries against a non-optimized index. (Andrew Hudson
17308     via Yonik Seeley)
17309
17310  2. LUCENE-623: RAMDirectory.close now nulls out its reference to all
17311     internal "files", allowing them to be GCed even if references to the
17312     RAMDirectory itself still exist. (Nadav Har'El via Chris Hostetter)
17313
17314  3. LUCENE-629: Compressed fields are no longer uncompressed and
17315     recompressed during segment merges (e.g. during indexing or
17316     optimizing), thus improving performance. (Michael Busch via Otis
17317     Gospodnetic)
17318
17319  4. LUCENE-388: Improve indexing performance when maxBufferedDocs is
17320     large by keeping a count of buffered documents rather than
17321     counting after each document addition.  (Doron Cohen, Paul Smith,
17322     Yonik Seeley)
17323
17324  5. Modified TermScorer.explain to use TermDocs.skipTo() instead of
17325     looping through docs. (Grant Ingersoll)
17326
17327  6. LUCENE-672: New indexing segment merge policy flushes all
17328     buffered docs to their own segment and delays a merge until
17329     mergeFactor segments of a certain level have been accumulated.
17330     This increases indexing performance in the presence of deleted
17331     docs or partially full segments as well as enabling future
17332     optimizations.
17333
17334     NOTE: this also fixes an "under-merging" bug whereby it is
17335     possible to get far too many segments in your index (which will
17336     drastically slow down search, risks exhausting file descriptor
17337     limit, etc.).  This can happen when the number of buffered docs
17338     at close, plus the number of docs in the last non-ram segment is
17339     greater than mergeFactor. (Ning Li, Yonik Seeley)
17340
17341  7. Lazy loaded fields unnecessarily retained an extra copy of loaded
17342     String data.  (Yonik Seeley)
17343
17344  8. LUCENE-443: ConjunctionScorer performance increase.  Speed up
17345     any BooleanQuery with more than one mandatory clause.
17346     (Abdul Chaudhry, Paul Elschot via Yonik Seeley)
17347
17348  9. LUCENE-365: DisjunctionSumScorer performance increase of
17349     ~30%. Speeds up queries with optional clauses. (Paul Elschot via
17350     Yonik Seeley)
17351
17352 10. LUCENE-695: Optimized BufferedIndexInput.readBytes() for medium
17353     size buffers, which will speed up merging and retrieving binary
17354     and compressed fields.  (Nadav Har'El via Yonik Seeley)
17355
17356 11. LUCENE-687: Lazy skipping on proximity file speeds up most
17357     queries involving term positions, including phrase queries.
17358     (Michael Busch via Yonik Seeley)
17359
17360 12. LUCENE-714: Replaced 2 cases of manual for-loop array copying
17361     with calls to System.arraycopy instead, in DocumentWriter.java.
17362     (Nicolas Lalevee via Mike McCandless)
17363
17364 13. LUCENE-729: Non-recursive skipTo and next implementation of
17365     TermDocs for a MultiReader.  The old implementation could
17366     recurse up to the number of segments in the index. (Yonik Seeley)
17367
17368 14. LUCENE-739: Improve segment merging performance by reusing
17369     the norm array across different fields and doing bulk writes
17370     of norms of segments with no deleted docs.
17371     (Michael Busch via Yonik Seeley)
17372
17373 15. LUCENE-745: Add BooleanQuery.clauses(), allowing direct access
17374     to the List of clauses and replaced the internal synchronized Vector
17375     with an unsynchronized List. (Yonik Seeley)
17376
17377 16. LUCENE-750: Remove finalizers from FSIndexOutput and move the
17378     FSIndexInput finalizer to the actual file so all clones don't
17379     register a new finalizer. (Yonik Seeley)
17380
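The change in item 12 above (LUCENE-714) follows a common pattern, sketched
here in isolation. ArrayCopySketch is a hypothetical class for illustration
and does not reproduce the DocumentWriter code.

```java
/** Hypothetical sketch, not the DocumentWriter code: two ways to copy an array range. */
final class ArrayCopySketch {

    /** Element-by-element copy, the style the entry says was replaced. */
    static void copyWithLoop(int[] src, int[] dst, int length) {
        for (int i = 0; i < length; i++) {
            dst[i] = src[i];
        }
    }

    /** Same result through a single bulk call. */
    static void copyWithArraycopy(int[] src, int[] dst, int length) {
        System.arraycopy(src, 0, dst, 0, length);
    }
}
```

Both methods leave dst with the same contents; the bulk call simply hands
the copy to the runtime instead of looping in Java code.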
17381Test Cases
17382
17383  1. Added TestTermScorer.java (Grant Ingersoll)
17384
17385  2. Added TestWindowsMMap.java (Benson Margulies via Mike McCandless)
17386
17387  3. LUCENE-744 Append the user.name property onto the temporary directory
17388     that is created so it doesn't interfere with other users. (Grant Ingersoll)
17389
17390Documentation
17391
17392  1. Added style sheet to xdocs named lucene.css and included in the
17393     Anakia VSL descriptor.  (Grant Ingersoll)
17394
17395  2. Added scoring.xml document into xdocs.  Updated Similarity.java
17396     scoring formula. (Grant Ingersoll and Steve Rowe.  Updates from:
17397     Michael McCandless, Doron Cohen, Chris Hostetter, Doug Cutting).
17398     Issue 664.
17399
17400  3. Added javadocs for FieldSelectorResult.java. (Grant Ingersoll)
17401
17402  4. Moved xdocs directory to src/site/src/documentation/content/xdocs per
17403     Issue 707.  Site now builds using Forrest, just like the other Lucene
17404     siblings.  See http://wiki.apache.org/jakarta-lucene/HowToUpdateTheWebsite
17405     for info on updating the website. (Grant Ingersoll with help from Steve Rowe,
17406     Chris Hostetter, Doug Cutting, Otis Gospodnetic, Yonik Seeley)
17407
17408  5. Added in Developer and System Requirements sections under Resources (Grant Ingersoll)
17409
17410  6. LUCENE-713 Updated the Term Vector section of File Formats to include
17411     documentation on how Offset and Position info are stored in the TVF file.
17412     (Grant Ingersoll, Samir Abdou)
17413
17414  7. Added in link to Clover Test Code Coverage Reports under the Develop
17415     section in Resources (Grant Ingersoll)
17416
17417  8. LUCENE-748: Added details for semantics of IndexWriter.close on
17418     hitting an Exception.  (Jed Wesley-Smith via Mike McCandless)
17419
17420  9. Added some text about what is contained in releases.
17421     (Eric Haszlakiewicz via Grant Ingersoll)
17422
17423  10. LUCENE-758: Fix javadoc to clarify that RAMDirectory(Directory)
17424      makes a full copy of the starting Directory.  (Mike McCandless)
17425
17426  11. LUCENE-764: Fix javadocs to detail temporary space requirements
17427      for IndexWriter's optimize(), addIndexes(*) and addDocument(...)
17428      methods.  (Mike McCandless)
17429
17430Build
17431
17432  1. Added in clover test code coverage per http://issues.apache.org/jira/browse/LUCENE-721
17433     To enable clover code coverage, you must have clover.jar in the ANT
17434     classpath and specify -Drun.clover=true on the command line.
17435     (Michael Busch and Grant Ingersoll)
17436
17437  2. Added a sysproperty in common-build.xml per LUCENE-752 to map java.io.tmpdir to
17438     ${build.dir}/test just like the tempDir sysproperty.
17439
17440  3. LUCENE-757 Added new target named init-dist that does setup for
17441     distribution of both binary and source distributions.  Called by package
17442     and package-*-src
17443
17444======================= Release 2.0.0 =======================
17445
17446API Changes
17447
17448 1. All deprecated methods and fields have been removed, except
17449    DateField, which will still be supported for some time
17450    so Lucene can read its date fields from old indexes
17451    (Yonik Seeley & Grant Ingersoll)
17452
17453 2. DisjunctionSumScorer is no longer public.
17454    (Paul Elschot via Otis Gospodnetic)
17455
17456 3. Creating a Field with both an empty name and an empty value
17457    now throws an IllegalArgumentException
17458    (Daniel Naber)
17459
17460 4. LUCENE-301: Added new IndexWriter({String,File,Directory},
17461    Analyzer) constructors that do not take a boolean "create"
17462    argument.  These new constructors will create a new index if
17463    necessary, else append to the existing one.  (Dan Armbrust via
17464    Mike McCandless)
17465
17466New features
17467
17468 1. LUCENE-496: Command line tool for modifying the field norms of an
17469    existing index; added to contrib/miscellaneous.  (Chris Hostetter)
17470
17471 2. LUCENE-577: SweetSpotSimilarity added to contrib/miscellaneous.
17472    (Chris Hostetter)
17473
17474Bug fixes
17475
17476 1. LUCENE-330: Fix issue of FilteredQuery not working properly within
17477    BooleanQuery.  (Paul Elschot via Erik Hatcher)
17478
17479 2. LUCENE-515: Make ConstantScoreRangeQuery and ConstantScoreQuery work
17480    with RemoteSearchable.  (Philippe Laflamme via Yonik Seeley)
17481
17482 3. Added methods to get/set writeLockTimeout and commitLockTimeout in
17483    IndexWriter. These could be set in Lucene 1.4 using a system property.
17484    This feature had been removed without adding the corresponding
17485    getter/setter methods.  (Daniel Naber)
17486
17487 4. LUCENE-413: Fixed ArrayIndexOutOfBoundsException exceptions
17488    when using SpanQueries. (Paul Elschot via Yonik Seeley)
17489
17490 5. Implemented FilterIndexReader.getVersion() and isCurrent()
17491    (Yonik Seeley)
17492
17493 6. LUCENE-540: Fixed a bug with IndexWriter.addIndexes(Directory[])
17494    that sometimes caused the index order of documents to change.
17495    (Yonik Seeley)
17496
17497 7. LUCENE-526: Fixed a bug in FieldSortedHitQueue that caused
17498    subsequent String sorts with different locales to sort identically.
17499    (Paul Cowan via Yonik Seeley)
17500
17501 8. LUCENE-541: Add missing extractTerms() to DisjunctionMaxQuery
17502    (Stefan Will via Yonik Seeley)
17503
17504 9. LUCENE-514: Added getTermArrays() and extractTerms() to
17505    MultiPhraseQuery (Eric Jain & Yonik Seeley)
17506
1750710. LUCENE-512: Fixed ClassCastException in ParallelReader.getTermFreqVectors
17508    (frederic via Yonik)
17509
1751011. LUCENE-352: Fixed bug in SpanNotQuery that manifested as
17511    NullPointerException when "exclude" query was not a SpanTermQuery.
17512    (Chris Hostetter)
17513
1751412. LUCENE-572: Fixed bug in SpanNotQuery hashCode, was ignoring exclude clause
17515    (Chris Hostetter)
17516
1751713. LUCENE-561: Fixed some ParallelReader bugs. NullPointerException if the reader
17518    didn't know about the field yet, reader didn't keep track if it had deletions,
17519    and deleteDocument calls could circumvent synchronization on the subreaders.
17520    (Chuck Williams via Yonik Seeley)
17521
1752214. LUCENE-556: Added empty extractTerms() implementation to MatchAllDocsQuery and
17523    ConstantScoreQuery in order to allow their use with a MultiSearcher.
17524    (Yonik Seeley)
17525
1752615. LUCENE-546: Removed 2GB file size limitations for RAMDirectory.
17527    (Peter Royal, Michael Chan, Yonik Seeley)
17528
1752916. LUCENE-485: Don't hold commit lock while removing obsolete index
17530    files.  (Luc Vanlerberghe via cutting)
17531
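The kind of defect named in item 12 above (LUCENE-572) can be shown with a
generic two-clause object. TwoClauseQuerySketch is a hypothetical class that
uses plain strings in place of query clauses; it is not SpanNotQuery.

```java
/** Hypothetical sketch, not Lucene's SpanNotQuery: a query made of two clauses. */
final class TwoClauseQuerySketch {
    private final String include;
    private final String exclude;

    TwoClauseQuerySketch(String include, String exclude) {
        this.include = include;
        this.exclude = exclude;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof TwoClauseQuerySketch)) {
            return false;
        }
        TwoClauseQuerySketch that = (TwoClauseQuerySketch) other;
        return include.equals(that.include) && exclude.equals(that.exclude);
    }

    /** Mixes in both clauses, so queries differing only in "exclude" usually hash apart. */
    @Override
    public int hashCode() {
        return 31 * include.hashCode() + exclude.hashCode();
    }

    /** The weaker variant: legal, but all queries sharing "include" collide. */
    int hashCodeIgnoringExclude() {
        return include.hashCode();
    }
}
```

A hash code that ignores one clause still satisfies the equals/hashCode
contract, but it puts every query with the same "include" clause into the
same hash bucket.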
17532
175331.9.1
17534
17535Bug fixes
17536
17537 1. LUCENE-511: Fix a bug in the BufferedIndexOutput optimization
17538    introduced in 1.9-final.  (Shay Banon & Steven Tamm via cutting)
17539
175401.9 final
17541
17542Note that this release is mostly but not 100% source compatible with
17543the previous release of Lucene (1.4.3). In other words, you should
17544make sure your application compiles with this version of Lucene before
17545you replace the old Lucene JAR with the new one.  Many methods have
17546been deprecated in anticipation of release 2.0, so deprecation
17547warnings are to be expected when upgrading from 1.4.3 to 1.9.
17548
17549Bug fixes
17550
17551 1. The fix that made IndexWriter.setMaxBufferedDocs(1) work had negative
17552    effects on indexing performance and has thus been reverted. The
17553    argument for setMaxBufferedDocs(int) must now at least be 2, otherwise
17554    an exception is thrown. (Daniel Naber)
17555
17556Optimizations
17557
17558 1. Optimized BufferedIndexOutput.writeBytes() to use
17559    System.arraycopy() in more cases, rather than copying byte-by-byte.
17560    (Lukas Zapletal via Cutting)
17561
175621.9 RC1
17563
17564Requirements
17565
17566 1. To compile and use Lucene you now need Java 1.4 or later.
17567
17568Changes in runtime behavior
17569
17570 1. FuzzyQuery can no longer throw a TooManyClauses exception. If a
17571    FuzzyQuery expands to more than BooleanQuery.maxClauseCount
17572    terms only the BooleanQuery.maxClauseCount most similar terms
17573    go into the rewritten query and thus the exception is avoided.
17574    (Christoph)
17575
17576 2. Changed system property from "org.apache.lucene.lockdir" to
17577    "org.apache.lucene.lockDir", so that its casing follows the existing
17578    pattern used in other Lucene system properties. (Bernhard)
17579
17580 3. The terms of RangeQueries and FuzzyQueries are now converted to
17581    lowercase by default (as has been the case for PrefixQueries
17582    and WildcardQueries before). Use setLowercaseExpandedTerms(false)
17583    to disable that behavior but note that this also affects
17584    PrefixQueries and WildcardQueries. (Daniel Naber)
17585
17586 4. Document frequency that is computed when MultiSearcher is used is now
17587    computed correctly and "globally" across subsearchers and indices, while
17588    before it used to be computed locally to each index, which caused
17589    ranking across multiple indices not to be equivalent.
17590    (Chuck Williams, Wolf Siberski via Otis, bug #31841)
17591
17592 5. When opening an IndexWriter with create=true, Lucene now only deletes
17593    its own files from the index directory (looking at the file name suffixes
17594    to decide if a file belongs to Lucene). The old behavior was to delete
17595    all files. (Daniel Naber and Bernhard Messer, bug #34695)
17596
17597 6. The version of an IndexReader, as returned by getCurrentVersion()
17598    and getVersion() doesn't start at 0 anymore for new indexes. Instead, it
17599    is now initialized by the system time in milliseconds.
17600    (Bernhard Messer via Daniel Naber)
17601
17602 7. Several default values cannot be set via system properties anymore, as
17603    this has been considered inappropriate for a library like Lucene. For
17604    most properties there are set/get methods available in IndexWriter which
17605    you should use instead. This affects the following properties:
17606    See IndexWriter for getter/setter methods:
17607      org.apache.lucene.writeLockTimeout, org.apache.lucene.commitLockTimeout,
17608      org.apache.lucene.minMergeDocs, org.apache.lucene.maxMergeDocs,
17609      org.apache.lucene.maxFieldLength, org.apache.lucene.termIndexInterval,
17610      org.apache.lucene.mergeFactor,
17611    See BooleanQuery for getter/setter methods:
17612      org.apache.lucene.maxClauseCount
17613    See FSDirectory for getter/setter methods:
17614      disableLuceneLocks
17615    (Daniel Naber)
17616
17617 8. Fixed FieldCacheImpl to use user-provided IntParser and FloatParser,
17618    instead of using Integer and Float classes for parsing.
17619    (Yonik Seeley via Otis Gospodnetic)
17620
17621 9. Expert level search routines returning TopDocs and TopFieldDocs
17622    no longer normalize scores.  This also fixes bugs related to
17623    MultiSearchers and score sorting/normalization.
17624    (Luc Vanlerberghe via Yonik Seeley, LUCENE-469)
17625
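Item 1 above says that only the BooleanQuery.maxClauseCount most similar
terms are kept when a FuzzyQuery expands. One way to keep the best N of many
candidates is a bounded min-heap, sketched below. TopTermsSketch, its method
and the similarity values are hypothetical; the source does not describe the
data structure Lucene uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch, not Lucene's FuzzyQuery: keep only the most similar terms. */
final class TopTermsSketch {

    /** Returns at most maxTerms term texts, most similar first. */
    static List<String> mostSimilar(Map<String, Double> similarityByTerm, int maxTerms) {
        Comparator<Map.Entry<String, Double>> bySimilarity = Map.Entry.comparingByValue();
        // Min-heap: the least similar kept term sits at the head and is evicted first.
        PriorityQueue<Map.Entry<String, Double>> queue = new PriorityQueue<>(bySimilarity);
        for (Map.Entry<String, Double> candidate : similarityByTerm.entrySet()) {
            queue.add(candidate);
            if (queue.size() > maxTerms) {
                queue.poll(); // drop the currently least similar term
            }
        }
        List<String> kept = new ArrayList<>();
        while (!queue.isEmpty()) {
            kept.add(queue.poll().getKey());
        }
        Collections.reverse(kept); // heap order is ascending; report descending
        return kept;
    }
}
```

Because the queue never grows beyond maxTerms entries plus one, the
expansion stays bounded no matter how many candidate terms match, which is
why no exception is needed.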
17626New features
17627
17628 1. Added support for stored compressed fields (patch #31149)
17629    (Bernhard Messer via Christoph)
17630
17631 2. Added support for binary stored fields (patch #29370)
17632    (Drew Farris and Bernhard Messer via Christoph)
17633
17634 3. Added support for position and offset information in term vectors
17635    (patch #18927). (Grant Ingersoll & Christoph)
17636
17637 4. A new class DateTools has been added. It allows you to format dates
17638    in a readable format adequate for indexing. Unlike the existing
17639    DateField class DateTools can cope with dates before 1970 and it
17640    forces you to specify the desired date resolution (e.g. month, day,
17641    second, ...) which can make RangeQueries on those fields more efficient.
17642    (Daniel Naber)
17643
17644 5. QueryParser now correctly works with Analyzers that can return more
17645    than one token per position. For example, a query "+fast +car"
17646    would be parsed as "+fast +(car automobile)" if the Analyzer
17647    returns "car" and "automobile" at the same position whenever it
17648    finds "car" (Patch #23307).
17649    (Pierrick Brihaye, Daniel Naber)
17650
17651 6. Permit unbuffered Directory implementations (e.g., using mmap).
17652    InputStream is replaced by the new classes IndexInput and
17653    BufferedIndexInput.  OutputStream is replaced by the new classes
17654    IndexOutput and BufferedIndexOutput.  InputStream and OutputStream
17655    are now deprecated and FSDirectory is now subclassable. (cutting)
17656
17657 7. Add native Directory and TermDocs implementations that work under
17658    GCJ.  These require GCC 3.4.0 or later and have only been tested
17659    on Linux.  Use 'ant gcj' to build demo applications. (cutting)
17660
17661 8. Add MMapDirectory, which uses nio to mmap input files.  This is
17662    still somewhat slower than FSDirectory.  However it uses less
17663    memory per query term, since a new buffer is not allocated per
17664    term, which may help applications which use, e.g., wildcard
17665    queries.  It may also someday be faster. (cutting & Paul Elschot)
17666
17667 9. Added javadocs-internal to build.xml - bug #30360
17668    (Paul Elschot via Otis)
17669
1767010. Added RangeFilter, a more generically useful filter than DateFilter.
17671    (Chris M Hostetter via Erik)
17672
1767311. Added NumberTools, a utility class for indexing numeric fields.
17674    (adapted from code contributed by Matt Quail; committed by Erik)
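
    Illustration only: numbers indexed as text must sort as strings in the
    same order as they sort numerically. The encoding below (sign-bit flip
    plus fixed-width hex) is an assumption chosen for brevity and is not
    necessarily the one NumberTools uses; SortableLongSketch is a
    hypothetical name.

```java
// Hypothetical encoding, not necessarily the one NumberTools uses.
final class SortableLongSketch {
    // Flipping the sign bit maps signed order onto unsigned order;
    // fixed-width hex then makes string order equal numeric order.
    static String encode(long value) {
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }

    // Inverse of encode().
    static long decode(String encoded) {
        return Long.parseUnsignedLong(encoded, 16) ^ Long.MIN_VALUE;
    }
}
```

    With such keys, a range over the encoded strings selects the same
    documents as the numeric range would.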
17675
1767612. Added public static IndexReader.main(String[] args) method.
17677    IndexReader can now be used directly at command line level
17678    to list and optionally extract the individual files from an existing
17679    compound index file.
17680    (adapted from code contributed by Garrett Rooney; committed by Bernhard)
17681
1768213. Add IndexWriter.setTermIndexInterval() method.  See javadocs.
17683    (Doug Cutting)
17684
1768514. Added LucenePackage, whose static get() method returns java.util.Package,
17686    which lets the caller get the Lucene version information specified in
17687    the Lucene Jar.
17688    (Doug Cutting via Otis)
17689
1769015. Added Hits.iterator() method and corresponding HitIterator and Hit objects.
17691    This provides standard java.util.Iterator iteration over Hits.
17692    Each call to the iterator's next() method returns a Hit object.
17693    (Jeremy Rayner via Erik)
17694
1769516. Add ParallelReader, an IndexReader that combines separate indexes
17696    over different fields into a single virtual index.  (Doug Cutting)
17697
1769817. Add IntParser and FloatParser interfaces to FieldCache, so that
17699    fields in arbitrary formats can be cached as ints and floats.
17700    (Doug Cutting)
17701
1770218. Added class org.apache.lucene.index.IndexModifier which combines
17703    IndexWriter and IndexReader, so you can add and delete documents without
17704    worrying about synchronization/locking issues.
17705    (Daniel Naber)
17706
1770719. Lucene can now be used inside an unsigned applet, as Lucene's access
17708    to system properties will not cause a SecurityException anymore.
17709    (Jon Schuster via Daniel Naber, bug #34359)
17710
1771120. Added a new class MatchAllDocsQuery that matches all documents.
17712    (John Wang via Daniel Naber, bug #34946)
17713
1771421. Added ability to omit norms on a per field basis to decrease
17715    index size and memory consumption when there are many indexed fields.
17716    See Field.setOmitNorms()
17717    (Yonik Seeley, LUCENE-448)
17718
1771922. Added NullFragmenter to contrib/highlighter, which is useful for
17720    highlighting entire documents or fields.
17721    (Erik Hatcher)
17722
1772323. Added regular expression queries, RegexQuery and SpanRegexQuery.
17724    Note the same term enumeration caveats apply with these queries as
17725    apply to WildcardQuery and other term expanding queries.
17726    These two new queries are not currently supported via QueryParser.
17727    (Erik Hatcher)
17728
1772924. Added ConstantScoreQuery which wraps a filter and produces a score
17730    equal to the query boost for every matching document.
17731    (Yonik Seeley, LUCENE-383)
17732
1773325. Added ConstantScoreRangeQuery which produces a constant score for
17734    every document in the range.  One advantage over a normal RangeQuery
17735    is that it doesn't expand to a BooleanQuery and thus doesn't have a maximum
17736    number of terms the range can cover.  Both endpoints may also be open.
17737    (Yonik Seeley, LUCENE-383)
17738
1773926. Added ability to specify a minimum number of optional clauses that
17740    must match in a BooleanQuery.  See BooleanQuery.setMinimumNumberShouldMatch().
17741    (Paul Elschot, Chris Hostetter via Yonik Seeley, LUCENE-395)
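
    Illustration only: the matching rule can be sketched as a count of
    satisfied optional clauses. MinShouldMatchSketch is a hypothetical
    stand-in that treats each clause as a single term; it does not use the
    BooleanQuery API.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the matching rule; not the BooleanQuery API.
final class MinShouldMatchSketch {
    // True when at least `minimum` of the optional terms occur in the document.
    static boolean matches(Set<String> docTerms, List<String> optionalTerms, int minimum) {
        int matched = 0;
        for (String term : optionalTerms) {
            if (docTerms.contains(term)) {
                matched++;
            }
        }
        return matched >= minimum;
    }
}
```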
17742
1774327. Added DisjunctionMaxQuery which provides the maximum score across its clauses.
17744    It's very useful for searching across multiple fields.
17745    (Chuck Williams via Yonik Seeley, LUCENE-323)
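
    Illustration only: a sketch of maximum-based scoring across clauses.
    The tie-breaker term is an assumption added for illustration (the entry
    above only mentions the maximum); with a tie-breaker of 0 the result is
    the plain maximum. DisMaxSketch is a hypothetical name.

```java
// Hypothetical scoring helper; not the DisjunctionMaxQuery implementation.
final class DisMaxSketch {
    // scores: one score per matching clause (for example, one per field).
    // tieBreaker: 0 gives a pure maximum; larger values credit the other clauses.
    static double score(double[] scores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : scores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }
}
```

    Taking the maximum keeps a document that matches strongly in one field
    from being outranked by one that matches weakly in several.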
17746
1774728. New class ISOLatin1AccentFilter that replaces accented characters in the ISO
17748    Latin 1 character set by their unaccented equivalent.
17749    (Sven Duzont via Erik Hatcher)
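
    Illustration only: accent folding sketched with the JDK's Unicode
    normalizer. This swaps in a different technique from a per-character
    replacement table and covers only characters that decompose into a base
    letter plus combining marks. AccentFoldSketch is a hypothetical name.

```java
import java.text.Normalizer;

// Hypothetical helper; not the ISOLatin1AccentFilter implementation.
final class AccentFoldSketch {
    // Decompose each character (NFD), then drop the combining marks.
    // Letters without a decomposition, such as U+00F8, pass through unchanged.
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```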
17750
1775129. New class KeywordAnalyzer. "Tokenizes" the entire stream as a single token.
17752    This is useful for data like zip codes, ids, and some product names.
17753    (Erik Hatcher)
17754
1775530. Copied LengthFilter from contrib area to core. Removes words that are too
17756    long or too short from the stream.
17757    (David Spencer via Otis and Daniel)
17758
1775931. Added getPositionIncrementGap(String fieldName) to Analyzer.  This allows
17760    custom analyzers to put gaps between Field instances with the same field
17761    name, preventing phrase or span queries crossing these boundaries.  The
17762    default implementation issues a gap of 0, allowing the default token
17763    position increment of 1 to put the next field's first token into a
17764    successive position.
17765    (Erik Hatcher, with advice from Yonik)
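
    Illustration only: how a gap shifts token positions across several
    Field instances that share a name. PositionGapSketch is a hypothetical
    model of the position arithmetic described above, not Analyzer code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of position assignment; not Lucene's indexing code.
final class PositionGapSketch {
    // Each inner list holds the tokens of one Field instance.
    // Returns the position assigned to every token, in order.
    static List<Integer> positions(List<List<String>> instances, int gap) {
        List<Integer> result = new ArrayList<>();
        int position = -1;
        boolean first = true;
        for (List<String> tokens : instances) {
            if (!first) {
                position += gap;  // extra distance before the next instance
            }
            first = false;
            for (int i = 0; i < tokens.size(); i++) {
                position += 1;    // default increment of 1 per token
                result.add(position);
            }
        }
        return result;
    }
}
```

    With a gap of 0 the last token of one instance and the first token of
    the next are adjacent, so a phrase can match across them; a large gap
    prevents that.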
17766
1776732. StopFilter can now ignore case when checking for stop words.
17768    (Grant Ingersoll via Yonik, LUCENE-248)
17769
1777033. Add TopDocCollector and TopFieldDocCollector.  These simplify the
17771    implementation of hit collectors that collect only the
17772    top-scoring or top-sorting hits.
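
    Illustration only: collecting just the top-scoring hits is commonly
    done with a bounded min-heap. TopNCollectorSketch and its Hit class are
    hypothetical and do not reproduce the TopDocCollector API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical collector; not the TopDocCollector API.
final class TopNCollectorSketch {
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int limit;
    // Min-heap on score: the weakest retained hit sits at the head.
    private final PriorityQueue<Hit> heap =
        new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));

    TopNCollectorSketch(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
    }

    // Keeps the hit only if it belongs among the best `limit` seen so far.
    void collect(int doc, float score) {
        if (heap.size() < limit) {
            heap.add(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();
            heap.add(new Hit(doc, score));
        }
    }

    // Retained hits, best first.
    List<Hit> topHits() {
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return hits;
    }
}
```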
17773
17774API Changes
17775
17776 1. Several methods and fields have been deprecated. The API documentation
17777    contains information about the recommended replacements. It is planned
17778    that most of the deprecated methods and fields will be removed in
17779    Lucene 2.0. (Daniel Naber)
17780
17781 2. The Russian and the German analyzers have been moved to contrib/analyzers.
17782    Also, the WordlistLoader class has been moved one level up in the
17783    hierarchy and is now org.apache.lucene.analysis.WordlistLoader
17784    (Daniel Naber)
17785
17786 3. The API contained methods that were declared to throw an IOException
17787    but never actually threw one. These declarations have been removed. If
17788    your code tries to catch these exceptions you might need to remove
17789    those catch clauses to avoid compile errors. (Daniel Naber)
17790
17791 4. Add a serializable Parameter Class to standardize parameter enum
17792    classes in BooleanClause and Field. (Christoph)
17793
17794 5. Added rewrite methods to all SpanQuery subclasses that nest other SpanQuerys.
17795    This allows custom SpanQuery subclasses that rewrite (for term expansion, for
17796    example) to nest within the built-in SpanQuery classes successfully.
17797
17798Bug fixes
17799
17800 1. The JSP demo page (src/jsp/results.jsp) now properly closes the
17801    IndexSearcher it opens. (Daniel Naber)
17802
17803 2. Fixed a bug in IndexWriter.addIndexes(IndexReader[] readers) that
17804    prevented deletion of obsolete segments. (Christoph Goller)
17805
17806 3. Fix in FieldInfos to avoid the return of an extra blank field in
17807    IndexReader.getFieldNames() (Patch #19058). (Mark Harwood via Bernhard)
17808
17809 4. Some combinations of BooleanQuery and MultiPhraseQuery (formerly
17810    PhrasePrefixQuery) could provoke UnsupportedOperationException
17811    (bug #33161). (Rhett Sutphin via Daniel Naber)
17812
17813 5. Fixed a small bug in ConjunctionScorer.skipTo() that caused a NullPointerException
17814    if skipTo() was called without a prior call to next(). (Christoph)
17815
17816 6. Disable Similarity.coord() in the scoring of most automatically
17817    generated boolean queries.  The coord() score factor is
17818    appropriate when clauses are independently specified by a user,
17819    but is usually not appropriate when clauses are generated
17820    automatically, e.g., by a fuzzy, wildcard or range query.  Matches
17821    on such automatically generated queries are no longer penalized
17822    for not matching all terms.  (Doug Cutting, Patch #33472)
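
    Illustration only: a sketch of why the coord factor penalizes partial
    matches. It assumes coord is the fraction of clauses matched;
    CoordSketch and finalScore() are hypothetical names.

```java
// Hypothetical scoring helper; assumes coord = matched clauses / total clauses.
final class CoordSketch {
    // Fraction of the query's clauses that matched the document.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Raw score scaled by coord, or left alone when coord is disabled.
    static float finalScore(float rawScore, int overlap, int maxOverlap, boolean coordEnabled) {
        return coordEnabled ? rawScore * coord(overlap, maxOverlap) : rawScore;
    }
}
```

    A wildcard that expands to four terms would, with coord enabled, cut
    the score of a document matching one of them to a quarter, even though
    the user never asked for all four.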
17823
17824 7. Getting a lock file with Lock.obtain(long) was supposed to wait for
17825    a given amount of milliseconds, but this didn't work.
17826    (John Wang via Daniel Naber, Bug #33799)
17827
17828 8. Fix FSDirectory.createOutput() to always create new files.
17829    Previously, existing files were overwritten, and an index could be
17830    corrupted when the old version of a file was longer than the new.
17831    Now any existing file is first removed.  (Doug Cutting)
17832
17833 9. Fix BooleanQuery containing nested SpanTermQuerys, which previously
17834    could return an incorrect number of hits.
17835    (Reece Wilton via Erik Hatcher, Bug #35157)
17836
1783710. Fix NullPointerException that could occur with a MultiPhraseQuery
17838    inside a BooleanQuery.
17839    (Hans Hjelm and Scotty Allen via Daniel Naber, Bug #35626)
17840
1784111. Fixed SnowballFilter to pass through the position increment from
17842    the original token.
17843    (Yonik Seeley via Erik Hatcher, LUCENE-437)
17844
1784512. Added Unicode range of Korean characters to StandardTokenizer,
17846    grouping contiguous characters into a token rather than one token
17847    per character.  This change also changes the token type to "<CJ>"
17848    for Chinese and Japanese character tokens (previously it was "<CJK>").
17849    (Cheolgoo Kang via Otis and Erik, LUCENE-444 and LUCENE-461)
17850
1785113. FieldsReader now looks at FieldInfo.storeOffsetWithTermVector and
17852    FieldInfo.storePositionWithTermVector and creates the Field with
17853    correct TermVector parameter.
17854    (Frank Steinmann via Bernhard, LUCENE-455)
17855
1785614. Fixed WildcardQuery to prevent "cat" matching "ca??".
17857    (Xiaozheng Ma via Bernhard, LUCENE-306)
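
    Illustration only: the intended semantics, sketched as a small
    recursive matcher in which '?' consumes exactly one character.
    WildcardSketch is a hypothetical name and is unrelated to how
    WildcardQuery enumerates terms.

```java
// Hypothetical matcher; not the WildcardQuery implementation.
final class WildcardSketch {
    // '?' matches exactly one character; '*' matches any run, including none.
    static boolean matches(String pattern, String text) {
        return matches(pattern, 0, text, 0);
    }

    private static boolean matches(String p, int pi, String t, int ti) {
        if (pi == p.length()) {
            return ti == t.length();
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // Try every possible length for the run that '*' absorbs.
            for (int next = ti; next <= t.length(); next++) {
                if (matches(p, pi + 1, t, next)) {
                    return true;
                }
            }
            return false;
        }
        if (ti == t.length()) {
            return false;  // '?' or a literal needs a character to consume
        }
        return (c == '?' || c == t.charAt(ti)) && matches(p, pi + 1, t, ti + 1);
    }
}
```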
17858
1785915. Fixed a bug where MultiSearcher and ParallelMultiSearcher could
17860    change the sort order when sorting by string for documents without
17861    a value for the sort field.
17862    (Luc Vanlerberghe via Yonik, LUCENE-453)
17863
1786416. Fixed a sorting problem with MultiSearchers that can lead to
17865    missing or duplicate docs due to equal docs sorting in an arbitrary order.
17866    (Yonik Seeley, LUCENE-456)
17867
1786817. A single hit using the expert level sorted search methods
17869    resulted in the score not being normalized.
17870    (Yonik Seeley, LUCENE-462)
17871
1787218. Fixed inefficient memory usage when loading an index into RAMDirectory.
17873    (Volodymyr Bychkoviak via Bernhard, LUCENE-475)
17874
1787519. Corrected term offsets returned by ChineseTokenizer.
17876    (Ray Tsang via Erik Hatcher, LUCENE-324)
17877
1787820. Fixed MultiReader.undeleteAll() to correctly update numDocs.
17879    (Robert Kirchgessner via Doug Cutting, LUCENE-479)
17880
1788121. Race condition in IndexReader.getCurrentVersion() and isCurrent()
17882    fixed by acquiring the commit lock.
17883    (Luc Vanlerberghe via Yonik Seeley, LUCENE-481)
17884
1788522. IndexWriter.setMaxBufferedDocs(1) didn't have the expected effect;
17886    this has now been fixed. (Daniel Naber)
17887
1788823. Fixed QueryParser when called with a date in local form like
17889    "[1/16/2000 TO 1/18/2000]". This query did not include the documents
17890    of 1/18/2000, i.e. the last day was not included. (Daniel Naber)
17891
1789224. Removed sorting constraint that threw an exception if there were
17893    not yet any values for the sort field (Yonik Seeley, LUCENE-374)
17894
17895Optimizations
17896
17897 1. Disk usage (peak requirements during indexing and optimization)
17898    when using the compound file format has been reduced.
17899    (Bernhard, Dmitry, and Christoph)
17900
17901 2. Optimize the performance of certain uses of BooleanScorer,
17902    TermScorer and IndexSearcher.  In particular, a BooleanQuery
17903    composed of TermQuery, with not all terms required, that returns a
17904    TopDocs (e.g., through a Hits with no Sort specified) runs much
17905    faster.  (cutting)
17906
17907 3. Removed synchronization from reading of term vectors with an
17908    IndexReader (Patch #30736). (Bernhard Messer via Christoph)
17909
17910 4. Optimize term-dictionary lookup to allocate far fewer terms when
17911    scanning for the matching term.  This speeds searches involving
17912    low-frequency terms, where the cost of dictionary lookup can be
17913    significant. (cutting)
17914
17915 5. Optimize fuzzy queries so the standard fuzzy queries with a prefix
17916    of 0 now run 20-50% faster (Patch #31882).
17917    (Jonathan Hager via Daniel Naber)
17918
17919 6. Added a version of BooleanScorer (BooleanScorer2) that delivers
17920    documents in increasing order and implements skipTo. For queries
17921    with required or forbidden clauses it may be faster than the old
17922    BooleanScorer; for BooleanQueries consisting only of optional
17923    clauses it is probably slower. The new BooleanScorer is now the
17924    default. (Patch 31785 by Paul Elschot via Christoph)
17925
17926 7. Use uncached access to norms when merging to reduce RAM usage.
17927    (Bug #32847).  (Doug Cutting)
17928
17929 8. Don't read term index when random-access is not required.  This
17930    reduces time to open IndexReaders and they use less memory when
17931    random access is not required, e.g., when merging segments.  The
17932    term index is now read into memory lazily at the first
17933    random-access.  (Doug Cutting)
17934
17935 9. Optimize IndexWriter.addIndexes(Directory[]) when the number of
17936    added indexes is larger than mergeFactor.  Previously this could
17937    result in quadratic performance.  Now performance is n log(n).
17938    (Doug Cutting)
17939
1794010. Speed up the creation of TermEnum for indices with multiple
17941    segments and deleted documents, and thus speed up PrefixQuery,
17942    RangeQuery, WildcardQuery, FuzzyQuery, RangeFilter, DateFilter,
17943    and sorting the first time on a field.
17944    (Yonik Seeley, LUCENE-454)
17945
1794611. Optimized and generalized 32 bit floating point to byte
17947    (custom 8 bit floating point) conversions.  Increased the speed of
17948    Similarity.encodeNorm() anywhere from 10% to 250%, depending on the JVM.
17949    (Yonik Seeley, LUCENE-467)
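
    Illustration only: one way to squeeze a float into a byte with shifts
    instead of a search. The shift of 21 and the offset of 384 are
    assumptions chosen so that 1.0f round-trips exactly; the entry above
    does not state the constants used. SmallFloatSketch is a hypothetical
    name.

```java
// Hypothetical 8-bit float codec; the constants are assumptions.
final class SmallFloatSketch {
    private static final int SHIFT = 21;   // keeps the exponent and the top 2 mantissa bits
    private static final int FLOOR = 384;  // shifted values at or below this underflow

    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> SHIFT;
        if (small <= FLOOR) {
            // Zero and negative values map to 0; tiny positives to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= FLOOR + 0x100) {
            return (byte) -1;  // too large: clamp to the largest code
        }
        return (byte) (small - FLOOR);
    }

    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        return Float.intBitsToFloat(((b & 0xff) + FLOOR) << SHIFT);
    }
}
```

    The conversion is lossy: values between two representable steps are
    rounded down, so 0.6f decodes as 0.5f.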
17950
17951Infrastructure
17952
17953 1. Lucene's source code repository has converted from CVS to
17954    Subversion.  The new repository is at
17955    http://svn.apache.org/repos/asf/lucene/java/trunk
17956
17957 2. Lucene's issue tracker has migrated from Bugzilla to JIRA.
17958    Lucene's JIRA is at http://issues.apache.org/jira/browse/LUCENE
17959    The old issues are still available at
17960    http://issues.apache.org/bugzilla/show_bug.cgi?id=xxxx
17961    (use the bug number instead of xxxx)
17962
17963
179641.4.3
17965
17966 1. The JSP demo page (src/jsp/results.jsp) now properly escapes error
17967    messages which might contain user input (e.g. error messages about
17968    query parsing). If you used that page as a starting point for your
17969    own code please make sure your code also properly escapes HTML
17970    characters from user input in order to avoid so-called cross site
17971    scripting attacks. (Daniel Naber)
17972
17973 2. QueryParser changes in 1.4.2 broke the QueryParser API. Now the old
17974    API is supported again. (Christoph)
17975
17976
179771.4.2
17978
17979 1. Fixed bug #31241: Sorting could lead to incorrect results (documents
17980    missing, others duplicated) if the sort keys were not unique and there
17981    were more than 100 matches. (Daniel Naber)
17982
17983 2. Memory leak in Sort code (bug #31240) eliminated.
17984    (Rafal Krzewski via Christoph and Daniel)
17985
17986 3. FuzzyQuery now takes an additional parameter that specifies the
17987    minimum similarity that is required for a term to match the query.
17988    The QueryParser syntax for this is term~x, where x is a floating
17989    point number >= 0 and < 1 (a bigger number means that a higher
17990    similarity is required). Furthermore, a prefix can be specified
17991    for FuzzyQuerys so that only those terms are considered similar that
17992    start with this prefix. This can speed up FuzzyQuery greatly.
17993    (Daniel Naber, Christoph Goller)
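
    Illustration only: a sketch of a similarity threshold combined with a
    required prefix. The normalization (1 - edit distance / shorter length)
    is an assumption and may differ from FuzzyQuery's; FuzzySketch is a
    hypothetical name.

```java
// Hypothetical fuzzy matcher; the similarity formula is an assumption.
final class FuzzySketch {
    // Classic Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1], prev[j]) + 1, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0f : 0.0f;
        }
        return 1.0f - (float) editDistance(a, b) / shorter;
    }

    static boolean matches(String queryTerm, String candidate, float minSimilarity, int prefixLength) {
        int p = Math.min(prefixLength, queryTerm.length());
        // Cheap rejection: candidates without the prefix never reach the distance computation.
        if (!candidate.startsWith(queryTerm.substring(0, p))) {
            return false;
        }
        return similarity(queryTerm, candidate) >= minSimilarity;
    }
}
```

    The prefix check is what makes the speed-up possible: only terms that
    share the prefix need to be compared at all.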
17994
17995 4. PhraseQuery and PhrasePrefixQuery now allow the explicit specification
17996    of relative positions. (Christoph Goller)
17997
17998 5. QueryParser changes: Fix for ArrayIndexOutOfBoundsExceptions
17999    (patch #9110); some unused method parameters removed; the ability
18000    to specify a minimum similarity for FuzzyQuery has been added.
18001    (Christoph Goller)
18002
18003 6. IndexSearcher optimization: a new ScoreDoc is no longer allocated
18004    for every non-zero-scoring hit.  This makes 'OR' queries that
18005    contain common terms substantially faster.  (cutting)
18006
18007
180081.4.1
18009
18010 1. Fixed a performance bug in hit sorting code, where values were not
18011    correctly cached.  (Aviran via cutting)
18012
18013 2. Fixed errors in file format documentation. (Daniel Naber)
18014
18015
180161.4 final
18017
18018 1. Added "an" to the list of stop words in StopAnalyzer, to complement
18019    the existing "a" there.  Fix for bug 28960
18020     (http://issues.apache.org/bugzilla/show_bug.cgi?id=28960). (Otis)
18021
18022 2. Added new class FieldCache to manage in-memory caches of field term
18023    values.  (Tim Jones)
18024
18025 3. Added overloaded getFieldQuery method to QueryParser which
18026    accepts the slop factor specified for the phrase (or the default
18027    phrase slop for the QueryParser instance).  This allows overriding
18028    methods to replace a PhraseQuery with a SpanNearQuery instead,
18029    keeping the proper slop factor. (Erik Hatcher)
18030
18031 4. Changed the encoding of GermanAnalyzer.java and GermanStemmer.java to
18032    UTF-8 and changed the build encoding to UTF-8, to make changed files
18033    compile. (Otis Gospodnetic)
18034
18035 5. Removed synchronization from term lookup under IndexReader methods
18036    termFreq(), termDocs() or termPositions() to improve
18037    multi-threaded performance.  (cutting)
18038
18039 6. Fix a bug where obsolete segment files were not deleted on Win32.
18040
18041
180421.4 RC3
18043
18044 1. Fixed several search bugs introduced by the skipTo() changes in
18045    release 1.4RC1.  The index file format was changed a bit, so
18046    collections must be re-indexed to take advantage of the skipTo()
18047    optimizations.  (Christoph Goller)
18048
18049 2. Added new Document methods, removeField() and removeFields().
18050    (Christoph Goller)
18051
18052 3. Fixed inconsistencies with index closing.  Indexes and directories
18053    are now only closed automatically by Lucene when Lucene opened
18054    them automatically.  (Christoph Goller)
18055
18056 4. Added new class: FilteredQuery.  (Tim Jones)
18057
18058 5. Added a new SortField type for custom comparators.  (Tim Jones)
18059
18060 6. Lock obtain timed out message now displays the full path to the lock
18061    file. (Daniel Naber via Erik)
18062
18063 7. Fixed a bug in SpanNearQuery when ordered. (Paul Elschot via cutting)
18064
18065 8. Fixed so that FSDirectory's locks still work when the
18066    java.io.tmpdir system property is null.  (cutting)
18067
18068 9. Changed FilteredTermEnum's constructor to take no parameters,
18069    as the parameters were ignored anyway (bug #28858)
18070
180711.4 RC2
18072
18073 1. GermanAnalyzer now throws an exception if the stopword file
18074    cannot be found (bug #27987). It now uses LowerCaseFilter
18075    (bug #18410) (Daniel Naber via Otis, Erik)
18076
18077 2. Fixed a few bugs in the file format documentation. (cutting)
18078
18079
180801.4 RC1
18081
18082 1. Changed the format of the .tis file, so that:
18083
18084    - it has a format version number, which makes it easier to
18085      back-compatibly change file formats in the future.
18086
18087    - the term count is now stored as a long.  This was the one aspect
18088      of Lucene's file formats that limited index size.
18089
18090    - a few internal index parameters are now stored in the index, so
18091      that they can (in theory) now be changed from index to index,
18092      although there is not yet an API to do so.
18093
18094    These changes are back compatible.  The new code can read old
18095    indexes.  But old code will not be able to read new indexes. (cutting)
18096
18097 2. Added an optimized implementation of TermDocs.skipTo().  A skip
18098    table is now stored for each term in the .frq file.  This only
18099    adds a percent or two to overall index size, but can substantially
18100    speed up many searches.  (cutting)
18101
18102 3. Restructured the Scorer API and all Scorer implementations to take
18103    advantage of an optimized TermDocs.skipTo() implementation.  In
18104    particular, PhraseQuerys and conjunctive BooleanQuerys are
18105    faster when one clause has substantially fewer matches than the
18106    others.  (A conjunctive BooleanQuery is a BooleanQuery where all
18107    clauses are required.)  (cutting)
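
    Illustration only: why skipping helps conjunctions. Each list jumps
    straight to the other's current document instead of stepping through
    every entry, so a sparse clause drags the dense ones forward quickly.
    Here a binary search over a sorted array stands in for a skip table;
    ConjunctionSketch is a hypothetical name, not the Scorer API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a two-clause conjunction; not the Scorer API.
final class ConjunctionSketch {
    // Index of the first doc id >= target at or after `from`, or docs.length if none.
    static int skipTo(int[] docs, int from, int target) {
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        return idx >= 0 ? idx : -idx - 1;
    }

    // Intersects two sorted, duplicate-free doc id lists by leapfrogging.
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = skipTo(a, i, b[j]);
            } else {
                j = skipTo(b, j, a[i]);
            }
        }
        return out;
    }
}
```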
18108
18109 4. Added new class ParallelMultiSearcher.  Combined with
18110    RemoteSearchable this makes it easy to implement distributed
18111    search systems.  (Jean-Francois Halleux via cutting)
18112
18113 5. Added support for hit sorting.  Results may now be sorted by any
18114    indexed field.  For details see the javadoc for
18115    Searcher#search(Query, Sort).  (Tim Jones via Cutting)
18116
18117 6. Changed FSDirectory to auto-create a full directory tree that it
18118    needs by using mkdirs() instead of mkdir().  (Mladen Turk via Otis)
18119
18120 7. Added a new span-based query API.  This implements, among other
18121    things, nested phrases.  See javadocs for details.  (Doug Cutting)
18122
18123 8. Added new method Query.getSimilarity(Searcher), and changed
18124    scorers to use it.  This permits one to subclass a Query class so
18125    that it can specify its own Similarity implementation, perhaps
18126    one that delegates through that of the Searcher.  (Julien Nioche
18127    via Cutting)
18128
18129 9. Added MultiReader, an IndexReader that combines multiple other
18130    IndexReaders.  (Cutting)
18131
1813210. Added support for term vectors.  See Field#isTermVectorStored().
18133    (Grant Ingersoll, Cutting & Dmitry)
18134
1813511. Fixed the old bug with escaping of special characters in query
18136    strings: http://issues.apache.org/bugzilla/show_bug.cgi?id=24665
18137    (Jean-Francois Halleux via Otis)
18138
1813912. Added support for overriding default values for the following,
18140    using system properties:
18141      - default commit lock timeout
18142      - default maxFieldLength
18143      - default maxMergeDocs
18144      - default mergeFactor
18145      - default minMergeDocs
18146      - default write lock timeout
18147    (Otis)
18148
1814913. Changed QueryParser.jj to allow '-' and '+' within tokens:
18150    http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
18151    (Morus Walter via Otis)
18152
1815314. Changed so that the compound index format is used by default.
18154    This makes indexing a bit slower, but vastly reduces the chances
18155    of file handle problems.  (Cutting)
18156
18157
181581.3 final
18159
18160 1. Added catch of BooleanQuery$TooManyClauses in QueryParser to
18161    throw ParseException instead. (Erik Hatcher)
18162
18163 2. Fixed a NullPointerException in Query.explain(). (Doug Cutting)
18164
18165 3. Added a new method IndexReader.setNorm(), that permits one to
18166    alter the boosting of fields after an index is created.
18167
18168 4. Distinguish between the final position and length when indexing a
18169    field.  The length is now defined as the total number of tokens,
18170    instead of the final position, as it was previously.  Length is
18171    used for score normalization (Similarity.lengthNorm()) and for
18172    controlling memory usage (IndexWriter.maxFieldLength).  In both of
18173    these cases, the total number of tokens is a better value to use
18174    than the final token position.  Position is used in phrase
18175    searching (see PhraseQuery and Token.setPositionIncrement()).
18176
18177 5. Fix StandardTokenizer's handling of CJK characters (Chinese,
18178    Japanese and Korean ideograms).  Previously contiguous sequences
18179    were combined in a single token, which is not very useful.  Now
18180    each ideogram generates a separate token, which is more useful.
18181
18182
181831.3 RC3
18184
18185 1. Added minMergeDocs in IndexWriter.  This can be raised to speed
18186    indexing without altering the number of files, but only using more
18187    memory.  (Julien Nioche via Otis)
18188
18189 2. Fix bug #24786, in query rewriting. (bschneeman via Cutting)
18190
18191 3. Fix bug #16952, in demo HTML parser, skip comments in
18192    javascript. (Christoph Goller)
18193
18194 4. Fix bug #19253, in demo HTML parser, add whitespace as needed to
18195    output (Daniel Naber via Christoph Goller)
18196
18197 5. Fix bug #24301, in demo HTML parser, long titles no longer
18198    hang things. (Christoph Goller)
18199
18200 6. Fix bug #23534, Replace use of file timestamp of segments file
18201    with an index version number stored in the segments file.  This
18202    resolves problems when running on file systems with low-resolution
18203    timestamps, e.g., HFS under MacOS X.  (Christoph Goller)
18204
18205 7. Fix QueryParser so that TokenMgrError is not thrown, only
18206    ParseException.  (Erik Hatcher)
18207
18208 8. Fix some bugs introduced by change 11 of RC2.  (Christoph Goller)
18209
18210 9. Fixed a problem compiling TestRussianStem.  (Christoph Goller)
18211
1821210. Cleaned up some build stuff.  (Erik Hatcher)
18213
18214
182151.3 RC2
18216
18217 1. Added getFieldNames(boolean) to IndexReader, SegmentReader, and
18218    SegmentsReader. (Julien Nioche via otis)
18219
18220 2. Changed file locking to place lock files in
18221    System.getProperty("java.io.tmpdir"), where all users are
18222    permitted to write files.  This way folks can open and correctly
18223    lock indexes which are read-only to them.
18224
18225 3. IndexWriter: added a new method, addDocument(Document, Analyzer),
18226    permitting one to easily use different analyzers for different
18227    documents in the same index.
18228
18229 4. Minor enhancements to FuzzyTermEnum.
18230    (Christoph Goller via Otis)
18231
18232 5. PriorityQueue: added insert(Object) method and adjusted IndexSearcher
18233    and MultiIndexSearcher to use it.
18234    (Christoph Goller via Otis)
18235
18236 6. Fixed a bug in IndexWriter that returned incorrect docCount().
18237    (Christoph Goller via Otis)
18238
18239 7. Fixed SegmentsReader to eliminate the confusing and slightly different
18240    behaviour of TermEnum when dealing with an enumeration of all terms,
18241    versus an enumeration starting from a specific term.
18242    This patch also fixes incorrect term document frequencies when the same term
18243    is present in multiple segments.
18244    (Christoph Goller via Otis)
18245
18246 8. Added CachingWrapperFilter and PerFieldAnalyzerWrapper. (Erik Hatcher)
18247
18248 9. Added support for the new "compound file" index format (Dmitry
18249    Serebrennikov)
18250
1825110. Added Locale setting to QueryParser, for use by date range parsing.
18252
1825311. Changed IndexReader so that it can be subclassed by classes
18254    outside of its package.  Previously it had package-private
18255    abstract methods.  Also modified the index merging code so that it
18256    can work on an arbitrary IndexReader implementation, and added a
18257    new method, IndexWriter.addIndexes(IndexReader[]), to take
18258    advantage of this. (cutting)
18259
1826012. Added a limit to the number of clauses which may be added to a
18261    BooleanQuery.  The default limit is 1024 clauses.  This should
18262    stop most OutOfMemoryErrors caused by prefix, wildcard and fuzzy
18263    queries which run amok. (cutting)
18264
1826513. Add new method: IndexReader.undeleteAll().  This undeletes all
18266    deleted documents which still remain in the index. (cutting)
18267
18268
182691.3 RC1
18270
18271 1. Fixed PriorityQueue's clear() method.
18272    Fix for bug 9454, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9454
18273    (Matthijs Bomhoff via otis)
18274
18275 2. Changed StandardTokenizer.jj grammar for EMAIL tokens.
18276    Fix for bug 9015, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9015
18277    (Dale Anson via otis)
18278
18279 3. Added the ability to disable lock creation by using disableLuceneLocks
18280    system property.  This is useful for read-only media, such as CD-ROMs.
18281    (otis)
18282
18283 4. Added id method to Hits to be able to access the index global id.
18284    Required for sorting options.
18285    (carlson)
18286
18287 5. Added support for new range query syntax to QueryParser.jj.
18288    (briangoetz)
18289
18290 6. Added the ability to retrieve HTML documents' META tag values to
18291    HTMLParser.jj.
18292    (Mark Harwood via otis)
18293
18294 7. Modified QueryParser to make it possible to programmatically specify the
18295    default Boolean operator (OR or AND).
18296    (Péter Halácsy via otis)
18297
18298 8. Made many search methods and classes non-final, per requests.
18299    This includes IndexWriter and IndexSearcher, among others.
18300    (cutting)
18301
18302 9. Added class RemoteSearchable, providing support for remote
18303    searching via RMI.  The test class RemoteSearchableTest.java
18304    provides an example of how this can be used.  (cutting)
18305
18306 10. Added PhrasePrefixQuery (and supporting MultipleTermPositions).  The
18307     test class TestPhrasePrefixQuery provides the usage example.
18308     (Anders Nielsen via otis)
18309
18310 11. Changed the German stemming algorithm to ignore case while
18311     stripping. The new algorithm is faster and produces more equal
18312     stems from nouns and verbs derived from the same word.
     (gschwarz)

 12. Added support for boosting the score of documents and fields via
     the new methods Document.setBoost(float) and Field.setBoost(float).

     Note: This changes the encoding of an indexed value.  Indexes
     should be re-created from scratch in order for search scores to
     be correct.  With the new code and an old index, searches will
     yield very large scores for shorter fields, and very small scores
     for longer fields.  Once the index is re-created, scores will be
     as before. (cutting)

 13. Added new method Token.setPositionIncrement().

     This permits, for the purpose of phrase searching, placing
     multiple terms in a single position.  This is useful with
     stemmers that produce multiple possible stems for a word.

     This also permits the introduction of gaps between terms, so that
     terms which are adjacent in a token stream will not be matched by
     an exact phrase query.  This makes it possible, e.g., to build
     an analyzer where phrases are not matched over stop words which
     have been removed.

     Finally, repeating a token with an increment of zero can also be
     used to boost scores of matches on that token.  (cutting)

 14. Added new Filter class, QueryFilter.  This restricts search
     results to documents that also match a provided query.
     Results are cached, so that searches after the first on the same
     index using this filter are very fast.

     This could be used, for example, with a RangeQuery on a formatted
     date field to implement date filtering.  One could re-use a
     single QueryFilter that matches, e.g., only documents modified
     within the last week.  The QueryFilter and RangeQuery would only
     need to be reconstructed once per day. (cutting)

 15. Added a new IndexWriter method, getAnalyzer().  This returns the
     analyzer used when adding documents to this index. (cutting)

 16. Fixed a bug with IndexReader.lastModified().  Before, document
     deletion did not update this.  Now it does.  (cutting)

 17. Added Russian Analyzer.
     (Boris Okner via otis)

 18. Added a public, extensible scoring API.  For details, see the
     javadoc for org.apache.lucene.search.Similarity.

 19. Fixed the return type of Hits.id(), changing it from float to int.
     (Terry Steichen via Peter).

 20. Added getFieldNames() to IndexReader and Segment(s)Reader classes.
     (Peter Mularien via otis)

 21. Added getFields(String) and getValues(String) methods.
     Contributed by Rasik Pandey on 2002-10-09
     (Rasik Pandey via otis)

 22. Revised internal search APIs.  Changes include:

       a. Queries are no longer modified during a search.  This makes
       it possible, e.g., to reuse the same query instance with
       multiple indexes from multiple threads.

       b. Term-expanding queries (e.g. PrefixQuery, WildcardQuery,
       etc.)  now work correctly with MultiSearcher, fixing bugs 12619
       and 12667.

       c. Boosting BooleanQuery instances now works, and is supported by the
       query parser (problem reported by Lee Mallabone).  Thus a query
       like "(+foo +bar)^2 +baz" is now supported and equivalent to
       "(+foo^2 +bar^2) +baz".

       d. New method: Query.rewrite(IndexReader).  This permits a
       query to re-write itself as an alternate, more primitive query.
       Most of the term-expanding query classes (PrefixQuery,
       WildcardQuery, etc.) are now implemented using this method.

       e. New method: Searchable.explain(Query q, int doc).  This
       returns an Explanation instance that describes how a particular
       document is scored against a query.  An explanation can be
       displayed as either plain text, with the toString() method, or
       as HTML, with the toHtml() method.  Note that computing an
       explanation is as expensive as executing the query over the
       entire index.  This is intended to be used in developing
       Similarity implementations, and, for good performance, should
       not be displayed with every hit.

       f. Scorer and Weight are public, not package protected.  It is now
       possible for someone to write a Scorer implementation that is
       not in the org.apache.lucene.search package.  This is still
       fairly advanced programming, and I don't expect anyone to do
       this anytime soon, but at least now it is possible.

       g. Added public accessors to the primitive query classes
       (TermQuery, PhraseQuery and BooleanQuery), permitting access to
       their terms and clauses.

     Caution: These are extensive changes and they have not yet been
     tested extensively.  Bug reports are appreciated.
     (cutting)

 23. Added convenience RAMDirectory constructors taking File and String
     arguments, for easy FSDirectory to RAMDirectory conversion.
     (otis)

 24. Added code for manual renaming of files in FSDirectory, since it
     has been reported that java.io.File's renameTo(File) method sometimes
     fails on Windows JVMs.
     (Matt Tucker via otis)

 25. Refactored QueryParser to make it easier for people to extend it.
     Added the ability to automatically lower-case Wildcard terms in
     the QueryParser.
     (Tatu Saloranta via otis)
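
Entry 13 above describes how position increments affect phrase
matching.  The following is a minimal sketch of that idea only; it
does not use Lucene's classes.  The names PositionIncrementSketch,
Tok, positions and matchesPhrase are hypothetical, and placing the
first token at position 0 is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of position increments; not Lucene's API.
final class PositionIncrementSketch {

    /** A term plus its distance from the previous token's position. */
    static final class Tok {
        final String term;
        final int increment;

        Tok(String term, int increment) {
            this.term = term;
            this.increment = increment;
        }
    }

    /** Maps each term to the positions it occupies (running sum of increments). */
    static Map<String, List<Integer>> positions(List<Tok> stream) {
        Map<String, List<Integer>> index = new HashMap<>();
        int pos = -1; // assumption: a first increment of 1 yields position 0
        for (Tok t : stream) {
            pos += t.increment;
            index.computeIfAbsent(t.term, k -> new ArrayList<>()).add(pos);
        }
        return index;
    }

    /** True if the phrase terms (at least one) occur at consecutive positions. */
    static boolean matchesPhrase(Map<String, List<Integer>> index, String... phrase) {
        List<Integer> starts = index.get(phrase[0]);
        if (starts == null) {
            return false;
        }
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < phrase.length && ok; i++) {
                List<Integer> p = index.get(phrase[i]);
                ok = p != null && p.contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "walk" is stacked on "walking" (increment 0); a removed stop word
        // before "dog" leaves a gap (increment 2).
        List<Tok> stream = Arrays.asList(
                new Tok("walking", 1), new Tok("walk", 0),
                new Tok("dog", 2), new Tok("park", 1));
        Map<String, List<Integer>> index = positions(stream);
        System.out.println(matchesPhrase(index, "walking", "dog")); // false
        System.out.println(matchesPhrase(index, "dog", "park"));    // true
    }
}
```

In this sketch the zero-increment stem shares a position with the
original word, so a phrase can match through either form, while the
gap left by the removed stop word prevents "walking dog" from
matching as an exact phrase.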


1.2 RC6

 1. Changed QueryParser.jj to treat "?" as a special character so
    that it can be used as a wildcard in a term. Updated the
    TestWildcard unit test accordingly. (Ralf Hettesheimer via carlson)

1.2 RC5

 1. Renamed build.properties to default.properties and updated
    the BUILD.txt document to describe how to override the
    default.properties settings without having to edit the file. This
    brings the build process closer to Scarab's build process.
    (jon)

 2. Added MultiFieldQueryParser class. (Kelvin Tan, via otis)

 3. Updated "powered by" links. (otis)

 4. Fixed instruction for setting up JavaCC - Bug #7017 (otis)

 5. FSDirectory now throws an exception if it cannot create the
    directory - Bug #6914 (Eugene Gluzberg via otis)

 6. Update MultiSearcher, MultiFieldParse, Constants, DateFilter,
    LowerCaseTokenizer javadoc (otis)

 7. Added fix to avoid NullPointerException in results.jsp
    (Mark Hayes via otis)

 8. Changed Wildcard search to match zero or more characters instead
    of one or more (Lee Mallabone, via otis)

 9. Fixed an offset error in GermanStemFilter - Bug #7412
    (Rodrigo Reyes, via otis)

 10. Added unit tests for wildcard search and DateFilter (otis)

 11. Allow co-existence of indexed and non-indexed fields with the same name
     (cutting/casper, via otis)

 12. Add escape character to query parser.
     (briangoetz)

 13. Applied a patch that ensures that searches that use DateFilter
     don't throw an exception when no matches are found. (David Smiley, via
     otis)

 14. Fixed bugs in DateFilter and WildcardQuery unit tests. (cutting, otis, carlson)
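
Entry 8 above changes "*" to match zero or more characters, and 1.2 RC6
makes "?" usable as a wildcard.  The matching rule can be sketched as
follows, assuming, as in Lucene's wildcard syntax, that "?" stands for
exactly one character.  WildcardSketch and matches are hypothetical
names; this is not the WildcardQuery implementation.

```java
// Hypothetical sketch of wildcard term matching; not Lucene's implementation.
final class WildcardSketch {

    /** '*' matches zero or more characters, '?' matches exactly one. */
    static boolean matches(String pattern, String term) {
        int p = 0;
        int t = 0;
        int starP = -1; // index of the most recent '*' in the pattern
        int starT = -1; // term index at which that '*' was last tried
        while (t < term.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                // First try letting '*' match zero characters.
                starP = p++;
                starT = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == term.charAt(t))) {
                p++;
                t++;
            } else if (starP >= 0) {
                // Backtrack: let the last '*' absorb one more character.
                p = starP + 1;
                t = ++starT;
            } else {
                return false;
            }
        }
        // Any trailing '*' may match the empty string.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("foo*", "foo"));  // true: '*' matches zero characters
        System.out.println(matches("foo*", "food")); // true
        System.out.println(matches("f?o", "fo"));    // false: '?' needs one character
    }
}
```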


1.2 RC4

 1. Updated contributions section of website.
    Added XML Document #3 implementation to Document Section.
    Also added Term Highlighting to Misc Section. (carlson)

 2. Fixed NullPointerException for phrase searches containing
    unindexed terms, introduced in 1.2RC3.  (cutting)

 3. Changed document deletion code to obtain the index write lock,
    enforcing the fact that document addition and deletion cannot be
    performed concurrently.  (cutting)

 4. Various documentation cleanups.  (otis, acoliver)

 5. Updated "powered by" links.  (cutting, jon)

 6. Fixed a bug in the GermanStemmer.  (Bernhard Messer, via otis)

 7. Changed Term and Query to implement Serializable.  (scottganyo)

 8. Fixed IndexWriter so that it never deletes indexes added with
    IndexWriter.addIndexes().
    (cutting)

 9. Upgraded to JUnit 3.7. (otis)

1.2 RC3

 1. IndexWriter: fixed a bug where adding an optimized index to an
    empty index failed.  This was encountered using addIndexes to copy
    a RAMDirectory index to an FSDirectory.

 2. RAMDirectory: fixed a bug where RAMInputStream could not read
    across more than a single buffer boundary.

 3. Fix query parser so it accepts queries with Unicode characters.
    (briangoetz)

 4. Fix query parser so that PrefixQuery is used in preference to
    WildcardQuery when there's only an asterisk at the end of the
    term.  Previously PrefixQuery would never be used.

 5. Fix tests so they compile; fix ant file so it compiles tests
    properly.  Added test cases for Analyzers and PriorityQueue.

 6. Updated demos, added Getting Started documentation. (acoliver)

 7. Added 'contributions' section to website & docs. (carlson)

 8. Removed JavaCC from source distribution for copyright reasons.
    Folks must now download this separately from Metamata in order to
    compile Lucene.  (cutting)

 9. Substantially improved the performance of DateFilter by adding the
    ability to reuse TermDocs objects.  (cutting)

10. Added IndexReader methods:
      public static boolean indexExists(String directory);
      public static boolean indexExists(File directory);
      public static boolean indexExists(Directory directory);
      public static boolean isLocked(Directory directory);
      public static void unlock(Directory directory);
    (cutting, otis)

11. Fixed bugs in GermanAnalyzer (gschwarz)
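
The choice described in entry 4 above can be sketched as a simple
classification of the parsed term.  WildcardKindSketch, Kind and
classify are hypothetical names, and only "*" is treated as a
wildcard here; this is not the query parser's actual code.

```java
// Hypothetical sketch of choosing a query type for a term; not QueryParser's code.
final class WildcardKindSketch {

    enum Kind { TERM, PREFIX, WILDCARD }

    static Kind classify(String term) {
        int first = term.indexOf('*');
        if (first < 0) {
            return Kind.TERM; // no asterisk at all
        }
        // Exactly one asterisk, in last place, after a non-empty prefix.
        if (first == term.length() - 1 && first > 0) {
            return Kind.PREFIX;
        }
        return Kind.WILDCARD;
    }

    public static void main(String[] args) {
        System.out.println(classify("foo*")); // PREFIX
        System.out.println(classify("f*o"));  // WILDCARD
        System.out.println(classify("foo"));  // TERM
    }
}
```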


1.2 RC2
 - added sources to distribution
 - removed broken build scripts and libraries from distribution
 - SegmentsReader: fixed potential race condition
 - FSDirectory: fixed so that getDirectory(xxx,true) correctly
   erases the directory contents, even when the directory
   has already been accessed in this JVM.
 - RangeQuery: Fix issue where an inclusive range query would
   include the nearest term in the index above a nonexistent
   specified upper term.
 - SegmentTermEnum: Fix NullPointerException in clone() method
   when the Term is null.
 - JDK 1.1 compatibility fix: disabled lock files for JDK 1.1,
   since they rely on a feature added in JDK 1.2.
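
The RangeQuery fix above concerns the upper bound of an inclusive
range.  A minimal sketch of the intended behavior, using a
java.util.TreeSet as a stand-in for the index's sorted terms (an
assumption; InclusiveRangeSketch and inclusiveRange are hypothetical
names and this is not RangeQuery's code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch; a TreeSet stands in for the index's sorted term list.
final class InclusiveRangeSketch {

    /** Collects every term t with lower <= t <= upper, in sorted order. */
    static List<String> inclusiveRange(TreeSet<String> terms, String lower, String upper) {
        List<String> out = new ArrayList<>();
        // Start at the first term >= lower and stop as soon as a term exceeds
        // upper, even when upper itself is not present in the index.
        for (String t = terms.ceiling(lower);
                t != null && t.compareTo(upper) <= 0;
                t = terms.higher(t)) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> terms =
                new TreeSet<>(Arrays.asList("apple", "banana", "cherry", "date"));
        // "carrot" is not indexed; "cherry", the nearest term above it, is excluded.
        System.out.println(inclusiveRange(terms, "banana", "carrot")); // [banana]
        System.out.println(inclusiveRange(terms, "banana", "cherry")); // [banana, cherry]
    }
}
```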

1.2 RC1
  - first Apache release
  - packages renamed from com.lucene to org.apache.lucene
  - license switched from LGPL to Apache
  - ant-only build -- no more makefiles
  - addition of lock files--now fully thread & process safe
  - addition of German stemmer
  - MultiSearcher now supports low-level search API
  - added RangeQuery, for term-range searching
  - Analyzers can choose tokenizer based on field name
  - misc bug fixes.

1.01b
 . last Sourceforge release
 . a few bug fixes
 . new Query Parser
 . new prefix query (search for "foo*" matches "food")

1.0

This release fixes a few serious bugs and also includes some
performance optimizations, a stemmer, and a few other minor
enhancements.

0.04

Lucene now includes a grammar-based tokenizer, StandardTokenizer.

The only tokenizer included in the previous release (LetterTokenizer)
identified terms consisting entirely of alphabetic characters.  The
new tokenizer uses a regular-expression grammar to identify more
complex classes of terms, including numbers, acronyms, email
addresses, etc.
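
The difference can be sketched with java.util.regex.  The patterns
below are illustrative assumptions and are not StandardTokenizer's
actual grammar; GrammarTokenizerSketch and tokenize are hypothetical
names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of grammar-based tokenization; the token classes and
// patterns are assumptions, not StandardTokenizer's grammar.
final class GrammarTokenizerSketch {

    private static final String[] TYPES = {"EMAIL", "ACRONYM", "NUM", "ALPHANUM"};

    // Alternatives are tried left to right, so more specific classes come first.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<EMAIL>[A-Za-z0-9._-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+)"
          + "|(?<ACRONYM>(?:[A-Za-z]\\.){2,})"
          + "|(?<NUM>[0-9]+(?:\\.[0-9]+)?)"
          + "|(?<ALPHANUM>[A-Za-z0-9]+)");

    /** Returns tokens as "TYPE:text"; characters matching no class are skipped. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            for (String type : TYPES) {
                if (m.group(type) != null) {
                    tokens.add(type + ":" + m.group());
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Mail bob@example.com about I.B.M. on 3.5 GHz"));
    }
}
```

A tokenizer that accepts only runs of letters would split the email
address, the acronym and the number in this input into fragments,
whereas the grammar keeps each as a single typed token.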

StandardTokenizer serves two purposes:

 1. It is a much better, general-purpose tokenizer for use by
    applications as is.

    The easiest way for applications to start using
    StandardTokenizer is to use StandardAnalyzer.

 2. It provides a good example of grammar-based tokenization.

    If an application has special tokenization requirements, it can
    implement a custom tokenizer by copying the directory containing
    the new tokenizer into the application and modifying it
    accordingly.

0.01

First open source release.

The code has been re-organized into a new package and directory
structure for this release.  It builds OK, but has not been tested
beyond that since the re-organization.
