Regeneration
============

Lucene has a number of machine-generated resources - some of these are
resource (binary) files, others are Java source files that are stored
(and compiled) with the rest of Lucene source code.

If you're reading this, chances are that:

1) you've hit a precommit check error that said you've modified a generated
   resource and some checksums are out of sync.

2) you need to regenerate one (or more) of these resources.

In many cases hitting (1) means you'll have to do (2) so let's discuss
these in order.


Checksum validation errors
--------------------------

LUCENE-9868 introduced a system of storing (and validating) checksums of
generated files so that they are not accidentally modified. This checksum
system will fail the build with a message similar to this one:

Execution failed for task ':lucene:core:generateStandardTokenizerChecksumCheck'.
> Checksums mismatch for derived resources; you might have modified a generated resource (regenerate task: :lucene:core:generateStandardTokenizerIfChanged):
  Actual:
    lucene/core/[...]/StandardTokenizerImpl.java=3298326986432483248962398462938649869326

  Expected:
    lucene/core/[...]/StandardTokenizerImpl.java=8e33c2698446c1c7a9479796a41316d1932ceda8

The message shows you which resources have checksum mismatches (in this case
StandardTokenizerImpl.java), as well as the *module* where the generated
resource exists and the *task name* that should be used to regenerate this resource:

:lucene:core:generateStandardTokenizerIfChanged

To resolve the problem, try to:

1) "git diff" the changes that caused the build failure (to see why the checksums
changed) and then decide whether to update the generated resource's template (or whatever
it is using to emit the generated resource);

2) regenerate the derived resources, possibly saving new checksums. If you decide to
regenerate, just run the task hinted at in the error message, for example:

gradlew :lucene:core:generateStandardTokenizerIfChanged

This regenerates all resources the task "generateStandardTokenizer" produces
and updates the corresponding checksums.
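
If the failure is buried in a long build log, the regeneration task name can
be pulled out of the message mechanically. The following is only a sketch: the
function name regenerate_task_from_log is hypothetical, and it assumes the
message keeps the "(regenerate task: ...)" shape shown above.

```shell
# Hypothetical helper (not part of the Lucene build): reads build output on
# stdin and prints the task named in any "(regenerate task: ...)" hint.
# Assumes the message format shown in the example failure above.
regenerate_task_from_log() {
  sed -n 's/.*(regenerate task: \([^)]*\)).*/\1/p'
}
```

Assuming gradlew is started as ./gradlew from the repository root, a typical
use would be to pipe the output of "./gradlew check 2>&1" into the function
and then run the task it prints.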


Resource regeneration
---------------------

The "convention" task for regenerating all derived resources in a given
module is called "regenerate" and you can apply it to all Lucene modules
by running:

gradlew regenerate

It is typically much wiser to limit the scope of regeneration to only
the module you're working with though:

gradlew -p lucene/analysis/common regenerate

If you're interested in what specific generation tasks are available, see
the task list for the generation group:

gradlew tasks --group generation

or limit the output to a particular module:

gradlew -p lucene/analysis/common tasks --group generation

which displays (at the moment of writing):

generateClassicTokenizer - Regenerate ClassicTokenizerImpl.java (if sources changed)
generateHTMLStripCharFilter - Regenerate HTMLStripCharFilter.java (if sources changed)
generateTlds - Regenerate top-level domain jflex macros and tests (if sources changed)
generateUAX29URLEmailTokenizer - Regenerate UAX29URLEmailTokenizerImpl.java (if sources changed)
generateWikipediaTokenizer - Regenerate WikipediaTokenizerImpl.java (if sources changed)
regenerate - Rerun any code or static data generation tasks.
snowball - Regenerates snowball stemmers.

You may wonder why none of these tasks actually exist in gradle source files. What
exists there are identically named tasks with an "Internal" suffix; the tasks listed
above are wrappers around those internal tasks.


Resource checksums, incremental generation and advanced topics
--------------------------------------------------------------

Many resource generation tasks require specific tools (perl, python, bash shell)
and resources that may not be available on all platforms. In LUCENE-9868 we tried
to make resource generation tasks "incremental" so that they only run if their
sources (or outputs) have changed. So if you run the generic "regenerate" task, many of the
actual regeneration sub-tasks will be "skipped" - you can see this if you run gradle with
the plain console, for example:

gradlew -p lucene/analysis/common regenerate --console=plain

...
> Task :lucene:analysis:common:generateUnicodeProps
Checksums consistent with sources, skipping task: :lucene:analysis:common:generateUnicodePropsInternal
...

This shouldn't worry you at all - the internal tasks are skipped by wrappers
if the inputs and outputs of the internal task have not changed. If they have changed,
the task is re-run and followed up by other tasks, such as code-formatting (tidy).
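
To see at a glance which internal tasks were skipped, the plain console output
can be filtered. This is a minimal sketch: the function name
skipped_internal_tasks is hypothetical, and it assumes the "Checksums
consistent with sources, skipping task:" line keeps the wording shown above.

```shell
# Hypothetical helper (not part of the Lucene build): reads plain console
# output on stdin and prints one skipped internal task path per line.
# Assumes the log line wording shown in the example output above.
skipped_internal_tasks() {
  sed -n 's/^Checksums consistent with sources, skipping task: //p'
}
```

Assuming gradlew is started as ./gradlew from the repository root, the output
of "./gradlew -p lucene/analysis/common regenerate --console=plain" can be
piped into the function.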

Of course, sometimes you may want to *force* the regeneration task to run, even if the
checksums indicate nothing has changed. This may happen for several reasons:

- the generation task has outputs but no inputs or the inputs are volatile. In this case
only the outputs have checksums and the task will be skipped if the outputs haven't changed.

- you may want to run the regeneration task just to see that it actually runs and produces
the same checksums (git diff should be clean). This would be a wise periodic sanity check
to ensure everything works as expected.

If you want to force-run the regeneration, use gradle's "--rerun-tasks" option:

gradlew regenerate --rerun-tasks

Scoping the call to a particular module will also work:

gradlew -p lucene/analysis/common regenerate --rerun-tasks

Scoping the call to a particular task will also work:

gradlew -p lucene/analysis/common generateUnicodeProps --rerun-tasks
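
The periodic sanity check mentioned earlier (force a regeneration, then confirm
that git sees no changes) could be scripted roughly as follows. This is only a
sketch: the helper name working_tree_is_clean is hypothetical, and it assumes a
POSIX shell, gradlew started as ./gradlew from the repository root, and a
working tree with no uncommitted changes before the regeneration starts.

```shell
# Hypothetical helper (not part of the Lucene build).
# $1: the output of `git status --porcelain`; an empty string means the
# regeneration left no modified or untracked files behind.
working_tree_is_clean() {
  [ -z "$1" ]
}

# Intended use, left commented out so that sourcing this file runs nothing:
#   ./gradlew -p lucene/analysis/common regenerate --rerun-tasks
#   working_tree_is_clean "$(git status --porcelain)" || git diff --stat
```

If the check fails, "git diff --stat" lists the regenerated files that differ
from what is committed, which is the place to start investigating.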

You *should not* call the underlying generation task directly; this is possible
but discouraged:

gradlew -p lucene/analysis/common generateUnicodePropsInternal --rerun-tasks

The reason is that some of these generation tasks require follow-up (for example
source code tidying) and, more importantly, the checksums for these
regenerated resources won't be saved (so the next time you run 'check' it'll fail
with checksum mismatches).

Finally, if you do feel like force-regenerating everything, remember to exclude this
monster...

gradlew regenerate -x generateUAX29URLEmailTokenizerInternal --rerun-tasks

and on Windows, exclude snowball regeneration (requires bash):

gradlew regenerate -x generateUAX29URLEmailTokenizerInternal -x snowball --rerun-tasks