Regeneration
============

Lucene has a number of machine-generated resources - some of these are
resource (binary) files, others are Java source files that are stored (and
compiled) with the rest of Lucene source code.

If you're reading this, chances are that:

1) you've hit a precommit check error that said you've modified a generated
   resource and some checksums are out of sync.

2) you need to regenerate one (or more) of these resources.

In many cases hitting (1) means you'll have to do (2) so let's discuss
these in order.


Checksum validation errors
--------------------------

LUCENE-9868 introduced a system of storing (and validating) checksums of
generated files so that they are not accidentally modified. This checksum
system will fail the build with a message similar to this one:

Execution failed for task ':lucene:core:generateStandardTokenizerChecksumCheck'.
> Checksums mismatch for derived resources; you might have modified a generated resource (regenerate task: :lucene:core:generateStandardTokenizerIfChanged):
Actual:
  lucene/core/[...]/StandardTokenizerImpl.java=3298326986432483248962398462938649869326

Expected:
  lucene/core/[...]/StandardTokenizerImpl.java=8e33c2698446c1c7a9479796a41316d1932ceda8

The message shows you which resources have mismatched checksums (in this
case StandardTokenizerImpl.java), as well as the *module* where the generated
resource exists and the *task name* that should be used to regenerate this
resource:

:lucene:core:generateStandardTokenizerIfChanged

To resolve the problem, try to:

1) "git diff" the changes that caused the build failure (to see why the
   checksums changed) and then decide whether to update the generated
   resource's template (or whatever it is using to emit the generated
   resource);

2) regenerate the derived resources, possibly saving new checksums.
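
When the failure is buried in a long build log, the regenerate hint can be
pulled out mechanically. The following is a minimal Python sketch and not part
of the Lucene build: the function name "regenerate_tasks" is hypothetical, and
the pattern assumes the hint keeps the "(regenerate task: ...)" wording shown
in the example message.

```python
import re
from typing import List

# Assumption: the hint keeps the "(regenerate task: <path>)" wording from the
# example message; adjust the pattern if the build prints something different.
_HINT = re.compile(r"\(regenerate task: ([^)\s]+)\)")


def regenerate_tasks(build_output: str) -> List[str]:
    """Return each hinted task path once, in order of first appearance."""
    seen = {}
    for match in _HINT.finditer(build_output):
        seen.setdefault(match.group(1), None)
    return list(seen)
```

Each returned path is a task name that can be passed to gradlew as-is.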

If you decide to regenerate, just run the task hinted at in the error message,
for example:

gradlew :lucene:core:generateStandardTokenizerIfChanged

This regenerates all resources the task "generateStandardTokenizer" produces
and updates the corresponding checksums.


Resource regeneration
---------------------

The "convention" task for regenerating all derived resources in a given
module is called "regenerate" and you can apply it to all Lucene
modules by running:

gradlew regenerate

It is typically much wiser to limit the scope of regeneration to only
the module you're working with though:

gradlew -p lucene/analysis/common regenerate

If you're interested in what specific generation tasks are available, see
the task list for the generation group:

gradlew tasks --group generation

or limit the output to a particular module:

gradlew -p lucene/analysis/common tasks --group generation

which displays (at the moment of writing):

generateClassicTokenizer - Regenerate ClassicTokenizerImpl.java (if sources changed)
generateHTMLStripCharFilter - Regenerate HTMLStripCharFilter.java (if sources changed)
generateTlds - Regenerate top-level domain jflex macros and tests (if sources changed)
generateUAX29URLEmailTokenizer - Regenerate UAX29URLEmailTokenizerImpl.java (if sources changed)
generateWikipediaTokenizer - Regenerate WikipediaTokenizerImpl.java (if sources changed)
regenerate - Rerun any code or static data generation tasks.
snowball - Regenerates snowball stemmers.

You may wonder why none of these tasks actually exist in gradle source files
(identically named tasks with a suffix "Internal" exist). The next section
explains why.


Resource checksums, incremental generation and advanced topics
--------------------------------------------------------------

Many resource generation tasks require specific tools (perl, python, bash shell)
and resources that may not be available on all platforms.
In LUCENE-9868 we tried to make resource generation tasks "incremental" so that
they only run if their sources (or outputs) have changed. So if you run the
generic "regenerate" task, many of the actual regeneration sub-tasks will be
"skipped" - you can see this if you run gradle with plain console, for example:

gradlew -p lucene/analysis/common regenerate --console=plain
...
> Task :lucene:analysis:common:generateUnicodeProps
Checksums consistent with sources, skipping task: :lucene:analysis:common:generateUnicodePropsInternal
...

This shouldn't worry you at all - the internal tasks are skipped by wrappers
if the inputs and outputs of the internal task have not changed. If they have
changed, the task is re-run and followed up by other tasks, such as
code-formatting (tidy).

Of course, sometimes you may want to *force* the regeneration task to run,
even if the checksums indicate nothing has changed. This may happen for
several reasons:

- the generation task has outputs but no inputs or the inputs are volatile.
  In this case only the outputs have checksums and the task will be skipped
  if the outputs haven't changed.

- you may want to run the regeneration task just to see that it actually
  runs and produces the same checksums (git diff should be clean). This would
  be a wise periodic sanity check to ensure everything works as expected.
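
The skip decision, including the outputs-only case from the first bullet, can
be pictured with a small Python sketch. This is an illustration only and not
the code Lucene's build uses: SHA-1 is an assumption (suggested by the
40-digit values in the example error message), and the function names and the
plain dictionary standing in for the stored checksums are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def current_checksums(paths: Iterable[Path]) -> Dict[str, str]:
    """Hash every tracked file; a missing file gets a marker instead of a digest."""
    result = {}
    for path in paths:
        if path.is_file():
            # Assumption: SHA-1; the real build may use a different digest.
            result[str(path)] = hashlib.sha1(path.read_bytes()).hexdigest()
        else:
            result[str(path)] = "<missing>"
    return result


def should_skip(stored: Dict[str, str],
                inputs: Iterable[Path],
                outputs: Iterable[Path]) -> bool:
    """Skip only when every input and output still matches its stored checksum."""
    return stored == current_checksums([*inputs, *outputs])
```

A task without inputs would pass an empty input list, so only its outputs
decide whether it is skipped; forcing a run bypasses this comparison entirely.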

If you want to force-run the regeneration, use gradle's "--rerun-tasks" option:

gradlew regenerate --rerun-tasks

Scoping the call to a particular module will also work:

gradlew -p lucene/analysis/common regenerate --rerun-tasks

Scoping the call to a particular task will also work:

gradlew -p lucene/analysis/common generateUnicodeProps --rerun-tasks

You *should not* call the underlying generation task directly; this is
possible but discouraged:

gradlew -p lucene/analysis/common generateUnicodePropsInternal --rerun-tasks

The reason is that some of these generation tasks require follow-up (for
example source code tidying) and, more importantly, the checksums for these
regenerated resources won't be saved (so the next time you run 'check' it'll
fail with checksum mismatches).

Finally, if you do feel like force-regenerating everything, remember to exclude
this monster...

gradlew regenerate -x generateUAX29URLEmailTokenizerInternal --rerun-tasks

and on Windows, exclude snowball regeneration (requires bash):

gradlew regenerate -x generateUAX29URLEmailTokenizerInternal -x snowball --rerun-tasks
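
If the forced run is scripted, the two variants above can be assembled per
platform. The following is a minimal Python sketch under stated assumptions:
the function name "forced_regenerate_args" is hypothetical, platform detection
relies on Python's platform.system(), and the task names and options are
exactly those from the two commands above.

```python
import platform
from typing import List, Optional


def forced_regenerate_args(system: Optional[str] = None) -> List[str]:
    """Build the gradlew arguments for a forced, full regeneration."""
    system = system or platform.system()
    args = ["regenerate", "-x", "generateUAX29URLEmailTokenizerInternal"]
    if system == "Windows":
        # Snowball regeneration requires bash, so leave it out on Windows.
        args += ["-x", "snowball"]
    return args + ["--rerun-tasks"]
```

The resulting list can be appended to the gradlew launcher path and handed to
subprocess.run; the launcher path itself is left to the caller.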