Regeneration
============

Lucene has a number of machine-generated resources - some of these are
resource (binary) files, others are Java source files that are stored
(and compiled) with the rest of the Lucene source code.

If you're reading this, chances are that:

1) you've hit a precommit check error that said you've modified a generated
   resource and some checksums are out of sync.

2) you need to regenerate one (or more) of these resources.

In many cases hitting (1) means you'll have to do (2) so let's discuss
these in order.


Checksum validation errors
--------------------------

LUCENE-9868 introduced a system of storing (and validating) checksums of
generated files so that they are not accidentally modified. This checksum
system will fail the build with a message similar to this one:

Execution failed for task ':lucene:core:generateStandardTokenizerChecksumCheck'.
> Checksums mismatch for derived resources; you might have modified a generated resource (regenerate task: :lucene:core:generateStandardTokenizerIfChanged):
  Actual:
    lucene/core/[...]/StandardTokenizerImpl.java=3298326986432483248962398462938649869326

  Expected:
    lucene/core/[...]/StandardTokenizerImpl.java=8e33c2698446c1c7a9479796a41316d1932ceda8

The message shows you which resources have mismatches on checksums (in this case
StandardTokenizerImpl.java) but also the *module* where the generated
resource exists and the *task name* that should be used to regenerate this resource:

:lucene:core:generateStandardTokenizerIfChanged

To resolve the problem, try to:

1) "git diff" the changes that caused the build failure (to see why the checksums
changed) and then decide whether to update the generated resource's template (or whatever
it is using to emit the generated resource);

2) regenerate the derived resources, possibly saving new checksums.
If you decide to regenerate, just run the task hinted at in the error
message, for example:

gradlew :lucene:core:generateStandardTokenizerIfChanged

This regenerates all resources the task "generateStandardTokenizer" produces
and updates the corresponding checksums.


Resource regeneration
---------------------

The "convention" task for regenerating all derived resources in a given
module is called "regenerate" and you can apply it to all Lucene modules
by running:

gradlew regenerate

It is typically much wiser to limit the scope of regeneration to only
the module you're working with though:

gradlew -p lucene/analysis/common regenerate

If you're interested in what specific generation tasks are available, see
the task list for the generation group:

gradlew tasks --group generation

or limit the output to a particular module:

gradlew -p lucene/analysis/common tasks --group generation

which displays (at the moment of writing):

generateClassicTokenizer - Regenerate ClassicTokenizerImpl.java (if sources changed)
generateHTMLStripCharFilter - Regenerate HTMLStripCharFilter.java (if sources changed)
generateTlds - Regenerate top-level domain jflex macros and tests (if sources changed)
generateUAX29URLEmailTokenizer - Regenerate UAX29URLEmailTokenizerImpl.java (if sources changed)
generateWikipediaTokenizer - Regenerate WikipediaTokenizerImpl.java (if sources changed)
regenerate - Rerun any code or static data generation tasks.
snowball - Regenerates snowball stemmers.

You may wonder why none of these tasks actually exist in gradle source files (identically
named tasks with a suffix "Internal" exist).


Resource checksums, incremental generation and advanced topics
--------------------------------------------------------------

Many resource generation tasks require specific tools (perl, python, bash shell)
and resources that may not be available on all platforms.
In LUCENE-9868 we tried to make resource generation tasks "incremental" so
that they only run if their sources (or outputs) have changed. So if you run
the generic "regenerate" task, many of the actual regeneration sub-tasks will
be "skipped" - you can see this if you run gradle with plain console, for example:

gradlew -p lucene/analysis/common regenerate --console=plain

...
> Task :lucene:analysis:common:generateUnicodeProps
Checksums consistent with sources, skipping task: :lucene:analysis:common:generateUnicodePropsInternal
...

This shouldn't worry you at all - the internal tasks are skipped by wrappers
if the inputs and outputs of the internal task have not changed. If they have changed,
the task is re-run and followed up by other tasks, such as code-formatting (tidy).

Of course, sometimes you may want to *force* the regeneration task to run, even if the
checksums indicate nothing has changed. This may happen for several reasons:

- the generation task has outputs but no inputs or the inputs are volatile. In this case
only the outputs have checksums and the task will be skipped if the outputs haven't changed.

- you may want to run the regeneration task just to see that it actually runs and produces
the same checksums (git diff should be clean). This would be a wise periodic sanity check
to ensure everything works as expected.
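
The skip-if-unchanged behaviour described above can be sketched in a few
lines of Python. This is only an illustration of the idea, not Lucene's
actual gradle implementation: the function names (sha1_of, current_checksums,
run_if_changed) are hypothetical, and SHA-1 is an assumption based on the
40-hex-digit checksums shown in the error message earlier in this document.

```python
import hashlib
from pathlib import Path


def sha1_of(path):
    """Return the SHA-1 hex digest of a file, or None if it is missing."""
    p = Path(path)
    if not p.is_file():
        return None
    return hashlib.sha1(p.read_bytes()).hexdigest()


def current_checksums(paths):
    """Map each path (as a string) to its current checksum."""
    return {str(p): sha1_of(p) for p in paths}


def run_if_changed(inputs, outputs, saved, generate):
    """Hypothetical wrapper: call generate() only if checksums differ.

    Returns (ran, checksums): whether generate() was called, and the
    checksums that should be saved for the next run.
    """
    tracked = list(inputs) + list(outputs)
    if saved and current_checksums(tracked) == saved:
        # Checksums consistent with sources: skip the generator.
        return False, saved
    generate()
    # Recompute after generation so the new outputs are recorded.
    return True, current_checksums(tracked)
```

With an empty "inputs" list only the outputs are compared, so in this sketch
the generator is skipped until an output file changes, which matches the
first case in the list above.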

If you want to force-run the regeneration, use gradle's "--rerun-tasks" option:

gradlew regenerate --rerun-tasks

Scoping the call to a particular module will also work:

gradlew -p lucene/analysis/common regenerate --rerun-tasks

Scoping the call to a particular task will also work:

gradlew -p lucene/analysis/common generateUnicodeProps --rerun-tasks

You *should not* call the underlying generation task directly; this is possible
but discouraged:

gradlew -p lucene/analysis/common generateUnicodePropsInternal --rerun-tasks

The reason is that some of these generation tasks require follow-up (for example
source code tidying) and, more importantly, the checksums for these
regenerated resources won't be saved (so the next time you run 'check' it'll fail
with checksum mismatches).

Finally, if you do feel like force-regenerating everything, remember to exclude this
monster...

gradlew regenerate -x generateUAX29URLEmailTokenizerInternal --rerun-tasks

and on Windows, exclude snowball regeneration (requires bash):

gradlew regenerate -x generateUAX29URLEmailTokenizerInternal -x snowball --rerun-tasks