Regeneration
============

Lucene has a number of machine-generated resources - some of these are
resource (binary) files, others are Java source files that are stored
(and compiled) with the rest of Lucene source code.

If you're reading this, chances are that:

1) you've hit a precommit check error that said you've modified a generated
   resource and some checksums are out of sync.

2) you need to regenerate one (or more) of these resources.

In many cases hitting (1) means you'll have to do (2) so let's discuss
these in order.


Checksum validation errors
--------------------------

LUCENE-9868 introduced a system of storing (and validating) checksums of
generated files so that they are not accidentally modified. This checksum
system will fail the build with a message similar to this one:

Execution failed for task ':lucene:core:generateStandardTokenizerChecksumCheck'.
> Checksums mismatch for derived resources; you might have modified a generated resource (regenerate task: :lucene:core:generateStandardTokenizerIfChanged):
  Actual:
    lucene/core/[...]/StandardTokenizerImpl.java=3298326986432483248962398462938649869326

  Expected:
    lucene/core/[...]/StandardTokenizerImpl.java=8e33c2698446c1c7a9479796a41316d1932ceda8

The message shows you which resources have mismatches on checksums (in this case
StandardTokenizerImpl.java) but also the *module* where the generated
resource exists and the *task name* that should be used to regenerate this resource:

:lucene:core:generateStandardTokenizerIfChanged

To resolve the problem, try to:

1) "git diff" the changes that caused the build failure (to see why the checksums
changed) and then decide whether to update the generated resource's template (or whatever
it is using to emit the generated resource);

2) regenerate the derived resources, possibly saving new checksums. If you decide to
regenerate, just run the task hinted at in the error message, for example:

gradlew :lucene:core:generateStandardTokenizerIfChanged

This regenerates all resources the task "generateStandardTokenizer" produces
and updates the corresponding checksums.
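
If the build output has been saved to a file, the regenerate task can be
pulled out of the failure message mechanically. The following is only a
sketch: the function name and the log file name are hypothetical, and the
pattern assumes the "(regenerate task: ...)" wording shown in the message
above stays unchanged.

```shell
#!/bin/sh
# Print the regenerate task named in a checksum-mismatch message.
# $1: path to a file holding saved build output (file name is up to you).
extract_regenerate_task() {
  # Relies on the "(regenerate task: <name>)" wording of the error message;
  # only the first match is printed.
  sed -n 's/.*(regenerate task: \([^)]*\)).*/\1/p' "$1" | head -n 1
}

# Possible usage (build.log is a hypothetical file name):
#   gradlew check > build.log 2>&1
#   gradlew "$(extract_regenerate_task build.log)"
```

Review the "git diff" first, as described in step (1), before running the
extracted task.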


Resource regeneration
---------------------

The "convention" task for regenerating all derived resources in a given
module is called "regenerate" and you can apply it to all Lucene modules
by running:

gradlew regenerate

It is typically much wiser to limit the scope of regeneration to only
the module you're working with though:

gradlew -p lucene/analysis/common regenerate

If you're interested in what specific generation tasks are available, see
the task list for the generation group:

gradlew tasks --group generation

or limit the output to a particular module:

gradlew -p lucene/analysis/common tasks --group generation

which displays (at the moment of writing):

generateClassicTokenizer - Regenerate ClassicTokenizerImpl.java (if sources changed)
generateHTMLStripCharFilter - Regenerate HTMLStripCharFilter.java (if sources changed)
generateTlds - Regenerate top-level domain jflex macros and tests (if sources changed)
generateUAX29URLEmailTokenizer - Regenerate UAX29URLEmailTokenizerImpl.java (if sources changed)
generateWikipediaTokenizer - Regenerate WikipediaTokenizerImpl.java (if sources changed)
regenerate - Rerun any code or static data generation tasks.
snowball - Regenerates snowball stemmers.

You may wonder why none of these tasks actually exist in gradle source files (only
identically named tasks with an "Internal" suffix do). The tasks listed above are
wrappers around those internal tasks, as explained in the next section.


Resource checksums, incremental generation and advanced topics
--------------------------------------------------------------

Many resource generation tasks require specific tools (perl, python, bash shell)
and resources that may not be available on all platforms. In LUCENE-9868 we tried
to make resource generation tasks "incremental" so that they only run if their
sources (or outputs) have changed. So if you run the generic "regenerate" task, many of the
actual regeneration sub-tasks will be "skipped" - you can see this if you run gradle with
plain console, for example:

gradlew -p lucene/analysis/common regenerate --console=plain

...
> Task :lucene:analysis:common:generateUnicodeProps
Checksums consistent with sources, skipping task: :lucene:analysis:common:generateUnicodePropsInternal
...

This shouldn't worry you at all - the internal tasks are skipped by wrappers
if the inputs and outputs of the internal task have not changed. If they have changed,
the task is re-run and followed up by other tasks, such as code-formatting (tidy).

Of course, sometimes you may want to *force* the regeneration task to run, even if the
checksums indicate nothing has changed.
This can happen for several reasons:

- the generation task has outputs but no inputs or the inputs are volatile. In this case
only the outputs have checksums and the task will be skipped if the outputs haven't changed.

- you may want to run the regeneration task just to see that it actually runs and produces
the same checksums (git diff should be clean). This would be a wise periodic sanity check
to ensure everything works as expected.

If you want to force-run the regeneration, use gradle's "--rerun-tasks" option:

gradlew regenerate --rerun-tasks

Scoping the call to a particular module will also work:

gradlew -p lucene/analysis/common regenerate --rerun-tasks

Scoping the call to a particular task will also work:

gradlew -p lucene/analysis/common generateUnicodeProps --rerun-tasks

You *should not* call the underlying generation task directly; this is possible
but discouraged:

gradlew -p lucene/analysis/common generateUnicodePropsInternal --rerun-tasks

The reason is that some of these generation tasks require follow-up (for example
source code tidying) and, more importantly, the checksums for these
regenerated resources won't be saved (so the next time you run 'check' it'll fail
with checksum mismatches).

Finally, if you do feel like force-regenerating everything, remember to exclude this
monster...

gradlew regenerate -x generateUAX29URLEmailTokenizerInternal --rerun-tasks

and on Windows, exclude snowball regeneration (requires bash):

gradlew regenerate -x generateUAX29URLEmailTokenizerInternal -x snowball --rerun-tasks
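
For a periodic sanity check, the two command lines above can be produced by
one small helper and followed by a "git diff" to confirm that nothing
changed. This is a sketch only: the function name and its "windows" argument
are hypothetical, and it assumes a POSIX shell is available to drive gradlew.

```shell
#!/bin/sh
# Print the gradlew arguments for a forced, full regeneration.
# $1: "windows" to also skip the snowball task; any other value keeps it.
forced_regenerate_args() {
  # Always exclude the very slow UAX29URLEmailTokenizer generation.
  args="regenerate -x generateUAX29URLEmailTokenizerInternal"
  if [ "$1" = "windows" ]; then
    # Snowball regeneration requires bash, so skip it on Windows.
    args="$args -x snowball"
  fi
  printf '%s\n' "$args --rerun-tasks"
}

# Possible usage (word splitting of the output is intended here):
#   gradlew $(forced_regenerate_args linux) && git diff --exit-code
```

"git diff --exit-code" returns a non-zero status when the working tree
differs, so the combined command fails if regeneration changed any file.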