make word count computation part of the import

Accurate word counts are now essential when using the ICU tokenizer and do not hurt for the legacy one. This adds about an hour to the import time.
2021-10-26 09:37:57 +02:00
parent d7267c1603
commit 9934421442
2 changed files with 3 additions and 14 deletions
--- a/nominatim/clicmd/setup.py
+++ b/nominatim/clicmd/setup.py
@@ -125,6 +125,8 @@ class SetupAll:
                freeze.drop_update_tables(conn)
        tokenizer.finalize_import(args.config)

+        LOG.warning('Recompute word counts')
+        tokenizer.update_statistics()

        webdir = args.project_dir / 'website'
        LOG.warning('Setup website at %s', webdir)
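To illustrate what "recompute word counts" means, here is a minimal sketch that counts how many places contain each word. In search systems, per-word counts like these are typically used to separate rare, selective terms from very common ones. The function `compute_word_counts` and the whitespace-splitting normalization are hypothetical stand-ins for illustration only. They are not how `tokenizer.update_statistics()` is implemented; the real tokenizers compute their statistics in the database.

```python
from collections import Counter


def compute_word_counts(place_names):
    """Count in how many places each normalized word occurs.

    Hypothetical stand-in for tokenizer.update_statistics(); the real
    tokenizers compute these statistics in the database, not in Python.
    """
    counts = Counter()
    for name in place_names:
        # Simplified normalization (an assumption): lowercase and split on
        # whitespace. A set ensures a word repeated within one name counts once.
        counts.update(set(name.lower().split()))
    return counts


names = ["Main Street", "Main Square", "Church Street", "Main Main Road"]
word_counts = compute_word_counts(names)
print(word_counts["main"])    # 3
print(word_counts["street"])  # 2
```

Because the statistics describe the finished data set, the diff runs the step only after `tokenizer.finalize_import()` has completed.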