make word count computation part of the import

Accurate word counts are now essential when using
the ICU tokenizer and don't hurt for the legacy one.

Adds about an hour to the import time.
Sarah Hoffmann
2021-10-26 09:37:57 +02:00
parent d7267c1603
commit 9934421442
2 changed files with 3 additions and 14 deletions

@@ -125,6 +125,8 @@ class SetupAll:
 freeze.drop_update_tables(conn)
 tokenizer.finalize_import(args.config)
+LOG.warning('Recompute word counts')
+tokenizer.update_statistics()
 webdir = args.project_dir / 'website'
 LOG.warning('Setup website at %s', webdir)
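
The order of calls in the hunk above can be illustrated with a minimal, self-contained sketch. `RecordingTokenizer` and `finish_import` are hypothetical names used only for illustration and are not part of the project. The stub records which hooks ran, so the sequence (finalize the import first, then recompute word counts) can be checked without a database.

```python
import logging

LOG = logging.getLogger(__name__)


class RecordingTokenizer:
    """Hypothetical stand-in for a tokenizer; it only records which hooks ran."""

    def __init__(self):
        self.calls = []

    def finalize_import(self, config):
        # A real tokenizer would finish its import work here.
        self.calls.append('finalize_import')

    def update_statistics(self):
        # A real tokenizer would recompute its word counts here.
        self.calls.append('update_statistics')


def finish_import(tokenizer, config):
    """Run the closing tokenizer steps in the order the hunk shows."""
    tokenizer.finalize_import(config)
    LOG.warning('Recompute word counts')
    tokenizer.update_statistics()
    return tokenizer.calls


# config is unused by the stub, so None is enough for this sketch.
calls = finish_import(RecordingTokenizer(), config=None)
```

Running the statistics step unconditionally at the end of the import matches the commit message: the word counts are always present once the import finishes, at the cost of the extra import time.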