make word count computation part of the import

Accurate word counts are now essential when using the ICU tokenizer and do no harm with the legacy one. This adds about an hour to the import time.
2026-02-16 15:47:58 +00:00 · 2021-10-26 09:37:57 +02:00
parent d7267c1603
commit 9934421442
2 changed files with 3 additions and 14 deletions
--- a/docs/admin/Import.md
+++ b/docs/admin/Import.md
@@ -271,20 +271,7 @@ reverse query, e.g. `http://localhost:8088/reverse.php?lat=27.1750090510034&lon=
 To run Nominatim via webservers like Apache or nginx, please read the
 [Deployment chapter](Deployment.md).

-## Tuning the database
-
-Accurate word frequency information for search terms helps PostgreSQL's query
-planner to make the right decisions. Recomputing them can improve the performance
-of forward geocoding in particular under high load. To recompute word counts run:
-
-```sh
-nominatim refresh --word-counts
-```
-
-This will take a couple of hours for a full planet installation. You can
-also defer that step to a later point in time when you realise that
-performance becomes an issue. Just make sure that updates are stopped before
-running this function.
+## Adding search through category phrases

 If you want to be able to search for places by their type through
 [special phrases](https://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases)
--- a/nominatim/clicmd/setup.py
+++ b/nominatim/clicmd/setup.py
@@ -125,6 +125,8 @@ class SetupAll:
                freeze.drop_update_tables(conn)
        tokenizer.finalize_import(args.config)

+        LOG.warning('Recompute word counts')
+        tokenizer.update_statistics()

        webdir = args.project_dir / 'website'
        LOG.warning('Setup website at %s', webdir)
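
The effect of the `setup.py` hunk is that word counts are recomputed once at the end of every import, right after the tokenizer finalizes it. A minimal sketch of that call order follows, assuming a stand-in tokenizer. `StubTokenizer` and `finish_import` are hypothetical names used only for illustration and are not part of Nominatim; only the method names `finalize_import` and `update_statistics` are taken from the diff.

```python
import logging

LOG = logging.getLogger("import_sketch")


class StubTokenizer:
    """Hypothetical stand-in for a tokenizer; it only records which calls were made."""

    def __init__(self):
        self.calls = []

    def finalize_import(self, config):
        # Assumption: the real method does post-import housekeeping.
        self.calls.append("finalize_import")

    def update_statistics(self):
        # Assumption: the real method recomputes word counts.
        self.calls.append("update_statistics")


def finish_import(tokenizer, config):
    """Run the two tokenizer steps in the order shown in the hunk."""
    tokenizer.finalize_import(config)
    LOG.warning('Recompute word counts')
    tokenizer.update_statistics()
    return tokenizer.calls


calls = finish_import(StubTokenizer(), config=None)
```

Running the statistics step unconditionally trades roughly an hour of import time, per the commit message, for not needing a separate manual `nominatim refresh --word-counts` run after the initial import.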