make word count computation part of the import

Accurate word counts are now essential when using
the ICU tokenizer and don't hurt for the legacy one.

Adds about an hour to the import time.
Sarah Hoffmann
2021-10-26 09:37:57 +02:00
parent d7267c1603
commit 9934421442
2 changed files with 3 additions and 14 deletions


@@ -271,20 +271,7 @@ reverse query, e.g. `http://localhost:8088/reverse.php?lat=27.1750090510034&lon=
To run Nominatim via webservers like Apache or nginx, please read the
[Deployment chapter](Deployment.md).
-## Tuning the database
-Accurate word frequency information for search terms helps PostgreSQL's query
-planner to make the right decisions. Recomputing them can improve the performance
-of forward geocoding in particular under high load. To recompute word counts run:
-```sh
-nominatim refresh --word-counts
-```
-This will take a couple of hours for a full planet installation. You can
-also defer that step to a later point in time when you realise that
-performance becomes an issue. Just make sure that updates are stopped before
-running this function.
## Adding search through category phrases
If you want to be able to search for places by their type through
[special phrases](https://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases)


@@ -125,6 +125,8 @@ class SetupAll:
freeze.drop_update_tables(conn)
tokenizer.finalize_import(args.config)
+LOG.warning('Recompute word counts')
+tokenizer.update_statistics()
webdir = args.project_dir / 'website'
LOG.warning('Setup website at %s', webdir)
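
The ordering this hunk introduces can be illustrated with a minimal, self-contained sketch. It is not the actual Nominatim code: `StubTokenizer` and `finish_import` are hypothetical names used only for illustration, and the stub merely records which methods were called. Only the method names `finalize_import` and `update_statistics` and the log message are taken from the diff.

```python
import logging

LOG = logging.getLogger(__name__)


class StubTokenizer:
    """Hypothetical stand-in for a tokenizer; it only records method calls."""

    def __init__(self):
        self.calls = []

    def finalize_import(self, config):
        # The real tokenizer would finish its import-time setup here.
        self.calls.append('finalize_import')

    def update_statistics(self):
        # The real tokenizer would recompute word counts here.
        self.calls.append('update_statistics')


def finish_import(tokenizer, config):
    """Mirror the order shown in the hunk: finalize first, then word counts."""
    tokenizer.finalize_import(config)

    LOG.warning('Recompute word counts')
    tokenizer.update_statistics()
    return tokenizer.calls


# config is unused by the stub, so None is enough for the illustration.
calls = finish_import(StubTokenizer(), config=None)
```

Under these assumptions, the word-count step always runs as part of the import instead of being left to a later manual `nominatim refresh --word-counts`.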