make ICU the default tokenizer

Sarah Hoffmann
2022-05-10 12:02:50 +02:00
parent ed6fda6968
commit 4002bee0c1
4 changed files with 29 additions and 4 deletions


@@ -44,7 +44,7 @@ endif()
 set(BUILD_IMPORTER on CACHE BOOL "Build everything for importing/updating the database")
 set(BUILD_API on CACHE BOOL "Build everything for the API server")
-set(BUILD_MODULE on CACHE BOOL "Build PostgreSQL module")
+set(BUILD_MODULE off CACHE BOOL "Build PostgreSQL module for legacy tokenizer")
 set(BUILD_TESTS on CACHE BOOL "Build test suite")
 set(BUILD_DOCS on CACHE BOOL "Build documentation")
 set(BUILD_MANPAGE on CACHE BOOL "Build Manual Page")


@@ -158,6 +158,15 @@ make
 sudo make install
 ```
+!!! warning
+    The default installation no longer compiles the PostgreSQL module that
+    is needed for the legacy tokenizer from older Nominatim versions. If you
+    are upgrading an older database or want to run the
+    [legacy tokenizer](../customize/Tokenizers.md#legacy-tokenizer) for
+    some other reason, you need to enable the PostgreSQL module via
+    cmake: `cmake -DBUILD_MODULE=on ../Nominatim`
 Nominatim installs itself into `/usr/local` per default. To choose a different
 installation directory add `-DCMAKE_INSTALL_PREFIX=<install root>` to the
 cmake command. Make sure that the `bin` directory is available in your path


@@ -19,7 +19,22 @@ they can be configured.
 The legacy tokenizer implements the analysis algorithms of older Nominatim
 versions. It uses a special Postgresql module to normalize names and queries.
-This tokenizer is currently the default.
+This tokenizer is automatically installed and used when upgrading an older
+database. It should not be used for new installations anymore.
+### Compiling the PostgreSQL module
+The tokenizer needs a special C module for PostgreSQL which is not compiled
+by default. If you need the legacy tokenizer, compile Nominatim as follows:
+```
+mkdir build
+cd build
+cmake -DBUILD_MODULE=on ../Nominatim
+make
+```
+### Enabling the tokenizer
 To enable the tokenizer add the following line to your project configuration:
@@ -47,6 +62,7 @@ normalization functions are hard-coded.
 The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to
 normalize names and queries. It also offers configurable decomposition and
 abbreviation handling.
+This tokenizer is currently the default.
 To enable the tokenizer add the following line to your project configuration:
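
Because this commit changes the default from `legacy` to `icu`, a project that still needs the legacy tokenizer has to select it explicitly before the import, since the tokenizer cannot be changed afterwards without a reimport. A minimal sketch, assuming the project configuration is a `.env` file in the project directory; the temporary directory used here is purely hypothetical and stands in for a real project directory:

```shell
# Hypothetical project directory for illustration; use your real one instead.
PROJECT_DIR="$(mktemp -d)"

# Select the legacy tokenizer explicitly, since "icu" is now the default.
printf 'NOMINATIM_TOKENIZER=legacy\n' > "$PROJECT_DIR/.env"

# Read the setting back to confirm what the project is configured to use.
TOKENIZER="$(grep '^NOMINATIM_TOKENIZER=' "$PROJECT_DIR/.env" | cut -d= -f2)"
echo "tokenizer: $TOKENIZER"
```

The legacy tokenizer additionally needs the PostgreSQL module, so the build must have been configured with `-DBUILD_MODULE=on`.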


@@ -21,8 +21,8 @@ NOMINATIM_DATABASE_MODULE_PATH=
 # Tokenizer used for normalizing and parsing queries and names.
 # The tokenizer is set up during import and cannot be changed afterwards
 # without a reimport.
-# Currently available tokenizers: legacy
-NOMINATIM_TOKENIZER="legacy"
+# Currently available tokenizers: icu, legacy
+NOMINATIM_TOKENIZER="icu"
 # Number of occurrences of a word before it is considered frequent.
 # Similar to the concept of stop words. Frequent partial words get ignored