Mirror of https://github.com/osm-search/Nominatim.git (synced 2026-03-13 06:14:07 +00:00)
make ICU the default tokenizer
@@ -44,7 +44,7 @@ endif()

 set(BUILD_IMPORTER on CACHE BOOL "Build everything for importing/updating the database")
 set(BUILD_API on CACHE BOOL "Build everything for the API server")
-set(BUILD_MODULE on CACHE BOOL "Build PostgreSQL module")
+set(BUILD_MODULE off CACHE BOOL "Build PostgreSQL module for legacy tokenizer")
 set(BUILD_TESTS on CACHE BOOL "Build test suite")
 set(BUILD_DOCS on CACHE BOOL "Build documentation")
 set(BUILD_MANPAGE on CACHE BOOL "Build Manual Page")
@@ -158,6 +158,15 @@ make
 sudo make install
 ```

+!!! warning
+    The default installation no longer compiles the PostgreSQL module that
+    is needed for the legacy tokenizer from older Nominatim versions. If you
+    are upgrading an older database or want to run the
+    [legacy tokenizer](../customize/Tokenizers.md#legacy-tokenizer) for
+    some other reason, you need to enable the PostgreSQL module via
+    cmake: `cmake -DBUILD_MODULE=on ../Nominatim`
+
+
 Nominatim installs itself into `/usr/local` per default. To choose a different
 installation directory add `-DCMAKE_INSTALL_PREFIX=<install root>` to the
 cmake command. Make sure that the `bin` directory is available in your path
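When installs are scripted, the warning's rule can be expressed as a small helper: build the PostgreSQL module only when the legacy tokenizer is needed. This is a minimal sketch, not part of Nominatim. `cmake_command` is a hypothetical helper, and the `../Nominatim` source path is assumed from the warning's own example command.

```python
import shlex


def cmake_command(tokenizer: str, source_dir: str = "../Nominatim") -> list[str]:
    """Build the cmake invocation for a Nominatim build directory.

    Hypothetical helper: only the legacy tokenizer needs the PostgreSQL
    module, which the default build now skips (BUILD_MODULE=off).
    """
    if tokenizer not in ("icu", "legacy"):
        raise ValueError(f"unknown tokenizer: {tokenizer!r}")
    cmd = ["cmake"]
    if tokenizer == "legacy":
        # Re-enable the C module that the default build no longer compiles.
        cmd.append("-DBUILD_MODULE=on")
    cmd.append(source_dir)
    return cmd


if __name__ == "__main__":
    print(shlex.join(cmake_command("legacy")))
    print(shlex.join(cmake_command("icu")))
```

Running it prints `cmake -DBUILD_MODULE=on ../Nominatim` for the legacy tokenizer and plain `cmake ../Nominatim` for ICU, which now needs no extra flag.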
@@ -19,7 +19,22 @@ they can be configured.

 The legacy tokenizer implements the analysis algorithms of older Nominatim
 versions. It uses a special PostgreSQL module to normalize names and queries.
-This tokenizer is currently the default.
+This tokenizer is automatically installed and used when upgrading an older
+database. It should not be used for new installations anymore.
+
+### Compiling the PostgreSQL module
+
+The tokenizer needs a special C module for PostgreSQL which is not compiled
+by default. If you need the legacy tokenizer, compile Nominatim as follows:
+
+```
+mkdir build
+cd build
+cmake -DBUILD_MODULE=on ../Nominatim
+make
+```
+
+### Enabling the tokenizer

 To enable the tokenizer add the following line to your project configuration:
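Enabling a tokenizer comes down to a single setting line in the project configuration. The sketch below assumes that this configuration is a dotenv-style `.env` file in the project directory and uses the `NOMINATIM_TOKENIZER` variable from the settings file. `set_project_setting` is a hypothetical helper: it replaces an existing assignment or appends a new one. Because the tokenizer is fixed during import, the setting has to be in place before the import runs.

```python
from pathlib import Path


def set_project_setting(env_file: Path, key: str, value: str) -> None:
    """Set KEY=value in a dotenv-style file, replacing any earlier assignment.

    Hypothetical helper; assumes one plain KEY=value assignment per line.
    """
    lines = env_file.read_text().splitlines() if env_file.exists() else []
    new_line = f"{key}={value}"
    replaced = False
    out = []
    for line in lines:
        # An assignment starts with "KEY=" (leading whitespace ignored);
        # commented-out lines such as "# KEY=..." are left alone.
        if line.lstrip().startswith(f"{key}="):
            if not replaced:
                out.append(new_line)
                replaced = True
            # Skip the old line and any later duplicates of the same key.
            continue
        out.append(line)
    if not replaced:
        out.append(new_line)
    env_file.write_text("\n".join(out) + "\n")


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as project_dir:
        env = Path(project_dir) / ".env"
        env.write_text("NOMINATIM_TOKENIZER=icu\n")
        set_project_setting(env, "NOMINATIM_TOKENIZER", "legacy")
        print(env.read_text(), end="")
```

Running this leaves exactly one `NOMINATIM_TOKENIZER=legacy` line in the file. The helper rewrites the line in place rather than blindly appending, so repeated runs never leave conflicting assignments behind.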
@@ -47,6 +62,7 @@ normalization functions are hard-coded.
 The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to
 normalize names and queries. It also offers configurable decomposition and
 abbreviation handling.
+This tokenizer is currently the default.

 To enable the tokenizer add the following line to your project configuration:
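To illustrate roughly what "normalizing names and queries" means, the sketch below folds a string into a comparison key. It uses Python's standard `unicodedata` module as a stand-in for the ICU library and does not reproduce Nominatim's actual rule set. It applies compatibility decomposition (NFKD), drops combining marks, case-folds, and collapses whitespace, so a stored name and a query typed without accents compare equal.

```python
import unicodedata


def normalize(text: str) -> str:
    """Fold text to a simple search key (stdlib stand-in for ICU normalization)."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (accents, umlaut dots, tildes, ...).
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # casefold() is a stronger lower() that also maps "ß" to "ss".
    folded = stripped.casefold()
    # Collapse runs of whitespace into single spaces.
    return " ".join(folded.split())


if __name__ == "__main__":
    for name in ("Zürich", "Straße", "  São   Paulo "):
        print(repr(name), "->", normalize(name))
```

This version only handles Latin-script accents and case. ICU transliterators can additionally convert between scripts, and the tokenizer's configurable decomposition and abbreviation handling go beyond what is shown here.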
@@ -21,8 +21,8 @@ NOMINATIM_DATABASE_MODULE_PATH=
 # Tokenizer used for normalizing and parsing queries and names.
 # The tokenizer is set up during import and cannot be changed afterwards
 # without a reimport.
-# Currently available tokenizers: legacy
-NOMINATIM_TOKENIZER="legacy"
+# Currently available tokenizers: icu, legacy
+NOMINATIM_TOKENIZER="icu"

 # Number of occurrences of a word before it is considered frequent.
 # Similar to the concept of stop words. Frequent partial words get ignored
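The settings file above only supplies a default. The sketch below shows one way such a value could be resolved and checked. It assumes a precedence of process environment first, then the project configuration, then the shipped default, and it enforces the comment's rule that the tokenizer is fixed at import time. `resolve_tokenizer` and `check_tokenizer_unchanged` are hypothetical names, not Nominatim's API.

```python
import os
from collections.abc import Mapping
from typing import Optional

AVAILABLE_TOKENIZERS = ("icu", "legacy")  # "Currently available tokenizers"
DEFAULT_TOKENIZER = "icu"                 # default value after this commit


def resolve_tokenizer(project_config: Mapping[str, str],
                      environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured tokenizer name.

    Assumed precedence: process environment, then project configuration,
    then the shipped default. An empty value counts as unset.
    """
    env = os.environ if environ is None else environ
    value = (env.get("NOMINATIM_TOKENIZER")
             or project_config.get("NOMINATIM_TOKENIZER")
             or DEFAULT_TOKENIZER)
    # Tolerate values that still carry the double quotes used in the file.
    value = value.strip().strip('"')
    if value not in AVAILABLE_TOKENIZERS:
        raise ValueError(f"unknown tokenizer {value!r}; "
                         f"expected one of {AVAILABLE_TOKENIZERS}")
    return value


def check_tokenizer_unchanged(imported_with: str, configured: str) -> None:
    """The tokenizer is set up at import; switching it requires a reimport."""
    if imported_with != configured:
        raise RuntimeError(
            f"database was imported with {imported_with!r}, but the "
            f"configuration asks for {configured!r}; a reimport is required")


if __name__ == "__main__":
    print(resolve_tokenizer({}, environ={}))
    print(resolve_tokenizer({"NOMINATIM_TOKENIZER": "legacy"}, environ={}))
```

With no overrides, the first call returns `icu`; a project setting of `legacy` wins over that default. The reimport check is kept separate so that a configuration can be read without a database being present.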