mirror of
https://github.com/osm-search/Nominatim.git
synced 2026-03-08 19:14:07 +00:00
make ICU the default tokenizer
@@ -44,7 +44,7 @@ endif()
 
 set(BUILD_IMPORTER on CACHE BOOL "Build everything for importing/updating the database")
 set(BUILD_API on CACHE BOOL "Build everything for the API server")
-set(BUILD_MODULE on CACHE BOOL "Build PostgreSQL module")
+set(BUILD_MODULE off CACHE BOOL "Build PostgreSQL module for legacy tokenizer")
 set(BUILD_TESTS on CACHE BOOL "Build test suite")
 set(BUILD_DOCS on CACHE BOOL "Build documentation")
 set(BUILD_MANPAGE on CACHE BOOL "Build Manual Page")
@@ -158,6 +158,15 @@ make
 sudo make install
 ```
 
+!!! warning
+    The default installation no longer compiles the PostgreSQL module that
+    is needed for the legacy tokenizer from older Nominatim versions. If you
+    are upgrading an older database or want to run the
+    [legacy tokenizer](../customize/Tokenizers.md#legacy-tokenizer) for
+    some other reason, you need to enable the PostgreSQL module via
+    cmake: `cmake -DBUILD_MODULE=on ../Nominatim`
+
+
 Nominatim installs itself into `/usr/local` per default. To choose a different
 installation directory add `-DCMAKE_INSTALL_PREFIX=<install root>` to the
 cmake command. Make sure that the `bin` directory is available in your path
@@ -19,7 +19,22 @@ they can be configured.
 
 The legacy tokenizer implements the analysis algorithms of older Nominatim
 versions. It uses a special Postgresql module to normalize names and queries.
-This tokenizer is currently the default.
+This tokenizer is automatically installed and used when upgrading an older
+database. It should not be used for new installations anymore.
+
+### Compiling the PostgreSQL module
+
+The tokenizer needs a special C module for PostgreSQL which is not compiled
+by default. If you need the legacy tokenizer, compile Nominatim as follows:
+
+```
+mkdir build
+cd build
+cmake -DBUILD_MODULE=on
+make
+```
+
+### Enabling the tokenizer
 
 To enable the tokenizer add the following line to your project configuration:
 
@@ -47,6 +62,7 @@ normalization functions are hard-coded.
 The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to
 normalize names and queries. It also offers configurable decomposition and
 abbreviation handling.
+This tokenizer is currently the default.
 
 To enable the tokenizer add the following line to your project configuration:
 
@@ -21,8 +21,8 @@ NOMINATIM_DATABASE_MODULE_PATH=
 # Tokenizer used for normalizing and parsing queries and names.
 # The tokenizer is set up during import and cannot be changed afterwards
 # without a reimport.
-# Currently available tokenizers: legacy
-NOMINATIM_TOKENIZER="legacy"
+# Currently available tokenizers: icu, legacy
+NOMINATIM_TOKENIZER="icu"
 
 # Number of occurrences of a word before it is considered frequent.
 # Similar to the concept of stop words. Frequent partial words get ignored