Mirror of https://github.com/osm-search/Nominatim.git (synced 2026-02-26 11:08:13 +00:00)
remove documentation around legacy tokenizer
@@ -61,8 +61,7 @@ pylint3 --extension-pkg-whitelist=osmium nominatim
|
|||||||
Before submitting a pull request make sure that the tests pass:
|
Before submitting a pull request make sure that the tests pass:
|
||||||
|
|
||||||
```
|
```
|
||||||
cd build
|
make tests
|
||||||
make test
|
|
||||||
```
|
```
|
||||||
|
|
||||||
## Releases
|
## Releases
|
||||||
|
|||||||
@@ -131,76 +131,13 @@ script ([Geofabrik](https://download.geofabrik.de)) provides daily updates.
|
|||||||
|
|
||||||
## Using an external PostgreSQL database
|
## Using an external PostgreSQL database
|
||||||
|
|
||||||
You can install Nominatim using a database that runs on a different server when
|
You can install Nominatim using a database that runs on a different server.
|
||||||
you have physical access to the file system on the other server. Nominatim
|
Simply point the configuration variable `NOMINATIM_DATABASE_DSN` to the
|
||||||
uses a custom normalization library that needs to be made accessible to the
|
server and follow the standard import documentation.
|
||||||
PostgreSQL server. This section explains how to set up the normalization
|
|
||||||
library.
|
|
||||||
|
|
||||||
!!! note
|
|
||||||
The external module is only needed when using the legacy tokenizer.
|
|
||||||
If you have chosen the ICU tokenizer, then you can ignore this section
|
|
||||||
and follow the standard import documentation.
|
|
||||||
|
|
||||||
### Option 1: Compiling the library on the database server
|
|
||||||
|
|
||||||
The most sure way to get a working library is to compile it on the database
|
|
||||||
server. From the prerequisites you need at least cmake, gcc and the
|
|
||||||
PostgreSQL server package.
|
|
||||||
|
|
||||||
Clone or unpack the Nominatim source code, enter the source directory and
|
|
||||||
create and enter a build directory.
|
|
||||||
|
|
||||||
```sh
|
|
||||||
cd Nominatim
|
|
||||||
mkdir build
|
|
||||||
cd build
|
|
||||||
```
|
|
||||||
|
|
||||||
Now configure cmake to only build the PostgreSQL module and build it:
|
|
||||||
|
|
||||||
```
|
|
||||||
cmake -DBUILD_IMPORTER=off -DBUILD_API=off -DBUILD_TESTS=off -DBUILD_DOCS=off -DBUILD_OSM2PGSQL=off ..
|
|
||||||
make
|
|
||||||
```
|
|
||||||
|
|
||||||
When done, you find the normalization library in `build/module/nominatim.so`.
|
|
||||||
Copy it to a place where it is readable and executable by the PostgreSQL server
|
|
||||||
process.
|
|
||||||
|
|
||||||
### Option 2: Compiling the library on the import machine
|
|
||||||
|
|
||||||
You can also compile the normalization library on the machine from where you
|
|
||||||
run the import.
|
|
||||||
|
|
||||||
!!! important
|
|
||||||
You can only do this when the database server and the import machine have
|
|
||||||
the same architecture and run the same version of Linux. Otherwise there is
|
|
||||||
no guarantee that the compiled library is compatible with the PostgreSQL
|
|
||||||
server running on the database server.
|
|
||||||
|
|
||||||
Make sure that the PostgreSQL server package is installed on the machine
|
|
||||||
**with the same version as on the database server**. You do not need to install
|
|
||||||
the PostgreSQL server itself.
|
|
||||||
|
|
||||||
Download and compile Nominatim as per standard instructions. Once done, you find
|
|
||||||
the normalization library in `build/module/nominatim.so`. Copy the file to
|
|
||||||
the database server at a location where it is readable and executable by the
|
|
||||||
PostgreSQL server process.
|
|
||||||
|
|
||||||
### Running the import
|
|
||||||
|
|
||||||
On the client side you now need to configure the import to point to the
|
|
||||||
correct location of the library **on the database server**. Add the following
|
|
||||||
line to your `.env` file:
|
|
||||||
|
|
||||||
```
|
|
||||||
NOMINATIM_DATABASE_MODULE_PATH="<directory on the database server where nominatim.so resides>"
|
|
||||||
```
|
|
||||||
|
|
||||||
Now change the `NOMINATIM_DATABASE_DSN` to point to your remote server and continue
|
|
||||||
to follow the [standard instructions for importing](Import.md).
|
|
||||||
|
|
||||||
|
The import will be faster, if the import is run directly from the database
|
||||||
|
machine. You can easily switch to a different machine for the query frontend
|
||||||
|
after the import.
|
||||||
|
|
||||||
## Moving the database to another machine
|
## Moving the database to another machine
|
||||||
|
|
||||||
@@ -225,20 +162,9 @@ target machine.
|
|||||||
data updates but the resulting database is only about a third of the size
|
data updates but the resulting database is only about a third of the size
|
||||||
of a full database.
|
of a full database.
|
||||||
|
|
||||||
Next install Nominatim on the target machine by following the standard installation
|
Next install nominatim-api on the target machine by following the standard
|
||||||
instructions. Again, make sure to use the same version as the source machine.
|
installation instructions. Again, make sure to use the same version as the
|
||||||
|
source machine.
|
||||||
|
|
||||||
Create a project directory on your destination machine and set up the `.env`
|
Create a project directory on your destination machine and set up the `.env`
|
||||||
file to match the configuration on the source machine. Finally run
|
file to match the configuration on the source machine. That's all.
|
||||||
|
|
||||||
nominatim refresh --website
|
|
||||||
|
|
||||||
to make sure that the local installation of Nominatim will be used.
|
|
||||||
|
|
||||||
If you are using the legacy tokenizer you might also have to switch to the
|
|
||||||
PostgreSQL module that was compiled on your target machine. If you get errors
|
|
||||||
that PostgreSQL cannot find or access `nominatim.so` then rerun
|
|
||||||
|
|
||||||
nominatim refresh --functions
|
|
||||||
|
|
||||||
on the target machine to update the location of the module.
|
|
||||||
|
|||||||
@@ -178,18 +178,6 @@ make
|
|||||||
sudo make install
|
sudo make install
|
||||||
```
|
```
|
||||||
|
|
||||||
!!! warning
|
|
||||||
The default installation no longer compiles the PostgreSQL module that
|
|
||||||
is needed for the legacy tokenizer from older Nominatim versions. If you
|
|
||||||
are upgrading an older database or want to run the
|
|
||||||
[legacy tokenizer](../customize/Tokenizers.md#legacy-tokenizer) for
|
|
||||||
some other reason, you need to enable the PostgreSQL module via
|
|
||||||
cmake: `cmake -DBUILD_MODULE=on ../Nominatim`. To compile the module
|
|
||||||
you need to have the server development headers for PostgreSQL installed.
|
|
||||||
On Ubuntu/Debian run: `sudo apt install postgresql-server-dev-<postgresql version>`
|
|
||||||
The legacy tokenizer is deprecated and will be removed in Nominatim 5.0
|
|
||||||
|
|
||||||
|
|
||||||
Nominatim installs itself into `/usr/local` per default. To choose a different
|
Nominatim installs itself into `/usr/local` per default. To choose a different
|
||||||
installation directory add `-DCMAKE_INSTALL_PREFIX=<install root>` to the
|
installation directory add `-DCMAKE_INSTALL_PREFIX=<install root>` to the
|
||||||
cmake command. Make sure that the `bin` directory is available in your path
|
cmake command. Make sure that the `bin` directory is available in your path
|
||||||
|
|||||||
@@ -64,26 +64,6 @@ Nominatim grants minimal rights to this user to all tables that are needed
|
|||||||
for running geocoding queries.
|
for running geocoding queries.
|
||||||
|
|
||||||
|
|
||||||
#### NOMINATIM_DATABASE_MODULE_PATH
|
|
||||||
|
|
||||||
| Summary | |
|
|
||||||
| -------------- | --------------------------------------------------- |
|
|
||||||
| **Description:** | Directory where to find the PostgreSQL server module |
|
|
||||||
| **Format:** | path |
|
|
||||||
| **Default:** | _empty_ (use `<project_directory>/module`) |
|
|
||||||
| **After Changes:** | run `nominatim refresh --functions` |
|
|
||||||
| **Comment:** | Legacy tokenizer only |
|
|
||||||
|
|
||||||
Defines the directory in which the PostgreSQL server module `nominatim.so`
|
|
||||||
is stored. The directory and module must be accessible by the PostgreSQL
|
|
||||||
server.
|
|
||||||
|
|
||||||
For information on how to use this setting when working with external databases,
|
|
||||||
see [Advanced Installations](../admin/Advanced-Installations.md).
|
|
||||||
|
|
||||||
The option is only used by the Legacy tokenizer and ignored otherwise.
|
|
||||||
|
|
||||||
|
|
||||||
#### NOMINATIM_TOKENIZER
|
#### NOMINATIM_TOKENIZER
|
||||||
|
|
||||||
| Summary | |
|
| Summary | |
|
||||||
@@ -114,20 +94,6 @@ on the file format.
|
|||||||
If a relative path is given, then the file is searched first relative to the
|
If a relative path is given, then the file is searched first relative to the
|
||||||
project directory and then in the global settings directory.
|
project directory and then in the global settings directory.
|
||||||
|
|
||||||
#### NOMINATIM_MAX_WORD_FREQUENCY
|
|
||||||
|
|
||||||
| Summary | |
|
|
||||||
| -------------- | --------------------------------------------------- |
|
|
||||||
| **Description:** | Number of occurrences before a word is considered frequent |
|
|
||||||
| **Format:** | int |
|
|
||||||
| **Default:** | 50000 |
|
|
||||||
| **After Changes:** | cannot be changed after import |
|
|
||||||
| **Comment:** | Legacy tokenizer only |
|
|
||||||
|
|
||||||
The word frequency count is used by the Legacy tokenizer to automatically
|
|
||||||
identify _stop words_. Any partial term that occurs more often than what
|
|
||||||
is defined in this setting, is effectively ignored during search.
|
|
||||||
|
|
||||||
|
|
||||||
#### NOMINATIM_LIMIT_REINDEXING
|
#### NOMINATIM_LIMIT_REINDEXING
|
||||||
|
|
||||||
@@ -162,25 +128,6 @@ codes, to restrict import to a subset of languages.
|
|||||||
Currently only affects the initial import of country names and special phrases.
|
Currently only affects the initial import of country names and special phrases.
|
||||||
|
|
||||||
|
|
||||||
#### NOMINATIM_TERM_NORMALIZATION
|
|
||||||
|
|
||||||
| Summary | |
|
|
||||||
| -------------- | --------------------------------------------------- |
|
|
||||||
| **Description:** | Rules for normalizing terms for comparisons |
|
|
||||||
| **Format:** | string: semicolon-separated list of ICU rules |
|
|
||||||
| **Default:** | :: NFD (); [[:Nonspacing Mark:] [:Cf:]] >; :: lower (); [[:Punctuation:][:Space:]]+ > ' '; :: NFC (); |
|
|
||||||
| **Comment:** | Legacy tokenizer only |
|
|
||||||
|
|
||||||
[Special phrases](Special-Phrases.md) have stricter matching requirements than
|
|
||||||
normal search terms. They must appear exactly in the query after this term
|
|
||||||
normalization has been applied.
|
|
||||||
|
|
||||||
Only has an effect on the Legacy tokenizer. For the ICU tokenizer the rules
|
|
||||||
defined in the
|
|
||||||
[normalization section](Tokenizers.md#normalization-and-transliteration)
|
|
||||||
will be used.
|
|
||||||
|
|
||||||
|
|
||||||
#### NOMINATIM_USE_US_TIGER_DATA
|
#### NOMINATIM_USE_US_TIGER_DATA
|
||||||
|
|
||||||
| Summary | |
|
| Summary | |
|
||||||
|
|||||||
@@ -15,53 +15,6 @@ they can be configured.
|
|||||||
chosen tokenizer is very limited as well. See the comments in each tokenizer
|
chosen tokenizer is very limited as well. See the comments in each tokenizer
|
||||||
section.
|
section.
|
||||||
|
|
||||||
## Legacy tokenizer
|
|
||||||
|
|
||||||
!!! danger
|
|
||||||
The Legacy tokenizer is deprecated and will be removed in Nominatim 5.0.
|
|
||||||
If you still use a database with the legacy tokenizer, you must reimport
|
|
||||||
it using the ICU tokenizer below.
|
|
||||||
|
|
||||||
The legacy tokenizer implements the analysis algorithms of older Nominatim
|
|
||||||
versions. It uses a special Postgresql module to normalize names and queries.
|
|
||||||
This tokenizer is automatically installed and used when upgrading an older
|
|
||||||
database. It should not be used for new installations anymore.
|
|
||||||
|
|
||||||
### Compiling the PostgreSQL module
|
|
||||||
|
|
||||||
The tokenizer needs a special C module for PostgreSQL which is not compiled
|
|
||||||
by default. If you need the legacy tokenizer, compile Nominatim as follows:
|
|
||||||
|
|
||||||
```
|
|
||||||
mkdir build
|
|
||||||
cd build
|
|
||||||
cmake -DBUILD_MODULE=on
|
|
||||||
make
|
|
||||||
```
|
|
||||||
|
|
||||||
### Enabling the tokenizer
|
|
||||||
|
|
||||||
To enable the tokenizer add the following line to your project configuration:
|
|
||||||
|
|
||||||
```
|
|
||||||
NOMINATIM_TOKENIZER=legacy
|
|
||||||
```
|
|
||||||
|
|
||||||
The Postgresql module for the tokenizer is available in the `module` directory
|
|
||||||
and also installed with the remainder of the software under
|
|
||||||
`lib/nominatim/module/nominatim.so`. You can specify a custom location for
|
|
||||||
the module with
|
|
||||||
|
|
||||||
```
|
|
||||||
NOMINATIM_DATABASE_MODULE_PATH=<path to directory where nominatim.so resides>
|
|
||||||
```
|
|
||||||
|
|
||||||
This is in particular useful when the database runs on a different server.
|
|
||||||
See [Advanced installations](../admin/Advanced-Installations.md#using-an-external-postgresql-database) for details.
|
|
||||||
|
|
||||||
There are no other configuration options for the legacy tokenizer. All
|
|
||||||
normalization functions are hard-coded.
|
|
||||||
|
|
||||||
## ICU tokenizer
|
## ICU tokenizer
|
||||||
|
|
||||||
The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to
|
The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to
|
||||||
|
|||||||
@@ -72,8 +72,6 @@ The tests can be configured with a set of environment variables (`behave -D key=
|
|||||||
* `DB_PORT` - (optional) port of database on host
|
* `DB_PORT` - (optional) port of database on host
|
||||||
* `DB_USER` - (optional) username of database login
|
* `DB_USER` - (optional) username of database login
|
||||||
* `DB_PASS` - (optional) password for database login
|
* `DB_PASS` - (optional) password for database login
|
||||||
* `SERVER_MODULE_PATH` - (optional) path on the Postgres server to Nominatim
|
|
||||||
module shared library file (only needed for legacy tokenizer)
|
|
||||||
* `REMOVE_TEMPLATE` - if true, the template and API database will not be reused
|
* `REMOVE_TEMPLATE` - if true, the template and API database will not be reused
|
||||||
during the next run. Reusing the base templates speeds
|
during the next run. Reusing the base templates speeds
|
||||||
up tests considerably but might lead to outdated errors
|
up tests considerably but might lead to outdated errors
|
||||||
|
|||||||
@@ -18,12 +18,6 @@ NOMINATIM_DATABASE_WEBUSER="www-data"
|
|||||||
# Currently available tokenizers: icu, legacy
|
# Currently available tokenizers: icu, legacy
|
||||||
NOMINATIM_TOKENIZER="icu"
|
NOMINATIM_TOKENIZER="icu"
|
||||||
|
|
||||||
# Number of occurrences of a word before it is considered frequent.
|
|
||||||
# Similar to the concept of stop words. Frequent partial words get ignored
|
|
||||||
# or handled differently during search.
|
|
||||||
# Changing this value requires a reimport.
|
|
||||||
NOMINATIM_MAX_WORD_FREQUENCY=50000
|
|
||||||
|
|
||||||
# If true, admin level changes on places with many contained children are blocked.
|
# If true, admin level changes on places with many contained children are blocked.
|
||||||
NOMINATIM_LIMIT_REINDEXING=yes
|
NOMINATIM_LIMIT_REINDEXING=yes
|
||||||
|
|
||||||
@@ -34,12 +28,6 @@ NOMINATIM_LIMIT_REINDEXING=yes
|
|||||||
# Currently only affects the initial import of country names and special phrases.
|
# Currently only affects the initial import of country names and special phrases.
|
||||||
NOMINATIM_LANGUAGES=
|
NOMINATIM_LANGUAGES=
|
||||||
|
|
||||||
# Rules for normalizing terms for comparisons.
|
|
||||||
# The default is to remove accents and punctuation and to lower-case the
|
|
||||||
# term. Spaces are kept but collapsed to one standard space.
|
|
||||||
# Changing this value requires a reimport.
|
|
||||||
NOMINATIM_TERM_NORMALIZATION=":: NFD (); [[:Nonspacing Mark:] [:Cf:]] >; :: lower (); [[:Punctuation:][:Space:]]+ > ' '; :: NFC ();"
|
|
||||||
|
|
||||||
# Configuration file for the tokenizer.
|
# Configuration file for the tokenizer.
|
||||||
# The content depends on the tokenizer used. If left empty the default settings
|
# The content depends on the tokenizer used. If left empty the default settings
|
||||||
# for the chosen tokenizer will be used. The configuration can only be set
|
# for the chosen tokenizer will be used. The configuration can only be set
|
||||||
|
|||||||
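For illustration, the simplified "Using an external PostgreSQL database" text in this commit reduces the setup to one setting. A minimal sketch of the matching `.env` entry follows. The host name, port, user and password are placeholder assumptions and do not come from the commit. The `pgsql:` key/value form is assumed to follow Nominatim's documented DSN format.

```sh
# Hypothetical values: replace host, port, user and password with your own.
# No NOMINATIM_DATABASE_MODULE_PATH entry is needed once the legacy tokenizer is gone.
NOMINATIM_DATABASE_DSN="pgsql:dbname=nominatim;host=db.example.org;port=5432;user=nominatim;password=secret"
```

With this line in the project directory's `.env` file, the standard import instructions should apply unchanged.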