mirror of
https://github.com/osm-search/Nominatim.git
synced 2026-02-16 15:47:58 +00:00
add documentation for postcode customization
docs/customize/Country-Settings.md
@@ -0,0 +1,149 @@
# Customizing Per-Country Data

Whenever an OSM object is imported into Nominatim, it is first assigned
a country. Nominatim can use this information to adapt various aspects of
the address computation to the local customs of the country. This section
explains how country assignment works and the principal per-country
localizations.

## Country assignment

Countries are assigned on the basis of country data from the OpenStreetMap
input data itself. Countries are expected to be tagged according to the
[administrative boundary schema](https://wiki.openstreetmap.org/wiki/Tag:boundary%3Dadministrative):
an OSM relation with `boundary=administrative` and `admin_level=2`. Nominatim
uses the country code to distinguish the countries.

If there is no country data available for a point, then Nominatim uses the
fallback data imported from `data/country_osm_grid.sql.gz`. This fallback data
was also computed from OSM data, but it is guaranteed to cover all countries.

Some OSM objects may also be located outside any country, for example a buoy
in the middle of the ocean. These objects are not assigned any country and
receive a default treatment when it comes to localized handling of data.

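The lookup order described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Nominatim's implementation: the two lookup callables are hypothetical stand-ins for the OSM boundary data and for the fallback grid from `data/country_osm_grid.sql.gz`, and the toy grid is a rough bounding box, not real geometry.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)
CountryLookup = Callable[[Point], Optional[str]]


def assign_country(point: Point,
                   osm_boundaries: CountryLookup,
                   fallback_grid: CountryLookup) -> Optional[str]:
    """Return a lower-case country code, or None outside any country."""
    # 1. Prefer the admin_level=2 boundaries from the imported OSM data.
    code = osm_boundaries(point)
    if code is not None:
        return code
    # 2. Otherwise use the precomputed fallback grid. None means the point
    #    lies outside every country and gets the default treatment.
    return fallback_grid(point)


# Hypothetical toy stand-ins: no OSM boundary data is loaded and the
# "grid" is a single rough bounding box around Andorra.
def no_osm_data(point: Point) -> Optional[str]:
    return None


def toy_grid(point: Point) -> Optional[str]:
    lon, lat = point
    return 'ad' if 1.4 <= lon <= 1.8 and 42.4 <= lat <= 42.7 else None


print(assign_country((1.52, 42.51), no_osm_data, toy_grid))  # ad
print(assign_country((-30.0, 40.0), no_osm_data, toy_grid))  # None
```

The second call stands for the buoy case: neither source knows the point, so no country is assigned.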
## Per-country settings

### Global country settings

The main place to configure settings per country is the file
`settings/country_settings.yaml`. This file has one section per country that
is recognised by Nominatim. Each section is tagged with the country code
(in lower case) and contains the different localization information. Only
countries which are listed in this file are taken into account for computations.

For example, the section for Andorra looks like this:

```
ad:
  partition: 35
  languages: ca
  names: !include country-names/ad.yaml
  postcode:
    pattern: "(ddd)"
    output: AD\1
```

The individual settings are described below.

#### `partition`

Nominatim internally splits the data into multiple tables to improve
performance. The partition number tells Nominatim into which table to put
the country. This is purely internal management and has no effect on the
output data.

The default is to have one partition per country.

#### `languages`

A comma-separated list of ISO-639 language codes of default languages in the
country. These are the languages used in name tags without a language suffix.
Note that this is not necessarily the same as the list of official languages
in the country. There may be officially recognised languages in a country
which are only ever used in name tags with the appropriate language suffixes.
Conversely, a non-official language may appear a lot in the name tags, for
example when used as an unofficial lingua franca.

List the languages in order of frequency of appearance, with the most frequently
used language first. It is not recommended to add languages when there are only
very few occurrences.

If only one language is listed, then Nominatim will 'auto-complete' the
language of names without an explicit language suffix.

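The 'auto-complete' rule can be pictured with the following sketch. It only illustrates the decision described above; the function `name_language` is hypothetical and says nothing about how Nominatim stores or processes the result internally.

```python
from typing import List, Optional


def name_language(tag: str, languages: List[str]) -> Optional[str]:
    """Guess the language of a name tag (hypothetical helper).

    `languages` is the country's `languages` setting as a list.
    """
    if ':' in tag:
        # An explicit suffix such as 'name:fr' always wins.
        return tag.split(':', 1)[1]
    # No suffix: only fill in the language when it is unambiguous.
    return languages[0] if len(languages) == 1 else None


print(name_language('name', ['ca']))        # ca
print(name_language('name', ['nl', 'fr']))  # None
print(name_language('name:es', ['ca']))     # es
```

With two or more default languages, an unsuffixed `name` stays without a language, which is why the order and length of the list matter.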
#### `names`

List of names of the country and its translations. These names are used as
a baseline. It is always possible to search countries by the given names, no
matter what other names are in the OSM data. They are also used as a fallback
when a needed translation is not available.

!!! Note
    The list of names per country is currently fairly large because Nominatim
    supports translations in many languages by default. That is why the
    name lists have been separated out into extra files. You can find the
    name lists in the file `settings/country-names/<country code>.yaml`.
    The names section in the main country settings file only refers to these
    files via the special `!include` directive.

#### `postcode`

Describes the format of the postcode that is in use in the country.

When a country has no official postcodes, set this to `no`. Example:

```
ae:
  postcode: no
```

When a country has postcodes, you need to state the postcode pattern and,
if the pattern allows several spellings, the canonical output format. Example:

```
bm:
  postcode:
    pattern: "(ll)[ -]?(dd)"
    output: \1 \2
```

The **pattern** is a regular expression that describes the possible formats
accepted as a postcode. The pattern follows the standard syntax for
[regular expressions in Python](https://docs.python.org/3/library/re.html#regular-expression-syntax)
with two extra shortcuts: `d` is a shortcut for a single digit (`[0-9]`)
and `l` for a single ASCII letter (`[A-Z]`).

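To make the shortcuts concrete, here is a minimal sketch that expands `d` and `l` into the character classes named above and compiles the result with Python's `re` module. The helper `compile_postcode_pattern` is hypothetical. It replaces *every* `d` and `l`, so it assumes patterns that use these letters only as shortcuts (an escape such as `\d` would be broken by it).

```python
import re


def compile_postcode_pattern(pattern: str) -> "re.Pattern[str]":
    """Expand the postcode shortcuts and compile the result.

    Simplified: every 'd' becomes [0-9] and every 'l' becomes [A-Z],
    so regex escapes containing these letters are not supported.
    """
    regex = pattern.replace('d', '[0-9]').replace('l', '[A-Z]')
    return re.compile(regex)


bermuda = compile_postcode_pattern("(ll)[ -]?(dd)")
print(bermuda.pattern)  # ([A-Z][A-Z])[ -]?([0-9][0-9])
print(bool(compile_postcode_pattern("(ddd)").fullmatch("500")))  # True
```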
Use match groups to mark the parts of the postcode that may optionally be
separated by a space or a hyphen. In the example above, the optional separator
is written as `[ -]?` between the two groups.

For example, the postcode for Bermuda above always consists of two letters
and two digits. They may optionally be separated by a space or hyphen. That
means that Nominatim will consider `AB56`, `AB 56` and `AB-56` to be spelling
variants of one and the same postcode.

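The following sketch checks which spellings the Bermuda pattern accepts, with the shortcuts written out by hand. It uses `re.fullmatch` so that the whole input must match; requiring a full match is an assumption made for this illustration.

```python
import re

# Bermuda pattern "(ll)[ -]?(dd)" with the shortcuts written out.
BERMUDA = re.compile(r"([A-Z][A-Z])[ -]?([0-9][0-9])")

for candidate in ["AB56", "AB 56", "AB-56", "AB_56", "A556"]:
    match = BERMUDA.fullmatch(candidate)
    # Accepted spellings yield the same groups, i.e. the same postcode.
    print(candidate, "->", match.groups() if match else "rejected")
```

The three accepted spellings all produce the groups `('AB', '56')`, while `AB_56` and `A556` are rejected.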
Never add the country code in front of the postcode pattern. Nominatim will
automatically accept variants with a country code prefix for all postcodes.

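One way to accept such a prefix without writing it into the pattern is to wrap the configured regex in an optional, non-capturing prefix group, so the group numbers used by **output** are unaffected. This is a minimal sketch; the helper `with_country_prefix` and the exact prefix forms it accepts (country code, optionally followed by a space or hyphen) are assumptions, not Nominatim's documented behaviour.

```python
import re


def with_country_prefix(country_code: str, regex: str) -> "re.Pattern[str]":
    """Hypothetical helper: also accept 'CC', 'CC ' or 'CC-' in front.

    The prefix group is non-capturing, so the numbering of the pattern's
    own groups (used by the output template) does not change.
    """
    prefix = re.escape(country_code.upper())
    return re.compile(rf"(?:{prefix}[ -]?)?(?:{regex})")


# Andorra: pattern "(ddd)" with the shortcut written out.
andorra = with_country_prefix("ad", "([0-9][0-9][0-9])")
for text in ["500", "AD500", "AD 500", "AD-500"]:
    print(text, "->", andorra.fullmatch(text).group(1))  # always 500
```

Because the prefix is handled outside the pattern, the Andorra entry can keep `(ddd)` as its pattern and re-add the prefix only in its canonical output `AD\1`.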
The **output** field is an optional field that describes what the canonical
spelling of the postcode should be. The format is the
[regular expression expand syntax](https://docs.python.org/3/library/re.html#re.Match.expand),
which refers back to the bracket groups in the pattern.

Most simple postcodes have only one spelling variant. In that case,
**output** can be omitted and the postcode is used as is.

In the Bermuda example above, the output `\1 \2` makes the canonical spelling
the two letters and the two digits separated by a single space, e.g. `AB 56`.

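The sketch below applies the Bermuda settings with Python's `re.Match.expand()`, which implements the expand syntax referenced above. The helper `canonical_postcode` is hypothetical and the shortcuts are written out by hand; it shows that every accepted spelling ends up in the same canonical form.

```python
import re
from typing import Optional

# Bermuda: pattern "(ll)[ -]?(dd)", output "\1 \2"
BERMUDA_PATTERN = re.compile(r"([A-Z][A-Z])[ -]?([0-9][0-9])")
BERMUDA_OUTPUT = r"\1 \2"


def canonical_postcode(postcode: str) -> Optional[str]:
    """Return the canonical spelling, or None if the postcode is invalid."""
    match = BERMUDA_PATTERN.fullmatch(postcode)
    if match is None:
        return None
    # \1 and \2 refer back to the bracket groups of the pattern.
    return match.expand(BERMUDA_OUTPUT)


for spelling in ["AB56", "AB 56", "AB-56"]:
    print(spelling, "->", canonical_postcode(spelling))  # always AB 56
```

Without an output template, using the matched text as is would keep `AB56` and `AB-56` as two different strings, which is the problem the warning below describes.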
!!! Warning
    When your postcode pattern covers multiple variants of the postcode, you
    must explicitly state the canonical output; otherwise Nominatim will not
    handle the variations correctly.

### Other country-specific configuration

There are some other configuration files where you can set localized settings
according to the assigned country. These are:

 * [Place ranking configuration](Ranking.md)

Please see the linked documentation sections for more information.

@@ -205,6 +205,14 @@ The following is a list of sanitizers that are shipped with Nominatim.
    rendering:
        heading_level: 6

##### clean-postcodes

::: nominatim.tokenizer.sanitizers.clean_postcodes
    selection:
        members: False
    rendering:
        heading_level: 6


#### Token Analysis

@@ -222,8 +230,12 @@ by a sanitizer (see for example the
The token-analysis section contains the list of configured analyzers. Each
analyzer must have an `id` parameter that uniquely identifies the analyzer.
The only exception is the default analyzer that is used when no special
analyzer was selected. There are analyzers with special ids:

 * '@housenumber'. If an analyzer with that name is present, it is used
   for normalization of house numbers.
 * '@postcode'. If an analyzer with that name is present, it is used
   for normalization of postcodes.

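A simplified sketch of how the special ids could be resolved: analyzers are looked up by `id`, and the entry without an `id` serves as the default. The configuration is written as a Python list of dicts mirroring the YAML section; the helper `select_analyzer` and the fallback to the default analyzer when a special id is missing are assumptions for illustration.

```python
from typing import Any, Dict, List, Optional

# Mirrors a token-analysis configuration; values are illustrative.
TOKEN_ANALYSIS: List[Dict[str, Any]] = [
    {'analyzer': 'generic'},                            # default: no id
    {'id': '@housenumber', 'analyzer': 'housenumbers'},
    {'id': '@postcode', 'analyzer': 'postcodes'},
]


def select_analyzer(config: List[Dict[str, Any]],
                    analyzer_id: Optional[str]) -> Dict[str, Any]:
    """Return the analyzer for the given id, falling back to the default.

    Assumes exactly one entry without an 'id' (the default analyzer).
    """
    by_id = {entry.get('id'): entry for entry in config}
    # The default analyzer ends up under the key None.
    return by_id.get(analyzer_id, by_id[None])


print(select_analyzer(TOKEN_ANALYSIS, '@postcode')['analyzer'])  # postcodes
print(select_analyzer(TOKEN_ANALYSIS, None)['analyzer'])         # generic
```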
Different analyzer implementations may exist. To select the implementation,
the `analyzer` parameter must be set. The different implementations are
@@ -356,6 +368,14 @@ house numbers of the form '3 a', '3A', '3-A' etc. are all considered equivalent.
The analyzer cannot be customized.

##### Postcode token analyzer

The analyzer `postcodes` is purpose-made to analyze postcodes. It supports
a 'lookup' variant of the token, which produces variants with optional
spaces. Use it together with the clean-postcodes sanitizer.

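A sketch of what "variants with optional spaces" means: for every space in the canonical postcode, one variant keeps it and one drops it. This mimics the described behaviour with `itertools.product`; the helper `optional_space_variants` is hypothetical, not the analyzer's actual code, and it leaves out any normalization or transliteration a real tokenizer applies.

```python
from itertools import product
from typing import List


def optional_space_variants(postcode: str) -> List[str]:
    """Return all spellings where each space is either kept or removed."""
    parts = postcode.split(' ')
    variants = []
    # One choice (' ' or '') for each gap between consecutive parts.
    for separators in product([' ', ''], repeat=len(parts) - 1):
        text = parts[0]
        for sep, part in zip(separators, parts[1:]):
            text += sep + part
        variants.append(text)
    return variants


print(optional_space_variants("SW1A 1AA"))  # ['SW1A 1AA', 'SW1A1AA']
print(optional_space_variants("AB56"))      # ['AB56']
```

A postcode with n spaces yields 2^n variants, so a search for `SW1A1AA` can find a postcode stored as `SW1A 1AA`.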
The analyzer cannot be customized.

### Reconfiguration

Changing the configuration after the import is currently not possible, although
@@ -28,6 +28,7 @@ pages:
- 'Overview': 'customize/Overview.md'
- 'Import Styles': 'customize/Import-Styles.md'
- 'Configuration Settings': 'customize/Settings.md'
- 'Per-Country Data': 'customize/Country-Settings.md'
- 'Place Ranking' : 'customize/Ranking.md'
- 'Tokenizers' : 'customize/Tokenizers.md'
- 'Special Phrases': 'customize/Special-Phrases.md'