mirror of
https://github.com/osm-search/Nominatim.git
synced 2026-03-07 02:24:08 +00:00
fix various typos
@@ -57,9 +57,9 @@ the function.
         show_source: no
         heading_level: 6
 
-### The sanitation function
+### The main filter function of the sanitizer
 
-The sanitation function receives a single object of type `ProcessInfo`
+The filter function receives a single object of type `ProcessInfo`
 which has three members:
 
 * `place`: read-only information about the place being processed.
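The filter-function shape described in this hunk can be sketched as a tiny example. The attribute names on the process-info object (`place`, `names`, `address`) come from this document; the stand-in classes below are simplified assumptions for illustration, not the real Nominatim classes.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Simplified stand-ins for the objects described in the text (assumptions
# for illustration only; the real classes live in nominatim.data.*).
@dataclass
class PlaceName:
    name: str
    kind: str = 'name'
    suffix: str = ''
    attr: Dict[str, str] = field(default_factory=dict)

@dataclass
class ProcessInfo:
    place: dict                  # read-only information about the place
    names: List[PlaceName]       # name entries the sanitizer may modify
    address: List[PlaceName]     # address entries the sanitizer may modify

def filter_names(obj: ProcessInfo) -> None:
    """Example filter: collapse repeated whitespace in every name."""
    for entry in obj.names:
        entry.name = re.sub(r'\s+', ' ', entry.name).strip()
```

A filter like this mutates the `names` list in place, which is exactly the kind of change ("change information within a single entry") that the following paragraph describes.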
@@ -74,6 +74,22 @@ While the `place` member is provided for information only, the `names` and
 remove entries, change information within a single entry (for example by
 adding extra attributes) or completely replace the list with a different one.
 
+#### PlaceInfo - information about the place
+
+::: nominatim.data.place_info.PlaceInfo
+    rendering:
+        show_source: no
+        heading_level: 6
+
+
+#### PlaceName - extended naming information
+
+::: nominatim.data.place_name.PlaceName
+    rendering:
+        show_source: no
+        heading_level: 6
+
+
 ### Example: Filter for US street prefixes
 
 The following sanitizer removes the directional prefixes from street names
@@ -102,49 +118,32 @@ the filter.
 The filter function first checks if the object is interesting for the
 sanitizer. Namely it checks if the place is in the US (through `country_code`)
 and if the place is a street (a `rank_address` of 26 or 27). If the
-conditions are met, then it goes through all available names and replaces
-any removes any leading direction prefix using a simple regular expression.
+conditions are met, then it goes through all available names and
+removes any leading directional prefix using a simple regular expression.
 
 Save the source code in a file in your project directory, for example as
 `us_streets.py`. Then you can use the sanitizer in your `icu_tokenizer.yaml`:
 
-```
+``` yaml
 ...
 sanitizers:
     - step: us_streets.py
 ...
 ```
 
-For more sanitizer examples, have a look at the sanitizers provided by Nominatim.
-They can be found in the directory `nominatim/tokenizer/sanitizers`.
-
 !!! warning
     This example is just a simplified showcase on how to create a sanitizer.
     It is not really ready for real-world use: while the sanitizer would
     correctly transform `West 5th Street` into `5th Street`, it would also
     shorten a simple `North Street` to `Street`.
 
-#### PlaceInfo - information about the place
-
-::: nominatim.data.place_info.PlaceInfo
-    rendering:
-        show_source: no
-        heading_level: 6
-
-
-#### PlaceName - extended naming information
-
-::: nominatim.data.place_name.PlaceName
-    rendering:
-        show_source: no
-        heading_level: 6
+For more sanitizer examples, have a look at the sanitizers provided by Nominatim.
+They can be found in the directory
+[`nominatim/tokenizer/sanitizers`](https://github.com/osm-search/Nominatim/tree/master/nominatim/tokenizer/sanitizers).
 
 ## Custom token analysis module
 
-Setup of a token analyser is split into two parts: configuration and
-analyser factory. A token analysis module must therefore implement two
-functions:
-
 ::: nominatim.tokenizer.token_analysis.base.AnalysisModule
     rendering:
         show_source: no
@@ -20,7 +20,7 @@ class PlaceName:
         is the part of the key after the first colon.
 
         In addition to that, a name may have arbitrary additional attributes.
-        How attributes are used, depends on the sanatizers and token analysers.
+        How attributes are used depends on the sanitizers and token analysers.
         The exception is the 'analyzer' attribute. This attribute determines
         which token analysis module will be used to finalize the treatment of
         names.
@@ -23,8 +23,8 @@ else:
 class SanitizerConfig(_BaseUserDict):
     """ The `SanitizerConfig` class is a read-only dictionary
         with configuration options for the sanitizer.
-        In addition to the usual dictionary function, the class provides
-        accessors to standard sanatizer options that are used by many of the
+        In addition to the usual dictionary functions, the class provides
+        accessors to standard sanitizer options that are used by many of the
         sanitizers.
     """
 
@@ -81,15 +81,15 @@ class SanitizerConfig(_BaseUserDict):
 
     def get_delimiter(self, default: str = ',;') -> Pattern[str]:
         """ Return the 'delimiters' parameter in the configuration as a
-            compiled regular expression that can be used to split names on these
-            delimiters.
+            compiled regular expression that can be used to split strings on
+            these delimiters.
 
             Arguments:
-                default: Delimiters to be used, when 'delimiters' parameter
+                default: Delimiters to be used when 'delimiters' parameter
                     is not explicitly configured.
 
             Returns:
-                A regular expression pattern, which can be used to
+                A regular expression pattern which can be used to
                 split a string. The regular expression makes sure that the
                 resulting names are stripped and that repeated delimiters
                 are ignored. It may still create empty fields on occasion. The
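The behaviour this docstring describes can be approximated with a hand-built pattern for the default delimiters `',;'`. This is an illustration of the documented semantics, not the actual pattern compiled by `get_delimiter()`:

```python
import re

# A pattern matching the documented behaviour: it swallows surrounding
# whitespace and runs of repeated delimiters, so the split results come
# out already stripped.
delimiter_re = re.compile(r'\s*[,;]+\s*')

names = delimiter_re.split('Main St; Hauptstrasse ,, Rue Principale')
# Repeated delimiters do not produce extra entries, though leading or
# trailing delimiters can still yield empty fields, as the docstring warns.
```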
@@ -44,15 +44,18 @@ class Analyzer(Protocol):
                 A list of possible spelling variants. All strings must have
                 been transformed with the global normalizer and
                 transliterator ICU rules. Otherwise they cannot be matched
-                against the query later.
+                against the input by the query frontend.
                 The list may be empty, when there are no useful
-                spelling variants. This may happen, when an analyzer only
-                produces extra variants to the canonical spelling.
+                spelling variants. This may happen when an analyzer only
+                usually outputs additional variants to the canonical spelling
+                and there are no such variants.
         """
 
 
 class AnalysisModule(Protocol):
-    """ Protocol for analysis modules.
+    """ The setup of the token analysis is split into two parts:
+        configuration and analyser factory. A token analysis module must
+        therefore implement the two functions described here.
     """
 
     def configure(self, rules: Mapping[str, Any],
@@ -64,13 +67,14 @@ class AnalysisModule(Protocol):
             Arguments:
                 rules: A dictionary with the additional configuration options
                     as specified in the tokenizer configuration.
-                normalizer: an ICU Transliterator with the compiled normalization
-                    rules.
-                transliterator: an ICU transliterator with the compiled
-                    transliteration rules.
+                normalizer: an ICU Transliterator with the compiled
+                    global normalization rules.
+                transliterator: an ICU Transliterator with the compiled
+                    global transliteration rules.
 
             Returns:
-                A data object with the configuration that was set up. May be
+                A data object with configuration data. This will be handed
+                as is into the `create()` function and may be
                 used freely by the analysis module as needed.
         """
 
@@ -82,7 +86,7 @@ class AnalysisModule(Protocol):
             Arguments:
                 normalizer: an ICU Transliterator with the compiled normalization
                     rules.
-                transliterator: an ICU tranliterator with the compiled
+                transliterator: an ICU Transliterator with the compiled
                     transliteration rules.
                 config: The object that was returned by the call to configure().
 
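Taken together, the `configure()`/`create()` protocol documented in these hunks can be sketched as a minimal module. Only the two entry points and their parameters come from the documented protocol; the `'variants'` rule key and the analyzer class are invented here for illustration.

```python
from typing import Any, Mapping

# Minimal sketch of a token analysis module following the two-function
# protocol described above (configuration plus analyser factory).
class _SimpleAnalyzer:
    """Stand-in analyzer that just remembers its configured variants."""
    def __init__(self, variants):
        self.variants = variants

def configure(rules: Mapping[str, Any], normalizer, transliterator):
    # Digest the configuration into a plain data object; per the docstring
    # above, it is handed as is into create().
    return {'variants': list(rules.get('variants', []))}

def create(normalizer, transliterator, config):
    # Build the analyzer from the object returned by configure().
    return _SimpleAnalyzer(config['variants'])
```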
|