Add sanitizer that prevents certain tags from entering the search index based on parameters

fix: pylint error

added docs for delete tags sanitizer

fixed typos in docs and code comments

fix: python typechecking error

fixed rank address type

Revert "fixed typos in docs and code comments"

This reverts commit 6839eea755a87f557895f30524fb5c03dd983d60.

added default parameters and refactored code

added test for all parameters
biswajit-k
2023-03-02 20:25:06 +05:30
parent 8191c747b9
commit ca149fb796
3 changed files with 479 additions and 2 deletions

@@ -102,7 +102,7 @@ Here is an example configuration file:
``` yaml
normalization:
- ":: lower ()"
-- "ß > 'ss'" # German szet is unimbigiously equal to double ss
+- "ß > 'ss'" # German szet is unambiguously equal to double ss
transliteration:
- !include /etc/nominatim/icu-rules/extended-unicode-to-asccii.yaml
- ":: Ascii ()"
@@ -128,7 +128,7 @@ The configuration file contains four sections:
The normalization and transliteration sections each define a set of
ICU rules that are applied to the names.
-The **normalisation** rules are applied after sanitation. They should remove
+The **normalization** rules are applied after sanitation. They should remove
any information that is not relevant for search at all. Usual rules to be
applied here are: lower-casing, removing of special characters, cleanup of
spaces.
@@ -221,7 +221,13 @@ The following is a list of sanitizers that are shipped with Nominatim.
rendering:
heading_level: 6
+#### delete-tags
+::: nominatim.tokenizer.sanitizers.delete_tags
+selection:
+members: False
+rendering:
+heading_level: 6
#### Token Analysis