Sarah Hoffmann
6d41046b15
add support for external sanitizer modules
2022-07-25 16:10:19 +02:00
Sarah Hoffmann
62eedbb8f6
add type hints for sanitizers
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
cbbcbb1fd7
move country_info into data submodule
2022-07-06 11:08:36 +02:00
Sarah Hoffmann
bce93d60bd
move PlaceInfo into data submodule
This data structure is shared between indexer and tokenizer.
2022-07-06 10:54:47 +02:00
Sarah Hoffmann
18864afa8a
postcodes: introduce a default pattern for countries without postcodes
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
9172696324
postcodes: add support for optional spaces
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
baee6f3de0
postcodes: strip leading country codes
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
28ab2f6048
add postcodes patterns without optional spaces
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
90d4d339db
initial postcode cleaner for simple patterns
Moves postcodes that are either in countries without a postcode
system or don't correspond to the local pattern for postcodes into
a field for a normal address part. Makes them searchable but not as
a special address. This has two consequences: they are no longer a
skippable part of the address and the postcodes cannot be searched
on their own.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
a6b4e8ff67
add tests for housenumber-as-name feature
2022-02-07 11:45:12 +01:00
Sarah Hoffmann
38c3ef3da0
add tests for get_string_list()
Renaming test file for sanitizer config because pytest requires
unique names for test files.
2022-02-07 11:22:24 +01:00
Sarah Hoffmann
610f2cc254
sanitizer: move helpers into a configuration class
2022-02-07 10:48:00 +01:00
Sarah Hoffmann
3741afa6dc
generalize filter-kind parameter for sanitizers
Now behaves the same for tag_analyzer_by_language and
clean_housenumbers. Adds tests.
2022-01-20 15:42:42 +01:00
Sarah Hoffmann
4774e45218
clean_housenumbers: make kinds and delimiters configurable
Also adds unit tests for various options.
2022-01-20 12:07:12 +01:00
Sarah Hoffmann
c3788d765e
add consistent SPDX copyright headers
2022-01-03 16:23:58 +01:00
Sarah Hoffmann
b18d042832
add tests for sanitizer tagging language
2021-10-06 12:29:25 +02:00
Sarah Hoffmann
732cd27d2e
add unit tests for new sanitizer functions
2021-10-01 12:27:24 +02:00