marc tobias
247afe1f56
sanitizer no longer strips name parts in brackets when more parts follow
2025-08-23 01:06:35 +02:00
Sarah Hoffmann
4cc788f69e
enable flake for Python tests
2025-03-09 15:33:24 +01:00
Sarah Hoffmann
4da4cbfe27
reduce from 3 to 2 packages
2024-06-28 09:13:22 +02:00
Sarah Hoffmann
2bab0ca060
port unit tests to new python package layout
2024-06-26 11:52:47 +02:00
Paweł Wroniszewski
2cae37ccde
Revert country settings
2023-10-20 12:50:28 +02:00
Paweł Wroniszewski
fbe40e005d
Properly validate postcodes with country code
...
Include postcode pattern in postcode normalisation regex, instead of
removing it from postcode pattern in config.
It properly handles postcode validation and normalization when country code
is part of the postcode, e.g. for Isle of Man, Jersey, Anguilla, Andorra,
Cayman Islands and more.
Fixes #3227.
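A minimal sketch of the idea described above, assuming a normalisation regex that accepts an optional country-code prefix in front of the full postcode pattern. The names `build_normalizer` and `normalize` and the patterns used are hypothetical illustrations, not the project's actual code or configuration.

```python
import re
from typing import Optional


def build_normalizer(country_code: str, pattern: str) -> "re.Pattern[str]":
    # Optional country-code prefix, then the complete postcode pattern.
    # Because the pattern itself is kept intact, postcodes that start with
    # the country code (e.g. "AD100") still match via backtracking.
    cc = re.escape(country_code.upper())
    return re.compile(rf"\s*(?:{cc}[ -]?)?({pattern})\s*")


def normalize(country_code: str, pattern: str, postcode: str) -> Optional[str]:
    # Return the postcode without a redundant country prefix,
    # or None when it does not fit the (assumed) country pattern.
    match = build_normalizer(country_code, pattern).fullmatch(postcode.upper())
    return match.group(1) if match else None


# Hypothetical patterns, for illustration only.
print(normalize("AD", r"AD\d{3}", "ad100"))
print(normalize("DE", r"\d{5}", "DE-12345"))
```

With these assumed patterns, a postcode whose official form begins with the country code is kept whole, while a purely decorative prefix is dropped.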
2023-10-17 01:04:07 +02:00
miku0
67e1c7dc72
Moved KANJI_MAP to icu-rules
2023-07-31 11:57:49 +00:00
miku0
4d61cc87cf
Add a test for reconbine_place
2023-07-31 02:39:56 +00:00
miku0
0722495434
add Japanese sanitizer
2023-07-26 07:54:58 +00:00
biswajit-k
8f03c80ce8
generalize filter for sanitizers
2023-04-01 19:24:09 +05:30
biswajit-k
ca149fb796
Adds sanitizer that prevents certain tags from entering the search index based on parameters
...
fix: pylint error
added docs for delete tags sanitizer
fixed typos in docs and code comments
fix: python typechecking error
fixed rank address type
Revert "fixed typos in docs and code comments"
This reverts commit 6839eea755a87f557895f30524fb5c03dd983d60.
added default parameters and refactored code
added test for all parameters
2023-03-09 14:18:39 +05:30
Sarah Hoffmann
fd3dec8efe
add sanitizer for TIGER tags
...
Currently only takes over cleaning the tiger:county data. This was
done by the import until now.
2022-11-23 10:37:27 +01:00
Sarah Hoffmann
6d41046b15
add support for external sanitizer modules
2022-07-25 16:10:19 +02:00
Sarah Hoffmann
62eedbb8f6
add type hints for sanitizers
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
cbbcbb1fd7
move country_info into data submodule
2022-07-06 11:08:36 +02:00
Sarah Hoffmann
bce93d60bd
move PlaceInfo into data submodule
...
This data structure is shared between indexer and tokenizer.
2022-07-06 10:54:47 +02:00
Sarah Hoffmann
18864afa8a
postcodes: introduce a default pattern for countries without postcodes
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
9172696324
postcodes: add support for optional spaces
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
baee6f3de0
postcodes: strip leading country codes
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
28ab2f6048
add postcodes patterns without optional spaces
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
90d4d339db
initial postcode cleaner for simple patterns
...
Moves postcodes that are either in countries without a postcode
system or don't correspond to the local pattern for postcodes into
a field for a normal address part. Makes them searchable but not as
a special address. This has two consequences: they are no longer a
skippable part of the address and the postcodes cannot be searched
on their own.
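The behaviour described above can be sketched roughly as follows, assuming the address is a plain dict and that each country has at most one postcode pattern. The function `clean_postcode` and the fallback key `unofficial_postcode` are hypothetical names chosen for illustration.

```python
import re
from typing import Dict, Optional


def clean_postcode(address: Dict[str, str],
                   pattern: Optional[str],
                   fallback_key: str = "unofficial_postcode") -> Dict[str, str]:
    # pattern is None for countries assumed to have no postcode system.
    result = dict(address)
    postcode = result.get("postcode")
    if postcode is None:
        return result

    candidate = postcode.strip().upper()
    if pattern is not None and re.fullmatch(pattern, candidate):
        # Valid postcode: keep it as a special address part.
        result["postcode"] = candidate
    else:
        # Invalid postcode: move it to a normal address field so it stays
        # searchable, but no longer counts as a postcode.
        del result["postcode"]
        result[fallback_key] = postcode
    return result


print(clean_postcode({"postcode": "12345", "city": "Berlin"}, r"\d{5}"))
print(clean_postcode({"postcode": "near the mill"}, r"\d{5}"))
```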
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
a6b4e8ff67
add tests for housenumber-as-name feature
2022-02-07 11:45:12 +01:00
Sarah Hoffmann
38c3ef3da0
add tests for get_string_list()
...
Renaming test file for sanitizer config because pytest requires
unique names for test files.
2022-02-07 11:22:24 +01:00
Sarah Hoffmann
610f2cc254
sanitizer: move helpers into a configuration class
2022-02-07 10:48:00 +01:00
Sarah Hoffmann
3741afa6dc
generalize filter-kind parameter for sanitizers
...
Now behaves the same for tag_analyzer_by_language and
clean_housenumbers. Adds tests.
2022-01-20 15:42:42 +01:00
Sarah Hoffmann
4774e45218
clean_housenumbers: make kinds and delimiters configurable
...
Also adds unit tests for various options.
2022-01-20 12:07:12 +01:00
Sarah Hoffmann
c3788d765e
add consistent SPDX copyright headers
2022-01-03 16:23:58 +01:00
Sarah Hoffmann
b18d042832
add tests for sanitizer tagging language
2021-10-06 12:29:25 +02:00
Sarah Hoffmann
732cd27d2e
add unit tests for new sanitizer functions
2021-10-01 12:27:24 +02:00