Sarah Hoffmann
c8873d34af
harmonize interface of token analysis module
...
The configure() function now receives a Transliterator object instead
of the ICU rules. This harmonizes the parameters with the create
function.
2022-07-29 10:43:07 +02:00
Sarah Hoffmann
6d41046b15
add support for external sanitizer modules
2022-07-25 16:10:19 +02:00
Sarah Hoffmann
7b7203c149
add function for loading plugin modules
...
Loads modules for configurable code like tokenizers, sanitizers, etc.
Supports internal modules, external libraries and code from the
project directory.
2022-07-25 16:10:10 +02:00
Kian-Meng Ang
f5e52e748f
docs: fix typos
2022-07-20 22:05:31 +08:00
Sarah Hoffmann
9963261d8d
add type annotations to special phrase importer
2022-07-18 09:54:29 +02:00
Sarah Hoffmann
62eedbb8f6
add type hints for sanitizers
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
aaf2b6032e
fix uses of config.get_path() to expect None
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
b1903f0fbf
Merge pull request #2761 from lonvia/repair-index-analysis
...
Repair `admin --analyse-indexing`
2022-07-18 09:38:08 +02:00
marc tobias
c70ca7f57b
In tests for PHP 8, disable Just-in-time because it conflicts with tools that determine coverage
2022-07-09 22:03:48 +02:00
Sarah Hoffmann
4b12d52ef5
convert admin --analyse-indexing to new indexing method
...
A proper run of indexing requires the place information from the
analyzer. Add the pre-processing of place data, so the right
information is handed into the update function.
2022-07-07 16:20:08 +02:00
Sarah Hoffmann
cbbcbb1fd7
move country_info into data submodule
2022-07-06 11:08:36 +02:00
Sarah Hoffmann
bce93d60bd
move PlaceInfo into data submodule
...
This data structure is shared between indexer and tokenizer.
2022-07-06 10:54:47 +02:00
Sarah Hoffmann
69e51aebab
test: avoid column names with upper-case letters
...
This may cause problems when the column names get quoted.
2022-07-05 09:12:55 +02:00
Sarah Hoffmann
612d34930b
handle postcodes properly on word table updates
...
update_postcodes_from_db() needs to do the full postcode treatment
in order to derive the correct word table entries.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
7b6ec4fc6c
add tests for discarding bad postcodes
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
80ea13437d
move postcode matcher into a separate file
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
4885fdf0f9
add class for online centroid computation
2022-06-23 23:42:31 +02:00
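As an illustration of what "online" centroid computation can mean, here is a minimal sketch that keeps running sums so points can be added one at a time without storing them. The class name, method names and the use of plain floats are assumptions made for this example and are not taken from the commit.

```python
from typing import Tuple


class OnlineCentroid:
    """Hypothetical running-mean centroid; not the project's actual class."""

    def __init__(self) -> None:
        self.count = 0
        self.sum_x = 0.0
        self.sum_y = 0.0

    def add(self, x: float, y: float) -> None:
        # Only the sums and the count are kept, never the points themselves.
        self.sum_x += x
        self.sum_y += y
        self.count += 1

    def centroid(self) -> Tuple[float, float]:
        if self.count == 0:
            raise ValueError("no points collected")
        return (self.sum_x / self.count, self.sum_y / self.count)
```

The memory use stays constant regardless of how many points are fed in, which is the usual reason for choosing an online formulation.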
Sarah Hoffmann
18864afa8a
postcodes: introduce a default pattern for countries without postcodes
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
9172696324
postcodes: add support for optional spaces
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
baee6f3de0
postcodes: strip leading country codes
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
28ab2f6048
add postcodes patterns without optional spaces
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
90d4d339db
initial postcode cleaner for simple patterns
...
Moves postcodes that are either in countries without a postcode
system or don't correspond to the local pattern for postcodes into
a field for a normal address part. Makes them searchable but not as
a special address. This has two consequences: they are no longer a
skippable part of the address and the postcodes cannot be searched
on their own.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
cbb4749996
change indexing order for interpolations
...
Interpolations are now indexed after rank 30 objects. The housenumber
nodes no longer need information from the interpolations while the
interpolations can make use of precomputed postcodes.
2022-06-02 15:16:46 +02:00
Sarah Hoffmann
46689df668
custom comparison for SpecialPhrase
...
Duplicate elimination only works when a custom hash/equal function
is implemented that is based on the members.
2022-05-30 16:30:41 +02:00
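The point about member-based hash/equal functions can be sketched as follows. The class name and its fields are hypothetical stand-ins chosen for the example, not the real SpecialPhrase definition.

```python
class PhraseSketch:
    """Hypothetical phrase object; field names are assumptions."""

    def __init__(self, label: str, p_class: str, p_type: str) -> None:
        self.label = label
        self.p_class = p_class
        self.p_type = p_type

    def _key(self) -> tuple:
        # All members that define the identity of a phrase.
        return (self.label, self.p_class, self.p_type)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, PhraseSketch):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self) -> int:
        # Must agree with __eq__: equal objects yield equal hashes.
        return hash(self._key())


# Two objects with the same members collapse into one set entry.
unique = {PhraseSketch("bar", "amenity", "bar"),
          PhraseSketch("bar", "amenity", "bar")}
```

Without the two special methods, Python falls back to identity comparison and the set would keep both objects.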
Sarah Hoffmann
e828d0d3f7
move quoting hack to wiki loader
...
The bad quotes around the type for special phrases
specifically occur in the Wiki pages, so they should be
removed by the loader and not in the generic SpecialPhrase
object.
2022-05-30 14:40:33 +02:00
Sarah Hoffmann
cce0e5ea38
convert special phrase loaders to generators
...
Generators simplify the code quite a bit compared to the previous
Iterator approach.
2022-05-30 14:12:46 +02:00
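To illustrate why generators tend to be shorter than hand-written iterators, the sketch below implements the same filtering loader twice. Both names and the (label, type) row format are assumptions for the example and do not reflect the actual loader code.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, str]


class PhraseIterator:
    """Hypothetical class-based iterator that skips rows with empty labels."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = iter(rows)

    def __iter__(self) -> "PhraseIterator":
        return self

    def __next__(self) -> Row:
        while True:
            # next() raises StopIteration when the input is exhausted,
            # which also ends this iterator.
            label, kind = next(self._rows)
            if label.strip():
                return (label.strip(), kind)


def generate_phrases(rows: Iterable[Row]) -> Iterator[Row]:
    """Generator with the same behaviour; the state handling is implicit."""
    for label, kind in rows:
        if label.strip():
            yield (label.strip(), kind)
```

Both variants can be consumed with a plain for loop; the generator simply leaves the bookkeeping to the language.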
Sarah Hoffmann
042e314589
remove the language parameter in the SPWikiLoader
...
Languages must always be configured through config or environment.
Also use monkeypatched environment in tests.
2022-05-30 10:26:20 +02:00
Sarah Hoffmann
61d813bfef
add get_str_list() for config
...
Converts a config value written as a comma-separated list into
a Python list of strings.
2022-05-29 13:53:50 +02:00
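A conversion of this kind might look like the sketch below. The standalone function name, the whitespace stripping and the handling of empty values are assumptions for illustration; the commit message does not specify them.

```python
from typing import List, Optional


def split_str_list(value: Optional[str]) -> Optional[List[str]]:
    """Hypothetical helper: turn 'a, b,c' into ['a', 'b', 'c'].

    Returns None for a missing or empty value (an assumed convention).
    """
    if not value:
        return None
    # Strip surrounding whitespace and drop empty items such as
    # the one produced by a trailing comma.
    items = [part.strip() for part in value.split(',') if part.strip()]
    return items or None
```

A caller would pass the raw configuration string and receive either a list of clean entries or None when nothing was configured.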
Sarah Hoffmann
adeebec32a
switch tests to ICU tokenizer as default
2022-05-10 14:54:50 +02:00
Sarah Hoffmann
ed6fda6968
Merge pull request #2702 from lonvia/move-country-names-into-includes
...
Clean up country name settings
2022-05-10 09:21:16 +02:00
Marc Tobias
821dabb138
add git commit hash to --version output
2022-05-09 23:56:13 +02:00
Sarah Hoffmann
9d468f6da0
support arbitrary prefixes in country name list
...
This means we can now get rid of the last special cases for names.
2022-05-09 11:55:26 +02:00
Marc Tobias
0de83c4a51
fix typos of name Nominatim
2022-05-05 01:04:47 +02:00
Marc Tobias
a79ab41782
new nominatim --version CLI argument
2022-05-04 01:33:25 +02:00
Sarah Hoffmann
4f59644cc2
add tests for new data invalidation functions
2022-04-14 14:52:13 +02:00
Sarah Hoffmann
fd4ab3f262
Merge pull request #2629 from tareqpi/country-names-yaml-configuration
...
Move default country names into yaml configuration
2022-04-04 09:04:25 +02:00
Tareq Al-Ahdal
e9f979b67b
'read_config' is no longer a fixture
...
add 'read_config' to test cases that need it
2022-04-01 22:52:17 +08:00
Tareq Al-Ahdal
a323b8f63a
test for loading special characters from country_settings.yaml
2022-04-01 21:58:57 +08:00
Tareq Al-Ahdal
9411c14fd2
fix reset country info before loading custom data
2022-04-01 21:55:34 +08:00
Tareq Al-Ahdal
8525e7542f
custom country config loads correctly
2022-04-01 21:46:56 +08:00
Sarah Hoffmann
de18cd1523
add test for new table_has_column function
2022-03-31 15:55:20 +02:00
Tareq Al-Ahdal
b5f311d6bc
separate unit test function into three functions
2022-03-30 22:06:59 +08:00
Tareq Al-Ahdal
9db13aac72
Added unit tests for loading country info from yaml file
2022-03-25 22:22:44 +08:00
Sarah Hoffmann
a0ed80d821
restore the tokenizer directory when missing
...
Automatically repopulate the tokenizer/ directory with the PHP stub
and the postgresql module when the directory is missing. This makes it
possible to switch working directories and in particular to run the
service from a different machine than the one where it was installed.
Users still need to make sure that .env files are set up correctly
or they will shoot themselves in the foot.
See #2515.
2022-03-20 11:31:42 +01:00
Sarah Hoffmann
0a9f971e44
add tests for new analyzed housenumbers
2022-03-01 09:34:32 +01:00
Sarah Hoffmann
837d44391c
move generation of normalized token form to analyzer
...
This gives the analyzer more flexibility in choosing the normalized
form. In particular, an analyzer creating different variants can choose
the variant that will be used as the canonical form.
2022-03-01 09:34:32 +01:00
Sarah Hoffmann
a6b4e8ff67
add tests for housenumber-as-name feature
2022-02-07 11:45:12 +01:00
Sarah Hoffmann
38c3ef3da0
add tests for get_string_list()
...
Renaming test file for sanitizer config because pytest requires
unique names for test files.
2022-02-07 11:22:24 +01:00
Sarah Hoffmann
610f2cc254
sanitizer: move helpers into a configuration class
2022-02-07 10:48:00 +01:00
Sarah Hoffmann
c170d323d9
add tests for cleaning housenumbers
2022-01-20 23:47:20 +01:00