Sarah Hoffmann
51b6d16dc6
overhaul the token analysis interface
...
The functional split between the two functions is now that the
first one creates the ID that is used in the word table and
the second one creates the variants. There is no longer a
requirement that the ID is the normalized version. We might
later reintroduce the requirement that a normalized version be available,
but it does not necessarily need to be provided through the ID.
The function that creates the ID now gets the full PlaceName. That way
it can take into account attributes that were set by the sanitizers.
Finally, rename both functions to something more sane.
2022-07-29 15:14:11 +02:00
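A minimal sketch of the split described in "overhaul the token analysis interface": one method derives the word-table ID from the full name object, the other derives search variants from that ID alone. The `PlaceName` stand-in, the method names `get_canonical_id()`/`compute_variants()`, the hypothetical `strip_spaces` sanitizer attribute and the abbreviation table are all assumptions for illustration, not the project's actual module.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PlaceName:
    """Hypothetical stand-in: name text plus attributes set by sanitizers."""
    name: str
    kind: str = "name"
    attr: Dict[str, str] = field(default_factory=dict)


class SimpleAnalyzer:
    """Illustrative analyzer: ID creation and variant creation are separate."""

    # Hypothetical abbreviation table used only to derive variants.
    ABBREVIATIONS = {"street": "st", "road": "rd"}

    def get_canonical_id(self, name: PlaceName) -> str:
        # The ID does not have to be the normalized name. Since the full
        # PlaceName is passed in, sanitizer attributes can influence it.
        ident = " ".join(name.name.lower().split())
        if name.attr.get("strip_spaces") == "yes":
            ident = ident.replace(" ", "")
        return ident

    def compute_variants(self, canonical_id: str) -> List[str]:
        # Variants are computed from the ID only, here by abbreviating words.
        variants = [canonical_id]
        words = canonical_id.split()
        short = [self.ABBREVIATIONS.get(w, w) for w in words]
        if short != words:
            variants.append(" ".join(short))
        return variants
```

Keeping variant generation a pure function of the ID means the word table only needs to store the ID; how variants are derived can change without touching the stored keys.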
Sarah Hoffmann
c8873d34af
harmonize interface of token analysis module
...
The configure() function now receives a Transliterator object instead
of the ICU rules. This harmonizes the parameters with the create
function.
2022-07-29 10:43:07 +02:00
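To illustrate the harmonized parameters from "harmonize interface of token analysis module", here is a sketch where `configure()` and `create()` both receive the same transliterator object. Only the two function names come from the commit; their signatures, the `Transliterator` protocol and the `FakeTransliterator` stand-in (used instead of a real PyICU transliterator) are assumptions.

```python
from typing import Any, Dict, List, Mapping, Protocol


class Transliterator(Protocol):
    """Assumed minimal interface: a transliterate(str) -> str method."""
    def transliterate(self, text: str) -> str: ...


class FakeTransliterator:
    """Stand-in for an ICU transliterator: lowercase plus a few ASCII mappings."""
    TABLE = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

    def transliterate(self, text: str) -> str:
        return text.lower().translate(self.TABLE)


def configure(rules: Mapping[str, Any], trans: Transliterator) -> Dict[str, Any]:
    # Preprocess the configuration with the same object create() receives,
    # e.g. transliterate configured terms once at setup time.
    return {"terms": [trans.transliterate(t) for t in rules.get("terms", [])]}


class Analyzer:
    def __init__(self, trans: Transliterator, config: Dict[str, Any]) -> None:
        self.trans = trans
        self.terms = config["terms"]

    def matching_terms(self, name: str) -> List[str]:
        # Hypothetical use of the config: configured terms found in the name.
        words = self.trans.transliterate(name).split()
        return [t for t in self.terms if t in words]


def create(trans: Transliterator, config: Dict[str, Any]) -> Analyzer:
    return Analyzer(trans, config)
```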
Sarah Hoffmann
6d41046b15
add support for external sanitizer modules
2022-07-25 16:10:19 +02:00
Sarah Hoffmann
7b7203c149
add function for loading plugin modules
...
Loads modules for configurable code like tokenizers, sanitizers, etc.
Supports internal modules, external libraries and code from the
project directory.
2022-07-25 16:10:10 +02:00
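The three module sources named in "add function for loading plugin modules" can be sketched with `importlib`. The naming rules used to tell them apart (a `.py` suffix means a file in the project directory, a bare name means an internal module, a dotted name means an external library) and the function name `load_plugin` are assumptions for illustration.

```python
import importlib
import importlib.util
from pathlib import Path
from types import ModuleType


def load_plugin(name: str, internal_package: str, project_dir: Path) -> ModuleType:
    """Load a plugin module using assumed naming rules:

    * 'foo.py'  -> file foo.py in the project directory
    * 'foo'     -> internal module <internal_package>.foo
    * 'pkg.foo' -> external library module imported by its full name
    """
    if name.endswith(".py"):
        path = project_dir / name
        if not path.is_file():
            raise FileNotFoundError(f"Plugin file not found: {path}")
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            raise ImportError(f"Cannot load plugin from {path}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # run the file's top-level code
        return module

    if "." not in name:
        return importlib.import_module(f"{internal_package}.{name}")

    return importlib.import_module(name)
```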
Sarah Hoffmann
cd4bcea894
ignore API parameters in array notation
...
PHP automatically parses parameters written in array notation (foo[]) into
arrays. Ignore these parameters as 'unknown'.
Fixes #2763.
2022-07-23 10:51:44 +02:00
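PHP turns a query such as `format[]=json&format[]=xml` into `$_GET['format'] = ['json', 'xml']`. A Python sketch of the same idea: scalar parameters are kept, array-notation parameters are only reported as unknown. The function name and the rule "key ends in `]` and contains `[`" are assumptions.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qsl


def split_params(query: str) -> Tuple[Dict[str, str], List[str]]:
    """Return (scalar parameters, names of ignored array-style parameters)."""
    scalar: Dict[str, str] = {}
    unknown: List[str] = []
    for key, value in parse_qsl(query, keep_blank_values=True):
        if "[" in key and key.endswith("]"):
            # foo[] or foo[bar]: PHP would build an array, so do not interpret it.
            base = key.split("[", 1)[0]
            if base not in unknown:
                unknown.append(base)
        else:
            scalar[key] = value
    return scalar, unknown
```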
Kian-Meng Ang
f5e52e748f
docs: fix typos
2022-07-20 22:05:31 +08:00
Sarah Hoffmann
9963261d8d
add type annotations to special phrase importer
2022-07-18 09:54:29 +02:00
Sarah Hoffmann
62eedbb8f6
add type hints for sanitizers
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
fc254fc744
adapt use of Connection in bdd tests to name change
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
aaf2b6032e
fix uses of config.get_path() to expect None
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
b1903f0fbf
Merge pull request #2761 from lonvia/repair-index-analysis
...
Repair `admin --analyse-indexing`
2022-07-18 09:38:08 +02:00
marc tobias
c70ca7f57b
In tests for PHP 8, disable just-in-time compilation as it conflicts with tools that determine coverage
2022-07-09 22:03:48 +02:00
Sarah Hoffmann
4b12d52ef5
convert admin --analyse-indexing to new indexing method
...
A proper run of indexing requires the place information from the
analyzer. Add the pre-processing of place data, so the right
information is handed into the update function.
2022-07-07 16:20:08 +02:00
Sarah Hoffmann
cbbcbb1fd7
move country_info into data submodule
2022-07-06 11:08:36 +02:00
Sarah Hoffmann
bce93d60bd
move PlaceInfo into data submodule
...
This data structure is shared between indexer and tokenizer.
2022-07-06 10:54:47 +02:00
Sarah Hoffmann
69e51aebab
test: avoid column names with upper-case letters
...
This may cause problems when the column names get quoted.
2022-07-05 09:12:55 +02:00
Marc Tobias
ccf119206d
PHP 8 behaves slightly differently with in_array and usort
2022-07-03 10:55:34 +02:00
Sarah Hoffmann
3dd7410bb7
bdd: correctly skip postcode tests for legacy
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
93d5be097a
bdd: do not expect legacy word table to be without empty tokens
...
It can happen for bogus names and this will not get fixed anymore.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
6eb9044353
adapt search algorithm to new postcode format in word
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
612d34930b
handle postcodes properly on word table updates
...
update_postcodes_from_db() needs to do the full postcode treatment
in order to derive the correct word table entries.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
0f00f4968c
fix up BDD tests for postcode changes
...
Includes smaller code fixes found by the tests.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
7b6ec4fc6c
add tests for discarding bad postcodes
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
80ea13437d
move postcode matcher into a separate file
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
4885fdf0f9
add class for online centroid computation
2022-06-23 23:42:31 +02:00
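"Online" centroid computation means updating the result point by point without keeping all points in memory. A minimal sketch, assuming the centroid is the arithmetic mean of planar coordinates; the class and method names are hypothetical.

```python
from typing import Tuple


class CentroidAccumulator:
    """Incrementally computes the mean of added points.

    Only running sums and a count are stored, so memory use is
    constant regardless of how many points are added.
    """

    def __init__(self) -> None:
        self.sum_x = 0.0
        self.sum_y = 0.0
        self.count = 0

    def add(self, x: float, y: float) -> None:
        self.sum_x += x
        self.sum_y += y
        self.count += 1

    def centroid(self) -> Tuple[float, float]:
        if self.count == 0:
            raise ValueError("No points added yet.")
        return self.sum_x / self.count, self.sum_y / self.count
```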
Sarah Hoffmann
18864afa8a
postcodes: introduce a default pattern for countries without postcodes
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
9172696324
postcodes: add support for optional spaces
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
baee6f3de0
postcodes: strip leading country codes
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
28ab2f6048
add postcode patterns without optional spaces
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
90d4d339db
initial postcode cleaner for simple patterns
...
Moves postcodes that are either in countries without a postcode
system or don't correspond to the local pattern for postcodes into
a field for a normal address part. Makes them searchable but not as
a special address. This has two consequences: they are no longer a
skippable part of the address and the postcodes cannot be searched
on their own.
2022-06-23 23:42:31 +02:00
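A minimal sketch of the postcode treatment built up in this series (simple patterns, optional spaces, stripping a leading country code, countries without a postcode system). The pattern notation ('d' digit, 'l' letter, space optional), the class name and the convention of returning None for anything that should become a normal address part are assumptions; the country table in the test is illustrative only.

```python
import re
from typing import Dict, Optional


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Assumed notation: 'd' = digit, 'l' = letter, ' ' = optional space."""
    parts = []
    for char in pattern:
        if char == "d":
            parts.append("[0-9]")
        elif char == "l":
            parts.append("[A-Z]")
        elif char == " ":
            parts.append(" ?")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts))


class PostcodeCleaner:
    def __init__(self, patterns: Dict[str, Optional[str]]) -> None:
        # A pattern of None marks a country without a postcode system.
        self.matchers = {cc.lower(): (compile_pattern(p) if p else None)
                         for cc, p in patterns.items()}

    def clean(self, country: str, postcode: str) -> Optional[str]:
        """Return the cleaned postcode, or None if it does not fit."""
        matcher = self.matchers.get(country.lower())
        if matcher is None:
            return None
        text = postcode.strip().upper()
        candidates = [text]
        prefix = country.upper()
        if text.startswith(prefix):
            # Also try without a leading country code such as 'DE-'.
            candidates.append(text[len(prefix):].lstrip(" -"))
        for candidate in candidates:
            if matcher.fullmatch(candidate):
                return candidate
        return None
```

Trying the unmodified text first avoids mangling postcodes whose own format happens to start with the country letters.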
Sarah Hoffmann
8080625747
remove postcodes from countries that don't have them
...
The postcodes will only be removed as a 'computed postcode'; they
are still searchable for the given object.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
d8623d6818
bdd: remove support for scenes
...
Only keep support for the special point geometry 'country:xx'.
2022-06-17 11:54:18 +02:00
Sarah Hoffmann
6c58a4c46c
bdd: move query tests from scene to grid description
2022-06-17 11:54:18 +02:00
Sarah Hoffmann
19f67e167c
bdd: remove step for scene setup
2022-06-17 11:54:18 +02:00
Sarah Hoffmann
00d8df6fc3
bdd: move update tests from scenes to grid descriptions
2022-06-17 11:54:18 +02:00
Sarah Hoffmann
02068aec7f
bdd: move import tests from scenes to grid descriptions
2022-06-17 11:54:18 +02:00
Sarah Hoffmann
3493d317e4
bdd: clear log buffer after a successful import run
2022-06-17 11:54:18 +02:00
Sarah Hoffmann
a2b486a5b0
bdd: allow setting an origin for the grid
2022-06-17 11:54:18 +02:00
Sarah Hoffmann
df0142678a
improve address ordering with mixes of place and admin areas
...
Resolves a couple of situations where a mixed use of place areas and
administrative boundaries would result in a hierarchy that did not
properly respect the contains relation.
2022-06-16 10:44:16 +02:00
Sarah Hoffmann
15cf7dd416
add testcase for #2551
...
This test proves that places that are linked need to be reindexed.
2022-06-05 21:39:17 +02:00
Sarah Hoffmann
cbb4749996
change indexing order for interpolations
...
Interpolations are now indexed after rank 30 objects. The housenumber
nodes no longer need information from the interpolations while the
interpolations can make use of precomputed postcodes.
2022-06-02 15:16:46 +02:00
Sarah Hoffmann
8a0e3e2f3d
Merge pull request #2732 from lonvia/fix-ordering-address-parts
...
Fix order when searching for addr:* components
2022-05-31 20:26:05 +02:00
Sarah Hoffmann
bd0e157b91
fix order when searching for addr:* components
...
When matching addr:* components the preference was given to
matches that do not intersect with the place.
2022-05-31 16:57:37 +02:00
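The commit only says the preference for addr:* matches was the wrong way round. A sketch of the corrected ordering, assuming intersecting candidates should now be ranked first with distance as tie-breaker; the `Candidate` record and its fields are hypothetical, and the real logic lives in the database code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical candidate area for an addr:* component."""
    place_id: int
    intersects: bool   # does the candidate intersect the place?
    distance: float    # distance to the place, arbitrary units


def order_candidates(candidates: List[Candidate]) -> List[Candidate]:
    # False sorts before True, so 'not intersects' puts intersecting
    # candidates first; equal keys are then ordered by distance.
    return sorted(candidates, key=lambda c: (not c.intersects, c.distance))
```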
Sarah Hoffmann
46689df668
custom comparison for SpecialPhrase
...
Duplicate elimination only works when custom hash and equality
functions based on the members are implemented.
2022-05-30 16:30:41 +02:00
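Without custom methods, Python compares and hashes objects by identity, so two phrases with identical content would both survive in a set. A sketch of member-based equality and hashing; the attribute names `p_label`, `p_class`, `p_type` and `p_operator` are assumed for illustration.

```python
from typing import Any, Tuple


class SpecialPhrase:
    """Special phrase whose equality and hash derive from its members."""

    def __init__(self, p_label: str, p_class: str, p_type: str, p_operator: str) -> None:
        self.p_label = p_label.strip()
        self.p_class = p_class.strip()
        self.p_type = p_type.strip()
        self.p_operator = p_operator.strip()

    def _key(self) -> Tuple[str, str, str, str]:
        return (self.p_label, self.p_class, self.p_type, self.p_operator)

    def __eq__(self, other: Any) -> bool:
        if not isinstance(other, SpecialPhrase):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self) -> int:
        # Must be consistent with __eq__: equal phrases hash equally.
        return hash(self._key())
```

A frozen dataclass would generate equivalent methods automatically; the explicit version just makes the dependency on the members visible.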
Sarah Hoffmann
e828d0d3f7
move quoting hack to wiki loader
...
The bad quotes around the type for special phrases
specifically occur in the Wiki pages, so they should be
removed by the loader and not in the generic SpecialPhrase
object.
2022-05-30 14:40:33 +02:00
Sarah Hoffmann
cce0e5ea38
convert special phrase loaders to generators
...
Generators simplify the code quite a bit compared to the previous
Iterator approach.
2022-05-30 14:12:46 +02:00
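A sketch of a phrase loader written as a generator: parsing state lives in local variables and each result is produced with `yield`, instead of being tracked by an explicit iterator class. The row format (`| label || class || type || operator || plural`), the `Phrase` tuple and the function name are assumptions; the quote stripping mirrors the wiki-specific fix-up mentioned in the previous commit.

```python
from typing import Iterable, Iterator, NamedTuple


class Phrase(NamedTuple):
    label: str
    cls: str
    typ: str
    operator: str


def load_phrases(rows: Iterable[str]) -> Iterator[Phrase]:
    """Lazily yield phrases from wiki-style table rows."""
    for row in rows:
        # Skip anything that is not a data row, including '|-' separators.
        if not row.startswith("|") or row.startswith("|-"):
            continue
        cells = [cell.strip() for cell in row.lstrip("|").split("||")]
        if len(cells) < 4:
            continue  # e.g. the closing '|}' of the table
        label, cls, typ, operator = cells[:4]
        # Wiki-specific fix-up: some types are wrapped in stray quotes.
        yield Phrase(label, cls, typ.strip("\"'"), operator)
```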
Sarah Hoffmann
042e314589
remove the language parameter in the SPWikiLoader
...
Languages must always be configured through config or environment.
Also use monkeypatched environment in tests.
2022-05-30 10:26:20 +02:00
Sarah Hoffmann
61d813bfef
add get_str_list() for config
...
Converts a config value written as a comma-separated list into
a Python list of strings.
2022-05-29 13:53:50 +02:00
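A standalone sketch of what a `get_str_list()`-style helper does. Passing the settings as a plain mapping, returning None for missing or empty values and dropping empty items are assumptions of this sketch, not a description of the real config class.

```python
from typing import List, Mapping, Optional


def get_str_list(settings: Mapping[str, str], name: str) -> Optional[List[str]]:
    """Return a comma-separated setting as a list of stripped strings.

    None is returned when the setting is missing or blank; empty items
    (as in 'a,,b') are dropped.
    """
    raw = settings.get(name, "")
    if not raw.strip():
        return None
    return [item.strip() for item in raw.split(",") if item.strip()]
```

Because `os.environ` is a mapping of strings, the same function can read a list straight from the environment.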
Sarah Hoffmann
1d203fdb3c
fix bug with keeping linking on updates
...
When the search for linked places was moved to the precomputation stage,
it also ended up before the statement where the linked_place_id was
removed from the linkee. As a result, the current linkee was
excluded when looking for a linked place on updates because it was
still linked to the boundary to be updated.
Fixed by allowing the search to either keep the existing linkage or
switch to an unlinked place.
2022-05-23 10:55:10 +02:00
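The fixed candidate filter can be sketched as follows: a place qualifies if it is unlinked or still linked to the boundary being updated. The `Place` record is hypothetical, and matching by equal name is only a placeholder for the real (database-side) matching criteria.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Place:
    """Hypothetical minimal place record."""
    place_id: int
    name: str
    linked_place_id: Optional[int] = None


def find_linked_place(boundary: Place, candidates: Iterable[Place]) -> Optional[Place]:
    """Return the first matching candidate that is either still linked to
    this boundary (keep the linkage) or not linked to anything."""
    for cand in candidates:
        if cand.name != boundary.name:
            continue  # placeholder matching criterion
        if cand.linked_place_id in (None, boundary.place_id):
            return cand
    return None
```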
Sarah Hoffmann
f314abcfe1
bdd: restrict imports to four languages
...
This mainly restricts the number of country names that are loaded.
2022-05-11 16:40:53 +02:00