icu tokenizer: switch to matching against partial names

When matching address parts from addr:* tags against place names,
the address names where so far converted to full names and compared
those to the place names. This can become problematic with the new
ICU tokenizer once we introduce creation of different variants
depending on the place name context. It wouldn't be clear which
variant to produce to get a match, so we would have to create all of
them. To work around this issue, switch to using the partial terms
for matching. This introduces a larger fuzziness between matches but
that shouldn't be a problem because matching is always geographically
restricted.

The search terms created for address parts have a different problem:
they are already created before we even know if they are going to be
used. This can lead to spurious entries in the word table, which slows
down searching. This problem can also be circumvented by using only
partial terms for the search terms. In terms of searching that means
that the address terms would not get the full-word boost, but given
that the case where an address part does not exist as an OSM object
should be the exception, this is likely acceptable.
This commit is contained in:
Sarah Hoffmann
2021-09-23 16:57:24 +02:00
parent c6fdcf9b0d
commit bd7c7ddad0
3 changed files with 68 additions and 49 deletions

View File

@@ -125,9 +125,6 @@ Feature: Creation of search terms
Then placex contains
| object | parent_place_id |
| N1 | N2 |
Then search_name contains
| object | name_vector | nameaddress_vector |
| N1 | #Walltown | Strange, Town |
When sending search query "23 Rose Street"
Then exactly 1 results are returned
And results contain
@@ -156,9 +153,6 @@ Feature: Creation of search terms
| W1 | highway | residential | Rose Street | :w-north |
| N2 | place | city | Strange Town | :p-N1 |
When importing
Then search_name contains
| object | name_vector | nameaddress_vector |
| N1 | #Walltown, #Blue house | Walltown, Strange, Town |
When sending search query "23 Walltown, Strange Town"
Then results contain
| osm | display_name |
@@ -190,9 +184,6 @@ Feature: Creation of search terms
| W1 | highway | residential | Rose Street | :w-north |
| N2 | place | city | Strange Town | :p-N1 |
When importing
Then search_name contains
| object | name_vector | nameaddress_vector |
| N1 | #Moon sun, #Blue house | Moon, Sun, Strange, Town |
When sending search query "23 Moon Sun, Strange Town"
Then results contain
| osm | display_name |
@@ -212,9 +203,6 @@ Feature: Creation of search terms
| W1 | highway | residential | Rose Street | Walltown | :w-north |
| N2 | place | suburb | Strange Town | Walltown | :p-N1 |
When importing
Then search_name contains
| object | name_vector | nameaddress_vector |
| N1 | #Walltown | Strange, Town |
When sending search query "23 Rose Street, Walltown"
Then exactly 1 result is returned
And results contain
@@ -303,9 +291,6 @@ Feature: Creation of search terms
| W1 | highway | residential | Rose Street | :w-north |
| N2 | place | suburb | Strange Town | :p-N1 |
When importing
Then search_name contains
| object | name_vector | nameaddress_vector |
| N1 | #Green Moss | Walltown |
When sending search query "Green Moss, Rose Street, Walltown"
Then exactly 0 result is returned
When sending search query "Green Moss, Walltown"