Commit Graph

5 Commits

Author SHA1 Message Date
Sarah Hoffmann
3cd85eaaf1 remove Token from explicit input for SearchDescription extension
The token string is only required by the PartialToken type, so
it can simply save the token string internally. No need to pass
it to every type.

Also moves the check for multi-word partials to the token loader
code in the tokenizer. Multi-word partials can only happen with
the legacy tokenizer and when the database was loaded with an
older version of Nominatim. No need to keep the check for
everybody.
2021-07-17 18:18:31 +02:00
Sarah Hoffmann
143ff14466 remove special status of partial tokens
Full-word tokens are no longer marked by a space at the
beginning of the token. Use the new Partial token category
instead. This removes a couple of special casing, we don't
really need.

The word table still has the space for compatibility reasons,
so the tokenizer code needs to get rid of it when loading the
tokens.
2021-07-14 22:17:17 +02:00
Sarah Hoffmann
6070c3d1d5 introduce a separate token type for partials
This means that the leading space can be removed as a partial
word indicator.
2021-07-13 16:57:12 +02:00
Sarah Hoffmann
8413075249 move abbreviation computation into import phase
This adds precomputation of abbreviated terms for names and removes
abbreviation of terms in the query. Basic import works but still
needs some thorough testing as well as speed improvements during
import.

New dependency for python library datrie.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
ba8ed7967d add PHP part for new ICU-base tokenizer 2021-05-05 10:15:27 +02:00