Commit Graph

16 Commits

Author SHA1 Message Date
Sarah Hoffmann
fdff579188 php: force use of global Exception class 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
001b2aa9f9 fix linitin issues in PHP 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
1db098c05d reinstate word column in icu word table
Postgresql is very bad at creating statistics for jsonb
columns. The result is that the query planer tends to
use JIT for queries with a where over 'info' even when
there is an index.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
324b1b5575 bdd tests: do not query word table directly
The BDD tests cannot make assumptions about the structure of the
word table anymore because it depends on the tokenizer. Use more
abstract descriptions instead that ask for specific kinds of
tokens.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
6ad35aca4a adapt special terms lookup to new word table 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
70f154be8b switch word tokens to new word table layout 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
4342b28882 switch special phrases to new word table format 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
5394b1fa1b switch postcode tokens to new word table layout 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
5ab0a63fd6 switch housenumber tokens to new word table layout 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
1618aba5f2 switch country name tokens to new word table layout 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
8096a1d67f fix parameters for TokenWord creation 2021-07-20 10:21:40 +02:00
Sarah Hoffmann
3cd85eaaf1 remove Token from explicit input for SearchDescription extension
The token string is only required by the PartialToken type, so
it can simply save the token string internally. No need to pass
it to every type.

Also moves the check for multi-word partials to the token loader
code in the tokenizer. Multi-word partials can only happen with
the legacy tokenizer and when the database was loaded with an
older version of Nominatim. No need to keep the check for
everybody.
2021-07-17 18:18:31 +02:00
Sarah Hoffmann
143ff14466 remove special status of partial tokens
Full-word tokens are no longer marked by a space at the
beginning of the token. Use the new Partial token category
instead. This removes a couple of special casing, we don't
really need.

The word table still has the space for compatibility reasons,
so the tokenizer code needs to get rid of it when loading the
tokens.
2021-07-14 22:17:17 +02:00
Sarah Hoffmann
6070c3d1d5 introduce a separate token type for partials
This means that the leading space can be removed as a partial
word indicator.
2021-07-13 16:57:12 +02:00
Sarah Hoffmann
8413075249 move abbreviation computation into import phase
This adds precomputation of abbreviated terms for names and removes
abbreviation of terms in the query. Basic import works but still
needs some thorough testing as well as speed improvements during
import.

New dependency for python library datrie.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
ba8ed7967d add PHP part for new ICU-base tokenizer 2021-05-05 10:15:27 +02:00