Commit Graph

31 Commits

Author SHA1 Message Date
Sarah Hoffmann
dddfa3a075 ignore irrelevant extra tags on address interpolations
When deciding if an address interpolation has address information, only
look for addr:street and addr:place. If they are not there go looking
for the address on the address nodes. Ignores irrelevant tags like
addr:inclusion.

Fixes #2797.
2022-08-13 14:07:06 +02:00
Sarah Hoffmann
93d5be097a bdd: do not expect legacy word table to be without empty tokens
It can happen for bogus names and this will not get fixed anymore.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
612d34930b handle postcodes properly on word table updates
update_postcodes_from_db() needs to do the full postcode treatment
in order to derive the correct word table entries.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
00d8df6fc3 bdd: move update tests from scenes to grid descriptions 2022-06-17 11:54:18 +02:00
Sarah Hoffmann
3493d317e4 bdd: clear lof buffer after a successful import run 2022-06-17 11:54:18 +02:00
Sarah Hoffmann
e74e577029 bdd: recreate functions on template DB
Avoids calling function refresh on every scenario. The content won't
change between runs.
2022-05-11 15:50:22 +02:00
Sarah Hoffmann
aa0ae610c6 avoid calling OSM servers during bdd tests 2022-05-11 15:33:01 +02:00
Sarah Hoffmann
adeebec32a switch tests to ICU tokenizer as default 2022-05-10 14:54:50 +02:00
Sarah Hoffmann
42cd021d04 save differing linked polace names in extra fields
This keeps the names tracable and ensures that all names are searchable
when they differ. Do not keep names when they are exactly the same
to save some space. Linked names are cleaned out before relinking.
2022-03-16 16:38:52 +01:00
Sarah Hoffmann
f74228830d bdd: run full import on tests
This uncovered a couple of outdated/wrong tests which have been
fixed, too.
2022-02-24 14:27:51 +01:00
Sarah Hoffmann
c3788d765e add consistent SPDX copyright headers 2022-01-03 16:23:58 +01:00
Sarah Hoffmann
118858a55e rename legacy_icu tokenizer to icu tokenizer
The new icu tokenizer is now no longer compatible with the old
legacy tokenizer in terms of data structures. Therefore there
is also no longer a need to refer to the legacy tokenizer in the
name.
2021-08-17 23:11:47 +02:00
Sarah Hoffmann
1db098c05d reinstate word column in icu word table
Postgresql is very bad at creating statistics for jsonb
columns. The result is that the query planer tends to
use JIT for queries with a where over 'info' even when
there is an index.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
324b1b5575 bdd tests: do not query word table directly
The BDD tests cannot make assumptions about the structure of the
word table anymore because it depends on the tokenizer. Use more
abstract descriptions instead that ask for specific kinds of
tokens.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
2e3c5d4c5b adapt tests for ICU tokenizer 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
1ccd4360b4 correctly handle removing all postcodes for country 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
a263e54b94 enable BDD tests for different tokenizers
The tokenizer to be used can be choosen with -DTOKENIZER.

Adapt all tests, so that they work with legacy_icu tokenizer.
Move lookup in word table to a function in the tokenizer.
Special phrases are temporarily imported from the wiki until
we have an implementation that can import from file. TIGER
tests do not work yet.
2021-05-05 10:31:51 +02:00
Sarah Hoffmann
e1c5673ac3 require tokeinzer for indexer 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
9397bf54b8 introduce external processing in indexer
Indexing is now split into three parts: first a preparation step
that collects the necessary information from the database and
returns it to Python. In a second step the data is transformed
within Python as necessary and then returned to the database
through the usual UPDATE which now not only sets the indexed_status
but also other fields. The third step comprises the address
computation which is still done inside the update trigger in
the database.

The second processing step doesn't do anything useful yet.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
118befd7d7 bdd tests: make indexing less verbose
Do not print progress info for indexing when there is an error
in the BDD tests.
2021-03-20 10:39:29 +01:00
Sarah Hoffmann
ebae3553e0 bdd: run all setup via nominatim Python library
Drops all calls to PHP utility functions. nominatim cli functions
are used where possible, to stay as close to the final code as
possible with the tests.

By removing the PHP calls, the test code now only uses osm2pgsql and
the database module from the build directory.
2021-03-16 22:20:41 +01:00
Sarah Hoffmann
dd03aeb966 bdd: use python library where possible
Replace calls to PHP scripts with direct calls into the
nominatim Python library where possible. This speed up
tests quite a bit.
2021-02-26 16:14:29 +01:00
Sarah Hoffmann
73cbb6eb9a bdd: clean up DB ops steps
Adds comments and modernizes code.
2021-01-06 16:37:32 +01:00
Sarah Hoffmann
1f29475fa5 bdd: move column comparison in separate file
Introduces a new class DBRow that encapsulates the comparison
functions. This also is responsible for formatting more informative
assert messages. place and placex steps are unified.
2021-01-06 12:28:09 +01:00
Sarah Hoffmann
d586b95ff1 bdd: move nominitim id reader to separate file 2021-01-05 16:00:48 +01:00
Sarah Hoffmann
25557e5f14 bdd: factor out reindexing on updates 2021-01-05 15:17:46 +01:00
Sarah Hoffmann
197870e67a bdd: move place table inserter into separate file
Also simplifies usage by implementing a function that inserts
a complete table row.
2021-01-05 12:12:59 +01:00
Sarah Hoffmann
b8e39d2dde bdd: move scene setup to OSM data steps
The step has nothing to do with the database.
2021-01-05 11:42:28 +01:00
Sarah Hoffmann
5dfa76a610 bdd: switch to auto commit mode
Put the connection to the test database into auto-commit mode
and get rid of the explicit commits. Also use cursors always in
context managers and unify the two implementations that copy
data from the place table.
2021-01-05 11:42:28 +01:00
Sarah Hoffmann
58c471c627 bdd: remove class for lazy formatting
assert in combination with format() does the right thing and calls
the __str__() method only when an assertion hits.
2021-01-05 10:39:44 +01:00
Sarah Hoffmann
213bf7d19d bdd: rename db_ops steps
Now all files implementing steps are called steps_*.py.
2021-01-05 10:20:00 +01:00