Sarah Hoffmann
16daa57e47
unify ICUNameProcessorRules and ICURuleLoader
...
There is no need for the additional layer of indirection that
the ICUNameProcessorRules class adds. The ICURuleLoader can
fill the database properties directly.
2021-10-01 12:27:24 +02:00
Sarah Hoffmann
231250f2eb
add wrapper class for place data passed to tokenizer
...
This is mostly for convenience and documentation purposes.
2021-09-29 11:54:07 +02:00
Sarah Hoffmann
8e1d4818ac
use yaml config loader for country info
2021-09-04 00:22:55 +02:00
Sarah Hoffmann
7e7dd769fd
remove language and partition from name import
2021-09-02 14:41:11 +02:00
Sarah Hoffmann
79da96b369
read partition and languages from config file
2021-09-02 14:41:11 +02:00
Sarah Hoffmann
78fcabade8
move country name generation to country_info module
2021-09-02 14:41:11 +02:00
Sarah Hoffmann
284645f505
move generation of country tables in own module
2021-09-02 14:41:11 +02:00
Sarah Hoffmann
87dedde5d6
allow multiple files for the import command
...
The files are forwarded to osm2pgsql which is now able to merge
them correctly.
2021-08-14 21:42:21 +02:00
Sarah Hoffmann
e42349c963
replace add-data function with native Python code
2021-07-26 10:41:37 +02:00
Sarah Hoffmann
14f777da18
use psycopg's SQL quoting where possible
...
Use the SQL formatting supplied with psycopg whenever the
query needs to be put together from snippets.
2021-07-12 22:05:22 +02:00
Sarah Hoffmann
6f6681ce67
add helper function for execute_values
...
Make psycopg2's convenience function accessible through
the cursor.
2021-07-12 21:08:20 +02:00
Sarah Hoffmann
06602b4ec0
provide wrapper function for DROP TABLE
...
Use psycopg2 formatting to ensure correct quoting.
2021-07-12 20:32:46 +02:00
Sarah Hoffmann
cf98cff2a1
more formatting fixes
...
Found by flake8.
2021-07-12 17:45:42 +02:00
Sarah Hoffmann
fff0012249
simplify website setup code
...
Use formaat strings and move variable quoting code into extra
function.
2021-07-12 11:41:05 +02:00
Sarah Hoffmann
d5a1883b62
avoid repeated patterns for table name
2021-07-12 11:33:09 +02:00
Sarah Hoffmann
a08ef43e40
simplify if statements
2021-07-12 11:28:47 +02:00
Sarah Hoffmann
3661f7a321
avoid multiple returns of same value
...
Found by Sonarqube.
2021-07-11 18:23:42 +02:00
Sarah Hoffmann
a2edbbf78a
cannot use capture_output in subprocess.run
...
Only available since Python 3.7.
2021-07-06 22:57:42 +02:00
Sarah Hoffmann
8413075249
move abbreviation computation into import phase
...
This adds precomputation of abbreviated terms for names and removes
abbreviation of terms in the query. Basic import works but still
needs some thorough testing as well as speed improvements during
import.
New dependency for python library datrie.
2021-07-04 10:28:20 +02:00
AntoJvlt
3676310efe
Improved performance of the postcodes query and some code cleaning
2021-06-12 15:46:08 +02:00
AntoJvlt
1c175e3a67
Clean and update tests for postcodes
2021-06-09 09:31:32 +02:00
AntoJvlt
47fb7cd3a8
Use place_exists() into can_compute() for postcodes
2021-06-09 09:31:32 +02:00
AntoJvlt
a4733eed90
Use place instead of placex to compute postcodes
2021-06-09 09:31:32 +02:00
AntoJvlt
799a4c9ab6
Documentation update and small code fixes
2021-05-18 22:35:21 +02:00
AntoJvlt
3206bf59df
Resolve conflicts
2021-05-17 13:52:35 +02:00
AntoJvlt
8b8dfc46eb
Added --no-replace command for special phrases importation and added corresponding tests
2021-05-17 13:25:06 +02:00
AntoJvlt
06aab389ed
Code cleaning and SPLoader deleted
2021-05-16 16:59:12 +02:00
Sarah Hoffmann
925726222f
Merge pull request #2323 from darkshredder/disable-search-reverse-only
...
Feat: Disabled search API for --reverse-only imports
2021-05-14 10:40:22 +02:00
Sarah Hoffmann
7d621389ee
adapt tests to new TIGER CSV format
2021-05-14 00:02:50 +02:00
Sarah Hoffmann
35efe3b41c
use tokenizer during Tiger data import
...
This also changes the required import format to CSV.
2021-05-14 00:02:50 +02:00
Darkshredder
e5ffc59cd5
feat: Added reverse-only-search validation
2021-05-14 02:36:21 +05:30
Sarah Hoffmann
5feece64c1
use WorkerPool for Tiger data import
...
Requires adding an option that SQL errors are ignored.
2021-05-13 20:36:50 +02:00
Sarah Hoffmann
63e35574d4
Merge pull request #2324 from lonvia/generic-external-postcodes
...
Rework postcode handling and generalised external postcode support
2021-05-13 14:52:19 +02:00
Sarah Hoffmann
db2dbf15f7
fix token_info migration
...
A bad indent meant that only one table received the new column.
2021-05-13 14:31:41 +02:00
Sarah Hoffmann
f5977dac75
ignore invalid coordinates in external postcodes
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
8f2746fe24
ignore entries without country code
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
1ccd4360b4
correctly handle removing all postcodes for country
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
bf864b2c54
index postcodes after refreshing
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
4abaf71234
add and extend tests for new postcode handling
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
a4aba23a83
move filling of postcode table to python
...
The Python code now takes care of reading postcodes from placex,
enhancing them with potentially existing external postcodes and
updating location_postcodes accordingly. The initial setup and
updates use exactly the same function.
External postcode handling has been generalized. External postcodes
for any country are now accepted. The format of the external postcode
file has changed. We now expect CSV, potentially gzipped. The
postcodes are no longer saved in the database.
2021-05-13 14:15:42 +02:00
AntoJvlt
9d83da830f
Introduction of SPCsvLoader to load special phrases from a csv file
2021-05-10 23:26:39 +02:00
AntoJvlt
00959fac57
Refactoring loading of external special phrases and importation process by introducing SPLoader and SPWikiLoader
2021-05-10 21:49:31 +02:00
Sarah Hoffmann
36c624ec71
commit between migrations
...
Later migrations may require tables set up by older ones.
2021-05-01 10:47:35 +02:00
Sarah Hoffmann
fc995ea6b9
move database check for module to tokenizer
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
3eb4d88057
boilerplate for PHP code of tokenizer
...
This adds an installation step for PHP code for the tokenizer. The
PHP code is split in two parts. The updateable code is found in
lib-php. The tokenizer installs an additional script in the
project directory which then includes the code from lib-php and
defines all settings that are static to the database. The website
code then always includes the PHP from the project directory.
2021-04-30 11:31:52 +02:00
Sarah Hoffmann
7cb7cf848d
move amenity creation to tokenizer
...
The BDD tests still use the old-style amenity creation scripts
because we don't have simple means to import a hand-crafted
test file of special phrases right now.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
bef300305e
move default country name creation to tokenizer
...
The new function is also used, when a country us updated. All SQL
function related to country names have been removed.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
ffc2d82b0e
move postcode normalization into tokenizer
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
d8ed1bfc60
move houseunumber handling to tokenizer
...
Normalization and token computation are now done in the tokenizer.
The tokenizer keeps a cache to the hundred most used house numbers
to keep the numbers of calls to the database low.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
a73711f3cd
add extra column for tokenizer
...
Add a jsonb column to the placex and location_property_osmline tables
which can be used by the installed tokenizer as required. No other
part of the software will use or otherwise rely on this column.
2021-04-30 11:30:51 +02:00