Commit Graph

118 Commits

Author SHA1 Message Date
AntoJvlt
3206bf59df Resolve conflicts 2021-05-17 13:52:35 +02:00
AntoJvlt
8b8dfc46eb Added --no-replace command for special phrases importation and added corresponding tests 2021-05-17 13:25:06 +02:00
AntoJvlt
06aab389ed Code cleaning and SPLoader deleted 2021-05-16 16:59:12 +02:00
Sarah Hoffmann
925726222f Merge pull request #2323 from darkshredder/disable-search-reverse-only
Feat: Disabled search API for --reverse-only imports
2021-05-14 10:40:22 +02:00
Sarah Hoffmann
7d621389ee adapt tests to new TIGER CSV format 2021-05-14 00:02:50 +02:00
Sarah Hoffmann
35efe3b41c use tokenizer during Tiger data import
This also changes the required import format to CSV.
2021-05-14 00:02:50 +02:00
Darkshredder
e5ffc59cd5 feat: Added reverse-only-search validation 2021-05-14 02:36:21 +05:30
Sarah Hoffmann
5feece64c1 use WorkerPool for Tiger data import
Requires adding an option that SQL errors are ignored.
2021-05-13 20:36:50 +02:00
Sarah Hoffmann
63e35574d4 Merge pull request #2324 from lonvia/generic-external-postcodes
Rework postcode handling and generalised external postcode support
2021-05-13 14:52:19 +02:00
Sarah Hoffmann
db2dbf15f7 fix token_info migration
A bad indent meant that only one table received the new column.
2021-05-13 14:31:41 +02:00
Sarah Hoffmann
f5977dac75 ignore invalid coordinates in external postcodes 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
8f2746fe24 ignore entries without country code 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
1ccd4360b4 correctly handle removing all postcodes for country 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
bf864b2c54 index postcodes after refreshing 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
4abaf71234 add and extend tests for new postcode handling 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
a4aba23a83 move filling of postcode table to python
The Python code now takes care of reading postcodes from placex,
enhancing them with potentially existing external postcodes and
updating location_postcodes accordingly. The initial setup and
updates use exactly the same function.

External postcode handling has been generalized. External postcodes
for any country are now accepted. The format of the external postcode
file has changed. We now expect CSV, potentially gzipped. The
postcodes are no longer saved in the database.
2021-05-13 14:15:42 +02:00
AntoJvlt
9d83da830f Introduction of SPCsvLoader to load special phrases from a csv file 2021-05-10 23:26:39 +02:00
AntoJvlt
00959fac57 Refactoring loading of external special phrases and importation process by introducing SPLoader and SPWikiLoader 2021-05-10 21:49:31 +02:00
Sarah Hoffmann
36c624ec71 commit between migrations
Later migrations may require tables set up by older ones.
2021-05-01 10:47:35 +02:00
Sarah Hoffmann
fc995ea6b9 move database check for module to tokenizer 2021-04-30 17:41:08 +02:00
Sarah Hoffmann
3eb4d88057 boilerplate for PHP code of tokenizer
This adds an installation step for PHP code for the tokenizer. The
PHP code is split in two parts. The updateable code is found in
lib-php. The tokenizer installs an additional script in the
project directory which then includes the code from lib-php and
defines all settings that are static to the database. The website
code then always includes the PHP from the project directory.
2021-04-30 11:31:52 +02:00
Sarah Hoffmann
7cb7cf848d move amenity creation to tokenizer
The BDD tests still use the old-style amenity creation scripts
because we don't have simple means to import a hand-crafted
test file of special phrases right now.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
bef300305e move default country name creation to tokenizer
The new function is also used, when a country us updated. All SQL
function related to country names have been removed.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
ffc2d82b0e move postcode normalization into tokenizer 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
d8ed1bfc60 move houseunumber handling to tokenizer
Normalization and token computation are now done in the tokenizer.
The tokenizer keeps a cache to the hundred most used house numbers
to keep the numbers of calls to the database low.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
a73711f3cd add extra column for tokenizer
Add a jsonb column to the placex and location_property_osmline tables
which can be used by the installed tokenizer as required. No other
part of the software will use or otherwise rely on this column.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
fbbdd31399 move word table and normalisation SQL into tokenizer
Creating and populating the word table is now the responsibility
of the tokenizer.

The get_maxwordfreq() function has been replaced with a
simple template parameter to the SQL during function installation.
The number is taken from the parameter list in the database to
ensure that it is not changed after installation.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
b5540dc35c add migration for configurable tokenizer
Adds a migration that initialises a legacy tokenizer for
an existing database. The migration is not active yet as
it will need completion when more functionality is added
to the legacy tokenizer.
2021-04-30 11:29:57 +02:00
Sarah Hoffmann
296a66558f move module installation to legacy tokenizer 2021-04-30 11:29:57 +02:00
Sarah Hoffmann
185d369404 remove support for AUX housenumber tables
These tables have never been actively maintained and the code is
completely untested. With the upcomming changes, it is unlikely
that the code remains usable.

This removes the aux tables and all code that references them.
2021-04-30 10:08:29 +02:00
Sarah Hoffmann
51d20b19b6 Merge pull request #2299 from lonvia/update-actions
Fix database check for reverse-only
2021-04-27 12:18:45 +02:00
Sarah Hoffmann
46e8c6b112 Merge pull request #2291 from AntoJvlt/special-phrases-statistics
Special phrases statistics
2021-04-27 11:57:05 +02:00
Sarah Hoffmann
c8fb25201a do not check for extra housenumber index for reverse-only
Also adds a database check for reverse only import to the CI.
2021-04-27 10:14:26 +02:00
Sarah Hoffmann
4457bf7528 avoid Path in subprocess parameters
Not supported by Python 3.5.
2021-04-26 10:55:23 +02:00
AntoJvlt
abb3d56b20 Switching to log info and only send warning for invalid phrases 2021-04-25 17:57:43 +02:00
AntoJvlt
c5ecb9bae0 Implemented statistics for the import of special phrases through the SpecialPhrasesImporterStatistics class 2021-04-25 17:57:43 +02:00
AntoJvlt
1b68152fb2 reorganization of folder/file for the special phrases importer 2021-04-25 17:57:42 +02:00
Sarah Hoffmann
b951b11336 fix pylint complaints 2021-04-24 11:59:32 +02:00
Sarah Hoffmann
89c90bedb9 pylint: disable check too-few-public-methods 2021-04-24 11:39:44 +02:00
Sarah Hoffmann
9c51c133f7 indexes with includes are not available for postgresql < 11 2021-04-23 22:50:08 +02:00
Sarah Hoffmann
280406c0d7 use pathlib version of open 2021-04-23 22:50:08 +02:00
Sarah Hoffmann
d5fc3b5e99 subprocess needs string argument
Compatibility change for Python 3.5.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
3a642d50a4 use more generic ImportError to check for module
ModuleNotFoundError was only introduced in Python 3.6.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
79d55357e8 simplify sql and website creation functions 2021-04-19 10:53:30 +02:00
Sarah Hoffmann
4fa6c0ad53 simplify constructor for SQL preprocessor
Use sql path from config.
2021-04-19 10:26:25 +02:00
Sarah Hoffmann
8f63f9516b simplify interface for adding tiger data
Also simplifies tests using existing fixtures.
2021-04-19 10:26:25 +02:00
AntoJvlt
b2ae715699 Only log a warning if a wrong input is detected on the wiki while importing special phrases 2021-04-17 20:19:39 +02:00
AntoJvlt
a95c748363 Fix occurence regex 2021-04-17 19:24:13 +02:00
Sarah Hoffmann
886a01c796 port function to compute initial postcodes to Python 2021-04-16 16:11:20 +02:00
Sarah Hoffmann
76b1885595 use absolute imports in Python code
Relative imports are no longer officially recommended.
2021-04-16 14:20:09 +02:00