Resolve conflicts

AntoJvlt
2021-05-17 13:52:35 +02:00
37 changed files with 7285 additions and 6729 deletions

View File

@@ -11,3 +11,5 @@ ignored-modules=icu
 # 'with' statements.
 ignored-classes=NominatimArgs,closing
 disable=too-few-public-methods,duplicate-code
+good-names=i,x,y,fd

View File

@@ -258,4 +258,5 @@ install(FILES settings/env.defaults
 settings/import-address.style
 settings/import-full.style
 settings/import-extratags.style
+settings/legacy_icu_tokenizer.json
 DESTINATION ${NOMINATIM_CONFIGDIR})

View File

@@ -0,0 +1,71 @@
# Customization of the Database

This section explains in detail how to configure a Nominatim import and
the various means to use external data.

## External postcode data

Nominatim creates a table of known postcode centroids during import. This table
is used for searches of postcodes and for adding postcodes to places where the
OSM data does not provide one. These postcode centroids are mainly computed
from the OSM data itself. In addition, Nominatim supports reading postcode
information from an external CSV file to supplement the postcodes that are
missing in OSM.

To enable external postcode support, simply put one CSV file per country into
your project directory and name it `<CC>_postcodes.csv`. `<CC>` must be the
two-letter country code of the country the file applies to. The file may also
be gzipped; in that case it must be named `<CC>_postcodes.csv.gz`.

The CSV file must use commas as delimiter and have a header line. Nominatim
expects three columns to be present: `postcode`, `lat` and `lon`. All other
columns are ignored. `lon` and `lat` must describe the x and y coordinates of
the postcode centroids in WGS84.
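For illustration, a minimal `de_postcodes.csv` could look like this (country
code and values are invented for the example):

```
postcode,lat,lon
01067,51.0605,13.7067
01069,51.0393,13.7342
```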

The postcode files are loaded only when there is data for the given country
in your database. For example, if there is a `us_postcodes.csv` file in your
project directory but you import only an excerpt of Italy, then the US postcodes
will simply be ignored.

As a rule, the external postcode data should be put into the project directory
**before** starting the initial import. Still, you can add, remove and update the
external postcode data at any time. Simply run:

```
nominatim refresh --postcodes
```

to make the changes visible in your database. Be aware, however, that the changes
only have an immediate effect on searches for postcodes. Postcodes that were
added to places are only updated when they are reindexed, which usually happens
only during replication updates.
## Installing Tiger housenumber data for the US

Nominatim is able to use the official [TIGER](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html)
address set to complement the OSM house number data in the US. You can add
TIGER data to your own Nominatim instance by following these steps. The
entire US adds about 10GB to your database.

1. Get preprocessed TIGER 2020 data:

        cd $PROJECT_DIR
        wget https://nominatim.org/data/tiger2020-nominatim-preprocessed.csv.tar.gz

2. Import the data into your Nominatim database:

        nominatim add-data --tiger-data tiger2020-nominatim-preprocessed.csv.tar.gz

3. Enable use of the Tiger data in your `.env` by adding:

        echo NOMINATIM_USE_US_TIGER_DATA=yes >> .env

4. Apply the new settings:

        nominatim refresh --functions

See the [developer's guide](../develop/data-sources.md#us-census-tiger) for more
information on how the data got preprocessed.

View File

@@ -83,15 +83,19 @@ The file is about 400MB and adds around 4GB to the Nominatim database.
 `nominatim refresh --wiki-data --importance`. Updating importances for
 a planet can take a couple of hours.

-### Great Britain, USA postcodes
+### External postcodes

-Nominatim can use postcodes from an external source to improve searches that
-involve a GB or US postcode. This data can be optionally downloaded into the
-project directory:
+Nominatim can use postcodes from an external source to improve searching with
+postcodes. We provide precomputed postcode sets for the US (using TIGER data)
+and the UK (using the [CodePoint OpenData set](https://osdatahub.os.uk/downloads/open/CodePointOpen)).
+This data can be optionally downloaded into the project directory:

     cd $PROJECT_DIR
-    wget https://www.nominatim.org/data/gb_postcode_data.sql.gz
-    wget https://www.nominatim.org/data/us_postcode_data.sql.gz
+    wget https://www.nominatim.org/data/gb_postcodes.csv.gz
+    wget https://www.nominatim.org/data/us_postcodes.csv.gz

+You can also add your own custom postcode sources, see
+[Customization of postcodes](Customization.md#external-postcode-data).

 ## Choosing the data to import
@@ -248,6 +252,9 @@ to verify that your installation is working. Go to
 `http://localhost:8088/status.php` and you should see the message `OK`.
 You can also run a search query, e.g. `http://localhost:8088/search.php?q=Berlin`.

+Note that search queries are not supported for reverse-only imports. You can run a
+reverse query instead, e.g. `http://localhost:8088/reverse.php?lat=27.1750090510034&lon=78.04209025`.

 To run Nominatim via webservers like Apache or nginx, please read the
 [Deployment chapter](Deployment.md).

View File

@@ -19,6 +19,7 @@ pages:
 - 'Import' : 'admin/Import.md'
 - 'Update' : 'admin/Update.md'
 - 'Deploy' : 'admin/Deployment.md'
+- 'Customize Imports' : 'admin/Customization.md'
 - 'Nominatim UI' : 'admin/Setup-Nominatim-UI.md'
 - 'Advanced Installations' : 'admin/Advanced-Installations.md'
 - 'Migration from older Versions' : 'admin/Migration.md'

View File

@@ -80,7 +80,6 @@ class AddressDetails
            }

            if (isset($sName)) {
-               $sTypeLabel = strtolower(str_replace(' ', '_', $sTypeLabel));
                if (!isset($aAddress[$sTypeLabel])
                    || $aLine['class'] == 'place'
                ) {

View File

@@ -227,3 +227,10 @@ function closestHouseNumber($aRow)
    return max(min($aRow['endnumber'], $iHn), $aRow['startnumber']);
}

+if (!function_exists('array_key_last')) {
+    function array_key_last(array $array)
+    {
+        if (!empty($array)) return key(array_slice($array, -1, 1, true));
+    }
+}

View File

@@ -0,0 +1,12 @@
<?php

require_once(CONST_LibDir.'/init-website.php');
require_once(CONST_LibDir.'/ParameterParser.php');

$oParams = new Nominatim\ParameterParser();

// Format for output
$sOutputFormat = $oParams->getSet('format', array('xml', 'json', 'jsonv2', 'geojson', 'geocodejson'), 'jsonv2');
set_exception_handler_by_format($sOutputFormat);

throw new Exception('Reverse-only import does not support forward searching.', 404);

View File

@@ -12,4 +12,6 @@ ALTER TABLE location_property_tiger_import RENAME TO location_property_tiger;
 ALTER INDEX IF EXISTS idx_location_property_tiger_parent_place_id_imp RENAME TO idx_location_property_tiger_housenumber_parent_place_id;
 ALTER INDEX IF EXISTS idx_location_property_tiger_place_id_imp RENAME TO idx_location_property_tiger_place_id;

-DROP FUNCTION tiger_line_import (linegeo geometry, in_startnumber integer, in_endnumber integer, interpolationtype text, in_street text, in_isin text, in_postcode text);
+DROP FUNCTION tiger_line_import (linegeo GEOMETRY, in_startnumber INTEGER,
+                                 in_endnumber INTEGER, interpolationtype TEXT,
+                                 token_info JSONB, in_postcode TEXT);

View File

@@ -1,9 +1,9 @@
 DROP TABLE IF EXISTS location_property_tiger_import;
 CREATE TABLE location_property_tiger_import (linegeo GEOMETRY, place_id BIGINT, partition INTEGER, parent_place_id BIGINT, startnumber INTEGER, endnumber INTEGER, interpolationtype TEXT, postcode TEXT);

 CREATE OR REPLACE FUNCTION tiger_line_import(linegeo GEOMETRY, in_startnumber INTEGER,
                                              in_endnumber INTEGER, interpolationtype TEXT,
-                                             in_street TEXT, in_isin TEXT, in_postcode TEXT) RETURNS INTEGER
+                                             token_info JSONB, in_postcode TEXT) RETURNS INTEGER
   AS $$
 DECLARE
   startnumber INTEGER;

@@ -27,13 +27,13 @@ BEGIN
   END IF;

   IF startnumber < 0 THEN
-    RAISE WARNING 'Negative house number range (% to %) on %, %', startnumber, endnumber, in_street, in_isin;
+    RAISE WARNING 'Negative house number range (% to %)', startnumber, endnumber;
     RETURN 0;
   END IF;

   numberrange := endnumber - startnumber;

-  IF (interpolationtype = 'odd' AND startnumber%2 = 0) OR (interpolationtype = 'even' AND startnumber%2 = 1) THEN
+  IF (interpolationtype = 'odd' AND startnumber % 2 = 0) OR (interpolationtype = 'even' AND startnumber % 2 = 1) THEN
     startnumber := startnumber + 1;
     stepsize := 2;
   ELSE

@@ -45,10 +45,10 @@ BEGIN
   END IF;

   -- Filter out really broken tiger data
   IF numberrange > 0 AND (numberrange::float/stepsize::float > 500)
      AND ST_length(linegeo)/(numberrange::float/stepsize::float) < 0.000001 THEN
-    RAISE WARNING 'Road too short for number range % to % on %, % (%)',startnumber,endnumber,in_street,in_isin,
+    RAISE WARNING 'Road too short for number range % to % (%)',startnumber,endnumber,
                   ST_length(linegeo)/(numberrange::float/stepsize::float);
     RETURN 0;
   END IF;

@@ -56,7 +56,7 @@ BEGIN
   out_partition := get_partition('us');
   out_parent_place_id := null;

-  address_street_word_ids := word_ids_from_name(in_street);
+  address_street_word_ids := token_addr_street_match_tokens(token_info);
   IF address_street_word_ids IS NOT NULL THEN
     out_parent_place_id := getNearestNamedRoadPlaceId(out_partition, place_centroid,
                                                       address_street_word_ids);

View File

@@ -1,58 +0,0 @@
-- Create a temporary table with postcodes from placex.
CREATE TEMP TABLE tmp_new_postcode_locations AS
SELECT country_code,
upper(trim (both ' ' from address->'postcode')) as pc,
ST_Centroid(ST_Collect(ST_Centroid(geometry))) as centroid
FROM placex
WHERE address ? 'postcode'
AND address->'postcode' NOT SIMILAR TO '%(,|;|:)%'
AND geometry IS NOT null
GROUP BY country_code, pc;
CREATE INDEX idx_tmp_new_postcode_locations
ON tmp_new_postcode_locations (pc, country_code);
-- add extra US postcodes
INSERT INTO tmp_new_postcode_locations (country_code, pc, centroid)
SELECT 'us', postcode, ST_SetSRID(ST_Point(x,y),4326)
FROM us_postcode u
WHERE NOT EXISTS (SELECT 0 FROM tmp_new_postcode_locations new
WHERE new.country_code = 'us' AND new.pc = u.postcode);
-- add extra UK postcodes
INSERT INTO tmp_new_postcode_locations (country_code, pc, centroid)
SELECT 'gb', postcode, geometry FROM gb_postcode g
WHERE NOT EXISTS (SELECT 0 FROM tmp_new_postcode_locations new
WHERE new.country_code = 'gb' and new.pc = g.postcode);
-- Remove all postcodes that are no longer valid
DELETE FROM location_postcode old
WHERE NOT EXISTS(SELECT 0 FROM tmp_new_postcode_locations new
WHERE old.postcode = new.pc
AND old.country_code = new.country_code);
-- Update geometries where necessary
UPDATE location_postcode old SET geometry = new.centroid, indexed_status = 1
FROM tmp_new_postcode_locations new
WHERE old.postcode = new.pc AND old.country_code = new.country_code
AND ST_AsText(old.geometry) != ST_AsText(new.centroid);
-- Remove all postcodes that already exist from the temporary table
DELETE FROM tmp_new_postcode_locations new
WHERE EXISTS(SELECT 0 FROM location_postcode old
WHERE old.postcode = new.pc AND old.country_code = new.country_code);
-- Add newly added postcode
INSERT INTO location_postcode
(place_id, indexed_status, country_code, postcode, geometry)
SELECT nextval('seq_place'), 1, country_code, pc, centroid
FROM tmp_new_postcode_locations new;
-- Remove unused word entries
DELETE FROM word
WHERE class = 'place' AND type = 'postcode'
AND NOT EXISTS (SELECT 0 FROM location_postcode p
WHERE p.postcode = word.word);
-- Finally index the newly inserted postcodes
UPDATE location_postcode SET indexed_status = 0 WHERE indexed_status > 0;

View File

@@ -13,7 +13,6 @@ from nominatim.tools.exec_utils import run_legacy_script, run_php_server
 from nominatim.errors import UsageError
 from nominatim import clicmd
 from nominatim.clicmd.args import NominatimArgs
-from nominatim.tools import tiger_data

 LOG = logging.getLogger()
@@ -147,9 +146,14 @@ class UpdateAddData:
     @staticmethod
     def run(args):
+        from nominatim.tokenizer import factory as tokenizer_factory
+        from nominatim.tools import tiger_data

         if args.tiger_data:
+            tokenizer = tokenizer_factory.get_tokenizer_for_db(args.config)
             return tiger_data.add_tiger_data(args.tiger_data,
-                                             args.config, args.threads or 1)
+                                             args.config, args.threads or 1,
+                                             tokenizer)

         params = ['update.php']
         if args.file:

View File

@@ -45,12 +45,19 @@ class UpdateRefresh:
     @staticmethod
     def run(args):
-        from ..tools import refresh
+        from ..tools import refresh, postcodes
         from ..tokenizer import factory as tokenizer_factory
+        from ..indexer.indexer import Indexer

+        tokenizer = tokenizer_factory.get_tokenizer_for_db(args.config)

         if args.postcodes:
             LOG.warning("Update postcodes centroid")
-            refresh.update_postcodes(args.config.get_libpq_dsn(), args.sqllib_dir)
+            postcodes.update_postcodes(args.config.get_libpq_dsn(),
+                                       args.project_dir, tokenizer)
+            indexer = Indexer(args.config.get_libpq_dsn(), tokenizer,
+                              args.threads or 1)
+            indexer.index_postcodes()

         if args.word_counts:
             LOG.warning('Recompute frequency of full-word search terms')
@@ -67,7 +74,6 @@ class UpdateRefresh:
             with connect(args.config.get_libpq_dsn()) as conn:
                 refresh.create_functions(conn, args.config,
                                          args.diffs, args.enable_debug_statements)
-            tokenizer = tokenizer_factory.get_tokenizer_for_db(args.config)
             tokenizer.update_sql_functions(args.config)

         if args.wiki_data:
@@ -88,6 +94,6 @@ class UpdateRefresh:
         if args.website:
             webdir = args.project_dir / 'website'
             LOG.warning('Setting up website directory at %s', webdir)
-            refresh.setup_website(webdir, args.config)
+            with connect(args.config.get_libpq_dsn()) as conn:
+                refresh.setup_website(webdir, args.config, conn)

         return 0

View File

@@ -116,8 +116,8 @@ class SetupAll:
         if args.continue_at is None or args.continue_at == 'load-data':
             LOG.warning('Calculate postcodes')
-            postcodes.import_postcodes(args.config.get_libpq_dsn(), args.project_dir,
-                                       tokenizer)
+            postcodes.update_postcodes(args.config.get_libpq_dsn(),
+                                       args.project_dir, tokenizer)

         if args.continue_at is None or args.continue_at in ('load-data', 'indexing'):
             if args.continue_at is not None and args.continue_at != 'load-data':

@@ -139,7 +139,8 @@ class SetupAll:
             webdir = args.project_dir / 'website'
             LOG.warning('Setup website at %s', webdir)
-            refresh.setup_website(webdir, args.config)
+            with connect(args.config.get_libpq_dsn()) as conn:
+                refresh.setup_website(webdir, args.config, conn)

             with connect(args.config.get_libpq_dsn()) as conn:
                 try:

View File

@@ -6,6 +6,9 @@
""" Database helper functions for the indexer. """ Database helper functions for the indexer.
""" """
import logging import logging
import select
import time
import psycopg2 import psycopg2
from psycopg2.extras import wait_select from psycopg2.extras import wait_select
@@ -25,8 +28,9 @@ class DeadlockHandler:
normally. normally.
""" """
def __init__(self, handler): def __init__(self, handler, ignore_sql_errors=False):
self.handler = handler self.handler = handler
self.ignore_sql_errors = ignore_sql_errors
def __enter__(self): def __enter__(self):
pass pass
@@ -41,6 +45,11 @@ class DeadlockHandler:
             if exc_value.pgcode == '40P01':
                 self.handler()
                 return True
+        if self.ignore_sql_errors and isinstance(exc_value, psycopg2.Error):
+            LOG.info("SQL error ignored: %s", exc_value)
+            return True

         return False
@@ -48,10 +57,11 @@ class DBConnection:
""" A single non-blocking database connection. """ A single non-blocking database connection.
""" """
def __init__(self, dsn, cursor_factory=None): def __init__(self, dsn, cursor_factory=None, ignore_sql_errors=False):
self.current_query = None self.current_query = None
self.current_params = None self.current_params = None
self.dsn = dsn self.dsn = dsn
self.ignore_sql_errors = ignore_sql_errors
self.conn = None self.conn = None
self.cursor = None self.cursor = None
@@ -98,7 +108,7 @@ class DBConnection:
""" Block until any pending operation is done. """ Block until any pending operation is done.
""" """
while True: while True:
with DeadlockHandler(self._deadlock_handler): with DeadlockHandler(self._deadlock_handler, self.ignore_sql_errors):
wait_select(self.conn) wait_select(self.conn)
self.current_query = None self.current_query = None
return return
@@ -125,9 +135,78 @@ class DBConnection:
         if self.current_query is None:
             return True

-        with DeadlockHandler(self._deadlock_handler):
+        with DeadlockHandler(self._deadlock_handler, self.ignore_sql_errors):
             if self.conn.poll() == psycopg2.extensions.POLL_OK:
                 self.current_query = None
                 return True

         return False
class WorkerPool:
    """ A pool of asynchronous database connections.

        The pool may be used as a context manager.
    """
    REOPEN_CONNECTIONS_AFTER = 100000

    def __init__(self, dsn, pool_size, ignore_sql_errors=False):
        self.threads = [DBConnection(dsn, ignore_sql_errors=ignore_sql_errors)
                        for _ in range(pool_size)]
        self.free_workers = self._yield_free_worker()
        self.wait_time = 0

    def finish_all(self):
        """ Wait for all connection to finish.
        """
        for thread in self.threads:
            while not thread.is_done():
                thread.wait()

        self.free_workers = self._yield_free_worker()

    def close(self):
        """ Close all connections and clear the pool.
        """
        for thread in self.threads:
            thread.close()
        self.threads = []
        self.free_workers = None

    def next_free_worker(self):
        """ Get the next free connection.
        """
        return next(self.free_workers)

    def _yield_free_worker(self):
        ready = self.threads
        command_stat = 0
        while True:
            for thread in ready:
                if thread.is_done():
                    command_stat += 1
                    yield thread

            if command_stat > self.REOPEN_CONNECTIONS_AFTER:
                for thread in self.threads:
                    while not thread.is_done():
                        thread.wait()
                    thread.connect()
                ready = self.threads
                command_stat = 0
            else:
                tstart = time.time()
                _, ready, _ = select.select([], self.threads, [])
                self.wait_time += time.time() - tstart

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.finish_all()
        self.close()
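The pool generalizes what the indexer previously did internally. A minimal
usage sketch (the DSN is assumed for the example):

```
from nominatim.db.async_connection import WorkerPool

dsn = 'dbname=nominatim'  # assumed database

# On exit, the context manager waits for all workers and closes them.
with WorkerPool(dsn, 4, ignore_sql_errors=True) as pool:
    for num in range(10):
        # next_free_worker() hands out an idle connection, blocking in
        # select() until one becomes free.
        pool.next_free_worker().perform('SELECT %s', args=(num, ))
```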

View File

@@ -2,14 +2,13 @@
 Main work horse for indexing (computing addresses) the database.
 """
 import logging
-import select
 import time

 import psycopg2.extras

 from nominatim.indexer.progress import ProgressLogger
 from nominatim.indexer import runners
-from nominatim.db.async_connection import DBConnection
+from nominatim.db.async_connection import DBConnection, WorkerPool
 from nominatim.db.connection import connect

 LOG = logging.getLogger()
@@ -81,73 +80,6 @@ class PlaceFetcher:
             self.conn.wait()
         self.close()

-class WorkerPool:
-    """ A pool of asynchronous database connections.
-
-        The pool may be used as a context manager.
-    """
-    REOPEN_CONNECTIONS_AFTER = 100000
-
-    def __init__(self, dsn, pool_size):
-        self.threads = [DBConnection(dsn) for _ in range(pool_size)]
-        self.free_workers = self._yield_free_worker()
-        self.wait_time = 0
-
-    def finish_all(self):
-        """ Wait for all connection to finish.
-        """
-        for thread in self.threads:
-            while not thread.is_done():
-                thread.wait()
-
-        self.free_workers = self._yield_free_worker()
-
-    def close(self):
-        """ Close all connections and clear the pool.
-        """
-        for thread in self.threads:
-            thread.close()
-        self.threads = []
-        self.free_workers = None
-
-    def next_free_worker(self):
-        """ Get the next free connection.
-        """
-        return next(self.free_workers)
-
-    def _yield_free_worker(self):
-        ready = self.threads
-        command_stat = 0
-        while True:
-            for thread in ready:
-                if thread.is_done():
-                    command_stat += 1
-                    yield thread
-
-            if command_stat > self.REOPEN_CONNECTIONS_AFTER:
-                for thread in self.threads:
-                    while not thread.is_done():
-                        thread.wait()
-                    thread.connect()
-                ready = self.threads
-                command_stat = 0
-            else:
-                tstart = time.time()
-                _, ready, _ = select.select([], self.threads, [])
-                self.wait_time += time.time() - tstart
-
-    def __enter__(self):
-        return self
-
-    def __exit__(self, exc_type, exc_value, traceback):
-        self.finish_all()
-        self.close()

 class Indexer:
     """ Main indexing routine.

View File

@@ -263,6 +263,16 @@ class LegacyICUNameAnalyzer:
""" """
return self.normalizer.transliterate(phrase) return self.normalizer.transliterate(phrase)
@staticmethod
def normalize_postcode(postcode):
""" Convert the postcode to a standardized form.
This function must yield exactly the same result as the SQL function
'token_normalized_postcode()'.
"""
return postcode.strip().upper()
@functools.lru_cache(maxsize=1024) @functools.lru_cache(maxsize=1024)
def make_standard_word(self, name): def make_standard_word(self, name):
""" Create the normalised version of the input. """ Create the normalised version of the input.
@@ -285,25 +295,44 @@ class LegacyICUNameAnalyzer:
         return self.transliterator.transliterate(hnr)

-    def add_postcodes_from_db(self):
-        """ Add postcodes from the location_postcode table to the word table.
-        """
-        copystr = io.StringIO()
-        with self.conn.cursor() as cur:
-            cur.execute("SELECT distinct(postcode) FROM location_postcode")
-            for (postcode, ) in cur:
-                copystr.write(postcode)
-                copystr.write('\t ')
-                copystr.write(self.transliterator.transliterate(postcode))
-                copystr.write('\tplace\tpostcode\t0\n')
-
-            copystr.seek(0)
-            cur.copy_from(copystr, 'word',
-                          columns=['word', 'word_token', 'class', 'type',
-                                   'search_name_count'])
-            # Don't really need an ID for postcodes....
-            # cur.execute("""UPDATE word SET word_id = nextval('seq_word')
-            #                WHERE word_id is null and type = 'postcode'""")
+    def update_postcodes_from_db(self):
+        """ Update postcode tokens in the word table from the location_postcode
+            table.
+        """
+        to_delete = []
+        copystr = io.StringIO()
+        with self.conn.cursor() as cur:
+            # This finds us the rows in location_postcode and word that are
+            # missing in the other table.
+            cur.execute("""SELECT * FROM
+                            (SELECT pc, word FROM
+                              (SELECT distinct(postcode) as pc FROM location_postcode) p
+                              FULL JOIN
+                              (SELECT word FROM word
+                                WHERE class ='place' and type = 'postcode') w
+                              ON pc = word) x
+                           WHERE pc is null or word is null""")
+
+            for postcode, word in cur:
+                if postcode is None:
+                    to_delete.append(word)
+                else:
+                    copystr.write(postcode)
+                    copystr.write('\t ')
+                    copystr.write(self.transliterator.transliterate(postcode))
+                    copystr.write('\tplace\tpostcode\t0\n')
+
+            if to_delete:
+                cur.execute("""DELETE FROM WORD
+                               WHERE class ='place' and type = 'postcode'
+                                     and word = any(%s)
+                            """, (to_delete, ))
+
+            if copystr.getvalue():
+                copystr.seek(0)
+                cur.copy_from(copystr, 'word',
+                              columns=['word', 'word_token', 'class', 'type',
+                                       'search_name_count'])

     def update_special_phrases(self, phrases, should_replace):
@@ -435,22 +464,25 @@ class LegacyICUNameAnalyzer:
     def _add_postcode(self, postcode):
         """ Make sure the normalized postcode is present in the word table.
         """
-        if re.search(r'[:,;]', postcode) is None and not postcode in self._cache.postcodes:
-            term = self.make_standard_word(postcode)
-            if not term:
-                return
-
-            with self.conn.cursor() as cur:
-                # no word_id needed for postcodes
-                cur.execute("""INSERT INTO word (word, word_token, class, type,
-                                                 search_name_count)
-                               (SELECT pc, %s, 'place', 'postcode', 0
-                                FROM (VALUES (%s)) as v(pc)
-                                WHERE NOT EXISTS
-                                 (SELECT * FROM word
-                                  WHERE word = pc and class='place' and type='postcode'))
-                            """, (' ' + term, postcode))
-            self._cache.postcodes.add(postcode)
+        if re.search(r'[:,;]', postcode) is None:
+            postcode = self.normalize_postcode(postcode)
+
+            if postcode not in self._cache.postcodes:
+                term = self.make_standard_word(postcode)
+                if not term:
+                    return
+
+                with self.conn.cursor() as cur:
+                    # no word_id needed for postcodes
+                    cur.execute("""INSERT INTO word (word, word_token, class, type,
+                                                     search_name_count)
+                                   (SELECT pc, %s, 'place', 'postcode', 0
+                                    FROM (VALUES (%s)) as v(pc)
+                                    WHERE NOT EXISTS
+                                     (SELECT * FROM word
+                                      WHERE word = pc and class='place' and type='postcode'))
+                                """, (' ' + term, postcode))
+                self._cache.postcodes.add(postcode)

     @staticmethod
     def _split_housenumbers(hnrs):

View File

@@ -305,13 +305,51 @@ class LegacyNameAnalyzer:
         return self.normalizer.transliterate(phrase)

+    @staticmethod
+    def normalize_postcode(postcode):
+        """ Convert the postcode to a standardized form.
+
+            This function must yield exactly the same result as the SQL function
+            'token_normalized_postcode()'.
+        """
+        return postcode.strip().upper()

-    def add_postcodes_from_db(self):
-        """ Add postcodes from the location_postcode table to the word table.
-        """
-        with self.conn.cursor() as cur:
-            cur.execute("""SELECT count(create_postcode_id(pc))
-                           FROM (SELECT distinct(postcode) as pc
-                                 FROM location_postcode) x""")
+    def update_postcodes_from_db(self):
+        """ Update postcode tokens in the word table from the location_postcode
+            table.
+        """
+        with self.conn.cursor() as cur:
+            # This finds us the rows in location_postcode and word that are
+            # missing in the other table.
+            cur.execute("""SELECT * FROM
+                            (SELECT pc, word FROM
+                              (SELECT distinct(postcode) as pc FROM location_postcode) p
+                              FULL JOIN
+                              (SELECT word FROM word
+                                WHERE class ='place' and type = 'postcode') w
+                              ON pc = word) x
+                           WHERE pc is null or word is null""")
+
+            to_delete = []
+            to_add = []
+
+            for postcode, word in cur:
+                if postcode is None:
+                    to_delete.append(word)
+                else:
+                    to_add.append(postcode)
+
+            if to_delete:
+                cur.execute("""DELETE FROM WORD
+                               WHERE class ='place' and type = 'postcode'
+                                     and word = any(%s)
+                            """, (to_delete, ))
+            if to_add:
+                cur.execute("""SELECT count(create_postcode_id(pc))
+                               FROM unnest(%s) as pc
+                            """, (to_add, ))

     def update_special_phrases(self, phrases, should_replace):
@@ -416,12 +454,8 @@ class LegacyNameAnalyzer:
     def _add_postcode(self, postcode):
         """ Make sure the normalized postcode is present in the word table.
         """
-        def _create_postcode_from_db(pcode):
-            with self.conn.cursor() as cur:
-                cur.execute('SELECT create_postcode_id(%s)', (pcode, ))
-
         if re.search(r'[:,;]', postcode) is None:
-            self._cache.postcodes.get(postcode.strip().upper(), _create_postcode_from_db)
+            self._cache.add_postcode(self.conn, self.normalize_postcode(postcode))

 class _TokenInfo:
@@ -552,16 +586,19 @@ class _TokenCache:
FROM generate_series(1, 100) as i""") FROM generate_series(1, 100) as i""")
self._cached_housenumbers = {str(r[0]) : r[1] for r in cur} self._cached_housenumbers = {str(r[0]) : r[1] for r in cur}
# Get postcodes that are already saved # For postcodes remember the ones that have already been added
postcodes = OrderedDict() self.postcodes = set()
with conn.cursor() as cur:
cur.execute("""SELECT word FROM word
WHERE class ='place' and type = 'postcode'""")
for row in cur:
postcodes[row[0]] = None
self.postcodes = _LRU(maxsize=32, init_data=postcodes)
def get_housenumber(self, number): def get_housenumber(self, number):
""" Get a housenumber token from the cache. """ Get a housenumber token from the cache.
""" """
return self._cached_housenumbers.get(number) return self._cached_housenumbers.get(number)
def add_postcode(self, conn, postcode):
""" Make sure the given postcode is in the database.
"""
if postcode not in self.postcodes:
with conn.cursor() as cur:
cur.execute('SELECT create_postcode_id(%s)', (postcode, ))
self.postcodes.add(postcode)

View File

@@ -185,8 +185,8 @@ def install_legacy_tokenizer(conn, config, **_):
                            WHERE table_name = %s
                            and column_name = 'token_info'""",
                         (table, ))
             if has_column == 0:
                 cur.execute('ALTER TABLE {} ADD COLUMN token_info JSONB'.format(table))

     tokenizer = tokenizer_factory.create_tokenizer(config, init_db=False,
                                                    module_name='legacy')

View File

@@ -2,80 +2,196 @@
 Functions for importing, updating and otherwise maintaining the table
 of artificial postcode centroids.
 """
+import csv
+import gzip
+import logging
+from math import isfinite
+
+from psycopg2.extras import execute_values
+
-from nominatim.db.utils import execute_file
 from nominatim.db.connection import connect

+LOG = logging.getLogger()

-def import_postcodes(dsn, project_dir, tokenizer):
-    """ Set up the initial list of postcodes.
-    """
-
-    with connect(dsn) as conn:
-        conn.drop_table('gb_postcode')
-        conn.drop_table('us_postcode')
-
-        with conn.cursor() as cur:
-            cur.execute("""CREATE TABLE gb_postcode (
-                            id integer,
-                            postcode character varying(9),
-                            geometry GEOMETRY(Point, 4326))""")
-
-        with conn.cursor() as cur:
-            cur.execute("""CREATE TABLE us_postcode (
-                            postcode text,
-                            x double precision,
-                            y double precision)""")
-        conn.commit()
-
-        gb_postcodes = project_dir / 'gb_postcode_data.sql.gz'
-        if gb_postcodes.is_file():
-            execute_file(dsn, gb_postcodes)
-
-        us_postcodes = project_dir / 'us_postcode_data.sql.gz'
-        if us_postcodes.is_file():
-            execute_file(dsn, us_postcodes)
-
-        with conn.cursor() as cur:
-            cur.execute("TRUNCATE location_postcode")
-            cur.execute("""
-                INSERT INTO location_postcode
-                 (place_id, indexed_status, country_code, postcode, geometry)
-                SELECT nextval('seq_place'), 1, country_code,
-                       token_normalized_postcode(address->'postcode') as pc,
-                       ST_Centroid(ST_Collect(ST_Centroid(geometry)))
-                  FROM placex
-                 WHERE address ? 'postcode'
-                       and token_normalized_postcode(address->'postcode') is not null
-                       AND geometry IS NOT null
-                 GROUP BY country_code, pc
-            """)
-            cur.execute("""
-                INSERT INTO location_postcode
-                 (place_id, indexed_status, country_code, postcode, geometry)
-                SELECT nextval('seq_place'), 1, 'us',
-                       token_normalized_postcode(postcode),
-                       ST_SetSRID(ST_Point(x,y),4326)
-                  FROM us_postcode WHERE token_normalized_postcode(postcode) NOT IN
-                        (SELECT postcode FROM location_postcode
-                          WHERE country_code = 'us')
-            """)
-            cur.execute("""
-                INSERT INTO location_postcode
-                 (place_id, indexed_status, country_code, postcode, geometry)
-                SELECT nextval('seq_place'), 1, 'gb',
-                       token_normalized_postcode(postcode), geometry
-                  FROM gb_postcode WHERE token_normalized_postcode(postcode) NOT IN
-                        (SELECT postcode FROM location_postcode
-                          WHERE country_code = 'gb')
-            """)
-            cur.execute("""
-                DELETE FROM word WHERE class='place' and type='postcode'
-                    and word NOT IN (SELECT postcode FROM location_postcode)
-            """)
-        conn.commit()
-
-    with tokenizer.name_analyzer() as analyzer:
-        analyzer.add_postcodes_from_db()
+def _to_float(num, max_value):
+    """ Convert the number in string into a float. The number is expected
+        to be in the range of [-max_value, max_value]. Otherwise rises a
+        ValueError.
+    """
+    num = float(num)
+    if not isfinite(num) or num <= -max_value or num >= max_value:
+        raise ValueError()
+
+    return num
+
+class _CountryPostcodesCollector:
+    """ Collector for postcodes of a single country.
+    """
+
+    def __init__(self, country):
+        self.country = country
+        self.collected = dict()
+
+    def add(self, postcode, x, y):
+        """ Add the given postcode to the collection cache. If the postcode
+            already existed, it is overwritten with the new centroid.
+        """
+        self.collected[postcode] = (x, y)
+
+    def commit(self, conn, analyzer, project_dir):
+        """ Update postcodes for the country from the postcodes selected so far
+            as well as any externally supplied postcodes.
+        """
+        self._update_from_external(analyzer, project_dir)
+        to_add, to_delete, to_update = self._compute_changes(conn)
+
+        LOG.info("Processing country '%s' (%s added, %s deleted, %s updated).",
+                 self.country, len(to_add), len(to_delete), len(to_update))
+
+        with conn.cursor() as cur:
+            if to_add:
+                execute_values(cur,
+                               """INSERT INTO location_postcode
+                                      (place_id, indexed_status, country_code,
+                                       postcode, geometry) VALUES %s""",
+                               to_add,
+                               template="""(nextval('seq_place'), 1, '{}',
+                                           %s, 'SRID=4326;POINT(%s %s)')
+                                        """.format(self.country))
+            if to_delete:
+                cur.execute("""DELETE FROM location_postcode
+                               WHERE country_code = %s and postcode = any(%s)
+                            """, (self.country, to_delete))
+            if to_update:
+                execute_values(cur,
+                               """UPDATE location_postcode
+                                  SET indexed_status = 2,
+                                      geometry = ST_SetSRID(ST_Point(v.x, v.y), 4326)
+                                  FROM (VALUES %s) AS v (pc, x, y)
+                                  WHERE country_code = '{}' and postcode = pc
+                               """.format(self.country),
+                               to_update)
+
+    def _compute_changes(self, conn):
+        """ Compute which postcodes from the collected postcodes have to be
+            added or modified and which from the location_postcode table
+            have to be deleted.
+        """
+        to_update = []
+        to_delete = []
+        with conn.cursor() as cur:
+            cur.execute("""SELECT postcode, ST_X(geometry), ST_Y(geometry)
+                           FROM location_postcode
+                           WHERE country_code = %s""",
+                        (self.country, ))
+            for postcode, x, y in cur:
+                newx, newy = self.collected.pop(postcode, (None, None))
+                if newx is not None:
+                    dist = (x - newx)**2 + (y - newy)**2
+                    if dist > 0.0000001:
+                        to_update.append((postcode, newx, newy))
+                else:
+                    to_delete.append(postcode)
+
+        to_add = [(k, v[0], v[1]) for k, v in self.collected.items()]
+        self.collected = []
+
+        return to_add, to_delete, to_update
+
+    def _update_from_external(self, analyzer, project_dir):
+        """ Look for an external postcode file for the active country in
+            the project directory and add missing postcodes when found.
+        """
+        csvfile = self._open_external(project_dir)
+        if csvfile is None:
+            return
+
+        try:
+            reader = csv.DictReader(csvfile)
+            for row in reader:
+                if 'postcode' not in row or 'lat' not in row or 'lon' not in row:
+                    LOG.warning("Bad format for external postcode file for country '%s'."
+                                " Ignored.", self.country)
+                    return
+                postcode = analyzer.normalize_postcode(row['postcode'])
+                if postcode not in self.collected:
+                    try:
+                        self.collected[postcode] = (_to_float(row['lon'], 180),
+                                                    _to_float(row['lat'], 90))
+                    except ValueError:
+                        LOG.warning("Bad coordinates %s, %s in %s country postcode file.",
+                                    row['lat'], row['lon'], self.country)
+        finally:
+            csvfile.close()
+
+    def _open_external(self, project_dir):
+        fname = project_dir / '{}_postcodes.csv'.format(self.country)
+
+        if fname.is_file():
+            LOG.info("Using external postcode file '%s'.", fname)
+            return open(fname, 'r')
+
+        fname = project_dir / '{}_postcodes.csv.gz'.format(self.country)
+
+        if fname.is_file():
+            LOG.info("Using external postcode file '%s'.", fname)
+            return gzip.open(fname, 'rt')
+
+        return None
+
+def update_postcodes(dsn, project_dir, tokenizer):
+    """ Update the table of artificial postcodes.
+
+        Computes artificial postcode centroids from the placex table,
+        potentially enhances it with external data and then updates the
+        postcodes in the table 'location_postcode'.
+    """
+    with tokenizer.name_analyzer() as analyzer:
+        with connect(dsn) as conn:
+            # First get the list of countries that currently have postcodes.
+            # (Doing this before starting to insert, so it is fast on import.)
+            with conn.cursor() as cur:
+                cur.execute("SELECT DISTINCT country_code FROM location_postcode")
+                todo_countries = set((row[0] for row in cur))
+
+            # Recompute the list of valid postcodes from placex.
+            with conn.cursor(name="placex_postcodes") as cur:
+                cur.execute("""SELECT country_code, pc, ST_X(centroid), ST_Y(centroid)
+                               FROM (
+                                 SELECT country_code,
+                                        token_normalized_postcode(address->'postcode') as pc,
+                                        ST_Centroid(ST_Collect(ST_Centroid(geometry))) as centroid
+                                 FROM placex
+                                 WHERE address ? 'postcode' and geometry IS NOT null
+                                       and country_code is not null
+                                 GROUP BY country_code, pc) xx
+                               WHERE pc is not null
+                               ORDER BY country_code, pc""")
+
+                collector = None
+
+                for country, postcode, x, y in cur:
+                    if collector is None or country != collector.country:
+                        if collector is not None:
+                            collector.commit(conn, analyzer, project_dir)
+                        collector = _CountryPostcodesCollector(country)
+                        todo_countries.discard(country)
+                    collector.add(postcode, x, y)
+
+                if collector is not None:
+                    collector.commit(conn, analyzer, project_dir)
+
+            # Now handle any countries that are only in the postcode table.
+            for country in todo_countries:
+                _CountryPostcodesCollector(country).commit(conn, analyzer, project_dir)
+
+            conn.commit()
+
+        analyzer.update_postcodes_from_db()
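The delta computation is the heart of the new update logic. A standalone
re-implementation sketch with made-up data (not the module's API, just the
same algorithm as `_compute_changes()` above):

```
def compute_changes(collected, existing):
    """ Split collected {pc: (x, y)} against existing [(pc, x, y)] rows. """
    to_update, to_delete = [], []
    for pc, x, y in existing:
        newx, newy = collected.pop(pc, (None, None))
        if newx is None:
            to_delete.append(pc)                      # postcode vanished
        elif (x - newx)**2 + (y - newy)**2 > 0.0000001:
            to_update.append((pc, newx, newy))        # centroid moved
    to_add = [(pc, x, y) for pc, (x, y) in collected.items()]
    return to_add, to_delete, to_update

assert compute_changes({'12345': (8.0, 50.0)}, [('99999', 1.0, 1.0)]) == \
       ([('12345', 8.0, 50.0)], ['99999'], [])
```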

View File

@@ -13,12 +13,6 @@ from nominatim.version import NOMINATIM_VERSION
 LOG = logging.getLogger()

-def update_postcodes(dsn, sql_dir):
-    """ Recalculate postcode centroids and add, remove and update entries in the
-        location_postcode table. `conn` is an opne connection to the database.
-    """
-    execute_file(dsn, sql_dir / 'update-postcodes.sql')

 def recompute_word_counts(dsn, sql_dir):
     """ Compute the frequency of full-word search terms.
     """

@@ -161,7 +155,7 @@ def recompute_importance(conn):
     conn.commit()

-def setup_website(basedir, config):
+def setup_website(basedir, config, conn):
     """ Create the website script stubs.
     """
     if not basedir.exists():

@@ -193,5 +187,10 @@ def setup_website(basedir, config):
     template += "\nrequire_once('{}/website/{{}}');\n".format(config.lib_dir.php)

+    search_name_table_exists = bool(conn and conn.table_exists('search_name'))
+
     for script in WEBSITE_SCRIPTS:
-        (basedir / script).write_text(template.format(script), 'utf-8')
+        if not search_name_table_exists and script == 'search.php':
+            (basedir / script).write_text(template.format('reverse-only-search.php'), 'utf-8')
+        else:
+            (basedir / script).write_text(template.format(script), 'utf-8')

View File

@@ -1,15 +1,18 @@
""" """
Functions for importing tiger data and handling tarbar and directory files Functions for importing tiger data and handling tarbar and directory files
""" """
import csv
import io
import logging import logging
import os import os
import tarfile import tarfile
import selectors
import psycopg2.extras
from nominatim.db.connection import connect from nominatim.db.connection import connect
from nominatim.db.async_connection import DBConnection from nominatim.db.async_connection import WorkerPool
from nominatim.db.sql_preprocessor import SQLPreprocessor from nominatim.db.sql_preprocessor import SQLPreprocessor
from nominatim.errors import UsageError
LOG = logging.getLogger() LOG = logging.getLogger()
@@ -20,96 +23,81 @@ def handle_tarfile_or_directory(data_dir):
     tar = None
     if data_dir.endswith('.tar.gz'):
-        tar = tarfile.open(data_dir)
-        sql_files = [i for i in tar.getmembers() if i.name.endswith('.sql')]
-        LOG.warning("Found %d SQL files in tarfile with path %s", len(sql_files), data_dir)
-        if not sql_files:
+        try:
+            tar = tarfile.open(data_dir)
+        except tarfile.ReadError as err:
+            LOG.fatal("Cannot open '%s'. Is this a tar file?", data_dir)
+            raise UsageError("Cannot open Tiger data file.") from err
+
+        csv_files = [i for i in tar.getmembers() if i.name.endswith('.csv')]
+        LOG.warning("Found %d CSV files in tarfile with path %s", len(csv_files), data_dir)
+        if not csv_files:
             LOG.warning("Tiger data import selected but no files in tarfile's path %s", data_dir)
             return None, None
     else:
         files = os.listdir(data_dir)
-        sql_files = [os.path.join(data_dir, i) for i in files if i.endswith('.sql')]
-        LOG.warning("Found %d SQL files in path %s", len(sql_files), data_dir)
-        if not sql_files:
+        csv_files = [os.path.join(data_dir, i) for i in files if i.endswith('.csv')]
+        LOG.warning("Found %d CSV files in path %s", len(csv_files), data_dir)
+        if not csv_files:
             LOG.warning("Tiger data import selected but no files found in path %s", data_dir)
             return None, None

-    return sql_files, tar
+    return csv_files, tar
-def handle_threaded_sql_statements(sel, file):
+def handle_threaded_sql_statements(pool, fd, analyzer):
     """ Handles sql statement with multiplexing
     """
     lines = 0
-    end_of_file = False
     # Using pool of database connections to execute sql statements
-    while not end_of_file:
-        for key, _ in sel.select(1):
-            conn = key.data
-            try:
-                if conn.is_done():
-                    sql_query = file.readline()
-                    lines += 1
-                    if not sql_query:
-                        end_of_file = True
-                        break
-                    conn.perform(sql_query)
-                    if lines == 1000:
-                        print('. ', end='', flush=True)
-                        lines = 0
-            except Exception as exc: # pylint: disable=broad-except
-                LOG.info('Wrong SQL statement: %s', exc)
-
-def handle_unregister_connection_pool(sel, place_threads):
-    """ Handles unregistering pool of connections
-    """
-    while place_threads > 0:
-        for key, _ in sel.select(1):
-            conn = key.data
-            sel.unregister(conn)
-            try:
-                conn.wait()
-            except Exception as exc: # pylint: disable=broad-except
-                LOG.info('Wrong SQL statement: %s', exc)
-            conn.close()
-            place_threads -= 1
+
+    sql = "SELECT tiger_line_import(%s, %s, %s, %s, %s, %s)"
+
+    for row in csv.DictReader(fd, delimiter=';'):
+        try:
+            address = dict(street=row['street'], postcode=row['postcode'])
+            args = ('SRID=4326;' + row['geometry'],
+                    int(row['from']), int(row['to']), row['interpolation'],
+                    psycopg2.extras.Json(analyzer.process_place(dict(address=address))),
+                    analyzer.normalize_postcode(row['postcode']))
+        except ValueError:
+            continue
+        pool.next_free_worker().perform(sql, args=args)
+
+        lines += 1
+        if lines == 1000:
+            print('.', end='', flush=True)
+            lines = 0

-def add_tiger_data(data_dir, config, threads):
+def add_tiger_data(data_dir, config, threads, tokenizer):
     """ Import tiger data from directory or tar file `data dir`.
     """
     dsn = config.get_libpq_dsn()
-    sql_files, tar = handle_tarfile_or_directory(data_dir)
+    files, tar = handle_tarfile_or_directory(data_dir)

-    if not sql_files:
+    if not files:
         return

     with connect(dsn) as conn:
         sql = SQLPreprocessor(conn, config)
         sql.run_sql_file(conn, 'tiger_import_start.sql')

-    # Reading sql_files and then for each file line handling
+    # Reading files and then for each file line handling
     # sql_query in <threads - 1> chunks.
-    sel = selectors.DefaultSelector()
     place_threads = max(1, threads - 1)

-    # Creates a pool of database connections
-    for _ in range(place_threads):
-        conn = DBConnection(dsn)
-        conn.connect()
-        sel.register(conn, selectors.EVENT_WRITE, conn)
-
-    for sql_file in sql_files:
-        if not tar:
-            file = open(sql_file)
-        else:
-            file = tar.extractfile(sql_file)
-
-        handle_threaded_sql_statements(sel, file)
-
-    # Unregistering pool of database connections
-    handle_unregister_connection_pool(sel, place_threads)
+    with WorkerPool(dsn, place_threads, ignore_sql_errors=True) as pool:
+        with tokenizer.name_analyzer() as analyzer:
+            for fname in files:
+                if not tar:
+                    fd = open(fname)
+                else:
+                    fd = io.TextIOWrapper(tar.extractfile(fname))
+
+                handle_threaded_sql_statements(pool, fd, analyzer)
+
+                fd.close()

     if tar:
         tar.close()
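For orientation: the importer now reads semicolon-delimited CSV rows and turns
each one into the argument tuple for `tiger_line_import()`. A standalone sketch
with a made-up row (only the four plain columns are shown; the real code
additionally passes the tokenizer output and the normalized postcode):

```
import csv
import io

data = ("from;to;interpolation;street;postcode;geometry\n"
        "2;8;even;Main St;12345;LINESTRING(-86.5 32.3,-86.4 32.3)\n")

row = next(csv.DictReader(io.StringIO(data), delimiter=';'))

# mirrors the first half of the args tuple built in
# handle_threaded_sql_statements()
args = ('SRID=4326;' + row['geometry'],
        int(row['from']), int(row['to']), row['interpolation'])
print(args)
```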

View File

@@ -9,6 +9,7 @@ sys.path.insert(1, str((Path(__file__) / '..' / '..' / '..' / '..').resolve()))
 from nominatim import cli
 from nominatim.config import Configuration
+from nominatim.db.connection import _Connection
 from nominatim.tools import refresh
 from nominatim.tokenizer import factory as tokenizer_factory
 from steps.utils import run_script

@@ -54,7 +55,7 @@ class NominatimEnvironment:
             dbargs['user'] = self.db_user
         if self.db_pass:
             dbargs['password'] = self.db_pass
-        conn = psycopg2.connect(**dbargs)
+        conn = psycopg2.connect(connection_factory=_Connection, **dbargs)
         return conn

     def next_code_coverage_file(self):
@@ -110,8 +111,13 @@ class NominatimEnvironment:
             self.website_dir.cleanup()

         self.website_dir = tempfile.TemporaryDirectory()
+
+        try:
+            conn = self.connect_database(dbname)
+        except:
+            conn = False
         refresh.setup_website(Path(self.website_dir.name) / 'website',
-                              self.get_test_config())
+                              self.get_test_config(), conn)

     def get_test_config(self):
@@ -228,13 +234,13 @@ class NominatimEnvironment:
""" Setup a test against a fresh, empty test database. """ Setup a test against a fresh, empty test database.
""" """
self.setup_template_db() self.setup_template_db()
self.write_nominatim_config(self.test_db)
conn = self.connect_database(self.template_db) conn = self.connect_database(self.template_db)
conn.set_isolation_level(0) conn.set_isolation_level(0)
cur = conn.cursor() cur = conn.cursor()
cur.execute('DROP DATABASE IF EXISTS {}'.format(self.test_db)) cur.execute('DROP DATABASE IF EXISTS {}'.format(self.test_db))
cur.execute('CREATE DATABASE {} TEMPLATE = {}'.format(self.test_db, self.template_db)) cur.execute('CREATE DATABASE {} TEMPLATE = {}'.format(self.test_db, self.template_db))
conn.close() conn.close()
self.write_nominatim_config(self.test_db)
context.db = self.connect_database(self.test_db) context.db = self.connect_database(self.test_db)
context.db.autocommit = True context.db.autocommit = True
psycopg2.extras.register_hstore(context.db, globally=False) psycopg2.extras.register_hstore(context.db, globally=False)

View File

@@ -251,7 +251,7 @@ def check_location_postcode(context):
     with context.db.cursor(cursor_factory=psycopg2.extras.DictCursor) as cur:
         cur.execute("SELECT *, ST_AsText(geometry) as geomtxt FROM location_postcode")
         assert cur.rowcount == len(list(context.table)), \
-            "Postcode table has {} rows, expected {}.".foramt(cur.rowcount, len(list(context.table)))
+            "Postcode table has {} rows, expected {}.".format(cur.rowcount, len(list(context.table)))

         results = {}
         for row in cur:

View File

@@ -19,6 +19,7 @@ from nominatim.db.sql_preprocessor import SQLPreprocessor
 from nominatim.db import properties

 import dummy_tokenizer
+import mocks

 class _TestingCursor(psycopg2.extras.DictCursor):
     """ Extension to the DictCursor class that provides execution
@@ -211,33 +212,7 @@ def place_row(place_table, temp_db_cursor):
 def placex_table(temp_db_with_extensions, temp_db_conn):
     """ Create an empty version of the place table.
     """
-    with temp_db_conn.cursor() as cur:
-        cur.execute("""CREATE TABLE placex (
-                           place_id BIGINT,
-                           parent_place_id BIGINT,
-                           linked_place_id BIGINT,
-                           importance FLOAT,
-                           indexed_date TIMESTAMP,
-                           geometry_sector INTEGER,
-                           rank_address SMALLINT,
-                           rank_search SMALLINT,
-                           partition SMALLINT,
-                           indexed_status SMALLINT,
-                           osm_id int8,
-                           osm_type char(1),
-                           class text,
-                           type text,
-                           name hstore,
-                           admin_level smallint,
-                           address hstore,
-                           extratags hstore,
-                           geometry Geometry(Geometry,4326),
-                           wikipedia TEXT,
-                           country_code varchar(2),
-                           housenumber TEXT,
-                           postcode TEXT,
-                           centroid GEOMETRY(Geometry, 4326))""")
-    temp_db_conn.commit()
+    return mocks.MockPlacexTable(temp_db_conn)

 @pytest.fixture

@@ -262,18 +237,8 @@ def osmline_table(temp_db_with_extensions, temp_db_conn):

 @pytest.fixture
-def word_table(temp_db, temp_db_conn):
-    with temp_db_conn.cursor() as cur:
-        cur.execute("""CREATE TABLE word (
-                           word_id INTEGER,
-                           word_token text,
-                           word text,
-                           class text,
-                           type text,
-                           country_code varchar(2),
-                           search_name_count INTEGER,
-                           operator TEXT)""")
-    temp_db_conn.commit()
+def word_table(temp_db_conn):
+    return mocks.MockWordTable(temp_db_conn)

 @pytest.fixture

View File

@@ -51,7 +51,10 @@ class DummyNameAnalyzer:
     def close(self):
         pass

-    def add_postcodes_from_db(self):
+    def normalize_postcode(self, postcode):
+        return postcode
+
+    def update_postcodes_from_db(self):
         pass

     def update_special_phrases(self, phrases, should_replace):

View File

@@ -1,7 +1,9 @@
""" """
Custom mocks for testing. Custom mocks for testing.
""" """
import itertools
import psycopg2.extras
class MockParamCapture: class MockParamCapture:
""" Mock that records the parameters with which a function was called """ Mock that records the parameters with which a function was called
@@ -16,3 +18,110 @@ class MockParamCapture:
         self.last_args = args
         self.last_kwargs = kwargs
         return self.return_value
class MockWordTable:
    """ A word table for testing.
    """
    def __init__(self, conn):
        self.conn = conn
        with conn.cursor() as cur:
            cur.execute("""CREATE TABLE word (word_id INTEGER,
                                              word_token text,
                                              word text,
                                              class text,
                                              type text,
                                              country_code varchar(2),
                                              search_name_count INTEGER,
                                              operator TEXT)""")

        conn.commit()

    def add_special(self, word_token, word, cls, typ, op):
        with self.conn.cursor() as cur:
            cur.execute("""INSERT INTO word (word_token, word, class, type, operator)
                           VALUES (%s, %s, %s, %s, %s)
                        """, (word_token, word, cls, typ, op))
        self.conn.commit()

    def add_postcode(self, word_token, postcode):
        with self.conn.cursor() as cur:
            cur.execute("""INSERT INTO word (word_token, word, class, type)
                           VALUES (%s, %s, 'place', 'postcode')
                        """, (word_token, postcode))
        self.conn.commit()

    def count(self):
        with self.conn.cursor() as cur:
            return cur.scalar("SELECT count(*) FROM word")

    def count_special(self):
        with self.conn.cursor() as cur:
            return cur.scalar("SELECT count(*) FROM word WHERE class != 'place'")

    def get_special(self):
        with self.conn.cursor() as cur:
            cur.execute("""SELECT word_token, word, class, type, operator
                           FROM word WHERE class != 'place'""")
            return set((tuple(row) for row in cur))

    def get_postcodes(self):
        with self.conn.cursor() as cur:
            cur.execute("""SELECT word FROM word
                           WHERE class = 'place' and type = 'postcode'""")
            return set((row[0] for row in cur))

class MockPlacexTable:
    """ A placex table for testing.
    """
    def __init__(self, conn):
        self.idseq = itertools.count(10000)
        self.conn = conn
        with conn.cursor() as cur:
            cur.execute("""CREATE TABLE placex (
                               place_id BIGINT,
                               parent_place_id BIGINT,
                               linked_place_id BIGINT,
                               importance FLOAT,
                               indexed_date TIMESTAMP,
                               geometry_sector INTEGER,
                               rank_address SMALLINT,
                               rank_search SMALLINT,
                               partition SMALLINT,
                               indexed_status SMALLINT,
                               osm_id int8,
                               osm_type char(1),
                               class text,
                               type text,
                               name hstore,
                               admin_level smallint,
                               address hstore,
                               extratags hstore,
                               geometry Geometry(Geometry,4326),
                               wikipedia TEXT,
                               country_code varchar(2),
                               housenumber TEXT,
                               postcode TEXT,
                               centroid GEOMETRY(Geometry, 4326))""")
            cur.execute("CREATE SEQUENCE IF NOT EXISTS seq_place")
        conn.commit()

    def add(self, osm_type='N', osm_id=None, cls='amenity', typ='cafe', names=None,
            admin_level=None, address=None, extratags=None, geom='POINT(10 4)',
            country=None):
        with self.conn.cursor() as cur:
            psycopg2.extras.register_hstore(cur)
            cur.execute("""INSERT INTO placex (place_id, osm_type, osm_id, class,
                                               type, name, admin_level, address,
                                               extratags, geometry, country_code)
                           VALUES(nextval('seq_place'), %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)""",
                        (osm_type, osm_id or next(self.idseq), cls, typ, names,
                         admin_level, address, extratags, 'SRID=4326;' + geom,
                         country))
        self.conn.commit()

View File

@@ -120,7 +120,7 @@ def test_import_full(temp_db, mock_func_factory, tokenizer_mock):
         mock_func_factory(nominatim.tools.database_import, 'create_search_indices'),
         mock_func_factory(nominatim.tools.database_import, 'create_country_names'),
         mock_func_factory(nominatim.tools.refresh, 'load_address_levels_from_file'),
-        mock_func_factory(nominatim.tools.postcodes, 'import_postcodes'),
+        mock_func_factory(nominatim.tools.postcodes, 'update_postcodes'),
         mock_func_factory(nominatim.indexer.indexer.Indexer, 'index_full'),
         mock_func_factory(nominatim.tools.refresh, 'setup_website'),
         mock_func_factory(nominatim.db.properties, 'set_property')

@@ -143,7 +143,7 @@ def test_import_continue_load_data(temp_db, mock_func_factory, tokenizer_mock):
         mock_func_factory(nominatim.tools.database_import, 'load_data'),
         mock_func_factory(nominatim.tools.database_import, 'create_search_indices'),
         mock_func_factory(nominatim.tools.database_import, 'create_country_names'),
-        mock_func_factory(nominatim.tools.postcodes, 'import_postcodes'),
+        mock_func_factory(nominatim.tools.postcodes, 'update_postcodes'),
         mock_func_factory(nominatim.indexer.indexer.Indexer, 'index_full'),
         mock_func_factory(nominatim.tools.refresh, 'setup_website'),
         mock_func_factory(nominatim.db.properties, 'set_property')
@@ -280,20 +280,26 @@ def test_special_phrases_csv_command(temp_db, mock_func_factory, tokenizer_mock,
     assert func.called == 1


 @pytest.mark.parametrize("command,func", [
+                            ('postcodes', 'update_postcodes'),
                             ('word-counts', 'recompute_word_counts'),
                             ('address-levels', 'load_address_levels_from_file'),
                             ('wiki-data', 'import_wikipedia_articles'),
                             ('importance', 'recompute_importance'),
                             ('website', 'setup_website'),
                             ])
-def test_refresh_command(mock_func_factory, temp_db, command, func):
+def test_refresh_command(mock_func_factory, temp_db, command, func, tokenizer_mock):
     func_mock = mock_func_factory(nominatim.tools.refresh, func)

     assert 0 == call_nominatim('refresh', '--' + command)
     assert func_mock.called == 1


+def test_refresh_postcodes(mock_func_factory, temp_db, tokenizer_mock):
+    func_mock = mock_func_factory(nominatim.tools.postcodes, 'update_postcodes')
+    idx_mock = mock_func_factory(nominatim.indexer.indexer.Indexer, 'index_postcodes')
+
+    assert 0 == call_nominatim('refresh', '--postcodes')
+    assert func_mock.called == 1
+
+
 def test_refresh_create_functions(mock_func_factory, temp_db, tokenizer_mock):
     func_mock = mock_func_factory(nominatim.tools.refresh, 'create_functions')
@@ -302,7 +308,7 @@ def test_refresh_create_functions(mock_func_factory, temp_db, tokenizer_mock):
     assert tokenizer_mock.update_sql_functions_called


-def test_refresh_importance_computed_after_wiki_import(monkeypatch, temp_db):
+def test_refresh_importance_computed_after_wiki_import(monkeypatch, temp_db, tokenizer_mock):
     calls = []
     monkeypatch.setattr(nominatim.tools.refresh, 'import_wikipedia_articles',
                         lambda *args, **kwargs: calls.append('import') or 0)

View File

@@ -56,13 +56,21 @@ def test_bad_query(conn):
     conn.wait()


+def test_bad_query_ignore(temp_db):
+    with closing(DBConnection('dbname=' + temp_db, ignore_sql_errors=True)) as conn:
+        conn.connect()
+
+        conn.perform('SELECT efasfjsea')
+
+        conn.wait()
+
+
 def exec_with_deadlock(cur, sql, detector):
     with DeadlockHandler(lambda *args: detector.append(1)):
         cur.execute(sql)


 def test_deadlock(simple_conns):
+    print(psycopg2.__version__)
     cur1, cur2 = simple_conns

     cur1.execute("""CREATE TABLE t1 (id INT PRIMARY KEY, t TEXT);

View File

@@ -77,12 +77,12 @@ def make_standard_name(temp_db_cursor):
 @pytest.fixture
-def create_postcode_id(table_factory, temp_db_cursor):
-    table_factory('out_postcode_table', 'postcode TEXT')
-
+def create_postcode_id(temp_db_cursor):
     temp_db_cursor.execute("""CREATE OR REPLACE FUNCTION create_postcode_id(postcode TEXT)
                               RETURNS BOOLEAN AS $$
-                              INSERT INTO out_postcode_table VALUES (postcode) RETURNING True;
+                              INSERT INTO word (word_token, word, class, type)
+                                VALUES (' ' || postcode, postcode, 'place', 'postcode')
+                              RETURNING True;
                               $$ LANGUAGE SQL""")
@@ -192,27 +192,38 @@ def test_normalize(analyzer):
     assert analyzer.normalize('TEsT') == 'test'


-def test_add_postcodes_from_db(analyzer, table_factory, temp_db_cursor,
-                               create_postcode_id):
+def test_update_postcodes_from_db_empty(analyzer, table_factory, word_table,
+                                        create_postcode_id):
     table_factory('location_postcode', 'postcode TEXT',
                   content=(('1234',), ('12 34',), ('AB23',), ('1234',)))

-    analyzer.add_postcodes_from_db()
+    analyzer.update_postcodes_from_db()

-    assert temp_db_cursor.row_set("SELECT * from out_postcode_table") \
-               == set((('1234', ), ('12 34', ), ('AB23',)))
+    assert word_table.count() == 3
+    assert word_table.get_postcodes() == {'1234', '12 34', 'AB23'}
+
+
+def test_update_postcodes_from_db_add_and_remove(analyzer, table_factory, word_table,
+                                                 create_postcode_id):
+    table_factory('location_postcode', 'postcode TEXT',
+                  content=(('1234',), ('45BC', ), ('XX45', )))
+    word_table.add_postcode(' 1234', '1234')
+    word_table.add_postcode(' 5678', '5678')
+
+    analyzer.update_postcodes_from_db()
+
+    assert word_table.count() == 3
+    assert word_table.get_postcodes() == {'1234', '45BC', 'XX45'}


-def test_update_special_phrase_empty_table(analyzer, word_table, temp_db_cursor,
-                                           make_standard_name):
+def test_update_special_phrase_empty_table(analyzer, word_table, make_standard_name):
     analyzer.update_special_phrases([
         ("König bei", "amenity", "royal", "near"),
         ("Könige", "amenity", "royal", "-"),
         ("strasse", "highway", "primary", "in")
     ], True)

-    assert temp_db_cursor.row_set("""SELECT word_token, word, class, type, operator
-                                     FROM word WHERE class != 'place'""") \
+    assert word_table.get_special() \
               == set(((' könig bei', 'könig bei', 'amenity', 'royal', 'near'),
                       (' könige', 'könige', 'amenity', 'royal', None),
                       (' strasse', 'strasse', 'highway', 'primary', 'in')))
@@ -220,15 +231,14 @@ def test_update_special_phrase_empty_table(analyzer, word_table, temp_db_cursor,
 def test_update_special_phrase_delete_all(analyzer, word_table, temp_db_cursor,
                                           make_standard_name):
-    temp_db_cursor.execute("""INSERT INTO word (word_token, word, class, type, operator)
-                              VALUES (' foo', 'foo', 'amenity', 'prison', 'in'),
-                                     (' bar', 'bar', 'highway', 'road', null)""")
+    word_table.add_special(' foo', 'foo', 'amenity', 'prison', 'in')
+    word_table.add_special(' bar', 'bar', 'highway', 'road', None)

-    assert 2 == temp_db_cursor.scalar("SELECT count(*) FROM word WHERE class != 'place'""")
+    assert word_table.count_special() == 2

     analyzer.update_special_phrases([], True)

-    assert 0 == temp_db_cursor.scalar("SELECT count(*) FROM word WHERE class != 'place'""")
+    assert word_table.count_special() == 0


 def test_update_special_phrases_no_replace(analyzer, word_table, temp_db_cursor,
@@ -244,13 +254,11 @@ def test_update_special_phrases_no_replace(analyzer, word_table, temp_db_cursor,
     assert 2 == temp_db_cursor.scalar("SELECT count(*) FROM word WHERE class != 'place'""")


-def test_update_special_phrase_modify(analyzer, word_table, temp_db_cursor,
-                                      make_standard_name):
-    temp_db_cursor.execute("""INSERT INTO word (word_token, word, class, type, operator)
-                              VALUES (' foo', 'foo', 'amenity', 'prison', 'in'),
-                                     (' bar', 'bar', 'highway', 'road', null)""")
+def test_update_special_phrase_modify(analyzer, word_table, make_standard_name):
+    word_table.add_special(' foo', 'foo', 'amenity', 'prison', 'in')
+    word_table.add_special(' bar', 'bar', 'highway', 'road', None)

-    assert 2 == temp_db_cursor.scalar("SELECT count(*) FROM word WHERE class != 'place'""")
+    assert word_table.count_special() == 2

     analyzer.update_special_phrases([
         ('prison', 'amenity', 'prison', 'in'),
@@ -258,8 +266,7 @@ def test_update_special_phrase_modify(analyzer, word_table, temp_db_cursor,
         ('garden', 'leisure', 'garden', 'near')
     ], True)

-    assert temp_db_cursor.row_set("""SELECT word_token, word, class, type, operator
-                                     FROM word WHERE class != 'place'""") \
+    assert word_table.get_special() \
               == set(((' prison', 'prison', 'amenity', 'prison', 'in'),
                       (' bar', 'bar', 'highway', 'road', None),
                       (' garden', 'garden', 'leisure', 'garden', 'near')))
@@ -273,21 +280,17 @@ def test_process_place_names(analyzer, make_keywords):
 @pytest.mark.parametrize('pc', ['12345', 'AB 123', '34-345'])
-def test_process_place_postcode(analyzer, temp_db_cursor, create_postcode_id, pc):
+def test_process_place_postcode(analyzer, create_postcode_id, word_table, pc):
     info = analyzer.process_place({'address': {'postcode' : pc}})

-    assert temp_db_cursor.row_set("SELECT * from out_postcode_table") \
-               == set(((pc, ),))
+    assert word_table.get_postcodes() == {pc, }


 @pytest.mark.parametrize('pc', ['12:23', 'ab;cd;f', '123;836'])
-def test_process_place_bad_postcode(analyzer, temp_db_cursor, create_postcode_id,
-                                    pc):
+def test_process_place_bad_postcode(analyzer, create_postcode_id, word_table, pc):
     info = analyzer.process_place({'address': {'postcode' : pc}})

-    assert 0 == temp_db_cursor.scalar("SELECT count(*) from out_postcode_table")
+    assert not word_table.get_postcodes()


 @pytest.mark.parametrize('hnr', ['123a', '1', '101'])

View File

@@ -141,16 +141,28 @@ def test_make_standard_hnr(analyzer):
     assert a._make_standard_hnr('iv') == 'IV'


-def test_add_postcodes_from_db(analyzer, word_table, table_factory, temp_db_cursor):
+def test_update_postcodes_from_db_empty(analyzer, table_factory, word_table):
     table_factory('location_postcode', 'postcode TEXT',
                   content=(('1234',), ('12 34',), ('AB23',), ('1234',)))

     with analyzer() as a:
-        a.add_postcodes_from_db()
+        a.update_postcodes_from_db()

-    assert temp_db_cursor.row_set("""SELECT word, word_token from word
-                                     """) \
-               == set((('1234', ' 1234'), ('12 34', ' 12 34'), ('AB23', ' AB23')))
+    assert word_table.count() == 3
+    assert word_table.get_postcodes() == {'1234', '12 34', 'AB23'}
+
+
+def test_update_postcodes_from_db_add_and_remove(analyzer, table_factory, word_table):
+    table_factory('location_postcode', 'postcode TEXT',
+                  content=(('1234',), ('45BC', ), ('XX45', )))
+    word_table.add_postcode(' 1234', '1234')
+    word_table.add_postcode(' 5678', '5678')
+
+    with analyzer() as a:
+        a.update_postcodes_from_db()
+
+    assert word_table.count() == 3
+    assert word_table.get_postcodes() == {'1234', '45BC', 'XX45'}


 def test_update_special_phrase_empty_table(analyzer, word_table, temp_db_cursor):
@@ -224,22 +236,19 @@ def test_process_place_names(analyzer, getorcreate_term_id):
 @pytest.mark.parametrize('pc', ['12345', 'AB 123', '34-345'])
-def test_process_place_postcode(analyzer, temp_db_cursor, pc):
+def test_process_place_postcode(analyzer, word_table, pc):
     with analyzer() as a:
         info = a.process_place({'address': {'postcode' : pc}})

-    assert temp_db_cursor.row_set("""SELECT word FROM word
-                                     WHERE class = 'place' and type = 'postcode'""") \
-               == set(((pc, ),))
+    assert word_table.get_postcodes() == {pc, }


 @pytest.mark.parametrize('pc', ['12:23', 'ab;cd;f', '123;836'])
-def test_process_place_bad_postcode(analyzer, temp_db_cursor, pc):
+def test_process_place_bad_postcode(analyzer, word_table, pc):
     with analyzer() as a:
         info = a.process_place({'address': {'postcode' : pc}})

-    assert 0 == temp_db_cursor.scalar("""SELECT count(*) FROM word
-                                         WHERE class = 'place' and type = 'postcode'""")
+    assert not word_table.get_postcodes()


 @pytest.mark.parametrize('hnr', ['123a', '1', '101'])

View File

@@ -153,8 +153,8 @@ def test_truncate_database_tables(temp_db_conn, temp_db_cursor, table_factory):
@pytest.mark.parametrize("threads", (1, 5)) @pytest.mark.parametrize("threads", (1, 5))
def test_load_data(dsn, src_dir, place_row, placex_table, osmline_table, word_table, def test_load_data(dsn, src_dir, place_row, placex_table, osmline_table,
temp_db_cursor, threads): word_table, temp_db_cursor, threads):
for func in ('precompute_words', 'getorcreate_housenumber_id', 'make_standard_name'): for func in ('precompute_words', 'getorcreate_housenumber_id', 'make_standard_name'):
temp_db_cursor.execute("""CREATE FUNCTION {} (src TEXT) temp_db_cursor.execute("""CREATE FUNCTION {} (src TEXT)
RETURNS TEXT AS $$ SELECT 'a'::TEXT $$ LANGUAGE SQL RETURNS TEXT AS $$ SELECT 'a'::TEXT $$ LANGUAGE SQL

View File

@@ -1,55 +1,185 @@
""" """
Tests for functions to maintain the artificial postcode table. Tests for functions to maintain the artificial postcode table.
""" """
import subprocess
import pytest import pytest
from nominatim.tools import postcodes from nominatim.tools import postcodes
import dummy_tokenizer import dummy_tokenizer
class MockPostcodeTable:
""" A location_postcode table for testing.
"""
def __init__(self, conn):
self.conn = conn
with conn.cursor() as cur:
cur.execute("""CREATE TABLE location_postcode (
place_id BIGINT,
parent_place_id BIGINT,
rank_search SMALLINT,
rank_address SMALLINT,
indexed_status SMALLINT,
indexed_date TIMESTAMP,
country_code varchar(2),
postcode TEXT,
geometry GEOMETRY(Geometry, 4326))""")
cur.execute("""CREATE OR REPLACE FUNCTION token_normalized_postcode(postcode TEXT)
RETURNS TEXT AS $$ BEGIN RETURN postcode; END; $$ LANGUAGE plpgsql;
""")
conn.commit()
def add(self, country, postcode, x, y):
with self.conn.cursor() as cur:
cur.execute("""INSERT INTO location_postcode (place_id, indexed_status,
country_code, postcode,
geometry)
VALUES (nextval('seq_place'), 1, %s, %s,
'SRID=4326;POINT(%s %s)')""",
(country, postcode, x, y))
self.conn.commit()
@property
def row_set(self):
with self.conn.cursor() as cur:
cur.execute("""SELECT country_code, postcode,
ST_X(geometry), ST_Y(geometry)
FROM location_postcode""")
return set((tuple(row) for row in cur))
+
 @pytest.fixture
 def tokenizer():
     return dummy_tokenizer.DummyTokenizer(None, None)

+
 @pytest.fixture
-def postcode_table(temp_db_with_extensions, temp_db_cursor, table_factory,
-                   placex_table, word_table):
-    table_factory('location_postcode',
-                  """ place_id BIGINT,
-                      parent_place_id BIGINT,
-                      rank_search SMALLINT,
-                      rank_address SMALLINT,
-                      indexed_status SMALLINT,
-                      indexed_date TIMESTAMP,
-                      country_code varchar(2),
-                      postcode TEXT,
-                      geometry GEOMETRY(Geometry, 4326)""")
-    temp_db_cursor.execute('CREATE SEQUENCE seq_place')
-    temp_db_cursor.execute("""CREATE OR REPLACE FUNCTION token_normalized_postcode(postcode TEXT)
-                              RETURNS TEXT AS $$ BEGIN RETURN postcode; END; $$ LANGUAGE plpgsql;
-                           """)
+def postcode_table(temp_db_conn, placex_table, word_table):
+    return MockPostcodeTable(temp_db_conn)


-def test_import_postcodes_empty(dsn, temp_db_cursor, postcode_table, tmp_path, tokenizer):
-    postcodes.import_postcodes(dsn, tmp_path, tokenizer)
+def test_import_postcodes_empty(dsn, postcode_table, tmp_path, tokenizer):
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)

-    assert temp_db_cursor.table_exists('gb_postcode')
-    assert temp_db_cursor.table_exists('us_postcode')
-    assert temp_db_cursor.table_rows('location_postcode') == 0
+    assert not postcode_table.row_set


-def test_import_postcodes_from_placex(dsn, temp_db_cursor, postcode_table, tmp_path, tokenizer):
-    temp_db_cursor.execute("""
-        INSERT INTO placex (place_id, country_code, address, geometry)
-        VALUES (1, 'xx', '"postcode"=>"9486"', 'SRID=4326;POINT(10 12)')
-        """)
+def test_import_postcodes_add_new(dsn, placex_table, postcode_table, tmp_path, tokenizer):
+    placex_table.add(country='xx', geom='POINT(10 12)',
+                     address=dict(postcode='9486'))
+    postcode_table.add('yy', '9486', 99, 34)

-    postcodes.import_postcodes(dsn, tmp_path, tokenizer)
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)

-    rows = temp_db_cursor.row_set(""" SELECT postcode, country_code,
-                                      ST_X(geometry), ST_Y(geometry)
-                                      FROM location_postcode""")
-    print(rows)
-    assert len(rows) == 1
-    assert rows == set((('9486', 'xx', 10, 12), ))
+    assert postcode_table.row_set == {('xx', '9486', 10, 12), }
+
+
+def test_import_postcodes_replace_coordinates(dsn, placex_table, postcode_table, tmp_path, tokenizer):
+    placex_table.add(country='xx', geom='POINT(10 12)',
+                     address=dict(postcode='AB 4511'))
+    postcode_table.add('xx', 'AB 4511', 99, 34)
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert postcode_table.row_set == {('xx', 'AB 4511', 10, 12)}
+
+
+def test_import_postcodes_replace_coordinates_close(dsn, placex_table, postcode_table, tmp_path, tokenizer):
+    placex_table.add(country='xx', geom='POINT(10 12)',
+                     address=dict(postcode='AB 4511'))
+    postcode_table.add('xx', 'AB 4511', 10, 11.99999)
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert postcode_table.row_set == {('xx', 'AB 4511', 10, 11.99999)}
+
+
+def test_import_postcodes_remove(dsn, placex_table, postcode_table, tmp_path, tokenizer):
+    placex_table.add(country='xx', geom='POINT(10 12)',
+                     address=dict(postcode='AB 4511'))
+    postcode_table.add('xx', 'badname', 10, 12)
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert postcode_table.row_set == {('xx', 'AB 4511', 10, 12)}
+
+
+def test_import_postcodes_ignore_empty_country(dsn, placex_table, postcode_table, tmp_path, tokenizer):
+    placex_table.add(country=None, geom='POINT(10 12)',
+                     address=dict(postcode='AB 4511'))
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert not postcode_table.row_set
+
+
+def test_import_postcodes_remove_all(dsn, placex_table, postcode_table, tmp_path, tokenizer):
+    postcode_table.add('ch', '5613', 10, 12)
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert not postcode_table.row_set
+
+
+def test_import_postcodes_multi_country(dsn, placex_table, postcode_table, tmp_path, tokenizer):
+    placex_table.add(country='de', geom='POINT(10 12)',
+                     address=dict(postcode='54451'))
+    placex_table.add(country='cc', geom='POINT(100 56)',
+                     address=dict(postcode='DD23 T'))
+    placex_table.add(country='de', geom='POINT(10.3 11.0)',
+                     address=dict(postcode='54452'))
+    placex_table.add(country='cc', geom='POINT(10.3 11.0)',
+                     address=dict(postcode='54452'))
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert postcode_table.row_set == {('de', '54451', 10, 12),
+                                      ('de', '54452', 10.3, 11.0),
+                                      ('cc', '54452', 10.3, 11.0),
+                                      ('cc', 'DD23 T', 100, 56)}
+
+
+@pytest.mark.parametrize("gzipped", [True, False])
+def test_import_postcodes_extern(dsn, placex_table, postcode_table, tmp_path,
+                                 tokenizer, gzipped):
+    placex_table.add(country='xx', geom='POINT(10 12)',
+                     address=dict(postcode='AB 4511'))
+
+    extfile = tmp_path / 'xx_postcodes.csv'
+    extfile.write_text("postcode,lat,lon\nAB 4511,-4,-1\nCD 4511,-5, -10")
+
+    if gzipped:
+        subprocess.run(['gzip', str(extfile)])
+        assert not extfile.is_file()
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert postcode_table.row_set == {('xx', 'AB 4511', 10, 12),
+                                      ('xx', 'CD 4511', -10, -5)}
+
+
+def test_import_postcodes_extern_bad_column(dsn, placex_table, postcode_table,
+                                            tmp_path, tokenizer):
+    placex_table.add(country='xx', geom='POINT(10 12)',
+                     address=dict(postcode='AB 4511'))
+
+    extfile = tmp_path / 'xx_postcodes.csv'
+    extfile.write_text("postode,lat,lon\nAB 4511,-4,-1\nCD 4511,-5, -10")
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert postcode_table.row_set == {('xx', 'AB 4511', 10, 12)}
+
+
+def test_import_postcodes_extern_bad_number(dsn, placex_table, postcode_table,
+                                            tmp_path, tokenizer):
+    placex_table.add(country='xx', geom='POINT(10 12)',
+                     address=dict(postcode='AB 4511'))
+
+    extfile = tmp_path / 'xx_postcodes.csv'
+    extfile.write_text("postcode,lat,lon\nXX 4511,-4,NaN\nCD 4511,-5, -10\n34,200,0")
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert postcode_table.row_set == {('xx', 'AB 4511', 10, 12),
+                                      ('xx', 'CD 4511', -10, -5)}
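
Note that the gzipped variant above shells out to the external `gzip` binary, so the test implicitly requires that tool on the machine running the suite. A dependency-free sketch using only the standard library (an alternative for illustration, not what the test suite actually does) would be:

```python
import gzip
import shutil
from pathlib import Path

def gzip_and_remove(path: Path):
    # Compress <path> to <path>.gz and delete the original,
    # mirroring the effect of running `gzip <path>`.
    with open(path, 'rb') as src, gzip.open(str(path) + '.gz', 'wb') as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()
```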

View File

@@ -18,16 +18,16 @@ def envdir(tmpdir):
 @pytest.fixture
 def test_script(envdir):
     def _create_file(code):
-        outfile = envdir / 'php' / 'website' / 'search.php'
+        outfile = envdir / 'php' / 'website' / 'reverse-only-search.php'
         outfile.write_text('<?php\n{}\n'.format(code), 'utf-8')

     return _create_file


-def run_website_script(envdir, config):
+def run_website_script(envdir, config, conn):
     config.lib_dir.php = envdir / 'php'
     config.project_dir = envdir
-    refresh.setup_website(envdir, config)
+    refresh.setup_website(envdir, config, conn)

     proc = subprocess.run(['/usr/bin/env', 'php', '-Cq',
                            envdir / 'search.php'], check=False)
@@ -37,36 +37,39 @@ def run_website_script(envdir, config):
@pytest.mark.parametrize("setting,retval", (('yes', 10), ('no', 20))) @pytest.mark.parametrize("setting,retval", (('yes', 10), ('no', 20)))
def test_setup_website_check_bool(def_config, monkeypatch, envdir, test_script, def test_setup_website_check_bool(def_config, monkeypatch, envdir, test_script,
setting, retval): setting, retval, temp_db_conn):
monkeypatch.setenv('NOMINATIM_CORS_NOACCESSCONTROL', setting) monkeypatch.setenv('NOMINATIM_CORS_NOACCESSCONTROL', setting)
test_script('exit(CONST_NoAccessControl ? 10 : 20);') test_script('exit(CONST_NoAccessControl ? 10 : 20);')
assert run_website_script(envdir, def_config) == retval assert run_website_script(envdir, def_config, temp_db_conn) == retval
@pytest.mark.parametrize("setting", (0, 10, 99067)) @pytest.mark.parametrize("setting", (0, 10, 99067))
def test_setup_website_check_int(def_config, monkeypatch, envdir, test_script, setting): def test_setup_website_check_int(def_config, monkeypatch, envdir, test_script, setting,
temp_db_conn):
monkeypatch.setenv('NOMINATIM_LOOKUP_MAX_COUNT', str(setting)) monkeypatch.setenv('NOMINATIM_LOOKUP_MAX_COUNT', str(setting))
test_script('exit(CONST_Places_Max_ID_count == {} ? 10 : 20);'.format(setting)) test_script('exit(CONST_Places_Max_ID_count == {} ? 10 : 20);'.format(setting))
assert run_website_script(envdir, def_config) == 10 assert run_website_script(envdir, def_config, temp_db_conn) == 10
def test_setup_website_check_empty_str(def_config, monkeypatch, envdir, test_script): def test_setup_website_check_empty_str(def_config, monkeypatch, envdir, test_script,
temp_db_conn):
monkeypatch.setenv('NOMINATIM_DEFAULT_LANGUAGE', '') monkeypatch.setenv('NOMINATIM_DEFAULT_LANGUAGE', '')
test_script('exit(CONST_Default_Language === false ? 10 : 20);') test_script('exit(CONST_Default_Language === false ? 10 : 20);')
assert run_website_script(envdir, def_config) == 10 assert run_website_script(envdir, def_config, temp_db_conn) == 10
def test_setup_website_check_str(def_config, monkeypatch, envdir, test_script): def test_setup_website_check_str(def_config, monkeypatch, envdir, test_script,
temp_db_conn):
monkeypatch.setenv('NOMINATIM_DEFAULT_LANGUAGE', 'ffde 2') monkeypatch.setenv('NOMINATIM_DEFAULT_LANGUAGE', 'ffde 2')
test_script('exit(CONST_Default_Language === "ffde 2" ? 10 : 20);') test_script('exit(CONST_Default_Language === "ffde 2" ? 10 : 20);')
assert run_website_script(envdir, def_config) == 10 assert run_website_script(envdir, def_config, temp_db_conn) == 10

View File

@@ -2,60 +2,137 @@
 Test for tiger data function
 """
 from pathlib import Path
+from textwrap import dedent

 import pytest
 import tarfile

 from nominatim.tools import tiger_data, database_import
+from nominatim.errors import UsageError
+
+
+class MockTigerTable:
+
+    def __init__(self, conn):
+        self.conn = conn
+        with conn.cursor() as cur:
+            cur.execute("""CREATE TABLE tiger (linegeo GEOMETRY,
+                                               start INTEGER,
+                                               stop INTEGER,
+                                               interpol TEXT,
+                                               token_info JSONB,
+                                               postcode TEXT)""")
+
+    def count(self):
+        with self.conn.cursor() as cur:
+            return cur.scalar("SELECT count(*) FROM tiger")
+
+    def row(self):
+        with self.conn.cursor() as cur:
+            cur.execute("SELECT * FROM tiger LIMIT 1")
+            return cur.fetchone()
+
+
+@pytest.fixture
+def tiger_table(def_config, temp_db_conn, sql_preprocessor,
+                temp_db_with_extensions, tmp_path):
+    def_config.lib_dir.sql = tmp_path / 'sql'
+    def_config.lib_dir.sql.mkdir()
+    (def_config.lib_dir.sql / 'tiger_import_start.sql').write_text(
+        """CREATE OR REPLACE FUNCTION tiger_line_import(linegeo GEOMETRY, start INTEGER,
+                                                        stop INTEGER, interpol TEXT,
+                                                        token_info JSONB, postcode TEXT)
+           RETURNS INTEGER AS $$
+            INSERT INTO tiger VALUES(linegeo, start, stop, interpol, token_info, postcode) RETURNING 1
+           $$ LANGUAGE SQL;""")
+    (def_config.lib_dir.sql / 'tiger_import_finish.sql').write_text(
+        """DROP FUNCTION tiger_line_import (linegeo GEOMETRY, in_startnumber INTEGER,
+                                            in_endnumber INTEGER, interpolationtype TEXT,
+                                            token_info JSONB, in_postcode TEXT);""")
+
+    return MockTigerTable(temp_db_conn)
+
+
+@pytest.fixture
+def csv_factory(tmp_path):
+    def _mk_file(fname, hnr_from=1, hnr_to=9, interpol='odd', street='Main St',
+                 city='Newtown', state='AL', postcode='12345',
+                 geometry='LINESTRING(-86.466995 32.428956,-86.466923 32.428933)'):
+        (tmp_path / (fname + '.csv')).write_text(dedent("""\
+        from;to;interpolation;street;city;state;postcode;geometry
+        {};{};{};{};{};{};{};{}
+        """.format(hnr_from, hnr_to, interpol, street, city, state,
+                   postcode, geometry)))
+
+    return _mk_file
@pytest.mark.parametrize("threads", (1, 5)) @pytest.mark.parametrize("threads", (1, 5))
def test_add_tiger_data(def_config, tmp_path, sql_preprocessor, def test_add_tiger_data(def_config, src_dir, tiger_table, tokenizer_mock, threads):
temp_db_cursor, threads, temp_db_with_extensions): tiger_data.add_tiger_data(str(src_dir / 'test' / 'testdb' / 'tiger'),
temp_db_cursor.execute('CREATE TABLE place (id INT)') def_config, threads, tokenizer_mock())
sqlfile = tmp_path / '1010.sql'
sqlfile.write_text("""INSERT INTO place values (1);
INSERT INTO non_existant_table values (1);""")
tiger_data.add_tiger_data(str(tmp_path), def_config, threads)
assert temp_db_cursor.table_rows('place') == 1 assert tiger_table.count() == 6213
@pytest.mark.parametrize("threads", (1, 5)) def test_add_tiger_data_no_files(def_config, tiger_table, tokenizer_mock,
def test_add_tiger_data_bad_file(def_config, tmp_path, sql_preprocessor, tmp_path):
temp_db_cursor, threads, temp_db_with_extensions): tiger_data.add_tiger_data(str(tmp_path), def_config, 1, tokenizer_mock())
temp_db_cursor.execute('CREATE TABLE place (id INT)')
sqlfile = tmp_path / '1010.txt' assert tiger_table.count() == 0
def test_add_tiger_data_bad_file(def_config, tiger_table, tokenizer_mock,
tmp_path):
sqlfile = tmp_path / '1010.csv'
sqlfile.write_text("""Random text""") sqlfile.write_text("""Random text""")
tiger_data.add_tiger_data(str(tmp_path), def_config, threads)
assert temp_db_cursor.table_rows('place') == 0 tiger_data.add_tiger_data(str(tmp_path), def_config, 1, tokenizer_mock())
assert tiger_table.count() == 0
def test_add_tiger_data_hnr_nan(def_config, tiger_table, tokenizer_mock,
csv_factory, tmp_path):
csv_factory('file1', hnr_from=99)
csv_factory('file2', hnr_from='L12')
csv_factory('file3', hnr_to='12.4')
tiger_data.add_tiger_data(str(tmp_path), def_config, 1, tokenizer_mock())
assert tiger_table.count() == 1
assert tiger_table.row()['start'] == 99
@pytest.mark.parametrize("threads", (1, 5)) @pytest.mark.parametrize("threads", (1, 5))
def test_add_tiger_data_tarfile(def_config, tmp_path, temp_db_cursor, def test_add_tiger_data_tarfile(def_config, tiger_table, tokenizer_mock,
threads, temp_db_with_extensions, sql_preprocessor): tmp_path, src_dir, threads):
temp_db_cursor.execute('CREATE TABLE place (id INT)')
sqlfile = tmp_path / '1010.sql'
sqlfile.write_text("""INSERT INTO place values (1);
INSERT INTO non_existant_table values (1);""")
tar = tarfile.open(str(tmp_path / 'sample.tar.gz'), "w:gz") tar = tarfile.open(str(tmp_path / 'sample.tar.gz'), "w:gz")
tar.add(sqlfile) tar.add(str(src_dir / 'test' / 'testdb' / 'tiger' / '01001.csv'))
tar.close() tar.close()
tiger_data.add_tiger_data(str(tmp_path / 'sample.tar.gz'), def_config, threads)
assert temp_db_cursor.table_rows('place') == 1 tiger_data.add_tiger_data(str(tmp_path / 'sample.tar.gz'), def_config, 1,
tokenizer_mock())
assert tiger_table.count() == 6213
@pytest.mark.parametrize("threads", (1, 5)) def test_add_tiger_data_bad_tarfile(def_config, tiger_table, tokenizer_mock,
def test_add_tiger_data_bad_tarfile(def_config, tmp_path, temp_db_cursor, threads, tmp_path):
temp_db_with_extensions, sql_preprocessor): tarfile = tmp_path / 'sample.tar.gz'
temp_db_cursor.execute('CREATE TABLE place (id INT)') tarfile.write_text("""Random text""")
sqlfile = tmp_path / '1010.txt'
sqlfile.write_text("""Random text""") with pytest.raises(UsageError):
tiger_data.add_tiger_data(str(tarfile), def_config, 1, tokenizer_mock())
def test_add_tiger_data_empty_tarfile(def_config, tiger_table, tokenizer_mock,
tmp_path, src_dir):
tar = tarfile.open(str(tmp_path / 'sample.tar.gz'), "w:gz") tar = tarfile.open(str(tmp_path / 'sample.tar.gz'), "w:gz")
tar.add(sqlfile) tar.add(__file__)
tar.close() tar.close()
tiger_data.add_tiger_data(str(tmp_path / 'sample.tar.gz'), def_config, threads)
assert temp_db_cursor.table_rows('place') == 0 tiger_data.add_tiger_data(str(tmp_path / 'sample.tar.gz'), def_config, 1,
tokenizer_mock())
assert tiger_table.count() == 0
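
For reference, a call such as `csv_factory('file1', hnr_from=99)` writes a `file1.csv` whose content, derived from the fixture defaults above, looks like this:

```
from;to;interpolation;street;city;state;postcode;geometry
99;9;odd;Main St;Newtown;AL;12345;LINESTRING(-86.466995 32.428956,-86.466923 32.428933)
```

`test_add_tiger_data_hnr_nan` then relies on only the first of its three generated files having house numbers that parse as integers.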

test/testdb/tiger/01001.csv — new file, 6214 lines

File diff suppressed because it is too large