Resolve conflicts

AntoJvlt
2021-05-17 13:52:35 +02:00
37 changed files with 7285 additions and 6729 deletions

View File

@@ -11,3 +11,5 @@ ignored-modules=icu
 # 'with' statements.
 ignored-classes=NominatimArgs,closing
 disable=too-few-public-methods,duplicate-code
+good-names=i,x,y,fd

View File

@@ -258,4 +258,5 @@ install(FILES settings/env.defaults
 settings/import-address.style
 settings/import-full.style
 settings/import-extratags.style
+settings/legacy_icu_tokenizer.json
 DESTINATION ${NOMINATIM_CONFIGDIR})

View File

@@ -0,0 +1,71 @@
# Customization of the Database

This section explains in detail how to configure a Nominatim import and
the various means to use external data.

## External postcode data

Nominatim creates a table of known postcode centroids during import. This table
is used for searches of postcodes and for adding postcodes to places where the
OSM data does not provide one. These postcode centroids are mainly computed
from the OSM data itself. In addition, Nominatim supports reading postcode
information from an external CSV file to supplement the postcodes that are
missing in OSM.

To enable external postcode support, simply put one CSV file per country into
your project directory and name it `<CC>_postcodes.csv`. `<CC>` must be the
two-letter country code of the country the file applies to. The file may also
be gzipped; in that case it must be named `<CC>_postcodes.csv.gz`.

The CSV file must use commas as delimiter and have a header line. Nominatim
expects three columns to be present: `postcode`, `lat` and `lon`. All other
columns are ignored. `lon` and `lat` must describe the x and y coordinates of
the postcode centroids in WGS84.
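For illustration, a minimal `de_postcodes.csv` could look like this (country
code and values are invented for the example):

```
postcode,lat,lon
01067,51.0605,13.7067
01069,51.0393,13.7342
```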

The postcode files are loaded only when there is data for the given country
in your database. For example, if there is a `us_postcodes.csv` file in your
project directory but you import only an excerpt of Italy, then the US postcodes
will simply be ignored.

As a rule, the external postcode data should be put into the project directory
**before** starting the initial import. Still, you can add, remove and update the
external postcode data at any time. Simply run:

```
nominatim refresh --postcodes
```

to make the changes visible in your database. Be aware, however, that the changes
only have an immediate effect on searches for postcodes. Postcodes that were
added to places are only updated when they are reindexed, which usually happens
only during replication updates.
## Installing Tiger housenumber data for the US

Nominatim is able to use the official [TIGER](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html)
address set to complement the OSM house number data in the US. You can add
TIGER data to your own Nominatim instance by following these steps. The
entire US adds about 10GB to your database.

1. Get preprocessed TIGER 2020 data:

        cd $PROJECT_DIR
        wget https://nominatim.org/data/tiger2020-nominatim-preprocessed.csv.tar.gz

2. Import the data into your Nominatim database:

        nominatim add-data --tiger-data tiger2020-nominatim-preprocessed.csv.tar.gz

3. Enable use of the Tiger data in your `.env` by adding:

        echo NOMINATIM_USE_US_TIGER_DATA=yes >> .env

4. Apply the new settings:

        nominatim refresh --functions

See the [developer's guide](../develop/data-sources.md#us-census-tiger) for more
information on how the data got preprocessed.

View File

@@ -83,15 +83,19 @@ The file is about 400MB and adds around 4GB to the Nominatim database.
 `nominatim refresh --wiki-data --importance`. Updating importances for
 a planet can take a couple of hours.

-### Great Britain, USA postcodes
+### External postcodes

-Nominatim can use postcodes from an external source to improve searches that
-involve a GB or US postcode. This data can be optionally downloaded into the
-project directory:
+Nominatim can use postcodes from an external source to improve searching with
+postcodes. We provide precomputed postcode sets for the US (using TIGER data)
+and the UK (using the [CodePoint OpenData set](https://osdatahub.os.uk/downloads/open/CodePointOpen)).
+This data can be optionally downloaded into the project directory:

     cd $PROJECT_DIR
-    wget https://www.nominatim.org/data/gb_postcode_data.sql.gz
-    wget https://www.nominatim.org/data/us_postcode_data.sql.gz
+    wget https://www.nominatim.org/data/gb_postcodes.csv.gz
+    wget https://www.nominatim.org/data/us_postcodes.csv.gz

+You can also add your own custom postcode sources, see
+[Customization of postcodes](Customization.md#external-postcode-data).

 ## Choosing the data to import
@@ -248,6 +252,9 @@ to verify that your installation is working. Go to
 `http://localhost:8088/status.php` and you should see the message `OK`.
 You can also run a search query, e.g. `http://localhost:8088/search.php?q=Berlin`.

+Note that search queries are not supported for reverse-only imports. You can run a
+reverse query instead, e.g. `http://localhost:8088/reverse.php?lat=27.1750090510034&lon=78.04209025`.

 To run Nominatim via webservers like Apache or nginx, please read the
 [Deployment chapter](Deployment.md).

View File

@@ -19,6 +19,7 @@ pages:
 - 'Import' : 'admin/Import.md'
 - 'Update' : 'admin/Update.md'
 - 'Deploy' : 'admin/Deployment.md'
+- 'Customize Imports' : 'admin/Customization.md'
 - 'Nominatim UI' : 'admin/Setup-Nominatim-UI.md'
 - 'Advanced Installations' : 'admin/Advanced-Installations.md'
 - 'Migration from older Versions' : 'admin/Migration.md'

View File

@@ -80,7 +80,6 @@ class AddressDetails
            }

            if (isset($sName)) {
-               $sTypeLabel = strtolower(str_replace(' ', '_', $sTypeLabel));
                if (!isset($aAddress[$sTypeLabel])
                    || $aLine['class'] == 'place'
                ) {

View File

@@ -227,3 +227,10 @@ function closestHouseNumber($aRow)
    return max(min($aRow['endnumber'], $iHn), $aRow['startnumber']);
}

+if (!function_exists('array_key_last')) {
+    function array_key_last(array $array)
+    {
+        if (!empty($array)) return key(array_slice($array, -1, 1, true));
+    }
+}

View File

@@ -0,0 +1,12 @@
<?php

require_once(CONST_LibDir.'/init-website.php');
require_once(CONST_LibDir.'/ParameterParser.php');

$oParams = new Nominatim\ParameterParser();

// Format for output
$sOutputFormat = $oParams->getSet('format', array('xml', 'json', 'jsonv2', 'geojson', 'geocodejson'), 'jsonv2');
set_exception_handler_by_format($sOutputFormat);

throw new Exception('Reverse-only import does not support forward searching.', 404);

View File

@@ -12,4 +12,6 @@ ALTER TABLE location_property_tiger_import RENAME TO location_property_tiger;
 ALTER INDEX IF EXISTS idx_location_property_tiger_parent_place_id_imp RENAME TO idx_location_property_tiger_housenumber_parent_place_id;
 ALTER INDEX IF EXISTS idx_location_property_tiger_place_id_imp RENAME TO idx_location_property_tiger_place_id;

-DROP FUNCTION tiger_line_import (linegeo geometry, in_startnumber integer, in_endnumber integer, interpolationtype text, in_street text, in_isin text, in_postcode text);
+DROP FUNCTION tiger_line_import (linegeo GEOMETRY, in_startnumber INTEGER,
+                                 in_endnumber INTEGER, interpolationtype TEXT,
+                                 token_info JSONB, in_postcode TEXT);

View File

@@ -1,9 +1,9 @@
 DROP TABLE IF EXISTS location_property_tiger_import;
 CREATE TABLE location_property_tiger_import (linegeo GEOMETRY, place_id BIGINT, partition INTEGER, parent_place_id BIGINT, startnumber INTEGER, endnumber INTEGER, interpolationtype TEXT, postcode TEXT);

 CREATE OR REPLACE FUNCTION tiger_line_import(linegeo GEOMETRY, in_startnumber INTEGER,
                                              in_endnumber INTEGER, interpolationtype TEXT,
-                                             in_street TEXT, in_isin TEXT, in_postcode TEXT) RETURNS INTEGER
+                                             token_info JSONB, in_postcode TEXT) RETURNS INTEGER
   AS $$
 DECLARE
   startnumber INTEGER;

@@ -27,13 +27,13 @@ BEGIN
   END IF;

   IF startnumber < 0 THEN
-    RAISE WARNING 'Negative house number range (% to %) on %, %', startnumber, endnumber, in_street, in_isin;
+    RAISE WARNING 'Negative house number range (% to %)', startnumber, endnumber;
     RETURN 0;
   END IF;

   numberrange := endnumber - startnumber;

-  IF (interpolationtype = 'odd' AND startnumber%2 = 0) OR (interpolationtype = 'even' AND startnumber%2 = 1) THEN
+  IF (interpolationtype = 'odd' AND startnumber % 2 = 0) OR (interpolationtype = 'even' AND startnumber % 2 = 1) THEN
     startnumber := startnumber + 1;
     stepsize := 2;
   ELSE

@@ -45,10 +45,10 @@ BEGIN
   END IF;

   -- Filter out really broken tiger data
   IF numberrange > 0 AND (numberrange::float/stepsize::float > 500)
      AND ST_length(linegeo)/(numberrange::float/stepsize::float) < 0.000001 THEN
-    RAISE WARNING 'Road too short for number range % to % on %, % (%)',startnumber,endnumber,in_street,in_isin,
+    RAISE WARNING 'Road too short for number range % to % (%)',startnumber,endnumber,
                   ST_length(linegeo)/(numberrange::float/stepsize::float);
     RETURN 0;
   END IF;

@@ -56,7 +56,7 @@ BEGIN
   out_partition := get_partition('us');
   out_parent_place_id := null;

-  address_street_word_ids := word_ids_from_name(in_street);
+  address_street_word_ids := token_addr_street_match_tokens(token_info);
   IF address_street_word_ids IS NOT NULL THEN
     out_parent_place_id := getNearestNamedRoadPlaceId(out_partition, place_centroid,
                                                       address_street_word_ids);

View File

@@ -1,58 +0,0 @@
-- Create a temporary table with postcodes from placex.
CREATE TEMP TABLE tmp_new_postcode_locations AS
SELECT country_code,
upper(trim (both ' ' from address->'postcode')) as pc,
ST_Centroid(ST_Collect(ST_Centroid(geometry))) as centroid
FROM placex
WHERE address ? 'postcode'
AND address->'postcode' NOT SIMILAR TO '%(,|;|:)%'
AND geometry IS NOT null
GROUP BY country_code, pc;
CREATE INDEX idx_tmp_new_postcode_locations
ON tmp_new_postcode_locations (pc, country_code);
-- add extra US postcodes
INSERT INTO tmp_new_postcode_locations (country_code, pc, centroid)
SELECT 'us', postcode, ST_SetSRID(ST_Point(x,y),4326)
FROM us_postcode u
WHERE NOT EXISTS (SELECT 0 FROM tmp_new_postcode_locations new
WHERE new.country_code = 'us' AND new.pc = u.postcode);
-- add extra UK postcodes
INSERT INTO tmp_new_postcode_locations (country_code, pc, centroid)
SELECT 'gb', postcode, geometry FROM gb_postcode g
WHERE NOT EXISTS (SELECT 0 FROM tmp_new_postcode_locations new
WHERE new.country_code = 'gb' and new.pc = g.postcode);
-- Remove all postcodes that are no longer valid
DELETE FROM location_postcode old
WHERE NOT EXISTS(SELECT 0 FROM tmp_new_postcode_locations new
WHERE old.postcode = new.pc
AND old.country_code = new.country_code);
-- Update geometries where necessary
UPDATE location_postcode old SET geometry = new.centroid, indexed_status = 1
FROM tmp_new_postcode_locations new
WHERE old.postcode = new.pc AND old.country_code = new.country_code
AND ST_AsText(old.geometry) != ST_AsText(new.centroid);
-- Remove all postcodes that already exist from the temporary table
DELETE FROM tmp_new_postcode_locations new
WHERE EXISTS(SELECT 0 FROM location_postcode old
WHERE old.postcode = new.pc AND old.country_code = new.country_code);
-- Add newly added postcode
INSERT INTO location_postcode
(place_id, indexed_status, country_code, postcode, geometry)
SELECT nextval('seq_place'), 1, country_code, pc, centroid
FROM tmp_new_postcode_locations new;
-- Remove unused word entries
DELETE FROM word
WHERE class = 'place' AND type = 'postcode'
AND NOT EXISTS (SELECT 0 FROM location_postcode p
WHERE p.postcode = word.word);
-- Finally index the newly inserted postcodes
UPDATE location_postcode SET indexed_status = 0 WHERE indexed_status > 0;

View File

@@ -13,7 +13,6 @@ from nominatim.tools.exec_utils import run_legacy_script, run_php_server
 from nominatim.errors import UsageError
 from nominatim import clicmd
 from nominatim.clicmd.args import NominatimArgs
-from nominatim.tools import tiger_data

 LOG = logging.getLogger()
@@ -147,9 +146,14 @@ class UpdateAddData:
     @staticmethod
     def run(args):
+        from nominatim.tokenizer import factory as tokenizer_factory
+        from nominatim.tools import tiger_data

         if args.tiger_data:
+            tokenizer = tokenizer_factory.get_tokenizer_for_db(args.config)
             return tiger_data.add_tiger_data(args.tiger_data,
-                                             args.config, args.threads or 1)
+                                             args.config, args.threads or 1,
+                                             tokenizer)

         params = ['update.php']
         if args.file:

View File

@@ -45,12 +45,19 @@ class UpdateRefresh:
     @staticmethod
     def run(args):
-        from ..tools import refresh
+        from ..tools import refresh, postcodes
         from ..tokenizer import factory as tokenizer_factory
+        from ..indexer.indexer import Indexer

+        tokenizer = tokenizer_factory.get_tokenizer_for_db(args.config)

         if args.postcodes:
             LOG.warning("Update postcodes centroid")
-            refresh.update_postcodes(args.config.get_libpq_dsn(), args.sqllib_dir)
+            postcodes.update_postcodes(args.config.get_libpq_dsn(),
+                                       args.project_dir, tokenizer)
+            indexer = Indexer(args.config.get_libpq_dsn(), tokenizer,
+                              args.threads or 1)
+            indexer.index_postcodes()

         if args.word_counts:
             LOG.warning('Recompute frequency of full-word search terms')
@@ -67,7 +74,6 @@ class UpdateRefresh:
             with connect(args.config.get_libpq_dsn()) as conn:
                 refresh.create_functions(conn, args.config,
                                          args.diffs, args.enable_debug_statements)
-            tokenizer = tokenizer_factory.get_tokenizer_for_db(args.config)
             tokenizer.update_sql_functions(args.config)

         if args.wiki_data:
@@ -88,6 +94,6 @@ class UpdateRefresh:
         if args.website:
             webdir = args.project_dir / 'website'
             LOG.warning('Setting up website directory at %s', webdir)
-            refresh.setup_website(webdir, args.config)
+            with connect(args.config.get_libpq_dsn()) as conn:
+                refresh.setup_website(webdir, args.config, conn)

         return 0

View File

@@ -116,8 +116,8 @@ class SetupAll:
         if args.continue_at is None or args.continue_at == 'load-data':
             LOG.warning('Calculate postcodes')
-            postcodes.import_postcodes(args.config.get_libpq_dsn(), args.project_dir,
-                                       tokenizer)
+            postcodes.update_postcodes(args.config.get_libpq_dsn(),
+                                       args.project_dir, tokenizer)

         if args.continue_at is None or args.continue_at in ('load-data', 'indexing'):
             if args.continue_at is not None and args.continue_at != 'load-data':

@@ -139,7 +139,8 @@ class SetupAll:
             webdir = args.project_dir / 'website'
             LOG.warning('Setup website at %s', webdir)
-            refresh.setup_website(webdir, args.config)
+            with connect(args.config.get_libpq_dsn()) as conn:
+                refresh.setup_website(webdir, args.config, conn)

             with connect(args.config.get_libpq_dsn()) as conn:
                 try:

View File

@@ -6,6 +6,9 @@
""" Database helper functions for the indexer. """ Database helper functions for the indexer.
""" """
import logging import logging
import select
import time
import psycopg2 import psycopg2
from psycopg2.extras import wait_select from psycopg2.extras import wait_select
@@ -25,8 +28,9 @@ class DeadlockHandler:
normally. normally.
""" """
def __init__(self, handler): def __init__(self, handler, ignore_sql_errors=False):
self.handler = handler self.handler = handler
self.ignore_sql_errors = ignore_sql_errors
def __enter__(self): def __enter__(self):
pass pass
@@ -41,6 +45,11 @@ class DeadlockHandler:
             if exc_value.pgcode == '40P01':
                 self.handler()
                 return True
+        if self.ignore_sql_errors and isinstance(exc_value, psycopg2.Error):
+            LOG.info("SQL error ignored: %s", exc_value)
+            return True

         return False
@@ -48,10 +57,11 @@ class DBConnection:
""" A single non-blocking database connection. """ A single non-blocking database connection.
""" """
def __init__(self, dsn, cursor_factory=None): def __init__(self, dsn, cursor_factory=None, ignore_sql_errors=False):
self.current_query = None self.current_query = None
self.current_params = None self.current_params = None
self.dsn = dsn self.dsn = dsn
self.ignore_sql_errors = ignore_sql_errors
self.conn = None self.conn = None
self.cursor = None self.cursor = None
@@ -98,7 +108,7 @@ class DBConnection:
""" Block until any pending operation is done. """ Block until any pending operation is done.
""" """
while True: while True:
with DeadlockHandler(self._deadlock_handler): with DeadlockHandler(self._deadlock_handler, self.ignore_sql_errors):
wait_select(self.conn) wait_select(self.conn)
self.current_query = None self.current_query = None
return return
@@ -125,9 +135,78 @@ class DBConnection:
         if self.current_query is None:
             return True

-        with DeadlockHandler(self._deadlock_handler):
+        with DeadlockHandler(self._deadlock_handler, self.ignore_sql_errors):
             if self.conn.poll() == psycopg2.extensions.POLL_OK:
                 self.current_query = None
                 return True

         return False
class WorkerPool:
    """ A pool of asynchronous database connections.

        The pool may be used as a context manager.
    """
    REOPEN_CONNECTIONS_AFTER = 100000

    def __init__(self, dsn, pool_size, ignore_sql_errors=False):
        self.threads = [DBConnection(dsn, ignore_sql_errors=ignore_sql_errors)
                        for _ in range(pool_size)]
        self.free_workers = self._yield_free_worker()
        self.wait_time = 0

    def finish_all(self):
        """ Wait for all connection to finish.
        """
        for thread in self.threads:
            while not thread.is_done():
                thread.wait()

        self.free_workers = self._yield_free_worker()

    def close(self):
        """ Close all connections and clear the pool.
        """
        for thread in self.threads:
            thread.close()
        self.threads = []
        self.free_workers = None

    def next_free_worker(self):
        """ Get the next free connection.
        """
        return next(self.free_workers)

    def _yield_free_worker(self):
        ready = self.threads
        command_stat = 0
        while True:
            for thread in ready:
                if thread.is_done():
                    command_stat += 1
                    yield thread

            if command_stat > self.REOPEN_CONNECTIONS_AFTER:
                for thread in self.threads:
                    while not thread.is_done():
                        thread.wait()
                    thread.connect()
                ready = self.threads
                command_stat = 0
            else:
                tstart = time.time()
                _, ready, _ = select.select([], self.threads, [])
                self.wait_time += time.time() - tstart

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.finish_all()
        self.close()
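The pool generalizes what the indexer previously did internally. A minimal
usage sketch (the DSN is assumed for the example):

```
from nominatim.db.async_connection import WorkerPool

dsn = 'dbname=nominatim'  # assumed database

# On exit, the context manager waits for all workers and closes them.
with WorkerPool(dsn, 4, ignore_sql_errors=True) as pool:
    for num in range(10):
        # next_free_worker() hands out an idle connection, blocking in
        # select() until one becomes free.
        pool.next_free_worker().perform('SELECT %s', args=(num, ))
```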

View File

@@ -2,14 +2,13 @@
 Main work horse for indexing (computing addresses) the database.
 """
 import logging
-import select
 import time

 import psycopg2.extras

 from nominatim.indexer.progress import ProgressLogger
 from nominatim.indexer import runners
-from nominatim.db.async_connection import DBConnection
+from nominatim.db.async_connection import DBConnection, WorkerPool
 from nominatim.db.connection import connect

 LOG = logging.getLogger()
@@ -81,73 +80,6 @@ class PlaceFetcher:
             self.conn.wait()
         self.close()

-class WorkerPool:
-    """ A pool of asynchronous database connections.
-
-        The pool may be used as a context manager.
-    """
-    REOPEN_CONNECTIONS_AFTER = 100000
-
-    def __init__(self, dsn, pool_size):
-        self.threads = [DBConnection(dsn) for _ in range(pool_size)]
-        self.free_workers = self._yield_free_worker()
-        self.wait_time = 0
-
-    def finish_all(self):
-        """ Wait for all connection to finish.
-        """
-        for thread in self.threads:
-            while not thread.is_done():
-                thread.wait()
-
-        self.free_workers = self._yield_free_worker()
-
-    def close(self):
-        """ Close all connections and clear the pool.
-        """
-        for thread in self.threads:
-            thread.close()
-        self.threads = []
-        self.free_workers = None
-
-    def next_free_worker(self):
-        """ Get the next free connection.
-        """
-        return next(self.free_workers)
-
-    def _yield_free_worker(self):
-        ready = self.threads
-        command_stat = 0
-        while True:
-            for thread in ready:
-                if thread.is_done():
-                    command_stat += 1
-                    yield thread
-
-            if command_stat > self.REOPEN_CONNECTIONS_AFTER:
-                for thread in self.threads:
-                    while not thread.is_done():
-                        thread.wait()
-                    thread.connect()
-                ready = self.threads
-                command_stat = 0
-            else:
-                tstart = time.time()
-                _, ready, _ = select.select([], self.threads, [])
-                self.wait_time += time.time() - tstart
-
-    def __enter__(self):
-        return self
-
-    def __exit__(self, exc_type, exc_value, traceback):
-        self.finish_all()
-        self.close()

 class Indexer:
     """ Main indexing routine.

View File

@@ -263,6 +263,16 @@ class LegacyICUNameAnalyzer:
""" """
return self.normalizer.transliterate(phrase) return self.normalizer.transliterate(phrase)
@staticmethod
def normalize_postcode(postcode):
""" Convert the postcode to a standardized form.
This function must yield exactly the same result as the SQL function
'token_normalized_postcode()'.
"""
return postcode.strip().upper()
@functools.lru_cache(maxsize=1024) @functools.lru_cache(maxsize=1024)
def make_standard_word(self, name): def make_standard_word(self, name):
""" Create the normalised version of the input. """ Create the normalised version of the input.
@@ -285,25 +295,44 @@ class LegacyICUNameAnalyzer:
         return self.transliterator.transliterate(hnr)

-    def add_postcodes_from_db(self):
-        """ Add postcodes from the location_postcode table to the word table.
-        """
-        copystr = io.StringIO()
-        with self.conn.cursor() as cur:
-            cur.execute("SELECT distinct(postcode) FROM location_postcode")
-            for (postcode, ) in cur:
-                copystr.write(postcode)
-                copystr.write('\t ')
-                copystr.write(self.transliterator.transliterate(postcode))
-                copystr.write('\tplace\tpostcode\t0\n')
-
-            copystr.seek(0)
-            cur.copy_from(copystr, 'word',
-                          columns=['word', 'word_token', 'class', 'type',
-                                   'search_name_count'])
-            # Don't really need an ID for postcodes....
-            # cur.execute("""UPDATE word SET word_id = nextval('seq_word')
-            #                WHERE word_id is null and type = 'postcode'""")
+    def update_postcodes_from_db(self):
+        """ Update postcode tokens in the word table from the location_postcode
+            table.
+        """
+        to_delete = []
+        copystr = io.StringIO()
+        with self.conn.cursor() as cur:
+            # This finds us the rows in location_postcode and word that are
+            # missing in the other table.
+            cur.execute("""SELECT * FROM
+                            (SELECT pc, word FROM
+                              (SELECT distinct(postcode) as pc FROM location_postcode) p
+                              FULL JOIN
+                              (SELECT word FROM word
+                                WHERE class ='place' and type = 'postcode') w
+                              ON pc = word) x
+                           WHERE pc is null or word is null""")
+
+            for postcode, word in cur:
+                if postcode is None:
+                    to_delete.append(word)
+                else:
+                    copystr.write(postcode)
+                    copystr.write('\t ')
+                    copystr.write(self.transliterator.transliterate(postcode))
+                    copystr.write('\tplace\tpostcode\t0\n')
+
+            if to_delete:
+                cur.execute("""DELETE FROM WORD
+                               WHERE class ='place' and type = 'postcode'
+                                     and word = any(%s)
+                            """, (to_delete, ))
+
+            if copystr.getvalue():
+                copystr.seek(0)
+                cur.copy_from(copystr, 'word',
+                              columns=['word', 'word_token', 'class', 'type',
+                                       'search_name_count'])

     def update_special_phrases(self, phrases, should_replace):
@@ -435,22 +464,25 @@ class LegacyICUNameAnalyzer:
     def _add_postcode(self, postcode):
         """ Make sure the normalized postcode is present in the word table.
         """
-        if re.search(r'[:,;]', postcode) is None and not postcode in self._cache.postcodes:
-            term = self.make_standard_word(postcode)
-            if not term:
-                return
-
-            with self.conn.cursor() as cur:
-                # no word_id needed for postcodes
-                cur.execute("""INSERT INTO word (word, word_token, class, type,
-                                                 search_name_count)
-                               (SELECT pc, %s, 'place', 'postcode', 0
-                                FROM (VALUES (%s)) as v(pc)
-                                WHERE NOT EXISTS
-                                 (SELECT * FROM word
-                                  WHERE word = pc and class='place' and type='postcode'))
-                            """, (' ' + term, postcode))
-            self._cache.postcodes.add(postcode)
+        if re.search(r'[:,;]', postcode) is None:
+            postcode = self.normalize_postcode(postcode)
+
+            if postcode not in self._cache.postcodes:
+                term = self.make_standard_word(postcode)
+                if not term:
+                    return
+
+                with self.conn.cursor() as cur:
+                    # no word_id needed for postcodes
+                    cur.execute("""INSERT INTO word (word, word_token, class, type,
+                                                     search_name_count)
+                                   (SELECT pc, %s, 'place', 'postcode', 0
+                                    FROM (VALUES (%s)) as v(pc)
+                                    WHERE NOT EXISTS
+                                     (SELECT * FROM word
+                                      WHERE word = pc and class='place' and type='postcode'))
+                                """, (' ' + term, postcode))
+                self._cache.postcodes.add(postcode)

     @staticmethod
     def _split_housenumbers(hnrs):

View File

@@ -305,13 +305,51 @@ class LegacyNameAnalyzer:
         return self.normalizer.transliterate(phrase)

+    @staticmethod
+    def normalize_postcode(postcode):
+        """ Convert the postcode to a standardized form.
+
+            This function must yield exactly the same result as the SQL function
+            'token_normalized_postcode()'.
+        """
+        return postcode.strip().upper()

-    def add_postcodes_from_db(self):
-        """ Add postcodes from the location_postcode table to the word table.
-        """
-        with self.conn.cursor() as cur:
-            cur.execute("""SELECT count(create_postcode_id(pc))
-                           FROM (SELECT distinct(postcode) as pc
-                                 FROM location_postcode) x""")
+    def update_postcodes_from_db(self):
+        """ Update postcode tokens in the word table from the location_postcode
+            table.
+        """
+        with self.conn.cursor() as cur:
+            # This finds us the rows in location_postcode and word that are
+            # missing in the other table.
+            cur.execute("""SELECT * FROM
+                            (SELECT pc, word FROM
+                              (SELECT distinct(postcode) as pc FROM location_postcode) p
+                              FULL JOIN
+                              (SELECT word FROM word
+                                WHERE class ='place' and type = 'postcode') w
+                              ON pc = word) x
+                           WHERE pc is null or word is null""")
+
+            to_delete = []
+            to_add = []
+
+            for postcode, word in cur:
+                if postcode is None:
+                    to_delete.append(word)
+                else:
+                    to_add.append(postcode)
+
+            if to_delete:
+                cur.execute("""DELETE FROM WORD
+                               WHERE class ='place' and type = 'postcode'
+                                     and word = any(%s)
+                            """, (to_delete, ))
+            if to_add:
+                cur.execute("""SELECT count(create_postcode_id(pc))
+                               FROM unnest(%s) as pc
+                            """, (to_add, ))

     def update_special_phrases(self, phrases, should_replace):
@@ -416,12 +454,8 @@ class LegacyNameAnalyzer:
     def _add_postcode(self, postcode):
         """ Make sure the normalized postcode is present in the word table.
         """
-        def _create_postcode_from_db(pcode):
-            with self.conn.cursor() as cur:
-                cur.execute('SELECT create_postcode_id(%s)', (pcode, ))
-
         if re.search(r'[:,;]', postcode) is None:
-            self._cache.postcodes.get(postcode.strip().upper(), _create_postcode_from_db)
+            self._cache.add_postcode(self.conn, self.normalize_postcode(postcode))

 class _TokenInfo:
@@ -552,16 +586,19 @@ class _TokenCache:
FROM generate_series(1, 100) as i""") FROM generate_series(1, 100) as i""")
self._cached_housenumbers = {str(r[0]) : r[1] for r in cur} self._cached_housenumbers = {str(r[0]) : r[1] for r in cur}
# Get postcodes that are already saved # For postcodes remember the ones that have already been added
postcodes = OrderedDict() self.postcodes = set()
with conn.cursor() as cur:
cur.execute("""SELECT word FROM word
WHERE class ='place' and type = 'postcode'""")
for row in cur:
postcodes[row[0]] = None
self.postcodes = _LRU(maxsize=32, init_data=postcodes)
def get_housenumber(self, number): def get_housenumber(self, number):
""" Get a housenumber token from the cache. """ Get a housenumber token from the cache.
""" """
return self._cached_housenumbers.get(number) return self._cached_housenumbers.get(number)
def add_postcode(self, conn, postcode):
""" Make sure the given postcode is in the database.
"""
if postcode not in self.postcodes:
with conn.cursor() as cur:
cur.execute('SELECT create_postcode_id(%s)', (postcode, ))
self.postcodes.add(postcode)

View File

@@ -185,8 +185,8 @@ def install_legacy_tokenizer(conn, config, **_):
                            WHERE table_name = %s
                            and column_name = 'token_info'""",
                         (table, ))
             if has_column == 0:
                 cur.execute('ALTER TABLE {} ADD COLUMN token_info JSONB'.format(table))

     tokenizer = tokenizer_factory.create_tokenizer(config, init_db=False,
                                                    module_name='legacy')

View File

@@ -2,80 +2,196 @@
 Functions for importing, updating and otherwise maintaining the table
 of artificial postcode centroids.
 """
+import csv
+import gzip
+import logging
+from math import isfinite
+
+from psycopg2.extras import execute_values
+
-from nominatim.db.utils import execute_file
 from nominatim.db.connection import connect

+LOG = logging.getLogger()

-def import_postcodes(dsn, project_dir, tokenizer):
-    """ Set up the initial list of postcodes.
-    """
-
-    with connect(dsn) as conn:
-        conn.drop_table('gb_postcode')
-        conn.drop_table('us_postcode')
-
-        with conn.cursor() as cur:
-            cur.execute("""CREATE TABLE gb_postcode (
-                            id integer,
-                            postcode character varying(9),
-                            geometry GEOMETRY(Point, 4326))""")
-
-        with conn.cursor() as cur:
-            cur.execute("""CREATE TABLE us_postcode (
-                            postcode text,
-                            x double precision,
-                            y double precision)""")
-        conn.commit()
-
-        gb_postcodes = project_dir / 'gb_postcode_data.sql.gz'
-        if gb_postcodes.is_file():
-            execute_file(dsn, gb_postcodes)
-
-        us_postcodes = project_dir / 'us_postcode_data.sql.gz'
-        if us_postcodes.is_file():
-            execute_file(dsn, us_postcodes)
-
-        with conn.cursor() as cur:
-            cur.execute("TRUNCATE location_postcode")
-            cur.execute("""
-                INSERT INTO location_postcode
-                 (place_id, indexed_status, country_code, postcode, geometry)
-                SELECT nextval('seq_place'), 1, country_code,
-                       token_normalized_postcode(address->'postcode') as pc,
-                       ST_Centroid(ST_Collect(ST_Centroid(geometry)))
-                  FROM placex
-                 WHERE address ? 'postcode'
-                       and token_normalized_postcode(address->'postcode') is not null
-                       AND geometry IS NOT null
-                 GROUP BY country_code, pc
-            """)
-            cur.execute("""
-                INSERT INTO location_postcode
-                 (place_id, indexed_status, country_code, postcode, geometry)
-                SELECT nextval('seq_place'), 1, 'us',
-                       token_normalized_postcode(postcode),
-                       ST_SetSRID(ST_Point(x,y),4326)
-                  FROM us_postcode WHERE token_normalized_postcode(postcode) NOT IN
-                        (SELECT postcode FROM location_postcode
-                          WHERE country_code = 'us')
-            """)
-            cur.execute("""
-                INSERT INTO location_postcode
-                 (place_id, indexed_status, country_code, postcode, geometry)
-                SELECT nextval('seq_place'), 1, 'gb',
-                       token_normalized_postcode(postcode), geometry
-                  FROM gb_postcode WHERE token_normalized_postcode(postcode) NOT IN
-                        (SELECT postcode FROM location_postcode
-                          WHERE country_code = 'gb')
-            """)
-            cur.execute("""
-                DELETE FROM word WHERE class='place' and type='postcode'
-                    and word NOT IN (SELECT postcode FROM location_postcode)
-            """)
-        conn.commit()
-
-    with tokenizer.name_analyzer() as analyzer:
-        analyzer.add_postcodes_from_db()
+def _to_float(num, max_value):
+    """ Convert the number in string into a float. The number is expected
+        to be in the range of [-max_value, max_value]. Otherwise rises a
+        ValueError.
+    """
+    num = float(num)
+    if not isfinite(num) or num <= -max_value or num >= max_value:
+        raise ValueError()
+
+    return num
+
+class _CountryPostcodesCollector:
+    """ Collector for postcodes of a single country.
+    """
+
+    def __init__(self, country):
+        self.country = country
+        self.collected = dict()
+
+    def add(self, postcode, x, y):
+        """ Add the given postcode to the collection cache. If the postcode
+            already existed, it is overwritten with the new centroid.
+        """
+        self.collected[postcode] = (x, y)
+
+    def commit(self, conn, analyzer, project_dir):
+        """ Update postcodes for the country from the postcodes selected so far
+            as well as any externally supplied postcodes.
+        """
+        self._update_from_external(analyzer, project_dir)
+        to_add, to_delete, to_update = self._compute_changes(conn)
+
+        LOG.info("Processing country '%s' (%s added, %s deleted, %s updated).",
+                 self.country, len(to_add), len(to_delete), len(to_update))
+
+        with conn.cursor() as cur:
+            if to_add:
+                execute_values(cur,
+                               """INSERT INTO location_postcode
+                                      (place_id, indexed_status, country_code,
+                                       postcode, geometry) VALUES %s""",
+                               to_add,
+                               template="""(nextval('seq_place'), 1, '{}',
+                                           %s, 'SRID=4326;POINT(%s %s)')
+                                        """.format(self.country))
+            if to_delete:
+                cur.execute("""DELETE FROM location_postcode
+                               WHERE country_code = %s and postcode = any(%s)
+                            """, (self.country, to_delete))
+            if to_update:
+                execute_values(cur,
+                               """UPDATE location_postcode
+                                  SET indexed_status = 2,
+                                      geometry = ST_SetSRID(ST_Point(v.x, v.y), 4326)
+                                  FROM (VALUES %s) AS v (pc, x, y)
+                                  WHERE country_code = '{}' and postcode = pc
+                               """.format(self.country),
+                               to_update)
+
+    def _compute_changes(self, conn):
+        """ Compute which postcodes from the collected postcodes have to be
+            added or modified and which from the location_postcode table
+            have to be deleted.
+        """
+        to_update = []
+        to_delete = []
+        with conn.cursor() as cur:
+            cur.execute("""SELECT postcode, ST_X(geometry), ST_Y(geometry)
+                           FROM location_postcode
+                           WHERE country_code = %s""",
+                        (self.country, ))
+            for postcode, x, y in cur:
+                newx, newy = self.collected.pop(postcode, (None, None))
+                if newx is not None:
+                    dist = (x - newx)**2 + (y - newy)**2
+                    if dist > 0.0000001:
+                        to_update.append((postcode, newx, newy))
+                else:
+                    to_delete.append(postcode)
+
+        to_add = [(k, v[0], v[1]) for k, v in self.collected.items()]
+        self.collected = []
+
+        return to_add, to_delete, to_update
+
+    def _update_from_external(self, analyzer, project_dir):
+        """ Look for an external postcode file for the active country in
+            the project directory and add missing postcodes when found.
+        """
+        csvfile = self._open_external(project_dir)
+        if csvfile is None:
+            return
+
+        try:
+            reader = csv.DictReader(csvfile)
+            for row in reader:
+                if 'postcode' not in row or 'lat' not in row or 'lon' not in row:
+                    LOG.warning("Bad format for external postcode file for country '%s'."
+                                " Ignored.", self.country)
+                    return
+                postcode = analyzer.normalize_postcode(row['postcode'])
+                if postcode not in self.collected:
+                    try:
+                        self.collected[postcode] = (_to_float(row['lon'], 180),
+                                                    _to_float(row['lat'], 90))
+                    except ValueError:
+                        LOG.warning("Bad coordinates %s, %s in %s country postcode file.",
+                                    row['lat'], row['lon'], self.country)
+        finally:
+            csvfile.close()
+
+    def _open_external(self, project_dir):
+        fname = project_dir / '{}_postcodes.csv'.format(self.country)
+
+        if fname.is_file():
+            LOG.info("Using external postcode file '%s'.", fname)
+            return open(fname, 'r')
+
+        fname = project_dir / '{}_postcodes.csv.gz'.format(self.country)
+
+        if fname.is_file():
+            LOG.info("Using external postcode file '%s'.", fname)
+            return gzip.open(fname, 'rt')
+
+        return None
+
+def update_postcodes(dsn, project_dir, tokenizer):
+    """ Update the table of artificial postcodes.
+
+        Computes artificial postcode centroids from the placex table,
+        potentially enhances it with external data and then updates the
+        postcodes in the table 'location_postcode'.
+    """
+    with tokenizer.name_analyzer() as analyzer:
+        with connect(dsn) as conn:
+            # First get the list of countries that currently have postcodes.
+            # (Doing this before starting to insert, so it is fast on import.)
+            with conn.cursor() as cur:
+                cur.execute("SELECT DISTINCT country_code FROM location_postcode")
+                todo_countries = set((row[0] for row in cur))
+
+            # Recompute the list of valid postcodes from placex.
+            with conn.cursor(name="placex_postcodes") as cur:
+                cur.execute("""SELECT country_code, pc, ST_X(centroid), ST_Y(centroid)
+                               FROM (
+                                 SELECT country_code,
+                                        token_normalized_postcode(address->'postcode') as pc,
+                                        ST_Centroid(ST_Collect(ST_Centroid(geometry))) as centroid
+                                 FROM placex
+                                 WHERE address ? 'postcode' and geometry IS NOT null
+                                       and country_code is not null
+                                 GROUP BY country_code, pc) xx
+                               WHERE pc is not null
+                               ORDER BY country_code, pc""")
+
+                collector = None
+
+                for country, postcode, x, y in cur:
+                    if collector is None or country != collector.country:
+                        if collector is not None:
+                            collector.commit(conn, analyzer, project_dir)
+                        collector = _CountryPostcodesCollector(country)
+                        todo_countries.discard(country)
+                    collector.add(postcode, x, y)
+
+                if collector is not None:
+                    collector.commit(conn, analyzer, project_dir)
+
+            # Now handle any countries that are only in the postcode table.
+            for country in todo_countries:
+                _CountryPostcodesCollector(country).commit(conn, analyzer, project_dir)
+
+            conn.commit()
+
+        analyzer.update_postcodes_from_db()
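The delta computation is the heart of the new update logic. A standalone
re-implementation sketch with made-up data (not the module's API, just the
same algorithm as `_compute_changes()` above):

```
def compute_changes(collected, existing):
    """ Split collected {pc: (x, y)} against existing [(pc, x, y)] rows. """
    to_update, to_delete = [], []
    for pc, x, y in existing:
        newx, newy = collected.pop(pc, (None, None))
        if newx is None:
            to_delete.append(pc)                      # postcode vanished
        elif (x - newx)**2 + (y - newy)**2 > 0.0000001:
            to_update.append((pc, newx, newy))        # centroid moved
    to_add = [(pc, x, y) for pc, (x, y) in collected.items()]
    return to_add, to_delete, to_update

assert compute_changes({'12345': (8.0, 50.0)}, [('99999', 1.0, 1.0)]) == \
       ([('12345', 8.0, 50.0)], ['99999'], [])
```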

View File

@@ -13,12 +13,6 @@ from nominatim.version import NOMINATIM_VERSION
 LOG = logging.getLogger()

-def update_postcodes(dsn, sql_dir):
-    """ Recalculate postcode centroids and add, remove and update entries in the
-        location_postcode table. `conn` is an opne connection to the database.
-    """
-    execute_file(dsn, sql_dir / 'update-postcodes.sql')

 def recompute_word_counts(dsn, sql_dir):
     """ Compute the frequency of full-word search terms.
     """

@@ -161,7 +155,7 @@ def recompute_importance(conn):
     conn.commit()

-def setup_website(basedir, config):
+def setup_website(basedir, config, conn):
     """ Create the website script stubs.
     """
     if not basedir.exists():

@@ -193,5 +187,10 @@ def setup_website(basedir, config):
     template += "\nrequire_once('{}/website/{{}}');\n".format(config.lib_dir.php)

+    search_name_table_exists = bool(conn and conn.table_exists('search_name'))
+
     for script in WEBSITE_SCRIPTS:
-        (basedir / script).write_text(template.format(script), 'utf-8')
+        if not search_name_table_exists and script == 'search.php':
+            (basedir / script).write_text(template.format('reverse-only-search.php'), 'utf-8')
+        else:
+            (basedir / script).write_text(template.format(script), 'utf-8')

View File

@@ -1,15 +1,18 @@
""" """
Functions for importing tiger data and handling tarbar and directory files Functions for importing tiger data and handling tarbar and directory files
""" """
import csv
import io
import logging import logging
import os import os
import tarfile import tarfile
import selectors
import psycopg2.extras
from nominatim.db.connection import connect from nominatim.db.connection import connect
from nominatim.db.async_connection import DBConnection from nominatim.db.async_connection import WorkerPool
from nominatim.db.sql_preprocessor import SQLPreprocessor from nominatim.db.sql_preprocessor import SQLPreprocessor
from nominatim.errors import UsageError
LOG = logging.getLogger() LOG = logging.getLogger()
@@ -20,96 +23,81 @@ def handle_tarfile_or_directory(data_dir):
     tar = None
     if data_dir.endswith('.tar.gz'):
-        tar = tarfile.open(data_dir)
-        sql_files = [i for i in tar.getmembers() if i.name.endswith('.sql')]
-        LOG.warning("Found %d SQL files in tarfile with path %s", len(sql_files), data_dir)
-        if not sql_files:
+        try:
+            tar = tarfile.open(data_dir)
+        except tarfile.ReadError as err:
+            LOG.fatal("Cannot open '%s'. Is this a tar file?", data_dir)
+            raise UsageError("Cannot open Tiger data file.") from err
+
+        csv_files = [i for i in tar.getmembers() if i.name.endswith('.csv')]
+        LOG.warning("Found %d CSV files in tarfile with path %s", len(csv_files), data_dir)
+        if not csv_files:
             LOG.warning("Tiger data import selected but no files in tarfile's path %s", data_dir)
             return None, None
     else:
         files = os.listdir(data_dir)
-        sql_files = [os.path.join(data_dir, i) for i in files if i.endswith('.sql')]
-        LOG.warning("Found %d SQL files in path %s", len(sql_files), data_dir)
-        if not sql_files:
+        csv_files = [os.path.join(data_dir, i) for i in files if i.endswith('.csv')]
+        LOG.warning("Found %d CSV files in path %s", len(csv_files), data_dir)
+        if not csv_files:
             LOG.warning("Tiger data import selected but no files found in path %s", data_dir)
             return None, None

-    return sql_files, tar
+    return csv_files, tar
-def handle_threaded_sql_statements(sel, file):
+def handle_threaded_sql_statements(pool, fd, analyzer):
     """ Handles sql statement with multiplexing
     """
     lines = 0
-    end_of_file = False
     # Using pool of database connections to execute sql statements
-    while not end_of_file:
-        for key, _ in sel.select(1):
-            conn = key.data
-            try:
-                if conn.is_done():
-                    sql_query = file.readline()
-                    lines += 1
-                    if not sql_query:
-                        end_of_file = True
-                        break
-                    conn.perform(sql_query)
-                    if lines == 1000:
-                        print('. ', end='', flush=True)
-                        lines = 0
-            except Exception as exc: # pylint: disable=broad-except
-                LOG.info('Wrong SQL statement: %s', exc)
-
-def handle_unregister_connection_pool(sel, place_threads):
-    """ Handles unregistering pool of connections
-    """
-    while place_threads > 0:
-        for key, _ in sel.select(1):
-            conn = key.data
-            sel.unregister(conn)
-            try:
-                conn.wait()
-            except Exception as exc: # pylint: disable=broad-except
-                LOG.info('Wrong SQL statement: %s', exc)
-            conn.close()
-            place_threads -= 1
+
+    sql = "SELECT tiger_line_import(%s, %s, %s, %s, %s, %s)"
+
+    for row in csv.DictReader(fd, delimiter=';'):
+        try:
+            address = dict(street=row['street'], postcode=row['postcode'])
+            args = ('SRID=4326;' + row['geometry'],
+                    int(row['from']), int(row['to']), row['interpolation'],
+                    psycopg2.extras.Json(analyzer.process_place(dict(address=address))),
+                    analyzer.normalize_postcode(row['postcode']))
+        except ValueError:
+            continue
+        pool.next_free_worker().perform(sql, args=args)
+
+        lines += 1
+        if lines == 1000:
+            print('.', end='', flush=True)
+            lines = 0

-def add_tiger_data(data_dir, config, threads):
+def add_tiger_data(data_dir, config, threads, tokenizer):
     """ Import tiger data from directory or tar file `data dir`.
     """
     dsn = config.get_libpq_dsn()
-    sql_files, tar = handle_tarfile_or_directory(data_dir)
+    files, tar = handle_tarfile_or_directory(data_dir)

-    if not sql_files:
+    if not files:
         return

     with connect(dsn) as conn:
         sql = SQLPreprocessor(conn, config)
         sql.run_sql_file(conn, 'tiger_import_start.sql')

-    # Reading sql_files and then for each file line handling
+    # Reading files and then for each file line handling
     # sql_query in <threads - 1> chunks.
-    sel = selectors.DefaultSelector()
     place_threads = max(1, threads - 1)

-    # Creates a pool of database connections
-    for _ in range(place_threads):
-        conn = DBConnection(dsn)
-        conn.connect()
-        sel.register(conn, selectors.EVENT_WRITE, conn)
-
-    for sql_file in sql_files:
-        if not tar:
-            file = open(sql_file)
-        else:
-            file = tar.extractfile(sql_file)
-
-        handle_threaded_sql_statements(sel, file)
-
-    # Unregistering pool of database connections
-    handle_unregister_connection_pool(sel, place_threads)
+    with WorkerPool(dsn, place_threads, ignore_sql_errors=True) as pool:
+        with tokenizer.name_analyzer() as analyzer:
+            for fname in files:
+                if not tar:
+                    fd = open(fname)
+                else:
+                    fd = io.TextIOWrapper(tar.extractfile(fname))
+
+                handle_threaded_sql_statements(pool, fd, analyzer)
+
+                fd.close()

     if tar:
         tar.close()
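For orientation: the importer now reads semicolon-delimited CSV rows and turns
each one into the argument tuple for `tiger_line_import()`. A standalone sketch
with a made-up row (only the four plain columns are shown; the real code
additionally passes the tokenizer output and the normalized postcode):

```
import csv
import io

data = ("from;to;interpolation;street;postcode;geometry\n"
        "2;8;even;Main St;12345;LINESTRING(-86.5 32.3,-86.4 32.3)\n")

row = next(csv.DictReader(io.StringIO(data), delimiter=';'))

# mirrors the first half of the args tuple built in
# handle_threaded_sql_statements()
args = ('SRID=4326;' + row['geometry'],
        int(row['from']), int(row['to']), row['interpolation'])
print(args)
```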

View File

@@ -9,6 +9,7 @@ sys.path.insert(1, str((Path(__file__) / '..' / '..' / '..' / '..').resolve()))
 from nominatim import cli
 from nominatim.config import Configuration
+from nominatim.db.connection import _Connection
 from nominatim.tools import refresh
 from nominatim.tokenizer import factory as tokenizer_factory
 from steps.utils import run_script

@@ -54,7 +55,7 @@ class NominatimEnvironment:
             dbargs['user'] = self.db_user
         if self.db_pass:
             dbargs['password'] = self.db_pass
-        conn = psycopg2.connect(**dbargs)
+        conn = psycopg2.connect(connection_factory=_Connection, **dbargs)
         return conn

     def next_code_coverage_file(self):
@@ -110,8 +111,13 @@ class NominatimEnvironment:
             self.website_dir.cleanup()

         self.website_dir = tempfile.TemporaryDirectory()
+
+        try:
+            conn = self.connect_database(dbname)
+        except:
+            conn = False
         refresh.setup_website(Path(self.website_dir.name) / 'website',
-                              self.get_test_config())
+                              self.get_test_config(), conn)

     def get_test_config(self):
@@ -228,13 +234,13 @@ class NominatimEnvironment:
""" Setup a test against a fresh, empty test database. """ Setup a test against a fresh, empty test database.
""" """
self.setup_template_db() self.setup_template_db()
self.write_nominatim_config(self.test_db)
conn = self.connect_database(self.template_db) conn = self.connect_database(self.template_db)
conn.set_isolation_level(0) conn.set_isolation_level(0)
cur = conn.cursor() cur = conn.cursor()
cur.execute('DROP DATABASE IF EXISTS {}'.format(self.test_db)) cur.execute('DROP DATABASE IF EXISTS {}'.format(self.test_db))
cur.execute('CREATE DATABASE {} TEMPLATE = {}'.format(self.test_db, self.template_db)) cur.execute('CREATE DATABASE {} TEMPLATE = {}'.format(self.test_db, self.template_db))
conn.close() conn.close()
self.write_nominatim_config(self.test_db)
context.db = self.connect_database(self.test_db) context.db = self.connect_database(self.test_db)
context.db.autocommit = True context.db.autocommit = True
psycopg2.extras.register_hstore(context.db, globally=False) psycopg2.extras.register_hstore(context.db, globally=False)

View File

@@ -251,7 +251,7 @@ def check_location_postcode(context):
     with context.db.cursor(cursor_factory=psycopg2.extras.DictCursor) as cur:
         cur.execute("SELECT *, ST_AsText(geometry) as geomtxt FROM location_postcode")
         assert cur.rowcount == len(list(context.table)), \
-            "Postcode table has {} rows, expected {}.".foramt(cur.rowcount, len(list(context.table)))
+            "Postcode table has {} rows, expected {}.".format(cur.rowcount, len(list(context.table)))

         results = {}
         for row in cur:

View File

@@ -19,6 +19,7 @@ from nominatim.db.sql_preprocessor import SQLPreprocessor
 from nominatim.db import properties

 import dummy_tokenizer
+import mocks

 class _TestingCursor(psycopg2.extras.DictCursor):
     """ Extension to the DictCursor class that provides execution
@@ -211,33 +212,7 @@ def place_row(place_table, temp_db_cursor):
 def placex_table(temp_db_with_extensions, temp_db_conn):
     """ Create an empty version of the place table.
     """
-    with temp_db_conn.cursor() as cur:
-        cur.execute("""CREATE TABLE placex (
-                           place_id BIGINT,
-                           parent_place_id BIGINT,
-                           linked_place_id BIGINT,
-                           importance FLOAT,
-                           indexed_date TIMESTAMP,
-                           geometry_sector INTEGER,
-                           rank_address SMALLINT,
-                           rank_search SMALLINT,
-                           partition SMALLINT,
-                           indexed_status SMALLINT,
-                           osm_id int8,
-                           osm_type char(1),
-                           class text,
-                           type text,
-                           name hstore,
-                           admin_level smallint,
-                           address hstore,
-                           extratags hstore,
-                           geometry Geometry(Geometry,4326),
-                           wikipedia TEXT,
-                           country_code varchar(2),
-                           housenumber TEXT,
-                           postcode TEXT,
-                           centroid GEOMETRY(Geometry, 4326))""")
-    temp_db_conn.commit()
+    return mocks.MockPlacexTable(temp_db_conn)

 @pytest.fixture

@@ -262,18 +237,8 @@ def osmline_table(temp_db_with_extensions, temp_db_conn):

 @pytest.fixture
-def word_table(temp_db, temp_db_conn):
-    with temp_db_conn.cursor() as cur:
-        cur.execute("""CREATE TABLE word (
-                           word_id INTEGER,
-                           word_token text,
-                           word text,
-                           class text,
-                           type text,
-                           country_code varchar(2),
-                           search_name_count INTEGER,
-                           operator TEXT)""")
-    temp_db_conn.commit()
+def word_table(temp_db_conn):
+    return mocks.MockWordTable(temp_db_conn)

 @pytest.fixture

View File

@@ -51,7 +51,10 @@ class DummyNameAnalyzer:
     def close(self):
         pass

-    def add_postcodes_from_db(self):
+    def normalize_postcode(self, postcode):
+        return postcode
+
+    def update_postcodes_from_db(self):
         pass

     def update_special_phrases(self, phrases, should_replace):

View File

@@ -1,7 +1,9 @@
""" """
Custom mocks for testing. Custom mocks for testing.
""" """
import itertools
import psycopg2.extras
class MockParamCapture: class MockParamCapture:
""" Mock that records the parameters with which a function was called """ Mock that records the parameters with which a function was called
@@ -16,3 +18,110 @@ class MockParamCapture:
         self.last_args = args
         self.last_kwargs = kwargs
         return self.return_value
class MockWordTable:
    """ A word table for testing.
    """
    def __init__(self, conn):
        self.conn = conn
        with conn.cursor() as cur:
            cur.execute("""CREATE TABLE word (word_id INTEGER,
                                              word_token text,
                                              word text,
                                              class text,
                                              type text,
                                              country_code varchar(2),
                                              search_name_count INTEGER,
                                              operator TEXT)""")

        conn.commit()

    def add_special(self, word_token, word, cls, typ, op):
        with self.conn.cursor() as cur:
            cur.execute("""INSERT INTO word (word_token, word, class, type, operator)
                           VALUES (%s, %s, %s, %s, %s)
                        """, (word_token, word, cls, typ, op))
        self.conn.commit()

    def add_postcode(self, word_token, postcode):
        with self.conn.cursor() as cur:
            cur.execute("""INSERT INTO word (word_token, word, class, type)
                           VALUES (%s, %s, 'place', 'postcode')
                        """, (word_token, postcode))
        self.conn.commit()

    def count(self):
        with self.conn.cursor() as cur:
            return cur.scalar("SELECT count(*) FROM word")

    def count_special(self):
        with self.conn.cursor() as cur:
            return cur.scalar("SELECT count(*) FROM word WHERE class != 'place'")

    def get_special(self):
        with self.conn.cursor() as cur:
            cur.execute("""SELECT word_token, word, class, type, operator
                           FROM word WHERE class != 'place'""")
            return set((tuple(row) for row in cur))

    def get_postcodes(self):
        with self.conn.cursor() as cur:
            cur.execute("""SELECT word FROM word
                           WHERE class = 'place' and type = 'postcode'""")
            return set((row[0] for row in cur))

class MockPlacexTable:
    """ A placex table for testing.
    """
    def __init__(self, conn):
        self.idseq = itertools.count(10000)
        self.conn = conn
        with conn.cursor() as cur:
            cur.execute("""CREATE TABLE placex (
                               place_id BIGINT,
                               parent_place_id BIGINT,
                               linked_place_id BIGINT,
                               importance FLOAT,
                               indexed_date TIMESTAMP,
                               geometry_sector INTEGER,
                               rank_address SMALLINT,
                               rank_search SMALLINT,
                               partition SMALLINT,
                               indexed_status SMALLINT,
                               osm_id int8,
                               osm_type char(1),
                               class text,
                               type text,
                               name hstore,
                               admin_level smallint,
                               address hstore,
                               extratags hstore,
                               geometry Geometry(Geometry,4326),
                               wikipedia TEXT,
                               country_code varchar(2),
                               housenumber TEXT,
                               postcode TEXT,
                               centroid GEOMETRY(Geometry, 4326))""")
            cur.execute("CREATE SEQUENCE IF NOT EXISTS seq_place")
        conn.commit()

    def add(self, osm_type='N', osm_id=None, cls='amenity', typ='cafe', names=None,
            admin_level=None, address=None, extratags=None, geom='POINT(10 4)',
            country=None):
        with self.conn.cursor() as cur:
            psycopg2.extras.register_hstore(cur)
            cur.execute("""INSERT INTO placex (place_id, osm_type, osm_id, class,
                                               type, name, admin_level, address,
                                               extratags, geometry, country_code)
                           VALUES(nextval('seq_place'), %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)""",
                        (osm_type, osm_id or next(self.idseq), cls, typ, names,
                         admin_level, address, extratags, 'SRID=4326;' + geom,
                         country))
        self.conn.commit()

View File

@@ -120,7 +120,7 @@ def test_import_full(temp_db, mock_func_factory, tokenizer_mock):
         mock_func_factory(nominatim.tools.database_import, 'create_search_indices'),
         mock_func_factory(nominatim.tools.database_import, 'create_country_names'),
         mock_func_factory(nominatim.tools.refresh, 'load_address_levels_from_file'),
-        mock_func_factory(nominatim.tools.postcodes, 'import_postcodes'),
+        mock_func_factory(nominatim.tools.postcodes, 'update_postcodes'),
         mock_func_factory(nominatim.indexer.indexer.Indexer, 'index_full'),
         mock_func_factory(nominatim.tools.refresh, 'setup_website'),
         mock_func_factory(nominatim.db.properties, 'set_property')

@@ -143,7 +143,7 @@ def test_import_continue_load_data(temp_db, mock_func_factory, tokenizer_mock):
         mock_func_factory(nominatim.tools.database_import, 'load_data'),
         mock_func_factory(nominatim.tools.database_import, 'create_search_indices'),
         mock_func_factory(nominatim.tools.database_import, 'create_country_names'),
-        mock_func_factory(nominatim.tools.postcodes, 'import_postcodes'),
+        mock_func_factory(nominatim.tools.postcodes, 'update_postcodes'),
         mock_func_factory(nominatim.indexer.indexer.Indexer, 'index_full'),
         mock_func_factory(nominatim.tools.refresh, 'setup_website'),
         mock_func_factory(nominatim.db.properties, 'set_property')
@@ -280,20 +280,26 @@ def test_special_phrases_csv_command(temp_db, mock_func_factory, tokenizer_mock,
     assert func.called == 1


 @pytest.mark.parametrize("command,func", [
+                            ('postcodes', 'update_postcodes'),
                             ('word-counts', 'recompute_word_counts'),
                             ('address-levels', 'load_address_levels_from_file'),
                             ('wiki-data', 'import_wikipedia_articles'),
                             ('importance', 'recompute_importance'),
                             ('website', 'setup_website'),
                             ])
-def test_refresh_command(mock_func_factory, temp_db, command, func):
+def test_refresh_command(mock_func_factory, temp_db, command, func, tokenizer_mock):
     func_mock = mock_func_factory(nominatim.tools.refresh, func)

     assert 0 == call_nominatim('refresh', '--' + command)
     assert func_mock.called == 1


+def test_refresh_postcodes(mock_func_factory, temp_db, tokenizer_mock):
+    func_mock = mock_func_factory(nominatim.tools.postcodes, 'update_postcodes')
+    idx_mock = mock_func_factory(nominatim.indexer.indexer.Indexer, 'index_postcodes')
+
+    assert 0 == call_nominatim('refresh', '--postcodes')
+    assert func_mock.called == 1
+
+
 def test_refresh_create_functions(mock_func_factory, temp_db, tokenizer_mock):
     func_mock = mock_func_factory(nominatim.tools.refresh, 'create_functions')
@@ -302,7 +308,7 @@ def test_refresh_create_functions(mock_func_factory, temp_db, tokenizer_mock):
     assert tokenizer_mock.update_sql_functions_called


-def test_refresh_importance_computed_after_wiki_import(monkeypatch, temp_db):
+def test_refresh_importance_computed_after_wiki_import(monkeypatch, temp_db, tokenizer_mock):
     calls = []
     monkeypatch.setattr(nominatim.tools.refresh, 'import_wikipedia_articles',
                         lambda *args, **kwargs: calls.append('import') or 0)

View File

@@ -56,13 +56,21 @@ def test_bad_query(conn):
     conn.wait()


+def test_bad_query_ignore(temp_db):
+    with closing(DBConnection('dbname=' + temp_db, ignore_sql_errors=True)) as conn:
+        conn.connect()
+
+        conn.perform('SELECT efasfjsea')
+
+        conn.wait()
+
+
 def exec_with_deadlock(cur, sql, detector):
     with DeadlockHandler(lambda *args: detector.append(1)):
         cur.execute(sql)


 def test_deadlock(simple_conns):
+    print(psycopg2.__version__)
     cur1, cur2 = simple_conns

     cur1.execute("""CREATE TABLE t1 (id INT PRIMARY KEY, t TEXT);

View File

@@ -77,12 +77,12 @@ def make_standard_name(temp_db_cursor):
 @pytest.fixture
-def create_postcode_id(table_factory, temp_db_cursor):
-    table_factory('out_postcode_table', 'postcode TEXT')
-
+def create_postcode_id(temp_db_cursor):
     temp_db_cursor.execute("""CREATE OR REPLACE FUNCTION create_postcode_id(postcode TEXT)
                               RETURNS BOOLEAN AS $$
-                              INSERT INTO out_postcode_table VALUES (postcode) RETURNING True;
+                              INSERT INTO word (word_token, word, class, type)
+                                VALUES (' ' || postcode, postcode, 'place', 'postcode')
+                              RETURNING True;
                               $$ LANGUAGE SQL""")
@@ -192,27 +192,38 @@ def test_normalize(analyzer):
     assert analyzer.normalize('TEsT') == 'test'


-def test_add_postcodes_from_db(analyzer, table_factory, temp_db_cursor,
-                               create_postcode_id):
+def test_update_postcodes_from_db_empty(analyzer, table_factory, word_table,
+                                        create_postcode_id):
     table_factory('location_postcode', 'postcode TEXT',
                   content=(('1234',), ('12 34',), ('AB23',), ('1234',)))

-    analyzer.add_postcodes_from_db()
+    analyzer.update_postcodes_from_db()

-    assert temp_db_cursor.row_set("SELECT * from out_postcode_table") \
-               == set((('1234', ), ('12 34', ), ('AB23',)))
+    assert word_table.count() == 3
+    assert word_table.get_postcodes() == {'1234', '12 34', 'AB23'}
+
+
+def test_update_postcodes_from_db_add_and_remove(analyzer, table_factory, word_table,
+                                                 create_postcode_id):
+    table_factory('location_postcode', 'postcode TEXT',
+                  content=(('1234',), ('45BC', ), ('XX45', )))
+    word_table.add_postcode(' 1234', '1234')
+    word_table.add_postcode(' 5678', '5678')
+
+    analyzer.update_postcodes_from_db()
+
+    assert word_table.count() == 3
+    assert word_table.get_postcodes() == {'1234', '45BC', 'XX45'}


-def test_update_special_phrase_empty_table(analyzer, word_table, temp_db_cursor,
-                                           make_standard_name):
+def test_update_special_phrase_empty_table(analyzer, word_table, make_standard_name):
     analyzer.update_special_phrases([
         ("König bei", "amenity", "royal", "near"),
         ("Könige", "amenity", "royal", "-"),
         ("strasse", "highway", "primary", "in")
     ], True)

-    assert temp_db_cursor.row_set("""SELECT word_token, word, class, type, operator
-                                     FROM word WHERE class != 'place'""") \
+    assert word_table.get_special() \
               == set(((' könig bei', 'könig bei', 'amenity', 'royal', 'near'),
                       (' könige', 'könige', 'amenity', 'royal', None),
                       (' strasse', 'strasse', 'highway', 'primary', 'in')))
@@ -220,15 +231,14 @@ def test_update_special_phrase_empty_table(analyzer, word_table, temp_db_cursor,
 def test_update_special_phrase_delete_all(analyzer, word_table, temp_db_cursor,
                                           make_standard_name):
-    temp_db_cursor.execute("""INSERT INTO word (word_token, word, class, type, operator)
-                              VALUES (' foo', 'foo', 'amenity', 'prison', 'in'),
-                                     (' bar', 'bar', 'highway', 'road', null)""")
+    word_table.add_special(' foo', 'foo', 'amenity', 'prison', 'in')
+    word_table.add_special(' bar', 'bar', 'highway', 'road', None)

-    assert 2 == temp_db_cursor.scalar("SELECT count(*) FROM word WHERE class != 'place'""")
+    assert word_table.count_special() == 2

     analyzer.update_special_phrases([], True)

-    assert 0 == temp_db_cursor.scalar("SELECT count(*) FROM word WHERE class != 'place'""")
+    assert word_table.count_special() == 0


 def test_update_special_phrases_no_replace(analyzer, word_table, temp_db_cursor,
@@ -244,13 +254,11 @@ def test_update_special_phrases_no_replace(analyzer, word_table, temp_db_cursor,
     assert 2 == temp_db_cursor.scalar("SELECT count(*) FROM word WHERE class != 'place'""")


-def test_update_special_phrase_modify(analyzer, word_table, temp_db_cursor,
-                                      make_standard_name):
-    temp_db_cursor.execute("""INSERT INTO word (word_token, word, class, type, operator)
-                              VALUES (' foo', 'foo', 'amenity', 'prison', 'in'),
-                                     (' bar', 'bar', 'highway', 'road', null)""")
+def test_update_special_phrase_modify(analyzer, word_table, make_standard_name):
+    word_table.add_special(' foo', 'foo', 'amenity', 'prison', 'in')
+    word_table.add_special(' bar', 'bar', 'highway', 'road', None)

-    assert 2 == temp_db_cursor.scalar("SELECT count(*) FROM word WHERE class != 'place'""")
+    assert word_table.count_special() == 2

     analyzer.update_special_phrases([
         ('prison', 'amenity', 'prison', 'in'),
@@ -258,8 +266,7 @@ def test_update_special_phrase_modify(analyzer, word_table, temp_db_cursor,
         ('garden', 'leisure', 'garden', 'near')
     ], True)

-    assert temp_db_cursor.row_set("""SELECT word_token, word, class, type, operator
-                                     FROM word WHERE class != 'place'""") \
+    assert word_table.get_special() \
               == set(((' prison', 'prison', 'amenity', 'prison', 'in'),
                       (' bar', 'bar', 'highway', 'road', None),
                       (' garden', 'garden', 'leisure', 'garden', 'near')))
@@ -273,21 +280,17 @@ def test_process_place_names(analyzer, make_keywords):
 @pytest.mark.parametrize('pc', ['12345', 'AB 123', '34-345'])
-def test_process_place_postcode(analyzer, temp_db_cursor, create_postcode_id, pc):
+def test_process_place_postcode(analyzer, create_postcode_id, word_table, pc):
     info = analyzer.process_place({'address': {'postcode' : pc}})

-    assert temp_db_cursor.row_set("SELECT * from out_postcode_table") \
-               == set(((pc, ),))
+    assert word_table.get_postcodes() == {pc, }


 @pytest.mark.parametrize('pc', ['12:23', 'ab;cd;f', '123;836'])
-def test_process_place_bad_postcode(analyzer, temp_db_cursor, create_postcode_id,
-                                    pc):
+def test_process_place_bad_postcode(analyzer, create_postcode_id, word_table, pc):
     info = analyzer.process_place({'address': {'postcode' : pc}})

-    assert 0 == temp_db_cursor.scalar("SELECT count(*) from out_postcode_table")
+    assert not word_table.get_postcodes()


 @pytest.mark.parametrize('hnr', ['123a', '1', '101'])

View File

@@ -141,16 +141,28 @@ def test_make_standard_hnr(analyzer):
     assert a._make_standard_hnr('iv') == 'IV'


-def test_add_postcodes_from_db(analyzer, word_table, table_factory, temp_db_cursor):
+def test_update_postcodes_from_db_empty(analyzer, table_factory, word_table):
     table_factory('location_postcode', 'postcode TEXT',
                   content=(('1234',), ('12 34',), ('AB23',), ('1234',)))

     with analyzer() as a:
-        a.add_postcodes_from_db()
+        a.update_postcodes_from_db()

-    assert temp_db_cursor.row_set("""SELECT word, word_token from word
-                                     """) \
-               == set((('1234', ' 1234'), ('12 34', ' 12 34'), ('AB23', ' AB23')))
+    assert word_table.count() == 3
+    assert word_table.get_postcodes() == {'1234', '12 34', 'AB23'}
+
+
+def test_update_postcodes_from_db_add_and_remove(analyzer, table_factory, word_table):
+    table_factory('location_postcode', 'postcode TEXT',
+                  content=(('1234',), ('45BC', ), ('XX45', )))
+    word_table.add_postcode(' 1234', '1234')
+    word_table.add_postcode(' 5678', '5678')
+
+    with analyzer() as a:
+        a.update_postcodes_from_db()
+
+    assert word_table.count() == 3
+    assert word_table.get_postcodes() == {'1234', '45BC', 'XX45'}


 def test_update_special_phrase_empty_table(analyzer, word_table, temp_db_cursor):
@@ -224,22 +236,19 @@ def test_process_place_names(analyzer, getorcreate_term_id):
 @pytest.mark.parametrize('pc', ['12345', 'AB 123', '34-345'])
-def test_process_place_postcode(analyzer, temp_db_cursor, pc):
+def test_process_place_postcode(analyzer, word_table, pc):
     with analyzer() as a:
         info = a.process_place({'address': {'postcode' : pc}})

-    assert temp_db_cursor.row_set("""SELECT word FROM word
-                                     WHERE class = 'place' and type = 'postcode'""") \
-               == set(((pc, ),))
+    assert word_table.get_postcodes() == {pc, }


 @pytest.mark.parametrize('pc', ['12:23', 'ab;cd;f', '123;836'])
-def test_process_place_bad_postcode(analyzer, temp_db_cursor, pc):
+def test_process_place_bad_postcode(analyzer, word_table, pc):
     with analyzer() as a:
         info = a.process_place({'address': {'postcode' : pc}})

-    assert 0 == temp_db_cursor.scalar("""SELECT count(*) FROM word
-                                         WHERE class = 'place' and type = 'postcode'""")
+    assert not word_table.get_postcodes()


 @pytest.mark.parametrize('hnr', ['123a', '1', '101'])

View File

@@ -153,8 +153,8 @@ def test_truncate_database_tables(temp_db_conn, temp_db_cursor, table_factory):
@pytest.mark.parametrize("threads", (1, 5)) @pytest.mark.parametrize("threads", (1, 5))
def test_load_data(dsn, src_dir, place_row, placex_table, osmline_table, word_table, def test_load_data(dsn, src_dir, place_row, placex_table, osmline_table,
temp_db_cursor, threads): word_table, temp_db_cursor, threads):
for func in ('precompute_words', 'getorcreate_housenumber_id', 'make_standard_name'): for func in ('precompute_words', 'getorcreate_housenumber_id', 'make_standard_name'):
temp_db_cursor.execute("""CREATE FUNCTION {} (src TEXT) temp_db_cursor.execute("""CREATE FUNCTION {} (src TEXT)
RETURNS TEXT AS $$ SELECT 'a'::TEXT $$ LANGUAGE SQL RETURNS TEXT AS $$ SELECT 'a'::TEXT $$ LANGUAGE SQL

View File

@@ -1,55 +1,185 @@
""" """
Tests for functions to maintain the artificial postcode table. Tests for functions to maintain the artificial postcode table.
""" """
import subprocess
import pytest import pytest
from nominatim.tools import postcodes from nominatim.tools import postcodes
import dummy_tokenizer import dummy_tokenizer
class MockPostcodeTable:
""" A location_postcode table for testing.
"""
def __init__(self, conn):
self.conn = conn
with conn.cursor() as cur:
cur.execute("""CREATE TABLE location_postcode (
place_id BIGINT,
parent_place_id BIGINT,
rank_search SMALLINT,
rank_address SMALLINT,
indexed_status SMALLINT,
indexed_date TIMESTAMP,
country_code varchar(2),
postcode TEXT,
geometry GEOMETRY(Geometry, 4326))""")
cur.execute("""CREATE OR REPLACE FUNCTION token_normalized_postcode(postcode TEXT)
RETURNS TEXT AS $$ BEGIN RETURN postcode; END; $$ LANGUAGE plpgsql;
""")
conn.commit()
def add(self, country, postcode, x, y):
with self.conn.cursor() as cur:
cur.execute("""INSERT INTO location_postcode (place_id, indexed_status,
country_code, postcode,
geometry)
VALUES (nextval('seq_place'), 1, %s, %s,
'SRID=4326;POINT(%s %s)')""",
(country, postcode, x, y))
self.conn.commit()
@property
def row_set(self):
with self.conn.cursor() as cur:
cur.execute("""SELECT country_code, postcode,
ST_X(geometry), ST_Y(geometry)
FROM location_postcode""")
return set((tuple(row) for row in cur))
+
 @pytest.fixture
 def tokenizer():
     return dummy_tokenizer.DummyTokenizer(None, None)

+
 @pytest.fixture
-def postcode_table(temp_db_with_extensions, temp_db_cursor, table_factory,
-                   placex_table, word_table):
-    table_factory('location_postcode',
-                  """ place_id BIGINT,
-                      parent_place_id BIGINT,
-                      rank_search SMALLINT,
-                      rank_address SMALLINT,
-                      indexed_status SMALLINT,
-                      indexed_date TIMESTAMP,
-                      country_code varchar(2),
-                      postcode TEXT,
-                      geometry GEOMETRY(Geometry, 4326)""")
-    temp_db_cursor.execute('CREATE SEQUENCE seq_place')
-    temp_db_cursor.execute("""CREATE OR REPLACE FUNCTION token_normalized_postcode(postcode TEXT)
-                              RETURNS TEXT AS $$ BEGIN RETURN postcode; END; $$ LANGUAGE plpgsql;
-                           """)
+def postcode_table(temp_db_conn, placex_table, word_table):
+    return MockPostcodeTable(temp_db_conn)


-def test_import_postcodes_empty(dsn, temp_db_cursor, postcode_table, tmp_path, tokenizer):
-    postcodes.import_postcodes(dsn, tmp_path, tokenizer)
+def test_import_postcodes_empty(dsn, postcode_table, tmp_path, tokenizer):
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)

-    assert temp_db_cursor.table_exists('gb_postcode')
-    assert temp_db_cursor.table_exists('us_postcode')
-    assert temp_db_cursor.table_rows('location_postcode') == 0
+    assert not postcode_table.row_set


-def test_import_postcodes_from_placex(dsn, temp_db_cursor, postcode_table, tmp_path, tokenizer):
-    temp_db_cursor.execute("""
-        INSERT INTO placex (place_id, country_code, address, geometry)
-        VALUES (1, 'xx', '"postcode"=>"9486"', 'SRID=4326;POINT(10 12)')
-        """)
+def test_import_postcodes_add_new(dsn, placex_table, postcode_table, tmp_path, tokenizer):
+    placex_table.add(country='xx', geom='POINT(10 12)',
+                     address=dict(postcode='9486'))
+    postcode_table.add('yy', '9486', 99, 34)

-    postcodes.import_postcodes(dsn, tmp_path, tokenizer)
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)

-    rows = temp_db_cursor.row_set(""" SELECT postcode, country_code,
-                                      ST_X(geometry), ST_Y(geometry)
-                                      FROM location_postcode""")
-    print(rows)
-    assert len(rows) == 1
-    assert rows == set((('9486', 'xx', 10, 12), ))
+    assert postcode_table.row_set == {('xx', '9486', 10, 12), }
+
+
+def test_import_postcodes_replace_coordinates(dsn, placex_table, postcode_table, tmp_path, tokenizer):
+    placex_table.add(country='xx', geom='POINT(10 12)',
+                     address=dict(postcode='AB 4511'))
+    postcode_table.add('xx', 'AB 4511', 99, 34)
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert postcode_table.row_set == {('xx', 'AB 4511', 10, 12)}
+
+
+def test_import_postcodes_replace_coordinates_close(dsn, placex_table, postcode_table, tmp_path, tokenizer):
+    placex_table.add(country='xx', geom='POINT(10 12)',
+                     address=dict(postcode='AB 4511'))
+    postcode_table.add('xx', 'AB 4511', 10, 11.99999)
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert postcode_table.row_set == {('xx', 'AB 4511', 10, 11.99999)}
+
+
+def test_import_postcodes_remove(dsn, placex_table, postcode_table, tmp_path, tokenizer):
+    placex_table.add(country='xx', geom='POINT(10 12)',
+                     address=dict(postcode='AB 4511'))
+    postcode_table.add('xx', 'badname', 10, 12)
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert postcode_table.row_set == {('xx', 'AB 4511', 10, 12)}
+
+
+def test_import_postcodes_ignore_empty_country(dsn, placex_table, postcode_table, tmp_path, tokenizer):
+    placex_table.add(country=None, geom='POINT(10 12)',
+                     address=dict(postcode='AB 4511'))
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert not postcode_table.row_set
+
+
+def test_import_postcodes_remove_all(dsn, placex_table, postcode_table, tmp_path, tokenizer):
+    postcode_table.add('ch', '5613', 10, 12)
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert not postcode_table.row_set
+
+
+def test_import_postcodes_multi_country(dsn, placex_table, postcode_table, tmp_path, tokenizer):
+    placex_table.add(country='de', geom='POINT(10 12)',
+                     address=dict(postcode='54451'))
+    placex_table.add(country='cc', geom='POINT(100 56)',
+                     address=dict(postcode='DD23 T'))
+    placex_table.add(country='de', geom='POINT(10.3 11.0)',
+                     address=dict(postcode='54452'))
+    placex_table.add(country='cc', geom='POINT(10.3 11.0)',
+                     address=dict(postcode='54452'))
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert postcode_table.row_set == {('de', '54451', 10, 12),
+                                      ('de', '54452', 10.3, 11.0),
+                                      ('cc', '54452', 10.3, 11.0),
+                                      ('cc', 'DD23 T', 100, 56)}
+
+
+@pytest.mark.parametrize("gzipped", [True, False])
+def test_import_postcodes_extern(dsn, placex_table, postcode_table, tmp_path,
+                                 tokenizer, gzipped):
+    placex_table.add(country='xx', geom='POINT(10 12)',
+                     address=dict(postcode='AB 4511'))
+
+    extfile = tmp_path / 'xx_postcodes.csv'
+    extfile.write_text("postcode,lat,lon\nAB 4511,-4,-1\nCD 4511,-5, -10")
+
+    if gzipped:
+        subprocess.run(['gzip', str(extfile)])
+        assert not extfile.is_file()
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert postcode_table.row_set == {('xx', 'AB 4511', 10, 12),
+                                      ('xx', 'CD 4511', -10, -5)}
+
+
+def test_import_postcodes_extern_bad_column(dsn, placex_table, postcode_table,
+                                            tmp_path, tokenizer):
+    placex_table.add(country='xx', geom='POINT(10 12)',
+                     address=dict(postcode='AB 4511'))
+
+    extfile = tmp_path / 'xx_postcodes.csv'
+    extfile.write_text("postode,lat,lon\nAB 4511,-4,-1\nCD 4511,-5, -10")
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert postcode_table.row_set == {('xx', 'AB 4511', 10, 12)}
+
+
+def test_import_postcodes_extern_bad_number(dsn, placex_table, postcode_table,
+                                            tmp_path, tokenizer):
+    placex_table.add(country='xx', geom='POINT(10 12)',
+                     address=dict(postcode='AB 4511'))
+
+    extfile = tmp_path / 'xx_postcodes.csv'
+    extfile.write_text("postcode,lat,lon\nXX 4511,-4,NaN\nCD 4511,-5, -10\n34,200,0")
+
+    postcodes.update_postcodes(dsn, tmp_path, tokenizer)
+
+    assert postcode_table.row_set == {('xx', 'AB 4511', 10, 12),
+                                      ('xx', 'CD 4511', -10, -5)}
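
Note that the gzipped variant above shells out to the external `gzip` binary, so the test implicitly requires that tool on the machine running the suite. A dependency-free sketch using only the standard library (an alternative for illustration, not what the test suite actually does) would be:

```python
import gzip
import shutil
from pathlib import Path

def gzip_and_remove(path: Path):
    # Compress <path> to <path>.gz and delete the original,
    # mirroring the effect of running `gzip <path>`.
    with open(path, 'rb') as src, gzip.open(str(path) + '.gz', 'wb') as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()
```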

View File

@@ -18,16 +18,16 @@ def envdir(tmpdir):
 @pytest.fixture
 def test_script(envdir):
     def _create_file(code):
-        outfile = envdir / 'php' / 'website' / 'search.php'
+        outfile = envdir / 'php' / 'website' / 'reverse-only-search.php'
         outfile.write_text('<?php\n{}\n'.format(code), 'utf-8')

     return _create_file


-def run_website_script(envdir, config):
+def run_website_script(envdir, config, conn):
     config.lib_dir.php = envdir / 'php'
     config.project_dir = envdir
-    refresh.setup_website(envdir, config)
+    refresh.setup_website(envdir, config, conn)

     proc = subprocess.run(['/usr/bin/env', 'php', '-Cq',
                            envdir / 'search.php'], check=False)
@@ -37,36 +37,39 @@ def run_website_script(envdir, config):
@pytest.mark.parametrize("setting,retval", (('yes', 10), ('no', 20))) @pytest.mark.parametrize("setting,retval", (('yes', 10), ('no', 20)))
def test_setup_website_check_bool(def_config, monkeypatch, envdir, test_script, def test_setup_website_check_bool(def_config, monkeypatch, envdir, test_script,
setting, retval): setting, retval, temp_db_conn):
monkeypatch.setenv('NOMINATIM_CORS_NOACCESSCONTROL', setting) monkeypatch.setenv('NOMINATIM_CORS_NOACCESSCONTROL', setting)
test_script('exit(CONST_NoAccessControl ? 10 : 20);') test_script('exit(CONST_NoAccessControl ? 10 : 20);')
assert run_website_script(envdir, def_config) == retval assert run_website_script(envdir, def_config, temp_db_conn) == retval
@pytest.mark.parametrize("setting", (0, 10, 99067)) @pytest.mark.parametrize("setting", (0, 10, 99067))
def test_setup_website_check_int(def_config, monkeypatch, envdir, test_script, setting): def test_setup_website_check_int(def_config, monkeypatch, envdir, test_script, setting,
temp_db_conn):
monkeypatch.setenv('NOMINATIM_LOOKUP_MAX_COUNT', str(setting)) monkeypatch.setenv('NOMINATIM_LOOKUP_MAX_COUNT', str(setting))
test_script('exit(CONST_Places_Max_ID_count == {} ? 10 : 20);'.format(setting)) test_script('exit(CONST_Places_Max_ID_count == {} ? 10 : 20);'.format(setting))
assert run_website_script(envdir, def_config) == 10 assert run_website_script(envdir, def_config, temp_db_conn) == 10
def test_setup_website_check_empty_str(def_config, monkeypatch, envdir, test_script): def test_setup_website_check_empty_str(def_config, monkeypatch, envdir, test_script,
temp_db_conn):
monkeypatch.setenv('NOMINATIM_DEFAULT_LANGUAGE', '') monkeypatch.setenv('NOMINATIM_DEFAULT_LANGUAGE', '')
test_script('exit(CONST_Default_Language === false ? 10 : 20);') test_script('exit(CONST_Default_Language === false ? 10 : 20);')
assert run_website_script(envdir, def_config) == 10 assert run_website_script(envdir, def_config, temp_db_conn) == 10
def test_setup_website_check_str(def_config, monkeypatch, envdir, test_script): def test_setup_website_check_str(def_config, monkeypatch, envdir, test_script,
temp_db_conn):
monkeypatch.setenv('NOMINATIM_DEFAULT_LANGUAGE', 'ffde 2') monkeypatch.setenv('NOMINATIM_DEFAULT_LANGUAGE', 'ffde 2')
test_script('exit(CONST_Default_Language === "ffde 2" ? 10 : 20);') test_script('exit(CONST_Default_Language === "ffde 2" ? 10 : 20);')
assert run_website_script(envdir, def_config) == 10 assert run_website_script(envdir, def_config, temp_db_conn) == 10

View File

@@ -2,60 +2,137 @@
 Test for tiger data function
 """
 from pathlib import Path
+from textwrap import dedent

 import pytest
 import tarfile

 from nominatim.tools import tiger_data, database_import
+from nominatim.errors import UsageError
+
+
+class MockTigerTable:
+
+    def __init__(self, conn):
+        self.conn = conn
+        with conn.cursor() as cur:
+            cur.execute("""CREATE TABLE tiger (linegeo GEOMETRY,
+                                               start INTEGER,
+                                               stop INTEGER,
+                                               interpol TEXT,
+                                               token_info JSONB,
+                                               postcode TEXT)""")
+
+    def count(self):
+        with self.conn.cursor() as cur:
+            return cur.scalar("SELECT count(*) FROM tiger")
+
+    def row(self):
+        with self.conn.cursor() as cur:
+            cur.execute("SELECT * FROM tiger LIMIT 1")
+            return cur.fetchone()
+
+
+@pytest.fixture
+def tiger_table(def_config, temp_db_conn, sql_preprocessor,
+                temp_db_with_extensions, tmp_path):
+    def_config.lib_dir.sql = tmp_path / 'sql'
+    def_config.lib_dir.sql.mkdir()
+    (def_config.lib_dir.sql / 'tiger_import_start.sql').write_text(
+        """CREATE OR REPLACE FUNCTION tiger_line_import(linegeo GEOMETRY, start INTEGER,
+                                                        stop INTEGER, interpol TEXT,
+                                                        token_info JSONB, postcode TEXT)
+           RETURNS INTEGER AS $$
+            INSERT INTO tiger VALUES(linegeo, start, stop, interpol, token_info, postcode) RETURNING 1
+           $$ LANGUAGE SQL;""")
+    (def_config.lib_dir.sql / 'tiger_import_finish.sql').write_text(
+        """DROP FUNCTION tiger_line_import (linegeo GEOMETRY, in_startnumber INTEGER,
+                                            in_endnumber INTEGER, interpolationtype TEXT,
+                                            token_info JSONB, in_postcode TEXT);""")
+
+    return MockTigerTable(temp_db_conn)
+
+
+@pytest.fixture
+def csv_factory(tmp_path):
+    def _mk_file(fname, hnr_from=1, hnr_to=9, interpol='odd', street='Main St',
+                 city='Newtown', state='AL', postcode='12345',
+                 geometry='LINESTRING(-86.466995 32.428956,-86.466923 32.428933)'):
+        (tmp_path / (fname + '.csv')).write_text(dedent("""\
+        from;to;interpolation;street;city;state;postcode;geometry
+        {};{};{};{};{};{};{};{}
+        """.format(hnr_from, hnr_to, interpol, street, city, state,
+                   postcode, geometry)))
+
+    return _mk_file
@pytest.mark.parametrize("threads", (1, 5)) @pytest.mark.parametrize("threads", (1, 5))
def test_add_tiger_data(def_config, tmp_path, sql_preprocessor, def test_add_tiger_data(def_config, src_dir, tiger_table, tokenizer_mock, threads):
temp_db_cursor, threads, temp_db_with_extensions): tiger_data.add_tiger_data(str(src_dir / 'test' / 'testdb' / 'tiger'),
temp_db_cursor.execute('CREATE TABLE place (id INT)') def_config, threads, tokenizer_mock())
sqlfile = tmp_path / '1010.sql'
sqlfile.write_text("""INSERT INTO place values (1);
INSERT INTO non_existant_table values (1);""")
tiger_data.add_tiger_data(str(tmp_path), def_config, threads)
assert temp_db_cursor.table_rows('place') == 1 assert tiger_table.count() == 6213
@pytest.mark.parametrize("threads", (1, 5)) def test_add_tiger_data_no_files(def_config, tiger_table, tokenizer_mock,
def test_add_tiger_data_bad_file(def_config, tmp_path, sql_preprocessor, tmp_path):
temp_db_cursor, threads, temp_db_with_extensions): tiger_data.add_tiger_data(str(tmp_path), def_config, 1, tokenizer_mock())
temp_db_cursor.execute('CREATE TABLE place (id INT)')
sqlfile = tmp_path / '1010.txt' assert tiger_table.count() == 0
def test_add_tiger_data_bad_file(def_config, tiger_table, tokenizer_mock,
tmp_path):
sqlfile = tmp_path / '1010.csv'
sqlfile.write_text("""Random text""") sqlfile.write_text("""Random text""")
tiger_data.add_tiger_data(str(tmp_path), def_config, threads)
assert temp_db_cursor.table_rows('place') == 0 tiger_data.add_tiger_data(str(tmp_path), def_config, 1, tokenizer_mock())
assert tiger_table.count() == 0
def test_add_tiger_data_hnr_nan(def_config, tiger_table, tokenizer_mock,
csv_factory, tmp_path):
csv_factory('file1', hnr_from=99)
csv_factory('file2', hnr_from='L12')
csv_factory('file3', hnr_to='12.4')
tiger_data.add_tiger_data(str(tmp_path), def_config, 1, tokenizer_mock())
assert tiger_table.count() == 1
assert tiger_table.row()['start'] == 99
@pytest.mark.parametrize("threads", (1, 5)) @pytest.mark.parametrize("threads", (1, 5))
def test_add_tiger_data_tarfile(def_config, tmp_path, temp_db_cursor, def test_add_tiger_data_tarfile(def_config, tiger_table, tokenizer_mock,
threads, temp_db_with_extensions, sql_preprocessor): tmp_path, src_dir, threads):
temp_db_cursor.execute('CREATE TABLE place (id INT)')
sqlfile = tmp_path / '1010.sql'
sqlfile.write_text("""INSERT INTO place values (1);
INSERT INTO non_existant_table values (1);""")
tar = tarfile.open(str(tmp_path / 'sample.tar.gz'), "w:gz") tar = tarfile.open(str(tmp_path / 'sample.tar.gz'), "w:gz")
tar.add(sqlfile) tar.add(str(src_dir / 'test' / 'testdb' / 'tiger' / '01001.csv'))
tar.close() tar.close()
tiger_data.add_tiger_data(str(tmp_path / 'sample.tar.gz'), def_config, threads)
assert temp_db_cursor.table_rows('place') == 1 tiger_data.add_tiger_data(str(tmp_path / 'sample.tar.gz'), def_config, 1,
tokenizer_mock())
assert tiger_table.count() == 6213
@pytest.mark.parametrize("threads", (1, 5)) def test_add_tiger_data_bad_tarfile(def_config, tiger_table, tokenizer_mock,
def test_add_tiger_data_bad_tarfile(def_config, tmp_path, temp_db_cursor, threads, tmp_path):
temp_db_with_extensions, sql_preprocessor): tarfile = tmp_path / 'sample.tar.gz'
temp_db_cursor.execute('CREATE TABLE place (id INT)') tarfile.write_text("""Random text""")
sqlfile = tmp_path / '1010.txt'
sqlfile.write_text("""Random text""") with pytest.raises(UsageError):
tiger_data.add_tiger_data(str(tarfile), def_config, 1, tokenizer_mock())
def test_add_tiger_data_empty_tarfile(def_config, tiger_table, tokenizer_mock,
tmp_path, src_dir):
tar = tarfile.open(str(tmp_path / 'sample.tar.gz'), "w:gz") tar = tarfile.open(str(tmp_path / 'sample.tar.gz'), "w:gz")
tar.add(sqlfile) tar.add(__file__)
tar.close() tar.close()
tiger_data.add_tiger_data(str(tmp_path / 'sample.tar.gz'), def_config, threads)
assert temp_db_cursor.table_rows('place') == 0 tiger_data.add_tiger_data(str(tmp_path / 'sample.tar.gz'), def_config, 1,
tokenizer_mock())
assert tiger_table.count() == 0
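
For reference, a call such as `csv_factory('file1', hnr_from=99)` writes a `file1.csv` whose content, derived from the fixture defaults above, looks like this:

```
from;to;interpolation;street;city;state;postcode;geometry
99;9;odd;Main St;Newtown;AL;12345;LINESTRING(-86.466995 32.428956,-86.466923 32.428933)
```

`test_add_tiger_data_hnr_nan` then relies on only the first of its three generated files having house numbers that parse as integers.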

test/testdb/tiger/01001.csv — new file, 6214 lines

File diff suppressed because it is too large