introduce external processing in indexer

Indexing is now split into three steps. First, a preparation step
collects the necessary information from the database and returns
it to Python. Second, the data is transformed within Python as
necessary and written back to the database through the usual
UPDATE, which now sets other fields in addition to indexed_status.
Third, the address computation runs, still inside the update
trigger in the database.

The second processing step doesn't do anything useful yet.
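
The three steps above can be sketched roughly as follows. This is a minimal, database-free illustration, not the code of this commit: the in-memory PLACEX dict stands in for the placex table, and prepare(), transform(), update(), update_trigger() and index_place() are hypothetical names. The whitespace stripping in transform() is only there to make the second step visible.

```python
# Hypothetical in-memory stand-in for the placex table (assumption, not Nominatim code).
PLACEX = {
    1: {"name": None, "address": {"street": " Main St "}, "indexed_status": 1},
}


def prepare(place_id):
    """Step 1: collect the information needed for indexing.

    Stands in for a database call such as placex_prepare_update().
    """
    row = PLACEX[place_id]
    return {"name": row["name"], "address": dict(row["address"] or {})}


def transform(info):
    """Step 2a: process the data in Python (illustrative: strip whitespace)."""
    address = {key: value.strip() for key, value in info["address"].items()}
    return {"name": info["name"], "address": address}


def update_trigger(row):
    """Step 3: stand-in for the update trigger doing the address computation."""
    row["address_computed"] = True


def update(place_id, info):
    """Step 2b: one UPDATE that sets indexed_status together with other fields."""
    row = PLACEX[place_id]
    row.update(indexed_status=0, **info)
    update_trigger(row)  # a real trigger would be fired by the database itself


def index_place(place_id):
    """Run all three steps for a single place."""
    update(place_id, transform(prepare(place_id)))


index_place(1)
```

The point of the split is that the Python side sees the data between the read and the write, so processing can move out of the database step by step while the trigger keeps handling the address computation.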
Sarah Hoffmann
2021-04-23 15:49:38 +02:00
parent fbbdd31399
commit 9397bf54b8
6 changed files with 139 additions and 65 deletions

@@ -17,6 +17,7 @@ class IndexerTestDB:
         self.conn = conn
         self.conn.set_isolation_level(0)
         with self.conn.cursor() as cur:
+            cur.execute('CREATE EXTENSION hstore')
             cur.execute("""CREATE TABLE placex (place_id BIGINT,
                                                 class TEXT,
                                                 type TEXT,
@@ -26,6 +27,7 @@ class IndexerTestDB:
                                                 indexed_date TIMESTAMP,
                                                 partition SMALLINT,
                                                 admin_level SMALLINT,
+                                                address HSTORE,
                                                 geometry_sector INTEGER)""")
             cur.execute("""CREATE TABLE location_property_osmline (
                                place_id BIGINT,
@@ -46,6 +48,17 @@ class IndexerTestDB:
                              END IF;
                              RETURN NEW;
                            END; $$ LANGUAGE plpgsql;""")
+            cur.execute("""CREATE OR REPLACE FUNCTION placex_prepare_update(p placex,
+                                                      OUT name HSTORE,
+                                                      OUT address HSTORE,
+                                                      OUT country_feature VARCHAR)
+                           AS $$
+                           BEGIN
+                            address := p.address;
+                            name := p.address;
+                           END;
+                           $$ LANGUAGE plpgsql STABLE;
+                        """)
             for table in ('placex', 'location_property_osmline', 'location_postcode'):
                 cur.execute("""CREATE TRIGGER {0}_update BEFORE UPDATE ON {0}
                                FOR EACH ROW EXECUTE PROCEDURE date_update()