Commit Graph

24 Commits

Author SHA1 Message Date
Sarah Hoffmann
fa2bc60468 introduce name analyzer
The name analyzer is the actual work horse of the tokenizer. It
is instantiated on a thread-base and provides all functions for
analysing names and queries.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
e1c5673ac3 require tokeinzer for indexer 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
9397bf54b8 introduce external processing in indexer
Indexing is now split into three parts: first a preparation step
that collects the necessary information from the database and
returns it to Python. In a second step the data is transformed
within Python as necessary and then returned to the database
through the usual UPDATE which now not only sets the indexed_status
but also other fields. The third step comprises the address
computation which is still done inside the update trigger in
the database.

The second processing step doesn't do anything useful yet.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
f7e4aa51d3 indexer: reset query counter
Reset the counter for queries after the asynchronous connections
have been reopened.
2021-04-21 10:33:45 +02:00
Sarah Hoffmann
50b6d7298c factor out async connection handling into separate class
Also adds a test for reconnecting regularly while indexing.
2021-04-20 14:08:37 +02:00
Sarah Hoffmann
26a81654a8 indexer: make self.conn function-local
Also switches to our internal connect function which gives us
a cursor with a sclar() function.
2021-04-20 14:08:37 +02:00
Sarah Hoffmann
6430371d7d make index() function private 2021-04-20 14:08:37 +02:00
Sarah Hoffmann
18705b3f18 move analyse function into indexinf function 2021-04-20 14:08:37 +02:00
Sarah Hoffmann
c6bd2bb7fb indexer: move runner into separate file 2021-04-20 14:08:37 +02:00
Sarah Hoffmann
76b1885595 use absolute imports in Python code
Relative imports are no longer officially recommended.
2021-04-16 14:20:09 +02:00
Sarah Hoffmann
a08ca5b1b5 avoid division by zero in progress meter
On Windows systems the timer may not be accurate enough to measure
the time between init() and done(). Avoid computing statistics with
a diff time of 0 in such cases.

Fixes #2230.
2021-03-21 16:47:22 +01:00
Sarah Hoffmann
dd301cf5ac indexer: ANALYSE must be run outside transactions 2021-03-04 11:06:33 +01:00
Sarah Hoffmann
15b5906790 move setup function to python
There are still back-calls to PHP for some of the sub-steps.
These needs some larger refactoring to be moved to Python.
2021-02-26 15:02:39 +01:00
Sarah Hoffmann
3ee8d9fa75 properly close connections of indexer after use 2021-02-26 12:10:54 +01:00
Sarah Hoffmann
3c186f8030 add a function for the intial indexing run
Also moves postcodes to fully parallel indexing.
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
8c02786820 add tests for indexer 2021-01-20 21:30:27 +01:00
Sarah Hoffmann
504922ffbe remove old nominatim.py in favour of 'nominatim index'
The PHP scripts need to know the position of the nominatim
tool in order to call it. This is handed in as environment
variable, so it can be set by the Python script.
2021-01-18 15:43:27 +01:00
Sarah Hoffmann
c77877a934 implementaion of 'nominatim index' 2021-01-18 15:43:27 +01:00
Sarah Hoffmann
27977411e9 move indexing function into its own Python module
This makes it mow a standard function of our new Python
library instead of a stand-alone program.
2021-01-18 15:43:27 +01:00
Sarah Hoffmann
8e53f63036 fix errors reported by pylint 2021-01-15 08:57:00 +01:00
Sarah Hoffmann
5016eace34 improve progress logging during indexing
Wait for 2 seconds before logging the first progress, so that we
have numbers that are a bit more reliable statistically speaking.

Also provides an actual implementation for the log_interval
parameter and fixes some small style issues.
2020-11-30 10:59:29 +01:00
Sarah Hoffmann
fc50eb8688 nominatim: move DBConnection class into its own file 2020-08-18 15:17:09 +02:00
Sarah Hoffmann
5be084e0f5 indexer: allow batch processing of places
Request and process multiple place_ids at once so that
Postgres can make better use of caching and there are less
transactions running.
2020-08-03 10:32:39 +02:00
Sarah Hoffmann
2323923bec indexer: move progress tracker into separate class 2020-08-03 10:32:39 +02:00