Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2026-02-16 15:47:58 +00:00

Author	SHA1	Message	Date
Sarah Hoffmann	90b40fc3e6	define formal public Python interface for tokenizer. This introduces an abstract class for the Tokenizer/Analyzer for documentation purposes.	2021-08-16 11:41:54 +02:00
Sarah Hoffmann	75a5c7013f	split up large setup function	2021-08-15 12:24:13 +02:00
Sarah Hoffmann	87dedde5d6	allow multiple files for the import command. The files are forwarded to osm2pgsql which is now able to merge them correctly.	2021-08-14 21:42:21 +02:00
Sarah Hoffmann	d48793c22c	fix Python linting errors	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	1db098c05d	reinstate word column in icu word table. PostgreSQL is very bad at creating statistics for jsonb columns. The result is that the query planner tends to use JIT for queries with a where over 'info' even when there is an index.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	e42878eeda	adapt unit test for new word table. Requires a second wrapper class for the word table with the new layout. This class is interface-compatible, so that later when the ICU tokenizer becomes the default, all tests that depend on behaviour of the default tokenizer can be switched to the other wrapper.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	eb6814d74e	convert word info column to json before copying	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	70f154be8b	switch word tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	4342b28882	switch special phrases to new word table format	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	5394b1fa1b	switch postcode tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	5ab0a63fd6	switch housenumber tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	1618aba5f2	switch country name tokens to new word table layout	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	8377528952	new word table layout for icu tokenizer. The table now directly reflects the different token types. Extra information is saved in a json structure that may be dynamically extended in the future without affecting the table layout.	2021-07-28 11:31:47 +02:00
Sarah Hoffmann	e42349c963	replace add-data function with native Python code	2021-07-26 10:41:37 +02:00
Sarah Hoffmann	878835e4bd	move add-data subcommand into a separate file	2021-07-25 18:14:12 +02:00
Sarah Hoffmann	2c8242c8df	remove special code for pre-9.5 PostgreSQL. 9.5 is now the minimum requirement.	2021-07-19 10:24:57 +02:00
Sarah Hoffmann	e7d6f89aca	increase minimum version for PostgreSQL to 9.5. This is the minimum version we can test with the CI. With 9.5 there is also complete support for jsonb available.	2021-07-19 10:21:19 +02:00
Sarah Hoffmann	14f777da18	use psycopg's SQL quoting where possible. Use the SQL formatting supplied with psycopg whenever the query needs to be put together from snippets.	2021-07-12 22:05:22 +02:00
Sarah Hoffmann	6f6681ce67	add helper function for execute_values. Make psycopg2's convenience function accessible through the cursor.	2021-07-12 21:08:20 +02:00
Sarah Hoffmann	06602b4ec0	provide wrapper function for DROP TABLE. Use psycopg2 formatting to ensure correct quoting.	2021-07-12 20:32:46 +02:00
Sarah Hoffmann	cf98cff2a1	more formatting fixes. Found by flake8.	2021-07-12 17:45:42 +02:00
Sarah Hoffmann	f8b5a63de3	factor out connection reset code	2021-07-12 14:58:44 +02:00
Sarah Hoffmann	568316f07c	simplify analyse function	2021-07-12 14:47:50 +02:00
Sarah Hoffmann	daa597b300	split up variant computation for better readability	2021-07-12 14:43:50 +02:00
Sarah Hoffmann	47adb2a3fc	reorganise process_place function. Move address processing into its own function as it is rather extensive.	2021-07-12 11:57:55 +02:00
Sarah Hoffmann	fff0012249	simplify website setup code. Use format strings and move variable quoting code into extra function.	2021-07-12 11:41:05 +02:00
Sarah Hoffmann	d5a1883b62	avoid repeated patterns for table name	2021-07-12 11:33:09 +02:00
Sarah Hoffmann	a08ef43e40	simplify if statements	2021-07-12 11:28:47 +02:00
Sarah Hoffmann	3661f7a321	avoid multiple returns of same value. Found by Sonarqube.	2021-07-11 18:23:42 +02:00
Sarah Hoffmann	a2edbbf78a	cannot use capture_output in subprocess.run. Only available since Python 3.7.	2021-07-06 22:57:42 +02:00
Sarah Hoffmann	1e86dc1d93	remove default parameter for namedtuple. This is only available in Python 3.7.	2021-07-06 22:57:42 +02:00
Sarah Hoffmann	62d5984b1b	limit the number of variants that can be produced	2021-07-04 10:28:28 +02:00
Sarah Hoffmann	c32551b4e0	restrict partial word counting to names of reasonable length. The partial word count does not split names to save a bit of time. The result is that it might encounter unreasonably long names which in truth consist of multiple words. No accurate statistics are needed so simply restrict the count to words shorter than 75 characters.	2021-07-04 10:28:28 +02:00
Sarah Hoffmann	e85f7e7aa9	fix subsequent replacements. Two replacement words directly following each other did not work as expected because each expects a space at the beginning/end while there was only one space available. Also forbid composing a word after a space was added in the end by a previous replacement.	2021-07-04 10:28:28 +02:00
Sarah Hoffmann	7b0f6b7905	leave ICU variant properties empty for now. Saving unused properties causes unnecessary duplicates.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	b9fbfeff67	only consider partials in multi-words for initial count. This ensures that it is less likely that we exclude meaningful words like 'hauptstrasse' just because they are frequent.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	62828fc5c1	switch to a more flexible variant description format. The new format combines compound splitting and abbreviation. It also allows restricting rules to additional conditions (like language or region). This latter ability is not used yet.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	a6aa6360e0	use yaml tag syntax to mark include files	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	f70930b1a0	make compound decomposition pure import feature. Compound decomposition now creates a full name variant on import just like abbreviations. This simplifies query time normalization and opens a path for changing abbreviation and compound decomposition lists for an existing database.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	9ff4f66f55	complete tests for icu tokenizer	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	32ca631b74	fix full term token in special phrases	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2e81084f35	complete tests for rule loader	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	a0a7b05c9f	correctly quote strings when copying in data. Encapsulate the copy string in a class that ensures that copy lines are written with correct quoting.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2f6e4edcdb	update unit tests for adapted abbreviation code	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2e3c5d4c5b	adapt tests for ICU tokenizer	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	8413075249	move abbreviation computation into import phase. This adds precomputation of abbreviated terms for names and removes abbreviation of terms in the query. Basic import works but still needs some thorough testing as well as speed improvements during import. New dependency for python library datrie.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	6ba00e6aee	icu tokenizer: move transliteration rules into separate file. The tokenizer configuration has become difficult to handle due to the additional manual transliteration rules. Allow a separate rule file that is given to the ICU library as is.	2021-07-04 10:28:20 +02:00
AntoJvlt	3676310efe	Improved performance of the postcodes query and some code cleaning	2021-06-12 15:46:08 +02:00
AntoJvlt	1c175e3a67	Clean and update tests for postcodes	2021-06-09 09:31:32 +02:00
AntoJvlt	47fb7cd3a8	Use place_exists() into can_compute() for postcodes	2021-06-09 09:31:32 +02:00

346 Commits