Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2026-02-15 19:07:58 +00:00

Author	SHA1	Message	Date
Sarah Hoffmann	2e81084f35	complete tests for rule loader	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	a0a7b05c9f	correctly quote strings when copying in data Encapsulate the copy string in a class that ensures that copy lines are written with correct quoting.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2f6e4edcdb	update unit tests for adapted abbreviation code	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2e3c5d4c5b	adapt tests for ICU tokenizer	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	8413075249	move abbreviation computation into import phase This adds precomputation of abbreviated terms for names and removes abbreviation of terms in the query. Basic import works but still needs some thorough testing as well as speed improvements during import. New dependency for python library datrie.	2021-07-04 10:28:20 +02:00
AntoJvlt	3676310efe	Improved performance of the postcodes query and some code cleaning	2021-06-12 15:46:08 +02:00
AntoJvlt	1c175e3a67	Clean and update tests for postcodes	2021-06-09 09:31:32 +02:00
AntoJvlt	e879814e43	Update tests for postcodes	2021-06-09 09:31:32 +02:00
Sarah Hoffmann	bc981d0261	fix insertion of special terms and countries into word table Special terms need to be prefixed by a space because they are full terms. For countries avoid duplicate entries of word tokens. Adds tests for adding country terms.	2021-06-02 20:22:39 +02:00
Sarah Hoffmann	24c986c842	add tests for new full name computation with ICU	2021-05-24 10:41:42 +02:00
Sarah Hoffmann	4f4d15c28a	reorganize keyword creation for legacy tokenizer - only save partial words without internal spaces - consider comma and semicolon a separator of full words - consider parts before an opening bracket a full word (but not the part after the bracket) Fixes #244.	2021-05-24 10:41:42 +02:00
Sarah Hoffmann	430c316e45	test: fix linting errors	2021-05-19 23:07:39 +02:00
Sarah Hoffmann	01f5a9ff84	test: more use of table_factory	2021-05-19 17:37:03 +02:00
Sarah Hoffmann	af52eed0dd	test: avoid use of tempfile module Use the tmp_path fixture instead which provides automatic cleanup.	2021-05-19 16:43:26 +02:00
Sarah Hoffmann	f93d0fa957	test: use src_dir fixture instead of self-computed paths	2021-05-19 16:03:54 +02:00
Sarah Hoffmann	c06a1d007a	test: replace raw execute() with fixture code where possible	2021-05-19 12:11:04 +02:00
Sarah Hoffmann	65bd749918	test: use table_rows() and execute_values() where possible Some uses of scalar() could also be replaced with convenience functions from the word table mock.	2021-05-19 10:51:10 +02:00
Sarah Hoffmann	510eb53f53	test: move Testingcursor into separate class Also adds more convenience functions: counting with a where statement and a wrapper to execute_values().	2021-05-19 10:30:36 +02:00
Sarah Hoffmann	16bb007135	Merge pull request #2336 from lonvia/do-not-mask-error-when-loading-tokenizer Do not hide errors when importing tokenizer	2021-05-18 23:00:10 +02:00
Sarah Hoffmann	b2722650d4	do not hide errors when importing tokenizer Explicitly check for the tokenizer source file to check that the name is correct. We can't use the import error for that because it hides other import errors like a missing library. Fixes #2327.	2021-05-18 16:28:21 +02:00
AntoJvlt	3206bf59df	Resolve conflicts	2021-05-17 13:52:35 +02:00
AntoJvlt	8b8dfc46eb	Added --no-replace command for special phrases importation and added corresponding tests	2021-05-17 13:25:06 +02:00
AntoJvlt	06aab389ed	Code cleaning and SPLoader deleted	2021-05-16 16:59:12 +02:00
AntoJvlt	fb0ebb5bf0	Add tests for the new SPWikiLoader and SPCsvLoader	2021-05-16 16:10:06 +02:00
Sarah Hoffmann	925726222f	Merge pull request #2323 from darkshredder/disable-search-reverse-only Feat: Disabled search API for --reverse-only imports	2021-05-14 10:40:22 +02:00
Sarah Hoffmann	7d621389ee	adapt tests to new TIGER CSV format	2021-05-14 00:02:50 +02:00
Darkshredder	e5ffc59cd5	feat: Added reverse-only-search validation	2021-05-14 02:36:21 +05:30
Sarah Hoffmann	5feece64c1	use WorkerPool for Tiger data import Requires adding an option that SQL errors are ignored.	2021-05-13 20:36:50 +02:00
Sarah Hoffmann	f5977dac75	ignore invalid coordinates in external postcodes	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	8f2746fe24	ignore entries without country code	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	1ccd4360b4	correctly handle removing all postcodes for country	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	bf864b2c54	index postcodes after refreshing	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	4abaf71234	add and extend tests for new postcode handling	2021-05-13 14:15:42 +02:00
AntoJvlt	9d83da830f	Introduction of SPCsvLoader to load special phrases from a csv file	2021-05-10 23:26:39 +02:00
AntoJvlt	00959fac57	Refactoring loading of external special phrases and importation process by introducing SPLoader and SPWikiLoader	2021-05-10 21:49:31 +02:00
Sarah Hoffmann	18c99a5c5f	add unit tests for legacy ICU tokenizer	2021-05-05 10:15:27 +02:00
Sarah Hoffmann	8bdb9aa607	mock tokenizer factory for replication tests	2021-05-01 10:50:39 +02:00
Sarah Hoffmann	388ebcbae2	move index creation for word table to tokenizer This introduces a finalization routine for the tokenizer where it can post-process the import if necessary.	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	fc995ea6b9	move database check for module to tokenizer	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	893490f94e	add more tests for legacy tokenizer	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	3eb4d88057	boilerplate for PHP code of tokenizer This adds an installation step for PHP code for the tokenizer. The PHP code is split in two parts. The updateable code is found in lib-php. The tokenizer installs an additional script in the project directory which then includes the code from lib-php and defines all settings that are static to the database. The website code then always includes the PHP from the project directory.	2021-04-30 11:31:52 +02:00
Sarah Hoffmann	23fd1d032a	tests for legacy tokenizer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	7cb7cf848d	move amenity creation to tokenizer The BDD tests still use the old-style amenity creation scripts because we don't have simple means to import a hand-crafted test file of special phrases right now.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	bef300305e	move default country name creation to tokenizer The new function is also used when a country is updated. All SQL functions related to country names have been removed.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	ffc2d82b0e	move postcode normalization into tokenizer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	fa2bc60468	introduce name analyzer The name analyzer is the actual workhorse of the tokenizer. It is instantiated on a per-thread basis and provides all functions for analysing names and queries.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	e1c5673ac3	require tokenizer for indexer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	9397bf54b8	introduce external processing in indexer Indexing is now split into three parts: first a preparation step that collects the necessary information from the database and returns it to Python. In a second step the data is transformed within Python as necessary and then returned to the database through the usual UPDATE which now not only sets the indexed_status but also other fields. The third step comprises the address computation which is still done inside the update trigger in the database. The second processing step doesn't do anything useful yet.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	fbbdd31399	move word table and normalisation SQL into tokenizer Creating and populating the word table is now the responsibility of the tokenizer. The get_maxwordfreq() function has been replaced with a simple template parameter to the SQL during function installation. The number is taken from the parameter list in the database to ensure that it is not changed after installation.	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	296a66558f	move module installation to legacy tokenizer	2021-04-30 11:29:57 +02:00

279 Commits