Nominatim

Author	SHA1	Message	Date
Sarah Hoffmann	e85f7e7aa9	fix subsequent replacements Two replacement words directly following each other did not work as expected because each expects a space at the beginning/end while there was only one space available. Also forbid composing a word after a space was added at the end by a previous replacement.	2021-07-04 10:28:28 +02:00
Sarah Hoffmann	b9fbfeff67	only consider partials in multi-words for initial count This ensures that it is less likely that we exclude meaningful words like 'hauptstrasse' just because they are frequent.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	62828fc5c1	switch to a more flexible variant description format The new format combines compound splitting and abbreviation. It also allows restricting rules by additional conditions (like language or region). This latter ability is not used yet.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	a6aa6360e0	use yaml tag syntax to mark include files	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	0d80a9b897	tests for composing decomposed suffixes	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	f70930b1a0	make compound decomposition pure import feature Compound decomposition now creates a full name variant on import just like abbreviations. This simplifies query time normalization and opens a path for changing abbreviation and compound decomposition lists for an existing database.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	9ff4f66f55	complete tests for icu tokenizer	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2e81084f35	complete tests for rule loader	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	a0a7b05c9f	correctly quote strings when copying in data Encapsulate the copy string in a class that ensures that copy lines are written with correct quoting.	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2f6e4edcdb	update unit tests for adapted abbreviation code	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	2e3c5d4c5b	adapt tests for ICU tokenizer	2021-07-04 10:28:20 +02:00
Sarah Hoffmann	8413075249	move abbreviation computation into import phase This adds precomputation of abbreviated terms for names and removes abbreviation of terms in the query. Basic import works but still needs some thorough testing as well as speed improvements during import. New dependency for python library datrie.	2021-07-04 10:28:20 +02:00
AntoJvlt	3676310efe	Improved performance of the postcodes query and some code cleaning	2021-06-12 15:46:08 +02:00
AntoJvlt	1c175e3a67	Clean and update tests for postcodes	2021-06-09 09:31:32 +02:00
AntoJvlt	e879814e43	Update tests for postcodes	2021-06-09 09:31:32 +02:00
Sarah Hoffmann	bc981d0261	fix insertion of special terms and countries into word table Special terms need to be prefixed by a space because they are full terms. For countries avoid duplicate entries of word tokens. Adds tests for adding country terms.	2021-06-02 20:22:39 +02:00
Sarah Hoffmann	24c986c842	add tests for new full name computation with ICU	2021-05-24 10:41:42 +02:00
Sarah Hoffmann	4f4d15c28a	reorganize keyword creation for legacy tokenizer - only save partial words without internal spaces - consider comma and semicolon a separator of full words - consider parts before an opening bracket a full word (but not the part after the bracket) Fixes #244.	2021-05-24 10:41:42 +02:00
Sarah Hoffmann	430c316e45	test: fix linting errors	2021-05-19 23:07:39 +02:00
Sarah Hoffmann	01f5a9ff84	test: more use of table_factory	2021-05-19 17:37:03 +02:00
Sarah Hoffmann	af52eed0dd	test: avoid use of tempfile module Use the tmp_path fixture instead which provides automatic cleanup.	2021-05-19 16:43:26 +02:00
Sarah Hoffmann	f93d0fa957	test: use src_dir fixture instead of self-computed paths	2021-05-19 16:03:54 +02:00
Sarah Hoffmann	c06a1d007a	test: replace raw execute() with fixture code where possible	2021-05-19 12:11:04 +02:00
Sarah Hoffmann	65bd749918	test: use table_rows() and execute_values() where possible Some uses of scalar() could also be replaced with convenience functions from the word table mock.	2021-05-19 10:51:10 +02:00
Sarah Hoffmann	510eb53f53	test: move Testingcursor into separate class Also adds more convenience functions: counting with a where statement and a wrapper to execute_values().	2021-05-19 10:30:36 +02:00
Sarah Hoffmann	16bb007135	Merge pull request #2336 from lonvia/do-not-mask-error-when-loading-tokenizer Do not hide errors when importing tokenizer	2021-05-18 23:00:10 +02:00
Sarah Hoffmann	b2722650d4	do not hide errors when importing tokenizer Explicitly check for the tokenizer source file to check that the name is correct. We can't use the import error for that because it hides other import errors like a missing library. Fixes #2327.	2021-05-18 16:28:21 +02:00
AntoJvlt	3206bf59df	Resolve conflicts	2021-05-17 13:52:35 +02:00
AntoJvlt	8b8dfc46eb	Added --no-replace command for special phrases importation and added corresponding tests	2021-05-17 13:25:06 +02:00
AntoJvlt	06aab389ed	Code cleaning and SPLoader deleted	2021-05-16 16:59:12 +02:00
AntoJvlt	fb0ebb5bf0	Add tests for the new SPWikiLoader and SPCsvLoader	2021-05-16 16:10:06 +02:00
Sarah Hoffmann	925726222f	Merge pull request #2323 from darkshredder/disable-search-reverse-only Feat: Disabled search API for --reverse-only imports	2021-05-14 10:40:22 +02:00
Sarah Hoffmann	7d621389ee	adapt tests to new TIGER CSV format	2021-05-14 00:02:50 +02:00
Darkshredder	e5ffc59cd5	feat: Added reverse-only-search validation	2021-05-14 02:36:21 +05:30
Sarah Hoffmann	5feece64c1	use WorkerPool for Tiger data import Requires adding an option that SQL errors are ignored.	2021-05-13 20:36:50 +02:00
Sarah Hoffmann	f5977dac75	ignore invalid coordinates in external postcodes	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	8f2746fe24	ignore entries without country code	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	1ccd4360b4	correctly handle removing all postcodes for country	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	bf864b2c54	index postcodes after refreshing	2021-05-13 14:15:42 +02:00
Sarah Hoffmann	4abaf71234	add and extend tests for new postcode handling	2021-05-13 14:15:42 +02:00
AntoJvlt	9d83da830f	Introduction of SPCsvLoader to load special phrases from a csv file	2021-05-10 23:26:39 +02:00
AntoJvlt	00959fac57	Refactoring loading of external special phrases and importation process by introducing SPLoader and SPWikiLoader	2021-05-10 21:49:31 +02:00
Sarah Hoffmann	18c99a5c5f	add unit tests for legacy ICU tokenizer	2021-05-05 10:15:27 +02:00
Sarah Hoffmann	8bdb9aa607	mock tokenizer factory for replication tests	2021-05-01 10:50:39 +02:00
Sarah Hoffmann	388ebcbae2	move index creation for word table to tokenizer This introduces a finalization routine for the tokenizer where it can post-process the import if necessary.	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	fc995ea6b9	move database check for module to tokenizer	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	893490f94e	add more tests for legacy tokenizer	2021-04-30 17:41:08 +02:00
Sarah Hoffmann	3eb4d88057	boilerplate for PHP code of tokenizer This adds an installation step for PHP code for the tokenizer. The PHP code is split in two parts. The updateable code is found in lib-php. The tokenizer installs an additional script in the project directory which then includes the code from lib-php and defines all settings that are static to the database. The website code then always includes the PHP from the project directory.	2021-04-30 11:31:52 +02:00
Sarah Hoffmann	23fd1d032a	tests for legacy tokenizer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	7cb7cf848d	move amenity creation to tokenizer The BDD tests still use the old-style amenity creation scripts because we don't have simple means to import a hand-crafted test file of special phrases right now.	2021-04-30 11:30:51 +02:00

286 Commits