Compare commits

..

649 Commits

Author SHA1 Message Date
mtmail
b0aaa25f0d Installation docs - link to Kubernetes install project
As reported by @robjuz in https://github.com/osm-search/Nominatim/discussions/2412
2021-08-03 12:02:35 +02:00
Sarah Hoffmann
c3ddc7579a Merge pull request #2408 from lonvia/icu-change-word-table-layout
Change table layout of word table for ICU tokenizer
2021-07-28 14:28:49 +02:00
Sarah Hoffmann
fdff579188 php: force use of global Exception class 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
d48793c22c fix Python linitin errors 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
001b2aa9f9 fix linitin issues in PHP 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
1db098c05d reinstate word column in icu word table
Postgresql is very bad at creating statistics for jsonb
columns. The result is that the query planer tends to
use JIT for queries with a where over 'info' even when
there is an index.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
324b1b5575 bdd tests: do not query word table directly
The BDD tests cannot make assumptions about the structure of the
word table anymore because it depends on the tokenizer. Use more
abstract descriptions instead that ask for specific kinds of
tokens.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
e42878eeda adapt unit test for new word table
Requires a second wrapper class for the word table with the new
layout. This class is interface-compatible, so that later when
the ICU tokenizer becomes the default, all tests that depend on
behaviour of the default tokenizer can be switched to the other
wrapper.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
eb6814d74e convert word info column to json before copying 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
6ad35aca4a adapt special terms lookup to new word table 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
70f154be8b switch word tokens to new word table layout 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
4342b28882 switch special phrases to new word table format 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
5394b1fa1b switch postcode tokens to new word table layout 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
5ab0a63fd6 switch housenumber tokens to new word table layout 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
1618aba5f2 switch country name tokens to new word table layout 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
8377528952 new word table layout for icu tokenizer
The table now directly reflects the different token types.
Extra information is saved in a json structure that may be
dynamically extended in the future without affecting the
table layout.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
34dcf02dee fix typos in tokenizer docs 2021-07-28 11:28:49 +02:00
Sarah Hoffmann
5d7d7f15d9 Merge pull request #2401 from lonvia/port-add-data-to-python
Port add-data functions from PHP to Python
2021-07-26 12:38:56 +02:00
Sarah Hoffmann
0c023fb4d2 adapt cli tests to Python port for add-data 2021-07-26 10:41:37 +02:00
Sarah Hoffmann
1bd068d42d remove unused update script 2021-07-26 10:41:37 +02:00
Sarah Hoffmann
e42349c963 replace add-data function with native Python code 2021-07-26 10:41:37 +02:00
Sarah Hoffmann
878835e4bd move add-data subcommand into a separate file 2021-07-25 18:14:12 +02:00
Sarah Hoffmann
8096a1d67f fix parameters for TokenWord creation 2021-07-20 10:21:40 +02:00
Sarah Hoffmann
e16c5d5f70 Merge pull request #2397 from lonvia/increase-minimum-required-versions
Increase minimum required PostgreSQL version to 9.5
2021-07-19 14:28:02 +02:00
Sarah Hoffmann
2c8242c8df remove special code for pre9.5 postgresql
9.5 is now the minimum requirement.
2021-07-19 10:24:57 +02:00
Sarah Hoffmann
e7d6f89aca increase minimum version for PostgreSQL to 9.5
This is the minimum version we can test with the CI.
With 9.5 there is also complete support for jsonb available.
2021-07-19 10:21:19 +02:00
Sarah Hoffmann
379f5db516 require Python 3.6 also in CMakeFile
This had been forgotten when increasing the minimum Python version.
2021-07-19 10:14:14 +02:00
Sarah Hoffmann
ee32315378 Merge pull request #2396 from lonvia/partial-word-token
Reorganise code that build the SearchDescription
2021-07-19 09:42:37 +02:00
Sarah Hoffmann
cca912af4e make all Token menbers private 2021-07-18 22:54:55 +02:00
Sarah Hoffmann
86ea077092 merge marking rare name with adding name token
Only name tokens can be rare, so this should be the same
function.
2021-07-18 16:52:37 +02:00
Sarah Hoffmann
5d6aabc457 add documentation for public interface of SearchDescription 2021-07-18 16:10:42 +02:00
Sarah Hoffmann
b14ce959d9 factor out check if a token fits current search
Saves allocating an empty array.
2021-07-17 22:01:35 +02:00
Sarah Hoffmann
a48ebd9b47 move SearchDescription building into tokens
Moving the logic for extending the SearchDescription into the
token classes splits up the code and makes it more readable.
More importantly: it allows tokenizer to define custom token
classes in the future.
2021-07-17 20:24:33 +02:00
Sarah Hoffmann
3cd85eaaf1 remove Token from explicit input for SearchDescription extension
The token string is only required by the PartialToken type, so
it can simply save the token string internally. No need to pass
it to every type.

Also moves the check for multi-word partials to the token loader
code in the tokenizer. Multi-word partials can only happen with
the legacy tokenizer and when the database was loaded with an
older version of Nominatim. No need to keep the check for
everybody.
2021-07-17 18:18:31 +02:00
Sarah Hoffmann
ec3f6c9c42 factor out query position
Moves token and phrase position and phrase type into a separate
class that is handed in when assembling the search description.
This drastically reduces the number of parameters for the function
to extend the search descriptions and gives us more flexibility
in the future for more complex positional analysis.
2021-07-15 14:12:59 +02:00
Sarah Hoffmann
143ff14466 remove special status of partial tokens
Full-word tokens are no longer marked by a space at the
beginning of the token. Use the new Partial token category
instead. This removes a couple of special casing, we don't
really need.

The word table still has the space for compatibility reasons,
so the tokenizer code needs to get rid of it when loading the
tokens.
2021-07-14 22:17:17 +02:00
Sarah Hoffmann
6070c3d1d5 introduce a separate token type for partials
This means that the leading space can be removed as a partial
word indicator.
2021-07-13 16:57:12 +02:00
Sarah Hoffmann
bc8b2d4ae0 Merge pull request #2393 from lonvia/fix-flake8-issues
Fix flake8 issues
2021-07-13 16:46:12 +02:00
Sarah Hoffmann
14f777da18 use psycopg's SQL quoting where possible
Use the SQL formatting supplied with psycopg whenever the
query needs to be put together from snippets.
2021-07-12 22:05:22 +02:00
Sarah Hoffmann
6f6681ce67 add helper function for execute_values
Make psycopg2's convenience function accessible through
the cursor.
2021-07-12 21:08:20 +02:00
Sarah Hoffmann
06602b4ec0 provide wrapper function for DROP TABLE
Use psycopg2 formatting to ensure correct quoting.
2021-07-12 20:32:46 +02:00
Sarah Hoffmann
cf98cff2a1 more formatting fixes
Found by flake8.
2021-07-12 17:45:42 +02:00
Sarah Hoffmann
b4fec57b6d Merge pull request #2391 from lonvia/fix-sonar-issues
Fix bugs and code smells found by Sonarqube
2021-07-12 17:14:59 +02:00
Sarah Hoffmann
f8b5a63de3 factor out connection reset code 2021-07-12 14:58:44 +02:00
Sarah Hoffmann
568316f07c simplify analyse function 2021-07-12 14:47:50 +02:00
Sarah Hoffmann
daa597b300 split up variant computation for better readability 2021-07-12 14:43:50 +02:00
Sarah Hoffmann
47adb2a3fc reorganise process_place function
Move address processing into its own function as it is
rather extensive.
2021-07-12 11:57:55 +02:00
Sarah Hoffmann
fff0012249 simplify website setup code
Use formaat strings and move variable quoting code into extra
function.
2021-07-12 11:41:05 +02:00
Sarah Hoffmann
d5a1883b62 avoid repeated patterns for table name 2021-07-12 11:33:09 +02:00
Sarah Hoffmann
a08ef43e40 simplify if statements 2021-07-12 11:28:47 +02:00
Sarah Hoffmann
bc5e15996a convert single case switch to if statement 2021-07-12 11:28:47 +02:00
Sarah Hoffmann
128ca800cd avoid local variable assignment 2021-07-11 23:22:53 +02:00
Sarah Hoffmann
000d133af6 fix more missing braces on one-liners 2021-07-11 23:22:53 +02:00
Sarah Hoffmann
1e40d65aa9 remove dead code 2021-07-11 23:22:53 +02:00
Sarah Hoffmann
bffbe68ec3 do not intermix params with and without default 2021-07-11 23:22:53 +02:00
Sarah Hoffmann
58b10074ad directly return data in function
The temporary variable is not necessary.
2021-07-11 19:24:04 +02:00
Sarah Hoffmann
d933ead2b5 remove unnecessayly nested ifs
Found by Sonarqube.
2021-07-11 19:11:37 +02:00
Sarah Hoffmann
1cdc30c5e8 remove unused functions
The functions were necessary for the transitory code
to Python and are no longer used.
2021-07-11 19:10:04 +02:00
Sarah Hoffmann
3661f7a321 avoid multiple returns of same value
Found by Sonarqube.
2021-07-11 18:23:42 +02:00
Sarah Hoffmann
27af9b102c always use brackets on if statements
This adds bracket around all one-line if statements that did
not have them yet.
2021-07-10 17:04:46 +02:00
Sarah Hoffmann
500c61685b remove unused variables
As reported by sonarqube.
2021-07-09 16:36:42 +02:00
Sarah Hoffmann
106d960f84 fix bad use of echo in PHP output 2021-07-09 12:50:35 +02:00
Sarah Hoffmann
322fa19ceb Merge pull request #2390 from lonvia/responsible-disclosure
Add security issue disclosure policy
2021-07-09 12:32:37 +02:00
Sarah Hoffmann
5bea0b6086 add security issue disclosure policy 2021-07-09 11:36:59 +02:00
Sarah Hoffmann
a5970d7548 Merge pull request #2384 from lonvia/actions-add-icu-tokenizer
CI: run tests on Ubuntu 18
2021-07-07 14:39:53 +02:00
Sarah Hoffmann
c216144dd1 add missing pyyaml requirement 2021-07-07 11:29:33 +02:00
Sarah Hoffmann
42e08da7ca enable PHP 7.2 for Ubuntu 18 CI 2021-07-07 11:29:33 +02:00
Sarah Hoffmann
a2edbbf78a cannot use capture_output in subprocess.run
Only available since Python 3.7.
2021-07-06 22:57:42 +02:00
Sarah Hoffmann
1e86dc1d93 remove default parameter for namedtuple
This is only available in Python 3.7.
2021-07-06 22:57:42 +02:00
Sarah Hoffmann
54f295be52 CI: run tests on older Ubuntu version as well 2021-07-06 22:57:42 +02:00
Sarah Hoffmann
8bc3c0a07c Merge pull request #2382 from lonvia/remove-json-config
Remove outdated ICU tokenizer JSON config
2021-07-05 12:34:34 +02:00
Sarah Hoffmann
d75bc20174 Merge pull request #2383 from lonvia/remove-more-names
Exclude name:etymology and name:signed
2021-07-05 12:34:16 +02:00
Sarah Hoffmann
fd8751658f exclude name:etymology and name:signed
name:etymology contains a description of the name origin and is
thus more informative than search-worthy.

name:signed basically indicates that the feature does not have
a name.
2021-07-05 11:04:16 +02:00
Sarah Hoffmann
4db5a1a0b8 remove outdated ICU tokenizer JSON config 2021-07-05 11:01:35 +02:00
Sarah Hoffmann
4c52777ef0 Merge pull request #2371 from lonvia/increase-python-version
Increase minimum required Python version to 3.6
2021-07-05 10:32:38 +02:00
Sarah Hoffmann
d4c7bf20a2 Merge pull request #2381 from lonvia/reorganise-abbreviations
Reorganise abbreviation handling
2021-07-05 10:32:16 +02:00
Sarah Hoffmann
affe1300d9 add warning about experimental nature of ICU tokenizer 2021-07-04 10:44:58 +02:00
Sarah Hoffmann
62d5984b1b limit the number of variants that can be produced 2021-07-04 10:28:28 +02:00
Sarah Hoffmann
c32551b4e0 restrict partial word counting to names of reasoanble length
The partial word count does not split names to save a bit of time.
The result is that it might enounter unreasonably long names
which in truth consist of multiple words. No accurate statistics
are needed so simply restrict the count to words shorter than
75 characters.
2021-07-04 10:28:28 +02:00
Sarah Hoffmann
e85f7e7aa9 fix subsequent replacements
Two replacement words directly following each other did not
work as expected because each expects a space at the
beginning/end while there was only one space available.

Also forbit composing a word after a space was added in the
end by a previous replacement.
2021-07-04 10:28:28 +02:00
Sarah Hoffmann
7b0f6b7905 leave ICU variant properties empty for now
Saving unused properties causes unnecessary duplicates.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
0894ce9dc3 import abbreviations from OSM Wiki
Replaces the variant rules with a slightly cleaned-up
version of the abbreviation lists at
https://wiki.openstreetmap.org/wiki/Name_finder:Abbreviations
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
4fd2e961b6 improve normalization
Make sure all special symbols are removed during normalization already.
Those won't be interpreted in any way because they are unlikely to be
searched for.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
b9fbfeff67 only consider partials in multi-words for initial count
This ensures that it is less likely that we exclude meaningful
words like 'hauptstrasse' just because they are frequent.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
5dd24b3ef0 add documentation for ICU tokenizer configuration 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
62828fc5c1 switch to a more flexible variant description format
The new format combines compound splitting and abbreviation.
It also allows to restrict rules to additional conditions
(like language or region). This latter ability is not used
yet.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
a6aa6360e0 use yaml tag syntax to mark include files 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
c4f6c06f44 add dependency on datrie 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
0d80a9b897 tests for composing decomposed suffixes 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
f70930b1a0 make compund decomposition pure import feature
Compound decomposition now creates a full name variant on
import just like abbreviations. This simplifies query time
normalization and opens a path for changing abbreviation
and compund decomposition lists for an existing database.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
9ff4f66f55 complete tests for icu tokenizer 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
32ca631b74 fix full term token in special phrases 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
2e81084f35 complete tests for rule loader 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
a0a7b05c9f correctly quote strings when copying in data
Encapsulate the copy string in a class that ensures that
copy lines are written with correct quoting.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
2f6e4edcdb update unit tests for adapted abbreviation code 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
1bd9f455fc add abbreviations from legacy tokenizer
These abbreviations are not a perfect fit anymore because
abbreviation replacement is now applied before transliteration.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
2e3c5d4c5b adapt tests for ICU tokenizer 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
8413075249 move abbreviation computation into import phase
This adds precomputation of abbreviated terms for names and removes
abbreviation of terms in the query. Basic import works but still
needs some thorough testing as well as speed improvements during
import.

New dependency for python library datrie.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
6ba00e6aee icu tokenizer: move transliteration rules in separate file
The tokenizer configuration has become difficult to handle
due to the additional manual transliteration rules. Allow
to have a separate rule file that is given to the ICU library
as is.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
de4fac33dc docs: nominatim-ui should be installed from the release
The development version does not provide the pre-packaged
dist directory anymore.
2021-07-03 21:16:52 +02:00
Sarah Hoffmann
c9984669a7 Merge pull request #2373 from lonvia/tweak-search-cost
Further tweaking of search cost
2021-06-26 16:21:08 +02:00
Sarah Hoffmann
63755c31ff remove penalty for full words in address
Now that mutli-word partials no longer exist, multi-word full
words need to be used to search in addresses and therefore no
longer should have a penalty.

Also changes the condition when a full word is included into
the address. It is no longer relevant if an equivalent partial
exists but only if the term consists of more than one word.
2021-06-26 11:37:15 +02:00
Sarah Hoffmann
161f5f5cee adjust penalty for housenumber-in-name searches
When searching for house numbers in the name (for place-only
terms) then the same penalties need to apply as for the
regular house number search.

Change the code to first compute the penalties and then create
the new search variants.
2021-06-26 11:37:15 +02:00
Sarah Hoffmann
c7073a1fc0 increase minimum Python to 3.6
Python 3.6 introduces formatted string literals and
flag enums as well as a much faster dict implementation.
These changes make the code so much simpler as to warrant
dropping Python 3.5 support.

Affected distributions are Ubuntu 16.04 and Debian Stretch.
2021-06-21 18:37:37 +02:00
Sarah Hoffmann
e7b4fc70e7 make sure old data gets deleted on place type change
When changing from some other place type to place=postcode
make sure that the old place type entry in the place table
is deleted.
2021-06-18 10:58:41 +02:00
Sarah Hoffmann
457982e1d2 update postcode in place if it already exists 2021-06-18 00:28:52 +02:00
Sarah Hoffmann
aa558e6080 Merge pull request #2369 from lonvia/exclude-poi-from-housenumber-search
Do not return POIs when dropping house number in query
2021-06-17 15:30:05 +02:00
Sarah Hoffmann
fe11d3cbbd do not return POIs when dropping house number in query
We've previously added searching through rank 30 in a house
number search to enable searches for house number+name.
This had the unintended side effect that rank 30 objects
are also returned in s search that dropped the house number
from the query. This is wrong because POIs cannot function
as a parent to a house number.

This fix drops all rank 30 objects from the results for a
house number search if they do not match the requested house
number.
2021-06-17 14:21:20 +02:00
Sarah Hoffmann
1ce223a83b Merge pull request #2360 from AntoJvlt/postcodes-place-table
Use place instead of placex to compute postcodes
2021-06-16 11:45:07 +02:00
AntoJvlt
3676310efe Improved performance of the postcodes query and some code cleaning 2021-06-12 15:46:08 +02:00
AntoJvlt
ddf866c4c7 Always delete old placex entry for type=postcode when inserting a new one into the place table 2021-06-12 15:35:51 +02:00
AntoJvlt
9e07a197e9 Handle postcode type change in place insert trigger 2021-06-09 09:31:32 +02:00
AntoJvlt
1c175e3a67 Clean and update tests for postcodes 2021-06-09 09:31:32 +02:00
AntoJvlt
47fb7cd3a8 Use place_exists() into can_compute() for postcodes 2021-06-09 09:31:32 +02:00
AntoJvlt
e879814e43 Update tests for postcodes 2021-06-09 09:31:32 +02:00
AntoJvlt
a4733eed90 Use place instead of placex to compute postcodes 2021-06-09 09:31:32 +02:00
Sarah Hoffmann
38fbc4fcbb do not fail CI on codecov errors
The CodeCove upload depends on unreliable external code.
2021-06-08 10:42:14 +02:00
Sarah Hoffmann
c6fe91bfa5 Merge pull request #2359 from lonvia/switch-bdd-tests-to-api-search
Remove deprecated commandline query function
2021-06-06 18:29:51 +02:00
Sarah Hoffmann
7383f05e45 remove deprecated query interface
Searches can now be done via the thin API wrapper.
2021-06-06 15:28:21 +02:00
Sarah Hoffmann
3aac51c81f switch BDD tests to always use search API 2021-06-06 15:27:52 +02:00
Sarah Hoffmann
f0a7850edf Merge pull request #2358 from AntoJvlt/documentation-update
Update documentation
2021-06-04 23:54:37 +02:00
AntoJvlt
4336ca69c7 Update documentation 2021-06-03 18:39:40 +02:00
Sarah Hoffmann
4bca5e838b Merge pull request #2357 from lonvia/legacy-tokenizer-fix-word-entries
Fix insertion of special terms and countries into word table
2021-06-02 20:58:14 +02:00
Sarah Hoffmann
bc981d0261 fix insertion of special terms and countries into word table
Special terms need to be prefixed by a space because they are
full terms.

For countries avoid duplicate entries of word tokens.

Adds tests for adding country terms.
2021-06-02 20:22:39 +02:00
Sarah Hoffmann
b1d33e6b49 Merge pull request #2356 from lonvia/freeze-after-import
Call freeze after running and non-updateable import
2021-06-02 16:25:26 +02:00
Sarah Hoffmann
38d442edf6 docs: reload SQL when migrating to 3.6
SQL functions must always be reloaded when updating the software.
All other updates included the instruction as part of some other
migration. From 3.7 on it will happen as part of the migration
command.

Fixes #2335.
2021-06-02 16:11:29 +02:00
Sarah Hoffmann
72625dc72a call freeze after running and non-updateable import
Some of the tables will have already been removed but
the tables for indexing are still there and should be
dropped.
2021-06-02 11:08:48 +02:00
Sarah Hoffmann
cc2f152d70 commit changes to replication log table
Fixes #2350.
2021-05-26 11:47:08 +02:00
Sarah Hoffmann
f74dc38766 always compute guessed postcode for POIs from centroid
When guessing postcodes from the area, only postcodes within
that area are accepted. For POIs that is usually not what we
want as the postcode would have to be within a house for
example.

Fixes #2301.
2021-05-26 11:15:13 +02:00
Sarah Hoffmann
7d9665d8d2 Merge pull request #2349 from lonvia/fix-website-refresh
Only initialise tokenizer for refresh functions where needed
2021-05-25 20:43:44 +02:00
Sarah Hoffmann
a0e85cc17c only initialise tokenizer for refresh functions where needed
Fixes #2347.
2021-05-25 19:16:22 +02:00
Sarah Hoffmann
29b02f9e56 Merge pull request #2346 from lonvia/words-vs-tokens
Cleanup use of partial words in legacy tokenizers
2021-05-24 17:41:38 +02:00
Sarah Hoffmann
24c986c842 add tests for new full name computation with ICU 2021-05-24 10:41:42 +02:00
Sarah Hoffmann
4f4d15c28a reorganize keyword creation for legacy tokenizer
- only save partial words without internal spaces
- consider comma and semicolon a separator of full words
- consider parts before an opening bracket a full word
  (but not the part after the bracket)

Fixes #244.
2021-05-24 10:41:42 +02:00
Sarah Hoffmann
fa3e48c59f use make_keywords for place search terms also
Ensures that place indeed uses the same search names as other
names.
2021-05-23 23:08:11 +02:00
Sarah Hoffmann
02f6afa51b always ignore multi term partials in search
Partial terms should only ever consist of one word. Ignore
any other, they are a leftover from inefficient word index
builts.
2021-05-23 22:13:03 +02:00
Sarah Hoffmann
10143e0ac7 Merge pull request #2342 from lonvia/icu-tokenizer-ci
Add BDD tests with icu tokenizer to CI runs
2021-05-22 10:36:35 +02:00
Sarah Hoffmann
8f3429939f CI: run BDD tests with legacy_icu tokenizer 2021-05-21 23:18:45 +02:00
Sarah Hoffmann
00094c43d1 enable Tiger BDD API test for legacy_icu 2021-05-21 22:39:56 +02:00
Sarah Hoffmann
8bf15fa691 Merge pull request #2341 from lonvia/cleanup-python-tests
Cleanup and linting of python tests
2021-05-20 17:30:30 +02:00
Sarah Hoffmann
63dc503b8d Merge pull request #2337 from mogita/fix/invalid-query-string
fix: add the missing question mark
2021-05-20 10:26:23 +02:00
Sarah Hoffmann
430c316e45 test: fix linting errors 2021-05-19 23:07:39 +02:00
Sarah Hoffmann
01f5a9ff84 test: more use of table_factory 2021-05-19 17:37:03 +02:00
Sarah Hoffmann
af52eed0dd test: avoid use of tempfile module
Use the tmp_path fixture instead which provides automatic
cleanup.
2021-05-19 16:43:26 +02:00
Sarah Hoffmann
f93d0fa957 test: use src_dir fixture instead of self-computed paths 2021-05-19 16:03:54 +02:00
Sarah Hoffmann
c06a1d007a test: replace raw execute() with fixture code where possible 2021-05-19 12:11:04 +02:00
Sarah Hoffmann
65bd749918 test: use table_rows() and execute_values() where possible
Some uses of scalar() could also be replaced with convenience
functions from the word table mock.
2021-05-19 10:51:10 +02:00
Sarah Hoffmann
510eb53f53 test: move Testingcursor into separate class
Also adds more convenience functions: counting with a where
statement and a wrapper to execute_values().
2021-05-19 10:30:36 +02:00
mogita
507543a482 fix: add the missing question mark 2021-05-19 13:35:15 +08:00
Sarah Hoffmann
16bb007135 Merge pull request #2336 from lonvia/do-not-mask-error-when-loading-tokenizer
Do not hide errors when importing tokenizer
2021-05-18 23:00:10 +02:00
Sarah Hoffmann
1ffb6bd5d0 Merge pull request #2321 from AntoJvlt/csv-import-special-phrases
CSV import for special phrases and loader refactoring
2021-05-18 22:58:25 +02:00
AntoJvlt
799a4c9ab6 Documentation update and small code fixes 2021-05-18 22:35:21 +02:00
Sarah Hoffmann
b2722650d4 do not hide errors when importing tokenizer
Explicitly check for the tokenizer source file to check that
the name is correct. We can't use the import error for that
because it hides other import errors like a missing
library.

Fixes #2327.
2021-05-18 16:28:21 +02:00
Sarah Hoffmann
54b06d7abc Merge pull request #2332 from lonvia/fix-keyword-details
Always use object type for details keywords
2021-05-18 11:30:58 +02:00
Sarah Hoffmann
fef1bbb1a7 always use object type for details keywords
When name and address is empty, the keywords field in the response
of the details API would be an array because that is what PHP's
json_encode defaults to with empty array(). This default can only
be changed globally per json_encode call and that might cause
unintended colleteral damage. Work around the issue by making
name and address an empty array instead of keywords.

Fixes #2329.
2021-05-17 16:36:32 +02:00
AntoJvlt
3206bf59df Resolve conflicts 2021-05-17 13:52:35 +02:00
AntoJvlt
a33f2c0f5b Special phrases documentation updated 2021-05-17 13:25:16 +02:00
AntoJvlt
8b8dfc46eb Added --no-replace command for special phrases importation and added corresponding tests 2021-05-17 13:25:06 +02:00
AntoJvlt
06aab389ed Code cleaning and SPLoader deleted 2021-05-16 16:59:12 +02:00
AntoJvlt
fb0ebb5bf0 Add tests for the new SPWikiLoader and SPCsvLoader 2021-05-16 16:10:06 +02:00
Sarah Hoffmann
925726222f Merge pull request #2323 from darkshredder/disable-search-reverse-only
Feat: Disabled search API for --reverse-only imports
2021-05-14 10:40:22 +02:00
Sarah Hoffmann
550e7edb64 Merge pull request #2328 from lonvia/convert-tiger-to-csv
Switch external Tiger data to CSV format
2021-05-14 09:58:50 +02:00
Sarah Hoffmann
2992dea5c8 install default settings for legacy_icu tokenizer 2021-05-14 09:44:10 +02:00
Sarah Hoffmann
e76e4bd964 adapt documentation to use Tiger CSV dump 2021-05-14 00:02:50 +02:00
Sarah Hoffmann
7d621389ee adapt tests to new TIGER CSV format 2021-05-14 00:02:50 +02:00
Sarah Hoffmann
35efe3b41c use tokenizer during Tiger data import
This also changes the required import format to CSV.
2021-05-14 00:02:50 +02:00
Darkshredder
e5ffc59cd5 feat: Added reverse-only-search validation 2021-05-14 02:36:21 +05:30
Sarah Hoffmann
d7f9d2bde9 Merge pull request #2326 from lonvia/wokerpool-for-tiger-data
Use WorkerPool when importing Tiger data
2021-05-13 22:09:56 +02:00
Sarah Hoffmann
5feece64c1 use WorkerPool for Tiger data import
Requires adding an option that SQL errors are ignored.
2021-05-13 20:36:50 +02:00
Sarah Hoffmann
b9a09129fa move WorkerPool into db module
The pool is independent of the indexer and may also be used
by other parts of the software.
2021-05-13 17:11:17 +02:00
Sarah Hoffmann
96e6bbe3a1 Merge pull request #2325 from lonvia/do-not-precompute-postcodes
Do not preload postcodes in the legacy tokenizer
2021-05-13 17:00:29 +02:00
Frederik Ramm
fe39185894 Add array_key_last function for PHP <7.3
This patch adds an array_key_last function if it doesn't yet exist, fixes #2316. It is tested on PHP 7.2.24 but not PHP 7.3.
2021-05-13 16:42:22 +02:00
Sarah Hoffmann
fc860787dd do not preload postcodes
This is too expensive for updates.
2021-05-13 16:14:12 +02:00
Sarah Hoffmann
63e35574d4 Merge pull request #2324 from lonvia/generic-external-postcodes
Rework postcode handling and generalised external postcode support
2021-05-13 14:52:19 +02:00
Sarah Hoffmann
db2dbf15f7 fix token_info migration
A bad indent meant that only one table received the new column.
2021-05-13 14:31:41 +02:00
Sarah Hoffmann
f5977dac75 ignore invalid coordinates in external postcodes 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
8f2746fe24 ignore entries without country code 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
41b9bc9984 add documentation for external postcode feature 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
1ccd4360b4 correctly handle removing all postcodes for country 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
bf864b2c54 index postcodes after refreshing 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
4abaf71234 add and extend tests for new postcode handling 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
a4aba23a83 move filling of postcode table to python
The Python code now takes care of reading postcodes from placex,
enhancing them with potentially existing external postcodes and
updating location_postcodes accordingly. The initial setup and
updates use exactly the same function.

External postcode handling has been generalized. External postcodes
for any country are now accepted. The format of the external postcode
file has changed. We now expect CSV, potentially gzipped. The
postcodes are no longer saved in the database.
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
cae0cf3546 Merge pull request #2322 from mtmail/type-label-already-lowercased
typelabel value is already lowercased
2021-05-12 20:25:22 +02:00
marc tobias
38f9e18afb typelabel value is already lowercased 2021-05-12 19:16:51 +02:00
AntoJvlt
9d83da830f Introduction of SPCsvLoader to load special phrases from a csv file 2021-05-10 23:26:39 +02:00
AntoJvlt
00959fac57 Refactoring loading of external special phrases and importation process by introducing SPLoader and SPWikiLoader 2021-05-10 21:49:31 +02:00
Sarah Hoffmann
40cb17d299 Merge pull request #2314 from lonvia/fix-status-no-import-date
Correctly catch the exception when import date is missing
2021-05-06 17:41:53 +02:00
Sarah Hoffmann
2ae293aeb6 Merge pull request #2312 from lonvia/icu-tokenizer
Add new tokenizer based on libICU
2021-05-06 17:22:04 +02:00
Sarah Hoffmann
d8ead78e03 correctly catch the exception when import date is missing 2021-05-06 16:27:42 +02:00
Sarah Hoffmann
b2c6eca2c8 add missing transliterations
The ICU library only offers transliterations for a limited set of
script. Add transliterations for missing scripts from the PostgreSQL
module. These means that the same selection of scripts is supported
as with the old module.
2021-05-05 21:16:55 +02:00
Sarah Hoffmann
872ab91421 fix name of transliterator
Should be different from the normalisation rules.
2021-05-05 17:09:38 +02:00
Sarah Hoffmann
a263e54b94 enable BDD tests for different tokenizers
The tokenizer to be used can be choosen with -DTOKENIZER.

Adapt all tests, so that they work with legacy_icu tokenizer.
Move lookup in word table to a function in the tokenizer.
Special phrases are temporarily imported from the wiki until
we have an implementation that can import from file. TIGER
tests do not work yet.
2021-05-05 10:31:51 +02:00
Sarah Hoffmann
18c99a5c5f add unit tests for legacy ICU tokenizer 2021-05-05 10:15:27 +02:00
Sarah Hoffmann
d55fc39275 cache translieration results 2021-05-05 10:15:27 +02:00
Sarah Hoffmann
ba8ed7967d add PHP part for new ICU-base tokenizer 2021-05-05 10:15:27 +02:00
Sarah Hoffmann
f44af49df9 add Python part for new ICU-based tokenizer 2021-05-05 10:15:27 +02:00
Sarah Hoffmann
3c67bae868 Merge pull request #2310 from RhinoDevel/master
2nd try: Add hint about replication update & recheck intervals being in seconds.
2021-05-04 12:45:26 +02:00
Marc
3dade534fd Add hint about replication update & recheck intervals being in seconds. 2021-05-04 11:47:15 +02:00
Sarah Hoffmann
8b1a509442 Merge pull request #2305 from lonvia/tokenizer
Factor out normalization into a separate module
2021-05-03 09:15:34 +02:00
Sarah Hoffmann
8bdb9aa607 mock tokenizer factory for replication tests 2021-05-01 10:50:39 +02:00
Sarah Hoffmann
36c624ec71 commit between migrations
Later migrations may require tables set up by older ones.
2021-05-01 10:47:35 +02:00
Sarah Hoffmann
7fd871a74d increase database version for tokenizer migration 2021-05-01 10:47:35 +02:00
Sarah Hoffmann
ced8f0f4a2 fix liniting issues 2021-04-30 17:59:50 +02:00
Sarah Hoffmann
388ebcbae2 move index creation for word table to tokenizer
This introduces a finalization routing for the tokenizer
where it can post-process the import if necessary.
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
20891abe1c indexer: fetch extra place data asynchronously
The indexer now fetches any extra data besides the place_id
asynchronously while processing the places from the last batch.
This also means that more places are now fetched at once.
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
6ce6f62b8e fetch place info asynchronously 2021-04-30 17:41:08 +02:00
Sarah Hoffmann
602728895e indexer: fetch ids in batches 2021-04-30 17:41:08 +02:00
Sarah Hoffmann
fc995ea6b9 move database check for module to tokenizer 2021-04-30 17:41:08 +02:00
Sarah Hoffmann
be6262c6ce move status test to tokenizer
The availability of the module is now tested by the tokenizer.
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
893490f94e add more tests for legacy tokenizer 2021-04-30 17:41:08 +02:00
Sarah Hoffmann
044bb6afa5 move tokenization in query into tokenizer 2021-04-30 17:41:08 +02:00
Sarah Hoffmann
3eb4d88057 boilerplate for PHP code of tokenizer
This adds an installation step for PHP code for the tokenizer. The
PHP code is split in two parts. The updateable code is found in
lib-php. The tokenizer installs an additional script in the
project directory which then includes the code from lib-php and
defines all settings that are static to the database. The website
code then always includes the PHP from the project directory.
2021-04-30 11:31:52 +02:00
Sarah Hoffmann
23fd1d032a tests for legacy tokenizer 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
7cb7cf848d move amenity creation to tokenizer
The BDD tests still use the old-style amenity creation scripts
because we don't have simple means to import a hand-crafted
test file of special phrases right now.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
bef300305e move default country name creation to tokenizer
The new function is also used, when a country us updated. All SQL
function related to country names have been removed.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
dc700c25b6 cache all postcodes 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
0ba93e5ba9 reorganise address iteration in tokenizer 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
0da481f207 remove debug code 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
d75a235c1f use address tokens in SQL 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
9e92759ac7 extract address tokens in tokenizer 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
ffc2d82b0e move postcode normalization into tokenizer 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
d8ed1bfc60 move houseunumber handling to tokenizer
Normalization and token computation are now done in the tokenizer.
The tokenizer keeps a cache to the hundred most used house numbers
to keep the numbers of calls to the database low.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
d711f5a81e move name token creation into tokenizer
Name tokens are now handed in via token_info and used from there.

Also moves the generic search name insertion function back to
placex_triggers.sql.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
fa2bc60468 introduce name analyzer
The name analyzer is the actual work horse of the tokenizer. It
is instantiated on a thread-base and provides all functions for
analysing names and queries.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
e1c5673ac3 require tokeinzer for indexer 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
1b1ed820c3 introduce index for finding surrounding buildings 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
a73711f3cd add extra column for tokenizer
Add a jsonb column to the placex and location_property_osmline tables
which can be used by the installed tokenizer as required. No other
part of the software will use or otherwise rely on this column.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
9397bf54b8 introduce external processing in indexer
Indexing is now split into three parts: first a preparation step
that collects the necessary information from the database and
returns it to Python. In a second step the data is transformed
within Python as necessary and then returned to the database
through the usual UPDATE which now not only sets the indexed_status
but also other fields. The third step comprises the address
computation which is still done inside the update trigger in
the database.

The second processing step doesn't do anything useful yet.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
fbbdd31399 move word table and normalisation SQL into tokenizer
Creating and populating the word table is now the responsibility
of the tokenizer.

The get_maxwordfreq() function has been replaced with a
simple template parameter to the SQL during function installation.
The number is taken from the parameter list in the database to
ensure that it is not changed after installation.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
b5540dc35c add migration for configurable tokenizer
Adds a migration that initialises a legacy tokenizer for
an existing database. The migration is not active yet as
it will need completion when more functionality is added
to the legacy tokenizer.
2021-04-30 11:29:57 +02:00
Sarah Hoffmann
296a66558f move module installation to legacy tokenizer 2021-04-30 11:29:57 +02:00
Sarah Hoffmann
af968d4903 introduce tokenizer modules
This adds the boilerplate for selecting configurable tokenizers.
A tokenizer can be chosen at import time and will then install
itself such that it is fixed for the given database import even
when the software itself is updated.

The legacy tokenizer implements Nominatim's traditional algorithms.
2021-04-30 11:29:57 +02:00
Sarah Hoffmann
5c7b9ef909 Merge pull request #2303 from lonvia/remove-aux-support
Remove support for AUX housenumber tables
2021-04-30 11:19:35 +02:00
Sarah Hoffmann
185d369404 remove support for AUX housenumber tables
These tables have never been actively maintained and the code is
completely untested. With the upcomming changes, it is unlikely
that the code remains usable.

This removes the aux tables and all code that references them.
2021-04-30 10:08:29 +02:00
Sarah Hoffmann
51d20b19b6 Merge pull request #2299 from lonvia/update-actions
Fix database check for reverse-only
2021-04-27 12:18:45 +02:00
Sarah Hoffmann
46e8c6b112 Merge pull request #2291 from AntoJvlt/special-phrases-statistics
Special phrases statistics
2021-04-27 11:57:05 +02:00
Sarah Hoffmann
c8fb25201a do not check for extra housenumber index for reverse-only
Also adds a database check for reverse only import to the CI.
2021-04-27 10:14:26 +02:00
Sarah Hoffmann
1fd483643b add tests for different scripts 2021-04-26 23:01:06 +02:00
Sarah Hoffmann
a21a0864f1 Merge pull request #2298 from lonvia/add-warming-to-ci
Add warming to CI import tests and fix more Python 3.5 compatibility issues
2021-04-26 11:21:44 +02:00
Sarah Hoffmann
4457bf7528 avoid Path in subprocess parameters
Not supported by Python 3.5.
2021-04-26 10:55:23 +02:00
Sarah Hoffmann
5ed6f18d83 add warming to CI import test 2021-04-26 09:54:09 +02:00
AntoJvlt
abb3d56b20 Switching to log info and only send warning for invalid phrases 2021-04-25 17:57:43 +02:00
AntoJvlt
c5ecb9bae0 Implemented statistics for the import of special phrases through the SpecialPhrasesImporterStatistics class 2021-04-25 17:57:43 +02:00
AntoJvlt
1b68152fb2 reorganization of folder/file for the special phrases importer 2021-04-25 17:57:42 +02:00
Sarah Hoffmann
6812f397af Merge pull request #2297 from lonvia/update-deployment-docs
docs: update deployment to use project directory
2021-04-24 15:35:00 +02:00
Sarah Hoffmann
68bd9c6091 Merge pull request #2296 from lonvia/disable-too-few-public-methods-check
pylint: disable too-few-public-methods check
2021-04-24 15:03:28 +02:00
Sarah Hoffmann
754f9e3a20 docs: update deployment to use project directory
Fixes #2295.
2021-04-24 15:00:46 +02:00
Sarah Hoffmann
b951b11336 fix pylint complaints 2021-04-24 11:59:32 +02:00
Sarah Hoffmann
89c90bedb9 pylint: disable check too-few-public-methods 2021-04-24 11:39:44 +02:00
Sarah Hoffmann
b4fe7d7c7d Merge pull request #2293 from darkshredder/update-manpage
Updated manual page
2021-04-24 09:20:28 +02:00
Sarah Hoffmann
5071710db7 Merge pull request #2294 from lonvia/update-actions
CI: add import test against Python 3.5 and fix discovered issues
2021-04-23 23:33:15 +02:00
Sarah Hoffmann
9faaf3fc88 actions: add import on ubuntu 18.04
This uses oldest possible dependencies where possible.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
9c51c133f7 indexes with includes are not available for postgresql < 11 2021-04-23 22:50:08 +02:00
Sarah Hoffmann
91d2fb6b1c use group() for regex matches
Needed for compatibility with Python 3.5.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
280406c0d7 use pathlib version of open 2021-04-23 22:50:08 +02:00
Sarah Hoffmann
d5fc3b5e99 subprocess needs string argument
Compatibility change for Python 3.5.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
f8f8c7e534 check for existance of custom .env before opening 2021-04-23 22:50:08 +02:00
Sarah Hoffmann
3a642d50a4 use more generic ImportError to check for module
ModuleNotFoundError was only introduced in Python 3.6.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
9685c68e30 replace usages of fromisoformat() with strptime()
fromisoformat was only introduced with Python 3.7 while we
still support Python 3.5.

Fixes #2292.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
95e6ec091b remove argparse dependency for vagrant scripts
Users don't need to recreate the manpage.
2021-04-23 22:50:08 +02:00
Darkshredder
34f5e4a199 Updated manual page 2021-04-24 01:42:38 +05:30
Sarah Hoffmann
788baafa26 bdd tests: fix place dependen ranking tests
The ranks of places may differ for some countries. Force the
place nodes in the test on null island which always uses the
default ranking.
2021-04-22 17:31:00 +02:00
Sarah Hoffmann
4c31813398 Merge pull request #2288 from RhinoDevel/patch-1
Replace "nominatim-update" with "nominatim".
2021-04-22 17:12:25 +02:00
RhinoDevel
b7bae80616 Replace "nominatim-update" with "nominatim".
If I am not mistaken, the correct command to index imported data via commandline is "nominatim index".
2021-04-22 15:40:22 +02:00
Sarah Hoffmann
f7e4aa51d3 indexer: reset query counter
Reset the counter for queries after the asynchronous connections
have been reopened.
2021-04-21 10:33:45 +02:00
Sarah Hoffmann
696c50459f Merge pull request #2285 from lonvia/split-indexer-code
Rework indexer code
2021-04-20 15:34:14 +02:00
Sarah Hoffmann
50b6d7298c factor out async connection handling into separate class
Also adds a test for reconnecting regularly while indexing.
2021-04-20 14:08:37 +02:00
Sarah Hoffmann
26a81654a8 indexer: make self.conn function-local
Also switches to our internal connect function which gives us
a cursor with a sclar() function.
2021-04-20 14:08:37 +02:00
Sarah Hoffmann
6430371d7d make index() function private 2021-04-20 14:08:37 +02:00
Sarah Hoffmann
18705b3f18 move analyse function into indexinf function 2021-04-20 14:08:37 +02:00
Sarah Hoffmann
c6bd2bb7fb indexer: move runner into separate file 2021-04-20 14:08:37 +02:00
Sarah Hoffmann
c4fd94bd1a Merge pull request #2284 from lonvia/cleanup-word-frequency-computation
Rename and simplify function for word pre-computation
2021-04-19 18:28:04 +02:00
Sarah Hoffmann
b88b952f56 simplify token precomputation
Rename function to reflect that it is only used for precomputation.
The token IDs are not really needed, so don't bother to compute
the array of tokens.
2021-04-19 17:24:19 +02:00
Sarah Hoffmann
d68b02d36a remove unused word recomputation script
Has been replaced by a script recomputing counts from search_name.
2021-04-19 16:40:57 +02:00
Sarah Hoffmann
b9b85eb208 Merge pull request #2283 from darkshredder/tiger-data-test-fix
Fix: tiger-data tarfile test
2021-04-19 13:56:36 +02:00
Darkshredder
1f898405a6 Fix: tiger-data tarfile test 2021-04-19 16:02:52 +05:30
Sarah Hoffmann
6f6910101e Merge pull request #2282 from lonvia/add-paths-to-config
Include software paths in Python config object
2021-04-19 12:14:25 +02:00
Sarah Hoffmann
79d55357e8 simplify sql and website creation functions 2021-04-19 10:53:30 +02:00
Sarah Hoffmann
4fa6c0ad53 simplify constructor for SQL preprocessor
Use sql path from config.
2021-04-19 10:26:25 +02:00
Sarah Hoffmann
8f63f9516b simplify interface for adding tiger data
Also simplifies tests using existing fixtures.
2021-04-19 10:26:25 +02:00
Sarah Hoffmann
995ba2c7c2 add library directories to config
Allows to reduce the number of parameters in functions that take
the config anyway.
2021-04-19 10:26:25 +02:00
Sarah Hoffmann
830e3be1e6 Merge pull request #2281 from changpingc/changping/fix-tiger-index
fix index on location_property_tiger (parent_place_id)
2021-04-19 08:42:59 +02:00
Channgping Chen
29a314a092 fix index on location_property_tiger (parent_place_id)
Looks like 2af82975cd
accidentally renamed an index. Because of the added "if not
exists" clause, the index doesn't get created. This
significantly slows down reverse queries because they now
require full scans on location_property_tiger.

Without this fix, reverse queries can take 8s on a full
planet install on an r5.8xlarge instance in EC2.
2021-04-19 00:33:15 +00:00
Sarah Hoffmann
abdba5fdc7 Merge pull request #2280 from AntoJvlt/Fix-special-phrases-import-and-tests-cleaning
Fix regex and sanity check for the import of special phrases and tests cleaning.
2021-04-18 11:57:19 +02:00
AntoJvlt
b2ae715699 Only log a warning if a wrong input is detected on the wiki while importing special phrases 2021-04-17 20:19:39 +02:00
AntoJvlt
a95c748363 Fix occurence regex 2021-04-17 19:24:13 +02:00
AntoJvlt
ec859e41c6 Cleaned tests and add database cleaning tests on test_import_from_wiki 2021-04-17 19:23:33 +02:00
Sarah Hoffmann
7aeae9da81 Merge pull request #2279 from lonvia/add-index-for-continued-indexing
Add index for continued indexing
2021-04-17 11:51:21 +02:00
Sarah Hoffmann
2ca11ccc6b add tests for continuing import 2021-04-17 11:10:36 +02:00
Sarah Hoffmann
d74ae669e3 add support index when continuing import at index phase
Indexing scans the placex table sequentially during indexing
on the initial import. That is okay because we know that all
rows need to be processed anywhere. When continuing the import,
however, a large part might already be indexed, so that the
process spends a lot of time going through rows that are no
longer of interest. Create a supporting index for all unindexed
rows to speed up the scan. This is the same index as used later
for updates.
2021-04-17 11:07:04 +02:00
Sarah Hoffmann
9fabc5572d Merge pull request #2278 from lonvia/remove-transistion-functions
Remove transition functions
2021-04-17 10:13:33 +02:00
Sarah Hoffmann
da98a2102a remove transition functions from Python 2021-04-16 18:41:14 +02:00
Sarah Hoffmann
fb3353b854 Merge pull request #2277 from lonvia/update-osm2pgsql
Update osm2pgsql to current master
2021-04-16 17:40:43 +02:00
Sarah Hoffmann
b7e5c54593 remove PHP code for transition functions 2021-04-16 17:28:51 +02:00
Sarah Hoffmann
68beec5590 remove installation of PHP util scripts 2021-04-16 17:09:40 +02:00
Sarah Hoffmann
6ba06d1eb4 Merge pull request #2276 from lonvia/port-country-code-creation-to-python
Port country code creation to python
2021-04-16 16:57:04 +02:00
Sarah Hoffmann
0f11e311c4 add test for new postcode import function 2021-04-16 16:11:20 +02:00
Sarah Hoffmann
886a01c796 port function to compute initial postcodes to Python 2021-04-16 16:11:20 +02:00
Sarah Hoffmann
a632b9f86a Merge pull request #2275 from lonvia/switch-to-absolute-imports
Use absolute imports in Python code
2021-04-16 15:04:10 +02:00
Sarah Hoffmann
76b1885595 use absolute imports in Python code
Relative imports are no longer officially recommended.
2021-04-16 14:20:09 +02:00
Sarah Hoffmann
c55b409cf6 update osm2pgsql to current master (fixes version output) 2021-04-15 10:24:01 +02:00
Sarah Hoffmann
c64193f839 Merge pull request #2263 from AntoJvlt/special-phrases-autoupdate
Implemented auto update of special phrases while importing them
2021-04-15 10:13:25 +02:00
Sarah Hoffmann
28a2a795ba Merge pull request #2270 from lonvia/simplify-place-boundary-merge
Simplify matching between place and boundary names
2021-04-15 10:12:53 +02:00
Sarah Hoffmann
e90adfc7c3 adapt database check to new index layout 2021-04-14 17:52:59 +02:00
Sarah Hoffmann
16267dc021 add migration for new placenode geometry index 2021-04-14 17:52:59 +02:00
Sarah Hoffmann
e7266b52ae simplify name matching between boundary and place node
Instead of normalising the names simply compare them in lower
case. This removes the dependency on the tokenizer for
linking boundaries and nodes. When looking up the linked places
by place type also allow that one name is simply contained in the
other. This catches the frequent case where one of the names has
an addendum (e.g. Newport vs. City of Newport).

Drops the special index for the name lookup and insted relies
on a slightly extended version of the geometry index used for
reverse lookup. Saves around 100MB on a planet.
2021-04-14 17:52:59 +02:00
Sarah Hoffmann
dc02610408 Merge pull request #2269 from lonvia/fix-actions
github actions: reintroduce postgresql repo
2021-04-14 17:50:02 +02:00
Sarah Hoffmann
dc1bfe4a93 github actions: reintroduce postgresql repo 2021-04-14 17:25:44 +02:00
Sarah Hoffmann
cf69daaafb Merge pull request #2264 from darkshredder/tiger-data-tests
Fix:  Error if last statements is wrong and improved tests in tiger data import
2021-04-14 10:56:12 +02:00
Darkshredder
49ee7505ed Fix: Removed error if endstatement is wrong and improved tests 2021-04-13 15:44:12 +05:30
AntoJvlt
ae2b2cb9a5 Tests added for the auto update of special phrases during import 2021-04-12 14:35:29 +02:00
AntoJvlt
8c2f287ce4 Implemented auto update of special phrases while importing them 2021-04-12 14:30:48 +02:00
Sarah Hoffmann
2351f36315 Merge pull request #2260 from AntoJvlt/fix-load-languages-special-phrases
Fix default languages loading for special phrases import
2021-04-11 23:09:45 +02:00
AntoJvlt
5ecae10713 Fix default languages loading 2021-04-11 22:26:31 +02:00
Sarah Hoffmann
2e3d657794 Merge pull request #2258 from darkshredder/code-coverage
Disabled Code coverage status checks
2021-04-10 21:19:55 +02:00
Darkshredder
90f990b806 CodeCov comment only when codecoverage changes 2021-04-10 22:28:29 +05:30
Darkshredder
7666d48409 Disabled Coverage status checks 2021-04-10 20:44:52 +05:30
Sarah Hoffmann
be4cb190e8 add badge for codecov 2021-04-10 16:57:39 +02:00
Sarah Hoffmann
2f4eca8c46 Merge pull request #2252 from darkshredder/code-coverage
Added Code coverage support using Codecov
2021-04-10 16:37:12 +02:00
Sarah Hoffmann
71564fa1de split LANGUAGES parameter before use
The user supplies the languages as a comma-separated list.
2021-04-09 17:48:28 +02:00
Sarah Hoffmann
ce08cb6cd7 add migration information for new configuration format 2021-04-08 11:01:46 +02:00
Sarah Hoffmann
1f0cf6311a Merge pull request #2256 from lonvia/remove-reverseinplan-option
Remove ReverseInPlan option
2021-04-08 10:54:16 +02:00
Sarah Hoffmann
1db468b6c3 remove special handling for reversed queries in getGroupedSearches
getGroupedSearches is guaranteed not to be called with reversed
structured queries, so there is no need to have special exclusion
code.
2021-04-08 10:35:14 +02:00
Sarah Hoffmann
534de5ba81 remove reverseInPlan option from Geocode
Disabling query reversal is no longer possible in the configuration,
so there is no need to keep this as an option. Reversal is
automatically disabled for structured search only.
2021-04-08 10:19:27 +02:00
Sarah Hoffmann
492186716f prepare 3.7.0 release 2021-04-06 21:23:29 +02:00
Sarah Hoffmann
07fda48cee docs: minor spelling corrections 2021-04-06 16:09:53 +02:00
Sarah Hoffmann
4b31be5203 docs: unpacking tiger data is no longer necessary 2021-04-06 15:56:08 +02:00
Sarah Hoffmann
5d69c7ade1 Merge pull request #2250 from lonvia/save-transliterated-housenumbers
Switch to saving transliterated housenumbers in placex
2021-04-05 15:48:22 +02:00
Darkshredder
2bfea15fdc Fixed BDD tests coverage reports 2021-04-05 06:30:31 +05:30
Sarah Hoffmann
96b0699621 add migration for transliterated housenumbers 2021-04-04 15:26:47 +02:00
Sarah Hoffmann
6cbef84cad use new transliteration in initial housenumber word computation
The new create_housenumber_id() function splits housenumber
lists correctly. Otherwise there is no difference.
2021-04-04 15:26:47 +02:00
Sarah Hoffmann
55fcc44c8c correctly handle housenumber lists
Lists are now standardised to use a semicolon separator.
2021-04-04 15:26:47 +02:00
Sarah Hoffmann
16a66b5326 move transliteration of housenumbers into indexing
Housenumbers are now saved in transliterated form in the housenumber
column. This saves the transliteration step during lookup.
2021-04-04 15:26:47 +02:00
Sarah Hoffmann
3590e76a1c tests for finding non-ascii housenumbers 2021-04-04 15:26:47 +02:00
Sarah Hoffmann
0ec3fdd3ba return housenumbers always from address field
This means that we can use normalized versions of the
housenumber in the housenumber field as it is no longer
a user visible field.
2021-04-04 15:26:47 +02:00
Sarah Hoffmann
c0f0b66509 Merge pull request #2248 from darkshredder/special-term-test
Added Test for TokenSpecialTerm
2021-04-03 18:31:01 +02:00
Darkshredder
0f9df32d11 Added Test for TokenSpecialTerm 2021-04-02 04:49:05 +05:30
Sarah Hoffmann
a370c8be4b Merge pull request #2247 from lonvia/index-for-housenumber-lookup
Index for housenumber lookup
2021-04-01 18:35:00 +02:00
Sarah Hoffmann
d6e0bc698e add recommendation for Postgresql 11+ 2021-04-01 17:10:44 +02:00
Sarah Hoffmann
8d8b1d4307 use non-key index to speed up housenumber search
On Postgresql versions 11+ add an index to speed up the lookup
of housenumbers for terms found in search_name. This is really
just a band-aid around the query planer's interpretation of the
query.
2021-04-01 17:10:44 +02:00
Darkshredder
771b3377c0 Added code-cov Support for Code Coverage 2021-03-31 05:00:03 +05:30
Sarah Hoffmann
8dbfdd59b0 Merge pull request #2243 from darkshredder/XML-format-fix
Fixed: XML format: more_url points to localhost, not base URL
2021-03-30 09:19:01 +02:00
Sarah Hoffmann
cd03882536 Merge pull request #2244 from AntoJvlt/import-special-phrases-tests-cleaning
Cleaned tests for special phrases.
2021-03-30 09:17:27 +02:00
Darkshredder
0b154a2a1a Added HTTP_HOST to if statement 2021-03-30 03:02:55 +05:30
AntoJvlt
e82de99e5a Cleaned tests of exceptions and fix phrase_settings.json test file name. 2021-03-29 22:07:29 +02:00
Darkshredder
27b379c1e3 fixed: XML format: more_url points to localhost, not base URL 2021-03-30 01:02:43 +05:30
Sarah Hoffmann
f9517e9143 Merge pull request #2234 from darkshredder/add-man-page
Added Manual page for Nominatim tool
2021-03-29 14:25:10 +02:00
Sarah Hoffmann
e05dee6df5 allow sorting by housenumbers for rare street names
Usually we don't narrow down search results by house number when
only a street name is given because there may be a lot of rows
to cross check when the street name is very frequent. However,
when it is known to be rare, the housenumber check may be done
anyway.

Fixes #2238.
2021-03-29 12:06:51 +02:00
Darkshredder
3fad492c6f Update manpage after rebase 2021-03-29 14:27:06 +05:30
Darkshredder
b7d6ae93e3 Nominatim/cli.py rebase fixes 2021-03-29 14:16:41 +05:30
Darkshredder
21b1b75b08 Rebase with master 2021-03-29 14:00:45 +05:30
Darkshredder
bbe0353b23 fixed indentation and used sed to remove AUTHORS section 2021-03-29 13:57:13 +05:30
Darkshredder
51e2654cd2 Added Manual page and fixed documentation 2021-03-29 13:57:13 +05:30
Sarah Hoffmann
09b2510219 Merge pull request #2228 from AntoJvlt/import-special-phrases-porting-python
Import special phrases porting python
2021-03-29 09:49:35 +02:00
AntoJvlt
57ce75eb67 Change command 'import-special-phrases --from-wiki' to 'special-phrases --import-from-wiki'. 2021-03-26 02:22:38 +01:00
AntoJvlt
cde9389e75 Errors fixes, Cleaning code, Improvement and addition of tests 2021-03-26 01:53:33 +01:00
AntoJvlt
2c19bd5ea3 Encapsulation of tools/special_phrases.py into SpecialPhrasesImporter class and add new tests. 2021-03-25 21:13:57 +01:00
AntoJvlt
ff34198569 Code cleaning, tests simplification and use of python3-icu package 2021-03-23 23:56:39 +01:00
AntoJvlt
919469c8fe Updated documentation for PyICU support 2021-03-23 23:34:19 +01:00
AntoJvlt
1ce8b530cd Introduction of PyICU for transliteration in python. Reversed changes in normalization.sql. 2021-03-23 23:34:16 +01:00
AntoJvlt
2fb6018078 Added wrapper in specialphrases.php to call corresponding nominatim command. 2021-03-23 23:30:42 +01:00
AntoJvlt
6d56cbb3e8 Changed phrase_settings.py to phrase-settings.json and added migration function for old php settings file. 2021-03-23 23:30:39 +01:00
AntoJvlt
1a93319093 Changed phrase_settings.py to phrase-settings.json and added migration function for old php settings file. 2021-03-23 23:27:56 +01:00
Sarah Hoffmann
28b4fb12b6 Merge pull request #2233 from lonvia/index-for-postcode-ids
Create postcode id index earlier
2021-03-23 09:18:10 +01:00
Sarah Hoffmann
5dabc0aca8 create postcode id index earlier
Now that the indexer takes care of indexing the postcode tables,
the id index is needed to find the rows to index.
2021-03-22 22:24:56 +01:00
Sarah Hoffmann
4f1bdde32e Merge pull request #2231 from mtmail/correct-cli-help-page
nominatim -h was printing wrong text for lookup and details
2021-03-21 16:52:20 +01:00
Sarah Hoffmann
a08ca5b1b5 avoid division by zero in progress meter
On Windows systems the timer may not be accurate enough to measure
the time between init() and done(). Avoid computing statistics with
a diff time of 0 in such cases.

Fixes #2230.
2021-03-21 16:47:22 +01:00
marc tobias
87d5883ddb nominatim -h was priting wrong text for lookup and details 2021-03-21 16:06:41 +01:00
AntoJvlt
d5acade4db Deleted specialphrases.php and phrase_settings.php 2021-03-20 19:48:05 +01:00
AntoJvlt
9d1c23e4f5 Updated specialphrases_testdb.sql 2021-03-20 19:17:03 +01:00
AntoJvlt
17cb59efbd Ported functions for the import of special phrases from php to python.
- the command is now --import-special-phrases
- the output is not an sql file anymore, data are directly imported to the database.
- the little part on the documentation (section data import) has been modified.
2021-03-20 19:11:50 +01:00
Sarah Hoffmann
118befd7d7 bdd tests: make indexing less verbose
Do not print progress info for indexing when there is an error
in the BDD tests.
2021-03-20 10:39:29 +01:00
Sarah Hoffmann
0d9fe6e49c Merge pull request #2219 from lonvia/bdd-test-remove-php
BDD tests: run all setup via nominatim Python library
2021-03-17 11:40:34 +01:00
Sarah Hoffmann
ebae3553e0 bdd: run all setup via nominatim Python library
Drops all calls to PHP utility functions. nominatim cli functions
are used where possible, to stay as close to the final code as
possible with the tests.

By removing the PHP calls, the test code now only uses osm2pgsql and
the database module from the build directory.
2021-03-16 22:20:41 +01:00
Sarah Hoffmann
d3ff831b8a Merge pull request #2216 from lonvia/fix-reverse-interpolation
Reverse: do not prefer interpolations over closer housenumbers
2021-03-15 14:08:54 +01:00
Sarah Hoffmann
4d7c5ec089 reverse: do not prefer interpolations over closer housenumbers
Always look up the closest housenumber before looking up
interpolations. This ensures that closer housenumbers are
preferred over interpolations.

Fixes #2214.
2021-03-15 10:50:04 +01:00
Sarah Hoffmann
81a6b746b8 Merge pull request #2212 from darkshredder/country-name
Ported createCountryNames() to python and Added tests
2021-03-15 09:36:06 +01:00
Darkshredder
f356a75a24 Add setup.php 2021-03-14 15:02:30 +05:30
Sarah Hoffmann
7212fa8630 fix template variable name 2021-03-13 12:05:53 +01:00
Sarah Hoffmann
6cabc44841 Merge pull request #2213 from lonvia/tweak-search-weights
Some more tweaking of the ranking of search interpretations
2021-03-12 15:47:36 +01:00
Darkshredder
b108bd1c1e Linting fix 2021-03-12 18:28:47 +05:30
Darkshredder
077a8c1f95 refactored tests and made changes to code for easy readibility 2021-03-12 18:23:20 +05:30
Darkshredder
7a874d5b97 Ported createCountryNames() to python and added tests 2021-03-12 10:28:41 +05:30
Sarah Hoffmann
9086a794a1 Merge pull request #2204 from darkshredder/tiger-data
Ported tiger-data-import to Python and Added Tarball Support
2021-03-11 22:48:38 +01:00
Sarah Hoffmann
6dd2b9c2ec do not mix partial names with other words
As soon as a housenumber, postcode, etc. appear, the name term
must obviously be closed and no further partial terms can be
appended.
2021-03-11 22:44:49 +01:00
Sarah Hoffmann
3fbe4511f9 make linter happy 2021-03-11 21:14:23 +01:00
Sarah Hoffmann
3933fc3ad3 avoid multi-term partials in names
Names are either full words or single-word partial names.
Searching for multi-word partials yields exactly the same
result as with full words.
2021-03-11 20:42:37 +01:00
Sarah Hoffmann
00b05e2394 higher penalty for special searches
Adds a general higher penalty for special search term and an
additional one if the term is anywhere but the beginning or the
end. Also housenumbers and special searches together are less
likely.
2021-03-11 20:37:51 +01:00
Sarah Hoffmann
d5e8c5e975 do not mix partial and full name terms
If NameNonSearch already contains a partial term, then a
full term must not be added to the Name list anymore.
2021-03-11 20:22:54 +01:00
Sarah Hoffmann
478dfb0639 add one-rank penalty for using partial search
Ensures that full matches are preferred over partial ones even when
the full word consists of only one term.
2021-03-11 17:52:44 +01:00
Sarah Hoffmann
f498e40208 fix result splitting for last search group
When we are in the final iteration of the search groups, it is not
possible to further delay the results. Unconditionally use the
results with the best rank instead.
2021-03-11 17:14:46 +01:00
Sarah Hoffmann
182f5f5d7b give preference to full words in address, too
Full word terms are already preferred for the name part. Adding
only one-word partials to the address, makes it impossible to
give a similar preference for the address part. Each term adds
a rank penalty. The problem here is that we interpret the query
forwards and backwards. Having different penalty systems for
name and address means that the same term ends up with different
penalties and that often leads to interpretations of the wrong
direction being in the way.
2021-03-11 15:03:36 +01:00
Darkshredder
e5719de657 Added fixture for sql_preprocessor and fixed some issues 2021-03-11 15:39:17 +05:30
Darkshredder
8486a83cf5 Added test for tarfile 2021-03-10 18:14:17 +05:30
Darkshredder
ccfad57fca Added test and removed runlegacyscript 2021-03-10 17:18:12 +05:30
Darkshredder
64128b699a fixed linting, refactored threaded sql handling and removed importTigerData() function 2021-03-10 13:28:29 +05:30
Darkshredder
4080fbb95c Test fixes 2021-03-09 01:00:56 +05:30
Darkshredder
14ec83c886 Linting fixes 2021-03-08 23:10:49 +05:30
Darkshredder
122c4618b9 Linting fixes 2021-03-08 22:59:51 +05:30
Darkshredder
2af82975cd Ported tiger-data-import to python and Added Tarball Support 2021-03-08 21:57:56 +05:30
Sarah Hoffmann
35f4695b67 Merge pull request #2200 from lonvia/migrations-for-current-version
Introduce a command for database migration
2021-03-08 10:14:03 +01:00
Sarah Hoffmann
3c9e09545e documentation for new migration command 2021-03-06 16:38:37 +01:00
Sarah Hoffmann
764a41b973 automatic migration from 3.6 release
Adds a 'admin --migrate' command that checks for the current
database version and runs any necessary migrations. Also
has migrations going back to 3.6.
2021-03-06 16:36:57 +01:00
Sarah Hoffmann
9d103503f7 Merge pull request #2197 from lonvia/use-jinja-for-sql-preprocessing
Use jinja2 for SQL preprocessing
2021-03-04 16:36:18 +01:00
Sarah Hoffmann
09f4d767e4 port index creation to python
Also switches to jinja-based preprocessing, which allows to
simplify the SQL files. Use 'if not exists' where possible
so that the step can be rerun to fix missing indexes.
2021-03-04 11:11:47 +01:00
Sarah Hoffmann
dd301cf5ac indexer: ANALYSE must be run outside transactions 2021-03-04 11:06:33 +01:00
Sarah Hoffmann
eacabb0e96 move table creation to jinja-based preprocessing 2021-03-03 22:07:51 +01:00
Sarah Hoffmann
6cda021d9b add new jinja2 requirement 2021-03-03 17:51:08 +01:00
Sarah Hoffmann
d2bd6aa78d introduce jinja2 for preprocessing SQL
Replaces various hand-crafted replacements of varying format with
a single Jinja2 templating mechanism. Allows full access to
configuration if necessary.
2021-03-03 17:51:08 +01:00
Sarah Hoffmann
6b306f30b6 Merge pull request #2194 from grischard/patch-1
Fix typo in .github/actions/build-nominatim/action.yml
2021-03-03 11:34:12 +01:00
Guillaume Rischard
c48fd18344 Update action.yml 2021-03-03 11:20:21 +01:00
Sarah Hoffmann
8ea7e04363 Merge pull request #2192 from lonvia/database-versioning
Introduce database versioning
2021-03-02 15:57:46 +01:00
Sarah Hoffmann
32c2d2b248 document new status fields 2021-03-01 22:21:37 +01:00
Sarah Hoffmann
111cca8c9a return database version with status API 2021-03-01 22:17:16 +01:00
Sarah Hoffmann
7ae9c3a9f0 add database_version setting to tests 2021-03-01 21:49:33 +01:00
Sarah Hoffmann
bf4320a7d6 do not depend on cmdline parameter for creating partition tables
The partition numbers in use only depend on the entries in search_name.
2021-03-01 21:28:39 +01:00
Sarah Hoffmann
3a0a4b9175 save software version in the database
The version represents the software version that was used to
import the data.
2021-03-01 20:35:15 +01:00
Sarah Hoffmann
4faefe156c report software version of status call 2021-03-01 16:47:19 +01:00
Sarah Hoffmann
86273f5e2a introduce database patch level for version
This will be needed later for automatic migrations.
2021-03-01 16:46:19 +01:00
Sarah Hoffmann
b4f64aa770 make sure that calls to PHP legacy scripts are fatal on error 2021-03-01 16:10:45 +01:00
Sarah Hoffmann
976c5e9121 introduce table for in-database properties
Adds a simple table where settings for the database can be
saved. This is useful for state that must not change after
import.
2021-03-01 16:09:17 +01:00
Sarah Hoffmann
db663dd92f remove unused import 2021-03-01 09:26:08 +01:00
Sarah Hoffmann
90a5d23016 use tmp_path fixture in config tests 2021-03-01 09:24:04 +01:00
Sarah Hoffmann
99e35d256a fix typo 2021-03-01 09:07:49 +01:00
Sarah Hoffmann
e14e7c6235 Merge pull request #2186 from lonvia/port-import-to-python
Move setup procedure to Python
2021-02-27 12:09:23 +01:00
Sarah Hoffmann
b46adbad22 make sure psql always finishes
If an execption is raised by other means, we still have to close
the stdin pipe to psql to make sure that it exits and releases its
connection to the database.
2021-02-27 10:24:40 +01:00
Sarah Hoffmann
afabbeb546 older versions of Postgresql need explicit return type 2021-02-27 09:46:42 +01:00
Sarah Hoffmann
d14a3df10f do not truncate search_name in reverse-only mode 2021-02-27 09:46:42 +01:00
Sarah Hoffmann
9feb84e426 actions: add psutil dependency 2021-02-26 16:50:09 +01:00
Sarah Hoffmann
c7f40e3cee fix verbose flag for PHP wrapper scripts
The flag must come after the command.
2021-02-26 16:49:32 +01:00
Sarah Hoffmann
dd03aeb966 bdd: use python library where possible
Replace calls to PHP scripts with direct calls into the
nominatim Python library where possible. This speed up
tests quite a bit.
2021-02-26 16:14:29 +01:00
Sarah Hoffmann
15b5906790 move setup function to python
There are still back-calls to PHP for some of the sub-steps.
These needs some larger refactoring to be moved to Python.
2021-02-26 15:02:39 +01:00
Sarah Hoffmann
3ee8d9fa75 properly close connections of indexer after use 2021-02-26 12:10:54 +01:00
Sarah Hoffmann
57db5819ef prot load-data function to python 2021-02-25 21:32:40 +01:00
Sarah Hoffmann
3c186f8030 add a function for the intial indexing run
Also moves postcodes to fully parallel indexing.
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
db5e78c879 remove unused partitionfunction function 2021-02-25 18:42:54 +01:00
Sarah Hoffmann
c7fd0a7af4 port wikipedia importance functions to python 2021-02-25 18:42:54 +01:00
Sarah Hoffmann
32683f73c7 move import-data option to native python
This adds a new dependecy to the Python psutil package.
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
7222235579 introduce custom object for cmdline arguments
Allows to define special functions over the arguments.

Also splits CLI tests in two files as they have become too many.
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
f6e894a53a port database setup function to python
Hide the former PHP functions in a transition command until
they are removed.
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
b93ec2522e use psql for executing sql files
This allows to run larger files without needing to keep
them in memory.
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
af7226393a add function to set up libpq environment
Instead of parsing the DSN for each external libpq program we
are going to execute, provide a function that feeds them all
necessary parameters through the environment.

osm2pgsql is the first user.
2021-02-25 18:42:54 +01:00
Sarah Hoffmann
e520613362 convert connect() into a context manager 2021-02-25 18:42:54 +01:00
Sarah Hoffmann
204fe20b4b Merge pull request #2185 from lonvia/fix-deadlock-handling-for-psycopg27
Improve deadlock detection for various versions of psycopg2
2021-02-25 18:39:40 +01:00
Sarah Hoffmann
a1f0fc1a10 improve deadlock detection for various versions of psycopg2
Psycopg2 has changed the kind of exception that is emitted on
deadlocks between versions 2.7 and 2.8. The code was already
trying to catch both kind of errors but because the
psycopg2.errors package is unknown in 2.7 and below, the
code would throw an exception on anything but a deadlock error.

This commit wraps the deadlock handling into a context manager
to avoid code duplication and uses module imports to detect if
the new error codes are available.

Also sets the required psycopg2 version to 2.7 or bigger as
versions below are difficult to test.
2021-02-25 18:11:16 +01:00
Sarah Hoffmann
68c3862270 Merge pull request #2182 from lonvia/change-error-for-details
Return 404 for details when no object is found in database
2021-02-23 09:09:35 +01:00
Sarah Hoffmann
5b7483ada5 return 404 for details when no bject is found in database
Fixes #2157.
2021-02-22 16:28:29 +01:00
Sarah Hoffmann
72b01148d2 Merge pull request #2181 from lonvia/port-more-tool-functions-to-python
Port more tool functions to python
2021-02-22 16:11:21 +01:00
Sarah Hoffmann
971df231b0 avoid os.environ as default valie 2021-02-19 19:29:57 +01:00
Sarah Hoffmann
4b32cbe518 fix return code for check database run with 'not applicable' 2021-02-19 18:32:00 +01:00
Sarah Hoffmann
f08078ccca bdd tests: directly call python code for setup-website 2021-02-19 18:20:55 +01:00
Sarah Hoffmann
389138abfe port setup-website to python 2021-02-19 17:51:06 +01:00
Sarah Hoffmann
a0ae4945cd add unit tests for new check_database code 2021-02-18 20:36:11 +01:00
Sarah Hoffmann
b169e4c88c port check-database function to python
This change also adapts the hints to use the nominatim tool.
Slightly changed checks, so that they are just as effective on
a frozen database.
2021-02-18 17:32:30 +01:00
Sarah Hoffmann
a60c34bded use a frozen DB for API tests
This way we also test that dropping does the right thing.
2021-02-17 22:35:27 +01:00
Sarah Hoffmann
153dbb71b8 remove unused code 2021-02-17 22:25:23 +01:00
Sarah Hoffmann
101a1f895d port freeze function to python 2021-02-17 21:43:15 +01:00
Sarah Hoffmann
bd27310c68 Merge pull request #2173 from lonvia/penality-for-housenumberless-places
Increase penalty for places without housenumber
2021-02-17 17:52:59 +01:00
Sarah Hoffmann
42ecd535b3 Merge pull request #2174 from lonvia/disable-jit-for-osm2pgsql-again
Disable JIT and parallel execution for osm2pgsql updates again
2021-02-16 21:32:57 +01:00
Sarah Hoffmann
c9838a02ce disable JIT and parallel execution for osm2pgsql updates again
The gazetteer output doesn't disable these functions when
writing to the place table but the triggers may contain
operations that cause misplanning for the query planner.
2021-02-16 18:23:47 +01:00
Sarah Hoffmann
7ebcf602ac add simple test for result splitting with multiple ranks 2021-02-16 17:59:12 +01:00
Sarah Hoffmann
8eb85f1340 increase penalty for places without housenumber
Results where the housenumber was dropped are an unlikely result
when they refer to something other than a street. Therefore
increase their result rank so that other matches are tried first
before choosing them as a result.

Improves #2167.
2021-02-16 17:47:06 +01:00
Sarah Hoffmann
2a8e3741fa Merge pull request #2166 from mtmail/tiger-2020
documentation: 2020 TIGER data got released
2021-02-16 14:45:14 +01:00
Sarah Hoffmann
684378722c Merge pull request #2171 from lonvia/update-vagrant-scripts-for-make-install
Update vagrant scripts for make install
2021-02-16 14:42:38 +01:00
Sarah Hoffmann
286a686f88 switch vagrant scripts to make install 2021-02-16 12:04:34 +01:00
Sarah Hoffmann
7360e6c5df use file copy on older cmake to install osm2pgsql
Fixes #2170.
2021-02-16 11:06:14 +01:00
Sarah Hoffmann
fbe7be760b ignore failure to get replication date 2021-02-14 12:17:30 +01:00
marc tobias
a3ce89aeff documentation: 2020 TIGER data got released 2021-02-12 23:57:12 +01:00
Sarah Hoffmann
6a7e0d652b Merge pull request #2164 from lonvia/add-make-install
Make Nominatim installable
2021-02-12 10:32:26 +01:00
Sarah Hoffmann
7cc4c53adb always return 0 for updates unless there is an error
This is more in line with previous behavioru than returning
a status code when no updates are available.
2021-02-11 10:33:49 +01:00
Sarah Hoffmann
24b13a7a87 docs: adapt check-database command 2021-02-10 21:55:04 +01:00
Sarah Hoffmann
b6c2dbf69c actions: remove install directories before import
This ensures that any dangling references to the build
or source directory are caught by the CI.
2021-02-10 17:59:52 +01:00
Sarah Hoffmann
0e0e9a6809 need test database for analysing cli test 2021-02-10 16:19:51 +01:00
Sarah Hoffmann
ed60154552 actions: test import with installed version of Nominatim 2021-02-10 16:17:52 +01:00
Sarah Hoffmann
85589cf7dc add 'make install' to installation instructions 2021-02-10 11:15:21 +01:00
Sarah Hoffmann
99dcd10d3f test for existance of country grid in cmake already
Given that the file potentially gets installed, it needs to be
present during build time already. Checking during the import
is too late.
2021-02-10 10:40:36 +01:00
Sarah Hoffmann
745ae02f47 make installation targets conditional to what is built 2021-02-10 10:04:07 +01:00
Sarah Hoffmann
b6bd11f292 add make install target
Installation includes PHP andPython libraries, settings, the basic
country data, the postgresql module and our custom version of
osm2pgsql. The latter is installed in our private library directory
so that it does not get in the way of a potentially installed
osm2pgsql from the distribution.
2021-02-09 21:04:42 +01:00
Sarah Hoffmann
c60a0784ea adapt unit tests to new directory structure 2021-02-09 20:13:00 +01:00
Sarah Hoffmann
3cb6f3e460 use DataDir constant for data only
So far the data directory constant has pointed to the source
directory to be usable with different subdirectories. Now only
the data subdirectory itself is being used with the constant,
so point to the directory directly.
2021-02-09 20:04:08 +01:00
Sarah Hoffmann
de37dc9300 forgot to replace one occurence of sql_dir 2021-02-09 19:32:05 +01:00
Sarah Hoffmann
8ffd7d9243 remove unused BINDIR constant 2021-02-09 19:30:31 +01:00
Sarah Hoffmann
298ed11261 introduce constant for configuration directory
This replaces {data_dir}/settings throughout the code, so that
the configuration may be placed somewhere else in the directory
structure (e.g. in /etc).
2021-02-09 18:45:45 +01:00
Sarah Hoffmann
b9517c99ae rename sql directory to lib-sql
Also introduces a separate constant for the sql directory, so that
it can be put separately from the rest of the data if required.
2021-02-09 15:26:56 +01:00
Sarah Hoffmann
db3ced17bb rename lib to lib-php 2021-02-09 11:52:07 +01:00
Sarah Hoffmann
248b4cddab update osm2pgsql (disable install rule) 2021-02-09 09:48:50 +01:00
Sarah Hoffmann
d81e152804 integrate analyse of indexing into nominatim tool 2021-02-08 22:22:49 +01:00
Sarah Hoffmann
0cbf98c020 consolidate warm and db-check into single admin command 2021-02-08 21:05:06 +01:00
Sarah Hoffmann
195f9f5ef3 split cli.py by subcommands
Reduces file size below 1000 lines.
2021-02-08 17:23:05 +01:00
Sarah Hoffmann
a759c5b75b move website into php library directory 2021-02-08 12:00:34 +01:00
Sarah Hoffmann
7dfe645b5f move postcode table setup to sql/
Also moves the call to the setup from the setup-db
step to the calculate-postcodes step. The tables also need
no longer be accessible by the webservice.
2021-02-08 11:53:01 +01:00
Sarah Hoffmann
ca3283cbaa remove unused SQL script 2021-02-08 11:28:24 +01:00
Sarah Hoffmann
861e67dfe8 fix off-by-one error in replication download 2021-02-04 17:04:04 +01:00
Sarah Hoffmann
82ef02cd1a Merge pull request #2161 from lonvia/timeout-for-replication
Reintroduce timeout for replication file download
2021-02-04 16:52:24 +01:00
Sarah Hoffmann
948217d5e9 reintroduce timeout for replication file download
This ports the --socket-timeout parameter from
pyosmium-get-changes which ensures that the update
process eventually times out on hanging network connections.
2021-02-04 11:47:11 +01:00
Sarah Hoffmann
6cc06828db Merge pull request #2160 from lonvia/introduce-project-dir
Officially introduce and recommend use of a project directory
2021-02-04 09:52:59 +01:00
Sarah Hoffmann
0b2abfb115 replace make serve with nominatim serve command
With the website directory now tied to the project directory instead
of the build directory, it is no longer possible to use make for
running the web server.
2021-02-03 16:34:31 +01:00
Sarah Hoffmann
b2f8fb6201 add migration info for status table 2021-02-03 14:13:09 +01:00
Sarah Hoffmann
e2329c03fe Revert "increase splitting for large geometries"
This reverts commit 559fe513fa.

Increasing the splitting results in geometries where with rounding
issues at the split points, so that contain operations do not
work as expected anymore.

Fixes #2137.
2021-02-03 10:23:38 +01:00
Sarah Hoffmann
9bca670b4e adapt quick start instructions in README to project dir 2021-02-03 10:17:22 +01:00
Sarah Hoffmann
cb06d1f4ca do not overwrite custom set module paths
Given that the module is now copied to the project directory
when no module path is set, we need the information that the
module path is empty. Therefore hand in the default module path
in a separate variable.
2021-02-02 18:31:25 +01:00
Sarah Hoffmann
36447c488a print project directory before running any command 2021-02-02 11:19:31 +01:00
Sarah Hoffmann
69092030cd make phpcs happy 2021-02-02 11:15:56 +01:00
Sarah Hoffmann
109aa9c428 actions: switch to using separate project dir
Also fixes reverse-only import which not run at all.
2021-02-02 11:03:09 +01:00
Sarah Hoffmann
1d97816c53 docs: add hint about putting the nominatim tool into the PATH 2021-02-02 10:56:40 +01:00
Sarah Hoffmann
7591c4fb42 copy database module on install
When no explicity database module is configured, then the
module is now copied into the project directory and used from
there. This means that Nominatim can be updated to a new
version of the module while existing installation keep their
version of normalisation.
2021-02-02 10:56:40 +01:00
Sarah Hoffmann
60cbeb165e hand in absolute path to nominatim tool to php scripts 2021-02-02 10:56:40 +01:00
Sarah Hoffmann
bddfc109f8 refer to new nominatim tool in configuration comments 2021-02-02 10:56:40 +01:00
Sarah Hoffmann
b05c379b39 change the default location for external data to project dir 2021-02-02 10:56:40 +01:00
Sarah Hoffmann
7ba5283fe8 actions: revert to reletive paths for caching 2021-02-02 10:37:18 +01:00
Sarah Hoffmann
98fe5af07d actions: remove setting custom .env
It only set the pyosmium-get-changes binary which is no longer
needed.
2021-02-02 10:35:30 +01:00
Sarah Hoffmann
59cb1d6c27 remove pyosmium-get-changes detection from cmake
pyosmium-get-changes is not longer used.
2021-02-02 10:33:15 +01:00
Sarah Hoffmann
0ad1b28497 Merge pull request #2155 from lonvia/port-regresh-to-python
Port replication and part of the refrsh function to native Python
2021-02-01 11:50:05 +01:00
Sarah Hoffmann
5f63d4ca1f print nice summary after updates 2021-02-01 10:34:31 +01:00
Sarah Hoffmann
90aaab77fc fix linting issues 2021-01-30 16:42:25 +01:00
Sarah Hoffmann
7158433cd3 disable warning about non-toplevel import
They are needed here so nominatim can be run when osmium
is not installed. Everything except replication will work fine.
2021-01-30 16:29:28 +01:00
Sarah Hoffmann
e629a175ed introduce custom UsageError
This is a exception to be thrown when the error occures because
of bad user data. We don't want to print a full stack trace in
these cases but just tell the user what went wrong.
2021-01-30 16:20:10 +01:00
Sarah Hoffmann
45ea73913f remove setting for PYOSMIUM_BINARY
pyosmium is now called as a library from the python code,
so that pyosmium-get-changes is no longer needed.
2021-01-30 15:55:04 +01:00
Sarah Hoffmann
01e0fd7e13 whitelist pyosmium for pylint 2021-01-30 15:52:49 +01:00
Sarah Hoffmann
4cb6dc01f3 port replication update function to python 2021-01-30 15:50:34 +01:00
Sarah Hoffmann
8f0885f6cb port check-for-update function to python 2021-01-28 14:50:14 +01:00
Sarah Hoffmann
beb0fa0727 Merge pull request #2153 from rizkyarlin/patch-1
fix indentation
2021-01-28 09:06:33 +01:00
Muh. Rizky Eka Arlin
436cb9229b fix indentation 2021-01-28 14:21:54 +08:00
Sarah Hoffmann
d78f0ba804 port replication initialisation to Python 2021-01-26 22:50:54 +01:00
Sarah Hoffmann
5b46fcad8e convert functon creation to python
The new functions always creates normal and partitioned functions.
Also adds specialised connection and cursor classes for adding
frequently used helper functions.
2021-01-26 22:50:54 +01:00
Sarah Hoffmann
94fa7162be port address level computation to Python
Also adds simple tests for correct table creation.
2021-01-26 22:50:54 +01:00
Sarah Hoffmann
e6c2842b66 move update code for postcode and word count to Python
Adds also tests for the new function to execute a SQL script.
2021-01-26 22:50:54 +01:00
Sarah Hoffmann
e6d9485c4a cli: import python modules for commands on demand
Given that only one command will be executed in the end, it is
not necessary to import what amounts to the whole library. This
becomes in particular important for update functions that have
a dependency on pyosmium. The dependency can remain optional for
people not using updates.
2021-01-26 22:50:54 +01:00
Sarah Hoffmann
30cd2f2280 remove API comparison util
This is outdated and unmaintained. There are tools out there
which can do this better. Try, for example
https://github.com/radarlabs/api-diff
2021-01-26 22:46:35 +01:00
Sarah Hoffmann
2c909c1f0c Merge pull request #2147 from lonvia/tests-for-python-code
Add basic set of tests for Python code
2021-01-21 10:07:50 +01:00
Sarah Hoffmann
063a4cb403 cli indexer tests need a fake database
The Indexer constructor opens a connection to the given database.
2021-01-20 21:30:27 +01:00
Sarah Hoffmann
42ec67f63c add more tests for CLI parameter parser 2021-01-20 21:30:27 +01:00
Sarah Hoffmann
8c02786820 add tests for indexer 2021-01-20 21:30:27 +01:00
Sarah Hoffmann
c26f323bf5 add simple tests for CLI parsing 2021-01-20 21:30:27 +01:00
Sarah Hoffmann
041ae67fd9 optionally hand in command line arguments to CLI functions
Allows easier testing.
2021-01-20 21:30:27 +01:00
Sarah Hoffmann
bfa6580ad5 use pytest mocking functions for manipulating os.environ 2021-01-20 21:30:27 +01:00
Sarah Hoffmann
52b76d1d01 add tests for Python exec_utils 2021-01-20 21:30:27 +01:00
Sarah Hoffmann
a3767f9142 Merge pull request #2146 from mtmail/two-typos
correct parameter name in query CLI
2021-01-20 21:29:49 +01:00
marc tobias
f62c784102 correct parameter name in query CLI 2021-01-20 21:09:41 +01:00
Sarah Hoffmann
ffc221a87f Merge pull request #2145 from lonvia/cli-query-functions
Add interface to search via command line tool
2021-01-20 09:00:45 +01:00
Sarah Hoffmann
8cf54a1317 add API functions to nominatim tool 2021-01-19 19:38:46 +01:00
Sarah Hoffmann
77e287f669 rename nominatim.admin to nominatim.tools 2021-01-19 19:38:46 +01:00
Sarah Hoffmann
5d95a72758 probe for php_cgi in cmake to be used for querying 2021-01-19 19:38:46 +01:00
Sarah Hoffmann
3475e1dfd6 Merge pull request #2143 from lonvia/integrate-indexer-into-nominatim-tool
Integrate indexer into nominatim tool
2021-01-19 08:42:22 +01:00
Sarah Hoffmann
504922ffbe remove old nominatim.py in favour of 'nominatim index'
The PHP scripts need to know the position of the nominatim
tool in order to call it. This is handed in as environment
variable, so it can be set by the Python script.
2021-01-18 15:43:27 +01:00
Sarah Hoffmann
c77877a934 implementaion of 'nominatim index' 2021-01-18 15:43:27 +01:00
Sarah Hoffmann
27977411e9 move indexing function into its own Python module
This makes it mow a standard function of our new Python
library instead of a stand-alone program.
2021-01-18 15:43:27 +01:00
Sarah Hoffmann
b79c79fa73 add function to get a DSN for psycopg
Converts the PHP DSN syntax into psycopg syntax when necessary.
2021-01-18 15:43:27 +01:00
Sarah Hoffmann
cd0001b55a Merge pull request #2142 from lonvia/update-bdd-api-tests
Update BDD API tests
2021-01-18 15:40:50 +01:00
Sarah Hoffmann
340e7f7210 bdd: complete coverage for API tests
Also removes some functions that are no longer used and
fixes debug output where the tests found an issue.
2021-01-17 16:12:06 +01:00
Sarah Hoffmann
f9c43137c9 remove unused output formatting functions 2021-01-16 17:39:49 +01:00
Sarah Hoffmann
171ed36e36 bdd: remove duplicated tests 2021-01-16 16:57:28 +01:00
Sarah Hoffmann
c6c907d451 bdd: clean up and extend API tests for details
- remove duplicates created by replacing HTML tests
  with JSON tests
- add tests for newer functions for returning geometries
  and hierarchies
2021-01-16 12:04:13 +01:00
Sarah Hoffmann
19ab038724 collect coverage for /website directory as well 2021-01-15 20:27:14 +01:00
Sarah Hoffmann
1c26fd489d Merge pull request #2139 from lonvia/add-pytest
Introduce unit testing for Python code
2021-01-15 17:37:36 +01:00
Sarah Hoffmann
e8cfba1b10 pytest may also be installed as py-test[-3] 2021-01-15 17:22:31 +01:00
Sarah Hoffmann
496a3d29db enable pytest testing in CI 2021-01-15 15:33:53 +01:00
Sarah Hoffmann
438ed431dd add documentation for new pytest tests 2021-01-15 15:18:45 +01:00
Sarah Hoffmann
f1f0032758 add pytest as a test goal in cmake 2021-01-15 15:09:36 +01:00
Sarah Hoffmann
eb3b789855 add initial pytest test for Configuration 2021-01-15 14:42:03 +01:00
Sarah Hoffmann
c077050855 Merge pull request #2136 from lonvia/introduce-pylint
Introduce pylint for code style checking for Python.
2021-01-15 14:39:26 +01:00
Sarah Hoffmann
d9998bfab3 pylint may be available as pylint3 or pylint 2021-01-15 10:52:25 +01:00
Sarah Hoffmann
7cf9d459d6 use check parameter of subprocess.run
...instead of checking on our own.

Also increase required version of Python to 3.5 because of
subprocess.run().
2021-01-15 10:43:04 +01:00
Sarah Hoffmann
de724aa576 add pylint to list of required linting tools
With pylint being run in the CI, passing it is required now.
2021-01-15 10:43:04 +01:00
Sarah Hoffmann
8e53f63036 fix errors reported by pylint 2021-01-15 08:57:00 +01:00
Sarah Hoffmann
565356613a Merge pull request #2135 from lonvia/python-frontend
Introduce new 'nominatim' all-in-one command-line tool
2021-01-15 08:56:07 +01:00
Sarah Hoffmann
eda0900c8e fix typo 2021-01-14 20:30:27 +01:00
Sarah Hoffmann
3dd67083b2 replace Symfony dotenv dependency with Python dotenv 2021-01-14 18:31:18 +01:00
Sarah Hoffmann
2f73bb3643 bdd: directly call utility scripts in lib
This removes the dependency on php-symfony-dotenv for the tests.
2021-01-14 18:19:22 +01:00
Sarah Hoffmann
9348fc5e15 move dotenv parsing to installed PHP scripts
This means that the php-symfony-dotenv library is now only needed
when using the legacy scripts. This includes the BDD tests which
currently still rely on the PHP utils.
2021-01-14 18:06:22 +01:00
Sarah Hoffmann
97710ee9d1 use cli tool for github CI 2021-01-14 16:35:01 +01:00
Sarah Hoffmann
9619cb3fe5 forward cli tool return value as exit code 2021-01-14 14:36:41 +01:00
Sarah Hoffmann
1c1e951826 adapt documentation to new nominatim cli tool 2021-01-14 12:12:38 +01:00
Sarah Hoffmann
88c57b4dc8 maller command execution fixes 2021-01-14 12:03:49 +01:00
Sarah Hoffmann
ba13cfd9ff make sure that environment variables have highest prio 2021-01-14 11:12:45 +01:00
Sarah Hoffmann
1ff8751caa liniting of new python code 2021-01-14 10:19:21 +01:00
Sarah Hoffmann
98dbc84836 add wrapper calls for all nominatim tool functions 2021-01-14 09:37:47 +01:00
Sarah Hoffmann
0847964a27 avoid accessing constants when getting data from env
When a setting can be read from the environment variable, avoid
accessing the internal defaults. This way the scripts can be
accessed directly in \lib as long as the environment is set up
correctly with full defaults.
2021-01-14 09:37:04 +01:00
Sarah Hoffmann
bc09d7aedb fix access to environment variable 2021-01-14 09:29:43 +01:00
Sarah Hoffmann
04690ad8c4 implement warming in new cli tool
Adds infrastructure for calling the legacy PHP scripts. As the
CONST_* values cannot be set from the python script, hand the values
in via secret environment variables instead. These are all
temporary hacks for the transition phase to python code.
2021-01-13 18:25:15 +01:00
Sarah Hoffmann
ec636111ba warm.php needs constant setup for queries
Warming is done using the query classes and therefore the same
copy-over from dotenv settings to CONST_* parameters is needed
as for query.php.
2021-01-13 18:12:53 +01:00
Sarah Hoffmann
e467b956ff set CONST_LibDir directly from the source scripts
Now that the source scripts have been moved to \lib, they
can determine the position of the PHP library relative to
themselves.
2021-01-13 17:00:38 +01:00
Sarah Hoffmann
ff5a237200 move PHP utilities into the lib directory
These are not called directly as programs but used in a library
fashion by the installed utilities. So the library directory
is a better place.
2021-01-13 14:44:45 +01:00
Sarah Hoffmann
d6bcb7c8b7 consolidate cli interface to single tool 2021-01-13 10:11:58 +01:00
Sarah Hoffmann
57f5e6d898 create skeleton for new CLI tools 2021-01-12 22:21:20 +01:00
Sarah Hoffmann
612fd50612 add skeleton for new Nominatim executables 2021-01-12 10:17:28 +01:00
Sarah Hoffmann
a74e736283 Merge pull request #2132 from lonvia/reduce-api-testdb
Reduce BDD API test database to Liechtenstein
2021-01-11 10:42:22 +01:00
Sarah Hoffmann
86cd5ddd65 also run BDD API tests in CI 2021-01-09 17:58:06 +01:00
Sarah Hoffmann
812de0545d test can be run all in one go with make 2021-01-09 17:57:30 +01:00
Sarah Hoffmann
3bed5516da update documentation for new BDD API tests 2021-01-09 17:54:45 +01:00
Sarah Hoffmann
0495dbe756 bdd: add new API test data
Make all data necessary for API tests directly available in the
repository.
2021-01-09 17:01:33 +01:00
Sarah Hoffmann
5d656891ba bdd: convert API tests to smaller test db
Changes BDD API tests to restrict themselves to
Liechtenstein. One test moved to DB as no appropriate
data is available.
2021-01-09 16:59:46 +01:00
Sarah Hoffmann
74122dc965 bdd: improve assert output for API query checks
Adds wrapper function for checking address parts and
more explanation strings to asserts.
2021-01-09 16:58:37 +01:00
Sarah Hoffmann
ee18a511c6 bdd: import API test DB as part of step setup
In the future, the BDD tests will simply set up the required
test database themselves. Like with the template database, it
is not reimported when it already exists unless that is explicitly
forced.

Makes most of the API tests currently fail because they still
point to old test data.
2021-01-07 11:51:38 +01:00
Sarah Hoffmann
da20881096 Merge pull request #2129 from lonvia/cleanup-bdd-tests
Clean up Python support code for BDD tests
2021-01-07 09:10:40 +01:00
Sarah Hoffmann
aaabb46f20 add symphony dotenv to prerequisites list 2021-01-07 08:56:52 +01:00
Sarah Hoffmann
49142eb6e5 use relative dir for sources for phpunit 2021-01-07 08:55:15 +01:00
Sarah Hoffmann
73cbb6eb9a bdd: clean up DB ops steps
Adds comments and modernizes code.
2021-01-06 16:37:32 +01:00
Sarah Hoffmann
1f29475fa5 bdd: move column comparison in separate file
Introduces a new class DBRow that encapsulates the comparison
functions. This also is responsible for formatting more informative
assert messages. place and placex steps are unified.
2021-01-06 12:28:09 +01:00
Sarah Hoffmann
d586b95ff1 bdd: move nominitim id reader to separate file 2021-01-05 16:00:48 +01:00
Sarah Hoffmann
25557e5f14 bdd: factor out reindexing on updates 2021-01-05 15:17:46 +01:00
Sarah Hoffmann
197870e67a bdd: move place table inserter into separate file
Also simplifies usage by implementing a function that inserts
a complete table row.
2021-01-05 12:12:59 +01:00
Sarah Hoffmann
b8e39d2dde bdd: move scene setup to OSM data steps
The step has nothing to do with the database.
2021-01-05 11:42:28 +01:00
Sarah Hoffmann
5dfa76a610 bdd: switch to auto commit mode
Put the connection to the test database into auto-commit mode
and get rid of the explicit commits. Also use cursors always in
context managers and unify the two implementations that copy
data from the place table.
2021-01-05 11:42:28 +01:00
Sarah Hoffmann
58c471c627 bdd: remove class for lazy formatting
assert in combination with format() does the right thing and calls
the __str__() method only when an assertion hits.
2021-01-05 10:39:44 +01:00
Sarah Hoffmann
213bf7d19d bdd: rename db_ops steps
Now all files implementing steps are called steps_*.py.
2021-01-05 10:20:00 +01:00
Sarah Hoffmann
12ae8a4ed3 bdd: move output format computation into response 2021-01-05 10:17:59 +01:00
Sarah Hoffmann
8a93f8ed94 bdd: move Response classes in own file and simplify
Removes most of the duplicated parse functions, introduces
a common assert_field function with a more expressive error
message.
2021-01-05 10:03:47 +01:00
Sarah Hoffmann
2712c5f90e bdd: rename and clean up osm_data steps
Move common OPL creation code into a function and remove
unused imports.
2021-01-04 20:17:17 +01:00
Sarah Hoffmann
72587b08fa bdd: move external process execution in separate func 2021-01-04 19:58:59 +01:00
Sarah Hoffmann
faa85ded50 bdd: move NominatimEnvironment into separate file
Also cleans up and modernizes the code and adds documentation.
2021-01-04 17:54:51 +01:00
Sarah Hoffmann
14e5bc7a17 bdd: move grid generation code into geometry factory 2021-01-04 17:04:47 +01:00
Sarah Hoffmann
f727620859 bdd: move geoemtry creation into separate file
Also renames the OsmDataFactory in the more appropriate
GeometryFactory and modernizes code for python3.
2021-01-04 16:34:40 +01:00
Sarah Hoffmann
843d3a137c remove stale code for python2 2021-01-04 14:14:34 +01:00
Sarah Hoffmann
e4691005e2 Merge pull request #2125 from lonvia/independent-project-directory
Allow for truely independent project directory
2021-01-04 14:10:24 +01:00
Sarah Hoffmann
4aba70caee create a temporary project dir for tests
The project directory contains the website script as
configured through the test configuration. This means
that tests are now completely independet of any
configuration that may be contained in the build
directory.

Also removes the hack to inject additional settings via
a environment variable.
2021-01-04 11:39:45 +01:00
Sarah Hoffmann
5e989b9296 configure osm2pgsql and module location via cmake
The default location of osm2pgsql and the postgresql module
is decided at compile/installation time and is not necessarily
in the project directory.

With this change it is now possible to have a project directory
that is completely separate from the build directory.
2021-01-04 11:37:56 +01:00
Sarah Hoffmann
cba2d252c8 Merge pull request #2124 from lonvia/remove-nose
Remove nose dependency for tests
2021-01-03 21:04:59 +01:00
Sarah Hoffmann
2ecec19df0 remove nose requirement from documentation 2021-01-03 17:23:44 +01:00
Sarah Hoffmann
4ca7197826 replace nose assertions with simple asserts 2021-01-03 17:21:24 +01:00
Sarah Hoffmann
a8ec250993 Merge pull request #2119 from mtmail/check-import-finished-when-tables-droped
utils/check_import_finished: skip some checks when setup ran with --drop
2020-12-22 15:57:48 +01:00
Sarah Hoffmann
f3e0e401fd Merge pull request #2118 from mtmail/vagrant-ubuntu-dotenv
Vagrant ubuntu: install dotenv package
2020-12-22 15:54:48 +01:00
marc tobias
d60f89867b utils/check_import_finished: skip some checks when setup ran with --drop 2020-12-21 20:12:31 +01:00
marc tobias
b133f2bc4c Vagrant ubuntu: install dotenv package 2020-12-21 20:10:13 +01:00
Sarah Hoffmann
301fd7f7e8 Merge pull request #2115 from lonvia/use-dotenv
Switch configuration to dotenv
2020-12-21 11:33:38 +01:00
Sarah Hoffmann
45148c7078 switch documentation to describing dotenv 2020-12-20 12:09:27 +01:00
Sarah Hoffmann
3c75194448 adapt instructions for creating the test db to dotenv 2020-12-20 11:53:19 +01:00
Sarah Hoffmann
f218e20522 mark CentOS installation instructions as broken
Getting symfony-dotenv installed on CentOS is a major pain,
so just mark it broken instead.

Still sSwitch the config format to dotenv already.
2020-12-20 11:35:29 +01:00
Sarah Hoffmann
33b038ce6f tests: always create the config file
There is also one database test that uses the API functions.
2020-12-19 17:55:46 +01:00
Sarah Hoffmann
f62c65e9d9 adapt php tests to new directory constants 2020-12-19 14:33:04 +01:00
Sarah Hoffmann
867baab3d1 make phpcs happy 2020-12-19 14:33:04 +01:00
Sarah Hoffmann
63ad0cb498 github actions: need dotenv 2020-12-19 14:33:04 +01:00
Sarah Hoffmann
433017b990 move creation of website scripts to setup script
Instead of creating the website wrapper scripts with cmake,
they are now created when --setup-website is called. The
setup of the configuration constants is directly embedded
into the scripts. This means we can get rid of the separate
settings-frontend.php. More importantly however, it means
that it is now possible to set up multiple website directories
from the same build directory.
2020-12-19 14:33:04 +01:00
Sarah Hoffmann
d97aed8741 adapt tests to new dotenv environment
DB tests now can simply set the environment to change configuration
variables. API tests still rely on a configuration file.

Also, query.php needs to set up the CONST_* variables to work with
the query scripts. That is a tiny bit messy and duplicates code
but this part will need to be reworked later.
2020-12-19 14:33:04 +01:00
Sarah Hoffmann
06d89e1d47 fix various typos 2020-12-19 14:33:04 +01:00
Sarah Hoffmann
8676e45d88 remove old default settings 2020-12-19 14:33:04 +01:00
Sarah Hoffmann
992d3faac8 switch all utils to initialising dotenv 2020-12-19 14:33:04 +01:00
Sarah Hoffmann
0947b61808 switch remaining settings to dotenv format
CONST_Search_AreaPolygons and CONST_Search_ReversePlanForAll have
been removed completely.
2020-12-19 14:33:04 +01:00
Sarah Hoffmann
d43f30903c use explicit DSN for website scripts
Website scripts have no access to the dotenv variables, so use
the DSN constant instead when connecting to the database.
2020-12-19 14:33:04 +01:00
Sarah Hoffmann
15a1666f8a replace database settings with dotenv variant
As we can't refer to the project root dir in the module path, the
module path may now also be a relative directory which is then
taken as being relative to the project root path.

Moves the checkModulePresence() function into the Setup class, so
that it can work on the computed absolute module path.
2020-12-19 14:33:04 +01:00
Sarah Hoffmann
25bdd7c6d9 introduce dotenv parsing for setup.php
This adds the notion of a project directory. This is the directory
that holds all necessary files for one specific installation of
Nominatim. Dotenv looks for an .env file in this directory and
adds it to the global environment together with the defaults from
Nominatim's data directory.

Add's symfony's dotenv library as a new dependency.
2020-12-19 14:33:04 +01:00
Sarah Hoffmann
ac116980ac make HTTP proxy setup explicit
The setup relies on the project configuration which we want to
explicitly set up in later steps. Therefore proxy setup needs to
be done explicitly as well. There is the added bonus that the
setup is done only for the utils which try to call outside.
2020-12-19 14:33:04 +01:00
Sarah Hoffmann
b5480f6e36 reorganise path settings in config
CONST_BasePath is split into separate configuration variables
for binaries, libraries and data. These variables as well as
the installation path are now set in the executable directly and
no longer configurable via project settings.

This is the first step towards an installable software. The
executables should know per installation where to find their
necessary data to execute. Project configuration needs to be
restricted to settings that really concern the specific Nominatim
installation.
2020-12-19 14:33:04 +01:00
Sarah Hoffmann
17a8cc5e29 use /usr/bin/env for python script
Makes it easier to use the script with a virtualenv setup.
2020-12-19 14:33:04 +01:00
Sarah Hoffmann
aeeee0d5da Merge pull request #2112 from lonvia/fix-tests-for-php-8
work around failing CI tests
2020-12-18 14:25:50 +01:00
Sarah Hoffmann
de03a0f924 work around failing CI tests
Force use of phpunit7 to avoid an issue with different sort order.
2020-12-18 10:58:09 +01:00
Sarah Hoffmann
5528918d5d Issue templates: require postgres config modifications 2020-12-18 10:37:01 +01:00
Sarah Hoffmann
9e0d5cb669 Issue templates: more commenting of instructions 2020-12-16 08:41:20 +01:00
Sarah Hoffmann
11622b2863 Issue templates: put instructions into comments 2020-12-16 08:38:04 +01:00
371 changed files with 37346 additions and 9060 deletions

View File

@@ -7,16 +7,16 @@ assignees: ''
---
Before opening a new feature request, please search through the open issue to check that your request hasn't been reported already.
<!-- Before opening a new feature request, please search through the open issue to check that your request hasn't been reported already. -->
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
<!-- A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] -->
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
<!-- A clear and concise description of what you want to happen. -->
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
<!-- A clear and concise description of any alternative solutions or features you've considered. -->
**Additional context**
Add any other context or screenshots about the feature request here.
<!-- Add any other context or screenshots about the feature request here. -->

View File

@@ -9,7 +9,7 @@ assignees: ''
## What did you search for?
Please try to provide a link to your search. You can go to https://nominatim.openstreetmap.org and repeat your search there. If you originally found the issue somewhere else, please tell us what software/website you were using.
<!-- Please try to provide a link to your search. You can go to https://nominatim.openstreetmap.org and repeat your search there. If you originally found the issue somewhere else, please tell us what software/website you were using. -->
## What result did you get?
@@ -17,11 +17,11 @@ Please try to provide a link to your search. You can go to https://nominatim.op
**Is the result in the right place and just named wrongly?**
Please tell us the display name you expected.
<!-- Please tell us the display name you expected. -->
**Is the result missing completely?**
Make sure that the data you are looking for is in OpenStreetMap. Provide a link to the OpenStreetMap object or if you cannot get it, a link to the map on https://openstreetmap.org where you expect the result to be.
<!-- Make sure that the data you are looking for is in OpenStreetMap. Provide a link to the OpenStreetMap object or if you cannot get it, a link to the map on https://openstreetmap.org where you expect the result to be.
To get the link to the OSM object, you can try the following:
@@ -30,7 +30,8 @@ To get the link to the OSM object, you can try the following:
* Click on the question mark on the right side of the map. You get a question cursor. Use it to click on the map where your object is located.
* Find the object of interest in the list that appears on the left side.
* Click on the object and report back the URL that the browser shows.
-->
## Further details
Anything else we should know about the search. Particularities with addresses in the area etc.
<!-- Anything else we should know about the search. Particularities with addresses in the area etc. -->

View File

@@ -7,13 +7,13 @@ assignees: ''
---
___Note: if you are installing Nominatim through a docker image, you should report issues with the installation process with the docker repository first.___
<!-- Note: if you are installing Nominatim through a docker image, you should report issues with the installation process with the docker repository first. -->
**Describe the bug**
A clear and concise description of what the bug is.
<!-- A clear and concise description of what the bug is. -->
**To Reproduce**
Please describe what you did to get to the issue.
<!-- Please describe what you did to get to the issue. -->
**Software Environment (please complete the following information):**
- Nominatim version:
@@ -27,5 +27,10 @@ Please describe what you did to get to the issue.
- type and size of disks:
- bare metal/AWS/other cloud service:
**Postgresql Configuration:**
<!-- List any configuration items you changed in your postgresql configuration. -->
**Additional context**
Add any other context about the problem here.
<!-- Add any other context about the problem here. -->

View File

@@ -1,28 +1,42 @@
name: 'Build Nominatim'
inputs:
ubuntu:
description: 'Version of Ubuntu to install on'
required: false
default: '20'
runs:
using: "composite"
steps:
- name: Install prerequisits
run: sudo apt-get install -y -qq libboost-system-dev libboost-filesystem-dev libexpat1-dev zlib1g-dev libbz2-dev libpq-dev libproj-dev python3-psycopg2 python3-pyosmium
- name: Install prerequisites
run: |
sudo apt-get install -y -qq libboost-system-dev libboost-filesystem-dev libexpat1-dev zlib1g-dev libbz2-dev libpq-dev libproj-dev libicu-dev
if [ "x$UBUNTUVER" == "x18" ]; then
pip3 install python-dotenv psycopg2==2.7.7 jinja2==2.8 psutil==5.4.2 pyicu osmium PyYAML==5.1 datrie
else
sudo apt-get install -y -qq python3-icu python3-datrie python3-pyosmium python3-jinja2 python3-psutil python3-psycopg2 python3-dotenv python3-yaml
fi
shell: bash
env:
UBUNTUVER: ${{ inputs.ubuntu }}
- name: Download dependencies
run: |
if [ ! -f country_grid.sql.gz ]; then
wget --no-verbose https://www.nominatim.org/data/country_grid.sql.gz
fi
cp country_grid.sql.gz Nominatim/data/country_osm_grid.sql.gz
shell: bash
- name: Configure
run: mkdir build && cd build && cmake ..
run: mkdir build && cd build && cmake ../Nominatim
shell: bash
- name: Build
run: |
make -j2 all
./utils/setup.php --setup-website
sudo make install
shell: bash
working-directory: build
- name: Download dependencies
run: |
if [ ! -f data/country_osm_grid.sql.gz ]; then
wget --no-verbose -O data/country_osm_grid.sql.gz https://www.nominatim.org/data/country_grid.sql.gz
fi
shell: bash

View File

@@ -14,8 +14,10 @@ runs:
steps:
- name: Remove existing PostgreSQL
run: |
sudo apt-get update -qq
sudo apt-get purge -yq postgresql*
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt-get update -qq
shell: bash
- name: Install PostgreSQL

View File

@@ -4,21 +4,40 @@ on: [ push, pull_request ]
jobs:
tests:
runs-on: ubuntu-20.04
strategy:
matrix:
postgresql: [9.5, 13]
ubuntu: [18, 20]
include:
- postgresql: 9.5
- ubuntu: 18
postgresql: 9.5
postgis: 2.5
- postgresql: 13
pytest: pytest
php: 7.2
- ubuntu: 20
postgresql: 13
postgis: 3
pytest: py.test-3
php: 7.4
runs-on: ubuntu-${{ matrix.ubuntu }}.04
steps:
- uses: actions/checkout@v2
with:
submodules: true
path: Nominatim
- name: Setup PHP
uses: shivammathur/setup-php@v2
with:
php-version: ${{ matrix.php }}
coverage: xdebug
tools: phpunit, phpcs, composer
- uses: actions/setup-python@v2
with:
python-version: 3.6
if: matrix.ubuntu == 18
- name: Get Date
id: get-date
@@ -29,39 +48,96 @@ jobs:
- uses: actions/cache@v2
with:
path: |
data/country_osm_grid.sql.gz
monaco-latest.osm.pbf
key: nominatim-data-${{ steps.get-date.outputs.date }}
country_grid.sql.gz
key: nominatim-country-data-${{ steps.get-date.outputs.date }}
- uses: ./.github/actions/setup-postgresql
- uses: ./Nominatim/.github/actions/setup-postgresql
with:
postgresql-version: ${{ matrix.postgresql }}
postgis-version: ${{ matrix.postgis }}
- uses: ./.github/actions/build-nominatim
- uses: ./Nominatim/.github/actions/build-nominatim
with:
ubuntu: ${{ matrix.ubuntu }}
- name: Install test prerequsites
run: sudo apt-get install -y -qq pylint python3-pytest python3-behave python3-pytest-cov php-codecoverage
if: matrix.ubuntu == 20
- name: Install test prerequsites
run: |
sudo apt-get install -y -qq php-codesniffer python3-tidylib
sudo pip3 install behave nose
pip3 install pylint==2.6.0 pytest pytest-cov behave==1.2.6
if: matrix.ubuntu == 18
- name: PHP linting
run: phpcs --report-width=120 .
working-directory: Nominatim
- name: Python linting
run: pylint nominatim
working-directory: Nominatim
- name: PHP unit tests
run: phpunit ./
working-directory: test/php
run: phpunit --coverage-clover ../../coverage-php.xml ./
working-directory: Nominatim/test/php
if: matrix.ubuntu == 20
- name: Python unit tests
run: $PYTEST --cov=nominatim --cov-report=xml test/python
working-directory: Nominatim
env:
PYTEST: ${{ matrix.pytest }}
- name: BDD tests
run: behave -DREMOVE_TEMPLATE=1 --format=progress3 db osm2pgsql
working-directory: test/bdd
run: |
mkdir cov
behave -DREMOVE_TEMPLATE=1 -DBUILDDIR=$GITHUB_WORKSPACE/build --format=progress3 -DPHPCOV=./cov
composer require phpunit/phpcov:7.0.2
vendor/bin/phpcov merge --clover ../../coverage-bdd.xml ./cov
working-directory: Nominatim/test/bdd
if: matrix.ubuntu == 20
- name: BDD tests
run: |
behave -DREMOVE_TEMPLATE=1 -DBUILDDIR=$GITHUB_WORKSPACE/build --format=progress3
working-directory: Nominatim/test/bdd
if: matrix.ubuntu == 18
- name: BDD tests (legacy_icu tokenizer)
run: |
behave -DREMOVE_TEMPLATE=1 -DBUILDDIR=$GITHUB_WORKSPACE/build -DTOKENIZER=legacy_icu --format=progress3
working-directory: Nominatim/test/bdd
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v1
with:
files: ./Nominatim/coverage*.xml
directory: ./
name: codecov-umbrella
fail_ci_if_error: false
path_to_write_report: ./coverage/codecov_report.txt
verbose: true
if: matrix.ubuntu == 20
import:
runs-on: ubuntu-20.04
strategy:
matrix:
ubuntu: [18, 20]
include:
- ubuntu: 18
postgresql: 9.5
postgis: 2.5
- ubuntu: 20
postgresql: 13
postgis: 3
runs-on: ubuntu-${{ matrix.ubuntu }}.04
steps:
- uses: actions/checkout@v2
with:
submodules: true
path: Nominatim
- name: Get Date
id: get-date
@@ -72,49 +148,70 @@ jobs:
- uses: actions/cache@v2
with:
path: |
data/country_osm_grid.sql.gz
monaco-latest.osm.pbf
key: nominatim-data-${{ steps.get-date.outputs.date }}
country_grid.sql.gz
key: nominatim-country-data-${{ steps.get-date.outputs.date }}
- uses: ./.github/actions/setup-postgresql
- uses: actions/cache@v2
with:
postgresql-version: 13
postgis-version: 3
- uses: ./.github/actions/build-nominatim
path: |
monaco-latest.osm.pbf
key: nominatim-test-data-${{ steps.get-date.outputs.date }}
- name: Create configuration
run: |
echo '<?php' > settings/local.php
echo " @define('CONST_Pyosmium_Binary', '/usr/lib/python3-pyosmium/pyosmium-get-changes');" >> settings/local.php
working-directory: build
- uses: actions/setup-python@v2
with:
python-version: 3.6
if: matrix.ubuntu == 18
- name: Download import data
- uses: ./Nominatim/.github/actions/setup-postgresql
with:
postgresql-version: ${{ matrix.postgresql }}
postgis-version: ${{ matrix.postgis }}
- uses: ./Nominatim/.github/actions/build-nominatim
with:
ubuntu: ${{ matrix.ubuntu }}
- name: Clean installation
run: rm -rf Nominatim build
shell: bash
- name: Prepare import environment
run: |
if [ ! -f monaco-latest.osm.pbf ]; then
wget --no-verbose https://download.geofabrik.de/europe/monaco-latest.osm.pbf
fi
mkdir data-env
cd data-env
shell: bash
- name: Import
run: php ./utils/setup.php --osm-file ../monaco-latest.osm.pbf --osm2pgsql-cache 500 --all
working-directory: build
run: nominatim import --osm-file ../monaco-latest.osm.pbf
shell: bash
working-directory: data-env
- name: Import special phrases
run: php ./utils/specialphrases.php --wiki-import | psql -d nominatim
working-directory: build
run: nominatim special-phrases --import-from-wiki
working-directory: data-env
- name: Check import
run: php ./utils/check_import_finished.php
working-directory: build
- name: Check full import
run: nominatim admin --check-database
working-directory: data-env
- name: Warm up database
run: nominatim admin --warm
working-directory: data-env
- name: Run update
run: |
php ./utils/update.php --init-updates
php ./utils/update.php --import-osmosis
working-directory: build
nominatim replication --init
nominatim replication --once
working-directory: data-env
- name: Run reverse-only import
run : |
dropdb nominatim
php ./utils/setup.php --osm-file ../monaco-latest.osm.pbf --reverse-only --all
working-directory: build
run : nominatim import --osm-file ../monaco-latest.osm.pbf --reverse-only --no-updates
working-directory: data-env
env:
NOMINATIM_DATABASE_DSN: pgsql:dbname=reverse
- name: Check reverse import
run: nominatim admin --check-database
working-directory: data-env

1
.gitignore vendored
View File

@@ -9,3 +9,4 @@ data/wiki_specialphrases.sql
data/osmosischange.osc
.vagrant
data/country_osm_grid.sql.gz

15
.pylintrc Normal file
View File

@@ -0,0 +1,15 @@
[MASTER]
extension-pkg-whitelist=osmium
ignored-modules=icu,datrie
[MESSAGES CONTROL]
[TYPECHECK]
# closing added here because it sometimes triggers a false positive with
# 'with' statements.
ignored-classes=NominatimArgs,closing
disable=too-few-public-methods,duplicate-code
good-names=i,x,y,fd,db

View File

@@ -6,7 +6,7 @@
#
#-----------------------------------------------------------------------------
cmake_minimum_required(VERSION 2.8 FATAL_ERROR)
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake")
@@ -19,7 +19,7 @@ list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake")
project(nominatim)
set(NOMINATIM_VERSION_MAJOR 3)
set(NOMINATIM_VERSION_MINOR 6)
set(NOMINATIM_VERSION_MINOR 7)
set(NOMINATIM_VERSION_PATCH 0)
set(NOMINATIM_VERSION "${NOMINATIM_VERSION_MAJOR}.${NOMINATIM_VERSION_MINOR}.${NOMINATIM_VERSION_PATCH}")
@@ -36,6 +36,7 @@ set(BUILD_API on CACHE BOOL "Build everything for the API server")
set(BUILD_MODULE on CACHE BOOL "Build PostgreSQL module")
set(BUILD_TESTS on CACHE BOOL "Build test suite")
set(BUILD_DOCS on CACHE BOOL "Build documentation")
set(BUILD_MANPAGE on CACHE BOOL "Build Manual Page")
set(BUILD_OSM2PGSQL on CACHE BOOL "Build osm2pgsql (expert only)")
#-----------------------------------------------------------------------------
@@ -57,20 +58,11 @@ endif()
#-----------------------------------------------------------------------------
# python and pyosmium (imports/updates only)
# python (imports/updates only)
#-----------------------------------------------------------------------------
if (BUILD_IMPORTER)
find_package(PythonInterp 3)
find_program(PYOSMIUM pyosmium-get-changes)
if (NOT EXISTS "${PYOSMIUM}")
set(PYOSMIUM_PATH "")
message(WARNING "pyosmium-get-changes not found (required for updates)")
else()
set(PYOSMIUM_PATH "${PYOSMIUM}")
message(STATUS "Using pyosmium-get-changes at ${PYOSMIUM_PATH}")
endif()
find_package(PythonInterp 3.6 REQUIRED)
endif()
#-----------------------------------------------------------------------------
@@ -86,8 +78,19 @@ if (BUILD_API OR BUILD_IMPORTER)
# sanity check if PHP binary exists
if (NOT EXISTS ${PHP_BIN})
message(FATAL_ERROR "PHP binary not found. Install php or provide location with -DPHP_BIN=/path/php ")
else()
message (STATUS "Using PHP binary " ${PHP_BIN})
endif()
if (NOT PHPCGI_BIN)
find_program (PHPCGI_BIN php-cgi)
endif()
# sanity check if PHP binary exists
if (NOT EXISTS ${PHPCGI_BIN})
message(WARNING "php-cgi binary not found. nominatim tool will not provide query functions.")
set (PHPCGI_BIN "")
else()
message (STATUS "Using php-cgi binary " ${PHPCGI_BIN})
endif()
message (STATUS "Using PHP binary " ${PHP_BIN})
endif()
#-----------------------------------------------------------------------------
@@ -95,70 +98,21 @@ endif()
#-----------------------------------------------------------------------------
if (BUILD_IMPORTER)
set(CUSTOMSCRIPTS
utils/check_import_finished.php
utils/country_languages.php
utils/importWikipedia.php
utils/export.php
utils/query.php
utils/setup.php
utils/specialphrases.php
utils/update.php
utils/warm.php
)
find_file(COUNTRY_GRID_FILE country_osm_grid.sql.gz
PATHS ${PROJECT_SOURCE_DIR}/data
NO_DEFAULT_PATH
DOC "Location of the country grid file."
)
foreach (script_source ${CUSTOMSCRIPTS})
configure_file(${PROJECT_SOURCE_DIR}/cmake/script.tmpl
${PROJECT_BINARY_DIR}/${script_source})
endforeach()
if (NOT COUNTRY_GRID_FILE)
message(FATAL_ERROR "\nYou need to download the country_osm_grid first:\n"
" wget -O ${PROJECT_SOURCE_DIR}/data/country_osm_grid.sql.gz https://www.nominatim.org/data/country_grid.sql.gz")
endif()
configure_file(${PROJECT_SOURCE_DIR}/cmake/tool.tmpl
${PROJECT_BINARY_DIR}/nominatim)
endif()
#-----------------------------------------------------------------------------
# webserver scripts (API only)
#-----------------------------------------------------------------------------
if (BUILD_API)
set(WEBSITESCRIPTS
website/deletable.php
website/details.php
website/lookup.php
website/polygons.php
website/reverse.php
website/search.php
website/status.php
)
foreach (script_source ${WEBSITESCRIPTS})
configure_file(${PROJECT_SOURCE_DIR}/cmake/website.tmpl
${PROJECT_BINARY_DIR}/${script_source})
endforeach()
set(WEBPATHS css images js)
foreach (wp ${WEBPATHS})
execute_process(
COMMAND ln -sf ${PROJECT_SOURCE_DIR}/website/${wp} ${PROJECT_BINARY_DIR}/website/
)
endforeach()
add_custom_target(serve
php -S 127.0.0.1:8088
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}/website
)
add_custom_target(serve-global
php -S 0.0.0.0:8088
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}/website
)
endif()
#-----------------------------------------------------------------------------
# default settings
#-----------------------------------------------------------------------------
configure_file(${PROJECT_SOURCE_DIR}/settings/defaults.php
${PROJECT_BINARY_DIR}/settings/settings.php)
#-----------------------------------------------------------------------------
# Tests
#-----------------------------------------------------------------------------
@@ -168,21 +122,60 @@ if (BUILD_TESTS)
set(TEST_BDD db osm2pgsql api)
foreach (test ${TEST_BDD})
add_test(NAME bdd_${test}
COMMAND behave ${test}
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}/test/bdd)
set_tests_properties(bdd_${test}
PROPERTIES ENVIRONMENT "NOMINATIM_DIR=${PROJECT_BINARY_DIR}")
endforeach()
find_program(PYTHON_BEHAVE behave)
find_program(PYLINT NAMES pylint3 pylint)
find_program(PYTEST NAMES pytest py.test-3 py.test)
find_program(PHPCS phpcs)
find_program(PHPUNIT phpunit)
add_test(NAME php
COMMAND phpunit ./
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}/test/php)
if (PYTHON_BEHAVE)
message(STATUS "Using Python behave binary ${PYTHON_BEHAVE}")
foreach (test ${TEST_BDD})
add_test(NAME bdd_${test}
COMMAND ${PYTHON_BEHAVE} ${test}
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}/test/bdd)
set_tests_properties(bdd_${test}
PROPERTIES ENVIRONMENT "NOMINATIM_DIR=${PROJECT_BINARY_DIR}")
endforeach()
else()
message(WARNING "behave not found. BDD tests disabled." )
endif()
add_test(NAME phpcs
COMMAND phpcs --report-width=120 --colors lib website utils
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
if (PHPUNIT)
message(STATUS "Using phpunit binary ${PHPUNIT}")
add_test(NAME php
COMMAND ${PHPUNIT} ./
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}/test/php)
else()
message(WARNING "phpunit not found. PHP unit tests disabled." )
endif()
if (PHPCS)
message(STATUS "Using phpcs binary ${PHPCS}")
add_test(NAME phpcs
COMMAND ${PHPCS} --report-width=120 --colors lib website utils
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
else()
message(WARNING "phpcs not found. PHP linting tests disabled." )
endif()
if (PYLINT)
message(STATUS "Using pylint binary ${PYLINT}")
add_test(NAME pylint
COMMAND ${PYLINT} nominatim
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
else()
message(WARNING "pylint not found. Python linting tests disabled.")
endif()
if (PYTEST)
message(STATUS "Using pytest binary ${PYTEST}")
add_test(NAME pytest
COMMAND ${PYTEST} test/python
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
else()
message(WARNING "pytest not found. Python tests disabled." )
endif()
endif()
#-----------------------------------------------------------------------------
@@ -200,3 +193,71 @@ endif()
if (BUILD_DOCS)
add_subdirectory(docs)
endif()
#-----------------------------------------------------------------------------
# Manual page
#-----------------------------------------------------------------------------
if (BUILD_MANPAGE)
add_subdirectory(manual)
endif()
#-----------------------------------------------------------------------------
# Installation
#-----------------------------------------------------------------------------
include(GNUInstallDirs)
set(NOMINATIM_DATADIR ${CMAKE_INSTALL_FULL_DATADIR}/${PROJECT_NAME})
set(NOMINATIM_LIBDIR ${CMAKE_INSTALL_FULL_LIBDIR}/${PROJECT_NAME})
set(NOMINATIM_CONFIGDIR ${CMAKE_INSTALL_FULL_SYSCONFDIR}/${PROJECT_NAME})
if (BUILD_IMPORTER)
configure_file(${PROJECT_SOURCE_DIR}/cmake/tool-installed.tmpl installed.bin)
install(PROGRAMS ${PROJECT_BINARY_DIR}/installed.bin
DESTINATION ${CMAKE_INSTALL_BINDIR}
RENAME nominatim)
install(DIRECTORY nominatim
DESTINATION ${NOMINATIM_LIBDIR}/lib-python
FILES_MATCHING PATTERN "*.py"
PATTERN __pycache__ EXCLUDE)
install(DIRECTORY lib-sql DESTINATION ${NOMINATIM_LIBDIR})
install(FILES data/country_name.sql
${COUNTRY_GRID_FILE}
data/words.sql
DESTINATION ${NOMINATIM_DATADIR})
endif()
if (BUILD_OSM2PGSQL)
if (${CMAKE_VERSION} VERSION_LESS 3.13)
# Installation of subdirectory targets was only introduced in 3.13.
# So just copy the osm2pgsql file for older versions.
install(PROGRAMS ${PROJECT_BINARY_DIR}/osm2pgsql/osm2pgsql
DESTINATION ${NOMINATIM_LIBDIR})
else()
install(TARGETS osm2pgsql RUNTIME DESTINATION ${NOMINATIM_LIBDIR})
endif()
endif()
if (BUILD_MODULE)
install(PROGRAMS ${PROJECT_BINARY_DIR}/module/nominatim.so
DESTINATION ${NOMINATIM_LIBDIR}/module)
endif()
if (BUILD_API)
install(DIRECTORY lib-php DESTINATION ${NOMINATIM_LIBDIR})
endif()
install(FILES settings/env.defaults
settings/address-levels.json
settings/phrase-settings.json
settings/import-admin.style
settings/import-street.style
settings/import-address.style
settings/import-full.style
settings/import-extratags.style
settings/legacy_icu_tokenizer.yaml
settings/icu-rules/extended-unicode-to-asccii.yaml
DESTINATION ${NOMINATIM_CONFIGDIR})

View File

@@ -49,22 +49,18 @@ are in process of consolidating the style. The following rules apply:
* for PHP variables use CamelCase with a prefixing letter indicating the type
(i - integer, f - float, a - array, s - string, o - object)
The coding style is enforced with PHPCS and can be tested with:
The coding style is enforced with PHPCS and pylint. It can be tested with:
```
phpcs --report-width=120 --colors .
phpcs --report-width=120 --colors .
pylint3 --extension-pkg-whitelist=osmium nominatim
```
## Testing
Before submitting a pull request make sure that the following tests pass:
Before submitting a pull request make sure that the tests pass:
```
cd test/bdd
behave -DBUILDDIR=<builddir> db osm2pgsql
```
```
cd test/php
phpunit ./
cd build
make test
```

View File

@@ -1,3 +1,26 @@
3.7.0
* switch to dotenv for configuration file
* introduce 'make install' (reorganising most of the code)
* introduce nominatim tool as replacement for various php scripts
* introduce project directories and allow multiple installations from same build
* clean up BDD tests: drop nose, reorganise step code
* simplify test database for API BDD tests and autoinstall database
* port most of the code for command-line tools to Python
(thanks to @darkshredder and @AntoJvlt)
* add tests for all tooling
* replace pyosmium-get-changes with custom internal implementation using
pyosmium
* improve search for queries with housenumber and partial terms
* add database versioning
* use jinja2 for preprocessing SQL files
* introduce automatic migrations
* reverse fix preference of interpolations over housenumbers
* parallelize indexing of postcodes
* add non-key indexes to speed up housenumber + street searches
* switch housenumber field in placex to save transliterated names
3.6.0
* add full support for searching by and displaying of addr:* tags

View File

@@ -1,4 +1,5 @@
[![Build Status](https://github.com/osm-search/Nominatim/workflows/CI%20Tests/badge.svg)](https://github.com/osm-search/Nominatim/actions?query=workflow%3A%22CI+Tests%22)
[![codecov](https://codecov.io/gh/osm-search/Nominatim/branch/master/graph/badge.svg?token=8P1LXrhCMy)](https://codecov.io/gh/osm-search/Nominatim)
Nominatim
=========
@@ -24,15 +25,14 @@ Installing and running Nominatim is something for experienced system
administrators only who can do some trouble-shooting themselves. We are sorry,
but we can not provide installation support. We are all doing this in our free
time and there is just so much of that time to go around. Do not open issues in
our bug tracker if you need help. You can ask questions on the mailing list
(see below) or on [help.openstreetmap.org](https://help.openstreetmap.org/).**
our bug tracker if you need help. Use the discussions forum
or ask for help on [help.openstreetmap.org](https://help.openstreetmap.org/).**
The latest stable release can be downloaded from https://nominatim.org.
There you can also find [installation instructions for the release](https://nominatim.org/release-docs/latest/admin/Installation), as well as an extensive [Troubleshooting/FAQ section](https://nominatim.org/release-docs/latest/admin/Faq/).
Detailed installation instructions for the development version can be
found at [nominatim.org](https://nominatim.org/release-docs/develop/admin/Installation)
as well.
[Detailed installation instructions for current master](https://nominatim.org/release-docs/develop/admin/Installation)
can be found at nominatim.org as well.
A quick summary of the necessary steps:
@@ -42,12 +42,15 @@ A quick summary of the necessary steps:
cd build
cmake ..
make
sudo make install
2. Get OSM data and import:
2. Create a project directory, get OSM data and import:
./build/utils/setup.php --osm-file <your planet file> --all
mkdir nominatim-project
cd nominatim-project
nominatim import --osm-file <your planet file>
3. Point your webserver to the ./build/website directory.
3. Point your webserver to the nominatim-project/website directory.
License
@@ -59,13 +62,14 @@ The source code is available under a GPLv2 license.
Contributing
============
Contributions are welcome. For details see [contribution guide](CONTRIBUTING.md).
Both bug reports and pull requests are welcome.
Contributions, bugreport and pull requests are welcome.
For details see [contribution guide](CONTRIBUTING.md).
Mailing list
============
Questions and help
==================
For questions you can join the geocoding mailing list, see
https://lists.openstreetmap.org/listinfo/geocoding
For questions, community help and discussions you can use the
[Github discussions forum](https://github.com/osm-search/Nominatim/discussions)
or join the
[geocoding mailing list](https://lists.openstreetmap.org/listinfo/geocoding).

39
SECURITY.md Normal file
View File

@@ -0,0 +1,39 @@
# Security Policy
## Supported Versions
All Nominatim releases receive security updates for two years.
The following table lists the end of support for all currently supported
versions.
| Version | End of support for security updates |
| ------- | ----------------------------------- |
| 3.7.x | 2023-04-05 |
| 3.6.x | 2022-12-12 |
| 3.5.x | 2022-06-05 |
| 3.4.x | 2021-10-24 |
## Reporting a Vulnerability
If you believe, you have found an issue in Nominatim that has implications on
security, please send a description of the issue to **security@nominatim.org**.
You will receive an acknowledgement of your mail within 3 work days where we
also notify you of the next steps.
## How we Disclose Security Issues
** The following section only applies to security issues found in released
versions. Issues that concern the master development branch only will be
fixed immediately on the branch with the corresponding PR containing the
description of the nature and severity of the issue. **
Patches for identified security issues are applied to all affected versions and
new minor versions are released. At the same time we release a statement at
the [Nominatim blog](https://nominatim.org/blog/) describing the nature of the
incident. Announcements will also be published at the
[geocoding mailinglist](https://lists.openstreetmap.org/listinfo/geocoding).
## List of Previous Incidents
* 2020-05-04 - [SQL injection issue on /details endpoint](https://lists.openstreetmap.org/pipermail/geocoding/2020-May/002012.html)

View File

@@ -1,4 +0,0 @@
#!@PHP_BIN@ -Cq
<?php
require_once(dirname(dirname(__FILE__)).'/settings/settings.php');
require_once(CONST_BasePath.'/@script_source@');

17
cmake/tool-installed.tmpl Normal file
View File

@@ -0,0 +1,17 @@
#!/usr/bin/env python3
import sys
import os
sys.path.insert(1, '@NOMINATIM_LIBDIR@/lib-python')
os.environ['NOMINATIM_NOMINATIM_TOOL'] = os.path.abspath(__file__)
from nominatim import cli
exit(cli.nominatim(module_dir='@NOMINATIM_LIBDIR@/module',
osm2pgsql_path='@NOMINATIM_LIBDIR@/osm2pgsql',
phplib_dir='@NOMINATIM_LIBDIR@/lib-php',
sqllib_dir='@NOMINATIM_LIBDIR@/lib-sql',
data_dir='@NOMINATIM_DATADIR@',
config_dir='@NOMINATIM_CONFIGDIR@',
phpcgi_path='@PHPCGI_BIN@'))

17
cmake/tool.tmpl Executable file
View File

@@ -0,0 +1,17 @@
#!/usr/bin/env python3
import sys
import os
sys.path.insert(1, '@CMAKE_SOURCE_DIR@')
os.environ['NOMINATIM_NOMINATIM_TOOL'] = os.path.abspath(__file__)
from nominatim import cli
exit(cli.nominatim(module_dir='@CMAKE_BINARY_DIR@/module',
osm2pgsql_path='@CMAKE_BINARY_DIR@/osm2pgsql/osm2pgsql',
phplib_dir='@CMAKE_SOURCE_DIR@/lib-php',
sqllib_dir='@CMAKE_SOURCE_DIR@/lib-sql',
data_dir='@CMAKE_SOURCE_DIR@/data',
config_dir='@CMAKE_SOURCE_DIR@/settings',
phpcgi_path='@PHPCGI_BIN@'))

View File

@@ -1,5 +0,0 @@
<?php
@define('CONST_Debug', (isset($_GET['debug']) && $_GET['debug']));
require_once(dirname(dirname(__FILE__)).'/settings/settings-frontend.php');
require_once(CONST_BasePath.'/@script_source@');

14
codecov.yml Normal file
View File

@@ -0,0 +1,14 @@
codecov:
require_ci_to_pass: yes
coverage:
status:
project: off
patch: off
comment:
require_changes: true
after_n_builds: 2
fixes:
- "Nominatim/::"

View File

@@ -1,26 +0,0 @@
-- This data contains Ordnance Survey data © Crown copyright and database right 2010.
-- Code-Point Open contains Royal Mail data © Royal Mail copyright and database right 2010.
-- OS data may be used under the terms of the OS OpenData licence:
-- http://www.ordnancesurvey.co.uk/oswebsite/opendata/licence/docs/licence.pdf
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
CREATE TABLE gb_postcode (
id integer,
postcode character varying(9),
geometry geometry,
CONSTRAINT enforce_dims_geometry CHECK ((st_ndims(geometry) = 2)),
CONSTRAINT enforce_srid_geometry CHECK ((st_srid(geometry) = 4326))
);

View File

@@ -1,16 +0,0 @@
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET check_function_bodies = false;
SET client_min_messages = warning;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
CREATE TABLE us_postcode (
postcode text,
x double precision,
y double precision
);

View File

@@ -29787,7 +29787,7 @@ st 5557484
-- prefill word table
select count(make_keywords(v)) from (select distinct svals(name) as v from place) as w where v is not null;
select count(precompute_words(v)) from (select distinct svals(name) as v from place) as w where v is not null;
select count(getorcreate_housenumber_id(make_standard_name(v))) from (select distinct address->'housenumber' as v from place where address ? 'housenumber') as w;
-- copy the word frequencies

View File

@@ -47,15 +47,16 @@ The file `import_multiple_regions.sh` needs to be edited as per your requirement
BASEURL="https://download.geofabrik.de"
DOWNCOUNTRYPOSTFIX="-latest.osm.pbf"
### Setting up multiple regions
!!! tip
If your database already exists and you want to add more countries, replace the setting up part
If your database already exists and you want to add more countries,
replace the setting up part
`${SETUPFILE} --osm-file ${UPDATEDIR}/tmp/combined.osm.pbf --all 2>&1`
with `${UPDATEFILE} --import-file ${UPDATEDIR}/tmp/combined.osm.pbf --index --index-instances N 2>&1`
where N is the numbers of CPUs in your system.
### Setting up multiple regions
Run the following command from your Nominatim directory after configuring the file.
bash ./utils/import_multiple_regions.sh
@@ -154,7 +155,7 @@ Make sure that the PostgreSQL server package is installed on the machine
the PostgreSQL server itself.
Download and compile Nominatim as per standard instructions. Once done, you find
the nomrmalization library in `build/module/nominatim.so`. Copy the file to
the normalization library in `build/module/nominatim.so`. Copy the file to
the database server at a location where it is readable and executable by the
PostgreSQL server process.
@@ -162,11 +163,11 @@ PostgreSQL server process.
On the client side you now need to configure the import to point to the
correct location of the library **on the database server**. Add the following
line to your your `settings/local.php` file:
line to your your `.env` file:
```php
@define('CONST_Database_Module_Path', '<directory on the database server where nominatim.so resides>');
NOMINATIM_DATABASE_MODULE_PATH="<directory on the database server where nominatim.so resides>"
```
Now change the `CONST_Database_DSN` to point to your remote server and continue
Now change the `NOMINATIM_DATABASE_DSN` to point to your remote server and continue
to follow the [standard instructions for importing](/admin/Import).

101
docs/admin/Customization.md Normal file
View File

@@ -0,0 +1,101 @@
# Customization of the Database
This section explains in detail how to configure a Nominatim import and
the various means to use external data.
## External postcode data
Nominatim creates a table of known postcode centroids during import. This table
is used for searches of postcodes and for adding postcodes to places where the
OSM data does not provide one. These postcode centroids are mainly computed
from the OSM data itself. In addition, Nominatim supports reading postcode
information from an external CSV file, to supplement the postcodes that are
missing in OSM.
To enable external postcode support, simply put one CSV file per country into
your project directory and name it `<CC>_postcodes.csv`. `<CC>` must be the
two-letter country code for which to apply the file. The file may also be
gzipped. Then it must be called `<CC>_postcodes.csv.gz`.
The CSV file must use commas as a delimiter and have a header line. Nominatim
expects three columns to be present: `postcode`, `lat` and `lon`. All other
columns are ignored. `lon` and `lat` must describe the x and y coordinates of the
postcode centroids in WGS84.
The postcode files are loaded only when there is data for the given country
in your database. For example, if there is a `us_postcodes.csv` file in your
project directory but you import only an excerpt of Italy, then the US postcodes
will simply be ignored.
As a rule, the external postcode data should be put into the project directory
**before** starting the initial import. Still, you can add, remove and update the
external postcode data at any time. Simply
run:
```
nominatim refresh --postcodes
```
to make the changes visible in your database. Be aware, however, that the changes
only have an immediate effect on searches for postcodes. Postcodes that were
added to places are only updated, when they are reindexed. That usually happens
only during replication updates.
## Installing Tiger housenumber data for the US
Nominatim is able to use the official [TIGER](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html)
address set to complement the OSM house number data in the US. You can add
TIGER data to your own Nominatim instance by following these steps. The
entire US adds about 10GB to your database.
1. Get preprocessed TIGER 2020 data:
cd $PROJECT_DIR
wget https://nominatim.org/data/tiger2020-nominatim-preprocessed.csv.tar.gz
2. Import the data into your Nominatim database:
nominatim add-data --tiger-data tiger2020-nominatim-preprocessed.csv.tar.gz
3. Enable use of the Tiger data in your `.env` by adding:
echo NOMINATIM_USE_US_TIGER_DATA=yes >> .env
4. Apply the new settings:
nominatim refresh --functions
See the [developer's guide](../develop/data-sources.md#us-census-tiger) for more
information on how the data got preprocessed.
## Special phrases import
As described in the [Importation chapter](Import.md), it is possible to
import special phrases from the wiki with the following command:
```sh
nominatim special-phrases --import-from-wiki
```
But, it is also possible to import some phrases from a csv file.
To do so, you have access to the following command:
```sh
nominatim special-phrases --import-from-csv <csv file>
```
Note that the two previous import commands will update the phrases from your database.
This means that if you import some phrases from a csv file, only the phrases
present in the csv file will be kept into the database. All other phrases will
be removed.
If you want to only add new phrases and not update the other ones you can add
the argument `--no-replace` to the import command. For example:
```sh
nominatim special-phrases --import-from-csv <csv file> --no-replace
```
This will add the phrases present in the csv file into the database without
removing the other ones.

View File

@@ -1,7 +1,7 @@
# Deploying Nominatim
The Nominatim API is implemented as a PHP application. The `website/` directory
in the build directory contains the configured website. You can serve this
in the project directory contains the configured website. You can serve this
in a production environment with any web server that is capable to run
PHP scripts.
@@ -13,10 +13,11 @@ to run a web service. Please refer to the documentation of
for background information on configuring the services.
!!! Note
Throughout this page, we assume that your Nominatim build directory is
located in `/srv/nominatim/build` and the source code in
`/srv/nominatim/Nominatim`. If you have put it somewhere else, you
need to adjust the commands and configuration accordingly.
Throughout this page, we assume that your Nominatim project directory is
located in `/srv/nominatim-project` and that you have installed Nominatim
using the default installation prefix `/usr/local`. If you have put it
somewhere else, you need to adjust the commands and configuration
accordingly.
We further assume that your web server runs as user `www-data`. Older
versions of CentOS may still use the user name `apache`. You also need
@@ -29,7 +30,7 @@ web server user. You can check that the permissions are correct by accessing
on of the php files as the web server user:
``` sh
sudo -u www-data head -n 1 /srv/nominatim/build/website/search.php
sudo -u www-data head -n 1 /srv/nominatim-project/website/search.php
```
If this shows a permission error, then you need to adapt the permissions of
@@ -40,11 +41,11 @@ web server access. At a minimum the following SELinux labelling should be done
for Nominatim:
``` sh
sudo semanage fcontext -a -t httpd_sys_content_t "/srv/nominatim/Nominatim/(website|lib|settings)(/.*)?"
sudo semanage fcontext -a -t httpd_sys_content_t "/srv/nominatim/build/(website|settings)(/.*)?"
sudo semanage fcontext -a -t lib_t "/srv/nominatim/build/module/nominatim.so"
sudo restorecon -R -v /srv/nominatim/Nominatim
sudo restorecon -R -v /srv/nominatim/build
sudo semanage fcontext -a -t httpd_sys_content_t "/usr/local/nominatim/lib/lib-php(/.*)?"
sudo semanage fcontext -a -t httpd_sys_content_t "/srv/nominatim-project/website(/.*)?"
sudo semanage fcontext -a -t lib_t "/srv/nominatim-project/module/nominatim.so"
sudo restorecon -R -v /usr/local/lib/nominatim
sudo restorecon -R -v /srv/nominatim-project
```
## Nominatim with Apache
@@ -65,13 +66,13 @@ Make sure your Apache configuration contains the required permissions for the
directory and create an alias:
``` apache
<Directory "/srv/nominatim/build/website">
<Directory "/srv/nominatim-project/website">
Options FollowSymLinks MultiViews
AddType text/html .php
DirectoryIndex search.php
Require all granted
</Directory>
Alias /nominatim /srv/nominatim/build/website
Alias /nominatim /srv/nominatim-project/website
```
After making changes in the apache config you need to restart apache.
@@ -110,7 +111,7 @@ Tell nginx that php files are special and to fastcgi_pass to the php-fpm
unix socket by adding the location definition to the default configuration.
``` nginx
root /srv/nominatim/build/website;
root /srv/nominatim-project/website;
index search.php;
location / {
try_files $uri $uri/ @php;

View File

@@ -16,7 +16,7 @@ was killed. If it looks like this:
then you can resume with the following command:
```sh
./utils/setup.php --index --create-search-indices --create-country-names
nominatim import --continue indexing
```
If the reported rank is 26 or higher, you can also safely add `--index-noanalyse`.
@@ -31,7 +31,7 @@ list for hints.
If it happened during index creation you can try rerunning the step with
```sh
./utils/setup.php --create-search-indices --ignore-errors
nominatim import --continue indexing
```
Otherwise it's best to start the full setup from the beginning.
@@ -93,7 +93,7 @@ on a non-managed machine.
### I see the error: "function transliteration(text) does not exist"
Reinstall the nominatim functions with `setup.php --create--functions`
Reinstall the nominatim functions with `nominatim refresh --functions`
and check for any errors, e.g. a missing `nominatim.so` file.
### I see the error: "ERROR: mmap (remap) failed"
@@ -113,7 +113,8 @@ Double-check clang is installed. Instead of `make` try running `make CLANG=true`
### nominatim UPDATE failed: ERROR: buffer 179261 is not owned by resource owner Portal
Several users [reported this](https://github.com/openstreetmap/Nominatim/issues/1168) during the initial import of the database. It's
Several users [reported this](https://github.com/openstreetmap/Nominatim/issues/1168)
during the initial import of the database. It's
something PostgreSQL internal Nominatim doesn't control. And PostgreSQL forums
suggest it's threading related but definitely some kind of crash of a process.
Users reported either rebooting the server, different hardware or just trying
@@ -202,7 +203,7 @@ See the installation instructions for a full list of required packages.
### I forgot to delete the flatnodes file before starting an import.
That's fine. For each import the flatnodes file get overwritten.
See [https://help.openstreetmap.org/questions/52419/nominatim-flatnode-storage]()
See [https://help.openstreetmap.org/questions/52419/nominatim-flatnode-storage](https://help.openstreetmap.org/questions/52419/nominatim-flatnode-storage)
for more information.
@@ -211,11 +212,3 @@ for more information.
### Can I import negative OSM ids into Nominatim?
See [this question of Stackoverflow](https://help.openstreetmap.org/questions/64662/nominatim-flatnode-with-negative-id).
### Missing XML or text declaration
The website might show: `XML Parsing Error: XML or text declaration not at start of entity Location.`
Make sure there are no spaces at the beginning of your `settings/local.php` file.

View File

@@ -1,22 +1,54 @@
# Importing the Database
The following instructions explain how to create a Nominatim database
from an OSM planet file and how to keep the database up to date. It
is assumed that you have already successfully installed the Nominatim
software itself, if not return to the [installation page](Installation.md).
from an OSM planet file. It is assumed that you have already successfully
installed the Nominatim software itself and the `nominatim` tool can be found
in your `PATH`. If this is not the case, return to the
[installation page](Installation.md).
## Configuration setup in settings/local.php
## Creating the project directory
The Nominatim server can be customized via the file `settings/local.php`
in the build directory. Note that this is a PHP file, so it must always
start like this:
Before you start the import, you should create a project directory for your
new database installation. This directory receives all data that is related
to a single Nominatim setup: configuration, extra data, etc. Create a project
directory apart from the Nominatim software and change into the directory:
<?php
```
mkdir ~/nominatim-planet
cd ~/nominatim-planet
```
without any leading spaces.
In the following, we refer to the project directory as `$PROJECT_DIR`. To be
able to copy&paste instructions, you can export the appropriate variable:
```
export PROJECT_DIR=~/nominatim-planet
```
The Nominatim tool assumes per default that the current working directory is
the project directory but you may explicitly state a different directory using
the `--project-dir` parameter. The following instructions assume that you run
all commands from the project directory.
!!! tip "Migration Tip"
Nominatim used to be run directly from the build directory until version 3.6.
Essentially, the build directory functioned as the project directory
for the database installation. This setup still works and can be useful for
development purposes. It is not recommended anymore for production setups.
Create a project directory that is separate from the Nominatim software.
### Configuration setup in `.env`
The Nominatim server can be customized via an `.env` configuration file in the
project directory. This is a file in [dotenv](https://github.com/theskumar/python-dotenv)
format which looks the same as variable settings in a standard shell environment.
You can also set the same configuration via environment variables. All
settings have a `NOMINATIM_` prefix to avoid conflicts with other environment
variables.
There are lots of configuration settings you can tweak. Have a look
at `settings/default.php` for a full list. Most should have a sensible default.
at `Nominatim/settings/env.default` for a full list. Most should have a sensible default.
#### Flatnode files
@@ -24,9 +56,9 @@ If you plan to import a large dataset (e.g. Europe, North America, planet),
you should also enable flatnode storage of node locations. With this
setting enabled, node coordinates are stored in a simple file instead
of the database. This will save you import time and disk storage.
Add to your `settings/local.php`:
Add to your `.env`:
@define('CONST_Osm2pgsql_Flatnode_File', '/path/to/flatnode.file');
NOMINATIM_FLATNODE_FILE="/path/to/flatnode.file"
Replace the second part with a suitable path on your system and make sure
the directory exists. There should be at least 75GB of free space.
@@ -38,9 +70,9 @@ the directory exists. There should be at least 75GB of free space.
Wikipedia can be used as an optional auxiliary data source to help indicate
the importance of OSM features. Nominatim will work without this information
but it will improve the quality of the results if this is installed.
This data is available as a binary download:
This data is available as a binary download. Put it into your project directory:
cd $NOMINATIM_SOURCE_DIR/data
cd $PROJECT_DIR
wget https://www.nominatim.org/data/wikimedia-importance.sql.gz
The file is about 400MB and adds around 4GB to the Nominatim database.
@@ -48,17 +80,22 @@ The file is about 400MB and adds around 4GB to the Nominatim database.
!!! tip
If you forgot to download the wikipedia rankings, you can also add
importances after the import. Download the files, then run
`./utils/setup.php --import-wikipedia-articles`
and `./utils/update.php --recompute-importance`.
`nominatim refresh --wiki-data --importance`. Updating importances for
a planet can take a couple of hours.
### Great Britain, USA postcodes
### External postcodes
Nominatim can use postcodes from an external source to improve searches that
involve a GB or US postcode. This data can be optionally downloaded:
Nominatim can use postcodes from an external source to improve searching with
postcodes. We provide precomputed postcodes sets for the US (using TIGER data)
and the UK (using the [CodePoint OpenData set](https://osdatahub.os.uk/downloads/open/CodePointOpen).
This data can be optionally downloaded into the project directory:
cd $NOMINATIM_SOURCE_DIR/data
wget https://www.nominatim.org/data/gb_postcode_data.sql.gz
wget https://www.nominatim.org/data/us_postcode_data.sql.gz
cd $PROJECT_DIR
wget https://www.nominatim.org/data/gb_postcodes.csv.gz
wget https://www.nominatim.org/data/us_postcodes.csv.gz
You can also add your own custom postcode sources, see
[Customization of postcodes](Customization.md#external-postcode-data).
## Choosing the data to import
@@ -86,11 +123,14 @@ that Nominatim cannot compute the areas for some administrative areas.
About half of the data in Nominatim's database is not really used for serving
the API. It is only there to allow the data to be updated from the latest
changes from OSM. For many uses these dynamic updates are not really required.
If you don't plan to apply updates, the dynamic part of the database can be
safely dropped using the following command:
If you don't plan to apply updates, you can run the import with the
`--no-updates` parameter. This will drop the dynamic part of the database as
soon as it is not required anymore.
You can also drop the dynamic part later using the following command:
```
./utils/setup.php --drop
nominatim freeze
```
Note that you still need to provide for sufficient disk space for the initial
@@ -124,7 +164,7 @@ import styles available which only read selected data:
Like the full style but also adds most of the OSM tags into the extratags
column.
The style can be changed with the configuration `CONST_Import_Style`.
The style can be changed with the configuration `NOMINATIM_IMPORT_STYLE`.
To give you an idea of the impact of using the different styles, the table
below gives rough estimates of the final database size after import of a
@@ -153,12 +193,15 @@ can be found in the development section.
[Geofabrik](https://download.geofabrik.de).
Download the data to import. Then issue the following command
from the **build directory** to start the import:
from the **project directory** to start the import:
```sh
./utils/setup.php --osm-file <data file> --all 2>&1 | tee setup.log
nominatim import --osm-file <data file> 2>&1 | tee setup.log
```
The **project directory** is the one that you have set up at the beginning.
See [creating the project directory](Import#creating-the-project-directory).
### Notes on full planet imports
Even on a perfectly configured machine
@@ -192,29 +235,19 @@ MB. Make sure you leave enough RAM for PostgreSQL and osm2pgsql as mentioned
above. If the system starts swapping or you are getting out-of-memory errors,
reduce the cache size or even consider using a flatnode file.
### Verify the import
### Testing the installation
Run this script to verify all required tables and indices got created successfully.
```sh
./utils/check_import_finished.php
nominatim admin --check-database
```
### Setting up the website
Run the following command to set up the configuration file for the API frontend
`settings/settings-frontend.php`. These settings are used in website/*.php files.
```sh
./utils/setup.php --setup-website
```
!!! Note
This step is not necessary if you use `--all` option while setting up the DB.
Now you can try out your installation by running:
```sh
make serve
nominatim serve
```
This runs a small test server normally used for development. You can use it
@@ -222,6 +255,9 @@ to verify that your installation is working. Go to
`http://localhost:8088/status.php` and you should see the message `OK`.
You can also run a search query, e.g. `http://localhost:8088/search.php?q=Berlin`.
Note that search query is not supported for reverse-only imports. You can run a
reverse query, e.g. `http://localhost:8088/reverse.php?lat=27.1750090510034&lon=78.04209025`.
To run Nominatim via webservers like Apache or nginx, please read the
[Deployment chapter](Deployment.md).
@@ -232,7 +268,7 @@ planner to make the right decisions. Recomputing them can improve the performanc
of forward geocoding in particular under high load. To recompute word counts run:
```sh
./utils/update.php --recompute-word-counts
nominatim refresh --word-counts
```
This will take a couple of hours for a full planet installation. You can
@@ -242,44 +278,14 @@ running this function.
If you want to be able to search for places by their type through
[special key phrases](https://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases)
you also need to enable these key phrases like this:
you also need to import these key phrases like this:
./utils/specialphrases.php --wiki-import > specialphrases.sql
psql -d nominatim -f specialphrases.sql
```sh
nominatim special-phrases --import-from-wiki
```
Note that this command downloads the phrases from the wiki link above. You
need internet access for the step.
## Installing Tiger housenumber data for the US
Nominatim is able to use the official [TIGER](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html)
address set to complement the OSM house number data in the US. You can add
TIGER data to your own Nominatim instance by following these steps. The
entire US adds about 10GB to your database.
1. Get preprocessed TIGER 2019 data and unpack it into the
data directory in your Nominatim sources:
cd Nominatim/data
wget https://nominatim.org/data/tiger2019-nominatim-preprocessed.tar.gz
tar xf tiger2019-nominatim-preprocessed.tar.gz
2. Import the data into your Nominatim database:
./utils/setup.php --import-tiger-data
3. Enable use of the Tiger data in your `settings/local.php` by adding:
@define('CONST_Use_US_Tiger_Data', true);
4. Apply the new settings:
```sh
./utils/setup.php --create-functions --enable-diff-updates --create-partition-functions
```
See the [developer's guide](../develop/data-sources.md#us-census-tiger) for more
information on how the data got preprocessed.
You can also import special phrases from a csv file, for more
information please read the [Customization chapter](Customization.md).

View File

@@ -17,6 +17,7 @@ and can't offer support.
* [Docker](https://github.com/mediagis/nominatim-docker)
* [Docker on Kubernetes](https://github.com/peter-evans/nominatim-k8s)
* [Kubernetes with Helm](https://github.com/robjuz/helm-charts/blob/master/charts/nominatim/README.md)
* [Ansible](https://github.com/synthesio/infra-ansible-nominatim)
## Prerequisites
@@ -30,23 +31,31 @@ For compiling:
* [proj](https://proj.org/)
* [bzip2](http://www.bzip.org/)
* [zlib](https://www.zlib.net/)
* [ICU](http://site.icu-project.org/)
* [Boost libraries](https://www.boost.org/), including system and filesystem
* PostgreSQL client libraries
* a recent C++ compiler (gcc 5+ or Clang 3.8+)
For running Nominatim:
* [PostgreSQL](https://www.postgresql.org) (9.3+)
* [PostgreSQL](https://www.postgresql.org) (9.5+ will work, 11+ strongly recommended)
* [PostGIS](https://postgis.net) (2.2+)
* [Python 3](https://www.python.org/)
* [Psycopg2](https://www.psycopg.org)
* [Python 3](https://www.python.org/) (3.6+)
* [Psycopg2](https://www.psycopg.org) (2.7+)
* [Python Dotenv](https://github.com/theskumar/python-dotenv)
* [psutil](https://github.com/giampaolo/psutil)
* [Jinja2](https://palletsprojects.com/p/jinja/)
* [PyICU](https://pypi.org/project/PyICU/)
* [PyYaml](https://pyyaml.org/) (5.1+)
* [datrie](https://github.com/pytries/datrie)
* [PHP](https://php.net) (7.0 or later)
* PHP-pgsql
* PHP-intl (bundled with PHP)
* PHP-cgi (for running queries from the command line)
For running continuous updates:
* [pyosmium](https://osmcode.org/pyosmium/) (with Python 3)
* [pyosmium](https://osmcode.org/pyosmium/)
For dependencies for running tests and building documentation, see
the [Development section](../develop/Development-Environment.md).
@@ -142,6 +151,16 @@ build at the same level as the Nominatim source directory run:
```
cmake ../Nominatim
make
sudo make install
```
Nominatim installs itself into `/usr/local` per default. To choose a different
installation directory add `-DCMAKE_INSTALL_PREFIX=<install root>` to the
cmake command. Make sure that the `bin` directory is available in your path
in that case, e.g.
```
export PATH=<install root>/bin:$PATH
```
Now continue with [importing the database](Import.md).

View File

@@ -1,10 +1,66 @@
# Database Migrations
This page describes database migrations necessary to update existing databases
to newer versions of Nominatim.
Since version 3.7.0 Nominatim offers automatic migrations. Please follow
the following steps:
SQL statements should be executed from the PostgreSQL commandline. Execute
`psql nominatim` to enter command line mode.
* stop any updates that are potentially running
* update Nominatim to the newer version
* go to your project directory and run `nominatim admin --migrate`
* (optionally) restart updates
Below you find additional migrations and hints about other structural and
breaking changes. **Please read them before running the migration.**
!!! note
If you are migrating from a version <3.6, then you still have to follow
the manual migration steps up to 3.6.
## 3.6.0 -> 3.7.0
### New format and name of configuration file
The configuration for an import is now saved in a `.env` file in the project
directory. This file follows the dotenv format. For more information, see
the [installation chapter](Import.md#configuration-setup-in-env).
To migrate to the new system, create a new project directory, add the `.env`
file and port your custom configuration from `settings/local.php`. Most
settings are named similar and only have received a `NOMINATIM_` prefix.
Use the default settings in `settings/env.defaults` as a reference.
### New location for data files
External data files for Wikipedia importance, postcodes etc. are no longer
expected to reside in the source tree by default. Instead they will be searched
in the project directory. If you have an automated setup script you must
either adapt the download location or explicitly set the location of the
files to the old place in your `.env`.
### Introducing `nominatim` command line tool
The various php utilities have been replaced with a single `nominatim`
command line tool. Make sure to adapt any scripts. There is no direct 1:1
matching between the old utilities and the commands of nominatim CLI. The
following list gives you a list of nominatim sub-commands that contain
functionality of each script:
* ./utils/setup.php: `import`, `freeze`, `refresh`
* ./utils/update.php: `replication`, `add-data`, `index`, `refresh`
* ./utils/specialphrases.php: `special-phrases`
* ./utils/check_import_finished.php: `admin`
* ./utils/warm.php: `admin`
* ./utils/export.php: `export`
Try `nominatim <command> --help` for more information about each subcommand.
`./utils/query.php` no longer exists in its old form. `nominatim search`
provides a replacement but returns different output.
### Switch to normalized house numbers
The housenumber column in the placex table uses now normalized version.
The automatic migration step will convert the column but this may take a
very long time. It is advisable to take the machine offline while doing that.
## 3.5.0 -> 3.6.0
@@ -68,6 +124,14 @@ configuration file, run the following command after updating:
./utils/setup.php --setup-website
```
### Update SQL code
To update the SQL code to the leatest version run:
```
./utils/setup.php --create-functions --enable-diff-updates --create-partition-functions
```
## 3.4.0 -> 3.5.0
### New Wikipedia/Wikidata importance tables

View File

@@ -10,12 +10,11 @@ installation. For more details, please also have a look at the
## Installing nominatim-ui
nominatim-ui does not need any special installation, just download, configure
and run it.
Clone the source from github:
git clone https://github.com/osm-search/nominatim-ui
We provide regular releases of nominatim-ui that contain the packaged website.
They do not need any special installation. Just download, configure
and run it. Grab the latest release from
[nominatim-ui's Github release page](https://github.com/osm-search/nominatim-ui/releases)
and unpack it. You can use `nominatim-ui-x.x.x.tar.gz` or `nominatim-ui-x.x.x.zip`.
Copy the example configuration into the right place:

205
docs/admin/Tokenizers.md Normal file
View File

@@ -0,0 +1,205 @@
# Tokenizers
The tokenizer module in Nominatim is responsible for analysing the names given
to OSM objects and the terms of an incoming query in order to make sure, they
can be matched appropriately.
Nominatim offers different tokenizer modules, which behave differently and have
different configuration options. This sections describes the tokenizers and how
they can be configured.
!!! important
The use of a tokenizer is tied to a database installation. You need to choose
and configure the tokenizer before starting the initial import. Once the import
is done, you cannot switch to another tokenizer anymore. Reconfiguring the
chosen tokenizer is very limited as well. See the comments in each tokenizer
section.
## Legacy tokenizer
The legacy tokenizer implements the analysis algorithms of older Nominatim
versions. It uses a special Postgresql module to normalize names and queries.
This tokenizer is currently the default.
To enable the tokenizer add the following line to your project configuration:
```
NOMINATIM_TOKENIZER=legacy
```
The Postgresql module for the tokenizer is available in the `module` directory
and also installed with the remainder of the software under
`lib/nominatim/module/nominatim.so`. You can specify a custom location for
the module with
```
NOMINATIM_DATABASE_MODULE_PATH=<path to directory where nominatim.so resides>
```
This is in particular useful when the database runs on a different server.
See [Advanced installations](Advanced-Installations.md#importing-nominatim-to-an-external-postgresql-database) for details.
There are no other configuration options for the legacy tokenizer. All
normalization functions are hard-coded.
## ICU tokenizer
!!! danger
This tokenizer is currently in active development and still subject
to backwards-incompatible changes.
The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to
normalize names and queries. It also offers configurable decomposition and
abbreviation handling.
### How it works
On import the tokenizer processes names in the following four stages:
1. The **Normalization** part removes all non-relevant information from the
input.
2. Incoming names are now converted to **full names**. This process is currently
hard coded and mostly serves to handle name tags from OSM that contain
multiple names (e.g. [Biel/Bienne](https://www.openstreetmap.org/node/240097197)).
3. Next the tokenizer creates **variants** from the full names. These variants
cover decomposition and abbreviation handling. Variants are saved to the
database, so that it is not necessary to create the variants for a search
query.
4. The final **Tokenization** step converts the names to a simple ASCII form,
potentially removing further spelling variants for better matching.
At query time only stage 1) and 4) are used. The query is normalized and
tokenized and the resulting string used for searching in the database.
### Configuration
The ICU tokenizer is configured using a YAML file which can be configured using
`NOMINATIM_TOKENIZER_CONFIG`. The configuration is read on import and then
saved as part of the internal database status. Later changes to the variable
have no effect.
Here is an example configuration file:
``` yaml
normalization:
- ":: lower ()"
- "ß > 'ss'" # German szet is unimbigiously equal to double ss
transliteration:
- !include /etc/nominatim/icu-rules/extended-unicode-to-asccii.yaml
- ":: Ascii ()"
variants:
- language: de
words:
- ~haus => haus
- ~strasse -> str
- language: en
words:
- road -> rd
- bridge -> bdge,br,brdg,bri,brg
```
The configuration file contains three sections:
`normalization`, `transliteration`, `variants`.
The normalization and transliteration sections each must contain a list of
[ICU transformation rules](https://unicode-org.github.io/icu/userguide/transforms/general/rules.html).
The rules are applied in the order in which they appear in the file.
You can also include additional rules from external yaml file using the
`!include` tag. The included file must contain a valid YAML list of ICU rules
and may again include other files.
!!! warning
The ICU rule syntax contains special characters that conflict with the
YAML syntax. You should therefore always enclose the ICU rules in
double-quotes.
The variants section defines lists of replacements which create alternative
spellings of a name. To create the variants, a name is scanned from left to
right and the longest matching replacement is applied until the end of the
string is reached.
The variants section must contain a list of replacement groups. Each group
defines a set of properties that describes where the replacements are
applicable. In addition, the word section defines the list of replacements
to be made. The basic replacement description is of the form:
```
<source>[,<source>[...]] => <target>[,<target>[...]]
```
The left side contains one or more `source` terms to be replaced. The right side
lists one or more replacements. Each source is replaced with each replacement
term.
!!! tip
The source and target terms are internally normalized using the
normalization rules given in the configuration. This ensures that the
strings match as expected. In fact, it is better to use unnormalized
words in the configuration because then it is possible to change the
rules for normalization later without having to adapt the variant rules.
#### Decomposition
In its standard form, only full words match against the source. There
is a special notation to match the prefix and suffix of a word:
``` yaml
- ~strasse => str # matches "strasse" as full word and in suffix position
- hinter~ => hntr # matches "hinter" as full word and in prefix position
```
There is no facility to match a string in the middle of the word. The suffix
and prefix notation automatically trigger the decomposition mode: two variants
are created for each replacement, one with the replacement attached to the word
and one separate. So in above example, the tokenization of "hauptstrasse" will
create the variants "hauptstr" and "haupt str". Similarly, the name "rote strasse"
triggers the variants "rote str" and "rotestr". By having decomposition work
both ways, it is sufficient to create the variants at index time. The variant
rules are not applied at query time.
To avoid automatic decomposition, use the '|' notation:
``` yaml
- ~strasse |=> str
```
simply changes "hauptstrasse" to "hauptstr" and "rote strasse" to "rote str".
#### Initial and final terms
It is also possible to restrict replacements to the beginning and end of a
name:
``` yaml
- ^south => s # matches only at the beginning of the name
- road$ => rd # matches only at the end of the name
```
So the first example would trigger a replacement for "south 45th street" but
not for "the south beach restaurant".
#### Replacements vs. variants
The replacement syntax `source => target` works as a pure replacement. It changes
the name instead of creating a variant. To create an additional version, you'd
have to write `source => source,target`. As this is a frequent case, there is
a shortcut notation for it:
```
<source>[,<source>[...]] -> <target>[,<target>[...]]
```
The simple arrow causes an additional variant to be added. Note that
decomposition has an effect here on the source as well. So a rule
``` yaml
- "~strasse -> str"
```
means that for a word like `hauptstrasse` four variants are created:
`hauptstrasse`, `haupt strasse`, `hauptstr` and `haupt str`.
### Reconfiguration
Changing the configuration after the import is currently not possible, although
this feature may be added at a later time.

View File

@@ -1,8 +1,10 @@
# Updating the Database
There are many different ways to update your Nominatim database.
The following section describes how to keep it up-to-date with Pyosmium.
For a list of other methods see the output of `./utils/update.php --help`.
The following section describes how to keep it up-to-date using
an [online replication service for OpenStreetMap data](https://wiki.openstreetmap.org/wiki/Planet.osm/diffs)
For a list of other methods to add or update data see the output of
`nominatim add-data --help`.
!!! important
If you have configured a flatnode file for the import, then you
@@ -17,50 +19,37 @@ Run (as the same user who will later run the updates):
pip3 install --user osmium
```
Nominatim needs a tool called `pyosmium-get-changes` which comes with
Pyosmium. You need to tell Nominatim where to find it. Add the
following line to your `settings/local.php`:
@define('CONST_Pyosmium_Binary', '/home/user/.local/bin/pyosmium-get-changes');
The path above is fine if you used the `--user` parameter with pip.
Replace `user` with your user name.
#### Setting up the update process
Next the update needs to be initialised. By default Nominatim is configured
to update using the global minutely diffs.
If you want a different update source you will need to add some settings
to `settings/local.php`. For example, to use the daily country extracts
to `.env`. For example, to use the daily country extracts
diffs for Ireland from Geofabrik add the following:
// base URL of the replication service
@define('CONST_Replication_Url', 'https://download.geofabrik.de/europe/ireland-and-northern-ireland-updates');
// How often upstream publishes diffs
@define('CONST_Replication_Update_Interval', '86400');
// How long to sleep if no update found yet
@define('CONST_Replication_Recheck_Interval', '900');
# base URL of the replication service
NOMINATIM_REPLICATION_URL="https://download.geofabrik.de/europe/ireland-and-northern-ireland-updates"
# How often upstream publishes diffs (in seconds)
NOMINATIM_REPLICATION_UPDATE_INTERVAL=86400
# How long to sleep if no update found yet (in seconds)
NOMINATIM_REPLICATION_RECHECK_INTERVAL=900
To set up the update process now run the following command:
./utils/update.php --init-updates
nominatim replication --init
It outputs the date where updates will start. Recheck that this date is
what you expect.
The `--init-updates` command needs to be rerun whenever the replication service
is changed.
The `replication --init` command needs to be rerun whenever the replication
service is changed.
#### Updating Nominatim
The following command will keep your database constantly up to date:
./utils/update.php --import-osmosis-all
(Note that even though the old name "import-osmosis-all" has been kept for
compatibility reasons, Osmosis is not required to run this - it uses pyosmium
behind the scenes.)
nominatim replication
If you have imported multiple country extracts and want to keep them
up-to-date, [Advanced installations section](Advanced-Installations.md) contains instructions

View File

@@ -58,4 +58,4 @@ The [Overpass API](https://wiki.openstreetmap.org/wiki/Overpass_API) is more
suited for these kinds of queries.
That said if you installed your own Nominatim instance you can use the
`/utils/export.php` PHP script as basis to return such lists.
`nominatim export` PHP script as basis to return such lists.

View File

@@ -162,7 +162,7 @@ This overrides the specified machine readable format. (Default: 0)
"licence":"Data © OpenStreetMap contributors, ODbL 1.0. https:\/\/www.openstreetmap.org\/copyright",
"osm_type":"way",
"osm_id":"280940520",
"lat":"-34.4391708",
"lat":"-34.4391708",
"lon":"-58.7064573",
"place_rank":"26",
"category":"highway",

View File

@@ -35,10 +35,16 @@ will return HTTP code 200 and a structure
{
"status": 0,
"message": "OK",
"data_updated": "2020-05-04T14:47:00+00:00"
"data_updated": "2020-05-04T14:47:00+00:00",
"software_version": "3.6.0-0",
"database_version": "3.6.0-0"
}
```
The `software_version` field contains the version of Nominatim used to serve
the API. The `database_version` field contains the version of the data format
in the database.
On error will also return HTTP status code 200 and a structure with error
code and message, e.g.

View File

@@ -26,12 +26,14 @@ following packages should get you started:
## Prerequisites for testing and documentation
The Nominatim test suite consists of behavioural tests (using behave) and
unit tests (using PHPUnit). It has the following additional requirements:
unit tests (using PHPUnit for PHP code and pytest for Python code).
It has the following additional requirements:
* [behave test framework](https://behave.readthedocs.io) >= 1.2.5
* [nose](https://nose.readthedocs.io)
* [behave test framework](https://behave.readthedocs.io) >= 1.2.6
* [phpunit](https://phpunit.de) >= 7.3
* [PHP CodeSniffer](https://github.com/squizlabs/PHP_CodeSniffer)
* [Pylint](https://pylint.org/) (2.6.0 is used for the CI)
* [pytest](https://pytest.org)
The documentation is built with mkdocs:
@@ -47,9 +49,9 @@ To install all necessary packages run:
```sh
sudo apt install php-cgi phpunit php-codesniffer \
python3-pip python3-setuptools python3-dev
python3-pip python3-setuptools python3-dev pylint
pip3 install --user behave nose mkdocs
pip3 install --user behave mkdocs pytest
```
The `mkdocs` executable will be located in `.local/bin`. You may have to add
@@ -78,58 +80,15 @@ echo 'export PATH=~/.config/composer/vendor/bin:$PATH' > ~/.profile
## Executing Tests
All tests are located in the `\test` directory.
All tests are located in the `/test` directory.
### Preparing the test database
Some of the behavioural test expect a test database to be present. You need at
least 2GB RAM and 10GB disk space to create the database.
First create a separate directory for the test DB and fetch the test planet
data and the Tiger data for South Dakota:
```
mkdir testdb
cd testdb
wget https://www.nominatim.org/data/test/nominatim-api-testdata.pbf
wget -O - https://nominatim.org/data/tiger2018-nominatim-preprocessed.tar.gz | tar xz --wildcards --no-anchored '46*'
```
Configure and build Nominatim in the usual way:
```
cmake $USERNAME/Nominatim
make
```
Copy the test settings:
```
cp $USERNAME/Nominatim/test/testdb/local.php settings/
```
Inspect the file to check that all settings are correct for your local setup.
Now you can import the test database:
```
dropdb --if-exists test_api_nominatim
./utils/setup.php --all --osm-file nominatim-api-testdb.pbf 2>&1 | tee import.log
./utils/specialphrases.php --wiki-import | psql -d test_api_nominatim 2>&1 | tee -a import.log
./utils/setup.php --import-tiger-data 2>&1 | tee -a import.log
```
### Running the tests
To run all tests just go to the test directory and run make:
To run all tests just go to the build directory and run make:
```sh
cd test
make
cd build
make test
```
To skip tests that require the test database, run `make no-test-db` instead.
For more information about the structure of the tests and how to change and
extend the test suite, see the [Testing chapter](Testing.md).

View File

@@ -29,7 +29,7 @@ once with `class` of `highway` and once with a `class` of `bridge`. Thus the
## Configuring the Import
How tags are interpreted and assigned to the different `place` columns can be
configured via the import style configuration file (`CONST_Import_style`). This
configured via the import style configuration file (`NOMINATIM_IMPORT_STYLE`). This
is a JSON file which contains a list of rules which are matched against every
tag of every object and then assign the tag its specific role.

View File

@@ -14,7 +14,7 @@ country's format, e.g. if Swiss postcodes are 4 digits.
## Regular updating calculated postcodes
The script to rerun the calculation is
`build/utils/update.php --calculate-postcodes`
`nominatim refresh --postcodes`
and runs once per night on nominatim.openstreetmap.org.

View File

@@ -87,9 +87,9 @@ into the database. There are a few hard-coded rules for the assignment:
* highway nodes
* landuse that is not an area
Other than that, the ranks can be freely assigned via the JSON file
defined with `CONST_Address_Level_Config` according to their type and
the country they are in.
Other than that, the ranks can be freely assigned via the JSON file according
to their type and the country they are in. The name of the config file to be
used can be changed with the setting `NOMINATIM_ADDRESS_LEVEL_CONFIG`.
The address level configuration must consist of an array of configuration
entries, each containing a tag definition and an optional country array:

View File

@@ -21,14 +21,15 @@ This test directory is sturctured as follows:
| +- api Tests for API endpoints (search, reverse, etc.)
|
+- php PHP unit tests
+- python Python unit tests
+- scenes Geometry test data
+- testdb Base data for generating API test database
```
## PHP Unit Tests (`test/php`)
Unit tests can be found in the php/ directory. They test selected php functions.
Very low coverage.
Unit tests for PHP code can be found in the `php/` directory. They test selected
PHP functions. Very low coverage.
To execute the test suite run
@@ -36,11 +37,26 @@ To execute the test suite run
UNIT_TEST_DSN='pgsql:dbname=nominatim_unit_tests' phpunit ../
It will read phpunit.xml which points to the library, test path, bootstrap
strip and set other parameters.
strip and sets other parameters.
It will use (and destroy) a local database 'nominatim_unit_tests'. You can set
a different connection string with e.g. UNIT_TEST_DSN='pgsql:dbname=foo_unit_tests'.
## Python Unit Tests (`test/python`)
Unit tests for Python code can be found in the `python/` directory. The goal is
to have complete coverage of the Python library in `nominatim`.
To execute the tests run
py.test-3 test/python
or
pytest test/python
The name of the pytest binary depends on your installation.
## BDD Functional Tests (`test/bdd`)
Functional tests are written as BDD instructions. For more information on
@@ -67,17 +83,17 @@ The tests can be configured with a set of environment variables (`behave -D key=
the test databases (db tests)
* `TEST_DB` - name of test database (db tests)
* `API_TEST_DB` - name of the database containing the API test data (api tests)
* `API_TEST_FILE` - OSM file to be imported into the API test database (api tests)
* `DB_HOST` - (optional) hostname of database host
* `DB_PORT` - (optional) port of database on host
* `DB_USER` - (optional) username of database login
* `DB_PASS` - (optional) password for database login
* `SERVER_MODULE_PATH` - (optional) path on the Postgres server to Nominatim
module shared library file
* `TEST_SETTINGS_TEMPLATE` - file to write temporary Nominatim settings to
* `REMOVE_TEMPLATE` - if true, the template database will not be reused during
the next run. Reusing the base templates speeds up tests
considerably but might lead to outdated errors for some
changes in the database layout.
* `REMOVE_TEMPLATE` - if true, the template and API database will not be reused
during the next run. Reusing the base templates speeds
up tests considerably but might lead to outdated errors
for some changes in the database layout.
* `KEEP_TEST_DB` - if true, the test database will not be dropped after a test
is finished. Should only be used if one single scenario is
run, otherwise the result is undefined.
@@ -89,23 +105,20 @@ feature of behave which comes in handy when writing new tests.
### API Tests (`test/bdd/api`)
These tests are meant to test the different API endpoints and their parameters.
They require to import several datasets into a test database.
See the [Development Setup chapter](Development-Environment.md#preparing-the-test-database)
for instructions on how to set up this database.
They require to import several datasets into a test database. This is normally
done automatically during setup of the test. The API test database is then
kept around and reused in subsequent runs of behave. Use `behave -DREMOVE_TEMPLATE`
to force a reimport of the database.
The official test dataset was derived from the 180924 planet (note: such
file no longer exists at https://planet.openstreetmap.org/planet/2018/).
Newer planets are likely to work as well but you may see isolated test
failures where the data has changed.
The official test dataset is saved in the file `test/testdb/apidb-test-data.pbf`
and compromises the following data:
The official test dataset can always be downloaded from
[nominatim.org](https://www.nominatim.org/data/test/nominatim-api-testdata.pbf)
To recreate the input data for the test database run:
* Geofabrik extract of Liechtenstein
* extract of Autauga country, Alabama, US (for tests against Tiger data)
* additional data from `test/testdb/additional_api_test.data.osm`
```
wget https://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/pbf/planet-180924.osm.pbf
osmconvert planet-180924.osm.pbf -B=test/testdb/testdb.polys -o=testdb.pbf
```
API tests should only be testing the functionality of the website PHP code.
Most tests should be formulated as BDD DB creation tests (see below) instead.
#### Code Coverage
@@ -140,3 +153,7 @@ needs superuser rights for postgres.
These tests check that data is imported correctly into the place table. They
use the same template database as the DB Creation tests, so the same remarks apply.
Note that most testing of the gazetteer output of osm2pgsql is done in the tests
of osm2pgsql itself. The BDD tests are just there to ensure compatibility of
the osm2pgsql and Nominatim code.

View File

@@ -19,6 +19,8 @@ pages:
- 'Import' : 'admin/Import.md'
- 'Update' : 'admin/Update.md'
- 'Deploy' : 'admin/Deployment.md'
- 'Customize Imports' : 'admin/Customization.md'
- 'Tokenizers' : 'admin/Tokenizers.md'
- 'Nominatim UI' : 'admin/Setup-Nominatim-UI.md'
- 'Advanced Installations' : 'admin/Advanced-Installations.md'
- 'Migration from older Versions' : 'admin/Migration.md'

View File

@@ -2,7 +2,7 @@
namespace Nominatim;
require_once(CONST_BasePath.'/lib/ClassTypes.php');
require_once(CONST_LibDir.'/ClassTypes.php');
/**
* Detailed list of address parts for a single result
@@ -61,7 +61,7 @@ class AddressDetails
return join(', ', $aParts);
}
public function getAddressNames($sCountry = null)
public function getAddressNames()
{
$aAddress = array();
@@ -79,13 +79,11 @@ class AddressDetails
$sName = $aLine['housenumber'];
}
if (isset($sName)) {
$sTypeLabel = strtolower(str_replace(' ', '_', $sTypeLabel));
if (!isset($aAddress[$sTypeLabel])
|| $aLine['class'] == 'place'
) {
$aAddress[$sTypeLabel] = $sName;
}
if (isset($sName)
&& (!isset($aAddress[$sTypeLabel])
|| $aLine['class'] == 'place')
) {
$aAddress[$sTypeLabel] = $sName;
}
}

View File

@@ -2,7 +2,7 @@
namespace Nominatim;
require_once(CONST_BasePath.'/lib/DatabaseError.php');
require_once(CONST_LibDir.'/DatabaseError.php');
/**
* Uses PDO to access the database specified in the CONST_Database_DSN
@@ -12,9 +12,9 @@ class DB
{
protected $connection;
public function __construct($sDSN = CONST_Database_DSN)
public function __construct($sDSN = null)
{
$this->sDSN = $sDSN;
$this->sDSN = $sDSN ?? getSetting('DATABASE_DSN');
}
public function connect($bNew = false, $bPersistent = true)
@@ -39,7 +39,9 @@ class DB
$conn->exec("SET DateStyle TO 'sql,european'");
$conn->exec("SET client_encoding TO 'utf-8'");
$iMaxExecution = ini_get('max_execution_time');
if ($iMaxExecution > 0) $conn->setAttribute(\PDO::ATTR_TIMEOUT, $iMaxExecution); // seconds
if ($iMaxExecution > 0) {
$conn->setAttribute(\PDO::ATTR_TIMEOUT, $iMaxExecution); // seconds
}
$this->connection = $conn;
return true;
@@ -95,7 +97,9 @@ class DB
try {
$stmt = $this->getQueryStatement($sSQL, $aInputVars, $sErrMessage);
$row = $stmt->fetch(\PDO::FETCH_NUM);
if ($row === false) return false;
if ($row === false) {
return false;
}
} catch (\PDOException $e) {
throw new \Nominatim\DatabaseError($sErrMessage, 500, null, $e, $sSQL);
}
@@ -240,16 +244,6 @@ class DB
return ($this->getOne($sSQL, array(':tablename' => $sTableName)) == 1);
}
/**
* Returns a list of table names in the database
*
* @return array[]
*/
public function getListOfTables()
{
return $this->getCol("SELECT tablename FROM pg_tables WHERE schemaname='public'");
}
/**
* Deletes a table. Returns true if deleted or didn't exist.
*
@@ -262,76 +256,6 @@ class DB
return $this->exec('DROP TABLE IF EXISTS '.$sTableName.' CASCADE') == 0;
}
/**
* Check if an index exists in the database. Optional filtered by tablename
*
* @param string $sTableName
*
* @return boolean
*/
public function indexExists($sIndexName, $sTableName = null)
{
return in_array($sIndexName, $this->getListOfIndices($sTableName));
}
/**
* Returns a list of index names in the database, optional filtered by tablename
*
* @param string $sTableName
*
* @return array
*/
public function getListOfIndices($sTableName = null)
{
// table_name | index_name | column_name
// -----------------------+---------------------------------+--------------
// country_name | idx_country_name_country_code | country_code
// country_osm_grid | idx_country_osm_grid_geometry | geometry
// import_polygon_delete | idx_import_polygon_delete_osmid | osm_id
// import_polygon_delete | idx_import_polygon_delete_osmid | osm_type
// import_polygon_error | idx_import_polygon_error_osmid | osm_id
// import_polygon_error | idx_import_polygon_error_osmid | osm_type
$sSql = <<< END
SELECT
t.relname as table_name,
i.relname as index_name,
a.attname as column_name
FROM
pg_class t,
pg_class i,
pg_index ix,
pg_attribute a
WHERE
t.oid = ix.indrelid
and i.oid = ix.indexrelid
and a.attrelid = t.oid
and a.attnum = ANY(ix.indkey)
and t.relkind = 'r'
and i.relname NOT LIKE 'pg_%'
FILTERS
ORDER BY
t.relname,
i.relname,
a.attname
END;
$aRows = null;
if ($sTableName) {
$sSql = str_replace('FILTERS', 'and t.relname = :tablename', $sSql);
$aRows = $this->getAll($sSql, array(':tablename' => $sTableName));
} else {
$sSql = str_replace('FILTERS', '', $sSql);
$aRows = $this->getAll($sSql);
}
$aIndexNames = array_unique(array_map(function ($aRow) {
return $aRow['index_name'];
}, $aRows));
sort($aIndexNames);
return $aIndexNames;
}
/**
* Tries to connect to the database but on failure doesn't throw an exception.
*
@@ -386,9 +310,13 @@ END;
if (preg_match('/^pgsql:(.+)$/', $sDSN, $aMatches)) {
foreach (explode(';', $aMatches[1]) as $sKeyVal) {
list($sKey, $sVal) = explode('=', $sKeyVal, 2);
if ($sKey == 'host') $sKey = 'hostspec';
if ($sKey == 'dbname') $sKey = 'database';
if ($sKey == 'user') $sKey = 'username';
if ($sKey == 'host') {
$sKey = 'hostspec';
} elseif ($sKey == 'dbname') {
$sKey = 'database';
} elseif ($sKey == 'user') {
$sKey = 'username';
}
$aInfo[$sKey] = $sVal;
}
}

View File

@@ -5,7 +5,7 @@ namespace Nominatim;
class DatabaseError extends \Exception
{
public function __construct($message, $code = 500, Exception $previous = null, $oPDOErr, $sSql = null)
public function __construct($message, $code, $previous, $oPDOErr, $sSql = null)
{
parent::__construct($message, $code, $previous);
// https://secure.php.net/manual/en/class.pdoexception.php

View File

@@ -78,7 +78,7 @@ class Debug
echo '<th>Address Tokens</th><th>Address Not</th>';
echo '<th>country</th><th>operator</th>';
echo '<th>class</th><th>type</th><th>postcode</th><th>housenumber</th></tr>';
foreach ($aSearches as $iRank => $aRankedSet) {
foreach ($aSearches as $aRankedSet) {
foreach ($aRankedSet as $aRow) {
$aRow->dumpAsHtmlTableRow($aWordsIDs);
}

View File

@@ -2,23 +2,25 @@
namespace Nominatim;
require_once(CONST_BasePath.'/lib/PlaceLookup.php');
require_once(CONST_BasePath.'/lib/Phrase.php');
require_once(CONST_BasePath.'/lib/ReverseGeocode.php');
require_once(CONST_BasePath.'/lib/SearchDescription.php');
require_once(CONST_BasePath.'/lib/SearchContext.php');
require_once(CONST_BasePath.'/lib/TokenList.php');
require_once(CONST_LibDir.'/PlaceLookup.php');
require_once(CONST_LibDir.'/Phrase.php');
require_once(CONST_LibDir.'/ReverseGeocode.php');
require_once(CONST_LibDir.'/SearchDescription.php');
require_once(CONST_LibDir.'/SearchContext.php');
require_once(CONST_LibDir.'/SearchPosition.php');
require_once(CONST_LibDir.'/TokenList.php');
require_once(CONST_TokenizerDir.'/tokenizer.php');
class Geocode
{
protected $oDB;
protected $oPlaceLookup;
protected $oTokenizer;
protected $aLangPrefOrder = array();
protected $aExcludePlaceIDs = array();
protected $bReverseInPlan = false;
protected $iLimit = 20;
protected $iFinalLimit = 10;
@@ -42,28 +44,12 @@ class Geocode
protected $sQuery = false;
protected $aStructuredQuery = false;
protected $oNormalizer = null;
public function __construct(&$oDB)
{
$this->oDB =& $oDB;
$this->oPlaceLookup = new PlaceLookup($this->oDB);
$this->oNormalizer = \Transliterator::createFromRules(CONST_Term_Normalization_Rules);
}
private function normTerm($sTerm)
{
if ($this->oNormalizer === null) {
return $sTerm;
}
return $this->oNormalizer->transliterate($sTerm);
}
public function setReverseInPlan($bReverse)
{
$this->bReverseInPlan = $bReverse;
$this->oTokenizer = new \Nominatim\Tokenizer($this->oDB);
}
public function setLanguagePreference($aLangPref)
@@ -85,7 +71,9 @@ class Geocode
$aParams['exclude_place_ids'] = implode(',', $this->aExcludePlaceIDs);
}
if ($this->bBoundedSearch) $aParams['bounded'] = '1';
if ($this->bBoundedSearch) {
$aParams['bounded'] = '1';
}
if ($this->aCountryCodes) {
$aParams['countrycodes'] = implode(',', $this->aCountryCodes);
@@ -100,8 +88,11 @@ class Geocode
public function setLimit($iLimit = 10)
{
if ($iLimit > 50) $iLimit = 50;
if ($iLimit < 1) $iLimit = 1;
if ($iLimit > 50) {
$iLimit = 50;
} elseif ($iLimit < 1) {
$iLimit = 1;
}
$this->iFinalLimit = $iLimit;
$this->iLimit = $iLimit + min($iLimit, 10);
@@ -196,18 +187,24 @@ class Geocode
if ($sExcluded) {
foreach ($sExcluded as $iExcludedPlaceID) {
$iExcludedPlaceID = (int)$iExcludedPlaceID;
if ($iExcludedPlaceID)
if ($iExcludedPlaceID) {
$aExcludePlaceIDs[$iExcludedPlaceID] = $iExcludedPlaceID;
}
}
if (isset($aExcludePlaceIDs))
if (isset($aExcludePlaceIDs)) {
$this->aExcludePlaceIDs = $aExcludePlaceIDs;
}
}
// Only certain ranks of feature
$sFeatureType = $oParams->getString('featureType');
if (!$sFeatureType) $sFeatureType = $oParams->getString('featuretype');
if ($sFeatureType) $this->setFeatureType($sFeatureType);
if (!$sFeatureType) {
$sFeatureType = $oParams->getString('featuretype');
}
if ($sFeatureType) {
$this->setFeatureType($sFeatureType);
}
// Country code list
$sCountries = $oParams->getStringList('countrycodes');
@@ -217,8 +214,9 @@ class Geocode
$aCountries[] = strtolower($sCountryCode);
}
}
if (isset($aCountries))
if (isset($aCountries)) {
$this->aCountryCodes = $aCountries;
}
}
$aViewbox = $oParams->getStringList('viewboxlbrt');
@@ -262,7 +260,6 @@ class Geocode
$oParams->getString('country'),
$oParams->getString('postalcode')
);
$this->setReverseInPlan(false);
} else {
$this->setQuery($sQuery);
}
@@ -271,13 +268,17 @@ class Geocode
public function loadStructuredAddressElement($sValue, $sKey, $iNewMinAddressRank, $iNewMaxAddressRank, $aItemListValues)
{
$sValue = trim($sValue);
if (!$sValue) return false;
if (!$sValue) {
return false;
}
$this->aStructuredQuery[$sKey] = $sValue;
if ($this->iMinAddressRank == 0 && $this->iMaxAddressRank == 30) {
$this->iMinAddressRank = $iNewMinAddressRank;
$this->iMaxAddressRank = $iNewMaxAddressRank;
}
if ($aItemListValues) $this->aAddressRankList = array_merge($this->aAddressRankList, $aItemListValues);
if ($aItemListValues) {
$this->aAddressRankList = array_merge($this->aAddressRankList, $aItemListValues);
}
return true;
}
@@ -311,11 +312,11 @@ class Geocode
public function fallbackStructuredQuery()
{
if (!$this->aStructuredQuery) return false;
$aParams = $this->aStructuredQuery;
if (count($aParams) == 1) return false;
if (!$aParams || count($aParams) == 1) {
return false;
}
$aOrderToFallback = array('postalcode', 'street', 'city', 'county', 'state');
@@ -330,7 +331,7 @@ class Geocode
return false;
}
public function getGroupedSearches($aSearches, $aPhrases, $oValidTokens, $bIsStructured)
public function getGroupedSearches($aSearches, $aPhrases, $oValidTokens)
{
/*
Calculate all searches using oValidTokens i.e.
@@ -345,52 +346,26 @@ class Geocode
*/
foreach ($aPhrases as $iPhrase => $oPhrase) {
$aNewPhraseSearches = array();
$sPhraseType = $bIsStructured ? $oPhrase->getPhraseType() : '';
$oPosition = new SearchPosition(
$oPhrase->getPhraseType(),
$iPhrase,
count($aPhrases)
);
foreach ($oPhrase->getWordSets() as $aWordset) {
$aWordsetSearches = $aSearches;
// Add all words from this wordset
foreach ($aWordset as $iToken => $sToken) {
//echo "<br><b>$sToken</b>";
$aNewWordsetSearches = array();
$oPosition->setTokenPosition($iToken, count($aWordset));
foreach ($aWordsetSearches as $oCurrentSearch) {
//echo "<i>";
//var_dump($oCurrentSearch);
//echo "</i>";
// Tokens with full name matches.
foreach ($oValidTokens->get(' '.$sToken) as $oSearchTerm) {
$aNewSearches = $oCurrentSearch->extendWithFullTerm(
$oSearchTerm,
$oValidTokens->contains($sToken)
&& strpos($sToken, ' ') === false,
$sPhraseType,
$iToken == 0 && $iPhrase == 0,
$iPhrase == 0,
$iToken + 1 == count($aWordset)
&& $iPhrase + 1 == count($aPhrases)
);
foreach ($aNewSearches as $oSearch) {
if ($oSearch->getRank() < $this->iMaxRank) {
$aNewWordsetSearches[] = $oSearch;
}
}
}
// Look for partial matches.
// Note that there is no point in adding country terms here
// because country is omitted in the address.
if ($sPhraseType != 'country') {
// Allow searching for a word - but at extra cost
foreach ($oValidTokens->get($sToken) as $oSearchTerm) {
$aNewSearches = $oCurrentSearch->extendWithPartialTerm(
$sToken,
$oSearchTerm,
$bIsStructured,
$iPhrase,
$oValidTokens->get(' '.$sToken)
foreach ($oValidTokens->get($sToken) as $oSearchTerm) {
if ($oSearchTerm->isExtendable($oCurrentSearch, $oPosition)) {
$aNewSearches = $oSearchTerm->extendSearch(
$oCurrentSearch,
$oPosition
);
foreach ($aNewSearches as $oSearch) {
@@ -405,7 +380,6 @@ class Geocode
usort($aNewWordsetSearches, array('Nominatim\SearchDescription', 'bySearchRank'));
$aWordsetSearches = array_slice($aNewWordsetSearches, 0, 50);
}
//var_Dump('<hr>',count($aWordsetSearches)); exit;
$aNewPhraseSearches = array_merge($aNewPhraseSearches, $aNewWordsetSearches);
usort($aNewPhraseSearches, array('Nominatim\SearchDescription', 'bySearchRank'));
@@ -413,8 +387,11 @@ class Geocode
$aSearchHash = array();
foreach ($aNewPhraseSearches as $iSearch => $aSearch) {
$sHash = serialize($aSearch);
if (isset($aSearchHash[$sHash])) unset($aNewPhraseSearches[$iSearch]);
else $aSearchHash[$sHash] = 1;
if (isset($aSearchHash[$sHash])) {
unset($aNewPhraseSearches[$iSearch]);
} else {
$aSearchHash[$sHash] = 1;
}
}
$aNewPhraseSearches = array_slice($aNewPhraseSearches, 0, 50);
@@ -435,10 +412,12 @@ class Geocode
$iSearchCount = 0;
$aSearches = array();
foreach ($aGroupedSearches as $iScore => $aNewSearches) {
foreach ($aGroupedSearches as $aNewSearches) {
$iSearchCount += count($aNewSearches);
$aSearches = array_merge($aSearches, $aNewSearches);
if ($iSearchCount > 50) break;
if ($iSearchCount > 50) {
break;
}
}
}
@@ -495,7 +474,9 @@ class Geocode
public function lookup()
{
Debug::newFunction('Geocode::lookup');
if (!$this->sQuery && !$this->aStructuredQuery) return array();
if (!$this->sQuery && !$this->aStructuredQuery) {
return array();
}
Debug::printDebugArray('Geocode', $this);
@@ -517,16 +498,10 @@ class Geocode
if ($this->aCountryCodes) {
$oCtx->setCountryList($this->aCountryCodes);
}
$this->oTokenizer->setCountryRestriction($this->aCountryCodes);
Debug::newSection('Query Preprocessing');
$sNormQuery = $this->normTerm($this->sQuery);
Debug::printVar('Normalized query', $sNormQuery);
$sLanguagePrefArraySQL = $this->oDB->getArraySQL(
$this->oDB->getDBQuotedList($this->aLangPrefOrder)
);
$sQuery = $this->sQuery;
if (!preg_match('//u', $sQuery)) {
userError('Query string is not UTF-8 encoded.');
@@ -576,117 +551,62 @@ class Geocode
}
if ($sSpecialTerm && !$aSearches[0]->hasOperator()) {
$sSpecialTerm = pg_escape_string($sSpecialTerm);
$sToken = $this->oDB->getOne(
'SELECT make_standard_name(:term)',
array(':term' => $sSpecialTerm),
'Cannot decode query. Wrong encoding?'
);
$sSQL = 'SELECT class, type FROM word ';
$sSQL .= ' WHERE word_token in (\' '.$sToken.'\')';
$sSQL .= ' AND class is not null AND class not in (\'place\')';
$aTokens = $this->oTokenizer->tokensForSpecialTerm($sSpecialTerm);
Debug::printSQL($sSQL);
$aSearchWords = $this->oDB->getAll($sSQL);
$aNewSearches = array();
foreach ($aSearches as $oSearch) {
foreach ($aSearchWords as $aSearchTerm) {
$oNewSearch = clone $oSearch;
$oNewSearch->setPoiSearch(
Operator::TYPE,
$aSearchTerm['class'],
$aSearchTerm['type']
);
$aNewSearches[] = $oNewSearch;
if (!empty($aTokens)) {
$aNewSearches = array();
$oPosition = new SearchPosition('', 0, 1);
$oPosition->setTokenPosition(0, 1);
foreach ($aSearches as $oSearch) {
foreach ($aTokens as $oToken) {
$aNewSearches = array_merge(
$aNewSearches,
$oToken->extendSearch($oSearch, $oPosition)
);
}
}
$aSearches = $aNewSearches;
}
$aSearches = $aNewSearches;
}
// Split query into phrases
// Commas are used to reduce the search space by indicating where phrases split
$aPhrases = array();
if ($this->aStructuredQuery) {
$aInPhrases = $this->aStructuredQuery;
$bStructuredPhrases = true;
foreach ($this->aStructuredQuery as $iPhrase => $sPhrase) {
$aPhrases[] = new Phrase($sPhrase, $iPhrase);
}
} else {
$aInPhrases = explode(',', $sQuery);
$bStructuredPhrases = false;
foreach (explode(',', $sQuery) as $sPhrase) {
$aPhrases[] = new Phrase($sPhrase, '');
}
}
Debug::printDebugArray('Search context', $oCtx);
Debug::printDebugArray('Base search', empty($aSearches) ? null : $aSearches[0]);
Debug::printVar('Final query phrases', $aInPhrases);
// Convert each phrase to standard form
// Create a list of standard words
// Get all 'sets' of words
// Generate a complete list of all
Debug::newSection('Tokenization');
$aTokens = array();
$aPhrases = array();
foreach ($aInPhrases as $iPhrase => $sPhrase) {
$sPhrase = $this->oDB->getOne(
'SELECT make_standard_name(:phrase)',
array(':phrase' => $sPhrase),
'Cannot normalize query string (is it a UTF-8 string?)'
);
if (trim($sPhrase)) {
$oPhrase = new Phrase($sPhrase, is_string($iPhrase) ? $iPhrase : '');
$oPhrase->addTokens($aTokens);
$aPhrases[] = $oPhrase;
}
}
Debug::printVar('Tokens', $aTokens);
$oValidTokens = new TokenList();
if (!empty($aTokens)) {
$oValidTokens->addTokensFromDB(
$this->oDB,
$aTokens,
$this->aCountryCodes,
$sNormQuery,
$this->oNormalizer
);
$oValidTokens = $this->oTokenizer->extractTokensFromPhrases($aPhrases);
if ($oValidTokens->count() > 0) {
$oCtx->setFullNameWords($oValidTokens->getFullWordIDs());
// Try more interpretations for Tokens that could not be matched.
foreach ($aTokens as $sToken) {
if ($sToken[0] == ' ' && !$oValidTokens->contains($sToken)) {
if (preg_match('/^ ([0-9]{5}) [0-9]{4}$/', $sToken, $aData)) {
// US ZIP+4 codes - merge in the 5-digit ZIP code
$oValidTokens->addToken(
$sToken,
new Token\Postcode(null, $aData[1], 'us')
);
} elseif (preg_match('/^ [0-9]+$/', $sToken)) {
// Unknown single word token with a number.
// Assume it is a house number.
$oValidTokens->addToken(
$sToken,
new Token\HouseNumber(null, trim($sToken))
);
}
}
}
$aPhrases = array_filter($aPhrases, function ($oPhrase) {
return $oPhrase->getWordSets() !== null;
});
// Any words that have failed completely?
// TODO: suggestions
Debug::printGroupTable('Valid Tokens', $oValidTokens->debugInfo());
foreach ($aPhrases as $oPhrase) {
$oPhrase->computeWordSets($oValidTokens);
}
Debug::printDebugTable('Phrases', $aPhrases);
Debug::newSection('Search candidates');
$aGroupedSearches = $this->getGroupedSearches($aSearches, $aPhrases, $oValidTokens, $bStructuredPhrases);
$aGroupedSearches = $this->getGroupedSearches($aSearches, $aPhrases, $oValidTokens);
if ($this->bReverseInPlan) {
if (!$this->aStructuredQuery) {
// Reverse phrase array and also reverse the order of the wordsets in
// the first and final phrase. Don't bother about phrases in the middle
// because order in the address doesn't matter.
@@ -695,7 +615,7 @@ class Geocode
if (count($aPhrases) > 1) {
$aPhrases[count($aPhrases)-1]->invertWordSets();
}
$aReverseGroupedSearches = $this->getGroupedSearches($aSearches, $aPhrases, $oValidTokens, false);
$aReverseGroupedSearches = $this->getGroupedSearches($aSearches, $aPhrases, $oValidTokens);
foreach ($aGroupedSearches as $aSearches) {
foreach ($aSearches as $aSearch) {
@@ -714,7 +634,9 @@ class Geocode
$aGroupedSearches = array();
foreach ($aSearches as $aSearch) {
if ($aSearch->getRank() < $this->iMaxRank) {
if (!isset($aGroupedSearches[$aSearch->getRank()])) $aGroupedSearches[$aSearch->getRank()] = array();
if (!isset($aGroupedSearches[$aSearch->getRank()])) {
$aGroupedSearches[$aSearch->getRank()] = array();
}
$aGroupedSearches[$aSearch->getRank()][] = $aSearch;
}
}
@@ -728,7 +650,9 @@ class Geocode
$sHash = serialize($aSearch);
if (isset($aSearchHash[$sHash])) {
unset($aGroupedSearches[$iGroup][$iSearch]);
if (empty($aGroupedSearches[$iGroup])) unset($aGroupedSearches[$iGroup]);
if (empty($aGroupedSearches[$iGroup])) {
unset($aGroupedSearches[$iGroup]);
}
} else {
$aSearchHash[$sHash] = 1;
}
@@ -772,20 +696,27 @@ class Geocode
}
}
if ($iQueryLoop > 20) break;
if ($iQueryLoop > 20) {
break;
}
}
if (!empty($aResults)) {
$aSplitResults = Result::splitResults($aResults);
Debug::printVar('Split results', $aSplitResults);
if ($iGroupLoop <= 4 && empty($aSplitResults['tail'])
&& reset($aSplitResults['head'])->iResultRank > 0) {
if ($iGroupLoop <= 4
&& reset($aSplitResults['head'])->iResultRank > 0
&& $iGroupedRank !== array_key_last($aGroupedSearches)) {
// Haven't found an exact match for the query yet.
// Therefore add result from the next group level.
$aNextResults = $aSplitResults['head'];
foreach ($aNextResults as $oRes) {
$oRes->iResultRank--;
}
foreach ($aSplitResults['tail'] as $oRes) {
$oRes->iResultRank--;
$aNextResults[$oRes->iId] = $oRes;
}
$aResults = array();
} else {
$aResults = $aSplitResults['head'];
@@ -833,7 +764,6 @@ class Geocode
foreach ($aResults as $oResult) {
if (($this->iMaxAddressRank == 30 &&
($oResult->iTable == Result::TABLE_OSMLINE
|| $oResult->iTable == Result::TABLE_AUX
|| $oResult->iTable == Result::TABLE_TIGER))
|| in_array($oResult->iId, $aFilteredIDs)
) {
@@ -843,9 +773,9 @@ class Geocode
$aResults = $tempIDs;
}
if (!empty($aResults)) break;
if ($iGroupLoop > 4) break;
if ($iQueryLoop > 30) break;
if (!empty($aResults) || $iGroupLoop > 4 || $iQueryLoop > 30) {
break;
}
}
} else {
// Just interpret as a reverse geocode
@@ -863,10 +793,8 @@ class Geocode
// No results? Done
if (empty($aResults)) {
if ($this->bFallback) {
if ($this->fallbackStructuredQuery()) {
return $this->lookup();
}
if ($this->bFallback && $this->fallbackStructuredQuery()) {
return $this->lookup();
}
return array();
@@ -885,7 +813,9 @@ class Geocode
$aRecheckWords = preg_split('/\b[\s,\\-]*/u', $sQuery);
foreach ($aRecheckWords as $i => $sWord) {
if (!preg_match('/[\pL\pN]/', $sWord)) unset($aRecheckWords[$i]);
if (!preg_match('/[\pL\pN]/', $sWord)) {
unset($aRecheckWords[$i]);
}
}
Debug::printVar('Recheck words', $aRecheckWords);
@@ -945,7 +875,9 @@ class Geocode
foreach ($aRecheckWords as $i => $sWord) {
if (stripos($sAddress, $sWord)!==false) {
$iCountWords++;
if (preg_match('/(^|,)\s*'.preg_quote($sWord, '/').'\s*(,|$)/', $sAddress)) $iCountWords += 0.1;
if (preg_match('/(^|,)\s*'.preg_quote($sWord, '/').'\s*(,|$)/', $sAddress)) {
$iCountWords += 0.1;
}
}
}
@@ -962,15 +894,8 @@ class Geocode
$aToFilter = $aSearchResults;
$aSearchResults = array();
$bFirst = true;
foreach ($aToFilter as $aResult) {
$this->aExcludePlaceIDs[$aResult['place_id']] = $aResult['place_id'];
if ($bFirst) {
$fLat = $aResult['lat'];
$fLon = $aResult['lon'];
if (isset($aResult['zoom'])) $iZoom = $aResult['zoom'];
$bFirst = false;
}
if (!$this->oPlaceLookup->doDeDupe() || (!isset($aOSMIDDone[$aResult['osm_type'].$aResult['osm_id']])
&& !isset($aClassTypeNameDone[$aResult['osm_type'].$aResult['class'].$aResult['type'].$aResult['name'].$aResult['admin_level']]))
) {
@@ -980,7 +905,9 @@ class Geocode
}
// Absolute limit on number of results
if (count($aSearchResults) >= $this->iFinalLimit) break;
if (count($aSearchResults) >= $this->iFinalLimit) {
break;
}
}
Debug::printVar('Post-filter results', $aSearchResults);
@@ -994,7 +921,6 @@ class Geocode
'Structured query' => $this->aStructuredQuery,
'Name keys' => Debug::fmtArrayVals($this->aLangPrefOrder),
'Excluded place IDs' => Debug::fmtArrayVals($this->aExcludePlaceIDs),
'Try reversed query'=> $this->bReverseInPlan,
'Limit (for searches)' => $this->iLimit,
'Limit (for results)'=> $this->iFinalLimit,
'Country codes' => Debug::fmtArrayVals($this->aCountryCodes),

View File

@@ -90,14 +90,16 @@ class ParameterParser
$aLanguages = array();
$sLangString = $this->getString('accept-language', $sFallback);
if ($sLangString) {
if (preg_match_all('/(([a-z]{1,8})([-_][a-z]{1,8})?)\s*(;\s*q\s*=\s*(1|0\.[0-9]+))?/i', $sLangString, $aLanguagesParse, PREG_SET_ORDER)) {
foreach ($aLanguagesParse as $iLang => $aLanguage) {
$aLanguages[$aLanguage[1]] = isset($aLanguage[5])?(float)$aLanguage[5]:1 - ($iLang/100);
if (!isset($aLanguages[$aLanguage[2]])) $aLanguages[$aLanguage[2]] = $aLanguages[$aLanguage[1]]/10;
if ($sLangString
&& preg_match_all('/(([a-z]{1,8})([-_][a-z]{1,8})?)\s*(;\s*q\s*=\s*(1|0\.[0-9]+))?/i', $sLangString, $aLanguagesParse, PREG_SET_ORDER)
) {
foreach ($aLanguagesParse as $iLang => $aLanguage) {
$aLanguages[$aLanguage[1]] = isset($aLanguage[5])?(float)$aLanguage[5]:1 - ($iLang/100);
if (!isset($aLanguages[$aLanguage[2]])) {
$aLanguages[$aLanguage[2]] = $aLanguages[$aLanguage[1]]/10;
}
arsort($aLanguages);
}
arsort($aLanguages);
}
if (empty($aLanguages) && CONST_Default_Language) {
$aLanguages[CONST_Default_Language] = 1;

View File

@@ -16,8 +16,6 @@ class Phrase
private $sPhrase;
// Element type for structured searches.
private $sPhraseType;
// Space-separated words of the phrase.
private $aWords;
// Possible segmentations of the phrase.
private $aWordSets;
@@ -38,7 +36,14 @@ class Phrase
{
$this->sPhrase = trim($sPhrase);
$this->sPhraseType = $sPhraseType;
$this->aWords = explode(' ', $this->sPhrase);
}
/**
* Get the orginal phrase of the string.
*/
public function getPhrase()
{
return $this->sPhrase;
}
/**
@@ -63,30 +68,6 @@ class Phrase
return $this->aWordSets;
}
/**
* Add the tokens from this phrase to the given list of tokens.
*
* @param string[] $aTokens List of tokens to append.
*
* @return void
*/
public function addTokens(&$aTokens)
{
$iNumWords = count($this->aWords);
for ($i = 0; $i < $iNumWords; $i++) {
$sPhrase = $this->aWords[$i];
$aTokens[' '.$sPhrase] = ' '.$sPhrase;
$aTokens[$sPhrase] = $sPhrase;
for ($j = $i + 1; $j < $iNumWords; $j++) {
$sPhrase .= ' '.$this->aWords[$j];
$aTokens[' '.$sPhrase] = ' '.$sPhrase;
$aTokens[$sPhrase] = $sPhrase;
}
}
}
/**
* Invert the set of possible segmentations.
*
@@ -99,21 +80,27 @@ class Phrase
}
}
public function computeWordSets($oTokens)
public function computeWordSets($aWords, $oTokens)
{
$iNumWords = count($this->aWords);
$iNumWords = count($aWords);
if ($iNumWords == 0) {
$this->aWordSets = null;
return;
}
// Caches the word set for the partial phrase up to word i.
$aSetCache = array_fill(0, $iNumWords, array());
// Initialise first element of cache. There can only be the word.
if ($oTokens->containsAny($this->aWords[0])) {
$aSetCache[0][] = array($this->aWords[0]);
if ($oTokens->containsAny($aWords[0])) {
$aSetCache[0][] = array($aWords[0]);
}
// Now do the next elements using what we already have.
for ($i = 1; $i < $iNumWords; $i++) {
for ($j = $i; $j > 0; $j--) {
$sPartial = $j == $i ? $this->aWords[$j] : $this->aWords[$j].' '.$sPartial;
$sPartial = $j == $i ? $aWords[$j] : $aWords[$j].' '.$sPartial;
if (!empty($aSetCache[$j - 1]) && $oTokens->containsAny($sPartial)) {
$aPartial = array($sPartial);
foreach ($aSetCache[$j - 1] as $aSet) {
@@ -136,7 +123,7 @@ class Phrase
}
// finally the current full phrase
$sPartial = $this->aWords[0].' '.$sPartial;
$sPartial = $aWords[0].' '.$sPartial;
if ($oTokens->containsAny($sPartial)) {
$aSetCache[$i][] = array($sPartial);
}
@@ -153,7 +140,6 @@ class Phrase
return array(
'Type' => $this->sPhraseType,
'Phrase' => $this->sPhrase,
'Words' => $this->aWords,
'WordSets' => $this->aWordSets
);
}

View File

@@ -2,8 +2,8 @@
namespace Nominatim;
require_once(CONST_BasePath.'/lib/AddressDetails.php');
require_once(CONST_BasePath.'/lib/Result.php');
require_once(CONST_LibDir.'/AddressDetails.php');
require_once(CONST_LibDir.'/Result.php');
class PlaceLookup
{
@@ -89,20 +89,36 @@ class PlaceLookup
{
$aParams = array();
if ($this->bAddressDetails) $aParams['addressdetails'] = '1';
if ($this->bExtraTags) $aParams['extratags'] = '1';
if ($this->bNameDetails) $aParams['namedetails'] = '1';
if ($this->bAddressDetails) {
$aParams['addressdetails'] = '1';
}
if ($this->bExtraTags) {
$aParams['extratags'] = '1';
}
if ($this->bNameDetails) {
$aParams['namedetails'] = '1';
}
if ($this->bIncludePolygonAsText) $aParams['polygon_text'] = '1';
if ($this->bIncludePolygonAsGeoJSON) $aParams['polygon_geojson'] = '1';
if ($this->bIncludePolygonAsKML) $aParams['polygon_kml'] = '1';
if ($this->bIncludePolygonAsSVG) $aParams['polygon_svg'] = '1';
if ($this->bIncludePolygonAsText) {
$aParams['polygon_text'] = '1';
}
if ($this->bIncludePolygonAsGeoJSON) {
$aParams['polygon_geojson'] = '1';
}
if ($this->bIncludePolygonAsKML) {
$aParams['polygon_kml'] = '1';
}
if ($this->bIncludePolygonAsSVG) {
$aParams['polygon_svg'] = '1';
}
if ($this->fPolygonSimplificationThreshold > 0.0) {
$aParams['polygon_threshold'] = $this->fPolygonSimplificationThreshold;
}
if (!$this->bDeDupe) $aParams['dedupe'] = '0';
if (!$this->bDeDupe) {
$aParams['dedupe'] = '0';
}
return $aParams;
}
@@ -147,8 +163,9 @@ class PlaceLookup
private function langAddressSql($sHousenumber)
{
if ($this->bAddressDetails)
if ($this->bAddressDetails) {
return ''; // langaddress will be computed from address details
}
return 'get_address_by_language(place_id,'.$sHousenumber.','.$this->aLangPrefOrderSql.') AS langaddress,';
}
@@ -234,12 +251,20 @@ class PlaceLookup
$sSQL .= ' housenumber,';
$sSQL .= ' country_code, ';
$sSQL .= ' importance, ';
if (!$this->bDeDupe) $sSQL .= 'place_id,';
if (!$this->bAddressDetails) $sSQL .= 'langaddress, ';
if (!$this->bDeDupe) {
$sSQL .= 'place_id,';
}
if (!$this->bAddressDetails) {
$sSQL .= 'langaddress, ';
}
$sSQL .= ' placename, ';
$sSQL .= ' ref, ';
if ($this->bExtraTags) $sSQL .= 'extratags, ';
if ($this->bNameDetails) $sSQL .= 'name, ';
if ($this->bExtraTags) {
$sSQL .= 'extratags, ';
}
if ($this->bNameDetails) {
$sSQL .= 'name, ';
}
$sSQL .= ' extra_place ';
$aSubSelects[] = $sSQL;
@@ -260,8 +285,12 @@ class PlaceLookup
$sSQL .= $this->langAddressSql('-1');
$sSQL .= ' postcode as placename,';
$sSQL .= ' postcode as ref,';
if ($this->bExtraTags) $sSQL .= 'null::text AS extra,';
if ($this->bNameDetails) $sSQL .= 'null::text AS names,';
if ($this->bExtraTags) {
$sSQL .= 'null::text AS extra,';
}
if ($this->bNameDetails) {
$sSQL .= 'null::text AS names,';
}
$sSQL .= ' ST_x(geometry) AS lon, ST_y(geometry) AS lat,';
$sSQL .= ' (0.75-(rank_search::float/40)) AS importance, ';
$sSQL .= $this->addressImportanceSql('geometry', 'lp.parent_place_id');
@@ -298,8 +327,12 @@ class PlaceLookup
$sSQL .= $this->langAddressSql('housenumber_for_place');
$sSQL .= ' null::text AS placename, ';
$sSQL .= ' null::text AS ref, ';
if ($this->bExtraTags) $sSQL .= 'null::text AS extra,';
if ($this->bNameDetails) $sSQL .= 'null::text AS names,';
if ($this->bExtraTags) {
$sSQL .= 'null::text AS extra,';
}
if ($this->bNameDetails) {
$sSQL .= 'null::text AS names,';
}
$sSQL .= ' st_x(centroid) AS lon, ';
$sSQL .= ' st_y(centroid) AS lat,';
$sSQL .= ' -1.15 AS importance, ';
@@ -344,8 +377,12 @@ class PlaceLookup
$sSQL .= $this->langAddressSql('housenumber_for_place');
$sSQL .= ' null::text AS placename, ';
$sSQL .= ' null::text AS ref, ';
if ($this->bExtraTags) $sSQL .= 'null::text AS extra, ';
if ($this->bNameDetails) $sSQL .= 'null::text AS names, ';
if ($this->bExtraTags) {
$sSQL .= 'null::text AS extra, ';
}
if ($this->bNameDetails) {
$sSQL .= 'null::text AS names, ';
}
$sSQL .= ' st_x(centroid) AS lon, ';
$sSQL .= ' st_y(centroid) AS lat, ';
// slightly smaller than the importance for normal houses
@@ -373,42 +410,6 @@ class PlaceLookup
$aSubSelects[] = $sSQL;
}
if (CONST_Use_Aux_Location_data) {
$sPlaceIDs = Result::joinIdsByTable($aResults, Result::TABLE_AUX);
if ($sPlaceIDs) {
$sHousenumbers = Result::sqlHouseNumberTable($aResults, Result::TABLE_AUX);
$sSQL = ' SELECT ';
$sSQL .= " 'L' AS osm_type, ";
$sSQL .= ' place_id AS osm_id, ';
$sSQL .= " 'place' AS class,";
$sSQL .= " 'house' AS type, ";
$sSQL .= ' null::smallint AS admin_level, ';
$sSQL .= ' 30 AS rank_search,';
$sSQL .= ' 30 AS rank_address, ';
$sSQL .= ' place_id,';
$sSQL .= ' parent_place_id, ';
$sSQL .= ' housenumber,';
$sSQL .= " 'us' AS country_code, ";
$sSQL .= $this->langAddressSql('-1');
$sSQL .= ' null::text AS placename, ';
$sSQL .= ' null::text AS ref, ';
if ($this->bExtraTags) $sSQL .= 'null::text AS extra, ';
if ($this->bNameDetails) $sSQL .= 'null::text AS names, ';
$sSQL .= ' ST_X(centroid) AS lon, ';
$sSQL .= ' ST_Y(centroid) AS lat, ';
$sSQL .= ' -1.10 AS importance, ';
$sSQL .= $this->addressImportanceSql(
'centroid',
'location_property_aux.parent_place_id'
);
$sSQL .= ' null::text AS extra_place ';
$sSQL .= ' FROM location_property_aux ';
$sSQL .= " WHERE place_id in ($sPlaceIDs) ";
$aSubSelects[] = $sSQL;
}
}
}
if (empty($aSubSelects)) {
@@ -484,67 +485,83 @@ class PlaceLookup
{
$aOutlineResult = array();
if (!$iPlaceID) return $aOutlineResult;
if (!$iPlaceID) {
return $aOutlineResult;
}
if (CONST_Search_AreaPolygons) {
// Get the bounding box and outline polygon
$sSQL = 'select place_id,0 as numfeatures,st_area(geometry) as area,';
if ($fLonReverse != null && $fLatReverse != null) {
$sSQL .= ' ST_Y(closest_point) as centrelat,';
$sSQL .= ' ST_X(closest_point) as centrelon,';
} else {
$sSQL .= ' ST_Y(centroid) as centrelat, ST_X(centroid) as centrelon,';
}
$sSQL .= ' ST_YMin(geometry) as minlat,ST_YMax(geometry) as maxlat,';
$sSQL .= ' ST_XMin(geometry) as minlon,ST_XMax(geometry) as maxlon';
if ($this->bIncludePolygonAsGeoJSON) $sSQL .= ',ST_AsGeoJSON(geometry) as asgeojson';
if ($this->bIncludePolygonAsKML) $sSQL .= ',ST_AsKML(geometry) as askml';
if ($this->bIncludePolygonAsSVG) $sSQL .= ',ST_AsSVG(geometry) as assvg';
if ($this->bIncludePolygonAsText) $sSQL .= ',ST_AsText(geometry) as astext';
if ($fLonReverse != null && $fLatReverse != null) {
$sFrom = ' from (SELECT * , CASE WHEN (class = \'highway\') AND (ST_GeometryType(geometry) = \'ST_LineString\') THEN ';
$sFrom .=' ST_ClosestPoint(geometry, ST_SetSRID(ST_Point('.$fLatReverse.','.$fLonReverse.'),4326))';
$sFrom .=' ELSE centroid END AS closest_point';
$sFrom .= ' from placex where place_id = '.$iPlaceID.') as plx';
} else {
$sFrom = ' from placex where place_id = '.$iPlaceID;
}
if ($this->fPolygonSimplificationThreshold > 0) {
$sSQL .= ' from (select place_id,centroid,ST_SimplifyPreserveTopology(geometry,'.$this->fPolygonSimplificationThreshold.') as geometry'.$sFrom.') as plx';
} else {
$sSQL .= $sFrom;
// Get the bounding box and outline polygon
$sSQL = 'select place_id,0 as numfeatures,st_area(geometry) as area,';
if ($fLonReverse != null && $fLatReverse != null) {
$sSQL .= ' ST_Y(closest_point) as centrelat,';
$sSQL .= ' ST_X(closest_point) as centrelon,';
} else {
$sSQL .= ' ST_Y(centroid) as centrelat, ST_X(centroid) as centrelon,';
}
$sSQL .= ' ST_YMin(geometry) as minlat,ST_YMax(geometry) as maxlat,';
$sSQL .= ' ST_XMin(geometry) as minlon,ST_XMax(geometry) as maxlon';
if ($this->bIncludePolygonAsGeoJSON) {
$sSQL .= ',ST_AsGeoJSON(geometry) as asgeojson';
}
if ($this->bIncludePolygonAsKML) {
$sSQL .= ',ST_AsKML(geometry) as askml';
}
if ($this->bIncludePolygonAsSVG) {
$sSQL .= ',ST_AsSVG(geometry) as assvg';
}
if ($this->bIncludePolygonAsText) {
$sSQL .= ',ST_AsText(geometry) as astext';
}
if ($fLonReverse != null && $fLatReverse != null) {
$sFrom = ' from (SELECT * , CASE WHEN (class = \'highway\') AND (ST_GeometryType(geometry) = \'ST_LineString\') THEN ';
$sFrom .=' ST_ClosestPoint(geometry, ST_SetSRID(ST_Point('.$fLatReverse.','.$fLonReverse.'),4326))';
$sFrom .=' ELSE centroid END AS closest_point';
$sFrom .= ' from placex where place_id = '.$iPlaceID.') as plx';
} else {
$sFrom = ' from placex where place_id = '.$iPlaceID;
}
if ($this->fPolygonSimplificationThreshold > 0) {
$sSQL .= ' from (select place_id,centroid,ST_SimplifyPreserveTopology(geometry,'.$this->fPolygonSimplificationThreshold.') as geometry'.$sFrom.') as plx';
} else {
$sSQL .= $sFrom;
}
$aPointPolygon = $this->oDB->getRow($sSQL, null, 'Could not get outline');
if ($aPointPolygon && $aPointPolygon['place_id']) {
if ($aPointPolygon['centrelon'] !== null && $aPointPolygon['centrelat'] !== null) {
$aOutlineResult['lat'] = $aPointPolygon['centrelat'];
$aOutlineResult['lon'] = $aPointPolygon['centrelon'];
}
$aPointPolygon = $this->oDB->getRow($sSQL, null, 'Could not get outline');
if ($aPointPolygon && $aPointPolygon['place_id']) {
if ($aPointPolygon['centrelon'] !== null && $aPointPolygon['centrelat'] !== null) {
$aOutlineResult['lat'] = $aPointPolygon['centrelat'];
$aOutlineResult['lon'] = $aPointPolygon['centrelon'];
}
if ($this->bIncludePolygonAsGeoJSON) $aOutlineResult['asgeojson'] = $aPointPolygon['asgeojson'];
if ($this->bIncludePolygonAsKML) $aOutlineResult['askml'] = $aPointPolygon['askml'];
if ($this->bIncludePolygonAsSVG) $aOutlineResult['assvg'] = $aPointPolygon['assvg'];
if ($this->bIncludePolygonAsText) $aOutlineResult['astext'] = $aPointPolygon['astext'];
if (abs($aPointPolygon['minlat'] - $aPointPolygon['maxlat']) < 0.0000001) {
$aPointPolygon['minlat'] = $aPointPolygon['minlat'] - $fRadius;
$aPointPolygon['maxlat'] = $aPointPolygon['maxlat'] + $fRadius;
}
if (abs($aPointPolygon['minlon'] - $aPointPolygon['maxlon']) < 0.0000001) {
$aPointPolygon['minlon'] = $aPointPolygon['minlon'] - $fRadius;
$aPointPolygon['maxlon'] = $aPointPolygon['maxlon'] + $fRadius;
}
$aOutlineResult['aBoundingBox'] = array(
(string)$aPointPolygon['minlat'],
(string)$aPointPolygon['maxlat'],
(string)$aPointPolygon['minlon'],
(string)$aPointPolygon['maxlon']
);
if ($this->bIncludePolygonAsGeoJSON) {
$aOutlineResult['asgeojson'] = $aPointPolygon['asgeojson'];
}
if ($this->bIncludePolygonAsKML) {
$aOutlineResult['askml'] = $aPointPolygon['askml'];
}
if ($this->bIncludePolygonAsSVG) {
$aOutlineResult['assvg'] = $aPointPolygon['assvg'];
}
if ($this->bIncludePolygonAsText) {
$aOutlineResult['astext'] = $aPointPolygon['astext'];
}
if (abs($aPointPolygon['minlat'] - $aPointPolygon['maxlat']) < 0.0000001) {
$aPointPolygon['minlat'] = $aPointPolygon['minlat'] - $fRadius;
$aPointPolygon['maxlat'] = $aPointPolygon['maxlat'] + $fRadius;
}
if (abs($aPointPolygon['minlon'] - $aPointPolygon['maxlon']) < 0.0000001) {
$aPointPolygon['minlon'] = $aPointPolygon['minlon'] - $fRadius;
$aPointPolygon['maxlon'] = $aPointPolygon['maxlon'] + $fRadius;
}
$aOutlineResult['aBoundingBox'] = array(
(string)$aPointPolygon['minlat'],
(string)$aPointPolygon['maxlat'],
(string)$aPointPolygon['minlon'],
(string)$aPointPolygon['maxlon']
);
}
// as a fallback we generate a bounding box without knowing the size of the geometry

View File

@@ -13,8 +13,7 @@ class Result
const TABLE_PLACEX = 0;
const TABLE_POSTCODE = 1;
const TABLE_OSMLINE = 2;
const TABLE_AUX = 3;
const TABLE_TIGER = 4;
const TABLE_TIGER = 3;
/// Database table that contains the result.
public $iTable;
@@ -26,6 +25,8 @@ class Result
public $iExactMatches = 0;
/// Subranking within the results (the higher the worse).
public $iResultRank = 0;
/// Address rank of the result.
public $iAddressRank;
public function debugInfo()
{
@@ -54,6 +55,27 @@ class Result
}
)));
}
public static function joinIdsByTableMinRank($aResults, $iTable, $iMinAddressRank)
{
return join(',', array_keys(array_filter(
$aResults,
function ($aValue) use ($iTable, $iMinAddressRank) {
return $aValue->iTable == $iTable && $aValue->iAddressRank >= $iMinAddressRank;
}
)));
}
public static function joinIdsByTableMaxRank($aResults, $iTable, $iMaxAddressRank)
{
return join(',', array_keys(array_filter(
$aResults,
function ($aValue) use ($iTable, $iMaxAddressRank) {
return $aValue->iTable == $iTable && $aValue->iAddressRank <= $iMaxAddressRank;
}
)));
}
public static function sqlHouseNumberTable($aResults, $iTable)
{
$sHousenumbers = '';
@@ -84,7 +106,7 @@ class Result
foreach ($aResults as $oRes) {
if ($oRes->iResultRank < $iMinRank) {
$aTail = array_merge($aTail, $aHead);
$aTail += $aHead;
$aHead = array($oRes->iId => $oRes);
$iMinRank = $oRes->iResultRank;
} elseif ($oRes->iResultRank == $iMinRank) {

View File

@@ -2,7 +2,7 @@
namespace Nominatim;
require_once(CONST_BasePath.'/lib/Result.php');
require_once(CONST_LibDir.'/Result.php');
class ReverseGeocode
{
@@ -74,8 +74,6 @@ class ReverseGeocode
protected function lookupLargeArea($sPointSQL, $iMaxRank)
{
$oResult = null;
if ($iMaxRank > 4) {
$aPlace = $this->lookupPolygon($sPointSQL, $iMaxRank);
if ($aPlace) {
@@ -167,9 +165,13 @@ class ReverseGeocode
{
Debug::newFunction('lookupPolygon');
// polygon search begins at suburb-level
if ($iMaxRank > 25) $iMaxRank = 25;
if ($iMaxRank > 25) {
$iMaxRank = 25;
}
// no polygon search over country-level
if ($iMaxRank < 5) $iMaxRank = 5;
if ($iMaxRank < 5) {
$iMaxRank = 5;
}
// search for polygon
$sSQL = 'SELECT place_id, parent_place_id, rank_address, rank_search FROM';
$sSQL .= '(select place_id, parent_place_id, rank_address, rank_search, country_code, geometry';
@@ -190,7 +192,6 @@ class ReverseGeocode
if ($aPoly) {
// if a polygon is found, search for placenodes begins ...
$iParentPlaceID = $aPoly['parent_place_id'];
$iRankAddress = $aPoly['rank_address'];
$iRankSearch = $aPoly['rank_search'];
$iPlaceID = $aPoly['place_id'];
@@ -242,26 +243,24 @@ class ReverseGeocode
public function lookupPoint($sPointSQL, $bDoInterpolation = true)
{
Debug::newFunction('lookupPoint');
// starts if the search is on POI or street level,
// searches for the nearest POI or street,
// if a street is found and a POI is searched for,
// the nearest POI which the found street is a parent of is choosen.
$iMaxRank = $this->iMaxRank;
// Find the nearest point
$fSearchDiam = 0.006;
$oResult = null;
$aPlace = null;
// for POI or street level
if ($iMaxRank >= 26) {
if ($this->iMaxRank >= 26) {
// starts if the search is on POI or street level,
// searches for the nearest POI or street,
// if a street is found and a POI is searched for,
// the nearest POI which the found street is a parent of is choosen.
$sSQL = 'select place_id,parent_place_id,rank_address,country_code,';
$sSQL .= ' ST_distance('.$sPointSQL.', geometry) as distance';
$sSQL .= ' FROM ';
$sSQL .= ' placex';
$sSQL .= ' WHERE ST_DWithin('.$sPointSQL.', geometry, '.$fSearchDiam.')';
$sSQL .= ' AND';
$sSQL .= ' rank_address between 26 and '.$iMaxRank;
$sSQL .= ' rank_address between 26 and '.$this->iMaxRank;
$sSQL .= ' and (name is not null or housenumber is not null';
$sSQL .= ' or rank_address between 26 and 27)';
$sSQL .= ' and (rank_address between 26 and 27';
@@ -280,34 +279,11 @@ class ReverseGeocode
$iPlaceID = $aPlace['place_id'];
$oResult = new Result($iPlaceID);
$iRankAddress = $aPlace['rank_address'];
$iParentPlaceID = $aPlace['parent_place_id'];
}
if ($bDoInterpolation && $iMaxRank >= 30) {
$fDistance = $fSearchDiam;
if ($aPlace) {
// We can't reliably go from the closest street to an
// interpolation line because the closest interpolation
// may have a different street segments as a parent.
// Therefore allow an interpolation line to take precendence
// even when the street is closer.
$fDistance = $iRankAddress < 28 ? 0.001 : $aPlace['distance'];
}
$aHouse = $this->lookupInterpolation($sPointSQL, $fDistance);
Debug::printVar('Interpolation result', $aPlace);
if ($aHouse) {
$oResult = new Result($aHouse['place_id'], Result::TABLE_OSMLINE);
$oResult->iHouseNumber = closestHouseNumber($aHouse);
$aPlace = $aHouse;
$iRankAddress = 30;
}
}
if ($aPlace) {
// if street and maxrank > streetlevel
if ($iRankAddress <= 27 && $iMaxRank > 27) {
if ($iRankAddress <= 27 && $this->iMaxRank > 27) {
// find the closest object (up to a certain radius) of which the street is a parent of
$sSQL = ' select place_id,';
$sSQL .= ' ST_distance('.$sPointSQL.', geometry) as distance';
@@ -328,7 +304,9 @@ class ReverseGeocode
Debug::printVar('Closest POI result', $aStreet);
if ($aStreet) {
$aPlace = $aStreet;
$oResult = new Result($aStreet['place_id']);
$iRankAddress = 30;
}
}
@@ -351,17 +329,42 @@ class ReverseGeocode
Debug::printVar('Tiger house number result', $aPlaceTiger);
if ($aPlaceTiger) {
$aPlace = $aPlaceTiger;
$oResult = new Result($aPlaceTiger['place_id'], Result::TABLE_TIGER);
$oResult->iHouseNumber = closestHouseNumber($aPlaceTiger);
$iRankAddress = 30;
}
}
} else {
}
if ($bDoInterpolation && $this->iMaxRank >= 30) {
$fDistance = $fSearchDiam;
if ($aPlace) {
// We can't reliably go from the closest street to an
// interpolation line because the closest interpolation
// may have a different street segments as a parent.
// Therefore allow an interpolation line to take precendence
// even when the street is closer.
$fDistance = $iRankAddress < 28 ? 0.001 : $aPlace['distance'];
}
$aHouse = $this->lookupInterpolation($sPointSQL, $fDistance);
Debug::printVar('Interpolation result', $aPlace);
if ($aHouse) {
$oResult = new Result($aHouse['place_id'], Result::TABLE_OSMLINE);
$oResult->iHouseNumber = closestHouseNumber($aHouse);
$aPlace = $aHouse;
}
}
if (!$aPlace) {
// if no POI or street is found ...
$oResult = $this->lookupLargeArea($sPointSQL, 25);
}
} else {
// lower than street level ($iMaxRank < 26 )
$oResult = $this->lookupLargeArea($sPointSQL, $iMaxRank);
$oResult = $this->lookupLargeArea($sPointSQL, $this->iMaxRank);
}
Debug::printVar('Final result', $oResult);

View File

@@ -2,7 +2,7 @@
namespace Nominatim;
require_once(CONST_BasePath.'/lib/lib.php');
require_once(CONST_LibDir.'/lib.php');
/**

View File

@@ -2,9 +2,9 @@
namespace Nominatim;
require_once(CONST_BasePath.'/lib/SpecialSearchOperator.php');
require_once(CONST_BasePath.'/lib/SearchContext.php');
require_once(CONST_BasePath.'/lib/Result.php');
require_once(CONST_LibDir.'/SpecialSearchOperator.php');
require_once(CONST_LibDir.'/SearchContext.php');
require_once(CONST_LibDir.'/Result.php');
/**
* Description of a single interpretation of a search query.
@@ -67,47 +67,6 @@ class SearchDescription
return $this->iSearchRank;
}
/**
* Make this search a POI search.
*
* In a POI search, objects are not (only) searched by their name
* but also by the primary OSM key/value pair (class and type in Nominatim).
*
* @param integer $iOperator Type of POI search
* @param string $sClass Class (or OSM tag key) of POI.
* @param string $sType Type (or OSM tag value) of POI.
*
* @return void
*/
public function setPoiSearch($iOperator, $sClass, $sType)
{
$this->iOperator = $iOperator;
$this->sClass = $sClass;
$this->sType = $sType;
}
/**
* Check if this might be a full address search.
*
* @return bool True if the search contains name, address and housenumber.
*/
public function looksLikeFullAddress()
{
return (!empty($this->aName))
&& (!empty($this->aAddress) || $this->sCountryCode)
&& preg_match('/[0-9]+/', $this->sHouseNumber);
}
/**
* Check if any operator is set.
*
* @return bool True, if this is a special search operation.
*/
public function hasOperator()
{
return $this->iOperator != Operator::NONE;
}
/**
* Extract key/value pairs from a query.
*
@@ -160,252 +119,234 @@ class SearchDescription
/////////// Search building functions
/**
* Derive new searches by adding a full term to the existing search.
* Create a copy of this search description adding to search rank.
*
* @param object $oSearchTerm Description of the token.
* @param bool $bHasPartial True if there are also tokens of partial terms
* with the same name.
* @param string $sPhraseType Type of phrase the token is contained in.
* @param bool $bFirstToken True if the token is at the beginning of the
* query.
* @param bool $bFirstPhrase True if the token is in the first phrase of
* the query.
* @param bool $bLastToken True if the token is at the end of the query.
* @param integer $iTermCost Cost to add to the current search rank.
*
* @return SearchDescription[] List of derived search descriptions.
* @return object Cloned search description.
*/
public function extendWithFullTerm($oSearchTerm, $bHasPartial, $sPhraseType, $bFirstToken, $bFirstPhrase, $bLastToken)
public function clone($iTermCost)
{
$aNewSearches = array();
$oSearch = clone $this;
$oSearch->iSearchRank += $iTermCost;
if (($sPhraseType == '' || $sPhraseType == 'country')
&& is_a($oSearchTerm, '\Nominatim\Token\Country')
) {
if (!$this->sCountryCode) {
$oSearch = clone $this;
$oSearch->iSearchRank++;
$oSearch->sCountryCode = $oSearchTerm->sCountryCode;
// Country is almost always at the end of the string
// - increase score for finding it anywhere else (optimisation)
if (!$bLastToken) {
$oSearch->iSearchRank += 5;
}
$aNewSearches[] = $oSearch;
}
} elseif (($sPhraseType == '' || $sPhraseType == 'postalcode')
&& is_a($oSearchTerm, '\Nominatim\Token\Postcode')
) {
if (!$this->sPostcode) {
// If we have structured search or this is the first term,
// make the postcode the primary search element.
if ($this->iOperator == Operator::NONE && $bFirstToken) {
$oSearch = clone $this;
$oSearch->iSearchRank++;
$oSearch->iOperator = Operator::POSTCODE;
$oSearch->aAddress = array_merge($this->aAddress, $this->aName);
$oSearch->aName =
array($oSearchTerm->iId => $oSearchTerm->sPostcode);
$aNewSearches[] = $oSearch;
}
// If we have a structured search or this is not the first term,
// add the postcode as an addendum.
if ($this->iOperator != Operator::POSTCODE
&& ($sPhraseType == 'postalcode' || !empty($this->aName))
) {
$oSearch = clone $this;
$oSearch->iSearchRank++;
if (strlen($oSearchTerm->sPostcode) < 4) {
$oSearch->iSearchRank += 4 - strlen($oSearchTerm->sPostcode);
}
$oSearch->sPostcode = $oSearchTerm->sPostcode;
$aNewSearches[] = $oSearch;
}
}
} elseif (($sPhraseType == '' || $sPhraseType == 'street')
&& is_a($oSearchTerm, '\Nominatim\Token\HouseNumber')
) {
if (!$this->sHouseNumber && $this->iOperator != Operator::POSTCODE) {
$oSearch = clone $this;
$oSearch->iSearchRank++;
$oSearch->sHouseNumber = $oSearchTerm->sToken;
// sanity check: if the housenumber is not mainly made
// up of numbers, add a penalty
if (preg_match('/\\d/', $oSearch->sHouseNumber) === 0
|| preg_match_all('/[^0-9]/', $oSearch->sHouseNumber, $aMatches) > 2) {
$oSearch->iSearchRank++;
}
if (empty($oSearchTerm->iId)) {
$oSearch->iSearchRank++;
}
// also must not appear in the middle of the address
if (!empty($this->aAddress)
|| (!empty($this->aAddressNonSearch))
|| $this->sPostcode
) {
$oSearch->iSearchRank++;
}
$aNewSearches[] = $oSearch;
// Housenumbers may appear in the name when the place has its own
// address terms.
if ($oSearchTerm->iId !== null
&& ($this->iNamePhrase >= 0 || empty($this->aName))
&& empty($this->aAddress)
) {
$oSearch = clone $this;
$oSearch->iSearchRank++;
$oSearch->aAddress = $this->aName;
$oSearch->bRareName = false;
$oSearch->aName = array($oSearchTerm->iId => $oSearchTerm->iId);
$aNewSearches[] = $oSearch;
}
}
} elseif ($sPhraseType == ''
&& is_a($oSearchTerm, '\Nominatim\Token\SpecialTerm')
) {
if ($this->iOperator == Operator::NONE) {
$oSearch = clone $this;
$oSearch->iSearchRank++;
$iOp = $oSearchTerm->iOperator;
if ($iOp == Operator::NONE) {
if (!empty($this->aName) || $this->oContext->isBoundedSearch()) {
$iOp = Operator::NAME;
} else {
$iOp = Operator::NEAR;
}
$oSearch->iSearchRank += 2;
}
$oSearch->setPoiSearch(
$iOp,
$oSearchTerm->sClass,
$oSearchTerm->sType
);
$aNewSearches[] = $oSearch;
}
} elseif ($sPhraseType != 'country'
&& is_a($oSearchTerm, '\Nominatim\Token\Word')
) {
$iWordID = $oSearchTerm->iId;
// Full words can only be a name if they appear at the beginning
// of the phrase. In structured search the name must forcably in
// the first phrase. In unstructured search it may be in a later
// phrase when the first phrase is a house number.
if (!empty($this->aName) || !($bFirstPhrase || $sPhraseType == '')) {
if (($sPhraseType == '' || !$bFirstPhrase) && !$bHasPartial) {
$oSearch = clone $this;
$oSearch->iSearchRank += 3 * $oSearchTerm->iTermCount;
$oSearch->aAddress[$iWordID] = $iWordID;
$aNewSearches[] = $oSearch;
}
} else {
$oSearch = clone $this;
$oSearch->iSearchRank++;
$oSearch->aName = array($iWordID => $iWordID);
if (CONST_Search_NameOnlySearchFrequencyThreshold) {
$oSearch->bRareName =
$oSearchTerm->iSearchNameCount
< CONST_Search_NameOnlySearchFrequencyThreshold;
}
$aNewSearches[] = $oSearch;
}
}
return $aNewSearches;
return $oSearch;
}
/**
* Derive new searches by adding a partial term to the existing search.
* Check if the search currently includes a name.
*
* @param string $sToken Term for the token.
* @param object $oSearchTerm Description of the token.
* @param bool $bStructuredPhrases True if the search is structured.
* @param integer $iPhrase Number of the phrase the token is in.
* @param array[] $aFullTokens List of full term tokens with the
* same name.
* @param bool bIncludeNonNames If true stop-word tokens are taken into
* account, too.
*
* @return SearchDescription[] List of derived search descriptions.
* @return bool True, if search has a name.
*/
public function extendWithPartialTerm($sToken, $oSearchTerm, $bStructuredPhrases, $iPhrase, $aFullTokens)
public function hasName($bIncludeNonNames = false)
{
// Only allow name terms.
if (!(is_a($oSearchTerm, '\Nominatim\Token\Word'))) {
return array();
return !empty($this->aName)
|| (!empty($this->aNameNonSearch) && $bIncludeNonNames);
}
/**
* Check if the search currently includes an address term.
*
* @return bool True, if any address term is included, including stop-word
* terms.
*/
public function hasAddress()
{
return !empty($this->aAddress) || !empty($this->aAddressNonSearch);
}
/**
* Check if a country restriction is currently included in the search.
*
* @return bool True, if a country restriction is set.
*/
public function hasCountry()
{
return $this->sCountryCode !== '';
}
/**
* Check if a postcode is currently included in the search.
*
* @return bool True, if a postcode is set.
*/
public function hasPostcode()
{
return $this->sPostcode !== '';
}
/**
* Check if a house number is set for the search.
*
* @return bool True, if a house number is set.
*/
public function hasHousenumber()
{
return $this->sHouseNumber !== '';
}
/**
* Check if a special type of place is requested.
*
* param integer iOperator When set, check for the particular
* operator used for the special type.
*
* @return bool True, if speial type is requested or, if requested,
* a special type with the given operator.
*/
public function hasOperator($iOperator = null)
{
return $iOperator === null ? $this->iOperator != Operator::NONE : $this->iOperator == $iOperator;
}
/**
* Add the given token to the list of terms to search for in the address.
*
* @param integer iID ID of term to add.
* @param bool bSearchable Term should be used to search for result
* (i.e. term is not a stop word).
*/
public function addAddressToken($iId, $bSearchable = true)
{
if ($bSearchable) {
$this->aAddress[$iId] = $iId;
} else {
$this->aAddressNonSearch[$iId] = $iId;
}
}
$aNewSearches = array();
$iWordID = $oSearchTerm->iId;
/**
* Add the given full-word token to the list of terms to search for in the
* name.
*
* @param interger iId ID of term to add.
* @param bool bRareName True if the term is infrequent enough to not
* require other constraints for efficient search.
*/
public function addNameToken($iId, $bRareName)
{
$this->aName[$iId] = $iId;
$this->bRareName = $bRareName;
}
if ((!$bStructuredPhrases || $iPhrase > 0)
&& (!empty($this->aName))
&& strpos($sToken, ' ') === false
) {
if ($oSearchTerm->iSearchNameCount < CONST_Max_Word_Frequency) {
$oSearch = clone $this;
$oSearch->iSearchRank += $oSearchTerm->iTermCount + 1;
if (empty($this->aName)) {
$oSearch->iSearchRank++;
}
if (preg_match('#^[0-9]+$#', $sToken)) {
$oSearch->iSearchRank++;
}
$oSearch->aAddress[$iWordID] = $iWordID;
$aNewSearches[] = $oSearch;
} else {
$oSearch = clone $this;
$oSearch->iSearchRank += $oSearchTerm->iTermCount + 1;
$oSearch->aAddressNonSearch[$iWordID] = $iWordID;
if (!empty($aFullTokens)) {
$oSearch->iSearchRank++;
}
$aNewSearches[] = $oSearch;
// revert to the token version?
foreach ($aFullTokens as $oSearchTermToken) {
if (is_a($oSearchTermToken, '\Nominatim\Token\Word')) {
$oSearch = clone $this;
$oSearch->iSearchRank += 3;
$oSearch->aAddress[$oSearchTermToken->iId]
= $oSearchTermToken->iId;
$aNewSearches[] = $oSearch;
}
}
}
/**
* Add the given partial token to the list of terms to search for in
* the name.
*
* @param integer iID ID of term to add.
* @param bool bSearchable Term should be used to search for result
* (i.e. term is not a stop word).
* @param integer iPhraseNumber Index of phrase, where the partial term
* appears.
*/
public function addPartialNameToken($iId, $bSearchable, $iPhraseNumber)
{
if ($bSearchable) {
$this->aName[$iId] = $iId;
} else {
$this->aNameNonSearch[$iId] = $iId;
}
$this->iNamePhrase = $iPhraseNumber;
}
if ((!$this->sPostcode && !$this->aAddress && !$this->aAddressNonSearch)
&& (empty($this->aName) || $this->iNamePhrase == $iPhrase)
) {
$oSearch = clone $this;
$oSearch->iSearchRank += 2;
if (empty($this->aName)) {
$oSearch->iSearchRank += 1;
}
if (preg_match('#^[0-9]+$#', $sToken)) {
$oSearch->iSearchRank += 2;
}
if ($oSearchTerm->iSearchNameCount < CONST_Max_Word_Frequency) {
if (empty($this->aName)
&& CONST_Search_NameOnlySearchFrequencyThreshold
) {
$oSearch->bRareName =
$oSearchTerm->iSearchNameCount
< CONST_Search_NameOnlySearchFrequencyThreshold;
} else {
$oSearch->bRareName = false;
}
$oSearch->aName[$iWordID] = $iWordID;
} else {
$oSearch->aNameNonSearch[$iWordID] = $iWordID;
}
$oSearch->iNamePhrase = $iPhrase;
$aNewSearches[] = $oSearch;
}
/**
* Set country restriction for the search.
*
* @param string sCountryCode Country code of country to restrict search to.
*/
public function setCountry($sCountryCode)
{
$this->sCountryCode = $sCountryCode;
$this->iNamePhrase = -1;
}
return $aNewSearches;
/**
* Set postcode search constraint.
*
* @param string sPostcode Postcode the result should have.
*/
public function setPostcode($sPostcode)
{
$this->sPostcode = $sPostcode;
$this->iNamePhrase = -1;
}
/**
* Make this search a search for a postcode object.
*
* @param integer iId Token Id for the postcode.
* @param string sPostcode Postcode to look for.
*/
public function setPostcodeAsName($iId, $sPostcode)
{
$this->iOperator = Operator::POSTCODE;
$this->aAddress = array_merge($this->aAddress, $this->aName);
$this->aName = array($iId => $sPostcode);
$this->bRareName = true;
$this->iNamePhrase = -1;
}
/**
* Set house number search cnstraint.
*
* @param string sNumber House number the result should have.
*/
public function setHousenumber($sNumber)
{
$this->sHouseNumber = $sNumber;
$this->iNamePhrase = -1;
}
/**
* Make this search a search for a house number.
*
* @param integer iId Token Id for the house number.
*/
public function setHousenumberAsName($iId)
{
$this->aAddress = array_merge($this->aAddress, $this->aName);
$this->bRareName = false;
$this->aName = array($iId => $iId);
$this->iNamePhrase = -1;
}
/**
* Make this search a POI search.
*
* In a POI search, objects are not (only) searched by their name
* but also by the primary OSM key/value pair (class and type in Nominatim).
*
* @param integer $iOperator Type of POI search
* @param string $sClass Class (or OSM tag key) of POI.
* @param string $sType Type (or OSM tag value) of POI.
*
* @return void
*/
public function setPoiSearch($iOperator, $sClass, $sType)
{
$this->iOperator = $iOperator;
$this->sClass = $sClass;
$this->sType = $sType;
$this->iNamePhrase = -1;
}
public function getNamePhrase()
{
return $this->iNamePhrase;
}
/**
* Get the global search context.
*
* @return object Objects of global search constraints.
*/
public function getContext()
{
return $this->oContext;
}
/////////// Query functions
@@ -426,7 +367,6 @@ class SearchDescription
public function query(&$oDB, $iMinRank, $iMaxRank, $iLimit)
{
$aResults = array();
$iHousenumber = -1;
if ($this->sCountryCode
&& empty($this->aName)
@@ -459,19 +399,24 @@ class SearchDescription
// Now search for housenumber, if housenumber provided. Can be zero.
if (($this->sHouseNumber || $this->sHouseNumber === '0') && !empty($aResults)) {
// Downgrade the rank of the street results, they are missing
// the housenumber.
foreach ($aResults as $oRes) {
$oRes->iResultRank++;
}
$aHnResults = $this->queryHouseNumber($oDB, $aResults);
if (!empty($aHnResults)) {
foreach ($aHnResults as $oRes) {
$aResults[$oRes->iId] = $oRes;
// Downgrade the rank of the street results, they are missing
// the housenumber. Also drop POI places (rank 30) here, they
// cannot be a parent place and therefore must not be shown
// as a result for a search with a missing housenumber.
foreach ($aResults as $oRes) {
if ($oRes->iAddressRank < 28) {
if ($oRes->iAddressRank >= 26) {
$oRes->iResultRank++;
} else {
$oRes->iResultRank += 2;
}
$aHnResults[$oRes->iId] = $oRes;
}
}
$aResults = $aHnResults;
}
// finally get POIs if requested
@@ -623,14 +568,14 @@ class SearchDescription
// too many results are expected for the street, i.e. if the result
// will be narrowed down by an address. Remeber that with ordering
// every single result has to be checked.
if ($this->sHouseNumber && (!empty($this->aAddress) || $this->sPostcode)) {
if ($this->sHouseNumber && ($this->bRareName || !empty($this->aAddress) || $this->sPostcode)) {
$sHouseNumberRegex = '\\\\m'.$this->sHouseNumber.'\\\\M';
$aOrder[] = ' (';
$aOrder[0] .= 'EXISTS(';
$aOrder[0] .= ' SELECT place_id';
$aOrder[0] .= ' FROM placex';
$aOrder[0] .= ' WHERE parent_place_id = search_name.place_id';
$aOrder[0] .= " AND transliteration(housenumber) ~* E'".$sHouseNumberRegex."'";
$aOrder[0] .= " AND housenumber ~* E'".$sHouseNumberRegex."'";
$aOrder[0] .= ' LIMIT 1';
$aOrder[0] .= ') ';
// also housenumbers from interpolation lines table are needed
@@ -727,7 +672,7 @@ class SearchDescription
$aResults = array();
if (!empty($aTerms)) {
$sSQL = 'SELECT place_id,'.$sExactMatchSQL;
$sSQL = 'SELECT place_id, address_rank,'.$sExactMatchSQL;
$sSQL .= ' FROM search_name';
$sSQL .= ' WHERE '.join(' and ', $aTerms);
$sSQL .= ' ORDER BY '.join(', ', $aOrder);
@@ -740,6 +685,7 @@ class SearchDescription
foreach ($aDBResults as $aResult) {
$oResult = new Result($aResult['place_id']);
$oResult->iExactMatches = $aResult['exactmatch'];
$oResult->iAddressRank = $aResult['address_rank'];
$aResults[$aResult['place_id']] = $oResult;
}
}
@@ -750,16 +696,33 @@ class SearchDescription
private function queryHouseNumber(&$oDB, $aRoadPlaceIDs)
{
$aResults = array();
$sPlaceIDs = Result::joinIdsByTable($aRoadPlaceIDs, Result::TABLE_PLACEX);
$sRoadPlaceIDs = Result::joinIdsByTableMaxRank(
$aRoadPlaceIDs,
Result::TABLE_PLACEX,
27
);
$sPOIPlaceIDs = Result::joinIdsByTableMinRank(
$aRoadPlaceIDs,
Result::TABLE_PLACEX,
30
);
if (!$sPlaceIDs) {
$aIDCondition = array();
if ($sRoadPlaceIDs) {
$aIDCondition[] = 'parent_place_id in ('.$sRoadPlaceIDs.')';
}
if ($sPOIPlaceIDs) {
$aIDCondition[] = 'place_id in ('.$sPOIPlaceIDs.')';
}
if (empty($aIDCondition)) {
return $aResults;
}
$sHouseNumberRegex = '\\\\m'.$this->sHouseNumber.'\\\\M';
$sSQL = 'SELECT place_id FROM placex ';
$sSQL .= 'WHERE parent_place_id in ('.$sPlaceIDs.')';
$sSQL .= " AND transliteration(housenumber) ~* E'".$sHouseNumberRegex."'";
$sSQL = 'SELECT place_id FROM placex WHERE';
$sSQL .= " housenumber ~* E'".$sHouseNumberRegex."'";
$sSQL .= ' AND ('.join(' OR ', $aIDCondition).')';
$sSQL .= $this->oContext->excludeSQL(' AND place_id');
Debug::printSQL($sSQL);
@@ -771,11 +734,11 @@ class SearchDescription
$bIsIntHouseNumber= (bool) preg_match('/[0-9]+/', $this->sHouseNumber);
$iHousenumber = intval($this->sHouseNumber);
if ($bIsIntHouseNumber && empty($aResults)) {
if ($bIsIntHouseNumber && $sRoadPlaceIDs && empty($aResults)) {
// if nothing found, search in the interpolation line table
$sSQL = 'SELECT distinct place_id FROM location_property_osmline';
$sSQL .= ' WHERE startnumber is not NULL';
$sSQL .= ' AND parent_place_id in ('.$sPlaceIDs.') AND (';
$sSQL .= ' AND parent_place_id in ('.$sRoadPlaceIDs.') AND (';
if ($iHousenumber % 2 == 0) {
// If housenumber is even, look for housenumber in streets
// with interpolationtype even or all.
@@ -798,24 +761,10 @@ class SearchDescription
}
}
// If nothing found try the aux fallback table
if (CONST_Use_Aux_Location_data && empty($aResults)) {
$sSQL = 'SELECT place_id FROM location_property_aux';
$sSQL .= ' WHERE parent_place_id in ('.$sPlaceIDs.')';
$sSQL .= " AND housenumber = '".$this->sHouseNumber."'";
$sSQL .= $this->oContext->excludeSQL(' AND place_id');
Debug::printSQL($sSQL);
foreach ($oDB->getCol($sSQL) as $iPlaceId) {
$aResults[$iPlaceId] = new Result($iPlaceId, Result::TABLE_AUX);
}
}
// If nothing found then search in Tiger data (location_property_tiger)
if (CONST_Use_US_Tiger_Data && $bIsIntHouseNumber && empty($aResults)) {
if (CONST_Use_US_Tiger_Data && $sRoadPlaceIDs && $bIsIntHouseNumber && empty($aResults)) {
$sSQL = 'SELECT place_id FROM location_property_tiger';
$sSQL .= ' WHERE parent_place_id in ('.$sPlaceIDs.') and (';
$sSQL .= ' WHERE parent_place_id in ('.$sRoadPlaceIDs.') and (';
if ($iHousenumber % 2 == 0) {
$sSQL .= "interpolationtype='even'";
} else {
@@ -1027,7 +976,7 @@ class SearchDescription
'Name terms (stop words)' => $this->aNameNonSearch,
'Address terms' => $this->aAddress,
'Address terms (stop words)' => $this->aAddressNonSearch,
'Address terms (full words)' => $this->aFullNameAddress,
'Address terms (full words)' => $this->aFullNameAddress ?? '',
'Special search' => $this->iOperator,
'Class' => $this->sClass,
'Type' => $this->sType,
@@ -1039,7 +988,7 @@ class SearchDescription
public function dumpAsHtmlTableRow(&$aWordIDs)
{
$kf = function ($k) use (&$aWordIDs) {
return $aWordIDs[$k];
return $aWordIDs[$k] ?? '['.$k.']';
};
echo '<tr>';

View File

@@ -0,0 +1,87 @@
<?php
namespace Nominatim;
/**
* Description of the position of a token within a query.
*/
class SearchPosition
{
private $sPhraseType;
private $iPhrase;
private $iNumPhrases;
private $iToken;
private $iNumTokens;
public function __construct($sPhraseType, $iPhrase, $iNumPhrases)
{
$this->sPhraseType = $sPhraseType;
$this->iPhrase = $iPhrase;
$this->iNumPhrases = $iNumPhrases;
}
public function setTokenPosition($iToken, $iNumTokens)
{
$this->iToken = $iToken;
$this->iNumTokens = $iNumTokens;
}
/**
* Check if the phrase can be of the given type.
*
* @param string $sType Type of phrse requested.
*
* @return True if the phrase is untyped or of the given type.
*/
public function maybePhrase($sType)
{
return $this->sPhraseType == '' || $this->sPhraseType == $sType;
}
/**
* Check if the phrase is exactly of the given type.
*
* @param string $sType Type of phrse requested.
*
* @return True if the phrase of the given type.
*/
public function isPhrase($sType)
{
return $this->sPhraseType == $sType;
}
/**
* Return true if the token is the very first in the query.
*/
public function isFirstToken()
{
return $this->iPhrase == 0 && $this->iToken == 0;
}
/**
* Check if the token is the final one in the query.
*/
public function isLastToken()
{
return $this->iToken + 1 == $this->iNumTokens && $this->iPhrase + 1 == $this->iNumPhrases;
}
/**
* Check if the current token is part of the first phrase in the query.
*/
public function isFirstPhrase()
{
return $this->iPhrase == 0;
}
/**
* Get the phrase position in the query.
*/
public function getPhrase()
{
return $this->iPhrase;
}
}

View File

@@ -7,7 +7,7 @@ class Shell
public function __construct($sBaseCmd, ...$aParams)
{
if (!$sBaseCmd) {
throw new Exception('Command missing in new() call');
throw new \Exception('Command missing in new() call');
}
$this->baseCmd = $sBaseCmd;
$this->aParams = array();
@@ -33,7 +33,9 @@ class Shell
public function addEnvPair($sKey, $sVal)
{
if (isset($sKey) && $sKey && isset($sVal)) {
if (!isset($this->aEnv)) $this->aEnv = $_ENV;
if (!isset($this->aEnv)) {
$this->aEnv = $_ENV;
}
$this->aEnv = array_merge($this->aEnv, array($sKey => $sVal), $_ENV);
}
return $this;
@@ -48,7 +50,7 @@ class Shell
return join(' ', $aEscaped);
}
public function run()
public function run($bExitOnFail = false)
{
$sCmd = $this->escapedCmd();
// $aEnv does not need escaping, proc_open seems to handle it fine
@@ -67,14 +69,16 @@ class Shell
fclose($aPipes[0]); // no stdin
$iStat = proc_close($hProc);
if ($iStat != 0 && $bExitOnFail) {
exit($iStat);
}
return $iStat;
}
private function escapeParam($sParam)
{
if (preg_match('/^-*\w+$/', $sParam)) return $sParam;
return escapeshellarg($sParam);
return (preg_match('/^-*\w+$/', $sParam)) ? $sParam : escapeshellarg($sParam);
}
}

51
lib-php/Status.php Normal file
View File

@@ -0,0 +1,51 @@
<?php
namespace Nominatim;
require_once(CONST_TokenizerDir.'/tokenizer.php');
use Exception;
class Status
{
protected $oDB;
public function __construct(&$oDB)
{
$this->oDB =& $oDB;
}
public function status()
{
if (!$this->oDB) {
throw new Exception('No database', 700);
}
try {
$this->oDB->connect();
} catch (\Nominatim\DatabaseError $e) {
throw new Exception('Database connection failed', 700);
}
$oTokenizer = new \Nominatim\Tokenizer($this->oDB);
$oTokenizer->checkStatus();
}
public function dataDate()
{
$sSQL = 'SELECT EXTRACT(EPOCH FROM lastimportdate) FROM import_status LIMIT 1';
$iDataDateEpoch = $this->oDB->getOne($sSQL);
if ($iDataDateEpoch === false) {
throw new Exception('Import date is not available', 705);
}
return $iDataDateEpoch;
}
public function databaseVersion()
{
$sSQL = 'SELECT value FROM nominatim_properties WHERE property = \'database_version\'';
return $this->oDB->getOne($sSQL);
}
}

72
lib-php/TokenCountry.php Normal file
View File

@@ -0,0 +1,72 @@
<?php
namespace Nominatim\Token;
/**
* A country token.
*/
class Country
{
/// Database word id, if available.
private $iId;
/// Two-letter country code (lower-cased).
private $sCountryCode;
public function __construct($iId, $sCountryCode)
{
$this->iId = $iId;
$this->sCountryCode = $sCountryCode;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oSearch->hasCountry() && $oPosition->maybePhrase('country');
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
$oNewSearch = $oSearch->clone($oPosition->isLastToken() ? 1 : 6);
$oNewSearch->setCountry($this->sCountryCode);
return array($oNewSearch);
}
public function debugInfo()
{
return array(
'ID' => $this->iId,
'Type' => 'country',
'Info' => $this->sCountryCode
);
}
public function debugCode()
{
return 'C';
}
}

View File

@@ -0,0 +1,108 @@
<?php
namespace Nominatim\Token;
/**
* A house number token.
*/
class HouseNumber
{
/// Database word id, if available.
private $iId;
/// Normalized house number.
private $sToken;
public function __construct($iId, $sToken)
{
$this->iId = $iId;
$this->sToken = $sToken;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oSearch->hasHousenumber()
&& !$oSearch->hasOperator(\Nominatim\Operator::POSTCODE)
&& $oPosition->maybePhrase('street');
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
$aNewSearches = array();
// sanity check: if the housenumber is not mainly made
// up of numbers, add a penalty
$iSearchCost = 1;
if (preg_match('/\\d/', $this->sToken) === 0
|| preg_match_all('/[^0-9]/', $this->sToken, $aMatches) > 2) {
$iSearchCost++;
}
if (!$oSearch->hasOperator(\Nominatim\Operator::NONE)) {
$iSearchCost++;
}
if (empty($this->iId)) {
$iSearchCost++;
}
// also must not appear in the middle of the address
if ($oSearch->hasAddress() || $oSearch->hasPostcode()) {
$iSearchCost++;
}
$oNewSearch = $oSearch->clone($iSearchCost);
$oNewSearch->setHousenumber($this->sToken);
$aNewSearches[] = $oNewSearch;
// Housenumbers may appear in the name when the place has its own
// address terms.
if ($this->iId !== null
&& ($oSearch->getNamePhrase() >= 0 || !$oSearch->hasName())
&& !$oSearch->hasAddress()
) {
$oNewSearch = $oSearch->clone($iSearchCost);
$oNewSearch->setHousenumberAsName($this->iId);
$aNewSearches[] = $oNewSearch;
}
return $aNewSearches;
}
public function debugInfo()
{
return array(
'ID' => $this->iId,
'Type' => 'house number',
'Info' => array('nr' => $this->sToken)
);
}
public function debugCode()
{
return 'H';
}
}

126
lib-php/TokenList.php Normal file
View File

@@ -0,0 +1,126 @@
<?php
namespace Nominatim;
require_once(CONST_LibDir.'/TokenCountry.php');
require_once(CONST_LibDir.'/TokenHousenumber.php');
require_once(CONST_LibDir.'/TokenPostcode.php');
require_once(CONST_LibDir.'/TokenSpecialTerm.php');
require_once(CONST_LibDir.'/TokenWord.php');
require_once(CONST_LibDir.'/TokenPartial.php');
require_once(CONST_LibDir.'/SpecialSearchOperator.php');
/**
* Saves information about the tokens that appear in a search query.
*
* Tokens are sorted by their normalized form, the token word. There are different
* kinds of tokens, represented by different Token* classes. Note that
* tokens do not have a common base class. All tokens need to have a field
* with the word id that points to an entry in the `word` database table
* but otherwise the information saved about a token can be very different.
*/
class TokenList
{
// List of list of tokens indexed by their word_token.
private $aTokens = array();
/**
* Return total number of tokens.
*
* @return Integer
*/
public function count()
{
return count($this->aTokens);
}
/**
* Check if there are tokens for the given token word.
*
* @param string $sWord Token word to look for.
*
* @return bool True if there is one or more token for the token word.
*/
public function contains($sWord)
{
return isset($this->aTokens[$sWord]);
}
/**
* Check if there are partial or full tokens for the given word.
*
* @param string $sWord Token word to look for.
*
* @return bool True if there is one or more token for the token word.
*/
public function containsAny($sWord)
{
return isset($this->aTokens[$sWord]);
}
/**
* Get the list of tokens for the given token word.
*
* @param string $sWord Token word to look for.
*
* @return object[] Array of tokens for the given token word or an
* empty array if no tokens could be found.
*/
public function get($sWord)
{
return isset($this->aTokens[$sWord]) ? $this->aTokens[$sWord] : array();
}
public function getFullWordIDs()
{
$ids = array();
foreach ($this->aTokens as $aTokenList) {
foreach ($aTokenList as $oToken) {
if (is_a($oToken, '\Nominatim\Token\Word')) {
$ids[$oToken->getId()] = $oToken->getId();
}
}
}
return $ids;
}
/**
* Add a new token for the given word.
*
* @param string $sWord Word the token describes.
* @param object $oToken Token object to add.
*
* @return void
*/
public function addToken($sWord, $oToken)
{
if (isset($this->aTokens[$sWord])) {
$this->aTokens[$sWord][] = $oToken;
} else {
$this->aTokens[$sWord] = array($oToken);
}
}
public function debugTokenByWordIdList()
{
$aWordsIDs = array();
foreach ($this->aTokens as $sToken => $aWords) {
foreach ($aWords as $aToken) {
$iId = $aToken->getId();
if ($iId !== null) {
$aWordsIDs[$iId] = '#'.$sToken.'('.$aToken->debugCode().' '.$iId.')#';
}
}
}
return $aWordsIDs;
}
public function debugInfo()
{
return $this->aTokens;
}
}

118
lib-php/TokenPartial.php Normal file
View File

@@ -0,0 +1,118 @@
<?php
namespace Nominatim\Token;
/**
* A standard word token.
*/
class Partial
{
/// Database word id, if applicable.
private $iId;
/// Number of appearances in the database.
private $iSearchNameCount;
/// True, if the token consists exclusively of digits and spaces.
private $bNumberToken;
public function __construct($iId, $sToken, $iSearchNameCount)
{
$this->iId = $iId;
$this->bNumberToken = (bool) preg_match('#^[0-9 ]+$#', $sToken);
$this->iSearchNameCount = $iSearchNameCount;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oPosition->isPhrase('country');
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
$aNewSearches = array();
// Partial token in Address.
if (($oPosition->isPhrase('') || !$oPosition->isFirstPhrase())
&& $oSearch->hasName()
) {
$iSearchCost = $this->bNumberToken ? 2 : 1;
if ($this->iSearchNameCount >= CONST_Max_Word_Frequency) {
$iSearchCost += 1;
}
$oNewSearch = $oSearch->clone($iSearchCost);
$oNewSearch->addAddressToken(
$this->iId,
$this->iSearchNameCount < CONST_Max_Word_Frequency
);
$aNewSearches[] = $oNewSearch;
}
// Partial token in Name.
if ((!$oSearch->hasPostcode() && !$oSearch->hasAddress())
&& (!$oSearch->hasName(true)
|| $oSearch->getNamePhrase() == $oPosition->getPhrase())
) {
$iSearchCost = 1;
if (!$oSearch->hasName(true)) {
$iSearchCost += 1;
}
if ($this->bNumberToken) {
$iSearchCost += 1;
}
$oNewSearch = $oSearch->clone($iSearchCost);
$oNewSearch->addPartialNameToken(
$this->iId,
$this->iSearchNameCount < CONST_Max_Word_Frequency,
$oPosition->getPhrase()
);
$aNewSearches[] = $oNewSearch;
}
return $aNewSearches;
}
public function debugInfo()
{
return array(
'ID' => $this->iId,
'Type' => 'partial',
'Info' => array(
'count' => $this->iSearchNameCount
)
);
}
public function debugCode()
{
return 'w';
}
}

98
lib-php/TokenPostcode.php Normal file
View File

@@ -0,0 +1,98 @@
<?php
namespace Nominatim\Token;
/**
* A postcode token.
*/
class Postcode
{
/// Database word id, if available.
private $iId;
/// Full nomralized postcode (upper cased).
private $sPostcode;
// Optional country code the postcode belongs to (currently unused).
private $sCountryCode;
public function __construct($iId, $sPostcode, $sCountryCode = '')
{
$this->iId = $iId;
$this->sPostcode = $sPostcode;
$this->sCountryCode = empty($sCountryCode) ? '' : $sCountryCode;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oSearch->hasPostcode() && $oPosition->maybePhrase('postalcode');
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
$aNewSearches = array();
// If we have structured search or this is the first term,
// make the postcode the primary search element.
if ($oSearch->hasOperator(\Nominatim\Operator::NONE) && $oPosition->isFirstToken()) {
$oNewSearch = $oSearch->clone(1);
$oNewSearch->setPostcodeAsName($this->iId, $this->sPostcode);
$aNewSearches[] = $oNewSearch;
}
// If we have a structured search or this is not the first term,
// add the postcode as an addendum.
if (!$oSearch->hasOperator(\Nominatim\Operator::POSTCODE)
&& ($oPosition->isPhrase('postalcode') || $oSearch->hasName())
) {
$iPenalty = 1;
if (strlen($this->sPostcode) < 4) {
$iPenalty += 4 - strlen($this->sPostcode);
}
$oNewSearch = $oSearch->clone($iPenalty);
$oNewSearch->setPostcode($this->sPostcode);
$aNewSearches[] = $oNewSearch;
}
return $aNewSearches;
}
public function debugInfo()
{
return array(
'ID' => $this->iId,
'Type' => 'postcode',
'Info' => $this->sPostcode.'('.$this->sCountryCode.')'
);
}
public function debugCode()
{
return 'P';
}
}

View File

@@ -0,0 +1,102 @@
<?php
namespace Nominatim\Token;
require_once(CONST_LibDir.'/SpecialSearchOperator.php');
/**
* A word token describing a place type.
*/
class SpecialTerm
{
/// Database word id, if applicable.
private $iId;
/// Class (or OSM tag key) of the place to look for.
private $sClass;
/// Type (or OSM tag value) of the place to look for.
private $sType;
/// Relationship of the operator to the object (see Operator class).
private $iOperator;
public function __construct($iID, $sClass, $sType, $iOperator)
{
$this->iId = $iID;
$this->sClass = $sClass;
$this->sType = $sType;
$this->iOperator = $iOperator;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oSearch->hasOperator() && $oPosition->isPhrase('');
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
$iSearchCost = 2;
$iOp = $this->iOperator;
if ($iOp == \Nominatim\Operator::NONE) {
if ($oSearch->hasName() || $oSearch->getContext()->isBoundedSearch()) {
$iOp = \Nominatim\Operator::NAME;
} else {
$iOp = \Nominatim\Operator::NEAR;
}
$iSearchCost += 2;
} elseif (!$oPosition->isFirstToken() && !$oPosition->isLastToken()) {
$iSearchCost += 2;
}
if ($oSearch->hasHousenumber()) {
$iSearchCost ++;
}
$oNewSearch = $oSearch->clone($iSearchCost);
$oNewSearch->setPoiSearch($iOp, $this->sClass, $this->sType);
return array($oNewSearch);
}
public function debugInfo()
{
return array(
'ID' => $this->iId,
'Type' => 'special term',
'Info' => array(
'class' => $this->sClass,
'type' => $this->sType,
'operator' => \Nominatim\Operator::toString($this->iOperator)
)
);
}
public function debugCode()
{
return 'S';
}
}

102
lib-php/TokenWord.php Normal file
View File

@@ -0,0 +1,102 @@
<?php
namespace Nominatim\Token;
/**
* A standard word token.
*/
class Word
{
/// Database word id, if applicable.
private $iId;
/// Number of appearances in the database.
private $iSearchNameCount;
/// Number of terms in the word.
private $iTermCount;
public function __construct($iId, $iSearchNameCount, $iTermCount)
{
$this->iId = $iId;
$this->iSearchNameCount = $iSearchNameCount;
$this->iTermCount = $iTermCount;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oPosition->isPhrase('country');
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
// Full words can only be a name if they appear at the beginning
// of the phrase. In structured search the name must forcably in
// the first phrase. In unstructured search it may be in a later
// phrase when the first phrase is a house number.
if ($oSearch->hasName()
|| !($oPosition->isFirstPhrase() || $oPosition->isPhrase(''))
) {
if ($this->iTermCount > 1
&& ($oPosition->isPhrase('') || !$oPosition->isFirstPhrase())
) {
$oNewSearch = $oSearch->clone(1);
$oNewSearch->addAddressToken($this->iId);
return array($oNewSearch);
}
} elseif (!$oSearch->hasName(true)) {
$oNewSearch = $oSearch->clone(1);
$oNewSearch->addNameToken(
$this->iId,
CONST_Search_NameOnlySearchFrequencyThreshold
&& $this->iSearchNameCount
< CONST_Search_NameOnlySearchFrequencyThreshold
);
return array($oNewSearch);
}
return array();
}
public function debugInfo()
{
return array(
'ID' => $this->iId,
'Type' => 'word',
'Info' => array(
'count' => $this->iSearchNameCount,
'terms' => $this->iTermCount
)
);
}
public function debugCode()
{
return 'W';
}
}

View File

@@ -1,6 +1,7 @@
<?php
@define('CONST_LibDir', dirname(dirname(__FILE__)));
require_once(CONST_BasePath.'/lib/init-cmd.php');
require_once(CONST_LibDir.'/init-cmd.php');
ini_set('memory_limit', '800M');
ini_set('display_errors', 'stderr');
@@ -11,10 +12,12 @@ $aCMDOptions
array('help', 'h', 0, 1, 0, 0, false, 'Show Help'),
array('quiet', 'q', 0, 1, 0, 0, 'bool', 'Quiet output'),
array('verbose', 'v', 0, 1, 0, 0, 'bool', 'Verbose output'),
array('project-dir', '', 0, 1, 1, 1, 'realpath', 'Base directory of the Nominatim installation (default: .)'),
);
getCmdOpt($_SERVER['argv'], $aCMDOptions, $aCMDResult, true, true);
include(CONST_Phrase_Config);
loadSettings($aCMDResult['project-dir'] ?? getcwd());
setupHTTPProxy();
if (true) {
$sURL = 'https://wiki.openstreetmap.org/wiki/Special:Export/Nominatim/Country_Codes';

View File

@@ -1,10 +1,11 @@
<?php
@define('CONST_LibDir', dirname(dirname(__FILE__)));
// Script to extract structured city and street data
// from a running nominatim instance as CSV data
require_once(CONST_BasePath.'/lib/init-cmd.php');
require_once(CONST_BasePath.'/lib/ParameterParser.php');
require_once(CONST_LibDir.'/init-cmd.php');
require_once(CONST_LibDir.'/ParameterParser.php');
ini_set('memory_limit', '800M');
$aCMDOptions = array(
@@ -21,6 +22,7 @@
array('restrict-to-osm-node', '', 0, 1, 1, 1, 'int', 'Export only objects that are children of this OSM node'),
array('restrict-to-osm-way', '', 0, 1, 1, 1, 'int', 'Export only objects that are children of this OSM way'),
array('restrict-to-osm-relation', '', 0, 1, 1, 1, 'int', 'Export only objects that are children of this OSM relation'),
array('project-dir', '', 0, 1, 1, 1, 'realpath', 'Base directory of the Nominatim installation (default: .)'),
"\nAddress ranks: continent, country, state, county, city, suburb, street, path",
'Additional output types: postcode, placeid (placeid for each object)',
"\noutput-format must be a semicolon-separated list of address ranks. Multiple ranks",
@@ -30,6 +32,8 @@
);
getCmdOpt($_SERVER['argv'], $aCMDOptions, $aCMDResult, true, true);
loadSettings($aCMDResult['project-dir'] ?? getcwd());
$aRankmap = array(
'continent' => 1,
'country' => 4,
@@ -45,7 +49,9 @@
$oDB->connect();
if (isset($aCMDResult['output-type'])) {
if (!isset($aRankmap[$aCMDResult['output-type']])) fail('unknown output-type: '.$aCMDResult['output-type']);
if (!isset($aRankmap[$aCMDResult['output-type']])) {
fail('unknown output-type: '.$aCMDResult['output-type']);
}
$iOutputRank = $aRankmap[$aCMDResult['output-type']];
} else {
$iOutputRank = $aRankmap['street'];
@@ -54,14 +60,18 @@
// Preferred language
$oParams = new Nominatim\ParameterParser();
if (!isset($aCMDResult['language'])) $aCMDResult['language'] = 'xx';
if (!isset($aCMDResult['language'])) {
$aCMDResult['language'] = 'xx';
}
$aLangPrefOrder = $oParams->getPreferredLanguages($aCMDResult['language']);
$sLanguagePrefArraySQL = $oDB->getArraySQL($oDB->getDBQuotedList($aLangPrefOrder));
// output formatting: build up a lookup table that maps address ranks to columns
$aColumnMapping = array();
$iNumCol = 0;
if (!isset($aCMDResult['output-format'])) $aCMDResult['output-format'] = 'street;suburb;city;county;state;country';
if (!isset($aCMDResult['output-format'])) {
$aCMDResult['output-format'] = 'street;suburb;city;county;state;country';
}
foreach (preg_split('/\s*;\s*/', $aCMDResult['output-format']) as $sColumn) {
$bHasData = false;
foreach (preg_split('/\s*,\s*/', $sColumn) as $sRank) {
@@ -76,7 +86,9 @@
}
}
}
if ($bHasData) $iNumCol++;
if ($bHasData) {
$iNumCol++;
}
}
// build the query for objects
@@ -118,7 +130,9 @@
if ($sOsmType) {
$sSQL = 'select place_id from placex where osm_type = :osm_type and osm_id = :osm_id';
$sParentId = $oDB->getOne($sSQL, array('osm_type' => $sOsmType, 'osm_id' => $sOsmId));
if (!$sParentId) fail('Could not find place '.$sOsmType.' '.$sOsmId);
if (!$sParentId) {
fail('Could not find place '.$sOsmType.' '.$sOsmId);
}
}
if ($sParentId) {
$sPlacexSQL .= ' and place_id in (select place_id from place_addressline where address_place_id = '.$sParentId.' and isaddress)';
@@ -132,7 +146,6 @@
$oResults = $oDB->getQueryStatement($sPlacexSQL);
$fOutstream = fopen('php://output', 'w');
while ($aRow = $oResults->fetch()) {
//var_dump($aRow);
$iPlaceID = $aRow['place_id'];
$sSQL = "select rank_address,get_name_by_language(name,$sLanguagePrefArraySQL) as localname from get_addressdata(:place_id, -1)";
$sSQL .= ' WHERE isaddress';

View File

@@ -1,6 +1,11 @@
<?php
@define('CONST_LibDir', dirname(dirname(__FILE__)));
require_once(CONST_LibDir.'/init-cmd.php');
require_once(CONST_LibDir.'/log.php');
require_once(CONST_LibDir.'/PlaceLookup.php');
require_once(CONST_LibDir.'/ReverseGeocode.php');
require_once(CONST_BasePath.'/lib/init-cmd.php');
ini_set('memory_limit', '800M');
$aCMDOptions = array(
@@ -10,13 +15,26 @@ $aCMDOptions = array(
array('verbose', 'v', 0, 1, 0, 0, 'bool', 'Verbose output'),
array('reverse-only', '', 0, 1, 0, 0, 'bool', 'Warm reverse only'),
array('search-only', '', 0, 1, 0, 0, 'bool', 'Warm search only'),
array('project-dir', '', 0, 1, 1, 1, 'realpath', 'Base directory of the Nominatim installation (default: .)'),
);
getCmdOpt($_SERVER['argv'], $aCMDOptions, $aResult, true, true);
require_once(CONST_BasePath.'/lib/log.php');
require_once(CONST_BasePath.'/lib/Geocode.php');
require_once(CONST_BasePath.'/lib/PlaceLookup.php');
require_once(CONST_BasePath.'/lib/ReverseGeocode.php');
loadSettings($aCMDResult['project-dir'] ?? getcwd());
@define('CONST_Database_DSN', getSetting('DATABASE_DSN'));
@define('CONST_Default_Language', getSetting('DEFAULT_LANGUAGE', false));
@define('CONST_Log_DB', getSettingBool('LOG_DB'));
@define('CONST_Log_File', getSetting('LOG_FILE', false));
@define('CONST_NoAccessControl', getSettingBool('CORS_NOACCESSCONTROL'));
@define('CONST_Places_Max_ID_count', getSetting('LOOKUP_MAX_COUNT'));
@define('CONST_PolygonOutput_MaximumTypes', getSetting('POLYGON_OUTPUT_MAX_TYPES'));
@define('CONST_Search_BatchMode', getSettingBool('SEARCH_BATCH_MODE'));
@define('CONST_Search_NameOnlySearchFrequencyThreshold', getSetting('SEARCH_NAME_ONLY_THRESHOLD'));
@define('CONST_Use_US_Tiger_Data', getSettingBool('USE_US_TIGER_DATA'));
@define('CONST_MapIcon_URL', getSetting('MAPICON_URL', false));
@define('CONST_TokenizerDir', CONST_InstallDir.'/tokenizer');
require_once(CONST_LibDir.'/Geocode.php');
$oDB = new Nominatim\DB();
$oDB->connect();
@@ -44,11 +62,15 @@ if (!$aResult['search-only']) {
$oPlaceLookup->setLanguagePreference(array('en'));
echo 'Warm reverse: ';
if ($bVerbose) echo "\n";
if ($bVerbose) {
echo "\n";
}
for ($i = 0; $i < 1000; $i++) {
$fLat = rand(-9000, 9000) / 100;
$fLon = rand(-18000, 18000) / 100;
if ($bVerbose) echo "$fLat, $fLon = ";
if ($bVerbose) {
echo "$fLat, $fLon = ";
}
$oLookup = $oReverseGeocode->lookup($fLat, $fLon);
$aSearchResults = $oLookup ? $oPlaceLookup->lookup(array($oLookup->iId => $oLookup)) : null;
@@ -61,10 +83,14 @@ if (!$aResult['reverse-only']) {
$oGeocode = new Nominatim\Geocode($oDB);
echo 'Warm search: ';
if ($bVerbose) echo "\n";
if ($bVerbose) {
echo "\n";
}
$sSQL = 'SELECT word FROM word WHERE word is not null ORDER BY search_name_count DESC LIMIT 1000';
foreach ($oDB->getCol($sSQL) as $sWord) {
if ($bVerbose) echo "$sWord = ";
if ($bVerbose) {
echo "$sWord = ";
}
$oGeocode->setLanguagePreference(array('en'));
$oGeocode->setQuery($sWord);

View File

@@ -1,6 +1,6 @@
<?php
require_once(CONST_BasePath.'/lib/Shell.php');
require_once(CONST_LibDir.'/Shell.php');
function getCmdOpt($aArg, $aSpec, &$aResult, $bExitOnError = false, $bExitOnUnknown = false)
{
@@ -9,8 +9,12 @@ function getCmdOpt($aArg, $aSpec, &$aResult, $bExitOnError = false, $bExitOnUnkn
foreach ($aSpec as $aLine) {
if (is_array($aLine)) {
if ($aLine[0]) $aQuick['--'.$aLine[0]] = $aLine;
if ($aLine[1]) $aQuick['-'.$aLine[1]] = $aLine;
if ($aLine[0]) {
$aQuick['--'.$aLine[0]] = $aLine;
}
if ($aLine[1]) {
$aQuick['-'.$aLine[1]] = $aLine;
}
$aCounts[$aLine[0]] = 0;
}
}
@@ -28,7 +32,9 @@ function getCmdOpt($aArg, $aSpec, &$aResult, $bExitOnError = false, $bExitOnUnkn
$xVal = array();
for ($n = $aLine[4]; $i < $iSize && $n; $n--) {
$i++;
if ($i >= $iSize || $aArg[$i][0] == '-') showUsage($aSpec, $bExitOnError, 'Parameter of \''.$aLine[0].'\' is missing');
if ($i >= $iSize || $aArg[$i][0] == '-') {
showUsage($aSpec, $bExitOnError, 'Parameter of \''.$aLine[0].'\' is missing');
}
switch ($aLine[6]) {
case 'realpath':
@@ -56,7 +62,9 @@ function getCmdOpt($aArg, $aSpec, &$aResult, $bExitOnError = false, $bExitOnUnkn
break;
}
}
if ($aLine[4] == 1) $xVal = $xVal[0];
if ($aLine[4] == 1) {
$xVal = $xVal[0];
}
} else {
$xVal = true;
}
@@ -65,7 +73,9 @@ function getCmdOpt($aArg, $aSpec, &$aResult, $bExitOnError = false, $bExitOnUnkn
}
if ($aLine[3] > 1) {
if (!array_key_exists($aLine[0], $aResult)) $aResult[$aLine[0]] = array();
if (!array_key_exists($aLine[0], $aResult)) {
$aResult[$aLine[0]] = array();
}
$aResult[$aLine[0]][] = $xVal;
} else {
$aResult[$aLine[0]] = $xVal;
@@ -75,18 +85,23 @@ function getCmdOpt($aArg, $aSpec, &$aResult, $bExitOnError = false, $bExitOnUnkn
}
}
if (array_key_exists('help', $aResult)) showUsage($aSpec);
if ($bUnknown && $bExitOnUnknown) showUsage($aSpec, $bExitOnError, 'Unknown option \''.$bUnknown.'\'');
if (array_key_exists('help', $aResult)) {
showUsage($aSpec);
}
if ($bUnknown && $bExitOnUnknown) {
showUsage($aSpec, $bExitOnError, 'Unknown option \''.$bUnknown.'\'');
}
foreach ($aSpec as $aLine) {
if (is_array($aLine)) {
if ($aCounts[$aLine[0]] < $aLine[2]) showUsage($aSpec, $bExitOnError, 'Option \''.$aLine[0].'\' is missing');
if ($aCounts[$aLine[0]] > $aLine[3]) showUsage($aSpec, $bExitOnError, 'Option \''.$aLine[0].'\' is pressent too many times');
switch ($aLine[6]) {
case 'bool':
if (!array_key_exists($aLine[0], $aResult))
$aResult[$aLine[0]] = false;
break;
if ($aCounts[$aLine[0]] < $aLine[2]) {
showUsage($aSpec, $bExitOnError, 'Option \''.$aLine[0].'\' is missing');
}
if ($aCounts[$aLine[0]] > $aLine[3]) {
showUsage($aSpec, $bExitOnError, 'Option \''.$aLine[0].'\' is pressent too many times');
}
if ($aLine[6] == 'bool' && !array_key_exists($aLine[0], $aResult)) {
$aResult[$aLine[0]] = false;
}
}
}
@@ -109,8 +124,12 @@ function showUsage($aSpec, $bExit = false, $sError = false)
echo "\n";
}
$aNames = array();
if ($aLine[1]) $aNames[] = '-'.$aLine[1];
if ($aLine[0]) $aNames[] = '--'.$aLine[0];
if ($aLine[1]) {
$aNames[] = '-'.$aLine[1];
}
if ($aLine[0]) {
$aNames[] = '--'.$aLine[0];
}
$sName = join(', ', $aNames);
echo ' '.$sName.str_repeat(' ', 30-strlen($sName)).$aLine[7]."\n";
} else {
@@ -144,54 +163,29 @@ function repeatWarnings()
}
function runSQLScript($sScript, $bfatal = true, $bVerbose = false, $bIgnoreErrors = false)
function setupHTTPProxy()
{
// Convert database DSN to psql parameters
$aDSNInfo = \Nominatim\DB::parseDSN(CONST_Database_DSN);
if (!isset($aDSNInfo['port']) || !$aDSNInfo['port']) $aDSNInfo['port'] = 5432;
$oCmd = new \Nominatim\Shell('psql');
$oCmd->addParams('--port', $aDSNInfo['port']);
$oCmd->addParams('--dbname', $aDSNInfo['database']);
if (isset($aDSNInfo['hostspec']) && $aDSNInfo['hostspec']) {
$oCmd->addParams('--host', $aDSNInfo['hostspec']);
}
if (isset($aDSNInfo['username']) && $aDSNInfo['username']) {
$oCmd->addParams('--username', $aDSNInfo['username']);
}
if (isset($aDSNInfo['password'])) {
$oCmd->addEnvPair('PGPASSWORD', $aDSNInfo['password']);
}
if (!$bVerbose) {
$oCmd->addParams('--quiet');
}
if ($bfatal && !$bIgnoreErrors) {
$oCmd->addParams('-v', 'ON_ERROR_STOP=1');
if (!getSettingBool('HTTP_PROXY')) {
return;
}
$aDescriptors = array(
0 => array('pipe', 'r'),
1 => STDOUT,
2 => STDERR
$sProxy = 'tcp://'.getSetting('HTTP_PROXY_HOST').':'.getSetting('HTTP_PROXY_PROT');
$aHeaders = array();
$sLogin = getSetting('HTTP_PROXY_LOGIN');
$sPassword = getSetting('HTTP_PROXY_PASSWORD');
if ($sLogin && $sPassword) {
$sAuth = base64_encode($sLogin.':'.$sPassword);
$aHeaders = array('Proxy-Authorization: Basic '.$sAuth);
}
$aProxyHeader = array(
'proxy' => $sProxy,
'request_fulluri' => true,
'header' => $aHeaders
);
$ahPipes = null;
$hProcess = @proc_open($oCmd->escapedCmd(), $aDescriptors, $ahPipes, null, $oCmd->aEnv);
if (!is_resource($hProcess)) {
fail('unable to start pgsql');
}
if (!$bVerbose) {
fwrite($ahPipes[0], 'set client_min_messages to WARNING;');
}
while (strlen($sScript)) {
$iWritten = fwrite($ahPipes[0], $sScript);
if ($iWritten <= 0) break;
$sScript = substr($sScript, $iWritten);
}
fclose($ahPipes[0]);
$iReturn = proc_close($hProcess);
if ($bfatal && $iReturn > 0) {
fail("pgsql returned with error code ($iReturn)");
}
$aContext = array('http' => $aProxyHeader, 'https' => $aProxyHeader);
stream_context_set_default($aContext);
}

13
lib-php/dotenv_loader.php Normal file
View File

@@ -0,0 +1,13 @@
<?php
require('Symfony/Component/Dotenv/autoload.php');
function loadDotEnv()
{
$dotenv = new \Symfony\Component\Dotenv\Dotenv();
$dotenv->load(CONST_ConfigDir.'/env.defaults');
if (file_exists('.env')) {
$dotenv->load('.env');
}
}

5
lib-php/init-cmd.php Normal file
View File

@@ -0,0 +1,5 @@
<?php
require_once('init.php');
require_once('cmd.php');
require_once('DebugNone.php');

View File

@@ -12,7 +12,7 @@ require_once(CONST_Debug ? 'DebugHtml.php' : 'DebugNone.php');
function userError($sMsg)
{
throw new Exception($sMsg, 400);
throw new \Exception($sMsg, 400);
}
@@ -20,7 +20,7 @@ function exception_handler_json($exception)
{
http_response_code($exception->getCode());
header('Content-type: application/json; charset=utf-8');
include(CONST_BasePath.'/lib/template/error-json.php');
include(CONST_LibDir.'/template/error-json.php');
exit();
}
@@ -29,7 +29,7 @@ function exception_handler_xml($exception)
http_response_code($exception->getCode());
header('Content-type: text/xml; charset=utf-8');
echo '<?xml version="1.0" encoding="UTF-8" ?>'."\n";
include(CONST_BasePath.'/lib/template/error-xml.php');
include(CONST_LibDir.'/template/error-xml.php');
exit();
}
@@ -37,7 +37,7 @@ function shutdown_exception_handler_xml()
{
$error = error_get_last();
if ($error !== null && $error['type'] === E_ERROR) {
exception_handler_xml(new Exception($error['message'], 500));
exception_handler_xml(new \Exception($error['message'], 500));
}
}
@@ -45,7 +45,7 @@ function shutdown_exception_handler_json()
{
$error = error_get_last();
if ($error !== null && $error['type'] === E_ERROR) {
exception_handler_json(new Exception($error['message'], 500));
exception_handler_json(new \Exception($error['message'], 500));
}
}
@@ -81,6 +81,10 @@ if (CONST_NoAccessControl) {
header('Access-Control-Allow-Headers: '.$_SERVER['HTTP_ACCESS_CONTROL_REQUEST_HEADERS']);
}
}
if (isset($_SERVER['REQUEST_METHOD']) && $_SERVER['REQUEST_METHOD'] == 'OPTIONS') exit;
if (isset($_SERVER['REQUEST_METHOD']) && $_SERVER['REQUEST_METHOD'] == 'OPTIONS') {
exit;
}
if (CONST_Debug) header('Content-type: text/html; charset=utf-8');
if (CONST_Debug) {
header('Content-type: text/html; charset=utf-8');
}

4
lib-php/init.php Normal file
View File

@@ -0,0 +1,4 @@
<?php
require_once(CONST_LibDir.'/lib.php');
require_once(CONST_LibDir.'/DB.php');

View File

@@ -1,10 +1,42 @@
<?php
function loadSettings($sProjectDir)
{
@define('CONST_InstallDir', $sProjectDir);
// Temporary hack to set the direcory via environment instead of
// the installed scripts. Neither setting is part of the official
// set of settings.
defined('CONST_ConfigDir') or define('CONST_ConfigDir', $_SERVER['NOMINATIM_CONFIGDIR']);
}
function getSetting($sConfName, $sDefault = null)
{
$sValue = $_SERVER['NOMINATIM_'.$sConfName];
if ($sDefault !== null && !$sValue) {
return $sDefault;
}
return $sValue;
}
function getSettingBool($sConfName)
{
$sVal = strtolower(getSetting($sConfName));
return strcmp($sVal, 'yes') == 0
|| strcmp($sVal, 'true') == 0
|| strcmp($sVal, '1') == 0;
}
function fail($sError, $sUserError = false)
{
if (!$sUserError) $sUserError = $sError;
if (!$sUserError) {
$sUserError = $sError;
}
error_log('ERROR: '.$sError);
var_dump($sUserError)."\n";
var_dump($sUserError);
echo "\n";
exit(-1);
}
@@ -52,8 +84,9 @@ function getDatabaseDate(&$oDB)
function byImportance($a, $b)
{
if ($a['importance'] != $b['importance'])
if ($a['importance'] != $b['importance']) {
return ($a['importance'] > $b['importance']?-1:1);
}
return $a['foundorder'] <=> $b['foundorder'];
}
@@ -165,17 +198,6 @@ function parseLatLon($sQuery)
return array($sFound, $fQueryLat, $fQueryLon);
}
function createPointsAroundCenter($fLon, $fLat, $fRadius)
{
$iSteps = max(8, min(100, ($fRadius * 40000)^2));
$fStepSize = (2*pi())/$iSteps;
$aPolyPoints = array();
for ($f = 0; $f < 2*pi(); $f += $fStepSize) {
$aPolyPoints[] = array('', $fLon+($fRadius*sin($f)), $fLat+($fRadius*cos($f)) );
}
return $aPolyPoints;
}
function closestHouseNumber($aRow)
{
$fHouse = $aRow['startnumber']
@@ -196,24 +218,11 @@ function closestHouseNumber($aRow)
return max(min($aRow['endnumber'], $iHn), $aRow['startnumber']);
}
function getSearchRankLabel($iRank)
{
if (!isset($iRank)) return 'unknown';
if ($iRank < 2) return 'continent';
if ($iRank < 4) return 'sea';
if ($iRank < 8) return 'country';
if ($iRank < 12) return 'state';
if ($iRank < 16) return 'county';
if ($iRank == 16) return 'city';
if ($iRank == 17) return 'town / island';
if ($iRank == 18) return 'village / hamlet';
if ($iRank == 20) return 'suburb';
if ($iRank == 21) return 'postcode area';
if ($iRank == 22) return 'croft / farm / locality / islet';
if ($iRank == 23) return 'postcode area';
if ($iRank == 25) return 'postcode point';
if ($iRank == 26) return 'street / major landmark';
if ($iRank == 27) return 'minory street / path';
if ($iRank == 28) return 'house / building';
return 'other: ' . $iRank;
if (!function_exists('array_key_last')) {
function array_key_last(array $array)
{
if (!empty($array)) {
return key(array_slice($array, -1, 1, true));
}
}
}

View File

@@ -5,15 +5,23 @@ function logStart(&$oDB, $sType = '', $sQuery = '', $aLanguageList = array())
{
$fStartTime = microtime(true);
$aStartTime = explode('.', $fStartTime);
if (!isset($aStartTime[1])) $aStartTime[1] = '0';
if (!isset($aStartTime[1])) {
$aStartTime[1] = '0';
}
$sOutputFormat = '';
if (isset($_GET['format'])) $sOutputFormat = $_GET['format'];
if (isset($_GET['format'])) {
$sOutputFormat = $_GET['format'];
}
if ($sType == 'reverse') {
$sOutQuery = (isset($_GET['lat'])?$_GET['lat']:'').'/';
if (isset($_GET['lon'])) $sOutQuery .= $_GET['lon'];
if (isset($_GET['zoom'])) $sOutQuery .= '/'.$_GET['zoom'];
if (isset($_GET['lon'])) {
$sOutQuery .= $_GET['lon'];
}
if (isset($_GET['zoom'])) {
$sOutQuery .= '/'.$_GET['zoom'];
}
} else {
$sOutQuery = $sQuery;
}
@@ -28,13 +36,15 @@ function logStart(&$oDB, $sType = '', $sQuery = '', $aLanguageList = array())
);
if (CONST_Log_DB) {
if (isset($_GET['email']))
if (isset($_GET['email'])) {
$sUserAgent = $_GET['email'];
elseif (isset($_SERVER['HTTP_REFERER']))
} elseif (isset($_SERVER['HTTP_REFERER'])) {
$sUserAgent = $_SERVER['HTTP_REFERER'];
elseif (isset($_SERVER['HTTP_USER_AGENT']))
} elseif (isset($_SERVER['HTTP_USER_AGENT'])) {
$sUserAgent = $_SERVER['HTTP_USER_AGENT'];
else $sUserAgent = '';
} else {
$sUserAgent = '';
}
$sSQL = 'insert into new_query_log (type,starttime,query,ipaddress,useragent,language,format,searchterm)';
$sSQL .= ' values (';
$sSQL .= join(',', $oDB->getDBQuotedList(array(
@@ -60,7 +70,9 @@ function logEnd(&$oDB, $hLog, $iNumResults)
if (CONST_Log_DB) {
$aEndTime = explode('.', $fEndTime);
if (!$aEndTime[1]) $aEndTime[1] = '0';
if (!$aEndTime[1]) {
$aEndTime[1] = '0';
}
$sEndTime = date('Y-m-d H:i:s', $aEndTime[0]).'.'.$aEndTime[1];
$sSQL = 'update new_query_log set endtime = '.$oDB->getDBQuoted($sEndTime).', results = '.$iNumResults;

View File

@@ -0,0 +1,21 @@
<?php
$phpPhraseSettingsFile = $argv[1];
$jsonPhraseSettingsFile = dirname($phpPhraseSettingsFile).'/'.basename($phpPhraseSettingsFile, '.php').'.json';
if (file_exists($phpPhraseSettingsFile) && !file_exists($jsonPhraseSettingsFile)) {
include $phpPhraseSettingsFile;
$data = array();
if (isset($aTagsBlacklist)) {
$data['blackList'] = $aTagsBlacklist;
}
if (isset($aTagsWhitelist)) {
$data['whiteList'] = $aTagsWhitelist;
}
$jsonFile = fopen($jsonPhraseSettingsFile, 'w');
fwrite($jsonFile, json_encode($data));
fclose($jsonFile);
}

30
lib-php/output.php Normal file
View File

@@ -0,0 +1,30 @@
<?php
function formatOSMType($sType, $bIncludeExternal = true)
{
if ($sType == 'N') {
return 'node';
}
if ($sType == 'W') {
return 'way';
}
if ($sType == 'R') {
return 'relation';
}
if (!$bIncludeExternal) {
return '';
}
if ($sType == 'T') {
return 'way';
}
if ($sType == 'I') {
return 'way';
}
// not handled: P, L
return '';
}

19
lib-php/setup_functions.php Executable file
View File

@@ -0,0 +1,19 @@
<?php
function getOsm2pgsqlBinary()
{
$sBinary = getSetting('OSM2PGSQL_BINARY');
return $sBinary ? $sBinary : CONST_Default_Osm2pgsql;
}
function getImportStyle()
{
$sStyle = getSetting('IMPORT_STYLE');
if (in_array($sStyle, array('admin', 'street', 'address', 'full', 'extratags'))) {
return CONST_ConfigDir.'/import-'.$sStyle.'.style';
}
return $sStyle;
}

View File

@@ -5,9 +5,11 @@
$aFilteredPlaces = array();
if (empty($aPlace)) {
if (isset($sError))
if (isset($sError)) {
$aFilteredPlaces['error'] = $sError;
else $aFilteredPlaces['error'] = 'Unable to geocode';
} else {
$aFilteredPlaces['error'] = 'Unable to geocode';
}
javascript_renderData($aFilteredPlaces);
} else {
$aFilteredPlaces = array(
@@ -17,7 +19,9 @@ if (empty($aPlace)) {
)
);
if (isset($aPlace['place_id'])) $aFilteredPlaces['properties']['geocoding']['place_id'] = $aPlace['place_id'];
if (isset($aPlace['place_id'])) {
$aFilteredPlaces['properties']['geocoding']['place_id'] = $aPlace['place_id'];
}
$sOSMType = formatOSMType($aPlace['osm_type']);
if ($sOSMType) {
$aFilteredPlaces['properties']['geocoding']['osm_type'] = $sOSMType;

View File

@@ -3,9 +3,11 @@
$aFilteredPlaces = array();
if (empty($aPlace)) {
if (isset($sError))
if (isset($sError)) {
$aFilteredPlaces['error'] = $sError;
else $aFilteredPlaces['error'] = 'Unable to geocode';
} else {
$aFilteredPlaces['error'] = 'Unable to geocode';
}
javascript_renderData($aFilteredPlaces);
} else {
$aFilteredPlaces = array(
@@ -13,7 +15,9 @@ if (empty($aPlace)) {
'properties' => array()
);
if (isset($aPlace['place_id'])) $aFilteredPlaces['properties']['place_id'] = $aPlace['place_id'];
if (isset($aPlace['place_id'])) {
$aFilteredPlaces['properties']['place_id'] = $aPlace['place_id'];
}
$sOSMType = formatOSMType($aPlace['osm_type']);
if ($sOSMType) {
$aFilteredPlaces['properties']['osm_type'] = $sOSMType;
@@ -36,8 +40,12 @@ if (empty($aPlace)) {
if (isset($aPlace['address'])) {
$aFilteredPlaces['properties']['address'] = $aPlace['address']->getAddressNames();
}
if (isset($aPlace['sExtraTags'])) $aFilteredPlaces['properties']['extratags'] = $aPlace['sExtraTags'];
if (isset($aPlace['sNameDetails'])) $aFilteredPlaces['properties']['namedetails'] = $aPlace['sNameDetails'];
if (isset($aPlace['sExtraTags'])) {
$aFilteredPlaces['properties']['extratags'] = $aPlace['sExtraTags'];
}
if (isset($aPlace['sNameDetails'])) {
$aFilteredPlaces['properties']['namedetails'] = $aPlace['sNameDetails'];
}
if (isset($aPlace['aBoundingBox'])) {
$aFilteredPlaces['bbox'] = array(

View File

@@ -3,19 +3,27 @@
$aFilteredPlaces = array();
if (empty($aPlace)) {
if (isset($sError))
if (isset($sError)) {
$aFilteredPlaces['error'] = $sError;
else $aFilteredPlaces['error'] = 'Unable to geocode';
} else {
$aFilteredPlaces['error'] = 'Unable to geocode';
}
} else {
if (isset($aPlace['place_id'])) $aFilteredPlaces['place_id'] = $aPlace['place_id'];
if (isset($aPlace['place_id'])) {
$aFilteredPlaces['place_id'] = $aPlace['place_id'];
}
$aFilteredPlaces['licence'] = 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright';
$sOSMType = formatOSMType($aPlace['osm_type']);
if ($sOSMType) {
$aFilteredPlaces['osm_type'] = $sOSMType;
$aFilteredPlaces['osm_id'] = $aPlace['osm_id'];
}
if (isset($aPlace['lat'])) $aFilteredPlaces['lat'] = $aPlace['lat'];
if (isset($aPlace['lon'])) $aFilteredPlaces['lon'] = $aPlace['lon'];
if (isset($aPlace['lat'])) {
$aFilteredPlaces['lat'] = $aPlace['lat'];
}
if (isset($aPlace['lon'])) {
$aFilteredPlaces['lon'] = $aPlace['lon'];
}
if ($sOutputFormat == 'jsonv2' || $sOutputFormat == 'geojson') {
$aFilteredPlaces['place_rank'] = $aPlace['rank_search'];
@@ -35,8 +43,12 @@ if (empty($aPlace)) {
if (isset($aPlace['address'])) {
$aFilteredPlaces['address'] = $aPlace['address']->getAddressNames();
}
if (isset($aPlace['sExtraTags'])) $aFilteredPlaces['extratags'] = $aPlace['sExtraTags'];
if (isset($aPlace['sNameDetails'])) $aFilteredPlaces['namedetails'] = $aPlace['sNameDetails'];
if (isset($aPlace['sExtraTags'])) {
$aFilteredPlaces['extratags'] = $aPlace['sExtraTags'];
}
if (isset($aPlace['sNameDetails'])) {
$aFilteredPlaces['namedetails'] = $aPlace['sNameDetails'];
}
if (isset($aPlace['aBoundingBox'])) {
$aFilteredPlaces['boundingbox'] = $aPlace['aBoundingBox'];

View File

@@ -12,17 +12,29 @@ echo " querystring='".htmlspecialchars($_SERVER['QUERY_STRING'], ENT_QUOTES)."'"
echo ">\n";
if (empty($aPlace)) {
if (isset($sError))
if (isset($sError)) {
echo "<error>$sError</error>";
else echo '<error>Unable to geocode</error>';
} else {
echo '<error>Unable to geocode</error>';
}
} else {
echo '<result';
if ($aPlace['place_id']) echo ' place_id="'.$aPlace['place_id'].'"';
if ($aPlace['place_id']) {
echo ' place_id="'.$aPlace['place_id'].'"';
}
$sOSMType = formatOSMType($aPlace['osm_type']);
if ($sOSMType) echo ' osm_type="'.$sOSMType.'"'.' osm_id="'.$aPlace['osm_id'].'"';
if ($aPlace['ref']) echo ' ref="'.htmlspecialchars($aPlace['ref']).'"';
if (isset($aPlace['lat'])) echo ' lat="'.htmlspecialchars($aPlace['lat']).'"';
if (isset($aPlace['lon'])) echo ' lon="'.htmlspecialchars($aPlace['lon']).'"';
if ($sOSMType) {
echo ' osm_type="'.$sOSMType.'"'.' osm_id="'.$aPlace['osm_id'].'"';
}
if ($aPlace['ref']) {
echo ' ref="'.htmlspecialchars($aPlace['ref']).'"';
}
if (isset($aPlace['lat'])) {
echo ' lat="'.htmlspecialchars($aPlace['lat']).'"';
}
if (isset($aPlace['lon'])) {
echo ' lon="'.htmlspecialchars($aPlace['lon']).'"';
}
if (isset($aPlace['aBoundingBox'])) {
echo ' boundingbox="';
echo join(',', $aPlace['aBoundingBox']);

View File

@@ -43,29 +43,26 @@ $aPlaceDetails['centroid'] = array(
$aPlaceDetails['geometry'] = json_decode($aPointDetails['asgeojson']);
$funcMapAddressLine = function ($aFull) {
$aMapped = array(
'localname' => $aFull['localname'],
'place_id' => isset($aFull['place_id']) ? (int) $aFull['place_id'] : null,
'osm_id' => isset($aFull['osm_id']) ? (int) $aFull['osm_id'] : null,
'osm_type' => isset($aFull['osm_type']) ? $aFull['osm_type'] : null,
'place_type' => isset($aFull['place_type']) ? $aFull['place_type'] : null,
'class' => $aFull['class'],
'type' => $aFull['type'],
'admin_level' => isset($aFull['admin_level']) ? (int) $aFull['admin_level'] : null,
'rank_address' => $aFull['rank_address'] ? (int) $aFull['rank_address'] : null,
'distance' => (float) $aFull['distance'],
'isaddress' => isset($aFull['isaddress']) ? (bool) $aFull['isaddress'] : null
);
return $aMapped;
return array(
'localname' => $aFull['localname'],
'place_id' => isset($aFull['place_id']) ? (int) $aFull['place_id'] : null,
'osm_id' => isset($aFull['osm_id']) ? (int) $aFull['osm_id'] : null,
'osm_type' => isset($aFull['osm_type']) ? $aFull['osm_type'] : null,
'place_type' => isset($aFull['place_type']) ? $aFull['place_type'] : null,
'class' => $aFull['class'],
'type' => $aFull['type'],
'admin_level' => isset($aFull['admin_level']) ? (int) $aFull['admin_level'] : null,
'rank_address' => $aFull['rank_address'] ? (int) $aFull['rank_address'] : null,
'distance' => (float) $aFull['distance'],
'isaddress' => isset($aFull['isaddress']) ? (bool) $aFull['isaddress'] : null
);
};
$funcMapKeyword = function ($aFull) {
$aMapped = array(
'id' => (int) $aFull['word_id'],
'token' => $aFull['word_token']
);
return $aMapped;
return array(
'id' => (int) $aFull['word_id'],
'token' => $aFull['word_token']
);
};
if ($aAddressLines) {
@@ -81,10 +78,14 @@ if ($bIncludeKeywords) {
if ($aPlaceSearchNameKeywords) {
$aPlaceDetails['keywords']['name'] = array_map($funcMapKeyword, $aPlaceSearchNameKeywords);
} else {
$aPlaceDetails['keywords']['name'] = array();
}
if ($aPlaceSearchAddressKeywords) {
$aPlaceDetails['keywords']['address'] = array_map($funcMapKeyword, $aPlaceSearchAddressKeywords);
} else {
$aPlaceDetails['keywords']['address'] = array();
}
}
@@ -92,11 +93,15 @@ if ($bIncludeHierarchy) {
if ($bGroupHierarchy) {
$aPlaceDetails['hierarchy'] = array();
foreach ($aHierarchyLines as $aAddressLine) {
if ($aAddressLine['type'] == 'yes') $sType = $aAddressLine['class'];
else $sType = $aAddressLine['type'];
if ($aAddressLine['type'] == 'yes') {
$sType = $aAddressLine['class'];
} else {
$sType = $aAddressLine['type'];
}
if (!isset($aPlaceDetails['hierarchy'][$sType]))
if (!isset($aPlaceDetails['hierarchy'][$sType])) {
$aPlaceDetails['hierarchy'][$sType] = array();
}
$aPlaceDetails['hierarchy'][$sType][] = $funcMapAddressLine($aAddressLine);
}
} else {

View File

@@ -8,4 +8,4 @@
$error['details'] = $exception->getFile() . '('. $exception->getLine() . ')';
}
echo javascript_renderData(array('error' => $error));
javascript_renderData(array('error' => $error));

View File

@@ -5,7 +5,9 @@ $aOutput['licence'] = 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm
$aOutput['batch'] = array();
foreach ($aBatchResults as $aSearchResults) {
if (!$aSearchResults) $aSearchResults = array();
if (!$aSearchResults) {
$aSearchResults = array();
}
$aFilteredPlaces = array();
foreach ($aSearchResults as $iResNum => $aPointDetails) {
$aPlace = array(

View File

@@ -9,7 +9,9 @@ foreach ($aSearchResults as $iResNum => $aPointDetails) {
)
);
if (isset($aPointDetails['place_id'])) $aPlace['properties']['geocoding']['place_id'] = $aPointDetails['place_id'];
if (isset($aPointDetails['place_id'])) {
$aPlace['properties']['geocoding']['place_id'] = $aPointDetails['place_id'];
}
$sOSMType = formatOSMType($aPointDetails['osm_type']);
if ($sOSMType) {
$aPlace['properties']['geocoding']['osm_type'] = $sOSMType;

View File

@@ -8,7 +8,7 @@ foreach ($aSearchResults as $iResNum => $aPointDetails) {
'place_id'=>$aPointDetails['place_id'],
)
);
$sOSMType = formatOSMType($aPointDetails['osm_type']);
if ($sOSMType) {
$aPlace['properties']['osm_type'] = $sOSMType;
@@ -58,8 +58,12 @@ foreach ($aSearchResults as $iResNum => $aPointDetails) {
}
if (isset($aPointDetails['sExtraTags'])) $aPlace['properties']['extratags'] = $aPointDetails['sExtraTags'];
if (isset($aPointDetails['sNameDetails'])) $aPlace['properties']['namedetails'] = $aPointDetails['sNameDetails'];
if (isset($aPointDetails['sExtraTags'])) {
$aPlace['properties']['extratags'] = $aPointDetails['sExtraTags'];
}
if (isset($aPointDetails['sNameDetails'])) {
$aPlace['properties']['namedetails'] = $aPointDetails['sNameDetails'];
}
$aFilteredPlaces[] = $aPlace;
}

View File

@@ -6,7 +6,7 @@ foreach ($aSearchResults as $iResNum => $aPointDetails) {
'place_id'=>$aPointDetails['place_id'],
'licence'=>'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
);
$sOSMType = formatOSMType($aPointDetails['osm_type']);
if ($sOSMType) {
$aPlace['osm_type'] = $sOSMType;
@@ -60,8 +60,12 @@ foreach ($aSearchResults as $iResNum => $aPointDetails) {
$aPlace['geokml'] = $aPointDetails['askml'];
}
if (isset($aPointDetails['sExtraTags'])) $aPlace['extratags'] = $aPointDetails['sExtraTags'];
if (isset($aPointDetails['sNameDetails'])) $aPlace['namedetails'] = $aPointDetails['sNameDetails'];
if (isset($aPointDetails['sExtraTags'])) {
$aPlace['extratags'] = $aPointDetails['sExtraTags'];
}
if (isset($aPointDetails['sNameDetails'])) {
$aPlace['namedetails'] = $aPointDetails['sNameDetails'];
}
$aFilteredPlaces[] = $aPlace;
}

View File

@@ -10,7 +10,9 @@ echo (isset($sXmlRootTag)?$sXmlRootTag:'searchresults');
echo " timestamp='".date(DATE_RFC822)."'";
echo " attribution='Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright'";
echo " querystring='".htmlspecialchars($sQuery, ENT_QUOTES)."'";
if (isset($aMoreParams['viewbox'])) echo " viewbox='".htmlspecialchars($aMoreParams['viewbox'], ENT_QUOTES)."'";
if (isset($aMoreParams['viewbox'])) {
echo " viewbox='".htmlspecialchars($aMoreParams['viewbox'], ENT_QUOTES)."'";
}
if (isset($aMoreParams['exclude_place_ids'])) {
echo " exclude_place_ids='".htmlspecialchars($aMoreParams['exclude_place_ids'])."'";
}

View File

@@ -0,0 +1,246 @@
<?php
namespace Nominatim;
class Tokenizer
{
private $oDB;
private $oNormalizer;
private $oTransliterator;
private $aCountryRestriction;
public function __construct(&$oDB)
{
$this->oDB =& $oDB;
$this->oNormalizer = \Transliterator::createFromRules(CONST_Term_Normalization_Rules);
$this->oTransliterator = \Transliterator::createFromRules(CONST_Transliteration);
}
public function checkStatus()
{
$sSQL = 'SELECT word_id FROM word limit 1';
$iWordID = $this->oDB->getOne($sSQL);
if ($iWordID === false) {
throw new \Exception('Query failed', 703);
}
if (!$iWordID) {
throw new \Exception('No value', 704);
}
}
public function setCountryRestriction($aCountries)
{
$this->aCountryRestriction = $aCountries;
}
public function normalizeString($sTerm)
{
if ($this->oNormalizer === null) {
return $sTerm;
}
return $this->oNormalizer->transliterate($sTerm);
}
private function makeStandardWord($sTerm)
{
return trim($this->oTransliterator->transliterate(' '.$sTerm.' '));
}
public function tokensForSpecialTerm($sTerm)
{
$aResults = array();
$sSQL = "SELECT word_id, info->>'class' as class, info->>'type' as type ";
$sSQL .= ' FROM word WHERE word_token = :term and type = \'S\'';
Debug::printVar('Term', $sTerm);
Debug::printSQL($sSQL);
$aSearchWords = $this->oDB->getAll($sSQL, array(':term' => $this->makeStandardWord($sTerm)));
Debug::printVar('Results', $aSearchWords);
foreach ($aSearchWords as $aSearchTerm) {
$aResults[] = new \Nominatim\Token\SpecialTerm(
$aSearchTerm['word_id'],
$aSearchTerm['class'],
$aSearchTerm['type'],
\Nominatim\Operator::TYPE
);
}
Debug::printVar('Special term tokens', $aResults);
return $aResults;
}
public function extractTokensFromPhrases(&$aPhrases)
{
$sNormQuery = '';
$aWordLists = array();
$aTokens = array();
foreach ($aPhrases as $iPhrase => $oPhrase) {
$sNormQuery .= ','.$this->normalizeString($oPhrase->getPhrase());
$sPhrase = $this->makeStandardWord($oPhrase->getPhrase());
Debug::printVar('Phrase', $sPhrase);
if (strlen($sPhrase) > 0) {
$aWords = explode(' ', $sPhrase);
Tokenizer::addTokens($aTokens, $aWords);
$aWordLists[] = $aWords;
} else {
$aWordLists[] = array();
}
}
Debug::printVar('Tokens', $aTokens);
Debug::printVar('WordLists', $aWordLists);
$oValidTokens = $this->computeValidTokens($aTokens, $sNormQuery);
foreach ($aPhrases as $iPhrase => $oPhrase) {
$oPhrase->computeWordSets($aWordLists[$iPhrase], $oValidTokens);
}
return $oValidTokens;
}
private function computeValidTokens($aTokens, $sNormQuery)
{
$oValidTokens = new TokenList();
if (!empty($aTokens)) {
$this->addTokensFromDB($oValidTokens, $aTokens, $sNormQuery);
// Try more interpretations for Tokens that could not be matched.
foreach ($aTokens as $sToken) {
if ($sToken[0] != ' ' && !$oValidTokens->contains($sToken)) {
if (preg_match('/^([0-9]{5}) [0-9]{4}$/', $sToken, $aData)) {
// US ZIP+4 codes - merge in the 5-digit ZIP code
$oValidTokens->addToken(
$sToken,
new Token\Postcode(null, $aData[1], 'us')
);
} elseif (preg_match('/^[0-9]+$/', $sToken)) {
// Unknown single word token with a number.
// Assume it is a house number.
$oValidTokens->addToken(
$sToken,
new Token\HouseNumber(null, trim($sToken))
);
}
}
}
}
return $oValidTokens;
}
private function addTokensFromDB(&$oValidTokens, $aTokens, $sNormQuery)
{
// Check which tokens we have, get the ID numbers
$sSQL = 'SELECT word_id, word_token, type, word,';
$sSQL .= " info->>'op' as operator,";
$sSQL .= " info->>'class' as class, info->>'type' as ctype,";
$sSQL .= " info->>'count' as count";
$sSQL .= ' FROM word WHERE word_token in (';
$sSQL .= join(',', $this->oDB->getDBQuotedList($aTokens)).')';
Debug::printSQL($sSQL);
$aDBWords = $this->oDB->getAll($sSQL, null, 'Could not get word tokens.');
foreach ($aDBWords as $aWord) {
$iId = (int) $aWord['word_id'];
$sTok = $aWord['word_token'];
switch ($aWord['type']) {
case 'C': // country name tokens
if ($aWord['word'] !== null
&& (!$this->aCountryRestriction
|| in_array($aWord['word'], $this->aCountryRestriction))
) {
$oValidTokens->addToken(
$sTok,
new Token\Country($iId, $aWord['word'])
);
}
break;
case 'H': // house number tokens
$oValidTokens->addToken($sTok, new Token\HouseNumber($iId, $aWord['word_token']));
break;
case 'P': // postcode tokens
// Postcodes are not normalized, so they may have content
// that makes SQL injection possible. Reject postcodes
// that would need special escaping.
if ($aWord['word'] !== null
&& pg_escape_string($aWord['word']) == $aWord['word']
) {
$sNormPostcode = $this->normalizeString($aWord['word']);
if (strpos($sNormQuery, $sNormPostcode) !== false) {
$oValidTokens->addToken(
$sTok,
new Token\Postcode($iId, $aWord['word'], null)
);
}
}
break;
case 'S': // tokens for classification terms (special phrases)
if ($aWord['class'] !== null && $aWord['ctype'] !== null) {
$oValidTokens->addToken($sTok, new Token\SpecialTerm(
$iId,
$aWord['class'],
$aWord['ctype'],
(isset($aWord['operator'])) ? Operator::NEAR : Operator::NONE
));
}
break;
case 'W': // full-word tokens
$oValidTokens->addToken($sTok, new Token\Word(
$iId,
(int) $aWord['count'],
substr_count($aWord['word_token'], ' ')
));
break;
case 'w': // partial word terms
$oValidTokens->addToken($sTok, new Token\Partial(
$iId,
$aWord['word_token'],
(int) $aWord['count']
));
break;
default:
break;
}
}
}
/**
* Add the tokens from this phrase to the given list of tokens.
*
* @param string[] $aTokens List of tokens to append.
*
* @return void
*/
private static function addTokens(&$aTokens, $aWords)
{
$iNumWords = count($aWords);
for ($i = 0; $i < $iNumWords; $i++) {
$sPhrase = $aWords[$i];
$aTokens[$sPhrase] = $sPhrase;
for ($j = $i + 1; $j < $iNumWords; $j++) {
$sPhrase .= ' '.$aWords[$j];
$aTokens[$sPhrase] = $sPhrase;
}
}
}
}

View File

@@ -0,0 +1,266 @@
<?php
namespace Nominatim;
class Tokenizer
{
private $oDB;
private $oNormalizer = null;
private $aCountryRestriction = null;
public function __construct(&$oDB)
{
$this->oDB =& $oDB;
$this->oNormalizer = \Transliterator::createFromRules(CONST_Term_Normalization_Rules);
}
public function checkStatus()
{
$sStandardWord = $this->oDB->getOne("SELECT make_standard_name('a')");
if ($sStandardWord === false) {
throw new \Exception('Module failed', 701);
}
if ($sStandardWord != 'a') {
throw new \Exception('Module call failed', 702);
}
$sSQL = "SELECT word_id FROM word WHERE word_token IN (' a')";
$iWordID = $this->oDB->getOne($sSQL);
if ($iWordID === false) {
throw new \Exception('Query failed', 703);
}
if (!$iWordID) {
throw new \Exception('No value', 704);
}
}
public function setCountryRestriction($aCountries)
{
$this->aCountryRestriction = $aCountries;
}
public function normalizeString($sTerm)
{
if ($this->oNormalizer === null) {
return $sTerm;
}
return $this->oNormalizer->transliterate($sTerm);
}
public function tokensForSpecialTerm($sTerm)
{
$aResults = array();
$sSQL = 'SELECT word_id, class, type FROM word ';
$sSQL .= ' WHERE word_token = \' \' || make_standard_name(:term)';
$sSQL .= ' AND class is not null AND class not in (\'place\')';
Debug::printVar('Term', $sTerm);
Debug::printSQL($sSQL);
$aSearchWords = $this->oDB->getAll($sSQL, array(':term' => $sTerm));
Debug::printVar('Results', $aSearchWords);
foreach ($aSearchWords as $aSearchTerm) {
$aResults[] = new \Nominatim\Token\SpecialTerm(
$aSearchTerm['word_id'],
$aSearchTerm['class'],
$aSearchTerm['type'],
\Nominatim\Operator::TYPE
);
}
Debug::printVar('Special term tokens', $aResults);
return $aResults;
}
public function extractTokensFromPhrases(&$aPhrases)
{
// First get the normalized version of all phrases
$sNormQuery = '';
$sSQL = 'SELECT ';
$aParams = array();
foreach ($aPhrases as $iPhrase => $oPhrase) {
$sNormQuery .= ','.$this->normalizeString($oPhrase->getPhrase());
$sSQL .= 'make_standard_name(:' .$iPhrase.') as p'.$iPhrase.',';
$aParams[':'.$iPhrase] = $oPhrase->getPhrase();
}
$sSQL = substr($sSQL, 0, -1);
Debug::printSQL($sSQL);
Debug::printVar('SQL parameters', $aParams);
$aNormPhrases = $this->oDB->getRow($sSQL, $aParams);
Debug::printVar('SQL result', $aNormPhrases);
// now compute all possible tokens
$aWordLists = array();
$aTokens = array();
foreach ($aNormPhrases as $sPhrase) {
if (strlen($sPhrase) > 0) {
$aWords = explode(' ', $sPhrase);
Tokenizer::addTokens($aTokens, $aWords);
$aWordLists[] = $aWords;
} else {
$aWordLists[] = array();
}
}
Debug::printVar('Tokens', $aTokens);
Debug::printVar('WordLists', $aWordLists);
$oValidTokens = $this->computeValidTokens($aTokens, $sNormQuery);
foreach ($aPhrases as $iPhrase => $oPhrase) {
$oPhrase->computeWordSets($aWordLists[$iPhrase], $oValidTokens);
}
return $oValidTokens;
}
private function computeValidTokens($aTokens, $sNormQuery)
{
$oValidTokens = new TokenList();
if (!empty($aTokens)) {
$this->addTokensFromDB($oValidTokens, $aTokens, $sNormQuery);
// Try more interpretations for Tokens that could not be matched.
foreach ($aTokens as $sToken) {
if ($sToken[0] != ' ' && !$oValidTokens->contains($sToken)) {
if (preg_match('/^([0-9]{5}) [0-9]{4}$/', $sToken, $aData)) {
// US ZIP+4 codes - merge in the 5-digit ZIP code
$oValidTokens->addToken(
$sToken,
new Token\Postcode(null, $aData[1], 'us')
);
} elseif (preg_match('/^[0-9]+$/', $sToken)) {
// Unknown single word token with a number.
// Assume it is a house number.
$oValidTokens->addToken(
$sToken,
new Token\HouseNumber(null, trim($sToken))
);
}
}
}
}
return $oValidTokens;
}
private function addTokensFromDB(&$oValidTokens, $aTokens, $sNormQuery)
{
// Check which tokens we have, get the ID numbers
$sSQL = 'SELECT word_id, word_token, word, class, type, country_code,';
$sSQL .= ' operator, coalesce(search_name_count, 0) as count';
$sSQL .= ' FROM word WHERE word_token in (';
$sSQL .= join(',', $this->oDB->getDBQuotedList($aTokens)).')';
Debug::printSQL($sSQL);
$aDBWords = $this->oDB->getAll($sSQL, null, 'Could not get word tokens.');
foreach ($aDBWords as $aWord) {
$oToken = null;
$iId = (int) $aWord['word_id'];
if ($aWord['class']) {
// Special terms need to appear in their normalized form.
// (postcodes are not normalized in the word table)
$sNormWord = $this->normalizeString($aWord['word']);
if ($aWord['word'] && strpos($sNormQuery, $sNormWord) === false) {
continue;
}
if ($aWord['class'] == 'place' && $aWord['type'] == 'house') {
$oToken = new Token\HouseNumber($iId, trim($aWord['word_token']));
} elseif ($aWord['class'] == 'place' && $aWord['type'] == 'postcode') {
if ($aWord['word']
&& pg_escape_string($aWord['word']) == $aWord['word']
) {
$oToken = new Token\Postcode(
$iId,
$aWord['word'],
$aWord['country_code']
);
}
} else {
// near and in operator the same at the moment
$oToken = new Token\SpecialTerm(
$iId,
$aWord['class'],
$aWord['type'],
$aWord['operator'] ? Operator::NEAR : Operator::NONE
);
}
} elseif ($aWord['country_code']) {
// Filter country tokens that do not match restricted countries.
if (!$this->aCountryRestriction
|| in_array($aWord['country_code'], $this->aCountryRestriction)
) {
$oToken = new Token\Country($iId, $aWord['country_code']);
}
} elseif ($aWord['word_token'][0] == ' ') {
$oToken = new Token\Word(
$iId,
(int) $aWord['count'],
substr_count($aWord['word_token'], ' ')
);
// For backward compatibility: ignore all partial tokens with more
// than one word.
} elseif (strpos($aWord['word_token'], ' ') === false) {
$oToken = new Token\Partial(
$iId,
$aWord['word_token'],
(int) $aWord['count']
);
}
if ($oToken) {
// remove any leading spaces
if ($aWord['word_token'][0] == ' ') {
$oValidTokens->addToken(substr($aWord['word_token'], 1), $oToken);
} else {
$oValidTokens->addToken($aWord['word_token'], $oToken);
}
}
}
}
/**
* Add the tokens from this phrase to the given list of tokens.
*
* @param string[] $aTokens List of tokens to append.
*
* @return void
*/
private static function addTokens(&$aTokens, $aWords)
{
$iNumWords = count($aWords);
for ($i = 0; $i < $iNumWords; $i++) {
$sPhrase = $aWords[$i];
$aTokens[' '.$sPhrase] = ' '.$sPhrase;
$aTokens[$sPhrase] = $sPhrase;
for ($j = $i + 1; $j < $iNumWords; $j++) {
$sPhrase .= ' '.$aWords[$j];
$aTokens[' '.$sPhrase] = ' '.$sPhrase;
$aTokens[$sPhrase] = $sPhrase;
}
}
}
}

View File

@@ -1,15 +1,15 @@
<?php
require_once(CONST_BasePath.'/lib/init-website.php');
require_once(CONST_BasePath.'/lib/log.php');
require_once(CONST_BasePath.'/lib/output.php');
require_once(CONST_LibDir.'/init-website.php');
require_once(CONST_LibDir.'/log.php');
require_once(CONST_LibDir.'/output.php');
ini_set('memory_limit', '200M');
$oParams = new Nominatim\ParameterParser();
$sOutputFormat = $oParams->getSet('format', array('json'), 'json');
set_exception_handler_by_format($sOutputFormat);
$oDB = new Nominatim\DB();
$oDB = new Nominatim\DB(CONST_Database_DSN);
$oDB->connect();
$sSQL = 'select placex.place_id, country_code,';

View File

@@ -1,9 +1,9 @@
<?php
require_once(CONST_BasePath.'/lib/init-website.php');
require_once(CONST_BasePath.'/lib/log.php');
require_once(CONST_BasePath.'/lib/output.php');
require_once(CONST_BasePath.'/lib/AddressDetails.php');
require_once(CONST_LibDir.'/init-website.php');
require_once(CONST_LibDir.'/log.php');
require_once(CONST_LibDir.'/output.php');
require_once(CONST_LibDir.'/AddressDetails.php');
ini_set('memory_limit', '200M');
$oParams = new Nominatim\ParameterParser();
@@ -25,7 +25,7 @@ $bIncludeHierarchy = $oParams->getBool('hierarchy', false);
$bGroupHierarchy = $oParams->getBool('group_hierarchy', false);
$bIncludePolygonAsGeoJSON = $oParams->getBool('polygon_geojson', false);
$oDB = new Nominatim\DB();
$oDB = new Nominatim\DB(CONST_Database_DSN);
$oDB->connect();
$sLanguagePrefArraySQL = $oDB->getArraySQL($oDB->getDBQuotedList($aLangPrefOrder));
@@ -53,7 +53,7 @@ if ($sOsmType && $iOsmId > 0) {
// Be nice about our error messages for broken geometry
if (!$sPlaceId) {
if (!$sPlaceId && $oDB->tableExists('import_polygon_error')) {
$sSQL = 'SELECT ';
$sSQL .= ' osm_type, ';
$sSQL .= ' osm_id, ';
@@ -77,33 +77,39 @@ if ($sOsmType && $iOsmId > 0) {
$aPointDetails['error_x'] = 0;
$aPointDetails['error_y'] = 0;
}
include(CONST_BasePath.'/lib/template/details-error-'.$sOutputFormat.'.php');
include(CONST_LibDir.'/template/details-error-'.$sOutputFormat.'.php');
exit;
}
}
if ($sPlaceId === false) {
throw new \Exception('No place with that OSM ID found.', 404);
}
} else {
if ($sPlaceId === false) {
userError('Required parameters missing. Need either osmtype/osmid or place_id.');
}
}
if ($sPlaceId === false) userError('Please select a place id');
$iPlaceID = (int)$sPlaceId;
if (CONST_Use_US_Tiger_Data) {
$iParentPlaceID = $oDB->getOne('SELECT parent_place_id FROM location_property_tiger WHERE place_id = '.$iPlaceID);
if ($iParentPlaceID) $iPlaceID = $iParentPlaceID;
if ($iParentPlaceID) {
$iPlaceID = $iParentPlaceID;
}
}
// interpolated house numbers
$iParentPlaceID = $oDB->getOne('SELECT parent_place_id FROM location_property_osmline WHERE place_id = '.$iPlaceID);
if ($iParentPlaceID) $iPlaceID = $iParentPlaceID;
if ($iParentPlaceID) {
$iPlaceID = $iParentPlaceID;
}
// artificial postcodes
$iParentPlaceID = $oDB->getOne('SELECT parent_place_id FROM location_postcode WHERE place_id = '.$iPlaceID);
if ($iParentPlaceID) $iPlaceID = $iParentPlaceID;
if (CONST_Use_Aux_Location_data) {
$iParentPlaceID = $oDB->getOne('SELECT parent_place_id FROM location_property_aux WHERE place_id = '.$iPlaceID);
if ($iParentPlaceID) $iPlaceID = $iParentPlaceID;
if ($iParentPlaceID) {
$iPlaceID = $iParentPlaceID;
}
$hLog = logStart($oDB, 'details', $_SERVER['QUERY_STRING'], $aLangPrefOrder);
@@ -140,11 +146,10 @@ $sSQL .= " WHERE place_id = $iPlaceID";
$aPointDetails = $oDB->getRow($sSQL, null, 'Could not get details of place object.');
if (!$aPointDetails) {
userError('Unknown place id.');
throw new \Exception('No place with that place ID found.', 404);
}
$aPointDetails['localname'] = $aPointDetails['localname']?$aPointDetails['localname']:$aPointDetails['housenumber'];
$aPointDetails['rank_search_label'] = getSearchRankLabel($aPointDetails['rank_search']); // only used in HTML format
// Get all alternative names (languages, etc)
$sSQL = 'SELECT (each(name)).key,(each(name)).value FROM placex ';
@@ -247,4 +252,4 @@ if ($bIncludeKeywords) {
logEnd($oDB, $hLog, 1);
include(CONST_BasePath.'/lib/template/details-'.$sOutputFormat.'.php');
include(CONST_LibDir.'/template/details-'.$sOutputFormat.'.php');

View File

@@ -1,9 +1,9 @@
<?php
require_once(CONST_BasePath.'/lib/init-website.php');
require_once(CONST_BasePath.'/lib/log.php');
require_once(CONST_BasePath.'/lib/PlaceLookup.php');
require_once(CONST_BasePath.'/lib/output.php');
require_once(CONST_LibDir.'/init-website.php');
require_once(CONST_LibDir.'/log.php');
require_once(CONST_LibDir.'/PlaceLookup.php');
require_once(CONST_LibDir.'/output.php');
ini_set('memory_limit', '200M');
$oParams = new Nominatim\ParameterParser();
@@ -15,7 +15,7 @@ set_exception_handler_by_format($sOutputFormat);
// Preferred language
$aLangPrefOrder = $oParams->getPreferredLanguages();
$oDB = new Nominatim\DB();
$oDB = new Nominatim\DB(CONST_Database_DSN);
$oDB->connect();
$hLog = logStart($oDB, 'place', $_SERVER['QUERY_STRING'], $aLangPrefOrder);
@@ -35,8 +35,10 @@ if (count($aOsmIds) > CONST_Places_Max_ID_count) {
foreach ($aOsmIds as $sItem) {
// Skip empty sItem
if (empty($sItem)) continue;
if (empty($sItem)) {
continue;
}
$sType = $sItem[0];
$iId = (int) substr($sItem, 1);
if ($iId > 0 && ($sType == 'N' || $sType == 'W' || $sType == 'R')) {
@@ -48,7 +50,9 @@ foreach ($aOsmIds as $sItem) {
// key names
$oResult = $oPlace;
unset($oResult['aAddress']);
if (isset($oPlace['aAddress'])) $oResult['address'] = $oPlace['aAddress'];
if (isset($oPlace['aAddress'])) {
$oResult['address'] = $oPlace['aAddress'];
}
if ($sOutputFormat != 'geocodejson') {
unset($oResult['langaddress']);
$oResult['name'] = $oPlace['langaddress'];
@@ -71,7 +75,9 @@ foreach ($aOsmIds as $sItem) {
}
if (CONST_Debug) exit;
if (CONST_Debug) {
exit;
}
$sXmlRootTag = 'lookupresults';
$sQuery = join(',', $aCleanedQueryParts);
@@ -84,4 +90,4 @@ $sMoreURL = '';
logEnd($oDB, $hLog, 1);
$sOutputTemplate = ($sOutputFormat == 'jsonv2') ? 'json' : $sOutputFormat;
include(CONST_BasePath.'/lib/template/search-'.$sOutputTemplate.'.php');
include(CONST_LibDir.'/template/search-'.$sOutputTemplate.'.php');

View File

@@ -1,8 +1,8 @@
<?php
require_once(CONST_BasePath.'/lib/init-website.php');
require_once(CONST_BasePath.'/lib/log.php');
require_once(CONST_BasePath.'/lib/output.php');
require_once(CONST_LibDir.'/init-website.php');
require_once(CONST_LibDir.'/log.php');
require_once(CONST_LibDir.'/output.php');
ini_set('memory_limit', '200M');
$oParams = new Nominatim\ParameterParser();
@@ -13,7 +13,7 @@ $iDays = $oParams->getInt('days', false);
$bReduced = $oParams->getBool('reduced', false);
$sClass = $oParams->getString('class', false);
$oDB = new Nominatim\DB();
$oDB = new Nominatim\DB(CONST_Database_DSN);
$oDB->connect();
$iTotalBroken = (int) $oDB->getOne('SELECT count(*) FROM import_polygon_error');
@@ -30,8 +30,12 @@ while ($iTotalBroken && empty($aPolygons)) {
$iDays++;
}
if ($bReduced) $aWhere[] = "errormessage like 'Area reduced%'";
if ($sClass) $sWhere[] = "class = '".pg_escape_string($sClass)."'";
if ($bReduced) {
$aWhere[] = "errormessage like 'Area reduced%'";
}
if ($sClass) {
$sWhere[] = "class = '".pg_escape_string($sClass)."'";
}
if (!empty($aWhere)) {
$sSQL .= ' WHERE '.join(' and ', $aWhere);

View File

@@ -0,0 +1,12 @@
<?php
require_once(CONST_LibDir.'/init-website.php');
require_once(CONST_LibDir.'/ParameterParser.php');
$oParams = new Nominatim\ParameterParser();
// Format for output
$sOutputFormat = $oParams->getSet('format', array('xml', 'json', 'jsonv2', 'geojson', 'geocodejson'), 'jsonv2');
set_exception_handler_by_format($sOutputFormat);
throw new Exception('Reverse-only import does not support forward searching.', 404);

View File

@@ -1,10 +1,10 @@
<?php
require_once(CONST_BasePath.'/lib/init-website.php');
require_once(CONST_BasePath.'/lib/log.php');
require_once(CONST_BasePath.'/lib/PlaceLookup.php');
require_once(CONST_BasePath.'/lib/ReverseGeocode.php');
require_once(CONST_BasePath.'/lib/output.php');
require_once(CONST_LibDir.'/init-website.php');
require_once(CONST_LibDir.'/log.php');
require_once(CONST_LibDir.'/PlaceLookup.php');
require_once(CONST_LibDir.'/ReverseGeocode.php');
require_once(CONST_LibDir.'/output.php');
ini_set('memory_limit', '200M');
$oParams = new Nominatim\ParameterParser();
@@ -16,7 +16,7 @@ set_exception_handler_by_format($sOutputFormat);
// Preferred language
$aLangPrefOrder = $oParams->getPreferredLanguages();
$oDB = new Nominatim\DB();
$oDB = new Nominatim\DB(CONST_Database_DSN);
$oDB->connect();
$hLog = logStart($oDB, 'reverse', $_SERVER['QUERY_STRING'], $aLangPrefOrder);
@@ -84,4 +84,4 @@ if ($sOutputFormat == 'geocodejson') {
}
$sOutputTemplate = ($sOutputFormat == 'jsonv2') ? 'json' : $sOutputFormat;
include(CONST_BasePath.'/lib/template/address-'.$sOutputTemplate.'.php');
include(CONST_LibDir.'/template/address-'.$sOutputTemplate.'.php');

View File

@@ -1,12 +1,12 @@
<?php
require_once(CONST_BasePath.'/lib/init-website.php');
require_once(CONST_BasePath.'/lib/log.php');
require_once(CONST_BasePath.'/lib/Geocode.php');
require_once(CONST_BasePath.'/lib/output.php');
require_once(CONST_LibDir.'/init-website.php');
require_once(CONST_LibDir.'/log.php');
require_once(CONST_LibDir.'/Geocode.php');
require_once(CONST_LibDir.'/output.php');
ini_set('memory_limit', '200M');
$oDB = new Nominatim\DB();
$oDB = new Nominatim\DB(CONST_Database_DSN);
$oDB->connect();
$oParams = new Nominatim\ParameterParser();
@@ -15,15 +15,6 @@ $oGeocode = new Nominatim\Geocode($oDB);
$aLangPrefOrder = $oParams->getPreferredLanguages();
$oGeocode->setLanguagePreference($aLangPrefOrder);
if (CONST_Search_ReversePlanForAll
|| isset($aLangPrefOrder['name:de'])
|| isset($aLangPrefOrder['name:ru'])
|| isset($aLangPrefOrder['name:ja'])
|| isset($aLangPrefOrder['name:pl'])
) {
$oGeocode->setReverseInPlan(true);
}
// Format for output
$sOutputFormat = $oParams->getSet('format', array('xml', 'json', 'jsonv2', 'geojson', 'geocodejson'), 'jsonv2');
set_exception_handler_by_format($sOutputFormat);
@@ -41,7 +32,7 @@ if (CONST_Search_BatchMode && isset($_GET['batch'])) {
$aSearchResults = $oBatchGeocode->lookup();
$aBatchResults[] = $aSearchResults;
}
include(CONST_BasePath.'/lib/template/search-batch-json.php');
include(CONST_LibDir.'/template/search-batch-json.php');
exit;
}
@@ -76,17 +67,19 @@ if (isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])) {
}
if (isset($_SERVER['REQUEST_SCHEME'])
&& isset($_SERVER['SERVER_NAME'])
&& isset($_SERVER['HTTP_HOST'])
&& isset($_SERVER['DOCUMENT_URI'])
) {
$sMoreURL = $_SERVER['REQUEST_SCHEME'].'://'
.$_SERVER['SERVER_NAME'].$_SERVER['DOCUMENT_URI'].'/?'
.$_SERVER['HTTP_HOST'].$_SERVER['DOCUMENT_URI'].'/?'
.http_build_query($aMoreParams);
} else {
$sMoreURL = '/search.php'.http_build_query($aMoreParams);
$sMoreURL = '/search.php?'.http_build_query($aMoreParams);
}
if (CONST_Debug) exit;
if (CONST_Debug) {
exit;
}
$sOutputTemplate = ($sOutputFormat == 'jsonv2') ? 'json' : $sOutputFormat;
include(CONST_BasePath.'/lib/template/search-'.$sOutputTemplate.'.php');
include(CONST_LibDir.'/template/search-'.$sOutputTemplate.'.php');

View File

@@ -0,0 +1,48 @@
<?php
require_once(CONST_LibDir.'/init-website.php');
require_once(CONST_LibDir.'/ParameterParser.php');
require_once(CONST_LibDir.'/Status.php');
$oParams = new Nominatim\ParameterParser();
$sOutputFormat = $oParams->getSet('format', array('text', 'json'), 'text');
$oDB = new Nominatim\DB(CONST_Database_DSN);
if ($sOutputFormat == 'json') {
header('content-type: application/json; charset=UTF-8');
}
try {
$oStatus = new Nominatim\Status($oDB);
$oStatus->status();
if ($sOutputFormat == 'json') {
$epoch = $oStatus->dataDate();
$aResponse = array(
'status' => 0,
'message' => 'OK',
'data_updated' => (new DateTime('@'.$epoch))->format(DateTime::RFC3339),
'software_version' => CONST_NominatimVersion
);
$sDatabaseVersion = $oStatus->databaseVersion();
if ($sDatabaseVersion) {
$aResponse['database_version'] = $sDatabaseVersion;
}
javascript_renderData($aResponse);
} else {
echo 'OK';
}
} catch (Exception $oErr) {
if ($sOutputFormat == 'json') {
$aResponse = array(
'status' => $oErr->getCode(),
'message' => $oErr->getMessage()
);
javascript_renderData($aResponse);
} else {
header('HTTP/1.0 500 Internal Server Error');
echo 'ERROR: '.$oErr->getMessage();
}
}

Some files were not shown because too many files have changed in this diff Show More