Compare commits

..

487 Commits

Author SHA1 Message Date
Sarah Hoffmann
a3343ab048 fix package name in Centos 7 vagrant script
Fixes #2531.
2021-12-06 15:45:09 +01:00
Sarah Hoffmann
517eb52352 prepare 4.0.1 release 2021-11-22 16:23:19 +01:00
Sarah Hoffmann
02bd1bad2d add a section about moving the database to another machine 2021-11-22 16:17:47 +01:00
Sarah Hoffmann
8d7afbfd4f adapt to release 4.0.0 2021-11-02 20:58:07 +01:00
Sarah Hoffmann
d479a0585d prepare release 4.0.0 2021-11-02 20:27:55 +01:00
Sarah Hoffmann
addfae31b6 fix typo 2021-11-02 11:09:17 +01:00
Sarah Hoffmann
ccf61db726 Merge pull request #2502 from lonvia/improve-development-documentation
Extend developer's documentation
2021-11-01 16:12:23 +01:00
Sarah Hoffmann
5b86b2078a docs: add overview over indexing 2021-11-01 11:04:03 +01:00
Sarah Hoffmann
a069479340 docs: section about database layout
Replaces the import description which basically was
table layout only now.
2021-10-29 12:03:22 +02:00
Sarah Hoffmann
d11bf9288e Merge pull request #2498 from lonvia/ordering-for-unlisted-place-results
Include unlisted places in ordering by housenumber
2021-10-28 15:28:47 +02:00
Sarah Hoffmann
86eeb4d2ed Merge pull request #2497 from lonvia/docs-maintenance
docs: add new maintenance section
2021-10-28 11:33:34 +02:00
Sarah Hoffmann
2275fe59ab include unlisted places in ordering by housenumber
When ordering results by the fact that they have a housenumber,
also take cases into account where the housenumber is on the
place itself. This may happen when the search includes the name
of the place and the housenumber or for addr:place addresses
where the place is unlisted.
2021-10-28 11:27:31 +02:00
Sarah Hoffmann
48be8c33ba docs: add new maintenance section
currently used for postcode updates, word count updates and
deleted relations.
2021-10-28 09:22:37 +02:00
Sarah Hoffmann
d3d07128b2 Merge pull request #2495 from lonvia/fix-normalization-in-php
ICU: use correct normalization during search
2021-10-27 14:40:42 +02:00
Sarah Hoffmann
37eeccbf4c ICU: use normalization from config in PHP
The TERM_NORMALIZATION config option is no longer applicable.
That was already documented but not yet implemented.
2021-10-27 11:32:44 +02:00
Sarah Hoffmann
1722fc537f bdd: add tests for non-latin scripts 2021-10-26 17:29:03 +02:00
Sarah Hoffmann
b240b182cb Merge pull request #2493 from lonvia/handle-frequent-partials
Tune search queries with frequent partial words
2021-10-26 17:00:43 +02:00
Sarah Hoffmann
c0f347fc8c adapt BDD tests to stricter partial search 2021-10-26 15:52:57 +02:00
Sarah Hoffmann
53dbe58ada do not count words when in reverse-only mode 2021-10-26 12:00:13 +02:00
Sarah Hoffmann
2c4b798f9b further refactor setup to keep function small 2021-10-26 12:00:13 +02:00
Sarah Hoffmann
1cf14a8e94 searches for house numbers must have an address 2021-10-26 12:00:13 +02:00
Sarah Hoffmann
4864bf1509 disallow search for partials without address
Very frequent partial terms take too long to look up and
do not return any valuable results unless the search is
further narrowed down by an address.
2021-10-26 12:00:13 +02:00
Sarah Hoffmann
9934421442 make word count computation part of the import
Accurate word counts are now essential when using
the ICU tokenizer and don't hurt for the legacy one.

Adds about an hour import time.
2021-10-26 12:00:13 +02:00
Sarah Hoffmann
d7267c1603 actions: move ICU tests into its own run 2021-10-26 11:59:13 +02:00
Sarah Hoffmann
5c778c6d32 Merge pull request #2486 from lonvia/fix-special-phrases
Fix parsing of operator in special phrases
2021-10-25 21:45:08 +02:00
Sarah Hoffmann
85797acf1e ICU: add an index over word_ids
Needed for keyword lookup in the details response.
2021-10-25 21:33:27 +02:00
Sarah Hoffmann
c4f5c11a4e be case-insensitve about special phrase operator 2021-10-25 19:51:20 +02:00
Sarah Hoffmann
5a1c3dbea3 fix parsing of operator in special phrases
Because of unstripped input, the operators wouldn't match.
2021-10-25 19:46:30 +02:00
Sarah Hoffmann
8e439d3dd9 Merge pull request #2484 from lonvia/fix-index-use
Reverse: add index hints
2021-10-25 17:20:42 +02:00
Sarah Hoffmann
9ebf921c53 Merge pull request #2483 from lonvia/fix-warming
Fix warming for ICU tokenizer
2021-10-25 16:21:36 +02:00
Sarah Hoffmann
7bd9094aaa reverse: add index hints
The fairly complex where condition of idx_placex_geometry_placenode
won't always be matched by the query planner if the condition
part doesn't appear verbatim in the query.

Fixes #2480.
2021-10-25 15:01:03 +02:00
Sarah Hoffmann
16cc395f78 fix warming for ICU tokenizer
Running the warm-up search requests requires querying
the most frequent words. This must be done via the tokenizer
to honor the different formats of the word table.
2021-10-25 13:08:16 +02:00
Sarah Hoffmann
13e7398566 allow relative paths for log files 2021-10-25 10:26:05 +02:00
Sarah Hoffmann
8b90ee4364 Merge pull request #2476 from lonvia/harmonize-configuration-file-settings
Standardize handling of file names in configuration values
2021-10-24 10:57:48 +02:00
Sarah Hoffmann
1098ab732f allow relative paths for flatnode file 2021-10-22 17:32:51 +02:00
Sarah Hoffmann
507fdd4f40 switch IMPORT_STYLE to use generic file search
Allows relative paths wrt project directory.
2021-10-22 16:49:57 +02:00
Sarah Hoffmann
0ae8d7ac08 have ADDRESS_LEVEL_CONFIG use load_sub_configuration
This means that relative paths now are looked up in the
project directory.
2021-10-22 16:36:52 +02:00
Sarah Hoffmann
c77df2d1eb replace NOMINATIM_PHRASE_CONFIG with command line option 2021-10-22 14:41:14 +02:00
Sarah Hoffmann
cefae021db doc: clarify relative paths for tokenizer config 2021-10-21 16:38:06 +02:00
Sarah Hoffmann
771aee8cd8 Merge pull request #2475 from lonvia/catchup-mode
Add catch-up mode to replication and extend documentation for updating
2021-10-21 16:21:58 +02:00
Sarah Hoffmann
2d13d8b3b6 extend documentation for updating database
Explains the different modes and adds hints for
setting up a systemd job.
2021-10-21 12:14:47 +02:00
Sarah Hoffmann
c1fa70639b add new replication mode catch-up
This mode gets updates until the server reports no new diffs
anymore.

Also adds additional indexing, when the main indexing step left
a couple of objects to process. This happens only when the
next update is expected to be more than 40min away.
2021-10-20 22:05:15 +02:00
Sarah Hoffmann
12643c5986 run Tiger import with parallel threads per default 2021-10-19 15:00:26 +02:00
Sarah Hoffmann
a0f5613a23 Merge pull request #2472 from lonvia/word-count-computation
Fix word count computation for ICU tokenizer
2021-10-19 14:58:57 +02:00
Sarah Hoffmann
824562357b adapt tests for new word count mechanism 2021-10-19 12:03:48 +02:00
Sarah Hoffmann
ec7184c533 icu: no longer precompute terms
The ICU analyzer no longer drops frequent partials, so it is no
longer necessary to know the frequencies in advance.
2021-10-19 11:52:28 +02:00
Sarah Hoffmann
e8e2502e2f make word recount a tokenizer-specific function 2021-10-19 11:21:16 +02:00
Sarah Hoffmann
c86cfefc48 Merge pull request #2471 from lonvia/update-install-rules
Reorganise, update and extend documentation
2021-10-19 09:11:16 +02:00
Sarah Hoffmann
2635fe8b4c docs: fix more links 2021-10-18 17:26:14 +02:00
Sarah Hoffmann
632436d54d docs: refer to our new Settings chapter in the import instruchtions 2021-10-18 17:02:52 +02:00
Sarah Hoffmann
74be6828dd check and fix all liks in documentation 2021-10-18 16:53:24 +02:00
Sarah Hoffmann
f4acfed48f add extended documentation of settings 2021-10-18 16:30:52 +02:00
Sarah Hoffmann
91e1c1bea8 docs: update overview pages 2021-10-18 09:04:06 +02:00
Sarah Hoffmann
bbb9a41ea4 docs: move place ranking into customization part 2021-10-18 09:04:06 +02:00
Sarah Hoffmann
f6418887b2 docs: nominatim-ui has a new place for custom config 2021-10-18 09:04:06 +02:00
Sarah Hoffmann
a3f8a097a1 docs: move import style description to customize section 2021-10-18 09:04:06 +02:00
Sarah Hoffmann
751563644f docs: make customization chapter a separate section 2021-10-18 09:04:01 +02:00
Sarah Hoffmann
e52b801cd0 fix typo 2021-10-18 09:03:07 +02:00
Sarah Hoffmann
445a6428a6 docs: remove the development warning for ICU tokenizer 2021-10-18 09:03:07 +02:00
Sarah Hoffmann
d59b26dad7 docs: add a warning about using --no-updates with TIGER data 2021-10-18 09:03:07 +02:00
Sarah Hoffmann
47417d1871 update and extend man page
Provide extended descriptions for most subcommands.
2021-10-18 09:03:07 +02:00
Sarah Hoffmann
381aecb952 rename manual directory to man
Avoids confusion between 'docs' and 'manual'.
2021-10-18 09:03:07 +02:00
Sarah Hoffmann
45344575c6 add munin scipts and ICU subrules to installation 2021-10-18 09:03:07 +02:00
Sarah Hoffmann
83381625bd Merge pull request #2469 from lonvia/fix-tablespace-assignment
Fix template expressions for tablespaces
2021-10-15 18:20:43 +02:00
Sarah Hoffmann
552fb16cb2 fix template expressions for tablespaces 2021-10-15 15:11:09 +02:00
Sarah Hoffmann
75c631f080 Merge pull request #2450 from mtmail/tiger-data-2021
US TIGER data 2021 released
2021-10-11 19:22:15 +02:00
Sarah Hoffmann
e2464fdf62 Merge pull request #2465 from lonvia/use-spgist-index
Use SP-GIST for building index
2021-10-11 10:48:44 +02:00
Sarah Hoffmann
9ff98073db remove outdated country_languages.php 2021-10-10 21:58:43 +02:00
Sarah Hoffmann
98ee5def37 add recommendation for Postgis 3+ 2021-10-10 21:55:38 +02:00
Sarah Hoffmann
3649487f5e use SP-GIST index for building index where available
Point-in-polygon queries are much faster with a SP-GIST geometry
index, so use that for the index used to check if a housenumber
is inside a building.

Only available with Postgis 3. There is an automatic fallback to
GIST for Postgis 2.
2021-10-10 21:55:38 +02:00
Sarah Hoffmann
4b007ae740 Merge pull request #2460 from lonvia/multiple-analyzers
Add support for multiple token analyzers
2021-10-09 14:41:09 +02:00
Sarah Hoffmann
6c79a60e19 add documentation for new configuration of ICU tokenizer 2021-10-07 11:55:53 +02:00
Sarah Hoffmann
2a94bfc703 fix argument description for check_database 2021-10-07 09:49:13 +02:00
Sarah Hoffmann
299934fd2a reorganize and complete tests around generic token analysis 2021-10-06 17:03:37 +02:00
Sarah Hoffmann
b18d042832 add tests for sanitizer tagging language 2021-10-06 12:29:25 +02:00
Sarah Hoffmann
97a10ec218 apply variants by languages
Adds a tagger for names by language so that the analyzer of that
language is used. Thus variants are now only applied to names
in the specific language and only tag name tags, no longer to
reference-like tags.
2021-10-06 11:09:54 +02:00
Sarah Hoffmann
d35400a7d7 use analyser provided in the 'analyzer' property
Implements per-name choice of analyzer. If a non-default
analyzer is choosen, then the 'word' identifier is extended
with the name of the ana;yzer, so that we still have unique
items.
2021-10-05 14:10:32 +02:00
Sarah Hoffmann
92f6ec2328 remove support for properties on variants
Those are not going to be used in the near future, so no need to
carry that code around just now.
2021-10-05 10:29:36 +02:00
Sarah Hoffmann
9ba2019470 precompute replacements while loading configuration 2021-10-05 10:20:08 +02:00
Sarah Hoffmann
c171d88194 move parsing of token analysis config to analyzer
Adds a second callback for the analyzer which is responsible
for parsing the configuration rules and converting it to
whatever format necessary. This way, each analyzer implementation
can define its own configuration rules.
2021-10-04 18:31:58 +02:00
Sarah Hoffmann
7cfcbacfc7 make token analyzers configurable modules
Adds a mandatory section 'analyzer' to the token-analysis entries
which define, which analyser to use. Currently there is exactly
one, generic, which implements the former ICUNameProcessor.
2021-10-04 17:37:34 +02:00
Sarah Hoffmann
52847b61a3 extend ICU config to accomodate multiple analysers
Adds parsing of multiple variant lists from the configuration.
Every entry except one must have a unique 'id' paramter to
distinguish the entries. The entry without id is considered
the default. Currently only the list without an id is used
for analysis.
2021-10-04 16:40:28 +02:00
Sarah Hoffmann
5a36559834 move flatten_config_list into config module
For general usage by other modules.
2021-10-04 11:56:54 +02:00
Sarah Hoffmann
19d4e047f6 Merge pull request #2458 from lonvia/add-tokenizer-preprocessing
Add a "sanitation" step for name and address tags before token processing
2021-10-01 21:53:34 +02:00
Sarah Hoffmann
6b348d43c6 replace test variable for PG env tests
'tty' was removed in PG14 and causes an error.
2021-10-01 12:27:24 +02:00
Sarah Hoffmann
732cd27d2e add unit tests for new sanatizer functions 2021-10-01 12:27:24 +02:00
Sarah Hoffmann
8171fe4571 introduce sanitizer step before token analysis
Sanatizer functions allow to transform name and address tags before
they are handed to the tokenizer. Theses transformations are visible
only for the tokenizer and thus only have an influence on the
search terms and address match terms for a place.

Currently two sanitizers are implemented which are responsible for
splitting names with multiple values and removing bracket additions.
Both was previously hard-coded in the tokenizer.
2021-10-01 12:27:24 +02:00
Sarah Hoffmann
16daa57e47 unify ICUNameProcessorRules and ICURuleLoader
There is no need for the additional layer of indirection that
the ICUNameProcessorRules class adds. The ICURuleLoader can
fill the database properties directly.
2021-10-01 12:27:24 +02:00
Sarah Hoffmann
5e5addcdbf fix typo 2021-09-29 14:16:09 +02:00
Sarah Hoffmann
be65c8303f export more data for the tokenizer name preparation
Adds class, type, country and rank to the exported information
and removes the rather odd hack for countries. Whether a place
represents a country boundary can now be computed by the tokenizer.
2021-09-29 11:54:14 +02:00
Sarah Hoffmann
231250f2eb add wrapper class for place data passed to tokenizer
This is mostly for convenience and documentation purposes.
2021-09-29 11:54:07 +02:00
Sarah Hoffmann
d44a428b74 Merge pull request #2455 from lonvia/adjust-address-levels-slovakia
Adjust address levels for boundaries in Slovakia
2021-09-28 11:21:08 +02:00
Sarah Hoffmann
40f9d52ad8 Merge pull request #2454 from lonvia/sort-out-token-assignment-in-sql
ICU tokenizer: switch match method to using partial terms
2021-09-28 09:45:15 +02:00
Sarah Hoffmann
7f3b05c179 adjust address levels for boundaries in Slovakia
Levels choosen according to OSM wiki. Mainly moves admin_level 6
to county level and admin_level 8 to city/town level. Higher
levels are adjusted accordingly.

Fixes #2453.
2021-09-27 23:32:11 +02:00
Sarah Hoffmann
09c9fad6c3 adapt tests to new ICU address token handling 2021-09-27 17:36:23 +02:00
Sarah Hoffmann
bb18479d5b remove unused parameter 2021-09-27 14:58:43 +02:00
Sarah Hoffmann
779ea8ac62 Merge pull request #2452 from lonvia/update-houses-on-street-name-change
Force update of surrounding houses when street or place name changes
2021-09-27 14:55:50 +02:00
Sarah Hoffmann
bd7c7ddad0 icu tokenizer: switch to matching against partial names
When matching address parts from addr:* tags against place names,
the address names where so far converted to full names and compared
those to the place names. This can become problematic with the new
ICU tokenizer once we introduce creation of different variants
depending on the place name context. It wouldn't be clear which
variant to produce to get a match, so we would have to create all of
them. To work around this issue, switch to using the partial terms
for matching. This introduces a larger fuzziness between matches but
that shouldn't be a problem because matching is always geographically
restricted.

The search terms created for address parts have a different problem:
they are already created before we even know if they are going to be
used. This can lead to spurious entries in the word table, which slows
down searching. This problem can also be circumvented by using only
partial terms for the search terms. In terms of searching that means
that the address terms would not get the full-word boost, but given
that the case where an address part does not exist as an OSM object
should be the exception, this is likely acceptable.
2021-09-27 11:36:19 +02:00
Sarah Hoffmann
c6fdcf9b0d adapt documentation for SQL tokenizer interface 2021-09-27 11:36:19 +02:00
Sarah Hoffmann
59fe74ddf6 move name matching into tokenizer module
Instead of requesting the match tokens from the tokenizer
when looking for parent streets/places and address parts,
hand in the saved tokens and ask if they match. This gives
the tokenizer more freedom to decide how name matching
should be done.
2021-09-27 11:36:19 +02:00
Sarah Hoffmann
6d7c067461 force update on rank30 children when place name changes
Name changes may have an effect on parenting. Don't update
surrounding rank30 objects with addr:place tags as this is
potentially too expensive.
2021-09-27 11:04:17 +02:00
Sarah Hoffmann
316205e455 force update of surrounding houses when street name changes
When the street changes its name then this may cause changes
in the parenting of rank-30 objects with an addr:street
tag.

Fixes #2242.
2021-09-27 10:22:41 +02:00
marc tobias
834ae0a93f US TIGER data 2021 released 2021-09-25 00:05:17 +02:00
Sarah Hoffmann
d562f11298 slightly increase radius to look for postcodes 2021-09-24 23:56:42 +02:00
Sarah Hoffmann
972628c751 Merge pull request #2449 from lonvia/address-ranking-spain
Adjust address ranks for Spain
2021-09-24 22:48:21 +02:00
Sarah Hoffmann
09b1db63f4 adjust address ranks for Spain
Adjusts levels for boundaries according to the list on
https://wiki.openstreetmap.org/wiki/Tag:boundary%3Dadministrative

* no admin_level 5, so drop that from addresses
* admin_level 6 has the province
* admin_level 7 has the county when it exists

Also reranks place=province so that it matches up with
admin_level 6 and introduces place=civil_parish which
is used as a place node for some admin_level=9 boundaries
in Galicia.
2021-09-24 18:39:44 +02:00
Sarah Hoffmann
e9d54f752c Merge pull request #2447 from lonvia/fix-dynamic-address-assignment
Fix dynamic assignment of address parts
2021-09-19 15:57:28 +02:00
Sarah Hoffmann
c335025167 CI: install locale for CentOS 2021-09-19 13:49:11 +02:00
Sarah Hoffmann
2b2109c89a Remove the installation warning
Installation has become a lot easier.
2021-09-19 13:01:32 +02:00
Sarah Hoffmann
56124546a6 fix dynamic assignment of address parts
A boolean check for dynamic changes of address parts is not
sufficient. The order of choice should be:

 1. an addr:* part matches the name
 2. the address part surrounds the object
 3. the address part was declared as isaddress

The implementation uses a slightly different ordering
to avoid geometry checks unless strictly necessary (isaddress
is false and no matching address).

See #2446.
2021-09-19 12:34:39 +02:00
Sarah Hoffmann
336258ecf8 Merge pull request #2440 from lonvia/generic-config-loader
Add generic loader for YAML configuration files
2021-09-04 17:41:15 +02:00
Sarah Hoffmann
b894d2c04a fix indent 2021-09-04 10:30:35 +02:00
Sarah Hoffmann
8e1d4818ac use yaml config loader for country info 2021-09-04 00:22:55 +02:00
Sarah Hoffmann
28c98584c1 add tests for generic YAML config reader 2021-09-03 22:31:30 +02:00
Sarah Hoffmann
1c42780bb5 introduce generic YAML config loader
Adds a function to the Configuration class to load a YAML
file. This means that searching for the file is generalised
and works the same now for all configuration files. Changes
the search logic, so that it is always possible to have a
custom version of the configuration file in the project
directory.

Move ICU tokenizer to use new load function.
2021-09-03 18:20:07 +02:00
Sarah Hoffmann
18554dfed7 Merge pull request #2437 from lonvia/tweak-ranking-searches
Some more tweaks for search interpretation
2021-09-03 14:16:23 +02:00
Sarah Hoffmann
2e493fec46 Merge pull request #2436 from lonvia/country-configuration
Move configuration of default languages into a configuration file
2021-09-03 08:55:36 +02:00
Sarah Hoffmann
98c2e08add reduce penalty for special searches by name
Additional penalty for special terms with operator None
should only go to near searches. To reduce the number
of produced searches, restrict the none operator to
appear only in conjunction with the name.
2021-09-03 08:50:38 +02:00
Sarah Hoffmann
94d3dee369 further increase penalty on housenumbers without numbers
Make the penality dependent on the length of the token:
no penalty for one letter house numbers and increasing one
for more letters.
2021-09-02 18:11:49 +02:00
Sarah Hoffmann
7e7dd769fd remove language and partition from name import 2021-09-02 14:41:11 +02:00
Sarah Hoffmann
79da96b369 read partition and languages from config file 2021-09-02 14:41:11 +02:00
Sarah Hoffmann
78fcabade8 move country name generation to country_info module 2021-09-02 14:41:11 +02:00
Sarah Hoffmann
284645f505 move generation of country tables in own module 2021-09-02 14:41:11 +02:00
Sarah Hoffmann
0b349761a8 add country configuration
The new configuration saves the default language(s) originally
maintained in the OSM wiki as well as the partition information.
2021-09-02 14:41:11 +02:00
Sarah Hoffmann
d18794931a Merge pull request #2435 from lonvia/simplified-to-traditional-chinese
icu: normalise simplified to traditional chinese
2021-08-31 15:29:26 +02:00
Sarah Hoffmann
b7d4ff3201 icu: normalise simplified to traditional chinese
The conversion is unambigious in most cases, so that the
information loss is minimal.
2021-08-31 11:18:34 +02:00
Sarah Hoffmann
4c6d674e03 Merge pull request #2434 from lonvia/vagrant-scripts-in-actions
Test installation instructions via CI
2021-08-29 10:11:59 +02:00
Sarah Hoffmann
2c97af8021 CI: use packaged source also for test runs 2021-08-24 10:10:01 +02:00
Sarah Hoffmann
832f75a55e CI: unify jobs for different vagrant scripts 2021-08-24 10:10:01 +02:00
Sarah Hoffmann
4e77969545 add workflow for centos 8 2021-08-24 10:10:01 +02:00
Sarah Hoffmann
6ebbbfee61 CI: use vagrant scripts for import tests
Use vanilla docker images of Ubuntu and leave the setup
to the vagrant scripts. Then do the usual import tests.

Also fixes a couple of issues found with the scripts
2021-08-24 10:10:01 +02:00
Sarah Hoffmann
0fabeefc3e Merge pull request #2432 from Mastercuber/patch-1
Added postcode
2021-08-22 09:32:31 +02:00
Mastercuber
c70d72f06b Added postcode
Added postcode to the list of addressdetails
2021-08-22 02:52:41 +02:00
Sarah Hoffmann
cc141bf1a5 Add link to fixthemap to issue template 2021-08-21 20:36:16 +02:00
Sarah Hoffmann
199532c802 Merge pull request #2429 from lonvia/place-name-to-admin-boundary
Indexing: move linking of places to the preparation stage
2021-08-21 10:21:39 +02:00
Sarah Hoffmann
28ee3d0949 move linking of places to the preparation stage
Linked places may bring in extra names. These names need to be
processed by the tokenizer. That means that the linking needs
to be done before the data is handed to the tokenizer. Move finding
the linked place into the preparation stage and update the name
fields. Everything else is still done in the indexing stage.
2021-08-20 22:44:17 +02:00
Sarah Hoffmann
925195725d Merge pull request #2428 from lonvia/rename-icu-tokenizer
Rename legacy_icu tokenizer to icu tokenizer
2021-08-18 15:02:19 +02:00
Sarah Hoffmann
f6d22df76e adapt CI workflow to new tokenizer name 2021-08-18 09:08:20 +02:00
Sarah Hoffmann
118858a55e rename legacy_icu tokenizer to icu tokenizer
The new icu tokenizer is now no longer compatible with the old
legacy tokenizer in terms of data structures. Therefore there
is also no longer a need to refer to the legacy tokenizer in the
name.
2021-08-17 23:11:47 +02:00
Sarah Hoffmann
656c1291b1 Merge pull request #2427 from lonvia/remove-us-states-special-casing
Move US state hack into legacy tokenizer
2021-08-17 21:55:32 +02:00
Sarah Hoffmann
f00b8dd1c3 move special hack for US states to legacy tokenizer
The hack for IL, AL and LA is only needed because these abbreviations
are removed by the legacy tokenizer as a stop word. There is no need
to keep the hack for future tokenizers. Move it therefore to the
token extraction function.
2021-08-17 14:28:55 +02:00
Sarah Hoffmann
5f2b9e317a add tests for US state hacks
IL, AS and LA are replaced with the US state in Geocode because
the old tokenizer would simply remove the abbreviations otherwise.
2021-08-17 10:49:07 +02:00
Sarah Hoffmann
4ae5ba7fc4 Merge pull request #2425 from lonvia/tokenizer-documentation
Introduce official Tokenizer API
2021-08-17 09:38:03 +02:00
Sarah Hoffmann
3656eed9ad add mkdocstrings requirement for building docs
mkdocstrings also needs access to the Python sources, so set
a PYTHONPATH accordingly. This makes running mkdocs directly
a bit awkward, therefore add a `make serve-doc` target.
2021-08-16 11:51:49 +02:00
Sarah Hoffmann
2e82a6ce03 docs: extend explanation of query phrase 2021-08-16 11:51:49 +02:00
Sarah Hoffmann
c4b8a3b768 add documentation for PHP part of tokenizer 2021-08-16 11:51:49 +02:00
Sarah Hoffmann
1147b83b22 php: make word list a first-class object
This separates the logic of creating word sets from the Phrase
class. A tokenizer may now derived the word sets any way they
like. The SimpleWordList class provides a standard implementation
for splitting phrases on spaces.
2021-08-16 11:51:49 +02:00
Sarah Hoffmann
0fb8eade13 remove country restriction from tokenizer
Restricting tokens due to the search context is better done in
the generic search part instead of repeating the same test in
every tokenizer implementation.
2021-08-16 11:41:54 +02:00
Sarah Hoffmann
78d11fe628 document tokenizer SQL interface 2021-08-16 11:41:54 +02:00
Sarah Hoffmann
90b40fc3e6 define formal public Python interface for tokenizer
This introduces an abstract class for the Tokenizer/Analyzer
for documentation purposes.
2021-08-16 11:41:54 +02:00
Sarah Hoffmann
e25e268e2e docs: querying and tokenizers 2021-08-16 08:59:44 +02:00
Sarah Hoffmann
68bff31cc9 docs: add developer doc page for Tokenizer 2021-08-16 08:58:56 +02:00
Sarah Hoffmann
31d9545702 Merge pull request #2424 from lonvia/multi-country-import
Update instructions for importing multiple regions
2021-08-16 08:48:28 +02:00
Sarah Hoffmann
e449071a35 Merge pull request #2423 from hummeltech/patch-1
Fix old paths for `phpcs` when using `make test`
2021-08-15 22:00:50 +02:00
Sarah Hoffmann
23e3724abb ignore words without id for status 2021-08-15 21:59:36 +02:00
Sarah Hoffmann
75a5c7013f split up large setup function 2021-08-15 12:24:13 +02:00
Sarah Hoffmann
56d24085f9 port multi-region update scripts to nominatim tool
Also updates the documentation. For the simple case of just
importing multiple regions, provide simplified instructions
that use the new multi-file import feature.

Fixes #2365.
2021-08-14 23:55:48 +02:00
Sarah Hoffmann
95b82af42a update osm2pgsql to 1.5.1 2021-08-14 22:46:35 +02:00
Sarah Hoffmann
87dedde5d6 allow multiple files for the import command
The files are forwarded to osm2pgsql which is now able to merge
them correctly.
2021-08-14 21:42:21 +02:00
David Hummel
8b6489c60e Fix old paths for phpcs when using make test
These paths no longer exist since db3ced17bb, they are now all located under `lib-php`
2021-08-12 13:34:18 -07:00
Sarah Hoffmann
bf4f05fff3 Merge pull request #2413 from osm-search/helm-chart
Installation docs - link to Kubernetes install project
2021-08-08 11:09:36 +02:00
mtmail
b0aaa25f0d Installation docs - link to Kubernetes install project
As reported by @robjuz in https://github.com/osm-search/Nominatim/discussions/2412
2021-08-03 12:02:35 +02:00
Sarah Hoffmann
c3ddc7579a Merge pull request #2408 from lonvia/icu-change-word-table-layout
Change table layout of word table for ICU tokenizer
2021-07-28 14:28:49 +02:00
Sarah Hoffmann
fdff579188 php: force use of global Exception class 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
d48793c22c fix Python linitin errors 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
001b2aa9f9 fix linitin issues in PHP 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
1db098c05d reinstate word column in icu word table
Postgresql is very bad at creating statistics for jsonb
columns. The result is that the query planer tends to
use JIT for queries with a where over 'info' even when
there is an index.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
324b1b5575 bdd tests: do not query word table directly
The BDD tests cannot make assumptions about the structure of the
word table anymore because it depends on the tokenizer. Use more
abstract descriptions instead that ask for specific kinds of
tokens.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
e42878eeda adapt unit test for new word table
Requires a second wrapper class for the word table with the new
layout. This class is interface-compatible, so that later when
the ICU tokenizer becomes the default, all tests that depend on
behaviour of the default tokenizer can be switched to the other
wrapper.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
eb6814d74e convert word info column to json before copying 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
6ad35aca4a adapt special terms lookup to new word table 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
70f154be8b switch word tokens to new word table layout 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
4342b28882 switch special phrases to new word table format 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
5394b1fa1b switch postcode tokens to new word table layout 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
5ab0a63fd6 switch housenumber tokens to new word table layout 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
1618aba5f2 switch country name tokens to new word table layout 2021-07-28 11:31:47 +02:00
Sarah Hoffmann
8377528952 new word table layout for icu tokenizer
The table now directly reflects the different token types.
Extra information is saved in a json structure that may be
dynamically extended in the future without affecting the
table layout.
2021-07-28 11:31:47 +02:00
Sarah Hoffmann
34dcf02dee fix typos in tokenizer docs 2021-07-28 11:28:49 +02:00
Sarah Hoffmann
5d7d7f15d9 Merge pull request #2401 from lonvia/port-add-data-to-python
Port add-data functions from PHP to Python
2021-07-26 12:38:56 +02:00
Sarah Hoffmann
0c023fb4d2 adapt cli tests to Python port for add-data 2021-07-26 10:41:37 +02:00
Sarah Hoffmann
1bd068d42d remove unused update script 2021-07-26 10:41:37 +02:00
Sarah Hoffmann
e42349c963 replace add-data function with native Python code 2021-07-26 10:41:37 +02:00
Sarah Hoffmann
878835e4bd move add-data subcommand into a separate file 2021-07-25 18:14:12 +02:00
Sarah Hoffmann
8096a1d67f fix parameters for TokenWord creation 2021-07-20 10:21:40 +02:00
Sarah Hoffmann
e16c5d5f70 Merge pull request #2397 from lonvia/increase-minimum-required-versions
Increase minimum required PostgreSQL version to 9.5
2021-07-19 14:28:02 +02:00
Sarah Hoffmann
2c8242c8df remove special code for pre9.5 postgresql
9.5 is now the minimum requirement.
2021-07-19 10:24:57 +02:00
Sarah Hoffmann
e7d6f89aca increase minimum version for PostgreSQL to 9.5
This is the minimum version we can test with the CI.
With 9.5 there is also complete support for jsonb available.
2021-07-19 10:21:19 +02:00
Sarah Hoffmann
379f5db516 require Python 3.6 also in CMakeFile
This had been forgotten when increasing the minimum Python version.
2021-07-19 10:14:14 +02:00
Sarah Hoffmann
ee32315378 Merge pull request #2396 from lonvia/partial-word-token
Reorganise code that build the SearchDescription
2021-07-19 09:42:37 +02:00
Sarah Hoffmann
cca912af4e make all Token menbers private 2021-07-18 22:54:55 +02:00
Sarah Hoffmann
86ea077092 merge marking rare name with adding name token
Only name tokens can be rare, so this should be the same
function.
2021-07-18 16:52:37 +02:00
Sarah Hoffmann
5d6aabc457 add documentation for public interface of SearchDescription 2021-07-18 16:10:42 +02:00
Sarah Hoffmann
b14ce959d9 factor out check if a token fits current search
Saves allocating an empty array.
2021-07-17 22:01:35 +02:00
Sarah Hoffmann
a48ebd9b47 move SearchDescription building into tokens
Moving the logic for extending the SearchDescription into the
token classes splits up the code and makes it more readable.
More importantly: it allows tokenizer to define custom token
classes in the future.
2021-07-17 20:24:33 +02:00
Sarah Hoffmann
3cd85eaaf1 remove Token from explicit input for SearchDescription extension
The token string is only required by the PartialToken type, so
it can simply save the token string internally. No need to pass
it to every type.

Also moves the check for multi-word partials to the token loader
code in the tokenizer. Multi-word partials can only happen with
the legacy tokenizer and when the database was loaded with an
older version of Nominatim. No need to keep the check for
everybody.
2021-07-17 18:18:31 +02:00
Sarah Hoffmann
ec3f6c9c42 factor out query position
Moves token and phrase position and phrase type into a separate
class that is handed in when assembling the search description.
This drastically reduces the number of parameters for the function
to extend the search descriptions and gives us more flexibility
in the future for more complex positional analysis.
2021-07-15 14:12:59 +02:00
Sarah Hoffmann
143ff14466 remove special status of partial tokens
Full-word tokens are no longer marked by a space at the
beginning of the token. Use the new Partial token category
instead. This removes a couple of special casing, we don't
really need.

The word table still has the space for compatibility reasons,
so the tokenizer code needs to get rid of it when loading the
tokens.
2021-07-14 22:17:17 +02:00
Sarah Hoffmann
6070c3d1d5 introduce a separate token type for partials
This means that the leading space can be removed as a partial
word indicator.
2021-07-13 16:57:12 +02:00
Sarah Hoffmann
bc8b2d4ae0 Merge pull request #2393 from lonvia/fix-flake8-issues
Fix flake8 issues
2021-07-13 16:46:12 +02:00
Sarah Hoffmann
14f777da18 use psycopg's SQL quoting where possible
Use the SQL formatting supplied with psycopg whenever the
query needs to be put together from snippets.
2021-07-12 22:05:22 +02:00
Sarah Hoffmann
6f6681ce67 add helper function for execute_values
Make psycopg2's convenience function accessible through
the cursor.
2021-07-12 21:08:20 +02:00
Sarah Hoffmann
06602b4ec0 provide wrapper function for DROP TABLE
Use psycopg2 formatting to ensure correct quoting.
2021-07-12 20:32:46 +02:00
Sarah Hoffmann
cf98cff2a1 more formatting fixes
Found by flake8.
2021-07-12 17:45:42 +02:00
Sarah Hoffmann
b4fec57b6d Merge pull request #2391 from lonvia/fix-sonar-issues
Fix bugs and code smells found by Sonarqube
2021-07-12 17:14:59 +02:00
Sarah Hoffmann
f8b5a63de3 factor out connection reset code 2021-07-12 14:58:44 +02:00
Sarah Hoffmann
568316f07c simplify analyse function 2021-07-12 14:47:50 +02:00
Sarah Hoffmann
daa597b300 split up variant computation for better readability 2021-07-12 14:43:50 +02:00
Sarah Hoffmann
47adb2a3fc reorganise process_place function
Move address processing into its own function as it is
rather extensive.
2021-07-12 11:57:55 +02:00
Sarah Hoffmann
fff0012249 simplify website setup code
Use formaat strings and move variable quoting code into extra
function.
2021-07-12 11:41:05 +02:00
Sarah Hoffmann
d5a1883b62 avoid repeated patterns for table name 2021-07-12 11:33:09 +02:00
Sarah Hoffmann
a08ef43e40 simplify if statements 2021-07-12 11:28:47 +02:00
Sarah Hoffmann
bc5e15996a convert single case switch to if statement 2021-07-12 11:28:47 +02:00
Sarah Hoffmann
128ca800cd avoid local variable assignment 2021-07-11 23:22:53 +02:00
Sarah Hoffmann
000d133af6 fix more missing braces on one-liners 2021-07-11 23:22:53 +02:00
Sarah Hoffmann
1e40d65aa9 remove dead code 2021-07-11 23:22:53 +02:00
Sarah Hoffmann
bffbe68ec3 do not intermix params with and without default 2021-07-11 23:22:53 +02:00
Sarah Hoffmann
58b10074ad directly return data in function
The temporary variable is not necessary.
2021-07-11 19:24:04 +02:00
Sarah Hoffmann
d933ead2b5 remove unnecessayly nested ifs
Found by Sonarqube.
2021-07-11 19:11:37 +02:00
Sarah Hoffmann
1cdc30c5e8 remove unused functions
The functions were necessary for the transitory code
to Python and are no longer used.
2021-07-11 19:10:04 +02:00
Sarah Hoffmann
3661f7a321 avoid multiple returns of same value
Found by Sonarqube.
2021-07-11 18:23:42 +02:00
Sarah Hoffmann
27af9b102c always use brackets on if statements
This adds bracket around all one-line if statements that did
not have them yet.
2021-07-10 17:04:46 +02:00
Sarah Hoffmann
500c61685b remove unused variables
As reported by sonarqube.
2021-07-09 16:36:42 +02:00
Sarah Hoffmann
106d960f84 fix bad use of echo in PHP output 2021-07-09 12:50:35 +02:00
Sarah Hoffmann
322fa19ceb Merge pull request #2390 from lonvia/responsible-disclosure
Add security issue disclosure policy
2021-07-09 12:32:37 +02:00
Sarah Hoffmann
5bea0b6086 add security issue disclosure policy 2021-07-09 11:36:59 +02:00
Sarah Hoffmann
a5970d7548 Merge pull request #2384 from lonvia/actions-add-icu-tokenizer
CI: run tests on Ubuntu 18
2021-07-07 14:39:53 +02:00
Sarah Hoffmann
c216144dd1 add missing pyyaml requirement 2021-07-07 11:29:33 +02:00
Sarah Hoffmann
42e08da7ca enable PHP 7.2 for Ubuntu 18 CI 2021-07-07 11:29:33 +02:00
Sarah Hoffmann
a2edbbf78a cannot use capture_output in subprocess.run
Only available since Python 3.7.
2021-07-06 22:57:42 +02:00
Sarah Hoffmann
1e86dc1d93 remove default parameter for namedtuple
This is only available in Python 3.7.
2021-07-06 22:57:42 +02:00
Sarah Hoffmann
54f295be52 CI: run tests on older Ubuntu version as well 2021-07-06 22:57:42 +02:00
Sarah Hoffmann
8bc3c0a07c Merge pull request #2382 from lonvia/remove-json-config
Remove outdated ICU tokenizer JSON config
2021-07-05 12:34:34 +02:00
Sarah Hoffmann
d75bc20174 Merge pull request #2383 from lonvia/remove-more-names
Exclude name:etymology and name:signed
2021-07-05 12:34:16 +02:00
Sarah Hoffmann
fd8751658f exclude name:etymology and name:signed
name:etymology contains a description of the name origin and is
thus more informative than search-worthy.

name:signed basically indicates that the feature does not have
a name.
2021-07-05 11:04:16 +02:00
Sarah Hoffmann
4db5a1a0b8 remove outdated ICU tokenizer JSON config 2021-07-05 11:01:35 +02:00
Sarah Hoffmann
4c52777ef0 Merge pull request #2371 from lonvia/increase-python-version
Increase minimum required Python version to 3.6
2021-07-05 10:32:38 +02:00
Sarah Hoffmann
d4c7bf20a2 Merge pull request #2381 from lonvia/reorganise-abbreviations
Reorganise abbreviation handling
2021-07-05 10:32:16 +02:00
Sarah Hoffmann
affe1300d9 add warning about experimental nature of ICU tokenizer 2021-07-04 10:44:58 +02:00
Sarah Hoffmann
62d5984b1b limit the number of variants that can be produced 2021-07-04 10:28:28 +02:00
Sarah Hoffmann
c32551b4e0 restrict partial word counting to names of reasoanble length
The partial word count does not split names to save a bit of time.
The result is that it might enounter unreasonably long names
which in truth consist of multiple words. No accurate statistics
are needed so simply restrict the count to words shorter than
75 characters.
2021-07-04 10:28:28 +02:00
Sarah Hoffmann
e85f7e7aa9 fix subsequent replacements
Two replacement words directly following each other did not
work as expected because each expects a space at the
beginning/end while there was only one space available.

Also forbit composing a word after a space was added in the
end by a previous replacement.
2021-07-04 10:28:28 +02:00
Sarah Hoffmann
7b0f6b7905 leave ICU variant properties empty for now
Saving unused properties causes unnecessary duplicates.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
0894ce9dc3 import abbreviations from OSM Wiki
Replaces the variant rules with a slightly cleaned-up
version of the abbreviation lists at
https://wiki.openstreetmap.org/wiki/Name_finder:Abbreviations
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
4fd2e961b6 improve normalization
Make sure all special symbols are removed during normalization already.
Those won't be interpreted in any way because they are unlikely to be
searched for.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
b9fbfeff67 only consider partials in multi-words for initial count
This ensures that it is less likely that we exclude meaningful
words like 'hauptstrasse' just because they are frequent.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
5dd24b3ef0 add documentation for ICU tokenizer configuration 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
62828fc5c1 switch to a more flexible variant description format
The new format combines compound splitting and abbreviation.
It also allows to restrict rules to additional conditions
(like language or region). This latter ability is not used
yet.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
a6aa6360e0 use yaml tag syntax to mark include files 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
c4f6c06f44 add dependency on datrie 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
0d80a9b897 tests for composing decomposed suffixes 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
f70930b1a0 make compund decomposition pure import feature
Compound decomposition now creates a full name variant on
import just like abbreviations. This simplifies query time
normalization and opens a path for changing abbreviation
and compund decomposition lists for an existing database.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
9ff4f66f55 complete tests for icu tokenizer 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
32ca631b74 fix full term token in special phrases 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
2e81084f35 complete tests for rule loader 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
a0a7b05c9f correctly quote strings when copying in data
Encapsulate the copy string in a class that ensures that
copy lines are written with correct quoting.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
2f6e4edcdb update unit tests for adapted abbreviation code 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
1bd9f455fc add abbreviations from legacy tokenizer
These abbreviations are not a perfect fit anymore because
abbreviation replacement is now applied before transliteration.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
2e3c5d4c5b adapt tests for ICU tokenizer 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
8413075249 move abbreviation computation into import phase
This adds precomputation of abbreviated terms for names and removes
abbreviation of terms in the query. Basic import works but still
needs some thorough testing as well as speed improvements during
import.

New dependency for python library datrie.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
6ba00e6aee icu tokenizer: move transliteration rules in separate file
The tokenizer configuration has become difficult to handle
due to the additional manual transliteration rules. Allow
to have a separate rule file that is given to the ICU library
as is.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
de4fac33dc docs: nominatim-ui should be installed from the release
The development version does not provide the pre-packaged
dist directory anymore.
2021-07-03 21:16:52 +02:00
Sarah Hoffmann
c9984669a7 Merge pull request #2373 from lonvia/tweak-search-cost
Further tweaking of search cost
2021-06-26 16:21:08 +02:00
Sarah Hoffmann
63755c31ff remove penalty for full words in address
Now that mutli-word partials no longer exist, multi-word full
words need to be used to search in addresses and therefore no
longer should have a penalty.

Also changes the condition when a full word is included into
the address. It is no longer relevant if an equivalent partial
exists but only if the term consists of more than one word.
2021-06-26 11:37:15 +02:00
Sarah Hoffmann
161f5f5cee adjust penalty for housenumber-in-name searches
When searching for house numbers in the name (for place-only
terms) then the same penalties need to apply as for the
regular house number search.

Change the code to first compute the penalties and then create
the new search variants.
2021-06-26 11:37:15 +02:00
Sarah Hoffmann
c7073a1fc0 increase minimum Python to 3.6
Python 3.6 introduces formatted string literals and
flag enums as well as a much faster dict implementation.
These changes make the code so much simpler as to warrant
dropping Python 3.5 support.

Affected distributions are Ubuntu 16.04 and Debian Stretch.
2021-06-21 18:37:37 +02:00
Sarah Hoffmann
e7b4fc70e7 make sure old data gets deleted on place type change
When changing from some other place type to place=postcode
make sure that the old place type entry in the place table
is deleted.
2021-06-18 10:58:41 +02:00
Sarah Hoffmann
457982e1d2 update postcode in place if it already exists 2021-06-18 00:28:52 +02:00
Sarah Hoffmann
aa558e6080 Merge pull request #2369 from lonvia/exclude-poi-from-housenumber-search
Do not return POIs when dropping house number in query
2021-06-17 15:30:05 +02:00
Sarah Hoffmann
fe11d3cbbd do not return POIs when dropping house number in query
We've previously added searching through rank 30 in a house
number search to enable searches for house number+name.
This had the unintended side effect that rank 30 objects
are also returned in s search that dropped the house number
from the query. This is wrong because POIs cannot function
as a parent to a house number.

This fix drops all rank 30 objects from the results for a
house number search if they do not match the requested house
number.
2021-06-17 14:21:20 +02:00
Sarah Hoffmann
1ce223a83b Merge pull request #2360 from AntoJvlt/postcodes-place-table
Use place instead of placex to compute postcodes
2021-06-16 11:45:07 +02:00
AntoJvlt
3676310efe Improved performance of the postcodes query and some code cleaning 2021-06-12 15:46:08 +02:00
AntoJvlt
ddf866c4c7 Always delete old placex entry for type=postcode when inserting a new one into the place table 2021-06-12 15:35:51 +02:00
AntoJvlt
9e07a197e9 Handle postcode type change in place insert trigger 2021-06-09 09:31:32 +02:00
AntoJvlt
1c175e3a67 Clean and update tests for postcodes 2021-06-09 09:31:32 +02:00
AntoJvlt
47fb7cd3a8 Use place_exists() into can_compute() for postcodes 2021-06-09 09:31:32 +02:00
AntoJvlt
e879814e43 Update tests for postcodes 2021-06-09 09:31:32 +02:00
AntoJvlt
a4733eed90 Use place instead of placex to compute postcodes 2021-06-09 09:31:32 +02:00
Sarah Hoffmann
38fbc4fcbb do not fail CI on codecov errors
The CodeCove upload depends on unreliable external code.
2021-06-08 10:42:14 +02:00
Sarah Hoffmann
c6fe91bfa5 Merge pull request #2359 from lonvia/switch-bdd-tests-to-api-search
Remove deprecated commandline query function
2021-06-06 18:29:51 +02:00
Sarah Hoffmann
7383f05e45 remove deprecated query interface
Searches can now be done via the thin API wrapper.
2021-06-06 15:28:21 +02:00
Sarah Hoffmann
3aac51c81f switch BDD tests to always use search API 2021-06-06 15:27:52 +02:00
Sarah Hoffmann
f0a7850edf Merge pull request #2358 from AntoJvlt/documentation-update
Update documentation
2021-06-04 23:54:37 +02:00
AntoJvlt
4336ca69c7 Update documentation 2021-06-03 18:39:40 +02:00
Sarah Hoffmann
4bca5e838b Merge pull request #2357 from lonvia/legacy-tokenizer-fix-word-entries
Fix insertion of special terms and countries into word table
2021-06-02 20:58:14 +02:00
Sarah Hoffmann
bc981d0261 fix insertion of special terms and countries into word table
Special terms need to be prefixed by a space because they are
full terms.

For countries avoid duplicate entries of word tokens.

Adds tests for adding country terms.
2021-06-02 20:22:39 +02:00
Sarah Hoffmann
b1d33e6b49 Merge pull request #2356 from lonvia/freeze-after-import
Call freeze after running and non-updateable import
2021-06-02 16:25:26 +02:00
Sarah Hoffmann
38d442edf6 docs: reload SQL when migrating to 3.6
SQL functions must always be reloaded when updating the software.
All other updates included the instruction as part of some other
migration. From 3.7 on it will happen as part of the migration
command.

Fixes #2335.
2021-06-02 16:11:29 +02:00
Sarah Hoffmann
72625dc72a call freeze after running and non-updateable import
Some of the tables will have already been removed but
the tables for indexing are still there and should be
dropped.
2021-06-02 11:08:48 +02:00
Sarah Hoffmann
cc2f152d70 commit changes to replication log table
Fixes #2350.
2021-05-26 11:47:08 +02:00
Sarah Hoffmann
f74dc38766 always compute guessed postcode for POIs from centroid
When guessing postcodes from the area, only postcodes within
that area are accepted. For POIs that is usually not what we
want as the postcode would have to be within a house for
example.

Fixes #2301.
2021-05-26 11:15:13 +02:00
Sarah Hoffmann
7d9665d8d2 Merge pull request #2349 from lonvia/fix-website-refresh
Only initialise tokenizer for refresh functions where needed
2021-05-25 20:43:44 +02:00
Sarah Hoffmann
a0e85cc17c only initialise tokenizer for refresh functions where needed
Fixes #2347.
2021-05-25 19:16:22 +02:00
Sarah Hoffmann
29b02f9e56 Merge pull request #2346 from lonvia/words-vs-tokens
Cleanup use of partial words in legacy tokenizers
2021-05-24 17:41:38 +02:00
Sarah Hoffmann
24c986c842 add tests for new full name computation with ICU 2021-05-24 10:41:42 +02:00
Sarah Hoffmann
4f4d15c28a reorganize keyword creation for legacy tokenizer
- only save partial words without internal spaces
- consider comma and semicolon a separator of full words
- consider parts before an opening bracket a full word
  (but not the part after the bracket)

Fixes #244.
2021-05-24 10:41:42 +02:00
Sarah Hoffmann
fa3e48c59f use make_keywords for place search terms also
Ensures that place indeed uses the same search names as other
names.
2021-05-23 23:08:11 +02:00
Sarah Hoffmann
02f6afa51b always ignore multi term partials in search
Partial terms should only ever consist of one word. Ignore
any other, they are a leftover from inefficient word index
builts.
2021-05-23 22:13:03 +02:00
Sarah Hoffmann
10143e0ac7 Merge pull request #2342 from lonvia/icu-tokenizer-ci
Add BDD tests with icu tokenizer to CI runs
2021-05-22 10:36:35 +02:00
Sarah Hoffmann
8f3429939f CI: run BDD tests with legacy_icu tokenizer 2021-05-21 23:18:45 +02:00
Sarah Hoffmann
00094c43d1 enable Tiger BDD API test for legacy_icu 2021-05-21 22:39:56 +02:00
Sarah Hoffmann
8bf15fa691 Merge pull request #2341 from lonvia/cleanup-python-tests
Cleanup and linting of python tests
2021-05-20 17:30:30 +02:00
Sarah Hoffmann
63dc503b8d Merge pull request #2337 from mogita/fix/invalid-query-string
fix: add the missing question mark
2021-05-20 10:26:23 +02:00
Sarah Hoffmann
430c316e45 test: fix linting errors 2021-05-19 23:07:39 +02:00
Sarah Hoffmann
01f5a9ff84 test: more use of table_factory 2021-05-19 17:37:03 +02:00
Sarah Hoffmann
af52eed0dd test: avoid use of tempfile module
Use the tmp_path fixture instead which provides automatic
cleanup.
2021-05-19 16:43:26 +02:00
Sarah Hoffmann
f93d0fa957 test: use src_dir fixture instead of self-computed paths 2021-05-19 16:03:54 +02:00
Sarah Hoffmann
c06a1d007a test: replace raw execute() with fixture code where possible 2021-05-19 12:11:04 +02:00
Sarah Hoffmann
65bd749918 test: use table_rows() and execute_values() where possible
Some uses of scalar() could also be replaced with convenience
functions from the word table mock.
2021-05-19 10:51:10 +02:00
Sarah Hoffmann
510eb53f53 test: move Testingcursor into separate class
Also adds more convenience functions: counting with a where
statement and a wrapper to execute_values().
2021-05-19 10:30:36 +02:00
mogita
507543a482 fix: add the missing question mark 2021-05-19 13:35:15 +08:00
Sarah Hoffmann
16bb007135 Merge pull request #2336 from lonvia/do-not-mask-error-when-loading-tokenizer
Do not hide errors when importing tokenizer
2021-05-18 23:00:10 +02:00
Sarah Hoffmann
1ffb6bd5d0 Merge pull request #2321 from AntoJvlt/csv-import-special-phrases
CSV import for special phrases and loader refactoring
2021-05-18 22:58:25 +02:00
AntoJvlt
799a4c9ab6 Documentation update and small code fixes 2021-05-18 22:35:21 +02:00
Sarah Hoffmann
b2722650d4 do not hide errors when importing tokenizer
Explicitly check for the tokenizer source file to check that
the name is correct. We can't use the import error for that
because it hides other import errors like a missing
library.

Fixes #2327.
2021-05-18 16:28:21 +02:00
Sarah Hoffmann
54b06d7abc Merge pull request #2332 from lonvia/fix-keyword-details
Always use object type for details keywords
2021-05-18 11:30:58 +02:00
Sarah Hoffmann
fef1bbb1a7 always use object type for details keywords
When name and address is empty, the keywords field in the response
of the details API would be an array because that is what PHP's
json_encode defaults to with empty array(). This default can only
be changed globally per json_encode call and that might cause
unintended colleteral damage. Work around the issue by making
name and address an empty array instead of keywords.

Fixes #2329.
2021-05-17 16:36:32 +02:00
AntoJvlt
3206bf59df Resolve conflicts 2021-05-17 13:52:35 +02:00
AntoJvlt
a33f2c0f5b Special phrases documentation updated 2021-05-17 13:25:16 +02:00
AntoJvlt
8b8dfc46eb Added --no-replace command for special phrases importation and added corresponding tests 2021-05-17 13:25:06 +02:00
AntoJvlt
06aab389ed Code cleaning and SPLoader deleted 2021-05-16 16:59:12 +02:00
AntoJvlt
fb0ebb5bf0 Add tests for the new SPWikiLoader and SPCsvLoader 2021-05-16 16:10:06 +02:00
Sarah Hoffmann
925726222f Merge pull request #2323 from darkshredder/disable-search-reverse-only
Feat: Disabled search API for --reverse-only imports
2021-05-14 10:40:22 +02:00
Sarah Hoffmann
550e7edb64 Merge pull request #2328 from lonvia/convert-tiger-to-csv
Switch external Tiger data to CSV format
2021-05-14 09:58:50 +02:00
Sarah Hoffmann
2992dea5c8 install default settings for legacy_icu tokenizer 2021-05-14 09:44:10 +02:00
Sarah Hoffmann
e76e4bd964 adapt documentation to use Tiger CSV dump 2021-05-14 00:02:50 +02:00
Sarah Hoffmann
7d621389ee adapt tests to new TIGER CSV format 2021-05-14 00:02:50 +02:00
Sarah Hoffmann
35efe3b41c use tokenizer during Tiger data import
This also changes the required import format to CSV.
2021-05-14 00:02:50 +02:00
Darkshredder
e5ffc59cd5 feat: Added reverse-only-search validation 2021-05-14 02:36:21 +05:30
Sarah Hoffmann
d7f9d2bde9 Merge pull request #2326 from lonvia/wokerpool-for-tiger-data
Use WorkerPool when importing Tiger data
2021-05-13 22:09:56 +02:00
Sarah Hoffmann
5feece64c1 use WorkerPool for Tiger data import
Requires adding an option that SQL errors are ignored.
2021-05-13 20:36:50 +02:00
Sarah Hoffmann
b9a09129fa move WorkerPool into db module
The pool is independent of the indexer and may also be used
by other parts of the software.
2021-05-13 17:11:17 +02:00
Sarah Hoffmann
96e6bbe3a1 Merge pull request #2325 from lonvia/do-not-precompute-postcodes
Do not preload postcodes in the legacy tokenizer
2021-05-13 17:00:29 +02:00
Frederik Ramm
fe39185894 Add array_key_last function for PHP <7.3
This patch adds an array_key_last function if it doesn't yet exist, fixes #2316. It is tested on PHP 7.2.24 but not PHP 7.3.
2021-05-13 16:42:22 +02:00
Sarah Hoffmann
fc860787dd do not preload postcodes
This is too expensive for updates.
2021-05-13 16:14:12 +02:00
Sarah Hoffmann
63e35574d4 Merge pull request #2324 from lonvia/generic-external-postcodes
Rework postcode handling and generalised external postcode support
2021-05-13 14:52:19 +02:00
Sarah Hoffmann
db2dbf15f7 fix token_info migration
A bad indent meant that only one table received the new column.
2021-05-13 14:31:41 +02:00
Sarah Hoffmann
f5977dac75 ignore invalid coordinates in external postcodes 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
8f2746fe24 ignore entries without country code 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
41b9bc9984 add documentation for external postcode feature 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
1ccd4360b4 correctly handle removing all postcodes for country 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
bf864b2c54 index postcodes after refreshing 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
4abaf71234 add and extend tests for new postcode handling 2021-05-13 14:15:42 +02:00
Sarah Hoffmann
a4aba23a83 move filling of postcode table to python
The Python code now takes care of reading postcodes from placex,
enhancing them with potentially existing external postcodes and
updating location_postcodes accordingly. The initial setup and
updates use exactly the same function.

External postcode handling has been generalized. External postcodes
for any country are now accepted. The format of the external postcode
file has changed. We now expect CSV, potentially gzipped. The
postcodes are no longer saved in the database.
2021-05-13 14:15:42 +02:00
Sarah Hoffmann
cae0cf3546 Merge pull request #2322 from mtmail/type-label-already-lowercased
typelabel value is already lowercased
2021-05-12 20:25:22 +02:00
marc tobias
38f9e18afb typelabel value is already lowercased 2021-05-12 19:16:51 +02:00
AntoJvlt
9d83da830f Introduction of SPCsvLoader to load special phrases from a csv file 2021-05-10 23:26:39 +02:00
AntoJvlt
00959fac57 Refactoring loading of external special phrases and importation process by introducing SPLoader and SPWikiLoader 2021-05-10 21:49:31 +02:00
Sarah Hoffmann
40cb17d299 Merge pull request #2314 from lonvia/fix-status-no-import-date
Correctly catch the exception when import date is missing
2021-05-06 17:41:53 +02:00
Sarah Hoffmann
2ae293aeb6 Merge pull request #2312 from lonvia/icu-tokenizer
Add new tokenizer based on libICU
2021-05-06 17:22:04 +02:00
Sarah Hoffmann
d8ead78e03 correctly catch the exception when import date is missing 2021-05-06 16:27:42 +02:00
Sarah Hoffmann
b2c6eca2c8 add missing transliterations
The ICU library only offers transliterations for a limited set of
script. Add transliterations for missing scripts from the PostgreSQL
module. These means that the same selection of scripts is supported
as with the old module.
2021-05-05 21:16:55 +02:00
Sarah Hoffmann
872ab91421 fix name of transliterator
Should be different from the normalisation rules.
2021-05-05 17:09:38 +02:00
Sarah Hoffmann
a263e54b94 enable BDD tests for different tokenizers
The tokenizer to be used can be choosen with -DTOKENIZER.

Adapt all tests, so that they work with legacy_icu tokenizer.
Move lookup in word table to a function in the tokenizer.
Special phrases are temporarily imported from the wiki until
we have an implementation that can import from file. TIGER
tests do not work yet.
2021-05-05 10:31:51 +02:00
Sarah Hoffmann
18c99a5c5f add unit tests for legacy ICU tokenizer 2021-05-05 10:15:27 +02:00
Sarah Hoffmann
d55fc39275 cache translieration results 2021-05-05 10:15:27 +02:00
Sarah Hoffmann
ba8ed7967d add PHP part for new ICU-base tokenizer 2021-05-05 10:15:27 +02:00
Sarah Hoffmann
f44af49df9 add Python part for new ICU-based tokenizer 2021-05-05 10:15:27 +02:00
Sarah Hoffmann
3c67bae868 Merge pull request #2310 from RhinoDevel/master
2nd try: Add hint about replication update & recheck intervals being in seconds.
2021-05-04 12:45:26 +02:00
Marc
3dade534fd Add hint about replication update & recheck intervals being in seconds. 2021-05-04 11:47:15 +02:00
Sarah Hoffmann
8b1a509442 Merge pull request #2305 from lonvia/tokenizer
Factor out normalization into a separate module
2021-05-03 09:15:34 +02:00
Sarah Hoffmann
8bdb9aa607 mock tokenizer factory for replication tests 2021-05-01 10:50:39 +02:00
Sarah Hoffmann
36c624ec71 commit between migrations
Later migrations may require tables set up by older ones.
2021-05-01 10:47:35 +02:00
Sarah Hoffmann
7fd871a74d increase database version for tokenizer migration 2021-05-01 10:47:35 +02:00
Sarah Hoffmann
ced8f0f4a2 fix liniting issues 2021-04-30 17:59:50 +02:00
Sarah Hoffmann
388ebcbae2 move index creation for word table to tokenizer
This introduces a finalization routing for the tokenizer
where it can post-process the import if necessary.
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
20891abe1c indexer: fetch extra place data asynchronously
The indexer now fetches any extra data besides the place_id
asynchronously while processing the places from the last batch.
This also means that more places are now fetched at once.
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
6ce6f62b8e fetch place info asynchronously 2021-04-30 17:41:08 +02:00
Sarah Hoffmann
602728895e indexer: fetch ids in batches 2021-04-30 17:41:08 +02:00
Sarah Hoffmann
fc995ea6b9 move database check for module to tokenizer 2021-04-30 17:41:08 +02:00
Sarah Hoffmann
be6262c6ce move status test to tokenizer
The availability of the module is now tested by the tokenizer.
2021-04-30 17:41:08 +02:00
Sarah Hoffmann
893490f94e add more tests for legacy tokenizer 2021-04-30 17:41:08 +02:00
Sarah Hoffmann
044bb6afa5 move tokenization in query into tokenizer 2021-04-30 17:41:08 +02:00
Sarah Hoffmann
3eb4d88057 boilerplate for PHP code of tokenizer
This adds an installation step for PHP code for the tokenizer. The
PHP code is split in two parts. The updateable code is found in
lib-php. The tokenizer installs an additional script in the
project directory which then includes the code from lib-php and
defines all settings that are static to the database. The website
code then always includes the PHP from the project directory.
2021-04-30 11:31:52 +02:00
Sarah Hoffmann
23fd1d032a tests for legacy tokenizer 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
7cb7cf848d move amenity creation to tokenizer
The BDD tests still use the old-style amenity creation scripts
because we don't have simple means to import a hand-crafted
test file of special phrases right now.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
bef300305e move default country name creation to tokenizer
The new function is also used, when a country us updated. All SQL
function related to country names have been removed.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
dc700c25b6 cache all postcodes 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
0ba93e5ba9 reorganise address iteration in tokenizer 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
0da481f207 remove debug code 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
d75a235c1f use address tokens in SQL 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
9e92759ac7 extract address tokens in tokenizer 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
ffc2d82b0e move postcode normalization into tokenizer 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
d8ed1bfc60 move houseunumber handling to tokenizer
Normalization and token computation are now done in the tokenizer.
The tokenizer keeps a cache to the hundred most used house numbers
to keep the numbers of calls to the database low.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
d711f5a81e move name token creation into tokenizer
Name tokens are now handed in via token_info and used from there.

Also moves the generic search name insertion function back to
placex_triggers.sql.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
fa2bc60468 introduce name analyzer
The name analyzer is the actual work horse of the tokenizer. It
is instantiated on a thread-base and provides all functions for
analysing names and queries.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
e1c5673ac3 require tokeinzer for indexer 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
1b1ed820c3 introduce index for finding surrounding buildings 2021-04-30 11:30:51 +02:00
Sarah Hoffmann
a73711f3cd add extra column for tokenizer
Add a jsonb column to the placex and location_property_osmline tables
which can be used by the installed tokenizer as required. No other
part of the software will use or otherwise rely on this column.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
9397bf54b8 introduce external processing in indexer
Indexing is now split into three parts: first a preparation step
that collects the necessary information from the database and
returns it to Python. In a second step the data is transformed
within Python as necessary and then returned to the database
through the usual UPDATE which now not only sets the indexed_status
but also other fields. The third step comprises the address
computation which is still done inside the update trigger in
the database.

The second processing step doesn't do anything useful yet.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
fbbdd31399 move word table and normalisation SQL into tokenizer
Creating and populating the word table is now the responsibility
of the tokenizer.

The get_maxwordfreq() function has been replaced with a
simple template parameter to the SQL during function installation.
The number is taken from the parameter list in the database to
ensure that it is not changed after installation.
2021-04-30 11:30:51 +02:00
Sarah Hoffmann
b5540dc35c add migration for configurable tokenizer
Adds a migration that initialises a legacy tokenizer for
an existing database. The migration is not active yet as
it will need completion when more functionality is added
to the legacy tokenizer.
2021-04-30 11:29:57 +02:00
Sarah Hoffmann
296a66558f move module installation to legacy tokenizer 2021-04-30 11:29:57 +02:00
Sarah Hoffmann
af968d4903 introduce tokenizer modules
This adds the boilerplate for selecting configurable tokenizers.
A tokenizer can be chosen at import time and will then install
itself such that it is fixed for the given database import even
when the software itself is updated.

The legacy tokenizer implements Nominatim's traditional algorithms.
2021-04-30 11:29:57 +02:00
Sarah Hoffmann
5c7b9ef909 Merge pull request #2303 from lonvia/remove-aux-support
Remove support for AUX housenumber tables
2021-04-30 11:19:35 +02:00
Sarah Hoffmann
185d369404 remove support for AUX housenumber tables
These tables have never been actively maintained and the code is
completely untested. With the upcomming changes, it is unlikely
that the code remains usable.

This removes the aux tables and all code that references them.
2021-04-30 10:08:29 +02:00
Sarah Hoffmann
51d20b19b6 Merge pull request #2299 from lonvia/update-actions
Fix database check for reverse-only
2021-04-27 12:18:45 +02:00
Sarah Hoffmann
46e8c6b112 Merge pull request #2291 from AntoJvlt/special-phrases-statistics
Special phrases statistics
2021-04-27 11:57:05 +02:00
Sarah Hoffmann
c8fb25201a do not check for extra housenumber index for reverse-only
Also adds a database check for reverse only import to the CI.
2021-04-27 10:14:26 +02:00
Sarah Hoffmann
1fd483643b add tests for different scripts 2021-04-26 23:01:06 +02:00
Sarah Hoffmann
a21a0864f1 Merge pull request #2298 from lonvia/add-warming-to-ci
Add warming to CI import tests and fix more Python 3.5 compatibility issues
2021-04-26 11:21:44 +02:00
Sarah Hoffmann
4457bf7528 avoid Path in subprocess parameters
Not supported by Python 3.5.
2021-04-26 10:55:23 +02:00
Sarah Hoffmann
5ed6f18d83 add warming to CI import test 2021-04-26 09:54:09 +02:00
AntoJvlt
abb3d56b20 Switching to log info and only send warning for invalid phrases 2021-04-25 17:57:43 +02:00
AntoJvlt
c5ecb9bae0 Implemented statistics for the import of special phrases through the SpecialPhrasesImporterStatistics class 2021-04-25 17:57:43 +02:00
AntoJvlt
1b68152fb2 reorganization of folder/file for the special phrases importer 2021-04-25 17:57:42 +02:00
Sarah Hoffmann
6812f397af Merge pull request #2297 from lonvia/update-deployment-docs
docs: update deployment to use project directory
2021-04-24 15:35:00 +02:00
Sarah Hoffmann
68bd9c6091 Merge pull request #2296 from lonvia/disable-too-few-public-methods-check
pylint: disable too-few-public-methods check
2021-04-24 15:03:28 +02:00
Sarah Hoffmann
754f9e3a20 docs: update deployment to use project directory
Fixes #2295.
2021-04-24 15:00:46 +02:00
Sarah Hoffmann
b951b11336 fix pylint complaints 2021-04-24 11:59:32 +02:00
Sarah Hoffmann
89c90bedb9 pylint: disable check too-few-public-methods 2021-04-24 11:39:44 +02:00
Sarah Hoffmann
b4fe7d7c7d Merge pull request #2293 from darkshredder/update-manpage
Updated manual page
2021-04-24 09:20:28 +02:00
Sarah Hoffmann
5071710db7 Merge pull request #2294 from lonvia/update-actions
CI: add import test against Python 3.5 and fix discovered issues
2021-04-23 23:33:15 +02:00
Sarah Hoffmann
9faaf3fc88 actions: add import on ubuntu 18.04
This uses oldest possible dependencies where possible.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
9c51c133f7 indexes with includes are not available for postgresql < 11 2021-04-23 22:50:08 +02:00
Sarah Hoffmann
91d2fb6b1c use group() for regex matches
Needed for compatibility with Python 3.5.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
280406c0d7 use pathlib version of open 2021-04-23 22:50:08 +02:00
Sarah Hoffmann
d5fc3b5e99 subprocess needs string argument
Compatibility change for Python 3.5.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
f8f8c7e534 check for existance of custom .env before opening 2021-04-23 22:50:08 +02:00
Sarah Hoffmann
3a642d50a4 use more generic ImportError to check for module
ModuleNotFoundError was only introduced in Python 3.6.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
9685c68e30 replace usages of fromisoformat() with strptime()
fromisoformat was only introduced with Python 3.7 while we
still support Python 3.5.

Fixes #2292.
2021-04-23 22:50:08 +02:00
Sarah Hoffmann
95e6ec091b remove argparse dependency for vagrant scripts
Users don't need to recreate the manpage.
2021-04-23 22:50:08 +02:00
Darkshredder
34f5e4a199 Updated manual page 2021-04-24 01:42:38 +05:30
Sarah Hoffmann
788baafa26 bdd tests: fix place dependen ranking tests
The ranks of places may differ for some countries. Force the
place nodes in the test on null island which always uses the
default ranking.
2021-04-22 17:31:00 +02:00
Sarah Hoffmann
4c31813398 Merge pull request #2288 from RhinoDevel/patch-1
Replace "nominatim-update" with "nominatim".
2021-04-22 17:12:25 +02:00
RhinoDevel
b7bae80616 Replace "nominatim-update" with "nominatim".
If I am not mistaken, the correct command to index imported data via commandline is "nominatim index".
2021-04-22 15:40:22 +02:00
Sarah Hoffmann
f7e4aa51d3 indexer: reset query counter
Reset the counter for queries after the asynchronous connections
have been reopened.
2021-04-21 10:33:45 +02:00
Sarah Hoffmann
696c50459f Merge pull request #2285 from lonvia/split-indexer-code
Rework indexer code
2021-04-20 15:34:14 +02:00
Sarah Hoffmann
50b6d7298c factor out async connection handling into separate class
Also adds a test for reconnecting regularly while indexing.
2021-04-20 14:08:37 +02:00
Sarah Hoffmann
26a81654a8 indexer: make self.conn function-local
Also switches to our internal connect function which gives us
a cursor with a sclar() function.
2021-04-20 14:08:37 +02:00
Sarah Hoffmann
6430371d7d make index() function private 2021-04-20 14:08:37 +02:00
Sarah Hoffmann
18705b3f18 move analyse function into indexinf function 2021-04-20 14:08:37 +02:00
Sarah Hoffmann
c6bd2bb7fb indexer: move runner into separate file 2021-04-20 14:08:37 +02:00
Sarah Hoffmann
c4fd94bd1a Merge pull request #2284 from lonvia/cleanup-word-frequency-computation
Rename and simplify function for word pre-computation
2021-04-19 18:28:04 +02:00
Sarah Hoffmann
b88b952f56 simplify token precomputation
Rename function to reflect that it is only used for precomputation.
The token IDs are not really needed, so don't bother to compute
the array of tokens.
2021-04-19 17:24:19 +02:00
Sarah Hoffmann
d68b02d36a remove unused word recomputation script
Has been replaced by a script recomputing counts from search_name.
2021-04-19 16:40:57 +02:00
Sarah Hoffmann
b9b85eb208 Merge pull request #2283 from darkshredder/tiger-data-test-fix
Fix: tiger-data tarfile test
2021-04-19 13:56:36 +02:00
Darkshredder
1f898405a6 Fix: tiger-data tarfile test 2021-04-19 16:02:52 +05:30
Sarah Hoffmann
6f6910101e Merge pull request #2282 from lonvia/add-paths-to-config
Include software paths in Python config object
2021-04-19 12:14:25 +02:00
Sarah Hoffmann
79d55357e8 simplify sql and website creation functions 2021-04-19 10:53:30 +02:00
Sarah Hoffmann
4fa6c0ad53 simplify constructor for SQL preprocessor
Use sql path from config.
2021-04-19 10:26:25 +02:00
Sarah Hoffmann
8f63f9516b simplify interface for adding tiger data
Also simplifies tests using existing fixtures.
2021-04-19 10:26:25 +02:00
Sarah Hoffmann
995ba2c7c2 add library directories to config
Allows to reduce the number of parameters in functions that take
the config anyway.
2021-04-19 10:26:25 +02:00
Sarah Hoffmann
830e3be1e6 Merge pull request #2281 from changpingc/changping/fix-tiger-index
fix index on location_property_tiger (parent_place_id)
2021-04-19 08:42:59 +02:00
Channgping Chen
29a314a092 fix index on location_property_tiger (parent_place_id)
Looks like 2af82975cd
accidentally renamed an index. Because of the added "if not
exists" clause, the index doesn't get created. This
significantly slows down reverse queries because they now
require full scans on location_property_tiger.

Without this fix, reverse queries can take 8s on a full
planet install on an r5.8xlarge instance in EC2.
2021-04-19 00:33:15 +00:00
Sarah Hoffmann
abdba5fdc7 Merge pull request #2280 from AntoJvlt/Fix-special-phrases-import-and-tests-cleaning
Fix regex and sanity check for the import of special phrases and tests cleaning.
2021-04-18 11:57:19 +02:00
AntoJvlt
b2ae715699 Only log a warning if a wrong input is detected on the wiki while importing special phrases 2021-04-17 20:19:39 +02:00
AntoJvlt
a95c748363 Fix occurence regex 2021-04-17 19:24:13 +02:00
AntoJvlt
ec859e41c6 Cleaned tests and add database cleaning tests on test_import_from_wiki 2021-04-17 19:23:33 +02:00
Sarah Hoffmann
7aeae9da81 Merge pull request #2279 from lonvia/add-index-for-continued-indexing
Add index for continued indexing
2021-04-17 11:51:21 +02:00
Sarah Hoffmann
2ca11ccc6b add tests for continuing import 2021-04-17 11:10:36 +02:00
Sarah Hoffmann
d74ae669e3 add support index when continuing import at index phase
Indexing scans the placex table sequentially during indexing
on the initial import. That is okay because we know that all
rows need to be processed anywhere. When continuing the import,
however, a large part might already be indexed, so that the
process spends a lot of time going through rows that are no
longer of interest. Create a supporting index for all unindexed
rows to speed up the scan. This is the same index as used later
for updates.
2021-04-17 11:07:04 +02:00
Sarah Hoffmann
9fabc5572d Merge pull request #2278 from lonvia/remove-transistion-functions
Remove transition functions
2021-04-17 10:13:33 +02:00
Sarah Hoffmann
da98a2102a remove transition functions from Python 2021-04-16 18:41:14 +02:00
Sarah Hoffmann
fb3353b854 Merge pull request #2277 from lonvia/update-osm2pgsql
Update osm2pgsql to current master
2021-04-16 17:40:43 +02:00
Sarah Hoffmann
b7e5c54593 remove PHP code for transition functions 2021-04-16 17:28:51 +02:00
Sarah Hoffmann
68beec5590 remove installation of PHP util scripts 2021-04-16 17:09:40 +02:00
Sarah Hoffmann
6ba06d1eb4 Merge pull request #2276 from lonvia/port-country-code-creation-to-python
Port country code creation to python
2021-04-16 16:57:04 +02:00
Sarah Hoffmann
0f11e311c4 add test for new postcode import function 2021-04-16 16:11:20 +02:00
Sarah Hoffmann
886a01c796 port function to compute initial postcodes to Python 2021-04-16 16:11:20 +02:00
Sarah Hoffmann
a632b9f86a Merge pull request #2275 from lonvia/switch-to-absolute-imports
Use absolute imports in Python code
2021-04-16 15:04:10 +02:00
Sarah Hoffmann
76b1885595 use absolute imports in Python code
Relative imports are no longer officially recommended.
2021-04-16 14:20:09 +02:00
Sarah Hoffmann
c55b409cf6 update osm2pgsql to current master (fixes version output) 2021-04-15 10:24:01 +02:00
Sarah Hoffmann
c64193f839 Merge pull request #2263 from AntoJvlt/special-phrases-autoupdate
Implemented auto update of special phrases while importing them
2021-04-15 10:13:25 +02:00
Sarah Hoffmann
28a2a795ba Merge pull request #2270 from lonvia/simplify-place-boundary-merge
Simplify matching between place and boundary names
2021-04-15 10:12:53 +02:00
Sarah Hoffmann
e90adfc7c3 adapt database check to new index layout 2021-04-14 17:52:59 +02:00
Sarah Hoffmann
16267dc021 add migration for new placenode geometry index 2021-04-14 17:52:59 +02:00
Sarah Hoffmann
e7266b52ae simplify name matching between boundary and place node
Instead of normalising the names simply compare them in lower
case. This removes the dependency on the tokenizer for
linking boundaries and nodes. When looking up the linked places
by place type also allow that one name is simply contained in the
other. This catches the frequent case where one of the names has
an addendum (e.g. Newport vs. City of Newport).

Drops the special index for the name lookup and insted relies
on a slightly extended version of the geometry index used for
reverse lookup. Saves around 100MB on a planet.
2021-04-14 17:52:59 +02:00
Sarah Hoffmann
dc02610408 Merge pull request #2269 from lonvia/fix-actions
github actions: reintroduce postgresql repo
2021-04-14 17:50:02 +02:00
Sarah Hoffmann
dc1bfe4a93 github actions: reintroduce postgresql repo 2021-04-14 17:25:44 +02:00
Sarah Hoffmann
cf69daaafb Merge pull request #2264 from darkshredder/tiger-data-tests
Fix:  Error if last statements is wrong and improved tests in tiger data import
2021-04-14 10:56:12 +02:00
Darkshredder
49ee7505ed Fix: Removed error if endstatement is wrong and improved tests 2021-04-13 15:44:12 +05:30
AntoJvlt
ae2b2cb9a5 Tests added for the auto update of special phrases during import 2021-04-12 14:35:29 +02:00
AntoJvlt
8c2f287ce4 Implemented auto update of special phrases while importing them 2021-04-12 14:30:48 +02:00
Sarah Hoffmann
2351f36315 Merge pull request #2260 from AntoJvlt/fix-load-languages-special-phrases
Fix default languages loading for special phrases import
2021-04-11 23:09:45 +02:00
AntoJvlt
5ecae10713 Fix default languages loading 2021-04-11 22:26:31 +02:00
Sarah Hoffmann
2e3d657794 Merge pull request #2258 from darkshredder/code-coverage
Disabled Code coverage status checks
2021-04-10 21:19:55 +02:00
Darkshredder
90f990b806 CodeCov comment only when codecoverage changes 2021-04-10 22:28:29 +05:30
Darkshredder
7666d48409 Disabled Coverage status checks 2021-04-10 20:44:52 +05:30
Sarah Hoffmann
be4cb190e8 add badge for codecov 2021-04-10 16:57:39 +02:00
Sarah Hoffmann
2f4eca8c46 Merge pull request #2252 from darkshredder/code-coverage
Added Code coverage support using Codecov
2021-04-10 16:37:12 +02:00
Sarah Hoffmann
71564fa1de split LANGUAGES parameter before use
The user supplies the languages as a comma-separated list.
2021-04-09 17:48:28 +02:00
Sarah Hoffmann
ce08cb6cd7 add migration information for new configuration format 2021-04-08 11:01:46 +02:00
Sarah Hoffmann
1f0cf6311a Merge pull request #2256 from lonvia/remove-reverseinplan-option
Remove ReverseInPlan option
2021-04-08 10:54:16 +02:00
Sarah Hoffmann
1db468b6c3 remove special handling for reversed queries in getGroupedSearches
getGroupedSearches is guaranteed not to be called with reversed
structured queries, so there is no need to have special exclusion
code.
2021-04-08 10:35:14 +02:00
Sarah Hoffmann
534de5ba81 remove reverseInPlan option from Geocode
Disabling query reversal is no longer possible in the configuration,
so there is no need to keep this as an option. Reversal is
automatically disabled for structured search only.
2021-04-08 10:19:27 +02:00
Darkshredder
2bfea15fdc Fixed BDD tests coverage reports 2021-04-05 06:30:31 +05:30
Darkshredder
771b3377c0 Added code-cov Support for Code Coverage 2021-03-31 05:00:03 +05:30
314 changed files with 34387 additions and 13454 deletions

View File

@@ -7,6 +7,8 @@ assignees: ''
---
<!-- Note: this template is for reporting problems with searching. If you have found an issue with the data, you need to report/fix the issue directly in OpenStreetMap. See https://www.openstreetmap.org/fixthemap for details. -->
## What did you search for?
<!-- Please try to provide a link to your search. You can go to https://nominatim.openstreetmap.org and repeat your search there. If you originally found the issue somewhere else, please tell us what software/website you were using. -->
@@ -15,11 +17,11 @@ assignees: ''
## What result did you expect?
**Is the result in the right place and just named wrongly?**
**When the result in the right place and just named wrongly:**
<!-- Please tell us the display name you expected. -->
**Is the result missing completely?**
**When the result missing completely:**
<!-- Make sure that the data you are looking for is in OpenStreetMap. Provide a link to the OpenStreetMap object or if you cannot get it, a link to the map on https://openstreetmap.org where you expect the result to be.

View File

@@ -1,13 +1,26 @@
name: 'Build Nominatim'
inputs:
ubuntu:
description: 'Version of Ubuntu to install on'
required: false
default: '20'
runs:
using: "composite"
steps:
- name: Install prerequisites
run: |
sudo apt-get install -y -qq libboost-system-dev libboost-filesystem-dev libexpat1-dev zlib1g-dev libbz2-dev libpq-dev libproj-dev libicu-dev python3-psycopg2 python3-pyosmium python3-dotenv python3-psutil python3-jinja2 python3-icu
sudo apt-get install -y -qq libboost-system-dev libboost-filesystem-dev libexpat1-dev zlib1g-dev libbz2-dev libpq-dev libproj-dev libicu-dev
if [ "x$UBUNTUVER" == "x18" ]; then
pip3 install python-dotenv psycopg2==2.7.7 jinja2==2.8 psutil==5.4.2 pyicu osmium PyYAML==5.1 datrie
else
sudo apt-get install -y -qq python3-icu python3-datrie python3-pyosmium python3-jinja2 python3-psutil python3-psycopg2 python3-dotenv python3-yaml
fi
shell: bash
env:
UBUNTUVER: ${{ inputs.ubuntu }}
- name: Download dependencies
run: |

View File

@@ -3,72 +3,38 @@ name: CI Tests
on: [ push, pull_request ]
jobs:
tests:
runs-on: ubuntu-20.04
strategy:
matrix:
postgresql: [9.5, 13]
include:
- postgresql: 9.5
postgis: 2.5
- postgresql: 13
postgis: 3
create-archive:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
submodules: true
path: Nominatim
- name: Setup PHP
uses: shivammathur/setup-php@v2
with:
php-version: '7.4'
tools: phpunit, phpcs
- name: Get Date
id: get-date
run: |
echo "::set-output name=date::$(/bin/date -u "+%Y%W")"
shell: bash
submodules: true
- uses: actions/cache@v2
with:
path: |
country_grid.sql.gz
key: nominatim-country-data-${{ steps.get-date.outputs.date }}
data/country_osm_grid.sql.gz
key: nominatim-country-data-1
- uses: ./Nominatim/.github/actions/setup-postgresql
- name: Package tarball
run: |
if [ ! -f data/country_osm_grid.sql.gz ]; then
wget --no-verbose -O data/country_osm_grid.sql.gz https://www.nominatim.org/data/country_grid.sql.gz
fi
cd ..
tar czf nominatim-src.tar.bz2 Nominatim
mv nominatim-src.tar.bz2 Nominatim
- name: 'Upload Artifact'
uses: actions/upload-artifact@v2
with:
postgresql-version: ${{ matrix.postgresql }}
postgis-version: ${{ matrix.postgis }}
- uses: ./Nominatim/.github/actions/build-nominatim
name: full-source
path: nominatim-src.tar.bz2
retention-days: 1
- name: Install test prerequsites
run: sudo apt-get install -y -qq php-codesniffer pylint python3-pytest python3-behave
- name: PHP linting
run: phpcs --report-width=120 .
working-directory: Nominatim
- name: Python linting
run: pylint --extension-pkg-whitelist=osmium nominatim
working-directory: Nominatim
- name: PHP unit tests
run: phpunit ./
working-directory: Nominatim/test/php
- name: Python unit tests
run: py.test-3 test/python
working-directory: Nominatim
- name: BDD tests
run: behave -DREMOVE_TEMPLATE=1 -DBUILDDIR=$GITHUB_WORKSPACE/build --format=progress3
working-directory: Nominatim/test/bdd
import:
tests:
needs: create-archive
strategy:
matrix:
ubuntu: [18, 20]
@@ -76,92 +42,279 @@ jobs:
- ubuntu: 18
postgresql: 9.5
postgis: 2.5
pytest: pytest
php: 7.2
- ubuntu: 20
postgresql: 13
postgis: 3
pytest: py.test-3
php: 7.4
runs-on: ubuntu-${{ matrix.ubuntu }}.04
steps:
- uses: actions/checkout@v2
- uses: actions/download-artifact@v2
with:
submodules: true
path: Nominatim
name: full-source
- name: Get Date
id: get-date
run: |
echo "::set-output name=date::$(/bin/date -u "+%Y%W")"
shell: bash
- name: Unpack Nominatim
run: tar xf nominatim-src.tar.bz2
- uses: actions/cache@v2
- name: Setup PHP
uses: shivammathur/setup-php@v2
with:
path: |
country_grid.sql.gz
key: nominatim-country-data-${{ steps.get-date.outputs.date }}
- uses: actions/cache@v2
with:
path: |
monaco-latest.osm.pbf
key: nominatim-test-data-${{ steps.get-date.outputs.date }}
php-version: ${{ matrix.php }}
coverage: xdebug
tools: phpunit, phpcs, composer
- uses: actions/setup-python@v2
with:
python-version: 3.5
python-version: 3.6
if: matrix.ubuntu == 18
- uses: ./Nominatim/.github/actions/setup-postgresql
with:
postgresql-version: ${{ matrix.postgresql }}
postgis-version: ${{ matrix.postgis }}
- uses: ./Nominatim/.github/actions/build-nominatim
- name: Install extra dependencies for Ubuntu 18
run: |
sudo apt-get install libicu-dev
pip3 install python-dotenv psycopg2==2.7.7 jinja2==2.8 psutil==5.4.2 pyicu osmium
shell: bash
- uses: ./Nominatim/.github/actions/build-nominatim
with:
ubuntu: ${{ matrix.ubuntu }}
- name: Install test prerequsites
run: sudo apt-get install -y -qq pylint python3-pytest python3-behave python3-pytest-cov php-codecoverage
if: matrix.ubuntu == 20
- name: Install test prerequsites
run: pip3 install pylint==2.6.0 pytest pytest-cov behave==1.2.6
if: matrix.ubuntu == 18
- name: Clean installation
run: rm -rf Nominatim build
- name: PHP linting
run: phpcs --report-width=120 .
working-directory: Nominatim
- name: Python linting
run: pylint nominatim
working-directory: Nominatim
- name: PHP unit tests
run: phpunit --coverage-clover ../../coverage-php.xml ./
working-directory: Nominatim/test/php
if: matrix.ubuntu == 20
- name: Python unit tests
run: $PYTEST --cov=nominatim --cov-report=xml test/python
working-directory: Nominatim
env:
PYTEST: ${{ matrix.pytest }}
- name: BDD tests
run: |
mkdir cov
behave -DREMOVE_TEMPLATE=1 -DBUILDDIR=$GITHUB_WORKSPACE/build --format=progress3 -DPHPCOV=./cov
composer require phpunit/phpcov:7.0.2
vendor/bin/phpcov merge --clover ../../coverage-bdd.xml ./cov
working-directory: Nominatim/test/bdd
if: matrix.ubuntu == 20
- name: BDD tests
run: |
behave -DREMOVE_TEMPLATE=1 -DBUILDDIR=$GITHUB_WORKSPACE/build --format=progress3
working-directory: Nominatim/test/bdd
if: matrix.ubuntu == 18
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v1
with:
files: ./Nominatim/coverage*.xml
directory: ./
name: codecov-umbrella
fail_ci_if_error: false
path_to_write_report: ./coverage/codecov_report.txt
verbose: true
if: matrix.ubuntu == 20
icu-test:
needs: create-archive
strategy:
matrix:
ubuntu: [20]
include:
- ubuntu: 20
postgresql: 13
postgis: 3
pytest: py.test-3
php: 7.4
runs-on: ubuntu-${{ matrix.ubuntu }}.04
steps:
- uses: actions/download-artifact@v2
with:
name: full-source
- name: Unpack Nominatim
run: tar xf nominatim-src.tar.bz2
- name: Setup PHP
uses: shivammathur/setup-php@v2
with:
php-version: ${{ matrix.php }}
coverage: xdebug
tools: phpunit, phpcs, composer
- uses: actions/setup-python@v2
with:
python-version: 3.6
if: matrix.ubuntu == 18
- uses: ./Nominatim/.github/actions/setup-postgresql
with:
postgresql-version: ${{ matrix.postgresql }}
postgis-version: ${{ matrix.postgis }}
- uses: ./Nominatim/.github/actions/build-nominatim
with:
ubuntu: ${{ matrix.ubuntu }}
- name: Install test prerequsites
run: sudo apt-get install -y -qq python3-behave
if: matrix.ubuntu == 20
- name: Install test prerequsites
run: pip3 install behave==1.2.6
if: matrix.ubuntu == 18
- name: BDD tests (icu tokenizer)
run: |
behave -DREMOVE_TEMPLATE=1 -DBUILDDIR=$GITHUB_WORKSPACE/build -DTOKENIZER=icu --format=progress3
working-directory: Nominatim/test/bdd
install:
runs-on: ubuntu-latest
needs: create-archive
strategy:
matrix:
name: [Ubuntu-18, Ubuntu-20, Centos-8]
include:
- name: Ubuntu-18
flavour: ubuntu
image: "ubuntu:18.04"
ubuntu: 18
install_mode: install-nginx
- name: Ubuntu-20
flavour: ubuntu
image: "ubuntu:20.04"
ubuntu: 20
install_mode: install-apache
- name: Centos-8
flavour: centos
image: "centos:8"
container:
image: ${{ matrix.image }}
env:
LANG: en_US.UTF-8
defaults:
run:
shell: sudo -Hu nominatim bash --noprofile --norc -eo pipefail {0}
steps:
- name: Prepare container (Ubuntu)
run: |
export APT_LISTCHANGES_FRONTEND=none
export DEBIAN_FRONTEND=noninteractive
apt-get update -qq
apt-get install -y git sudo wget
ln -snf /usr/share/zoneinfo/$CONTAINER_TIMEZONE /etc/localtime && echo $CONTAINER_TIMEZONE > /etc/timezone
shell: bash
if: matrix.flavour == 'ubuntu'
- name: Prepare container (CentOS)
run: |
dnf update -y
dnf install -y sudo glibc-langpack-en
shell: bash
if: matrix.flavour == 'centos'
- name: Setup import user
run: |
useradd -m nominatim
echo 'nominatim ALL=(ALL:ALL) NOPASSWD: ALL' > /etc/sudoers.d/nominiatim
echo "/home/nominatim/Nominatim/vagrant/Install-on-${OS}.sh no $INSTALL_MODE" > /home/nominatim/vagrant.sh
shell: bash
env:
OS: ${{ matrix.name }}
INSTALL_MODE: ${{ matrix.install_mode }}
- uses: actions/download-artifact@v2
with:
name: full-source
path: /home/nominatim
- name: Install Nominatim
run: |
export USERNAME=nominatim
export USERHOME=/home/nominatim
export NOSYSTEMD=yes
export HAVE_SELINUX=no
tar xf nominatim-src.tar.bz2
. vagrant.sh
working-directory: /home/nominatim
- name: Prepare import environment
run: |
if [ ! -f monaco-latest.osm.pbf ]; then
wget --no-verbose https://download.geofabrik.de/europe/monaco-latest.osm.pbf
fi
mkdir data-env
cd data-env
shell: bash
mv Nominatim/test/testdb/apidb-test-data.pbf test.pbf
rm -rf Nominatim
mkdir data-env-reverse
working-directory: /home/nominatim
- name: Prepare import environment (CentOS)
run: |
sudo ln -s /usr/local/bin/nominatim /usr/bin/nominatim
echo NOMINATIM_DATABASE_WEBUSER="apache" > nominatim-project/.env
cp nominatim-project/.env data-env-reverse/.env
working-directory: /home/nominatim
if: matrix.flavour == 'centos'
- name: Import
run: nominatim import --osm-file ../monaco-latest.osm.pbf
shell: bash
working-directory: data-env
run: nominatim import --osm-file ../test.pbf
working-directory: /home/nominatim/nominatim-project
- name: Import special phrases
run: nominatim special-phrases --import-from-wiki
working-directory: data-env
working-directory: /home/nominatim/nominatim-project
- name: Check import
- name: Check full import
run: nominatim admin --check-database
working-directory: data-env
working-directory: /home/nominatim/nominatim-project
- name: Warm up database
run: nominatim admin --warm
working-directory: data-env
working-directory: /home/nominatim/nominatim-project
- name: Prepare update (Ubuntu)
run: apt-get install -y python3-pip
shell: bash
if: matrix.flavour == 'ubuntu'
- name: Run update
run: |
nominatim replication --init
nominatim replication --once
working-directory: data-env
pip3 install --user osmium
nominatim replication --init
NOMINATIM_REPLICATION_MAX_DIFF=1 nominatim replication --once
working-directory: /home/nominatim/nominatim-project
- name: Run reverse-only import
run : nominatim import --osm-file ../monaco-latest.osm.pbf --reverse-only
working-directory: data-env
env:
NOMINATIM_DATABASE_DSN: pgsql:dbname=reverse
run : |
echo 'NOMINATIM_DATABASE_DSN="pgsql:dbname=reverse"' >> .env
nominatim import --osm-file ../test.pbf --reverse-only --no-updates
working-directory: /home/nominatim/data-env-reverse
- name: Check reverse import
run: nominatim admin --check-database
working-directory: /home/nominatim/data-env-reverse

7
.gitignore vendored
View File

@@ -1,12 +1,9 @@
*.log
*.pyc
build
settings/local.php
docs/develop/*.png
data/wiki_import.sql
data/wiki_specialphrases.sql
data/osmosischange.osc
build
.vagrant
data/country_osm_grid.sql.gz

View File

@@ -1,7 +1,7 @@
[MASTER]
extension-pkg-whitelist=osmium
ignored-modules=icu
ignored-modules=icu,datrie
[MESSAGES CONTROL]
@@ -10,3 +10,6 @@ ignored-modules=icu
# closing added here because it sometimes triggers a false positive with
# 'with' statements.
ignored-classes=NominatimArgs,closing
disable=too-few-public-methods,duplicate-code
good-names=i,x,y,fd,db

View File

@@ -18,9 +18,9 @@ list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake")
project(nominatim)
set(NOMINATIM_VERSION_MAJOR 3)
set(NOMINATIM_VERSION_MINOR 7)
set(NOMINATIM_VERSION_PATCH 1)
set(NOMINATIM_VERSION_MAJOR 4)
set(NOMINATIM_VERSION_MINOR 0)
set(NOMINATIM_VERSION_PATCH 0)
set(NOMINATIM_VERSION "${NOMINATIM_VERSION_MAJOR}.${NOMINATIM_VERSION_MINOR}.${NOMINATIM_VERSION_PATCH}")
@@ -38,6 +38,7 @@ set(BUILD_TESTS on CACHE BOOL "Build test suite")
set(BUILD_DOCS on CACHE BOOL "Build documentation")
set(BUILD_MANPAGE on CACHE BOOL "Build Manual Page")
set(BUILD_OSM2PGSQL on CACHE BOOL "Build osm2pgsql (expert only)")
set(INSTALL_MUNIN_PLUGINS on CACHE BOOL "Install Munin plugins for supervising Nominatim")
#-----------------------------------------------------------------------------
# osm2pgsql (imports/updates only)
@@ -62,7 +63,7 @@ endif()
#-----------------------------------------------------------------------------
if (BUILD_IMPORTER)
find_package(PythonInterp 3.5 REQUIRED)
find_package(PythonInterp 3.6 REQUIRED)
endif()
#-----------------------------------------------------------------------------
@@ -109,21 +110,6 @@ if (BUILD_IMPORTER)
" wget -O ${PROJECT_SOURCE_DIR}/data/country_osm_grid.sql.gz https://www.nominatim.org/data/country_grid.sql.gz")
endif()
set(CUSTOMSCRIPTS
check_import_finished.php
country_languages.php
export.php
query.php
setup.php
update.php
warm.php
)
foreach (script_source ${CUSTOMSCRIPTS})
configure_file(${PROJECT_SOURCE_DIR}/cmake/script.tmpl
${PROJECT_BINARY_DIR}/utils/${script_source})
endforeach()
configure_file(${PROJECT_SOURCE_DIR}/cmake/tool.tmpl
${PROJECT_BINARY_DIR}/nominatim)
endif()
@@ -168,7 +154,7 @@ if (BUILD_TESTS)
if (PHPCS)
message(STATUS "Using phpcs binary ${PHPCS}")
add_test(NAME phpcs
COMMAND ${PHPCS} --report-width=120 --colors lib website utils
COMMAND ${PHPCS} --report-width=120 --colors lib-php
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
else()
message(WARNING "phpcs not found. PHP linting tests disabled." )
@@ -214,7 +200,7 @@ endif()
#-----------------------------------------------------------------------------
if (BUILD_MANPAGE)
add_subdirectory(manual)
add_subdirectory(man)
endif()
#-----------------------------------------------------------------------------
@@ -226,6 +212,7 @@ include(GNUInstallDirs)
set(NOMINATIM_DATADIR ${CMAKE_INSTALL_FULL_DATADIR}/${PROJECT_NAME})
set(NOMINATIM_LIBDIR ${CMAKE_INSTALL_FULL_LIBDIR}/${PROJECT_NAME})
set(NOMINATIM_CONFIGDIR ${CMAKE_INSTALL_FULL_SYSCONFDIR}/${PROJECT_NAME})
set(NOMINATIM_MUNINDIR ${CMAKE_INSTALL_FULL_DATADIR}/munin/plugins)
if (BUILD_IMPORTER)
configure_file(${PROJECT_SOURCE_DIR}/cmake/tool-installed.tmpl installed.bin)
@@ -273,4 +260,16 @@ install(FILES settings/env.defaults
settings/import-address.style
settings/import-full.style
settings/import-extratags.style
settings/icu_tokenizer.yaml
settings/country_settings.yaml
DESTINATION ${NOMINATIM_CONFIGDIR})
install(DIRECTORY settings/icu-rules
DESTINATION ${NOMINATIM_CONFIGDIR})
if (INSTALL_MUNIN_PLUGINS)
install(FILES munin/nominatim_importlag
munin/nominatim_query_speed
munin/nominatim_requests
DESTINATION ${NOMINATIM_MUNINDIR})
endif()

View File

@@ -1,8 +1,45 @@
4.0.0
* refactor name token computation and introduce ICU tokenizer
* name processing now happens in the indexer outside the DB
* reorganizes abbreviation handling and moves it to the indexing phases
* adds preprocessing of names
* add country-specific ranking for Spain, Slovakia
* partially switch to using SP-GIST indexes
* better updating of dependent addresses for name changes in streets
* remove unused/broken tables for external housenumbers
* move external postcodes to CSV format and no longer save them in tables
(adds support for postcodes for arbitrary countries)
* remove postcode helper entries from placex (thanks @AntoJvlt)
* change required format for TIGER data to CSV
* move configuration of default languages from wiki into config file
* expect customized configuration files in project directory by default
* disable search API for reverse-only import (thanks @darkshredder)
* port most of maintenance/import code to Python and remove PHP utils
* add catch-up mode for replication
* add updating of special phrases (thanks @AntoJvlt)
* add support for special phrases in CSV files (thanks @AntoJvlt)
* switch to case-independent matching between place and boundary names
* remove disabling of reverse query parsing
* minor tweaks to search algorithm to avoid more false positives
* major overhaul of the administrator and developer documentation
* add security disclosure policy
* add testing of installation scripts via CI
* drop support for Python < 3.6 and Postgresql < 9.5
3.7.2
* fix database check for reverse-only imports
* do not error out in status API result when import date is missing
* add array_key_last function for PHP < 7.3 (thanks to @woodpeck)
* fix more url when server name is unknown (thanks to @mogita)
* commit changes to replication log table
3.7.1
* fix smaller issues with special phrases import
* fix smaller issues with special phrases import (thanks @AntoJvlt)
* add index to speed up continued indexing during import
* fix index on location_property_tiger(parent_place_id)
* fix index on location_property_tiger(parent_place_id) (thanks @changpingc)
* make sure Python code is backward-compatible with Python 3.5
* various documentation fixes
@@ -28,7 +65,6 @@
* add non-key indexes to speed up housenumber + street searches
* switch housenumber field in placex to save transliterated names
3.6.0
* add full support for searching by and displaying of addr:* tags

View File

@@ -1,4 +1,5 @@
[![Build Status](https://github.com/osm-search/Nominatim/workflows/CI%20Tests/badge.svg)](https://github.com/osm-search/Nominatim/actions?query=workflow%3A%22CI+Tests%22)
[![codecov](https://codecov.io/gh/osm-search/Nominatim/branch/master/graph/badge.svg?token=8P1LXrhCMy)](https://codecov.io/gh/osm-search/Nominatim)
Nominatim
=========
@@ -19,14 +20,6 @@ https://nominatim.org/release-docs/develop/ .
Installation
============
**Nominatim is a complex piece of software and runs in a complex environment.
Installing and running Nominatim is something for experienced system
administrators only who can do some trouble-shooting themselves. We are sorry,
but we can not provide installation support. We are all doing this in our free
time and there is just so much of that time to go around. Do not open issues in
our bug tracker if you need help. Use the discussions forum
or ask for help on [help.openstreetmap.org](https://help.openstreetmap.org/).**
The latest stable release can be downloaded from https://nominatim.org.
There you can also find [installation instructions for the release](https://nominatim.org/release-docs/latest/admin/Installation), as well as an extensive [Troubleshooting/FAQ section](https://nominatim.org/release-docs/latest/admin/Faq/).

39
SECURITY.md Normal file
View File

@@ -0,0 +1,39 @@
# Security Policy
## Supported Versions
All Nominatim releases receive security updates for two years.
The following table lists the end of support for all currently supported
versions.
| Version | End of support for security updates |
| ------- | ----------------------------------- |
| 3.7.x | 2023-04-05 |
| 3.6.x | 2022-12-12 |
| 3.5.x | 2022-06-05 |
| 3.4.x | 2021-10-24 |
## Reporting a Vulnerability
If you believe, you have found an issue in Nominatim that has implications on
security, please send a description of the issue to **security@nominatim.org**.
You will receive an acknowledgement of your mail within 3 work days where we
also notify you of the next steps.
## How we Disclose Security Issues
** The following section only applies to security issues found in released
versions. Issues that concern the master development branch only will be
fixed immediately on the branch with the corresponding PR containing the
description of the nature and severity of the issue. **
Patches for identified security issues are applied to all affected versions and
new minor versions are released. At the same time we release a statement at
the [Nominatim blog](https://nominatim.org/blog/) describing the nature of the
incident. Announcements will also be published at the
[geocoding mailinglist](https://lists.openstreetmap.org/listinfo/geocoding).
## List of Previous Incidents
* 2020-05-04 - [SQL injection issue on /details endpoint](https://lists.openstreetmap.org/pipermail/geocoding/2020-May/002012.html)

View File

@@ -1,14 +0,0 @@
#!@PHP_BIN@ -Cq
<?php
require('@CMAKE_SOURCE_DIR@/lib-php/dotenv_loader.php');
@define('CONST_Default_ModulePath', '@CMAKE_BINARY_DIR@/module');
@define('CONST_Default_Osm2pgsql', '@CMAKE_BINARY_DIR@/osm2pgsql/osm2pgsql');
@define('CONST_DataDir', '@CMAKE_SOURCE_DIR@/data');
@define('CONST_SqlDir', '@CMAKE_SOURCE_DIR@/lib-sql');
@define('CONST_ConfigDir', '@CMAKE_SOURCE_DIR@/settings');
loadDotEnv();
$_SERVER['NOMINATIM_NOMINATIM_TOOL'] = '@CMAKE_BINARY_DIR@/nominatim';
require_once('@CMAKE_SOURCE_DIR@/lib-php/admin/@script_source@');

14
codecov.yml Normal file
View File

@@ -0,0 +1,14 @@
codecov:
require_ci_to_pass: yes
coverage:
status:
project: off
patch: off
comment:
require_changes: true
after_n_builds: 2
fixes:
- "Nominatim/::"

File diff suppressed because one or more lines are too long

View File

@@ -29787,7 +29787,7 @@ st 5557484
-- prefill word table
select count(make_keywords(v)) from (select distinct svals(name) as v from place) as w where v is not null;
select count(precompute_words(v)) from (select distinct svals(name) as v from place) as w where v is not null;
select count(getorcreate_housenumber_id(make_standard_name(v))) from (select distinct address->'housenumber' as v from place where address ? 'housenumber') as w;
-- copy the word frequencies

View File

@@ -10,6 +10,7 @@ set (DOC_SOURCES
admin
develop
api
customize
index.md
extra.css
styles.css
@@ -26,7 +27,10 @@ ADD_CUSTOM_TARGET(doc
COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/bash2md.sh ${PROJECT_SOURCE_DIR}/vagrant/Install-on-Centos-8.sh ${CMAKE_CURRENT_BINARY_DIR}/appendix/Install-on-Centos-8.md
COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/bash2md.sh ${PROJECT_SOURCE_DIR}/vagrant/Install-on-Ubuntu-18.sh ${CMAKE_CURRENT_BINARY_DIR}/appendix/Install-on-Ubuntu-18.md
COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/bash2md.sh ${PROJECT_SOURCE_DIR}/vagrant/Install-on-Ubuntu-20.sh ${CMAKE_CURRENT_BINARY_DIR}/appendix/Install-on-Ubuntu-20.md
COMMAND mkdocs build -d ${CMAKE_CURRENT_BINARY_DIR}/../site-html -f ${CMAKE_CURRENT_BINARY_DIR}/../mkdocs.yml
COMMAND PYTHONPATH=${PROJECT_SOURCE_DIR} mkdocs build -d ${CMAKE_CURRENT_BINARY_DIR}/../site-html -f ${CMAKE_CURRENT_BINARY_DIR}/../mkdocs.yml
)
ADD_CUSTOM_TARGET(serve-doc
COMMAND PYTHONPATH=${PROJECT_SOURCE_DIR} mkdocs serve
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}
)

View File

@@ -5,9 +5,34 @@ your Nominatim database. It is assumed that you have already successfully
installed the Nominatim software itself, if not return to the
[installation page](Installation.md).
## Importing multiple regions
## Importing multiple regions (without updates)
To import multiple regions in your database, you need to configure and run `utils/import_multiple_regions.sh` file. This script will set up the update directory which has the following structure:
To import multiple regions in your database you can simply give multiple
OSM files to the import command:
```
nominatim import --osm-file file1.pbf --osm-file file2.pbf
```
If you already have imported a file and want to add another one, you can
use the add-data function to import the additional data as follows:
```
nominatim add-data --file <FILE>
nominatim refresh --postcodes
nominatim index -j <NUMBER OF THREADS>
```
Please note that adding additional data is always significantly slower than
the original import.
## Importing multiple regions (with updates)
If you want to import multiple regions _and_ be able to keep them up-to-date
with updates, then you can use the scripts provided in the `utils` directory.
These scripts will set up an `update` directory in your project directory,
which has the following structure:
```bash
update
@@ -17,7 +42,6 @@ update
   │   └── monaco
   │   └── sequence.state
   └── tmp
├── combined.osm.pbf
└── europe
├── andorra-latest.osm.pbf
└── monaco-latest.osm.pbf
@@ -25,87 +49,59 @@ update
```
The `sequence.state` files will contain the sequence ID, which will be used by pyosmium to get updates. The tmp folder is used for import dump.
The `sequence.state` files contain the sequence ID for each region. They will
be used by pyosmium to get updates. The `tmp` folder is used for import dump and
can be deleted once the import is complete.
### Configuring multiple regions
The file `import_multiple_regions.sh` needs to be edited as per your requirement:
1. List of countries. eg:
COUNTRIES="europe/monaco europe/andorra"
2. Path to Build directory. eg:
NOMINATIMBUILD="/srv/nominatim/build"
3. Path to Update directory. eg:
UPDATEDIR="/srv/nominatim/update"
4. Replication URL. eg:
BASEURL="https://download.geofabrik.de"
DOWNCOUNTRYPOSTFIX="-latest.osm.pbf"
### Setting up multiple regions
!!! tip
If your database already exists and you want to add more countries,
replace the setting up part
`${SETUPFILE} --osm-file ${UPDATEDIR}/tmp/combined.osm.pbf --all 2>&1`
with `${UPDATEFILE} --import-file ${UPDATEDIR}/tmp/combined.osm.pbf --index --index-instances N 2>&1`
where N is the numbers of CPUs in your system.
Create a project directory as described for the
[simple import](Import.md#creating-the-project-directory). If necessary,
you can also add an `.env` configuration with customized options. In particular,
you need to make sure that `NOMINATIM_REPLICATION_UPDATE_INTERVAL` and
`NOMINATIM_REPLICATION_RECHECK_INTERVAL` are set according to the update
interval of the extract server you use.
Run the following command from your Nominatim directory after configuring the file.
Copy the scripts `utils/import_multiple_regions.sh` and `utils/update_database.sh`
into the project directory.
bash ./utils/import_multiple_regions.sh
Now customize both files as per your requirements
!!! danger "Important"
This file uses osmium-tool. It must be installed before executing the import script.
Installation instructions can be found [here](https://osmcode.org/osmium-tool/manual.html#installation).
### Updating multiple regions
To import multiple regions in your database, you need to configure and run ```utils/update_database.sh```.
This uses the update directory set up while setting up the DB.
### Configuring multiple regions
The file `update_database.sh` needs to be edited as per your requirement:
1. List of countries. eg:
1. List of countries. e.g.
COUNTRIES="europe/monaco europe/andorra"
2. Path to Build directory. eg:
2. URL to the service providing the extracts and updates. eg:
NOMINATIMBUILD="/srv/nominatim/build"
3. Path to Update directory. eg:
UPDATEDIR="/srv/nominatim/update"
4. Replication URL. eg:
BASEURL="https://download.geofabrik.de"
DOWNCOUNTRYPOSTFIX="-updates"
DOWNCOUNTRYPOSTFIX="-latest.osm.pbf"
5. Followup can be set according to your installation. eg: For Photon,
5. Followup in the update script can be set according to your installation.
E.g. for Photon,
FOLLOWUP="curl http://localhost:2322/nominatim-update"
will handle the indexing.
To start the initial import, change into the project directory and run
```
bash import_multiple_regions.sh
```
### Updating the database
Run the following command from your Nominatim directory after configuring the file.
Change into the project directory and run the following command:
bash ./utils/update_database.sh
bash update_database.sh
This will get diffs from the replication server, import diffs and index the database. The default replication server in the script([Geofabrik](https://download.geofabrik.de)) provides daily updates.
This will get diffs from the replication server, import diffs and index
the database. The default replication server in the
script([Geofabrik](https://download.geofabrik.de)) provides daily updates.
## Importing Nominatim to an external PostgreSQL database
## Using an external PostgreSQL database
You can install Nominatim using a database that runs on a different server when
you have physical access to the file system on the other server. Nominatim
@@ -113,6 +109,11 @@ uses a custom normalization library that needs to be made accessible to the
PostgreSQL server. This section explains how to set up the normalization
library.
!!! note
The external module is only needed when using the legacy tokenizer.
If you have choosen the ICU tokenizer, then you can ignore this section
and follow the standard import documentation.
### Option 1: Compiling the library on the database server
The most sure way to get a working library is to compile it on the database
@@ -170,4 +171,45 @@ NOMINATIM_DATABASE_MODULE_PATH="<directory on the database server where nominati
```
Now change the `NOMINATIM_DATABASE_DSN` to point to your remote server and continue
to follow the [standard instructions for importing](/admin/Import).
to follow the [standard instructions for importing](Import.md).
## Moving the database to another machine
For some configurations it may be useful to run the import on one machine, then
move the database to another machine and run the Nominatim service from there.
For example, you might want to use a large machine to be able to run the import
quickly but only want a smaller machine for production because there is not so
much load. Or you might want to do the import once and then replicate the
database to many machines.
The important thing to keep in mind when transferring the Nominatim installation
is that you need to transfer the database _and the project directory_. Both
parts are essential for your installation.
The Nominatim database can be transferred using the `pg_dump`/`pg_restore` tool.
Make sure to use the same version of PostgreSQL and PostGIS on source and
target machine.
!!! note
Before creating a dump of your Nominatim database, consider running
`nominatim freeze` first. Your database looses the ability to receive further
data updates but the resulting database is only about a third of the size
of a full database.
Next install Nominatim on the target machine by following the standard installation
instructions. Again make sure to use the same version as the source machine.
You can now copy the project directory from the source machine to the new machine.
If necessary, edit the `.env` file to point it to the restored database.
Finally run
nominatim refresh --website
to make sure that the local installation of Nominatim will be used.
If you are using the legacy tokenizer you might also have to switch to the
PostgreSQL module that was compiled on your target machine. If you get errors
that PostgreSQL cannot find or access `nominatim.so` then copy the installed
version into the `module` directory of your project directory. The installed
copy can usually be found under `/usr/local/lib/nominatim/module/nominatim.so`.

View File

@@ -134,7 +134,7 @@ On CentOS v7 the PostgreSQL server is started with `systemd`. Check if
`/usr/lib/systemd/system/httpd.service` contains a line `PrivateTmp=true`. If
so then Apache cannot see the `/tmp/.s.PGSQL.5432` file. It's a good security
feature, so use the
[preferred solution](../appendix/Install-on-Centos-7/#adding-selinux-security-settings).
[preferred solution](../appendix/Install-on-Centos-7.md#adding-selinux-security-settings).
However, you can solve this the quick and dirty way by commenting out that line and then run
@@ -182,7 +182,7 @@ by everybody, e.g.
Try `chmod a+r nominatim.so; chmod a+x nominatim.so`.
When running SELinux, make sure that the
[context is set up correctly](../appendix/Install-on-Centos-7/#adding-selinux-security-settings).
[context is set up correctly](../appendix/Install-on-Centos-7.md#adding-selinux-security-settings).
When you recently updated your operating system, updated PostgreSQL to
a new version or moved files (e.g. the build directory) you should

View File

@@ -40,15 +40,16 @@ all commands from the project directory.
### Configuration setup in `.env`
The Nominatim server can be customized via an `.env` configuration file in the
The Nominatim server can be customized via an `.env` configuration file in the
project directory. This is a file in [dotenv](https://github.com/theskumar/python-dotenv)
format which looks the same as variable settings in a standard shell environment.
You can also set the same configuration via environment variables. All
settings have a `NOMINATIM_` prefix to avoid conflicts with other environment
variables.
There are lots of configuration settings you can tweak. Have a look
at `settings/env.default` for a full list. Most should have a sensible default.
There are lots of configuration settings you can tweak. A full reference
can be found in the chapter [Configuration Settings](../customize/Settings.md).
Most should have a sensible default.
#### Flatnode files
@@ -83,15 +84,19 @@ The file is about 400MB and adds around 4GB to the Nominatim database.
`nominatim refresh --wiki-data --importance`. Updating importances for
a planet can take a couple of hours.
### Great Britain, USA postcodes
### External postcodes
Nominatim can use postcodes from an external source to improve searches that
involve a GB or US postcode. This data can be optionally downloaded into the
project directory:
Nominatim can use postcodes from an external source to improve searching with
postcodes. We provide precomputed postcodes sets for the US (using TIGER data)
and the UK (using the [CodePoint OpenData set](https://osdatahub.os.uk/downloads/open/CodePointOpen).
This data can be optionally downloaded into the project directory:
cd $PROJECT_DIR
wget https://www.nominatim.org/data/gb_postcode_data.sql.gz
wget https://www.nominatim.org/data/us_postcode_data.sql.gz
wget https://www.nominatim.org/data/gb_postcodes.csv.gz
wget https://www.nominatim.org/data/us_postcodes.csv.gz
You can also add your own custom postcode sources, see
[Customization of postcodes](../customize/Postcodes.md).
## Choosing the data to import
@@ -107,7 +112,7 @@ If you only need geocoding for a smaller region, then precomputed OSM extracts
are a good way to reduce the database size and import time.
[Geofabrik](https://download.geofabrik.de) offers extracts for most countries.
They even have daily updates which can be used with the update process described
[in the next section](../Update). There are also
[in the next section](Update.md). There are also
[other providers for extracts](https://wiki.openstreetmap.org/wiki/Planet.osm#Downloading).
Please be aware that some extracts are not cut exactly along the country
@@ -133,6 +138,14 @@ Note that you still need to provide for sufficient disk space for the initial
import. So this option is particularly interesting if you plan to transfer the
database or reuse the space later.
!!! warning
The datastructure for updates are also required when adding additional data
after the import, for example [TIGER housenumber data](../customize/Tiger.md).
If you plan to use those, you must not use the `--no-updates` parameter.
Do a normal import, add the external data and once you are done with
everything run `nominatim freeze`.
### Reverse-only Imports
If you only want to use the Nominatim database for reverse lookups or
@@ -148,15 +161,15 @@ Nominatim normally sets up a full search database containing administrative
boundaries, places, streets, addresses and POI data. There are also other
import styles available which only read selected data:
* **settings/import-admin.style**
* **admin**
Only import administrative boundaries and places.
* **settings/import-street.style**
* **street**
Like the admin style but also adds streets.
* **settings/import-address.style**
* **address**
Import all data necessary to compute addresses down to house number level.
* **settings/import-full.style**
* **full**
Default style that also includes points of interest.
* **settings/import-extratags.style**
* **extratags**
Like the full style but also adds most of the OSM tags into the extratags
column.
@@ -179,8 +192,8 @@ full | 54h | 640 GB | 330 GB
extratags | 54h | 650 GB | 340 GB
You can also customize the styles further.
A [description of the style format](../develop/Import.md#configuring-the-import)
can be found in the development section.
A [description of the style format](../customize/Import-Styles.md)
can be found in the customization guide.
## Initial import of the data
@@ -189,12 +202,15 @@ can be found in the development section.
[Geofabrik](https://download.geofabrik.de).
Download the data to import. Then issue the following command
from the **build directory** to start the import:
from the **project directory** to start the import:
```sh
nominatim import --osm-file <data file> 2>&1 | tee setup.log
```
The **project directory** is the one that you have set up at the beginning.
See [creating the project directory](#creating-the-project-directory).
### Notes on full planet imports
Even on a perfectly configured machine
@@ -212,7 +228,7 @@ to load the OSM data into the PostgreSQL database. This step is very demanding
in terms of RAM usage. osm2pgsql and PostgreSQL are running in parallel at
this point. PostgreSQL blocks at least the part of RAM that has been configured
with the `shared_buffers` parameter during
[PostgreSQL tuning](Installation#postgresql-tuning)
[PostgreSQL tuning](Installation.md#postgresql-tuning)
and needs some memory on top of that. osm2pgsql needs at least 2GB of RAM for
its internal data structures, potentially more when it has to process very large
relations. In addition it needs to maintain a cache for node locations. The size
@@ -231,7 +247,8 @@ reduce the cache size or even consider using a flatnode file.
### Testing the installation
Run this script to verify all required tables and indices got created successfully.
Run this script to verify that all required tables and indices got created
successfully.
```sh
nominatim admin --check-database
@@ -248,59 +265,24 @@ to verify that your installation is working. Go to
`http://localhost:8088/status.php` and you should see the message `OK`.
You can also run a search query, e.g. `http://localhost:8088/search.php?q=Berlin`.
Note that search query is not supported for reverse-only imports. You can run a
reverse query, e.g. `http://localhost:8088/reverse.php?lat=27.1750090510034&lon=78.04209025`.
To run Nominatim via webservers like Apache or nginx, please read the
[Deployment chapter](Deployment.md).
## Tuning the database
Accurate word frequency information for search terms helps PostgreSQL's query
planner to make the right decisions. Recomputing them can improve the performance
of forward geocoding in particular under high load. To recompute word counts run:
```sh
nominatim refresh --word-counts
```
This will take a couple of hours for a full planet installation. You can
also defer that step to a later point in time when you realise that
performance becomes an issue. Just make sure that updates are stopped before
running this function.
## Adding search through category phrases
If you want to be able to search for places by their type through
[special key phrases](https://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases)
[special phrases](https://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases)
you also need to import these key phrases like this:
nominatim special-phrases --import-from-wiki
```sh
nominatim special-phrases --import-from-wiki
```
Note that this command downloads the phrases from the wiki link above. You
need internet access for the step.
## Installing Tiger housenumber data for the US
Nominatim is able to use the official [TIGER](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html)
address set to complement the OSM house number data in the US. You can add
TIGER data to your own Nominatim instance by following these steps. The
entire US adds about 10GB to your database.
1. Get preprocessed TIGER 2020 data:
cd $PROJECT_DIR
wget https://nominatim.org/data/tiger2020-nominatim-preprocessed.tar.gz
2. Import the data into your Nominatim database:
nominatim add-data --tiger-data tiger2020-nominatim-preprocessed.tar.gz
3. Enable use of the Tiger data in your `.env` by adding:
echo NOMINATIM_USE_US_TIGER_DATA=yes >> .env
4. Apply the new settings:
nominatim refresh --functions
See the [developer's guide](../develop/data-sources.md#us-census-tiger) for more
information on how the data got preprocessed.
You can also import special phrases from a csv file, for more
information please see the [Customization part](../customize/Special-Phrases.md).

View File

@@ -17,12 +17,17 @@ and can't offer support.
* [Docker](https://github.com/mediagis/nominatim-docker)
* [Docker on Kubernetes](https://github.com/peter-evans/nominatim-k8s)
* [Kubernetes with Helm](https://github.com/robjuz/helm-charts/blob/master/charts/nominatim/README.md)
* [Ansible](https://github.com/synthesio/infra-ansible-nominatim)
## Prerequisites
### Software
!!! Warning
For larger installations you **must have** PostgreSQL 11+ and Postgis 3+
otherwise import and queries will be slow to the point of being unusable.
For compiling:
* [cmake](https://cmake.org/)
@@ -37,14 +42,16 @@ For compiling:
For running Nominatim:
* [PostgreSQL](https://www.postgresql.org) (9.3+ will work, 11+ strongly recommended)
* [PostGIS](https://postgis.net) (2.2+)
* [Python 3](https://www.python.org/) (3.5+)
* [PostgreSQL](https://www.postgresql.org) (9.5+ will work, 11+ strongly recommended)
* [PostGIS](https://postgis.net) (2.2+ will work, 3.0+ strongly recommended)
* [Python 3](https://www.python.org/) (3.6+)
* [Psycopg2](https://www.psycopg.org) (2.7+)
* [Python Dotenv](https://github.com/theskumar/python-dotenv)
* [psutil](https://github.com/giampaolo/psutil)
* [Jinja2](https://palletsprojects.com/p/jinja/)
* [PyICU](https://pypi.org/project/PyICU/)
* [PyYaml](https://pyyaml.org/) (5.1+)
* [datrie](https://github.com/pytries/datrie)
* [PHP](https://php.net) (7.0 or later)
* PHP-pgsql
* PHP-intl (bundled with PHP)

51
docs/admin/Maintenance.md Normal file
View File

@@ -0,0 +1,51 @@
This chapter describes the various operations the Nominatim database administrator
may use to clean and maintain the database. None of these operations is mandatory
but they may help improve the performance and accuracy of results.
## Updating postcodes
Command: `nominatim refresh --postcodes`
Postcode centroids (aka 'calculated postcodes') are generated by looking at all
postcodes of a country, grouping them and calculating the geometric centroid.
There is currently no logic to deal with extreme outliers (typos or other
mistakes in OSM data). There is also no check if a postcodes adheres to a
country's format, e.g. if Swiss postcodes are 4 digits.
When running regular updates, postcodes results can be improved by running
this command on a regular basis. Note that only the postcode table and the
postcode search terms are updated. The postcode that is assigned to each place
is only updated when the place is updated.
The command takes around 70min to run on the planet and needs ca. 40GB of
temporary disk space.
## Updating word counts
Command: `nominatim refresh --word-counts`
Nominatim keeps frequency statistics about all search terms it indexes. These
statistics are currently used to optimise queries to the database. Thus better
statistics mean better performance. Word counts are created once after import
and are usually sufficient even when running regular updates. You might want
to rerun the statistics computation when adding larger amounts of new data,
for example, when adding an additional country via `nominatim add-data`.
## Removing large deleted objects
Nominatim refuses to delete very large areas because often these deletions are
accidental and are reverted within hours. Instead the deletions are logged in
the `import_polygon_delete` table and left to the administrator to clean up.
There is currently no command to do that. You can use the following SQL
query to force a deletion on all objects that have been deleted more than
a certain timespan ago (here: 1 month):
```sql
SELECT place_force_delete(p.place_id) FROM import_polygon_delete d, placex p
WHERE p.osm_type = d.osm_type and p.osm_id = d.osm_id
and age(p.indexed_date) > '1 month'::interval
```

View File

@@ -15,8 +15,40 @@ breaking changes. **Please read them before running the migration.**
If you are migrating from a version <3.6, then you still have to follow
the manual migration steps up to 3.6.
## 3.7.0 -> 4.0.0
### NOMINATIM_PHRASE_CONFIG removed
Custom blacklist configurations for special phrases now need to be handed
with the `--config` parameter to `nominatim special-phrases`. Alternatively
you can put your custom configuration in the project directory in a file
named `phrase-settings.json`.
Version 3.8 also removes the automatic converter for the php format of
the configuration in older versions. If you are updating from Nominatim < 3.7
and still work with a custom `phrase-settings.php`, you need to manually
convert it into a json format.
### PHP utils removed
The old PHP utils have now been removed completely. You need to switch to
the appropriate functions of the nominatim command line tool. See
[Introducing `nominatim` command line tool](#introducing-nominatim-command-line-tool)
below.
## 3.6.0 -> 3.7.0
### New format and name of configuration file
The configuration for an import is now saved in a `.env` file in the project
directory. This file follows the dotenv format. For more information, see
the [installation chapter](Import.md#configuration-setup-in-env).
To migrate to the new system, create a new project directory, add the `.env`
file and port your custom configuration from `settings/local.php`. Most
settings are named similar and only have received a `NOMINATIM_` prefix.
Use the default settings in `settings/env.defaults` as a reference.
### New location for data files
External data files for Wikipedia importance, postcodes etc. are no longer
@@ -69,7 +101,7 @@ done
The debugging UI is no longer directly provided with Nominatim. Instead we
now provide a simple Javascript application. Please refer to
[Setting up the Nominatim UI](../Setup-Nominatim-UI) for details on how to
[Setting up the Nominatim UI](Setup-Nominatim-UI.md) for details on how to
set up the UI.
The icons served together with the API responses have been moved to the
@@ -113,6 +145,14 @@ configuration file, run the following command after updating:
./utils/setup.php --setup-website
```
### Update SQL code
To update the SQL code to the leatest version run:
```
./utils/setup.php --create-functions --enable-diff-updates --create-partition-functions
```
## 3.4.0 -> 3.5.0
### New Wikipedia/Wikidata importance tables

View File

@@ -10,20 +10,20 @@ installation. For more details, please also have a look at the
## Installing nominatim-ui
nominatim-ui does not need any special installation, just download, configure
and run it.
We provide regular releases of nominatim-ui that contain the packaged website.
They do not need any special installation. Just download, configure
and run it. Grab the latest release from
[nominatim-ui's Github release page](https://github.com/osm-search/nominatim-ui/releases)
and unpack it. You can use `nominatim-ui-x.x.x.tar.gz` or `nominatim-ui-x.x.x.zip`.
Clone the source from github:
git clone https://github.com/osm-search/nominatim-ui
Copy the example configuration into the right place:
Next you need to adapt the UI yo your installation. Custom settings need to be
put into `dist/theme/config.theme.js`. At a minimum you need to
set `Nominatim_API_Endpoint` to point to your Nominatim installation:
cd nominatim-ui
cp dist/config.example.js dist/config.js
echo "Nominatim_Config.Nominatim_API_Endpoint='https:\\myserver.org\nominatim';" > dist/theme/config.theme.js
Now adapt the configuration to your needs. You need at least
to change the `Nominatim_API_Endpoint` to point to your Nominatim installation.
For the full set of available settings, have a look at `dist/config.defaults.js`.
Then you can just test it locally by spinning up a webserver in the `dist`
directory. For example, with Python:

View File

@@ -10,18 +10,21 @@ For a list of other methods to add or update data see the output of
If you have configured a flatnode file for the import, then you
need to keep this flatnode file around for updates.
#### Installing the newest version of Pyosmium
### Installing the newest version of Pyosmium
It is recommended to install Pyosmium via pip. Make sure to use python3.
The replication process uses
[Pyosmium](https://docs.osmcode.org/pyosmium/latest/updating_osm_data.html)
to download update data from the server.
It is recommended to install Pyosmium via pip.
Run (as the same user who will later run the updates):
```sh
pip3 install --user osmium
```
#### Setting up the update process
### Setting up the update process
Next the update needs to be initialised. By default Nominatim is configured
Next the update process needs to be initialised. By default Nominatim is configured
to update using the global minutely diffs.
If you want a different update source you will need to add some settings
@@ -30,9 +33,9 @@ diffs for Ireland from Geofabrik add the following:
# base URL of the replication service
NOMINATIM_REPLICATION_URL="https://download.geofabrik.de/europe/ireland-and-northern-ireland-updates"
# How often upstream publishes diffs
# How often upstream publishes diffs (in seconds)
NOMINATIM_REPLICATION_UPDATE_INTERVAL=86400
# How long to sleep if no update found yet
# How long to sleep if no update found yet (in seconds)
NOMINATIM_REPLICATION_RECHECK_INTERVAL=900
To set up the update process now run the following command:
@@ -45,12 +48,119 @@ what you expect.
The `replication --init` command needs to be rerun whenever the replication
service is changed.
#### Updating Nominatim
### Updating Nominatim
The following command will keep your database constantly up to date:
Nominatim supports different modes how to retrieve the update data from the
server. Which one you want to use depends on your exact setup and how often you
want to retrieve updates.
These instructions are for using a single source of updates. If you have
imported multiple country extracts and want to keep them
up-to-date, [Advanced installations section](Advanced-Installations.md)
contains instructions to set up and update multiple country extracts.
#### Continuous updates
This is the easiest mode. Simply run the replication command without any
parameters:
nominatim replication
If you have imported multiple country extracts and want to keep them
up-to-date, [Advanced installations section](Advanced-Installations.md) contains instructions
to set up and update multiple country extracts.
The update application keeps running forever and retrieves and applies
new updates from the server as they are published.
You can run this command as a simple systemd service. Create a service
description like that in `/etc/systemd/system/nominatim-update.service`:
```
[Unit]
Description=Continuous updates of Nominatim
[Service]
WorkingDirectory=/srv/nominatim
ExecStart=nominatim replication
StandardOutput=append:/var/log/nominatim-updates.log
StandardError=append:/var/log/nominatim-updates.error.log
User=nominatim
Group=nominatim
Type=simple
[Install]
WantedBy=multi-user.target
```
Replace the `WorkingDirectory` with your project directory. Also adapt user
and group names as required.
Now activate the service and start the updates:
```
sudo systemctl daemon-reload
sudo systemctl enable nominatim-updates
sudo systemctl start nominatim-updates
```
#### One-time mode
When the `--once` parameter is given, then Nominatim will download exactly one
batch of updates and then exit. This one-time mode still respects the
`NOMINATIM_REPLICATION_UPDATE_INTERVAL` that you have set. If according to
the update interval no new data has been published yet, it will go to sleep
until the next expected update and only then attempt to download the next batch.
The one-time mode is particularly useful if you want to run updates continuously
but need to schedule other work in between updates. For example, the main
service at osm.org uses it, to regularly recompute postcodes -- a process that
must not be run while updates are in progress. Its update script
looks like this:
```sh
#!/bin/bash
# Switch to your project directory.
cd /srv/nominatim
while true; do
nominatim replication --once
if [ -f "/srv/nominatim/schedule-mainenance" ]; then
rm /srv/nominatim/schedule-mainenance
nominatim refresh --postcodes
fi
done
```
A cron job then creates the file `/srv/nominatim/need-mainenance` once per night.
#### Catch-up mode
With the `--catch-up` parameter, Nominatim will immediately try to download
all changes from the server until the database is up-to-date. The catch-up mode
still respects the parameter `NOMINATIM_REPLICATION_MAX_DIFF`. It downloads and
applies the changes in appropriate batches until all is done.
The catch-up mode is foremost useful to bring the database up to speed after the
initial import. Give that the service usually is not in production at this
point, you can temporarily be a bit more generous with the batch size and
number of threads you use for the updates by running catch-up like this:
```
cd /srv/nominatim
NOMINATIM_REPLICATION_MAX_DIFF=5000 nominatim replication --catch-up --threads 15
```
The catch-up mode is also useful when you want to apply updates at a lower
frequency than what the source publishes. You can set up a cron job to run
replication catch-up at whatever interval you desire.
!!! hint
When running scheduled updates with catch-up, it is a good idea to choose
a replication source with an update frequency that is an order of magnitude
lower. For example, if you want to update once a day, use an hourly updated
source. This makes sure that you don't miss an entire day of updates when
the source is unexpectely late to publish its update.
If you want to use the source with the same update frequency (e.g. a daily
updated source with daily updates), use the
continuous update mode. It ensures to re-request the newest update until it
is published.

View File

@@ -35,7 +35,7 @@ it contains the county/state/country across the border.
#### 3. I get different counties/states/countries when I change the zoom parameter in the reverse query. How is that possible?
This is basically the same problem as in the previous answer.
The zoom level influences at which [search rank](https://wiki.openstreetmap.org/wiki/Nominatim/Development_overview#Country_to_street_level) Nominatim starts looking
The zoom level influences at which [search rank](../customize/Ranking.md#search-rank) Nominatim starts looking
for the closest object. So the closest house number maybe on one side of the
border while the closest street is on the other. As the address details contain
the address of the closest object found, you might sometimes get one result,

View File

@@ -290,6 +290,7 @@ with a designation label. Per default the following labels may appear:
* emergency, historic, military, natural, landuse, place, railway,
man_made, aerialway, boundary, amenity, aeroway, club, craft, leisure,
office, mountain_pass, shop, tourism, bridge, tunnel, waterway
* postcode
They roughly correspond to the classification of the OpenStreetMap data
according to either the `place` tag or the main key of the object.

View File

@@ -27,8 +27,8 @@ The search term may be specified with two different sets of parameters:
Free-form query string to search for.
Free-form queries are processed first left-to-right and then right-to-left if that fails. So you may search for
[pilkington avenue, birmingham](//nominatim.openstreetmap.org/search?q=pilkington+avenue,birmingham) as well as for
[birmingham, pilkington avenue](//nominatim.openstreetmap.org/search?q=birmingham,+pilkington+avenue).
[pilkington avenue, birmingham](https://nominatim.openstreetmap.org/search?q=pilkington+avenue,birmingham) as well as for
[birmingham, pilkington avenue](https://nominatim.openstreetmap.org/search?q=birmingham,+pilkington+avenue).
Commas are optional, but improve performance by reducing the complexity of the search.

View File

@@ -1,38 +1,24 @@
# OSM Data Import
OSM data is initially imported using [osm2pgsql](https://osm2pgsql.org).
Nominatim uses its own data output style 'gazetteer', which differs from the
output style created for map rendering.
## Database Layout
The gazetteer style produces a single table `place` with the following rows:
* `osm_type` - kind of OSM object (**N** - node, **W** - way, **R** - relation)
* `osm_id` - original OSM ID
* `class` - key of principal tag defining the object type
* `type` - value of principal tag defining the object type
* `name` - collection of tags that contain a name or reference
* `admin_level` - numerical value of the tagged administrative level
* `address` - collection of tags defining the address of an object
* `extratags` - collection of additional interesting tags that are not
directly relevant for searching
* `geometry` - geometry of the object (in WGS84)
A single OSM object may appear multiple times in this table when it is tagged
with multiple tags that may constitute a principal tag. Take for example a
motorway bridge. In OSM, this would be a way which is tagged with
`highway=motorway` and `bridge=yes`. This way would appear in the `place` table
once with `class` of `highway` and once with a `class` of `bridge`. Thus the
*unique key* for `place` is (`osm_type`, `osm_id`, `class`).
## Configuring the Import
How tags are interpreted and assigned to the different `place` columns can be
configured via the import style configuration file (`NOMINATIM_IMPORT_STYLE`). This
Which OSM objects are added to the database and which of the tags are used
can be configured via the import style configuration file. This
is a JSON file which contains a list of rules which are matched against every
tag of every object and then assign the tag its specific role.
The style to use is given by the `NOMINATIM_IMPORT_STYLE` configuration
option. There are a number of default styles, which are explained in detail
in the [Import section](../admin/Import.md#filtering-imported-data). These
standard styles may be referenced by their name.
You can also create your own custom syle. Put the style file into your
project directory and then set `NOMINATIM_IMPORT_STYLE` to the name of the file.
It is always recommended to start with one of the standard styles and customize
those. You find the standard styles under the name `import-<stylename>.style`
in the standard Nominatim configuration path (usually `/etc/nominatim` or
`/usr/local/etc/nominatim`).
The remainder of the page describes the format of the file.
### Configuration Rules
A single rule looks like this:
@@ -159,9 +145,6 @@ A rule can define as many of these properties for one match as it likes. For
example, if the property is `"main,extra"` then the tag will open a new row
but also have the tag appear in the list of extra tags.
There are a number of pre-defined styles in the `settings/` directory. It is
advisable to start from one of these styles when defining your own.
### Changing the Style of Existing Databases
There is normally no issue changing the style of a database that is already

View File

@@ -0,0 +1,20 @@
Nominatim comes with a predefined set of configuration options that should
work for most standard installations. If you have special requirements, there
are many places where the configuration can be adapted. This chapter describes
the following configurable parts:
* [Global Settings](Settings.md) has a detailed description of all parameters that
can be set in your local `.env` configuration
* [Import styles](Import-Styles.md) explains how to write your own import style
in order to control what kind of OSM data will be imported
* [Place ranking](Ranking.md) describes the configuration around classifing
places in terms of their importance and their role in an address
* [Tokenizers](Tokenizers.md) describes the configuration of the module
responsible for analysing and indexing names
* [Special Phrases](Special-Phrases.md) are common nouns or phrases that
can be used in search to identify a class of places
There are also guides for adding the following external data:
* [US house numbers from the TIGER dataset](Tiger.md)
* [External postcodes](Postcodes.md)

View File

@@ -0,0 +1,37 @@
# External postcode data
Nominatim creates a table of known postcode centroids during import. This table
is used for searches of postcodes and for adding postcodes to places where the
OSM data does not provide one. These postcode centroids are mainly computed
from the OSM data itself. In addition, Nominatim supports reading postcode
information from an external CSV file, to supplement the postcodes that are
missing in OSM.
To enable external postcode support, simply put one CSV file per country into
your project directory and name it `<CC>_postcodes.csv`. `<CC>` must be the
two-letter country code for which to apply the file. The file may also be
gzipped. Then it must be called `<CC>_postcodes.csv.gz`.
The CSV file must use commas as a delimiter and have a header line. Nominatim
expects three columns to be present: `postcode`, `lat` and `lon`. All other
columns are ignored. `lon` and `lat` must describe the x and y coordinates of the
postcode centroids in WGS84.
The postcode files are loaded only when there is data for the given country
in your database. For example, if there is a `us_postcodes.csv` file in your
project directory but you import only an excerpt of Italy, then the US postcodes
will simply be ignored.
As a rule, the external postcode data should be put into the project directory
**before** starting the initial import. Still, you can add, remove and update the
external postcode data at any time. Simply
run:
```
nominatim refresh --postcodes
```
to make the changes visible in your database. Be aware, however, that the changes
only have an immediate effect on searches for postcodes. Postcodes that were
added to places are only updated, when they are reindexed. That usually happens
only during replication updates.

View File

@@ -1,8 +1,7 @@
# Place Ranking in Nominatim
Nominatim uses two metrics to rank a place: search rank and address rank.
Both can be assigned a value between 0 and 30. They serve slightly
different purposes, which are explained in this chapter.
This chapter explains what place ranking means and how it can be customized.
## Search rank

649
docs/customize/Settings.md Normal file
View File

@@ -0,0 +1,649 @@
This section provides a reference of all configuration parameters that can
be used with Nominatim.
# Configuring Nominatim
Nominatim uses [dotenv](https://github.com/theskumar/python-dotenv) to manage
its configuration settings. There are two means to set configuration
variables: through an `.env` configuration file or through an environment
variable.
The `.env` configuration file needs to be placed into the
[project directory](../admin/Import.md#creating-the-project-directory). It
must contain configuration parameters in `<parameter>=<value>` format.
Please refer to the dotenv documentation for details.
The configuration options may also be set in the form of shell environment
variables. This is particularly useful, when you want to temporarily change
a configuration option. For example, to force the replication serve to
download the next change, you can temporarily disable the update interval:
NOMINATIM_REPLICATION_UPDATE_INTERVAL=0 nominatim replication --once
If a configuration option is defined through .env file and environment
variable, then the latter takes precedence.
## Configuration Parameter Reference
### Import and Database Settings
#### NOMINATIM_DATABASE_DSN
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Database connection string |
| **Format:** | string: `pgsql:<param1>=<value1>;<param2>=<value2>;...` |
| **Default:** | pgsql:dbname=nominatim |
| **After Changes:** | run `nominatim refresh --website` |
Sets the connection parameters for the Nominatim database. At a minimum
the name of the database (`dbname`) is required. You can set any additional
parameter that is understood by libpq. See the [Postgres documentation](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS) for a full list.
!!! note
It is usually recommended not to set the password directly in this
configuration parameter. Use a
[password file](https://www.postgresql.org/docs/current/libpq-pgpass.html)
instead.
#### NOMINATIM_DATABASE_WEBUSER
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Database query user |
| **Format:** | string |
| **Default:** | www-data |
| **After Changes:** | cannot be changed after import |
Defines the name of the database user that will run search queries. Usually
this is the user under which the webserver is executed. When running Nominatim
via php-fpm, you can also define a separate query user. The Postgres user
needs to be set up before starting the import.
Nominatim grants minimal rights to this user to all tables that are needed
for running geocoding queries.
#### NOMINATIM_DATABASE_MODULE_PATH
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Directory where to find the PostgreSQL server module |
| **Format:** | path |
| **Default:** | _empty_ (use `<project_directory>/module`) |
| **After Changes:** | run `nominatim refresh --functions` |
| **Comment:** | Legacy tokenizer only |
Defines the directory in which the PostgreSQL server module `nominatim.so`
is stored. The directory and module must be accessible by the PostgreSQL
server.
For information on how to use this setting when working with external databases,
see [Advanced Installations](../admin/Advanced-Installations.md).
The option is only used by the Legacy tokenizer and ignored otherwise.
#### NOMINATIM_TOKENIZER
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Tokenizer used for normalizing and parsing queries and names |
| **Format:** | string |
| **Default:** | legacy |
| **After Changes:** | cannot be changed after import |
Sets the tokenizer type to use for the import. For more information on
available tokenizers and how they are configured, see
[Tokenizers](../customize/Tokenizers.md).
#### NOMINATIM_TOKENIZER_CONFIG
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Configuration file for the tokenizer |
| **Format:** | path |
| **Default:** | _empty_ (default file depends on tokenizer) |
| **After Changes:** | see documentation for each tokenizer |
Points to the file with additional configuration for the tokenizer.
See the [Tokenizer](../customize/Tokenizers.md) descriptions for details
on the file format.
If a relative path is given, then the file is searched first relative to the
project directory and then in the global settings directory.
#### NOMINATIM_MAX_WORD_FREQUENCY
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Number of occurrences before a word is considered frequent |
| **Format:** | int |
| **Default:** | 50000 |
| **After Changes:** | cannot be changed after import |
| **Comment:** | Legacy tokenizer only |
The word frequency count is used by the Legacy tokenizer to automatically
identify _stop words_. Any partial term that occurs more often then what
is defined in this setting, is effectively ignored during search.
#### NOMINATIM_LIMIT_REINDEXING
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Avoid invalidating large areas |
| **Format:** | bool |
| **Default:** | yes |
Nominatim computes the address of each place at indexing time. This has the
advantage to make search faster but also means that more objects needs to
be invalidated when the data changes. For example, changing the name of
the state of Florida would require recomputing every single address point
in the state to make the new name searchable in conjunction with addresses.
Setting this option to 'yes' means that Nominatim skips reindexing of contained
objects when the area becomes too large.
#### NOMINATIM_LANGUAGES
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Restrict search languages |
| **Format:** | string: comma-separated list of language codes |
| **Default:** | _empty_ |
Normally Nominatim will include all language variants of name:XX
in the search index. Set this to a comma separated list of language
codes, to restrict import to a subset of languages.
Currently only affects the initial import of country names and special phrases.
#### NOMINATIM_TERM_NORMALIZATION
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Rules for normalizing terms for comparisons |
| **Format:** | string: semicolon-separated list of ICU rules |
| **Default:** | :: NFD (); [[:Nonspacing Mark:] [:Cf:]] >; :: lower (); [[:Punctuation:][:Space:]]+ > ' '; :: NFC (); |
| **Comment:** | Legacy tokenizer only |
[Special phrases](Special-Phrases.md) have stricter matching requirements than
normal search terms. They must appear exactly in the query after this term
normalization has been applied.
Only has an effect on the Legacy tokenizer. For the ICU tokenizer the rules
defined in the
[normalization section](Tokenizers.md#normalization-and-transliteration)
will be used.
#### NOMINATIM_USE_US_TIGER_DATA
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Enable searching for Tiger house number data |
| **Format:** | boolean |
| **Default:** | no |
| **After Changes:** | run `nominatim --refresh --functions` |
When this setting is enabled, search and reverse queries also take data
from [Tiger house number data](Tiger.md) into account.
#### NOMINATIM_USE_AUX_LOCATION_DATA
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Enable searching in external house number tables |
| **Format:** | boolean |
| **Default:** | no |
| **After Changes:** | run `nominatim --refresh --functions` |
| **Comment:** | Do not use. |
When this setting is enabled, search queries also take data from external
house number tables into account.
*Warning:* This feature is currently unmaintained and should not be used.
#### NOMINATIM_HTTP_PROXY
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Use HTTP proxy when downloading data |
| **Format:** | boolean |
| **Default:** | no |
When this setting is enabled and at least
[NOMINATIM_HTTP_PROXY_HOST](#nominatim_http_proxy_host) and
[NOMINATIM_HTTP_PROXY_PORT](#nominatim_http_proxy_port) are set, the
configured proxy will be used, when downloading external data like
replication diffs.
#### NOMINATIM_HTTP_PROXY_HOST
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Host name of the proxy to use |
| **Format:** | string |
| **Default:** | _empty_ |
When [NOMINATIM_HTTP_PROXY](#nominatim_http_proxy) is enabled, this setting
configures the proxy host name.
#### NOMINATIM_HTTP_PROXY_PORT
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Port number of the proxy to use |
| **Format:** | integer |
| **Default:** | 3128 |
When [NOMINATIM_HTTP_PROXY](#nominatim_http_proxy) is enabled, this setting
configures the port number to use with the proxy.
#### NOMINATIM_HTTP_PROXY_LOGIN
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Username for proxies that require login |
| **Format:** | string |
| **Default:** | _empty_ |
When [NOMINATIM_HTTP_PROXY](#nominatim_http_proxy) is enabled, use this
setting to define the username for proxies that require a login.
#### NOMINATIM_HTTP_PROXY_PASSWORD
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Password for proxies that require login |
| **Format:** | string |
| **Default:** | _empty_ |
When [NOMINATIM_HTTP_PROXY](#nominatim_http_proxy) is enabled, use this
setting to define the password for proxies that require a login.
#### NOMINATIM_OSM2PGSQL_BINARY
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Location of the osm2pgsql binary |
| **Format:** | path |
| **Default:** | _empty_ (use binary shipped with Nominatim) |
| **Comment:** | EXPERT ONLY |
Nominatim uses [osm2pgsql](https://osm2pgsql.org) to load the OSM data
initially into the database. Nominatim comes bundled with a version of
osm2pgsql that is guaranteed to be compatible. Use this setting to use
a different binary instead. You should do this only when you know exactly
what you are doing. If the osm2pgsql version is not compatible, then the
result is undefined.
#### NOMINATIM_WIKIPEDIA_DATA_PATH
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Directory with the wikipedia importance data |
| **Format:** | path |
| **Default:** | _empty_ (project directory) |
Set a custom location for the
[wikipedia ranking file](../admin/Import.md#wikipediawikidata-rankings). When
unset, Nominatim expects the data to be saved in the project directory.
#### NOMINATIM_ADDRESS_LEVEL_CONFIG
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Configuration file for rank assignments |
| **Format:** | path |
| **Default:** | address-levels.json |
The _address level configuration_ defines the rank assignments for places. See
[Place Ranking](Ranking.md) for a detailed explanation what rank assignments
are and what the configuration file must look like.
When a relative path is given, then the file is searched first relative to the
project directory and then in the global settings directory.
#### NOMINATIM_IMPORT_STYLE
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Configuration to use for the initial OSM data import |
| **Format:** | string or path |
| **Default:** | extratags |
The _style configuration_ describes which OSM objects and tags are taken
into consideration for the search database. Nominatim comes with a set
of pre-configured styles, that may be configured here.
You can also write your own custom style and point the setting to the file
with the style. When a relative path is given, then the style file is searched
first relative to the project directory and then in the global settings
directory.
See [Import Styles](Import-Styles.md)
for more information on the available internal styles and the format of the
configuration file.
#### NOMINATIM_FLATNODE_FILE
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Location of osm2pgsql flatnode file |
| **Format:** | path |
| **Default:** | _empty_ (do not use a flatnote file) |
| **After Changes:** | Only change when moving the file physically. |
The `osm2pgsql flatnode file` is file that efficiently stores geographic
location for OSM nodes. For larger imports it can significantly speed up
the import. When this option is unset, then osm2pgsql uses a PsotgreSQL table
to store the locations.
When a relative path is given, then the flatnode file is created/searched
relative to the project directory.
!!! warning
The flatnode file is not only used during the initial import but also
when adding new data with `nominatim add-data` or `nominatim replication`.
Make sure you keep the flatnode file around and this setting unmodified,
if you plan to add more data or run regular updates.
#### NOMINATIM_TABLESPACE_*
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Group of settings for distributing the database over tablespaces |
| **Format:** | string |
| **Default:** | _empty_ (do not use a table space) |
| **After Changes:** | no effect after initial import |
Nominatim allows to distribute the search database over up to 10 different
[PostgreSQL tablespaces](https://www.postgresql.org/docs/current/manage-ag-tablespaces.html).
If you use this option, make sure that the tablespaces exist before starting
the import.
The available tablespace groups are:
NOMINATIM_TABLESPACE_SEARCH_DATA
: Data used by the geocoding frontend.
NOMINATIM_TABLESPACE_SEARCH_INDEX
: Indexes used by the geocoding frontend.
NOMINATIM_TABLESPACE_OSM_DATA
: Raw OSM data cache used for import and updates.
NOMINATIM_TABLESPACE_OSM_DATA
: Indexes on the raw OSM data cache.
NOMINATIM_TABLESPACE_PLACE_DATA
: Data table with the pre-filtered but still unprocessed OSM data.
Used only during imports and updates.
NOMINATIM_TABLESPACE_PLACE_INDEX
: Indexes on raw data table. Used only during imports and updates.
NOMINATIM_TABLESPACE_ADDRESS_DATA
: Data tables used for computing search terms and addresses of places
during import and updates.
NOMINATIM_TABLESPACE_ADDRESS_INDEX
: Indexes on the data tables for search term and address computation.
Used only for import and updates.
NOMINATIM_TABLESPACE_AUX_DATA
: Auxiliary data tables for non-OSM data, e.g. for Tiger house number data.
NOMINATIM_TABLESPACE_AUX_INDEX
: Indexes on auxiliary data tables.
### Replication Update Settings
#### NOMINATIM_REPLICATION_URL
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Base URL of the replication service |
| **Format:** | url |
| **Default:** | https://planet.openstreetmap.org/replication/minute |
| **After Changes:** | run `nominatim replication --init` |
Replication services deliver updates to OSM data. Use this setting to choose
which replication service to use. See [Updates](../admin/Update.md) for more
information on how to set up regular updates.
#### NOMINATIM_REPLICATION_MAX_DIFF
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Maximum amount of data to download per update cycle (in MB) |
| **Format:** | integer |
| **Default:** | 50 |
| **After Changes:** | restart the replication process |
At each update cycle Nominatim downloads diffs until either no more diffs
are available on the server (i.e. the database is up-to-date) or the limit
given in this setting is exceeded. Nominatim guarantees to downloads at least
one diff, if one is available, no matter how small the setting.
The default for this setting is fairly conservative because Nominatim keeps
all data downloaded in one cycle in RAM. Using large values in a production
server may interfere badly with the search frontend because it evicts data
from RAM that is needed for speedy answers to incoming requests. It is usually
a better idea to keep this setting lower and run multiple update cycles
to catch up with updates.
When catching up in non-production mode, for example after the initial import,
the setting can easily be changed temporarily on the command line:
NOMINATIM_REPLICATION_MAX_DIFF=3000 nominatim replication
#### NOMINATIM_REPLICATION_UPDATE_INTERVAL
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Publication interval of the replication service (in seconds) |
| **Format:** | integer |
| **Default:** | 75 |
| **After Changes:** | restart the replication process |
This setting determines when Nominatim will attempt to download again a new
update. The time is computed from the publication date of the last diff
downloaded. Setting this to a slightly higher value than the actual
publication interval avoids unnecessary rechecks.
#### NOMINATIM_REPLICATION_RECHECK_INTERVAL
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Wait time to recheck for a pending update (in seconds) |
| **Format:** | integer |
| **Default:** | 60 |
| **After Changes:** | restart the replication process |
When replication updates are run in continuous mode (using `nominatim replication`),
this setting determines how long Nominatim waits until it looks for updates
again when updates were not available on the server.
Note that this is different from
[NOMINATIM_REPLICATION_UPDATE_INTERVAL](#nominatim_replication_update_interval).
Nominatim will never attempt to query for new updates for UPDATE_INTERVAL
seconds after the current database date. Only after the update interval has
passed it asks for new data. If then no new data is found, it waits for
RECHECK_INTERVAL seconds before it attempts again.
### API Settings
#### NOMINATIM_CORS_NOACCESSCONTROL
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Send permissive CORS access headers |
| **Format:** | boolean |
| **Default:** | yes |
| **After Changes:** | run `nominatim refresh --website` |
When this setting is enabled, API HTTP responses include the HTTP
[CORS](https://en.wikipedia.org/wiki/CORS) headers
`access-control-allow-origin: *` and `access-control-allow-methods: OPTIONS,GET`.
#### NOMINATIM_MAPICON_URL
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | URL prefix for static icon images |
| **Format:** | url |
| **Default:** | _empty_ |
| **After Changes:** | run `nominatim refresh --website` |
When a mapicon URL is configured, then Nominatim includes an additional `icon`
field in the responses, pointing to an appropriate icon for the place type.
Map icons used to be included in Nominatim itself but now have moved to the
[nominatim-ui](https://github.com/osm-search/nominatim-ui/) project. If you
want the URL to be included in API responses, make the `/mapicon`
directory of the project available under a public URL and point this setting
to the directory.
#### NOMINATIM_DEFAULT_LANGUAGE
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Language of responses when no language is requested |
| **Format:** | language code |
| **Default:** | _empty_ (use the local language of the feature) |
| **After Changes:** | run `nominatim refresh --website` |
Nominatim localizes the place names in responses when the corresponding
translation is available. Users can request a custom language setting through
the HTTP accept-languages header or through the explicit parameter
[accept-languages](../api/Search.md#language-of-results). If neither is
given, it falls back to this setting. If the setting is also empty, then
the local languages (in OSM: the name tag without any language suffix) is
used.
#### NOMINATIM_SEARCH_BATCH_MODE
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Enable a special batch query mode |
| **Format:** | boolean |
| **Default:** | no |
| **After Changes:** | run `nominatim refresh --website` |
This feature is currently undocumented and potentially broken.
#### NOMINATIM_SEARCH_NAME_ONLY_THRESHOLD
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Threshold for switching the search index lookup strategy |
| **Format:** | integer |
| **Default:** | 500 |
| **After Changes:** | run `nominatim refresh --website` |
This setting defines the threshold over which a name is no longer considered
as rare. When searching for places with rare names, only the name is used
for place lookups. Otherwise the name and any address information is used.
This setting only has an effect after `nominatim refresh --word-counts` has
been called to compute the word frequencies.
#### NOMINATIM_LOOKUP_MAX_COUNT
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Maximum number of OSM ids accepted by /lookup |
| **Format:** | integer |
| **Default:** | 50 |
| **After Changes:** | run `nominatim refresh --website` |
The /lookup point accepts list of ids to look up address details for. This
setting restricts the number of places a user may look up with a single
request.
#### NOMINATIM_POLYGON_OUTPUT_MAX_TYPES
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Number of different geometry formats that may be returned |
| **Format:** | integer |
| **Default:** | 1 |
| **After Changes:** | run `nominatim refresh --website` |
Nominatim supports returning full geometries of places. The geometries may
be requested in different formats with one of the
[`polygon_*` parameters](../api/Search.md#polygon-output). Use this
setting to restrict the number of geometry types that may be requested
with a single query.
Setting this parameter to 0 disables polygon output completely.
### Logging Settings
#### NOMINATIM_LOG_DB
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Log requests into the database |
| **Format:** | boolean |
| **Default:** | no |
| **After Changes:** | run `nominatim refresh --website` |
Enable logging requests into a database table with this setting. The logs
can be found in the table `new_query_log`.
When using this logging method, it is advisable to set up a job that
regularly clears out old logging information. Nominatim will not do that
on its own.
Can be used as the same time as NOMINATIM_LOG_FILE.
#### NOMINATIM_LOG_FILE
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Log requests into a file |
| **Format:** | path |
| **Default:** | _empty_ (logging disabled) |
| **After Changes:** | run `nominatim refresh --website` |
Enable logging of requests into a file with this setting by setting the log
file where to log to. A relative file name is assumed to be relative to
the project directory.
The entries in the log file have the following format:
<request time> <execution time in s> <number of results> <type> "<query string>"
Request time is the time when the request was started. The execution time is
given in ms and corresponds to the time the query took executing in PHP.
type contains the name of the endpoint used.
Can be used as the same time as NOMINATIM_LOG_DB.

View File

@@ -0,0 +1,34 @@
# Special phrases
## Importing OSM user-maintained special phrases
As described in the [Import section](../admin/Import.md), it is possible to
import special phrases from the wiki with the following command:
```sh
nominatim special-phrases --import-from-wiki
```
## Importing custom special phrases
But, it is also possible to import some phrases from a csv file.
To do so, you have access to the following command:
```sh
nominatim special-phrases --import-from-csv <csv file>
```
Note that the two previous import commands will update the phrases from your database.
This means that if you import some phrases from a csv file, only the phrases
present in the csv file will be kept into the database. All other phrases will
be removed.
If you want to only add new phrases and not update the other ones you can add
the argument `--no-replace` to the import command. For example:
```sh
nominatim special-phrases --import-from-csv <csv file> --no-replace
```
This will add the phrases present in the csv file into the database without
removing the other ones.

28
docs/customize/Tiger.md Normal file
View File

@@ -0,0 +1,28 @@
# Installing TIGER housenumber data for the US
Nominatim is able to use the official [TIGER](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html)
address set to complement the OSM house number data in the US. You can add
TIGER data to your own Nominatim instance by following these steps. The
entire US adds about 10GB to your database.
1. Get preprocessed TIGER 2021 data:
cd $PROJECT_DIR
wget https://nominatim.org/data/tiger2021-nominatim-preprocessed.csv.tar.gz
2. Import the data into your Nominatim database:
nominatim add-data --tiger-data tiger2021-nominatim-preprocessed.csv.tar.gz
3. Enable use of the Tiger data in your `.env` by adding:
echo NOMINATIM_USE_US_TIGER_DATA=yes >> .env
4. Apply the new settings:
nominatim refresh --functions
See the [TIGER-data project](https://github.com/osm-search/TIGER-data) for more
information on how the data got preprocessed.

View File

@@ -0,0 +1,302 @@
# Tokenizers
The tokenizer module in Nominatim is responsible for analysing the names given
to OSM objects and the terms of an incoming query in order to make sure, they
can be matched appropriately.
Nominatim offers different tokenizer modules, which behave differently and have
different configuration options. This sections describes the tokenizers and how
they can be configured.
!!! important
The use of a tokenizer is tied to a database installation. You need to choose
and configure the tokenizer before starting the initial import. Once the import
is done, you cannot switch to another tokenizer anymore. Reconfiguring the
chosen tokenizer is very limited as well. See the comments in each tokenizer
section.
## Legacy tokenizer
The legacy tokenizer implements the analysis algorithms of older Nominatim
versions. It uses a special Postgresql module to normalize names and queries.
This tokenizer is currently the default.
To enable the tokenizer add the following line to your project configuration:
```
NOMINATIM_TOKENIZER=legacy
```
The Postgresql module for the tokenizer is available in the `module` directory
and also installed with the remainder of the software under
`lib/nominatim/module/nominatim.so`. You can specify a custom location for
the module with
```
NOMINATIM_DATABASE_MODULE_PATH=<path to directory where nominatim.so resides>
```
This is in particular useful when the database runs on a different server.
See [Advanced installations](../admin/Advanced-Installations.md#importing-nominatim-to-an-external-postgresql-database) for details.
There are no other configuration options for the legacy tokenizer. All
normalization functions are hard-coded.
## ICU tokenizer
The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to
normalize names and queries. It also offers configurable decomposition and
abbreviation handling.
To enable the tokenizer add the following line to your project configuration:
```
NOMINATIM_TOKENIZER=icu
```
### How it works
On import the tokenizer processes names in the following three stages:
1. During the **Sanitizer step** incoming names are cleaned up and converted to
**full names**. This step can be used to regularize spelling, split multi-name
tags into their parts and tag names with additional attributes. See the
[Sanitizers section](#sanitizers) below for available cleaning routines.
2. The **Normalization** part removes all information from the full names
that are not relevant for search.
3. The **Token analysis** step takes the normalized full names and creates
all transliterated variants under which the name should be searchable.
See the [Token analysis](#token-analysis) section below for more
information.
During query time, only normalization and transliteration are relevant.
An incoming query is first split into name chunks (this usually means splitting
the string at the commas) and the each part is normalised and transliterated.
The result is used to look up places in the search index.
### Configuration
The ICU tokenizer is configured using a YAML file which can be configured using
`NOMINATIM_TOKENIZER_CONFIG`. The configuration is read on import and then
saved as part of the internal database status. Later changes to the variable
have no effect.
Here is an example configuration file:
``` yaml
normalization:
- ":: lower ()"
- "ß > 'ss'" # German szet is unimbigiously equal to double ss
transliteration:
- !include /etc/nominatim/icu-rules/extended-unicode-to-asccii.yaml
- ":: Ascii ()"
sanitizers:
- step: split-name-list
token-analysis:
- analyzer: generic
variants:
- !include icu-rules/variants-ca.yaml
- words:
- road -> rd
- bridge -> bdge,br,brdg,bri,brg
```
The configuration file contains four sections:
`normalization`, `transliteration`, `sanitizers` and `token-analysis`.
#### Normalization and Transliteration
The normalization and transliteration sections each define a set of
ICU rules that are applied to the names.
The **normalisation** rules are applied after sanitation. They should remove
any information that is not relevant for search at all. Usual rules to be
applied here are: lower-casing, removing of special characters, cleanup of
spaces.
The **transliteration** rules are applied at the end of the tokenization
process to transfer the name into an ASCII representation. Transliteration can
be useful to allow for further fuzzy matching, especially between different
scripts.
Each section must contain a list of
[ICU transformation rules](https://unicode-org.github.io/icu/userguide/transforms/general/rules.html).
The rules are applied in the order in which they appear in the file.
You can also include additional rules from external yaml file using the
`!include` tag. The included file must contain a valid YAML list of ICU rules
and may again include other files.
!!! warning
The ICU rule syntax contains special characters that conflict with the
YAML syntax. You should therefore always enclose the ICU rules in
double-quotes.
#### Sanitizers
The sanitizers section defines an ordered list of functions that are applied
to the name and address tags before they are further processed by the tokenizer.
They allows to clean up the tagging and bring it to a standardized form more
suitable for building the search index.
!!! hint
Sanitizers only have an effect on how the search index is built. They
do not change the information about each place that is saved in the
database. In particular, they have no influence on how the results are
displayed. The returned results always show the original information as
stored in the OpenStreetMap database.
Each entry contains information of a sanitizer to be applied. It has a
mandatory parameter `step` which gives the name of the sanitizer. Depending
on the type, it may have additional parameters to configure its operation.
The order of the list matters. The sanitizers are applied exactly in the order
that is configured. Each sanitizer works on the results of the previous one.
The following is a list of sanitizers that are shipped with Nominatim.
##### split-name-list
::: nominatim.tokenizer.sanitizers.split_name_list
selection:
members: False
rendering:
heading_level: 6
##### strip-brace-terms
::: nominatim.tokenizer.sanitizers.strip_brace_terms
selection:
members: False
rendering:
heading_level: 6
##### tag-analyzer-by-language
::: nominatim.tokenizer.sanitizers.tag_analyzer_by_language
selection:
members: False
rendering:
heading_level: 6
#### Token Analysis
Token analyzers take a full name and transform it into one or more normalized
form that are then saved in the search index. In its simplest form, the
analyzer only applies the transliteration rules. More complex analyzers
create additional spelling variants of a name. This is useful to handle
decomposition and abbreviation.
The ICU tokenizer may use different analyzers for different names. To select
the analyzer to be used, the name must be tagged with the `analyzer` attribute
by a sanitizer (see for example the
[tag-analyzer-by-language sanitizer](#tag-analyzer-by-language)).
The token-analysis section contains the list of configured analyzers. Each
analyzer must have an `id` parameter that uniquely identifies the analyzer.
The only exception is the default analyzer that is used when no special
analyzer was selected.
Different analyzer implementations may exist. To select the implementation,
the `analyzer` parameter must be set. Currently there is only one implementation
`generic` which is described in the following.
##### Generic token analyzer
The generic analyzer is able to create variants from a list of given
abbreviation and decomposition replacements. It takes one optional parameter
`variants` which lists the replacements to apply. If the section is
omitted, then the generic analyzer becomes a simple analyzer that only
applies the transliteration.
The variants section defines lists of replacements which create alternative
spellings of a name. To create the variants, a name is scanned from left to
right and the longest matching replacement is applied until the end of the
string is reached.
The variants section must contain a list of replacement groups. Each group
defines a set of properties that describes where the replacements are
applicable. In addition, the word section defines the list of replacements
to be made. The basic replacement description is of the form:
```
<source>[,<source>[...]] => <target>[,<target>[...]]
```
The left side contains one or more `source` terms to be replaced. The right side
lists one or more replacements. Each source is replaced with each replacement
term.
!!! tip
The source and target terms are internally normalized using the
normalization rules given in the configuration. This ensures that the
strings match as expected. In fact, it is better to use unnormalized
words in the configuration because then it is possible to change the
rules for normalization later without having to adapt the variant rules.
###### Decomposition
In its standard form, only full words match against the source. There
is a special notation to match the prefix and suffix of a word:
``` yaml
- ~strasse => str # matches "strasse" as full word and in suffix position
- hinter~ => hntr # matches "hinter" as full word and in prefix position
```
There is no facility to match a string in the middle of the word. The suffix
and prefix notation automatically trigger the decomposition mode: two variants
are created for each replacement, one with the replacement attached to the word
and one separate. So in above example, the tokenization of "hauptstrasse" will
create the variants "hauptstr" and "haupt str". Similarly, the name "rote strasse"
triggers the variants "rote str" and "rotestr". By having decomposition work
both ways, it is sufficient to create the variants at index time. The variant
rules are not applied at query time.
To avoid automatic decomposition, use the '|' notation:
``` yaml
- ~strasse |=> str
```
simply changes "hauptstrasse" to "hauptstr" and "rote strasse" to "rote str".
###### Initial and final terms
It is also possible to restrict replacements to the beginning and end of a
name:
``` yaml
- ^south => s # matches only at the beginning of the name
- road$ => rd # matches only at the end of the name
```
So the first example would trigger a replacement for "south 45th street" but
not for "the south beach restaurant".
###### Replacements vs. variants
The replacement syntax `source => target` works as a pure replacement. It changes
the name instead of creating a variant. To create an additional version, you'd
have to write `source => source,target`. As this is a frequent case, there is
a shortcut notation for it:
```
<source>[,<source>[...]] -> <target>[,<target>[...]]
```
The simple arrow causes an additional variant to be added. Note that
decomposition has an effect here on the source as well. So a rule
``` yaml
- "~strasse -> str"
```
means that for a word like `hauptstrasse` four variants are created:
`hauptstrasse`, `haupt strasse`, `hauptstr` and `haupt str`.
### Reconfiguration
Changing the configuration after the import is currently not possible, although
this feature may be added at a later time.

View File

@@ -0,0 +1,167 @@
# Database Layout
### Import tables
OSM data is initially imported using [osm2pgsql](https://osm2pgsql.org).
Nominatim uses its own data output style 'gazetteer', which differs from the
output style created for map rendering.
The import process creates the following tables:
![osm2pgsql tables](osm2pgsql-tables.svg)
The `planet_osm_*` tables are the usual backing tables for OSM data. Note
that Nominatim uses them to look up special relations and to find nodes on
ways.
The gazetteer style produces a single table `place` as output with the following
columns:
* `osm_type` - kind of OSM object (**N** - node, **W** - way, **R** - relation)
* `osm_id` - original OSM ID
* `class` - key of principal tag defining the object type
* `type` - value of principal tag defining the object type
* `name` - collection of tags that contain a name or reference
* `admin_level` - numerical value of the tagged administrative level
* `address` - collection of tags defining the address of an object
* `extratags` - collection of additional interesting tags that are not
directly relevant for searching
* `geometry` - geometry of the object (in WGS84)
A single OSM object may appear multiple times in this table when it is tagged
with multiple tags that may constitute a principal tag. Take for example a
motorway bridge. In OSM, this would be a way which is tagged with
`highway=motorway` and `bridge=yes`. This way would appear in the `place` table
once with `class` of `highway` and once with a `class` of `bridge`. Thus the
*unique key* for `place` is (`osm_type`, `osm_id`, `class`).
How raw OSM tags are mapped to the columns in the place table is to a certain
degree configurable. See [Customizing Import Styles](../customize/Import-Styles.md)
for more information.
### Search tables
The following tables carry all information needed to do the search:
![search tables](search-tables.svg)
The **placex** table is the central table that saves all information about the
searchable places in Nominatim. The basic columns are the same as for the
place table and have the same meaning. The placex tables adds the following
additional columns:
* `place_id` - the internal unique ID to identify the place
* `partition` - the id to use with partitioned tables (see below)
* `geometry_sector` - a location hash used for geographically close ordering
* `parent_place_id` - the next higher place in the address hierarchy, only
relevant for POI-type places (with rank 30)
* `linked_place_id` - place ID of the place this object has been merged with.
When this ID is set, then the place is invisible for search.
* `importance` - measure how well known the place is
* `rank_search`, `rank_address` - search and address rank (see [Customizing ranking](../customize/Ranking.md)
* `wikipedia` - the wikipedia page used for computing the importance of the place
* `country_code` - the country the place is located in
* `housenumber` - normalized housenumber, if the place has one
* `postcode` - computed postcode for the place
* `indexed_status` - processing status of the place (0 - ready, 1 - freshly inserted, 2 - needs updating, 100 - needs deletion)
* `indexed_date` - timestamp when the place was processed last
* `centroid` - a point feature for the place
The **location_property_osmline** table is a special table for
[address interpolations](https://wiki.openstreetmap.org/wiki/Addresses#Using_interpolation).
The columns have the same meaning and use as the columns with the same name in
the placex table. Only three columns are special:
* `startnumber` and `endnumber` - beginning and end of the number range
for the interpolation
* `interpolationtype` - a string `odd`, `even` or `all` to indicate
the interval between the numbers
Address interpolations are always ways in OSM, which is why there is no column
`osm_type`.
The **location_postcode** table holds computed centroids of all postcodes that
can be found in the OSM data. The meaning of the columns is again the same
as that of the placex table.
Every place needs an address, a set of surrounding places that describe the
location of the place. The set of address places is made up of OSM places
themselves. The **place_addressline** table cross-references for each place
all the places that make up its address. Two columns define the address
relation:
* `place_id` - reference to the place being addressed
* `address_place_id` - reference to the place serving as an address part
The most of the columns cache information from the placex entry of the address
part. The exceptions are:
* `fromarea` - is true if the address part has an area geometry and can
therefore be considered preceise
* `isaddress` - is true if the address part should show up in the address
output. Sometimes there are multiple places competing for for same address
type (e.g. multiple cities) and this field resolves the tie.
The **search_name** table contains the search index proper. It saves for each
place the terms with which the place can be found. The terms are split into
the name itself and all terms that make up the address. The table mirrors some
of the columns from placex for faster lookup.
Search terms are not saved as strings. Each term is assigned an integer and those
integers are saved in the name and address vectors of the search_name table. The
**word** table serves as the lookup table from string to such a word ID. The
exact content of the word table depends on the [tokenizer](Tokenizers.md) used.
## Address computation tables
Next to the main search tables, there is a set of secondary helper tables used
to compute the address relations between places. These tables are partitioned.
Each country is assigned a partition number in the country_name table (see
below) and the data is then split between a set of tables, one for each
partition. Note that Nominatim still manually manages partitioned tables.
Native support for partitions in PostgreSQL only became useable with version 13.
It will be a little while before Nominatim drops support for older versions.
![address tables](address-tables.svg)
The **search_name_X** tables are used to look up streets that appear in the
`addr:street` tag.
The **location_area_large_X** tables are used to look up larger areas
(administrative boundaries and place nodes) either through their geographic
closeness or through `addr:*` entries.
The **location_road_X** tables are used to find the closest street for a
dependent place.
All three table cache specific information from the placex table for their
selected subset of places:
* `keywords` and `name_vector` contain lists of term ids (from the word table)
that the full name of the place should match against
* `isguess` is true for places that are not described by an area
All other columns reflect their counterpart in the placex table.
## Static data tables
Nominatim also creates a number of static tables at import:
* `nominatim_properties` saves settings that must not be changed after
import
* `address_levels` save the rank information from the
[ranking configuration](../customize/Ranking.md)
* `country_name` contains a fallback of names for all countries, their
default languages and saves the assignment of countries to partitions.
* `country_osm_grid` provides a fallback for country geometries
## Auxilary data tables
Finally there are some table for auxillary data:
* `location_property_tiger` - saves housenumber from the Tiger import. Its
layout is similar to that of `location_propoerty_osmline`.
* `place_class_*` tables are helper tables to facilitate lookup of POIs
by their class and type. They exist because it is not possible to create
combined indexes with geometries.

View File

@@ -29,7 +29,7 @@ The Nominatim test suite consists of behavioural tests (using behave) and
unit tests (using PHPUnit for PHP code and pytest for Python code).
It has the following additional requirements:
* [behave test framework](https://behave.readthedocs.io) >= 1.2.5
* [behave test framework](https://behave.readthedocs.io) >= 1.2.6
* [phpunit](https://phpunit.de) >= 7.3
* [PHP CodeSniffer](https://github.com/squizlabs/PHP_CodeSniffer)
* [Pylint](https://pylint.org/) (2.6.0 is used for the CI)
@@ -38,6 +38,7 @@ It has the following additional requirements:
The documentation is built with mkdocs:
* [mkdocs](https://www.mkdocs.org/) >= 1.1.2
* [mkdocstrings](https://mkdocstrings.github.io/)
### Installing prerequisites on Ubuntu/Debian
@@ -51,7 +52,7 @@ To install all necessary packages run:
sudo apt install php-cgi phpunit php-codesniffer \
python3-pip python3-setuptools python3-dev pylint
pip3 install --user behave mkdocs pytest
pip3 install --user behave mkdocs mkdocstrings pytest
```
The `mkdocs` executable will be located in `.local/bin`. You may have to add
@@ -113,7 +114,7 @@ symlinks (see `CMakeLists.txt` for the exact steps).
Now you can start webserver for local testing
```
build> mkdocs serve
build> make serve-doc
[server:296] Serving on http://127.0.0.1:8000
[handlers:62] Start watching changes
```
@@ -122,7 +123,7 @@ If you develop inside a Vagrant virtual machine, use a port that is forwarded
to your host:
```
build> mkdocs serve --dev-addr 0.0.0.0:8088
build> PYTHONPATH=$SRCDIR mkdocs serve --dev-addr 0.0.0.0:8088
[server:296] Serving on http://0.0.0.0:8088
[handlers:62] Start watching changes
```

152
docs/develop/Indexing.md Normal file
View File

@@ -0,0 +1,152 @@
# Indexing Places
In Nominatim, the word __indexing__ refers to the process that takes the raw
OpenStreetMap data from the place table, enriches it with address information
and creates the search indexes. This section explains the basic data flow.
## Initial import
After osm2pgsql has loaded the raw OSM data into the place table,
the data is copied to the final search tables placex and location_property_osmline.
While they are copied, some basic properties are added:
* country_code, geometry_sector and partition
* initial search and address rank
In addition the column `indexed_status` is set to `1` marking the place as one
that needs to be indexed.
All this happens in the triggers `placex_insert` and `osmline_insert`.
## Indexing
The main work horse of the data import is the indexing step, where Nominatim
takes every place from the placex and location_property_osmline tables where
the indexed_status != 0 and computes the search terms and the address parts
of the place.
The indexing happens in three major steps:
1. **Data preparation** - The indexer gets the data for the place to be indexed
from the database.
2. **Search name processing** - The prepared data is given to the
tokenizer which computes the search terms from the names
and potentially other information.
3. **Address processing** - The indexer then hands the prepared data and the
tokenizer information back to the database via an `INSERT` statement which
also sets the indexed_status to `0`. This triggers the update triggers
`placex_update`/`osmline_update` which do the work of computing address
parts and filling all the search tables.
When computing the address terms of a place, Nominatim relies on the processed
search names of all the address parts. That is why places are processed in rank
order, from smallest rank to largest. To ensure correct handling of linked
place nodes, administrative boundaries are processed before all other places.
Apart from these restrictions, each place can be indexed independently
from the others. This allows a large degree of parallelization during the indexing.
It also means that the indexing process can be interrupted at any time and
will simply pick up where it left of when restarted.
### Data preparation
The data preparation step computes and retrieves all data for a place that
might be needed for the next step of processing the search name. That includes
* location information (country code)
* place classification (class, type, ranks)
* names (including names of linked places)
* address information (`addr:*` tags)
Data preparation is implemented in pl/PgSQL mostly in the functions
`placex_indexing_prepare()` and `get_interpolation_address()`.
#### `addr:*` tag inheritance
Nominatim has limited support for inheriting address tags from a building
to POIs inside the building. This only works when the address tags are on the
building outline. Any rank 30 object inside such a building or on its outline
inherits all address tags when it does not have any address tags of its own.
The inheritance is computed in the data preparation step.
### Search name processing
The prepared place information is handed to the tokenizer next. This is a
Python module responsible for processing the names from both name and address
terms and building up the word index from them. The process is explained in
more detail in the [Tokenizer chapter](Tokenizer.md).
### Address processing
Finally, the preprocessed place information and the results of the search name
processing are written back to the database. At this point the update trigger
of the placex/location_property_osmline tables take over and fill all the
dependent tables. This makes up the most work-intensive part of the indexing.
Nominatim distinguishes between dependent and independent places.
**Dependent places** are all places on rank 30: house numbers, POIs etc. These
places don't have a full address of their own. Instead they are attached to
a parent street or place and use the information of the parent for searching
and displaying information. Everything else are **independent places**: streets,
parks, water bodies, suburbs, cities, states etc. They receive a full address
on their own.
The address processing for both types of places is very different.
#### Independent places
To compute the address of an independent place Nominatim searches for all
places that cover the place to compute the address for at least partially.
For places with an area, that area is used to check for coverage. For place
nodes an artificial square area is computed according to the rank of
the place. The lower the rank the lager the area. The `location_area_large_X`
tables are there to facilitate the lookup. All places that can function as
the address of another place are saved in those tables.
`addr:*` and `isin:*` tags are taken into account to compute the address, too.
Nominatim will give preference to places with the same name as in these tags
when looking for places in the vicinity. If there are no matching place names
at all, then the tags are at least added to the search index. That means that
the names will not be shown in the result as the 'address' of the place, but
searching by them still works.
Independent places are always added to the global search index `search_name`.
#### Dependent places
Dependent places skip the full address computation for performance reasons.
Instead they just find a parent place to attach themselves to.
![parenting of dependent places](parenting-flow.svg)
By default a POI
or house number will be attached to the closest street. That can be any major
or minor street indexed by Nominatim. In the default configuration that means
that it can attach itself to a footway but only when it has a name.
When the dependent place has an `addr:street` tag, then Nominatim will first
try to find a street with the same name before falling back to the closest
street.
There are also addresses in OSM, where the housenumber does not belong
to a street at all. These have an `addr:place` tag. For these places, Nominatim
tries to find a place with the given name in the indexed places with an
address rank between 16 and 25. If none is found, then the dependent place
is attached to the closest place in that category and the addr:place name is
added as *unlisted* place, which indicates to Nominatim that it needs to add
it to the address output, no matter what. This special case is necessary to
cover addresses that don't really refer to an existing object.
When an address has both the `addr:street` and `addr:place` tag, then Nominatim
assumes that the `addr:place` tag in fact should be the city part of the address
and give the POI the usual street number address.
Dependent places are only added to the global search index `search_name` when
they have either a name themselves or when they have address tags that are not
covered by the places that make up their address. The latter ensures that
addresses are always searchable by those address tags.

View File

@@ -1,45 +0,0 @@
# Postcodes in Nominatim
The blog post
[Nominatim and Postcodes](https://www.openstreetmap.org/user/lonvia/diary/43143)
describes the handling implemented since Nominatim 3.1.
Postcode centroids (aka 'calculated postcodes') are generated by looking at all
postcodes of a country, grouping them and calculating the geometric centroid.
There is currently no logic to deal with extreme outliers (typos or other
mistakes in OSM data). There is also no check if a postcodes adheres to a
country's format, e.g. if Swiss postcodes are 4 digits.
## Regular updating calculated postcodes
The script to rerun the calculation is
`nominatim refresh --postcodes`
and runs once per night on nominatim.openstreetmap.org.
## Finding places that share a specific postcode
In the Nominatim database run
```sql
SELECT address->'postcode' as pc,
osm_type, osm_id, class, type,
st_x(centroid) as lon, st_y(centroid) as lat
FROM placex
WHERE country_code='fr'
AND upper(trim (both ' ' from address->'postcode')) = '33210';
```
Alternatively on [Overpass](https://overpass-turbo.eu/) run the following query
```
[out:json][timeout:250];
area["name"="France"]->.boundaryarea;
(
nwr(area.boundaryarea)["addr:postcode"="33210"];
);
out body;
>;
out skel qt;
```

332
docs/develop/Tokenizers.md Normal file
View File

@@ -0,0 +1,332 @@
# Tokenizers
The tokenizer is the component of Nominatim that is responsible for
analysing names of OSM objects and queries. Nominatim provides different
tokenizers that use different strategies for normalisation. This page describes
how tokenizers are expected to work and the public API that needs to be
implemented when creating a new tokenizer. For information on how to configure
a specific tokenizer for a database see the
[tokenizer chapter in the Customization Guide](../customize/Tokenizers.md).
## Generic Architecture
### About Search Tokens
Search in Nominatim is organised around search tokens. Such a token represents
string that can be part of the search query. Tokens are used so that the search
index does not need to be organised around strings. Instead the database saves
for each place which tokens match this place's name, address, house number etc.
To be able to distinguish between these different types of information stored
with the place, a search token also always has a certain type: name, house number,
postcode etc.
During search an incoming query is transformed into a ordered list of such
search tokens (or rather many lists, see below) and this list is then converted
into a database query to find the right place.
It is the core task of the tokenizer to create, manage and assign the search
tokens. The tokenizer is involved in two distinct operations:
* __at import time__: scanning names of OSM objects, normalizing them and
building up the list of search tokens.
* __at query time__: scanning the query and returning the appropriate search
tokens.
### Importing
The indexer is responsible to enrich an OSM object (or place) with all data
required for geocoding. It is split into two parts: the controller collects
the places that require updating, enriches the place information as required
and hands the place to Postgresql. The collector is part of the Nominatim
library written in Python. Within Postgresql, the `placex_update`
trigger is responsible to fill out all secondary tables with extra geocoding
information. This part is written in PL/pgSQL.
The tokenizer is involved in both parts. When the indexer prepares a place,
it hands it over to the tokenizer to inspect the names and create all the
search tokens applicable for the place. This usually involves updating the
tokenizer's internal token lists and creating a list of all token IDs for
the specific place. This list is later needed in the PL/pgSQL part where the
indexer needs to add the token IDs to the appropriate search tables. To be
able to communicate the list between the Python part and the pl/pgSQL trigger,
the `placex` table contains a special JSONB column `token_info` which is there
for the exclusive use of the tokenizer.
The Python part of the tokenizer returns a structured information about the
tokens of a place to the indexer which converts it to JSON and inserts it into
the `token_info` column. The content of the column is then handed to the PL/pqSQL
callbacks of the tokenizer which extracts the required information. Usually
the tokenizer then removes all information from the `token_info` structure,
so that no information is ever persistently saved in the table. All information
that went in should have been processed after all and put into secondary tables.
This is however not a hard requirement. If the tokenizer needs to store
additional information about a place permanently, it may do so in the
`token_info` column. It just may never execute searches over it and
consequently not create any special indexes on it.
### Querying
At query time, Nominatim builds up multiple _interpretations_ of the search
query. Each of these interpretations is tried against the database in order
of the likelihood with which they match to the search query. The first
interpretation that yields results wins.
The interpretations are encapsulated in the `SearchDescription` class. An
instance of this class is created by applying a sequence of
_search tokens_ to an initially empty SearchDescription. It is the
responsibility of the tokenizer to parse the search query and derive all
possible sequences of search tokens. To that end the tokenizer needs to parse
the search query and look up matching words in its own data structures.
## Tokenizer API
The following section describes the functions that need to be implemented
for a custom tokenizer implementation.
!!! warning
This API is currently in early alpha status. While this API is meant to
be a public API on which other tokenizers may be implemented, the API is
far away from being stable at the moment.
### Directory Structure
Nominatim expects two files for a tokenizer:
* `nominiatim/tokenizer/<NAME>_tokenizer.py` containing the Python part of the
implementation
* `lib-php/tokenizer/<NAME>_tokenizer.php` with the PHP part of the
implementation
where `<NAME>` is a unique name for the tokenizer consisting of only lower-case
letters, digits and underscore. A tokenizer also needs to install some SQL
functions. By convention, these should be placed in `lib-sql/tokenizer`.
If the tokenizer has a default configuration file, this should be saved in
the `settings/<NAME>_tokenizer.<SUFFIX>`.
### Configuration and Persistance
Tokenizers may define custom settings for their configuration. All settings
must be prefixed with `NOMINATIM_TOKENIZER_`. Settings may be transient or
persistent. Transient settings are loaded from the configuration file when
Nominatim is started and may thus be changed at any time. Persistent settings
are tied to a database installation and must only be read during installation
time. If they are needed for the runtime then they must be saved into the
`nominatim_properties` table and later loaded from there.
### The Python module
The Python module is expect to export a single factory function:
```python
def create(dsn: str, data_dir: Path) -> AbstractTokenizer
```
The `dsn` parameter contains the DSN of the Nominatim database. The `data_dir`
is a directory in the project directory that the tokenizer may use to save
database-specific data. The function must return the instance of the tokenizer
class as defined below.
### Python Tokenizer Class
All tokenizers must inherit from `nominatim.tokenizer.base.AbstractTokenizer`
and implement the abstract functions defined there.
::: nominatim.tokenizer.base.AbstractTokenizer
rendering:
heading_level: 4
### Python Analyzer Class
::: nominatim.tokenizer.base.AbstractAnalyzer
rendering:
heading_level: 4
### PL/pgSQL Functions
The tokenizer must provide access functions for the `token_info` column
to the indexer which extracts the necessary information for the global
search tables. If the tokenizer needs additional SQL functions for private
use, then these functions must be prefixed with `token_` in order to ensure
that there are no naming conflicts with the SQL indexer code.
The following functions are expected:
```sql
FUNCTION token_get_name_search_tokens(info JSONB) RETURNS INTEGER[]
```
Return an array of token IDs of search terms that should match
the name(s) for the given place. These tokens are used to look up the place
by name and, where the place functions as part of an address for another place,
by address. Must return NULL when the place has no name.
```sql
FUNCTION token_get_name_match_tokens(info JSONB) RETURNS INTEGER[]
```
Return an array of token IDs of full names of the place that should be used
to match addresses. The list of match tokens is usually more strict than
search tokens as it is used to find a match between two OSM tag values which
are expected to contain matching full names. Partial terms should not be
used for match tokens. Must return NULL when the place has no name.
```sql
FUNCTION token_get_housenumber_search_tokens(info JSONB) RETURNS INTEGER[]
```
Return an array of token IDs of house number tokens that apply to the place.
Note that a place may have multiple house numbers, for example when apartments
each have their own number. Must be NULL when the place has no house numbers.
```sql
FUNCTION token_normalized_housenumber(info JSONB) RETURNS TEXT
```
Return the house number(s) in the normalized form that can be matched against
a house number token text. If a place has multiple house numbers they must
be listed with a semicolon as delimiter. Must be NULL when the place has no
house numbers.
```sql
FUNCTION token_matches_street(info JSONB, street_tokens INTEGER[]) RETURNS BOOLEAN
```
Check if the given tokens (previously saved from `token_get_name_match_tokens()`)
match against the `addr:street` tag name. Must return either NULL or FALSE
when the place has no `addr:street` tag.
```sql
FUNCTION token_matches_place(info JSONB, place_tokens INTEGER[]) RETURNS BOOLEAN
```
Check if the given tokens (previously saved from `token_get_name_match_tokens()`)
match against the `addr:place` tag name. Must return either NULL or FALSE
when the place has no `addr:place` tag.
```sql
FUNCTION token_addr_place_search_tokens(info JSONB) RETURNS INTEGER[]
```
Return the search token IDs extracted from the `addr:place` tag. These tokens
are used for searches by address when no matching place can be found in the
database. Must be NULL when the place has no `addr:place` tag.
```sql
FUNCTION token_get_address_keys(info JSONB) RETURNS SETOF TEXT
```
Return the set of keys for which address information is provided. This
should correspond to the list of (relevant) `addr:*` tags with the `addr:`
prefix removed or the keys used in the `address` dictionary of the place info.
```sql
FUNCTION token_get_address_search_tokens(info JSONB, key TEXT) RETURNS INTEGER[]
```
Return the array of search tokens for the given address part. `key` can be
expected to be one of those returned with `token_get_address_keys()`. The
search tokens are added to the address search vector of the place, when no
corresponding OSM object could be found for the given address part from which
to copy the name information.
```sql
FUNCTION token_matches_address(info JSONB, key TEXT, tokens INTEGER[])
```
Check if the given tokens match against the address part `key`.
__Warning:__ the tokens that are handed in are the lists previously saved
from `token_get_name_search_tokens()`, _not_ from the match token list. This
is an historical oddity which will be fixed at some point in the future.
Currently, tokenizers are encouraged to make sure that matching works against
both the search token list and the match token list.
```sql
FUNCTION token_normalized_postcode(postcode TEXT) RETURNS TEXT
```
Return the normalized version of the given postcode. This function must return
the same value as the Python function `AbstractAnalyzer->normalize_postcode()`.
```sql
FUNCTION token_strip_info(info JSONB) RETURNS JSONB
```
Return the part of the `token_info` field that should be stored in the database
permanently. The indexer calls this function when all processing is done and
replaces the content of the `token_info` column with the returned value before
the trigger stores the information in the database. May return NULL if no
information should be stored permanently.
### PHP Tokenizer class
The PHP tokenizer class is instantiated once per request and responsible for
analyzing the incoming query. Multiple requests may be in flight in
parallel.
The class is expected to be found under the
name of `\Nominatim\Tokenizer`. To find the class the PHP code includes the file
`tokenizer/tokenizer.php` in the project directory. This file must be created
when the tokenizer is first set up on import. The file should initialize any
configuration variables by setting PHP constants and then require the file
with the actual implementation of the tokenizer.
The tokenizer class must implement the following functions:
```php
public function __construct(object &$oDB)
```
The constructor of the class receives a database connection that can be used
to query persistent data in the database.
```php
public function checkStatus()
```
Check that the tokenizer can access its persistent data structures. If there
is an issue, throw an `\Exception`.
```php
public function normalizeString(string $sTerm) : string
```
Normalize string to a form to be used for comparisons when reordering results.
Nominatim reweighs results how well the final display string matches the actual
query. Before comparing result and query, names and query are normalised against
this function. The tokenizer can thus remove all properties that should not be
taken into account for reweighing, e.g. special characters or case.
```php
public function tokensForSpecialTerm(string $sTerm) : array
```
Return the list of special term tokens that match the given term.
```php
public function extractTokensFromPhrases(array &$aPhrases) : TokenList
```
Parse the given phrases, splitting them into word lists and retrieve the
matching tokens.
The phrase array may take on two forms. In unstructured searches (using `q=`
parameter) the search query is split at the commas and the elements are
put into a sorted list. For structured searches the phrase array is an
associative array where the key designates the type of the term (street, city,
county etc.) The tokenizer may ignore the phrase type at this stage in parsing.
Matching phrase type and appropriate search token type will be done later
when the SearchDescription is built.
For each phrase in the list of phrases, the function must analyse the phrase
string and then call `setWordSets()` to communicate the result of the analysis.
A word set is a list of strings, where each string refers to a search token.
A phrase may have multiple interpretations. Therefore a list of word sets is
usually attached to the phrase. The search tokens themselves are returned
by the function in an associative array, where the key corresponds to the
strings given in the word sets. The value is a list of search tokens. Thus
a single string in the list of word sets may refer to multiple search tokens.

View File

@@ -0,0 +1,35 @@
@startuml
skinparam monochrome true
skinparam ObjectFontStyle bold
map search_name_X {
place_id => BIGINT
address_rank => SMALLINT
name_vector => INT[]
centroid => GEOMETRY
}
map location_area_large_X {
place_id => BIGINT
keywords => INT[]
partition => SMALLINT
rank_search => SMALLINT
rank_address => SMALLINT
country_code => VARCHR(2)
isguess => BOOLEAN
postcode => TEXT
centroid => POINT
geometry => GEOMETRY
}
map location_road_X {
place_id => BIGINT
partition => SMALLINT
country_code => VARCHR(2)
geometry => GEOMETRY
}
search_name_X -[hidden]> location_area_large_X
location_area_large_X -[hidden]> location_road_X
@enduml

File diff suppressed because one or more lines are too long

After

Width:  |  Height:  |  Size: 11 KiB

View File

@@ -0,0 +1,44 @@
@startuml
skinparam monochrome true
skinparam ObjectFontStyle bold
map planet_osm_nodes #eee {
id => BIGINT
lat => INT
lon => INT
}
map planet_osm_ways #eee {
id => BIGINT
nodes => BIGINT[]
tags => TEXT[]
}
map planet_osm_rels #eee {
id => BIGINT
parts => BIGINT[]
members => TEXT[]
tags => TEXT[]
way_off => SMALLINT
rel_off => SMALLINT
}
map place {
osm_type => CHAR(1)
osm_id => BIGINT
class => TEXT
type => TEXT
name => HSTORE
address => HSTORE
extratags => HSTORE
admin_level => SMALLINT
geometry => GEOMETRY
}
planet_osm_nodes -[hidden]> planet_osm_ways
planet_osm_ways -[hidden]> planet_osm_rels
planet_osm_ways -[hidden]-> place
planet_osm_nodes::id <- planet_osm_ways::nodes
@enduml

File diff suppressed because one or more lines are too long

After

Width:  |  Height:  |  Size: 13 KiB

View File

@@ -0,0 +1,31 @@
@startuml
skinparam monochrome true
start
if (has 'addr:street'?) then (yes)
if (street with that name\n nearby?) then (yes)
:**Use closest street**
**with same name**;
kill
else (no)
:** Use closest**\n**street**;
kill
endif
elseif (has 'addr:place'?) then (yes)
if (place with that name\n nearby?) then (yes)
:**Use closest place**
**with same name**;
kill
else (no)
:add addr:place to adress;
:**Use closest place**\n**rank 16 to 25**;
kill
endif
else (otherwise)
:**Use closest**\n**street**;
kill
endif
@enduml

File diff suppressed because one or more lines are too long

After

Width:  |  Height:  |  Size: 9.8 KiB

View File

@@ -0,0 +1,99 @@
@startuml
skinparam monochrome true
skinparam ObjectFontStyle bold
left to right direction
map placex {
place_id => BIGINT
osm_type => CHAR(1)
osm_id => BIGINT
class => TEXT
type => TEXT
name => HSTORE
address => HSTORE
extratags => HSTORE
admin_level => SMALLINT
partition => SMALLINT
geometry_sector => INT
parent_place_id => BIGINT
linked_place_id => BIGINT
importance => DOUBLE
rank_search => SMALLINT
rank_address => SMALLINT
wikipedia => TEXT
country_code => VARCHAR(2)
housenumber => TEXT
postcode => TEXT
indexed_status => SMALLINT
indexed_date => TIMESTAMP
centroid => GEOMETRY
geometry => GEOMETRY
}
map search_name {
place_id => BIGINT
importance => DOUBLE
search_rank => SMALLINT
address_rank => SMALLINT
name_vector => INT[]
nameaddress_vector => INT[]
country_code => VARCHAR(2)
centroid => GEOMETRY
}
map word {
word_id => INT
word_token => TEXT
... =>
}
map location_property_osmline {
place_id => BIGINT
osm_id => BIGINT
startnumber => INT
endnumber => INT
interpolationtype => TEXT
address => HSTORE
partition => SMALLINT
geometry_sector => INT
parent_place_id => BIGINT
country_code => VARCHAR(2)
postcode => text
indexed_status => SMALLINT
indexed_date => TIMESTAMP
linegeo => GEOMETRY
}
map place_addressline {
place_id => BIGINT
address_place_id => BIGINT
distance => DOUBLE
cached_rank_address => SMALLINT
fromarea => BOOLEAN
isaddress => BOOLEAN
}
map location_postcode {
place_id => BIGINT
postcode => TEXT
parent_place_id => BIGINT
rank_search => SMALLINT
rank_address => SMALLINT
indexed_status => SMALLINT
indexed_date => TIMESTAMP
geometry => GEOMETRY
}
placex::place_id <-- search_name::place_id
placex::place_id <-- place_addressline::place_id
placex::place_id <-- place_addressline::address_place_id
search_name::name_vector --> word::word_id
search_name::nameaddress_vector --> word::word_id
place_addressline -[hidden]> location_property_osmline
search_name -[hidden]> place_addressline
location_property_osmline -[hidden]-> location_postcode
@enduml

File diff suppressed because one or more lines are too long

After

Width:  |  Height:  |  Size: 35 KiB

View File

@@ -13,3 +13,11 @@ th, td {
th {
background-color: #eee;
}
/* Indentation for mkdocstrings.
div.doc-contents:not(.first) {
padding-left: 25px;
border-left: 4px solid rgba(230, 230, 230);
margin-bottom: 60px;
}*/

View File

@@ -1,8 +1,10 @@
Nominatim (from the Latin, 'by name') is a tool to search OSM data by name and address and to generate synthetic addresses of OSM points (reverse geocoding).
This guide comes in three parts:
This guide comes in four parts:
* __[API reference](api/Overview.md)__ for users of Nominatim
* __[Administration Guide](admin/Installation.md)__ for those who want
to install their own Nominatim server
* __[Customization Guide](customize/Overview.md)__ for those who want to
adapt their own installation to their special requirements
* __[Developer's Guide](develop/overview.md)__ for developers of the software

View File

@@ -1,4 +1,4 @@
site_name: Nominatim Documentation
site_name: Nominatim 4.0.1
theme: readthedocs
docs_dir: ${CMAKE_CURRENT_BINARY_DIR}
site_url: https://nominatim.org
@@ -21,14 +21,24 @@ pages:
- 'Deploy' : 'admin/Deployment.md'
- 'Nominatim UI' : 'admin/Setup-Nominatim-UI.md'
- 'Advanced Installations' : 'admin/Advanced-Installations.md'
- 'Maintenance' : 'admin/Maintenance.md'
- 'Migration from older Versions' : 'admin/Migration.md'
- 'Troubleshooting' : 'admin/Faq.md'
- 'Customization Guide':
- 'Overview': 'customize/Overview.md'
- 'Import Styles': 'customize/Import-Styles.md'
- 'Configuration Settings': 'customize/Settings.md'
- 'Place Ranking' : 'customize/Ranking.md'
- 'Tokenizers' : 'customize/Tokenizers.md'
- 'Special Phrases': 'customize/Special-Phrases.md'
- 'External data: US housenumbers from TIGER': 'customize/Tiger.md'
- 'External data: Postcodes': 'customize/Postcodes.md'
- 'Developers Guide':
- 'Setup for Development' : 'develop/Development-Environment.md'
- 'Architecture Overview' : 'develop/overview.md'
- 'OSM Data Import' : 'develop/Import.md'
- 'Place Ranking' : 'develop/Ranking.md'
- 'Postcodes' : 'develop/Postcodes.md'
- 'Database Layout' : 'develop/Database-Layout.md'
- 'Indexing' : 'develop/Indexing.md'
- 'Tokenizers' : 'develop/Tokenizers.md'
- 'Setup for Development' : 'develop/Development-Environment.md'
- 'Testing' : 'develop/Testing.md'
- 'External Data Sources': 'develop/data-sources.md'
- 'Appendix':
@@ -39,6 +49,15 @@ pages:
markdown_extensions:
- codehilite
- admonition
- def_list
- toc:
permalink:
extra_css: [extra.css, styles.css]
plugins:
- search
- mkdocstrings:
handlers:
python:
rendering:
show_source: false
show_signature_annotations: false

View File

@@ -61,7 +61,7 @@ class AddressDetails
return join(', ', $aParts);
}
public function getAddressNames($sCountry = null)
public function getAddressNames()
{
$aAddress = array();
@@ -79,13 +79,11 @@ class AddressDetails
$sName = $aLine['housenumber'];
}
if (isset($sName)) {
$sTypeLabel = strtolower(str_replace(' ', '_', $sTypeLabel));
if (!isset($aAddress[$sTypeLabel])
|| $aLine['class'] == 'place'
) {
$aAddress[$sTypeLabel] = $sName;
}
if (isset($sName)
&& (!isset($aAddress[$sTypeLabel])
|| $aLine['class'] == 'place')
) {
$aAddress[$sTypeLabel] = $sName;
}
}

View File

@@ -39,7 +39,9 @@ class DB
$conn->exec("SET DateStyle TO 'sql,european'");
$conn->exec("SET client_encoding TO 'utf-8'");
$iMaxExecution = ini_get('max_execution_time');
if ($iMaxExecution > 0) $conn->setAttribute(\PDO::ATTR_TIMEOUT, $iMaxExecution); // seconds
if ($iMaxExecution > 0) {
$conn->setAttribute(\PDO::ATTR_TIMEOUT, $iMaxExecution); // seconds
}
$this->connection = $conn;
return true;
@@ -95,7 +97,9 @@ class DB
try {
$stmt = $this->getQueryStatement($sSQL, $aInputVars, $sErrMessage);
$row = $stmt->fetch(\PDO::FETCH_NUM);
if ($row === false) return false;
if ($row === false) {
return false;
}
} catch (\PDOException $e) {
throw new \Nominatim\DatabaseError($sErrMessage, 500, null, $e, $sSQL);
}
@@ -306,9 +310,13 @@ class DB
if (preg_match('/^pgsql:(.+)$/', $sDSN, $aMatches)) {
foreach (explode(';', $aMatches[1]) as $sKeyVal) {
list($sKey, $sVal) = explode('=', $sKeyVal, 2);
if ($sKey == 'host') $sKey = 'hostspec';
if ($sKey == 'dbname') $sKey = 'database';
if ($sKey == 'user') $sKey = 'username';
if ($sKey == 'host') {
$sKey = 'hostspec';
} elseif ($sKey == 'dbname') {
$sKey = 'database';
} elseif ($sKey == 'user') {
$sKey = 'username';
}
$aInfo[$sKey] = $sVal;
}
}

View File

@@ -5,7 +5,7 @@ namespace Nominatim;
class DatabaseError extends \Exception
{
public function __construct($message, $code = 500, Exception $previous = null, $oPDOErr, $sSql = null)
public function __construct($message, $code, $previous, $oPDOErr, $sSql = null)
{
parent::__construct($message, $code, $previous);
// https://secure.php.net/manual/en/class.pdoexception.php

View File

@@ -78,7 +78,7 @@ class Debug
echo '<th>Address Tokens</th><th>Address Not</th>';
echo '<th>country</th><th>operator</th>';
echo '<th>class</th><th>type</th><th>postcode</th><th>housenumber</th></tr>';
foreach ($aSearches as $iRank => $aRankedSet) {
foreach ($aSearches as $aRankedSet) {
foreach ($aRankedSet as $aRow) {
$aRow->dumpAsHtmlTableRow($aWordsIDs);
}

View File

@@ -7,18 +7,20 @@ require_once(CONST_LibDir.'/Phrase.php');
require_once(CONST_LibDir.'/ReverseGeocode.php');
require_once(CONST_LibDir.'/SearchDescription.php');
require_once(CONST_LibDir.'/SearchContext.php');
require_once(CONST_LibDir.'/SearchPosition.php');
require_once(CONST_LibDir.'/TokenList.php');
require_once(CONST_TokenizerDir.'/tokenizer.php');
class Geocode
{
protected $oDB;
protected $oPlaceLookup;
protected $oTokenizer;
protected $aLangPrefOrder = array();
protected $aExcludePlaceIDs = array();
protected $bReverseInPlan = true;
protected $iLimit = 20;
protected $iFinalLimit = 10;
@@ -42,28 +44,12 @@ class Geocode
protected $sQuery = false;
protected $aStructuredQuery = false;
protected $oNormalizer = null;
public function __construct(&$oDB)
{
$this->oDB =& $oDB;
$this->oPlaceLookup = new PlaceLookup($this->oDB);
$this->oNormalizer = \Transliterator::createFromRules(CONST_Term_Normalization_Rules);
}
private function normTerm($sTerm)
{
if ($this->oNormalizer === null) {
return $sTerm;
}
return $this->oNormalizer->transliterate($sTerm);
}
public function setReverseInPlan($bReverse)
{
$this->bReverseInPlan = $bReverse;
$this->oTokenizer = new \Nominatim\Tokenizer($this->oDB);
}
public function setLanguagePreference($aLangPref)
@@ -85,7 +71,9 @@ class Geocode
$aParams['exclude_place_ids'] = implode(',', $this->aExcludePlaceIDs);
}
if ($this->bBoundedSearch) $aParams['bounded'] = '1';
if ($this->bBoundedSearch) {
$aParams['bounded'] = '1';
}
if ($this->aCountryCodes) {
$aParams['countrycodes'] = implode(',', $this->aCountryCodes);
@@ -100,8 +88,11 @@ class Geocode
public function setLimit($iLimit = 10)
{
if ($iLimit > 50) $iLimit = 50;
if ($iLimit < 1) $iLimit = 1;
if ($iLimit > 50) {
$iLimit = 50;
} elseif ($iLimit < 1) {
$iLimit = 1;
}
$this->iFinalLimit = $iLimit;
$this->iLimit = $iLimit + min($iLimit, 10);
@@ -196,18 +187,24 @@ class Geocode
if ($sExcluded) {
foreach ($sExcluded as $iExcludedPlaceID) {
$iExcludedPlaceID = (int)$iExcludedPlaceID;
if ($iExcludedPlaceID)
if ($iExcludedPlaceID) {
$aExcludePlaceIDs[$iExcludedPlaceID] = $iExcludedPlaceID;
}
}
if (isset($aExcludePlaceIDs))
if (isset($aExcludePlaceIDs)) {
$this->aExcludePlaceIDs = $aExcludePlaceIDs;
}
}
// Only certain ranks of feature
$sFeatureType = $oParams->getString('featureType');
if (!$sFeatureType) $sFeatureType = $oParams->getString('featuretype');
if ($sFeatureType) $this->setFeatureType($sFeatureType);
if (!$sFeatureType) {
$sFeatureType = $oParams->getString('featuretype');
}
if ($sFeatureType) {
$this->setFeatureType($sFeatureType);
}
// Country code list
$sCountries = $oParams->getStringList('countrycodes');
@@ -217,8 +214,9 @@ class Geocode
$aCountries[] = strtolower($sCountryCode);
}
}
if (isset($aCountries))
if (isset($aCountries)) {
$this->aCountryCodes = $aCountries;
}
}
$aViewbox = $oParams->getStringList('viewboxlbrt');
@@ -262,7 +260,6 @@ class Geocode
$oParams->getString('country'),
$oParams->getString('postalcode')
);
$this->setReverseInPlan(false);
} else {
$this->setQuery($sQuery);
}
@@ -271,13 +268,17 @@ class Geocode
public function loadStructuredAddressElement($sValue, $sKey, $iNewMinAddressRank, $iNewMaxAddressRank, $aItemListValues)
{
$sValue = trim($sValue);
if (!$sValue) return false;
if (!$sValue) {
return false;
}
$this->aStructuredQuery[$sKey] = $sValue;
if ($this->iMinAddressRank == 0 && $this->iMaxAddressRank == 30) {
$this->iMinAddressRank = $iNewMinAddressRank;
$this->iMaxAddressRank = $iNewMaxAddressRank;
}
if ($aItemListValues) $this->aAddressRankList = array_merge($this->aAddressRankList, $aItemListValues);
if ($aItemListValues) {
$this->aAddressRankList = array_merge($this->aAddressRankList, $aItemListValues);
}
return true;
}
@@ -311,11 +312,11 @@ class Geocode
public function fallbackStructuredQuery()
{
if (!$this->aStructuredQuery) return false;
$aParams = $this->aStructuredQuery;
if (count($aParams) == 1) return false;
if (!$aParams || count($aParams) == 1) {
return false;
}
$aOrderToFallback = array('postalcode', 'street', 'city', 'county', 'state');
@@ -330,7 +331,7 @@ class Geocode
return false;
}
public function getGroupedSearches($aSearches, $aPhrases, $oValidTokens, $bIsStructured)
public function getGroupedSearches($aSearches, $aPhrases, $oValidTokens)
{
/*
Calculate all searches using oValidTokens i.e.
@@ -345,52 +346,26 @@ class Geocode
*/
foreach ($aPhrases as $iPhrase => $oPhrase) {
$aNewPhraseSearches = array();
$sPhraseType = $bIsStructured ? $oPhrase->getPhraseType() : '';
$oPosition = new SearchPosition(
$oPhrase->getPhraseType(),
$iPhrase,
count($aPhrases)
);
foreach ($oPhrase->getWordSets() as $aWordset) {
$aWordsetSearches = $aSearches;
// Add all words from this wordset
foreach ($aWordset as $iToken => $sToken) {
//echo "<br><b>$sToken</b>";
$aNewWordsetSearches = array();
$oPosition->setTokenPosition($iToken, count($aWordset));
foreach ($aWordsetSearches as $oCurrentSearch) {
//echo "<i>";
//var_dump($oCurrentSearch);
//echo "</i>";
// Tokens with full name matches.
foreach ($oValidTokens->get(' '.$sToken) as $oSearchTerm) {
$aNewSearches = $oCurrentSearch->extendWithFullTerm(
$oSearchTerm,
$oValidTokens->contains($sToken)
&& strpos($sToken, ' ') === false,
$sPhraseType,
$iToken == 0 && $iPhrase == 0,
$iPhrase == 0,
$iToken + 1 == count($aWordset)
&& $iPhrase + 1 == count($aPhrases)
);
foreach ($aNewSearches as $oSearch) {
if ($oSearch->getRank() < $this->iMaxRank) {
$aNewWordsetSearches[] = $oSearch;
}
}
}
// Look for partial matches.
// Note that there is no point in adding country terms here
// because country is omitted in the address.
if ($sPhraseType != 'country') {
// Allow searching for a word - but at extra cost
foreach ($oValidTokens->get($sToken) as $oSearchTerm) {
$aNewSearches = $oCurrentSearch->extendWithPartialTerm(
$sToken,
$oSearchTerm,
$bIsStructured,
$iPhrase,
$oValidTokens->get(' '.$sToken)
foreach ($oValidTokens->get($sToken) as $oSearchTerm) {
if ($oSearchTerm->isExtendable($oCurrentSearch, $oPosition)) {
$aNewSearches = $oSearchTerm->extendSearch(
$oCurrentSearch,
$oPosition
);
foreach ($aNewSearches as $oSearch) {
@@ -405,7 +380,6 @@ class Geocode
usort($aNewWordsetSearches, array('Nominatim\SearchDescription', 'bySearchRank'));
$aWordsetSearches = array_slice($aNewWordsetSearches, 0, 50);
}
//var_Dump('<hr>',count($aWordsetSearches)); exit;
$aNewPhraseSearches = array_merge($aNewPhraseSearches, $aNewWordsetSearches);
usort($aNewPhraseSearches, array('Nominatim\SearchDescription', 'bySearchRank'));
@@ -413,8 +387,11 @@ class Geocode
$aSearchHash = array();
foreach ($aNewPhraseSearches as $iSearch => $aSearch) {
$sHash = serialize($aSearch);
if (isset($aSearchHash[$sHash])) unset($aNewPhraseSearches[$iSearch]);
else $aSearchHash[$sHash] = 1;
if (isset($aSearchHash[$sHash])) {
unset($aNewPhraseSearches[$iSearch]);
} else {
$aSearchHash[$sHash] = 1;
}
}
$aNewPhraseSearches = array_slice($aNewPhraseSearches, 0, 50);
@@ -435,10 +412,12 @@ class Geocode
$iSearchCount = 0;
$aSearches = array();
foreach ($aGroupedSearches as $iScore => $aNewSearches) {
foreach ($aGroupedSearches as $aNewSearches) {
$iSearchCount += count($aNewSearches);
$aSearches = array_merge($aSearches, $aNewSearches);
if ($iSearchCount > 50) break;
if ($iSearchCount > 50) {
break;
}
}
}
@@ -495,7 +474,9 @@ class Geocode
public function lookup()
{
Debug::newFunction('Geocode::lookup');
if (!$this->sQuery && !$this->aStructuredQuery) return array();
if (!$this->sQuery && !$this->aStructuredQuery) {
return array();
}
Debug::printDebugArray('Geocode', $this);
@@ -520,25 +501,11 @@ class Geocode
Debug::newSection('Query Preprocessing');
$sNormQuery = $this->normTerm($this->sQuery);
Debug::printVar('Normalized query', $sNormQuery);
$sLanguagePrefArraySQL = $this->oDB->getArraySQL(
$this->oDB->getDBQuotedList($this->aLangPrefOrder)
);
$sQuery = $this->sQuery;
if (!preg_match('//u', $sQuery)) {
userError('Query string is not UTF-8 encoded.');
}
// Conflicts between US state abreviations and various words for 'the' in different languages
if (isset($this->aLangPrefOrder['name:en'])) {
$sQuery = preg_replace('/(^|,)\s*il\s*(,|$)/i', '\1illinois\2', $sQuery);
$sQuery = preg_replace('/(^|,)\s*al\s*(,|$)/i', '\1alabama\2', $sQuery);
$sQuery = preg_replace('/(^|,)\s*la\s*(,|$)/i', '\1louisiana\2', $sQuery);
}
// Do we have anything that looks like a lat/lon pair?
$sQuery = $oCtx->setNearPointFromQuery($sQuery);
@@ -576,117 +543,62 @@ class Geocode
}
if ($sSpecialTerm && !$aSearches[0]->hasOperator()) {
$sSpecialTerm = pg_escape_string($sSpecialTerm);
$sToken = $this->oDB->getOne(
'SELECT make_standard_name(:term)',
array(':term' => $sSpecialTerm),
'Cannot decode query. Wrong encoding?'
);
$sSQL = 'SELECT class, type FROM word ';
$sSQL .= ' WHERE word_token in (\' '.$sToken.'\')';
$sSQL .= ' AND class is not null AND class not in (\'place\')';
$aTokens = $this->oTokenizer->tokensForSpecialTerm($sSpecialTerm);
Debug::printSQL($sSQL);
$aSearchWords = $this->oDB->getAll($sSQL);
$aNewSearches = array();
foreach ($aSearches as $oSearch) {
foreach ($aSearchWords as $aSearchTerm) {
$oNewSearch = clone $oSearch;
$oNewSearch->setPoiSearch(
Operator::TYPE,
$aSearchTerm['class'],
$aSearchTerm['type']
);
$aNewSearches[] = $oNewSearch;
if (!empty($aTokens)) {
$aNewSearches = array();
$oPosition = new SearchPosition('', 0, 1);
$oPosition->setTokenPosition(0, 1);
foreach ($aSearches as $oSearch) {
foreach ($aTokens as $oToken) {
$aNewSearches = array_merge(
$aNewSearches,
$oToken->extendSearch($oSearch, $oPosition)
);
}
}
$aSearches = $aNewSearches;
}
$aSearches = $aNewSearches;
}
// Split query into phrases
// Commas are used to reduce the search space by indicating where phrases split
$aPhrases = array();
if ($this->aStructuredQuery) {
$aInPhrases = $this->aStructuredQuery;
$bStructuredPhrases = true;
foreach ($this->aStructuredQuery as $iPhrase => $sPhrase) {
$aPhrases[] = new Phrase($sPhrase, $iPhrase);
}
} else {
$aInPhrases = explode(',', $sQuery);
$bStructuredPhrases = false;
foreach (explode(',', $sQuery) as $sPhrase) {
$aPhrases[] = new Phrase($sPhrase, '');
}
}
Debug::printDebugArray('Search context', $oCtx);
Debug::printDebugArray('Base search', empty($aSearches) ? null : $aSearches[0]);
Debug::printVar('Final query phrases', $aInPhrases);
// Convert each phrase to standard form
// Create a list of standard words
// Get all 'sets' of words
// Generate a complete list of all
Debug::newSection('Tokenization');
$aTokens = array();
$aPhrases = array();
foreach ($aInPhrases as $iPhrase => $sPhrase) {
$sPhrase = $this->oDB->getOne(
'SELECT make_standard_name(:phrase)',
array(':phrase' => $sPhrase),
'Cannot normalize query string (is it a UTF-8 string?)'
);
if (trim($sPhrase)) {
$oPhrase = new Phrase($sPhrase, is_string($iPhrase) ? $iPhrase : '');
$oPhrase->addTokens($aTokens);
$aPhrases[] = $oPhrase;
}
}
Debug::printVar('Tokens', $aTokens);
$oValidTokens = new TokenList();
if (!empty($aTokens)) {
$oValidTokens->addTokensFromDB(
$this->oDB,
$aTokens,
$this->aCountryCodes,
$sNormQuery,
$this->oNormalizer
);
$oValidTokens = $this->oTokenizer->extractTokensFromPhrases($aPhrases);
if ($oValidTokens->count() > 0) {
$oCtx->setFullNameWords($oValidTokens->getFullWordIDs());
// Try more interpretations for Tokens that could not be matched.
foreach ($aTokens as $sToken) {
if ($sToken[0] == ' ' && !$oValidTokens->contains($sToken)) {
if (preg_match('/^ ([0-9]{5}) [0-9]{4}$/', $sToken, $aData)) {
// US ZIP+4 codes - merge in the 5-digit ZIP code
$oValidTokens->addToken(
$sToken,
new Token\Postcode(null, $aData[1], 'us')
);
} elseif (preg_match('/^ [0-9]+$/', $sToken)) {
// Unknown single word token with a number.
// Assume it is a house number.
$oValidTokens->addToken(
$sToken,
new Token\HouseNumber(null, trim($sToken))
);
}
}
}
$aPhrases = array_filter($aPhrases, function ($oPhrase) {
return $oPhrase->getWordSets() !== null;
});
// Any words that have failed completely?
// TODO: suggestions
Debug::printGroupTable('Valid Tokens', $oValidTokens->debugInfo());
foreach ($aPhrases as $oPhrase) {
$oPhrase->computeWordSets($oValidTokens);
}
Debug::printDebugTable('Phrases', $aPhrases);
Debug::newSection('Search candidates');
$aGroupedSearches = $this->getGroupedSearches($aSearches, $aPhrases, $oValidTokens, $bStructuredPhrases);
$aGroupedSearches = $this->getGroupedSearches($aSearches, $aPhrases, $oValidTokens);
if ($this->bReverseInPlan) {
if (!$this->aStructuredQuery) {
// Reverse phrase array and also reverse the order of the wordsets in
// the first and final phrase. Don't bother about phrases in the middle
// because order in the address doesn't matter.
@@ -695,7 +607,7 @@ class Geocode
if (count($aPhrases) > 1) {
$aPhrases[count($aPhrases)-1]->invertWordSets();
}
$aReverseGroupedSearches = $this->getGroupedSearches($aSearches, $aPhrases, $oValidTokens, false);
$aReverseGroupedSearches = $this->getGroupedSearches($aSearches, $aPhrases, $oValidTokens);
foreach ($aGroupedSearches as $aSearches) {
foreach ($aSearches as $aSearch) {
@@ -714,7 +626,9 @@ class Geocode
$aGroupedSearches = array();
foreach ($aSearches as $aSearch) {
if ($aSearch->getRank() < $this->iMaxRank) {
if (!isset($aGroupedSearches[$aSearch->getRank()])) $aGroupedSearches[$aSearch->getRank()] = array();
if (!isset($aGroupedSearches[$aSearch->getRank()])) {
$aGroupedSearches[$aSearch->getRank()] = array();
}
$aGroupedSearches[$aSearch->getRank()][] = $aSearch;
}
}
@@ -728,7 +642,9 @@ class Geocode
$sHash = serialize($aSearch);
if (isset($aSearchHash[$sHash])) {
unset($aGroupedSearches[$iGroup][$iSearch]);
if (empty($aGroupedSearches[$iGroup])) unset($aGroupedSearches[$iGroup]);
if (empty($aGroupedSearches[$iGroup])) {
unset($aGroupedSearches[$iGroup]);
}
} else {
$aSearchHash[$sHash] = 1;
}
@@ -772,7 +688,9 @@ class Geocode
}
}
if ($iQueryLoop > 20) break;
if ($iQueryLoop > 20) {
break;
}
}
if (!empty($aResults)) {
@@ -838,7 +756,6 @@ class Geocode
foreach ($aResults as $oResult) {
if (($this->iMaxAddressRank == 30 &&
($oResult->iTable == Result::TABLE_OSMLINE
|| $oResult->iTable == Result::TABLE_AUX
|| $oResult->iTable == Result::TABLE_TIGER))
|| in_array($oResult->iId, $aFilteredIDs)
) {
@@ -848,9 +765,9 @@ class Geocode
$aResults = $tempIDs;
}
if (!empty($aResults)) break;
if ($iGroupLoop > 4) break;
if ($iQueryLoop > 30) break;
if (!empty($aResults) || $iGroupLoop > 4 || $iQueryLoop > 30) {
break;
}
}
} else {
// Just interpret as a reverse geocode
@@ -868,10 +785,8 @@ class Geocode
// No results? Done
if (empty($aResults)) {
if ($this->bFallback) {
if ($this->fallbackStructuredQuery()) {
return $this->lookup();
}
if ($this->bFallback && $this->fallbackStructuredQuery()) {
return $this->lookup();
}
return array();
@@ -890,7 +805,9 @@ class Geocode
$aRecheckWords = preg_split('/\b[\s,\\-]*/u', $sQuery);
foreach ($aRecheckWords as $i => $sWord) {
if (!preg_match('/[\pL\pN]/', $sWord)) unset($aRecheckWords[$i]);
if (!preg_match('/[\pL\pN]/', $sWord)) {
unset($aRecheckWords[$i]);
}
}
Debug::printVar('Recheck words', $aRecheckWords);
@@ -950,7 +867,9 @@ class Geocode
foreach ($aRecheckWords as $i => $sWord) {
if (stripos($sAddress, $sWord)!==false) {
$iCountWords++;
if (preg_match('/(^|,)\s*'.preg_quote($sWord, '/').'\s*(,|$)/', $sAddress)) $iCountWords += 0.1;
if (preg_match('/(^|,)\s*'.preg_quote($sWord, '/').'\s*(,|$)/', $sAddress)) {
$iCountWords += 0.1;
}
}
}
@@ -967,15 +886,8 @@ class Geocode
$aToFilter = $aSearchResults;
$aSearchResults = array();
$bFirst = true;
foreach ($aToFilter as $aResult) {
$this->aExcludePlaceIDs[$aResult['place_id']] = $aResult['place_id'];
if ($bFirst) {
$fLat = $aResult['lat'];
$fLon = $aResult['lon'];
if (isset($aResult['zoom'])) $iZoom = $aResult['zoom'];
$bFirst = false;
}
if (!$this->oPlaceLookup->doDeDupe() || (!isset($aOSMIDDone[$aResult['osm_type'].$aResult['osm_id']])
&& !isset($aClassTypeNameDone[$aResult['osm_type'].$aResult['class'].$aResult['type'].$aResult['name'].$aResult['admin_level']]))
) {
@@ -985,7 +897,9 @@ class Geocode
}
// Absolute limit on number of results
if (count($aSearchResults) >= $this->iFinalLimit) break;
if (count($aSearchResults) >= $this->iFinalLimit) {
break;
}
}
Debug::printVar('Post-filter results', $aSearchResults);
@@ -999,7 +913,6 @@ class Geocode
'Structured query' => $this->aStructuredQuery,
'Name keys' => Debug::fmtArrayVals($this->aLangPrefOrder),
'Excluded place IDs' => Debug::fmtArrayVals($this->aExcludePlaceIDs),
'Try reversed query'=> $this->bReverseInPlan,
'Limit (for searches)' => $this->iLimit,
'Limit (for results)'=> $this->iFinalLimit,
'Country codes' => Debug::fmtArrayVals($this->aCountryCodes),

View File

@@ -90,14 +90,16 @@ class ParameterParser
$aLanguages = array();
$sLangString = $this->getString('accept-language', $sFallback);
if ($sLangString) {
if (preg_match_all('/(([a-z]{1,8})([-_][a-z]{1,8})?)\s*(;\s*q\s*=\s*(1|0\.[0-9]+))?/i', $sLangString, $aLanguagesParse, PREG_SET_ORDER)) {
foreach ($aLanguagesParse as $iLang => $aLanguage) {
$aLanguages[$aLanguage[1]] = isset($aLanguage[5])?(float)$aLanguage[5]:1 - ($iLang/100);
if (!isset($aLanguages[$aLanguage[2]])) $aLanguages[$aLanguage[2]] = $aLanguages[$aLanguage[1]]/10;
if ($sLangString
&& preg_match_all('/(([a-z]{1,8})([-_][a-z]{1,8})?)\s*(;\s*q\s*=\s*(1|0\.[0-9]+))?/i', $sLangString, $aLanguagesParse, PREG_SET_ORDER)
) {
foreach ($aLanguagesParse as $iLang => $aLanguage) {
$aLanguages[$aLanguage[1]] = isset($aLanguage[5])?(float)$aLanguage[5]:1 - ($iLang/100);
if (!isset($aLanguages[$aLanguage[2]])) {
$aLanguages[$aLanguage[2]] = $aLanguages[$aLanguage[1]]/10;
}
arsort($aLanguages);
}
arsort($aLanguages);
}
if (empty($aLanguages) && CONST_Default_Language) {
$aLanguages[CONST_Default_Language] = 1;

View File

@@ -9,36 +9,26 @@ namespace Nominatim;
*/
class Phrase
{
const MAX_WORDSET_LEN = 20;
const MAX_WORDSETS = 100;
// Complete phrase as a string.
// Complete phrase as a string (guaranteed to have no leading or trailing
// spaces).
private $sPhrase;
// Element type for structured searches.
private $sPhraseType;
// Space-separated words of the phrase.
private $aWords;
// Possible segmentations of the phrase.
private $aWordSets;
public static function cmpByArraylen($aA, $aB)
{
$iALen = count($aA);
$iBLen = count($aB);
if ($iALen == $iBLen) {
return 0;
}
return ($iALen < $iBLen) ? -1 : 1;
}
public function __construct($sPhrase, $sPhraseType)
{
$this->sPhrase = trim($sPhrase);
$this->sPhraseType = $sPhraseType;
$this->aWords = explode(' ', $this->sPhrase);
}
/**
* Get the orginal phrase of the string.
*/
public function getPhrase()
{
return $this->sPhrase;
}
/**
@@ -52,6 +42,11 @@ class Phrase
return $this->sPhraseType;
}
public function setWordSets($aWordSets)
{
$this->aWordSets = $aWordSets;
}
/**
* Return the array of possible segmentations of the phrase.
*
@@ -63,30 +58,6 @@ class Phrase
return $this->aWordSets;
}
/**
* Add the tokens from this phrase to the given list of tokens.
*
* @param string[] $aTokens List of tokens to append.
*
* @return void
*/
public function addTokens(&$aTokens)
{
$iNumWords = count($this->aWords);
for ($i = 0; $i < $iNumWords; $i++) {
$sPhrase = $this->aWords[$i];
$aTokens[' '.$sPhrase] = ' '.$sPhrase;
$aTokens[$sPhrase] = $sPhrase;
for ($j = $i + 1; $j < $iNumWords; $j++) {
$sPhrase .= ' '.$this->aWords[$j];
$aTokens[' '.$sPhrase] = ' '.$sPhrase;
$aTokens[$sPhrase] = $sPhrase;
}
}
}
/**
* Invert the set of possible segmentations.
*
@@ -99,61 +70,11 @@ class Phrase
}
}
public function computeWordSets($oTokens)
{
$iNumWords = count($this->aWords);
// Caches the word set for the partial phrase up to word i.
$aSetCache = array_fill(0, $iNumWords, array());
// Initialise first element of cache. There can only be the word.
if ($oTokens->containsAny($this->aWords[0])) {
$aSetCache[0][] = array($this->aWords[0]);
}
// Now do the next elements using what we already have.
for ($i = 1; $i < $iNumWords; $i++) {
for ($j = $i; $j > 0; $j--) {
$sPartial = $j == $i ? $this->aWords[$j] : $this->aWords[$j].' '.$sPartial;
if (!empty($aSetCache[$j - 1]) && $oTokens->containsAny($sPartial)) {
$aPartial = array($sPartial);
foreach ($aSetCache[$j - 1] as $aSet) {
if (count($aSet) < Phrase::MAX_WORDSET_LEN) {
$aSetCache[$i][] = array_merge($aSet, $aPartial);
}
}
if (count($aSetCache[$i]) > 2 * Phrase::MAX_WORDSETS) {
usort(
$aSetCache[$i],
array('\Nominatim\Phrase', 'cmpByArraylen')
);
$aSetCache[$i] = array_slice(
$aSetCache[$i],
0,
Phrase::MAX_WORDSETS
);
}
}
}
// finally the current full phrase
$sPartial = $this->aWords[0].' '.$sPartial;
if ($oTokens->containsAny($sPartial)) {
$aSetCache[$i][] = array($sPartial);
}
}
$this->aWordSets = $aSetCache[$iNumWords - 1];
usort($this->aWordSets, array('\Nominatim\Phrase', 'cmpByArraylen'));
$this->aWordSets = array_slice($this->aWordSets, 0, Phrase::MAX_WORDSETS);
}
public function debugInfo()
{
return array(
'Type' => $this->sPhraseType,
'Phrase' => $this->sPhrase,
'Words' => $this->aWords,
'WordSets' => $this->aWordSets
);
}

View File

@@ -89,20 +89,36 @@ class PlaceLookup
{
$aParams = array();
if ($this->bAddressDetails) $aParams['addressdetails'] = '1';
if ($this->bExtraTags) $aParams['extratags'] = '1';
if ($this->bNameDetails) $aParams['namedetails'] = '1';
if ($this->bAddressDetails) {
$aParams['addressdetails'] = '1';
}
if ($this->bExtraTags) {
$aParams['extratags'] = '1';
}
if ($this->bNameDetails) {
$aParams['namedetails'] = '1';
}
if ($this->bIncludePolygonAsText) $aParams['polygon_text'] = '1';
if ($this->bIncludePolygonAsGeoJSON) $aParams['polygon_geojson'] = '1';
if ($this->bIncludePolygonAsKML) $aParams['polygon_kml'] = '1';
if ($this->bIncludePolygonAsSVG) $aParams['polygon_svg'] = '1';
if ($this->bIncludePolygonAsText) {
$aParams['polygon_text'] = '1';
}
if ($this->bIncludePolygonAsGeoJSON) {
$aParams['polygon_geojson'] = '1';
}
if ($this->bIncludePolygonAsKML) {
$aParams['polygon_kml'] = '1';
}
if ($this->bIncludePolygonAsSVG) {
$aParams['polygon_svg'] = '1';
}
if ($this->fPolygonSimplificationThreshold > 0.0) {
$aParams['polygon_threshold'] = $this->fPolygonSimplificationThreshold;
}
if (!$this->bDeDupe) $aParams['dedupe'] = '0';
if (!$this->bDeDupe) {
$aParams['dedupe'] = '0';
}
return $aParams;
}
@@ -147,8 +163,9 @@ class PlaceLookup
private function langAddressSql($sHousenumber)
{
if ($this->bAddressDetails)
if ($this->bAddressDetails) {
return ''; // langaddress will be computed from address details
}
return 'get_address_by_language(place_id,'.$sHousenumber.','.$this->aLangPrefOrderSql.') AS langaddress,';
}
@@ -234,12 +251,20 @@ class PlaceLookup
$sSQL .= ' housenumber,';
$sSQL .= ' country_code, ';
$sSQL .= ' importance, ';
if (!$this->bDeDupe) $sSQL .= 'place_id,';
if (!$this->bAddressDetails) $sSQL .= 'langaddress, ';
if (!$this->bDeDupe) {
$sSQL .= 'place_id,';
}
if (!$this->bAddressDetails) {
$sSQL .= 'langaddress, ';
}
$sSQL .= ' placename, ';
$sSQL .= ' ref, ';
if ($this->bExtraTags) $sSQL .= 'extratags, ';
if ($this->bNameDetails) $sSQL .= 'name, ';
if ($this->bExtraTags) {
$sSQL .= 'extratags, ';
}
if ($this->bNameDetails) {
$sSQL .= 'name, ';
}
$sSQL .= ' extra_place ';
$aSubSelects[] = $sSQL;
@@ -260,8 +285,12 @@ class PlaceLookup
$sSQL .= $this->langAddressSql('-1');
$sSQL .= ' postcode as placename,';
$sSQL .= ' postcode as ref,';
if ($this->bExtraTags) $sSQL .= 'null::text AS extra,';
if ($this->bNameDetails) $sSQL .= 'null::text AS names,';
if ($this->bExtraTags) {
$sSQL .= 'null::text AS extra,';
}
if ($this->bNameDetails) {
$sSQL .= 'null::text AS names,';
}
$sSQL .= ' ST_x(geometry) AS lon, ST_y(geometry) AS lat,';
$sSQL .= ' (0.75-(rank_search::float/40)) AS importance, ';
$sSQL .= $this->addressImportanceSql('geometry', 'lp.parent_place_id');
@@ -298,8 +327,12 @@ class PlaceLookup
$sSQL .= $this->langAddressSql('housenumber_for_place');
$sSQL .= ' null::text AS placename, ';
$sSQL .= ' null::text AS ref, ';
if ($this->bExtraTags) $sSQL .= 'null::text AS extra,';
if ($this->bNameDetails) $sSQL .= 'null::text AS names,';
if ($this->bExtraTags) {
$sSQL .= 'null::text AS extra,';
}
if ($this->bNameDetails) {
$sSQL .= 'null::text AS names,';
}
$sSQL .= ' st_x(centroid) AS lon, ';
$sSQL .= ' st_y(centroid) AS lat,';
$sSQL .= ' -1.15 AS importance, ';
@@ -344,8 +377,12 @@ class PlaceLookup
$sSQL .= $this->langAddressSql('housenumber_for_place');
$sSQL .= ' null::text AS placename, ';
$sSQL .= ' null::text AS ref, ';
if ($this->bExtraTags) $sSQL .= 'null::text AS extra, ';
if ($this->bNameDetails) $sSQL .= 'null::text AS names, ';
if ($this->bExtraTags) {
$sSQL .= 'null::text AS extra, ';
}
if ($this->bNameDetails) {
$sSQL .= 'null::text AS names, ';
}
$sSQL .= ' st_x(centroid) AS lon, ';
$sSQL .= ' st_y(centroid) AS lat, ';
// slightly smaller than the importance for normal houses
@@ -373,42 +410,6 @@ class PlaceLookup
$aSubSelects[] = $sSQL;
}
if (CONST_Use_Aux_Location_data) {
$sPlaceIDs = Result::joinIdsByTable($aResults, Result::TABLE_AUX);
if ($sPlaceIDs) {
$sHousenumbers = Result::sqlHouseNumberTable($aResults, Result::TABLE_AUX);
$sSQL = ' SELECT ';
$sSQL .= " 'L' AS osm_type, ";
$sSQL .= ' place_id AS osm_id, ';
$sSQL .= " 'place' AS class,";
$sSQL .= " 'house' AS type, ";
$sSQL .= ' null::smallint AS admin_level, ';
$sSQL .= ' 30 AS rank_search,';
$sSQL .= ' 30 AS rank_address, ';
$sSQL .= ' place_id,';
$sSQL .= ' parent_place_id, ';
$sSQL .= ' housenumber,';
$sSQL .= " 'us' AS country_code, ";
$sSQL .= $this->langAddressSql('-1');
$sSQL .= ' null::text AS placename, ';
$sSQL .= ' null::text AS ref, ';
if ($this->bExtraTags) $sSQL .= 'null::text AS extra, ';
if ($this->bNameDetails) $sSQL .= 'null::text AS names, ';
$sSQL .= ' ST_X(centroid) AS lon, ';
$sSQL .= ' ST_Y(centroid) AS lat, ';
$sSQL .= ' -1.10 AS importance, ';
$sSQL .= $this->addressImportanceSql(
'centroid',
'location_property_aux.parent_place_id'
);
$sSQL .= ' null::text AS extra_place ';
$sSQL .= ' FROM location_property_aux ';
$sSQL .= " WHERE place_id in ($sPlaceIDs) ";
$aSubSelects[] = $sSQL;
}
}
}
if (empty($aSubSelects)) {
@@ -484,7 +485,9 @@ class PlaceLookup
{
$aOutlineResult = array();
if (!$iPlaceID) return $aOutlineResult;
if (!$iPlaceID) {
return $aOutlineResult;
}
// Get the bounding box and outline polygon
$sSQL = 'select place_id,0 as numfeatures,st_area(geometry) as area,';
@@ -496,10 +499,18 @@ class PlaceLookup
}
$sSQL .= ' ST_YMin(geometry) as minlat,ST_YMax(geometry) as maxlat,';
$sSQL .= ' ST_XMin(geometry) as minlon,ST_XMax(geometry) as maxlon';
if ($this->bIncludePolygonAsGeoJSON) $sSQL .= ',ST_AsGeoJSON(geometry) as asgeojson';
if ($this->bIncludePolygonAsKML) $sSQL .= ',ST_AsKML(geometry) as askml';
if ($this->bIncludePolygonAsSVG) $sSQL .= ',ST_AsSVG(geometry) as assvg';
if ($this->bIncludePolygonAsText) $sSQL .= ',ST_AsText(geometry) as astext';
if ($this->bIncludePolygonAsGeoJSON) {
$sSQL .= ',ST_AsGeoJSON(geometry) as asgeojson';
}
if ($this->bIncludePolygonAsKML) {
$sSQL .= ',ST_AsKML(geometry) as askml';
}
if ($this->bIncludePolygonAsSVG) {
$sSQL .= ',ST_AsSVG(geometry) as assvg';
}
if ($this->bIncludePolygonAsText) {
$sSQL .= ',ST_AsText(geometry) as astext';
}
if ($fLonReverse != null && $fLatReverse != null) {
$sFrom = ' from (SELECT * , CASE WHEN (class = \'highway\') AND (ST_GeometryType(geometry) = \'ST_LineString\') THEN ';
$sFrom .=' ST_ClosestPoint(geometry, ST_SetSRID(ST_Point('.$fLatReverse.','.$fLonReverse.'),4326))';
@@ -522,10 +533,18 @@ class PlaceLookup
$aOutlineResult['lon'] = $aPointPolygon['centrelon'];
}
if ($this->bIncludePolygonAsGeoJSON) $aOutlineResult['asgeojson'] = $aPointPolygon['asgeojson'];
if ($this->bIncludePolygonAsKML) $aOutlineResult['askml'] = $aPointPolygon['askml'];
if ($this->bIncludePolygonAsSVG) $aOutlineResult['assvg'] = $aPointPolygon['assvg'];
if ($this->bIncludePolygonAsText) $aOutlineResult['astext'] = $aPointPolygon['astext'];
if ($this->bIncludePolygonAsGeoJSON) {
$aOutlineResult['asgeojson'] = $aPointPolygon['asgeojson'];
}
if ($this->bIncludePolygonAsKML) {
$aOutlineResult['askml'] = $aPointPolygon['askml'];
}
if ($this->bIncludePolygonAsSVG) {
$aOutlineResult['assvg'] = $aPointPolygon['assvg'];
}
if ($this->bIncludePolygonAsText) {
$aOutlineResult['astext'] = $aPointPolygon['astext'];
}
if (abs($aPointPolygon['minlat'] - $aPointPolygon['maxlat']) < 0.0000001) {
$aPointPolygon['minlat'] = $aPointPolygon['minlat'] - $fRadius;

View File

@@ -13,8 +13,7 @@ class Result
const TABLE_PLACEX = 0;
const TABLE_POSTCODE = 1;
const TABLE_OSMLINE = 2;
const TABLE_AUX = 3;
const TABLE_TIGER = 4;
const TABLE_TIGER = 3;
/// Database table that contains the result.
public $iTable;
@@ -56,6 +55,27 @@ class Result
}
)));
}
public static function joinIdsByTableMinRank($aResults, $iTable, $iMinAddressRank)
{
return join(',', array_keys(array_filter(
$aResults,
function ($aValue) use ($iTable, $iMinAddressRank) {
return $aValue->iTable == $iTable && $aValue->iAddressRank >= $iMinAddressRank;
}
)));
}
public static function joinIdsByTableMaxRank($aResults, $iTable, $iMaxAddressRank)
{
return join(',', array_keys(array_filter(
$aResults,
function ($aValue) use ($iTable, $iMaxAddressRank) {
return $aValue->iTable == $iTable && $aValue->iAddressRank <= $iMaxAddressRank;
}
)));
}
public static function sqlHouseNumberTable($aResults, $iTable)
{
$sHousenumbers = '';

View File

@@ -74,8 +74,6 @@ class ReverseGeocode
protected function lookupLargeArea($sPointSQL, $iMaxRank)
{
$oResult = null;
if ($iMaxRank > 4) {
$aPlace = $this->lookupPolygon($sPointSQL, $iMaxRank);
if ($aPlace) {
@@ -113,6 +111,7 @@ class ReverseGeocode
$sSQL .= ' FROM placex';
$sSQL .= ' WHERE osm_type = \'N\'';
$sSQL .= ' AND country_code = \''.$sCountryCode.'\'';
$sSQL .= ' AND rank_search < 26 '; // needed to select right index
$sSQL .= ' AND rank_search between 5 and ' .min(25, $iMaxRank);
$sSQL .= ' AND class = \'place\' AND type != \'postcode\'';
$sSQL .= ' AND name IS NOT NULL ';
@@ -167,9 +166,13 @@ class ReverseGeocode
{
Debug::newFunction('lookupPolygon');
// polygon search begins at suburb-level
if ($iMaxRank > 25) $iMaxRank = 25;
if ($iMaxRank > 25) {
$iMaxRank = 25;
}
// no polygon search over country-level
if ($iMaxRank < 5) $iMaxRank = 5;
if ($iMaxRank < 5) {
$iMaxRank = 5;
}
// search for polygon
$sSQL = 'SELECT place_id, parent_place_id, rank_address, rank_search FROM';
$sSQL .= '(select place_id, parent_place_id, rank_address, rank_search, country_code, geometry';
@@ -190,7 +193,6 @@ class ReverseGeocode
if ($aPoly) {
// if a polygon is found, search for placenodes begins ...
$iParentPlaceID = $aPoly['parent_place_id'];
$iRankAddress = $aPoly['rank_address'];
$iRankSearch = $aPoly['rank_search'];
$iPlaceID = $aPoly['place_id'];
@@ -205,6 +207,7 @@ class ReverseGeocode
// for place nodes at rank_address 16
$sSQL .= ' AND rank_search > '.$iRankSearch;
$sSQL .= ' AND rank_search <= '.$iMaxRank;
$sSQL .= ' AND rank_search < 26 '; // needed to select right index
$sSQL .= ' AND rank_address > 0';
$sSQL .= ' AND class = \'place\'';
$sSQL .= ' AND type != \'postcode\'';
@@ -242,26 +245,24 @@ class ReverseGeocode
public function lookupPoint($sPointSQL, $bDoInterpolation = true)
{
Debug::newFunction('lookupPoint');
// starts if the search is on POI or street level,
// searches for the nearest POI or street,
// if a street is found and a POI is searched for,
// the nearest POI which the found street is a parent of is choosen.
$iMaxRank = $this->iMaxRank;
// Find the nearest point
$fSearchDiam = 0.006;
$oResult = null;
$aPlace = null;
// for POI or street level
if ($iMaxRank >= 26) {
if ($this->iMaxRank >= 26) {
// starts if the search is on POI or street level,
// searches for the nearest POI or street,
// if a street is found and a POI is searched for,
// the nearest POI which the found street is a parent of is choosen.
$sSQL = 'select place_id,parent_place_id,rank_address,country_code,';
$sSQL .= ' ST_distance('.$sPointSQL.', geometry) as distance';
$sSQL .= ' FROM ';
$sSQL .= ' placex';
$sSQL .= ' WHERE ST_DWithin('.$sPointSQL.', geometry, '.$fSearchDiam.')';
$sSQL .= ' AND';
$sSQL .= ' rank_address between 26 and '.$iMaxRank;
$sSQL .= ' rank_address between 26 and '.$this->iMaxRank;
$sSQL .= ' and (name is not null or housenumber is not null';
$sSQL .= ' or rank_address between 26 and 27)';
$sSQL .= ' and (rank_address between 26 and 27';
@@ -284,7 +285,7 @@ class ReverseGeocode
if ($aPlace) {
// if street and maxrank > streetlevel
if ($iRankAddress <= 27 && $iMaxRank > 27) {
if ($iRankAddress <= 27 && $this->iMaxRank > 27) {
// find the closest object (up to a certain radius) of which the street is a parent of
$sSQL = ' select place_id,';
$sSQL .= ' ST_distance('.$sPointSQL.', geometry) as distance';
@@ -338,7 +339,7 @@ class ReverseGeocode
}
}
if ($bDoInterpolation && $iMaxRank >= 30) {
if ($bDoInterpolation && $this->iMaxRank >= 30) {
$fDistance = $fSearchDiam;
if ($aPlace) {
// We can't reliably go from the closest street to an
@@ -356,7 +357,6 @@ class ReverseGeocode
$oResult = new Result($aHouse['place_id'], Result::TABLE_OSMLINE);
$oResult->iHouseNumber = closestHouseNumber($aHouse);
$aPlace = $aHouse;
$iRankAddress = 30;
}
}
@@ -366,7 +366,7 @@ class ReverseGeocode
}
} else {
// lower than street level ($iMaxRank < 26 )
$oResult = $this->lookupLargeArea($sPointSQL, $iMaxRank);
$oResult = $this->lookupLargeArea($sPointSQL, $this->iMaxRank);
}
Debug::printVar('Final result', $oResult);

View File

@@ -28,6 +28,8 @@ class SearchContext
public $sqlViewboxLarge = '';
/// Reference along a route (as SQL).
public $sqlViewboxCentre = '';
/// List of countries to restrict search to (as array).
public $aCountryList = null;
/// List of countries to restrict search to (as SQL).
public $sqlCountryList = '';
/// List of place IDs to exclude (as SQL).
@@ -187,6 +189,7 @@ class SearchContext
public function setCountryList($aCountries)
{
$this->sqlCountryList = '('.join(',', array_map('addQuotes', $aCountries)).')';
$this->aCountryList = $aCountries;
}
/**
@@ -279,6 +282,19 @@ class SearchContext
return '';
}
/**
* Check if the given country is covered by the search context.
*
* @param string $sCountryCode Country code of the country to check.
*
* @return True, if no country code restrictions are set or the
* country is included in the country list.
*/
public function isCountryApplicable($sCountryCode)
{
return $this->aCountryList === null || in_array($sCountryCode, $this->aCountryList);
}
public function debugInfo()
{
return array(

View File

@@ -19,6 +19,8 @@ class SearchDescription
private $aName = array();
/// True if the name is rare enough to force index use on name.
private $bRareName = false;
/// True if the name requires to be accompanied by address terms.
private $bNameNeedsAddress = false;
/// List of word ids making up the address of the object.
private $aAddress = array();
/// List of word ids that appear in the name but should be ignored.
@@ -67,35 +69,6 @@ class SearchDescription
return $this->iSearchRank;
}
/**
* Make this search a POI search.
*
* In a POI search, objects are not (only) searched by their name
* but also by the primary OSM key/value pair (class and type in Nominatim).
*
* @param integer $iOperator Type of POI search
* @param string $sClass Class (or OSM tag key) of POI.
* @param string $sType Type (or OSM tag value) of POI.
*
* @return void
*/
public function setPoiSearch($iOperator, $sClass, $sType)
{
$this->iOperator = $iOperator;
$this->sClass = $sClass;
$this->sType = $sType;
}
/**
* Check if any operator is set.
*
* @return bool True, if this is a special search operation.
*/
public function hasOperator()
{
return $this->iOperator != Operator::NONE;
}
/**
* Extract key/value pairs from a query.
*
@@ -142,257 +115,253 @@ class SearchDescription
return false;
}
}
if ($this->bNameNeedsAddress && empty($this->aAddress)) {
return false;
}
return true;
}
/////////// Search building functions
/**
* Derive new searches by adding a full term to the existing search.
* Create a copy of this search description adding to search rank.
*
* @param object $oSearchTerm Description of the token.
* @param bool $bHasPartial True if there are also tokens of partial terms
* with the same name.
* @param string $sPhraseType Type of phrase the token is contained in.
* @param bool $bFirstToken True if the token is at the beginning of the
* query.
* @param bool $bFirstPhrase True if the token is in the first phrase of
* the query.
* @param bool $bLastToken True if the token is at the end of the query.
* @param integer $iTermCost Cost to add to the current search rank.
*
* @return SearchDescription[] List of derived search descriptions.
* @return object Cloned search description.
*/
public function extendWithFullTerm($oSearchTerm, $bHasPartial, $sPhraseType, $bFirstToken, $bFirstPhrase, $bLastToken)
public function clone($iTermCost)
{
$aNewSearches = array();
$oSearch = clone $this;
$oSearch->iSearchRank += $iTermCost;
if (($sPhraseType == '' || $sPhraseType == 'country')
&& is_a($oSearchTerm, '\Nominatim\Token\Country')
) {
if (!$this->sCountryCode) {
$oSearch = clone $this;
$oSearch->iSearchRank++;
$oSearch->sCountryCode = $oSearchTerm->sCountryCode;
// Country is almost always at the end of the string
// - increase score for finding it anywhere else (optimisation)
if (!$bLastToken) {
$oSearch->iSearchRank += 5;
$oSearch->iNamePhrase = -1;
}
$aNewSearches[] = $oSearch;
}
} elseif (($sPhraseType == '' || $sPhraseType == 'postalcode')
&& is_a($oSearchTerm, '\Nominatim\Token\Postcode')
) {
if (!$this->sPostcode) {
// If we have structured search or this is the first term,
// make the postcode the primary search element.
if ($this->iOperator == Operator::NONE && $bFirstToken) {
$oSearch = clone $this;
$oSearch->iSearchRank++;
$oSearch->iOperator = Operator::POSTCODE;
$oSearch->aAddress = array_merge($this->aAddress, $this->aName);
$oSearch->aName =
array($oSearchTerm->iId => $oSearchTerm->sPostcode);
$aNewSearches[] = $oSearch;
}
// If we have a structured search or this is not the first term,
// add the postcode as an addendum.
if ($this->iOperator != Operator::POSTCODE
&& ($sPhraseType == 'postalcode' || !empty($this->aName))
) {
$oSearch = clone $this;
$oSearch->iSearchRank++;
$oSearch->iNamePhrase = -1;
if (strlen($oSearchTerm->sPostcode) < 4) {
$oSearch->iSearchRank += 4 - strlen($oSearchTerm->sPostcode);
}
$oSearch->sPostcode = $oSearchTerm->sPostcode;
$aNewSearches[] = $oSearch;
}
}
} elseif (($sPhraseType == '' || $sPhraseType == 'street')
&& is_a($oSearchTerm, '\Nominatim\Token\HouseNumber')
) {
if (!$this->sHouseNumber && $this->iOperator != Operator::POSTCODE) {
$oSearch = clone $this;
$oSearch->iSearchRank++;
$oSearch->iNamePhrase = -1;
$oSearch->sHouseNumber = $oSearchTerm->sToken;
if ($this->iOperator != Operator::NONE) {
$oSearch->iSearchRank++;
}
// sanity check: if the housenumber is not mainly made
// up of numbers, add a penalty
if (preg_match('/\\d/', $oSearch->sHouseNumber) === 0
|| preg_match_all('/[^0-9]/', $oSearch->sHouseNumber, $aMatches) > 2) {
$oSearch->iSearchRank++;
}
if (empty($oSearchTerm->iId)) {
$oSearch->iSearchRank++;
}
// also must not appear in the middle of the address
if (!empty($this->aAddress)
|| (!empty($this->aAddressNonSearch))
|| $this->sPostcode
) {
$oSearch->iSearchRank++;
}
$aNewSearches[] = $oSearch;
// Housenumbers may appear in the name when the place has its own
// address terms.
if ($oSearchTerm->iId !== null
&& ($this->iNamePhrase >= 0 || empty($this->aName))
&& empty($this->aAddress)
) {
$oSearch = clone $this;
$oSearch->iSearchRank++;
$oSearch->aAddress = $this->aName;
$oSearch->bRareName = false;
$oSearch->aName = array($oSearchTerm->iId => $oSearchTerm->iId);
$aNewSearches[] = $oSearch;
}
}
} elseif ($sPhraseType == ''
&& is_a($oSearchTerm, '\Nominatim\Token\SpecialTerm')
) {
if ($this->iOperator == Operator::NONE) {
$oSearch = clone $this;
$oSearch->iSearchRank += 2;
$oSearch->iNamePhrase = -1;
$iOp = $oSearchTerm->iOperator;
if ($iOp == Operator::NONE) {
if (!empty($this->aName) || $this->oContext->isBoundedSearch()) {
$iOp = Operator::NAME;
} else {
$iOp = Operator::NEAR;
}
$oSearch->iSearchRank += 2;
} elseif (!$bFirstToken && !$bLastToken) {
$oSearch->iSearchRank += 2;
}
if ($this->sHouseNumber) {
$oSearch->iSearchRank++;
}
$oSearch->setPoiSearch(
$iOp,
$oSearchTerm->sClass,
$oSearchTerm->sType
);
$aNewSearches[] = $oSearch;
}
} elseif ($sPhraseType != 'country'
&& is_a($oSearchTerm, '\Nominatim\Token\Word')
) {
$iWordID = $oSearchTerm->iId;
// Full words can only be a name if they appear at the beginning
// of the phrase. In structured search the name must forcably in
// the first phrase. In unstructured search it may be in a later
// phrase when the first phrase is a house number.
if (!empty($this->aName) || !($bFirstPhrase || $sPhraseType == '')) {
if (($sPhraseType == '' || !$bFirstPhrase) && !$bHasPartial) {
$oSearch = clone $this;
$oSearch->iNamePhrase = -1;
$oSearch->iSearchRank += 3 * $oSearchTerm->iTermCount;
$oSearch->aAddress[$iWordID] = $iWordID;
$aNewSearches[] = $oSearch;
}
} elseif (empty($this->aNameNonSearch)) {
$oSearch = clone $this;
$oSearch->iSearchRank++;
$oSearch->aName = array($iWordID => $iWordID);
if (CONST_Search_NameOnlySearchFrequencyThreshold) {
$oSearch->bRareName =
$oSearchTerm->iSearchNameCount
< CONST_Search_NameOnlySearchFrequencyThreshold;
}
$aNewSearches[] = $oSearch;
}
}
return $aNewSearches;
return $oSearch;
}
/**
* Derive new searches by adding a partial term to the existing search.
* Check if the search currently includes a name.
*
* @param string $sToken Term for the token.
* @param object $oSearchTerm Description of the token.
* @param bool $bStructuredPhrases True if the search is structured.
* @param integer $iPhrase Number of the phrase the token is in.
* @param array[] $aFullTokens List of full term tokens with the
* same name.
* @param bool bIncludeNonNames If true stop-word tokens are taken into
* account, too.
*
* @return SearchDescription[] List of derived search descriptions.
* @return bool True, if search has a name.
*/
public function extendWithPartialTerm($sToken, $oSearchTerm, $bStructuredPhrases, $iPhrase, $aFullTokens)
public function hasName($bIncludeNonNames = false)
{
// Only allow name terms.
if (!(is_a($oSearchTerm, '\Nominatim\Token\Word'))) {
return array();
return !empty($this->aName)
|| (!empty($this->aNameNonSearch) && $bIncludeNonNames);
}
/**
* Check if the search currently includes an address term.
*
* @return bool True, if any address term is included, including stop-word
* terms.
*/
public function hasAddress()
{
return !empty($this->aAddress) || !empty($this->aAddressNonSearch);
}
/**
* Check if a country restriction is currently included in the search.
*
* @return bool True, if a country restriction is set.
*/
public function hasCountry()
{
return $this->sCountryCode !== '';
}
/**
* Check if a postcode is currently included in the search.
*
* @return bool True, if a postcode is set.
*/
public function hasPostcode()
{
return $this->sPostcode !== '';
}
/**
* Check if a house number is set for the search.
*
* @return bool True, if a house number is set.
*/
public function hasHousenumber()
{
return $this->sHouseNumber !== '';
}
/**
* Check if a special type of place is requested.
*
* param integer iOperator When set, check for the particular
* operator used for the special type.
*
* @return bool True, if speial type is requested or, if requested,
* a special type with the given operator.
*/
public function hasOperator($iOperator = null)
{
return $iOperator === null ? $this->iOperator != Operator::NONE : $this->iOperator == $iOperator;
}
/**
* Add the given token to the list of terms to search for in the address.
*
* @param integer iID ID of term to add.
* @param bool bSearchable Term should be used to search for result
* (i.e. term is not a stop word).
*/
public function addAddressToken($iId, $bSearchable = true)
{
if ($bSearchable) {
$this->aAddress[$iId] = $iId;
} else {
$this->aAddressNonSearch[$iId] = $iId;
}
}
$aNewSearches = array();
$iWordID = $oSearchTerm->iId;
/**
* Add the given full-word token to the list of terms to search for in the
* name.
*
* @param interger iId ID of term to add.
* @param bool bRareName True if the term is infrequent enough to not
* require other constraints for efficient search.
*/
public function addNameToken($iId, $bRareName)
{
$this->aName[$iId] = $iId;
$this->bRareName = $bRareName;
$this->bNameNeedsAddress = false;
}
if ((!$bStructuredPhrases || $iPhrase > 0)
&& (!empty($this->aName))
) {
$oSearch = clone $this;
$oSearch->iSearchRank++;
if (preg_match('#^[0-9 ]+$#', $sToken)) {
$oSearch->iSearchRank++;
}
if ($oSearchTerm->iSearchNameCount < CONST_Max_Word_Frequency) {
$oSearch->aAddress[$iWordID] = $iWordID;
} else {
$oSearch->aAddressNonSearch[$iWordID] = $iWordID;
if (!empty($aFullTokens)) {
$oSearch->iSearchRank++;
}
}
$aNewSearches[] = $oSearch;
/**
* Add the given partial token to the list of terms to search for in
* the name.
*
* @param integer iID ID of term to add.
* @param bool bSearchable Term should be used to search for result
* (i.e. term is not a stop word).
* @param bool bNeedsAddress True if the term is too unspecific to be used
* in a stand-alone search without an address
* to narrow down the search.
* @param integer iPhraseNumber Index of phrase, where the partial term
* appears.
*/
public function addPartialNameToken($iId, $bSearchable, $bNeedsAddress, $iPhraseNumber)
{
if (empty($this->aName)) {
$this->bNameNeedsAddress = $bNeedsAddress;
} else {
$this->bNameNeedsAddress |= $bNeedsAddress;
}
if ((!$this->sPostcode && !$this->aAddress && !$this->aAddressNonSearch)
&& ((empty($this->aName) && empty($this->aNameNonSearch)) || $this->iNamePhrase == $iPhrase)
&& strpos($sToken, ' ') === false
) {
$oSearch = clone $this;
$oSearch->iSearchRank++;
if (empty($this->aName) && empty($this->aNameNonSearch)) {
$oSearch->iSearchRank++;
}
if (preg_match('#^[0-9 ]+$#', $sToken)) {
$oSearch->iSearchRank++;
}
if ($oSearchTerm->iSearchNameCount < CONST_Max_Word_Frequency) {
if (empty($this->aName)
&& CONST_Search_NameOnlySearchFrequencyThreshold
) {
$oSearch->bRareName =
$oSearchTerm->iSearchNameCount
< CONST_Search_NameOnlySearchFrequencyThreshold;
} else {
$oSearch->bRareName = false;
}
$oSearch->aName[$iWordID] = $iWordID;
} else {
if (!empty($aFullTokens)) {
$oSearch->iSearchRank++;
}
$oSearch->aNameNonSearch[$iWordID] = $iWordID;
}
$oSearch->iNamePhrase = $iPhrase;
$aNewSearches[] = $oSearch;
if ($bSearchable) {
$this->aName[$iId] = $iId;
} else {
$this->aNameNonSearch[$iId] = $iId;
}
$this->iNamePhrase = $iPhraseNumber;
}
return $aNewSearches;
/**
* Set country restriction for the search.
*
* @param string sCountryCode Country code of country to restrict search to.
*/
public function setCountry($sCountryCode)
{
$this->sCountryCode = $sCountryCode;
$this->iNamePhrase = -1;
}
/**
* Set postcode search constraint.
*
* @param string sPostcode Postcode the result should have.
*/
public function setPostcode($sPostcode)
{
$this->sPostcode = $sPostcode;
$this->iNamePhrase = -1;
}
/**
* Make this search a search for a postcode object.
*
* @param integer iId Token Id for the postcode.
* @param string sPostcode Postcode to look for.
*/
public function setPostcodeAsName($iId, $sPostcode)
{
$this->iOperator = Operator::POSTCODE;
$this->aAddress = array_merge($this->aAddress, $this->aName);
$this->aName = array($iId => $sPostcode);
$this->bRareName = true;
$this->iNamePhrase = -1;
}
/**
* Set house number search cnstraint.
*
* @param string sNumber House number the result should have.
*/
public function setHousenumber($sNumber)
{
$this->sHouseNumber = $sNumber;
$this->iNamePhrase = -1;
}
/**
* Make this search a search for a house number.
*
* @param integer iId Token Id for the house number.
*/
public function setHousenumberAsName($iId)
{
$this->aAddress = array_merge($this->aAddress, $this->aName);
$this->bRareName = false;
$this->bNameNeedsAddress = true;
$this->aName = array($iId => $iId);
$this->iNamePhrase = -1;
}
/**
* Make this search a POI search.
*
* In a POI search, objects are not (only) searched by their name
* but also by the primary OSM key/value pair (class and type in Nominatim).
*
* @param integer $iOperator Type of POI search
* @param string $sClass Class (or OSM tag key) of POI.
* @param string $sType Type (or OSM tag value) of POI.
*
* @return void
*/
public function setPoiSearch($iOperator, $sClass, $sType)
{
$this->iOperator = $iOperator;
$this->sClass = $sClass;
$this->sType = $sType;
$this->iNamePhrase = -1;
}
public function getNamePhrase()
{
return $this->iNamePhrase;
}
/**
* Get the global search context.
*
* @return object Objects of global search constraints.
*/
public function getContext()
{
return $this->oContext;
}
/////////// Query functions
@@ -413,7 +382,6 @@ class SearchDescription
public function query(&$oDB, $iMinRank, $iMaxRank, $iLimit)
{
$aResults = array();
$iHousenumber = -1;
if ($this->sCountryCode
&& empty($this->aName)
@@ -446,23 +414,24 @@ class SearchDescription
// Now search for housenumber, if housenumber provided. Can be zero.
if (($this->sHouseNumber || $this->sHouseNumber === '0') && !empty($aResults)) {
// Downgrade the rank of the street results, they are missing
// the housenumber.
foreach ($aResults as $oRes) {
if ($oRes->iAddressRank >= 26) {
$oRes->iResultRank++;
} else {
$oRes->iResultRank += 2;
}
}
$aHnResults = $this->queryHouseNumber($oDB, $aResults);
if (!empty($aHnResults)) {
foreach ($aHnResults as $oRes) {
$aResults[$oRes->iId] = $oRes;
// Downgrade the rank of the street results, they are missing
// the housenumber. Also drop POI places (rank 30) here, they
// cannot be a parent place and therefore must not be shown
// as a result for a search with a missing housenumber.
foreach ($aResults as $oRes) {
if ($oRes->iAddressRank < 28) {
if ($oRes->iAddressRank >= 26) {
$oRes->iResultRank++;
} else {
$oRes->iResultRank += 2;
}
$aHnResults[$oRes->iId] = $oRes;
}
}
$aResults = $aHnResults;
}
// finally get POIs if requested
@@ -612,32 +581,37 @@ class SearchDescription
// Sort by existence of the requested house number but only if not
// too many results are expected for the street, i.e. if the result
// will be narrowed down by an address. Remeber that with ordering
// will be narrowed down by an address. Remember that with ordering
// every single result has to be checked.
if ($this->sHouseNumber && ($this->bRareName || !empty($this->aAddress) || $this->sPostcode)) {
$sHouseNumberRegex = '\\\\m'.$this->sHouseNumber.'\\\\M';
$aOrder[] = ' (';
$aOrder[0] .= 'EXISTS(';
$aOrder[0] .= ' SELECT place_id';
$aOrder[0] .= ' FROM placex';
$aOrder[0] .= ' WHERE parent_place_id = search_name.place_id';
$aOrder[0] .= " AND housenumber ~* E'".$sHouseNumberRegex."'";
$aOrder[0] .= ' LIMIT 1';
$aOrder[0] .= ') ';
// also housenumbers from interpolation lines table are needed
if (preg_match('/[0-9]+/', $this->sHouseNumber)) {
$iHouseNumber = intval($this->sHouseNumber);
$aOrder[0] .= 'OR EXISTS(';
$aOrder[0] .= ' SELECT place_id ';
$aOrder[0] .= ' FROM location_property_osmline ';
$aOrder[0] .= ' WHERE parent_place_id = search_name.place_id';
$aOrder[0] .= ' AND startnumber is not NULL';
$aOrder[0] .= ' AND '.$iHouseNumber.'>=startnumber ';
$aOrder[0] .= ' AND '.$iHouseNumber.'<=endnumber ';
$aOrder[0] .= ' LIMIT 1';
$aOrder[0] .= ')';
// Housenumbers on streets and places.
$sChildHnr = 'SELECT * FROM placex WHERE parent_place_id = search_name.place_id';
$sChildHnr .= " AND housenumber ~* E'".$sHouseNumberRegex."'";
// Interpolations on streets and places.
if (preg_match('/^[0-9]+$/', $this->sHouseNumber)) {
$sIpolHnr = 'SELECT * FROM location_property_osmline ';
$sIpolHnr .= 'WHERE parent_place_id = search_name.place_id ';
$sIpolHnr .= ' AND startnumber is not NULL';
$sIpolHnr .= ' AND '.$this->sHouseNumber.'>=startnumber ';
$sIpolHnr .= ' AND '.$this->sHouseNumber.'<=endnumber ';
} else {
$sIpolHnr = false;
}
$aOrder[0] .= ') DESC';
// Housenumbers on the object iteself for unlisted places.
$sSelfHnr = 'SELECT * FROM placex WHERE place_id = search_name.place_id';
$sSelfHnr .= " AND housenumber ~* E'".$sHouseNumberRegex."'";
$sSql = '(CASE WHEN address_rank = 30 THEN EXISTS('.$sSelfHnr.') ';
$sSql .= ' ELSE EXISTS('.$sChildHnr.') ';
if ($sIpolHnr) {
$sSql .= 'OR EXISTS('.$sIpolHnr.') ';
}
$sSql .= 'END) DESC';
$aOrder[] = $sSql;
}
if (!empty($this->aName)) {
@@ -670,7 +644,7 @@ class SearchDescription
$aOrder[] = $this->oContext->distanceSQL('centroid');
} elseif ($this->sPostcode) {
if (empty($this->aAddress)) {
$aTerms[] = "EXISTS(SELECT place_id FROM location_postcode p WHERE p.postcode = '".$this->sPostcode."' AND ST_DWithin(search_name.centroid, p.geometry, 0.1))";
$aTerms[] = "EXISTS(SELECT place_id FROM location_postcode p WHERE p.postcode = '".$this->sPostcode."' AND ST_DWithin(search_name.centroid, p.geometry, 0.12))";
} else {
$aOrder[] = "(SELECT min(ST_Distance(search_name.centroid, p.geometry)) FROM location_postcode p WHERE p.postcode = '".$this->sPostcode."')";
}
@@ -742,16 +716,33 @@ class SearchDescription
private function queryHouseNumber(&$oDB, $aRoadPlaceIDs)
{
$aResults = array();
$sPlaceIDs = Result::joinIdsByTable($aRoadPlaceIDs, Result::TABLE_PLACEX);
$sRoadPlaceIDs = Result::joinIdsByTableMaxRank(
$aRoadPlaceIDs,
Result::TABLE_PLACEX,
27
);
$sPOIPlaceIDs = Result::joinIdsByTableMinRank(
$aRoadPlaceIDs,
Result::TABLE_PLACEX,
30
);
if (!$sPlaceIDs) {
$aIDCondition = array();
if ($sRoadPlaceIDs) {
$aIDCondition[] = 'parent_place_id in ('.$sRoadPlaceIDs.')';
}
if ($sPOIPlaceIDs) {
$aIDCondition[] = 'place_id in ('.$sPOIPlaceIDs.')';
}
if (empty($aIDCondition)) {
return $aResults;
}
$sHouseNumberRegex = '\\\\m'.$this->sHouseNumber.'\\\\M';
$sSQL = 'SELECT place_id FROM placex ';
$sSQL .= 'WHERE parent_place_id in ('.$sPlaceIDs.')';
$sSQL .= " AND housenumber ~* E'".$sHouseNumberRegex."'";
$sSQL = 'SELECT place_id FROM placex WHERE';
$sSQL .= " housenumber ~* E'".$sHouseNumberRegex."'";
$sSQL .= ' AND ('.join(' OR ', $aIDCondition).')';
$sSQL .= $this->oContext->excludeSQL(' AND place_id');
Debug::printSQL($sSQL);
@@ -763,11 +754,11 @@ class SearchDescription
$bIsIntHouseNumber= (bool) preg_match('/[0-9]+/', $this->sHouseNumber);
$iHousenumber = intval($this->sHouseNumber);
if ($bIsIntHouseNumber && empty($aResults)) {
if ($bIsIntHouseNumber && $sRoadPlaceIDs && empty($aResults)) {
// if nothing found, search in the interpolation line table
$sSQL = 'SELECT distinct place_id FROM location_property_osmline';
$sSQL .= ' WHERE startnumber is not NULL';
$sSQL .= ' AND parent_place_id in ('.$sPlaceIDs.') AND (';
$sSQL .= ' AND parent_place_id in ('.$sRoadPlaceIDs.') AND (';
if ($iHousenumber % 2 == 0) {
// If housenumber is even, look for housenumber in streets
// with interpolationtype even or all.
@@ -790,24 +781,10 @@ class SearchDescription
}
}
// If nothing found try the aux fallback table
if (CONST_Use_Aux_Location_data && empty($aResults)) {
$sSQL = 'SELECT place_id FROM location_property_aux';
$sSQL .= ' WHERE parent_place_id in ('.$sPlaceIDs.')';
$sSQL .= " AND housenumber = '".$this->sHouseNumber."'";
$sSQL .= $this->oContext->excludeSQL(' AND place_id');
Debug::printSQL($sSQL);
foreach ($oDB->getCol($sSQL) as $iPlaceId) {
$aResults[$iPlaceId] = new Result($iPlaceId, Result::TABLE_AUX);
}
}
// If nothing found then search in Tiger data (location_property_tiger)
if (CONST_Use_US_Tiger_Data && $bIsIntHouseNumber && empty($aResults)) {
if (CONST_Use_US_Tiger_Data && $sRoadPlaceIDs && $bIsIntHouseNumber && empty($aResults)) {
$sSQL = 'SELECT place_id FROM location_property_tiger';
$sSQL .= ' WHERE parent_place_id in ('.$sPlaceIDs.') and (';
$sSQL .= ' WHERE parent_place_id in ('.$sRoadPlaceIDs.') and (';
if ($iHousenumber % 2 == 0) {
$sSQL .= "interpolationtype='even'";
} else {

View File

@@ -0,0 +1,87 @@
<?php
namespace Nominatim;
/**
* Description of the position of a token within a query.
*/
class SearchPosition
{
private $sPhraseType;
private $iPhrase;
private $iNumPhrases;
private $iToken;
private $iNumTokens;
public function __construct($sPhraseType, $iPhrase, $iNumPhrases)
{
$this->sPhraseType = $sPhraseType;
$this->iPhrase = $iPhrase;
$this->iNumPhrases = $iNumPhrases;
}
public function setTokenPosition($iToken, $iNumTokens)
{
$this->iToken = $iToken;
$this->iNumTokens = $iNumTokens;
}
/**
* Check if the phrase can be of the given type.
*
* @param string $sType Type of phrse requested.
*
* @return True if the phrase is untyped or of the given type.
*/
public function maybePhrase($sType)
{
return $this->sPhraseType == '' || $this->sPhraseType == $sType;
}
/**
* Check if the phrase is exactly of the given type.
*
* @param string $sType Type of phrse requested.
*
* @return True if the phrase of the given type.
*/
public function isPhrase($sType)
{
return $this->sPhraseType == $sType;
}
/**
* Return true if the token is the very first in the query.
*/
public function isFirstToken()
{
return $this->iPhrase == 0 && $this->iToken == 0;
}
/**
* Check if the token is the final one in the query.
*/
public function isLastToken()
{
return $this->iToken + 1 == $this->iNumTokens && $this->iPhrase + 1 == $this->iNumPhrases;
}
/**
* Check if the current token is part of the first phrase in the query.
*/
public function isFirstPhrase()
{
return $this->iPhrase == 0;
}
/**
* Get the phrase position in the query.
*/
public function getPhrase()
{
return $this->iPhrase;
}
}

View File

@@ -33,7 +33,9 @@ class Shell
public function addEnvPair($sKey, $sVal)
{
if (isset($sKey) && $sKey && isset($sVal)) {
if (!isset($this->aEnv)) $this->aEnv = $_ENV;
if (!isset($this->aEnv)) {
$this->aEnv = $_ENV;
}
$this->aEnv = array_merge($this->aEnv, array($sKey => $sVal), $_ENV);
}
return $this;
@@ -75,11 +77,8 @@ class Shell
return $iStat;
}
private function escapeParam($sParam)
{
if (preg_match('/^-*\w+$/', $sParam)) return $sParam;
return escapeshellarg($sParam);
return (preg_match('/^-*\w+$/', $sParam)) ? $sParam : escapeshellarg($sParam);
}
}

131
lib-php/SimpleWordList.php Normal file
View File

@@ -0,0 +1,131 @@
<?php
namespace Nominatim;
/**
* A word list creator based on simple splitting by space.
*
* Creates possible permutations of split phrases by finding all combination
* of splitting the phrase on space boundaries.
*/
class SimpleWordList
{
const MAX_WORDSET_LEN = 20;
const MAX_WORDSETS = 100;
// The phrase as a list of simple terms (without spaces).
private $aWords;
/**
* Create a new word list
*
* @param string sPhrase Phrase to create the word list from. The phrase is
* expected to be normalised, so that there are no
* subsequent spaces.
*/
public function __construct($sPhrase)
{
if (strlen($sPhrase) > 0) {
$this->aWords = explode(' ', $sPhrase);
} else {
$this->aWords = array();
}
}
/**
* Get all possible tokens that are present in this word list.
*
* @return array The list of string tokens in the word list.
*/
public function getTokens()
{
$aTokens = array();
$iNumWords = count($this->aWords);
for ($i = 0; $i < $iNumWords; $i++) {
$sPhrase = $this->aWords[$i];
$aTokens[$sPhrase] = $sPhrase;
for ($j = $i + 1; $j < $iNumWords; $j++) {
$sPhrase .= ' '.$this->aWords[$j];
$aTokens[$sPhrase] = $sPhrase;
}
}
return $aTokens;
}
/**
* Compute all possible permutations of phrase splits that result in
* words which are in the token list.
*/
public function getWordSets($oTokens)
{
$iNumWords = count($this->aWords);
if ($iNumWords == 0) {
return null;
}
// Caches the word set for the partial phrase up to word i.
$aSetCache = array_fill(0, $iNumWords, array());
// Initialise first element of cache. There can only be the word.
if ($oTokens->containsAny($this->aWords[0])) {
$aSetCache[0][] = array($this->aWords[0]);
}
// Now do the next elements using what we already have.
for ($i = 1; $i < $iNumWords; $i++) {
for ($j = $i; $j > 0; $j--) {
$sPartial = $j == $i ? $this->aWords[$j] : $this->aWords[$j].' '.$sPartial;
if (!empty($aSetCache[$j - 1]) && $oTokens->containsAny($sPartial)) {
$aPartial = array($sPartial);
foreach ($aSetCache[$j - 1] as $aSet) {
if (count($aSet) < SimpleWordList::MAX_WORDSET_LEN) {
$aSetCache[$i][] = array_merge($aSet, $aPartial);
}
}
if (count($aSetCache[$i]) > 2 * SimpleWordList::MAX_WORDSETS) {
usort(
$aSetCache[$i],
array('\Nominatim\SimpleWordList', 'cmpByArraylen')
);
$aSetCache[$i] = array_slice(
$aSetCache[$i],
0,
SimpleWordList::MAX_WORDSETS
);
}
}
}
// finally the current full phrase
$sPartial = $this->aWords[0].' '.$sPartial;
if ($oTokens->containsAny($sPartial)) {
$aSetCache[$i][] = array($sPartial);
}
}
$aWordSets = $aSetCache[$iNumWords - 1];
usort($aWordSets, array('\Nominatim\SimpleWordList', 'cmpByArraylen'));
return array_slice($aWordSets, 0, SimpleWordList::MAX_WORDSETS);
}
public static function cmpByArraylen($aA, $aB)
{
$iALen = count($aA);
$iBLen = count($aB);
if ($iALen == $iBLen) {
return 0;
}
return ($iALen < $iBLen) ? -1 : 1;
}
public function debugInfo()
{
return $this->aWords;
}
}

View File

@@ -2,6 +2,8 @@
namespace Nominatim;
require_once(CONST_TokenizerDir.'/tokenizer.php');
use Exception;
class Status
@@ -25,24 +27,8 @@ class Status
throw new Exception('Database connection failed', 700);
}
$sStandardWord = $this->oDB->getOne("SELECT make_standard_name('a')");
if ($sStandardWord === false) {
throw new Exception('Module failed', 701);
}
if ($sStandardWord != 'a') {
throw new Exception('Module call failed', 702);
}
$sSQL = 'SELECT word_id, word_token, word, class, type, country_code, ';
$sSQL .= "operator, search_name_count FROM word WHERE word_token IN (' a')";
$iWordID = $this->oDB->getOne($sSQL);
if ($iWordID === false) {
throw new Exception('Query failed', 703);
}
if (!$iWordID) {
throw new Exception('No value', 704);
}
$oTokenizer = new \Nominatim\Tokenizer($this->oDB);
$oTokenizer->checkStatus();
}
public function dataDate()
@@ -51,7 +37,7 @@ class Status
$iDataDateEpoch = $this->oDB->getOne($sSQL);
if ($iDataDateEpoch === false) {
throw Exception('Data date query failed '.$iDataDateEpoch->getMessage(), 705);
throw new Exception('Import date is not available', 705);
}
return $iDataDateEpoch;

View File

@@ -8,9 +8,9 @@ namespace Nominatim\Token;
class Country
{
/// Database word id, if available.
public $iId;
private $iId;
/// Two-letter country code (lower-cased).
public $sCountryCode;
private $sCountryCode;
public function __construct($iId, $sCountryCode)
{
@@ -18,6 +18,46 @@ class Country
$this->sCountryCode = $sCountryCode;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oSearch->hasCountry()
&& $oPosition->maybePhrase('country')
&& $oSearch->getContext()->isCountryApplicable($this->sCountryCode);
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
$oNewSearch = $oSearch->clone($oPosition->isLastToken() ? 1 : 6);
$oNewSearch->setCountry($this->sCountryCode);
return array($oNewSearch);
}
public function debugInfo()
{
return array(
@@ -26,4 +66,9 @@ class Country
'Info' => $this->sCountryCode
);
}
public function debugCode()
{
return 'C';
}
}

View File

@@ -8,9 +8,9 @@ namespace Nominatim\Token;
class HouseNumber
{
/// Database word id, if available.
public $iId;
private $iId;
/// Normalized house number.
public $sToken;
private $sToken;
public function __construct($iId, $sToken)
{
@@ -18,6 +18,80 @@ class HouseNumber
$this->sToken = $sToken;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oSearch->hasHousenumber()
&& !$oSearch->hasOperator(\Nominatim\Operator::POSTCODE)
&& $oPosition->maybePhrase('street');
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
$aNewSearches = array();
// sanity check: if the housenumber is not mainly made
// up of numbers, add a penalty
$iSearchCost = 1;
if (preg_match('/\\d/', $this->sToken) === 0
|| preg_match_all('/[^0-9]/', $this->sToken, $aMatches) > 2) {
$iSearchCost += strlen($this->sToken) - 1;
}
if (!$oSearch->hasOperator(\Nominatim\Operator::NONE)) {
$iSearchCost++;
}
if (empty($this->iId)) {
$iSearchCost++;
}
// also must not appear in the middle of the address
if ($oSearch->hasAddress() || $oSearch->hasPostcode()) {
$iSearchCost++;
}
$oNewSearch = $oSearch->clone($iSearchCost);
$oNewSearch->setHousenumber($this->sToken);
$aNewSearches[] = $oNewSearch;
// Housenumbers may appear in the name when the place has its own
// address terms.
if ($this->iId !== null
&& ($oSearch->getNamePhrase() >= 0 || !$oSearch->hasName())
&& !$oSearch->hasAddress()
) {
$oNewSearch = $oSearch->clone($iSearchCost);
$oNewSearch->setHousenumberAsName($this->iId);
$aNewSearches[] = $oNewSearch;
}
return $aNewSearches;
}
public function debugInfo()
{
return array(
@@ -26,4 +100,9 @@ class HouseNumber
'Info' => array('nr' => $this->sToken)
);
}
public function debugCode()
{
return 'H';
}
}

View File

@@ -7,6 +7,7 @@ require_once(CONST_LibDir.'/TokenHousenumber.php');
require_once(CONST_LibDir.'/TokenPostcode.php');
require_once(CONST_LibDir.'/TokenSpecialTerm.php');
require_once(CONST_LibDir.'/TokenWord.php');
require_once(CONST_LibDir.'/TokenPartial.php');
require_once(CONST_LibDir.'/SpecialSearchOperator.php');
/**
@@ -17,15 +18,6 @@ require_once(CONST_LibDir.'/SpecialSearchOperator.php');
* tokens do not have a common base class. All tokens need to have a field
* with the word id that points to an entry in the `word` database table
* but otherwise the information saved about a token can be very different.
*
* There are two different kinds of token words: full words and partial terms.
*
* Full words start with a space. They represent a complete name of a place.
* All special tokens are normally full words.
*
* Partial terms have no space at the beginning. They may represent a part of
* a name of a place (e.g. in the name 'World Trade Center' a partial term
* would be 'Trade' or 'Trade Center'). They are only used in TokenWord.
*/
class TokenList
{
@@ -64,7 +56,7 @@ class TokenList
*/
public function containsAny($sWord)
{
return isset($this->aTokens[$sWord]) || isset($this->aTokens[' '.$sWord]);
return isset($this->aTokens[$sWord]);
}
/**
@@ -86,8 +78,8 @@ class TokenList
foreach ($this->aTokens as $aTokenList) {
foreach ($aTokenList as $oToken) {
if (is_a($oToken, '\Nominatim\Token\Word') && !$oToken->bPartial) {
$ids[$oToken->iId] = $oToken->iId;
if (is_a($oToken, '\Nominatim\Token\Word')) {
$ids[$oToken->getId()] = $oToken->getId();
}
}
}
@@ -95,88 +87,6 @@ class TokenList
return $ids;
}
/**
* Add token information from the word table in the database.
*
* @param object $oDB Nominatim::DB instance.
* @param string[] $aTokens List of tokens to look up in the database.
* @param string[] $aCountryCodes List of country restrictions.
* @param string $sNormQuery Normalized query string.
* @param object $oNormalizer Normalizer function to use on tokens.
*
* @return void
*/
public function addTokensFromDB(&$oDB, &$aTokens, &$aCountryCodes, $sNormQuery, $oNormalizer)
{
// Check which tokens we have, get the ID numbers
$sSQL = 'SELECT word_id, word_token, word, class, type, country_code,';
$sSQL .= ' operator, coalesce(search_name_count, 0) as count';
$sSQL .= ' FROM word WHERE word_token in (';
$sSQL .= join(',', $oDB->getDBQuotedList($aTokens)).')';
Debug::printSQL($sSQL);
$aDBWords = $oDB->getAll($sSQL, null, 'Could not get word tokens.');
foreach ($aDBWords as $aWord) {
$oToken = null;
$iId = (int) $aWord['word_id'];
if ($aWord['class']) {
// Special terms need to appear in their normalized form.
if ($aWord['word']) {
$sNormWord = $aWord['word'];
if ($oNormalizer != null) {
$sNormWord = $oNormalizer->transliterate($aWord['word']);
}
if (strpos($sNormQuery, $sNormWord) === false) {
continue;
}
}
if ($aWord['class'] == 'place' && $aWord['type'] == 'house') {
$oToken = new Token\HouseNumber($iId, trim($aWord['word_token']));
} elseif ($aWord['class'] == 'place' && $aWord['type'] == 'postcode') {
if ($aWord['word']
&& pg_escape_string($aWord['word']) == $aWord['word']
) {
$oToken = new Token\Postcode(
$iId,
$aWord['word'],
$aWord['country_code']
);
}
} else {
// near and in operator the same at the moment
$oToken = new Token\SpecialTerm(
$iId,
$aWord['class'],
$aWord['type'],
$aWord['operator'] ? Operator::NEAR : Operator::NONE
);
}
} elseif ($aWord['country_code']) {
// Filter country tokens that do not match restricted countries.
if (!$aCountryCodes
|| in_array($aWord['country_code'], $aCountryCodes)
) {
$oToken = new Token\Country($iId, $aWord['country_code']);
}
} else {
$oToken = new Token\Word(
$iId,
$aWord['word_token'][0] != ' ',
(int) $aWord['count'],
substr_count($aWord['word_token'], ' ')
);
}
if ($oToken) {
$this->addToken($aWord['word_token'], $oToken);
}
}
}
/**
* Add a new token for the given word.
*
@@ -199,9 +109,9 @@ class TokenList
$aWordsIDs = array();
foreach ($this->aTokens as $sToken => $aWords) {
foreach ($aWords as $aToken) {
if ($aToken->iId !== null) {
$aWordsIDs[$aToken->iId] =
'#'.$sToken.'('.$aToken->iId.')#';
$iId = $aToken->getId();
if ($iId !== null) {
$aWordsIDs[$iId] = '#'.$sToken.'('.$aToken->debugCode().' '.$iId.')#';
}
}
}

119
lib-php/TokenPartial.php Normal file
View File

@@ -0,0 +1,119 @@
<?php
namespace Nominatim\Token;
/**
* A standard word token.
*/
class Partial
{
/// Database word id, if applicable.
private $iId;
/// Number of appearances in the database.
private $iSearchNameCount;
/// True, if the token consists exclusively of digits and spaces.
private $bNumberToken;
public function __construct($iId, $sToken, $iSearchNameCount)
{
$this->iId = $iId;
$this->bNumberToken = (bool) preg_match('#^[0-9 ]+$#', $sToken);
$this->iSearchNameCount = $iSearchNameCount;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oPosition->isPhrase('country');
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
$aNewSearches = array();
// Partial token in Address.
if (($oPosition->isPhrase('') || !$oPosition->isFirstPhrase())
&& $oSearch->hasName()
) {
$iSearchCost = $this->bNumberToken ? 2 : 1;
if ($this->iSearchNameCount >= CONST_Max_Word_Frequency) {
$iSearchCost += 1;
}
$oNewSearch = $oSearch->clone($iSearchCost);
$oNewSearch->addAddressToken(
$this->iId,
$this->iSearchNameCount < CONST_Max_Word_Frequency
);
$aNewSearches[] = $oNewSearch;
}
// Partial token in Name.
if ((!$oSearch->hasPostcode() && !$oSearch->hasAddress())
&& (!$oSearch->hasName(true)
|| $oSearch->getNamePhrase() == $oPosition->getPhrase())
) {
$iSearchCost = 1;
if (!$oSearch->hasName(true)) {
$iSearchCost += 1;
}
if ($this->bNumberToken) {
$iSearchCost += 1;
}
$oNewSearch = $oSearch->clone($iSearchCost);
$oNewSearch->addPartialNameToken(
$this->iId,
$this->iSearchNameCount < CONST_Max_Word_Frequency,
$this->iSearchNameCount > CONST_Search_NameOnlySearchFrequencyThreshold,
$oPosition->getPhrase()
);
$aNewSearches[] = $oNewSearch;
}
return $aNewSearches;
}
public function debugInfo()
{
return array(
'ID' => $this->iId,
'Type' => 'partial',
'Info' => array(
'count' => $this->iSearchNameCount
)
);
}
public function debugCode()
{
return 'w';
}
}

View File

@@ -8,11 +8,11 @@ namespace Nominatim\Token;
class Postcode
{
/// Database word id, if available.
public $iId;
private $iId;
/// Full nomralized postcode (upper cased).
public $sPostcode;
private $sPostcode;
// Optional country code the postcode belongs to (currently unused).
public $sCountryCode;
private $sCountryCode;
public function __construct($iId, $sPostcode, $sCountryCode = '')
{
@@ -21,6 +21,67 @@ class Postcode
$this->sCountryCode = empty($sCountryCode) ? '' : $sCountryCode;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oSearch->hasPostcode() && $oPosition->maybePhrase('postalcode');
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
$aNewSearches = array();
// If we have structured search or this is the first term,
// make the postcode the primary search element.
if ($oSearch->hasOperator(\Nominatim\Operator::NONE) && $oPosition->isFirstToken()) {
$oNewSearch = $oSearch->clone(1);
$oNewSearch->setPostcodeAsName($this->iId, $this->sPostcode);
$aNewSearches[] = $oNewSearch;
}
// If we have a structured search or this is not the first term,
// add the postcode as an addendum.
if (!$oSearch->hasOperator(\Nominatim\Operator::POSTCODE)
&& ($oPosition->isPhrase('postalcode') || $oSearch->hasName())
) {
$iPenalty = 1;
if (strlen($this->sPostcode) < 4) {
$iPenalty += 4 - strlen($this->sPostcode);
}
$oNewSearch = $oSearch->clone($iPenalty);
$oNewSearch->setPostcode($this->sPostcode);
$aNewSearches[] = $oNewSearch;
}
return $aNewSearches;
}
public function debugInfo()
{
return array(
@@ -29,4 +90,9 @@ class Postcode
'Info' => $this->sPostcode.'('.$this->sCountryCode.')'
);
}
public function debugCode()
{
return 'P';
}
}

View File

@@ -10,13 +10,13 @@ require_once(CONST_LibDir.'/SpecialSearchOperator.php');
class SpecialTerm
{
/// Database word id, if applicable.
public $iId;
private $iId;
/// Class (or OSM tag key) of the place to look for.
public $sClass;
private $sClass;
/// Type (or OSM tag value) of the place to look for.
public $sType;
private $sType;
/// Relationship of the operator to the object (see Operator class).
public $iOperator;
private $iOperator;
public function __construct($iID, $sClass, $sType, $iOperator)
{
@@ -26,6 +26,65 @@ class SpecialTerm
$this->iOperator = $iOperator;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oSearch->hasOperator()
&& $oPosition->isPhrase('')
&& ($this->iOperator != \Nominatim\Operator::NONE
|| (!$oSearch->hasAddress() && !$oSearch->hasHousenumber() && !$oSearch->hasCountry()));
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
$iSearchCost = 2;
$iOp = $this->iOperator;
if ($iOp == \Nominatim\Operator::NONE) {
if ($oSearch->hasName() || $oSearch->getContext()->isBoundedSearch()) {
$iOp = \Nominatim\Operator::NAME;
} else {
$iOp = \Nominatim\Operator::NEAR;
$iSearchCost += 2;
}
} elseif (!$oPosition->isFirstToken() && !$oPosition->isLastToken()) {
$iSearchCost += 2;
}
if ($oSearch->hasHousenumber()) {
$iSearchCost ++;
}
$oNewSearch = $oSearch->clone($iSearchCost);
$oNewSearch->setPoiSearch($iOp, $this->sClass, $this->sType);
return array($oNewSearch);
}
public function debugInfo()
{
return array(
@@ -38,4 +97,9 @@ class SpecialTerm
)
);
}
public function debugCode()
{
return 'S';
}
}

View File

@@ -8,31 +8,95 @@ namespace Nominatim\Token;
class Word
{
/// Database word id, if applicable.
public $iId;
/// If true, the word may represent only part of a place name.
public $bPartial;
private $iId;
/// Number of appearances in the database.
public $iSearchNameCount;
private $iSearchNameCount;
/// Number of terms in the word.
public $iTermCount;
private $iTermCount;
public function __construct($iId, $bPartial, $iSearchNameCount, $iTermCount)
public function __construct($iId, $iSearchNameCount, $iTermCount)
{
$this->iId = $iId;
$this->bPartial = $bPartial;
$this->iSearchNameCount = $iSearchNameCount;
$this->iTermCount = $iTermCount;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oPosition->isPhrase('country');
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
// Full words can only be a name if they appear at the beginning
// of the phrase. In structured search the name must forcably in
// the first phrase. In unstructured search it may be in a later
// phrase when the first phrase is a house number.
if ($oSearch->hasName()
|| !($oPosition->isFirstPhrase() || $oPosition->isPhrase(''))
) {
if ($this->iTermCount > 1
&& ($oPosition->isPhrase('') || !$oPosition->isFirstPhrase())
) {
$oNewSearch = $oSearch->clone(1);
$oNewSearch->addAddressToken($this->iId);
return array($oNewSearch);
}
} elseif (!$oSearch->hasName(true)) {
$oNewSearch = $oSearch->clone(1);
$oNewSearch->addNameToken(
$this->iId,
CONST_Search_NameOnlySearchFrequencyThreshold
&& $this->iSearchNameCount
< CONST_Search_NameOnlySearchFrequencyThreshold
);
return array($oNewSearch);
}
return array();
}
public function debugInfo()
{
return array(
'ID' => $this->iId,
'Type' => 'word',
'Info' => array(
'partial' => $this->bPartial,
'count' => $this->iSearchNameCount
'count' => $this->iSearchNameCount,
'terms' => $this->iTermCount
)
);
}
public function debugCode()
{
return 'W';
}
}

View File

@@ -1,10 +0,0 @@
<?php
@define('CONST_LibDir', dirname(dirname(__FILE__)));
require_once(CONST_LibDir.'/init-cmd.php');
loadSettings(getcwd());
(new \Nominatim\Shell(getSetting('NOMINATIM_TOOL')))
->addParams('admin', '--check-database')
->run();

View File

@@ -1,34 +0,0 @@
<?php
@define('CONST_LibDir', dirname(dirname(__FILE__)));
require_once(CONST_LibDir.'/init-cmd.php');
ini_set('memory_limit', '800M');
ini_set('display_errors', 'stderr');
$aCMDOptions
= array(
'Import country language data from osm wiki',
array('help', 'h', 0, 1, 0, 0, false, 'Show Help'),
array('quiet', 'q', 0, 1, 0, 0, 'bool', 'Quiet output'),
array('verbose', 'v', 0, 1, 0, 0, 'bool', 'Verbose output'),
array('project-dir', '', 0, 1, 1, 1, 'realpath', 'Base directory of the Nominatim installation (default: .)'),
);
getCmdOpt($_SERVER['argv'], $aCMDOptions, $aCMDResult, true, true);
loadSettings($aCMDResult['project-dir'] ?? getcwd());
setupHTTPProxy();
if (true) {
$sURL = 'https://wiki.openstreetmap.org/wiki/Special:Export/Nominatim/Country_Codes';
$sWikiPageXML = file_get_contents($sURL);
if (preg_match_all('#\\| ([a-z]{2}) \\|\\| [^|]+\\|\\| ([a-z,]+)#', $sWikiPageXML, $aMatches, PREG_SET_ORDER)) {
foreach ($aMatches as $aMatch) {
$aLanguages = explode(',', $aMatch[2]);
foreach ($aLanguages as $i => $s) {
$aLanguages[$i] = '"'.pg_escape_string($s).'"';
}
echo "UPDATE country_name set country_default_language_codes = '{".join(',', $aLanguages)."}' where country_code = '".pg_escape_string($aMatch[1])."';\n";
}
}
}

View File

@@ -49,7 +49,9 @@
$oDB->connect();
if (isset($aCMDResult['output-type'])) {
if (!isset($aRankmap[$aCMDResult['output-type']])) fail('unknown output-type: '.$aCMDResult['output-type']);
if (!isset($aRankmap[$aCMDResult['output-type']])) {
fail('unknown output-type: '.$aCMDResult['output-type']);
}
$iOutputRank = $aRankmap[$aCMDResult['output-type']];
} else {
$iOutputRank = $aRankmap['street'];
@@ -58,14 +60,18 @@
// Preferred language
$oParams = new Nominatim\ParameterParser();
if (!isset($aCMDResult['language'])) $aCMDResult['language'] = 'xx';
if (!isset($aCMDResult['language'])) {
$aCMDResult['language'] = 'xx';
}
$aLangPrefOrder = $oParams->getPreferredLanguages($aCMDResult['language']);
$sLanguagePrefArraySQL = $oDB->getArraySQL($oDB->getDBQuotedList($aLangPrefOrder));
// output formatting: build up a lookup table that maps address ranks to columns
$aColumnMapping = array();
$iNumCol = 0;
if (!isset($aCMDResult['output-format'])) $aCMDResult['output-format'] = 'street;suburb;city;county;state;country';
if (!isset($aCMDResult['output-format'])) {
$aCMDResult['output-format'] = 'street;suburb;city;county;state;country';
}
foreach (preg_split('/\s*;\s*/', $aCMDResult['output-format']) as $sColumn) {
$bHasData = false;
foreach (preg_split('/\s*,\s*/', $sColumn) as $sRank) {
@@ -80,7 +86,9 @@
}
}
}
if ($bHasData) $iNumCol++;
if ($bHasData) {
$iNumCol++;
}
}
// build the query for objects
@@ -122,7 +130,9 @@
if ($sOsmType) {
$sSQL = 'select place_id from placex where osm_type = :osm_type and osm_id = :osm_id';
$sParentId = $oDB->getOne($sSQL, array('osm_type' => $sOsmType, 'osm_id' => $sOsmId));
if (!$sParentId) fail('Could not find place '.$sOsmType.' '.$sOsmId);
if (!$sParentId) {
fail('Could not find place '.$sOsmType.' '.$sOsmId);
}
}
if ($sParentId) {
$sPlacexSQL .= ' and place_id in (select place_id from place_addressline where address_place_id = '.$sParentId.' and isaddress)';
@@ -136,7 +146,6 @@
$oResults = $oDB->getQueryStatement($sPlacexSQL);
$fOutstream = fopen('php://output', 'w');
while ($aRow = $oResults->fetch()) {
//var_dump($aRow);
$iPlaceID = $aRow['place_id'];
$sSQL = "select rank_address,get_name_by_language(name,$sLanguagePrefArraySQL) as localname from get_addressdata(:place_id, -1)";
$sSQL .= ' WHERE isaddress';

View File

@@ -1,93 +0,0 @@
<?php
@define('CONST_LibDir', dirname(dirname(__FILE__)));
require_once(CONST_LibDir.'/init-cmd.php');
require_once(CONST_LibDir.'/Geocode.php');
require_once(CONST_LibDir.'/ParameterParser.php');
ini_set('memory_limit', '800M');
$aCMDOptions
= array(
'Query database from command line. Returns search result as JSON.',
array('help', 'h', 0, 1, 0, 0, false, 'Show Help'),
array('quiet', 'q', 0, 1, 0, 0, 'bool', 'Quiet output'),
array('verbose', 'v', 0, 1, 0, 0, 'bool', 'Verbose output'),
array('search', '', 0, 1, 1, 1, 'string', 'Search for given term or coordinate'),
array('country', '', 0, 1, 1, 1, 'string', 'Structured search: country'),
array('state', '', 0, 1, 1, 1, 'string', 'Structured search: state'),
array('county', '', 0, 1, 1, 1, 'string', 'Structured search: county'),
array('city', '', 0, 1, 1, 1, 'string', 'Structured search: city'),
array('street', '', 0, 1, 1, 1, 'string', 'Structured search: street'),
array('amenity', '', 0, 1, 1, 1, 'string', 'Structured search: amenity'),
array('postalcode', '', 0, 1, 1, 1, 'string', 'Structured search: postal code'),
array('accept-language', '', 0, 1, 1, 1, 'string', 'Preferred language order for showing search results'),
array('bounded', '', 0, 1, 0, 0, 'bool', 'Restrict results to given viewbox'),
array('nodedupe', '', 0, 1, 0, 0, 'bool', 'Do not remove duplicate results'),
array('limit', '', 0, 1, 1, 1, 'int', 'Maximum number of results returned (default: 10)'),
array('exclude_place_ids', '', 0, 1, 1, 1, 'string', 'Comma-separated list of place ids to exclude from results'),
array('featureType', '', 0, 1, 1, 1, 'string', 'Restrict results to certain features (country, state,city,settlement)'),
array('countrycodes', '', 0, 1, 1, 1, 'string', 'Comma-separated list of countries to restrict search to'),
array('viewbox', '', 0, 1, 1, 1, 'string', 'Prefer results in given view box'),
array('project-dir', '', 0, 1, 1, 1, 'realpath', 'Base directory of the Nominatim installation (default: .)'),
);
getCmdOpt($_SERVER['argv'], $aCMDOptions, $aCMDResult, true, true);
loadSettings($aCMDResult['project-dir'] ?? getcwd());
@define('CONST_Database_DSN', getSetting('DATABASE_DSN'));
@define('CONST_Default_Language', getSetting('DEFAULT_LANGUAGE', false));
@define('CONST_Log_DB', getSettingBool('LOG_DB'));
@define('CONST_Log_File', getSetting('LOG_FILE', false));
@define('CONST_Max_Word_Frequency', getSetting('MAX_WORD_FREQUENCY'));
@define('CONST_NoAccessControl', getSettingBool('CORS_NOACCESSCONTROL'));
@define('CONST_Places_Max_ID_count', getSetting('LOOKUP_MAX_COUNT'));
@define('CONST_PolygonOutput_MaximumTypes', getSetting('POLYGON_OUTPUT_MAX_TYPES'));
@define('CONST_Search_BatchMode', getSettingBool('SEARCH_BATCH_MODE'));
@define('CONST_Search_NameOnlySearchFrequencyThreshold', getSetting('SEARCH_NAME_ONLY_THRESHOLD'));
@define('CONST_Term_Normalization_Rules', getSetting('TERM_NORMALIZATION'));
@define('CONST_Use_Aux_Location_data', getSettingBool('USE_AUX_LOCATION_DATA'));
@define('CONST_Use_US_Tiger_Data', getSettingBool('USE_US_TIGER_DATA'));
@define('CONST_MapIcon_URL', getSetting('MAPICON_URL', false));
$oDB = new Nominatim\DB;
$oDB->connect();
if (isset($aCMDResult['nodedupe'])) $aCMDResult['dedupe'] = 'false';
$oParams = new Nominatim\ParameterParser($aCMDResult);
$aSearchParams = array(
'search',
'amenity',
'street',
'city',
'county',
'state',
'country',
'postalcode'
);
if (!$oParams->hasSetAny($aSearchParams)) {
showUsage($aCMDOptions, true);
return 1;
}
$oGeocode = new Nominatim\Geocode($oDB);
$oGeocode->setLanguagePreference($oParams->getPreferredLanguages(false));
$oGeocode->setReverseInPlan(true);
$oGeocode->loadParamArray($oParams);
if ($oParams->getBool('search')) {
$oGeocode->setQuery($aCMDResult['search']);
} else {
$oGeocode->setQueryFromParams($oParams);
}
$aSearchResults = $oGeocode->lookup();
echo json_encode($aSearchResults, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE)."\n";

View File

@@ -1,218 +0,0 @@
<?php
@define('CONST_LibDir', dirname(dirname(__FILE__)));
require_once(CONST_LibDir.'/init-cmd.php');
require_once(CONST_LibDir.'/setup/SetupClass.php');
require_once(CONST_LibDir.'/setup_functions.php');
ini_set('memory_limit', '800M');
use Nominatim\Setup\SetupFunctions as SetupFunctions;
// (long-opt, short-opt, min-occurs, max-occurs, num-arguments, num-arguments, type, help)
$aCMDOptions
= array(
'Create and setup nominatim search system',
array('help', 'h', 0, 1, 0, 0, false, 'Show Help'),
array('quiet', 'q', 0, 1, 0, 0, 'bool', 'Quiet output'),
array('verbose', 'v', 0, 1, 0, 0, 'bool', 'Verbose output'),
array('osm-file', '', 0, 1, 1, 1, 'realpath', 'File to import'),
array('threads', '', 0, 1, 1, 1, 'int', 'Number of threads (where possible)'),
array('all', '', 0, 1, 0, 0, 'bool', 'Do the complete process'),
array('create-db', '', 0, 1, 0, 0, 'bool', 'Create nominatim db'),
array('setup-db', '', 0, 1, 0, 0, 'bool', 'Build a blank nominatim db'),
array('import-data', '', 0, 1, 0, 0, 'bool', 'Import a osm file'),
array('osm2pgsql-cache', '', 0, 1, 1, 1, 'int', 'Cache size used by osm2pgsql'),
array('reverse-only', '', 0, 1, 0, 0, 'bool', 'Do not create search tables and indexes'),
array('create-functions', '', 0, 1, 0, 0, 'bool', 'Create functions'),
array('enable-diff-updates', '', 0, 1, 0, 0, 'bool', 'Turn on the code required to make diff updates work'),
array('enable-debug-statements', '', 0, 1, 0, 0, 'bool', 'Include debug warning statements in pgsql commands'),
array('ignore-errors', '', 0, 1, 0, 0, 'bool', 'Continue import even when errors in SQL are present (EXPERT)'),
array('create-tables', '', 0, 1, 0, 0, 'bool', 'Create main tables'),
array('create-partition-tables', '', 0, 1, 0, 0, 'bool', 'Create required partition tables'),
array('create-partition-functions', '', 0, 1, 0, 0, 'bool', 'Create required partition triggers'),
array('no-partitions', '', 0, 1, 0, 0, 'bool', 'Do not partition search indices (speeds up import of single country extracts)'),
array('import-wikipedia-articles', '', 0, 1, 0, 0, 'bool', 'Import wikipedia article dump'),
array('load-data', '', 0, 1, 0, 0, 'bool', 'Copy data to live tables from import table'),
array('disable-token-precalc', '', 0, 1, 0, 0, 'bool', 'Disable name precalculation (EXPERT)'),
array('import-tiger-data', '', 0, 1, 0, 0, 'bool', 'Import tiger data (not included in \'all\')'),
array('calculate-postcodes', '', 0, 1, 0, 0, 'bool', 'Calculate postcode centroids'),
array('index', '', 0, 1, 0, 0, 'bool', 'Index the data'),
array('index-noanalyse', '', 0, 1, 0, 0, 'bool', 'Do not perform analyse operations during index (EXPERT)'),
array('create-search-indices', '', 0, 1, 0, 0, 'bool', 'Create additional indices required for search and update'),
array('create-country-names', '', 0, 1, 0, 0, 'bool', 'Create default list of searchable country names'),
array('drop', '', 0, 1, 0, 0, 'bool', 'Drop tables needed for updates, making the database readonly (EXPERIMENTAL)'),
array('setup-website', '', 0, 1, 0, 0, 'bool', 'Used to compile environment variables for the website'),
array('project-dir', '', 0, 1, 1, 1, 'realpath', 'Base directory of the Nominatim installation (default: .)'),
);
// $aCMDOptions passed to getCmdOpt by reference
getCmdOpt($_SERVER['argv'], $aCMDOptions, $aCMDResult, true, true);
loadSettings($aCMDResult['project-dir'] ?? getcwd());
setupHTTPProxy();
$bDidSomething = false;
$oNominatimCmd = new \Nominatim\Shell(getSetting('NOMINATIM_TOOL'));
// by default, use all but one processor, but never more than 15.
$iInstances = max(1, $aCMDResult['threads'] ?? (min(16, getProcessorCount()) - 1));
function run($oCmd)
{
global $iInstances;
global $aCMDResult;
$oCmd->addParams('--threads', $iInstances);
if ($aCMDResult['ignore-errors'] ?? false) {
$oCmd->addParams('--ignore-errors');
}
if ($aCMDResult['quiet'] ?? false) {
$oCmd->addParams('--quiet');
}
if ($aCMDResult['verbose'] ?? false) {
$oCmd->addParams('--verbose');
}
$oCmd->run(true);
}
//*******************************************************
// Making some sanity check:
// Check if osm-file is set and points to a valid file
if ($aCMDResult['import-data'] || $aCMDResult['all']) {
// to remain in /lib/setup_functions.php function
checkInFile($aCMDResult['osm-file']);
}
// ******************************************************
// instantiate Setup class
$oSetup = new SetupFunctions($aCMDResult);
// *******************************************************
// go through complete process if 'all' is selected or start selected functions
if ($aCMDResult['create-db'] || $aCMDResult['all']) {
$bDidSomething = true;
run((clone($oNominatimCmd))->addParams('transition', '--create-db'));
}
if ($aCMDResult['setup-db'] || $aCMDResult['all']) {
$bDidSomething = true;
$oCmd = (clone($oNominatimCmd))->addParams('transition', '--setup-db');
if ($aCMDResult['no-partitions'] ?? false) {
$oCmd->addParams('--no-partitions');
}
run($oCmd);
}
if ($aCMDResult['import-data'] || $aCMDResult['all']) {
$bDidSomething = true;
$oCmd = (clone($oNominatimCmd))
->addParams('transition', '--import-data')
->addParams('--osm-file', $aCMDResult['osm-file']);
if ($aCMDResult['drop'] ?? false) {
$oCmd->addParams('--drop');
}
run($oCmd);
}
if ($aCMDResult['create-functions'] || $aCMDResult['all']) {
$bDidSomething = true;
$oSetup->createSqlFunctions();
}
if ($aCMDResult['create-tables'] || $aCMDResult['all']) {
$bDidSomething = true;
$oCmd = (clone($oNominatimCmd))->addParams('transition', '--create-tables');
if ($aCMDResult['reverse-only'] ?? false) {
$oCmd->addParams('--reverse-only');
}
run($oCmd);
}
if ($aCMDResult['create-partition-tables'] || $aCMDResult['all']) {
$bDidSomething = true;
run((clone($oNominatimCmd))->addParams('transition', '--create-partition-tables'));
}
if ($aCMDResult['create-partition-functions'] || $aCMDResult['all']) {
$bDidSomething = true;
$oSetup->createSqlFunctions(); // also create partition functions
}
if ($aCMDResult['import-wikipedia-articles'] || $aCMDResult['all']) {
$bDidSomething = true;
// ignore errors!
(clone($oNominatimCmd))->addParams('refresh', '--wiki-data')->run();
}
if ($aCMDResult['load-data'] || $aCMDResult['all']) {
$bDidSomething = true;
run((clone($oNominatimCmd))->addParams('transition', '--load-data'));
}
if ($aCMDResult['import-tiger-data']) {
$bDidSomething = true;
$sTigerPath = getSetting('TIGER_DATA_PATH', CONST_InstallDir.'/tiger');
run((clone($oNominatimCmd))->addParams('transition', '--tiger-data', $sTigerPath));
}
if ($aCMDResult['calculate-postcodes'] || $aCMDResult['all']) {
$bDidSomething = true;
$oSetup->calculatePostcodes($aCMDResult['all']);
}
if ($aCMDResult['index'] || $aCMDResult['all']) {
$bDidSomething = true;
$oCmd = (clone($oNominatimCmd))->addParams('transition', '--index');
if ($aCMDResult['index-noanalyse'] ?? false) {
$oCmd->addParams('--no-analyse');
}
run($oCmd);
}
if ($aCMDResult['drop']) {
$bDidSomething = true;
run((clone($oNominatimCmd))->addParams('freeze'));
}
if ($aCMDResult['create-search-indices'] || $aCMDResult['all']) {
$bDidSomething = true;
$oCmd = (clone($oNominatimCmd))->addParams('transition', '--create-search-indices');
if ($aCMDResult['drop'] ?? false) {
$oCmd->addParams('--drop');
}
run($oCmd);
}
if ($aCMDResult['create-country-names'] || $aCMDResult['all']) {
$bDidSomething = true;
run(clone($oNominatimCmd))->addParams('transition', '--create-country-names');
}
if ($aCMDResult['setup-website'] || $aCMDResult['all']) {
$bDidSomething = true;
run((clone($oNominatimCmd))->addParams('refresh', '--website'));
}
// ******************************************************
// If we did something, repeat the warnings
if (!$bDidSomething) {
showUsage($aCMDOptions, true);
} else {
echo "Summary of warnings:\n\n";
repeatWarnings();
echo "\n";
info('Setup finished.');
}

View File

@@ -1,11 +0,0 @@
<?php
@define('CONST_LibDir', dirname(dirname(__FILE__)));
require_once(CONST_LibDir.'/init-cmd.php');
loadSettings(getcwd());
(new \Nominatim\Shell(getSetting('NOMINATIM_TOOL')))
->addParams('special-phrases', '--import-from-wiki')
->run();

View File

@@ -1,236 +0,0 @@
<?php
@define('CONST_LibDir', dirname(dirname(__FILE__)));
require_once(CONST_LibDir.'/init-cmd.php');
require_once(CONST_LibDir.'/setup_functions.php');
require_once(CONST_LibDir.'/setup/SetupClass.php');
ini_set('memory_limit', '800M');
use Nominatim\Setup\SetupFunctions as SetupFunctions;
// (long-opt, short-opt, min-occurs, max-occurs, num-arguments, num-arguments, type, help)
$aCMDOptions
= array(
'Import / update / index osm data',
array('help', 'h', 0, 1, 0, 0, false, 'Show Help'),
array('quiet', 'q', 0, 1, 0, 0, 'bool', 'Quiet output'),
array('verbose', 'v', 0, 1, 0, 0, 'bool', 'Verbose output'),
array('init-updates', '', 0, 1, 0, 0, 'bool', 'Set up database for updating'),
array('check-for-updates', '', 0, 1, 0, 0, 'bool', 'Check if new updates are available'),
array('no-update-functions', '', 0, 1, 0, 0, 'bool', 'Do not update trigger functions to support differential updates (assuming the diff update logic is already present)'),
array('import-osmosis', '', 0, 1, 0, 0, 'bool', 'Import updates once'),
array('import-osmosis-all', '', 0, 1, 0, 0, 'bool', 'Import updates forever'),
array('no-index', '', 0, 1, 0, 0, 'bool', 'Do not index the new data'),
array('calculate-postcodes', '', 0, 1, 0, 0, 'bool', 'Update postcode centroid table'),
array('import-file', '', 0, 1, 1, 1, 'realpath', 'Re-import data from an OSM file'),
array('import-diff', '', 0, 1, 1, 1, 'realpath', 'Import a diff (osc) file from local file system'),
array('osm2pgsql-cache', '', 0, 1, 1, 1, 'int', 'Cache size used by osm2pgsql'),
array('import-node', '', 0, 1, 1, 1, 'int', 'Re-import node'),
array('import-way', '', 0, 1, 1, 1, 'int', 'Re-import way'),
array('import-relation', '', 0, 1, 1, 1, 'int', 'Re-import relation'),
array('import-from-main-api', '', 0, 1, 0, 0, 'bool', 'Use OSM API instead of Overpass to download objects'),
array('index', '', 0, 1, 0, 0, 'bool', 'Index'),
array('index-rank', '', 0, 1, 1, 1, 'int', 'Rank to start indexing from'),
array('index-instances', '', 0, 1, 1, 1, 'int', 'Number of indexing instances (threads)'),
array('recompute-word-counts', '', 0, 1, 0, 0, 'bool', 'Compute frequency of full-word search terms'),
array('update-address-levels', '', 0, 1, 0, 0, 'bool', 'Reimport address level configuration (EXPERT)'),
array('recompute-importance', '', 0, 1, 0, 0, 'bool', 'Recompute place importances'),
array('project-dir', '', 0, 1, 1, 1, 'realpath', 'Base directory of the Nominatim installation (default: .)'),
);
getCmdOpt($_SERVER['argv'], $aCMDOptions, $aResult, true, true);
loadSettings($aCMDResult['project-dir'] ?? getcwd());
setupHTTPProxy();
if (!isset($aResult['index-instances'])) $aResult['index-instances'] = 1;
if (!isset($aResult['index-rank'])) $aResult['index-rank'] = 0;
date_default_timezone_set('Etc/UTC');
$oDB = new Nominatim\DB();
$oDB->connect();
$fPostgresVersion = $oDB->getPostgresVersion();
$aDSNInfo = Nominatim\DB::parseDSN(getSetting('DATABASE_DSN'));
if (!isset($aDSNInfo['port']) || !$aDSNInfo['port']) $aDSNInfo['port'] = 5432;
// cache memory to be used by osm2pgsql, should not be more than the available memory
$iCacheMemory = (isset($aResult['osm2pgsql-cache'])?$aResult['osm2pgsql-cache']:2000);
if ($iCacheMemory + 500 > getTotalMemoryMB()) {
$iCacheMemory = getCacheMemoryMB();
echo "WARNING: resetting cache memory to $iCacheMemory\n";
}
$oOsm2pgsqlCmd = (new \Nominatim\Shell(getOsm2pgsqlBinary()))
->addParams('--hstore')
->addParams('--latlong')
->addParams('--append')
->addParams('--slim')
->addParams('--with-forward-dependencies', 'false')
->addParams('--log-progress', 'true')
->addParams('--number-processes', 1)
->addParams('--cache', $iCacheMemory)
->addParams('--output', 'gazetteer')
->addParams('--style', getImportStyle())
->addParams('--database', $aDSNInfo['database'])
->addParams('--port', $aDSNInfo['port']);
if (isset($aDSNInfo['hostspec']) && $aDSNInfo['hostspec']) {
$oOsm2pgsqlCmd->addParams('--host', $aDSNInfo['hostspec']);
}
if (isset($aDSNInfo['username']) && $aDSNInfo['username']) {
$oOsm2pgsqlCmd->addParams('--user', $aDSNInfo['username']);
}
if (isset($aDSNInfo['password']) && $aDSNInfo['password']) {
$oOsm2pgsqlCmd->addEnvPair('PGPASSWORD', $aDSNInfo['password']);
}
if (getSetting('FLATNODE_FILE')) {
$oOsm2pgsqlCmd->addParams('--flat-nodes', getSetting('FLATNODE_FILE'));
}
if ($fPostgresVersion >= 11.0) {
$oOsm2pgsqlCmd->addEnvPair(
'PGOPTIONS',
'-c jit=off -c max_parallel_workers_per_gather=0'
);
}
$oNominatimCmd = new \Nominatim\Shell(getSetting('NOMINATIM_TOOL'));
function run($oCmd)
{
global $aCMDResult;
if ($aCMDResult['quiet'] ?? false) {
$oCmd->addParams('--quiet');
}
if ($aCMDResult['verbose'] ?? false) {
$oCmd->addParams('--verbose');
}
$oCmd->run(true);
}
if ($aResult['init-updates']) {
$oCmd = (clone($oNominatimCmd))->addParams('replication', '--init');
if ($aResult['no-update-functions']) {
$oCmd->addParams('--no-update-functions');
}
run($oCmd);
}
if ($aResult['check-for-updates']) {
exit((clone($oNominatimCmd))->addParams('replication', '--check-for-updates')->run());
}
if (isset($aResult['import-diff']) || isset($aResult['import-file'])) {
// import diffs and files directly (e.g. from osmosis --rri)
$sNextFile = isset($aResult['import-diff']) ? $aResult['import-diff'] : $aResult['import-file'];
if (!file_exists($sNextFile)) {
fail("Cannot open $sNextFile\n");
}
// Import the file
$oCMD = (clone $oOsm2pgsqlCmd)->addParams($sNextFile);
echo $oCMD->escapedCmd()."\n";
$iRet = $oCMD->run();
if ($iRet) {
fail("Error from osm2pgsql, $iRet\n");
}
// Don't update the import status - we don't know what this file contains
}
if ($aResult['calculate-postcodes']) {
run((clone($oNominatimCmd))->addParams('refresh', '--postcodes'));
}
$sTemporaryFile = CONST_InstallDir.'/osmosischange.osc';
$bHaveDiff = false;
$bUseOSMApi = isset($aResult['import-from-main-api']) && $aResult['import-from-main-api'];
$sContentURL = '';
if (isset($aResult['import-node']) && $aResult['import-node']) {
if ($bUseOSMApi) {
$sContentURL = 'https://www.openstreetmap.org/api/0.6/node/'.$aResult['import-node'];
} else {
$sContentURL = 'https://overpass-api.de/api/interpreter?data=node('.$aResult['import-node'].');out%20meta;';
}
}
if (isset($aResult['import-way']) && $aResult['import-way']) {
if ($bUseOSMApi) {
$sContentURL = 'https://www.openstreetmap.org/api/0.6/way/'.$aResult['import-way'].'/full';
} else {
$sContentURL = 'https://overpass-api.de/api/interpreter?data=(way('.$aResult['import-way'].');%3E;);out%20meta;';
}
}
if (isset($aResult['import-relation']) && $aResult['import-relation']) {
if ($bUseOSMApi) {
$sContentURL = 'https://www.openstreetmap.org/api/0.6/relation/'.$aResult['import-relation'].'/full';
} else {
$sContentURL = 'https://overpass-api.de/api/interpreter?data=(rel(id:'.$aResult['import-relation'].');%3E;);out%20meta;';
}
}
if ($sContentURL) {
file_put_contents($sTemporaryFile, file_get_contents($sContentURL));
$bHaveDiff = true;
}
if ($bHaveDiff) {
// import generated change file
$oCMD = (clone $oOsm2pgsqlCmd)->addParams($sTemporaryFile);
echo $oCMD->escapedCmd()."\n";
$iRet = $oCMD->run();
if ($iRet) {
fail("osm2pgsql exited with error level $iRet\n");
}
}
if ($aResult['recompute-word-counts']) {
run((clone($oNominatimCmd))->addParams('refresh', '--word-counts'));
}
if ($aResult['index']) {
run((clone $oNominatimCmd)
->addParams('index', '--minrank', $aResult['index-rank'])
->addParams('--threads', $aResult['index-instances']));
}
if ($aResult['update-address-levels']) {
run((clone($oNominatimCmd))->addParams('refresh', '--address-levels'));
}
if ($aResult['recompute-importance']) {
run((clone($oNominatimCmd))->addParams('refresh', '--importance'));
}
if ($aResult['import-osmosis'] || $aResult['import-osmosis-all']) {
$oCmd = (clone($oNominatimCmd))
->addParams('replication')
->addParams('--threads', $aResult['index-instances']);
if (!$aResult['import-osmosis-all']) {
$oCmd->addParams('--once');
}
if ($aResult['no-index']) {
$oCmd->addParams('--no-index');
}
run($oCmd);
}

View File

@@ -3,7 +3,6 @@
require_once(CONST_LibDir.'/init-cmd.php');
require_once(CONST_LibDir.'/log.php');
require_once(CONST_LibDir.'/Geocode.php');
require_once(CONST_LibDir.'/PlaceLookup.php');
require_once(CONST_LibDir.'/ReverseGeocode.php');
@@ -26,17 +25,16 @@ loadSettings($aCMDResult['project-dir'] ?? getcwd());
@define('CONST_Default_Language', getSetting('DEFAULT_LANGUAGE', false));
@define('CONST_Log_DB', getSettingBool('LOG_DB'));
@define('CONST_Log_File', getSetting('LOG_FILE', false));
@define('CONST_Max_Word_Frequency', getSetting('MAX_WORD_FREQUENCY'));
@define('CONST_NoAccessControl', getSettingBool('CORS_NOACCESSCONTROL'));
@define('CONST_Places_Max_ID_count', getSetting('LOOKUP_MAX_COUNT'));
@define('CONST_PolygonOutput_MaximumTypes', getSetting('POLYGON_OUTPUT_MAX_TYPES'));
@define('CONST_Search_BatchMode', getSettingBool('SEARCH_BATCH_MODE'));
@define('CONST_Search_NameOnlySearchFrequencyThreshold', getSetting('SEARCH_NAME_ONLY_THRESHOLD'));
@define('CONST_Term_Normalization_Rules', getSetting('TERM_NORMALIZATION'));
@define('CONST_Use_Aux_Location_data', getSettingBool('USE_AUX_LOCATION_DATA'));
@define('CONST_Use_US_Tiger_Data', getSettingBool('USE_US_TIGER_DATA'));
@define('CONST_MapIcon_URL', getSetting('MAPICON_URL', false));
@define('CONST_TokenizerDir', CONST_InstallDir.'/tokenizer');
require_once(CONST_LibDir.'/Geocode.php');
$oDB = new Nominatim\DB();
$oDB->connect();
@@ -64,11 +62,15 @@ if (!$aResult['search-only']) {
$oPlaceLookup->setLanguagePreference(array('en'));
echo 'Warm reverse: ';
if ($bVerbose) echo "\n";
if ($bVerbose) {
echo "\n";
}
for ($i = 0; $i < 1000; $i++) {
$fLat = rand(-9000, 9000) / 100;
$fLon = rand(-18000, 18000) / 100;
if ($bVerbose) echo "$fLat, $fLon = ";
if ($bVerbose) {
echo "$fLat, $fLon = ";
}
$oLookup = $oReverseGeocode->lookup($fLat, $fLon);
$aSearchResults = $oLookup ? $oPlaceLookup->lookup(array($oLookup->iId => $oLookup)) : null;
@@ -81,10 +83,19 @@ if (!$aResult['reverse-only']) {
$oGeocode = new Nominatim\Geocode($oDB);
echo 'Warm search: ';
if ($bVerbose) echo "\n";
if ($bVerbose) {
echo "\n";
}
$oTokenizer = new \Nominatim\Tokenizer($oDB);
$aWords = $oTokenizer->mostFrequentWords(1000);
$sSQL = 'SELECT word FROM word WHERE word is not null ORDER BY search_name_count DESC LIMIT 1000';
foreach ($oDB->getCol($sSQL) as $sWord) {
if ($bVerbose) echo "$sWord = ";
foreach ($aWords as $sWord) {
if ($bVerbose) {
echo "$sWord = ";
}
$oGeocode->setLanguagePreference(array('en'));
$oGeocode->setQuery($sWord);

View File

@@ -9,8 +9,12 @@ function getCmdOpt($aArg, $aSpec, &$aResult, $bExitOnError = false, $bExitOnUnkn
foreach ($aSpec as $aLine) {
if (is_array($aLine)) {
if ($aLine[0]) $aQuick['--'.$aLine[0]] = $aLine;
if ($aLine[1]) $aQuick['-'.$aLine[1]] = $aLine;
if ($aLine[0]) {
$aQuick['--'.$aLine[0]] = $aLine;
}
if ($aLine[1]) {
$aQuick['-'.$aLine[1]] = $aLine;
}
$aCounts[$aLine[0]] = 0;
}
}
@@ -28,7 +32,9 @@ function getCmdOpt($aArg, $aSpec, &$aResult, $bExitOnError = false, $bExitOnUnkn
$xVal = array();
for ($n = $aLine[4]; $i < $iSize && $n; $n--) {
$i++;
if ($i >= $iSize || $aArg[$i][0] == '-') showUsage($aSpec, $bExitOnError, 'Parameter of \''.$aLine[0].'\' is missing');
if ($i >= $iSize || $aArg[$i][0] == '-') {
showUsage($aSpec, $bExitOnError, 'Parameter of \''.$aLine[0].'\' is missing');
}
switch ($aLine[6]) {
case 'realpath':
@@ -56,7 +62,9 @@ function getCmdOpt($aArg, $aSpec, &$aResult, $bExitOnError = false, $bExitOnUnkn
break;
}
}
if ($aLine[4] == 1) $xVal = $xVal[0];
if ($aLine[4] == 1) {
$xVal = $xVal[0];
}
} else {
$xVal = true;
}
@@ -65,7 +73,9 @@ function getCmdOpt($aArg, $aSpec, &$aResult, $bExitOnError = false, $bExitOnUnkn
}
if ($aLine[3] > 1) {
if (!array_key_exists($aLine[0], $aResult)) $aResult[$aLine[0]] = array();
if (!array_key_exists($aLine[0], $aResult)) {
$aResult[$aLine[0]] = array();
}
$aResult[$aLine[0]][] = $xVal;
} else {
$aResult[$aLine[0]] = $xVal;
@@ -75,18 +85,23 @@ function getCmdOpt($aArg, $aSpec, &$aResult, $bExitOnError = false, $bExitOnUnkn
}
}
if (array_key_exists('help', $aResult)) showUsage($aSpec);
if ($bUnknown && $bExitOnUnknown) showUsage($aSpec, $bExitOnError, 'Unknown option \''.$bUnknown.'\'');
if (array_key_exists('help', $aResult)) {
showUsage($aSpec);
}
if ($bUnknown && $bExitOnUnknown) {
showUsage($aSpec, $bExitOnError, 'Unknown option \''.$bUnknown.'\'');
}
foreach ($aSpec as $aLine) {
if (is_array($aLine)) {
if ($aCounts[$aLine[0]] < $aLine[2]) showUsage($aSpec, $bExitOnError, 'Option \''.$aLine[0].'\' is missing');
if ($aCounts[$aLine[0]] > $aLine[3]) showUsage($aSpec, $bExitOnError, 'Option \''.$aLine[0].'\' is pressent too many times');
switch ($aLine[6]) {
case 'bool':
if (!array_key_exists($aLine[0], $aResult))
$aResult[$aLine[0]] = false;
break;
if ($aCounts[$aLine[0]] < $aLine[2]) {
showUsage($aSpec, $bExitOnError, 'Option \''.$aLine[0].'\' is missing');
}
if ($aCounts[$aLine[0]] > $aLine[3]) {
showUsage($aSpec, $bExitOnError, 'Option \''.$aLine[0].'\' is pressent too many times');
}
if ($aLine[6] == 'bool' && !array_key_exists($aLine[0], $aResult)) {
$aResult[$aLine[0]] = false;
}
}
}
@@ -109,8 +124,12 @@ function showUsage($aSpec, $bExit = false, $sError = false)
echo "\n";
}
$aNames = array();
if ($aLine[1]) $aNames[] = '-'.$aLine[1];
if ($aLine[0]) $aNames[] = '--'.$aLine[0];
if ($aLine[1]) {
$aNames[] = '-'.$aLine[1];
}
if ($aLine[0]) {
$aNames[] = '--'.$aLine[0];
}
$sName = join(', ', $aNames);
echo ' '.$sName.str_repeat(' ', 30-strlen($sName)).$aLine[7]."\n";
} else {
@@ -144,58 +163,6 @@ function repeatWarnings()
}
function runSQLScript($sScript, $bfatal = true, $bVerbose = false, $bIgnoreErrors = false)
{
// Convert database DSN to psql parameters
$aDSNInfo = \Nominatim\DB::parseDSN(getSetting('DATABASE_DSN'));
if (!isset($aDSNInfo['port']) || !$aDSNInfo['port']) $aDSNInfo['port'] = 5432;
$oCmd = new \Nominatim\Shell('psql');
$oCmd->addParams('--port', $aDSNInfo['port']);
$oCmd->addParams('--dbname', $aDSNInfo['database']);
if (isset($aDSNInfo['hostspec']) && $aDSNInfo['hostspec']) {
$oCmd->addParams('--host', $aDSNInfo['hostspec']);
}
if (isset($aDSNInfo['username']) && $aDSNInfo['username']) {
$oCmd->addParams('--username', $aDSNInfo['username']);
}
if (isset($aDSNInfo['password'])) {
$oCmd->addEnvPair('PGPASSWORD', $aDSNInfo['password']);
}
if (!$bVerbose) {
$oCmd->addParams('--quiet');
}
if ($bfatal && !$bIgnoreErrors) {
$oCmd->addParams('-v', 'ON_ERROR_STOP=1');
}
$aDescriptors = array(
0 => array('pipe', 'r'),
1 => STDOUT,
2 => STDERR
);
$ahPipes = null;
$hProcess = @proc_open($oCmd->escapedCmd(), $aDescriptors, $ahPipes, null, $oCmd->aEnv);
if (!is_resource($hProcess)) {
fail('unable to start pgsql');
}
if (!$bVerbose) {
fwrite($ahPipes[0], 'set client_min_messages to WARNING;');
}
while (strlen($sScript)) {
$iWritten = fwrite($ahPipes[0], $sScript);
if ($iWritten <= 0) break;
$sScript = substr($sScript, $iWritten);
}
fclose($ahPipes[0]);
$iReturn = proc_close($hProcess);
if ($bfatal && $iReturn > 0) {
fail("pgsql returned with error code ($iReturn)");
}
}
function setupHTTPProxy()
{
if (!getSettingBool('HTTP_PROXY')) {

View File

@@ -12,7 +12,7 @@ require_once(CONST_Debug ? 'DebugHtml.php' : 'DebugNone.php');
function userError($sMsg)
{
throw new Exception($sMsg, 400);
throw new \Exception($sMsg, 400);
}
@@ -37,7 +37,7 @@ function shutdown_exception_handler_xml()
{
$error = error_get_last();
if ($error !== null && $error['type'] === E_ERROR) {
exception_handler_xml(new Exception($error['message'], 500));
exception_handler_xml(new \Exception($error['message'], 500));
}
}
@@ -45,7 +45,7 @@ function shutdown_exception_handler_json()
{
$error = error_get_last();
if ($error !== null && $error['type'] === E_ERROR) {
exception_handler_json(new Exception($error['message'], 500));
exception_handler_json(new \Exception($error['message'], 500));
}
}
@@ -81,6 +81,10 @@ if (CONST_NoAccessControl) {
header('Access-Control-Allow-Headers: '.$_SERVER['HTTP_ACCESS_CONTROL_REQUEST_HEADERS']);
}
}
if (isset($_SERVER['REQUEST_METHOD']) && $_SERVER['REQUEST_METHOD'] == 'OPTIONS') exit;
if (isset($_SERVER['REQUEST_METHOD']) && $_SERVER['REQUEST_METHOD'] == 'OPTIONS') {
exit;
}
if (CONST_Debug) header('Content-type: text/html; charset=utf-8');
if (CONST_Debug) {
header('Content-type: text/html; charset=utf-8');
}

View File

@@ -6,10 +6,7 @@ function loadSettings($sProjectDir)
// Temporary hack to set the direcory via environment instead of
// the installed scripts. Neither setting is part of the official
// set of settings.
defined('CONST_DataDir') or define('CONST_DataDir', $_SERVER['NOMINATIM_DATADIR']);
defined('CONST_SqlDir') or define('CONST_SqlDir', $_SERVER['NOMINATIM_SQLDIR']);
defined('CONST_ConfigDir') or define('CONST_ConfigDir', $_SERVER['NOMINATIM_CONFIGDIR']);
defined('CONST_Default_ModulePath') or define('CONST_Default_ModulePath', $_SERVER['NOMINATIM_DATABASE_MODULE_SRC_PATH']);
}
function getSetting($sConfName, $sDefault = null)
@@ -32,22 +29,14 @@ function getSettingBool($sConfName)
|| strcmp($sVal, '1') == 0;
}
function getSettingConfig($sConfName, $sSystemConfig)
{
$sValue = $_SERVER['NOMINATIM_'.$sConfName];
if (!$sValue) {
return CONST_ConfigDir.'/'.$sSystemConfig;
}
return $sValue;
}
function fail($sError, $sUserError = false)
{
if (!$sUserError) $sUserError = $sError;
if (!$sUserError) {
$sUserError = $sError;
}
error_log('ERROR: '.$sError);
var_dump($sUserError)."\n";
var_dump($sUserError);
echo "\n";
exit(-1);
}
@@ -95,8 +84,9 @@ function getDatabaseDate(&$oDB)
function byImportance($a, $b)
{
if ($a['importance'] != $b['importance'])
if ($a['importance'] != $b['importance']) {
return ($a['importance'] > $b['importance']?-1:1);
}
return $a['foundorder'] <=> $b['foundorder'];
}
@@ -227,3 +217,12 @@ function closestHouseNumber($aRow)
return max(min($aRow['endnumber'], $iHn), $aRow['startnumber']);
}
if (!function_exists('array_key_last')) {
function array_key_last(array $array)
{
if (!empty($array)) {
return key(array_slice($array, -1, 1, true));
}
}
}

View File

@@ -5,15 +5,23 @@ function logStart(&$oDB, $sType = '', $sQuery = '', $aLanguageList = array())
{
$fStartTime = microtime(true);
$aStartTime = explode('.', $fStartTime);
if (!isset($aStartTime[1])) $aStartTime[1] = '0';
if (!isset($aStartTime[1])) {
$aStartTime[1] = '0';
}
$sOutputFormat = '';
if (isset($_GET['format'])) $sOutputFormat = $_GET['format'];
if (isset($_GET['format'])) {
$sOutputFormat = $_GET['format'];
}
if ($sType == 'reverse') {
$sOutQuery = (isset($_GET['lat'])?$_GET['lat']:'').'/';
if (isset($_GET['lon'])) $sOutQuery .= $_GET['lon'];
if (isset($_GET['zoom'])) $sOutQuery .= '/'.$_GET['zoom'];
if (isset($_GET['lon'])) {
$sOutQuery .= $_GET['lon'];
}
if (isset($_GET['zoom'])) {
$sOutQuery .= '/'.$_GET['zoom'];
}
} else {
$sOutQuery = $sQuery;
}
@@ -28,13 +36,15 @@ function logStart(&$oDB, $sType = '', $sQuery = '', $aLanguageList = array())
);
if (CONST_Log_DB) {
if (isset($_GET['email']))
if (isset($_GET['email'])) {
$sUserAgent = $_GET['email'];
elseif (isset($_SERVER['HTTP_REFERER']))
} elseif (isset($_SERVER['HTTP_REFERER'])) {
$sUserAgent = $_SERVER['HTTP_REFERER'];
elseif (isset($_SERVER['HTTP_USER_AGENT']))
} elseif (isset($_SERVER['HTTP_USER_AGENT'])) {
$sUserAgent = $_SERVER['HTTP_USER_AGENT'];
else $sUserAgent = '';
} else {
$sUserAgent = '';
}
$sSQL = 'insert into new_query_log (type,starttime,query,ipaddress,useragent,language,format,searchterm)';
$sSQL .= ' values (';
$sSQL .= join(',', $oDB->getDBQuotedList(array(
@@ -60,7 +70,9 @@ function logEnd(&$oDB, $hLog, $iNumResults)
if (CONST_Log_DB) {
$aEndTime = explode('.', $fEndTime);
if (!$aEndTime[1]) $aEndTime[1] = '0';
if (!$aEndTime[1]) {
$aEndTime[1] = '0';
}
$sEndTime = date('Y-m-d H:i:s', $aEndTime[0]).'.'.$aEndTime[1];
$sSQL = 'update new_query_log set endtime = '.$oDB->getDBQuoted($sEndTime).', results = '.$iNumResults;

View File

@@ -1,19 +0,0 @@
<?php
$phpPhraseSettingsFile = $argv[1];
$jsonPhraseSettingsFile = dirname($phpPhraseSettingsFile).'/'.basename($phpPhraseSettingsFile, '.php').'.json';
if (file_exists($phpPhraseSettingsFile) && !file_exists($jsonPhraseSettingsFile)) {
include $phpPhraseSettingsFile;
$data = array();
if (isset($aTagsBlacklist))
$data['blackList'] = $aTagsBlacklist;
if (isset($aTagsWhitelist))
$data['whiteList'] = $aTagsWhitelist;
$jsonFile = fopen($jsonPhraseSettingsFile, 'w');
fwrite($jsonFile, json_encode($data));
fclose($jsonFile);
}

View File

@@ -3,14 +3,26 @@
function formatOSMType($sType, $bIncludeExternal = true)
{
if ($sType == 'N') return 'node';
if ($sType == 'W') return 'way';
if ($sType == 'R') return 'relation';
if ($sType == 'N') {
return 'node';
}
if ($sType == 'W') {
return 'way';
}
if ($sType == 'R') {
return 'relation';
}
if (!$bIncludeExternal) return '';
if (!$bIncludeExternal) {
return '';
}
if ($sType == 'T') return 'way';
if ($sType == 'I') return 'way';
if ($sType == 'T') {
return 'way';
}
if ($sType == 'I') {
return 'way';
}
// not handled: P, L

View File

@@ -1,261 +0,0 @@
<?php
namespace Nominatim\Setup;
require_once(CONST_LibDir.'/Shell.php');
class SetupFunctions
{
protected $iInstances;
protected $aDSNInfo;
protected $bQuiet;
protected $bVerbose;
protected $sIgnoreErrors;
protected $bEnableDiffUpdates;
protected $bEnableDebugStatements;
protected $bDrop;
protected $oDB = null;
protected $oNominatimCmd;
public function __construct(array $aCMDResult)
{
// by default, use all but one processor, but never more than 15.
$this->iInstances = isset($aCMDResult['threads'])
? $aCMDResult['threads']
: (min(16, getProcessorCount()) - 1);
if ($this->iInstances < 1) {
$this->iInstances = 1;
warn('resetting threads to '.$this->iInstances);
}
// parse database string
$this->aDSNInfo = \Nominatim\DB::parseDSN(getSetting('DATABASE_DSN'));
if (!isset($this->aDSNInfo['port'])) {
$this->aDSNInfo['port'] = 5432;
}
// setting member variables based on command line options stored in $aCMDResult
$this->bQuiet = isset($aCMDResult['quiet']) && $aCMDResult['quiet'];
$this->bVerbose = $aCMDResult['verbose'];
//setting default values which are not set by the update.php array
if (isset($aCMDResult['ignore-errors'])) {
$this->sIgnoreErrors = $aCMDResult['ignore-errors'];
} else {
$this->sIgnoreErrors = false;
}
if (isset($aCMDResult['enable-debug-statements'])) {
$this->bEnableDebugStatements = $aCMDResult['enable-debug-statements'];
} else {
$this->bEnableDebugStatements = false;
}
if (isset($aCMDResult['enable-diff-updates'])) {
$this->bEnableDiffUpdates = $aCMDResult['enable-diff-updates'];
} else {
$this->bEnableDiffUpdates = false;
}
$this->bDrop = isset($aCMDResult['drop']) && $aCMDResult['drop'];
$this->oNominatimCmd = new \Nominatim\Shell(getSetting('NOMINATIM_TOOL'));
if ($this->bQuiet) {
$this->oNominatimCmd->addParams('--quiet');
}
if ($this->bVerbose) {
$this->oNominatimCmd->addParams('--verbose');
}
}
public function calculatePostcodes($bCMDResultAll)
{
info('Calculate Postcodes');
$this->pgsqlRunScriptFile(CONST_SqlDir.'/postcode_tables.sql');
$sPostcodeFilename = CONST_InstallDir.'/gb_postcode_data.sql.gz';
if (file_exists($sPostcodeFilename)) {
$this->pgsqlRunScriptFile($sPostcodeFilename);
} else {
warn('optional external GB postcode table file ('.$sPostcodeFilename.') not found. Skipping.');
}
$sPostcodeFilename = CONST_InstallDir.'/us_postcode_data.sql.gz';
if (file_exists($sPostcodeFilename)) {
$this->pgsqlRunScriptFile($sPostcodeFilename);
} else {
warn('optional external US postcode table file ('.$sPostcodeFilename.') not found. Skipping.');
}
$this->db()->exec('TRUNCATE location_postcode');
$sSQL = 'INSERT INTO location_postcode';
$sSQL .= ' (place_id, indexed_status, country_code, postcode, geometry) ';
$sSQL .= "SELECT nextval('seq_place'), 1, country_code,";
$sSQL .= " upper(trim (both ' ' from address->'postcode')) as pc,";
$sSQL .= ' ST_Centroid(ST_Collect(ST_Centroid(geometry)))';
$sSQL .= ' FROM placex';
$sSQL .= " WHERE address ? 'postcode' AND address->'postcode' NOT SIMILAR TO '%(,|;)%'";
$sSQL .= ' AND geometry IS NOT null';
$sSQL .= ' GROUP BY country_code, pc';
$this->db()->exec($sSQL);
// only add postcodes that are not yet available in OSM
$sSQL = 'INSERT INTO location_postcode';
$sSQL .= ' (place_id, indexed_status, country_code, postcode, geometry) ';
$sSQL .= "SELECT nextval('seq_place'), 1, 'us', postcode,";
$sSQL .= ' ST_SetSRID(ST_Point(x,y),4326)';
$sSQL .= ' FROM us_postcode WHERE postcode NOT IN';
$sSQL .= ' (SELECT postcode FROM location_postcode';
$sSQL .= " WHERE country_code = 'us')";
$this->db()->exec($sSQL);
// add missing postcodes for GB (if available)
$sSQL = 'INSERT INTO location_postcode';
$sSQL .= ' (place_id, indexed_status, country_code, postcode, geometry) ';
$sSQL .= "SELECT nextval('seq_place'), 1, 'gb', postcode, geometry";
$sSQL .= ' FROM gb_postcode WHERE postcode NOT IN';
$sSQL .= ' (SELECT postcode FROM location_postcode';
$sSQL .= " WHERE country_code = 'gb')";
$this->db()->exec($sSQL);
if (!$bCMDResultAll) {
$sSQL = "DELETE FROM word WHERE class='place' and type='postcode'";
$sSQL .= 'and word NOT IN (SELECT postcode FROM location_postcode)';
$this->db()->exec($sSQL);
}
$sSQL = 'SELECT count(getorcreate_postcode_id(v)) FROM ';
$sSQL .= '(SELECT distinct(postcode) as v FROM location_postcode) p';
$this->db()->exec($sSQL);
}
/**
* Return the connection to the database.
*
* @return Database object.
*
* Creates a new connection if none exists yet. Otherwise reuses the
* already established connection.
*/
private function db()
{
if (is_null($this->oDB)) {
$this->oDB = new \Nominatim\DB();
$this->oDB->connect();
}
return $this->oDB;
}
private function pgsqlRunScript($sScript, $bfatal = true)
{
runSQLScript(
$sScript,
$bfatal,
$this->bVerbose,
$this->sIgnoreErrors
);
}
public function createSqlFunctions()
{
$oCmd = (clone($this->oNominatimCmd))
->addParams('refresh', '--functions');
if (!$this->bEnableDiffUpdates) {
$oCmd->addParams('--no-diff-updates');
}
if ($this->bEnableDebugStatements) {
$oCmd->addParams('--enable-debug-statements');
}
$oCmd->run(!$this->sIgnoreErrors);
}
private function pgsqlRunScriptFile($sFilename)
{
if (!file_exists($sFilename)) fail('unable to find '.$sFilename);
$oCmd = (new \Nominatim\Shell('psql'))
->addParams('--port', $this->aDSNInfo['port'])
->addParams('--dbname', $this->aDSNInfo['database']);
if (!$this->bVerbose) {
$oCmd->addParams('--quiet');
}
if (isset($this->aDSNInfo['hostspec'])) {
$oCmd->addParams('--host', $this->aDSNInfo['hostspec']);
}
if (isset($this->aDSNInfo['username'])) {
$oCmd->addParams('--username', $this->aDSNInfo['username']);
}
if (isset($this->aDSNInfo['password'])) {
$oCmd->addEnvPair('PGPASSWORD', $this->aDSNInfo['password']);
}
$ahGzipPipes = null;
if (preg_match('/\\.gz$/', $sFilename)) {
$aDescriptors = array(
0 => array('pipe', 'r'),
1 => array('pipe', 'w'),
2 => array('file', '/dev/null', 'a')
);
$oZcatCmd = new \Nominatim\Shell('zcat', $sFilename);
$hGzipProcess = proc_open($oZcatCmd->escapedCmd(), $aDescriptors, $ahGzipPipes);
if (!is_resource($hGzipProcess)) fail('unable to start zcat');
$aReadPipe = $ahGzipPipes[1];
fclose($ahGzipPipes[0]);
} else {
$oCmd->addParams('--file', $sFilename);
$aReadPipe = array('pipe', 'r');
}
$aDescriptors = array(
0 => $aReadPipe,
1 => array('pipe', 'w'),
2 => array('file', '/dev/null', 'a')
);
$ahPipes = null;
$hProcess = proc_open($oCmd->escapedCmd(), $aDescriptors, $ahPipes, null, $oCmd->aEnv);
if (!is_resource($hProcess)) fail('unable to start pgsql');
// TODO: error checking
while (!feof($ahPipes[1])) {
echo fread($ahPipes[1], 4096);
}
fclose($ahPipes[1]);
$iReturn = proc_close($hProcess);
if ($iReturn > 0) {
fail("pgsql returned with error code ($iReturn)");
}
if ($ahGzipPipes) {
fclose($ahGzipPipes[1]);
proc_close($hGzipProcess);
}
}
private function replaceSqlPatterns($sSql)
{
$sSql = str_replace('{www-user}', getSetting('DATABASE_WEBUSER'), $sSql);
$aPatterns = array(
'{ts:address-data}' => getSetting('TABLESPACE_ADDRESS_DATA'),
'{ts:address-index}' => getSetting('TABLESPACE_ADDRESS_INDEX'),
'{ts:search-data}' => getSetting('TABLESPACE_SEARCH_DATA'),
'{ts:search-index}' => getSetting('TABLESPACE_SEARCH_INDEX'),
'{ts:aux-data}' => getSetting('TABLESPACE_AUX_DATA'),
'{ts:aux-index}' => getSetting('TABLESPACE_AUX_INDEX')
);
foreach ($aPatterns as $sPattern => $sTablespace) {
if ($sTablespace) {
$sSql = str_replace($sPattern, 'TABLESPACE "'.$sTablespace.'"', $sSql);
} else {
$sSql = str_replace($sPattern, '', $sSql);
}
}
return $sSql;
}
}

View File

@@ -1,20 +1,5 @@
<?php
function checkInFile($sOSMFile)
{
if (!isset($sOSMFile)) {
fail('missing --osm-file for data import');
}
if (!file_exists($sOSMFile)) {
fail('the path supplied to --osm-file does not exist');
}
if (!is_readable($sOSMFile)) {
fail('osm-file "' . $aCMDResult['osm-file'] . '" not readable');
}
}
function getOsm2pgsqlBinary()
{
$sBinary = getSetting('OSM2PGSQL_BINARY');

View File

@@ -5,9 +5,11 @@
$aFilteredPlaces = array();
if (empty($aPlace)) {
if (isset($sError))
if (isset($sError)) {
$aFilteredPlaces['error'] = $sError;
else $aFilteredPlaces['error'] = 'Unable to geocode';
} else {
$aFilteredPlaces['error'] = 'Unable to geocode';
}
javascript_renderData($aFilteredPlaces);
} else {
$aFilteredPlaces = array(
@@ -17,7 +19,9 @@ if (empty($aPlace)) {
)
);
if (isset($aPlace['place_id'])) $aFilteredPlaces['properties']['geocoding']['place_id'] = $aPlace['place_id'];
if (isset($aPlace['place_id'])) {
$aFilteredPlaces['properties']['geocoding']['place_id'] = $aPlace['place_id'];
}
$sOSMType = formatOSMType($aPlace['osm_type']);
if ($sOSMType) {
$aFilteredPlaces['properties']['geocoding']['osm_type'] = $sOSMType;

View File

@@ -3,9 +3,11 @@
$aFilteredPlaces = array();
if (empty($aPlace)) {
if (isset($sError))
if (isset($sError)) {
$aFilteredPlaces['error'] = $sError;
else $aFilteredPlaces['error'] = 'Unable to geocode';
} else {
$aFilteredPlaces['error'] = 'Unable to geocode';
}
javascript_renderData($aFilteredPlaces);
} else {
$aFilteredPlaces = array(
@@ -13,7 +15,9 @@ if (empty($aPlace)) {
'properties' => array()
);
if (isset($aPlace['place_id'])) $aFilteredPlaces['properties']['place_id'] = $aPlace['place_id'];
if (isset($aPlace['place_id'])) {
$aFilteredPlaces['properties']['place_id'] = $aPlace['place_id'];
}
$sOSMType = formatOSMType($aPlace['osm_type']);
if ($sOSMType) {
$aFilteredPlaces['properties']['osm_type'] = $sOSMType;
@@ -36,8 +40,12 @@ if (empty($aPlace)) {
if (isset($aPlace['address'])) {
$aFilteredPlaces['properties']['address'] = $aPlace['address']->getAddressNames();
}
if (isset($aPlace['sExtraTags'])) $aFilteredPlaces['properties']['extratags'] = $aPlace['sExtraTags'];
if (isset($aPlace['sNameDetails'])) $aFilteredPlaces['properties']['namedetails'] = $aPlace['sNameDetails'];
if (isset($aPlace['sExtraTags'])) {
$aFilteredPlaces['properties']['extratags'] = $aPlace['sExtraTags'];
}
if (isset($aPlace['sNameDetails'])) {
$aFilteredPlaces['properties']['namedetails'] = $aPlace['sNameDetails'];
}
if (isset($aPlace['aBoundingBox'])) {
$aFilteredPlaces['bbox'] = array(

View File

@@ -3,19 +3,27 @@
$aFilteredPlaces = array();
if (empty($aPlace)) {
if (isset($sError))
if (isset($sError)) {
$aFilteredPlaces['error'] = $sError;
else $aFilteredPlaces['error'] = 'Unable to geocode';
} else {
$aFilteredPlaces['error'] = 'Unable to geocode';
}
} else {
if (isset($aPlace['place_id'])) $aFilteredPlaces['place_id'] = $aPlace['place_id'];
if (isset($aPlace['place_id'])) {
$aFilteredPlaces['place_id'] = $aPlace['place_id'];
}
$aFilteredPlaces['licence'] = 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright';
$sOSMType = formatOSMType($aPlace['osm_type']);
if ($sOSMType) {
$aFilteredPlaces['osm_type'] = $sOSMType;
$aFilteredPlaces['osm_id'] = $aPlace['osm_id'];
}
if (isset($aPlace['lat'])) $aFilteredPlaces['lat'] = $aPlace['lat'];
if (isset($aPlace['lon'])) $aFilteredPlaces['lon'] = $aPlace['lon'];
if (isset($aPlace['lat'])) {
$aFilteredPlaces['lat'] = $aPlace['lat'];
}
if (isset($aPlace['lon'])) {
$aFilteredPlaces['lon'] = $aPlace['lon'];
}
if ($sOutputFormat == 'jsonv2' || $sOutputFormat == 'geojson') {
$aFilteredPlaces['place_rank'] = $aPlace['rank_search'];
@@ -35,8 +43,12 @@ if (empty($aPlace)) {
if (isset($aPlace['address'])) {
$aFilteredPlaces['address'] = $aPlace['address']->getAddressNames();
}
if (isset($aPlace['sExtraTags'])) $aFilteredPlaces['extratags'] = $aPlace['sExtraTags'];
if (isset($aPlace['sNameDetails'])) $aFilteredPlaces['namedetails'] = $aPlace['sNameDetails'];
if (isset($aPlace['sExtraTags'])) {
$aFilteredPlaces['extratags'] = $aPlace['sExtraTags'];
}
if (isset($aPlace['sNameDetails'])) {
$aFilteredPlaces['namedetails'] = $aPlace['sNameDetails'];
}
if (isset($aPlace['aBoundingBox'])) {
$aFilteredPlaces['boundingbox'] = $aPlace['aBoundingBox'];

View File

@@ -12,17 +12,29 @@ echo " querystring='".htmlspecialchars($_SERVER['QUERY_STRING'], ENT_QUOTES)."'"
echo ">\n";
if (empty($aPlace)) {
if (isset($sError))
if (isset($sError)) {
echo "<error>$sError</error>";
else echo '<error>Unable to geocode</error>';
} else {
echo '<error>Unable to geocode</error>';
}
} else {
echo '<result';
if ($aPlace['place_id']) echo ' place_id="'.$aPlace['place_id'].'"';
if ($aPlace['place_id']) {
echo ' place_id="'.$aPlace['place_id'].'"';
}
$sOSMType = formatOSMType($aPlace['osm_type']);
if ($sOSMType) echo ' osm_type="'.$sOSMType.'"'.' osm_id="'.$aPlace['osm_id'].'"';
if ($aPlace['ref']) echo ' ref="'.htmlspecialchars($aPlace['ref']).'"';
if (isset($aPlace['lat'])) echo ' lat="'.htmlspecialchars($aPlace['lat']).'"';
if (isset($aPlace['lon'])) echo ' lon="'.htmlspecialchars($aPlace['lon']).'"';
if ($sOSMType) {
echo ' osm_type="'.$sOSMType.'"'.' osm_id="'.$aPlace['osm_id'].'"';
}
if ($aPlace['ref']) {
echo ' ref="'.htmlspecialchars($aPlace['ref']).'"';
}
if (isset($aPlace['lat'])) {
echo ' lat="'.htmlspecialchars($aPlace['lat']).'"';
}
if (isset($aPlace['lon'])) {
echo ' lon="'.htmlspecialchars($aPlace['lon']).'"';
}
if (isset($aPlace['aBoundingBox'])) {
echo ' boundingbox="';
echo join(',', $aPlace['aBoundingBox']);

View File

@@ -43,29 +43,26 @@ $aPlaceDetails['centroid'] = array(
$aPlaceDetails['geometry'] = json_decode($aPointDetails['asgeojson']);
$funcMapAddressLine = function ($aFull) {
$aMapped = array(
'localname' => $aFull['localname'],
'place_id' => isset($aFull['place_id']) ? (int) $aFull['place_id'] : null,
'osm_id' => isset($aFull['osm_id']) ? (int) $aFull['osm_id'] : null,
'osm_type' => isset($aFull['osm_type']) ? $aFull['osm_type'] : null,
'place_type' => isset($aFull['place_type']) ? $aFull['place_type'] : null,
'class' => $aFull['class'],
'type' => $aFull['type'],
'admin_level' => isset($aFull['admin_level']) ? (int) $aFull['admin_level'] : null,
'rank_address' => $aFull['rank_address'] ? (int) $aFull['rank_address'] : null,
'distance' => (float) $aFull['distance'],
'isaddress' => isset($aFull['isaddress']) ? (bool) $aFull['isaddress'] : null
);
return $aMapped;
return array(
'localname' => $aFull['localname'],
'place_id' => isset($aFull['place_id']) ? (int) $aFull['place_id'] : null,
'osm_id' => isset($aFull['osm_id']) ? (int) $aFull['osm_id'] : null,
'osm_type' => isset($aFull['osm_type']) ? $aFull['osm_type'] : null,
'place_type' => isset($aFull['place_type']) ? $aFull['place_type'] : null,
'class' => $aFull['class'],
'type' => $aFull['type'],
'admin_level' => isset($aFull['admin_level']) ? (int) $aFull['admin_level'] : null,
'rank_address' => $aFull['rank_address'] ? (int) $aFull['rank_address'] : null,
'distance' => (float) $aFull['distance'],
'isaddress' => isset($aFull['isaddress']) ? (bool) $aFull['isaddress'] : null
);
};
$funcMapKeyword = function ($aFull) {
$aMapped = array(
'id' => (int) $aFull['word_id'],
'token' => $aFull['word_token']
);
return $aMapped;
return array(
'id' => (int) $aFull['word_id'],
'token' => $aFull['word_token']
);
};
if ($aAddressLines) {
@@ -81,10 +78,14 @@ if ($bIncludeKeywords) {
if ($aPlaceSearchNameKeywords) {
$aPlaceDetails['keywords']['name'] = array_map($funcMapKeyword, $aPlaceSearchNameKeywords);
} else {
$aPlaceDetails['keywords']['name'] = array();
}
if ($aPlaceSearchAddressKeywords) {
$aPlaceDetails['keywords']['address'] = array_map($funcMapKeyword, $aPlaceSearchAddressKeywords);
} else {
$aPlaceDetails['keywords']['address'] = array();
}
}
@@ -92,11 +93,15 @@ if ($bIncludeHierarchy) {
if ($bGroupHierarchy) {
$aPlaceDetails['hierarchy'] = array();
foreach ($aHierarchyLines as $aAddressLine) {
if ($aAddressLine['type'] == 'yes') $sType = $aAddressLine['class'];
else $sType = $aAddressLine['type'];
if ($aAddressLine['type'] == 'yes') {
$sType = $aAddressLine['class'];
} else {
$sType = $aAddressLine['type'];
}
if (!isset($aPlaceDetails['hierarchy'][$sType]))
if (!isset($aPlaceDetails['hierarchy'][$sType])) {
$aPlaceDetails['hierarchy'][$sType] = array();
}
$aPlaceDetails['hierarchy'][$sType][] = $funcMapAddressLine($aAddressLine);
}
} else {

View File

@@ -8,4 +8,4 @@
$error['details'] = $exception->getFile() . '('. $exception->getLine() . ')';
}
echo javascript_renderData(array('error' => $error));
javascript_renderData(array('error' => $error));

View File

@@ -5,7 +5,9 @@ $aOutput['licence'] = 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm
$aOutput['batch'] = array();
foreach ($aBatchResults as $aSearchResults) {
if (!$aSearchResults) $aSearchResults = array();
if (!$aSearchResults) {
$aSearchResults = array();
}
$aFilteredPlaces = array();
foreach ($aSearchResults as $iResNum => $aPointDetails) {
$aPlace = array(

View File

@@ -9,7 +9,9 @@ foreach ($aSearchResults as $iResNum => $aPointDetails) {
)
);
if (isset($aPointDetails['place_id'])) $aPlace['properties']['geocoding']['place_id'] = $aPointDetails['place_id'];
if (isset($aPointDetails['place_id'])) {
$aPlace['properties']['geocoding']['place_id'] = $aPointDetails['place_id'];
}
$sOSMType = formatOSMType($aPointDetails['osm_type']);
if ($sOSMType) {
$aPlace['properties']['geocoding']['osm_type'] = $sOSMType;

View File

@@ -8,7 +8,7 @@ foreach ($aSearchResults as $iResNum => $aPointDetails) {
'place_id'=>$aPointDetails['place_id'],
)
);
$sOSMType = formatOSMType($aPointDetails['osm_type']);
if ($sOSMType) {
$aPlace['properties']['osm_type'] = $sOSMType;
@@ -58,8 +58,12 @@ foreach ($aSearchResults as $iResNum => $aPointDetails) {
}
if (isset($aPointDetails['sExtraTags'])) $aPlace['properties']['extratags'] = $aPointDetails['sExtraTags'];
if (isset($aPointDetails['sNameDetails'])) $aPlace['properties']['namedetails'] = $aPointDetails['sNameDetails'];
if (isset($aPointDetails['sExtraTags'])) {
$aPlace['properties']['extratags'] = $aPointDetails['sExtraTags'];
}
if (isset($aPointDetails['sNameDetails'])) {
$aPlace['properties']['namedetails'] = $aPointDetails['sNameDetails'];
}
$aFilteredPlaces[] = $aPlace;
}

View File

@@ -6,7 +6,7 @@ foreach ($aSearchResults as $iResNum => $aPointDetails) {
'place_id'=>$aPointDetails['place_id'],
'licence'=>'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
);
$sOSMType = formatOSMType($aPointDetails['osm_type']);
if ($sOSMType) {
$aPlace['osm_type'] = $sOSMType;
@@ -60,8 +60,12 @@ foreach ($aSearchResults as $iResNum => $aPointDetails) {
$aPlace['geokml'] = $aPointDetails['askml'];
}
if (isset($aPointDetails['sExtraTags'])) $aPlace['extratags'] = $aPointDetails['sExtraTags'];
if (isset($aPointDetails['sNameDetails'])) $aPlace['namedetails'] = $aPointDetails['sNameDetails'];
if (isset($aPointDetails['sExtraTags'])) {
$aPlace['extratags'] = $aPointDetails['sExtraTags'];
}
if (isset($aPointDetails['sNameDetails'])) {
$aPlace['namedetails'] = $aPointDetails['sNameDetails'];
}
$aFilteredPlaces[] = $aPlace;
}

View File

@@ -10,7 +10,9 @@ echo (isset($sXmlRootTag)?$sXmlRootTag:'searchresults');
echo " timestamp='".date(DATE_RFC822)."'";
echo " attribution='Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright'";
echo " querystring='".htmlspecialchars($sQuery, ENT_QUOTES)."'";
if (isset($aMoreParams['viewbox'])) echo " viewbox='".htmlspecialchars($aMoreParams['viewbox'], ENT_QUOTES)."'";
if (isset($aMoreParams['viewbox'])) {
echo " viewbox='".htmlspecialchars($aMoreParams['viewbox'], ENT_QUOTES)."'";
}
if (isset($aMoreParams['exclude_place_ids'])) {
echo " exclude_place_ids='".htmlspecialchars($aMoreParams['exclude_place_ids'])."'";
}

View File

@@ -0,0 +1,221 @@
<?php
namespace Nominatim;
require_once(CONST_LibDir.'/SimpleWordList.php');
class Tokenizer
{
private $oDB;
private $oNormalizer;
private $oTransliterator;
public function __construct(&$oDB)
{
$this->oDB =& $oDB;
$this->oNormalizer = \Transliterator::createFromRules(CONST_Term_Normalization_Rules);
$this->oTransliterator = \Transliterator::createFromRules(CONST_Transliteration);
}
public function checkStatus()
{
$sSQL = 'SELECT word_id FROM word WHERE word_id is not null limit 1';
$iWordID = $this->oDB->getOne($sSQL);
if ($iWordID === false) {
throw new \Exception('Query failed', 703);
}
if (!$iWordID) {
throw new \Exception('No value', 704);
}
}
public function normalizeString($sTerm)
{
if ($this->oNormalizer === null) {
return $sTerm;
}
return $this->oNormalizer->transliterate($sTerm);
}
public function mostFrequentWords($iNum)
{
$sSQL = "SELECT word FROM word WHERE type = 'W'";
$sSQL .= "ORDER BY info->'count' DESC LIMIT ".$iNum;
return $this->oDB->getCol($sSQL);
}
private function makeStandardWord($sTerm)
{
return trim($this->oTransliterator->transliterate(' '.$sTerm.' '));
}
public function tokensForSpecialTerm($sTerm)
{
$aResults = array();
$sSQL = "SELECT word_id, info->>'class' as class, info->>'type' as type ";
$sSQL .= ' FROM word WHERE word_token = :term and type = \'S\'';
Debug::printVar('Term', $sTerm);
Debug::printSQL($sSQL);
$aSearchWords = $this->oDB->getAll($sSQL, array(':term' => $this->makeStandardWord($sTerm)));
Debug::printVar('Results', $aSearchWords);
foreach ($aSearchWords as $aSearchTerm) {
$aResults[] = new \Nominatim\Token\SpecialTerm(
$aSearchTerm['word_id'],
$aSearchTerm['class'],
$aSearchTerm['type'],
\Nominatim\Operator::TYPE
);
}
Debug::printVar('Special term tokens', $aResults);
return $aResults;
}
public function extractTokensFromPhrases(&$aPhrases)
{
$sNormQuery = '';
$aWordLists = array();
$aTokens = array();
foreach ($aPhrases as $iPhrase => $oPhrase) {
$sNormQuery .= ','.$this->normalizeString($oPhrase->getPhrase());
$sPhrase = $this->makeStandardWord($oPhrase->getPhrase());
Debug::printVar('Phrase', $sPhrase);
$oWordList = new SimpleWordList($sPhrase);
$aTokens = array_merge($aTokens, $oWordList->getTokens());
$aWordLists[] = $oWordList;
}
Debug::printVar('Tokens', $aTokens);
Debug::printVar('WordLists', $aWordLists);
$oValidTokens = $this->computeValidTokens($aTokens, $sNormQuery);
foreach ($aPhrases as $iPhrase => $oPhrase) {
$oPhrase->setWordSets($aWordLists[$iPhrase]->getWordSets($oValidTokens));
}
return $oValidTokens;
}
private function computeValidTokens($aTokens, $sNormQuery)
{
$oValidTokens = new TokenList();
if (!empty($aTokens)) {
$this->addTokensFromDB($oValidTokens, $aTokens, $sNormQuery);
// Try more interpretations for Tokens that could not be matched.
foreach ($aTokens as $sToken) {
if ($sToken[0] != ' ' && !$oValidTokens->contains($sToken)) {
if (preg_match('/^([0-9]{5}) [0-9]{4}$/', $sToken, $aData)) {
// US ZIP+4 codes - merge in the 5-digit ZIP code
$oValidTokens->addToken(
$sToken,
new Token\Postcode(null, $aData[1], 'us')
);
} elseif (preg_match('/^[0-9]+$/', $sToken)) {
// Unknown single word token with a number.
// Assume it is a house number.
$oValidTokens->addToken(
$sToken,
new Token\HouseNumber(null, trim($sToken))
);
}
}
}
}
return $oValidTokens;
}
private function addTokensFromDB(&$oValidTokens, $aTokens, $sNormQuery)
{
// Check which tokens we have, get the ID numbers
$sSQL = 'SELECT word_id, word_token, type, word,';
$sSQL .= " info->>'op' as operator,";
$sSQL .= " info->>'class' as class, info->>'type' as ctype,";
$sSQL .= " info->>'count' as count";
$sSQL .= ' FROM word WHERE word_token in (';
$sSQL .= join(',', $this->oDB->getDBQuotedList($aTokens)).')';
Debug::printSQL($sSQL);
$aDBWords = $this->oDB->getAll($sSQL, null, 'Could not get word tokens.');
foreach ($aDBWords as $aWord) {
$iId = (int) $aWord['word_id'];
$sTok = $aWord['word_token'];
switch ($aWord['type']) {
case 'C': // country name tokens
if ($aWord['word'] !== null) {
$oValidTokens->addToken(
$sTok,
new Token\Country($iId, $aWord['word'])
);
}
break;
case 'H': // house number tokens
$oValidTokens->addToken($sTok, new Token\HouseNumber($iId, $aWord['word_token']));
break;
case 'P': // postcode tokens
// Postcodes are not normalized, so they may have content
// that makes SQL injection possible. Reject postcodes
// that would need special escaping.
if ($aWord['word'] !== null
&& pg_escape_string($aWord['word']) == $aWord['word']
) {
$sNormPostcode = $this->normalizeString($aWord['word']);
if (strpos($sNormQuery, $sNormPostcode) !== false) {
$oValidTokens->addToken(
$sTok,
new Token\Postcode($iId, $aWord['word'], null)
);
}
}
break;
case 'S': // tokens for classification terms (special phrases)
if ($aWord['class'] !== null && $aWord['ctype'] !== null) {
$oValidTokens->addToken($sTok, new Token\SpecialTerm(
$iId,
$aWord['class'],
$aWord['ctype'],
(isset($aWord['operator'])) ? Operator::NEAR : Operator::NONE
));
}
break;
case 'W': // full-word tokens
$oValidTokens->addToken($sTok, new Token\Word(
$iId,
(int) $aWord['count'],
substr_count($aWord['word_token'], ' ')
));
break;
case 'w': // partial word terms
$oValidTokens->addToken($sTok, new Token\Partial(
$iId,
$aWord['word_token'],
(int) $aWord['count']
));
break;
default:
break;
}
}
}
}

Some files were not shown because too many files have changed in this diff Show More