Compare commits

6 Commits

Author SHA1 Message Date
Sarah Hoffmann
282bd4a67e prepare for 3.4.2 release 2020-05-02 22:04:32 +02:00
Sarah Hoffmann
51f6db2e9c properly escape class parameter
The class parameter was used as is, allowing for potential
SQL injection via the API.

Thanks to @bladeswords for finding this.
2020-05-02 21:58:16 +02:00
Sarah Hoffmann
e4ecbef61e prepare for 3.4.1 release 2019-12-28 22:53:38 +01:00
Sarah Hoffmann
23dd49a5a2 update osm2pgsql (exclude country and postcode from address tags) 2019-12-28 22:41:33 +01:00
Francesc Hervada-Sala
0c85f88be8 typo - fixes openstreetmap#1606 2019-12-28 22:41:19 +01:00
Sarah Hoffmann
7829a05002 update osm2pgsql (deletion and address updates) 2019-12-28 22:40:46 +01:00
3687 changed files with 26759 additions and 46157 deletions
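
The commit `51f6db2e9c` above describes a parameter that was "used as is" in a query. The following is a minimal sketch of that class of bug and of one common remedy, parameter binding. It is not the actual Nominatim fix (which lives in the PHP code and is not shown on this page). The `places` table, its columns, and both function names are hypothetical, and SQLite stands in for PostgreSQL only to keep the example self-contained.

```python
import sqlite3


def find_by_class_unsafe(conn, cls):
    # Vulnerable: the user-supplied value is pasted into the SQL text,
    # so quotes inside it can change the structure of the query.
    query = "SELECT name FROM places WHERE class = '" + cls + "'"
    return [row[0] for row in conn.execute(query)]


def find_by_class_safe(conn, cls):
    # The value is passed as a bound parameter and is never parsed as SQL.
    query = "SELECT name FROM places WHERE class = ?"
    return [row[0] for row in conn.execute(query, (cls,))]


# Hypothetical demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, class TEXT)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [("Cafe A", "amenity"), ("Main St", "highway")],
)

# A crafted value that turns the WHERE clause into an always-true condition
# when it is concatenated into the query text.
payload = "x' OR '1'='1"
```

With the crafted `payload`, the unsafe variant matches every row, while the bound-parameter variant treats the whole string as a literal class name and matches nothing.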

View File

@@ -1,4 +0,0 @@
contact_links:
- name: Nominatim Discussions
url: https://github.com/osm-search/Nominatim/discussions
about: Ask questions, get support, share ideas and discuss with community members.

View File

@@ -1,22 +0,0 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''
---
<!-- Before opening a new feature request, please search through the open issue to check that your request hasn't been reported already. -->
**Is your feature request related to a problem? Please describe.**
<!-- A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] -->
**Describe the solution you'd like**
<!-- A clear and concise description of what you want to happen. -->
**Describe alternatives you've considered**
<!-- A clear and concise description of any alternative solutions or features you've considered. -->
**Additional context**
<!-- Add any other context or screenshots about the feature request here. -->

View File

@@ -1,37 +0,0 @@
---
name: Report issues with search results
about: You have searched something with Nominatim and did not get the expected result.
title: ''
labels: ''
assignees: ''
---
## What did you search for?
<!-- Please try to provide a link to your search. You can go to https://nominatim.openstreetmap.org and repeat your search there. If you originally found the issue somewhere else, please tell us what software/website you were using. -->
## What result did you get?
## What result did you expect?
**Is the result in the right place and just named wrongly?**
<!-- Please tell us the display name you expected. -->
**Is the result missing completely?**
<!-- Make sure that the data you are looking for is in OpenStreetMap. Provide a link to the OpenStreetMap object or if you cannot get it, a link to the map on https://openstreetmap.org where you expect the result to be.
To get the link to the OSM object, you can try the following:
* Go to [https://openstreetmap.org](https://openstreetmap.org).
* Move to the area of the map where you expect the result and then zoom in as much as possible.
* Click on the question mark on the right side of the map. You get a question cursor. Use it to click on the map where your object is located.
* Find the object of interest in the list that appears on the left side.
* Click on the object and report back the URL that the browser shows.
-->
## Further details
<!-- Anything else we should know about the search. Particularities with addresses in the area etc. -->

View File

@@ -1,36 +0,0 @@
---
name: Report problems with the software
about: You have your own installation of Nominatim and found a bug.
title: ''
labels: ''
assignees: ''
---
<!-- Note: if you are installing Nominatim through a docker image, you should report issues with the installation process with the docker repository first. -->
**Describe the bug**
<!-- A clear and concise description of what the bug is. -->
**To Reproduce**
<!-- Please describe what you did to get to the issue. -->
**Software Environment (please complete the following information):**
- Nominatim version:
- Postgresql version:
- Postgis version:
- OS:
**Hardware Configuration (please complete the following information):**
- RAM:
- number of CPUs:
- type and size of disks:
- bare metal/AWS/other cloud service:
**Postgresql Configuration:**
<!-- List any configuration items you changed in your postgresql configuration. -->
**Additional context**
<!-- Add any other context about the problem here. -->

View File

@@ -1,42 +0,0 @@
name: 'Build Nominatim'
inputs:
ubuntu:
description: 'Version of Ubuntu to install on'
required: false
default: '20'
runs:
using: "composite"
steps:
- name: Install prerequisites
run: |
sudo apt-get install -y -qq libboost-system-dev libboost-filesystem-dev libexpat1-dev zlib1g-dev libbz2-dev libpq-dev libproj-dev libicu-dev
if [ "x$UBUNTUVER" == "x18" ]; then
pip3 install python-dotenv psycopg2==2.7.7 jinja2==2.8 psutil==5.4.2 pyicu osmium PyYAML==5.1 datrie
else
sudo apt-get install -y -qq python3-icu python3-datrie python3-pyosmium python3-jinja2 python3-psutil python3-psycopg2 python3-dotenv python3-yaml
fi
shell: bash
env:
UBUNTUVER: ${{ inputs.ubuntu }}
- name: Download dependencies
run: |
if [ ! -f country_grid.sql.gz ]; then
wget --no-verbose https://www.nominatim.org/data/country_grid.sql.gz
fi
cp country_grid.sql.gz Nominatim/data/country_osm_grid.sql.gz
shell: bash
- name: Configure
run: mkdir build && cd build && cmake ../Nominatim
shell: bash
- name: Build
run: |
make -j2 all
sudo make install
shell: bash
working-directory: build

View File

@@ -1,47 +0,0 @@
name: 'Setup Postgresql and Postgis'
inputs:
postgresql-version:
description: 'Version of PostgreSQL to install'
required: true
postgis-version:
description: 'Version of Postgis to install'
required: true
runs:
using: "composite"
steps:
- name: Remove existing PostgreSQL
run: |
sudo apt-get purge -yq postgresql*
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt-get update -qq
shell: bash
- name: Install PostgreSQL
run: |
sudo apt-get install -y -qq --no-install-suggests --no-install-recommends postgresql-client-${PGVER} postgresql-${PGVER}-postgis-${POSTGISVER} postgresql-${PGVER}-postgis-${POSTGISVER}-scripts postgresql-contrib-${PGVER} postgresql-${PGVER} postgresql-server-dev-${PGVER}
shell: bash
env:
PGVER: ${{ inputs.postgresql-version }}
POSTGISVER: ${{ inputs.postgis-version }}
- name: Adapt postgresql configuration
run: |
echo 'fsync = off' | sudo tee /etc/postgresql/${PGVER}/main/conf.d/local.conf
echo 'synchronous_commit = off' | sudo tee -a /etc/postgresql/${PGVER}/main/conf.d/local.conf
echo 'full_page_writes = off' | sudo tee -a /etc/postgresql/${PGVER}/main/conf.d/local.conf
echo 'shared_buffers = 1GB' | sudo tee -a /etc/postgresql/${PGVER}/main/conf.d/local.conf
echo 'port = 5432' | sudo tee -a /etc/postgresql/${PGVER}/main/conf.d/local.conf
shell: bash
env:
PGVER: ${{ inputs.postgresql-version }}
- name: Setup database
run: |
sudo systemctl restart postgresql
sudo -u postgres createuser -S www-data
sudo -u postgres createuser -s runner
shell: bash

View File

@@ -1,217 +0,0 @@
name: CI Tests
on: [ push, pull_request ]
jobs:
tests:
strategy:
matrix:
ubuntu: [18, 20]
include:
- ubuntu: 18
postgresql: 9.5
postgis: 2.5
pytest: pytest
php: 7.2
- ubuntu: 20
postgresql: 13
postgis: 3
pytest: py.test-3
php: 7.4
runs-on: ubuntu-${{ matrix.ubuntu }}.04
steps:
- uses: actions/checkout@v2
with:
submodules: true
path: Nominatim
- name: Setup PHP
uses: shivammathur/setup-php@v2
with:
php-version: ${{ matrix.php }}
coverage: xdebug
tools: phpunit, phpcs, composer
- uses: actions/setup-python@v2
with:
python-version: 3.6
if: matrix.ubuntu == 18
- name: Get Date
id: get-date
run: |
echo "::set-output name=date::$(/bin/date -u "+%Y%W")"
shell: bash
- uses: actions/cache@v2
with:
path: |
country_grid.sql.gz
key: nominatim-country-data-${{ steps.get-date.outputs.date }}
- uses: ./Nominatim/.github/actions/setup-postgresql
with:
postgresql-version: ${{ matrix.postgresql }}
postgis-version: ${{ matrix.postgis }}
- uses: ./Nominatim/.github/actions/build-nominatim
with:
ubuntu: ${{ matrix.ubuntu }}
- name: Install test prerequisites
run: sudo apt-get install -y -qq pylint python3-pytest python3-behave python3-pytest-cov php-codecoverage
if: matrix.ubuntu == 20
- name: Install test prerequisites
run: |
pip3 install pylint==2.6.0 pytest pytest-cov behave==1.2.6
if: matrix.ubuntu == 18
- name: PHP linting
run: phpcs --report-width=120 .
working-directory: Nominatim
- name: Python linting
run: pylint nominatim
working-directory: Nominatim
- name: PHP unit tests
run: phpunit --coverage-clover ../../coverage-php.xml ./
working-directory: Nominatim/test/php
if: matrix.ubuntu == 20
- name: Python unit tests
run: $PYTEST --cov=nominatim --cov-report=xml test/python
working-directory: Nominatim
env:
PYTEST: ${{ matrix.pytest }}
- name: BDD tests
run: |
mkdir cov
behave -DREMOVE_TEMPLATE=1 -DBUILDDIR=$GITHUB_WORKSPACE/build --format=progress3 -DPHPCOV=./cov
composer require phpunit/phpcov:7.0.2
vendor/bin/phpcov merge --clover ../../coverage-bdd.xml ./cov
working-directory: Nominatim/test/bdd
if: matrix.ubuntu == 20
- name: BDD tests
run: |
behave -DREMOVE_TEMPLATE=1 -DBUILDDIR=$GITHUB_WORKSPACE/build --format=progress3
working-directory: Nominatim/test/bdd
if: matrix.ubuntu == 18
- name: BDD tests (legacy_icu tokenizer)
run: |
behave -DREMOVE_TEMPLATE=1 -DBUILDDIR=$GITHUB_WORKSPACE/build -DTOKENIZER=legacy_icu --format=progress3
working-directory: Nominatim/test/bdd
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v1
with:
files: ./Nominatim/coverage*.xml
directory: ./
name: codecov-umbrella
fail_ci_if_error: false
path_to_write_report: ./coverage/codecov_report.txt
verbose: true
if: matrix.ubuntu == 20
import:
strategy:
matrix:
ubuntu: [18, 20]
include:
- ubuntu: 18
postgresql: 9.5
postgis: 2.5
- ubuntu: 20
postgresql: 13
postgis: 3
runs-on: ubuntu-${{ matrix.ubuntu }}.04
steps:
- uses: actions/checkout@v2
with:
submodules: true
path: Nominatim
- name: Get Date
id: get-date
run: |
echo "::set-output name=date::$(/bin/date -u "+%Y%W")"
shell: bash
- uses: actions/cache@v2
with:
path: |
country_grid.sql.gz
key: nominatim-country-data-${{ steps.get-date.outputs.date }}
- uses: actions/cache@v2
with:
path: |
monaco-latest.osm.pbf
key: nominatim-test-data-${{ steps.get-date.outputs.date }}
- uses: actions/setup-python@v2
with:
python-version: 3.6
if: matrix.ubuntu == 18
- uses: ./Nominatim/.github/actions/setup-postgresql
with:
postgresql-version: ${{ matrix.postgresql }}
postgis-version: ${{ matrix.postgis }}
- uses: ./Nominatim/.github/actions/build-nominatim
with:
ubuntu: ${{ matrix.ubuntu }}
- name: Clean installation
run: rm -rf Nominatim build
shell: bash
- name: Prepare import environment
run: |
if [ ! -f monaco-latest.osm.pbf ]; then
wget --no-verbose https://download.geofabrik.de/europe/monaco-latest.osm.pbf
fi
mkdir data-env
cd data-env
shell: bash
- name: Import
run: nominatim import --osm-file ../monaco-latest.osm.pbf
shell: bash
working-directory: data-env
- name: Import special phrases
run: nominatim special-phrases --import-from-wiki
working-directory: data-env
- name: Check full import
run: nominatim admin --check-database
working-directory: data-env
- name: Warm up database
run: nominatim admin --warm
working-directory: data-env
- name: Run update
run: |
nominatim replication --init
nominatim replication --once
working-directory: data-env
- name: Run reverse-only import
run: nominatim import --osm-file ../monaco-latest.osm.pbf --reverse-only --no-updates
working-directory: data-env
env:
NOMINATIM_DATABASE_DSN: pgsql:dbname=reverse
- name: Check reverse import
run: nominatim admin --check-database
working-directory: data-env
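
The `Get Date` steps in the workflow above build their cache keys from `date -u "+%Y%W"`, that is, the UTC year followed by the week number with Monday as the first day of the week. The cached downloads are therefore refreshed at most once per week. A small sketch of the same key computation follows; the function name `weekly_cache_key` and its `now` parameter are assumptions made for illustration and are not part of the workflow.

```python
from datetime import datetime, timezone


def weekly_cache_key(prefix, now=None):
    # Mirror `date -u "+%Y%W"`: four-digit year plus the zero-padded week
    # number, where weeks start on Monday and days before the first Monday
    # of the year fall into week 00.
    if now is None:
        now = datetime.now(timezone.utc)
    return f"{prefix}-{now.strftime('%Y%W')}"


# Example call with the key prefix used for the country grid cache.
key = weekly_cache_key("nominatim-country-data")
```

Any two runs within the same Monday-to-Sunday week produce the same key and so hit the same cache entry.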

1
.gitignore vendored
View File

@@ -9,4 +9,3 @@ data/wiki_specialphrases.sql
data/osmosischange.osc
.vagrant
data/country_osm_grid.sql.gz

View File

@@ -1,15 +0,0 @@
[MASTER]
extension-pkg-whitelist=osmium
ignored-modules=icu,datrie
[MESSAGES CONTROL]
[TYPECHECK]
# closing added here because it sometimes triggers a false positive with
# 'with' statements.
ignored-classes=NominatimArgs,closing
disable=too-few-public-methods,duplicate-code
good-names=i,x,y,fd,db

34
.travis.yml Normal file
View File

@@ -0,0 +1,34 @@
---
sudo: required
dist: xenial
language: python
python:
- "3.6"
addons:
postgresql: "9.6"
git:
depth: 3
env:
- TEST_SUITE=tests
- TEST_SUITE=monaco
before_install:
- phpenv global 7.1
install:
- vagrant/install-on-travis-ci.sh
before_script:
- psql -U postgres -c "create extension postgis"
script:
- cd $TRAVIS_BUILD_DIR/
- if [[ $TEST_SUITE == "tests" ]]; then phpcs --report-width=120 . ; fi
- cd $TRAVIS_BUILD_DIR/test/php
- if [[ $TEST_SUITE == "tests" ]]; then /usr/bin/phpunit ./ ; fi
- cd $TRAVIS_BUILD_DIR/test/bdd
- # behave --format=progress3 api
- if [[ $TEST_SUITE == "tests" ]]; then behave -DREMOVE_TEMPLATE=1 --format=progress3 db ; fi
- if [[ $TEST_SUITE == "tests" ]]; then behave --format=progress3 osm2pgsql ; fi
- cd $TRAVIS_BUILD_DIR/build
- if [[ $TEST_SUITE == "monaco" ]]; then wget --no-verbose --output-document=../data/monaco.osm.pbf http://download.geofabrik.de/europe/monaco-latest.osm.pbf; fi
- if [[ $TEST_SUITE == "monaco" ]]; then /usr/bin/env php ./utils/setup.php --osm-file ../data/monaco.osm.pbf --osm2pgsql-cache 1000 --all 2>&1 | grep -v 'ETA (seconds)'; fi
- if [[ $TEST_SUITE == "monaco" ]]; then /usr/bin/env php ./utils/specialphrases.php --wiki-import | psql -d test_api_nominatim >/dev/null; fi
notifications:
email: false

View File

@@ -6,7 +6,7 @@
#
#-----------------------------------------------------------------------------
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
cmake_minimum_required(VERSION 2.8 FATAL_ERROR)
list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake")
@@ -19,8 +19,8 @@ list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake")
project(nominatim)
set(NOMINATIM_VERSION_MAJOR 3)
set(NOMINATIM_VERSION_MINOR 7)
set(NOMINATIM_VERSION_PATCH 0)
set(NOMINATIM_VERSION_MINOR 4)
set(NOMINATIM_VERSION_PATCH 2)
set(NOMINATIM_VERSION "${NOMINATIM_VERSION_MAJOR}.${NOMINATIM_VERSION_MINOR}.${NOMINATIM_VERSION_PATCH}")
@@ -28,236 +28,148 @@ add_definitions(-DNOMINATIM_VERSION="${NOMINATIM_VERSION}")
#-----------------------------------------------------------------------------
# Configuration
#
# Find external dependencies
#
#-----------------------------------------------------------------------------
set(BUILD_IMPORTER on CACHE BOOL "Build everything for importing/updating the database")
set(BUILD_API on CACHE BOOL "Build everything for the API server")
set(BUILD_MODULE on CACHE BOOL "Build PostgreSQL module")
set(BUILD_TESTS on CACHE BOOL "Build test suite")
set(BUILD_DOCS on CACHE BOOL "Build documentation")
set(BUILD_MANPAGE on CACHE BOOL "Build Manual Page")
set(BUILD_OSM2PGSQL on CACHE BOOL "Build osm2pgsql (expert only)")
set(BUILD_TESTS off CACHE BOOL "Build test suite" FORCE)
set(WITH_LUA off CACHE BOOL "Build with lua support" FORCE)
set(ONLY_DOCS off CACHE BOOL "Build documentation only")
#-----------------------------------------------------------------------------
# osm2pgsql (imports/updates only)
#-----------------------------------------------------------------------------
if (BUILD_IMPORTER AND BUILD_OSM2PGSQL)
if (NOT ONLY_DOCS)
if (NOT EXISTS "${CMAKE_SOURCE_DIR}/osm2pgsql/CMakeLists.txt")
message(FATAL_ERROR "The osm2pgsql directory is empty.\
Did you forget to check out Nominatim recursively?\
\nTry updating submodules with: git submodule update --init")
endif()
set(BUILD_TESTS_SAVED "${BUILD_TESTS}")
set(BUILD_TESTS off)
set(WITH_LUA off CACHE BOOL "")
add_subdirectory(osm2pgsql)
set(BUILD_TESTS ${BUILD_TESTS_SAVED})
endif()
find_package(Threads REQUIRED)
unset(PostgreSQL_TYPE_INCLUDE_DIR CACHE)
set(PostgreSQL_TYPE_INCLUDE_DIR "/usr/include/")
find_package(PostgreSQL REQUIRED)
include_directories(${PostgreSQL_INCLUDE_DIRS})
link_directories(${PostgreSQL_LIBRARY_DIRS})
find_program(PYOSMIUM pyosmium-get-changes)
if (NOT EXISTS "${PYOSMIUM}")
set(PYOSMIUM_PATH "")
message(WARNING "pyosmium-get-changes not found (required for updates)")
else()
set(PYOSMIUM_PATH "${PYOSMIUM}")
message(STATUS "Using pyosmium-get-changes at ${PYOSMIUM_PATH}")
endif()
#-----------------------------------------------------------------------------
# python (imports/updates only)
#-----------------------------------------------------------------------------
find_program(PG_CONFIG pg_config)
execute_process(COMMAND ${PG_CONFIG} --pgxs
OUTPUT_VARIABLE PGXS
OUTPUT_STRIP_TRAILING_WHITESPACE)
if (BUILD_IMPORTER)
find_package(PythonInterp 3.6 REQUIRED)
endif()
if (NOT EXISTS "${PGXS}")
message(FATAL_ERROR "Postgresql server package not found.")
endif()
#-----------------------------------------------------------------------------
# PHP
#-----------------------------------------------------------------------------
find_package(ZLIB REQUIRED)
# Setting PHP binary variable as to command line (prevailing) or auto detect
find_package(BZip2 REQUIRED)
if (BUILD_API OR BUILD_IMPORTER)
find_package(LibXml2 REQUIRED)
include_directories(${LIBXML2_INCLUDE_DIR})
# Setting PHP binary variable as to command line (prevailing) or auto detect
if (NOT PHP_BIN)
find_program (PHP_BIN php)
endif()
# sanity check if PHP binary exists
if (NOT EXISTS ${PHP_BIN})
message(FATAL_ERROR "PHP binary not found. Install php or provide location with -DPHP_BIN=/path/php ")
else()
message (STATUS "Using PHP binary " ${PHP_BIN})
endif()
if (NOT PHPCGI_BIN)
find_program (PHPCGI_BIN php-cgi)
endif()
# sanity check if PHP binary exists
if (NOT EXISTS ${PHPCGI_BIN})
message(WARNING "php-cgi binary not found. nominatim tool will not provide query functions.")
set (PHPCGI_BIN "")
else()
message (STATUS "Using php-cgi binary " ${PHPCGI_BIN})
endif()
message (STATUS "Using PHP binary " ${PHP_BIN})
endif()
#-----------------------------------------------------------------------------
# import scripts and utilities (importer only)
#
# Setup settings and paths
#
#-----------------------------------------------------------------------------
if (BUILD_IMPORTER)
find_file(COUNTRY_GRID_FILE country_osm_grid.sql.gz
PATHS ${PROJECT_SOURCE_DIR}/data
NO_DEFAULT_PATH
DOC "Location of the country grid file."
)
set(WEBSITESCRIPTS
website/deletable.php
website/details.php
website/hierarchy.php
website/lookup.php
website/polygons.php
website/reverse.php
website/search.php
website/status.php
)
if (NOT COUNTRY_GRID_FILE)
message(FATAL_ERROR "\nYou need to download the country_osm_grid first:\n"
" wget -O ${PROJECT_SOURCE_DIR}/data/country_osm_grid.sql.gz https://www.nominatim.org/data/country_grid.sql.gz")
endif()
set(CUSTOMSCRIPTS
utils/country_languages.php
utils/importWikipedia.php
utils/export.php
utils/query.php
utils/setup.php
utils/specialphrases.php
utils/update.php
utils/warm.php
)
foreach (script_source ${CUSTOMSCRIPTS})
configure_file(${PROJECT_SOURCE_DIR}/cmake/script.tmpl
${PROJECT_BINARY_DIR}/${script_source})
endforeach()
foreach (script_source ${WEBSITESCRIPTS})
configure_file(${PROJECT_SOURCE_DIR}/cmake/website.tmpl
${PROJECT_BINARY_DIR}/${script_source})
endforeach()
configure_file(${PROJECT_SOURCE_DIR}/settings/defaults.php
${PROJECT_BINARY_DIR}/settings/settings.php)
set(WEBPATHS css images js)
foreach (wp ${WEBPATHS})
execute_process(
COMMAND ln -sf ${PROJECT_SOURCE_DIR}/website/${wp} ${PROJECT_BINARY_DIR}/website/
)
endforeach()
configure_file(${PROJECT_SOURCE_DIR}/cmake/tool.tmpl
${PROJECT_BINARY_DIR}/nominatim)
endif()
#-----------------------------------------------------------------------------
#
# Tests
#
#-----------------------------------------------------------------------------
if (BUILD_TESTS)
if (NOT ONLY_DOCS)
include(CTest)
set(TEST_BDD db osm2pgsql api)
find_program(PYTHON_BEHAVE behave)
find_program(PYLINT NAMES pylint3 pylint)
find_program(PYTEST NAMES pytest py.test-3 py.test)
find_program(PHPCS phpcs)
find_program(PHPUNIT phpunit)
foreach (test ${TEST_BDD})
add_test(NAME bdd_${test}
COMMAND lettuce features/${test}
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}/tests)
set_tests_properties(bdd_${test}
PROPERTIES ENVIRONMENT "NOMINATIM_DIR=${PROJECT_BINARY_DIR}")
endforeach()
if (PYTHON_BEHAVE)
message(STATUS "Using Python behave binary ${PYTHON_BEHAVE}")
foreach (test ${TEST_BDD})
add_test(NAME bdd_${test}
COMMAND ${PYTHON_BEHAVE} ${test}
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}/test/bdd)
set_tests_properties(bdd_${test}
PROPERTIES ENVIRONMENT "NOMINATIM_DIR=${PROJECT_BINARY_DIR}")
endforeach()
else()
message(WARNING "behave not found. BDD tests disabled." )
endif()
if (PHPUNIT)
message(STATUS "Using phpunit binary ${PHPUNIT}")
add_test(NAME php
COMMAND ${PHPUNIT} ./
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}/test/php)
else()
message(WARNING "phpunit not found. PHP unit tests disabled." )
endif()
if (PHPCS)
message(STATUS "Using phpcs binary ${PHPCS}")
add_test(NAME phpcs
COMMAND ${PHPCS} --report-width=120 --colors lib website utils
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
else()
message(WARNING "phpcs not found. PHP linting tests disabled." )
endif()
if (PYLINT)
message(STATUS "Using pylint binary ${PYLINT}")
add_test(NAME pylint
COMMAND ${PYLINT} nominatim
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
else()
message(WARNING "pylint not found. Python linting tests disabled.")
endif()
if (PYTEST)
message(STATUS "Using pytest binary ${PYTEST}")
add_test(NAME pytest
COMMAND ${PYTEST} test/python
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
else()
message(WARNING "pytest not found. Python tests disabled." )
endif()
add_test(NAME php
COMMAND phpunit ./
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}/tests-php)
endif()
#-----------------------------------------------------------------------------
# Postgres module
#-----------------------------------------------------------------------------
if (BUILD_MODULE)
if (NOT ONLY_DOCS)
add_subdirectory(module)
add_subdirectory(nominatim)
endif()
add_subdirectory(docs)
#-----------------------------------------------------------------------------
# Documentation
#-----------------------------------------------------------------------------
if (BUILD_DOCS)
add_subdirectory(docs)
endif()
#-----------------------------------------------------------------------------
# Manual page
#-----------------------------------------------------------------------------
if (BUILD_MANPAGE)
add_subdirectory(manual)
endif()
#-----------------------------------------------------------------------------
# Installation
#-----------------------------------------------------------------------------
include(GNUInstallDirs)
set(NOMINATIM_DATADIR ${CMAKE_INSTALL_FULL_DATADIR}/${PROJECT_NAME})
set(NOMINATIM_LIBDIR ${CMAKE_INSTALL_FULL_LIBDIR}/${PROJECT_NAME})
set(NOMINATIM_CONFIGDIR ${CMAKE_INSTALL_FULL_SYSCONFDIR}/${PROJECT_NAME})
if (BUILD_IMPORTER)
configure_file(${PROJECT_SOURCE_DIR}/cmake/tool-installed.tmpl installed.bin)
install(PROGRAMS ${PROJECT_BINARY_DIR}/installed.bin
DESTINATION ${CMAKE_INSTALL_BINDIR}
RENAME nominatim)
install(DIRECTORY nominatim
DESTINATION ${NOMINATIM_LIBDIR}/lib-python
FILES_MATCHING PATTERN "*.py"
PATTERN __pycache__ EXCLUDE)
install(DIRECTORY lib-sql DESTINATION ${NOMINATIM_LIBDIR})
install(FILES data/country_name.sql
${COUNTRY_GRID_FILE}
data/words.sql
DESTINATION ${NOMINATIM_DATADIR})
endif()
if (BUILD_OSM2PGSQL)
if (${CMAKE_VERSION} VERSION_LESS 3.13)
# Installation of subdirectory targets was only introduced in 3.13.
# So just copy the osm2pgsql file for older versions.
install(PROGRAMS ${PROJECT_BINARY_DIR}/osm2pgsql/osm2pgsql
DESTINATION ${NOMINATIM_LIBDIR})
else()
install(TARGETS osm2pgsql RUNTIME DESTINATION ${NOMINATIM_LIBDIR})
endif()
endif()
if (BUILD_MODULE)
install(PROGRAMS ${PROJECT_BINARY_DIR}/module/nominatim.so
DESTINATION ${NOMINATIM_LIBDIR}/module)
endif()
if (BUILD_API)
install(DIRECTORY lib-php DESTINATION ${NOMINATIM_LIBDIR})
endif()
install(FILES settings/env.defaults
settings/address-levels.json
settings/phrase-settings.json
settings/import-admin.style
settings/import-street.style
settings/import-address.style
settings/import-full.style
settings/import-extratags.style
settings/legacy_icu_tokenizer.yaml
settings/icu-rules/extended-unicode-to-asccii.yaml
DESTINATION ${NOMINATIM_CONFIGDIR})

View File

@@ -7,6 +7,38 @@ Please always open a separate issue for each problem. In particular, do
not add your bugs to closed issues. They may look similar to you but
often are completely different from the maintainer's point of view.
### When Reporting Bad Search Results...
Please make sure to add the following information:
* the URL of the query that produces the bad result
* the result you are getting
* the expected result, preferably a link to the OSM object you want to find,
otherwise an address that is as precise as possible
To get the link to the OSM object, you can try the following:
* go to https://openstreetmap.org
* zoom to the area of the map where you expect the result and
zoom in as much as possible
* click on the question mark on the right side of the map,
then with the question cursor on the map where your object is located
* find the object of interest in the list that appears on the left side
* click on the object and report the URL back that the browser shows
### When Reporting Problems with your Installation...
Please add the following information to your issue:
* hardware configuration: RAM size, CPUs, kind and size of disks
* Operating system (also mention if you are running on a cloud service)
* Postgres and Postgis version
* list of settings you changed in your Postgres configuration
* Nominatim version (release version or,
if you run from the git repo, the output of `git rev-parse HEAD`)
* (if applicable) exact command line of the command that was causing the issue
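
Part of the checklist above can be gathered with a short script. This is only a sketch: the function names `git_revision` and `environment_report` are made up for illustration, and the script covers just the operating system and the `git rev-parse HEAD` output. The Postgres, Postgis and hardware details still need to be added by hand.

```python
import platform
import subprocess


def git_revision(repo_dir="."):
    # Return the output of `git rev-parse HEAD` for repo_dir, or None when
    # git is not installed or the directory is not a git checkout.
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def environment_report(repo_dir="."):
    # Collect the items that can be detected automatically.
    return {
        "os": platform.platform(),
        "nominatim_revision": git_revision(repo_dir) or "unknown (not a git checkout)",
    }


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name}: {value}")
```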
## Workflow for Pull Requests
We love to get pull requests from you. We operate the "Fork & Pull" model
@@ -49,18 +81,22 @@ are in process of consolidating the style. The following rules apply:
* for PHP variables use CamelCase with a prefixing letter indicating the type
(i - integer, f - float, a - array, s - string, o - object)
The coding style is enforced with PHPCS and pylint. It can be tested with:
The coding style is enforced with PHPCS and can be tested with:
```
phpcs --report-width=120 --colors .
pylint3 --extension-pkg-whitelist=osmium nominatim
phpcs --report-width=120 --colors .
```
## Testing
Before submitting a pull request make sure that the tests pass:
Before submitting a pull request make sure that the following tests pass:
```
cd build
make test
cd test/bdd
behave -DBUILDDIR=<builddir> db osm2pgsql
```
```
cd test/php
phpunit ./
```

111
ChangeLog
View File

@@ -1,112 +1,11 @@
3.7.0
* switch to dotenv for configuration file
* introduce 'make install' (reorganising most of the code)
* introduce nominatim tool as replacement for various php scripts
* introduce project directories and allow multiple installations from same build
* clean up BDD tests: drop nose, reorganise step code
* simplify test database for API BDD tests and autoinstall database
* port most of the code for command-line tools to Python
(thanks to @darkshredder and @AntoJvlt)
* add tests for all tooling
* replace pyosmium-get-changes with custom internal implementation using
pyosmium
* improve search for queries with housenumber and partial terms
* add database versioning
* use jinja2 for preprocessing SQL files
* introduce automatic migrations
* reverse fix preference of interpolations over housenumbers
* parallelize indexing of postcodes
* add non-key indexes to speed up housenumber + street searches
* switch housenumber field in placex to save transliterated names
3.6.0
* add full support for searching by and displaying of addr:* tags
* improve address output for large-area objects
* better use of country names from OSM data for search and display
* better debug output for reverse call
* add support for addr:place links without a place equivalent in OSM
* improve finding postcodes with normalisation artefacts
* batch object to index for rank 30, avoiding a wrap-around of transaction
IDs in PostgreSQL
* introduce dynamic address rank computation for administrative boundaries
depending on linked objects and their place in the admin level hierarchy
* add country-specific address ranking for Indonesia, Russia, Belgium and
the Netherlands (thanks @hendrikmoree)
* make sure wikidata/wikipedia tags are imported for all styles
* make POIs searchable by name and housenumber (thanks @joy-yyd)
* reverse geocoding now ignores places without an address rank (rivers etc.)
* installation of a webserver is no longer mandatory, for development
use the php internal webserver via 'make serve'
* reduce the influence of place nodes in addresses
* drop support for the unspecific is_in tag
* various minor tweaks to supplied styles
* move HTML web frontend into its own project
* move scripts for processing external data sources into separate directories
* introduce separate configuration for website (thanks @krahulreddy)
* update documentation, in particular, clean up development docs
* update osm2pgsql to 1.4.0
3.5.2
* ensure that wikipedia tags are imported for all styles
* reinstate verbosity for indexing during updates
* make house number reappear in display name on named POIs
* introduce batch processing in indexer to avoid transaction ID overrun
* increase splitting for large geometries to improve indexing speed
* remove deprecated get_magic_quotes_gpc() function
* make sure that all postcodes have an entry in word and are thus searchable
* remove use of ST_Covers in conjunction with ST_Intersects,
causes bad query planning and slow updates in Postgis3
* update osm2pgsql
3.5.1
* disable jit and parallel processing in PostgreSQL for osm2pgsql
* update libosmium to 2.15.6 (fixes an issue with processing hanging
on large multipolygons)
3.5.0
* structured select on HTML search page
* new PHP Nominatim\Shell class to wrap shell escaping
* remove polygon parameter from all API calls
* improve handling of postcode areas
* reorganise place linking algorithm, now using wikidata tag as well
* remove linkees from search_name and larger_area tables
* introduce country-specific address ranks
* reorganise rank address computation
* cleanup of partition function
* improve parenting for large POIs
* add support for Postgresql 12 and Postgis 3
* add earlier cleanup when --drop is given, to reduce memory usage
* remove use of place_id in URLs
* replace C nominatim indexer with a simpler Python implementation
* split up the huge sql/functions.sql file
* move osm2pgsql tests to osm2pgsql
* add new extratags style which imports all tags from OSM
* add new script for checking the import after completion
* update osm2pgsql, reducing memory usage
* use new wikipedia importance and add processing of wikidata tags
* add search form for details page
* use ExtraDataPath for country_grid table
* remove short_name from list of names to be displayed
* split up CMakeFile, so that all parts can be built separately
* update installation instructions for CentOS and Ubuntu
* add script for importing/updating multiple country extracts
* various documentation improvements
3.4.2
* fix security bug in /details endpoint where user input was not
properly sanitized
* security fix: fix possible SQL injection via details API
3.4.1
* update osm2pgsql to fix hangs during updates and lost address numbers
during updates
* update osm2pgsql
* move deletion to copy thread (fixes deadlock in updates)
* fix filtering where valid address objects got dropped
* fix typo in import styles
3.4.0

View File

@@ -1,5 +1,4 @@
[![Build Status](https://github.com/osm-search/Nominatim/workflows/CI%20Tests/badge.svg)](https://github.com/osm-search/Nominatim/actions?query=workflow%3A%22CI+Tests%22)
[![codecov](https://codecov.io/gh/osm-search/Nominatim/branch/master/graph/badge.svg?token=8P1LXrhCMy)](https://codecov.io/gh/osm-search/Nominatim)
[![Build Status](https://travis-ci.org/openstreetmap/Nominatim.svg?branch=master)](https://travis-ci.org/openstreetmap/Nominatim)
Nominatim
=========
@@ -20,19 +19,12 @@ https://nominatim.org/release-docs/develop/ .
Installation
============
**Nominatim is a complex piece of software and runs in a complex environment.
Installing and running Nominatim is something for experienced system
administrators only who can do some trouble-shooting themselves. We are sorry,
but we can not provide installation support. We are all doing this in our free
time and there is just so much of that time to go around. Do not open issues in
our bug tracker if you need help. Use the discussions forum
or ask for help on [help.openstreetmap.org](https://help.openstreetmap.org/).**
The latest stable release can be downloaded from https://nominatim.org.
There you can also find [installation instructions for the release](https://nominatim.org/release-docs/latest/admin/Installation), as well as an extensive [Troubleshooting/FAQ section](https://nominatim.org/release-docs/latest/admin/Faq/).
There you can also find [installation instructions for the release](https://nominatim.org/release-docs/latest/admin/Installation).
[Detailed installation instructions for current master](https://nominatim.org/release-docs/develop/admin/Installation)
can be found at nominatim.org as well.
Detailed installation instructions for the development version can be
found at [nominatim.org](https://nominatim.org/release-docs/develop/admin/Installation)
as well.
A quick summary of the necessary steps:
@@ -42,15 +34,12 @@ A quick summary of the necessary steps:
cd build
cmake ..
make
sudo make install
2. Create a project directory, get OSM data and import:
2. Get OSM data and import:
mkdir nominatim-project
cd nominatim-project
nominatim import --osm-file <your planet file>
./build/utils/setup.php --osm-file <your planet file> --all
3. Point your webserver to the nominatim-project/website directory.
3. Point your webserver to the ./build/website directory.
License
@@ -62,14 +51,13 @@ The source code is available under a GPLv2 license.
Contributing
============
Contributions, bug reports and pull requests are welcome.
For details see [contribution guide](CONTRIBUTING.md).
Contributions are welcome. For details see [contribution guide](CONTRIBUTING.md).
Both bug reports and pull requests are welcome.
Questions and help
==================
Mailing list
============
For questions, community help and discussions you can use the
[Github discussions forum](https://github.com/osm-search/Nominatim/discussions)
or join the
[geocoding mailing list](https://lists.openstreetmap.org/listinfo/geocoding).
For questions you can join the geocoding mailing list, see
https://lists.openstreetmap.org/listinfo/geocoding

View File

@@ -1,39 +0,0 @@
# Security Policy
## Supported Versions
All Nominatim releases receive security updates for two years.
The following table lists the end of support for all currently supported
versions.
| Version | End of support for security updates |
| ------- | ----------------------------------- |
| 3.7.x | 2023-04-05 |
| 3.6.x | 2022-12-12 |
| 3.5.x | 2022-06-05 |
| 3.4.x | 2021-10-24 |
## Reporting a Vulnerability
If you believe you have found an issue in Nominatim that has implications on
security, please send a description of the issue to **security@nominatim.org**.
You will receive an acknowledgement of your mail within 3 work days where we
also notify you of the next steps.
## How we Disclose Security Issues
** The following section only applies to security issues found in released
versions. Issues that concern the master development branch only will be
fixed immediately on the branch with the corresponding PR containing the
description of the nature and severity of the issue. **
Patches for identified security issues are applied to all affected versions and
new minor versions are released. At the same time we release a statement at
the [Nominatim blog](https://nominatim.org/blog/) describing the nature of the
incident. Announcements will also be published at the
[geocoding mailing list](https://lists.openstreetmap.org/listinfo/geocoding).
## List of Previous Incidents
* 2020-05-04 - [SQL injection issue on /details endpoint](https://lists.openstreetmap.org/pipermail/geocoding/2020-May/002012.html)

View File

@@ -141,7 +141,7 @@ No. Long running Nominatim installations will differ once new import features (o
bug fixes) get added since those usually only get applied to new/changed data.
Also this document skips the optional Wikipedia data import which affects ranking
of search results. See [Nominatim installation](https://nominatim.org/release-docs/latest/admin/Installation) for details.
of search results. See [Nominatim installation](http://nominatim.org/release-docs/latest/Installation) for details.
##### Why Ubuntu? Can I test CentOS/Fedora/CoreOS/FreeBSD?
@@ -160,9 +160,9 @@ You can configure/download other Vagrant boxes from [https://app.vagrantup.com/b
Let's say you have a Postgres database named `nominatim_it` on server `your-server.com` and port `5432`. The Postgres username is `postgres`. You can edit `settings/local.php` and point Nominatim to it.
pgsql:host=your-server.com;port=5432;user=postgres;dbname=nominatim_it
pgsql://postgres@your-server.com:5432/nominatim_it
No data import or restarting necessary.
No data import necessary or restarting necessary.
If the Postgres installation is behind a firewall, you can try

88
Vagrantfile vendored
View File

@@ -4,65 +4,18 @@
Vagrant.configure("2") do |config|
# Apache webserver
config.vm.network "forwarded_port", guest: 80, host: 8089
config.vm.network "forwarded_port", guest: 8088, host: 8088
# If true, then any SSH connections made will enable agent forwarding.
config.ssh.forward_agent = true
# Never sync the current directory to /vagrant.
config.vm.synced_folder ".", "/vagrant", disabled: true
checkout = "yes"
if ENV['CHECKOUT'] != 'y' then
checkout = "no"
end
config.vm.provider "virtualbox" do |vb, override|
vb.gui = false
vb.memory = 2048
vb.customize ["setextradata", :id, "VBoxInternal2/SharedFoldersEnableSymlinksCreate//vagrant","0"]
if ENV['CHECKOUT'] != 'y' then
override.vm.synced_folder ".", "/home/vagrant/Nominatim"
end
end
config.vm.provider "libvirt" do |lv, override|
lv.memory = 2048
lv.nested = true
if ENV['CHECKOUT'] != 'y' then
override.vm.synced_folder ".", "/home/vagrant/Nominatim", type: 'nfs'
end
config.vm.synced_folder ".", "/home/vagrant/Nominatim"
checkout = "no"
end
config.vm.define "ubuntu", primary: true do |sub|
sub.vm.box = "generic/ubuntu2004"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Ubuntu-20.sh"
s.privileged = false
s.args = [checkout]
end
end
config.vm.define "ubuntu-apache" do |sub|
sub.vm.box = "generic/ubuntu2004"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Ubuntu-20.sh"
s.privileged = false
s.args = [checkout, "install-apache"]
end
end
config.vm.define "ubuntu-nginx" do |sub|
sub.vm.box = "generic/ubuntu2004"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Ubuntu-20.sh"
s.privileged = false
s.args = [checkout, "install-nginx"]
end
end
config.vm.define "ubuntu18" do |sub|
sub.vm.box = "generic/ubuntu1804"
sub.vm.box = "bento/ubuntu-18.04"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Ubuntu-18.sh"
s.privileged = false
@@ -70,41 +23,48 @@ Vagrant.configure("2") do |config|
end
end
config.vm.define "ubuntu18-apache" do |sub|
sub.vm.box = "generic/ubuntu1804"
config.vm.define "ubuntu18nginx" do |sub|
sub.vm.box = "bento/ubuntu-18.04"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Ubuntu-18.sh"
s.path = "vagrant/Install-on-Ubuntu-18-nginx.sh"
s.privileged = false
s.args = [checkout, "install-apache"]
s.args = [checkout]
end
end
config.vm.define "ubuntu18-nginx" do |sub|
sub.vm.box = "generic/ubuntu1804"
config.vm.define "ubuntu16" do |sub|
sub.vm.box = "bento/ubuntu-16.04"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Ubuntu-18.sh"
s.path = "vagrant/Install-on-Ubuntu-16.sh"
s.privileged = false
s.args = [checkout, "install-nginx"]
s.args = [checkout]
end
end
config.vm.define "centos7" do |sub|
sub.vm.box = "centos/7"
config.vm.define "travis" do |sub|
sub.vm.box = "bento/ubuntu-14.04"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Centos-7.sh"
s.path = "vagrant/install-on-travis-ci.sh"
s.privileged = false
s.args = [checkout]
end
end
config.vm.define "centos" do |sub|
sub.vm.box = "generic/centos8"
sub.vm.box = "centos/7"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Centos-8.sh"
s.path = "vagrant/Install-on-Centos-7.sh"
s.privileged = false
s.args = [checkout]
s.args = "yes"
end
sub.vm.synced_folder ".", "/home/vagrant/Nominatim", disabled: true
sub.vm.synced_folder ".", "/vagrant", disabled: true
end
config.vm.provider "virtualbox" do |vb|
vb.gui = false
vb.memory = 2048
vb.customize ["setextradata", :id, "VBoxInternal2/SharedFoldersEnableSymlinksCreate//vagrant","0"]
end
end

4
cmake/script.tmpl Executable file
View File

@@ -0,0 +1,4 @@
#!@PHP_BIN@ -Cq
<?php
require_once(dirname(dirname(__FILE__)).'/settings/settings.php');
require_once(CONST_BasePath.'/@script_source@');

View File

@@ -1,17 +0,0 @@
#!/usr/bin/env python3
import sys
import os
sys.path.insert(1, '@NOMINATIM_LIBDIR@/lib-python')
os.environ['NOMINATIM_NOMINATIM_TOOL'] = os.path.abspath(__file__)
from nominatim import cli
exit(cli.nominatim(module_dir='@NOMINATIM_LIBDIR@/module',
osm2pgsql_path='@NOMINATIM_LIBDIR@/osm2pgsql',
phplib_dir='@NOMINATIM_LIBDIR@/lib-php',
sqllib_dir='@NOMINATIM_LIBDIR@/lib-sql',
data_dir='@NOMINATIM_DATADIR@',
config_dir='@NOMINATIM_CONFIGDIR@',
phpcgi_path='@PHPCGI_BIN@'))

View File

@@ -1,17 +0,0 @@
#!/usr/bin/env python3
import sys
import os
sys.path.insert(1, '@CMAKE_SOURCE_DIR@')
os.environ['NOMINATIM_NOMINATIM_TOOL'] = os.path.abspath(__file__)
from nominatim import cli
exit(cli.nominatim(module_dir='@CMAKE_BINARY_DIR@/module',
osm2pgsql_path='@CMAKE_BINARY_DIR@/osm2pgsql/osm2pgsql',
phplib_dir='@CMAKE_SOURCE_DIR@/lib-php',
sqllib_dir='@CMAKE_SOURCE_DIR@/lib-sql',
data_dir='@CMAKE_SOURCE_DIR@/data',
config_dir='@CMAKE_SOURCE_DIR@/settings',
phpcgi_path='@PHPCGI_BIN@'))

3
cmake/website.tmpl Executable file
View File

@@ -0,0 +1,3 @@
<?php
require_once(dirname(dirname(__FILE__)).'/settings/settings.php');
require_once(CONST_BasePath.'/@script_source@');

View File

@@ -1,14 +0,0 @@
codecov:
require_ci_to_pass: yes
coverage:
status:
project: off
patch: off
comment:
require_changes: true
after_n_builds: 2
fixes:
- "Nominatim/::"

View File

@@ -0,0 +1,77 @@
# Fallback Country Boundaries
Each place is assigned a `country_code` and partition. Partitions derive from `country_code`.
Nominatim imports two pre-generated files
* `data/country_name.sql` (country code, name, default language, partition)
* `data/country_osm_grid.sql` (country code, geometry)
before creating places in the database. This helps with fast lookups and missing data (e.g. if the data the user wants to import doesn't contain any country places).
The number of countries in the world can change (South Sudan was created in 2011, Germany reunified in 1990), and so can their boundaries. This document explains how the pre-generated files can be updated.
## Country code
Each place is assigned a two letter country_code based on its location, e.g. `gb` for Great Britain. Or `NULL` if no suitable country is found (usually it's in open water then).
In `sql/functions.sql: get_country_code(geometry)` the place's center is checked against
1. country places already imported from the user's data file. Places are imported by rank low-to-high. Lowest rank 2 is countries so most places should be matched. Still the data file might be incomplete.
2. if unmatched: OSM grid boundaries
3. if still unmatched: OSM grid boundaries, but allow a small distance
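The fallback order above can be sketched in Python. This is only an illustration of the "first source that matches wins" logic, not the SQL implementation: the bounding boxes, the tolerance value and the names `first_match` and `make_bbox_lookup` are assumptions made for this example.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)
Lookup = Callable[[float, float], Optional[str]]


def make_bbox_lookup(boxes: Dict[str, Box], tolerance: float = 0.0) -> Lookup:
    """Build a hypothetical lookup that matches a point against bounding boxes."""
    def lookup(lon: float, lat: float) -> Optional[str]:
        for code, (x0, y0, x1, y1) in boxes.items():
            if (x0 - tolerance <= lon <= x1 + tolerance
                    and y0 - tolerance <= lat <= y1 + tolerance):
                return code
        return None
    return lookup


def first_match(lon: float, lat: float, lookups: Sequence[Lookup]) -> Optional[str]:
    """Return the country code from the first lookup that matches, else None."""
    for lookup in lookups:
        code = lookup(lon, lat)
        if code is not None:
            return code
    return None


# Illustrative data only: the user's extract contained no country places,
# so the pre-generated grid has to answer.
MEXICO: Dict[str, Box] = {"mx": (-118.0, 14.0, -86.0, 33.0)}
LOOKUPS = [
    make_bbox_lookup({}),                     # 1. imported country places
    make_bbox_lookup(MEXICO),                 # 2. grid boundaries
    make_bbox_lookup(MEXICO, tolerance=0.5),  # 3. grid boundaries, small distance allowed
]
```

A point in open water, far from every boundary, falls through all three lookups and yields `None`, which corresponds to the `NULL` country code described above.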
## Partitions
Each place is assigned a partition, which is a number 0..250. 0 is fallback/other.
During place indexing (`sql/functions.sql: placex_insert()`) a place is assigned the partition based on its country code (`sql/functions.sql: get_partition(country_code)`). It checks in the `country_name` table.
Most countries have their own partition, some share a partition. Thus partition counts vary greatly.
Several database tables are split by partition to allow queries to run against fewer indices and improve caching.
* `location_area_large_<partition>`
* `search_name_<partition>`
* `location_road_<partition>`
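As a rough sketch of how a country code leads to the per-partition tables listed above: the partition numbers in `COUNTRY_PARTITIONS` are invented for illustration, and the Python helpers are hypothetical stand-ins for the SQL function and the `country_name` table.

```python
from typing import Dict, List, Optional

# Hypothetical excerpt of the country_name table: country code -> partition.
# The real numbers live in data/country_name.sql.
COUNTRY_PARTITIONS: Dict[str, int] = {"de": 3, "gb": 1, "mx": 7}

PARTITIONED_TABLES = ("location_area_large", "search_name", "location_road")


def get_partition(country_code: Optional[str]) -> int:
    """Return the partition for a country code, or 0 (fallback/other)."""
    if country_code is None:
        return 0
    return COUNTRY_PARTITIONS.get(country_code.lower(), 0)


def partition_tables(country_code: Optional[str]) -> List[str]:
    """Return the names of the partitioned tables a place would use."""
    partition = get_partition(country_code)
    return [f"{table}_{partition}" for table in PARTITIONED_TABLES]
```

Places without a country code, or with a code missing from the table, end up in the shared fallback partition 0.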
## Data files
### data/country_name.sql
Export from existing database table plus manual changes. `country_default_language_code` is mostly taken from [https://wiki.openstreetmap.org/wiki/Nominatim/Country_Codes](https://wiki.openstreetmap.org/wiki/Nominatim/Country_Codes), see `utils/country_languages.php`.
### data/country_osm_grid.sql
`country_grid.sql` merges territories by country. Then uses `function.sql: quad_split_geometry` to split each country into multiple [Quadtree](https://en.wikipedia.org/wiki/Quadtree) polygons for faster point-in-polygon lookups.
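To illustrate the idea of quadtree splitting, here is a minimal Python sketch that works on plain rectangles instead of country geometries. It assumes the split stops once a piece is small enough or a maximum depth is reached; the function name `quad_split` and its parameters are assumptions for this example and do not describe the exact behaviour of the SQL function.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def quad_split(box: Box, max_area: float, max_depth: int) -> List[Box]:
    """Recursively split a box into quadrants until each piece is small enough."""
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= max_area or max_depth <= 0:
        return [box]

    # Cut the box at its centre into four equal quadrants.
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    quadrants = [
        (x0, y0, xm, ym), (xm, y0, x1, ym),
        (x0, ym, xm, y1), (xm, ym, x1, y1),
    ]

    pieces: List[Box] = []
    for quadrant in quadrants:
        pieces.extend(quad_split(quadrant, max_area, max_depth - 1))
    return pieces
```

Smaller pieces have simpler outlines and tighter bounding boxes, which is what makes a point-in-polygon test against an indexed table of pieces cheaper than a test against one large country polygon.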
To visualize one country as geojson feature collection, e.g. for loading into [geojson.io](http://geojson.io/):
```
-- http://www.postgresonline.com/journal/archives/267-Creating-GeoJSON-Feature-Collections-with-JSON-and-PostGIS-functions.html
SELECT row_to_json(fc)
FROM (
  SELECT 'FeatureCollection' As type, array_to_json(array_agg(f)) As features
  FROM (
    SELECT 'Feature' As type,
           ST_AsGeoJSON(lg.geometry)::json As geometry,
           row_to_json((country_code, area)) As properties
    FROM country_osm_grid As lg where country_code='mx'
  ) As f
) As fc;
```
`cat /tmp/query.sql | psql -At nominatim > /tmp/mexico.quad.geojson`
![mexico](mexico.quad.png)

View File

@@ -0,0 +1,33 @@
-- Script to build a calculated country grid from existing tables
DROP TABLE IF EXISTS tmp_country_osm_grid;
CREATE TABLE tmp_country_osm_grid as select country_name.country_code,st_union(placex.geometry) as geometry from country_name,
placex
where (lower(placex.country_code) = country_name.country_code)
and placex.rank_search < 16 and st_area(placex.geometry) > 0
group by country_name.country_code;
ALTER TABLE tmp_country_osm_grid add column area double precision;
UPDATE tmp_country_osm_grid set area = st_area(geometry::geography);
-- compare old and new
select country_code, round, round(log(area)) from (select distinct country_code,round(log(area)) from country_osm_grid order by country_code) as x
left outer join tmp_country_osm_grid using (country_code) where area is null or round(log(area)) != round;
DROP TABLE IF EXISTS new_country_osm_grid;
CREATE TABLE new_country_osm_grid as select country_code,area,quad_split_geometry(geometry,0.5,20) as geometry from tmp_country_osm_grid;
CREATE INDEX new_idx_country_osm_grid_geometry ON new_country_osm_grid USING GIST (geometry);
-- Sometimes there are problems calculating area due to invalid data - optionally recalc
UPDATE new_country_osm_grid set area = sum from (select country_code,sum(case when st_area(geometry::geography) = 'NaN' THEN 0 ELSE st_area(geometry::geography) END)
from new_country_osm_grid group by country_code) as x where x.country_code = new_country_osm_grid.country_code;
-- compare old and new
select country_code, x.round, y.round from (select distinct country_code,round(log(area)) from country_osm_grid order by country_code) as x
left outer join (select distinct country_code,round(log(area)) from new_country_osm_grid order by country_code) as y
using (country_code) where x.round != y.round;
-- Flip the new table in
BEGIN;
DROP TABLE IF EXISTS country_osm_grid;
ALTER TABLE new_country_osm_grid rename to country_osm_grid;
ALTER INDEX new_idx_country_osm_grid_geometry RENAME TO idx_country_osm_grid_geometry;
COMMIT;

Binary file not shown. Size: 320 KiB

View File

@@ -0,0 +1,56 @@
# GB Postcodes
The server [importing instructions](https://www.nominatim.org/release-docs/latest/admin/Import-and-Update/) allow you to optionally download [`gb_postcode_data.sql.gz`](https://www.nominatim.org/data/gb_postcode_data.sql.gz). This document explains how the file was created.
## GB vs UK
GB (Great Britain) is more correct as the Ordnance Survey dataset doesn't contain postcodes from Northern Ireland.
## Importing separately after the initial import
If you forgot to download the file, or have a new version, you can import it separately:
1. Import the downloaded `gb_postcode_data.sql.gz` file.
2. Run the SQL query `SELECT count(getorcreate_postcode_id(postcode)) FROM gb_postcode;`. This will update the search index.
3. Run `utils/setup.php --calculate-postcodes` from the build directory. This will copy data from the `gb_postcode` table to the `location_postcodes` table.
## Converting Code-Point Open data
1. Download from [Code-Point® Open](https://www.ordnancesurvey.co.uk/business-and-government/products/code-point-open.html). It requires an email address to which a download link will be sent.

2. `unzip codepo_gb.zip`

   Unpacked you'll see a directory of CSV files.

        $ more codepo_gb/Data/CSV/n.csv
        "N1 0AA",10,530626,183961,"E92000001","E19000003","E18000007","","E09000019","E05000368"
        "N1 0AB",10,530559,183978,"E92000001","E19000003","E18000007","","E09000019","E05000368"

   The coordinates are "Eastings" and "Northings" in [OSGB 1936](http://epsg.io/1314) projection. They can be projected to WGS84 like this:

        SELECT ST_AsText(ST_Transform(ST_SetSRID('POINT(530626 183961)'::geometry,27700), 4326));
        POINT(-0.117872733220225 51.5394424719303)

   [-0.117872733220225 51.5394424719303 on OSM map](https://www.openstreetmap.org/?mlon=-0.117872733220225&mlat=51.5394424719303&zoom=16)

3. Create database, import CSV files, add geometry column, dump into file

        DBNAME=create_gb_postcode_file
        createdb $DBNAME
        echo 'CREATE EXTENSION postgis' | psql $DBNAME
        cat data/gb_postcode_table.sql | psql $DBNAME
        cat codepo_gb/Data/CSV/*.csv | ./data-sources/gb-postcodes/convert_codepoint.php | psql $DBNAME
        cat codepo_gb/Doc/licence.txt | iconv -f iso-8859-1 -t utf-8 | dos2unix | sed 's/^/-- /g' > gb_postcode_data.sql
        pg_dump -a -t gb_postcode $DBNAME | grep -v '^--' >> gb_postcode_data.sql
        gzip -9 -f gb_postcode_data.sql
        ls -lah gb_postcode_data.*
        # dropdb $DBNAME
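
The `convert_codepoint.php` script used in step 3 rewrites each postcode so that exactly one space precedes its last three characters, then emits tab-separated rows for `COPY`. The following Python sketch mirrors that step for illustration only; the function names `normalize_postcode` and `to_copy_row` are made up for this example and are not part of Nominatim.

```python
import csv
import re


def normalize_postcode(raw: str) -> str:
    """Put exactly one space before the last three characters of a postcode."""
    return re.sub(r"\s*(...)$", r" \1", raw)


def to_copy_row(counter: int, csv_line: str) -> str:
    """Turn one Code-Point Open CSV line into a tab-separated COPY row.

    Columns used: 0 = postcode, 2 = easting, 3 = northing.
    """
    columns = next(csv.reader([csv_line]))
    fields = [str(counter), normalize_postcode(columns[0]), columns[2], columns[3]]
    return "\t".join(fields)
```

Postcodes that already contain a space come out unchanged, while postcodes written without one gain it.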

View File

@@ -0,0 +1,37 @@
#!/usr/bin/env php
<?php
echo <<< EOT
ALTER TABLE gb_postcode ADD COLUMN easting bigint;
ALTER TABLE gb_postcode ADD COLUMN northing bigint;
TRUNCATE gb_postcode;
COPY gb_postcode (id, postcode, easting, northing) FROM stdin;
EOT;
$iCounter = 0;
while ($sLine = fgets(STDIN)) {
$aColumns = str_getcsv($sLine);
// insert space before the third last position
// https://stackoverflow.com/a/9144834
$postcode = $aColumns[0];
$postcode = preg_replace('/\s*(...)$/', ' $1', $postcode);
echo join("\t", array($iCounter, $postcode, $aColumns[2], $aColumns[3]))."\n";
$iCounter = $iCounter + 1;
}
echo <<< EOT
\.
UPDATE gb_postcode SET geometry=ST_Transform(ST_SetSRID(CONCAT('POINT(', easting, ' ', northing, ')')::geometry, 27700), 4326);
ALTER TABLE gb_postcode DROP COLUMN easting;
ALTER TABLE gb_postcode DROP COLUMN northing;
EOT;

View File

@@ -0,0 +1,26 @@
# US TIGER address data
Convert [TIGER](https://www.census.gov/geo/maps-data/data/tiger.html)/Line dataset of the US Census Bureau to SQL files which can be imported by Nominatim. The created tables in the Nominatim database are separate from OpenStreetMap tables and get queried at search time separately.
The dataset gets updated once per year. Downloading is prone to be slow (can take a full day) and converting them can take hours as well.
Replace '2019' with the current year throughout.
1. Install the GDAL library and python bindings and the unzip tool

        # Ubuntu:
        sudo apt-get install python3-gdal unzip

2. Get the TIGER 2019 data. You will need the EDGES files
   (3,233 zip files, 11GB total).

        wget -r ftp://ftp2.census.gov/geo/tiger/TIGER2019/EDGES/

3. Convert the data into SQL statements. Adjust the file paths in the scripts as needed

        cd data-sources/us-tiger
        ./convert.sh <input-path> <output-path>

4. Maybe: package the created files

        tar -czf tiger2019-nominatim-preprocessed.tar.gz tiger
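
`convert.sh` derives one output SQL file per county from the five-digit county ID embedded in each zip file name. A small Python sketch of that mapping, for illustration only: the helper name `county_sql_path` and the example file name are assumptions, and unlike the shell regex this pattern escapes the dot before `zip`.

```python
import re
from pathlib import Path
from typing import Optional

# Five-digit county ID followed by "_edges.zip", as matched in convert.sh.
EDGES_PATTERN = re.compile(r"_([0-9]{5})_edges\.zip")


def county_sql_path(zip_path: str, out_dir: str) -> Optional[Path]:
    """Return the SQL output path for an EDGES zip file, or None if the name does not match."""
    match = EDGES_PATTERN.search(zip_path)
    if match is None:
        return None
    return Path(out_dir) / f"{match.group(1)}.sql"
```

Files whose names do not match the pattern are skipped, just as the shell loop ignores them.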

View File

@@ -0,0 +1,48 @@
#!/bin/bash
INPATH=$1
OUTPATH=$2
if [[ ! -d "$INPATH" ]]; then
echo "input path does not exist"
exit 1
fi
if [[ ! -d "$OUTPATH" ]]; then
echo "output path does not exist"
exit 1
fi
INREGEX='_([0-9]{5})_edges.zip'
WORKPATH="$OUTPATH/tmp-workdir/"
mkdir -p "$WORKPATH"
INFILES=("$INPATH"/*.zip)
echo "Found ${#INFILES[@]} files."
for F in "${INFILES[@]}"; do
# echo $F
if [[ "$F" =~ $INREGEX ]]; then
COUNTYID=${BASH_REMATCH[1]}
SHAPEFILE="$WORKPATH/$(basename "$F" '.zip').shp"
SQLFILE="$OUTPATH/$COUNTYID.sql"
unzip -o -q -d "$WORKPATH" "$F"
if [[ ! -e "$SHAPEFILE" ]]; then
echo "Unzip failed. $SHAPEFILE not found."
exit 1
fi
./tiger_address_convert.py "$SHAPEFILE" "$SQLFILE"
rm "$WORKPATH"/*
fi
done
OUTFILES=("$OUTPATH"/*.sql)
echo "Wrote ${#OUTFILES[@]} files."
rmdir "$WORKPATH"

View File

@@ -0,0 +1,620 @@
#!/usr/bin/python3
# Tiger road data to OSM conversion script
# Creates Karlsruhe-style address ways beside the main way
# based on the Massachusetts GIS script by christopher schmidt
#BUGS:
# On very tight curves, a loop may be generated in the address way.
# It would be nice if the ends of the address ways were not pulled back from dead ends
# Ways that include these mtfccs should not be uploaded
# H1100 Connector
# H3010 Stream/River
# H3013 Braided Stream
# H3020 Canal, Ditch or Aqueduct
# L4130 Point-to-Point Line
# L4140 Property/Parcel Line (Including PLSS)
# P0001 Nonvisible Linear Legal/Statistical Boundary
# P0002 Perennial Shoreline
# P0003 Intermittent Shoreline
# P0004 Other non-visible bounding Edge (e.g., Census water boundary, boundary of an areal feature)
ignoremtfcc = [ "H1100", "H3010", "H3013", "H3020", "L4130", "L4140", "P0001", "P0002", "P0003", "P0004" ]
# Sets the distance that the address ways should be from the main way, in feet.
address_distance = 30
# Sets the distance that the ends of the address ways should be pulled back from the ends of the main way, in feet
address_pullback = 45
import sys, os.path, json
try:
from osgeo import ogr
from osgeo import osr
except ImportError:
import ogr
import osr
# https://www.census.gov/geo/reference/codes/cou.html
# tiger_county_fips.json was generated from the following:
# wget https://www2.census.gov/geo/docs/reference/codes/files/national_county.txt
# cat national_county.txt | perl -F, -naE'($F[0] ne 'AS') && $F[3] =~ s/ ((city|City|County|District|Borough|City and Borough|Municipio|Municipality|Parish|Island|Census Area)(?:, |\Z))+//; say qq( "$F[1]$F[2]": "$F[3], $F[0]",)'
json_fh = open(os.path.dirname(sys.argv[0]) + "/tiger_county_fips.json")
county_fips_data = json.load(json_fh)
def parse_shp_for_geom_and_tags( filename ):
#ogr.RegisterAll()
dr = ogr.GetDriverByName("ESRI Shapefile")
poDS = dr.Open( filename )
if poDS is None:
raise RuntimeError("Open failed.")
poLayer = poDS.GetLayer( 0 )
fieldNameList = []
layerDefinition = poLayer.GetLayerDefn()
for i in range(layerDefinition.GetFieldCount()):
fieldNameList.append(layerDefinition.GetFieldDefn(i).GetName())
# sys.stderr.write(",".join(fieldNameList))
poLayer.ResetReading()
ret = []
poFeature = poLayer.GetNextFeature()
while poFeature:
tags = {}
# WAY ID
tags["tiger:way_id"] = int( poFeature.GetField("TLID") )
# FEATURE IDENTIFICATION
mtfcc = poFeature.GetField("MTFCC");
if mtfcc != None:
if mtfcc == "L4010": #Pipeline
tags["man_made"] = "pipeline"
if mtfcc == "L4020": #Powerline
tags["power"] = "line"
if mtfcc == "L4031": #Aerial Tramway/Ski Lift
tags["aerialway"] = "cable_car"
if mtfcc == "L4110": #Fence Line
tags["barrier"] = "fence"
if mtfcc == "L4125": #Cliff/Escarpment
tags["natural"] = "cliff"
if mtfcc == "L4165": #Ferry Crossing
tags["route"] = "ferry"
if mtfcc == "R1011": #Railroad Feature (Main, Spur, or Yard)
tags["railway"] = "rail"
ttyp = poFeature.GetField("TTYP")
if ttyp != None:
if ttyp == "S":
tags["service"] = "spur"
if ttyp == "Y":
tags["service"] = "yard"
tags["tiger:ttyp"] = ttyp
if mtfcc == "R1051": #Carline, Streetcar Track, Monorail, Other Mass Transit Rail)
tags["railway"] = "light_rail"
if mtfcc == "R1052": #Cog Rail Line, Incline Rail Line, Tram
tags["railway"] = "incline"
if mtfcc == "S1100":
tags["highway"] = "primary"
if mtfcc == "S1200":
tags["highway"] = "secondary"
if mtfcc == "S1400":
tags["highway"] = "residential"
if mtfcc == "S1500":
tags["highway"] = "track"
if mtfcc == "S1630": #Ramp
tags["highway"] = "motorway_link"
if mtfcc == "S1640": #Service Drive usually along a limited access highway
tags["highway"] = "service"
if mtfcc == "S1710": #Walkway/Pedestrian Trail
tags["highway"] = "path"
if mtfcc == "S1720":
tags["highway"] = "steps"
if mtfcc == "S1730": #Alley
tags["highway"] = "service"
tags["service"] = "alley"
if mtfcc == "S1740": #Private Road for service vehicles (logging, oil, fields, ranches, etc.)
tags["highway"] = "service"
tags["access"] = "private"
if mtfcc == "S1750": #Private Driveway
tags["highway"] = "service"
tags["access"] = "private"
tags["service"] = "driveway"
if mtfcc == "S1780": #Parking Lot Road
tags["highway"] = "service"
tags["service"] = "parking_aisle"
if mtfcc == "S1820": #Bike Path or Trail
tags["highway"] = "cycleway"
if mtfcc == "S1830": #Bridle Path
tags["highway"] = "bridleway"
tags["tiger:mtfcc"] = mtfcc
# FEATURE NAME
if poFeature.GetField("FULLNAME"):
#capitalizes the first letter of each word
name = poFeature.GetField( "FULLNAME" )
tags["name"] = name
#Attempt to guess highway grade
if name[0:2] == "I-":
tags["highway"] = "motorway"
if name[0:3] == "US ":
tags["highway"] = "primary"
if name[0:3] == "US-":
tags["highway"] = "primary"
if name[0:3] == "Hwy":
if tags.get("highway") != "primary":
tags["highway"] = "secondary"
# TIGER 2017 no longer contains this field
if 'DIVROAD' in fieldNameList:
divroad = poFeature.GetField("DIVROAD")
if divroad != None:
if divroad == "Y" and "highway" in tags and tags["highway"] == "residential":
tags["highway"] = "tertiary"
tags["tiger:separated"] = divroad
statefp = poFeature.GetField("STATEFP")
countyfp = poFeature.GetField("COUNTYFP")
if (statefp != None) and (countyfp != None):
county_name = county_fips_data.get(statefp + '' + countyfp)
if county_name:
tags["tiger:county"] = county_name
# tlid = poFeature.GetField("TLID")
# if tlid != None:
# tags["tiger:tlid"] = tlid
lfromadd = poFeature.GetField("LFROMADD")
if lfromadd != None:
tags["tiger:lfromadd"] = lfromadd
rfromadd = poFeature.GetField("RFROMADD")
if rfromadd != None:
tags["tiger:rfromadd"] = rfromadd
ltoadd = poFeature.GetField("LTOADD")
if ltoadd != None:
tags["tiger:ltoadd"] = ltoadd
rtoadd = poFeature.GetField("RTOADD")
if rtoadd != None:
tags["tiger:rtoadd"] = rtoadd
zipl = poFeature.GetField("ZIPL")
if zipl != None:
tags["tiger:zip_left"] = zipl
zipr = poFeature.GetField("ZIPR")
if zipr != None:
tags["tiger:zip_right"] = zipr
if mtfcc not in ignoremtfcc:
# COPY DOWN THE GEOMETRY
geom = []
rawgeom = poFeature.GetGeometryRef()
for i in range( rawgeom.GetPointCount() ):
geom.append( (rawgeom.GetX(i), rawgeom.GetY(i)) )
ret.append( (geom, tags) )
poFeature = poLayer.GetNextFeature()
return ret
# ====================================
# TODO: read the .prj file for this data
# Change projcs_wkt to match your data's .prj file.
# ====================================
projcs_wkt = \
"""GEOGCS["GCS_North_American_1983",
DATUM["D_North_American_1983",
SPHEROID["GRS_1980",6378137,298.257222101]],
PRIMEM["Greenwich",0],
UNIT["Degree",0.017453292519943295]]"""
from_proj = osr.SpatialReference()
from_proj.ImportFromWkt( projcs_wkt )
# output to WGS84
to_proj = osr.SpatialReference()
to_proj.SetWellKnownGeogCS( "EPSG:4326" )
tr = osr.CoordinateTransformation( from_proj, to_proj )
import math
def length(segment, nodelist):
'''Returns the length (in feet) of a segment'''
first = True
distance = 0
lat_feet = 364613 #The approximate number of feet in one degree of latitude
for point in segment:
pointid, (lat, lon) = nodelist[ round_point( point ) ]
if first:
first = False
else:
#The approximate number of feet in one degree of longitude
lrad = math.radians(lat)
lon_feet = 365527.822 * math.cos(lrad) - 306.75853 * math.cos(3 * lrad) + 0.3937 * math.cos(5 * lrad)
distance += math.sqrt(((lat - previous[0])*lat_feet)**2 + ((lon - previous[1])*lon_feet)**2)
previous = (lat, lon)
return distance
def addressways(waylist, nodelist, first_id):
id = first_id
lat_feet = 364613 #The approximate number of feet in one degree of latitude
distance = float(address_distance)
ret = []
for waykey, segments in waylist.items():
waykey = dict(waykey)
rsegments = []
lsegments = []
for segment in segments:
lsegment = []
rsegment = []
lastpoint = None
# Don't pull back the ends of very short ways too much
seglength = length(segment, nodelist)
if seglength < float(address_pullback) * 3.0:
pullback = seglength / 3.0
else:
pullback = float(address_pullback)
if "tiger:lfromadd" in waykey:
lfromadd = waykey["tiger:lfromadd"]
else:
lfromadd = None
if "tiger:ltoadd" in waykey:
ltoadd = waykey["tiger:ltoadd"]
else:
ltoadd = None
if "tiger:rfromadd" in waykey:
rfromadd = waykey["tiger:rfromadd"]
else:
rfromadd = None
if "tiger:rtoadd" in waykey:
rtoadd = waykey["tiger:rtoadd"]
else:
rtoadd = None
if rfromadd != None and rtoadd != None:
right = True
else:
right = False
if lfromadd != None and ltoadd != None:
left = True
else:
left = False
if left or right:
first = True
firstpointid, firstpoint = nodelist[ round_point( segment[0] ) ]
finalpointid, finalpoint = nodelist[ round_point( segment[len(segment) - 1] ) ]
for point in segment:
pointid, (lat, lon) = nodelist[ round_point( point ) ]
#The approximate number of feet in one degree of longitude
lrad = math.radians(lat)
lon_feet = 365527.822 * math.cos(lrad) - 306.75853 * math.cos(3 * lrad) + 0.3937 * math.cos(5 * lrad)
#Calculate the points of the offset ways
if lastpoint != None:
#Skip points too close to start
if math.sqrt((lat * lat_feet - firstpoint[0] * lat_feet)**2 + (lon * lon_feet - firstpoint[1] * lon_feet)**2) < pullback:
#Preserve very short ways (but will be rendered backwards)
if pointid != finalpointid:
continue
#Skip points too close to end
if math.sqrt((lat * lat_feet - finalpoint[0] * lat_feet)**2 + (lon * lon_feet - finalpoint[1] * lon_feet)**2) < pullback:
#Preserve very short ways (but will be rendered backwards)
if (pointid != firstpointid) and (pointid != finalpointid):
continue
X = (lon - lastpoint[1]) * lon_feet
Y = (lat - lastpoint[0]) * lat_feet
if Y != 0:
theta = math.pi/2 - math.atan( X / Y)
Xp = math.sin(theta) * distance
Yp = math.cos(theta) * distance
else:
Xp = 0
if X > 0:
Yp = -distance
else:
Yp = distance
if Y > 0:
Xp = -Xp
else:
Yp = -Yp
if first:
first = False
dX = - (Yp * (pullback / distance)) / lon_feet #Pull back the first point
dY = (Xp * (pullback / distance)) / lat_feet
if left:
lpoint = (lastpoint[0] + (Yp / lat_feet) - dY, lastpoint[1] + (Xp / lon_feet) - dX)
lsegment.append( (id, lpoint) )
id += 1
if right:
rpoint = (lastpoint[0] - (Yp / lat_feet) - dY, lastpoint[1] - (Xp / lon_feet) - dX)
rsegment.append( (id, rpoint) )
id += 1
else:
#round the curves
if delta[1] != 0:
theta = abs(math.atan(delta[0] / delta[1]))
else:
theta = math.pi / 2
if Xp != 0:
theta = theta - abs(math.atan(Yp / Xp))
else: theta = theta - math.pi / 2
r = 1 + abs(math.tan(theta/2))
if left:
lpoint = (lastpoint[0] + (Yp + delta[0]) * r / (lat_feet * 2), lastpoint[1] + (Xp + delta[1]) * r / (lon_feet * 2))
lsegment.append( (id, lpoint) )
id += 1
if right:
rpoint = (lastpoint[0] - (Yp + delta[0]) * r / (lat_feet * 2), lastpoint[1] - (Xp + delta[1]) * r / (lon_feet * 2))
rsegment.append( (id, rpoint) )
id += 1
delta = (Yp, Xp)
lastpoint = (lat, lon)
#Add in the last node
dX = - (Yp * (pullback / distance)) / lon_feet
dY = (Xp * (pullback / distance)) / lat_feet
if left:
lpoint = (lastpoint[0] + (Yp + delta[0]) / (lat_feet * 2) + dY, lastpoint[1] + (Xp + delta[1]) / (lon_feet * 2) + dX )
lsegment.append( (id, lpoint) )
id += 1
if right:
rpoint = (lastpoint[0] - Yp / lat_feet + dY, lastpoint[1] - Xp / lon_feet + dX)
rsegment.append( (id, rpoint) )
id += 1
#Generate the tags for ways and nodes
zipr = ''
zipl = ''
name = ''
county = ''
if "tiger:zip_right" in waykey:
zipr = waykey["tiger:zip_right"]
if "tiger:zip_left" in waykey:
zipl = waykey["tiger:zip_left"]
if "name" in waykey:
name = waykey["name"]
if "tiger:county" in waykey:
county = waykey["tiger:county"]
if "tiger:separated" in waykey: # No longer set in Tiger-2017
separated = waykey["tiger:separated"]
else:
separated = "N"
#Write the nodes of the offset ways
if right:
rlinestring = [];
for i, point in rsegment:
rlinestring.append( "%f %f" % (point[1], point[0]) )
if left:
llinestring = [];
for i, point in lsegment:
llinestring.append( "%f %f" % (point[1], point[0]) )
if right:
rsegments.append( rsegment )
if left:
lsegments.append( lsegment )
rtofromint = right #Do the addresses convert to integers?
ltofromint = left #Do the addresses convert to integers?
if right:
try: rfromint = int(rfromadd)
except (TypeError, ValueError): #Missing or non-numeric house number
print("Non integer address: %s" % rfromadd)
rtofromint = False
try: rtoint = int(rtoadd)
except (TypeError, ValueError):
print("Non integer address: %s" % rtoadd)
rtofromint = False
if left:
try: lfromint = int(lfromadd)
except (TypeError, ValueError):
print("Non integer address: %s" % lfromadd)
ltofromint = False
try: ltoint = int(ltoadd)
except (TypeError, ValueError):
print("Non integer address: %s" % ltoadd)
ltofromint = False
if right:
id += 1
interpolationtype = "all";
if rtofromint:
if (rfromint % 2) == 0 and (rtoint % 2) == 0:
if separated == "Y": #Doesn't matter if there is another side
interpolationtype = "even";
elif ltofromint and (lfromint % 2) == 1 and (ltoint % 2) == 1:
interpolationtype = "even";
elif (rfromint % 2) == 1 and (rtoint % 2) == 1:
if separated == "Y": #Doesn't matter if there is another side
interpolationtype = "odd";
elif ltofromint and (lfromint % 2) == 0 and (ltoint % 2) == 0:
interpolationtype = "odd";
ret.append( "SELECT tiger_line_import(ST_GeomFromText('LINESTRING(%s)',4326), %s, %s, %s, %s, %s, %s);" %
( ",".join(rlinestring), sql_quote(rfromadd), sql_quote(rtoadd), sql_quote(interpolationtype), sql_quote(name), sql_quote(county), sql_quote(zipr) ) )
if left:
id += 1
interpolationtype = "all";
if ltofromint:
if (lfromint % 2) == 0 and (ltoint % 2) == 0:
if separated == "Y":
interpolationtype = "even";
elif rtofromint and (rfromint % 2) == 1 and (rtoint % 2) == 1:
interpolationtype = "even";
elif (lfromint % 2) == 1 and (ltoint % 2) == 1:
if separated == "Y":
interpolationtype = "odd";
elif rtofromint and (rfromint %2 ) == 0 and (rtoint % 2) == 0:
interpolationtype = "odd";
ret.append( "SELECT tiger_line_import(ST_GeomFromText('LINESTRING(%s)',4326), %s, %s, %s, %s, %s, %s);" %
( ",".join(llinestring), sql_quote(lfromadd), sql_quote(ltoadd), sql_quote(interpolationtype), sql_quote(name), sql_quote(county), sql_quote(zipl) ) )
return ret
def sql_quote( string ):
    # Escape single quotes by doubling them, then wrap in quotes
    return "'" + string.replace("'", "''") + "'"

def unproject( point ):
    pt = tr.TransformPoint( point[0], point[1] )
    return (pt[1], pt[0])

def round_point( point, accuracy=8 ):
    return tuple( [ round(x,accuracy) for x in point ] )

def compile_nodelist( parsed_gisdata, first_id=1 ):
    nodelist = {}

    i = first_id
    for geom, tags in parsed_gisdata:
        if len( geom )==0:
            continue

        for point in geom:
            r_point = round_point( point )
            if r_point not in nodelist:
                nodelist[ r_point ] = (i, unproject( point ))
                i += 1

    return (i, nodelist)

def adjacent( left, right ):
    left_left = round_point(left[0])
    left_right = round_point(left[-1])
    right_left = round_point(right[0])
    right_right = round_point(right[-1])

    return ( left_left == right_left or
             left_left == right_right or
             left_right == right_left or
             left_right == right_right )

def glom( left, right ):
    left = list( left )
    right = list( right )

    left_left = round_point(left[0])
    left_right = round_point(left[-1])
    right_left = round_point(right[0])
    right_right = round_point(right[-1])

    if left_left == right_left:
        left.reverse()
        return left[0:-1] + right

    if left_left == right_right:
        return right[0:-1] + left

    if left_right == right_left:
        return left[0:-1] + right

    if left_right == right_right:
        right.reverse()
        return left[0:-1] + right

    # String exceptions are not valid in Python 3
    raise ValueError('segments are not adjacent')

def glom_once( segments ):
    if len(segments)==0:
        return segments

    unsorted = list( segments )
    x = unsorted.pop(0)

    while len( unsorted ) > 0:
        n = len( unsorted )

        for i in range(0, n):
            y = unsorted[i]
            if adjacent( x, y ):
                y = unsorted.pop(i)
                x = glom( x, y )
                break

        # Sorted and unsorted lists have no adjacent segments
        if len( unsorted ) == n:
            break

    return x, unsorted

def glom_all( segments ):
    unsorted = segments
    chunks = []

    while unsorted != []:
        chunk, unsorted = glom_once( unsorted )
        chunks.append( chunk )

    return chunks

def compile_waylist( parsed_gisdata ):
    waylist = {}

    #Group by tiger:way_id
    for geom, tags in parsed_gisdata:
        way_key = tags.copy()
        way_key = ( way_key['tiger:way_id'], tuple( [(k,v) for k,v in way_key.items()] ) )

        if way_key not in waylist:
            waylist[way_key] = []

        waylist[way_key].append( geom )

    ret = {}
    for (way_id, way_key), segments in waylist.items():
        ret[way_key] = glom_all( segments )
    return ret

def shape_to_sql( shp_filename, sql_filename ):
    print("parsing shpfile %s" % shp_filename)
    parsed_features = parse_shp_for_geom_and_tags( shp_filename )

    print("compiling nodelist")
    i, nodelist = compile_nodelist( parsed_features )

    print("compiling waylist")
    waylist = compile_waylist( parsed_features )

    print("preparing address ways")
    sql_lines = addressways(waylist, nodelist, i)

    print("writing %s" % sql_filename)
    # The context manager closes the file even if writing fails
    with open( sql_filename, "w" ) as fp:
        fp.write( "\n".join( sql_lines ) )

if __name__ == '__main__':
    import sys, os.path

    if len(sys.argv) < 3:
        print("%s input.shp output.sql" % sys.argv[0])
        sys.exit()
    shp_filename = sys.argv[1]
    sql_filename = sys.argv[2]
    shape_to_sql(shp_filename, sql_filename)

File diff suppressed because it is too large

View File

@@ -0,0 +1,58 @@
## Add Wikipedia and Wikidata to Nominatim
OSM contributors frequently tag items with links to Wikipedia and Wikidata. Nominatim can use the page ranking of Wikipedia pages to help indicate the relative importance of OSM features. This is done by calculating an importance score between 0 and 1 based on the number of inlinks to an article for a location. If two places have the same name and one is more important than the other, the Wikipedia score often points to the correct place.
These scripts extract and prepare both Wikipedia page rank and Wikidata links for use in Nominatim.
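
As a rough illustration of the importance score described above, the following Python sketch applies the same formula that `wikipedia_import.sh` runs in SQL: the logarithm of an article's total inlink count divided by the logarithm of the largest total count. The function name `importance_scores` and the sample counts are hypothetical and only serve the example.

```python
import math

def importance_scores(total_counts):
    """Map {title: total inlink count} to {title: score between 0 and 1}.

    Hypothetical helper: mirrors log(totalcount) / log(max(totalcount)).
    Counts are assumed to be >= 1, as they come from COUNT(*) in the SQL.
    """
    max_count = max(total_counts.values())
    if max_count <= 1:
        # log(1) is 0, so every article would score 0 anyway
        return {title: 0.0 for title in total_counts}
    log_max = math.log(max_count)
    return {title: math.log(count) / log_max
            for title, count in total_counts.items()}

if __name__ == "__main__":
    # Made-up counts: two places sharing a name, one far better linked
    scores = importance_scores({"Berlin": 1000, "Berlin,_New_Hampshire": 10})
    for title, score in sorted(scores.items()):
        print(title, round(score, 3))
```

Because the scale is logarithmic, an article needs roughly the square root of the maximum link count to reach a score of 0.5.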
#### Create a new postgres DB for Processing
Due to the size of initial and intermediate tables, processing can be done in an external database:
```
CREATE DATABASE wikiprocessingdb;
```
---
Wikipedia
---
Processing these data requires a large amount of disk space (~1TB) and considerable time (>24 hours).
#### Import & Process Wikipedia tables
This step downloads [Wikipedia](https://dumps.wikimedia.org/) page data SQL dumps and converts them to PostgreSQL files, which can be imported and processed together with pagelink information from the Wikipedia language sites to calculate importance scores.
- The script processes data from whatever set of Wikipedia languages is specified in the `language` array at the top of the script
- Note that processing the top 40 Wikipedia languages can take over a day, and will add nearly 1TB to the processing database. The final output tables will be approximately 11GB and 2GB in size
To download, convert, and import the data, then process summary statistics and compute importance scores, run:
```
./wikipedia_import.sh
```
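
The counting step that the script performs in SQL can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the script itself: the function names `count_inlinks` and `add_other_languages` are hypothetical, and the inputs are small in-memory tuples rather than the imported `pagelinks` and `langlinks` tables.

```python
from collections import Counter

def count_inlinks(pagelinks):
    """Count links per target title, keeping only article namespace 0.

    pagelinks: iterable of (namespace, target_title) pairs (assumed shape).
    """
    return Counter(title for namespace, title in pagelinks if namespace == 0)

def add_other_languages(own_counts, langlinks, counts_by_language):
    """Add the counts of the same article in other languages.

    langlinks: iterable of (title, other_language, other_title) tuples.
    counts_by_language: {language: Counter} for the other language editions.
    """
    totals = dict(own_counts)
    for title, other_lang, other_title in langlinks:
        if title in totals and other_lang in counts_by_language:
            totals[title] += counts_by_language[other_lang].get(other_title, 0)
    return totals

if __name__ == "__main__":
    en = count_inlinks([(0, "Paris"), (0, "Paris"), (1, "Paris"), (0, "Lyon")])
    de = Counter({"Paris": 5})
    print(add_other_languages(en, [("Paris", "de", "Paris")], {"de": de}))
```

The resulting totals correspond to `count + othercount`, which is the value fed into the importance formula.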
---
Wikidata
---
This script downloads and processes Wikidata to enrich the previously created Wikipedia tables for use in Nominatim.
#### Import & Process Wikidata
This step downloads [Wikidata](https://dumps.wikimedia.org/wikidatawiki/) page data SQL dumps and converts them to PostgreSQL files, which can be processed and imported into the Nominatim database. It also uses the Wikidata Query Service API to discover and include place types.
- Script presumes that the user has already processed Wikipedia tables as specified above
- Script requires wikidata_place_types.txt and wikidata_place_type_levels.csv
- Script requires the [jq json parser](https://stedolan.github.io/jq/)
- Script processes data from whatever set of Wikipedia languages is specified in the `language` array at the top of the script
- Script queries Wikidata Query Service API and imports all instances of place types listed in wikidata_place_types.txt
- Script updates the wikipedia_article table with extracted Wikidata
By including Wikidata in the wikipedia_article table, new connections can be made on the fly from the Nominatim placex table to wikipedia_article importance scores.
To download, convert, and import the data, then process required items, run:
```
./wikidata_import.sh
```
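
For readers who want to see what the place-type extraction produces, here is a hedged Python sketch of the transformation the script performs with `jq`, `awk` and `sed`: each item URI in a Wikidata Query Service JSON response is reduced to its bare Q-identifier and paired with the queried place type. The function name `place_rows` and the sample response are made up for illustration; the response layout follows the standard SPARQL JSON results format.

```python
import json

ENTITY_PREFIX = "http://www.wikidata.org/entity/"

def place_rows(sparql_json, place_type):
    """Return (item, instance_of) rows from a SPARQL JSON result string.

    Hypothetical helper: place_type is the Q-identifier that was queried,
    i.e. one line of wikidata_place_types.txt.
    """
    data = json.loads(sparql_json)
    rows = []
    for binding in data["results"]["bindings"]:
        uri = binding["item"]["value"]
        if uri.startswith(ENTITY_PREFIX):
            uri = uri[len(ENTITY_PREFIX):]  # keep only the Q-identifier
        rows.append((uri, place_type))
    return rows

if __name__ == "__main__":
    # Invented sample response with a single result
    sample = json.dumps({
        "head": {"vars": ["item"]},
        "results": {"bindings": [
            {"item": {"type": "uri",
                      "value": "http://www.wikidata.org/entity/Q64"}}
        ]},
    })
    for item, instance_of in place_rows(sample, "Q515"):
        print("%s,%s" % (item, instance_of))
```

Each printed line has the same two-column shape that is loaded into the `wikidata_place_dump` table.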

View File

@@ -0,0 +1,95 @@
#!/bin/bash
psqlcmd() {
psql wikiprocessingdb
}
mysql2pgsqlcmd() {
./mysql2pgsql.perl /dev/stdin /dev/stdout
}
# list the languages to process (refer to List of Wikipedias here: https://en.wikipedia.org/wiki/List_of_Wikipedias)
language=( "ar" "bg" "ca" "cs" "da" "de" "en" "es" "eo" "eu" "fa" "fr" "ko" "hi" "hr" "id" "it" "he" "lt" "hu" "ms" "nl" "ja" "no" "pl" "pt" "kk" "ro" "ru" "sk" "sl" "sr" "fi" "sv" "tr" "uk" "vi" "vo" "war" "zh" )
# get a few wikidata dump tables
wget https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-geo_tags.sql.gz
wget https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-page.sql.gz
wget https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-wb_items_per_site.sql.gz
# import wikidata tables
gzip -dc wikidatawiki-latest-geo_tags.sql.gz | mysql2pgsqlcmd | psqlcmd
gzip -dc wikidatawiki-latest-page.sql.gz | mysql2pgsqlcmd | psqlcmd
gzip -dc wikidatawiki-latest-wb_items_per_site.sql.gz | mysql2pgsqlcmd | psqlcmd
# get wikidata places from wikidata query API
while read F ; do
wget "https://query.wikidata.org/bigdata/namespace/wdq/sparql?format=json&query=SELECT ?item WHERE{?item wdt:P31*/wdt:P279*wd:$F;}" -O $F.json
jq -r '.results | .[] | .[] | [.item.value] | @csv' $F.json >> $F.txt
awk -v qid=$F '{print $0 ","qid}' $F.txt | sed -e 's!"http://www.wikidata.org/entity/!!' | sed 's/"//g' >> $F.csv
cat $F.csv >> wikidata_place_dump.csv
rm $F.json $F.txt $F.csv
done < wikidata_place_types.txt
# import wikidata places
echo "CREATE TABLE wikidata_place_dump (item text, instance_of text);" | psqlcmd
echo "COPY wikidata_place_dump (item, instance_of) FROM '/srv/nominatim/Nominatim/data-sources/wikipedia-wikidata/wikidata_place_dump.csv' DELIMITER ',' CSV;" | psqlcmd
echo "CREATE TABLE wikidata_place_type_levels (place_type text, level integer);" | psqlcmd
echo "COPY wikidata_place_type_levels (place_type, level) FROM '/srv/nominatim/Nominatim/data-sources/wikipedia-wikidata/wikidata_place_type_levels.csv' DELIMITER ',' CSV HEADER;" | psqlcmd
# create derived tables
echo "CREATE TABLE geo_earth_primary AS SELECT gt_page_id, gt_lat, gt_lon FROM geo_tags WHERE gt_globe = 'earth' AND gt_primary = 1 AND NOT( gt_lat < -90 OR gt_lat > 90 OR gt_lon < -180 OR gt_lon > 180 OR gt_lat=0 OR gt_lon=0) ;" | psqlcmd
echo "CREATE TABLE geo_earth_wikidata AS SELECT DISTINCT geo_earth_primary.gt_page_id, geo_earth_primary.gt_lat, geo_earth_primary.gt_lon, page.page_title, page.page_namespace FROM geo_earth_primary LEFT OUTER JOIN page ON (geo_earth_primary.gt_page_id = page.page_id) ORDER BY geo_earth_primary.gt_page_id;" | psqlcmd
echo "ALTER TABLE wikidata_place_dump ADD COLUMN ont_level integer, ADD COLUMN lat numeric(11,8), ADD COLUMN lon numeric(11,8);" | psqlcmd
echo "UPDATE wikidata_place_dump SET ont_level = wikidata_place_type_levels.level FROM wikidata_place_type_levels WHERE wikidata_place_dump.instance_of = wikidata_place_type_levels.place_type;" | psqlcmd
echo "CREATE TABLE wikidata_places AS SELECT DISTINCT ON (item) item, instance_of, MAX(ont_level) AS ont_level, lat, lon FROM wikidata_place_dump GROUP BY item, instance_of, ont_level, lat, lon ORDER BY item;" | psqlcmd
echo "UPDATE wikidata_places SET lat = geo_earth_wikidata.gt_lat, lon = geo_earth_wikidata.gt_lon FROM geo_earth_wikidata WHERE wikidata_places.item = geo_earth_wikidata.page_title" | psqlcmd
# process language pages
echo "CREATE TABLE wikidata_pages (item text, instance_of text, lat numeric(11,8), lon numeric(11,8), ips_site_page text, language text );" | psqlcmd
for i in "${language[@]}"
do
echo "CREATE TABLE wikidata_${i}_pages as select wikidata_places.item, wikidata_places.instance_of, wikidata_places.lat, wikidata_places.lon, wb_items_per_site.ips_site_page FROM wikidata_places LEFT JOIN wb_items_per_site ON (CAST (( LTRIM(wikidata_places.item, 'Q')) AS INTEGER) = wb_items_per_site.ips_item_id) WHERE ips_site_id = '${i}wiki' AND LEFT(wikidata_places.item,1) = 'Q' order by wikidata_places.item;" | psqlcmd
echo "ALTER TABLE wikidata_${i}_pages ADD COLUMN language text;" | psqlcmd
echo "UPDATE wikidata_${i}_pages SET language = '${i}';" | psqlcmd
echo "INSERT INTO wikidata_pages SELECT item, instance_of, lat, lon, ips_site_page, language FROM wikidata_${i}_pages;" | psqlcmd
done
echo "ALTER TABLE wikidata_pages ADD COLUMN wp_page_title text;" | psqlcmd
echo "UPDATE wikidata_pages SET wp_page_title = REPLACE(ips_site_page, ' ', '_');" | psqlcmd
echo "ALTER TABLE wikidata_pages DROP COLUMN ips_site_page;" | psqlcmd
# add wikidata to wikipedia_article table
echo "ALTER TABLE wikipedia_article ADD COLUMN IF NOT EXISTS wd_page_title text, ADD COLUMN IF NOT EXISTS instance_of text;" | psqlcmd
echo "UPDATE wikipedia_article SET lat = wikidata_pages.lat, lon = wikidata_pages.lon, wd_page_title = wikidata_pages.item, instance_of = wikidata_pages.instance_of FROM wikidata_pages WHERE wikipedia_article.language = wikidata_pages.language AND wikipedia_article.title = wikidata_pages.wp_page_title;" | psqlcmd
echo "CREATE TABLE wikipedia_article_slim AS SELECT * FROM wikipedia_article WHERE wd_page_title IS NOT NULL;" | psqlcmd
echo "ALTER TABLE wikipedia_article RENAME TO wikipedia_article_full;" | psqlcmd
echo "ALTER TABLE wikipedia_article_slim RENAME TO wikipedia_article;" | psqlcmd
# clean up intermediate tables
echo "DROP TABLE wikidata_place_dump;" | psqlcmd
echo "DROP TABLE geo_earth_primary;" | psqlcmd
for i in "${language[@]}"
do
echo "DROP TABLE wikidata_${i}_pages;" | psqlcmd
done

View File

@@ -0,0 +1,77 @@
#!/bin/bash
psqlcmd() {
psql wikiprocessingdb
}
mysql2pgsqlcmd() {
./mysql2pgsql.perl /dev/stdin /dev/stdout
}
# list the languages to process (refer to List of Wikipedias here: https://en.wikipedia.org/wiki/List_of_Wikipedias)
language=( "ar" "bg" "ca" "cs" "da" "de" "en" "es" "eo" "eu" "fa" "fr" "ko" "hi" "hr" "id" "it" "he" "lt" "hu" "ms" "nl" "ja" "no" "pl" "pt" "kk" "ro" "ru" "sk" "sl" "sr" "fi" "sv" "tr" "uk" "vi" "vo" "war" "zh" )
# create wikipedia calculation tables
echo "CREATE TABLE linkcounts (language text, title text, count integer, sumcount integer, lat double precision, lon double precision);" | psqlcmd
echo "CREATE TABLE wikipedia_article (language text NOT NULL, title text NOT NULL, langcount integer, othercount integer, totalcount integer, lat double precision, lon double precision, importance double precision, title_en text, osm_type character(1), osm_id bigint );" | psqlcmd
echo "CREATE TABLE wikipedia_redirect (language text, from_title text, to_title text );" | psqlcmd
# download individual wikipedia language tables
for i in "${language[@]}"
do
wget https://dumps.wikimedia.org/${i}wiki/latest/${i}wiki-latest-page.sql.gz
wget https://dumps.wikimedia.org/${i}wiki/latest/${i}wiki-latest-pagelinks.sql.gz
wget https://dumps.wikimedia.org/${i}wiki/latest/${i}wiki-latest-langlinks.sql.gz
wget https://dumps.wikimedia.org/${i}wiki/latest/${i}wiki-latest-redirect.sql.gz
done
# import individual wikipedia language tables
for i in "${language[@]}"
do
gzip -dc ${i}wiki-latest-pagelinks.sql.gz | sed "s/\`pagelinks\`/\`${i}pagelinks\`/g" | mysql2pgsqlcmd | psqlcmd
gzip -dc ${i}wiki-latest-page.sql.gz | sed "s/\`page\`/\`${i}page\`/g" | mysql2pgsqlcmd | psqlcmd
gzip -dc ${i}wiki-latest-langlinks.sql.gz | sed "s/\`langlinks\`/\`${i}langlinks\`/g" | mysql2pgsqlcmd | psqlcmd
gzip -dc ${i}wiki-latest-redirect.sql.gz | sed "s/\`redirect\`/\`${i}redirect\`/g" | mysql2pgsqlcmd | psqlcmd
done
# process language tables and associated pagelink counts
for i in "${language[@]}"
do
echo "create table ${i}pagelinkcount as select pl_title as title,count(*) as count from ${i}pagelinks where pl_namespace = 0 group by pl_title;" | psqlcmd
echo "insert into linkcounts select '${i}',pl_title,count(*) from ${i}pagelinks where pl_namespace = 0 group by pl_title;" | psqlcmd
echo "insert into wikipedia_redirect select '${i}',page_title,rd_title from ${i}redirect join ${i}page on (rd_from = page_id) where page_namespace = 0 and rd_namespace = 0;" | psqlcmd
echo "alter table ${i}pagelinkcount add column othercount integer;" | psqlcmd
echo "update ${i}pagelinkcount set othercount = 0;" | psqlcmd
for j in "${language[@]}"
do
echo "update ${i}pagelinkcount set othercount = ${i}pagelinkcount.othercount + x.count from (select page_title as title,count from ${i}langlinks join ${i}page on (ll_from = page_id) join ${j}pagelinkcount on (ll_lang = '${j}' and ll_title = title)) as x where x.title = ${i}pagelinkcount.title;" | psqlcmd
done
echo "insert into wikipedia_article select '${i}', title, count, othercount, count+othercount from ${i}pagelinkcount;" | psqlcmd
done
# calculate importance score for each wikipedia page
echo "update wikipedia_article set importance = log(totalcount)/log((select max(totalcount) from wikipedia_article))" | psqlcmd
# clean up intermediate tables to conserve space
for i in "${language[@]}"
do
echo "DROP TABLE ${i}pagelinks;" | psqlcmd
echo "DROP TABLE ${i}page;" | psqlcmd
echo "DROP TABLE ${i}langlinks;" | psqlcmd
echo "DROP TABLE ${i}redirect;" | psqlcmd
echo "DROP TABLE ${i}pagelinkcount;" | psqlcmd
done

View File

@@ -0,0 +1,951 @@
#!/usr/bin/perl -w
# mysql2pgsql
# MySQL to PostgreSQL dump file converter
#
# For usage: perl mysql2pgsql.perl --help
#
# DDL statements are changed, but no or only minimal formatting
# of the actual data is done.
# data consistency is up to the DBA.
#
# (c) 2004-2007 Jose M Duarte and Joseph Speigle ... gborg
#
# (c) 2000-2004 Maxim Rudensky <fonin@omnistaronline.com>
# (c) 2000 Valentine Danilchuk <valdan@ziet.zhitomir.ua>
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. All advertising materials mentioning features or use of this software
# must display the following acknowledgement:
# This product includes software developed by the Max Rudensky
# and its contributors.
# 4. Neither the name of the author nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
use Getopt::Long;
use POSIX;
use strict;
use warnings;
# main sections
# -------------
# 1 variable declarations
# 2 subroutines
# 3 get commandline options and specify help statement
# 4 loop through file and process
# 5. print_plpgsql function prototype
#################################################################
# 1. variable declarations
#################################################################
# command line options
my( $ENC_IN, $ENC_OUT, $PRESERVE_CASE, $HELP, $DEBUG, $SCHEMA, $LOWERCASE, $CHAR2VARCHAR, $NODROP, $SEP_FILE, $opt_debug, $opt_help, $opt_schema, $opt_preserve_case, $opt_char2varchar, $opt_nodrop, $opt_sepfile, $opt_enc_in, $opt_enc_out );
# variables for constructing pre-create-table entities
my $pre_create_sql=''; # comments, 'enum' constraints preceding create table statement
my $auto_increment_seq= ''; # so we can easily substitute it if we need a default value
my $create_sql=''; # all the datatypes in the create table section
my $post_create_sql=''; # create indexes, foreign keys, table comments
my $function_create_sql = ''; # for the set (function,trigger) and CURRENT_TIMESTAMP ( function,trigger )
# constraints
my ($type, $column_valuesStr, @column_values, $value );
my %constraints=(); # holds values constraints used to emulate mysql datatypes (e.g. year, set)
# datatype conversion variables
my ( $index,$seq);
my ( $column_name, $col, $quoted_column);
my ( @year_holder, $year, $constraint_table_name);
my $table=""; # table_name for create sql statements
my $table_no_quotes=""; # table_name for create sql statements
my $sl = '^\s+\w+\s+'; # matches the column name
my $tables_first_timestamp_column= 1; # decision to print warnings about default_timestamp not being in postgres
my $mysql_numeric_datatypes = "TINYINT|SMALLINT|MEDIUMINT|INT|INTEGER|BIGINT|REAL|DOUBLE|FLOAT|DECIMAL|NUMERIC";
my $mysql_datetime_datatypes = "DATE|TIME|TIMESTAMP|DATETIME|YEAR"; # no leading '|': avoids an empty alternative once joined below
my $mysql_text_datatypes = "CHAR|VARCHAR|BINARY|VARBINARY|TINYBLOB|BLOB|MEDIUMBLOB|LONGBLOB|TINYTEXT|TEXT|MEDIUMTEXT|LONGTEXT|ENUM|SET";
my $mysql_datatypesStr = $mysql_numeric_datatypes . "|". $mysql_datetime_datatypes . "|". $mysql_text_datatypes ;
# handling INSERT INTO statements
my $rowRe = qr{
\( # opening parens
( # (start capture)
(?: # (start group)
' # string start
[^'\\]* # up to string-end or backslash (escape)
(?: # (start group)
\\. # gobble escaped character
[^'\\]* # up to string-end or backslash
)* # (end group, repeat zero or more)
' # string end
| # (OR)
.*? # everything else (not strings)
)* # (end group, repeat zero or more)
) # (end capture)
\) # closing paren
}x;
my ($insert_table, $valueString);
#
########################################################
# 2. subroutines
#
# get_identifier
# print_post_create_sql()
# quote_and_lc()
# make_plpgsql($table,$column_name) -- at end of file
########################################################
# returns an identifier with the given suffix doing controlled
# truncation if necessary
sub get_identifier($$$) {
my ($table, $col, $suffix) = @_;
my $name = '';
$table=~s/\"//g; # make sure that $table doesn't have quotes so we don't end up with redundant quoting
# in the case of multiple columns
my @cols = split(/,/,$col);
$col =~ s/,//g;
# in case all columns together too long we have to truncate them
if (length($col) > 55) {
my $totaltocut = length($col)-55;
my $tocut = ceil($totaltocut / @cols);
@cols = map {substr($_,0,abs(length($_)-$tocut))} @cols;
$col="";
foreach (@cols){
$col.=$_;
}
}
my $max_table_length = 63 - length("_${col}_$suffix");
if (length($table) > $max_table_length) {
$table = substr($table, length($table) - $max_table_length, $max_table_length);
}
return quote_and_lc("${table}_${col}_${suffix}");
}
#
#
# called when we encounter next CREATE TABLE statement
# also called at EOF to print out for last table
# prints comments, indexes, foreign key constraints (the latter 2 possibly to a separate file)
sub print_post_create_sql() {
my ( @create_idx_comments_constraints_commandsArr, $stmts, $table_field_combination);
my %stmts;
# loop to check for duplicates in $post_create_sql
# Needed because of duplicate key declarations ( PRIMARY KEY and KEY), auto_increment columns
@create_idx_comments_constraints_commandsArr = split(';\n?', $post_create_sql);
if ($SEP_FILE) {
open(SEP_FILE, ">>:encoding($ENC_OUT)", $SEP_FILE) or die "Unable to open $SEP_FILE for output: $!\n";
}
foreach (@create_idx_comments_constraints_commandsArr) {
if (m/CREATE INDEX "*(\S+)"*\s/i) { # CREATE INDEX korean_english_wordsize_idx ON korean_english USING btree (wordsize);
$table_field_combination = $1;
# if this particular table_field_combination was already used do not print the statement:
if ($SEP_FILE) {
print SEP_FILE "$_;\n" if !defined($stmts{$table_field_combination});
} else {
print OUT "$_;\n" if !defined($stmts{$table_field_combination});
}
$stmts{$table_field_combination} = 1;
}
elsif (m/COMMENT/i) { # COMMENT ON object IS 'text'; but comment may be part of table name so use 'elsif'
print OUT "$_;\n"
} else { # foreign key constraint or comments (those preceded by -- )
if ($SEP_FILE) {
print SEP_FILE "$_;\n";
} else {
print OUT "$_;\n"
}
}
}
if ($SEP_FILE) {
close SEP_FILE;
}
$post_create_sql='';
# empty %constraints for next " create table" statement
}
# quotes a string or a multicolumn string (comma separated)
# and optionally lowercase (if LOWERCASE is set)
# lowercase .... if user wants default postgres behavior
# quotes .... to preserve keywords and to preserve case when case-sensitive tables are to be used
sub quote_and_lc($)
{
my $col = shift;
if ($LOWERCASE) {
$col = lc($col);
}
if ($col =~ m/,/) {
my @cols = split(/,\s?/, $col);
@cols = map {"\"$_\""} @cols;
return join(', ', @cols);
} else {
return "\"$col\"";
}
}
########################################################
# 3. get commandline options and maybe print help
########################################################
GetOptions("help" => \$opt_help, "debug"=> \$opt_debug, "schema=s" => \$SCHEMA, "preserve_case" => \$opt_preserve_case, "char2varchar" => \$opt_char2varchar, "nodrop" => \$opt_nodrop, "sepfile=s" => \$opt_sepfile, "enc_in=s" => \$opt_enc_in, "enc_out=s" => \$opt_enc_out );
$HELP = $opt_help || 0;
$DEBUG = $opt_debug || 0;
$PRESERVE_CASE = $opt_preserve_case || 0;
if ($PRESERVE_CASE == 1) { $LOWERCASE = 0; }
else { $LOWERCASE = 1; }
$CHAR2VARCHAR = $opt_char2varchar || 0;
$NODROP = $opt_nodrop || 0;
$SEP_FILE = $opt_sepfile || 0;
$ENC_IN = $opt_enc_in || 'utf8';
$ENC_OUT = $opt_enc_out || 'utf8';
if (($HELP) || ! defined($ARGV[0]) || ! defined($ARGV[1])) {
print "\n\nUsage: perl $0 {--help --debug --preserve_case --char2varchar --nodrop --schema --sepfile --enc_in --enc_out } mysql.sql pg.sql\n";
print "\t* OPTIONS WITHOUT ARGS\n";
print "\t--help: prints this message \n";
print "\t--debug: output the commented-out mysql line above the postgres line in pg.sql \n";
print "\t--preserve_case: prevents automatic case-lowering of column and table names\n";
print "\t\tIf you want to preserve case, you must set this flag. For example,\n";
print "\t\tIf your client application quotes table and column-names and they have cases in them, set this flag\n";
print "\t--char2varchar: converts all char fields to varchar\n";
print "\t--nodrop: strips out DROP TABLE statements\n";
print "\t\totherwise harmless warnings are printed by psql when the dropped table does not exist\n";
print "\n\t* OPTIONS WITH ARGS\n";
print "\t--schema: outputs a line into the postgres sql file setting search_path \n";
print "\t--sepfile: output foreign key constraints and indexes to a separate file so that it can be\n";
print "\t\timported after large data set is inserted from another dump file\n";
print "\t--enc_in: encoding of mysql in file (default utf8) \n";
print "\t--enc_out: encoding of postgres out file (default utf8) \n";
print "\n\t* REQUIRED ARGUMENTS\n";
if (defined ($ARGV[0])) {
print "\tmysql.sql ($ARGV[0])\n";
} else {
print "\tmysql.sql (undefined)\n";
}
if (defined ($ARGV[1])) {
print "\tpg.sql ($ARGV[1])\n";
} else {
print "\tpg.sql (undefined)\n";
}
print "\n";
exit 1;
}
########################################################
# 4. process through mysql_dump.sql file
# in a big loop
########################################################
# open in and out files
open(IN,"<:encoding($ENC_IN)", $ARGV[0]) || die "can't open mysql dump file $ARGV[0]";
open(OUT,">:encoding($ENC_OUT)", $ARGV[1]) || die "can't open pg dump file $ARGV[1]";
# output header
print OUT "--\n";
print OUT "-- Generated from mysql2pgsql.perl\n";
print OUT "-- http://gborg.postgresql.org/project/mysql2psql/\n";
print OUT "-- (c) 2001 - 2007 Jose M. Duarte, Joseph Speigle\n";
print OUT "--\n";
print OUT "\n";
print OUT "-- warnings are printed for drop tables if they do not exist\n";
print OUT "-- please see http://archives.postgresql.org/pgsql-novice/2004-10/msg00158.php\n\n";
print OUT "-- ##############################################################\n";
if ($SCHEMA ) {
print OUT "set search_path='" . $SCHEMA . "'\\g\n" ;
}
# loop through mysql file on a per-line basis
while(<IN>) {
############## flow #########################
# (the lines are directed to different string variables at different times)
#
# handle drop table , unlock, connect statements
# if ( start of create table) {
# print out post_create table (indexes, foreign key constraints, comments from previous table)
# add drop table statement if !$NODROP to pre_create_sql
# next;
# }
# else if ( inside create table) {
# add comments in this portion to create_sql
# if ( end of create table) {
# delete mysql-unique CREATE TABLE commands
# print pre_create_sql
# print the constraint tables for set and year datatypes
# print create_sql
# print function_create_sql (this is for the enum columns only)
# next;
# }
# do substitutions
# -- NUMERIC DATATYPES
# -- CHARACTER DATATYPES
# -- DATE AND TIME DATATYPES
# -- KEY AND UNIQUE CREATIONS
# and append them to create_sql
# } else {
# print inserts on-the-spot (this script only changes default timestamp of 0000-00-00)
# }
# LOOP until EOF
#
########################################################
if (!/^\s*insert into/i) { # not inside create table so don't worry about data corruption
s/`//g; # '`pgsql uses no backticks to denote table name (CREATE TABLE `sd`) or around field
# and table names like mysql
# doh! we hope all dashes and special chars are caught by the regular expressions :)
}
if (/^\s*USE\s*([^;]*);/) {
print OUT "\\c ". $1;
next;
}
if (/^(UN)?LOCK TABLES/i || /drop\s+table/i ) {
# skip
# DROP TABLE is added when we see the CREATE TABLE
next;
}
if (/(create\s+table\s+)([-_\w]+)\s/i) { # example: CREATE TABLE `english_english`
print_post_create_sql(); # for last table
$tables_first_timestamp_column= 1; # decision to print warnings about default_timestamp not being in postgres
$create_sql = '';
$table_no_quotes = $2 ;
$table=quote_and_lc($2);
if ( !$NODROP ) { # always print drop table if user doesn't explicitly say not to
# to drop a table that is referenced by a view or a foreign-key constraint of another table,
# CASCADE must be specified. (CASCADE will remove a dependent view entirely, but
# in the foreign-key case it will only remove the foreign-key constraint, not the other table entirely.)
# (source: 8.1.3 docs, section "drop table")
warn "table $table will be dropped CASCADE\n";
$pre_create_sql .= "DROP TABLE $table CASCADE;\n"; # custom dumps may be missing the 'dump' commands
}
s/(create\s+table\s+)([-_\w]+)\s/$1 $table /i;
if ($DEBUG) {
$create_sql .= '-- ' . $_;
}
$create_sql .= $_;
next;
}
if ($create_sql ne "") { # we are inside create table statement so lets process datatypes
# print out comments or empty lines in context
if ($DEBUG) {
$create_sql .= '-- ' . $_;
}
if (/^#/ || /^$/ || /^\s*--/) {
s/^#/--/; # Two hyphens (--) is the SQL-92 standard indicator for comments
$create_sql.=$_;
next;
}
if (/\).*;/i) { # end of create table sequence
s/INSERT METHOD[=\s+][^;\s]+//i;
s/PASSWORD=[^;\s]+//i;
s/ROW_FORMAT=(?:DEFAULT|DYNAMIC|FIXED|COMPRESSED|REDUNDANT|COMPACT)+//i;
s/KEY_BLOCK_SIZE=8//i;
s/DELAY KEY WRITE=[^;\s]+//i;
s/INDEX DIRECTORY[=\s+][^;\s]+//i;
s/DATA DIRECTORY=[^;\s]+//i;
s/CONNECTION=[^;\s]+//i;
s/CHECKSUM=[^;\s]+//i;
s/Type=[^;\s]+//i; # ISAM , # older versions
s/COLLATE=[^;\s]+//i; # table's collate
s/COLLATE\s+[^;\s]+//i; # table's collate
# possible AUTO_INCREMENT starting index, it is used in mysql 5.0.26, not sure since which version
if (/AUTO_INCREMENT=(\d+)/i) {
# should take < ---- ) ENGINE=MyISAM AUTO_INCREMENT=16 DEFAULT CHARSET=latin1;
# and should output ---> CREATE SEQUENCE "rhm_host_info_id_seq" START WITH 16;
my $start_value = $1;
print $auto_increment_seq . "--\n";
# print $pre_create_sql . "--\n";
$pre_create_sql =~ s/(CREATE SEQUENCE $auto_increment_seq )/$1 START WITH $start_value /;
}
s/AUTO_INCREMENT=\d+//i;
s/PACK_KEYS=\d//i; # mysql 5.0.22
s/DEFAULT CHARSET=[^;\s]+//i; # my mysql version is 4.1.11
s/ENGINE\s*=\s*[^;\s]+//i; # my mysql version is 4.1.11
s/ROW_FORMAT=[^;\s]+//i; # my mysql version is 5.0.22
s/KEY_BLOCK_SIZE=8//i;
s/MIN_ROWS=[^;\s]+//i;
s/MAX_ROWS=[^;\s]+//i;
s/AVG_ROW_LENGTH=[^;\s]+//i;
if (/COMMENT='([^']*)'/) { # ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='must be country zones';
$post_create_sql.="COMMENT ON TABLE $table IS '$1'\;"; # COMMENT ON table_name IS 'text';
s/COMMENT='[^']*'//i;
}
$create_sql =~ s/,$//g; # strip last , inside create table
# fix up the trailing commas: KEY statements are turned
# into post_create_sql indices, and because they are often
# the last line, dropping them leaves a 'hanging comma'
my @array = split("\n", $create_sql);
for (my $a = $#array; $a >= 0; $a--) { #loop backwards
if ($a == $#array && $array[$a] =~ m/,\s*$/) { # for last line
$array[$a] =~ s/,\s*$//;
next;
}
if ($array[$a] !~ m/create table/i) { # i.e. if there was more than one column in table
if ($a != $#array && $array[$a] !~ m/,\s*$/ ) { # for second to last
$array[$a] =~ s/$/,/;
last;
}
elsif ($a != $#array && $array[$a] =~ m/,\s*$/ ) { # for second to last
last;
}
}
}
$create_sql = join("\n", @array) . "\n";
$create_sql .= $_;
# put comments out first
print OUT $pre_create_sql;
# create separate table to reference and to hold mysql's possible set data-type
# values. do that table's creation before create table
# definition
foreach $column_name (keys %constraints) {
$type=$constraints{$column_name}{'type'};
$column_valuesStr = $constraints{$column_name}{'values'};
$constraint_table_name = get_identifier(${table},${column_name} ,"constraint_table");
if ($type eq 'set') {
print OUT qq~DROP TABLE $constraint_table_name CASCADE\\g\n~ ;
print OUT qq~create table $constraint_table_name ( set_values varchar UNIQUE)\\g\n~ ;
$function_create_sql .= make_plpgsql($table,$column_name);
} elsif ($type eq 'year') {
print OUT qq~DROP TABLE $constraint_table_name CASCADE\\g\n~ ;
print OUT qq~create table $constraint_table_name ( year_values varchar UNIQUE)\\g\n~ ;
}
@column_values = split /,/, $column_valuesStr;
foreach $value (@column_values) {
print OUT qq~insert into $constraint_table_name values ( $value )\\g\n~; # add ' for ints and varchars
}
}
$create_sql =~ s/double double/double precision/g;
# print create table and reset create table vars
# when moving from each "create table" to "insert" part of dump
print OUT $create_sql;
print OUT $function_create_sql;
$pre_create_sql="";
$auto_increment_seq="";
$create_sql="";
$function_create_sql='';
%constraints=();
# the post_create_sql for this table is output at the beginning of the next table def
# in case we want to make indexes after doing inserting
next;
}
if (/^\s*(\w+)\s+.*COMMENT\s*'([^']*)'/) { #`zone_country_id` int(11) COMMENT 'column comment here',
$quoted_column=quote_and_lc($1);
$post_create_sql.="COMMENT ON COLUMN $table"."."." $quoted_column IS '$2'\;"; # COMMENT ON table_name.column_name IS 'text';
s/COMMENT\s*'[^']*'//i;
}
# NUMERIC DATATYPES
#
# auto_increment -> sequences
# UNSIGNED conversions
# TINYINT
# SMALLINT
# MEDIUMINT
# INT, INTEGER
# BIGINT
#
# DOUBLE [PRECISION], REAL
# DECIMAL(M,D), NUMERIC(M,D)
# FLOAT(p)
# FLOAT
s/(\w*int)\(\d+\)/$1/g; # hack of the (n) stuff for e.g. mediumint(2) int(3)
if (/^(\s*)(\w+)\s*.*numeric.*auto_increment/i) { # int,auto_increment -> serial
$seq = get_identifier($table, $2, 'seq');
$quoted_column=quote_and_lc($2);
# Smash datatype to int8 and autogenerate the sequence.
s/^(\s*)(\w+)\s*.*NUMERIC(.*)auto_increment([^,]*)/$1 $quoted_column serial8 $4/ig;
$create_sql.=$_;
next;
}
if (/^\s*(\w+)\s+.*int.*auto_increment/i) { # example: data_id mediumint(8) unsigned NOT NULL auto_increment,
$seq = get_identifier($table, $1, 'seq');
$quoted_column=quote_and_lc($1);
s/(\s*)(\w+)\s+.*int.*auto_increment([^,]*)/$1 $quoted_column serial8 $3/ig;
$create_sql.=$_;
next;
}
# convert UNSIGNED to CHECK constraints
if (m/^(\s*)(\w+)\s+((float|double precision|double|real|decimal|numeric))(.*)unsigned/i) {
$quoted_column = quote_and_lc($2);
s/^(\s*)(\w+)\s+((float|double precision|double|real|decimal|numeric))(.*)unsigned/$1 $quoted_column $3 $4 CHECK ($quoted_column >= 0)/i;
}
# example: `wordsize` tinyint(3) unsigned default NULL,
if (m/^(\s+)(\w+)\s+(\w+)\s+unsigned/i) {
$quoted_column=quote_and_lc($2);
s/^(\s+)(\w+)\s+(\w+)\s+unsigned/$1 $quoted_column $3 CHECK ($quoted_column >= 0)/i;
}
if (m/^(\s*)(\w+)\s+(bigint.*)unsigned/) {
$quoted_column=quote_and_lc($2);
# see http://archives.postgresql.org/pgsql-general/2005-07/msg01178.php
# and see http://www.postgresql.org/docs/8.2/interactive/datatype-numeric.html
# see http://dev.mysql.com/doc/refman/5.1/en/numeric-types.html max size == 20 digits
s/^(\s*)(\w+)\s+bigint(.*)unsigned/$1 $quoted_column NUMERIC (20,0) CHECK ($quoted_column >= 0)/i;
}
# int type conversion
# TINYINT (signed) -128 to 127 (unsigned) 0 255
# SMALLINT A small integer. The signed range is -32768 to 32767. The unsigned range is 0 to 65535.
# MEDIUMINT A medium-sized integer. The signed range is -8388608 to 8388607. The unsigned range is 0 to 16777215.
# INT A normal-size integer. The signed range is -2147483648 to 2147483647. The unsigned range is 0 to 4294967295.
# BIGINT The signed range is -9223372036854775808 to 9223372036854775807. The unsigned range is 0 to 18446744073709551615
# for postgres see http://www.postgresql.org/docs/8.2/static/datatype-numeric.html#DATATYPE-INT
s/^(\s+"*\w+"*\s+)tinyint/$1 smallint/i;
s/^(\s+"*\w+"*\s+)mediumint/$1 integer/i;
# the floating point types
# double -> double precision
# double(n,m) -> double precision
# float - no need for conversion
# float(n) - no need for conversion
# float(n,m) -> double precision
s/(^\s*\w+\s+)double(\(\d+,\d+\))?/$1float/i;
s/float(\(\d+,\d+\))/float/i;
#
# CHARACTER TYPES
#
# set
# enum
# binary(M), VARBINARy(M), tinyblob, tinytext,
# bit
# char(M), varchar(M)
# blob -> text
# mediumblob
# longblob, longtext
# text -> text
# mediumtext
# longtext
# mysql docs: A BLOB is a binary large object that can hold a variable amount of data.
# set
# For example, a column specified as SET('one', 'two') NOT NULL can have any of these values:
# ''
# 'one'
# 'two'
# 'one,two'
if (/(\w*)\s+set\(((?:['"]\w+['"]\s*,*)+(?:['"]\w+['"])*)\)(.*)$/i) { # example: `au_auth` set('r','w','d') NOT NULL default '',
$column_name = $1;
$constraints{$column_name}{'values'} = $2; # 'abc','def', ...
$constraints{$column_name}{'type'} = "set"; # 'abc','def', ...
$_ = qq~ $column_name varchar , ~;
$column_name = quote_and_lc($1);
$create_sql.=$_;
next;
}
if (/(\S*)\s+enum\(((?:['"][^'"]+['"]\s*,)+['"][^'"]+['"])\)(.*)$/i) { # enum handling
# example: `test` enum('?','+','-') NOT NULL default '?'
# $2 is the values of the enum 'abc','def', ...
$quoted_column=quote_and_lc($1);
# "test" NOT NULL default '?' CONSTRAINT test_test_constraint CHECK ("test" IN ('?','+','-'))
$_ = qq~ $quoted_column varchar CHECK ($quoted_column IN ( $2 ))$3\n~; # just assume varchar?
$create_sql.=$_;
next;
}
# Take care of "binary" option for char and varchar
# (pre-4.1.2, it indicated a byte array; from 4.1.2, indicates
# a binary collation)
s/(?:var)?char(?:\(\d+\))? (?:byte|binary)/text/i;
if (m/(?:var)?binary\s*\(\d+\)/i) { # c varBINARY(3) in Mysql
warn "WARNING in table '$table' '$_': binary type is converted to bytea (unsized) for Postgres\n";
}
s/(?:var)?binary(?:\(\d+\))?/text/i; # c varBINARY(3) in Mysql
s/bit(?:\(\d+\))?/bytea/i; # bit datatype -> bytea
# large datatypes
s/\w*blob/bytea/gi;
s/tinytext/text/gi;
s/mediumtext/text/gi;
s/longtext/text/gi;
# char -> varchar -- if specified as a command line option
# PostgreSQL would otherwise pad with spaces as opposed
# to MySQL! Your user interface may depend on this!
if ($CHAR2VARCHAR) {
s/(^\s+\S+\s+)char/${1}varchar/gi;
}
# nuke column's collate and character set
s/(\S+)\s+character\s+set\s+\w+/$1/gi;
s/(\S+)\s+collate\s+\w+/$1/gi;
#
# DATE AND TIME TYPES
#
# date time
# year
# datetime
# timestamp
# date time
# these are the same types in postgres, just do the replacement of 0000-00-00 date
if (m/default '(\d+)-(\d+)-(\d+)([^']*)'/i) { # we grab the year, month and day
# NOTE: times of 00:00:00 are possible and are okay
my $time = '';
my $year=$1;
my $month= $2;
my $day = $3;
if ($4) {
$time = $4;
}
if ($year eq "0000") { $year = '1970'; }
if ($month eq "00") { $month = '01'; }
if ($day eq "00") { $day = '01'; }
s/default '[^']+'/default '$year-$month-$day$time'/i; # finally we replace with $datetime
}
# convert mysql's year datatype to a constraint
if (/(\w*)\s+year\(4\)(.*)$/i) { # can be integer OR string 1901-2155
$constraint_table_name = get_identifier($table,$1 ,"constraint_table");
$column_name=quote_and_lc($1);
@year_holder = ();
$year='';
for (1901 .. 2155) {
$year = "'$_'";
unless ($year =~ /2155/) { $year .= ','; }
push( @year_holder, $year);
}
$constraints{$column_name}{'values'} = join('','',@year_holder); # '1901','1902', ...
$constraints{$column_name}{'type'} = "year";
$_ = qq~ $column_name varchar CONSTRAINT ${table}_${column_name}_constraint REFERENCES $constraint_table_name ("year_values") $2\n~;
$create_sql.=$_;
next;
} elsif (/(\w*)\s+year\(2\)(.*)$/i) { # same for a 2-integer string
$constraint_table_name = get_identifier($table,$1 ,"constraint_table");
$column_name=quote_and_lc($1);
@year_holder = ();
$year='';
for (1970 .. 2069) {
$year = "'$_'";
if ($year =~ /2069/) { next; }
push( @year_holder, $year);
}
push( @year_holder, '0000');
$constraints{$column_name}{'values'} = join(',',@year_holder); # '1971','1972', ...
$constraints{$column_name}{'type'} = "year"; # 'abc','def', ...
$_ = qq~ $1 varchar CONSTRAINT ${table}_${column_name}_constraint REFERENCES $constraint_table_name ("year_values") $2\n~;
$create_sql.=$_;
next;
}
# datetime
# Default on a dump from MySQL 5.0.22 is in the same form as datetime so let it flow down
# to the timestamp section and deal with it there
s/(${sl})datetime /$1timestamp without time zone /i;
# change not null datetime field to null valid ones
# (to support remapping of "zero time" to null
# s/($sl)datetime not null/$1timestamp without time zone/i;
# timestamps
#
# nuke datetime representation (not supported in PostgreSQL)
# change default time of 0000-00-00 to 1970-01-01
# we may possibly need to create a trigger to provide
# equal functionality with ON UPDATE CURRENT TIMESTAMP
if (m/${sl}timestamp/i) {
if ( m/ON UPDATE CURRENT_TIMESTAMP/i ) { # the ... default CURRENT_TIMESTAMP only applies for blank inserts, not updates
s/ON UPDATE CURRENT_TIMESTAMP//i ;
m/^\s*(\w+)\s+timestamp/i ;
# automatic trigger creation
$table_no_quotes =~ s/"//g;
$function_create_sql .= " CREATE OR REPLACE FUNCTION update_". $table_no_quotes . "() RETURNS trigger AS '
BEGIN
NEW.$1 := CURRENT_TIMESTAMP;
RETURN NEW;
END;
' LANGUAGE 'plpgsql';
-- before INSERT is handled by 'default CURRENT_TIMESTAMP'
CREATE TRIGGER add_current_date_to_".$table_no_quotes." BEFORE UPDATE ON ". $table . " FOR EACH ROW EXECUTE PROCEDURE
update_".$table_no_quotes."();\n";
}
if ($tables_first_timestamp_column && m/DEFAULT NULL/i) {
# DEFAULT NULL is the same as DEFAULT CURRENT_TIMESTAMP for the first TIMESTAMP column. (MYSQL manual)
s/($sl)(timestamp\s+)default null/$1 $2 DEFAULT CURRENT_TIMESTAMP/i;
}
$tables_first_timestamp_column= 0;
if (m/${sl}timestamp\s*\(\d+\)/i) { # fix for timestamps with width spec not handled (ID: 1628)
warn "WARNING in table '$table' '$_': your default timestamp width is being ignored for table $table \n";
s/($sl)timestamp(?:\(\d+\))/$1datetime/i;
}
} # end timestamp section
# KEY AND UNIQUE CREATIONS
#
# unique
if ( /^\s+unique\s+\(([^(]+)\)/i ) { # example UNIQUE `name` (`name`), same as UNIQUE KEY
# POSTGRESQL: treat same as mysql unique
$quoted_column = quote_and_lc($1);
s/\s+unique\s+\(([^(]+)\)/ unique ($quoted_column) /i;
$create_sql.=$_;
next;
} elsif ( /^\s+unique\s+key\s*(\w+)\s*\(([^(]+)\)/i ) { # example UNIQUE KEY `name` (`name`)
# MYSQL: unique key: allows null=YES, allows duplicates=NO (*)
# ... new ... UNIQUE KEY `unique_fullname` (`fullname`) in my mysql v. Ver 14.12 Distrib 5.1.7-beta
# POSTGRESQL: treat same as mysql unique
# just quote columns
$quoted_column = quote_and_lc($2);
s/\s+unique\s+key\s*(\w+)\s*\(([^(]+)\)/ unique ($quoted_column) /i;
$create_sql.=$_;
# the index corresponding to the 'key' is automatically created
next;
}
# keys
if ( /^\s+fulltext key\s+/i) { # example: FULLTEXT KEY `commenttext` (`commenttext`)
# that is key as a word in the first check for a match
# the tsvector datatype is made for these types of things
# example mysql file:
# what is tsvector datatype?
# http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html
warn "dba must do fulltext key transformation for $table\n";
next;
}
if ( /^(\s+)constraint (\S+) foreign key \((\S+)\) references (\S+) \((\S+)\)(.*)/i ) {
$quoted_column =quote_and_lc($3);
$col=quote_and_lc($5);
$post_create_sql .= "ALTER TABLE $table ADD FOREIGN KEY ($quoted_column) REFERENCES " . quote_and_lc($4) . " ($col);\n";
next;
}
if ( /^\s*primary key\s*\(([^)]+)\)([,\s]+)/i ) { # example PRIMARY KEY (`name`)
# MYSQL: primary key: allows null=NO , allows duplicates=NO
# POSTGRESQL: When an index is declared unique, multiple table rows with equal indexed values will not be
# allowed. Null values are not considered equal.
# POSTGRESQL quote's source: 8.1.3 docs section 11.5 "unique indexes"
# so, in postgres, we need to add a NOT NULL to the UNIQUE constraint
# and, primary key (mysql) == primary key (postgres) so that we *really* don't need to change anything
$quoted_column = quote_and_lc($1);
s/(\s*)primary key\s+\(([^)]+)\)([,\s]+)/$1 primary key ($quoted_column)$3/i;
# indexes are automatically created for unique columns
$create_sql.=$_;
next;
} elsif (m/^\s+key\s[-_\s\w]+\((.+)\)/i ) { # example: KEY `idx_mod_english_def_word` (`word`),
# regular key: allows null=YES, allows duplicates=YES
# MYSQL: KEY is normally a synonym for INDEX. http://dev.mysql.com/doc/refman/5.1/en/create-table.html
#
# * MySQL: ALTER TABLE {$table} ADD KEY $column ($column)
# * PostgreSQL: CREATE INDEX {$table}_$column_idx ON {$table}($column) // Please note the _idx "extension"
# PRIMARY KEY (`postid`),
# KEY `ownerid` (`ownerid`)
# create an index for everything which has a key listed for it.
my $col = $1;
# TODO we don't have a translation for the substring syntax in text columns in MySQL (e.g. "KEY my_idx (mytextcol(20))")
# for now just getting rid of the brackets and numbers (the substring specifier):
$col=~s/\(\d+\)//g;
$quoted_column = quote_and_lc($col);
if ($col =~ m/,/) {
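$col =~ s/,/_/g; # bind the substitution to $col (plain '=' would run it on $_ and overwrite $col with the match count)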
}
$index = get_identifier($table, $col, 'idx');
$post_create_sql.="CREATE INDEX $index ON $table USING btree ($quoted_column)\;";
# just create index do not add to create table statement
next;
}
# handle 'key' declared at end of column
if (/\w+.*primary key/i) { # mysql: key is normally just a synonym for index
# just leave as is ( postgres has primary key type)
} elsif (/(\w+\s+(?:$mysql_datatypesStr)\s+.*)key/i) { # mysql: key is normally just a synonym for index
# I can't find a reference for 'key' in a postgres command without using the word 'primary key'
s/$1key/$1/i ;
$index = get_identifier($table, $1, 'idx');
$quoted_column =quote_and_lc($1);
$post_create_sql.="CREATE INDEX $index ON $table USING btree ($quoted_column) \;";
$create_sql.=$_;
}
# do we really need this anymore?
# remap columns with names of existing system attributes
if (/"oid"/i) {
s/"oid"/"_oid"/g;
print STDERR "WARNING: table $table uses column \"oid\" which is renamed to \"_oid\"\nYou should fix application manually! Press return to continue.";
my $wait=<STDIN>;
}
s/oid/_oid/i if (/key/i && /oid/i); # fix oid in key
# FINAL QUOTING OF ALL COLUMNS
# quote column names which were not already quoted
# perhaps they were not quoted because they were not explicitly handled
if (!/^\s*"(\w+)"(\s+)/i) {
/^(\s*)(\w+)(\s+)(.*)$/i ;
$quoted_column= quote_and_lc($2);
s/^(\s*)(\w+)(\s+)(.*)$/$1 $quoted_column $3 $4 /;
}
$create_sql.=$_;
# END of if ($create_sql ne "") i.e. were inside create table statement so processed datatypes
}
# add "not in create table" comments or empty lines to pre_create_sql
elsif (/^#/ || /^$/ || /^\s*--/) {
s/^#/--/; # Two hyphens (--) is the SQL-92 standard indicator for comments
$pre_create_sql .= $_ ; # printed above create table statement
next;
}
elsif (/^\s*insert into/i) { # not inside create table and doing insert
# fix mysql's zero/null value for timestamps
s/'0000-00-00/'1970-01-01/gi;
# commented out to fix bug "Field contents interpreted as a timestamp", what was the point of this line anyway?
#s/([12]\d\d\d)([01]\d)([0-3]\d)([0-2]\d)([0-6]\d)([0-6]\d)/'$1-$2-$3 $4:$5:$6'/;
#---- fix data in inserted data: (from MS world)
s!\x96!-!g; # --
s!\x93!"!g; # ``
s!\x94!"!g; # ''
s!\x85!... !g; # \ldots
s!\x92!`!g;
print OUT $pre_create_sql; # print comments preceding the insert section
$pre_create_sql="";
$auto_increment_seq = "";
s/'((?:[^'\\]++|\\.)*+)'(?=[),])/E'$1'/g;
# for the E'' see http://www.postgresql.org/docs/8.2/interactive/release-8-1.html
s!\\\\!\\\\\\\\!g; # replace \\ with \\\\
# split 'extended' INSERT INTO statements to something PostgreSQL can understand
( $insert_table, $valueString) = $_ =~ m/^INSERT\s+INTO\s+['`"]*(.*?)['`"]*\s+VALUES\s*(.*)/i;
$insert_table = quote_and_lc($insert_table);
s/^INSERT INTO.*?\);//i; # hose the statement which is to be replaced whether a run-on or not
# guarantee table names are quoted
print OUT qq(INSERT INTO $insert_table VALUES $valueString \n);
} else {
print OUT $_ ; # example: /*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
}
# keep looping and get next line of IN file
} # END while(<IN>)
print_post_create_sql(); # in case there is extra from the last table
#################################################################
# 5. make_plpgsql function
# emulate the set datatype with the following plpgsql function
# looks ugly so putting at end of file
#################################################################
#
sub make_plpgsql {
my ($table,$column_name) = ($_[0],$_[1]);
$table=~s/\"//g; # make sure that $table doesn't have quotes so we don't end up with redundant quoting
my $constraint_table = get_identifier($table,$column_name ,"constraint_table");
return "
-- this function is called by the insert/update trigger
-- it checks if the INSERT/UPDATE for the 'set' column
-- contains members which comprise a valid mysql set
-- this TRIGGER function therefore acts like a constraint
-- and provides limited functionality for mysql's set datatype
-- just verifies and matches for string representations of the set at this point
-- though the set datatype uses bit comparisons, the only supported arguments to our
-- set datatype are VARCHAR arguments
-- to add a member to the set add it to the ".$table."_".$column_name." table
CREATE OR REPLACE FUNCTION check_".$table."_".$column_name."_set( ) RETURNS TRIGGER AS \$\$\n
DECLARE
----
arg_str VARCHAR ;
argx VARCHAR := '';
nobreak INT := 1;
rec_count INT := 0;
psn INT := 0;
str_in VARCHAR := NEW.$column_name;
----
BEGIN
----
IF str_in IS NULL THEN RETURN NEW ; END IF;
arg_str := REGEXP_REPLACE(str_in, '\\',\\'', ','); -- str_in is CONSTANT
arg_str := REGEXP_REPLACE(arg_str, '^\\'', '');
arg_str := REGEXP_REPLACE(arg_str, '\\'\$', '');
-- RAISE NOTICE 'arg_str %',arg_str;
psn := POSITION(',' in arg_str);
IF psn > 0 THEN
psn := psn - 1; -- minus-1 from comma position
-- RAISE NOTICE 'psn %',psn;
argx := SUBSTRING(arg_str FROM 1 FOR psn); -- get one set member
psn := psn + 2; -- go to first starting letter
arg_str := SUBSTRING(arg_str FROM psn); -- hack it off
ELSE
psn := 0; -- no comma found: the value is a single set member
argx := arg_str;
END IF;
-- RAISE NOTICE 'argx %',argx;
-- RAISE NOTICE 'new arg_str: %',arg_str;
WHILE nobreak LOOP
EXECUTE 'SELECT count(*) FROM $constraint_table WHERE set_values = ' || quote_literal(argx) INTO rec_count;
IF rec_count = 0 THEN RAISE EXCEPTION 'one of the set values was not found';
END IF;
IF psn > 0 THEN
psn := psn - 1; -- minus-1 from comma position
-- RAISE NOTICE 'psn %',psn;
argx := SUBSTRING(arg_str FROM 1 FOR psn); -- get one set member
psn := psn + 2; -- go to first starting letter
arg_str := SUBSTRING(arg_str FROM psn); -- hack it off
psn := POSITION(',' in arg_str);
ELSE nobreak = 0;
END IF;
-- RAISE NOTICE 'next argx % and next arg_str %', argx, arg_str;
END LOOP;
RETURN NEW;
----
END;
\$\$ LANGUAGE 'plpgsql' VOLATILE;
drop trigger set_test ON $table;
-- make a trigger for each set field
-- make trigger and hard-code in column names
-- see http://archives.postgresql.org/pgsql-interfaces/2005-02/msg00020.php
CREATE TRIGGER set_test
BEFORE INSERT OR UPDATE ON $table FOR EACH ROW
EXECUTE PROCEDURE check_".$table."_".$column_name."_set();\n";
} # end sub make_plpgsql();

View File

@@ -0,0 +1,199 @@
place_type,level
Q9842,4
Q9430,3
Q928830,4
Q9259,1
Q91028,5
Q8514,2
Q8502,2
Q83405,3
Q82794,2
Q820477,1
Q811979,1
Q8072,2
Q79007,2
Q786014,3
Q75848,2
Q75520,2
Q728937,4
Q7275,2
Q719456,3
Q7075,3
Q697295,4
Q6852233,2
Q682943,3
Q665487,5
Q655686,3
Q643589,5
Q641226,2
Q631305,2
Q6256,2
Q6023295,2
Q5773747,5
Q56061,1
Q55659167,4
Q55488,4
Q55465477,3
Q54050,2
Q532,3
Q53060,2
Q52177058,4
Q515716,5
Q5153984,4
Q515,3
Q5144960,5
Q5119,4
Q5119,4
Q5107,2
Q5084,4
Q5031071,4
Q5003624,2
Q4989906,1
Q4976993,3
Q486972,1
Q486972,2
Q483110,3
Q4830453,4
Q47521,3
Q473972,1
Q46831,2
Q46614560,5
Q44782,3
Q44613,4
Q44539,4
Q44494,2
Q44377,2
Q4421,2
Q43501,2
Q4286337,3
Q42523,3
Q41176,2
Q40357,3
Q4022,4
Q40080,2
Q39816,2
Q39715,3
Q39614,1
Q3957,3
Q3947,4
Q3914,3
Q38723,2
Q38720,3
Q3623867,5
Q35666,2
Q355304,3
Q35509,2
Q35112127,3
Q34985575,4
Q34876,5
Q34763,2
Q34627,4
Q3455524,3
Q34442,4
Q33837,2
Q33506,3
Q32815,4
Q3257686,2
Q3240715,2
Q3191695,5
Q3153117,2
Q30198,2
Q30139652,3
Q294422,3
Q2870166,3
Q27686,3
Q274153,3
Q271669,1
Q2659904,2
Q24529780,2
Q24354,3
Q2354973,4
Q23442,2
Q23413,3
Q23397,3
Q2327515,4
Q2311958,5
Q22927291,6
Q22698,1
Q2175765,4
Q205495,4
Q204832,3
Q2042028,2
Q202216,6
Q1970725,3
Q194203,5
Q194195,2
Q190429,2
Q185187,3
Q185113,2
Q183366,2
Q1799794,1
Q1788454,4
Q1785071,3
Q1777138,3
Q177634,2
Q177380,2
Q174814,4
Q174782,2
Q17350442,2
Q17343829,3
Q17334923,0
Q17018380,3
Q16970,4
Q16917,3
Q16831714,4
Q165,3
Q160742,4
Q159719,3
Q159334,4
Q15640612,5
Q15324,2
Q15284,5
Q15243209,6
Q152081,1
Q15195406,4
Q1500350,5
Q149621,5
Q14757767,4
Q14350,3
Q1410668,3
Q1394476,3
Q1377575,2
Q1353183,3
Q134447,4
Q133215,3
Q133056,2
Q13221722,3
Q13220204,2
Q1311958,4
Q1303167,3
Q130003,3
Q12518,2
Q12516,3
Q1248784,3
Q123705,3
Q12323,3
Q12284,4
Q12280,4
Q121359,2
Q1210950,2
Q11755880,3
Q11707,3
Q11315,3
Q11303,3
Q1115575,4
Q1107656,1
Q10864048,1
Q1076486,2
Q105731,3
Q105190,3
Q1048525,3
Q102496,5
Q28872924,1
Q15617994,1
Q159313,2
Q24398318,3
Q327333,2
Q43229,1
Q860861,1
Q4989906,1

View File

@@ -0,0 +1,195 @@
Q9842
Q9430
Q928830
Q9259
Q91028
Q8514
Q8502
Q83405
Q82794
Q820477
Q811979
Q8072
Q79007
Q786014
Q75848
Q75520
Q728937
Q7275
Q719456
Q7075
Q697295
Q6852233
Q682943
Q665487
Q655686
Q643589
Q641226
Q631305
Q6256
Q6023295
Q5773747
Q56061
Q55659167
Q55488
Q55465477
Q54050
Q532
Q53060
Q52177058
Q515716
Q5153984
Q515
Q5144960
Q5119
Q5107
Q5084
Q5031071
Q5003624
Q4989906
Q4976993
Q486972
Q483110
Q4830453
Q47521
Q473972
Q46831
Q46614560
Q44782
Q44613
Q44539
Q44494
Q44377
Q4421
Q43501
Q4286337
Q42523
Q41176
Q40357
Q4022
Q40080
Q39816
Q39715
Q39614
Q3957
Q3947
Q3914
Q38723
Q38720
Q3623867
Q35666
Q355304
Q35509
Q35112127
Q34985575
Q34876
Q34763
Q34627
Q3455524
Q34442
Q33837
Q33506
Q32815
Q3257686
Q3240715
Q3191695
Q3153117
Q30198
Q30139652
Q294422
Q2870166
Q27686
Q274153
Q271669
Q2659904
Q24529780
Q24354
Q2354973
Q23442
Q23413
Q23397
Q2327515
Q2311958
Q22927291
Q22698
Q2175765
Q205495
Q204832
Q2042028
Q202216
Q1970725
Q194203
Q194195
Q190429
Q185187
Q185113
Q183366
Q1799794
Q1788454
Q1785071
Q1777138
Q177634
Q177380
Q174814
Q174782
Q17350442
Q17343829
Q17334923
Q17018380
Q16970
Q16917
Q16831714
Q165
Q160742
Q159719
Q159334
Q15640612
Q15324
Q15284
Q15243209
Q152081
Q15195406
Q1500350
Q149621
Q14757767
Q14350
Q1410668
Q1394476
Q1377575
Q1353183
Q134447
Q133215
Q133056
Q13221722
Q13220204
Q1311958
Q1303167
Q130003
Q12518
Q12516
Q1248784
Q123705
Q12323
Q12284
Q12280
Q121359
Q1210950
Q11755880
Q11707
Q11315
Q11303
Q1115575
Q1107656
Q10864048
Q1076486
Q105731
Q105190
Q1048525
Q102496
Q28872924
Q15617994
Q159313
Q24398318
Q327333
Q43229
Q860861

View File

@@ -0,0 +1,200 @@
## Wikidata place types and related OSM Tags
Wikidata does not have an official ontology; however, the [DBpedia project](https://wiki.dbpedia.org/) has created an [ontology](https://wiki.dbpedia.org/services-resources/ontology) that covers [place types](http://mappings.dbpedia.org/server/ontology/classes/#Place). The table below uses the DBpedia place ontology as a starting point and is provided as a cross-reference to the relevant OSM tags.
The Wikidata place types listed in the table below can be used in conjunction with the [Wikidata Query Service](https://query.wikidata.org/) to retrieve instances of those place types from the Wikidata knowledge base. The following query, for example, retrieves items of type Q9430 (Ocean) together with their latitude and longitude:
```
SELECT ?item ?lat ?lon
WHERE {
  ?item wdt:P31*/wdt:P279* wd:Q9430; wdt:P625 ?pt.
  ?item p:P625 ?loc.
  ?loc psv:P625 ?cnode.
  ?cnode wikibase:geoLatitude ?lat.
  ?cnode wikibase:geoLongitude ?lon.
}
```
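To run the query above for another place type from the table, only the Q-identifier needs to change. The following Python sketch shows one way this could be done. The helper names `build_place_query` and `build_query_url` are hypothetical, and the endpoint is assumed to be the one used in the example link in this document.

```python
import re
from urllib.parse import urlencode

# Endpoint as it appears in this document's example link (assumed still valid).
ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

# The query from this README with the place type left as a placeholder.
QUERY_TEMPLATE = """SELECT ?item ?lat ?lon
WHERE {{
  ?item wdt:P31*/wdt:P279* wd:{qid}; wdt:P625 ?pt.
  ?item p:P625 ?loc.
  ?loc psv:P625 ?cnode.
  ?cnode wikibase:geoLatitude ?lat.
  ?cnode wikibase:geoLongitude ?lon.
}}"""


def build_place_query(qid):
    """Return the SPARQL query text for one place type, e.g. 'Q9430'."""
    # Accept only well-formed item ids so nothing else is interpolated.
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item id: {qid!r}")
    return QUERY_TEMPLATE.format(qid=qid)


def build_query_url(qid):
    """Return a GET URL that asks the query service for JSON results."""
    params = {"format": "json", "query": build_place_query(qid)}
    return ENDPOINT + "?" + urlencode(params)
```

Validating the identifier before formatting keeps arbitrary text out of the query, and `urlencode` takes care of escaping the spaces, braces and line breaks.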
An example JSON response for all instances of the Wikidata item "Q9430" (Ocean) can be seen at [json](https://query.wikidata.org/bigdata/namespace/wdq/sparql?format=json&query=SELECT?item?lat?lon%20WHERE{?item%20wdt:P31*/wdt:P279*wd:Q9430;wdt:P625?pt.?item%20p:P625?loc.?loc%20psv:P625?cnode.?cnode%20wikibase:geoLatitude?lat.?cnode%20wikibase:geoLongitude?lon.}).
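Such a JSON response follows the standard SPARQL JSON results layout, with one entry per row under `results.bindings`. As a rough sketch, the rows could be reduced to `(item, lat, lon)` tuples as shown below. The function name `parse_places` is hypothetical, and the sample document is invented for illustration rather than real query output.

```python
import json


def parse_places(response_text):
    """Turn a SPARQL JSON results document into (item_id, lat, lon) tuples."""
    data = json.loads(response_text)
    places = []
    for row in data["results"]["bindings"]:
        # Item values are entity URIs; keep only the trailing Q-id.
        item_id = row["item"]["value"].rsplit("/", 1)[-1]
        lat = float(row["lat"]["value"])
        lon = float(row["lon"]["value"])
        places.append((item_id, lat, lon))
    return places


# Invented sample in the SPARQL JSON results layout (not real data).
SAMPLE = json.dumps({
    "head": {"vars": ["item", "lat", "lon"]},
    "results": {"bindings": [{
        "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q12345"},
        "lat": {"type": "literal", "value": "10.5"},
        "lon": {"type": "literal", "value": "-20.25"},
    }]},
})

for place in parse_places(SAMPLE):
    print(place)
```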
**NOTE:** The OSM tags listed are those recorded in the Wikidata entries, not every possible matching tag within OSM.
title | concept | OSM Tag |
-----------|---------------------------------------|------------------|
[Q17334923](https://www.wikidata.org/entity/Q17334923) | Location | |
[Q811979](https://www.wikidata.org/entity/Q811979) | Architectural Structure | |
[Q194195](https://www.wikidata.org/entity/Q194195) | Amusement park | |
[Q204832](https://www.wikidata.org/entity/Q204832) | Roller coaster | [attraction=roller_coaster](https://wiki.openstreetmap.org/wiki/Tag:attraction=roller_coaster) |
[Q2870166](https://www.wikidata.org/entity/Q2870166) | Water ride | |
[Q641226](https://www.wikidata.org/entity/Q641226) | Arena | [amenity=events_centre](https://wiki.openstreetmap.org/wiki/Tag:amenity=events_centre) |
[Q41176](https://www.wikidata.org/entity/Q41176) | Building | [building=yes](https://wiki.openstreetmap.org/wiki/Key:building) |
[Q1303167](https://www.wikidata.org/entity/Q1303167) | Barn | [building=barn](https://wiki.openstreetmap.org/wiki/Tag:building=barn) |
[Q655686](https://www.wikidata.org/entity/Q655686) | Commercial building | [building=commercial](https://wiki.openstreetmap.org/wiki/Tag:building=commercial) |
[Q4830453](https://www.wikidata.org/entity/Q4830453) | Business | |
[Q7075](https://www.wikidata.org/entity/Q7075) | Library | [amenity=library](https://wiki.openstreetmap.org/wiki/Tag:amenity=library) |
[Q133215](https://www.wikidata.org/entity/Q133215) | Casino | [amenity=casino](https://wiki.openstreetmap.org/wiki/Tag:amenity=casino) |
[Q23413](https://www.wikidata.org/entity/Q23413) | Castle | [historic=castle](https://wiki.openstreetmap.org/wiki/Tag:historic=castle) |
[Q83405](https://www.wikidata.org/entity/Q83405) | Factory | |
[Q53060](https://www.wikidata.org/entity/Q53060) | Gate | [barrier=gate](https://wiki.openstreetmap.org/wiki/Tag:barrier=gate) |
[Q11755880](https://www.wikidata.org/entity/Q11755880) | Residential Building | [building=residential](https://wiki.openstreetmap.org/wiki/Tag:building=residential) |
[Q3947](https://www.wikidata.org/entity/Q3947) | House | [building=house](https://wiki.openstreetmap.org/wiki/Tag:building=house) |
[Q35112127](https://www.wikidata.org/entity/Q35112127) | Historic Building | |
[Q5773747](https://www.wikidata.org/entity/Q5773747) | Historic house | |
[Q38723](https://www.wikidata.org/entity/Q38723) | Higher Education Institution | |
[Q3914](https://www.wikidata.org/entity/Q3914) | School | [amenity=school](https://wiki.openstreetmap.org/wiki/Tag:amenity=school) |
[Q9842](https://www.wikidata.org/entity/Q9842) | Primary school | |
[Q159334](https://www.wikidata.org/entity/Q159334) | Secondary school | |
[Q16917](https://www.wikidata.org/entity/Q16917) | Hospital | [amenity=hospital](https://wiki.openstreetmap.org/wiki/Tag:amenity=hospital), [healthcare=hospital](https://wiki.openstreetmap.org/wiki/Tag:healthcare=hospital), [building=hospital](https://wiki.openstreetmap.org/wiki/Tag:building=hospital) |
[Q27686](https://www.wikidata.org/entity/Q27686) | Hotel | [tourism=hotel](https://wiki.openstreetmap.org/wiki/Tag:tourism=hotel), [building=hotel](https://wiki.openstreetmap.org/wiki/Tag:building=hotel) |
[Q33506](https://www.wikidata.org/entity/Q33506) | Museum | [tourism=museum](https://wiki.openstreetmap.org/wiki/Tag:tourism=museum) |
[Q40357](https://www.wikidata.org/entity/Q40357) | Prison | [amenity=prison](https://wiki.openstreetmap.org/wiki/Tag:amenity=prison) |
[Q24398318](https://www.wikidata.org/entity/Q24398318) | Religious Building | |
[Q160742](https://www.wikidata.org/entity/Q160742) | Abbey | |
[Q16970](https://www.wikidata.org/entity/Q16970) | Church (building) | [building=church](https://wiki.openstreetmap.org/wiki/Tag:building=church) |
[Q44613](https://www.wikidata.org/entity/Q44613) | Monastery | [amenity=monastery](https://wiki.openstreetmap.org/wiki/Tag:amenity=monastery) |
[Q32815](https://www.wikidata.org/entity/Q32815) | Mosque | [building=mosque](https://wiki.openstreetmap.org/wiki/Tag:building=mosque) |
[Q697295](https://www.wikidata.org/entity/Q697295) | Shrine | [building=shrine](https://wiki.openstreetmap.org/wiki/Tag:building=shrine) |
[Q34627](https://www.wikidata.org/entity/Q34627) | Synagogue | [building=synagogue](https://wiki.openstreetmap.org/wiki/Tag:building=synagogue) |
[Q44539](https://www.wikidata.org/entity/Q44539) | Temple | [building=temple](https://wiki.openstreetmap.org/wiki/Tag:building=temple) |
[Q11707](https://www.wikidata.org/entity/Q11707) | Restaurant | [amenity=restaurant](https://wiki.openstreetmap.org/wiki/Tag:amenity=restaurant) |
[Q11315](https://www.wikidata.org/entity/Q11315) | Shopping mall | [shop=mall](https://wiki.openstreetmap.org/wiki/Tag:shop=mall), [shop=shopping_centre](https://wiki.openstreetmap.org/wiki/Tag:shop=shopping_centre) |
[Q11303](https://www.wikidata.org/entity/Q11303) | Skyscraper | |
[Q17350442](https://www.wikidata.org/entity/Q17350442) | Venue | |
[Q41253](https://www.wikidata.org/entity/Q41253) | Movie Theater | [amenity=cinema](https://wiki.openstreetmap.org/wiki/Tag:amenity=cinema) |
[Q483110](https://www.wikidata.org/entity/Q483110) | Stadium | [leisure=stadium](https://wiki.openstreetmap.org/wiki/Tag:leisure=stadium), [building=stadium](https://wiki.openstreetmap.org/wiki/Tag:building=stadium) |
[Q24354](https://www.wikidata.org/entity/Q24354) | Theater (structure) | [amenity=theatre](https://wiki.openstreetmap.org/wiki/Tag:amenity=theatre) |
[Q121359](https://www.wikidata.org/entity/Q121359) | Infrastructure | |
[Q1248784](https://www.wikidata.org/entity/Q1248784) | Airport | |
[Q12323](https://www.wikidata.org/entity/Q12323) | Dam | [waterway=dam](https://wiki.openstreetmap.org/wiki/Tag:waterway=dam) |
[Q1353183](https://www.wikidata.org/entity/Q1353183) | Launch pad | |
[Q105190](https://www.wikidata.org/entity/Q105190) | Levee | [man_made=dyke](https://wiki.openstreetmap.org/wiki/Tag:man_made=dyke) |
[Q105731](https://www.wikidata.org/entity/Q105731) | Lock (water navigation) | [lock=yes](https://wiki.openstreetmap.org/wiki/Key:lock) |
[Q44782](https://www.wikidata.org/entity/Q44782) | Port | |
[Q159719](https://www.wikidata.org/entity/Q159719) | Power station | [power=plant](https://wiki.openstreetmap.org/wiki/Tag:power=plant) |
[Q174814](https://www.wikidata.org/entity/Q174814) | Electrical substation | |
[Q134447](https://www.wikidata.org/entity/Q134447) | Nuclear power plant | [plant:source=nuclear](https://wiki.openstreetmap.org/wiki/Tag:plant:source=nuclear) |
[Q786014](https://www.wikidata.org/entity/Q786014) | Rest area | [highway=rest_area](https://wiki.openstreetmap.org/wiki/Tag:highway=rest_area), [highway=services](https://wiki.openstreetmap.org/wiki/Tag:highway=services) |
[Q12280](https://www.wikidata.org/entity/Q12280) | Bridge | [bridge=* ](https://wiki.openstreetmap.org/wiki/Key:bridge), [man_made=bridge](https://wiki.openstreetmap.org/wiki/Tag:man_made=bridge) |
[Q728937](https://www.wikidata.org/entity/Q728937) | Railroad Line | [railway=rail](https://wiki.openstreetmap.org/wiki/Tag:railway=rail) |
[Q1311958](https://www.wikidata.org/entity/Q1311958) | Railway Tunnel | |
[Q34442](https://www.wikidata.org/entity/Q34442) | Road | [highway=* ](https://wiki.openstreetmap.org/wiki/Key:highway), [route=road](https://wiki.openstreetmap.org/wiki/Tag:route=road) |
[Q1788454](https://www.wikidata.org/entity/Q1788454) | Road junction | |
[Q44377](https://www.wikidata.org/entity/Q44377) | Tunnel | [tunnel=* ](https://wiki.openstreetmap.org/wiki/Key:tunnel) |
[Q5031071](https://www.wikidata.org/entity/Q5031071) | Canal tunnel | |
[Q719456](https://www.wikidata.org/entity/Q719456) | Station | [public_transport=station](https://wiki.openstreetmap.org/wiki/Tag:public_transport=station) |
[Q205495](https://www.wikidata.org/entity/Q205495) | Filling station | [amenity=fuel](https://wiki.openstreetmap.org/wiki/Tag:amenity=fuel) |
[Q928830](https://www.wikidata.org/entity/Q928830) | Metro station | [station=subway](https://wiki.openstreetmap.org/wiki/Tag:station=subway) |
[Q55488](https://www.wikidata.org/entity/Q55488) | Train station | [railway=station](https://wiki.openstreetmap.org/wiki/Tag:railway=station) |
[Q2175765](https://www.wikidata.org/entity/Q2175765) | Tram stop | [railway=tram_stop](https://wiki.openstreetmap.org/wiki/Tag:railway=tram_stop), [public_transport=stop_position](https://wiki.openstreetmap.org/wiki/Tag:public_transport=stop_position) |
[Q6852233](https://www.wikidata.org/entity/Q6852233) | Military building | |
[Q44494](https://www.wikidata.org/entity/Q44494) | Mill (grinding) | |
[Q185187](https://www.wikidata.org/entity/Q185187) | Watermill | [man_made=watermill](https://wiki.openstreetmap.org/wiki/Tag:man_made=watermill) |
[Q38720](https://www.wikidata.org/entity/Q38720) | Windmill | [man_made=windmill](https://wiki.openstreetmap.org/wiki/Tag:man_made=windmill) |
[Q4989906](https://www.wikidata.org/entity/Q4989906) | Monument | [historic=monument](https://wiki.openstreetmap.org/wiki/Tag:historic=monument) |
[Q5003624](https://www.wikidata.org/entity/Q5003624) | Memorial | [historic=memorial](https://wiki.openstreetmap.org/wiki/Tag:historic=memorial) |
[Q271669](https://www.wikidata.org/entity/Q271669) | Landform | |
[Q190429](https://www.wikidata.org/entity/Q190429) | Depression (geology) | |
[Q17018380](https://www.wikidata.org/entity/Q17018380) | Bight (geography) | |
[Q54050](https://www.wikidata.org/entity/Q54050) | Hill | |
[Q1210950](https://www.wikidata.org/entity/Q1210950) | Channel (geography) | |
[Q23442](https://www.wikidata.org/entity/Q23442) | Island | [place=island](https://wiki.openstreetmap.org/wiki/Tag:place=island) |
[Q42523](https://www.wikidata.org/entity/Q42523) | Atoll | |
[Q34763](https://www.wikidata.org/entity/Q34763) | Peninsula | |
[Q355304](https://www.wikidata.org/entity/Q355304) | Watercourse | |
[Q30198](https://www.wikidata.org/entity/Q30198) | Marsh | [wetland=marsh](https://wiki.openstreetmap.org/wiki/Tag:wetland=marsh) |
[Q75520](https://www.wikidata.org/entity/Q75520) | Plateau | |
[Q2042028](https://www.wikidata.org/entity/Q2042028) | Ravine | |
[Q631305](https://www.wikidata.org/entity/Q631305) | Rock formation | |
[Q12516](https://www.wikidata.org/entity/Q12516) | Pyramid | |
[Q1076486](https://www.wikidata.org/entity/Q1076486) | Sports venue | |
[Q682943](https://www.wikidata.org/entity/Q682943) | Cricket field | [sport=cricket](https://wiki.openstreetmap.org/wiki/Tag:sport=cricket) |
[Q1048525](https://www.wikidata.org/entity/Q1048525) | Golf course | [leisure=golf_course](https://wiki.openstreetmap.org/wiki/Tag:leisure=golf_course) |
[Q1777138](https://www.wikidata.org/entity/Q1777138) | Race track | [highway=raceway](https://wiki.openstreetmap.org/wiki/Tag:highway=raceway) |
[Q130003](https://www.wikidata.org/entity/Q130003) | Ski resort | |
[Q174782](https://www.wikidata.org/entity/Q174782) | Town square | [place=square](https://wiki.openstreetmap.org/wiki/Tag:place=square) |
[Q12518](https://www.wikidata.org/entity/Q12518) | Tower | [building=tower](https://wiki.openstreetmap.org/wiki/Tag:building=tower), [man_made=tower](https://wiki.openstreetmap.org/wiki/Tag:man_made=tower) |
[Q39715](https://www.wikidata.org/entity/Q39715) | Lighthouse | [man_made=lighthouse](https://wiki.openstreetmap.org/wiki/Tag:man_made=lighthouse) |
[Q274153](https://www.wikidata.org/entity/Q274153) | Water tower | [building=water_tower](https://wiki.openstreetmap.org/wiki/Tag:building=water_tower), [man_made=water_tower](https://wiki.openstreetmap.org/wiki/Tag:man_made=water_tower) |
[Q43501](https://www.wikidata.org/entity/Q43501) | Zoo | [tourism=zoo](https://wiki.openstreetmap.org/wiki/Tag:tourism=zoo) |
[Q39614](https://www.wikidata.org/entity/Q39614) | Cemetery | [amenity=grave_yard](https://wiki.openstreetmap.org/wiki/Tag:amenity=grave_yard), [landuse=cemetery](https://wiki.openstreetmap.org/wiki/Tag:landuse=cemetery) |
[Q152081](https://www.wikidata.org/entity/Q152081) | Concentration camp | |
[Q1107656](https://www.wikidata.org/entity/Q1107656) | Garden | [leisure=garden](https://wiki.openstreetmap.org/wiki/Tag:leisure=garden) |
[Q820477](https://www.wikidata.org/entity/Q820477) | Mine | |
[Q33837](https://www.wikidata.org/entity/Q33837) | Archipelago | [place=archipelago](https://wiki.openstreetmap.org/wiki/Tag:place=archipelago) |
[Q40080](https://www.wikidata.org/entity/Q40080) | Beach | [natural=beach](https://wiki.openstreetmap.org/wiki/Tag:natural=beach) |
[Q15324](https://www.wikidata.org/entity/Q15324) | Body of water | [natural=water](https://wiki.openstreetmap.org/wiki/Tag:natural=water) |
[Q23397](https://www.wikidata.org/entity/Q23397) | Lake | [water=lake](https://wiki.openstreetmap.org/wiki/Tag:water=lake) |
[Q9430](https://www.wikidata.org/entity/Q9430) | Ocean | |
[Q165](https://www.wikidata.org/entity/Q165) | Sea | |
[Q47521](https://www.wikidata.org/entity/Q47521) | Stream | |
[Q12284](https://www.wikidata.org/entity/Q12284) | Canal | [waterway=canal](https://wiki.openstreetmap.org/wiki/Tag:waterway=canal) |
[Q4022](https://www.wikidata.org/entity/Q4022) | River | [waterway=river](https://wiki.openstreetmap.org/wiki/Tag:waterway=river), [type=waterway](https://wiki.openstreetmap.org/wiki/Relation:waterway) |
[Q185113](https://www.wikidata.org/entity/Q185113) | Cape | [natural=cape](https://wiki.openstreetmap.org/wiki/Tag:natural=cape) |
[Q35509](https://www.wikidata.org/entity/Q35509) | Cave | [natural=cave_entrance](https://wiki.openstreetmap.org/wiki/Tag:natural=cave_entrance) |
[Q8514](https://www.wikidata.org/entity/Q8514) | Desert | |
[Q4421](https://www.wikidata.org/entity/Q4421) | Forest | [natural=wood](https://wiki.openstreetmap.org/wiki/Tag:natural=wood) |
[Q35666](https://www.wikidata.org/entity/Q35666) | Glacier | [natural=glacier](https://wiki.openstreetmap.org/wiki/Tag:natural=glacier) |
[Q177380](https://www.wikidata.org/entity/Q177380) | Hot spring | |
[Q8502](https://www.wikidata.org/entity/Q8502) | Mountain | [natural=peak](https://wiki.openstreetmap.org/wiki/Tag:natural=peak) |
[Q133056](https://www.wikidata.org/entity/Q133056) | Mountain pass | |
[Q46831](https://www.wikidata.org/entity/Q46831) | Mountain range | |
[Q39816](https://www.wikidata.org/entity/Q39816) | Valley | [natural=valley](https://wiki.openstreetmap.org/wiki/Tag:natural=valley) |
[Q8072](https://www.wikidata.org/entity/Q8072) | Volcano | [natural=volcano](https://wiki.openstreetmap.org/wiki/Tag:natural=volcano) |
[Q43229](https://www.wikidata.org/entity/Q43229) | Organization | |
[Q327333](https://www.wikidata.org/entity/Q327333) | Government agency | [office=government](https://wiki.openstreetmap.org/wiki/Tag:office=government)|
[Q22698](https://www.wikidata.org/entity/Q22698) | Park | [leisure=park](https://wiki.openstreetmap.org/wiki/Tag:leisure=park) |
[Q159313](https://www.wikidata.org/entity/Q159313) | Urban agglomeration | |
[Q177634](https://www.wikidata.org/entity/Q177634) | Community | |
[Q5107](https://www.wikidata.org/entity/Q5107) | Continent | [place=continent](https://wiki.openstreetmap.org/wiki/Tag:place=continent) |
[Q6256](https://www.wikidata.org/entity/Q6256) | Country | [place=country](https://wiki.openstreetmap.org/wiki/Tag:place=country) |
[Q75848](https://www.wikidata.org/entity/Q75848) | Gated community | |
[Q3153117](https://www.wikidata.org/entity/Q3153117) | Intercommunality | |
[Q82794](https://www.wikidata.org/entity/Q82794) | Region | |
[Q56061](https://www.wikidata.org/entity/Q56061) | Administrative division | [boundary=administrative](https://wiki.openstreetmap.org/wiki/Tag:boundary=administrative) |
[Q665487](https://www.wikidata.org/entity/Q665487) | Diocese | |
[Q4976993](https://www.wikidata.org/entity/Q4976993) | Parish | [boundary=civil_parish](https://wiki.openstreetmap.org/wiki/Tag:boundary=civil_parish) |
[Q194203](https://www.wikidata.org/entity/Q194203) | Arrondissements of France | |
[Q91028](https://www.wikidata.org/entity/Q91028) | Arrondissements of Belgium | |
[Q3623867](https://www.wikidata.org/entity/Q3623867) | Arrondissements of Benin | |
[Q2311958](https://www.wikidata.org/entity/Q2311958) | Canton (country subdivision) | [political_division=canton](https://wiki.openstreetmap.org/wiki/FR:Cantons_in_France) |
[Q643589](https://www.wikidata.org/entity/Q643589) | Department | |
[Q202216](https://www.wikidata.org/entity/Q202216) | Overseas department and region | |
[Q149621](https://www.wikidata.org/entity/Q149621) | District | [place=district](https://wiki.openstreetmap.org/wiki/Tag:place=district) |
[Q15243209](https://www.wikidata.org/wiki/Q15243209) | Historic district | |
[Q5144960](https://www.wikidata.org/entity/Q5144960) | Microregion | |
[Q15284](https://www.wikidata.org/entity/Q15284) | Municipality | |
[Q515716](https://www.wikidata.org/entity/Q515716) | Prefecture | |
[Q34876](https://www.wikidata.org/entity/Q34876) | Province | |
[Q3191695](https://www.wikidata.org/entity/Q3191695) | Regency (Indonesia) | |
[Q1970725](https://www.wikidata.org/entity/Q1970725) | Natural region | |
[Q486972](https://www.wikidata.org/entity/Q486972) | Human settlement | |
[Q515](https://www.wikidata.org/entity/Q515) | City | [place=city](https://wiki.openstreetmap.org/wiki/Tag:place=city) |
[Q5119](https://www.wikidata.org/entity/Q5119) | Capital city | [capital=yes](https://wiki.openstreetmap.org/wiki/Key:capital) |
[Q4286337](https://www.wikidata.org/entity/Q4286337) | City district | |
[Q1394476](https://www.wikidata.org/entity/Q1394476) | Civil township | |
[Q1115575](https://www.wikidata.org/entity/Q1115575) | Civil parish | [designation=civil_parish](https://wiki.openstreetmap.org/wiki/Tag:designation=civil_parish) |
[Q5153984](https://www.wikidata.org/entity/Q5153984) | Commune-level subdivisions | |
[Q123705](https://www.wikidata.org/entity/Q123705) | Neighbourhood | [place=neighbourhood](https://wiki.openstreetmap.org/wiki/Tag:place=neighbourhood) |
[Q1500350](https://www.wikidata.org/entity/Q1500350) | Townships of China | |
[Q17343829](https://www.wikidata.org/entity/Q17343829) | Unincorporated Community | |
[Q3957](https://www.wikidata.org/entity/Q3957) | Town | [place=town](https://wiki.openstreetmap.org/wiki/Tag:place=town) |
[Q532](https://www.wikidata.org/entity/Q532) | Village | [place=village](https://wiki.openstreetmap.org/wiki/Tag:place=village) |
[Q5084](https://www.wikidata.org/entity/Q5084) | Hamlet | [place=hamlet](https://wiki.openstreetmap.org/wiki/Tag:place=hamlet) |
[Q7275](https://www.wikidata.org/entity/Q7275) | State | |
[Q79007](https://www.wikidata.org/entity/Q79007) | Street | |
[Q473972](https://www.wikidata.org/entity/Q473972) | Protected area | [boundary=protected_area](https://wiki.openstreetmap.org/wiki/Tag:boundary=protected_area) |
[Q1377575](https://www.wikidata.org/entity/Q1377575) | Wildlife refuge | |
[Q1410668](https://www.wikidata.org/entity/Q1410668) | National Wildlife Refuge | [protection_title=National Wildlife Refuge](ownership=national), [ownership=national](https://wiki.openstreetmap.org/wiki/Tag:ownership=national)|
[Q9259](https://www.wikidata.org/entity/Q9259) | World Heritage Site | |
---
### Future Work
The Wikidata improvements to Nominatim can be further enhanced by:
- continuing to add new Wikidata links to OSM objects
- increasing the number of place types accounted for in the wikipedia_articles table
- working to use place types in the wikipedia_article matching process

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,26 @@
-- This data contains Ordnance Survey data © Crown copyright and database right 2010.
-- Code-Point Open contains Royal Mail data © Royal Mail copyright and database right 2010.
-- OS data may be used under the terms of the OS OpenData licence:
-- http://www.ordnancesurvey.co.uk/oswebsite/opendata/licence/docs/licence.pdf
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
CREATE TABLE gb_postcode (
id integer,
postcode character varying(9),
geometry geometry,
CONSTRAINT enforce_dims_geometry CHECK ((st_ndims(geometry) = 2)),
CONSTRAINT enforce_srid_geometry CHECK ((st_srid(geometry) = 4326))
);

View File

@@ -0,0 +1,16 @@
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET check_function_bodies = false;
SET client_min_messages = warning;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
CREATE TABLE us_postcode (
postcode text,
x double precision,
y double precision
);

View File

@@ -29787,7 +29787,7 @@ st 5557484
-- prefill word table
select count(precompute_words(v)) from (select distinct svals(name) as v from place) as w where v is not null;
select count(make_keywords(v)) from (select distinct svals(name) as v from place) as w where v is not null;
select count(getorcreate_housenumber_id(make_standard_name(v))) from (select distinct address->'housenumber' as v from place where address ? 'housenumber') as w;
-- copy the word frequencies

View File

@@ -5,27 +5,23 @@
configure_file(mkdocs.yml ../mkdocs.yml)
file(MAKE_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/appendix)
set (DOC_SOURCES
admin
develop
api
index.md
extra.css
styles.css
)
foreach (src ${DOC_SOURCES})
execute_process(
COMMAND ${CMAKE_COMMAND} -E create_symlink ${CMAKE_CURRENT_SOURCE_DIR}/${src} ${CMAKE_CURRENT_BINARY_DIR}/${src}
)
endforeach()
file(MAKE_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/data-sources)
ADD_CUSTOM_TARGET(doc
COMMAND ${CMAKE_COMMAND} -E create_symlink ${CMAKE_CURRENT_SOURCE_DIR}/admin ${CMAKE_CURRENT_BINARY_DIR}/admin
COMMAND ${CMAKE_COMMAND} -E create_symlink ${CMAKE_CURRENT_SOURCE_DIR}/develop ${CMAKE_CURRENT_BINARY_DIR}/develop
COMMAND ${CMAKE_COMMAND} -E create_symlink ${CMAKE_CURRENT_SOURCE_DIR}/api ${CMAKE_CURRENT_BINARY_DIR}/api
COMMAND ${CMAKE_COMMAND} -E create_symlink ${CMAKE_CURRENT_SOURCE_DIR}/index.md ${CMAKE_CURRENT_BINARY_DIR}/index.md
COMMAND ${CMAKE_COMMAND} -E create_symlink ${CMAKE_CURRENT_SOURCE_DIR}/extra.css ${CMAKE_CURRENT_BINARY_DIR}/extra.css
COMMAND ${CMAKE_COMMAND} -E create_symlink ${CMAKE_CURRENT_SOURCE_DIR}/data-sources/overview.md ${CMAKE_CURRENT_BINARY_DIR}/data-sources/overview.md
COMMAND ${CMAKE_COMMAND} -E create_symlink ${PROJECT_SOURCE_DIR}/data-sources/us-tiger/README.md ${CMAKE_CURRENT_BINARY_DIR}/data-sources/US-Tiger.md
COMMAND ${CMAKE_COMMAND} -E create_symlink ${PROJECT_SOURCE_DIR}/data-sources/gb-postcodes/README.md ${CMAKE_CURRENT_BINARY_DIR}/data-sources/GB-Postcodes.md
COMMAND ${CMAKE_COMMAND} -E create_symlink ${PROJECT_SOURCE_DIR}/data-sources/country-grid/README.md ${CMAKE_CURRENT_BINARY_DIR}/data-sources/Country-Grid.md
COMMAND ${CMAKE_COMMAND} -E create_symlink ${PROJECT_SOURCE_DIR}/data-sources/country-grid/mexico.quad.png ${CMAKE_CURRENT_BINARY_DIR}/data-sources/mexico.quad.png
COMMAND ${CMAKE_COMMAND} -E create_symlink ${PROJECT_SOURCE_DIR}/data-sources/wikipedia-wikidata/README.md ${CMAKE_CURRENT_BINARY_DIR}/data-sources/Wikipedia-Wikidata.md
COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/bash2md.sh ${PROJECT_SOURCE_DIR}/vagrant/Install-on-Centos-7.sh ${CMAKE_CURRENT_BINARY_DIR}/appendix/Install-on-Centos-7.md
COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/bash2md.sh ${PROJECT_SOURCE_DIR}/vagrant/Install-on-Centos-8.sh ${CMAKE_CURRENT_BINARY_DIR}/appendix/Install-on-Centos-8.md
COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/bash2md.sh ${PROJECT_SOURCE_DIR}/vagrant/Install-on-Ubuntu-16.sh ${CMAKE_CURRENT_BINARY_DIR}/appendix/Install-on-Ubuntu-16.md
COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/bash2md.sh ${PROJECT_SOURCE_DIR}/vagrant/Install-on-Ubuntu-18.sh ${CMAKE_CURRENT_BINARY_DIR}/appendix/Install-on-Ubuntu-18.md
COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/bash2md.sh ${PROJECT_SOURCE_DIR}/vagrant/Install-on-Ubuntu-20.sh ${CMAKE_CURRENT_BINARY_DIR}/appendix/Install-on-Ubuntu-20.md
COMMAND mkdocs build -d ${CMAKE_CURRENT_BINARY_DIR}/../site-html -f ${CMAKE_CURRENT_BINARY_DIR}/../mkdocs.yml
)

View File

@@ -1,173 +0,0 @@
# Advanced installations
This page contains instructions for setting up multiple countries in
your Nominatim database. It is assumed that you have already successfully
installed the Nominatim software itself. If not, return to the
[installation page](Installation.md).
## Importing multiple regions
To import multiple regions into your database, you need to configure and run the `utils/import_multiple_regions.sh` script. This script sets up the update directory, which has the following structure:
```bash
update
   ├── europe
   │   ├── andorra
   │   │   └── sequence.state
   │   └── monaco
   │   └── sequence.state
   └── tmp
├── combined.osm.pbf
└── europe
├── andorra-latest.osm.pbf
└── monaco-latest.osm.pbf
```
The `sequence.state` files contain the sequence ID, which pyosmium uses to fetch updates. The `tmp` folder holds the downloaded extracts and the combined file used for the import.
### Configuring multiple regions
The file `import_multiple_regions.sh` needs to be edited as per your requirement:
1. List of countries. eg:
COUNTRIES="europe/monaco europe/andorra"
2. Path to Build directory. eg:
NOMINATIMBUILD="/srv/nominatim/build"
3. Path to Update directory. eg:
UPDATEDIR="/srv/nominatim/update"
4. Replication URL. eg:
BASEURL="https://download.geofabrik.de"
DOWNCOUNTRYPOSTFIX="-latest.osm.pbf"
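As a rough illustration of how the four settings fit together, the sketch below builds one download URL per region. It assumes the script simply concatenates `BASEURL`, the region path and `DOWNCOUNTRYPOSTFIX`; the helper name `extract_urls` is hypothetical and is not part of Nominatim.

```python
from typing import List

# Values copied from the configuration examples above.
BASEURL = "https://download.geofabrik.de"
DOWNCOUNTRYPOSTFIX = "-latest.osm.pbf"
COUNTRIES = "europe/monaco europe/andorra"


def extract_urls(countries: str, baseurl: str, postfix: str) -> List[str]:
    """Return one download URL per space-separated region path.

    Hypothetical helper: assumes URL = baseurl + "/" + region + postfix.
    """
    base = baseurl.rstrip("/")
    return [f"{base}/{region}{postfix}" for region in countries.split()]


if __name__ == "__main__":
    for url in extract_urls(COUNTRIES, BASEURL, DOWNCOUNTRYPOSTFIX):
        print(url)
```

Printing the URLs before running the import is a cheap way to catch a mistyped region path.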
### Setting up multiple regions
!!! tip
If your database already exists and you want to add more countries,
replace the setup command
`${SETUPFILE} --osm-file ${UPDATEDIR}/tmp/combined.osm.pbf --all 2>&1`
with `${UPDATEFILE} --import-file ${UPDATEDIR}/tmp/combined.osm.pbf --index --index-instances N 2>&1`
where N is the number of CPUs in your system.
Run the following command from your Nominatim directory after configuring the file.
bash ./utils/import_multiple_regions.sh
!!! danger "Important"
This file uses osmium-tool. It must be installed before executing the import script.
Installation instructions can be found [here](https://osmcode.org/osmium-tool/manual.html#installation).
### Updating multiple regions
To update multiple regions in your database, you need to configure and run `utils/update_database.sh`.
This uses the update directory that was created when the database was set up.
### Configuring multiple regions
The file `update_database.sh` needs to be edited as per your requirement:
1. List of countries. eg:
COUNTRIES="europe/monaco europe/andorra"
2. Path to Build directory. eg:
NOMINATIMBUILD="/srv/nominatim/build"
3. Path to Update directory. eg:
UPDATEDIR="/srv/nominatim/update"
4. Replication URL. eg:
BASEURL="https://download.geofabrik.de"
DOWNCOUNTRYPOSTFIX="-updates"
5. Followup can be set according to your installation. eg: For Photon,
FOLLOWUP="curl http://localhost:2322/nominatim-update"
will handle the indexing.
### Updating the database
Run the following command from your Nominatim directory after configuring the file.
bash ./utils/update_database.sh
This fetches diffs from the replication server, imports them and indexes the database. The default replication server in the script ([Geofabrik](https://download.geofabrik.de)) provides daily updates.
## Importing Nominatim to an external PostgreSQL database
You can install Nominatim using a database that runs on a different server when
you have physical access to the file system on the other server. Nominatim
uses a custom normalization library that needs to be made accessible to the
PostgreSQL server. This section explains how to set up the normalization
library.
### Option 1: Compiling the library on the database server
The most reliable way to get a working library is to compile it on the database
server. From the prerequisites you need at least cmake, gcc and the
PostgreSQL server package.
Clone or unpack the Nominatim source code, enter the source directory and
create and enter a build directory.
```sh
cd Nominatim
mkdir build
cd build
```
Now configure cmake to only build the PostgreSQL module and build it:
```
cmake -DBUILD_IMPORTER=off -DBUILD_API=off -DBUILD_TESTS=off -DBUILD_DOCS=off -DBUILD_OSM2PGSQL=off ..
make
```
When done, you will find the normalization library in `build/module/nominatim.so`.
Copy it to a place where it is readable and executable by the PostgreSQL server
process.
### Option 2: Compiling the library on the import machine
You can also compile the normalization library on the machine from where you
run the import.
!!! important
You can only do this when the database server and the import machine have
the same architecture and run the same version of Linux. Otherwise there is
no guarantee that the compiled library is compatible with the PostgreSQL
server running on the database server.
Make sure that the PostgreSQL server package is installed on the machine
**with the same version as on the database server**. You do not need to install
the PostgreSQL server itself.
Download and compile Nominatim as per standard instructions. Once done, you find
the normalization library in `build/module/nominatim.so`. Copy the file to
the database server at a location where it is readable and executable by the
PostgreSQL server process.
### Running the import
On the client side you now need to configure the import to point to the
correct location of the library **on the database server**. Add the following
line to your `.env` file:
```php
NOMINATIM_DATABASE_MODULE_PATH="<directory on the database server where nominatim.so resides>"
```
Now change the `NOMINATIM_DATABASE_DSN` to point to your remote server and continue
to follow the [standard instructions for importing](/admin/Import).
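The sketch below shows one way to assemble a value for `NOMINATIM_DATABASE_DSN` that points at a remote server. It assumes the DSN follows the PDO-style form `pgsql:[host=...;][port=...;][user=...;][password=...;]dbname=...`; the function `build_dsn` and the host name `db.example.org` are hypothetical and only serve as illustration.

```python
def build_dsn(dbname, host=None, port=None, user=None, password=None):
    """Assemble a PDO-style PostgreSQL DSN string.

    Hypothetical helper: optional parts are emitted only when given,
    and dbname always comes last.
    """
    parts = []
    for key, value in (("host", host), ("port", port),
                       ("user", user), ("password", password)):
        if value is not None:
            parts.append(f"{key}={value}")
    parts.append(f"dbname={dbname}")
    return "pgsql:" + ";".join(parts)


if __name__ == "__main__":
    # The printed line can be pasted into the .env file.
    dsn = build_dsn("nominatim", host="db.example.org", port=5432)
    print(f'NOMINATIM_DATABASE_DSN="{dsn}"')
```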

View File

@@ -1,101 +0,0 @@
# Customization of the Database
This section explains in detail how to configure a Nominatim import and
the various means to use external data.
## External postcode data
Nominatim creates a table of known postcode centroids during import. This table
is used for searches of postcodes and for adding postcodes to places where the
OSM data does not provide one. These postcode centroids are mainly computed
from the OSM data itself. In addition, Nominatim supports reading postcode
information from an external CSV file, to supplement the postcodes that are
missing in OSM.
To enable external postcode support, simply put one CSV file per country into
your project directory and name it `<CC>_postcodes.csv`. `<CC>` must be the
two-letter country code for which to apply the file. The file may also be
gzipped. Then it must be called `<CC>_postcodes.csv.gz`.
The CSV file must use commas as a delimiter and have a header line. Nominatim
expects three columns to be present: `postcode`, `lat` and `lon`. All other
columns are ignored. `lon` and `lat` must describe the x and y coordinates of the
postcode centroids in WGS84.
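Before placing a file in the project directory, it can help to check that it matches the format described above. The following is a minimal sketch, not part of Nominatim: the function name `check_postcode_csv` and the sample rows are made up for illustration, and the range check only confirms that the values are plausible WGS84 coordinates.

```python
import csv
import io

REQUIRED_COLUMNS = {"postcode", "lat", "lon"}


def check_postcode_csv(text: str) -> int:
    """Validate CSV text and return the number of data rows.

    Raises ValueError when a required column is missing or a
    coordinate is not a number within the WGS84 range.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = 0
    # The header is line 1, so data starts at line 2.
    for line_no, row in enumerate(reader, start=2):
        lat, lon = float(row["lat"]), float(row["lon"])
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"line {line_no}: coordinate out of range")
        rows += 1
    return rows


# Made-up sample; the extra "note" column is allowed and ignored.
SAMPLE = (
    "postcode,lat,lon,note\n"
    "AD100,42.5456,1.7340,example\n"
    "AD200,42.5667,1.6000,example\n"
)

if __name__ == "__main__":
    print(check_postcode_csv(SAMPLE))
```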
The postcode files are loaded only when there is data for the given country
in your database. For example, if there is a `us_postcodes.csv` file in your
project directory but you import only an excerpt of Italy, then the US postcodes
will simply be ignored.
As a rule, the external postcode data should be put into the project directory
**before** starting the initial import. Still, you can add, remove and update the
external postcode data at any time. Simply
run:
```
nominatim refresh --postcodes
```
to make the changes visible in your database. Be aware, however, that the changes
only have an immediate effect on searches for postcodes. Postcodes that were
added to places are only updated when they are reindexed. That usually happens
only during replication updates.
## Installing Tiger housenumber data for the US
Nominatim is able to use the official [TIGER](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html)
address set to complement the OSM house number data in the US. You can add
TIGER data to your own Nominatim instance by following these steps. The
entire US adds about 10GB to your database.
1. Get preprocessed TIGER 2020 data:
cd $PROJECT_DIR
wget https://nominatim.org/data/tiger2020-nominatim-preprocessed.csv.tar.gz
2. Import the data into your Nominatim database:
nominatim add-data --tiger-data tiger2020-nominatim-preprocessed.csv.tar.gz
3. Enable use of the Tiger data in your `.env` by adding:
echo NOMINATIM_USE_US_TIGER_DATA=yes >> .env
4. Apply the new settings:
nominatim refresh --functions
See the [developer's guide](../develop/data-sources.md#us-census-tiger) for more
information on how the data got preprocessed.
## Special phrases import
As described in the [Import chapter](Import.md), it is possible to
import special phrases from the wiki with the following command:
```sh
nominatim special-phrases --import-from-wiki
```
It is also possible to import phrases from a CSV file
with the following command:
```sh
nominatim special-phrases --import-from-csv <csv file>
```
Note that both import commands replace the phrases in your database.
This means that if you import phrases from a CSV file, only the phrases
present in the CSV file will be kept in the database. All other phrases will
be removed.
If you want to only add new phrases and not update the other ones you can add
the argument `--no-replace` to the import command. For example:
```sh
nominatim special-phrases --import-from-csv <csv file> --no-replace
```
This will add the phrases present in the csv file into the database without
removing the other ones.
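The difference between the default behaviour and `--no-replace` can be modelled with plain set operations. This is only a sketch of the semantics described above, not Nominatim's implementation; the function `import_phrases` and the sample phrases are hypothetical.

```python
def import_phrases(existing, incoming, no_replace=False):
    """Return the phrase set that remains after an import.

    Default: only the imported phrases are kept.
    no_replace=True: imported phrases are added, the rest is untouched.
    """
    if no_replace:
        return set(existing) | set(incoming)
    return set(incoming)


if __name__ == "__main__":
    in_db = {"restaurant", "hotel"}
    from_csv = {"hotel", "pharmacy"}
    print(sorted(import_phrases(in_db, from_csv)))
    print(sorted(import_phrases(in_db, from_csv, no_replace=True)))
```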

View File

@@ -1,142 +0,0 @@
# Deploying Nominatim
The Nominatim API is implemented as a PHP application. The `website/` directory
in the project directory contains the configured website. You can serve this
in a production environment with any web server that is capable of running
PHP scripts.
This section gives a quick overview on how to configure Apache and Nginx to
serve Nominatim. It is not meant as a full system administration guide on how
to run a web service. Please refer to the documentation of
[Apache](http://httpd.apache.org/docs/current/) and
[Nginx](https://nginx.org/en/docs/)
for background information on configuring the services.
!!! Note
Throughout this page, we assume that your Nominatim project directory is
located in `/srv/nominatim-project` and that you have installed Nominatim
using the default installation prefix `/usr/local`. If you have put it
somewhere else, you need to adjust the commands and configuration
accordingly.
We further assume that your web server runs as user `www-data`. Older
versions of CentOS may still use the user name `apache`. You also need
to adapt the instructions in this case.
## Making the website directory accessible
You need to make sure that the `website` directory is accessible for the
web server user. You can check that the permissions are correct by accessing
one of the PHP files as the web server user:
``` sh
sudo -u www-data head -n 1 /srv/nominatim-project/website/search.php
```
If this shows a permission error, then you need to adapt the permissions of
each directory in the path so that it is executable for `www-data`.
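To find out which directory in the path blocks access, a small script can walk the parent directories and report those without the execute bit for "other" users. This is a simplified sketch: it assumes `www-data` is neither owner nor group member of the directories, it ignores ACLs and SELinux, and the function name `dirs_missing_other_exec` is hypothetical.

```python
import os
import stat
from pathlib import Path


def dirs_missing_other_exec(path):
    """Return the parent directories of `path` that lack the o+x bit.

    Directories are listed from the filesystem root downwards.
    """
    blocked = []
    parents = list(Path(path).resolve().parents)[::-1]
    for parent in parents:
        mode = os.stat(parent).st_mode
        if not mode & stat.S_IXOTH:
            blocked.append(str(parent))
    return blocked


if __name__ == "__main__":
    # Path taken from the example above; adjust to your installation.
    target = "/srv/nominatim-project/website/search.php"
    if os.path.exists(target):
        for directory in dirs_missing_other_exec(target):
            print(f"not traversable for other users: {directory}")
```

Each reported directory can then be fixed individually, for example with `chmod o+x`, instead of loosening permissions on the whole tree.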
If you have SELinux enabled, further adjustments may be necessary to give the
web server access. At a minimum the following SELinux labelling should be done
for Nominatim:
``` sh
sudo semanage fcontext -a -t httpd_sys_content_t "/usr/local/nominatim/lib/lib-php(/.*)?"
sudo semanage fcontext -a -t httpd_sys_content_t "/srv/nominatim-project/website(/.*)?"
sudo semanage fcontext -a -t lib_t "/srv/nominatim-project/module/nominatim.so"
sudo restorecon -R -v /usr/local/lib/nominatim
sudo restorecon -R -v /srv/nominatim-project
```
## Nominatim with Apache
### Installing the required packages
With Apache you can use the PHP module to run Nominatim.
Under Ubuntu/Debian install them with:
``` sh
sudo apt install apache2 libapache2-mod-php
```
### Configuring Apache
Make sure your Apache configuration contains the required permissions for the
directory and create an alias:
``` apache
<Directory "/srv/nominatim-project/website">
Options FollowSymLinks MultiViews
AddType text/html .php
DirectoryIndex search.php
Require all granted
</Directory>
Alias /nominatim /srv/nominatim-project/website
```
After making changes to the Apache config, you need to restart Apache.
The website should now be available on `http://localhost/nominatim`.
## Nominatim with Nginx
### Installing the required packages
Nginx has no built-in PHP interpreter. You need to use php-fpm as a daemon for
serving PHP cgi.
On Ubuntu/Debian install nginx and php-fpm with:
``` sh
sudo apt install nginx php-fpm
```
### Configure php-fpm and Nginx
By default php-fpm listens on a network socket. If you want it to listen on a
Unix socket instead, change the pool configuration
(`/etc/php/<php version>/fpm/pool.d/www.conf`) as follows:
``` ini
; Replace the tcp listener and add the unix socket
listen = /var/run/php-fpm.sock
; Ensure that the socket is owned by and accessible to the web server user
listen.owner = www-data
listen.group = www-data
listen.mode = 0666
```
Tell nginx to hand requests for PHP files to the php-fpm Unix socket via
`fastcgi_pass` by adding the following location definitions to the default
configuration:
``` nginx
root /srv/nominatim-project/website;
index search.php;
location / {
try_files $uri $uri/ @php;
}
location @php {
fastcgi_param SCRIPT_FILENAME "$document_root$uri.php";
fastcgi_param PATH_TRANSLATED "$document_root$uri.php";
fastcgi_param QUERY_STRING $args;
fastcgi_pass unix:/var/run/php-fpm.sock;
fastcgi_index index.php;
include fastcgi_params;
}
location ~ [^/]\.php(/|$) {
fastcgi_split_path_info ^(.+?\.php)(/.*)$;
if (!-f $document_root$fastcgi_script_name) {
return 404;
}
fastcgi_pass unix:/var/run/php-fpm.sock;
fastcgi_index search.php;
include fastcgi.conf;
}
```
Restart the nginx and php-fpm services and the website should now be available
at `http://localhost/`.


@@ -16,44 +16,27 @@ was killed. If it looks like this:
then you can resume with the following command:
```sh
nominatim import --continue indexing
./utils/setup.php --index --create-search-indices --create-country-names
```
If the reported rank is 26 or higher, you can also safely add `--index-noanalyse`.
### PostgreSQL crashed "invalid page in block"
Usually serious problem, can be a hardware issue, not all data written to disc
for example. Check PostgreSQL log file and search PostgreSQL issues/mailing
list for hints.
If it happened during index creation you can try rerunning the step with
```sh
nominatim import --continue indexing
```
Otherwise it's best to start the full setup from the beginning.
### PHP "open_basedir restriction in effect" warnings
PHP Warning: file_get_contents(): open_basedir restriction in effect.
You need to adjust the
[open_basedir](https://www.php.net/manual/en/ini.core.php#ini.open-basedir)
setting in your PHP configuration (`php.ini` file). By default this setting may
look like this:
You need to adjust the [open_basedir](https://www.php.net/manual/en/ini.core.php#ini.open-basedir) setting
in your PHP configuration (`php.ini file`). By default this setting may look like this:
open_basedir = /srv/http/:/home/:/tmp/:/usr/share/pear/
Either add reported directories to the list or disable this setting temporarily
by adding ";" at the beginning of the line. Don't forget to enable this setting
again once you are done with the PHP command line operations.
Either add reported directories to the list or disable this setting temporarily by
dding ";" at the beginning of the line. Don't forget to enable this setting again
once you are done with the PHP command line operations.
### PHP timezeone warnings
### PHP timzeone warnings
The Apache log may contain lots of PHP warnings like this:
`PHP Warning: date_default_timezone_set() function.`
@@ -83,7 +66,7 @@ server development libraries (`postgresql-server-dev-9.5` on Ubuntu)
and recompile (`cmake .. && make`).
### I see the error "ERROR: permission denied for language c"
## I see the error "ERROR: permission denied for language c"
`nominatim.so`, written in C, is required to be installed on the database
server. Some managed database (cloud) services like Amazon RDS do not allow
@@ -93,7 +76,7 @@ on a non-managed machine.
### I see the error: "function transliteration(text) does not exist"
Reinstall the nominatim functions with `nominatim refresh --functions`
Reinstall the nominatim functions with `setup.php --create--functions`
and check for any errors, e.g. a missing `nominatim.so` file.
### I see the error: "ERROR: mmap (remap) failed"
@@ -106,15 +89,9 @@ If you are using a flatnode file, then it may also be that the underlying
filesystem does not fully support 'mmap'. A notable candidate is virtualbox's
vboxfs.
### I see the error: "clang: Command not found" on CentOS
On CentOS 7 users reported `/opt/rh/llvm-toolset-7/root/usr/bin/clang: Command not found`.
Double-check clang is installed. Instead of `make` try running `make CLANG=true`.
### nominatim UPDATE failed: ERROR: buffer 179261 is not owned by resource owner Portal
Several users [reported this](https://github.com/openstreetmap/Nominatim/issues/1168)
during the initial import of the database. It's
Several users [reported this](https://github.com/openstreetmap/Nominatim/issues/1168) during the initial import of the database. It's
something PostgreSQL internal Nominatim doesn't control. And PostgreSQL forums
suggest it's threading related but definitely some kind of crash of a process.
Users reported either rebooting the server, different hardware or just trying
@@ -130,11 +107,10 @@ to get the full error message.
`could not connect to server: No such file or directory`
On CentOS v7 the PostgreSQL server is started with `systemd`. Check if
`/usr/lib/systemd/system/httpd.service` contains a line `PrivateTmp=true`. If
so then Apache cannot see the `/tmp/.s.PGSQL.5432` file. It's a good security
feature, so use the
[preferred solution](../appendix/Install-on-Centos-7/#adding-selinux-security-settings).
On CentOS v7 the PostgreSQL server is started with `systemd`.
Check if `/usr/lib/systemd/system/httpd.service` contains a line `PrivateTmp=true`.
If so then Apache cannot see the `/tmp/.s.PGSQL.5432` file. It's a good security feature,
so use the [preferred solution](../appendix/Install-on-Centos-7/#adding-selinux-security-settings).
However, you can solve this the quick and dirty way by commenting out that line and then run
@@ -142,12 +118,14 @@ However, you can solve this the quick and dirty way by commenting out that line
sudo systemctl restart httpd
### "must be an array or an object that implements Countable" warning in /usr/share/pear/DB.php
The warning started with PHP 7.2. Make sure you have at least [version 1.9.3 of PEAR DB](https://github.com/pear/DB/releases)
installed.
### Website reports "DB Error: insufficient permissions"
The user the webserver, e.g. Apache, runs under needs to have access to the
Nominatim database. You can find the user like
[this](https://serverfault.com/questions/125865/finding-out-what-user-apache-is-running-as),
for default Ubuntu operating system for example it's `www-data`.
The user the webserver, e.g. Apache, runs under needs to have access to the Nominatim database. You can find the user like [this](https://serverfault.com/questions/125865/finding-out-what-user-apache-is-running-as), for default Ubuntu operating system for example it's `www-data`.
1. Repeat the `createuser` step of the installation instructions.
@@ -172,8 +150,7 @@ Example error message
The PostgreSQL database, i.e. user `postgres`, needs to have access to that file.
The permission need to be read & executable by everybody, but not writeable
by everybody, e.g.
The permission need to be read & executable by everybody, e.g.
```
-rwxr-xr-x 1 nominatim nominatim 297984 build/module/nominatim.so
@@ -184,31 +161,55 @@ Try `chmod a+r nominatim.so; chmod a+x nominatim.so`.
When running SELinux, make sure that the
[context is set up correctly](../appendix/Install-on-Centos-7/#adding-selinux-security-settings).
When you recently updated your operating system, updated PostgreSQL to
a new version or moved files (e.g. the build directory) you should
recreate `nominatim.so`. Try
```
cd build
rm -r module/
cmake $main_Nominatim_path && make
```
### Setup.php fails with "DB Error: extension not found"
Make sure you have the PostgreSQL extensions "hstore" and "postgis" installed.
See the installation instructions for a full list of required packages.
See the installation instruction for a full list of required packages.
### Setup.php reports "Cannot redeclare getDB()"
`Cannot redeclare getDB() (previously declared in /your/path/Nominatim/lib/db.php:4)`
The message is a bit misleading as PHP needs to load the file `DB.php` and
instead re-loads Nominatim's `db.php`. To solve this make sure you
have the [Pear module 'DB'](https://pear.php.net/package/DB/) installed.
sudo pear install DB
### I forgot to delete the flatnodes file before starting an import.
That's fine. For each import the flatnodes file gets overwritten.
See [https://help.openstreetmap.org/questions/52419/nominatim-flatnode-storage](https://help.openstreetmap.org/questions/52419/nominatim-flatnode-storage)
See [https://help.openstreetmap.org/questions/52419/nominatim-flatnode-storage]()
for more information.
## Running your own instance
### Can I import multiple countries and keep them up to date?
You should use the extracts and updates from https://download.geofabrik.de.
For the initial import, download the countries you need and merge them.
See [OSM Help](https://help.openstreetmap.org/questions/48843/merging-two-or-more-geographical-areas-to-import-two-or-more-osm-files-in-nominatim)
for examples how to do that. Use the resulting single osm file when
running `setup.php`.
For updates you need to download the change files for each country
once per day and apply them **separately** using
./utils/update.php --import-diff <filename> --index
See [this issue](https://github.com/openstreetmap/Nominatim/issues/60#issuecomment-18679446)
for a script that runs the updates using osmosis.
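As a rough sketch of the "apply them separately" step, the loop below runs one update call per change file. The function name `apply_country_diffs` and the `RUN` variable are assumptions made for this example, not part of Nominatim. Setting `RUN=echo` only prints the commands instead of executing them.

```sh
# Hypothetical helper: apply each country's change file in its own run.
# Set RUN=echo for a dry run that only prints the commands.
apply_country_diffs() {
    for diff_file in "$@"; do
        # Stop at the first failing update so later diffs are not
        # applied on top of a broken state.
        ${RUN:-} ./utils/update.php --import-diff "$diff_file" --index || return 1
    done
}
```

A dry run with two hypothetical file names would be `RUN=echo apply_country_diffs ireland.osc.gz monaco.osc.gz`. Downloading the daily change files is left out here.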
### Can I import negative OSM ids into Nominatim?
See [this question on OSM Help](https://help.openstreetmap.org/questions/64662/nominatim-flatnode-with-negative-id).
### Missing XML or text declaration
The website might show: `XML Parsing Error: XML or text declaration not at start of entity Location.`
Make sure there are no spaces at the beginning of your `settings/local.php` file.


@@ -0,0 +1,276 @@
# Importing and Updating the Database
The following instructions explain how to create a Nominatim database
from an OSM planet file and how to keep the database up to date. It
is assumed that you have already successfully installed the Nominatim
software itself; if not, return to the [installation page](Installation.md).
## Configuration setup in settings/local.php
The Nominatim server can be customized via the file `settings/local.php`
in the build directory. Note that this is a PHP file, so it must always
start like this:
<?php
without any leading spaces.
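A quick way to verify this is to compare the first five bytes of the file with the opening tag. The helper below is a minimal sketch with a made-up name (`starts_with_php_tag`). It relies on `head -c`, which is available in GNU and BSD userlands.

```sh
# Hypothetical helper: succeed only if the file begins with the exact
# bytes "<?php", i.e. without leading spaces or blank lines.
starts_with_php_tag() {
    [ "$(head -c 5 "$1")" = "<?php" ]
}
```

For example: `starts_with_php_tag settings/local.php || echo "local.php must start with <?php"`.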
There are lots of configuration settings you can tweak. Have a look
at `settings/default.php` for a full list. Most should have a sensible default.
#### Flatnode files
If you plan to import a large dataset (e.g. Europe, North America, planet),
you should also enable flatnode storage of node locations. With this
setting enabled, node coordinates are stored in a simple file instead
of the database. This will save you import time and disk storage.
Add to your `settings/local.php`:
@define('CONST_Osm2pgsql_Flatnode_File', '/path/to/flatnode.file');
Replace the second part with a suitable path on your system and make sure
the directory exists. There should be at least 40GB of free space.
## Downloading additional data
### Wikipedia rankings
Wikipedia can be used as an optional auxiliary data source to help indicate
the importance of OSM features. Nominatim will work without this information
but it will improve the quality of the results if this is installed.
This data is available as a binary download:
cd $NOMINATIM_SOURCE_DIR/data
wget https://www.nominatim.org/data/wikipedia_article.sql.bin
wget https://www.nominatim.org/data/wikipedia_redirect.sql.bin
Combined, the two files are around 1.5GB and add around 30GB to the install
size of Nominatim. They also increase the install time by an hour or so.
*NOTE:* you'll need to download the Wikipedia rankings before performing
the initial import of the data if you want the rankings applied to the
loaded data.
### Great Britain, USA postcodes
Nominatim can use postcodes from an external source to improve searches that
involve a GB or US postcode. This data can be optionally downloaded:
cd $NOMINATIM_SOURCE_DIR/data
wget https://www.nominatim.org/data/gb_postcode_data.sql.gz
wget https://www.nominatim.org/data/us_postcode_data.sql.gz
## Choosing the Data to Import
In its default setup Nominatim is configured to import the full OSM data
set for the entire planet. Such a setup requires a powerful machine with
at least 32GB of RAM and around 800GB of SSD hard disks. Depending on your
use case there are various ways to reduce the amount of data imported. This
section discusses these methods. They can also be combined.
### Using an extract
If you only need geocoding for a smaller region, then precomputed extracts
are a good way to reduce the database size and import time.
[Geofabrik](https://download.geofabrik.de) offers extracts for most countries.
They even have daily updates which can be used with the update process described
below. There are also
[other providers for extracts](https://wiki.openstreetmap.org/wiki/Planet.osm#Downloading).
Please be aware that some extracts are not cut exactly along the country
boundaries. As a result some parts of the boundary may be missing which means
that Nominatim cannot compute the areas for some administrative areas.
### Dropping Data Required for Dynamic Updates
About half of the data in Nominatim's database is not really used for serving
the API. It is only there to allow the data to be updated from the latest
changes from OSM. For many uses these dynamic updates are not really required.
If you don't plan to apply updates, the dynamic part of the database can be
safely dropped using the following command:
```
./utils/setup.php --drop
```
Note that you still need to provide for sufficient disk space for the initial
import. So this option is particularly interesting if you plan to transfer the
database or reuse the space later.
### Reverse-only Imports
If you only want to use the Nominatim database for reverse lookups or
if you plan to use the installation only for exports to a
[photon](https://photon.komoot.de/) database, then you can set up a database
without search indexes. Add `--reverse-only` to your setup command above.
This saves about 5% of disk space.
### Filtering Imported Data
Nominatim normally sets up a full search database containing administrative
boundaries, places, streets, addresses and POI data. There are also other
import styles available which only read selected data:
* **settings/import-admin.style**
Only import administrative boundaries and places.
* **settings/import-street.style**
Like the admin style but also adds streets.
* **settings/import-address.style**
Import all data necessary to compute addresses down to house number level.
* **settings/import-full.style**
Default style that also includes points of interest.
The style can be changed with the configuration `CONST_Import_Style`.
To give you an idea of the impact of using the different styles, the table
below gives rough estimates of the final database size after import of a
2018 planet and after using the `--drop` option. It also shows the time
needed for the import on a machine with 32GB RAM, 4 CPUs and SSDs. Note that
the given sizes are just an estimate meant for comparison of style requirements.
Your planet import is likely to be larger as the OSM data grows with time.
style | Import time | DB size | after drop
----------|--------------|------------|------------
admin | 5h | 190 GB | 20 GB
street | 42h | 400 GB | 180 GB
address | 59h | 500 GB | 260 GB
full | 80h | 575 GB | 300 GB
You can also customize the styles further. For a description of the
style format see [the development section](../develop/Import.md).
## Initial import of the data
**Important:** first try the import with a small extract, for example from
[Geofabrik](https://download.geofabrik.de).
Download the data to import and load the data with the following command
from the build directory:
```sh
./utils/setup.php --osm-file <data file> --all [--osm2pgsql-cache 28000] 2>&1 | tee setup.log
```
The `--osm2pgsql-cache` parameter is optional but strongly recommended for
planet imports. It sets the node cache size for the osm2pgsql import part
(see `-C` parameter in osm2pgsql help). As a rule of thumb, this should be
about the same size as the file you are importing but never more than
2/3 of RAM available. If your machine starts swapping reduce the size.
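The rule of thumb can be written down as a small calculation. The function below is an illustrative sketch (the name `suggest_cache_mb` is an assumption, not a Nominatim tool): it takes the size of the import file and the amount of RAM, both in MB, and prints the smaller of the file size and two thirds of the RAM.

```sh
# Hypothetical helper: suggest_cache_mb FILE_SIZE_MB RAM_MB
# Prints the smaller of the file size and two thirds of RAM, in MB.
suggest_cache_mb() {
    file_mb=$1
    ram_limit_mb=$(( $2 * 2 / 3 ))   # integer arithmetic, rounds down
    if [ "$file_mb" -lt "$ram_limit_mb" ]; then
        echo "$file_mb"
    else
        echo "$ram_limit_mb"
    fi
}
```

For a 60000 MB file on a machine with 48000 MB of RAM, `suggest_cache_mb 60000 48000` prints `32000`. Treat the result as a starting point and reduce it if the machine starts swapping.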
Computing word frequency for search terms can improve the performance of
forward geocoding in particular under high load as it helps PostgreSQL's query
planner to make the right decisions. To recompute word counts run:
```sh
./utils/update.php --recompute-word-counts
```
This will take a couple of hours for a full planet installation. You can
also defer that step to a later point in time when you realise that
performance becomes an issue. Just make sure that updates are stopped before
running this function.
If you want to be able to search for places by their type through
[special key phrases](https://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases)
you also need to enable these key phrases like this:
./utils/specialphrases.php --wiki-import > specialphrases.sql
psql -d nominatim -f specialphrases.sql
Note that this command downloads the phrases from the wiki link above.
## Installing Tiger housenumber data for the US
Nominatim is able to use the official [TIGER](https://www.census.gov/geo/maps-data/data/tiger.html)
address set to complement the OSM house number data in the US. You can add
TIGER data to your own Nominatim instance by following these steps. The
entire US adds about 10GB to your database.
1. Get preprocessed TIGER 2019 data and unpack it into the
data directory in your Nominatim sources:
cd Nominatim/data
wget https://nominatim.org/data/tiger2019-nominatim-preprocessed.tar.gz
tar xf tiger2019-nominatim-preprocessed.tar.gz
`data-source/us-tiger/README.md` explains how the data got preprocessed.
2. Import the data into your Nominatim database:
./utils/setup.php --import-tiger-data
3. Enable use of the Tiger data in your `settings/local.php` by adding:
@define('CONST_Use_US_Tiger_Data', true);
4. Apply the new settings:
```sh
./utils/setup.php --create-functions --enable-diff-updates --create-partition-functions
```
## Updates
There are many different ways to update your Nominatim database.
The following section describes how to keep it up-to-date with Pyosmium.
For a list of other methods see the output of `./utils/update.php --help`.
#### Installing the newest version of Pyosmium
It is recommended to install Pyosmium via pip. Make sure to use python3.
Run (as the same user who will later run the updates):
```sh
pip3 install --user osmium
```
Nominatim needs a tool called `pyosmium-get-changes` which comes with
Pyosmium. You need to tell Nominatim where to find it. Add the
following line to your `settings/local.php`:
@define('CONST_Pyosmium_Binary', '/home/user/.local/bin/pyosmium-get-changes');
The path above is fine if you used the `--user` parameter with pip.
Replace `user` with your user name.
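If you are unsure where pip put the tool, the following sketch looks in the per-user location first and then falls back to the `PATH`. The function name `find_pyosmium` is made up for this example.

```sh
# Hypothetical helper: print the full path of pyosmium-get-changes.
# Checks the pip --user location first, then searches the PATH.
find_pyosmium() {
    candidate="$HOME/.local/bin/pyosmium-get-changes"
    if [ -x "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        command -v pyosmium-get-changes
    fi
}
```

Run `find_pyosmium` as the user who will run the updates and copy the printed path into the `CONST_Pyosmium_Binary` setting. The function fails without output if the tool cannot be found.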
#### Setting up the update process
Next the update needs to be initialised. By default Nominatim is configured
to update using the global minutely diffs.
If you want a different update source you will need to add some settings
to `settings/local.php`. For example, to use the daily country extracts
diffs for Ireland from Geofabrik add the following:
// base URL of the replication service
@define('CONST_Replication_Url', 'https://download.geofabrik.de/europe/ireland-and-northern-ireland-updates');
// How often upstream publishes diffs
@define('CONST_Replication_Update_Interval', '86400');
// How long to sleep if no update found yet
@define('CONST_Replication_Recheck_Interval', '900');
To set up the update process now run the following command:
./utils/update.php --init-updates
It outputs the date from which updates will start. Recheck that this date is
what you expect.
The `--init-updates` command needs to be rerun whenever the replication service
is changed.
#### Updating Nominatim
The following command will keep your database constantly up to date:
./utils/update.php --import-osmosis-all
(Note that even though the old name "import-osmosis-all" has been kept for compatibility reasons, Osmosis is not required to run this - it uses pyosmium behind the scenes.)
If you have imported multiple country extracts and want to keep them
up-to-date, have a look at the script in
[issue #60](https://github.com/openstreetmap/Nominatim/issues/60).


@@ -1,291 +0,0 @@
# Importing the Database
The following instructions explain how to create a Nominatim database
from an OSM planet file. It is assumed that you have already successfully
installed the Nominatim software itself and the `nominatim` tool can be found
in your `PATH`. If this is not the case, return to the
[installation page](Installation.md).
## Creating the project directory
Before you start the import, you should create a project directory for your
new database installation. This directory receives all data that is related
to a single Nominatim setup: configuration, extra data, etc. Create a project
directory apart from the Nominatim software and change into the directory:
```
mkdir ~/nominatim-planet
cd ~/nominatim-planet
```
In the following, we refer to the project directory as `$PROJECT_DIR`. To be
able to copy&paste instructions, you can export the appropriate variable:
```
export PROJECT_DIR=~/nominatim-planet
```
The Nominatim tool assumes by default that the current working directory is
the project directory but you may explicitly state a different directory using
the `--project-dir` parameter. The following instructions assume that you run
all commands from the project directory.
!!! tip "Migration Tip"
Nominatim used to be run directly from the build directory until version 3.6.
Essentially, the build directory functioned as the project directory
for the database installation. This setup still works and can be useful for
development purposes. It is not recommended anymore for production setups.
Create a project directory that is separate from the Nominatim software.
### Configuration setup in `.env`
The Nominatim server can be customized via an `.env` configuration file in the
project directory. This is a file in [dotenv](https://github.com/theskumar/python-dotenv)
format which looks the same as variable settings in a standard shell environment.
You can also set the same configuration via environment variables. All
settings have a `NOMINATIM_` prefix to avoid conflicts with other environment
variables.
There are lots of configuration settings you can tweak. Have a look
at `Nominatim/settings/env.default` for a full list. Most should have a sensible default.
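Because every setting carries the `NOMINATIM_` prefix, a mistyped name is easy to spot mechanically. The following sketch lists all assignments in a dotenv file whose name lacks the prefix. The function name `list_unprefixed_settings` is hypothetical, and the check assumes simple `NAME=value` lines.

```sh
# Hypothetical helper: print every NAME=value line in a dotenv file
# whose name does not start with NOMINATIM_. Comments and blank lines
# are ignored. Exits non-zero when no such line exists.
list_unprefixed_settings() {
    grep -E '^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*=' "$1" |
        grep -Ev '^[[:space:]]*NOMINATIM_'
}
```

Running `list_unprefixed_settings .env` in the project directory should print nothing for a file that only contains Nominatim settings.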
#### Flatnode files
If you plan to import a large dataset (e.g. Europe, North America, planet),
you should also enable flatnode storage of node locations. With this
setting enabled, node coordinates are stored in a simple file instead
of the database. This will save you import time and disk storage.
Add to your `.env`:
NOMINATIM_FLATNODE_FILE="/path/to/flatnode.file"
Replace the second part with a suitable path on your system and make sure
the directory exists. There should be at least 75GB of free space.
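Both conditions can be checked before starting the import. The sketch below uses the portable `df -Pk` output to read the available space. The function name `has_free_gb` is an assumption made for this example.

```sh
# Hypothetical helper: has_free_gb DIR GB
# Succeeds if the filesystem holding DIR has at least GB gigabytes free.
has_free_gb() {
    # With -P, line 2 of the df output holds the numbers; field 4 is
    # the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}
```

For example, with a hypothetical target directory: `mkdir -p /path/to && has_free_gb /path/to 75 || echo "not enough space for the flatnode file"`.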
## Downloading additional data
### Wikipedia/Wikidata rankings
Wikipedia can be used as an optional auxiliary data source to help indicate
the importance of OSM features. Nominatim will work without this information
but it will improve the quality of the results if this is installed.
This data is available as a binary download. Put it into your project directory:
cd $PROJECT_DIR
wget https://www.nominatim.org/data/wikimedia-importance.sql.gz
The file is about 400MB and adds around 4GB to the Nominatim database.
!!! tip
If you forgot to download the Wikipedia rankings, you can also add
importances after the import. Download the file, then run
`nominatim refresh --wiki-data --importance`. Updating importances for
a planet can take a couple of hours.
### External postcodes
Nominatim can use postcodes from an external source to improve searching with
postcodes. We provide precomputed postcode sets for the US (using TIGER data)
and the UK (using the [CodePoint OpenData set](https://osdatahub.os.uk/downloads/open/CodePointOpen)).
This data can be optionally downloaded into the project directory:
cd $PROJECT_DIR
wget https://www.nominatim.org/data/gb_postcodes.csv.gz
wget https://www.nominatim.org/data/us_postcodes.csv.gz
You can also add your own custom postcode sources, see
[Customization of postcodes](Customization.md#external-postcode-data).
## Choosing the data to import
In its default setup Nominatim is configured to import the full OSM data
set for the entire planet. Such a setup requires a powerful machine with
at least 64GB of RAM and around 900GB of SSD hard disks. Depending on your
use case there are various ways to reduce the amount of data imported. This
section discusses these methods. They can also be combined.
### Using an extract
If you only need geocoding for a smaller region, then precomputed OSM extracts
are a good way to reduce the database size and import time.
[Geofabrik](https://download.geofabrik.de) offers extracts for most countries.
They even have daily updates which can be used with the update process described
[in the next section](../Update). There are also
[other providers for extracts](https://wiki.openstreetmap.org/wiki/Planet.osm#Downloading).
Please be aware that some extracts are not cut exactly along the country
boundaries. As a result some parts of the boundary may be missing which means
that Nominatim cannot compute the areas for some administrative areas.
### Dropping Data Required for Dynamic Updates
About half of the data in Nominatim's database is not really used for serving
the API. It is only there to allow the data to be updated from the latest
changes from OSM. For many uses these dynamic updates are not really required.
If you don't plan to apply updates, you can run the import with the
`--no-updates` parameter. This will drop the dynamic part of the database as
soon as it is not required anymore.
You can also drop the dynamic part later using the following command:
```
nominatim freeze
```
Note that you still need to provide for sufficient disk space for the initial
import. So this option is particularly interesting if you plan to transfer the
database or reuse the space later.
### Reverse-only Imports
If you only want to use the Nominatim database for reverse lookups or
if you plan to use the installation only for exports to a
[photon](https://photon.komoot.de/) database, then you can set up a database
without search indexes. Add `--reverse-only` to your setup command above.
This saves about 5% of disk space.
### Filtering Imported Data
Nominatim normally sets up a full search database containing administrative
boundaries, places, streets, addresses and POI data. There are also other
import styles available which only read selected data:
* **settings/import-admin.style**
Only import administrative boundaries and places.
* **settings/import-street.style**
Like the admin style but also adds streets.
* **settings/import-address.style**
Import all data necessary to compute addresses down to house number level.
* **settings/import-full.style**
Default style that also includes points of interest.
* **settings/import-extratags.style**
Like the full style but also adds most of the OSM tags into the extratags
column.
The style can be changed with the configuration `NOMINATIM_IMPORT_STYLE`.
To give you an idea of the impact of using the different styles, the table
below gives rough estimates of the final database size after import of a
2020 planet and after using the `--drop` option. It also shows the time
needed for the import on a machine with 64GB RAM, 4 CPUs and NVME disks.
Note that the given sizes are just an estimate meant for comparison of
style requirements. Your planet import is likely to be larger as the
OSM data grows with time.
style | Import time | DB size | after drop
----------|--------------|------------|------------
admin | 4h | 215 GB | 20 GB
street | 22h | 440 GB | 185 GB
address | 36h | 545 GB | 260 GB
full | 54h | 640 GB | 330 GB
extratags | 54h | 650 GB | 340 GB
You can also customize the styles further.
A [description of the style format](../develop/Import.md#configuring-the-import)
can be found in the development section.
## Initial import of the data
!!! danger "Important"
First try the import with a small extract, for example from
[Geofabrik](https://download.geofabrik.de).
Download the data to import. Then issue the following command
from the **project directory** to start the import:
```sh
nominatim import --osm-file <data file> 2>&1 | tee setup.log
```
The **project directory** is the one that you have set up at the beginning.
See [creating the project directory](Import#creating-the-project-directory).
### Notes on full planet imports
Even on a perfectly configured machine
the import of a full planet takes around 2 days. Once you see messages
with `Rank .. ETA` appear, the indexing process has started. This part takes
the most time. There are 30 ranks to process. Ranks 26 and 30 are the most complex.
They each take about a third of the total import time. If you have not reached
rank 26 after two days of import, it is worth revisiting your system
configuration as it may not be optimal for the import.
### Notes on memory usage
In the first step of the import Nominatim uses [osm2pgsql](https://osm2pgsql.org)
to load the OSM data into the PostgreSQL database. This step is very demanding
in terms of RAM usage. osm2pgsql and PostgreSQL are running in parallel at
this point. PostgreSQL blocks at least the part of RAM that has been configured
with the `shared_buffers` parameter during
[PostgreSQL tuning](Installation#postgresql-tuning)
and needs some memory on top of that. osm2pgsql needs at least 2GB of RAM for
its internal data structures, potentially more when it has to process very large
relations. In addition it needs to maintain a cache for node locations. The size
of this cache can be configured with the parameter `--osm2pgsql-cache`.
When importing with a flatnode file, it is best to disable the node cache
completely and leave the memory for the flatnode file. Nominatim will do this
by default, so you do not need to configure anything in this case.
For imports without a flatnode file, set `--osm2pgsql-cache` approximately to
the size of the OSM pbf file you are importing. The size needs to be given in
MB. Make sure you leave enough RAM for PostgreSQL and osm2pgsql as mentioned
above. If the system starts swapping or you are getting out-of-memory errors,
reduce the cache size or even consider using a flatnode file.
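The sizing rule above can be sketched in a few lines of Python. This is only an illustration: the helper name `suggest_osm2pgsql_cache_mb` is made up for this example, the value `0` stands for "no node cache" within the sketch, and the function does not check how much RAM is actually free on your machine.

```python
import math


def suggest_osm2pgsql_cache_mb(pbf_size_bytes, use_flatnode=False):
    """Return a rough value in MB for --osm2pgsql-cache.

    Hypothetical helper that only restates the rule of thumb from the text.
    """
    if use_flatnode:
        # With a flatnode file the node cache is best left disabled.
        return 0
    # Without a flatnode file: roughly the size of the PBF file, in MB.
    return math.ceil(pbf_size_bytes / (1024 * 1024))
```

The file size can be obtained with `os.path.getsize()`. You still need to subtract the memory that PostgreSQL and osm2pgsql need for themselves before settling on a value.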
### Testing the installation
Run this command to verify that all required tables and indices were created successfully.
```sh
nominatim admin --check-database
```
Now you can try out your installation by running:
```sh
nominatim serve
```
This runs a small test server normally used for development. You can use it
to verify that your installation is working. Go to
`http://localhost:8088/status.php` and you should see the message `OK`.
You can also run a search query, e.g. `http://localhost:8088/search.php?q=Berlin`.
Note that search queries are not supported for reverse-only imports. You can run a
reverse query, e.g. `http://localhost:8088/reverse.php?lat=27.1750090510034&lon=78.04209025`.
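If you prefer to script these checks, the following Python sketch builds the same URLs and tests the status endpoint. It assumes the test server from `nominatim serve` is listening on `localhost:8088` as described above. The function names are chosen for this example only.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Address of the test server started by `nominatim serve`, as used in the text.
BASE_URL = "http://localhost:8088"


def build_url(endpoint, **params):
    """Build a request URL for one of the PHP endpoints of the test server."""
    url = f"{BASE_URL}/{endpoint}.php"
    if params:
        url += "?" + urlencode(params)
    return url


def status_is_ok(timeout=5):
    """Return True if the status endpoint answers with the message OK."""
    with urlopen(build_url("status"), timeout=timeout) as response:
        return response.read().decode("utf-8").strip() == "OK"
```

Calling `status_is_ok()` requires the test server to be running. `build_url("search", q="Berlin")` yields the search URL shown above.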
To run Nominatim via webservers like Apache or nginx, please read the
[Deployment chapter](Deployment.md).
## Tuning the database
Accurate word frequency information for search terms helps PostgreSQL's query
planner to make the right decisions. Recomputing it can improve the performance
of forward geocoding, in particular under high load. To recompute word counts run:
```sh
nominatim refresh --word-counts
```
This will take a couple of hours for a full planet installation. You can
also defer that step to a later point in time when you realise that
performance becomes an issue. Just make sure that updates are stopped before
running this command.
If you want to be able to search for places by their type through
[special key phrases](https://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases)
you also need to import these key phrases like this:
```sh
nominatim special-phrases --import-from-wiki
```
Note that this command downloads the phrases from the wiki link above. You
need internet access for this step.
You can also import special phrases from a CSV file. For more
information please read the [Customization chapter](Customization.md).

View File

@@ -4,9 +4,8 @@ This page contains generic installation instructions for Nominatim and its
prerequisites. There are also step-by-step instructions available for
the following operating systems:
* [Ubuntu 20.04](../appendix/Install-on-Ubuntu-20.md)
* [Ubuntu 18.04](../appendix/Install-on-Ubuntu-18.md)
* [CentOS 8](../appendix/Install-on-Centos-8.md)
* [Ubuntu 16.04](../appendix/Install-on-Ubuntu-16.md)
* [CentOS 7.2](../appendix/Install-on-Centos-7.md)
These OS-specific instructions can also be found in executable form
@@ -17,7 +16,6 @@ and can't offer support.
* [Docker](https://github.com/mediagis/nominatim-docker)
* [Docker on Kubernetes](https://github.com/peter-evans/nominatim-k8s)
* [Kubernetes with Helm](https://github.com/robjuz/helm-charts/blob/master/charts/nominatim/README.md)
* [Ansible](https://github.com/synthesio/infra-ansible-nominatim)
## Prerequisites
@@ -27,82 +25,65 @@ and can't offer support.
For compiling:
* [cmake](https://cmake.org/)
* [expat](https://libexpat.github.io/)
* [proj](https://proj.org/)
* [bzip2](http://www.bzip.org/)
* [zlib](https://www.zlib.net/)
* [ICU](http://site.icu-project.org/)
* [Boost libraries](https://www.boost.org/), including system and filesystem
* PostgreSQL client libraries
* a recent C++ compiler (gcc 5+ or Clang 3.8+)
* [libxml2](http://xmlsoft.org/)
* a recent C++ compiler
Nominatim comes with its own version of osm2pgsql. See the
osm2pgsql README for additional dependencies required for compiling osm2pgsql.
For running tests:
* [behave](http://pythonhosted.org/behave/)
* [Psycopg2](https://initd.org/psycopg)
* [nose](https://nose.readthedocs.io)
* [phpunit](https://phpunit.de)
For running Nominatim:
* [PostgreSQL](https://www.postgresql.org) (9.5+ will work, 11+ strongly recommended)
* [PostGIS](https://postgis.net) (2.2+)
* [Python 3](https://www.python.org/) (3.6+)
* [Psycopg2](https://www.psycopg.org) (2.7+)
* [Python Dotenv](https://github.com/theskumar/python-dotenv)
* [psutil](https://github.com/giampaolo/psutil)
* [Jinja2](https://palletsprojects.com/p/jinja/)
* [PyICU](https://pypi.org/project/PyICU/)
* [PyYaml](https://pyyaml.org/) (5.1+)
* [datrie](https://github.com/pytries/datrie)
* [PostgreSQL](https://www.postgresql.org) (9.3 or later)
* [PostGIS](https://postgis.org) (2.2 or later)
* [PHP](https://php.net) (7.0 or later)
* PHP-pgsql
* PHP-intl (bundled with PHP)
* PHP-cgi (for running queries from the command line)
* [PEAR::DB](https://pear.php.net/package/DB)
* a webserver (apache or nginx are recommended)
For running continuous updates:
* [pyosmium](https://osmcode.org/pyosmium/)
For dependencies for running tests and building documentation, see
the [Development section](../develop/Development-Environment.md).
* [pyosmium](https://osmcode.org/pyosmium/) (with Python 3)
### Hardware
A minimum of 2GB of RAM is required or installation will fail. For a full
planet import 64GB of RAM or more are strongly recommended. Do not report
out of memory problems if you have less than 64GB RAM.
planet import 32GB of RAM or more are strongly recommended.
For a full planet install you will need at least 900GB of hard disk space.
Take into account that the OSM database is growing fast.
Fast disks are essential. Using NVME disks is recommended.
For a full planet install you will need at least 700GB of hard disk space
(take into account that the OSM database is growing fast). SSD disks
will help considerably to speed up import and queries.
Even on a well configured machine the import of a full planet takes
around 2 days. On traditional spinning disks, 7-8 days are more realistic.
On a 6-core machine with 32GB RAM and SSDs the import of a full planet takes
a bit more than 2 days. Without SSDs 7-8 days are more realistic.
## Tuning the PostgreSQL database
## Setup of the server
### PostgreSQL tuning
You might want to tune your PostgreSQL installation so that the later steps
make best use of your hardware. You should tune the following parameters in
your `postgresql.conf` file.
shared_buffers = 2GB
maintenance_work_mem = (10GB)
autovacuum_work_mem = 2GB
work_mem = (50MB)
effective_cache_size = (24GB)
shared_buffers (2GB)
maintenance_work_mem (10GB)
work_mem (50MB)
effective_cache_size (24GB)
synchronous_commit = off
checkpoint_segments = 100 # only for postgresql <= 9.4
max_wal_size = 1GB # postgresql > 9.4
checkpoint_timeout = 10min
checkpoint_completion_target = 0.9
The numbers in brackets behind some parameters seem to work fine for a
64GB RAM machine. Adjust to your setup. A higher number for `max_wal_size`
means that PostgreSQL needs to run checkpoints less often but it does require
the additional space on your disk.
Autovacuum must not be switched off because it ensures that the
tables are frequently analysed. If your machine has very little memory,
you might consider setting:
autovacuum_max_workers = 1
and even reduce `autovacuum_work_mem` further. This will reduce the amount
of memory that autovacuum takes away from the import process.
32GB RAM machine. Adjust to your setup.
For the initial import, you should also set:
@@ -110,57 +91,68 @@ For the initial import, you should also set:
full_page_writes = off
Don't forget to reenable them after the initial import or you risk database
corruption.
corruption. Autovacuum must not be switched off because it ensures that the
tables are frequently analysed.
### Webserver setup
The `website/` directory in the build directory contains the configured
website. Include the directory in your webserver configuration to serve PHP files
from there.
#### Configure for use with Apache
Make sure your Apache configuration contains the required permissions for the
directory and create an alias:
<Directory "/srv/nominatim/build/website">
Options FollowSymLinks MultiViews
AddType text/html .php
DirectoryIndex search.php
Require all granted
</Directory>
Alias /nominatim /srv/nominatim/build/website
`/srv/nominatim/build` should be replaced with the location of your
build directory.
After making changes to the Apache configuration you need to restart Apache.
The website should now be available on http://localhost/nominatim.
#### Configure for use with Nginx
Use php-fpm as a daemon for serving PHP cgi. Install php-fpm together with nginx.
By default php listens on a network socket. If you want it to listen to a
Unix socket instead, change the pool configuration (`pool.d/www.conf`) as
follows:
; Comment out the tcp listener and add the unix socket
;listen = 127.0.0.1:9000
listen = /var/run/php5-fpm.sock
; Ensure that the daemon runs as the correct user
listen.owner = www-data
listen.group = www-data
listen.mode = 0666
Tell nginx that php files are special and to fastcgi_pass to the php-fpm
unix socket by adding the location definition to the default configuration.
root /srv/nominatim/build/website;
index search.php index.html;
location ~ [^/]\.php(/|$) {
fastcgi_split_path_info ^(.+?\.php)(/.*)$;
if (!-f $document_root$fastcgi_script_name) {
return 404;
}
fastcgi_pass unix:/var/run/php5-fpm.sock;
fastcgi_index search.php;
include fastcgi.conf;
}
Restart the nginx and php5-fpm services and the website should now be available
at `http://localhost/`.
## Downloading and building Nominatim
### Downloading the latest release
You can download the [latest release from nominatim.org](https://nominatim.org/downloads/).
The release contains all necessary files. Just unpack it.
### Downloading the latest development version
If you want to install the latest development version from GitHub, make sure to
also check out the osm2pgsql subproject:
```
git clone --recursive git://github.com/openstreetmap/Nominatim.git
```
The development version does not include the country grid. Download it separately:
```
wget -O Nominatim/data/country_osm_grid.sql.gz https://www.nominatim.org/data/country_grid.sql.gz
```
### Building Nominatim
The code must be built in a separate directory. Create the directory and
change into it.
```
mkdir build
cd build
```
Nominatim uses cmake and make for building. Assuming that you have created the
build at the same level as the Nominatim source directory run:
```
cmake ../Nominatim
make
sudo make install
```
Nominatim installs itself into `/usr/local` by default. To choose a different
installation directory add `-DCMAKE_INSTALL_PREFIX=<install root>` to the
cmake command. Make sure that the `bin` directory is available in your path
in that case, e.g.
```
export PATH=<install root>/bin:$PATH
```
Now continue with [importing the database](Import.md).
Now continue with [importing the database](Import-and-Update.md).

View File

@@ -1,173 +1,10 @@
# Database Migrations
Since version 3.7.0 Nominatim offers automatic migrations. Please follow
these steps:
This page describes database migrations necessary to update existing databases
to newer versions of Nominatim.
* stop any updates that are potentially running
* update Nominatim to the newer version
* go to your project directory and run `nominatim admin --migrate`
* (optionally) restart updates
Below you find additional migrations and hints about other structural and
breaking changes. **Please read them before running the migration.**
!!! note
If you are migrating from a version <3.6, then you still have to follow
the manual migration steps up to 3.6.
## 3.6.0 -> 3.7.0
### New format and name of configuration file
The configuration for an import is now saved in a `.env` file in the project
directory. This file follows the dotenv format. For more information, see
the [installation chapter](Import.md#configuration-setup-in-env).
To migrate to the new system, create a new project directory, add the `.env`
file and port your custom configuration from `settings/local.php`. Most
settings are named similarly and have only received a `NOMINATIM_` prefix.
Use the default settings in `settings/env.defaults` as a reference.
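As a rough aid for porting, the renaming pattern can be sketched in Python. This assumes that the old setting is a `CONST_`-prefixed PHP constant such as `CONST_MapIcon_URL`. The function `guess_env_name` is hypothetical, and its output is only a guess. Not every setting follows the pattern, so always check the result against `settings/env.defaults`.

```python
def guess_env_name(old_constant):
    """Guess the .env name for an old PHP constant (an assumption, not a rule)."""
    name = old_constant
    if name.startswith("CONST_"):
        # Drop the old prefix before adding the new one.
        name = name[len("CONST_"):]
    return "NOMINATIM_" + name.upper()
```

For example, `guess_env_name("CONST_Database_DSN")` returns `NOMINATIM_DATABASE_DSN`.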
### New location for data files
External data files for Wikipedia importance, postcodes etc. are no longer
expected to reside in the source tree by default. Instead they will be searched
in the project directory. If you have an automated setup script you must
either adapt the download location or explicitly set the location of the
files to the old place in your `.env`.
### Introducing `nominatim` command line tool
The various php utilities have been replaced with a single `nominatim`
command line tool. Make sure to adapt any scripts. There is no direct 1:1
mapping between the old utilities and the commands of the nominatim CLI. The
following list shows the nominatim sub-commands that contain the
functionality of each script:
* ./utils/setup.php: `import`, `freeze`, `refresh`
* ./utils/update.php: `replication`, `add-data`, `index`, `refresh`
* ./utils/specialphrases.php: `special-phrases`
* ./utils/check_import_finished.php: `admin`
* ./utils/warm.php: `admin`
* ./utils/export.php: `export`
Try `nominatim <command> --help` for more information about each subcommand.
`./utils/query.php` no longer exists in its old form. `nominatim search`
provides a replacement but returns different output.
### Switch to normalized house numbers
The housenumber column in the placex table now uses a normalized version.
The automatic migration step will convert the column but this may take a
very long time. It is advisable to take the machine offline while doing that.
## 3.5.0 -> 3.6.0
### Change of layout of search_name_* tables
The tables need a different index for nearest place lookup. Recreate the
indexes using the following shell script:
```bash
for table in `psql -d nominatim -c "SELECT tablename FROM pg_tables WHERE tablename LIKE 'search_name_%'" -tA | grep -v search_name_blank`;
do
psql -d nominatim -c "DROP INDEX idx_${table}_centroid_place; CREATE INDEX idx_${table}_centroid_place ON ${table} USING gist (centroid) WHERE ((address_rank >= 2) AND (address_rank <= 25)); DROP INDEX idx_${table}_centroid_street; CREATE INDEX idx_${table}_centroid_street ON ${table} USING gist (centroid) WHERE ((address_rank >= 26) AND (address_rank <= 27))";
done
```
### Removal of html output
The debugging UI is no longer directly provided with Nominatim. Instead we
now provide a simple Javascript application. Please refer to
[Setting up the Nominatim UI](../Setup-Nominatim-UI) for details on how to
set up the UI.
The icons served together with the API responses have been moved to the
nominatim-ui project as well. If you want to keep the `icon` field in the
response, you need to set `CONST_MapIcon_URL` to the URL of the `/mapicon`
directory of nominatim-ui.
### Change order during indexing
When reindexing places during updates, there is now a different order used
which needs a different database index. Create it with the following SQL command:
```sql
CREATE INDEX idx_placex_pendingsector_rank_address
ON placex
USING BTREE (rank_address, geometry_sector)
WHERE indexed_status > 0;
```
You can then drop the old index with:
```sql
DROP INDEX idx_placex_pendingsector;
```
### Unused index
This index has been unused ever since the query using it was changed two years ago. Dropping it saves about 12GB on a planet installation.
```sql
DROP INDEX idx_placex_geometry_reverse_lookupPoint;
```
### Switching to dotenv
As part of the work changing the configuration format, the configuration for
the website is now using a separate configuration file. To create the
configuration file, run the following command after updating:
```sh
./utils/setup.php --setup-website
```
### Update SQL code
To update the SQL code to the latest version run:
```
./utils/setup.php --create-functions --enable-diff-updates --create-partition-functions
```
## 3.4.0 -> 3.5.0
### New Wikipedia/Wikidata importance tables
The `wikipedia_*` tables have a new format that also includes references to
Wikidata. You need to update the computation functions and the tables as
follows:
* download the new Wikipedia tables as described in the import section
* reimport the tables: `./utils/setup.php --import-wikipedia-articles`
* update the functions: `./utils/setup.php --create-functions --enable-diff-updates`
* create a new lookup index:
```sql
CREATE INDEX idx_placex_wikidata
ON placex
USING BTREE ((extratags -> 'wikidata'))
WHERE extratags ? 'wikidata'
AND class = 'place'
AND osm_type = 'N'
AND rank_search < 26;
```
* compute importance: `./utils/update.php --recompute-importance`
The last step takes about 10 hours on the full planet.
Remove one function (it will be recreated in the next step):
```sql
DROP FUNCTION create_country(hstore,character varying);
```
Finally, update all SQL functions:
```sh
./utils/setup.php --create-functions --enable-diff-updates --create-partition-functions
```
SQL statements should be executed from the PostgreSQL commandline. Execute
`psql nominatim` to enter command line mode.
## 3.3.0 -> 3.4.0
@@ -186,12 +23,6 @@ CREATE INDEX idx_location_area_country_geometry ON location_area_country USING G
CREATE INDEX idx_location_area_country_place_id ON location_area_country USING BTREE (place_id);
```
Finally, update all SQL functions:
```sh
./utils/setup.php --create-functions --enable-diff-updates --create-partition-functions
```
## 3.2.0 -> 3.3.0
### New database connection string (DSN) format
@@ -208,7 +39,7 @@ The new format is
### Natural Earth country boundaries no longer needed as fallback
```sql
```
DROP TABLE country_naturalearthdata;
```
@@ -234,37 +65,27 @@ following command:
The reverse algorithm has changed and requires new indexes. Run the following
SQL statements to create the indexes:
```sql
```
CREATE INDEX idx_placex_geometry_reverse_lookupPoint
ON placex
USING gist (geometry)
WHERE (name IS NOT null or housenumber IS NOT null or rank_address BETWEEN 26 AND 27)
AND class NOT IN ('railway','tunnel','bridge','man_made')
AND rank_address >= 26
AND indexed_status = 0
AND linked_place_id IS null;
ON placex USING gist (geometry)
WHERE (name is not null or housenumber is not null or rank_address between 26 and 27)
AND class not in ('railway','tunnel','bridge','man_made')
AND rank_address >= 26 AND indexed_status = 0 AND linked_place_id is null;
CREATE INDEX idx_placex_geometry_reverse_lookupPolygon
ON placex USING gist (geometry)
WHERE St_GeometryType(geometry) in ('ST_Polygon', 'ST_MultiPolygon')
AND rank_address between 4 and 25
AND type != 'postcode'
AND name is not null
AND indexed_status = 0
AND linked_place_id is null;
AND rank_address between 4 and 25 AND type != 'postcode'
AND name is not null AND indexed_status = 0 AND linked_place_id is null;
CREATE INDEX idx_placex_geometry_reverse_placeNode
ON placex USING gist (geometry)
WHERE osm_type = 'N'
AND rank_search between 5 and 25
AND class = 'place'
AND type != 'postcode'
AND name is not null
AND indexed_status = 0
AND linked_place_id is null;
WHERE osm_type = 'N' AND rank_search between 5 and 25
AND class = 'place' AND type != 'postcode'
AND name is not null AND indexed_status = 0 AND linked_place_id is null;
```
You also need to grant the website user access to the `country_osm_grid` table:
```sql
```
GRANT SELECT ON table country_osm_grid to "www-user";
```
@@ -272,7 +93,7 @@ Replace the `www-user` with the user name of your website server if necessary.
You can now drop the unused indexes:
```sql
```
DROP INDEX idx_placex_reverse_geometry;
```
@@ -301,8 +122,8 @@ CREATE INDEX idx_postcode_geometry ON location_postcode USING GIST (geometry);
CREATE UNIQUE INDEX idx_postcode_id ON location_postcode USING BTREE (place_id);
CREATE INDEX idx_postcode_postcode ON location_postcode USING BTREE (postcode);
GRANT SELECT ON location_postcode TO "www-data";
DROP TYPE IF EXISTS nearfeaturecentr CASCADE;
CREATE TYPE nearfeaturecentr AS (
drop type if exists nearfeaturecentr cascade;
create type nearfeaturecentr as (
place_id BIGINT,
keywords int[],
rank_address smallint,

View File

@@ -1,184 +0,0 @@
# Setting up the Nominatim UI
Nominatim is a search API; it does not provide a website interface on its
own. [nominatim-ui](https://github.com/osm-search/nominatim-ui) offers a
small website for testing your setup and inspecting the database content.
This section provides a quick start on how to use nominatim-ui with your
installation. For more details, please also have a look at the
[README of nominatim-ui](https://github.com/osm-search/nominatim-ui/blob/master/README.md).
## Installing nominatim-ui
We provide regular releases of nominatim-ui that contain the packaged website.
They do not need any special installation. Just download, configure
and run it. Grab the latest release from
[nominatim-ui's Github release page](https://github.com/osm-search/nominatim-ui/releases)
and unpack it. You can use `nominatim-ui-x.x.x.tar.gz` or `nominatim-ui-x.x.x.zip`.
Copy the example configuration into the right place:
cd nominatim-ui
cp dist/config.example.js dist/config.js
Now adapt the configuration to your needs. You need at least
to change the `Nominatim_API_Endpoint` to point to your Nominatim installation.
Then you can just test it locally by spinning up a webserver in the `dist`
directory. For example, with Python:
cd nominatim-ui/dist
python3 -m http.server 8765
The website is now available at `http://localhost:8765`.
## Forwarding searches to nominatim-ui
Nominatim used to provide the search interface directly by itself when
`format=html` was requested. For all endpoints except for `/reverse` and
`/lookup` this even used to be the default.
The following section describes how to set up Apache or nginx, so that your
users are forwarded to nominatim-ui when they go to a URL that formerly presented
the UI.
### Setting up forwarding in Nginx
First of all make nominatim-ui available under `/ui` on your webserver:
``` nginx
server {
    # Here is the Nominatim setup as described in the Installation section
    location /ui/ {
        alias <full path to the nominatim-ui directory>/dist/;
        index index.html;
    }
}
```
Now we need to find out if a URL should be forwarded to the UI. Add the
following `map` commands *outside* the server section:
``` nginx
# Inspect the format parameter in the query arguments. We are interested
# if it is set to html or something else or if it is missing completely.
map $args $format {
    default                  default;
    ~(^|&)format=html(&|$)   html;
    ~(^|&)format=            other;
}
# Determine from the URI and the format parameter above if forwarding is needed.
map $uri/$format $forward_to_ui {
    default               1;   # The default is to forward.
    ~^/ui                 0;   # If the URI points to the UI already, we are done.
    ~/other$              0;   # An explicit non-html format parameter. No forwarding.
    ~/reverse.*/default   0;   # Reverse and lookup assume xml format when
    ~/lookup.*/default    0;   # no format parameter is given. No forwarding.
}
```
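To check which requests the two `map` blocks would forward, their logic can be mimicked in Python. This is a sketch for reasoning about the rules, not part of the nginx setup. The function names are made up for this example, and it assumes that `$args` holds the raw query string without the leading `?`.

```python
import re


def format_class(args):
    """Mimic the first map: classify the format query parameter."""
    if re.search(r"(^|&)format=html(&|$)", args):
        return "html"
    if re.search(r"(^|&)format=", args):
        return "other"
    return "default"


def forward_to_ui(uri, args=""):
    """Mimic the second map: decide if a request is forwarded to the UI."""
    key = f"{uri}/{format_class(args)}"
    # Every pattern below maps to 0 (no forwarding); anything else forwards.
    no_forward = (r"^/ui", r"/other$", r"/reverse.*/default", r"/lookup.*/default")
    return not any(re.search(pattern, key) for pattern in no_forward)
```

With these rules `forward_to_ui("/search", "q=Berlin")` is true, while `forward_to_ui("/reverse", "lat=1&lon=2")` and `forward_to_ui("/search", "q=Berlin&format=json")` are false.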
The `$forward_to_ui` parameter can now be used to conditionally forward the
calls:
```
# When no endpoint is given, default to search.
# Need to add a rewrite so that the rewrite rules below catch it correctly.
rewrite ^/$ /search;
location @php {
    # fastcgi stuff..
    if ($forward_to_ui) {
        rewrite ^(/[^/]*) https://yourserver.com/ui$1.html redirect;
    }
}
location ~ [^/]\.php(/|$) {
    # fastcgi stuff..
    if ($forward_to_ui) {
        rewrite (.*).php https://yourserver.com/ui$1.html redirect;
    }
}
```
!!! warning
Be aware that the rewrite commands are slightly different for URIs with and
without the .php suffix.
Reload nginx and the UI should be available.
### Setting up forwarding in Apache
First of all make nominatim-ui available in the `ui/` subdirectory where
Nominatim is installed. For example, given you have set up an alias under
`nominatim` like this:
``` apache
Alias /nominatim /home/vagrant/build/website
```
you need to insert the following rules for nominatim-ui before that alias:
```
<Directory "/home/vagrant/nominatim-ui/dist">
DirectoryIndex search.html
Require all granted
</Directory>
Alias /nominatim/ui /home/vagrant/nominatim-ui/dist
```
Replace `/home/vagrant/nominatim-ui` with the directory where you have cloned
nominatim-ui.
!!! important
The alias for nominatim-ui must come before the alias for the Nominatim
website directory.
To set up forwarding, the Apache rewrite module is needed. Enable it with:
``` sh
sudo a2enmod rewrite
```
Then add rewrite rules to the `Directory` directive of the Nominatim website
directory like this:
``` apache
<Directory "/home/vagrant/build/website">
Options FollowSymLinks MultiViews
AddType text/html .php
Require all granted
RewriteEngine On
# This must correspond to the URL where nominatim can be found.
RewriteBase "/nominatim/"
# If no endpoint is given, then use search.
RewriteRule ^(/|$) "search.php"
# If format=html is explicitly requested, forward to the UI.
RewriteCond %{QUERY_STRING} "format=html"
RewriteRule ^([^/]+).php ui/$1.html [R,END]
# Same but .php suffix is missing.
RewriteCond %{QUERY_STRING} "format=html"
RewriteRule ^([^/]+) ui/$1.html [R,END]
# If no format parameter is there then forward anything
# but /reverse and /lookup to the UI.
RewriteCond %{QUERY_STRING} "!format="
RewriteCond %{REQUEST_URI} "!/lookup"
RewriteCond %{REQUEST_URI} "!/reverse"
RewriteRule ^([^/]+).php ui/$1.html [R,END]
# Same but .php suffix is missing.
RewriteCond %{QUERY_STRING} "!format="
RewriteCond %{REQUEST_URI} "!/lookup"
RewriteCond %{REQUEST_URI} "!/reverse"
RewriteRule ^([^/]+) ui/$1.html [R,END]
</Directory>
```
Restart Apache and the UI should be available.

View File

@@ -1,205 +0,0 @@
# Tokenizers
The tokenizer module in Nominatim is responsible for analysing the names given
to OSM objects and the terms of an incoming query in order to make sure they
can be matched appropriately.
Nominatim offers different tokenizer modules, which behave differently and have
different configuration options. This section describes the tokenizers and how
they can be configured.
!!! important
The use of a tokenizer is tied to a database installation. You need to choose
and configure the tokenizer before starting the initial import. Once the import
is done, you cannot switch to another tokenizer anymore. Reconfiguring the
chosen tokenizer is very limited as well. See the comments in each tokenizer
section.
## Legacy tokenizer
The legacy tokenizer implements the analysis algorithms of older Nominatim
versions. It uses a special PostgreSQL module to normalize names and queries.
This tokenizer is currently the default.
To enable the tokenizer add the following line to your project configuration:
```
NOMINATIM_TOKENIZER=legacy
```
The PostgreSQL module for the tokenizer is available in the `module` directory
and also installed with the remainder of the software under
`lib/nominatim/module/nominatim.so`. You can specify a custom location for
the module with
```
NOMINATIM_DATABASE_MODULE_PATH=<path to directory where nominatim.so resides>
```
This is particularly useful when the database runs on a different server.
See [Advanced installations](Advanced-Installations.md#importing-nominatim-to-an-external-postgresql-database) for details.
There are no other configuration options for the legacy tokenizer. All
normalization functions are hard-coded.
## ICU tokenizer
!!! danger
This tokenizer is currently in active development and still subject
to backwards-incompatible changes.
The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to
normalize names and queries. It also offers configurable decomposition and
abbreviation handling.
### How it works
On import the tokenizer processes names in the following four stages:
1. The **Normalization** part removes all non-relevant information from the
input.
2. Incoming names are now converted to **full names**. This process is currently
hard coded and mostly serves to handle name tags from OSM that contain
multiple names (e.g. [Biel/Bienne](https://www.openstreetmap.org/node/240097197)).
3. Next the tokenizer creates **variants** from the full names. These variants
cover decomposition and abbreviation handling. Variants are saved to the
database, so that it is not necessary to create the variants for a search
query.
4. The final **Tokenization** step converts the names to a simple ASCII form,
potentially removing further spelling variants for better matching.
At query time only stage 1) and 4) are used. The query is normalized and
tokenized and the resulting string used for searching in the database.
### Configuration
The ICU tokenizer is configured using a YAML file whose location can be set with
`NOMINATIM_TOKENIZER_CONFIG`. The configuration is read on import and then
saved as part of the internal database status. Later changes to the variable
have no effect.
Here is an example configuration file:
``` yaml
normalization:
- ":: lower ()"
- "ß > 'ss'" # German eszett is unambiguously equal to double ss
transliteration:
- !include /etc/nominatim/icu-rules/extended-unicode-to-asccii.yaml
- ":: Ascii ()"
variants:
- language: de
words:
- ~haus => haus
- ~strasse -> str
- language: en
words:
- road -> rd
- bridge -> bdge,br,brdg,bri,brg
```
The configuration file contains three sections:
`normalization`, `transliteration`, `variants`.
The normalization and transliteration sections each must contain a list of
[ICU transformation rules](https://unicode-org.github.io/icu/userguide/transforms/general/rules.html).
The rules are applied in the order in which they appear in the file.
You can also include additional rules from an external YAML file using the
`!include` tag. The included file must contain a valid YAML list of ICU rules
and may again include other files.
!!! warning
The ICU rule syntax contains special characters that conflict with the
YAML syntax. You should therefore always enclose the ICU rules in
double-quotes.
The variants section defines lists of replacements which create alternative
spellings of a name. To create the variants, a name is scanned from left to
right and the longest matching replacement is applied until the end of the
string is reached.
The variants section must contain a list of replacement groups. Each group
defines a set of properties that describes where the replacements are
applicable. In addition, the `words` section defines the list of replacements
to be made. The basic replacement description is of the form:
```
<source>[,<source>[...]] => <target>[,<target>[...]]
```
The left side contains one or more `source` terms to be replaced. The right side
lists one or more replacements. Each source is replaced with each replacement
term.
!!! tip
The source and target terms are internally normalized using the
normalization rules given in the configuration. This ensures that the
strings match as expected. In fact, it is better to use unnormalized
words in the configuration because then it is possible to change the
rules for normalization later without having to adapt the variant rules.
#### Decomposition
In its standard form, only full words match against the source. There
is a special notation to match the prefix and suffix of a word:
``` yaml
- ~strasse => str # matches "strasse" as full word and in suffix position
- hinter~ => hntr # matches "hinter" as full word and in prefix position
```
There is no facility to match a string in the middle of the word. The suffix
and prefix notation automatically trigger the decomposition mode: two variants
are created for each replacement, one with the replacement attached to the word
and one separate. So in the above example, the tokenization of "hauptstrasse" will
create the variants "hauptstr" and "haupt str". Similarly, the name "rote strasse"
triggers the variants "rote str" and "rotestr". By having decomposition work
both ways, it is sufficient to create the variants at index time. The variant
rules are not applied at query time.
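The effect of decomposition can be illustrated with a toy model in Python. This is not the tokenizer's implementation. The function `suffix_variants` is made up for this example, handles a single suffix rule such as `~strasse => str`, and only looks at the end of an already normalized name.

```python
def suffix_variants(name, source, target):
    """Apply one suffix rule with decomposition to a normalized name."""
    if not name.endswith(source):
        # The rule does not match, so the name stays as it is.
        return {name}
    stem = name[: -len(source)].rstrip()
    if not stem:
        # The source matched as the whole name.
        return {target}
    # One variant with the replacement attached, one with it separate.
    return {f"{stem}{target}", f"{stem} {target}"}
```

Both `suffix_variants("hauptstrasse", "strasse", "str")` and the two-word name "haupt strasse" produce the same set of variants, which is why it is enough to create them at index time.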
To avoid automatic decomposition, use the '|' notation:
``` yaml
- ~strasse |=> str
```
This simply changes "hauptstrasse" to "hauptstr" and "rote strasse" to "rote str".
#### Initial and final terms
It is also possible to restrict replacements to the beginning and end of a
name:
``` yaml
- ^south => s # matches only at the beginning of the name
- road$ => rd # matches only at the end of the name
```
So the first example would trigger a replacement for "south 45th street" but
not for "the south beach restaurant".
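The anchoring can be pictured with a small word-based sketch. It assumes the name is already normalized to lower case and treats the rule as a pure replacement; `apply_anchored_rule` is a hypothetical helper, not part of Nominatim.

```python
def apply_anchored_rule(name: str, source: str, target: str) -> str:
    """Apply a '^term' or 'term$' replacement to a normalized name.

    Illustration only: '^' restricts the match to the first word,
    '$' to the last word of the name.
    """
    words = name.split()
    if not words:
        return name
    if source.startswith("^") and words[0] == source[1:]:
        words[0] = target
    elif source.endswith("$") and words[-1] == source[:-1]:
        words[-1] = target
    return " ".join(words)


replaced = apply_anchored_rule("south 45th street", "^south", "s")
untouched = apply_anchored_rule("the south beach restaurant", "^south", "s")
```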
#### Replacements vs. variants
The replacement syntax `source => target` works as a pure replacement. It changes
the name instead of creating a variant. To create an additional version, you'd
have to write `source => source,target`. As this is a frequent case, there is
a shortcut notation for it:
```
<source>[,<source>[...]] -> <target>[,<target>[...]]
```
The simple arrow causes an additional variant to be added. Note that
decomposition has an effect here on the source as well. So a rule
``` yaml
- "~strasse -> str"
```
means that for a word like `hauptstrasse` four variants are created:
`hauptstrasse`, `haupt strasse`, `hauptstr` and `haupt str`.
### Reconfiguration
Changing the configuration after the import is currently not possible, although
this feature may be added at a later time.

View File

@@ -1,56 +0,0 @@
# Updating the Database
There are many different ways to update your Nominatim database.
The following section describes how to keep it up-to-date using
an [online replication service for OpenStreetMap data](https://wiki.openstreetmap.org/wiki/Planet.osm/diffs).
For a list of other methods to add or update data see the output of
`nominatim add-data --help`.
!!! important
If you have configured a flatnode file for the import, then you
need to keep this flatnode file around for updates.
#### Installing the newest version of Pyosmium
It is recommended to install Pyosmium via pip. Make sure to use python3.
Run (as the same user who will later run the updates):
```sh
pip3 install --user osmium
```
#### Setting up the update process
Next the update needs to be initialised. By default Nominatim is configured
to update using the global minutely diffs.
If you want a different update source you will need to add some settings
to `.env`. For example, to use the daily country extracts
diffs for Ireland from Geofabrik add the following:
    # base URL of the replication service
    NOMINATIM_REPLICATION_URL="https://download.geofabrik.de/europe/ireland-and-northern-ireland-updates"
    # How often upstream publishes diffs (in seconds)
    NOMINATIM_REPLICATION_UPDATE_INTERVAL=86400
    # How long to sleep if no update found yet (in seconds)
    NOMINATIM_REPLICATION_RECHECK_INTERVAL=900
To set up the update process now run the following command:
    nominatim replication --init
It outputs the date where updates will start. Recheck that this date is
what you expect.
The `replication --init` command needs to be rerun whenever the replication
service is changed.
#### Updating Nominatim
The following command will keep your database constantly up to date:
    nominatim replication
If you have imported multiple country extracts and want to keep them
up-to-date, the [Advanced installations section](Advanced-Installations.md) contains instructions
to set up and update multiple country extracts.

View File

@@ -1,22 +1,19 @@
# Place details
Show all details about a single place saved in the database.
Look up details about a single place by id. The default output is HTML for debugging search logic and results.
!!! warning
The details page exists for debugging only. You may not use it in scripts
or to automatically query details about a result.
See [Nominatim Usage Policy](https://operations.osmfoundation.org/policies/nominatim/).
**The details page (including JSON output) exists for debugging only and must not be downloaded automatically**, see [Nominatim Usage Policy](https://operations.osmfoundation.org/policies/nominatim/).
## Parameters
The details API supports the following two request formats:
``` xml
https://nominatim.openstreetmap.org/details?osmtype=[N|W|R]&osmid=<value>&class=<value>
```
https://nominatim.openstreetmap.org/details?osmtype=[N|W|R]&osmid=<value>&class=<value>
```
`osmtype` and `osmid` are required parameters. The type is one of node (N), way (W)
`osmtype` and `osmid` are required parameter. The type is one of node (N), way (W)
or relation (R). The id must be a number. The `class` parameter is optional and
allows you to distinguish between entries when the corresponding OSM object has more
than one main tag. For example, when a place is tagged with `tourism=hotel` and
@@ -26,34 +23,36 @@ to get exactly the one you want. If there are multiple places in the database
but the `class` parameter is left out, then one of the places will be chosen
at random and displayed.
``` xml
https://nominatim.openstreetmap.org/details?place_id=<value>
```
https://nominatim.openstreetmap.org/details?place_id=<value>
```
Place IDs are assigned sequentially during Nominatim data import. The ID
for a place is different between Nominatim installation (servers) and
changes when data gets reimported. Therefore it cannot be used as
a permanent id and shouldn't be used in bug reports.
Placeids are assigned sequentially during Nominatim data import. The id for a place is different between Nominatim installation (servers) and changes when data gets reimported. Therefore it can't be used as permanent id and shouldn't be used in bug reports.
Additional optional parameters are explained below.
### Output format
* `format=[html|json]`
See [Place Output Formats](Output.md) for details on each format. (Default: html)
* `json_callback=<string>`
Wrap JSON output in a callback function (JSONP) i.e. `<string>(<json>)`.
Only has an effect for JSON output formats.
* `pretty=[0|1]`
Add indentation to make it more human-readable. (Default: 0)
For JSON output will add indentation to make it more human-readable. (Default: 0)
### Output details
* `addressdetails=[0|1]`
Include a breakdown of the address into elements. (Default: 0)
Include a breakdown of the address into elements. (Default for JSON: 0, for HTML: 1)
* `keywords=[0|1]`
@@ -61,16 +60,11 @@ Include a list of name keywords and address keywords (word ids). (Default: 0)
* `linkedplaces=[0|1]`
Include details of places that are linked with this one. Places get linked
together when they are different forms of the same physical object. Nominatim
links two kinds of objects together: place nodes get linked with the
corresponding administrative boundaries. Waterway relations get linked together with their
members.
(Default: 1)
Include details of places higher in the address hierarchy. E.g. for a street this is usually the city, state, postal code, country. (Default: 1)
* `hierarchy=[0|1]`
Include details of places lower in the address hierarchy. (Default: 0)
Include details of places lower in the address hierarchy. E.g. for a city this is usually a list of streets, suburbs, rivers. (Default for JSON: 0, for HTML: 1)
* `group_hierarchy=[0|1]`
@@ -78,7 +72,7 @@ For JSON output will group the places by type. (Default: 0)
* `polygon_geojson=[0|1]`
Include geometry of result. (Default: 0)
Include geometry of result. (Default for JSON: 0, for HTML: 1)
### Language of results
@@ -92,6 +86,10 @@ comma-separated list of language codes.
## Examples
##### HTML
[https://nominatim.openstreetmap.org/details.php?osmtype=W&osmid=38210407](https://nominatim.openstreetmap.org/details.php?osmtype=W&osmid=38210407)
##### JSON
[https://nominatim.openstreetmap.org/details.php?osmtype=W&osmid=38210407&format=json](https://nominatim.openstreetmap.org/details.php?osmtype=W&osmid=38210407&format=json)

View File

@@ -58,4 +58,4 @@ The [Overpass API](https://wiki.openstreetmap.org/wiki/Overpass_API) is more
suited for these kinds of queries.
That said if you installed your own Nominatim instance you can use the
`nominatim export` PHP script as basis to return such lists.
`/utils/export.php` PHP script as basis to return such lists.

View File

@@ -56,21 +56,6 @@ specified in the "Accept-Language" HTTP header.
Either use a standard RFC2616 accept-language string or a simple
comma-separated list of language codes.
### Polygon output
* `polygon_geojson=1`
* `polygon_kml=1`
* `polygon_svg=1`
* `polygon_text=1`
Output geometry of results as a GeoJSON, KML, SVG or WKT. Only one of these
options can be used at a time. (Default: 0)
* `polygon_threshold=0.0`
Return a simplified version of the output geometry. The parameter is the
tolerance in degrees with which the geometry may differ from the original
geometry. Topology is preserved in the result. (Default: 0.0)
### Other

View File

@@ -2,10 +2,12 @@
The [/reverse](Reverse.md), [/search](Search.md) and [/lookup](Lookup.md)
API calls produce very similar output which is explained in this section.
There is one section for each format. The format corresponds to what was
selected via the `format` parameter.
There is one section for each format which is selectable via the `format`
parameter.
## JSON
## Formats
### JSON
The JSON format returns an array of places (for search and lookup) or
a single place (for reverse) of the following format:
@@ -39,50 +41,48 @@ a single place (for reverse) of the following format:
"wikipedia": "en:London",
"population": "8416535"
}
}
},
```
The possible fields are:
* `place_id` - reference to the Nominatim internal database ID ([see notes](#place_id-is-not-a-persistent-id))
* `osm_type`, `osm_id` - reference to the OSM object ([see notes](#osm-reference))
* `boundingbox` - area of corner coordinates ([see notes](#boundingbox))
* `place_id` - reference to the Nominatim internal database ID (see notes below)
* `osm_type`, `osm_id` - reference to the OSM object
* `boundingbox` - area of corner coordinates
* `lat`, `lon` - latitude and longitude of the centroid of the object
* `display_name` - full comma-separated address
* `class`, `type` - key and value of the main OSM tag
* `importance` - computed importance rank
* `icon` - link to class icon (if available)
* `address` - dictionary of address details (only with `addressdetails=1`,
[see notes](#addressdetails))
* `address` - dictionary of address details (only with `addressdetails=1`)
* `extratags` - dictionary with additional useful tags like website or maxspeed
(only with `extratags=1`)
* `namedetails` - dictionary with full list of available names including ref etc.
* `geojson`, `svg`, `geotext`, `geokml` - full geometry
(only with the appropriate `polygon_*` parameter)
## JSONv2
### JSONv2
This is the same as the JSON format with two changes:
* `class` renamed to `category`
* additional field `place_rank` with the search rank of the object
## GeoJSON
### GeoJSON
This format follows the [RFC7946](https://geojson.org). Every feature includes
a bounding box (`bbox`).
The properties object has the following fields:
The feature list has the following fields:
* `place_id` - reference to the Nominatim internal database ID ([see notes](#place_id-is-not-a-persistent-id))
* `osm_type`, `osm_id` - reference to the OSM object ([see notes](#osm-reference))
* `place_id` - reference to the Nominatim internal database ID (see notes below)
* `osm_type`, `osm_id` - reference to the OSM object
* `category`, `type` - key and value of the main OSM tag
* `display_name` - full comma-separated address
* `place_rank` - class search rank
* `importance` - computed importance rank
* `icon` - link to class icon (if available)
* `address` - dictionary of address details (only with `addressdetails=1`,
[see notes](#addressdetails))
* `address` - dictionary of address details (only with `addressdetails=1`)
* `extratags` - dictionary with additional useful tags like `website` or `maxspeed`
(only with `extratags=1`)
* `namedetails` - dictionary with full list of available names including ref etc.
@@ -90,36 +90,39 @@ The properties object has the following fields:
Use `polygon_geojson` to output the full geometry of the object instead
of the centroid.
## GeocodeJSON
### GeocodeJSON
The GeocodeJSON format follows the
[GeocodeJSON spec 0.1.0](https://github.com/geocoders/geocodejson-spec).
The following feature attributes are implemented:
* `osm_type`, `osm_id` - reference to the OSM object (unofficial extension, [see notes](#osm-reference))
* `osm_type`, `osm_id` - reference to the OSM object (unofficial extension)
* `type` - value of the main tag of the object (e.g. residential, restaurant, ...)
* `label` - full comma-separated address
* `name` - localised name of the place
* `housenumber`, `street`, `locality`, `district`, `postcode`, `city`,
`county`, `state`, `country` -
* `housenumber`, `street`, `locality`, `postcode`, `city`,
`district`, `county`, `state`, `country` -
provided when it can be determined from the address
(see [this issue](https://github.com/openstreetmap/Nominatim/issues/1080) for
current limitations on the correctness of the address) and `addressdetails=1`
was given
* `admin` - list of localised names of administrative boundaries (only with `addressdetails=1`)
Use `polygon_geojson` to output the full geometry of the object instead
of the centroid.
## XML
### XML
The XML response returns one or more place objects in slightly different
formats depending on the API call.
### Reverse
#### Reverse
```
<reversegeocode timestamp="Sat, 11 Aug 18 11:53:21 +0000"
attribution="Data © OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright"
querystring="lat=48.400381&lon=11.745876&zoom=5&format=xml">
<result place_id="179509537" osm_type="relation" osm_id="2145268" ref="BY" place_rank="15" address_rank="15"
<result place_id="179509537" osm_type="relation" osm_id="2145268" ref="BY"
lat="48.9467562" lon="11.4038717"
boundingbox="47.2701114,50.5647142,8.9763497,13.8396373">
Bavaria, Germany
@@ -145,11 +148,11 @@ attribution to OSM and the original querystring.
The place information can be found in the `result` element. The attributes of that element contain:
* `place_id` - reference to the Nominatim internal database ID ([see notes](#place_id-is-not-a-persistent-id))
* `osm_type`, `osm_id` - reference to the OSM object ([see notes](#osm-reference))
* `place_id` - reference to the Nominatim internal database ID (see notes below)
* `osm_type`, `osm_id` - reference to the OSM object
* `ref` - content of `ref` tag if it exists
* `lat`, `lon` - latitude and longitude of the centroid of the object
* `boundingbox` - comma-separated list of corner coordinates ([see notes](#boundingbox))
* `boundingbox` - comma-separated list of corner coordinates
The full address of the result can be found in the content of the
`result` element as a comma-separated list.
@@ -157,14 +160,14 @@ The full address of the result can be found in the content of the
Additional information requested with `addressdetails=1`, `extratags=1` and
`namedetails=1` can be found in extra elements.
### Search and Lookup
#### Search and Lookup
```
<searchresults timestamp="Sat, 11 Aug 18 11:55:35 +0000"
attribution="Data © OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright"
querystring="london" polygon="false" exclude_place_ids="100149"
more_url="https://nominatim.openstreetmap.org/search.php?q=london&addressdetails=1&extratags=1&exclude_place_ids=100149&format=xml&accept-language=en-US%2Cen%3Bq%3D0.7%2Cde%3Bq%3D0.3">
<place place_id="100149" osm_type="node" osm_id="107775" place_rank="15" address_rank="15"
<place place_id="100149" osm_type="node" osm_id="107775" place_rank="15"
boundingbox="51.3473219,51.6673219,-0.2876474,0.0323526" lat="51.5073219" lon="-0.1276474"
display_name="London, Greater London, England, SW1A 2DU, United Kingdom"
class="place" type="city" importance="0.9654895765402"
@@ -200,13 +203,12 @@ generic information about the query:
The place information can be found in the `place` elements, of which there may
be more than one. The attributes of that element contain:
* `place_id` - reference to the Nominatim internal database ID ([see notes](#place_id-is-not-a-persistent-id))
* `osm_type`, `osm_id` - reference to the OSM object ([see notes](#osm-reference))
* `place_id` - reference to the Nominatim internal database ID (see notes below)
* `osm_type`, `osm_id` - reference to the OSM object
* `ref` - content of `ref` tag if it exists
* `lat`, `lon` - latitude and longitude of the centroid of the object
* `boundingbox` - comma-separated list of corner coordinates ([see notes](#boundingbox))
* `place_rank` - class [search rank](../develop/Ranking#search-rank)
* `address_rank` - place [address rank](../develop/Ranking#address-rank)
* `boundingbox` - comma-separated list of corner coordinates
* `place_rank` - class search rank
* `display_name` - full comma-separated address
* `class`, `type` - key and value of the main OSM tag
* `importance` - computed importance rank
@@ -216,19 +218,17 @@ When `addressdetails=1` is requested, the localised address parts appear
as subelements with the type of the address part.
Additional information requested with `extratags=1` and `namedetails=1` can
be found in extra elements as sub-element of `extratags` and `namedetails`
respectively.
be found in extra elements as sub-element of each place.
## Notes on field values
### place_id is not a persistent id
The `place_id` is an internal identifier that is assigned when data is imported
into a Nominatim database. The same OSM object will have a different value
on another server. It may even change its ID on the same server when it is
removed and reimported while updating the database with fresh OSM data.
It is thus not useful to treat it as permanent for later use.
The `place_id` is created when a Nominatim database gets installed. A
single place will have a different value on another server or even when
the same data gets re-imported. It's thus not useful to treat it as
permanent for later use.
The combination `osm_type`+`osm_id` is slightly better, but remember that in
OpenStreetMap mappers can delete, split, recreate places (and those
@@ -237,59 +237,10 @@ Places can also change their meaning without changing their `osm_id`,
e.g. when a restaurant is retagged as supermarket. For a more in-depth
discussion see [Permanent ID](https://wiki.openstreetmap.org/wiki/Permanent_ID).
If you need an ID that is consistent over multiple installations of Nominatim,
then you should use the combination of `osm_type`+`osm_id`+`class`.
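As a sketch of that advice, a client could derive a portable key from a result like this. The function name and the sample record are assumptions; the fallback to `category` covers the JSONv2 format, where `class` is renamed.

```python
from typing import Optional


def portable_key(place: dict) -> Optional[str]:
    """Build an 'osm_type/osm_id/class' key from one result.

    Illustration only. Returns None for results without an OSM
    reference (for example postcode results).
    """
    osm_type = place.get("osm_type")
    osm_id = place.get("osm_id")
    # JSON uses 'class', JSONv2 and GeoJSON use 'category'.
    category = place.get("class") or place.get("category")
    if osm_type is None or osm_id is None or category is None:
        return None
    return f"{osm_type}/{osm_id}/{category}"


# Made-up sample record for demonstration purposes.
key = portable_key({"osm_type": "node", "osm_id": 107775, "class": "place"})
```

Such a key survives a re-import or a move to another server, but it still changes when mappers delete or recreate the object in OSM.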
### OSM reference
Nominatim may sometimes return special objects that do not correspond directly
to an object in OpenStreetMap. These are:
* **Postcodes**. Nominatim returns a postcode point created from all mapped
postcodes of the same name. The class and type of these objects are `place=postcode`.
No `osm_type` and `osm_id` are included in the result.
* **Housenumber interpolations**. Nominatim returns a single interpolated
housenumber from the interpolation way. The class and type are `place=house`
and `osm_type` and `osm_id` correspond to the interpolation way in OSM.
* **TIGER housenumber.** Nominatim returns a single interpolated housenumber
from the TIGER data. The class and type are `place=house`
and `osm_type` and `osm_id` correspond to the street mentioned in the result.
Please note that the `osm_type` and `osm_id` returned may be changed in the
future. You should not expect to only find `node`, `way` and `relation` for
the type.
Nominatim merges some places (e.g. center node of a city with the boundary
relation) so `osm_type`+`osm_id`+`class_name` would be more unique.
### boundingbox
Comma separated list of min latitude, max latitude, min longitude, max longitude.
The whole planet would be `-90,90,-180,180`.
Can be used to pan and center the map on the result, for example with the
Leaflet.js mapping library:
`map.fitBounds([[bbox[0],bbox[2]],[bbox[1],bbox[3]]], {padding: [20, 20], maxzoom: 16});`
Bounds crossing the antimeridian have a min longitude of -180 and a max longitude of 180,
essentially covering the entire planet
(see [issue 184](https://github.com/openstreetmap/Nominatim/issues/184)).
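Because the order (latitudes first, then longitudes) is easy to mix up, a small parsing helper can make it explicit. This is a minimal sketch; the function names and the returned dictionary keys are assumptions, not part of the API.

```python
def parse_boundingbox(value):
    """Turn 'min_lat,max_lat,min_lon,max_lon' into named edges.

    Accepts the comma-separated string from XML output or the list of
    strings from JSON output. Illustration only.
    """
    parts = value.split(",") if isinstance(value, str) else value
    min_lat, max_lat, min_lon, max_lon = (float(p) for p in parts)
    return {"south": min_lat, "north": max_lat, "west": min_lon, "east": max_lon}


def to_corner_pairs(bbox):
    """Return [[south, west], [north, east]], i.e. two [lat, lon] corners."""
    return [[bbox["south"], bbox["west"]], [bbox["north"], bbox["east"]]]


london = parse_boundingbox("51.3473219,51.6673219,-0.2876474,0.0323526")
corners = to_corner_pairs(london)
```

The corner pairs have the same shape as the argument passed to `fitBounds` in the Leaflet snippet.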
### addressdetails
Address details in the xml and json formats return a list of names together
with a designation label. Per default the following labels may appear:
* continent
* country, country_code
* region, state, state_district, county
* municipality, city, town, village
* city_district, district, borough, suburb, subdivision
* hamlet, croft, isolated_dwelling
* neighbourhood, allotments, quarter
* city_block, residential, farm, farmyard, industrial, commercial, retail
* road
* house_number, house_name
* emergency, historic, military, natural, landuse, place, railway,
man_made, aerialway, boundary, amenity, aeroway, club, craft, leisure,
office, mountain_pass, shop, tourism, bridge, tunnel, waterway
They roughly correspond to the classification of the OpenStreetMap data
according to either the `place` tag or the main key of the object.

View File

@@ -7,7 +7,7 @@ Its API has the following endpoints for querying the data:
* __[/search](Search.md)__ - search OSM objects by name or type
* __[/reverse](Reverse.md)__ - search OSM object by their location
* __[/lookup](Lookup.md)__ - look up address details for OSM objects by their ID
* __[/status](Status.md)__ - query the status of the server
* __/status__ - query the status of the server
* __/deletable__ - list objects that have been deleted in OSM but are held
back in Nominatim in case the deletion was accidental
* __/polygons__ - list of broken polygons detected by Nominatim

View File

@@ -1,48 +1,36 @@
# Reverse Geocoding
Reverse geocoding generates an address from a latitude and longitude.
## How it works
The reverse geocoding API does not exactly compute the address for the
coordinate it receives. It works by finding the closest suitable OSM object
and returning its address information. This may occasionally lead to
unexpected results.
First of all, Nominatim only includes OSM objects in
its index that are suitable for searching. Small, unnamed paths for example
are missing from the database and can therefore not be used for reverse
geocoding either.
The other issue to be aware of is that the closest OSM object may not always
have a similar enough address to the coordinate you were requesting. For
example, in dense city areas it may belong to a completely different street.
Reverse geocoding generates an address from a latitude and longitude or from
an OSM object.
## Parameters
The main format of the reverse API is
```
https://nominatim.openstreetmap.org/reverse?lat=<value>&lon=<value>&<params>
https://nominatim.openstreetmap.org/reverse?<query>
```
where `lat` and `lon` are latitude and longitude of a coordinate in WGS84
projection. The API returns exactly one result or an error when the coordinate
is in an area with no OSM data coverage.
There are two ways to specify the requested location:
Additional parameters are accepted as listed below.
* `lat=<value>` `lon=<value>`
!!! warning "Deprecation warning"
The reverse API used to allow address lookup for a single OSM object by
its OSM id. This use is now deprecated. Use the [Address Lookup API](../Lookup)
instead.
A geographic location to generate an address for. The coordinates must be
in WGS84 format.
* `osm_type=[N|W|R]` `osm_id=<value>`
A specific OSM node(N), way(W) or relation(R) to return an address for.
In both cases exactly one object is returned. The two input parameters cannot
be used at the same time. Both accept the additional optional parameters listed
below.
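A coordinate-based request can be assembled as in the following sketch. It only builds the URL and sends nothing; `reverse_url` is a hypothetical helper, and the base URL is the public server used in the examples of this page.

```python
from urllib.parse import urlencode

BASE_URL = "https://nominatim.openstreetmap.org"


def reverse_url(lat: float, lon: float, **params) -> str:
    """Build a /reverse request URL for a WGS84 coordinate.

    Illustration only; extra keyword arguments are passed through as
    query parameters (e.g. zoom, format).
    """
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError("coordinate outside the WGS84 range")
    query = {"lat": lat, "lon": lon, **params}
    return f"{BASE_URL}/reverse?{urlencode(query)}"


url = reverse_url(48.400381, 11.745876, zoom=5, format="xml")
```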
### Output format
* `format=[xml|json|jsonv2|geojson|geocodejson]`
See [Place Output Formats](Output.md) for details on each format. (Default: xml)
See [Place Output Formats](Output.md) for details on each format. (Default: html)
* `json_callback=<string>`
@@ -81,9 +69,8 @@ comma-separated list of language codes.
* `zoom=[0-18]`
Level of detail required for the address. Default: 18. This is a number that
corresponds roughly to the zoom level used in XYZ tile sources in frameworks
like Leaflet.js, Openlayers etc.
Level of detail required for the address. Default: 18. This is a number that corresponds
roughly to the zoom level used in map frameworks like Leaflet.js, Openlayers etc.
In terms of address details the zoom levels are as follows:
zoom | address detail
@@ -110,7 +97,7 @@ options can be used at a time. (Default: 0)
* `polygon_threshold=0.0`
Return a simplified version of the output geometry. The parameter is the
Simplify the output geometry before returning. The parameter is the
tolerance in degrees with which the geometry may differ from the original
geometry. Topology is preserved in the result. (Default: 0.0)
@@ -162,7 +149,7 @@ This overrides the specified machine readable format. (Default: 0)
"licence":"Data © OpenStreetMap contributors, ODbL 1.0. https:\/\/www.openstreetmap.org\/copyright",
"osm_type":"way",
"osm_id":"280940520",
"lat":"-34.4391708",
"lat":"-34.4391708",
"lon":"-58.7064573",
"place_rank":"26",
"category":"highway",

View File

@@ -1,27 +1,30 @@
# Search queries
The search API allows you to look up a location from a textual description
or address. Nominatim supports structured and free-form search queries.
The search API allows you to look up a location from a textual description.
Nominatim supports structured as well as free-form search queries.
The search query may also contain
[special phrases](https://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases)
which are translated into specific OpenStreetMap (OSM) tags (e.g. Pub => `amenity=pub`).
This can be used to narrow down the kind of objects to be returned.
!!! warning
Special phrases are not suitable to query all objects of a certain type in an
area. Nominatim will always just return a collection of the best matches. To
download OSM data by object type, use the [Overpass API](https://overpass-api.de/).
Note that this only limits the items to be found, it's not suited to return complete
lists of OSM objects of a specific type. For those use [Overpass API](https://overpass-api.de/).
## Parameters
The search API has the following format:
The search API has the following two formats:
```
https://nominatim.openstreetmap.org/search/<query>?<params>
```
This format only accepts a free-form query string where the
parts of the query are separated by slashes.
```
https://nominatim.openstreetmap.org/search?<params>
```
The search term may be specified with two different sets of parameters:
In this form, the query may be given through two different sets of parameters:
* `q=<query>`
@@ -43,13 +46,13 @@ The search term may be specified with two different sets of parameters:
Structured requests are faster but are less robust against alternative
OSM tagging schemas. **Do not combine with** `q=<query>` **parameter**.
Both query forms accept the additional parameters listed below.
All three query forms accept the additional parameters listed below.
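The parameter-based form with a free-form `q` can be sketched as follows. The helper only builds URLs and does not contact any server; `search_url` and its `exclude_place_ids` argument handling are assumptions made for this illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://nominatim.openstreetmap.org"


def search_url(query: str, exclude_place_ids=(), **params) -> str:
    """Build a /search request URL for a free-form query.

    Illustration only; place ids to skip are joined into the
    comma-separated list that `exclude_place_ids` expects.
    """
    fields = {"q": query, **params}
    if exclude_place_ids:
        fields["exclude_place_ids"] = ",".join(str(p) for p in exclude_place_ids)
    return f"{BASE_URL}/search?{urlencode(fields)}"


first_page = search_url("london", format="jsonv2", limit=10)
# Ask again, skipping a place that was already returned.
next_page = search_url("london", exclude_place_ids=[100149], format="jsonv2", limit=10)
```

Keeping the URL construction in one place makes it harder to accidentally combine `q` with the structured parameters.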
### Output format
* `format=[xml|json|jsonv2|geojson|geocodejson]`
* `format=[html|xml|json|jsonv2|geojson|geocodejson]`
See [Place Output Formats](Output.md) for details on each format. (Default: jsonv2)
See [Place Output Formats](Output.md) for details on each format. (Default: html)
* `json_callback=<string>`
@@ -89,20 +92,16 @@ comma-separated list of language codes.
* `countrycodes=<countrycode>[,<countrycode>][,<countrycode>]...`
Limit search results to one or more countries. `<countrycode>` must be the
[ISO 3166-1alpha2](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2) code,
e.g. `gb` for the United Kingdom, `de` for Germany.
ISO 3166-1alpha2 code, e.g. `gb` for the United Kingdom, `de` for Germany.
Each place in Nominatim is assigned to one country code based
on OSM country boundaries. In rare cases a place may not be in any country
at all, for example, in international waters.
* `exclude_place_ids=<place_id,[place_id],[place_id]>`
If you do not want certain OSM objects to appear in the search
result, give a comma separated list of the `place_id`s you want to skip.
This can be used to retrieve additional search results. For example, if a
previous query only returned a few results, then including those here would
cause the search to return other, less accurate, matches (if possible).
This can be used to broaden search results. For example, if a previous
query only returned a few results, then including those here would cause
the search to return other, less accurate, matches (if possible).
* `limit=<integer>`
@@ -113,17 +112,16 @@ Limit the number of returned results. (Default: 10, Maximum: 50)
* `viewbox=<x1>,<y1>,<x2>,<y2>`
The preferred area to find search results. Any two corner points of the box
are accepted as long as they span a real box. `x` is longitude,
are accepted in any order as long as they span a real box. `x` is longitude,
`y` is latitude.
* `bounded=[0|1]`
When a viewbox is given, restrict the result to items contained within that
When a viewbox is given, restrict the result to items contained with that
viewbox (see above). When `viewbox` and `bounded=1` are given, an amenity
only search is allowed. Give the special keyword for the amenity in square
brackets, e.g. `[pub]` and a selection of objects of this type is returned.
There is no guarantee that the result is complete. (Default: 0)
only search is allowed. In this case, give the special keyword for the
amenity in square brackets, e.g. `[pub]`. (Default: 0)
### Polygon output
@@ -138,7 +136,7 @@ options can be used at a time. (Default: 0)
* `polygon_threshold=0.0`
Return a simplified version of the output geometry. The parameter is the
Simplify the output geometry before returning. The parameter is the
tolerance in degrees with which the geometry may differ from the original
geometry. Topology is preserved in the result. (Default: 0.0)
@@ -152,11 +150,13 @@ address to identify your requests. See Nominatim's [Usage Policy](https://operat
* `dedupe=[0|1]`
Sometimes you have several objects in OSM identifying the same place or
object in reality. The simplest case is a street being split into many
object in reality. The simplest case is a street being split in many
different OSM ways due to different characteristics. Nominatim will
attempt to detect such duplicates and only return one match unless
this parameter is set to 0. (Default: 1)
* `debug=[0|1]`
Output assorted developer debug information. Data on internals of Nominatim's
@@ -168,27 +168,21 @@ This overrides the specified machine readable format. (Default: 0)
## Examples
##### XML with kml polygon
##### XML with polygon points
* [https://nominatim.openstreetmap.org/search?q=135+pilkington+avenue,+birmingham&format=xml&polygon_geojson=1&addressdetails=1](https://nominatim.openstreetmap.org/search?q=135+pilkington+avenue,+birmingham&format=xml&polygon_geojson=1&addressdetails=1)
* [https://nominatim.openstreetmap.org/search?q=135+pilkington+avenue,+birmingham&format=xml&polygon=1&addressdetails=1](https://nominatim.openstreetmap.org/search?q=135+pilkington+avenue,+birmingham&format=xml&polygon=1&addressdetails=1)
* [https://nominatim.openstreetmap.org/search/gb/birmingham/pilkington%20avenue/135?format=xml&polygon=1&addressdetails=1](https://nominatim.openstreetmap.org/search/gb/birmingham/pilkington%20avenue/135?format=xml&polygon=1&addressdetails=1)
* [https://nominatim.openstreetmap.org/search/135%20pilkington%20avenue,%20birmingham?format=xml&polygon=1&addressdetails=1](https://nominatim.openstreetmap.org/search/135%20pilkington%20avenue,%20birmingham?format=xml&polygon=1&addressdetails=1)
```xml
<searchresults timestamp="Sat, 07 Nov 09 14:42:10 +0000" querystring="135 pilkington, avenue birmingham" polygon="true">
<place
place_id="1620612" osm_type="node" osm_id="452010817"
boundingbox="52.548641204834,52.5488433837891,-1.81612110137939,-1.81592094898224"
polygonpoints="[['-1.81592098644987','52.5487429714954'],['-1.81592290792183','52.5487234624632'],...]"
lat="52.5487429714954" lon="-1.81602098644987"
display_name="135, Pilkington Avenue, Wylde Green, City of Birmingham, West Midlands (county), B72, United Kingdom"
class="place" type="house">
<geokml>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>-1.816513,52.548756599999997 -1.816434,52.548747300000002 -1.816429,52.5487629 -1.8163717,52.548756099999999 -1.8163464,52.548834599999999 -1.8164599,52.548848100000001 -1.8164685,52.5488213 -1.8164913,52.548824000000003 -1.816513,52.548756599999997</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</geokml>
<house_number>135</house_number>
<road>Pilkington Avenue</road>
<village>Wylde Green</village>


@@ -1,66 +0,0 @@
# Status
Useful for checking if the service and database are running. The JSON output also shows
when the database was last updated.
## Parameters
* `format=[text|json]` (defaults to 'text')
## Output
#### Text format
```
https://nominatim.openstreetmap.org/status.php
```
will return HTTP status code 200 and print `OK`.
On error it will return HTTP status code 500 and print a message, e.g.
`ERROR: Database connection failed`.
#### JSON format
```
https://nominatim.openstreetmap.org/status.php?format=json
```
will return HTTP code 200 and a structure
```json
{
"status": 0,
"message": "OK",
"data_updated": "2020-05-04T14:47:00+00:00",
"software_version": "3.6.0-0",
"database_version": "3.6.0-0"
}
```
The `software_version` field contains the version of Nominatim used to serve
the API. The `database_version` field contains the version of the data format
in the database.
On error it will also return HTTP status code 200 and a structure with an error
code and message, e.g.
```json
{
"status": 700,
"message": "Database connection failed"
}
```
Possible status codes are
| code | message | notes |
|-----|----------------------|---------------------------------------------------|
| 700 | "No database" | connection failed |
| 701 | "Module failed" | database could not load nominatim.so |
| 702 | "Module call failed" | nominatim.so loaded but calling a function failed |
| 703 | "Query failed" | test query against a database table failed |
| 704 | "No value" | test query worked but returned no results |
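
Because the JSON format returns HTTP status code 200 even on error, a client has to inspect the response body rather than the HTTP code. The following is a minimal sketch of such a check in Python. The function name `interpret_status` and the `STATUS_NOTES` mapping are hypothetical helpers written for this illustration; only the field names and status codes are taken from the description above.

```python
import json
from typing import Tuple

# Error codes and notes copied from the table above.
STATUS_NOTES = {
    700: "connection failed",
    701: "database could not load nominatim.so",
    702: "nominatim.so loaded but calling a function failed",
    703: "test query against a database table failed",
    704: "test query worked but returned no results",
}


def interpret_status(body: str) -> Tuple[bool, str]:
    """Return (healthy, description) for a status.php?format=json response body."""
    payload = json.loads(body)
    code = payload.get("status")
    message = payload.get("message", "")
    if code == 0:
        # A status of 0 means the service is fine; report the last data update.
        updated = payload.get("data_updated", "unknown")
        return True, f"{message} (data updated: {updated})"
    # Any other status is an error, even though the HTTP code was 200.
    note = STATUS_NOTES.get(code, "unlisted status code")
    return False, f"{code} {message}: {note}"
```

The function only parses a response body that has already been fetched, so it can be combined with any HTTP client.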


@@ -0,0 +1,4 @@
# Additional Data Sources
This guide explains how the data sources other than OpenStreetMap that are
mentioned in the install instructions were obtained and converted.


@@ -1,128 +0,0 @@
# Setting up Nominatim for Development
This chapter gives an overview of how to set up Nominatim for development
and how to run tests.
!!! Important
This guide assumes that you develop under the latest version of Ubuntu. You
can of course also use your favourite distribution. You just might have to
adapt the commands below slightly, in particular the commands for installing
additional software.
## Installing Nominatim
The first step is to install Nominatim itself. Please follow the installation
instructions in the [Admin section](../admin/Installation.md). You don't need
to set up a webserver for development, the webserver that is included with PHP
is sufficient.
If you want to run Nominatim in a VM via Vagrant, use the default `ubuntu` setup.
Vagrant's libvirt provider runs out-of-the-box under Ubuntu. You also need to
install an NFS daemon to enable directory sharing between host and guest. The
following packages should get you started:
sudo apt install vagrant vagrant-libvirt libvirt-daemon nfs-kernel-server
## Prerequisites for testing and documentation
The Nominatim test suite consists of behavioural tests (using behave) and
unit tests (using PHPUnit for PHP code and pytest for Python code).
It has the following additional requirements:
* [behave test framework](https://behave.readthedocs.io) >= 1.2.6
* [phpunit](https://phpunit.de) >= 7.3
* [PHP CodeSniffer](https://github.com/squizlabs/PHP_CodeSniffer)
* [Pylint](https://pylint.org/) (2.6.0 is used for the CI)
* [pytest](https://pytest.org)
The documentation is built with mkdocs:
* [mkdocs](https://www.mkdocs.org/) >= 1.1.2
### Installing prerequisites on Ubuntu/Debian
Some of the Python packages require the newest version which is not yet
available with the current distributions. Therefore it is recommended to
install pip to get the newest versions.
To install all necessary packages run:
```sh
sudo apt install php-cgi phpunit php-codesniffer \
python3-pip python3-setuptools python3-dev pylint
pip3 install --user behave mkdocs pytest
```
The `mkdocs` executable will be located in `.local/bin`. You may have to add
this directory to your path, for example by running:
```
echo 'export PATH=~/.local/bin:$PATH' >> ~/.profile
```
If your distribution does not have PHPUnit 7.3+, you can install it (as well
as CodeSniffer) via composer:
```
sudo apt-get install composer
composer global require "squizlabs/php_codesniffer=*"
composer global require "phpunit/phpunit=8.*"
```
The binaries are found in `.config/composer/vendor/bin`. You need to add this
to your PATH as well:
```
echo 'export PATH=~/.config/composer/vendor/bin:$PATH' >> ~/.profile
```
## Executing Tests
All tests are located in the `/test` directory.
To run all tests just go to the build directory and run make:
```sh
cd build
make test
```
For more information about the structure of the tests and how to change and
extend the test suite, see the [Testing chapter](Testing.md).
## Documentation Pages
The [Nominatim documentation](https://nominatim.org/release-docs/develop/) is
built using the [MkDocs](https://www.mkdocs.org/) static site generation
framework. The master branch is automatically deployed every night to
[https://nominatim.org/release-docs/develop/](https://nominatim.org/release-docs/develop/).
To build the documentation, go to the build directory and run
```
make doc
INFO - Cleaning site directory
INFO - Building documentation to directory: /home/vagrant/build/site-html
```
This runs `mkdocs build` plus extra transformation of some files and adds
symlinks (see `CMakeLists.txt` for the exact steps).
Now you can start a webserver for local testing:
```
build> mkdocs serve
[server:296] Serving on http://127.0.0.1:8000
[handlers:62] Start watching changes
```
If you develop inside a Vagrant virtual machine, use a port that is forwarded
to your host:
```
build> mkdocs serve --dev-addr 0.0.0.0:8088
[server:296] Serving on http://0.0.0.0:8088
[handlers:62] Start watching changes
```


@@ -0,0 +1,36 @@
# Documentation Pages
The [Nominatim documentation](https://nominatim.org/release-docs/develop/) is built using the [MkDocs](https://www.mkdocs.org/) static site generation framework. The master branch is automatically deployed every night to [https://nominatim.org/release-docs/develop/](https://nominatim.org/release-docs/develop/).
To preview local changes:
1. Install MkDocs
```
pip3 install --user mkdocs
```
2. In build directory run
```
make doc
INFO - Cleaning site directory
INFO - Building documentation to directory: /home/vagrant/build/site-html
```
This runs `mkdocs build` plus extra transformation of some files and adds symlinks (see `CMakeLists.txt` for the exact steps).
3. Start webserver for local testing
```
mkdocs serve
[server:296] Serving on http://127.0.0.1:8000
[handlers:62] Start watching changes
```
If you develop inside a Vagrant virtual machine:
* add port forwarding to your Vagrantfile, e.g. `config.vm.network "forwarded_port", guest: 8000, host: 8000`
* use `mkdocs serve --dev-addr 0.0.0.0:8000` because the default localhost
IP does not get forwarded.


@@ -1,8 +1,8 @@
# OSM Data Import
OSM data is initially imported using [osm2pgsql](https://osm2pgsql.org).
Nominatim uses its own data output style 'gazetteer', which differs from the
output style created for map rendering.
OSM data is initially imported using osm2pgsql. Nominatim uses its own data
output style 'gazetteer', which differs from the output style created for
map rendering.
## Database Layout
@@ -29,7 +29,7 @@ once with `class` of `highway` and once with a `class` of `bridge`. Thus the
## Configuring the Import
How tags are interpreted and assigned to the different `place` columns can be
configured via the import style configuration file (`NOMINATIM_IMPORT_STYLE`). This
configured via the import style configuration file (`CONST_Import_style`). This
is a JSON file which contains a list of rules which are matched against every
tag of every object and then assign the tag its specific role.


@@ -1,45 +0,0 @@
# Postcodes in Nominatim
The blog post
[Nominatim and Postcodes](https://www.openstreetmap.org/user/lonvia/diary/43143)
describes the handling implemented since Nominatim 3.1.
Postcode centroids (aka 'calculated postcodes') are generated by looking at all
postcodes of a country, grouping them and calculating the geometric centroid.
There is currently no logic to deal with extreme outliers (typos or other
mistakes in OSM data). There is also no check whether a postcode adheres to a
country's format, e.g. that Swiss postcodes have 4 digits.
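
The grouping and centroid calculation can be illustrated with a small Python sketch. This is not the code Nominatim runs, since the real calculation happens inside the database; the function name `postcode_centroids` and the input tuple layout are assumptions made for this example, and the centroid is approximated as the plain arithmetic mean of the point coordinates. Like the real calculation, it makes no attempt to filter outliers or validate the postcode format.

```python
from collections import defaultdict


def postcode_centroids(points):
    """Group (country_code, postcode, lon, lat) tuples and average the coordinates.

    Returns a dict mapping (country_code, postcode) to a (lon, lat) centroid.
    """
    groups = defaultdict(list)
    for country, postcode, lon, lat in points:
        # Normalise the postcode so that ' 33210 ' and '33210' share a group.
        key = (country, postcode.strip().upper())
        groups[key].append((lon, lat))

    centroids = {}
    for key, coords in groups.items():
        count = len(coords)
        mean_lon = sum(lon for lon, _ in coords) / count
        mean_lat = sum(lat for _, lat in coords) / count
        centroids[key] = (mean_lon, mean_lat)
    return centroids
```

A single mistyped postcode far away from the others would pull the mean towards it, which is the outlier problem mentioned above.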
## Regularly updating calculated postcodes
The script to rerun the calculation is
`nominatim refresh --postcodes`
and runs once per night on nominatim.openstreetmap.org.
## Finding places that share a specific postcode
In the Nominatim database run
```sql
SELECT address->'postcode' as pc,
osm_type, osm_id, class, type,
st_x(centroid) as lon, st_y(centroid) as lat
FROM placex
WHERE country_code='fr'
AND upper(trim (both ' ' from address->'postcode')) = '33210';
```
Alternatively on [Overpass](https://overpass-turbo.eu/) run the following query
```
[out:json][timeout:250];
area["name"="France"]->.boundaryarea;
(
nwr(area.boundaryarea)["addr:postcode"="33210"];
);
out body;
>;
out skel qt;
```


@@ -7,74 +7,24 @@ different purposes, which are explained in this chapter.
## Search rank
The search rank describes the extent and importance of a place. It is used
when ranking search results. Simply put, if there are two results for a
when ranking search result. Simply put, if there are two results for a
search query which are otherwise equal, then the result with the _lower_
search rank will appear higher in the result list.
Search ranks are not so important these days because many well-known
places use the Wikipedia importance ranking instead.
The following table gives an overview of the kind of features that Nominatim
expects for each rank:
rank | typical place types | extent
-------|---------------------------------|-------
1-3 | oceans, continents | -
4 | countries | -
5-9 | states, regions, provinces | -
10-12 | counties | -
13-16 | cities, municipalities, islands | 15 km
17-18 | towns, boroughs | 4 km
19 | villages, suburbs | 2 km
20 | hamlets, farms, neighbourhoods | 1 km
21-25 | isolated dwellings, city blocks | 500 m
The extent column describes how far a feature is assumed to reach when it
is mapped only as a point. Larger features like countries and states are usually
available with their exact area in the OpenStreetMap data. That is why no extent
is given.
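
As an illustration, the table above can be expressed as a small lookup in Python. The names `SEARCH_RANKS` and `describe_search_rank` are hypothetical and not part of Nominatim; the rank ranges and extents are copied from the table, with the extent given in metres and `None` where no extent is listed.

```python
# (lowest rank, highest rank, typical place types, extent in metres or None)
SEARCH_RANKS = [
    (1, 3, "oceans, continents", None),
    (4, 4, "countries", None),
    (5, 9, "states, regions, provinces", None),
    (10, 12, "counties", None),
    (13, 16, "cities, municipalities, islands", 15000),
    (17, 18, "towns, boroughs", 4000),
    (19, 19, "villages, suburbs", 2000),
    (20, 20, "hamlets, farms, neighbourhoods", 1000),
    (21, 25, "isolated dwellings, city blocks", 500),
]


def describe_search_rank(rank):
    """Return (typical place types, extent in metres) or None if the rank is not in the table."""
    for low, high, kinds, extent_m in SEARCH_RANKS:
        if low <= rank <= high:
            return kinds, extent_m
    return None
```

For example, `describe_search_rank(17)` yields the entry for towns and boroughs with an assumed extent of 4000 metres.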
## Address rank
The address rank describes where a place shows up in an address hierarchy.
Usually only administrative boundaries and place nodes and areas are
eligible to be part of an address. Places that should not appear in the
address must have an address rank of 0.
eligible to be part of an address. All other objects have an address rank
of 0.
The following table gives an overview how ranks are mapped to address parts:
rank | address part
-------------|-------------
1-3 | _unused_
4 | country
5-9 | state
10-12 | county
13-16 | city
17-21 | suburb
22-24 | neighbourhood
25 | squares, farms, localities
26-27 | street
28-30 | POI/house number
The country rank 4 usually doesn't show up in the address parts of an object.
The country is determined indirectly from the country code.
Ranks 5-24 can be assigned more or less freely. They make up the major part
of the address.
Rank 25 is also an addressing rank but it is special because while it can be
the parent to a POI with an addr:place of the same name, it cannot be a parent
to streets. Use it for place features that are technically on the same level
as a street (e.g. squares, city blocks) or for places that should not normally
appear in an address unless explicitly tagged so (e.g. place=locality which
should be uninhabited and as such not addressable).
The street ranks 26 and 27 are handled slightly differently. Only one object
from these ranks shows up in an address.
For POI level objects like shops, buildings or house numbers always use rank 30.
Rank 28 is reserved for house number interpolations. Rank 29 is for internal
use only.
Note that the search rank of a place plays a role in the address computation
as well. When collecting the places that should make up the address parts,
only places with an address rank lower than the search rank of the base
object are taken into account.
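
The mapping from address rank to address part, and the rule about the search rank of the base object, can be sketched in Python as follows. This is only an illustration of the rules described above, not the implementation used by Nominatim. The names `ADDRESS_PARTS`, `address_part` and `eligible_address_places` are hypothetical, and candidate places are assumed to be dictionaries with a `rank_address` key.

```python
# (lowest rank, highest rank, address part), copied from the table above.
ADDRESS_PARTS = [
    (4, 4, "country"),
    (5, 9, "state"),
    (10, 12, "county"),
    (13, 16, "city"),
    (17, 21, "suburb"),
    (22, 24, "neighbourhood"),
    (25, 25, "squares, farms, localities"),
    (26, 27, "street"),
    (28, 30, "POI/house number"),
]


def address_part(rank_address):
    """Return the address part for an address rank, or None for ranks 0-3."""
    for low, high, part in ADDRESS_PARTS:
        if low <= rank_address <= high:
            return part
    return None


def eligible_address_places(base_search_rank, candidates):
    """Keep candidates whose address rank is lower than the base object's search rank.

    Candidates with an address rank of 0 never appear in an address.
    """
    return [c for c in candidates if 0 < c["rank_address"] < base_search_rank]
```

With a base object of search rank 26, a candidate with address rank 16 is kept, while candidates with address rank 26 or 0 are dropped.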
## Rank configuration
@@ -87,9 +37,9 @@ into the database. There are a few hard-coded rules for the assignment:
* highway nodes
* landuse that is not an area
Other than that, the ranks can be freely assigned via the JSON file according
to their type and the country they are in. The name of the config file to be
used can be changed with the setting `NOMINATIM_ADDRESS_LEVEL_CONFIG`.
Other than that, the ranks can be freely assigned via the JSON file
defined with `CONST_Address_Level_Config` according to their type and
the country they are in.
The address level configuration must consist of an array of configuration
entries, each containing a tag definition and an optional country array:
@@ -134,7 +84,7 @@ Then the rank is used when no more specific value is found for the given
key.
Countries and key/value combination may appear in multiple definitions. Just
make sure that each combination of country/key/value appears only once per
make sure that each combination of counrty/key/value appears only once per
file. Otherwise the import will fail with a UNIQUE INDEX constraint violation
on import.


@@ -1,34 +0,0 @@
# Additional Data Sources
This guide explains how the data sources other than OpenStreetMap that are
mentioned in the install instructions were obtained and converted.
## Country grid
Nominatim uses pre-generated country border data. It is needed when only
a subset of a country is imported, and to assign each place to a partition.
Nominatim database tables are split into partitions for performance.
More details in [osm-search/country-grid-data](https://github.com/osm-search/country-grid-data).
## US Census TIGER
For the United States you can choose to import additional street-level data.
The data isn't mixed into OSM data but queried as fallback when no OSM
result can be found.
More details in [osm-search/TIGER-data](https://github.com/osm-search/TIGER-data).
## GB postcodes
For Great Britain you can choose to import Royal Mail postcode centroids.
More details in [osm-search/gb-postcode-data](https://github.com/osm-search/gb-postcode-data).
## Wikipedia & Wikidata rankings
Nominatim can import "importance" data of place names. This greatly
improves ranking of results.
More details in [osm-search/wikipedia-wikidata](https://github.com/osm-search/wikipedia-wikidata).


@@ -9,14 +9,14 @@ the address computation and the search frontend.
The __data import__ stage reads the raw OSM data and extracts all information
that is useful for geocoding. This part is done by osm2pgsql, the same tool
that can also be used to import a rendering database. It uses the special
gazetteer output plugin in `osm2pgsql/src/output-gazetter.[ch]pp`. The result of
gazetteer output plugin in `osm2pgsql/output-gazetter.[ch]pp`. The result of
the import can be found in the database table `place`.
The __address computation__ or __indexing__ stage takes the data from `place`
and adds additional information needed for geocoding. It ranks the places by
importance, links objects that belong together and computes addresses and
the search index. Most of this work is done in PL/pgSQL via database triggers
and can be found in the files in the `sql/functions/` directory.
and can be found in the file `sql/functions.sql`.
The __search frontend__ implements the actual API. It takes search
and reverse geocoding queries from the user, looks up the data and


@@ -1,15 +1,3 @@
.toctree-l3 {
display: none!important
}
table {
margin-bottom: 12pt
}
th, td {
padding: 1pt 12pt;
}
th {
background-color: #eee;
}


@@ -1,7 +1,7 @@
site_name: Nominatim Documentation
theme: readthedocs
docs_dir: ${CMAKE_CURRENT_BINARY_DIR}
site_url: https://nominatim.org
site_url: http://nominatim.org
repo_url: https://github.com/openstreetmap/Nominatim
pages:
- 'Introduction' : 'index.md'
@@ -11,36 +11,31 @@ pages:
- 'Reverse': 'api/Reverse.md'
- 'Address Lookup': 'api/Lookup.md'
- 'Details' : 'api/Details.md'
- 'Status' : 'api/Status.md'
- 'Place Output Formats': 'api/Output.md'
- 'FAQ': 'api/Faq.md'
- 'Administration Guide':
- 'Basic Installation': 'admin/Installation.md'
- 'Import' : 'admin/Import.md'
- 'Update' : 'admin/Update.md'
- 'Deploy' : 'admin/Deployment.md'
- 'Customize Imports' : 'admin/Customization.md'
- 'Tokenizers' : 'admin/Tokenizers.md'
- 'Nominatim UI' : 'admin/Setup-Nominatim-UI.md'
- 'Advanced Installations' : 'admin/Advanced-Installations.md'
- 'Importing and Updating' : 'admin/Import-and-Update.md'
- 'Migration from older Versions' : 'admin/Migration.md'
- 'Troubleshooting' : 'admin/Faq.md'
- 'Developers Guide':
- 'Setup for Development' : 'develop/Development-Environment.md'
- 'Architecture Overview' : 'develop/overview.md'
- 'Overview' : 'develop/overview.md'
- 'OSM Data Import' : 'develop/Import.md'
- 'Place Ranking' : 'develop/Ranking.md'
- 'Postcodes' : 'develop/Postcodes.md'
- 'Testing' : 'develop/Testing.md'
- 'External Data Sources': 'develop/data-sources.md'
- 'Documentation' : 'develop/Documentation.md'
- 'External Data Sources':
- 'Overview' : 'data-sources/overview.md'
- 'US Census (Tiger)': 'data-sources/US-Tiger.md'
- 'GB Postcodes': 'data-sources/GB-Postcodes.md'
- 'Country Grid': 'data-sources/Country-Grid.md'
- 'Wikipedia & Wikidata': 'data-sources/Wikipedia-Wikidata.md'
- 'Appendix':
- 'Installation on CentOS 7' : 'appendix/Install-on-Centos-7.md'
- 'Installation on CentOS 8' : 'appendix/Install-on-Centos-8.md'
- 'Installation on Ubuntu 16' : 'appendix/Install-on-Ubuntu-16.md'
- 'Installation on Ubuntu 18' : 'appendix/Install-on-Ubuntu-18.md'
- 'Installation on Ubuntu 20' : 'appendix/Install-on-Ubuntu-20.md'
markdown_extensions:
- codehilite
- admonition
- codehilite:
use_pygments: False
- toc:
permalink:
extra_css: [extra.css, styles.css]
extra_css: [extra.css]


@@ -1,69 +0,0 @@
.codehilite .hll { background-color: #ffffcc }
.codehilite { background: #f0f0f0; }
.codehilite .c { color: #60a0b0; font-style: italic } /* Comment */
.codehilite .err { /* border: 1px solid #FF0000 */ } /* Error */
.codehilite .k { color: #007020; font-weight: bold } /* Keyword */
.codehilite .o { color: #666666 } /* Operator */
.codehilite .ch { color: #60a0b0; font-style: italic } /* Comment.Hashbang */
.codehilite .cm { color: #60a0b0; font-style: italic } /* Comment.Multiline */
.codehilite .cp { color: #007020 } /* Comment.Preproc */
.codehilite .cpf { color: #60a0b0; font-style: italic } /* Comment.PreprocFile */
.codehilite .c1 { color: #60a0b0; font-style: italic } /* Comment.Single */
.codehilite .cs { color: #60a0b0; background-color: #fff0f0 } /* Comment.Special */
.codehilite .gd { color: #A00000 } /* Generic.Deleted */
.codehilite .ge { font-style: italic } /* Generic.Emph */
.codehilite .gr { color: #FF0000 } /* Generic.Error */
.codehilite .gh { color: #000080; font-weight: bold } /* Generic.Heading */
.codehilite .gi { color: #00A000 } /* Generic.Inserted */
.codehilite .go { color: #888888 } /* Generic.Output */
.codehilite .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */
.codehilite .gs { font-weight: bold } /* Generic.Strong */
.codehilite .gu { color: #800080; font-weight: bold } /* Generic.Subheading */
.codehilite .gt { color: #0044DD } /* Generic.Traceback */
.codehilite .kc { color: #007020; font-weight: bold } /* Keyword.Constant */
.codehilite .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */
.codehilite .kn { color: #007020; font-weight: bold } /* Keyword.Namespace */
.codehilite .kp { color: #007020 } /* Keyword.Pseudo */
.codehilite .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */
.codehilite .kt { color: #902000 } /* Keyword.Type */
.codehilite .m { color: #40a070 } /* Literal.Number */
.codehilite .s { color: #4070a0 } /* Literal.String */
.codehilite .na { color: #4070a0 } /* Name.Attribute */
.codehilite .nb { color: #007020 } /* Name.Builtin */
.codehilite .nc { color: #0e84b5; font-weight: bold } /* Name.Class */
.codehilite .no { color: #60add5 } /* Name.Constant */
.codehilite .nd { color: #555555; font-weight: bold } /* Name.Decorator */
.codehilite .ni { color: #d55537; font-weight: bold } /* Name.Entity */
.codehilite .ne { color: #007020 } /* Name.Exception */
.codehilite .nf { color: #06287e } /* Name.Function */
.codehilite .nl { color: #002070; font-weight: bold } /* Name.Label */
.codehilite .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */
.codehilite .nt { color: #062873; font-weight: bold } /* Name.Tag */
.codehilite .nv { color: #bb60d5 } /* Name.Variable */
.codehilite .ow { color: #007020; font-weight: bold } /* Operator.Word */
.codehilite .w { color: #bbbbbb } /* Text.Whitespace */
.codehilite .mb { color: #40a070 } /* Literal.Number.Bin */
.codehilite .mf { color: #40a070 } /* Literal.Number.Float */
.codehilite .mh { color: #40a070 } /* Literal.Number.Hex */
.codehilite .mi { color: #40a070 } /* Literal.Number.Integer */
.codehilite .mo { color: #40a070 } /* Literal.Number.Oct */
.codehilite .sa { color: #4070a0 } /* Literal.String.Affix */
.codehilite .sb { color: #4070a0 } /* Literal.String.Backtick */
.codehilite .sc { color: #4070a0 } /* Literal.String.Char */
.codehilite .dl { color: #4070a0 } /* Literal.String.Delimiter */
.codehilite .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */
.codehilite .s2 { color: #4070a0 } /* Literal.String.Double */
.codehilite .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */
.codehilite .sh { color: #4070a0 } /* Literal.String.Heredoc */
.codehilite .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */
.codehilite .sx { color: #c65d09 } /* Literal.String.Other */
.codehilite .sr { color: #235388 } /* Literal.String.Regex */
.codehilite .s1 { color: #4070a0 } /* Literal.String.Single */
.codehilite .ss { color: #517918 } /* Literal.String.Symbol */
.codehilite .bp { color: #007020 } /* Name.Builtin.Pseudo */
.codehilite .fm { color: #06287e } /* Name.Function.Magic */
.codehilite .vc { color: #bb60d5 } /* Name.Variable.Class */
.codehilite .vg { color: #bb60d5 } /* Name.Variable.Global */
.codehilite .vi { color: #bb60d5 } /* Name.Variable.Instance */
.codehilite .vm { color: #bb60d5 } /* Name.Variable.Magic */
.codehilite .il { color: #40a070 } /* Literal.Number.Integer.Long */


@@ -1,568 +0,0 @@
<?php
namespace Nominatim\ClassTypes;
/**
* Create a label tag for the given place that can be used as an XML name.
*
* @param array[] $aPlace Information about the place to label.
*
* A label tag groups various object types together under a common
* label. The returned value is lower case and has no spaces
*/
function getLabelTag($aPlace, $sCountry = null)
{
$iRank = (int) ($aPlace['rank_address'] ?? 30);
$sLabel;
if (isset($aPlace['place_type'])) {
$sLabel = $aPlace['place_type'];
} elseif ($aPlace['class'] == 'boundary' && $aPlace['type'] == 'administrative') {
$sLabel = getBoundaryLabel($iRank/2, $sCountry);
} elseif ($aPlace['type'] == 'postal_code') {
$sLabel = 'postcode';
} elseif ($iRank < 26) {
$sLabel = $aPlace['type'];
} elseif ($iRank < 28) {
$sLabel = 'road';
} elseif ($aPlace['class'] == 'place'
&& ($aPlace['type'] == 'house_number' ||
$aPlace['type'] == 'house_name' ||
$aPlace['type'] == 'country_code')
) {
$sLabel = $aPlace['type'];
} else {
$sLabel = $aPlace['class'];
}
return strtolower(str_replace(' ', '_', $sLabel));
}
/**
* Create a label for the given place.
*
* @param array[] $aPlace Information about the place to label.
*/
function getLabel($aPlace, $sCountry = null)
{
if (isset($aPlace['place_type'])) {
return ucwords(str_replace('_', ' ', $aPlace['place_type']));
}
if ($aPlace['class'] == 'boundary' && $aPlace['type'] == 'administrative') {
return getBoundaryLabel(($aPlace['rank_address'] ?? 30)/2, $sCountry ?? null);
}
// Return a label only for 'important' class/type combinations
if (getImportance($aPlace) !== null) {
return ucwords(str_replace('_', ' ', $aPlace['type']));
}
return null;
}
/**
* Return a simple label for an administrative boundary for the given country.
*
* @param int $iAdminLevel Content of admin_level tag.
* @param string $sCountry Country code of the country where the object is
* in. May be null, in which case a world-wide
* fallback is used.
* @param string $sFallback String to return if no explicit string is listed.
*
* @return string
*/
function getBoundaryLabel($iAdminLevel, $sCountry, $sFallback = 'Administrative')
{
static $aBoundaryList = array (
'default' => array (
1 => 'Continent',
2 => 'Country',
3 => 'Region',
4 => 'State',
5 => 'State District',
6 => 'County',
7 => 'Municipality',
8 => 'City',
9 => 'City District',
10 => 'Suburb',
11 => 'Neighbourhood',
12 => 'City Block'
),
'no' => array (
3 => 'State',
4 => 'County'
),
'se' => array (
3 => 'State',
4 => 'County'
)
);
if (isset($aBoundaryList[$sCountry])
&& isset($aBoundaryList[$sCountry][$iAdminLevel])
) {
return $aBoundaryList[$sCountry][$iAdminLevel];
}
return $aBoundaryList['default'][$iAdminLevel] ?? $sFallback;
}
/**
* Return an estimated radius of how far the object node extends.
*
* @param array[] $aPlace Information about the place. This must be a node
* feature.
*
* @return float The radius around the feature in degrees.
*/
function getDefRadius($aPlace)
{
$aSpecialRadius = array(
'place:continent' => 25,
'place:country' => 7,
'place:state' => 2.6,
'place:province' => 2.6,
'place:region' => 1.0,
'place:county' => 0.7,
'place:city' => 0.16,
'place:municipality' => 0.16,
'place:island' => 0.32,
'place:postcode' => 0.16,
'place:town' => 0.04,
'place:village' => 0.02,
'place:hamlet' => 0.02,
'place:district' => 0.02,
'place:borough' => 0.02,
'place:suburb' => 0.02,
'place:locality' => 0.01,
'place:neighbourhood'=> 0.01,
'place:quarter' => 0.01,
'place:city_block' => 0.01,
'landuse:farm' => 0.01,
'place:farm' => 0.01,
'place:airport' => 0.015,
'aeroway:aerodrome' => 0.015,
'railway:station' => 0.005
);
$sClassPlace = $aPlace['class'].':'.$aPlace['type'];
return $aSpecialRadius[$sClassPlace] ?? 0.00005;
}
/**
* Get the icon to use with the given object.
*/
function getIcon($aPlace)
{
$aIcons = array(
'boundary:administrative' => 'poi_boundary_administrative',
'place:city' => 'poi_place_city',
'place:town' => 'poi_place_town',
'place:village' => 'poi_place_village',
'place:hamlet' => 'poi_place_village',
'place:suburb' => 'poi_place_village',
'place:locality' => 'poi_place_village',
'place:airport' => 'transport_airport2',
'aeroway:aerodrome' => 'transport_airport2',
'railway:station' => 'transport_train_station2',
'amenity:place_of_worship' => 'place_of_worship_unknown3',
'amenity:pub' => 'food_pub',
'amenity:bar' => 'food_bar',
'amenity:university' => 'education_university',
'tourism:museum' => 'tourist_museum',
'amenity:arts_centre' => 'tourist_art_gallery2',
'tourism:zoo' => 'tourist_zoo',
'tourism:theme_park' => 'poi_point_of_interest',
'tourism:attraction' => 'poi_point_of_interest',
'leisure:golf_course' => 'sport_golf',
'historic:castle' => 'tourist_castle',
'amenity:hospital' => 'health_hospital',
'amenity:school' => 'education_school',
'amenity:theatre' => 'tourist_theatre',
'amenity:library' => 'amenity_library',
'amenity:fire_station' => 'amenity_firestation3',
'amenity:police' => 'amenity_police2',
'amenity:bank' => 'money_bank2',
'amenity:post_office' => 'amenity_post_office',
'tourism:hotel' => 'accommodation_hotel2',
'amenity:cinema' => 'tourist_cinema',
'tourism:artwork' => 'tourist_art_gallery2',
'historic:archaeological_site' => 'tourist_archaeological2',
'amenity:doctors' => 'health_doctors',
'leisure:sports_centre' => 'sport_leisure_centre',
'leisure:swimming_pool' => 'sport_swimming_outdoor',
'shop:supermarket' => 'shopping_supermarket',
'shop:convenience' => 'shopping_convenience',
'amenity:restaurant' => 'food_restaurant',
'amenity:fast_food' => 'food_fastfood',
'amenity:cafe' => 'food_cafe',
'tourism:guest_house' => 'accommodation_bed_and_breakfast',
'amenity:pharmacy' => 'health_pharmacy_dispensing',
'amenity:fuel' => 'transport_fuel',
'natural:peak' => 'poi_peak',
'natural:wood' => 'landuse_coniferous_and_deciduous',
'shop:bicycle' => 'shopping_bicycle',
'shop:clothes' => 'shopping_clothes',
'shop:hairdresser' => 'shopping_hairdresser',
'shop:doityourself' => 'shopping_diy',
'shop:estate_agent' => 'shopping_estateagent2',
'shop:car' => 'shopping_car',
'shop:garden_centre' => 'shopping_garden_centre',
'shop:car_repair' => 'shopping_car_repair',
'shop:bakery' => 'shopping_bakery',
'shop:butcher' => 'shopping_butcher',
'shop:apparel' => 'shopping_clothes',
'shop:laundry' => 'shopping_laundrette',
'shop:beverages' => 'shopping_alcohol',
'shop:alcohol' => 'shopping_alcohol',
'shop:optician' => 'health_opticians',
'shop:chemist' => 'health_pharmacy',
'shop:gallery' => 'tourist_art_gallery2',
'shop:jewelry' => 'shopping_jewelry',
'tourism:information' => 'amenity_information',
'historic:ruins' => 'tourist_ruin',
'amenity:college' => 'education_school',
'historic:monument' => 'tourist_monument',
'historic:memorial' => 'tourist_monument',
'historic:mine' => 'poi_mine',
'tourism:caravan_site' => 'accommodation_caravan_park',
'amenity:bus_station' => 'transport_bus_station',
'amenity:atm' => 'money_atm2',
'tourism:viewpoint' => 'tourist_view_point',
'tourism:guesthouse' => 'accommodation_bed_and_breakfast',
'railway:tram' => 'transport_tram_stop',
'amenity:courthouse' => 'amenity_court',
'amenity:recycling' => 'amenity_recycling',
'amenity:dentist' => 'health_dentist',
'natural:beach' => 'tourist_beach',
'railway:tram_stop' => 'transport_tram_stop',
'amenity:prison' => 'amenity_prison',
'highway:bus_stop' => 'transport_bus_stop2'
);
$sClassPlace = $aPlace['class'].':'.$aPlace['type'];
return $aIcons[$sClassPlace] ?? null;
}
/**
* Get an icon for the given object with its full URL.
*/
function getIconFile($aPlace)
{
if (CONST_MapIcon_URL === false) {
return null;
}
$sIcon = getIcon($aPlace);
if (!isset($sIcon)) {
return null;
}
return CONST_MapIcon_URL.'/'.$sIcon.'.p.20.png';
}
/**
* Return a class importance value for the given place.
*
* @param array[] $aPlace Information about the place.
*
* @return int An importance value. The lower the value, the more
* important the class.
*/
function getImportance($aPlace)
{
static $aWithImportance = null;
if ($aWithImportance === null) {
$aWithImportance = array_flip(array(
'boundary:administrative',
'place:country',
'place:state',
'place:province',
'place:county',
'place:city',
'place:region',
'place:island',
'place:town',
'place:village',
'place:hamlet',
'place:suburb',
'place:locality',
'landuse:farm',
'place:farm',
'highway:motorway_junction',
'highway:motorway',
'highway:trunk',
'highway:primary',
'highway:secondary',
'highway:tertiary',
'highway:residential',
'highway:unclassified',
'highway:living_street',
'highway:service',
'highway:track',
'highway:road',
'highway:byway',
'highway:bridleway',
'highway:cycleway',
'highway:pedestrian',
'highway:footway',
'highway:steps',
'highway:motorway_link',
'highway:trunk_link',
'highway:primary_link',
'landuse:industrial',
'landuse:residential',
'landuse:retail',
'landuse:commercial',
'place:airport',
'aeroway:aerodrome',
'railway:station',
'amenity:place_of_worship',
'amenity:pub',
'amenity:bar',
'amenity:university',
'tourism:museum',
'amenity:arts_centre',
'tourism:zoo',
'tourism:theme_park',
'tourism:attraction',
'leisure:golf_course',
'historic:castle',
'amenity:hospital',
'amenity:school',
'amenity:theatre',
'amenity:public_building',
'amenity:library',
'amenity:townhall',
'amenity:community_centre',
'amenity:fire_station',
'amenity:police',
'amenity:bank',
'amenity:post_office',
'leisure:park',
'amenity:park',
'landuse:park',
'landuse:recreation_ground',
'tourism:hotel',
'tourism:motel',
'amenity:cinema',
'tourism:artwork',
'historic:archaeological_site',
'amenity:doctors',
'leisure:sports_centre',
'leisure:swimming_pool',
'shop:supermarket',
'shop:convenience',
'amenity:restaurant',
'amenity:fast_food',
'amenity:cafe',
'tourism:guest_house',
'amenity:pharmacy',
'amenity:fuel',
'natural:peak',
'waterway:waterfall',
'natural:wood',
'natural:water',
'landuse:forest',
'landuse:cemetery',
'landuse:allotments',
'landuse:farmyard',
'railway:rail',
'waterway:canal',
'waterway:river',
'waterway:stream',
'shop:bicycle',
'shop:clothes',
'shop:hairdresser',
'shop:doityourself',
'shop:estate_agent',
'shop:car',
'shop:garden_centre',
'shop:car_repair',
'shop:newsagent',
'shop:bakery',
'shop:furniture',
'shop:butcher',
'shop:apparel',
'shop:electronics',
'shop:department_store',
'shop:books',
'shop:yes',
'shop:outdoor',
'shop:mall',
'shop:florist',
'shop:charity',
'shop:hardware',
'shop:laundry',
'shop:shoes',
'shop:beverages',
'shop:dry_cleaning',
'shop:carpet',
'shop:computer',
'shop:alcohol',
'shop:optician',
'shop:chemist',
'shop:gallery',
'shop:mobile_phone',
'shop:sports',
'shop:jewelry',
'shop:pet',
'shop:beauty',
'shop:stationery',
'shop:shopping_centre',
'shop:general',
'shop:electrical',
'shop:toys',
'shop:jeweller',
'shop:betting',
'shop:household',
'shop:travel_agency',
'shop:hifi',
'amenity:shop',
'tourism:information',
'place:house',
'place:house_name',
'place:house_number',
'place:country_code',
'leisure:pitch',
'highway:unsurfaced',
'historic:ruins',
'amenity:college',
'historic:monument',
'railway:subway',
'historic:memorial',
'leisure:nature_reserve',
'leisure:common',
'waterway:lock_gate',
'natural:fell',
'amenity:nightclub',
'highway:path',
'leisure:garden',
'landuse:reservoir',
'leisure:playground',
'leisure:stadium',
'historic:mine',
'natural:cliff',
'tourism:caravan_site',
'amenity:bus_station',
'amenity:kindergarten',
'highway:construction',
'amenity:atm',
'amenity:emergency_phone',
'waterway:lock',
'waterway:riverbank',
'natural:coastline',
'tourism:viewpoint',
'tourism:hostel',
'tourism:bed_and_breakfast',
'railway:halt',
'railway:platform',
'railway:tram',
'amenity:courthouse',
'amenity:recycling',
'amenity:dentist',
'natural:beach',
'place:moor',
'amenity:grave_yard',
'waterway:drain',
'landuse:grass',
'landuse:village_green',
'natural:bay',
'railway:tram_stop',
'leisure:marina',
'highway:stile',
'natural:moor',
'railway:light_rail',
'railway:narrow_gauge',
'natural:land',
'amenity:village_hall',
'waterway:dock',
'amenity:veterinary',
'landuse:brownfield',
'leisure:track',
'railway:historic_station',
'landuse:construction',
'amenity:prison',
'landuse:quarry',
'amenity:telephone',
'highway:traffic_signals',
'natural:heath',
'historic:house',
'amenity:social_club',
'landuse:military',
'amenity:health_centre',
'historic:building',
'amenity:clinic',
'highway:services',
'amenity:ferry_terminal',
'natural:marsh',
'natural:hill',
'highway:raceway',
'amenity:taxi',
'amenity:take_away',
'amenity:car_rental',
'place:islet',
'amenity:nursery',
'amenity:nursing_home',
'amenity:toilets',
'amenity:hall',
'waterway:boatyard',
'highway:mini_roundabout',
'historic:manor',
'tourism:chalet',
'amenity:bicycle_parking',
'amenity:hotel',
'waterway:weir',
'natural:wetland',
'natural:cave_entrance',
'amenity:crematorium',
'tourism:picnic_site',
'landuse:wood',
'landuse:basin',
'natural:tree',
'leisure:slipway',
'landuse:meadow',
'landuse:piste',
'amenity:care_home',
'amenity:club',
'amenity:medical_centre',
'historic:roman_road',
'historic:fort',
'railway:subway_entrance',
'historic:yes',
'highway:gate',
'leisure:fishing',
'historic:museum',
'amenity:car_wash',
'railway:level_crossing',
'leisure:bird_hide',
'natural:headland',
'tourism:apartments',
'amenity:shopping',
'natural:scrub',
'natural:fen',
'building:yes',
'mountain_pass:yes',
'amenity:parking',
'highway:bus_stop',
'place:postcode',
'amenity:post_box',
'place:houses',
'railway:preserved',
'waterway:derelict_canal',
'amenity:dead_pub',
'railway:disused_station',
'railway:abandoned',
'railway:disused'
));
}
$sClassPlace = $aPlace['class'].':'.$aPlace['type'];
return $aWithImportance[$sClassPlace] ?? null;
}
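The lookup in `getImportance()` relies on `array_flip()` turning an ordered list into a map from `class:type` to list position, so a lower number means a more important class. A minimal sketch of that idea, assuming a shortened list and a hypothetical function name `classRank()` (neither is part of the original file):

```php
<?php
// Hypothetical helper; the shortened list stands in for the full ranking above.
function classRank(array $aPlace): ?int
{
    static $aRank = null;
    if ($aRank === null) {
        // array_flip() maps each 'class:type' string to its position in the list.
        $aRank = array_flip(array(
            'boundary:administrative',
            'place:country',
            'place:city',
            'highway:residential',
        ));
    }
    // Unknown combinations yield null instead of an undefined-index notice.
    return $aRank[$aPlace['class'].':'.$aPlace['type']] ?? null;
}

var_dump(classRank(array('class' => 'place', 'type' => 'city')));   // int(2)
var_dump(classRank(array('class' => 'shop', 'type' => 'unknown'))); // NULL
```

Building the map once in a static variable keeps each later lookup a single hash access rather than a linear search through the list.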

View File

@@ -1,87 +0,0 @@
<?php
namespace Nominatim;
/**
* Description of the position of a token within a query.
*/
class SearchPosition
{
private $sPhraseType;
private $iPhrase;
private $iNumPhrases;
private $iToken;
private $iNumTokens;
public function __construct($sPhraseType, $iPhrase, $iNumPhrases)
{
$this->sPhraseType = $sPhraseType;
$this->iPhrase = $iPhrase;
$this->iNumPhrases = $iNumPhrases;
}
public function setTokenPosition($iToken, $iNumTokens)
{
$this->iToken = $iToken;
$this->iNumTokens = $iNumTokens;
}
/**
* Check if the phrase can be of the given type.
*
* @param string $sType Type of phrase requested.
*
* @return True if the phrase is untyped or of the given type.
*/
public function maybePhrase($sType)
{
return $this->sPhraseType == '' || $this->sPhraseType == $sType;
}
/**
* Check if the phrase is exactly of the given type.
*
* @param string $sType Type of phrase requested.
*
* @return True if the phrase is of the given type.
*/
public function isPhrase($sType)
{
return $this->sPhraseType == $sType;
}
/**
* Return true if the token is the very first in the query.
*/
public function isFirstToken()
{
return $this->iPhrase == 0 && $this->iToken == 0;
}
/**
* Check if the token is the final one in the query.
*/
public function isLastToken()
{
return $this->iToken + 1 == $this->iNumTokens && $this->iPhrase + 1 == $this->iNumPhrases;
}
/**
* Check if the current token is part of the first phrase in the query.
*/
public function isFirstPhrase()
{
return $this->iPhrase == 0;
}
/**
* Get the phrase position in the query.
*/
public function getPhrase()
{
return $this->iPhrase;
}
}
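To illustrate how the position checks combine phrase and token indexes, here is a small sketch that mirrors the comparison in `isLastToken()`. The free function `isLastTokenOfQuery()` and the sample query layout are assumptions made for illustration only:

```php
<?php
// Hypothetical free function mirroring the comparison used in isLastToken().
function isLastTokenOfQuery(int $iPhrase, int $iNumPhrases, int $iToken, int $iNumTokens): bool
{
    return $iToken + 1 == $iNumTokens && $iPhrase + 1 == $iNumPhrases;
}

// Assumed query layout: phrase 0 has three tokens, phrase 1 has two tokens.
$aTokensPerPhrase = array(3, 2);
foreach ($aTokensPerPhrase as $iPhrase => $iNumTokens) {
    for ($iToken = 0; $iToken < $iNumTokens; $iToken++) {
        $bLast = isLastTokenOfQuery($iPhrase, count($aTokensPerPhrase), $iToken, $iNumTokens);
        printf("phrase %d, token %d: %s\n", $iPhrase, $iToken, $bLast ? 'last' : 'not last');
    }
}
```

Only the final token of the final phrase is reported as last; the final token of an earlier phrase is not.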

View File

@@ -1,84 +0,0 @@
<?php
namespace Nominatim;
class Shell
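{
// Declared explicitly; assigning undeclared (dynamic) properties is
// deprecated as of PHP 8.2.
public $baseCmd;
public $aParams;
public $aEnv;
public $stdoutString;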
public function __construct($sBaseCmd, ...$aParams)
{
if (!$sBaseCmd) {
throw new \Exception('Command missing in new() call');
}
$this->baseCmd = $sBaseCmd;
$this->aParams = array();
$this->aEnv = null; // null = use the same environment as the current PHP process
$this->stdoutString = null;
foreach ($aParams as $sParam) {
$this->addParams($sParam);
}
}
public function addParams(...$aParams)
{
foreach ($aParams as $sParam) {
if ($sParam !== null && $sParam !== '') {
array_push($this->aParams, $sParam);
}
}
return $this;
}
public function addEnvPair($sKey, $sVal)
{
if (isset($sKey) && $sKey && isset($sVal)) {
if (!isset($this->aEnv)) {
$this->aEnv = $_ENV;
}
// Merge the new pair last so that it overrides an inherited value.
$this->aEnv = array_merge($this->aEnv, array($sKey => $sVal));
}
return $this;
}
public function escapedCmd()
{
$aEscaped = array_map(function ($sParam) {
return $this->escapeParam($sParam);
}, array_merge(array($this->baseCmd), $this->aParams));
return join(' ', $aEscaped);
}
public function run($bExitOnFail = false)
{
$sCmd = $this->escapedCmd();
// $aEnv does not need escaping, proc_open seems to handle it fine
$aFDs = array(
0 => array('pipe', 'r'),
1 => STDOUT,
2 => STDERR
);
$aPipes = null;
$hProc = @proc_open($sCmd, $aFDs, $aPipes, null, $this->aEnv);
if (!is_resource($hProc)) {
throw new \Exception('Unable to run command: ' . $sCmd);
}
fclose($aPipes[0]); // no stdin
$iStat = proc_close($hProc);
if ($iStat != 0 && $bExitOnFail) {
exit($iStat);
}
return $iStat;
}
private function escapeParam($sParam)
{
return (preg_match('/^-*\w+$/', $sParam)) ? $sParam : escapeshellarg($sParam);
}
}
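The escaping rule in `escapeParam()` leaves plain words and simple options untouched and quotes everything else. A minimal standalone sketch of that rule follows; the free function and the sample command line are assumptions for illustration, and the exact quoting produced by `escapeshellarg()` depends on the platform:

```php
<?php
// Hypothetical free function with the same rule as Shell::escapeParam().
function escapeParam(string $sParam): string
{
    // Optional leading dashes followed by word characters need no quoting.
    return preg_match('/^-*\w+$/', $sParam) ? $sParam : escapeshellarg($sParam);
}

// Assumed sample command; only the path with a space and slashes gets quoted.
$aCmd = array('osm2pgsql', '--style', '/tmp/my style.style', '-v');
echo join(' ', array_map('escapeParam', $aCmd)), "\n";
```

Quoting only when needed keeps logged command lines readable while still protecting arguments that contain spaces or shell metacharacters.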

View File

@@ -1,51 +0,0 @@
<?php
namespace Nominatim;
require_once(CONST_TokenizerDir.'/tokenizer.php');
use Exception;
class Status
{
protected $oDB;
public function __construct(&$oDB)
{
$this->oDB =& $oDB;
}
public function status()
{
if (!$this->oDB) {
throw new Exception('No database', 700);
}
try {
$this->oDB->connect();
} catch (\Nominatim\DatabaseError $e) {
throw new Exception('Database connection failed', 700);
}
$oTokenizer = new \Nominatim\Tokenizer($this->oDB);
$oTokenizer->checkStatus();
}
public function dataDate()
{
$sSQL = 'SELECT EXTRACT(EPOCH FROM lastimportdate) FROM import_status LIMIT 1';
$iDataDateEpoch = $this->oDB->getOne($sSQL);
if ($iDataDateEpoch === false) {
throw new Exception('Import date is not available', 705);
}
return $iDataDateEpoch;
}
public function databaseVersion()
{
$sSQL = 'SELECT value FROM nominatim_properties WHERE property = \'database_version\'';
return $this->oDB->getOne($sSQL);
}
}

View File

@@ -1,72 +0,0 @@
<?php
namespace Nominatim\Token;
/**
* A country token.
*/
class Country
{
/// Database word id, if available.
private $iId;
/// Two-letter country code (lower-cased).
private $sCountryCode;
public function __construct($iId, $sCountryCode)
{
$this->iId = $iId;
$this->sCountryCode = $sCountryCode;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
* the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oSearch->hasCountry() && $oPosition->maybePhrase('country');
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
* the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
$oNewSearch = $oSearch->clone($oPosition->isLastToken() ? 1 : 6);
$oNewSearch->setCountry($this->sCountryCode);
return array($oNewSearch);
}
public function debugInfo()
{
return array(
'ID' => $this->iId,
'Type' => 'country',
'Info' => $this->sCountryCode
);
}
public function debugCode()
{
return 'C';
}
}

View File

@@ -1,108 +0,0 @@
<?php
namespace Nominatim\Token;
/**
* A house number token.
*/
class HouseNumber
{
/// Database word id, if available.
private $iId;
/// Normalized house number.
private $sToken;
public function __construct($iId, $sToken)
{
$this->iId = $iId;
$this->sToken = $sToken;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
* the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oSearch->hasHousenumber()
&& !$oSearch->hasOperator(\Nominatim\Operator::POSTCODE)
&& $oPosition->maybePhrase('street');
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
* the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
$aNewSearches = array();
// sanity check: if the housenumber is not mainly made
// up of numbers, add a penalty
$iSearchCost = 1;
if (preg_match('/\\d/', $this->sToken) === 0
|| preg_match_all('/[^0-9]/', $this->sToken, $aMatches) > 2) {
$iSearchCost++;
}
if (!$oSearch->hasOperator(\Nominatim\Operator::NONE)) {
$iSearchCost++;
}
if (empty($this->iId)) {
$iSearchCost++;
}
// also must not appear in the middle of the address
if ($oSearch->hasAddress() || $oSearch->hasPostcode()) {
$iSearchCost++;
}
$oNewSearch = $oSearch->clone($iSearchCost);
$oNewSearch->setHousenumber($this->sToken);
$aNewSearches[] = $oNewSearch;
// Housenumbers may appear in the name when the place has its own
// address terms.
if ($this->iId !== null
&& ($oSearch->getNamePhrase() >= 0 || !$oSearch->hasName())
&& !$oSearch->hasAddress()
) {
$oNewSearch = $oSearch->clone($iSearchCost);
$oNewSearch->setHousenumberAsName($this->iId);
$aNewSearches[] = $oNewSearch;
}
return $aNewSearches;
}
public function debugInfo()
{
return array(
'ID' => $this->iId,
'Type' => 'house number',
'Info' => array('nr' => $this->sToken)
);
}
public function debugCode()
{
return 'H';
}
}
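The first penalty in `extendSearch()` is purely syntactic: a token with no digit at all, or with more than two non-digit characters, is treated as an unlikely house number. A minimal sketch of just that check, assuming a hypothetical helper name `houseNumberSyntaxCost()` and ignoring the search-state penalties that the real method adds afterwards:

```php
<?php
// Hypothetical helper covering only the syntactic part of the cost.
function houseNumberSyntaxCost(string $sToken): int
{
    $iCost = 1;
    if (preg_match('/\d/', $sToken) === 0            // no digit at all
        || preg_match_all('/[^0-9]/', $sToken) > 2   // more than two non-digits
    ) {
        $iCost++;
    }
    return $iCost;
}

foreach (array('12', '12a', 'abc', '1 bis a') as $sToken) {
    printf("%-8s => %d\n", $sToken, houseNumberSyntaxCost($sToken));
}
```

With these inputs, `12` and `12a` keep the base cost of 1, while `abc` and `1 bis a` are raised to 2.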

View File

@@ -1,126 +0,0 @@
<?php
namespace Nominatim;
require_once(CONST_LibDir.'/TokenCountry.php');
require_once(CONST_LibDir.'/TokenHousenumber.php');
require_once(CONST_LibDir.'/TokenPostcode.php');
require_once(CONST_LibDir.'/TokenSpecialTerm.php');
require_once(CONST_LibDir.'/TokenWord.php');
require_once(CONST_LibDir.'/TokenPartial.php');
require_once(CONST_LibDir.'/SpecialSearchOperator.php');
/**
* Saves information about the tokens that appear in a search query.
*
* Tokens are sorted by their normalized form, the token word. There are different
* kinds of tokens, represented by different Token* classes. Note that
* tokens do not have a common base class. All tokens need to have a field
* with the word id that points to an entry in the `word` database table
* but otherwise the information saved about a token can be very different.
*/
class TokenList
{
// List of lists of tokens indexed by their word_token.
private $aTokens = array();
/**
* Return the number of distinct token words in the list.
*
* @return Integer
*/
public function count()
{
return count($this->aTokens);
}
/**
* Check if there are tokens for the given token word.
*
* @param string $sWord Token word to look for.
*
* @return bool True if there is one or more token for the token word.
*/
public function contains($sWord)
{
return isset($this->aTokens[$sWord]);
}
/**
* Check if there are partial or full tokens for the given word.
*
* @param string $sWord Token word to look for.
*
* @return bool True if there is one or more token for the token word.
*/
public function containsAny($sWord)
{
return isset($this->aTokens[$sWord]);
}
/**
* Get the list of tokens for the given token word.
*
* @param string $sWord Token word to look for.
*
* @return object[] Array of tokens for the given token word or an
* empty array if no tokens could be found.
*/
public function get($sWord)
{
return isset($this->aTokens[$sWord]) ? $this->aTokens[$sWord] : array();
}
public function getFullWordIDs()
{
$ids = array();
foreach ($this->aTokens as $aTokenList) {
foreach ($aTokenList as $oToken) {
if (is_a($oToken, '\Nominatim\Token\Word')) {
$ids[$oToken->getId()] = $oToken->getId();
}
}
}
return $ids;
}
/**
* Add a new token for the given word.
*
* @param string $sWord Word the token describes.
* @param object $oToken Token object to add.
*
* @return void
*/
public function addToken($sWord, $oToken)
{
if (isset($this->aTokens[$sWord])) {
$this->aTokens[$sWord][] = $oToken;
} else {
$this->aTokens[$sWord] = array($oToken);
}
}
public function debugTokenByWordIdList()
{
$aWordsIDs = array();
foreach ($this->aTokens as $sToken => $aWords) {
foreach ($aWords as $aToken) {
$iId = $aToken->getId();
if ($iId !== null) {
$aWordsIDs[$iId] = '#'.$sToken.'('.$aToken->debugCode().' '.$iId.')#';
}
}
}
return $aWordsIDs;
}
public function debugInfo()
{
return $this->aTokens;
}
}
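The storage behind `TokenList` is a map from token word to a list of token objects. The following sketch shows the same grouping with a plain array; the function names and the string placeholders used as tokens are assumptions for illustration, not part of the class above:

```php
<?php
// Hypothetical free functions operating on a plain array instead of the class.
function addToken(array &$aTokens, string $sWord, $oToken): void
{
    // Appending with [] creates the inner list on first use, which has the
    // same effect as the explicit isset() branch in TokenList::addToken().
    $aTokens[$sWord][] = $oToken;
}

function getTokens(array $aTokens, string $sWord): array
{
    return $aTokens[$sWord] ?? array();
}

$aTokens = array();
addToken($aTokens, 'berlin', 'full-word token');
addToken($aTokens, 'berlin', 'partial token');
addToken($aTokens, '10115', 'postcode token');

echo count($aTokens), "\n";                       // 2 distinct token words
echo count(getTokens($aTokens, 'berlin')), "\n";  // 2 tokens for 'berlin'
echo count(getTokens($aTokens, 'paris')), "\n";   // 0, unknown word
```

Note that counting the outer array gives the number of distinct token words, not the total number of token objects.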

View File

@@ -1,118 +0,0 @@
<?php
namespace Nominatim\Token;
/**
* A partial word token.
*/
class Partial
{
/// Database word id, if applicable.
private $iId;
/// Number of appearances in the database.
private $iSearchNameCount;
/// True, if the token consists exclusively of digits and spaces.
private $bNumberToken;
public function __construct($iId, $sToken, $iSearchNameCount)
{
$this->iId = $iId;
$this->bNumberToken = (bool) preg_match('#^[0-9 ]+$#', $sToken);
$this->iSearchNameCount = $iSearchNameCount;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
* the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oPosition->isPhrase('country');
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
* the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
$aNewSearches = array();
// Partial token in Address.
if (($oPosition->isPhrase('') || !$oPosition->isFirstPhrase())
&& $oSearch->hasName()
) {
$iSearchCost = $this->bNumberToken ? 2 : 1;
if ($this->iSearchNameCount >= CONST_Max_Word_Frequency) {
$iSearchCost += 1;
}
$oNewSearch = $oSearch->clone($iSearchCost);
$oNewSearch->addAddressToken(
$this->iId,
$this->iSearchNameCount < CONST_Max_Word_Frequency
);
$aNewSearches[] = $oNewSearch;
}
// Partial token in Name.
if ((!$oSearch->hasPostcode() && !$oSearch->hasAddress())
&& (!$oSearch->hasName(true)
|| $oSearch->getNamePhrase() == $oPosition->getPhrase())
) {
$iSearchCost = 1;
if (!$oSearch->hasName(true)) {
$iSearchCost += 1;
}
if ($this->bNumberToken) {
$iSearchCost += 1;
}
$oNewSearch = $oSearch->clone($iSearchCost);
$oNewSearch->addPartialNameToken(
$this->iId,
$this->iSearchNameCount < CONST_Max_Word_Frequency,
$oPosition->getPhrase()
);
$aNewSearches[] = $oNewSearch;
}
return $aNewSearches;
}
public function debugInfo()
{
return array(
'ID' => $this->iId,
'Type' => 'partial',
'Info' => array(
'count' => $this->iSearchNameCount
)
);
}
public function debugCode()
{
return 'w';
}
}
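For the address branch of `extendSearch()`, the cost depends on two things: whether the token is purely numeric and whether it is very frequent. A minimal sketch of that calculation, assuming a hypothetical constant `MAX_WORD_FREQUENCY` in place of the `CONST_Max_Word_Frequency` setting (its value here is made up) and a hypothetical helper name:

```php
<?php
// Hypothetical stand-in for CONST_Max_Word_Frequency; the value is an assumption.
const MAX_WORD_FREQUENCY = 50000;

// Hypothetical helper reproducing the address-branch cost only.
function partialAddressCost(string $sToken, int $iSearchNameCount): int
{
    // Same test as in the constructor: digits and spaces only.
    $bNumberToken = (bool) preg_match('#^[0-9 ]+$#', $sToken);
    $iCost = $bNumberToken ? 2 : 1;
    if ($iSearchNameCount >= MAX_WORD_FREQUENCY) {
        $iCost += 1;
    }
    return $iCost;
}

echo partialAddressCost('main', 10), "\n";     // 1
echo partialAddressCost('12 34', 10), "\n";    // 2
echo partialAddressCost('12 34', 60000), "\n"; // 3
```

Numeric partials cost more because they are more likely to be house numbers or postcodes than part of an address name.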

View File

@@ -1,98 +0,0 @@
<?php
namespace Nominatim\Token;
/**
* A postcode token.
*/
class Postcode
{
/// Database word id, if available.
private $iId;
/// Full normalized postcode (upper cased).
private $sPostcode;
// Optional country code the postcode belongs to (currently unused).
private $sCountryCode;
public function __construct($iId, $sPostcode, $sCountryCode = '')
{
$this->iId = $iId;
$this->sPostcode = $sPostcode;
$this->sCountryCode = empty($sCountryCode) ? '' : $sCountryCode;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
* the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oSearch->hasPostcode() && $oPosition->maybePhrase('postalcode');
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
* the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
$aNewSearches = array();
// If no special search operator is set and this is the very first
// token of the query, make the postcode the primary search element.
if ($oSearch->hasOperator(\Nominatim\Operator::NONE) && $oPosition->isFirstToken()) {
$oNewSearch = $oSearch->clone(1);
$oNewSearch->setPostcodeAsName($this->iId, $this->sPostcode);
$aNewSearches[] = $oNewSearch;
}
// If we have a structured search or this is not the first term,
// add the postcode as an addendum.
if (!$oSearch->hasOperator(\Nominatim\Operator::POSTCODE)
&& ($oPosition->isPhrase('postalcode') || $oSearch->hasName())
) {
$iPenalty = 1;
if (strlen($this->sPostcode) < 4) {
$iPenalty += 4 - strlen($this->sPostcode);
}
$oNewSearch = $oSearch->clone($iPenalty);
$oNewSearch->setPostcode($this->sPostcode);
$aNewSearches[] = $oNewSearch;
}
return $aNewSearches;
}
public function debugInfo()
{
return array(
'ID' => $this->iId,
'Type' => 'postcode',
'Info' => $this->sPostcode.'('.$this->sCountryCode.')'
);
}
public function debugCode()
{
return 'P';
}
}
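When a postcode is added as an addendum, short postcodes are penalized by one point for every character missing below a length of four. A minimal sketch of that rule, assuming a hypothetical helper name `postcodePenalty()`:

```php
<?php
// Hypothetical helper reproducing the penalty used for addendum postcodes.
function postcodePenalty(string $sPostcode): int
{
    $iPenalty = 1;
    if (strlen($sPostcode) < 4) {
        // One extra point per character missing below length 4.
        $iPenalty += 4 - strlen($sPostcode);
    }
    return $iPenalty;
}

echo postcodePenalty('12345'), "\n"; // 1
echo postcodePenalty('123'), "\n";   // 2
echo postcodePenalty('1'), "\n";     // 4
```

The effect is that very short strings, which match many unrelated postcodes, rank below searches built from longer and more specific ones.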

View File

@@ -1,102 +0,0 @@
<?php
namespace Nominatim\Token;
require_once(CONST_LibDir.'/SpecialSearchOperator.php');
/**
* A word token describing a place type.
*/
class SpecialTerm
{
/// Database word id, if applicable.
private $iId;
/// Class (or OSM tag key) of the place to look for.
private $sClass;
/// Type (or OSM tag value) of the place to look for.
private $sType;
/// Relationship of the operator to the object (see Operator class).
private $iOperator;
public function __construct($iID, $sClass, $sType, $iOperator)
{
$this->iId = $iID;
$this->sClass = $sClass;
$this->sType = $sType;
$this->iOperator = $iOperator;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
* the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oSearch->hasOperator() && $oPosition->isPhrase('');
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
* the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
$iSearchCost = 2;
$iOp = $this->iOperator;
if ($iOp == \Nominatim\Operator::NONE) {
if ($oSearch->hasName() || $oSearch->getContext()->isBoundedSearch()) {
$iOp = \Nominatim\Operator::NAME;
} else {
$iOp = \Nominatim\Operator::NEAR;
}
$iSearchCost += 2;
} elseif (!$oPosition->isFirstToken() && !$oPosition->isLastToken()) {
$iSearchCost += 2;
}
if ($oSearch->hasHousenumber()) {
$iSearchCost ++;
}
$oNewSearch = $oSearch->clone($iSearchCost);
$oNewSearch->setPoiSearch($iOp, $this->sClass, $this->sType);
return array($oNewSearch);
}
public function debugInfo()
{
return array(
'ID' => $this->iId,
'Type' => 'special term',
'Info' => array(
'class' => $this->sClass,
'type' => $this->sType,
'operator' => \Nominatim\Operator::toString($this->iOperator)
)
);
}
public function debugCode()
{
return 'S';
}
}

View File

@@ -1,102 +0,0 @@
<?php
namespace Nominatim\Token;
/**
* A standard word token.
*/
class Word
{
/// Database word id, if applicable.
private $iId;
/// Number of appearances in the database.
private $iSearchNameCount;
/// Number of terms in the word.
private $iTermCount;
public function __construct($iId, $iSearchNameCount, $iTermCount)
{
$this->iId = $iId;
$this->iSearchNameCount = $iSearchNameCount;
$this->iTermCount = $iTermCount;
}
public function getId()
{
return $this->iId;
}
/**
* Check if the token can be added to the given search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
* the query.
*
* @return True if the token is compatible with the search configuration
* given the position.
*/
public function isExtendable($oSearch, $oPosition)
{
return !$oPosition->isPhrase('country');
}
/**
* Derive new searches by adding this token to an existing search.
*
* @param object $oSearch Partial search description derived so far.
* @param object $oPosition Description of the token position within
* the query.
*
* @return SearchDescription[] List of derived search descriptions.
*/
public function extendSearch($oSearch, $oPosition)
{
// Full words can only be a name if they appear at the beginning
// of the phrase. In structured search the name must necessarily be
// in the first phrase. In unstructured search it may be in a later
// phrase when the first phrase is a house number.
if ($oSearch->hasName()
|| !($oPosition->isFirstPhrase() || $oPosition->isPhrase(''))
) {
if ($this->iTermCount > 1
&& ($oPosition->isPhrase('') || !$oPosition->isFirstPhrase())
) {
$oNewSearch = $oSearch->clone(1);
$oNewSearch->addAddressToken($this->iId);
return array($oNewSearch);
}
} elseif (!$oSearch->hasName(true)) {
$oNewSearch = $oSearch->clone(1);
$oNewSearch->addNameToken(
$this->iId,
CONST_Search_NameOnlySearchFrequencyThreshold
&& $this->iSearchNameCount
< CONST_Search_NameOnlySearchFrequencyThreshold
);
return array($oNewSearch);
}
return array();
}
public function debugInfo()
{
return array(
'ID' => $this->iId,
'Type' => 'word',
'Info' => array(
'count' => $this->iSearchNameCount,
'terms' => $this->iTermCount
)
);
}
public function debugCode()
{
return 'W';
}
}

View File

@@ -1,101 +0,0 @@
<?php
@define('CONST_LibDir', dirname(dirname(__FILE__)));
require_once(CONST_LibDir.'/init-cmd.php');
require_once(CONST_LibDir.'/log.php');
require_once(CONST_LibDir.'/PlaceLookup.php');
require_once(CONST_LibDir.'/ReverseGeocode.php');
ini_set('memory_limit', '800M');
$aCMDOptions = array(
'Tools to warm nominatim db',
array('help', 'h', 0, 1, 0, 0, false, 'Show Help'),
array('quiet', 'q', 0, 1, 0, 0, 'bool', 'Quiet output'),
array('verbose', 'v', 0, 1, 0, 0, 'bool', 'Verbose output'),
array('reverse-only', '', 0, 1, 0, 0, 'bool', 'Warm reverse only'),
array('search-only', '', 0, 1, 0, 0, 'bool', 'Warm search only'),
array('project-dir', '', 0, 1, 1, 1, 'realpath', 'Base directory of the Nominatim installation (default: .)'),
);
getCmdOpt($_SERVER['argv'], $aCMDOptions, $aResult, true, true);
loadSettings($aResult['project-dir'] ?? getcwd());
@define('CONST_Database_DSN', getSetting('DATABASE_DSN'));
@define('CONST_Default_Language', getSetting('DEFAULT_LANGUAGE', false));
@define('CONST_Log_DB', getSettingBool('LOG_DB'));
@define('CONST_Log_File', getSetting('LOG_FILE', false));
@define('CONST_NoAccessControl', getSettingBool('CORS_NOACCESSCONTROL'));
@define('CONST_Places_Max_ID_count', getSetting('LOOKUP_MAX_COUNT'));
@define('CONST_PolygonOutput_MaximumTypes', getSetting('POLYGON_OUTPUT_MAX_TYPES'));
@define('CONST_Search_BatchMode', getSettingBool('SEARCH_BATCH_MODE'));
@define('CONST_Search_NameOnlySearchFrequencyThreshold', getSetting('SEARCH_NAME_ONLY_THRESHOLD'));
@define('CONST_Use_US_Tiger_Data', getSettingBool('USE_US_TIGER_DATA'));
@define('CONST_MapIcon_URL', getSetting('MAPICON_URL', false));
@define('CONST_TokenizerDir', CONST_InstallDir.'/tokenizer');
require_once(CONST_LibDir.'/Geocode.php');
$oDB = new Nominatim\DB();
$oDB->connect();
$bVerbose = $aResult['verbose'];
function print_results($aResults, $bVerbose)
{
if ($bVerbose) {
if ($aResults && count($aResults)) {
echo $aResults[0]['langaddress']."\n";
} else {
echo "<not found>\n";
}
} else {
echo '.';
}
}
if (!$aResult['search-only']) {
$oReverseGeocode = new Nominatim\ReverseGeocode($oDB);
$oReverseGeocode->setZoom(20);
$oPlaceLookup = new Nominatim\PlaceLookup($oDB);
$oPlaceLookup->setIncludeAddressDetails(true);
$oPlaceLookup->setLanguagePreference(array('en'));
echo 'Warm reverse: ';
if ($bVerbose) {
echo "\n";
}
for ($i = 0; $i < 1000; $i++) {
$fLat = rand(-9000, 9000) / 100;
$fLon = rand(-18000, 18000) / 100;
if ($bVerbose) {
echo "$fLat, $fLon = ";
}
$oLookup = $oReverseGeocode->lookup($fLat, $fLon);
$aSearchResults = $oLookup ? $oPlaceLookup->lookup(array($oLookup->iId => $oLookup)) : null;
print_results($aSearchResults, $bVerbose);
}
echo "\n";
}
if (!$aResult['reverse-only']) {
$oGeocode = new Nominatim\Geocode($oDB);
echo 'Warm search: ';
if ($bVerbose) {
echo "\n";
}
$sSQL = 'SELECT word FROM word WHERE word is not null ORDER BY search_name_count DESC LIMIT 1000';
foreach ($oDB->getCol($sSQL) as $sWord) {
if ($bVerbose) {
echo "$sWord = ";
}
$oGeocode->setLanguagePreference(array('en'));
$oGeocode->setQuery($sWord);
$aSearchResults = $oGeocode->lookup();
print_results($aSearchResults, $bVerbose);
}
echo "\n";
}

View File

@@ -1,13 +0,0 @@
<?php
require('Symfony/Component/Dotenv/autoload.php');
function loadDotEnv()
{
$dotenv = new \Symfony\Component\Dotenv\Dotenv();
$dotenv->load(CONST_ConfigDir.'/env.defaults');
if (file_exists('.env')) {
$dotenv->load('.env');
}
}

View File

@@ -1,5 +0,0 @@
<?php
require_once('init.php');
require_once('cmd.php');
require_once('DebugNone.php');

View File

@@ -1,4 +0,0 @@
<?php
require_once(CONST_LibDir.'/lib.php');
require_once(CONST_LibDir.'/DB.php');

View File

@@ -1,21 +0,0 @@
<?php
$phpPhraseSettingsFile = $argv[1];
$jsonPhraseSettingsFile = dirname($phpPhraseSettingsFile).'/'.basename($phpPhraseSettingsFile, '.php').'.json';
if (file_exists($phpPhraseSettingsFile) && !file_exists($jsonPhraseSettingsFile)) {
include $phpPhraseSettingsFile;
$data = array();
if (isset($aTagsBlacklist)) {
$data['blackList'] = $aTagsBlacklist;
}
if (isset($aTagsWhitelist)) {
$data['whiteList'] = $aTagsWhitelist;
}
$jsonFile = fopen($jsonPhraseSettingsFile, 'w');
fwrite($jsonFile, json_encode($data));
fclose($jsonFile);
}
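The script derives the JSON file name from the PHP settings file by swapping the extension while keeping the directory. A short sketch of that path handling, assuming a made-up input path and a Unix-style directory separator:

```php
<?php
// Hypothetical input path, used only to show the derivation.
$sPhpFile = '/srv/nominatim/settings/phrase_settings.php';

// basename() with a suffix argument strips the '.php' extension.
$sJsonFile = dirname($sPhpFile).'/'.basename($sPhpFile, '.php').'.json';

echo $sJsonFile, "\n"; // /srv/nominatim/settings/phrase_settings.json
```

Because the conversion only runs when the JSON file does not exist yet, rerunning the script does not overwrite a file that was converted earlier.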

View File

@@ -1,30 +0,0 @@
<?php
function formatOSMType($sType, $bIncludeExternal = true)
{
if ($sType == 'N') {
return 'node';
}
if ($sType == 'W') {
return 'way';
}
if ($sType == 'R') {
return 'relation';
}
if (!$bIncludeExternal) {
return '';
}
if ($sType == 'T') {
return 'way';
}
if ($sType == 'I') {
return 'way';
}
// not handled: P, L
return '';
}

View File

@@ -1,19 +0,0 @@
<?php
function getOsm2pgsqlBinary()
{
$sBinary = getSetting('OSM2PGSQL_BINARY');
return $sBinary ? $sBinary : CONST_Default_Osm2pgsql;
}
function getImportStyle()
{
$sStyle = getSetting('IMPORT_STYLE');
if (in_array($sStyle, array('admin', 'street', 'address', 'full', 'extratags'))) {
return CONST_ConfigDir.'/import-'.$sStyle.'.style';
}
return $sStyle;
}

View File

@@ -1,246 +0,0 @@
<?php
namespace Nominatim;
class Tokenizer
{
private $oDB;
private $oNormalizer;
private $oTransliterator;
private $aCountryRestriction;
public function __construct(&$oDB)
{
$this->oDB =& $oDB;
$this->oNormalizer = \Transliterator::createFromRules(CONST_Term_Normalization_Rules);
$this->oTransliterator = \Transliterator::createFromRules(CONST_Transliteration);
}
public function checkStatus()
{
$sSQL = 'SELECT word_id FROM word limit 1';
$iWordID = $this->oDB->getOne($sSQL);
if ($iWordID === false) {
throw new \Exception('Query failed', 703);
}
if (!$iWordID) {
throw new \Exception('No value', 704);
}
}
public function setCountryRestriction($aCountries)
{
$this->aCountryRestriction = $aCountries;
}
public function normalizeString($sTerm)
{
if ($this->oNormalizer === null) {
return $sTerm;
}
return $this->oNormalizer->transliterate($sTerm);
}
private function makeStandardWord($sTerm)
{
return trim($this->oTransliterator->transliterate(' '.$sTerm.' '));
}
public function tokensForSpecialTerm($sTerm)
{
$aResults = array();
$sSQL = "SELECT word_id, info->>'class' as class, info->>'type' as type ";
$sSQL .= ' FROM word WHERE word_token = :term and type = \'S\'';
Debug::printVar('Term', $sTerm);
Debug::printSQL($sSQL);
$aSearchWords = $this->oDB->getAll($sSQL, array(':term' => $this->makeStandardWord($sTerm)));
Debug::printVar('Results', $aSearchWords);
foreach ($aSearchWords as $aSearchTerm) {
$aResults[] = new \Nominatim\Token\SpecialTerm(
$aSearchTerm['word_id'],
$aSearchTerm['class'],
$aSearchTerm['type'],
\Nominatim\Operator::TYPE
);
}
Debug::printVar('Special term tokens', $aResults);
return $aResults;
}
public function extractTokensFromPhrases(&$aPhrases)
{
$sNormQuery = '';
$aWordLists = array();
$aTokens = array();
foreach ($aPhrases as $iPhrase => $oPhrase) {
$sNormQuery .= ','.$this->normalizeString($oPhrase->getPhrase());
$sPhrase = $this->makeStandardWord($oPhrase->getPhrase());
Debug::printVar('Phrase', $sPhrase);
if (strlen($sPhrase) > 0) {
$aWords = explode(' ', $sPhrase);
Tokenizer::addTokens($aTokens, $aWords);
$aWordLists[] = $aWords;
} else {
$aWordLists[] = array();
}
}
Debug::printVar('Tokens', $aTokens);
Debug::printVar('WordLists', $aWordLists);
$oValidTokens = $this->computeValidTokens($aTokens, $sNormQuery);
foreach ($aPhrases as $iPhrase => $oPhrase) {
$oPhrase->computeWordSets($aWordLists[$iPhrase], $oValidTokens);
}
return $oValidTokens;
}
private function computeValidTokens($aTokens, $sNormQuery)
{
$oValidTokens = new TokenList();
if (!empty($aTokens)) {
$this->addTokensFromDB($oValidTokens, $aTokens, $sNormQuery);
// Try more interpretations for Tokens that could not be matched.
foreach ($aTokens as $sToken) {
if ($sToken[0] != ' ' && !$oValidTokens->contains($sToken)) {
if (preg_match('/^([0-9]{5}) [0-9]{4}$/', $sToken, $aData)) {
// US ZIP+4 codes - merge in the 5-digit ZIP code
$oValidTokens->addToken(
$sToken,
new Token\Postcode(null, $aData[1], 'us')
);
} elseif (preg_match('/^[0-9]+$/', $sToken)) {
// Unknown single word token with a number.
// Assume it is a house number.
$oValidTokens->addToken(
$sToken,
new Token\HouseNumber(null, trim($sToken))
);
}
}
}
}
return $oValidTokens;
}
private function addTokensFromDB(&$oValidTokens, $aTokens, $sNormQuery)
{
// Check which tokens we have, get the ID numbers
$sSQL = 'SELECT word_id, word_token, type, word,';
$sSQL .= " info->>'op' as operator,";
$sSQL .= " info->>'class' as class, info->>'type' as ctype,";
$sSQL .= " info->>'count' as count";
$sSQL .= ' FROM word WHERE word_token in (';
$sSQL .= join(',', $this->oDB->getDBQuotedList($aTokens)).')';
Debug::printSQL($sSQL);
$aDBWords = $this->oDB->getAll($sSQL, null, 'Could not get word tokens.');
foreach ($aDBWords as $aWord) {
$iId = (int) $aWord['word_id'];
$sTok = $aWord['word_token'];
switch ($aWord['type']) {
case 'C': // country name tokens
if ($aWord['word'] !== null
&& (!$this->aCountryRestriction
|| in_array($aWord['word'], $this->aCountryRestriction))
) {
$oValidTokens->addToken(
$sTok,
new Token\Country($iId, $aWord['word'])
);
}
break;
case 'H': // house number tokens
$oValidTokens->addToken($sTok, new Token\HouseNumber($iId, $aWord['word_token']));
break;
case 'P': // postcode tokens
// Postcodes are not normalized, so they may have content
// that makes SQL injection possible. Reject postcodes
// that would need special escaping.
if ($aWord['word'] !== null
&& pg_escape_string($aWord['word']) == $aWord['word']
) {
$sNormPostcode = $this->normalizeString($aWord['word']);
if (strpos($sNormQuery, $sNormPostcode) !== false) {
$oValidTokens->addToken(
$sTok,
new Token\Postcode($iId, $aWord['word'], null)
);
}
}
break;
case 'S': // tokens for classification terms (special phrases)
if ($aWord['class'] !== null && $aWord['ctype'] !== null) {
$oValidTokens->addToken($sTok, new Token\SpecialTerm(
$iId,
$aWord['class'],
$aWord['ctype'],
(isset($aWord['operator'])) ? Operator::NEAR : Operator::NONE
));
}
break;
case 'W': // full-word tokens
$oValidTokens->addToken($sTok, new Token\Word(
$iId,
(int) $aWord['count'],
substr_count($aWord['word_token'], ' ')
));
break;
case 'w': // partial word terms
$oValidTokens->addToken($sTok, new Token\Partial(
$iId,
$aWord['word_token'],
(int) $aWord['count']
));
break;
default:
break;
}
}
}
/**
* Add the tokens from this phrase to the given list of tokens.
*
* @param string[] $aTokens List of tokens to append to.
* @param string[] $aWords  Words of the phrase to build tokens from.
*
* @return void
*/
private static function addTokens(&$aTokens, $aWords)
{
$iNumWords = count($aWords);
for ($i = 0; $i < $iNumWords; $i++) {
$sPhrase = $aWords[$i];
$aTokens[$sPhrase] = $sPhrase;
for ($j = $i + 1; $j < $iNumWords; $j++) {
$sPhrase .= ' '.$aWords[$j];
$aTokens[$sPhrase] = $sPhrase;
}
}
}
}
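
For illustration, the enumeration done by `addTokens()` in the class above can be reproduced outside of it. This is a minimal standalone sketch; the function name `enumerateTokens` is hypothetical and not part of Nominatim.

```php
<?php
// Hypothetical standalone version of Tokenizer::addTokens(): collects every
// contiguous run of words as a candidate token. Tokens are used as array
// keys so that duplicates collapse.
function enumerateTokens(array $aWords): array
{
    $aTokens = array();
    $iNumWords = count($aWords);
    for ($i = 0; $i < $iNumWords; $i++) {
        // Start a new phrase at word $i ...
        $sPhrase = $aWords[$i];
        $aTokens[$sPhrase] = $sPhrase;
        // ... and extend it one word at a time up to the end of the list.
        for ($j = $i + 1; $j < $iNumWords; $j++) {
            $sPhrase .= ' '.$aWords[$j];
            $aTokens[$sPhrase] = $sPhrase;
        }
    }
    return $aTokens;
}

// Three words yield six candidate tokens.
print_r(array_keys(enumerateTokens(array('new', 'york', 'city'))));
```

A phrase of n words produces n(n+1)/2 candidates, so the `word_token in (...)` list in `addTokensFromDB()` grows quadratically with phrase length.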

View File

@@ -1,266 +0,0 @@
<?php
namespace Nominatim;
class Tokenizer
{
private $oDB;
private $oNormalizer = null;
private $aCountryRestriction = null;
public function __construct(&$oDB)
{
$this->oDB =& $oDB;
$this->oNormalizer = \Transliterator::createFromRules(CONST_Term_Normalization_Rules);
}
public function checkStatus()
{
$sStandardWord = $this->oDB->getOne("SELECT make_standard_name('a')");
if ($sStandardWord === false) {
throw new \Exception('Module failed', 701);
}
if ($sStandardWord != 'a') {
throw new \Exception('Module call failed', 702);
}
$sSQL = "SELECT word_id FROM word WHERE word_token IN (' a')";
$iWordID = $this->oDB->getOne($sSQL);
if ($iWordID === false) {
throw new \Exception('Query failed', 703);
}
if (!$iWordID) {
throw new \Exception('No value', 704);
}
}
public function setCountryRestriction($aCountries)
{
$this->aCountryRestriction = $aCountries;
}
public function normalizeString($sTerm)
{
if ($this->oNormalizer === null) {
return $sTerm;
}
return $this->oNormalizer->transliterate($sTerm);
}
public function tokensForSpecialTerm($sTerm)
{
$aResults = array();
$sSQL = 'SELECT word_id, class, type FROM word ';
$sSQL .= ' WHERE word_token = \' \' || make_standard_name(:term)';
$sSQL .= ' AND class is not null AND class not in (\'place\')';
Debug::printVar('Term', $sTerm);
Debug::printSQL($sSQL);
$aSearchWords = $this->oDB->getAll($sSQL, array(':term' => $sTerm));
Debug::printVar('Results', $aSearchWords);
foreach ($aSearchWords as $aSearchTerm) {
$aResults[] = new \Nominatim\Token\SpecialTerm(
$aSearchTerm['word_id'],
$aSearchTerm['class'],
$aSearchTerm['type'],
\Nominatim\Operator::TYPE
);
}
Debug::printVar('Special term tokens', $aResults);
return $aResults;
}
public function extractTokensFromPhrases(&$aPhrases)
{
// First get the normalized version of all phrases
$sNormQuery = '';
$sSQL = 'SELECT ';
$aParams = array();
foreach ($aPhrases as $iPhrase => $oPhrase) {
$sNormQuery .= ','.$this->normalizeString($oPhrase->getPhrase());
$sSQL .= 'make_standard_name(:' .$iPhrase.') as p'.$iPhrase.',';
$aParams[':'.$iPhrase] = $oPhrase->getPhrase();
}
$sSQL = substr($sSQL, 0, -1);
Debug::printSQL($sSQL);
Debug::printVar('SQL parameters', $aParams);
$aNormPhrases = $this->oDB->getRow($sSQL, $aParams);
Debug::printVar('SQL result', $aNormPhrases);
// now compute all possible tokens
$aWordLists = array();
$aTokens = array();
foreach ($aNormPhrases as $sPhrase) {
if (strlen($sPhrase) > 0) {
$aWords = explode(' ', $sPhrase);
Tokenizer::addTokens($aTokens, $aWords);
$aWordLists[] = $aWords;
} else {
$aWordLists[] = array();
}
}
Debug::printVar('Tokens', $aTokens);
Debug::printVar('WordLists', $aWordLists);
$oValidTokens = $this->computeValidTokens($aTokens, $sNormQuery);
foreach ($aPhrases as $iPhrase => $oPhrase) {
$oPhrase->computeWordSets($aWordLists[$iPhrase], $oValidTokens);
}
return $oValidTokens;
}
private function computeValidTokens($aTokens, $sNormQuery)
{
$oValidTokens = new TokenList();
if (!empty($aTokens)) {
$this->addTokensFromDB($oValidTokens, $aTokens, $sNormQuery);
// Try more interpretations for Tokens that could not be matched.
foreach ($aTokens as $sToken) {
if ($sToken[0] != ' ' && !$oValidTokens->contains($sToken)) {
if (preg_match('/^([0-9]{5}) [0-9]{4}$/', $sToken, $aData)) {
// US ZIP+4 codes - merge in the 5-digit ZIP code
$oValidTokens->addToken(
$sToken,
new Token\Postcode(null, $aData[1], 'us')
);
} elseif (preg_match('/^[0-9]+$/', $sToken)) {
// Unknown single word token with a number.
// Assume it is a house number.
$oValidTokens->addToken(
$sToken,
new Token\HouseNumber(null, trim($sToken))
);
}
}
}
}
return $oValidTokens;
}
private function addTokensFromDB(&$oValidTokens, $aTokens, $sNormQuery)
{
// Check which tokens we have, get the ID numbers
$sSQL = 'SELECT word_id, word_token, word, class, type, country_code,';
$sSQL .= ' operator, coalesce(search_name_count, 0) as count';
$sSQL .= ' FROM word WHERE word_token in (';
$sSQL .= join(',', $this->oDB->getDBQuotedList($aTokens)).')';
Debug::printSQL($sSQL);
$aDBWords = $this->oDB->getAll($sSQL, null, 'Could not get word tokens.');
foreach ($aDBWords as $aWord) {
$oToken = null;
$iId = (int) $aWord['word_id'];
if ($aWord['class']) {
// Special terms need to appear in their normalized form.
// (postcodes are not normalized in the word table)
$sNormWord = $this->normalizeString($aWord['word']);
if ($aWord['word'] && strpos($sNormQuery, $sNormWord) === false) {
continue;
}
if ($aWord['class'] == 'place' && $aWord['type'] == 'house') {
$oToken = new Token\HouseNumber($iId, trim($aWord['word_token']));
} elseif ($aWord['class'] == 'place' && $aWord['type'] == 'postcode') {
if ($aWord['word']
&& pg_escape_string($aWord['word']) == $aWord['word']
) {
$oToken = new Token\Postcode(
$iId,
$aWord['word'],
$aWord['country_code']
);
}
} else {
// near and in operator the same at the moment
$oToken = new Token\SpecialTerm(
$iId,
$aWord['class'],
$aWord['type'],
$aWord['operator'] ? Operator::NEAR : Operator::NONE
);
}
} elseif ($aWord['country_code']) {
// Filter country tokens that do not match restricted countries.
if (!$this->aCountryRestriction
|| in_array($aWord['country_code'], $this->aCountryRestriction)
) {
$oToken = new Token\Country($iId, $aWord['country_code']);
}
} elseif ($aWord['word_token'][0] == ' ') {
$oToken = new Token\Word(
$iId,
(int) $aWord['count'],
substr_count($aWord['word_token'], ' ')
);
// For backward compatibility: ignore all partial tokens with more
// than one word.
} elseif (strpos($aWord['word_token'], ' ') === false) {
$oToken = new Token\Partial(
$iId,
$aWord['word_token'],
(int) $aWord['count']
);
}
if ($oToken) {
// remove any leading spaces
if ($aWord['word_token'][0] == ' ') {
$oValidTokens->addToken(substr($aWord['word_token'], 1), $oToken);
} else {
$oValidTokens->addToken($aWord['word_token'], $oToken);
}
}
}
}
/**
* Add the tokens from this phrase to the given list of tokens.
*
* @param string[] $aTokens List of tokens to append to.
* @param string[] $aWords  Words of the phrase to build tokens from.
*
* @return void
*/
private static function addTokens(&$aTokens, $aWords)
{
$iNumWords = count($aWords);
for ($i = 0; $i < $iNumWords; $i++) {
$sPhrase = $aWords[$i];
$aTokens[' '.$sPhrase] = ' '.$sPhrase;
$aTokens[$sPhrase] = $sPhrase;
for ($j = $i + 1; $j < $iNumWords; $j++) {
$sPhrase .= ' '.$aWords[$j];
$aTokens[' '.$sPhrase] = ' '.$sPhrase;
$aTokens[$sPhrase] = $sPhrase;
}
}
}
}
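
The fallback branch of `computeValidTokens()` above can be tried in isolation. The sketch below is an assumption-laden stand-in: `guessUnmatchedToken` is a hypothetical helper, and it returns plain arrays rather than the `Token\Postcode` and `Token\HouseNumber` objects used by the class.

```php
<?php
// Hypothetical helper mirroring the fallback branch of computeValidTokens().
// Returns a description of the guessed token, or null when no fallback applies.
function guessUnmatchedToken(string $sToken): ?array
{
    if (preg_match('/^([0-9]{5}) [0-9]{4}$/', $sToken, $aData)) {
        // US ZIP+4 code: keep only the 5-digit ZIP part.
        return array('kind' => 'postcode', 'value' => $aData[1], 'country' => 'us');
    }
    if (preg_match('/^[0-9]+$/', $sToken)) {
        // A bare number is assumed to be a house number.
        return array('kind' => 'housenumber', 'value' => $sToken);
    }
    return null;
}

var_dump(guessUnmatchedToken('12345 6789'));
var_dump(guessUnmatchedToken('42'));
var_dump(guessUnmatchedToken('main street'));
```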

View File

@@ -1,28 +0,0 @@
<?php
require_once(CONST_LibDir.'/init-website.php');
require_once(CONST_LibDir.'/log.php');
require_once(CONST_LibDir.'/output.php');
ini_set('memory_limit', '200M');
$oParams = new Nominatim\ParameterParser();
$sOutputFormat = $oParams->getSet('format', array('json'), 'json');
set_exception_handler_by_format($sOutputFormat);
$oDB = new Nominatim\DB(CONST_Database_DSN);
$oDB->connect();
$sSQL = 'select placex.place_id, country_code,';
$sSQL .= " name->'name' as name, i.* from placex, import_polygon_delete i";
$sSQL .= ' where placex.osm_id = i.osm_id and placex.osm_type = i.osm_type';
$sSQL .= ' and placex.class = i.class and placex.type = i.type';
$aPolygons = $oDB->getAll($sSQL, null, 'Could not get list of deleted OSM elements.');
if (CONST_Debug) {
var_dump($aPolygons);
exit;
}
if ($sOutputFormat == 'json') {
javascript_renderData($aPolygons);
}

View File

@@ -1,55 +0,0 @@
<?php
require_once(CONST_LibDir.'/init-website.php');
require_once(CONST_LibDir.'/log.php');
require_once(CONST_LibDir.'/output.php');
ini_set('memory_limit', '200M');
$oParams = new Nominatim\ParameterParser();
$sOutputFormat = $oParams->getSet('format', array('json'), 'json');
set_exception_handler_by_format($sOutputFormat);
$iDays = $oParams->getInt('days', false);
$bReduced = $oParams->getBool('reduced', false);
$sClass = $oParams->getString('class', false);
$oDB = new Nominatim\DB(CONST_Database_DSN);
$oDB->connect();
$iTotalBroken = (int) $oDB->getOne('SELECT count(*) FROM import_polygon_error');
$aPolygons = array();
while ($iTotalBroken && empty($aPolygons)) {
$sSQL = 'SELECT osm_type, osm_id, class, type, name->\'name\' as "name",';
$sSQL .= 'country_code, errormessage, updated';
$sSQL .= ' FROM import_polygon_error';
$aWhere = array();
if ($iDays) {
$aWhere[] = "updated > 'now'::timestamp - '".$iDays." day'::interval";
$iDays++;
}
if ($bReduced) {
$aWhere[] = "errormessage like 'Area reduced%'";
}
if ($sClass) {
$aWhere[] = "class = '".pg_escape_string($sClass)."'";
}
if (!empty($aWhere)) {
$sSQL .= ' WHERE '.join(' and ', $aWhere);
}
$sSQL .= ' ORDER BY updated desc LIMIT 1000';
$aPolygons = $oDB->getAll($sSQL);
}
if (CONST_Debug) {
var_dump($aPolygons);
exit;
}
if ($sOutputFormat == 'json') {
javascript_renderData($aPolygons);
}
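
The script above builds its filters by string concatenation and escapes the `class` value by hand. As an alternative sketch, the user-supplied value could be passed as a bound parameter, the way `tokensForSpecialTerm()` passes `:term` to `getAll()`. The helper name `buildPolygonFilter` and the `:class` placeholder are hypothetical; the snippet only assembles the SQL fragment and does not touch a database.

```php
<?php
// Hypothetical helper: builds the WHERE clause and the matching parameter
// array for the import_polygon_error query.
function buildPolygonFilter(int $iDays, bool $bReduced, string $sClass): array
{
    $aWhere = array();
    $aParams = array();
    if ($iDays > 0) {
        // $iDays is typed as an integer, so embedding it is safe.
        $aWhere[] = "updated > 'now'::timestamp - '".$iDays." day'::interval";
    }
    if ($bReduced) {
        $aWhere[] = "errormessage like 'Area reduced%'";
    }
    if ($sClass !== '') {
        // User-supplied text goes into a placeholder, never into the SQL text.
        $aWhere[] = 'class = :class';
        $aParams[':class'] = $sClass;
    }
    $sWhere = empty($aWhere) ? '' : ' WHERE '.join(' and ', $aWhere);
    return array($sWhere, $aParams);
}

list($sWhere, $aParams) = buildPolygonFilter(0, true, "x' OR '1'='1");
echo $sWhere."\n";
print_r($aParams);
```

With this shape the hostile input stays inside `$aParams` and the SQL text contains only the fixed placeholder.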

View File

@@ -1,12 +0,0 @@
<?php
require_once(CONST_LibDir.'/init-website.php');
require_once(CONST_LibDir.'/ParameterParser.php');
$oParams = new Nominatim\ParameterParser();
// Format for output
$sOutputFormat = $oParams->getSet('format', array('xml', 'json', 'jsonv2', 'geojson', 'geocodejson'), 'jsonv2');
set_exception_handler_by_format($sOutputFormat);
throw new Exception('Reverse-only import does not support forward searching.', 404);

View File

@@ -1,48 +0,0 @@
<?php
require_once(CONST_LibDir.'/init-website.php');
require_once(CONST_LibDir.'/ParameterParser.php');
require_once(CONST_LibDir.'/Status.php');
$oParams = new Nominatim\ParameterParser();
$sOutputFormat = $oParams->getSet('format', array('text', 'json'), 'text');
$oDB = new Nominatim\DB(CONST_Database_DSN);
if ($sOutputFormat == 'json') {
header('content-type: application/json; charset=UTF-8');
}
try {
$oStatus = new Nominatim\Status($oDB);
$oStatus->status();
if ($sOutputFormat == 'json') {
$epoch = $oStatus->dataDate();
$aResponse = array(
'status' => 0,
'message' => 'OK',
'data_updated' => (new DateTime('@'.$epoch))->format(DateTime::RFC3339),
'software_version' => CONST_NominatimVersion
);
$sDatabaseVersion = $oStatus->databaseVersion();
if ($sDatabaseVersion) {
$aResponse['database_version'] = $sDatabaseVersion;
}
javascript_renderData($aResponse);
} else {
echo 'OK';
}
} catch (Exception $oErr) {
if ($sOutputFormat == 'json') {
$aResponse = array(
'status' => $oErr->getCode(),
'message' => $oErr->getMessage()
);
javascript_renderData($aResponse);
} else {
header('HTTP/1.0 500 Internal Server Error');
echo 'ERROR: '.$oErr->getMessage();
}
}
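
The `data_updated` field in the JSON response above is produced by a one-line `DateTime` conversion. The following sketch isolates it, assuming that `dataDate()` returns a Unix timestamp in seconds (the variable name `$epoch` suggests this, but the `Status` class is not shown here). The function name `formatDataUpdated` is hypothetical.

```php
<?php
// Formats a Unix timestamp the way the status endpoint does for 'data_updated'.
// The '@' prefix makes DateTime read the value as seconds since the epoch (UTC).
function formatDataUpdated(int $iEpoch): string
{
    return (new DateTime('@'.$iEpoch))->format(DateTime::RFC3339);
}

echo formatDataUpdated(0)."\n"; // 1970-01-01T00:00:00+00:00
```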

View File

@@ -1,19 +0,0 @@
{% include('functions/utils.sql') %}
{% include('functions/ranking.sql') %}
{% include('functions/importance.sql') %}
{% include('functions/address_lookup.sql') %}
{% include('functions/interpolation.sql') %}
{% if 'place' in db.tables %}
{% include 'functions/place_triggers.sql' %}
{% endif %}
{% if 'placex' in db.tables %}
{% include 'functions/placex_triggers.sql' %}
{% endif %}
{% if 'location_postcode' in db.tables %}
{% include 'functions/postcode_triggers.sql' %}
{% endif %}
{% include('functions/partition-functions.sql') %}

View File

@@ -1,307 +0,0 @@
-- Functions for returning address information for a place.
DROP TYPE IF EXISTS addressline CASCADE;
CREATE TYPE addressline as (
place_id BIGINT,
osm_type CHAR(1),
osm_id BIGINT,
name HSTORE,
class TEXT,
type TEXT,
place_type TEXT,
admin_level INTEGER,
fromarea BOOLEAN,
isaddress BOOLEAN,
rank_address INTEGER,
distance FLOAT
);
CREATE OR REPLACE FUNCTION get_name_by_language(name hstore, languagepref TEXT[])
RETURNS TEXT
AS $$
DECLARE
result TEXT;
BEGIN
IF name is null THEN
RETURN null;
END IF;
FOR j IN 1..array_upper(languagepref,1) LOOP
IF name ? languagepref[j] THEN
result := trim(name->languagepref[j]);
IF result != '' THEN
return result;
END IF;
END IF;
END LOOP;
-- anything will do as a fallback - just take the first name type thing there is
RETURN trim((avals(name))[1]);
END;
$$
LANGUAGE plpgsql IMMUTABLE;
--housenumber only needed for tiger data
CREATE OR REPLACE FUNCTION get_address_by_language(for_place_id BIGINT,
housenumber INTEGER,
languagepref TEXT[])
RETURNS TEXT
AS $$
DECLARE
result TEXT[];
currresult TEXT;
prevresult TEXT;
location RECORD;
BEGIN
result := '{}';
prevresult := '';
FOR location IN
SELECT name,
CASE WHEN place_id = for_place_id THEN 99 ELSE rank_address END as rank_address
FROM get_addressdata(for_place_id, housenumber)
WHERE isaddress order by rank_address desc
LOOP
currresult := trim(get_name_by_language(location.name, languagepref));
IF currresult != prevresult AND currresult IS NOT NULL
AND result[(100 - location.rank_address)] IS NULL
THEN
result[(100 - location.rank_address)] := currresult;
prevresult := currresult;
END IF;
END LOOP;
RETURN array_to_string(result,', ');
END;
$$
LANGUAGE plpgsql STABLE;
DROP TYPE IF EXISTS addressdata_place;
CREATE TYPE addressdata_place AS (
place_id BIGINT,
country_code VARCHAR(2),
housenumber TEXT,
postcode TEXT,
class TEXT,
type TEXT,
name HSTORE,
address HSTORE,
centroid GEOMETRY
);
-- Compute the list of address parts for the given place.
--
-- If in_housenumber is greater than or equal to 0, look for an interpolation.
CREATE OR REPLACE FUNCTION get_addressdata(in_place_id BIGINT, in_housenumber INTEGER)
RETURNS setof addressline
AS $$
DECLARE
place addressdata_place;
location RECORD;
current_rank_address INTEGER;
location_isaddress BOOLEAN;
BEGIN
-- The place in question might not have a direct entry in place_addressline.
-- Look for the parent of such places then and save it in place.
-- first query osmline (interpolation lines)
IF in_housenumber >= 0 THEN
SELECT parent_place_id as place_id, country_code,
in_housenumber as housenumber, postcode,
'place' as class, 'house' as type,
null as name, null as address,
ST_Centroid(linegeo) as centroid
INTO place
FROM location_property_osmline
WHERE place_id = in_place_id
AND in_housenumber between startnumber and endnumber;
END IF;
--then query tiger data
{% if config.get_bool('USE_US_TIGER_DATA') %}
IF place IS NULL AND in_housenumber >= 0 THEN
SELECT parent_place_id as place_id, 'us' as country_code,
in_housenumber as housenumber, postcode,
'place' as class, 'house' as type,
null as name, null as address,
ST_Centroid(linegeo) as centroid
INTO place
FROM location_property_tiger
WHERE place_id = in_place_id
AND in_housenumber between startnumber and endnumber;
END IF;
{% endif %}
-- postcode table
IF place IS NULL THEN
SELECT parent_place_id as place_id, country_code,
null::text as housenumber, postcode,
'place' as class, 'postcode' as type,
null as name, null as address,
null as centroid
INTO place
FROM location_postcode
WHERE place_id = in_place_id;
END IF;
-- POI objects in the placex table
IF place IS NULL THEN
SELECT parent_place_id as place_id, country_code,
coalesce(address->'housenumber',
address->'streetnumber',
address->'conscriptionnumber')::text as housenumber,
postcode,
class, type,
name, address,
centroid
INTO place
FROM placex
WHERE place_id = in_place_id and rank_search > 27;
END IF;
-- If place is still NULL at this point then the object has its own
-- entry in place_addressline. However, still check whether there is a linked
-- place we should be using instead.
IF place IS NULL THEN
select coalesce(linked_place_id, place_id) as place_id, country_code,
null::text as housenumber, postcode,
class, type,
null as name, address,
null as centroid
INTO place
FROM placex where place_id = in_place_id;
END IF;
--RAISE WARNING '% % % %',searchcountrycode, searchhousenumber, searchpostcode;
-- --- Return the record for the base entry.
FOR location IN
SELECT placex.place_id, osm_type, osm_id, name,
coalesce(extratags->'linked_place', extratags->'place') as place_type,
class, type, admin_level,
CASE WHEN rank_address = 0 THEN 100
WHEN rank_address = 11 THEN 5
ELSE rank_address END as rank_address,
country_code
FROM placex
WHERE place_id = place.place_id
LOOP
--RAISE WARNING '%',location;
IF location.rank_address < 4 THEN
-- no country locations for ranks higher than country
place.country_code := NULL::varchar(2);
ELSEIF place.country_code IS NULL AND location.country_code IS NOT NULL THEN
place.country_code := location.country_code;
END IF;
RETURN NEXT ROW(location.place_id, location.osm_type, location.osm_id,
location.name, location.class, location.type,
location.place_type,
location.admin_level, true,
location.type not in ('postcode', 'postal_code'),
location.rank_address, 0)::addressline;
current_rank_address := location.rank_address;
END LOOP;
-- --- Return records for address parts.
FOR location IN
SELECT placex.place_id, osm_type, osm_id, name, class, type,
coalesce(extratags->'linked_place', extratags->'place') as place_type,
admin_level, fromarea, isaddress,
CASE WHEN rank_address = 11 THEN 5 ELSE rank_address END as rank_address,
distance, country_code, postcode
FROM place_addressline join placex on (address_place_id = placex.place_id)
WHERE place_addressline.place_id IN (place.place_id, in_place_id)
AND linked_place_id is null
AND (placex.country_code IS NULL OR place.country_code IS NULL
OR placex.country_code = place.country_code)
ORDER BY rank_address desc,
(place_addressline.place_id = in_place_id) desc,
(fromarea and place.centroid is not null and not isaddress
and (place.address is null or avals(name) && avals(place.address))
and ST_Contains(geometry, place.centroid)) desc,
isaddress desc, fromarea desc,
distance asc, rank_search desc
LOOP
-- RAISE WARNING '%',location;
location_isaddress := location.rank_address != current_rank_address;
IF place.country_code IS NULL AND location.country_code IS NOT NULL THEN
place.country_code := location.country_code;
END IF;
IF location.type in ('postcode', 'postal_code')
AND place.postcode is not null
THEN
-- If the place had a postcode assigned, take this one only
-- into consideration when it is an area and the place does not have
-- a postcode itself.
IF location.fromarea AND location.isaddress
AND (place.address is null or not place.address ? 'postcode')
THEN
place.postcode := null; -- remove the less exact postcode
ELSE
location_isaddress := false;
END IF;
END IF;
RETURN NEXT ROW(location.place_id, location.osm_type, location.osm_id,
location.name, location.class, location.type,
location.place_type,
location.admin_level, location.fromarea,
location_isaddress,
location.rank_address,
location.distance)::addressline;
current_rank_address := location.rank_address;
END LOOP;
-- If no country was included yet, add the name information from country_name.
IF current_rank_address > 4 THEN
FOR location IN
SELECT name FROM country_name WHERE country_code = place.country_code LIMIT 1
LOOP
--RAISE WARNING '% % %',current_rank_address,searchcountrycode,countryname;
RETURN NEXT ROW(null, null, null, location.name, 'place', 'country', NULL,
null, true, true, 4, 0)::addressline;
END LOOP;
END IF;
-- Finally add some artificial rows.
IF place.country_code IS NOT NULL THEN
location := ROW(null, null, null, hstore('ref', place.country_code),
'place', 'country_code', null, null, true, false, 4, 0)::addressline;
RETURN NEXT location;
END IF;
IF place.name IS NOT NULL THEN
location := ROW(in_place_id, null, null, place.name, place.class,
place.type, null, null, true, true, 29, 0)::addressline;
RETURN NEXT location;
END IF;
IF place.housenumber IS NOT NULL THEN
location := ROW(null, null, null, hstore('ref', place.housenumber),
'place', 'house_number', null, null, true, true, 28, 0)::addressline;
RETURN NEXT location;
END IF;
IF place.address is not null and place.address ? '_unlisted_place' THEN
RETURN NEXT ROW(null, null, null, hstore('name', place.address->'_unlisted_place'),
'place', 'locality', null, null, true, true, 25, 0)::addressline;
END IF;
IF place.postcode is not null THEN
location := ROW(null, null, null, hstore('ref', place.postcode), 'place',
'postcode', null, null, false, true, 5, 0)::addressline;
RETURN NEXT location;
END IF;
RETURN;
END;
$$
LANGUAGE plpgsql STABLE;
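
To make the selection logic of `get_name_by_language()` (defined earlier in this file) easier to follow, here is a rough PHP rendering. It is a sketch only: `getNameByLanguage` is a hypothetical name, a PHP array stands in for the hstore value, and PHP's `trim()` strips more whitespace characters than the SQL `trim()`. The "first value" fallback depends on array order here, whereas the order returned by `avals()` is determined by hstore itself.

```php
<?php
// Hypothetical PHP rendering of get_name_by_language(): return the first
// non-empty name matching the preference list, else fall back to any name.
function getNameByLanguage(?array $aNames, array $aLangPref): ?string
{
    if ($aNames === null) {
        return null;
    }
    foreach ($aLangPref as $sKey) {
        if (isset($aNames[$sKey])) {
            $sResult = trim($aNames[$sKey]);
            if ($sResult !== '') {
                return $sResult;
            }
        }
    }
    // Anything will do as a fallback: take the first value there is.
    $aValues = array_values($aNames);
    return empty($aValues) ? null : trim($aValues[0]);
}

$aNames = array('name' => 'München', 'name:en' => 'Munich');
echo getNameByLanguage($aNames, array('name:en', 'name'))."\n";
```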

View File

@@ -1,125 +0,0 @@
-- Functions for interpreting wikipedia/wikidata tags and computing importance.
DROP TYPE IF EXISTS wikipedia_article_match CASCADE;
CREATE TYPE wikipedia_article_match as (
language TEXT,
title TEXT,
importance FLOAT
);
DROP TYPE IF EXISTS place_importance CASCADE;
CREATE TYPE place_importance as (
importance FLOAT,
wikipedia TEXT
);
-- See: http://stackoverflow.com/questions/6410088/how-can-i-mimic-the-php-urldecode-function-in-postgresql
CREATE OR REPLACE FUNCTION decode_url_part(p varchar)
RETURNS varchar
AS $$
SELECT convert_from(CAST(E'\\x' || array_to_string(ARRAY(
SELECT CASE WHEN length(r.m[1]) = 1 THEN encode(convert_to(r.m[1], 'SQL_ASCII'), 'hex') ELSE substring(r.m[1] from 2 for 2) END
FROM regexp_matches($1, '%[0-9a-f][0-9a-f]|.', 'gi') AS r(m)
), '') AS bytea), 'UTF8');
$$
LANGUAGE SQL IMMUTABLE STRICT;
CREATE OR REPLACE FUNCTION catch_decode_url_part(p varchar)
RETURNS varchar
AS $$
DECLARE
BEGIN
RETURN decode_url_part(p);
EXCEPTION
WHEN others THEN return null;
END;
$$
LANGUAGE plpgsql IMMUTABLE STRICT;
CREATE OR REPLACE FUNCTION get_wikipedia_match(extratags HSTORE, country_code varchar(2))
RETURNS wikipedia_article_match
AS $$
DECLARE
langs TEXT[];
i INT;
wiki_article TEXT;
wiki_article_title TEXT;
wiki_article_language TEXT;
result wikipedia_article_match;
BEGIN
langs := ARRAY['english','country','ar','bg','ca','cs','da','de','en','es','eo','eu','fa','fr','ko','hi','hr','id','it','he','lt','hu','ms','nl','ja','no','pl','pt','kk','ro','ru','sk','sl','sr','fi','sv','tr','uk','vi','vo','war','zh'];
i := 1;
WHILE langs[i] IS NOT NULL LOOP
wiki_article := extratags->(case when langs[i] in ('english','country') THEN 'wikipedia' ELSE 'wikipedia:'||langs[i] END);
IF wiki_article is not null THEN
wiki_article := regexp_replace(wiki_article,E'^(.*?)([a-z]{2,3}).wikipedia.org/wiki/',E'\\2:');
wiki_article := regexp_replace(wiki_article,E'^(.*?)([a-z]{2,3}).wikipedia.org/w/index.php\\?title=',E'\\2:');
wiki_article := regexp_replace(wiki_article,E'^(.*?)/([a-z]{2,3})/wiki/',E'\\2:');
--wiki_article := regexp_replace(wiki_article,E'^(.*?)([a-z]{2,3})[=:]',E'\\2:');
wiki_article := replace(wiki_article,' ','_');
IF strpos(wiki_article, ':') IN (3,4) THEN
wiki_article_language := lower(trim(split_part(wiki_article, ':', 1)));
wiki_article_title := trim(substr(wiki_article, strpos(wiki_article, ':')+1));
ELSE
wiki_article_title := trim(wiki_article);
wiki_article_language := CASE WHEN langs[i] = 'english' THEN 'en' WHEN langs[i] = 'country' THEN get_country_language_code(country_code) ELSE langs[i] END;
END IF;
select wikipedia_article.language,wikipedia_article.title,wikipedia_article.importance
from wikipedia_article
where language = wiki_article_language and
(title = wiki_article_title OR title = catch_decode_url_part(wiki_article_title) OR title = replace(catch_decode_url_part(wiki_article_title),E'\\',''))
UNION ALL
select wikipedia_article.language,wikipedia_article.title,wikipedia_article.importance
from wikipedia_redirect join wikipedia_article on (wikipedia_redirect.language = wikipedia_article.language and wikipedia_redirect.to_title = wikipedia_article.title)
where wikipedia_redirect.language = wiki_article_language and
(from_title = wiki_article_title OR from_title = catch_decode_url_part(wiki_article_title) OR from_title = replace(catch_decode_url_part(wiki_article_title),E'\\',''))
order by importance desc limit 1 INTO result;
IF result.language is not null THEN
return result;
END IF;
END IF;
i := i + 1;
END LOOP;
RETURN NULL;
END;
$$
LANGUAGE plpgsql STABLE;
CREATE OR REPLACE FUNCTION compute_importance(extratags HSTORE,
country_code varchar(2),
osm_type varchar(1), osm_id BIGINT)
RETURNS place_importance
AS $$
DECLARE
match RECORD;
result place_importance;
BEGIN
FOR match IN SELECT * FROM get_wikipedia_match(extratags, country_code)
WHERE language is not NULL
LOOP
result.importance := match.importance;
result.wikipedia := match.language || ':' || match.title;
RETURN result;
END LOOP;
IF extratags ? 'wikidata' THEN
FOR match IN SELECT * FROM wikipedia_article
WHERE wd_page_title = extratags->'wikidata'
ORDER BY language = 'en' DESC, langcount DESC LIMIT 1 LOOP
result.importance := match.importance;
result.wikipedia := match.language || ':' || match.title;
RETURN result;
END LOOP;
END IF;
RETURN null;
END;
$$
LANGUAGE plpgsql;
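
The comment on `decode_url_part()` above says it mimics PHP's `urldecode`. For reference, a short sketch of the PHP side: the SQL function's regular expression only rewrites `%XX` sequences and passes every other character through, so it appears closer to `rawurldecode()` than to `urldecode()`, which additionally turns `+` into a space. That reading is an inference from the SQL shown, not a statement from the Nominatim authors.

```php
<?php
// Percent-decoding of a UTF-8 title, as needed for wikipedia tags.
$sTitle = 'K%C3%B6ln';
echo rawurldecode($sTitle)."\n";   // decodes the two %XX bytes into 'ö'

// The two PHP functions differ only in how they treat '+'.
var_dump(rawurldecode('a+b'));     // '+' is kept
var_dump(urldecode('a+b'));        // '+' becomes a space
```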

Some files were not shown because too many files have changed in this diff