Merge pull request #2424 from lonvia/multi-country-import

Update instructions for importing multiple regions
Sarah Hoffmann
2021-08-16 08:48:28 +02:00
committed by GitHub
10 changed files with 175 additions and 156 deletions

View File

@@ -5,9 +5,34 @@ your Nominatim database. It is assumed that you have already successfully
installed the Nominatim software itself, if not return to the
[installation page](Installation.md).
## Importing multiple regions
## Importing multiple regions (without updates)
To import multiple regions in your database, you need to configure and run `utils/import_multiple_regions.sh` file. This script will set up the update directory which has the following structure:
To import multiple regions in your database you can simply give multiple
OSM files to the import command:
```
nominatim import --osm-file file1.pbf --osm-file file2.pbf
```
If you already have imported a file and want to add another one, you can
use the add-data function to import the additional data as follows:
```
nominatim add-data --file <FILE>
nominatim refresh --postcodes
nominatim index -j <NUMBER OF THREADS>
```
Please note that adding additional data is always significantly slower than
the original import.
## Importing multiple regions (with updates)
If you want to import multiple regions _and_ be able to keep them up-to-date
with updates, then you can use the scripts provided in the `utils` directory.
These scripts will set up an `update` directory in your project directory,
which has the following structure:
```bash
update
@@ -17,7 +42,6 @@ update
   │   └── monaco
   │   └── sequence.state
   └── tmp
├── combined.osm.pbf
└── europe
├── andorra-latest.osm.pbf
└── monaco-latest.osm.pbf
@@ -25,85 +49,57 @@ update
```
The `sequence.state` files will contain the sequence ID, which will be used by pyosmium to get updates. The tmp folder is used for import dump.
The `sequence.state` files contain the sequence ID for each region. They will
be used by pyosmium to get updates. The `tmp` folder is used for the import dump
and can be deleted once the import is complete.
### Configuring multiple regions
The file `import_multiple_regions.sh` needs to be edited as per your requirement:
1. List of countries. eg:
COUNTRIES="europe/monaco europe/andorra"
2. Path to Build directory. eg:
NOMINATIMBUILD="/srv/nominatim/build"
3. Path to Update directory. eg:
UPDATEDIR="/srv/nominatim/update"
4. Replication URL. eg:
BASEURL="https://download.geofabrik.de"
DOWNCOUNTRYPOSTFIX="-latest.osm.pbf"
### Setting up multiple regions
!!! tip
If your database already exists and you want to add more countries,
replace the setting up part
`${SETUPFILE} --osm-file ${UPDATEDIR}/tmp/combined.osm.pbf --all 2>&1`
with `${UPDATEFILE} --import-file ${UPDATEDIR}/tmp/combined.osm.pbf --index --index-instances N 2>&1`
where N is the numbers of CPUs in your system.
Create a project directory as described for the
[simple import](Import.md#creating-the-project-directory). If necessary,
you can also add an `.env` configuration with customized options. In particular,
you need to make sure that `NOMINATIM_REPLICATION_UPDATE_INTERVAL` and
`NOMINATIM_REPLICATION_RECHECK_INTERVAL` are set according to the update
interval of the extract server you use.
Run the following command from your Nominatim directory after configuring the file.
Copy the scripts `utils/import_multiple_regions.sh` and `utils/update_database.sh`
into the project directory.
bash ./utils/import_multiple_regions.sh
Now customize both files as per your requirements.
!!! danger "Important"
This file uses osmium-tool. It must be installed before executing the import script.
Installation instructions can be found [here](https://osmcode.org/osmium-tool/manual.html#installation).
### Updating multiple regions
To import multiple regions in your database, you need to configure and run ```utils/update_database.sh```.
This uses the update directory set up while setting up the DB.
### Configuring multiple regions
The file `update_database.sh` needs to be edited as per your requirement:
1. List of countries. eg:
1. List of countries. e.g.
COUNTRIES="europe/monaco europe/andorra"
2. Path to Build directory. eg:
2. URL to the service providing the extracts and updates. e.g.
NOMINATIMBUILD="/srv/nominatim/build"
3. Path to Update directory. eg:
UPDATEDIR="/srv/nominatim/update"
4. Replication URL. eg:
BASEURL="https://download.geofabrik.de"
DOWNCOUNTRYPOSTFIX="-updates"
DOWNCOUNTRYPOSTFIX="-latest.osm.pbf"
5. Followup can be set according to your installation. eg: For Photon,
5. Followup in the update script can be set according to your installation.
E.g. for Photon,
FOLLOWUP="curl http://localhost:2322/nominatim-update"
will handle the indexing.
To start the initial import, change into the project directory and run
```
bash import_multiple_regions.sh
```
### Updating the database
Run the following command from your Nominatim directory after configuring the file.
Change into the project directory and run the following command:
bash ./utils/update_database.sh
bash update_database.sh
This will get diffs from the replication server, import diffs and index the database. The default replication server in the script([Geofabrik](https://download.geofabrik.de)) provides daily updates.
This will get diffs from the replication server, import the diffs and index
the database. The default replication server in the
script ([Geofabrik](https://download.geofabrik.de)) provides daily updates.
## Importing Nominatim to an external PostgreSQL database

View File

@@ -19,7 +19,7 @@ class Tokenizer
public function checkStatus()
{
$sSQL = 'SELECT word_id FROM word limit 1';
$sSQL = 'SELECT word_id FROM word WHERE word_id is not null limit 1';
$iWordID = $this->oDB->getOne($sSQL);
if ($iWordID === false) {
throw new \Exception('Query failed', 703);

View File

@@ -1,7 +1,12 @@
"""
Provides custom functions over command-line arguments.
"""
import logging
from pathlib import Path
from nominatim.errors import UsageError
LOG = logging.getLogger()
class NominatimArgs:
""" Customized namespace class for the nominatim command line tool
@@ -25,3 +30,20 @@ class NominatimArgs:
main_index=self.config.TABLESPACE_PLACE_INDEX
)
)
def get_osm_file_list(self):
""" Return the --osm-file argument as a list of Paths or None
if no argument was given. The function also checks if the files
exist and raises a UsageError if one cannot be found.
"""
if not self.osm_file:
return None
files = [Path(f) for f in self.osm_file]
for fname in files:
if not fname.is_file():
LOG.fatal("OSM file '%s' does not exist.", fname)
raise UsageError('Cannot access file.')
return files
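
The following is a minimal, standalone sketch of the behaviour that `get_osm_file_list()` relies on: with `action='append'`, argparse collects every `--osm-file` occurrence into a list and leaves the attribute at `None` when the option is absent. The function name `parse_osm_files` is hypothetical, and `FileNotFoundError` stands in for `nominatim.errors.UsageError` so that the example runs without Nominatim installed.

```python
import argparse
from pathlib import Path


def parse_osm_files(argv):
    """Parse repeated --osm-file options into a list of Paths (or None).

    Hypothetical helper for illustration; not part of Nominatim.
    """
    parser = argparse.ArgumentParser()
    # action='append' gathers each occurrence of the option into one list.
    # The attribute stays None when the option is never given.
    parser.add_argument('--osm-file', metavar='FILE', action='append')
    args = parser.parse_args(argv)

    if not args.osm_file:
        return None

    files = [Path(f) for f in args.osm_file]
    for fname in files:
        if not fname.is_file():
            # Assumption: a built-in exception replaces UsageError here.
            raise FileNotFoundError(
                "OSM file '{}' does not exist.".format(fname))
    return files
```

Called with `['--osm-file', 'file1.pbf', '--osm-file', 'file2.pbf']`, the sketch returns both paths in the order given, provided both files exist.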

View File

@@ -9,7 +9,6 @@ import psutil
from nominatim.db.connection import connect
from nominatim.db import status, properties
from nominatim.version import NOMINATIM_VERSION
from nominatim.errors import UsageError
# Do not repeat documentation of subcommand classes.
# pylint: disable=C0111
@@ -27,8 +26,9 @@ class SetupAll:
def add_args(parser):
group_name = parser.add_argument_group('Required arguments')
group = group_name.add_mutually_exclusive_group(required=True)
group.add_argument('--osm-file', metavar='FILE',
help='OSM file to be imported.')
group.add_argument('--osm-file', metavar='FILE', action='append',
help='OSM file to be imported'
' (repeat for importing multiple files).')
group.add_argument('--continue', dest='continue_at',
choices=['load-data', 'indexing', 'db-postprocess'],
help='Continue an import that was interrupted')
@@ -51,42 +51,25 @@ class SetupAll:
@staticmethod
def run(args): # pylint: disable=too-many-statements
def run(args):
from ..tools import database_import, refresh, postcodes, freeze
from ..indexer.indexer import Indexer
from ..tokenizer import factory as tokenizer_factory
if args.osm_file and not Path(args.osm_file).is_file():
LOG.fatal("OSM file '%s' does not exist.", args.osm_file)
raise UsageError('Cannot access file.')
if args.continue_at is None:
files = args.get_osm_file_list()
database_import.setup_database_skeleton(args.config.get_libpq_dsn(),
args.data_dir,
args.no_partitions,
rouser=args.config.DATABASE_WEBUSER)
LOG.warning('Importing OSM data file')
database_import.import_osm_data(Path(args.osm_file),
database_import.import_osm_data(files,
args.osm2pgsql_options(0, 1),
drop=args.no_updates,
ignore_errors=args.ignore_errors)
with connect(args.config.get_libpq_dsn()) as conn:
LOG.warning('Create functions (1st pass)')
refresh.create_functions(conn, args.config, False, False)
LOG.warning('Create tables')
database_import.create_tables(conn, args.config,
reverse_only=args.reverse_only)
refresh.load_address_levels_from_file(conn, Path(args.config.ADDRESS_LEVEL_CONFIG))
LOG.warning('Create functions (2nd pass)')
refresh.create_functions(conn, args.config, False, False)
LOG.warning('Create table triggers')
database_import.create_table_triggers(conn, args.config)
LOG.warning('Create partition tables')
database_import.create_partition_tables(conn, args.config)
LOG.warning('Create functions (3rd pass)')
refresh.create_functions(conn, args.config, False, False)
SetupAll._setup_tables(args.config, args.reverse_only)
LOG.warning('Importing wikipedia importance data')
data_path = Path(args.config.WIKIPEDIA_DATA_PATH or args.project_dir)
@@ -105,12 +88,7 @@ class SetupAll:
args.threads or psutil.cpu_count() or 1)
LOG.warning("Setting up tokenizer")
if args.continue_at is None or args.continue_at == 'load-data':
# (re)initialise the tokenizer data
tokenizer = tokenizer_factory.create_tokenizer(args.config)
else:
# just load the tokenizer
tokenizer = tokenizer_factory.get_tokenizer_for_db(args.config)
tokenizer = SetupAll._get_tokenizer(args.continue_at, args.config)
if args.continue_at is None or args.continue_at == 'load-data':
LOG.warning('Calculate postcodes')
@@ -145,19 +123,48 @@ class SetupAll:
refresh.setup_website(webdir, args.config, conn)
with connect(args.config.get_libpq_dsn()) as conn:
try:
dbdate = status.compute_database_date(conn)
status.set_status(conn, dbdate)
LOG.info('Database is at %s.', dbdate)
except Exception as exc: # pylint: disable=broad-except
LOG.error('Cannot determine date of database: %s', exc)
SetupAll._set_database_date(conn)
properties.set_property(conn, 'database_version',
'{0[0]}.{0[1]}.{0[2]}-{0[3]}'.format(NOMINATIM_VERSION))
return 0
@staticmethod
def _setup_tables(config, reverse_only):
""" Set up the basic database layout: tables, indexes and functions.
"""
from ..tools import database_import, refresh
with connect(config.get_libpq_dsn()) as conn:
LOG.warning('Create functions (1st pass)')
refresh.create_functions(conn, config, False, False)
LOG.warning('Create tables')
database_import.create_tables(conn, config, reverse_only=reverse_only)
refresh.load_address_levels_from_file(conn, Path(config.ADDRESS_LEVEL_CONFIG))
LOG.warning('Create functions (2nd pass)')
refresh.create_functions(conn, config, False, False)
LOG.warning('Create table triggers')
database_import.create_table_triggers(conn, config)
LOG.warning('Create partition tables')
database_import.create_partition_tables(conn, config)
LOG.warning('Create functions (3rd pass)')
refresh.create_functions(conn, config, False, False)
@staticmethod
def _get_tokenizer(continue_at, config):
""" Set up a new tokenizer or load an already initialised one.
"""
from ..tokenizer import factory as tokenizer_factory
if continue_at is None or continue_at == 'load-data':
# (re)initialise the tokenizer data
return tokenizer_factory.create_tokenizer(config)
# just load the tokenizer
return tokenizer_factory.get_tokenizer_for_db(config)
@staticmethod
def _create_pending_index(conn, tablespace):
""" Add a supporting index for finding places still to be indexed.
@@ -178,3 +185,15 @@ class SetupAll:
{} WHERE indexed_status > 0
""".format(tablespace))
conn.commit()
@staticmethod
def _set_database_date(conn):
""" Determine the database date and set the status accordingly.
"""
try:
dbdate = status.compute_database_date(conn)
status.set_status(conn, dbdate)
LOG.info('Database is at %s.', dbdate)
except Exception as exc: # pylint: disable=broad-except
LOG.error('Cannot determine date of database: %s', exc)

View File

@@ -103,11 +103,11 @@ def import_base_data(dsn, sql_dir, ignore_partitions=False):
conn.commit()
def import_osm_data(osm_file, options, drop=False, ignore_errors=False):
""" Import the given OSM file. 'options' contains the list of
def import_osm_data(osm_files, options, drop=False, ignore_errors=False):
""" Import the given OSM files. 'options' contains the list of
default settings for osm2pgsql.
"""
options['import_file'] = osm_file
options['import_file'] = osm_files
options['append'] = False
options['threads'] = 1
@@ -115,7 +115,12 @@ def import_osm_data(osm_file, options, drop=False, ignore_errors=False):
# Make some educated guesses about cache size based on the size
# of the import file and the available memory.
mem = psutil.virtual_memory()
fsize = os.stat(str(osm_file)).st_size
fsize = 0
if isinstance(osm_files, list):
for fname in osm_files:
fsize += os.stat(str(fname)).st_size
else:
fsize = os.stat(str(osm_files)).st_size
options['osm2pgsql_cache'] = int(min((mem.available + mem.cached) * 0.75,
fsize * 2) / 1024 / 1024) + 1
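
To make the cache heuristic above easier to follow, here is a small sketch of the same arithmetic with the memory figures passed in as plain numbers instead of being read through `psutil`. The function name `estimate_cache_mb` and its parameters are hypothetical; only the formula is taken from the hunk.

```python
def estimate_cache_mb(file_sizes, mem_available, mem_cached):
    """Return a cache size in MB for osm2pgsql.

    Hypothetical helper for illustration: the cache is the smaller of
    75% of the usable memory and twice the total input size, converted
    from bytes to MB, plus one.
    """
    # Sum the sizes of all input files (all values are in bytes).
    fsize = sum(file_sizes)
    limit = min((mem_available + mem_cached) * 0.75, fsize * 2)
    return int(limit / 1024 / 1024) + 1
```

With two files of 100 MiB and 50 MiB and 8 GiB of available memory, the file-size term wins (300 MiB) and the result is 301. With a single 1 GiB file and only 1 GiB of available memory, the memory term wins (768 MiB) and the result is 769.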

View File

@@ -130,6 +130,9 @@ def run_osm2pgsql(options):
if 'import_data' in options:
cmd.extend(('-r', 'xml', '-'))
elif isinstance(options['import_file'], list):
for fname in options['import_file']:
cmd.append(str(fname))
else:
cmd.append(str(options['import_file']))
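
The branch added here can be read in isolation as the sketch below, which returns only the input-related arguments that end up on the osm2pgsql command line. The function name `osm2pgsql_input_args` is hypothetical; the three cases mirror the hunk (data piped on stdin, a list of files, a single file).

```python
from pathlib import Path


def osm2pgsql_input_args(options):
    """Return the trailing input arguments for osm2pgsql.

    Hypothetical helper for illustration; mirrors the branch logic only.
    """
    if 'import_data' in options:
        # Data is piped in on stdin as XML.
        return ['-r', 'xml', '-']

    import_file = options['import_file']
    if isinstance(import_file, list):
        # Several files: each one becomes its own positional argument.
        return [str(fname) for fname in import_file]

    # A single file given as a string or Path.
    return [str(import_file)]
```

Note that only `list` is treated as the multi-file case, as in the hunk; a tuple of paths would fall through to the single-file branch.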

View File

@@ -98,14 +98,25 @@ def test_import_base_data_ignore_partitions(dsn, src_dir, temp_db_with_extension
def test_import_osm_data_simple(table_factory, osm2pgsql_options):
table_factory('place', content=((1, ), ))
database_import.import_osm_data('file.pdf', osm2pgsql_options)
database_import.import_osm_data(Path('file.pbf'), osm2pgsql_options)
def test_import_osm_data_multifile(table_factory, tmp_path, osm2pgsql_options):
table_factory('place', content=((1, ), ))
osm2pgsql_options['osm2pgsql_cache'] = 0
files = [tmp_path / 'file1.osm', tmp_path / 'file2.osm']
for f in files:
f.write_text('test')
database_import.import_osm_data(files, osm2pgsql_options)
def test_import_osm_data_simple_no_data(table_factory, osm2pgsql_options):
table_factory('place')
with pytest.raises(UsageError, match='No data.*'):
database_import.import_osm_data('file.pdf', osm2pgsql_options)
database_import.import_osm_data(Path('file.pbf'), osm2pgsql_options)
def test_import_osm_data_drop(table_factory, temp_db_conn, tmp_path, osm2pgsql_options):
@@ -117,7 +128,7 @@ def test_import_osm_data_drop(table_factory, temp_db_conn, tmp_path, osm2pgsql_o
osm2pgsql_options['flatnode_file'] = str(flatfile.resolve())
database_import.import_osm_data('file.pdf', osm2pgsql_options, drop=True)
database_import.import_osm_data(Path('file.pbf'), osm2pgsql_options, drop=True)
assert not flatfile.exists()
assert not temp_db_conn.table_exists('planet_osm_nodes')

View File

@@ -8,8 +8,6 @@
# *) Set up sequence.state for updates
# *) Merge the pbf files into a single file.
# *) Setup nominatim db using 'setup.php --osm-file'
# Hint:
@@ -28,16 +26,6 @@ touch2() { mkdir -p "$(dirname "$1")" && touch "$1" ; }
COUNTRIES="europe/monaco europe/andorra"
# SET TO YOUR NOMINATIM build FOLDER PATH:
NOMINATIMBUILD="/srv/nominatim/build"
SETUPFILE="$NOMINATIMBUILD/utils/setup.php"
UPDATEFILE="$NOMINATIMBUILD/utils/update.php"
# SET TO YOUR update FOLDER PATH:
UPDATEDIR="/srv/nominatim/update"
# SET TO YOUR replication server URL:
BASEURL="https://download.geofabrik.de"
@@ -46,27 +34,24 @@ DOWNCOUNTRYPOSTFIX="-latest.osm.pbf"
# End of configuration section
# ******************************************************************************
COMBINEFILES="osmium merge"
UPDATEDIR=update
IMPORT_CMD="nominatim import"
mkdir -p ${UPDATEDIR}
cd ${UPDATEDIR}
pushd ${UPDATEDIR}
rm -rf tmp
mkdir -p tmp
cd tmp
popd
for COUNTRY in $COUNTRIES;
do
echo "===================================================================="
echo "$COUNTRY"
echo "===================================================================="
DIR="$UPDATEDIR/$COUNTRY"
FILE="$DIR/configuration.txt"
DOWNURL="$BASEURL/$COUNTRY$DOWNCOUNTRYPOSTFIX"
IMPORTFILE=$COUNTRY$DOWNCOUNTRYPOSTFIX
IMPORTFILEPATH=${UPDATEDIR}/tmp/${IMPORTFILE}
FILENAME=${COUNTRY//[\/]/_}
touch2 $IMPORTFILEPATH
wget ${DOWNURL} -O $IMPORTFILEPATH
@@ -74,18 +59,12 @@ do
touch2 ${DIR}/sequence.state
pyosmium-get-changes -O $IMPORTFILEPATH -f ${DIR}/sequence.state -v
COMBINEFILES="${COMBINEFILES} ${IMPORTFILEPATH}"
IMPORT_CMD="${IMPORT_CMD} --osm-file ${IMPORTFILEPATH}"
echo $IMPORTFILE
echo "===================================================================="
done
echo "${COMBINEFILES} -o combined.osm.pbf"
${COMBINEFILES} -o combined.osm.pbf
echo "===================================================================="
echo "Setting up nominatim db"
${SETUPFILE} --osm-file ${UPDATEDIR}/tmp/combined.osm.pbf --all 2>&1
# ${UPDATEFILE} --import-file ${UPDATEDIR}/tmp/combined.osm.pbf 2>&1
echo "===================================================================="
${IMPORT_CMD} 2>&1
echo "===================================================================="
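
For readers who prefer to see the command assembly without the shell loop, the following Python sketch builds the same argument list that the script accumulates in `IMPORT_CMD`. The function name `build_import_command` is hypothetical, and the defaults for the update directory and file postfix are copied from the values shown in this diff.

```python
def build_import_command(countries, update_dir='update',
                         postfix='-latest.osm.pbf'):
    """Build the 'nominatim import' argument list for several regions.

    Hypothetical helper for illustration; 'countries' is a
    whitespace-separated string such as the script's COUNTRIES variable.
    """
    cmd = ['nominatim', 'import']
    for country in countries.split():
        # One --osm-file option per downloaded extract under <update_dir>/tmp.
        path = '{}/tmp/{}{}'.format(update_dir, country, postfix)
        cmd += ['--osm-file', path]
    return cmd
```

The returned list could be handed to `subprocess.run()` from the project directory, which corresponds to the script's final `${IMPORT_CMD}` call.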

View File

@@ -22,25 +22,14 @@
# REPLACE WITH LIST OF YOUR "COUNTRIES":
#
COUNTRIES="europe/monaco europe/andorra"
# SET TO YOUR NOMINATIM build FOLDER PATH:
#
NOMINATIMBUILD="/srv/nominatim/build"
UPDATEFILE="$NOMINATIMBUILD/utils/update.php"
# SET TO YOUR update data FOLDER PATH:
#
UPDATEDIR="/srv/nominatim/update"
UPDATEBASEURL="https://download.geofabrik.de"
UPDATECOUNTRYPOSTFIX="-updates"
# If you do not use Photon, let Nominatim handle (re-)indexing:
#
FOLLOWUP="$UPDATEFILE --index"
FOLLOWUP="nominatim index"
#
# If you use Photon, update Photon and let it handle the index
# (Photon server must be running and must have been started with "-database",
@@ -49,11 +38,10 @@ FOLLOWUP="$UPDATEFILE --index"
#FOLLOWUP="curl http://localhost:2322/nominatim-update"
# ******************************************************************************
UPDATEDIR="update"
for COUNTRY in $COUNTRIES;
do
echo "===================================================================="
echo "$COUNTRY"
echo "===================================================================="
@@ -61,20 +49,16 @@ do
FILE="$DIR/sequence.state"
BASEURL="$UPDATEBASEURL/$COUNTRY$UPDATECOUNTRYPOSTFIX"
FILENAME=${COUNTRY//[\/]/_}
# mkdir -p ${DIR}
cd ${DIR}
echo "Attempting to get changes"
rm -f ${DIR}/${FILENAME}.osc.gz
pyosmium-get-changes -o ${DIR}/${FILENAME}.osc.gz -f ${FILE} --server $BASEURL -v
echo "Attempting to import diffs"
${NOMINATIMBUILD}/utils/update.php --import-diff ${DIR}/${FILENAME}.osc.gz
rm ${DIR}/${FILENAME}.osc.gz
nominatim add-data --diff ${DIR}/${FILENAME}.osc.gz
done
echo "===================================================================="
echo "Reindexing"
${FOLLOWUP}
echo "===================================================================="
echo "===================================================================="
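
The per-region names used by the update loop can be sketched in Python as follows. The function name `update_paths` is hypothetical, and the assumption that `DIR` equals `$UPDATEDIR/$COUNTRY` is taken from the import script, since the corresponding line of the update script is not visible in this diff.

```python
def update_paths(country, update_dir='update',
                 base_url='https://download.geofabrik.de',
                 postfix='-updates'):
    """Derive the replication URL and local file names for one region.

    Hypothetical helper for illustration only.
    """
    # Assumption: DIR is "$UPDATEDIR/$COUNTRY", as in the import script.
    region_dir = '{}/{}'.format(update_dir, country)
    # Same effect as the shell expansion ${COUNTRY//[\/]/_}.
    filename = country.replace('/', '_')
    return {
        'server': '{}/{}{}'.format(base_url, country, postfix),
        'state_file': '{}/sequence.state'.format(region_dir),
        'diff_file': '{}/{}.osc.gz'.format(region_dir, filename),
    }
```

For `europe/monaco`, the diff file would be `update/europe/monaco/europe_monaco.osc.gz`, which is the path that the script passes to `nominatim add-data --diff`.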