#281 scripts to migrate extra theme
etj committed Mar 17, 2022
1 parent d43d13d commit 8b51d81
Showing 4 changed files with 155 additions and 29 deletions.
56 changes: 39 additions & 17 deletions README.md
@@ -421,52 +421,74 @@ In order to update the existing translations proceed as follow:
python setup.py compile_catalog --locale YOUR_LANGUAGE


## Updating an old installation
## Updating an existing installation

### Migration from 1.1.0 to 2.0.0

1. Dump of ckan and datastore databases (this is a safety measure)

2. Run the migration script:

ckan -c CONFIG_FILE dcatapit migrate-200

3. Reindex the datasets

#### Migration details

The `theme` extra field is now required by the `ckanext-dcat` extension, and it must be a valid URI.

Earlier versions of `ckanext-dcatapit` used the `theme` field as a dict holding information about multiple themes and
subthemes; that content would conflict with what `ckanext-dcat` expects.

The migration moves the content of the `theme` extra field to the `themes_aggregate` field,
while the extension logic provides valid content for the `theme` field on the fly, so that `ckanext-dcat` will not complain.
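
The move can be pictured with a minimal Python sketch that works on a plain dict of extras. It is an illustration under stated assumptions, not the extension's code: the helper names `migrate_extras` and `theme_uris` are hypothetical, the aggregate is assumed to be a JSON list of items with `theme` and `subthemes` keys, and the theme codes are assumed to map onto the EU data-theme vocabulary URIs.

```python
import json

# Assumption: theme codes map onto the EU data-theme vocabulary;
# the extension may build its URIs differently.
THEME_BASE_URI = "http://publications.europa.eu/resource/authority/data-theme/"


def migrate_extras(extras: dict) -> dict:
    """Move an aggregate-style `theme` value to `themes_aggregate` (hypothetical helper)."""
    migrated = dict(extras)
    value = migrated.get("theme")
    # Only values that carry subtheme information are treated as aggregates.
    if value is not None and '"subthemes"' in value:
        migrated["themes_aggregate"] = migrated.pop("theme")
    return migrated


def theme_uris(extras: dict) -> list:
    """Derive URI values for `theme` from the aggregate (hypothetical helper)."""
    aggregate = json.loads(extras.get("themes_aggregate", "[]"))
    return [THEME_BASE_URI + item["theme"] for item in aggregate]


old = {"theme": json.dumps([{"theme": "AGRI", "subthemes": []}])}
new = migrate_extras(old)
uris = theme_uris(new)
```

After the call, `new` no longer has a `theme` key, and `uris` holds one URI per aggregate item, which is the kind of value `ckanext-dcat` can accept.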

### Migration from 1.0.0 to 1.1.0

In order to update an old installation (from 1.0.0 to 1.1.0 version):

1. Dump of ckan and datastore databases (this is a safety measure):

su postgres
pg_dump -U postgres -i ckan > ckan.dump
pg_dump -U postgres -i datastore > datastore.dump
su postgres
pg_dump -U postgres -i ckan > ckan.dump
pg_dump -U postgres -i datastore > datastore.dump

2. Update extension code:

git pull

3. Update the Solr schema as reported in the installation steps and then restart Solr. In particular ensure that following fields are present in schema.xml:

<field name="dcat_theme" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="dcat_subtheme" type="string" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="dcat_subtheme_*" type="string" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="organization_region_*" type="string" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="resource_license_*" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="resource_license" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="dcat_theme" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="dcat_subtheme" type="string" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="dcat_subtheme_*" type="string" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="organization_region_*" type="string" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="resource_license_*" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="resource_license" type="string" indexed="true" stored="false" multiValued="true"/>

4. Ensure that all the configuration properties required by the new version have been properly provided in .ini file (see [Installation](#installation) paragraph)

5. Activate the virtual environment:

. /usr/lib/ckan/default/bin/activate
. /usr/lib/ckan/default/bin/activate
6. Run model update

paster --plugin=ckanext-dcatapit vocabulary initdb --config=/etc/ckan/default/production.ini

7. Run vocabulary load commands (regions, licenses and sub-themes):

wget "https://raw.githubusercontent.com/italia/daf-ontologie-vocabolari-controllati/master/VocabolariControllati/territorial-classifications/regions/regions.rdf" -O "/tmp/regions.rdf"
wget "https://raw.githubusercontent.com/italia/daf-ontologie-vocabolari-controllati/master/VocabolariControllati/territorial-classifications/regions/regions.rdf" -O "/tmp/regions.rdf"

paster --plugin=ckanext-dcatapit vocabulary load --filename "/tmp/regions.rdf" --name regions --config "/etc/ckan/default/production.ini"
paster --plugin=ckanext-dcatapit vocabulary load --filename "/tmp/regions.rdf" --name regions --config "/etc/ckan/default/production.ini"

wget "https://raw.githubusercontent.com/italia/daf-ontologie-vocabolari-controllati/master/VocabolariControllati/licences/licences.rdf" -O "/tmp/licenses.rdf"
wget "https://raw.githubusercontent.com/italia/daf-ontologie-vocabolari-controllati/master/VocabolariControllati/licences/licences.rdf" -O "/tmp/licenses.rdf"

paster --plugin=ckanext-dcatapit vocabulary load --filename "/tmp/licenses.rdf" --name licenses --config "/etc/ckan/default/production.ini"
paster --plugin=ckanext-dcatapit vocabulary load --filename "ckanext-dcatapit/examples/eurovoc_mapping.rdf" --name subthemes --config "/etc/ckan/default/production.ini" "ckanext-dcatapit/examples/eurovoc.rdf"
paster --plugin=ckanext-dcatapit vocabulary load --filename "/tmp/licenses.rdf" --name licenses --config "/etc/ckan/default/production.ini"
paster --plugin=ckanext-dcatapit vocabulary load --filename "ckanext-dcatapit/examples/eurovoc_mapping.rdf" --name subthemes --config "/etc/ckan/default/production.ini" "ckanext-dcatapit/examples/eurovoc.rdf"

8. Run data migration command:

paster --plugin=ckanext-dcatapit vocabulary migrate_data --config=/etc/ckan/default/production.ini > migration.log
ckan -c CONFIG_FILE dcatapit migrate-110

You can review the migration results in the `migration.log` file, which contains the list of messages generated during the migration.
There are additional command switches that can be used to optimize processing:
18 changes: 13 additions & 5 deletions ckanext/dcatapit/commands/dcatapit.py
@@ -2,7 +2,8 @@
import click
import logging

from ckanext.dcatapit.commands.migrate import do_migrate_data
import ckanext.dcatapit.commands.migrate110 as migrate110
import ckanext.dcatapit.commands.migrate200 as migrate200
from ckanext.dcatapit.commands.vocabulary import load_from_file as load_voc

log = logging.getLogger(__name__)
@@ -28,18 +29,25 @@ def initdb():
click.secho('DCATAPIT DB tables not created', fg=u"yellow")


@dcatapit.command()
@dcatapit.command(help='Migrate from 1.0.0 version to 1.1.0 (many elements 0..1 now are 0..N)')
@click.option('-o', '--offset', default=None, type=int,
help='Start from dataset at offset during data migration')
@click.option('-l', '--limit', default=None, type=int,
help='Limit number of processed datasets during data migration')
@click.option('-s', '--skip-orgs', is_flag=True,
help='Skip organizations in data migration')
def migrate_data(offset, limit, skip_orgs=False):
do_migrate_data(limit=limit, offset=offset, skip_orgs=skip_orgs)
def migrate_110(offset, limit, skip_orgs=False):
migrate110.do_migrate_data(limit=limit, offset=offset, skip_orgs=skip_orgs)


@dcatapit.command()
@dcatapit.command(help='Migrate to 2.0.0 (themes are encoded in a different named field)')
@click.option('-f', '--fix-old', is_flag=True, default=False,
help='Try and fix datasets in older 1.0.0 format')
def migrate_200(fix_old):
migrate200.migrate(fix_old)


@dcatapit.command(help='Load an RDF vocabulary into the DB')
@click.option('-f', "--filename", required=False, help='Path to a file', type=str)
@click.option('--url', required=False, help='URL to a resource')
@click.option('--format', default='xml', help='Use specific graph format (xml, turtle..), default: xml')
@@ -10,17 +10,18 @@
from ckan.lib.navl.dictization_functions import Invalid
from ckan.logic import ValidationError
from ckan.logic.validators import tag_name_validator
from ckan.model.meta import Session
from ckan.model import (
Group,
GroupExtra,
Package,
PackageExtra,
repo,
)
from ckan.model.meta import Session

from ckanext.multilang.model import PackageMultilang as ML_PM

from ckanext.dcatapit.schema import FIELD_THEMES_AGGREGATE
from ckanext.dcatapit import validators
import ckanext.dcatapit.interfaces as interfaces

@@ -33,7 +34,10 @@
log = logging.getLogger(__name__)


def do_migrate_data(limit=None, offset=None, skip_orgs=False):
def do_migrate_data(limit=None, offset=None, skip_orgs=False, pkg_uuid: list = None):
# Data migrations from 1.0.0 to 1.1.0
# ref: https://github.com/geosolutions-it/ckanext-dcatapit/issues/188

from ckanext.dcatapit.plugin import DCATAPITPackagePlugin

user = toolkit.get_action('get_site_user')({'ignore_auth': True}, {})
@@ -77,7 +81,7 @@ def do_migrate_data(limit=None, offset=None, skip_orgs=False):
else:
log.info(u'Skipping organizations processing')
pcontext = context.copy()
pkg_list = get_package_list()
pkg_list = get_package_list(pkg_uuid)
pcount = pkg_list.count()
log.info(f'processing {pcount} packages')
errored = []
@@ -153,12 +157,18 @@ def do_migrate_data(limit=None, offset=None, skip_orgs=False):
log.error(
f' {ptitile} at position {position}: {err.__class__}{err_summary}'
)
return pidx_count


def get_package_list(pkg_uuid=None):
query = Session.query(Package.name)\
.filter(Package.state.in_(['active', 'draft']),
Package.type == 'dataset')

def get_package_list():
return Session.query(Package.name).filter(Package.state == 'active',
Package.type == 'dataset') \
.order_by(Package.title)
if pkg_uuid:
query = query.filter(Package.id.in_(pkg_uuid))

return query.order_by(Package.title)


def get_organization_list():
@@ -222,6 +232,8 @@ def update_conforms_to(pdata):


def update_creator(pdata):
# move "creator_name" and "creator_identifier" into a json struct in field "creator"
# old format foresaw a single creator, new struct allows N creators
if pdata.get('creator'):
return
cname = pdata.pop('creator_name', None)
@@ -250,6 +262,9 @@ def update_creator(pdata):


def update_theme(pdata):
if FIELD_THEMES_AGGREGATE in pdata:
return

theme = pdata.pop('theme', None)
if not theme:
to_delete = []
81 changes: 81 additions & 0 deletions ckanext/dcatapit/commands/migrate200.py
@@ -0,0 +1,81 @@
import json
import logging
import uuid
from datetime import datetime

from sqlalchemy import and_

import ckan.plugins.toolkit as toolkit
from ckan.lib.base import config
from ckan.lib.navl.dictization_functions import Invalid
from ckan.logic import ValidationError
from ckan.logic.validators import tag_name_validator
from ckan.model.meta import Session
from ckan.model import (
    Group,
    GroupExtra,
    Package,
    PackageExtra,
    repo,
)

from ckanext.dcatapit.schema import FIELD_THEMES_AGGREGATE
from ckanext.dcatapit import validators
import ckanext.dcatapit.interfaces as interfaces

REGION_TYPE = 'https://w3id.org/italia/onto/CLV/Region'
NAME_TYPE = 'https://w3id.org/italia/onto/l0/name'

DEFAULT_LANG = config.get('ckan.locale_default', 'en')
DATE_FORMAT = '%d-%m-%Y'

log = logging.getLogger(__name__)


def migrate(fix_old=False):
    # Data migrations from 1.1.0 to 2.0.0

    cnt_migrated = migrate_themes()
    cnt_obsolete_found, cnt_obsolete_migrated = check_obsolete_themes(fix_old)

    log.info('========== Migration summary ==========')
    log.info(f'Migrated theme extra keys: {cnt_migrated}')
    log.info(f'Obsolete theme found: {cnt_obsolete_found}')
    if fix_old:
        log.info(f'Obsolete theme migrated: {cnt_obsolete_migrated}')
    else:
        log.info('*** You may want to use the --fix-old argument to fix the pre-1.1.0 datasets')

def migrate_themes():
    # migrate current extras: rename the key of aggregate-style theme values
    extra_themes = Session.query(PackageExtra) \
        .filter(PackageExtra.key == 'theme') \
        .filter(PackageExtra.value.like('%"subthemes"%'))

    cnt_extra = extra_themes.count()

    log.info(f'Migrating theme extra keys: {cnt_extra}')
    for x_theme in extra_themes:
        x_theme.key = FIELD_THEMES_AGGREGATE
        x_theme.save()

    return cnt_extra

def check_obsolete_themes(fix_old):
    # theme values without subtheme information are still in the 1.0.0 plain format
    bad_extra_themes = Session.query(PackageExtra) \
        .filter(PackageExtra.key == 'theme') \
        .filter(PackageExtra.value.notlike('%"subthemes"%'))

    cnt_bad = bad_extra_themes.count()
    if cnt_bad:
        log.error(f'There are {cnt_bad} themes in the 1.0.0 plain format. Please review your DB.')

    migrated = 0
    if fix_old:
        import ckanext.dcatapit.commands.migrate110 as migrate110

        # named pkg_ids so that the imported uuid module is not shadowed
        pkg_ids = [pe.package_id for pe in bad_extra_themes]
        log.debug(f'bad packages id {pkg_ids}')
        migrated = migrate110.do_migrate_data(skip_orgs=True, pkg_uuid=pkg_ids)

    return cnt_bad, migrated
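
The way the two filters in `migrate_themes` and `check_obsolete_themes` split the `theme` rows can be tried out with a small standalone sketch. It uses the standard-library `sqlite3` module and a simplified three-column stand-in for the extras table, both of which are assumptions made for illustration; the real code goes through SQLAlchemy and CKAN's own schema, and the sample rows are invented.

```python
import sqlite3

# Simplified stand-in for the extras table (assumption: only three columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE package_extra (package_id TEXT, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO package_extra VALUES (?, ?, ?)",
    [
        ("pkg-a", "theme", '[{"theme": "AGRI", "subthemes": []}]'),  # aggregate format
        ("pkg-b", "theme", "AGRI"),                                  # plain format
        ("pkg-c", "theme", None),                                    # NULL value
        ("pkg-d", "license", "cc-by"),                               # unrelated key
    ],
)

pattern = '%"subthemes"%'

# Rows that would be renamed to the aggregate field.
aggregate = [row[0] for row in conn.execute(
    "SELECT package_id FROM package_extra "
    "WHERE key = 'theme' AND value LIKE ? ORDER BY package_id", (pattern,))]

# Rows that would be reported as obsolete.
plain = [row[0] for row in conn.execute(
    "SELECT package_id FROM package_extra "
    "WHERE key = 'theme' AND value NOT LIKE ? ORDER BY package_id", (pattern,))]
```

In this sketch `pkg-a` lands in `aggregate` and `pkg-b` in `plain`, while the row with a NULL value matches neither filter, since a comparison against NULL is never true in SQL. If an installation could contain such rows, they would be worth checking separately.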
