Mapping Suite CLI toolchain

These instructions are intended for semantic engineers who work with mapping suites that are already available, or are being developed, in the TED-RDF Mappings repository.

Installation instructions for semantic engineers

Python 3.8 or later on a Linux machine is required.
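The requirement above can be checked from Python itself before going further. The following is a minimal sketch; the function name `meets_requirements` is hypothetical and not part of the toolchain.

```python
import sys


def meets_requirements(version_info=None, platform=None):
    """Return True for Python 3.8+ on Linux, the setup this guide assumes."""
    # Fall back to the running interpreter when no values are passed in.
    version_info = sys.version_info if version_info is None else version_info
    platform = sys.platform if platform is None else platform
    return tuple(version_info[:2]) >= (3, 8) and platform.startswith("linux")


if __name__ == "__main__":
    print("OK" if meets_requirements() else "Python 3.8+ on Linux is required")
```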

Open a Linux terminal and clone the ted-rdf-mapping project.

git clone https://github.com/OP-TED/ted-rdf-mapping
cd ted-rdf-mapping

Create a virtual Python environment and activate it.

pip install virtualenv # install virtualenv first
python -m venv ./venv # create a virtual environment named venv
source ./venv/bin/activate # activate the environment

If the virtual environment already exists, remember to activate it every time you work on the mapping suites.

source ./venv/bin/activate # activate the environment
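If it is unclear whether the environment is active, the running interpreter can report it. This is a small sketch that relies on the standard `sys.prefix` and `sys.base_prefix` attributes; the helper name `in_virtualenv` is only illustrative.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```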

Install the TED-SWS CLIs as a Python package using the pip package manager.

pip install git+https://github.com/OP-TED/ted-rdf-conversion-pipeline#egg=ted-sws
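After installation, one way to confirm that the console scripts are reachable is to look them up on the PATH. The sketch below assumes that pip exposes each command named in this guide as an executable in the active environment; the helper `missing_commands` is hypothetical.

```python
import shutil

# Console scripts described in this guide (names taken from the CMD sections).
EXPECTED_COMMANDS = (
    "normalisation_resource_generator",
    "resources_injector",
    "rml_modules_injector",
    "metadata_generator",
    "yarrrml2rml_converter",
    "sparql_generator",
    "mapping_runner",
    "mapping_suite_processor",
    "sparql_runner",
    "xpath_coverage_runner",
    "shacl_runner",
    "rml_report_generator",
    "triple_store_loader",
    "mapping_suite_validator",
    "conceptual_mapping_differ",
    "rdf_differ",
    "s3_rdf_publisher",
)


def missing_commands(names=EXPECTED_COMMANDS):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_commands()
    print("all commands found" if not missing else f"missing: {missing}")
```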

Usage

The CLI tools (console scripts) described in this section support mapping suite generation, processing, validation, testing, and related tasks.

For general use, passing the MAPPING_SUITE_ID argument to any of these commands should be enough.

CMD: normalisation_resource_generator

Generates all resource files needed for notice mapping suite transformation.

Use:

normalisation_resource_generator --help

to get the Usage Help:

Usage: normalisation_resource_generator [OPTIONS] [MAPPING_SUITE_ID]

Options:
  -i, --opt-queries-folder TEXT               Use to overwrite default INPUT
  -o, --opt-output-folder TEXT                Use to overwrite default OUTPUT
  -m, --opt-mappings-folder TEXT
  --help                                      Show this message and exit.

CMD: resources_injector

Injects the requested resources from Conceptual Mappings into the MappingSuite.

Use:

resources_injector --help

to get the Usage Help:

Usage: resources_injector [OPTIONS] [MAPPING_SUITE_ID]

  Injects the requested resources from Conceptual Mappings into the MappingSuite

Options:
  -i, --opt-conceptual-mappings-file TEXT     Use to overwrite default INPUT
  -o, --opt-output-folder TEXT                Use to overwrite default OUTPUT
  -r, --opt-resources-folder TEXT
  -m, --opt-mappings-folder TEXT
  --help                                      Show this message and exit.

CMD: rml_modules_injector

Injects the requested RML modules from Conceptual Mappings into the MappingSuite.

Use:

rml_modules_injector --help

to get the Usage Help:

Usage: rml_modules_injector [OPTIONS] [MAPPING_SUITE_ID]

  Injects the requested RML modules from Conceptual Mappings into the MappingSuite

Options:
  -i, --opt-conceptual-mappings-file TEXT     Use to overwrite default INPUT
  -o, --opt-output-folder TEXT                Use to overwrite default OUTPUT
  -c, --opt-clean BOOLEAN                     Use to clean the OUTPUT folder
  -r, --opt-rml-modules-folder TEXT
  -m, --opt-mappings-folder TEXT
  --help                                      Show this message and exit.

CMD: metadata_generator

Generates metadata.json file from Conceptual Mappings file data.

Use:

metadata_generator --help

to get the Usage Help:

Usage: metadata_generator [OPTIONS] [MAPPING_SUITE_ID]

  Generates Metadata from Conceptual Mappings.

Options:
  -i, --opt-conceptual-mappings-file TEXT     Use to overwrite default INPUT
  -o, --opt-output-metadata-file TEXT         Use to overwrite default OUTPUT
  -m, --opt-mappings-folder TEXT
  --help                                      Show this message and exit.

CMD: yarrrml2rml_converter

Converts YARRRML data to RML data.

Use:

yarrrml2rml_converter --help

to get the Usage Help:

Usage: yarrrml2rml_converter [OPTIONS] [MAPPING_SUITE_ID] [RML_OUTPUT_FILE_NAME]

  Converts YARRRML to RML. Skip RML_OUTPUT_FILE_NAME to use the default name.

Options:
  -i, --opt-yarrrml-input-file TEXT           Use to overwrite default INPUT
  -o, --opt-rml-output-file TEXT              Use to overwrite default OUTPUT
  -m, --opt-mappings-folder TEXT
  --help                                      Show this message and exit.

CMD: sparql_generator

Generates SPARQL queries from Conceptual Mappings file data.

Use:

sparql_generator --help

to get the Usage Help:

Usage: sparql_generator [OPTIONS] [MAPPING_SUITE_ID]

  Generates SPARQL queries from Conceptual Mappings.

Options:
  -i, --opt-conceptual-mappings-file TEXT         Use to overwrite default INPUT
  -o, --opt-output-sparql-queries-folder TEXT     Use to overwrite default OUTPUT
  -rq-name, --opt-rq-name TEXT
  -m, --opt-mappings-folder TEXT
  --help                                          Show this message and exit.

CMD: mapping_runner

Transforms the Test Mapping Suites.

Use:

mapping_runner --help

to get the Usage Help:

Usage: mapping_runner [OPTIONS] [MAPPING_SUITE_ID] [SERIALIZATION_FORMAT]

  Transforms the Test Mapping Suites (identified by mapping-suite-id). If no
  mapping-suite-id is provided, all mapping suites from mappings directory
  will be processed.

Options:
  --opt-mapping-suite-id TEXT                 MappingSuite ID to be processed (leave empty
                                              to process all Mapping Suites).
  --opt-serialization-format TEXT             Serialization format (turtle (default),
                                              nquads, trig, trix, jsonld, hdt).
  --opt-mappings-folder TEXT
  --opt-output-folder TEXT
  --help                                      Show this message and exit.
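The help text shows both a positional form (in the Usage line) and `--opt-...` options, so which form applies is not fully clear from this guide. The sketch below follows the positional Usage line and validates the format against the list given in the help text; the helper `mapping_runner_args` is hypothetical.

```python
# Formats listed in the help text above; "turtle" is the documented default.
SERIALIZATION_FORMATS = ("turtle", "nquads", "trig", "trix", "jsonld", "hdt")


def mapping_runner_args(mapping_suite_id=None, serialization_format=None):
    """Build an argument list following the positional form of the Usage line."""
    if serialization_format is not None:
        if serialization_format not in SERIALIZATION_FORMATS:
            raise ValueError(f"unsupported format: {serialization_format}")
        if mapping_suite_id is None:
            # A second positional argument cannot be given without the first.
            raise ValueError("a format requires a mapping suite ID")
    args = ["mapping_runner"]
    if mapping_suite_id is not None:
        args.append(mapping_suite_id)
    if serialization_format is not None:
        args.append(serialization_format)
    return args
```

Calling `mapping_runner_args()` with no arguments yields the bare command, which according to the help text processes all mapping suites in the mappings directory.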

CMD: mapping_suite_processor

Processes a Mapping Suite (identified by mapping-suite-id). The processing steps can be selected:

    - by commands:
        --- resources_injector
        --- rml_modules_injector
        --- sparql_generator
        --- rml_report_generator
        --- mapping_runner
        --- xpath_coverage_runner
        --- sparql_runner
        --- shacl_runner
        --- validation_summary_runner
        --- triple_store_loader
        --- metadata_generator
        --- mapping_suite_validator
    - by groups:
        --- "inject_resources": ["resources_injector", "rml_modules_injector"],
        --- "generate_resources": ["sparql_generator", "rml_report_generator"],
        --- "update_resources": ["resources_injector", "rml_modules_injector", "sparql_generator", "rml_report_generator"],
        --- "transform_notices": ["mapping_runner"],
        --- "validate_notices": ["xpath_coverage_runner", "sparql_runner", "shacl_runner", "validation_summary_runner"],
        --- "upload_notices": ["triple_store_loader"],
        --- "validate_mapping_suite": ["mapping_suite_validator"]
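The relationship between groups and commands listed above can be written down as a plain mapping. This is a sketch of that table only, not the tool's own code; removing duplicates while keeping order is an assumption, since this guide does not say how repeated commands are handled. Note that metadata_generator appears in the command list but in none of the groups.

```python
# Group table copied from the list above.
COMMAND_GROUPS = {
    "inject_resources": ["resources_injector", "rml_modules_injector"],
    "generate_resources": ["sparql_generator", "rml_report_generator"],
    "update_resources": [
        "resources_injector",
        "rml_modules_injector",
        "sparql_generator",
        "rml_report_generator",
    ],
    "transform_notices": ["mapping_runner"],
    "validate_notices": [
        "xpath_coverage_runner",
        "sparql_runner",
        "shacl_runner",
        "validation_summary_runner",
    ],
    "upload_notices": ["triple_store_loader"],
    "validate_mapping_suite": ["mapping_suite_validator"],
}


def expand_groups(groups):
    """Expand group names into an ordered command list without duplicates."""
    commands = []
    for group in groups:
        for command in COMMAND_GROUPS[group]:
            # Assumption: a command requested twice is only listed once.
            if command not in commands:
                commands.append(command)
    return commands
```

For example, expanding "inject_resources" followed by "generate_resources" gives the same four commands as "update_resources".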

Use:

mapping_suite_processor --help

to get the Usage Help:

Usage: mapping_suite_processor [OPTIONS] MAPPING_SUITE_ID

  Processes Mapping Suite (identified by mapping-suite-id): -
  resources_injector - rml_modules_injector - sparql_generator -
  rml_report_generator - mapping_runner - xpath_coverage_runner -
  sparql_runner - shacl_runner - validation_summary_runner -
  triple_store_loader - mapping_suite_validator - metadata_generator

Options:
  -n, --notice-id TEXT            Provide notices to be used where applicable
  -c, --command TEXT              resources_injector,rml_modules_injector,spar
                                  ql_generator,rml_report_generator,mapping_ru
                                  nner,xpath_coverage_runner,sparql_runner,sha
                                  cl_runner,validation_summary_runner,triple_s
                                  tore_loader,metadata_generator,mapping_suite
                                  _validator
  -g, --group TEXT                inject_resources,generate_resources,update_r
                                  esources,transform_notices,validate_notices,
                                  upload_notices,validate_mapping_suite
  -m, --opt-mappings-folder TEXT
  -r, --opt-rml-modules-folder TEXT
  --help                          Show this message and exit.

Use:

mapping_suite_processor -c COMMAND1 -c COMMAND2 ...
or
mapping_suite_processor --command=COMMAND1,COMMAND2

to choose which commands are executed, and in what order

mapping_suite_processor -g GROUP1 -g GROUP2 ...
or
mapping_suite_processor --group=GROUP1,GROUP2

to choose which command groups are executed

mapping_suite_processor -n NOTICE_ID1 -n NOTICE_ID2 ...
or
mapping_suite_processor --notice-id=NOTICE_ID1,NOTICE_ID2

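to set the notice IDs to be used (where applicable). In every form, the required MAPPING_SUITE_ID argument shown in the Usage line must also be passed.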
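When the processor is driven from a script, the repeated-flag form can be assembled programmatically. The sketch below only builds the argument list; the helper `processor_args` is hypothetical, and placing MAPPING_SUITE_ID after the options is an assumption based on the Usage line.

```python
def processor_args(mapping_suite_id, commands=(), groups=(), notice_ids=()):
    """Build a mapping_suite_processor argument list using repeated flags."""
    args = ["mapping_suite_processor"]
    for command in commands:
        args += ["-c", command]
    for group in groups:
        args += ["-g", group]
    for notice_id in notice_ids:
        args += ["-n", notice_id]
    # Assumption: the required positional argument follows the options.
    args.append(mapping_suite_id)
    return args


if __name__ == "__main__":
    print(" ".join(processor_args("package_F03", groups=["validate_notices"])))
```

The resulting list can be handed to `subprocess.run(args, check=True)` from the standard library once the toolchain is installed.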

CMD: sparql_runner

Generates SPARQL Validation Reports for RDF files.

Use:

sparql_runner --help

to get the Usage Help:

Usage: sparql_runner [OPTIONS] [MAPPING_SUITE_ID]

  Generates Validation Reports for RDF files

Options:
  -m, --opt-mappings-folder TEXT
  --help                                      Show this message and exit.

CMD: xpath_coverage_runner

Generates Coverage Reports for Notices.

Use:

xpath_coverage_runner --help

to get the Usage Help:

Usage: xpath_coverage_runner [OPTIONS] [MAPPING_SUITE_ID]

  Generates Coverage Reports for Notices

Options:
  -i, --opt-conceptual-mappings-file TEXT     Use to overwrite default INPUT
  -m, --opt-mappings-folder TEXT
  --help                                      Show this message and exit.

CMD: shacl_runner

Generates SHACL Validation Reports for RDF files.

Use:

shacl_runner --help

to get the Usage Help:

Usage: shacl_runner [OPTIONS] [MAPPING_SUITE_ID]

  Generates SHACL Validation Reports for RDF files

Options:
  -m, --opt-mappings-folder TEXT
  --help                                      Show this message and exit.

CMD: rml_report_generator

Generates RML modules report file for Mapping Suite.

Use:

rml_report_generator --help

to get the Usage Help:

Usage: rml_report_generator [OPTIONS] [MAPPING_SUITE_ID]

  Generates RML modules report file for Mapping Suite.

Options:
  -m, --opt-mappings-folder TEXT
  --help                                      Show this message and exit.

CMD: triple_store_loader

Loads the MappingSuite output into Triple Store.

Use:

triple_store_loader --help

to get the Usage Help:

Usage: triple_store_loader [OPTIONS] [MAPPING_SUITE_ID]

  Loads the MappingSuite output into Triple Store.

Options:
  -c, --opt-catalog-name TEXT
  -m, --opt-mappings-folder TEXT
  --help                                      Show this message and exit.

CMD: mapping_suite_validator

Validates the structure of a Mapping Suite.

Use:

mapping_suite_validator --help

to get the Usage Help:

Usage: mapping_suite_validator [OPTIONS] [MAPPING_SUITE_ID]

  Validates a Mapping Suite (structure)

Options:
  -m, --opt-mappings-folder TEXT
  --help                          Show this message and exit.

CMD: conceptual_mapping_differ

Generates reports (JSON, HTML) with the differences between two Conceptual Mappings.

Use:

conceptual_mapping_differ --help

to get the Usage Help:

Usage: conceptual_mapping_differ [OPTIONS]

  Generate reports (JSON, HTML) with differences between 2 Conceptual Mappings

Options:
  -ms-id, --mapping-suite-id TEXT Mapping Suite IDs
  -f, --file TEXT                 Conceptual Mappings files
  -b, --branch TEXT               GIT branches or tags
  -m, --opt-mappings-folder TEXT
  -o, --opt-output-folder TEXT
  --help                          Show this message and exit.

Use for:

* --file vs --file
# conceptual_mapping_differ --file=<CONCEPTUAL_MAPPINGS_FILE1> --file=<CONCEPTUAL_MAPPINGS_FILE2>

* --mapping-suite-id vs --file
# conceptual_mapping_differ --mapping-suite-id=<MAPPING_SUITE_ID1> --file=<CONCEPTUAL_MAPPINGS_FILE2>

* --mapping-suite-id vs --mapping-suite-id
# conceptual_mapping_differ --mapping-suite-id=<MAPPING_SUITE_ID1> --mapping-suite-id=<MAPPING_SUITE_ID2>

* --branch + --mapping-suite-id vs --branch + --mapping-suite-id
# conceptual_mapping_differ --branch=<BRANCH1> --mapping-suite-id=<MAPPING_SUITE_ID1> --branch=<BRANCH2> --mapping-suite-id=<MAPPING_SUITE_ID2>
# conceptual_mapping_differ -b <BRANCH1> -ms-id <MAPPING_SUITE_ID1> -b <BRANCH2> -ms-id <MAPPING_SUITE_ID2>

* --branch + --mapping-suite-id vs --file
# conceptual_mapping_differ --branch=<BRANCH1> --mapping-suite-id=<MAPPING_SUITE_ID1> --file=<FILE2>

* --branch + --mapping-suite-id (remote) vs --mapping-suite-id (local)
# conceptual_mapping_differ --branch=<BRANCH> --mapping-suite-id=<MAPPING_SUITE_ID>
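The combinations above all follow one pattern: each side of the comparison is described by a branch, a mapping suite ID, a file, or a mix of these. A minimal sketch of that pattern follows; the helper `differ_args` and the dictionary keys are hypothetical, and only the long option names from the help text are used.

```python
def differ_args(first, second=None):
    """Build a conceptual_mapping_differ argument list from two 'sides'.

    Each side is a dict with any of the keys "branch", "mapping_suite_id"
    and "file". The second side may be omitted, as in the last case above.
    """
    args = ["conceptual_mapping_differ"]
    for side in (first, second or {}):
        if "branch" in side:
            args.append("--branch=" + side["branch"])
        if "mapping_suite_id" in side:
            args.append("--mapping-suite-id=" + side["mapping_suite_id"])
        if "file" in side:
            args.append("--file=" + side["file"])
    return args
```

For instance, `differ_args({"branch": "BRANCH1", "mapping_suite_id": "ID1"}, {"file": "FILE2"})` reproduces the "--branch + --mapping-suite-id vs --file" form with placeholder values.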

CMD: rdf_differ

Given two RML files representing turtle-encoded RDF, check whether they represent the same graph.

Use:

rdf_differ --help

to get the Usage Help:

Usage: rdf_differ [OPTIONS] FIRST_FILE SECOND_FILE

  Given two RML files representing turtle-encoded RDF, check whether they
  represent the same graph.

Options:
  -o, --output-folder TEXT
  --help                    Show this message and exit.

CMD: s3_rdf_publisher

Publishes RDF content to an S3 bucket.

Use:

s3_rdf_publisher --help

to get the Usage Help:

Usage: s3_rdf_publisher [OPTIONS]

  Publish RDF content to S3 bucket. --rdf-file[list] OR --mapping-suite-
  id[value] OR (--mapping-suite-id[value] AND --notice-id[list]) must be
  provided!

  Make sure to have set up these variables in .env file:

  S3_PUBLISH_HOST, S3_PUBLISH_NOTICE_RDF_BUCKET (this will be overwritten by
  CLI option, if provided), S3_PUBLISH_USER, S3_PUBLISH_PASSWORD,
  S3_PUBLISH_REGION=eu-central-1, S3_PUBLISH_SECURE=1, S3_PUBLISH_SSL_VERIFY=0

Options:
  -f, --rdf-file TEXT             '--rdf-file=RDF_FILE' or '-f
                                  RDF_FILE1,RDF_FILE2'
  -ms-id, --mapping-suite-id TEXT
  -n, --notice-id TEXT            '--notice-id=NOTICE_ID' or '-n
                                  NOTICE_ID1,NOTICE_ID2'
  -no-n, --skip-notice-id TEXT    notices to be skipped when only --mapping-
                                  suite-id is provided
  -b, --bucket-name TEXT          S3 Bucket
  -o, --object-name TEXT          '--object-name=OBJECT_NAME' or '-o
                                  OBJECT_NAME1,OBJECT_NAME2'
  -m, --mappings-folder TEXT
  --help                          Show this message and exit.
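Before publishing, it can help to confirm that the .env file defines the variables named in the help text. The sketch below uses a deliberately simplified KEY=VALUE parser (no quoting, no `export` prefix), which is an assumption about the file layout rather than the toolchain's own loader; `missing_env_keys` is a hypothetical helper. Since the bucket can also be set with --bucket-name, treating it as required here is a conservative choice.

```python
# Variable names taken from the help text above.
REQUIRED_KEYS = (
    "S3_PUBLISH_HOST",
    "S3_PUBLISH_NOTICE_RDF_BUCKET",
    "S3_PUBLISH_USER",
    "S3_PUBLISH_PASSWORD",
    "S3_PUBLISH_REGION",
    "S3_PUBLISH_SECURE",
    "S3_PUBLISH_SSL_VERIFY",
)


def missing_env_keys(env_text, required=REQUIRED_KEYS):
    """Return required keys that have no non-empty value in .env-style text."""
    defined = set()
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value.strip():
            defined.add(key.strip())
    return [key for key in required if key not in defined]
```

A typical call would read the file first, for example `missing_env_keys(open(".env").read())`, and report any names in the returned list.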

Usage example:

s3_rdf_publisher -ms-id package_F03 -n 000163-2021 -n 006737-2021 -o object_name_for_000163-2021 -o object_name_for_006737-2021 -b bucket_name -f some_rdf_file_path

Use --skip-notice-id to skip individual notices when only --mapping-suite-id is provided (in that case, all RDF files of the mapping suite are published unless skipped).

The --object-name (-o) values are assigned to the RDFs in order of insertion (the --notice-id list first, followed by the --rdf-file list): the first --object-name is used for the first RDF, and so on. If an RDF has no corresponding --object-name, its object name is the same as the provided RDF file.

--rdf-file[list] OR --mapping-suite-id[value] OR (--mapping-suite-id[value] AND --notice-id[list]) must be provided!
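The two rules above (which inputs are required, and how object names are paired with RDFs) can be illustrated with a short sketch. Both helpers are hypothetical and only mirror the description in this guide; in particular, falling back to the item itself when no object name is left is a simplification of "the same as the provided RDF file".

```python
def has_required_inputs(rdf_files=(), mapping_suite_id=None, notice_ids=()):
    """Return True if --rdf-file or --mapping-suite-id is provided.

    Notice IDs alone are not enough: they only narrow down a mapping suite.
    """
    return bool(rdf_files) or bool(mapping_suite_id)


def assign_object_names(notice_ids, rdf_files, object_names):
    """Pair each RDF with an object name in order of insertion."""
    # Notice IDs come first, followed by the RDF files.
    items = list(notice_ids) + list(rdf_files)
    pairs = []
    for index, item in enumerate(items):
        # Assumption: without a matching object name, reuse the item itself.
        name = object_names[index] if index < len(object_names) else item
        pairs.append((item, name))
    return pairs
```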