Mapping suite package structure
This section describes the structure of a “mapping suite package” in GitHub. Such a package contains everything needed to develop and test a given “mapping suite” that applies to a certain set of notices. Once the package is finalised, a process can apply it to a large number of notices stored in a database and transform those notices into RDF data.
A package is represented by a well-defined folder structure containing certain files. This folder structure is repeated for every developed mapping. Initial organisation of these packages is per Form number, but it may evolve.
The structure of the package changes through the different phases of the mapping development process. Below we describe how such a package looks in three phases of the mapping development.
In the first, initial, phase, when the Semantic Engineers start working on a new mapping suite, they will have to set up a package folder structure similar to the one described below, and will work on (or with) the files contained there.
Assumption: Regarding the naming and organisation of the various mapping suites, one package per form number is assumed to be THE way to organise these packages.
Challenge: Are there better ways to deal with certain sections (sub-sections) that repeat across multiple forms? Consider Section I, for example, which in the case of forms F03, F06 and F25 contains “almost” the same information; ideally only one mapping would be written for it and re-used in the “final” form mapping packages. The problem is also discussed in a dedicated section below.
An example mapping package folder structure is presented below:
/package_Fxx
    /transformation
        conceptual_mappings.xlsx
        /mappings
            *.rml.ttl
        /resources
            *.json, *.xml, *.csv
    /test_data
        *.xml
/package_Fxx: root folder of the mapping suite
/transformation/conceptual_mappings.xlsx: manually created (from the Google Sheet template described here)
/transformation/resources: additional resources possibly needed by the transformation rules. The content of this folder should be automatically generated by the mapping package processor, based on the "Resources" sheet of the conceptual_mappings.xlsx, from the "source of truth"
/transformation/mappings/*.rml.ttl: the relevant RML transformation rules, organized in module files, which are copied from the "source" mappings folder, according to the information specified in the "RML Modules" sheet of the conceptual_mappings.xlsx. IMPORTANT: in these rules the source XML always refers to data/source.xml, which corresponds to the ../../data/source.xml file that will be copied (and renamed) from the test_data folder at the time of the execution of the mapping.
/test_data: manually and carefully selected test data, possibly grouped in subfolders, e.g.
technical_mappings.yarrrml.yaml: (optional) manually created and used in the earlier days of the mapping development, but currently not used
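The initial folder layout described above can be set up by hand, or with a small helper script. The following is a minimal sketch only: the function name scaffold_package and the constant INITIAL_FOLDERS are hypothetical, and it creates just the empty folders, leaving conceptual_mappings.xlsx and the test data to be added manually.

```python
import tempfile
from pathlib import Path

# Folder names taken from the package structure described above.
INITIAL_FOLDERS = (
    "transformation/mappings",
    "transformation/resources",
    "test_data",
)


def scaffold_package(root: Path, form_id: str) -> Path:
    """Create the empty initial folders for one mapping suite package.

    Hypothetical helper: the files themselves (conceptual_mappings.xlsx,
    *.rml.ttl, test notices) are not created here.
    """
    package_dir = root / f"package_{form_id}"
    for relative in INITIAL_FOLDERS:
        (package_dir / relative).mkdir(parents=True, exist_ok=True)
    return package_dir


if __name__ == "__main__":
    # Demonstrate in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        package = scaffold_package(Path(tmp), "F03")
        for path in sorted(p for p in package.rglob("*") if p.is_dir()):
            print(path.relative_to(package).as_posix())
```

Calling the helper twice for the same form is harmless, because existing folders are left untouched.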
A package provided by the semantic engineers (SE) is enriched with additional artefacts that are generated automatically using the package expanding tools which take as input the artefacts provided by the SE. Here are some examples of these additional artefacts that are being generated:
Metadata describing the parameters for selecting the notices that the mappings can be applied to, various version information, etc.
SPARQL queries that can be used to validate and/or test the generated outputs
SHACL shapes that can be used to validate the structure of the generated outputs
This list reflects the state at the time of writing this document; new artefact types may be added.
After the package processing/expansion, the structure of the example mapping package presented in the previous subsection would look like this:
/package_Fxx
    metadata.json
    /transformation
        conceptual_mappings.xlsx
        /mappings
            *.rml.ttl
        /resources
            *.json, *.xml, *.csv
    /data
        source.xml
    /output
        *.rdf
    /validation
        /sparql
            /cm_assertions
                *.rq
        /shacl              # this is a constant, when we know what the SHACL is (currently unknown)
            *.shacl.ttl     # data shape file(s)
    /test_data              # manually and carefully selected test data
        *.xml
metadata.json: automatically generated from Metadata sheet of
/data: this is a placeholder created at runtime to process the inputs. It serves only when the mapping suite is being tested, or executed by some script.
source.xml: this file is generated during runtime by copying a given test data file
/output: this is a placeholder created at runtime to store outputs. It serves only when the mapping suite is being tested, or executed by some script.
/validation/sparql/cm_assertions: SPARQL queries automatically generated from the conceptual mapping
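The runtime placeholders can be illustrated with a short sketch. It assumes that staging a notice simply means copying one file from test_data to data/source.xml and making sure the output folder exists; the function name stage_test_notice is hypothetical and is not part of the actual package processing tools.

```python
import shutil
import tempfile
from pathlib import Path


def stage_test_notice(package_dir: Path, notice_file: Path) -> Path:
    """Copy one test notice to data/source.xml and create the output folder.

    Hypothetical helper: the RML rules read data/source.xml, so the chosen
    test file is copied (and thereby renamed) before a mapping is executed.
    """
    data_dir = package_dir / "data"
    output_dir = package_dir / "output"
    data_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)
    source_xml = data_dir / "source.xml"
    shutil.copyfile(notice_file, source_xml)  # the original test file is kept
    return source_xml


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        package = Path(tmp) / "package_F03"
        test_data = package / "test_data"
        test_data.mkdir(parents=True)
        notice = test_data / "example_notice.xml"
        notice.write_text("<notice/>", encoding="utf-8")
        staged = stage_test_notice(package, notice)
        print(staged.relative_to(package).as_posix())
```

Because the file is copied rather than moved, the same package can be staged again with a different test notice; each call overwrites data/source.xml.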
After the “execution” of a mapping, the mapping package will be further enriched, and will contain additional files, as a result of running the mapping suite on the included test data.
/package_Fxx
    metadata.json
    /transformation
        conceptual_mappings.xlsx
        /mappings
            *.rml.ttl
        /resources
            *.json, *.xml, *.csv
    /data
        source.xml
    /output
        /<notice_file1>
            <notice_file1>.ttl
            /test_suite_report
                *.ttl, *.html, *.json     # e.g. sparql_cm_assertions.html, shacl_epo.html, xml_coverage.html
        /<notice_file2>
            ...
        /<notice_file3>
            ...
    /validation
        /sparql
            /cm_assertions
                *.rq
        /shacl
            /epo
                ePO_shacl_shapes.rdf
                shacl_result_query.rq
    /test_data
        <notice_file1>.xml
        <notice_file2>.xml
        <notice_file3>.xml
        *.xml
/output/<notice_file1>: for each example file we create a folder that will contain all the generated artefacts for that sample file
/output/test_suite_report: validation reports summarising all individual reports
/output/<notice_file1>/<notice_file1>.ttl: the output of the transformation
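The relation between test data files and output folders can be sketched as follows. This is an illustration under assumptions: it takes <notice_file1> to be the test file name without its .xml extension, and it ignores any subfolder grouping inside test_data when naming the output folder. The function name plan_outputs is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict


def plan_outputs(package_dir: Path) -> Dict[str, str]:
    """Map every test notice to the path where its RDF output is expected.

    Assumption: the output folder and file are named after the test file's
    stem, i.e. test_data/<name>.xml -> output/<name>/<name>.ttl.
    """
    plan: Dict[str, str] = {}
    for xml_file in sorted((package_dir / "test_data").rglob("*.xml")):
        stem = xml_file.stem
        relative_input = xml_file.relative_to(package_dir).as_posix()
        plan[relative_input] = f"output/{stem}/{stem}.ttl"
    return plan


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        package = Path(tmp) / "package_F03"
        (package / "test_data").mkdir(parents=True)
        (package / "test_data" / "notice_a.xml").write_text("<notice/>", encoding="utf-8")
        for source, target in plan_outputs(package).items():
            print(f"{source} -> {target}")
```

Note that under this naming assumption two test files with the same name in different subfolders would map to the same output folder, so file names would need to be unique within a package.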