Mapping Methodology

The purpose of this guide is to describe a methodology for creating mapping rules that can be used for transforming XML data representing the content of the TED “Standard” Forms into RDF graphs that conform to the ePO ontology.

The methodology will rely on various resources generated by earlier attempts (e.g. those created by Everis), outcomes of projects that work on related issues (e.g. by Deloitte e.a.), as well as the input of a number of people with in-depth knowledge of the ePO ontology, XML, RML and expertise in Semantic Web technologies in general.

The mappings generated by this methodology are to be used as part of the notice transformation workflow for all Tenders Electronic Daily (TED) notices conforming to "Standard" forms that are published in the TED XML format, into RDF format, in order to make it available for querying through a SPARQL endpoint.

Mapping development lifecycle

The purpose of this section is to provide a high-level overview of the mapping development lifecycle before the mapping suites are ready to be used for transformation in production settings.

The diagram in Figure 1 describes the complete mapping development lifecycle.

Figure 1. Mapping development lifecycle

The process starts by deciding which notice type to map. The notice classification schemes currently considered are the "standard forms" and the "eForms". In the future, however, it is likely that new classification schemes will follow.

Once the target form is selected, a set of representative notices shall be selected to serve as a test set for the developed mapping.

The notice classification scheme is usually subject to regulations and is described in multiple forms, ranging from PDF documents and XSD schemas to other documents. All available documentation and modelling artefacts are analysed in order to produce a conceptual mapping (described in the Conceptual mapping structure section). This artefact, stored as a workbook, establishes the correspondence between the TED notice fields, XPath expressions, and ePO structures.

Once the conceptual mapping is established between the XPath expressions and ontology structures, a corresponding technical mapping is implemented. This technical mapping consists of a set of mapping rules expressed in RDF Mapping Language (RML) and it represents a translation of the conceptual mapping into a formal language which can be automatically executed by a transformation engine, such as RMLMapper.

Reading RML, for revision or debugging purposes, can be difficult. To ease this process, the RML files can be turned into a comprehensive HTML report. This human-friendly form can serve, for example, for reviewing the conceptual mapping rules against the technical ones, in order to check their completeness and consistency.

That completeness check is a manual step, but it is not the only quality assurance mechanism foreseen in the lifecycle. The setup and conventions of the conceptual mapping file permit the generation of several additional artefacts (described elsewhere). In particular, a set of SPARQL assertion queries can be generated. These queries can be used to assess whether the fields mapped in the conceptual mapping are found, in the expected form, in the RDF file generated from an XML notice.

To put the SPARQL assertions into practice, the initially selected sample data is automatically transformed into RDF using the RML mapping rules. The SPARQL queries are then applied to each output, resulting in a report that indicates which queries yielded a positive and which a negative result. The resulting validation reports can be checked to detect mistakes in the RML transformation rules or in the conceptual mapping.
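To illustrate what such an assertion query could look like, the following Python sketch builds a SPARQL ASK query from a property path. The function name build_ask_query, the list-based input and the shape of the query are assumptions made for illustration; the queries actually generated from the conceptual mapping may look different.

```python
def build_ask_query(prefixes, property_path):
    """Build a SPARQL ASK query that checks whether a property path occurs.

    prefixes: dict mapping prefix names to namespace IRIs (hypothetical input).
    property_path: list of prefixed property names, in path order.
    """
    prefix_lines = [f"PREFIX {name}: <{iri}>" for name, iri in sorted(prefixes.items())]
    # SPARQL 1.1 sequence paths chain properties with "/".
    path = " / ".join(property_path)
    return "\n".join(prefix_lines + [f"ASK WHERE {{ ?s {path} ?o }}"])


query = build_ask_query(
    {"epo": "http://data.europa.eu/a4g/ontology#"},
    ["epo:hasName"],
)
print(query)
```

An ASK query returns true or false, which fits a report that lists a positive or negative result per mapped field.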

The final step consists of another assessment of the generated data: the fingerprinting. This procedure generates a report that reconstructs the data shape instantiated in a particular notice. This permits the identification of mistakes with regard to the ePO ontology.

Key elements involved in the mapping process

In this section we will provide descriptions and references to the key elements (concepts or resources) that are involved in the creation of mappings or the mapping process itself.

The purpose of the mapping process is to generate a “mapping table” (described in the conceptual mapping artefact section) that can be processed by an automated workflow of the TED-SWS system to “execute” these mappings in order to convert the TED XML input data into an ePO conformant RDF graph (the output of the mapping). This “mapping table” will be encoded as a spreadsheet, with multiple worksheets, whose structure is described elsewhere.

Notices to be mapped

The inputs to the transformation process are XML files that contain TED notice data. These data are structured according to the “Standard Forms” published by the European Commission. There are 24 standard forms defined (numbered 1-8, 12-25, T1 and T2), whose PDF versions can be found here: https://simap.ted.europa.eu/standard-forms-for-public-procurement.

The XML files need to conform to the official TED XML format defined by TED XML Schema (XSD). Over the years, multiple versions of this TED XML Schema were released, and there is a significant amount of XML data published that conform to these various versions. The latest XML notices are conformant to the versions R2.0.8 (more precisely R2.0.8.S05.E01_002-20201027 - in case of forms 16-19) and R2.0.9 (more precisely R2.0.9.S05 - for all the other forms) of the TED XML Schema.

Considered resources

To create these mappings, we rely on a number of different resources that were produced by earlier or currently running projects that aim to provide solutions in the TED landscape. The following table contains a non-comprehensive enumeration of these resources, with their short descriptions and links to where they can be found. This table is provided as a quick reference only. More detailed description of the key resources can be found in a dedicated section below.

| Resource Name | Description |
| --- | --- |
| ePO - dev version | The e-Procurement Ontology (currently under development) link |
| TED_EXPORT.xsd | The XML Schema defined for the TED XML Notices. Download and extract the versions R2.0.9 and R2.0.8 of the TED XML schema from the links named Reception: TED eSenders XML schema on this page: link |
| TED forms PDFs | PDF files representing the physical forms that are to be completed with the relevant data according to each notice. link |
| Deloitte & OP - TED_XML_Mapping | Excel spreadsheets that map elements (fields, sections, etc.) found in the Standard Forms to elements in the eForms. These tables provide the full list of elements and also XPaths to identify the corresponding information in TED-XML files. link |
| Ontology_eForms_NEW_Mapping_New Regulation | link |
| Mappings by Everis | Just for reference. Various parts might be out of date. link |
| Test data set by Everis | A set of approx. 300 XML files (6 batches of about 50 files each, for the forms F02, F03, F05, F06, F24 and F25) link |
| RDF results by Everis | Just for reference. Various parts might be out of date. link |
| XML Data analysis | Contains various tables, each summarising certain aspects (e.g. XML elements related to certain fields in the form) of the data extracted from test notice files. link |
| XML Elements to Vocabulary Mapping | link |
| TED Mappings to ePO terms | link |
| conventions_report.html | Overview of ePO Terms generated from the UML model. conventions_report (to be checked out and opened in a browser) |

Transformation output

The output of the XML notice transformation will be an RDF graph instantiating the eProcurement Ontology, containing a number of RDF triples where the subjects, predicates and objects of the triples are either:

  • unique IRIs, generated in a deterministic fashion, that can identify the notice or the different component parts of a notice; these IRIs (or less frequently blank nodes) are used in multiple triples (either as subjects or object) to build an RDF graph;

  • IRIs representing controlled vocabulary terms or entities in the ePO ontology;

  • literals representing numbers, boolean values, or strings. The string values are often encoded as multilingual strings of type rdf:langString, to enable the representation of textual values in multiple European languages.

Mapping files produced

The key elements enabling transformation automation are the mapping files: the conceptual and technical mappings. They are developed according to the mapping creation methodology.

The mapping rules are organised in mapping suites, described here.

Conceptual mapping structure

In this section we describe the structure of the mapping file that we aim to create as the result of the described mapping process. The mapping file is an Excel workbook containing multiple worksheets. This workbook is generated from a Google Sheets document, where it can be prepared, revised, and refined by multiple contributors in a collaborative fashion. The template for this spreadsheet is provided here. This template informs software developers and knowledge engineers alike about what should be the content of such a mapping file.

It consists of several important worksheets:

  • Metadata sheet: provides important technical and descriptive information about the mapping suite

  • Resources sheet: provides the list of resources used in the technical mappings. This list is used to automatically populate the mapping suite with indicated resources files.

  • Rules sheet: provides the actual set of mapping rules

  • Misc: there are additional optional worksheets added by the semantic engineers to manage additional information.

The Metadata sheet has the following structure:

| Cell refs. | Header for content | Description | Notes |
| --- | --- | --- | --- |
| B2 | Form number | Standard Form number (one of F03-F08, F12-F25, T1 or T2). For multiple forms a comma separated list can be used. | See list of standard forms here |
| B3 | Legal Basis | Filter for the directives that constitute the legal bases for the notice. For multiple directives a comma separated list can be used. For any value the character * can be used. | Examples: D24 / D23, D25 / R1370 |
| B4 | Year | Filter for the year when the notice was published. For multiple years a comma separated list or ranges of the form startYear-endYear, or a combination of these two can be used. For any value the character * can be used. | Valid examples: 2018 / 2016-2020 / 2016, 2018-2020 |
| B5 | Notice type (eForms) | [TODO] | |
| B6 | Form type (eForms) | [TODO] | |
| B7 | Mapping Version | A version number for the current mapping table. The version number should be increased for each “released” version of the mapping table that is different from the previously released version, following semantic versioning practices. | Example values: 0.1.0 / 1.0.0-beta / 1.1.0 / 2.3.2 |
| B8 | EPO version | The version number of ePO to which the mapping is done. | |
| B9 | XSD version number(s) | The version number of the TED XML Schema file. For multiple versions a comma separated list or ranges of the form (startVersion, endVersion), or a combination of these two can be used. For any value the character * can be used. | Example values: R2.0.9.S05 (this includes all intermediary versions of R2.0.9.S05.E01, such as R2.0.9.S05.E01_001-20210730) / R2.0.9.S04.E01_002-20201027, R2.0.9.S04.E01_001-20201008 / (R2.0.9.S03.E01_005-20180515, R2.0.9.S03.E01_010-20200224] / Theoretically anything like this could be used: (,1.0],[1.2,) |
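As an illustration of how the Year filter (cell B4) might be interpreted, here is a minimal Python sketch. The function name year_matches is hypothetical, and treating ranges as inclusive on both ends is an assumption; the tooling that reads the metadata sheet may behave differently.

```python
def year_matches(filter_text, year):
    """Return True if `year` is selected by a Year filter value.

    Supports "*", single years, startYear-endYear ranges, and
    comma separated combinations of these.
    """
    for part in filter_text.split(","):
        part = part.strip()
        if part == "*":
            return True
        if "-" in part:
            # Assumption: ranges include both the start and the end year.
            start, end = (int(p) for p in part.split("-"))
            if start <= year <= end:
                return True
        elif part and int(part) == year:
            return True
    return False


print(year_matches("2016, 2018-2020", 2019))  # True
print(year_matches("2016, 2018-2020", 2017))  # False
```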

The Resources sheet has the following structure:

| Cell refs. | Header for content | Description | Notes |
| --- | --- | --- | --- |
| A2:A | File name | The name of the resource files that are used by the mappings and need to be present in the resources folder. | |

The Rules sheet has the following structure:

| Cell refs. | Header for content | Description | Notes |
| --- | --- | --- | --- |
| A:D | Conceptual mapping | | |
| A3:A | Standard Form Field ID (M) | The “identifier” of the field in the Standard Form PDF file. Usually the field number, such as IV.1.1.1.2, or the section name, e.g. “Section IV”. | Mandatory |
| B3:B | Standard Form Field Name (M) | The name of the field in the Standard Form PDF. | Mandatory |
| C3:C | eForm BT-ID (O) | The ID of the corresponding business term (BT) or business term group (BG) in eForms. The values come from column B of this spreadsheet (or one of its equivalents). | Optional |
| D3:D | eForm BT Name (O) | The name of the corresponding business term (BT) or business term group (BG) in eForms. The values come from column C of this spreadsheet (or one of its equivalents). | Optional |
| E:F | Standard form technical mapping | | |
| E3:E | Base XPath (for anchoring) (M) | The “base” XPath that identifies an XML element and all of its sub-elements. It can be specified at the level of a section or subsection, so that the XPaths for form elements within that (sub)section do not have to repeat the “base” XPath over and over again. | Mandatory |
| F3:F | Field XPath (M) | The XPath that identifies the form element, relative to the “base” XPath that was specified for the closest element above this one. | Mandatory |
| G:J | ePO mapping | | |
| G3:G | Class path (M) | Specifies the types of the resources involved in the entire “path” from the subject to the object, which “connects” the concept that represents this XML element (the object) to an RDF resource already created from previous XML elements (the subject). So, if the representation of an XML element involves the creation of the triples s p1 o1. o1 p2 o2. o2 p3 o. then the class path lists the types of the resources along this path (s, o1, o2 and o), in that order. | Mandatory |
| H3:H | Property path (M) | Specifies the properties involved in the entire “path” from the subject to the object, which “connects” the concept that represents this XML element (the object) to an RDF resource already created from previous XML elements (the subject). So, if the representation of an XML element involves the creation of the triples s p1 o1. o1 p2 o2. o2 p3 o. then the property path lists p1, p2 and p3, in that order. | Mandatory |
| I3:I | Triple fingerprint (O) | [TODO] | Optional |
| J3:J | Fragment fingerprint (O) | [TODO] | Optional |
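To make the relationship between the Base XPath (column E) and the Field XPath (column F) more concrete, here is a minimal Python sketch using the standard library. The sample notice and its organisation name are made up, and ElementTree supports only a subset of XPath, so this only illustrates the anchoring idea, not how the mappings are actually executed.

```python
import xml.etree.ElementTree as ET

# Made-up notice fragment, reduced to the elements needed for the example.
NOTICE = """<TED_EXPORT>
  <FORM_SECTION>
    <F03_2014 CATEGORY="ORIGINAL" FORM="F03" LG="PT">
      <CONTRACTING_BODY>
        <ADDRESS_CONTRACTING_BODY>
          <OFFICIALNAME>Example Authority</OFFICIALNAME>
        </ADDRESS_CONTRACTING_BODY>
      </CONTRACTING_BODY>
    </F03_2014>
  </FORM_SECTION>
</TED_EXPORT>"""

# Base XPath (column E), written relative to the root element because
# ElementTree does not accept absolute paths on an element.
BASE_XPATH = "./FORM_SECTION/F03_2014/CONTRACTING_BODY/ADDRESS_CONTRACTING_BODY"
# Field XPath (column F), relative to the base XPath.
FIELD_XPATH = "OFFICIALNAME"

root = ET.fromstring(NOTICE)
# Evaluate the field XPath once per element matched by the base XPath.
values = [anchor.findtext(FIELD_XPATH) for anchor in root.findall(BASE_XPATH)]
print(values)
```

Because the field XPath is evaluated per anchor element, repeated sections yield one value per occurrence without repeating the base path in every rule.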

Mapping development methodology

The purpose of this section is to give a high-level overview of the process that we follow to create the mapping rules that are used to transform TED notices available as XML files into RDF triples that use ePO terms.

The methodology that we describe here focuses on a few steps and artefacts that are at the core of the diagram depicted in Figure 2. This diagram shows only a part of the mapping lifecycle.

Figure 2. Mapping creation steps in the development lifecycle

We started out building our methodology by following this theoretical framework, but based on what we learned from our practical experience with manually creating such mappings we have enhanced and refined the mapping creation process. It is safe to assume that as we learn even more from creating further mappings, the process will be further refined and this document will evolve.

The focus here is on describing how the conceptual mapping and the technical mapping are created.

Creation of conceptual mappings:

  • Identify the form elements (and the corresponding XML elements and XPaths) to be mapped

  • Identify eForm Business Terms (BT) corresponding to the XML elements (optional)

  • Identify ePO terms (classes and relations) that correspond to XML elements and their relationships

  • Identify value sets in XML

  • Identify value sets in ePO and in other vocabularies, corresponding to XML controlled values used in the XML files

  • Write Turtle fragments that provide a template for the triples that should be generated, and which can be used in the RML mapping rules

  • Identify and document problems/issues/questions to be clarified with external experts

Creation of technical mappings:

  • Identify the sources that are necessary to execute a mapping

  • Prepare vocabulary files, and other “dictionaries” that are to be used as resources for the mapping

  • Prepare test data

  • Write YARRRML rules (optional)

  • Write RML rules (or convert YARRRML rules to RML) and test them

  • Document problems and create tasks to find solutions for them

Testing the mapping in various ways to discover potential problems and improve it:

  • Run the mapping on all test notices files and analyse the output

  • Generate the various validation outputs (SHACL shapes, SPARQL queries etc.), for all test data and analyse it

  • Execute the other steps in the mapping development lifecycle to find potential issues and refine the mapping

Next, we describe these steps in more detail.

Steps involved in the conceptual mapping process

The conceptual mapping (see structure description here) is the first artefact that must be created. It requires a thorough understanding of the content of a given form, and of all the related concepts in the ePO ontology. It will likely involve several rounds of discussions with people who have deep knowledge in these areas, to ensure that the conceptual mapping is done right. Below we look at certain sub-steps involved in developing the conceptual mapping.

Identify the Form Elements (and the Corresponding XML Elements and XPaths) to be Mapped

To identify the XML elements that contain information to be mapped to RDF we look at:

  • The “Standard Forms to eForms” mapping table that corresponds to the form that we are trying to map. This provides us both with the list of the form elements, and with XPath expressions that can be used to retrieve the appropriate information (from XML elements, attributes, etc.) from the XML data. Note: Some of these XPaths are fairly straightforward, but others can be quite complicated, or multiple XPaths are used to retrieve alternative values. We need to test these XPaths and see if we can write simpler, better and/or more appropriate ones.

  • The TED_EXPORT.xsd schema file, corresponding to the XML version that we are trying to map. Note: Special attention should be paid to the structure of the XML document (especially when we have repeating elements, or multiple levels of nesting, sometimes involving elements with very similar names).

  • The PDF form that we are trying to map, to make sure that all the elements are covered and that the correct semantics of the fields have been identified.

  • Individual XML notices available in our test data set, as well as data extracted from these sample notices and compiled in tables to provide an overview of the different values that the test data contain for a certain field.

Identify eForm Business Terms (BT) corresponding to the XML elements (optional)

Although this is not necessary for the conversion of the Standard Form XML to RDF data, from a future-oriented perspective it is still useful to identify the eForm Business Terms corresponding to each Standard Form element. This should be fairly straightforward by looking at the “Standard Forms to eForms” mapping table.

Identify ePO terms (classes and relations) that correspond to XML elements and their relationships

Basically, in this step, we need to identify the appropriate classes, class attributes, and relationships between the ePO classes that can be used to represent the information contained in the various XML elements. This requires a deep understanding of the ePO model.

Identifying the relevant ePO terms might not be straightforward in some cases, as there are significant differences between the conceptualisations and abstractions made in the two models, and we can often encounter situations where even the names used for the same concept differ significantly in the two models. Therefore, in this step it is highly recommended to consult people who have a thorough knowledge of the structure and content of the ePO model.

Make sure to document any problems and discrepancies that we discover and that prevent the creation of a perfect (one-to-one) mapping. This documentation should happen both in the spreadsheet (e.g. by highlighting problematic cells in certain colours and/or adding comments to them), and by describing issues in a separate, dedicated document that can be reviewed and addressed by ePO experts. For more on the documentation process see section below.

Identify value sets in XML

To identify the value sets (i.e. the possible different values) that are used in the XML data, either as certain element names or as attribute values, we need to look at:

  • The TED_EXPORT.xsd schema file, corresponding to the XML version that we are trying to map.

  • The values that appear in the sample XML notices, summarised in various ways.

  • The PDF form that we are trying to map, to see if the form specifies an obvious value set, e.g. by means of checkboxes or radio buttons.

  • The authority tables used in ePO, available from EU Vocabularies.
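Summarising the values that appear in the sample notices can be done with a small script. The following Python sketch counts the distinct values of an element or attribute across a set of XML documents; the function name collect_values and the two sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def collect_values(xml_documents, element_name, attribute=None):
    """Count the distinct values of an element (or one of its attributes)."""
    counts = Counter()
    for document in xml_documents:
        root = ET.fromstring(document)
        # iter() walks the whole tree, so nesting depth does not matter.
        for element in root.iter(element_name):
            value = element.get(attribute) if attribute else (element.text or "").strip()
            if value:
                counts[value] += 1
    return counts


# Made-up sample documents standing in for test notices.
samples = [
    '<NOTICE><NUTS CODE="PT18"/><NUTS CODE="PT17"/></NOTICE>',
    '<NOTICE><NUTS CODE="PT18"/></NOTICE>',
]
nuts_codes = collect_values(samples, "NUTS", "CODE")
print(sorted(nuts_codes.items()))
```

The resulting counts can then be compared with the values allowed by the XSD and with the corresponding authority table.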

Identify value sets in ePO and in other vocabularies, corresponding to XML controlled values used in the XML files

For this step we have to identify the different vocabularies that are referenced by the ePO attributes and relationships involved in the mapping of a certain XML element, and we should gain some familiarity with each vocabulary. At minimum, we should know what namespace it uses, what some of its values are, and how they are encoded (i.e. which properties are used to encode labels, ids, etc.).

Write Turtle fragments that provide a template for the triples that should be generated, and which can be used in the RML mapping rules

To give an example of an input (a fragment of an XML notice) and its transformation (the RDF result), let’s look at a fragment of an organisation definition.

XML fragment (source.xml)
<TED_EXPORT>
    <FORM_SECTION>
        <F03_2014 CATEGORY="ORIGINAL" FORM="F03" LG="PT">
            <CONTRACTING_BODY>
                <ADDRESS_CONTRACTING_BODY>
                    <OFFICIALNAME>Administração Regional de Saúde do Alentejo, I. P.</OFFICIALNAME>
                </ADDRESS_CONTRACTING_BODY>
            </CONTRACTING_BODY>
        </F03_2014>
    </FORM_SECTION>
</TED_EXPORT>
Expected RDF result (result.ttl)
@prefix org: <http://www.w3.org/ns/org#> .
@prefix epo: <http://data.europa.eu/a4g/ontology#> .

<http://data.europa.eu/a4g/resource/Organisation/2021-S-001-000163/ab152979-15bf-30c3-b6f3-e0c554cfa9d0>
    a org:Organization;
    epo:hasName "Administração Regional de Saúde do Alentejo, I. P."@pt .

To see the corresponding RML rules that we are writing to do such a transformation please see section below: Write RML rules.

Identify and document problems/issues/questions to be clarified with external experts

When we look at how the RML rules are written (see the technical mapping chapter), we notice that they form a set of TriplesMaps. Each one contains:

  • a "LogicalSource", which gives information about the source (the XML file to transform);

  • a "referenceFormulation", which is the language used to parse the source.

The main problem that we had when using XPath to parse XML notices was the old XPath version used by RMLMapper. The only supported version was XPath 1.0, which did not offer enough features, functions and expressions for our needs.

For example, XPath 1.0 does not support the default namespaces found in the notices that we are transforming. As a consequence, RMLMapper generated an empty RDF result, because the XPath expressions did not match any of the elements that we were referring to. To handle this problem we considered adding a pre-processing step that removes these default namespaces from the XML notices. We were not convinced by that solution, because modifying or deleting elements of the input can impact the quality of the results.

That is why we contacted the developers of RMLMapper to find a way to handle default namespaces without pre-processing the XML notices.

The developers generously provided us with a new version of RMLMapper that supports XPath 2.0/3.0, in which default namespaces are supported.
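The default namespace problem is not specific to RMLMapper. The following Python sketch reproduces the same effect with the standard library: a path written without namespace information matches nothing, whereas a namespace wildcard does. The namespace URI is a placeholder, and the {*} wildcard syntax belongs to Python's ElementTree (3.8 and later), not to RML or XPath.

```python
import xml.etree.ElementTree as ET

# A made-up notice fragment with a default namespace (the URI is a placeholder).
NOTICE = """<TED_EXPORT xmlns="http://example.org/ted-schema">
  <FORM_SECTION><F03_2014 FORM="F03"/></FORM_SECTION>
</TED_EXPORT>"""

root = ET.fromstring(NOTICE)

# A path without namespace information matches nothing, because every
# element is actually named {http://example.org/ted-schema}LOCALNAME.
plain = root.find("FORM_SECTION/F03_2014")

# The {*} wildcard (Python 3.8+) matches a local name in any namespace.
wildcard = root.find("{*}FORM_SECTION/{*}F03_2014")

print(plain is None, wildcard is not None)
```

The *:NAME form available in XPath 2.0 and later plays a similar role to the wildcard used here.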

Steps involved in the technical mapping process

In this section we will describe certain aspects that need to be addressed in the “technical mapping” step of the mapping creation process.

Identify the sources that are necessary to execute a mapping

This step is about making sure that all necessary sources are defined properly in the YARRRML/RML files, and that these sources refer to files that already exist, or will be available at the time of running the mapping, in the mapping package.

Important: The paths to the source files should be specified relative to the RML file(s). So, if the RML mapping files are in the transformation/mappings folder (as described above), then the sources they define should point to the ../../data/data.xml file and to the various .json, .csv and/or .xml files in the ./resources folder, respectively.
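A quick way to check where a relative source path ends up is to resolve it against the folder of the RML file. The Python sketch below does this for the data.xml example; the function name resolve_sources and the one-line RML excerpt are hypothetical, and the regular expression only covers sources written as plain string literals.

```python
import posixpath
import re

RML_DIR = "transformation/mappings"  # folder of the RML files, as named above

# Hypothetical excerpt of an RML file; only the rml:source value matters here.
RML_TEXT = 'rml:source "../../data/data.xml" ;'


def resolve_sources(rml_text, rml_dir):
    """Map each rml:source value to its path relative to the package root."""
    sources = re.findall(r'rml:source\s+"([^"]+)"', rml_text)
    return {s: posixpath.normpath(posixpath.join(rml_dir, s)) for s in sources}


resolved = resolve_sources(RML_TEXT, RML_DIR)
print(resolved)
```

The resolved paths can then be checked for existence before the mapping is run.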

Prepare vocabulary files, and other “dictionaries” that are to be used as resources for the mapping

Prepare test data

Write YARRRML Rules (optional)

During the initial phase of our mapping creation process we wrote the mapping rules in YARRRML (a human-readable, text-based representation for declarative generation rules) instead of RML, because it seemed simpler and the end result was more human-friendly. However, as we gained more experience and confidence in how the mappings should be defined, we realised that writing RML rules directly could be even more powerful, and we started to rewrite all our YARRRML mapping rules into RML. If this transition proves successful and writing RML rules directly turns out to be more convenient, our process will not require writing YARRRML rules in the future. This is why this step is optional. It could be useful for small test cases, quick demos, or showcases, and in cases when some people are more familiar with YARRRML than with RML. If people decide to write YARRRML rules, the next step becomes unnecessary, as the RML rules are generated automatically from the YARRRML rules, using dedicated tools that were developed for this purpose.

Since this step is optional, we will not describe in detail the individual issues that need to be worked on, but they are in principle the same as the ones described in the next section.

Write RML rules

If in the previous step we have defined the mapping rules in YARRRML, then this step consists of the simple action of executing the tool that generates RML out of YARRRML.

Regardless of which file we chose to define manually, the YARRRML or the RML, at the end of this step we will need to have an RML mapping file that should be able to convert an XML notice into a corresponding RDF graph.

In the rest of this section we assume that the RML rules are being written manually, as this is the solution that promises the biggest potential benefit and this is the approach that we would like to pursue in the future.

The technical mappings are written in the RML mapping language. The version of RMLMapper used is 5.0.0-r362, which was recommended to us by Julian Rojas, its principal developer, and which supports XPath 3.1.

Prefix definition

To specify the technical mappings in RML, we start with the definition of the prefixes that are used in the mapping file. For example for the ePO ontology, we would define the epo prefix name as follows:

@prefix epo: <http://data.europa.eu/a4g/ontology#> .

The prefix names and their values that are used in the RML file should all be maintained in the prefixes.json resource file. If the content of that file is kept up to date, the entire prefix declaration section of the RML file can be automatically generated, and re-generated when necessary. (Note: Besides the individual prefixes, please also check out the array that is assigned to the rml_rules key.)
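Generating the prefix declarations from such a file could look roughly like the Python sketch below. The layout assumed for prefixes.json (a flat object of prefix names and IRIs, plus an rml_rules array whose content is invented here) is a guess for illustration, and the function name prefix_declarations is hypothetical.

```python
import json

# Hypothetical content of prefixes.json; the real file layout may differ.
PREFIXES_JSON = """{
  "epo": "http://data.europa.eu/a4g/ontology#",
  "org": "http://www.w3.org/ns/org#",
  "rml_rules": ["rr", "rml", "ql"]
}"""


def prefix_declarations(prefixes_json):
    """Render Turtle @prefix lines for every string-valued entry."""
    entries = json.loads(prefixes_json)
    return "\n".join(
        f"@prefix {name}: <{iri}> ."
        for name, iri in sorted(entries.items())
        if isinstance(iri, str)  # skip non-prefix keys such as rml_rules
    )


print(prefix_declarations(PREFIXES_JSON))
```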

TriplesMap

The next step after the definition of prefixes, is to define the various TriplesMaps to create class instances. For example, let’s see an organisation’s technical mapping.

<#OrganisationMapping> a rr:TriplesMap ;
   rml:logicalSource
       [
           rml:source "source.xml" ;
           rml:referenceFormulation ql:XPath ;
           rml:iterator "/TED_EXPORT/FORM_SECTION/F03_2014/CONTRACTING_BODY/ADDRESS_CONTRACTING_BODY"
       ] ;

The TriplesMap of this organisation is called “OrganisationMapping”. This name is a unique reference, which is used when generating the RDF dataset and when referring to this TriplesMap from other mappings.

A TriplesMap has an rml:logicalSource, which specifies:

  1. rml:source : the source (it can be the XML notice that we are transforming, or a CSV/JSON file containing the controlled values)

  2. rml:referenceFormulation : the parser for the file. For the XML notices we use ql:XPath, while for the CSV/JSON files we use ql:CSV/ql:JSONPath

  3. rml:iterator : the path where the RML mapping starts iterating for this Organisation mapping

SubjectMap

The subjectMap describes how to generate a unique subject value of a TriplesMap (e.g. Organisation).

 rr:subjectMap
       [
           rr:template
               "http://data.europa.eu/a4g/resource/Organisation/{replace(replace(/TED_EXPORT/CODED_DATA_SECTION/NOTICE_DATA/NO_DOC_OJS, ' ', '-' ), '/' , '-')}/{substring-before(substring-after(unparsed-text('https://www.uuidtools.com/api/generate/v3/namespace/ns:url/name/' || count(preceding::*)+1),'[\"'),'\"]')}" ;
           rr:class org:Organization

       ] ;

The subject should be unique to each different organisation that we find in an XML notice. To achieve this, we use a concatenation of:

  1. a cleaned reference of the notice file, replace(replace(/TED_EXPORT/CODED_DATA_SECTION/NOTICE_DATA/NO_DOC_OJS, ' ', '-' ), '/' , '-'); and

  2. a cleaned result of an MD5-based function that returns a UUID based on the position of the iterator, which is unique to each organisation in the XML notice; this is done with substring-before(substring-after(unparsed-text('https://www.uuidtools.com/api/generate/v3/namespace/ns:url/name/' || count(preceding::*)+1),'[\"'),'\"]').

The type of the generated resource is defined by rr:class org:Organization.

This solution also helps us to handle the case of nested tags, by giving each of them a different UUID thanks to the position computed with count(preceding::*)+1.
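The same construction can be sketched in Python to make the two parts of the subject IRI easier to follow. The function name organisation_iri and the position value are hypothetical, and using uuid.uuid3 with the URL namespace is an assumption about what the web service computes; the sketch is not guaranteed to reproduce the exact UUIDs produced by the RML rule.

```python
import uuid

BASE = "http://data.europa.eu/a4g/resource/Organisation"


def organisation_iri(no_doc_ojs, position):
    """Build a deterministic IRI from the notice number and a node position."""
    # Same cleaning as the nested replace() calls in the template.
    notice_id = no_doc_ojs.replace(" ", "-").replace("/", "-")
    # uuid3 is the MD5-based, name-based UUID variant; using the URL namespace
    # here is an assumption about what the web service computes.
    node_id = uuid.uuid3(uuid.NAMESPACE_URL, str(position))
    return f"{BASE}/{notice_id}/{node_id}"


iri = organisation_iri("2021/S 001-000163", 42)
print(iri)
```

Because name-based UUIDs are deterministic, running the transformation twice on the same notice yields the same subject IRIs.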

predicateObjectMap

A set of nested predicate-object maps, one for each predicate/object pair of the organisation instance.

rr:predicateObjectMap
   [
   rr:predicate epo:hasName ;
   rr:objectMap
           [
               rml:reference "OFFICIALNAME"
           ]
   ] ;

In this part of a TriplesMap we find two components:

  1. A predicate: rr:predicate epo:hasName ;

  2. An objectMap. It can be

    1. an rml:reference, which is the XPath (relative to the iterator) into the XML notice that yields the value of the predicate (OFFICIALNAME), or

    2. an rr:template that contains a combination of strings and XPath expressions

Refer to other mappings

A referencing object map allows using the subjects of another triples map as the objects generated by a predicate-object map. We have two use cases of connecting two TriplesMaps with the rr:parentTriplesMap pattern:

  • A referencing object map with exactly one rr:parentTriplesMap property and no join condition. Here is an example of connecting our Organisation to its ContactPoint

rr:predicateObjectMap
   [
       rr:predicate epo:hasDefaultContactPoint ;
       rr:objectMap
           [
               rr:parentTriplesMap <#ContactPoint>
           ] ;
   ] ;
  • A referencing object map with one rr:parentTriplesMap property and a join condition (we use rr:joinCondition). Here is an example of connecting an Address to its NUTS code

rr:predicateObjectMap
   [
       rr:predicate locn:adminUnitL1 ;
       rr:objectMap
           [
               rr:parentTriplesMap <#nuts>;
               rr:joinCondition [
                   rr:child "*:NUTS/@CODE";
                   rr:parent "code.value";
               ];
           ] ;
   ] ;

A join condition is represented by a resource that has exactly one value for each of the following two properties:

  • rr:child, whose value is known as the join condition’s child reference (the path into the Address TriplesMap)

  • rr:parent, whose value is known as the join condition’s parent reference (the path into the parent TriplesMap, <#nuts> in the example above)
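
To make the two sides of the join concrete, here is a hedged sketch of what a parent TriplesMap such as <#nuts> might look like. The guide does not show this map, so everything in it is an assumption: the file name nuts.json, the JSONPath iterator (chosen because "code.value" resembles the layout of a SPARQL JSON result), and the uri.value reference used for the subject.

```turtle
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:  <http://semweb.mmlab.be/ns/ql#> .

<#nuts> a rr:TriplesMap ;
    rml:logicalSource [
        rml:source "nuts.json" ;                  # hypothetical file name
        rml:referenceFormulation ql:JSONPath ;    # assumed source format
        rml:iterator "$.results.bindings[*]"      # assumed SPARQL JSON result layout
    ] ;
    rr:subjectMap [
        # Hypothetical reference: the IRI of the NUTS concept for each record.
        rr:template "{uri.value}"
    ] .
```

Under these assumptions, the processor evaluates rr:child ("*:NUTS/@CODE") for every Address iteration and rr:parent ("code.value") for every record of the parent source. Wherever the two values are equal, the subject generated by <#nuts> for that record becomes the object of locn:adminUnitL1.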

Document technical and philosophical issues

While writing the mapping rules, document any issues that you cannot solve, or that raise interesting questions, in the Observations/Questions about mapping generation Google doc. If warranted, a Jira task should also be created to address the given issue.

Problems that were successfully resolved should be integrated into this guide as recommendations (e.g. in one of the sections above) and marked as [SOLVED] in the document, unless the problem turns out to be fairly trivial or has only one obvious solution. We recommend NOT deleting the issue from the “Problem description” document, so that we can keep track of the different issues and of the thinking that went into choosing certain solutions.

Create RML Rules Modules

Forms tend to have similar or identical sections, i.e. sections that represent the same information. For example, forms F03 and F06 have the same section 1, which holds the information about the contracting authority.

To avoid writing the same transformation rules twice (for section 1 of F03 and for section 1 of F06), we have set up a module system in which transformation rules are written once and reused by several forms. This gives us the following advantages:

  • The same rules are not rewritten several times for different forms (no duplication of rules).

  • When the ePO ontology is updated, the modification has to be implemented only once, in the affected module(s), and it is then propagated to all the forms that use them.

As explained in the technical mapping creation section, a TriplesMap maps an XML element and its content to one class in the ontology and to a set of predicate-object maps. To make a module reusable across many forms, the module files define only the predicate-object maps of a TriplesMap. The logical source and the subject map, which every TriplesMap needs, are described separately in the main file, because they contain information specific to each form and are therefore not reusable. For example, a module that contains an organisation looks like this:

<#OrganisationMapping> a rr:TriplesMap ;
	rr:predicateObjectMap
		[
		rr:predicate epo:hasName ;
		rr:objectMap
				[
					rml:reference "OFFICIALNAME"
				]
		] ;

This module is extended in the main file of F03 by adding the logical source and the subject map as follows:

<#OrganisationMapping> a rr:TriplesMap ;
	rml:logicalSource
		[
			rml:source "source.xml" ;
			rml:referenceFormulation ql:XPath ;
			rml:iterator "/TED_EXPORT/FORM_SECTION/F03_2014/CONTRACTING_BODY/ADDRESS_CONTRACTING_BODY"
		] ;
	rr:subjectMap
		[
			# uuid_generation_function is a placeholder for the UUID-generating XPath expression
			rr:template
				"http://data.europa.eu/a4g/resource/Organisation/uuid_generation_function" ;
			rr:class epo:Organisation
		] ;
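
The effect of combining the two files can be sketched as follows. This section does not say how the module and the main file are brought together, so the sketch assumes that both are loaded into a single RDF graph with the same base IRI, so that <#OrganisationMapping> resolves to the same resource in both. Under that assumption, the statements about that resource are merged and the processor sees one complete TriplesMap. The epo: namespace IRI is also an assumption, and uuid_generation_function remains a placeholder.

```turtle
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:  <http://semweb.mmlab.be/ns/ql#> .
@prefix epo: <http://data.europa.eu/a4g/ontology#> .   # assumed ePO namespace

<#OrganisationMapping> a rr:TriplesMap ;
    # Form-specific part, contributed by the main file of F03.
    rml:logicalSource [
        rml:source "source.xml" ;
        rml:referenceFormulation ql:XPath ;
        rml:iterator "/TED_EXPORT/FORM_SECTION/F03_2014/CONTRACTING_BODY/ADDRESS_CONTRACTING_BODY"
    ] ;
    rr:subjectMap [
        # uuid_generation_function is a placeholder, as in the text above.
        rr:template "http://data.europa.eu/a4g/resource/Organisation/uuid_generation_function" ;
        rr:class epo:Organisation
    ] ;
    # Reusable part, contributed by the module file.
    rr:predicateObjectMap [
        rr:predicate epo:hasName ;
        rr:objectMap [ rml:reference "OFFICIALNAME" ]
    ] .
```

Another form can reuse the same module by supplying only its own logical source and subject map for <#OrganisationMapping>.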

The modules are stored in the src/mappings folder of the ted-rdf-mapping project. So far we have covered five modules that represent the five sections of F03 and part of F06/F25, plus one for Annex D1, which is part of F03. The modules are:

  • s1_contracting_authority.rml.ttl

  • s2_object.rml.ttl

  • s4_procedure.rml.ttl

  • s5_award_of_contract.rml.ttl

  • s6_complementary_information.rml.ttl

  • annex_d1.rml.ttl

The project contains one src/mappings folder, which holds all the module files and one "main" entry point mapping file for each form package. Here is a representation:

/ted-rdf-mapping
	/XXX
	/mappings
		/package_F03
			/transformation
				XXXX
				/mappings
					...
					technical_mapping_F03.rml.ttl
		/package_FXX
			/transformation
				XXXX
				/mappings
					...
					technical_mapping_FXX.rml.ttl
	/src
		/mappings
			annex_d1.rml.ttl
			s1_contracting_authority.rml.ttl
			s2_object.rml.ttl
			s4_procedure.rml.ttl
			s5_award_of_contract.rml.ttl
			s6_complementary_information.rml.ttl
			technical_mapping_F03.rml.ttl
			technical_mapping_FXX.rml.ttl