Mapping Methodology

Developing mappings for eForms still follows the sound principle of conceptual agreement followed by technical implementation. The work is divided into two distinct stages: conceptual mapping (CM), using a dialect of SPARQL path patterns, and technical mapping (TM), using the RML mapping language.

eForms SDK Field Definitions

In contrast to SF, for which XSD schemas were the only cue to the underlying model, eForms comes with an SDK that prescribes the XPaths of eForms XML elements in terms of nodes and fields. This provides an easier framework for managing and understanding the UBL-based XML model behind the elements used across eForms notices. The notices themselves are categorized by document, form and notice types and subtypes, such as Competition Notice, subtype 16 (EF16).

Mapping Groups

A new idea introduced into the mapping methodology is the notion of a Mapping Group (MG), which is a representation of RDF connections (relationships) and subject groupings in a simple, hyphen-delimited textual notation, in the form:

MG-{TrailingClass}-{intermediateClassAndPropertyPath}-{RootClass}

where, for example, the full path of an organisation’s address might be:

MG-Address-hasAddress-Location-hasLocation-Organization

This serves to provide an indication of how RDF statements are logically grouped together to form the right (connected) instances, given that data for a given subject may be spread across completely different XPaths in the source XML. That is why, in the technical mapping, MGs are used to form the subject URIs of the RML rules (TriplesMaps), suffixed with the node name (as per the SDK) where the information comes from, for example:

tedm:MG-Address-hasAddress-Location-hasLocation-Organization_ND-Company a rr:TriplesMap

Different nodes usually require different subject templates or iterators, especially nodes located at entirely different or orthogonal XPaths. Each such node therefore needs its own RML TriplesMap, with the same MG but a different node name.
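
The naming convention above can be sketched in a few lines of Python. This is only an illustration, not part of the mapping tooling: the function names are hypothetical, and the sketch assumes that MG segments never contain an underscore, so that the first underscore separates the MG name from the SDK node name.

```python
def mapping_group_name(path):
    """Build an MG name from segments ordered trailing class first, root class last."""
    if not path:
        raise ValueError("a Mapping Group needs at least a root class")
    return "MG-" + "-".join(path)


def triples_map_name(path, node, prefix="tedm"):
    """Suffix the MG name with the SDK node that the information comes from."""
    return f"{prefix}:{mapping_group_name(path)}_{node}"


def root_class(triples_map):
    """Return the root class, i.e. the final segment of the MG name.

    Assumption: the first underscore separates the MG name from the node name.
    """
    local_name = triples_map.split(":", 1)[-1]
    mg_name = local_name.split("_", 1)[0]
    return mg_name.split("-")[-1]


name = triples_map_name(
    ["Address", "hasAddress", "Location", "hasLocation", "Organization"],
    "ND-Company",
)
print(name)
print(root_class(name))
```

Deriving the root class from the name is what would allow TriplesMaps to be grouped by root class, which is the criterion used for organizing the RML files.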

Modular RML Mappings

Given the large number of mappings that are necessary for the conversion of a given eForms notice (the CM contains around 500 entries for the various fields and nodes that belong to a notice type), a mapping package ends up containing 200+ RML TriplesMaps with 500+ predicateObjectMaps, resulting in more than 11,000 lines of RML code. To manage this amount of information efficiently, we split it into multiple RML files, organized as described below.

The structure of the RML files (we call them modules because they are modular files with RML rules that work together when combined) is based on the primary/root class of a set of mapping rules, which are part of one or more Mapping Groups (MGs) that share such a root class (the final segment of an MG name). An MG represents a logical grouping of related instances/resources (like a foaf:Person with all of its properties and relationships together with the instances of those relationships).

Example 1. Multiple TriplesMaps in an RML Module

For example, the LotGroup.rml.ttl file (as it appears in a mapping package created for the conversion of the Competition Notices, eForms subtypes 10-24) contains the following TriplesMaps:

tedm:MG-LotGroup_ND-LotsGroup
tedm:MG-Identifier-identifier-LotGroup_ND-LotsGroup
tedm:MG-Identifier-hasInternalIdentifier-LotGroup_ND-LotsGroupProcurementScope
tedm:MG-LotGroup_ND-LotsGroupProcurementScope
tedm:MG-MonetaryValue-hasEstimatedValue-LotGroup_ND-LotsGroupValueEstimate
tedm:MG-MonetaryValue-hasLaunchFrameworkAgreementMaximumValue-LotGroup_ND-LotsGroupValueEstimateExtension
tedm:MG-MonetaryValue-hasLaunchFrameworkAgreementMaximumValue-LotGroup_ND-LotsGroupFA
tedm:MG-AwardCriterion-specifiesProcurementCriterion-LotGroup_ND-LotsGroupAwardCriterion
tedm:MG-AwardCriterion-specifiesProcurementCriterion-LotGroup_ND-LotsGroupAwardFixedCriterionParameter
tedm:MG-AwardCriterion-specifiesProcurementCriterion-LotGroup_ND-LotsGroupAwardWeightCriterionParameter
tedm:MG-Constraint-hasConstraint-AwardCriterion-specifiesProcurementCriterion-LotGroup_ND-LotsGroupAwardThresholdCriterionParameter

Looking at the above example, we can observe that, on the one hand, a node such as ND-LotsGroup can have child fields that provide information for generating multiple instances, e.g. an epo:LotGroup instance from tedm:MG-LotGroup_ND-LotsGroup and an adms:Identifier instance from tedm:MG-Identifier-identifier-LotGroup_ND-LotsGroup. On the other hand, the full definition of a single EPO instance can be achieved by combining information from fields organized under different nodes, e.g. the MonetaryValue that represents the hasLaunchFrameworkAgreementMaximumValue property of a LotGroup can come both from fields that are children of the ND-LotsGroupValueEstimateExtension node and from those that are children of the ND-LotsGroupFA node:

tedm:MG-MonetaryValue-hasLaunchFrameworkAgreementMaximumValue-LotGroup_ND-LotsGroupValueEstimateExtension
tedm:MG-MonetaryValue-hasLaunchFrameworkAgreementMaximumValue-LotGroup_ND-LotsGroupFA

Another advantage of modularized RML files is that they make it easier to handle the differences between SDK versions. Since fields and nodes can appear or disappear in any (major or minor) SDK version, we need to be able to write different RML mappings for different SDK versions. Some of these mappings are valid for all SDK versions (that exist as of now), some are valid for only one particular SDK version, while others are valid up to or starting from a given SDK version. Our approach is to place these mappings in files whose names reflect the SDK versions to which they apply; when we create a mapping package for a given SDK version, we select all the modules that apply to that version.

Example 2. Multiple RML modules for mapping information about Lots

For example, the RML mappings necessary to transform notices encoded according to eForms SDK v1.8 are organized in more than 30 files, of which the following contain the mappings for Lot-related information:

Lot.rml.ttl                 # mappings valid across all SDK versions
Lot_v1.3-1.8.rml.ttl        # mappings valid from SDK version v1.3 up to v1.8 (inclusive)
Lot_v1.4+.rml.ttl           # mappings valid for all SDK versions starting with v1.4
Lot_v1.7+.rml.ttl           # mappings valid for all SDK versions starting with v1.7
Lot_v1.8+.rml.ttl           # mappings valid for all SDK versions starting with v1.8
Lot_v1.8-1.8.rml.ttl        # mappings valid only for SDK version v1.8
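
The selection of modules for a given SDK version can be sketched as follows. This is a minimal illustration under stated assumptions rather than the actual packaging logic: the regular expression is inferred only from the file names listed above (no suffix, a `_v{low}-{high}` range, or a `_v{low}+` open range), and the function names are hypothetical.

```python
import re

# Assumed naming pattern, inferred from the file names in the example above.
MODULE_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z]+)"
    r"(?:_v(?P<low>\d+\.\d+)(?:-(?P<high>\d+\.\d+)|(?P<open>\+)))?"
    r"\.rml\.ttl$"
)


def parse_version(text):
    """Turn '1.10' into (1, 10) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def applies_to(filename, sdk_version):
    """Return True if the module file applies to the given SDK version."""
    match = MODULE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized module file name: {filename}")
    if match["low"] is None:
        return True  # no version suffix: valid across all SDK versions
    target = parse_version(sdk_version)
    if target < parse_version(match["low"]):
        return False
    if match["open"]:
        return True  # 'v1.4+' style: valid from the lower bound onwards
    return target <= parse_version(match["high"])


def select_modules(filenames, sdk_version):
    """Keep only the module files that apply to the given SDK version."""
    return [name for name in filenames if applies_to(name, sdk_version)]


lot_modules = [
    "Lot.rml.ttl",
    "Lot_v1.3-1.8.rml.ttl",
    "Lot_v1.4+.rml.ttl",
    "Lot_v1.7+.rml.ttl",
    "Lot_v1.8+.rml.ttl",
    "Lot_v1.8-1.8.rml.ttl",
]
print(select_modules(lot_modules, "1.8"))
```

Comparing versions as integer tuples rather than as strings keeps a version such as 1.10 ordered after 1.8.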

Important Note: The definition of a TriplesMap, especially one with version-specific mappings, can be spread across multiple RML modules. For example, a TriplesMap can have its logical source/iterator, its subject map, and its various predicate-object maps in several files, depending on which aspect is specified in a given SDK version and/or belongs to a given module. However, this information needs to be provided and organized in such a manner that, when the modules are combined in a package, the resulting RML mappings are valid, complete, correct and non-conflicting. For example, after the modules are combined, every TriplesMap should have exactly one logical source and exactly one subject map, and the subject map and all predicate-object maps should work with the logical source they are combined with.
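
The "exactly one logical source and subject map" condition can be checked mechanically once the modules are merged. The sketch below is an assumption-laden illustration, not the project's validation tooling: triples are modelled as plain string tuples instead of a parsed RDF graph, the TriplesMap name tedm:MG-Lot_ND-Lot and the blank node labels are hypothetical, and shortcut properties such as rr:subject are ignored.

```python
from collections import defaultdict

# Properties that must occur exactly once per TriplesMap after merging.
SINGLE_VALUED = ("rml:logicalSource", "rr:subjectMap")
# Any subject using one of these properties is treated as a TriplesMap.
TRIPLES_MAP_PROPERTIES = SINGLE_VALUED + ("rr:predicateObjectMap",)


def check_combined(triples):
    """Return a list of problems found in the merged (subject, predicate, object) triples."""
    values = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        if predicate in TRIPLES_MAP_PROPERTIES:
            values[subject][predicate].add(obj)
    problems = []
    for triples_map in sorted(values):
        for predicate in SINGLE_VALUED:
            found = len(values[triples_map][predicate])
            if found != 1:
                problems.append(
                    f"{triples_map}: expected exactly one {predicate}, found {found}"
                )
    return problems


# Hypothetical content of two modules that define parts of the same TriplesMap.
modules = {
    "Lot.rml.ttl": [
        ("tedm:MG-Lot_ND-Lot", "rml:logicalSource", "_:source1"),
        ("tedm:MG-Lot_ND-Lot", "rr:subjectMap", "_:subject1"),
    ],
    "Lot_v1.4+.rml.ttl": [
        ("tedm:MG-Lot_ND-Lot", "rr:predicateObjectMap", "_:pom1"),
    ],
}
merged = [triple for triples in modules.values() for triple in triples]
print(check_combined(merged))
```

A check of this kind only covers cardinality; whether the subject map and predicate-object maps actually work with the logical source still has to be verified by running the mappings.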

RDF URI Scheme

eForms RML Mappings URI Scheme

The eForms RML mappings use the URI scheme:

{ns}/{notice}/{concept}/{trailer}

where:

  • {ns} is a base namespace, in this case http://data.europa.eu/a4g/resource/ (prefixed epd:)

  • {notice} is the shared context for all entities in the document, composed of two parts {notice-id}-{notice-version}; together with {ns} it forms the base ns-notice or notice segment of the URI, e.g. epd:14549263-b47b-4e59-96a1-2d0d13e19343-01

  • {concept} is either:

    • (i) an ontology fragment label, i.e. the class name, or

    • (ii) a source element label, i.e. the XML element name (without any prefix), depending on which provides better context for the resource being represented

  • {trailer} is either:

    • (i) an ID value (if the resource has one), or

    • (ii) a re-encoded and normalized XPath (to ensure uniqueness within the document), in which case it is preceded by a dollar symbol ($) and not slash (/) (to facilitate future rewriting or hashing), resulting in the scheme {ns-notice}/{concept}${reencoded-xpath}, e.g. epd:af0b8395-7498-4d0e-b5eb-3d1a4636eb1a-01/Procedure$_ContractAwardNotice1_TenderingProcess1_ProcessJustification1

  • Root concepts such as epo:Notice end at the {concept}, and their identifier simply appends /Identifier, resulting in the scheme {ns-notice}/Notice/Identifier (to avoid redundant repetition of the ID value which is already represented in the notice segment)

  • Identifier instances, if not technical identifiers with recognizable ID patterns according to the eForms specification (e.g. LOT-XXX), may be preceded by the parent class and followed by the ID value, resulting in the scheme {ns}/{notice}/{parent-class}/Identifier/{id-value}

  • In some cases, the {trailer} may be an aggregate of multiple values to produce uniqueness, e.g. when the ID is combined with its schemeName

  • In the case of externally referenced resources (e.g. a referenced notice or a child entity thereof), the notice segment is extended with the context of the referred notice, resulting in the scheme {ns-notice}/Notice/{external-notice-id}/{concept}/{trailer}
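
The general scheme described above can be sketched as a small helper. This is an illustration under assumptions, not the way the RML subject templates are written: the function names are hypothetical, the re-encoded XPath is taken as a ready-made input (the re-encoding rule itself is not reproduced here), no escaping of ID values is attempted, and the ID value LOT-0001 is an invented example.

```python
NS = "http://data.europa.eu/a4g/resource/"  # the epd: namespace


def notice_segment(notice_id, notice_version):
    """Base of every URI in a notice: {ns} followed by {notice-id}-{notice-version}."""
    return f"{NS}{notice_id}-{notice_version}"


def resource_uri(notice_id, notice_version, concept,
                 id_value=None, reencoded_xpath=None):
    """Build {ns-notice}/{concept}, followed by an ID or a '$'-prefixed XPath trailer."""
    base = f"{notice_segment(notice_id, notice_version)}/{concept}"
    if id_value is not None:
        return f"{base}/{id_value}"
    if reencoded_xpath is not None:
        return f"{base}${reencoded_xpath}"
    return base  # root concepts such as epo:Notice end at the concept


def notice_identifier_uri(notice_id, notice_version):
    """Identifier of the root Notice: {ns-notice}/Notice/Identifier."""
    return resource_uri(notice_id, notice_version, "Notice") + "/Identifier"


print(resource_uri(
    "af0b8395-7498-4d0e-b5eb-3d1a4636eb1a", "01", "Procedure",
    reencoded_xpath="_ContractAwardNotice1_TenderingProcess1_ProcessJustification1",
))
# LOT-0001 is a hypothetical ID value used only for illustration.
print(resource_uri("14549263-b47b-4e59-96a1-2d0d13e19343", "01", "Lot", id_value="LOT-0001"))
print(notice_identifier_uri("14549263-b47b-4e59-96a1-2d0d13e19343", "01"))
```

The aggregate trailers, parent-class identifiers and external-notice variants listed above would need additional parameters and are left out of this sketch.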

The following are some examples of exceptions to the rule:

  1. epo:AgentInRole instances, which require a carefully constructed URI seeded with information about the related party (a foaf:Agent).

  2. epo:AwardDecision instances, which have a trailer based on the cbc:AwardDate to yield the same instance for awards on the same date across possibly repeating elements.

  3. External resources that cannot be identified, such as the Framework Agreement contract representing OPT-100-Contract Framework Notice Identifier, for which a proxy epo:FrameworkAgreement is created without a trailer.

Wherever URI is mentioned, IRI is meant.