DiSHACLed Dataset and Data Service Discovery

1. Introduction

The DCAT Application Profile for data portals (DCAT-AP) [DCAT-AP] is the official standard in Europe for describing datasets and data services in catalogues on the Web. It provides a common model that enables interoperability across data portals, making it easier to publish and share metadata consistently. This interoperability has been instrumental in improving the visibility and accessibility of open data across domains and EU member states.

However, the current design and usage of DCAT-AP are primarily oriented towards human-driven discovery. Catalogues are typically accessed through graphical interfaces where users browse, filter, and search for resources based on keywords or metadata attributes. While this supports transparency and accessibility for a wide audience, it limits the potential for automated, machine-based discovery and integration of datasets and data services. In practice, many use cases—such as federated analytics, cross-domain applications, or dynamic data integration pipelines—require more than keyword search: they need support for automated identification of datasets and services that match specific structural and semantic requirements.

This specification aims to address this gap by defining (i) a data model extension for DCAT-AP that allows describing semantically what is contained within RDF [RDF11-CONCEPTS] datasets and what is expected or produced by RDF data services; and (ii) an algorithm for automated discovery and selection of RDF datasets and RDF data services over DCAT-AP catalogues, based on formal input constraints expressed as either:

a [SHACL] shape, describing the expected structure and semantics of the data, or
a [SPARQL-QUERY], representing the information needs directly.

By enabling discovery through such formal specifications, the proposed approach allows clients to automatically determine the relevance of available resources, thereby supporting machine-to-machine interoperability. This contributes to realising the full potential of DCAT-AP not only as a tool for publishing metadata for humans but also as a foundation for automated data ecosystems.

1.1. Document Conventions

Within this document, the following namespace prefix bindings are used:

Prefix	Namespace IRI	Description
dcat:	http://www.w3.org/ns/dcat#	[VOCAB-DCAT-3]
dcterms:	http://purl.org/dc/terms/	[DCTERMS]
rdf:	http://www.w3.org/1999/02/22-rdf-syntax-ns#	[RDF11-SCHEMA]
rdfs:	http://www.w3.org/2000/01/rdf-schema#	[RDF11-SCHEMA]
sh:	http://www.w3.org/ns/shacl#	[SHACL]
skos:	http://www.w3.org/2004/02/skos/core#	[SKOS-REFERENCE]
xsd:	http://www.w3.org/2001/XMLSchema#	[XMLSCHEMA11-2]

1.2. Terminology

Dataset refers to the definition given in [VOCAB-DCAT-3] for dcat:Dataset.
Distribution refers to the definition given in [VOCAB-DCAT-3] for dcat:Distribution.
Data Service refers to the definition given in [VOCAB-DCAT-3] for dcat:DataService.
Resource refers to the generic term encompassing datasets, distributions and data services, defined in [VOCAB-DCAT-3] as dcat:Resource.
Catalog refers to a [VOCAB-DCAT-3] resource of type dcat:Catalog.
Shape refers to a [SHACL] sh:NodeShape that formally describes constraints over RDF data.
Discovery refers to the process specified in § 3 Discovery logical flow for identifying relevant RDF resources over one or more catalogs.
Client refers to a client application that implements the § 4 Algorithm specification and supports the § 3 Discovery logical flow.

2. Describing DCAT(-AP) resource content with shapes

This section defines a set of alternative ways in which DCAT(-AP) resources, specifically referring to RDF datasets or data services, can be extended to include references to data shapes that semantically describe their content. The extensions are designed to be minimal and compatible with existing DCAT-AP profiles.

Note: The § 3 Discovery logical flow and related § 4 Algorithm specification are designed to support all the following data model extensions. In the presence of more than one alternative, a discovery client will not prioritise any of them and should treat them as a logical OR relation.

2.1. Relating to shapes via `dcterms:conformsTo`

In this approach we establish the use of the existing property dcterms:conformsTo [DCTERMS], as described by [VOCAB-DCAT-3] and [DCAT-AP], to link an RDF resource defined as part of a catalog, to a shape that describes its content.

In [VOCAB-DCAT-3], the use of dcterms:conformsTo is informally suggested for linking a resource to a relevant standard, that specifies its model, schema, ontology, application profile, among others. SHACL shapes may be considered as a type of schema, as they define the structure and constraints that RDF instance data must adhere to.

No further clarifications are provided in [DCAT-AP] regarding the expected use of dcterms:conformsTo, beyond the informal suggestion to link to relevant standards, understood as instances of dcterms:Standard.

Figure 1 shows how any kind of RDF resource that has been declared in a catalog, may be linked to a related shape via dcterms:conformsTo.

Figure 1. DCAT Resources linked to a SHACL shape via dcterms:conformsTo.

Example 1 shows corresponding RDF representations for a dataset, distribution, and data service, respectively linked to their shapes.

RDF representation in Turtle format of a dcat:Dataset linked to a sh:NodeShape via dcterms:conformsTo

ex:myDataset a dcat:Dataset;
    dcterms:title "My Dataset";
    dcterms:conformsTo ex:myShape.

ex:myShape a sh:NodeShape;
    sh:targetClass ex:MyClass;
    sh:property [
        sh:path ex:myProperty;
        sh:datatype xsd:string;
        sh:minCount 1;
    ].

RDF representation in Turtle format of a dcat:Distribution linked to a sh:NodeShape via dcterms:conformsTo

ex:myDataset a dcat:Dataset;
    dcterms:title "My Dataset";
    dcat:distribution ex:myDistribution.
        
ex:myDistribution a dcat:Distribution;
    dcat:downloadURL <https://example.org/dataset.ttl>;
    dcat:mediaType <https://www.iana.org/assignments/media-types/text/turtle>;
    dcterms:conformsTo ex:myShape.

ex:myShape a sh:NodeShape;
    sh:targetClass ex:MyClass;
    sh:property [
        sh:path ex:myProperty;
        sh:datatype xsd:string;
        sh:minCount 1;
    ].

RDF representation in Turtle format of a dcat:DataService linked to a sh:NodeShape via dcterms:conformsTo

ex:myService a dcat:DataService;
    dcat:servesDataset ex:myDataset;
    dcat:endpointURL <https://example.org/my-data/service>;
    dcterms:title "My data service";
    dcterms:conformsTo ex:myShape, ex:myOtherShape.

ex:myShape a sh:NodeShape;
    sh:targetClass ex:MyClass;
    sh:property [
        sh:path ex:myProperty;
        sh:datatype xsd:string;
        sh:minCount 1;
    ].

ex:myOtherShape a sh:NodeShape;
    sh:targetClass ex:MyOtherClass;
    sh:property [
        sh:path ex:myOtherProperty;
        sh:datatype xsd:integer;
        sh:minCount 1;
    ].

Note: A shape related to a data service may describe the constraints of the service's input data, its output data, or both. The dcterms:conformsTo property cannot distinguish the role of a related shape (see § 2.2 Relating to shapes via dcat:qualifiedRelation for an alternative where this is possible). However, linking multiple shapes to a data service should not cause conflicts in the discovery process, given its inclusive nature based on a logical OR operation.

2.2. Relating to shapes via `dcat:qualifiedRelation`

In this approach we establish the use of the property dcat:qualifiedRelation as described by [VOCAB-DCAT-3] and [DCAT-AP], to indirectly link an RDF resource defined as part of a catalog, to a shape that describes its content. This approach follows an n-ary [SWBP-N-ARYRELATIONS] or qualified [LDPATTERNS] data modelling pattern.

In [VOCAB-DCAT-3], the use of dcat:qualifiedRelation is informally specified as a way to define semantically richer relations among resources. This is accomplished with the introduction of the dcat:Relationship class, on which specific roles can be set to further describe the nature of the relationship between a resource and another asset. For specifying types of roles, the use of controlled vocabularies is encouraged, which is reflected by the existing subclass relation between the dcat:Role class and skos:Concept.

No further guidelines nor usage notes are given for dcat:qualifiedRelation and dcat:Relationship in [DCAT-AP].

Figure 2 shows how a shape can be linked to any type of resource using the dcat:qualifiedRelation property.

Figure 2. DCAT Resources linked to a SHACL shape via dcat:qualifiedRelation.

In practice, for the discovery process, a client needs to traverse the property path dcat:qualifiedRelation/dcterms:relation, in search for an associated shape. This specification does not mandate a specific role concept value via dcat:hadRole. A discovery client must assess all existing qualified relations (see § 3 Discovery logical flow).
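The traversal described above can be sketched in Python as follows. This is a minimal illustration, not a normative implementation: it assumes the catalog has already been parsed into an in-memory set of `(subject, predicate, object)` tuples written with prefixed names, and the function name `shapes_via_qualified_relation` is hypothetical.

```python
# Assumption: terms are plain strings using the prefixes of this document.
RDF_TYPE = "rdf:type"


def shapes_via_qualified_relation(triples, resource):
    """Return the node shapes reachable from `resource` via
    dcat:qualifiedRelation/dcterms:relation."""
    # First hop: the dcat:Relationship nodes attached to the resource.
    relationships = {o for s, p, o in triples
                     if s == resource and p == "dcat:qualifiedRelation"}
    # Second hop: whatever those relationships point to via dcterms:relation.
    candidates = {o for s, p, o in triples
                  if s in relationships and p == "dcterms:relation"}
    # dcat:hadRole is deliberately ignored, since no role value is mandated.
    return {c for c in candidates
            if (c, RDF_TYPE, "sh:NodeShape") in triples}


# Hypothetical data mirroring the data service of Example 2.
catalog = {
    ("ex:myService", RDF_TYPE, "dcat:DataService"),
    ("ex:myService", "dcat:qualifiedRelation", "_:r1"),
    ("ex:myService", "dcat:qualifiedRelation", "_:r2"),
    ("_:r1", "dcat:hadRole", "ex:InputDataShape"),
    ("_:r1", "dcterms:relation", "ex:myShape"),
    ("_:r2", "dcterms:relation", "ex:myOtherShape"),
    ("ex:myShape", RDF_TYPE, "sh:NodeShape"),
    ("ex:myOtherShape", RDF_TYPE, "sh:NodeShape"),
}

shapes = shapes_via_qualified_relation(catalog, "ex:myService")
```

Both relationships are assessed here, whether or not a role is declared on them.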

Note: [VOCAB-DCAT-3] suggests the use of coded term lists such as [IANA-RELATIONS], or [ISO-19115-1], to specify a dcat:Relationship role via the dcat:hadRole property. However, none of the recommended vocabularies defines a specific role type that conveys a relationship between a resource and an associated shape. A semantically similar role type is the describedby relation defined by [IANA-RELATIONS], but we opt for a lenient position since no established standard exists for this yet.

Example 2 shows corresponding RDF representations for a dataset and data service, respectively linked to their shapes via dcat:qualifiedRelation. According to [VOCAB-DCAT-3], the domain of dcat:qualifiedRelation is dcat:Resource, which excludes dcat:Distribution instances.

RDF representation in Turtle format of a dcat:Dataset linked to a sh:NodeShape via dcat:qualifiedRelation/dcterms:relation

ex:myDataset a dcat:Dataset;
    dcterms:title "My Dataset";
    dcat:qualifiedRelation [
        a dcat:Relationship;
        dcat:hadRole <http://www.iana.org/assignments/relation/describedby>; # this is optional
        dcterms:relation ex:myShape
    ].

ex:myShape a sh:NodeShape;
    sh:targetClass ex:MyClass;
    sh:property [
        sh:path ex:myProperty;
        sh:datatype xsd:string;
        sh:minCount 1;
    ].

RDF representation in Turtle format of a dcat:DataService linked to a sh:NodeShape via dcat:qualifiedRelation/dcterms:relation

ex:myService a dcat:DataService;
    dcat:endpointURL <https://example.org/my-data/service>;
    dcterms:title "My data service";
    dcat:qualifiedRelation [
        a dcat:Relationship;
        dcat:hadRole ex:InputDataShape; # this is optional
        dcterms:relation ex:myShape
    ], [
        a dcat:Relationship;
        dcat:hadRole ex:OutputDataShape; # this is optional
        dcterms:relation ex:myOtherShape
    ].

ex:myShape a sh:NodeShape;
    sh:targetClass ex:MyClass;
    sh:property [
        sh:path ex:myProperty;
        sh:datatype xsd:string;
        sh:minCount 1;
    ].

ex:myOtherShape a sh:NodeShape;
    sh:targetClass ex:MyOtherClass;
    sh:property [
        sh:path ex:myOtherProperty;
        sh:datatype xsd:integer;
        sh:minCount 1;
    ].

Note: The qualified relation pattern allows for a more granular description of the type of relation that a given shape has with a resource. For example, in the data service case of Example 2, the qualified relations describe the shapes of the input and output data of a data service independently. This has no impact on the discovery process, as both relations shall be assessed to find relevant matches with respect to the given discovery constraints.

2.3. Relating to shapes via `sh:shapesGraph`

In this approach we establish the use of the property sh:shapesGraph, as defined by [SHACL], to link an RDF resource defined as part of a catalog, to a shape that describes its content.

The sh:shapesGraph property is used in [SHACL] to indicate a graph where the validation shapes are defined. From the perspective of DCAT, the use of sh:shapesGraph constitutes a model extension that is not currently contemplated by [VOCAB-DCAT-3] nor [DCAT-AP].

Note: An example of such an extension can be observed in the work of Frank Michiel et al. [WEB-API-DISC], where they propose the use of sh:shapesGraph to link a dcat:Distribution to a graph containing its shape definitions.

Figure 3 shows how any kind of RDF resource that has been declared in a catalog, may be linked to a related shape via sh:shapesGraph.

Figure 3. DCAT Resources linked to a SHACL shape via sh:shapesGraph.

An associated shapes graph linked via sh:shapesGraph may contain one or more shape definitions, but these must be related only to the resource that links to them. This is necessary to avoid false positives during the discovery process.

Example 3 shows corresponding RDF representations for a dataset, distribution, and data service, respectively linked to their shapes via sh:shapesGraph.

RDF representation in TriG format of a dcat:Dataset linked to a shape via sh:shapesGraph

ex:myDataset a dcat:Dataset;
    dcterms:title "My Dataset";
    sh:shapesGraph ex:myShapesGraph.

ex:myShapesGraph {
    ex:myShape a sh:NodeShape;
        sh:targetClass ex:MyClass;
        sh:property [
            sh:path ex:myProperty;
            sh:datatype xsd:string;
            sh:minCount 1;
        ].
}

RDF representation in TriG format of a dcat:Distribution linked to a shape via sh:shapesGraph

ex:myDataset a dcat:Dataset;
    dcterms:title "My Dataset";
    dcat:distribution ex:myDistribution.
        
ex:myDistribution a dcat:Distribution;
    dcat:downloadURL <https://example.org/dataset.ttl>;
    dcat:mediaType <https://www.iana.org/assignments/media-types/text/turtle>;
    sh:shapesGraph ex:myShapesGraph.

ex:myShapesGraph {
    ex:myShape a sh:NodeShape;
        sh:targetClass ex:MyClass;
        sh:property [
            sh:path ex:myProperty;
            sh:datatype xsd:string;
            sh:minCount 1;
        ].
}

RDF representation in TriG format of a dcat:DataService linked to a shape via sh:shapesGraph

ex:myService a dcat:DataService;
    dcat:servesDataset ex:myDataset;
    dcat:endpointURL <https://example.org/my-data/service>;
    dcterms:title "My data service";
    sh:shapesGraph ex:myShapesGraph.

ex:myShapesGraph {
    ex:myShape a sh:NodeShape;
        sh:targetClass ex:MyClass;
        sh:property [
            sh:path ex:myProperty;
            sh:datatype xsd:string;
            sh:minCount 1;
        ].

    ex:myOtherShape a sh:NodeShape;
        sh:targetClass ex:MyOtherClass;
        sh:property [
            sh:path ex:myOtherProperty;
            sh:datatype xsd:integer;
            sh:minCount 1;
        ].
}

3. Discovery logical flow

This section describes the logical steps that a client needs to follow to perform a discovery process. An overview of the logical flow is shown in Figure 4.

Figure 4. Overview of a discovery process’s logical sequence.

In the next subsections a detailed description of each step is provided.

3.1. Discovery request

This is the initial step (labeled as 1️⃣ in Figure 4). A client starts a discovery process when it receives as input:

a set of one or more catalog URLs (IRI), which must point to a dereferenceable RDF representation of a catalog; and
a set of formal constraints, expressed either as an input SHACL shape (in any RDF serialization (string)) or an input SPARQL query (string).
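As an illustration, the two inputs could be modelled in Python as shown below. The class name `DiscoveryRequest` and its field names are hypothetical, and the check that exactly one of the two constraint forms is supplied is an assumed reading of "either ... or" rather than a requirement stated by this specification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DiscoveryRequest:
    """Hypothetical container for the inputs of a discovery process."""
    catalog_urls: Tuple[str, ...]   # one or more dereferenceable catalog IRIs
    shape: Optional[str] = None     # input SHACL shape, in any RDF serialization
    query: Optional[str] = None     # input SPARQL query

    def __post_init__(self):
        if not self.catalog_urls:
            raise ValueError("at least one catalog URL is required")
        # Assumption: a request carries a shape or a query, but not both.
        if (self.shape is None) == (self.query is None):
            raise ValueError("provide exactly one of 'shape' or 'query'")


request = DiscoveryRequest(
    catalog_urls=("https://example.org/catalog",),
    query="SELECT * WHERE { ?s a <https://example.org/MyClass> }",
)
```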

3.2. Catalog retrieval

This step (labeled as 2️⃣ in Figure 4) involves retrieving the RDF representation of each catalog provided as input. A client must perform an HTTP GET request to each catalog URL, which must be dereferenceable to any RDF serialization (e.g. [TURTLE], [RDF11-XML], [JSON-LD], etc.).

Note: Even though in Figure 4 the catalog retrieval step is shown as a sequential process, a client may choose to perform the retrieval of multiple catalogs in parallel, to optimise the overall performance of the discovery process.
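A possible shape for this step, using only the Python standard library, is sketched below. The function names, the preference order in the `Accept` header, the timeout and the worker count are all illustrative assumptions. Parsing of the returned bytes is left to an RDF library of the client's choice.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

# Assumed preference order; any RDF serialization would be acceptable.
ACCEPT = "text/turtle, application/ld+json;q=0.9, application/rdf+xml;q=0.8"


def build_request(catalog_url):
    """Build an HTTP GET request that asks for an RDF serialization."""
    return Request(catalog_url, headers={"Accept": ACCEPT}, method="GET")


def fetch_catalog(catalog_url, timeout=30):
    """Dereference one catalog URL; return (content type, raw body)."""
    with urlopen(build_request(catalog_url), timeout=timeout) as response:
        return response.headers.get("Content-Type", ""), response.read()


def fetch_all(catalog_urls, fetch=fetch_catalog, max_workers=4):
    """Retrieve several catalogs in parallel, keyed by catalog URL."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(catalog_urls, pool.map(fetch, catalog_urls)))
```

The `fetch` parameter is injectable so that the parallel retrieval can be exercised without network access, for example `fetch_all(urls, fetch=my_stub)`.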

3.3. Extracting associated shapes

This step (labeled as 3️⃣ in Figure 4) involves parsing the RDF representation of each retrieved catalog, to extract all the shapes associated with RDF resources declared in the catalog. A client must identify all resources of type dcat:Dataset, dcat:Distribution, and dcat:DataService, and for each resource, extract any associated shape(s) using the three approaches defined in section § 2 Describing DCAT(-AP) resource content with shapes. Given the mutually inclusive relation of these approaches, a client must consider all of them when extracting shapes. That is, for every resource a client must:

follow the property path dcterms:conformsTo and assess the existence of sh:NodeShape instances (§ 2.1 Relating to shapes via dcterms:conformsTo). The following SPARQL query expresses an equivalent operation to extract the relevant quads from a catalog instance.

EQUIVALENT SPARQL QUERY

PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?shape WHERE {
    VALUES ?type { dcat:Dataset dcat:Distribution dcat:DataService }
            
    ?resource a ?type;
        dcterms:conformsTo ?shape.
            
    ?shape a sh:NodeShape.
}

follow the property path dcat:qualifiedRelation/dcterms:relation and assess the existence of sh:NodeShape instances (§ 2.2 Relating to shapes via dcat:qualifiedRelation). The following SPARQL query expresses an equivalent operation to extract the relevant quads from a catalog instance.

EQUIVALENT SPARQL QUERY

PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?shape WHERE {
    VALUES ?type { dcat:Dataset dcat:DataService }
            
    ?resource a ?type;
        dcat:qualifiedRelation [
            dcterms:relation ?shape
        ].
            
    ?shape a sh:NodeShape.
}

follow the property path sh:shapesGraph and assess the existence of sh:NodeShape instances within the referenced graph (§ 2.3 Relating to shapes via sh:shapesGraph). The following SPARQL query expresses an equivalent operation to extract the relevant quads from a catalog instance.

EQUIVALENT SPARQL QUERY

PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?shape WHERE {
    VALUES ?type { dcat:Dataset dcat:Distribution dcat:DataService }
    
    ?resource a ?type;
        sh:shapesGraph ?shapesGraph.
    
    GRAPH ?shapesGraph {
        ?shape a sh:NodeShape.
    }
}

The equivalent SPARQL queries listed above assume that all RDF quads for both resources and associated shapes are locally available as part of a catalog RDF document. However, a related shape may be linked as a remote resource. In that case, a client must also dereference the shape’s IRI to retrieve its RDF representation.
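The three extraction approaches can be combined as a logical OR along the following lines. This Python sketch is illustrative only: it assumes the catalog is available as `(subject, predicate, object, graph)` tuples with prefixed names, where `None` stands for the default graph, and it omits the dereferencing of remote shapes. The function name `extract_shapes` is hypothetical.

```python
RDF_TYPE = "rdf:type"
RESOURCE_TYPES = {"dcat:Dataset", "dcat:Distribution", "dcat:DataService"}


def extract_shapes(quads):
    """Map each catalog resource to the union of the shapes found via
    dcterms:conformsTo, dcat:qualifiedRelation and sh:shapesGraph."""
    quads = set(quads)

    def objects(subject, predicate):
        return {o for s, p, o, _ in quads if s == subject and p == predicate}

    def shapes_in(graph):
        return {s for s, p, o, g in quads
                if p == RDF_TYPE and o == "sh:NodeShape" and g == graph}

    default_shapes = shapes_in(None)
    result = {}
    for resource, predicate, rdf_type, _ in quads:
        if predicate != RDF_TYPE or rdf_type not in RESOURCE_TYPES:
            continue
        # Approach 1: direct link via dcterms:conformsTo.
        found = objects(resource, "dcterms:conformsTo") & default_shapes
        # Approach 2: qualified relations (not applied to distributions).
        if rdf_type != "dcat:Distribution":
            for relationship in objects(resource, "dcat:qualifiedRelation"):
                found |= objects(relationship, "dcterms:relation") & default_shapes
        # Approach 3: shapes declared inside the referenced shapes graph.
        for graph in objects(resource, "sh:shapesGraph"):
            found |= shapes_in(graph)
        result.setdefault(resource, set()).update(found)
    return result


# Hypothetical catalog using all three approaches on one dataset.
quads = [
    ("ex:ds", RDF_TYPE, "dcat:Dataset", None),
    ("ex:ds", "dcterms:conformsTo", "ex:A", None),
    ("ex:ds", "dcat:qualifiedRelation", "_:r", None),
    ("_:r", "dcterms:relation", "ex:B", None),
    ("ex:ds", "sh:shapesGraph", "ex:g", None),
    ("ex:A", RDF_TYPE, "sh:NodeShape", None),
    ("ex:B", RDF_TYPE, "sh:NodeShape", None),
    ("ex:C", RDF_TYPE, "sh:NodeShape", "ex:g"),
]
shapes_by_resource = extract_shapes(quads)
```

No approach takes priority over another, so a resource ends up with every shape reachable through any of the three.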

3.4. Determining resource relevance

This step (labeled as 4️⃣ in Figure 4) involves determining the relevance of each resource, for which one or more shapes were extracted, according to the given input SHACL shape or input SPARQL query expressing the client’s formal constraints.

A client must evaluate each resource’s shape against the input SHACL shape or input SPARQL query constraints, by applying the query-shape subsumption algorithm described in section § 4 Algorithm specification.

3.5. Presenting results

In this last step (labeled as 5️⃣ in Figure 4), the client collects and returns the set of resource identifiers (IRI) that were deemed relevant with respect to the input constraints (input SHACL shape or input SPARQL query).
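Taken together, the five steps could be wired up roughly as in the Python sketch below. The function `discover` and its parameters are hypothetical. Retrieval, shape extraction and the relevance test are injected as callables, since the relevance test in particular depends on the algorithm of § 4 Algorithm specification, which this sketch does not implement.

```python
def discover(catalog_urls, constraints, fetch, extract_shapes, is_relevant):
    """Return the sorted IRIs of resources with at least one relevant shape.

    fetch(url)                    -> parsed catalog          (step 2)
    extract_shapes(catalog)       -> {resource: set(shapes)} (step 3)
    is_relevant(shape, constraints) -> bool                  (step 4)
    """
    relevant = set()
    for url in catalog_urls:
        catalog = fetch(url)
        for resource, shapes in extract_shapes(catalog).items():
            # A single matching shape is enough (logical OR over shapes).
            if any(is_relevant(shape, constraints) for shape in shapes):
                relevant.add(resource)
    return sorted(relevant)  # step 5: the identifiers handed back to the caller


# Stub data standing in for parsed catalogs; shapes are reduced to labels.
catalogs = {
    "https://example.org/c1": {"ex:a": {"s1"}, "ex:b": {"s2"}},
    "https://example.org/c2": {"ex:c": {"s1", "s3"}},
}
result = discover(
    list(catalogs),
    "s1",
    fetch=catalogs.get,
    extract_shapes=lambda catalog: catalog,
    is_relevant=lambda shape, constraints: shape == constraints,
)
```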

4. Algorithm specification

This section defines a query-shape subsumption algorithm that determines the relevance of an RDF resource based on a given set of formal constraints, expressed as a SHACL shape or a SPARQL query, and on the set of associated SHACL shapes to which the RDF resource conforms. The algorithm decomposes queries and shapes into their constituent graph star patterns and compares them in search of logical overlaps that indicate relevance relations between the input query/shape and the set of shapes related to catalog resources.

First, a set of preliminary concepts is defined, followed by a description of the logical steps taken by the algorithm.

4.1. Preliminaries

The algorithm operation is specified upon the following concepts:

triple pattern: a triple pattern $tp$ is a tuple $(s_{tp}, p_{tp}, o_{tp})$ where, given the sets of named nodes $I$ , blank nodes $B$ , literals $L$ and variables $V$ :
- $s_{tp} \in I \cup B \cup V$ is the subject (a named node, blank node or variable),
- $p_{tp} \in I \cup V$ is the predicate (a named node or variable) and
- $o_{tp} \in I \cup B \cup L \cup V$ is the object (a named node, blank node, literal or variable).
star pattern [STAR-PATTERNS]: given a SPARQL query $Q$ , a star pattern $Q_{sp}$ is a set of triple patterns $tp$ of $Q$ , all sharing the same subject.
Example of a star pattern defined as part of a SPARQL query.
```
SELECT * WHERE {
    ?subject a ex:Entity;  #---
        ex:pred1 ?obj1;    #  | star pattern
        ex:pred2 ?obj2.    #---
}
```
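The grouping implied by this definition can be sketched in Python. The sketch assumes that the triple patterns of a query have already been extracted as `(subject, predicate, object)` string tuples. SPARQL parsing is out of scope, and the function name `star_patterns` is hypothetical.

```python
from collections import defaultdict


def star_patterns(triple_patterns):
    """Group triple patterns by subject; each group is one star pattern."""
    stars = defaultdict(list)
    for subject, predicate, obj in triple_patterns:
        stars[subject].append((subject, predicate, obj))
    return dict(stars)


# Triple patterns of the example query above, plus one unrelated pattern.
patterns = [
    ("?subject", "rdf:type", "ex:Entity"),
    ("?subject", "ex:pred1", "?obj1"),
    ("?subject", "ex:pred2", "?obj2"),
    ("?other", "ex:pred3", "?obj3"),
]
stars = star_patterns(patterns)
```

Here the three patterns sharing `?subject` form one star pattern, and the pattern on `?other` forms a second one.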

graph star pattern: given a SPARQL query $Q$ , a graph star pattern $Q_{gsp}$ is the union of a root star pattern $Q_{spr}$ and all the star patterns having as a subject, an object of a triple pattern belonging to $Q_{spr}$ .

Example of a graph star pattern defined as part of a SPARQL query.

SELECT * WHERE {
    ?subject a ex:Entity;  #----------------------
        ex:pred1 ?obj1;    #  | star pattern     |
        ex:pred2 ?obj2.    #---                  |
                           #                     | graph star pattern
    ?obj1 ?p1 ?o1.         #                     |
                           #                     |
    ?obj2 ?p2 ?o2.         # ---------------------
}
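A graph star pattern can be derived from a list of triple patterns as in the Python sketch below. It follows the definition literally, collecting only the star patterns one hop away from the root. The tuple representation and the function name `graph_star_pattern` are assumptions made for illustration.

```python
def graph_star_pattern(triple_patterns, root_subject):
    """Return the root star pattern plus every star pattern whose subject
    is an object of a triple pattern in the root star pattern."""
    root = [tp for tp in triple_patterns if tp[0] == root_subject]
    root_objects = {obj for _, _, obj in root}
    # Exclude the root subject itself to avoid duplicates on self-references.
    linked = [tp for tp in triple_patterns
              if tp[0] in root_objects and tp[0] != root_subject]
    return root + linked


# Triple patterns of the example query above, plus one unrelated pattern.
patterns = [
    ("?subject", "rdf:type", "ex:Entity"),
    ("?subject", "ex:pred1", "?obj1"),
    ("?subject", "ex:pred2", "?obj2"),
    ("?obj1", "?p1", "?o1"),
    ("?obj2", "?p2", "?o2"),
    ("?unrelated", "ex:pred3", "?x"),
]
gsp = graph_star_pattern(patterns, "?subject")
```

The pattern on `?unrelated` is left out because its subject is not an object of any triple pattern in the root star pattern.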

DiSHACLed Dataset and Data Service Discovery

Living Document, 13 October 2025

Abstract

1. Introduction

1.1. Document Conventions

1.2. Terminology

2. Describing DCAT(-AP) resource content with shapes

2.1. Relating to shapes via `dcterms:conformsTo`

2.2. Relating to shapes via `dcat:qualifiedRelation`

2.3. Relating to shapes via `sh:shapesGraph`

3. Discovery logical flow

3.1. Discovery request

3.2. Catalog retrieval

3.3. Extracting associated shapes

3.4. Determining resource relevance

3.5. Presenting results

4. Algorithm specification

4.1. Preliminaries

Conformance

Index

Terms defined by this specification

References

Normative References

Informative References
