INSPIRE Codelists: What are they, how to use them, and why do we need them?
If you have been creating INSPIRE GML, you have almost certainly encountered so-called codelists. They are an important part of INSPIRE data specifications and contribute substantially to interoperability. They are, however, not as straightforward as a simple enumeration is. This post explains what codelists are, how you use them, and why they are important.
In general, a codelist contains several terms whose definitions are universally agreed upon and understood. Codelists support data interoperability and form a shared vocabulary for a community. They can even be multilingual.
Managing Codelists and Codelist Registries
INSPIRE Codelists are commonly managed and maintained in codelist registers which provide search capabilities, so that both end users and client applications can easily access codelist values for reference. Registers provide unique and persistent identifiers for the published codelist values and ensure consistent versioning. There are many different INSPIRE registers which manage the identifiers of different resources commonly used in INSPIRE.
Codelists used in INSPIRE are maintained in the INSPIRE code list registry, in the codelist registry of a member state, or by an acknowledged external third party that maintains a domain-specific codelist.
To add a new codelist, you will have to either set up your own registry or work with the administration of one of the existing registries to get your codelist published. This can be quite an involved process, which is designed to prevent uncontrolled growth of codelists.
Extending Codelists
One special feature of codelists in INSPIRE is that they may be extensible. If a codelist is extensible, it will only contain a small set of common terms, but you can add your own terms. With respect to extensibility, we differentiate four different types of codelists in INSPIRE:
- None (Not extensible): A codelist that is not extensible includes only the values specified in the INSPIRE Implementing Rules (IR).
- Narrower (Narrower extensible): A codelist that is narrower extensible includes the values specified in the IR and narrower values defined by the data providers.
- Open (Freely extensible): A freely extensible codelist includes the values specified in the IR and additional values defined by data providers.
- Empty (Any values allowed): An empty codelist can contain any values defined by the data providers.
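The four types above can be read as a simple decision rule. The following Python snippet is only a minimal sketch of that rule, not an official INSPIRE validation routine: the names `Extensibility` and `is_allowed`, the `parent` argument, and the sample values are all hypothetical and chosen for illustration.

```python
from enum import Enum


class Extensibility(Enum):
    """The four extensibility types described above (names are illustrative)."""
    NONE = "none"          # not extensible
    NARROWER = "narrower"  # narrower extensible
    OPEN = "open"          # freely extensible
    EMPTY = "empty"        # any values allowed


def is_allowed(value, ir_values, extensibility, parent=None):
    """Return True if `value` may be used with a codelist of the given type.

    `ir_values` is the set of values specified in the Implementing Rules.
    `parent` is the IR value that a provider-defined value narrows, if any
    (an assumption of this sketch about how narrower values are declared).
    """
    if extensibility is Extensibility.EMPTY:
        return True  # any provider-defined value is acceptable
    if value in ir_values:
        return True  # IR values are valid for every other type
    if extensibility is Extensibility.OPEN:
        return True  # additional provider-defined values are acceptable
    if extensibility is Extensibility.NARROWER:
        return parent in ir_values  # must narrow an existing IR value
    return False  # Extensibility.NONE: only IR values


# Hypothetical sample values, not taken from any real codelist.
ir_values = {"dam", "quarry"}
print(is_allowed("gravelPit", ir_values, Extensibility.NONE))
print(is_allowed("gravelPit", ir_values, Extensibility.NARROWER, parent="quarry"))
```

In practice, the extended values would also have to be published in a register; this sketch only covers the membership logic.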
You can recognize a codelist's type either by looking at the UML model, where it appears as the tagged value “extensibility”, or by looking up its definition in the respective registry. For example, the Anthropogenic Geomorphologic Feature codelist is shown below.
Codelists have maintenance processes which enable the update of codelist values. Even codelists of the type “Not extensible” can be updated through this process, so that new values are included in the next version. Codelists of the type “Freely extensible” can include extended codelist values, but only if those values are managed in a register. Codelists of the type “Empty” often pose a challenge to users because a readily applicable codelist is not always available. In some cases, an empty codelist comes with a suggestion to use a standard external codelist that is common in the domain.
Codelist Encoding
The conceptual schema language rules in the INSPIRE Generic Conceptual Model contain guidance on how to include codelists in INSPIRE GML application schemas, some of which you may recognize:
- Code lists should use the stereotype codeList.
- The name of the codelist or enumeration should include the suffix Value.
- The documentation field of the codeList classes in the UML application schemas shall include the -- Name --, -- Definition --, and -- Description -- information.
- The natural language name of the code list (given in the -- Name -- section) should not include the term Value.
- The type of code list shall be specified using the tagged value extensibility on the codeList class.
- For each code list, a tagged value called vocabulary shall be specified. The value of the tagged value shall be a persistent URI identifying the values of the code list.
- A code list may also be used as a super-class for a number of specific codelists whose values may be used to specify the attribute value.
- Values of INSPIRE-governed code lists and enumerations shall be in lowerCamelCase notation.
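Two of these naming rules, the Value suffix and lowerCamelCase values, are easy to check automatically. The Python snippet below is a rough sketch of such a check. The function name `check_codelist_naming`, the sample names, and the regular expression are assumptions for illustration; the pattern is only a loose approximation of lowerCamelCase (a lowercase first letter followed by letters and digits).

```python
import re

# Loose approximation of lowerCamelCase: starts with a lowercase letter,
# followed only by ASCII letters and digits (no spaces, hyphens, underscores).
LOWER_CAMEL = re.compile(r"^[a-z][a-zA-Z0-9]*$")


def check_codelist_naming(codelist_name, values):
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []
    if not codelist_name.endswith("Value"):
        problems.append(f"codelist name '{codelist_name}' lacks the suffix 'Value'")
    for value in values:
        if not LOWER_CAMEL.match(value):
            problems.append(f"value '{value}' is not in lowerCamelCase")
    return problems


# Hypothetical codelist name and values, for illustration only.
for problem in check_codelist_naming("ExampleType", ["gravelPit", "Open_Cast_Mine"]):
    print(problem)
```

A check like this could run before a custom codelist is submitted to a registry, but it does not replace the review by the registry's administration.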
In UML, the usage of an extended code list is indicated by substituting the existing code list. The extended codelist is represented by a sub-type of the original codelist.
Codelist values are encoded in GML application schemas using gml:ReferenceType, which means that there is no formal link between the new subtype in the GML application schema and the extended codelist. The codelist itself must be published in a register, and the register should be published in the INSPIRE register federation. The application schema, however, does not need to be adapted to use the extended or profiled codelist.
Using INSPIRE codelists in hale»studio
Both INSPIRE GML and the INSPIRE metadata – which describes harmonized datasets and network services – include references to codelists in the form of xlinks. XLink is a recommendation by the World Wide Web Consortium for the definition of references in or across XML documents. Simple xlinks are the standard method for object references in GML. Attributes encoded using xlink require a URI to the remote object or internal document reference in xlink:href.
It is standard practice to refer to items in the INSPIRE registry using HTTP URIs.
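To make the encoding concrete, here is a small Python sketch that builds a property element whose xlink:href points at a codelist value. It uses only the standard library's xml.etree.ElementTree. The XLink namespace URI is the one defined by the W3C; everything else (the `ex` namespace, the property name `featureType`, the codelist URI, and the value `gravelPit`) is a made-up placeholder, not taken from an INSPIRE schema or from the INSPIRE registry.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"  # W3C XLink namespace
EX_NS = "http://example.org/schema"        # hypothetical application schema namespace

# Register prefixes so the serialized XML uses readable names.
ET.register_namespace("xlink", XLINK_NS)
ET.register_namespace("ex", EX_NS)


def codelist_reference(property_name, codelist_uri, value):
    """Build an empty property element that references a codelist value by URI."""
    elem = ET.Element(f"{{{EX_NS}}}{property_name}")
    href = f"{codelist_uri.rstrip('/')}/{value}"
    elem.set(f"{{{XLINK_NS}}}href", href)
    return elem


# Placeholder URI and value; a real dataset would reference a published register.
elem = codelist_reference(
    "featureType",
    "http://example.org/codelist/ExampleTypeValue",
    "gravelPit",
)
print(ET.tostring(elem, encoding="unicode"))
```

The element carries no text content: the meaning of the property lives entirely in the referenced URI, which is why the register has to provide persistent identifiers.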
If you are using hale»studio to create your harmonization project, you can load INSPIRE codelists directly from the INSPIRE registry for use in your project. The INSPIRE codelists are referenced using HTTP URIs in the exported GML data.
To import an INSPIRE codelist into your hale»studio project, select “File” » “Import” » “Codelist”.
Next, select “From INSPIRE registry”. A list of all INSPIRE codelists will appear and you can either filter by name or search by INSPIRE theme. The selected codelist will be added to your project.
If all the target instances in your dataset will use the same codelist value, select the href attribute in the target property and apply the Assign function. In the Assign function dialog, select the icon with the yellow arrows to assign a codelist value from the codelist you loaded into your project.
Next steps
Codelists are a fundamental building block of any INSPIRE implementation: they promote data interoperability through the effective reuse of stable and persistent identifiers for universally defined concepts. INSPIRE harmonization projects are often stalled by empty codelists and missing values. Wetransform has supported numerous customers with the UML encoding of custom codelist extensions and with the development and maintenance of codelist registries. If you are interested in moving ahead with your project and overcoming these obstacles, please get in touch with our support team at support@wetransform.to.
If you're interested in learning more about such topics, feel free to check out our post on INSPIRE IDs or our news page!