Tutorial: Working with Collectors in hale»studio

Andreas von Dömming

Thorsten Reitz

20 Feb 2017

Collectors are a powerful feature that we introduced in hale»studio 3.1.0 and have expanded since. So, which use cases call for collectors?

Your target schema has a collection object, such as a Network, that needs to reference many or all objects of a different type, e.g. NetworkLinks
You are building up a hierarchy of objects such as AdministrativeUnits, with their upperLevelUnit and lowerLevelUnit references
You need to use Merge or Join operations to determine relationships between objects, but these are computationally too expensive

With a Collector, you can collect values in one place in your transformation project and then use these values in another place in the transformation process. Let's look at a recent project we've worked on to see how they work in practice.

Please note that this article assumes you have working knowledge of hale»studio and know the terminology.

A big collection of keys - which one to use?

Using a collector takes two to three steps:

Define the collector
Define where to apply values from the collector
Optionally, define Cell Execution priorities to make sure collected values are available when used

We always collect values in the context of another transformation function. As of hale»studio 3.2.0, these are the transformation functions that support the definition of collectors:

Groovy Script and Groovy Script (Greedy)
Groovy Retype
Groovy Merge
Groovy Join
Custom Functions

You can apply the collected values in all of these functions, plus the following ones:

Groovy Create
Assign Collected Values

With the upcoming 3.3.0 release, you'll see more widespread support for the feature. Basically, most existing functions will allow collecting values, and we'll add more functions to assign them.

The Use Case

This is the use case we are going to work on for this tutorial:

We need to create an INSPIRE Hydrographical Network dataset from UK Meridian 2 data encoded in a specific schema, using a GML 2.1 encoding. Each river segment from the source will be transformed to a WatercourseLink, and in addition, we'll create a Network object that references all created WatercourseLink features.

The source and target schemas for this project

You can take a look at the hale transformation project for this tutorial and download it, including source data, here at haleconnect.com.

Implementation

Step 1: Define the Collector in a `Groovy Script`

As always, first define the type-level transformation function. In the source schema, pick River; in the target schema, select WatercourseLink. Click on the double arrow icon and select Retype. Use the default values for the function.
Now, select fid on the source and id on the target feature type. Click on the double arrow icon and select Groovy Script. Leave the parameters on the first page as they are and click Next to proceed to the actual script editor. After you've entered the script, click Finish to let the transformation execute.

Use a Groovy Script function to create and collect IDs for the new features

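The script creates an ID for each new WatercourseLink, adds it to a collector named linkIDs, and returns it as the value of the target id property. The full script is part of the transformation project linked above.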
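As a rough conceptual illustration of this step, the following Python sketch models what collecting IDs amounts to. It is not hale»studio's Groovy scripting API: the `Collector` class, the `transform_river` function and the `WatercourseLink_` ID prefix are all hypothetical names chosen for the example. Only the collector name `linkIDs` comes from this tutorial.

```python
from collections import defaultdict


class Collector:
    """Toy stand-in for a collector: named lists of collected values."""

    def __init__(self):
        self._values = defaultdict(list)

    def add(self, name, value):
        # Append a value under the given collector name.
        self._values[name].append(value)

    def values(self, name):
        # Return a copy so callers cannot modify the collected list.
        return list(self._values[name])


def transform_river(fid, collector):
    """Model of the property-level script: build an ID, collect it, return it."""
    link_id = f"WatercourseLink_{fid}"  # hypothetical ID scheme
    collector.add("linkIDs", link_id)   # collector name used in this tutorial
    return link_id                      # becomes the target feature's id


collector = Collector()
# Hypothetical source fids; the real project reads them from the River features.
link_ids = [transform_river(fid, collector) for fid in ["r1", "r2", "r3"]]
```

The point of the design is that the ID is produced once and goes to two places: it is returned as the value of the target property, and it is stored under a name so that another part of the transformation can look it up later.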

Step 2: Use the collected values in an `Assign collected values` function

The easiest way to use values from a collector is to use the Assign collected values function. Follow these steps to use it:

In the target schema, select the Network feature type. Click on the arrow icon and select Create. This function will create one or more objects of the target type from thin air. Create exactly one object.
Next, click on the elements property in Network and then on the arrow icon. Choose Assign collected values and click Next. Enter the name of the collector we've defined in the script above (linkIDs) so that it can be accessed.

The Assign collected values function has some special behaviour to automatically identify and create local references. If you inspect the created network, you'll see it now has 982 references that all look like this:

Assign collected IDs to any ReferenceType to get local references
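To make the idea of local references concrete, here is a small Python sketch. It assumes the common GML/XLink convention that a local reference is the target object's ID prefixed with `#`; the function name and the sample IDs are hypothetical, and this is a model of the behaviour described above, not the function's actual implementation.

```python
def to_local_references(collected_ids):
    """Turn collected IDs into local reference strings (assumed '#' + id form)."""
    return [f"#{link_id}" for link_id in collected_ids]


# Hypothetical IDs standing in for the values held by the linkIDs collector.
collected = ["WatercourseLink_r1", "WatercourseLink_r2"]

# One Network object whose elements property receives one reference per ID.
network = {"elements": to_local_references(collected)}
```

With the tutorial's data set, the same mapping would produce one entry per collected ID, which is where the 982 references on the Network object come from.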

Step 3: Transform in the right order

hale»studio, in principle, automatically determines the execution order of all cells. In some cases, this may not have the desired effect, so you need to give the transformation engine hints about what should happen first and what should happen last. For collectors, it's important that the engine finishes collecting values before it tries to apply them in a different place. We do plan to recognize these cases automatically, but for now, you'll have to assign cell execution priorities to make sure everything always works as expected.

To make sure the steps described above run in the correct sequence, set the execution priorities accordingly: the second mapping cell (Create on Network) must have a lower priority than the first mapping cell. You can set this via the context menu in hale»studio.

Edit the Cell Priority so that the function using the collector is executed last
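The following Python sketch shows why the order matters. It is a toy model, not the hale»studio transformation engine: the numeric priorities, the `execute` function and the cell tuples are assumptions made for the example, with a lower number meaning "runs earlier".

```python
def collect_ids(collected, network):
    # Models the cell that fills the collector.
    collected.extend(["WL_1", "WL_2"])


def create_network(collected, network):
    # Models the cell that applies whatever has been collected so far.
    network["elements"] = [f"#{link_id}" for link_id in collected]


def execute(cells):
    """Run cells ordered by priority number; ties keep their listed order."""
    collected, network = [], {}
    for _name, _priority, action in sorted(cells, key=lambda cell: cell[1]):
        action(collected, network)
    return network


# Same priority: the Network cell happens to run first and finds nothing.
without_hint = execute([
    ("Create Network", 1, create_network),
    ("Retype River", 1, collect_ids),
])

# Lower priority for the Network cell: collecting finishes before applying.
with_hint = execute([
    ("Create Network", 2, create_network),
    ("Retype River", 1, collect_ids),
])
```

In the first run the Network ends up with an empty list of references, while in the second it receives both. Assigning a lower priority to the cell that uses the collector is what guarantees the second outcome.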

Summary

With these three steps, you have learned how to use the collector feature in hale»studio. Let us know what you think of this feature and what we can do to improve its usability!

Happy transforming!