Using the REST API to get values from a MapReduce Aggregation Function

Author: Dave Cassel  |  Category: Software Development

In a recent post, I showed how to build a simple MapReduce Aggregation function with MarkLogic 6. That’s a good start, but the next step is figuring out what we can do with it. At some point, we’ll probably want to display the values from the function to the user, maybe as part of an analytics widget. In a future post, I’ll talk about how to display the values using the updated App Builder. For now, we’ll see how to use the REST API to get the values.

It’s worth noting that while I’m exploring my user-defined aggregation function, MarkLogic 6 has a bunch of new built-in aggregate functions, too. The procedure to use those is pretty similar, except that you don’t have to specify the aggregatePath request parameter or the udf attribute that you’ll see below.

Setup

The Aggregation function

The first bit of setup is to build and load the custom function. You’ll find instructions for that in my earlier post.

Data and an Index

There are a couple ways to get what we want, but we’ll need some data. I’m going to use an example where I have some xs:date data in an element called posted-date. If you don’t have more interesting data at hand, you can use this to dummy up something good enough for this exercise (create a new database or use the Documents database):

for $i in (1 to 1000)
return
  xdmp:document-insert(
    "/content/" || $i || ".xml",
    <doc>
      <data>{$i}</data>
      <posted-date>{
        fn:current-date() - xs:dayTimeDuration("P"||xdmp:random(30)||"D")
      }</posted-date>
    </doc>
  )

That will produce 1000 documents with dates scattered over roughly the last 30 days. Add an element range index on posted-date: in the Admin UI, go to http://localhost:8001 -> Configure -> Databases -> <your database> -> Element Range Indexes, and use type = xs:date, no namespace (for this example), and localname = posted-date.

REST App Server

You have data and an index; now we need a REST app server to get to it. Go to http://localhost:8000/appservices/, select your database from the Database dropdown near the top of the page, and click the Configure button. Scrolling down, you’ll see a section called REST API Instances. Click the “+ Add New” button, give it a name and an available port, and click the “Create REST API Instance” button. For the rest of this post, I’ll assume that you built the app server on port 8003.

Search Options

The REST API is configurable so that we can get back what we want. Search results are controlled by specifying search options. Our next step is to create some search options that will work on the posted-date range index. We’ll start with the simplest options and then tweak them.

<options xmlns="http://marklogic.com/appservices/search">
  <values name="posted-date">
    <range type="xs:date" facet="false">
      <element ns="" name="posted-date"/>
    </range>
  </values>
</options>

These options tell MarkLogic that we want to get values from the posted-date index, but don’t yet make use of the aggregation function. Now we need to tell the app server about these options. We can register the options using the REST API itself, by sending a PUT request. Save the options in a file (I called mine dow-options.xml), then:

curl --anyauth --user admin:admin -X PUT \
  -d@"./dow-options.xml" -H "Content-type: application/xml" \
  http://localhost:8003/v1/config/query/dow-options

Change the username and password as needed. If you use something other than the admin user, you’ll need a user with at least the rest-writer role. This sends the contents of the options file to the REST app server with a PUT. Note that the end of the URI is the name under which I’m asking to store the new options.
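If you would rather script this step than type the curl command, here is a minimal Python sketch of the same PUT request using only the standard library. It assumes the port 8003 instance and the dow-options name from above. The function names build_options_request and make_opener are made up for this illustration, and registering both digest and basic handlers is only my approximation of curl’s --anyauth, not an exact equivalent.

```python
import urllib.request


def build_options_request(options_xml, name="dow-options",
                          base="http://localhost:8003"):
    """Build (but do not send) the PUT request that stores search options.

    options_xml must be bytes, e.g. the contents of dow-options.xml.
    """
    return urllib.request.Request(
        url=f"{base}/v1/config/query/{name}",
        data=options_xml,
        headers={"Content-type": "application/xml"},
        method="PUT",
    )


def make_opener(user, password, base="http://localhost:8003"):
    """Return an opener that can answer digest or basic auth challenges."""
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, base, user, password)
    return urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(mgr),
        urllib.request.HTTPBasicAuthHandler(mgr),
    )


# Usage (needs a running REST instance, so it is left commented out):
# with open("dow-options.xml", "rb") as f:
#     req = build_options_request(f.read())
# make_opener("admin", "admin").open(req)
```

Keeping the request construction separate from the network call makes it easy to inspect the method, URL, and headers before anything is sent.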

Getting Values

At this point, we have enough that we can use the REST API to get the posted-date values. This link:

http://localhost:8003/v1/values/posted-date?options=dow-options

shows all values.

<values-response name="posted-date" type="xs:date"
  xmlns="http://marklogic.com/appservices/search" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <distinct-value frequency="2">2001-03-25</distinct-value>
  <distinct-value frequency="1">2007-02-25</distinct-value>
  <distinct-value frequency="1">2007-08-05</distinct-value>
  ...
</values-response>
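To consume a response like this from a client, something along these lines should work. It is a sketch using Python’s standard library; parse_distinct_values is a hypothetical helper name, and the sample is a trimmed, well-formed copy of the response above.

```python
import xml.etree.ElementTree as ET

SEARCH_NS = "http://marklogic.com/appservices/search"


def parse_distinct_values(xml_text):
    """Return (value, frequency) pairs from a values-response document."""
    root = ET.fromstring(xml_text)
    return [
        (dv.text, int(dv.get("frequency")))
        for dv in root.iter(f"{{{SEARCH_NS}}}distinct-value")
    ]


SAMPLE = """<values-response name="posted-date" type="xs:date"
  xmlns="http://marklogic.com/appservices/search"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <distinct-value frequency="2">2001-03-25</distinct-value>
  <distinct-value frequency="1">2007-02-25</distinct-value>
</values-response>"""

print(parse_distinct_values(SAMPLE))
# prints [('2001-03-25', 2), ('2007-02-25', 1)]
```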

However, we aren’t calling the aggregation function yet. To do that, we need to change the request a bit.

Using the Aggregation Function

Now that we have a values option set up, there are two ways to apply an aggregation function to it.

Using Request Parameters

We can choose to use an aggregation function on a call-by-call basis by changing the request parameters:

http://localhost:8003/v1/values/posted-date?options=dow-options
  &aggregate=day-of-week&aggregatePath=native/day-of-week

Now in addition to a list of the values, we get the day-of-week function’s results:

<aggregate-result name="day-of-week">
  <map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <map:entry key="0">
      <map:value xsi:type="xs:unsignedLong">8470</map:value>
    </map:entry>
    <map:entry key="5">
      <map:value xsi:type="xs:unsignedLong">4574</map:value>
    </map:entry>
    <map:entry key="3">
      <map:value xsi:type="xs:unsignedLong">9304</map:value>
    </map:entry>
    <map:entry key="6">
      <map:value xsi:type="xs:unsignedLong">8482</map:value>
    </map:entry>
    <map:entry key="1">
      <map:value xsi:type="xs:unsignedLong">6736</map:value>
    </map:entry>
    <map:entry key="2">
      <map:value xsi:type="xs:unsignedLong">2764</map:value>
    </map:entry>
    <map:entry key="4">
      <map:value xsi:type="xs:unsignedLong">8713</map:value>
    </map:entry>
  </map:map>
</aggregate-result>
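The map entries come back in no particular order, so a client will probably want to turn them into something sorted. Here is a small Python sketch that does that with the standard library. The helper name parse_aggregate_map is invented for this example, and I’m assuming that both the keys and the values are integers, as they are in the output above.

```python
import xml.etree.ElementTree as ET

MAP_NS = "http://marklogic.com/xdmp/map"


def parse_aggregate_map(xml_text):
    """Return {key: count} from the map:map inside an aggregate-result."""
    root = ET.fromstring(xml_text)
    counts = {}
    for entry in root.iter(f"{{{MAP_NS}}}entry"):
        value = entry.find(f"{{{MAP_NS}}}value")
        counts[int(entry.get("key"))] = int(value.text)
    # Sort by key so the buckets come out in a predictable order.
    return dict(sorted(counts.items()))


SAMPLE = """<aggregate-result name="day-of-week">
  <map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <map:entry key="5">
      <map:value xsi:type="xs:unsignedLong">4574</map:value>
    </map:entry>
    <map:entry key="0">
      <map:value xsi:type="xs:unsignedLong">8470</map:value>
    </map:entry>
  </map:map>
</aggregate-result>"""

print(parse_aggregate_map(SAMPLE))
# prints {0: 8470, 5: 4574}
```

Because the lookup goes by the map namespace rather than by position, the same function should also work when the aggregate-result is nested inside a full values-response.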

Search Options for Aggregation

We can also set up search options so that we always use the aggregation function. We’ll change the options that we set up above so that our function gets called.

<options xmlns="http://marklogic.com/appservices/search">
  <values name="posted-date">
    <range type="xs:date" facet="false">
      <element ns="" name="posted-date"/>
    </range>
    <aggregate apply="day-of-week" udf="native/day-of-week" />
  </values>
</options>

We tell MarkLogic about the revised options the same way we told it in the first place: a PUT request.

curl --anyauth --user admin:admin -X PUT \
  -d@"./dow-options.xml" -H "Content-type: application/xml" \
  http://localhost:8003/v1/config/query/dow-options

Values with Aggregation

Now we can make the same call as we did above, but in addition to the values, we’ll also get the aggregation function results.

http://localhost:8003/v1/values/posted-date?options=dow-options

Aggregation without the Values

You may want to get just the results of the aggregation function without the full list of values. The REST API supports that with the view parameter. Specifying “view=aggregate” skips the full listing of the values.

http://localhost:8003/v1/values/posted-date?options=dow-options
  &aggregate=day-of-week&aggregatePath=native/day-of-week&view=aggregate
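Since the variations in this post differ only in their query parameters, it can help to build the URL in one place. The following Python sketch does that; values_url and its argument names are my own invention, and the default base assumes the port 8003 instance used throughout.

```python
from urllib.parse import urlencode


def values_url(name, options, aggregate=None, aggregate_path=None,
               view=None, base="http://localhost:8003"):
    """Build a /v1/values URL, adding only the parameters that are set."""
    params = {"options": options}
    if aggregate:
        params["aggregate"] = aggregate
    if aggregate_path:
        params["aggregatePath"] = aggregate_path
    if view:
        params["view"] = view
    # safe="/" keeps the slash in native/day-of-week unescaped.
    query = urlencode(params, safe="/")
    return f"{base}/v1/values/{name}?{query}"


print(values_url("posted-date", "dow-options",
                 aggregate="day-of-week",
                 aggregate_path="native/day-of-week",
                 view="aggregate"))
```

Calling values_url("posted-date", "dow-options") with no other arguments gives the plain values request from earlier in the post.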

Well, that’s it for this installment of exploring the REST API. Still to come: wiring up App Builder to show the values.
