Archive for the ‘Software Development’ Category

XPath: using the root element

Tuesday, August 31st, 2010

In XQuery, I commonly find myself writing XPath statements to navigate into some block of XML to find some tasty nugget that’s in there somewhere. When doing so, I’ve found one particular aspect that frequently throws me off, getting me empty results instead of the information I wanted. Having finally figured it out, and after confirming with a friend that I’m not the only one who messes this up, here’s a little lesson to help you write your XPath.

Let’s start with the question that I found confusing: if you want to specify a full XPath from the top node down to what you’re looking for, should you specify the root node? Seems like a simple thing, but as is so often the case, the answer is, “it depends”. Consider this XML:

<doc>
  <book>
    <name>The Fellowship of the Ring</name>
  </book>
  <author>
    <name>J. R. R. Tolkien</name>
  </author>
</doc>

I want to retrieve the name of the book using XPath. I can’t just ask for “//name”, because I’d get two results, the book’s name and the author’s. I’ll be more precise and specify the full path. So what XPath do I use: “/doc/book/name” or “/book/name”?

It depends on how we got this chunk of XML. If our XML is a document, then the variable that holds the XML is pointing to the whole document, and <doc/> is the top-level node of that doc. So, assuming that /book.xml contains the XML above:

let $doc := fn:doc('/book.xml')
return $doc/doc/book/name

But if the XML is constructed, then the node that holds it is pointing to the top-level node itself — there is no document.

let $str :=
  <doc>
    <book>
      <name>The Fellowship of the Ring</name>
    </book>
    <author>
      <name>J. R. R. Tolkien</name>
    </author>
  </doc>
return $str/book/name

Simple as that. Now a little test: which XPath expression would you use in this case?

let $str :=
  xdmp:unquote('
    <doc>
      <book>
        <name>The Fellowship of the Ring</name>
      </book>
      <author>
        <name>J. R. R. Tolkien</name>
      </author>
    </doc>')

To find the answer, you need to know that xdmp:unquote() returns a document-node() (actually one or more). Now that we know we’ll get a document back, the answer is simple:

return $str/doc/book/name
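The same distinction shows up outside XQuery. Here is a rough sketch in Python rather than XQuery, using the standard library's xml.dom.minidom, which also keeps the document node separate from its top-level element. The step and walk helpers are names made up for this illustration. A path needs the doc step only when it starts from the document:

```python
from xml.dom import minidom

XML = """<doc>
  <book><name>The Fellowship of the Ring</name></book>
  <author><name>J. R. R. Tolkien</name></author>
</doc>"""


def step(nodes, name):
    """One child step: the child elements called `name` of every node in `nodes`."""
    return [child
            for node in nodes
            for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE and child.tagName == name]


def walk(start, path):
    """Follow a slash-separated path of child steps from `start`; return the text found."""
    nodes = [start]
    for name in path.split("/"):
        nodes = step(nodes, name)
    return [node.firstChild.data for node in nodes]


document = minidom.parseString(XML)   # a document node, as fn:doc() or xdmp:unquote() returns
element = document.documentElement    # the <doc> element itself, like a constructed node

print(walk(document, "doc/book/name"))  # starting above <doc>, so the path names it
print(walk(element, "book/name"))       # already at <doc>, so the path skips it
print(walk(element, "doc/book/name"))   # empty list: <doc> has no <doc> child
```

The last call is the mistake described above: naming the root element when the variable already points at it gives back nothing.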

Gotcha – sequence index evaluation

Tuesday, July 13th, 2010

Every now and then I get caught by this little gotcha, so I figured I’d share and hopefully by writing about it, I’ll remember to do this right. Let’s start with a little something simple, shall we?

let $seq := (1 to 100)
return $seq[10]

Simple, as promised. I create a sequence of numbers from one to one hundred and I ask for the tenth one. I run this and I get 10 as a result. So far, so good.

Now, suppose that I want to get a random element of this sequence. MarkLogic Server provides an xdmp:random() function, so this should be easy, too:

let $seq := (1 to 100)
return $seq[xdmp:random(99) + 1]

Randomly generate a number from 0 to 99, add one to get us into the 1 to 100 range, and return the value with that index. I run this one and I get… the empty sequence. I run it again and I get two values. I run it again and get one. What’s going on?

To see what’s going on, let’s run this in CQ using the Profile button.

expression                                              count
let $seq := 1 to 100 return $seq[xdmp:random(9) + 1]    1
xdmp:random(9) + 1                                      100
xdmp:random(9)                                          100
$seq[xdmp:random(9) + 1]                                1
1 to 100                                                1

What we see is the xdmp:random() expression getting called 100 times. Yet if you run Profile on the first implementation ($seq[10]), you’ll see that $seq[10] is evaluated just once and that “10” doesn’t show up as an expression.

When I put a constant in the index operator ([]), XQuery knows exactly which element(s) I want — no work is required. But when I put an expression there, it evaluates the expression once for each element in the sequence and checks whether the current index matches the expression. That lets us do complicated things like

let $seq := (1 to 100)
return $seq[if (math:fmod(fn:position(), 3) = 0) then fn:position() else ()]

(return the elements whose indexes are divisible by three) but it comes at the cost of evaluating that expression more often than you might expect.

So what should we do instead? Happily, there is a simple solution:

let $seq := (1 to 100)
let $index := xdmp:random(99) + 1
return $seq[$index]

This approach returns one value every time it’s called, and profile shows us that each expression is evaluated only once.

Moral of the story: if you have an expression as a sequence index, make sure it’s not doing more work than you intend. Profiling, as always, is your friend.
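To see the effect outside MarkLogic, here is a small Python imitation of the two versions. It is only an analogy: select is a made-up helper that mimics how a predicate is checked against every position, and random_index stands in for xdmp:random(99) + 1.

```python
import random

calls = 0


def random_index():
    """Stand-in for xdmp:random(99) + 1; counts how often it runs."""
    global calls
    calls += 1
    return random.randint(1, 100)


def select(seq, index_expr):
    """Mimic $seq[expr]: evaluate expr for each item and keep the item if expr equals its position."""
    return [item for position, item in enumerate(seq, start=1)
            if index_expr() == position]


seq = list(range(1, 101))

# Expression inside the "predicate": evaluated 100 times, so the result
# may hold zero, one, or several items.
picked = select(seq, random_index)
print(len(picked), "item(s) after", calls, "evaluations")

# Index computed once up front: always exactly one item.
index = random_index()
print(seq[index - 1])
```

Running it a few times shows the first result changing size from run to run, while the second always yields a single value.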

Calling a function on all permutations of a sequence

Thursday, May 20th, 2010

A project I’m working on required me to call a function on each permutation of a sequence (strictly speaking, on each non-empty subset: every way of choosing one or more members while keeping their original order). I said to myself, “Surely, you can’t be the only person needing to do that in XQuery”. Having heard that from such a reliable source, I figured I should share.

(: Print an index and the selected members of the sequence. :)
declare function local:print($index, $values, $selections, $connector) {
    fn:concat($index, ": ", fn:string-join($values[$selections], $connector))
};

(: Apply the specified function to each permutation of $seq. $data is
 : provided as a pass-through.
 :)
declare function local:apply-on-permutations($seq as item()*,
            $function as xdmp:function, $data)
{
    let $len := fn:count($seq)
    for $i in (1 to xs:int(math:pow(2, $len) - 1))
    let $targets :=
        for $bit in (1 to $len)
        let $shifted :=
            if ($bit > 1) then
                xs:int($i div (math:pow(2, $bit - 1)))
            else $i
        return
            if (math:fmod($shifted, 2) eq 1) then
                $bit
            else ()
    return
        xdmp:apply($function, $i, $seq, $targets, $data)
};

let $seq := ('a', 'b', 'c', 'd')
return local:apply-on-permutations($seq, xdmp:function(xs:QName("local:print")), ',')

Calling this function generates this output:

1: a
2: b
3: a,b
4: c
5: a,c
6: b,c
7: a,b,c
8: d
9: a,d
10: b,d
11: a,b,d
12: c,d
13: a,c,d
14: b,c,d
15: a,b,c,d
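For readers who don't have MarkLogic handy, the same bitmask idea can be sketched in Python. This is a loose translation, not the code above: apply_on_subsets and print_selection are names chosen for this example, and positions are zero-based as is usual in Python.

```python
def print_selection(index, values, selections, connector):
    """Counterpart of local:print: the index, then the selected members."""
    return f"{index}: {connector.join(values[s] for s in selections)}"


def apply_on_subsets(seq, function, data):
    """Call `function` once for each non-empty subset of `seq`."""
    results = []
    for i in range(1, 2 ** len(seq)):
        # Bit b of the counter i decides whether seq[b] is in subset number i.
        targets = [bit for bit in range(len(seq)) if (i >> bit) & 1]
        results.append(function(i, seq, targets, data))
    return results


for line in apply_on_subsets(["a", "b", "c", "d"], print_selection, ","):
    print(line)
```

Counting from 1 to 2^n - 1 and reading each number's bits as "include this member or not" is what produces the ordering in the listing above.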

I’m sure you’ll sleep better tonight knowing that this is available to you.

A RESTful chess service: part 5

Monday, May 3rd, 2010

Welcome to part 5 of my series on designing a RESTful chess service. Don’t worry: today’s is the last of the planning posts, then we get to some code.

The last two steps laid out by the authors of RESTful Web Services are to consider the typical course of events and to consider error conditions. Let’s get started.

The Typical Course of Events

There are two main use cases for this service. The first is a Tournament Director reporting the results of games. In this case, the TD will POST a PGN game to /games. The typical course here is that the service will translate the PGN to XML, use some of the fields to construct a URL, store the game, and return its new URL. For this first version of the service, uploading a game is the only way to add data to the database. The other main use case is someone using the service to browse the contents of the database. Here, we expect that the consumer of the service will use a proper URL and the service will return a list of games that match. In the previous post, I worked out that the search results will be paginated.

As a secondary use case, the service allows a TD to upload a correction to a PGN record. This case leads to some interesting results. Naturally, if someone uses the service to retrieve an updated game, we want to show the updated version. The interesting part is what happens to the URL. Since we are constructing the game URLs from the PGN headers, updating a game record may well change the game’s URL. For instance, suppose that a TD POSTs a game with these PGN headers:

[Event "February 2010 Octet X"]
[Site "http://redhotpawn.com"]
[Date "2010.02.17"]
[Round "1"]
[White "David Cassel"]
[Black "sagator"]
[Result "1/2-1/2"]

The game will be accessible through the URL “/games/February+2010+Octet+X/http%3A%2F%2Fredhotpawn.com/2010.02.17/1/David+Cassel/sagator”. Later, the TD realizes that this site is usually recorded with the full address, including the “www”. The service will allow the TD to PUT the PGN with the corrected header ([Site "http://www.redhotpawn.com"]) to the URL created by the earlier POST. Two things happen as a result. First, the game will now be available at a URL based on the corrected headers: “/games/February+2010+Octet+X/http%3A%2F%2Fwww.redhotpawn.com/2010.02.17/1/David+Cassel/sagator”. But we also need to think about what to do if someone requests the original URL. After all, someone may have conducted a search or bookmarked the URL and may try to retrieve the game. HTTP code 301 Moved Permanently, pointing at the new URL, is the way to go here.
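Here is a minimal Python sketch of that correction flow. It is not the MarkLogic implementation this series is building toward, only an illustration. The games and moved dictionaries stand in for the database, and game_url, put_game and get_game are names invented for the example.

```python
from urllib.parse import quote_plus

URL_FIELDS = ("Event", "Site", "Date", "Round", "White", "Black")

games = {}   # current URL -> PGN headers (stand-in for the database)
moved = {}   # old URL -> current URL


def game_url(headers):
    """Build the game URL from the PGN headers, one encoded path segment per field."""
    return "/games/" + "/".join(quote_plus(headers[f]) for f in URL_FIELDS)


def put_game(url, headers):
    """Replace the game stored at `url`; remember the old URL if the headers change it."""
    new_url = game_url(headers)
    del games[url]
    games[new_url] = headers
    if new_url != url:
        moved[url] = new_url
    return new_url


def get_game(url):
    if url in games:
        return 200, games[url]
    if url in moved:
        return 301, moved[url]   # the client should follow this to the new URL
    return 404, None


original = {"Event": "February 2010 Octet X", "Site": "http://redhotpawn.com",
            "Date": "2010.02.17", "Round": "1",
            "White": "David Cassel", "Black": "sagator"}
old_url = game_url(original)
games[old_url] = original

corrected = dict(original, Site="http://www.redhotpawn.com")
new_url = put_game(old_url, corrected)

print(get_game(old_url))      # 301 and the corrected URL
print(get_game(new_url)[0])   # 200
```

A fuller version would also need to handle a second correction, so that every earlier URL keeps pointing at the latest one.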

This seems like a good time to point out one of the simplifying assumptions I’ve made: that there’s no need to log in. In a real system, I would limit who is allowed to POST or PUT data, and might limit search and retrieval, depending on the intended use of the service. That means I’d need to think through how to handle security, how to model my users, how to respond to unauthorized access attempts, and so forth. MarkLogic Server provides handy ways to handle all that, but it’s not what I want to focus on right now. Maybe I’ll add that to the service in a future post.

Consider What Can Go Wrong

The steps listed by the book’s authors include trying to anticipate what might go wrong while the service is being used. A little defensive programming can go a long way. So… what could possibly go wrong?

Let’s start with an obvious case: someone requesting a URL that doesn’t correspond to anything in the database. Is it worthwhile to list something basic like this? Personally, I think so. Listing such cases makes you decide whether you should return 404 Not Found, or simply an empty list with a 200 OK (you probably wouldn’t want this, but I’ve seen it happen). If someone asks the service for a game that isn’t in the database, the service will return a 404, but a search that has no results will get a 200 with an empty <results/> node. It also helps you to remember to address those cases in the code, so you don’t accidentally end up throwing a 500 Internal Server Error.

Another potential error is a conflict: a TD POSTing a game with PGN fields that match those of a game already in the database. If this happens, we’ll return 409 Conflict, along with the URL of the conflicting game. That way, the TD will be able to look at the game that’s already in the database. It might be that the TD accidentally POSTed the same game twice, or it may be a mistake in one of the fields. Given that information, the consumer of the service will be able to figure out what went wrong.

Any time you expect a certain format for the data your users send you, you need to be prepared for the data not to match the format. In the chess service case, the PGN might be invalid, or it might be valid PGN that is missing some fields we require. We’ll treat these cases the same way, responding with a 400 Bad Request and a message stating the requirement of valid PGN and a list of required fields.

One way that I like to think of how things go wrong is to glance over the list of HTTP response codes. Sometimes it helps me think of a case that I might have overlooked otherwise. Based on a look at the list, I remembered that there are only a small number of permitted methods (GET on a particular game, event, player, or search; POST on /games; PUT on a particular game). Outside of those, we’ll return 405 Method Not Allowed.
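Pulling those cases together, here is a rough Python sketch of how the responses might be chosen. It is only an illustration of the decisions above, not the service itself. The handle and post_game names and the in-memory games dictionary are invented for the example, and answering a successful POST with 201 Created is my assumption, since the post only says the new URL is returned.

```python
import re
from urllib.parse import quote_plus

REQUIRED = ("Event", "Site", "Date", "Round", "White", "Black")
games = {}   # game URL -> PGN text; stands in for the database


def post_game(pgn):
    headers = dict(re.findall(r'\[(\w+) "([^"]*)"\]', pgn or ""))
    missing = [field for field in REQUIRED if field not in headers]
    if missing:
        return 400, "Valid PGN required; missing: " + ", ".join(missing)
    url = "/games/" + "/".join(quote_plus(headers[f]) for f in REQUIRED)
    if url in games:
        return 409, url          # tell the TD which game conflicts
    games[url] = pgn
    return 201, url              # 201 Created is an assumption of this sketch


def handle(method, path, body=None):
    if path == "/games":
        if method != "POST":
            return 405, "Method Not Allowed"
        return post_game(body)
    if path.startswith("/games/"):
        if method not in ("GET", "PUT"):
            return 405, "Method Not Allowed"
        if path not in games:
            return 404, "Not Found"
        if method == "PUT":
            games[path] = body   # moving the game to a changed URL is left out here
        return 200, games[path]
    return 404, "Not Found"


pgn = "\n".join([
    '[Event "First Saturday Quads"]', '[Site "West Chester Chess Club"]',
    '[Date "2003.02.01"]', '[Round "1"]',
    '[White "Joe Demetrick"]', '[Black "David Cassel"]',
    "1.d4 d5 2.c4 c6",
])
status, url = handle("POST", "/games", pgn)
print(status, url)
print(handle("POST", "/games", pgn)[0])        # the same game again: 409
print(handle("POST", "/games", "oops")[0])     # no usable headers: 400
print(handle("GET", "/games/unknown")[0])      # nothing stored there: 404
print(handle("DELETE", "/games")[0])           # not a permitted method: 405
```

Searches, events and players are omitted to keep the sketch short.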

I think that about covers it. I’ve now worked through the design for my RESTful chess service, informed in a couple places by some exploratory coding. Next time, we’ll look at how to implement the service with MarkLogic Server.

I found the procedure listed out by Richardson & Ruby to be a helpful one. Even though I’ve made some simplifying assumptions for this exercise, having a set of steps helped me go about the design in an organized way.

Do you have a particular approach you use to design a service?

A RESTful chess service: part 4

Monday, April 26th, 2010

This is part four in a series of posts walking through the process of building a RESTful chess service. I’ve been slacking off a bit in the pace of my posts, but hopefully that will pick back up — I have a bit more travel coming up, and evenings spent hanging out in hotel rooms are good for getting posts written!

The previous post in this series laid out representations of individual games. As part of that, I picked URLs for events and players, rooted at /events/ and /players/.

I think I’ve mentioned that I intend this blog as a learning-in-public sort of exercise. Here’s a case in point: I don’t like the way I set up the event URLs in my last post. I had pictured that the representation of an event would list the participating players and the games. However, I set up the URL to only mention the name of the event. The problem with that is that the name of an event (“First Saturday Quads”) does not uniquely identify a specific tournament. To do that, you need to add at least the site and the date(s) on which the tournament was played. As such, I’m now revising the URL of an event to take the form: /events/<event name>/<site name>/<date range>. I’m allowing (but not requiring) a date range, because while some events are over in one day, some last for a few days, weeks, or even (in the case of postal chess) years.

Representing Events and Players

Under the use cases I’ve identified so far, there is no way to enter event data other than the game information that shows up in the PGN. An event, under my setup so far, is simply a collection of games. I’m going to run with that for now, but I can picture storing additional information such as the name of the Tournament Director and the tournament winner. Eventually, that would require accepting a new representation for a tournament.

Since we don’t have much data about events, the representation won’t be that complex. Here’s the representation for /events/February+2010+Octet+X/http%3A%2F%2Fwww.redhotpawn.com/2010.02.17-2010.04.01:

<event>
  <name>February 2010 Octet X</name>
  <site>http://www.redhotpawn.com</site>
  <date-start>2010-02-17</date-start>
  <date-end>2010-04-01</date-end>
  <players>8</players>
</event>

Likewise, what we know about players comes from the PGN information about individual games. For our player representation, we’ll give the name and what we know about the player — not much at this point.

<player>
  <name>Cassel, David</name>
  <counts events="2" games="100"/>
  <rating type="highest">1546</rating>
  <rating type="latest">1542</rating>
</player>

One more thing before we move on. I mentioned earlier that if we wanted to add, say, the Tournament Director to the representation of an event, we’d have to allow a consumer of this service to PUT a new representation. But notice that what’s contained in the representations right now is extracted from the actual game data. What would we do if someone uploaded a representation that contradicted what we knew from the game data? That’s an application-specific decision; we could either decide to overwrite what we saw in the game PGN data, or we could decide it is inviolable. If we go with the latter approach, we could disregard contradicting information or trigger an error. Since I’m not going to allow PUTs to Event and Player resources, I don’t have to worry about it, but when you’re designing your service, watch out for gotchas like that and think through what should happen. (That’s step 9 in our authors’ procedure, for those keeping track.)

Representing Search

I’m going to offer two ways to search. The first will look pretty normal. I’ll use a URL like this: /search?q=. Is this RESTful? How does “search” count as a resource? Simply enough: search is an algorithm; the results of that algorithm are a resource, complete with a representation.

This is a free text search. Results may include games, players and events.

/search?q=quads
<results>
  <game>/games/First+Saturday+Quads/West+Chester+Chess+Club/2000.04.01/1/Joe+Abner/David+Cassel</game>
  <game>/games/First+Saturday+Quads/West+Chester+Chess+Club/2003.02.01/1/Joe+Demetrick/David+Cassel</game>
  <game>/games/First+Saturday+Quads/West+Chester+Chess+Club/2000.04.01/1/David+Cassel/Charles+Jay</game>
  <game>/games/First+Saturday+Quads/West+Chester+Chess+Club/2000.04.01/1/David+Cassel/Fred+Austin</game>
  <event>/events/First+Saturday+Quads/West+Chester+Chess+Club/2000.04.01</event>
</results>

We can see now how the decision to use transparent URLs is helpful: we can present these results to a user, who will be able to determine something about them without having to load up the contents of the individual documents. But wait, there’s more! Let’s also set up search by path variable: we’ll let the user specify the parts they know about and use “any” for the remainder. We’ll only apply this style to game searches for now. I’ll also disallow wildcards, but you can probably see how that would be a useful extension. Let’s look at an example:

/games/First+Saturday+Quads/West+Chester+Chess+Club/any/David+Cassel/any

This URL will return a list of all games in which I played white in the West Chester Chess Club’s First Saturday Quads event, regardless of date or opponent. Of course, in a large database this could yield a lot of results, especially if you used “any” for a lot of fields. So let’s add two more fields in order to allow for paging: page number and page size.

/games/First+Saturday+Quads/West+Chester+Chess+Club/any/David+Cassel/any/1/10
<results>
  <game>/games/First+Saturday+Quads/West+Chester+Chess+Club/2000.04.01/1/Joe+Abner/David+Cassel</game>
  <game>/games/First+Saturday+Quads/West+Chester+Chess+Club/2003.02.01/1/Joe+Demetrick/David+Cassel</game>
  <game>/games/First+Saturday+Quads/West+Chester+Chess+Club/2000.04.01/1/David+Cassel/Charles+Jay</game>
  <game>/games/First+Saturday+Quads/West+Chester+Chess+Club/2000.04.01/1/David+Cassel/Fred+Austin</game>
  <!-- six more <game/> results, then: -->
  <page>1</page>
  <link rel="next">/games/First+Saturday+Quads/West+Chester+Chess+Club/any/David+Cassel/any/2/10</link>
</results>

Now we’re asking for the same set of results as before, but we want the first page of 10 results. The <page/> element toward the bottom identifies the page and the <link/> element shows how to get to the next page. After the first page, the service would also provide a <link rel="prev"/> element. This is part of being RESTful: a representation of a resource provides links to other things.
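To make the “any” matching and the paging links concrete, here is a small Python sketch. The matches and page_of helpers and the sample game URLs are invented for the illustration. It also assumes the search pattern supplies one path segment per game URL field, which is an assumption of the sketch rather than something settled above.

```python
def matches(pattern, url):
    """True if every path segment of `pattern` equals the one in `url` or is "any"."""
    wanted, actual = pattern.split("/"), url.split("/")
    return len(wanted) == len(actual) and all(
        w in ("any", a) for w, a in zip(wanted, actual))


def page_of(results, base, page, size):
    """Return one page of `results` plus the prev/next links built on `base`."""
    start = (page - 1) * size
    links = {}
    if page > 1:
        links["prev"] = f"{base}/{page - 1}/{size}"
    if start + size < len(results):
        links["next"] = f"{base}/{page + 1}/{size}"
    return {"games": results[start:start + size], "page": page, "links": links}


# Hypothetical data: 25 games with one player as white, plus one that should not match.
all_games = [f"/games/Club+Open/Town+Hall/2010.01.{day:02d}/1/David+Cassel/Opponent+{day}"
             for day in range(1, 26)]
all_games.append("/games/Club+Open/Town+Hall/2010.02.01/1/Someone+Else/David+Cassel")

pattern = "/games/Club+Open/Town+Hall/any/any/David+Cassel/any"
hits = [url for url in all_games if matches(pattern, url)]

first = page_of(hits, pattern, 1, 10)
last = page_of(hits, pattern, 3, 10)
print(len(first["games"]), first["links"])   # a full page with only a "next" link
print(len(last["games"]), last["links"])     # the remainder with only a "prev" link
```

The first page carries only a next link and the last page only a prev link, which is the behavior described above.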

At this point, I’ve walked through the first seven steps of the process. The remaining steps are to consider the typical course of events, and to consider error conditions. I’ll pick that up in the next post. After that, it will be time to start whipping up some code!

A RESTful chess service: part 3

Tuesday, March 30th, 2010

In part 2 of this series, I defined the data set and resources for my service. I’ve been following the procedure listed in RESTful Web Services for how to lay out the service. Here are those steps again:

  1. Figure out the data set
  2. Split the data set into resources

  For each resource:

  3. Name the resources with URIs
  4. Expose a subset of the uniform interface
  5. Design the representation(s) accepted from the client
  6. Design the representation(s) served to the client
  7. Integrate this resource into existing resources, using hypermedia links and forms
  8. Consider the typical course of events
  9. Consider error conditions

Naming the Resources

I’ve finished the first two steps, and in doing so I identified Events, Players, Games, and search as resources that I want to expose through the service. Let’s take a look at the Games resource first. I need to decide how I will build the URI for a game. Let’s take another look at the PGN headers for my sample game:

[Event "First Saturday Quads"]
[Site "West Chester Chess Club"]
[Date "2003.02.01"]
[Round "1"]
[White "Joe Demetrick"]
[Black "David Cassel"]
[Result "1/2-1/2"]
[WhiteElo "1337"]
[ECO "D13b"]

Of these fields, I believe you would need to specify Event, Site, Date, Round, White, and Black to be sure you’ve uniquely identified a game. This would cover cases where two players are playing a series of matches, with multiple games in the same day (I think this would be an unusual case, but it’s best to be sure). I can picture a couple different ways to identify this game with a URI. A very transparent way to do it would be to include each piece needed for uniqueness as a path variable:

/games/First+Saturday+Quads/West+Chester+Chess+Club/2003.02.01/1/Joe+Demetrick/David+Cassel

Another approach, which is a bit more compact, is to take the hash of the needed values and use that:

/games/372fa3184dc56e2910cc91195baccb4b

If I could count on each game having a game id field, I could just use that, of course, but that does not appear to be a widely used field, so let’s think about the two options I’ve laid out. Clearly, the first is more informative to a human reader. That’s handy, but there is something I’m not wild about with this scheme: slashes typically imply a hierarchy, and here no such hierarchy really exists. The authors use other punctuation to deal with cases where there isn’t actually a hierarchy. For instance, when they establish URIs for maps, they separate the latitude and longitude values with commas instead of slashes.

On the other hand, the hash certainly doesn’t imply a hierarchy. In fact, it’s completely impenetrable: all you know from looking at the URI is that it identifies one particular game. Let’s think about how this service might be used. Picture doing a search and getting back a list of games. If you’re building an application that will present this list to the user, you won’t want to just give a list of hashes. You’ll want your user to have some way to decide what links to click. Using the first approach gives a lot of information about the game. With the second approach, by contrast, a system would need to retrieve each of the individual games to get the basic metadata before it could present a list. Because it will be more helpful for building applications around the service, I’m going to go with the more descriptive URI.
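For comparison, here is how the two naming schemes might be computed, sketched in Python. The function names are invented for the example. MD5 over the newline-joined values is my assumption, since I haven't said which hash or input format I'd use, so the digest will not match the sample hash shown earlier.

```python
import hashlib
from urllib.parse import quote_plus

KEY_FIELDS = ("Event", "Site", "Date", "Round", "White", "Black")

headers = {"Event": "First Saturday Quads", "Site": "West Chester Chess Club",
           "Date": "2003.02.01", "Round": "1",
           "White": "Joe Demetrick", "Black": "David Cassel"}


def transparent_uri(headers):
    """One URL-encoded path segment per identifying field."""
    return "/games/" + "/".join(quote_plus(headers[f]) for f in KEY_FIELDS)


def hashed_uri(headers):
    """A fixed-length digest of the same fields (MD5 is an assumption of this sketch)."""
    key = "\n".join(headers[f] for f in KEY_FIELDS)
    return "/games/" + hashlib.md5(key.encode("utf-8")).hexdigest()


print(transparent_uri(headers))
print(hashed_uri(headers))
```

Both forms identify the same game, but only the first tells a reader anything without another request.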

Exposing a Subset of the Uniform Interface

The Uniform Interface refers to the set of HTTP methods. The most commonly used are GET, POST, PUT, and DELETE. The protocol also supports HEAD, OPTIONS, TRACE, and CONNECT, but my service will not support any of these. In a nutshell, GET is used to retrieve a resource’s representation, POST and PUT are used to create and update resources, and DELETE removes a resource. For the use cases I’ve described, there is only one way data changes through my service: a tournament director uploads a game. The way I will support this is to let the TD POST a game to /games/. Doing so will return the URI of the game. I will also allow a TD to PUT to a particular game URI, replacing the previous resource, as a way of correcting mistakes, as well as DELETE to simply remove a game.

I want to let users retrieve games, of course, or this service wouldn’t be all that useful. As such, I’ll support GET on a game to return a representation of that game. Here, I’ll throw in a little wrinkle: I want to offer the PGN version of the game, as that’s the standard representation, but I also want to offer an XML version, to be machine friendly. One of the hallmarks of a RESTful service is links between various resources. PGN doesn’t typically have links, but my XML representation will. With this in mind, I’ll make one change to the URI used to retrieve a game: I’ll add an extension, which must be either .xml or .pgn.

Designing the Representation Accepted from the Client

Although my service will offer two representations for a game, I’ll only accept one from the client: the PGN standard. My service will expect to find the fields used to construct the URI; any other fields will be stored too.

Designing the Representations Served to the Client

The PGN representation needs no design work: when requested, a PGN representation will follow the standard format. The XML representation will take a little bit of thought, however.

<game>
  <headers>
    <Event>First Saturday Quads</Event>
    <Site>West Chester Chess Club</Site>
    <Date>2003-02-01</Date>
    <Round>1</Round>
    <White>Joe Demetrick</White>
    <Black>David Cassel</Black>
    <Result>1/2-1/2</Result>
    <WhiteELO>1337</WhiteELO>
    <ECO>D13b</ECO>
  </headers>
  <moves>
    <move num="1" color="w">d4</move>
    <move num="1" color="b">d5</move>
    <move num="2" color="w">c4</move>
    <move num="2" color="b">c6</move>
    ...
  </moves>
</game>

So far, this looks like an XMLized version of the PGN, with the one exception that I changed the date from 2003.02.01 to 2003-02-01. This simple change makes the date a valid XML Schema date (xs:date). I’ll make a couple more changes in the next section.
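As a rough idea of what that translation involves, here is a Python sketch that turns PGN text into the XML shape shown above. It is a simplification, not the service's actual converter: pgn_to_xml is a name invented for the example, and it ignores PGN features such as comments, variations, and dates with unknown parts.

```python
import re
import xml.etree.ElementTree as ET

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def pgn_to_xml(pgn):
    game = ET.Element("game")
    headers = ET.SubElement(game, "headers")
    for tag, value in re.findall(r'\[(\w+) "([^"]*)"\]', pgn):
        if tag == "Date":
            value = value.replace(".", "-")   # 2003.02.01 becomes 2003-02-01
        ET.SubElement(headers, tag).text = value

    moves = ET.SubElement(game, "moves")
    movetext = re.sub(r"\[[^\]]*\]", " ", pgn)   # drop the header lines
    num, color = 0, "w"
    for token in movetext.split():
        if token in RESULTS:                  # the trailing result is not a move
            continue
        numbered = re.match(r"(\d+)\.(.*)", token)
        if numbered:                          # "2.c4" starts move 2 with White
            num, color, token = int(numbered.group(1)), "w", numbered.group(2)
            if not token:                     # "2." written apart from its move
                continue
        ET.SubElement(moves, "move", num=str(num), color=color).text = token
        color = "b"
    return game


# Headers and opening moves only, taken from the sample game.
sample = "\n".join([
    '[Event "First Saturday Quads"]',
    '[Date "2003.02.01"]',
    '[Result "1/2-1/2"]',
    "1.d4 d5 2.c4 c6 3.Nf3 Nf6 1/2-1/2",
])
game = pgn_to_xml(sample)
print(ET.tostring(game, encoding="unicode"))
```

Each move element carries the move number and the side to move, so White's and Black's halves of move 1 share num="1" and the next pair gets num="2".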

One last comment before we move on, however. I took a quick look for XML representations of chess games. It seems none have caught on, simply because PGN has been so successful at capturing the information. So why am I bothering? Two reasons: 1) to provide the links that a good RESTful service needs to connect the resources, and 2) because for this exercise, I want to show multiple representations.

Integrate the Resource into Existing Resources

One of the key elements of a RESTful service is links to guide a consuming application from resource to resource. PGN doesn’t provide a way to do that, but our XML representation can. Let’s add a few changes:

<game>
  <link rel="alternate" type="application/x-chess-pgn"
    href="/games/First+Saturday+Quads/West+Chester+Chess+Club/2003.02.01/1/Joe+Demetrick/David+Cassel.pgn"/>
  <headers>
    <Event>
      <name>First Saturday Quads</name>
      <link uri="/events/First+Saturday+Quads"/>
    </Event>
    <Site>West Chester Chess Club</Site>
    <Date>2003-02-01</Date>
    <Round>1</Round>
    <White>
      <name>Joe Demetrick</name>
      <link uri="/players/Joe+Demetrick"/>
    </White>
    <Black>
      <name>David Cassel</name>
      <link uri="/players/David+Cassel"/>
    </Black>
    <Result>1/2-1/2</Result>
    <WhiteELO>1337</WhiteELO>
    <ECO>D13b</ECO>
  </headers>
  <moves>
    <move num="1" color="w">d4</move>
    <move num="1" color="b">d5</move>
    <move num="2" color="w">c4</move>
    <move num="2" color="b">c6</move>
    ...
  </moves>
</game>

For this step, I needed to jump ahead a bit and define the URIs for events and players. In most organizations, players would likely be given a unique id, which I would certainly want to use in the players URIs to avoid name collisions. Since PGN does typically use names, though, I’m going to stick with that for this exercise.

This post is getting a bit long, so I’m going to wrap this one up. In the next post, I’ll take a look at the representations for events and players as well as looking at search.

A RESTful chess service: part 2

Sunday, February 21st, 2010

In part 1 of this series, I laid out my goals for a RESTful chess service, based on the RESTful Web Services book from the O’Reilly series. The authors present a procedure for designing services, the first two steps of which are:

  1. Figure out the data set
  2. Split the data set into resources

The use cases I’m looking to support are chess tournament directors (TD) reporting games during events and people searching for events, players, and games. That identifies my three major parts of the data set right there. There are other elements I could use. Chess has a standard notation for describing a game called PGN. Here’s an example from a small tournament I played in years ago:

[Event "First Saturday Quads"]
[Site "West Chester Chess Club"]
[Date "2003.02.01"]
[Round "1"]
[White "Joe Demetrick"]
[Black "David Cassel"]
[Result "1/2-1/2"]
[WhiteElo "1337"]
[ECO "D13b"]
1.d4 d5 2.c4 c6 3.Nf3 Nf6 4.cxd5 cxd5 5.Bf4 Bf5 6.e3 e6 7.Bb5+ Nbd7 8.O-O Qb6 9.Nc3 a6 10.Na4 Qxb5 11.Rc1 Be7 12.Ne5 Nxe5 13.Bxe5 O-O 14.b3 Rac8 15.Nb2 Ne4 16.g4 Nc3 17.Rxc3 Rxc3 18.gxf5 Rfc8 19.Qg4 f6 20.Nd1 Rc2 21.Bf4 exf5 22.Qxf5 Qc6 23.Kh1 Rxa2 24.Rg1 g6 25.Bh6 Bf8 26.Nc3 Qxc3 27.Qe6+ Kh8 28.Qxf6+ Kg8 29.Qe6+ 1/2-1/2

The format is pretty straightforward, I think. Metadata are at the top, followed by a list of moves in algebraic notation. If the move notation doesn’t make sense to you, don’t worry about it — that’s not important for this exercise.

This is a game I played in a small USCF event. You’ll notice the name of the event is “First Saturday Quads”. As you might guess, that’s not a unique name: other sites can have an event with the same name. Really it’s the combination of Event, Site, and Date that uniquely identifies a particular tournament, and the date is a little dicey, since an event may span multiple days. This won’t be a problem for a TD uploading a game, as all the information is there. For search and browsing, I won’t have any explicit representation of a Tournament, although the service will allow a user to find all the games in a tournament.

The authors tell us that a resource is “anything interesting enough to be the target of a hypertext link.” They also list three kinds of resources that web services commonly expose.

1) Predefined, one-off resources

The given examples are top-level directories of available resources, or the home page of a web site. For my service, I will have a base resource that takes a collections approach, simply returning a list of links to the other resources available from the service.

2) A resource for every object exposed through the service

The objects I’ve decided to expose are events, players and games. Each event, player and game will be a unique resource available through the service. Events will be collections of games. We actually won’t be storing much information about players, only what is included in the game reports, so player resources will similarly be collections of games.

3) Resources representing results of algorithms

This service will define only one algorithm: search. The results will be collections of objects that match the search criteria. There will really be three ways to search, one for returning each type of object.

In the next part of this series, I’ll work through the remaining steps for the Game resource.

A RESTful chess service: Part 1

Monday, February 15th, 2010

I recently got a copy of RESTful Web Services, a book in the O’Reilly series that I read some time ago. I first read it when I was introduced to REST as an architectural style. It helped me get my head around a number of the concepts.

I decided that I want to refresh my memory about some aspects of designing RESTfully and to explore how to build a RESTful service using MarkLogic Server, so this post is actually the introduction to a miniseries. The book has a suggested method for going through the design steps, so I’ll walk through those steps and the implementation over a few posts. I invite you to read along and see how the process unfolds.

First things first: I need to decide what need I’m trying to satisfy. Being a chess player, I decided to implement a service that would be used by a chess organization such as the US Chess Federation, FIDE or the Free Internet Chess Server. These groups hold lots of tournaments, some of which are major international events while others are small events held by community chess clubs. In all cases, the tournament director needs to report the results to the organization, which will record the results and update players’ ratings.

Besides reporting, my service will support searching. The service should provide representations of players, events, and individual games. We’ll return player and event information as XML, but we’ll give users the choice between XML and PGN, the standard for chess notation, for individual games. We’ll need some search capability, in addition to browsing.

RESTful Web Services gives a 9-step Generic ROA (Resource Oriented Architecture) Procedure:

  1. Figure out the data set
  2. Split the data set into resources

For each resource:

  3. Name the resources with URIs
  4. Expose a subset of the uniform interface
  5. Design the representation(s) accepted from the client
  6. Design the representation(s) served to the client
  7. Integrate this resource into existing resources, using hypermedia links and forms
  8. Consider the typical course of events
  9. Consider error conditions

In this post, I’ve introduced the series and laid out roughly what I want to do. Next time I’ll tackle the first two steps in the process. See you then!

Why is it flickering?

Sunday, January 31st, 2010

A friend of mine is learning JavaScript and jQuery and I recently helped her out with a problem she was having. She wanted to mouse over an image and have another <div/> appear, giving information about the image. Okay, nothing real strange there. But in her initial implementation she would mouse over the trigger image, the popup would appear, but then the popup would flicker, disappearing and reappearing repeatedly. I took what she started with and simplified it down to this to illustrate the problem:

<html>
  <head>
    <link rel="stylesheet" type="text/css" href="styles.css">
    <style type="text/css">
      #trigger {
        background-color: blue;
        display: block;
        height: 50px;
        width: 50px;
      }

      #popup {
        background-color: yellow;
        display: none;
        height: 200px;
        left: 25px;
        position: absolute;
        top: 25px;
        width: 100px;
      }
    </style>
    <script src="http://code.jquery.com/jquery-latest.js"></script>
    <script>
      $(document).ready(function(){
        $('#trigger').mouseover(function(){
          $('#popup').show('fast');
        });

        $('#trigger').mouseleave(function(){
          $('#popup').hide('fast');
        });
      });
    </script>
  </head>
  <body>
    <div id="trigger"></div>
    <div id="popup">Here is a bunch of text that will appear in the window
      that pops up. Lorem Ipsum and all that jazz.</div>
  </body>
</html>

If you view the page and point to the blue square, a yellow rectangle with text pops up. Now move the mouse from the blue area to the yellow. As long as you’re in the overlap, the yellow area will repeatedly disappear and reappear.

What we wanted was that when you mouse over the blue, the popup would appear, and it would stay there until the mouse was over neither the blue trigger area nor the yellow popup. So that meant we needed to work with jQuery’s mouseleave event to hide the popup when leaving the target area. Here’s where it got tricky.

The area in red in the image to the left reflects the overlap of the two divs. Let’s think about what happens when we move the mouse from the blue trigger area to the red overlap. When the mouse crosses the line into the red, it is now over the popup, which sits on top of the trigger, so that triggers a mouseleave event from the trigger area. When that happens, we hide the popup. With the popup gone, the mouse is suddenly back in the blue div again. Mouseover! That shows the popup again. Mouseleave! And so on.

So, what’s the solution? It was pretty straightforward in this case, once we understood what was happening. The real problem is that the trigger and popup divs are separate. In fact, the popup div should be a part of the trigger div, and then suddenly everything is easy. Why does that make a difference?

Take a look at the screenshot to the right. Here, the red area outlines the boundaries of the trigger div. Because the popup is now a child of the trigger, the mouse can roam around the yellow area without having left the trigger div, so we keep the popup showing.

The difference in the code is very simple; we just moved one div inside another, like so:

  <body>
    <div id="trigger">
      <div id="popup">Here is a bunch of text that will appear in the window
        that pops up. Lorem Ipsum and all that jazz.</div>
    </div>
  </body>

Of course, that led to some changes in the positioning of the popup, but that’s just a matter of working out new values for your CSS.
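
If you want to convince yourself of the reasoning without opening a browser, here's a rough JavaScript model of the two layouts. It's a sketch, not how a browser actually dispatches events: simulate() and the rectangle constants are names I made up for illustration, the rectangles ignore the default body margin, show() and hide() are treated as instant, and the trigger counts as hovered whenever the pointer is over it or, in the nested layout, over its visible child.

```javascript
// Hypothetical rectangles based on the sizes and offsets in the CSS above
// (the default body margin is ignored to keep the numbers simple).
const TRIGGER = { left: 0, top: 0, width: 50, height: 50 };
const POPUP = { left: 25, top: 25, width: 100, height: 200 };

// True when point p lies inside rect.
function contains(rect, p) {
  return p.x >= rect.left && p.x < rect.left + rect.width &&
         p.y >= rect.top && p.y < rect.top + rect.height;
}

// Replays a list of pointer positions and returns the popup's visibility
// after each one. `nested` selects the layout: false for sibling divs,
// true for the popup as a child of the trigger.
function simulate(points, nested) {
  let popupVisible = false;
  let overTrigger = false;
  const history = [];
  for (const p of points) {
    const onPopup = popupVisible && contains(POPUP, p);
    // A sibling popup covers the trigger; a child popup counts as part of it.
    const nowOver = nested
      ? contains(TRIGGER, p) || onPopup
      : contains(TRIGGER, p) && !onPopup;
    if (nowOver && !overTrigger) popupVisible = true;  // mouseover handler: show
    if (!nowOver && overTrigger) popupVisible = false; // mouseleave handler: hide
    overTrigger = nowOver;
    history.push(popupVisible);
  }
  return history;
}

// Enter the blue square, then sit still in the overlap.
const path = [{ x: 10, y: 10 }, { x: 30, y: 30 }, { x: 30, y: 30 }, { x: 30, y: 30 }];
console.log(simulate(path, false)); // [ true, false, true, false ]
console.log(simulate(path, true));  // [ true, true, true, true ]
```

With sibling divs the popup alternates between shown and hidden for as long as the pointer sits in the overlap. With the nested layout it stays visible until the pointer is outside both boxes.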

Moral of the story: when you want a popup, make it part of the trigger. That will let you have one region for the mouse to roam around in, making the event management nice and easy!

Passing a sequence to xdmp:eval()

Sunday, January 3rd, 2010

When using MarkLogic Server, you’ll sometimes need to execute a command against a different database than the one associated with your application. Enter the xdmp:eval() function. eval() lets you execute any code you can put into a string. Among its several options is <database/>, allowing you to execute a command against a different database.

As an example, suppose you want to create a new role. The sec:create-role() function must be executed against the security database. We can do this with the eval function:

let $cmd := fn:concat(
    'import module namespace sec="http://marklogic.com/xdmp/security" at
      "/MarkLogic/security.xqy";',
    'sec:create-role("my-role", "This is my new role", (), (), ())'
  )
return xdmp:eval($cmd, (),
  <options xmlns="xdmp:eval">
    <database>{xdmp:security-database()}</database>
  </options>)

This will create a simple (albeit not very interesting) role.

But let’s suppose that we want to do something a little more interesting. Suppose we’re writing a script that will initialize the database for an application, and that as part of that initialization, we want to set up some roles. For each role, we want to pass in a sequence of default collections. There are a few ways to do it. First, let’s review xdmp:eval()’s mechanism for passing in variable information:

let $collections := ("http://marklogic.com/abc", "http://marklogic.com/def")
let $cmd := fn:concat(
    'import module namespace sec="http://marklogic.com/xdmp/security" at
       "/MarkLogic/security.xqy";
     declare variable $role external;
     declare variable $desc external;
     sec:create-role($role, $desc, (), (), ())'
)
return xdmp:eval($cmd, (xs:QName("role"), "my-role", xs:QName("desc"), "This is my new role"),
  <options xmlns="xdmp:eval">
    <database>{xdmp:security-database()}</database>
  </options>
)

Here we pass the role and description in to the eval() expression. We could, of course, just build the string with the values embedded. Why not do that? A few reasons:

  1. Reuse: if you’re going to execute this command a few times, it’s better to build the string once, then pass in the parts that change;
  2. Security: in some cases, embedding values directly into the command string could expose you to the XQuery injection problem, analogous to SQL injection. The usual case to worry about is when you use input from the user. IBM has a write-up of XPath injection. Disclaimer: I know that I need to read up more on XQuery injection to understand when it can bite you, and how the variable-passing technique protects you. Maybe a future post.
  3. Readability: okay, this is subjective, but I think your code will be easier to read if the command string just declares the external variables and isn’t broken up with sticking values in. This may depend on whether you’re putting in a scalar or a sequence. In our example, we’ll do a sequence.

As you can see, the 2nd parameter to eval() is a sequence of variable names and values. Here’s the trick: I want to get my sequence of collection strings into the mix. But if I put a sequence into a sequence, it will get flattened. So I can’t simply add xs:QName("collections"), $collections, because eval() will complain that the 2nd collection in the list isn’t a QName. Here’s how we can handle that:

let $collections := ("http://marklogic.com/abc", "http://marklogic.com/def")
let $cmd := fn:concat(
    'import module namespace sec="http://marklogic.com/xdmp/security" at
       "/MarkLogic/security.xqy";
     declare variable $role external;
     declare variable $desc external;
     declare variable $collections external;
     sec:create-role($role, $desc, (), (), fn:tokenize($collections, "~"))'
)
return xdmp:eval($cmd,
  (xs:QName("role"), "my-role",
   xs:QName("desc"), "This is my new role",
   xs:QName("collections"), fn:string-join($collections, "~")),
  <options xmlns="xdmp:eval">
    <database>{xdmp:security-database()}</database>
  </options>
)

In order to pass a sequence to eval(), we have to make it something other than a sequence. What I did here was join the collection names with a character that I’m sure isn’t being used in the collection names, then split them again inside the eval().

Another technique that was suggested to me was to build some XML around the sequence values, then use XPath to break them down within the eval(), but my testing found strings to be both simpler and more concise to write and faster to run.