Loading JSON into MarkLogic 7
Author: Dave Cassel | Category: Software Development
This post shows how to ingest JSON into MarkLogic 7 using mlcp. Unlike many posts, this one is specific to MarkLogic 7.
Since the release of MarkLogic 6, MarkLogic Content Pump (mlcp) has been the supported tool for importing, exporting, and copying content. One feature that’s missing from it is the ability to load JSON files without having them stored as text files. To expand on that, let me point out that MarkLogic 7 is part of a transition in how MarkLogic handles JSON files.
|MarkLogic Version|Handles JSON as|
|5|text|
|6, 7|quietly converted to XML|
|8|native type (planned)|
In MarkLogic 5, JSON documents are stored as text. As with any text document, that lets you do word searches, but you’re not able to use the structure.
In MarkLogic 6 and 7, you can load JSON using the REST API and MarkLogic quietly converts it to an XML format. When you request the document back, MarkLogic quietly converts it back to JSON. The reason for this is that handling JSON was a goal for MarkLogic 6, but it’s done at the REST API level — internally, actual JSON would just be text, preventing us from building indexes and otherwise working with the structure. By converting it to XML internally, we can do much more with it.
In MarkLogic 8, JSON is planned to be a native type. I tested loading JSON with mlcp today on an ML8 development build, and mlcp loads JSON as native content, meaning that the structure is accessible without having to do anything special.
Ingest via REST API
We can ingest JSON documents and have them transparently converted by POSTing them to the REST API. Here’s how to load a directory of JSON documents.
for f in ~/data/json-data/*.json; do
  curl --anyauth --user admin:admin -X POST -d@$f -i \
    -H "Content-type: application/json" \
    'http://localhost:8040/v1/documents?extension=json&directory=/content/';
done
That works great, but for larger amounts of data, you lose out on mlcp’s ability to parallelize the workload.
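To check the round trip described earlier (JSON in, XML internally, JSON back out), you can request one of the loaded documents again. The following is a minimal sketch, not a definitive recipe: it reuses the host, port, and credentials from the loop above, and the helper names doc_url and fetch_json are hypothetical. I'm assuming the server-assigned document URI is the one reported in the Location header of the POST response, which the -i flag prints.

```shell
#!/bin/sh
# Hypothetical helper: build the /v1/documents URL for a document URI.
# Host and port are taken from the POST example above; adjust for your setup.
doc_url() {
  printf 'http://localhost:8040/v1/documents?uri=%s' "$1"
}

# Hypothetical helper: fetch a document back through the REST API.
# Needs a running REST API instance; credentials match the POST example.
fetch_json() {
  curl --anyauth --user admin:admin -s "$(doc_url "$1")"
}

# Example call (the URI is a placeholder; use one from a Location header):
# fetch_json /content/example.json
```

If the conversion works as described, the response body should be the JSON you posted rather than the XML representation stored internally.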
Ingest via MLCP
And here’s the call to have mlcp use it: