MarkLogic Content Pump

Author: Dave Cassel  |  Category: Software Development

This morning, MarkLogic announced version 6 of the MarkLogic database. Lots of great stuff in this release (plenty of post material!). Let’s start with something simple, but very useful: copying data from one MarkLogic database to another with mlcp (MarkLogic Content Pump).

Migrating Data

In previous releases, there were several ways to migrate data from one database to another. I typically used XQSync, which pulls data from one database into a set of zip files and then, in a second command, pushes the data from those zip files to another database. That’s a good tool, but it does require storing the archived version of the data on the machine where you run the command.

Migrating with MLCP

The new mlcp has a COPY command, which pulls content from one database and pushes it directly to another, using XCC servers:

$ bin/mlcp.sh COPY -mode local \
  -input_host host1 -input_port 8020 -input_username admin -input_password admin \
  -output_host host2 -output_port 8006 -output_username admin -output_password admin \
  -thread_count 8

As someone who has had to migrate a number of applications (and their data), I like the simplicity of this.

MLCP is a descendant of RecordLoader, XQSync, and other tools for moving content. One big difference is that mlcp is a supported MarkLogic product (while still being open source). Do you have suggestions for product management about what should be added to it?


9 Responses to “MarkLogic Content Pump”

  1. ankit kakkar Says:

    A couple of things to add here:

    1) XQSync can also be used to directly copy data from one server to another server (if source and destination platforms are different).

    2) While XQSync and RecordLoader are not supported tools, MarkLogic does support MLCP.

  2. David Says:

    I was wondering if there are any best practices, and maybe some performance tuning tips, for uploading a couple of million XML files?

  3. Dave Cassel Says:

    Good question, David. I haven’t used mlcp in enough situations to have figured out any best practices yet. When dealing with a large set of data, the knobs I’d play with are -batch_size, -thread_count and -transaction_size.

    If you figure out some best practices, please share!
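To make those three knobs concrete, here is a minimal sketch of an import command that sets them, written as a dry run. The host, port, credentials, and the /data/xml input directory are hypothetical placeholders, and the numeric values are only starting points to experiment with, not recommendations.

```shell
#!/usr/bin/env bash
# Sketch only: host, port, credentials, and input path are hypothetical placeholders.
MLCP="${MLCP:-bin/mlcp.sh}"

args=(
  import -mode local
  -host host2 -port 8006 -username admin -password admin
  -input_file_path /data/xml   # hypothetical directory of XML files
  -thread_count 8              # number of concurrent worker threads
  -batch_size 100              # documents sent per request
  -transaction_size 10         # requests grouped into one transaction
)

# Print the command by default; set RUN=1 to actually invoke mlcp.
if [ "${RUN:-0}" = "1" ]; then
  "$MLCP" "${args[@]}"
else
  echo "$MLCP ${args[*]}"
fi
```

Changing one value at a time and timing a load of the same sample set is probably the simplest way to see which knob matters for a given data set.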

  4. Software Development Says:

    MLCP is a good tool for migrating data from one database to another safely. Keep sharing the informative posts!

  5. Ravi Mariappan Says:

    MLCP should have an option to specify the log file location and logging level for each call, rather than a single config parameter.

  6. Dave Cassel Says:

    Ravi, MLCP uses log4j for logging, so you should be able to set up a FileAppender and tell it where you want the log file to appear. There are two aspects for which you can activate debug logging: mapreduce and contentpump. See http://docs.marklogic.com/guide/ingestion/content-pump#id_75840
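As a rough sketch of that setup, the script below writes a log4j properties file with a FileAppender and debug logging for both aspects. It assumes the log4j 1.x properties format and that mlcp reads log4j.properties from its conf directory; later mlcp releases may use a different logging configuration. CONF_DIR and LOG_FILE are hypothetical names chosen for this example.

```shell
#!/usr/bin/env bash
# Sketch only: assumes log4j 1.x properties syntax and an mlcp conf directory.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"    # hypothetical; point this at your mlcp conf directory
LOG_FILE="${LOG_FILE:-/tmp/mlcp.log}"   # hypothetical log file location

cat > "$CONF_DIR/log4j.properties" <<EOF
# Send everything at INFO and above to a file instead of the console.
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=$LOG_FILE
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n
# Debug logging for the two aspects mentioned above.
log4j.logger.com.marklogic.mapreduce=DEBUG
log4j.logger.com.marklogic.contentpump=DEBUG
EOF

echo "Wrote $CONF_DIR/log4j.properties"
```

Setting LOG_FILE to a different path before each run regenerates the file with a new location, which gets part of the way toward per-call logging.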

  7. Pratik Says:

    What are the limitations of mlcp?

  8. Dave Cassel Says:

    Pratik: it isn’t very good at making chili, but otherwise it’s pretty versatile.

    But seriously … while the initial release of MLCP struggled with some forms of CSV, it’s come a long way. It now handles XML, JSON, aggregate data, triples, and other stuff pretty well, while providing some good knobs to adjust for performance. It’s become my go-to tool for loading.

  9. Brajesh Says:

    Hi Dave,
    I am using RecordLoader to load data into MarkLogic on a Windows machine. However, when I run it, I get this error:
    ERROR: No such file or directory – java -cp

    Can you please help me out
