This morning, MarkLogic announced version 6 of the MarkLogic database. Lots of great stuff in this release (plenty of post material!). Let’s start with something simple, but very useful: copying data from one MarkLogic database to another with mlcp (MarkLogic Content Pump).
Migrating Data
In previous releases, there were a number of ways that you could choose to migrate data from one database to another. I typically used XQSync, which pulls data from one database into a bunch of zip files, and then in a second command, pushes that data in those zip files to another database. That’s a good tool, but it does require storing the archived version of the data on the machine where you run the command.
Migrating with MLCP
The new mlcp has a COPY command, which pulls content from one database and pushes it directly to another, using XCC servers:
$ bin/mlcp.sh COPY -mode local \
    -input_host host1 -input_port 8020 -input_username admin -input_password admin \
    -output_host host2 -output_port 8006 -output_username admin -output_password admin \
    -thread_count 8
As someone who has had to migrate a number of applications (and their data), I like the simplicity of this.
MLCP is a descendant of recordloader, xqsync, and other tools for moving content. One big difference is that mlcp is a supported MarkLogic product (while still being open source). Do you have suggestions for product management about what should be added to it?
Tags: data, marklogic, mlcp, new feature, tools
October 30th, 2012 at 3:20 am
Couple of things to add here:
1) XQSync can also be used to directly copy data from one server to another server (if source and destination platforms are different).
2) While XQSync and Recordloader are not supported tools, MarkLogic does support MLCP.
January 16th, 2013 at 3:52 pm
I was wondering if there are any best practices, and maybe some performance tuning tips, for loading a couple of million XML files?
January 17th, 2013 at 7:42 am
Good question, David. I haven’t used mlcp in enough situations to have figured out any best practices yet. When dealing with a large set of data, the knobs I’d play with are -batch_size, -thread_count and -transaction_size.
If you figure out some best practices, please share!
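As a starting point for experimenting, here is a rough sketch of an import command with those three knobs pulled out as variables. The host, port, credentials, input path, and the starting values are all placeholders I made up for illustration, not recommendations. The script only prints the command so you can review it before running anything.

```shell
#!/bin/sh
# Sketch: build an mlcp import command with the three tuning knobs exposed.
# Host, port, credentials, input path, and starting values are placeholders.
BATCH_SIZE="${BATCH_SIZE:-100}"              # documents sent per request
TRANSACTION_SIZE="${TRANSACTION_SIZE:-10}"   # requests grouped per transaction
THREAD_COUNT="${THREAD_COUNT:-8}"            # concurrent worker threads

build_import_cmd() {
  # echo joins its arguments with single spaces, giving one command line.
  echo "bin/mlcp.sh import -mode local" \
    "-host localhost -port 8006 -username admin -password admin" \
    "-input_file_path /data/xml" \
    "-batch_size $BATCH_SIZE" \
    "-transaction_size $TRANSACTION_SIZE" \
    "-thread_count $THREAD_COUNT"
}

# Print the command for review instead of executing it.
build_import_cmd
```

Changing one variable at a time (for example, THREAD_COUNT=16) and timing a load of the same sample set is probably the simplest way to see which knob matters for your data.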
April 26th, 2013 at 7:52 am
MLCP is a good tool for migrating data from one database to another safely. Keep sharing the informative posts!
March 13th, 2015 at 1:21 pm
MLCP should have an option to specify the log file location and logging level for each call, rather than a single config parameter.
March 13th, 2015 at 1:56 pm
Ravi, MLCP uses log4j for logging, so you should be able to set up a FileAppender and tell it where you want the log file to appear. There are two aspects for which you can activate debug logging: mapreduce and contentpump. See http://docs.marklogic.com/guide/ingestion/content-pump#id_75840
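Something like the following might work as a starting point. It is a sketch that writes a log4j 1.x properties file with a FileAppender and debug logging turned on for both loggers. The file paths and the appender name "file" are placeholders, and I am assuming mlcp picks up its log4j settings from a log4j.properties file in its conf directory, so check the guide linked above before relying on it.

```shell
#!/bin/sh
# Sketch: generate a log4j 1.x config that sends log output to a file.
# LOG_FILE and CONF_FILE are placeholder paths; adjust for your setup.
LOG_FILE="${LOG_FILE:-/tmp/mlcp.log}"
CONF_FILE="${CONF_FILE:-/tmp/mlcp-log4j.properties}"

cat > "$CONF_FILE" <<EOF
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=$LOG_FILE
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n
log4j.logger.com.marklogic.mapreduce=DEBUG
log4j.logger.com.marklogic.contentpump=DEBUG
EOF

echo "Wrote $CONF_FILE"
```

Generating a different file per run (with a different LOG_FILE) and swapping it into place before each call is one way to approximate per-call log locations.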
March 25th, 2015 at 3:16 pm
What are the limitations of mlcp?
March 25th, 2015 at 3:36 pm
Pratik: it isn’t very good at making chili, but otherwise it’s pretty versatile.
But seriously … while the initial release of MLCP struggled with some forms of CSV, it’s come a long way. It now handles XML, JSON, aggregate data, triples, and other stuff pretty well, while providing some good knobs to adjust for performance. It’s become my go-to tool for loading.
October 8th, 2015 at 11:02 am
Hi Dave,
I am using Recordloader to load data into MarkLogic on a Windows machine. However, when I run it, I get this error:
ERROR: No such file or directory – java -cp
Can you please help me out?