Directory Creation setting

Author: Dave Cassel  |  Category: Software Development

The “directory creation” database setting is one of the first things I change for a new database. The default is “automatic”, but I switch it to “manual”, as I was taught by more experienced developers when I joined MarkLogic. The reason given was performance: use manual so that MarkLogic doesn’t bother creating directories. I’ve been doing this for a couple of years without ever exploring what the different settings really mean. I’ve never even used the third option, “manual-enforced”.

The MarkLogic Administrator’s Guide has definitions for the directory creation settings (more details in section 12.1.4.4, Document and Directory Settings):

  • automatic: “directories are automatically created based on the URI of a document”
  • manual-enforced: “requires that the directory hierarchy corresponding to the URI exists before creating a document”
  • manual: “directories are not automatically created, but documents can still be created without corresponding directories”
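One way to script the change is through the Admin API. This is a minimal sketch rather than a recommended deployment step; the database name "my-database" is a placeholder I made up, so substitute your own.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "my-database" is a hypothetical name; use your own database. :)
let $config := admin:get-configuration()
let $db-id := xdmp:database("my-database")
(: The three values discussed here: "automatic", "manual", "manual-enforced". :)
let $config := admin:database-set-directory-creation($config, $db-id, "manual")
return admin:save-configuration($config)
```

The user running this needs sufficient privileges to use the Admin API.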

Directories in MarkLogic are pretty analogous to directories on a file system: they are a hierarchical system for organizing documents, which exist inside the directories. When a document is inserted with a URI like “/content/binary/admin.pdf”, there are three directories implied by that URI: the root “/”, then “content/” inside it, then “binary/” inside that. Their full directory URIs are “/”, “/content/”, and “/content/binary/”.

Experimenting

Let’s see what happens when we insert a document with a multilevel URI, using each of the different settings. Each of these tests was run on a clear database with default settings.

manual:
xdmp:document-insert("/a/b/c.xml", <test/>)
2 fragments: c.xml, c.xml properties

manual-enforced:
xdmp:document-insert("/a/b/c.xml", <test/>)
error

automatic:
xdmp:document-insert("/a/b/c.xml", <test/>)
5 fragments: /, a/, b/, c.xml, c.xml properties

We see that the automatic setting creates a total of 5 fragments: three for the directories, one for the document itself, and one for the properties fragment on the document. Under manual, the directories are skipped, and only the document and its properties fragment are created. Manual-enforced throws an error. Under that setting, we would need to call xdmp:directory-create() for each of the directories we need. All three perform as advertised in the Admin Guide.
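For the manual-enforced case, the sequence would look something like the sketch below. I have put each call in its own statement (separated by semicolons) on the assumption that a parent directory should be committed before its child is created; treat that as a cautious guess rather than a documented requirement.

```xquery
xquery version "1.0-ml";
(: Create the hierarchy top-down, one transaction per directory. :)
xdmp:directory-create("/");

xquery version "1.0-ml";
xdmp:directory-create("/a/");

xquery version "1.0-ml";
xdmp:directory-create("/a/b/");

xquery version "1.0-ml";
(: With the full hierarchy in place, the insert should be allowed. :)
xdmp:document-insert("/a/b/c.xml", <test/>)
```

Note that xdmp:directory-create() raises an error if the directory already exists, so this is only safe to run as written on a clear database.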

Interesting…

Here’s an interesting thing I noticed: the docs tell us that with “manual”, directories are not automatically created. That tells me that we don’t have the directories “/”, “a/”, and “b/”. The fragment counts in the database correspond to this. How else can we test for the existence of a directory? One way is to ask MarkLogic for the contents of the directory:

xdmp:directory("/", "infinity")

Running that query retrieves my test document, as does running the same query with “/a/” or “/a/b/”. (I get the same results running a search based on cts:directory-query.) That tells me that even with directory creation set to manual and skipping the actual manual construction of directories, the directories still have some kind of existence in the sense that they can be used for search. But we know that directories aren’t being created, so we must be missing out on something by not having them.
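To make the distinction concrete, here is a rough sketch that compares the two kinds of "existence" for "/a/". The first two expressions ask what is found under the directory URI; the third looks for an actual directory fragment, assuming (as I understand it) that a directory is stored as a properties fragment containing a prop:directory element.

```xquery
xquery version "1.0-ml";
(
  (: Documents found by URI prefix, at any depth. :)
  fn:count(xdmp:directory("/a/", "infinity")),

  (: The same question asked as a search. :)
  fn:count(cts:search(fn:doc(), cts:directory-query("/a/", "infinity"))),

  (: Is there a real directory fragment at this URI? :)
  fn:exists(xdmp:document-properties("/a/")//prop:directory)
)
```

Under manual, I would expect the first two to find the test document while the third reports that no directory exists.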

Tradeoffs

I mentioned that the reason for shifting to manual is performance. Automatically creating unnecessary directories costs you in a couple of ways. First, there is the directory creation itself, which slightly slows document ingest. Second, each directory fragment adds to the number of items the database needs to maintain, including copying those fragments while merging stands.

So automatic directory creation has an impact. Does it have a benefit? Yes — if you are using some specific features.

WebDAV

As the Admin Guide tells us, WebDAV requires directories to exist in order for a client to see database content — just having slashes in your URIs doesn’t cut it here. In fact, when I specified “/” as the root directory while trying to create a WebDAV app server, I got an error:

The root directory hierarchy for this server is missing the following directory: /

Manual creation didn’t make the “/” directory, so WebDAV doesn’t see it. Next, I wondered what I’d see by creating the directories manually:

xdmp:directory-create("/")

After creating just the root directory, I still couldn’t see anything. I needed to create “/a/” and “/a/b/” before I could see anything with WebDAV. So if you intend to create a WebDAV server, you’ll either need to use “automatic” directory creation or manually create every directory you wish to expose.
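If you go the manual route, it helps to know exactly which directories a given document URI implies. The helper below is a small sketch of that; local:ancestor-dirs is a name I made up for illustration, not a MarkLogic function.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: list every directory URI implied by a document URI,
   from the root down. :)
declare function local:ancestor-dirs($uri as xs:string) as xs:string*
{
  (: Split on "/" and drop the final step, which is the document name. :)
  let $parts := fn:tokenize($uri, "/")[position() lt last()]
  for $i in 1 to fn:count($parts)
  return fn:concat(fn:string-join($parts[position() le $i], "/"), "/")
};

local:ancestor-dirs("/a/b/c.xml")
(: returns ("/", "/a/", "/a/b/") :)
```

Each of those URIs would then be passed to xdmp:directory-create(), parents first.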

Modules Databases

The Admin Guide also tells us to use “automatic” for modules databases. I looked into this more, running some experiments and asking around on a MarkLogic-internal mailing list. The consensus is that the only reason you would need automatic directory creation for modules databases comes back to the WebDAV use case, which is more common for a modules database. Some people will set up WebDAV and use that to directly edit the source code in the modules db.

I ran my own tests using a Roxy-based application. I ran bootstrap, then changed the modules database’s directory creation setting to manual, then deployed the code. It worked with no problems. I also set up a non-admin user just to make sure there wasn’t something special about admin that made things work. Still no problems. My conclusion, between my experiments and following up with others, is that strictly speaking there is no requirement that modules databases use automatic directory creation. If you want to allow WebDAV source code editing, you’ll want automatic, but that’s really the same use case as the above.

(Thanks to Dave, Norm, and David for their responses.)


