This is an article introducing a new email reading system called notmuch, written by Carl Worth with comments from me (and a few minor patches).

Abandon Fail Boat

Almost two months ago, when I updated my debian system to the latest and greatest bits, I happened to get a new version of evolution, 2.28. As has become the tradition with new versions of evolution, a few more things broke.

I've suffered through evolution 'upgrades' several times and had slowly reduced my usage of evolution features to try and keep it working. This time, I got stuck. The accumulated bugs in this mailer made it impossible for me to get my work done any more.

And, yes, it's a sad commentary on the Linux desktop that the most important feature for many people using Linux has no credible GUI application (yes, I've tried a lot of email applications; I have too much mail for them to cope).

Exploring Sup

Carl had given up on Evolution a few weeks before and was using sup. From his description, and from a brief bit of experimentation, I decided to give it a try. Sup has four main features:

  1. It is entirely search based. All messages are indexed by a 'real' indexing system, Xapian, which provides reasonable full text search for email.

  2. You can mark (automatically, or manually) messages with labels; the 'inbox' view just shows the results of a search for messages with the 'inbox' label.

  3. It never modifies the actual mail store. All state is stored inside the database in the form of labels.

  4. Most operations act on threads, not messages. Viewing a thread shows you the unread messages in the whole thread in a single page, making following the conversation easy.
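
To make the label idea concrete, here is a minimal Python sketch of a label-based store. It is not Sup's code: the LabelStore class and its method names are hypothetical, and plain dictionaries stand in for the Xapian index. It only illustrates that the 'inbox' view is a query over labels, and that archiving a thread changes the label table and never the mail files.

```python
from collections import defaultdict


class LabelStore:
    """Toy model: all mutable state lives in labels, not in the mail files."""

    def __init__(self):
        self.labels = defaultdict(set)   # message id -> set of labels
        self.thread_of = {}              # message id -> thread id

    def add_message(self, msg_id, thread_id, labels=()):
        self.thread_of[msg_id] = thread_id
        self.labels[msg_id].update(labels)

    def search(self, label):
        """Return the sorted thread ids containing a message with `label`."""
        return sorted({self.thread_of[m]
                       for m, ls in self.labels.items() if label in ls})

    def remove_label(self, thread_id, label):
        """Operate on a whole thread: drop `label` from each of its messages."""
        for msg_id, tid in self.thread_of.items():
            if tid == thread_id:
                self.labels[msg_id].discard(label)


store = LabelStore()
store.add_message("a@example", "t1", {"inbox", "unread"})
store.add_message("b@example", "t1", {"inbox"})
store.add_message("c@example", "t2", {"inbox"})

inbox_before = store.search("inbox")   # the 'inbox' view is just a search
store.remove_label("t1", "inbox")      # archive thread t1
inbox_after = store.search("inbox")
```

After the archive step only thread t2 remains in the inbox view, yet no message was moved or rewritten.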

This feature set is exactly what I've been trying to get Evolution to use for several years; I used the virtual folders to automatically sort mail into several 'categories'. Unfortunately, the evolution vfolder support was terrible to start with (way too slow to be actually useful) and has gotten far worse over time (no more nested vfolders?).

Sup works quite well for a small amount of email. With my message store (dating back to 1984), it took "a while" to do the initial scan to construct the database. After that, searches are zippy fast.

Sup has a couple of fairly serious mis-features though:

  1. It's written in ruby. Yet another language disaster in my book; syntax horror-show similar to perl, and a lack of static typechecking means that obvious bugs in the program wouldn't be caught until you happened to execute that particular line of code. Ruby is also no speed demon; I spend a lot of my day reading email, and waiting for ruby is not on my list of desired activities.

  2. It has a magic curses UI. This is actually pretty good for reading email, but it's not scriptable at all (scripting would be useful for mass patch-application), and it completely fails when composing new mail: it forks off emacs and waits for it to complete, meaning that you cannot see any mail while composing a message.

  3. It holds a bunch of label changes inside the application without writing them out, and Xapian buffers most of the database changes too. Having sup crash often means re-viewing a lot of mail.

Carl and I started fixing sup in various ways: making the mime-viewer run asynchronously (so you could see attachments while viewing the rest of the message), sorting the inbox oldest first, and various other changes. Nothing serious, but it did show us how sup was built and just how simple it was inside.

It turns out that sup is just a bit of UI goo over a powerful full-text database; the complicated code is not the UI but the database. Of course, the sup UI is great for viewing mail, but that's fortunately easy to clone.

A Minimal Mail Reader

Having seen just how easy it was to build a really nice mail reading system, Carl and I sat down and sketched out what the foundations of our 'ideal' system would look like:

  1. Xapian based. I haven't seen anything close to Xapian in terms of features or performance. It has only one serious bug—it's written in C++. Fortunately, we can wrap the C++ mess with a simple C wrapper and ignore that aspect.

  2. Command line driven. Any UI would be constructed on top of the command line interface. And, by UI, we mean emacs major mode. If someone wants to write a GUI, we won't stop them though.

  3. Otherwise, work a lot like Sup (thread-based, immutable mail store, user-defined tags).

Carl started by playing with Xapian, using the existing sup database; one possibility would have been to retain compatibility with the sup format and just provide a new interface. Unfortunately, there were a lot of 'ruby-isms' in the sup database, and reconstructing that would have been pretty difficult from a non-ruby application.

Introducing Notmuch: Not much of an email program

Notmuch really isn't much of an email program; it doesn't talk to mail servers to receive or send mail, it doesn't even really know what Maildir should look like. All it does is construct a database for all of your mail messages and allow you to search and show email messages.

Notmuch has two pieces—a C program that uses Xapian to search and tag mail messages, and an emacs major mode which provides a fairly simple user interface. Like git, the notmuch C program places a bunch of commands within a single executable:

  1. setup
    Interactively setup notmuch for first use.

  2. new
    Find and import any new messages.

  3. search search-term [...]
    Search for threads matching the given search terms.

  4. reply search-terms [...]
    Formats a reply from a set of existing messages.

  5. show search-terms [...]
    Shows all messages matching the search terms.

  6. tag +tag|-tag [...] [--] search-term [...]
    Add/remove tags for all messages matching the search terms.

  7. dump [filename]
    Create a plain-text dump of the tags for each message.

  8. restore filename
    Restore the tags from the given dump file (see 'dump').

  9. help [command]
    This message, or more detailed help for the named command.

(The above text was taken directly from notmuch itself and was written by Carl).

As you can see, all of the commands which talk about messages take an arbitrary search pattern. The search command outputs thread identifiers in search-term form, so you can easily script things by pulling that out of the search output and passing it to additional notmuch commands. Learning how to do searching in notmuch is the key to using it successfully.
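
As a rough illustration of that scripting pattern, here is a Python sketch that pulls the thread terms out of search output and feeds each one to 'notmuch tag'. It assumes every line of search output begins with a thread:... term; the sample output is invented for the example, and the helper names (thread_terms, tag_command, archive_matching) are hypothetical. Only the search and tag commands listed above are used.

```python
import subprocess


def thread_terms(search_output):
    """Pull the leading 'thread:...' term from each line of search output."""
    terms = []
    for line in search_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("thread:"):
            terms.append(fields[0])
    return terms


def tag_command(changes, term):
    """Build argv for: notmuch tag +tag|-tag [...] [--] search-term"""
    return ["notmuch", "tag", *changes, "--", term]


def archive_matching(query):
    """Remove the 'inbox' tag from each thread matching `query`.

    Needs a working notmuch installation; not run by this example.
    """
    out = subprocess.run(["notmuch", "search", query],
                         capture_output=True, text=True, check=True).stdout
    for term in thread_terms(out):
        subprocess.run(tag_command(["-inbox"], term), check=True)


# Invented sample output, used only to exercise the parsing.
sample = (
    "thread:0001 2009-11-16 [2/2] Some Author; first subject (inbox unread)\n"
    "thread:0002 2009-11-17 [1/1] Other Author; second subject (inbox)\n"
)
terms = thread_terms(sample)
argv = tag_command(["-inbox"], terms[0])
```

Tagging one thread per invocation is slower than batching, but it avoids having to guess how several thread: terms combine in a single query.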

Xapian Search Terms

Matching words anyplace in the message is fairly simple; just list the set of words you want to match. Notmuch also adds some special syntax to direct the match at specific header fields:

  • tag:tag
    match messages with the specified tag

  • thread:thread-id
    match messages associated with the specified thread

  • id:id
    match the message with the given id. Message ids are those set by the message sender in the Message-Id: header field.

  • from:word
    match messages with word in the from address field.

  • to:word
    match messages with word in either the To: or Cc: headers.

  • attachment:word
    match messages with word in an attachment filename.

  • subject:word
    match messages with word in the subject field.

Aside from these additions, notmuch uses standard Xapian search syntax, including support for AND, OR etc. Xapian's query parser is not the most robust piece of code though, so sometimes you need to mess with the query to get it to do what you want.
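
Putting the prefixes and boolean operators together, a query can be assembled mechanically. The Python sketch below uses hypothetical helper names (field, all_of, any_of) and assumes that parentheses group sub-expressions as in standard Xapian query syntax; given the parser quirks mentioned above, treat the resulting string as a starting point rather than a guarantee.

```python
def field(prefix, word):
    """Format one prefixed term such as 'from:bart'."""
    return f"{prefix}:{word}"


def all_of(*terms):
    """Join terms with AND, parenthesised so nesting stays unambiguous."""
    return "(" + " AND ".join(terms) + ")"


def any_of(*terms):
    """Join terms with OR, parenthesised so nesting stays unambiguous."""
    return "(" + " OR ".join(terms) + ")"


# Inbox mail to or from "bart" that mentions xapian anywhere in the message.
query = all_of(
    field("tag", "inbox"),
    any_of(field("from", "bart"), field("to", "bart")),
    "xapian",  # a bare word matches anyplace in the message
)
```

The resulting string can be passed as the search terms of any notmuch command that accepts them.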

Notmuch emacs mode

There are a lot of email clients available for emacs; notmuch adds only the email reading part and uses the existing 'message' module for composing and sending mail. Even still, notmuch.el is almost 1000 lines long. It offers two different modes -- the search display, where a list of email threads is presented, and the thread display, where a single thread is displayed.

The search display presents the output of 'notmuch search' in a window, eliding the thread id. When a thread is selected, a thread display buffer is constructed with the thread contents as formatted by 'notmuch show'.

'notmuch show' structures the thread to make the display more useful in emacs; it splits messages into headers and bodies and marks the thread depth of each message. The header of each message will be shrunk to a single line (in reverse video). Previously read portions of the thread will be hidden by default, along with signature lines, quotations and attachments. Each of these can be viewed by use of a suitable command. Carl stole much of this from Sup and adapted it for use inside emacs, along with some of the key bindings.

How well does it work right now?

Frankly, notmuch is pretty rough today; I'm using it to read email, but I'm finding lots of stuff to fix. Fortunately, most of the fixes are pretty simple at this point. The good news is that it's plenty fast, fast enough that I can count how many threads I've exchanged with my good friend Bart in the past 25 years (2686) in only a few seconds.

The biggest performance issue is some lazy code within Xapian. When you want to change the set of tags related to a document in the database (a single mail message), Xapian replaces the entire document. Try removing the 'inbox' tag from half a million messages and Xapian will carefully rewrite 5GB of data. That takes a while. The Xapian developers have suggested that this shouldn't be hard to fix though, at which point re-tagging messages should get a lot faster.
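
A back-of-the-envelope check of those figures, as a small Python sketch. The variable names are hypothetical, and 5GB is assumed to mean decimal gigabytes.

```python
messages_retagged = 500_000     # "half a million messages"
bytes_rewritten = 5 * 10**9     # "5GB", taken here as decimal gigabytes

# Average amount of data Xapian rewrites to change one message's tags.
bytes_per_message = bytes_rewritten / messages_retagged
```

That works out to roughly 10 KB written per message in order to drop one short tag.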

For those interested in playing along, the notmuch sources are available from the notmuch web site along with a pointer to the mailing list.

Posted Tue 17 Nov 2009 02:14:21 AM PST Tags: tags/notmuch