Can Your Editor Do This?

If my blog had aural feedback, no doubt i’d hear a few snickers for saying this out loud, but here goes:

i use Emacs.

There, i said it. With all the tightly integrated development environments available these days (Visual Studio, Eclipse, etc.), you may wonder why anybody would use such an old-school tool. In fact, i’ve been using Emacs in various flavors for more than 20 years now: this request for an Emacs mode for an exotic programming language called Icon might even be my oldest extant trace on the Internet (though we didn’t call it that back then, kids). I’m pretty stale now, but at one point i considered Emacs Lisp one of the programming languages i’d put on a resume (if that sentence doesn’t make sense to you, it shows that you don’t know enough about Emacs to snicker at it).

Sure, it’s got a steep learning curve, it’s really geeky, and it’s not the hammer for every nail. I don’t write UI tools in it anymore, though for a while it was a pretty good choice for that. But there are still things i can do easily in Emacs that i don’t know how to do elsewhere without a lot more work: that’s one definition of what makes a good tool.

One use case i encounter a lot when groveling through data is progressive refinement. Typically that means a large data set (thousands or more), where i need several steps to filter out certain values (that i don’t know in advance: that’s one reason an editing environment is a good choice). For example, my current task is finding funky Unicode characters encoded as XML character entities, and replacing them with ASCII equivalents (i know that’s not good form, but for this particular string-matching task, it’s good enough). I’ve got a few tens of thousands of lines of data, and i want to find all the different &#-encoded values so i can create a mapping table.
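For concreteness, here is a minimal Python sketch of what such a mapping table and the replacement step might look like. The entries in ENTITY_TO_ASCII and the function name asciify are assumptions made up for illustration; a real table would be filled in from whatever values actually turn up in the data.

```python
import re

# Hypothetical mapping from numeric character references to ASCII stand-ins.
# These entries are illustrative only; the real table comes from the data.
ENTITY_TO_ASCII = {
    "&#8217;": "'",   # right single quotation mark
    "&#8220;": '"',   # left double quotation mark
    "&#8221;": '"',   # right double quotation mark
    "&#8211;": "-",   # en dash
}

# Matches decimal references only, e.g. &#8217;
ENTITY_RE = re.compile(r"&#\d+;")


def asciify(text: str) -> str:
    """Replace mapped entities with ASCII; leave unmapped ones untouched."""
    return ENTITY_RE.sub(
        lambda m: ENTITY_TO_ASCII.get(m.group(0), m.group(0)), text
    )


result = asciify("it&#8217;s &#9999; ok")
```

Leaving unmapped entities in place, rather than deleting them, keeps the long-tail cases visible for a later pass.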

A simple pattern match for &# returns some 800 hits, and i don’t really want to look through all of them (particularly when there’s a typical 80/20 distribution: i’ll get the major ones, but miss some long-tail cases once i start scanning quickly). So here’s the easy trick in Emacs i find myself using a lot:

  1. M-x occur creates another buffer (called *Occur*) with all the lines that match a given regexp (i just use &#)
  2. I scan the first page of results to see what looks like the most common value (in this case, â€™, equivalent to an apostrophe), and add it to my map
  3. Here’s where it gets cool: the *Occur* buffer is a filtered view of my data, and i can work on it directly (once i toggle its read-only status, a minor annoyance). So i switch to the *Occur* buffer, and then do M-x flush-lines for the value i just captured. This removes all the lines matching that case (about 400 of them for this first example), without damaging my original data (i’m in a different buffer).
  4. I go back to step 2 for a new value and repeat. Each time i’m capturing some large percentage of my data and then excluding that value from further consideration, getting a narrower and narrower view.
  5. At some point the view is narrowed down to a dozen or two lines, at which point i capture any remaining cases (all now in plain view), and i’m done.
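The same loop can be roughly mimicked outside Emacs. The Python sketch below is only an analogy under stated assumptions: the functions occur and flush_lines are hypothetical stand-ins named after the Emacs commands, the sample data is invented, and Python's regexp syntax differs from Emacs's. What it loses is the interactive part, where the next value is picked by eye from the narrowed view.

```python
import re


def occur(lines, pattern):
    """Rough analogue of M-x occur: a new list of the lines matching pattern."""
    rx = re.compile(pattern)
    return [ln for ln in lines if rx.search(ln)]


def flush_lines(view, pattern):
    """Rough analogue of M-x flush-lines: drop the lines matching pattern."""
    rx = re.compile(pattern)
    return [ln for ln in view if not rx.search(ln)]


# Invented sample data standing in for the real file.
data = [
    "it&#8217;s fine",
    "plain ascii line",
    "caf&#233; menu",
    "don&#8217;t panic",
    "5 &#8211; 10",
]

view = occur(data, r"&#")              # step 1: the filtered view
seen = []
for value in ["&#8217;", "&#233;"]:    # steps 2-4: in practice, chosen by eye
    seen.append(value)                 # record it for the mapping table
    view = flush_lines(view, re.escape(value))

# Step 5: whatever is left in view is small enough to read directly.
remaining = view
```

Because occur and flush_lines return new lists, the original data is never modified, which mirrors working in the separate *Occur* buffer.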

This is completely interactive, the possibilities are always in plain sight so i can make decisions about where to go next, and i don’t have to go hunting around. If i make a mistake, i can undo, or just back up and start over. And the values are right there for easy cut-and-paste into another buffer where i’m writing my code. (Caveats: this approach really only works with line-oriented data, multiple matches per line make it more complicated, and of course you need to figure out suitable regexps.) Most of the time, i find about 10 or so cycles is enough for me to find all the values i care about, out of an original set of thousands.
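On the multiple-matches-per-line caveat: one non-interactive way to cross-check the result is to count every entity occurrence rather than every matching line. The Python sketch below assumes the entities are numeric character references in decimal or hex form; the function name entity_counts and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Decimal (&#8217;) or hexadecimal (&#x2019;) numeric character references.
ENTITY_RE = re.compile(r"&#(?:\d+|[xX][0-9A-Fa-f]+);")


def entity_counts(lines):
    """Count every entity occurrence, including several on one line."""
    counts = Counter()
    for line in lines:
        counts.update(ENTITY_RE.findall(line))
    return counts


# Invented sample: the first line carries two different entities.
sample = [
    "&#8220;quoted&#8221; text",
    "it&#8217;s here",
    "that&#8217;s all",
]

counts = entity_counts(sample)
for entity, n in counts.most_common():
    print(entity, n)
```

This only helps once the shape of the data is known well enough to write the regexp, which is the point at which a script and the interactive approach converge.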

Can your editor do that?

17 thoughts on “Can Your Editor Do This?”

  1. I use Emacs on linux, I’m not a power user and have never heard of ‘occur’ buffers but I’m still hooked.

    Seems to me we are most effective using the tool we know best. I’ve seen people do stuff with ‘vi’ that I’d never think possible and power users of eclipse seem to be able to pull code out of thin air. They all have advantages and disadvantages, the key is to know the one you actually use I guess…

    James

  2. I do know people who still love ‘vi’, though i’ve always had the sense it was because it started up fast (whereas i tend to spend the entire day in Emacs: it’s an operating environment, not just an editor). But if you can do this in ‘vi’, then i have to retract my boast.

    Back in the day, i taught an Emacs class to programmer-colleagues at BBN, who generally had little idea how much power it had (M-x occur being one of the examples).

  3. I agree with James, every tool has its advantages and disadvantages. And I’ve worked with vi for so long that I almost never use another tool. The same goes for my Bible: I grew up with the StatenVertaling (a Dutch translation like the KJV), and for daily use I almost never read another translation. I already have problems when finding a text in the StatenVertaling from another person 😉
    There are great other translations and other tools but it’s difficult to change your habits.

  4. The hypersearch feature in JEdit lets me do something that at least is very similar.

    I’m more impressed with JEdit every day. It’s even got a plugin for mouse gestures, for heaven’s sake. And best of all, I don’t have to relearn basic editing skills. Most of those are directly portable to VS and Eclipse, when I have to use them.

  5. At the end of the day, pretty much every tool-set does what all the other tool-sets do. Some do something better than others, but if you want to make a big optimization in your productivity don’t switch tool-sets. Stop browsing the internet.

    On the flip side, if you want to maximize your browsing time, then switching to a better tool set might help.

  6. I’ve tried to switch to emacs a few times, even stuck with it for two months straight at one point… I always come back to vi though.

    That said, I do agree that these new fancy visual editors really aren’t worth the space they take up.

  7. What a complete waste of time. In any decent IDE it would take all of 5 minutes to create a macro that lists all the distinct matches. For an extra 5 minutes it can not only generate the mapping table, but automatically do the replacement.

  8. Great post! I love Emacs and I had never heard of occur before, so this is a cool tidbit.

    Btw, it is a lot of fun to battle between editors but at the end of the day, I always appreciate folks who care enough about programming that they take the time to learn to use an effective tool.

  9. Jonathan:
    Of course any decent editor can find matches, and do search and replace. My point was that this approach frees me from looking at all the matches: i never have to look further than scanning a screenful of data. Once i see a case of interest, i can easily remove it from further cluttering the data, without losing all the remaining cases i haven’t covered yet, and doing it all interactively with high visual bandwidth. The key differentiator is that the first filter is broad, and successive filters successively more narrow.

  10. Correct me if I’m wrong, but I read your stated goal as being to “find all the different &#-encoded values so i can create a mapping table.”

    With a fairly simple macro, you can create that mapping table directly without the need to manually scan pages of matches looking for unique values.

    You could probably do the same thing using a simple perl script as well.

    I guess my point is you are bragging about how easy it is to do manual work when you have a powerful set of tools that can do it all for you with very little effort.

  11. Yes, you’ve got that right, and a Perl script would do the same thing. The benefit to me of this approach is that it’s more hands-on and visually accessible, which is critical when i don’t really know what the data looks like in advance. For example, i didn’t initially think about left/right quote characters which come 2-to-a-line: a quick script might have considered that case. If i know the problem specification, there’s little benefit over Perl: it’s the fact that i don’t, and i’m exploring as i go, that makes this useful.

  12. Great article! And a question: How do you switch an occur-mode buffer to be writable?

  13. Vim, can do this.

    1. The Ex command below creates a temporary buffer with the matching occurrences, just like M-x occur, except that it can include as many files as required, and even some powerful shell-like patterns, e.g. **/*.h **/*.cpp -> this will search for occurrences of the pattern in all .h and .cpp files found in the current directory and any subdirectory within it!!

    :vimgrep // %

    2. Once you have entered the command above, you can preview the results in [Quickfix List] just like in Emacs using :copen, but you have useful normal mode short cuts to move next, forward, rewind, etc. etc.

    :copen

    3. The [Quickfix List] buffer is read-only as well. This is easily remedied with the command below, and you can filter afterwards.

    :set modifiable

    4. Lather, rinse, repeat and shine.

    5. Now you can save the [Quickfix List] permanently to the harddisk using commands below and review them later on.

    :w

    Afterwards, start VIM with the quickfix list as shown below:

    $ gvim -q

    Now, Can *your* editor do that?

  14. Aemon:

    You can make the *Occur* buffer writable by M-x toggle-read-only (normally bound to C-x C-q). Since it’s a normal buffer you can rename it, save it, etc. M-x occur is one of the great features of emacs that a lot of people don’t know about, IMHO.

  15. Alok:
    Actually, yes, emacs can do this (i’m not trying to play “top this” though, i promise!). You can run any arbitrary shell command (M-x shell-command) and the output is collected into a buffer. Unlike M-x occur, M-x grep, and their kin, though, the resulting buffer isn’t automatically linked back to the source lines. (it’s possible to do that too, but then you have to write your own version, which sort of defeats the original purpose).

  16. Hello. I love Emacs, too. I think I’ve been using it for about 1.5 years now. I noticed a grammar error, though. In your article, the “i”s should be replaced with “I”s.
