{"id":597,"date":"2007-05-14T16:45:51","date_gmt":"2007-05-14T20:45:51","guid":{"rendered":"http:\/\/semanticbible.com\/blogos\/2007\/05\/14\/can-your-editor-do-this\/"},"modified":"2007-05-14T17:24:38","modified_gmt":"2007-05-14T21:24:38","slug":"can-your-editor-do-this","status":"publish","type":"post","link":"http:\/\/semanticbible.com\/blogos\/2007\/05\/14\/can-your-editor-do-this\/","title":{"rendered":"Can Your Editor Do This?"},"content":{"rendered":"<p>If my blog had aural feedback, no doubt i&#8217;d hear a few snickers for saying this out loud, but here goes:<\/p>\n<p>i use Emacs.<\/p>\n<p>There, i said it. With all the tightly integrated development environments available these days (Visual Studio, Eclipse, etc.), you may wonder why anybody would use such an old-school tool. In fact, i&#8217;ve been using Emacs in various flavors for more than 20 years now: <a href=\"http:\/\/www.cs.arizona.edu\/icon\/ftp\/newsgrp\/group88b.txt\">this request for an Emacs mode for an exotic programming language called Icon<\/a> might even be my oldest extant trace on the Internet (though we didn&#8217;t call it that back then, kids). I&#8217;m pretty stale now, but at one point i considered Emacs Lisp one of the programming languages i&#8217;d put on a resume (if that sentence doesn&#8217;t make sense to you, it shows that you don&#8217;t know enough about Emacs to snicker at it).<\/p>\n<p>Sure, it&#8217;s got a steep learning curve, it&#8217;s really geeky, and it&#8217;s not the hammer for every nail. I don&#8217;t write UI tools in it anymore, though for a while it was a pretty good choice for that. But there are still things i can do easily in Emacs that i don&#8217;t know how to do elsewhere without a lot more work: that&#8217;s one definition of what makes a good tool.<\/p>\n<p>One use case i encounter a lot when groveling through data is <em>progressive refinement<\/em>. 
Typically that means a large data set (thousands of lines or more), where i need several steps to filter out certain values (that i don&#8217;t know in advance: that&#8217;s one reason an editing environment is a good choice). For example, my current task is finding funky Unicode characters encoded as XML character entities, and replacing them with ASCII equivalents (i know that&#8217;s not good form, but for this particular string-matching task, it&#8217;s good enough). I&#8217;ve got a few tens of thousands of lines of data, and i want to find all the different &#038;#-encoded values so i can create a mapping table.<\/p>\n<p>A simple pattern match for &#038;# returns some 800 hits, and i don&#8217;t really want to look through all of them (particularly when there&#8217;s a typical 80\/20 distribution: i&#8217;ll get the major ones, but miss some long-tail cases once i start scanning quickly). So here&#8217;s the easy trick in Emacs i find myself using a lot:<\/p>\n<ol>\n<li>M-x occur creates another buffer (called *Occur*) with all the lines that match a given regexp (i just use &#038;#)<\/li>\n<li>I scan the first page of results to see what looks like the most common value (in this case, \u00e2\u20ac\u2122, equivalent to an apostrophe), and add it to my map<\/li>\n<li>Here&#8217;s where it gets cool: the *Occur* buffer is a filtered view of my data, and i can work on it directly (once i toggle its read-only status, a minor annoyance). So i switch to the *Occur* buffer, and then do M-x flush-lines for the value i just captured. This removes all the lines matching that case (about 400 of them for this first example), without damaging my original data (i&#8217;m in a different buffer).<\/li>\n<li>I go back to step 2 for a new value and repeat. 
Each time i&#8217;m capturing a large share of the remaining lines and then excluding that value from further consideration, getting a narrower and narrower view.<\/li>\n<li>Eventually the view is narrowed down to a dozen or two lines; then i capture any remaining cases (all now in plain view), and i&#8217;m done.<\/li>\n<\/ol>\n<p>This is completely interactive: the possibilities are always in plain sight, so i can decide where to go next, and i don&#8217;t have to go hunting around. If i make a mistake, i can undo, or just back up and start over. And the values are right there for easy cut-and-paste into another buffer where i&#8217;m writing my code. (Caveats: this approach really only works with line-oriented data, multiple matches per line make it more complicated, and of course you need to figure out suitable regexps.) Most of the time, i find that 10 or so cycles are enough to find all the values i care about, out of an original set of thousands.<\/p>\n<p>Can your editor do that?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If my blog had aural feedback, no doubt i&#8217;d hear a few snickers for saying this out loud, but here goes: i use Emacs. There, i said it. With all the tightly integrated development environments available these days (Visual Studio, Eclipse, etc.), you may wonder why anybody would use such an old-school tool. 
In fact, &hellip; <a href=\"http:\/\/semanticbible.com\/blogos\/2007\/05\/14\/can-your-editor-do-this\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Can Your Editor Do This?<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[16],"tags":[],"_links":{"self":[{"href":"http:\/\/semanticbible.com\/blogos\/wp-json\/wp\/v2\/posts\/597"}],"collection":[{"href":"http:\/\/semanticbible.com\/blogos\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/semanticbible.com\/blogos\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/semanticbible.com\/blogos\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/semanticbible.com\/blogos\/wp-json\/wp\/v2\/comments?post=597"}],"version-history":[{"count":0,"href":"http:\/\/semanticbible.com\/blogos\/wp-json\/wp\/v2\/posts\/597\/revisions"}],"wp:attachment":[{"href":"http:\/\/semanticbible.com\/blogos\/wp-json\/wp\/v2\/media?parent=597"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/semanticbible.com\/blogos\/wp-json\/wp\/v2\/categories?post=597"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/semanticbible.com\/blogos\/wp-json\/wp\/v2\/tags?post=597"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}