Word wrap: more adventures with sed
When I first pasted the transcript of C. Scott's talk, it was ugly and unwrapped and splurted off the right edge of my screen. So I decided to learn more regular expressions. This is what happened.
Attempt #1: Mrah. Why does cat foo.txt | sed 's/(.{1,80}( +|$\n?)|(.{1,80})/\1\3\n/g' not work? \3 is not an invalid reference, you silly thing.
D'oh moment #1: Oh. Escaping { and ( with \ is needed with sed. Now...
Attempt #2: Why doesn't cat foo.txt | sed 's/\(.\{1,80\}\)\( +|$\n?\)|\(.\{1,80\}\)/\1\3\n/g' work?
Attempt #3 (gave up): Ok, simplifying... cat foo.txt | sed 's/\(.\{1,80\}\)/\1\n/g' will work, but won't split at word boundaries when possible. Good enough for me (for now).
#!/bin/sh

if [ $# -ne 1 ]; then
    echo "Usage: wrap file.txt"
    exit 127
fi

cat "$1" | sed 's/\(.\{1,80\}\)/\1\n/g'
exit 0
Translation from bahasa geek, for parents and other noncoders: I was trying to come up with a way to quickly wrap long lines of text (the equivalent of hitting Enter to put a newline in the middle of long sentences so that they'll fit onto a page).
Sed is a stream editor that I used to do this, and I wrote a shell script (#!/bin/sh) that fed (cat) out the file I gave the script ($1, a variable that gets replaced with the name of the file I want to wrap when the script is actually run). Sed then replaces (s/) each instance of a group of up to 80 characters, \(.\{1,80\}\). The escaped parentheses, \( and \), say this is a group of characters; the period says "any character"; and the "1,80" inside the escaped braces says "at least 1 and at most 80 of them in a row."
It replaces it with that same group of up to 80 characters (using \1, which is a backreference - basically, "the first group that you talked about here," which is what we were doing with the parentheses) and then a newline, \n, which is like the Enter key. Basically, this script has the same effect as me going through a text document the following way:
- hit the right arrow key 80 times
- hit Enter
- repeat steps 1 and 2 until you reach the end of the document
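Those three steps are easier to watch on a short string with a smaller width. This is only a sketch: hard_wrap is a made-up helper name (it is not part of the script above), the width is passed as its first argument, and it assumes GNU sed, which understands \n in the replacement.

```shell
#!/bin/sh
# hard_wrap WIDTH: hypothetical helper, same substitution as the script
# above but with the width taken from $1 instead of a fixed 80.
# Assumes GNU sed (for \n in the replacement text).
hard_wrap() {
    sed 's/\(.\{1,'"$1"'\}\)/\1\n/g'
}

# "Right arrow 4 times, Enter, repeat": the word gets chopped mid-way.
printf 'abcdefghij\n' | hard_wrap 4
```

With a width of 4 this prints abcd, efgh and ij on separate lines. The last chunk gets a newline appended too, so every wrapped line is followed by an empty one.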
It's a dumb script because it'll break in the middle of words but I couldn't figure out how to make it pay attention to word boundaries. I tried - see above attempts - but gave up because I didn't have time to chase the answer down. I'm hoping that somebody reading this might be able to spot where I went wrong. Halp?
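One possible place to look: in sed's default (basic) regex mode, + and | are ordinary characters unless they are backslash-escaped, so the " +|$" part never acted as "spaces or end of line." The same goes for the unescaped parentheses in attempt #1, which is why sed saw no groups to refer back to. The sketch below switches to extended regexes with -E, where ( ) { } + | are special without backslashes. It is a guess at what the attempts were aiming for, not a tested replacement for the script: wrap_words is a made-up name, the width is passed as its first argument, and GNU sed is assumed.

```shell
#!/bin/sh
# wrap_words WIDTH: hypothetical helper that breaks lines at spaces.
# Assumes GNU sed (-E for extended regexes, \n in the replacement).
wrap_words() {
    # 1st command: take up to WIDTH characters that are followed by spaces
    #              or by the end of the line, and put a newline after them
    #              (the spaces at the break are dropped).
    # 2nd command: remove the extra newline left after the last chunk.
    sed -E 's/(.{1,'"$1"'})( +|$)/\1\n/g; s/\n$//'
}

# Small width so the effect is visible: breaks after "bbb", not mid-word.
printf 'aaa bbb ccc\n' | wrap_words 8
```

With a width of 8 this prints "aaa bbb" and then "ccc". A single word longer than the width is left in one piece rather than chopped. If sed is not a requirement, the standard fold utility does a similar job with fold -s -w 80 file.txt.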