Friday, October 30, 2009

Handy Shell Pattern

When piping commands, you can often benefit from breaking your data into multiple lines. A combination of sed and xargs can make this very easy to do. Take the following commands for example:

cd /tmp && mkdir blah && cd blah;
touch file-1 file-2 file-3;
ls | sed 'p; s/-//' | xargs -n2

Breaking this down, we create a directory and touch three empty files. From there, we use sed to print two lines for each file: the first is the original filename, and the second is the filename without the dash. We can then use the -n2 argument to xargs, which passes two arguments at a time to its default command (echo), to merge each pair of lines back into a single line. The resulting output would be:

file-1 file1
file-2 file2
file-3 file3
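
To see what each stage contributes, here is a minimal sketch that runs the two halves of the pipeline separately. It assumes a printf stand-in for the ls output, so nothing on disk is touched; the variable names (names, doubled, paired) are only for illustration.

```shell
# Stand-in for the `ls` output above, so the sketch does not touch the filesystem.
names=$(printf '%s\n' file-1 file-2 file-3)

# Stage 1: sed prints each line as-is (p), removes the first dash,
# and then prints the modified line, giving two lines per filename.
doubled=$(printf '%s\n' "$names" | sed 'p; s/-//')
printf '%s\n' "$doubled"

# Stage 2: xargs -n2 runs its default command (echo) with two arguments
# at a time, which joins each pair of lines into one line.
paired=$(printf '%s\n' "$doubled" | xargs -n2)
printf '%s\n' "$paired"
```

The first printf shows six lines (each name followed by its dash-free version), and the second shows the three joined lines listed above.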

Once you have the lines joined, you could do something like:

... | while read a b; do mv $a $b; done

Bonus tip: the "read" bash builtin takes a line and associates each word with a given variable. Type "help read" from the shell for the complete documentation.
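
As a small illustration of how read assigns words to variables, the sketch below feeds it single lines from a here-document. The names first and rest are made up for this example.

```shell
# Feed one line to read; a and b are the same variable names used in the loop above.
read a b <<'EOF'
file-1 file1
EOF
echo "a=$a b=$b"

# With more words than variables, the last variable receives the rest of the line.
read first rest <<'EOF'
one two three
EOF
echo "first=$first rest=$rest"
```

This is also why a filename containing a space would confuse the loop: read splits on whitespace, so the pieces would land in the wrong variables.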

In summary, the full command chain for this tip would be:

cd /tmp && mkdir blah && cd blah;
touch file-1 file-2 file-3;
ls | sed 'p; s/-//' | xargs -n2 | while read a b; do mv $a $b; done
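
If you want to try the chain without leaving files in /tmp/blah, here is a slightly more defensive sketch. The scratch directory from mktemp -d, the -r flag on read, and the quoted variables are my additions rather than part of the original tip, and the pipeline still assumes filenames without whitespace.

```shell
# Work in a throwaway directory (mktemp -d is an assumption; the tip itself uses /tmp/blah).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch file-1 file-2 file-3

# Same pipeline as above, with -r on read and quotes around the variables.
# It still assumes filenames contain no whitespace, since xargs splits on it.
ls | sed 'p; s/-//' | xargs -n2 | while read -r a b; do
    mv -- "$a" "$b"
done

# List what is left, joined onto one line, then clean up the scratch directory.
result=$(ls | xargs)
echo "$result"
cd / && rm -rf -- "$workdir"
```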

Thanks to Chris Sutter for contributing this handy shell pattern, and thanks to Andeers for the coffee!

9 comments:

Anonymous said...

instead of:

ls | sed 'p; s/-//' | xargs -n2 | while read a b; do mv $a $b; done

you could do something like:

ls | perl -ne 'chomp;$a=$_;$a=~s/-//;`mv $_ $a`'

Travis Whitton said...

@Anonymous

Yeah, Perl can be used in lieu of a slew of shell idioms; however, there are a few things Perl won't give you out of the box that you can get with piped commands. My next post will illustrate one such technique.

Thanks for the tip!

Anonymous said...

It is better to use

| while read a b; do mv $a $b;

Anonymous said...

Sorry, in the previous post I wanted to say

| while read -r a b; do mv $a $b;

Kieran Clancy said...

If the filenames have any spaces or line breaks in them, then this will break horribly, because a filename containing a line break gets split across two lines when you pipe from ls, and then xargs will further break things by splitting on whitespace.

I usually use something like this*:

for f in *; do mv "$f" "`echo "$f" | sed 's/-//'`"; done

* this works in bash, and I think in any standards-compliant sh, but I haven't tested it. If a filename ends with a newline, this has the side-effect of removing it (due to the backtick expression). Can't think of a simple way to get around that. Also, the sed will only operate per-line-break for the filename, so if you don't want that you need to use something like this which I found here:
sed -n '1h;1!H;${g;s/foo/bar/g;p}'

--

Still, I don't think it's that much more complicated, and works properly in many more cases.

Caio Moritz Ronchi said...

If you have the "rename" command installed, then you can achieve the same substitution behaviour with:

$ rename 's/-//g' *

That would remove all dashes from all filenames in the current directory. I use that command a lot; its expressions work pretty much like Perl regular expressions.

Travis Whitton said...

I think there may have been some confusion here. The point of this post wasn't really to demonstrate how to rename files. Rather, it was to show how to use xargs and sed to construct and combine multi-line statements. The rename portion of the post was intended to show a basic example of how constructing and assembling data might be used in practice.

That being noted, I appreciate the suggestions regarding improvements on how to rename files. Part of the fun of this stuff is seeing the wide variety of solutions that you guys offer. Thanks for contributing.

Seth said...

@Travis - Sorry about the digression from your original point. I think the use case you provided (iterating over files and doing something to each file) is more commonly dealt with by for loops.

One observation about the command line you provided: on my system I had to run "ls --color=none", as the tty colors were causing problems. Also, you should specify the 'g' flag to sed if you want to remove all occurrences of "-".

Here's a slightly more succinct version that uses tr:

for f in *; do mv "${f}" "$(tr -d '-' <<<"${f}")"; done

Anonymous said...

"read" treats backslashes as escape characters, but "read -r" leaves them alone.

This is why I said before to use "read -r".