
Fun with Unix Commands

I downloaded a free icon set called Silk Icons that I use in OCSW.  The icon set contains 1000 icons.  In my earliest git commits, without thinking, I added all 1000 of these icons to the OCSW code base, which adds up to over 4 megabytes of data.  In reality, I was only using a handful of these icons in the code, but I wasn’t really sure which ones.  So the repository I pushed to GitHub carried around 4 MB of data that it didn’t need: 4 MB that I had to download again every time I installed the software, and that was never going to get used.  I had to figure out which icons I was using, so that I could keep only those in the git repository.

Since I develop in a Linux environment, I had access to the best-suited tool for this task: the Unix command line.  OCSW consists of somewhere around 200 files (PHP and JavaScript) worth of code, in a couple of different directories, so the first task was to identify them.  Only the PHP files contain any reference to the icon files, so this command lists all the PHP files that belong to OCSW:

find ~/public_html -name "*.php"

This command also identifies other PHP files in my web root that aren’t related to OCSW (like the course blogs I host), but I decided that was OK, because those other sites don’t use Silk Icons.  So, now I have a couple of hundred lines of output, each of which contains the path of a PHP file, like:

/home/faculty/cmerlo/public_html/bad_banner.php
/home/faculty/cmerlo/public_html/upcoming_events.php
/home/faculty/cmerlo/public_html/logout.php
/home/faculty/cmerlo/public_html/create_account.php
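If the unrelated sites had been a problem, find could have skipped them.  Here is a minimal sketch of that idea, run against a throwaway directory so it is safe to try anywhere; the directory name "blogs" is a made-up stand-in, not the real layout of my web root:

```shell
# Build a small scratch tree (all names here are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/public_html/blogs"
touch "$demo/public_html/logout.php" \
      "$demo/public_html/site.js" \
      "$demo/public_html/blogs/index.php"

# Regular files ending in .php, skipping anything under a blogs/ directory.
php_files=$(find "$demo/public_html" -type f -name "*.php" ! -path "*/blogs/*")
echo "$php_files"

# Clean up the scratch tree.
rm -rf "$demo"
```

The -type f test keeps directories that happen to end in .php out of the list, and ! -path drops every path that passes through blogs/.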

Then, I had to search each of these files to see whether it contained any reference to one of the icon files.  Here’s where the grep command comes in handy.  Using grep, I can search for a regular expression, like “silk_icons, followed by a slash, followed by anything that ends in .png”:

grep "silk_icons/.*.png"
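A caveat about that pattern: the dot before png is unescaped, so it matches any character, and .* is greedy, so it will happily swallow everything up to the last "png" on a line.  For my code that turned out not to matter, but a tighter pattern is easy to write.  The sketch below compares the two on an invented line of PHP; the character class is my assumption about what an icon filename can contain:

```shell
# One made-up line that references two icons, plus a decoy with no real dot.
line='<img src="silk_icons/add.png"> <img src="silk_icons/cross.png"> silk_icons/xpng'

# Original pattern: a single greedy match that runs from the first icon
# all the way to the decoy at the end of the line.
loose=$(printf '%s\n' "$line" | grep -o "silk_icons/.*.png")

# Tighter pattern: filename-like characters only, then a literal ".png".
strict=$(printf '%s\n' "$line" | grep -o "silk_icons/[A-Za-z0-9_-]*\.png")

echo "loose:  $loose"
echo "strict:"
echo "$strict"
```

The strict version prints each icon on its own line and ignores the decoy, while the loose version prints one long match.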

But this doesn’t solve the problem of running that grep command on each of those PHP files.  Enter the xargs command:

find ~/public_html -name "*.php" | xargs grep "silk_icons/.*.png"

xargs reads items from its standard input (here, the file names printed by ‘find’) and appends them as arguments to another command (in this case ‘grep’), running that command as few times as it takes to fit them all.  So, I am searching each of those PHP files for the regular expression described above.  The cool thing is that I’m searching some 180 files all at once, in one command.  That’s neat.
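To see what xargs actually does with its input, it helps to put echo in front of the command.  The sketch below uses invented file names and a placeholder PATTERN; the last part assumes a find and xargs that support -print0 and -0 (the GNU and BSD versions do), which keeps file names containing spaces in one piece:

```shell
# By default xargs packs the items into as few command lines as possible.
batched=$(printf 'a.php\nb.php\nc.php\n' | xargs echo grep PATTERN)
echo "$batched"

# With -n 1 it runs the command once per item instead.
one_each=$(printf 'a.php\nb.php\nc.php\n' | xargs -n 1 echo grep PATTERN)
echo "$one_each"

# A file name with a space would be split in two by plain xargs;
# NUL-separated names avoid that.
demo=$(mktemp -d)
printf 'silk_icons/add.png\n' > "$demo/my page.php"
found=$(find "$demo" -name "*.php" -print0 | xargs -0 grep -ho "silk_icons/.*.png")
echo "$found"
rm -rf "$demo"
```

The first echo shows a single grep command with all three files as arguments, which is what happens in the real pipeline as long as the list fits on one command line.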

However, a new problem arose: because grep is handed many files at once, its output consists of the full path name of each PHP file that contains an icon filename, followed by the full line of code in that file:

/home/faculty/cmerlo/public_html/student_tools.php:                      "<img src=\"$docroot/images/silk_icons/cancel.png\" height=\"16\" width=\"16\" style=\"border: 0\" />",

Yuck.  Fortunately, grep provides the -h option, which suppresses the name of the file in which the pattern was found, and the -o switch, which causes grep to display only the part of each line that matched the pattern.  So now this command:

find ~/public_html -name "*.php" | xargs grep -ho "silk_icons/.*.png"

produces output like this:

silk_icons/cancel.png
silk_icons/cancel.png
silk_icons/accept.png
silk_icons/cancel.png
silk_icons/delete.png
silk_icons/cancel.png
silk_icons/cancel.png
silk_icons/arrow_up.png

Nice.  But notice that each PNG is potentially listed several times, once for every place it’s used.  I can clean this up with a couple of really nice Unix utilities.  The sort command will, as the name implies, reorder this output into ASCIIbetical order, and then the uniq command, when it finds multiple identical adjacent lines of input, will only display one.  Now my command looks like this:

find ~/public_html -name "*.php" | xargs grep -ho "silk_icons/.*.png" | sort | uniq

And my output looks like this:

silk_icons/accept.png
silk_icons/add.png
silk_icons/arrow_down.png
silk_icons/arrow_up.png
silk_icons/cancel.png
silk_icons/cross.png
silk_icons/delete.png

That’s terrific.  So, those are the names of the seven PNG files that OCSW actually uses, out of the 1000 I downloaded.  They take up 32 KB, as opposed to the 4 MB that the whole icon set uses.  Now, I just want to copy those files into my git directory, and I’m home free:

find ~/public_html -name "*.php" | xargs grep -ho "silk_icons/.*.png" | sort | uniq | xargs -I {} cp {} ~/git/Open-Course-Software/images/silk_icons

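The {} means “Take that line of output (the PNG file’s relative path) and use it here in the cp command”.  Because those paths are relative, like silk_icons/cancel.png, the command has to be run from the directory that contains the silk_icons folder.  This is why Unix is awesome.  Enjoy!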
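For anyone who wants to try the whole pipeline without touching real files, here is a sketch that rebuilds the situation in a scratch directory.  The layout and file names are invented stand-ins for my web root and git checkout, sort -u is used as a shorthand for sort | uniq, and -print0 / -0 assume GNU or BSD find and xargs:

```shell
# Scratch stand-ins for ~/public_html and the git checkout (hypothetical layout).
demo=$(mktemp -d)
mkdir -p "$demo/public_html/images/silk_icons" "$demo/repo/images/silk_icons"
touch "$demo/public_html/images/silk_icons/accept.png" \
      "$demo/public_html/images/silk_icons/cancel.png" \
      "$demo/public_html/images/silk_icons/unused.png"
printf '%s\n' '<img src="images/silk_icons/cancel.png" />' > "$demo/public_html/a.php"
printf '%s\n' '<img src="images/silk_icons/cancel.png" />' \
              '<img src="images/silk_icons/accept.png" />' > "$demo/public_html/b.php"

# grep prints paths relative to images/, so cp runs from there.
# The subshell keeps the cd from leaking into the rest of the script.
(
  cd "$demo/public_html/images" &&
  find "$demo/public_html" -name "*.php" -print0 \
    | xargs -0 grep -ho "silk_icons/.*.png" \
    | sort -u \
    | xargs -I {} cp {} "$demo/repo/images/silk_icons"
)

# Only the icons that are actually referenced should have been copied.
copied=$(ls "$demo/repo/images/silk_icons")
echo "$copied"

rm -rf "$demo"
```

Swapping cp for echo cp in the last stage is a cheap way to preview what would be copied before doing it for real.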

March 23, 2011