Using SED on Windows

The other day I needed to trawl through a rather unappealing log file. It ran to over 18,000 lines and contained over 12,000 occurrences of the word error. A quick scan of the log showed that the vast majority of these were more a warning than a significant error. However, I knew the error I was looking for was in there somewhere, so I decided to weed out the lines that were not genuine errors before I even started to tackle it. I remembered that the stream editor (SED) utility in UNIX was ideal for this sort of thing, so I had a look to see how I could apply it in Windows.

The first step was to try and find out if you could get SED for Windows. I had a good idea that it did exist and a quick Google search turned up a GNU version of SED that would run on Windows. You can download a copy of SED for Windows from SourceForge. I selected and installed the “Complete package, except sources” to the default location in Program Files.

I then spent a while looking through my book on SED & AWK and various web sites to get together the pieces for the script that I wanted to run. I’d really recommend the O’Reilly SED & AWK book by Dale Dougherty & Arnold Robbins, but there are plenty of resources on the web and in the manuals that come as part of the installation package. The “Useful one-line scripts for SED” list was particularly helpful for getting the line numbering looking great in the script I eventually used.

I wanted my script to run through the log file and pick out every line containing the word “ERROR” (in upper case), and prefix it with the line number from its position in the original log.

So I set about creating a simple DOS batch file to call three SED commands in succession. At each stage I produced an intermediate file which I could check as I was creating the script to make sure it was working as I’d intended.

So the first SED command simply adds line numbers to each line from the original log file.

SED = "{input file path}" > "{output file path}"

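The first command writes each line number on a line of its own, directly above the log line it belongs to. The second SED command takes this new file and joins each number to its log line with a tab, so that the numbers sit in a ‘reserved space’ at the beginning of the line and the log lines all line up.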

SED "N;s/\n/\t/" "{input file path}" > "{output file path}"
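
To see what these first two commands do, here is a small sketch run from a UNIX-style shell rather than a DOS prompt. The three-line sample log and the variable names are made up for illustration, and the \t in the replacement assumes GNU SED, which is the version discussed here.

```shell
# Hypothetical three-line sample standing in for the real log file.
sample='INFO start
ERROR disk full
INFO done'

# Step 1: "=" prints each line number on its own line, above the line it numbers.
step1=$(printf '%s\n' "$sample" | sed =)

# Step 2: N appends the next input line to the pattern space, then the
# substitution turns the newline between number and text into a tab (GNU sed).
step2=$(printf '%s\n' "$step1" | sed 'N;s/\n/\t/')

printf '%s\n' "$step1"
printf '%s\n' "$step2"
```

The first printf shows six lines (a number, then the log line, three times over); the second shows three lines, each starting with its number and a tab.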

Finally the third SED command processes the file and outputs only lines containing the word “ERROR”.

SED -n "/ERROR/p" "{input file path}" > "{output file path}"
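
The match is case-sensitive, which is exactly what I wanted here since the lower-case occurrences were mostly noise. A minimal sketch of the difference, using an invented three-line sample; the I flag after the pattern is a GNU SED extension, so treat it as an assumption if you are on another version.

```shell
# Hypothetical sample mixing lower-case and upper-case occurrences.
sample='error: retrying (harmless)
ERROR: connection lost
Warning: low disk'

# -n suppresses the default output; p prints only lines matching the pattern.
upper_only=$(printf '%s\n' "$sample" | sed -n '/ERROR/p')

# GNU sed extension: the I flag makes the match case-insensitive.
any_case=$(printf '%s\n' "$sample" | sed -n '/error/Ip')

printf '%s\n' "$upper_only"
printf '%s\n' "$any_case"
```

The first result holds only the upper-case line; the second holds both the lower-case and upper-case lines.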

The batch file simply sets the current directory at the start to be the location of the SED executable and finally tidies up the temporary files. A temporary folder on the root of the C drive was used as the location for the original log file, the temporary files and the final processed log file.

@ECHO off
ECHO Set the current directory to the folder in which SED was installed
C:
CD "C:\Program Files\GnuWin32\bin"
ECHO Add line numbers
SED = "C:\temp\original.log" > "C:\temp\temp1.log"
ECHO Format line numbers
SED "N;s/\n/\t/" "C:\temp\temp1.log" > "C:\temp\temp2.log"
ECHO Output only the lines containing the word 'ERROR'
SED -n "/ERROR/p" "C:\temp\temp2.log" > "C:\temp\processed.log"
ECHO Remove temporary files
DEL "C:\temp\temp1.log"
DEL "C:\temp\temp2.log"
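
Once the script is known to work, the intermediate files are not strictly needed, because the three commands can be chained with pipes. The sketch below shows the idea in a UNIX-style shell with an invented four-line log; the same chain should work at a Windows command prompt with double quotes in place of single quotes, though I have only sketched it here rather than shown the batch version.

```shell
# Hypothetical sample log; in practice this would be the original log file.
log='INFO start
ERROR disk full
warning: minor error ignored
ERROR timeout'

# Number the lines, join each number to its line with a tab (GNU sed \t),
# then keep only the lines containing upper-case ERROR.
processed=$(printf '%s\n' "$log" | sed = | sed 'N;s/\n/\t/' | sed -n '/ERROR/p')

printf '%s\n' "$processed"
```

The result keeps lines 2 and 4 with their original line numbers. The trade-off is that there is nothing to inspect between stages, which is why the temporary files were handy while building the script.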

Given how prevalent these log files are in the work I’m doing, I have a feeling that I may need to re-use and modify this script in the future. SED is a great tool and blindingly fast at processing even large files. If you haven’t already got it, I’d recommend downloading this open source utility and adding it to your Windows toolbox today.

Author: Stephen Millard
Tags: | utilities | windows |
