Using PowerShell to Ace the Scripting & Regular Expression Interview Question

Last week I saw a tweet that referenced a couple of interesting articles on interview questions for programmers. Steve Yegge's The Five Essential Phone-Screen Questions had an interesting question about identifying which of 50,000 web pages had phone numbers in them. I'm paraphrasing the requirements: you have to identify all of the web pages with phone numbers hard-coded in them. All of the web pages are HTML files in a directory tree under a folder named website. The phone numbers are in the format ###-###-#### or (###) ###-####. You have two days to deliver a list of the files with phone numbers in them, along with their paths.

A couple of days after I read this article, I started reading another book on PowerShell named "Windows PowerShell in Action, Second Edition" by Bruce Payette. The first page in chapter 1 had an example of piping Dir to Select-String to search for the word error in the log files from the Windows directory. This gave me an idea about how to solve the phone number problem. Why not use PowerShell? Displaying help on Select-String confirmed that it is able to use regular expressions.

# Search every .html file under /website for ###-###-#### or (###) ###-#### anywhere in a line
Get-ChildItem -Path /website -Filter *.html -Recurse |
Select-String -Pattern '(?:\(\d{3}\) ?|\d{3}-)\d{3}-\d{4}' |
Format-Table -Property Path, LineNumber -AutoSize
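
Since the deliverable is a list of files rather than a list of matching lines, the same search can be reduced to one path per file. The following is only a minimal sketch under stated assumptions: it builds a throwaway folder with three hypothetical sample pages (the file names and contents are made up for illustration), runs the search against it, and keeps one path for each file that contains a phone number.

```powershell
# Build a throwaway "website" folder with hypothetical sample pages.
$root  = Join-Path ([System.IO.Path]::GetTempPath()) ("website-" + [guid]::NewGuid())
$about = Join-Path $root 'about'
New-Item -ItemType Directory -Path $about -Force | Out-Null

Set-Content -Path (Join-Path $root  'index.html') -Value '<p>Call 555-867-5309 today</p>'
Set-Content -Path (Join-Path $about 'team.html')  -Value '<p>Office: (555) 123-4567</p>'
Set-Content -Path (Join-Path $about 'legal.html') -Value '<p>No phone number here</p>'

# Matches ###-###-#### or (###) ###-#### anywhere in a line.
$phonePattern = '(?:\(\d{3}\) ?|\d{3}-)\d{3}-\d{4}'

# -List stops at the first match in each file, so every file appears only once.
$filesWithPhones = Get-ChildItem -Path $root -Filter *.html -Recurse |
    Select-String -Pattern $phonePattern -List |
    Select-Object -ExpandProperty Path |
    Sort-Object

# Show the paths of the files that contain a phone number.
$filesWithPhones

# Clean up the sample folder.
Remove-Item -Path $root -Recurse -Force
```

Only index.html and team.html should be listed. Against the real site, the result could be piped to Set-Content or Out-File to produce the list to hand over.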


I also recently learned that if a line ends with a pipe symbol or a comma, you don't need the backtick (grave accent) character for line continuation. That tip came from "Learn Windows PowerShell in a Month of Lunches" by Don Jones.
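
To illustrate the tip, here is a small sketch with made-up sample values. The first two statements break after a pipe and after a comma with no backtick. The third needs a backtick because the line break falls where PowerShell would otherwise treat the statement as finished.

```powershell
# A trailing pipe tells the parser that the statement continues on the next line.
$sorted = 3, 1, 2 |
    Sort-Object

# A trailing comma does the same inside a list of values.
$areaCodes = '212',
    '415',
    '555'

# Without a trailing pipe or comma, a backtick is needed to continue the line.
$joined = $areaCodes `
    -join '-'

$sorted      # 1, 2, 3
$joined      # 212-415-555
```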
