Claudio Cherubino's blog Life of a Googler

21May/089

Google Treasure Hunt 2008, second puzzle in F#

Unlike the first puzzle, which required some maths knowledge, in the second one we have to prove to be able to recursively process the filesystem.

I think that this puzzle should be quite easy to solve for the system administrators and in general those used to scripting languages.

Here is the text of the second Google Treasure Hunt 2008 problem:

Here is a random zip archive for you to download:
GoogleTreasureHunt08_1767743498252291641.zip

Unzip the archive, then process the resulting files to obtain a numeric result. You'll be taking the sum of lines from files matching a certain description, and multiplying those sums together to obtain a final result. Note that files have many different extensions, like '.pdf' and '.js', but all are plain text files containing a small number of lines of text.

Sum of line 4 for all files with path or name containing jkl and ending in .txt
Sum of line 1 for all files with path or name containing zzz and ending in .xml
Hint: If the requested line does not exist, do not increment the sum.

Multiply all the above sums together and enter the product below.

As usual, the puzzle may seem challenging at a first glance, so it is important to break it into smaller (and simpler) pieces.

First of all, we have to get the list of files ending with a given extension and containing a specific substring in their name.

There is an overloaded version of the Directory.GetFiles() function which takes two parameters, the directory to be searched and a pattern to be found in the files.

Then we have to filter the results checking if the complete filenames (path + filename) contain the given substring.

This first sub-problem can be solved with the following F# code:

#light
open System.IO

let file_list dir pattern content =
  let rec allFiles dir pattern =
      seq
          { for file in Directory.GetFiles(dir, pattern) do
              yield file
            for subdir in Directory.GetDirectories(dir) do
              for file in allFiles subdir pattern do
                  yield file }
  let elems = allFiles dir pattern
  let filt (str : string) = str.Contains(content)
  Seq.filter filt elems

Now we have the list of all files that satisfy the requirements of the exercise.

The next step is to open them (they are plain text files, despite their extensions) and take the numeric value from the right line, if present.

Let's write a function to accomplish this task for a single file and then map it to the entire list:

let getvalue numline filename =
  let lines = File.ReadAllLines(filename)
  let linevalue (lines : string[]) numline =
    let realindex = numline - 1
    if lines.Length > realindex then
      System.Int32.Parse(lines.[realindex])
    else
      0
  linevalue lines numline

The File.ReadAllLines(filename) function returns an array of strings, one for each line of the source file.

Arrays in F# have indexes starting from 0, so we have to subtract 1 from the line number supplied by the user to get the real index.

We also have to be sure that the searched line exists. In this case we return its content, otherwise we return zero.

We have now all the elements we need to compute one of the numbers requested by the exercise, we should only supply the appropriate parameters in this way:

let treasurehunt2 dir pattern content numline =
  file_list dir pattern content |> Seq.map (getvalue numline) |> Seq.fold (+) 0

For instance, to solve the first line of my puzzle we have to run the following command, assuming that the archive was unzipped in C:\google:

treasurehunt2 "C:\google" "*.txt" "jkl" 5

I hope you don't need my help to run this command twice with different parameters and multiply the results to get your answer to the quiz!