Claudio Cherubino's blogLife of a Googler

21May/089

Google Treasure Hunt 2008, second puzzle in F#

Unlike the first puzzle, which required some maths knowledge, in the second one we have to prove to be able to recursively process the filesystem.

I think that this puzzle should be quite easy to solve for the system administrators and in general those used to scripting languages.

Here is the text of the second Google Treasure Hunt 2008 problem:

Unzip the archive, then process the resulting files to obtain a numeric result. You'll be taking the sum of lines from files matching a certain description, and multiplying those sums together to obtain a final result. Note that files have many different extensions, like '.pdf' and '.js', but all are plain text files containing a small number of lines of text.

Sum of line 4 for all files with path or name containing jkl and ending in .txt
Sum of line 1 for all files with path or name containing zzz and ending in .xml
Hint: If the requested line does not exist, do not increment the sum.

Multiply all the above sums together and enter the product below.

As usual, the puzzle may seem challenging at a first glance, so it is important to break it into smaller (and simpler) pieces.

First of all, we have to get the list of files ending with a given extension and containing a specific substring in their name.

There is an overloaded version of the Directory.GetFiles() function which takes two parameters, the directory to be searched and a pattern to be found in the files.

Then we have to filter the results checking if the complete filenames (path + filename) contain the given substring.

This first sub-problem can be solved with the following F# code:

```#light
open System.IO

let file_list dir pattern content =
let rec allFiles dir pattern =
seq
{ for file in Directory.GetFiles(dir, pattern) do
yield file
for subdir in Directory.GetDirectories(dir) do
for file in allFiles subdir pattern do
yield file }
let elems = allFiles dir pattern
let filt (str : string) = str.Contains(content)
Seq.filter filt elems
```

Now we have the list of all files that satisfy the requirements of the exercise.

The next step is to open them (they are plain text files, despite their extensions) and take the numeric value from the right line, if present.

Let's write a function to accomplish this task for a single file and then map it to the entire list:

```let getvalue numline filename =
let linevalue (lines : string[]) numline =
let realindex = numline - 1
if lines.Length > realindex then
System.Int32.Parse(lines.[realindex])
else
0
linevalue lines numline
```

The File.ReadAllLines(filename) function returns an array of strings, one for each line of the source file.

Arrays in F# have indexes starting from 0, so we have to subtract 1 from the line number supplied by the user to get the real index.

We also have to be sure that the searched line exists. In this case we return its content, otherwise we return zero.

We have now all the elements we need to compute one of the numbers requested by the exercise, we should only supply the appropriate parameters in this way:

```let treasurehunt2 dir pattern content numline =
file_list dir pattern content |> Seq.map (getvalue numline) |> Seq.fold (+) 0
```

For instance, to solve the first line of my puzzle we have to run the following command, assuming that the archive was unzipped in C:\google:

```treasurehunt2 "C:\google" "*.txt" "jkl" 5
```

I hope you don't need my help to run this command twice with different parameters and multiply the results to get your answer to the quiz!

1. I noticed this on DZone and figured I’d like to try it in D – http://larsivi.net/node/112

2. Hi!

It’s nice to see somebody else using F# for this problems There’s another overloaded version of GetFiles that receives a SearchOption. Here you can specify to get files of subdirectories in the given directory.

3. Hi Diego,
with the overloaded version you mentioned the code becomes shorter and easier to understand, thanks!

4. find . -iname *jkl*.txt -exec awk ‘FNR == 5′ {} \; | awk ‘{ sum += \$1 } END { print sum }’

5. It’s a very good exercise to solve those puzzles in different languages, although this last one is much much easier solved with unix tools (awk, grep, etc)

6. @tulsi: I have to agree with you, solving it with functional programming was harder than it seems.

7. java and bash implemetations:
http://bpgergo.blogspot.com/2008/05/treasure-hunt-2.html

8. @liesen: this didn’t work with paths.

9. I don’t know awk, but this bash script was simple enough to cook up with a little trial and error:

for str in zzz aaa; do for file in `find -regex “.*\$str.*\.js”`; do if [ `wc -l < \$file` -ge 5 ]; then head -n 5 \$file | tail -n 1 | tr “\n” ‘+’; fi; done; echo; done

(-iname is wrong here, we need full paths. Needless to say, the task for me was the fifth line of *zzz*.js and of *aaa*.js.) It gave the output

2594+9005+3949+2058+8393+
2594+1249+8496+4728+

I then copy-pasted that to

\$ echo \$(((2594+9005+3949+2058+8393)*(2594+1249+8496+4728)))

and got 443724933, which was correct. My first try was wrong, because I was processing files with less than 4 lines (in that run it was the fourth line for me), despite their explicit hint. head happily returned less than four lines and tail got the last of those, despite it not being the fourth line. But with the added conditional it works fine.

People doing this with a real programming language aren’t doing it right, IMO. 