Claudio Cherubino's blog Life of a Googler

25 Jan 2010

Spell checking with F#

Some months ago I wrote an introductory article on F# for the "IoProgrammo" magazine (sorry, Italian only!), and now I have published a second article in the latest issue of the same magazine.

This new article covers more advanced topics and focuses on writing a basic spell checker that mixes the functional and object-oriented programming paradigms.

The spell checking algorithm is implemented in functional F# and is based on the Jaro-Winkler similarity measure, while the UI is WPF-based and written in object-oriented code.
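The article's code is not included in this post. As a rough illustration of the similarity measure it is based on, here is a minimal Jaro-Winkler sketch using the common textbook definition (a matching window, half the number of transpositions, and a bonus for a common prefix of up to 4 characters with scaling factor 0.1). It is not the implementation from the article:

```fsharp
/// Jaro similarity of two strings: 0.0 = nothing in common, 1.0 = identical.
let jaro (s1: string) (s2: string) =
    if s1.Length = 0 && s2.Length = 0 then 1.0
    elif s1.Length = 0 || s2.Length = 0 then 0.0
    else
        // two equal characters only "match" if they are at most this far apart
        let window = max 0 (max s1.Length s2.Length / 2 - 1)
        let matched1 = Array.create s1.Length false
        let matched2 = Array.create s2.Length false
        for i in 0 .. s1.Length - 1 do
            let lo = max 0 (i - window)
            let hi = min (s2.Length - 1) (i + window)
            // pick the first still-unmatched equal character of s2 inside the window
            let candidate =
                [lo .. hi] |> List.tryFind (fun j -> not matched2.[j] && s1.[i] = s2.[j])
            match candidate with
            | Some j ->
                matched1.[i] <- true
                matched2.[j] <- true
            | None -> ()
        // matched characters of each string, in their original order
        let m1 = [| for i in 0 .. s1.Length - 1 do if matched1.[i] then yield s1.[i] |]
        let m2 = [| for j in 0 .. s2.Length - 1 do if matched2.[j] then yield s2.[j] |]
        let m = float m1.Length
        if m = 0.0 then 0.0
        else
            // transpositions: half the number of positions where the matched sequences differ
            let mismatches = Array.fold2 (fun acc a b -> if a <> b then acc + 1 else acc) 0 m1 m2
            let t = float mismatches / 2.0
            (m / float s1.Length + m / float s2.Length + (m - t) / m) / 3.0

/// Jaro-Winkler similarity: boosts the Jaro score of strings that share a
/// common prefix (at most 4 characters), using the usual scaling factor 0.1.
let jaroWinkler (s1: string) (s2: string) =
    let j = jaro s1 s2
    let prefix =
        Seq.zip s1 s2
        |> Seq.truncate 4
        |> Seq.takeWhile (fun (a, b) -> a = b)
        |> Seq.length
    j + float prefix * 0.1 * (1.0 - j)
```

For instance, jaroWinkler "MARTHA" "MARHTA" is about 0.961. A spell checker could then suggest the dictionary word with the highest score, e.g. `dictionary |> List.maxBy (jaroWinkler typedWord)`, where dictionary and typedWord are hypothetical names.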

I hope you will enjoy the article, and I'll be very happy to get any feedback from readers.

IoProgrammo - February 2010


26 Aug 2009

Functional programming interview question

I think that by examining a company's hiring process you can understand a lot about what working there would be like.

As Joel Spolsky wrote, you should only hire people who are Smart and Get Things Done, and a good way to make sure that a candidate belongs to this category is to test his/her skills with a good programming exercise: one easy enough to be solved in 15 minutes, but that still requires some thinking.

Starling Software clearly describes its interview process on a page of its website and proposes a couple of sample problems for potential applicants. In the first one you are asked to use Haskell to process a given file (obviously, we will use F#):

The file input.txt contains lists of words, one per line, in two categories, NUMBERS and ANIMALS. A line containing just a category name indicates that the words on the following lines, until the next category name, belong to that category. Read this file as input (on stdin) and print out a) a sorted list of the unique animal names encountered, and b) a list of the number words encountered, along with the count of each. Feel free to choose your output format.

The algorithm to solve the exercise is easy: we read the file line by line and remember the current category (NUMBERS or ANIMALS), in order to add the following words to the appropriate list. When the file is over, we remove duplicates from the list of animals and sort it, and we group the number words together with their counts.

The only problem is knowing how to manage the state of the application in a functional way. In the imperative paradigm you define a variable to hold the state and change its value when you find a new category in the input file. In functional programming you don't use state variables; instead, you use function parameters and recursive calls:

open System
open System.IO

let animals_and_number filename =
  let rec process_line lines category animals numbers =
    match lines with
    | [] -> (animals, numbers)
    | x::xs -> match x with
               | "NUMBERS" -> process_line xs "NUMBERS" animals numbers
               | "ANIMALS" -> process_line xs "ANIMALS" animals numbers
               | x -> match category with
                      | "NUMBERS" -> process_line xs category animals (x :: numbers)
                      | "ANIMALS" -> process_line xs category (x :: animals) numbers
                      | _ -> process_line xs category animals numbers
  let all_lines = File.ReadAllLines(filename) |> List.ofArray
  process_line all_lines "" [] []

let filename = "input.txt"
let (animals, numbers) = animals_and_number filename
let sorted_animals = animals |> Seq.distinct |> Seq.sort |> Seq.toList
let counted_words = numbers |> Seq.countBy id |> Seq.toList

printfn "Animals: %A" sorted_animals
printfn "Numbers: %A" counted_words

The recursive process_line function has four parameters: the list of lines to be processed, the current category (initially an empty string) and the two lists of animals and numbers found so far.

For each new line we first check whether it is one of the category names. If it is, we have to change state, i.e. discard the line and recursively call the same function with the new category parameter.

If the line is not a category name, we only have to add it to the animals or numbers list, according to the value of the category parameter.

When there are no more lines to process (i.e. lines is empty), process_line returns a tuple made of the two lists, which becomes the result of animals_and_number.
The rest of the job is calling some standard library functions to remove duplicates, sort and count the elements of the sequences.
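The same state-threading idea can also be expressed with List.fold, where the accumulator carries the current category and the two lists. This is an alternative sketch, not the solution above; the name categorize and the sample lines are illustrative only, and the input is an in-memory list instead of input.txt:

```fsharp
/// Threads the current category and the two word lists through List.fold
/// instead of an explicitly recursive function.
let categorize (lines: string list) =
    let step (category, animals, numbers) (line: string) =
        match line with
        | "NUMBERS" | "ANIMALS" -> (line, animals, numbers)      // change state
        | word when category = "NUMBERS" -> (category, animals, word :: numbers)
        | word when category = "ANIMALS" -> (category, word :: animals, numbers)
        | _ -> (category, animals, numbers)                       // no category yet: skip
    let (_, animals, numbers) = List.fold step ("", [], []) lines
    (animals, numbers)

let sample = ["ANIMALS"; "dog"; "cat"; "NUMBERS"; "one"; "two"; "one"; "ANIMALS"; "cat"]
let (animals, numbers) = categorize sample
let sortedAnimals = animals |> List.distinct |> List.sort
let countedWords = numbers |> List.countBy id |> List.sort
```

Here the fold hides the recursion, but the state still lives only in the accumulator, exactly as it lives in the parameters of process_line.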

21 Jul 2009

F# introductory article on IoProgrammo magazine

If you understand Italian, you may be interested in reading an introductory article on F# that I wrote for "IoProgrammo", the most important Italian programming magazine; it was published a few days ago in the August issue.

The article covers the very first steps with F#, from installation to writing a first working sample application that can encrypt and decrypt messages using the Vigenère cipher.
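The article's code is not reproduced here; as a rough idea of what such an application does, this is a minimal Vigenère sketch (not the code from the article), assuming that both the message and the key contain only uppercase letters A-Z:

```fsharp
/// Vigenère cipher sketch: shifts each letter of text by the matching letter of
/// the (repeated) key, where A = 0 ... Z = 25. direction = 1 encrypts, -1 decrypts.
/// Assumes text and key contain only uppercase letters A-Z.
let vigenere direction (key: string) (text: string) =
    let shift (c: char) (k: char) =
        // adding 26 keeps the value positive when decrypting
        let offset = (int c - int 'A' + direction * (int k - int 'A') + 26) % 26
        char (int 'A' + offset)
    text |> String.mapi (fun i c -> shift c key.[i % key.Length])

let encrypted = vigenere 1 "LEMON" "ATTACKATDAWN"
let decrypted = vigenere (-1) "LEMON" encrypted
```

With the classic key LEMON, ATTACKATDAWN becomes LXFOPVEFRNHR, and decrypting gives back the original message. Note the parentheses in `vigenere (-1)`: without them F# would parse a subtraction.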

I hope you find it useful; if you have any comments, I'll be glad to hear from you.

IoProgrammo - August 2009


15 Apr 2009

Luhn algorithm in F#

The Luhn algorithm is a simple error detection formula based on the modulus operator which is widely used to validate credit card numbers or Canadian Social Insurance Numbers.

This checksum function can detect any single-digit error, and it works by verifying the number against its included check digit.

The algorithm is made of three steps:

1) starting from the rightmost digit and moving left, double the value of every second digit (i.e. the digits in even positions); if a doubled value is greater than 9, add its two digits together (which is the same as subtracting 9);
2) sum all the resulting values together with the original digits that were left untouched;
3) if the result is a multiple of 10 (i.e. the result modulo 10 is zero), then the input value is valid.

This simple algorithm can be implemented with a few lines of F# code:

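#light

// Doubles a digit; a two-digit result is reduced to the sum of its digits (i.e. minus 9)
let double_digit n =
  let double = n * 2
  if (double > 9) then
    double - 9
  else
    double

// Scans the digits starting from the rightmost one, doubling every second digit
let rec luhn_loop isEven acc (input : char list) =
    match input with
    | [] -> acc % 10 = 0
    | x :: xs -> let num = int x - int '0'   // digit character -> numeric value
                 if (isEven) then
                   luhn_loop (not isEven) (acc + (double_digit num)) xs
                 else
                   luhn_loop (not isEven) (acc + num) xs

// Works on any value whose string representation contains only digits
let luhn n = string n |> Seq.toList |> List.rev |> luhn_loop false 0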

I separated the double_digit function from the main loop: it takes a digit and multiplies it by two. If the result has two digits (i.e. it is greater than 9) we sum the two digits together (i.e. subtract 9 from the number), otherwise we simply return it.

The luhn_loop function iterates over the list of digits obtained by converting the original number into a list of digit characters and reversing it.
At each iteration we consider a single digit, double it if it is placed in an even position and add it to the accumulator.

When the list of digits to be examined is empty we are done with the loop and we just have to check if the value stored in the accumulator is evenly divisible by 10.
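For comparison, the same three steps can be sketched as a single pipeline over the digits, using Seq.mapi to know each digit's position from the right instead of an isEven flag. This variant (luhnCheck is a hypothetical name) assumes the input is a string made only of digits:

```fsharp
/// Luhn check as a single pipeline: reverse the digits, double every second one
/// (subtracting 9 from two-digit results), sum everything and test modulo 10.
/// Assumes the input string contains only the characters '0'-'9'.
let luhnCheck (number: string) =
    number
    |> Seq.rev                                   // rightmost digit first
    |> Seq.map (fun c -> int c - int '0')        // digit character -> value
    |> Seq.mapi (fun i d ->
        if i % 2 = 1 then                        // every second digit from the right
            let doubled = d * 2
            if doubled > 9 then doubled - 9 else doubled
        else d)
    |> Seq.sum
    |> fun total -> total % 10 = 0
```

For example, luhnCheck "79927398713" returns true (the digits add up to 70), while changing the last digit to 0 gives a sum of 67 and the check fails.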

20 Mar 2009

Project Euler in F# – Problem 52

After a long break, let's resume with the Project Euler series.

The next one to be solved is Problem 52:

It can be seen that the number, 125874, and its double, 251748, contain exactly the same digits, but in a different order.

Find the smallest positive integer, x, such that 2x, 3x, 4x, 5x, and 6x, contain the same digits.

As usual, we have to break the problem into small pieces and solve these pieces one by one.

As in many other Project Euler problems, we have to extract the digits of a number, and this is done by converting the number into a string (i.e. a sequence of characters) and then converting each character into an integer.

We also need a function to check if two numbers are made of the same digits. For each number we create a set from the sequence of its digits, in order to ignore the order of the items and discard duplicates. Then we apply the standard set comparison operator to get the result.

We have to multiply each positive integer by 2, 3, 4, 5 and 6, so we can define the multiples function, which takes an integer x and returns a list containing five elements, corresponding to 2x, 3x, 4x, 5x and 6x.

Thanks to these three functions, checking whether a number and its multiples contain exactly the same digits is very easy: we generate the list of multiples and test each of them with the same_digits function.

The algorithm to get the answer to the problem is the following: generate an infinite sequence of integers (remember Seq.unfold?), test each number with the check function and return the first value that passes the test.

#light

// Digits of n, e.g. 125874 -> 1, 2, 5, 8, 7, 4
let digits (n : int) =
  string n |> Seq.map (string >> int)

// Compares the sets of digits: order and repeated digits are ignored
let same_digits a b =
  Set.ofSeq (digits a) = Set.ofSeq (digits b)

let multiples n =
  [2 .. 6] |> Seq.map (fun x -> n * x)

let check n =
  n |> multiples |> Seq.forall (same_digits n)

// First positive integer that passes the check
let answer =
  1 |> Seq.unfold (fun i -> Some (i, i + 1)) |> Seq.pick (fun x -> if check x then Some (x) else None)
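Note that same_digits compares sets, so it ignores how many times each digit occurs (for example, 112 and 1222 have the same digit set). A stricter check, sketched below with hypothetical helpers, compares the sorted digits instead, so that only true permutations pass:

```fsharp
/// Digits of n in ascending order; repeated digits are kept, so two numbers
/// have equal sortedDigits exactly when one is a permutation of the other.
let sortedDigits (n: int) = string n |> Seq.sort |> List.ofSeq

let isPermutationOf a b = sortedDigits a = sortedDigits b

/// First positive x such that 2x, 3x, 4x, 5x and 6x are all permutations of x
let answer =
    Seq.initInfinite (fun i -> i + 1)
    |> Seq.find (fun x -> [2 .. 6] |> List.forall (fun k -> isPermutationOf x (k * x)))
```

Sorting the characters is enough here because the characters '0' to '9' sort in the same order as the digits they represent.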

There is also a well-known (and tricky) solution for this problem, which was presented in the book "The man who calculated". Did you know it?

12 Feb 2009

Brainfuck interpreter in F#

Brainfuck is an esoteric programming language whose grammar consists of only eight commands, written as a single character each.

Despite this minimalistic structure, the language is Turing-complete, but writing (and reading) Brainfuck code is very hard.

The Brainfuck machine uses a finite-length tape (usually 30000 byte cells long) and two pointers: an instruction pointer and a data pointer.

The former scans the source code and executes one instruction at a time, while the latter points to the memory cell whose value is increased or decreased.

This structure can be represented by the following F# type, called BFState:

type BFState =
  { mutable program : string ;        // program being interpreted
    mutable memory : int[] ;          // memory
    mutable pc : int ;                // current program counter
    mutable pos : int }               // current pointer position

We then define a function that initializes our Brainfuck machine given the size of the memory tape and the source code; it basically zeroes all the memory cells and the two pointers:

let initState memSize code  =
  { program = code;
    memory = Array.zeroCreate memSize;
    pc = 0;
    pos = 0; }

We reach the end of the program when the program counter steps beyond the end of the code (i.e. the length of the string containing the source):

let isEnd (state : BFState) =
  state.pc >= state.program.Length

The following two functions are used to move the program counter one step towards the end or the beginning of the source code:

let nextCommand (state : BFState) =
  { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos }

let previousCommand (state : BFState) =
  { program = state.program ; memory = state.memory ; pc = state.pc - 1 ; pos = state.pos }

getMem and setMem are two auxiliary functions that retrieve or set the value of the memory cell pointed to by the data pointer:

let getMem (state : BFState) =
  state.memory.[state.pos]

let setMem (state : BFState) value =
  state.memory.[state.pos] <- value
  state.memory

Similarly, we have two functions to write to and read from the console, which implement the '.' and ',' commands respectively:

let outputByte (state : BFState) =
  Console.Write (Convert.ToChar(state.memory.[state.pos]))
  nextCommand state

let readByte (state : BFState) =
  state.memory.[state.pos] <- Console.Read()
  nextCommand state

When the command to perform is '[' and the byte at the data pointer is zero, we have to jump forward past the matching ']' character. For simplicity, this function stops at the first ']' following the current position, so it does not handle nested loops:

let rec moveToForwardMatch (state : BFState) =
  match state.program.[state.pc] with
  | ']' -> (nextCommand state)
  | _ -> moveToForwardMatch (nextCommand state)

The complementary command is ']', which jumps back to just after the matching '[' character when the byte at the data pointer is non-zero; again, the function simply stops at the first '[' found moving backwards:

let rec moveToPreviousMatch (state : BFState) =
  match state.program.[state.pc] with
  | '[' -> (nextCommand state)
  | _ -> moveToPreviousMatch (previousCommand state)
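Because these two functions stop at the first bracket they meet, a program with nested loops such as [[-]>] would jump to the wrong place. A possible fix, sketched here as hypothetical helpers that work directly on the source string (they are not part of the original interpreter), is to count the nesting depth while scanning. Both assume a well-formed program and, like the originals, return the position just after the matched bracket:

```fsharp
/// Index just after the ']' that matches the '[' at position pc,
/// counting the nesting depth so that inner loops are skipped over.
let forwardMatch (program: string) (pc: int) =
    let rec scan i depth =
        match program.[i] with
        | '[' -> scan (i + 1) (depth + 1)
        | ']' when depth = 1 -> i + 1
        | ']' -> scan (i + 1) (depth - 1)
        | _ -> scan (i + 1) depth
    scan pc 0

/// Index just after the '[' that matches the ']' at position pc.
let backwardMatch (program: string) (pc: int) =
    let rec scan i depth =
        match program.[i] with
        | ']' -> scan (i - 1) (depth + 1)
        | '[' when depth = 1 -> i + 1
        | '[' -> scan (i - 1) (depth - 1)
        | _ -> scan (i - 1) depth
    scan pc 0
```

The step function could then use them through a record copy, e.g. `{ state with pc = forwardMatch state.program state.pc }` for '[' and `{ state with pc = backwardMatch state.program state.pc }` for ']'.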

We now have all the tools required to interpret Brainfuck code, so we only need to write the logic that selects the action to perform according to the command analyzed in a single step.

Each command takes a BFState and returns a new BFState. Any character different from the eight allowed will be simply ignored:

let step (state : BFState) =
  match state.program.[state.pc] with
  | '+' -> { program = state.program ; memory = setMem state (getMem state + 1) ; pc = state.pc + 1 ; pos = state.pos }
  | '-' -> { program = state.program ; memory = setMem state (getMem state - 1) ; pc = state.pc + 1 ; pos = state.pos }
  | '<' -> { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos - 1}
  | '>' -> { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos + 1}
  | '[' -> if (state.memory.[state.pos] = 0) then moveToForwardMatch state else nextCommand state
  | ']' -> if (state.memory.[state.pos] <> 0) then moveToPreviousMatch state else nextCommand state
  | '.' -> outputByte state
  | ',' -> readByte state
  | _ -> nextCommand state // ignore any non-command character

The parser, which could be the only publicly available function, takes care of initializing the state and recursively stepping through the code, one instruction at a time, until the end of the source:

let parse memSize code =
  let rec run (state : BFState) =
    if (not (isEnd state)) then
       run (step state)
  initState memSize code |> run

You can find a repository of Brainfuck sample programs at http://esoteric.sange.fi/brainfuck/bf-source/prog/. The following code contains the classic Hello World! program and shows how to run it with the interpreter we just wrote:

let code = ">+++++++++[<++++++++>-]<.>+++++++[<++++>-]<+.+++++++..+++.>>>++++++++[<++++>-]<.>>>++++++++++[<+++++++++>-]<---.<<<<.+++.------.--------.>>+."

parse 30000 code

The whole Brainfuck interpreter code can be found just below; feel free to take it and dissect it:

#light
open System

type BFState =
  { mutable program : string ;        // program being interpreted
    mutable memory : int[] ;          // memory
    mutable pc : int ;                // current program counter
    mutable pos : int }               // current pointer position

let initState memSize code  =
  { program = code;
    memory = Array.zeroCreate memSize;
    pc = 0;
    pos = 0; }

let isEnd (state : BFState) =
  state.pc >= state.program.Length

let nextCommand (state : BFState) =
  { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos }

let previousCommand (state : BFState) =
  { program = state.program ; memory = state.memory ; pc = state.pc - 1 ; pos = state.pos }

let getMem (state : BFState) =
  state.memory.[state.pos]

let setMem (state : BFState) value =
  state.memory.[state.pos] <- value
  state.memory

let rec moveToForwardMatch (state : BFState) =
  match state.program.[state.pc] with
  | ']' -> (nextCommand state)
  | _ -> moveToForwardMatch (nextCommand state)

let rec moveToPreviousMatch (state : BFState) =
  match state.program.[state.pc] with
  | '[' -> (nextCommand state)
  | _ -> moveToPreviousMatch (previousCommand state)

let outputByte (state : BFState) =
  Console.Write (Convert.ToChar(state.memory.[state.pos]))
  nextCommand state

let readByte (state : BFState) =
  state.memory.[state.pos] <- Console.Read()
  nextCommand state

let step (state : BFState) =
  match state.program.[state.pc] with
  | '+' -> { program = state.program ; memory = setMem state (getMem state + 1) ; pc = state.pc + 1 ; pos = state.pos }
  | '-' -> { program = state.program ; memory = setMem state (getMem state - 1) ; pc = state.pc + 1 ; pos = state.pos }
  | '<' -> { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos - 1}
  | '>' -> { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos + 1}
  | '[' -> if (state.memory.[state.pos] = 0) then moveToForwardMatch state else nextCommand state
  | ']' -> if (state.memory.[state.pos] <> 0) then moveToPreviousMatch state else nextCommand state
  | '.' -> outputByte state
  | ',' -> readByte state
  | _ -> nextCommand state // ignore any non-command character

let parse memSize code =
  let rec run (state : BFState) =
    if (not (isEnd state)) then
       run (step state)
  initState memSize code |> run

23 Jan 2009

Facebook FizzBuzz

Have you ever tried looking for "FizzBuzz" on a search engine?

If you do, you'll surely land on this page on Coding Horror, the blog written by Jeff Atwood.

To make a long story short, Jeff states that the vast majority of developers are unable to write a tiny program that should take no more than 10 minutes to code.

This is the famous FizzBuzz problem:

Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

You should be outraged now; if you aren't, I hope you don't make a living as a developer! :)

I found out that Facebook also considers FizzBuzz a good starting test for job applicants. There is a programming puzzles page where you can find a set of good problems, ranging from blindingly easy to very hard.

The first one is called HoppityHop! and it is just a variation of the good old FizzBuzz:

The program should iterate over all integers (inclusive) from 1 to the number expressed by the input file. For example, if the file contained the number 10, the submission should iterate over 1 through 10. At each integer value in this range, the program may possibly (based upon the following rules) output a single string terminating with a newline.

* For integers that are evenly divisible by three, output the exact string Hoppity, followed by a newline.
* For integers that are evenly divisible by five, output the exact string Hophop, followed by a newline.
* For integers that are evenly divisible by both three and five, do not do any of the above, but instead output the exact string Hop, followed by a newline.

Being a developer myself I couldn't resist writing a solution for this problem, obviously in F#:

#light

let HoppityHop n =
  let printHop x =
    match x with
    | x when (x % 15 = 0)-> printfn "Hop"
    | x when (x % 3 = 0) -> printfn "Hoppity"
    | x when (x % 5 = 0) -> printfn "Hophop"
    | _ -> ()
  [1 .. n] |> Seq.iter printHop
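A variation that separates the rules from the printing makes the logic easy to test. In this sketch, hop and hoppityHopLines are illustrative names, and matching on the pair of remainders replaces the x % 15 test:

```fsharp
/// Returns the line HoppityHop! prints for i, or None when it prints nothing
let hop i =
    match i % 3, i % 5 with
    | 0, 0 -> Some "Hop"          // divisible by both three and five
    | 0, _ -> Some "Hoppity"      // divisible by three only
    | _, 0 -> Some "Hophop"       // divisible by five only
    | _ -> None

/// All the lines for the numbers from 1 to n, in order
let hoppityHopLines n = [1 .. n] |> List.choose hop
```

Printing is then just `hoppityHopLines n |> List.iter (printfn "%s")`.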

I know many of you will feel the urge to suggest a better solution: you're welcome to, the comment area is yours to use!

28 Dec 2008

Random testing in F# with FsCheck

One of the emerging trends in software development is TDD, or Test-Driven Development, a methodology based on writing the tests first and then writing the code that makes them pass.

Besides the unit testing libraries available for the .NET Framework, F# can now count on FsCheck, a random testing framework ported from Haskell's QuickCheck.

A random testing library generates a set of test cases and attempts to falsify the properties defined by the developer.

This approach is not exhaustive, but if all the tests of a large enough test suite pass, we can be reasonably confident that the code is correct.

Let's see how to get started using FsCheck to validate the RSA implementation written some time ago.

The first step is to download the latest release (currently 0.3) of FsCheck from the Download page.

You can get either the binaries or the source code, which also contains an example console application.

Assuming that you downloaded the binaries, you then have to open/create an F# application and add a Reference (Project - Add Reference) to the FsCheck.dll library file:

Project referencing FsCheck library


In order to use the functions provided by FsCheck we have to include its namespace in our code, by adding an open FsCheck clause at the beginning of our program.

For the sake of example, let's fix the two distinct random prime numbers p and q and the public exponent e.

We then have to define our first property, which I'm going to call prop_rsa: it basically asserts that decrypting a previously encrypted message gives back the original message.

In order to check a property prop_myproperty we just have to run quickCheck prop_myproperty, so in this case we'll run quickCheck prop_rsa.

#light
open System
open FsCheck

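// RSA sample data (private_exponent, encrypt and decrypt come from the
// RSA implementation of the earlier post, which must be part of the project)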
let p = 61
let q = 53
let e = 17
let n = p * q
let d = private_exponent e p q

let prop_rsa message =
  let encrypted = encrypt message e n
  decrypt encrypted d n = message

quickCheck prop_rsa

Console.ReadKey() |> ignore

If everything goes well, we should see a console window like the following one, with the number of tests passed (by default, FsCheck generates 100 test cases).

FsCheck shows the number of passed tests


If a test fails, FsCheck will stop the execution and show which test failed. If you want to see all the generated test cases, you can run verboseCheck instead of quickCheck.

The next step should be writing a custom generator in order to generate not only the message but also the prime numbers and the public exponent, but I think we can cover that in a future post.
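To see what quickCheck does with prop_rsa without depending on FsCheck, the idea can also be sketched by hand: generate random messages, check the property on each one and collect the counterexamples. modPow, encrypt and decrypt below are a hypothetical textbook RSA written for this sketch (not the functions of the earlier post), with d = 2753 computed for p = 61, q = 53 and e = 17; messages are drawn from 0 to n - 1, the range in which RSA decryption returns the original value:

```fsharp
open System

// Textbook RSA with the toy parameters of the post (p = 61, q = 53, e = 17).
// modPow, encrypt and decrypt are hypothetical stand-ins written for this sketch.
let modPow (b: int64) (power: int64) (m: int64) =
    // square-and-multiply modular exponentiation
    let rec loop b power acc =
        if power = 0L then acc
        elif power % 2L = 1L then loop (b * b % m) (power / 2L) (acc * b % m)
        else loop (b * b % m) (power / 2L) acc
    loop (b % m) power 1L

let n = 61L * 53L     // modulus, 3233
let e = 17L           // public exponent
let d = 2753L         // private exponent: e * d = 1 (mod 60 * 52)

let encrypt m = modPow m e n
let decrypt c = modPow c d n

// Property: decrypting an encrypted message gives back the original message
let prop_rsa m = decrypt (encrypt m) = m

// Hand-rolled random testing: check the property on 100 random messages in [0, n)
let rng = Random(42)
let failures =
    [ for _ in 1 .. 100 -> int64 (rng.Next(0, int n)) ]
    |> List.filter (fun m -> not (prop_rsa m))
```

An empty failures list plays the role of FsCheck's "Ok, passed 100 tests"; FsCheck automates this loop, generates the inputs for us and, as described above, reports the failing case when a property does not hold.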

You can download the complete solution here.

17 Dec 2008

Project Euler in F# – Problem 55

As in the last post, the description of Project Euler problem 55 contains a detailed step-by-step solution of the problem itself, so we just have to write the F# code to perform the tasks.

The only thing that I really had to understand is the definition of Lychrel numbers:

If we take 47, reverse and add, 47 + 74 = 121, which is palindromic.

Not all numbers produce palindromes so quickly. For example,

349 + 943 = 1292,
1292 + 2921 = 4213
4213 + 3124 = 7337

That is, 349 took three iterations to arrive at a palindrome.

Although no one has proved it yet, it is thought that some numbers, like 196, never produce a palindrome. A number that never forms a palindrome through the reverse and add process is called a Lychrel number. Due to the theoretical nature of these numbers, and for the purpose of this problem, we shall assume that a number is Lychrel until proven otherwise. In addition you are given that for every number below ten-thousand, it will either (i) become a palindrome in less than fifty iterations, or, (ii) no one, with all the computing power that exists, has managed so far to map it to a palindrome. In fact, 10677 is the first number to be shown to require over fifty iterations before producing a palindrome: 4668731596684224866951378664 (53 iterations, 28-digits).

Surprisingly, there are palindromic numbers that are themselves Lychrel numbers; the first example is 4994.

How many Lychrel numbers are there below ten-thousand?

As usual, it is a good idea to break down the problem into little pieces that can be easily solved with a few lines of code each.

First, we need a function to reverse an integer and another one to check whether a number is palindromic, i.e. it is equal to its reverse.

Then we define a function that will be used during each iteration to take a number and sum it with its reverse. This new number is the one that will be checked next for being a palindrome inside the isLychrel function.

If we find a palindrome then the number under test is not a Lychrel number; otherwise we start the next iteration. If no palindrome shows up in less than fifty iterations, as allowed by the problem statement, we conclude that the number is a Lychrel number.

We now have all the pieces required to answer the original question: just take all numbers smaller than 10000, filter the Lychrel ones and count them:

#light

let reverse (n : bigint) =
  n.ToString().ToCharArray()
  |> Array.rev
  |> Array.map (fun x -> x.ToString())
  |> String.concat ""
  |> bigint.Parse

let isPalindromic n =
  n = reverse n

let reverseAndAdd n =
  n + reverse n

let isLychrel n =
  let rec isLychrelIter n i =
    match i with
    | 50 -> true
    | _ ->
      let r = reverseAndAdd n
      if isPalindromic r then
        false
      else
        isLychrelIter r (i+1)
  isLychrelIter n 1

let answer = [1I .. 10000I]
             |> Seq.filter (fun x -> isLychrel x)
             |> Seq.length
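To double-check the reverse-and-add steps quoted from the problem, a small helper (hypothetical, not part of the solution above) can count how many iterations a number needs before reaching a palindrome, giving up after a fixed limit:

```fsharp
open System
open System.Numerics

let reverseNum (n: BigInteger) =
    BigInteger.Parse(String(n.ToString().ToCharArray() |> Array.rev))

/// Number of reverse-and-add iterations needed to reach a palindrome,
/// or None if none is found within maxIter iterations
let iterationsToPalindrome maxIter (n: BigInteger) =
    let rec loop current i =
        if i > maxIter then None
        else
            let next = current + reverseNum current
            if next = reverseNum next then Some i else loop next (i + 1)
    loop n 1
```

With this sketch, 47 needs 1 iteration and 349 needs 3, as in the problem statement, while 196 returns None within 50 iterations.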

7 Dec 2008

Project Euler in F# – Problem 53

Some of the problems proposed by Project Euler actually present the solution together with the description of the problem itself.

This is the case with Problem 53:

There are exactly ten ways of selecting three from five, 12345:

123, 124, 125, 134, 135, 145, 234, 235, 245, and 345

In combinatorics, we use the notation ${}^{5}C_{3} = 10$.

In general,

$${}^{n}C_{r} = \frac{n!}{r!\,(n-r)!}, \quad \text{where } r \le n,\ n! = n \times (n-1) \times \cdots \times 3 \times 2 \times 1, \text{ and } 0! = 1.$$

It is not until $n = 23$ that a value exceeds one-million: ${}^{23}C_{10} = 1144066$.

How many, not necessarily distinct, values of ${}^{n}C_{r}$, for $1 \le n \le 100$, are greater than one-million?

The only knowledge required in this problem is the binomial coefficient, and its formula is also given in the text above, so we can simply generate all pairs (n, r) and check whether the value of the binomial coefficient is greater than one million.

In F# this is equivalent to coding a function for the binomial coefficient and testing the items of a sequence:

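#light

// BigInt.Factorial (Microsoft.FSharp.Math) is no longer part of FSharp.Core,
// so we define the factorial of a bigint ourselves
let factorial (n : bigint) = List.fold ( * ) 1I [1I .. n]

let binomial_coefficient n k = factorial n /
                               (factorial k * factorial (n - k))

let answer =
  seq { for n in 1I .. 100I do
          for k in 1I .. n -> n, k } |> Seq.filter (fun (a, b) -> binomial_coefficient a b > 1000000I) |> Seq.length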

The approach adopted is almost the same as the one I used to solve Project Euler problem number 9. It is neither the smartest nor the quickest, since it is essentially a brute-force algorithm, but it is without any doubt the easiest to implement.

Which optimizations would you consider?
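One possible optimization, sketched here as an alternative rather than the approach used above, is to build Pascal's triangle row by row with C(n, r) = C(n-1, r-1) + C(n-1, r), capping every value just above one million: once an entry exceeds the limit only that fact matters, so plain ints are enough and no factorials are computed. countAboveLimit is a hypothetical name:

```fsharp
/// Counts the values of C(n, r), 1 <= n <= maxN, that are greater than limit.
/// Each row of Pascal's triangle is built from the previous one; values are
/// capped at limit + 1, which keeps "greater than limit" exact and avoids overflow.
let countAboveLimit maxN limit =
    let cap x = min x (limit + 1)
    let nextRow (row: int[]) =
        Array.init (row.Length + 1) (fun r ->
            if r = 0 || r = row.Length then 1
            else cap (row.[r - 1] + row.[r]))
    let rec loop n row count =
        if n > maxN then count
        else
            let row' = nextRow row          // row' holds C(n, 0) .. C(n, n)
            let above = row' |> Array.filter (fun v -> v > limit) |> Array.length
            loop (n + 1) row' (count + above)
    loop 1 [| 1 |] 0
```

For example, countAboveLimit 23 1000000 gives 4 (the values from C(23, 10) to C(23, 13)), consistent with the statement that no value exceeds one million before n = 23.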