Claudio Cherubino's blog Life of a Googler

25Jan/100

Spell checking with F#

Some months ago I have written an introductory article on F# for the "IoProgrammo" magazine (sorry, Italian only!) and now I have published a second article on the latest issue of the same magazine.

This new article covers more advanced topics and is focused on writing a basic spell checker that mixes together the functional and object-oriented programming paradigms.

The spell checking algorithm is implemented in functional F# and is based on the Jaro-Winkler similarity distance while the UI is WPF-based and written with OO code.

I hope you will appreciate the article and I'll be very happy to get any feedback from the readers.

IoProgrammo - February 2010

IoProgrammo - February 2010

21Jul/092

F# introductory article on IoProgrammo magazine

If you understand Italian, you may be interested in reading an introductory article on F# I wrote for the most important Italian programming magazine called "IoProgrammo" and that was published a few days ago in the August issue.

The article covers the very first steps with F#, from installation to writing a first working sample application, which can be used to encrypt/decrypt messages using the Vigenere cipher.

I hope you can find useful, if you have any comments I'll be glad to hear from you.

IoProgrammo - August 2009

IoProgrammo - August 2009

21Jul/090

Il mio articolo su IoProgrammo

E' uscito da pochi giorni il numero di Agosto di IoProgrammo, la rivista di informatica più importante in Italia, e al suo interno, per la prima volta, è possibile leggere un articolo del sottoscritto dedicato ad F#, il nuovo linguaggio di programmazione funzionale di Microsoft.

L'articolo è lungo 5 pagine ed ha un taglio introduttivo, allo scopo di diffondere il paradigma di programmazione funzionale. Non manca comunque una prima applicazione di esempio, che si occupa di cifrare e decifrare dei messaggi sfruttando il Cifrario di Vigenere.

L'argomento è sicuramente attuale ed interessante, tanto che sto già lavorando ad un secondo articolo che tratti dei concetti più avanzati.
Mi auguro che lo possiate apprezzare e sarò lieto di ricevere i vostri commenti.

IoProgrammo - Agosto 2009

IoProgrammo - Agosto 2009

22May/095

Project Euler in F# – Problem 28

Sometimes, when I find a problem like Project Euler problem 28, I remember that human beings are smarter than computers. :)

Let's read the problem statement, it doesn't seem easy at first glance:

Starting with the number 1 and moving to the right in a clockwise direction a 5 by 5 spiral is formed as follows:

21 22 23 24 25
20  7  8  9 10
19  6  1  2 11
18  5  4  3 12
17 16 15 14 13

It can be verified that the sum of both diagonals is 101.

What is the sum of both diagonals in a 1001 by 1001 spiral formed in the same way?

If we don't think of anything creative, we are forced to write an algorithm to draw that 1001x1001 number spiral and then sum both diagonals together.

However, we can start noticing that the number in the top-right position is always n^2, where n is the length of one side of the spiral. In the example, the spiral is 5 by 5 and the top-right value is 25. In a 3 by 3 spiral it would be 9, 49 in a 7 by 7 spiral and so on.

If we concentrate on the outer ring of the spiral, then the inner sum can be obtained by recursion, so we only need to find a trick for the top-left, bottom-left and bottom-right numbers.

They are 21, 17 and 13 in a 5x5 spiral, so it seems like they can be computed starting from the top-right number and repeatedly subtracting 4 (i.e. n - 1).

Let's check this theory with the 3x3 spiral: the top-right number is 9, the other corners are 7, 5 and 3. We are indeed subtracting 2 (i.e. n - 1) each time!

Let's summarize: given a n x n spiral we have to sum n^2, n^2 - (n-1), n^2 - 2(n-1) and n^2 - 3(n-1).

Some elementary math and we get 4n^2 - 6n + 6 for each ring of the spiral.
The base case of the recursion is the 1x1 spiral and in that case the (degenerate) diagonal sum is 1.

Given these premises it is very easy to write the F# code to implement the algorithm:

let rec euler28 n =
  match n with
  | 1 -> 1
  | _ -> euler28 (n-2) + (4 * n * n) - (6 * n) + 6

An improvement of the code above would be using tail recursion, which means that the last operation of the function should be a recursive call. This leads to a reduction of the stack space used from O(n) to O(1) and let us avoid stack overflow exceptions:

let tail_euler28 n =
  let rec spiral n sum =
    match n with
    | 1 -> sum + 1
    | _ -> spiral (n - 2) (sum + (4 * n * n) - (6 * n) + 6)
  spiral n 0

As it is usually done in tail recursion, we define an accumulator parameter (sum in the example) which is used to store the partial result of the computation, eliminating the need to keep the state on the stack.

In both solutions I've left a bug that can lead to an infinite loop. Can you spot it?

15Apr/091

Luhn algorithm in F#

The Luhn algorithm is a simple error detection formula based on the modulus operator which is widely used to validate credit card numbers or Canadian Social Insurance Numbers.

This checksum function can detect any single-digit error and operates verifying the number against its included check digit.

The algorithm is made of three steps:

1) starting from the rightmost digit and moving left, double the value of every second digit (or digits in even positions);
2) sum all resulting digits together with the original ones that were untouched;
3) if the result is a multiple of 10 (i.e. the result modulus 10 is zero) then the input value is valid.

This simple algorithm can be implemented with a few F# lines of code:

#light
open System

let double_digit n =
  let double = n * 2
  if (double > 9) then
    double - 9
  else
    double

let rec luhn_loop isEven acc input =
    match input with
    | [] -> acc % 10 = 0
    | x :: xs -> let num = Int32.Parse(x.ToString())
                 if (isEven) then
                   luhn_loop (not isEven) (acc + (double_digit num)) xs
                 else
                   luhn_loop (not isEven) (acc + num) xs

let luhn n = n.ToString() |> Seq.to_array |> Array.rev |> Seq.to_list |> luhn_loop false 0

I separated from the main loop the double_digit function, which takes a digit and multiplies it by two. If the result has two digits (i.e. it is greater than 9) with sum together the two digits (i.e. subtract 9 from the number), otherwise we simply return it.

The luhn_loop function iterates over the list of digit that is obtained by taking the original number, converting it into an array of digits and reversing it.
Inside each iteration we take into account a single digit, double it if it is placed in an even position and sum it to the accumulator.

When the list of digits to be examined is empty we are done with the loop and we just have to check if the value stored in the accumulator is evenly divisible by 10.

20Mar/092

Project Euler in F# – Problem 52

After a long break, let's resume with the Project Euler series.

The next one to be solved is Problem 52:

It can be seen that the number, 125874, and its double, 251748, contain exactly the same digits, but in a different order.

Find the smallest positive integer, x, such that 2x, 3x, 4x, 5x, and 6x, contain the same digits.

As usual, we have to break the problem into small pieces and solve these pieces one by one.

As in many other Project Euler problems we have to extract the digits of a number, and this is done by converting the number into a string (i.e. an array of characters) and then casting the characters to integers.

We also need a function to check if two numbers are made of the same digits. For each number we create a set from the sequence of its digits, in order to ignore the order of the items and discard duplicates. Then we apply the standard set comparison operator to get the result.

We have to multiply each positive integer by 2, 3, 4, 5 and 6, so we can define the multiples function, which takes an integer x and returns a list containing five elements, corresponding to 2x, 3x, 4x, 5x and 6x.

Thanks to these three functions, checking if a number and its multiples contain exactly the same digits is very easy. In fact, we generate the list of multiples and test all of them with the same_digits function.

The algorithm to get the answer to the problem is the following: generate an infinite list of integers (remember seq_unfold?), test each number with the check function and return the first value that passes the test.

#light

let digits n =
  n.ToString() |> Seq.map (string >> int)

let same_digits a b =
  Set.equal (Set.of_seq (digits a)) (Set.of_seq (digits b))

let multiples n =
  [2 .. 6] |> Seq.map (fun x -> n * x)

let check n =
  n |> multiples |> Seq.for_all (same_digits n)

let answer =
  1 |> Seq.unfold (fun i -> Some (i, i + 1)) |> Seq.first (fun x -> if check x then Some (x) else None)

There is also a well-known (and tricky) solution for this problem, which was presented in the book "The man who calculated". Did you know it?

Tagged as: , , , , 2 Comments
12Feb/092

Brainfuck interpreter in F#

Brainfuck is an esoteric programming language whose grammar consists of only eight commands, written as a single character each.

Besides this minimalistic structure, the language is Turing-complete but writing (and reading) Brainfuck code is very hard.

The Brainfuck machine uses a finite-length tape (usually 30000 byte cells long) and two pointers: an instruction pointer and a data pointer.

The former scans the source code and executes one instruction at a time, while the latter is used to increase or decrease the value of the cell that it is pointing.

This structure can be represented by the following F# type, called BFState:

type BFState =
  { mutable program : string ;        // program being interpreted
    mutable memory : int[] ;          // memory
    mutable pc : int ;                // current program counter
    mutable pos : int }               // current pointer position

We define then a function to initialize our Brainfuck machine given the size of the memory tape and the source code, that basically zeroes all memory cells and the two pointers:

let initState memSize code  =
  { program = code;
    memory = Array.zero_create memSize;
    pc = 0;
    pos = 0; }

We reach the end of the program when the program counter steps beyond the end of the code (i.e. the length of the string containing the source):

let isEnd (state : BFState) =
  state.pc >= state.program.Length

The following two functions are used to move the program counter one step towards the end or the beginning of the source code:

let nextCommand (state : BFState) =
  { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos }

let previousCommand (state : BFState) =
  { program = state.program ; memory = state.memory ; pc = state.pc - 1 ; pos = state.pos }

getMem and setMem are two auxiliary functions that can be used to retrieve or set the value of the memory cell pointed by the data pointer:

let getMem (state : BFState) =
  state.memory.[state.pos]

let setMem (state : BFState) value =
  state.memory.[state.pos] <- value
  state.memory

Similarly, we have two functions to read and write from the console, which is the behavior of the ',' and '.' commands respectively:

let outputByte (state : BFState) =
  Console.Write (Convert.ToChar(state.memory.[state.pos]))
  nextCommand state

let readByte (state : BFState) =
  state.memory.[state.pos] <- Console.Read()
  nextCommand state

When the command to perform is '[', if the byte at the data pointer is zero we have to jump to the first matching ']' character following the current position:

let rec moveToForwardMatch (state : BFState) =
  match state.program.[state.pc] with
  | ']' -> (nextCommand state)
  | _ -> moveToForwardMatch (nextCommand state)

The complementary command is ']', that goes back to the matching '[' character when the byte at the data pointer is non-zero:

let rec moveToPreviousMatch (state : BFState) =
  match state.program.[state.pc] with
  | '[' -> (nextCommand state)
  | _ -> moveToPreviousMatch (previousCommand state)

We have now all the tools required to parse Brainfuck code, so we only need to write the logic to select the action to perform according to the command analyzed in a single step.

Each command takes a BFState and returns a new BFState. Any character different from the eight allowed will be simply ignored:

let step (state : BFState) =
  match state.program.[state.pc] with
  | '+' -> { program = state.program ; memory = setMem state (getMem state + 1) ; pc = state.pc + 1 ; pos = state.pos }
  | '-' -> { program = state.program ; memory = setMem state (getMem state - 1) ; pc = state.pc + 1 ; pos = state.pos }
  | '<' -> { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos - 1}
  | '>' -> { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos + 1}
  | '[' -> if (state.memory.[state.pos] = 0) then moveToForwardMatch state else nextCommand state
  | ']' -> if (state.memory.[state.pos] <> 0) then moveToPreviousMatch state else nextCommand state
  | '.' -> outputByte state
  | ',' -> readByte state
  | _ -> nextCommand state // ignore any non-command character

The parser, which could be the only publicly available function, will take care of initializing the state and recursively step through the code, one instruction at a time, until the end of the source:

let parse memSize code =
  let rec run (state : BFState) =
    if (not (isEnd state)) then
       run (step state)
  initState memSize code |> run

You can find a repository of Brainfuck sample programs at this site: http://esoteric.sange.fi/brainfuck/bf-source/prog/, the following code contains the classic Hello World! and how to parse it using the interpreter we just wrote:

let code = ">+++++++++[<++++++++>-]<.>+++++++[<++++>-]<+.+++++++..+++.>>>++++++++[<++++>-]<.>>>++++++++++[<+++++++++>-]<---.<<<<.+++.------.--------.>>+."

parse 30000 code

The whole Brainfuck interpreter code can be found just below, feel free to take it and dissect it:

#light
open System

type BFState =
  { mutable program : string ;        // program being interpreted
    mutable memory : int[] ;          // memory
    mutable pc : int ;                // current program counter
    mutable pos : int }               // current pointer position

let initState memSize code  =
  { program = code;
    memory = Array.zero_create memSize;
    pc = 0;
    pos = 0; }

let isEnd (state : BFState) =
  state.pc >= state.program.Length

let nextCommand (state : BFState) =
  { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos }

let previousCommand (state : BFState) =
  { program = state.program ; memory = state.memory ; pc = state.pc - 1 ; pos = state.pos }

let getMem (state : BFState) =
  state.memory.[state.pos]

let setMem (state : BFState) value =
  state.memory.[state.pos] <- value
  state.memory

let rec moveToForwardMatch (state : BFState) =
  match state.program.[state.pc] with
  | ']' -> (nextCommand state)
  | _ -> moveToForwardMatch (nextCommand state)

let rec moveToPreviousMatch (state : BFState) =
  match state.program.[state.pc] with
  | '[' -> (nextCommand state)
  | _ -> moveToPreviousMatch (previousCommand state)

let outputByte (state : BFState) =
  Console.Write (Convert.ToChar(state.memory.[state.pos]))
  nextCommand state

let readByte (state : BFState) =
  state.memory.[state.pos] <- Console.Read()
  nextCommand state

let step (state : BFState) =
  match state.program.[state.pc] with
  | '+' -> { program = state.program ; memory = setMem state (getMem state + 1) ; pc = state.pc + 1 ; pos = state.pos }
  | '-' -> { program = state.program ; memory = setMem state (getMem state - 1) ; pc = state.pc + 1 ; pos = state.pos }
  | '<' -> { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos - 1}
  | '>' -> { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos + 1}
  | '[' -> if (state.memory.[state.pos] = 0) then moveToForwardMatch state else nextCommand state
  | ']' -> if (state.memory.[state.pos] <> 0) then moveToPreviousMatch state else nextCommand state
  | '.' -> outputByte state
  | ',' -> readByte state
  | _ -> nextCommand state // ignore any non-command character

let parse memSize code =
  let rec run (state : BFState) =
    if (not (isEnd state)) then
       run (step state)
  initState memSize code |> run
23Jan/093

Facebook FizzBuzz

Have you ever tried looking for "FizzBuzz" on a search engine?

If you do, you'll surely land on this page on Coding Horror, the blog written by Jeff Atwood.

To make a long story short, Jeff states that the vast majority of the developers is unable to write a tiny program that should take no more than 10 minutes to code.

This is the famous FizzBuzz problem:

Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

You should be outraged now, if you aren't I hope you don't make a living as a developer! :)

I found out that Facebook also considers FizzBuzz as a good starting test for job applicants. There is a programming puzzles page, where you can find a set of good problems, ranging from blindingly easy to very hard.

The first one is called HoppityHop! and it is just a variation of the good old FizzBuzz:

The program should iterate over all integers (inclusive) from 1 to the number expressed by the input file. For example, if the file contained the number 10, the submission should iterate over 1 through 10. At each integer value in this range, the program may possibly (based upon the following rules) output a single string terminating with a newline.

* For integers that are evenly divisible by three, output the exact string Hoppity, followed by a newline.
* For integers that are evenly divisible by five, output the exact string Hophop, followed by a newline.
* For integers that are evenly divisble by both three and five, do not do any of the above, but instead output the exact string Hop, followed by a newline.

Being a developer myself I couldn't resist writing a solution for this problem, obviously in F#:

#light

let HoppityHop n =
  let printHop x =
    match x with
    | x when (x % 15 = 0)-> printfn "Hop"
    | x when (x % 3 = 0) -> printfn "Hoppity"
    | x when (x % 5 = 0) -> printfn "Hophop"
    | _ -> ()
  [1 .. n] |> Seq.iter printHop

I know many of you will feel the urge to suggest a better solution, you're welcome, the comment area is yours to use!

28Dec/080

Random testing in F# with FsCheck

One of the emerging trends in software development is TDD or Test-Driven Development, a methodology based on writing tests first and then coding in order to pass the tests.

Besides unit testing libraries inherited by the .Net Framework, F# can now count on FsCheck, a random testing framework cloned from Haskell's QuickCheck.

A random testing library generates a set of test cases and attempts to falsify the properties defined by the developer.

This approach is not exhaustive but if all the tests of large enough test suite are passed we can safely assume that the code is correct.

Let's see how to get started using FsCheck to validate the RSA implementation written some time ago.

The first step is to download the latest release (currently 0.3) of FsCheck from the Download page.

You can either get the binaries or the source code, that also contains an example console application.

Assuming that you downloaded the binaries, you have then to open/create an F# application and add a Reference (Project - Add Reference) to the FsCheck.dll library file:

Project referencing FsCheck library

Project referencing FsCheck library

In order to use the methods provided by FsCheck we have to include its namescope into our code by adding a open FsCheck clause at the beginning of our program.

For the sake of example, let's fix the two distinct random prime numbers p and q and the public exponent e.

We have then to define our first property, that I'm going to call prop_rsa, which basically asserts that decrypting a message that was previously encrypted we get back the original message.

In order to run a property prop_myproperty we just have to run quickCheck myproperty, so in this case we'll run quickCheck prop_rsa.

#light
open System
open FsCheck

// RSA sample data
let p = 61
let q = 53
let e = 17
let n = p * q
let d = private_exponent e p q

let prop_rsa message =
  let encrypted = encrypt message e n
  decrypt encrypted d n = message

quickCheck prop_rsa

Console.ReadKey() |> ignore

If everything goes well, we should see a console window like the following one, with the number of tests passed (by default, FsCheck generates 100 test cases).

FsCheck shows the number of passed tests

FsCheck shows the number of passed tests

If a test fails, FsCheck will stop the execution and show which test failed. If you want to see all the generated test cases, you can run verboseCheck instead of quickCheck.

The next step should be writing a custom generator in order to generate not only the message but also the prime numbers and the public exponent, but I think we can cover that in a future post.

You can download the complete solution here.

17Dec/081

Project Euler in F# – Problem 55

As in the last post, in the description of Project Euler problem 55 there is a detailed step-by-step solution of the problem itself and we just have to write the F# code to perform the tasks.

The only thing that I really had to understand is the definition of Lychrel numbers:

If we take 47, reverse and add, 47 + 74 = 121, which is palindromic.

Not all numbers produce palindromes so quickly. For example,

349 + 943 = 1292,
1292 + 2921 = 4213
4213 + 3124 = 7337

That is, 349 took three iterations to arrive at a palindrome.

Although no one has proved it yet, it is thought that some numbers, like 196, never produce a palindrome. A number that never forms a palindrome through the reverse and add process is called a Lychrel number. Due to the theoretical nature of these numbers, and for the purpose of this problem, we shall assume that a number is Lychrel until proven otherwise. In addition you are given that for every number below ten-thousand, it will either (i) become a palindrome in less than fifty iterations, or, (ii) no one, with all the computing power that exists, has managed so far to map it to a palindrome. In fact, 10677 is the first number to be shown to require over fifty iterations before producing a palindrome: 4668731596684224866951378664 (53 iterations, 28-digits).

Surprisingly, there are palindromic numbers that are themselves Lychrel numbers; the first example is 4994.

How many Lychrel numbers are there below ten-thousand?

As usual, it is suggested to break down the problem into little pieces that can be easily solved with a few lines of code each.

First, we need a function to reverse an integer and another one to check whether a number is palindromic, i.e. it is equal to its reverse.

Then we define a function that will be used during each iteration to take a number and sum it with its reverse. This new number is the one that will be checked next for being a palindrome inside the isLychrel function.

If we found a palindrome then the number under testing is not a Lychrel, otherwise we can start the next iteration up to a maximum of 50 times. If we hit this limit then we can conclude the number is a Lychrel.

We have now all the pieces required to answer the original question. Just take all numbers smaller then 10000, filter the Lychrel ones and count them:

#light

let reverse n =
  n.ToString().ToCharArray()
  |> Array.rev
  |> Array.map (fun x -> x.ToString())
  |> Array.reduce_left (^)
  |> bigint.Parse

let isPalindromic n =
  n = reverse n

let reverseAndAdd n =
  n + reverse n

let isLychrel n =
  let rec isLychrelIter n i =
    match i with
    | 50 -> true
    | _ ->
      let r = reverseAndAdd n
      if isPalindromic r then
        false
      else
        isLychrelIter r (i+1)
  isLychrelIter n 1

let answer = [1I .. 10000I]
             |> Seq.filter (fun x -> isLychrel x)
             |> Seq.length