## Spell checking with F#

Some months ago I wrote an introductory article on F# for the "IoProgrammo" magazine (sorry, Italian only!) and now I have published a second article in the latest issue of the same magazine.

This new article covers more advanced topics and is focused on writing a basic spell checker that mixes together the functional and object-oriented programming paradigms.

The spell checking algorithm is implemented in functional F# and is based on the Jaro-Winkler distance, while the UI is WPF-based and written with OO code.

I hope you will appreciate the article and I'll be very happy to get any feedback from the readers.

## Functional programming interview question

I think that by examining the hiring process of a company you can understand a lot about what working there would be like.

As *Joel Spolsky* wrote, you should only hire people who are *Smart and Get Things Done*, and a good way to check that a candidate belongs to this category is testing his/her skills with a good programming exercise, one easy enough to be solved in 15 minutes but that still requires some thought.

Starling Software clearly describes its interview process on a page of its website and proposes a couple of sample programs for the potential applicants. In the first problem you are asked to use Haskell to process a given file (obviously we will use F#):

The file input.txt contains lists of words, one per line, in two categories, NUMBERS and ANIMALS. A line containing just a category name indicates that the words on the following lines, until the next category name, belong to that category. Read this file as input (on stdin) and print out a) a sorted list of the unique animal names encountered, and b) a list of the number words encountered, along with the count of each. Feel free to chose your output format.

The algorithm to solve the exercise is easy: we read the file line by line and remember the current category (NUMBERS or ANIMALS) in order to add the next words to the appropriate list. When the file is over, we filter duplicates and sort the list of animals and group the numbers together with their counts.

The only problem is knowing how to manage the concept of application *state* in a functional way. In the imperative paradigm you define a variable to keep the state and change its value when you find a new category in the input file. In functional programming you don't use state variables; instead you use function parameters and recursive calls:

```fsharp
open System
open System.IO

let animals_and_number filename =
    // Walks the lines, carrying the current category and both result lists as parameters
    let rec process_line lines category animals numbers =
        match lines with
        | [] -> (animals, numbers)
        | x :: xs ->
            match x with
            | "NUMBERS" -> process_line xs "NUMBERS" animals numbers
            | "ANIMALS" -> process_line xs "ANIMALS" animals numbers
            | x ->
                match category with
                | "NUMBERS" -> process_line xs category animals (x :: numbers)
                | "ANIMALS" -> process_line xs category (x :: animals) numbers
                | _ -> process_line xs category animals numbers
    // Seq.toList is the current name of the Seq.to_list function used in early F# releases
    let all_lines = File.ReadAllLines(filename) |> Seq.toList
    process_line all_lines "" [] []

let filename = "input.txt"
let (animals, numbers) = animals_and_number filename
let sorted_animals = animals |> Seq.distinct |> Seq.sort |> Seq.toList
let counted_words = numbers |> Seq.countBy (fun x -> x) |> Seq.toList
printf "Animals: %A\n" sorted_animals
printf "Numbers: %A" counted_words
```

The recursive *process_line* function has four parameters: the list of lines to be processed, the current category (initially an empty string) and the two lists of animals and numbers found so far.

For each new line we first check if it represents one of the categories. In this case we have to *change state*, i.e. discard the element and recursively call the same function with the correct category parameter.

If the element processed is not a category we only have to add it to the animals or number list, according to the value of the category parameter.

When the recursion ends (that is, when *lines* is empty), *process_line* returns a tuple made of the two lists created, which is also the result of the *animals_and_number* function.

The rest of the job is calling some standard library functions to filter duplicates, sort and count the elements of the sequences.
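The same state-threading idea can also be written as a fold, where the accumulator plays the role of the function parameters. The following is only a sketch under some assumptions: the `Category` type, the `classify` function and the in-memory `sampleLines` list are illustrative names that do not appear in the original solution.

```fsharp
// Hypothetical fold-based variant: the accumulator tuple carries the "state".
type Category = NoCategory | Numbers | Animals

let classify (lines : string list) =
    let step (category, animals, numbers) line =
        match line, category with
        | "NUMBERS", _ -> (Numbers, animals, numbers)
        | "ANIMALS", _ -> (Animals, animals, numbers)
        | word, Numbers -> (category, animals, word :: numbers)
        | word, Animals -> (category, word :: animals, numbers)
        | _, NoCategory -> (category, animals, numbers)   // words before any category are ignored
    let (_, animals, numbers) = List.fold step (NoCategory, [], []) lines
    (List.rev animals, List.rev numbers)

// Sample data standing in for the contents of input.txt
let sampleLines = [ "NUMBERS"; "one"; "two"; "ANIMALS"; "cat"; "dog"; "cat"; "NUMBERS"; "one" ]

let (foundAnimals, foundNumbers) = classify sampleLines
let sortedAnimals = foundAnimals |> List.distinct |> List.sort
let countedNumbers = foundNumbers |> List.countBy id

printfn "Animals: %A" sortedAnimals
printfn "Numbers: %A" countedNumbers
```

Using a small union type instead of strings for the category lets the compiler check that every state is handled.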

## F# introductory article on IoProgrammo magazine

If you understand Italian, you may be interested in reading an introductory article on F# I wrote for the most important Italian programming magazine called "*IoProgrammo*" and that was published a few days ago in the August issue.

The article covers the very first steps with F#, from installation to writing a first working sample application, which can be used to encrypt/decrypt messages using the Vigenere cipher.
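For readers who cannot read the article, here is a rough idea of what a Vigenere cipher looks like in F#. This is not the code from the magazine: the `vigenere`, `vigenereEncrypt` and `vigenereDecrypt` names are made up for this sketch, and it assumes that both key and text contain only the upper-case letters A-Z and that the key is not empty.

```fsharp
// Hypothetical sketch: shifts each letter by the matching key letter.
// direction is 1 to encrypt and -1 to decrypt.
let vigenere direction (key : string) (text : string) =
    text
    |> String.mapi (fun i c ->
        let k = int key.[i % key.Length] - int 'A'               // shift given by the key letter
        let shifted = (int c - int 'A' + direction * k + 26) % 26
        char (shifted + int 'A'))

let vigenereEncrypt = vigenere 1
let vigenereDecrypt = vigenere (-1)

let secret = vigenereEncrypt "LEMON" "ATTACKATDAWN"
let plain = vigenereDecrypt "LEMON" secret
printfn "%s %s" secret plain
```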

I hope you find it useful; if you have any comments, I'll be glad to hear from you.

## Merging arrays

Thanks to interviewpattern.com I discovered that one of the classical Amazon interview questions is writing a snippet of code to merge two sorted arrays:

"Suppose we have two sorted arrays A[] of m elements and B[] of n elements. Write a function merge which would merge this two arrays into new sorted array C[] in O(n) time as shown on the picture".

This problem is also a classical exercise for functional programming learners, one that shows the conciseness of functional code compared with its imperative counterpart.

The solution presented on the original page is written in C# and is longer than 40 lines of code, while we can solve the same problem in F# (working on lists rather than arrays) with fewer than 10 lines:

```fsharp
let rec merge_arrays a b =
    match (a, b) with
    | (a, []) -> a
    | ([], b) -> b
    | (x::xs, y::ys) ->
        if (x < y) then (x :: (merge_arrays xs (y::ys)))
        else (y :: (merge_arrays (x::xs) ys))
```

Besides being shorter, I also find the functional code to be much easier to understand. Do you agree with me?
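As a quick check, the snippet below repeats the function so that it runs on its own and applies it to two small lists. It also includes an accumulator-based variant, `mergeSorted`, which is a hypothetical addition rather than part of the original solution: it is tail recursive, which may matter for very long inputs.

```fsharp
// Same function as above, repeated so the snippet is self-contained.
let rec merge_arrays a b =
    match (a, b) with
    | (a, []) -> a
    | ([], b) -> b
    | (x :: xs, y :: ys) ->
        if x < y then x :: merge_arrays xs (y :: ys)
        else y :: merge_arrays (x :: xs) ys

// Hypothetical tail-recursive variant: builds the result reversed in an accumulator.
let mergeSorted a b =
    let rec loop acc a b =
        match (a, b) with
        | ([], rest) | (rest, []) -> List.rev acc @ rest
        | (x :: xs, y :: _) when x < y -> loop (x :: acc) xs b
        | (_, y :: ys) -> loop (y :: acc) a ys
    loop [] a b

let merged = merge_arrays [1; 3; 5; 7] [2; 4; 6]
let mergedTail = mergeSorted [1; 3; 5; 7] [2; 4; 6]
printfn "%A" merged
printfn "%A" mergedTail
```

Both versions visit each element once, so the running time is proportional to the total number of elements in the two inputs.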

## Luhn algorithm in F#

The Luhn algorithm is a simple error detection formula based on the modulus operator which is widely used to validate credit card numbers or Canadian Social Insurance Numbers.

This checksum function can detect any single-digit error and works by verifying the number against its included check digit.

The algorithm is made of three steps:

1) starting from the rightmost digit and moving left, double the value of every second digit (or digits in even positions);

2) sum all resulting digits together with the original ones that were untouched;

3) if the result is a multiple of 10 (i.e. the result modulus 10 is zero) then the input value is valid.
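The three steps can be sketched directly as a small pipeline. The `luhnValid` name is hypothetical, and the sketch assumes that the input string contains only decimal digits; the sample value 79927398713 is a commonly used Luhn test number.

```fsharp
// Hypothetical helper: applies the three steps to a string of decimal digits.
let luhnValid (digits : string) =
    let total =
        digits
        |> Seq.rev                                   // step 1: start from the rightmost digit
        |> Seq.mapi (fun i c ->
            let d = int c - int '0'
            if i % 2 = 1 then                        // every second digit, moving left
                let doubled = d * 2
                if doubled > 9 then doubled - 9 else doubled
            else d)
        |> Seq.sum                                   // step 2: sum all resulting digits
    total % 10 = 0                                   // step 3: valid if a multiple of 10

let validSample = luhnValid "79927398713"
let invalidSample = luhnValid "79927398710"
printfn "%b %b" validSample invalidSample
```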

This simple algorithm can be implemented with a few F# lines of code:

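```fsharp
#light
open System

// Doubles a digit; a two-digit result is reduced by subtracting 9
let double_digit n =
    let doubled = n * 2
    if (doubled > 9) then doubled - 9 else doubled

// The type annotation lets the compiler resolve x.ToString()
let rec luhn_loop isEven acc (input : char list) =
    match input with
    | [] -> acc % 10 = 0
    | x :: xs ->
        let num = Int32.Parse(x.ToString())
        if (isEven) then luhn_loop (not isEven) (acc + (double_digit num)) xs
        else luhn_loop (not isEven) (acc + num) xs

// Seq.toArray and Array.toList are the current names of the old to_array / to_list functions
let luhn n =
    string n
    |> Seq.toArray
    |> Array.rev
    |> Array.toList
    |> luhn_loop false 0
```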

I separated the *double_digit* function from the main loop. It takes a digit and multiplies it by two. If the result has two digits (i.e. it is greater than 9) we sum together the two digits (i.e. subtract 9 from the number), otherwise we simply return it.

The *luhn_loop* function iterates over the list of digits that is obtained by taking the original number, converting it into an array of digits and reversing it.

Inside each iteration we take into account a single digit, double it if it is placed in an even position and add it to the accumulator.

When the list of digits to be examined is empty we are done with the loop and we just have to check if the value stored in the accumulator is evenly divisible by 10.

## Brainfuck interpreter in F#

Brainfuck is an *esoteric programming language* whose grammar consists of only eight commands, written as a single character each.

Despite this minimalistic structure, the language is *Turing-complete*, but writing (and reading) Brainfuck code is very hard.

The Brainfuck machine uses a finite-length tape (usually 30000 byte cells long) and two pointers: an instruction pointer and a data pointer.

The former scans the source code and executes one instruction at a time, while the latter is used to increase or decrease the value of the cell that it is pointing to.

This structure can be represented by the following F# type, called *BFState*:

```fsharp
type BFState =
    { mutable program : string   // program being interpreted
      mutable memory : int[]     // memory
      mutable pc : int           // current program counter
      mutable pos : int }        // current pointer position
```

We then define a function to initialize our Brainfuck machine given the size of the memory tape and the source code, which basically zeroes all memory cells and the two pointers:

```fsharp
// Array.zeroCreate is the current name of the old Array.zero_create function
let initState memSize code =
    { program = code
      memory = Array.zeroCreate memSize
      pc = 0
      pos = 0 }
```

We reach the end of the program when the program counter steps beyond the end of the code (i.e. the length of the string containing the source):

```fsharp
let isEnd (state : BFState) = state.pc >= state.program.Length
```

The following two functions are used to move the program counter one step towards the end or the beginning of the source code:

```fsharp
let nextCommand (state : BFState) =
    { program = state.program
      memory = state.memory
      pc = state.pc + 1
      pos = state.pos }

let previousCommand (state : BFState) =
    { program = state.program
      memory = state.memory
      pc = state.pc - 1
      pos = state.pos }
```

*getMem* and *setMem* are two auxiliary functions that can be used to retrieve or set the value of the memory cell pointed to by the data pointer:

```fsharp
let getMem (state : BFState) = state.memory.[state.pos]

// Updates the current cell in place and returns the (same) memory array
let setMem (state : BFState) value =
    state.memory.[state.pos] <- value
    state.memory
```

Similarly, we have two functions to read and write from the console, which is the behavior of the ',' and '.' commands respectively:

```fsharp
open System   // needed for Console and Convert

let outputByte (state : BFState) =
    Console.Write (Convert.ToChar(state.memory.[state.pos]))
    nextCommand state

let readByte (state : BFState) =
    state.memory.[state.pos] <- Console.Read()
    nextCommand state
```

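When the command to perform is '[' and the byte at the data pointer is zero, we have to jump past the first ']' character following the current position. Note that this simple scan does not keep track of nesting depth, so it only handles programs without nested loops: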

```fsharp
let rec moveToForwardMatch (state : BFState) =
    match state.program.[state.pc] with
    | ']' -> (nextCommand state)
    | _ -> moveToForwardMatch (nextCommand state)
```

The complementary command is ']', which goes back to the matching '[' character when the byte at the data pointer is non-zero:

```fsharp
let rec moveToPreviousMatch (state : BFState) =
    match state.program.[state.pc] with
    | '[' -> (nextCommand state)
    | _ -> moveToPreviousMatch (previousCommand state)
```
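As a sketch of how nested loops could be supported, the bracket search can count how many brackets are currently open. The `findClosing` and `findOpening` helpers below are hypothetical and independent of *BFState*: they work on the program text and an index, and they assume that the brackets in the program are balanced.

```fsharp
// Hypothetical helper: index of the ']' that closes the '[' found at position start.
let findClosing (program : string) start =
    let rec scan i depth =
        match program.[i] with
        | '[' -> scan (i + 1) (depth + 1)
        | ']' when depth = 1 -> i
        | ']' -> scan (i + 1) (depth - 1)
        | _ -> scan (i + 1) depth
    scan start 0

// Hypothetical helper: index of the '[' that opens the ']' found at position start.
let findOpening (program : string) start =
    let rec scan i depth =
        match program.[i] with
        | ']' -> scan (i - 1) (depth + 1)
        | '[' when depth = 1 -> i
        | '[' -> scan (i - 1) (depth - 1)
        | _ -> scan (i - 1) depth
    scan start 0

// A program with one loop nested inside another
let nested = "+[>[-]<-]"
let outerClose = findClosing nested 1
let outerOpen = findOpening nested 8
let innerClose = findClosing nested 3
printfn "%d %d %d" outerClose outerOpen innerClose
```

In the interpreter, a jump would then set the program counter to the index returned by these helpers plus one.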

We now have all the tools required to parse Brainfuck code, so we only need to write the logic to select the action to perform according to the command analyzed in a single step.

Each command takes a *BFState* and returns a new *BFState*. Any character different from the eight allowed will be simply ignored:

```fsharp
let step (state : BFState) =
    match state.program.[state.pc] with
    | '+' -> { program = state.program ; memory = setMem state (getMem state + 1) ; pc = state.pc + 1 ; pos = state.pos }
    | '-' -> { program = state.program ; memory = setMem state (getMem state - 1) ; pc = state.pc + 1 ; pos = state.pos }
    | '<' -> { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos - 1 }
    | '>' -> { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos + 1 }
    | '[' -> if (state.memory.[state.pos] = 0) then moveToForwardMatch state else nextCommand state
    | ']' -> if (state.memory.[state.pos] <> 0) then moveToPreviousMatch state else nextCommand state
    | '.' -> outputByte state
    | ',' -> readByte state
    | _ -> nextCommand state // ignore any non-command character
```

The parser, which could be the only publicly available function, takes care of initializing the state and recursively stepping through the code, one instruction at a time, until the end of the source:

```fsharp
let parse memSize code =
    let rec run (state : BFState) =
        if (not (isEnd state)) then run (step state)
    initState memSize code |> run
```

You can find a repository of Brainfuck sample programs at http://esoteric.sange.fi/brainfuck/bf-source/prog/. The following code contains the classic *Hello World!* program and shows how to run it with the interpreter we just wrote:

```fsharp
let code = ">+++++++++[<++++++++>-]<.>+++++++[<++++>-]<+.+++++++..+++.>>>++++++++[<++++>-]<.>>>++++++++++[<+++++++++>-]<---.<<<<.+++.------.--------.>>+."
parse 30000 code
```

The whole Brainfuck interpreter code can be found just below; feel free to take it and dissect it:

```fsharp
#light
open System

type BFState =
    { mutable program : string   // program being interpreted
      mutable memory : int[]     // memory
      mutable pc : int           // current program counter
      mutable pos : int }        // current pointer position

// Array.zeroCreate is the current name of the old Array.zero_create function
let initState memSize code =
    { program = code
      memory = Array.zeroCreate memSize
      pc = 0
      pos = 0 }

let isEnd (state : BFState) = state.pc >= state.program.Length

let nextCommand (state : BFState) =
    { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos }

let previousCommand (state : BFState) =
    { program = state.program ; memory = state.memory ; pc = state.pc - 1 ; pos = state.pos }

let getMem (state : BFState) = state.memory.[state.pos]

let setMem (state : BFState) value =
    state.memory.[state.pos] <- value
    state.memory

let rec moveToForwardMatch (state : BFState) =
    match state.program.[state.pc] with
    | ']' -> (nextCommand state)
    | _ -> moveToForwardMatch (nextCommand state)

let rec moveToPreviousMatch (state : BFState) =
    match state.program.[state.pc] with
    | '[' -> (nextCommand state)
    | _ -> moveToPreviousMatch (previousCommand state)

let outputByte (state : BFState) =
    Console.Write (Convert.ToChar(state.memory.[state.pos]))
    nextCommand state

let readByte (state : BFState) =
    state.memory.[state.pos] <- Console.Read()
    nextCommand state

let step (state : BFState) =
    match state.program.[state.pc] with
    | '+' -> { program = state.program ; memory = setMem state (getMem state + 1) ; pc = state.pc + 1 ; pos = state.pos }
    | '-' -> { program = state.program ; memory = setMem state (getMem state - 1) ; pc = state.pc + 1 ; pos = state.pos }
    | '<' -> { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos - 1 }
    | '>' -> { program = state.program ; memory = state.memory ; pc = state.pc + 1 ; pos = state.pos + 1 }
    | '[' -> if (state.memory.[state.pos] = 0) then moveToForwardMatch state else nextCommand state
    | ']' -> if (state.memory.[state.pos] <> 0) then moveToPreviousMatch state else nextCommand state
    | '.' -> outputByte state
    | ',' -> readByte state
    | _ -> nextCommand state // ignore any non-command character

let parse memSize code =
    let rec run (state : BFState) =
        if (not (isEnd state)) then run (step state)
    initState memSize code |> run
```

## Facebook FizzBuzz

Have you ever tried looking for "**FizzBuzz**" on a search engine?

If you do, you'll surely land on this page on Coding Horror, the blog written by **Jeff Atwood**.

To make a long story short, Jeff states that the vast majority of developers are unable to write a tiny program that should take no more than 10 minutes to code.

This is the *famous* FizzBuzz problem:

Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

You should be outraged now; if you aren't, I hope you don't make a living as a developer!
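For reference, the problem quoted above can be sketched in a few lines of F#. The `fizzBuzz` and `fizzBuzzLines` names are just illustrative choices for this example.

```fsharp
// Maps a number to the text FizzBuzz asks for.
let fizzBuzz n =
    match n % 3, n % 5 with
    | 0, 0 -> "FizzBuzz"   // multiple of both three and five
    | 0, _ -> "Fizz"       // multiple of three only
    | _, 0 -> "Buzz"       // multiple of five only
    | _ -> string n

let fizzBuzzLines = [1 .. 100] |> List.map fizzBuzz
fizzBuzzLines |> List.iter (printfn "%s")
```

Matching on the pair of remainders checks the combined case first, so no number is ever printed as plain "Fizz" or "Buzz" when it should be "FizzBuzz".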

I found out that Facebook also considers FizzBuzz a good starting test for job applicants. There is a programming puzzles page, where you can find a set of good problems, ranging from blindingly easy to very hard.

The first one is called HoppityHop! and it is just a variation of the good old FizzBuzz:

The program should iterate over all integers (inclusive) from 1 to the number expressed by the input file. For example, if the file contained the number 10, the submission should iterate over 1 through 10. At each integer value in this range, the program may possibly (based upon the following rules) output a single string terminating with a newline.

* For integers that are evenly divisible by three, output the exact string Hoppity, followed by a newline.

* For integers that are evenly divisible by five, output the exact string Hophop, followed by a newline.

* For integers that are evenly divisible by both three and five, do not do any of the above, but instead output the exact string Hop, followed by a newline.

Being a developer myself I couldn't resist writing a solution for this problem, obviously in F#:

```fsharp
#light

let HoppityHop n =
    let printHop x =
        match x with
        | x when (x % 15 = 0) -> printfn "Hop"
        | x when (x % 3 = 0) -> printfn "Hoppity"
        | x when (x % 5 = 0) -> printfn "Hophop"
        | _ -> ()
    [1 .. n] |> Seq.iter printHop
```

I know many of you will feel the urge to suggest a better solution. You're welcome to: the comment area is yours to use!

## Random testing in F# with FsCheck

One of the emerging trends in software development is **TDD** or **Test-Driven Development**, a methodology based on writing tests first and then coding in order to pass the tests.

Besides the unit testing libraries inherited from the .Net Framework, F# can now count on FsCheck, a random testing framework cloned from Haskell's QuickCheck.

A random testing library generates a set of test cases and attempts to falsify the properties defined by the developer.

This approach is not exhaustive, but if a large enough set of generated tests passes we can be reasonably confident that the code is correct.
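To make the idea concrete, here is a tiny hand-rolled checker. It is only an illustration of random testing and is NOT the FsCheck API: `checkProperty` and the two sample properties are hypothetical names, and the generator is assumed to produce integers in a small fixed range.

```fsharp
open System

// Hypothetical mini-checker: runs a property on random ints and returns
// the first counterexample found, or None if every generated case passes.
let checkProperty (rng : Random) count (property : int -> bool) =
    Seq.init count (fun _ -> rng.Next(-1000, 1000))
    |> Seq.tryFind (fun x -> not (property x))

// Holds for every int in the generated range
let absIsNonNegative x = abs x >= 0

// Deliberately wrong claim: fails for zero and for every negative number
let doublingGrows x = x * 2 > x

let rng = Random(42)   // fixed seed so that a run can be repeated
let passing = checkProperty rng 100 absIsNonNegative
let failing = checkProperty rng 100 doublingGrows
printfn "%A" passing
printfn "%A" failing
```

A real framework adds a lot on top of this, such as generators for arbitrary types and reporting of failing cases.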

Let's see how to get started using FsCheck to validate the RSA implementation written some time ago.

The first step is to download the latest release (currently 0.3) of FsCheck from the Download page.

You can either get the binaries or the source code, which also contains an example console application.

Assuming that you downloaded the binaries, you then have to open or create an F# application and add a Reference (*Project - Add Reference*) to the FsCheck.dll library file.

In order to use the methods provided by FsCheck we have to bring its namespace into our code by adding an *open FsCheck* clause at the beginning of our program.

For the sake of example, let's fix the two distinct random prime numbers *p* and *q* and the public exponent *e*.

We then have to define our first property, which I'm going to call *prop_rsa*. It basically asserts that decrypting a previously encrypted message gives back the original message.

In order to run a property *prop_myproperty* we just have to run *quickCheck prop_myproperty*, so in this case we'll run *quickCheck prop_rsa*.

```fsharp
#light
open System
open FsCheck

// private_exponent, encrypt and decrypt come from the RSA implementation

// RSA sample data
let p = 61
let q = 53
let e = 17
let n = p * q
let d = private_exponent e p q

let prop_rsa message =
    let encrypted = encrypt message e n
    decrypt encrypted d n = message

quickCheck prop_rsa
Console.ReadKey() |> ignore
```

If everything goes well, we should see a console window reporting the number of tests passed (by default, FsCheck generates 100 test cases).

If a test fails, FsCheck will stop the execution and show which test failed. If you want to see all the generated test cases, you can run *verboseCheck* instead of *quickCheck*.

The next step should be writing a custom generator in order to generate not only the message but also the prime numbers and the public exponent, but I think we can cover that in a future post.

You can **download the complete solution** here.

## Implementation of RSA in F#

During my university course I had to learn and use two functional programming languages: **Haskell** and **Scheme**. I fell in love with the former but I never managed to do the same with the syntax of Scheme and the incredibly huge number of parentheses you have to type in order for your code to work!

That's why I decided to ~~steal~~ borrow a short implementation of the RSA algorithm written in Scheme and translate it into F#.

As you may know, RSA is an algorithm used for **public-key encryption** based on prime numbers and factorization that is widely used on the web to secure e-commerce transactions.

At the end of the following code there is also a working example that can be executed to better understand the steps required to encrypt and decrypt a message (in this case a single number):

```fsharp
#light

// Modulus operator that handles negative numbers correctly
let modulus n m = ((n % m) + m) % m

// Greatest Common Divisor
let rec gcd a b =
    match b with
    | b when b = 0 -> a
    | b -> gcd b (a % b)

// extended_gcd = (x,y), such that a*x + b*y = gcd(a,b)
let rec extended_gcd a b =
    if (a % b = 0) then (0, 1)
    else
        let (x, y) = extended_gcd b (a % b)
        (y, x - y * (a / b))

// modulo_inverse(a,n) = b, such that a*b = 1 [mod n]
let modulo_inverse a n =
    let (x, y) = extended_gcd a n
    modulus x n

// totient(n) = (p - 1)*(q - 1),
// where pq is the prime factorization of n.
let totient p q = (p - 1) * (q - 1)

// square(x) = x^2
let square x = x * x

// modulo-power(base,exp,n) = base^exp [mod n]
let rec modulo_power b exp n =
    if (exp = 0) then 1
    else if (exp % 2 = 1) then (((modulo_power b (exp - 1) n) * b) % n)
    else ((square (modulo_power b (exp / 2) n)) % n)

// RSA routines.
// A legal public exponent e is between
// 1 and totient(n), and gcd(e,totient(n)) = 1
let is_legal_public_exponent e p q =
    (1 < e) && (e < (totient p q)) && (1 = (gcd e (totient p q)))

// The private exponent is the inverse of the public exponent, mod totient(n).
let private_exponent e p q =
    if (is_legal_public_exponent e p q) then modulo_inverse e (totient p q)
    else raise (new System.Exception("Not a legal public exponent for that modulus"))

// An encrypted message is c = m^e [mod n]; the message must be smaller than n
let encrypt m e n =
    if (m >= n) then raise (new System.Exception("The modulus is too small to encrypt the message"))
    else modulo_power m e n

// A decrypted message is m = c^d [mod n]
let decrypt c d n = modulo_power c d n

// RSA example.
let p = 61
let q = 53
let n = p * q
let e = 17
let d = private_exponent e p q
let message = 123
let c = encrypt message e n
let m = decrypt c d n
```

I think the code presented is self-explanatory and each function is briefly described in the comments, but there is a strange thing that you may have noticed.

The first function I defined is the *modulus* operator, but F# already has a *modulus* operator, so why bother?

Before writing that function I was going crazy, since everything else was already written but the encryption was not working properly. I debugged and tested again and again and eventually I found out that sometimes the result of the native modulo operation was a negative number.

After a short lookup on Wikipedia, I discovered that each programming language implements the modulo operator differently and while in Scheme the result has the same sign as the divisor, in F# the sign of the result is the same as the dividend!
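The difference is easy to see with the values of the example: for e = 17 and a totient of 3120, the extended Euclidean algorithm yields the coefficient -367. The sketch below repeats the *modulus* helper so that it runs on its own; `nativeResult` and `wrappedResult` are names chosen only for this illustration.

```fsharp
// Same helper as in the listing above, repeated so the snippet is self-contained.
let modulus n m = ((n % m) + m) % m

// -367 is the coefficient that extended_gcd 17 3120 returns for the public exponent.
let nativeResult = -367 % 3120            // the native operator keeps the sign of the dividend
let wrappedResult = modulus (-367) 3120   // the helper always lands in [0, m) for positive m

printfn "%d %d" nativeResult wrappedResult
// The wrapped value is a valid private exponent: e * d = 1 [mod totient]
printfn "%d" ((17 * wrappedResult) % 3120)
```

With the native operator the private exponent would be negative, which is why the encryption did not work before the helper was introduced.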

## F# September CTP released!

F# development has finally reached **CTP** (**Community Technology Preview**) release and this actually makes F# a first-class citizen in the .Net ecosystem.

**Don Syme** himself made this important announcement on his blog, where he describes some of the major improvements, which include a new and improved **F# Project system** and language service.

The detailed release notes can be found here, while the installer can be downloaded from the brand new **Microsoft F# Developer Center**.