Claudio Cherubino's blog Life of a Googler


Project Euler in F# – Problem 36

A number (in general a word) is palindromic when it can be read in the same way from left to right and in the opposite direction, i.e. it is "symmetrical".

According to the definition, a number can be palindromic when written in base10 and not palindromic in base2, but there are some numbers that are palindromic in both bases.

In Project Euler Problem 36 we have to find all these numbers between 1 and 1 million:

The decimal number, 585 = 10010010012 (binary), is palindromic in both bases.

Find the sum of all numbers, less than one million, which are palindromic in base 10 and base 2.

(Please note that the palindromic number, in either base, may not include leading zeros.)

To solve this problem we need two helper functions, one to check whether a number is palindromic and another one to convert a number from base10 to base2:

open Microsoft.FSharp.Math

let isPalindromic n =
  let chars = n.ToString().ToCharArray()
  chars = Array.rev chars

let toBinary (n : BigInt) =
  let rec toBinary1 a v =
    match v with
    | 0I -> a
    | _ ->
      let q,r = BigInt.DivMod (v,2I)
      toBinary1 (r.ToString()::a) q
  toBinary1 [] n |> List.reduce_left (^) |> BigInt.Parse

let euler36 = [1I .. 1000000I] |>
              List.filter isPalindromic |>
              List.filter (fun x -> x |> toBinary |> isPalindromic) |>
              List.reduce_left (+)

The first function checks if a number is palindromic by converting it into an array of digits and checking if this array is equal to its reverse one.

Then we have the toBinary function that is based itself on an inner function called (with a lot of originality) toBinary1.

This recursive function uses the list a to store the remainders of the repeated divisions by 2, until the original number is over.

These digits, converted as strings, are joined with ^, the string concatenation operator and eventually the result is converted to a BigInt.

To find the answer to the problem we start with the list of all numbers between 1 and 1 million and take (filter) only the elements which are palindromic in base10.

After this first step, we filter the list again, this time keeping only those numbers that are palindromic when converted to base2.

The last step is the easiest, we only need to sum the numbers left to get the answer to the problem.


Decoding RLE in F#

Encoding techniques are useless if you don't have a way to restore the original message from the compressed data.

A couple of days ago, we saw how to implement the Run-Length Encoding (RLE) algorithm to encode an array of elements and now we are going to describe how to restore the original array.

To decode an array encoded with RLE we take the (length, element) couples, replicate each element by its length and then concatenate the output.

Let's see the F# implementation:


let decode src =
  let rec decode_block (num, elem) =
    match num with
    | 0 -> []
    | _ -> elem :: decode_block (num-1, elem)
    in List.concat ( decode_block src)

At line 8 we concatenate the output of the decode_block function applied to each couple of the src array.

Decode_block is a recursive inner function, which is very easy to understand. In fact, it just repeats elem the number of times indicated by num.

This is everything you need to know to encode and decode arrays (or anything else) by using the RLE algorithm. There are, of course, better compression techniques, if you want we can try to look at them in the future.


Run-Length Encoding in F#

According to Wikipedia:

Run-length encoding (RLE) is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs: for example, relatively simple graphic images such as icons, line drawings, and animations.

Run-length encoding is used in fax machines. It is relatively efficient because most faxed documents are mostly white space, with occasional interruptions of black.

We will implement now the RLE algorithm in F#, writing a function that will take a list as argument and return a new list of couples (N,E), where N is the number of duplicates of the element E.

Let's show the code:


let pack src =
  let rec pack2 src tmp out =
    match src with
    | [] ->
        if (tmp = [])
          then out
        else (List.append out [tmp])
    | h::t ->
      if (tmp = [])
        then (pack2 t [h] out)
        if (h = (List.hd tmp))
          then (pack2 t (h::tmp) out)
        else (pack2 t [h] (List.append out [tmp]))
    in (pack2 src [] [])

let encode src = (fun b -> (List.length b, List.hd b)) (pack src)

let source = ['a'; 'a'; 'a'; 'a'; 'b'; 'c'; 'c'; 'a'; 'a'; 'd'; 'e'; 'e'; 'e'; 'e']
let answer = source |> encode

First of all we need to pack consecutive duplicates of list elements into sublists, and this is done by the pack function.

Inside pack we define an inner function, called pack2, which is not accessible from outside this block. This also allows us to transform each call like "pack src" to "pack2 src [] []", adding two more arguments to the call.

This inner function moves elements from src to tmp while they are duplicates, and when a different element appears, the list contained in tmp is appended to out.

This process continues until the source list is exhausted, then the out list contains the packed output.

If you want to follow this procedure step-by-step, in the following image you can see the content of src, tmp and out at the beginning of each iteration, using ['a'; 'a'; 'a'; 'a'; 'b'; 'c'; 'c'; 'a'; 'a'; 'd'; 'e'; 'e'; 'e'; 'e'] as input.

Run-length encoding in F#

Now, the actual run-length encoding is trivial. We just have to pack the source list and then apply a function on each element that takes a list of duplicates and returns a couple (N,E), where N is the number of elements and E is the first element (they are all the same).

In our case the final result will be:

[(4, 'a'); (1, 'b'); (2, 'c'); (2, 'a'); (1, 'd'); (4, 'e')]

In the next article I'll present the decoding algorithm, even if I'm quite sure you can easily write it on your own.