Lately I’ve taken a closer look at Julien’s F# implementations of Chris Okasaki’s Purely Functional Data Structures^{1}. `AltBinaryRandomAccessList`

^{2} caught my eye because judging from its existing `update`

function it looked pretty simple to implement a `remove`

function. As you may have noticed in all the major collections of F# data structures –FSharp.Core.Collections, FSharpx.core.DataStructures, and PowerPack — `remove`

is pretty rare. And both `update`

and `remove`

seem to be neglected stepchildren in the purely functional data structure world. (By `update`

and `remove`

I am referring to functions updating and removing any element by index.)

Let’s look at Okasaki’s “pseudo-canonical”^{3} update for `list`

, implemented in F#.

```
```

```
let rec loop i y (l:'a list) =
match (i,y,l) with
| i',y,[] -> raise (System.Exception("subscript"))
| 0,y',x::xs -> y::xs
| i',y,x::xs -> x::(loop (i'-1) y xs)
```

```
```

Do you see the problem? I’ll get back to it below.

How else might we implement update? We could just *punt* by converting to `Array`

, updating, and back to `list`

.

```
```

```
let punt i y (l:’a list) =
let a = List.toArray l
a.[i] <- y
List.ofArray a
```

```
```

And our last candidate is a hybrid, in which we progressively take the `tail`

, collecting each `head`

in an `Array`

, until we pass the element for update. Then `cons`

the new element and the elements from the `Array`

.

```
```

```
let hybrid i y (l:'a list) =
if (i = 0) then List.Cons (y, (List.tail l))
else
let rec loop i' (front:'a array) back =
match i' with
| x when x < 0 -> front, (List.tail back)
| x ->
Array.set front x (List.head back)
loop (x-1) front (List.tail back)
let front', back' = loop (i - 1) (Array.create i y) l
let rec loop2 i' frontLen (front'':'a array) back'' =
match i' with
| x when x > frontLen -> back''
| x -> loop2 (x + 1) frontLen front'' (front''.[x]::back'')
loop2 0 ((Seq.length front') - 1) front' (y::back')
```

```
```

`AltBinaryRandomAccessList`

is a structure with the same basic functions as `List`

, but optimized for `update`

. Here’s the type structure:

```
```

```
type AltBinRndAccList<'a> =
| Nil
| Zero of AltBinRndAccList<'a * 'a>
| One of 'a * AltBinRndAccList<'a * 'a>
```

```
```

The element grouping represents the binary number of the count of elements, with least significant digit to the left^{4}. This allows for a member update function which runs in *O(log n)* time. (The best we have seen so far is *O(i)*, where *i* is the index of the target element, and *punt* runs in *O(n)*).

```
```

```
static member fupdate : ('a -> 'a) * int * AltBinRndAccList<'a> -> AltBinRndAccList<'a> = function
| f, i, Nil -> raise Subscript
| f, 0, One(x, ps) -> One(f x, ps)
| f, i, One (x, ps) -> AltBinRndAccList.cons x (AltBinRndAccList.fupdate (f, i-1, Zero ps))
| f, i, Zero ps ->
let f' (x, y) = if i % 2= 0 then f x, y else x, f y
Zero(AltBinRndAccList.fupdate(f', i/2, ps))
```

```
```

Now lets look at actual performance for our `update`

alternatives^{5}.

Size | 10k Random Updates | One-time Worst Case | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|

10^{2} |
PC | - | 2.9 | ms | Punt | - | 0.2 | ms | |||

Hybrid | 1.4 | X | 4.0 | PC | 1.1 | X | 0.2 | ||||

Punt | 1.5 | 4.5 | Hybrid | 4.1 | 0.8 | ||||||

RndAcc | 1.8 | 5.3 | RndAcc | 7.7 | 1.5 | ||||||

10^{3} |
RndAcc | - | 8.2 | Punt | - | 0.2 | |||||

Hybrid | 3.6 | 29.6 | PC | 1.1 | 0.2 | ||||||

Punt | 5.8 | 47.6 | Hybrid | 4.1 | 0.8 | ||||||

PC | 6.2 | 50.3 | RndAcc | 8.0 | 1.6 | ||||||

10^{4} |
RndAcc | - | 11.5 | Punt | - | 0.3 | |||||

Hybrid | 27.8 | 320.3 | PC | 1.3 | 0.4 | ||||||

Punt | 46.4 | 534.9 | Hybrid | 3.2 | 0.9 | ||||||

PC | 79.8 | 920.2 | RndAcc | 6.5 | 1.8 | ||||||

10^{5} |
RndAcc | - | 14.8 | Punt | - | 1.0 | |||||

Hybrid | 315.5 | 4.67 | sec | Hybrid | 1.5 | 1.5 | |||||

Punt | 630.7 | 9.34 | RndAcc | 2.0 | 2.0 | ||||||

PC | stack overflow |

By way of comparison `Array`

, which is not a purely functional structure, performs 10,000 updates in 0.1 millisecond at all scales, and really has no “worst case” single update. And notice because its function is not tail recursive Pseudo-Canonical eventually took a stack overflow. Also of interest, starting at the scale of 1,000 elements Punt outperforms Pseudo-Canonical in random updates, even though it is *O(n)* and the latter *O(i)*. Time complexity does not tell the whole performance story.

So where does this leave `remove`

? As it turns out my original idea to modify the `update`

of `AltBinaryRandomAccessList`

resulted with a function that was not tail recursive, but applying the *hybrid* technique to it resulted in a satisfactory algorithm^{6} and marginally improved performance, but like most every `remove`

implemented in a purely functional list-like structure, the best time complexity you are going to get is *O(i)*. (With a double-ended queue you can achieve *~O(i/2)* .)

Here’s the comparative summary of `remove`

performance on list-like structures. This time I include `Array`

, where I copy the retained elements into a new structure one element shorter.

Size | 10k Random Deletes | One-time Worst Case | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|

10^{2} |
Array | - | 1.4 | ms | Array | - | 0.1 | ms | |||

PC | 1.9 | X | 2.6 | PC | 2.7 | X | 0.2 | ||||

Hybrid | 2.7 | 3.7 | Punt | 4.4 | 0.3 | ||||||

Punt | 3.9 | 5.3 | Hybrid | 10.5 | 0.7 | ||||||

RndAcc | 36.5 | 49.7 | RndAcc | 40.4 | 2.7 | ||||||

10^{3} |
Array | - | 11.2 | Array | - | 0.1 | |||||

Hybrid | 2.6 | 28.7 | PC | 2.7 | 0.2 | ||||||

PC | 4.3 | 48.3 | Punt | 4.3 | 0.3 | ||||||

Punt | 4.4 | 49.3 | Hybrid | 10.4 | 0.7 | ||||||

RndAcc | 40.1 | 448.5 | RndAcc | 41.1 | 2.9 | ||||||

10^{4} |
Array | - | 107.9 | Array | - | 0.1 | |||||

Hybrid | 3.0 | 321.5 | PC | 3.2 | 0.3 | ||||||

Punt | 5.5 | 589.9 | Punt | 4.5 | 0.4 | ||||||

PC | 8.3 | 898.8 | Hybrid | 10.0 | 0.8 | ||||||

RndAcc | 48.8 | 5.27 | sec | RndAcc | 45.8 | 3.8 | |||||

10^{5} |
Array | - | 1.43 | sec | Array | - | 0.2 | ||||

Hybrid | 3.2 | 4.51 | Punt | 6.0 | 1.1 | ||||||

Punt | 7.7 | 11.04 | Hybrid | 8.1 | 1.5 | ||||||

RndAcc | 42.9 | 61.39 | RndAcc | 89.8 | 16.1 | ||||||

PC | stack overflow |

As you can see, `remove`

is a viable function for moderate-sized data structures, but suffers from the limitation of its time complexity as you scale up. You could easily modify `remove`

functions to process an ordered list or range of indexes and still have *O(max i)* performance.

One remaining question for purists: are the stepchildren still Purely Functional when they use mutation inside a function?

## Notes

1 Okasaki, Purely Functional Data Structures, 1998

3 Okasaki, 1998.

4 See Okasaki, 1998 or Okasaki, Purely Functional Data Structures, 1996 for in depth discussion of numerical representations of data structures.

5 Times were taken on a not particularly fast dual core 64-bit system. Time multiples were calculated at a finer granularity, and so may not perfectly match the timings in milliseconds. `AltBinaryRandomAccessList`

is abbreviated to “RndAcc”.

6 I hope to release it soon in an open source library.