# Exploring the squarified tree map algorithm with ReasonML (part 2)

Welcome to part two of this series of articles. In part one, we walked step-by-step through the inner workings of the squarified tree map algorithm using a concrete example. In this part, we’ll go over the actual implementation in ReasonML, or Reason for short. Along the way, we’ll discuss relevant concepts and features in the language. The purpose is to show how Reason can be used to elegantly solve a real world problem.

In case you are impatient, here is the full implementation. All functions in the code have corresponding unit tests. Here is an interactive notebook so you can play and experiment with the code. It contains a lot of the unit test cases as examples of how to use the functions.

## Set up

Here are the setup steps:

`yarn`

/`npm`

install the package`bs-platform`

globally.- Initialize the project by running:

1bsb -init reason-squarify -theme basic-reason

- Add
`jest`

as the unit testing framework as shown here.

After these steps, the repo should look like this.

## Types

### Coordinate system

Recall from part one that our coordinate system follows the convention in a web browser: $x$ increases to the right and $y$ increases downward.

There are many ways to represent a rectangle in any algorithm and here we choose to specify the coordinates of the top-left corner ($x_0$ and $y_0$) and bottom-right corner ($x_1$ and $y_1$).

With that out of the way, let’s define a few *types* that will represent the various entities we’ll encounter.
Types are an absolutely central part of any functional programming language and give these languages their expressive power.
Based on the above convention, we define the type of a rectangle as follows:

Because Reason is just an alternative syntax for OCaml, Reason inherits OCaml’s two different types of numbers: `float`

for floating-point numbers and `int`

for integers.
However, because floating-point numbers are really the only effective type of number in our compilation target (JavaScript)^{1}, we’ll represent all numbers in our Reason code as `float`

s.

Given the above definition of a `rectangle`

, this is how we represent the initial $6\times 4$ container in Reason:

This is called a `let`

binding, which is how we declare variables in Reason.
Here we explicitly annotate the type of the `container`

variable as `rectangle`

.
Ordinarily, you can omit such a type annotation because the Reason compiler can infer the type of `container`

by matching this binding against the closest compatible type definition i.e. `rectangle`

.
In this article, we’ll choose to include more explicit type annotations than necessary to aid clarity.

### Input data

Recall from part one that our sample input tree looks like this

and the final output of the algorithm looks like this:

As noted at the end of part one, only leaf nodes (terminal nodes) remain in the visual output. Branch nodes (non-terminal nodes), which only indicate structural information e.g node $\mathrm{B}$ contains nodes $\mathrm{BAA}$, $\mathrm{BAB}$ and $\mathrm{BB}$, have disappeared. As a result, I’ll chose a type representation for the input tree where leaf nodes contain the numerical value and label and branch nodes contain structural information. For example, leaf $\mathrm{D}$ contains the numeric value $3$ and the label $\mathrm{D}$. On the other hand, branch $\mathrm{B}$ contains the information that it is the parent of nodes $\mathrm{BA}$ and $\mathrm{BB}$ instead of directly containing the numerical value $6$, which can be derived by summing up the values of $\mathrm{B}$’s descendants.

With this mental model, we can define our tree data structure as follows:

In this snippet, the slightly right-slanted characters at the beginning of lines two and three are pipes (`|`

), not forward slashes (`/`

).
`datum`

is a type that represents the data contained in each `LeafNode`

:

There’s a lot to unpack in the definition of `tree`

so let’s tackle it bit by bit.
The type `tree`

is defined as a *variant*.
A variant type expresses the notion of disjunction: that a type can be “either this or that.”
We encounter variant types quite often in the physical world e.g. a lateral direction can be left or right, and Reason’s variant types allow us to model this behavior in an expressive way.
As will be seen later, Reason enforces the disjunctive nature of variants by requiring us to account for all the possible versions of a variant before using them in concrete computations.

By defining `tree`

in this way, we’re saying a tree is recursive data structure that can either be a leaf node or a branch node.
Leaf nodes contain the label and numerical value information.
Branch nodes contain the structural information of the tree.
Each branch node contains a collection of more trees, represented as a `list(tree)`

.
We’ll discuss `list`

s in the next section.

In the type definition of `tree`

, `LeafNode`

and `BranchNode`

are called *constructor*s.
Unlike class constructors in object-oriented languages, which perform necessary initialization tasks when new instances of a class are created, constructors in functional languages are a lot simpler.
You can think of them as mere wrappers around some data (or no data at all).
Here, `LeafNode`

is a wrapper around a leaf node’s numeric value and label and `BranchNode`

a wrapper around a collection of sub trees.

Based on this mental picture, our sample input tree can be represented as follows:

Unlike JavaScript, which allows string literals to be enclosed in either singe or double quotes, string literals in Reason must be enclosed in double quotes, a change that Douglas Crockford would surely approve.^{2}

### Output data

Recall from part one of this series that we call a collection of rectangles being laid out in the same orientation (horizontal vs vertical) a *row*.
For example, the nodes $\mathrm{A}$, $\mathrm{B}$ and $\mathrm{C}$ form a row in the diagram below:

Our implementation will use `list`

s to represent rows.
In Reason, a `list`

is a homogenous and immutable collection of items.
`list`

can be thought of as a variant type:

which says that a list that holds value(s) of some type `a`

(whatever it is) is either an empty list (`Nil`

) or a “construct” (`Cons`

) that holds a value of that type (called the *head*) plus another list (called the *tail*) containing values of the same type.
This definition makes Reason’s `list`

s more like linked lists than JavaScript arrays.
Because `list`

s are so widely used, Reason gives them some syntactic sugar: the empty list `Nil`

is shortened to `[]`

(empty square brackets) and the `Cons`

pair is shortened to `[head, ...tail]`

.

In `list`

’s definition above, `a`

is called a *type variable*.
Similar to the way “value” variables express symbolic relationships between values e.g. $(a + b)^2 = a^2 + b^2 + 2ab$ and will be assigned concrete values at run time, type variables express symbolic relationships between types and will be assigned concrete values at compile time by the compiler, either through explicit type annotations by the developer or through type inference.^{3}
For example, in this list of integers, the type variable `a`

is `int`

Here we could have omitted the `list(int)`

type annotation (usually read out loud as “list of `int`

s”) because the compiler can infer it from the type of the literal values (`1`

, `2`

, `3`

) in the list.

Based on our understanding of `list`

s and the diagram above, we expect the output of our implementation to look like this:

where each `cell`

adds the layout information (in the form of a rectangle) to the `datum`

contained in the corresponding leaf node:

As noted above, branch nodes do not appear in the layout result.

## Functions

The other central part of a functional language, other than types, is functions.
We’ll write a series of functions that progress from simple to complex, culminating in the two functions, `squarify`

and `traverse`

, that perform the breadth-wise and depth-wise recursions.
First off, how about a function that returns the shorter side of a rectangle?

### getShorterSide function

![getShorterSide function](getShorterSide function.png)

Functions in Reason look very similar to ES2015’s “arrow functions.”
They are defined with the parameters enclosed in parentheses on the left and the function body on the right of the “fat arrow” (`=>`

).
There’s no `return`

keyword because the value of the last statement is automatically returned.
In this case, the return value is the value of the last ternary statement, which works exactly the same as its JavaScript counterpart.

Because Reason makes a distinction between floats and integers, to perform arithmetic operations on floats, we have to use the dotted versions (`+.`

, `-.`

etc) of the integer operators (`+`

, `-`

etc).

To calculate the `width`

of the rectangle, we take the absolute value of the difference $(x_1 - x_0)$ using the `abs`

function.
Functions are invoked in Reason the same way in JavaScript: with arguments enclosed in parentheses.

Note that `abs`

is neither in Reason’s built-in library nor implemented by us.
Rather, it’s a built-in function in JavaScript.
The first line in the screenshot (starting with `[bs.scope`

) is an external^{4} (or inter-op) that says ”`abs`

is function under the global `Math`

scope that takes in a `float`

and returns a `float`

.”
In the JavaScript output, this `external`

declaration will be erased and all references to `abs`

in Reason will be replaced with `Math.abs`

in JavaScript, as shown below:

This JavaScript output is very clean and readable and very close to what you would write by hand^{5}.
The easy inter-op with JavaSript and clean, readable output is one of Reason’s greatest strengths.

### sum function

Another function that we’ll need later is `sum`

that returns the sum of all elements in a list of floats:

`List`

is a built-in module that contains utility functions to work with `list`

s, one of which is `fold_left`

.
We won’t cover `module`

s in this article as it is a more advanced topic.
The function `List.fold_left`

is very similar to the `.reduce`

method on JavaScript’s arrays.
In fact, *reduce* is just another name for *fold* in functional languages.
`List.fold_left`

takes three arguments 1) a function called the *reducer* that describes how to aggregate the list 2) the starting value of the aggregated value 3) the list itself.
And just like in JavaScript, functions, such as the reducer in this example, can be defined anonymously by listing the parameters followed by the function body, separated by the fat arrow `=>`

.

### sumValuesInTree function

Because only leaf nodes contain the numerical values, we need a way to sum up the values from all leaf nodes contained within a tree in order to do layout.
For example, if we run the input tree shown above through this function, we should get back 24.
Because the tree structure is recursive, performing operations on this tree naturally calls for a recursive function.
Because this recursive function calls itself, it has to be prefixed with the `rec`

keyword as shown below:

This code introduces pattern matching via `switch`

expressions.
Pattern matching allows us to check whether some piece of data has a certain structure and, if a match occurs, extracts parts of that data through destructuring, which is similar to destructuring an array or object in JavaScript.
We specify what data we want to match against by enclosing it in parentheses after the keyword `switch`

and then list out all the cases, separated by pipes (`|`

).
Each case consists of a pattern to the left of a fat arrow and what value to return from that pattern to the right.
One requirement of `switch`

expressions is that the patterns must be exhaustive, that is, you must list all the possible patterns that the value being matched against can possibly have.
The combination of variants and exhaustive pattern matching is a very powerful feature of Reason and eliminates an entire class of programming bugs.

In this function, we use `switch`

to match against the input node, which can be either a `LeafNode`

or a `BranchNode`

.
In the case of a `LeafNode`

, we know that it contains a `datum`

, which in turn contains the numerical `value`

(as shown in the information tooltip).
In the case of a `BranchNode`

, we know that it contains a `list`

of subtrees so we use `List.map`

to find the value sum of each sub tree.
`List.map`

works in the same manner as the `.map`

method on JavaScript arrays.
`List.map`

takes in two arguments: a transformer and a list of items to be transformed.
`List.map`

will use the transformer to transform every item in the input list and assemble all the output values into another list.
Thus, `valuesFromSubTrees`

, the output of the `List.map`

operation, is a `list`

of `float`

s because the input is a list of `tree`

and the transformer transforms any `tree`

into a `float`

.
Applying the `sum`

function to this list of `float`

s gives us the sum of all the subtrees contained in the `BrachNode`

.

Below are two passing test cases from the test code.
When given the `LeafNode`

$\mathrm{BAA}$ with value $3$, we expect the function to return $3$.
When given the entire input tree, we expect to get back $24$^{6}:

### Partial application of functions

You might wonder about the peculiar ordering of the parameters of `List.fold_left`

and `List.map`

.
Why does the data comes last in the list of arguments and the iteratees come first?
To fully answer this question, let’s take a short detour.

Reason provides automatic partial application of functions.
That is, if a function is called with fewer arguments that it expects, the result is a function that can be called with some or all of the remaining arguments.
The screenshot below shows the many ways in which `List.fold_left`

can be invoked to sum the list `[1.0, 2.0, 3.0]`

.
If we only provide the first argument, `List.fold_left`

returns a function `appliedOnce`

that takes in the remaining two arguments and returns a float (as shown in the information tooltip).
If we provide the first two arguments, `List.fold_left`

returns a function `appliedTwice`

that takes in the last argument and returns a float.
We get the final result only after supplying all three arguments.

We’re now in a position to answer the question posed at the beginning of this section.
In a language that provides automatic partial application such as Reason, the “iteratee-first data-last” ordering of arguments allows for more compact and expressive code.
In the snippet below, we use partial application to provide a more succinct definition of `sum`

by conveniently leaving off the last argument of `List.fold_left`

.
We can then use `sum`

in multiple other places to sum up all numbers in the two-dimensional list `input`

:
When applied to each row in `input`

through `List.map`

, that row becomes the last argument to `List.fold_left`

, causing it to be fully applied and returns the sum for each row.

Note that Reason could infer that `sum`

’s argument (equivalent to `List.fold_left`

’s third argument) is a list of floats because the “plus” operation in the `reducer`

function is the dotted i.e. floating-point version.

### max function (two implementations)

A function `max`

that takes in a list of floats and return the maximum value in that list sounds simple enough, but there is an edge case: what should the return value be if the input is an empty list?
The idiomatic way in Reason, like in other functional languages, is to return a “box” that may or may not contain a maximum value and leave the consumer of the function to decide how to handle the case where the box is empty.

This notion of “a box that may or may not contain any value” is embodied by the built-in `option`

type:

which says that an `option`

of type `a`

is a box that either contains nothing (`None`

) or a value of type `a`

(`Some('a)`

).
With this edge case out of the way, here’s the function `max`

.

This function uses pattern matching with `switch`

twice.
In the first `switch`

expression, we match against `input`

, which is a `list(float)`

.
Recall from the previous section that a list can be either an empty list `[]`

or an element followed by another list `[head, ...tail]`

.
In the second `switch`

expression, we match against the return value of `max`

, which is an `option(float)`

that can be an empty box or a box that contains a float.
In case the box contains a float (`numericMaxOfRest`

), the pattern allows us to extract the value `numericMaxOfRest`

out of the box and uses it for comparison against `head`

.

As it turns out, due to the way our program is structured, we do have the guarantee that the input list to `max`

will have at least one element.
As such, we can simplify away the first `switch`

expression (which checks whether the `input`

is an empty list) so that `max`

always return a float instead of `option(float)`

.
This is the version we’ll use in our final implementation:

The `min`

function is defined in an almost identical way, which you can find in the code and won’t be reproduced here.
The more generalized `max`

function we discussed earlier is still available as `generalizedMax`

in the final code for your reference:

### getMaxAspectRatioOfRow function

Whenever we consider whether to add a new tree node to the current row or to start a new one, we need to determine the maximum aspect ratio of the current row with and without the new added node. The formula to calculate the maximum aspect ratio of a row of nodes, as given in the paper, is

$\mathrm{max} \left( \frac{ w^2 r^+ }{ s^2 }, \frac{ s^2 }{ w^2 r^- } \right)$where:

- $w$ is the length of the side of the container along which we’re laying the rectangles out. As mentioned in part one, this is always the shorter side of the container.
- $r^+$ and $r^-$ are the maximum and minimum numerical values, respectively, of the nodes in the row.
- $s$ is the sum of the values of the nodes in the row.
- $\mathrm{max}$ is a mathematical function that returns the larger of its two arguments.
It’s not the
`max`

function we implemented in Reason earlier, which returns the maximum value in a list.

Here is the implementation, which takes in the numerical values of the nodes in a row, represented as a list of floats, and the length of the container’s side (`w`

).
Because we use the `min`

and `max`

functions defined earlier, which assumes the input list is not empty, the input row to `getMaxAspectRatio`

must also be non empty.
As such, the input row is represented as a head node (`headOfRow`

) followed by the rest of the nodes (`restOfRow`

).
However, because the `sum`

function takes in a list, we need to re-assemble `headOfRow`

and `restOfRow`

into a single list with the spread operator before passing them to `sum`

.

### fitRowIntoContainer function

This function, `fitRowIntoContainer`

, actually assigns each node its layout rectangle.
The function takes in a row to be laid out inside a `container`

.
Each element in the row is a node accompanied by its numerical value.
The node-value association is done by putting the two in a *tuple* (in this case, a pair, or 2-tuple).
`fitRowIntoContainer`

returns those same nodes, this time associated with their corresponding position in the layout.
The function also returns the portion of the `container`

that remains unfilled after this layout step.

Here we encounter tuples for the first time.
They are immutable, ordered, fixed-size and heterogeneous collection of values.
For example, here is a tuple of a `tree`

and a `float`

:

and here is a list of such pairs, which are the `row`

parameter to `fitRowIntoContainer`

.
Note the double parentheses around `((tree, float))`

.
The inner one creates a tuple and the outer one encloses the type variable to the `list`

:

Below is the first part of the function, where we figure out how much of the container is taken up by the nodes in the row:

You might have noticed that `row`

is actually declared to have the type `list((_, float))`

instead of `list((tree, float))`

.
The type `_`

in `(_, float)`

conveys the idea that we don’t really care about the type of the first element of the tuple.
Indeed, this function simply “passes through” those first elements to the output without ever knowing what they are.

After figuring out the limit of the occupied area, we assign each node its own rectangular area one by one:

Although the code looks intimidating, we can get a sense for how it works by testing against the case of fitting nodes $\mathrm{A}$, $\mathrm{B}$ and $\mathrm{C}$ into the original $6\times 4$ container.

Here is the unit test that confirms the output coordinates match the visual diagram:

This is a quick aside about the `approx`

function used in the unit test.
It’s a custom Jest/Jasmine asymmetric matcher that asserts that two float-point numbers are equal to within two decimal places:

Although `approx`

is written in raw Javascript, Reason allows us to dump the JavaScript code directly in the middle of Reason code.
This capability means that when all else fails in the Reason world, you can always revert to writing JavaScript to “get things done,” an extremely important escape hatch in production code.

### squarify function

Finally, we’re in the position to implement the `squarify`

function that performs layout within a single level of the tree as demonstrated in part one of this series.
This is the function’s skeleton:

This function takes in four arguments:

`remainingNodes`

is a list of nodes that haven’t been processed by the algorithm.`currentRow`

is the list of nodes that have been accepted into the current row but not yet assigned a position in the layout.`container`

is a rectangle demarcating the current available space for layout.`result`

is the list of nodes that have already been assigned a layout position by the algorithm and will not be touched further.

The general flow of this recursive function goes like this.
One by one, it “shifts” the nodes from `remainingNodes`

to the `currentRow`

, stopping when the addition of the next node worsens the maximum aspect ratio of the `currentRow`

.
At that point, all nodes in the `currentRow`

will be assigned a rectangle within the `container`

proportional to their values.
The nodes in `currentRow`

will then be added to the `result`

and the process starts anew with an empty `currentRow`

and a new container.
This new container is the unfilled portion of the `container`

from the last step.

Out of the four arguments to `squarify`

, only the first three are true inputs because `result`

plays the role of accumulating the results between recursive calls.
Out of these first three arguments, two are lists: `remainingNodes`

and `currentRow`

.
In order to process them, we are forced by Reason to account for whether or not they are empty lists, leading to four cases as annotated in the code.
These four cases turn out to be algorithmically quite meaningful.

- Case one, in which both
`remainingNodes`

and`currentRow`

are empty, is the base case of the recursion. Because there are neither nodes nor rows left to process, all nodes must already be in the`result`

list, meaning we can return the`result`

list as the final output of the algorithm. - Case two, in which
`remainingNodes`

is empty but`currentRow`

is not, corresponds to the penultimate step of the algorithm where the last row has been laid out but yet to be added to the result. The only thing left to do is to add`currentRow`

to`result`

and call the base case. - Case three, in which
`remainingNodes`

is not empty but`currentRow`

is, represents the moment when we are starting with a new row. The only logical action here is add the first node in`remainingNodes`

to the`currentRow`

. This will lead usually to case four but can also lead to case two. - Case four, in which both
`remainingNodes`

and`currentRow`

are not empty, is the most general case.

In the general case, we attempt to add the next remaining node to `currentRow`

.
We first calculate the maximum aspect ratios of `currentRow`

with and without the additional node by calling `getMaxAspectRatioInRow`

:

By comparing these two numbers, we can decide whether to add it to the `currentRow`

based on whether this addition improves the maximum aspect ratio:

Here is a small implementation detail that needs further explanation.
When we add a new node to a row, we add it to the end of that row.
If the current row is represented as `[head, ...rest]`

and the next node to be added is is `next`

, you can think of the new row with the added node as `[head, ...rest, next]`

.
However, this is not legal Reason syntax because the language only allows spreading at the end of a list.
The correct way is to use `List.concat`

, which concatenates a list of lists together, like this:

Strictly speaking, we didn’t have to add a new node to end of the current row, which, due to the linked-list nature of Reason’s `list`

s, is a linear-time operation. We could have just easily added to the beginning like this `[next, head, ...rest]`

, which is a constant time operation.
However, we decide to go with adding to the end to match the output in Bruls’ paper.

### traverse function

This is the last function in the implementation.
As the name suggests, this function traverses the tree structure and perform the layout algorithm level-by-level.
It takes in an `input`

tree and a `container`

and returns a list of `cell`

s.
Its working is very similar to the `sumValuesInTree`

function above.
If the function reaches a leaf node, there’s no more work to do.
If the function reaches a branch node, it performs the layout for all nodes at the top level in that branch node and descends into each branch to repeat the process.

To cap it all off, here is the test case for `traverse`

.
Let `inputTree`

be the tree shown at the beginning of the article:

we expect the output of `traverse`

to be a list consisting of only the leaf nodes in the tree accompanied by their final positions:

where the rectangles look like this:

Hopefully this article has given you a flavor of Reason when used to solve a real-world problem.
Because this implementation was written more as a pedagogical example than as a production code, there’s likely room for efficiency improvement, e.g. switching to tail-recursive `List`

functions.
I’ve enjoyed using Reason a lot and I think it is a promising language to explore, especially in combination with React.
Out of all functional compile-to-JavaScript languages, Reason has the most familiar syntax to front-end developers, its JavaScript output is very readable and its type system (that of OCaml) has been battle tested over many years.

I’d like to thank J David Eisenberg for his very helpful feedback on this article.

- According to MDN, in JavaScript, all numbers are implemented in double-precision 64-bit binary format and integer values up to $\pm 2^{53} - 1$ can be represented exactly. There’s no specific type for integers although
`BigInt`

is an upcoming language proposal for representing integers that are very large.↩ - In this 2018 talk, he argues that JavaScript creates unnecessary clutter that leads to programming errors by allowing multiple ways to do things: tab & space for indentation, single & double quotes as delimiters for string literals,
`null`

&`undefined`

as the bottom type and so on. He then proceeds to argue that we can remove this clutter by removing languages feature that we can do away with. For example, we should use double quotes for string literals because single quotes are already used as apostrophes.↩ - Reason uses a Hindley-Milner type system, which this article does a good job of explaining.↩
- Strictly speaking,
`[bs.scope]`

is a construct specific to BuckleScript rather than Reason. Reason is the alternative Syntax for OCaml. BuckleScript is the compiler that compiles OCaml to JavaScript.↩ - An attentive reader might notice that in the JavaScript output, a
`rectangle`

is represented as a JavaScript array instead of a JavaScript object (official rationale here and more discussion here) and that a`rectangle`

’s fields (`x0`

,`y0`

etc) are accessed by indexing into that array. What goes on under the hood is that the compiler keeps an internal record of what field name corresponds to what JavaScript array index and outputs the appropriate JavaScript indexing operations.↩ - The triangular-looking characters in the test code are the pipe operator
`|>`

, which takes the item on the left and put it as the*last*argument of the item on the right. Suppose we have a function`f(a, b)`

that takes two arguments, the following two expressions are equivalent:`b |> f(a)`

and`f(a, b)`

. This`|>`

operator is also colloquially called “pipe last” to distinguish it from the “pipe first” operator`->`

, which puts its left operand as the*first*argument of the function on its right:`a -> f(b)`

is equivalent to`f(a, b)`

. Pipe operators are used in these tests to maintain the same visual semantics as Jest assertions written in JavaScript:`expect(actual).`

`toEqual(expected)`

.↩