Forests#

We saw previously that the parent vector representation can actually represent multiple trees at once. We call such a parent vector with multiple trees a forest. Forests are a delight to work with using the parent representation, and we’ll see examples of that on this page.

Planting Forests#

Given multiple parent vectors, each representing a tree, how can we put all these trees into one parent vector?

Let’s start with some simple trees:

trees(0 0 0 2 2)(0 0 0)(0 0 0 2 0 4)(0 0 1 1)
PPV¨trees
┌───────┬─────┬───────┬─────┐
│┌─────┐│┌───┐│┌─────┐│┌───┐│
││  ∘  │││ ∘ │││  ∘  │││ ∘ ││
││┌─┴┐ │││┌┴┐│││┌─┼─┐│││ │ ││
││∘  ∘ │││∘ ∘│││∘ ∘ ∘│││ ∘ ││
││  ┌┴┐││└───┘││  │ ││││┌┴┐││
││  ∘ ∘││     ││  ∘ ∘│││∘ ∘││
│└─────┘│     │└─────┘│└───┘│
└───────┴─────┴───────┴─────┘

We want one parent vector which represents the forest of all these trees. It would be nice if we could simply catenate all these vectors together and be done, but sadly it’s not that simple.

PPV⊃,/trees    ⍝ not what we want
┌───────────────────────────┐
│             ∘             │
│ ┌────┬───┬─┬┴┬─┬─┬─┬─┬─┬─┐│
│ ∘    ∘   ∘ ∘ ∘ ∘ ∘ ∘ ∘ ∘ ∘│
│┌┴┐ ┌─┼─┐                  │
│∘ ∘ ∘ ∘ ∘                  │
│      │                    │
│      ∘                    │
└───────────────────────────┘

When we catenate the vectors, the indices in the later parent vectors are suddenly inaccurate, as they have been shifted farther along the index space. We therefore need to correct for this. For instance, we need to correct the parent vector 0 0 0 to 5 5 5 before we catenate it, since there are \(5\) nodes in the preceding parent vector 0 0 0 2 2. This way, the nodes preserve their connections in the catenated result, rather than all pointing to the node \(0\) in the first vector. Similarly, we need to add \(8\) to the third parent vector, and \(14\) to the fourth. It’s easy to calculate these offsets with a scan.

0,+\¯1↓≢¨trees    ⍝ offsets for each vector
0 5 8 14

We can then use these offsets to correct the parent vectors before we catenate them, giving us the correct result vector for the forest of them all.

      trees+0,+\¯1↓≢¨trees    ⍝ correctly offset vectors
   ⊃,/trees+0,+\¯1↓≢¨trees    ⍝ parent vector for forest
PPV⊃,/trees+0,+\¯1↓≢¨trees    ⍝ this gives us the correct forest
┌─────────┬─────┬─────────────┬───────────┐
│0 0 0 2 2│5 5 5│8 8 8 10 8 12│14 14 15 15│
└─────────┴─────┴─────────────┴───────────┘
0 0 0 2 2 5 5 5 8 8 8 10 8 12 14 14 15 15
┌─────┬───┬─────┬───┐
│  ∘  │ ∘ │  ∘  │ ∘ │
│┌─┴┐ │┌┴┐│┌─┼─┐│ │ │
│∘  ∘ │∘ ∘│∘ ∘ ∘│ ∘ │
│  ┌┴┐│   │  │ ││┌┴┐│
│  ∘ ∘│   │  ∘ ∘│∘ ∘│
└─────┴───┴─────┴───┘

Perfect! This construction has a couple of nice properties. If our original vectors were in DFPT order, then our forest vector will be as well. Additionally, if we have any associated data with our original vectors, we can simply catenate these together and we will have a vector associated with our forest in exactly the same way.

data'abcde' 'fgh' 'ijklmn' 'opqr'    ⍝ some additional data associated with our vectors
data PPV¨trees
forest⊃,/trees+0,+\¯1↓≢¨trees        ⍝ make the forest vector
data⊃,/data                          ⍝ just catenate the data
data PPV forest                       ⍝ the forest is labelled correctly
┌───────┬─────┬───────┬─────┐
│┌─────┐│┌───┐│┌─────┐│┌───┐│
││  a  │││ f │││  i  │││ o ││
││┌─┴┐ │││┌┴┐│││┌─┼─┐│││ │ ││
││b  c │││g h│││j k m│││ p ││
││  ┌┴┐││└───┘││  │ ││││┌┴┐││
││  d e││     ││  l n│││q r││
│└─────┘│     │└─────┘│└───┘│
└───────┴─────┴───────┴─────┘
┌─────┬───┬─────┬───┐
│  a  │ f │  i  │ o │
│┌─┴┐ │┌┴┐│┌─┼─┐│ │ │
│b  c │g h│j k m│ p │
│  ┌┴┐│   │  │ ││┌┴┐│
│  d e│   │  l n│q r│
└─────┴───┴─────┴───┘

Deforestation#

What if we have the reverse problem? Say we have a parent vector representing a forest, and we want to extract the individual trees from it.

Importantly, the forest vector could arrive with any tangled order of nodes, so we can’t rely on the nice consecutive arrangement of trees we have in the forest vector right now. To emphasise this, let’s shuffle forest.

perm6 7 0 9 11 13 1 2 5 16 17 3 8 4 14 15 10 12
forestpermforest[perm]    ⍝ shuffle forest and correct parents
datadata[perm]             ⍝ shuffle data as well
data PPV forest              ⍝ still represents the same forest
(⍳≢forest) PPV forest
8 8 2 12 16 17 2 2 8 15 15 7 12 7 14 14 12 12
ghajlnbcfqrdieopkm
┌─────┬───┬─────┬───┐
│  a  │ f │  i  │ o │
│┌─┴┐ │┌┴┐│┌─┼─┐│ │ │
│b  c │g h│j k m│ p │
│  ┌┴┐│   │  │ ││┌┴┐│
│  d e│   │  l n│q r│
└─────┴───┴─────┴───┘
┌───────┬───┬───────┬────┐
│   2   │ 8 │   12  │ 14 │
│┌──┴┐  │┌┴┐│┌─┬┴─┐ │ │  │
│6   7  │0 1│3 16 17│ 15 │
│  ┌─┴┐ │   │  │  │ │┌┴┐ │
│  11 13│   │  4  5 │9 10│
└───────┴───┴───────┴────┘

The first step to extracting the trees from the forest will be identifying which nodes belong to which tree. Each tree can be uniquely identified by its root, so by finding the roots of each node, we can see which tree they belong to.

rootsIforest    ⍝ we have seen I⍣≡⍨ previously
8 8 2 12 12 12 2 2 8 14 14 2 12 2 14 14 12 12

We can then use this to group the nodes in the parent vector by which tree they belong to.

roots{}forest
┌─────┬─────────┬─────────────────┬───────────┐
│8 8 8│2 2 2 7 7│12 16 17 12 12 12│15 15 14 14│
└─────┴─────────┴─────────────────┴───────────┘

The trouble with this is that the vectors we have are pointing to their parent in the forest vector. We need to correct the parent pointers so that they are appropriate to just the lone tree.

Each of the parents pointers are too large (in rare cases they are just right, but in general they are not correct) as they are offset by all the nodes in the forest vector which appeared before it, but are not present in the lone tree.

Take the group 8 8 8, in the forest vector, the parent pointers are set up like so:

        ┌──────────────────┐
        │                  ↓
forest: 8 8 2 12 12 12 2 2 8 14 14 2 12 2 14 14 12 12
          │ └──6─nodes───┘ ↑
          └────────────────┘

There are six nodes not in the tree which appear before the root node \(8\) in the forest. Therefore, this node will appear at index \(8-6=2\) in its own parent vector, and the parent pointers should be updated accordingly.

┌───┐
│   ↓
2 2 2
  │ ↑
  └─┘

For each tree, we can find the number of nodes outside the tree which appear before each node inside the tree with a scan.

8 8 2 12 12 12 2 2 8 14 14 2 12  2 14 14 12 12    ⍝ original forest vector
0 0 1  1  1  1 1 1 0  1  1 1  1  1  1  1  1  1    ⍝ nodes outside the tree
0 0 1  2  3  4 5 6 6  7  8 9 10 11 12 13 14 15    ⍝ +\ of the above vector
                   ↑
        6 nodes before this place

We can use this correct the parent pointers for each group, and extract parent vectors for each tree in the forest.

 roots{      ~forest    }forest    ⍝ mask of nodes outside the tree
 roots{    +\~forest    }forest    ⍝ how many nodes outside the tree appear before each parent
roots{  (+\~forest)[]}forest    ⍝ restricted only to this tree
roots{-(+\~forest)[]}forest    ⍝ use to correct the parent pointers
0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1
1 1 0 1 1 1 0 0 1 1 1 0 1 0 1 1 1 1
1 1 1 0 0 0 1 1 1 1 1 1 0 1 1 1 0 0
1 1 1 1 1 1 1 1 1 0 0 1 1 1 0 0 1 1
0 0 1 2 3 4 5 6 6 7 8  9 10 11 12 13 14 15
1 2 2 3 4 5 5 5 6 7 8  8  9  9 10 11 12 13
1 2 3 3 3 3 4 5 6 7 8  9  9 10 11 12 12 12
1 2 3 4 5 6 7 8 9 9 9 10 11 12 12 12 13 14
┌─────────────┐
│6 6 6        │
├─────────────┤
│2 2 2 5 5    │
├─────────────┤
│9 12 12 9 9 9│
├─────────────┤
│12 12 12 12  │
└─────────────┘
┌───────────┐
│2 2 2      │
├───────────┤
│0 0 0 2 2  │
├───────────┤
│3 4 5 3 3 3│
├───────────┤
│3 3 2 2    │
└───────────┘

This gives us exactly the trees we packed into the forest originally.

treesroots{-(+\~forest)[]}forest
PPV¨trees
┌─────┬───────┬───────┬─────┐
│┌───┐│┌─────┐│┌─────┐│┌───┐│
││ ∘ │││  ∘  │││  ∘  │││ ∘ ││
││┌┴┐│││┌─┴┐ │││┌─┼─┐│││ │ ││
││∘ ∘│││∘  ∘ │││∘ ∘ ∘│││ ∘ ││
│└───┘││  ┌┴┐│││  │ ││││┌┴┐││
│     ││  ∘ ∘│││  ∘ ∘│││∘ ∘││
│     │└─────┘│└─────┘│└───┘│
└─────┴───────┴───────┴─────┘

We can use the same grouping method to extract the data for each tree. Thankfully there’s no complicated parent vector correction needed here.

dataroots{}data
data PPV¨trees
┌─────┬───────┬───────┬─────┐
│┌───┐│┌─────┐│┌─────┐│┌───┐│
││ f │││  a  │││  i  │││ o ││
││┌┴┐│││┌─┴┐ │││┌─┼─┐│││ │ ││
││g h│││b  c │││j k m│││ p ││
│└───┘││  ┌┴┐│││  │ ││││┌┴┐││
│     ││  d e│││  l n│││q r││
│     │└─────┘│└─────┘│└───┘│
└─────┴───────┴───────┴─────┘

Notice that we’ve extracted the trees in a different order to the order we see when we render the forest. This is because PPV looks at the order that the root nodes appear in a forest when rendering, while looks at the first appearance of a group indicator in the roots vector. Luckily, it’s easy to correct this if you need to.

roots                   ⍝ the order ⌸ sees
treestrees[⍋∪roots]    ⍝ correct order of trees
data data [⍋∪roots]    ⍝ correct order of data
data PPV¨trees
8 2 12 14
┌─────────┬─────┬───────────┬───────┐
│0 0 0 2 2│2 2 2│3 4 5 3 3 3│3 3 2 2│
└─────────┴─────┴───────────┴───────┘
┌─────┬───┬──────┬────┐
│abcde│ghf│jlnikm│qrop│
└─────┴───┴──────┴────┘
┌───────┬─────┬───────┬─────┐
│┌─────┐│┌───┐│┌─────┐│┌───┐│
││  a  │││ f │││  i  │││ o ││
││┌─┴┐ │││┌┴┐│││┌─┼─┐│││ │ ││
││b  c │││g h│││j k m│││ p ││
││  ┌┴┐││└───┘││  │ ││││┌┴┐││
││  d e││     ││  l n│││q r││
│└─────┘│     │└─────┘│└───┘│
└───────┴─────┴───────┴─────┘

Challenge#

Challenge: Write a function that, given a parent vector representing a forest, joins each tree in the forest under one new root node.

Solution:

Hide code cell content
 {0,(≠⍳≢)×1+}
⍝ │││      │└─┴─ offset to adjust for new root
⍝ ││└──────┴──── set old roots to point to 0 (which we're about to add)
⍝ └┴──────────── add new root the old roots now point to