DataStructures Series #14 – The B Tree Data-structure

Oak Tree Png, Vector, PSD, and Clipart With Transparent Background ...
Image source: https://pngtree.com/

The B Tree is a self-balancing tree data-structure that maintains sorted data and allows searches, sequential access, insertions and deletions in logarithm time. The B Tree is similar to a binary tree except that a node isn’t restricted to having just two children.

Every non-leaf node in a B-Tree is structured to have all its children keys greater than the key before it and less than the key after it as shown in the sample B-Tree image below.

A B-Tree is characterised by its order. A B-Tree of order m is a tree which satisfies the following properties:

  • Every node has at most m children.
  • Every non-leaf node (except root) has at least m/2 child nodes.
  • The root has at least two children if it is non-leaf node.
  • A non-leaf node with k children contains k-1 keys.
  • All leaves appear in the same level and carry no information.

Illustration with the B-Tree below having a degree 5.

A example of a B-Tree data-structure
The inner circles with numbers represent the node keys while the green dots represent the children nodes.

Every node has at most m children

All nodes in the B-Tree diagram have the capacity to accommodate 5 children nodes (the green dots)

Every non-leaf node (except root) has at least m/2 child nodes

All nodes shown above are either root or leaf nodes, in a case where there are nodes in between, they must all have at least m/2 child nodes, in this case, inserting a node in between in the example above must have at least 2 (5/2) nodes.

The root has at least two children if it is a non-leaf node.

If the root is a non-leaf node as shown in the B-Tree diagram above, then it must have at least two children nodes.

A non-leaf node with k children contains k-1 keys.

As seen in the B-Tree diagram above, the green dots are the children nodes and every node has (no_of_children_nodes – 1) keys e.g the root node having 3 children (green dots) has 2 keys. This means the max number of keys that can be present in any given node in the B-Tree above is 4 (max_no_of_children – 1).

All leaves appear in the same level and carry no information.

All leaves are on the same level. This maintains the balanced property to ensure operations on the B-Tree always happen in O(log n) time complexity.

Operations on a B-Tree

  • Search
  • Insert
  • Delete

Search

Similar to the implementation for the Binary Search Tree. We start from the root node, go through the keys and compare with the search key until we find a node greater than or equal to the key, if the key is found, we return the node attached to it otherwise we . Otherwise we visit the key’s node children and repeat the process until we find the exact key.

Insert

  • Find the leaf node where the item should be inserted
  • If the leaf node can accommodate another item (It has no more than m-1 items), insert the item into the correct location in the node.
  • If the leaf node is full, split the node in two, with the smaller half of the items in one node and the larger half in the other. Promote the median item to the parent node. If the parent node is full, split and repeat process

There is a graphical explanation of how this is done in the video below:

Delete

Deletion is similar to the insert process.

Real life Applications

  • Databases
  • Filesystems

REFERENCES