Jesse's Software Engineering Blog
Binary Search Tree
Binary search trees are excellent data structures for answering questions of relativity, i.e. where keys sit in relation to other keys, on data sets that need to be updated. Sorted arrays can answer the same questions via a binary search algorithm, which provides search in O(log n); however, this is only efficient on static data sets, i.e. data that does not need to handle insertion or deletion (which require shifting values). Binary search works by comparing against the median element and moving to smaller values (the left half) or larger values (the right half). A binary search tree expands on this concept by combining it with a linked structure that offers two links per node instead of just one. That extra link is what provides both efficient insertion and efficient search.
NOTE: Check out an example binary search tree.
- Each node is pointed to by at most one other node, its parent.
- Each node points to at most two nodes: a left node holding a smaller key and a right node holding a larger key.
- Each node has a comparable key.
- Each node knows its size: the number of nodes beneath it, plus itself.
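One way to model those invariants is a plain object per node. This is just a sketch; the field and function names are illustrative, not from any particular library:

```javascript
// A minimal BST node: a comparable key, an associated value,
// links to at most two children, and a subtree size.
function createNode(key, value) {
  return { key, value, left: null, right: null, size: 1 };
}

// Size of a subtree; a missing (null) node contributes 0.
function size(node) {
  return node === null ? 0 : node.size;
}
```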
Insertion is a simple recursive key comparison. If the node's key is larger, recurse down the left node; if the node's key is smaller, recurse down the right node; if the keys are equal, update the value. For better performance it is best to insert keys in random order, as the complexity is actually O(h), where h is the tree height. By inserting keys in a non-ordered fashion, the BST has a higher probability of being balanced. Average: O(log n), Worst: O(n)
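The recursive insert can be sketched like this, assuming each node carries a size field (names here are illustrative):

```javascript
// Recursive insertion: smaller keys go left, larger keys go
// right, and an equal key updates the value in place. Subtree
// sizes are refreshed as the recursion unwinds.
function insert(node, key, value) {
  if (node === null) return { key, value, left: null, right: null, size: 1 };
  if (key < node.key) node.left = insert(node.left, key, value);
  else if (key > node.key) node.right = insert(node.right, key, value);
  else node.value = value;
  node.size = 1 + sizeOf(node.left) + sizeOf(node.right);
  return node;
}

function sizeOf(node) { return node === null ? 0 : node.size; }

// Build a small tree; the order of insertion shapes the tree.
let root = null;
for (const k of ['m', 'c', 'r', 'a', 'e']) root = insert(root, k, k.toUpperCase());
```

Note that the first inserted key becomes the root, which is why insertion order matters so much for balance.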
Search is a recursive key comparison. If the node's key is larger, recurse down the left node; if it is smaller, recurse down the right node. If the key is found, return the value; if there are no more nodes, the key does not exist. The search interval roughly halves on each iteration, as do the subtree sizes. Average: O(log n), Worst: O(n)
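A sketch of that search, against a small hand-built tree (node shape assumed):

```javascript
// Recursive search: compare and descend; hitting null means the
// key is absent from the tree.
function search(node, key) {
  if (node === null) return null;           // key not in tree
  if (key < node.key) return search(node.left, key);
  if (key > node.key) return search(node.right, key);
  return node.value;                        // key found
}

// Hand-built tree: 'm' at the root, 'c'/'a' to the left, 'r' right.
const node = (key, value, left = null, right = null) => ({ key, value, left, right });
const root = node('m', 2, node('c', 1, node('a', 0)), node('r', 3));
```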
Max/min recurse down the right/left subtrees respectively, as the binary search tree requires that all larger keys be placed in a right node and all smaller keys in a left node. Average: O(log n), Worst: O(n)
Floor returns the rounded-down key for a given key, while ceil returns the rounded-up key. If the key is in the tree, the key itself will be returned; otherwise floor() will return the next smallest key and ceil() the next largest.
For example, if a tree has the values a, d, and m: bst.floor('b') === 'a'; bst.ceil('b') === 'd'; bst.floor('m') === 'm'; bst.ceil('x') === null;
For floor, if node.key is larger than the search key, any smaller values must be located on the left, so recurse down the left node. If node.key is smaller, the current node is a candidate, but a closer candidate may still exist in the right subtree, so recurse right as well. When there are no more nodes to search, the most recently recorded candidate is returned as the floor value. Ceil works the same way, inverted. Average: O(log n), Worst: O(n)
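Both directions can be sketched as a pair of mirrored functions (a minimal sketch over the a/d/m example tree; names are illustrative):

```javascript
// floor(key): the largest key <= the given key. A too-large
// node.key sends us left; a fitting node.key is a candidate,
// but a closer one may sit in the right subtree.
function floor(node, key) {
  if (node === null) return null;
  if (key === node.key) return node.key;
  if (key < node.key) return floor(node.left, key);
  const better = floor(node.right, key);    // larger key that still fits?
  return better !== null ? better : node.key;
}

// ceil(key): the smallest key >= the given key; the mirror image.
function ceil(node, key) {
  if (node === null) return null;
  if (key === node.key) return node.key;
  if (key > node.key) return ceil(node.right, key);
  const better = ceil(node.left, key);      // smaller key that still fits?
  return better !== null ? better : node.key;
}

// The example tree: d at the root, a left, m right.
const node = (key, left = null, right = null) => ({ key, left, right });
const root = node('d', node('a'), node('m'));
```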
Select returns the node that has x keys smaller than it. This can be determined by looking at the sizes of the nodes. If the size of the left node equals x, the current node is the one we want. If the size of the left node is greater than x, the answer lies somewhere in the left subtree: recurse down. If the size of the left node is smaller than x, no node in the left subtree can have x keys below it, so we move to the right. When moving to the right node, we subtract the left node's size from x, because those keys are guaranteed to be smaller than everything on the right; we also subtract 1 for the current node itself. Once the left size equals the remaining x, we have hit the correct node. Average: O(log n), Worst: O(n)
For example, a tree with the following values: a, c, e, h, m, r, s, y: bst.select(4) === 'm';
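A sketch of select, assuming nodes carry a size field maintained on insert (helper names are illustrative):

```javascript
// Size of a subtree; null contributes 0.
const sz = n => (n === null ? 0 : n.size);

// Size-maintaining insert used to build the example tree.
function insert(node, key) {
  if (node === null) return { key, left: null, right: null, size: 1 };
  if (key < node.key) node.left = insert(node.left, key);
  else if (key > node.key) node.right = insert(node.right, key);
  node.size = 1 + sz(node.left) + sz(node.right);
  return node;
}

// select(k): the key with exactly k smaller keys in the tree.
// The left subtree's size tells us which side k falls on.
function select(node, k) {
  if (node === null) return null;
  const leftSize = sz(node.left);
  if (k < leftSize) return select(node.left, k);
  if (k > leftSize) return select(node.right, k - leftSize - 1);
  return node.key;
}

let root = null;
for (const k of ['h', 'c', 'r', 'a', 'e', 'm', 's', 'y']) root = insert(root, k);
```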
Rank returns the number of keys that are smaller than a given key. If node.key === key, the rank is the size of the left node. If node.key > key, recursively search down the left node; everything in the right subtree is larger than key and can be ignored. If node.key < key, the current node and its entire left subtree are all smaller, so count them (the left size plus 1 for the node itself) and recurse down the right branch, summing as we go. Average: O(log n), Worst: O(n)
For example, a tree with the following values: a, c, e, h, m, r, s, y: bst.rank('h') === 3; bst.rank('y') === 7;
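Rank can be sketched the same way, again assuming size-carrying nodes (helper names are illustrative):

```javascript
const sz = n => (n === null ? 0 : n.size);

// Size-maintaining insert used to build the example tree.
function insert(node, key) {
  if (node === null) return { key, left: null, right: null, size: 1 };
  if (key < node.key) node.left = insert(node.left, key);
  else if (key > node.key) node.right = insert(node.right, key);
  node.size = 1 + sz(node.left) + sz(node.right);
  return node;
}

// rank(key): how many keys in the tree are strictly smaller.
function rank(node, key) {
  if (node === null) return 0;
  if (key < node.key) return rank(node.left, key);
  if (key > node.key) return 1 + sz(node.left) + rank(node.right, key);
  return sz(node.left);
}

let root = null;
for (const k of ['h', 'c', 'r', 'a', 'e', 'm', 's', 'y']) root = insert(root, k);
```

rank and select are inverses on keys in the tree: select(rank(k)) gives back k.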
Delete-min recurses down the left nodes. Once there is no left node, we've found the min; replace the parent's left reference with the min's right node. Be sure to update the sizes of the nodes along the path. Average: O(log n), Worst: O(n)
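A sketch of deleteMin, with a size-maintaining insert to build a tree to test it on (names are illustrative):

```javascript
const sz = n => (n === null ? 0 : n.size);

function insert(node, key) {
  if (node === null) return { key, left: null, right: null, size: 1 };
  if (key < node.key) node.left = insert(node.left, key);
  else if (key > node.key) node.right = insert(node.right, key);
  node.size = 1 + sz(node.left) + sz(node.right);
  return node;
}

// Smallest key: keep walking left.
function min(node) { return node.left === null ? node.key : min(node.left); }

// deleteMin: walk left until there is no left child, then splice
// in that node's right subtree; fix sizes on the way back up.
function deleteMin(node) {
  if (node === null) return null;
  if (node.left === null) return node.right;  // min found, bypass it
  node.left = deleteMin(node.left);
  node.size = 1 + sz(node.left) + sz(node.right);
  return node;
}

let root = null;
for (const k of ['h', 'c', 'r', 'a', 'e', 'm', 's', 'y']) root = insert(root, k);
root = deleteMin(root);  // removes 'a'
```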
Delete first finds the node in question by searching down the tree. Once found, if the node has no left or right child, it can be removed by simply clearing the parent's reference to it. If the node has a left OR right child, it can be removed by replacing the parent's reference with that lone child. If the node to delete has both right AND left children, a successor node needs to be determined. The successor must come from the right subtree (it needs to be larger than everything in the left subtree) and must be the smallest key in that subtree. We leverage the min function to identify that node and deleteMin to splice it out of the right subtree. We must also update all the node sizes as we recurse back up the structure. This can equally be done by replacing the node with its predecessor as opposed to its successor; the relative efficiency of the two choices depends on the tree's structure. Average: O(log n), Worst: O(n)
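The successor-based version is known as Hibbard deletion; a sketch, with the supporting helpers it leans on (names are illustrative):

```javascript
const sz = n => (n === null ? 0 : n.size);

function insert(node, key) {
  if (node === null) return { key, left: null, right: null, size: 1 };
  if (key < node.key) node.left = insert(node.left, key);
  else if (key > node.key) node.right = insert(node.right, key);
  node.size = 1 + sz(node.left) + sz(node.right);
  return node;
}

// Smallest node (not just its key) in a subtree.
function minNode(node) { return node.left === null ? node : minNode(node.left); }

function deleteMin(node) {
  if (node.left === null) return node.right;
  node.left = deleteMin(node.left);
  node.size = 1 + sz(node.left) + sz(node.right);
  return node;
}

// Hibbard deletion: a node with two children is replaced by its
// successor, the minimum of its right subtree.
function remove(node, key) {
  if (node === null) return null;
  if (key < node.key) node.left = remove(node.left, key);
  else if (key > node.key) node.right = remove(node.right, key);
  else {
    if (node.left === null) return node.right;   // zero or one child
    if (node.right === null) return node.left;
    const succ = minNode(node.right);            // smallest key on the right
    succ.right = deleteMin(node.right);          // splice successor out
    succ.left = node.left;
    node = succ;
  }
  node.size = 1 + sz(node.left) + sz(node.right);
  return node;
}

// In-order key listing, used to check the result.
const keys = n => (n === null ? [] : [...keys(n.left), n.key, ...keys(n.right)]);

let root = null;
for (const k of ['h', 'c', 'r', 'a', 'e', 'm', 's', 'y']) root = insert(root, k);
root = remove(root, 'h');  // 'h' has two children, so its successor 'm' takes its place
```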
A predecessor is the node that comes directly before a node in key order, while its successor comes directly after. Looking at a tree, the predecessor is the largest node in the left subtree, while the successor is the smallest node in the right subtree. Average: O(log n), Worst: O(n)
Range search queries can be accomplished with a ranged inorder traversal, storing every key that meets the range requirements. An inorder traversal recurses down all the left nodes first to reach the smallest value, then visits each node in key order as the recursion unwinds back through the right subtrees. A ranged traversal prunes any subtrees that cannot contain keys within the search range. Average: O(log n + k), where k is the number of keys returned; Worst: O(n)
For example, a tree with the following values: a, c, e, h, m, r, s, y: bst.range('c', 'r') === ['c', 'e', 'h', 'm', 'r'];
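A sketch of the ranged traversal (sizes are not needed here, so a plain insert builds the example tree; names are illustrative):

```javascript
function insert(node, key) {
  if (node === null) return { key, left: null, right: null };
  if (key < node.key) node.left = insert(node.left, key);
  else if (key > node.key) node.right = insert(node.right, key);
  return node;
}

// Ranged inorder traversal: only descend into subtrees that
// could contain keys within [lo, hi].
function range(node, lo, hi, out = []) {
  if (node === null) return out;
  if (lo < node.key) range(node.left, lo, hi, out);   // left may hold keys >= lo
  if (lo <= node.key && node.key <= hi) out.push(node.key);
  if (hi > node.key) range(node.right, lo, hi, out);  // right may hold keys <= hi
  return out;
}

let root = null;
for (const k of ['h', 'c', 'r', 'a', 'e', 'm', 's', 'y']) root = insert(root, k);
```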
Wikipedia has some good images of the different tree traversal algorithms.
The depth-first searches, the ones that descend the tree, are almost identical; the only difference is when the node keys are recorded. Inorder visits all the left nodes before storing the key. Preorder stores the key prior to recursing down the left, and postorder recurses down both the left and right prior to storing the key.
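Side by side, the three orders differ only in where the push sits relative to the two recursive calls (a sketch over a small hand-built tree):

```javascript
// The same skeleton three times; only the position of out.push changes.
function inorder(n, out = [])   { if (n) { inorder(n.left, out); out.push(n.key); inorder(n.right, out); } return out; }
function preorder(n, out = [])  { if (n) { out.push(n.key); preorder(n.left, out); preorder(n.right, out); } return out; }
function postorder(n, out = []) { if (n) { postorder(n.left, out); postorder(n.right, out); out.push(n.key); } return out; }

// Balanced example tree: d at the root, b/a/c left, e right.
const leaf = key => ({ key, left: null, right: null });
const root = {
  key: 'd',
  left: { key: 'b', left: leaf('a'), right: leaf('c') },
  right: leaf('e'),
};
```

Only inorder yields the keys in sorted order; preorder is useful for copying a tree, postorder for tearing one down.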
Breadth-first search, which moves across the tree before going down, can easily be written with a queue. By continually enqueueing the left and right children and working through the queue, the nodes are visited in level order.
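A sketch of that queue-driven level-order walk (using a plain array as the queue):

```javascript
// Breadth-first (level-order) traversal: visit a node, then
// enqueue its children, so each level drains before the next.
function levelOrder(root) {
  const out = [];
  const queue = root === null ? [] : [root];
  while (queue.length > 0) {
    const node = queue.shift();  // dequeue the oldest node
    out.push(node.key);
    if (node.left !== null) queue.push(node.left);
    if (node.right !== null) queue.push(node.right);
  }
  return out;
}

// Same balanced example tree as above: d / b,e / a,c.
const leaf = key => ({ key, left: null, right: null });
const root = {
  key: 'd',
  left: { key: 'b', left: leaf('a'), right: leaf('c') },
  right: leaf('e'),
};
```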
The binary search tree is a relatively simple data structure to write, and it offers great search and relativity functions. It's important to insert data in random order to achieve as balanced a tree as possible.