Data structures
The conventional wisdom for when to use a linked list over contiguous storage hasn't applied for years: you have to test. If everything is in a cache, a vector might outperform a linked list for insertion.
time/program = instructions/program * clockCycles/instruction * time/clockCycles
Also see Iron Law of processor performance.
Heaps
What is the difference between a heap and a binary search tree?
ChatGPT:
A heap is a type of data structure that is organized based on the heap property, which states that the parent of a node is either greater than or equal to (max heap) or less than or equal to (min heap) its children. A binary search tree is a type of data structure that is organized based on a node’s value. Each node has two children, with the left child being less than the parent node and the right child being greater than the parent node. Additionally, all left descendants of a node are less than or equal to the node and all right descendants are greater than or equal to the node.
How does a heap differ from a red-black tree?
ChatGPT: A heap is a data structure that is organized as a binary tree which allows for efficient retrieval of the maximum or minimum element. It is a type of Priority Queue, where elements are organized in a complete binary tree, with each node having a priority associated with it. Heaps have the following properties:
- Every node has a key (priority) and a value associated with it.
- The heap is complete binary tree, which means that all levels except the last one are completely filled and all nodes are as far left as possible.
- Every node has a key that is smaller than or equal to its parent.
A red-black tree is a type of self-balancing binary search tree, where each node is colored either red or black. Red-black trees have the following properties:
- Every node is either red or black.
- The root is black.
- Every leaf (NIL) is black.
- If a node is red, then both its children are black.
- Every simple path from a node to a descendant leaf contains the same number of black nodes.
The main difference between a heap and a red-black tree is that a heap is an unordered data structure while a red-black tree is an ordered data structure. Heaps are used for efficient retrieval of the maximum or minimum element, while red-black trees are used for efficient storage and retrieval of items in sorted order. Heaps are used for priority queues, while red-black trees are used for dictionary like data structures.
Max heap
In a max heap, for any given node C, if P is a parent node of C, then the key (the value) of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap (with no parents) is called the root node.
Support of random access iterators is required to keep a heap structure
internally at all times. This is done automatically by the container adaptor by
automatically calling the algorithm functions make_heap
, push_heap
and
pop_heap
when needed. The standard container classes vector
, deque
and
list
fulfill these requirements. By default, if no container class is
specified for a particular stack class instantiation, the standard container
deque
is used.
A heap is a useful data structure when it is necessary to repeatedly remove the object with the highest (or lowest) priority.
A beap, or bi-parental heap, is a data structure where a node usually has two parents
A B-heap is a binary heap implemented to keep subtrees in a single page. This reduces the number of pages accessed by up to a factor of ten for big heaps when using virtual memory, compared with the traditional implementation.[1] The traditional mapping of elements to locations in an array puts almost every level in a different page.
See Beap (bi-parental heap) and Heaps)
Microbenchmarking for cache misses
Making a small part of the system make fewer cache misses means more hot cache is available for the rest of the system.
Trees
- Trees are like a linked list but not just a straight line.
- Fundamental rule: there is one path between nodes in a tree.
- A binary tree has at most two child noes: known as the left and right children
- Trie
Self-balancing binary search trees
A balanced tree is one of height O(log n), where n is the number of nodes in the tree.
- Red Black tree
- B-Tree
- k-d tree
- Sorted hierarchy of data
- A root node
- Left most node is the smallest, right most node is the largest
- Adding nodes that already exist: add to the right (larger)
Terminology
- Root / top / head node
- Leaf / terminals nodes
- Child / parent
Node traversal
- Pre-order
- In-order - left first, enumerates in sort order
- Post-order
Implementation
Recursive or stack.
AVL tree
In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation.
B-tree (binary tree)
A red–black tree is a kind of self-balancing binary search tree in computer science. Each node of the binary tree has an extra bit, and that bit is often interpreted as the color (red or black) of the node. These color bits are used to ensure the tree remains approximately balanced during insertions and deletions.
- Balanced: all leaf nodes same time to travel from root
- All nodes have the same capacity M
- Inner and leaf nodes require minimum fill of m >= M/2
- h = (m+1)^h
- Typically a disk optimised data structure
Tree types
- Geohash
- R-tree - rectangle tree
- M-tree - multidimensional tree
Bloom filter
A bloom filter tests whether an element is a member of a set.
References
Complexities for common data structures
See time complexities for different data structures.
See choosing the right container.
Where does O(nlogn) comes from? Stirling's approximation is where.
Hash tables -- access/search, insert/edit, delete
Hash tables have an amortized complexity of O(1) unless there are collisions. Worst case, if everything is in the same bin, then it is O(n).
Unordered maps and set are implemented using hash tables.
Singly linked lists
Adding/removing at the beginning is O(1), but adding at the end means search the whole list, therefore O(n). Searching is also O(n).
Doubly Linked Lists
Store a pointer to the beginning and end. Therefore operations on the end are also O(1).
Maps and set
Maps and sets are implemented using a red-black tree, a type of balanced binary search tree.