Lecture
Introduction
- data structures: data + operations (queries, updates)
- complexity depends on the specific implementation
- static or dynamic (does the data structure support updates?)
- queue (FIFO)
- operations: enqueue(x), dequeue, isempty
- implementations
- (single) linked list with two pointers (head, tail)
- enqueue Θ(1)
- dequeue Θ(1)
- isempty Θ(1)
- (rotating) array with two pointers
- it's useful for the tail pointer to point at the first free cell
- the queue can hold at most one element fewer than the capacity of the array (a completely full array would make head and tail coincide, which is indistinguishable from an empty queue)
- again, constant complexities
- note: when we use Θ, we should make it clear if it relates to the implementation or to the operation in general (for any possible implementation)
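The rotating-array implementation above can be sketched as follows; this is a minimal illustration (class and method names are my own), with the tail pointer at the first free cell as suggested:

```python
class ArrayQueue:
    """Circular-buffer queue. The tail points at the first free cell,
    so with capacity C the queue holds at most C - 1 elements and
    head == tail unambiguously means "empty"."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0  # index of the oldest element
        self.tail = 0  # index of the first free cell

    def isempty(self):
        return self.head == self.tail

    def enqueue(self, x):  # Θ(1)
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:
            raise OverflowError("queue full")
        self.buf[self.tail] = x
        self.tail = nxt

    def dequeue(self):  # Θ(1)
        if self.isempty():
            raise IndexError("queue empty")
        x = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        return x
```

Both pointers only move forward modulo the array length, so every operation is constant time.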
- set S⊆U
- operations
- insert(x)
- delete(x)
- find(x)
- build(x1,…,xn) for distinct elements (can be faster than running insert n times)
- implementations
- unsorted linked list
- we could assume x ∉ S (then insert is O(1): just prepend to the list)
- without this assumption, insert must first search for x, so it is O(n)
- delete, find … O(n)
- build is also O(n) (which is faster than repeating insert n times)
- (rotating) array
- sorted array
- we need some linear order on U
- find … binary search, O(log n)
- insert … O(n) (we need to shift the elements)
- delete … O(n) again
- build … Θ(n log n) (sort the input)
- BST (balanced binary search tree)
- we have tree property (elements in the left subtree are strictly smaller than the current node, similarly for the right subtree)
- insert, delete, find O(log n)
- build O(n log n)
- note: we could write thetas everywhere
- hashing
- we randomly choose a hashing function satisfying some statistical properties
- we get insert, delete, find in O(1) but it is expected time
- build in O(n)
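The sorted-array implementation of a set can be sketched as below; a small illustration of my own using Python's standard `bisect` module for the binary search:

```python
import bisect

class SortedArraySet:
    """Set over a linearly ordered universe, stored as a sorted list.
    find is O(log n) via binary search; insert and delete are O(n)
    because elements must be shifted; build is Theta(n log n)."""

    def __init__(self):
        self.a = []

    def find(self, x):
        i = bisect.bisect_left(self.a, x)
        return i < len(self.a) and self.a[i] == x

    def insert(self, x):
        i = bisect.bisect_left(self.a, x)
        if i == len(self.a) or self.a[i] != x:
            self.a.insert(i, x)  # O(n): shifts the tail of the array

    def delete(self, x):
        i = bisect.bisect_left(self.a, x)
        if i < len(self.a) and self.a[i] == x:
            del self.a[i]  # O(n): shifts the tail of the array

    def build(self, items):
        self.a = sorted(items)  # Theta(n log n), distinct items assumed
```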
- dictionary – like the set (but for each item, we have a pointer to the data)
- multiset – set with counters (→ dictionary where the data is the number of occurrences)
- ordered set
- new operations: min, max
- linked list O(n), array O(n), sorted array O(1), BST O(log n), hashing O(n) (we need to go through the entire table)
- pred(x), succ(x)
- linked list O(n), array O(n), sorted array O(log n), BST O(log n), hashing O(n)
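On a sorted array, pred and succ reduce to one binary search each; a sketch of my own using `bisect` (returning None when no such element exists is my convention, not from the lecture):

```python
import bisect

def pred(a, x):
    """Largest element of sorted list a strictly smaller than x, else None."""
    i = bisect.bisect_left(a, x)   # first index with a[i] >= x
    return a[i - 1] if i > 0 else None

def succ(a, x):
    """Smallest element of sorted list a strictly greater than x, else None."""
    i = bisect.bisect_right(a, x)  # first index with a[i] > x
    return a[i] if i < len(a) else None
```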
- RAM (random access machine)
- potentially infinite memory organized into cells
- each is addressed by a natural number
- we keep natural numbers inside the cells (we need to set an upper bound!)
- we can access an arbitrary cell
- numbers in cells ≤ p(len(I), max(I))
- p … polynomial
- I … instance of the problem (= the input)
- instructions: store, arithmetical operations, logical operations, goto, memory allocator (allocates consecutive space)
- time … number of instructions
- space … number of cells used (including allocated empty space)
Amortized Complexity
- stretchable array
- usually append in constant time
- sometimes we need to reallocate the array (to a larger one)
- worst-case complexity O(n)
- but such allocation does not happen often
- let's consider n operations
- we consider k such that 2^(k−1) < n ≤ 2^k
- we start with capacity 1
- total cost of reallocation (aggregated)
- 2^0 + 2^1 + ⋯ + 2^(k−1) = 2^k − 1 = Θ(n)
- amortized cost Θ(1)
- “aggregation method”
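The aggregation argument can be checked numerically; a small simulation of my own that only counts the element copies done by reallocations (the array contents themselves are irrelevant to the cost):

```python
def append_cost(n):
    """Simulate n appends into an array that doubles when full;
    return the total number of element copies done by reallocations."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            copies += size   # copy everything into the doubled array
            capacity *= 2
        size += 1
    return copies

# Aggregation: the reallocations copy 2^0 + 2^1 + ... + 2^(k-1) = 2^k - 1
# elements in total, which is less than 2n, so amortized Theta(1) per append.
```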
- flexible array
- stack
- append(x), removelast
- first idea: shrink to C/2 if n < C/2
- this is problematic: alternating append/removelast right at the boundary would reallocate every time, giving linear amortized complexity
- we should use n < C/4 instead
- amortized analysis
- average over operations in the worst-case scenario
- append – double the capacity if the array is full … costs O(C) in this case
- removelast – halve the capacity if the array becomes shorter than C/4 … costs O(C)
- we consider blocks of operations
- block boundaries = growing/shrinking operation
- “how many elements does the array contain at the beginning and at the end of the block?”
- if the block ends with growing: C elements at the end, C/2 at the beginning
- if the block ends with shrinking: C/4 elements at the end, C/2 at the beginning
- so at the beginning of the block, the array always contains C/2 elements
- at the end of the block, the array contains C/4 or C elements
- → there are at least C/4 operations in each block
- → amortized O(1) per operation
- “accounting method” – we account the cost to the operations
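The grow/shrink policy above can be sketched as a cost simulation; an illustration of my own that tracks only sizes and copy costs, not the stored elements:

```python
class FlexArray:
    """Stack on a dynamic array: double the capacity when full, halve it
    when the size drops below capacity/4 (halving already at capacity/2
    would admit sequences with linear amortized cost)."""

    def __init__(self):
        self.capacity, self.size, self.copies = 1, 0, 0

    def _realloc(self, new_capacity):
        self.copies += self.size          # cost of moving the elements
        self.capacity = max(1, new_capacity)

    def append(self, x):
        if self.size == self.capacity:
            self._realloc(2 * self.capacity)
        self.size += 1

    def removelast(self):
        self.size -= 1
        if self.size < self.capacity // 4:
            self._realloc(self.capacity // 2)
```

Over any sequence of m operations the total copy cost stays O(m), matching the block argument.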
- binary counter, ℓ bits
- increment operation
- how many bits do we flip?
- worst case: O(ℓ) bit flips
- aggregation method
- observation: bit i changes once every 2^(i−1) operations
- → # bit flips = ∑_{i=1}^{ℓ} ⌊n/2^(i−1)⌋ ≤ n·∑_{i=0}^{∞} 1/2^i = 2n
- coin method
- 1 coin … 1 bit flip
- 000 → 001
- we pay 1 coin for flipping 0→1
- we store 1 coin on 1
- 001 → 010
- we use the stored coin – it pays for 1→0
- we pay 1 coin for 0→1
- we store 1 coin on 1 again
- → 2 coins suffice
- so we get O(1) amortized
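The coin argument can be replayed in code; a sketch of my own, with the counter kept as a little-endian bit list:

```python
def increment(bits):
    """Increment a little-endian bit list in place; return the number
    of bit flips (the real cost of this operation)."""
    flips = 0
    i = 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0      # 1 -> 0: paid by the coin stored on this 1
        flips += 1
        i += 1
    if i < len(bits):
        bits[i] = 1      # 0 -> 1: pay 1 coin, store 1 coin on the new 1
        flips += 1
    return flips

bits = [0] * 16
total = sum(increment(bits) for _ in range(1000))
# total number of flips is at most 2n, so amortized O(1) per increment
```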
- potential method (generalization of coin method)
- m operations, real costs Ri
- we want to compute upper estimates on the costs
- amortized cost Ai … our choice
- we want ∑_{j=1}^m A_j − ∑_{j=1}^m R_j ≥ 0
- “never owe time”
- this should hold not only for m but for any i≤m
- we define potential Φ_i := ∑_{j=1}^i A_j − ∑_{j=1}^i R_j
- Φ0=0 by definition
- ΔΦ_i = Φ_i − Φ_{i−1} = A_i − R_i
- > 0 → saving time
- = 0 → pay exactly
- < 0 → spending time
- back to the bit flips
- Φ_i = number of coins saved = number of ones
- A_i = 2
- R_i = 1 + t
- t … number of trailing ones
- ΔΦ_i = 1 − t
- flexible array
- Φ_i = number of ops since last reallocation
- R_i = 1 (without reallocation) or 1 + Θ(n)
- A_i = 2
- we may need to rescale this
- different approach
- we choose Φ_i ≥ 0
- A_i := R_i + ΔΦ_i
- so ∑_{i=1}^m A_i = ∑_{i=1}^m R_i + ∑_{i=1}^m (Φ_i − Φ_{i−1})
- the last sum is a telescoping sum
- ∑_{i=1}^m A_i = ∑_{i=1}^m R_i + Φ_m − Φ_0
- Φ_m − Φ_0 should be ≥ 0
- we can have a non-zero Φ_0 but this difference has to be non-negative
- steps
- choose unit cost
- estimate (an upper bound on) R_i
- choose potential Φ
- show Φ_i ≥ 0
- aim for A_i = R_i + ΔΦ_i to be small
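These identities can be verified numerically on the binary counter with Φ = number of ones; a check of my own (the function name is illustrative):

```python
def check_potential(m):
    """Binary counter with potential Phi = number of ones.
    Verify A_i := R_i + (Phi_i - Phi_{i-1}) equals 2 for every
    increment, hence sum(A) = sum(R) + Phi_m - Phi_0 = 2m."""
    bits = [0] * 64
    phi_prev, total_R, total_A = 0, 0, 0
    for _ in range(m):
        i = 0
        while bits[i] == 1:          # clear t trailing ones
            bits[i] = 0
            i += 1
        bits[i] = 1                  # one 0 -> 1 flip
        R = i + 1                    # real cost: t + 1 flips
        phi = sum(bits)              # current potential
        A = R + (phi - phi_prev)     # amortized cost
        assert A == 2                # since Delta Phi = 1 - t
        total_R += R
        total_A += A
        phi_prev = phi
    assert total_A == total_R + phi_prev   # telescoping, Phi_0 = 0
    return total_A
```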
- lazily balanced trees
- notation
- n … number of items stored
- v … node
- T(v) … subtree rooted at v
- ℓ(v),r(v) … left/right child of v
- s(v) … size = number of nodes in T(v) including v
- def: BST is perfectly balanced if for every node v it holds that |s(ℓ(v)) − s(r(v))| ≤ 1
- the ratio is approximately 1 : 1
- def: BST is balanced if for every node v and its every child c it holds that s(c) ≤ (2/3)·s(v)
- the ratio is between 1 : 2 and 2 : 1
- lemma: any balanced BST has depth O(log n)
- imagine the longest path from the root to a leaf
- in each step from the root to the leaf, the size drops to at most 2/3 of the preceding size
- size of the root … n
- size of the leaf … 1
- 1 ≤ (2/3)^d · n
- d ≤ log_{2/3}(1/n) = log_{3/2} n = O(log n)
- insert operation
- find, add a leaf, update sizes (we keep track of the sizes s(v))
- if not balanced, rebuild in the highest unbalanced node
- observation: for n sorted items, build takes linear time
- select the middle item as the root and split the rest in two subtrees
- proceed recursively
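The linear-time build from sorted items can be sketched as follows (names of my own; the index-based recursion avoids copying the list, keeping the build Θ(n)):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def build(items, lo=0, hi=None):
    """Perfectly balanced BST from sorted distinct items in Theta(n):
    the middle item becomes the root, the halves are built recursively."""
    if hi is None:
        hi = len(items)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(items[mid], build(items, lo, mid), build(items, mid + 1, hi))

def depth(t):
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if t is None else 1 + max(depth(t.left), depth(t.right))
```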
- Φ := ∑_v φ(v)
- φ(v) := |s(ℓ(v)) − s(r(v))| if this difference is at least 2, otherwise 0
- the clamping ensures that a perfectly balanced tree has zero potential
- cost of insert
- no rebuild: A = R + ΔΦ = O(log n) + O(log n) = O(log n)
- ΔΦ = O(log n) because Δφ ≤ 2 for each visited node (usually Δφ = 1, but φ may hop from 0 to 2 due to the clamping)
- rebuild at v
- the invariant was broken for v and its child c
- WLOG s(ℓ(v)) > (2/3)·s(v) ⟹ s(r(v)) < (1/3)·s(v) ⟹ φ(v) > (1/3)·s(v)
- after the rebuild, this contribution and all the contributions in the subtree become zero
- contributions elsewhere stay the same
- amortized cost of the rebuild: O(s(v)) + c·ΔΦ ≤ 0
- for a sufficiently large constant c (the rebuild zeroes φ(v), so the potential drops by more than s(v)/3)
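The whole lazy-rebuilding insert can be sketched end to end; a sketch of my own under the definitions above (walk down, attach a leaf, update sizes, rebuild the highest node violating s(c) ≤ (2/3)·s(v)):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None
        self.size = 1                      # s(v): nodes in the subtree

def size(v):
    return v.size if v else 0

def rebuild(v):
    """Rebuild T(v) perfectly balanced in Theta(s(v)): collect the keys
    in sorted order, then recursively pick middles."""
    items = []
    def collect(t):
        if t:
            collect(t.left); items.append(t.key); collect(t.right)
    collect(v)
    def from_sorted(lo, hi):
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        n = Node(items[mid])
        n.left, n.right = from_sorted(lo, mid), from_sorted(mid + 1, hi)
        n.size = hi - lo
        return n
    return from_sorted(0, len(items))

def insert(root, key):
    """Insert a new (distinct) key, then rebuild at the highest
    unbalanced node on the search path, if any."""
    if root is None:
        return Node(key)
    path, v = [], root
    while v:                               # find the insertion point
        path.append(v)
        v = v.left if key < v.key else v.right
    for v in path:                         # update sizes along the path
        v.size += 1
    parent = path[-1]
    if key < parent.key:
        parent.left = Node(key)
    else:
        parent.right = Node(key)
    for i, v in enumerate(path):           # highest violation first
        if max(size(v.left), size(v.right)) > 2 * v.size / 3:
            fixed = rebuild(v)
            if i == 0:
                return fixed
            p = path[i - 1]
            if p.left is v:
                p.left = fixed
            else:
                p.right = fixed
            return root
    return root

def depth(t):
    return 0 if t is None else 1 + max(depth(t.left), depth(t.right))
```

Even for a sorted insertion sequence (the worst case for a plain BST) the depth stays O(log n), as the lemma guarantees.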
- splay trees
- splay(x) … pull node x to the root
- preferably using double rotation (zigzig, zigzag)
- possibly with one single rotation (zig)
- zig
- y has child x
- zig(x) makes y a child of x
- zigzig, zigzag
- z has child y
- y has child x
- zigzig(x)/zigzag(x) moves x above both y and z
- zigzig is used if x < y < z or z < y < x
- zigzag is used if y < x < z or z < x < y
- splay performs a sequence of these operations until x becomes the root
- find(x)
- search(x)
- splay(lowest visited node)
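A compact recursive splay can be sketched as below; this is an illustration of my own (a textbook bottom-up implementation would walk up via parent pointers, but the rotations performed per zig-zig/zig-zag step are the same edge rotations):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(y):        # promotes y.left above y
    x = y.left
    y.left, x.right = x.right, y
    return x

def rotate_left(y):         # promotes y.right above y
    x = y.right
    y.right, x.left = x.left, y
    return x

def splay(t, key):
    """Bring the node with the given key (or the last node visited by
    the search) to the root, preferring zig-zig / zig-zag double
    rotations, with at most one final single rotation (zig)."""
    if t is None or t.key == key:
        return t
    if key < t.key:
        if t.left is None:
            return t
        if key < t.left.key:                    # zig-zig
            t.left.left = splay(t.left.left, key)
            t = rotate_right(t)
        elif key > t.left.key:                  # zig-zag
            t.left.right = splay(t.left.right, key)
            if t.left.right:
                t.left = rotate_left(t.left)
        return rotate_right(t) if t.left else t
    else:
        if t.right is None:
            return t
        if key > t.right.key:                   # zig-zig
            t.right.right = splay(t.right.right, key)
            t = rotate_left(t)
        elif key < t.right.key:                 # zig-zag
            t.right.left = splay(t.right.left, key)
            if t.right.left:
                t.right = rotate_right(t.right)
        return rotate_left(t) if t.right else t

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

Rotations preserve the search-tree property, so an in-order traversal stays sorted while the accessed key moves to the root.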
- how to choose potential
- Φ := ∑_v r(v)
- rank r(v) = log s(v)
- 1 ≤ s(v) ≤ n
- 0 ≤ r(v) ≤ log n
- 0 ≤ Φ ≤ n·log n