this dir | view | cards | source | edit | dark
top
Lecture
- multi-core processor
- is it a distributed system? no
- the whole processor crashes (we cannot have one core working and other one crashed)
- partial failure
- in a distributed system, the components may fail independently
- „distribuovaný systém je to tehdy, když můj počítač přestane fungovat, protože se rozbil počítač, o kterém jsem ani nevěděl, že existuje“
- sharing information, taking a common decision
- why do we want/have a distributed system?
- users are in different places → it is a distributed system by nature
- better performance, stability (if one server fails, we have another)
Ordering of Events
- notation
- set of processes P={p1,…,pn}
- they communicate through channels (channel from pi to pj … Cij)
- event #k on process pi … eik
- event changes the state of the process
- states of the process σi0,σi1,…
- event ei1 performs the transition from σi0 to σi1 on process pi
- local history hi=ei1ei2ei3…
- global history H=h1∪h2∪h3…
- happened_before (→) relation between two events
- first case: eik→eiℓ iff k<ℓ
- second case: send(m) → recv(m)
- we can receive a message m only after we send it
- transitivity: e→e′∧e′→e′′⟹e→e′′
- antisymmetry: e→e′⟹e′→e
- it is a partial ordering of events
- if there is no relation between e and e′, we say they are concurrent
- global state (σ1k1,σ2k2,σ3k3,…)
- global state is composed of the local state of every process
- cut
- cut C … subset of the global history H that includes a prefix of each local history
- consistent cut … the global state could have existed
- cut C is consistent iff e′∈C∧e→e′⟹e∈C
- Chandy-Lamport
- we assume FIFO channels
- to save a snapshot, we need an initiator process which is the first to save its state and broadcast the message SNAPSHOT
- when process pi receives the SNAPSHOT message, it saves its state and also broadcasts SNAPSHOT (there are no other events in between)
- the state of channel cji corresponds to the messages that the process pi received from pj between broadcasting SNAPSHOT and receiving SNAPSHOT from cj
- after pi receives SNAPSHOT from all processes, it knows that the computation of the snapshot is terminated
Time
- how to put timestamps on events to have guarantees that one event happened before another event?
- I would like a timestamp function TS such that e→e′⟺TS(e)<TS(e′)
- we are assuming an asynchronous system
- no bound on message delays
- no bound on relative speed of processes
- Lamport Clock LC
- algorithm
- initial state … LCi←0
- for any internal event … LCi←LCi+1
- on send(m) … LCi←LCi+1
- attach LCi to m, so that ts(m)=LCi(send(m))
- when pj receives m
- LCj←max(LCj,ts(m))+1
- what can we say about LC?
- e→e′⟹LC(e)<LC(e′)
- we don't have ⟸
- LCi(e) corresponds to the length of the longest chain of events leading to e
- vector clocks
- definition
- for i=j:VC(ei)[j]= number of events on pi up to and including ei
- for i=j:VC(ei)[j]= number of events on pj that happened_before ei
- algorithm
- if ei is internal or send(m)
- VC(ei)[i]←VCi[i]+1
- VC(ei)[j]←VCi[j]
- if ei is recv(m)
- VC(ei)←max(VCi,ts(m))
- VC(ei)[i]←VC(ei)[i]+1