April 16, 2020 , ☕️ 12 min read
This blog some shattered notes maybe we should consider while studying about how to build scalable distributed system before we dive into let’s start with some definitions as they are the main characteristics of any distributed system
performance optimize response time.
improving performance speeds up response time but the total number of requests may not change.
scalability optimize the ability to handle the load.
improving scalability increases the ability to handle the load but the performance of each request may not change.
distributed systems are separated by space there’s a limit on information speed, it takes time to transfer the information, the state of the original sender may have changed, This means the receiver of on the information is always dealing with a stale data.
Eventual consistency garnets that in absence of new updates, all access to a specific piece of data will eventually return the most recent value.
Eventual consistent models break down into many different forms
Strong consistency garnets that an update to a piece of data needs an agreement from all nodes before it becomes visible.
we can apply strong consistency by using a non distributed system as lock to our data that lock has the data only and the whole distributed system read from it.
that comes at a cost, It affects our availability to recline to be elastic as this becomes a point of contention in the system.
Any two things that content for a single limited resource are in contention, this contention can only have one winner.
other forced to wait for the winner to complete, as the number of things competing for increases the time to free up resources increases and eventually exceed acceptable time.
The low defines the maximum improvements gained by parallel processing.
In distributed systems synchronizing the state of multiple nodes is done using cross talks, Each node in the system will send message to other nodes informing them of any state change.
The time it takes for synchronization to complete is called Coherency Delay.
increasing the number of nodes increases the delay
Gunther’s law builds on Amdahl’s law, in addition of contention if accounts for coherency delay, Coherence delay result in negative returns as the system scale-up, as the cost to coordinate between nodes extends any benefits from scaling up
Both Amdahl’s and Gunther’s law demonstrate linear scalability is almost unachievable.
linear scalability requires total isolation, reactive systems understand the limitation imposed by these laws and accept them rather than avoiding them.
Reactive systems reduce contention by:
They mitigate coherency delay by
this allows for higher scalability by reducing or eliminating things that prevent scalability
Contention In Sharded systems
the router represents a source of contention as well, but A sharded system minimizes contention by
Sharding is a CP solution, therefore it sacrifices availability if a shard goes down there is a period of time where it’s unavailable, the shard will migrate to another node eventually
Caching with Shards
caching is problematic in distributed systems, how do you maintain cache consistency with multiple writers? sharding allows each entity to maintain cache consistency the cache is unique to that entity
CRDTs are specially designed datatypes they are highly available and eventually consistent as a data store in multiple replicas for availability
A replica updates are applied on one replica and then copied asynchronously updates are merged to determine to find state.
CRDTs are a solution for availability, there are two types of CRDTs
→ CVRDT: Convergent Replicated Data Types
copy state between replicate, so it requires a merge operation that understands how to combine two states, so merge operation must be
→ CMRDT: Communicative Replicated Data Types
copy operations between replicas
The effect of CRDTs
CRDTs in distributed data is stored in memory
That requires the entire structure fit into the memory, they can be copied to disk as well that speeds up recovery if replica fails but during normal operation all data still in the memory
Personal blog by Mustafa Hussain. I share what I learned about Software Engineering, Productivity, and building new habits on my Youtube channel and blog. Feel free to join my newsletter to follow along.