Latency vs. Throughput
In system design and performance engineering, latency and throughput are two key metrics used to measure how well a system performs under different conditions. Though often mentioned together, they represent very different aspects of performance.
What is Latency?
Latency is the time delay between a request and its corresponding response. It measures how quickly a system responds to a single request.
Key Points:
- Measured in milliseconds (ms) or seconds (s)
- Indicates responsiveness
- Lower latency means faster response time
- Important in real-time systems (e.g., gaming, video calls, trading platforms)
Example: If you send a message and it takes 120 milliseconds to appear on the recipient’s screen, the latency is 120ms.
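To make this concrete, here is a minimal Python sketch of measuring the latency of a single request. The `requests` library and the example URL are illustrative assumptions, not part of the original example.

```python
import time

import requests  # assumed HTTP client; any blocking request function works the same way

def measure_latency(url: str) -> float:
    """Return the time in milliseconds for one request/response round trip."""
    start = time.perf_counter()
    requests.get(url, timeout=5)                  # send one request and wait for the response
    return (time.perf_counter() - start) * 1000   # convert seconds to milliseconds

# A result of roughly 120.0 would correspond to 120 ms of latency.
print(f"Latency: {measure_latency('https://example.com'):.1f} ms")
```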
What is Throughput?
Throughput refers to the number of operations or requests a system can handle per unit of time. It measures the system’s capacity.
Key Points:
- Measured in requests per second (RPS) or transactions per second (TPS)
- Indicates how much work a system can perform
- Higher throughput means the system can complete more work in the same amount of time
- Critical for high-load systems (e.g., web servers, APIs, databases)
Example: If a system can process 10,000 requests per second, its throughput is 10,000 RPS.
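Throughput can be estimated by counting how many operations finish within a fixed time window. The sketch below is a simplified, single-threaded illustration; `handle_request` is a hypothetical stand-in for real work.

```python
import time

def handle_request() -> None:
    """Hypothetical placeholder for the work done to serve one request."""
    pass

def measure_throughput(window_s: float = 1.0) -> float:
    """Count how many requests complete within the window and return requests per second."""
    completed = 0
    deadline = time.perf_counter() + window_s
    while time.perf_counter() < deadline:
        handle_request()
        completed += 1
    return completed / window_s

# A result of 10_000 would correspond to a throughput of 10,000 RPS.
print(f"Throughput: {measure_throughput():.0f} requests/second")
```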
Key Differences Between Latency and Throughput
| Feature | Latency | Throughput |
|---|---|---|
| Definition | Time taken to process one request | Number of requests processed per unit of time |
| Measured in | Time units (ms, s) | Operations per unit of time (RPS, TPS) |
| Focus | Speed of individual operations | Volume of operations over time |
| Priority in | Real-time responsiveness | Bulk processing performance |
Analogy
- Latency is like the time it takes for a car to travel from point A to B.
- Throughput is like the number of cars that can pass along a highway in one hour.
Why Both Matter
- Low latency ensures fast, responsive systems.
- High throughput ensures systems handle scale and heavy usage.
- A well-designed system balances both for optimal user experience.
Load Balancing
What is Load Balancing?
Load balancing is the process of distributing incoming network traffic or system workload across multiple servers or resources to ensure no single server becomes overwhelmed, improving responsiveness and availability.
Why Load Balancing is Important
- High Availability: Ensures services remain accessible even if one or more servers fail.
- Scalability: Helps systems handle increased traffic by adding more servers behind the load balancer.
- Performance Optimization: Distributes load to avoid bottlenecks and reduce response time.
- Redundancy: Prevents downtime by rerouting traffic when a server becomes unresponsive.
How Load Balancing Works
When a user makes a request to a web application, the request is intercepted by a load balancer, which forwards it to one of the backend servers based on a specific algorithm or set of rules.
Common Load Balancing Algorithms
- Round Robin: Distributes requests sequentially across the server pool (see the sketch after this list).
- Least Connections: Directs traffic to the server with the fewest active connections.
- IP Hash: Assigns traffic based on the client’s IP address to maintain session persistence.
- Weighted Round Robin: Assigns more requests to servers with higher capacity.
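As a rough illustration of the first two algorithms, the Python sketch below cycles through a pool for Round Robin and picks the least-loaded server for Least Connections; the server names and connection tracking are simplified assumptions, not a production implementation.

```python
import itertools

SERVERS = ["server-a", "server-b", "server-c"]   # placeholder backend pool

# Round Robin: hand requests to servers in a repeating sequence.
_rotation = itertools.cycle(SERVERS)

def pick_round_robin() -> str:
    return next(_rotation)

# Least Connections: route to the server with the fewest active connections.
active_connections = {server: 0 for server in SERVERS}

def pick_least_connections() -> str:
    server = min(active_connections, key=active_connections.get)
    active_connections[server] += 1   # caller decrements this when the request finishes
    return server

for _ in range(4):
    print("round robin ->", pick_round_robin(), "| least connections ->", pick_least_connections())
```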
Types of Load Balancers
- Hardware Load Balancer: A dedicated physical device that manages network traffic; often used in enterprise environments.
- Software Load Balancer: An application (such as HAProxy or Nginx) that distributes traffic on general-purpose servers.
- Cloud Load Balancer: A managed load balancing service offered by cloud providers such as AWS, Azure, and GCP.
Where Load Balancers Are Used
- Web Servers: To manage high-traffic websites and APIs.
- Application Servers: To balance processing loads across microservices.
- Database Servers: In read replicas or clusters to distribute query loads.
Benefits of Load Balancing
- Improves user experience with faster response times.
- Prevents service disruption due to server overload.
- Enhances security by hiding internal server architecture.
- Makes scaling and maintenance easier without downtime.
Conclusion
Load balancing is a foundational strategy in modern system design, enabling systems to be resilient, scalable, and performant under variable traffic conditions.
Availability, Reliability, Scalability
Availability
Definition: Availability refers to the ability of a system or component to be operational and accessible when required for use.
Goal: Ensure users can access the service with minimal downtime.
Techniques:
- Redundancy: Duplicate components or services so that a single failure does not cause an outage.
- Failover Systems: Automatic switching to backup systems during failures.
- Health Checks: Monitoring to ensure services are running and available (a simple polling sketch follows this list).
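As a sketch of the health-check idea, the snippet below polls a hypothetical `/health` endpoint and reports whether the instance is up; the URL, timeout, and interval are illustrative assumptions.

```python
import time

import requests  # assumed HTTP client

HEALTH_URL = "https://example.com/health"   # hypothetical health endpoint

def is_healthy(url: str) -> bool:
    """Return True if the service answers its health endpoint with HTTP 200."""
    try:
        return requests.get(url, timeout=2).status_code == 200
    except requests.RequestException:
        return False

def monitor(url: str, interval_s: float = 5.0) -> None:
    """Poll the health endpoint on a fixed interval and report the current state."""
    while True:
        print(f"{url} is {'UP' if is_healthy(url) else 'DOWN'}")
        time.sleep(interval_s)

# monitor(HEALTH_URL)   # runs forever; a real system would act on DOWN results, e.g. by failing over
```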
Reliability
Definition: Reliability is the probability that a system performs correctly and consistently over a specific period of time.
Goal: Ensure the system performs its intended function without failure.
Techniques:
- Retry Mechanisms: Retrying failed operations, typically with backoff, to improve the chance of completion (sketched after this list).
- Data Replication: Storing copies of data in multiple locations so it remains available and recoverable.
- Error Handling: Catching and resolving exceptions gracefully.
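Retries are commonly combined with exponential backoff so that repeated failures do not hammer a struggling dependency. The sketch below assumes a generic `operation` callable; the attempt count and delays are illustrative choices.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry(operation: Callable[[], T], attempts: int = 3, base_delay_s: float = 0.5) -> T:
    """Run `operation`, retrying with exponential backoff if it raises."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:            # in practice, catch only errors known to be retryable
            if attempt == attempts - 1:
                raise                       # out of attempts: surface the failure to the caller
            delay = base_delay_s * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)

# Example usage with a hypothetical flaky call:
# result = retry(lambda: flaky_service_call())
```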
Scalability
Definition: Scalability refers to a system’s ability to handle increased load by adding resources such as servers, databases, or bandwidth.
Goal: Maintain performance and responsiveness as user demand grows.
Types:
- Vertical Scaling: Increasing the capacity of a single machine (CPU, RAM).
- Horizontal Scaling: Adding more machines or nodes to distribute load.
Best Practices:
- Load Balancing: Distribute traffic evenly to prevent bottlenecks.
- Stateless Architecture: Keep components free of session state so any instance can handle any request, making them easier to scale.
- Caching: Reduce database load and improve performance (a minimal caching sketch follows).
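As a minimal sketch of the caching idea, the snippet below keeps query results in memory with a time-to-live so repeated reads skip the database; `query_database` and the TTL value are hypothetical placeholders.

```python
import time

CACHE_TTL_S = 60.0                           # illustrative time-to-live
_cache: dict[str, tuple[float, str]] = {}    # key -> (timestamp, value)

def query_database(key: str) -> str:
    """Hypothetical placeholder for an expensive database read."""
    return f"value-for-{key}"

def get(key: str) -> str:
    """Serve from the in-memory cache when fresh; otherwise read the database and cache the result."""
    entry = _cache.get(key)
    if entry and time.monotonic() - entry[0] < CACHE_TTL_S:
        return entry[1]                       # cache hit: no database load
    value = query_database(key)               # cache miss: fetch and store
    _cache[key] = (time.monotonic(), value)
    return value

print(get("user:42"))   # first call reads the "database"
print(get("user:42"))   # second call is served from the cache
```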