Read Replica vs Database Sharding - When to Use Which

The project where the database had to be refactored

In 2022, I joined an e-commerce project that was struggling with database performance problems. The previous team had implemented sharding: the data was split by shop_id, and each shard was a separate database.

It sounded reasonable. But when I looked at the usage pattern, 80% of queries were reads (product listing, search, analytics) and writes made up only 20%. On top of that, the read queries often needed to join data across many shops (global search, admin reports).

Sharding had solved a problem that did not exist - write scalability - while ignoring the real one: read scalability.

Migrating from the sharded setup to a read replica architecture took 3 months and was one of the most painful projects I have ever worked on.

Back to the technical side: 2 problems, 2 solutions

What does a read replica solve?

Problem: The database is overloaded by read queries. Write throughput is still fine, but SELECT queries eat up all the resources.

Solution: Replicate the data to one or more read-only instances. Writes still go to the primary; reads are routed to the replicas.

           ┌─────────────────┐
           │   Application   │
           └────────┬────────┘
                    │
          ┌─────────┴──────────┐
          │                    │
     ┌────▼───┐          ┌─────▼──────┐
     │ Write  │          │    Read    │
     │Primary │─────────▶│  Replica   │
     └────────┘ replicate└────────────┘

When to use it:

Read:write ratio > 3:1
Query patterns are diverse and hard to partition
Reporting/analytics must not affect production
The data model is complex, with many joins

Limitations:

It only scales reads - writes still depend on the primary
Replication lag: a replica can lag from a few milliseconds to a few seconds
It does not help if the problem is write throughput
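
To make the replication-lag limitation concrete, here is a minimal sketch of a routing rule that falls back to the primary when a replica is too far behind. The names `ReadTarget` and `ReadRouter`, and the idea of a fixed lag threshold, are assumptions for illustration and not part of the original project. On a PostgreSQL replica, the lag could be estimated with `now() - pg_last_xact_replay_timestamp()`, although that value also grows when the primary is simply idle.

```csharp
using System;

public enum ReadTarget { Primary, Replica }

public static class ReadRouter
{
    // Hypothetical rule: read from the replica unless the caller needs fresh data
    // or the replica has fallen further behind than the caller can tolerate.
    public static ReadTarget Choose(
        TimeSpan replicaLag, TimeSpan maxAcceptableLag, bool requiresFreshData)
    {
        if (requiresFreshData)
        {
            return ReadTarget.Primary;
        }

        return replicaLag <= maxAcceptableLag ? ReadTarget.Replica : ReadTarget.Primary;
    }
}
```

A product listing could tolerate a generous threshold, while an order status page would likely pass `requiresFreshData: true`.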

What does sharding solve?

Problem: The database is too large to fit on one server, or write throughput has maxed out the primary.

Solution: Split the data into partitions (shards), each of which is a separate database. Writes are distributed across the shards.

           ┌─────────────────┐
           │   Application   │
           └────────┬────────┘
                    │
          ┌─────────┴──────────┐
          ▼                    ▼
     ┌────────┐          ┌────────┐
     │ Shard 1│          │ Shard 2│
     │User A-M│          │User N-Z│
     └────────┘          └────────┘

When to use it:

The dataset is too large for one server (TB scale)
Write throughput is maxed out
Queries mostly access the data of one specific entity (user_id, tenant_id)
Cross-shard queries are not needed

Limitations:

Cross-shard queries are extremely painful: you have to query every shard and then merge the results
A badly chosen shard key creates hotspots
Schema changes have to be applied to every shard
Transactions across shards are very complex
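
The difference between a single-shard query and a cross-shard query can be sketched as follows. This is an in-memory illustration only: each list stands in for one shard's database, and `ProductRow`, `ShardedCatalog`, and the modulo-based mapping on shop_id are hypothetical choices, not the scheme the original project used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record ProductRow(int ShopId, string Name, decimal Price);

public sealed class ShardedCatalog
{
    // Each inner list stands in for one shard's database (illustration only).
    private readonly List<ProductRow>[] _shards;

    public ShardedCatalog(int shardCount)
    {
        if (shardCount <= 0) throw new ArgumentOutOfRangeException(nameof(shardCount));
        _shards = Enumerable.Range(0, shardCount).Select(_ => new List<ProductRow>()).ToArray();
    }

    // Shard key = shop_id. The mapping is stable only while the shard count stays fixed.
    public int ShardFor(int shopId)
    {
        int n = _shards.Length;
        return ((shopId % n) + n) % n; // non-negative even for negative ids
    }

    public void Add(ProductRow row) => _shards[ShardFor(row.ShopId)].Add(row);

    // Single-shard query: exactly one shard is touched.
    public IReadOnlyList<ProductRow> ByShop(int shopId) =>
        _shards[ShardFor(shopId)].Where(p => p.ShopId == shopId).ToList();

    // Cross-shard query: every shard is queried, then the partial results
    // are merged and sorted again before the final cut.
    public IReadOnlyList<ProductRow> CheapestAcrossAllShops(int take) =>
        _shards
            .SelectMany(shard => shard.OrderBy(p => p.Price).Take(take)) // per-shard top N
            .OrderBy(p => p.Price)                                        // global merge
            .Take(take)
            .ToList();
}
```

In a real deployment each per-shard step is a network round trip, so the cost of `CheapestAcrossAllShops` grows with the number of shards, while `ByShop` stays constant.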

Decision framework

What is your problem?
  │
  ├── High CPU/IO caused by reads?
  │     └── Read replica first
  │
  ├── Storage too large?
  │     └── Sharding - with a carefully chosen shard key
  │
  ├── Write throughput maxed out?
  │     └── Sharding - or consider a queue-based approach
  │
  └── Both reads and writes?
        └── Sharding + read replica per shard (complex but correct)
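
The tree above could be encoded as a small function, for example to keep the reasoning next to an architecture decision record. This is only a sketch: `Bottleneck` and `ScalingAdvisor` are hypothetical names, and two branches (no bottleneck at all, and reads combined with storage) are not in the tree, so their handling here is an assumption.

```csharp
using System;

[Flags]
public enum Bottleneck
{
    None = 0,
    ReadLoad = 1,
    StorageSize = 2,
    WriteThroughput = 4
}

public static class ScalingAdvisor
{
    public static string Recommend(Bottleneck bottleneck)
    {
        bool reads = bottleneck.HasFlag(Bottleneck.ReadLoad);
        bool storage = bottleneck.HasFlag(Bottleneck.StorageSize);
        bool writes = bottleneck.HasFlag(Bottleneck.WriteThroughput);

        if (reads && writes) return "Sharding + read replica per shard";
        if (writes) return "Sharding, or consider a queue-based approach";
        // Assumption: storage pressure outranks read load when both are present.
        if (storage) return "Sharding, with a carefully chosen shard key";
        if (reads) return "Read replica first";
        // Assumption: with no measured bottleneck, leave the architecture alone.
        return "No change needed yet";
    }
}
```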

Golden rule: Don't shard before you really need to.

Adding a read replica to a production environment usually takes 1-2 days. Sharding can take several months and is one of the most painful migrations in a developer's career.

Implementing a Read Replica with .NET and PostgreSQL

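using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// DbContext configuration with read replica routing.
// The Product, Order and CreateOrderCommand types are assumed to be defined elsewhere.
public class ApplicationDbContext : DbContext
{
    public DbSet<Product> Products => Set<Product>();
    public DbSet<Order> Orders => Set<Order>();

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        // Primary connection string - writes go here
        optionsBuilder.UseNpgsql(
            Environment.GetEnvironmentVariable("PRIMARY_DB_CONNECTION"));
    }
}

// Separate read-only context
public class ReadOnlyDbContext : ApplicationDbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        // Read replica connection string
        optionsBuilder.UseNpgsql(
            Environment.GetEnvironmentVariable("REPLICA_DB_CONNECTION"));
    }

    // Prevent any write operations. The parameterless SaveChanges() and
    // SaveChangesAsync(CancellationToken) overloads delegate to these two,
    // so both the sync and async paths are blocked.
    public override int SaveChanges(bool acceptAllChangesOnSuccess) =>
        throw new InvalidOperationException("Read-only context cannot save changes");

    public override Task<int> SaveChangesAsync(
        bool acceptAllChangesOnSuccess, CancellationToken cancellationToken = default) =>
        throw new InvalidOperationException("Read-only context cannot save changes");
}

// Service layer - use the correct context
public class ProductQueryService
{
    private readonly ReadOnlyDbContext _readContext; // Queries use the replica

    public ProductQueryService(ReadOnlyDbContext readContext) =>
        _readContext = readContext;

    public async Task<List<Product>> GetProductsByCategoryAsync(int categoryId)
    {
        return await _readContext.Products
            .Where(p => p.CategoryId == categoryId && p.IsActive)
            .OrderBy(p => p.Name)
            .ToListAsync();
    }
}

public class OrderService
{
    private readonly ApplicationDbContext _writeContext; // Writes use the primary

    public OrderService(ApplicationDbContext writeContext) =>
        _writeContext = writeContext;

    public async Task<Order> CreateOrderAsync(CreateOrderCommand command)
    {
        var order = new Order { /* ... */ };
        _writeContext.Orders.Add(order);
        await _writeContext.SaveChangesAsync();
        return order;
    }
}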

A note on replication lag:

// For critical reads right after a write - use the primary, not the replica
public class OrderConfirmationService
{
    private readonly ApplicationDbContext _primaryContext; // Not the replica

    public OrderConfirmationService(ApplicationDbContext primaryContext) =>
        _primaryContext = primaryContext;

    public async Task<OrderStatus> GetStatusAfterCreationAsync(int orderId)
    {
        // This read comes right after a write - use the primary to avoid
        // the "read your own writes" problem
        return await _primaryContext.Orders
            .Where(o => o.Id == orderId)
            .Select(o => o.Status)
            .SingleAsync();
    }
}
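
Hard-coding the primary per service works when you know in advance which reads follow a write. When that is not known up front, one common alternative is to pin a user's reads to the primary for a short window after that user's last write. The sketch below assumes a hypothetical `RecentWriteTracker` kept in process memory and a window chosen to exceed the expected replication lag; with several application instances, the tracker would need shared storage instead.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class RecentWriteTracker
{
    private readonly ConcurrentDictionary<int, DateTimeOffset> _lastWriteByUser = new();
    private readonly TimeSpan _stickyWindow;

    // stickyWindow should be longer than the replication lag you expect to see.
    public RecentWriteTracker(TimeSpan stickyWindow) => _stickyWindow = stickyWindow;

    // Call after every successful write made on behalf of the user.
    public void RecordWrite(int userId, DateTimeOffset now) =>
        _lastWriteByUser[userId] = now;

    // True while the user's last write is still inside the sticky window.
    public bool ShouldReadFromPrimary(int userId, DateTimeOffset now) =>
        _lastWriteByUser.TryGetValue(userId, out var lastWrite)
        && now - lastWrite < _stickyWindow;
}
```

The current time is passed in explicitly so the rule can be unit-tested without waiting; production code would typically pass `DateTimeOffset.UtcNow`.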

Lessons from the field: a checklist before you decide

Profile the current state: Measure the actual read/write ratio. Don't estimate.
Query patterns: Are there many cross-entity queries? If so, sharding will hurt.
Growth projection: How large will the dataset be in 2 years?
Team capacity: Sharding is far more complex. Can the team maintain it?
Start simple: Read replica first - if that is still not enough, consider sharding.
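
For the first checklist item, the arithmetic is simple once you have real counts. The sketch below assumes you already have totals of read and write statements over a representative period (in PostgreSQL these could be derived, for instance, from the `calls` column of `pg_stat_statements`). `WorkloadProfile` is a hypothetical helper, and the default threshold of 3.0 is the 3:1 ratio mentioned earlier in this post.

```csharp
using System;

public static class WorkloadProfile
{
    // Returns reads per write: 4.0 means a 4:1 read:write ratio.
    public static double ReadWriteRatio(long readQueries, long writeQueries)
    {
        if (readQueries < 0 || writeQueries < 0)
        {
            throw new ArgumentOutOfRangeException();
        }

        if (writeQueries == 0)
        {
            // No writes at all: an idle database is 0, a read-only workload is infinite.
            return readQueries == 0 ? 0.0 : double.PositiveInfinity;
        }

        return (double)readQueries / writeQueries;
    }

    // True when the measured ratio exceeds the threshold (3:1 by default).
    public static bool IsReadHeavy(long readQueries, long writeQueries, double threshold = 3.0) =>
        ReadWriteRatio(readQueries, writeQueries) > threshold;
}
```

With the 80% reads and 20% writes from the project described above, `ReadWriteRatio(80, 20)` gives 4.0, which is above the 3:1 threshold.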

Philosophy

Database architecture is one of the hardest decisions to change. Unlike application code, which can be refactored easily, changing a database sharding strategy requires data migration, downtime planning, and complex testing.

"Premature optimization is the root of all evil" - but with databases, late optimization hurts a lot too.

Assess it correctly from the start - and pick the simplest solution that solves the current problem, with headroom for near-term growth.

Have you run into this situation?

Which strategy are you using - read replica, sharding, or both? And if you have been through a migration - what tips do you have? 👇

/Son Do - believe in basic

#1percentbetter #SolutionArchitecture #Database #SystemDesign #PostgreSQL #SQLServer
