Skip to content

📐 Data Modeling - Scalable Schemas

The Senior Mindset: Data modeling is the art of balancing write performance, read latency, and storage costs. A scalable schema is one that allows the application to grow by 100x without requiring a complete rewrite of the database layer.


1. Relational Modeling (The “Normalization” Path)

Section titled “1. Relational Modeling (The “Normalization” Path)”

In SQL systems, we often model by entities.

  • Star Schema: Common in data warehousing. A central “Fact” table (e.g., Sales) surrounded by “Dimension” tables (e.g., Products, Dates).
  • Snowflake Schema: A more normalized version of the Star Schema.
  • When to use: When data integrity is the priority and query patterns are diverse.

2. Document Modeling (The “Embedding” Path)

Section titled “2. Document Modeling (The “Embedding” Path)”

In NoSQL (MongoDB), we model by access pattern.

  • Embedding: Storing related data in a single document (e.g., an Order document containing all Line Items).
  • Referencing: Storing an ID and “joining” in the application code.
  • Decision Rule: If the child data “belongs” to the parent and isn’t accessed independently, Embed. If the child data is large or shared, Reference.

When a single table or database is too large:

  • Vertical Partitioning: Moving infrequently used columns to a separate table to reduce row size.
  • Horizontal Partitioning (Sharding): Splitting rows across different physical servers based on a Shard Key (e.g., user_id).
    • Senior Tip: Choosing a bad Shard Key leads to “Hot Spots” (one server doing all the work).
  • Partial Indexes: Indexing only a subset of rows (e.g., WHERE status = 'active') to save space.
  • Functional Indexes: Indexing the result of a function (e.g., LOWER(email)).

⚖️ Trade-offs: Denormalization for Speed

Section titled “⚖️ Trade-offs: Denormalization for Speed”

At high scale, “pure” normalization often fails due to the cost of JOIN operations.

  • Calculated Fields: Storing the total_price in an Orders table instead of summing LineItems every time.
  • Duplication: Storing the username directly on a Comment object to avoid joining the Users table for every comment in a feed.
  • The Risk: Data Inconsistency. You must implement a strategy (like database triggers or application-level events) to update all copies when the source changes.

A senior engineer never assumes the schema is static.

  1. Expand and Contract Pattern:
    • Step 1: Add the new column/table.
    • Step 2: Start writing to both old and new.
    • Step 3: Migrate old data to new.
    • Step 4: Point the app to the new source.
    • Step 5: Delete the old column (Contract).
  2. Feature Toggles: Use flags to toggle between old and new data logic to allow for instant rollbacks.

⚖️ Designing for the Future (The Senior Perspective)

Section titled “⚖️ Designing for the Future (The Senior Perspective)”

How do you handle different types of the same entity?

  • Single Table Inheritance: One big table with a type column and many nulls. (Fastest reads).
  • Table Per Type: Different tables for different types. (Cleanest integrity).
  • Soft Deletes: Adding a deleted_at timestamp instead of deleting rows.
  • Warning: Soft deletes can ruin unique indexes and complicate queries. A senior might suggest an Archive table instead.

💡 Seniority Note: Always ask: “Is this schema optimized for the developer’s mental model or the machine’s execution plan?” Usually, at scale, the machine’s execution plan must win, even if the code becomes slightly more complex.


  • [[Relational-Databases]]
  • [[NoSQL-Databases]]
  • [[Arquitetura-Event-Driven]]