📐 Data Modeling - Scalable Schemas

The Senior Mindset: Data modeling is the art of balancing write performance, read latency, and storage costs. A scalable schema is one that allows the application to grow by 100x without requiring a complete rewrite of the database layer.

🏗️ Core Modeling Strategies

1. Relational Modeling (The “Normalization” Path)

In SQL systems, we often model by entities.

Star Schema: Common in data warehousing. A central “Fact” table (e.g., Sales) surrounded by “Dimension” tables (e.g., Products, Dates).
Snowflake Schema: A more normalized version of the Star Schema.
When to use: When data integrity is the priority and query patterns are diverse.

2. Document Modeling (The “Embedding” Path)

In NoSQL (MongoDB), we model by access pattern.

Embedding: Storing related data in a single document (e.g., an Order document containing all Line Items).
Referencing: Storing an ID and “joining” in the application code.
Decision Rule: If the child data “belongs” to the parent and isn’t accessed independently, Embed. If the child data is large or shared, Reference.

🚀 Scaling Techniques for Large Schemas

Partitioning & Sharding

When a single table or database is too large:

Vertical Partitioning: Moving infrequently used columns to a separate table to reduce row size.
Horizontal Partitioning (Sharding): Splitting rows across different physical servers based on a Shard Key (e.g., user_id).
- Senior Tip: Choosing a bad Shard Key leads to “Hot Spots” (one server doing all the work).

Indexing for Scale

Partial Indexes: Indexing only a subset of rows (e.g., WHERE status = 'active') to save space.
Functional Indexes: Indexing the result of a function (e.g., LOWER(email)).

⚖️ Trade-offs: Denormalization for Speed

At high scale, “pure” normalization often fails due to the cost of JOIN operations.

Calculated Fields: Storing the total_price in an Orders table instead of summing LineItems every time.
Duplication: Storing the username directly on a Comment object to avoid joining the Users table for every comment in a feed.
The Risk: Data Inconsistency. You must implement a strategy (like database triggers or application-level events) to update all copies when the source changes.

🛠️ Schema Evolution & Migrations

A senior engineer never assumes the schema is static.

Expand and Contract Pattern:
- Step 1: Add the new column/table.
- Step 2: Start writing to both old and new.
- Step 3: Migrate old data to new.
- Step 4: Point the app to the new source.
- Step 5: Delete the old column (Contract).
Feature Toggles: Use flags to toggle between old and new data logic to allow for instant rollbacks.

⚖️ Designing for the Future (The Senior Perspective)

Polymorphism in Schemas

How do you handle different types of the same entity?

Single Table Inheritance: One big table with a type column and many nulls. (Fastest reads).
Table Per Type: Different tables for different types. (Cleanest integrity).

Audit Logs & Soft Deletes

Soft Deletes: Adding a deleted_at timestamp instead of deleting rows.
Warning: Soft deletes can ruin unique indexes and complicate queries. A senior might suggest an Archive table instead.

💡 Seniority Note: Always ask: “Is this schema optimized for the developer’s mental model or the machine’s execution plan?” Usually, at scale, the machine’s execution plan must win, even if the code becomes slightly more complex.

[[Relational-Databases]]
[[NoSQL-Databases]]
[[Arquitetura-Event-Driven]]