📐 Data Modeling - Scalable Schemas
The Senior Mindset: Data modeling is the art of balancing write performance, read latency, and storage costs. A scalable schema is one that allows the application to grow by 100x without requiring a complete rewrite of the database layer.
🏗️ Core Modeling Strategies
Section titled “🏗️ Core Modeling Strategies”1. Relational Modeling (The “Normalization” Path)
Section titled “1. Relational Modeling (The “Normalization” Path)”In SQL systems, we often model by entities.
- Star Schema: Common in data warehousing. A central “Fact” table (e.g., Sales) surrounded by “Dimension” tables (e.g., Products, Dates).
- Snowflake Schema: A more normalized version of the Star Schema.
- When to use: When data integrity is the priority and query patterns are diverse.
2. Document Modeling (The “Embedding” Path)
Section titled “2. Document Modeling (The “Embedding” Path)”In NoSQL (MongoDB), we model by access pattern.
- Embedding: Storing related data in a single document (e.g., an Order document containing all Line Items).
- Referencing: Storing an ID and “joining” in the application code.
- Decision Rule: If the child data “belongs” to the parent and isn’t accessed independently, Embed. If the child data is large or shared, Reference.
🚀 Scaling Techniques for Large Schemas
Section titled “🚀 Scaling Techniques for Large Schemas”Partitioning & Sharding
Section titled “Partitioning & Sharding”When a single table or database is too large:
- Vertical Partitioning: Moving infrequently used columns to a separate table to reduce row size.
- Horizontal Partitioning (Sharding): Splitting rows across different physical servers based on a Shard Key (e.g.,
user_id).- Senior Tip: Choosing a bad Shard Key leads to “Hot Spots” (one server doing all the work).
Indexing for Scale
Section titled “Indexing for Scale”- Partial Indexes: Indexing only a subset of rows (e.g.,
WHERE status = 'active') to save space. - Functional Indexes: Indexing the result of a function (e.g.,
LOWER(email)).
⚖️ Trade-offs: Denormalization for Speed
Section titled “⚖️ Trade-offs: Denormalization for Speed”At high scale, “pure” normalization often fails due to the cost of JOIN operations.
- Calculated Fields: Storing the
total_pricein anOrderstable instead of summingLineItemsevery time. - Duplication: Storing the
usernamedirectly on aCommentobject to avoid joining theUserstable for every comment in a feed. - The Risk: Data Inconsistency. You must implement a strategy (like database triggers or application-level events) to update all copies when the source changes.
🛠️ Schema Evolution & Migrations
Section titled “🛠️ Schema Evolution & Migrations”A senior engineer never assumes the schema is static.
- Expand and Contract Pattern:
- Step 1: Add the new column/table.
- Step 2: Start writing to both old and new.
- Step 3: Migrate old data to new.
- Step 4: Point the app to the new source.
- Step 5: Delete the old column (Contract).
- Feature Toggles: Use flags to toggle between old and new data logic to allow for instant rollbacks.
⚖️ Designing for the Future (The Senior Perspective)
Section titled “⚖️ Designing for the Future (The Senior Perspective)”Polymorphism in Schemas
Section titled “Polymorphism in Schemas”How do you handle different types of the same entity?
- Single Table Inheritance: One big table with a
typecolumn and many nulls. (Fastest reads). - Table Per Type: Different tables for different types. (Cleanest integrity).
Audit Logs & Soft Deletes
Section titled “Audit Logs & Soft Deletes”- Soft Deletes: Adding a
deleted_attimestamp instead of deleting rows. - Warning: Soft deletes can ruin unique indexes and complicate queries. A senior might suggest an
Archivetable instead.
💡 Seniority Note: Always ask: “Is this schema optimized for the developer’s mental model or the machine’s execution plan?” Usually, at scale, the machine’s execution plan must win, even if the code becomes slightly more complex.
🔗 Related Links
Section titled “🔗 Related Links”- [[Relational-Databases]]
- [[NoSQL-Databases]]
- [[Arquitetura-Event-Driven]]