ForgeSDLC
Navigate
Home
Discover ForgeSDLC (101)
Practice (201)
Master (301)

This page is part of the ForgeSDLC knowledge base — an AI-assisted, human-directed methodology for taking product work from concept to production. For the core operating model and vocabulary, see Forge SDLC overview and What is ForgeSDLC?.

Operational data modeling (blueprint)

Operational data modeling covers the design of data structures for transactional, application-serving databases — the storage that powers day-to-day product features. This complements the analytical/scale focus of the broader Big Data discipline with practical guidance for relational and NoSQL databases used in operational systems.


1. Relational modeling

Normalization

Normal form Rule Why it matters
1NF Atomic values; no repeating groups Eliminates multi-valued columns; enables indexing
2NF 1NF + no partial dependencies on composite keys Reduces data duplication in tables with composite PKs
3NF 2NF + no transitive dependencies Each non-key column depends only on the primary key; reduces update anomalies
BCNF Every determinant is a candidate key Stronger form of 3NF; handles edge cases in key dependencies

Practical guidance: Most operational schemas should be in 3NF. Denormalize deliberately for read performance (and document the reason in an ADR).

Entity-Relationship modeling

Concept Description
Entity A distinct thing tracked by the system (User, Order, Product)
Attribute A property of an entity (name, created_at, price)
Relationship An association between entities (User places Order)
Cardinality 1:1, 1:N, M:N — determines foreign key placement and junction tables
Participation Total (every entity must participate) vs partial (optional)

Naming conventions

Convention Example Rationale
snake_case for tables and columns order_items, created_at Wide RDBMS convention; case-insensitive across databases
Singular table names user, order Entity represents one thing; collection is implied
Explicit join table names user_role Alphabetical order of participating entities
_id suffix for foreign keys user_id, order_id Clear FK identification
Avoid reserved words user_account not user Prevents quoting issues across databases

2. Indexing strategies

Index type Use case Trade-off
B-tree (default) Equality and range queries, ORDER BY, unique constraints General-purpose; write overhead per index
Hash Exact equality lookups only Faster equality; no range support; not all RDBMS support
GIN (Generalized Inverted) Full-text search, JSONB containment, array overlap Fast reads on complex types; larger index, slower writes
GiST (Generalized Search Tree) Geometric, range, and proximity queries Spatial and range data; less efficient for simple equality
Partial / filtered Index subset of rows matching a condition Smaller index; faster for filtered queries; requires predicate match
Covering / include Store extra columns in index to avoid table lookup Faster reads; larger index; write overhead
Composite Multi-column index for combined predicates Column order matters — leftmost prefix rule applies

Query plan analysis

Step Activity
1. EXPLAIN ANALYZE Run the query with execution plan and actual timings
2. Identify bottlenecks Look for sequential scans on large tables, nested loops on unindexed joins, sort operations
3. Check index usage Verify expected indexes are used; check for index-only scans vs table lookups
4. Statistics freshness Ensure table statistics are current (ANALYZE / UPDATE STATISTICS)
5. Iterate Add/modify indexes, rewrite query, or restructure schema; re-measure

3. Transaction isolation and concurrency

Isolation level Dirty reads Non-repeatable reads Phantom reads Performance
Read Uncommitted Possible Possible Possible Highest
Read Committed Prevented Possible Possible Good (PostgreSQL default)
Repeatable Read Prevented Prevented Possible (prevented in PostgreSQL) Moderate
Serializable Prevented Prevented Prevented Lowest (potential serialization failures)

Concurrency patterns

Pattern Description When to use
Optimistic locking Check version/timestamp at write time; retry on conflict Low contention; read-heavy workloads
Pessimistic locking Lock row(s) with SELECT FOR UPDATE before modification High contention; critical sections
Advisory locks Application-level locks using database primitives Coordinating distributed processes; preventing duplicate work
SKIP LOCKED Skip rows locked by other transactions Job queues; work distribution without contention

4. Schema migration

Principles

Principle Description
Version controlled Every schema change is a numbered migration in source control
Forward-only Avoid rollback migrations in production; use forward-fixing migrations
Backward compatible New schema must work with both old and new application code during deployment
Tested Migrations run in CI against a test database; verify data integrity
Small and incremental One concern per migration; avoid combining schema and data changes

Expand-contract pattern

Safe schema changes for zero-downtime deployments:

Phase Activity Code state
Expand Add new column/table alongside existing Code writes to both old and new; reads from old
Migrate Backfill data from old to new column/table Code writes to both; reads from new
Contract Drop old column/table; remove dual-write code Code uses only new schema

Migration tooling

Tool Language/framework Approach
Flyway JVM, SQL SQL-based migrations, version tracking
Liquibase JVM XML/YAML/SQL changesets, rollback support
Alembic Python (SQLAlchemy) Code-generated migrations, branch management
Prisma Migrate TypeScript/JavaScript Schema-first, declarative migrations
golang-migrate Go SQL file migrations, database-agnostic
dbmate Any Plain SQL migrations, lightweight, database-agnostic

5. NoSQL data modeling

Document databases (MongoDB, DynamoDB, Firestore)

Pattern Description When to use
Embedding Nest related data within a single document 1:1 or 1:few relationships; data accessed together; bounded growth
Referencing Store IDs and fetch separately M:N relationships; unbounded growth; data updated independently
Bucket pattern Group time-series data into fixed-size buckets IoT telemetry, logs, time-based queries
Polymorphic pattern Different document shapes in same collection with a type discriminator Flexible schemas; product catalogs; event stores

Key-value stores (Redis, DynamoDB)

Pattern Description Use case
Cache-aside Application reads cache first; on miss, reads database and populates cache Read-heavy, latency-sensitive
Write-through Writes go to cache and database simultaneously Strong consistency between cache and database
TTL-based expiration Keys expire after configurable time Session storage, rate limiting, temporary data
Sorted sets Ordered collections with score-based ranking Leaderboards, priority queues, time-based feeds

Graph databases (Neo4j, Amazon Neptune)

When to choose graph When to avoid graph
Relationships are the primary query target Simple key-value or document lookups
Traversal depth is variable (friends-of-friends, shortest path) Fixed-depth joins (standard SQL)
Schema evolves frequently Write-heavy, high-throughput transactional workloads
Social networks, recommendation, fraud detection, knowledge graphs OLAP/analytical workloads (better in columnar stores)

Time-series databases (TimescaleDB, InfluxDB, QuestDB)

Concern Guidance
Schema design Wide vs narrow table; tags vs fields; time partitioning
Retention policies Automatic downsampling (raw → hourly → daily); tiered storage
Query patterns Aggregation over time windows; last-known-value; gap filling
Ingestion Batch vs streaming; out-of-order handling; write amplification

6. Polyglot persistence

Using multiple database types, each optimized for its access pattern:

Access pattern Database type Example
Transactional CRUD Relational (PostgreSQL, MySQL) User accounts, orders, inventory
Full-text search Search engine (Elasticsearch, Meilisearch) Product search, log search
Caching Key-value (Redis, Memcached) Session store, API response cache
Graph traversal Graph (Neo4j, Neptune) Recommendations, social graph
Time-series Time-series (TimescaleDB, InfluxDB) Metrics, IoT telemetry, financial ticks
Document / flexible schema Document (MongoDB, DynamoDB) Content management, product catalogs
Analytical Columnar (ClickHouse, BigQuery) Business intelligence, event analytics

Synchronization strategies

Strategy Description Consistency
Dual write Application writes to both stores Inconsistent on failure (avoid without outbox)
Change data capture (CDC) Capture database changes and propagate to secondary stores Eventually consistent; reliable
Event sourcing Events are the source of truth; projections build views in each store Eventually consistent; auditable
ETL / scheduled sync Periodic batch transfer from primary to secondary Latency = batch interval; simple

External references

Topic URL Why it is linked
Use The Index, Luke https://use-the-index-luke.com/ SQL indexing and performance — practical and thorough
PostgreSQL Documentation https://www.postgresql.org/docs/ Authoritative reference for advanced SQL features
Designing Data-Intensive Applications https://dataintensive.net/ Distributed data systems — the essential reference
Database Internals (Petrov) https://www.databass.dev/ How databases work under the hood
MongoDB Data Modeling https://www.mongodb.com/docs/manual/data-modeling/ Document database modeling patterns
DynamoDB Book (Alex DeBrie) https://www.dynamodbbook.com/ DynamoDB single-table design patterns