How We Migrated a Naval Logistics Monolith to Microservices with Zero Downtime

A practitioner's account of migrating a decade-old Java monolith handling millions of annual shipments to an event-driven microservices architecture — without a maintenance window.

15 March 2026

When the operations team told us that a single deployment took four hours and still required rollback procedures, we knew the monolith had reached its ceiling. This is the story of how we migrated it — and what we would do differently.

The Starting Point

The platform was a Spring MVC application built in 2013. It handled shipment tracking, billing, customer notifications, and reporting — all in one deployable JAR. The database was a shared Oracle schema that every feature touched.

The symptoms were familiar:

  • Release cycles of 3–4 weeks because every team had to coordinate
  • A single bug in billing could take down the tracking API
  • Peak-load performance degraded unpredictably because the connection pool was shared

Why the Strangler-Fig Pattern

A big-bang rewrite was off the table. We had SLAs to meet and a team that needed to keep delivering features. The strangler-fig pattern let us migrate incrementally: new services handle a slice of the load in parallel with the monolith until confidence is high enough to decommission the old path.

Step 1: The Event Backbone

Before extracting a single service, we deployed Apache Kafka and defined the canonical domain events for shipment state changes:

  • ShipmentCreated
  • ShipmentPickedUp
  • ShipmentInTransit
  • ShipmentDelivered
  • ShipmentException

Getting these right took two weeks of whiteboard sessions. The schema is the contract — changing it later is expensive.
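One way to make that contract explicit in code is a sealed hierarchy, so every consumer is forced to handle exactly the event types the schema allows. This is an illustrative sketch, not the production schema — the field names (`origin`, `currentHub`, `signedBy`, `reason`) are assumptions:

```java
import java.time.Instant;

// Canonical shipment events as a sealed hierarchy: the compiler knows the
// complete set of event types, so consumers can switch over them exhaustively.
sealed interface ShipmentEvent
        permits ShipmentCreated, ShipmentPickedUp, ShipmentInTransit,
                ShipmentDelivered, ShipmentException {
    String shipmentId();
    Instant occurredAt();
}

record ShipmentCreated(String shipmentId, Instant occurredAt,
                       String origin, String destination) implements ShipmentEvent {}
record ShipmentPickedUp(String shipmentId, Instant occurredAt) implements ShipmentEvent {}
record ShipmentInTransit(String shipmentId, Instant occurredAt,
                         String currentHub) implements ShipmentEvent {}
record ShipmentDelivered(String shipmentId, Instant occurredAt,
                         String signedBy) implements ShipmentEvent {}
record ShipmentException(String shipmentId, Instant occurredAt,
                         String reason) implements ShipmentEvent {}
```

Modeling the events as immutable records also keeps the serialization surface small, which matters once the schema is frozen as a contract.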

Step 2: Extract the Tracking Service

The tracking read path was the highest-traffic, lowest-coupling domain — the ideal first target. We:

  1. Built a new Spring Boot service that consumed Shipment* events from Kafka and maintained its own PostgreSQL read model
  2. Deployed it behind a feature flag, routing 1% of tracking queries to it
  3. Ran the old and new paths in parallel for two weeks, comparing outputs
  4. Ramped traffic to 100% once P99 latency and error rates were stable
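The ramp in steps 2–4 can be sketched as deterministic bucket routing: hash the shipment ID into 100 buckets and send a bucket to the new service only if it falls below the current rollout percentage. The same shipment always routes the same way, so a caller's experience does not flip between paths mid-session. The class and method names here are illustrative, not our actual gateway code:

```java
// Sketch of a sticky percentage-based traffic ramp for the tracking read path.
public final class TrackingRouter {
    private final int rolloutPercent; // 0..100, raised as confidence grows

    public TrackingRouter(int rolloutPercent) {
        if (rolloutPercent < 0 || rolloutPercent > 100) {
            throw new IllegalArgumentException("rolloutPercent must be 0..100");
        }
        this.rolloutPercent = rolloutPercent;
    }

    // floorMod keeps the bucket non-negative even when hashCode() is negative.
    public boolean useNewService(String shipmentId) {
        int bucket = Math.floorMod(shipmentId.hashCode(), 100);
        return bucket < rolloutPercent;
    }
}
```

At `rolloutPercent = 1` roughly one bucket in a hundred hits the new service; at 100, everything does.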

The monolith continued publishing events through a new outbox table (the transactional outbox pattern) to avoid dual-write problems.
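A simplified in-memory model of the outbox idea, with a synchronized method standing in for the single database transaction (in production the shipment row and the outbox row commit together in Oracle, and a relay process polls the table and publishes to Kafka — none of that infrastructure appears here):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Transactional-outbox sketch: the state change and its event record are
// written in one atomic step, so neither can exist without the other.
final class ShipmentStore {
    private final Map<String, String> shipments = new HashMap<>();
    private final Deque<String> outbox = new ArrayDeque<>();

    // Stand-in for "one database transaction": update state and enqueue
    // the corresponding event atomically, or do neither.
    public synchronized void updateStatus(String shipmentId, String status) {
        shipments.put(shipmentId, status);
        outbox.add("ShipmentStatusChanged:" + shipmentId + ":" + status);
    }

    // Relay step: drain pending events for publication (to Kafka, in the
    // real system). Delivery is at-least-once; consumers must be idempotent.
    public synchronized List<String> drainOutbox() {
        List<String> batch = new ArrayList<>(outbox);
        outbox.clear();
        return batch;
    }

    public synchronized String statusOf(String shipmentId) {
        return shipments.get(shipmentId);
    }
}
```

The point of the pattern is the atomicity in `updateStatus`: a dual write (commit to the database, then publish to Kafka as a separate step) can fail halfway and leave the two stores disagreeing.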

Step 3: Billing and Notifications

Same approach, six weeks apart. Each extraction taught us something that made the next one faster.

The billing service was the hardest: it had the most complex domain logic and the tightest database coupling. We wrote a dedicated migration guide for it that the team still references.

What We Learned

Schema first. The event schema is your API. Invest in it before writing any service code.

The outbox pattern is not optional. Writing to Kafka directly from a business transaction is a distributed-systems bug waiting to happen: the database commit and the publish can fail independently, leaving the two out of sync. The outbox makes the state change and the event record atomic within the database boundary; the relay then delivers at-least-once, so consumers must be idempotent.

Observability before migration. We added Prometheus metrics and structured logging to the monolith before we started. Without baseline numbers, you cannot know whether the new service is actually better.
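In production those baselines came from Prometheus histograms, but the comparison itself boils down to computing percentiles from latency samples. A minimal nearest-rank sketch (not our instrumentation code):

```java
import java.util.Arrays;

// Nearest-rank percentile over recorded request latencies: the smallest
// value such that at least p percent of samples are <= it.
final class LatencyPercentile {
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }
}
```

Recording P99 for the old path before migration is what made "P99 latency and error rates were stable" in the ramp-up step a testable claim rather than a feeling.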

Strangler-fig requires discipline. It is tempting to skip the parallel-run step when you are confident in the new code. Resist this. The production data will surprise you.
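The parallel run can be sketched as a shadow comparison: call both paths, serve the old result to the caller, and count divergences for investigation. `Supplier` stands in for the real service clients; the class name is illustrative:

```java
import java.util.Objects;
import java.util.function.Supplier;

// Shadow-traffic comparator: the old path's answer is always returned,
// the new path runs as a shadow, and mismatches are recorded.
final class ShadowComparator<T> {
    private long mismatches = 0;

    public T compare(Supplier<T> oldPath, Supplier<T> newPath) {
        T oldResult = oldPath.get();
        try {
            T newResult = newPath.get();
            if (!Objects.equals(oldResult, newResult)) {
                mismatches++;
            }
        } catch (RuntimeException e) {
            mismatches++; // a failing shadow must never fail the caller
        }
        return oldResult;
    }

    public long mismatchCount() { return mismatches; }
}
```

Swallowing shadow-path exceptions is deliberate: during the parallel run, the new service must be able to crash without any caller noticing.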

Results

  • End-to-end throughput tripled at peak load
  • Teams deployed independently from day one of extraction
  • The last monolith endpoint was decommissioned eight months after we started

The migration is done. The maintenance windows are gone.