Microservices Architecture Guide 2026: Design, Communication and Resilience


Microservices architecture has matured significantly by 2026. After years of "microservices hell", in which teams split systems into too many overly small services, the industry has converged on patterns that balance modularity with operational simplicity. This guide covers when to use microservices, how to design them, and how to handle the hard parts: inter-service communication, tracing, and resilience.

📋 Table of Contents

Monolith vs Microservices vs Modular Monolith
Bounded Contexts and Service Design
Communication Patterns
API Gateway Pattern
Distributed Tracing
Service Health and Resilience
When NOT to Use Microservices

Monolith vs Microservices vs Modular Monolith

Approach	When to Use	Trade-offs
Monolith	Startup, <10 engineers, unknown domain	Simple to build/debug, hard to scale independently
Modular Monolith	Growing team, clear bounded contexts, single deployment	Modular code with simple operations, but modules cannot be scaled or deployed independently
Microservices	Large org, independent team scaling, polyglot tech	Independent scaling/deployment, complex distributed system

2026 consensus: Start with a modular monolith and extract a service only when you have a clear reason (scale, independent deployment, team autonomy).
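
One way to keep a modular monolith ready for later extraction is to let modules call each other only through a narrow interface. The sketch below illustrates the idea under stated assumptions: `InventoryAPI`, `InProcessInventory`, and `place_order` are hypothetical names for this example and do not come from any framework.

```python
from typing import Dict, Protocol


class InventoryAPI(Protocol):
    """The only surface the orders module may use from the inventory module."""

    def reserve(self, sku: str, qty: int) -> bool: ...


class InProcessInventory:
    """In-process implementation used while everything ships as one deployment."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self._stock = dict(stock)

    def reserve(self, sku: str, qty: int) -> bool:
        available = self._stock.get(sku, 0)
        if qty <= 0 or available < qty:
            return False
        self._stock[sku] = available - qty
        return True


def place_order(inventory: InventoryAPI, sku: str, qty: int) -> str:
    """Orders module logic: depends on the interface, not on inventory internals."""
    return "confirmed" if inventory.reserve(sku, qty) else "rejected"
```

If inventory is later extracted into its own service, a class that implements `reserve` over HTTP could replace `InProcessInventory` without changing `place_order`.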

Bounded Contexts and Service Design

Good microservices align with Domain-Driven Design (DDD) bounded contexts:

E-commerce domain bounded contexts:

Order Service (owns orders, order items)
  - Create order
  - Update order status
  - Get order history

Product Service (owns products, inventory)
  - List products
  - Update inventory
  - Product search

User Service (owns users, auth)
  - Register, login
  - Profile management
  - Authentication tokens

Payment Service (owns payment processing)
  - Process payment
  - Handle refunds
  - Payment status

Notification Service (owns notifications)
  - Email, SMS, push
  - Triggered by events from other services

Each service:
  - Has its own database (database-per-service pattern)
  - Is owned by one team
  - Can be deployed independently
  - Communicates via API or events (not shared DB)
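
The database-per-service rule can be sketched with two in-memory SQLite databases standing in for the private stores of the User and Order services. This is a minimal illustration, not a production layout: the table schemas, the sample rows, and the function names `user_service_get_user` and `order_service_get_order_view` are assumptions made for the example.

```python
import sqlite3
from typing import Optional

# Each service owns a private database; two in-memory SQLite databases stand in here.
user_db = sqlite3.connect(":memory:")
user_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
user_db.execute("INSERT INTO users VALUES (1, 'ada@example.com')")

order_db = sqlite3.connect(":memory:")
# No foreign key to users: the Order Service stores only the user's ID.
order_db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
order_db.execute("INSERT INTO orders VALUES (100, 1, 59.9)")


def user_service_get_user(user_id: int) -> Optional[dict]:
    """Stand-in for the User Service API; only this function touches user_db."""
    row = user_db.execute(
        "SELECT id, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return {"id": row[0], "email": row[1]} if row else None


def order_service_get_order_view(order_id: int) -> Optional[dict]:
    """Order Service reads its own table, then asks the User Service for user data."""
    row = order_db.execute(
        "SELECT id, user_id, total FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    if row is None:
        return None
    user = user_service_get_user(row[1])  # a network call in a real deployment
    return {
        "order_id": row[0],
        "total": row[2],
        "user_email": user["email"] if user else None,
    }
```

The Order Service never joins against the users table. It composes the view from its own data plus an API call, which is the cost of keeping the two stores independent.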

Communication Patterns

Synchronous (REST/gRPC)

# REST API call between services
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential

# Retry up to 3 times with exponential backoff (1s to 10s between attempts).
# Note: as written, this retries on any exception, including 4xx responses.
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10))
async def get_user(user_id: int) -> dict:
    async with httpx.AsyncClient(base_url="http://user-service:8001") as client:
        r = await client.get(f"/api/users/{user_id}", timeout=5.0)
        r.raise_for_status()  # turn 4xx/5xx responses into exceptions
        return r.json()

# Use circuit breaker for resilience
from circuitbreaker import circuit

# After 5 consecutive failures the circuit opens and calls fail fast for 30s.
@circuit(failure_threshold=5, recovery_timeout=30)
async def call_payment_service(order_id: int, amount: float) -> httpx.Response:
    async with httpx.AsyncClient(timeout=10) as client:
        r = await client.post("http://payment-service/payments",
                              json={"order_id": order_id, "amount": amount})
        r.raise_for_status()  # without this, HTTP error responses never trip the breaker
        return r
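
To show what a circuit breaker does conceptually, here is a minimal synchronous sketch of the closed, open, and half-open states. It is not the implementation of the `circuitbreaker` library used above. `SimpleCircuitBreaker` and `CircuitOpenError` are hypothetical names, and the injectable `clock` parameter exists only to make the behaviour easy to test.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class SimpleCircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Recovery timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers get an immediate `CircuitOpenError` instead of waiting on a dependency that is likely still down, which protects both the caller's latency and the struggling service.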

Asynchronous (Event-Driven with Kafka)

from datetime import datetime, timezone
import json

from aiokafka import AIOKafkaProducer, AIOKafkaConsumer

# Produce event when order is created.
# Starting a producer per call keeps the example short; a real service would
# normally create one producer at startup and reuse it.
async def publish_order_created(order: dict):
    producer = AIOKafkaProducer(bootstrap_servers="kafka:9092")
    await producer.start()
    try:
        event = {
            "event_type": "order.created",
            "order_id": order["id"],
            "user_id": order["user_id"],
            "total": order["total"],
            "timestamp": datetime.now(timezone.utc).isoformat()
        }
        # send_and_wait returns once the broker has acknowledged the message
        await producer.send_and_wait("orders", json.dumps(event).encode())
    finally:
        await producer.stop()

# Notification service consumes the event
async def consume_order_events():
    consumer = AIOKafkaConsumer(
        "orders",
        bootstrap_servers="kafka:9092",
        group_id="notification-service",
        auto_offset_reset="earliest"
    )
    await consumer.start()
    try:
        async for msg in consumer:
            event = json.loads(msg.value)
            if event["event_type"] == "order.created":
                # send_confirmation_email is assumed to be defined elsewhere
                await send_confirmation_email(event["user_id"], event["order_id"])
    finally:
        await consumer.stop()  # leave the consumer group cleanly
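
Event consumers can receive the same message more than once, for example after a restart that happens before offsets are committed, so side effects such as sending email are usually made idempotent. The sketch below shows one way to do that, assuming `order_id` uniquely identifies an `order.created` event. `OrderEventHandler` is a hypothetical class, and the in-memory set stands in for the durable store a real service would need.

```python
import json


class OrderEventHandler:
    """Runs the email side effect once per order, even if the event is redelivered."""

    def __init__(self, send_email):
        self._send_email = send_email
        self._seen = set()  # assumption: a real service would use a durable store

    def handle(self, raw: bytes) -> bool:
        """Return True if the event triggered an email, False if it was skipped."""
        event = json.loads(raw)
        if event.get("event_type") != "order.created":
            return False  # not an event this handler cares about
        # Assumption: order_id uniquely identifies an order.created event.
        key = (event["event_type"], event["order_id"])
        if key in self._seen:
            return False  # duplicate delivery, skip the side effect
        self._send_email(event["user_id"], event["order_id"])
        self._seen.add(key)
        return True
```

Inside the consumer loop, `handler.handle(msg.value)` would replace the direct call to the email function.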

API Gateway Pattern

# Kong / nginx / custom gateway routes
# All clients talk to one entry point:
# POST /api/orders -> Order Service :8001
# GET /api/products -> Product Service :8002
# POST /api/auth -> User Service :8003

# Kong route configuration (simplified; per-plugin settings omitted)
routes:
  - name: orders
    paths: ["/api/orders"]
    service: order-service
    plugins:
      - name: jwt           # validate JWT
      - name: rate-limiting # 100/min per user
      - name: request-transformer

  - name: products
    paths: ["/api/products"]
    service: product-service
    plugins:
      - name: response-ratelimiting
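
The "100/min per user" limit in the route comments can be illustrated with a fixed-window counter. This is a conceptual sketch, not how Kong implements its rate-limiting plugin. `FixedWindowRateLimiter` is a hypothetical class, and the injectable `clock` is there only for testing.

```python
import time
from collections import defaultdict


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per user in each fixed time window."""

    def __init__(self, limit=100, window_seconds=60, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        # (user_id, window index) -> request count; old windows are never
        # evicted in this sketch, which a real limiter would have to handle.
        self._counts = defaultdict(int)

    def allow(self, user_id: str) -> bool:
        window = int(self._clock() // self.window_seconds)
        key = (user_id, window)
        if self._counts[key] >= self.limit:
            return False  # over the limit: a gateway would answer HTTP 429
        self._counts[key] += 1
        return True
```

Fixed windows are simple but allow short bursts around window boundaries. Sliding-window or token-bucket schemes smooth that out at the cost of more state.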

Distributed Tracing

# OpenTelemetry: trace requests across services
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
# The Thrift-based Jaeger exporter is deprecated in OpenTelemetry Python.
# Jaeger accepts OTLP directly, so export over OTLP/gRPC (default port 4317).
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Setup (once at startup); "order-service" is a placeholder service name
provider = TracerProvider(resource=Resource.create({"service.name": "order-service"}))
exporter = OTLPSpanExporter(endpoint="http://jaeger:4317", insecure=True)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

# Instrument FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor

FastAPIInstrumentor.instrument_app(app)  # app is the service's FastAPI instance
HTTPXClientInstrumentor().instrument()   # auto-instruments httpx calls

# Manual span for business logic
# (get_order, charge_payment and send_confirmation are assumed to exist elsewhere)
async def process_order(order_id: int):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        order = await get_order(order_id)
        span.set_attribute("order.total", order["total"])
        await charge_payment(order)
        await send_confirmation(order)
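
Traces span services because each outgoing request carries the trace ID in a header. By default, OpenTelemetry uses the W3C `traceparent` header for this. The sketch below shows the header format by hand, purely for illustration, since the instrumentation libraries above do this automatically. `parse_traceparent` and `child_traceparent` are hypothetical helper names.

```python
import re
import secrets
from typing import Optional, Tuple

# version "00", 32 hex chars of trace ID, 16 hex chars of span ID, 2 hex chars of flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str) -> Optional[Tuple[str, str, str]]:
    """Return (trace_id, parent_span_id, flags), or None if the header is invalid."""
    m = _TRACEPARENT.match(header.strip())
    if not m:
        return None
    trace_id, span_id, flags = m.groups()
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None  # all-zero IDs are invalid in the W3C format
    return trace_id, span_id, flags


def child_traceparent(incoming: Optional[str]) -> str:
    """Build the header for an outgoing call: same trace ID, fresh span ID."""
    parsed = parse_traceparent(incoming) if incoming else None
    if parsed is None:
        # No valid parent: start a new trace, marked as sampled ("01").
        trace_id, flags = secrets.token_hex(16), "01"
    else:
        trace_id, _, flags = parsed
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

Because every hop keeps the trace ID and replaces only the span ID, the tracing backend can stitch spans from different services into one request timeline.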

Service Health and Resilience

from datetime import datetime, timezone

from fastapi import FastAPI, HTTPException

app = FastAPI()

# check_db, check_redis, db_pool and redis_client are assumed to be defined
# elsewhere in the service.

@app.get("/health")
async def health():
    return {
        "status": "healthy",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "version": "1.2.3",
        "dependencies": {
            "database": await check_db(),
            "redis": await check_redis(),
        }
    }

@app.get("/ready")
async def readiness():
    # Check if service can handle traffic
    if not db_pool or not redis_client:
        raise HTTPException(status_code=503, detail="Service not ready")
    return {"status": "ready"}

# Kubernetes probes:
# livenessProbe: /health (if fails: restart container)
# readinessProbe: /ready (if fails: remove from load balancer)
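
The `/health` endpoint above calls `check_db()` and `check_redis()` without showing them. One reasonable shape for such checks is to run each probe with a timeout so that a hung dependency cannot hang the health endpoint itself. The sketch below assumes each probe is an async callable. `check_dependency` and `gather_health` are hypothetical helpers, not part of FastAPI.

```python
import asyncio


async def check_dependency(probe, timeout: float = 1.0) -> str:
    """Run an async probe and map the outcome to a status string."""
    try:
        await asyncio.wait_for(probe(), timeout=timeout)
        return "ok"
    except asyncio.TimeoutError:
        return "timeout"
    except Exception:
        return "error"


async def gather_health(probes: dict, timeout: float = 1.0) -> dict:
    """Check all dependencies concurrently so one slow probe does not delay the rest."""
    names = list(probes)
    results = await asyncio.gather(
        *(check_dependency(probes[name], timeout) for name in names)
    )
    deps = dict(zip(names, results))
    status = "healthy" if all(v == "ok" for v in deps.values()) else "degraded"
    return {"status": status, "dependencies": deps}
```

A handler could then return `await gather_health({"database": ping_db, "redis": ping_redis})`, where `ping_db` and `ping_redis` are placeholders for whatever cheap round trip each client offers.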

When NOT to Use Microservices

Small team (under 5 engineers) — operational overhead kills velocity
Early product — domain boundaries not yet clear
Simple CRUD app — no scaling justification
Network latency is critical — inter-service calls add latency
No DevOps maturity — need CI/CD, monitoring, tracing first
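
The latency point can be made concrete with simple arithmetic. In the sketch below, the per-hop figures in the usage note are made-up illustrative numbers, and both function names are hypothetical. Sequential calls add their latencies, while calls issued in parallel cost roughly as much as the slowest one.

```python
from typing import Sequence


def sequential_latency_ms(hops: Sequence[float]) -> float:
    """Total latency when each inter-service call waits for the previous one."""
    return sum(hops)


def parallel_latency_ms(hops: Sequence[float]) -> float:
    """Approximate latency when independent calls are issued concurrently."""
    return max(hops, default=0)
```

With three hypothetical downstream calls of 5 ms, 12 ms, and 8 ms, `sequential_latency_ms([5, 12, 8])` gives 25 and `parallel_latency_ms([5, 12, 8])` gives 12. An in-process call in a monolith would typically add far less than either.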

Microservices in 2026 work best when: teams are large enough to own services independently, bounded contexts are clear, and you have the DevOps infrastructure to handle distributed systems. Start with a well-structured modular monolith and extract services only when you hit clear pain points.
