Performance

Peak-Traffic Performance Testing for Microservices

Extended and matured a performance testing platform so a microservices system could withstand high-traffic peak events.

Aspire IT Services

3,000 concurrent VUsers
M+ requests per hour
100s of endpoints tested

The Problem

The team anticipated high-traffic periods but lacked confidence in how the system would perform under load. Existing performance testing was limited, with no clear targets and little observability into system behavior under stress. The microservices were tightly coupled, so a single degraded service risked cascading failures. I was brought in to evolve and scale the performance testing program: strengthening coverage, defining benchmarks, and improving visibility ahead of peak periods.

Performance Testing Architecture

[Architecture diagram] Gatling scripts (Scala; ramp / spike / sustained load scenarios; 3,000 concurrent VUs; M+ requests per hour) drive the microservices under test (Auth Service, User Service, Product API, Search Service, Order / Checkout, Notification Service, and more). Grafana tracks CPU, memory, p95 latency, throughput, and error rate; New Relic APM provides distributed traces and service-level bottleneck analysis. Outcomes: bottlenecks found, baselines established, fixes prioritised, and reports auto-published as HTML to GitHub Pages.

Wrote Gatling scripts in Scala covering hundreds of endpoints across all microservices. Designed three scenario types: ramp-up (gradual user growth), sustained load (steady state at target TPS), and spike (sudden 3,000 VU burst). Monitored CPU, memory, p95 latency, throughput, and error rates simultaneously using Grafana dashboards and New Relic APM traces. Automated test report generation and publishing to GitHub Pages, so the team had full results immediately after every run without any manual steps.
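The three scenario types above can be sketched in Gatling's Scala DSL roughly as follows. This is a minimal illustration, not the project's actual suite: the base URL, endpoint paths, durations, and arrival rates are all placeholder values.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class PeakTrafficSimulation extends Simulation {

  // Hypothetical base URL; the real tests targeted internal service endpoints.
  val httpProtocol = http.baseUrl("https://api.example.internal")

  // One representative user journey; the real scripts covered hundreds of endpoints.
  val browseAndCheckout = scenario("Browse and checkout")
    .exec(http("search").get("/search?q=widgets"))
    .pause(1)
    .exec(http("product").get("/products/123"))
    .pause(1)
    .exec(
      http("checkout")
        .post("/orders")
        .body(StringBody("""{"productId":123}"""))
        .asJson
    )

  setUp(
    browseAndCheckout.inject(
      rampUsers(3000).during(10.minutes),          // ramp-up: gradual user growth
      constantUsersPerSec(100).during(30.minutes), // sustained: steady-state load
      nothingFor(1.minute),
      atOnceUsers(3000)                            // spike: sudden 3,000 VU burst
    )
  ).protocols(httpProtocol)
}
```

Chaining injection steps on one scenario keeps all three load shapes in a single run; they can equally be split into separate simulations per scenario type.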

Business Impact

  • 3,000 concurrent virtual users generating millions of API requests per hour — revealing real bottlenecks before any traffic event hit production
  • Multiple microservice-level bottlenecks identified and fixed before peak periods — preventing cascading failures in production
  • System performance baselines established for the first time: latency targets, max throughput, memory ceilings per service
  • Automated report pipeline meant the entire team had access to test results instantly — no manual reporting overhead
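Once baselines exist, they can be enforced automatically on every run via Gatling's assertions DSL, failing the build when a threshold regresses. A minimal sketch, where the threshold numbers are illustrative stand-ins for the per-service baselines, not the team's actual targets:

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class BaselineGuardSimulation extends Simulation {

  // Hypothetical base URL and endpoint.
  val httpProtocol = http.baseUrl("https://api.example.internal")

  val steadyState = scenario("Baseline check")
    .exec(http("health").get("/health"))

  setUp(steadyState.inject(constantUsersPerSec(50).during(5.minutes)))
    .protocols(httpProtocol)
    .assertions(
      // Placeholder thresholds standing in for the established baselines:
      global.responseTime.percentile3.lt(500), // 3rd configured percentile (p95 by default) under 500 ms
      global.failedRequests.percent.lt(1.0),   // error rate below 1%
      global.requestsPerSec.gte(40)            // minimum sustained throughput
    )
}
```

Assertion results are included in the generated HTML report, so a threshold breach is visible in the same auto-published output the team already consumes.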
Stack: Gatling (Scala) · Grafana · New Relic · AWS · Docker · Jenkins · Cypress · Google Lighthouse