Peak-Traffic Performance Testing for Microservices
Extended and matured a microservices performance testing platform to withstand high-traffic peak events.
Aspire IT Services
The Problem
The team anticipated high-traffic peak events but lacked confidence in how the system would behave under load. Existing performance testing was limited, with no defined targets and little observability into system behavior under stress. Microservices were tightly coupled, raising the risk of cascading failures. I was brought in to evolve and scale the performance testing program: strengthening coverage, defining benchmarks, and improving visibility ahead of peak periods.
Performance Testing Architecture
Wrote Gatling scripts in Scala covering hundreds of endpoints across all microservices. Designed three scenario types: ramp-up (gradual user growth), sustained load (steady-state at target TPS), and spike (sudden 3,000 VU burst). Monitored CPU, memory, p95 latency, throughput, and error rates simultaneously using Grafana dashboards and New Relic APM traces. Automated test report generation and publishing to GitHub Pages — so the team had full results immediately after every run without any manual steps.
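The three scenario types can be sketched as a single Gatling simulation with three injection profiles. This is a minimal illustration, not the production scripts: the base URL, endpoint, user counts, durations, and latency threshold are all hypothetical placeholders, and running it requires the Gatling Scala DSL on the classpath.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Hypothetical simulation sketching the three scenario types described above.
class PeakTrafficSimulation extends Simulation {

  // Placeholder base URL; the real suite covered hundreds of endpoints.
  val httpProtocol = http.baseUrl("https://api.example.internal")

  val browse = scenario("Browse endpoint")
    .exec(http("list items").get("/items").check(status.is(200)))

  setUp(
    // Ramp-up: gradual user growth over a fixed window
    browse.inject(rampUsers(1000).during(10.minutes)),
    // Sustained load: steady arrival rate approximating a target TPS
    browse.inject(constantUsersPerSec(50).during(30.minutes)),
    // Spike: sudden 3,000 VU burst
    browse.inject(atOnceUsers(3000))
  ).protocols(httpProtocol)
    .assertions(
      // percentile3 maps to p95 under Gatling's default percentile config;
      // the 800 ms target here is illustrative only
      global.responseTime.percentile3.lt(800),
      global.failedRequests.percent.lt(1.0)
    )
}
```

Encoding the latency and error-rate targets as Gatling assertions is what lets a CI run fail automatically when a baseline regresses, which pairs naturally with the automated report publishing described above.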
Business Impact
- 3,000 concurrent virtual users generating millions of API requests per hour — revealing real bottlenecks before any traffic event hit production
- Multiple microservice-level bottlenecks identified and fixed before peak periods — preventing cascading failures in production
- System performance baselines established for the first time: latency targets, max throughput, memory ceilings per service
- Automated report pipeline meant the entire team had access to test results instantly — no manual reporting overhead