Monitoring

API Gateway Metrics

By Devon Patel, May 5, 2025 (10 min read)

Effective monitoring of API gateways requires understanding the key metrics that reveal performance, errors, and usage patterns. This guide covers essential metrics for AWS API Gateway and Oracle API Gateway, showing how to interpret them and set up effective monitoring workflows.

Why Gateway Metrics Matter

API gateways sit in front of every request, so they can easily become bottlenecks. Monitoring these five areas helps catch problems before they reach users:

Error rates (4xx/5xx)
Latency percentiles
Cache effectiveness
Traffic patterns
Backend integration health

AWS API Gateway Metrics (CloudWatch)

4XXError

Client-side errors (HTTP 4xx) including modified gateway responses

Namespace: AWS/ApiGateway
Unit: Count

5XXError

Server-side errors (HTTP 5xx), raised either by the gateway itself or by the backend integration

Namespace: AWS/ApiGateway
Unit: Count

Latency

Time from when API Gateway receives a request to when it returns the response to the client

Includes integration latency
Unit: Milliseconds

IntegrationLatency

Time between API Gateway relaying a request to the backend and receiving the backend's response

Isolates backend performance
Unit: Milliseconds

CacheHitCount

Requests served from the API cache (when caching is enabled)

Compare with CacheMissCount
Unit: Count

Count

Total API requests in the period; this is your traffic baseline

Primary volume metric
Unit: Count
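Because Latency includes IntegrationLatency, the difference between the two approximates the time spent in the gateway itself. The sketch below illustrates that arithmetic. The function names are hypothetical, and it assumes both values cover the same requests or the same period. With averages the result is only an estimate, for example when cached responses never reach the backend.

```python
def gateway_overhead_ms(latency_ms: float, integration_latency_ms: float) -> float:
    """Approximate time spent inside the gateway, excluding the backend call."""
    return latency_ms - integration_latency_ms


def backend_share(latency_ms: float, integration_latency_ms: float) -> float:
    """Fraction of total latency attributable to the backend (0.0 if no latency)."""
    if latency_ms <= 0:
        return 0.0
    return integration_latency_ms / latency_ms


# Illustrative values only: 120 ms end to end, 90 ms of it in the backend.
overhead = gateway_overhead_ms(120.0, 90.0)   # 30.0 ms in the gateway
share = backend_share(120.0, 90.0)            # 0.75 of the time in the backend
```

A high backend share points at the integration, while a growing overhead suggests looking at gateway-side work such as authorizers or request mapping.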

Sample AWS CLI Command

aws cloudwatch get-metric-statistics \
  --namespace AWS/ApiGateway \
  --metric-name Latency \
  --dimensions Name=ApiName,Value=MyAPI Name=Stage,Value=prod \
  --statistics Average \
  --period 3600 \
  --start-time 2025-05-01T00:00:00Z \
  --end-time 2025-05-02T00:00:00Z
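The command above returns JSON with a Datapoints list, and CloudWatch does not guarantee that the points arrive in chronological order. A minimal sketch of post-processing that output follows. The helper name and the sample values are made up for illustration, and it assumes the output was captured as a string.

```python
import json
from datetime import datetime

# Made-up sample shaped like get-metric-statistics output (values are illustrative).
SAMPLE_OUTPUT = """
{
  "Label": "Latency",
  "Datapoints": [
    {"Timestamp": "2025-05-01T02:00:00Z", "Average": 48.5, "Unit": "Milliseconds"},
    {"Timestamp": "2025-05-01T01:00:00Z", "Average": 52.0, "Unit": "Milliseconds"}
  ]
}
"""


def sorted_datapoints(cli_output: str, statistic: str = "Average"):
    """Return (timestamp, value) pairs in chronological order."""
    payload = json.loads(cli_output)
    points = []
    for point in payload.get("Datapoints", []):
        # Normalize a trailing "Z" so fromisoformat accepts it on older Pythons.
        ts = datetime.fromisoformat(point["Timestamp"].replace("Z", "+00:00"))
        points.append((ts, point[statistic]))
    return sorted(points)


for ts, value in sorted_datapoints(SAMPLE_OUTPUT):
    print(ts.isoformat(), value)
```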

Oracle API Gateway Metrics

AWS Metric	Oracle Equivalent	Key Differences
4XXError	HttpResponses{status=4xx}	Oracle separates by exact status code
5XXError	HttpResponses{status=5xx}, BackendHttpResponses{status=5xx}	Oracle reports gateway responses and backend responses separately
Latency	Latency + InternalLatency	Oracle provides more granular timing breakdown
CacheHitCount	ResponseCacheAction	Oracle includes cache write metrics

Unique Oracle Metrics

Data Volume

BytesReceived: Inbound data size
BytesSent: Outbound data size

Business Metrics

UsagePlanRequests: Track by subscription tier
SubscriberRequests: Per-client usage

Viewing and Analyzing Metrics

CloudWatch Console

Navigate to CloudWatch Metrics
Select API Gateway namespace
Filter by API, stage, or method
Create dashboards with key metrics

Third-Party Tools

Datadog: Correlate with app metrics
Prometheus: For custom metric collection
Grafana: Visualization and alerting

Method-Level Metrics

AWS publishes per-method metrics only after detailed metrics are explicitly enabled, which may incur additional charges. Oracle provides these by default, but the extra dimensions mean higher metric cardinality.

Creating Effective Alerts

Error Rate Alerts

Trigger when 5xx errors exceed 1% of traffic or 4xx errors spike unexpectedly
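The 1% rule is a ratio of 5XXError to Count over the same period. The sketch below shows one way to evaluate it. The function names and the min_requests guard are assumptions added for illustration; the guard stops a handful of requests from tripping the alert on quiet periods.

```python
def error_rate(error_count: int, request_count: int) -> float:
    """Errors as a fraction of requests; 0.0 when there was no traffic."""
    if request_count <= 0:
        return 0.0
    return error_count / request_count


def should_alert_5xx(error_count: int, request_count: int,
                     threshold: float = 0.01, min_requests: int = 100) -> bool:
    """True when the 5xx rate exceeds the threshold on meaningful traffic."""
    if request_count < min_requests:
        return False  # too few requests for the ratio to be reliable
    return error_rate(error_count, request_count) > threshold


# Illustrative values: 15 errors in 1,000 requests is 1.5%, above the 1% threshold.
print(should_alert_5xx(15, 1000))
```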

Latency Alerts

Alert when p99 latency crosses an SLO threshold (e.g., >500 ms)
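CloudWatch can compute percentile statistics such as p99 itself. When working from raw latency samples instead (for example, values taken from access logs), a percentile check can be sketched as below. This uses the nearest-rank method, which is one of several percentile definitions, and the function names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def breaches_slo(latencies_ms, threshold_ms=500, pct=99):
    """True when the chosen percentile of latency is above the SLO threshold."""
    return percentile(latencies_ms, pct) > threshold_ms
```

With only a few samples, p99 is effectively the maximum, so it helps to evaluate this over windows with enough traffic.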

Traffic Anomalies

Detect unusual request patterns that may indicate attacks
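One simple way to flag unusual traffic is to compare the current request count against recent history using a z-score. This is a minimal sketch under that assumption, not a full anomaly detector. The function name and the default threshold of 3 standard deviations are illustrative choices.

```python
from statistics import mean, pstdev


def is_traffic_anomaly(history, current, z_threshold=3.0):
    """True when current deviates from the historical mean by more than z_threshold sigmas."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is a deviation.
        return current != mu
    return abs(current - mu) / sigma > z_threshold


# Illustrative request counts per period.
recent = [100, 110, 90, 105, 95]
print(is_traffic_anomaly(recent, 500))
print(is_traffic_anomaly(recent, 104))
```

Traffic with strong daily or weekly cycles usually needs a baseline taken from the same hour or weekday rather than a flat window.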

Sample CloudWatch Alarm

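# Alarm when the prod stage of MyAPI returns more than 10 5xx responses
# in each of two consecutive 5-minute periods. API Gateway metrics are
# published per dimension set, so the alarm must name the API and stage.
aws cloudwatch put-metric-alarm \
  --alarm-name "High-5XX-Rate" \
  --metric-name 5XXError \
  --namespace AWS/ApiGateway \
  --dimensions Name=ApiName,Value=MyAPI Name=Stage,Value=prod \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:MyAlerts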

Advanced Custom Metrics

When built-in metrics aren't sufficient, create custom metrics from:

Access Logs

Parse with Lambda or Fluentd
Extract client-specific patterns
Calculate business metrics
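As a sketch of the access-log approach, the function below counts error responses per client. It assumes the access log format has been configured to emit one JSON object per line, and the field names clientId and status are hypothetical; the actual names depend on the log format you define.

```python
import json
from collections import Counter


def errors_per_client(log_lines):
    """Count responses with status >= 400 per client from JSON-lines access logs."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
            status = int(entry.get("status", 0))  # accepts "500" or 500
        except (json.JSONDecodeError, AttributeError, TypeError, ValueError):
            continue  # skip malformed or unexpected lines
        if status >= 400:
            counts[entry.get("clientId", "unknown")] += 1
    return counts


# Made-up log lines for illustration.
sample_lines = [
    '{"clientId": "a", "status": "500"}',
    '{"clientId": "a", "status": 200}',
    '{"clientId": "b", "status": 404}',
    'not json',
    '{"clientId": "a", "status": "429"}',
]
print(errors_per_client(sample_lines).most_common())
```

The resulting counts could then be published as a custom metric or fed into a dashboard.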

X-Ray Traces

Analyze latency distributions
Track downstream dependencies
Identify bottleneck segments
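To illustrate bottleneck identification, the sketch below totals time per segment name and returns the slowest. It assumes the trace has already been flattened into simple dictionaries with name, start_time, and end_time (epoch seconds), loosely mirroring fields in X-Ray segment documents. It ignores nesting, so overlapping parent and child segments would be double counted.

```python
def slowest_segment(segments):
    """Return (name, total_seconds) for the segment name with the most accumulated time."""
    if not segments:
        raise ValueError("segments must not be empty")
    durations = {}
    for seg in segments:
        elapsed = seg["end_time"] - seg["start_time"]
        durations[seg["name"]] = durations.get(seg["name"], 0.0) + elapsed
    name = max(durations, key=durations.get)
    return name, durations[name]


# Made-up sibling subsegments for illustration.
trace = [
    {"name": "auth", "start_time": 10.0, "end_time": 10.25},
    {"name": "orders-db", "start_time": 10.25, "end_time": 11.0},
    {"name": "auth", "start_time": 11.0, "end_time": 11.125},
]
print(slowest_segment(trace))
```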

Real-World Implementation

A fintech company improved their API reliability by:

Setting 5xx alerts at a 0.5% threshold
Creating custom metrics for PII detection
Building dashboards around 95th-percentile latency

Result: 40% faster incident detection and 25% lower error rates.

Best Practices

1. Monitor Key Ratios

Track the cache hit ratio (CacheHitCount against CacheMissCount) and 4xx and 5xx error rates against total Count, rather than raw counts alone
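A cache hit ratio can be derived from CacheHitCount and CacheMissCount summed over the same period. A minimal sketch, with a hypothetical function name:

```python
def cache_hit_ratio(hits: int, misses: int):
    """Fraction of cacheable requests served from cache, or None if there were none."""
    total = hits + misses
    if total == 0:
        return None  # no cache traffic in the period, so the ratio is undefined
    return hits / total


# Illustrative values: 750 hits and 250 misses give a ratio of 0.75.
print(cache_hit_ratio(750, 250))
```

A falling ratio at constant traffic often means the TTL or cache key configuration deserves a look, even when raw hit counts are still rising.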

2. Dimension Filtering

Break down by stage, method, and resource for troubleshooting

3. Baseline Comparison

Compare current metrics to historical baselines

Conclusion

Effective API gateway monitoring requires tracking the right metrics with appropriate granularity. While AWS and Oracle provide similar core metrics around errors, latency, and traffic, their implementations differ in:

Granularity: Oracle provides more detailed status code breakdowns
Cost Structure: AWS charges for detailed metrics while Oracle has higher base cardinality
Business Metrics: Oracle includes more subscription-aware tracking

For comprehensive monitoring, combine platform metrics with custom metrics from logs and traces. Set up alerts on error rates and latency thresholds, but also monitor trends and ratios that indicate emerging issues before they impact users.