Use Cases and Deployment Scope
In our environment, Datadog is a core part of our observability stack for both application performance and infrastructure. We use RUM to monitor the performance customers experience in our web applications and to identify issues. For our APIs, we have APM monitors that check latency and success rate and track every endpoint and status code. We have comprehensive AWS infrastructure monitoring covering EC2 instances, RDS MySQL clusters, NLB/ALB, VPC flow logs, and Lambda functions. APM for our Java microservices covers tracing, JVM heap usage, garbage collection, and thread pools. We also use Datadog for log analysis and on-call: custom logs for settlement jobs and credit applications, log normalization for fast searching, and troubleshooting deadlocks and bursts of 5xx errors. Finally, we use the WAF to secure our application endpoints.
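To make the API checks concrete, here is a minimal sketch in plain Python of the two things those monitors look at: success rate and 5xx bursts. It is not our Datadog configuration and does not use the Datadog API. The function names, the window size, and the threshold are hypothetical values chosen for illustration.

```python
def is_server_error(code):
    """Return True for HTTP 5xx status codes."""
    return 500 <= code <= 599


def success_rate(status_codes):
    """Fraction of requests that did not end in a 5xx error.

    Assumption: 4xx responses count as successes here, since they
    indicate a client problem rather than a service failure.
    """
    if not status_codes:
        return 1.0
    failures = sum(1 for code in status_codes if is_server_error(code))
    return 1 - failures / len(status_codes)


def has_5xx_burst(status_codes, window=5, threshold=3):
    """True if any `window` consecutive requests contain at least
    `threshold` 5xx responses (both values are illustrative)."""
    flags = [is_server_error(code) for code in status_codes]
    last_start = max(1, len(flags) - window + 1)
    return any(sum(flags[i:i + window]) >= threshold for i in range(last_start))


codes = [200, 200, 500, 503, 502, 200, 404, 200]
print(success_rate(codes))   # 0.625
print(has_5xx_burst(codes))  # True
```

In practice the monitoring platform evaluates conditions like these over time windows on live traffic; the sketch only shows the arithmetic behind them.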
Alternatives Considered
Grafana, Prometheus, Amazon CloudWatch, and Dynatrace
Other Software Used
Grafana Loki, Dynatrace, Prometheus