Monitoring Trifecta - CloudWatch
Basics
- "eyes and ears" of your AWS env
- the "central nervous system" of the cloud
- collects signals (data), makes sense of them (analysis), and can trigger reflexes (automation) when something goes wrong
- a collection of tools
Types of Data
- Metrics: numerical data like "what is the CPU usage of my server right now?"
- Logs: text-based records
- Events: changes in the AWS env that can trigger automated responses
Goals (problems it solves):
- Data Silos: gives a single location to view the status of the entire cloud env
- Reactive Fixing: enables proactive monitoring; when something happens, it reports it
- Manual Scaling: it can tell AWS to automatically add more servers to handle the load
- Hidden Errors: provides deep visibility; you can search through logs
Why is it needed:
- Operational Health: know something's wrong the moment it happens
- Cost Optimization: identify zombie resources (running but unused)
- Security & Compliance: monitor logs to spot unusual activity (e.g. an IP address logs in 1k times in a minute; block it automatically)
- Troubleshooting (MTTR): reduces the Mean Time to Resolution. When an error occurs, you can correlate a spike in a metric (like latency) directly with a specific log entry to find the root cause faster
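The "1k times in a minute" security example above can be sketched in plain Python. This is only an illustration of the counting logic, not how CloudWatch implements it: the function name `noisy_ips`, the `(timestamp, ip)` entry shape, and the sample IP addresses are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime


def noisy_ips(entries, threshold=1000):
    """Return IPs seen at least `threshold` times within one calendar minute.

    `entries` is an iterable of (timestamp, ip) pairs, a hypothetical
    simplified view of parsed log records.
    """
    counts = Counter()
    for timestamp, ip in entries:
        # Bucket each entry by the minute it falls in.
        minute = timestamp.replace(second=0, microsecond=0)
        counts[(ip, minute)] += 1
    return sorted({ip for (ip, _), n in counts.items() if n >= threshold})


# Hypothetical sample: one IP produces 1000 entries inside the same minute.
start = datetime(2024, 1, 1, 12, 0, 0)
entries = [(start.replace(second=i % 60), "203.0.113.7") for i in range(1000)]
entries.append((start, "198.51.100.2"))

print(noisy_ips(entries))  # ['203.0.113.7']
```

In a real setup the flagged IPs would feed an automated response (for example a block rule); that step is left out here because it depends on the environment.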
Components
- Dashboards: visual graphs of your metrics
- Alarms: rules on metrics, e.g. if CPU > 80% for 5 mins, send an alert
- Logs Insights: query language to search through logs
- Synthetics: scripts that "ping" your website to check for availability and broken links
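The alarm rule from the list above ("CPU > 80% for 5 mins") can be sketched as follows. `alarm_state` is a simplified, hypothetical model of the evaluation (real CloudWatch alarms also handle missing data and "M out of N" datapoints). `cpu_alarm_params` builds a dictionary using parameter names from the CloudWatch `PutMetricAlarm` API; the alarm name, instance ID, and topic ARN are placeholders.

```python
def alarm_state(datapoints, threshold=80.0, periods=5):
    """Simplified alarm evaluation over one-minute datapoints.

    Returns "ALARM" only if each of the last `periods` datapoints is
    above `threshold`; "INSUFFICIENT_DATA" if there are too few points.
    """
    if len(datapoints) < periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


def cpu_alarm_params(instance_id, topic_arn):
    """Build arguments for a 'CPU > 80% for 5 minutes' alarm.

    Assuming boto3 is installed and credentials are configured, these
    could be passed as:
        boto3.client("cloudwatch").put_metric_alarm(**params)
    """
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,             # seconds per datapoint
        "EvaluationPeriods": 5,   # 5 x 60 s = 5 minutes
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # e.g. an SNS topic that sends the alert
    }


print(alarm_state([72, 85, 90, 95, 88, 91]))  # ALARM
print(alarm_state([85, 90, 70, 88, 91]))      # OK
```

For the Logs Insights item, a typical query to find recent errors looks like `fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20`.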
Diff between AWS Config and AWS CloudWatch?
- AWS Config monitors Compliance while CloudWatch monitors Performance
- CloudWatch is best for real-time monitoring and auto-scaling, whereas AWS Config is best for tracking configuration changes over time
The "Trifecta"
- CloudWatch: tells you what is happening (high CPU)
- AWS Config: tells you what changed in the setup (someone changed the instance to a smaller one)
- CloudTrail: tells you who did it (user "Admin_Bob" made the change)
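How the three services answer "what, what changed, who" together can be sketched as a small correlation over records. Everything here is hypothetical: the record shapes, the `explain_alarm` helper, and the sample data are simplified stand-ins, not the actual formats returned by CloudWatch, AWS Config, or CloudTrail.

```python
from datetime import datetime, timedelta


def explain_alarm(alarm, config_changes, trail_events, window=timedelta(minutes=30)):
    """Link an alarm to the latest config change and the user behind it.

    Looks for config changes on the same resource within `window` before
    the alarm, then for the most recent API call on that resource at or
    before the change. Returns None if no change is found.
    """
    earliest = alarm["time"] - window
    recent = [
        c for c in config_changes
        if c["resource"] == alarm["resource"] and earliest <= c["time"] <= alarm["time"]
    ]
    if not recent:
        return None
    change = max(recent, key=lambda c: c["time"])
    callers = [
        e for e in trail_events
        if e["resource"] == change["resource"] and e["time"] <= change["time"]
    ]
    who = max(callers, key=lambda e: e["time"])["user"] if callers else "unknown"
    return {"what": alarm["symptom"], "changed": change["change"], "who": who}


# Hypothetical sample records mirroring the example in the list above.
instance = "i-0123456789abcdef0"
alarm = {"time": datetime(2024, 1, 1, 12, 30), "resource": instance, "symptom": "high CPU"}
config_changes = [
    {"time": datetime(2024, 1, 1, 12, 10), "resource": instance,
     "change": "instance type changed to a smaller one"},
]
trail_events = [
    {"time": datetime(2024, 1, 1, 12, 9), "resource": instance,
     "user": "Admin_Bob", "action": "ModifyInstanceAttribute"},
]

print(explain_alarm(alarm, config_changes, trail_events))
```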