What is AWS database monitoring?
AWS database monitoring covers the processes that track the performance, availability, and security of databases hosted on the Amazon cloud. It involves collecting the right metrics and data so that cloud administrators can keep a finger on the pulse of their AWS databases and react promptly when something goes wrong.
AWS database monitoring is part of the wider cloud monitoring area, and a subtopic of cloud database monitoring, which encompasses all the databases running on clouds, AWS or not. AWS database monitoring is an essential task for any cloud-based infrastructure, as databases lie at the core of many workloads, making their efficiency and health crucial for companies.
Monitoring Amazon databases is done with multiple tools, some integrated in AWS, like CloudWatch, while others are external, like Checkmk. In the course of this guide we will take a look at the differences in monitoring the various types of AWS databases, what metrics to gather, and what features to watch out for in an AWS database monitoring tool.
How is AWS database monitoring performed?
Monitoring AWS databases is generally done in a few different ways. The first step is usually CloudWatch, which is already integrated in the AWS cloud and therefore a good start when setting up a monitoring infrastructure. It offers a set of dashboards and can monitor not only all the AWS databases, but also some on-premises ones, as well as plenty of other services and resources in the AWS cloud.
CloudWatch is not the only option, though. AWS X-Ray can trace requests to MySQL- and PostgreSQL-based databases, whether self-hosted or Amazon services like Amazon RDS and Aurora. X-Ray records the requests made to these AWS SQL databases and generates an application-centric view of their performance, giving administrators an overview of how the databases are being used.
For a more open-ended approach, AWS exposes many metrics through its REST APIs. These can be queried with custom-made scripts or, more commonly in medium and large environments, with third-party AWS database monitoring tools like Checkmk. These tools simplify gathering the right metrics and creating custom dashboards, alerts, reports and more.
Last but definitely not least, CloudWatch or third-party monitoring agents can also be used for more granular monitoring of databases on AWS and other resources. Many monitoring tools use both agents and the REST APIs to extend what can be monitored. A comprehensive AWS database monitoring system benefits from collecting as many different metrics and data points as possible in a unified view.
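As an illustration of the custom-script route, the following minimal sketch prepares a CloudWatch query for the average CPU utilization of one RDS instance. It assumes the boto3 library and valid AWS credentials for the actual call; the instance name, the region, and the helper names `build_cpu_query` and `fetch_cpu_averages` are hypothetical and only serve the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_cpu_query(instance_id: str, minutes: int = 60,
                    now: Optional[datetime] = None) -> dict:
    """Build keyword arguments for CloudWatch GetMetricStatistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,  # one datapoint per five minutes
        "Statistics": ["Average"],
    }


def fetch_cpu_averages(instance_id: str, region: str = "eu-central-1") -> list:
    """Run the query; requires boto3 and AWS credentials (assumed here)."""
    import boto3  # imported lazily so build_cpu_query works without boto3

    client = boto3.client("cloudwatch", region_name=region)
    response = client.get_metric_statistics(**build_cpu_query(instance_id))
    # CloudWatch does not guarantee datapoint order, so sort by timestamp.
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [p["Average"] for p in points]
```

Keeping the request parameters in a separate function makes the script easy to check without touching AWS, and the same dictionary can be adapted to other namespaces and metrics.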
AWS database options are aplenty, and not all monitoring efforts are equally worthwhile. Next, we will take a look at how the different Amazon databases are to be monitored, since there are usually slight differences.
AWS SQL database monitoring
Amazon Aurora and Amazon RDS fall into the category of AWS SQL databases. Aurora is a MySQL- and PostgreSQL-compatible relational database service, while RDS is a fully managed service to set up your relational database of choice. Monitoring the two is very similar, so the collected metrics do not differ much. Still, there are a few differences you should be aware of.
Keeping an eye on the usual basic metrics like CPU utilization, memory usage, disk I/O, database connections, and query execution time is a given and belongs in any AWS database monitoring setup. In the case of AWS RDS monitoring, if your databases are spread over multiple AZs (Availability Zones), replication logs are also important to consult.
Additionally, Aurora, which is also available as an AWS serverless database, exposes a few specific metrics beyond those of RDS, namely cluster health metrics like the number of read and write operations. These can signal how well the underlying clusters of your Aurora databases are working. For instance, replication lag tells you how well replication is working at cluster level. Individual metrics like active transactions, deadlocks, queue depth, or login failure counts should be taken into consideration as well.
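To gather several of these basic metrics in one request, a script can prepare a batch of queries for the CloudWatch GetMetricData API. The sketch below only builds the query list; the metric names are standard ones from the AWS/RDS namespace, while the helper name `build_metric_queries` and the instance identifier are hypothetical. Query execution time is left out here, since it typically comes from Performance Insights or the engine's own statistics rather than from a plain CloudWatch metric.

```python
RDS_METRICS = (
    "CPUUtilization",
    "FreeableMemory",
    "ReadIOPS",
    "WriteIOPS",
    "DatabaseConnections",
    "ReplicaLag",
)


def build_metric_queries(instance_id: str, metrics=RDS_METRICS,
                         period: int = 300) -> list:
    """Build the MetricDataQueries list for CloudWatch GetMetricData."""
    queries = []
    for index, name in enumerate(metrics):
        queries.append({
            "Id": f"m{index}",  # must start with a lowercase letter
            "Label": name,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/RDS",
                    "MetricName": name,
                    "Dimensions": [
                        {"Name": "DBInstanceIdentifier", "Value": instance_id}
                    ],
                },
                "Period": period,
                "Stat": "Average",
            },
            "ReturnData": True,
        })
    return queries
```

With boto3, the resulting list would be passed as `MetricDataQueries` to `get_metric_data`, together with `StartTime` and `EndTime`.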
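Replication lag is most useful when judged over several samples rather than a single spike. The following sketch flags Aurora replicas whose lag stayed above a threshold for a number of consecutive samples. It assumes the samples were already collected, for example from the `AuroraReplicaLag` CloudWatch metric, which is reported in milliseconds; the threshold, the replica names, and the function names are hypothetical.

```python
def sustained_lag(samples_ms, threshold_ms: float = 100.0,
                  min_consecutive: int = 3) -> bool:
    """Return True if lag exceeded the threshold in consecutive samples."""
    run = 0
    for value in samples_ms:
        run = run + 1 if value > threshold_ms else 0
        if run >= min_consecutive:
            return True
    return False


def lagging_replicas(lag_by_replica: dict, threshold_ms: float = 100.0,
                     min_consecutive: int = 3) -> list:
    """List the replicas with sustained lag, sorted by name."""
    return sorted(
        name
        for name, samples in lag_by_replica.items()
        if sustained_lag(samples, threshold_ms, min_consecutive)
    )


if __name__ == "__main__":
    samples = {
        "reader-1": [20, 30, 25, 40],
        "reader-2": [90, 150, 180, 210],
        "reader-3": [150, 20, 160, 30],
    }
    print(lagging_replicas(samples))
```

In this example only `reader-2` is reported, because `reader-3` exceeds the threshold twice but never in three consecutive samples.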
AWS NoSQL database monitoring
There are a few AWS NoSQL databases among the numerous AWS database options. You will probably face ElastiCache or DynamoDB the most. The former has a few metrics that are obviously related to cache, like hit ratio and evictions, and that are important to track with CloudWatch or your AWS database monitoring tool of choice.
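The cache hit ratio is not a raw counter but a value derived from hits and misses. A minimal sketch of the calculation, assuming the two counts were already read from the ElastiCache `CacheHits` and `CacheMisses` metrics for the same period (the function name is hypothetical):

```python
from typing import Optional


def cache_hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Return hits / (hits + misses), or None when there were no lookups."""
    total = hits + misses
    if total == 0:
        return None  # no traffic in this period, so the ratio is undefined
    return hits / total
```

Returning `None` for an idle period avoids reporting a misleading 0% hit ratio when the cache simply received no requests.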
Like Amazon Aurora, DynamoDB is a fully managed service, in this case a NoSQL database, and when monitoring it you should focus on similar or equivalent metrics: read and write capacity utilization, throttled requests, consumed provisioned throughput, latency, overall storage usage, and more.
Wherever the service exposes them, the classic metrics related to CPU, memory, network, and disk utilization and performance should be regularly checked as well. ElastiCache reports them per node, while DynamoDB abstracts the underlying hosts away, so its capacity and latency metrics take their place. Again, you can check them either with CloudWatch for a simpler look or with a third-party AWS database monitoring tool like Checkmk.
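Capacity utilization for a provisioned DynamoDB table has to be computed, because CloudWatch reports consumed capacity as a sum over the period while provisioned capacity is a per-second rate. A minimal sketch of that arithmetic, assuming the `Sum` of `ConsumedReadCapacityUnits` (or the write equivalent) and the provisioned value are already at hand; the function name is hypothetical:

```python
def capacity_utilization(consumed_sum: float, period_seconds: int,
                         provisioned_per_second: float) -> float:
    """Return consumed capacity as a percentage of provisioned capacity."""
    if period_seconds <= 0 or provisioned_per_second <= 0:
        raise ValueError("period and provisioned capacity must be positive")
    consumed_per_second = consumed_sum / period_seconds
    return 100.0 * consumed_per_second / provisioned_per_second
```

For example, 24,000 consumed read capacity units over a 300-second period against 100 provisioned units per second works out to 80% utilization.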
AWS graph database monitoring
AWS graph databases are part of the AWS NoSQL databases. An example is Amazon Neptune, which is fully managed by AWS and cloud-native. These Amazon databases work differently from other NoSQL databases, but their monitoring metrics do not diverge much from what we have seen with ElastiCache and DynamoDB. Monitoring key utilization metrics like CPU usage, memory utilization, disk I/O and space, query latency, database connections and so on is as much a necessity with an AWS graph database as with any other NoSQL database.
In the case of a serverless Amazon Neptune, though, it is important to keep track of NCU (Neptune capacity unit) related metrics, like serverless database capacity or NCU utilization. The NCU is the basic unit for serverless Neptune databases, consisting of a block of 2 GiB of memory, an associated virtual processor capacity (vCPU), and the necessary network resources. These metrics give information about the health of the respective Neptune instance.
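The NCU figures translate into memory and headroom with simple arithmetic. The sketch below uses the 2 GiB per NCU stated above and assumes the current and maximum capacity values are already known, for example from the serverless capacity metric and the cluster configuration; the function names are hypothetical.

```python
GIB_PER_NCU = 2  # memory per Neptune capacity unit, as described above


def approx_memory_gib(ncus: float) -> float:
    """Estimate the memory that corresponds to a number of NCUs."""
    return ncus * GIB_PER_NCU


def ncu_utilization(current_ncus: float, max_ncus: float) -> float:
    """Return current capacity as a percentage of the configured maximum."""
    if max_ncus <= 0:
        raise ValueError("max_ncus must be positive")
    return 100.0 * current_ncus / max_ncus
```

An instance running at 6 NCUs with a configured maximum of 8 therefore uses roughly 12 GiB of memory and sits at 75% of its capacity ceiling.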
AWS time series database monitoring
AWS time series databases are built to store and analyze time series data points, usually from telemetry or IoT (Internet of Things) devices. Amazon Timestream is an example of an Amazon TSD (Time Series Database). These types of databases are part of the many AWS serverless databases and expose metrics similar to those of other Amazon databases. Whether you are using CloudWatch or a third-party AWS database monitoring solution, checking the utilization metrics related to CPU, memory, disk I/O, and network, where the service exposes them, is the natural first monitoring step.
More specific to AWS time series databases is a set of metrics that informs cloud administrators about the efficiency of the requests made to an Amazon Timestream database. It is therefore important to monitor metrics like system and user errors, latency, time series that have been rejected or successfully uploaded, and the amount of data scanned by queries sent to Timestream. Keeping an eye on AWS time series database logs is also key for proper AWS database monitoring.
AWS database monitoring best practices
Regardless of the AWS database options you choose, there is a set of best practices to follow for an efficient monitoring system. First, consider the factors that matter most to your organization, whether that is performance, availability, scalability, or security. This way, you can focus on the set of metrics that is most relevant for your environment.
Integrated tools
CloudWatch is not to be underestimated. While limited in configurability and customizability, CloudWatch can give you a helpful overview in only a few seconds, making it easy to see if something is amiss. A more advanced AWS database monitoring solution, like Checkmk, can be used to investigate further and troubleshoot when necessary. It is not advisable to limit yourself to CloudWatch only, as it hardly fits every company's needs.
Databases' differences
If you think about what we said about the specific metrics of every type of Amazon database, you quickly realize that most of them have a lot of common metrics that should not be ignored. However, they also have a few individual metrics that inform cloud administrators about their specific functional areas. Whether you are using an AWS SQL database or an AWS NoSQL database, knowing what their typical metrics are is important to set up an efficient monitoring solution.
Alarms and notifications
Set up alarms and notifications based on your thresholds. CloudWatch can notify you through Amazon SNS (Simple Notification Service), a messaging service that can deliver notifications to many devices and systems. Even though it comes at a cost, it can be of great help for emergency notifications. Otherwise, a separate AWS database monitoring tool will provide you with many options to inform administrators of exceeded thresholds and bring their attention to the right database. Checkmk gives you fine-grained control over when and how exactly you are notified, and is an invaluable tool in avoiding notification fatigue.
If you have database replication or multi-Availability Zone (AZ) deployments for high availability or disaster recovery, monitoring replication lag, replication status, and failover events ensures data consistency and availability.
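As a sketch of how such a threshold can be turned into a CloudWatch alarm, the function below prepares the parameters for an alarm on the RDS `ReplicaLag` metric (reported in seconds) that notifies an SNS topic. The parameter names follow the CloudWatch PutMetricAlarm API; the instance identifier, the topic ARN, the threshold, and the helper name `build_replica_lag_alarm` are hypothetical.

```python
def build_replica_lag_alarm(instance_id: str, sns_topic_arn: str,
                            threshold_seconds: float = 30.0) -> dict:
    """Build keyword arguments for CloudWatch PutMetricAlarm."""
    return {
        "AlarmName": f"{instance_id}-replica-lag",
        "Namespace": "AWS/RDS",
        "MetricName": "ReplicaLag",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,            # evaluate one-minute averages
        "EvaluationPeriods": 5,  # require five breaching periods in a row
        "Threshold": threshold_seconds,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that receives the alert
    }
```

With boto3 and valid credentials, the dictionary would be passed to `boto3.client("cloudwatch").put_metric_alarm(**params)`. Requiring several consecutive breaching periods is one simple way to keep short spikes from triggering notifications.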
Review logs and metrics regularly
Make use of database logs whenever they are available. Analyze them regularly to gain insights, to troubleshoot issues, and to enhance database security monitoring.
Reviewing metrics on a regular basis is vital to comprehend where to optimize, to increase database performance, and to discover trends. Not to forget, paying attention to every metric is useful for discovering possible disruptions or misconfigurations earlier, and for extracting the maximum performance from your AWS cloud databases.
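Trend discovery can start very simply. The sketch below fits a straight line to evenly spaced samples of a metric and uses the slope to estimate when storage would run out. It assumes one sample per day and linear growth, which is a rough approximation; the function names and figures are hypothetical.

```python
from typing import Optional


def linear_slope(values: list) -> float:
    """Least-squares slope of evenly spaced samples (change per sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def days_until_full(daily_used_gib: list, capacity_gib: float) -> Optional[float]:
    """Estimate the days left until storage is full, or None if not growing."""
    slope = linear_slope(daily_used_gib)
    if slope <= 0:
        return None  # usage is flat or shrinking
    return (capacity_gib - daily_used_gib[-1]) / slope
```

A series growing by 2 GiB per day with 20 GiB of free space left yields an estimate of 10 days, which is the kind of early warning that regular metric reviews are meant to provide.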
All this is possible thanks to the power of advanced AWS database monitoring tools like Checkmk. With Checkmk, you have a complete solution for an efficient AWS database environment. And with the added benefit of being on the cheap side compared to other solutions and to Amazon's CloudWatch, Checkmk is a single solution to keep your AWS database infrastructure healthy, efficient, and always available.
FAQ
Yes, Amazon provides a service called AWS DMS (AWS Database Migration Service), which simplifies and streamlines the migration of your on-premises databases to one of the AWS database services. It is capable of schema conversion, data validation, and schema and data replication, and it also provides real-time monitoring of the migration process. DMS can be managed and configured through both a command-line interface and an API.