What is cloud database monitoring?
Cloud database monitoring is the monitoring of databases that run in the cloud. It typically relies on a set of checks, uses alerts to inform cloud administrators of potential issues, and tracks performance to keep cloud databases efficient. There are many cloud database solutions that can, and should, be monitored, but here we will focus on those offered by the three main cloud vendors: AWS, Azure, and Google. These can run in a public or private cloud, and in either a hybrid or a fully cloud environment. In practice, monitoring cloud databases is a big part of cloud monitoring.
Cloud database monitoring makes no substantial distinction based on where your databases are set up or physically located; it covers any type of database that is provisioned in a cloud. Thus, both cloud-native databases and databases run in virtual machines fall within the sphere of cloud database monitoring. We are going to explain the difference between the two shortly. For now, it is sufficient to establish that cloud database monitoring involves collecting and analyzing metrics and logs from the database instances, monitoring their health, identifying potential issues or bottlenecks, and optimizing their performance.
What are the differences between traditional and cloud-native databases?
An explanation of the differences between classical and cloud databases is necessary before moving on to the actual monitoring of the latter. Google gives a good definition of what a cloud database is:
A cloud database is a database built to run in a public or hybrid cloud environment to help organize, store, and manage data within an organization. Cloud databases can be offered as a managed database-as-a-service (DBaaS) or deployed on a cloud-based virtual machine (VM) and self-managed by an in-house IT team.
Google's definition thus does not distinguish between a ‘cloud database’ and a ‘cloud native database’, but only between databases that are run in the cloud and those that are not. Other definitions do make a distinction. In these, cloud databases are databases that can be operated on a cloud service, but not only there, while cloud-native databases are those specifically designed and built to leverage the capabilities and advantages of cloud-native architectures and technologies. Both can be, and are, operated in a cloud, but the latter are conceived from the beginning to operate only in a cloud.
Cloud-native databases are more closely defined as being developed using cloud-native principles, which emphasize containerization, microservices, scalability, resilience, and agility.
Cloud database monitoring covers both types of cloud databases, native and not. Cloud database monitoring tools operate in different ways depending on the type of database to be monitored, and especially on whether it is deployed by an IT team or furnished by a cloud vendor as a managed service. These tools can monitor both, regardless of whether we call them cloud-native or just “cloud” databases. Monitoring them is far from identical, but broadly the metrics to be gathered are the same.
What cloud databases exist?
In computing there are multiple types of databases, and the same holds in the cloud. An in-depth explanation of each is well outside the scope of this article. A short definition of the various types is enough to better understand what databases are offered by cloud vendors and why there are so many.
The relational database management system (RDBMS) has been the dominant type for over four decades. Oracle, MySQL, Microsoft SQL Server, PostgreSQL, SQLite, MariaDB and many more fall into this category and are available as cloud SQL databases. SQL-based databases are not the only RDBMS; SQL is simply the most used query language in relational databases.
RDBMS are table-oriented, while one of their alternatives, the object-oriented database management system (OODBMS), is instead based on objects, combining databases with object-oriented programming language capabilities. These are far less common these days than RDBMS, and are not readily offered by cloud vendors.
More popular are NoSQL databases: non-relational databases that use a data model that is not table-based, unlike the SQL family. Every NoSQL database belongs to one of over a dozen subcategories, depending on the data model it follows. Common among cloud databases are the Key-Value type (Couchbase, Redis, Memcached, Amazon DynamoDB, Azure Cosmos DB and others), Wide Column Store (Cassandra, Google Cloud Datastore, Azure Cosmos DB, Amazon DynamoDB), and Graph (Apache Graph, Azure Cosmos DB, Oracle Property Graph, RedisGraph, SAP HANA) databases. As you may have noticed, a single database can support more than one data model.
These are the models chosen by the most popular cloud database solutions, namely those from AWS, Azure, and GCP. This list is by no means exhaustive, as numerous types and subtypes of databases exist, inside and outside cloud services. For the sake of this guide, we have restricted ourselves to databases that are available as cloud databases from the ‘big three’ clouds and do not discuss those from other providers.
AWS cloud databases
AWS implements multiple types of databases, covering a wide range of use cases. In AWS database monitoring there is thus a great variety of databases to monitor. Commonly, though, these databases are run through Amazon RDS (Relational Database Service), which supports MariaDB, Microsoft SQL Server, MySQL, PostgreSQL, and Oracle databases. Amazon RDS generates metrics that can be gathered either by an AWS service, such as CloudWatch, or by an external AWS database monitoring tool. For database performance monitoring, Amazon RDS Performance Insights is the integrated dashboard for checking metrics and setting up alarms related to the databases’ performance.
AWS SQL databases
As managed databases go, Amazon’s principal offering is Aurora, a fully-managed database engine compatible with both MySQL and PostgreSQL. So, as with Amazon RDS, it remains within the field of relational databases. As a cloud-native database, Amazon Aurora can be monitored mainly, but not exclusively, through Amazon CloudWatch.
AWS database monitoring includes not only proper databases but also systems that collect data in general. For instance, Amazon Redshift is a data warehouse that companies use to gather data from multiple sources and analyze it in a single place. Redshift is a fully managed, serverless cluster data warehouse that, as far as cloud database monitoring goes, is not too dissimilar from actual database instances. Since behind the scenes Amazon Redshift is a cluster of SQL-based databases, the metrics for monitoring it are the same as for any other cloud relational database service.
AWS NoSQL databases
NoSQL databases have their place within the AWS cloud under Amazon DynamoDB. Monitoring a DynamoDB instance is simply done with Amazon CloudWatch or, as usual, with one of the many third-party database monitoring tools in existence.
Amazon ElastiCache for Memcached and Amazon MemoryDB for Redis are in-memory data store services, built for maximum performance. Compatible with Redis or Memcached, highly scalable and reliable, they exist to power real-time applications that require up to hundreds of millions of operations per second. If your organization has similar needs, monitoring either ElastiCache or MemoryDB for Redis is of critical importance to ensure that the target performance levels are reached and maintained throughout the lifespan of your applications.
Amazon DocumentDB is a more typical database: a cloud-native database that is compatible with MongoDB and built for MongoDB workloads. Similar is Amazon Keyspaces, another fully-managed database service, compatible with Apache Cassandra instead of MongoDB. Both are serverless, and easily monitored through either CloudWatch or common cloud database monitoring tools.
A subset of NoSQL databases are graph databases (GDB). These use data structures similar to graphs, relating the data items in the store to a collection of nodes and edges, with the latter representing the relationship between the nodes. Amazon Neptune is such a database, fully managed and cloud-native.
Amazon Timestream is a slightly different type of cloud database. It is a time series database that stores and analyzes time series data points. These are usually telemetry points or readings from IoT (Internet of Things) devices, and can add up to trillions of time-related entries. At such levels of granularity and scale, monitoring this type of database is vital.
Azure cloud databases
Azure has a more focused set of cloud databases compared to AWS. Fewer choices may mean less flexibility but not necessarily less power or efficiency. The main cloud database solutions from Azure are Azure SQL (also as a managed instance), and SQL Server for virtual machines. Both run on the Microsoft SQL Server database engine, and have a series of usage tiers in terms of computation and service (as in storage and I/O utilization). Azure SQL can be fully-managed, either used as a single instance or in an elastic pool (a collection of single databases sharing a set of resources).
Azure SQL databases
As far as cloud SQL databases go, Azure has the option to run a fully-managed instance of a few databases that are not Microsoft SQL Server. PostgreSQL, MySQL, and MariaDB are supported by Azure, with an emphasis on migrating your on-premises databases to their cloud service.
Choosing a proper cloud-native database on Azure means choosing Azure Cosmos DB. Built to scale efficiently, it is a fully-managed and serverless distributed database that supports PostgreSQL, MongoDB, and Apache Cassandra, so it is not simply a cloud SQL database. Should an on-premises Oracle database be in use by your organization, Azure supports migrating it to the cloud service.
Azure non-relational databases
As data store services go, Azure offers Azure Cache for Redis, a fully-managed, in-memory, highly scalable solution to power real-time applications and heavy workloads.
Regardless of the type of database chosen, Azure database monitoring includes all of those so far listed. Plenty of monitoring widgets (called Insights) are available to plug into Azure Monitor, the main tool integrated in the Azure cloud for Azure database monitoring and more. It can be used as a starting point, and to have a quick glance at the health of your cloud databases, but for more powerful and customizable monitoring, one of the many cloud database monitoring tools, like Checkmk, is highly recommended.
Google cloud databases
Google Cloud Platform, Google's cloud service, comes with its own series of cloud databases that are the subject of Google Cloud database monitoring. Cloud SQL databases and non-relational databases co-exist on GCP to offer a fairly large set of database choices.
Google SQL databases
Google Cloud SQL is the principal database to pick if you need a Cloud SQL database. It provides managed MySQL, PostgreSQL, and SQL Server databases to host on the Google Cloud platform. It is the bread and butter of databases on GCP, the obvious first choice for moderate workloads and the most common use cases.
If more is needed, Google offers AlloyDB. It is a fully-managed database, compatible with PostgreSQL, for the most demanding enterprise database workloads. It is especially set up to offer much higher performance than a typical PostgreSQL database, up to 4x faster, Google claims.
An alternative hybrid choice is Cloud Spanner. It is a distributed cloud-native database, offering high scalability, automatic sharding, and extreme availability. Cloud Spanner attempts to combine the benefits of relational database structures with non-relational horizontal scale. It is a highly flexible type of database, coming with a PostgreSQL-compatible interface. Usually chosen for performance-sensitive operations, like global financial ledgers, gaming, and payment solutions, Cloud Spanner is a peculiar choice for a cloud database.
On GCP it is also possible, more simply, to lift and shift your Oracle database from on-premises to the cloud with an ad-hoc solution called Oracle on Bare Metal. It can replicate your local setup of multiple types of Oracle databases, providing specialized hardware to sustain the necessary workloads with low latency.
Google NoSQL databases
Leaving the realm of relational databases, the basic NoSQL cloud database on GCP is Firestore. Once called Datastore, it is a document-based cloud-native database that is an all-purpose choice for moderate workloads. Fully-managed, scalable, and serverless, Firestore is generally good for any task, as an unspecialized cloud database. If more power is desired, Google Cloud Bigtable is a similar database that promises lower latency, limitless scaling, and much higher availability. It is compatible with Apache HBase, an open source non-relational database, which makes switching from HBase easy.
Google Cloud database monitoring covers not just typical databases but also similar collections of data. Google BigQuery is one of these, a data warehouse. BigQuery is like AWS Redshift, geared towards real-time processing workloads, data analytics, and machine learning tasks.
In-memory caching services are not missing on GCP either: Memorystore supports both Redis and Memcached on Google Cloud. Offering low latency and high performance, in-memory caching is unsurprisingly available from all of the big three cloud providers, and is useful to power applications that require millions of small operations per minute, like news feeds, chats, social networks, web and mobile apps and more.
What metrics to monitor in cloud database monitoring?
Cloud database monitoring includes multiple types of databases, spanning various cloud services, as just seen. Each exports its own set of metrics, which often overlap with the others, but not completely. Therefore, slight differences exist when doing cloud database monitoring, depending on the exact type of database that is being monitored. More differences arise when monitoring databases deployed on different clouds. The type and name of a metric may be quite unlike its counterpart on another cloud, even when both refer to the same type of database.
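As a minimal sketch of how such naming differences can be handled, the snippet below maps provider-specific metric names onto one common vocabulary. The `METRIC_ALIASES` table, the `normalize` helper, and the common names are hypothetical, and the provider-side metric names are examples that should be verified against each vendor's current documentation.

```python
from typing import Optional

# Illustrative mapping from provider-specific metric names to common ones.
# The provider-side names are examples only; verify them against each
# vendor's current documentation before relying on them.
METRIC_ALIASES = {
    "aws": {
        "CPUUtilization": "cpu_utilization",
        "DatabaseConnections": "connections",
    },
    "azure": {
        "cpu_percent": "cpu_utilization",
    },
    "gcp": {
        "cloudsql.googleapis.com/database/cpu/utilization": "cpu_utilization",
    },
}


def normalize(provider: str, metric_name: str) -> Optional[str]:
    """Return the common name for a provider metric, or None if unmapped."""
    return METRIC_ALIASES.get(provider, {}).get(metric_name)
```

Units can differ between providers as well as names, so a real mapping would probably also need a per-metric conversion step.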
As a minimum, some metrics must be checked to ensure the efficiency, health, and possibly security of each of your cloud databases. CPU and memory utilization, number of database connections, number of outstanding I/O requests waiting to access the disk, and the incoming and outgoing traffic of the database instance are key in database performance monitoring. These should never be ignored. Disk read operations, used local storage, read latency and throughput, and number of write operations per second are also extremely insightful and available for monitoring in all cloud databases.
Whether you need Google Cloud database monitoring or Azure database monitoring, or rely on an AWS environment for your databases, the key metrics to monitor are, deep down, the ones just mentioned.
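The following sketch shows one way these key metrics could be turned into check states. The metric names, the threshold values, and the `evaluate` helper are all assumptions made for illustration; sensible limits depend on the database engine and the instance size.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    warn: float
    crit: float


# Hypothetical limits; tune them to your own workloads.
THRESHOLDS = {
    "cpu_utilization": Threshold(warn=80, crit=95),        # percent
    "memory_utilization": Threshold(warn=85, crit=95),     # percent
    "connections": Threshold(warn=400, crit=480),          # open connections
    "disk_queue_depth": Threshold(warn=10, crit=50),       # waiting I/O requests
}


def evaluate(sample: dict) -> dict:
    """Map each known metric in the sample to OK, WARN, or CRIT."""
    states = {}
    for name, value in sample.items():
        limit = THRESHOLDS.get(name)
        if limit is None:
            continue  # metric without a configured threshold is skipped
        if value >= limit.crit:
            states[name] = "CRIT"
        elif value >= limit.warn:
            states[name] = "WARN"
        else:
            states[name] = "OK"
    return states
```

For example, `evaluate({"cpu_utilization": 97, "connections": 120})` reports the CPU as `CRIT` and the connections as `OK`.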
What to monitor in cloud databases?
To be comprehensive, cloud database monitoring has to combine multiple checks and metrics. It spans cloud database performance monitoring, to keep databases at the top of their efficiency, and cloud database security monitoring, to ensure no unauthorized access happens. Thus, many metrics and data points need to be taken into consideration when monitoring a cloud database.
Database health is the first step. Checking that the databases are up and can be reached should not be dismissed as obvious. Any database monitoring tool can alert you when a database is down or becomes inaccessible. It is then important to monitor each database’s size, to anticipate storage requirements and potential scalability issues. Backups should be performed regularly, and their operations, recovery included, should be carefully monitored.
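Two of these health checks can be sketched with the standard library alone: whether a database endpoint accepts connections, and whether the last backup is recent enough. The function names and the 24-hour default are assumptions for illustration. Note that a successful TCP connection only shows that the port is open, not that the database answers queries.

```python
import socket
from datetime import datetime, timedelta


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def backup_is_fresh(last_backup: datetime, now: datetime,
                    max_age: timedelta = timedelta(hours=24)) -> bool:
    """Return True if the last successful backup is not older than max_age."""
    return now - last_backup <= max_age
```

A monitoring job could call both on a schedule and raise an alert whenever either returns `False`.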
Database performance monitoring
Database performance monitoring is the core of the whole monitoring effort. Cloud administrators would do well to keep a constant eye on metrics like CPU and memory utilization, latency, query throughput, and disk I/O, as a minimum. If all these are normal and connectivity is green, it can be fairly safely assumed that there are no immediate concerns with your databases. It may be worth investigating query performance in more detail as well. Queries that take a long time to execute, index usage, and, in the case of cloud SQL databases, the query execution plan are all worth monitoring to keep the databases’ efficiency high. Where possible, track all these metrics over time to identify usage trends and plan for scaling the database resources as needed.
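Trend monitoring can be illustrated with a simple linear fit over daily storage measurements. This is only a sketch under the assumption of roughly linear growth; the `days_until_full` helper and its inputs are hypothetical, and real workloads may grow in bursts that a straight line does not capture.

```python
from typing import Optional, Sequence


def days_until_full(daily_used_gb: Sequence[float],
                    capacity_gb: float) -> Optional[float]:
    """Estimate days until storage is full from one measurement per day.

    Fits a least-squares line through the samples and extrapolates from
    the most recent measurement. Returns None if there are fewer than two
    samples or if usage is not growing.
    """
    n = len(daily_used_gb)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(daily_used_gb) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in enumerate(daily_used_gb))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance  # growth in GB per day
    if slope <= 0:
        return None
    return (capacity_gb - daily_used_gb[-1]) / slope
```

With samples of 100, 102, 104, and 106 GB and a capacity of 126 GB, the estimate is 10 days, which leaves time to plan a scaling step.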
Most cloud database solutions export their logs for monitoring. Keep a close eye on the error logs of each of your cloud databases for any exception. A good alerting system can help to immediately inform cloud administrators of any new issue here.
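A rough sketch of such an error log check follows. It assumes log lines that contain a severity keyword such as `ERROR`, `FATAL`, or `PANIC`, loosely modelled on PostgreSQL-style logs; the pattern and the `count_severities` helper are illustrative and would need adapting to the log format your database actually exports.

```python
import re
from collections import Counter
from typing import Iterable

# Severity keywords to look for; adjust to the log format in use.
SEVERITY = re.compile(r"\b(ERROR|FATAL|PANIC)\b")


def count_severities(lines: Iterable[str]) -> Counter:
    """Count log lines per severity keyword, ignoring all other lines."""
    counts = Counter()
    for line in lines:
        match = SEVERITY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

An alert rule could then fire as soon as the `FATAL` count is non-zero, or when the `ERROR` count exceeds what is normal for the database.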
Database security monitoring
Lastly, cloud database security must be ensured through accurate cloud database monitoring. Suspicious activity or unauthorized access attempts must be carefully monitored to protect databases from security breaches. Data at rest and in transit should ideally be encrypted, to meet standard security and compliance requirements. Review audit logs to track the activity of every database user, so as to detect potential security violations. These are the basic checks that involve cloud database security, but by no means the only ones.
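One basic audit log review is spotting repeated failed logins. The sketch below assumes audit events have already been parsed into `(timestamp, user, succeeded)` tuples sorted by time; that event shape, the `suspicious_users` helper, and the default limits are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, Set, Tuple

Event = Tuple[datetime, str, bool]  # (timestamp, user, login succeeded)


def suspicious_users(events: Iterable[Event],
                     max_failures: int = 3,
                     window: timedelta = timedelta(minutes=5)) -> Set[str]:
    """Return users with at least max_failures failed logins within window."""
    recent_failures = defaultdict(list)
    flagged = set()
    for timestamp, user, succeeded in events:
        if succeeded:
            continue
        failures = recent_failures[user]
        failures.append(timestamp)
        # Forget failures that fell out of the sliding time window.
        while timestamp - failures[0] > window:
            failures.pop(0)
        if len(failures) >= max_failures:
            flagged.add(user)
    return flagged
```

Users returned by this function are candidates for closer inspection, not proof of an attack; a forgotten password produces the same pattern.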
Available cloud database monitoring tools
All the big three cloud vendors offer tools to monitor their environments, databases included. In all of them a few dashboards can be included in a main cloud monitoring tool to pull in data coming from databases. They work similarly to each other, but clearly differ in how the info is presented and the types and names of the metrics used.
The primary cloud database monitoring tool offered by AWS is CloudWatch. For the Amazon RDS service there is a specific tool called Amazon RDS Performance Insights. Azure has a similar main tool, Azure Monitor, which is used to implement all types of monitoring on the Azure cloud platform. Specific extensions for it exist for each type of database that Azure supports. For GCP, Google Cloud Monitoring integrates metrics coming from all the various databases present on the Google platform.
All of these tools are directly available once there is an account on one of these cloud services. They come at a cost, as their usage requires storage, network, and computing utilization that is therefore charged. Thus, they are only nominally-free cloud database monitoring tools, since although they are free to use, that use consumes resources that will need to be paid for.
Third-party cloud database monitoring tools
The situation is a bit different with third-party cloud database monitoring solutions. In most cases they monitor both on-premises and cloud databases, offering a more complete solution than the tools from AWS, Azure, or GCP. The majority of these tools were born to monitor on-premises databases and added support for cloud ones later. Their support for monitoring databases running on local servers is generally better than that of the large cloud providers' tools, which are much more cloud-oriented.
Third-party tools may also end up being cheaper, with more generous free tiers. This varies widely from software to software, and great attention should be paid to costs in all database monitoring solutions. Third-party alternatives are often priced by the number of hosts or services to monitor, rather than by the actual quantity of resources used. While monitoring by host can certainly turn out to be expensive, pricing by service may be less cost-intensive. Checkmk follows the services monitoring model, with a generous free tier that is perfect for new users who want to learn the tool and upgrade the subscription only once convinced of its usefulness.
Cloud database monitoring is a complex endeavor that is only partly covered by the integrated tools of cloud providers. These suffice for the most typical use cases, and when little customization is necessary. Once your cloud infrastructure grows, or you have lifted and shifted a fairly complicated on-premises infrastructure, these tools show their limits. External cloud database monitoring tools, like the one Checkmk offers, can go far beyond these limits and be a complete monitoring solution, for small to enterprise companies alike.
FAQ
Most cloud database solutions have a free usage tier that enables testing and limited-use operations. AWS, Azure, and GCP all offer such tiers. Other database vendors offer free cloud databases up to certain limits of storage space, used bandwidth, and session/user numbers. If more is needed, a subscription or license must be paid for.
A cloud-native database is a database that is designed to leverage the advantages of cloud architectures. Cloud-native databases are optimized to make use of cloud-native principles such as containerization, microservices, scalability, high availability, and agility.