Cloud monitoring for high-performing applications

Illustration showing a hybrid environment

What is cloud monitoring?

Cloud monitoring is the collection of methods for observing, reviewing, and managing operational workflows in a cloud-based IT infrastructure. It does not differ in principle from monitoring on-premises hardware and software, but in practice it presents a few distinct challenges and differences that need to be considered when setting it up.

Cloud offerings are largely split between the three main vendors: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Sometimes more than one cloud service is used in the same infrastructure. Most cloud monitoring tools support more than one of these vendors.

Nowadays, the great majority of infrastructures are cloud-based or hybrid, presenting both on-premises and cloud services and devices. Monitoring devices and services in the cloud together with the classical, physical ones is a complex challenge for IT administrators and cloud monitoring tools to face. But to keep efficiency high and prevent issues, it is necessary to keep track of the entire infrastructure, wherever parts of it may actually be. Considering the large range of applications that can be run on the cloud, and their slight differences across the various cloud vendors, it is easy to understand how cloud monitoring can become complex.

We will see how and what to monitor later in this guide. First, let us focus on why cloud monitoring is important.

Why is cloud monitoring important?

As wisdom would suggest, if you do not know that something exists and how it works, you cannot know when that something is malfunctioning, under stress or soon to break down. Monitoring, including cloud monitoring, keeps you informed of every piece of your infrastructure at all times, both physical and virtual.

Needless to say, if your company depends on clouds to function, then monitoring them is a necessity rather than a choice. It is simply wise to be aware of as many metrics and as much information as possible that comes from your cloud, so you can prevent or fix issues, plan for improvements, and optimize it. Consider as well all the services that can be run in a cloud: servers, virtual machines, databases, virtual networks, storage, and more. Each of these requires a different range of metrics to keep track of, all of which are important to obtain an overall picture.

Cloud monitoring encompasses all these components. As with local monitoring, each individual part of a network can cause issues and, as none is completely isolated from others, can break other parts or bring down the whole network. Cloud networks are logical versions of physical ones and subject to similar problems. Therefore, cloud monitoring is an extension of local monitoring that has to be implemented to ensure that cloud environments perform efficiently and without issues. Typically, it involves a hybrid environment in which on-premises components and cloud-based components work together.

How cloud monitoring works

Cloud monitoring starts with the right tool. There are a few options here. One is to use the tools provided by the cloud vendor of your choice. Microsoft Azure uses Azure Monitor, a cloud monitoring tool to collect, analyze, and act on the telemetry data of Azure-based environments. Amazon Web Services (AWS) has CloudWatch as an in-house solution to monitor resources and applications on AWS clouds. For Google Cloud Platform (GCP) monitoring, Cloud Monitoring is available directly from Google.

These are set up to analyze data coming mostly from cloud environments, but not exclusively. AWS CloudWatch can, for instance, monitor on-premises servers as well. While these cloud monitoring software solutions are perfectly capable of monitoring their respective clouds, they cannot be extended to monitor clouds from other vendors. If your infrastructure is mixed, using more than one cloud environment, you need to use more than one cloud monitoring tool to monitor everything. This increases redundancy, presents compatibility challenges, and is slightly heavier on resources than a unified monitoring solution.

More reasonably, you could choose a single cloud monitoring tool that is not vendor-specific. Going beyond the boundaries of different cloud vendors, this class of tools can monitor more than one cloud in addition to your on-premises hosts. This choice brings together all the metrics, regardless of the clouds they come from, and provides a uniform, especially complete, view of your infrastructure. This type of cloud monitoring tool is capable of monitoring both private clouds that have dedicated hardware, and public clouds, like AWS, Azure and GCP. It is a monitor-it-all system.

In practical terms and regardless of the monitoring tool used, cloud monitoring is effected either with the installation of an agent, like it would be on a local network, or through the use of APIs exposed by the cloud environment. In the end, the results are similar, but differ in type and quantity of the metrics that the agent or the API offers for polling.
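The difference between the two collection paths can be sketched as follows. This is a minimal illustration, not the interface of any real vendor or monitoring tool: the classes `AgentSource` and `ApiSource`, the metric names, and the readings are all hypothetical. The point is only that an agent can read whatever the host exposes, while an API returns just the subset the vendor chose to publish.

```python
from typing import Dict, Protocol, Set


class MetricSource(Protocol):
    """Anything that can return a snapshot of metric name -> value."""

    def collect(self) -> Dict[str, float]: ...


class AgentSource:
    """Hypothetical agent installed on the host: it sees every local reading."""

    def __init__(self, host_readings: Dict[str, float]) -> None:
        self._host_readings = host_readings

    def collect(self) -> Dict[str, float]:
        return dict(self._host_readings)


class ApiSource:
    """Hypothetical vendor API: only the exposed metrics can be polled."""

    def __init__(self, host_readings: Dict[str, float], exposed: Set[str]) -> None:
        self._host_readings = host_readings
        self._exposed = exposed

    def collect(self) -> Dict[str, float]:
        return {k: v for k, v in self._host_readings.items() if k in self._exposed}


# Made-up readings for one host (assumption, for illustration only).
readings = {"cpu_percent": 42.0, "mem_percent": 63.5, "open_files": 118.0}
agent = AgentSource(readings)
api = ApiSource(readings, exposed={"cpu_percent", "mem_percent"})

# Metrics that are reachable through the agent but not through the API.
only_via_agent = sorted(set(agent.collect()) - set(api.collect()))
print(only_via_agent)
```

Both sources satisfy the same `collect()` contract, which is why a monitoring tool can treat them interchangeably and differ only in how many metrics come back.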

Speaking of metrics, cloud monitoring is closer to any classic monitoring system than one might imagine. After all, clouds are either actual hardware running somewhere else, or a virtualization of hardware. In both cases, how they operate does not differ substantially from real, physical systems running locally. We will see more of the exact metrics collected with cloud monitoring software when discussing the various subtypes of cloud monitoring. For now, it is sufficient to say that a state-of-the-art cloud monitoring tool, such as Checkmk, is capable of both installing agents on the cloud and using the APIs of each cloud vendor it supports, making collecting metrics a straightforward task.

illustration seamless integration

Private vs public clouds in monitoring

In a private cloud, all hardware and software resources are dedicated to a single company, either internally or through a third-party provider. Similar to dedicated vs shared in web hosting, a private cloud provides a higher level of management, control, and security. Public clouds, like AWS, Azure, and GCP, offer their resources to multiple customers at the same time. The hardware is not dedicated to your organization. Public clouds provide resources as needed and depending on the subscribed plan.

In cloud monitoring, there are some differences between public and private clouds. Private clouds have a clear advantage in terms of control. It is easier to customize the cloud environment according to your needs, there is more privacy and control over how the cloud resources are handled, and installing specific agents for monitoring is more straightforward. As a result, a private cloud also enables much more in-depth monitoring. A private cloud may appear to be the best choice overall, but it incurs higher costs than a public cloud, which often makes it a less obvious choice.

Public clouds, on the other hand, are cheaper overall, have a far lower level of maintenance, enjoy nearly unlimited scalability, and take advantage of higher redundancy. Not being limited to a fixed set of hardware means that, if anything fails, there is a backup ready. Most companies go with public clouds for these reasons and accept that there is less control of the environment and less customizability. Monitoring a public cloud may be limiting, depending on how easy installing your own monitoring agents is, and on how vast the set of metrics exposed by the cloud vendor’s APIs is.

In the end, as with many things in computing, the choice between a public and a private cloud is a matter of finding the right compromise for your organization. Both can be monitored, just with different levels of flexibility and ease.

Differences between cloud and SaaS monitoring

There is another difference in cloud monitoring that does not apply to on-premises monitoring. In the latter, the monitoring software is usually hosted on a server that is part of the local network. Installing, configuring, upgrading, and maintaining it are tasks for the IT team.

In cloud monitoring, there is the possibility to install the monitoring tool of choice in the cloud as well. The burden of installing, updating and running it is usually delegated to the cloud administrator. Cloud monitoring is thus offered as a software as a service (SaaS) and runs alongside any other service you may have on the cloud. It uses the cloud’s resources and may incur higher service fees.

SaaS monitoring is also used to refer to the process of monitoring the performance and usage of SaaS applications. In this sense, the term differs not according to where the monitoring software is hosted, but according to what it monitors. Cloud monitoring may be a term for monitoring cloud infrastructure, while SaaS monitoring focuses on the applications that run on the cloud. SaaS monitoring has a synonym in cloud application monitoring. As with many terms in IT, both meanings are valid, and the difference is only context-related.

Types of cloud monitoring: What needs to be monitored in a cloud infrastructure?

Cloud infrastructure is large and complex, but not necessarily different from on-premises infrastructure. Any virtual infrastructure generally follows the same rules as the physical one, which is also reflected in its monitoring. Both share many similarities and a few differences. For example, the usual metrics, such as CPU and memory usage, network utilization and throughput, storage space, lag, and every service’s status, need to be included in a comprehensive monitoring strategy in cloud monitoring as well as in local network and server monitoring.

A key difference is the greater scalability of clouds: one of the main advantages of cloud infrastructure is that it can scale very quickly and easily. Unlike an on-premises network, a cloud network that reaches capacity can scale automatically to absorb the increased resource usage without hiccups. Thus, the metrics related to capacity are somewhat less important in cloud monitoring than in conventional network or server monitoring. Their value lies more in keeping track of the costs associated with resource use than in warning about reaching capacity.

Cloud monitoring has a wealth of moving parts. It is best to see what needs to be monitored depending on the use case and type of service you intend to keep an eye on.

Cloud-based server monitoring

Whether a server is running in the cloud or on-premises, the metrics to monitor are roughly the same. Although cloud server monitoring is about monitoring a virtual server, it still behaves as a server, and the exposed metrics are not completely different.

CPU, memory, and disk usage are the bare minimum to find out how the server is handling its workload and whether there is a need to scale it soon. Monitoring the network traffic of a cloud server is essential to identify bottlenecks and potential network issues. Good indications of the latter are long response times and a high error rate for the server in question. These are clear signs that not everything is running smoothly in your cloud environment, and the server should be checked. Lastly, monitoring a server’s uptime and status tells you whether the server is up and responding.
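Response time and error rate are derived values, so it can help to see how they might be computed from raw request samples. The sketch below is an assumption-laden illustration: the function name `summarize_requests`, the sample format, and the choice to count HTTP status codes of 500 and above as errors are all made up for this example and do not describe any particular tool.

```python
from statistics import mean
from typing import Dict, List, Tuple


def summarize_requests(samples: List[Tuple[float, int]]) -> Dict[str, float]:
    """Summarize (response_time_seconds, http_status) samples for one server.

    Assumption: a status code of 500 or higher counts as a server error.
    """
    if not samples:
        return {"avg_response_s": 0.0, "error_rate": 0.0}
    times = [duration for duration, _ in samples]
    errors = sum(1 for _, status in samples if status >= 500)
    return {
        "avg_response_s": mean(times),
        "error_rate": errors / len(samples),
    }


# Made-up samples: three fast successful requests and one slow failure.
samples = [(0.2, 200), (0.3, 200), (2.5, 503), (0.4, 200)]
summary = summarize_requests(samples)
print(summary)
```

In practice, percentiles are often more informative than an average, since a single slow request can hide behind many fast ones; the average is used here only to keep the example short.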

Any cloud monitoring software is capable of gathering these metrics and helping you with cloud server monitoring.

Virtual network monitoring

Virtual networks are software-defined networks that are a main component of cloud environments. They allow physically distant locations to operate on the same resources without using a VPN, and to delegate the actual hardware setup and maintenance to the cloud vendor of choice. They are an emulation, in practice, but nonetheless vital for companies relying on cloud or hybrid infrastructures. Virtual network monitoring is the branch of cloud monitoring that is charged with keeping these networks healthy and efficient.

Monitoring in the cloud therefore also means taking the pulse of the virtual networks that compose it. As with cloud server monitoring, what to monitor does not differ largely from its physical counterparts. Network latency, throughput, error rates, and the health status of each of the virtual devices that make up the network nodes are the basic metrics to keep an eye on, which is the same as it would be on a real network.

Virtual networks have the advantage of being much more flexible and scalable than a classic, physical network. Being defined in software, similar to other cloud services, means that these networks can quickly be modified, extended, or reduced. Virtual network monitoring should be capable of following these modifications without having to reconfigure the monitoring itself.

Cloud database monitoring

Databases are a key piece of every infrastructure. They contain the data that fuels most of the other functionalities. Databases exist in the cloud too and should be monitored as any other service.

Cloud database monitoring is the branch of cloud monitoring that takes care of checking the databases that are present on cloud networks. AWS RDS, Azure SQL, and GCP Cloud SQL are common cloud databases, but these are not the only ones. Generally, any cloud monitoring solution should, at minimum, be able to monitor these databases and, as is the case with Checkmk, possibly more.

The key metrics to monitor for databases are the overall cloud database resource usage, transactions per second, and cache hit ratio in addition to data and log files. Cloud databases are no different, and these metrics must be collected for proper cloud database monitoring as well.
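Two of these metrics are simple ratios over raw counters. As a hedged sketch, assuming the database exposes cumulative counters for cache hits, cache misses, and committed transactions (the function and parameter names below are hypothetical), they could be derived like this:

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of reads served from cache; 0.0 when there were no reads."""
    total = hits + misses
    return hits / total if total else 0.0


def transactions_per_second(count_start: int, count_end: int, interval_s: float) -> float:
    """Rate derived from two readings of a cumulative transaction counter."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (count_end - count_start) / interval_s


# Made-up counter readings taken 60 seconds apart.
ratio = cache_hit_ratio(hits=950, misses=50)
tps = transactions_per_second(count_start=1000, count_end=1600, interval_s=60.0)
print(ratio, tps)
```

A falling hit ratio or a sudden change in transaction rate is usually more telling than either absolute number, which is why these values are typically tracked over time.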

Cloud storage monitoring

Storage is a necessity both on-premises and in the cloud. All cloud vendors offer a series of options to store files and applications. These range from simply having storage to back up your documents to usage tiers customized for enterprise needs. Being in a cloud environment means that this storage can easily be expanded when the need arises, unlike on-premises where physical drives must be purchased and installed before they can be used.

Regardless of what is stored in the cloud, it can be monitored. Cloud storage is the basis for many other services, like website hosting, application data, databases, and more. Monitoring is clearly beneficial and a good practice to make sure none of your services run out of space and that there are no bottlenecks in performance due to storage inefficiency. Cloud security monitoring intersects with cloud storage monitoring since security of the data hosted on the cloud is of primary importance for overall security.

Cloud storage monitoring is the branch of cloud monitoring that monitors the various cloud storage offerings on the market. For AWS cloud monitoring, these include AWS S3 (Simple Storage Service), AWS EBS (Elastic Block Store), and AWS EFS (Elastic File System). Azure has a series of different offerings related to storage, such as Blob, Disk, Queue, File, and Table Storage. GCP storage is simply called Cloud Storage. There are clearly small differences, especially in relation to the intended use of the storage space, but the more vital the hosted files are, the more important it is to use a cloud monitoring solution to monitor them.

Azure storage dashboard

Website monitoring

Website hosting is easily done on the cloud. While dedicated hosting has existed for decades now, all the major cloud vendors only relatively recently started offering to host a website on their cloud services as well. It makes sense to have your company's website on the same cloud as your other internal or external services, since this also makes it much easier to monitor them. Whether you need a fully operating web server for a web app or simply require a static website, it is possible to have it on the cloud vendor of your choice.

From an organizational and monitoring point of view, it is more manageable to have everything on the cloud, whether it is a website, databases, networks, or any other file used internally or otherwise. Benefitting from the scalability of the cloud, a website hosted there can respond to a spike in visits much more flexibly than most conventional web hosting services.

As with all things vital for your company, monitoring your website’s health is also essential. Luckily, any cloud monitoring tool can monitor the critical metrics for a website, like bandwidth and disk space usage.

Virtual machine monitoring

Virtual machines (VMs) are commonly used in infrastructure, and cloud environments are no exception. With the ease with which these can be created in the cloud, they are often used by companies to have fresh environments for employees or testing.

Virtual machine monitoring deals with keeping track of the health and efficiency of VMs, and a virtual machine in the cloud is no different. The usual array of metrics such as CPU usage, memory utilization, I/O latency, VM uptime, network and disk usage need to be collected to know how well all your virtual machines are performing.

In the cloud, it is easier to have multiple VMs running at all times. Cloud monitoring software has to be capable of following the creation and deletion of VMs on the cloud, knowing when to start monitoring a new one and when to stop monitoring one that has just shut down. Given the quantity and dynamic nature of VMs that can be quickly created in a cloud environment, monitoring them is, broadly speaking and with many exceptions, a more difficult task than on-premises. Keeping track in real time of such ephemeral VMs, which can be launched and dropped in seconds, is a complex challenge for cloud monitoring tools and administrators.
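Following VMs as they appear and disappear comes down to comparing what is currently monitored with what the cloud currently reports. The sketch below shows that comparison in its simplest form. It assumes the list of running VMs has already been fetched from somewhere; the function name `reconcile` and the VM identifiers are hypothetical, and no real discovery mechanism is implied.

```python
from typing import Set, Tuple


def reconcile(monitored: Set[str], discovered: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (VMs to start monitoring, VMs to stop monitoring)."""
    to_add = discovered - monitored      # new VMs not yet in monitoring
    to_remove = monitored - discovered   # VMs that no longer exist
    return to_add, to_remove


# Made-up state: vm-a was shut down and vm-d was just launched.
monitored = {"vm-a", "vm-b", "vm-c"}
discovered = {"vm-b", "vm-c", "vm-d"}

to_add, to_remove = reconcile(monitored, discovered)
print(sorted(to_add), sorted(to_remove))
```

Running such a comparison on every discovery cycle keeps the monitored set aligned with the cloud without manual reconfiguration. How often the cycle runs determines how quickly short-lived VMs are picked up.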

GCP GCE dashboard

Cloud performance monitoring

All the different processes and cloud workloads must run efficiently and with optimal performance, which is the aim of cloud performance monitoring. As with cloud monitoring in general, this can be done with dedicated tools from each of the cloud vendors or with general third-party cloud monitoring tools.

For cloud performance monitoring, Amazon CloudWatch, Google Cloud Monitoring, and Azure Monitor can give you the information you need to monitor the performance of all aspects of your respective cloud environments. If all you need is a single cloud, and your on-premises hardware and software monitoring needs are already covered, this may be a reasonable option.

Alternatively, to have much better customizability and flexibility, a dedicated cloud monitoring solution is a more viable option. Unlike integrated monitoring tools, a separate software solution that enables covering all your monitoring needs, on the cloud and locally, can save you resources, provide better interoperability, and be generally more flexible to your specific use cases.

Whichever you choose, cloud performance monitoring is about checking performance-sensitive metrics of all the services offered by the cloud. For apps, that could be CPU utilization; for storage and databases, disk usage; for virtual networks, traffic use and latency; and so on. Each of the specific cloud services has its own performance metrics. Collecting these shows you not only how well a cloud environment is performing, but also whether there is any need to upgrade or optimize it.

To have a meaningful cloud performance monitoring setup, metrics have to be continuously monitored, and alerts sent out when any of them hit a predefined threshold. It is not a passive endeavor, but an active one to ensure that the infrastructure is running smoothly. Cloud monitoring software must be responsive to perform optimal cloud performance monitoring.
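Threshold-based alerting of this kind can be sketched in a few lines. The example below is illustrative only: the `Threshold` class, the two-level OK/WARN/CRIT scheme, the metric names, and the limit values are assumptions chosen for the example, not the configuration of any specific monitoring tool.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Threshold:
    warn: float  # value at or above which the state becomes WARN
    crit: float  # value at or above which the state becomes CRIT


def evaluate(value: float, threshold: Threshold) -> str:
    """Map a metric value to OK, WARN, or CRIT."""
    if value >= threshold.crit:
        return "CRIT"
    if value >= threshold.warn:
        return "WARN"
    return "OK"


def check_all(metrics: Dict[str, float], thresholds: Dict[str, Threshold]) -> Dict[str, str]:
    """Evaluate every metric that has a threshold configured; skip the rest."""
    return {
        name: evaluate(value, thresholds[name])
        for name, value in metrics.items()
        if name in thresholds
    }


# Hypothetical limits and one made-up reading per metric.
thresholds = {
    "cpu_percent": Threshold(warn=80.0, crit=95.0),
    "disk_percent": Threshold(warn=85.0, crit=95.0),
}
metrics = {"cpu_percent": 97.2, "disk_percent": 86.0, "mem_percent": 40.0}

states = check_all(metrics, thresholds)
alerts = {name: state for name, state in states.items() if state != "OK"}
print(alerts)
```

Calling `check_all` on every polling interval and notifying only when `alerts` is non-empty gives the continuous, active behavior described above.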

Cloud security monitoring

Cloud security monitoring groups a series of processes together that allow you to review, manage, and observe the workflow and security of cloud environments. Due to the sensitivity of data hosted on the cloud and the key operational importance of cloud services, security has to be a primary concern for any IT team.

While cloud vendors have spent a great deal of resources to make sure that data breaches and vulnerabilities are kept to a minimum, it is obviously never a risk-free endeavor. Cloud monitoring tools, however, also check the security of the cloud environments as well as their performance and efficiency.

In practical terms, this means a lot of checks. Active monitoring of connection attempts for possible intrusions involves checking the traffic volume across all your cloud networks for unusual usage spikes and monitoring session lengths, which may indicate malicious activity. User policies and permissions should regularly be checked as well along with certificate configurations and the renewal of expired ones. It may go as far as checksumming the cloud backups and storage files against the local version to ensure nothing was tampered with or became corrupted.
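The checksum comparison mentioned last can be illustrated with Python's standard `hashlib` module. This is a minimal sketch under the assumption that the cloud copy has already been downloaded to a local path; the file names and contents are made up, and temporary files stand in for real backups.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large backups do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(local: Path, downloaded: Path) -> bool:
    """True when both files have identical SHA-256 digests."""
    return sha256_of(local) == sha256_of(downloaded)


# Temporary files stand in for a local backup and two downloaded cloud copies.
with tempfile.TemporaryDirectory() as tmp:
    local = Path(tmp) / "local.bak"
    copy_ok = Path(tmp) / "cloud_ok.bak"
    copy_bad = Path(tmp) / "cloud_bad.bak"
    local.write_bytes(b"backup contents")
    copy_ok.write_bytes(b"backup contents")
    copy_bad.write_bytes(b"backup c0ntents")  # one byte differs

    intact = matches(local, copy_ok)
    tampered = not matches(local, copy_bad)

print(intact, tampered)
```

Some storage services can report a checksum for a stored object, which may avoid downloading the file at all; whether that is available depends on the vendor and is not assumed here.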

Cloud security monitoring is not exclusively about ensuring the safety of data in the cloud. It is also important for following compliance regulations, which may mandate observation tools in the form of cloud monitoring software. Internal and external audits often focus on security. Therefore, by necessity, monitoring in the cloud must include security checks to satisfy such audits. Usually, cloud server monitoring and cloud network monitoring are the main areas to focus on for security, without discounting the need to ensure the security of cloud storage and its related data.

Cloud monitoring best practices

The best way to create a good cloud monitoring strategy is to start by assessing the infrastructure. Determine if it is split across different cloud providers, if it has on-premises services, what types of clouds are involved and, in general, how the whole infrastructure is structured. This way, a strategy can be planned that often involves a centralized cloud monitoring tool that can monitor it all.

The reason for having a single monitoring solution that can encompass at least the main cloud vendors (AWS, Azure, and GCP) is that it is easier to deploy and maintain, and that it is a single point of control for the whole infrastructure. It is much easier to assess who has to have access to the monitoring dashboard when there is only one solution instead of multiple pieces, each monitoring part of the infrastructure.

Once the cloud monitoring solution is in place and configured, it is important to establish the metrics that are to be collected depending on the services you run on the cloud. Perhaps the infrastructure is mostly about apps running on services like AWS Lambda or Azure Functions, or perhaps it makes heavy use of databases, like AWS DynamoDB or GCP Cloud SQL. In the case of the former, you may want to track CPU and memory usage and ensure that the number of microservices is within the expected range for the app. In the case of the latter, metrics like number of accesses, query performance, and data integrity are more important.

Tailor the metrics to the way you use your cloud, and set up corresponding alarms to get an alert as soon as anything goes beyond the threshold for normal functioning. Cloud monitoring involves a great number of pieces that have different uses and possible issues. Therefore, monitoring needs to be customized according to exactly what you are utilizing.

Further, the various cloud vendors apply different cost plans according to what is used, and how much. Cloud monitoring solutions are often able to monitor the current resource usage of the respective cloud platform and thus keep an eye on the costs incurred.
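Turning monitored usage into a cost estimate is simple arithmetic once unit prices are known. The sketch below uses entirely hypothetical resource names, quantities, prices, and budget; real vendor pricing is tiered and varies by region and plan, so this only illustrates the principle.

```python
from typing import Dict


def estimate_cost(usage: Dict[str, float], unit_prices: Dict[str, float]) -> float:
    """Sum quantity * unit price over every metered resource."""
    return sum(quantity * unit_prices[resource] for resource, quantity in usage.items())


# Hypothetical monthly usage and flat unit prices (not real vendor pricing).
usage = {"vm_hours": 720, "storage_gb_month": 500}
unit_prices = {"vm_hours": 0.05, "storage_gb_month": 0.02}
monthly_budget = 40.0

cost = estimate_cost(usage, unit_prices)
over_budget = cost > monthly_budget
print(round(cost, 2), over_budget)
```

Feeding such an estimate into the same alerting path as other metrics means a budget overrun can raise an alert just like a full disk would.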

Screenshots monitoring costs in AWS

Cloud monitoring challenges

Cloud monitoring has its own set of challenges compared to on-premises monitoring. These are inherent to the nature of cloud environments and must be addressed, or at the very least considered, when setting up a cloud monitoring solution.

Clouds are dynamic and elastic environments, and resources come and go on-demand and automatically. Monitoring a system that is in flux and that can change any minute – if not any second – means that a software solution must be capable of keeping track of any types of changes in real time. Setting up monitoring once and forgetting about it is not an option when doing either cloud or hybrid cloud monitoring. You must keep in mind that networks can change, servers can be shut down, and that microservices arise and disappear in a blink. Both administrators and monitoring software have to be ready for things to change at any time and without necessarily triggering an alert or notification.

A cloud environment may also be distributed across multiple locations separated by a great geographical distance. An increase in latency to receive information is to be expected. Cloud network monitoring needs to take into account the possible lower performance in such distributed systems. To address this, cloud vendors offer various locations / regions for their data centers to mitigate the distance between your own networks and theirs. Yet in some hybrid environments, where both on-premises and cloud services are present, a higher latency is to be expected due to the still existing distance between cloud and local resources. A cloud monitoring solution needs to be strong in monitoring both sides of these networks.

Especially when retrieving the monitoring data via the cloud provider's API, you may have to contend with some restrictions. As with any API, only what is exposed can be polled, and in monitoring that means that only the metrics that are available can be collected. On-premises, you have complete access to devices and applications, which may not be exactly true for an environment that is not completely under your control.

Cloud environments may require more security and compliance measures, especially to ensure that data is protected. Cloud security monitoring may become more difficult to execute as there may not be the same level of control over resources as with on-premises.

Lastly, costs cannot be ignored when using a cloud environment. There are multiple usage tiers and every service is billed differently. In addition, monitoring the cloud may itself incur further costs that have to be calculated and tracked to ensure they remain acceptable.

How to choose a cloud monitoring tool?

With the vast array of services and cloud offerings available, choosing a cloud monitoring tool is not straightforward. As previously mentioned, the first choice to make is to select software that can monitor all the clouds your infrastructure is going to be based on. The more cloud environments you need, the fewer choices you may end up having. Less common cloud services may not be supported by every monitoring tool, and even within the big three (AWS, Azure, and GCP), not all services and resources are included in monitoring. Luckily, at least the three major cloud vendors are generally well supported by cloud monitoring solutions, so this should not pose too much of a limitation.

It is highly desirable to have a tool that is easy to set up and configure. Cloud environments are quite complex already, so adding complexity is not advisable. Similarly, a tool that has a modern user interface, with plenty of dashboards to visualize the data, will give you an advantage when monitoring in the cloud. Checkmk fulfills these requirements.

Automation in monitoring is important to reduce human error and ease monitoring. In cloud environments, this means auto-discovery of new and stale services that can then be automatically added to or removed from monitoring. Checkmk auto-discovers services, and adds and removes hosts in real-time, so changes in the cloud environment are immediately reflected on the monitoring side.

Features to look for in cloud monitoring software

The right monitoring solution must be able to meet your cloud requirements: whether you are going to use AWS cloud monitoring, or Azure cloud monitoring, or GCP cloud monitoring, you need a tool that can support interfacing with these vendors. There are other vendors as well that should be supported if you plan on using any of them.

Similarly, a good cloud monitoring solution needs to be able to monitor all the services and resources you need for your use case. Some free or cheap solutions have a service limit that may be too restrictive for your needs. Premium versions of these tools usually remove those limits, but if you are going to use an unusually large number of services in the cloud, it is still good to check whether you need a more powerful tool or a custom quote. It may be that you need to buy additional capacity for the cloud monitoring tool you chose.

Your cloud monitoring tool of choice has to be cloud-capable and not just adapted to it. It must be able to interface with your cloud environments as well as your on-premises networks and hosts, and it must support the cloud services you need.

Conclusion

Using the cloud comes with increased scalability, customizability, and flexibility. In terms of monitoring, these require a capable monitoring solution. The flux of services and hosts being launched and stopped every minute is a lot larger than what happens on-premises. Normal monitoring tools are simply not sufficient unless they have been developed with cloud monitoring in mind.

Cloud security monitoring and cloud performance monitoring also imply adapting the monitoring software to an environment that is not completely managed by your organization. Operating hybrid cloud settings adds some advantages but also means facing new difficulties in monitoring compared to what you are already handling on your physical networks. To monitor it all, you need a state-of-the-art monitoring tool such as the Checkmk Cloud Edition.

With Checkmk, you can monitor cloud-native technologies like managed databases, functions, and microservices across all the main cloud vendors, such as AWS, Azure, and Google Cloud Platform. With dedicated dashboards to easily visualize all your clouds, monitoring all the various resources and components becomes a simple matter. Checkmk is capable of handling these ephemeral infrastructures.

FAQ

What is the difference between SaaS, PaaS, and IaaS?

SaaS, PaaS, and IaaS are different computing service models. PaaS (Platform as a Service) offers a complete platform to develop, deploy, and manage applications and services in a cloud. SaaS (Software as a Service) provides access only to software that performs a service, without the need for customers to install or maintain anything. IaaS (Infrastructure as a Service) provides the underlying infrastructure itself, so that you can not only develop and deploy apps but also run business resources such as networks, servers, databases, storage, and more.

What is hybrid cloud monitoring?

The term hybrid cloud monitoring describes monitoring hybrid infrastructures. These consist of both on-premises and cloud networks, hosts, and services. It does not matter how much of the infrastructure is local or on the cloud, nor if more than one cloud service is being used; a hybrid infrastructure is defined by simply having parts in both environments.

What is the difference between a cloud and SaaS monitoring solution?

A cloud monitoring solution can be used from the cloud but leaves the duty of installing and setting it up to your IT department. A SaaS monitoring solution is already present on a given cloud, and only needs to be accessed and configured to be utilized. It removes the need to install and update, leaving the maintenance to the provider. In turn, it loses some flexibility.

What is load balancer monitoring?

Load balancer monitoring is the process of monitoring load balancers. These, often shortened to LBs, are software or devices, both physical and virtual, that distribute incoming network traffic across multiple resources. LBs ensure high availability of applications or services, and prevent overload. Monitoring load balancers is therefore necessary to get the most out of them and to avoid failures, which could easily cause network congestion.

What is a hyperscaler?

Hyperscaler is a computing term that comes from the ability to hyperscale a software architecture in response to increased demand. There is no universally accepted definition of a hyperscaler, but all the main cloud providers, such as AWS, Azure, GCP, and IBM Cloud, fit the definition.