Cloud monitoring for high-performing applications

Illustration showing a hybrid environment

What is cloud monitoring?

Cloud monitoring is the collection of methods for observing, reviewing, and managing the operational workflow in a cloud-based IT infrastructure. It does not differ in principle from the monitoring of on-premises hardware and software, but in practice it presents a few distinct challenges and differences that need to be considered when setting it up.

Cloud offerings are largely split among the three main vendors, AWS, Azure, and Google, and sometimes more than one cloud service is used in the same infrastructure. Most cloud monitoring tools support more than one cloud vendor.

Nowadays, the great majority of infrastructures are cloud-based or hybrid, combining on-premises and cloud services and devices. Monitoring devices and services in the cloud together with the classical, physical ones is a complex challenge for IT administrators and cloud monitoring tools. But to keep the infrastructure efficient and prevent issues, it is necessary to keep track of it all, wherever it may actually be. Considering the wide range of applications that can run in the cloud, and their slight differences across cloud vendors, it is easy to understand how cloud monitoring can become complex.

We will see how and what to monitor later on in this guide. First, let us focus on why cloud monitoring is important.

Why is cloud monitoring important?

As wisdom would suggest, if you do not know that something exists and how it works, you cannot know when that something is misbehaving, under stress or soon to break down. Monitoring, including cloud monitoring, keeps you informed of every piece of your infrastructure at all times, both physical and virtual.

Needless to say, if your company depends on clouds to function, then monitoring them is a necessity rather than a choice. It is simply wise to be aware of as many metrics and as much information from your cloud as possible, so you can prevent or fix issues, plan for improvements, and optimize it. All the more so considering all the services that can run in a cloud: servers, virtual machines, databases, virtual networks, storage, and more. Each of these requires a different set of metrics to keep track of, all of which matter for the overall picture.

Cloud monitoring encompasses all these components. As with local monitoring, each individual part of a network can cause issues and, as none is completely isolated from others, can break other parts or bring down the whole network. Cloud networks are logical versions of physical ones and subject to similar problems. Therefore, cloud monitoring is an extension of local monitoring that has to be implemented to ensure that cloud environments perform efficiently and without issues. Typically, it involves a hybrid environment in which on-premises components and cloud-based components work together.

How cloud monitoring works

Cloud monitoring starts with the right tool. There are a few options here. One is to use the tools provided by the cloud vendor of your choice. Azure uses Azure Monitor, a cloud monitoring tool to collect, analyze, and act on the telemetry data of Azure-based environments. AWS has CloudWatch as an in-house solution to monitor resources and applications on AWS clouds. For GCP monitoring, Cloud Monitoring is available directly from Google.

These are set up to analyze data coming mostly, but not exclusively, from cloud environments. AWS CloudWatch, for instance, can monitor on-premises servers as well. While these cloud monitoring tools are perfectly capable of monitoring their respective clouds, they cannot be extended to monitor clouds from other vendors. If your infrastructure is mixed, using more than one cloud environment, you need more than one cloud monitoring tool to monitor everything. This increases redundancy, presents compatibility challenges, and is slightly heavier on resources than a unified monitoring solution.

More reasonably, you would choose a single cloud monitoring tool that is not vendor-specific. Going beyond the boundaries of different cloud vendors, this class of tools can monitor more than one cloud, as well as your on-premises hosts. This choice brings together all the metrics, regardless of the clouds they come from, and provides a uniform, and especially complete, view of your infrastructure. This type of cloud monitoring tool is capable of monitoring both private clouds that have dedicated hardware, and public clouds, like AWS, Azure and GCP. It is a monitor-it-all system.

In practical terms, and regardless of the monitoring tool used, cloud monitoring is carried out either by installing an agent, as on a local network, or through APIs exposed by the cloud environment. The results are similar, but they differ in the type and number of metrics that the agent or the API offers for polling.
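To make the two collection paths more concrete, here is a minimal Python sketch that merges metrics from an agent and from a vendor API into one view. The poller functions, metric names, and values are hypothetical placeholders, not the interface of any real agent or cloud API. The sketch only assumes that each source returns whatever metrics it is able to expose.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Sample:
    host: str
    metric: str
    value: float
    source: str  # which collection path delivered the value


def collect(host: str, sources: Dict[str, Callable[[str], Dict[str, float]]]) -> Dict[str, Sample]:
    """Poll every source and keep one sample per metric; earlier sources win."""
    merged: Dict[str, Sample] = {}
    for source_name, poll in sources.items():
        for metric, value in poll(host).items():
            merged.setdefault(metric, Sample(host, metric, value, source_name))
    return merged


# Hypothetical pollers standing in for a real agent and a real vendor API.
def poll_agent(host: str) -> Dict[str, float]:
    return {"cpu_percent": 41.0, "mem_percent": 63.5, "disk_percent": 72.0}


def poll_api(host: str) -> Dict[str, float]:
    return {"cpu_percent": 40.0, "network_in_bytes": 1_200_000.0}


samples = collect("web-1", {"agent": poll_agent, "api": poll_api})
```

Here the agent is listed first, so its reading is kept when both sources report the same metric, while the API fills in what the agent does not provide.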

Speaking of metrics, cloud monitoring is closer to any classic monitoring system than one might imagine. After all, clouds are either actual hardware running somewhere else, or a virtualization of hardware. In both cases, how they operate does not differ substantially from real, physical systems running locally. We will see more of the exact metrics collected with cloud monitoring software when discussing the various subtypes of cloud monitoring. For now, it is sufficient to say that a state-of-the-art cloud monitoring tool, such as Checkmk, is capable of both installing agents in the cloud and using the APIs of each cloud vendor it supports, making collecting metrics a straightforward task.

illustration seamless integration

Private vs public clouds in monitoring

In a private cloud, all hardware and software resources are dedicated to a single company, either internally or through a third-party provider. Similar to dedicated vs shared web hosting, a private cloud guarantees a higher level of management, control, and security. Public clouds are the well-known ones, such as AWS, Azure, and GCP, which offer their resources to multiple customers at the same time. The hardware is not dedicated to your organization; instead, public clouds provide resources as needed to whoever needs them, depending on the subscribed plan.

In cloud monitoring, there are some differences between public and private clouds. Private clouds have a clear advantage in terms of control. It is easier to customize the cloud environment according to your needs, there is more privacy and control over how the cloud resources are handled, and installing specific agents for monitoring is more straightforward. As a result, a private cloud also enables much more in-depth monitoring. A private cloud may appear to be the best choice overall, but it incurs higher costs than a public cloud, which makes it a less obvious choice.

Public clouds, on the other hand, are cheaper overall, require far less maintenance, and offer nearly unlimited scalability and higher redundancy. Not being limited to a fixed set of hardware means that if anything fails, there is a backup ready. Most companies go with public clouds for these reasons, accepting that there is less control over the environment and less customizability. Monitoring a public cloud may be limited, depending on how easy it is to install your own monitoring agents, and how extensive the set of metrics exposed by the cloud vendor’s APIs is.

In the end, as with many things in computing, the choice between public vs private cloud is a matter of choosing the right compromise for your organization. Both can be monitored, just with different levels of flexibility and ease.

Differences between cloud and SaaS monitoring

There is another difference in cloud monitoring that does not apply to on-premises monitoring. In the latter, the monitoring software is usually hosted on a server that is part of the local network. The usual needs of installing, configuring, upgrading, and keeping it running are the tasks of the IT team.

In cloud monitoring, there is the possibility of installing the monitoring tool of choice in the cloud as well. The burden of installing it and keeping it updated and running is then usually delegated to the cloud administrator. In this case, cloud monitoring is offered as SaaS (software as a service), and runs alongside any other service you may have on the cloud. It uses the cloud’s resources, which may incur higher service fees.

SaaS monitoring is also used to refer to the process of monitoring the performance and usage of SaaS applications. In this sense, the term differs not according to where the monitoring software is hosted, but according to what it monitors. Cloud monitoring may be a term for monitoring cloud infrastructure, while SaaS monitoring focuses on the applications that run on the cloud. SaaS monitoring has a synonym in cloud application monitoring. As with many terms in IT, both meanings are valid, and the difference is only context-related.

What needs to be monitored in a cloud infrastructure?

Cloud infrastructure is large and complex, but not necessarily different from on-premises infrastructure. The virtual generally follows the same rules as the physical, which is also reflected in its monitoring. Both share many similarities, and a few differences. For example, the usual metrics, such as CPU and memory usage, network utilization and throughput, storage space, lag, and every service’s status, need to be included in a comprehensive monitoring strategy in cloud monitoring as well as in local network and server monitoring.

A key difference is the greater scalability of the clouds, since one of the main advantages of a cloud infrastructure is that it can scale very quickly and easily. Unlike an on-premises network, a cloud network that reaches capacity can scale automatically to allow for the increased resource usage without hiccups. Thus, all the metrics related to capacity are of somewhat less importance in cloud monitoring than in classic network or server monitoring. They matter more for keeping track of the costs of resource use than for worrying about reaching capacity.

Cloud monitoring has a wealth of moving parts within it. It is best to see what needs to be monitored depending on the use case and type of service you intend to keep an eye on.

Cloud-based server monitoring

Whether a server is running in the cloud or on-premises, the metrics to monitor are roughly the same. Although cloud server monitoring is about monitoring a virtual server, it still behaves as a server, and the exposed metrics are not completely different.

CPU, memory, and disk usage are the bare minimum to find out how the server is handling its workload and whether there is a need to scale it soon. Monitoring the network traffic of a cloud server is essential to identify bottlenecks and potential network issues. A good indication of the latter are long response times and a high error rate of the server in question. These are clear signs that not everything is going smoothly in your cloud environment, and the server should be checked. Lastly, monitoring a server’s uptime and status will give you the basic info on whether the server is up and responding.
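As an illustration of the two warning signs mentioned above, the following Python sketch derives an error rate and an average response time from a window of requests. The input format (status code and response time in seconds) and the choice to count only 5xx responses as errors are assumptions made for this example, not something a particular monitoring tool prescribes.

```python
from statistics import mean
from typing import Dict, List, Tuple


def summarize_requests(requests: List[Tuple[int, float]]) -> Dict[str, float]:
    """Summarize (status_code, response_seconds) pairs from one polling window."""
    if not requests:
        return {"error_rate": 0.0, "avg_response_s": 0.0}
    # Assumption: only 5xx responses count as server errors.
    errors = sum(1 for status, _ in requests if status >= 500)
    return {
        "error_rate": errors / len(requests),
        "avg_response_s": mean(seconds for _, seconds in requests),
    }


# Made-up sample window: two successful and two failed requests.
window = [(200, 0.1), (500, 0.3), (200, 0.2), (503, 0.4)]
summary = summarize_requests(window)
```

A rising error rate or average response time across consecutive windows would be the cue to check the server more closely.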

Any cloud monitoring software is capable of gathering these metrics and helping you with cloud server monitoring.

Virtual network monitoring

Virtual networks are software-defined networks that are a main component of cloud environments. They allow physically distant locations to operate on the same resources without using a VPN, and delegate the actual hardware set up and maintenance to the cloud vendor of choice. They are an emulation, in practice, but nonetheless vital for companies relying on cloud or hybrid infrastructures. Virtual network monitoring is the branch of cloud monitoring that is charged with keeping these networks healthy and efficient.

Monitoring in the cloud therefore also means keeping a finger on the pulse of the virtual networks that compose it. As with cloud server monitoring, what to monitor does not differ much from the physical counterpart. Network latency, throughput, error rates, and the health status of each of the virtual devices that make up the network nodes are the basic metrics to keep an eye on, just as on a physical network.

Virtual networks have the advantage of being much more flexible and scalable than a classic, physical network. Being defined in software, similar to other cloud services, means that these networks can quickly be modified, extended, or reduced. Virtual network monitoring should be capable of following these modifications without having to reconfigure the whole monitoring.

Cloud database monitoring

Databases are a key piece of every infrastructure, containing the data that fuels most of the other functionalities. Databases exist in the cloud too, and should be monitored as any other service there.

Cloud database monitoring is the branch of cloud monitoring that takes care of checking the databases that are present on cloud networks. AWS RDS, Azure SQL, and GCP Cloud SQL are common cloud databases, but not the only ones. Generally, any cloud monitoring solution should be able to at least monitor these databases and, as is the case with Checkmk, possibly more.

The key metrics to monitor for databases are the overall resource usage, transactions per second, cache hit ratio, data and log files. Cloud databases are no different, and at least these metrics must be collected for proper cloud database monitoring.
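Two of these metrics are simple ratios over raw counters, which the following Python sketch illustrates. The counter names are hypothetical; each database engine exposes its own statistics, so the inputs would have to be mapped from whatever your database or cloud vendor actually reports.

```python
from typing import Optional


def cache_hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Share of reads served from cache; None when there were no reads at all."""
    total = hits + misses
    return hits / total if total else None


def transactions_per_second(count_start: int, count_end: int, interval_s: float) -> float:
    """Rate derived from a cumulative transaction counter sampled twice."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (count_end - count_start) / interval_s


ratio = cache_hit_ratio(hits=950, misses=50)
tps = transactions_per_second(count_start=1000, count_end=1600, interval_s=60)
```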

Cloud storage monitoring

Storage is a necessity both on-premises and in the cloud. All cloud vendors offer a series of options to store files and applications, ranging from simple storage to back up your documents to usage tiers customized to enterprise needs. Being in a cloud environment means that this storage can easily be enlarged when the need arises, unlike on-premises, where physical drives must be purchased and installed before they can be used.

Regardless of what is stored in the cloud, it can be monitored. Cloud storage is the base for many other services, like website hosting, application data, databases and more. Monitoring is clearly beneficial and good practice to make sure none of your services run out of space, and there are no bottlenecks in performance due to storage inefficiency. Cloud security monitoring intersects with cloud storage monitoring, as security of the data hosted on the cloud is of primary importance for the overall security.

Cloud storage monitoring is the branch of cloud monitoring that monitors the various offers of cloud storage present on the market. These include AWS S3 (Simple Storage Service), AWS EBS (Elastic Block Store), AWS ECS (Elastic Container Service) as part of AWS cloud monitoring. Azure has a series of different offerings related to storage, such as Blob, Disk, Queue, File, and Table Storage. GCP storage is simply called Cloud Storage. There are clearly small differences, especially in relation to the intended use of the storage space, but needless to say the more vital the files hosted are for you, the more important it is to use a cloud monitoring solution to monitor them.

Azure storage dashboard

Website monitoring

Website hosting is easily done on the cloud. While dedicated hosting has existed for decades now, all the major cloud vendors only relatively recently started offering to host a website on them as well. It makes sense to have the website of your company on the same cloud as your other internal or external services, as this also makes it much easier to monitor them. Whether you are in need of a fully operating web server for a web app, or simply of a static website, it is possible to have it on the cloud vendor of your choice.

From an organizational and monitoring point of view, it is more manageable to have everything on the cloud, be it websites, databases, networks, or any other file, whether used internally or not. Benefitting from the scalability of the cloud, a website hosted there can respond to a spike in visits much more flexibly than with most conventional web hosting services.

As with all things vital for your company, monitoring your website’s health is also essential. Luckily, any cloud monitoring tool can monitor the critical metrics for a website, like bandwidth and disk space usage.

Virtual machine monitoring

Virtual machines are commonly used in infrastructure, and cloud environments are no exception. With the ease with which these can be created in the clouds, they are often used by companies to have fresh environments for employees or testing.

Virtual machine monitoring deals with keeping track of the health and efficiency of VMs, and a virtual machine in the cloud is no different. The usual array of metrics, such as CPU usage, memory utilization, I/O latency, VM uptime, and network and disk usage, needs to be collected to know how well all your virtual machines are performing.

In the cloud, it is easier to have multiple VMs running at all times. Cloud monitoring software has to be capable of following the creation and deletion of VMs in the cloud, knowing when to start monitoring a new one and when to stop monitoring one that has just shut down. Given the quantity and dynamic nature of VMs that can be quickly created in a cloud environment, monitoring them is a more difficult task than on-premises (very broadly speaking; exceptions are common). Keeping track, in real time, of such ephemeral VMs, which are launched and dropped in seconds, is a complex challenge for cloud monitoring tools and administrators.
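At its core, following VMs as they come and go is a comparison between what is currently monitored and what the cloud currently reports. The Python sketch below shows that comparison with plain sets. The VM names are made up, and how the list of running VMs is obtained (agent registration, vendor API) is deliberately left out.

```python
from typing import List, Set, Tuple


def reconcile(monitored: Set[str], discovered: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (VMs to start monitoring, VMs to stop monitoring)."""
    to_add = sorted(discovered - monitored)     # newly launched VMs
    to_remove = sorted(monitored - discovered)  # VMs that no longer exist
    return to_add, to_remove


currently_monitored = {"vm-a", "vm-b"}
reported_by_cloud = {"vm-b", "vm-c"}
to_add, to_remove = reconcile(currently_monitored, reported_by_cloud)
```

Running such a comparison on every discovery cycle keeps the monitored set aligned with the cloud without manual reconfiguration. The shorter the cycle, the fewer short-lived VMs slip through unnoticed.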

GCP GCE dashboard

Cloud performance monitoring

All the different processes and cloud workloads must run efficiently and with optimal performance, which is the aim of cloud performance monitoring. As with cloud monitoring in general, this can be done with dedicated tools from each of the cloud vendors or with general third-party cloud monitoring tools.

For cloud performance monitoring, Amazon CloudWatch, Google Cloud Monitoring, and Azure Monitor can give you the information you need to monitor the performance of all aspects of your respective cloud environments. If all you need is a single cloud, and your on-premises hardware and software monitoring needs are already covered, this may be a reasonable option.

Alternatively, for much better customizability and flexibility, a dedicated cloud monitoring solution is a more suitable option. Unlike the vendors' integrated monitoring tools, separate software that covers all your monitoring needs, on the cloud and locally, can save you resources, offer better interoperability, and be generally more adaptable to your specific use cases.

Whichever way you choose, cloud performance monitoring is about checking the performance-sensitive metrics of each of the services offered by the cloud. For apps, that could be CPU utilization; for storage and databases, disk usage; for virtual networks, traffic and latency; and so on. Each cloud service has its own performance metrics. Collecting these shows you not only how well a cloud environment is performing, but also whether there is any need to upgrade or optimize it.

To have a meaningful cloud performance monitoring setup, metrics have to be constantly monitored, and alerts sent out when any of them hits a predefined threshold. It is not a passive endeavor, but an active one to ensure that the infrastructure is running smoothly. Cloud monitoring software must be responsive to deliver optimal cloud performance monitoring.
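The threshold logic described here can be sketched in a few lines of Python. The two-level warn/crit scheme, the state names, and all threshold values are assumptions chosen for illustration; real monitoring tools each have their own rule format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Threshold:
    warn: float
    crit: float


def state_for(value: float, threshold: Threshold) -> str:
    """Map a metric value to OK, WARN, or CRIT (higher values are worse)."""
    if value >= threshold.crit:
        return "CRIT"
    if value >= threshold.warn:
        return "WARN"
    return "OK"


def alerts(metrics: Dict[str, float], thresholds: Dict[str, Threshold]) -> List[Tuple[str, str, float]]:
    """Return (metric, state, value) for every metric that is not OK."""
    found = []
    for name, value in metrics.items():
        threshold = thresholds.get(name)
        if threshold is None:
            continue  # no rule configured for this metric
        state = state_for(value, threshold)
        if state != "OK":
            found.append((name, state, value))
    return found


# Illustrative values only.
rules = {
    "cpu_percent": Threshold(warn=80, crit=95),
    "latency_ms": Threshold(warn=100, crit=250),
    "disk_percent": Threshold(warn=80, crit=90),
}
current = {"cpu_percent": 97.0, "latency_ms": 120.0, "disk_percent": 40.0}
pending = alerts(current, rules)
```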

Cloud security monitoring

Cloud security monitoring groups together a series of processes that allow you to review, manage, and observe the workflow and security of cloud environments. Due to the sensitivity of data hosted on the cloud and the key operational importance of the cloud services, security has to be a primary concern for any IT team.

While cloud vendors have spent a great amount of resources to make sure that data breaches and vulnerabilities are kept to a minimum, using the cloud is never risk-free. For this reason, cloud monitoring tools check the security of cloud environments as well as their performance and efficiency.

In practical terms, this means a lot of checks: actively monitoring connection attempts for possible intrusions, checking the traffic volume across all your cloud networks for unusual usage spikes, and monitoring session lengths, any of which may indicate malicious activity. User policies and permissions should be checked regularly, along with certificate configurations, and expired certificates renewed. It may go as far as checksumming the cloud backups and storage files against the local versions to ensure nothing was tampered with or became corrupted.
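The checksum comparison mentioned last can be sketched with Python's standard `hashlib` module. SHA-256 is one reasonable choice of hash, picked here as an assumption. The in-memory streams stand in for a local file and a copy downloaded from cloud storage, which in practice would be opened in binary mode.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def same_content(local: BinaryIO, cloud_copy: BinaryIO) -> bool:
    """True when both streams hash to the same SHA-256 digest."""
    return sha256_of(local) == sha256_of(cloud_copy)


# Stand-ins for a local backup and two retrieved cloud copies.
intact = same_content(io.BytesIO(b"nightly backup"), io.BytesIO(b"nightly backup"))
altered = same_content(io.BytesIO(b"nightly backup"), io.BytesIO(b"nightly backuq"))
```

A mismatch does not say whether the cause was tampering or corruption, only that the two copies differ and should be investigated.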

Cloud security monitoring is not exclusively about ensuring the safety of data in the cloud. It also matters for regulatory compliance, as major regulations mandate observation tools, such as cloud monitoring software. Internal and external audits often focus on security, so monitoring in the cloud must include security checks to pass such audits. Usually, cloud server monitoring and cloud network monitoring are the main areas to focus on for security, without discounting the need to secure cloud storage and the data it holds.

Cloud monitoring best practices

The best way to start with cloud monitoring is to assess the infrastructure: whether it is split across different cloud providers, whether it has on-premises services, what types of clouds are involved and, in general, how the whole infrastructure is structured. This way a strategy can be planned, which often involves a centralized cloud monitoring tool that can monitor it all.

The reason for having a single monitoring solution that can encompass at least the main cloud vendors (AWS, Azure, and GCP) is that it is easier to deploy and maintain, and that it is a single point of control for the whole infrastructure. It is much easier to assess who has to have access to the monitoring dashboard when there is only one solution instead of multiple pieces, each monitoring part of the infrastructure.

Once the cloud monitoring solution is in place and configured, it is important to establish the metrics that are to be collected depending on the services you run on the cloud. Perhaps the infrastructure is mostly about apps running on services like AWS Lambda or Azure Functions, or perhaps it makes heavy use of databases, like AWS DynamoDB or GCP Cloud SQL. In the former case, you may want to track CPU and memory usage, and check that the number of microservices is within the expected range for the app. In the latter case, metrics like number of accesses, query performance, and data integrity are more important.

Thus, tailor the metrics to the way you use your cloud, and set up corresponding alarms to get an alert as soon as anything goes above or below the thresholds of normal functioning. Cloud monitoring involves a great number of pieces that have different uses and possible issues, and therefore the monitoring needs to be customized according to exactly what you are using.

Further, the various cloud vendors apply different pricing plans according to what is used, and how much. Cloud monitoring solutions can often monitor the current resource usage of the respective cloud platform and thus keep an eye on the costs incurred.
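As a rough sketch of how monitored usage translates into a cost estimate, the Python example below multiplies usage figures by unit prices. The resource names and prices are invented for illustration and do not reflect any vendor's actual pricing, which is typically tiered and varies by region and plan.

```python
from typing import Dict


def estimate_cost(usage: Dict[str, float], unit_prices: Dict[str, float]) -> Dict[str, float]:
    """Multiply each usage figure by its unit price; raises KeyError if a price is missing."""
    return {resource: round(amount * unit_prices[resource], 2)
            for resource, amount in usage.items()}


# Hypothetical monthly usage and flat unit prices.
usage = {"vm_hours": 720, "storage_gb_month": 500}
prices = {"vm_hours": 0.05, "storage_gb_month": 0.02}

costs = estimate_cost(usage, prices)
total = round(sum(costs.values()), 2)
```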

Screenshots monitoring costs in AWS

Cloud monitoring challenges

Cloud monitoring has its own set of challenges compared to on-premises monitoring. These are inherent to the nature of cloud environments and must be addressed, or at the very least considered, when setting up a cloud monitoring solution.

Clouds are dynamic and elastic environments, where resources come and go on demand and automatically. Monitoring a system that is in flux, that can change any minute, if not any second, means that the software must be capable of keeping track of any type of change in real time. Setting up monitoring once and forgetting about it is not an option when doing either cloud or hybrid cloud monitoring. You must keep in mind that networks can change, servers can be shut down, and microservices appear and disappear in the blink of an eye. Both administrators and monitoring software have to be ready for things to change anytime, without necessarily triggering an alert or notification.

A cloud environment may also be distributed across multiple locations worldwide, separated by great geographical distances. An increase in latency when receiving data is to be expected. Cloud network monitoring needs to take into account a possibly lower performance in such distributed systems. To address this, cloud vendors offer various regions for their data centers to reduce the distance between your own networks and theirs. Yet in some hybrid environments, where both on-premises and cloud services are present, a higher latency is to be expected due to the remaining distance between cloud and local resources. A cloud monitoring solution needs to be strong in monitoring both sides of these networks.

Especially when retrieving the monitoring data via the cloud provider's API, you may have to reckon with some restrictions. As with any API, only what is exposed can be polled, and in monitoring that means that only the metrics that are available can be collected. On-premises, you have complete access to devices and applications, which may not be exactly true for an environment that is not completely under your control.

Cloud environments may require more security and compliance measures, especially to ensure that data is protected. Cloud security monitoring may become more difficult to execute as there may not be the same level of control over resources as with on-premises.

Lastly, costs are not to be ignored when using a cloud environment. There are multiple usage tiers and every service is billed differently. Suffice it to say that monitoring the cloud may incur further costs, which have to be calculated and kept track of to remain acceptable.

How to choose a cloud monitoring tool?

With the vast array of services and cloud offerings available, choosing a cloud monitoring tool is not straightforward. We said earlier that the first choice to make is to pick software that can monitor all the clouds your infrastructure is going to be based on. The more cloud environments you need, the fewer choices you could end up having. Less common cloud services may not be supported by every monitoring tool, and even within the big three, AWS, Azure, and GCP, not every service and resource is covered by every tool. Luckily, at least the three major cloud services are generally well supported by cloud monitoring solutions, so this should not pose too much of a limitation.

It is highly desirable to have a tool that is easy to set up and configure. Cloud environments are quite complex already, so adding complexity is not advisable. Similarly, a tool that has a modern user interface, with plenty of dashboards to visualize the data, will give you an advantage when monitoring in the cloud. Checkmk fits these requirements.

Automation in monitoring is important to reduce human error and simplify the whole monitoring process. In cloud environments, this means auto-discovery of new and stale services that can then be automatically added to or removed from the monitoring. Checkmk auto-discovers services, and adds and removes hosts in real time, so changes in the cloud environment are immediately reflected on the monitoring side.

Features to look for in cloud monitoring software

The right monitoring solution must be able to meet your cloud requirements: whether you are going to use AWS cloud monitoring, Azure cloud monitoring, or GCP cloud monitoring, you need a tool that can interface with these vendors. There are other vendors as well, which should be supported if you plan on using any of them.

Similarly, a good cloud monitoring solution needs to be able to monitor all the services and resources you need for your use case. Some free or cheap solutions have a service limit that may be too restrictive for your needs. Premium versions of these tools usually remove those limits, but if you are going to use an unusually large number of services in the cloud, it is still worth checking whether you need a more powerful tool or a custom quote. You may need to buy additional capacity for the cloud monitoring tool you chose.

Even when the cloud monitoring tool of choice can interface with your cloud environments and your on-premises networks and hosts, and supports enough cloud services for your needs, it still has to be built for the cloud rather than merely adapted to it.

Conclusion

Using the cloud comes with increased scalability, customizability, and flexibility. In terms of monitoring, though, these require a capable monitoring solution. The flux of services and hosts being launched and stopped every minute is far greater than what happens on-premises. Conventional monitoring tools are simply not sufficient unless they have been developed with cloud monitoring in mind.

Cloud security monitoring and cloud performance monitoring also imply adapting the monitoring software to an environment that is not completely managed by your organization. Operating hybrid cloud settings adds some advantages but also means facing new difficulties in monitoring compared to what you are already handling on your physical networks. To monitor it all, you need a state-of-the-art monitoring tool, like the Checkmk Cloud Edition.

With Checkmk, you can monitor all cloud-native technologies, like managed databases, functions, and microservices, across the main cloud vendors such as AWS, Azure, and Google Cloud Platform. With dedicated dashboards to easily visualize all your clouds, monitoring all the various resources and components becomes straightforward. Checkmk is capable of dealing with these ephemeral infrastructures.

FAQ

What is the difference between SaaS, PaaS, and IaaS?

SaaS, PaaS, and IaaS are different cloud computing service models. PaaS (Platform as a Service) provides a complete platform to develop, deploy, and manage applications and services in a cloud. SaaS (Software as a Service) provides access only to software that performs a service, without the need for customers to install or maintain anything. IaaS (Infrastructure as a Service) provides the underlying infrastructure, not only to develop and deploy apps, but also to run business resources such as networks, servers, databases, storage, and more.

What is hybrid cloud monitoring?

The term hybrid cloud monitoring describes the monitoring of a hybrid infrastructure, which consists of both on-premises and cloud networks, hosts, and services. It does not matter how much of an infrastructure is local or on the cloud, nor if more than one cloud service is used; a hybrid infrastructure is defined simply by having parts in both environments.

What is the difference between a cloud and SaaS monitoring solution?

A cloud monitoring solution can be run in the cloud but leaves the duty of installing and setting it up to your IT department. A SaaS monitoring solution is already present on a given cloud, and only needs to be accessed and configured to be used. It removes the need for installing and updating, leaving the maintenance to the provider. In turn, it loses some flexibility.

What is load balancer monitoring?

Load balancer monitoring is the process of monitoring load balancers. These, often shortened to LBs, are software or devices, both physical and virtual, that distribute incoming network traffic across multiple resources. LBs ensure high availability of applications or services, and prevent overload. Monitoring load balancers is therefore necessary to make the most of them and to avoid failures, which could easily cause network congestion.

What is a hyperscaler?

Hyperscaler is a computing term that comes from the ability to hyperscale a software architecture in response to increased demand. There is no universally accepted definition of a hyperscaler, but the systems of all the main cloud providers, such as AWS, Azure, GCP, and IBM Cloud, fit the definition.