What is AWS EC2 monitoring?

AWS EC2 (Elastic Compute Cloud) is the Amazon service that offers highly scalable virtual servers, called instances, in the cloud. AWS EC2 monitoring is the part of the larger AWS cloud monitoring topic that deals with monitoring these EC2 instances. By monitoring various metrics, cloud administrators can gain insight into what is performing well, and what is not, on these instances, and discover potential issues. Amazon EC2 monitoring is a crucial step in ensuring that your cloud infrastructure is available, healthy, and free of anomalies and bottlenecks.

EC2 monitoring is a varied task that includes checking the efficiency of many types of instances, as AWS offers multiple EC2 instance types depending on the use case. We will look at each type later. For now, it is sufficient to say that EC2 monitoring covers every type of EC2 instance, with few practical differences in how the monitoring is performed.

Why monitor AWS EC2 instances?

Amazon offers a primary tool for AWS EC2 monitoring: CloudWatch. It is capable of detailed monitoring of EC2 instances, and in many cases it provides a set of dashboards that are good enough to be put into use quickly for detecting problems and identifying potential optimizations. This is essentially what EC2 monitoring is all about. No matter how comprehensive and accurate AWS's own monitoring and management of its resources is, it is necessary to add a second layer with a good CloudWatch setup or a third-party Amazon EC2 monitoring solution. Monitoring EC2 instances helps avoid disruptions, outages, bottlenecks, and misconfigurations, and is a task that any sensible cloud administrator should undertake.

Using CloudWatch is the bare minimum to ensure that your cloud infrastructure is healthy and working as it should.

However, some CloudWatch features fall short of what truly comprehensive EC2 monitoring requires.

For instance, CloudWatch may lack the granularity required to cover your specific use case. Monitoring EC2 instances with a dedicated solution such as Checkmk can better satisfy your requirements, particularly when you need more customizability and flexibility in the monitoring. Setting your own thresholds instead of relying on the limits in CloudWatch, or defining specific monitoring configurations, are both things that a separate monitoring tool can provide.
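To make the idea of custom thresholds concrete, here is a minimal sketch in Python. It is not how Checkmk or CloudWatch implement their checks; the `Levels` class, the `evaluate` function, the role names, and the numbers are all hypothetical and only illustrate that the same reading can deserve a different state depending on the workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Levels:
    """Hypothetical upper thresholds for one metric, in the metric's own unit."""
    warn: float
    crit: float


def evaluate(value: float, levels: Levels) -> str:
    """Map a metric value to OK, WARN or CRIT using upper thresholds."""
    if value >= levels.crit:
        return "CRIT"
    if value >= levels.warn:
        return "WARN"
    return "OK"


# Per-workload CPU thresholds in percent; example values, not recommendations.
CPU_LEVELS = {
    "batch-worker": Levels(warn=90.0, crit=98.0),  # expected to run hot
    "web-frontend": Levels(warn=60.0, crit=80.0),  # should keep headroom
}

if __name__ == "__main__":
    # The same CPU reading is judged differently for each role.
    for role in ("batch-worker", "web-frontend"):
        print(role, evaluate(92.5, CPU_LEVELS[role]))
```

Keeping the thresholds in data rather than in code is a deliberate choice in this sketch: it lets each instance role carry its own limits without touching the evaluation logic.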

Furthermore, anomaly detection, trend analysis, predictive analysis, custom alarming, and automated remediation are some of the advanced features that third-party cloud monitoring tools may offer. These may not always be necessary, but they are certainly desirable, and can compensate for many of the functional limits of CloudWatch.

Regardless of the tool of choice, monitoring EC2 instances is an important step towards a reliable, healthy, and high-performance cloud-based infrastructure. Without knowing how it is working and performing, you are blind to possible issues and disruptions. This is the primary reason for monitoring AWS EC2 instances.

AWS EC2 Checkmk dashboard

What are the types of AWS EC2 instances?

For detailed EC2 monitoring, it is valuable to know the various instance types. AWS has created many of them for different applications. Knowing which is which will help you in EC2 monitoring, as every instance type focuses on particular aspects, for example memory efficiency or data storage, and you will therefore need to focus on slightly different metrics.

So, let’s have a concise look at the existing AWS instance types, and what they specialize in.

General purpose instances

As the name implies, general purpose instances are balanced virtual servers that can be used for a variety of tasks. These AWS instance types are geared towards uses that do not require a focus on any specific resource, like computing power, efficient memory, high network throughput, or fast read/write operations. They instead provide a balanced environment that should suffice for most workloads.

The main differences between instances of this type are the CPU model, the software environment (Mac instances are for macOS workloads), and how their costs are calculated.

Compute-optimized instances

Compute-optimized instances are AWS instance types for CPU-intensive applications, such as batch processing workloads, media transcoding, high-performance computing, scientific modeling, dedicated gaming servers, machine learning, and more. Any type of CPU-intensive application can benefit from choosing these instances over the general purpose ones mentioned above.

Memory-optimized instances

If the needs of your organization include applications with heavy memory usage, memory-optimized AWS instances are what you should consider using. These are specifically designed for memory-intensive workloads, for instance in-memory databases, electronic design automation, financial and actuarial simulations, and numerous types of heavy data analysis.

Most of the differences between memory-optimized instances are in the type and amount of memory available. Some go up to 24 TiB of memory, with multiple Gbps of network bandwidth to sustain a rapid exchange of data to and from the memory.

Accelerated-computing instances

AWS instance types for accelerated computing are for workloads that require high GPU (Graphics Processing Unit) power. Machine learning and HPC (High-Performance Computing) on Tensor Core GPUs are the main use cases for these EC2 instances. They also excel at image and video analysis, forecasting, advanced text and document analysis, voice translation and transcription, natural language processing (NLP), and deep learning training.

AWS offers quite a few instances of this type, which differ in the type of GPU and the quantity of overall resources.

Storage-optimized instances

For applications that require high, sequential read and write access to very large datasets, AWS has a series of storage-optimized EC2 instances. These are ideal for both relational and non-relational databases, and for all workloads that require very fast access to medium-sized datasets, such as search engines and data analytics. These instances provide fast NVMe SSD storage, but cheaper and slower ones with simple HDD storage are also available.

The main differences between instances of this type are the maximum number of transactions processed per second (TPS) and the cost per TB of data storage. Naturally, increasing the speed increases the cost of an instance.

HPC-optimized instances

HPC (High-Performance Computing) workloads can run quite well on compute-optimized instances, but Amazon also has dedicated instance types for high-performance processing workloads. These HPC-optimized EC2 instances offer better price performance when running large, complex simulations, deep-learning tasks, and generally very computationally intensive workloads. They are also designed for workloads that can take advantage of improved network throughput and packet-rate performance.

What metrics to monitor?

EC2 monitoring is done by using CloudWatch and its dashboards, the AWS APIs, or a dedicated AWS EC2 monitoring tool. AWS exposes a good number of metrics to base your monitoring on, though not all of them are available for all instance types. Common to all are basic metrics like CPU utilization, disk read and write operations per second, and network bytes or packets received or transmitted. Any type of EC2 instance can report these metrics, and they are the bare minimum on which you should base your monitoring efforts.
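As a rough illustration of the API route, the sketch below builds the parameters for a CloudWatch `GetMetricStatistics` request for each basic metric. It only constructs the request and does not contact AWS. The `build_query` helper, the instance ID, the one-hour window, and the chosen statistics are assumptions made for the example; the metric names and the `AWS/EC2` namespace are the ones CloudWatch uses.

```python
from datetime import datetime, timedelta, timezone

# Basic per-instance metrics published in the AWS/EC2 namespace.
# DiskReadOps and DiskWriteOps cover instance store volumes.
BASIC_METRICS = [
    "CPUUtilization",
    "DiskReadOps",
    "DiskWriteOps",
    "NetworkIn",
    "NetworkOut",
    "NetworkPacketsIn",
    "NetworkPacketsOut",
]


def build_query(instance_id, metric, end, minutes=60, period=300):
    """Return keyword arguments for a CloudWatch GetMetricStatistics call.

    period=300 matches the 5-minute interval of basic monitoring.
    """
    return {
        "Namespace": "AWS/EC2",
        "MetricName": metric,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,
        "Statistics": ["Average", "Maximum"],
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # "i-0123456789abcdef0" is a placeholder instance ID.
    for name in BASIC_METRICS:
        query = build_query("i-0123456789abcdef0", name, now)
        print(query["MetricName"], query["StartTime"].isoformat())
```

With boto3 installed and credentials configured, each dictionary could be passed on as `boto3.client("cloudwatch").get_metric_statistics(**query)`.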

Burstable instances such as T4g, T3, T3a, and T2 also report CPU credit metrics: credit usage, credit balance, surplus credit balance, and surplus credits charged. These instances use credits to account for CPU usage above their baseline, and for the related costs, so taking these metrics into consideration is necessary to keep track of usage and expenses.
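The arithmetic behind CPU credits can be sketched as follows. One CPU credit corresponds to one vCPU running at 100% for one minute, so the balance shrinks whenever the instance spends credits faster than it earns them. The function names are hypothetical, and the earn rate, balance, and utilization in the usage example are illustrative values to be replaced with the figures for your instance type and the readings from the credit metrics.

```python
from typing import Optional


def credits_spent_per_hour(vcpus: int, avg_cpu_percent: float) -> float:
    """One CPU credit equals one vCPU at 100% utilization for one minute."""
    return vcpus * (avg_cpu_percent / 100.0) * 60.0


def hours_until_depleted(balance: float, earned_per_hour: float,
                         vcpus: int, avg_cpu_percent: float) -> Optional[float]:
    """Estimate the hours until the credit balance reaches zero.

    Returns None when credits are earned at least as fast as they are spent,
    in which case the balance does not shrink.
    """
    net_drain = credits_spent_per_hour(vcpus, avg_cpu_percent) - earned_per_hour
    if net_drain <= 0:
        return None
    return balance / net_drain


if __name__ == "__main__":
    # Illustrative numbers: 2 vCPUs at 40% average CPU, earning 24 credits/hour.
    remaining = hours_until_depleted(balance=144.0, earned_per_hour=24.0,
                                     vcpus=2, avg_cpu_percent=40.0)
    print("hours until credits run out:", remaining)
```

An estimate like this is a natural candidate for an alert: a shrinking number of remaining hours signals a problem earlier than a balance that has already reached zero.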

For more advanced monitoring, and in truth for anything that exceeds the basics, either the CloudWatch agent or a third-party agent is necessary. Without an agent, CloudWatch only exposes part of the possible metrics. For instance, to perform AWS EC2 memory monitoring, an agent can collect metrics like active, buffered, cached, free, inactive, and used memory, as well as the total memory available to the specific EC2 instance. As is to be expected, there are slight differences depending on the operating system used by the instance, for example Linux, Windows, or macOS. Memory monitoring in particular requires an agent to be installed on the instance. Most AWS EC2 monitoring tools provide an agent that can check these metrics and more.
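To show why memory figures have to come from inside the instance, here is a small sketch of what an agent on Linux might do: read `/proc/meminfo` and derive the values named above. It is not the CloudWatch or Checkmk agent. The sample text uses made-up numbers, the function names are hypothetical, and "used" follows one common convention (total minus free, buffers, and cache) that real agents may define differently.

```python
# Made-up sample in the format of /proc/meminfo on Linux.
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:         1000000 kB
Buffers:          200000 kB
Cached:          2800000 kB
Active:          3000000 kB
Inactive:        1500000 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def memory_summary(info):
    """Derive used memory, assuming used = total - free - buffers - cached."""
    used = info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]
    return {
        "total_kb": info["MemTotal"],
        "free_kb": info["MemFree"],
        "buffered_kb": info["Buffers"],
        "cached_kb": info["Cached"],
        "active_kb": info["Active"],
        "inactive_kb": info["Inactive"],
        "used_kb": used,
        "used_percent": round(100.0 * used / info["MemTotal"], 1),
    }


if __name__ == "__main__":
    # On a real Linux instance, open("/proc/meminfo").read() would replace the sample.
    print(memory_summary(parse_meminfo(SAMPLE_MEMINFO)))
```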

A key metric that has not been named so far is disk usage. This is because AWS EC2 disk space monitoring means monitoring the EBS volumes that are attached to the EC2 instances in use. These are the virtual drives of the instances, and the disk space metrics are collected at their level. As with the main EC2 instance metrics, CloudWatch (with its agent installed), the APIs, or a dedicated agent can inform you of how much disk space is in use and available for all of your AWS instance types.
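From inside the instance, the fill level of a mounted volume can be read with the Python standard library. The sketch below is a minimal example of such a check, not the method any particular agent uses; the `disk_usage_percent` helper and the 90% threshold in the usage example are assumptions.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return the used share of the file system containing path, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo file systems report a size of zero.
        return 0.0
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    percent = disk_usage_percent("/")
    # 90% is an example threshold, not a recommendation.
    state = "WARN" if percent >= 90.0 else "OK"
    print(f"/ is {percent:.1f}% full ({state})")
```

Each EBS volume appears as its own mount point, so a real check would run once per mounted file system rather than only for the root path.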

These are the basic metrics for EC2 performance monitoring. For EC2 security monitoring or general compliance, it may also be important to report on AWS EC2 file integrity monitoring. A third-party tool is needed here, as CloudWatch does not support checking the integrity of each file on your EC2 instances.
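The principle behind file integrity monitoring is to record a cryptographic hash of each watched file and to compare later readings against that baseline. The following is a minimal sketch of the idea using SHA-256, with hypothetical function names; dedicated tools add much more, such as permission and ownership checks, protected baseline storage, and alerting.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths):
    """Record the current hash of every watched file."""
    return {str(path): sha256_of(path) for path in paths}


def changed_files(baseline):
    """Return watched files that are missing or whose content has changed."""
    changed = []
    for path, expected in baseline.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            changed.append(path)
    return sorted(changed)
```

A typical use would be to call `build_baseline` once on a known-good system, store the result somewhere the instance cannot modify, and run `changed_files` on a schedule.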

Best practices for monitoring EC2 instances

Regardless of which metrics you collect and which instance types you use, make EC2 monitoring a priority. EC2 is a big element of your infrastructure, and one that you do not want to fail at any time. Monitoring EC2 instances involves, at the very least, the collection of metrics, but should also include log analysis. Implementing all of this in an AWS EC2 monitoring tool and automating it to suit your needs is the ideal solution.

Monitoring metrics and analyzing logs can be performed through CloudWatch or any of the more advanced third-party tools, such as Checkmk. The important thing is to have a system that covers all of the relevant metrics, that alerts you as soon as anything seems to be amiss, that can be set up with ease, ideally with only a few configuration steps, and that supports all of your infrastructure, whether hybrid or fully cloud-based. Checkmk is obviously our pick for this and more. As an advanced monitoring tool for EC2 and AWS as a whole, Checkmk does most of the work that would otherwise be done through CloudWatch or custom agents, in a single solution for both hybrid and fully cloud-based infrastructures.