Effective GCP monitoring for performance and cost optimization

Ensure optimal performance and cost management through GCP monitoring. Get to know Google monitoring services and the GCP Monitoring API.

GCP GCE dashboard

What is GCP monitoring?

GCP monitoring is the part of cloud monitoring that deals with Google Cloud services and resources. It allows you to collect, analyze, and visualize metrics and logs from GCP resources, including virtual machines, containers, databases, load balancers, applications, and other services. As with any area of monitoring, these metrics help you identify and troubleshoot problems, optimize performance, and ensure the overall reliability of your applications.

Monitoring GCP is usually done by visualizing all the collected metrics in various dashboards. Most GCP monitoring tools on the market offer plenty of dashboards to organize the data into easy-to-read graphs and tables. With Google Cloud Monitoring, Google offers its own software that retrieves numerous metrics and logs and summarizes them in dashboards.

Throughout this article, we will look at the various cloud services that GCP monitoring encompasses. However, GCP monitoring goes beyond just data collection and also includes alerting, reporting, cloud cost monitoring and much more. For the sake of simplicity, we will therefore focus on the various core areas first.

What are the areas to monitor on a GCP cloud?

By monitoring your Google Cloud, you ensure optimal performance, keep your data and applications secure, and pinpoint areas for improvement.

GCP performance monitoring

Performance is a key parameter in any infrastructure, and one of the main reasons for using a cloud provider such as GCP. With its distributed networks and computing power, GCP offers, like other cloud providers, important advantages in terms of sheer performance. Nevertheless, only by monitoring performance can you ensure that there are no losses at any point.

Google's default software, Cloud Monitoring, aggregates metrics, logs, and trace data and displays them in various dashboards, thereby monitoring GCP resources and services. GCP performance monitoring focuses on basic metrics such as the CPU usage of VMs, network usage and bandwidth, the number of active connections to managed databases, error rates in a virtual network, and the storage usage of the various cloud applications. Depending on how you use your Google Cloud, your monitoring may include many more metrics.
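As an illustration, here is a minimal Python sketch of how a request for the CPU utilization of Compute Engine VMs could be prepared against the Cloud Monitoring API, which is covered later in this article. It only builds the URL for the API's timeSeries.list method and sends nothing. The function name build_cpu_request and the project ID are placeholders chosen for this example, and an actual call would additionally need an OAuth 2.0 access token.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

API_ROOT = "https://monitoring.googleapis.com/v3"
CPU_METRIC = "compute.googleapis.com/instance/cpu/utilization"


def build_cpu_request(project_id: str, end: datetime, minutes: int = 60) -> str:
    """Return the timeSeries.list URL covering the `minutes` before `end`.

    Hypothetical helper for illustration: it prepares the request only.
    """
    start = end - timedelta(minutes=minutes)
    params = {
        # Monitoring filter: restrict the result to a single metric type.
        "filter": f'metric.type = "{CPU_METRIC}"',
        # The interval is given as RFC 3339 timestamps in UTC.
        "interval.startTime": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "interval.endTime": end.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return f"{API_ROOT}/projects/{project_id}/timeSeries?{urlencode(params)}"


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    print(build_cpu_request("my-project", datetime.now(timezone.utc)))
```

Keeping the request construction separate from the HTTP call makes it easy to check the filter and the time window before any quota is consumed.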

To help with GCP performance monitoring, Google provides the Cloud Profiler. It collects CPU and memory usage data from your cloud applications and generates a statistical profile of each of them. Such a profile helps you make changes that improve the applications’ performance. Cloud Profiler attributes metrics to specific parts of the application’s source code, so you can identify which part of the application consumes the most resources, which in turn lets you limit costs and optimize performance.

GCP performance monitoring also covers load balancers. Acting as the traffic light for virtual networks and applications, a load balancer plays a key role in distributing load in the most efficient way and in avoiding the overload of a single resource or service. Within GCP monitoring, the Google Cloud Monitoring tool collects and analyzes metrics coming from the GCP load balancers, including latency, request count, byte count for each request, and packet count. Keeping an eye on these metrics will help you understand how efficiently the load balancer is working and whether fine-tuning would be useful.
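To make these load balancer metrics more concrete, the following Python sketch summarizes a batch of request samples into a request count, an error rate, a 95th-percentile latency, and the average bytes per request. The RequestSample type and the summarize function are made up for this example and are not part of any Google library. In practice, the samples would come from the metrics that Cloud Monitoring collects.

```python
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One request as seen by the load balancer (hypothetical shape)."""
    latency_ms: float
    status: int
    bytes_sent: int


def summarize(samples: list[RequestSample]) -> dict[str, float]:
    """Reduce raw request samples to a few headline numbers."""
    if not samples:
        raise ValueError("at least one sample is required")
    count = len(samples)
    latencies = sorted(s.latency_ms for s in samples)
    # Nearest-rank percentile: the smallest value with at least 95% of
    # all samples at or below it. Integer math avoids rounding surprises.
    rank = (95 * count + 99) // 100
    # Count only server-side errors (HTTP 5xx) as failures.
    errors = sum(1 for s in samples if s.status >= 500)
    return {
        "request_count": count,
        "error_rate": errors / count,
        "p95_latency_ms": latencies[rank - 1],
        "avg_bytes_per_request": sum(s.bytes_sent for s in samples) / count,
    }
```

A rising 95th-percentile latency with a stable request count usually points to a backend problem rather than a traffic spike, which is why it pays to look at these values together.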

GCP network monitoring

The area of GCP network monitoring focuses on virtual networks. Whether you have a cloud-only or a hybrid infrastructure, it is important to ensure that the pieces interconnecting it all are working, optimally configured, and efficient. This is possible by using GCP monitoring tools.

GCP network monitoring is primarily used to guarantee that basic network connectivity is not disrupted or interrupted, and to monitor how the actual networks are performing. Making sure that all the components forming a cloud infrastructure can communicate with each other is the first step in GCP network monitoring. Identifying the traffic status, trends, and possible bottlenecks is the natural next step. Other vital parts of network monitoring are firewalls, DNS, and VPNs.

GCP comes with the Network Intelligence Center, a console for network observability, monitoring, and troubleshooting in Google Cloud. It has five modules to visualize network topology, packet loss and latency, to view the usage of firewall rules, to analyze network issues, and to test network connectivity. These modules integrate metrics that overlap with GCP performance and GCP security monitoring. The Network Intelligence Center is a simple product that is more than sufficient to get you started or for small networks, but it shows its limits when the infrastructure grows to thousands of hosts and services. Third-party monitoring tools like Checkmk can provide more efficient and scalable monitoring insights, and are not limited to monitoring GCP clouds.

GCP security monitoring

GCP security monitoring involves actively checking your GCP resources and services to detect intrusions, security threats, and potential vulnerabilities. It is the area of cloud monitoring that is charged with protecting your GCP resources by detecting possible security incidents and informing you about them.

Securing clouds involves many efforts and checks. On the one hand, it means checking the safety of data and communications; on the other hand, it means preventing security incidents so that, ideally, you never need to resolve one. GCP logs are often monitored to identify suspicious behavior and potentially unsafe events in GCP applications and resources. GCP includes Google Logging (officially named Cloud Logging), a fully managed log management system that covers storage, search, analysis, and alerting of logs. Google Logs Explorer is integrated into the Google Cloud Console and is the main instrument for consulting the logs on the GCP platform.
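Searching logs for suspicious events comes down to writing a filter. The Python sketch below assembles a filter in the Logging query language that matches entries from one resource type at or above a given severity since a given time. The helper build_log_filter is a hypothetical name used only here, and the resulting string is something you could paste into the Logs Explorer or pass to the Logging API.

```python
# Severity levels defined for log entries, from least to most severe.
SEVERITIES = (
    "DEFAULT", "DEBUG", "INFO", "NOTICE", "WARNING",
    "ERROR", "CRITICAL", "ALERT", "EMERGENCY",
)


def build_log_filter(resource_type: str, min_severity: str, since_rfc3339: str) -> str:
    """Build a Logging query language filter (hypothetical helper).

    `since_rfc3339` is a UTC timestamp such as "2024-01-01T00:00:00Z".
    """
    if min_severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {min_severity}")
    clauses = [
        f'resource.type="{resource_type}"',
        f"severity>={min_severity}",
        f'timestamp>="{since_rfc3339}"',
    ]
    # Clauses are combined with AND so every condition must hold.
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_log_filter("gce_instance", "ERROR", "2024-01-01T00:00:00Z"))
```

Validating the severity before building the string catches typos early instead of returning an empty result set that looks like "nothing happened".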

However, logs are not the only important element of GCP security monitoring. Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) actively monitor networks to identify unauthorized access and intrusions, and ideally prevent them. Google includes the former in GCP with Cloud IDS. It detects network threats such as malware and spyware, and monitors intra- and inter-VPC (Virtual Private Cloud) communication. It is not an IPS, but it helps you detect security-related issues in your GCP infrastructure.

For monitoring the security of GCP applications and containers, a useful tool is the Google Web Security Scanner. It identifies vulnerabilities in App Engine, Google Kubernetes Engine (GKE), and Compute Engine applications. The Security Scanner acts as a crawler of public web URLs, spotting vulnerabilities by repeatedly exercising as many inputs and event handlers as possible. It is not a complete GCP security monitoring solution for cloud applications by any stretch, but it is nonetheless a useful tool to keep in mind.

GCP application monitoring and VM monitoring

Whether your infrastructure makes heavy use of Compute Engine’s VMs or leans more towards App Engine for running applications on GCP, both services drive a business, which makes monitoring them a must.

Google Cloud Monitoring and Google Logging collect metrics from cloud applications and virtual machines. Both are part of the Google Cloud operations suite, formerly known as Stackdriver. The suite not only integrates GCP logging and monitoring, but also provides a few more features. It can alert based on predefined policies, offload the management and scaling of Prometheus infrastructure, set up custom dashboards, perform uptime checks, and conduct security audits. Even though it is a fairly large monitoring solution, it is limited to GCP clouds.

GCP also offers an error reporting solution, named Google Error Reporting. It identifies errors in applications and helps you understand them, thus providing a basic form of GCP application monitoring. It supports many popular languages such as Go, Java, .NET, PHP, Node.js, Python, and Ruby. Through its integration with Google Logging, it can send errors as logs for further analysis.

Google Cloud Storage monitoring

GCP offers Cloud Storage as a service for hosting data. With different tiers for different desired levels of availability, Cloud Storage is a flexible service for storing unstructured data. Files are a part of web applications and as such are of vital importance for the health of your infrastructure. Monitoring GCP Cloud Storage is thus a logical addition to any monitoring efforts.

Google Cloud Monitoring and Google Logging help with that. Key metrics such as storage usage, data transfer rates, error rates, and security-related events are collected to give insight into how the storage service is used, by whom, and whether scaling is necessary.

While Google Cloud Monitoring is an integrated and easy-to-use GCP monitoring tool, it may not be powerful enough for larger infrastructures and more advanced use cases. Third-party Google monitoring tools may be better suited to your requirements. Checkmk and similar tools allow more flexible customization and greater efficiency, without making Google's own integrated tools redundant.

GCP database monitoring

Like any other cloud provider, GCP offers databases. Cloud SQL, Cloud Firestore and Cloud Bigtable are only a few of the choices you have when setting up a database on the Google cloud. Needless to say, all of these must be part of your GCP monitoring strategy.

GCP database services expose a series of metrics that include CPU utilization, memory usage, disk I/O, network traffic, query latency, connection pool usage, and more. Along with the support for collecting database logs in Google Cloud Monitoring, there is a fairly complete range of data for assessing databases’ health.

In addition to health, database performance is also very important. GCP monitoring includes a specific tool for this task: Query Insights. It analyzes queries for Cloud SQL databases, helping you detect, diagnose, and prevent query performance problems. It is a helpful troubleshooter for SQL queries, and useful to pair with general GCP logging and monitoring.

Monitoring GCP costs

Cloud costs are a sore point for many. It's often difficult to find a cloud provider that meets all your needs and ensures that you don't spend money unnecessarily on your resources. Fortunately, most cloud monitoring solutions offer a way to review your resource usage and calculate its relative cost.

GCP also has such an option. Google Cost Management is a dashboard that collects info on the usage of GCP resources and services to provide you with visibility into your current cost trends and forecasts. In addition, you can set policies through which the software makes recommendations to help you contain costs and make better use of your budget. The software also sends you an alert should a service be running and incurring costs when you don't actually need it.
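To illustrate the idea behind cost forecasts and budget alerts, here is a small Python sketch. It projects the month-end spend with a simple linear extrapolation and reports which budget thresholds have already been crossed. This is a deliberately naive model written for this article and not the forecasting method Google uses. The function names and the default thresholds of 50%, 90%, and 100% are assumptions.

```python
import calendar
from datetime import date


def forecast_month_end(spend_to_date: float, today: date) -> float:
    """Linearly extrapolate month-to-date spend to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    # Average daily spend so far, multiplied by the length of the month.
    return spend_to_date / today.day * days_in_month


def crossed_thresholds(
    spend: float,
    budget: float,
    thresholds: tuple[float, ...] = (0.5, 0.9, 1.0),
) -> list[float]:
    """Return every threshold (as a fraction of the budget) already reached."""
    return [t for t in thresholds if spend >= t * budget]


if __name__ == "__main__":
    today = date.today()
    projected = forecast_month_end(300.0, today)
    print(f"Projected month-end spend: {projected:.2f}")
    print("Thresholds crossed:", crossed_thresholds(projected, budget=1000.0))
```

Comparing the projected value, rather than the current spend, against the budget gives you an earlier warning, at the price of false alarms when costs are front-loaded in the month.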

Third-party GCP monitoring solutions such as Checkmk also have their own cost control capabilities. Using monitoring agents or monitoring APIs, these solutions can check GCP usage costs in near real-time and are usually multi-cloud capable as well. Aggregating costs across all your cloud services and resources also eliminates the need to log into multiple cloud dashboards and review costs individually.

What GCP monitoring tools are available?

Monitoring GCP is performed either through the integrated dashboards in Google Cloud Monitoring or through an external GCP monitoring tool. External tools rely on monitoring agents that run in the cloud itself, or they connect to the Google Cloud Monitoring API.

The Cloud Monitoring API was designed to provide an interface on which monitoring tools can build their services, exposing a large set of metrics from many GCP services and resources. The API is offered over both REST and gRPC. A dedicated language called Monitoring Query Language (MQL) can be used to query the API and retrieve, filter, and manipulate its data. It is not a complex language, and anybody who is somewhat familiar with JSON and REST APIs can start using it quickly.
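As a rough sketch of what this looks like in practice, the Python snippet below prepares an MQL query for the REST endpoint timeSeries:query. It only builds the URL and the JSON request body and sends nothing. The function name build_query_request and the project ID are placeholders, and the query text follows the general shape of Google's published MQL examples, so please verify it against the current documentation before relying on it. A real call would also need an OAuth 2.0 access token.

```python
import json

QUERY_URL = "https://monitoring.googleapis.com/v3/projects/{project_id}/timeSeries:query"

# MQL: fetch the CPU utilization of Compute Engine VMs for the last hour.
MQL_CPU_QUERY = "\n".join([
    "fetch gce_instance",
    "| metric 'compute.googleapis.com/instance/cpu/utilization'",
    "| within 1h",
])


def build_query_request(project_id: str, query: str) -> tuple[str, str]:
    """Return the (url, json_body) pair for a timeSeries:query POST request.

    Hypothetical helper for illustration: it prepares the request only.
    """
    url = QUERY_URL.format(project_id=project_id)
    # The request body is a JSON object carrying the MQL text.
    body = json.dumps({"query": query})
    return url, body


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    url, body = build_query_request("my-project", MQL_CPU_QUERY)
    print("POST", url)
    print(body)
```

The query itself is plain text inside a JSON envelope, which is why familiarity with JSON and REST is enough to get started.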

Through the API, you can also use MQL, or any of the languages that have their own GCP monitoring libraries, to perform a range of monitoring tasks and build a custom monitoring system over time. This is a great option if Google Cloud Monitoring occasionally fails to meet your needs. However, if it regularly falls short, as often happens in large enterprises, third-party GCP monitoring solutions are the only viable alternative.

Checkmk and many others use the monitoring API as well as custom agents, going beyond the built-in GCP dashboards. Considering that both Google Cloud Monitoring and the API incur costs that increase with usage, a licensed external GCP monitoring tool may be cheaper in the end. Due to better customizability, coverage of advanced use cases, and multi-cloud support, third-party monitoring software is often a better solution for GCP cloud monitoring.

Why is Google cloud monitoring important?

GCP monitoring is of utmost importance for companies using GCP. Google monitoring services, in large part, ensure the smooth operation of cloud-based applications and infrastructure. With effective GCP monitoring, you can track the health and performance of GCP resources, identify potential issues early, and take remedial action before they escalate. As we've seen, GCP monitoring solutions like Google Cloud Monitoring or Checkmk collect key metrics and present them in a variety of dashboards, allowing you to keep an eye on your GCP services and resources at a glance.

Google monitoring services often provide real-time insights into the behavior of GCP services, enabling organizations to analyze and optimize their deployments. Using the GCP Monitoring API or dedicated monitoring agents, organizations can systematically access monitoring data and integrate it into their existing monitoring systems or custom dashboards. Customizing this data, and setting up alerts when predefined events occur or thresholds are exceeded, is key to an effective GCP monitoring strategy.
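A threshold alert of this kind can be described as an alerting policy in Cloud Monitoring. The Python sketch below builds such a policy as a plain dictionary in the JSON shape used by the API's alert policy resource: one condition that fires when the mean CPU utilization of a VM stays above a threshold for a given duration. The function name cpu_alert_policy, the display names, and the default values are assumptions made for this example, and notification channels are left out for brevity.

```python
import json

CPU_FILTER = (
    'metric.type = "compute.googleapis.com/instance/cpu/utilization" '
    'AND resource.type = "gce_instance"'
)


def cpu_alert_policy(threshold: float = 0.8, duration_s: int = 300) -> dict:
    """Build an alerting policy body for sustained high CPU (illustrative)."""
    return {
        "displayName": "High CPU utilization",
        # With a single condition the combiner has no effect, but it is required.
        "combiner": "OR",
        "conditions": [
            {
                "displayName": f"CPU above {threshold:.0%}",
                "conditionThreshold": {
                    "filter": CPU_FILTER,
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    # The condition must hold this long before it fires.
                    "duration": f"{duration_s}s",
                    "aggregations": [
                        # Average the raw samples into one point per minute.
                        {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}
                    ],
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(cpu_alert_policy(), indent=2))
```

Requiring the condition to hold for several minutes avoids alerts on short, harmless CPU spikes.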

Last but not least, GCP monitoring aids in cost management by allowing you to monitor resource utilization and identify opportunities for optimization. By tracking usage across all of your GCP cloud resources, you can make informed decisions on optimizing costs, define the right size of resources, and eliminate unnecessary expenses.
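As a simple illustration of right-sizing, the following Python sketch takes CPU utilization samples per VM and lists the machines that look oversized, meaning that both their average and their peak utilization stay low. The function name find_underused and the limits of 10% average and 50% peak are arbitrary assumptions for this example. Sensible values depend on your workloads, and memory or disk usage should be considered as well.

```python
from statistics import mean


def find_underused(
    cpu_by_vm: dict[str, list[float]],
    max_avg: float = 0.10,
    max_peak: float = 0.50,
) -> list[str]:
    """Return the names of VMs whose CPU usage suggests a smaller size.

    `cpu_by_vm` maps a VM name to utilization samples between 0.0 and 1.0.
    VMs without samples are skipped, since no data is not the same as idle.
    """
    candidates = []
    for name, samples in cpu_by_vm.items():
        if not samples:
            continue
        # Check the peak too, so bursty workloads are not flagged by mistake.
        if mean(samples) < max_avg and max(samples) < max_peak:
            candidates.append(name)
    return sorted(candidates)


if __name__ == "__main__":
    usage = {
        "web-1": [0.60, 0.70, 0.65],
        "batch-1": [0.02, 0.05, 0.03],
        "idle-1": [0.01, 0.01, 0.02],
    }
    print(find_underused(usage))
```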

GCP monitoring is thus critical for everyone using the Google Cloud Platform. Whether you rely only on Google Cloud Monitoring and Google Logging or use a more advanced tool like Checkmk, GCP monitoring should be a part of every monitoring system.