CIS Benchmark at Behavox: Part 1 - Linux
Introduction
Hello, my name is Timokhin Maksim, and I am a DevOps engineer at Behavox. Today I would like to talk about how we successfully implemented the CIS Benchmark for Linux on our servers.
I will discuss the goals we set for ourselves, the results we achieved, and how we automated the process of applying the CIS Benchmark while establishing procedures to stay compliant.
I'll also touch on the layers involved in this process and briefly explain why we chose not to use off-the-shelf solutions. Get ready for a deep dive. Let's begin!
What is the CIS Benchmark?
The cisecurity.org website provides the following definition:
CIS Benchmarks are best practices for the secure configuration of a target system. Available for more than 100 CIS Benchmarks across 25+ vendor product families, CIS Benchmarks are developed through a unique consensus-based process comprised of cybersecurity professionals and subject matter experts around the world.
Not entirely clear, right? In other words, a CIS Benchmark is a list of recommendations, or settings, for securing the configuration of a system and its applications. Combined with effective tooling, these practices strengthen overall system security and reduce the attack surface.
But recommendations alone might not be enough. Automation and verification tools are always desirable. Let robots handle the routine.
Goals
Our goal was to create a comprehensive automated hardening system that detects or fixes issues at every stage of operation, from image creation to runtime.
This includes:
- Making the system image secure by default.
- Detecting issues before they reach production, if the introduced changes violate the rules we check.
- Having monitoring and alerting in case the configuration changes at runtime.
- Being able to check the system at runtime at any moment.
- Being able to restore the system's state quickly and easily at runtime, if needed.
Our goal was not 100% compliance with the CIS policies, as that is simply impossible. For instance, certain kernel settings can disrupt software running on the server, and some requirements, such as having a separate partition for /var/log, are impractical for us.
Instead, our objective was to apply as many security policies as make sense in the specific context of our business and environment.
Processes
The processes fall into two parts:
- Processes that run before the operating system is launched. This includes creating an Amazon Machine Image (AMI) or a GCP OS image.
- Processes at runtime. These involve monitoring and alerting, as well as the ability to restore the system's state if necessary.
Development
The diagram speaks for itself, but let me add a few comments.
All changes to the infrastructure go through review and CI/CD: we create a test environment, spin up the OS from our image, run the OS configuration system, and, lastly, run our utility. Running it at the very end is important, because it lets us flag any unaddressed Benchmark recommendations or report a successful test.
If the test fails, the author adjusts the changes to comply with the recommendations. If it passes, the changes can be merged.
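The pass/fail gate at the end of the pipeline can be sketched in Go. This is a minimal sketch, assuming the final CI step calls oscap directly rather than cis-scanner (whose interface this post does not describe); the profile ID and data stream path are placeholders. It relies on oscap's documented exit codes: 0 when every rule passes, 2 when at least one rule fails or has an unknown result, and 1 on error.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// verdict is the outcome of the CIS check at the end of the CI pipeline.
type verdict int

const (
	verdictPass  verdict = iota // every evaluated rule passed
	verdictFail                 // at least one rule failed or had an unknown result
	verdictError                // the scan itself did not complete
)

// verdictFromExitCode maps oscap's documented exit codes to a CI verdict.
func verdictFromExitCode(code int) verdict {
	switch code {
	case 0:
		return verdictPass
	case 2:
		return verdictFail
	default:
		return verdictError
	}
}

// exitCode extracts the process exit code from the error returned by exec.Cmd.Run.
func exitCode(err error) int {
	if err == nil {
		return 0
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode()
	}
	return -1 // the binary could not be started at all (e.g. oscap is not installed)
}

func main() {
	// Placeholder profile ID and data stream path.
	cmd := exec.Command("oscap", "xccdf", "eval",
		"--profile", "xccdf_org.ssgproject.content_profile_cis",
		"/path/to/ssg-ds.xml")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr

	switch verdictFromExitCode(exitCode(cmd.Run())) {
	case verdictPass:
		fmt.Println("CIS check passed: the changes can be merged")
	case verdictFail:
		fmt.Println("CIS check failed: unaddressed Benchmark recommendations")
		os.Exit(1)
	default:
		fmt.Println("CIS scan did not complete")
		os.Exit(1)
	}
}
```

Treating exit code 1 (or a missing binary) as a separate failure mode keeps a broken scanner from being mistaken for a compliant system.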
Runtime
As mentioned earlier, our goals cover the entire lifecycle of the system. Therefore, checking the system at runtime is essential, because changes can be introduced at any time. For this, we run the same utility we use in CI. If an alert fires, the responsible person receives a notification and decides what to do next: investigate, restore the system's state, or both.
Implementation
Let's focus on the stages of applying the CIS Benchmark.
The AMI stage is where we build the system image used to create instances in AWS or GCP.
This stage comes first because it sets the base system configuration. On its own, however, it is insufficient, because the image is incomplete at build time: for example, certain mount points may not exist yet and are only created while the system is running.
Therefore, during the cluster preparation stage, we run SaltStack to bring the system to the required state.
Next, the tools we built for detecting and fixing problems come into play.
Tooling
It’s worth mentioning Vulnerability Assessment/Management systems, specifically Nessus, which is the industry standard.
Nessus allows you to set up scanning policies and audit the results.
Under normal circumstances, we would have chosen it. However, because we work in an isolated network to which we have limited access, we decided to build our own tools while maintaining compatibility with Nessus.
Once the process was clear and approved, we could choose the tool, and we chose OpenSCAP. OpenSCAP provides a set of utilities and check files, plus remediation content for various systems, such as scripts and Ansible playbooks. It is developed by specialists from Red Hat together with other security experts, it is actively maintained, and it has good documentation, all of which gave us confidence in its quality.
A minor inconvenience was the lack of a daemon. To be precise, one exists, but its GitHub project has been archived and is no longer developed, and there is no metrics exporter at all.
Therefore, we developed two Go utilities: cis-scanner and cis-exporter.
cis-scanner
The cis-scanner is a utility built on top of oscap. Its main task is to run the scanner, create an ARF (Asset Reporting Format) report, and generate a remediation script.
A key requirement for the utility was to produce metrics and alerts only for the rules we wanted to monitor. Some rules don't fit our needs; for example, one recommendation requires setting:
net.ipv4.ip_forward = 0
This kernel parameter disables the ability to forward packets not intended for the host, thereby preventing the use of orchestration systems such as Kubernetes or Nomad.
That's why we compiled a list of rules that do not suit our infrastructure and agreed on it with our internal security team. We then added to cis-scanner the ability to read this list from a file and ignore the rules it contains.
After the checks complete, oscap generates the remediation.sh script, which does not include fixes for the ignored rules. Consequently, if something in the system's runtime configuration changes, you can safely run remediation.sh.
cis-exporter
The cis-exporter is a utility that runs on the same host as cis-scanner and exposes metrics in Prometheus format. It parses the ARF report and turns the results into metrics. If the report is missing, it reports that as well.
We prefer to address such issues proactively rather than wait for the security team to come to us about a security breach.
Conclusion
In the end, we achieved excellent results:
- Our system detects issues before they reach production and during runtime.
- We proactively address security-related problems.
- We saved some money by not using paid images from cloud providers.