The project was very ambitious: integration work, bug fixing, adding features, packaging, testing, and finally releasing a cloud software stack with a SUSE gecko on it in less than seven months.
I was asked to take care of security. It was a challenge because I was not familiar with the inherent security issues of Cloud technology, and I had to realize that OpenStack is very complex. Additionally, security does not seem to have been a major concern when OpenStack was designed. Don't get me wrong: the upstream developers have a well-functioning security team, quick patches, professional responses and process handling. But, for example, OpenStack makes heavy use of RESTful APIs which are publicly available, authentication (authC) is password-based, connections were not encrypted by default, passwords were sent over the wire without encryption, and so on.
Even though the start of the project was very challenging for everyone involved, it was a great success in the end. Let me tell you in this blog how we managed to ship the most secure OpenStack-based Cloud solution. But first, kudos to all the people who helped make it possible. I had a lot of help from the SUSE Cloud team. Two of my former teammates from the SUSE Security Team (Sebastian Krahmer, Matthias Weckbecker) formed the powerful validation and verification team that tested the software for vulnerabilities based on an initial risk assessment. And last but not least, there is the OpenStack community (especially the security guys), which is incredible!
The secure application development process
As you know from my earlier postings, we introduced something similar to Microsoft's "Security Development Lifecycle" process and I called it SAD, Secure Application Development. (Not very clever, but easy to remember.)
In this case we did not develop the stack from scratch; we did integration and added features. So I had no influence on the secure coding standards used or on the code quality. Therefore we were limited to defining security requirements and driving the "Verification and Validation" (V&V) process.
How to find a Needle in a Haystack
When you try to get familiar with Cloud Computing you will stumble through a field of marketing buzzwords, various commercial Cloud providers with very special/limited solutions, different security guidelines and so on and so on... maybe you will feel lost.
Well, that was the way I felt when I started. I wanted to find Cloud-specific attack vectors and risks which are unique to this new technology... but there isn't much new technology involved in Cloud Computing, and there isn't much research on the new stuff. So, where to start?
- Web-security issues of the Dashboard
- deployment issues, like file permissions, host and network hardening
- API security
Security Requirements
The German Federal Office for Information Security (BSI) provides some simple and effective security requirements for Cloud Service Providers, which I preferred to use because we are a German company and the set of recommendations is manageable.
- My testing suite based on the OWASP testing guide v3
- API fuzzing
- Manual reviews
- Code reviews
- Regression testing
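As a rough illustration of the API fuzzing item above, here is a minimal Python sketch that derives malformed JSON request bodies from one valid seed body. It is not the tooling we used: the names `NASTY_VALUES`, `mutate`, `generate_cases` and `SEED_BODY` are made up for this example, the seed body is only modeled on a password authentication request, and sending the cases to an endpoint and watching the logs is left to the caller.

```python
import copy
import json
import random

# Illustrative "hostile" values; a real fuzzer would use a much larger corpus.
NASTY_VALUES = [
    "", "A" * 4096, "../../etc/passwd", "<script>alert(1)</script>",
    "' OR '1'='1", -1, 2 ** 63, None, [], {},
]

# Hypothetical seed request, modeled on a password authentication body.
SEED_BODY = {
    "auth": {
        "passwordCredentials": {"username": "demo", "password": "secret"},
        "tenantName": "demo",
    }
}


def mutate(seed_body, rng):
    """Return a copy of seed_body with one leaf replaced by a nasty value."""
    body = copy.deepcopy(seed_body)
    leaves = []  # (container, key) pairs that point at replaceable values

    def walk(node):
        if isinstance(node, dict):
            items = list(node.items())
        elif isinstance(node, list):
            items = list(enumerate(node))
        else:
            return
        for key, value in items:
            if isinstance(value, (dict, list)) and value:
                walk(value)  # descend into non-empty containers
            else:
                leaves.append((node, key))

    walk(body)
    if leaves:
        container, key = rng.choice(leaves)
        container[key] = copy.deepcopy(rng.choice(NASTY_VALUES))
    return body


def generate_cases(seed_body, count, seed=0):
    """Yield `count` serialized fuzz cases; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    for _ in range(count):
        yield json.dumps(mutate(seed_body, rng))
```

Seeding the random generator is a deliberate choice: a case that triggers a crash can be regenerated exactly and later turned into a regression test.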
Vulnerabilities and Hardening
- CVE-2012-2094: The log viewer of the Dashboard was vulnerable to an XSS attack.
- CVE-2012-2144: One of the first bugs we found was a Session Fixation vulnerability in the Dashboard
- CVE-2012-3360: (lpd#1015531) The Nova API was vulnerable to Path Traversal which allowed remote users to overwrite arbitrary files with chosen content.
- CVE-2012-3537: The crowbar ohai plugin insecurely handled temporary files, which led to local privilege escalation.
- CVE-2012-3540: Just a few days before the Gold Master we found an Open Redirect vulnerability using the next= parameter, which would allow a remote attacker to carry out a Phishing attack.
- CVE-2012-3551: A simple XSS attack was possible using the file= parameter.
- lpd#963098 and lpd#948317: Guessing passwords was very easy: there was no way to specify a mandatory password policy for newly created users in the Dashboard, authentication violations weren't logged by Keystone, and there was no per-client limit on password guesses. We tried to introduce cracklib to enforce a password policy, but it was just "a" policy, not the one the CSP would like to use. Therefore a configurable regex was introduced in local_settings.py. BTW, don't forget to enable it. Additionally, our Keystone server now logs authentication violations in /var/log/keystone. A rate limit to stop online password guessing will be introduced later.
- lpd#1006414 : Remote command execution via pickle. No CVE-ID assigned, severity unclear.
- With SUSE Cloud it is possible to enable SSL for the API, the Dashboard (Session Cookies use the "secure" and "httpOnly" flags in SSL mode), the VNC server, etc. All these interfaces to the Cloud are used via the Internet and are therefore easy to attack.
- OpenStack, crowbar, ceph come with a lot of configuration files, a lot of them contain passwords, keys and other sensitive information. These files are not world-readable anymore in SUSE Cloud.
- Default passwords/secrets/keys (for example to secure session cookies) are a big problem in current web frameworks like Rails or Django, especially when you use cloned images in a VM. (Django's SECRET_KEY was fixed; take care of other static secrets yourself.)
- Fuzzing the RESTful APIs of Nova/Compute and Keystone was done using the fuzz_xmlrpc.pl script and did not reveal any additional suspicious behavior (besides watching logs and error messages, behavior was monitored with an inotify and exec monitor). This testing was sufficient for the first round.
- ... and various other tiny issues...
- ... about 40% of the current Essex security issues were found by us.
- Agile development is something I really like: it doesn't try to tame the chaos, it is just open, pragmatic, and adaptive. The problem is that the code changes very often, even late in the release cycle. This can cause failures to reoccur, therefore automatic regression testing (on a CI server) is very important.
- To make automatic regression testing effective, it is very important for security engineers to create test cases (proof-of-concept exploits). This is time-consuming, but a test case is clear proof of the problem: it helps developers to understand the problem and to reproduce it automatically.
- Even if creative minds, which security engineers are IMO, don't like it, risk assessment based on design analysis is the key element to being successful.
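To make the point about proof-of-concept test cases concrete, here is a minimal Python sketch of how an Open Redirect like the one in the next= parameter (CVE-2012-3540) could be pinned down as an automated check. It is not the fix that was shipped: the helper `is_safe_redirect` and the host name `cloud.example` are made up for this example, and the rule it enforces (only local paths or http/https URLs on one allowed host) is just one reasonable policy.

```python
from urllib.parse import urlsplit


def is_safe_redirect(target, allowed_host):
    """Return True only if `target` keeps the user on `allowed_host`.

    Hypothetical helper for illustration; not taken from the Dashboard code.
    """
    # Reject empty values, surrounding whitespace, and control characters.
    if not target or target != target.strip():
        return False
    if any(ord(ch) < 0x20 for ch in target):
        return False
    # Some browsers treat backslashes like forward slashes.
    normalized = target.replace("\\", "/")
    try:
        parts = urlsplit(normalized)
    except ValueError:
        return False
    if parts.scheme or parts.netloc:
        # Absolute or scheme-relative URL: must be http(s) on the allowed host.
        return parts.scheme in ("http", "https") and parts.hostname == allowed_host
    # Relative URL: accept only paths starting with exactly one slash.
    return normalized.startswith("/") and not normalized.startswith("//")


# Proof-of-concept inputs kept as regression cases (all illustrative).
MALICIOUS_TARGETS = [
    "http://evil.example/login",
    "//evil.example/login",
    "///evil.example/login",
    "\\\\evil.example/login",
    "https://cloud.example@evil.example/",
    "javascript:alert(1)",
]
LEGITIMATE_TARGETS = [
    "/dashboard/instances",
    "https://cloud.example/dashboard/",
]
```

Kept in the test suite of a CI server, lists like these turn each reported exploit into a permanent check, so a late code change that reopens the hole fails the build instead of reaching the Gold Master.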
What will come next?
Automatic security tests are my major concern for the next product release. They would minimize regressions and give the security engineers time to concentrate on more sophisticated attacks as well as on design improvements. I will try to leave my microcosm and get more involved in upstream development, as well as in defining meaningful security guidelines for CSPs and for developers of Cloud Computing software stacks.
This project brings a lot of joy and the next major release will become even more secure ... so remember to have fun! :-)