Cloud Discovery is an open-source tool from Twistlock that connects to cloud providers and gets an inventory of all the various infrastructure resources deployed. Cloud Discovery gathers and reports resources' metadata in an aggregated way.
With Cloud Discovery, users won't have to poke around consoles from different cloud providers, manually navigate through all the service pages (like AWS EC2 or Azure Virtual Machines), then export the data and collate it again in a spreadsheet. Furthermore, security holes in applications are easier to identify when there's more visibility across environments, such as which resources are missing a firewall rule. For example, you could use Cloud Discovery to run security checks after cloud resources are created in a CI/CD pipeline, then alert or apply a patch automatically before the new changes go live. Regardless of whether the applications are deployed to AWS, Azure or Google Cloud Platform, Cloud Discovery only needs read-only permissions to collect the necessary information.
Besides discovering services provided by the cloud vendor, Cloud Discovery can also identify "self-installed" cloud-native components, such as Docker registries on EC2 instances or Kubernetes API servers managed by users. With the data collected, users can then identify weak security settings on those clusters, such as SSH access that is open to the public.
InfoQ recently talked to Liron Levin, chief architect at Twistlock, to learn more about the use cases and how to use Cloud Discovery.
InfoQ: Why use Cloud Discovery? Why not use the existing native metrics tools of the cloud vendors to get an inventory of the resources deployed?
Liron Levin: Today, cloud providers are launching new services at an astounding rate, and developers are excited to experiment with them. These two trends culminate in a perfect storm where organizations are not able to easily keep tabs on exactly what services are being deployed, on what cloud provider or region, and with which specific permissions. We wanted to make it fast and easy for any organization to know exactly what services have been deployed and where, giving them visibility into the "unknown unknowns" of cloud-native service sprawl.
An organization would use Cloud Discovery to get better visibility across clouds and then start securing these resources, whether they are PaaS or IaaS. The more unsecured services that are deployed, the more prone the organization is to attacks and threats. With Cloud Discovery, the organization can instantly get an inventory of all the cloud artifacts that have been deployed in its accounts across the public cloud providers and understand which ones are secured and which ones are not.
InfoQ: How do you get started with Cloud Discovery in a multi-cloud environment? Where and how do you deploy the tool?
Levin: Cloud Discovery has been built to run as a standalone service in a container. So for example, you'd start a container by running the following command:
docker run -d --name cloud-discovery --restart=always \
  -e BASIC_AUTH_USERNAME=admin -e BASIC_AUTH_PASSWORD=pass -e PORT=9083 -p 9083:9083 twistlock/cloud-discovery

Then, to scan and list all AWS assets you'd query Cloud Discovery with the following API call:
curl -k -v -u admin:pass --raw --data \
  '{"credentials": [{"id":"<AWS_ACCESS_KEY>","secret":"<AWS_ACCESS_PASSWORD>"}]}' \
  https://localhost:9083/discover

Or if you want the same information from GCP, you'd make the following call:
SERVICE_ACCOUNT=$(cat <service_account_secret> | base64 | tr -d '\n')
curl -k -v -u admin:pass --raw --data \
  '{"credentials": [{"secret":"'${SERVICE_ACCOUNT}'", "provider":"gcp"}]}' \
  'https://localhost:9083/discover?format=json'

We're currently in the process of adding instructions for Azure.
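For readers who would rather call the API from a script than from curl, the two requests above can be sketched in Python using only the standard library. This is a minimal sketch, not part of Cloud Discovery itself: the function names (`aws_payload`, `gcp_payload`, `discover`) are hypothetical, the request bodies simply mirror the curl examples, and passing `format=json` on every request is an assumption taken from the GCP call.

```python
import base64
import json
import ssl
import urllib.request


def aws_payload(access_key: str, secret_key: str) -> str:
    """Build the JSON body shown in the AWS curl example."""
    return json.dumps({"credentials": [{"id": access_key, "secret": secret_key}]})


def gcp_payload(service_account_json: bytes) -> str:
    """Build the JSON body shown in the GCP curl example.

    b64encode produces a single line, matching the effect of piping
    base64 output through tr to strip line breaks.
    """
    encoded = base64.b64encode(service_account_json).decode("ascii")
    return json.dumps({"credentials": [{"secret": encoded, "provider": "gcp"}]})


def discover(base_url: str, username: str, password: str, payload: str) -> str:
    """POST a payload to the /discover endpoint and return the raw response text.

    Certificate verification is disabled to mirror `curl -k`; that is only
    reasonable for a local container with a self-signed certificate.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    token = base64.b64encode(f"{username}:{password}".encode()).decode("ascii")
    request = urllib.request.Request(
        f"{base_url}/discover?format=json",  # assumption: format=json for all providers
        data=payload.encode("utf-8"),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request, context=ctx) as response:
        return response.read().decode("utf-8")
```

With the container from the `docker run` example listening locally, a call would look like `discover("https://localhost:9083", "admin", "pass", aws_payload(key, secret))`.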
InfoQ: How can someone integrate Cloud Discovery with other tools?
Levin: We built Cloud Discovery with a specific focus on interoperability and ease of integration. Following the Unix philosophy of doing one thing well, it's designed to extract metadata about every cloud-native service you're using across every provider, account, and region and return it in a standard open JSON format. There's a whole universe of tools that already work well with JSON and make it easy to monitor, alert, and track changes over time. We've seen people use Cloud Discovery to provide reports to auditors, to integrate with their SIEM and alert on newly deployed rogue services, and even to help identify potential wasted cloud spend.
So if an organization is already using a monitoring stack that can consume JSON, the integration APIs can be used to pull in the JSON response and surface the data in that tool of choice. In fact, an app can be written to use Cloud Discovery on a timed interval to receive updates and then update the other monitoring and alerting tools via their APIs.
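The timed-interval app described above could be sketched as follows. This is an illustration under stated assumptions rather than a documented integration: the record fields used to identify an asset (`provider`, `region`, `type`, `name`) are hypothetical, since the response schema is not shown here, and `fetch` and `notify` stand in for whatever code queries Cloud Discovery and calls the alerting tool's API.

```python
import time
from typing import Callable, Dict, List, Tuple

Asset = Dict[str, str]


def asset_key(asset: Asset) -> Tuple[str, str, str, str]:
    """Identify an asset by fields assumed (not confirmed) to be in the response."""
    return (
        asset.get("provider", ""),
        asset.get("region", ""),
        asset.get("type", ""),
        asset.get("name", ""),
    )


def new_assets(previous: List[Asset], current: List[Asset]) -> List[Asset]:
    """Return assets present in the current snapshot but not in the previous one."""
    seen = {asset_key(a) for a in previous}
    return [a for a in current if asset_key(a) not in seen]


def watch(
    fetch: Callable[[], List[Asset]],
    notify: Callable[[List[Asset]], None],
    interval_seconds: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll `fetch` on a fixed interval and pass newly seen assets to `notify`."""
    previous = fetch()
    for _ in range(rounds):
        sleep(interval_seconds)
        current = fetch()
        added = new_assets(previous, current)
        if added:
            notify(added)
        previous = current
```

Keeping `fetch`, `notify`, and `sleep` as parameters means the polling logic can be exercised without a running Cloud Discovery container or a real alerting backend.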
InfoQ: Could you give us an example of how to integrate Cloud Discovery? Could you configure alerts?
Levin: For example, someone could use Cloud Discovery to run a port scan command like the following against their environment on a daily basis:
curl -k -v -u admin:pass --raw --data '{"subnet":"172.17.0.1", "debug": true}' https://localhost:9083/nmap

The above call would generate an output like the one in the following image:
In this example, a person would be able to quickly determine that someone has provisioned a registry and a MongoDB instance that are both insecure because they lack authorization.
From there, they can send that JSON data to whatever app they're currently using for alerting, feeding it in on a predefined schedule, and then alert anyone who needs to know that there are insecure assets or assets that shouldn't be there.
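A small filter like the one below could sit between the port scan and the alerting tool. It is only a sketch: it assumes the scan returns a JSON array of objects with `host`, `port`, `app`, and `insecure` fields, which is a guess at the output format and not confirmed by the text, and both function names are hypothetical.

```python
import json
from typing import Any, Dict, List


def insecure_findings(scan_json: str) -> List[Dict[str, Any]]:
    """Keep only findings flagged insecure (field name is an assumption)."""
    findings = json.loads(scan_json)
    return [f for f in findings if f.get("insecure") is True]


def alert_lines(findings: List[Dict[str, Any]]) -> List[str]:
    """Format one human-readable alert line per finding."""
    return [
        f"INSECURE {f.get('app', 'unknown')} at {f.get('host', '?')}:{f.get('port', '?')}"
        for f in findings
    ]
```

The resulting lines can then be posted to whichever alerting tool is already in use.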
InfoQ: What's the roadmap for Cloud Discovery?
Levin: Cloud Discovery already supports the three main cloud providers (AWS, GCP, and Azure), but we're looking to add support for IBM Cloud and Oracle Cloud in the near future. We are also working on adding checks for popular apps that are often deployed insecurely, including RabbitMQ, Redis, Postgres, WordPress (brute-force support), Elasticsearch, Kibana, Vault (ensure HTTPS is configured), and NATS.
The third enhancement we're working on is brute-force detection of common passwords for these insecure apps. The checks would try pre-defined usernames and passwords, as well as a custom password list.
InfoQ: Cloud Discovery is the first stand-alone contribution that Twistlock has made. But you've been active in the open source community before this tool, correct?
Levin: Yes, our team built the authorization framework in Docker, which is used by OpenShift and Open Policy Agent, as well as the third-party pluggable secrets backend for Docker Swarm. We also contribute to the Kubernetes CIS benchmark. Our CTO John Morello co-authored NIST SP 800-190, the Container Security Guide, and our research arm, Twistlock Labs, has found 14 security vulnerabilities that were publicly disclosed, with CVE IDs that were allocated to NVD. So in addition to open-source contributions, our team has contributed significantly to improving container security for the broader cloud-native ecosystem.
Some specific examples of our open-source contributions include a pluggable secrets backend for Docker, a plugin auth in Docker, privileged containers compatibility with users, TLS auth to Docker, and a change to Docker registry auth default behaviour.
The code, written in Go, is available from the official GitHub repository, where you can also see which services Cloud Discovery supports on each cloud provider.