With the acquisition of startup Semmle, GitHub aims to make continuous vulnerability detection part of its continuous integration and continuous deployment service.
Announcing the acquisition, GitHub CEO Nat Friedman wrote:
"Semmle’s revolutionary semantic code analysis engine allows developers to write queries that identify code patterns in large codebases and search for vulnerabilities and their variants."
Semmle created LGTM, a continuous code analysis platform that aims to identify vulnerabilities in software systems. At the heart of LGTM lies Semmle QL, a query language and code search engine that enables code analysis to find and eradicate security vulnerabilities.
QL uses variant analysis, a technique security engineers typically use to identify vulnerabilities starting from a known vulnerability that is treated as a seed for the search process. In other words, once a vulnerability is identified through, e.g., pen-testing or other techniques, security researchers audit the rest of the code base to find similar problems. That is the process QL automates and extends across multiple code bases, allowing developers to write queries that can be shared and reused. According to Semmle, their solutions have been used to identify thousands of vulnerabilities, including over 100 CVEs in open source projects.
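To make the idea of variant analysis more concrete, here is a minimal sketch written in Python rather than QL, using only the standard `ast` module. It is not how Semmle implements anything; the sample source, the choice of `eval()` on non-constant input as the seed vulnerability, and the helper name `find_eval_variants` are all assumptions made for illustration.

```python
import ast

# Hypothetical code base to audit. The seed vulnerability is a call to
# eval() whose first argument is not a constant.
SOURCE = '''
def load(cfg):
    return eval(cfg)          # the known (seed) vulnerability

def compute(expr, env):
    return eval(expr, env)    # a variant: same pattern, extra argument

def banner():
    return eval("1 + 1")      # constant argument, not flagged

def describe():
    return "never call eval(user_input)"  # only text, not a call
'''


def find_eval_variants(source):
    """Return sorted (line, function name) pairs for every eval() call
    whose first argument is not a constant."""
    findings = []
    tree = ast.parse(source)
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        # Inspect every node inside this function definition.
        for node in ast.walk(func):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id == "eval"
                    and node.args
                    and not isinstance(node.args[0], ast.Constant)):
                findings.append((node.lineno, func.name))
    return sorted(findings)


if __name__ == "__main__":
    for line, name in find_eval_variants(SOURCE):
        print(f"line {line}: eval() on non-constant input in {name}()")
```

Run against the sample source, the sketch reports the calls in `load` and `compute` and skips the other two functions. A plain text search for `eval(` would also match the constant call in `banner` and the string in `describe`.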
One important feature of Semmle's analysis engine is that it treats code as data at the AST level, rather than as text that you operate on using regular expressions. This is an example of how you can analyse calls to a C function that takes an array as an argument and check that an array of sufficient length is passed at each call location:
import cpp

from Function f, FunctionCall c, int i, int a, int b
where f = c.getTarget()
  and a = c.getArgument(i).getType().(ArrayType).getArraySize()
  and b = f.getParameter(i).getType().(ArrayType).getArraySize()
  and a < b
select c.getArgument(i), "Array of size " + a
       + " passed to $@, which expects an array of size " + b + ".",
       f, f.getName()
In the snippet above, f is the function, c the function call, i the index of the argument being examined, and a and b hold the size of the array actually passed and the size the function expects, respectively. The following snippet helps ensure all public fields of a class are declared final by selecting those that are not:
import java

from Field f
where f.hasModifier("public")
  and not f.hasModifier("final")
select f.getDeclaringType().getPackage(),
       f.getDeclaringType(),
       f
As you can see, QL syntax is declarative, somewhat resembling SQL, and object-oriented. Semmle currently supports C and C++, C#, COBOL, Java, JavaScript and TypeScript, and Python. Go support is in the works.
Semmle QL is not a new product, and has been used by many large companies, says Semmle, including Uber, NASA, Microsoft, and Google. This has contributed to building a large library of QL queries. According to Semmle, developers can reuse thousands of open-source queries and execute them as part of their automatic CI pipelines whenever they send a new pull request.
Presently, Semmle QL can be used through LGTM, which is able to connect to your GitHub account, but GitHub plans to make it an integral part of its CI/CD service through GitHub Actions.