Microsoft has open sourced Checked C, a research project meant to add bounds checking to C and C++.
One of the major vulnerabilities in C and C++ is the possibility of out-of-bounds memory accesses. Buffer overflows were responsible for 10-16% of all security vulnerabilities in the US between 2010 and 2015, according to NIST's National Vulnerability Database. And since most system software (operating systems, databases, compilers, interpreters, browsers, etc.) is written in one of those two languages, many systems are susceptible to memory corruption, leaving them open to malfunction or to attacks intended to steal private information or take control of a system. A group of researchers from Microsoft and the University of Maryland is attempting to address these issues by proposing Checked C, an extension to C and a subset of C++ meant to add bounds checking to these languages.
Modern languages such as Java or C# come with automatic bounds checking, but adding such verification to C is difficult, explains David Tarditi, one of the Microsoft researchers involved in the project:
There are two obstacles to adding bounds checking to C. First, it is not clear where to put the bounds information at runtime. Second, it is not clear how to make the bounds checking efficient for programs where performance matters. The solution of changing the representation of all C pointer types and arrays to carry bounds information is not sufficient in practice. C may be used at the base of systems where hardware or standards dictate data layout and data layout cannot be changed. C programs must also interoperate with existing operating systems and software that require specific data layouts.
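The layout concern raised in the quote can be illustrated with a short sketch in standard C. Everything named here (`device_record`, `fat_byte_ptr`, `device_record_fat`, `layout_growth`) is hypothetical and chosen only for illustration; it is not taken from the Checked C specification. The sketch assumes a record whose layout is dictated by hardware or an external standard, and shows that replacing its raw pointer with a pointer that carries its own bounds makes the record larger and moves its fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record whose layout is fixed by an outside party:
   one machine pointer followed by a length. */
struct device_record {
    unsigned char *payload;
    size_t length;
};

/* Hypothetical "fat" pointer that carries its bounds with it. */
struct fat_byte_ptr {
    unsigned char *current; /* the address being pointed at */
    unsigned char *lower;   /* first valid byte */
    unsigned char *upper;   /* one past the last valid byte */
};

/* The same record if its pointer were silently turned into a fat pointer. */
struct device_record_fat {
    struct fat_byte_ptr payload;
    size_t length;
};

/* Three pointers cannot fit in the space of one. */
static_assert(sizeof(struct fat_byte_ptr) > sizeof(unsigned char *),
              "a fat pointer is larger than a raw pointer");

/* Number of extra bytes the fat representation adds to the record. */
size_t layout_growth(void) {
    return sizeof(struct device_record_fat) - sizeof(struct device_record);
}
```

Code compiled against the original `device_record` layout would read `length` at the wrong offset if handed a `device_record_fat`, which is why changing the representation of every pointer is not an option for code that must match an existing layout.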
Checked C enables programmers to write code in C/C++ that is “guaranteed to be bounds checked.” To achieve that, Checked C adds new pointer and array types that are bounds checked either at compile time or at run time:
ptr<T>: a pointer to type T which does not require bounds checking. Such pointers are not allowed to be involved in pointer arithmetic. This pointer cannot be null when accessing the memory. Most pointers are expected to be of this type.
array_ptr<T>: a pointer to an element of an array containing values of type T. This pointer can be included in arithmetic operations. It cannot be null when reading/writing memory. The responsibility for bounds checking is left to the programmer.
span<T>: a pointer that includes bounds information with it. It supports arithmetic operations. It cannot be null when reading/writing memory.
T array_var checked[100]: declares an array of type T and size 100 which is bounds checked. The checked array is converted to an array_ptr when C conversion applies.
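To make the idea of a pointer that "includes bounds information with it" more concrete, here is a minimal sketch in standard C. It is not Checked C syntax and does not reproduce the project's implementation: `int_span`, `int_span_make`, `int_span_get` and `int_span_set` are hypothetical names, and the checks are written by hand, whereas the types above are meant to be checked by the compiler or at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the idea behind span<T>: a pointer bundled
   with the bounds of the array it points into. */
typedef struct {
    int *base;    /* first element of the underlying array */
    size_t count; /* number of valid elements */
} int_span;

/* Builds a span over count elements starting at base. */
int_span int_span_make(int *base, size_t count) {
    assert(base != NULL || count == 0); /* a non-empty span needs storage */
    int_span s = { base, count };
    return s;
}

/* Bounds-checked read: stores the element in *out and returns true, or
   returns false without touching memory when the index is out of range. */
bool int_span_get(int_span s, size_t index, int *out) {
    if (s.base == NULL || index >= s.count) {
        return false;
    }
    *out = s.base[index];
    return true;
}

/* Bounds-checked write: returns false instead of writing out of range. */
bool int_span_set(int_span s, size_t index, int value) {
    if (s.base == NULL || index >= s.count) {
        return false;
    }
    s.base[index] = value;
    return true;
}
```

With a four-element array wrapped in an `int_span`, reading or writing index 4 is rejected by the check instead of touching memory past the end of the array. The cost is the extra `count` field carried with every pointer, which is the layout trade-off Tarditi describes.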
The specification establishes the behavior of various operations involving pointer types for indirection, array reference, assignment, addition, comparison, address-of (&), conversion from checked array type to pointer type, etc.
Existing C programs continue to work “as is”, with the observation that the C * pointer type remains unchecked and pointer arithmetic on it is still allowed to avoid breaking existing code, but compilers will have to include a flag that reports a warning or an error when * is not used accordingly.
Checked C is open sourced on GitHub, including the specification, a clang implementation and an LLVM implementation. Developers interested in this project are invited to contribute by improving the specification, proposing new functionality such as type casting or memory management, adding tests, or extending other compilers to support Checked C.
There have been other attempts to add bounds checking to C in the past, including static analysis, enhanced compilers or runtimes that avoid modifying the language, program verification, and new C-based languages. Chapter 9 of the specification, “Related work,” covers these other approaches in detail and explains why the authors chose to extend the language.