To take advantage of the rich tooling available on the Windows platform, compiler projects such as LLVM need to be able to generate PDB files. A PDB, or Program Database, is a literal database describing compiled code on the Windows platform. It contains various types of records that allow tools such as debuggers to map between compiled code and source code.
To improve performance, this data is heavily indexed, and that is part of the problem. Zach Turner of LLVM’s Windows team writes:
CodeView is a debug information format invented by Microsoft in the mid 1980s. For various reasons, other debuggers developed an independent format called DWARF, which eventually became standardized and is now widely supported by many compilers and programming languages. CodeView, like DWARF, defines a set of records that describe mappings between source lines and code addresses, as well as types and symbols that your program uses. The debugger then uses this information to let you set breakpoints by function name, display the value of a variable, etc. But CodeView is only somewhat documented, with the most recent official documentation being at least 20 years old. While some records still have the format documented above, others have evolved, and entirely new records have been introduced that are not documented anywhere.
[…]
[PDB] contains CodeView but it also contains many other things that allow indexing of the CodeView records in various ways. This allows for fast lookups of types and symbols by name or address, the philosophical equivalent of “tables” for individual input files, and various other things that are mostly invisible to you as a user but largely responsible for making the debugging experience on Windows so great. But there’s a problem: While CodeView is at least kind-of documented, PDB is completely undocumented. And it’s highly non-trivial.
Microsoft provides tooling and SDKs for consuming PDB files, but nothing for generating them. Even those require proprietary libraries, because the PDB source code that Microsoft has released is only partial and doesn’t compile.
Based on that partial code upload from Microsoft, the LLVM team was able to build their own PDB generator. While still considered “alpha quality”, it allows applications compiled using Clang and the LLVM backend to start working with Windows tooling. Turner continues:
We’d love for you to try it out and report issues on our bug tracker. To get you started, download the latest snapshot of clang for Windows.
As part of their exploration into supporting PDB, LLVM has documented the PDB format. While not complete, the documentation offers an important look into a complicated format that was previously undocumented.
To supplement this, they have also built a tool called llvm-pdbutil. Among other things, it allows for two-way conversion between YAML and PDB. (For those of you who don’t know, YAML is a human-readable data format that uses indentation instead of brackets to express structure. It is probably best known as the format used for the API documentation language RAML.)
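As a rough illustration of that round trip, the following Python sketch wraps the two llvm-pdbutil subcommands involved, pdb2yaml and yaml2pdb. The helper names and file paths are hypothetical, and the sketch assumes llvm-pdbutil is on the PATH; the exact options accepted may differ between LLVM releases, so check `llvm-pdbutil --help` before relying on it.

```python
import shutil
import subprocess


def pdb_to_yaml_cmd(pdb_path, tool="llvm-pdbutil"):
    """Build the command that dumps a PDB as YAML on standard output."""
    # "-all" asks pdb2yaml to dump everything it knows how to dump.
    return [tool, "pdb2yaml", "-all", pdb_path]


def yaml_to_pdb_cmd(yaml_path, pdb_path, tool="llvm-pdbutil"):
    """Build the command that regenerates a PDB from a YAML description."""
    return [tool, "yaml2pdb", f"-pdb={pdb_path}", yaml_path]


def round_trip(pdb_path, yaml_path, new_pdb_path):
    """Run PDB -> YAML -> PDB; return False if llvm-pdbutil is not installed."""
    if shutil.which("llvm-pdbutil") is None:
        return False
    # pdb2yaml writes to standard output, so redirect it into the YAML file.
    with open(yaml_path, "w") as out:
        subprocess.run(pdb_to_yaml_cmd(pdb_path), stdout=out, check=True)
    subprocess.run(yaml_to_pdb_cmd(yaml_path, new_pdb_path), check=True)
    return True
```

Calling `round_trip("app.pdb", "app.yaml", "app.copy.pdb")` would leave an editable YAML file next to the regenerated PDB, which is handy for inspecting or hand-tweaking individual streams.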
There are actually two PDB formats. In addition to the full version discussed above, there is a Portable PDB format intended just for .NET Core applications. Portable PDB is documented, and an open source library is available for reading it.
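The two formats can usually be told apart from their first few bytes. The sketch below is a minimal example, not a full parser: it assumes the classic PDB begins with the MSF 7.00 magic string described in LLVM's format documentation, followed by a little-endian 32-bit block size, and that a Portable PDB begins with the "BSJB" signature used by .NET metadata. The function names are hypothetical.

```python
import struct

# 32-byte magic at the start of an MSF 7.00 container (the classic PDB).
MSF_MAGIC = b"Microsoft C/C++ MSF 7.00\r\n\x1aDS\x00\x00\x00"
# Signature at the start of .NET metadata, which Portable PDB reuses.
PORTABLE_MAGIC = b"BSJB"


def classify_pdb(data):
    """Guess the PDB flavor from the leading bytes of the file."""
    if data.startswith(MSF_MAGIC):
        return "windows-pdb"
    if data.startswith(PORTABLE_MAGIC):
        return "portable-pdb"
    return "unknown"


def msf_block_size(data):
    """Return the block size stored right after the MSF magic, or None."""
    if classify_pdb(data) != "windows-pdb" or len(data) < len(MSF_MAGIC) + 4:
        return None
    (block_size,) = struct.unpack_from("<I", data, len(MSF_MAGIC))
    return block_size
```

In practice only the first 36 bytes are needed, for example `classify_pdb(open(path, "rb").read(36))`, so the check stays cheap even for very large PDB files.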