GitHub has announced improvements to its code search and code navigation capabilities. The new code search, still available only as an experimental preview, now supports searching for code symbols and using regular expressions. Code navigation is now available from within pull requests and has been extended to provide more precise results for Python repositories.
For code search, our vision is to help every developer search, discover, navigate, and understand code quickly and intuitively. [...] And once you get to a result page, the rich browsing experience is optimized for reading and understanding code, allowing you to make sense of unfamiliar logic quickly, even for code outside your IDE.
The new code search features are available as a technology preview at https://cs.github.com. Not all GitHub repositories are currently included in the new search index, which nevertheless spans over 5 million public repositories. These include highly popular repositories and all public repositories of users in the technology preview. Users in the technology preview can additionally search their own private repositories.
GitHub's new search index is built by parsing code with Tree-sitter to extract symbols in a syntax-aware way. Supported languages include C#, Python, Go, Java, JavaScript, TypeScript, PHP, Protocol Buffers, Ruby, and Rust, with others to come. At the moment, though, only the default branch of included repositories can be searched, excluding binary and generated files as well as files over 350 KiB in size.
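The indexing restrictions above can be summarized as a simple eligibility check. The following is only a sketch: the RepoFile type, its fields, and the is_searchable function are hypothetical names introduced for illustration, and GitHub has not described how it actually detects binary or generated files.

```python
from dataclasses import dataclass

# 350 KiB size limit, as stated in the article.
MAX_SIZE_BYTES = 350 * 1024


@dataclass
class RepoFile:
    """Hypothetical description of a file; not a GitHub data structure."""
    path: str
    size_bytes: int
    is_binary: bool      # assumed to be determined elsewhere
    is_generated: bool   # assumed to be determined elsewhere
    branch: str


def is_searchable(f: RepoFile, default_branch: str = "main") -> bool:
    """Return True if the file meets the restrictions listed in the article."""
    return (
        f.branch == default_branch          # only the default branch is indexed
        and not f.is_binary                 # binary files are excluded
        and not f.is_generated              # generated files are excluded
        and f.size_bytes <= MAX_SIZE_BYTES  # files over 350 KiB are excluded
    )


if __name__ == "__main__":
    small = RepoFile("src/main.go", 12_000, False, False, "main")
    large = RepoFile("data/dump.json", 400 * 1024, False, False, "main")
    print(is_searchable(small), is_searchable(large))
```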
To support both symbol search and regular expressions, GitHub introduced a new query syntax with specific operators. For example, you can search for a given symbol using:
language:go symbol:Maint::deleteRows
Symbol search also supports regular expressions, as in the following example, which finds all conversions implemented for the String type:
language:rust symbol:/^String::to_.*/
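To see what the pattern in that query selects, here is a small sketch that applies the same regular expression to a handful of symbol names using Python's re module. The symbol list is made up for illustration, and GitHub's regex engine may differ from Python's in its details.

```python
import re

# Hypothetical symbol names, as a search index might store them.
symbols = [
    "String::to_uppercase",
    "String::to_lowercase",
    "String::from_utf8",
    "OsString::to_str",
]

# Same pattern as in the query above; ^ anchors the match at the
# start of the symbol name, so "OsString::to_str" is not matched.
pattern = re.compile(r"^String::to_.*")

matches = [s for s in symbols if pattern.search(s)]
print(matches)  # ['String::to_uppercase', 'String::to_lowercase']
```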
Besides language and symbol, the new search supports a number of other qualifiers, such as path, extension, repo, org, etc. Furthermore, qualifiers may be combined using the OR and AND operators.
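As a rough illustration, queries of this kind can be assembled programmatically as plain strings. The helper names below (qualifier, any_of, all_of) are hypothetical, not part of any GitHub API, and the sketch deliberately avoids mixing OR and AND in one query because the announcement does not describe precedence or grouping rules.

```python
def qualifier(name: str, value: str) -> str:
    """Format a single qualifier in the name:value form shown above."""
    return f"{name}:{value}"


def any_of(*terms: str) -> str:
    """Join terms with the OR operator."""
    return " OR ".join(terms)


def all_of(*terms: str) -> str:
    """Join terms with the AND operator."""
    return " AND ".join(terms)


if __name__ == "__main__":
    # Symbols named deleteRows in either Go or Rust code would need mixed
    # operators, so each example here sticks to a single operator.
    print(any_of(qualifier("language", "go"), qualifier("language", "rust")))
    print(all_of(qualifier("language", "go"), qualifier("symbol", "deleteRows")))
```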
GitHub stated that it will grow the index to include every public repository and expand the number of supported languages. It also plans to enhance search with scoring and ranking heuristics to return more relevant results.
Code navigation, a feature previously available only when viewing files through GitHub's UI, has been extended to work in pull requests as well. This makes it easier, for example, to see which functions could be affected by a given change.
While code navigation is still based on a purely textual search to find out which definitions are related to a given reference, this has started evolving. In particular, for Python repositories GitHub has started to use stack graphs to improve the way code navigation identifies which specific definition each reference refers to. This will foreseeably reduce the level of noise you may get when attempting to navigate to very common symbols.
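The difference between the two approaches can be illustrated with a toy example. This is not the stack graphs algorithm, whose details are not covered here. It only contrasts a name-only lookup, which returns every definition sharing a name, with a lookup that also takes the file's imports into account. All names and data below are hypothetical.

```python
# Hypothetical index of definitions: (file, enclosing scope, name).
definitions = [
    ("a.py", "a", "run"),
    ("b.py", "b", "run"),
    ("c.py", "c.Worker", "run"),
]


def textual_lookup(name):
    """Name-only matching: every definition with this name is a candidate."""
    return [d for d in definitions if d[2] == name]


def scoped_lookup(name, imports):
    """Toy scope-aware matching: keep only the definition from the scope
    the referencing file imported the name from.

    `imports` maps a local name to the module it was imported from.
    """
    scope = imports.get(name)
    return [d for d in definitions if d[2] == name and d[1] == scope]


if __name__ == "__main__":
    # A reference to `run` in a file containing `from b import run`.
    print(len(textual_lookup("run")))           # 3 candidates, mostly noise
    print(scoped_lookup("run", {"run": "b"}))   # [('b.py', 'b', 'run')]
```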
In the coming months, GitHub will add support for stack graphs to additional languages as well.