Stack Overflow recently announced that its dataset is now available through Google’s BigQuery. Using regular SQL statements, developers can query the full set of Stack Overflow data, including posts, votes, tags, and badges. Through BigQuery’s REST API, developers can export data on demand with their tool of choice. The datasets available in BigQuery can be JOINed using plain SQL, allowing developers to derive insights across domains.
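As a rough illustration of what such a query might look like, the sketch below builds a standard SQL statement that counts questions per year for a single tag and, optionally, runs it with Google's `google-cloud-bigquery` Python client. The table name `bigquery-public-data.stackoverflow.posts_questions`, the `creation_date` and `tags` columns, and the pipe-separated tag format are assumptions about the public dataset's schema and should be checked in the BigQuery console. The helper names `build_tag_query` and `run_tag_query` are hypothetical.

```python
# Assumed location of the public Stack Overflow questions table; verify in the console.
QUESTIONS_TABLE = "bigquery-public-data.stackoverflow.posts_questions"


def build_tag_query(limit: int = 10) -> str:
    """Return standard SQL that counts questions per year for the tag bound to @tag."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT EXTRACT(YEAR FROM creation_date) AS year, COUNT(*) AS questions\n"
        f"FROM `{QUESTIONS_TABLE}`\n"
        # Assumes tags are stored as one pipe-separated string, e.g. 'python|pandas'.
        "WHERE @tag IN UNNEST(SPLIT(tags, '|'))\n"
        "GROUP BY year\n"
        "ORDER BY year DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_tag_query(tag: str, limit: int = 10):
    """Run the query; needs google-cloud-bigquery and configured credentials."""
    # Imported lazily so the query builder works without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes a default project and credentials are set up
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("tag", "STRING", tag)]
    )
    rows = client.query(build_tag_query(limit), job_config=job_config).result()
    return [(row["year"], row["questions"]) for row in rows]


if __name__ == "__main__":
    # Printing the SQL needs no network access or credentials.
    print(build_tag_query(5))
```

Passing the tag as a query parameter rather than interpolating it into the SQL string avoids quoting problems. The same statement can also be pasted into the BigQuery console with `@tag` replaced by a string literal.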
Along with the Stack Overflow dataset, BigQuery provides access to Hacker News and GitHub datasets, which can be combined with the Stack Overflow data for further analysis. Also added recently are three datasets of New York City data, covering motor vehicle collisions, Citi Bike trips, and non-emergency municipal service requests made to 311. These complement an existing BigQuery dataset containing every taxi and limousine trip in New York City from 2009 to 2015.
Other BigQuery datasets currently available include weather information, with some records dating back as far as 1763, Medicare data, 3.5 million digitized books, an image dataset with metadata and labels for 9 million URLs, and IRS and Major League Baseball data. A dataset of worldwide news and events, updated every 15 minutes, is also available through the GDELT project, along with genomics datasets through the Personal Genome Project, Wikipedia page view data, and almost 2 billion Reddit comments.
The Stack Overflow dataset is available through the BigQuery console, and further discussion can be found in the Reddit community.