In a recent blog article, the Twitter engineering team shared architectural details of its internal Qurious data insights platform and its advantages for real-time analysis. Designed for business customers, the platform lets users analyze Twitter's BigQuery data with natural language queries and build dashboards. The goal of the project is to increase agility in deriving actionable insights from streaming data. The team emphasizes that Qurious is a step forward in cutting the cost and time needed to generate such reports.
Deployed in containers on GKE, the system is built around GCP's Data QnA service, which translates natural language phrases into SQL commands that can be run in BigQuery (Python client GitHub link). It also includes GCP Cloud SQL for storing peripheral data, Cloud Load Balancer, and the GCP Cloud Logging service. The system is augmented with a cache for frequently asked questions and a suggestions module.
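To make the role of the cache for frequently asked questions more concrete, here is a minimal sketch of how such a layer could sit in front of a natural-language-to-SQL translator. It is not Twitter's implementation. The class `QuestionCache`, the `normalize` helper, the LRU eviction policy, and the `translate` callable (a stand-in for a call to a service such as Data QnA) are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Callable


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivially different phrasings share a key."""
    return " ".join(question.lower().split())


class QuestionCache:
    """Hypothetical LRU cache mapping normalized questions to generated SQL."""

    def __init__(self, translate: Callable[[str], str], capacity: int = 128) -> None:
        # `translate` stands in for the natural-language-to-SQL service (an assumption).
        self._translate = translate
        self._capacity = capacity
        self._entries = OrderedDict()  # normalized question -> SQL string
        self.hits = 0
        self.misses = 0

    def sql_for(self, question: str) -> str:
        key = normalize(question)
        if key in self._entries:
            # Cache hit: mark the entry as most recently used and skip translation.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        # Cache miss: call the translator and remember its answer.
        self.misses += 1
        sql = self._translate(question)
        self._entries[key] = sql
        if len(self._entries) > self._capacity:
            # Evict the least recently used question.
            self._entries.popitem(last=False)
        return sql


if __name__ == "__main__":
    def fake_translate(question: str) -> str:
        # Placeholder translator; a real system would call the translation service here.
        return "SELECT COUNT(*) FROM example_dataset.example_table"

    cache = QuestionCache(fake_translate, capacity=2)
    cache.sql_for("How many rows are there?")
    cache.sql_for("how many rows   are there?")
    print(cache.hits, cache.misses)
```

In this sketch, repeated questions skip the translation step entirely, which is the kind of saving a cache for frequently asked questions is meant to provide. How the real system keys, expires, or invalidates its entries is not described in the article.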
BigQuery is a popular data warehouse for OLAP applications provided by Google Cloud Platform. It supports SQL (ANSI SQL:2011) and, being serverless, separates storage from compute. Its Data QnA service, which is currently in private alpha, is based on the Analyza paper. It aims to lower the barrier to analytical processing by converting natural language instructions into SQL snippets that can be run against BigQuery data. The system has also been tried out in Google's spreadsheet products to generate formulas automatically.
Implementing a natural language interface to databases is a long-standing problem in database engineering (link to a related 1995 review, a theoretical framework paper). Designing such a system means balancing predictability and reliability against intelligence. The Analyza architecture is not based on machine learning. Instead, the system is improved continuously through a curator and a knowledge base/graph. This keeps the system predictable while making it intelligent enough for production use. As usage of the product generates data, machine learning models may also be integrated into the system in the future.
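The tradeoff described above can be illustrated with a toy rule-based translator. This is only a sketch of the general idea of a curated lexicon driving deterministic translation. It does not reproduce Analyza or Data QnA. The `KnowledgeBase` structure, the `to_sql` function, the phrase-matching rules, and the table and column names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical curated lexicon: phrases mapped to SQL fragments."""
    table: str
    metrics: dict = field(default_factory=dict)     # phrase -> SQL aggregate
    dimensions: dict = field(default_factory=dict)  # phrase -> column name


class UnknownQuestion(ValueError):
    """Raised when no curated metric matches, instead of guessing."""


def to_sql(question: str, kb: KnowledgeBase) -> str:
    # Normalize: lowercase, drop question marks, collapse whitespace.
    text = " ".join(question.lower().replace("?", " ").split())
    # Pick the first curated metric phrase that appears in the question.
    metric = next((sql for phrase, sql in kb.metrics.items() if phrase in text), None)
    if metric is None:
        raise UnknownQuestion(f"no known metric in: {question!r}")
    # A dimension is only used when it follows the word "by".
    dimension = next(
        (col for phrase, col in kb.dimensions.items() if f"by {phrase}" in text),
        None,
    )
    if dimension is None:
        return f"SELECT {metric} AS value FROM {kb.table}"
    return (
        f"SELECT {dimension}, {metric} AS value "
        f"FROM {kb.table} GROUP BY {dimension}"
    )


if __name__ == "__main__":
    kb = KnowledgeBase(
        table="analytics.events",  # hypothetical table name
        metrics={"total revenue": "SUM(revenue)"},
        dimensions={"country": "country_code"},
    )
    print(to_sql("What is the total revenue by country?", kb))
    # A curator extends coverage by adding entries, with no model retraining.
    kb.metrics["number of users"] = "COUNT(DISTINCT user_id)"
    print(to_sql("Number of users?", kb))
```

The same question always yields the same SQL, and unknown phrasing fails loudly rather than producing a guess. That is the predictability side of the tradeoff. Coverage grows only as a curator adds entries, which is the cost paid on the intelligence side.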
For more information about BigQuery Data QnA, see the following case study presented at Google Cloud Next.