The data and AI company Databricks recently unveiled Lakehouse AI, a suite of tools for building and governing generative AI models, including large language models (LLMs), within the Databricks platform. Among the tools was LakehouseIQ, a "knowledge engine" that uses AI to understand a company's unique data, culture, and language in order to improve natural language interfaces like chatbots.
Databricks also updated its Delta Lake standard for the analytic data tables in data-lake systems. Delta Lake competes with other open-source table formats such as Iceberg and Hudi, both Apache Software Foundation projects. "There's all this talk about format wars, and it's actually really unfortunate," Databricks CEO and co-founder Ali Ghodsi said. The key innovation in Delta Lake 3.0 is its new Universal Format, or "UniForm", capability, which enables data stored in Delta Lake tables to be read as if it were in Apache Iceberg or Apache Hudi format.
On the governance front, Databricks released new federation capabilities in its Unity Catalog service that break down silos between data systems on the platform. These features let customers locate, query, and govern data across all connected systems from a single interface, without moving or duplicating the data. The federation can map and query data from sources such as MySQL, Redshift, Snowflake, and BigQuery, in addition to Databricks itself, so users can apply access control and auditing to all of that data from a single point.
Databricks also announced it would acquire MosaicML. The acquisition aims to take the cost of hosting models down to hundreds of thousands of dollars per model and to let businesses train customized language models at a lower cost by combining Databricks' data infrastructure with MosaicML's model-training platform. "Not everybody, every application, requires a GPT-4," said Sreekar Krishna, KPMG's U.S. artificial-intelligence leader, referring to OpenAI's large language model.
Some industry watchers question whether Databricks' reliance on its proprietary platform could deter new customers concerned about vendor lock-in. Ori Rafael, the CEO of data pipeline company Upsolver, sees in the new cloud data warehouse offerings echoes of the lock-in that customers experienced with Oracle and Teradata data warehouses. "I think the Lakehouse [has] too much branding around it and not enough essence," Rafael said. Ghodsi, for his part, said, "Databricks and MosaicML have an incredible opportunity to democratize AI and make the Lakehouse the best place to build generative AI and LLMs."
For more information on the Databricks Data and AI Summit 2023, see the conference's main website and session catalog.