Databricks, headquartered in San Francisco, is an American data and AI company known for their lakehouse software that stores and analyzes data. The company is relied upon by more than 10,000 organizations worldwide, including over 50% of the Fortune 500, and was recently valued at 43 billion USD. Despite the company's focus on data warehouses and lakes, it is expanding its offerings to include generative AI on its platform. Determined to become the preferred data platform for developing generative AI solutions, Databricks has strategically pursued this objective.
On June 26, 2023, the company made a significant announcement, revealing the acquisition of the generative AI startup MosaicML for approximately $1.3 billion. With this acquisition, the company secured the biggest acquisition of the generative AI era to date[1]. Following the acquisition, the entire MosaicML team, consisting of 62 employees, including MosaicML’s industry-leading research team, will join Databricks. This translates to Databricks paying $21 million per employee.
In addition to Databricks acquiring the entire staff from MosaicML, the company gains access to MosaicML's platform, enabling customers to easily train and deploy large language models on their proprietary data. Following this development, Databricks has introduced the Data Intelligence Platform, seamlessly incorporating MosaicML's generative AI expertise into its lakehouse[1]. With this strategic move, Databricks is targeting companies that do not have the resources to build their own large language model from scratch and companies that prefer not to partner with big tech due to e.g. security risks. Databricks CEO, Ali Ghodsi, states: “What we are seeing most interest in is people who have very sensitive data who want to build their own AI (…) we’re helping them do that”[2]. When integrating MosaicML into Databricks, the company argues that businesses can quickly and cost-effectively build and train their own large language models. Databricks claims that this could potentially reduce costs from millions of dollars to mere thousands[2].
The intention is to shift the paradigm from relying on major tech players like OpenAI and Google to putting organizations in control of their AI initiatives and proprietary data. Databricks advocates for the success of open-source generative AI. They assert that many customers are adopting open-source LLMs for various generative AI applications. According to Databricks, as the quality of open-source software models improves rapidly, customers are increasingly experimenting with these models to assess factors such as quality, cost, reliability, and security[3]. The key distinction between collaborating with, for example, OpenAI, and using Databricks’ platform is that customers can retain complete ownership of their generative AI models. Moreover, since these models are constructed using open-source technologies, there is no risk of vendor lock-in, as the responsibility for the models is held by an open-source community rather than a single company[3].
[1]Forbes: ”Databricks’ New AI Product Adds A ChatGPT-Like Interface to Its Software”, 2023
[2]CIOTrends: ”Databricks to Acquire Generative AI Platform MosaicML”, 2023
[3]Databricks: ”Build on Open Source”, n.d.