xAI, the AI company founded by Elon Musk, recently announced Grok, a large language model. Grok can access current knowledge of the world via the X platform and outperforms other LLMs of comparable size, including GPT-3.5, on several benchmarks.
xAI launched earlier this year and trained its first model, the 33B-parameter Grok-0. The company has not disclosed the parameter count or training details of its latest version, Grok-1, but says that the model outperforms GPT-3.5 and Llama 2 on several benchmarks, including the mathematics benchmarks GSM8k and MATH, the question-answering benchmark MMLU, and the coding benchmark HumanEval. The model is touted as having "a bit of wit" and "a rebellious streak," and xAI claims it will answer questions that other LLMs will not. According to the xAI team:
By creating and improving Grok, we aim to gather feedback and ensure we are building AI tools that maximally benefit all of humanity. We believe that it is important to design AI tools that are useful to people of all backgrounds and political views. We also want to empower our users with our AI tools, subject to the law. Our goal with Grok is to explore and demonstrate this approach in public. We want Grok to serve as a powerful research assistant for anyone, helping them to quickly access relevant information, process data, and come up with new ideas. Our ultimate goal is for our AI tools to assist in the pursuit of understanding.
While the word "grok" was coined by Robert Heinlein in his sci-fi novel Stranger in a Strange Land, xAI says that their model is inspired by the Hitchhiker's Guide to the Galaxy, the eponymous fictional guidebook of Douglas Adams's sci-fi series. According to xAI, Grok is "intended to answer almost anything...."
Although technical details about Grok are scarce, xAI said that they built a custom ML framework for training and inference using JAX, Rust, and Kubernetes, and that the model was trained for two months. xAI founding member Toby Pohlen posted a thread on X with videos demonstrating the Grok UI. Further, the X account for the Qdrant open-source vector database posted that Grok's real-time knowledge capabilities are built on Qdrant, and encouraged users to "stay tuned" for more details in a future blog post and tech talk with the X engineering team.
Reaction to the announcement was mixed. On Reddit, one user praised the effort, saying:
Beating Meta with just two months of training is really impressive. We know they have at least 10,000 H100s, which is more compute than was used for GPT-4. It seems like they are going to continue with rolling releases, so it will probably improve quickly. Also, it's nice that the model seems much less censored, as this will push other companies to do the same.
Hacker News users were more skeptical. One user speculated that Grok's benchmark scores could be due to training on the test set:
Many of the modern LLMs take an entire copy of the internet which includes the test set for many of these benchmarks. So if someone claims to beat ChatGPT and their model is trained on the test set, of course they’ll do better. Even ChatGPT is likely trained on the test set.
xAI said that they could not rule out that possibility. However, the team also hand-graded the model's attempt at the Hungarian national high school final exam in mathematics, which was published after their dataset was collected. On this exam, Grok outperformed both GPT-3.5 and Claude 2.
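For readers unfamiliar with the concern, test-set contamination is often estimated by checking whether word n-grams from benchmark items also appear in the training corpus. The following is a minimal sketch of that general idea in Python. It is not xAI's method, which has not been published; the function names, the whitespace tokenization, and the default window of 8 words are all assumptions chosen for illustration.

```python
from typing import Iterable, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text (whitespace tokenized)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(
    benchmark_items: Iterable[str],
    corpus_docs: Iterable[str],
    n: int = 8,  # hypothetical window size, not taken from any published setup
) -> float:
    """Fraction of benchmark items sharing at least one n-gram with the corpus."""
    corpus_ngrams: Set[Tuple[str, ...]] = set()
    for doc in corpus_docs:
        corpus_ngrams |= ngrams(doc, n)

    items = list(benchmark_items)
    if not items:
        return 0.0

    # An item is flagged if any of its n-grams also occurs in the corpus.
    flagged = sum(1 for item in items if ngrams(item, n) & corpus_ngrams)
    return flagged / len(items)
```

A check like this only catches verbatim overlap; paraphrased or translated copies of test questions would pass unnoticed, which is one reason evaluating on material published after the data cutoff is a useful complement.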
Other users questioned whether Grok's touted lack of censorship meant that xAI was "brushing off" concerns of bias and other risks. xAI said that they are working on "safeguards against catastrophic forms of malicious use." The company lists Dan Hendrycks, the director of the Center for AI Safety, as an advisor. Hendrycks recently appeared on the Future of Life Institute podcast to discuss AI risks. In the podcast, Hendrycks said of xAI:
I think it's relevant to note that [xAI is] a fairly serious effort. I'd anticipate it would probably be one of the main three AI companies next year or the year after: OpenAI, Google DeepMind, and xAI. I don't think of it as a smaller effort: it has the capacity to have a substantial show of force.
A waitlist for early beta access to Grok is available only to verified X users.