The Grok-2 language model has been released in beta on the X platform, alongside Grok-2 mini. The model, tested under the designation "sus-column-r" on the LMSYS leaderboard, achieved a higher Elo score than Claude 3.5 Sonnet and GPT-4-Turbo. Grok-2 mini, a smaller variant, is also part of the beta release and is designed to balance speed and performance.
Both models have been evaluated on academic benchmarks covering reasoning, reading comprehension, math, science, and coding. They improve on their predecessors and perform competitively in areas such as graduate-level science and math competition problems.
The release on X brings updated features to Premium and Premium+ users, including advanced text and vision understanding. Grok-2 also integrates real-time information from the X platform.
Later this month, both models will be accessible to developers via an enterprise API platform. This API will feature enhanced security, multi-region inference, and management tools.
Plans are in place for Grok-2 to support improved search functionality, post analytics, and reply features on the X platform. A preview of its multimodal capabilities is also expected soon.
Grok-2 is positioned alongside other recent models such as GPT-4 and Claude 3.5. As with other recent model releases, there is ongoing discussion about the potential for misuse, particularly of its image generation capabilities, and X has not detailed specific measures to address this.
User Silver-Chipmunk7744 commented on Reddit:
If you change it to coding, Claude 3.5 Sonnet is now 27 points above Grok mini. My guess is Claude is so obnoxious with all the moralizing and censorship that it's why it's so close in score to Grok Mini and GPT4o mini. One thing I do find odd is how close the ELO of the "mini" versions is to the main version. Only 30 ELO difference. Meanwhile, something like a GPT3.5 turbo is behind almost 200 points.
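For context on the rating gaps mentioned in this comment, the short sketch below converts an Elo difference into an expected head-to-head score. It assumes the standard Elo logistic formula with a 400-point scale and counts a tie as half a win; the leaderboard's own fitting procedure may differ in detail, and the function name `expected_score` is purely illustrative.

```python
def expected_score(rating_gap: float) -> float:
    """Expected score of the higher-rated model against the lower-rated one.

    Assumes the standard Elo formula: E = 1 / (1 + 10 ** (-gap / 400)).
    A value of 0.5 means the two models are evenly matched.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


if __name__ == "__main__":
    # Gaps taken from the quoted comment: 27, 30, and roughly 200 points.
    for gap in (27, 30, 200):
        print(f"{gap:>3}-point gap -> expected score {expected_score(gap):.3f}")
```

Under these assumptions, a 30-point gap corresponds to an expected score of roughly 54% for the higher-rated model, while a 200-point gap corresponds to roughly 76%. That is one way to read why the commenter finds the small gap between the "mini" and main versions surprising.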
Elvis Saravia, founder and lead AI scientist at DAIR.AI, posted on his X account:
By now, you might have seen that Grok-2 ranks #2 in the LMSYS Chatbot Arena. Insane how fast the xAI team has produced a strong frontier model that competes with other very capable LLMs like GPT-4o, Gemini, and Claude 3.5 Sonnet.
Posts on X show clear enthusiasm for Grok-2's capabilities, especially its real-time data integration and more open conversational style. Preferences still depend on individual needs, however: some users value ChatGPT's established features, UI, and broader accessibility despite its limited access to real-time data.