Model Inference Content on InfoQ
Meta Optimises AI Inference by Improving Tail Utilisation
Meta (formerly Facebook) has reported substantial improvements in the efficiency and reliability of its machine-learning model serving infrastructure by focusing on optimising tail utilisation.
JLama: The First Pure Java Model Inference Engine Implemented With Vector API and Project Panama
Karpathy's 700-line llama2.c inference implementation demystified how developers can interact with LLMs. Even before that, JLama had begun its journey towards becoming the first pure-Java inference engine for any Hugging Face model, from Gemma to Mixtral. Leveraging the new Vector API and a PanamaTensorOperations class with a native fallback, the library is available on Maven Central.