Apple Unveils Apple Foundation Models Powering Apple Intelligence

Apple published the details of their new Apple Foundation Models (AFM), a family of large language models (LLMs) that power several features in their Apple Intelligence suite. AFM comes in two sizes: a 3B-parameter on-device version and a larger cloud-based version.

The smaller model, AFM-on-device, was created by pruning a 6.4B-parameter model; the larger model, known as AFM-server, was trained "from scratch," but Apple did not disclose its size. Apple did release details of both models' development: both use a decoder-only Transformer architecture and were pre-trained on 6.3T tokens of data. The models use pluggable task-specific LoRA adapters, chosen at runtime, to tailor model performance to specific tasks such as proofreading or replying to email. Apple evaluated both models on several benchmarks, including instruction-following and mathematical reasoning, and found that they "compared favorably" with, and in some cases outperformed, similarly sized models such as Llama 3 and GPT-4. According to Apple:
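
The report does not include code, but the runtime adapter-selection idea can be sketched in a few lines of Python; the task names, file names, and function below are hypothetical illustrations, not Apple's API:

# Hypothetical sketch: one frozen base model, many small task adapters
# selected at request time. Names are illustrative, not from Apple's report.

FROZEN_BASE = "afm-on-device"  # stands in for the shared 3B base model

# Each feature ships its own fine-tuned LoRA weights (here just labels).
ADAPTERS = {
    "proofreading": "lora-proofread.bin",
    "mail_reply": "lora-mail-reply.bin",
    "summarization": "lora-summarize.bin",
}

def handle_request(task: str, prompt: str) -> str:
    # Pick the adapter for this feature; the base weights never change.
    adapter = ADAPTERS.get(task)
    if adapter is None:
        raise ValueError(f"no adapter registered for task {task!r}")
    return f"run {FROZEN_BASE} + {adapter} on: {prompt}"

print(handle_request("proofreading", "Their going to the store."))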

Our models have been created with the purpose of helping users do everyday activities across their Apple products, and developed responsibly at every stage and guided by Apple’s core values. We look forward to sharing more information soon on our broader family of generative models, including language, diffusion, and coding models.

InfoQ recently covered Apple's announcement of Apple Intelligence at their WWDC 2024 event. InfoQ also covered Swift Assist, a code-generation model integrated with Xcode, which Apple describes as being part of the same family of generative AI models as AFM.

The adapter architecture allows AFM to be modified "on-the-fly" for specific tasks. The adapters are "small neural network modules" that plug into the self-attention and feed-forward layers of the base model. They are created by fine-tuning the base model with task-specific datasets. The adapter parameters are quantized to low bit widths to save memory; the on-device adapters consume on the order of 10 MB, making them suitable for small embedded devices.
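
As a rough illustration of why the adapters stay so small, here is the standard LoRA formulation (the frozen weight W plus a scaled low-rank update BA) in NumPy; the dimensions, rank, and scaling below are illustrative choices, not figures from Apple's report:

import numpy as np

d_model, rank = 2048, 16  # illustrative sizes, not from Apple's report

# Frozen base weight of one linear layer (e.g., inside self-attention).
W = np.random.randn(d_model, d_model).astype(np.float32)

# A LoRA adapter adds a trainable low-rank update: delta_W = B @ A.
A = np.random.randn(rank, d_model).astype(np.float32) * 0.01
B = np.zeros((d_model, rank), dtype=np.float32)  # zero-init: delta starts at 0

def forward(x, W, A, B, alpha=32.0):
    # Base projection plus the scaled low-rank correction.
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T

x = np.random.randn(1, d_model).astype(np.float32)
y = forward(x, W, A, B)

# The adapter is tiny relative to the base layer it modifies:
adapter_params = A.size + B.size   # 2 * rank * d_model
base_params = W.size               # d_model ** 2
print(adapter_params / base_params)  # ~1.6% of the layer in this sketch

Quantizing these adapter parameters to a few bits each, as the report describes, is what brings a full set of task weights down to the roughly 10 MB scale.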

Apple took several steps to ensure AFM produced safe output. In addition to ensuring that no user data was included in their pre-training set, Apple applied filtering to remove harmful content, spam, and PII. In the fine-tuning stage, Apple treated "safety alignment as one of the many core post-training tasks" and more than 10% of the fine-tuning data was safety-related. They also performed manual and automated "red-teaming" to identify and test model vulnerabilities.
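
For a sense of what PII filtering of a pre-training corpus can look like, here is a toy regex-based scrub in Python; Apple's actual pipeline is not public beyond the report's description, so this is an assumption-laden sketch, not their method:

import re

# Illustrative-only filters; a production pipeline is far more elaborate.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def scrub_pii(text: str) -> str:
    # Replace obvious PII patterns before the text enters the corpus.
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(scrub_pii("Contact jane@example.com or 555-123-4567."))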

Apple evaluated AFM's performance on a variety of benchmarks and compared the results to several baseline models, including GPT-4, Llama 3, and Phi-3. In tests where human judges ranked the outputs of two models side-by-side, AFM-on-device outperformed the larger Gemma-7B and Mistral-7B models. AFM-server achieved "competitive" results, with a win-rate of 52% against GPT-3.5.
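
Side-by-side win-rates like the 52% figure are typically computed with simple pairwise arithmetic; the sketch below uses the common convention of counting ties as half a win for each side, which may differ from Apple's exact scoring:

# Standard side-by-side win-rate arithmetic (not Apple's evaluation code).
# Human judges see outputs from model A and model B for the same prompt
# and pick a winner or declare a tie.
judgments = ["A", "B", "tie", "A", "A", "B", "tie", "A", "B", "A"]

wins = judgments.count("A")
ties = judgments.count("tie")
total = len(judgments)

# Count a tie as half a win for each side.
win_rate = (wins + 0.5 * ties) / total
print(f"win-rate for A: {win_rate:.0%}")  # 60% in this toy sample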

Ruoming Pang, the lead author of Apple's technical report on AFM, posted on X that

While these LMs are not chatbots, we trained them to have general purpose capabilities so that they can power a wide range of features including summarization, writing assistance, tool-use, and coding.

Several other users posted their thoughts about AFM on X. Hugging Face engineer Vaibhav Srivastav summarized the report, calling it "quite feature packed" and saying he "quite enjoyed skimming through it." Liquid AI staff ML scientist Maxime Labonne estimated that AFM-server might have ~70B parameters, but lamented that the paper had "almost no details" on this model's size.
