
Apple Unveils Apple Foundation Models Powering Apple Intelligence

Apple published the details of their new Apple Foundation Models (AFM), a family of large language models (LLMs) that power several features in their Apple Intelligence suite. AFM comes in two sizes: a 3B-parameter on-device version and a larger cloud-based version.

The smaller model, AFM-on-device, was created by pruning a 6.4B-parameter model; the larger model, known as AFM-server, was trained "from scratch," but Apple did not disclose its size. Apple did release details of both models' development: both are based on the Transformer decoder-only architecture and were pre-trained on 6.3T tokens of data. The models use pluggable task-specific LoRA adapters that are chosen at runtime to tailor model performance for specific tasks, such as proofreading or replying to email. Apple evaluated both models on several benchmarks, including instruction-following and mathematical reasoning, and found that they "compared favorably" with, and in some cases outperformed, similarly sized models such as Llama 3 and GPT-4. According to Apple:

Our models have been created with the purpose of helping users do everyday activities across their Apple products, and developed responsibly at every stage and guided by Apple’s core values. We look forward to sharing more information soon on our broader family of generative models, including language, diffusion, and coding models.
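As a rough illustration of the pluggable-adapter design described above, the following Python sketch shows how a single shared base model could swap in a task-specific adapter per request; the task names, file paths, and StubModel class are hypothetical stand-ins for illustration, not Apple's actual runtime API.

    from dataclasses import dataclass

    @dataclass
    class LoraAdapter:
        task: str
        path: str   # quantized, task-specific adapter weights, tens of MB each

    # Hypothetical registry of per-feature adapters (names are illustrative only).
    ADAPTERS = {
        "proofreading": LoraAdapter("proofreading", "adapters/proofread.bin"),
        "mail_reply": LoraAdapter("mail_reply", "adapters/mail_reply.bin"),
    }

    class StubModel:
        """Stands in for the shared on-device base model; only the adapter changes per request."""
        def __init__(self):
            self.adapter = None

        def load_adapter(self, adapter: LoraAdapter) -> None:
            # In a real runtime this would attach the LoRA weights to the
            # self-attention and feed-forward layers of the frozen base model.
            self.adapter = adapter

        def generate(self, prompt: str) -> str:
            return f"[{self.adapter.task}] response to: {prompt}"

    BASE_MODEL = StubModel()   # loaded once and shared across all features

    def run_feature(task: str, prompt: str) -> str:
        BASE_MODEL.load_adapter(ADAPTERS[task])   # adapter chosen at request time
        return BASE_MODEL.generate(prompt)

    print(run_feature("proofreading", "Their going to the park tomorow."))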

InfoQ recently covered Apple's announcement of Apple Intelligence at their WWDC 2024 event. InfoQ also covered Swift Assist, a code generation model integrated with Xcode, which Apple describes as being part of the same family of generative AI models as AFM.

The adapter architecture allows AFM to be modified "on-the-fly" for specific tasks. The adapters are "small neural network modules" that plug into the self-attention and feed-forward layers of the base model. They are created by fine-tuning the base model with task-specific datasets. The adapter parameters are quantized to low bit-rates to save memory; the on-device adapters consume on the order of 10 MB, making them suitable for small embedded devices.
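A minimal PyTorch sketch of such a low-rank adapter wrapping a frozen projection is shown below; the rank, layer dimensions, and 4-bit size estimate are illustrative assumptions rather than Apple's published configuration.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Wraps a frozen linear projection with a small trainable low-rank correction."""
        def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
            super().__init__()
            self.base = base   # frozen base projection, e.g. a self-attention or feed-forward weight
            for p in self.base.parameters():
                p.requires_grad = False
            self.lora_a = nn.Linear(base.in_features, rank, bias=False)   # down-projection
            self.lora_b = nn.Linear(rank, base.out_features, bias=False)  # up-projection
            nn.init.zeros_(self.lora_b.weight)   # start as a no-op so output matches the base model
            self.scale = alpha / rank

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Base output plus the task-specific low-rank correction.
            return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

    # Rough memory estimate for one adapted 3072x3072 projection at rank 16,
    # assuming ~4 bits per adapter parameter (numbers are illustrative only).
    proj = nn.Linear(3072, 3072, bias=False)
    adapted = LoRALinear(proj, rank=16)
    adapter_params = sum(p.numel() for p in (adapted.lora_a.weight, adapted.lora_b.weight))
    print(f"adapter params per layer: {adapter_params:,} (~{adapter_params * 0.5 / 1024:.0f} KiB at 4 bits)")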

Apple took several steps to ensure AFM produced safe output. In addition to ensuring that no user data was included in their pre-training set, Apple applied filtering to remove harmful content, spam, and PII. In the fine-tuning stage, Apple treated "safety alignment as one of the many core post-training tasks" and more than 10% of the fine-tuning data was safety-related. They also performed manual and automated "red-teaming" to identify and test model vulnerabilities.
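As a toy example of the kind of pre-training data filtering mentioned above, the snippet below scrubs two obvious categories of PII with regular expressions; the patterns and policy are purely illustrative and far simpler than a production pipeline.

    import re

    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
    PHONE = re.compile(r"\b(?:\+?\d{1,2}[ -]?)?(?:\(\d{3}\)|\d{3})[ -]?\d{3}[ -]?\d{4}\b")

    def scrub_pii(document: str) -> str:
        """Replace obvious PII spans so they never enter the pre-training corpus."""
        document = EMAIL.sub("[EMAIL]", document)
        document = PHONE.sub("[PHONE]", document)
        return document

    print(scrub_pii("Contact jane.doe@example.com or call 555-123-4567."))
    # -> Contact [EMAIL] or call [PHONE].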

Apple evaluated AFM's performance on a variety of benchmarks and compared the results to several baseline models, including GPT-4, Llama 3, and Phi-3. In tests where human judges ranked the outputs of two models side-by-side, AFM-on-device outperformed larger models Gemma-7B and Mistral-7B. AFM-server achieved "competitive" results, with a win-rate of 52% against GPT-3.5.
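For context on how such a side-by-side win rate is computed, here is a small sketch that tallies per-prompt human judgments; the convention of counting ties as half a win and the sample numbers are assumptions for illustration, not Apple's exact methodology.

    from collections import Counter

    def win_rate(judgments: list[str]) -> float:
        """judgments holds one of 'win', 'tie', or 'loss' per prompt, from the candidate model's perspective."""
        counts = Counter(judgments)
        return (counts["win"] + 0.5 * counts["tie"]) / len(judgments)

    # 100 made-up side-by-side comparisons yielding a 52% win rate.
    sample = ["win"] * 48 + ["tie"] * 8 + ["loss"] * 44
    print(f"win rate: {win_rate(sample):.0%}")   # -> win rate: 52%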

Ruoming Pang, the lead author of Apple's technical report on AFM, posted on X that

While these LMs are not chatbots, we trained them to have general purpose capabilities so that they can power a wide range of features including summarization, writing assistance, tool-use, and coding.

Several other users posted their thoughts about AFM on X. Hugging Face engineer Vaibhav Srivastav summarized the report, calling it "quite feature packed" and saying he "quite enjoyed skimming through it." Liquid AI staff ML scientist Maxime Labonne estimated that AFM-server might have ~70B parameters, but lamented that the paper had "almost no details" on this model's size.
