Meta uses authentication to protect its services' endpoints against abuse. The company found that post-processing access data to remove personally identifiable information was too resource-intensive. A recent article on Meta's engineering blog explains how the company uses de-identified authentication to protect its services and its users' privacy at the same time.
De-identified authentication is a privacy-friendly form of authentication because it is accomplished with anonymous credentials, i.e., without credential metadata that could be used to identify any particular user.
Meta used anonymous credentials to implement de-identified authentication in a highly available core service called Anonymous Credential Service (ACS), built on Meta’s container orchestration framework Twine.
Anonymous credentials enable de-identified authentication by dividing the authentication process into two phases, token issuance and de-identified authentication, which follow a setup phase.
Source: https://engineering.fb.com/2022/03/30/security/de-identified-authentication-at-scale/
The client first obtains the server’s public information during the setup phase, including its public keys.
The token issuance phase uses blind signatures. By the end of this phase, the client holds a shared secret derived from a token it generated randomly and the server's signature over that token. Because the signature is blind, the server never sees the original token during issuance.
Authenticated requests are made with de-identified authentication. In these requests, the client sends the original token, the request payload, and an HMAC of the payload computed with the shared secret. The server uses this information to verify that the token is one it signed during the issuance phase.
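The two phases can be illustrated with a minimal Python sketch. It is a toy construction, not Meta's implementation: it assumes blinding by exponentiation in the multiplicative group modulo a Mersenne prime, and the names ToyServer, ToyClient, hash_to_group and derive_secret are hypothetical. The parameters are chosen for readability and offer no real security.

```python
import hashlib
import hmac
import math
import secrets

# Toy modulus (assumption): the Mersenne prime 2^521 - 1. Illustration only.
P = 2**521 - 1


def hash_to_group(token: bytes) -> int:
    """Map a token to an integer in [1, P - 1]."""
    digest = hashlib.sha512(token).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def derive_secret(token: bytes, signed: int) -> bytes:
    """Shared secret derived from the token and the server's signature on it."""
    return hashlib.sha256(token + signed.to_bytes(66, "big")).digest()


class ToyServer:
    def __init__(self) -> None:
        self._key = secrets.randbelow(P - 3) + 2  # private exponent

    def sign_blinded(self, blinded: int) -> int:
        # Issuance: the server only ever sees the blinded value.
        return pow(blinded, self._key, P)

    def verify(self, token: bytes, payload: bytes, tag: bytes) -> bool:
        # Authentication: recompute the signature from the original token,
        # derive the same shared secret, and check the HMAC.
        signed = pow(hash_to_group(token), self._key, P)
        secret = derive_secret(token, signed)
        expected = hmac.new(secret, payload, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)


class ToyClient:
    def issue(self, server: ToyServer) -> tuple[bytes, bytes]:
        token = secrets.token_bytes(32)
        # The blinding factor must be invertible modulo the group order P - 1.
        while True:
            r = secrets.randbelow(P - 3) + 2
            if math.gcd(r, P - 1) == 1:
                break
        blinded = pow(hash_to_group(token), r, P)
        signed_blinded = server.sign_blinded(blinded)
        # Unblind: raising to r^-1 mod (P - 1) removes the blinding factor.
        signed = pow(signed_blinded, pow(r, -1, P - 1), P)
        return token, derive_secret(token, signed)


server = ToyServer()
token, secret = ToyClient().issue(server)
payload = b"log entry"
tag = hmac.new(secret, payload, hashlib.sha256).digest()
print(server.verify(token, payload, tag))
```

In this sketch the server can check the HMAC without having stored anything at issuance time, because it can recompute the signature from the original token with its private key.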
An essential requirement for ACS is token scope isolation: tokens issued for one use case cannot be used in another. This isolation is achieved by mandating a use case name in each ACS API request and by using different server keys for each use case.
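As a rough illustration of per-use-case keys, the sketch below keeps one independent key per use case name and requires that name on every call. The UseCaseKeyring class and the use case names are hypothetical, and a plain HMAC stands in for the server's signature to keep the example short.

```python
import hashlib
import hmac
import secrets


class UseCaseKeyring:
    """Holds one independent server key per use case (hypothetical helper)."""

    def __init__(self, use_cases):
        self._keys = {name: secrets.token_bytes(32) for name in use_cases}

    def _key(self, use_case: str) -> bytes:
        # Every request must name a known use case.
        if use_case not in self._keys:
            raise KeyError(f"unknown use case: {use_case}")
        return self._keys[use_case]

    def sign(self, use_case: str, token: bytes) -> bytes:
        return hmac.new(self._key(use_case), token, hashlib.sha256).digest()

    def verify(self, use_case: str, token: bytes, signature: bytes) -> bool:
        return hmac.compare_digest(self.sign(use_case, token), signature)


keyring = UseCaseKeyring(["telemetry", "federated_learning"])
token = secrets.token_bytes(32)
signature = keyring.sign("telemetry", token)
print(keyring.verify("telemetry", token, signature))
print(keyring.verify("federated_learning", token, signature))
```

Because each use case has its own key, a signature produced for one use case fails verification under any other.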
Meta implemented three techniques to reduce the resources required by ACS.
The first was to allow limited reuse of previously issued tokens, with use case-specific thresholds, which reduces the number of token issuance calls made to ACS. The second was to ensure that clients spread batched requests involving ACS randomly over time to avoid traffic spikes. These two techniques reduced the use of computing resources.
The third technique eliminated the manual effort needed to onboard new use cases: the ACS team built a self-service onboarding portal that automates provisioning.
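The first two techniques, bounded token reuse and randomised request timing, can be sketched as follows. This is an assumption-laden illustration rather than Meta's client code: TokenCache, ReusableToken, jittered_delays, the threshold values and the time window are all hypothetical.

```python
import random


class ReusableToken:
    def __init__(self, value: bytes, max_uses: int) -> None:
        self.value = value
        self.max_uses = max_uses
        self.uses = 0


class TokenCache:
    """Reuses a token up to a per-use-case threshold before requesting a new one."""

    def __init__(self, issue_token, max_uses_by_use_case) -> None:
        self._issue = issue_token  # callable(use_case) -> bytes (hypothetical)
        self._limits = max_uses_by_use_case
        self._tokens = {}
        self.issuance_calls = 0

    def get(self, use_case: str) -> bytes:
        token = self._tokens.get(use_case)
        if token is None or token.uses >= token.max_uses:
            # Threshold reached (or no token yet): make one issuance call.
            token = ReusableToken(self._issue(use_case), self._limits[use_case])
            self._tokens[use_case] = token
            self.issuance_calls += 1
        token.uses += 1
        return token.value


def jittered_delays(n_requests: int, window_seconds: float, rng=random):
    """Pick a random send time within the window for each batched request."""
    return sorted(rng.uniform(0, window_seconds) for _ in range(n_requests))


cache = TokenCache(lambda use_case: use_case.encode(), {"telemetry": 3})
for _ in range(7):
    cache.get("telemetry")
print(cache.issuance_calls)  # 3 issuance calls serve 7 requests
print(jittered_delays(5, 60.0))
```

A higher reuse threshold lowers the issuance load further, at the cost of letting more requests be linked to the same token, which is presumably why the thresholds are set per use case.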
Meta uses de-identified authentication to collect WhatsApp telemetry without collecting the user's identity in the corresponding log requests. Another use case for de-identified authentication at Meta is federated learning, a technique that trains a machine learning model while keeping sensitive data on client devices: the devices share model updates with the server instead of the raw data.
User privacy and de-identification are widespread concerns in the industry. Jean Yang, CEO of Akita Software, spoke at QCon 2019 about the operationalisation of privacy and compliance, mentioning de-identification.