Amazon Updates Transcribe with Automatic Redaction of Personally Identifiable Information Feature

Amazon Transcribe is an automatic speech recognition (ASR) service, allowing customers to add speech-to-text capabilities to their applications. Recently, the public cloud provider made a significant update to the service with an automatic redaction of Personally Identifiable Information (PII) feature.

Transcribe is popular with AWS customers, who use it to transcribe customer calls automatically and build datasets for further analysis and natural language processing tasks such as sentiment analysis. During these calls, customers may share personal data such as address information and credit card numbers. However, with the California Consumer Privacy Act (CCPA) and Europe’s General Data Protection Regulation (GDPR) in effect, personal information should not end up in these datasets.

Hence, Amazon has now added a feature that automatically identifies and redacts information such as Social Security numbers, credit card numbers, bank account numbers, names, email addresses, phone numbers and mailing addresses. Transcribe replaces each occurrence with ‘[PII]’ in the transcript.
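For readers who want to try the feature, the following is a minimal sketch of how a redaction-enabled job might be requested through the AWS SDK for Python (boto3). The `ContentRedaction` setting belongs to Transcribe's `StartTranscriptionJob` operation; the helper names, the job name and the S3 URI are hypothetical and used only for illustration.

```python
def build_redaction_job_request(job_name, media_uri, keep_unredacted=False):
    """Assemble keyword arguments for Transcribe's StartTranscriptionJob call.

    job_name and media_uri are placeholders supplied by the caller.
    """
    return {
        "TranscriptionJobName": job_name,
        # The article notes the feature is available for US English.
        "LanguageCode": "en-US",
        "Media": {"MediaFileUri": media_uri},
        "ContentRedaction": {
            "RedactionType": "PII",
            # "redacted" keeps only the redacted transcript;
            # "redacted_and_unredacted" keeps both versions.
            "RedactionOutput": (
                "redacted_and_unredacted" if keep_unredacted else "redacted"
            ),
        },
    }


def start_redacted_job(job_name, media_uri):
    """Submit the job. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request builder works without AWS access

    client = boto3.client("transcribe")
    request = build_redaction_job_request(job_name, media_uri)
    return client.start_transcription_job(**request)
```

Calling `start_redacted_job("call-001", "s3://example-bucket/call-001.wav")` would submit the job, assuming the bucket and audio file exist and the caller has permission to use Transcribe.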

Source: https://aws.amazon.com/blogs/aws/now-available-in-amazon-transcribe-automatic-redaction-of-personally-identifiable-information/

As Julien Simon, Artificial Intelligence & Machine Learning Evangelist at Amazon EMEA, explains in his blog post about the automatic redaction of Personally Identifiable Information (PII) feature:

They will be replaced with a ‘[PII]’ tag in the transcribed text. You also get a redaction confidence score (instead of the usual ASR score), as well as start and end timestamps. These timestamps will help you locate PII in your audio files for secure storage and sharing, or for additional audio processing to redact it at the source.
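The timestamps mentioned in the quote can be collected with a few lines of code. The sketch below assumes the redacted transcript follows the usual Transcribe JSON layout, with a `results.items` list whose entries carry `start_time`, `end_time` and an `alternatives` list with a `content` field. The sample document is hand-written for illustration and is not real service output, and `pii_time_ranges` is a hypothetical helper name.

```python
import json


def pii_time_ranges(transcript):
    """Return (start, end) pairs in seconds for every item redacted as '[PII]'."""
    ranges = []
    for item in transcript.get("results", {}).get("items", []):
        alternatives = item.get("alternatives", [])
        if not alternatives or alternatives[0].get("content") != "[PII]":
            continue
        # Punctuation items have no timestamps, so check before converting.
        if "start_time" in item and "end_time" in item:
            ranges.append((float(item["start_time"]), float(item["end_time"])))
    return ranges


# Hand-written sample in the assumed layout (not actual Transcribe output).
SAMPLE = json.loads("""
{
  "results": {
    "items": [
      {"start_time": "0.5", "end_time": "0.75", "type": "pronunciation",
       "alternatives": [{"content": "my"}]},
      {"start_time": "0.75", "end_time": "1.25", "type": "pronunciation",
       "alternatives": [{"content": "number"}]},
      {"start_time": "1.5", "end_time": "3.25", "type": "pronunciation",
       "alternatives": [{"content": "[PII]"}]},
      {"type": "punctuation", "alternatives": [{"content": "."}]}
    ]
  }
}
""")

print(pii_time_ranges(SAMPLE))
```

The resulting ranges could then be passed to an audio tool to mute or cut the matching segments, which is the "redact it at the source" step the quote refers to.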

Other public cloud vendors offer similar capabilities. Google's Data Loss Prevention API can be used in conjunction with its speech-to-text service to identify and redact sensitive data. Similarly, Microsoft offers a speech-to-text service on Azure, which in combination with its Information Protection service can help identify and redact sensitive data.

The automatic redaction of Personally Identifiable Information (PII) feature is currently available for US English in several regions around the world:

  • US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), AWS GovCloud (US-West)
  • Canada (Central), South America (São Paulo)
  • Europe (Ireland), Europe (London), Europe (Paris), Europe (Frankfurt)
  • Middle East (Bahrain)
  • Asia Pacific (Mumbai), Asia Pacific (Hong Kong), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo)

Pricing details of Transcribe are available on the pricing page.
