Microsoft recently announced that the Azure Event Hubs schema registry now includes JSON schema support, providing Kafka applications with a centralized repository for schema documents used in messaging-centric and event-driven applications. The JSON schema support is currently in public preview.
Azure Event Hubs is a cloud-based service that enables the ingestion, storage, and processing of large-scale event streams from various sources for real-time analytics and data integration. Its schema registry feature gives users a central repository for storing and managing schemas in schema-driven event streaming scenarios. By using Azure Schema Registry, producer and consumer applications can exchange data without having to manage and distribute the schema themselves.
Microsoft has now extended Azure Schema Registry in Event Hubs with support for JSON schemas to enable schema-driven event streaming. In a Tech Community blog post, Kasun Indrasiri, a senior product manager of Azure Messaging at Microsoft, explains the rationale for bringing JSON support to Event Hubs Schema Registry:
JSON Schema is used to define and validate the structure of JSON data. It helps ensure consistency, completeness, and accuracy of data by providing a clear definition of the expected format. By incorporating JSON Schema validation into the event streaming applications, developers can ensure that any data being produced or consumed adheres to the predefined schema. This helps to prevent issues such as missing fields, incorrect data types, and inconsistent data formats.
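To make the checks named in the quote concrete, here is a minimal sketch of validating an event against a schema. It is an illustration only: the `ORDER_SCHEMA` document and the `validate` helper are hypothetical, the helper handles just the `type`, `required`, and `properties` keywords, and it does not use the Azure SDK or a full JSON Schema implementation.

```python
# Hypothetical schema for an "order" event; not taken from the announcement.
ORDER_SCHEMA = {
    "type": "object",
    "required": ["id", "amount"],
    "properties": {
        "id": {"type": "string"},
        "amount": {"type": "number"},
    },
}

# Maps the JSON Schema type names used above to Python types.
_TYPES = {"string": str, "number": (int, float), "object": dict}


def validate(event, schema):
    """Return a list of problems; an empty list means the event conforms.

    Simplified on purpose: only `type`, `required`, and `properties`
    are checked, and nested objects are not descended into.
    """
    if not isinstance(event, dict):
        return ["event is not an object"]
    errors = []
    # Missing fields.
    for name in schema.get("required", []):
        if name not in event:
            errors.append(f"missing field: {name}")
    # Incorrect data types.
    for name, rule in schema.get("properties", {}).items():
        if name in event:
            value = event[name]
            expected = _TYPES[rule["type"]]
            # bool is a subclass of int in Python, so reject it explicitly;
            # none of the types handled here should accept a boolean.
            if isinstance(value, bool) or not isinstance(value, expected):
                errors.append(f"wrong type for {name}: expected {rule['type']}")
    return errors
```

Calling `validate({"id": "a1"}, ORDER_SCHEMA)` reports the missing `amount` field, while a conforming event yields an empty list.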
In client-side schema validation use cases, developers can use Azure Event Hubs schema registry for client application event serialization and de-serialization. In the case of Kafka applications, a Kafka producer application utilizes the JSON schema stored in Azure Schema Registry to serialize events, which are then published to a Kafka topic/event hub within Azure Event Hubs. At the same time, the Kafka consumer retrieves the JSON schema from Azure Schema Registry using the schema ID of the event to deserialize the consumed events from Event Hubs.
Source: https://learn.microsoft.com/en-us/azure/event-hubs/schema-registry-json-schema-kafka
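The producer and consumer flow described above can be sketched in a few lines. This is an in-memory stand-in under stated assumptions, not the Azure or Kafka client API: the `REGISTRY` dictionary, the `schema-1` ID, and the `produce` and `consume` functions are all hypothetical names chosen for illustration, and the schema check is reduced to required fields.

```python
import json

# Stand-in for the schema registry: schema ID -> JSON Schema document.
# Both the ID and the schema are hypothetical.
REGISTRY = {
    "schema-1": {"type": "object", "required": ["id", "amount"]},
}


def _check(event, schema):
    """Raise ValueError if the event lacks a field the schema requires."""
    missing = [f for f in schema["required"] if f not in event]
    if missing:
        raise ValueError(f"event is missing fields: {missing}")


def produce(event, schema_id):
    """Validate and serialize an event, tagging it with its schema ID."""
    _check(event, REGISTRY[schema_id])
    return {"schema_id": schema_id, "body": json.dumps(event).encode("utf-8")}


def consume(message):
    """Fetch the schema by the ID carried on the message, then deserialize."""
    schema = REGISTRY[message["schema_id"]]
    event = json.loads(message["body"].decode("utf-8"))
    _check(event, schema)
    return event
```

The point of the design is that only the schema ID travels with each event; the schema document itself stays in the registry, so producers and consumers never need to ship it to each other.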
To use JSON Schema, developers create a new schema group in the Event Hubs schema registry and select JSON Schema as the schema format. They can then create the JSON schemas used for validation under that schema group.
Like the Azure Event Hubs schema registry, the Apache Kafka ecosystem has Confluent Schema Registry, which allows users to define and manage schemas for their events, ensuring data compatibility and consistency. Another example is Amazon Managed Streaming for Apache Kafka (Amazon MSK), a fully managed service that runs Apache Kafka on AWS and lets users manage schemas with third-party tools like Confluent Schema Registry or with the AWS Glue Schema Registry. Finally, the Google Cloud Pub/Sub messaging service offers schema management as part of Google Cloud.
Jesse Squire, an engineer working on the Azure SDKs at Microsoft, noted several benefits of JSON Schema support in Azure Event Hubs Schema Registry:
- Contributes to the Kafka interoperability story for Event Hubs, focusing on cross-product producing and consuming scenarios.
- Enables a consistent developer experience with Schema Registry across different schema formats, reducing support costs and special-case documentation needs.
Lastly, more details on Azure Event Hubs can be found on the documentation landing page, and pricing and availability information is listed on the pricing page.