Microsoft has announced their implementation of the Apache Avro wire protocol. Avro is described as a “compact binary data serialization format similar to Thrift or Protocol Buffers” with additional features needed for distributed processing environments such as Hadoop.
In order to make the protocol as fast as possible, the Microsoft Avro Library uses expression trees to build and compile a custom serializer at run time. After the initial hit to compile the serializer into IL code, this should provide significantly better performance than reflection-based algorithms.
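The compile-once idea can be illustrated outside .NET. The following Python sketch is only an analogy: it does not use expression trees, IL, or the Avro encoding, and the `build_serializer` helper, the `_FORMATS` table, and the `Reading` type are hypothetical names invented for this example. It inspects a type a single time and returns a function that does no further introspection on each call.

```python
import struct
from dataclasses import dataclass, fields

# Hypothetical mapping from Python field types to struct format codes.
# This is NOT the Avro binary encoding; it only keeps the example short.
_FORMATS = {int: "q", float: "d", bool: "?"}


def build_serializer(cls):
    """Inspect the dataclass once and return a serializer specialized for it."""
    names = [f.name for f in fields(cls)]
    packer = struct.Struct("<" + "".join(_FORMATS[f.type] for f in fields(cls)))

    def serialize(obj):
        # No per-call inspection of the type: names and layout are precomputed.
        return packer.pack(*(getattr(obj, n) for n in names))

    return serialize


@dataclass
class Reading:
    sensor_id: int
    value: float
    ok: bool


serialize_reading = build_serializer(Reading)  # one-time setup cost
payload = serialize_reading(Reading(7, 21.5, True))
```

The setup cost is paid once per type, after which every call reuses the prepared layout, which is the same trade-off the article describes for the compiled serializer.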
Unlike Protocol Buffers, the Avro protocol is self-describing. When the connection is made between client and server, the schema is transmitted, usually just once. As a result, neither side has to hard-code the binary format, and you don’t pay the price of transmitting the schema with each message.
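To see why sharing the schema up front keeps messages small, here is a minimal Python sketch of how a record body can be encoded once both sides know the schema. It follows the Avro rules for `long` (zigzag plus variable-length integer) and `string` (length prefix plus UTF-8 bytes), but it handles only those two types, and the `User` schema and helper names are assumptions made for illustration, not part of the Microsoft library.

```python
import json

# Schema agreed on once, for example when the connection is set up (illustrative).
SCHEMA = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "name", "type": "string"},
            {"name": "age",  "type": "long"}]}
""")


def encode_long(n):
    """Avro long: zigzag-encode, then write as a variable-length integer."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s):
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


ENCODERS = {"long": encode_long, "string": encode_string}


def encode_record(schema, record):
    # Fields are written in schema order, with no field names or type tags.
    return b"".join(ENCODERS[f["type"]](record[f["name"]]) for f in schema["fields"])


message = encode_record(SCHEMA, {"name": "Ada", "age": 36})
```

The resulting message carries only the values; a reader without the schema could not tell where one field ends and the next begins, which is why the schema has to be exchanged first.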
Because of this, the Microsoft Avro Library can support three modes:
- Reflection mode. The IL code for the serializer is built based on the schema of .NET types to achieve maximum performance.
- Generic record mode. The JSON schema of the data can be specified at runtime so that it provides the ability for handling dynamic data with arbitrary schema.
- Container mode. The library can generate portable files with embedded schema. The file format is compatible with Avro container file specification and can be used across platforms.
When used in reflection mode, Avro uses the same DataContract/DataMember attributes that WCF developers are familiar with.
In generic record mode it is assumed that you don’t have a .NET class predefined to store the data. Instead, you use the AvroRecord class in conjunction with a JSON document that describes the format of the data. AvroRecord objects need to be accessed in a late-bound manner (C# dynamic, VB Option Strict Off).
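The late-bound access pattern can be sketched in Python, where attribute lookup is always dynamic. The `GenericRecord` class below is a hypothetical stand-in written for this illustration; it is not the library's AvroRecord and makes no claim about that class's API. It reads field names from a JSON schema at run time and rejects any field the schema does not declare.

```python
import json


class GenericRecord:
    """Hypothetical late-bound record whose fields come from a JSON schema."""

    def __init__(self, schema_json):
        schema = json.loads(schema_json)
        # Bypass our own __setattr__ so the bookkeeping attributes can be stored.
        object.__setattr__(self, "_fields", [f["name"] for f in schema["fields"]])
        object.__setattr__(self, "_values", {})

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for schema fields.
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        if name not in self._fields:
            raise AttributeError(f"{name!r} is not in the schema")
        self._values[name] = value


# The schema is supplied as data at run time; no class is defined for it.
record = GenericRecord(
    '{"type": "record", "name": "Location", "fields": ['
    '{"name": "city", "type": "string"}, {"name": "zip", "type": "long"}]}'
)
record.city = "Redmond"
record.zip = 98052
```

Because the fields are resolved by name at run time, a misspelled field surfaces as a run-time error rather than a compile-time one, which is the usual cost of late binding.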
Container mode can be used in conjunction with reflection or generic record mode. Since you are creating files in this mode instead of sending messages over the wire, you can compress and/or encrypt the data using whatever means you prefer. Out of the box you can choose between no compression and deflate, but instructions for building your own codec are included.
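As a rough sketch of what a pluggable codec looks like, the Python below models a codec as a pair of compress and decompress functions. The codec names `null` and `deflate` come from the Avro container file specification, which defines deflate as a raw RFC 1951 stream; the `CODECS` table and `register_codec` hook are hypothetical and do not reflect how the Microsoft library exposes codecs.

```python
import bz2
import zlib


def deflate_compress(block):
    # Negative wbits selects a raw deflate stream: no zlib header or checksum.
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(block) + compressor.flush()


def deflate_decompress(data):
    return zlib.decompress(data, wbits=-15)


# Codec name -> (compress, decompress). "null" passes data through unchanged.
CODECS = {
    "null": (lambda block: block, lambda data: data),
    "deflate": (deflate_compress, deflate_decompress),
}


def register_codec(name, compress, decompress):
    """Hypothetical hook for adding a custom codec to the table."""
    CODECS[name] = (compress, decompress)


# A custom codec built from the standard library, for illustration.
register_codec("bzip2", bz2.compress, bz2.decompress)

block = b"avro " * 100
compress, decompress = CODECS["deflate"]
restored = decompress(compress(block))
```

A reader has to support whichever codec the writer chose, so a custom codec limits the file to tools that implement it as well.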