Hortonworks has quietly made available the DataFlow platform, which is based on Apache NiFi and aims to address the data processing needs of the IoAT.
Hortonworks recently introduced the DataFlow (HDF) platform to an audience of oil and gas producers during a webinar. HDF is based on Apache NiFi, a real-time data streaming and processing system open sourced by the NSA last year; the project's initial name was Niagarafiles. When NiFi was open sourced, several former NSA developers founded Onyara, a company continuing the development of the project and providing support. Hortonworks recently acquired Onyara and integrated its developers into its own team.
Because NiFi can be used to stream data coming from a wide variety of sources, Hortonworks considers HDF a fit for the Internet of AnyThing (IoAT). Data flowing through HDF is multidirectional and point-to-point, enabling users to interact with the stream of collected data and even reach back to its source, down to individual sensors and devices. HDF is complementary to HDP: the former deals with data-in-motion, while the latter, based on Hadoop, extracts insights from data-at-rest.
NiFi was built with a number of concepts in mind: the ability to manage the flow of information at a granular level, to track everything that happens to data (where it came from and what happened to it along the way), and to secure both the control and data planes. NiFi’s main features are:
- Guaranteed data delivery
- Data buffering with a back-pressure mechanism
- Prioritized queuing
- Flow-specific QoS (latency vs. throughput, loss tolerance)
- Data provenance – NiFi keeps a log of every change a piece of data has gone through, enabling traceability, data recovery and replay, auditing, and evaluation
- Logging detailed history of data
- Interactive command and control console providing visual feedback for system changes
- Flow templates
- Pluggable/multi-role security
- Extendability through custom processors (see the sketch after this list)
- Clustering
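To make the extendability point concrete, here is a minimal sketch of what a custom NiFi processor could look like. The package, class, and attribute names are purely illustrative and not part of HDF; a production processor would typically also declare property descriptors and a failure relationship.

```java
package com.example.nifi;  // hypothetical package, for illustration only

import java.util.Collections;
import java.util.Set;

import org.apache.nifi.annotation.documentation.CapabilityDescription;
import org.apache.nifi.annotation.documentation.Tags;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

@Tags({"example", "tagging"})
@CapabilityDescription("Tags each incoming FlowFile with a custom attribute.")
public class TagFlowFileProcessor extends AbstractProcessor {

    // FlowFiles routed to this relationship continue to the next processor in the flow.
    public static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("Successfully tagged FlowFiles")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();  // pull one FlowFile from the incoming queue
        if (flowFile == null) {
            return;                         // nothing queued; NiFi will schedule us again later
        }
        // Attribute changes like this one are recorded in NiFi's provenance log,
        // which is what enables the traceability, replay, and auditing mentioned above.
        flowFile = session.putAttribute(flowFile, "tagged.by", "TagFlowFileProcessor");
        session.transfer(flowFile, REL_SUCCESS);
    }
}
```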
NiFi is not just for IoT; it is useful for all sorts of real-time data processing needs: predictive analytics, fraud detection, big data ingest, resource evaluation, and others. NiFi comes out of the box with 90 data processors, including encoders, encryptors, compressors, and converters, along with processors for creating Hadoop sequence files from data flows, interacting with AWS, sending messages to Kafka, getting messages from Twitter, and others. One can configure the data processors through a drag-and-drop visual UI, chaining them together and using back-pressure between them to control the data flow. The tool has built-in scalability, request replication, load balancing, and failover.
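As a rough illustration of how a single processor handles FlowFiles before being chained to others in the UI, the hypothetical TagFlowFileProcessor sketched above could be exercised with NiFi's mock test harness (the nifi-mock module):

```java
import org.apache.nifi.util.MockFlowFile;
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;
import org.junit.Test;

public class TagFlowFileProcessorTest {

    @Test
    public void routesTaggedFlowFilesToSuccess() {
        TestRunner runner = TestRunners.newTestRunner(new TagFlowFileProcessor());
        runner.enqueue("sensor reading: 42".getBytes());  // simulate an incoming FlowFile
        runner.run();                                      // trigger the processor once

        runner.assertAllFlowFilesTransferred(TagFlowFileProcessor.REL_SUCCESS, 1);
        MockFlowFile out = runner.getFlowFilesForRelationship(TagFlowFileProcessor.REL_SUCCESS).get(0);
        out.assertAttributeEquals("tagged.by", "TagFlowFileProcessor");
    }
}
```

In an actual flow, the success relationship would be connected to the next processor's input queue, and back-pressure thresholds configured on that connection would throttle the upstream processor once the queue fills up.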
The roadmap includes better configuration management of flows, an extension and template registry, first-class Avro support, interactive queue management, multi-tenant data flow, and more.
HDF can be tested in a sandboxed environment with Apache Ambari.