The enterprise data cloud company Cloudera recently announced the general availability (GA) of Cloudera DataFlow for the Public Cloud, a cloud-native service that runs data flows to process hybrid streaming workloads on the Cloudera Data Platform (CDP). With Cloudera DataFlow for the Public Cloud, customers can automate complex data flow operations, improve the operational efficiency of streaming data flows with auto-scaling capabilities, and reduce cloud costs by removing the guesswork from infrastructure sizing.
The service brings a FlowOps service with several capabilities, such as:
- Central Flow Catalog for manageability, discovery, and version control
- Central dashboard for monitoring, troubleshooting, and performance tuning of data flows across multiple cloud clusters
- Simple deployment wizard and robust APIs for auto-scaling flows on Kubernetes managed by CDP
- Pre-built flows called "ReadyFlows" for common streaming use cases
Under the hood, Cloudera DataFlow for the Public Cloud uses Kubernetes as the scalable runtime and provisions NiFi clusters on top of it as needed. The foundation is a new Kubernetes Operator, developed from the ground up to manage the lifecycle of Apache NiFi clusters on Kubernetes. The operator provisions clusters on request and, once provisioning is complete, takes care of the remaining lifecycle tasks, such as upgrading Apache NiFi to a new version or terminating a cluster.
Cloudera DataFlow for the Public Cloud (CDF-PC) users access the service through the hosted CDP Control Plane, which hosts critical components of CDF-PC such as the Catalog, the Dashboard, and the ReadyFlow Gallery.
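Kubernetes operators generally follow a reconcile pattern: they compare the desired state a user requests with the state actually running and then act to close the gap. The following is a minimal, hypothetical Python sketch of that pattern applied to the lifecycle steps mentioned above (provisioning, upgrading, terminating), plus scaling. It is not Cloudera's implementation. `NiFiClusterSpec`, `ObservedCluster`, and `reconcile` are invented names, and a real operator would watch Kubernetes custom resources and call the Kubernetes API rather than return descriptive strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NiFiClusterSpec:
    """Hypothetical desired state for a NiFi cluster, as a user might request it."""
    name: str
    nifi_version: str
    nodes: int
    terminate: bool = False


@dataclass
class ObservedCluster:
    """Hypothetical snapshot of the cluster currently running on Kubernetes."""
    nifi_version: str
    nodes: int


def reconcile(spec: NiFiClusterSpec, observed: Optional[ObservedCluster]) -> List[str]:
    """Return the actions needed to move the observed state toward the desired state."""
    # Termination requested: delete the cluster if it exists, otherwise nothing to do.
    if spec.terminate:
        return [f"delete cluster {spec.name}"] if observed is not None else []
    # No cluster running yet: provision one that matches the spec.
    if observed is None:
        return [f"provision cluster {spec.name} with {spec.nodes} nodes running NiFi {spec.nifi_version}"]
    actions: List[str] = []
    # Version drift: upgrade NiFi to the requested version.
    if observed.nifi_version != spec.nifi_version:
        actions.append(f"upgrade {spec.name} from NiFi {observed.nifi_version} to {spec.nifi_version}")
    # Size drift: scale the cluster to the requested node count.
    if observed.nodes != spec.nodes:
        actions.append(f"scale {spec.name} from {observed.nodes} to {spec.nodes} nodes")
    return actions
```

Calling `reconcile` repeatedly is idempotent by design: once the observed state matches the spec, it returns an empty list, which is what lets an operator run the loop continuously without side effects.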
Source: https://blog.cloudera.com/cloudera-dataflow-for-the-public-cloud-a-technical-deep-dive/
Today, many organizations use Apache NiFi to capture and process data across hybrid cloud architectures by visually designing no-code data flows. However, one of the challenges with Apache NiFi is that when multiple data flows are deployed into a single cluster, they compete for resources, which leads to performance issues. Some organizations mitigate this by provisioning more infrastructure than necessary, ending up with underutilized resources and higher costs. Other challenges include scaling flows and the lack of a central overview of them.
Dinesh Chandrasekhar, head of product marketing, Data-in-Motion at Cloudera, said in a Cloudera press release:
Cloudera DataFlow automates and manages cloud-native data flows on Kubernetes - and it is something only we offer. Now it is easy for our customers to boost the operational efficiency of their streaming workloads and save on infrastructure costs in the public cloud.
Initially, Cloudera DataFlow for the Public Cloud is available on Amazon Web Services (AWS), with Microsoft Azure to follow. Pricing details for Cloudera DataFlow for the Public Cloud are available on the pricing page.