Data Pipelines Content on InfoQ
-
How Agoda Unified Multiple Data Pipelines into a Single Source of Truth
Agoda recently described how it consolidated multiple independent data pipelines into a centralized Apache Spark-based platform to eliminate inconsistencies in financial data. The company implemented a multi-layered quality framework that combines automated validations, machine-learning-based anomaly detection, and data contracts, while processing millions of daily booking transactions.
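Agoda's framework itself is internal, but a minimal PySpark sketch illustrates the kind of automated validation such a quality layer runs; the table and column names below are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("booking-quality-checks").getOrCreate()

# Hypothetical tables: the consolidated ledger and a derived reporting view.
source = spark.table("finance.bookings_ledger")
derived = spark.table("reporting.bookings_daily")

# Check 1: row-count reconciliation between the two layers.
src_count, dst_count = source.count(), derived.count()
assert src_count == dst_count, f"row count mismatch: {src_count} vs {dst_count}"

# Check 2: no null booking IDs or negative amounts downstream.
bad_rows = derived.filter(
    F.col("booking_id").isNull() | (F.col("amount") < 0)
).count()
assert bad_rows == 0, f"{bad_rows} rows failed validation"
```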
-
Solving Fragmented Mobile Analytics: Uber’s Platform-Led Approach
Uber Engineering outlines its platform-led mobile analytics redesign, standardizing event instrumentation across iOS and Android to improve cross-platform consistency, reduce engineering effort, and provide reliable insights for product and data teams.
-
Cloudflare Workflows Adds Python Support for Durable AI Pipelines
Cloudflare Workflows now supports Python alongside TypeScript, letting developers orchestrate complex, multi-step applications. With durable execution and state persistence, Workflows simplifies building reliable data pipelines and AI/ML workloads, and brings its orchestration model and concurrency features to Python developers.
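A minimal Python Workflow sketch, assuming the documented `@step.do` step pattern; the step names and data are illustrative:

```python
from workers import WorkflowEntrypoint

class DataPipeline(WorkflowEntrypoint):
    async def run(self, event, step):
        @step.do("extract")
        async def extract():
            # Stand-in for fetching source records.
            return [1, 2, 3]

        records = await extract()

        @step.do("transform")
        async def transform():
            # Step results are persisted, so a retried run resumes
            # here without re-running "extract".
            return [r * 2 for r in records]

        await transform()
```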
-
Inside Atlassian Lithium: How a Dynamic ETL Platform is Transforming Data Movement and Cutting Costs
Atlassian recently introduced Lithium, an in-house ETL platform designed to meet the requirements of dynamic data movement. Lithium streamlines tasks such as cloud migrations, scheduled backups, and in-flight data validations by supporting ephemeral pipelines and tenant-level isolation while ensuring efficiency and scalability, resulting in significant cost savings.
-
Netflix Enhances Metaflow with New Configuration Capabilities
Netflix has introduced a significant enhancement to its Metaflow machine learning infrastructure: a new Config object that brings powerful configuration management to ML workflows. This addition addresses a common challenge faced by Netflix's teams, which manage thousands of unique Metaflow flows across diverse ML and AI use cases.
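In outline, a flow declares a `Config` as a class attribute and reads it like any other artifact; the `config.json` key below is illustrative:

```python
from metaflow import FlowSpec, Config, step

class TrainFlow(FlowSpec):
    # Loads config.json when the flow is deployed; values are then
    # frozen and versioned together with the run.
    config = Config("config", default="config.json")

    @step
    def start(self):
        print("learning rate:", self.config.learning_rate)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    TrainFlow()
```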
-
How Allegro Reduced the Cost of Running a GCP Dataflow Pipeline by 60%
Allegro cut the cost of one of its Dataflow pipelines running on GCP by 60%. The company continues to improve the cost-effectiveness of its data workflows by evaluating resource utilization, tuning pipeline configurations, optimizing input and output datasets, and improving storage strategies.
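The exact changes are specific to Allegro's jobs, but worker sizing is one of the levers; a sketch of cost-oriented pipeline options in the Beam Python SDK, with illustrative values:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Illustrative knobs only; Allegro's savings also came from dataset
# and storage optimizations, not just worker settings.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",          # hypothetical project ID
    region="europe-west1",
    machine_type="e2-standard-2",  # right-size workers for the job
    max_num_workers=10,            # cap autoscaling to bound cost
    disk_size_gb=50,               # shrink the default worker disk
)

with beam.Pipeline(options=options) as p:
    pass  # pipeline transforms go here
```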
-
Canva Opts for Amazon KDS over SNS+SQS to Save 85% with 25 Billion Events per Day
Canva evaluated different data messaging solutions for its Product Analytics Platform, including the combination of AWS SNS and SQS, Amazon MSK, and Amazon KDS, and eventually chose KDS, primarily because of its much lower cost. The company compared many aspects of these solutions, including performance, maintenance effort, and cost.
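For scale context, producing to KDS is a thin API: events are batched into `PutRecords` calls of up to 500 records each. A boto3 sketch with a hypothetical stream name and event shape:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Hypothetical analytics events.
events = [{"event": "page_view", "user": i} for i in range(100)]

response = kinesis.put_records(
    StreamName="product-analytics",  # hypothetical stream name
    Records=[
        {"Data": json.dumps(e).encode(), "PartitionKey": str(e["user"])}
        for e in events
    ],
)
print("failed records:", response["FailedRecordCount"])
```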
-
Local Emulator for Azure Event Hubs in Preview: Offering Developers a Local Development Experience
Microsoft recently launched a preview release of a local emulator for Azure Event Hubs. The emulator is designed to give developers a local development experience for Azure Event Hubs, allowing them to develop and test code against the service in isolation.
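Code targeting the emulator uses the regular SDK with a local connection string; the format below follows Microsoft's documented emulator string, with an illustrative key and hub name:

```python
from azure.eventhub import EventData, EventHubProducerClient

# Documented emulator connection-string format; the key and hub name
# are illustrative and must match the emulator's configuration.
CONN_STR = (
    "Endpoint=sb://localhost;"
    "SharedAccessKeyName=RootManageSharedAccessKey;"
    "SharedAccessKey=SAS_KEY_VALUE;"
    "UseDevelopmentEmulator=true;"
)

producer = EventHubProducerClient.from_connection_string(
    CONN_STR, eventhub_name="eh1"
)
with producer:
    batch = producer.create_batch()
    batch.add(EventData("test event"))
    producer.send_batch(batch)
```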
-
Yelp Overhauls Its Streaming Architecture with Apache Beam and Apache Flink
Yelp reworked its data streaming architecture by employing Apache Beam and Apache Flink. The company replaced a fragmented set of pipelines that streamed transactional data into its analytical systems, such as Amazon Redshift and an in-house data lake, with a unified and flexible solution built on the two Apache streaming projects.
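The shape of such a pipeline in the Beam Python SDK, reduced to a sketch (broker, topic, and sink are stand-ins; Yelp's real sinks are Redshift and its data lake):

```python
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (
        p
        # ReadFromKafka is a cross-language transform and needs a
        # Java expansion service available at runtime.
        | ReadFromKafka(
            consumer_config={"bootstrap.servers": "broker:9092"},
            topics=["transactions"],
        )
        | beam.Map(lambda kv: kv[1].decode("utf-8"))  # keep message value
        | beam.Map(print)  # stand-in for the Redshift/data-lake sinks
    )
```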
-
Netflix Creates Incremental Processing Solution Using Maestro and Apache Iceberg
Netflix created a new solution for incremental processing in its data platform. The incremental approach reduces the cost of computing resources and execution time significantly as it avoids processing complete datasets. The company used its Maestro workflow engine and Apache Iceberg to improve data freshness and accuracy and plans to provide managed backfill capabilities.
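The Iceberg primitive underneath is an incremental read between table snapshots, which is what lets a run skip already-processed data; a PySpark sketch with placeholder snapshot IDs and table name:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("incremental-read").getOrCreate()

# Read only rows appended between two table snapshots instead of
# scanning the full dataset.
increment = (
    spark.read.format("iceberg")
    .option("start-snapshot-id", "1000")
    .option("end-snapshot-id", "2000")
    .load("db.events")
)
increment.show()
```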
-
Goldsky’s Streaming-First Architecture for Blockchain Data with Flink, Redpanda and Kubernetes
Goldsky created a platform for the real-time processing of blockchain data. The platform allows clients to extract data from blockchains into their own databases to support product features, but without running the data pipeline infrastructure. The event-driven architecture (EDA) of Goldsky leverages Apache Flink, Redpanda, Kubernetes, and cloud provider services.
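Because Redpanda speaks the Kafka protocol, a Flink job consumes it through the standard Kafka connector; a PyFlink sketch with a hypothetical broker and topic (the Kafka connector jar must be on the classpath):

```python
from pyflink.common import WatermarkStrategy
from pyflink.common.serialization import SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors.kafka import (
    KafkaOffsetsInitializer,
    KafkaSource,
)

env = StreamExecutionEnvironment.get_execution_environment()

source = (
    KafkaSource.builder()
    .set_bootstrap_servers("redpanda:9092")   # hypothetical broker
    .set_topics("chain.blocks")               # hypothetical topic
    .set_group_id("blocks-consumer")
    .set_starting_offsets(KafkaOffsetsInitializer.earliest())
    .set_value_only_deserializer(SimpleStringSchema())
    .build()
)

stream = env.from_source(source, WatermarkStrategy.no_watermarks(), "blocks")
stream.print()  # stand-in for writing into a customer database
env.execute("blockchain-stream-sketch")
```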
-
A Modern Compute Stack for Scaling Large AI, ML, & LLM Workloads at QCon SF
Jules Damji, a lead developer advocate at Anyscale Inc., discussed the difficulties data scientists encounter when managing infrastructure for machine learning models. He emphasized the necessity for a framework that supports the latest machine learning libraries, is easily manageable, and can scale to accommodate large datasets and models. Damji introduced Ray as a potential solution.
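Ray's core abstraction is small: decorate a function to make it a distributed task, then fan work out across the cluster. A minimal sketch:

```python
import ray

ray.init()  # on a cluster, this connects to the head node

@ray.remote
def score(batch):
    # Stand-in for model inference over one shard of a dataset.
    return sum(batch)

# Fan out ten tasks and gather their results.
futures = [score.remote(list(range(i, i + 10))) for i in range(0, 100, 10)]
print(ray.get(futures))
```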
-
Confluent Announces Apache Flink on Confluent Cloud in Open Preview
Confluent recently announced the open preview of Apache Flink on Confluent Cloud as a fully managed service for stream processing. The company claims the managed service will make it easier for organizations to filter, join, and enrich data streams with Flink.
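On Confluent Cloud the service is driven through Flink SQL; the same filter-and-enrich pattern can be sketched locally with PyFlink and a stand-in source (the table and columns are hypothetical):

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Stand-in source; on Confluent Cloud a table maps to a Kafka topic.
t_env.execute_sql("""
    CREATE TEMPORARY TABLE orders (
        user_id INT,
        amount DOUBLE
    ) WITH ('connector' = 'datagen', 'rows-per-second' = '1')
""")

# Filter and enrich the stream with SQL.
result = t_env.sql_query(
    "SELECT user_id, amount * 1.2 AS amount_with_tax "
    "FROM orders WHERE amount > 0.5"
)
result.execute().print()
```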
-
Running Apache Flink Applications on AWS KDA: Lessons Learnt at Deliveroo
Deliveroo introduced Apache Flink into its technology stack for enriching and merging events consumed from Apache Kafka or Kinesis Streams. The company opted to use AWS Kinesis Data Analytics (KDA) service to manage Apache Flink clusters on AWS and shared its experiences from running Flink applications on KDA.
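Operationally, KDA takes a packaged Flink job from S3 and runs the cluster for you; a boto3 sketch with illustrative names, ARNs, and runtime version:

```python
import boto3

kda = boto3.client("kinesisanalyticsv2")

# All names, ARNs, and the runtime version below are illustrative.
kda.create_application(
    ApplicationName="event-enrichment",
    RuntimeEnvironment="FLINK-1_15",
    ServiceExecutionRole="arn:aws:iam::123456789012:role/kda-role",
    ApplicationConfiguration={
        "ApplicationCodeConfiguration": {
            "CodeContent": {
                "S3ContentLocation": {
                    "BucketARN": "arn:aws:s3:::flink-artifacts",
                    "FileKey": "enrichment-job.jar",
                }
            },
            "CodeContentType": "ZIPFILE",
        }
    },
)
kda.start_application(ApplicationName="event-enrichment", RunConfiguration={})
```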
-
Pfizer Uses Serverless Architecture on AWS to Scale Processing of Digital Biomarkers
Pfizer upgraded its serverless architecture for processing digital biomarker data at scale to make it more flexible and configurable. The team created a framework that combines a file-processing pipeline built with AWS Step Functions and other serverless services with a custom Python package for data ingestion and processing.
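In such a pipeline an execution is typically started per arriving file; a boto3 sketch with a hypothetical state machine ARN and payload:

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# State-machine ARN and input shape are hypothetical; the payload
# usually points at the newly arrived data object.
response = sfn.start_execution(
    stateMachineArn=(
        "arn:aws:states:us-east-1:123456789012:"
        "stateMachine:biomarker-file-pipeline"
    ),
    input=json.dumps({
        "bucket": "raw-sensor-data",
        "key": "device-123/2023-01-01.avro",
    }),
)
print(response["executionArn"])
```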