Data Pipelines Content on InfoQ

News

Architecture & Design

Yelp Rebuilds Corrupted Cassandra Cluster Using Its Data Streaming Architecture

Yelp built a solution to sanitize data from a corrupted Apache Cassandra cluster using its data streaming architecture. The team explored many options for addressing the data corruption but ultimately had to move the data into a new cluster, removing corrupted records in the process.

Rafal Gancarz
on Jul 17, 2023

Architecture & Design

Instacart Creates a Self-Serve Apache Flink Platform on Kubernetes

Instacart moved its Apache Flink workloads from AWS EMR to Kubernetes to meet high internal demand for Flink-based data processing, after EMR became problematic for many teams with differing requirements. As a result, the company made the platform easier to use and reduced its operational and infrastructure costs.

Rafal Gancarz
on Jul 12, 2023

AI, ML & Data Engineering

Strategies and Principles to Scale and Evolve MLOps - at QCon London

At the QCon London conference, Hien Luu, senior engineering manager for the Machine Learning Platform at DoorDash, discussed strategies and principles for scaling and evolving MLOps. With 85% of ML projects failing, understanding MLOps at an engineering level is crucial. Luu shared three core principles: "Dream Big, Start Small," "1% Better Every Day," and "Customer Obsession."

Roland Meertens
on Apr 02, 2023

Cloud

AWS Publishes Reference Architecture and Implementations for Deployment Pipelines

AWS recently released a reference architecture and a set of reference implementations for deployment pipelines. The recommended architectural patterns are based on best practices and lessons learned at Amazon and in customer projects.

Renato Losio
on Feb 18, 2023

Cloud

AWS Glue Now Supports Crawler History

AWS recently launched support for AWS Glue Crawler history, which lets users inspect Crawler executions and the associated schema changes for the last 12 months.

Nsikan Essien
on Sep 19, 2022

Architecture & Design

Data Collection, Standardization and Usage at Scale in the Uber Rider App

Uber Engineering recently described how it collects, standardizes, and uses data from the Uber Rider app. Rider data comprises all of a rider's interactions with the Uber app and accounts for billions of events from Uber's online systems every day. Uber uses this data to address top problem areas such as increasing funnel conversion and user engagement.

Eran Stiller
on Sep 22, 2021

Development

QCon Plus November 2021 is Now Hybrid. Attend Online and In-Person (NY & SF)

The QCon Plus software development conference will be back November 1-5, 2021 - online and in-person. Get the chance to engage and network with professionals driving change and innovation inside the world’s most innovative software organizations.

Adelina Turcu
on Jul 17, 2021

Architecture & Design

Designing for Failure in the BBC's Analytics Platform

Last week at InfoQ Live, Blanca Garcia-Gil, principal systems engineer at the BBC, gave a session on Evolving Analytics in the Data Platform. In the session, Garcia-Gil focused on how her team prepared and designed for two types of failure: "known unknowns" and "unknown unknowns."

Eran Stiller
on Feb 24, 2021

Architecture & Design

PayPal Standardizes on Apache Airflow and Apache Gobblin for Its Next-Gen Data Movement Platform

PayPal recently described how it standardized on Apache Airflow and Apache Gobblin for implementing its next-gen data movement platform. In a recent blog post, PayPal engineers detail how the existing data movement platform evolved into a complex and unmanageable ecosystem of many tools and platforms, and describe their shift towards a new implementation.

Eran Stiller
on Feb 10, 2021

Architecture & Design

Data Mesh Principles and Logical Architecture Defined

The concept of a data mesh provides new ways to address common problems around managing data at scale. Zhamak Dehghani has provided additional clarity around the four principles of a data mesh, with a corresponding logical architecture and organizational structure.

Thomas Betts
on Dec 14, 2020

AI, ML & Data Engineering

Accelerating Machine Learning Lifecycle with a Feature Store

A feature store is a core part of next-generation ML platforms that helps data scientists accelerate the delivery of ML applications. Mike Del Balso and Geoff Sims recently spoke at the Spark AI Summit 2020 conference about feature-store-driven ML development.

Srini Penchikala
on Jul 20, 2020

Cloud

Amazon Introduces the New Streaming ETL Feature on AWS Glue

Amazon recently announced that AWS Glue now supports streaming ETL. With this new feature, customers can set up continuous ingestion pipelines that prepare streaming data on the fly and make it available for analysis in seconds.

Steef-Jan Wiggers
on May 02, 2020

Cloud

KSQL Now Available on Confluent Cloud

KSQL is the streaming SQL engine for Apache Kafka, and it is now available as a fully managed service on the Confluent Cloud Platform for all customers on usage-based billing plans. In a recent blog post, Confluent announced the availability of Confluent Cloud KSQL.

Steef-Jan Wiggers
on Apr 25, 2020

AI, ML & Data Engineering

Michael Berthold on End-to-End Data Science Using KNIME Software

Michael Berthold, CEO and co-founder of the open source data analytics platform KNIME, gave the keynote presentation at the KNIME Fall Summit 2019 conference. He spoke about the end-to-end data science cycle, which mainly comprises two categories of work: create and productionize.

Srini Penchikala
on Nov 30, 2019

AI, ML & Data Engineering

High-Performance Data Processing with Spring Cloud Data Flow and Geode

Cahlen Humphreys and Tiffany Chang spoke recently at the SpringOne Platform 2019 conference about data processing with the Spring Cloud Data Flow and Apache Geode frameworks.

Srini Penchikala
on Oct 30, 2019
