AI, ML & Data Engineering Content on InfoQ
-
Relational Data at the Edge
Justin Kwan and Vignesh Ravichandran discuss Cloudflare’s edge database architecture, unique challenges and practices for data replication, failover and recovery, and custom performance techniques.
-
Redesigning OLTP for a New Order of Magnitude
Joran Greef discusses TigerBeetle, a new database, and explains why OLTP has a growing impedance mismatch, why OLTP workloads are becoming more contentious, and why row locks, storage faults, and write stalls matter.
-
Enabling Remote Query Execution through DuckDB Extensions
Stephanie Wang focuses on DuckDB’s extension model and on query planning and execution as a use case of that model.
-
Multi-Region Data Streaming with Redpanda
Michał Maślanka introduces the design of Redpanda’s Multi-Region feature, and describes how they leveraged Raft’s properties, a constraint solver, automatic data balancing, and tiered storage.
-
In-Process Analytical Data Management with DuckDB
Hannes Mühleisen discusses DuckDB, an analytical data management system that is built for an in-process use case. DuckDB speaks SQL, is integrated as a library, and uses query processing techniques.
-
Going beyond the Case of Black Box AutoML
Kiran Kate covers the basics of AutoML and then presents Lale (https://github.com/IBM/lale), an open-source scikit-learn compatible AutoML library which implements Gradual AutoML.
-
Graph Learning at the Scale of Modern Data Warehouses
Subramanya Dulloor outlines an approach to addressing the challenges of graph learning on data warehouses and shows how to build an efficient, scalable end-to-end system for it.
-
How Netflix Ensures Highly-Reliable Online Stateful Systems
Joseph Lynch discusses the architecture of Netflix's stateful caches and databases, including how they capacity plan, bulkhead, and deploy software to their global, full-active data topology.
-
PRQL: a Simple, Powerful, Pipelined SQL Replacement
Aljaž Mur Eržen discusses PRQL, a language that compiles to most SQL dialects, which makes it portable and reusable, both important factors for OLAP.
-
Ephemeral Execution is the Future of Computing, but What about the Data?
Jerop Kipruto and Christie Warwick use Tekton to explore challenges of data gravity in ephemeral execution, discussing clean container injection mechanisms and a secure server interface.
-
Simplifying Real-Time ML Pipelines with Quix Streams
Tomáš Neubauer discusses Quix Streams, an open-source Python library that helps data scientists and ML engineers build real-time ML pipelines.
-
Improve Feature Freshness in Large Scale ML Data Processing
Zhongliang Liang covers the impact of feature freshness on model performance and discusses strategies and techniques for improving it.