Data Content on InfoQ
-
1BRC–Nerd Sniping the Java Community
Gunnar Morling discusses some of the tricks employed by the fastest solutions for processing a 13 GB input file in under two seconds through parallelization and efficient memory access.
-
Architecting for Data Products
Danilo Sato discusses what constitutes a data product, the different types of data products, how data products support data architecture at different levels, and the skills and team topologies needed.
-
Incremental Data Processing with Apache Hudi
The presenters introduce incremental data processing, contrasting it with the two prevalent processing models of today: batch and stream data processing.
-
Understanding Architectures for Multi-Region Data Residency
Alex Strachan discusses the challenges of building multi-region data stores: why and when a business needs to do this, who the real stakeholders are, and who owns what.
-
Multi-Region Data Streaming with Redpanda
Michał Maślanka introduces the design of Redpanda’s Multi-Region feature, and describes how they leveraged Raft’s properties, a constraint solver, automatic data balancing, and tiered storage.
-
Graph Learning at the Scale of Modern Data Warehouses
Subramanya Dulloor outlines an approach to addressing the challenges of warehouses and shows how to build an efficient and scalable end-to-end system for graph learning in data warehouses.
-
How Netflix Ensures Highly-Reliable Online Stateful Systems
Joseph Lynch discusses the architecture of Netflix's stateful caches and databases, including how they capacity plan, bulkhead, and deploy software to their global, full-active, data topology.
-
Ephemeral Execution is the Future of Computing, but What about the Data?
Jerop Kipruto and Christie Warwick use Tekton to explore challenges of data gravity in ephemeral execution, discussing clean container injection mechanisms and a secure server interface.
-
Improve Feature Freshness in Large Scale ML Data Processing
Zhongliang Liang covers the impact of feature freshness on model performance and discusses strategies and techniques for improving it.
-
The Rise of the Serverless Data Architectures
Gwen Shapira explores the implications of serverless workloads on the design of data stores, and the evolution of data architectures toward more flexible scalability.
-
Building High-Fidelity Data Streams
Sid Anand discusses how they built a lossless streaming data system that guarantees sub-second (p95) event delivery at scale with better than three nines availability.
-
What is Derived Data? (and Do You Already Have Any?)
Felix GV explains what derived data is and dives into four major use cases that fit in the derived data bucket: graphs, search, OLAP, and ML feature storage.