AI, ML & Data Engineering Content on InfoQ
-
In-Process Analytical Data Management with DuckDB
Hannes Mühleisen discusses DuckDB, an analytical data management system built for in-process use. DuckDB speaks SQL, is integrated as a library, and uses vectorized, columnar query processing techniques.
-
Going beyond the Case of Black Box AutoML
Kiran Kate covers the basics of AutoML and then presents Lale (https://github.com/IBM/lale), an open-source, scikit-learn-compatible AutoML library that implements Gradual AutoML.
-
Graph Learning at the Scale of Modern Data Warehouses
Subramanya Dulloor outlines an approach to the challenges of graph learning on warehouse data and shows how to build an efficient, scalable end-to-end system for graph learning in data warehouses.
-
How Netflix Ensures Highly-Reliable Online Stateful Systems
Joseph Lynch discusses the architecture of Netflix's stateful caches and databases, including how they plan capacity, bulkhead, and deploy software to their global, full-active data topology.
-
PRQL: a Simple, Powerful, Pipelined SQL Replacement
Aljaž Mur Eržen discusses PRQL, a language that compiles to most SQL dialects, which makes it portable and reusable, both important properties for OLAP.
-
Ephemeral Execution is the Future of Computing, but What about the Data?
Jerop Kipruto and Christie Warwick use Tekton to explore challenges of data gravity in ephemeral execution, discussing clean container injection mechanisms and a secure server interface.
-
Simplifying Real-Time ML Pipelines with Quix Streams
Tomáš Neubauer discusses Quix Streams, an open-source Python library that helps data scientists and ML engineers build real-time ML pipelines.
-
Improve Feature Freshness in Large Scale ML Data Processing
Zhongliang Liang covers the impact of feature freshness on model performance and discusses strategies and techniques for improving it.
-
The Rise of the Serverless Data Architectures
Gwen Shapira explores the implications of serverless workloads for the design of data stores, and the evolution of data architectures toward more flexible scalability.
-
Amazon DynamoDB Distributed Transactions at Scale
Akshat Vig explains how transactions were added to Amazon DynamoDB using a timestamp-based ordering protocol to achieve low latency for both transactional and non-transactional operations.
-
Responsible AI: from Principle to Practice!
Mehrnoosh Sameki discusses Responsible AI best practices to apply across the machine learning lifecycle and shares open-source tools for implementing Responsible AI in practice.
-
Needle in a 930M Member Haystack: People Search AI @LinkedIn
Mathew Teoh explores how LinkedIn's People Search system uses ML to surface the person you're looking for.