InfoQ Homepage Database Content on InfoQ
-
Big Data Infrastructure @ LinkedIn
Shirshanka Das describes LinkedIn’s Big Data Infrastructure and its evolution through the years, including details on the motivation and architecture of Gobblin, Pinot and WhereHows.
-
Performance and Search
Dan Luu discusses how to estimate performance using back of the envelope calculations that can be done in minutes or hours, even for applications that take months or years to implement.
-
Scaling up Near Real-Time Analytics @Uber &LinkedIn
Chinmay Soman and Yi Pan discuss how Uber and LinkedIn use Apache Samza, Calcite and Pinot along with the analytics platform AthenaX to transform data to make it available for querying in minutes.
-
Real-Time Recommendations Using Spark Streaming
Elliot Chow discusses the data pipeline that they built with Kafka, Spark Streaming, and Cassandra to process Netflix user activities in real time for the Trending Now row.
-
Stream Processing & Analytics with Flink @Uber
Danny Yuan discusses how Uber builds its next generation of stream processing system to support real-time analytics as well as complex event processing.
-
Demistifying DynamoDB Streams
Akshat Vig and Khawaja Shams discuss DynamoDB Streams and what it takes to build an ordered, highly available, durable, performant, and scalable replicated log stream.
-
Building a Data Science Capability from Scratch
Victor Hu covers the challenges, both technical and cultural, of building a data science team and capability in a large, global company.
-
Data Cleansing and Understanding Best Practices
Casey Stella talks about discovering missing values, values with skewed distributions and likely errors within data, as well as a novel approach to finding data interconnectedness.
-
SQL Server on Linux: Will it Perform or Not?
Slava Oks talks about SQL Server’s history, high-level architecture and dives into core of I/O Manager, Memory Manager, and Scheduler. Topics include lessons learned and experiences behind the scenes.
-
Practical Data Synchronization Using CRDTs
Dmitry Ivanov discusses the basic CRDTs implementations in Scala, explaining the advantages of these data structures to solve many synchronization problems as well as their limitations.
-
ScyllaDB: Achieving No-Compromise Performance
Avi Kivity discusses ScyllaDB, the many necessary design decisions, from the programming language and programming model through low-level details and up to the advanced cache design, and more.
-
Data Science in the Cloud @StitchFix
Stefan Krawczyk discusses how StitchFix used the cloud to enable over 80 data scientists to be productive and have easy access, covering prototyping, algorithms used, keeping schema in sync, etc.