Infrastructure Content on InfoQ
-
Real-Time Recommendations Using Spark Streaming
Elliot Chow discusses the data pipeline that they built with Kafka, Spark Streaming, and Cassandra to process Netflix user activities in real time for the Trending Now row.
-
Building a Data Science Capability from Scratch
Victor Hu covers the challenges, both technical and cultural, of building a data science team and capability in a large, global company.
-
Data Science in the Cloud @StitchFix
Stefan Krawczyk discusses how StitchFix used the cloud to keep over 80 data scientists productive and give them easy access, covering prototyping, the algorithms used, and keeping schemas in sync.
-
Petabyte-Scale Analytics Infrastructure @Netflix
Tom Gianos and Dan Weeks discuss Netflix's overall big data platform architecture, focusing on storage and orchestration, and how they use Parquet on AWS S3 as the storage layer of their data warehouse.
-
Big Data in the Real World: Technology and Use Cases
Mike Olson presents several use cases from the automotive, insurance, financial, and other sectors in which big data is collected and analyzed to gain insights.
-
Using Bayesian Optimization to Tune Machine Learning Models
Scott Clark introduces Bayesian Global Optimization as an efficient way to optimize ML model parameters, explaining the underlying techniques and comparing it to other standard methods.
-
Machine Learning and End-to-End Data Analysis Processes in Spark Using Python and R
Debraj GuhaThakurta discusses ML and data analysis processes in Spark using examples written in Python and R.
-
Scaling the Data Infrastructure @Spotify
Mārtiņš Kalvāns and Matti Pehrs give an overview of the data infrastructure at Spotify, diving into several of its components, such as Event Delivery, Datamon, and Styx.
-
Scaling Counting Infrastructure @Quora
Chun-Ho Hung and Nikhil Garg discuss Quanta, Quora's counting system powering their high-volume near-real-time analytics, describing the architecture, design goals, constraints, and choices made.
-
Orchestrate All the Things! with Spring Cloud Data Flow
Eric Bottard and Ilayaperumal Gopinathan discuss easy composition of microservices with Spring Cloud Data Flow.
-
Java (SE) State of the Union
Gil Tene presents the current state of Java SE and OpenJDK, covering the role of Java in big data and infrastructure components, the JCP, the ecosystem, and current trends.
-
Cloud Native Streaming and Event-driven Microservices
Marius Bogoevici demonstrates how to create complex data processing pipelines that bridge big data and enterprise integration, and how to orchestrate them with Spring Cloud Data Flow.