Spark Content on InfoQ
-
Hunting Criminals with Hybrid Analytics
David Talby demonstrates building an ML model for fraud detection with Python libraries, scaling it up to billions of events using Spark, and what it took to make the system performant and ready for production.
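A minimal sketch of the scale-up step: training a fraud classifier with Spark MLlib over a large event set. The schema, column names, and paths are illustrative assumptions, not Talby's actual pipeline.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler

object FraudModelSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("fraud-model").getOrCreate()

    // Hypothetical schema: amount, merchantRisk, velocity, label (1.0 = fraud)
    val events = spark.read.parquet("hdfs:///events/transactions") // path is illustrative

    // Assemble raw columns into the single vector column MLlib expects
    val assembler = new VectorAssembler()
      .setInputCols(Array("amount", "merchantRisk", "velocity"))
      .setOutputCol("features")

    val model = new LogisticRegression()
      .setLabelCol("label")
      .fit(assembler.transform(events))

    model.write.overwrite().save("hdfs:///models/fraud-lr")
    spark.stop()
  }
}
```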
-
Apache Spark for Big Data Processing
Ilayaperumal Gopinathan and Ludwine Probst discuss Spark and its ecosystem, in particular Spark Streaming and MLlib, providing a concrete example, and showing how to use Spark with Spring XD.
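For the streaming side of that ecosystem, here is a minimal Spark Streaming sketch: micro-batch counts over a text stream. The socket source and batch interval are assumptions for illustration, not the example from the talk.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-sketch")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    // Hypothetical source: newline-delimited events arriving on a local socket
    val lines = ssc.socketTextStream("localhost", 9999)

    // Count occurrences of each token within every batch
    val counts = lines.flatMap(_.split("\\s+")).map((_, 1L)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```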
-
The Lego Model for Machine Learning Pipelines
Leah McGuire describes the machine learning platform Salesforce built on top of Spark to modularize data cleaning and feature engineering.
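The modular, "snap the bricks together" idea can be sketched with Spark ML's own Pipeline API, where each cleaning or feature step is a reusable stage. The stages and columns below are hypothetical and are not Salesforce's actual platform.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{Imputer, StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object PipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()
    val df = spark.read.parquet("hdfs:///data/leads") // hypothetical input

    // Each stage is an independent, swappable "brick"
    val impute = new Imputer().setInputCols(Array("age")).setOutputCols(Array("ageFilled"))
    val index  = new StringIndexer().setInputCol("industry").setOutputCol("industryIdx")
    val feats  = new VectorAssembler()
      .setInputCols(Array("ageFilled", "industryIdx"))
      .setOutputCol("features")
    val lr     = new LogisticRegression().setLabelCol("converted")

    // The Pipeline snaps the bricks together and fits them as one unit
    val model = new Pipeline().setStages(Array(impute, index, feats, lr)).fit(df)
    model.write.overwrite().save("hdfs:///models/lead-scoring")
    spark.stop()
  }
}
```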
-
Spring XD Today and Tomorrow
Mark Pollack discusses Spring XD and how its direction is driven by the wider big data ecosystem, covering Kafka, Spark, functional programming, Python integration, and designer/monitoring UIs.
-
How 30 Years of Ticket Transaction Data Helps you Discover New Shows!
Vaclav Petricek discusses how to train models and how to architect and build a scalable system, powered by Storm, Hadoop, Spark, Spring Boot and Vowpal Wabbit, that meets SLAs measured in tens of milliseconds.
-
Financial Modeling with Apache Spark: Calculating Value at Risk
Sandy Ryza aims to give a feel for what it is like to approach financial modeling with modern big data tools, using the Monte Carlo method for a basic VaR calculation with Spark.
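A minimal sketch of the idea: run Monte Carlo trials in parallel across the cluster and read the VaR off the loss distribution. The single normally distributed market factor and the numbers below are illustrative assumptions, not Ryza's actual model.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Random

object VaRSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("var-sketch").getOrCreate()
    val sc = spark.sparkContext

    val numTrials = 1000000
    val portfolioValue = 1e6 // assumed portfolio size

    // Each trial draws one market-factor shock and applies it to the portfolio
    val losses = sc.parallelize(1 to numTrials, 200).map { _ =>
      val rand = new Random()
      val marketReturn = rand.nextGaussian() * 0.02 // 2% daily volatility (assumed)
      -portfolioValue * marketReturn                // positive value = loss
    }

    // 95% VaR: the loss level exceeded in only 5% of trials
    val valueAtRisk = losses.top((numTrials * 0.05).toInt).min
    println(s"95% one-day VaR: $valueAtRisk")

    spark.stop()
  }
}
```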
-
Lightning Fast Cluster Computing with Spark and Cassandra
Piotr Kołaczkowski discusses how they integrated Spark with Cassandra, how it works in practice, and why it is better than using an intermediate Hadoop layer.
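A minimal sketch of that integration using the DataStax spark-cassandra-connector: read a table, aggregate with Spark, and write the result back to Cassandra directly. The keyspace, tables, and columns are made up for illustration.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CassandraSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cassandra-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Read rows directly from Cassandra; the connector maps token ranges to Spark partitions
    val purchases = sc.cassandraTable("shop", "purchases")

    // Aggregate and write the result back to another table, with no Hadoop layer in between
    purchases
      .map(row => (row.getString("customer_id"), row.getDouble("amount")))
      .reduceByKey(_ + _)
      .saveToCassandra("shop", "customer_totals", SomeColumns("customer_id", "total"))

    sc.stop()
  }
}
```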
-
A Taste of Random Decision Forests on Apache Spark
Sean Owen introduces Spark, Scala and random decision forests, and demonstrates the process of analyzing a real-world data set with them.
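A minimal sketch of training a random decision forest with Spark ML, assuming a data set already in libsvm form with standard "label" and "features" columns; the file path, tree count, and depth are illustrative and not Owen's actual settings.

```scala
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.sql.SparkSession

object ForestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("forest-sketch").getOrCreate()

    val data = spark.read.format("libsvm").load("data/sample_libsvm_data.txt")
    val Array(train, test) = data.randomSplit(Array(0.9, 0.1))

    // A forest of 20 randomized trees, each limited to depth 10
    val model = new RandomForestClassifier()
      .setNumTrees(20)
      .setMaxDepth(10)
      .fit(train)

    // Evaluate on the held-out split
    val accuracy = new MulticlassClassificationEvaluator()
      .setMetricName("accuracy")
      .evaluate(model.transform(test))
    println(s"Held-out accuracy: $accuracy")

    spark.stop()
  }
}
```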
-
Better Together - Using Spark and Redshift to Combine Your Data with Public Datasets
Eugene Mandel discusses the challenges of conforming data sources and compares two processing stacks, Hadoop+Redshift versus Spark, showing how the choice of technology drives the way the problem is modeled.
-
Why Spark Is the Next Top (Compute) Model
Dean Wampler argues that Spark/Scala is a better data processing engine than MapReduce/Java because tools inspired by mathematics, such as functional programming, are ideal for working with data.
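The canonical word count illustrates the point: in Spark it is a short chain of functional transformations, whereas the classic MapReduce/Java version needs a Mapper, a Reducer, and a driver class. The paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("word-count").getOrCreate()
    val sc = spark.sparkContext

    // Split lines into words, pair each with 1, and sum the counts per word
    sc.textFile("hdfs:///corpus/*.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///out/word-counts")

    spark.stop()
  }
}
```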
-
Unified Big Data Processing with Apache Spark
Matei Zaharia talks about the latest developments in Spark and shows examples of how it can combine processing algorithms to build rich data pipelines in just a few lines of code.
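A small sketch of what "combining processing in a few lines" can look like: a Spark SQL aggregation feeding an MLlib clustering step in the same job. The input data, table, and column names are assumptions, not Zaharia's example.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object UnifiedPipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("unified-sketch").getOrCreate()

    spark.read.json("hdfs:///logs/visits.json").createOrReplaceTempView("visits")

    // SQL step: aggregate raw events per user
    val perUser = spark.sql(
      "SELECT userId, COUNT(*) AS visits, AVG(duration) AS avgDuration " +
      "FROM visits GROUP BY userId")

    // MLlib step: cluster users on the aggregated features, in the same job
    val features = new VectorAssembler()
      .setInputCols(Array("visits", "avgDuration"))
      .setOutputCol("features")
      .transform(perUser)

    val clusters = new KMeans().setK(5).fit(features)
    clusters.transform(features).show()

    spark.stop()
  }
}
```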
-
Apache Spark Plus Many Other Frameworks: How Spark Fits into the Big Data Landscape
Paco Nathan keynotes on how Spark fits into the big data landscape, describing which other systems work with Spark and why Spark will continue to be needed going forward.