Spark Content on InfoQ
-
Hunting Criminals with Hybrid Analytics
David Talby demonstrates building an ML model for fraud detection with Python libraries, scaling it up to billions of events using Spark, and what it took to make the system performant and ready for production.
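A minimal sketch of the scale-up step: training a fraud classifier with Spark MLlib over a large event set. The schema, column names, and paths are illustrative assumptions, not Talby's actual pipeline.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler

object FraudModelSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("fraud-model").getOrCreate()

    // Hypothetical schema: amount, merchantRisk, velocity, label (1.0 = fraud)
    val events = spark.read.parquet("hdfs:///events/transactions") // path is illustrative

    // Assemble raw columns into the single vector column MLlib expects
    val assembler = new VectorAssembler()
      .setInputCols(Array("amount", "merchantRisk", "velocity"))
      .setOutputCol("features")

    val model = new LogisticRegression()
      .setLabelCol("label")
      .fit(assembler.transform(events))

    model.write.overwrite().save("hdfs:///models/fraud-lr")
    spark.stop()
  }
}
```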
-
Apache Spark for Big Data Processing
Ilayaperumal Gopinathan and Ludwine Probst discuss Spark and its ecosystem, in particular Spark Streaming and MLlib, providing a concrete example, and showing how to use Spark with Spring XD.
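For the streaming side of that ecosystem, here is a minimal Spark Streaming sketch: micro-batch counts over a text stream. The socket source and batch interval are assumptions for illustration, not the example from the talk.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-sketch")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    // Hypothetical source: newline-delimited events arriving on a local socket
    val lines = ssc.socketTextStream("localhost", 9999)

    // Count occurrences of each token within every batch
    val counts = lines.flatMap(_.split("\\s+")).map((_, 1L)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```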
-
The Lego Model for Machine Learning Pipelines
Leah McGuire describes the machine learning platform Salesforce built on top of Spark to modularize data cleaning and feature engineering.
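The modular, "snap the bricks together" idea can be sketched with Spark ML's own Pipeline API, where each cleaning or feature step is a reusable stage. The stages and columns below are hypothetical and are not Salesforce's actual platform.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{Imputer, StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object PipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()
    val df = spark.read.parquet("hdfs:///data/leads") // hypothetical input

    // Each stage is an independent, swappable "brick"
    val impute = new Imputer().setInputCols(Array("age")).setOutputCols(Array("ageFilled"))
    val index  = new StringIndexer().setInputCol("industry").setOutputCol("industryIdx")
    val feats  = new VectorAssembler()
      .setInputCols(Array("ageFilled", "industryIdx"))
      .setOutputCol("features")
    val lr     = new LogisticRegression().setLabelCol("converted")

    // The Pipeline snaps the bricks together and fits them as one unit
    val model = new Pipeline().setStages(Array(impute, index, feats, lr)).fit(df)
    model.write.overwrite().save("hdfs:///models/lead-scoring")
    spark.stop()
  }
}
```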
-
Spring XD Today and Tomorrow
Mark Pollack discusses Spring XD and how its direction is driven by the wider big data ecosystem, covering Kafka, Spark, functional programming, Python integration, and designer/monitoring UIs.
-
How 30 Years of Ticket Transaction Data Helps you Discover New Shows!
Vaclav Petricek discusses how to train models and how to architect and build a scalable system, powered by Storm, Hadoop, Spark, Spring Boot and Vowpal Wabbit, that meets SLAs measured in tens of milliseconds.
-
Financial Modeling with Apache Spark: Calculating Value at Risk
Sandy Ryza aims to give a feel for what it is like to approach financial modeling with modern big data tools, using the Monte Carlo method for a basic VaR calculation with Spark.
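A minimal sketch of the idea: run Monte Carlo trials in parallel across the cluster and read the VaR off the loss distribution. The single normally distributed market factor and the numbers below are illustrative assumptions, not Ryza's actual model.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Random

object VaRSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("var-sketch").getOrCreate()
    val sc = spark.sparkContext

    val numTrials = 1000000
    val portfolioValue = 1e6 // assumed portfolio size

    // Each trial draws one market-factor shock and applies it to the portfolio
    val losses = sc.parallelize(1 to numTrials, 200).map { _ =>
      val rand = new Random()
      val marketReturn = rand.nextGaussian() * 0.02 // 2% daily volatility (assumed)
      -portfolioValue * marketReturn                // positive value = loss
    }

    // 95% VaR: the loss level exceeded in only 5% of trials
    val valueAtRisk = losses.top((numTrials * 0.05).toInt).min
    println(s"95% one-day VaR: $valueAtRisk")

    spark.stop()
  }
}
```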
-
Lightning Fast Cluster Computing with Spark and Cassandra
Piotr Kołaczkowski discusses how they integrated Spark with Cassandra, how it works in practice, and why it is better than using an intermediate Hadoop layer.
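A minimal sketch of that integration using the DataStax spark-cassandra-connector: read a table, aggregate with Spark, and write the result back to Cassandra directly. The keyspace, tables, and columns are made up for illustration.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CassandraSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cassandra-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Read rows directly from Cassandra; the connector maps token ranges to Spark partitions
    val purchases = sc.cassandraTable("shop", "purchases")

    // Aggregate and write the result back to another table, with no Hadoop layer in between
    purchases
      .map(row => (row.getString("customer_id"), row.getDouble("amount")))
      .reduceByKey(_ + _)
      .saveToCassandra("shop", "customer_totals", SomeColumns("customer_id", "total"))

    sc.stop()
  }
}
```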
-
A Taste of Random Decision Forests on Apache Spark
Sean Owen introduces Spark, Scala and random decision forests, and demonstrates the process of analyzing a real-world data set with them.
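A minimal sketch of training a random decision forest with Spark ML, assuming a data set already in libsvm form with standard "label" and "features" columns; the file path, tree count, and depth are illustrative and not Owen's actual settings.

```scala
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.sql.SparkSession

object ForestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("forest-sketch").getOrCreate()

    val data = spark.read.format("libsvm").load("data/sample_libsvm_data.txt")
    val Array(train, test) = data.randomSplit(Array(0.9, 0.1))

    // A forest of 20 randomized trees, each limited to depth 10
    val model = new RandomForestClassifier()
      .setNumTrees(20)
      .setMaxDepth(10)
      .fit(train)

    // Evaluate on the held-out split
    val accuracy = new MulticlassClassificationEvaluator()
      .setMetricName("accuracy")
      .evaluate(model.transform(test))
    println(s"Held-out accuracy: $accuracy")

    spark.stop()
  }
}
```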
-
Better Together - Using Spark and Redshift to Combine Your Data with Public Datasets
Eugene Mandel discusses the challenges of conforming data sources and compares two processing stacks, Hadoop+Redshift versus Spark, showing how the choice of technology drives the way the problem is modeled.
-
Why Spark Is the Next Top (Compute) Model
Dean Wampler argues that Spark/Scala is a better data processing engine than MapReduce/Java because tools inspired by mathematics, such as functional programming, are ideal for working with data.
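The canonical word count illustrates the point: in Spark it is a short chain of functional transformations, whereas the classic MapReduce/Java version needs a Mapper, a Reducer, and a driver class. The paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("word-count").getOrCreate()
    val sc = spark.sparkContext

    // Split lines into words, pair each with 1, and sum the counts per word
    sc.textFile("hdfs:///corpus/*.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///out/word-counts")

    spark.stop()
  }
}
```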
-
Unified Big Data Processing with Apache Spark
Matei Zaharia talks about the latest developments in Spark and shows examples of how it can combine processing algorithms to build rich data pipelines in just a few lines of code.
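A small sketch of what "combining processing in a few lines" can look like: a Spark SQL aggregation feeding an MLlib clustering step in the same job. The input data, table, and column names are assumptions, not Zaharia's example.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object UnifiedPipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("unified-sketch").getOrCreate()

    spark.read.json("hdfs:///logs/visits.json").createOrReplaceTempView("visits")

    // SQL step: aggregate raw events per user
    val perUser = spark.sql(
      "SELECT userId, COUNT(*) AS visits, AVG(duration) AS avgDuration " +
      "FROM visits GROUP BY userId")

    // MLlib step: cluster users on the aggregated features, in the same job
    val features = new VectorAssembler()
      .setInputCols(Array("visits", "avgDuration"))
      .setOutputCol("features")
      .transform(perUser)

    val clusters = new KMeans().setK(5).fit(features)
    clusters.transform(features).show()

    spark.stop()
  }
}
```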
-
Apache Spark Plus Many Other Frameworks: How Spark Fits into the Big Data Landscape
Paco Nathan keynotes on how Spark fits into the big data landscape, describing which other systems work with Spark and why Spark will continue to be needed going forward.