Big Data Content on InfoQ

Presentations

AI, ML & Data Engineering

Developing Real-time Data Pipelines with Apache Kafka

Joe Stein introduces developers to why and how to use Apache Kafka, a publish-subscribe messaging system rethought as a distributed commit log.

Joe Stein
on Mar 04, 2016

01:30:26
AI, ML & Data Engineering

Apache Spark for Big Data Processing

Ilayaperumal Gopinathan and Ludwine Probst discuss Spark and its ecosystem, in particular Spark Streaming and MLlib, providing a concrete example, and showing how to use Spark with Spring XD.

Ludwine Probst, Ilayaperumal Gopinathan
on Feb 14, 2016

01:24:27
AI, ML & Data Engineering

The Lego Model for Machine Learning Pipelines

Leah McGuire describes the machine learning platform Salesforce wrote on top of Spark to modularize data cleaning and feature engineering.

Leah McGuire
on Jan 16, 2016

49:07
Java

Tuning Java for Big Data

Scott Seighman discusses the causes of common performance issues in Big Data environments, covering heap size, garbage collection, JVM reuse, tuning guidelines, and Big Data performance analysis tools.

Scott Seighman
on Oct 28, 2015

54:52
AI, ML & Data Engineering

Ground-up Introduction to In-memory Data

Viktor Gamov covers In-Memory technology, distributed data topologies, making in-memory reliable, scalable and durable, when to use NoSQL, and techniques for Big In-Memory Data.

Viktor Gamov
on Oct 10, 2015

44:53
AI, ML & Data Engineering

Pulsar: Real-time Analytics at Scale

Sharad Murthy & Tony Ng present Pulsar, a real-time streaming system which can scale to millions of events per second with high availability and 4GL language support.

Tony Ng, Sharad Murthy
on Sep 13, 2015

44:41
Development

Exploratory Data Analysis with R

Matthew Renze introduces the R programming language and demonstrates how R can be used for exploratory data analysis.

Matthew Renze
on Sep 13, 2015

48:35
AI, ML & Data Engineering

Spreadsheets for Developers

Felienne Hermans presents various algorithms that outline the power of Excel, showing that spreadsheets are fit for TDD and rapid prototyping.

Felienne Hermans
on Sep 11, 2015

01:21:47
AI, ML & Data Engineering

The Many Faces of Apache Kafka: How is Kafka Used in Practice

Neha Narkhede discusses how companies are using Apache Kafka and where it fits in the Big Data ecosystem.

Neha Narkhede
on Aug 27, 2015

42:09
Financial Modeling with Apache Spark: Calculating Value at Risk

Sandy Ryza aims to give a feel for what it is like to approach financial modeling with modern big data tools, using the Monte Carlo method for a basic VaR calculation with Spark.

Sandy Ryza
on Jul 12, 2015

42:33
Lightning Fast Cluster Computing with Spark and Cassandra

Piotr Kołaczkowski discusses integrating Spark with Cassandra: how it was done, how it works in practice, and why it is better than using a Hadoop intermediate layer.

Piotr Kołaczkowski
on Jun 17, 2015

49:53
Translating Imperative Code to MapReduce

The authors present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework using Mold, translating Java code to run on Apache Spark.

Cosmin Radoi, Manu Sridharan, Stephen J Fink, Rodric Rabbah
on Jun 10, 2015

19:02