Apache Spark Content on InfoQ
-
Big Data Processing with Apache Spark - Part 4: Spark Machine Learning
In this fourth installment of the Apache Spark article series, author Srini Penchikala discusses machine learning concepts and the Spark MLlib library for running predictive analytics, using a sample application.
-
Big Data Processing with Apache Spark - Part 3: Spark Streaming
In this article, the third installment of the Apache Spark series, author Srini Penchikala discusses the Apache Spark Streaming framework for processing real-time streaming data, using a log analytics sample application.
-
Health Informatics and Survival Prediction of Cancer with Apache Spark Machine Learning Library
In this article, the author discusses the survival prediction of colorectal cancer as a multi-class classification problem and how to solve that problem using Apache Spark's MLlib Java API.
-
Big Data Processing with Apache Spark - Part 2: Spark SQL
Spark SQL, part of the Apache Spark big data framework, is used for structured data processing and allows running SQL-like queries on Spark data. In this article, Srini Penchikala discusses the Spark SQL module and how it simplifies running data analytics using a SQL interface. He also talks about the new features in Spark SQL, like DataFrames and JDBC data sources.
-
Big Data Processing with Apache Spark – Part 1: Introduction
Apache Spark is an open source big data processing framework built around speed, ease of use, and sophisticated analytics. In this article, Srini Penchikala talks about how the Apache Spark framework helps with big data processing and analytics through its standard API. He also discusses how Spark compares with traditional MapReduce implementations like Apache Hadoop.