InfoQ Homepage Streaming Content on InfoQ

Articles

AI, ML & Data Engineering

Big Data Processing with Apache Spark - Part 2: Spark SQL

Spark SQL, part of the Apache Spark big data framework, is used for structured data processing and allows running SQL-like queries on Spark data. In this article, Srini Penchikala discusses the Spark SQL module and how it simplifies running data analytics through a SQL interface. He also covers the new features in Spark SQL, such as DataFrames and JDBC data sources.

Srini Penchikala
on Apr 16, 2015
AI, ML & Data Engineering

Big Data Processing with Apache Spark – Part 1: Introduction

Apache Spark is an open source big data processing framework built around speed, ease of use, and sophisticated analytics. In this article, Srini Penchikala explains how the Apache Spark framework helps with big data processing and analytics through its standard API. He also discusses how Spark compares with traditional MapReduce implementations such as Apache Hadoop.

Srini Penchikala
on Jan 30, 2015
AI, ML & Data Engineering

Real-Time Stream Processing as Game Changer in a Big Data World with Hadoop and Data Warehouse

This article discusses what stream processing is, how it fits into a big data architecture with Hadoop and a data warehouse (DWH), when stream processing makes sense, and what technologies and products you can choose from.

Kai Wähner
on Sep 10, 2014
Architecture & Design

Using SEDA to Ensure Service Availability

This article presents a new strategy for incorporating event-driven architecture to improve the scalability and availability of services in the context of SOA. The strategy is based on queuing research pioneered for highly available and scalable services, initially in the Web context but now moving into the SOA and Web services context. An actual implementation is described in the context of Mule.

Rune Schumann, Rune Peter Bjornstad
on Oct 11, 2006
