Big Data Content on InfoQ
-
Using Data Effectively: Beyond Art and Science
Hilary Parker talks about approaches and techniques to collect the most useful data, analyze it in a scientific way, and use it most effectively to drive actions and decisions.
-
Big Data and Deep Learning: A Tale of Two Systems
Zhenxiao Luo explains how Uber tackles data caching in large-scale DL, detailing Uber’s ML architecture and discussing how Uber uses Big Data, concluding by sharing AI use cases.
-
Accelerated Spark on Azure: Seamless and Scalable Hardware Offloads in the Cloud
Yuval Degani shows how hardware acceleration in Azure can be used to speed up Spark jobs, with the aid of RDMA (Remote Direct Memory Access) support in the VM.
-
Implementing AutoML Techniques at Salesforce Scale
Matthew Tovbin shows how to build ML models using AutoML (Salesforce), including techniques for automatic data processing, feature generation, model selection, hyperparameter tuning and evaluation.
-
Privacy Ethics – A Big Data Problem
Raghu Gollamudi covers data management best practices, from mapping enterprise data to applying data protection rules such as GDPR at petabyte scale.
-
What is a Data Citizen?
Caitlin McDonald discusses how big data affects people online and the ethics to be considered when dealing with data.
-
When Data Kills
Cori Crider shares insights from her investigations of US drone strikes in Yemen and Pakistan, and explores how misuse of mass surveillance data has claimed innocent lives.
-
Streaming SQL Foundations: Why I ❤ Streams+Tables
Tyler Akidau explores the relationship between the Beam Model and stream and table theory, and stream processing in SQL with Apache Beam, Calcite, Flink, Kafka KSQL and Apache Spark’s Structured Streaming.
-
Bias in BigData/AI and ML
Leslie Miley discusses how inherent bias in data sets has affected everything from the 2016 Presidential race to criminal sentencing in the United States.
-
Scaling with Apache Spark
Holden Karau looks at Apache Spark from a performance/scaling point of view and what’s needed to handle large datasets.
-
Serverless Design Patterns with AWS Lambda: Big Data with Little Effort
Tim Wagner discusses Big Data on serverless, showing working examples and how to set up a CI/CD pipeline, demonstrating AWS Lambda with the Serverless Application Model (SAM).
-
Scio: Moving Big Data to Google Cloud, a Spotify Story
Neville Li tells the story of Spotify migrating its big data infrastructure to Google Cloud, replacing Hive and Scalding with BigQuery and Scio, which helped the company iterate faster.