Big Data Content on InfoQ
-
Robust Foundation for Data Pipelines at Scale - Lessons from Netflix
Jun He and Harrington Joseph share their experiences of building and operating the orchestration platform for Netflix’s big data ecosystem.
-
Privacy Architecture for Data-Driven Innovation
Nishant Bhajaria discusses how to set up a privacy program and shares tips on how to influence engineering and other teams to own their data and its usage so that privacy is a shared goal.
-
What Does It Mean to Be a Data Scientist? Definitions and Lessons Learned from the Trenches
Brian Korzynski discusses what Data Science and Big Data are, focusing on the data preparation that needs to take place, and making a distinction between ML issues and programming.
-
Big Data Legal Issues. GDPR and Contracts
Anton Tarasiuk discusses the legal issues that can be encountered when dealing with Big Data, GDPR and contracts.
-
Big Data's Ethical Drought: The Thirst for More Data Has Led to a Lapse in Ethics and Privacy
Katharine Jarmul provides examples of data (mis)use and asks how we can work with data and produce an ethical product without violating the trust and privacy of users.
-
Putting the Spark in Functional Fashion Tech Analytics
Gareth Rogers shows how his team used Clojure to provide a solid platform to connect and manage an AWS-hosted analytics pipeline, and describes the pitfalls they encountered along the way.
-
Apache Metron in the Real World – Big Data and Cybersecurity, a Perfect Match
Dave Russell takes a look at a number of different organizations that are on their big data cybersecurity journey with Apache Metron.
-
Petastorm: A Light-Weight Approach to Building ML Pipelines
Yevgeni Litvin describes how Petastorm facilitates tighter integration between the Big Data and Deep Learning worlds, simplifies data management and data pipelines, and speeds up model experimentation.
-
People You May Know: Fast Recommendations over Massive Data
Sumit Rangwala and Felix GV present the evolution of PYMK’s architecture, focusing on Gaia, a real-time graph computing capability, and Venice, an online feature store with scoring capability.
-
Productionizing H2O Models with Apache Spark
Jakub Hava demonstrates the creation of pipelines integrating H2O machine learning models and their deployments using Scala or Python.
-
Winning Ways for Your Visualization Plays
Mark Grundland explores practical techniques for information visualization design to take better account of the fundamental limitations of visual perception.
-
Migrating from Big Data Architecture to Spring Cloud
Lenny Jaramillo discusses how Northern Trust migrated to PCF, highlighting how this helped them accelerate the delivery of functionality to their customers.