Big Data Content on InfoQ
-
Video Stream Analytics Using OpenCV, Kafka and Spark Technologies
What is the role of video streaming data analytics in the data science space? Learn how to implement a motion detection use case using a sample application based on OpenCV, Kafka and Spark technologies.
-
Apache Beam Interview with Frances Perry
InfoQ interviews Apache Beam's Frances Perry about the impetus for using Beam and the future of the top-level open source project. The interview also covers the thinking behind the programming model and some of the touch points in integration with other data engineering tools such as Apache Spark and Flink.
-
Introducing Reladomo - Enterprise Open Source Java ORM, Batteries Included! (Part 2)
Goldman Sachs is widely known as a leader in investment banking, but it is very much a leading technology firm as well. Continuing our exploration of Reladomo, the primary Java ORM used at GS and now open source, GS Technology Fellow Mohammad Rezaei looks at advanced features such as sharding, caching, bitemporal access, performance, and testing.
-
Machine Learning Techniques for Predictive Maintenance
In this article, the authors explore how to build a machine learning model for predictive maintenance of systems. They discuss a sample application that uses a NASA engine failure dataset to predict the Remaining Useful Life (RUL) with regression models.
-
Predicting Movie Ratings: NLP Tools is What Film Studios Need
In this article, the author discusses how to use Natural Language Processing (NLP) techniques to predict movie ratings using data shared on social media platforms. Sentiment analysis of movie reviews can also be used to classify movies into different genres and to improve movie recommendation systems.
-
From Alibaba to Apache: RocketMQ’s Past, Present, and Future
Feng Jia and Wang Xiaorui share the core distributed systems principles behind RocketMQ, Alibaba's distributed messaging and data streaming platform, now open sourced through the Apache Software Foundation.
-
Building Pipelines for Heterogeneous Execution Environments for Big Data Processing
The Pipeline61 framework supports the building of data pipelines involving heterogeneous execution environments. It reuses the existing code of the deployed jobs in different environments and provides version control and dependency management that deals with typical software engineering issues. A real-world case study shows its effectiveness.
-
Introducing Reladomo - Enterprise Open Source Java ORM, Batteries Included!
Goldman Sachs is widely known as a leader in investment banking, but it is very much a leading technology firm as well. Reladomo is the primary Java ORM used at GS, and it is now open source. In this article, GS Technology Fellow Mohammad Rezaei takes us on a deep dive into Reladomo.
-
Big Data Processing Using Apache Spark - Part 6: Graph Data Analytics with Spark GraphX
In this article, author Srini Penchikala discusses the Apache Spark GraphX library, which is used for graph data processing and analytics. The article includes sample code for graph algorithms such as PageRank, Connected Components and Triangle Counting.
-
Three Experts on Big Data Engineering
Clemens Szyperski (Microsoft), Martin Petitclerc (IBM), and Roger Barga (Amazon Web Services) answer three questions: What major challenges do you face when building scalable, big data systems? How do you address these challenges? Where should the research community focus its efforts to create tools and approaches for building highly reliable, scalable, big data systems?
-
Learning Paths: QCon London Expert Recommendations
Advice on the best talks to attend at QCon London 2017 from London Thought Leaders.
-
Q&A with Immuta on the Implications of EU’s General Data Protection Regulation (GDPR)
InfoQ talked with Immuta’s Andrew Burt and Steve Touw to better understand the implications and challenges of the EU's General Data Protection Regulation, which will come into effect in May 2018.