Big Data Content on InfoQ
-
Conquering the Challenges of Data Preparation for Predictive Maintenance
Predictive maintenance (PdM) applications aim to apply machine learning (ML) to IIoT datasets in order to reduce occupational hazards, machine downtime, and other costs. In this article, the author addresses some of the data preparation challenges faced by industrial practitioners of ML and the solutions for data ingest and feature engineering related to PdM.
-
Analytics Zoo: Unified Analytics + AI Platform for Distributed Tensorflow, and BigDL on Apache Spark
This article describes how Analytics Zoo can help real-world users build end-to-end deep learning pipelines for big data, including unified pipelines for distributed TensorFlow and Keras on Apache Spark, easy-to-use abstractions such as transfer learning and Spark ML pipeline support, and built-in deep learning models and reference use cases.
-
Sentiment Analysis: What's with the Tone?
Sentiment analysis is widely applied in voice of the customer (VOC) applications. In this article, the authors discuss NLP sentiment analysis using machine learning (ML) and lexicon-based approaches with KNIME data analysis tools.
-
Spark Application Performance Monitoring Using Uber JVM Profiler, InfluxDB and Grafana
In this article, author Amit Baghel discusses how to monitor the performance of Apache Spark-based applications using technologies like the Uber JVM Profiler, the InfluxDB database, and the Grafana data visualization tool.
-
Natural Language Processing with Java - Second Edition: Book Review and Interview
The book Natural Language Processing with Java - Second Edition covers natural language processing (NLP) and the various tools developers can use in their applications. Technologies discussed in the book include Apache OpenNLP and Stanford NLP. InfoQ spoke with co-author Richard Reese about the book and how NLP can be used in enterprise applications.
-
Democratizing Stream Processing with Apache Kafka® and KSQL - Part 2
In this article, author Robin Moffatt shows how to use Apache Kafka and KSQL to build data integration and processing applications with the help of an e-commerce sample application. Three use cases are discussed: customer operations, operational dashboard, and ad-hoc analytics.
-
How to Choose a Stream Processor for Your App
Choosing a stream processor for your app can be challenging, given the many options available. The best choice depends on individual use cases. In this article, the authors discuss a stream processor reference architecture, key features required by most streaming applications, and optional features that can be selected based on specific use cases.
-
Analyzing and Preventing Unconscious Bias in Machine Learning
This article is based on Rachel Thomas’s keynote presentation, “Analyzing & Preventing Unconscious Bias in Machine Learning,” at QCon.ai 2018. Thomas talks about the pitfalls and risks that bias in machine learning brings to the decision-making process. She discusses three use cases of machine learning bias.
-
Q&A on the Book Testing in the Digital Age
The book Testing in the Digital Age, by Tom van de Ven, Rik Marselis, and Humayun Shaukat, explains the impact that developments like robotics, artificial intelligence, the internet of things, and big data are having on testing. It explores the challenges and possibilities that the digital age brings when it comes to testing software systems.
-
Democratizing Stream Processing with Apache Kafka and KSQL - Part 1
In this article, author Michael Noll discusses stream processing with KSQL, the streaming SQL engine for Apache Kafka. Topics covered include the challenges of stateful stream processing and how KSQL addresses them, and how KSQL helps to bridge the world of streams and databases through streams and tables.
-
FPGAs Supercharge Computational Performance
FPGAs were originally used in the development of new hardware, and new cloud-based FPGAs are making the technology more accessible. The dramatic improvements in speed and lower costs over traditional CPUs mean more companies can start benefiting from the technology. FPGAs are fundamentally concurrent, which makes them an ideal tool for data-intensive, parallel processing problems.
-
Big Data and Big Money: The Role of Data in the Financial Sector
When we consider the 3Vs of big data (volume, velocity, and variety), it is hard to think of many sectors whose requirements fit so nicely into the guidelines as finance.