Big Data Content on InfoQ
-
Q&A with Christoph Windheuser on AI Applications in the Industry
Increased hardware power and huge amounts of data are making existing machine learning approaches like pattern recognition, natural language processing, and reinforcement learning possible. Artificial Intelligence is impacting the development process; it’s increasing the complexity of things like version control, CI/CD and testing.
-
Amazon Announces Managed Streaming for Kafka in Public Preview
At the recent AWS re:Invent 2018 event, Amazon announced a new fully managed service that makes it easy for customers to build and run applications that use Apache Kafka to process streaming data. This new service is called Amazon Managed Streaming for Kafka, Amazon MSK for short, and is now in public preview.
-
Google Cloud Announces Transfer Appliance in Beta for Cloud Data Migrations in the EU
Google announced that Transfer Appliance, a high-capacity server that lets customers move large amounts of data to Google Cloud Platform (GCP) quickly and securely, is available in beta in the European Union (EU). Google will handle the data transfer with Transfer Appliance in GCP in the EU, and data will not leave the EU.
-
The Evolution of Uber’s 100+ Petabyte Big Data Platform
Uber’s engineering team wrote about how their big data platform evolved from traditional ETL jobs with relational databases to one based on Hadoop and Spark. A scalable ingestion model, standard transfer format and a custom library for incremental updates are the key components of the platform.
-
Data Lakes and Modern Data Architecture in Clinical Research and Healthcare
Dr. Prakriteswar Santikary, chief data officer at ERT, spoke at the Data Architecture Summit 2018 Conference last month about the data lake architecture his team developed at their clinical research organization. He discussed the data platform deployed in the cloud to streamline data collection, aggregation, and clinical reporting and analytics, using concepts like serverless computing and data services.
-
Event Sourcing to the Cloud at HomeAway
Adam Haines, Data Architect at HomeAway, recently spoke at the Data Architecture Summit 2018 Conference about how his team leverages the event sourcing cloud design pattern to accelerate big data initiatives in their organization.
-
Cloudera and Hortonworks Merge with Goal to Increase Competition with Cloud Offerings
Earlier this month, Cloudera and Hortonworks announced an all-stock merger at a combined value of around $5.2 billion. Analysts have argued that the merger is a response to the increased competition both companies face from cloud vendors like Amazon, Google and Microsoft. In this article we log reactions from analysts and the industry, and look at the implications for current customers.
-
Agile Data Modeling for NoSQL Databases
Pascal Desmarets recently spoke at the Data Architecture Summit 2018 Conference about agile modeling and best practices for NoSQL databases.
-
Redis 5.0 Released with New "Streams" Data Type
Redis recently announced version 5 of its popular database, 15 months after the release of Redis 4. Probably the most important feature of this version is support for a new data type, Streams. Sorted set functionality has also improved, and Redis modules have been expanded with the introduction of the Cluster and Timers APIs. LOLWUT and other improvements are reviewed in the article...
-
Tim Berners-Lee Introduces "Solid" Decentralized Identity Platform
Solid is a new decentralized identity platform from WWW creator Tim Berners-Lee. Solid provides a mechanism for users to own their data and better control how it is used.
-
William McKnight on Data Platforms and Creating a Modern Data Architecture
William McKnight gave a keynote presentation last week at the Data Architecture Summit 2018 Conference on creating a modern data architecture using different data platforms.
-
High Volume Space Exploration Time-Series Data Storage in PostgreSQL
The European Space Agency Science Data Center (ESDC) switched to PostgreSQL with the TimescaleDB extension for their data storage. ESDC’s diverse data includes structured, unstructured and time series metrics running to hundreds of terabytes, with a requirement to query across datasets using open source tools.
-
Netflix Keystone Real-Time Stream Processing Platform
Netflix recently published a post on their tech blog discussing the design considerations and insights of Keystone, their real-time stream processing platform. Keystone has been operational since December 2015 and has grown significantly as Netflix's subscriber base has grown from 65 million to over 130 million in the past 3 years. This article follows on the latest state of the Keystone platform...
-
Implementing Privacy by Design in Hyperledger Indy
Centralized identity providers, such as social media sites and consumer email services, provide convenience to users. But this approach creates data privacy and security risks. Hyperledger Indy, an open source blockchain project, is being built to address the current issues that exist in centralized identity providers by taking a 'Privacy by Design' approach to deal with these risks.
-
California Creates Consumer Privacy Act
California has enacted the California Consumer Privacy Act (CCPA) of 2018, which, starting on January 1, 2020, will grant consumers several rights with respect to information about them that businesses collect, store, sell, and share. This is the first legislation of its kind in the United States.