Database Content on InfoQ
-
Google Releases Cloud Dataproc for Kubernetes in Alpha
Google Cloud Dataproc is a managed data and analytics processing service based on the open-source Hadoop and Spark projects. Google has recently announced the alpha availability of Cloud Dataproc for Kubernetes, which provides customers with a more efficient method to process data across platforms.
-
Google Research Use of Concept Vectors for Image Search
Google recently released research about creating a tool for searching Similar Medical Images Like Yours (SMILY). The research uses embeddings for image-based search and allows users to influence the search through the interactive refinement of concepts.
-
How Statistical Forecasting Can Help You Trust Your Data and Drive Business Agility
Statistical forecasting is a highly effective way to improve delivery predictions and avoid some traditional estimation problems. In a case study presented at AgileByExample 2018, Piotr Leszczynski says it can also help you understand and trust your data more, and drive improvements in business agility.
-
Amazon Announces General Availability of Quantum Ledger Database
On September 10th, Amazon announced the general availability of Quantum Ledger Database (QLDB), a ledger database based on blockchain technology. QLDB provides a fully managed ledger that can contain multiple tables and implements an immutable, cryptographically verifiable transaction journal owned by a central trusted authority.
-
Immer JavaScript Immutable State Management Framework Releases V4
A few days ago, Alec Larson released the fourth major version of the award-winning JavaScript library Immer, patching an important edge case. Immer is a JavaScript package that allows developers to work with immutable state as if it were mutable, by implementing a copy-on-write mechanism.
-
Jagadish Venkatraman on LinkedIn's Journey to Samza 1.0
At the recent ApacheCon North America, Jagadish Venkatraman spoke about how LinkedIn developed Apache Samza 1.0 to handle stream processing at scale. He described LinkedIn's use cases involving trillions of events and petabytes of data, then highlighted the features added for the 1.0 release, including stateful processing, high-level APIs, and a flexible deployment model.
-
ApacheCon 2019 Keynote: Google Cloud Enhances Big-Data Processing with Kubernetes
At ApacheCon North America, Christopher Crosbie gave a keynote talk titled "Yet Another Resource Negotiator for Big Data? How Google Cloud is Enhancing Data Lake Processing with Kubernetes." He highlighted Google's efforts to make Apache big-data software "cloud native" by developing open-source Kubernetes Operators to provide control planes for running Apache software in a Kubernetes cluster.
-
Google Introduces Cloud Storage Connector for Hadoop Big Data Workloads
In a recent blog post, Google announced a new Cloud Storage connector for Hadoop. This new capability allows organizations to substitute their traditional HDFS with Google Cloud Storage. Columnar file formats such as Parquet and ORC may realize increased throughput, and customers will benefit from Cloud Storage directory isolation, lower latency, increased parallelization, and intelligent defaults.
-
Amazon Announces General Availability of Aurora Multi-Master
Amazon recently announced the general availability of Aurora Multi-Master, which allows reads and writes on multiple database instances across several Availability Zones. This improves availability, as the platform no longer needs to trigger a failover when a database instance fails.
-
An Introduction to Structured Data at Etsy
Etsy recently published a blog post detailing how they store and manage structured data. The Etsy team makes extensive use of taxonomies and stores the structured data in JSON files.
-
DigitalOcean Adds Managed MySQL and Redis Services
Cloud provider DigitalOcean recently released a pair of new managed data services. Their Managed MySQL and Redis offerings are on-demand and elastic, and offer a variety of sizes and high-availability options.
-
Amazon Releases AWS Lake Formation to General Availability
Recently, Amazon announced the general availability (GA) of AWS Lake Formation, a fully managed service that makes it much easier for customers to build, secure, and manage data lakes.
-
Google Releases Enterprise Database Options, Targets SQL Server Customers
In a recent blog post, Google announced enhancements to their existing Google Cloud Platform (GCP) database investments, including Cloud SQL for Microsoft SQL Server in alpha, federated queries from BigQuery to Cloud SQL, and the availability of Elastic Cloud on GCP in Japan, with a release in Sydney, Australia to follow soon.
-
Data Engineering in Badoo: Handling 20 Billion Events Per Day
Badoo is a dating social network that currently handles billions of events per day, explains Vladimir Kazanov, data platform engineering lead. At Skills Matter, he talked through some of the challenges of operating at this scale and the tooling Badoo uses to process and report on this data.
-
Enabling Single Tenant Workloads in the Cloud, Microsoft Introduces Azure Dedicated Host
In a recent blog post, Microsoft announced Azure Dedicated Hosts, a service that allows organizations to run Linux and Windows virtual machines on single-tenant physical servers. This service was introduced to address customer compliance and regulatory requirements. Organizations can also take advantage of Azure Hybrid Benefit, which allows them to leverage existing software investments.