Data Analysis Content on InfoQ

News

Cloud

Google Cloud Launches C4 Machine Series: High-Performance Computing and Data Analytics

Google Cloud recently announced the general availability of its new C4 machine series, powered by 4th Gen Intel Xeon Scalable Processors (Sapphire Rapids). The series offers a range of configurations tailored to meet the needs of demanding applications such as high-performance computing (HPC), large-scale simulations, and data analytics.

Steef-Jan Wiggers
on Aug 27, 2024
Cloud

Data Solutions Framework: an Open Source Project for Building Data Solutions on AWS

AWS recently released the Data Solutions Framework (DSF), an opinionated open-source framework designed to accelerate the creation of data solutions on AWS. Built using the AWS CDK, the framework exposes abstractions and patterns as building blocks for constructing data solutions and is available in TypeScript (npm) and Python (PyPI).

Renato Losio
on Mar 02, 2024
Cloud

Amazon Q Data Integration in AWS Glue Simplifies Data Transformation on AWS

Recently, AWS announced the preview of a new feature for AWS Glue, enabling customers to use natural language for authoring and troubleshooting data integration jobs. With Amazon Q data integration in AWS Glue, developers can provide a description of their data integration workload, and the service will generate an ETL script.

Renato Losio
on Feb 25, 2024
AI, ML & Data Engineering

Spotify's Approach to Leverage Recursive Embedding and Clustering to Enhanced Data Explainability

One of the main challenges for any online business is getting actionable insights from its data for decision-making. Spotify shares its methodology and experience in solving this problem by clustering diverse data sets through a method involving dimensionality reduction, recursion, and supervised machine learning.

Reza Rahimi
on Jan 19, 2024
Architecture & Design

Netflix Creates Incremental Processing Solution Using Maestro and Apache Iceberg

Netflix created a new solution for incremental processing in its data platform. The incremental approach reduces the cost of computing resources and execution time significantly as it avoids processing complete datasets. The company used its Maestro workflow engine and Apache Iceberg to improve data freshness and accuracy and plans to provide managed backfill capabilities.

Rafal Gancarz
on Jan 15, 2024
Architecture & Design

ClickHouse Keeper: Efficient Apache ZooKeeper Alternative Created with C++ and Raft

The ClickHouse project team created an in-house replacement for Apache ZooKeeper because it needed a more efficient implementation that would also address some of ZooKeeper's shortcomings. ClickHouse Keeper is now an essential part of the ClickHouse project and a cornerstone of this open-source analytical database, but it can also be used independently for many distributed coordination use cases.

Rafal Gancarz
on Dec 01, 2023
AI, ML & Data Engineering

KubeCon NA 2023: Kubernetes Storage Platform to Run Real-Time Analytic Databases

A Kubernetes storage platform provides a portable and flexible foundation for data management to help developers build their own data solutions. Robert Hodges spoke last week at the KubeCon + CloudNativeCon North America 2023 conference about the techniques his teams developed to build their own data platform.

Srini Penchikala
on Nov 15, 2023
Cloud

Confluent Announces Apache Flink on Confluent Cloud in Open Preview

Confluent recently announced the open preview of Apache Flink on Confluent Cloud as a fully managed service for stream processing. The company claims that the managed service will make it easier for companies to filter, join, and enrich data streams with Flink.

Steef-Jan Wiggers
on Sep 29, 2023
DevOps

Running Apache Flink Applications on AWS KDA: Lessons Learnt at Deliveroo

Deliveroo introduced Apache Flink into its technology stack for enriching and merging events consumed from Apache Kafka or Kinesis Streams. The company opted to use the AWS Kinesis Data Analytics (KDA) service to manage Apache Flink clusters on AWS and shared its experiences from running Flink applications on KDA.

Rafal Gancarz
on Aug 16, 2023
Architecture & Design

Pfizer Uses Serverless Architecture on AWS to Scale Processing of Digital Biomarkers

Pfizer upgraded the serverless architecture for processing digital biomarker data at scale to make it more flexible and configurable. They created a framework that uses a file processing pipeline built with AWS Step Functions and other serverless services, as well as a custom Python package for data ingestion and processing.

Rafal Gancarz
on Jul 26, 2023
Cloud

AWS Introduces New Clickstream Analytics on AWS Solution for Mobile and Web Applications

AWS recently announced a new service called Clickstream Analytics on AWS, an end-to-end solution to collect, ingest, analyze, and visualize clickstream data inside organizations’ web and mobile applications.

Steef-Jan Wiggers
on Jul 14, 2023
Cloud

Unified Analytics Platform: Microsoft Fabric

At its recent annual Build conference, Microsoft introduced Microsoft Fabric, a unified analytics platform that brings together all the data and analytics that organizations need.

Steef-Jan Wiggers
on Jun 01, 2023
Cloud

AWS Introduces Athena Provisioned Capacity

AWS recently announced Provisioned Capacity, a new feature for Athena that allows users to run SQL queries on fully managed compute capacity for a fixed price with no long-term commitments.

Steef-Jan Wiggers
on May 04, 2023
Architecture & Design

Netflix Built a Scalable Annotation Service Using Cassandra, Elasticsearch and Iceberg

Netflix recently published how it built Marken, a scalable annotation service using Cassandra, Elasticsearch and Iceberg. Marken allows storing and querying annotations, or tags, on arbitrary entities. Users define versioned schemas for their annotations, which include out-of-the-box support for temporal and spatial objects.

Eran Stiller
on Feb 22, 2023
Java

Apache Druid 25.0 Delivers Multi-Stage Query Engine and Kubernetes Task Management

Apache Druid is a high-performance real-time datastore, and its latest release, version 25.0, provides many improvements and enhancements. The main new features are that the multi-stage query (MSQ) task engine used for SQL-based ingestion is now production ready, and that Kubernetes can be used to launch and manage tasks, eliminating the need for middle managers...

Andrea Messetti
on Jan 19, 2023
