Big Data Content on InfoQ

News

AI, ML & Data Engineering

Apache Spark Brings Pandas API with Version 3.2

The Apache Spark team has integrated the Pandas API in the product's latest 3.2 release. With this change, dataframe processing can be scaled to multiple clusters or multiple processors in a single machine using the PySpark execution engine.

Sabri Bolkar
on Nov 04, 2021

Cloud

AWS Announces the Public Preview of AWS Data Exchange for Amazon Redshift

Recently AWS announced the public preview of AWS Data Exchange for Amazon Redshift. This new feature enables customers to find and subscribe to third-party data in AWS Data Exchange to query in an Amazon Redshift data warehouse.

Steef-Jan Wiggers
on Oct 27, 2021

Cloud

AWS Announces the General Availability and Open Sourcing of the Amazon Genomics CLI

Amazon Genomics CLI is a tool that makes it easier to process genomics data at petabyte scale on AWS. Earlier this year, the public cloud vendor shared a preview of the tool, and it is now open source and generally available.

Steef-Jan Wiggers
on Oct 06, 2021

Mobile

Facebook Mariana Trench Helps Developers to Find Vulnerabilities in Android and Java Apps

Recently open-sourced by Facebook, Mariana Trench (MT) aims to help developers identify and prevent security and privacy bugs in Android and Java applications.

Sergio De Simone
on Oct 02, 2021

Cloud

AWS Announces Customizable Image Support for Amazon EMR on EKS

Recently, AWS announced customizable image support for Amazon EMR on Amazon Elastic Kubernetes Service (Amazon EKS) that allows customers to modify the Docker runtime image that runs their analytics application using Apache Spark on their EKS cluster.

Steef-Jan Wiggers
on Jul 28, 2021

Architecture & Design

Airbnb Builds Himeji - a Scalable Centralized Authorization System

Airbnb recently described how it built Himeji, a scalable centralized authorization system. Himeji stores permissions data and performs permission checks as a central source of truth. It uses a sharded and replicated in-memory cache to improve performance and lower latencies and has served checks in production for about a year.

Eran Stiller
on May 12, 2021

Cloud

Hazelcast Jet 4.4 Released - the Four-Year Anniversary Release as Seen by Scott McMahon

Hazelcast Jet recently celebrated its four-year anniversary with the release of version 4.4. Besides the normal bug fixes and performance enhancements, this new version ships with new features such as the unified file connector and the first beta version of the SQL interface. InfoQ spoke to Scott McMahon, technical director of field engineering at Hazelcast, about this new release.

Olimpiu Pop
on Mar 19, 2021

Culture & Methods

Using Machine Learning in Testing and Maintenance

With machine learning, we can reduce maintenance efforts and improve the quality of products. It can be used in various stages of the software testing life-cycle, including bug management, which is an important part of the chain. We can analyze large amounts of data for classifying, triaging, and prioritizing bugs in a more efficient way by means of machine learning algorithms.

Ben Linders
on Mar 18, 2021

AI, ML & Data Engineering

DataStax Announces Astra Serverless Database-as-a-Service

DataStax, the company behind the Cassandra database, announced last week the general availability of Astra serverless, the open, multi-cloud serverless database-as-a-service (DBaaS).

Srini Penchikala
on Mar 15, 2021

Architecture & Design

Designing for Failure in the BBC's Analytics Platform

Last week at InfoQ Live, Blanca Garcia-Gil, principal systems engineer at BBC, gave a session on Evolving Analytics in the Data Platform. During this session, Garcia-Gil focused on how her team prepared and designed for two types of failure - "known unknowns" and "unknown unknowns."

Eran Stiller
on Feb 24, 2021

Cloud

Google Brings Databricks to Its Cloud Platform

Recently Google announced a partnership with Databricks to bring their fully-managed Apache Spark offering and data lake capabilities to Google Cloud. The offering will become available as Databricks on Google Cloud.

Steef-Jan Wiggers
on Feb 23, 2021

Architecture & Design

PayPal Standardizes on Apache Airflow and Apache Gobblin for Its Next-Gen Data Movement Platform

PayPal recently described how it standardized on Apache Airflow and Apache Gobblin for implementing its next-gen data movement platform. In a recent blog post, PayPal engineers detail how the existing data movement platform evolved into a complex and unmanageable ecosystem of many tools and platforms, and explain their shift towards a new implementation.

Eran Stiller
on Feb 10, 2021

Culture & Methods

Analyzing Large Amounts of Feedback to Learn from Users

Making it easy for users to give feedback and automating the collection of feedback helps to get more feedback faster. Using artificial intelligence, you can analyze large amounts of feedback to get insights and visualize trends. Sharing this information widely supports taking action to enhance your product and solve issues that users are having.

Ben Linders
on Dec 24, 2020

.NET

Microsoft Releases .NET for Apache Spark 1.0

Last month, Microsoft released the first major version of .NET for Apache Spark, an open-source package that brings .NET development to the Apache Spark platform. The new release allows .NET developers to write Apache Spark applications using .NET user-defined functions, Spark SQL, and additional libraries such as Microsoft Hyperspace and ML.NET.

Arthur Casals
on Nov 28, 2020

Cloud

Google Announces a New, More Services-Based Architecture Called Runner V2 to Dataflow

Google Cloud Dataflow is a fully-managed service for executing Apache Beam pipelines within the Google Cloud Platform (GCP). In a recent blog post, Google announced Runner v2, a new, more services-based architecture for Dataflow, which will include multi-language support for all of its language SDKs.

Steef-Jan Wiggers
on Aug 30, 2020
