AI, ML & Data Engineering Content on InfoQ

  • EMRFS Brings Consistency to Amazon S3

    Amazon recently announced EMRFS, an implementation of HDFS that allows EMR clusters to use S3 with a stronger consistency model. When enabled, this new feature keeps track of operations performed on S3 and provides list consistency, delete consistency, and read-after-write consistency for any cluster created with Amazon Machine Image (AMI) version 3.2.1 or greater.

  • Apache Flink 0.8.0 Released, Roadmap for 2015 Published

    Apache Flink has released version 0.8.0. Besides the usual performance, compatibility, and stability improvements, it also adds a streaming Scala API, where streaming capabilities had so far been missing. Apache Flink was also recently promoted to a top-level Apache project, roughly nine months after joining the incubator.

  • Facebook Open Sources Modules for Faster Deep Learning on Torch

    Facebook has open sourced a number of modules for faster training of neural networks on Torch.

  • Google on the Technical Debt of Machine Learning

    A number of Google researchers and engineers presented their view on the technical debt of using machine learning at a NIPS workshop. They identified different aspects of technical debt and concluded that, without proper care, using machine learning or complex data analysis in your company can induce new kinds of technical debt that differ from those of classical software engineering.

  • Distributed, Fault Tolerant Transactions in NoSQL

    Five years ago many NoSQL databases were pre-version 1.0 and, when it came to the CAP tradeoff, choosing availability over consistency was in vogue. Fast forward to today, and distributed, fault-tolerant transactions are coming to the fore as a new round of NoSQL databases seeks to redefine our NoSQL expectations.

  • Apache Spark 1.2.0 Supports Netty-based Implementation, High Availability and Machine Learning APIs

    Apache Spark 1.2.0 was released with a Netty-based implementation, High Availability and Machine Learning APIs. It represents the work of 172 contributors from over 60 institutions and comprises more than 1000 patches. InfoQ talks with Patrick Wendell, a Spark committer and PMC member.

  • Alex Bordei on Scaling NoSQL Databases

    Network performance, virtualization, and testing are among the factors to consider when addressing performance and scalability issues with NoSQL databases. Alex Bordei wrote about scaling NoSQL databases and offered tips for increasing performance when using these data stores.

  • FoundationDB 3.0 Scales to New Heights

    FoundationDB has released version 3.0 of its key-value store with a primary focus on scalability and performance.

  • Splunk Enterprise 6.2 Supports Instant Pivot and Enhanced Event Pattern Detection

    The latest versions of the big data analytics tools Splunk Enterprise and Hunk support instant pivot, enhanced event pattern detection, and prebuilt dashboard panels. Splunk Inc., provider of the software platform for operational intelligence, recently announced the general availability (GA) of version 6.2 of Splunk Enterprise and Hunk: Splunk Analytics for Hadoop and NoSQL Data Stores.

  • New and Interesting on ThoughtWorks Radar Jan 2015

    ThoughtWorks has published a digital preview of the January 2015 radar, providing opinion on techniques, tools, platforms and languages and taking a snapshot of the current trends in software technology.

  • Splice Machine Version 1.0 Supports Integration with Hadoop and Analytic Window Functions

    Splice Machine version 1.0 supports analytic window functions and integration with the Hadoop ecosystem. The Splice Machine team recently released its Hadoop-based RDBMS data management solution, which can be used for transactional workloads on Hadoop.

  • Google Open Sources Cloud Dataflow Java SDK

    Earlier this year Google announced Cloud Dataflow, a service and SDK for processing large amounts of data in batches or in real time. Now the company has open sourced the Dataflow Java SDK, enabling developers to see how it works and possibly use the SDK for services running on-premises or in other clouds.

  • LinkedIn Open Sources Cubert With an Eye To Big Data Analytics

    LinkedIn recently open sourced Cubert, its High Performance Computation Engine for Complex Big Data Analytics. Cubert is a framework written with analysts and data scientists in mind. Developed completely in Java and expressed as a scripting language, Cubert is designed for complex joins and aggregations that frequently arise in the reporting world.

  • Agile View of Big Data

    An agile view of Big Data, wherein data is viewed as a real-time stream, offers a new look at how data is managed. Using an agile data infrastructure, organizations can conquer Big Data challenges with a level of ease, flexibility and performance. A white paper by codeFutures describes the agile view of Big Data.

  • Gobblin, LinkedIn's Unified Data Ingestion Platform

    At the 2014 QCon San Francisco conference, LinkedIn's Lin Qiao gave a talk (also summarized in a blog post) on the Gobblin project, a unified data ingestion system for LinkedIn's internal and external data sources.
