Big Data Content on InfoQ
-
DeepMind Discloses Details to InfoQ about NHS Partnership amid Reports of Vast Patient Data Access
After months of awaiting details about the NHS and Google DeepMind partnership, InfoQ gains insights into recent claims of widespread patient data access.
-
Elephant in the Cloud - Hadoop as a Service
Hadoop and other big data technologies revolutionized the way organizations run data analytics, but organizations still face challenges with the operating costs of using these technologies for on-premise data processing. Ashish Thusoo recently spoke at the Enterprise Data World Conference about a Hadoop-as-a-service offering that helps organizations bridge these gaps.
-
AirFlow Joins Apache Incubator
AirFlow recently joined the Apache Incubator program. AirFlow is a workflow and scheduling system designed to manage data pipelines. Developed by Airbnb for internal use, it was open sourced last September, as previously reported by InfoQ.
-
Operational Data Stream and Batch Processing at Netflix with Mantis
-
Neo4j 3.0 Released with Binary Communication Protocol and Standardised Drivers
Today at GraphConnect Europe 2016, Neo Technology announced the release of Neo4j 3.0, which includes a new binary protocol for transmitting data between server and client, and a new set of standardised drivers for interacting with the database, along with stored procedure support and higher performance and capacity. InfoQ spoke to Neo Technology to find out more.
-
Google Cloud Machine Learning and TensorFlow Alpha Release
Late last month, Google released an alpha version of its TensorFlow (TF) integrated cloud machine learning service in response to a growing need to run the TensorFlow library at scale on the Google Cloud Platform (GCP). Google describes several new feature sets aimed at scaling TF usage by integrating several pieces of GCP, such as Dataproc, a managed Hadoop and Spark service.
-
Apache Flink 1.0.0 is Released
InfoQ's Rags Srinivas caught up with Stephan Ewen, a project committer for Apache Flink, about the 1.0.0 release and the roadmap.
-
Funnel Analysis at Twitter for Improving User Engagement
Funnel analysis is used to analyze a sequence of events to help with user engagement on a website or a mobile application. The Data Science team at Twitter uses this concept to learn how users interact with user interfaces during sign-up or tweeting, with the aim of improving user engagement with Twitter.
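As a rough illustration of the idea, the minimal Python sketch below counts how many users reach each step of an ordered funnel. It is not Twitter's implementation; the function name `funnel_counts`, the event names, and the sample data are all hypothetical, and events are assumed to arrive already sorted by time.

```python
from collections import defaultdict


def funnel_counts(events, steps):
    """Count users who reached each funnel step, in order.

    events: iterable of (user_id, event_name) tuples, assumed time-ordered.
    steps:  ordered list of event names that define the funnel.
    """
    progress = defaultdict(int)  # user_id -> number of steps completed so far
    for user_id, event_name in events:
        done = progress[user_id]
        # A user only advances when the next expected step occurs.
        if done < len(steps) and event_name == steps[done]:
            progress[user_id] = done + 1
    # For each step i, count users who completed more than i steps.
    return [
        sum(1 for done in progress.values() if done > i)
        for i in range(len(steps))
    ]


# Hypothetical sign-up funnel and sample events.
steps = ["open_signup", "enter_email", "confirm"]
events = [
    ("u1", "open_signup"), ("u1", "enter_email"), ("u1", "confirm"),
    ("u2", "open_signup"), ("u2", "enter_email"),
    ("u3", "open_signup"),
    ("u4", "enter_email"),  # skipped the first step, so never enters the funnel
]
counts = funnel_counts(events, steps)
```

Comparing consecutive entries of `counts` shows where users drop off between steps.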
-
AlphaGo: Google and DeepMind Publish Seminal AI Work
A game simulation at Google's DeepMind defeated expert humans at Go last month in a breakthrough for AI. Go is considered one of the great unsolved problems in AI.
-
Benchmarking Netflix Dynomite with Redis on AWS
Last year, the Netflix Cloud Database Engineering (CDE) team introduced Dynomite. Dynomite is a proxy layer that aims to turn any non-distributed database into a sharded, multi-region, replication-aware distributed database system. Netflix has now released a benchmark using Dynomite with Redis on AWS infrastructure.
-
How Airbnb Uses Net Promoter Score to Predict Guest Rebooking
Net Promoter Score (NPS) is a customer loyalty metric used to gauge the likelihood that a customer will return to a company's website or use its service again. Airbnb uses NPS extensively to measure customer loyalty, treating it as a more effective way to determine the likelihood that a customer will book again or recommend the company to friends.
-
Yahoo Open-Sources DataSketches for Faster Operations Over Streams
Yahoo has open-sourced DataSketches, a library written in Java for stochastic streaming algorithms. DataSketches is able to perform traditionally expensive operations, like counting distinct occurrences of a variable within a stream, using a fraction of the time and memory and with a predictable error margin.
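To give a feel for how approximate distinct counting can work in bounded memory, here is a minimal Python sketch of a textbook K-minimum-values (KMV) estimator. It does not use the DataSketches API and is not a description of the library's internals; the function names `hash01` and `estimate_distinct`, the choice of SHA-1, and the default `k` are assumptions made for illustration.

```python
import hashlib
import heapq


def hash01(item):
    """Map an item to a pseudo-uniform float in [0, 1) using SHA-1."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_distinct(stream, k=1024):
    """Estimate the number of distinct items with a K-minimum-values sketch.

    Only the k smallest hash values are retained, so memory stays bounded
    regardless of stream length.
    """
    heap = []     # negated values: heap[0] is minus the largest retained hash
    kept = set()  # retained hash values, for duplicate detection
    for item in stream:
        h = hash01(item)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # New value is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return float(len(heap))  # fewer than k distinct hashes: count is exact
    # Standard KMV estimator: (k - 1) divided by the k-th smallest hash.
    return (k - 1) / -heap[0]


# Hypothetical stream: 10,000 distinct ids, each appearing three times.
stream = [i % 10_000 for i in range(30_000)]
approx = estimate_distinct(stream)
```

A larger `k` uses more memory but tightens the expected relative error, which shrinks roughly as one over the square root of `k`.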
-
Riley Newman on How Airbnb Uses Data Science
Riley Newman, head of data science at Airbnb, recently published an article describing how the Californian startup defines and uses data science. He explains that data can be seen as the voice of the customers, and data science as an act of interpretation. He also details several initiatives that have been particularly important for scaling data science.
-
MongoDB Hits 3.2 and Becomes Enterprise Ready
MongoDB recently announced the newest version of its eponymous NoSQL database. Building upon the new features introduced in the 3.0 release, 3.2 expands and solidifies MongoDB’s interest in the corporate world.
-
IBM Commits to Advance Apache Spark
Earlier last month in Las Vegas, at IBM Insight 2015, IBM announced a major commitment to the Apache Spark project. Its description of Spark as “potentially the most significant open source project of the next decade” says a lot about how important IBM believes the project is. With IDC reporting that 80% of cloud applications in the future will be data intensive, Apache Spark can unlock previously...