InfoQ Homepage Database Content on InfoQ

Articles

Development

Q&A: Relevant Search with Elasticsearch and Solr

In their book "Relevant Search", Doug Turnbull and John Berryman focus on the challenge of providing search results that balance the needs and intents of the user. Using Elasticsearch and Solr, relevance engineers can continually tune the balance between the needs of the business and the needs of the user.

David Iffland
on Sep 26, 2016
AI, ML & Data Engineering

Big Data Processing with Apache Spark - Part 5: Spark ML Data Pipelines

With support for machine learning data pipelines, the Apache Spark framework is a great choice for building a unified use case that combines ETL, batch analytics, streaming data analysis, and machine learning. In this fifth installment of the Apache Spark article series, author Srini Penchikala discusses the Spark ML package and how to use it to create and manage machine learning data pipelines.

Srini Penchikala
on Sep 24, 2016
AI, ML & Data Engineering

Spark GraphX in Action Book Review and Interview

The book “Spark GraphX in Action” from Manning Publications, authored by Michael Malak and Robin East, provides tutorial-based coverage of Spark GraphX, the graph data processing library from the Apache Spark framework. InfoQ spoke with the authors about the book and the Spark GraphX library, as well as the overall Spark framework and what's coming up in the area of graph data processing and analytics.

Srini Penchikala
on Sep 12, 2016
Cloud

Introduction to SQL Server Containers

Containers are just around the corner for the Windows community, and this article takes a closer look at using SQL Server containers. The author discusses the value, use cases, and means for taking advantage of SQL Server containers today.

Paul Stanton
on Sep 08, 2016
AI, ML & Data Engineering

Chris Fregly on the PANCAKE STACK Workshop and Data Pipelines

InfoQ interviews Chris Fregly, organizer of the 4000+ member Advanced Spark and TensorFlow Meetup, about the PANCAKE STACK workshop, Spark, and building data pipelines for machine learning.

Dylan Raithel
on Aug 29, 2016
AI, ML & Data Engineering

Christine Doig on Data Science as a Team Discipline

Christine Doig spoke at this year's OSCON Conference about data science as a team discipline and how to navigate the data science Python ecosystem. InfoQ spoke with Christine about the challenges data science teams need to address to be more effective.

Srini Penchikala
on Aug 26, 2016
.NET

Starcounter vs. ORM and DDD

The so-called “object-relation impedance mismatch” has long been discussed in engineering circles. Most attempts at a solution try to mask the issue by pulling logic into the application tier. Kostiantyn Cherniavskyi looks at these issues and shows how many of them can be solved with hybrid databases such as Starcounter.

Kostiantyn Cherniavskyi
on Aug 10, 2016
AI, ML & Data Engineering

Virtual Panel: Current State of NoSQL Databases

NoSQL databases have been around for several years now and have become a common choice of data storage for managing semi-structured and unstructured data. These databases offer many advantages in terms of linear scalability and better performance for both data writes and reads. InfoQ spoke with four panelists to get different perspectives on the current state of NoSQL databases.

Srini Penchikala
on Aug 02, 2016
AI, ML & Data Engineering

Big Data Analytics with Spark Book Review and Interview

The book Big Data Analytics with Spark, authored by Mohammed Guller, provides a practical guide to learning the Apache Spark framework for different types of big data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. InfoQ spoke with the author about the book and development tools for big data applications.

Srini Penchikala
on Jun 23, 2016
Cloud

Everything Is “Lock-In”: Focus on Switching Costs

Coding in Java, buying SAP, deploying OpenStack, and using Amazon Web Services: each one introduces a type of lock-in. However hard you try, some form of lock-in is unavoidable. What matters most is understanding the layers of lock-in, and how to assess and reduce your switching costs.

Richard Seroter
on Jun 08, 2016
AI, ML & Data Engineering

Martin Van Ryswyk on DataStax Enterprise Graph Database

DataStax recently announced a new product called DataStax Graph to store graph data models. It's based on the open source Titan graph database and uses the Gremlin query language from the Apache TinkerPop framework. InfoQ spoke with Martin Van Ryswyk about the new product.

Srini Penchikala
on May 17, 2016
AI, ML & Data Engineering

Big Data Processing with Apache Spark - Part 4: Spark Machine Learning

In this fourth installment of the Apache Spark article series, author Srini Penchikala discusses machine learning concepts and the Spark MLlib library for running predictive analytics, using a sample application.

Srini Penchikala
on May 15, 2016