InfoQ Homepage Database Content on InfoQ
-
Q&A: Relevant Search with Elasticsearch and Solr
In their book "Relevant Search", Doug Turnbull and John Berryman focus on the challenge of providing search results by balancing the needs and intents of the user. Using Elasticsearch and Solr, relevance engineers can constantly tune the needs of the business vs. the needs of the user.
-
Big Data Processing with Apache Spark - Part 5: Spark ML Data Pipelines
With support for Machine Learning data pipelines, Apache Spark framework is a great choice for building a unified use case that combines ETL, batch analytics, streaming data analysis, and machine learning. In this fifth installment of Apache Spark article series, author Srini Penchikala discusses Spark ML package and how to use it to create and manage machine learning data pipelines.
-
Spark GraphX in Action Book Review and Interview
“Spark GraphX in Action” book from Manning Publications, authored by Michael Malak and Robin East, provides a tutorial based coverage of Spark GraphX, the graph data processing library from Apache Spark framework. InfoQ spoke with authors about the book and Spark GraphX library as well as overall Spark framework and what's coming up in the area of graph data processing and analytics.
-
Introduction to SQL Server Containers
Containers are just around the corner for the Windows community, and this article takes a closer look at using SQL Server containers. The author discusses the value, use cases, and means for taking advantage of SQL Server containers today.
-
Chris Fregly on the PANCAKE STACK Workshop and Data Pipelines
InfoQ Interviews Chris Fregly, organizer for the 4000+ member Advanced Spark and TensorFlow Meetup about the PANCAKE STACK workshop, Spark and building data pipelines for a machine learning pipeline
-
Christine Doig on Data Science as a Team Discipline
Christine Doig spoke at this year's OSCON Conference about data science as a team discipline and how to navigate the data science Python ecosystem. InfoQ spoke with Christine about challenges data science teams need to address to be more effective.
-
Starcounter vs. ORM and DDD
The so-called “object-relation impedance mismatch” has long been discussed in engineering circles. Most attempts at a solution rely try to mask the issue by pulling logic into the application tier. Kostiantyn Cherniavskyi looks at these issues and shows how many of them can be solved with hybrid databases such as Starcounter.
-
Virtual Panel: Current State of NoSQL Databases
NoSQL databases have been around for several years now and have become a choice of data storage for managing semi-structured and unstructured data. These databases offer lot of advantages in terms of linear scalability and better performance for both data writes and reads. InfoQ spoke with four panelists to get different perspectives on the current state of NoSQL databases.
-
Big Data Analytics with Spark Book Review and Interview
Big Data Analytics with Spark book, authored by Mohammed Guller, provides a practical guide for learning Apache Spark framework for different types of big-data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. InfoQ spoke with author about the book & development tools for big data applications.
-
Everything Is “Lock-In”: Focus on Switching Costs
Coding in Java, buying SAP, deploying OpenStack, and using Amazon Web Services: each one introduces a type of lock-in. However, it makes no difference how hard you try- some form of lock-in is unavoidable. What matters most is understanding the layers of lock-in, and how to assess and reduce your switching costs.
-
Martin Van Ryswyk on DataStax Enterprise Graph Database
DataStax recently announced a new product called DataStax Graph to store graph data models. It's based on open source Titan graph database and uses Apache Tinkerpop framework's Gremlin query language. InfoQ spoke with Martin Van Ryswyk about the new product.
-
Big Data Processing with Apache Spark - Part 4: Spark Machine Learning
In this fourth installment of Apache Spark article series, author Srini Penchikala discusses machine learning concepts and Spark MLlib library for running predictive analytics using a sample application.