We are at the RICON distributed systems conference with Adam Wray, CEO and president, and Peter Coppola, VP of Product, of Basho Technologies, which is hosting the conference.
InfoQ: Welcome gentlemen, can you tell us about the RICON conference and what your goals are?
Adam: RICON is an event that our company has hosted for several years for thought leaders in distributed systems. It’s really looking at how data and applications are spread across multiple environments, and what that means for the data and for the workload itself. So when I took over the company and came on board as CEO and president, one of the things we were looking at was the value, for a company of our size, of continuing to host this conference and going forward with RICON. What we came up with was that it was a critical need; we needed to support it and we needed to expand upon it, because workloads are heading towards environments that span the whole world. And so the challenges are there whether you’re looking at it from the academic world and trying to understand how to handle data in disparate environments, or whether you’re dealing with workloads in the database tier, the orchestration tier, the application tier, etc. It’s a very challenging and wonderful subject to take on as a neutral conference.
InfoQ: So you’re addressing an external universe and scouting for external speakers to address that space?
Adam: The thought process was that we framed it as a distributed systems conference for developers, whatever type of workload and technical infrastructure they run, in public, private or on-premise cloud at web-scale. These are really challenging environments. To give you a sense of some of the people that were attracted to this type of thought process, one of the keynote speakers, Mac Devine, CTO of IBM’s software Cloud Services Group, agreed to speak because he feels passionately about this subject and about the fact that where we’re going creates a whole new layer of opportunity and challenges that we need to be focused on as an industry. We had speakers like Marten Mickos (SVP & General Manager of Cloud at HP) giving a keynote as well, who has taken on this challenge for all of HP’s cloud services.
So that’s one end, and on the other end we have representatives from universities across the globe: Portugal, Germany, and even here locally, where we have representatives from Berkeley to MIT, so from the west to the east coast.
InfoQ: So obviously there is something in it for Basho, besides just hosting the conference?
Adam: The process for us is really being the thought leader. As a database ourselves, we can handle at-scale distributed environments, we believe, at production level load better than anyone else in the world for unstructured data. So this is a challenge that our core routing engine was uniquely designed to handle. And so the thought process was that these workloads, whether people are looking at say identity management or orchestration or some type of application, ultimately they are going to need a database and so it’s a process we can help solve as well, from our point of view.
InfoQ: When you say database you mean NoSQL, correct?
Adam: If you think of the unstructured database world, databases that are not relational, you have two camps: you have the Hadoop camp, and you have NoSQL and its derivatives. Hadoop is good for batch processing, and a lot of people do analytics work on top of it, and collect all of that data. I sometimes humorously refer to it as the data warehouse of the 21st century because it’s become the resting place for all data, whatever workload you then apply it to.
NoSQL and its derivatives are different data models or different data types that are trying to actually handle workloads in live environments. They are actually in the use cases. In fact, one of the people here is Bryson Koehler, CIO of The Weather Company (parent company of The Weather Channel). They built an Internet of Things platform with which they can take in data from 40,000 sensor feeds from all over the world, and then apply it in real-time for retailers like Walmart to understand what types of things they should make available based on the temperature that’s going to be happening, say, in the afternoon vs. the morning. And so the retailers might advertise or make selections available in front of their store based on integrating The Weather Company’s data in real-time with their ERP system. One datapoint they have recently studied in that example was Apple’s iOS 8. They are getting hit by something like 10 billion transactions a day requesting data. And at the core of that, the data resides in Basho’s Riak. That’s a lot of people checking data on their iPhones!
InfoQ: The database field is quite crowded. Oracle is the elephant in the room, but there are also all of the NoSQL offerings: Cassandra, MongoDB, and so on. How is a vendor to compete in this jungle?
Adam: You mean “why Basho?” If you think of the four or five players that have significant critical mass in the NoSQL space, the same four or five are always going to come into the discussion. It’s going to be MongoDB, DataStax, Couchbase, and Basho; Cassandra is a movement, even though DataStax is the commercial portion of that. It gets really quiet from a software perspective after that. There are some hardware manufacturers that put things together, etc. But if you think software, none of us are that much bigger than the other. I am within a shot’s range of most of my competitors’ revenues; we have all raised pretty significant capital, and we all have a certain degree of blue-chip clients. Where I think Basho has the distinct advantage, and it is a reason that people such as Peter, Dave McCrory (Basho CTO), and myself are here, is that Basho has one thing that our competitors have not done a very good job of, and that’s production level scale. I gave you the example of The Weather Company. We also recently beat out Oracle at the NHS, the National Health Service for England. It’s effectively the medical society that handles all 80 million patients in the United Kingdom. They replaced Oracle’s relational database with Basho’s Riak. Why did they do that? Because it was the one database they could trust at production level scale. Our core routing engine has won those types of use cases time and time again, and what attracted us here is that you can do a lot if you can prove that you can do production level load in distributed environments. You can build a lot on top of that. We can only allude to our future now, but over time, especially as we get to Q1, we’ll come out with the strategy that brought us here. The ability to take that and do so much more is why I am sitting in the chair.
InfoQ: That’s a pretty ambitious target.
Adam: Well, the ambitious part is that we believe we can be the number one NoSQL database. We actually believe there is a limitation to the document databases like MongoDB and Couchbase. And Cassandra, though it can scale cleanly as well, has the problem of being difficult to manage at scale. Whereas if you ask our clients, as they scale up our engine it’s operationally very simple.
InfoQ: What other products do you provide besides Riak?
Adam: We basically have two products, Riak, which is a key-value store, and RiakCS which is an object store. And so we’re going after the database and storage market on the same platform. And by the way, we are the only company in the entire space that has database, storage, and search, all integrated together. It gives you an idea of the flexibility of the engine we built.
InfoQ: I didn’t know about the search part of it. Does that compete with Google and Endeca?
Adam: Think of it more as competing with Solr and Elasticsearch vs., say, a Google. It is specifically designed to be able to search your database with altogether different queries.
InfoQ: Search “my” database? Or search a Riak database?
Adam: Riak. Peter, I don’t know if you want to add to that?
Peter: We introduced it in 2.0. We had search prior to 2.0 but it was an in-house developed technology. As a part of 2.0 we tightly integrated Solr with Riak so it actually sits and runs on the same instance of Riak in a given node in a cluster.
InfoQ: Do you sell Riak or how do you make money?
Adam: We have an open source version and we have an enterprise version of our Riak and RiakCS. The enterprise version adds service and support and multi-data-center capability to the core features. We handle multiple data centers in a distributed environment; that’s the difference between enterprise and open source, and we sell that commercially. Our clients include everyone from AT&T and NTT to Yahoo Japan and Best Buy. We have offices in Tokyo, London, DC, and Seattle. We are headquartered in Bellevue, Washington, near Seattle.
InfoQ: On the Gartner Magic Quadrant, not just Basho but all of the NoSQL vendors are not where you would want them to be; they are in the lower left quadrant. Can you address that?
Peter: Quite frankly I did have a conversation with Gartner saying “how useful is this Magic Quadrant when you are combining traditional relational databases with NoSQL databases?” because they’re designed to do different things. They mixed structured and unstructured data in the same bucket.
Some folks estimate that NoSQL has about 20% penetration in enterprises today, and that by 2017 it will double to 40% penetration. So Gartner is still looking at it as if folks are predominantly using Oracle, with only some use cases and applications within the enterprise for these new technologies. What I said to them was “you’re doing your readers a disservice by not allowing them to see these as a separate category”. They’re being used for very different things. In fact, sometimes in the same application you’re using a traditional relational database, a strongly consistent database, next to a NoSQL database. So I’ll give you an example: a database that you store people’s passwords and stuff in might be very different from the kind of database in which you, if you’re Facebook, are storing status updates; they solve very different problems. And so you’ll see people having both types of databases in that same application. People aren’t traditionally using NoSQL to do transaction processing. So Gartner should separate them out, but they didn’t seem very inclined to do so.
InfoQ: Really! They will eventually, these are smart people, NoSQL is No SQL, the name tells you it is the exact opposite!
Adam: The analysts have a hard time understanding this space and how much disruptive change is happening. Though by some estimates there is a $30 billion to $50 billion opportunity here over the next five to ten years, the reality is that for enterprises using it across their production workloads, it’s young. Most of the work has been done by the developer community and applied to web-based architecture, not taken into traditional enterprise. Traditional enterprise is just starting to embrace it in an aggressive way, which is why you see Gartner just starting to really get their head around it. From Basho’s point of view that is an incredibly great opportunity, because the game is afoot and the money is just coming to the table, and we have production level capabilities.
InfoQ: So you’re saying Gartner is starting to come aboard?
Adam: No. Gartner’s enterprise clients, the people who pay them money for analysis reports, are starting to ask “what is this?”, so Gartner is starting to try to figure it out. I could be completely wrong on this but I would guess that by next year Gartner will break out NoSQL. I can tell you that Forrester already breaks out multiple different iterations on NoSQL, so they look at more than just NoSQL; they look at the document databases in one whole separate report. So they have two reports, one on NoSQL and one on document databases.
InfoQ: The traditional enterprise CIO is going to resist technologies like this in favor of their “old reliables”. What argument can someone working at say a bank use to convince their people to bring in Riak?
Peter: It depends on use cases, and within any industry there are a variety of use cases. So if you look at companies that are in the financial space, not only do they have trading systems, but they have consumer facing applications as well, and so keeping session state; as your consumer logs in, keeping that session state, particularly if you have a very large region or a global audience, is one classic use case. One of the presenters that is here this week, Two Sigma, built a whole analysis system for their hedge funds on top of Riak. So there’s another financial services type application. Even within a given vertical, there are places for traditional as well as NoSQL. NoSQL is typically used when you’re talking about very large data sets, things that are too big to fit on a single machine. In traditional database architecture, when your dataset gets to be too big, you manually break off pieces, they call that sharding; NoSQL solves that problem automatically for you.
So (we will solve) use cases where there is a very large dataset, or use cases where the audience is spread across a large region or globally, where latency will come into play.
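To make the contrast with manual sharding concrete, here is a minimal sketch of consistent hashing, one common way a distributed store can decide automatically which machine holds which key. It is an illustration under stated assumptions, not Riak's actual implementation: the `HashRing` class, the `vnodes` parameter, and the node names are all hypothetical.

```python
import hashlib
from bisect import bisect_right


class HashRing:
    """Toy consistent-hash ring (hypothetical, for illustration only)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes      # virtual points placed on the ring per node
        self._ring = []           # sorted list of (position, node)
        self._positions = []      # positions only, kept in step for bisect
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _position(label):
        # Hash a string to a large integer position on the ring.
        return int(hashlib.sha1(label.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Each node claims several points so load spreads more evenly.
        for i in range(self.vnodes):
            self._ring.append((self._position(f"{node}#{i}"), node))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, key):
        # A key belongs to the first ring point after its own hash,
        # wrapping around to the start of the ring at the end.
        idx = bisect_right(self._positions, self._position(key))
        return self._ring[idx % len(self._ring)][1]
```

With a ring like this, adding a machine only reassigns the keys that now fall on the new machine's points; every other key stays where it was, which is what removes the need to re-shard by hand as the dataset grows.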
Adam: We are 100% available. Even if something doesn’t work, we stay up; when it comes to the CAP theorem, it’s the “C”, consistency, that’s the piece we trade off. We choose an eventual consistency model. Regardless of whether a node drops or not, we will be able to synchronize all environments. Some vendors choose to accept just reads but not writes during a failure. We also accept the writes, so when the partition is healed, if the network was separated, or when a node that was down comes back, you will see logic that says “how do I resolve the fact that multiple writes happened during that failure?” But the system stays up and we accept both reads and writes the whole time.
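The question of how to resolve multiple writes that happened during a failure can be sketched with vector clocks, a standard technique for telling whether one write supersedes another or whether the two are genuinely concurrent. This is a simplified illustration, not Basho's code; the function names and node names are assumptions made for the example.

```python
def increment(clock, node):
    """Return a copy of the clock with node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def merge(a, b):
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def reconcile(versions):
    """Drop versions superseded by another version.

    versions is a list of (clock, value) pairs. Whatever survives is either
    the single latest value or a set of concurrent "siblings" that the
    application (or a merge rule) still has to resolve.
    """
    survivors = []
    for i, (clock, value) in enumerate(versions):
        superseded = any(
            j != i and other != clock and descends(other, clock)
            for j, (other, _) in enumerate(versions)
        )
        if not superseded and (clock, value) not in survivors:
            survivors.append((clock, value))
    return survivors
```

If two sides of a partition each accept a write to the same key, neither clock descends from the other, so `reconcile` keeps both as siblings. Once something resolves them and writes the result with a clock built from `merge` plus `increment`, that new version supersedes both.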
InfoQ: What’s on the horizon, both in the general distributed computing front and the Riak front?
Adam: Think of distributed systems as a whole, or hybrid cloud. I think with the proliferation of public clouds like Azure, AWS, and Google, you’re going to see many more enterprise clients look to be able to run components of their workloads on private or public clouds, and this opens up a whole new litany of challenges that we’re ideally suited for. Right now we are already interoperable at an API layer with certain stores such as S3 from AWS; we have a certain amount of API compatibility already. You can expect that we are going to continue to drive towards a point of view that’s going to enable hybrid clouds.
Peter: On the Riak product side, we are the only provider in the space that has NoSQL key-value store, object storage and search from a single vendor leveraging a single common platform. What we’re finding is that customers are looking for vendors that can solve a lot of their use case and application requirements, so we will continue doing our best to bring more functionality. They don’t want to end up in a place where they have five, six, maybe 10 NoSQL vendors, you know, graph on the one hand, columnar on another; they would like a small number of vendors solving their problems. So you’ll see us add additional functionality and additional data models.