Datameer recently announced SmartAI, which integrates Big Data with machine learning models to provide better data insights.
The goal of SmartAI is to operationalize AI for Big Data: to run AI algorithms at scale on Big Data, to enable business analysts to plug AI models into analytic workflows, and to do all of this in a secure and compliant manner.
InfoQ caught up with Andrew Brust, senior director of market strategy and intelligence at Datameer, to discuss the role of Big Data in AI in general and SmartAI in particular.
InfoQ: Datameer has hitherto been in the Big Data space. Why move to AI? Is Big Data passé?
Andrew Brust: Datameer is still very much in the Big Data space. In fact, SmartAI is all about integrating AI into the Big Data analytics workflow. Much of the difficulty around enterprise adoption of AI has been that it’s very segregated from Big Data, Business Intelligence and other analytics tools and technologies. With SmartAI, we are trying to fix that. We’re harnessing AI’s power by bringing it into mainstream analytic pipelines.
InfoQ: What do you think of the synergy between Big Data and AI? How does SmartAI exploit this?
Brust: There can be a strong synergy between Big Data and AI, but the industry hasn’t done much to facilitate that. With SmartAI, Datameer is bringing these two worlds together. Scoring data against machine learning models has mostly been an ad hoc process, much of the time done on data scientists’ workstations. But scoring should be done on an automated basis, at Big Data scale, across all the nodes in a Hadoop cluster. That’s what they’re there for.
InfoQ: Machine Learning involves significant data preparation and manipulation. Does that play into Hadoop’s strengths in general and to your company’s strength in particular?
Brust: Yes, data preparation is a big part of our story, a big part of Big Data’s story and it’s also a big part of the AI story. If you’re going to do AI, you’re going to select an algorithm and train a model. The data used to train the model has to be clean, filtered down to just the bare essential input (feature) columns, and aggregated at the right level. All of those steps are data preparation steps. So, too, is deriving columns based on complex calculations against values present in the raw data. Datameer does all of this, using Hadoop, and it does it very well. So adding AI capabilities to the product constitutes a very natural extension.
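The preparation steps Brust lists (cleaning, filtering down to feature columns, aggregating, and deriving columns) can be illustrated with a minimal plain-Python sketch. This is not Datameer's implementation; the column names, the sample rows, and the `prepare` helper are all hypothetical and chosen only to make the steps concrete.

```python
from collections import defaultdict

# Hypothetical raw usage records; every column name here is illustrative.
raw_rows = [
    {"customer_id": "c1", "minutes": 30.0, "charge": 6.0, "agent_note": "ok"},
    {"customer_id": "c1", "minutes": 10.0, "charge": 3.0, "agent_note": ""},
    {"customer_id": "c2", "minutes": None, "charge": 4.0, "agent_note": "n/a"},
    {"customer_id": "c2", "minutes": 20.0, "charge": 5.0, "agent_note": "late"},
]

# Assumed set of input (feature) columns the model needs.
FEATURE_COLUMNS = ("customer_id", "minutes", "charge")


def prepare(rows):
    """Clean, filter, aggregate, and derive a column (illustrative only)."""
    # 1. Clean: drop rows that are missing a value in any feature column.
    cleaned = [
        r for r in rows
        if all(r.get(col) is not None for col in FEATURE_COLUMNS)
    ]

    # 2. Filter: keep only the feature columns, discarding the rest.
    filtered = [{col: r[col] for col in FEATURE_COLUMNS} for r in cleaned]

    # 3. Aggregate: roll the records up to one row per customer.
    totals = defaultdict(lambda: {"minutes": 0.0, "charge": 0.0})
    for r in filtered:
        totals[r["customer_id"]]["minutes"] += r["minutes"]
        totals[r["customer_id"]]["charge"] += r["charge"]

    # 4. Derive: compute a new column from values in the aggregated data.
    prepared = {}
    for customer_id, t in totals.items():
        rate = t["charge"] / t["minutes"] if t["minutes"] else 0.0
        prepared[customer_id] = {**t, "charge_per_minute": rate}
    return prepared


prepared = prepare(raw_rows)
```

In this sketch the row with the missing `minutes` value is dropped before aggregation, so each customer ends up with a single, complete feature row.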
InfoQ: What do you mean by operationalizing AI? Can you go into some technical depth here in general and in particular with SmartAI?
Brust: As I mentioned earlier, scoring data has been mostly a manual, bespoke process, done by data scientists on their own workstations. The output of that process is very useful to the business. But doing it that way isn’t up to Enterprise standards, any more than running some other critical computing process by manually clicking a button would be.
The scoring process needs to be automated, managed, monitored, and run at scale. Put another way, it needs to be operationalized. It may sound ridiculous to say that heretofore it hasn’t been done that way, but, in large part, that’s an accurate statement. With SmartAI, scoring is operationalized, because it’s integrated into the same data pipeline engine and management fabric that we have used for Big Data analytics for years. This architecture makes it possible, for example, to run a churn analysis model on your Big Data, on a daily or hourly basis, compliant with policies set up for such an analysis pipeline.
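To make the idea of automated, monitored scoring at scale more concrete, here is a rough sketch under several assumptions. The `churn_score` function is a stand-in for a trained model with made-up weights, worker threads stand in for cluster nodes, and the run report stands in for monitoring. None of these names come from Datameer, and the scheduling itself (daily or hourly) is left to whatever scheduler would call `run_scoring_job`.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def churn_score(record):
    """Stand-in for a trained model: a logistic function with made-up weights."""
    z = 0.8 * record["support_calls"] - 0.05 * record["tenure_months"]
    return 1.0 / (1.0 + math.exp(-z))


def score_partition(partition):
    """Score every record in one partition of the data."""
    return [
        {"customer_id": r["customer_id"], "churn_score": churn_score(r)}
        for r in partition
    ]


def run_scoring_job(partitions, max_workers=4):
    """Score all partitions in parallel and return results plus a run report."""
    started = time.time()
    # Threads are only an analogy here for nodes working on separate partitions.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_partition = list(pool.map(score_partition, partitions))
    scored = [row for part in per_partition for row in part]
    # A real pipeline would send this to its monitoring system.
    report = {
        "partitions": len(partitions),
        "rows_scored": len(scored),
        "seconds": time.time() - started,
    }
    return scored, report


# Hypothetical input, already split into partitions.
partitions = [
    [{"customer_id": "c1", "support_calls": 2, "tenure_months": 24}],
    [
        {"customer_id": "c2", "support_calls": 0, "tenure_months": 60},
        {"customer_id": "c3", "support_calls": 5, "tenure_months": 3},
    ],
]

scored, report = run_scoring_job(partitions)
```

The point of the sketch is the shape of the job rather than the model: scoring becomes a repeatable function over partitioned data that reports what it did, instead of a manual step on one workstation.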
InfoQ: Insofar as Deep Learning is concerned, does SmartAI address both training and inference? Or does this mainly address analytics?
Brust: The actual training process does not move into Datameer. Data Scientists will continue to train their models using their favorite development environment, language, tool and/or command line interface.
Once the models are trained, they can be imported into Datameer and become callable as additional spreadsheet functions in our library of over 270 of them. This enables embedding AI into secure, governed, scheduled data pipelines, rendering both the analytics and the AI even more powerful than they would be if separate.
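The notion of an imported model becoming one more callable function in a library can be sketched as a simple registry. This is only an illustration of the pattern: `FUNCTION_LIBRARY`, `register_model`, `call_function`, and the toy `churn_model` are hypothetical names, not Datameer's API, and the model is a hand-written formula rather than an actual trained TensorFlow model.

```python
import math

# A toy "function library"; the two built-ins are placeholders.
FUNCTION_LIBRARY = {
    "SUM": lambda *values: sum(values),
    "ROUND": lambda value, digits=0: round(value, digits),
}


def register_model(name, predict):
    """Expose a model's predict callable as a named library function."""
    if name in FUNCTION_LIBRARY:
        raise ValueError(f"function {name!r} is already registered")
    FUNCTION_LIBRARY[name] = predict


def call_function(name, *args):
    """Look a function up by name and call it, as a formula engine might."""
    return FUNCTION_LIBRARY[name](*args)


def churn_model(tenure_months, support_calls):
    """Stand-in for an imported, externally trained model (made-up weights)."""
    z = 0.8 * support_calls - 0.05 * tenure_months
    return 1.0 / (1.0 + math.exp(-z))


# After registration the model is called like any other library function.
register_model("CHURN_SCORE", churn_model)
score = call_function("CHURN_SCORE", 24, 2)
```

Once registered, the model sits alongside the built-in functions, so a pipeline that already knows how to evaluate functions by name needs no special handling for it.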
InfoQ: Finally, why the choice of TensorFlow and does the product roadmap plan to address other AI toolkits in the future?
Brust: Today, TensorFlow has the popularity, mindshare and ecosystem momentum, and so that’s where we’ve made our investments. The AI field is ever-changing and we are well aware of the many other open source deep learning frameworks out there, any one of which has the potential to gain its own momentum. We’ll monitor the market and see if the landscape changes; the architectural approach we’ve taken with TensorFlow could be used with other libraries. For now, though, we are very confident in our decision to implement SmartAI as an integration of TensorFlow into Datameer.
Datameer SmartAI will be available as a technical preview shortly.