If one reviews the talks at a modern conference, it'd be reasonable to assume that many of today's software systems are made up of stateless compute, distributed databases, and high-throughput message brokers. Startup Swim recently open-sourced their platform that uses stateful "digital twins" to analyze streaming data in real time without depending on databases, message brokers, app servers, or storage.
Swim describes its platform as a "single integrated software stack that is efficient and able to run anywhere, from the smallest compute devices to the largest clouds." The software consists of a 2MB JVM extension that runs on devices or servers. Data Center Knowledge explains that the "software works through mesh-connected 'digital twins' that utilize machine learning to predict changes in the data being created."
Swim CEO Chris Sachs told The New Stack that this platform takes a different approach to data.
"The difference between the edge and the cloud has a lot to do with databases. Cloud applications, in our view, are database-centric apps," said Sachs in an interview. "To us, the edge is not. By the edge, we don't mean devices, although we do run on physical edge devices, and we don’t really mean a place at all. We mean apps driven by data as it flows, rather than post hoc after it's stored in a database, whether it's in a cloud or on a physical edge. We don't care where we physically run."
...
Sachs says that Swim's edge compute platform contrasts with the traditional cloud computing paradigm by processing real-time data as it arrives, not after it is stored in a database. He divides applications into two general categories, database-centric and data-driven, with Swim enabling the latter.
What Swim open-sourced consists of the components for building and running stateful applications, the UI framework and SDKs for building streaming visualizations, and the streaming APIs to integrate various data streams. One key part of the platform is the WARP streaming protocol. Developed by Swim, it's described as a "multiplexed streaming upgrade to HTTP, to enable the creation of bidirectional streaming links between distributed application services."
The four-year-old startup released the open-source version of its platform under the permissive Apache 2.0 license. The company plans to follow an "open core" model where the proprietary version adds "security, manageability, persistence, and scalability features to the core Swim platform."
To learn more, InfoQ reached out to Swim CTO Simon Crosby, a well-known industry veteran who founded XenSource, served as CTO at Citrix, and co-founded the security company Bromium.
InfoQ: What would you like to see the community do with the OSS Swim platform? Build extensions? Discover new use cases?
Simon Crosby: Swim holds the promise of a massively more efficient and simple way to build data-driven apps. We are looking forward to the new uses that the community will find. We also want to foster and support a community that contributes new application frameworks upon which others can build.
InfoQ: Let's see if I understand the platform correctly. Is it right that the platform instantiates instances of objects for each "thing", these addressable objects store state about that "thing" and are updated when the real-life "thing" changes state, and the platform creates models that represent aggregate information or intelligence about the collections?
Crosby: Spot on. Swim builds a live model of the real world from data. The model is a graph of digital twins of real world sources, with links to reflect relationships like proximity or containment. Digital twins are active stateful objects that mirror the state of the real world. They also analyze, learn and predict, query etc on the data flows in the graph, in real time. And then they stream their results.
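To make the model described above concrete, here is a minimal sketch in plain Java. It is not the Swim API: the class names (`SensorTwin`, `ZoneTwin`, `TwinRegistry`) and the sensor scenario are hypothetical, chosen only to illustrate one stateful object per "thing", a containment link between twins, and an aggregate that is recomputed in memory as each reading arrives.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch in plain Java. This is NOT the Swim API; every name is hypothetical.
final class DigitalTwinSketch {

    /** Stateful twin of one real-world sensor; its state lives in memory. */
    static final class SensorTwin {
        final String id;
        private double latest = Double.NaN; // NaN means no reading seen yet
        private ZoneTwin zone;              // "containment" link to a parent twin

        SensorTwin(String id) { this.id = id; }

        /** Invoked when a reading arrives: update state, then notify the linked twin. */
        void onReading(double value) {
            latest = value;
            if (zone != null) zone.onMemberChanged();
        }

        double latest() { return latest; }
    }

    /** Aggregate twin that recomputes a summary whenever a member changes. */
    static final class ZoneTwin {
        private final List<SensorTwin> members = new ArrayList<>();
        private double average = Double.NaN;

        void add(SensorTwin sensor) {
            members.add(sensor);
            sensor.zone = this;
        }

        void onMemberChanged() {
            double sum = 0;
            int count = 0;
            for (SensorTwin s : members) {
                if (!Double.isNaN(s.latest())) { sum += s.latest(); count++; }
            }
            average = count == 0 ? Double.NaN : sum / count;
        }

        double average() { return average; }
    }

    /** Creates one twin per source id on first sight and routes readings to it. */
    static final class TwinRegistry {
        private final Map<String, SensorTwin> twins = new LinkedHashMap<>();
        private final ZoneTwin zone = new ZoneTwin();

        void ingest(String sourceId, double value) {
            SensorTwin twin = twins.computeIfAbsent(sourceId, id -> {
                SensorTwin created = new SensorTwin(id);
                zone.add(created);
                return created;
            });
            twin.onReading(value);
        }

        ZoneTwin zone() { return zone; }
        int size() { return twins.size(); }
    }

    public static void main(String[] args) {
        TwinRegistry registry = new TwinRegistry();
        registry.ingest("sensor-1", 20.0);
        registry.ingest("sensor-2", 30.0);
        registry.ingest("sensor-1", 40.0); // same twin, new state
        System.out.println(registry.size() + " twins, zone average " + registry.zone().average());
    }
}
```

In this sketch the third reading updates the existing twin for `sensor-1` rather than creating a new object, and the zone's average is recomputed immediately from in-memory state, with no database read in between. A real deployment would also need concurrency control, persistence, and distribution, which are out of scope here.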
InfoQ: Conventional wisdom seems to be that a modern distributed system consists of stateless microservices using a database backend and asynchronous interactions facilitated by an event broker. You're coming at this from a different perspective. What's your take on stateful computing in distributed systems?
Crosby: The real world is stateful. Stateless computing and databases have allowed the cloud to scale out superbly, but they are massively wasteful of CPU. For every cycle of useful edge compute at memory speed, a stateless centralized architecture wastes a billion cycles. That's enough for a Raspberry Pi at the edge to outrun a powerful cloud instance!
So the future of the edge is stateful, and computation is driven by data in real time. An inference cycle at the edge takes less time than getting one packet of data from the edge to the cloud!
InfoQ: What's the developer workflow with Swim? Assuming one has an environment up and running, what does the developer have to do to create and deploy an application?
Crosby: Swim is a 2MB extension to the JVM. The developer simply defines a schema on the data and the methods on the digital twins of each type that are called when data arrives. She also defines the computational objects that will analyze the digital twins in the model. The powerful capabilities in Swim put the bleeding edge of analytics, learning and inference in the palm of the dev, who can deliver deep insights and predictions in a few lines of Java.
Swim also has an in-browser implementation that can link to digital twins to deliver real-time UIs in a few lines of JavaScript.
InfoQ: Many of the public Swim use cases relate to real-time analytics and IoT. What other use cases are suitable for this platform? Which scenarios would NOT be a good fit?
Crosby: I think of Swim as being a way to build a "LinkedIn for things": things link to other things to which they are related, and powerful analytics execute in real time in the graph. I read about Lyft spending $100M per year on AWS. I'd love to see Swim used to drop that to $10M. In a nutshell, Swim is massively applicable to the problems tackled by many cloud native apps. We want to help those folk too.