Eric Horesnyi, CEO @streamdata.io, talks to Charles Humble about how hedge funds are applying deep learning as an alternative to the raw speed favoured by HFT to try and beat the market.
Key Takeaways
- Streamdata.io was originally built for banks and brokers, but more recently hedge funds have begun using the service.
- Whilst hedge funds like Renaissance Technologies have been using mathematical approaches for some time, deep learning is now being applied to markets. Common techniques such as gradient descent and backpropagation apply equally well to market analysis.
- The data sources used are very broad. As well as market data, the network might use sentiment analysis from social networks and social trading data, along with more unusual data such as retail data and IoT sensors from farms and factories.
- By way of contrast, High Frequency Trading focuses on latency. From an infrastructure standpoint you can play with propagation time, serialization (the thickness of the pipe), and processing time for any active component in the chain.
- One current battleground in HFT is around using FPGAs to build circuits dedicated to feed handlers. Companies such as NovaSparks are specialists in this area.
Show Notes
About Streamdata.io
- 1m:09s Streamdata.io was built for developers facing the issue of having to present real-time data to applications and web clients, especially in the trading area
- 1m:20s Their customers are typically banks, brokers and API providers
- 1m:30s They provide a proxy as a service, listening to APIs and turning them into event-driven data feeds (a minimal consumption sketch follows this list)
- 2m:26s In the last year they've seen hedge funds using the service to listen to APIs that they hadn't previously thought about
- 3m:27s AI in itself is an old concept and finance has been using it for a while - companies like Renaissance Technologies hired mathematicians to make sense of the marketplace.
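The sketch below is a minimal illustration of what consuming such a proxied, event-driven feed can look like. The proxy URL, the target API and the assumption that the proxy delivers Server-Sent Events with JSON payloads are placeholders for illustration, not Streamdata.io's actual endpoint or contract.

```python
# Minimal sketch: reading a Server-Sent Events feed produced by a streaming
# proxy in front of a polled REST API. URL and response shape are illustrative.
import json
import requests

PROXY_URL = "https://proxy.example.com/https://api.example.com/prices"  # placeholder

with requests.get(PROXY_URL, headers={"Accept": "text/event-stream"}, stream=True) as resp:
    resp.raise_for_status()
    for raw in resp.iter_lines(decode_unicode=True):
        if raw and raw.startswith("data:"):
            event = json.loads(raw[len("data:"):].strip())  # snapshot or incremental update
            print(event)  # hand off to downstream consumers here
```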
Using Technology in Finance
- 4m:15s What’s changed is the evolution and maturation of AI with deep learning
- 5m:08s Linguistic and image recognition have improved vastly thanks to Facebook and Google
- 5m:30s The frameworks and recipes that have been created for weather forecasting and other day-to-day activities can be used for financial decision-making in the market.
Techniques to Train Neural Networks
- 6m:38s Time horizons for your strategy are key. If you have a one-hour time horizon, you can employ all the techniques of deep learning: gradient descent, backpropagation and the practices that come with the discipline, like training, testing and validating (see the sketch after this list)
- 7m:26s It takes time to consume data, and the more data you consume, the better your prediction
- 7m:35s If you want to act on the market every minute or every second, these models are more difficult to apply.
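As a concrete illustration of those techniques, here is a minimal gradient-descent sketch with a chronological train/validate/test split; the features, targets and hyperparameters are synthetic stand-ins, not anything discussed in the podcast.

```python
# A minimal sketch of the train/validate/test workflow with gradient descent,
# using synthetic data in place of real market features (all names illustrative).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                        # 5 made-up market features
true_w = np.array([0.5, -0.2, 0.1, 0.0, 0.3])
y = X @ true_w + rng.normal(scale=0.1, size=1000)     # next-period return (synthetic)

# Chronological split: train on the past, validate and test on later periods.
X_train, y_train = X[:600], y[:600]
X_val,   y_val   = X[600:800], y[600:800]
X_test,  y_test  = X[800:],   y[800:]

w, lr = np.zeros(5), 0.01
for epoch in range(500):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)  # gradient of MSE
    w -= lr * grad                                                  # gradient descent step

def mse(X_, y_):
    return float(np.mean((X_ @ w - y_) ** 2))

print("validation MSE:", mse(X_val, y_val), "test MSE:", mse(X_test, y_test))
```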
Overfitting
- 8m:21s There are limitations to using data sets to make your decisions: if you make a decision using historical data and the market changes mood, then you're going to make wrong decisions.
- 8m:38s If you think of your machine learning model as a brain that can learn and adapt to conditions in the marketplace, failure can occur if the brain is trained incorrectly.
- 10m:18s This is the classic AI issue of overfitting: if you've trained the brain on something specific to a time and environment, it can become so fitted to that environment that it cannot perform when things change.
Avoiding Overfitting
- 10m:45s To avoid overfitting, introduce some noise into the data sets that you feed into your machine learning (a minimal sketch follows this list)
- 11m:00s However, when you introduce noise you could end up with a brain that is not fit for any market.
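A minimal, self-contained sketch of that noise-injection idea; the synthetic data and the noise scale are illustrative assumptions, and in practice the noise scale is something you would tune on a validation set.

```python
# A minimal sketch of noise injection as a guard against overfitting: Gaussian
# noise is added to the training features on every pass, so the model cannot
# memorise one exact market regime. All data and scales are illustrative.
import numpy as np

rng = np.random.default_rng(1)
X_train = rng.normal(size=(600, 5))                                  # synthetic features
y_train = X_train @ np.array([0.5, -0.2, 0.1, 0.0, 0.3]) \
          + rng.normal(scale=0.1, size=600)                          # synthetic target

w, lr, noise_scale = np.zeros(5), 0.01, 0.05
for _ in range(200):
    X_noisy = X_train + rng.normal(scale=noise_scale, size=X_train.shape)
    grad = 2 * X_noisy.T @ (X_noisy @ w - y_train) / len(y_train)
    w -= lr * grad

# Too much noise and the brain fits no market at all, so noise_scale is itself
# something to tune against a held-out validation set.
```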
Different Ways To Train Models
- 11m:52s Train the model on a subset of the data, with a time horizon and window that evolve based on the timeframe you're targeting.
- 12m:12s The model will then be able to adapt to the market and make decisions quickly; this is done using data-streaming algorithms (a rolling-window sketch follows this list)
- 12m:37s If your investment strategy is long term, or you're okay with making a decision once per day, then this isn't an issue you have.
- 12m:54s The issue comes about when you start getting into strategies where you are pretty close to the market and you want to leverage momentum.
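A minimal sketch of that rolling-window idea: the model is only ever updated on the most recent observations, so it tracks the current regime rather than the full history. The window size, learning rate and update rule are illustrative assumptions.

```python
# A rolling-window update: only the most recent `window` observations are used,
# so the model tracks the current market regime. All names are illustrative.
from collections import deque
import numpy as np

window = deque(maxlen=500)            # most recent (features, target) pairs

def on_new_tick(features, target, w, lr=0.01):
    """Append the new observation and take one gradient step over the window."""
    window.append((features, target))
    X = np.array([f for f, _ in window])
    y = np.array([t for _, t in window])
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

# Stand-in for a live market data stream feeding the model tick by tick.
w = np.zeros(5)
rng = np.random.default_rng(2)
true_w = np.array([0.5, -0.2, 0.1, 0.0, 0.3])
for _ in range(1000):
    f = rng.normal(size=5)
    w = on_new_tick(f, f @ true_w, w)
```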
The Buy-side Industry
- 13m:16s The buy-side industry is the group of companies that everybody knows, like Vanguard, Fidelity, Capital Group and BlackRock - they take money from people and put it into long-term investments (for retirement, for example)
- 14m:08s The human trader is an exception in the market now - it’s all algorithms these days
The Rise of Electronic Exchanges
- 14m:31s In the US between 1998 and 2005 there was an explosion of regulation to open up the market and give new players the opportunity to make markets, so that the market could become more efficient electronically
Reducing Latency from an Infrastructure Perspective
- 15m:44s There are multiple elements to work with to reduce latency
- 15m:50s The propagation time depends on the distance and density of the medium you use
- 16m:09s Serialization: the time it takes to put a data packet onto the medium; reducing this is a matter of using bigger pipes
- 16m:33s Processing time: any active piece of equipment in the chain adds latency.
Propagation
- 17m:17s From around the year 2000 companies started getting as close as possible to the exchanges. Then they asked for dark fibre: a way to connect directly from one point to another over the shortest distance. This has now largely been superseded by radio (microwave) links (see the back-of-the-envelope comparison below).
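A back-of-the-envelope comparison of why radio beats fibre on propagation time; the 1,200 km distance (roughly the Chicago to New York corridor) and the refractive index of 1.5 are assumptions for illustration.

```python
# Propagation time depends on distance and on the medium: light travels at
# roughly c/1.5 in optical fibre versus close to c through the air, which is
# why microwave links beat even the straightest dark fibre.
C = 299_792.458            # speed of light in km/s
distance_km = 1_200        # assumed, roughly the Chicago - New York corridor

fibre_ms = distance_km / (C / 1.5) * 1_000       # refractive index ~1.5
microwave_ms = distance_km / (C * 0.99) * 1_000  # near line-of-sight through air

print(f"fibre:     {fibre_ms:.2f} ms one way")      # ~6 ms
print(f"microwave: {microwave_ms:.2f} ms one way")  # ~4 ms
```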
Serialization
- 19m:08s The size of the pipe you use between one point and another. The typical medium was 10 Mbps, then it went to 100 Mbps; if you go from 100 Mbps to 10 Gbps you can shave off further latency as well (see the sketch below).
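A quick illustration of serialization delay as packet size divided by link rate; the 1,500-byte frame is an assumed, typical Ethernet-sized message, not a figure from the podcast.

```python
# Serialization delay per packet: time = packet size / link rate.
packet_bits = 1_500 * 8   # assumed 1,500-byte frame

for rate_bps, label in [(10e6, "10 Mbps"), (100e6, "100 Mbps"), (10e9, "10 Gbps")]:
    micros = packet_bits / rate_bps * 1e6
    print(f"{label:>8}: {micros:8.1f} microseconds per packet")
# 10 Mbps: 1200, 100 Mbps: 120, 10 Gbps: 1.2 - multiplied across a burst of
# market-data packets, the savings add up.
```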
Processing
- 19m:57s Processing time is well understood by all developers. Reducing it comes down to two things: avoiding unnecessary equipment in the path, and building dedicated, specialist equipment for feed handling using FPGAs.
- 20m:33s You can accelerate that processing time by building dedicated circuits for your feed handler (a software sketch of the hot path they replace follows below)
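For context, the hot path an FPGA feed handler moves into hardware is essentially message decoding. The sketch below shows a software equivalent over an invented, hypothetical message layout, not any real exchange protocol.

```python
# A software feed handler's hot path: decode a fixed-layout binary message from
# the exchange feed into fields. The layout here is invented for illustration;
# FPGA feed handlers perform exactly this decode step in hardware.
import struct

# Hypothetical 20-byte message: symbol id (uint32), price in ticks (uint64),
# size (uint32), side (uint8), 3 bytes padding.
MSG = struct.Struct("<IQIB3x")

def decode(buf: bytes):
    symbol_id, price_ticks, size, side = MSG.unpack(buf)
    return symbol_id, price_ticks / 10_000, size, "B" if side == 0 else "S"

sample = MSG.pack(42, 1_234_500, 100, 0)
print(decode(sample))   # (42, 123.45, 100, 'B')
```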
GPUs
- 21m:26s Everyone working on AI is using GPUs, whether they know it or not
- 21m:46s GPUs are extremely good at performing computations on large matrices, which is exactly what you need for gradient descent and backpropagation (a minimal sketch follows)
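A minimal sketch of the point, assuming PyTorch is installed: the dense matrix products that dominate backpropagation run on the GPU when one is available, with a CPU fallback. The matrix sizes are arbitrary.

```python
# The same matrix multiply that dominates gradient descent and backpropagation
# runs on whichever device is available; GPUs make this massively parallel.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

W = torch.randn(4096, 4096, device=device)
X = torch.randn(4096, 4096, device=device)

Y = X @ W            # the kind of dense matrix product backpropagation repeats
print(device, Y.shape)
```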
Cloud vs. Data Centers
- 22m:35s The industry is conservative for good reason—we’re talking about IP that generates a lot of money
- 22m:54s Even if it costs more, the return on your risk will be better. Some elements can probably be put in the cloud.
What’s Next at Streamdata.io
- 23m:25s What's exciting for us is serving more AI companies on top of the application builders we already serve with real-time data
- 23m:56s Exploring sectors other than finance that use real-time data, including transport and healthcare.
Data Sources Used
- 24m:31s Streamdata.io offers people the possibility of building these event-driven "brains". You want the brain to be fed with as much information as possible about the world around it.
- 25m:25s You need market data, and you also need to consume data about what people say about companies that you can find on social media networks
- 25m:32s You also have networks like StockTwits that specialise in social trading data, with people who are experts in the area talking about what they think of specific companies at specific times
- 25m:51s You can also consume APIs that specialise in providing sentiment data (a feature-blending sketch follows at the end of this list)
- 26m:21s The brain can then make an educated decision as if it was on the floor of all exchanges and sitting by all the traders and analysts in the world
- 26m:37s You can see people consuming unusual data for forecasts
- 27m:01s If you're trading meat futures, it's probably a good idea to listen to what's going on in Argentina, using IoT sensors to check the health of the cows there
- 27m:25s These unusual sources of data are opening a wealth of possibility for the brain to know everything that’s happening at a given moment in the world to make the best possible decision on whether to buy or sell
- 28m:35s It used to be about speed; today it's about knowing more than others, every second
- 28m:58s It’s about how fast you can make sense of all the data, and what’s your ability to build a model to make sense out of data that could seem uncorrelated.
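As a closing illustration, a minimal sketch of blending streamed market data with a sentiment score into a single feature row for the "brain"; the field names, scales and values are hypothetical and not taken from any particular sentiment API.

```python
# Combining market data with a social-sentiment score into one model input row.
# All names and values are illustrative placeholders.
import numpy as np

def build_feature_vector(price_return, volume_zscore, sentiment_score):
    """Blend a market return, a volume anomaly measure and a sentiment score
    (e.g. -1 .. +1 from a sentiment feed) into a single feature row."""
    return np.array([price_return, volume_zscore, sentiment_score])

row = build_feature_vector(price_return=0.0012,
                           volume_zscore=1.8,
                           sentiment_score=0.4)   # hypothetical values
# This row would then be fed to the trained "brain" alongside the rest of the
# streamed features.
print(row)
```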