In this podcast, Shane Hastie, Lead Editor for Culture & Methods, spoke to Matthew Scullion about the state of the data analytics workforce, friction in data analytics value streams and the resultant high rates of stress and burnout.
Key Takeaways
- 60% of the work in data analytics is sourcing and preparing the data
- 75% of data teams believe that the ways that they're doing migration of data, data integration and maintenance, are costing their organizations both time, productivity and money
- 50% of data analytics team members feel under constant pressure and stress, and are experiencing burnout
- Most large corporates are trying to integrate data from over 1000 disparate systems
- Pre-cloud data integration tools and techniques do not work well in the modern cloud-based environment
Transcript
Shane Hastie: Good day folks. This is Shane Hastie for the InfoQ Engineering Culture podcast. Today I'm sitting down literally across the world from Matthew Scullion. Matthew is the CEO of Matillion who do data stuff in the cloud. Matthew, welcome. Thanks for taking the time to talk to us today.
Introductions [00:25]
Matthew Scullion: Shane, it is such a pleasure to be here. Thanks for having us on the podcast. And you're right, data stuff. We should perhaps get into that.
Shane Hastie: I look forward to talking about some of this data stuff. But before we get into that, probably a good starting point is, tell me a little bit about yourself. What's your background? What brought you to where you are today?
Matthew Scullion: Oh gosh, okay. Well, as you said, Shane, Matthew Scullion. I'm CEO and co-founder of a software company called Matillion. I hail from Manchester, UK. So, that's a long way away from you at the moment. It's nice to have the world connected in this way. I've spent my whole career in software, really. I got started very young. I don't know why, but I'm a little embarrassed about this now. I got involved in my first software startup when I was, I think, 17 years old, back in the late nineties, on the run-up to the millennium bug and also, importantly, as the internet was just starting to revolutionize business. And I've been working around B2B enterprise infrastructure software ever since.
And then, just over 10 years ago, I was lucky to co-found Matillion. We're an ISV, which means a software company. So, you're right, we do data stuff. We're not a solutions company, so we don't go in and deliver finished projects for companies. Rather, we make the technologies that customers and solution providers use to deliver data projects. And we founded that company in Manchester in 2011. Just myself, my co-founder Ed Thompson, our CTO at the time, and we were shortly thereafter joined by another co-founder, Peter McCord. Today, the company's international. About 1500 customers around the world, mostly, in revenue terms certainly, large enterprise customers spread across well over 40 different countries, and about 600 Matillioners. Roughly half R and D, building out the platform, and half running the business and looking after our clients and things like that. I'm trying to think if there's anything else at all interesting to say about me, Shane. I am, outside of work, lucky to be surrounded by beautiful ladies, my wife and my two daughters. And so, between those two things, Matillion and my family, that's most of the interesting stuff to say about me, I think.
Shane Hastie: Wonderful. Thank you. So, the reason we got together to talk about this was a survey that you did looking at what's happening in the data employment field, the data job markets, and in the use of business data. So, what were the interesting things that came out of that survey?
Surveying Data Teams [03:01]
Matthew Scullion: Thanks very much for asking about that, Shane. And you're quite right, we did do a survey, and it was a survey of our own customers. We're lucky to have quite a lot of large enterprise customers that use our technology. I mean, there's hundreds of them. Western Union, Sony, Slack, National Grid, Peet's Coffee, Cisco. It's a long list of big companies that use Matillion software to make their data useful. And so, we can talk to those companies, and also ones that aren't yet Matillion customers, about what they've got going on in data, wider in their enterprise architecture, and in fact, with their teams and their employment situations, to make sure we are doing the right things to make their lives better, I suppose. And we had some hypotheses based on our own experience. We have a large R and D team here at Matillion, and we had observations about what's going on in the engineering talent market, of course, but also feedback from our customers and partners about why they use our technology and what they've got going on around data.
Our hypothesis, Shane, and the reason that Matillion exists as a company, really is, as I know, certainly you and probably every listener to this podcast will have noticed, data has become a pretty big deal, right? As we like to say, it's the new commodity, the new oil, and every aspect of how we work, live and play today is being changed, we hope for the better, with the application of data. It's happening now, everywhere, and really quickly. We can talk, if you want, about some of the reasons for that, but let's just bank that for now. You've got this worldwide race to put data to work. And of course, what that means is there's a constraint, or set of constraints, and many of those constraints are around people. Whilst we all like to talk about and think about the things that data can do for us, helping us understand and serve our companies' customers better is one of the reasons why companies put data to work. Streamlining business processes, improving products, increasingly data becoming the products.
All these things are what organizations are trying to do, and we do that with analytics and data visualization, artificial intelligence and machine learning. But what's spoken about a lot less is that before you can do any of that stuff, before you can build the core dashboard that informs an area of the business, what to do next with a high level of fidelity, before you can coach the AI model to help your business become smarter and more efficient, you have to make data useful. You have to refine it, a little bit like iron ore into steel. The world is awash with data, but data doesn't start off useful in its raw state. It's not born in a way where you can put it to work in analytics, AI, or machine learning. You have to refine it. And the world's ability to do that refinement is highly constrained. The ways that we do it are quite primitive and slow. They're the purview of a small number of highly skilled people.
Our thesis was that every organization would like to be able to do more analytics, AI and ML projects, but they have this kink in the hose pipe. There's size nine boots stood on the hose pipe of useful data coming through, and we thought that was likely causing stress and constraint within enterprise data teams. So we did this survey to ask and to say, "Is it true? Do you struggle in this area?" And the answer was very much yes, Shane, and we got some really interesting feedback from that.
Shane Hastie: And what was that feedback?
60% of the work in data analytics is sourcing and preparing the data [06:42]
Matthew Scullion: So, we targeted the survey on a couple of areas. And first of all, we're saying, "Well, look, this part of making data useful in order to unlock AI machine learning and analytics projects. It may well be constrained, but is it a big deal? How much of your time on a use case like that do you spend trying to do that sort of stuff?" And this, really, is the heart of the answer I think. If you're not involved in this space, you might not realize. Typically it's about 60%, according to this and previous survey results. 60% of the work of delivering an analytics, AI and machine learning use case isn't in building the dashboard, isn't in the data scientist defining and coaching the model. Isn't in the fun stuff, therefore, Shane. The stuff that we think about and use. Rather it's in the loading, joining together, refinement and embellishment of the data to take it from its raw material state, buried in source systems into something ready to be used in analytics.
Friction in the data analytics value stream is driving people away
So, any time a company is thinking about delivering a data use case, they have to think about, the majority of the work is going to be in refining the data to make it useful. And so, we then asked for more information about what that was like, and the survey results were pretty clear. 75% of the data teams that we surveyed, at least, reported to us that the ways that they were doing that were slowing them down, mostly because they were either using outdated technology to do that, pre-cloud technology repurposed to a post-cloud world, and that was slowing this down. Or because they were doing it in granular ways. The cloud, I think many of us think it's quite mainstream, and it is, right? It is pretty mainstream. But it's still quite early in this once-in-a-generation tectonic change in the way that we deliver enterprise infrastructure technology. It's still quite early. And normally in technology revolutions, we start off doing things in quite manual ways. We code them at a fairly low level.
So, 75% of data teams believe that the ways that they're doing migration of data, data integration and maintenance, are costing their organizations both time, productivity and money. And that constraint also makes their lives less pleasant personally as they otherwise could be. Around 50% of our user respondents in this survey revealed this unpleasant picture, Shane, to be honest, of constant pressure and stress that comes with dealing with inefficient data integration. To put it simply, the business wants, needs and is asking for more than they're capable of delivering, and that leads to 50% of these people feeding back that they feel under constant pressure and stress, experiencing burnout, and actually, this means that data professionals in such teams are looking for new roles and looking to go to areas with more manageable work-life balances.
So, yeah, it's an interesting correlation between the desire of all organizations, really, to make themselves better using data, the boot on the hose pipe slowing down our ability to do that, meaning that data professionals are maxed out and unable to keep up with the demand. And that, in turn, therefore, leading to stress and difficulty in attracting and retaining talent into teams. Does that all make sense?
Shane Hastie: It does indeed. And, certainly, if I think back to my experience, the projects that were the toughest, it was generally pretty easy to get the software product built, but then to do the data integration or the data conversions as we did so often back then, and making that old data usable again, were very stressful and not fun. That's still the case.
Data preparation and conversion is getting harder not easier [10:43]
Matthew Scullion: It's still the case and worse to an order of magnitude because we have so many systems now. Separately, we also did a survey, probably need to work on a more interesting way of introducing that term, don't I, Shane? But we talk to our clients all the time. And another data point we have is that in our enterprise customers, our larger businesses – so, this is typically businesses with, say, a revenue of 500 million US dollars or above. The average number of systems that they want to get data out of and put it to work in analytics projects, the average number is just north of a thousand different systems. Now, that's not in a single use case, but it is across the organization. And each of those systems, of course, has got dozens or hundreds, in many cases thousands of data elements inside it. You look at a system like SAP, I think it has 80,000 different entities inside, and that would count as one system on my list of a thousand.
And in today's world, even a company like Matillion, we're a 600-person company. We have hundreds of modern SaaS applications that we use, and I'd be fairly willing to bet that we have a couple of new ones being created every day. So, the challenge is becoming harder and harder. And at the other side of the equation, the hunger, the need to deliver data projects is much, much more acute, as we race to change every aspect of how we work, live and play, for the better, using data. Organizations that can figure out an agile, productive, maintainable way of doing this at pace have a huge competitive advantage. It really is something that can be driven at the engineering and enterprise architecture and IT leadership level, because the decisions that we make there can give the business agility and speed as well as making people's lives better in the way that we do it.
Shane Hastie: Let's drill into this. What are some of the big decisions that organizations need to make at that level to support this, to make using data easier?
Architecture and data management policies need to change to make data more interoperable [12:44]
Matthew Scullion: Yeah, so I'm very much focused, as we've discussed already, on this part of using data, the making it useful. The refining it from iron ore into steel, before you then turn that steel into a bridge or ship or a building, right? So, in terms of building the dashboards or doing the data science, that's not really my bag. But the bit that we focus on, which is the majority of the work, like I mentioned earlier, is getting the data into one place, the de-normalizing, flattening and joining together of that data. The embellishing it with metrics to make a single version of the truth, and make it useful. And then, the making sure that process happens fast enough, reliably, at scale, and can be maintained over time. That's the bit that I focus in. So, I'm answering your question, Shane, through that lens, and in my belief, at least, to focus on that bit, because it's not the bit that we think about, but it's the majority of the work.
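As a rough illustration of the steps described here (getting data into one place, flattening and joining it, then embellishing it with a metric), the following is a minimal Python sketch. It is not Matillion's implementation: the two "source systems", the field names and the helper functions `flatten_and_join` and `revenue_by_region` are all hypothetical and exist only for this example.

```python
# Hypothetical raw data from two source systems (all names are illustrative).
orders = [
    {
        "order_id": 1,
        "customer_id": "c1",
        "lines": [
            {"sku": "A", "qty": 2, "unit_price": 5.0},
            {"sku": "B", "qty": 1, "unit_price": 12.5},
        ],
    },
    {
        "order_id": 2,
        "customer_id": "c2",
        "lines": [{"sku": "A", "qty": 4, "unit_price": 5.0}],
    },
]
customers = {
    "c1": {"name": "Acme", "region": "EMEA"},
    "c2": {"name": "Globex", "region": "APAC"},
}


def flatten_and_join(orders, customers):
    """Flatten nested order lines and join customer attributes onto each row."""
    rows = []
    for order in orders:
        # Join: look up the customer record for this order (empty if unknown).
        customer = customers.get(order["customer_id"], {})
        # Flatten: emit one output row per nested order line.
        for line in order["lines"]:
            rows.append(
                {
                    "order_id": order["order_id"],
                    "customer_name": customer.get("name"),
                    "region": customer.get("region"),
                    "sku": line["sku"],
                    "qty": line["qty"],
                    # Embellish: add a derived metric to every row.
                    "revenue": line["qty"] * line["unit_price"],
                }
            )
    return rows


def revenue_by_region(rows):
    """Aggregate the derived revenue metric per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["revenue"]
    return totals


flat_rows = flatten_and_join(orders, customers)
totals = revenue_by_region(flat_rows)
```

Even in this toy form, most of the code is preparation rather than analysis, which is the point being made about where the work sits.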
First of all, perhaps it would be useful to talk about how we typically do that today in the cloud, and people have been doing this stuff for 30 years, right? So, what's accelerating the rate at which data is used and needs to be used is the cloud. The cloud's provided this platform where we can, almost at the speed of thought, create limitlessly scalable data platforms and derive competitive advantage that improves the lives of our downstream customers. Once you've created that latent capacity, people want to use it, and therefore you have to use it. So, the number of data projects and the speed at which we can do them today, massively up and to the right because of the cloud. And then, we've spoken already about all the different source systems that have got your iron ore buried in.
So, in the cloud today, people typically do, for the most part, one of two different main ways to make data useful, to do data integration, to refine it from iron ore into steel. So, the first thing that they do, and this is very common in new technology, is that they make data useful in a very engineering-centric way. Great thing about coding, as you and I know well, is that you can do anything in code, right? And so, we do, particularly in earlier technology markets. We hand code making data useful. And there's nothing wrong with that, and in some use cases, it's, in fact, the right way to do it. There's a range of different technologies that we can use. We might be doing it in SQL or dbt. We might be doing it using Spark and PySpark. We might even be coding in Java or whatever. But we're using engineering skills to do this work. And that's great, because A, we don't need any other software to do it. B, engineers can do anything. It's very precise.
But it does have a couple of major drawbacks when we are faced with the need to innovate with data in every aspect of how we work, live and play. And drawback number one is it's the purview of a small number of people, comparatively, right? Engineering resources in almost every organization are scarce. And particularly in larger organizations, companies with many hundreds or many thousands of team members, the per capita headcount of engineers in a business that's got 10,000 people, most of whom make movies or earth-moving equipment or sell drugs or whatever it is. It's low, right? We're a precious resource, us engineers. And because we've got this huge amount of work to do in data integration, we become a bottleneck.
The second thing is data integration just changes all the time. Any time I've ever seen someone use a dashboard, read a report, they're like, "That's great, and now I have another question." And that means the data integration that supports that data use case immediately needs updating. So, you don't just build something once, it's permanently evolving. And so, at a personal level for the engineer, unless they want to sit there and maintain that data integration program forever, we need to think about that, and it's not a one and done thing. And so, that then causes a problem because we have to ramp new skills onto the project. People don't want to do that forever. They want to move on to different companies, different use cases, and sorry, if they don't, ultimately they'll probably move on to a different company because they're bored. And as an organization, we need the ability to ramp new skills on there, and that's difficult in code, because you've got to go and learn what someone else coded.
Pre-cloud tools and techniques do not work in the modern cloud-based environment
So, in the post-cloud world, in this early new mega trend, comparatively speaking, one of the ways that we make data useful is by hand-coding it, in effect. And that's great because we can do it with precision, and engineers can do anything, but the downside is it's the least productive way to do it. It's the purview of a small number of valuable, but scarce people, and it's hard to maintain in the long term. Now, the other way that people do this is that they use data integration technology that solves some of those problems, but that was built for the pre-cloud world. And that's the other side of the coin that people face. They're like, "Okay, well I don't want to code this stuff. I learned this 20 years ago with my on-premise data warehouse and my on-premise data integration technology. I need this stuff to be maintainable. I need a wider audience of people to be able to participate. I'll use my existing enterprise data integration technology, ETL technology, to do that."
That's a great approach, apart from the fact that pre-cloud technology isn't architected to make best use of the modern cloud, public cloud platforms and hyperscalers like AWS, Azure and Google Cloud, nor the modern cloud data platforms like Snowflake, Databricks, Amazon Redshift, Google BigQuery, et al. And so, in that situation, you've gone to all the trouble of buying a Blu-ray player, but you're watching it through a standard definition television, right? You're using the modern underlying technology, but the way you're accessing it is out of date. Architecturally, the way that we do things in the cloud is just different to how we did it with on-premises technology, and therefore it's hard to square that circle.
It's for these two reasons that today, many organizations struggle to make data useful fast enough, and why, in turn, therefore, that they're in this lose-lose situation of the engineers are either stressed out and burnt out and stuck on projects that they want to move on from, or bored because they're doing low-level data enrichment for weeks, months, or years, and not being able to get off it, as the business' insatiable demand for useful data never goes away and they can't keep up. Or, because they're unable to serve the needs of the business and to change every aspect of how we work, live and play with data. Or honestly, Shane, probably both. It's probably both of those things.
So our view, and this is why Matillion exists, is that you can square this circle. You can make data useful with productivity, and the way that you do it is by putting a technology layer in place, specifically designed to talk to these problems. And if that technology layer is going to be successful, we think it needs to have a couple of things that it exhibits. The first one is it needs to solve for this skills problem, and do that by making it essentially easier whilst not dumbing it down, and by making it easier, making a wider audience of people able to participate in making data useful. Now, we do that in Matillion by making our technology low-code, no-code, code optional. Matillion's platform is a visual data integration platform, so you can dive in and visually load, transform, synchronize and orchestrate data.
That low-code, no-code environment can make a single engineer far more productive, but perhaps as, if not more importantly, it means it's not just high-end engineers that can do this work. It can also be done by data professionals, maybe ETL guys, BI people, data scientists. Even tech-savvy business analysts, financiers and marketers. Anyone that understands what a row and a column is can pretty much use technology like Matillion. And the other thing that the low-code, no-code user experience really helps with is managing skills on projects. You can ramp someone onto a project that's already been up and running much more easily, because you can understand what's going on, because it's a diagram. You can drop into something a year after it was last touched and make changes to it much, much more easily because it's low-code, no-code.
Now, the average engineer, Shane, in my experience, often is skeptical about visual 4GL or low-code, no-code engineering, and I understand the reasons why. We've all tried to use these tools before. But, in the case of data, at least, it can be done. It's a technically hard problem, it's one that we've spent the last seven, eight years perfecting, but you can build a visual environment that creates the high-quality push down ELT instruction set to the underlying cloud data platform as well, if not perhaps even better than we could by hand, and certainly far faster. That pure ELT architecture, which means that we get the underlying cloud data platform to do the work of transforming data, giving us scalability and performance in our data integrations. That's really important, and that can be done, and that's certainly what we've done at Matillion.
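The push-down ELT idea mentioned here can be sketched in a few lines of Python. This is an assumption-laden illustration, not how Matillion generates its instructions: the standard-library `sqlite3` module stands in for a cloud data platform, and the `build_pushdown_sql` helper, table names and column names are hypothetical. The pattern it shows is that raw data is loaded untouched, a declarative description of the transformation is rendered into SQL, and the database engine does the transformation work itself.

```python
import sqlite3


def build_pushdown_sql(target, source, group_by, metrics):
    """Render a declarative spec into a single CREATE TABLE AS SELECT statement.

    Identifiers are interpolated directly, so the spec must come from trusted
    configuration, never from end-user input.
    """
    select_parts = list(group_by) + [
        f"{expr} AS {alias}" for alias, expr in metrics.items()
    ]
    return (
        f"CREATE TABLE {target} AS "
        f"SELECT {', '.join(select_parts)} "
        f"FROM {source} "
        f"GROUP BY {', '.join(group_by)}"
    )


# sqlite3 is only a local stand-in for a cloud data platform.
conn = sqlite3.connect(":memory:")

# Extract and load: land the raw rows without transforming them first.
conn.execute("CREATE TABLE raw_orders (region TEXT, qty INTEGER, unit_price REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("EMEA", 2, 5.0), ("EMEA", 1, 12.5), ("APAC", 4, 5.0)],
)

# Transform: generate the SQL and let the database engine execute it.
sql = build_pushdown_sql(
    target="revenue_by_region",
    source="raw_orders",
    group_by=["region"],
    metrics={"revenue": "SUM(qty * unit_price)", "order_lines": "COUNT(*)"},
)
conn.execute(sql)

result = conn.execute(
    "SELECT region, revenue, order_lines FROM revenue_by_region ORDER BY region"
).fetchall()
```

No rows pass through the Python process during the transform step. The client only sends a statement, which is why this style can inherit the scalability of whatever platform executes it.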
The skills challenges are most apparent in large organisations
The other criteria I'll just touch on quickly. The people that suffer with this skills challenge the most are larger businesses. Smaller businesses that are really putting data to work tend to be either technology businesses or technology-enabled businesses, which probably means they're younger and therefore have fewer source systems with data in. A higher percentage of their team are engineering team members. They're more digitally native. And so, the problem's slightly less pronounced for that kind of tech startup style company. But if you're a global 8,000, manufacturing, retail, life sciences, public sector, financial services, whatever type company, then your primary business is doing something else, and this is something that you need to do as a part of it. The problem for you is super acute.
And so, the second criteria that a technology that's going to solve this problem has to have is it has to work well for the enterprise, and that's the other thing that Matillion does. So, we're data integration for the cloud and for the enterprise, and that means that we scale to very large use cases and have all the right security and permissions technology. But it's also things like auditability, maintainability, integration to software development life-cycle management, and code repositories and all that sort of good stuff, so that you can treat data integration in the same way that you treat building software, with proper, agile processes, proper DevOps, or as we call them in the data space, data-ops processes, in use behind the scenes.
These challenges are not new – the industry has faced them with each technology shift
So, that's the challenge. And finally, if you don't mind me rounding out on this point, Shane, it's like, we've all lived through this before. Nothing's new in IT. The example I always go back to is one from, I was going to say the beginning of my career. I'd be exaggerating my age slightly there, actually. It's more like it's from the beginning of my life. But the PC revolution is something I always think about. When PCs first came in, the people that used them were enthusiasts and engineers because they arrived in a box of components that you had to solder together. And then, you had to write code to make them do anything. And that's the same with every technology revolution. And that's where we're up to with data today. And then later, visual operating systems abstracted the backend complexity of the hardware and underlying software, and allowed a wider audience of people to get involved, and then, suddenly, everyone in the world uses PCs. And now, we don't really think about PCs anymore. It's just a screen in our pocket or our laptop bag.
That's what will and is happening with data. We've been in the solder it together and write code stage, but we will never be able to keep up with the world's insatiable need and desire to make data useful by doing it that way. We have to get more people into the pass rush, and that's certainly what we at Matillion are trying to do, which suits everyone. It means engineers can focus on the unique problems that only they can solve. It means business people closer to the business problems can self-serve, and in a democratized way, make the data useful that they need to understand their customers better and drive business improvement.
Shane Hastie: Some really interesting stuff in there. Just coming around a little bit, this is the Engineering Culture podcast. In our conversations before we started recording, you mentioned that Matillion has a strong culture, and that you do quite a lot to maintain and support that. What's needed to build and maintain a great culture in an engineering-centric organization?
Maintaining a collaborative culture [25:44]
Matthew Scullion: Thanks for asking about that, Shane, and you're right. People that are unlucky enough to get cornered by me at cocktail parties will know that I like to do nothing more than bang on about culture. It's important to me. I believe that it's important to any organization trying to be high performance and change the world like, certainly, we are here in Matillion. I often say a line when I'm talking to the company, that the most important thing in Matillion, and I have to be careful with this one, because it could be misinterpreted. The most important thing in Matillion, it's not even our product platform, which is so important to us and our customers. It's not our shiny investors. Matillion was lucky to become a unicorn stage company last year, I think we've raised about 300 million bucks in venture capital so far from some of the most prestigious investors in the world, who we value greatly, but they're not the most important thing.
It's not even, Shane, now this is the bit I have to be careful saying, it's not even our customers in a way. We only exist to make the lives of our customers better. But the most important thing at Matillion is our team, because it's our team that make those customers' lives better, that build those products, that attract those investors. The team in any organization is the most important thing, in my opinion. And teams live in a culture. And if that culture's good, then that team will perform better, and ultimately do a better job at delighting its customers, building its products, whatever they do. So, we really believe that at Matillion. We always have, actually. The very first thing that I did on the first day of Matillion, all the way back in January of 2011, which seems like a long, long time ago now, is I wrote down the Matillion values. There's six of them today. I don't think I had six on the first day. I think I embellished the list of it afterwards. But we wrote down the Matillion values, these values being the foundations that this culture sits on top of.
If we talk to engineering culture specifically, I've either been an engineer or been working with or managing engineers my whole career. So, 25 years now, I suppose, managing or being in engineering management. And the things that I think are the case about engineering culture is, first of all, engineering is fundamentally a creative business. We invent new, fun stuff every day. And so, thing number one that you've got to do for engineers is keep it interesting, right? There's got to be interesting, stimulating work to do. This is partly what we heard in that data survey a few minutes ago, right? If you're making data useful through code, it might be interesting for the first few days, but for the next five years, maintaining it's not very interesting. It gets boring, stressful, and you churn out of the company. You've got to keep engineers stimulated, give them technically interesting problems.
But also, and this next one applies to all parts of the organization. You've got to give them a culture, you've got to give each other a culture, where we can do our best work. Where we're intellectually safe to do our best work. Where we treat each other with integrity and kindness. Where we are all aligned to delivering on shared goals, where we all know what those same shared goals are, and where we trust each other in a particular way. That particular way of trusting each other, it's trusting that we have the same shared goal, because that means if you say to me, "Hey, Matthew, I think you are approaching this in the wrong way," then I know that you're only saying that to me because you have the same shared goal as I do. And therefore, I'm happy that you're saying it to me. In fact, if you didn't say it to me, you'd be helping me fail.
So, trust in shared goals, the kind of intellectual safety born from respect and integrity. And then, finally, the interest and stimulation. To me, those are all central to providing a resonant culture for perhaps all team members in an organization, but certainly engineers to work in. We think it's a huge competitive advantage to have a strong, healthy culture. We think it's the advantage that's allowed us, in part, but materially so, to be well on the way to building a consequential, generational organization that's making the world's data useful. Yes, as you can tell, it's something I feel very passionate about.
Shane Hastie: Thank you very much. A lot of good stuff there. If people want to continue the conversation, where do they find you?
Matthew Scullion: Well, me personally, you can find me on Twitter, @MatthewScullion. On LinkedIn, just hit Matthew Scullion Matillion, you'll find me on there. Company-wise, please do go ahead and visit us at matillion.com. All our software is very easy to consume. It's all cloud-native, so you can try it out free of charge, click it and launch it in a few minutes, and we'd love to see you there. And Shane, it's been such a pleasure being on the podcast today. Thank you for having me.