Transcript
Crossland: My name is Tamsin Crossland. I'm an architect. I work for Icon Solutions. We're a FinTech company. We work in instant payments. What I want to talk about is applying machine learning to fraud detection. When we first started researching it, we found two themes that were going on. We found these hype type things. I'm sure you've all seen this, when will we bow to our machine overlord? By 2025, robots will be playing symphonies and all that stuff. Then we found the other extreme as well, which was the fairly wacky math. What we were looking at is how can we actually apply this technology to our requirements and to those of our clients? I'm going to talk about payments. Then I'm going to do a demonstration. Then I'll end with a few thoughts.
In terms of payments, the way it worked, if you wanted to interact with the bank through most of the 20th century, you had to go into a branch. That was the only way you could interact with the bank. If somebody wanted to steal money from a bank, they had to rob it. That was basically the only option they had, which is why you can see the big security barriers that they had in the branches at that point in time. Then, moving on to about the 1960s, the banks started employing new technologies. They took things like the IBM 360 series, and they actually started using it. Even then it was pretty secure. The people who were using it were people who worked for the bank. It was a closed network. If you wanted to actually get into the systems, you had to go into the bank's offices, and you had to be an employee. The potential for fraud was fairly small. There was a little bit of fraud going on internally, but it was fairly minimal.
The Increasing Scale, Diversity, and Complexity of Fraud
Then it all changed. Things have changed with the technology. We suddenly saw internet banking. We went from having closed networks, which were fairly secure, to suddenly having the bank's systems connected to anywhere across the world. Then more recently, we've seen mobile as well. Vulnerabilities in payment services have increased as the shift to digital and mobile platforms accelerates. Then more recently, what we've seen is instant payments. I'm working on a project for a client right now, and they want to be able to do payments within 5 seconds. If Greg here wants to make a payment to me, within 5 seconds, his bank has got to check that he's got enough money, that I'm not on the sanctions list, that what we're doing doesn't look fraudulent. We've got 5 seconds to do that. The amount of time has gone from a matter of hours, where we had an overnight batch run, to having to do things very quickly, which doesn't leave much chance for fraud detection. Then finally, what we've seen is the dark web. This enables people involved in fraud to actually share their expertise. Sophistication in fraud has increased. We've got bad actors actually collaborating together. We've got an exchange of stolen data, techniques, and expertise. If one of them finds a certain fraud technique that works, they can share it with others and they can all apply it to different banks.
This graph is pretty scary. It goes from 2013 to 2016, produced by McKinsey, and it's showing the average number of fraudulent transactions attempted per merchant per month. It's a steep climb, 34%. The blue is successful attempts. It's going up pretty fast, and it's continued to do that since then. The fraud threat facing banks and payment firms has grown dramatically in recent years. McKinsey were estimating that fraud was at $31 billion by 2018.
We have people here who want to make payments, and they might be doing it via the internet, mobile, branches, or the corporate channels. They connect to the instant payments orchestrator, which caters for the channels that already exist as well. When the payment comes in, the orchestrator checks: is there enough money in the account? Is the person on the sanctions list? Does it look fraudulent? Then it sends it through a payment scheme to the receiving bank. We've got 5 seconds to do that. We haven't got much time to detect fraud at all. This is why we've been looking at machine learning.
Prior to that, as Greg mentioned, people were using rule based systems. These were introduced around about the 1980s, from a previous wave of AI when the expert system was introduced. The aim of the expert system was to encapsulate what fraud detection experts knew, and try and put it into rules. They've done a reasonable job, to be fair. An example of a rule: if a credit card transaction is more than 10 times larger than the average for this customer, then it might be fraudulent. That kind of rule might work where you've got a back office team who can look at the transaction and let it go or not. You don't have that with 5 seconds. The rules allow humans to apply their expertise, as long as you can capture their expertise within rules. Actually trying to capture someone's knowledge and put it into rules proved really hard. That's why expert systems died out. The rule based systems have carried on since then. They're difficult and time consuming to implement well: to actually sit down with someone who's worked in fraud detection for 25 or 30 years and say, how do you detect fraud? Let's write a bunch of rules to encapsulate your knowledge. It's really difficult to do. You have to painstakingly define every anomaly possible. Every anomaly possible is the key thing, because if you leave something out, you've got a backdoor to fraud. If you make an omission, unexpected anomalies will happen, and nobody will suspect them. Now we've got the dark web, where people are sharing secrets: "I found a particular rule that hasn't been applied by a certain bank," and they've got an opening. Legacy systems today, and this is overnight batch, apply about 300 different rules on average to approve a transaction.
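As a minimal sketch, not taken from any real rule engine, a single rule like that might look something like this in code; the field names and the 10-times threshold here are purely illustrative:

```python
# A single hand-written fraud rule of the kind described above.
# The field names and the 10x threshold are illustrative only.
def looks_fraudulent(transaction_amount: float, customer_average: float) -> bool:
    """Flag the transaction if it is more than 10 times the customer's average."""
    return transaction_amount > 10 * customer_average

# A real rule engine would chain hundreds of rules like this one,
# each painstakingly agreed with the fraud detection experts.
if looks_fraudulent(transaction_amount=5200.0, customer_average=180.0):
    print("Flag for review")
```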
Neural Network
This is what we're looking at applying now. In 1943, McCulloch and Pitts came up with the idea of a neural network. Computers weren't around at that time, so they were thinking in terms of logic gates. The idea is we've got neurons in our brains. The way they work is you have your dendrites, which are the input. We have the cell body, which does something that nobody really quite understands. What we do know is something goes into the dendrites, and something comes out at the axon. They're all interconnected. This is the principle of neurons connecting together to encapsulate knowledge. What we do with neural networks is we try and create a node here. We have inputs from other nodes and outputs to other nodes. We try and interconnect them in a similar way to the brain.
How do you train this? Each of the nodes within the network, you'll see that they've got a numeric value. These values will be changed as you train your network. When you first start training a network, all those values are entirely random. As you train it more and more, those values get tweaked. The different weights attached to each of the different nodes get adjusted. For example, if we want to train a neural network to know the difference between cats and dogs, what we do is we have labelled images. Each image will say, this is a cat. I'll load it at the input. On the output, I'll say, this is a cat. Then some really clever math behind that will actually adjust all the different weights between them, so that you can tell where you've got a cat, or where you've got a dog. How does this apply to fraud? Banks have got a list of all the different transactions that they've performed over time. Hopefully, they've marked which ones are fraudulent. If we can load those up, all of which are labeled fraudulent or not, then hopefully we can adjust the weights and we can actually train our neural network.
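As a toy illustration of what "adjusting the weights" means, here is a single artificial neuron trained on a handful of made-up labelled examples; the numbers are invented purely to show the idea, not taken from the talk:

```python
import numpy as np

# Made-up labelled examples: one input feature, label 0 = non-fraud, 1 = fraud.
X = np.array([0.1, 0.2, 0.9, 1.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

w, b, lr = np.random.randn(), 0.0, 0.5   # the weight starts out random

for _ in range(1000):
    pred = 1 / (1 + np.exp(-(w * X + b)))    # the neuron's output (sigmoid)
    w -= lr * np.mean((pred - y) * X)        # nudge the weight towards the labels
    b -= lr * np.mean(pred - y)

print(w, b)   # after training, the weight separates the two classes
```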
Here's a really interesting use case produced by NetGuardians. What they're saying is that they collected 10 million records of payments over a 12 month period. They then trained a neural network on them. Previously, using a rule based system, they could look at about 32%, less than a third of their payments. After applying the neural network, they could look at 100% of all payments. They've gone from looking at less than a third, to looking at all their payments. What they saw from that, the results: an 83% reduction in the number of false positives compared to before. A false positive is basically where something's been flagged as a fraudulent transaction when it isn't. And there was a 93% reduction in fraud investigation time. This is really exciting here. We've got 118% of fraud cases caught compared to rules based. We caught all the frauds that were previously caught, and another 18% on top of that.
Comparing the two, rule based is good for catching obvious fraud scenarios. There's a lot of manual work required to actually generate all these rules. It's easier to explain. That's actually a big advantage of those. If I go into a bank and I say, why did you turn down my transaction? It's fairly easy to explain the rule based system. With machine learning, it's a lot more difficult because all we've got is a whole load of numbers, a whole load of weights have been adjusted. Actually, explaining how you've come to a decision is quite difficult to do. Machine learning is really good at finding hidden correlations in the data. It can find patterns, and it can adjust the weights within that network to find patterns. You're likely to detect frauds that you haven't seen before. We've got automatic detection of possible fraud scenarios.
Demonstration 1
I'm going to do a demonstration. In terms of the demonstration, feel free to try this yourselves. As a company we use Docker for our solution. It's scalable, and it works well. The one thing that Docker gives you, which is really nice, is you get the equivalent of an app store. What you can do is you can type something like docker run tensorflow/tensorflow:latest, and it will download all the TensorFlow libraries and run them for you. How fantastic is that? Then we found various libraries. If you remember, I was talking about some of the crazy math that's going on. The nice thing about these libraries is we've been able to use them without having to get into the nitty-gritty of the math behind them. We log on to it, and we install Pandas, which is for data analysis, and then scikit-learn as well.
The next thing was to actually find some data to train on. To do that, we used Kaggle. On there, we found two days' worth of credit card transactions made in September 2013. Over those 2 days, there were 492 frauds that were detected. That's actually really low. This is one of the things that we as data scientists find difficult: with fraud detection, most people are honest. That's why supermarkets allow you to do self-scan, because most people are actually honest, which is great for society. As a data scientist, it means that you end up with a really unbalanced dataset. Because they published this, obviously there are privacy issues and so on. It's been through a process called principal component analysis. It's a way of extracting the important information from a complex dataset. Basically, they'd applied that process and we've got a whole load of numeric values but no context to what they actually mean. For confidentiality issues, they wouldn't describe the features of the dataset or provide any more background information. They couldn't. They had to keep it confidential.
This is the data that we had. You can see V1 to V28 at the top; those are the features that have been anonymized using PCA. It's just a whole load of numeric values. Then we've got the time, the number of seconds between the transaction and the start of the dataset. We've got the amount. The only features which haven't been transformed by PCA are the time and the amount. Then on the right-hand side, you'll see class. That's 0 if it's non-fraudulent, or 1 if it's fraudulent. When you look at the data it's nearly always 0.
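Loading that file and checking the class balance takes only a few lines; this sketch assumes the standard creditcard.csv download from Kaggle rather than the exact notebook from the talk:

```python
import pandas as pd

# Assumes the standard "creditcard.csv" file downloaded from Kaggle.
df = pd.read_csv("creditcard.csv")

print(df.shape)                      # roughly 284,807 rows and 31 columns
print(df.columns.tolist())           # Time, V1..V28, Amount, Class
print(df["Class"].value_counts())    # 492 frauds against ~284,000 genuine rows
print(df["Class"].mean() * 100)      # about 0.172% of the data is fraudulent
```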
How It Works
We install TensorFlow. We import TensorFlow, Pandas, scikit-learn, and NumPy, the numeric Python library. Then what we're going to do, first of all, is load the dataset up. These are the first few rows of the data. Then we're going to do one of the key things you do with machine learning, which is you have a training dataset and a test dataset. We've taken 80% of our data to actually do the training, and then we've kept 20% back to test it. We train our model on the training data. Then once we've got a model, we test it using our test data. What we're doing is we're extracting 80%, and then the remainder becomes the test dataset. We do that. Then we're going to run this basic one here, which has got an input layer with the different features as inputs, 30 different inputs. We've then got our hidden layer, and then we've got an output layer with two outputs, either fraudulent or non-fraudulent. If I run this, I run it 5 times, and it's looking really good. If we look here, the first time we run it, we've got 99.59%. Pretty good. The second time out of five, we run it and it's still looking pretty good, we're at 99.83%. We run it a third time, it's running at about the same now, 99.83%, fantastic. The fourth one, around about the same, it's settling at about 99.83%. I think you'd agree, if we could detect 99.83%, that's pretty good going. If I scroll down a little, we've got 99.82%, 99.83%, fantastic. It's all looking really good. Now let's run our test data, the other 20%. We run that, and 99.83%. I think, yes, pretty good.
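That basic model looks roughly like this; it's a sketch rather than the exact notebook, assuming the df dataframe loaded earlier, and the hidden-layer size, optimizer, and batch size are assumptions of mine:

```python
import tensorflow as tf
from sklearn.model_selection import train_test_split

# 80% of the data for training, 20% held back for testing.
X = df.drop("Class", axis=1).values     # 30 features: Time, V1..V28, Amount
y = df["Class"].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# A basic network: 30 inputs, one hidden layer, one sigmoid output
# (equivalent to the fraudulent / non-fraudulent pair of outputs).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(30,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(X_train, y_train, epochs=5, batch_size=2048)   # the five runs above
print(model.evaluate(X_test, y_test))                    # ~99.8% on the held-back 20%
```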
At this point, we're thinking, "We've cracked it already." Until we start doing the analysis. This is where the data science starts coming into it. If I run that: 0 frauds detected. Do you remember I said 0.172% of the data is fraud? Basically, it's trained itself on the majority dataset, the non-fraudulent transactions. Again, phenomenal accuracy, but we're not detecting any frauds, which defeats the point. We also need to do something else. This was the point we'd reached now. We were realizing we needed to do a little bit more. The first thing we tried was just to balance the dataset simply. We thought, why don't we just take 492 of the non-fraudulent ones, take the first 492 of those, and compare them to the actual fraudulent transactions? We ran it, and suddenly it's dropped to 49%. The reason for that is we've actually lost a lot of data. We've gone from about 285,000 records to 492. We've lost a lot of data with that. We need to be a bit more methodical about our approach. We need to move from just piling stuff in and hoping it works, to actually doing some proper data science.
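That first, naive balancing attempt amounts to something like this; again it's a sketch continuing from the df dataframe loaded earlier:

```python
import pandas as pd

# Naive balancing: 492 frauds against the first 492 non-frauds in the file.
fraud = df[df["Class"] == 1]
non_fraud = df[df["Class"] == 0].head(len(fraud))
naive_df = pd.concat([fraud, non_fraud]).sample(frac=1, random_state=42)

print(naive_df["Class"].value_counts())   # balanced, but ~284,000 rows thrown away
```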
We did a bit of research and found Janio Bachmann. He's done some really interesting research in this area. We imported a couple of additional libraries. There's a wonderful library, imblearn, for imbalanced learning. We imported that, and then we also got one to visualize stats as well. The fundamental problem is we've got this imbalanced data. We've got 99.8% non-fraudulent, and then we've got the fraudulent transactions, which are a very small slice, 0.172%. Our model by default will just home in on the majority dataset.
We started looking at the data. Underfitting and overfitting is a term you might have come across. Underfitting is where we just fit a really simple line through the data and we're not actually capturing the picture. Overfitting is where we're allowing outliers to skew our model. Really, what you want is what we've got in the middle there, which is the ideal fit. An example of overfitting was in Japan, when they were predicting how often they thought they'd have earthquakes, and they used those estimates to build their power stations, which was unfortunate. Basically, the overfitted line curved down at the high end, so it said a Richter 9 earthquake was a lot less likely than it really was, and then they had one. Overfitting is a bad thing and we want to stay away from that.
We also looked at outliers. If you look at that chart on the right of American temperatures, they're generally in the 70s, and then we've got one that's 300. That's an outlier. If you have that in your model, it's going to skew your results. So you try and remove the outliers.
When you're at school, if you've got x and y, you can plot that fairly easily. You've got two dimensions. If you want to be a bit fancy, you can do three: x, y, and z. But what do you do when, like we had, you've got 30 different fields? How do you plot that? This is where principal component analysis comes in. For this example here, what we're looking at is tumors, whether they're benign or malignant. The B is benign and the M is malignant. In this case, we've got 30 different features. You can still see two clear clusters in the data for whether a tumor is malignant or benign.
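That tumor example can be reproduced with scikit-learn's built-in breast cancer dataset, which has the same 30-feature, benign-versus-malignant shape; this is a sketch of the idea, not the exact figure from the slides:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA

# 30 tumour measurements per sample, each labelled benign or malignant.
data = load_breast_cancer()

# Project the 30 dimensions down to 2 so the clusters can be plotted.
points = PCA(n_components=2).fit_transform(data.data)

plt.scatter(points[:, 0], points[:, 1], c=data.target, cmap="coolwarm", s=10)
plt.xlabel("principal component 1")
plt.ylabel("principal component 2")
plt.show()   # two clear clusters, even though we started with 30 features
```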
Demonstration 2
This is how we went about doing it properly. If you look at the range of values for the features produced by PCA, they're in a range of about minus one to plus one. We scale time and amount to fit in that range as well. We looked at random under-sampling, which is, rather than just taking the first however-many non-fraudulent records we need, what we actually want to do is pick out enough data so that we still keep the whole picture. That's called random under-sampling. Once we'd done that, we'd got a much more balanced dataset, and by being a bit more methodical, we'd actually kept the meaning behind the data as well. We then used a correlation matrix to identify the features that are most important. If you look at this bottom row here, we've got class, which is whether it's fraudulent or not. The positive correlations, the blue ones, are V2, V4, V11, and V19. The higher these values are, the more likely the result will be a fraudulent transaction. Then we've got our negative correlations, the red ones: V17, V14, V12, and V10. They're negatively correlated: the lower these values are, the more likely that we'll have a fraud. What we're able to do is identify the features which are really determining whether a transaction is fraudulent or not.
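In code, the scaling, the random under-sampling, and the correlation check look roughly like this; it's a sketch assuming the df dataframe from earlier, and RobustScaler is my choice of scaler rather than necessarily the one used in the talk:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler
from imblearn.under_sampling import RandomUnderSampler

# Scale Time and Amount so they sit in a similar range to the PCA features.
df["scaled_amount"] = RobustScaler().fit_transform(df[["Amount"]]).ravel()
df["scaled_time"] = RobustScaler().fit_transform(df[["Time"]]).ravel()
df = df.drop(["Time", "Amount"], axis=1)

# Random under-sampling: keep all 492 frauds and a random selection of
# non-frauds, rather than just the first rows in the file.
X, y = df.drop("Class", axis=1), df["Class"]
X_bal, y_bal = RandomUnderSampler(random_state=42).fit_resample(X, y)

balanced = pd.DataFrame(X_bal, columns=X.columns)
balanced["Class"] = np.asarray(y_bal)

# Which features correlate most strongly with the Class label?
print(balanced.corr()["Class"].sort_values())
```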
Then what we wanted to do is remove the outliers, the extreme values. We actually plotted them here. Then what we're able to do is say, actually, these values are outliers, so we can remove those. You have to be careful with this. There's a bit of trial and error that goes into it, because if you remove too many you start losing useful data. There's a bit of trial and error to work out how much you can remove to improve your results without losing accuracy. After we'd done that, we got about a 3% improvement. What we were looking at then was, can we actually do something with this dataset? We used a couple of techniques, PCA and t-SNE. What we were looking at is, we know we've got this dataset, is there actually a pattern in there? Using PCA and t-SNE, what we're able to do is say, actually, if you look at that red cluster, there's a clear cluster there, which is fraudulent. There's a clear cluster of non-fraudulent, which is the blue one. Looking at this data, we can actually see two clear clusters. We should be able to do something with this.
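Carrying on from the balanced dataframe in the sketch above, the outlier removal and the t-SNE picture might look roughly like this; the interquartile-range rule, the 1.5 multiplier, and the choice of V14 are illustrative assumptions:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Remove extreme outliers from one strongly correlated feature using the
# interquartile range; the 1.5 multiplier is a matter of trial and error.
q25, q75 = balanced["V14"].quantile([0.25, 0.75])
cutoff = 1.5 * (q75 - q25)
balanced = balanced[balanced["V14"].between(q25 - cutoff, q75 + cutoff)]

# t-SNE squashes everything down to two dimensions so we can look for clusters.
points = TSNE(n_components=2, random_state=42).fit_transform(
    balanced.drop("Class", axis=1))

plt.scatter(points[:, 0], points[:, 1], c=balanced["Class"], cmap="coolwarm", s=10)
plt.show()   # a clear fraudulent cluster and a clear non-fraudulent cluster
```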
SMOTE
Recently, there's been a development called SMOTE, which is wonderful. If you remember our 0.172%, which was our fraudulent data, what we can actually do is generate additional data which is consistent with the minority dataset. The two approaches are random under-sampling, where we reduce the amount of non-fraudulent data, and then this alternative, which is actually to create, in effect, additional fraudulent transactions. It creates synthetic or artificial data points which are consistent with the minority class. It's a fairly simple neural network. We've got our inputs here. We've got 30 of them: our 28 features plus time and amount. We've got a hidden layer. Then we've got an output layer with two nodes, for fraudulent and non-fraudulent. When I talked about neurons, I said that there's actually something within the nodes which performs the calculation. What we used was ReLU. Basically, if the input is less than 0, it returns 0. If it's more than 0, it returns the value. That's what was being calculated in each node. Then what we needed to do is actually adjust all those different weights across the whole neural network. We used gradient descent. Basically, from 1943, when McCulloch and Pitts first came up with this idea, until gradient descent came along, neural networks weren't really going anywhere. This was some really clever math, which says, if I can plot a gradient and follow it down, when I reach that point there, that's my optimum result. That's applied across all of the different weights, from our input dataset through to a fraudulent or a non-fraudulent transaction. This is our neural network: we've got 30 nodes on the input, 32 on the hidden layer, and 2 on the output. We've got around about 2000 different parameters to adjust. When we run it with the first sample, where we'd reduced the majority dataset, we're getting pretty good accuracy at 92%.
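Put together, the SMOTE over-sampling step and that 30-32-2 ReLU network might look roughly like this; it's a sketch assuming the X_train and y_train split from the earlier sketch, and the optimizer, batch size, and epoch count are assumptions of mine:

```python
import tensorflow as tf
from imblearn.over_sampling import SMOTE

# SMOTE: synthesise extra fraud examples so the minority class is no longer
# a tiny fraction of the training data. (Only ever applied to the training set.)
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

# The network described above: 30 inputs, a 32-node ReLU hidden layer,
# and 2 softmax outputs for fraudulent / non-fraudulent.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(30,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",     # a gradient-descent based optimiser
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.summary()                     # shows the parameter count being trained
model.fit(X_res, y_res, epochs=5, batch_size=2048, validation_split=0.1)
```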
In terms of measuring the result, we decided to use a confusion matrix, which I always thought was a really accurate name because it confused the hell out of me when I first saw it. It's actually fairly simple. True positive means you've had a fraud and you've detected it, good. True negative: you've had a transaction that's not fraudulent and you've correctly let it through. False negative means you've had a fraudulent transaction and you haven't detected it. That's particularly bad because banks are losing money. False positive is also bad. I'm on holiday. I've seen a vase I really like. It is quite expensive. I pick it up. I go to buy it. By the time I get there, there's a whole queue of people behind me. I want to buy the vase. I give my credit card. It gets declined. How do I feel at that moment? It's hot, I'm on holiday, I want this vase, there's a whole queue of people behind me. I don't like my bank. False positives are not a good thing if you want to keep your customers happy.
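Computing those four numbers against the held-back test data is a one-liner in scikit-learn; this sketch assumes the two-output model and the test split from the earlier sketches:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Predicted class per test transaction, compared against the true labels.
y_pred = np.argmax(model.predict(X_test), axis=1)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

print("true positives: ", tp, " frauds we caught")
print("true negatives: ", tn, " genuine payments we let through")
print("false negatives:", fn, " frauds we missed - the bank loses money")
print("false positives:", fp, " genuine payments declined - unhappy customers")
```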
Looking at the random under-sample that we ran, 92 out of 98 fraudulent transactions were detected. That's pretty good. If you look at the false positives, we've got 1763, which is quite high, although if you compare it to 55,000 transactions, it's still a small proportion. We then ran the same thing using over-sampling. This is where we're generating synthetic data points to inflate our minority dataset: we're creating additional fraudulent transactions. We ran that through as well. Our accuracy is pretty good, pretty much 99% to 100%. If we look at the result here, it caught 68 out of 98, so it's not as good as the other one. On the other hand, we've got a lot fewer false positives. When you speak to certain banks, they accept that they're going to live with a certain level of fraud. That's the cost of doing business. They might be willing to live with a slightly higher level of fraud, to not have people getting upset when they're trying to buy a vase on holiday and they're suddenly being flagged as a potential fraudster. There's a business decision behind which approach you take.
Up to now I've talked about supervised learning, which is where we've got a whole load of transactions and they've all been labeled as fraudulent or non-fraudulent. What about unsupervised learning, which is where we've got the transactions and none of them are labeled? How does that work? If we take a sample of data, I've got Disney characters here, can we split them into meaningful groups, into clusters? If we take the Disney characters, at the top we've got birds, at the bottom we've got rodents. We've split them between the two using unsupervised learning.
For the demonstration, what I'm going to use is very much a standard dataset in data science, which is a sample of irises that were collected in the middle of the 20th century. We've got 150 samples, 50 of each of three species. For each sample, 4 features were measured: the length and width of the sepals and the petals, in centimeters. What we want to do is see whether we can split them into three clusters for the three different types of iris that we've got. We're using K-means, which is an approach where we try and split the data into clusters, and K is the number of clusters that we want.
Demonstration 3
If I load my dataset, we've got our 150 samples, with four values for each of the different plants: the length and width of the sepals and petals. I'm going to apply K-means to those and I'm going to say I want three clusters to come out of it, which is done. If I plot that, and we scroll down, now we've got three different clusters. It's detected the three different types of iris and put them into three different clusters. "Why is she talking about plants when we're in the middle of a talk on fraud and stuff?" Because that technique can be used to look at transactions. If we're trying to classify transactions where we don't know whether they're fraudulent or not, we can use that technique to say, I've got a new transaction: is it closer to the cluster of fraudulent transactions, non-fraudulent transactions, or suspicious transactions? We can actually use that to assess new transactions and find out whether a transaction might be fraudulent or not. We can also compare results. We can take results from our supervised machine learning, take our clusters, analyze the data, and see whether we can pick out other records that might have been fraudulent.
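That demonstration boils down to a few lines with scikit-learn's built-in copy of the Iris data; this is a sketch of the idea rather than the exact notebook:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()                 # sepal and petal length/width, in centimeters

# Ask K-means for K = 3 clusters, without telling it the species labels.
clusters = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(iris.data)

# Plot two of the four measurements, coloured by the cluster each plant landed in.
plt.scatter(iris.data[:, 2], iris.data[:, 3], c=clusters, s=20)
plt.xlabel("petal length (cm)")
plt.ylabel("petal width (cm)")
plt.show()
```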
Wrap-up
There are two questions every machine learning project should ask, and they really should: is the purpose of the project ethical? Is the implementation of the project ethical? I work with banks. Some people will be surprised, but banks actually take ethics fairly seriously. They spend millions of pounds doing adverts with horses running on beaches and all this stuff. What they don't want is to find all that being undone by what one of our clients referred to as a Newsnight moment, which is, you have the dramatic music and the first item of news is: mega bank has been sexist in the way it's implemented its fraud detection. They don't want that. They don't want to be on the back pages. They certainly don't want to be on the front pages. It's up to us to make sure that when we implement it, we do it in a way that's ethical. Is the purpose of the project ethical? What you're asking is, what are the intended benefits of the project and who does it benefit? It didn't take me long to find these. These are actual stories of misery. Scammers look for vulnerability in older people. Scammers targeting flood victims. What we're trying to do is stop that. The question is, is the purpose of the project ethical? Yes, I would say it is. I'd be interested to argue with anyone who would say that it isn't.
Is the implementation of the project ethical? Does it implement unfair bias? There's a fairly famous story where a company decided to implement machine learning for their recruitment. They loaded up all their previous recruitment decisions and trained their model, and the model learned from their previous patterns. When they'd done recruitment in the past they were sexist, no surprise there. Suddenly, they were actually making sexist decisions in their recruitment. They had to withdraw their machine learning system. You need to test that you're not implementing unfair bias. You need to disclose to stakeholders about their interactions with AI. That's actually a little bit tricky when you're doing payments in 5 seconds. I think what you'd have to do in the instant payments world is say to your customers, when you make a payment, your payment will be scanned using machine learning. Then you've got to have all the usual things for your governance. It's got to be secure. It's got to be reliable, robust. You've got to have the right accountability in place.
Is the implementation of the project that I described ethical? Absolutely not. We're saying the key factors are features 2, 4, 11, 14, because of certain colors on a graph, and we have no idea what those features mean. We might be flagging transactions as fraudulent for reasons that we don't even understand. We might be making really prejudiced decisions, which are not acceptable to society today. You need to understand your data before you make these decisions.
Is it intelligent? Have I created a sentient being on my laptop that's able to be aware of fraud, aware of people, aware of everything? No, of course I haven't. On the other hand, what it is doing is intelligent. If you recruited somebody that came into your company and they were able to spot new patterns of fraud [inaudible 00:35:54], I think you'd regard that person as intelligent. It's doing something that would be seen as intelligent if a person did it, so I'd say it's intelligent. What I've shown you is that we've been able to take a dataset and detect 92 out of 98 fraudulent transactions using neural networks. If you want to see an implementation of AI for good, I'd describe it as this.
Questions and Answers
Participant: As per my understanding, the features in the Kaggle dataset are numerical values that are kept confidential?
Crossland: They are, absolutely.
Participant: How can a fraud detection system label new transactions, which aren't in the dataset, without knowing what features to extract from that transaction?
Crossland: Because it's detecting patterns. This is the big difference between rule based systems and machine learning. With rule based systems, taking exactly what you've described, you're saying, if a transaction is in a location that the customer has never been to, and it's for a lot more than they usually spend, then we can say, yes, that might be a fraudulent transaction. We speak to our fraud detection experts and they say that's actually a good way of doing it. We can do that. With machine learning, all we've got is a whole load of different numerical values. What we're doing is we're just training those values based on patterns. As we load more and more transactions, those different weights are all being adjusted. There's some really wacky math going on behind there.
You're quite right. When we do machine learning, the model doesn't understand what it's doing. It doesn't know what a cat or a dog is. It has got no idea at all. All it knows is that one numerical representation of pixels is a cat and another one is a dog. By adjusting the weights and loading enough values into it, we can produce a model that we can then apply in the future.
Participant: From your experience, what is the minimum percentage of fraudulent transactions that you need to have in the dataset, 20%, 10%, so that the model is able to understand?
Crossland: Between the training and the test data, we tend to do 80 to 20. It depends how much data you've got. If you've got a decent sized dataset, then you want to make sure that when you do your testing, you've got enough so that you can actually get a reasonable result from that, but it's around 80 to 20.
Participant: We know that for example, cats, dogs, they continue to look like they look now or just adjust slowly. The fraud patterns change quite frequently. If the model is trained for one piece of data in 2013, 2020 is a completely different story. What do you do?
Crossland: Absolutely. You have to keep doing it. Because to be able to do it, what you have to do is train a model. Then you test it. Then you look at what you're generating. Then you make sure you're not introducing bias. Then you have to keep doing it. As you're getting new data in, you need to be retraining your model. You need to have version control. You need to have all the things you'd have when you implement new software. You're quite right. If you've trained your model on data from 2018, you need to be loading new data somewhere else to generate a new model. You then need to be testing that new model in parallel to running your old model. Then you need to do all the things that you do when you release any new software before you actually implement it.
Participant: The 5-second throughput for instant payments, were you able to achieve it? That covers fraud, which is more or less rampant in the industry. How about sanctions, because that is not numerical data. That's a country name, or a person's name, which is sanctioned. What do you do with sanctions changing very frequently? Like Turkey buys missiles from Russia and is sanctioned by the U.S., and after a week, it's not sanctioned.
Crossland: In terms of the 5 seconds, this is an investigation, we haven't implemented it. Once we've trained our model, and you train your model beforehand, you would then want to publish a service to say, check this particular transaction as it goes through to see whether it's flagged as fraudulent or not. You'd be applying that model to individual transactions. We could do that within the 5 seconds.
I think what you're saying is with sanctions, you've got a list of names. You could use another branch of AI which is looking at actually detecting variations of names, that type of thing. If you've got Jon, they might be using Jonathan, or variations of that name, John with an H, Jon, that type of thing, to actually do pattern matching to detect names. You could certainly use that type of technology to do it.
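As a very rough sketch of that idea, name variations can be scored with simple string similarity; real sanctions screening uses far more sophisticated phonetic and transliteration matching, and the names and threshold here are made up for illustration:

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Crude similarity score between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

sanctions_list = ["Jonathan Smith", "John Smith"]   # made-up names
candidate = "Jon Smyth"

for listed_name in sanctions_list:
    score = name_similarity(candidate, listed_name)
    if score > 0.8:                                  # arbitrary cut-off for review
        print(f"Possible match: {candidate} ~ {listed_name} ({score:.2f})")
```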
Participant: You mentioned that there was one approach where you had more false positives, another one when you had more false negatives. Is there a more structured way of trading off between those two possibilities? Some parameter you can adjust in your model to determine how much you care about false positives or false negatives.
Crossland: I was comparing two different approaches: one was where we'd reduced the number of non-fraudulent transactions to balance the dataset, and the other was where we'd inflated the number of fraudulent transactions, and we were comparing the two. I think with both of those, if we knew the dataset better, we could probably adjust them so that we'd either have more false positives or fewer fraudulent transactions detected. We could probably adjust either of the two models to do that to a certain degree if we knew the dataset better.
Participant: Is there still a purpose or a need for a rules based fraud engine, or does ML replace that need?
Crossland: I think for now we are going to see rule based detection stay in place just because of explainability. That is the problem. There is some really exciting work going on in machine learning to explain it. If I went into a bank with machine learning, and I think Apple have been experiencing some of this, and you say, why have you declined it? The only explanation you can give is, one of the parameters somewhere in the middle of that network has said that you're potentially fraudulent. That's really difficult to explain, especially with equality legislation and so on. I think in terms of where we are, I can see rule based remaining until we can get explainability improved.
Participant: You've mentioned both supervised and unsupervised learning, does both take place in the 5 seconds time? What's the role of unsupervised version in this process?
Crossland: I think with supervised, the 5 seconds would be for our final model. We've developed a model and then we provide a service. Once we've trained our model, we've trained all the different values in there. We've got a model which we've tested, and it's not showing bias, and all that type of thing. We can say, give me a new transaction, and I can say whether this one's fraudulent or not. That model will be mostly trained using supervised learning. I think what we might be looking at as well is using some of the techniques from unsupervised learning to say, actually, are there additional things that we could look at? If you look at some of the research that's being done with deep neural networks, where we have multiple layers, you might actually say there's a particular bit of the data that I want to look at, can I work on that? You might determine that approach using your unsupervised learning. I would say, supervised learning is how you build your model. Then if you want to research and look for other patterns, and so on, that's where the unsupervised learning comes in.
Participant: Then to touch on your very interesting comment about the ethics of such a project. What is your opinion if the mathematical or statistical model shows a correlation between parameter A and fraud or not fraud, but from the ethics point of view, parameter A is sensitive, or not acceptable to use nowadays? What do you do? Because mathematically and financially we see that it impacts the fraud coefficient, but from the ethics perspective it's not allowed to be used.
Crossland: I think you've got to do your best to make sure that your implementation is ethical. To do that, you've got to test it. You've got to really test your data. You'll find most banks will have some code of ethics on protected characteristics, saying that you don't want to be making biased decisions on them. I think you've got to make sure, when you test your data and when you test your model, that you're not actually making those decisions. You're quite right. Otherwise you could end up producing an unethical solution.
Participant: My question is about when you start thinking about implementing this solution, because I'm assuming this Kaggle dataset is just a test phase. You've mentioned that fraud is a really quickly shifting field. In addition, you have to make sure that the decisions that you're making are ethical, so you have to check those decisions. How do you plan on making sure that you keep checking whether or not the things that you're flagging are actually fraud? I'm assuming someone has to look at it, while balancing not checking everything. Because on the one hand, you might be missing new types of fraud. On the other hand, you might be overworking your employees, and then the whole point of the system is gone. Do you have any thoughts on that?
Crossland: To implement an ethical solution, you have to test your data thoroughly. If, while you're doing that, somebody comes up with a new approach to fraud, a new way of doing fraud, then during that window you are leaving yourself open to fraud of that particular pattern. Personally, I would argue, if you're going to be ethical about it, you're going to have to accept that at certain points you might have a slightly higher level of fraud while you implement the system properly.
Participant: I'm curious, how much manpower has been required for this research? How big is the team? What people have you involved? How long has it been worked on? If anyone was to embark on such an exercise, what time is required to get to the point of maturity that you're at now?
Crossland: There were three of us working for a matter of months to produce that. Three people in a matter of months to get 92 out of 98. It's a pretty impressive result, really.