This week, George Neal, Chief Analytics Officer at PrecisionLender, sits down for a second time with Maria Abbe to answer some of your questions about cleaning up and better leveraging your data.
Podcast Transcript
Maria Abbe: Hi, and welcome to the Purposeful Banker podcast, the podcast brought to you by PrecisionLender, where we discuss the big topics on the minds of today's best bankers. I'm your host, Maria Abbe, content manager at PrecisionLender. Today I'm joined by George Neal, our chief analytics officer here at PrecisionLender. Thank you all for joining us. A few weeks ago, George and I sat down and talked about data: loving your data, that is. We discussed how to treat your data and how to use it properly. We also touched on artificial intelligence, and our own application of it, Andi. You can listen to that episode by clicking the link within this podcast episode.
Today we're taking that conversation one step further. Since that episode, George has received a lot of great feedback and some additional questions, and that's what we're discussing today. So George, you spoke about moving the data quality and correction function to the people most tied to the impact of poor-quality data. Most of those people already have full-time jobs. One question we received was: how have you seen this work successfully? And what do we need to watch out for?
George Neal: That's a really great question. Most banks and most organizations don't have extra people sitting around on their hands doing nothing; if they do, there are issues other than data going on. What we've seen work successfully, and what most organizations that do this well have managed to do, is make the correction function very, very low overhead. They've implemented a very fast method to correct data quality issues, and they've empowered the people who interact with that data to correct it at the point they see a problem. That means your data correction tools have to be powerful enough to be used on the fly, in the midst of a conversation or transaction. Similar to how you would say, "Oh, you have an updated address? Hang on, let me update that right here in the system right now," and have that transaction take only a couple of seconds.
The people who are impacted by data quality need one- and two-second mechanisms to correct it, and that has to be built into the bank. That's oftentimes a challenge for IT environments, but where we've seen it work successfully, and where we've seen data empower organizations, that's one of the consistent themes: build a mechanism to correct it very quickly, make that overhead essentially nonexistent to those personnel, and they'll take care of your data.
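To make that concrete, here's a minimal sketch, not from the episode, of what a point-of-contact correction might look like; the in-memory customer store, the field names, and the `quick_correct` helper are all hypothetical:

```python
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for the bank's customer system of record.
customers = {"C-1001": {"address": "12 Old Rd", "phone": "555-0100"}}
audit_log = []

def quick_correct(customer_id: str, field: str, new_value: str, corrected_by: str) -> None:
    """Apply a point-of-contact data correction and record who made it."""
    record = customers[customer_id]
    audit_log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "who": corrected_by,
        "customer": customer_id,
        "field": field,
        "old": record.get(field),
        "new": new_value,
    })
    record[field] = new_value  # the correction itself is a single, fast write

# A banker updates an address mid-conversation, in seconds:
quick_correct("C-1001", "address", "98 New Ave", corrected_by="m.abbe")
```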
Maria Abbe: You said last time we spoke that no organization has perfect data, but data quality is a huge issue with auditors, particularly when they're using models for things like credit decisions or capital forecasting. Do you have any suggestions on how to reconcile the natural imperfection in the data with the need for accurate and complete data that makes the auditors, regulators, et cetera, happy?
George Neal: This is an ongoing struggle in a lot of environments: healthcare, financial services, and a few others. When you take a traditional financial auditing approach, when you think about double-entry accounting and the kind of auditing that comes from it, the goal is to make sure that every transaction is recorded, that the ledger is 100% complete, and that everything is accurate. For the kind of work we do here at PrecisionLender, and the kind of empowering machine learning that's taking place in the FinTech industry and in banks, it's a different set of requirements. We really have to change the conversation from one of accurate and complete to one of sufficient and representative. By that I mean, most people who work in machine learning, statistics, or any of these spaces are looking for two things in a data set: does this data represent reality, and do I have enough of it? Do I have sufficient data to draw a conclusion?
That requires, in many cases, educating your auditors, or the partners who are quality-checking this, on what is sufficient. You need to have metrics around that: do I have sufficient data to make this conclusion? There are measures available in data science and in statistics that will let you assess that pretty well. Is my sample representative? If you're using less than all of your data, because your data is data in the wild and it's going to be imperfect, is the data you're using representative of the thing you're trying to model? Again, there are measures for that as well.
The first step is always getting the auditor, regulator, et cetera, comfortable with whatever measure you're using, whether that's comparing means and standard deviations or looking for matches between the distribution of your data and the distribution of your population. Whatever metric you're using, make sure they're on board with it, and then keep true to it. The way these arguments tend to hold is if you can show: I have sufficient data, I have representative data, that's what I need for a valid model, here's why, and here's how I make sure the data I use going forward remains sufficient and representative. I've seen that conversation go very, very well. I've never seen anyone in the machine learning and financial modeling space do well with the accurate-and-complete argument, so I would certainly steer toward sufficient and representative.
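As an illustration of the kinds of measures George describes, here's a minimal sketch on synthetic data; the two-sample Kolmogorov-Smirnov test is one common way to check whether a sample's distribution matches its population, alongside simple mean and standard deviation comparisons:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
population = rng.normal(loc=680, scale=50, size=100_000)    # e.g., all credit scores
sample = rng.choice(population, size=2_000, replace=False)  # the subset you can model on

# Representativeness: do the sample and population distributions match?
ks_stat, p_value = stats.ks_2samp(sample, population)
print(f"KS statistic={ks_stat:.4f}, p-value={p_value:.3f}")  # high p -> no detected mismatch

# Simple sanity checks on means and standard deviations:
print(f"mean:  sample={sample.mean():.1f} vs population={population.mean():.1f}")
print(f"stdev: sample={sample.std():.1f} vs population={population.std():.1f}")
```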
Maria Abbe: Now, we had two questions come in from some of our banking listeners. One asked how to move from reports to action-oriented data use; the other, about prioritizing data warehousing efforts. You've grouped these two together?
George Neal: Yeah. When I hear these questions about how do I move from reports to actionable data, or how do I plan for data warehousing, the reason I group them together is that I think the answer is effectively the same. Before you engage in your data warehousing effort, and before you can say, "Hey, I have action-oriented data," you need to identify those reports and those points where your data can cause a behavioral change, and focus there.
For those people we've spoken to who say, "I have a warehouse full of reports," a physical warehouse full of reports, "I have reports coming out of my ears, but they don't change anything," somewhere in there someone is reading one or two of those reports, making a different decision, and changing what's happening in your organization. Find those reports. If you find more than a handful, I'd suggest reducing your initial effort down to a handful. Identify what change is being driven by each, and make your effort one of automating and empowering that change. Shortcut out the whole reporting process.
If you find, for example, that you have a great reporting structure around migrations of credit quality, and how you want that to change who you offer credit to, why does that go through a report, through a human being, through potentially a committee, through all kinds of mechanisms, to eventually get into your underwriting system or your pricing system and then face the customers? Why not simply take that migration of credit quality and feed it directly into those environments and systems as an automated change? Then, for those people who want to make sure there's a human monitoring all of this, simply report out that it was done. The change has already taken place.
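A minimal sketch of that idea; the grade-to-pricing mapping, the pricing store, and the `on_grade_migration` handler are illustrative assumptions, not an actual bank integration:

```python
# Hypothetical sketch: push a credit-quality migration straight into pricing,
# then notify a human after the fact instead of routing through a committee.

RATE_ADJUSTMENTS = {  # assumed policy: basis points added per risk grade
    "A": 0, "B": 25, "C": 75, "D": 150,
}

pricing_system = {}  # stand-in for the bank's pricing engine

def on_grade_migration(customer_id: str, old_grade: str, new_grade: str) -> None:
    """React to a credit-quality migration as an automated change."""
    pricing_system[customer_id] = RATE_ADJUSTMENTS[new_grade]
    # Report out *after* the change has taken place, for human monitoring:
    print(f"{customer_id}: grade {old_grade} -> {new_grade}; "
          f"pricing adjustment now {RATE_ADJUSTMENTS[new_grade]} bps")

on_grade_migration("C-1001", "B", "C")
```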
Similarly, I've seen a lot of money spent on data warehousing efforts where the end goal was effectively, "Let's generate our reports better and faster." If that's the direction your organization has taken, I understand it, but for those embarking on new data warehousing efforts, I would wholeheartedly recommend starting with the same approach: pick the five or six reports that generate the most change, and empower your data warehouse to support automating that.
That means, in many cases, don't build your data warehouse for reporting; build it to support the analytics for automated action. They're different. A data warehouse built for automated action tends to have a lot more time-series data and tends to capture things specific to a behavior you want to change, rather than being one of these gross reporting databases that, while impressive and, going back to our audit conversation, good for auditing, are less useful for getting an ROI on a data warehouse.
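One hypothetical way to picture the difference: a reporting warehouse keeps current-state snapshots, while an action-oriented warehouse keeps the time series of behavior changes a model can learn from. The record types below are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class GradeSnapshot:        # typical reporting row: state as of today
    customer_id: str
    grade: str

@dataclass
class GradeMigrationEvent:  # action-oriented row: one behavior change, with timing
    customer_id: str
    old_grade: str
    new_grade: str
    effective: date

history = [
    GradeMigrationEvent("C-1001", "A", "B", date(2018, 3, 1)),
    GradeMigrationEvent("C-1001", "B", "C", date(2018, 9, 1)),
]
# From events you can reconstruct any snapshot; from snapshots alone you
# cannot recover the migrations that drive automated action.
```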
Maria Abbe: Now, we had one listener write in, and I love this one. It goes, "I know there are no silver bullets in data use and management, but have you or your team discovered any silver-plated bullets you can point us at?"
George Neal: I thought that was pretty funny when I read it, and that's why I sent it over to you. There are a few silver-plated bullets we might point you to. Going back to our first question about how to make things easier for the front line: in our environment, we've found that using chat bots and instant messaging systems as a means of data correction is actually really, really effective. People inherently understand messaging systems; they interact with them all the time. If you can simply type, "Data correction bot, fix this field to this," it cuts that transaction down to seconds, and it's very empowering. I wholeheartedly encourage you to look at conversational language tools in your environment as something that will really help this process.
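A minimal sketch of what such a correction bot's message handler might look like; the command syntax and the in-memory customer store are assumptions for illustration, not PrecisionLender's actual bot:

```python
import re

# Hypothetical "data correction bot": a banker types
# "fix C-1001 address to 98 New Ave" in chat and the record is updated.

customers = {"C-1001": {"address": "12 Old Rd"}}

FIX_PATTERN = re.compile(r"^fix\s+(\S+)\s+(\w+)\s+to\s+(.+)$", re.IGNORECASE)

def handle_message(message: str) -> str:
    match = FIX_PATTERN.match(message.strip())
    if not match:
        return "Sorry, try: fix <customer-id> <field> to <value>"
    customer_id, field, value = match.groups()
    if customer_id not in customers:
        return f"Unknown customer {customer_id}"
    customers[customer_id][field] = value
    return f"Done: {customer_id}.{field} = {value}"

print(handle_message("fix C-1001 address to 98 New Ave"))
```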
The other thing is, for any organization that has the ability to measure where change happens: find out where your changes are taking place at a policy, price, or exception level. Look at your exception logs, and look at where you're changing policy and customer interaction rules; those are great places to find opportunities to correct data. Wherever we have exceptions, there tends to be a need for that type of corrective tool. And wherever we have specific policies that are being changed, usually those change drivers are also indicators that we're going to have data quality issues, so look there.
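As a hypothetical sketch of that kind of exception-log mining, with made-up data: counting exceptions by the field that drove them points to where corrective tooling will pay off first:

```python
import pandas as pd

# Hypothetical exception log: each row is a policy/price exception and the
# field that drove it.
exceptions = pd.DataFrame({
    "field": ["rate", "collateral_value", "rate", "naics_code",
              "collateral_value", "rate"],
    "branch": ["north", "north", "south", "east", "south", "south"],
})

hot_spots = (exceptions.groupby("field").size()
             .sort_values(ascending=False).rename("exception_count"))
print(hot_spots)  # 'rate' tops the list -> look there for data issues first
```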
Maria Abbe: Interesting. Based on everything we've spoken about today, this involves a lot of behavioral change within an organization, which can be a big feat. Do you have any suggestions on how to take this to the executive team to get their support in order to make that change happen?
George Neal: This is always a hard question, because executive teams are very, very different in what they prioritize and how they approach it. I've rarely seen an executive team say, "We don't think data is valuable and we don't want it to be empowered." What I've seen instead, where executive teams don't want to jump in fully on "Yes, we need to empower our data, and yes, we need to spend money to do that," is oftentimes a "But that money needs to be spent elsewhere." Let me focus my answer on that.
If what you're facing is a crisis of resources and budget, pick one item and simply take it to them: "This one report, this one change, this one function, let's correct it and see whether or not we get the return we're expecting, and expand from there." We've had a couple of partners go through a process similar to that, where it took one change to show value and then perhaps a second to prove it wasn't a fluke, and then you get a whole lot more buy-in.
People understand the value of expediting change. I think most of the resistance you typically get is that, historically, when executive teams have bought in on some type of process improvement, they haven't seen the returns from those IT investments that they might have expected. So start small is my actual suggestion. Most of these answers can scale pretty quickly, and once they scale up you'll see huge returns on them, at least in what I've seen. It's fairly easy to build a self-evident case through just one or two changes, and many times you can find one or two that can be done with minimal effort.
Maria Abbe: Great. Well, George, it's been great having this conversation about data with you again. I think that will do it for us today, so thank you, everybody, for listening. You can always find more information about today's episode at precisionlender.com/podcast, and if you like what you've been hearing, make sure to subscribe to the feed on iTunes, SoundCloud, or Stitcher. Of course, we would love to get ratings and feedback on any of those platforms. Thanks again for listening, everybody, and until next time, this has been Maria Abbe and George Neal, and this is the Purposeful Banker podcast.