The world is in awe of the transformative power of data. It promises to unlock mind-blowing insights, unveil lucrative new opportunities, save our planet from oblivion, and spread its tentacles into pretty much every corner of our business and personal lives…
So it may seem surprising that, lurking beneath the 21st-century veneer of Blockchain, AI, NLP, and ML, there are vast swathes of banking that have slipped beneath the radar of the data revolution, with little change over the last 20, 30, or even 40 years.
In the mid-90s, I began my career as a business analyst putting risk systems into banks. I was cool with the envisioning stage, where we helped clients to see how our tech could transform the way they did their jobs, but I struggled inordinately when it got to the dreaded data mapping… Weeks of work, field by field, instrument by instrument, writing the rules to take data from each source system and convert it into the structure required by the new system.
I wanted to cry, curl up in a ball, close my eyes, wake up in another life… And to rub salt into the wound, I knew that much of the logic I wrote would never be used, or would become obsolete before it was needed because something somewhere had changed.
Over 20 years on, that laborious process for taking data from one system to another remains largely unchanged, but at a far greater scale: more systems, more data, more flows. Armies of analysts doing a similar job to the one I found utterly soul-destroying 20-odd years ago. Vastly more effort going to waste.
So just why is this process so painful? Might there be another way?
My experience above relates to ‘schema-on-write’, the predominant approach to integrating data since computer systems became widespread back in the ‘80s, typically delivered via ‘ETL’ (Extract, Transform, Load). The concept is straightforward: First, define the required data model. Second, build bespoke rules to translate data from source systems into the required format. Finally, read or query the data from the pre-defined data model.
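The three steps above can be sketched in a few lines. This is a deliberately minimal illustration of the schema-on-write pattern, not any particular bank's ETL stack; the field names (`TradeRef`, `Amt`, `Ccy`) and the target schema are hypothetical.

```python
# Schema-on-write sketch: the target schema is fixed up front, and bespoke
# per-source rules translate every record BEFORE it is loaded.
# All field names here are hypothetical, for illustration only.

TARGET_FIELDS = {"trade_id", "notional", "currency"}  # step 1: define the model

def transform_source_a(record):
    """Step 2: bespoke mapping for one source system (rename, normalise)."""
    return {
        "trade_id": record["TradeRef"],
        "notional": float(record["Amt"]),
        "currency": record["Ccy"].upper(),
    }

def load(records, transform):
    """Step 3: apply the transform, rejecting anything that is off-schema."""
    rows = []
    for rec in records:
        row = transform(rec)
        if set(row) != TARGET_FIELDS:
            raise ValueError(f"Record does not fit target schema: {row}")
        rows.append(row)
    return rows

source_a = [{"TradeRef": "T-001", "Amt": "1000000", "Ccy": "usd"}]
print(load(source_a, transform_source_a))
```

The pain point the article describes lives in `transform_source_a`: one such function must be hand-written, per field, per source system, before any data can land.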
An ‘industry’ of data integration:
In a large financial institution, most systems draw disparate data from a wide range of other systems. This leads to a highly complex over-arching data model, and multiple onerous ETL builds. To minimise ongoing maintenance, much effort is made to anticipate future requirements (which typically means a lot of time is spent mapping data that might never get used). Furthermore, it’s inevitable that unforeseen needs will arise, which then require changes to the data model, changes to each of the ETLs, and an onerous migration of historical data…
Upstream systems can change too, requiring changes to the ETL, and potentially to the downstream system’s data model as well.
Net result: An expensive and ongoing overhead of data integration.
Slow project delivery:
Building bespoke ETLs for each data source adds many months to project delivery timeframes, i.e. months before any value can be derived from a new system.
Within financial institutions, different functions, such as Finance and Risk, want to view the same data under a different lens. Of course, these different views should be sourced from the same data to ensure consistency, but designing a ‘one-size-fits-all’ schema is a bit like trying to develop a new line of Marmite with universal appeal…
The lesser-known alternative to schema-on-write is schema-on-read (alternatively known as schema-agnostic).
In place of all that mind-numbingly laborious up-front mapping, raw data is onboarded ‘as-is’ in any format, reducing the time to onboard data from months to just a few hours. Mapping logic is then applied as and when a processing or reporting requirement arises. This means the highest-priority requirements can be delivered in a fast and agile way, with no dependency on laborious ETL builds. Nothing is mapped unless it’s genuinely needed, and the logic is far more up-to-date. That means value is delivered faster, with fewer people, and with all data in one place, it’s easy to accommodate new requirements.
With schema-on-read, the mapping typically creates a virtual data model to extract the necessary information from the raw data. These objects are reusable, accelerating the delivery of future requirements that use the same concepts. This also means different classes of users may define their own data objects, enabling, for example, Finance and Risk to view the same, single ‘source of truth’ under their respective lenses.
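To make the contrast concrete, here is a minimal sketch of the schema-on-read pattern described above: raw records land with no up-front mapping, and each consumer applies its own lens only at query time. The record fields and the Finance/Risk lenses are hypothetical examples, not a real product's API.

```python
# Schema-on-read sketch: raw data is onboarded 'as-is', and mapping logic is
# applied only when a reporting requirement actually arises.
# Field names and lenses are hypothetical, for illustration only.

raw_store = []  # the single 'source of truth', in whatever shape data arrives

def onboard(record):
    """Onboard a raw record with no up-front mapping — any shape, any source."""
    raw_store.append(record)

def query(lens):
    """Apply a consumer-defined mapping at read time, not at load time."""
    return [lens(rec) for rec in raw_store]

onboard({"TradeRef": "T-001", "Amt": "1000000", "Ccy": "usd", "Desk": "FX"})

# Two functions view the same raw data under their respective lenses:
finance_view = query(lambda r: {"id": r["TradeRef"], "amount": float(r["Amt"])})
risk_view = query(lambda r: {"id": r["TradeRef"], "desk": r.get("Desk")})
```

Note that nothing was mapped until a view needed it, and adding a third lens later requires no change to the store or to the other consumers.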
The schema-agnostic approach relies on a database that supports highly flexible and fast data modeling, search, and retrieval. Sadly, the old-fashioned relational database, with its rigid data model, just won’t cut it – and relational databases are still the dominant technology of most core banking systems.
Maybe you’re ditching an old database? Choosing a new system? Or even (long, deep breath) considering how to re-invent those ‘untouchable’ core systems…?
Before you get lost in how modern tech can eat legacy for breakfast, spare a thought for how those shiny new systems might talk to each other. It’s integration that kills most opportunities to employ great tech in banks (great FT article here on how it can cost £1m to deliver each £100k of value).
In the worlds of sport and business, we take it for granted that a cohesive team of average players who communicate well can outperform a group of individualistic stars… But few of us think this way about our systems.
Maybe it’s time we should.