Yeah, these things don't "just happen". I cannot imagine fucking up that badly, even in a relatively low-stakes environment (let's say a photo sharing app), without having a heart attack.
I am in Spain right now and I have to get cash from an ATM soon; it feels like Russian roulette to do so. I had to use a money transfer service to pay a dentist bill within the UK, because TSB's online banking website was showing an outdated phone number of mine that is used to verify transactions to new recipients. And of course, when you change the number you cannot use the new number for 2 days because of "security concerns".
IBM's involvement in the case doesn't fill me with confidence either.
I really hope this disaster finds its way into MBA courses as an example of why you need a sane migration path, "no matter" the costs.
EDIT: removed the request for recommendations for new banks, should be a different thread.
I worked on a bank migration project, early on in my career, and these things are a nightmare all around.
First, building something like this requires an acute understanding of banking software. And banking software means something written in COBOL, RPG400 etc. These languages are pretty old and hence, finding talent for these is like trying to find a needle in a haystack. So most of the work has to be done via brute force, trial and error. Most of the analysis provided by expert and "senior" business analysts is just that, analysis. Engineers have to bang their heads against the wall to glue stuff together.
Secondly, everyone has to be on the same page. The idea has to be that customers are the priority and egos are not.
So stuff like this shouldn't happen at all:
> To make matters worse, the Sabadell development team did not have full control – and therefore a full understanding – of the system they were trying to migrate customer data and systems from because Lloyds Banking Group was still the supplier.
In our case, the team managing the original product made it difficult for us to merge customer data. They would frequently seed the data incorrectly and refused to provide a proper data dictionary. They demanded training on the new product and insisted that all data transformation be done by them.
Needless to say, after spending 3 years and hundreds of millions, the project was scrapped. The migration was never completed and both banks remained on their respective systems, kicking the can down the road.
> banking software means something written in COBOL, RPG400 etc. These languages are pretty old and hence, finding talent for these is like trying to find a needle in a haystack.
It's really interesting to see which of the new UK and European 'challenger' banks purchased third-party banking software and which decided to spend the time writing their own modern banking systems [1]. A few examples from the linked article...
Monzo: For banking ops, it decided to build its own platform. Technology used is mainly open source: Linux, Apache Cassandra distributed database (used by the likes of Apple and Twitter), Google’s Go (golang) programming language at the back-end and PostgreSQL relational database. The system is hosted at two data centres in the UK on Mondo’s own hardware. There is a team of 16 people working on this.
Atom Bank: It has created a hefty technology set-up in its run up to the launch: FIS’s Profile core banking system; FIS/Sungard’s Ambit Quantum and Ambit Focus for treasury and risk management; Iress’ Mortgage Sales & Origination (MSO) suite for mortgage business, front-to-back office; Wolters Kluwer’s OneSumX for regulatory reporting; Intelligent Environments (IE) for front office capabilities; CSC’s ConfidentID system for security; Phoebus Software for secured business lending and account servicing for residential lending; and WDS Virtual Agent for customer queries supplied by WDS (a subsidiary of Xerox).
Starling: The bank has an in-house developed core system. It uses GPS and Bottomline Technologies for processing and payments operations, respectively.
At what point does it become cheaper to buy one of these smaller banks and migrate everyone across to their platform?
I’m Monzo’s Head of Engineering. A lot of the technologies mentioned above for us are accurate but some aren’t in use - notably PostgreSQL - and some others which are crucial are missing - eg. Kubernetes. The vast majority of the platform runs on AWS. Our engineering team has grown to around 60 now.
Modern doesn't necessarily mean good. AWS hasn't been without its share of problems. "Tried and tested" is something I would be looking for with banking software.
That said, TSB's stuff has just gone tits up and, as far as I am aware, they aren't using anything too trendy...
The real answer here is that it's probably never going to be cheaper to buy a bank with a working IT platform, as merging terabytes' worth of actively changing data from an underdocumented 1960s NoSQL database into a new greenfield product is the hard problem that almost nobody has succeeded at without massive problems, unexpected delays and costs.
In reality, what happens when a bank buys another bank is that yet another middleware layer is introduced so that front-end systems can access data from both of the old systems, not that one system is discarded while the other survives.
You'd think they could copy over the limited data people actually need - account balance, direct debits, mortgage status and payments and the like over to the new system and archive all the other stuff?
Maybe it is part of crafting an article, but did anyone else find the contrast between the looming tech disaster and the team drinking champagne really unfair?
The issue was obviously a systemic one, and that 18-month slog would have been a horrific death march for the people actually working on the project, so why shouldn't they be able to celebrate it being "over"?
Granted, it was a failure, but I'm not really sure what the floor staff (the actual "software engineers" who were pictured) had to do with it when they were set an impossible goal to begin with...
> why shouldn't they be able to celebrate it being "over"?
Because it's not nearly over. In my experience, going into production is the most worrisome phase. For one thing, no matter how much you test and prepare, you just never know what's going to happen. For another, there are the initial bugs and user confusion that come with any new deployment, and the rush to respond to them effectively and quickly in order to maintain user confidence in the new system. And in the bank's situation we're not talking about an update, but an entirely new thing made from whole cloth (or code), if I understand correctly.
The outcome of the project, the goal, isn't to press the big red 'release' button, it's a stable, functioning system that meets the project specifications. When you have that, then you are done and it's time to celebrate. And exhale.
You're right - I fully agree with you, but I actually think that's kind of an entirely separate issue.
If it took your team an 18-month death march sprint just to get to the "red button" stage, I have a hard time imagining many people who wouldn't toss it all in if, instead of some kind of company recognition, they were told "lol JK welcome to level 2 - hope you're ready to work harder now".
I'm not sure even the videogames industry could neglect morale that badly without a mass exodus.
Nah, I think this is just a personality thing. Coworkers often try to get me hyped up about finishing some release, and I've learned that it's just not how I respond. My "celebration" is just to be relieved, go home and tell my wife about it, and start thinking about the next thing.
Taking a moment to enjoy a success, or just that a major heap of work is over, is critical.
The industry has a major problem with burnout. If we don't slow down, and take a break, then eventually, burnout becomes inevitable. The mind gets overworked.
Some people celebrate to recharge. Others stop using a keyboard, and find something else to do.
I don't recharge around others, I find it exhausting. But after a major project ends, I do usually find myself buying the new hit PC game, or taking a hike into the mountains.
Your coworkers celebrate so that they can feel the weight of the release lift off more easily.
Something else might already be playing that role with you - and some releases will be easy for you, and several months of hellish stress for someone else.
I think you missed my point (I probably didn't make it well): my point was just that "celebration" in the typical sense, is not how I take a moment to enjoy a success.
If you fix scope and timeline, then the thing that has to give is quality. It doesn't seem like they failed to deliver all the parts of their system, they just didn't all work correctly.
I worked for Lloyds TSB around 2003/2004. In banking, the domain knowledge (banking & finance) is more valuable than the technology knowledge. There was a guy at Lloyds who couldn’t write a line of code, but he knew every field and every column, what that field meant and why it was there.
This guy was as close to unfireable (is that a word?) as it gets.
That sounds familiar. I was there too around 2003-2005; very little of what has been said surprised me, and a lot of it rang bells from my time there. Hugely siloed and with very strict hierarchies.
You would think domain knowledge extended to not using floating-point values for a balance. I just downloaded a CSV statement and it's full of entries that look like 1234.560000000003.
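Values like 1234.560000000003 are the classic symptom of binary floating point: most decimal fractions (0.10, 0.20, ...) have no exact binary representation, so sums of amounts drift. A minimal Python sketch comparing float arithmetic with the standard library `decimal` module; the function names and amounts are made up for illustration, not taken from TSB's systems.

```python
from decimal import Decimal

def sum_as_float(values):
    """Sum amounts with binary floating point (not suitable for money)."""
    total = 0.0
    for v in values:
        total += float(v)
    return total

def sum_as_decimal(values):
    """Sum amounts with exact decimal arithmetic.

    Amounts are passed as strings so Decimal receives the exact
    decimal value rather than an already-rounded float.
    """
    total = Decimal("0")
    for v in values:
        total += Decimal(v)
    return total

print(sum_as_float(["0.10", "0.20"]))    # 0.30000000000000004
print(sum_as_decimal(["0.10", "0.20"]))  # 0.30
```

The same drift, accumulated over a long statement and then printed without rounding, produces exactly the kind of CSV entries described above.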
The team at TSB must be utterly incompetent, if the alternative of bringing in a new team, with absolutely no knowledge of the systems or what has been done in the last 18 months, is thought to have a better chance at working. That's the message I get.
More likely, IBM will install a standard COTS banking software platform, migrate data where they can, and declare success. If account balances match, that will be enough. Desired functionality, either for internal users or customers, will be secondary.
And that's probably what TSB should have done from the beginning, minus the IBM involvement. That hasn't worked well historically, based on the number of lawsuits against them for failing to deliver contracted systems.
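The "if account balances match" bar is essentially a reconciliation between the old and new systems. A minimal sketch, assuming both systems can export a mapping from account ID to balance in integer pence (the `reconcile` function and data shape are hypothetical): it reports accounts missing on either side and accounts whose balances disagree.

```python
def reconcile(source: dict[str, int], target: dict[str, int]) -> dict[str, list]:
    """Compare account balances (in pence) exported from two systems.

    Returns accounts only in the source, accounts only in the target,
    and accounts present in both whose balances disagree.
    """
    missing_in_target = sorted(source.keys() - target.keys())
    unexpected_in_target = sorted(target.keys() - source.keys())
    mismatched = sorted(
        (acct, source[acct], target[acct])
        for acct in source.keys() & target.keys()
        if source[acct] != target[acct]
    )
    return {
        "missing_in_target": missing_in_target,
        "unexpected_in_target": unexpected_in_target,
        "mismatched": mismatched,
    }

# Hypothetical exports from the old and new systems.
old = {"A001": 123456, "A002": 3860, "A003": 0}
new = {"A001": 123456, "A002": 3800, "A004": 500}
report = reconcile(old, new)
print(report)
```

Matching balances is a necessary check rather than a sufficient one: it says nothing about direct debits, standing orders or transaction history, which is the functionality the comment expects to be treated as secondary.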
> The bank’s boss, Paul Pester, said TSB will waive £10m in overdraft fees and pay extra interest on current accounts. He has hired a new team of IT experts from IBM who have been told the problems must be fixed by Saturday.
(a quote from a different article).
This guy is an absolute joke. Just because you want something fixed quickly doesn't mean it's going to happen. Bringing in an outside team is already a REALLY bad sign, but demanding that an outside team get up to speed and implement a complete fix in two days? Yikes.
Systems integration is one of those problems that is new every time because every environment is different. Sure patterns arise and sometimes they can be cut/pasted between organizations but most of the time there is just enough difference to make it a huge risk. I have written at least a dozen integrations between SAP and commodity trading platforms and the mantra DRY doesn’t really apply. You start over from scratch each time. Sure I know more about the quirks of the various systems but just because it works at the last place doesn’t mean it works now.
All of these systems are moving targets as well at various version/patch levels so the best way to estimate projects of this nature is to take a conservative estimate and double it. Then add 50%.
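Taken literally, the rule of thumb is a fixed multiplier of three on the conservative estimate $E$:

```latex
E_{\text{final}} = 1.5 \times (2 \times E) = 3E,
\qquad \text{e.g. } E = 6 \text{ months} \;\Rightarrow\; E_{\text{final}} = 18 \text{ months}.
```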
> When TSB split from Lloyds Banking Group (LBG), a move forced by the EU as a condition of its taxpayer bailout in 2008, a clone of the original group’s computer system was created and rented to TSB for £100m a year.
> That banking system was a “bodge of many old systems for TSB, BOS, Halifax, Cheltenham and Gloucester and others” that had resulted from the “nightmare” integration of HBOS with Lloyds as a result of the banking crisis, according to one insider who had extensive access to and intimate knowledge of LBG and TSB’s internal systems over a prolonged period.
That sounds completely crazy. If you've got £100m a year in IT budget, why on earth would you buy a clone of Frankenstein?
You could hire a fine team of devs to build you a modern system. Then again, I'm not the kind of guy who believes in "never rewrite", which seems to be the advice.
> On Thursday he admitted the bank was on its knees, announced that he was personally seizing control of the attempts to fix the problem from his Spanish masters, and had hired a team from IBM to do the job.
This doesn't give me much confidence, either. Hiring outside help is a Coase problem. You're going to find frictions dealing with the externals. And it will cost you, I'm guessing, at least £100M a year.
With that kind of budget, and with IT being more or less all a retail bank does, you should hire hundreds of experienced staff, make them integral to the business, and let them solve the issues as they appear to the business units. When things happen they will have an idea of what the priorities are. There are plenty of software people who understand how banking works, and what systems are needed. Go and hire them.
And shut down your entire operation while said team worked?
The problem is that the average bank's system is a kind of Frankenstein tree that has grown inside and around every policy, procedure and task the bank performs, without any coherent design and with several dozen loosely coupled components, each with its own poor and fragmented documentation.
And while it's typically only required to remain up 16-18 hours a day, you have zero allowance for unplanned downtime and a fairly high peak load, which, along with the age and complexity of some of the components, makes the entire system a nightmare to run on a tight budget.
And it's worth noting that the system that failed was not the Frankenstein system they inherited from Lloyds but the new one they tried to import from Spain.
So if a $100M budget would get a new system built, then what they would need is a $200M budget ($100M to run legacy in prod + $100M to build new system and gradually migrate).
The only good in-house systems I've ever seen were (a) based on vendor reference designs w/ minimal changes, (b) based on OSS, or (c) architected by some very smart people who stay at the company for non-monetary reasons.
Because when you boil it down, no company is big enough to solve a problem better than a group working with multiple companies (unless the problem is trivial).
Or to word it another way, are you bigger / better / smarter than both of your top competitors put together? If no, then don't reinvent the wheel.
Banks can't have any downtime. They need to be able to process EFTPOS and Credit Card Transactions 24x7. Now, not all systems need to be always available in the bank, but the key ones do.
It is not easy. Many regulations and existing legacy systems result in systems built around FTP and flat files. The business side also does not have a clue; sometimes it is very difficult to tell why someone made a decision 10 years ago, since nothing is documented.
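Flat-file interfaces typically exchange fixed-width records in which each field is defined only by its character offsets, which is why an undocumented layout is so painful to work with. A minimal Python sketch; the record layout below (account number, sort code, amount in pence, credit/debit flag) and the `parse_line` helper are entirely hypothetical, not any real bank's format.

```python
from dataclasses import dataclass

# Hypothetical fixed-width layout: (field name, start offset, end offset).
LAYOUT = [
    ("account", 0, 8),
    ("sort_code", 8, 14),
    ("amount_pence", 14, 25),
    ("direction", 25, 26),  # "C" = credit, "D" = debit
]

@dataclass
class Record:
    account: str
    sort_code: str
    amount_pence: int  # signed: negative for debits

def parse_line(line: str) -> Record:
    """Slice one fixed-width line into fields according to LAYOUT."""
    if len(line) < LAYOUT[-1][2]:
        raise ValueError(f"record too short: {len(line)} chars")
    raw = {name: line[start:end] for name, start, end in LAYOUT}
    amount = int(raw["amount_pence"])  # zero-padded digits, e.g. "00000003860"
    if raw["direction"] == "D":
        amount = -amount
    elif raw["direction"] != "C":
        raise ValueError(f"unknown direction flag: {raw['direction']!r}")
    return Record(raw["account"], raw["sort_code"], amount)

rec = parse_line("12345678" + "301234" + "00000003860" + "D")
print(rec)
```

Nothing in the file itself says which offsets mean what; if the layout table is lost or out of date, the only way to recover it is the trial and error described earlier in the thread.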
The only hope is to replace systems one by one and make gradual migrations. Problem with this approach is that you need to add new interfaces to existing legacy systems and add some crazy stuff to the new system.
The above would be possible if you could hire top talent and retain it for many years. With the current IT market, that is not possible. Even Google and Facebook that pay way above market struggle to retain people for more than two years. Additionally, the business domain is boring and the technology is outdated.
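Replacing systems one by one usually means putting a routing layer in front of both the legacy and the new system and moving accounts across in batches (often called the "strangler fig" pattern). A minimal Python sketch; the `Backend` protocol, the class names and the in-memory stores are hypothetical stand-ins for the new interfaces the comment says have to be bolted onto the legacy systems.

```python
from typing import Protocol

class Backend(Protocol):
    def balance(self, account: str) -> int: ...

class InMemoryBackend:
    """Hypothetical stand-in for a real system; balances held in pence."""
    def __init__(self, balances: dict[str, int]):
        self.balances = balances

    def balance(self, account: str) -> int:
        return self.balances[account]

class MigrationRouter:
    """Sends each request to the new system only for accounts already migrated."""
    def __init__(self, legacy: Backend, new: Backend):
        self.legacy = legacy
        self.new = new
        self.migrated: set[str] = set()

    def mark_migrated(self, account: str) -> None:
        self.migrated.add(account)

    def rollback(self, account: str) -> None:
        # Routing back is just removing the account from the set; this only
        # works if the legacy copy of the data was kept up to date.
        self.migrated.discard(account)

    def balance(self, account: str) -> int:
        backend = self.new if account in self.migrated else self.legacy
        return backend.balance(account)

router = MigrationRouter(
    InMemoryBackend({"A001": 1000, "A002": 2000}),
    InMemoryBackend({"A001": 1000}),
)
router.mark_migrated("A001")
print(router.balance("A001"), router.balance("A002"))
```

The cost the comment mentions shows up here: every operation the front end needs must exist on both backends for as long as the migration runs.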
> Even Google and Facebook that pay way above market struggle to retain people for more than two years
Perhaps they're not paying "way above market" then. If buyers paying "way above market" struggle to attract sellers, it means the buyers are wrong about what the market price is.
People find a partner, have children, move to follow their partners or move back to take care of their parents. Projects are completed, roles shift, company changes.
A higher salary doesn't have any effect on any of the above.
Attracting is one thing, retention is another. Pay isn’t the reason people leave from FANGS oftentimes, but pay is oftentimes the reason people move to FANGS.
Talented people want to make an impact. Working for google building proto-to-proto services is opposite of making an impact. On average your work will have very little spotlight but you will make lots of money for the company.
If you were a jazz musician in the 1920s, the best-paying stable job you could get was in Paul Whiteman's band. So Whiteman was able to attract some of the best players of his day (well, the best white players of his day, anyway). But the work frequently meant playing boring arrangements for upper-class toffs. There was a fair bit of turnover in the staff despite the good pay. Some of his players just drank themselves into an early grave.
You can't just say 'hey guys we'll build you a new system, see you in two years'. It's about the migration path, it always is, building new systems is easy.
> That sounds completely crazy. If you've got £100m a year in IT budget, why on earth would you buy a clone of Frankenstein?
It's a fairly low budget for a big company. That's a few hundred employees at an average cost of $100k, and then the rest goes to hardware, suppliers, support contracts and other costs.
That was the system they already had, spinning out a clone of the old one they were using was presumably seen as the easiest way to separate the two banks. And for all the negative press it got in that article, it did at least work...
Banking is particular in the sense that there is often a large number of legacy systems that need to communicate reliably, at the same time that major business/technology decisions are made by leaders who do not have technical understanding. (Even in my native Norway, lauded for good digital banking services, almost no banks have a single technologist in their executive team -- the culture is changing, but most places still consider technology a service that is purchased and mostly separate from business concerns).
In a good case, top leadership will listen to architects and leads on the tech teams before making critical decisions, but in some cases, "business goals" will trump concerns from the technologists. This is of course a huge failure of communication, but it is an even bigger failure of organization. If technologists have to threaten to quit just to get their point across, the organization is broken.
Let's say I'm a senior banking developer in Europe. I'm being paid a fixed, moderate salary with a fixed number of hours each week, and I get no part of the bonus if this €100 million initiative succeeds, and I suffer no loss if we incur another €100 million of extra costs due to this failure. If I quit, it's with 3 months notice and it's a big PITA to find new work that suits my interests. What incentive do I have to do anything but do my best to alert the leadership to these problems, and then do my best to move the train of failure along?
The exact scenario described in this story -- an expensive service contract terminated at a hard date, with costs to re-instate this contract therefore becoming even bigger, and the development team being pressed to deliver on this deadline whether it is realistic or not -- does not surprise me at all, and must have happened dozens of times all over the world.
If it goes wrong on a small scale, you will only see a few hundred or thousands of customers affected in a non-catastrophic manner (e.g. see the wrong balance in their accounts, but with the correct number being accessible in the back-end system), but it stands to reason that this example would at some point happen at a spectacular scale with no easy way back.
I'm not holding my breath, but at some point the boards of these banks should realize that technology is a core competency, and get people with tech skills in a position to make critical top-level decisions. (Not to mention give pay some semblance of connection to the sums involved in the success or failure of the work; I get the impression that this is the case elsewhere in finance.)
> at some point the boards of these banks should realize that technology is a core competency
The regulator already has. Some years ago they fined NatWest after an IT issue because the management processes and technology risk management was not robust; I'd expect that they'll take a very hard look at this case.
I remember that quite painfully; I was an Ulster Bank customer back then, albeit fortunate enough to be a broke student with not much going in or out of my account at the time.
> “This turned what was a super-hard systems job [into] a clusterfuck in the making,” the insider said
Oh dear.
Later:
> The bank has been forced to cancel all overdraft fees for April and raise the interest rate it pays on its classic current account in a bid to stop disillusioned customers taking their business elsewhere.
I think the main reason their customers aren't taking their business elsewhere is that their money is stuck till this is resolved.
The UK's banks are all obliged to offer a system to consumers (and small businesses now too) where - manually if necessary - the bank transfers that customer's current account to another bank in a specific time period (one week? 10 working days? I don't remember). This is a result of government investigators concluding that banks weren't actually facing much competition because their customers thought switching to a competitor would be really hard and so they didn't bother.
Obviously for this to be economical normally, the banks must automate the shit out of the problem. That means not just transferring the correct balance but identifying regular payments, notifying payees like employers, and sorting basically everything out. Anything they don't automate ends up as yet more work for their customer services agents, because if it goes wrong they have to pay to fix it. So for TSB right now this is yet another cost they're soaking, and they don't even get to keep the customers, those customers are gone, no take-backs.
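Beyond the explicit direct debit and standing order mandates, spotting "regular payments" from transaction history is a heuristic problem. A minimal sketch, assuming the only input is a list of (date, payee, amount in pence) outgoing transactions; the `find_monthly_payments` helper, the grouping rule and the 25-35 day tolerance are illustrative choices, not anything a real switching service specifies.

```python
from collections import defaultdict
from datetime import date

def find_monthly_payments(transactions, min_occurrences=3):
    """Return (payee, amount) pairs paid at roughly monthly intervals.

    transactions: iterable of (date, payee, amount_pence) tuples.
    A (payee, amount) pair qualifies if it occurs at least
    `min_occurrences` times and every gap between consecutive
    payments is between 25 and 35 days (an arbitrary tolerance).
    """
    groups = defaultdict(list)
    for when, payee, amount in transactions:
        groups[(payee, amount)].append(when)

    recurring = []
    for (payee, amount), dates in groups.items():
        if len(dates) < min_occurrences:
            continue
        dates.sort()
        gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
        if all(25 <= g <= 35 for g in gaps):
            recurring.append((payee, amount))
    return sorted(recurring)

history = [
    (date(2018, 1, 1), "LANDLORD", 75000),
    (date(2018, 2, 1), "LANDLORD", 75000),
    (date(2018, 3, 1), "LANDLORD", 75000),
    (date(2018, 1, 15), "CAFE", 350),
    (date(2018, 1, 16), "CAFE", 350),
    (date(2018, 1, 17), "CAFE", 350),
]
print(find_monthly_payments(history))  # [('LANDLORD', 75000)]
```

Anything a heuristic like this misses (variable amounts, quarterly bills, annual subscriptions) is exactly the residue that ends up with customer services agents.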
I’m at a major UK university and many acquaintances use TSB. Not a single one of them can log into their account. It has been this way for around a week, as far as I can tell.
I have an account; I tried logging in the other day just to see how screwed it was and actually had no issues at all, so I guess not everyone is affected (I tried on like Thursday, so after the worst of it was done, I believe).
I managed to log in, but the system is pretty barebones.
Trying to change or apply for a new banking product just takes you to a help page saying the ability to do this is 'Coming soon'. (Some other features are scheduled to be available by the 'End of April', for comparison.)
Also, in the 'pending transactions' popdown, e.g. £38.60 is displayed as '38.6'...
That implies the balances are being represented as floats and turned directly into strings ... how does something that basic happen at all? Is Sabadell's Spanish web UI like that too? No wonder they're screwed
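Converting a float straight to a string drops the trailing zero, which is exactly the '38.6' symptom. One common alternative, sketched here in Python as an assumption about how it could be done rather than how TSB or Sabadell actually do it, is to store amounts as integer pence and only format them for display, always with two decimal places. The `format_pence` helper is hypothetical.

```python
def format_pence(pence: int) -> str:
    """Render an integer amount of pence as pounds, e.g. 3860 -> '£38.60'."""
    sign = "-" if pence < 0 else ""
    pounds, remainder = divmod(abs(pence), 100)
    return f"{sign}£{pounds:,}.{remainder:02d}"

print(str(38.60))          # 38.6  (float converted directly to a string)
print(format_pence(3860))  # £38.60
print(format_pence(-5))    # -£0.05
```

Keeping the sign separate and using `divmod` on the absolute value avoids the off-by-one rounding that negative amounts would otherwise get from integer division.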
I got a letter from TSB earlier this week stating that they have to send me paper statements this month, they are unable to provide paperless statements for the time being.
I wonder why exactly this has failed. It feels like, with good practices (especially TDD), this shouldn't have happened.
Also, wanting to do a "big bang release" is a recipe for failure.
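The usual alternative to a big bang release is to move customers across in waves, so that a problem hits a small cohort first. A minimal sketch of deterministic wave assignment in Python, assuming customers have stable string IDs; hashing the ID with SHA-256 keeps each customer in the same wave on every run, and the wave percentages and helper names are hypothetical.

```python
import hashlib

# Hypothetical rollout plan: cumulative percentage of customers covered per wave.
WAVES = [1, 5, 25, 100]

def bucket(customer_id: str) -> int:
    """Map a customer ID to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100

def wave_for(customer_id: str) -> int:
    """Return the index of the first wave whose cumulative share covers this customer."""
    b = bucket(customer_id)
    for index, cumulative_pct in enumerate(WAVES):
        if b < cumulative_pct:
            return index
    raise AssertionError("the last wave must cover 100 percent")

print(wave_for("customer-42"))
```

Under this scheme, wave 1 (the next 4% of customers) would only start once wave 0 (1%) has run cleanly, and everyone in later waves stays on the old platform until then.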