In the beginning of time, Grace Hopper invented business software and Cobol. This meant that in addition to calculating the trajectory of an intercontinental ballistic missile, large computers could now be installed, literally built into the very lowest basement of international enterprises, and be used to process payroll and do forecasting, gradually obsoleting armies of calculators and typists.
Earliest business applications
At this point, nobody knew how to actually write or maintain large software systems. I mean, granted, current paradigms like object-oriented programming and functional programming were formalised around that time, but still, people had not written as much software back then as you now have to write to make a car set fire to petrol at an opportune time given crank angle, engine speed and temperature (I exaggerate, but not really).
From terminals to personal computers
Eventually Microsoft began its journey to global domination in the office by piggybacking onto IBM’s good name and getting installed into offices by default since nobody got fired for buying IBM. Unfortunately, IBM’s own operating system guys that wrote time sharing operating systems for mainframes had no influence over the people that created the CP/M knockoff that was to be PC-DOS (and Microsoft’s MS-DOS), so it had no multi-user or networking security features built in as, well, you were supposed to have it on your desk, and buy another computer for somebody else’s desk. No need for passwords, amirite? I mean UNIX existed and was fairly widespread in the corporate environment at the time, and it had decent security built in, at least a fundamental understanding that even as a power user, you don’t want to have that power all the time.
The beginnings of Enterprise IT were therefore to maintain and write software for mainframe computers, and that was so alien a kind of work to most organisations that it quickly became outsourced to suppliers that could lower maintenance costs but overcharged for development work, thus lowering OPEX but moving software to CAPEX. This created one of the first negative incentives, driving the business to make fewer changes and making enterprise IT infamously slow to react: a department of No.
As technology changed, the IT department stayed in the cellars and basements where the business rarely ventured. As mainframes were phased out and replaced with banks of personal computers, the white coats disappeared. Gradually, typing stuff into computers at the IT department stopped being work exclusively for women, and men started taking over; salaries increased (not saying there is a causal relationship, just saying), but still no natural daylight.
Forces conspire to make organisations write and maintain code
There is a syndrome that happens to computer people called “Not Invented Here”, which is a form of exceptionalism that means that no outsider could possibly understand our Very Special Requirements, so we need to write our own X, where X can be anything. While certain pieces of software became standardised, like payroll and accounting for small businesses, there is still big money in helping people shoot themselves in the foot by developing and maintaining their own adaptations of commercial ERP systems like SAP and on a lesser scale Dynamics NAV or GP. Microsoft Excel is the most successful way of combining both, as in letting people buy a commercial off-the-shelf application but then making Excel sheets with bespoke maths that can both be the lifeblood of new business and be its cause of death when the bespoke template turns out to have a bug and nobody knows how to fix it anymore.
Another force is an accounting quirk (see the bit about CAPEX above) where you count code written as value created rather than a pure expense, and book development costs as having added capital value to your code base even though, objectively, few if any of your competitors would buy your bespoke mess of a back-office system if you offered it on the open market. Most organisations also don’t count the depreciation that comes naturally unless you keep the code maintainable and easy to refactor when requirements change in future.
At this point we have an alienated IT department that has only two key metrics, to cut cost and to add features to internal software products, but no real way to directly discuss requirements with those that use the software.
In the beginning you only have developers. As in, they develop, profess to test and deploy their code without any supervision. Then you have an outage that embarrasses somebody in management but no singular scapegoat could be found and all of a sudden you get ops guys that are there to protect the business from the devs. After another outage that embarrasses the management further, you may get a QA department. You can imagine how IT security comes to exist as a function within an IT department?
You now have one team that’s there to make changes that implement business requirements they have captured some way or another, one team that’s there to make sure those changes are valid and one team that’s there to make sure there are no outages. The incentive becomes to make few releases: the QA guys want to be able to run a full regression before they green-light the release, because they are the ones getting a bollocking when bugs get out in the wild. The ops guys have enough to contend with without f^!£$g developers making changes ON PURPOSE, how are you ever going to maintain a stable system if you keep poking it with new software all the time?! So yeah, at best one release per month, anything quicker would be irresponsible and you would be working your QA department to the bone.
From a business perspective this means you are never getting your change in. As more things go wrong, process/red tape is added, lead times get longer and change freezes are introduced periodically. Longer release cycles and bigger releases cause bigger problems.
Technically speaking, after mainframes people bought servers. At first they were just computers, beige like all the rest of them, and a server room was just a cupboard where they were shoved. You bought a physical computer from a supplier where you got a decent price, and you set it up, installed your OS and then you installed your software on it. Or you bought the computer and shipped it to your software vendor and let them install their software on it. 19″ racks became a thing, and servers became distinct from PCs in that they became loud and flat. You were going to stick them in a room away from humans anyway, so noise was no longer a concern. You did want many computers per rack, and you wanted simple but effective cooling, so you would fit powerful high-RPM fans.
As things progressed, people realised that it is hard to find office space that allows you to fit redundant power, automatic fire suppression and redundant network connections, so instead of trying to fit that into your basement, they would go to a third party that offered co-located datacentres. There you could mount your rack servers and they would give you ways to remotely manage them, so you wouldn’t have to physically interact with the servers to run them. All the patching and other maintenance could be done over the internet, and the data centre would make sure no villains could get at your hardware.
After a while people realised that you could just buy a couple of massively overpowered servers and then divvy up the computer horsepower onto virtual machines, pretend servers that would behave like separate physical machines. Carving out a bit of virtual compute and creating a new “server” was a lot faster than buying a physical server and having your colo provider plug it into your rack. You would still have to install the OS and configure the networking, but there would be templates and automation. Heck, you could even write command line scripts to new up new servers.
Point is, in Enterprise IT there is no time to write scripts, and far be it from the mind of any ops person to collaborate with developers to explore things like version control and automated testing. I mean, the developers are the enemy, the cause of all our problems, why would we collaborate? VMs are therefore largely artisan creations with very bespoke installs. Apart from possibly sharing a raw template with some antivirus or monitoring, not even instances 1 and 2 of a load-balanced pair have the same software on them.
As the blows keep coming with outages, bugs in production and near misses that cause leadership to go on long-term sick leave, decrees can go out to create test environments that are “the same as production”.
Servers are bought (VM hosts, of course) and VMs are configured. Obviously, it is prohibitively expensive to make it exactly the same as production given that the load requirements will be different, so some corners are cut, but it should be close enough. As the old meme would go – narrator: It was not close enough. Also, since the server operating systems are still hand-crafted, there are multiple potential places where differences between production and test can creep in.
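To make "differences can creep in" a bit more concrete, here is a minimal sketch of a drift check in Python. It assumes you can somehow export a package-to-version listing from each machine. The `find_drift` function, the package names and the version numbers are all made up for illustration and don't come from any particular tool.

```python
from typing import Dict, List


def find_drift(prod: Dict[str, str], test: Dict[str, str]) -> List[str]:
    """Describe every package whose version differs between two machines."""
    findings = []
    # Walk the union of package names so one-sided installs show up too.
    for package in sorted(prod.keys() | test.keys()):
        in_prod = prod.get(package, "missing")
        in_test = test.get(package, "missing")
        if in_prod != in_test:
            findings.append(f"{package}: prod={in_prod}, test={in_test}")
    return findings


# Hypothetical inventories; in practice you would export these from each machine.
prod = {"openssl": "1.1.1", "app-runtime": "4.8", "monitoring-agent": "2.3"}
test = {"openssl": "1.1.1", "app-runtime": "4.7"}

for line in find_drift(prod, test):
    print(line)
```

Running something like this on a schedule at least tells you that test and production have diverged. It won't fix anything, and with hand-built servers the list is rarely empty.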
In various countries, leaps were made at various times that caused people to adopt electronics at a vastly greater rate than before. In Sweden there was a push in the late nineties, early noughties where you would get a quite substantial tax rebate if you bought a modern computer at a certain cost, causing a lot of people to all of a sudden possess a modern computer. With various Covid stimulus cheques it seems a lot of Americans put money straight into a gaming computer (thus worsening the current silicon supply chain constraint situation). There have been similar schemes all over the world at one time or another to encourage adoption of technology to promote familiarity with new technology, I just can’t be bothered to google more examples. Basically, people have used the internet and are able to see what commercial software development can do. Another cause of people becoming aware of the wider world is how various Apple products have had a marked impact on Enterprise IT, and BYOD basically becomes a thing in businesses because the CFO one day comes back to the office with a MacBook Air and stares the IT department down until they “make it work” with their existing corporate software that only runs on a specific type of Dell laptop they have imaged two years ago, requiring IE6.
Between the exposure to what computers can really do and the daily torture of using enterprise software to do their jobs, dissatisfaction among the people on the business side is rife. Eventually, some middle manager just takes their corporate expense account and hires some consultant off the books and builds some quick win software to solve a specific problem. If we are unlucky, this is an all out win. It works, it generates business and was executed in a timely fashion on, or just over, budget. A dangerous precedent is set, another wedge is driven in between the business and the IT department. Shadow IT is born.
To bring us to the final bit of the story, let’s assess where we are.
Enterprise IT is disgraced, underfunded and distinct from the core business, literally moved away from the rest of the organisation. A troll under a bridge or an ogre in a swamp. Sure, the CTO may report to the Board of Directors, but there is no real correlation between desired business outcomes and the metrics that the IT department measures its services against, and no work is done to ensure that IT services directly benefit the key outcomes that the business as a whole needs to achieve. Control of IT spend is instead largely project based, as in, the business has an idea, it needs some IT support, a project is created with a budget and a final ship date before anybody with technical know-how has even assessed it, and then work commences. Probably contractors are brought in if there is a sizeable chunk of development, but normally the department is kept lean. Service desk and “maintenance development” presumably outsourced to a country in a different timezone.
What Enterprise IT wants is to be a force for good within the organisation, but since the beginning of time IT has often been deemed a non-core support function, and that increased conceptual distance has made it more difficult to effectively be of use to the organisation. Cultural differences between IT and other parts of the organisation, and perhaps ineffective communication between stakeholders and the developer organisation, have caused upper management to silo the organisations further apart rather than agree on an effective way of working closer together. In cases where IT is not involved in other aspects of the business, obviously there will be no “osmotic” assimilation of domain knowledge, so the business may be shocked at how little the IT department actually knows about the bread-and-butter business that keeps the lights on.
How do we turn things around? What are the most important things moving forward? I will leave out the Practices bit, because I have rarely seen bad practitioners in organisations I have worked with. People will write automated tests and implement CI and CD if they are given circumstances not directly hostile to professional software development. The bigger lego pieces are usually where the problems lie, and why management buy-in is crucial.
The VMs are not family or even pets, you should delete them all and start over. Not in one go, or to cause another outage, but replace at pace the VMs you are currently running in production with new ones created through automation. When you can automatically deploy your core business application from nothing to a running instance without any manual intervention you are done. Thinking you could replace your running VMs with automation but not actually having proven it has no value. Ideally make the deployment process be some version of “stand up new VMs with an app on them, run tests to make sure it’s working, route traffic to the new instances, destroy the old instances”, so that you can deploy without causing any loss of availability. This is crucial in building trust with the rest of the organisation. When I write VM, I’m not making a dig against containers or containerisation, just saying you can achieve this without moving to Kubernetes and a service mesh, you can do it with bog standard VMs and a bit of scripting.
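The "stand up, test, route, destroy" loop is simple enough to sketch in a few lines of Python. This is only an illustration of the ordering, not a real deployment tool. `Instance`, `LoadBalancer` and `deploy` are hypothetical in-memory stand-ins for whatever your hypervisor and load balancer actually expose, and the smoke test is whatever check you trust.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Instance:
    """Stand-in for a VM created by automation (hypothetical, in-memory only)."""
    name: str
    version: str


@dataclass
class LoadBalancer:
    """Stand-in for whatever routes traffic to your instances."""
    live: List[Instance] = field(default_factory=list)


def deploy(
    lb: LoadBalancer,
    version: str,
    count: int,
    smoke_test: Callable[[Instance], bool],
) -> bool:
    """Replace the live instances with freshly built ones, or change nothing."""
    # 1. Stand up new VMs with the app on them.
    fresh = [Instance(f"app-{version}-{i}", version) for i in range(count)]

    # 2. Run tests against the new instances before they see any traffic.
    if not all(smoke_test(vm) for vm in fresh):
        # The new instances are discarded; the old ones were never touched.
        return False

    # 3. Route traffic to the new instances.
    old, lb.live = lb.live, fresh

    # 4. Destroy the old instances (here: just empty the old list).
    old.clear()
    return True
```

The point of the ordering is that a failed test leaves production exactly as it was, which is what lets you deploy on a Tuesday afternoon without anybody holding their breath.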
In order to be able to fearlessly deploy, you will have to decouple systems so that you can deploy small changes often, with small blast radius and short time to recover.
Organise your IT department after what you are supporting. Yes, after business functions but also after the systems that you maintain. If you have a monolith that supports all business functions, then you need to split it up. The output of different teams needs to be independently deployable. This is not easy, but it is the only way to enable teams to make predictable progress. Consider it an investment in predictable outcomes.
Maintain products, don’t run projects
When you have finally managed to put together teams that match your customers and your work, then don’t disband them again after you have delivered a certain milestone and the “project is done”. Budget for products rather than projects, and see your own software as a driver to increase profits and reduce cost. Set targets for what you want to achieve and measure outcomes. Let developers prototype things and show the business what’s possible. If you have built the right platform for people, it will be possible to securely bring new ideas to life and test them with real clients and get true feedback. This way you can delight customers faster, and there is less risk that you get saddled with maintaining some Shadow-IT piece of software that suddenly became core IT after it was successful and the consultant that wrote it moved on to greener pastures.
By introducing a wider interface between the business and IT, and making teams that have the autonomy and platform support to independently iterate on features quickly, you will delight the business, delight the customers, make more money and have happier teams. There will be times when you have to organise and coordinate, but the lion’s share of work can be done independently.