Another man’s ML

If you have seen the “code review” of Imperial College’s modelling code – after it was tidied up by Microsoft and others – and the reactions to it, I’d like to offer my unsolicited medium-temperature take on code review, legacy code and the type of code written by people for whom code is not the goal but the means to an end. If you already have firm opinions here you might want to skip this; it is an attempt to explain development stuff to people who don’t do software development for a living. Features an unnecessary recap of computer science history.

Background

Developers

Writing code is basically about solving a specific problem by expressing it as source code. Either a complex problem that you cannot keep fully in your head, or a simple but tedious one that you wish somebody else would just do for you. Or perhaps you are just exploring something you find curious or interesting, but that is perhaps not the most common situation in a professional setting.

Many people with various disparate backgrounds develop software today. Some start by being “in computers” but on the operations side, i.e. administrating networks or servers; some start because they want some thing in Excel to just do this little thing, getting into the worst Wikipedia hole ever, one that takes them to a whole new career – and of course some start out programming right from the beginning. Others go to university and learn computer science but stay away from academia and get a normal software engineering job. These backgrounds come into play when you read their code. If you are a mechanical engineer and your task is to make a combustion engine behave nicely (start in cold weather, use little fuel, have a pleasant throttle response, deal with less than ideal fuel quality and stay within emissions regulations) you might look at the hardware you are dealing with, knowing what problem you want to solve, and then learn only what you can get away with regarding the various hardware specs, libraries and language quirks. Your code might not make sense to a Node.js back-end developer, but another mechanical engineer might at least know what you are on about and understand the variable names.

What is a program?

Batch

In the early days of business software, you ran batches. You would have a payroll program that would calculate people’s wages, fees and holiday balances; you would feed it a stack of employee records and it would output some printouts that could be given to accounting. Input => program => output. One go. Boom. Bob is your uncle. Some programs still work like this. If you remember BAT files on DOS, those were named that way because BAT is short for Batch. These programs have a start, a middle and an end. On Linux there are various shells that fulfil the same role but are more advanced. Usually, when something goes wrong, you will at some point discover that something has gone awry, abort mission and show some kind of error message to the user, hoping they know how to fix the problem. In most cases this type of error handling is not only sufficient but preferable in this situation, as the program usually has just one job, so it might as well fail spectacularly with loads of information dumped in the output if it cannot follow through, making life easier for the user when trying to make things work.

The smallest computers businesses would have before the PC revolution were minicomputers. After a while these became powerful enough for time-sharing: multiple users could use the computer at the same time, each through something called a teletype, an electric typewriter keyboard paired with a separate printer. You typed into the computer, and the computer responded onto the paper. It looked like a command line, but on paper. In 1969 the Internet’s predecessor, the ARPANET, was beginning to be a thing at universities, and the Telnet program and protocol were invented. This meant that you could use your teletype to talk to computers far away over a network (!).

You can see this vestigially in Linux today: /dev/tty is a virtual device that refers to the command window you are currently typing in, TTY of course being short for teletype. The whole paper thing was deeply impractical, and soon the printer was replaced with a monitor and the “terminal” was born. For a decade or more, working on the computer meant using a terminal to interact with a mainframe or minicomputer.

Servers

The reason for bringing up Telnet and teletypes is that telnet is a different type of program, our next type. Or rather telnetd is. Telnetd starts out on the command line, creates a new process and closes the standard “files” (stdin, stdout, stderr) that the command prompt uses to feed information in and out of the program, leading the shell to act as if the program has ended – but in actual fact it is still running with an open network socket, listening for network calls, ready to serve, for instance, users connecting with the program telnet (without the d). This type of program, one that detaches from its owning terminal, is called a daemon, and there are plenty of daemons on your average Linux machine. A similar concept in Windows is called a Windows Service. These programs are how servers are implemented: web servers, email servers, game servers. You start them, they perform a specific task and they never finish until you explicitly terminate them. It is important that daemons are resilient to failure, so that one user connecting and experiencing a problem does not affect other users of the same computer. Use error codes or special protocol states to report problems back to the user, or disconnect the user, but the program must not itself exit unless explicitly told to stop. With these long-running programs you would start noticing that compounding problems such as small memory leaks or file descriptor leaks could have severe consequences. These problems mattered less in batch programs: as long as the results were correct, all memory and file descriptors would be returned to the system when the program ended anyhow.
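A minimal sketch in C# of that server shape (the port and the echo behaviour are made up purely for illustration): the process loops forever accepting connections, and a failure while serving one client is contained so it never takes the whole server down.

using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

class EchoDaemon
{
    static async Task Main()
    {
        var listener = new TcpListener(IPAddress.Any, 7000); // illustrative port
        listener.Start();
        while (true)                       // runs until explicitly terminated
        {
            var client = await listener.AcceptTcpClientAsync();
            _ = HandleClientAsync(client); // fire and forget; one bad client must not kill the daemon
        }
    }

    static async Task HandleClientAsync(TcpClient client)
    {
        try
        {
            using var stream = client.GetStream();
            var buffer = new byte[1024];
            int read;
            while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                await stream.WriteAsync(buffer, 0, read); // echo the bytes back
        }
        catch (Exception)
        {
            // report or log the problem for this connection only; the server keeps running
        }
        finally
        {
            client.Dispose();
        }
    }
}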

You saw a similar paradigm shift in the mid noughties when web pages went from being generated on the server and rendered in the browser to being small programs that ran in the browser for a long time. Memory leaks and other inefficiencies that never used to matter back when the world was recreated every single time you requested a fresh page from the server all of a sudden led to real problems for users.

Loops and leaks

In the 1970s, computer games came into being. These pieces of software required ingenuity and engineering heroics perhaps beyond the scope of this post, but in terms of what type of program they were, they are more closely related to a server in that they do not terminate automatically. In their early guises they did not wait for network input but ran in a loop that advanced time and moved players incrementally per iteration, reacting to input, determining whether objects had collided and updating game state for the next go round, always trying to use as few resources as possible to cram in as much game as you could on limited hardware.
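A toy sketch of that shape of program in C# – the “game” here is just a counter chasing a target, entirely made up, but the structure (loop until done, advance the state a little every tick) is the point:

using System;
using System.Threading;

class GameLoop
{
    static void Main()
    {
        int position = 0, target = 10;
        bool running = true;

        while (running)                      // loop until the game decides to end
        {
            position += 1;                   // advance game state for this tick
            Console.WriteLine($"tick: position={position}");

            if (position >= target)          // "collision" with the target ends the game
                running = false;

            Thread.Sleep(16);                // roughly 60 ticks per second
        }
    }
}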

Meanwhile in the offices, the personal computer revolution happened and first Apple and then Microsoft nicked the fourth type of program from the Xerox Palo Alto Research Centre: the graphical user interface, or GUI. This type of program is a bit like a game, in that it runs an event loop that listens to events sent from the operating system, or specifically the window manager, telling programs they have to redraw themselves and similar. Because these message loops ran very often, any tiny bug in the event code could quickly cause big problems, and early Windows and Mac programs were notoriously hard to write. Basically, there was so much code needed to implement even a simple GUI program – known as boilerplate code – and people were reinventing the wheel. If only there were a way to reuse bits of code, so that if you were happy with a button abstraction, you could just use that button in other places?

Because the world of computers is so new, you would think it was quick to adopt new ideas as soon as they had been discovered, right? Anyway. In the 1960s the ALGOL derivative Simula 67 introduced Object Oriented Programming. Even the source of the user interfaces Apple and Microsoft nicked, Xerox PARC, was working with OOP in a language called Smalltalk. This seemed like the holy grail to some.

Objects, bodging and garbage

Already back in 1985 Steve Jobs was working on a prototype computer nicknamed the Big Mac that ran a proper operating system, a UNIX system, on more reasonable hardware than the fairly anaemic ur-Macintosh that had premiered a year earlier. When Jobs made himself impossible at Apple and had to be pushed out, he took the prototype and his gang with him. NeXT and the UNIX-based NeXTSTEP operating system came into being shortly after. The language used to write this operating system was Objective-C, an attempt to weld object-oriented features on top of C – a language which did not have these features, despite being developed in the same era as Simula and ALGOL, but which had been successful enough to immediately become the systems programming language of choice after UNIX was rewritten in it in the early 1970s.

When Jobs was eventually brought back into Apple, MacOS had reached the end of the road, and Apple had nothing but disdain for its customer base, so they basically replaced their old broken operating system wholesale with NeXTSTEP, badge-engineered to be called Mac OS X, and their existing developers and customers were told to just deal with it. Given the paragraphs above I am sure you understand what an enormous disruption that was to a company that had been making a living writing software for the Mac. They had to start over, almost from scratch.

Honestly – I wish Microsoft had done that with one of the UI stacks they invented in the late noughties. Microsoft had come to the end of the road with Windows UI graphics (GDI, from 1985). It had problems with multiple users on the same computer – both security and performance – it was baffled by modern resolutions, and it could not use modern graphics hardware to offload any processing. Microsoft too developed a stack that leveraged 3D processing hardware, but it had other failings and the Windows Division hated it, so they invented another, and another. Now they have UWP and seem happy with the performance. Ideally they should now cut the cord and let people deal with it, but that is not the Microsoft way.

Anyway, for NeXTSTEP, Jobs’ team created Interface Builder: a broken, unstable piece of software that is still in use today when building user interfaces for the Mac and the iPhone. The beauty of it is that you draw the user interface in a graphical editor that shows your UI the way it will look when you run it. It would take Microsoft several years to come up with something even close. That thing became Visual Basic, and it was not properly object oriented, it didn’t encourage proper separation between UI code and the code that solves your problem, and on top of that it had stability issues – but – it made it so easy to create Windows programs that it too became a runaway success. It was just a tiny step up in complexity from writing Excel macros, so it was a common gateway drug into programming.

A Danish academic called Bjarne Stroustrup also got into the game of retrofitting object-oriented features onto C, but his product, C++, became much more successful and immediately became the main language used in application development, both in high-performance computing and in the Windows world, which at the time was vastly larger than the NeXT/Objective-C realm. The coolest thing with C++ was that it was very nearly a strict superset of C, so almost any valid C was valid C++, which made it easy to gradually go more and more C++. Sadly, despite C++ now supporting many recent concepts inspired by newer languages as well as its own groundbreaking features from a couple of decades back, most C++ developers are C/C++ developers, writing basically C with some objects. Code is then perhaps unnecessarily unsafe because the programmers are unaware of the newer, safer ways of writing code that C++ now supports.

Object orientation seemed very promising, and there was much rejoicing. Developers were still very much involved in the nitty gritty, and a lot of detailed knowledge was still needed to write a program for a specific computer. Also, C++ still made you manage your own memory, and getting memory management wrong had huge costs in terms of vulnerabilities and lost productivity. High-performance computer manufacturer Sun Microsystems decided to solve this problem by creating Java. This language was compiled to byte code, an intermediate language that was not the machine code of any individual physical computer but rather the machine code of a well-defined virtual machine, which also managed application memory with a concept called garbage collection – one that had existed before but had been improved quite a bit. The Java Virtual Machine was then implemented on very many computers, and Sun pitched this with the optimistic slogan “write once, run anywhere”. This was a runaway success, and all interesting developments in enterprise software development, most cool databases and all of the Netflix networking stack are based on the Java Virtual Machine. Microsoft were dead jealous and created the .NET Framework and the C# language to try and crush Java. I mean, C# still lives and is arguably superior, but – no, they did not manage to do so.

Determinism

If you go to proper programming school, i.e. you set out to be a developer or at least get an education on the subject – which, again, is only true for a subset of those that write code for a living – you will these days have been told about unit testing: writing tiny bits of code to check that the rest of the code is actually doing what it is supposed to be doing. I was part of a generation that was let loose upon the world without knowing about this kind of stuff, and let me tell you, it makes a difference to how your code works.

When you start by thinking about how to test things, you move things out that you cannot replicate in a test. You do not check the local time, you have an abstraction in your code that provides the time, so that in the test you can replace the real time check with a fake check that tells the code exactly the time you need it to be. This means that tests are deterministic, they will work the same way every time you run them as long as you provide the same data.
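As a sketch of what that looks like in C# – the IClock name and the weekend-discount rule are made-up examples, not from any particular codebase:

using System;

// The seam: production code asks this for the time instead of calling DateTime.UtcNow directly.
public interface IClock
{
    DateTime UtcNow { get; }
}

public class SystemClock : IClock
{
    public DateTime UtcNow => DateTime.UtcNow;
}

// Example business rule that depends on the time (hypothetical: weekend orders get 10% off).
public class DiscountService
{
    private readonly IClock clock;
    public DiscountService(IClock clock) => this.clock = clock;

    public decimal DiscountFor(decimal price) =>
        clock.UtcNow.DayOfWeek == DayOfWeek.Saturday || clock.UtcNow.DayOfWeek == DayOfWeek.Sunday
            ? price * 0.9m
            : price;
}

// In a test you pass a fake clock that returns exactly the time you need,
// so the test gives the same answer every time it runs.
public class FixedClock : IClock
{
    private readonly DateTime fixedTime;
    public FixedClock(DateTime fixedTime) => this.fixedTime = fixedTime;
    public DateTime UtcNow => fixedTime;
}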

It may shock you, but this is not obvious to everybody. Loads of businesses have code that works differently whenever you run it because it is hardwired to depend on the system time, on external sensors or similar to do its job. There are no seams where you can put a test dummy. Is it ideal? No, I would change it, but it does not guarantee that the code is broken. Would I complain in a code review? Yes. But it may have been working fine for 30 years.

Code Quality

Several decades in, as software became more complex and the new types of programs increased the need to avoid defects, the industry as well as academia had yet to answer how to write code with fewer defects. The military was worried, med tech was concerned. Bugs in the software of a radiation machine meant to treat cancer – the Therac-25 – had already killed patients. What are we to do?

Basically humans are bad at complexity and repetition. There are those among us that are more diligent than others, but you cannot rely solely on the individual diligence of your developers.

In the beginning you just wrote all the code in one place and hoped to keep track of it in your head.

Structured programming, popularised through the 1970s and 1980s, taught us to write smaller functions and divide programs by layers of abstraction into gradually more and more detailed implementation. The idea was that at every level of abstraction you could just read the code and understand what was going on, and if you needed more detail about how things were done you would scroll down to the implementation. Large code files were not seen as a big problem yet.

We have discussed object oriented programming. This is what truly started the sprawl. If you look at a pile of Java code, every tiny class is in its own file in multiple depths of folders, and provided you can find the file, it is astoundingly clear and focuses the mind. Luckily the rise of Java also meant the rise of the Integrated Development Environment (well, basically everybody wanted what Visual Basic had) that quickly got enhanced editors that could make sense of the code and link you, like in a website, to other pieces of relevant code.

Basically, people came up with metrics for code quality. How many code paths go through a function? How many branches are there? How many levels of indentation? What percentage of the code is executed when the tests are run? The point is that the greater the number of different routes execution can take through your code, the harder it is to make sure you have verified that every code path actually works, and quantifying it helps sell to the boss that you need to spend time sorting stuff out. “Ooh, that’s a five there, see? Can’t have that. That’s an MOT failure right there.” The truth is these measurements are heuristics. We need them as a guide to make sure we constantly keep an eye on things, because quality deteriorates incrementally, and these metrics can help catch things early. There is however nothing you can run to conclusively say the code is error free. The best you can do is write a set of tests that verifies that the code behaves like you expect – this goes a very long way – but you still cannot guarantee that the code is “right”, i.e. that it correctly handles scenarios outside of the tests you have devised.

What about Open Source?

Open Source and Free Software are ways to release software where the user gets to see and change the source code. Roughly speaking, Free Software is free as in free speech, while Open Source is the more pragmatic, business-friendly framing of the same idea – neither is necessarily free as in free beer.

The argument being made is that when thousands of people can see code, they can see problems and fix them; open source code is automatically better. I only need one counterexample to refute this statement: OpenSSL. Simple bugs went unnoticed for years, despite the millions of eyes. The code is horrendous – or is it? I don’t know cryptography – maybe it’s fine?

Have you read the source for the Linux kernel, or Emacs? If you are overwhelmed by a sense of clarity, enlightenment and keep saying “of course! It all makes sense now!” to yourself, well, then you are better at reading code than me.

Greenfield or Legacy?

When a developer approaches writing some code, the approach differs depending on whether there is something there to begin with. If you are new to a language or framework it is useful to start with some sample project that runs, which you can poke at and see what happens. This helps you see what is “idiomatic”, i.e. how you are supposed to write code beyond the rules the language grammar prescribes and beyond the syntax associated with a library.

Once you have a full grasp of a language and a set of tools, the ideal state of being is the revered Greenfield project, starting with a literal blank page. File -> New Project. Nobody else has muddled things up, only you and your crystal-clear vision hold sway, and no abstract, arbitrary limitations shackle your creativity. Truly, this shall be the greatest travel expenses management application (or whatever you are building) imagined by man.

The most likely thing you will encounter, though, is somebody else’s spaghetti code, where no abstractions make sense. Names are all wrong, describing business concepts from bygone days, and there are parts of the code you are dissuaded from looking at by elder colleagues. A shadow comes across their faces as they say “We tried to refactor that once, but…” and after some silence “Yeah, Jimmy didn’t make it” and then you never speak of it again. This is called Legacy Code.

When you are young or hip you forget that the reason that storied, scary code is still around is that the rest of the company is making money from it. If that hadn’t been the case they would have stopped using it a long time ago. Should you let it stay scary and horrible? No, of course not. You must go where Jimmy went before, but with a bit more care. Gently refactor, rename, extract method et cetera. But the important first step is to understand the code. This isn’t a fast process. I was a consultant many years ago, and back then you had to acquaint yourself with source code quickly, but even with practice it takes a while and a lot of domain knowledge – i.e. knowing what the software actually does, like the mechanical engineer above – to truly be able to safely refactor legacy code. You may even find that it wasn’t so crazy to begin with. Maybe a few automated renames to reflect changed nomenclature in the company and perhaps a few paragraphs of gasp! documentation. You will not know the full scope until you truly understand the code.

Take

My lukewarm take is therefore – given that there are so many different types of software out there, and that so many people from different backgrounds write code – that I am very sceptical of quick-fire judgements about code quality, especially if the people making these judgements do not have the domain knowledge to truly understand what is going on. Can professional developers identify problem areas and places that need to be changed for the sake of ease of maintenance? Sure, but – that will become clear over time. In summary – one man’s spaghetti code is another man’s Machine Learning.

Toy apps only?

Maybe I am unfair, but when using Microsoft development tools, it often feels like they are adapted for professional development only as an afterthought. The main acceptance criterion seems to be: does it look cool to the casual observer when we show it at Build?

Azure SDK

When the cloud became cool, Microsoft added a menu option to Visual Studio so that you could click a button and push a website to Azure. Cool, right? Well – how would you use it in real life? Give developers access to push directly from Visual Studio? Seriously? Also – there is an option to attach a debugger to an Azure website. Obviously not a scalable way to develop, but in a crisis it could be a Hail Mary when working with a non-production environment – except that if you didn’t cowboy-deploy like above, this Hail Mary is not available. Surely, if they had spent five minutes working this out on paper before they built it, they could have created much more useful versions of these features.

Containerisation

So here is a backgrounder on a major shift in how software is developed and deployed, including the struggle Microsoft have had to stay relevant. It features a couple of major technical achievements marred by minor impracticalities that negate a lot of their usefulness.

A few years ago a company called Docker bundled some Linux kernel features and used them to wrap software into little half-isolated worlds that ran on the same machine, shared the operating system and could communicate amongst themselves in prescribed ways, but were otherwise isolated. Thinner isolation than full VMs, and horizontally sliced to promote resource sharing and make better use of cloud infrastructure. They called these half-isolated worlds containers, and the rest is history. “Works on my machine” At Scale, as some put it. Microsoft were livid. Once again they were left in the cold because no cool kids would use their operating system. After a Herculean effort people bodged together a thing called Docker for Windows based on VirtualBox and later Hyper-V, the Bing of hypervisors. Basically you could run Linux docker containers on Windows, so one year’s Build talks were safe, but the efforts to stay relevant continued. They tried to build Windows containers, but a minimum Windows install was several tens of gigabytes, so work began on cutting superfluous cruft from a dedicated container edition of Windows to make it as small as Linux – but Windows containers still didn’t take off.

At the same time, after many years of developers complaining about the Windows command-line experience, Microsoft decided to bring actual native Bash to Windows by emulating(!) Linux on Windows, reimplementing most of the Linux syscall ABI in a new subsystem called WSL. It could run a lot of Ubuntu, which set hearts and minds racing, but it didn’t provide enough syscall compatibility to be able to run Docker – still a massive feat of software engineering.

A few years later Microsoft shipped the second generation of WSL, no longer a reimplementation but a real Linux kernel run virtualised, with almost completely transparent booting of the VM in the background, full file system integration and beefed-up command-line integration. Extremely impressive stuff. At this point, Microsoft had also reached maturity with their rewritten high-performance cross-platform open source web framework ASP.NET Core, and with WSL2, as mentioned, they support running multiple Linux distributions as a subsystem in Windows. They are openly courting developers that ship software on Linux and want them to at least write code on Windows, even if they must host it on Linux in production. With WSL2, Microsoft are finally ready – Docker running ostensibly natively on Windows.

Container Tools for Visual Studio

With the state of technology being as described above, naturally I want to get in on it. Writing my same old code but running it on cheaper Linux machines? Fantastic.

To do Docker containers in .NET Core the Microsoft way, people have been using Docker Desktop, which interoperates with Visual Studio through a plug-in called Container Tools that handles the creation, destruction, starting, stopping and debugging of containers.

I see that a preview version of Docker Desktop exists that uses docker installed in WSL2 as a back-end. I downloaded it and tried to install it, but as it turns out it cannot be installed on Windows 10 Home edition due to platform incompatibility, making it unavailable to me. I played around with raw docker-compose, foregoing the Visual Studio plug-in, but resentment and ennui meant I just ended up doing something else.

Fast forward a few months and Docker Desktop changed which Win32 APIs they used, replacing some exclusive to Windows 10 Pro with more generally available ones, meaning that from my perspective they fixed the problem – making Docker Desktop with the experimental WSL2 backend available on Windows 10 Home. Hallelujah, praise the Lord, you’d think – the hard stuff is all done now, let’s go!!!

Enter Visual Studio Container Tools, a product that is – as stated previously – a few years old and wraps the docker-compose, Kubernetes and docker CLIs, among others, in MSBuild tasks and a Visual Studio extension.

Unlike bash scripts or docker-compose run on the command line, where error conditions usually – pleasantly – leave information in context, thus aiding progress, similar docker configuration errors discovered by the Visual Studio tooling are expressed as compiler errors or exceptions thrown in the MSBuild task itself, with either no context or misleading context. Why would anybody accept living like this, when they could get somewhat googleable error messages, and at least a handful of people that have experienced the same thing before, if they just used the bare command line on a bare-metal Linux machine?

In the logs I see that an address is already in use. Wat? Oh yeah, all Dockerfiles generated by the plug-in expose ports 80 and 443, so all four websites in the solution try to host on the same ports, and the generated docker-compose.yml has no port disambiguation. Surely a scenario with multiple sites wasn’t unexpected? How many ASP.NET Core solutions in real life have fewer than two websites? Heck, even toy apps should struggle under this limitation.
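For what it is worth, the fix is a one-liner per service in docker-compose.yml – map each site’s container port 80 to a different host port. A sketch (service names, paths and host ports are made up for illustration):

version: "3.4"
services:
  website-one:
    build: ./WebsiteOne
    ports:
      - "8080:80"   # host port 8080 -> container port 80
  website-two:
    build: ./WebsiteTwo
    ports:
      - "8081:80"   # a different host port, so the two sites no longer collide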

The point of this is – no engineering efforts are spared to do cool things, “Look! I’m running Linux natively in Windows” or “Look! This breakpoint is hit in the cloud”, but in the daily struggle of delivering better code faster, very few of these gimmicks actually work, because of unnecessarily simple things that would have been so much easier to fix than the almost literal rocket science that went into those headline-grabbing toy features.

The A-list cloud providers

Preface

I know a lot of people that work with Microsoft Azure, and I know a few people that work for Amazon Web Services. I go further back with the Azure people, so they get to suffer from my Facebook rants about how much Azure has wronged me, and how much greener the grass is over in Bezos’ walled garden – whilst the people at AWS see none of it thanks to the algorithms.

This post is an attempt to mend fences/garden walls a bit. Both cloud providers have their specialities and I’m going to opine about what they are.

All of my opinions are based around the stuff I want to do, and how much work the cloud provider makes me do.

  1. Host simple blogs
    1. I don’t care about storage, just make it happen
    2. I might want to make tiny code changes, make that possible
    3. I want to be able to access a command-line and some logs for when PHP breaks itself
    4. I want HTTPS. Don’t make it a big deal
  2. Deploy ASP.NET Core apps of the latest stable framework version (no previews necessary)
    1. I want the code to deploy automagically after I push my changes to github
    2. I want to be able to see correlated logs for various sites that call each other. Make it look like Graylog
    3. I want correlated logs from mobile apps as well
    4. I want HTTPS. NBD, right?

Very simple requirements, and usually they are covered in various beginners’ guides.

Coalface

Azure

Hosting ASP.NET websites in Azure is of course simple, although not as simple as you would think. Yes, you can create what was known as a Windows Azure WebApp, now App Service, select the latest framework and just make it work. You will have to accept that it will randomly stop working as they make sweeping changes in the underlying architecture – from a simple VM with a rogue git installation doing the actual deployment to some mysterious docker-based solution, even though your app isn’t containerised. As usual with Microsofty things, it is very easy to create a sample app, but getting something that works in production is a lot of work.

AWS

In Amazon Web Services land things are different. They have very mature .NET APIs as well, and will let you use a myriad of services, but there is no service that directly replaces Azure WebApps. If you have a .NET Core Web API you can host it in AWS Lambda using dedicated APIs, and it will do exactly what you want at a very low cost, most likely completely free, depending on what other services you use. If you want to host a website made by old people, one that uses server-side rendering, you are SOL and will either have to host a VM like a barbarian or learn docker / docker-compose. Fargate seems cheap though. The point is, the threshold to even get started is pretty high, but once you have produced something you have it written as a script somewhere and it seems production ready.
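For reference, the Lambda hosting hangs off the Amazon.Lambda.AspNetCoreServer package – a sketch of the entry point, assuming you keep your existing Startup class (the namespace and class names are placeholders):

using Amazon.Lambda.AspNetCoreServer;
using Microsoft.AspNetCore.Hosting;

namespace MyExcellentApi
{
    // API Gateway invokes this instead of Program.Main; the base class translates
    // the API Gateway request into an ordinary ASP.NET Core request and back.
    public class LambdaEntryPoint : APIGatewayProxyFunction
    {
        protected override void Init(IWebHostBuilder builder)
        {
            builder.UseStartup<Startup>();
        }
    }
}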

California

Once I guided a customer through reading a certificate information screen in a browser, and they immediately noted over the phone that the certificate “was trusted by California and was something something 2048 South Africa”. The abbreviations for Certificate Authority and the Rivest, Shamir and Adleman public key encryption algorithm caused confusion.

Certificates in Azure just got a lot better. You used to have to use a dodgy plugin in your Azure website – if you were suffering through using a Windows App Service – that would set up Let’s Encrypt challenge endpoints and generate and install certificates for you, and getting it to work automatically was not foolproof (exhibit A: me – a fool – not managing), or – if you had a proper Linux App Service – hand-cranking the certificate util in your local WSL installation every couple of months and manually uploading the certificate like in the olden days. Now, though, allegedly you click once and it works.

Meanwhile in Bezos Land, there is a certificate manager, you have to faff with domain verifications, like in Azure to be fair, and IAM up and down, but then everything just works TM.

Drop the (data)base

Azure Database

Microsoft rarely ventures out to break backward compatibility, but when they do, they really break it. When they decided “Heard you like RDBMS, so we put RDBMS in your cloud”, they built a SQL Server of the mind, a false creation proceeding from the heat-oppressed Service Fabric. I.e. they implemented a small subset of SQL Server features and made a new, highly specialised application that to the end user almost looked like a load-balanced SQL Server in the cloud. They called it SQL Azure (I think) and just told developers to increase connection timeouts and not to expect connections to stay open. As the years have passed, compatibility has come closer, but it’s still surprisingly anaemic in places.

AWS RDS

Bezos et al just put a SQL Server in a VM and locked down the access. Boom, job done. Sure, getting backup and restore to work required some SQL hacking and more IAM to let the instance reach S3 for any backup files, but to the end user it mostly just looks like a SQL Server, and you see few oddities when using Management Studio. It is important to note that the user interface is very cryptic and hostile to beginners, but it doesn’t look obtuse on purpose, so you forgive it.

It’s a set-up

Azure App Service + Vault

Using the parameters in Azure App Services to configure your app is super trivial in ASP.NET Core, as you can read them as normal ASP.NET IConfiguration values, and they seem pretty reliable, except when Azure Pipelines decides to wipe all settings, which it may or may not do depending on mood.

Secrets, like certificates and keys, are easily stored in Azure Key Vault that can also be piped into the ASP.NET Configuration system for ease of use.
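A hedged sketch of the wiring, assuming the Azure.Extensions.AspNetCore.Configuration.Secrets and Azure.Identity packages (the exact packages have shifted between versions, and the vault URL below is a placeholder). App Service settings already arrive as environment variables; this layers the vault secrets on top so both read as ordinary IConfiguration values:

using System;
using Azure.Identity;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Hosting;

public class Program
{
    public static void Main(string[] args) =>
        Host.CreateDefaultBuilder(args)            // App Service settings come in as environment variables here
            .ConfigureAppConfiguration((context, config) =>
            {
                // Layer Key Vault secrets on top of the other configuration sources.
                config.AddAzureKeyVault(
                    new Uri("https://my-vault.vault.azure.net/"),   // placeholder vault
                    new DefaultAzureCredential());
            })
            .Build()
            .Run();
}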

AWS Systems Manager, KMS and Secrets Manager

Because .NET Core is a pretty big deal, there are already APIs that give you the same benefits from a programmer’s perspective: you can just download your settings into the normal IConfiguration system. There are pitfalls, though, with encryption keys for certain settings, and compatibility with AWS KMS can be a faff to get right.

IdS4 on .NET Core 3.1

Sometimes I write literary content with substance and longstanding impact, and sometimes I just write stuff down that I might need to remember in the future.

When migrating a .NET Core 2.2 IdentityServer4 project to .NET Core 3.1 I had a number of struggles. The biggest one was that the IdentityServer threw the following error:

idsrv was not authenticated. Failure message: Unprotect ticket failed

After scouring the entire internet, all I got was responses alluding to the fact that I needed to have the same data protection key on both instances of my website. The only problem with that was that I only had one single instance. Since I was also faffing about with changing DataProtection (unwisely, but hey, I like to live dangerously in my free time) – this was a very effective distraction that kept me debugging the wrong thing for ages.

After starting off from a blank .NET Core 3.1 template and adding all the cruft from my old site I finally stumbled upon the difference.

In my earlier migration attempt I had mistakenly put:

app.UseEndpoints(o =>
{
    o.MapControllers();
});

Now, my main problem was that I had made multiple changes between tests, which is a cardinal sin I only allow myself in my free time, which is why it is so precious now. If I had only made this one change and run the website I would probably have noticed that it wasn’t working. But since I changed a bunch of stuff at once, I had a challenge figuring out what went wrong.

In the default template the code was different and gave completely different results:

app.UseEndpoints(endpoints =>
{
    endpoints.MapControllerRoute(
        name: "default",
        pattern: "{controller=Home}/{action=Index}/{id?}");
});

And lo, not only did the website load with the default IdS4 development mode homepage, the unprotect errors went away(!!).

The main lesson here is: Don’t be stupid, change one thing at a time – even if you think you know what you’re doing.

More WordPress

I previously dabbled with moving blogs off of WordPress.com onto Azure. This is a new story about moving on from there.

Lazy Loyalty

I have stuck with Azure for WP things for a number of years. I guess it’s like a gym membership: you set it up, it costs money, you think it is going to be useful, but you end up feeling it is a waste of a lot of money and the shame grows.

I was in the unfortunate and, for me, unusual position of early adopter, so I had Windows-based WP “Azure Web Apps”, as they were called back in the day, long before MS ❤️ Linux and all that. The most weak-sauce image available at that, backed by the most pathetic MySQL instance possible.

File System Philosophy

As no doubt even the least Windows hating computer guy will gleefully tell you, when the topic of Windows performance comes up, Microsoft bet on the wrong horse when designing NTFS, the file system of the future for the Windows NT Operating system in the late eighties. They thought the concept of an abundance of tiny files was a bygone era, the future was large media files, databases, documents. The file system is pretty good at caching large files fragmented over a physical disk.

Linus “Linux” Torvalds believed in the UNIX principle, believed that thousands of tiny script files would configure the future. In some ways he was right, and definitely in the case of file system performance. Ext4 is vastly superior to NTFS when it comes to storing/hosting websites because, you know it, websites are a shedload of tiny files.

Total Tragedy

So – underpowered cloud instance, wrong file system, cheapest DB… what was the end result? One word – abysmal. It would take 30 to 40 seconds to load a page in the admin interface. Completely unworkable.

So what to do? Well, the move had been arduous enough that I was tempted to just leave it, rather than move it again. I didn’t have another Christmas holiday I could spend on it (I had missed the “maintenance window” for hobby IT as it was already January at this point).

Scaling up?

Azure started offering a Linux-based VM that came with WordPress preinstalled, i.e. a grown-up version of what I already had with four times the virtual hardware, and a beefier MySQL instance became available as well. But at what cost, you ask? Well – a significant one. It was that or all the Sky Sports you can possibly buy – so not worth it. Also, there was no upgrade path, so yeah, it would be like a complete move all over again, and like I wrote, I had just missed the maintenance window.

Scaling out?

So, instead I tinkered. Added a shedload of plugins. Azure Redis Cache, Azure Blob Storage & CDN and the WP Total Cache plug-in. Tried to smush images, but the admin interface was still too slow, so it was too difficult to remember what you had been doing when the page finally rendered. Now using all of these resources didn’t exactly make things cheaper, but it was more like a couple of movies off Amazon Prime than full on Sky Sports.

In a fit of desperation I even signed up for Cloudflare – which meant that the front office of the site, when cached, was the fastest thing in the universe, from anywhere, but Cloudflare couldn’t fix everything, so the error message of brokenness was very common, ruining the end-user experience.

Scaling down

Eventually after getting yet another Azure billing notification I just went at the Googles and searched for WP HTTPS NGINX or similar, and my filter bubble helped me find Bitnami. This will read like a transcript of a YouTube reaction video, but I don’t know anything about Bitnami, I didn’t want to know. It seems to be a company that knows about setting up Linux boxes. You can either download their stuff or buy AWS or Azure VMs through them, preloaded with their scripts and – shockingly – documentation on how to set it up.

I chose the smallest thing they had fitted with SSD on AWS, it came preconfigured with the correct firewall settings, a WordPress preloaded with useful plugins, and scripts to help you fetch free certificates from Let’s Encrypt and instructions on how to configure NGINX to use them.

Moving on up

The default – empty – WordPress was lightning fast, and the plug-in All-in-One WP Migration was most excellent. It has a sidekick that allows you to upload bigger files – you will need that – but overall, that plugin made the migration a lot less awful compared to the default WordPress import/export tool. For the first site, I installed that same plug-in on the old site, downloaded all site data except spam comments and uploaded the file onto the new site using the plugin. After some confusion when I had to log in with my credentials from the old site, and the browser became sceptical of this site now claiming to be somebody it isn’t, I authenticated, dismissed the security prompt and found the site still acceptably fast. I changed DNS to point to this new machine and felt pretty good about myself having moved stuff across – even the media files were working, something that never happens with the normal import.

After the DNS had propagated I configured the Let’s Encrypt stuff and got my padlock back. I started to see traffic and noticed that there was a slight problem: the admin interface and the WP REST API were completely broken due to mixed content warnings (crossed-over padlock!). The internet seemed baffled by my error – I presume it Just Works on the more common Apache Bitnami configurations – but NGINX does the HTTPS termination before the web apps hear anything about the incoming requests, meaning the sites must be set to accept HTTP internally.

After I set a parameter in wp-config.php forcing the admin interface to be loaded over HTTPS, the API started working again and there was much rejoicing.

Basically, a few hours in, the migration I had been fearing for such a long time had completed.

It remains to be seen what the total cost per unit of time will be, but the estimate looks good, at around a third of Azure prices – since one of the new images was made slightly more powerful and thus expensive.

…and the number of the counting shall be 3

.NET Core 3.0 is here, allegedly the penultimate stop on the roadmap before the Singularity, when they finally bin .NET Framework and unify on top of Windows XP… er… .NET 5. The news is packed with stuff about WinForms, WPF and other legacy technologies, but I’m going to stick with the webby and consoley bits, where I’ve mostly been operating since I started using .NET Core back in the 1.0 days.

Scope

As usual I will mostly just write down gotchas I have come across so that if I come across it again I will have a greater chance of not wasting so much time the second time around.

We are starting from a .NET Core 2.2 app that initially was .NET Core 2.1, so it may not have been fully upgraded in all respects between 2.1 and 2.2 if there were changes I couldn’t be bothered implementing.

Breaking changes

I followed an excellent guide to get started with references that need to leave your project file and others that need to come back in after they were exiled from the magic default Microsoft.AspNetCore.App package, as well as other breaking changes. It’s not that bad, and you really will enjoy the experience.

There has long been a trend among hipsters to forgo the unstructured default folders in ASP.NET projects, which bucket Controllers, Views and Models into separate folders, in favour of having two folders in the root, one called Features and another called Infrastructure. The Features folder would contain – you guessed it – each feature, with controllers, views, viewmodels and the data model grouped together. To make this work, you needed to create a new convention that tagged controllers with their feature, so that the view resolver would know where to look for the views. Since AddMvc is now called AddControllersWithViews, I made that change hoping to make things look happy again. I noticed that the FeatureConvention had a squiggly. This is because the old interface IPageConvention that the FeatureConvention used to implement no longer existed. What to do? Well, I looked all over the internet and found nothing, until I finally discovered the IControllerModelConvention interface, and by literally just replacing the name of the interface everything compiled, so the interfaces must have been identical.

services.AddControllersWithViews(options =>
{
    var policy = new AuthorizationPolicyBuilder()
        .RequireAuthenticatedUser()
        .Build();
    options.Conventions.Add(new FeatureConvention());
    options.Filters.Add(new AuthorizeFilter(policy));
})
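For completeness, a minimal sketch of what such a FeatureConvention might look like – this version derives the feature name from the controller’s namespace, which is an assumption on my part rather than anything prescribed:

using Microsoft.AspNetCore.Mvc.ApplicationModels;

public class FeatureConvention : IControllerModelConvention
{
    public void Apply(ControllerModel controller)
    {
        // Expose the feature name so a view location expander can look in
        // /Features/{feature}/... for the controller's views.
        controller.Properties["feature"] = GetFeatureName(controller);
    }

    private static string GetFeatureName(ControllerModel controller)
    {
        // e.g. "MyApp.Features.Invoicing.InvoiceController" -> "Invoicing"
        var tokens = (controller.ControllerType.Namespace ?? "").Split('.');
        var featureIndex = System.Array.IndexOf(tokens, "Features");
        return featureIndex >= 0 && featureIndex < tokens.Length - 1
            ? tokens[featureIndex + 1]
            : "";
    }
}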

Controller Actions

So with .NET Core 2.2 the ActionResult<T> type came into being, but now you are starting to see squigglies around IActionResult, saying the bell tolls for untyped responses. This is not such a big deal. Often this means you get to cut away vast swathes of boilerplate where you respond with Ok() or Json() around something, instead just returning what the handler created, replacing Task<IActionResult> with Task<ActionResult<SomeExcellentDto>>. Not only have you now achieved a lot of automagic swaggering where you otherwise would have had to write attributes manually to inform the consumer what the payload looks like, you have also eliminated the need for a class of tests just ensuring that the action methods return data of the right kind.

The thing to look out for is that if your handler returns an IEnumerable<T>, you need to convert the enumerable to an array or a list due to the way ActionResult<T> works, because otherwise the type cannot be instantiated, i.e. instead of Task<ActionResult<IEnumerable<SomeExcellentDto>>> you need to go with Task<ActionResult<SomeExcellentDto[]>> or Task<ActionResult<List<SomeExcellentDto>>>.
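A sketch of the before/after shape (the DTO, controller and handler names are placeholders):

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class ExcellentController : ControllerBase
{
    // Before: untyped, so Swagger needs attributes to know what the payload looks like.
    // public async Task<IActionResult> Get() => Ok(await GetThingsAsync());

    // After: the payload type is in the signature. Note List<> rather than IEnumerable<>,
    // as described above.
    [HttpGet]
    public async Task<ActionResult<List<SomeExcellentDto>>> Get()
    {
        IEnumerable<SomeExcellentDto> things = await GetThingsAsync();
        return things.ToList();
    }

    // Placeholder for whatever handler produces the data.
    private Task<IEnumerable<SomeExcellentDto>> GetThingsAsync() =>
        Task.FromResult<IEnumerable<SomeExcellentDto>>(new[] { new SomeExcellentDto() });
}

public class SomeExcellentDto { }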

(Swash)buckle up, buttercup

The swaggering is a separate chapter – the latest versions of Swashbuckle are hard integrated with OpenAPI, so you have to replace any filters you may have created to format your swagger document since all the APIs are broken and replaced with similar ones from OpenAPI, and if you do any swaggering at all you have to get the latest prerelease of Swashbuckle to even be able to compile. Basically – if you are using Swashbuckle today – congratulations, you are about to start swearing.

IdentityModel crisis

The IdentityModel nuget package has been updated quite radically between versions, and if you were doing clever things in message handlers to keep track of, or request, tokens from the token endpoint when talking between services, you may need to update your code to do without the TokenClient that, bereft of life, has ceased to be.

The new method is to use extension methods on HttpClient, and the canonical example is providing a typed HttpClient, created on demand by the HttpClientFactory, that in turn calls the extension methods. See documentation here. The token endpoint and the extension methods on HttpClient are covered in more detail here.
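A sketch of that pattern with IdentityModel’s HttpClient extensions (the token endpoint, client id, secret and scope are placeholders):

using System.Net.Http;
using System.Threading.Tasks;
using IdentityModel.Client;

public class TokenService
{
    private readonly HttpClient client;

    // Registered with services.AddHttpClient<TokenService>() so the
    // HttpClientFactory creates the HttpClient on demand.
    public TokenService(HttpClient client) => this.client = client;

    public async Task<string> GetAccessTokenAsync()
    {
        // Extension method from IdentityModel, replacing the departed TokenClient.
        var response = await client.RequestClientCredentialsTokenAsync(
            new ClientCredentialsTokenRequest
            {
                Address = "https://identity.example.com/connect/token",
                ClientId = "my-client",
                ClientSecret = "my-secret",
                Scope = "my-api"
            });

        if (response.IsError)
            throw new HttpRequestException(response.Error);

        return response.AccessToken;
    }
}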

Show me the money

Payment gateways for small businesses. What do you look for in them? Personally I don’t care – I crave simple pleasures.

I would prefer a checkout page that looks like the rest of the site, but I need it to be not-obviously-insecure and compliant with current regs, and it must also be less work to implement than the rest of the site was to create. If taking card payments was my core business, I would be in that business. I would happily sacrifice a significant chunk of revenue for this usability bonus.

The checklist

I would like a payment API

  1. To which I can connect, ideally with a .NET Core client, but pure HTTPS is fine.
  2. Where I can specify what my customer is buying
  3. I get to know who bought from me (email is all I need)
  4. Where I can indicate how much they will be charged (so that I can do discounts) – in fairness this is only sometimes missing
  5. That understands the concept of VAT and can just handle it for me. In the EU, VAT now in some cases has to be declared in the customer’s country. This is the type of faff a Stripe, Paypal et al should handle for me.
  6. Deal with 3D Secure automagically.
  7. Deal with PSD2/SCA automagically

Reality

Dodgy simile

Proper guitar amplifiers have a spectrum of volume*. Your exact volume knob indicators may vary, but the segments are universal.

Volume 1 – 4

Practically silent.

Volume 4 – 4.8

Audible.

Volume 4.9

Decent volume, speakers are operating at reasonable dynamics, you can play. It’s just a bit quiet.

Volume 5.0 – 11.0

Massive noise complaints, police arrive.

Cards

From what I can tell, payment gateways operate similarly.

Level 1

You just need a button, and money might appear on your account. Never you mind who paid you for what.

Level 2

You can get to know who paid for what, but you’re SOL on VAT and have to do discounts manually like some schmuck. And webhooks. MOAR webhooks FTW.

Level 3

First you must create the Universe, then you must do 3DSecure manually and do three API calls to just begin to set up the first thing that might eventually become a card transaction.

Resolution

There is none that I can see. Am open to suggestions.

* I am aware attenuators solve this problem, but play along please.

Logging

I have had the misfortune of delving into logging a lot lately. To save time for next time I will write down the findings here.

My goals are simple. A couple of sites and APIs log into the same log aggregator – could be Loggly, Seq or Graylog, for instance. Given that I supply a correlation ID, I want to be able to tag all log entries related to one user as the request travels through the system. This isn’t even the bare-minimum Charity Majors-style event logging; this is just glorified text, but searchable with fields.

As of today, I want to be clear that for .NET, Serilog is best. Log4net has been out of the running for a long time, and NLog tried but cannot explain how to do structured logging, so it will have to be excused. Serilog has a more pleasant interface, and although I have struggled in the past to get the log context to enrich properly – I had to resort to the Microsoft log abstraction combined with Serilog.AspNetCore to succeed – and had problems getting the Loggly sink working at all, since the docs skipped the need for the loggly-csharp nuget package, it keeps winning, on old .NET Framework as well as .NET Core.

Setting up the Correlation ID has two parts. The first part is a piece of middleware in the request pipeline that wraps the call to the next stage in the pipeline in a using() statement. Here you extract the correlation ID from the caller or supply a suitable unique default for this call.
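A sketch of that middleware, assuming Serilog with Enrich.FromLogContext() configured (the header name and the Guid default are my own choices):

using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Serilog.Context;

public class CorrelationIdMiddleware
{
    private readonly RequestDelegate next;

    public CorrelationIdMiddleware(RequestDelegate next) => this.next = next;

    public async Task InvokeAsync(HttpContext context)
    {
        // Take the caller's correlation id, or make one up for this request.
        var correlationId = context.Request.Headers["X-Correlation-Id"].ToString();
        if (string.IsNullOrEmpty(correlationId))
            correlationId = Guid.NewGuid().ToString();

        context.Items["CorrelationId"] = correlationId;

        // Every log event written inside this using() gets the CorrelationId property.
        using (LogContext.PushProperty("CorrelationId", correlationId))
        {
            await next(context);
        }
    }
}

Registered in the pipeline with app.UseMiddleware<CorrelationIdMiddleware>(), early enough that everything downstream logs inside the pushed property.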

Then you create a message handler for setting a correlation ID on the outgoing HttpClient call. You can use the IHttpContextAccessor to get the incoming CorrelationId or the same default as earlier and map the message handler to any HttpClients you have defined in the projects.
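And a sketch of the outgoing side – a DelegatingHandler that copies the id onto every HttpClient call (again, the header name and client name are my own choices):

using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

public class CorrelationIdHandler : DelegatingHandler
{
    private readonly IHttpContextAccessor accessor;

    public CorrelationIdHandler(IHttpContextAccessor accessor) => this.accessor = accessor;

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Reuse the incoming correlation id if there is one, otherwise invent one.
        var correlationId = accessor.HttpContext?.Items["CorrelationId"] as string
                            ?? Guid.NewGuid().ToString();
        request.Headers.Add("X-Correlation-Id", correlationId);
        return base.SendAsync(request, cancellationToken);
    }
}

// Wired up per client in Startup.ConfigureServices, for example:
// services.AddTransient<CorrelationIdHandler>();
// services.AddHttpClient<SomeApiClient>().AddHttpMessageHandler<CorrelationIdHandler>();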

Spite is the mother of invention

Premise

This is a tale about a blog on WordPress.com that had a loyal readership and regular, high quality content (so yeah, not writing about this blog). The owner wanted to use the odd plugin and advanced theme, and I was always bothered that WordPress was living off of, well I exaggerate wildly now, this person’s words by putting ads everywhere in addition to the massive annual fee.

Liberation

So with the lure of freedom on yonder hills, we moved the blog off of WordPress.com onto a Windows VM on Azure (yeah, well… yeah…). The domain was hosted on DNSimple, so the logistics of pointing the domain to Azure instead of WordPress and setting up verification TXT records and such were a doddle.

A hosted MySQL instance on Azure was easy enough, but the WordPress.com theme we had been using was not available on WordPress.org, so we had to pick another one. Sadly we went with Customizr, which really means vendor lock-in, as you do a bunch of customisations – hence the name – that are all out the window once you change themes.

Of course, there is no option but to run HTTPS today, and trying to pinch pennies we weren’t going to buy an EV cert from one of the remaining dodgy CAs out there, so instead we went with Let’s Encrypt using a tutorial posted by Scott Hanselman.

Selling out – but is anybody buying?

To make the big bucks we hooked the site up to Google Analytics and ditto AdSense, and there were plugins to really automate that stuff. Yoast SEO beat out MonsterInsights on features for the analytics and integrates with both Search Console and Analytics. The killer feature for Yoast SEO is the customisable canonical URL, which is useful if you reprint blog posts from another site and want to beg Google for mercy for the crime of duplicate content.

The actual ads, how do they work? Well, by cunningly just clicking like an insane person (which really is the best way to learn), I managed to understand the concept of Auto Ads. This again is abstracted away by a plugin, in our case Advanced Ads. As the site owner didn’t want ads on all pages, we had to hack it by creating a plain text-and-code ad with the Auto Ads code from Google pasted in there, and then the Advanced Ads thing deciding which pages to actually serve the ad code on. The downside is a persistent nagging that you ‘shouldn’t display visible ads in headers’, but I guess that’s fine. They are just script tags, so there is nothing visible there.

Also, all the cool kids enter the Amazon Affiliate program, so we did that. They do have a minimum number of referrals you have to make, as they don’t want to deal with tiny unprofitable sites, so I suspect we shall be unceremoniously booted out fairly soon, but the concept of having widgets where you choose your favourite books related to the subject of your blog and maybe, in the long term, share some revenue if people take you up on your recommendations seems fair. Shame that the widgets themselves are so immensely, horribly broken and difficult to use. Allegedly, they are supposed to update when you make changes on the affiliate program site, but they really don’t. I don’t get paid by Amazon so I shan’t debug their system, but it can really only be that the command that goes back to save settings isn’t picked up, or that they are unable to bust the cache and have old widgets served – but I strongly suspect it is the actual save that is broken, since the widget loses the data already in the wizard before you even reach the last page.

AMPed up

After a few hours I noticed that all the permalinks from the old site were broken on the new one, so I checked the Permalinks tab and it turned out there was a custom setting that I just set to default, which made things work, and there was much rejoicing. No audit log here so I can’t check, but if I made that change it must have been unintentional. My favourite hypothesis is that the otherwise impressive WordPress XML-based import somehow failed to bring over the settings correctly.

As I rarely venture out into the front end I had not quite grasped what AMP is. I realised I was getting another load of 404s – this time for URLs ending in /amp. I did a bit of googling and realised I should probably get yet another plugin to handle this. Like with most WordPress plugins there are varying degrees of ambition and usually they want you to spend $200 in extras to get what you need, but although I brought the site off WordPress.com to deny them ad revenue for the site in question, I was under no illusion that I would be able to produce any such revenue for the owner, as whatever $3 would be produced would definitely be eclipsed by the hosting cost.

By going with the default WordPress AMP plugin you can’t do ads, but it works – ish; by using the major competitor you get a functional site, but a completely different look compared to the non-AMP site, and we didn’t want that after all the effort we had already put in.

After reading some more, I realised that everybody was going off AMP anyway, for varying reasons, but that was all the peer pressure I needed, so I broke out the Azure debug console and edited web.config to put in a URL redirect from AMP URL to a normal one.

This was incredibly frustrating, as at first I forgot that .NET regexes are different from normal regexes, and you also have to not be stupid and use the correct match in the redirect expression ({R:0} is the whole source data, while {R:1} is the first match, which is what I needed).

<staticContent>
  <!-- teach IIS to serve web fonts -->
  <remove fileExtension=".woff2" />
  <mimeMap fileExtension=".woff2" mimeType="font/woff2" />
</staticContent>
<rewrite>
  <rules>
    <!-- redirect /amp URLs back to the normal page -->
    <rule name="Disable AMP" stopProcessing="true">
      <match url="^(.*)amp\/?\r?$" />
      <action type="Redirect"
              url="https://<awesomesite>.com/{R:1}"
              redirectType="Found" />
    </rule>
    <!-- redirect www.<awesomesite>.com to the naked domain -->
    <rule name="Redirect to naked" stopProcessing="true">
      <match url="(.*)" />
      <conditions>
        <add input="{HTTP_HOST}"
             pattern="www.<awesomesite>.com" />
      </conditions>
      <action type="Redirect"
              url="https://<awesomesite>.com/{R:0}" />
    </rule>
    <!-- standard WordPress front controller: anything that is not a real file
         or directory is handled by index.php -->
    <rule name="WordPress: https://<awesomesite>.com"
          patternSyntax="Wildcard">
      <match url="*" />
      <conditions>
        <add input="{REQUEST_FILENAME}"
             matchType="IsFile" negate="true" />
        <add input="{REQUEST_FILENAME}"
             matchType="IsDirectory" negate="true" />
      </conditions>
      <action type="Rewrite" url="index.php" />
    </rule>
  </rules>
</rewrite>

So there are a couple of things here – first a MIME type correction to make IIS serve web fonts, then a redirect for AMP URLs, then a redirect from http://www.awesomesite.com to awesomesite.com for prettiness, and also to canonicalise it to avoid duplicate records in the offices of Google, which they do not like. WordPress itself will force HTTPS if necessary, so all we need to do in this config file is to curb the use of www.

Summary

The actions we took to move the blog were the following:

  1. Set up the new site
    1. Create site
    2. Create blob storage
    3. Create redis cache (I did this later, but you might as well)
    4. Set up a database
  2. Export existing data from old blog
  3. Import data into new system
  4. Choose a theme
  5. Verify that old google links work on the new site (I didn’t do this fast enough)
    6. Verify that any way you try and call the site is redirected to a canonical representation. Use a hosts file if you haven’t redirected the DNS yet, which with hindsight is the way I should have done it.
  7. Move the DNS to point to the new site
  8. Add the LetsEncrypt support to the site by following the guide. No more certificate errors
  9. Install plugins for analytics and ads.
  10. Create a Google account
    1. Register with Google Search Console
    2. Register with Bing search console (for those two or three people that don’t use Google).
    3. Register with Google Analytics
    4. Register with Google AdSense

Conclusion

So this was very easy and horribly frustrating at once. DNSimple and provisioning resources were a doddle. Following the internet guide to set up Let’s Encrypt and HTTPS was super straightforward, but then WordPress plug-in management, PHP and Amazon widgets were shit shows, to be honest. I mean, I realise Amazon has a complex architecture and their systems are never 100% up or 100% down and so on, but a save button being completely broken doesn’t feel even slightly “up” from the point of view of the end user.

PHP is garbage and brittle and you are hard-pressed to build anything viable on top of it (but obviously some have succeeded). These plugin smiths aren’t Facebook though. They would correctly interject that I am running WordPress on the least suitable platform imaginable. That is true (it has to do with how the Azure VM instances are set up, with the fact that they run on Windows and, most importantly, NTFS, which has performance characteristics that are completely unsuitable for Unix-style applications and favour a small number of large files where EXT4 favours large numbers of small files), but if the powers that be really consider Windows and NTFS to be such tremendous deal-breakers, then they should simply not allow Microsoft to host WordPress on Windows at all. As it stands, it does WP no favours with one-minute turnarounds to save settings for a plugin and similar. Then again, I also live in the UK, which notoriously has an Internet infrastructure dating back to the Victorian era, so it’s hard to tell what the worst culprit actually is – but the sidecar web app that hosts the debug console for the blog is a lot snappier than WordPress, and that is hosted in the same IIS installation as the WordPress site, although not in the same app pool.

Structural Equality – or is it?

I was recently presented with a conundrum. We had modelled data constrained to be valid for the domain as a record type. Sadly this record type contained a reference type, so built-in structural equality broke down, as the reference type was never equal in the way we thought would make sense.

This gave me the opportunity to learn how you override the implementation of Equals and GetHashCode in F# which I was previously unfamiliar with.

This is the finished implementation of the record type, or one like it, rather:

[<CustomEquality>]
[<CustomComparison>]
type Structure =
    {
        Name: StructureName
        Status: StructureStatus
        Format: Regex
    }
    with
        // Generic comparison: compare the fields as a tuple, using the Regex's
        // pattern text so the reference type does not break things.
        interface IComparable<Structure> with
            member this.CompareTo { Name = name; Status = status; Format = format } =
                compare (this.Name, this.Status, this.Format.ToString()) (name, status, format.ToString())
        // Non-generic comparison: pattern match on the type and delegate to the generic one.
        interface IComparable with
            member this.CompareTo obj =
                match obj with
                | null -> 1
                | :? Structure as other -> (this :> IComparable<_>).CompareTo other
                | _ -> invalidArg "obj" "not a Structure"
        override this.Equals (o: obj) =
            match o with
            | :? Structure as os ->
                (this.Name, this.Status, this.Format.ToString()) =
                    (os.Name, os.Status, os.Format.ToString())
            | _ -> false
        override this.GetHashCode() =
            (this.Name, this.Status, this.Format.ToString()).GetHashCode()

So yeah, use of pattern matching to determine data types in the non-generic functions and extensive use of the built-in structural equality of tuples.

Very nice. With thanks to TeaDrivenDev and Isaac Abraham on Twitter (and this StackOverflow response)