Fixing a few broken twigs in the nest

GovernorHub ran into a few stumbling blocks back in March. Users might have noticed that the system ran more slowly than usual or, worse, for a handful of people, that it fell over altogether (technical term!).

The GovernorHub team wanted to let you know what actually happened during those two weeks in March, why it happened and what they're doing about it.

Get ready for words like ‘heavy load’, ‘front end requests’ and ‘error clogging’ as we get the lowdown from GovernorHub lead developer, Kyle Selman.

Kyle, firstly can you tell us which of our customers were affected by these issues and for how long?

I’m afraid pretty much anyone who used GovernorHub at peak times during this two-week period was affected. Due to the nature of the problem, the issues ramped up with an influx of users (heavy load), which meant that problems occurred just after lunch and again at 6pm, when most governing board meetings take place.

How did it affect users?

Our users would have found that pages loaded very slowly - and what do you do when a page takes ages to load? You keep refreshing it, which adds even more load to the system. We ended up with a snowball effect: once someone was affected, the problem grew as people refreshed the page over and over again.
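For readers who like to see the snowball in numbers, here is a toy sketch in Python. It is not GovernorHub's code: the capacity, the arrival figures and the `refresh_factor` parameter are all made-up assumptions, chosen only to show how repeated refreshing can keep a backlog growing even after the rush has passed.

```python
def backlog_over_time(arrivals, capacity, refresh_factor):
    """Toy model of a refresh snowball (all numbers are hypothetical).

    arrivals:       new requests arriving in each time step
    capacity:       requests the system can serve per time step
    refresh_factor: how many requests each unserved request turns into
                    by the next step (1 = users wait patiently,
                    2 = every waiting user refreshes once)
    """
    backlog = 0
    history = []
    for new in arrivals:
        pending = backlog + new
        unserved = max(0, pending - capacity)
        # Each request still waiting is multiplied by impatient refreshing.
        backlog = unserved * refresh_factor
        history.append(backlog)
    return history


# A short rush slightly above capacity, then traffic back at capacity.
rush = [120, 120, 120, 100, 100]

patient = backlog_over_time(rush, capacity=100, refresh_factor=1)
impatient = backlog_over_time(rush, capacity=100, refresh_factor=2)

print(patient)    # [20, 40, 60, 60, 60]
print(impatient)  # [40, 120, 280, 560, 1120]
```

In this simplified model the patient backlog levels off once traffic returns to normal, while the refreshing one keeps doubling what is left over at every step.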

When you realised there were problems, what was your thinking at first?

It’s actually very tricky with this kind of slowdown. GovernorHub is now a very complicated system that’s scaling at large. When you see this kind of problem, you first start to look at the logging systems to see what kind of errors are coming back to users. What we saw was that the system was struggling to respond to requests for resources (for things like bringing up a noticeboard post) and there was some sort of overload in the system. When something comes out of the blue like this, it’s usually quite a complex problem to solve and there are often quite a few red herrings to catch you out along the way.

You say GovernorHub is quite complex and scaling at large, can you explain what’s going on in the background to our users?

Well, the way GovernorHub works is that we’ve essentially got lots of different services - let’s call them different computers. Each computer has a different job to do. One might be dealing with all of the processing that comes with governor training booking, for example. All of these computers need to be able to talk to each other. Sometimes you get bottlenecks - if you’re trying to talk to one computer and it’s busy doing some work already, it’s going to take longer for a front-end request to get its information back out of that computer (service).

The more people that start using that service, the more chance there is of it getting clogged up and slowing down, which then has a knock-on effect on other parts of the system.
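The bottleneck Kyle describes can be sketched with a very small queueing model. This is an illustration only, assuming a single service that handles one request at a time and takes a fixed `service_time` per request; the function name and the timings are hypothetical, not measurements from GovernorHub.

```python
def completion_times(arrivals, service_time):
    """Return when each request finishes at a one-at-a-time service.

    arrivals:     times (in seconds) at which requests reach the service
    service_time: seconds the service needs per request (assumed fixed)
    """
    free_at = 0.0  # time at which the service next becomes idle
    done = []
    for arrived in sorted(arrivals):
        start = max(arrived, free_at)  # wait if the service is still busy
        free_at = start + service_time
        done.append(free_at)
    return done


# Four requests spread out: each one is answered half a second after arriving.
print(completion_times([0, 1, 2, 3], service_time=0.5))  # [0.5, 1.5, 2.5, 3.5]

# The same four requests arriving together: the last one waits two seconds.
print(completion_times([0, 0, 0, 0], service_time=0.5))  # [0.5, 1.0, 1.5, 2.0]
```

The work done is identical in both cases; only the timing changes, which is why problems tend to show up at peak times rather than across the whole day.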

We’ve been gradually moving features over from older systems to newer ones and often adding new features. However, this means more requests. So where it might have been, say, 10 requests for a certain page load in the past, that’s risen to 15.
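To put rough numbers on that, here is a quick back-of-the-envelope calculation. The 10 and 15 requests per page come from Kyle's example; the figure of 1,000 page loads is an arbitrary assumption used purely for illustration.

```python
def total_requests(page_loads, requests_per_page):
    """Total requests generated by a given number of page loads."""
    return page_loads * requests_per_page


page_loads = 1000  # hypothetical number of page loads in a busy spell

before = total_requests(page_loads, 10)  # 10 requests per page load
after = total_requests(page_loads, 15)   # 15 requests per page load
increase = (after - before) / before

print(before, after)      # 10000 15000
print(f"{increase:.0%}")  # 50%
```

So even with exactly the same number of visitors, going from 10 to 15 requests per page means half as much load again on the services behind the page.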

The icing on the cake in March was having the busiest period we’ve ever had (1 in 2 governors in the country visited GovernorHub at some point during this period). More users than ever before will stress test a system further. Every system has an imaginary hill, as I like to describe it, where the system might fall down due to heavy load but it’s always unclear where that hill is until you reach it. In March, we hit our hill because of the rise in users and also the rise in requests.

You did eventually get a handle on it (and move that hill further out of reach). Can you explain what you did?

Google Cloud Platform graph illustrating the request issues

We use Google Cloud Platform for all of our logging and it’s got some really useful metrics about our services which gave us some good graphs to illustrate the issues. 

The grey peaks in the image (above) illustrate the number of requests we received and the red line of peaks illustrates the errors where things were getting bottlenecked. As you can see, we were gradually able to reduce the overall number of requests and also the number of errors to a completely manageable level.

The root cause was some old code in a variety of places that wasn’t doing what it should - it was adding unnecessary requests to the system, so we went through to explore each one and update anything that wasn’t working as it should.

Well done for resolving it. On a personal note, how much sleep did you get during this period? I know you were working through the night on some occasions.

Well I find it hard to sleep when there’s a problem until I’ve solved it. You end up completely messing up your sleep cycle, so that did happen to some extent as I tried to get to the bottom of the issues.

Are you sleeping like a baby now?

No. I never sleep like a baby. That’s just the nature of software engineering. Your mind probably works in a certain way and you can’t easily rest if something isn’t working. There is always work to do to improve the system.

Can GovernorHub users rest assured that this won’t happen again?

We’re still working to scale the system effectively. We’re always growing and there are always new things to add so I can’t make any absolute promises, but they can rest assured that we’re working really hard to develop the system to its full potential and we’ll do our best to keep that imaginary hill well out of sight.

Comments

  • Jean Reid 22 Jun 2023, 07:07 (11 months ago)

    Well done GovernorHub team especially Kyle for all the hard work you put into keeping GovernorHub working and always improving, great team!! i always get a speedy response and fix whenever i flag a problem or have job request

    Kyle i hope you get to switch off your software engineering brain at times and have a complete rest 😊

    Thanks Jean
