It was business as usual on the 18th of December 2018. Midnight rolled in and we received an alarming email from one of our well-wishers (here’s a shout-out to Adam Connell of Blogging Wizard!) alerting us to the dreaded ‘404’ error.
Our homepage was down and inaccessible to anyone visiting our website!
A ‘404 error’ on a website is the worst nightmare for any online business. Especially one whose value proposition is to ensure that its customers never encounter a broken website!
Our post today details the events that unfolded after that one timely email, and how we went about analyzing and finally fixing the issue. Due to the nature of this story and the events leading up to it, some parts of the narrative will need to go back and forth in time. Do bear with us.
Let’s begin at the beginning
The chilling email that arrived well past midnight (our HQ is in India) was noticed by two members of the BlogVault team – Akshat (Founder) and Aman (Customer Support) – who were up helping customers. They were tempted to dismiss it as a temporary glitch affecting only that one visitor. But on checking, they realized that the page was indeed down and had been down for a few hours!
Their timely intervention salvaged the homepage promptly, which otherwise would have been inaccessible until the technical team arrived the next morning.
Had the entire website gone down, we would’ve discovered it right away with the help of BlogVault’s Uptime Monitoring. But since the rest of the site seemed to be up and running and only the homepage had gone down, the issue slipped through, and the reason behind it remained a mystery.
Whatever the reason, there was no denying that we were losing visitors, potential customers, and revenue! So our topmost priority was to get the homepage up and running pronto.
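To illustrate how a single broken page can slip past a site-level check, here is a minimal sketch of a page-level check in Python. It is not how BlogVault’s Uptime Monitoring works. The function names (`fetch_status`, `find_broken_pages`) and the URLs are hypothetical and used only for illustration. The idea is simply to request each important page as an anonymous visitor (no login cookies) and flag anything that does not return 200.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status an anonymous visitor gets (no cookies are sent)."""
    request = Request(url, headers={"User-Agent": "page-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except OSError:
        # Connection failures and timeouts: the server could not be reached.
        return 0


def find_broken_pages(
    urls: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, int]:
    """Map every URL that did not return 200 to the status it returned."""
    statuses = {url: fetch(url) for url in urls}
    return {url: status for url, status in statuses.items() if status != 200}


# Offline demonstration with a stand-in fetcher (hypothetical statuses):
# the blog answers normally while the homepage returns 404.
fake_statuses = {"https://example.com/": 404, "https://example.com/blog/": 200}
print(find_broken_pages(fake_statuses, fetch=fake_statuses.get))
```

Passing the fetcher in as a parameter keeps the sketch testable without a network connection. In real use you would call `find_broken_pages` with your own list of key URLs and the default `fetch_status`.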
Getting the homepage up and running
Given the urgency of the matter, Aman and Akshat (A&A) decided that the best course of action would be to get hold of a working backup of the website, test it and restore it. Thanks to BlogVault’s smart sync technology, every little change on the website is tracked and backed up. So all we needed to do was test one of the latest working backups before restoring it on the live site. Taking advantage of BlogVault’s Test Restore feature (we shall be exploring the ingenuity of this feature a little later in the post), they tested one of the latest backups. And after ensuring that the homepage and the rest of the website were working properly, they restored it.
Our home page was accessible, and we were back in business!
A twist in the tale
The next morning we were in for a nasty surprise! While going through the website to ensure nothing else was broken, we realized that the home page was showing outdated content!
To understand what had happened, let’s trace back the sequence of events preceding the chilling email. We had made modifications to the BlogVault website a few hours before receiving the warning email – changes A&A were unaware of. So, when they restored the backup the previous night, they could not verify whether the version was the correct one.
Cut back to the present and we were facing a home page that reflected none of the modifications made the previous day – translating to 12-14 hours of lost work!
Now, we had two options:
1. Accept that a mistake had happened, redo the work and move on.
2. Demystify the puzzle and figure out a way to recover the work.
As Yoda would say, go down without a fight we shall not!
A team was constituted post-haste and we decided to tackle the issue with a two-pronged approach: (1) recover the missing content, and (2) identify the root cause of the incident.
The road to recovery (time spent: 20 mins)
WordPress, as we know, maintains multiple revisions of a page. So our first attempt at recovering the missing content involved examining these revisions. That didn’t work out too well. But the lesson we learned was invaluable. We realized that WordPress revisions are not a dependable backup option!
What then? Well, our BlogVault website is configured to back up changes in real time, which means that every little change is tracked and backed up. So all we really needed to do now was find the right backup that reflected all the updates we had made on the site, and restore it.
BlogVault’s Test restore to the rescue
With any other product in this segment, restoring the right backup version would have been another long and arduous process. One that would involve scouring through each possible backup version, restoring it, and checking on the live site to see if it is the one. And let’s not forget the huge risk that poses to the live site. But with BlogVault’s Test Restore functionality, our workload was reduced drastically (a few minutes was all it took). The ingenuity and simplicity of this feature were revealed to us first-hand. Here is a brief look at how Test Restore can simplify an otherwise laborious process.
What’s BlogVault’s Test Restore?
This particular feature allows you to test backups in a safe environment instead of on the live site, which makes it much easier to choose the most appropriate backup version. After finding the backup version you are satisfied with, you can restore that specific version to the live site with BlogVault’s auto-restore feature.
We started restoring backups in the test environment one by one and quickly zeroed in on the backup version with all the missing content. However, the 404 error that caused all this confusion still could not be reproduced in any of these backups. We were in a fix!
The return of Error 404
During our brainstorming session with the team, Shivam, our Lead Dev (God bless him!), suggested we check these test restores in a different environment, one that had not been tried before. We agreed it was totally worth a shot. So we opened the test restore on another computer in incognito mode.
No one would ever have been so happy to see a 404 error on their page! So now we had a backup version that not only had our modified content but also replicated the error message.
The next step for us was to find out the change that led to this issue.
Light at the end of the tunnel (time spent: 5 mins)
To get to the root of the problem, we had to retrace our steps to when the changes were made. Now, typically whenever a change needs to be made – be it adding a blog post, or installing a theme – it is first introduced in a staging environment. For the uninitiated, let me give a brief intro to BlogVault’s staging feature.
Staging is the process of creating a copy of your website on the BlogVault server so that you can test changes to your website. For instance, say you want to update a plugin that is known to crash websites. With BlogVault’s staging, you can update the plugin in a test environment and check whether it is working. If it is working fine, then you can apply the update to the live site.
So as I was saying, whenever a change needs to be made on our site, we first introduce it in a safe staging environment. This staging environment is a clone of the live site. And thus, we are able to correctly assess the effects of these changes without affecting the live site. Had this change been introduced in a staging environment, like all changes normally are, we could have avoided all this drama. Our next step was to find the what, when and where of this change.
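Since this was a change made directly on the website, we realized that we could precisely pinpoint it using WP Security Audit Log. We scoured the detailed audit log and found the entry we were looking for. It was an activity record showing that one of our team members had accidentally changed the visibility of the homepage and set it to Private! This was our EUREKA moment. It also explained why the error had been so hard to reproduce: WordPress shows a Private page only to logged-in users who are allowed to read it, such as editors and administrators, while every other visitor gets a 404. That is why the page looked fine to us until we tried it in incognito mode.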
The devil is in the details… truly!
How could something seemingly so insignificant cause so much distress, right?
Well, finally we could see the light at the end of the tunnel. The only thing left to do now was to identify when the page was set to Private and then recover the backup made right before that. We promptly set out to find the backup we were looking for. We found it, tested it successfully, restored it, and made the page LIVE.
Phew! In less than 30 minutes, we’d cracked the puzzle and restored our lost work!
We realized (a little later, once the high had settled) that we’d saved more than 11 hours’ worth of effort from going to waste (whoa!).
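The “backup made right before that” step is a simple lookup once you have the time of the change. Here is a minimal Python sketch, assuming the backup timestamps are available as a list sorted from oldest to newest. The function name `last_backup_before` and all the timestamps below are hypothetical; they are not taken from our audit log or from BlogVault.

```python
from bisect import bisect_left
from datetime import datetime
from typing import Optional, Sequence


def last_backup_before(
    backups: Sequence[datetime], change_time: datetime
) -> Optional[datetime]:
    """Return the newest backup taken strictly before change_time.

    Assumes `backups` is sorted from oldest to newest.
    Returns None if every backup was taken at or after the change.
    """
    # bisect_left finds the first position whose timestamp is >= change_time,
    # so the entry just before it is the last backup older than the change.
    index = bisect_left(backups, change_time)
    return backups[index - 1] if index > 0 else None


# Hypothetical timestamps, for illustration only.
backups = [
    datetime(2018, 12, 17, 9, 0),
    datetime(2018, 12, 17, 15, 0),
    datetime(2018, 12, 17, 21, 0),
]
set_to_private_at = datetime(2018, 12, 17, 18, 30)
print(last_backup_before(backups, set_to_private_at))
```

The chosen backup should still be checked in a test environment before it goes anywhere near the live site, exactly as we did.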
All thanks to the truly state-of-the-art features of BlogVault (a flagrant plug, we know, but it’s true)! Now if this isn’t what you call making life easier, then what is?
Familiarity breeds complacency
Thanks to this incident, we came to fully comprehend and appreciate the value of following a rigid practice of making changes, however insignificant, only in a staging environment. NEVER on a live site.
And since we are on the topic of positive outcomes, we also got to eat our own dog food after a very long time. Every one of us in the team got to experience BlogVault as an end user. And the experience was invaluable!
It reinforced our belief in the value we continue to add to the lives of our customers. And it drove home how crucial it is to have a reliable backup system with ingenious features like staging and test restore. These features literally save hours (even weeks in certain cases) of work.