“Familiarity breeds complacency” — That’s our story from last week. 😊
It was business as usual on 18th December 2018, until (wait for it) we received an email from one of our beloved customers (thank you, sir) alerting us to the dreaded “404” error on the homepage of our website.
Yes … you heard that right … our homepage was down and inaccessible to anyone visiting our website. The worst nightmare for any business!!!!! 😱
I’ll share detailed insights and how we fixed the page later in the post (keep reading!), but first, let’s look at how the events unfolded that night.
To give some context, the email had arrived well past midnight here in India (our HQ). We were staring at a 12-hour website blackout until the team arrived the next morning. Luckily for us, both Akshat (Founder) and Aman (Customer Success) were up helping customers and took notice of that email.
They were hoping that it was a temporary glitch for that specific customer. But you know what, the customer (as always) was right. 😊
The BlogVault homepage indeed was down and had been down for hours. We were not just losing new visitors and potential revenues but also their trust.
At that point, our #1 Priority was: (no guesses!) Get the homepage back up as soon as possible.
Before we tell you how we got our home page back up, let’s get a sense of the moving pieces of the website.
Now, there are multiple stakeholders involved (both internal and external) who are responsible for making changes to the website. We follow a stable process for making changes so that they don’t impact or break our live website. More on this later.
Aman and Akshat are not among the stakeholders who maintain the site or make changes to it.
This made things complicated on that fateful day. They were unaware of modifications made on the website earlier that day and hence were not in a position to identify and fix the issue.
Opting for a Quick Stop-Gap Solution
As soon as Aman and Akshat learned about the home page, they knew they had to get the home page up and running as quickly as possible. Or else we were staring at a significant loss (of not just revenues but reputation and trust).
They decided that the best course under the circumstances was to get hold of the latest backup of the website, test whether the home page worked in it, and then restore it.
They were relieved to find that the home page, and the website in general, worked well in that backup, and proceeded with restoring that version of the website.
The home page was back. Woohoo … We were back in business … What a relief!
But wait …
Like any good movie, this story comes with a surprise … Hold on to your seat folks!
But Wait … There’s a Twist in This Tale!
The next morning, I was going through the website to ensure nothing else was broken and found an unpleasant surprise. The home page was showing OLD content. Damn ….
It suddenly became my #1 Task for the day to figure out a solution for this. I picked up the phone and spoke to Aman.
When Aman and Akshat restored the backup the previous night, they were not aware of the changes and hence could not validate whether the version was the correct one. The backup they picked turned out to predate the latest changes.
Having figured out the problem, we could see this going two ways: either we accept what happened and redo the whole work (i.e. put in another 12+ hours), or we figure out a way to recover the changes.
We decided to go with the latter because there was no way we were going to let the team’s work go to waste.
Getting to the Root of the Problem
With no time to lose, we quickly put together a team who’d fulfil two goals:
- Recover the lost modifications, and
- Find out what really went wrong with the homepage
But before we begin, let me apprise you of the measures we take to avoid any mishaps on the website. This will help you understand the path we took.
Precautions We Take With the Website
Since a significant amount of thought and planning goes into each and every page of the website, especially the home page, instructions are to test the changes on a Staging Site before making them on the Live Site.
Staging is the process of creating a copy of your website on BlogVault’s servers where you can safely test changes. For instance, suppose you want to update a plugin installed on your site, but the plugin is known to crash websites. You can test the update in a staging environment before applying it to your live site.
First Off, Find What Went Wrong on the BlogVault Homepage!
Although we have a strict process in place to ensure nothing goes wrong on the site when someone is making modifications, at times it’s hard to ensure 100% compliance with it, especially when multiple stakeholders are involved.
(This is not an excuse, though, and we need to tighten the screws on that process.)
After speaking to Aman and Akshat, we were fairly confident that the page went down because of the changes made the previous day.
Thanks to our defined process of creating a staging site before making any changes, we were in a good position to narrow down the changes that caused the homepage blackout.
With this direction, we identified the changes that were made directly on the live site without first testing them on the staging environment.
We had our breakthrough in less than an hour and were confident of recovering lost content soon.
Up Next, How to Recover The Lost Data?
After putting our heads together for half an hour, we identified two possible ways to recover the lost data:
- Iterating through internal WordPress Revisions, and
- BlogVault Backups
Recovery Attempt 1: WordPress Revisions
WordPress maintains multiple revisions (or backups) of a page.
We went through the revisions and, unfortunately, could not find the lost data. This led us to conclude that WordPress revisions are not a dependable backup option.
A dead end!
Recovery Attempt 2: BlogVault Backups
The BlogVault website is configured to back up changes in real time, i.e. every little change is recorded and backed up.
This made us confident that our website modifications were present in one of the backups. We just needed to find the correct backup version.
Since restoring each backup individually is too time-consuming and can put our live site at risk, we decided to use one of our key features at BlogVault, called Test Restore.
It was time to try out the fruits of our labour!
What’s Test Restore?
This particular feature allows users to restore the desired backup in a Test Environment and not on the live site. It helps users make up their minds on which backup version to restore. After users find the backup version they are satisfied with, they can move that version from the Test Environment to the Live Site.
We started restoring backups in the test environment one by one. Although we could easily find our changes, we were still not able to replicate the issue in any of our backup versions, and we were in the dark about why.
We went back to the drawing board for more recovery ideas. During the discussion, one of our team members asked if we should check these test restores in an environment we had not tried before. And so we decided to run the “Test Restore” on another computer and in incognito mode.
As soon as we opened the Test Restore in incognito mode, we saw the 404 error from the previous night. Bazinga! We were finally able to replicate the error in one of the backups. Now we had a backup version that not only had our changes but also replicated the issue. First milestone unlocked. The next step for us was to find out the change that led to this issue.
Finding Out What Went Wrong With the BlogVault Homepage
We were so close to victory and now just had to figure out the specific change that led to this issue. We decided to take a different approach this time around: instead of going through all the individual changes, the team would look at the meta properties of the homepage to identify the issue. And that’s when someone noticed that the homepage was set to Private. This was our EUREKA moment! A moment of pure joy!
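If you remember, it was a customer who informed Akshat and Aman that the homepage was down. The homepage gave off a 404 error because it was accidentally set to “Private”. That would also explain why we only saw the error in incognito mode: WordPress shows private pages to logged-in users with the right permissions and returns a 404 to everyone else.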
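To make that behaviour concrete, here is a minimal sketch of how page visibility depends on who is asking. It is a simplified model, not WordPress code: the `Visitor` class and the `http_status` function are hypothetical names made up for illustration, while `read_private_pages` is the WordPress capability that editors and administrators have by default.

```python
from dataclasses import dataclass

# WordPress capability that allows reading private pages
# (editors and administrators have it by default).
READ_PRIVATE_PAGES = "read_private_pages"


@dataclass
class Visitor:
    """Hypothetical stand-in for whoever requests the page."""
    capabilities: frozenset = frozenset()


def http_status(post_status: str, visitor: Visitor) -> int:
    """Return the status code a visitor would get (simplified model)."""
    if post_status == "publish":
        return 200
    if post_status == "private" and READ_PRIVATE_PAGES in visitor.capabilities:
        return 200
    # Everyone else is treated as if the page did not exist.
    return 404


admin = Visitor(frozenset({READ_PRIVATE_PAGES}))  # logged-in team member
anonymous = Visitor()  # a customer, or a browser in incognito mode

print(http_status("private", admin))      # 200
print(http_status("private", anonymous))  # 404
print(http_status("publish", anonymous))  # 200
```

The takeaway is that a page can look perfectly fine to the people who maintain it while being invisible to customers, so checking from a logged-out browser is a cheap sanity test after any change.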
The only thing left for us now was to find out who had set the page to Private, and when. If we could do that, we could recover the backup made right before the page was set to “Private”.
Recovery Attempt 3: Bingo!
The saying “Where there’s a will, there’s a way!” perfectly describes what happened next.
After fiddling around with the page, we remembered having installed WP Security Audit Log on the website. We pulled out the changelog of the website and bingo! We found the exact time and the person who had set the homepage to “Private”.
We swiftly found the backup we were looking for, tested it successfully, restored it and made the page LIVE. Woohoo … We were back in business … And this time, for real.
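With a timestamp from the audit log in hand, choosing the right backup comes down to taking the newest one made strictly before that time. The sketch below shows one way to do that lookup. It is only an illustration: the function name `last_backup_before` is hypothetical, the timestamps are made up, and none of this reflects how BlogVault stores or selects backups internally.

```python
from bisect import bisect_left
from datetime import datetime
from typing import Optional, Sequence


def last_backup_before(backups: Sequence[datetime], event: datetime) -> Optional[datetime]:
    """Return the newest backup taken strictly before `event`, or None.

    `backups` must be sorted in ascending order.
    """
    # bisect_left gives the index of the first backup at or after the event,
    # so the entry just before it is the newest one older than the event.
    i = bisect_left(backups, event)
    return backups[i - 1] if i > 0 else None


# Made-up timestamps, purely for illustration.
backups = [
    datetime(2020, 1, 1, 9, 0),
    datetime(2020, 1, 1, 15, 30),
    datetime(2020, 1, 1, 21, 45),
]
set_private_at = datetime(2020, 1, 1, 18, 10)

print(last_backup_before(backups, set_private_at))  # 2020-01-01 15:30:00
```

A backup taken at exactly the same instant as the change is deliberately excluded, since it might already contain the unwanted setting.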
Remember when we said “familiarity breeds complacency”? Thanks to this incident, we recognized that the solid process we had in place (of staging before making any changes to the live site) was falling short and that we had to buckle up.
Another good thing that came out of this incident is that, after a really long time, you could say we ate our own dog food. The entire team, including new members, got a chance to see the value of our product from the customer’s vantage point. If anything, it reinforced the following beliefs:
- The importance of having a rock-solid, reliable backup system with staging and test restore features. This literally saves hours and days (or weeks in certain cases) of work.
- And also, the value BlogVault adds to the life of a web developer and agency.
After this incident, we are all supercharged with a strong belief that we can make a difference in the lives of all site owners, developers and agencies.