MongoDB: Execute backup + incremental restore using the oplog

A few days ago I had a case where a secondary node of a replica set had been out of sync for days and could not recover, because the oplog entries it needed had already been rotated out on the other nodes. However, the customer had a backup of the oplog and wanted to replay it, since a full rebuild would take a long time on a database that was terabytes in size. So how do you replay the oplog up to a point where the secondary can rejoin the replica set? In fact, it is not a complex procedure. Below is a simulation of the real case:

1-) I have three MongoDB instances running (one primary, two secondaries). The primary listens on port 37017.
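To confirm which member is the primary, something like the sketch below can be used. It assumes the mongosh shell is installed (older installations ship the legacy mongo shell instead) and that the instances listen on localhost. The function name list_members is only illustrative.

```shell
# Print every replica set member together with its current state.
# $1 = port of any reachable member (37017 is the primary in this post).
list_members() {
  mongosh --quiet --port "$1" --eval '
    rs.status().members.forEach(function (m) {
      print(m.name + " " + m.stateStr);
    });'
}

# Usage: list_members 37017
```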

2-) I started a workload on the primary with the YCSB tool to simulate an application.
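A workload of this kind could be driven roughly as follows. This is a sketch, not the exact invocation used here: it assumes it is run from the YCSB installation directory, uses the stock workloada file, and writes to a database named ycsb, all of which are my assumptions.

```shell
# Load the YCSB data set and then run workload A against the primary.
# $1 = port of the primary. Must be run from the YCSB install directory.
run_workload() {
  url="mongodb://localhost:$1/ycsb"   # database name "ycsb" is an assumption
  ./bin/ycsb load mongodb -s -P workloads/workloada -p mongodb.url="$url" &&
  ./bin/ycsb run mongodb -s -P workloads/workloada -p mongodb.url="$url"
}

# Usage: run_workload 37017
```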

3-) In the middle of the workload, I stopped one of the secondaries (the one on port 37018).
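One way to stop a member cleanly is to ask it to shut down from the shell. The sketch below assumes mongosh and a local instance without authentication. The shell may report a dropped connection as the server exits, which is expected.

```shell
# Cleanly shut down the mongod listening on the given port.
# $1 = port of the member to stop (37018 in this post).
stop_member() {
  mongosh --quiet --port "$1" --eval 'db.getSiblingDB("admin").shutdownServer()'
}

# Usage: stop_member 37018
```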

4-) I dumped the oplog from the primary (37017). It can be taken from the other secondary (37019) too.
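The oplog lives in the oplog.rs collection of the local database, so a dump of it can be sketched like this. The output directory is a placeholder of my choosing, and credentials would need to be added if authentication is enabled.

```shell
# Dump the full oplog collection from a replica set member.
# $1 = port of the source member, $2 = output directory (hypothetical path).
dump_oplog() {
  mongodump --port "$1" --db local --collection oplog.rs --out "$2"
}

# Usage: dump_oplog 37017 /tmp/oplog_dump
```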

This produces the BSON file oplog.rs.bson, written under the local subdirectory of the dump output directory.

5-) Now we will start the secondary that we want to restore on a different port and without the replSet option, to avoid undesired connections while we replay the oplog.
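Starting the node as a standalone could look like the sketch below. The data path, the temporary port, and the log file name are placeholders. If the node is normally started from a config file, the replication settings would have to be left out there instead.

```shell
# Start mongod on the stale node's data files as a standalone:
# no --replSet, and a port that the other members do not know about.
# $1 = dbpath of the stale secondary, $2 = temporary port.
start_standalone() {
  mongod --dbpath "$1" --port "$2" --logpath "$1/standalone.log" --fork
}

# Usage: start_standalone /data/rs2 47018
```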

6-) Run mongorestore with the oplog replay option (--oplogReplay).
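With --oplogReplay, mongorestore looks for a file named oplog.bson at the top level of the directory it is given, so the dumped oplog.rs.bson has to be copied under that name first. A minimal sketch, with directory names that are my own placeholders:

```shell
# Replay a dumped oplog against the standalone node.
# $1 = temporary port of the standalone node
# $2 = directory the oplog dump was written to
# $3 = scratch directory that will hold only oplog.bson
replay_oplog() {
  mkdir -p "$3" &&
  cp "$2/local/oplog.rs.bson" "$3/oplog.bson" &&
  mongorestore --port "$1" --oplogReplay "$3"
}

# Usage: replay_oplog 47018 /tmp/oplog_dump /tmp/oplog_replay
```

The scratch directory contains nothing but the oplog, so no collections are restored and only the oplog entries are applied.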

Note that in my case, in step 4, I performed a full dump of the oplog without specifying a starting timestamp (ts), which explains the errors at the beginning of the restore. Since oplog operations are idempotent, these errors do not affect the restore.
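To avoid replaying entries the node already has, the dump can be limited to entries at or after a given timestamp, for example the ts of the newest entry in the stale node's own local.oplog.rs. The sketch below assumes a mongodump version that takes Extended JSON v2 in --query (older versions used a different timestamp syntax). The timestamp values in the usage line are made up.

```shell
# Dump only oplog entries with ts >= Timestamp(t, i).
# $1 = source port, $2 = output directory,
# $3 = seconds part of the timestamp, $4 = increment part.
dump_oplog_since() {
  query=$(printf '{"ts":{"$gte":{"$timestamp":{"t":%s,"i":%s}}}}' "$3" "$4")
  mongodump --port "$1" --db local --collection oplog.rs \
    --query "$query" --out "$2"
}

# Usage (hypothetical timestamp): dump_oplog_since 37017 /tmp/oplog_dump 1700000000 1
```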

8-) Shut down the node and bring it back with its original configuration (original port and replSet option).
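Putting the node back could be sketched as follows. The replica set name rs0 and the paths are placeholders, and the --shutdown option of mongod is, as far as I know, only available on Linux. On other platforms the node can be stopped from the shell instead.

```shell
# Stop the standalone instance and restart it as a replica set member.
# $1 = dbpath, $2 = original port, $3 = replica set name.
rejoin_member() {
  mongod --dbpath "$1" --shutdown &&
  mongod --dbpath "$1" --port "$2" --replSet "$3" \
    --logpath "$1/mongod.log" --fork
}

# Usage: rejoin_member /data/rs2 37018 rs0
```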

9-) Check the replica set status.
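Besides looking at the member states, it helps to see how far behind the primary each secondary is. A sketch using the rs.printSecondaryReplicationInfo() helper, assuming mongosh (older legacy shells call it rs.printSlaveReplicationInfo()):

```shell
# Show how far each secondary is behind the primary.
# $1 = port of any reachable member.
show_lag() {
  mongosh --quiet --port "$1" --eval 'rs.printSecondaryReplicationInfo()'
}

# Usage: show_lag 37017
```

Once the restored node reports SECONDARY and its lag shrinks toward zero, it has caught up from the live oplog.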

So these are the steps required to achieve the objective. Note that the oplog on the healthy nodes must be large enough that its time window still covers the point where the replay ends; otherwise the node cannot catch up after it rejoins the replica set.
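The current oplog size and the time range it covers can be checked on a healthy member before starting. A minimal sketch, again assuming mongosh:

```shell
# Print the configured oplog size and the time window it currently covers.
# $1 = port of a healthy member (the primary, 37017, in this post).
show_oplog_window() {
  mongosh --quiet --port "$1" --eval 'rs.printReplicationInfo()'
}

# Usage: show_oplog_window 37017
```

If the reported window is shorter than the time needed to dump, replay, and restart the node, the oplog should be enlarged first.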

Hope you enjoy and find it useful!
