Monday, November 09, 2020

Elasticsearch - Backup and Restore using Snapshots

The data in Elasticsearch can be backed up using snapshots, and those snapshots can be restored to a different Elasticsearch cluster.

To back up Elasticsearch data on the source cluster, we first need to create a repository (in Elasticsearch) with the target location and type.

The repository type specifies where the backup files are stored. It can be a shared file system, Microsoft Azure, Amazon S3, Google Cloud Storage or Hadoop HDFS.

Complete details on creating a snapshot repository can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshots-register-repository.html


Creating a File Share Repository:

put http://10.21.32.43:9200/_snapshot/mybackup
{
  "type": "fs",
  "settings": {
    "location": "/data/elasticsearch/backup"
  }
}
The above request creates a repository of type shared file system ("fs"), and the backup files would be copied to /data/elasticsearch/backup
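
Note that for an "fs" repository, the same path must also be registered in the path.repo setting of elasticsearch.yml on every node of the cluster; otherwise the repository registration fails. A minimal sketch, using the path from the example above:

path.repo: ["/data/elasticsearch/backup"]

You can also ask Elasticsearch to verify that all nodes can access the repository:

post http://10.21.32.43:9200/_snapshot/mybackup/_verify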


Creating a Repository in Azure:

Complete details of Azure Repository can be found at https://www.elastic.co/guide/en/elasticsearch/plugins/current/repository-azure.html

Before you create a repository in Azure (or any other provider), check your access to the Azure storage account from your local machine and from all the Elasticsearch servers. Administrators can restrict access to cloud storage by IP address, and when such an IP restriction is in place in Azure, there is no way to find that out from the error messages.

We have to install a plugin for Azure backup/restore. The command for installing the Azure plugin is

bin/elasticsearch-plugin install repository-azure

On Linux, you may have to use sudo (if not running from the root account). The elasticsearch-plugin binary may be found at /usr/share/elasticsearch/bin

After installing the Azure plugin, we need to add the storage account name and the secret key to the Elasticsearch keystore so that Elasticsearch can connect to Azure.

echo This/is/a/key/for/Azure/Storage== | elasticsearch-keystore add --stdin azure.client.default.key

echo AzureStorageAccountName | elasticsearch-keystore add --stdin azure.client.default.account

On Linux, you may have to use sudo.
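
Depending on the Elasticsearch version, settings added to the keystore may be picked up only after a node restart. Versions 6.4 and later can reload these secure settings without a restart:

post http://10.21.32.43:9200/_nodes/reload_secure_settings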

Once Elasticsearch is configured to use Azure, we can create a repository.

put http://10.21.32.43:9200/_snapshot/mybackup

{
  "type": "azure",
  "settings": {
    "client": "default",
    "container": "test",
    "base_path": "testpath"
  }
}

This would create a repository, and the snapshot files would be created in testpath in the test container of the storage account.


Taking a Snapshot:

The below request starts the process of creating a snapshot named snapshot1. With wait_for_completion=false, the request returns immediately and the snapshot continues in the background.

put http://10.21.32.43:9200/_snapshot/mybackup/snapshot1?wait_for_completion=false
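
By default, the snapshot contains all open indices and the cluster state. If you want to back up only some indices, you can send a body with the same request; a minimal sketch (index1 and index2 are placeholder names):

put http://10.21.32.43:9200/_snapshot/mybackup/snapshot1?wait_for_completion=false
{
  "indices": "index1,index2",
  "ignore_unavailable": true,
  "include_global_state": false
}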

With the below request, you can see the status of the snapshots. For any snapshot in progress, it shows details such as how many indices are completed, in progress or not yet started. If there are no snapshots in progress, it returns an empty array.

get http://10.21.32.43:9200/_snapshot/_status
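
To check the status of one particular snapshot (including a completed one), name it in the request:

get http://10.21.32.43:9200/_snapshot/mybackup/snapshot1/_status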


Incremental Snapshots:

When you take multiple snapshots in a repository, each snapshot is complete for all practical purposes: any snapshot can be restored or deleted independently of any other snapshot.

But internally, the snapshots are incremental, i.e., when you take another snapshot in the same repository, it creates files only for the changes that happened since the last snapshot. The metadata of the snapshot is created appropriately so that all the data required for that snapshot can be retrieved.
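
Because of this, deleting a snapshot removes only the files that are unique to it; files shared with other snapshots are kept. For example:

delete http://10.21.32.43:9200/_snapshot/mybackup/snapshot1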


Creating a Read-Only Repository (On the Destination):

On the destination Elasticsearch cluster, we need to create a read-only repository pointing to the same location (either the file share, Azure or any other cloud provider). Marking the repository as read-only is the only difference compared to creating the repository on the source cluster.

put http://10.89.78.67:9200/_snapshot/mybackup

{
  "type": "azure",
  "settings": {
    "client": "default",
    "container": "test",
    "base_path": "testpath",
    "readonly": true
  }
}


Restoring the Snapshots:

The below request restores the snapshot on the destination cluster.

post http://10.89.78.67:9200/_snapshot/mybackup/snapshot1/_restore
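
By default, all indices in the snapshot are restored. If you want to restore only some of them, or restore them under different names, you can send a body with the request; a minimal sketch (the index name and rename pattern are placeholders):

post http://10.89.78.67:9200/_snapshot/mybackup/snapshot1/_restore
{
  "indices": "index1",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1"
}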


Restore Status:

There is no specific request to check the status of a restore. However, when the restore starts, the cluster goes to the yellow state. Once it returns to the green state, we can conclude that the restore is completed.

The below request gives the status of the cluster.

get http://10.89.78.67:9200/_cluster/health
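
If you are scripting the restore, the health API can also block until the cluster reaches a given state; the timeout below is an arbitrary example:

get http://10.89.78.67:9200/_cluster/health?wait_for_status=green&timeout=120s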


Restoring Incremental Snapshots:

When you restore incremental snapshots, you would get an error, since the destination cluster already has an index with the same name as an index that the restore is trying to create.

To restore a snapshot from the second time onwards, you need to close the existing indices so that the restore can update them.

Request to close all the indices

post http://10.89.78.67:9200/_all/_close

Request to close one index

post http://10.89.78.67:9200/indexName/_close


Restore after Closing the Indices:

During the restore, Elasticsearch opens the indices that need to be updated. If an index is not present in the snapshot, it is not opened and continues to be in the closed state.

If you are using the destination cluster only as a backup of another cluster and you see an index in the closed state after a restore, it means that index was deleted from the source cluster.
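
A quick way to list the indices and their states (the h parameter limits the output to the index name and its status):

get http://10.89.78.67:9200/_cat/indices?v&h=index,status&s=index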


Handling Aliases:

If you have aliases in the source cluster, and the underlying index keeps changing while the old index gets deleted, then you need special handling on the destination cluster.

For example, in the source cluster, you create one index every day. The index names may be something like IN20201105, IN20201106, etc. You have an alias named INDate, which points to that day's index. You keep only the active index and remove all the old indices.

In this case, when you take snapshots every day, the set of indices is different each day. The index that was present yesterday won't be present today.

When you restore the snapshots every day on the destination cluster, the deleted indices would be in the closed state.

When the indices are restored, the alias is restored with the relationship to the new index, but the relationship to the old index is not deleted. The alias ends up pointing to both: the old index (deleted in the source cluster and closed in the destination cluster) as well as the new index. Since one index behind the alias is in the closed state, you get an error when you try to query the alias.

The simplest option is to delete all the indices that are in the closed state after a restore.

Request for deleting an index

delete http://10.89.78.67:9200/indexName/
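
To check which indices an alias is currently pointing to (INDate is the example alias from above):

get http://10.89.78.67:9200/_cat/aliases/INDate?v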


Automated Snapshot and Restore Every Day:

The snapshot can be taken without any noticeable impact on the source cluster. However, when you restore, the destination cluster would be unavailable for some time.

You can use any scripting language that you are comfortable with for the automation. I used WGET and VIM for the automated snapshot and restore.

There are 4 steps in the entire process of snapshot and restore. The first step runs on the source cluster and the next 3 steps run on the destination cluster. You have to schedule them in such a way that the first step is completed before the second step starts. A sample cron schedule for all four steps is shown after step 4.


1. Take Backup in the source cluster

Request in Linux

wget -d --method=put "http://10.21.32.43:9200/_snapshot/prodbackup/s`date +\%Y\%m\%d`?wait_for_completion=false" -O CreateSnapshotES.txt -o CreateSnapshotNetwork.txt

Request in Windows

Different versions (and/or localizations) of Windows display the date command's output in different formats. You need to check your date format and adjust the substring offsets below accordingly.

wget -d --method=put "http://10.21.32.43:9200/_snapshot/prodbackup/s%DATE:~-4%%date:~4,2%%date:~7,2%?wait_for_completion=false" -O CreateSnapshotES.txt -o CreateSnapshotNetwork.txt


2. Close the indices on the target cluster

Request to close all the indices

wget -d --method=post "http://10.89.78.67:9200/_all/_close" -O CloseDRIndicesES.txt -o CloseDRIndicesNetwork.txt


3. Restore the snapshot on the target cluster

Request to restore the snapshot on the target cluster. [The date part of the request should be the same as in request 1.]

wget -d --method=post "http://10.89.78.67:9200/_snapshot/prodbackup/s`date +\%Y\%m\%d`/_restore" -O DRRestoreES.txt -o DRRestoreNetwork.txt


4. Delete the closed indices

I used VIM and WGET for automating this.

Create a file vimclose.vim with the following content (change the IP address to your destination cluster). The script keeps only the lines of the _cat/indices output that contain "close", strips everything except the index name, and rewrites each remaining line into a wget delete request. Make sure that there is a newline at the end (after :wq).

:v/close/d
:%s/.*close *//
:%s/  *.*//
:%s/\(.*\)/wget -d --method=delete "http:\/\/10.89.78.67:9200\/\1" -O \1.json -o \1.txt/
:wq

Create a batch file or shell script with the following content.

For Windows.

wget "http://10.89.78.67:9200/_cat/indices?v&s=index" -O closedindices.bat
vim -s vimclose.vim closedindices.bat
closedindices.bat

For Linux

wget "http://10.89.78.67:9200/_cat/indices?v&s=index" -O closedindices.sh
vim -s vimclose.vim closedindices.sh
bash closedindices.sh
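
To tie the four steps together, a hedged crontab sketch is shown below. The times, script names and paths are assumptions; the scripts are simple wrappers around the wget commands shown above, and the gap between step 1 and steps 2-4 should be longer than your snapshot normally takes.

# On a host that can reach the source cluster: take the snapshot at 01:00
0 1 * * * /opt/scripts/create_snapshot.sh

# On a host that can reach the destination cluster: close, restore, clean up
0 3 * * * /opt/scripts/close_indices.sh
15 3 * * * /opt/scripts/restore_snapshot.sh
0 4 * * * /opt/scripts/delete_closed_indices.sh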
