# nebula
m
Hi folks, I am trying to validate the Nebula snapshot operations on my cluster. As part of this I created 3 graph spaces (my_space_1, my_space_2 and my_space_3) and created a snapshot. Then I dropped the space my_space_3 and tried to restore the snapshot by copying/overwriting the data and wal folders from the corresponding snapshot directory into its parent directory (at the same level as checkpoints) and restarted the Nebula services. But I can't find the dropped space my_space_3 after the restart. Can someone please take a look and help me with this? Thanks.
(root@nebula) [(none)]> show spaces;
+--------------+
| Name         |
+--------------+
| "my_space_1" |
| "my_space_2" |
| "my_space_3" |
+--------------+
Got 3 rows (time spent 507/749 us)

Mon, 26 Sep 2022 10:59:24 UTC

(root@nebula) [(none)]> CREATE SNAPSHOT;
Execution succeeded (time spent 107299/107410 us)

Mon, 26 Sep 2022 10:59:29 UTC
(root@nebula) [(none)]> SHOW SNAPSHOTS;
+--------------------------------+---------+----------------------------------------+
| Name                           | Status  | Hosts                                  |
+--------------------------------+---------+----------------------------------------+
| "SNAPSHOT_2022_09_26_10_59_29" | "VALID" | "my_host_1:9779, my_host_2:9779" |
+--------------------------------+---------+----------------------------------------+
Got 1 rows (time spent 432/631 us)

(root@nebula) [(none)]> drop space my_space_3;
Execution succeeded (time spent 838/995 us)

Mon, 26 Sep 2022 11:05:40 UTC

(root@nebula) [(none)]> show spaces;
+--------------+
| Name         |
+--------------+
| "my_space_1" |
| "my_space_2" |

ubuntu@my_host:/usr/local/nebula/data/storage/nebula/1/checkpoints/SNAPSHOT_2022_09_26_10_59_29$ sudo cp -r data/ wal/ ../../
cp: 'data/OPTIONS-000007' and '../../data/OPTIONS-000007' are the same file
cp: 'data/000043.sst' and '../../data/000043.sst' are the same file
cp: 'data/000045.sst' and '../../data/000045.sst' are the same file
cp: 'data/000047.sst' and '../../data/000047.sst' are the same file
cp: 'wal/2/0000000000000030206.wal' and '../../wal/2/0000000000000030206.wal' are the same file
cp: 'wal/2/0000000000000030572.wal' and '../../wal/2/0000000000000030572.wal' are the same file
cp: 'wal/2/0000000000000000381.wal' and '../../wal/2/0000000000000000381.wal' are the same file
cp: 'wal/4/0000000000000030201.wal' and '../../wal/4/0000000000000030201.wal' are the same file

ubuntu@my_host:~$ sudo /usr/local/nebula/scripts/nebula.service restart all
(root@nebula) [(none)]> show spaces;
+--------------+
| Name         |
+--------------+
| "my_space_1" |
| "my_space_2" |
+--------------+
Got 2 rows (time spent 471/657 us)
Hi @wey, can you please take a look at this issue? Thanks.
w
Did you only do the snapshot restore for storage? The meta data snapshot still needs to be restored, too.
/usr/local/nebula/data/meta
m
Yes, I did the snapshot restoration for both meta and storage. I copied the snapshot's meta data into the actual meta data directory before restarting the services.
ubuntu@my_host_2:/usr/local/nebula/data/meta/nebula/0/checkpoints/SNAPSHOT_2022_09_26_10_59_29$ sudo cp -r data/ wal/ ../../
cp: 'data/OPTIONS-000013' and '../../data/OPTIONS-000013' are the same file
cp: 'data/000009.sst' and '../../data/000009.sst' are the same file
cp: 'data/000015.sst' and '../../data/000015.sst' are the same file
cp: 'data/000017.sst' and '../../data/000017.sst' are the same file
cp: 'wal/0/0000000000000299579.wal' and '../../wal/0/0000000000000299579.wal' are the same file
cp: 'wal/0/0000000000000242239.wal' and '../../wal/0/0000000000000242239.wal' are the same file
u
m
Yes, I followed these steps and tried restoring both meta and storage, but it's not working as expected.
u
There are some problems with your restore procedure: 1. You should stop all services first.
2. You should delete the old wal and data, then copy in the ones from the checkpoints folder.
❤️ 1
3. After creating the snapshot, you should move all the checkpoint data from its original place to a separate backup folder, because if you drop the space, its checkpoints are dropped as well.
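Putting those three points together, a rough sketch of the flow on one storage host could look like the following. Paths are taken from the commands earlier in this thread (the storage path is per space id), /backup is a hypothetical backup location, and the meta directory under /usr/local/nebula/data/meta needs the same treatment:
# Sketch only: stop everything before touching the data files
sudo /usr/local/nebula/scripts/nebula.service stop all

# Keep a copy of the checkpoint outside the data directory (hypothetical /backup),
# ideally right after CREATE SNAPSHOT, since DROP SPACE also removes that space's checkpoints
sudo cp -r /usr/local/nebula/data/storage/nebula/1/checkpoints/SNAPSHOT_2022_09_26_10_59_29 /backup/

# Remove the current data/ and wal/ and replace them with the checkpoint contents
cd /usr/local/nebula/data/storage/nebula/1
sudo rm -rf data wal
sudo cp -r /backup/SNAPSHOT_2022_09_26_10_59_29/data /backup/SNAPSHOT_2022_09_26_10_59_29/wal .

# Repeat for the meta directory, then restart the services
sudo /usr/local/nebula/scripts/nebula.service start all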
m
Okay, thanks. Let me try that.
I followed the above steps, copied the snapshot data to the actual location, and restarted the services, but now the metad service is not coming up and I'm seeing the below error in the logs. Does anyone have suggestions to fix this error? Thanks.
E20220929 14:40:58.028486 39193 MetaDaemon.cpp:182] Parser God Role error:E_LEADER_CHANGED
u
There's an annoying issue here: the daemon cannot read the god user status, so it exits if a leader change happens. This has been fixed in version 3.3, but the fix only adds a few retries; if the leader change persists, the daemon will still exit. The best way to start meta is to fix the leader change itself.
Trying to start the metad instances in a different order may help; otherwise you would have to compile a patched daemon to work around this.
❤️ 1
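As a rough illustration of what "start meta in a different order" can look like: host names below are placeholders for the metad hosts, and the nebula.service script used earlier in this thread also accepts individual daemon targets such as metad, storaged, and graphd.
# Placeholder hosts: bring metad up one node at a time, in a different order than before,
# giving each instance time to join before starting the next
ssh my_host_2 "sudo /usr/local/nebula/scripts/nebula.service start metad"
sleep 15
ssh my_host_1 "sudo /usr/local/nebula/scripts/nebula.service start metad"
sleep 15

# Once the meta leader is stable, start the remaining services on each host
sudo /usr/local/nebula/scripts/nebula.service start storaged
sudo /usr/local/nebula/scripts/nebula.service start graphd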
m
Thanks. I tried starting the metad instances in different orders, but that didn't help.
Do you have any instructions for compiling the metad daemon?
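For reference, the general build-from-source flow that produces a nebula-metad binary to swap in over the installed one looks roughly like this. The tag, parallelism, and paths are assumptions; the full prerequisite toolchain is described in the NebulaGraph build documentation:
# Sketch, assuming the build toolchain from the NebulaGraph docs is already installed
git clone --branch v3.3.0 https://github.com/vesoft-inc/nebula.git
cd nebula
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j4

# Stop metad, back up the existing binary, and swap in the freshly built one
# (the built binary is expected under the build tree's bin/ directory)
sudo /usr/local/nebula/scripts/nebula.service stop metad
sudo cp /usr/local/nebula/bin/nebula-metad /usr/local/nebula/bin/nebula-metad.bak
sudo cp bin/nebula-metad /usr/local/nebula/bin/
sudo /usr/local/nebula/scripts/nebula.service start metad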