Recovering from a failed Reprovision in Triton

This post was originally published in April 2018


As you know, I collect here pretty much every error that has landed at my feet while operating and updating Triton. Yesterday it was that time again: the update of cns0 in de-gt-1 aborted and left the system in the following state:

[root@headnode (my-dc-1) ~]# sdcadm update cns -y --force-data-path
Finding candidate update images for the "cns" service.
Using channel release
Up-to-date.
[root@headnode (my-dc-1) ~]# vmadm list |grep cns
7ef0c7c3-e0af-42a3-9495-06475f60eedd  OS    1024     stopped  cns0
[root@headnode (my-dc-1) ~]# vmadm reprovision 7ef0c7c3-e0af-42a3-9495-06475f60eedd -f /opt/root/cns.json
Failed to reprovision VM 7ef0c7c3-e0af-42a3-9495-06475f60eedd: Command failed: cannot open 'zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd/data': dataset does not exist
[root@headnode (my-dc-1) ~]# ls -la /zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd/data
/zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd/data: No such file or directory
[root@headnode (my-dc-1) ~]# ls -la /zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd/
total 38
drwx------   2 root     staff          2 Apr. 18 07:21 .
drwxr-xr-x  83 root     root          87 Apr. 18 08:28 ..
[root@headnode (my-dc-1) ~]# vmadm start 7ef0c7c3-e0af-42a3-9495-06475f60eedd
Unable to start VM 7ef0c7c3-e0af-42a3-9495-06475f60eedd: first of 1 error: Command failed: could not verify zfs dataset zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd/data: dataset does not exist
zoneadm: zone 7ef0c7c3-e0af-42a3-9495-06475f60eedd failed to verify
[root@headnode (my-dc-1) /var/tmp]# zfs list |grep 7ef0c7c3-e0af-42a3-9495-06475f60eedd           
zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd                          0    25G   536M /zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd                                                      
zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd-reprovisioning-data    46K   686G    46K /data
zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd-reprovisioning-root  6,89M  25,0G   535M /zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd-reprovisioning-root
zones/cores/7ef0c7c3-e0af-42a3-9495-06475f60eedd                  23K   100G    23K /zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd/cores

Trent Mick of Joyent pointed me to this gist:

What vmadm reprovision does is:

  • stops the VM (Same as 'vmadm stop <uuid>')
  • sets the 'transition' marker on the zone via zonecfg to indicate it should be state 'provisioning'
  • sets zoned=off on the delegated dataset (if there is one)
  • zfs rename -f zones/<uuid>/data zones/<uuid>-reprovisioning-data (if there's a delegated dataset)
  • umount /zones/<uuid>/cores
  • zfs rename -f zones/<uuid> zones/<uuid>-reprovisioning-root
  • zfs clone -o quota=<quota>G <new_image_snapshot> zones/<uuid>
  • copy in the /zones/<uuid>-reprovisioning-root/config/* to the new zone's /zones/<uuid>/config
  • zfs destroy -r zones/<uuid>-reprovisioning-root
  • mount zones/cores/<uuid>
  • run the brand install script (eg. /usr/lib/brand/joyent-minimal/jinstall)
  • zfs rename -f zones/<uuid>-reprovisioning-data zones/<uuid>/data (if there was a delegated dataset)
  • zfs set zoned=on zones/<uuid>/data
  • update the dataset-uuid (image_uuid) field in zonecfg for the zone (indicating the new image's uuid)
  • zfs set compression=<value> zones/<uuid> (if it was set before)
  • zfs set recsize=<value> zones/<uuid> (if it was set before)
  • update the mdata:execute service's start method timeout using svccfg (if it was set before)
  • write out a zoneconfig file for zoneinit if this brand requires it
  • write out the zone's netfiles (/etc/hostname., /etc/defaultrouter, /etc/dhcp., etc...) if this brand requires it
  • delete garbage (old log files, etc.) from the new zoneroot if the brand requires it (some old images were full of garbage)
  • start the VM (Same as 'vmadm start <uuid>')
  • remove the provisioning transition flag and mark the zone as running again

What different failure cases do we need to worry about?

Dealing with failure we can ignore some of the things vmadm does because they don't impact the recovery process. The main failure types we have here are:

  1. failing before we've renamed any datasets (eg. in the stop)
  2. failing after we've renamed the delegated dataset
  3. failing after we've umounted cores
  4. failing after we've renamed the zoneroot
  5. failing after we've created a new zoneroot
  6. failing after we've destroyed the old zoneroot
  7. failing after we've renamed the delegated back to its original name
  8. failing after we've re-added the zoned property to the delegated dataset
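
Which of these cases applies can usually be read off the ZFS state. A quick check might look like this (my addition, not part of the gist; <uuid> is the zone's UUID):

# which datasets exist, and under which names?
zfs list | grep <uuid>
# is the delegated dataset still flagged as zoned?
zfs get zoned zones/<uuid>/data
# is the cores dataset still mounted?
zfs get mounted zones/cores/<uuid>

A dataset still carrying the -reprovisioning-data or -reprovisioning-root suffix points at cases 2 to 6, while an existing zones/<uuid>/data with zoned=off points at case 7.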

How do I recover from the above cases?

Basically, in any case after #8 where we fail, we'll be dealing with modifications to the new zoneroot, which all involve performing the same actions. First identify which step you failed at in the previous section; the recovery then starts at the same number here:

  1. if we fail here, stopping the zone manually and re-running the reprovision should be fine.
    vmadm stop <uuid>
  2. if we fail here, we'll want to rename the delegated dataset back to its original name and re-add the zoned=on property, then reprovision again
    zfs rename -f zones/<uuid>-reprovisioning-data zones/<uuid>/data
    zfs set zoned=on zones/<uuid>/data
  3. if we fail here, we'll want to do everything from 2 and then also remount the cores dataset then reprovision again
    zfs mount zones/cores/<uuid>
  4. if we fail here, we'll want to rename the zoneroot back to its original name then goto 3
    zfs rename zones/<uuid>-reprovisioning-root zones/<uuid>
  5. if we fail here, we'll want to destroy the newly cloned zoneroot then goto 4
    zfs destroy zones/<uuid>
  6. if we fail here, we'll need to move the new zoneroot to the correct name (if it isn't) and goto 3
  7. if we fail here, we're probably best off adding the zoned=on flag back in and trying the reprovision again
    zfs set zoned=on zones/<uuid>/data
  8. any failure at this point we should be able to just re-run the reprovision, all the steps here modify data in the zoneroot and we'll be destroying that when we reprovision anyway. So that will undo anything these do.

Notes:
  • re-running reprovision may leave the zone stopped in some cases. In that case we'll also want to make sure that the zone is started when we're done (see the example after these notes).
  • if something goes really wrong and your zone comes up but goes into state 'failed' after booting, you can remove the 'failed' state as follows, then reprovision again:
    zonecfg -z <zonename> 'remove attr name=failed'
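
If the zone is indeed left stopped after the recovery, starting it again is just the usual vmadm call (the same commands already used above):

vmadm list | grep <uuid>
vmadm start <uuid>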

In my case something apparently went wrong after step five, since a reprovisioning-root exists alongside a new zoneroot (which, however, is empty):

[root@headnode (my-dc-1) /var/tmp]# zfs list |grep 7ef0c7c3-e0af-42a3-9495-06475f60eedd           
zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd                          0    25G   536M /zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd                                                      
zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd-reprovisioning-data    46K   686G    46K /data
zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd-reprovisioning-root  6,89M  25,0G   535M /zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd-reprovisioning-root
zones/cores/7ef0c7c3-e0af-42a3-9495-06475f60eedd                  23K   100G    23K /zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd/cores

So we start the repair at step five and destroy the empty zoneroot, go to step four and rename the reprovisioning-root back, remount the cores dataset in step three, and in step two rename the reprovisioning-data dataset back and set zoned to on again. Since the VM is stopped anyway, we don't need to stop it and can go straight to reprovisioning.
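
Spelled out for this zone, the repair roughly comes down to the following commands (a sketch following the gist's recovery steps 5 through 2; the dataset names are taken from the zfs list output above, so double-check them against your own output before running anything):

# step 5: destroy the empty, freshly cloned zoneroot
zfs destroy zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd
# step 4: rename the old zoneroot back into place
zfs rename zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd-reprovisioning-root zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd
# step 3: remount the cores dataset
zfs mount zones/cores/7ef0c7c3-e0af-42a3-9495-06475f60eedd
# step 2: rename the delegated dataset back and re-enable zoned
zfs rename -f zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd-reprovisioning-data zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd/data
zfs set zoned=on zones/7ef0c7c3-e0af-42a3-9495-06475f60eedd/data

With the datasets back in place, we first generate the descriptive JSON file for the zone and can then reprovision it with that file: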

[root@headnode (my-dc-1) ~]# vmadm get 7ef0c7c3-e0af-42a3-9495-06475f60eedd > /var/tmp/cns-reprovision.json
[root@headnode (my-dc-1) ~]# vmadm reprovision 7ef0c7c3-e0af-42a3-9495-06475f60eedd -f /var/tmp/cns-reprovision.json

After that the zone was running again and could be updated with sdcadm update cns -y. This time without aborting.