Just so I don't forget: in a mirrored setup with spares, replacing failed disks works a little differently than in a raidz setup.
This is what it looks like when it is "broken":
[root@myserver-2 (de-dc-2) ~]# zpool status
  pool: zones
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning
        in a degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 38,6G in 0 days 00:21:01 with 0 errors on Sun Feb 19 06:43:45 2023
config:

        NAME          STATE     READ WRITE CKSUM
        zones         DEGRADED     0     0     0
          mirror-0    ONLINE       0     0     0
            c1t0d0    ONLINE       0     0     0
            c1t1d0    ONLINE       0     0     0
          mirror-1    DEGRADED     0     0     0
            spare-0   DEGRADED     0     0     0
              c1t2d0  REMOVED      0     0     0
              c1t6d0  ONLINE       0     0     0
            c1t3d0    ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            c1t4d0    ONLINE       0     0     0
            c1t5d0    ONLINE       0     0     0
        spares
          c1t6d0      INUSE     currently in use
          c1t7d0      AVAIL

errors: No known data errors
"Broken" is not quite accurate, of course: one of the spares that we keep around for exactly this case has already stepped in.
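Checking pool health from a script is easy, by the way: `zpool status -x` prints exactly the single line "all pools are healthy" when there is nothing to report, so anything else is worth an alert. A minimal sketch, fed with a captured value instead of the live command so it also runs on machines without ZFS:

```shell
# 'zpool status -x' prints "all pools are healthy" when all is well;
# everything else deserves attention.
status="all pools are healthy"   # in real use: status=$(zpool status -x)
if [ "$status" = "all pools are healthy" ]; then
  result="OK"
else
  result="ATTENTION: $status"
fi
echo "$result"
```

Dropped into cron, the `ATTENTION` branch would send the mail.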
From the RAID controller's point of view, the situation looks like this at that moment:
[root@myserver-2 (de-dc-2) ~]# /opt/root/storcli/storcli /c0 /eall /sall show
Controller = 0
Status = Success
Description = Show Drive Information Succeeded.
Drive Information :
=================
-------------------------------------------------------------------------------
EID:Slt DID State  DG        Size Intf Med SED PI SeSz Model       Sp Type
-------------------------------------------------------------------------------
252:0    15 Onln    0 278.875 GB  SAS  HDD N   N  512B MK3001GRRB  U  -
252:1    16 Onln    1 278.875 GB  SAS  HDD N   N  512B ST9300653SS U  -
252:2    17 Failed  7 278.875 GB  SAS  HDD N   N  512B MK3001GRRB  U  -
252:3    10 Onln    2 278.875 GB  SAS  HDD N   N  512B ST9300653SS U  -
252:4     9 Onln    3 278.875 GB  SAS  HDD N   N  512B ST9300653SS U  -
252:5    11 Onln    4 278.875 GB  SAS  HDD N   N  512B MK3001GRRB  U  -
252:6    12 Onln    5 278.875 GB  SAS  HDD N   N  512B ST9300653SS U  -
252:7    13 Onln    6 278.875 GB  SAS  HDD N   N  512B ST9300653SS U  -
-------------------------------------------------------------------------------
EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded
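To pick the failed slot out of that table from a script, a bit of awk on the drive lines is enough. A sketch, using three of the rows captured above as sample input (in real use the input would come from `storcli /c0 /eall /sall show`):

```shell
# Print EID:Slt and State for every drive the controller does not
# report as online ("Onln"), using captured sample rows.
drive_table='252:0 15 Onln 0 278.875 GB SAS HDD N N 512B MK3001GRRB U -
252:2 17 Failed 7 278.875 GB SAS HDD N N 512B MK3001GRRB U -
252:3 10 Onln 2 278.875 GB SAS HDD N N 512B ST9300653SS U -'
not_online=$(printf '%s\n' "$drive_table" | awk '$3 != "Onln" { print $1, $3 }')
echo "$not_online"
```

With the sample rows above, this prints the one bad drive, `252:2 Failed`.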
After the disk has been swapped, the output looks like this:
252:2 18 UGood F 278.875 GB SAS HDD N N 512B ST9300653SS U -
The disk apparently still carries a "f"oreign configuration, which we need to delete first:
[root@myserver-2 (de-dc-2) ~]# /opt/root/storcli/storcli /c0 /fall delete
Controller = 0
Status = Success
Description = Successfully deleted foreign configuration
The RAID controller then sees it like this:
252:2 18 UGood - 278.875 GB SAS HDD N N 512B ST9300653SS U -
Next, quickly create a (single-disk) RAID-0 on the drive:
[root@myserver-2 (de-dc-2) ~]# /opt/root/storcli/storcli /c0 add vd type=r0 drives=252:2
Controller = 0
Status = Success
Description = Add VD Succeeded
After that, the disk can be repartitioned and relabeled with format -e and fdisk.
From ZFS's perspective, the situation now looks like this:
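For reference, the device nodes those tools operate on follow the usual illumos naming scheme: `p0` is the whole disk as fdisk sees it, `s2` the traditional whole-disk slice. A small sketch, assuming c1t2d0 as in the output above (the fdisk call is commented out because it is destructive):

```shell
# Map the ctd disk name to its raw device nodes (illumos naming).
disk=c1t2d0
raw_disk="/dev/rdsk/${disk}p0"     # whole disk, fdisk's view
whole_slice="/dev/rdsk/${disk}s2"  # traditional whole-disk slice
# fdisk -B "$raw_disk"   # would write a default Solaris partition table (destructive!)
echo "$raw_disk $whole_slice"
```

`fdisk -B` writes a default partition table spanning the whole disk, which saves a trip through the interactive menus.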
[root@myserver-2 (de-dc-2) ~]# zpool status -x
  pool: zones
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-4J
  scan: scrub in progress since Fri Feb 24 14:09:14 2023
        115G scanned at 373M/s, 75,3G issued at 243M/s, 115G total
        0 repaired, 65,17% done, 0 days 00:02:49 to go
config:

        NAME          STATE     READ WRITE CKSUM
        zones         DEGRADED     0     0     0
          mirror-0    ONLINE       0     0     0
            c1t0d0    ONLINE       0     0     0
            c1t1d0    ONLINE       0     0     0
          mirror-1    DEGRADED     0     0     0
            spare-0   DEGRADED     0     0     0
              c1t2d0  UNAVAIL      0     0     0  corrupted data
              c1t6d0  ONLINE       0     0     0
            c1t3d0    ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            c1t4d0    ONLINE       0     0     0
            c1t5d0    ONLINE       0     0     0
        spares
          c1t6d0      INUSE     currently in use
          c1t7d0      AVAIL

errors: No known data errors
And now it really is enough to tell ZFS with zpool replace zones c1t2d0
that the failed disk has been swapped out, and ZFS starts resilvering:
[root@myserver-2 (de-dc-2) ~]# zpool status -x
  pool: zones
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Feb 24 14:16:05 2023
        115G scanned at 1,43G/s, 78,0G issued at 987M/s, 115G total
        1,40G resilvered, 67,58% done, 0 days 00:00:38 to go
config:

        NAME               STATE     READ WRITE CKSUM
        zones              DEGRADED     0     0     0
          mirror-0         ONLINE       0     0     0
            c1t0d0         ONLINE       0     0     0
            c1t1d0         ONLINE       0     0     0
          mirror-1         DEGRADED     0     0     0
            spare-0        DEGRADED     0     0     0
              replacing-0  UNAVAIL      0     0     0
                c1t2d0/old UNAVAIL      0     0     0  corrupted data
                c1t2d0     ONLINE       0     0 1,06K  (resilvering)
              c1t6d0       ONLINE       0     0     0
            c1t3d0         ONLINE       0     0     0
          mirror-2         ONLINE       0     0     0
            c1t4d0         ONLINE       0     0     0
            c1t5d0         ONLINE       0     0     0
        spares
          c1t6d0           INUSE     currently in use
          c1t7d0           AVAIL

errors: No known data errors
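If a follow-up step (a scrub, say) should only run once the resilver is done, polling `zpool status` and grepping its scan line is enough. A sketch of the predicate, fed here with the scan line captured above instead of the live command:

```shell
# True while 'zpool status' still reports a running resilver.
resilver_running() {
  printf '%s\n' "$1" | grep -q 'resilver in progress'
}
# In real use: status=$(zpool status zones)
status='scan: resilver in progress since Fri Feb 24 14:16:05 2023'
if resilver_running "$status"; then state="busy"; else state="idle"; fi
echo "$state"
```

In a loop that would be: `while resilver_running "$(zpool status zones)"; do sleep 60; done`.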
Once the process has finished, the spare disk that had stepped in is released again as well:
[root@myserver-2 (de-dc-2) ~]# zpool status
  pool: zones
 state: ONLINE
  scan: resilvered 38,8G in 0 days 00:20:49 with 0 errors on Fri Feb 24 14:36:54 2023
config:

        NAME        STATE     READ WRITE CKSUM
        zones       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
        spares
          c1t6d0    AVAIL
          c1t7d0    AVAIL

errors: No known data errors