2 | Testbox Imaging (Backup / Restore)
3 | ==================================
6 | Introduction
7 | ------------
8 |
9 | This document explores deploying a very simple drive imaging solution to help
10 | avoid manually reinstalling testboxes when a disk goes bust or the OS
11 | install seems to be corrupted.
12 |
13 |
14 | Definitions / Glossary
15 | ======================
16 |
17 | See AutomaticTestingRevamp.txt.
18 |
19 |
20 | Objectives
21 | ==========
22 |
23 | - Off site, no admin interaction (no need for ILOM or similar).
24 | - OS independent.
25 | - Space and bandwidth efficient.
26 | - As automatic as possible.
27 | - Logging.
28 |
29 |
30 | Overview of the Solution
31 | ========================
32 |
33 | Here is a brief summary:
34 |
35 | - Always boot testboxes via PXE using PXELINUX.
36 | - Default configuration is local boot (hard disk / SSD)
37 | - Restore/backup action triggered by machine specific PXE config.
38 | - Boots special debian maintenance install off NFS.
39 | - A maintenance service (systemd style) does the work.
40 | - The service reads action from TFTP location and performs it.
41 | - When done the service removes the TFTP machine specific config
42 | and reboots the system.
43 |
44 | Maintenance actions are:
45 | - backup
46 | - backup-again
47 | - restore
48 | - refresh-info
49 | - rescue
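
Whatever reads the requested action has to reject anything that is not one of
the five actions listed above. The following is a minimal sketch of such a
check. The function name ``is_maintenance_action`` is hypothetical and is not
taken from ``testbox-maintenance.sh``:

```shell
# Hypothetical helper (not from testbox-maintenance.sh): succeeds only for
# the maintenance actions named in this document.
is_maintenance_action() {
    case "$1" in
        backup|backup-again|restore|refresh-info|rescue)
            return 0
            ;;
        *)
            return 1
            ;;
    esac
}
```

A caller could use it as ``is_maintenance_action "$action" || exit 1`` before
doing any work.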
50 |
51 | A possible modifier could indicate a subset of disks on testboxes with other
52 | OSes installed. Support for partition-level backup/restore is not explored here.
53 |
54 |
55 | How to use
56 | ----------
57 |
58 | To perform one of the above maintenance actions on a testbox, run the
59 | ``testbox-pxe-conf.sh`` script::
60 |
61 | /mnt/testbox-tftp/pxelinux.cfg/testbox-pxe-conf.sh rescue
62 |
63 | Then trigger a reboot. The box will then boot the NFS rooted debian image and
64 | execute the maintenance action. On success, it will remove the testbox hex-IP
65 | config file and reboot again.
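
The "hex-IP" file name is the client's IPv4 address written as eight uppercase
hexadecimal digits, which is one of the names PXELINUX tries before falling
back to ``default``. As a rough illustration, the conversion can be sketched as
follows. The function name ``ip_to_pxe_hex`` is an assumption made for this
example and does not come from ``testbox-pxe-conf.sh``:

```shell
# Hypothetical helper: convert a dotted decimal IPv4 address into the
# uppercase hexadecimal file name PXELINUX looks for.
ip_to_pxe_hex() {
    old_ifs=$IFS
    IFS=.
    # Split the address on dots into $1..$4 (function-local positionals).
    set -- $1
    IFS=$old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}
```

For example, ``ip_to_pxe_hex 10.0.2.15`` prints ``0A00020F``.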
66 |
67 |
68 | Storage Server
69 | ==============
70 |
71 | The storage server will have three areas used here. Using NFS for all three
72 | avoids extra work getting CIFS sharing right too (NFS is already a pain).
73 |
74 | 1. /export/testbox-tftp - TFTP config area. Read-write.
75 | 2. /export/testbox-backup - Images and logs. Read-write.
76 | 3. /export/testbox-nfsroot - Custom debian. Read-only, no root squash.
77 |
78 |
79 | TFTP (/export/testbox-tftp)
80 | ============================
81 |
82 | The testbox-tftp share needs to be writable; root squashing is okay.
83 |
84 | We need files from both PXELINUX and SYSLINUX to make this work now. On a
85 | debian system, the ``pxelinux`` and ``syslinux`` packages need to be
86 | installed. We actually do this further down when setting up the nfsroot, so
87 | it's possible to get them from there by postponing this step a little. On
88 | debian 8.6.0 the PXELINUX files are found in ``/usr/lib/PXELINUX`` and the
89 | SYSLINUX ones in ``/usr/lib/syslinux``.
90 |
91 | The initial PXE image as well as associated modules come in three variants,
92 | BIOS, 32-bit EFI and 64-bit EFI. We'll only need the BIOS one for now.
93 | Perform the following copy operations::
94 |
95 | cp /usr/lib/PXELINUX/pxelinux.0 /mnt/testbox-tftp/
96 | cp /usr/lib/syslinux/modules/*/ldlinux.* /mnt/testbox-tftp/
97 | cp -R /usr/lib/syslinux/modules/bios /mnt/testbox-tftp/
98 | cp -R /usr/lib/syslinux/modules/efi32 /mnt/testbox-tftp/
99 | cp -R /usr/lib/syslinux/modules/efi64 /mnt/testbox-tftp/
100 |
101 |
102 | For simplicity, all the testboxes boot using good old fashioned BIOS, no EFI.
103 | However, it doesn't really hurt to be prepared.
104 |
105 | The PXELINUX related files go in the root of the testbox-tftp share. (As
106 | mentioned further down, these can be installed on a debian system by running
107 | ``apt-get install pxelinux syslinux``.) We need the ``*pxelinux.0`` files
108 | typically found in ``/usr/lib/PXELINUX/`` on debian systems (recent ones
109 | anyway). It is possible we may need one or more of the modules [1]_ that
110 | ship with PXELINUX/SYSLINUX, so do copy ``/usr/lib/syslinux/modules`` to
111 | ``testbox-tftp/modules`` as well.
112 |
113 |
114 | The directory layout related to the configuration files is dictated by the
115 | PXELINUX configuration file searching algorithm [2]_. Create a subdirectory
116 | ``pxelinux.cfg/`` under ``testbox-tftp`` and create the world readable file
117 | ``default`` with the following content::
118 |
119 | PATH bios
120 | DEFAULT local-boot
121 | LABEL local-boot
123 |
124 | This makes booting from the local disk the default behavior.
125 |
126 | Copy the ``testbox-pxe-conf.sh`` script file found in the same directory as
127 | this document to ``/mnt/testbox-tftp/pxelinux.cfg/``. Edit the copy to correct
128 | the IP addresses near the top, as well as any linux, TFTP and PXE details near
129 | the bottom of the file. This script will generate the PXE configuration file
130 | when performing maintenance on a testbox.
131 |
132 |
133 | Images and logs (/export/testbox-backup)
134 | =========================================
135 |
136 | The testbox-backup share needs to be writable; root squashing is okay.
137 |
138 | In the root there must be a file ``testbox-backup`` so we can easily tell
139 | whether we've actually mounted the share or are just staring at an empty mount
140 | point directory.
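
A check along these lines could use that marker file. This is only a sketch,
and the function name ``backup_share_is_mounted`` is hypothetical rather than
something taken from ``testbox-maintenance.sh``:

```shell
# Hypothetical helper: succeeds only if the marker file is visible, i.e. the
# share is really mounted and we are not looking at an empty mount point.
# $1 is the mount point (defaults to the path used in this document).
backup_share_is_mounted() {
    [ -f "${1:-/mnt/testbox-backup}/testbox-backup" ]
}
```

A script would typically bail out early with something like
``backup_share_is_mounted || exit 1`` before writing any image.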
141 |
142 | The ``testbox-maintenance.sh`` script maintains a global log in the root
143 | directory that's called ``maintenance.log``. Errors will be logged there as
144 | well as a ping and the action.
145 |
146 | We use a directory layout based on dotted decimal IP addresses here, so for a
147 | server with the IP address ``<ip>`` all its files will be under ``<ip>/``:
148 |
149 | ``<hostname>``
150 | The name of the testbox (empty file). Helps with finding a testbox by name.
151 |
152 | ``testbox-info.txt``
153 | Information about the testbox. Starting off with the name, decimal IP,
154 | PXELINUX style hexadecimal IP, and more.
155 |
156 | ``maintenance.log``
157 | Maintenance log file recording what the maintenance service does.
158 |
159 | ``disk-devices.lst``
160 | Optional list of disk devices to consider backing up or restoring. This is
161 | intended for testboxes with additional disks that are used for other purposes
162 | and should not be touched.
163 |
164 | ``sda.raw.gz``
165 | The gzipped raw copy of the sda device of the testbox.
166 |
167 | ``sd[bcdefgh].raw.gz``
168 | The gzipped raw copies of sdb, sdc, sdd, sde, sdf, sdg, sdh, etc. if any of them
169 | exist and are disks/SSDs.
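
To illustrate what a "gzipped raw copy" means in practice, here is a minimal
sketch built on ``dd`` and ``gzip``. It is an assumption about one possible
approach and not a description of what ``testbox-maintenance.sh`` actually
does. The function names ``image_disk`` and ``restore_disk`` are hypothetical:

```shell
# Hypothetical helpers; testbox-maintenance.sh may do this differently.

# image_disk SRC DEST: write a gzipped raw copy of SRC to DEST.
image_disk() {
    dd if="$1" bs=1048576 2>/dev/null | gzip -c > "$2"
}

# restore_disk SRC DEST: write the gzipped raw image SRC back to DEST.
restore_disk() {
    gzip -dc "$1" | dd of="$2" bs=1048576 2>/dev/null
}
```

A backup of the first disk would then look like ``image_disk /dev/sda
/mnt/testbox-backup/<ip>/sda.raw.gz``. Note that a plain shell pipeline only
reports the exit status of its last command, so a read error in ``dd`` would
go unnoticed by ``image_disk`` as written; a real script would need additional
error checking.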
170 |
171 |
172 | Note! If it turns out we can be certain to get a valid host name, we might just
173 | switch to use the hostname as the directory name instead of the IP.
174 |
175 |
176 | Debian NFS root (/export/testbox-nfsroot)
177 | ==========================================
178 |
179 | The testbox-nfsroot share should be read-only and must **not** have root
180 | squashing enabled. Also, make sure setting the set-uid-bit is allowed by the
181 | server, or ``su`` and ``sudo`` won't work.
182 |
183 | There are several ways of creating a debian nfsroot, but since we've got a
184 | tool like VirtualBox around we've just installed it in a VM, prepared it,
185 | and copied it onto the NFS server share.
186 |
187 | As of writing debian 8.6.0 is current, so a minimal 64-bit install of it was
188 | done in a VM. After installation the following modifications were made:
189 |
190 | - ``apt-get install pxelinux syslinux initramfs-tools zip gddrescue sudo joe``
191 | and optionally ``apt-get install smbclient cifs-utils``.
192 |
193 | - ``/etc/default/grub`` was modified to set ``GRUB_CMDLINE_LINUX_DEFAULT`` to
194 | ``""`` instead of ``"quiet"``. This allows us to see messages during boot
195 | and perhaps spot why something doesn't work on a testbox. Regenerate the
196 | grub configuration file by running ``update-grub`` afterwards.
197 |
198 | - ``/etc/sudoers`` was modified to allow the ``vbox`` user to use sudo without
199 | requiring any password.
200 |
201 | - Create the directory ``/etc/systemd/system/getty@tty1.service.d`` and create
202 | the file ``noclear.conf`` in it with the following content::
203 |
204 | [Service]
205 | TTYVTDisallocate=no
206 |
207 | This stops getty from clearing VT1 and lets us see the tail of the boot up
208 | messages, which includes messages from the testbox-maintenance service.
209 |
210 | - Mount the testbox-nfsroot under ``/mnt/`` with write privileges. (The write
211 | privileges are temporary - don't forget to remove them later on.)::
212 |
213 | mount -t nfs myserver.com:/export/testbox-nfsroot /mnt/
214 |
215 | Note! Adding ``-o nfsvers=3`` may help with some NFSv4 servers.
216 |
217 | - Copy the debian root and dev file system onto nfsroot. If you have ssh
218 | access to the NFS server, the quickest way to do it is to use ``tar``::
219 |
220 | tar -cz --one-file-system -f /mnt/testbox-maintenance-nfsroot.tar.gz . dev/
221 |
222 | An alternative is ``cp -ax . /mnt/. && cp -ax dev/. /mnt/dev/.`` but this
223 | is quite a bit slower, obviously.
224 |
225 | - Edit ``/etc/ssh/sshd_config`` setting ``PermitRootLogin`` to ``yes`` so we can ssh
226 | in as root later on.
227 |
228 | - chroot into the nfsroot: ``chroot /mnt/``
229 |
230 | - ``mount -t proc proc /proc``
231 |
232 | - ``mount -t sysfs sysfs /sys``
233 |
234 | - ``mkdir /mnt/testbox-tftp /mnt/testbox-backup``
235 |
236 | - Recreate ``/etc/fstab`` with::
237 |
238 | proc /proc proc defaults 0 0
239 | /dev/nfs / nfs defaults 1 1
240 | myserver.com:/export/testbox-tftp /mnt/testbox-tftp nfs tcp,nfsvers=3,noauto 2 2
241 | myserver.com:/export/testbox-backup /mnt/testbox-backup nfs tcp,nfsvers=3,noauto 3 3
242 |
243 | We use NFS version 3 as that works better for our NFS server and client;
244 | remove it if not necessary. The ``noauto`` option is to work around mount
245 | trouble during early bootup on some of our boxes.
246 |
247 | - Do ``mount /mnt/testbox-tftp && mount /mnt/testbox-backup`` to mount the
248 | two shares. This may be a good time to execute the instructions in the
249 | sections above relating to these two shares.
250 |
251 | - Edit ``/etc/initramfs-tools/initramfs.conf`` and change the ``MODULES``
252 | value from ``most`` to ``netboot``.
253 |
254 | - Append ``aufs`` to ``/etc/initramfs-tools/modules``. The advanced
255 | multi-layered unification filesystem (aufs) enables us to use a
256 | read-only NFS root. [3]_ [4]_ [5]_
257 |
258 | - Create ``/etc/initramfs-tools/scripts/init-bottom/00_aufs_init`` as
259 | an executable file with the following content::
260 |
261 | #!/bin/sh
262 | # Don't run during update-initramfs:
263 | case "$1" in
264 | prereqs)
265 | exit 0;
266 | ;;
267 | esac
268 |
269 | modprobe aufs
270 | mkdir -p /ro /rw /aufs
271 | mount -t tmpfs tmpfs /rw -o noatime,mode=0755
272 | mount --move $rootmnt /ro
273 | mount -t aufs aufs /aufs -o noatime,dirs=/rw:/ro=ro
274 | mkdir -p /aufs/rw /aufs/ro
275 | mount --move /ro /aufs/ro
276 | mount --move /rw /aufs/rw
277 | mount --move /aufs /root
278 | exit 0
279 |
280 | - Update the init ramdisk: ``update-initramfs -u -k all``
281 |
282 | Note! It may be necessary to do ``mount -t tmpfs tmpfs /var/tmp`` to help
283 | this operation succeed.
284 |
285 | - Copy ``/boot`` to ``/mnt/testbox-tftp/maintenance-boot/``.
286 |
287 | - Copy the ``testbox-maintenance.sh`` file found in the same directory as this
288 | document to ``/root/scripts/`` (need to create the dir) and make it
289 | executable.
290 |
291 | - Create the systemd service file for the maintenance service as
292 | ``/etc/systemd/system/testbox-maintenance.service`` with the content::
293 |
294 | [Unit]
295 | Description=Testbox Maintenance
296 | After=network.target
297 | Before=getty@tty1.service
298 |
299 | [Service]
300 | Type=oneshot
301 | RemainAfterExit=True
302 | ExecStart=/root/scripts/testbox-maintenance.sh
303 | ExecStartPre=/bin/echo -e \033%G
304 | ExecReload=/bin/kill -HUP $MAINPID
305 | WorkingDirectory=/tmp
306 | Environment=TERM=xterm
307 | StandardOutput=journal+console
308 |
309 | [Install]
310 | WantedBy=multi-user.target
311 |
312 | - Enable our service: ``systemctl enable /etc/systemd/system/testbox-maintenance.service``
313 |
314 | - xxxx ... more ???
315 |
316 | - Before leaving the chroot, do ``umount /proc /sys /mnt/testbox-*``.
317 |
318 |
319 | - Testing the setup from a VM is kind of useful (if the NFS server can be
320 | convinced to accept root NFS mounts from non-privileged client ports):
321 |
322 | - Create a VM using the 64-bit debian profile. Let's call it "pxe-vm".
323 | - Mount the TFTP share somewhere, like M: or /mnt/testbox-tftp.
324 | - Reconfigure the NAT DHCP and TFTP bits::
325 |
326 | VBoxManage setextradata pxe-vm VBoxInternal/PDM/DriverTransformations/pxe/AboveDriver NAT
327 | VBoxManage setextradata pxe-vm VBoxInternal/PDM/DriverTransformations/pxe/Action mergeconfig
328 | VBoxManage setextradata pxe-vm VBoxInternal/PDM/DriverTransformations/pxe/Config/TFTPPrefix M:/
329 | VBoxManage setextradata pxe-vm VBoxInternal/PDM/DriverTransformations/pxe/Config/BootFile pxelinux.0
330 |
331 | - Create the file ``testbox-tftp/pxelinux.cfg/0A00020F`` containing::
332 |
333 | PATH bios
334 | DEFAULT maintenance
335 | LABEL maintenance
336 | MENU LABEL Maintenance (NFS)
337 | KERNEL maintenance-boot/vmlinuz-3.16.0-4-amd64
338 | APPEND initrd=maintenance-boot/initrd.img-3.16.0-4-amd64 ro ip=dhcp aufs=tmpfs \
339 | boot=nfs root=/dev/nfs nfsroot=
340 | LABEL local-boot
342 |
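
Writing such a machine-specific file by hand gets tedious, so it helps to see
how it could be generated. The sketch below only reproduces the maintenance
stanza shown above. It does not cover how the requested action is passed to
the maintenance service, and it is not the content of ``testbox-pxe-conf.sh``.
The function name ``write_maintenance_pxe_conf`` and its parameters are
hypothetical:

```shell
# Hypothetical helper: write a machine-specific PXELINUX config that boots
# the NFS rooted maintenance image.
#   $1  pxelinux.cfg directory, e.g. /mnt/testbox-tftp/pxelinux.cfg
#   $2  hex IP file name, e.g. 0A00020F
#   $3  kernel version suffix, e.g. 3.16.0-4-amd64
#   $4  value for nfsroot= (server and export path, site specific)
write_maintenance_pxe_conf() {
    cat > "$1/$2" <<EOF
PATH bios
DEFAULT maintenance
LABEL maintenance
  MENU LABEL Maintenance (NFS)
  KERNEL maintenance-boot/vmlinuz-$3
  APPEND initrd=maintenance-boot/initrd.img-$3 ro ip=dhcp aufs=tmpfs boot=nfs root=/dev/nfs nfsroot=$4
EOF
    # The TFTP server must be able to read the file.
    chmod a+r "$1/$2"
}
```

Removing the generated file again restores the default local boot behavior,
which is what the maintenance service does when it has finished.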
343 |
345 |
346 | .. [1] See http://www.syslinux.org/wiki/index.php?title=Category:Modules
347 | .. [2] See http://www.syslinux.org/wiki/index.php?title=PXELINUX#Configuration
348 | .. [3] See https://en.wikipedia.org/wiki/Aufs
349 | .. [4] See http://shitwefoundout.com/wiki/Diskless_ubuntu
350 | .. [5] See http://debianaddict.com/2012/06/19/diskless-debian-linux-booting-via-dhcppxenfstftp/
351 |
352 |
355 | :Status: $Id: TestBoxImaging.txt 64601 2016-11-08 16:34:21Z vboxsync $
356 | :Copyright: Copyright (C) 2010-2016 Oracle Corporation.
357 |
358 |