VMware, PowerShell and much automation

We run many of our servers as virtual machines on VMware GSX Server. This makes backups easy; simply power down the VM and copy the relevant disk image. It’s also rather nice for setting up a test server from a known-good state every day.

We already have a VM that builds in the early hours of the morning, right after the nightly production build. It has a non-persistent disk image, which means that all changes to the hard disk are lost when it is switched off. A script shuts it down prior to the production build, then starts it again afterwards. Another script starts on the VM at boot time to run the necessary installers.

This setup is simple and effective, but the installers can take a while to run. The sample data is a particular problem, sometimes taking as long as forty-five minutes to insert into the database. This means that, should the VM fail to build during the night, there’s a very long period of time between fixing the problem and having a working test server. Plus, the server image is connected to the Windows domain and Active Directory sometimes gets confused by the disk image reverting. This always requires manual intervention and the process for fixing it is prone to error.

With the new requirement for multiple similar servers running automatic tests in parallel, I decided to solve all our problems at once by writing a set of scripts to build server images independently of the network and the domain. The first criterion, network independence, arises from the fact that all the servers will come from a single base image with a single hostname, and running more than one at a time is going to really confuse the network. Changing the hostname is high on the list of priorities for the image builder.

The system I decided upon involves attaching a ‘parameter disk’ to the VM, containing a Bootstrap.ps1 script and all the programs necessary to configure the VM. The base image will search for the bootstrap script every time it starts and run it if possible. This lets us shut down a built image without losing everything on it (as we did with the old system) and without having the installers run on every boot. Once the image is set up, we just disconnect the parameter disk and it’ll behave as an ordinary VM.
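
As a rough illustration of the lookup the base image performs at boot, here is a minimal sketch. It assumes the parameter disk shows up as an ordinary drive with Bootstrap.ps1 at its root; the function name Find-BootstrapScript is made up for this example and is not taken from our script library.

```powershell
# Hypothetical helper: return the path of the first Bootstrap.ps1 found
# at the top of any of the given drive roots, or $null if there is none.
function Find-BootstrapScript
{
    param([string[]] $Roots)

    foreach ($root in $Roots)
    {
        $candidate = Join-Path $root "Bootstrap.ps1"
        if (Test-Path $candidate -PathType Leaf)
        {
            return $candidate
        }
    }
    return $null
}
```

A startup script could pass it the root of every file system drive (for example, the Root property of each drive returned by Get-PSDrive -PSProvider FileSystem) and invoke the result with the call operator (&) if it is not $null. With no parameter disk attached nothing is found, and the VM boots as an ordinary machine.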

Implementing all this has taken most of a fortnight and expanded our PowerShell script library significantly. Most of the complexity comes from the scripts’ interaction with VMware. This interaction occurs primarily through VMware’s COM API, but some things aren’t supported, particularly modifications to the VMs’ hardware configuration.

On the host machine, where the VMs are manipulated:

The image building script does the following:

  1. Copies the base image to the target directory,
  2. Calls another script which sets up the parameter disk,
  3. Boots the VM and waits for it to exit,
  4. Disconnects the parameter disk.
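
The four steps above can be sketched roughly as follows. This is only an outline of the ordering: Build-Image and its parameters are hypothetical names, and the three script blocks stand in for the real parameter disk and VMware-specific scripts, which are not shown here.

```powershell
# Hypothetical driver for the four steps above. The three script blocks
# stand in for the real parameter-disk and VMware-specific scripts.
function Build-Image
{
    param(
        [string] $BaseImageDir,
        [string] $TargetDir,
        [scriptblock] $SetupParameterDisk,
        [scriptblock] $BootAndWait,
        [scriptblock] $DisconnectParameterDisk
    )

    if (Test-Path $TargetDir)
    {
        throw "Target directory '$TargetDir' already exists."
    }

    # 1. Copy the base image to the target directory.
    Copy-Item $BaseImageDir $TargetDir -Recurse

    # 2. Set up the parameter disk for the new image.
    & $SetupParameterDisk $TargetDir

    # 3. Boot the VM; this step should return only once the VM has exited.
    & $BootAndWait $TargetDir

    # 4. Disconnect the parameter disk so later boots skip the bootstrap.
    & $DisconnectParameterDisk $TargetDir
}
```

Passing the VMware-specific steps in as script blocks keeps the ordering in one place, and lets the same driver be exercised without a VMware host by supplying dummy blocks.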

Our test server VM requires a few more things to be done afterwards, like checking the logs on the parameter disk, setting the new image’s MAC address, attaching it to the network, shutting down the old test server and booting the new one.

I needed to be able to set the VM’s MAC address so that the new test server could be given a known IP address. All our other machines are on the domain, but the test server cannot be, and its IP needs to coincide with that of the old test server, which was on the domain and is the host most of our other machines have bookmarked.

The API provides a means for modifying configuration info like MAC addresses and hard disk devices in memory (VmCtl.Config) but any changes made this way are lost when the VM process terminates. Making permanent changes that will persist when you move the VM image elsewhere requires direct manipulation of the .vmx file, which does not appear to be very well documented. I take this to mean that it’s not really meant to be tinkered with, but that’s just an invitation…

Fortunately, Google turned up this page among the documentation, describing the procedure for setting a static MAC address.
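
For illustration, here is a minimal sketch of that kind of .vmx edit. It assumes the usual key names for the first virtual NIC (ethernet0.addressType, ethernet0.address and the generatedAddress entries); Set-VmxStaticMac is a hypothetical name, and the function works on the file's lines rather than on the file itself.

```powershell
# Hypothetical helper: rewrite the lines of a .vmx file so that the given
# virtual NIC uses a static MAC address instead of a generated one.
function Set-VmxStaticMac
{
    param(
        [string[]] $VmxLines,
        [string] $MacAddress,
        [string] $Nic = "ethernet0"
    )

    # Drop any existing address settings for this NIC.
    $prefix = [regex]::Escape($Nic)
    $pattern = "^\s*$prefix\.(addressType|address|generatedAddress|generatedAddressOffset)\s*="
    $kept = @($VmxLines | Where-Object { $_ -notmatch $pattern })

    # Append the static address settings.
    $kept + @(
        "$Nic.addressType = `"static`"",
        "$Nic.address = `"$MacAddress`""
    )
}
```

It would be fed the output of Get-Content on the .vmx file, with the result written back using Set-Content while the VM is powered off. As far as I can tell, VMware only accepts static addresses in the range 00:50:56:00:00:00 to 00:50:56:3F:FF:FF, so the address has to be picked from there.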

On the VM, where the parameter disk is consumed:

Building two dozen servers for automated testing isn’t much use if they can’t connect to the network because they all have the same hostname. We need some means of changing the VM’s hostname. All Windows machines have a supposedly-unique Security Identifier (SID) as well, and I’d like to change that during image construction just in case we need to build machines for the domain.

Windows does not support changing the SID after the GUI phase of system installation has begun. The base image contains a thoroughly cooked Windows system, which has not only fully completed the OS installation but has several apps installed as well. Fortunately for Windows sysadmins everywhere, there’s a nice little app called NewSID, a part of the Sysinternals collection, which can change the SID after installation is complete. Running it in non-interactive mode can be done via a command-line switch, ‘/a’.

At least, in theory.

Microsoft has added a EULA to each of the Sysinternals tools, which means that NewSID will sit and wait for someone to click a button before doing anything, even in ‘non-interactive’ mode. This can be circumvented by creating a registry key before you run NewSID:

[HKEY_CURRENT_USER\SOFTWARE\Sysinternals\NewSID]
"EulaAccepted"=dword:00000001

Of course, NewSID will reboot the machine after it runs. For our purposes it should only be run once, no matter how many times the system is rebooted subsequently.

So the PowerShell script for running NewSID on a machine and changing its hostname to $compName looks something like this:

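if ($env:COMPUTERNAME -eq $compName)
{
    # The hostname already matches, so NewSID has run:
    # do subsequent setup tasks here.
}
else
{
    # Pre-accept the Sysinternals EULA so NewSID doesn't wait for a click.
    # -Force creates missing parent keys and tolerates an existing key.
    $keyPath = "HKCU:\SOFTWARE\Sysinternals\NewSID";
    New-Item $keyPath -Force | Out-Null;
    New-ItemProperty $keyPath -Name EulaAccepted -Value 1 -PropertyType DWord -Force | Out-Null;

    # NewSID renames the machine to $compName, then reboots it.
    $p = [diagnostics.process]::Start("newsid.exe", "/a ${compName}");
    $p.WaitForExit();
}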

Construction of the parameter disk:

Parameter disks are defined by INCLUDES configuration files which contain a list of files to put on the disk. This turned out to be a bit limiting; NewSID will need to be run for all our server configurations and adding that functionality isn’t as simple as just putting files on the disk.

So packages can also be included. A package is basically a directory with a BUILD.ps1 script in it, which takes a path to the parameter disk and can assume it’s running from the package’s own directory. For most packages this script just copies files, but the NewSID one has to do something a little more complicated.

A package is not allowed to make any assumptions about the parameter disk apart from the presence of a Bootstrap.ps1 file. All parameter disk configurations will have this file and it will be the first thing run. NewSID’s BUILD.ps1 has to set up the disk so that its own Bootstrap.ps1 gets run first, which is done by renaming the existing Bootstrap.ps1 and chaining to it from its own.
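
The renaming and chaining can be sketched like this. It is a simplified illustration rather than the actual NewSID BUILD.ps1: Add-BootstrapStage and the name Bootstrap.Chained.ps1 are hypothetical, and the generated script always chains to the original once its own stage has run.

```powershell
# Hypothetical core of a package's BUILD.ps1: put a new Bootstrap.ps1 in
# front of the one already on the parameter disk.
function Add-BootstrapStage
{
    param(
        [string] $DiskPath,      # root of the parameter disk
        [string] $StageBody,     # script text to run before the original
        [string] $ChainedName = "Bootstrap.Chained.ps1"   # hypothetical name
    )

    $bootstrap = Join-Path $DiskPath "Bootstrap.ps1"
    $chained = Join-Path $DiskPath $ChainedName
    if (Test-Path $chained)
    {
        throw "$ChainedName already exists; another package has claimed it."
    }

    # Keep the existing bootstrap under a new name...
    Move-Item $bootstrap $chained

    # ...and write a replacement that runs the new stage, then chains to
    # the original, which it locates relative to its own path on the disk.
    $chainLine = '& (Join-Path (Split-Path $MyInvocation.MyCommand.Path) "' + $ChainedName + '")'
    Set-Content $bootstrap @($StageBody, $chainLine)
}
```

For NewSID the generated script would need to be a little smarter than this: it should chain to the original only from the branch where the hostname already matches, since the other branch ends in a reboot.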

This does come with the caveat that NewSID must be the last package included, otherwise it might get displaced by another.

An INCLUDES file looks something like this:

Bootstrap.ps1
@Installers
@NewSID [hostname]

Filenames are relative to the directory containing the INCLUDES file. Any line starting with an ‘@’ is assumed to be a package name. Packages can be given a list of comma-delimited arguments, which will be passed as an array as the second argument to BUILD.ps1.
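
To make the format concrete, here is a minimal sketch of a parser for it. Read-Includes and the Kind, Name and Arguments property names are invented for this example; I have also assumed that the package name is separated from its arguments by whitespace, as in the example above, and that blank lines are ignored.

```powershell
# Hypothetical parser for the INCLUDES format described above.
function Read-Includes
{
    param([string[]] $Lines)

    foreach ($line in $Lines)
    {
        $line = $line.Trim()
        if ($line -eq "") { continue }   # assumption: blank lines are ignored

        if ($line.StartsWith("@"))
        {
            # "@Name arg1,arg2" -> package name plus an array of arguments.
            $name, $rest = $line.Substring(1) -split '\s+', 2
            $pkgArgs = @()
            if ($rest)
            {
                $pkgArgs = @($rest -split ',' | ForEach-Object { $_.Trim() })
            }
            New-Object PSObject -Property @{ Kind = "Package"; Name = $name; Arguments = $pkgArgs }
        }
        else
        {
            # Anything else is a file path relative to the INCLUDES file.
            New-Object PSObject -Property @{ Kind = "File"; Name = $line; Arguments = @() }
        }
    }
}
```

The disk builder can then walk the resulting objects in order, copying each file entry and calling BUILD.ps1 for each package entry with the disk path and the Arguments array.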

At present, our scripts are rather tied to our filesystem layout and server configurations. Once they’re a bit more portable I’ll make them available for download.

UPDATE: These scripts are rapidly approaching obsolescence, what with our decision to manage our VMs with VirtualCenter. Managed.VIM and VirtualCloud will supersede the scripts and be a lot less nasty to maintain.