Dribble from the Tech World


VPN Router on a Stick

Previously, when using a Cisco PIX firewall, VPN 3000 (Altiga) concentrator, or other VPN hardware as the endpoint for a L2L or remote access VPN connection over the internet, two explicit internet-facing interfaces were needed to allow internet access for these VPN connections. This was because internet-bound traffic would need to leave the internet interface unencrypted - the same interface the original encrypted traffic came in on - and it was simply not possible for traffic to come in encrypted and leave unencrypted on a single interface. If a second interface was not available, the workaround was split tunneling.

What is split tunneling? It uses ACLs to specify which traffic should be tunneled and which should not be sent through the VPN. Traffic destined for the subnets on the corporate LAN is sent through the VPN tunnel, and all other traffic (internet traffic) is NOT sent over the VPN. The problem with this configuration is security: the remote system is connected to both the 'trusted' corporate LAN and the untrusted internet at the same time. In a standard, all-traffic-tunneled VPN, all network traffic from the remote endpoint (or network) is tunneled back to the corporate LAN, where further internet access is controlled.
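For reference, this kind of ACL-based policy is expressed on the PIX/ASA with an ACL tied to the group policy - a minimal sketch, with hypothetical ACL, group-policy, and subnet names:

```
! Hypothetical names/addresses - adjust to your environment.
! ACL matching the corporate subnets that SHOULD be tunneled.
access-list SPLIT-TUNNEL standard permit 10.0.0.0 255.0.0.0
!
! Tie the ACL to the remote-access group policy.
group-policy RA-POLICY attributes
 split-tunnel-policy tunnelspecified
 split-tunnel-network-list value SPLIT-TUNNEL
```

Anything not matched by the ACL leaves the client directly, which is exactly the security trade-off described above.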

This is no longer the case. To start, let's take a look at exactly what we're talking about:



This is now possible using PIX or ASA code version 7.2 or higher and VPN client software version 5.x or later. Here are the key commands that enable this configuration:

// Command that permits IPsec traffic to enter and exit the same interface.

same-security-traffic permit intra-interface

// Forces VPN Clients over the tunnel for Internet access.

split-tunnel-policy tunnelall

// The NAT statement that translates the VPN pool addresses as they hairpin back out the outside interface.

nat (outside) 1

Note that if you have a range of IPs to assign to the VPN clients instead of an entire subnet, you will need to add all of them to the nat (outside) statement to allow them to access the internet.
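Putting the pieces together, a minimal hairpinning sketch might look like the following (pre-8.3 NAT syntax to match the 7.2 code discussed here; the pool, group-policy name, and addresses are hypothetical):

```
! Permit traffic to enter and exit the same (outside) interface.
same-security-traffic permit intra-interface
!
! Address pool handed out to VPN clients (hypothetical range).
ip local pool vpn-pool 192.168.50.1-192.168.50.254 mask 255.255.255.0
!
! Tunnel everything - no split tunneling.
group-policy RA-POLICY attributes
 split-tunnel-policy tunnelall
!
! PAT the VPN pool addresses as they hairpin out the outside interface.
nat (outside) 1 192.168.50.0 255.255.255.0
global (outside) 1 interface
```

With this in place, a client's internet traffic arrives encrypted on the outside interface, is decrypted, translated, and sent back out the same interface.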

A complete configuration example is available from here.

Software Based Storage: Thoughts and local storage tests

It has occurred to me that all of these comparisons are not exactly equal. While the VM configurations, test procedures, and testing hardware are all identical, there are certainly ways to improve performance. Some methods could be applied to all comparisons (adding a storage controller card with battery-backed write cache and several 15K SAS spindles), and some are specific to the software presenting the storage (using an SSD to house the ZIL and/or L2ARC applies only to ZFS-based products). In reality, these tests are performed using the absolute worst case scenario - who in their right mind would use a single non-enterprise 7200 RPM drive to house anything more than a music library?

All that said, I wanted to take the network and the 3rd party storage providers out of the equation and repeat some tests using a local datastore on the ESX host. Here's what the datastore looked like during zeroing and OS install:

During install, latency stayed right around 40ms. Complete installation for 3 VMs took just under 1 hour. Here's the datastore latency during idle operation and the IOMeter test at the far right of the chart:

Latency was just under 20ms during idle. The first test was a single VM running the standard IOMeter worker as in previous comparisons:

This shows the local storage performing around 25 IOPS worse than the MS iSCSI target in the single-VM IOMeter test. The 3 VM test shows where DAS takes a big hit:

The best average I saw during the test:

So in the end, local storage is clearly not the way to go (except in some very specific use cases, but those involve gear that will NEVER be approved for a home lab...nor should it be).

Windows 7 Streaming: Media on an External Drive

In an effort to preserve some functionality after giving up a dedicated Media Center PC (see this post for more), I have a Windows 7 VM with media streaming enabled. What does that mean? It means that any DLNA-enabled device can see my media library and stream its contents. I recently needed to add more storage space to said VM, and in doing so moved all media to the new E:\ drive. Then the problems started - I could not stream ANYTHING. No music showed up, no pictures showed up, and only 5 videos appeared. What the heck? The only error I received when trying to stream was 'No files have been found on this remote library.'

Keep in mind that I had configured the library to include the new paths on the E:\ drive for all of the media, so it WAS showing up in the local media library - it just wasn't streaming. So what's different? Is it a share? No - the E$ share is active, so that can't be it. Permissions? Nope, they're the same...

When all else fails, run ProcMon.

The trace on the local laptop while trying to stream showed nothing, but the trace on the DLNA server showed something quite different:

...multiple, continuous hits by wmpnetwk.exe against a 'drmstore.hds' file. Having dealt with DRM in the past, I know how picky it can be. For example, when I upgraded the processor on the MC PC, none of the recorded TV content would play, since DRM is tied to the processor in the system - I ended up deleting a DRM-related file to resolve the issue. So I did the same thing here:

...Just rename 'drmstore.hds' to '.old' and let it be recreated. A few notes:

  • The entire DRM directory is hidden
  • The file will be locked while the streaming process is running, so you will need to kill the wmpnetwk.exe process before you rename the file
  • wmpnetwk.exe is simply the DLNA/media streaming process, so just restart Windows Media Player and the process will restart
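The steps above can be sketched at an elevated command prompt. The DRM path below is the usual Windows 7 location, but it is an assumption on my part - verify it against your own ProcMon trace:

```
:: Stop the streaming process so the file is no longer locked.
taskkill /f /im wmpnetwk.exe

:: The DRM directory is hidden - this is the typical Windows 7 location.
cd /d "C:\ProgramData\Microsoft\Windows\DRM"
ren drmstore.hds drmstore.hds.old

:: Restart the streaming service (or just reopen Windows Media Player)
:: and the file is recreated.
net start WMPNetworkSvc
```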

After that, you should have a new drmstore.hds file, and all files should now stream as before.

FreeNAS Performance Part 1: NFS Storage

EDIT 1/8/2013: This post should be titled FreeNAS: The performance you will get when you don't allocate enough RAM or enough disk resources.

These results are not a true representation of what FreeNAS can do. Here's a better example: FreeNAS Performance Part 2


Following the Microsoft iSCSI vs. StarWind iSCSI comparison, I would also like to look at another option for FreeBSD-based network storage - FreeNAS. It supports AFP, CIFS, NFS, and iSCSI, and has a very user-friendly web GUI - further information is available at the FreeNAS website.

Test Specifications

The same whitebox server that was used for the StarWind and Microsoft iSCSI tests was used for the FreeNAS server - 3.00 GHz Xeon, 3GB RAM, single 1GbE interface, single 80GB spindle for both the OS and NFS export.

OS Installation Performance

Let me put it this way - after 1 hour, none of the VMs had progressed past ~48% completion. Just short of 2 hours after the install was initiated, one of the VMs had successfully installed the OS, and the other 2 had failed setup with errors. Here is some of the built-in reporting from FreeNAS:

And CPU utilization:

The latency for the NFS datastore is terrible:

Running IOMeter on a single VM while the other two VMs were installing the OS (Same IOMeter worker configuration as in previous tests):

Hoping to improve performance, the other 2 VMs were powered down, and the IOMeter test was run again:

The IOPS only improved by ~100, and VM disk I/O latency is still around 1700+ ms - this is confirmed again by terrible host datastore latency, with an overall average write latency of 100+ ms:



FreeNAS NFS storage, when configured in the same way as all previous experiments, has worse performance than local storage.

Software iSCSI Targets: Part 2B - StarWind, Multiple VMs

Part 2B: 3 VM IOMeter load on a StarWind iSCSI datastore. Same procedure as the previous testing - complete install time: 32 minutes, 39 seconds - 3 minutes, 14 seconds faster than the Microsoft iSCSI target software. Here's the setup:

Here's the CPU/RAM of the iSCSI server during OS install:

Just as previously, RAM is allocated to cache and CPU is heavily utilized. Here is the network utilization during OS install:

The utilization graph looks strikingly similar to the MS iSCSI target install; however, utilization goes over 30% at a few points. During the major portion of the OS install, CPU utilization is high, as is the underlying physical disk queue length. This tells me that a faster disk subsystem would improve performance even further.

And a brief view of network utilization during VMware Tools install on all 3 VMs:


Deduplication and Thin Provisioning

StarWind clearly has the advantage here. Not only is the LUN thin provisioned, it is also deduplicated once data is written. The deduplication engine works inline, so data is deduped as it is written - and with a 4K block size, it is very effective. All 3 VMs should be taking up nearly 27GB of space - but as you can see, they are not: only about 8.69GB of space is used. This yields a 3.15:1 dedupe ratio - which is what I would expect for 3 mostly identical servers.


*Note: Each VM should be taking up about 9.13GB; however, there is likely some redundant block data within the IOMeter test files on each VM.


Performance - Single VM IOMeter, 2 VMs Idle

Again, the StarWind iSCSI target showed far better performance than the Microsoft software target:

I also tested rebooting one of the idle VMs while this test was running - as expected the IOPS dropped down to the 300 range, and response time went up. The physical disk queue also jumped - again showing this as a limitation (reboot initiated at 11:09 AM).


All 3 VMs running IOMeter

With all 3 VMs running an IOMeter worker, the performance is still very good. Keep in mind that the IOMeter test files are 500MB each - the 3 combined likely fit in the StarWind cache, helping the test along. Only 2 of the 3 VMs are shown:

The network utilization as the 3 tests are started:



The StarWind iSCSI target software has several clear advantages over the Microsoft solution, including a high-speed RAM cache, thin provisioning, and deduplication. With a high-performance disk subsystem that includes controller-based RAM cache as well as faster physical spindles, a major performance boost can be had. Additionally, aggregating network links and adding more RAM to the cache can produce a powerful, fast, and efficient software-based iSCSI storage solution.

Software iSCSI Targets - Part 2a: MS iSCSI - Multiple VMs

In Part 2 of this series, we will look at the performance of the Software iSCSI targets under a heavier load - more specifically, 3 server VMs. While this may not seem like very much load, keep in mind that the backend storage is still just a single 7200RPM spindle, and all networking is over a single 1GbE link. All of the hardware from test 1 is the same, but here are the specs for the three new VMs:

Windows Server 2008R2 SP1 (3x)

  • 1 vCPU, 2GB vRAM
  • 18GB System drive - * Thick provisioned, eager zeroed


The Procedure

Create VMs, mount install ISO, begin installing OS, repeat two more times. During the install, here's what the iSCSI server looks like:

It's a bit hard to tell, but the green graph is iSCSI I/O Bytes/second, the red is iSCSI target disk latency, and yellow is iSCSI requests/second. In the background, you can see that the CPU and RAM are fairly dormant.

The iSCSI server network adapter does not appear to utilize more than 30% of total available bandwidth. *Keep this network utilization graph in mind...the pattern will show up again...*

The next important bit of information: from start to a running Windows desktop took 35 minutes, 53 seconds for the 3 VMs installing simultaneously.

IOMeter testing

First, one VM running an IOMeter test, while the other two are idle:

Once again, network usage does not appear to exceed ~30% during the test. IOMeter results for 1 VM (start of test):

Average after one minute:

Next, 3 VMs running the same IOMeter workers (same as in Part1):

After one minute:

As you can see from the IOPS performance, this may not be the best solution. In Part 2B we'll look at the same tests using a StarWind iSCSI target.

StarWind iSCSI vs. Microsoft iSCSI - Part 1

*NEW* StarWind V8 [BETA] is here, and it looks VERY promising!

For some small to medium sized businesses and even some home power users, shared storage is a must-have. Unfortunately, standard high-performance SANs carry with them a hefty price tag and massive administrative overhead - so an EMC or NetApp filer is often out of the question. What is the answer? Turn an everyday server into an iSCSI storage server using iSCSI target software.

Microsoft's iSCSI target software was originally part of the Windows Storage Server SKU and only available to OEMs. When Windows Storage Server 2008 was released, it was also included in Technet subscriptions making it readily available for businesses and home users to test and validate. Windows Storage Server 2008 R2 was different - it was released as an installable package to any Server 2008 R2 install - but again, it was available through Technet. Then, in April of 2011, Microsoft released the iSCSI target software installer to the public (in this blog post).

Enter StarWind. They have had a software target solution around for quite some time - the current release version is 5.8. The free-for-home-use version is becoming very feature-rich, with thin provisioning, a high-speed cache, and block-level deduplication. The paid versions add multiple server support, HA mirroring, and failover - the full version comparison can be found here.

The Test Setup
I first want to compare both solutions side by side - first testing general OS performance, then more specialized workloads, etc. All network traffic is carried on a Catalyst 2970G switch, on the native VLAN with all other traffic - this is not an optimal configuration, but I want to start with the basics and try to improve performance from there.

iSCSI Target Server

  • Whitebox Intel Xeon 5160 Dual Core 3.0GHz
  • 3GB RAM, single GbE link
  • 60 GB iSCSI LUN presented from standalone 80GB 7200RPM SATA disk

ESXi 5.0 Host Server

  • Whitebox AMD Athlon X2 3.0 GHz
  • 8GB RAM, single GbE link

Windows 7 Test VM

  • 2 vCPU, 2GB RAM
  • Windows 7 SP1 x86
  • 20GB system volume - *Thick provisioned, eager zeroed


Comparison 1: Installing OS


The StarWind target was installed, and a virtual volume presented over iSCSI - 60GB, with deduplication turned on and a 1.5GB cache enabled. First impressions: OS installation was remarkably quick - I did not even time it. During the install, the iSCSI server was clearly using most of its resources for iSCSI operations - the single 1GbE link was saturated at 95%:

The high speed cache feature is very clearly a factor as it allocates the RAM immediately, and the CPU load is all from the StarWind process:


Microsoft iSCSI

The StarWind software was uninstalled and the Microsoft target software installed in its place. A 60GB LUN was presented to ESXi, and OS installation began (the VM was created with the same specs). Immediately, it was obvious that the installation was going much slower than with the StarWind target software. The resources in use on the iSCSI server clearly show this:

Average network use is around 30%:

Same story with CPU use and allocated RAM:


Comparison 2: IOMeter Test

Here is the configuration used for the tests:



This test may be a bit one-sided due to the fact that this test VM is the only one running on this datastore, and thus the entire IOMeter test file is likely being served from the RAM cache. Either way, here are the results:

It will be interesting to see if this performance scales with more VMs (containing similar blocks - in the kernel, etc) and with more RAM in the iSCSI server.

Microsoft iSCSI

The results while using the Microsoft software target:


Part 1 Results

IOMeter clearly shows a 10X improvement in performance for the StarWind target - these results will need to be verified under a heavier load, but I expect that more RAM and faster backing disks will only improve them.

In part 2 I will see if these results will scale with multiple server VM workloads - and also how effective the StarWind deduplication engine is.

Windows Server 8: Offline servicing

From the little that I've looked into Windows Server 8, my favorite new built-in feature is offline servicing. This was possible in the past with Windows Images (WIM files) using the dism.exe tool, but this new feature looks much more promising - the VHD is becoming a very powerful format.
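As a rough sketch of what offline servicing looks like from the command line (the drive letter and package path here are hypothetical, and this assumes the VHD has already been attached - via diskpart or the Hyper-V tools - with its Windows volume mounted at V:\):

```
:: Service the attached VHD's volume offline with dism.exe -
:: no running VM required.
dism /Image:V:\ /Add-Package /PackagePath:C:\updates\update.cab

:: Optionally confirm what is now installed in the offline image.
dism /Image:V:\ /Get-Packages

:: Detaching the VHD commits the changes - they are written in place.
```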



This will make servicing VMs even easier.

2.5

After a lengthy period of downtime, the blog is back up and is now running on 2.5! Now that we've moved into our new place, there should not be any other major interruptions.


SnapMirror from FAS to StoreVault

First, a few warnings:


  • This is NOT supported by NetApp. At all. In any way shape or form.
  • Using anything other than the StoreVault Manager GUI can cause data loss.


You have been warned - do this at your own risk!
First, some background - setting aside the fact that FAS to StoreVault is not supported at all - let's go Back to Basics: SnapMirror -
Volume SnapMirror operates at the physical block level. It replicates the contents of an entire volume, including all Snapshot copies, plus all volume attributes verbatim from a source (primary) volume to a target (secondary) volume. As a result, the target storage system must be running a major version of Data ONTAP that is the same as or later than that on the source.
Here's the problem - the StoreVault will likely be running Data ONTAP (S version) 7.2.x and the FAS will be running 7.3.x, meaning that volume SnapMirror will not even work. In fact, if you try, you will probably receive an unspecified error when trying to initialize the mirror. What's the solution? Try to get your filers on the same major version? Good luck - especially since the StoreVault is EOL. Or use qtree SnapMirror.
The caveats:
What does this mean? It means that all the great features of VSM do not apply - particularly (in my case) SMVI integration. HOWEVER, all that said, it is still possible to efficiently replicate all data from one filer to another on a schedule. If you are mapping volumes (not qtrees) to LUNs in a VMware cluster, you are likely wondering how QSM will work - that's where the trick comes in, and it's fairly simple.
First - remember that SnapMirror is always configured from the destination. Next, use the following syntax to setup a QSM to mirror the entire volume to a qtree:
snapmirror initialize -S SrcFiler:/vol/VolumeName/- DestFiler:/vol/VolumeName/qtreeName
The key is the '/-', which indicates the entire source volume. Also, do NOT create the qtree on the destination filer before initializing the SnapMirror - the initialize will create the qtree for you. This can also be done in the [unsupported] FilerView on the destination StoreVault to enable throttling and a schedule without having to edit the /etc/snapmirror.conf file.
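If you do edit /etc/snapmirror.conf on the destination by hand, an entry for this relationship might look like the following sketch (filer and volume names are hypothetical; kbs= sets the throttle, and the four trailing fields are the minute, hour, day-of-month, and day-of-week of the update schedule):

```
# Mirror the entire source volume into a destination qtree,
# throttled to 2000 KB/s, updating daily at 01:00.
SrcFiler:/vol/VolumeName/- DestFiler:/vol/VolumeName/qtreeName kbs=2000 0 1 * *
```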