How to run nested ESXi 7 on QNAP’s Virtualization Station

**Important: there is an update at the end that you should read before investing any time in this.**

This weekend I found myself in need of an additional ESXi host, so instead of acquiring new hardware I figured I might as well run it nested on my beefy QNAP TVS-h1288X with its Xeon CPU and 72GB of RAM. I already use the QEMU-based Virtualization Station (VS) for hosting my primary domain controller, and it’s my go-to host for spinning up my ONTAP simulators, so I figured nesting an ESXi VM shouldn’t be that difficult. What I hadn’t taken into account, however, was that VMware has deprecated the VMKlinux driver stack, removing support for all of the NICs VS makes available in the GUI when provisioning new virtual machines. At first I researched injecting drivers or rolling my own installation ISO, but these approaches seemed overly complicated and their documentation somewhat outdated. Instead I decided to get inside VS and see what I could do from that angle; it is, after all, simply QNAP’s own packaging of QEMU.

I started the installation process, but it wasn’t long before I received this error message:

ESXi 7 No Network Adapters error message

I shut down the VM, and changed the NIC type over and over eventually exhausting the five possibilities presented in the VS GUI:

Not even the trusty old e1000 NIC, listed as Intel Gigabit Ethernet above, worked, so over to the CLI I went. Some Googling led me to believe there was a command that would produce a list of supported virtualized devices, but the commands I was finding were for native KVM/QEMU installs, not VS. I poked around, came across the qemu-system-x86_64 binary, and ran it with the -device help parameter, which produced the following, abbreviated list:

./qemu-system-x86_64 -device help
[VL] This is a NROMAL VM
Controller/Bridge/Hub devices:
name "i82801b11-bridge", bus PCI
................<SNIP>
Network devices:
name "e1000", bus PCI, alias "e1000-82540em", desc "Intel Gigabit Ethernet"
name "e1000-82544gc", bus PCI, desc "Intel Gigabit Ethernet"
................<SNIP>
name "vmxnet3", bus PCI, desc "VMWare Paravirtualized Ethernet v3"
................<SNIP>

That last line is exactly what I was looking for; it led me to believe that QEMU should be able to support the VMXNET3 network device. So I cd’d over to the .qpkg/QKVM/usr/etc/libvirt/qemu directory, opened the XML file associated with my ESXi VM and changed the following section:

<interface type='bridge'>
      <mac address='00:50:56:af:30:fe'/>
      <source bridge='qvs0'/>
      <model type='e1000'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

to:

<interface type='bridge'>
      <mac address='00:50:56:af:30:fe'/>
      <source bridge='qvs0'/>
      <model type='vmxnet3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

I saved the file and, for good measure, also restarted VS. I booted the VM and received the same error message as above. This time I cd’d over to .qpkg/QKVM/var/run/libvirt/qemu and had a look at the XML file representing the running config of the VM; the NIC was still set to e1000. It took a bit of hacking around to determine that, to make this change persistent, I needed to edit the XML file using:

virsh edit 122d6cbc-b47c-4c18-b783-697397be149b

That last string of text is the UUID of the VM in question. If you’re unsure of a given VM’s UUID, simply grep “qvs:name” across the XML files in the .qpkg/QKVM/usr/etc/libvirt/qemu directory. I made the same change as before, exited the editor and booted the VM once again. This time, success! My ESXi 7.0u2 host booted fine and didn’t complain about the network. I went through the configuration and it is now up and running. The GUI still lists the NIC as Intel Gigabit Ethernet.
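The UUID lookup can be sketched like this. To keep the example self-contained, a temp directory with made-up domain XML files stands in for .qpkg/QKVM/usr/etc/libvirt/qemu, and the VM names are hypothetical:

```shell
# Illustration only: on the NAS the real files live under
# .qpkg/QKVM/usr/etc/libvirt/qemu; the temp dir and VM names here are made up.
dir=$(mktemp -d)
cat > "$dir/122d6cbc-b47c-4c18-b783-697397be149b.xml" <<'EOF'
<domain><metadata><qvs:name>ESXi7</qvs:name></metadata></domain>
EOF
cat > "$dir/0a1b2c3d-0000-0000-0000-000000000000.xml" <<'EOF'
<domain><metadata><qvs:name>DC01</qvs:name></metadata></domain>
EOF
# Which file, and hence which UUID, belongs to the VM named "ESXi7"?
grep -l 'qvs:name>ESXi7' "$dir"/*.xml   # prints the matching filename (= UUID.xml)
```

On the NAS itself, the same grep against the real directory gives you the filename whose basename is the UUID to feed to virsh edit.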

I’m reluctant to make any changes to the VM using the GUI at this time for fear of the NIC information changing, but I’m okay not using the GUI if it means being able to nest ESXi 7 on Virtualization Station for testing purposes.

**Update:** While the ESXi 7.0u2 VM would boot fine, I have been unable to actually add it to my vCenter Server. I tried running the VM on my physical ESXi host and was able to add it to vCenter; I then powered down the ESXi VM and imported it into VS. The import worked, but the host then showed as disconnected in vCenter. Next I tried importing vCenter into the virtualized ESXi host, but it won’t boot because VS isn’t presenting the VT-x flag, even though I have CPU passthrough enabled. I’m still going to try to get this going, but won’t have time to devote to troubleshooting VS for a couple of days.
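For anyone who wants to poke at the VT-x problem themselves, the libvirt-level knob would be the cpu element in the domain XML, again via virsh edit. This is only a sketch of what I would try; I have not confirmed that VS honours it:

```xml
<!-- Hypothetical: request full CPU passthrough and require the vmx flag.
     Whether Virtualization Station preserves this setting is exactly
     what remains to be tested. -->
<cpu mode='host-passthrough' check='none'>
  <feature policy='require' name='vmx'/>
</cpu>
```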

macOS How-To Guide: Installing vCenter Server Appliance from ISO directly

Ever since macOS started enforcing code signing there’s been the occasional hoop to jump through to get non-App Store software to run. Typically it’s as easy as right-clicking on the binary and choosing Open, which is all well and good until that application needs to launch a subsequent one. Recently I downloaded the ISO for vCenter Server Appliance and double-clicked on it to mount it. I then navigated to:

/Volumes/VMware VCSA/vcsa-ui-installer/mac

Once here, I double-clicked Installer[.app] and of course got the following:

I dutifully two-finger-clicked (i.e., right-clicked), chose Open and then Open again, and proceeded with the initial vCenter configuration. Not too far into the process, Installer wanted to call ovftool, but since this was a direct launch, I received a message similar to the previous one:

At this point, I couldn’t find a way to work around this security control directly, so I decided to do some research. As with most things VMware-related, I came across a post by William Lam, specifically one entitled How to exclude VCSA UI/CLI Installer from MacOS Catalina Security Gatekeeper?. This post led me to this command:

sudo xattr -r -d com.apple.quarantine <directory of ISO contents>

but that assumes you’ve copied the contents of the ISO to another drive rather than running it directly from the read-only file system it mounts as, much like the DVD media it represents. I copied all ~8GB of the ISO to my local SSD, issued the command above and, sure enough, it was going to work. I wasn’t happy with this, however, and was determined to run the installer from the ISO as intended. The root of the problem is that when you mount the ISO, either by double-clicking it in Finder or issuing an hdiutil mount image.iso, macOS mounts the file system with the quarantine option:
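You can see the flag from Terminal with `mount`. To keep the snippet runnable anywhere, the sample line below stands in for what macOS prints for a quarantined ISO mount; on a real Mac you’d pipe `mount` itself:

```shell
# Sample output line from `mount` for a quarantined ISO on macOS;
# on a real system, replace the echo with `mount` itself.
sample='/dev/disk3 on /Volumes/VMware VCSA (cd9660, local, nodev, nosuid, read-only, noowners, quarantine, mounted by chris)'
echo "$sample" | grep -o 'quarantine'
```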

I did some quick research on how to mount an ISO without this option using any of mount, hdiutil, diskutil or Disk Utility[.app] to no avail. I did notice however that after I unmounted the ISO using diskutil unmount /dev/disk3 that the image remained in the sidebar albeit greyed out:

I decided to right-click on it here and choose Mount:

A quick check over in Terminal[.app] and voilà, no quarantine!

At this point I navigated to the Installer once again and was able to run through to completion without any security notifications. I’m not sure if this is a well known workaround but I didn’t find any reference to it online so hopefully someone will benefit from it.

TL;DR: Instructions for running installers directly from read-only mounted file systems on macOS:

  • Double-click the image so that it mounts normally with DiskImageMounter.app
  • Issue the following CLI command to unmount it: diskutil unmount /dev/diskX
  • Launch Disk Utility.app
  • Right-click the image name in the side bar and choose Mount
  • Your installer can be run directly from the r/o filesystem
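The steps above can be sketched as a script; the ISO filename and disk identifier are assumptions you’ll need to adjust, and the final remount remains a GUI step in Disk Utility:

```shell
# The TL;DR as a script. DRY_RUN=1 prints each command instead of running it;
# set DRY_RUN= (empty) on a real Mac to actually execute them.
DRY_RUN=${DRY_RUN-1}
ISO="$HOME/Downloads/VMware-VCSA-all.iso"   # hypothetical filename, adjust to yours
DISK=/dev/disk3                             # find yours with `hdiutil info`
run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

run open "$ISO"               # 1. mount normally via DiskImageMounter (quarantined)
run diskutil unmount "$DISK"  # 2. unmount from the CLI; the image stays in Disk Utility's sidebar
run open -a "Disk Utility"    # 3. right-click the greyed-out image there and choose Mount
```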

Installing the ONTAP 9.7 simulator in Fusion 12.1

At the time of writing, 9.8 is available, but I’m specifically writing this for someone who is trying to install 9.7 and having problems. Before I get into the actual simulator installation, we need to cover some background on VMware Fusion first.

With regards to networking, VMware Fusion can provide three different interface types, they are as follows:

  1. Bridged – this type puts the interface directly on the same LAN as your Mac, this is great if you want the VM to appear as though it’s on the network that your Mac is using.
  2. Host-only – this is a completely isolated network, the only hosts that can access it are those on your Mac configured with this type of interface. There is no external access with this type.
  3. NAT – this is similar to number two, but allows the host with this type to reach out of the Mac, such as for Internet access.

    If you want more details on this please go read this KB.
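For reference, these three types map to the connectionType key in the VM’s .vmx file. The excerpt below is hypothetical, though the key names are standard .vmx settings:

```
ethernet0.connectionType = "hostonly"   # host-only
ethernet1.connectionType = "nat"        # NAT
ethernet2.connectionType = "bridged"    # bridged
```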

By default, the simulator has four network interfaces. The first two, e0a/e0b, are for the ClusterNet network, the back-end network used by cluster nodes to communicate with each other, and should be of type host-only. The second two, e0c/e0d, are for client and management access; these are of type NAT but can also be set to bridged. If you use NAT, VMware will assign IP addresses via DHCP based on the configuration of the vmnet8 interface; to view that configuration, cat the file located here:

/Library/Preferences/VMware\ Fusion/vmnet8/dhcpd.conf

Mine looks like this:

allow unknown-clients;
default-lease-time 1800;                # default is 30 minutes
max-lease-time 7200;                    # default is 2 hours

subnet 172.16.133.0 netmask 255.255.255.0 {
	range 172.16.133.128 172.16.133.254;
	option broadcast-address 172.16.133.255;
	option domain-name-servers 172.16.133.2;
	option domain-name localdomain;
	default-lease-time 1800;                # default is 30 minutes
	max-lease-time 7200;                    # default is 2 hours
	option netbios-name-servers 172.16.133.2;
	option routers 172.16.133.2;
}
host vmnet8 {
	hardware ethernet 00:50:56:C0:00:08;
	fixed-address 172.16.133.1;
	option domain-name-servers 0.0.0.0;
	option domain-name "";
	option routers 0.0.0.0;
}

What this means is that any interface set to NAT in my instance of Fusion will receive a DHCP address in the subnet 172.16.133.0/24, but the DHCP pool itself is only 172.16.133.[128-254]. The subnet mask will still be 255.255.255.0 (i.e., /24) and the default gateway is 172.16.133.2, as that is the internal interface of the virtual router created to do the NAT; .1 is held by the “external” interface, which you can view by issuing ifconfig vmnet8 at the command prompt. Note that this interface is created when Fusion is launched and torn down when you quit. If you set the interface type to bridged, those interfaces will get DHCP addresses from the same LAN the Mac is connected to.
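To see the subnet and pool boundaries at a glance, you can pull them straight out of the file. The here-doc stands in for the real dhcpd.conf so the snippet is self-contained; on a real Mac, cat the file at the path above instead:

```shell
# Extract the NAT subnet and DHCP pool from a dhcpd.conf-style file, so you
# know which addresses are safe to assign statically (inside the subnet but
# outside the pool, avoiding .1 and .2 which Fusion reserves).
conf=$(cat <<'EOF'
subnet 172.16.133.0 netmask 255.255.255.0 {
	range 172.16.133.128 172.16.133.254;
	option routers 172.16.133.2;
}
EOF
)
echo "$conf" | awk '/^subnet/ {print "subnet: " $2 "/" $4}
                    /range/  {print "dhcp pool: " $2 " - " substr($3,1,length($3)-1)}'
```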

On to the actual installation…

First thing you need to do is download the OVA from NetApp:

  1. Go to https://support.netapp.com/
  2. Log in (yes, it’s required)
  3. At the top, click Downloads → Product Evaluation
  4. Click “Data ONTAP™ Simulator”
  5. Agree to the terms
  6. Download the OVA and license keys for the version you’re looking for.

Now that you have the OVA, you’re ready to import it into Fusion. Launch Fusion, then click the + sign and choose Import:

import

Browse for and open the downloaded OVA:

choose file
open

Now click continue:

continue

Give the folder you’re going to store it in a name and click Save; I like to name it after the node:

ONTAP 9.7, Node 1

Fusion will import the OVA and present you with the settings. You can modify them if you want, but for now I’m going to leave them as default. Click Finish:

finish

You’ll likely be asked if you’d like to upgrade the VM version, don’t bother:

At this point the vSIM will boot for the first time. I believe the official instructions tell you to hit CTRL-C to halt the boot, call up the maintenance menu and issue option 4, but if this is the first node you don’t have to do that. The root aggregate is created automatically:

First boot with aggr0 creation

Now you can open a browser and point it at the IP address listed on your screen; in my case that’s https://172.16.133.132/, but it may be different for you. You will get a certificate error; bypass it to access the GUI and finish the configuration. If you do not get the following screen, or get no site at all, something else is wrong. Also, hover your mouse over the node in the Health card; if the serial number doesn’t appear, refresh the web page, otherwise configuration will fail:

No node serial

It should look like this:

With node serial

Now enter all the required information. Since the IP addresses are being statically assigned, I’m choosing ones outside of the DHCP range, as should you:

Cluster name and admin password
Networking information

I don’t check the “single-node” box; the cluster will still work as a single node if you don’t, but checking it removes the ClusterNet interfaces completely. I like having those interfaces for experimentation and teaching purposes, and it keeps the door open to adding a second node, which I will cover in a follow-up post if there is interest. Now click Submit:

other info

At this point I like to start pinging either the cluster IP I specified or the node IP so I can see when the cluster gets configured since the browser doesn’t always refresh to the new IP address:

ping

Once ping starts responding, go ahead and visit the new IP address via your browser:

Now, the person I wrote this blog entry for isn’t getting the GUI above, but instead the GUI of the out-of-band interface of a UCS server, because the IP space their vmnet8 uses collides with production IP space. You can verify this by disconnecting any Ethernet connections and turning off Wi-Fi; once that is done, reload the browser and the IP conflict should be resolved until you’re connected once again. To resolve it permanently, they will need to edit the vmnet8 dhcpd.conf file mentioned above, using a subnet known not to conflict. Here’s an example alternative dhcpd.conf:

allow unknown-clients;
default-lease-time 1800;                # default is 30 minutes
max-lease-time 7200;                    # default is 2 hours

subnet 10.0.0.0 netmask 255.255.255.0 {
	range 10.0.0.128 10.0.0.254;
	option broadcast-address 10.0.0.255;
	option domain-name-servers 10.0.0.2;
	option domain-name localdomain;
	default-lease-time 1800;                # default is 30 minutes
	max-lease-time 7200;                    # default is 2 hours
	option netbios-name-servers 10.0.0.2;
	option routers 10.0.0.2;
}
host vmnet8 {
	hardware ethernet 00:50:56:C0:00:08;
	fixed-address 10.0.0.1;
	option domain-name-servers 0.0.0.0;
	option domain-name "";
	option routers 0.0.0.0;
}

This changes the subnet in use to 10.0.0.0/24 with the DHCP range being 10.0.0.[128-254] and the default gateway of VMs using it to 10.0.0.2.
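Before committing to a replacement subnet, it’s worth a quick sanity check that it isn’t the one your Mac’s LAN already uses. The sample values here are stand-ins for the output of `ipconfig getifaddr en0` and the subnet you plan to put in dhcpd.conf:

```shell
# Quick /24 collision check between the Mac's LAN address and a proposed
# vmnet8 subnet. Sample values; substitute your own.
lan_ip=192.168.7.23    # e.g., from `ipconfig getifaddr en0`
new_subnet=10.0.0      # first three octets of the proposed vmnet8 /24
case "$lan_ip" in
  "$new_subnet".*) echo "conflict: pick another subnet" ;;
  *)               echo "ok: $new_subnet.0/24 does not collide with $lan_ip" ;;
esac
```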

This is where I’m going to end this post for now as the simulator is now accessible via HTTPS and SSH and ONTAP is ready to be configured. You will still need to assign disks, create a local storage tier (aggregate) as well as an SVM with volume(s) for data among other things. The intent of this post was to get this far, not to teach ONTAP. If you’d like to see a post around either adding a second node to the cluster or configuring ONTAP on the first one, please leave a comment and I’ll try and get around to it.

ONTAP 9.8 has been announced

Timed perfectly with NetApp INSIGHT 2020 is the annual ONTAP payload announcement. Once again, there’s a lot in this payload, so I will simply deliver a list of bulleted sections addressing as many of the changes as I’m able, providing additional detail on the ones I find most interesting. For a full rundown, please consult the release notes or start a conversation with me on Twitter.

FlexGroup Volume Enhancements

  • Async Delete
    • Delete large datasets rapidly from the CLI.
      • This is great for those high file count deployments.
  • Backup enhancements
    • 1,023 snapshots supported
    • NDMP enhancements
  • FlexVol to FlexGroup in-place conversion enhancements
  • VMware datastore support
  • Proactive resizing of constituent volumes

FlexCache Volumes, a true global namespace

  • SMB support added with distributed locking
  • 10x origin to cache fan-out ratio, now 1:100
  • Caching of SnapMirror secondary volumes
  • Cache pre-population

Data Visibility

  • File system analytics, viewable in System Manager
    • Enabled on a per-volume basis
    • Can also be queried via API access
  • QoS for Qtrees
    • IOPS and throughput policies available per qtree object
    • Managed via REST API or CLI
    • Qtree-level statistics
    • NFS only in this release, no adaptive QoS

All-SAN Array (ASA) enhancements

  • Persistent FC Ports
    • Symmetric active/active host-to-LUN access
    • Each node on the ASA will maintain a “shadow FC LIF”, reducing SAN failover times even further.
  • Larger Capacities
    • Max LUN size = 128TB
    • Max FlexVol size = 300TB
      • These limit increases are on the ASA only
  • MCC-IP support
  • Priced ~20% less than unified platforms
Before Persistent FC Ports
With Persistent FC Ports

ONTAP S3

  • Preview-only in 9.7, GA in 9.8
  • System manager integration
  • Bucket access policies
  • Multiple buckets per volume
  • TLS 1.2 support
  • Multi-part upload
    ONTAP S3 is not a replacement for a dedicated, global object store

Storage Efficiency Enhancements

  • FabricPool
    • Tiering from HDD aggregates
    • Object tagging (For information life cycle policies)
    • Increased cooling period (max 183 days)
    • Cloud retrieval
  • Storage efficiencies
    • Differentiation of hot and cold data for application of different compression methods: 8K compression groups for hot data, 32K for cold
    • Deduplication prior to compression

Simplification

  • Upgrade directly to a version two releases newer, without passing through an intermediary version
  • Head swaps are supported where the new nodes run a version of ONTAP up to two versions newer than the existing nodes
  • REST API enhancements
    • ZAPI to REST mapping documentation
    • ONTAP version information in API documentation
  • System Manager Improvements
    • Single-click firmware upgrades
    • File system analytics
      • Granular details about your NAS file systems
    • Hardware and Network visualization
    • Data Protection Enhancements
      • Reverse resync
  • Simpler Compliance
    • Volume move support, no second copy required
    • WORM as the default

Security and Data Protection Enhancements

  • Secure purge
    • Cryptographically shred individual files
  • IPSec
    • encrypted network traffic, regardless of protocols
      • Simplifies secure NFS, no need for Kerberos
      • iSCSI traffic on the wire can now be encrypted
  • Node root volume encryption
  • MetroCluster
    • Unmirrored aggregate support
  • SnapMirror
    • SnapMirror Business Continuity (SM-BC) provides automated failover of synchronous SnapMirror relationships for application-level, granular protection
      • These are non-disruptive
      • SM-BC is preview-only in 9.8 and SAN-only.
    • SnapMirror to Object Store
      • Google Cloud, Azure, or AWS
      • Meta Data included so Object Store data is a complete archive
      • Efficiencies maintained
SnapMirror to Object Store

Virtualization Enhancements

  • FlexGroup volumes as VMware datastores
  • SnapCenter backup support
  • 64TB SAN datastore on the ASA
  • SRA support for SnapMirror Synchronous
  • Support for Tanzu storage

That sums up the majority of the improvements, looking forward to this release coming out. See you at NetApp INSIGHT 2020!

NetApp releases a new AFF and a new FAS(?)

While we ramp up for NetApp INSIGHT next week (the first virtual edition, for obvious reasons), NetApp has announced a couple of new platforms. First off, the AFF A220, NetApp’s entry-level, expandable AFF, is getting a refresh in the AFF A250. While 250 is a recycled product number, the AFF A250 is a substantial evolution of the original FAS250 from 2004.

The front bezel looks pretty much the same as the A220:

AFF A250 – Front Bezel

Once you remove the bezel, you get a sneak peek of what lies within from those sexy blue drive carriers, which indicate NVMe SSDs inside:

AFF A250 – Bezel Removed

While the NVMe SSDs alone are a pretty exciting announcement for this entry-level AFF, once you see the rear, that’s when the possibilities start to come to mind:

AFF A250 – Rear View

Before I address the fact that there’s two slots for expansion cards, let’s go over the internals. Much like its predecessor, each controller contains a 12-core processor. While the A220 contained an Intel Broadwell-DE running at 1.5GHz, the A250 contains an Intel Skylake-D running at 2.2GHz providing roughly a 45% performance increase over the A220, not to mention 32, [*UPDATE: Whoops, this should read 16, the A220 having 8.] third generation PCIe lanes. System memory gets doubled from 64GB to 128GB as does NVRAM, going from 8GB to 16GB. Onboard connectivity consists of two 10GBASE-T (e0a/e0b) ports for 10 gigabit client connectivity with two 25GbE SFP28 ports for ClusterNet/HA connectivity. Since NetApp continues to keep HA off the backplane in newer models, they keep that door open for HA-pairs living in separate chassis, as I waxed about previously here. Both e0M and the BMC continue to share a 1000Mbit, RJ-45 port, and the usual console and USB ports are also included.

Hang on, how do I attach an expansion shelf to this? At launch, there will be four different mezzanine cards available to slot into one of the two expansion slots per controller. Two are host-connectivity cards: a 4-port, 10/25Gb, RoCEv2, SFP28 card and a 4-port, 32Gb Fibre Channel card leveraging SFP+. The other two are for storage expansion: a 2-port, 100Gb Ethernet, RoCEv2, QSFP28 card for attaching up to one additional NS224 shelf, and a 4-port, 12Gb SAS, mini-SAS HD card for attaching up to one additional DS224c shelf populated with SSDs. That’s right folks, this new platform will only support up to 48 storage devices, though in the AFF world I don’t see this being a problem. The minimum configuration is 8 NVMe SSDs; the maximum is 48 NVMe SSDs, or 24 NVMe + 24 SAS SSDs, though you won’t be able to buy it with SAS SSDs. That compatibility is included only for migrating off of, or reusing, an existing DS224x populated with SSDs. If that’s a DS2246, you’ll need to upgrade the IOM modules to 12Gb prior to attachment.

Next up in the hardware announcement is the new FAS(?)…but why the question mark, you ask? Because this “FAS” is all-flash. That’s right, the newest FAS to hit the streets is the FAS500f. Before getting into the details, I’d love to go through the speeds and feeds as I did above, but the problem is that I would simply be repeating myself. This is the same box as the AFF A250, much like how the AFF A220 is the same box as the FAS27x0. The differences between the AFF A250 and the FAS500f are in the configurations and the abilities or restrictions imposed upon it.

While most of the information above could be ⌘-C’d and ⌘-V’d here, this box does not support the connection of any SAS-based media. That fourth mezzanine card I mentioned, the 4-port SAS one? Can’t have it. As for storage device options, much like Henry Ford’s famous quote:

Any customer can have a car painted any color that he wants so long as it is black.

-Henry Ford

Any customer can have any size NVMe drive they want in the FAS500f, so long as it’s a 15.3TB QLC. That’s right, not only are there no choices to be made here other than drive quantity, but those drives are QLC. On the topic of quantity, the available configurations start at a minimum 24 drives and can be grown to either 36 or 48, but that’s it. So why QLC? By now, you should be aware that the 10k/15k SAS drives we are so used to today for our tier 2 workloads are going away. In fact, the current largest spindle size of 1.8TB is slated to be the last drive size in this category. NetApp’s adoption of QLC media is a direct result of the sunsetting of this line of media. While I don’t expect to get into all of the differences between Single, Multi, Triple, Quad or Penta-level (SLC, MLC, TLC, QLC, or PLC) cell NAND memory in this post, the rule of thumb is the more levels, the lower the speed, reliability, and cost are. QLC is slated to be the replacement for 10k/15k SAS yet it is expected to perform better and only be slightly more expensive. In fact, the FAS500f is expected to be able to do 333,000 IOPS at 3.6ms of latency for 100% 8KB random read workloads or 170,000 IOPS at 2ms for OLTP workloads with a 40/60 r/w split.

Those are this Fall’s new platforms. If you have any questions, put them in a comment or tweet at me, @ChrisMaki; I’d love to hear your thoughts on these new platforms. See you next week at INSIGHT 2020, virtual edition!

***UPDATE: After some discussion over on Reddit, it looks like MetroCluster IP will be available on this platform at launch.

What’s going on with Intel’s X710 Ethernet controller?

I’ve previously written about this Ethernet controller back when 40GbE Ethernet was relatively new to NetApp’s FAS and AFF controllers. Since that article, I’ve started to come across various oddities with this Ethernet controller.

Last Fall, I had a customer experiencing problems with LACP during an ONTAP upgrade (9.1 → 9.3 → 9.5P6) on their AFF A700s using the X1144A dual-port 40GbE card, which uses the Intel X710 Ethernet controller. We had the first 40GbE port broken out into 4x10GbE links, two each to either half of a pair of Cisco Nexus N9K-C9396PX switches in the same vPC domain. During a controller reboot, we noticed that on the interface group using multimode_lacp, most or all of the ports wouldn’t come up, and on the Cisco side the ports would become disabled due to too many link up/down events. We immediately suspected cabling problems but quickly dismissed that idea. After some digging, it looked as though NetApp was referencing Cisco Bug ID CSCuv87644 as potentially related. This led me down a long path of investigating the changes made to the networking stack in ONTAP over the past couple of years, and I’ve still got a post in the works on that. The workaround was to increase the debounce timer on the Cisco 9k to 525ms; the default value is 100ms.
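For reference, the debounce timer is configured per interface on the Nexus side. A sketch of the change, with a hypothetical interface name and the 525ms value we landed on:

```
interface Ethernet1/1
  link debounce time 525
```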

The port debounce time is the amount of time that an interface waits to notify the supervisor of a link going down. During this time, the interface waits to see if the link comes back up. The wait period is a time when traffic is stopped.

Source: Cisco

Recently, a different customer of mine was trying to buy a Nimble HF20 and wanted to include the Q8C17B, a four-port 10GbE NIC, also based on the Intel X710 Ethernet controller. The vendor came back to me and said they needed to know whether the customer was going to be using VLAN tagging on the Q8C17B, because if they needed VLAN tagging, they’d have to choose a two-port NIC instead. This confused me, but after some emails back and forth, HPE Nimble Storage Alert # EXT-0061 was referenced as the reason. At some point Nimble will release a patch that updates the firmware on this NIC, hopefully restoring VLAN functionality. A bit of looking around shows that the same VLAN issue has been identified by VMware in KB2149781.

Lastly, I also came across a NIST vulnerability from 2017 regarding the same Ethernet controller; it seems that one has since been addressed in a firmware update.

While the above doesn’t necessarily imply a huge problem with the X710, I simply found these issues interesting and thought I’d collect them all in one post.

ONTAP Fall 2019 Update – 9.7

Right on schedule to coincide with NetApp INSIGHT 2019 is the announcement of the next release of NetApp’s ONTAP, 9.7. Going over the list of improvements, much of what is expected in 9.7 seems incremental. The themes for this release are high performance, simplicity and data protection. This release also brings support for a few new platforms: the FAS8300, FAS8700 and the AFF A400. Also new is a twist on the A220 and A700, the first models in the new All SAN Array (ASA) versions of the all-flash FAS line.

FlexCache, the most recent feature to be brought back from the depths of 7-Mode, gets a bit more attention. First up, both FC and IP MetroCluster support, allowing you to extend a volume namespace across MCC sites with per-site load balancing for NFS clients. Also, FlexGroups can now be the origin volume for FlexCache, allowing for origin volumes greater than 100TB and higher file counts.

In the realm of security, data-at-rest encryption is on by default for all newly created volumes provided there is a key manager configured. ONTAP will encrypt the data using hardware encryption if the drives are available, otherwise it will leverage software-based encryption. Setting up the onboard key manager is now extra simple with a setup wizard available in System Manager.

The MetroCluster network can now co-exist on your data access switches, provided they comply with specifications. MCCs with either an A220 or FAS2750 do not qualify.

There’s an interesting new bit of engineering coming in the new AFF A400 platform where compression will be offloaded to a PCI network card.

FlexGroup improvements include NDMP support, allowing backup by any 3rd-party application that supports NDMP. ONTAP 9.7 brings NFS v4.0 and v4.1 to FlexGroups, including support for pNFS. The long-awaited in-place conversion from FlexVol to single-member FlexGroup is here, allowing you to scale capacity and performance without having to perform a client-based copy. While VMware datastores will work on FlexGroups, this isn’t supported quite yet; if you’re a NetApp partner and you have a customer who would like to use FlexGroups as a VMware datastore, contact your SE.

Another oft-requested feature, this one for FabricPool, is the ability to tier to more than one object store. In 9.7, FabricPool Mirrors are announced, allowing you to tier to two separate object stores. FabricPool Mirrors can be used to add resiliency or to change providers, perhaps to repatriate your data to an on-premises StorageGRID deployment. Staying on the topic of FabricPool, customers wanting to tier to an object store that isn’t officially qualified no longer need an FPVR, though they must perform their own testing to ensure the object store meets their needs. The officially qualified object stores are: Alibaba Cloud Object Storage Service, Amazon S3, Amazon Commercial Cloud Services, Google Cloud Storage, IBM Cloud Object Storage, Microsoft Azure Blob Storage and StorageGRID.

FabricPool Mirrors

Wrapping up the 9.7 updates, ONTAP Select gets NVMe device support, 12-node clusters and NSX-T support on ESXi.

Rubrik and NetApp, did that just happen?

I wasn’t sure I’d ever see the day where I’d be writing about not only a partnership between NetApp and Rubrik but actual technological integration; it always seemed somewhat unlikely. While there have been rumours flying around in the background for some time, the first real sign of cooperation between the two companies was the publication of a Solution Brief on combining NetApp StorageGRID with Rubrik Cloud Data Management (CDM) to automate data lifecycle management through Rubrik’s simple control plane while using StorageGRID as a cloud-scale, object-based archive target. And then…nothing, not even the sound of crickets.

As summer drew to a close and the kids went back to school, those in the inner circle started to hear things, interesting things. If you talked to your local Rubrik reps or sales engineers, the stories they had to tell were around NAS backup with NAS Direct Archive, as well as using older NetApp gear as a NAS target; nothing game-changing. Backing up NAS file systems still involved fully traversing the directory structure, which was time-consuming and performance-impacting; something was still missing.

On September 24th of this year, exactly one month ago, a new joint announcement hit the Internet: Rubrik and NetApp Bring Policy-Based Data Management to Cloud-Scale Architectures. While interesting, it still wasn’t exactly what some of us were waiting for. Well, wait no longer: as of now, Rubrik has officially announced plans to integrate with NetApp’s SnapDiff API. What’s that, you may ask? It is the ability to poll ONTAP via an API call and leverage its internal metadata catalogue to quickly identify the file and directory differences between two snapshots. This is a game changer for indexing NAS backups: since Rubrik will no longer need to scan the file shares manually, backup windows will shrink dramatically. Also, while other SnapDiff licensees can send data to another NetApp target, Rubrik is the first backup vendor to license SnapDiff and be able to send the data to standard public cloud storage.
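To illustrate why this shrinks backup windows (this is a conceptual sketch only, not the actual SnapDiff API, which is an ONTAP call against its internal metadata catalogue), imagine each snapshot as a catalogue mapping file paths to modification times; the diff of two catalogues tells the backup engine exactly which files to fetch, with no filesystem walk at all:

```python
def snapdiff(base, newer):
    """Conceptual stand-in for SnapDiff: compare two snapshot
    catalogues (path -> mtime) and report what changed, without
    trawling the live filesystem."""
    base_paths, new_paths = set(base), set(newer)
    added = new_paths - base_paths
    deleted = base_paths - new_paths
    # A path present in both snapshots changed only if its mtime differs.
    modified = {p for p in base_paths & new_paths if base[p] != newer[p]}
    return added, modified, deleted

# However large the share, the incremental backup touches only
# the three paths reported here.
snap1 = {"/a.txt": 100, "/b.txt": 100, "/c.txt": 100}
snap2 = {"/b.txt": 100, "/c.txt": 150, "/d.txt": 160}
added, modified, deleted = snapdiff(snap1, snap2)
```

The key point is that the work is proportional to the number of changes between snapshots rather than the total file count, which is exactly what makes the approach a game changer for indexing large NAS shares.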

Since the ink is just drying on Rubrik’s licensing of the SnapDiff API, it’s not quite ready in their code yet; integration is being targeted for release 5.2 of CDM. Rubrik will also have a booth at INSIGHT (207) and will be presenting on Tuesday, session number 9019-2, so stop by to see what all the fuss is about. And be sure to look for me and my fellow A-Team members; there’s a good chance you’ll find us hanging around near the NetAppU booth, where you’ll find a pretty cool surprise! You can also find me Wednesday, October 30th, at 11:30 am presenting 3009-2 Ask the A-Team – Building A Modern Data Platform, so register for that today.

Gartner’s new Magic Quadrant for Primary Storage

Hot off the presses is Gartner’s new Magic Quadrant (GMQ) for Primary Storage and it’s great to see NetApp at the top-right, right where I’d expect them to be. This is the first time Gartner has combined rankings for primary arrays and not separated out all-flash from spinning media and hybrid arrays, acknowledging that all-flash is no longer a novelty.

As you can see on the GMQ below, the x-axis represents completeness of vision while the y-axis measures ability to execute; NetApp is tied with Pure on the x-axis and leads on the y-axis.

As mentioned, this new MQ marks the retiring of the previous divided GMQs of Solid-State Arrays and General-Purpose Disk Arrays. To read more about NetApp’s take on this new GMQ, head over to their blog post on the subject or request a copy of the report here.

There’s a new NVMe AFF in town!

Yesterday, NetApp announced a new addition to the midrange tier of their All-Flash FAS line, the AFF A320. With this announcement, end-to-end NVMe is now available in the midrange, from the host all the way to the NVMe SSD. This new platform is a svelte 2RU that supports up to two of the new NS224 NVMe SSD shelves, which are also 2RU. NetApp has set performance expectations in the ~100µs latency range.

Up to two PCIe cards per controller can be added, options are:

  • 4-port 32Gb FC SFP+ fibre
  • 2-port 100GbE RoCEv2* QSFP28 fibre (40GbE supported)
  • 2-port 25GbE RoCEv2* SFP28 fibre
  • 4-port 10GbE SFP+ Cu and fibre
    *RoCE host-side NVMe-oF support not yet available

A couple of important points to also note:

  • 200-240VAC required
  • DS-series SAS-attached SSD shelves are NOT supported

An end-to-end NVMe solution obviously needs storage of some sort, so also announced today was the NS224 NVMe SSD Storage Shelf:

  • NVMe-based storage expansion shelf
  • 2RU, 24 storage SSDs
  • 400Gb/s capable, 200Gb/s per shelf module
  • Uplinked to controller via RoCEv2
  • Drive sizes available: 1.9TB, 3.8TB and 7.6TB. 15.3TB with restrictions.

Each controller in the A320 has eight 100GbE ports on-board, but not all of them are available for client-side connectivity. They are allocated as follows:

  • e0a → ClusterNet/HA
  • e0b → Second NS224 connectivity by default, or can be configured for client access, 100GbE or 40GbE
  • e0c → First NS224 connectivity
  • e0d → ClusterNet/HA
  • e0e → Second NS224 connectivity by default, or can be configured for client access, 100GbE or 40GbE
  • e0f → First NS224 connectivity
  • e0g → Client network, 100GbE or 40GbE
  • e0h → Client network, 100GbE or 40GbE

If you don’t get enough client connectivity with the on-board ports, then as listed previously, there are several PCIe options available to populate the two available slots. In addition to all that on-board connectivity, there are also MicroUSB and RJ-45 ports for serial console access, as well as the RJ-45 wrench port hosting e0M and out-of-band management via BMC. As with most port pairs, each pair of 100GbE ports is hosted by a single ASIC capable of a total effective bandwidth of ~100Gb.

Food for thought…
One interesting design change in this HA pair is that there is no backplane HA interconnect, as has been the case historically; instead, the HA interconnect function is placed on the same connections as ClusterNet, e0a and e0d. This enables some interesting future design possibilities, like HA pairs in differing chassis. Also of interest is the shelf connectivity being NVMe/RoCEv2; while the shelves are currently connected directly to the controllers, what’s stopping NetApp from putting these on a switched fabric? Once they do that, drop the HA pair concept above and instead have N+1 controllers on a ClusterNet fabric. Scaling, failovers and upgrades just got a lot more interesting.