Category Archives: Hardware

C-Series lineup

NetApp Announces A Whole New Line

Up until today, if you were looking for a physical ONTAP array for your environment, your choices were the hybrid-flash FAS arrays, offering around 5-10ms of latency, or the sub-millisecond AFF A-series. Sure, there was one anomaly in there, the QLC-based FAS 500f, but that AFF in FAS clothing was just that, an anomaly. While I have no evidence to point to here, my theory is that the 500f was NetApp’s way of dipping their toe in the water of QLC-based arrays. Upon launch, the 500f was pricey and the configurations limited and restricted, both of which were addressed at some point after launch. As an employee at a partner that sells a lot of NetApp, I looked at the 500f when it first launched and then basically never looked at it again because of those two points.

Today, NetApp is announcing the all-new C-Series of QLC-based arrays, the “C” being for “Capacity Enterprise Flash”. While the controllers themselves aren’t new, the fact that they only support QLC media is what is different. While I won’t go into the details of what QLC (quad-level cell) flash is in this post, the fact of the matter is that it is more affordable than triple-level cell (TLC) flash and almost as performant. What this means for those purchasing NetApp arrays is that they can get near the performance of an AFF system at a fraction of the cost. Most of us in the storage world know that 10k and 15k RPM SAS drives are slowly going to be phased out in favour of high-capacity SATA drives and high-performance NAND storage, leaving a void. QLC-based arrays will fill that void, and at a higher performance level. If you start to research QLC vs TLC, you’ll find lots of concerns around durability which are not completely unfounded, but you would have also found these concerns when the industry went from multi-level cell (MLC) to TLC, and that seems to have gone well enough. The technology of the storage devices themselves improves over time, and software-based mitigation strategies such as write avoidance also improve. I’m not knowledgeable enough on this latter point to go into details, but ONTAP is a beast and has all sorts of tricks up its sleeve.

So without further ado, I present NetApp’s Enterprise Capacity Flash line, the AFF C800, AFF C400 and AFF C250:

AFF C800
AFF C400
AFF C250

Quick Specs:

                                      AFF C800      AFF C400              AFF C250
Max drive count (15.3TB NVMe QLC)     144           96                    48
Max effective capacity (5:1)          8.8 PB        5.9 PB                2.9 PB
Max usable capacity (1:1)             1.6 PiB       1.06 PiB              540.37 TiB
Minimum configuration                 12 × 15.3TB   8 × 15.3TB            8 × 15.3TB
100GbE ports per HA pair              20            16                    4
25GbE ports per HA pair               16            12 onboard / 16 HBA   4 onboard / 16 HBA
32Gb FC ports                         32            32                    16

By the numbers

Now some of you may have thought, “I thought there was already a C-Series with the C190?”, and you’d be right. NetApp is repurposing the C-Series branding as well as introducing a successor to the C190, the AFF A150. While the new A150 will still have some restrictions, it won’t be nearly as restrictive as the C190. The physical form factor remains the same as the C190, but the A150 will allow for up to two expansion shelves for a total of 72 SAS SSDs, including the internal ones, in capacities of 960GB, 3.8TB and 7.6TB, coming to a max usable capacity of ~402TiB, or 2.2PB at a 5:1 efficiency ratio.

Back to the new C-Series conversation: they bring with them a new default licensing model, ONTAP One. ONTAP One is something I have personally been asking for for many years at this point, and it includes all of the licenses: Core, Data Protection, Hybrid Cloud and Security & Compliance. Personally, I’m looking forward to not having to worry about what features are available with a certain license offering; instead, the C-Series with ONTAP One as the default licensing model will ensure you or your customers are never left wondering whether an array has a given feature.

The C-Series should be available to quote as of March 27, 2023 and should start shipping by the end of April. This statement, as well as all of the information above, is based on pre-release information I received and may change by launch; I will endeavour to add corrections below should any of the above change.

Migrating from the CN1610 to the BES-53248 for cluster interconnect

In my continuing effort to make the adoption of the BES-53248 more streamlined, I figured I would also write a migration guide, as I personally had to read the documentation more than once to understand it completely. If you haven’t already checked it out, it might be helpful to first consult my first-timers’ guide, as the following guide starts with the assumption that your new switches are racked, the Inter-Switch Links (ISLs) are connected, and the initial configuration has been performed.

Another quick caveat: this is by no means a replacement for the official documentation, and the methods below may or may not be supported by NetApp. If you want the official procedure, it is documented here.

Now that we’ve got the above out of the way, I’ll get down to brass tacks. To keep things simple, we’re going to start with a simple two-node switched cluster which should look like this:

You should also have your new BES switches set up like so:

Next step: let’s make sure we don’t get a bunch of support cases created during the work, by kicking off a pre-emptive AutoSupport that declares a maintenance window:

system node autosupport invoke -node * -type all -message MAINT=2h
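When the maintenance is complete, the same command is conventionally used to end the window; shown here as a reminder rather than a required step in this procedure:

system node autosupport invoke -node * -type all -message MAINT=END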

Elevate your privilege level and confirm all cluster LIFs are set to auto-revert:

set advanced
network interface show -vserver Cluster -fields auto-revert
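If any cluster LIF comes back with auto-revert set to false, flip it before proceeding; a quick sketch, with the LIF name as a placeholder:

network interface modify -vserver Cluster -lif <lif_name> -auto-revert true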

If everything above looks good, it’s time to log in to your second BES, which NetApp wants you to name cs2, and configure a temporary ISL back to your first CN1610. Personally, I feel the temporary ISL is optional, but it can provide a bit of added insurance for your change:

(cs2) # configure
(cs2) (Config)# port-channel name 1/2 temp-isl
(cs2) (Config)# interface 0/13-0/16
(cs2) (Interface 0/13-0/16)# no spanning-tree edgeport
(cs2) (Interface 0/13-0/16)# addport 1/2
(cs2) (Interface 0/13-0/16)# exit
(cs2) (Config)# interface lag 2
(cs2) (Interface lag 2)# mtu 9216
(cs2) (Interface lag 2)# port-channel load-balance 7
(cs2) (Interface lag 2)# exit
(cs2) (Config)# exit

(cs2) # show port-channel 1/2
Local Interface................................ 1/2
Channel Name................................... temp-isl
Link State..................................... Down
Admin Mode..................................... Enabled
Type........................................... Static
Port-channel Min-links......................... 1
Load Balance Option............................ 7
(Enhanced hashing mode)

Mbr     Device/        Port      Port
Ports   Timeout        Speed     Active
------- -------------- --------- -------
0/13    actor/long     10G Full  False        
        partner/long
0/14    actor/long     10G Full  False 
        partner/long
0/15    actor/long     10G Full  False
        partner/long
0/16    actor/long     10G Full  False
        partner/long

At this point, we’re going to disconnect any of the connections to the second CN1610 and run them to the second BES-53248. You may need different cables to ensure they are supported; check Hardware Universe. When you’re done with this recabling step, it should look like this:

Note: It’s this step here that made me realize the temporary ISL is optional since we now have our two sets of LIFs isolated from each other.

Next, let’s put the (optional) temporary ISL into play. At your first CN1610, disconnect the cables connected to ports 13-16 and once they’re all disconnected, assuming these cables are supported by both switches, plug them into ports 13-16 on your second BES, so it looks like this:

Now on the second BES-53248, verify the ISL is up:

show port-channel

Assuming the port-channel is up and running, let’s check the health of our cluster LIFs by issuing the following commands at the cluster command line:

network interface show -vserver Cluster -is-home false
network port show -ipspace Cluster

The first command shouldn’t produce any output once the LIFs have had time to revert. For the second command, you want to make sure all ports are up and healthy. Once all the LIFs have reverted home, you can move all the links from the first cluster node as well as remove the temporary ISLs, so you end up with this:

Run the same two commands as before:

network interface show -vserver Cluster -is-home false
network port show -ipspace Cluster
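If a cluster LIF is being stubborn about making its way home, you can nudge it manually; the wildcard form below reverts everything in the Cluster SVM:

network interface revert -vserver Cluster -lif *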

Provided everything looks good, you’re free to remove the CN1610s from the rack as they are no longer in use. The final step is to clean up the configuration on your second BES-53248 by tearing down the temporary ISL configuration, like this:

(cs2) # configure
(cs2) (Config)# deleteport 1/2 all
(cs2) (Config)# exit
(cs2) # write memory

This guide is by no means a replacement for the official documentation but rather a companion to it. You should always consult the official documentation; I purposely cut out some of the steps to give this guide a bit of a TL;DR feel compared to the docs, but that doesn’t mean I wouldn’t personally run those steps if I were doing the work. This document is only my attempt to clarify the official docs; hopefully it does so for you.

NetApp releases a new AFF and a new FAS(?)

While we ramp up for NetApp INSIGHT next week (the first virtual edition, for obvious reasons), NetApp has announced a couple of new platforms. First off, the AFF A220, NetApp’s entry-level, expandable AFF, is getting a refresh in the AFF A250. While the 250 is a recycled product number, the AFF A250 is a substantial evolution of the original FAS250 from 2004.

The front bezel looks pretty much the same as the A220:

AFF A250 – Front Bezel

Once you remove the bezel, you get a sneak peek of what lies within from those sexy blue drive carriers, which indicate NVMe SSDs inside:

AFF A250 – Bezel Removed

While the NVMe SSDs alone are a pretty exciting announcement for this entry-level AFF, once you see the rear, that’s when the possibilities start to come to mind:

AFF A250 – Rear View

Before I address the fact that there are two slots for expansion cards, let’s go over the internals. Much like its predecessor, each controller contains a 12-core processor. While the A220 contained an Intel Broadwell-DE running at 1.5GHz, the A250 contains an Intel Skylake-D running at 2.2GHz, providing roughly a 45% performance increase over the A220, not to mention 16 third-generation PCIe lanes, the A220 having 8. [*UPDATE: Whoops, this originally read 32.] System memory gets doubled from 64GB to 128GB, as does NVRAM, going from 8GB to 16GB. Onboard connectivity consists of two 10GBASE-T ports (e0a/e0b) for 10-gigabit client connectivity and two 25GbE SFP28 ports for ClusterNet/HA connectivity. Since NetApp continues to keep HA off the backplane in newer models, they keep that door open for HA pairs living in separate chassis, as I waxed about previously here. Both e0M and the BMC continue to share a 1000Mbit RJ-45 port, and the usual console and USB ports are also included.

Hang on, how do I attach an expansion shelf to this? Well, at launch there will be four different mezzanine cards available to slot into one of the two expansion slots per controller. There will be two host-connectivity cards available: one a 4-port, 10/25Gb, RoCEv2, SFP28 card and the other a 4-port, 32Gb Fibre Channel card leveraging SFP+. The second type of card available is for storage expansion: one is a 2-port, 100Gb Ethernet, RoCEv2, QSFP28 card for attaching up to one additional NS224 shelf, and the other is a 4-port, 12Gb SAS, mini-SAS HD card for attaching up to one additional DS224c shelf populated with SSDs. That’s right folks, this new platform will only support up to 48 storage devices, though in the AFF world I don’t see this being a problem. Minimum configuration is 8 NVMe SSDs; max is 48 NVMe SSDs, or 24 NVMe + 24 SAS SSDs, but you won’t be able to buy it with SAS SSDs. That compatibility is being included only for migrating off of, or reusing, an existing DS224x populated with SSDs. If that’s a DS2246, you’ll need to upgrade the IOM modules to 12Gb prior to attachment.

Next up in the hardware announcement is the new FAS(?)…but why the question mark, you ask? That’s because this “FAS” is all-flash. That’s right, the newest FAS to hit the streets is the FAS500f. Now, before I get into those details, I’d love to get into the speeds and feeds as I did above; the problem is that I would simply be repeating myself. This is the same box as the AFF A250, much like how the AFF A220 is the same box as the FAS27x0. The differences between the AFF A250 and the FAS500f are in the configurations and the abilities or restrictions imposed upon it.

While most of the information above can be ⌘-C’d, ⌘-V’d here, this box does not support the connection of any SAS-based media. That fourth mezzanine card I mentioned, the 4-port SAS one? Can’t have it. As for storage device options, much like Henry Ford’s famous quote:

Any customer can have a car painted any color that he wants so long as it is black.

-Henry Ford

Any customer can have any size NVMe drive they want in the FAS500f, so long as it’s a 15.3TB QLC. That’s right, not only are there no choices to be made here other than drive quantity, but those drives are QLC. On the topic of quantity, the available configurations start at a minimum of 24 drives and can be grown to either 36 or 48, but that’s it. So why QLC? By now, you should be aware that the 10k/15k SAS drives we are so used to today for our tier-2 workloads are going away. In fact, the current largest spindle size of 1.8TB is slated to be the last drive size in this category. NetApp’s adoption of QLC media is a direct result of the sunsetting of this line of media. While I don’t expect to get into all of the differences between single-, multi-, triple-, quad- or penta-level cell (SLC, MLC, TLC, QLC or PLC) NAND memory in this post, the rule of thumb is the more levels, the lower the speed, reliability and cost. QLC is slated to be the replacement for 10k/15k SAS, yet it is expected to perform better and be only slightly more expensive. In fact, the FAS500f is expected to be able to do 333,000 IOPS at 3.6ms of latency for 100% 8KB random-read workloads, or 170,000 IOPS at 2ms for OLTP workloads with a 40/60 r/w split.

Those are this Fall’s new platforms. If you have any questions, put them in a comment or tweet at me, @ChrisMaki; I’d love to hear your thoughts on these new platforms. See you next week at INSIGHT 2020, virtual edition!

***UPDATE: After some discussion over on Reddit, it looks like MetroCluster IP will be available on this platform at launch.

Gartner’s new Magic Quadrant for Primary Storage

Hot off the presses is Gartner’s new Magic Quadrant (GMQ) for Primary Storage and it’s great to see NetApp at the top-right, right where I’d expect them to be. This is the first time Gartner has combined rankings for primary arrays and not separated out all-flash from spinning media and hybrid arrays, acknowledging that all-flash is no longer a novelty.

As you can see on the GMQ below, the x-axis represents completeness of vision while the y-axis measures ability to execute, NetApp being tied with Pure on X and leading on Y.

As mentioned, this new MQ marks the retiring of the previous divided GMQs of Solid-State Arrays and General-Purpose Disk Arrays. To read more about NetApp’s take on this new GMQ, head over to their blog post on the subject or request a copy of the report here.

There’s a new NVMe AFF in town!

Yesterday, NetApp announced a new addition to the midrange tier of their All-Flash FAS line, the AFF A320. With this announcement, end-to-end NVMe is now available in the midrange, from the host all the way to the NVMe SSD. This new platform is a svelte 2RU that supports up to two of the new NS224 NVMe SSD shelves, which are also 2RU. NetApp has set performance expectations to be in the ~100µs range.

Up to two PCIe cards per controller can be added; the options are:

  • 4-port 32Gb FC SFP+ fibre
  • 2-port 100GbE RoCEv2* QSFP28 fibre (40GbE supported)
  • 2-port 25GbE RoCEv2* SFP28 fibre
  • 4-port 10GbE SFP+ Cu and fibre
    *RoCE host-side NVMeoF support not yet available

A couple of important points to also note:

  • 200-240VAC required
  • DS, SAS-attached SSD shelves are NOT supported

An end-to-end NVMe solution obviously needs storage of some sort, so also announced today was the NS224 NVMe SSD Storage Shelf:

  • NVMe-based storage expansion shelf
  • 2RU, 24 storage SSDs
  • 400Gb/s capable, 200Gb/s per shelf module
  • Uplinked to controller via RoCEv2
  • Drive sizes available: 1.9TB, 3.8TB and 7.6TB. 15.3TB with restrictions.

Each controller in the A320 has eight 100GbE ports on-board, but not all of them are available for client-side connectivity. They are allocated as follows:

  • e0a → ClusterNet/HA
  • e0b → Second NS224 connectivity by default, or can be configured for client access, 100GbE or 40GbE
  • e0c → First NS224 connectivity
  • e0d → ClusterNet/HA
  • e0e → Second NS224 connectivity by default, or can be configured for client access, 100GbE or 40GbE
  • e0f → First NS224 connectivity
  • e0g → Client network, 100GbE or 40GbE
  • e0h → Client network, 100GbE or 40GbE

If you don’t get enough client connectivity with the on-board ports, then as listed previously, there are myriad PCIe options available to populate the two available slots. In addition to all that on-board connectivity, there’s also MicroUSB and RJ45 for serial console access as well as the RJ-45 Wrench port to host e0M and out-of-band management via BMC. As with most port-pairs, the 100GbE ports are hosted by a single ASIC which is capable of a total effective bandwidth of ~100Gb.

Food for thought…
One interesting design change in this HA pair is that there is no backplane HA interconnect, as has been the case historically; instead, the HA interconnect function is placed on the same connections as ClusterNet, e0a and e0d. This enables some interesting future design possibilities, like HA pairs in differing chassis. Also of interest is the shelf connectivity being NVMe/RoCEv2; while the shelves are currently connected directly to the controllers, what’s stopping NetApp from putting them on a switched fabric? Once they do that, drop the HA-pair concept above and instead have N+1 controllers on a ClusterNet fabric. Scaling, failovers and upgrades just got a lot more interesting.

Raw AutoSupport, tried and true – sysconfig

While NetApp keeps improving the front end that is ActiveIQ, for both pre-sales and support purposes, I constantly find myself going into the Classic AutoSupport and accessing the raw autosupport data; most often it’s sysconfig -a. Recently I was trying to explain the contents to a co-worker and I realized that I should just document it as a blog post. So here is sysconfig explained.

The command sysconfig -a is the old 7-mode command to give you all the hardware information from the point of view of ONTAP. All the onboard ports are assigned to “slot 0” whereas slot 1-X are the physical PCIe slots where myriad cards can be inserted. Here’s one example, I’ll insert comments as I feel it is appropriate. Continue reading

ONTAP 9.4 – Improvements and Additions

While the actual payload hasn’t hit the street yet, here’s what I can tell you about the latest release in the ONTAP 9 family which should be available here any day now. **EDIT: RC1 is here. 9.4 went GA today.

FabricPool

Lots of improvements to ONTAP’s object-tiering code in this release; it appears they’re really pushing development here:

  • Support for Azure Blob, both hot and cool tiers, no archive tier support
    • This adds to the already supported AWS-S3 and StorageGRID Webscale object stores
  • Support for cold-data tiering policies, whereas in 9.2 and 9.3 only the backup and snapshot-only tiering policies were available (see the CLI sketch after this list)
    • The default definition of cold data is 31 days, but it can be adjusted to anywhere from 2-63 days.
    • Not all cold blocks need to be made hot again, such as snapshot-only blocks. Random reads are considered application access, so those blocks are declared hot and written back into the performance tier, whereas sequential reads are assumed to come from indexers, virus scanners and the like, so those blocks are kept cold and are not written back into the performance tier.
  • Now supported in ONTAP Select, in addition to the existing ONTAP and ONTAP Cloud. Wherever you run ONTAP, you can now run FabricPools; the SSD-aggregate caveat still exists.
  • Inactive Data Reporting by OnCommand System Manager to determine how much data would be tiered if FabricPools were implemented.
    • This one will be key to clients thinking about adopting FabricPools
  • Object Store Profiler is a new tool in ONTAP that will test the performance of the object store you’re thinking of attaching, so you don’t have to dive in without knowing what your expected performance should be (also covered in the sketch after this list)
  • Object Defragmentation now optimizes your capacity tier by reclaiming space that is no longer referenced by the performance tier
  • Compaction comes to FabricPools, ensuring that your write stripes are full, with compression and deduplication applied as well
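As a rough CLI sketch of the knobs described above, assuming a hypothetical SVM svm1, volume vol1, node node1 and an attached object store named s3store (some of these options may sit at advanced privilege, and exact syntax can vary slightly by release, so verify against the 9.4 docs):

volume modify -vserver svm1 -volume vol1 -tiering-policy auto
volume modify -vserver svm1 -volume vol1 -tiering-minimum-cooling-days 14
storage aggregate object-store profiler start -object-store-name s3store -node node1
storage aggregate object-store profiler show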

Continue reading

ONTAP 9.3 is out soon, here’s the details you need.

It’s that time of year again, time for an ONTAP release…or at least an announcement. When 9.3 drops, not only will it be an LTS (Long-Term Support) version, it will also continue NetApp’s steady refinement and enhancement of ONTAP.

Simplifying operations:

  • Application-aware data management for MongoDB
  • Adaptive QoS (see the sketch after this list)
  • Guided cluster setup and expansion
  • Simplified data protection setup
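As a sketch of the adaptive QoS piece (policy-group, SVM and volume names are hypothetical and the IOPS/TB figures arbitrary), you define IOPS-per-TB targets that scale with the volume’s size and then assign the policy group to a volume:

qos adaptive-policy-group create -policy-group aqos_gold -vserver svm1 -expected-iops 2048IOPS/TB -peak-iops 4096IOPS/TB
volume modify -vserver svm1 -volume vol1 -qos-adaptive-policy-group aqos_gold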

Efficiencies:

Not so long ago, in ONTAP 9.2, NetApp introduced inline, aggregate-level dedupe. What many people may not have realized is that, due to the way ONTAP coalesces writes in NVRAM prior to flushing them to the durable layer, this inline aggregate dedupe’s domain was restricted to the data sitting in NVRAM. With 9.3, a post-process aggregate scanner has been implemented to provide true, aggregate-level dedupe.
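For reference, the post-process scanner lives under the cross-volume-dedupe command family; the lines below are a sketch against a hypothetical aggregate aggr1, so double-check the options against the 9.3 documentation before running them:

storage aggregate efficiency cross-volume-dedupe start -aggregate aggr1 -scan-old-data true
storage aggregate efficiency cross-volume-dedupe show -aggregate aggr1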

Continue reading

What you need to know about NetApp’s 40GbE options

With the introduction of the new NetApp platforms back in September 2016 came 40GbE as well as 32Gb Fibre Channel connectivity.

I had my first taste of 40GbE on the NetApp side back in January when I got to install the first All Flash FAS A700 in Canada. The client requested a mix of 40GbE and 16Gb FC with some of the 40GbE being broken out into 4 × 10GbE interfaces and some being used natively.

NetApp is deploying two flavours of 40GbE cards: the X1144A for the AFF A300, AFF A700s and FAS8200, and the X91440A for the AFF A700 and FAS9000 storage systems. At first glance, you might be tempted to assume that these are the same PCIe card since the part numbers are very similar, the latter just being in some sort of carrier to satisfy the I/O module requirement for the blade-style chassis that is home to the A700 and FAS9000. Upon further inspection, however, the two are not exactly equal.

The ports on most PCIe cards and onboard interfaces are deployed in pairs, with one shared application-specific integrated circuit (ASIC) on the board behind the physical ports. On the X1144A, both external ports share one ASIC with an available combined bandwidth of 40Gb/s, whereas the X91440A has two ASICs. Each has two ports, but one is internal and not connected to anything, giving you 40Gb/s per external port.

The ASIC (or controller) in question is the Intel XL710. What’s important about this is that both external ports on an X91440A can be broken out to 4 × 10GbE interfaces for a total of eight, or one can remain at 40GbE while the other is broken out. On the X1144A, however, you can either connect both ports to your switch using 40GbE connections, or you can break out port A to 4 × 10GbE, in which case port B gets disabled. According to Intel, if you connect both ports via 40GbE, “The total throughput supported by the 710 series is 40 Gb/s, even when connected via two 40 Gb/s connections.”

Now before we get all up in arms about this, let’s really get into the weeds here. Both the FAS8200/FAS9000 and the AFF A300/A700 are using PCIe 3.0. Each PCIe 3.0 lane can carry 8 gigatransfers per second (GT/s); for the purposes of this post, that is close enough to 8Gb/s. The FAS8200/AFF A300 has an Intel D-1587 CPU with a maximum of eight lanes per slot, so roughly 64Gb/s of throughput, whereas the FAS9000/AFF A700 has an Intel E5-2697 with a maximum of 16 lanes per I/O slot, which gives it about 128Gb/s of throughput. So even if NetApp included a network interface card for the A300/FAS8200 with two XL710s on it, the PCIe slot it’s connected to couldn’t provide 80Gb/s of throughput, whereas the I/O modules in the A700/FAS9000 can.

Say you want to change between 40GbE and 10GbE. Unlike modifying UTA2 profiles (as explained here), with the XL710, you need to get into maintenance mode first and use the nicadmin command. Here’s an example:

sysconfig output before:

slot 1: 40 Gigabit Ethernet Controller XL710 QSFP+
                 e1a MAC Address:    00:a0:98:c5:b2:fb (auto-40g_cr4-fd-up)
                 e1e MAC Address:    00:a0:98:c5:b2:ff (auto-unknown-down)

At this point I already had the breakout cable installed. That’s why the second link shows as down.

Conversion example:

*> nicadmin
 nicadmin convert -m { 40G | 10G } <port-name>
 
 
 *> nicadmin convert -m 10g e1e
 Converting e1e 40G port to four 10G ports
 Halt, install/change the cable, and then power-cycle the node for
 the conversion to take effect.  Depending on the hardware model,
 the SP (Service Processor) or BMC (Baseboard Management Controller)
 can be used to power-cycle the node.

sysconfig output after:

slot 1: 40 Gigabit Ethernet Controller XL710 QSFP+
                 e1a MAC Address:    00:a0:98:c5:b2:fb (auto-40g_cr4-fd-up)
                 e1e MAC Address:    00:a0:98:c5:b2:ff (auto-10g_twinax-fd-up)
                 e1f MAC Address:    00:a0:98:c5:b3:00 (auto-10g_twinax-fd-up)
                 e1g MAC Address:    00:a0:98:c5:b3:01 (auto-10g_twinax-fd-up)
                 e1h MAC Address:    00:a0:98:c5:b3:02 (auto-10g_twinax-fd-up)

Unfortunately, I don’t have access to either a FAS8200 or an AFF A300 with 40GbE, otherwise I’d provide the sysconfig output before and after there as well.

Now, there’s a bit of a debate going on around the viability of 40GbE versus 100GbE. While 40GbE is simply four 10GbE lanes combined, 100GbE is four 25GbE lanes combined. With regards to production costs, to make a 40GbE QSFP+ you literally combine four lasers (hence the Q in QSFP) into the module, and the same goes for 100GbE. You only need one laser to produce the wavelength for 25GbE, and while that still means you need four for 100GbE, that same four-laser production cost yields 250% of the throughput of 40GbE, which makes me wonder where things will end up in a year.

So there you go, more than you ever wanted to know about NetApp’s recent addition of 40GbE into the ONTAP line of products as well as my personal philosophical waxing around the 40 versus 100 GbE debate.

NetApp Volume Encryption, The Nitty Gritty

It all begins in the configuration builder tool

This article focuses on the implementation and management of encryption with NetApp storage. Data at Rest Encryption (NetApp Volume Encryption or NVE for short) is one of the ways that you can achieve encryption with NetApp, and it’s one of the most exciting new features of ONTAP 9.1. Here’s how you go about implementing it.

If you’re a partner or NetApp SE, when building configurations, as long as the cluster software version is set to 9.x, there is a checkbox that lets you decide which version of ONTAP gets written to the device at the factory. As of 9.1, ONTAP software images will either be capable of encryption via a software encryption module, or not. There are laws around both the import and export of software that is capable of encryption, but that is beyond the scope of this article. I do know you can use the encryption-capable image in Canada (where I am located), so I’m covered. If you’re unsure about the laws in your country, consult your legal adviser on this matter.

Once this cluster-level toggle has been set and you add hardware into the configuration, there are two more checkboxes in the software section:

  1. NetApp Volume Encryption (off by default)
  2. Trusted Platform Module (TPM, on by default) ***Clarification Update*** – TPM NOT REQUIRED FOR NVE

The first one triggers the generation of the license key for NVE and the second one activates a piece of hardware dedicated to dealing with cryptographic keys. One thing I’m still not sure of (should you choose to remove the checkmark) is whether the TPM is simply disabled or doesn’t physically exist in your NetApp controller; I have an email in to NetApp to confirm this. [Update: The module is integral to the controller and disabled in firmware if being shipped to certain countries. Shout out to @Keith_Aasen for tracking that down for me.]

Okay, now for the more customer-relevant information…

To get started with NVE, you’re going to need a few things:

  1. An encryption-capable platform
  2. An encryption-capable image of ONTAP
  3. A key manager
  4. A license key for NVE

Encryption-capable platform

The following platforms are currently capable of encryption: FAS6290, FAS80xx, FAS8200, and AFF A300. This is limited by the CPU in the platform as it must have a sufficient clock-speed and core-count with support for the AES instruction set. I’m sure this list will be ever-expanding, but be sure to check first if you’re hoping to use NVE. [UPDATE: After some digging, I can confirm that all the new models support NVE, the entry-level FAS2650 included.]

Encryption-capable image of ONTAP

Provided you’re not in a restricted country as per the above, your image will follow the standard nomenclature of X_q_image.tgz, where X is the version number. The non-encryption-capable version will be X_q_nodar_image.tgz, which I’ll simply refer to as nodar(e) (No Data At Rest Encryption) for the rest of this article. The output of version -v will tell you whether you’re running the standard or the nodar image.

Standard: NetApp Release 9.1RC1: Sun Sep 25 20:10:49 UTC 2016 <1O>

no-DARE: NetApp Release 9.1RC1: Sun Sep 25 20:10:49 UTC 2016 <1Ono-DARE>

Key manager

The on-board key manager introduced in ONTAP 9.0 enables you to manage keys for use with your NSE drives, helping you avoid costly and possibly complex external solutions. Currently, NVE only supports using the on-board manager, so if you’re going to use NVE layered on top of NSE, you need to use the on-board one.

Setting this up is exactly one command:

security key-manager setup

You’ll be prompted for a passphrase, and that’s it, you’re done.
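If you want to sanity-check the result, the key-manager command family has show subcommands; the exact syntax has moved around between ONTAP releases, so treat these as a sketch and lean on the docs for your version:

security key-manager show
security key-manager key show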

License key for NVE

If you didn’t get this license key at time of purchase, talk to your account representative or SE over at NetApp (though, hopefully, if you’ve bought one of the new systems announced at Insight 2016, they decided to include it since, at least for now, it is a no-cost license).

What next?

Now that you’ve got all the prerequisites covered, encrypting your data is very simple. As the name implies, encryption is done at the volume level, so naturally it’s a volume command that encrypts the data (a volume move command, in fact):

volume move start -volume vol_name -destination-aggregate aggr_name -encrypt-destination true

The destination aggregate can even be the same aggregate that the volume is already hosted on. Don’t want that volume encrypted anymore for some reason? Change that last flag to false.
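In other words, reusing the same placeholder names as above, decrypting is just another volume move:

volume move start -volume vol_name -destination-aggregate aggr_name -encrypt-destination false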

If you’re creating a new volume that you want encrypted, that’s just as simple:

volume create -volume vol_name -aggregate aggr_name -size 1g -encrypt true
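To confirm which volumes ended up encrypted, the volume’s is-encrypted field is the quickest check; a small sketch:

volume show -fields is-encrypted
volume show -is-encrypted true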

Wrapping up

NetApp Volume Encryption is pretty easy, but since it’s so new, OnCommand System Manager doesn’t support it just yet. You’ll have to stick to the CLI for now, although I’m sure the GUI will catch up eventually, if that’s your preferred point of administration. It should also be noted that while NSE solutions are FIPS 140-2 compliant, NVE has yet to go through the qualifications. Also, if FIPS is a requirement, the on-board key manager isn’t compliant yet either. Since with the on-board key manager the keys are literally stored on the same hardware using them, NVE only protects you from compromised data on individual drives removed from your environment through theft or RMA. If someone gained wholesale access to the HA pairs, the data would still be retrievable. Also, this is for data-at-rest only. You must follow other precautions for data-in-flight encryption.

Into the weeds

I did all my tests for this post using the simulator, and I learned a lot, but your mileage may vary. In the end, only you are responsible for what you do to your data. I had heard that if you have the wrong software image then you’d have to do a complete wipe of your HA pair in order to convert it. I have since proven this wrong (at least in the simulator) and I definitely can’t guarantee the following will be supported.

For my tests I had two boot images loaded: one standard and one nodar. What I learned is that you can boot into either mode, provided you don’t have any encrypted data. Even if you have the key manager setup and NVE is licensed, you can still boot back and forth. The first time you boot your system using the nodar image with encrypted data on the system, however, you’ll hose the whole thing. I did test first encrypting data, then decrypting it, then converting to nodar, and the simulator booted fine. When I booted into nodar with an encrypted volume, even going back to standard didn’t work. Booting into maintenance mode shows the aggregates with a status of partial and the boot process hints that they are in some sort of transition phase (7MTT?). Either way, I was unable to recover my simulator once I got it to this state, so I definitely advise against it in production. Heck, I’d advise you just to use the proper image to start with.

I hope you learned something. If you have any questions or comments, either post them below or reach out on twitter. I’m @ChrisMaki from the #NetAppATeam and Solution Architect @ScalarDecisions.