Category Archives: NetApp

ONTAP One for all and all for One

Back in February, NetApp announced their new QLC-based AFF systems, the NetApp C-Series, the C being for capacity flash. That new product line alone was celebration-worthy, but what was really exciting, and involved a touch of burying the lede, was the inclusion of a licensing model called ONTAP One, NetApp's all-you-can-eat licensing option. When a C-Series system is licensed with ONTAP One, you get to use all of the features of ONTAP. At the time of launch, my only complaint was that it was only available on the new platform, but behind the scenes I was told to watch that space. Well, as of today, all new and existing FAS, AFF and ASA systems licensed with anything more than the bare minimum can get licenses for everything ONTAP.

NetApp has simplified their licensing to only two options, ONTAP Base and ONTAP One. If your existing system had either Flash, Core+DP, or Premium, you are now entitled to ONTAP One licensing. What exactly does that look like? Here’s a picture:

How do you acquire your new licenses, you may ask? Customers with a valid support contract can log in to the NetApp support portal, download a new license file and install it. Some features may require you to upgrade to 9.10.1, but you should really be on at least that release by now.
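If you're curious what your cluster shows before and after installing the new file, a quick check from the ONTAP CLI (standard licensing commands, shown here only as a sketch):

system license show
system license status show

The first lists the individual license entries installed on the cluster; the second summarizes which packages are licensed and how.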

As with all great things, there are some caveats and restrictions but not enough to warrant covering them here. The majority of my readers will be able to proceed as above, and edge cases around the IPA license model versus LICKEYs or SnapMirror Cloud/S3 SnapMirror licenses can be found in the documentation.

ONTAP One is something I've wanted NetApp to introduce for years; it will not only eliminate post-sales problems due to improper license configurations but also remove a FUD point used by their competition.

C-Series lineup

NetApp Announces A Whole New Line

Up until today, if you were looking for a physical ONTAP array for your environment, your choices were the hybrid-flash FAS arrays offering around 5-10ms of latency or the sub-millisecond AFF A-Series. Sure, there was one anomaly in there, the QLC-based FAS500f, but that AFF in FAS clothing was just that, an anomaly. I have no evidence to point to here, but my theory is that the 500f was NetApp's way of dipping their toe in the water of QLC-based arrays. Upon launch, the 500f was pricey and its configurations limited and restricted, both of which were addressed at some point after launch. As an employee at a partner that sells a lot of NetApp, I looked at the 500f when it first launched and then basically never looked at it again because of those two points.

Today, NetApp is announcing the all-new C-Series of QLC-based arrays, the "C" being for "Capacity Enterprise Flash". While the controllers themselves aren't new, the fact that they only support QLC media is what is different. While I won't go into the details of what QLC, or Quad-Level Cell flash, is in this post, the fact of the matter is that it is more affordable than Triple-Level Cell (TLC) flash and almost as performant. What this means for those purchasing NetApp arrays is that they can get near the performance of an AFF system at a fraction of the cost. Most of us in the storage world know that 10k and 15k RPM SAS drives are slowly going to be phased out in favour of high-capacity SATA drives and high-performance NAND storage, leaving a void. QLC-based arrays will fill that void, and at a higher performance level. If you start to research QLC vs TLC, you'll find lots of concerns around durability which are not completely unfounded, but you would have also found these concerns when the industry went from Multi-Level Cell (MLC) to TLC, and that seems to have gone well enough. The technology of the storage devices themselves improves over time, and software-based mitigation strategies such as write avoidance also improve. I'm not knowledgeable enough on this latter point to go into details, but ONTAP is a beast and has all sorts of tricks up its sleeve.

So without further ado, I present NetApp’s Enterprise Capacity Flash line, the AFF C800, AFF C400 and AFF C250:

AFF C800
AFF C400
AFF C250

Quick Specs:

                                     AFF C800       AFF C400              AFF C250
Max drive count (15.3TB NVMe QLC)    144            96                    48
Max effective capacity (5:1)         8.8 PB         5.9 PB                2.9 PB
Max usable capacity (1:1)            1.6 PiB        1.06 PiB              540.37 TiB
Minimum configuration                12 × 15.3TB    8 × 15.3TB            8 × 15.3TB
100GbE ports per HA pair             20             16                    4
25GbE ports per HA pair              16             12 onboard / 16 HBA   4 onboard / 16 HBA
32Gb FC ports                        32             32                    16

By the numbers

Now some of you may have thought, "I thought there was already a C-Series with the C190?", and you'd be right. NetApp is repurposing the C-Series branding as well as introducing a successor to the C190, the AFF A150. While the new A150 will still have some restrictions, it won't be nearly as restrictive as the C190. The physical form factor remains the same as the C190, but the A150 will allow for up to two expansion shelves, for a total of 72 SAS SSDs including the internal ones, in capacities of 960GB, 3.8TB and 7.6TB, coming to a max usable capacity of ~402TiB, or 2.2PB at a 5:1 efficiency level.

Back to the new C-Series conversation: they bring with them a new default licensing model, ONTAP One. ONTAP One is something I have personally been asking for for many years at this point, and it includes all of the licenses: Core, Data Protection, Hybrid Cloud and Security & Compliance. Personally, I'm looking forward to not having to worry about what features are available with a certain license offering; instead, C-Series with ONTAP One as the default licensing model will ensure you or your customers are never left wondering if their array has a given feature.

The C-Series should be available to quote as of March 27, 2023 and should start shipping by the end of April. This statement as well as all of the information above is based on pre-release information I received and may be subject to change at press time. I will endeavour to add corrections below should any of the above change at launch.

Migrating from the CN1610 to the BES-53248 for cluster interconnect

In my continuing effort to make the adoption of the BES-53248 more streamlined, I figured I would also write a migration guide, as I personally had to read the documentation more than once to understand it completely. If you haven't already checked it out, it might be helpful to first consult my first-timers' guide, as the following guide starts with the assumption that your new switches are racked, their Inter-Switch Links (ISLs) connected, and initial configuration performed.

Another quick caveat, this is by no means a replacement for the official documentation and the methods below may or may not be supported by NetApp. If you want the official procedure, that is documented here.

Now that we've got the above out of the way, I'll get down to brass tacks. To keep things simple, we're going to start with a two-node switched cluster, which should look like this:

You should also have your new BES switches set up like so:

Next step, let's make sure we don't get a bunch of cases created by kicking off a pre-emptive AutoSupport:

system node autosupport invoke -node * -type all -message MAINT=2h

Elevate your privilege level and confirm all cluster LIFs are set to auto-revert:

set advanced
network interface show -vserver Cluster -fields auto-revert
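If any cluster LIF comes back with auto-revert set to false, flip it before proceeding; a minimal example, substituting your actual LIF name:

network interface modify -vserver Cluster -lif <lif_name> -auto-revert true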

If everything above looks good, it's time to log in to your second BES, which NetApp wants you to name cs2, and configure a temporary ISL back to your first CN1610. Personally, I feel the temporary ISL is optional, but it can provide a bit of added insurance for your change:

(cs2) # configure
(cs2) (Config)# port-channel name 1/2 temp-isl
(cs2) (Config)# interface 0/13-0/16
(cs2) (Interface 0/13-0/16)# no spanning-tree edgeport
(cs2) (Interface 0/13-0/16)# addport 1/2
(cs2) (Interface 0/13-0/16)# exit
(cs2) (Config)# interface lag 2
(cs2) (Interface lag 2)# mtu 9216
(cs2) (Interface lag 2)# port-channel load-balance 7
(cs2) (Config)# exit

(cs2) # show port-channel 1/2
Local Interface................................ 1/2
Channel Name................................... temp-isl
Link State..................................... Down
Admin Mode..................................... Enabled
Type........................................... Static
Port-channel Min-links......................... 1
Load Balance Option............................ 7
(Enhanced hashing mode)

Mbr     Device/        Port      Port
Ports   Timeout        Speed     Active
------- -------------- --------- -------
0/13    actor/long     10G Full  False        
        partner/long
0/14    actor/long     10G Full  False 
        partner/long
0/15    actor/long     10G Full  False
        partner/long
0/16    actor/long     10G Full  False
        partner/long

At this point, we're going to disconnect any of the connections to the second CN1610 and run them to the second BES-53248. You may need different cables to ensure they are supported; check Hardware Universe. When you're done with this recabling step, it should look like this:

Note: It’s this step here that made me realize the temporary ISL is optional since we now have our two sets of LIFs isolated from each other.

Next, let’s put the (optional) temporary ISL into play. At your first CN1610, disconnect the cables connected to ports 13-16 and once they’re all disconnected, assuming these cables are supported by both switches, plug them into ports 13-16 on your second BES, so it looks like this:

Now on the second BES-53248, verify the ISL is up:

show port-channel

Assuming the port-channel is up and running, let’s check the health of our cluster LIFs by issuing the following commands at the cluster command line:

network interface show -vserver Cluster -is-home false
network port show -ipspace Cluster

The first command shouldn't produce any output, though you should give the LIFs time to revert. With the second command, you want to make sure all ports are up and healthy. Once all the LIFs have reverted home, you can move all the links from the first cluster node as well as remove the temporary ISLs, so you end up with this:
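If a LIF is being stubborn about going home on its own, you can nudge it manually; a quick example using the standard Cluster vserver naming:

network interface revert -vserver Cluster -lif *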

Run the same two commands as before:

network interface show -vserver Cluster -is-home false
network port show -ipspace Cluster

Provided everything looks good, you're free to remove the CN1610s from the rack as they are no longer in use. The final step is to clean up the configuration on your second BES-53248 by tearing down the temporary ISL configuration, done like this:

(cs2) # configure
(cs2) (Config)# deleteport 1/2 all
(cs2) (Config)# exit
(cs2) # write memory
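One optional piece of housekeeping I'd consider at this point, assuming the old CN1610s were registered for switch health monitoring in ONTAP: remove their entries so cluster switch monitoring stops looking for them. A sketch with a placeholder device name; verify the syntax against your ONTAP release first:

system cluster-switch show
system cluster-switch delete -device <CN1610_switch_name>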

This guide is by no means a replacement for the official documentation but rather a companion to it. You should always consult the official documentation; I purposely cut out some of the steps that I felt gave the docs a bit of a TL;DR feel, but that doesn't mean I wouldn't personally run those steps if I were doing the work. This document is only my attempt to clarify the official docs; hopefully it does so for you.

The BES-53248 first-timer’s guide

With the CN1610 starting to get long in the tooth and with more platforms supporting and/or requiring a cluster interconnect network greater than 10Gbit, the need arose for a non-Cisco option. That option is the BES-53248, a "Broadcom Supported" switch produced by Quanta, makers of all things hyperscale, who sell it as the QuantaMesh T4048-IX8. At some point, Broadcom's EFOS is installed on the T4048-IX8 via the Open Network Install Environment (ONIE) and it becomes the product we know as the BES-53248. While definitely a superior switch, supporting 10/25/40/100Gbit, its deployment is not as streamlined, hence this post.

I struggled a bit with how to approach this topic and settled on the following: I will provide a numbered list of steps as a guide and index, then have sections below that expand upon those steps. There could very well be times when you want to perform these steps in a different order, but if this is your first time working on this switch and it's factory-fresh, the steps below are how I would advise proceeding.

  1. Equipment Ordering, including licences
  2. Broadcom Support Account, Firmware Download
  3. Reference Configuration Files (RCF)
  4. Supporting Infrastructure
  5. Initial Configuration

Equipment Ordering, including licences

The BES-53248 has 48 × 10/25Gbit ports and 8 × 40/100Gbit ports; by default, the first 16 × 10/25Gbit ports are available for cluster interconnect connections and the last 2 × 40/100Gbit are reserved for Inter-Switch Links (ISLs), which is already an improvement over the CN1610's 12 × ClusterNet ports. If the environment requires more ports than this, the 10/25Gbit ports can be licensed in blocks of 8 (Part # SW-BES-53248A2-8P-1025G) all the way up to 48, and there is one license (Part # SW-BES-53248A2-6P-40-100G) to activate the remaining 6 × 40/100Gbit ports. Be sure your order also has all the requisite transceivers and cables; consult HWU for specific compatibilities. Lastly, the BES-53248 doesn't ship with rails by default, so make sure your quote shows them if you need them.

When your switches arrive, they will include a manila envelope with licensing information if licenses above the base configuration were ordered. Do not recycle this envelope, as it contains the very important Transaction Key which you will use to generate your license file at this site:

https://efos-licensing.broadcom.com/License/RedeemTransactionKey

Before visiting that link, along with your license keys you’ll need the switch serial numbers which are located on the switches themselves like so:

The license file generation procedure is instant, so not having this ahead of time isn’t that big of a deal provided you have internet access while at the installation site.

Broadcom Support Account, Firmware Download

What isn't instantaneous, however, is the creation of a TechData-provided Broadcom Support Account (BSA), and you need this account to download firmware for the switches. In order to set up a BSA, which hopefully you did a couple of days in advance of requiring the firmware, you need to send an email to support@techdata.com with the following information:

Indicate if OEM (Netapp/Lenovo), Partner/Installer or Customer:
Name of Company device is registered to (if partner/installer):
Requester Name:
Requester Email Address:
Requester Phone Number:
Address where device is located:
Device Model Number: BES-53248
Device Serial Number:

I've found the folks that respond to this email address are pretty easy to deal with, though I'm not sure you'll be able to get your account if you don't already have the serial number; comment below if you know. My account creation took roughly 24 hours, and then I had access to the firmware downloads. Download the appropriate firmware for your environment. The switches I received in August of 2021 shipped with EFOS 3.4.4.6, which was supported in the environment I was deploying into, but so was 3.7.0.4, so that's where I wanted to land.

Reference Configuration Files (RCF)

Download the appropriate RCF for the environment and edit accordingly. If you visit HWU and drill down into the switch category, you can download the RCF from there:

I was converting an AFF8080 from two-node switchless to switched and adding an A400 at 100Gbit. I grabbed RCF 1.7 from Hardware Universe (not where I'd expect to find it, but nice and easy) and uncommented ports 0/49-0/54 by removing the initial exclamation point on the lines in question, since the additional 40/100Gbit license activates all of these ports; I also deleted the lines setting the speed to 40G full-duplex. I hope that in version 1.8 of the RCF this configuration will be applied as a range, since that's the only license option available for purchase on these ports.
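To illustrate what I mean by uncommenting, here's a hypothetical pair of lines (not the actual RCF contents) showing the exclamation point acting as the comment character:

!interface 0/49      (commented out, the line is ignored)
interface 0/49       (exclamation point removed, the line gets applied)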

Supporting Infrastructure

In your site requirements checklist, ensure the availability of an http (or ftp, tftp, sftp, scp) server on the management network. Once the equipment is racked and the management interface cabled, you will need this server to host your EFOS firmware, license files and RCF.
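If you don't have a permanent web server handy, a laptop on the management network will do the job; a minimal sketch, assuming Python 3 is installed and you're sitting in the directory holding the EFOS image, license files and RCF:

python3 -m http.server 80

Binding to port 80 may require elevated privileges; pick a high port if need be and adjust the URLs used in the copy commands later in this post accordingly.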

Initial Configuration

The first time you connect to the device, most likely via serial, and assuming the unit was factory-fresh like mine, the username should be admin and the password should be blank. You will immediately be forced to change the password. I noticed that when I was going through this, copying and pasting the new password didn't work for me but typing the same password did; this may have had something to do with the special characters chosen or the app I was using (Serial.app on macOS). Another thing to be aware of: if you're applying RCF 1.7, you will have to be on EFOS 3.7.0.4 first. The switches I based this post on shipped with 3.4.4.6 and there are some commands in the RCF that aren't compatible, so you'll want to upgrade EFOS before applying RCF 1.7. Also, applying an RCF means wiping any existing configuration first, so you might as well get this out of the way while you are on site.

Once you’ve changed the password, it’s time to configure the management IP address so you can retrieve the license files, EFOS image and RCF from the http server mentioned previously. You’ll need to be logged in, and have elevated your privilege level to enable:

User:admin
Password:************
(CLswitch-01) >enable

(CLswitch-01) #serviceport ip 10.0.0.209 255.255.255.0 10.0.0.1

(CLswitch-01) #show serviceport

Interface Status............................... Up
IP Address..................................... 10.0.0.209
Subnet Mask.................................... 255.255.255.0
Default Gateway................................ 10.0.0.1
IPv6 Administrative Mode....................... Enabled
IPv6 Prefix is ................................ fe80::c218:50ff:fe0b:24c5/64
Configured IPv4 Protocol....................... None
Configured IPv6 Protocol....................... None
IPv6 AutoConfig Mode........................... Disabled
Burned In MAC Address.......................... B4:A9:FC:34:8F:CE

(CLswitch-01) #ping 10.0.0.1
 Pinging 10.0.0.1 with 0 bytes of data:

Reply From 10.0.0.1: icmp_seq = 0. time= 2 msec.
Reply From 10.0.0.1: icmp_seq = 1. time <1 msec.
Reply From 10.0.0.1: icmp_seq = 2. time= 26 msec.

Now that you are on the network, the first thing we should do is add any additional licenses. Here are the commands with an explanation of what they do:

show license
    See how many licenses are currently applied, if any.

show port all | exclude Detach
    Display the currently licensed ports.

copy http://10.0.0.80/switch1_license.data nvram:license-key 1
    Copies the license file from the http server and places it in index 1.

reload
    Reboots the switch.

show license
    Run this after you've logged back in; it should show you something different than the last time you ran it.

show port all | exclude Detach
    This should show additional ports compared to before adding the license.

Once you have added your license file(s), it’s time to upgrade EFOS, here are the commands with an explanation of what they do:

show bootvar
    Shows the images: active, backup, current-active and next-active.

copy active backup
    Copies the active image to the backup slot, just in case.

show bootvar
    Verify that the above worked.

show version
    Shows the version actually running.

copy http://10.0.0.80/FastPath-EFOS-esw-qcp_td3-qcp_td3_x86_64-LX415R-CNTRF-BD6IOQHr3v7m0b4.stk active
    Copies the image on the web server to the active slot.

show bootvar
    Verify the last command.

reload
    Reboots the switch.

show version
    Verify the upgrade worked.

Now that we have upgraded our EFOS image, it’s time to apply the RCF. There really is no point in doing any additional configuration until we’ve done this since we have to destroy our configuration before applying the RCF anyway. Be sure that you’re only applying the default RCF if you haven’t added any licenses. If you have added licences, you need to uncomment the lines that configure the additionally licensed ports. Here are the commands with an explanation:

erase startup-config
    Clears the startup configuration; overlaying an RCF-sourced configuration on top of an existing one can have negative consequences.

copy http://10.0.0.80/BES-53248_RCF_v1.7-Cluster-HA.txt nvram:script BES-53248_RCF_v1.7-Cluster-HA.scr
    Copies the txt file from the web server to NVRAM as a script and renames it in the process.

script list
    Gives you a directory listing of available scripts to confirm the above transfer worked.

script apply BES-53248_RCF_v1.7-Cluster-HA.scr
    Applies the contents of the RCF to the configuration.

show running-config
    Displays the new running configuration to verify successful application of the RCF.

write memory
    Commits the new configuration to non-volatile memory.

reload
    Reboots the switch so the new configuration can take effect.

There, you're all done; now you can proceed with the official guide on (re)configuring the management IP address, SSH and so on. Good luck, and if you have an experience that strays from the above, please let me know so I can update the post.

NetApp releases a new AFF and a new FAS(?)

While we ramp up for NetApp INSIGHT next week (the first virtual edition, for obvious reasons), NetApp has announced a couple of new platforms. First off, the AFF A220, NetApp's entry-level, expandable AFF, is getting a refresh in the AFF A250. While 250 is a recycled product number, the AFF A250 is a substantial evolution of the original FAS250 from 2004.

The front bezel looks pretty much the same as the A220:

AFF A250 – Front Bezel

Once you remove the bezel, you get a sneak peek of what lies within from those sexy blue drive carriers, which indicate NVMe SSDs inside:

AFF A250 – Bezel Removed

While the NVMe SSDs alone are a pretty exciting announcement for this entry-level AFF, once you see the rear, that’s when the possibilities start to come to mind:

AFF A250 – Rear View

Before I address the fact that there’s two slots for expansion cards, let’s go over the internals. Much like its predecessor, each controller contains a 12-core processor. While the A220 contained an Intel Broadwell-DE running at 1.5GHz, the A250 contains an Intel Skylake-D running at 2.2GHz providing roughly a 45% performance increase over the A220, not to mention 32, [*UPDATE: Whoops, this should read 16, the A220 having 8.] third generation PCIe lanes. System memory gets doubled from 64GB to 128GB as does NVRAM, going from 8GB to 16GB. Onboard connectivity consists of two 10GBASE-T (e0a/e0b) ports for 10 gigabit client connectivity with two 25GbE SFP28 ports for ClusterNet/HA connectivity. Since NetApp continues to keep HA off the backplane in newer models, they keep that door open for HA-pairs living in separate chassis, as I waxed about previously here. Both e0M and the BMC continue to share a 1000Mbit, RJ-45 port, and the usual console and USB ports are also included.

Hang on, how do I attach an expansion shelf to this? Well, at launch there will be four different mezzanine cards available to slot into one of the two expansion slots per controller. There will be two host connectivity cards available, one being a 4-port, 10/25Gb, RoCEv2, SFP28 card and the other being a 4-port, 32Gb Fibre Channel card leveraging SFP+. The second type of card available is for storage expansion: one is a 2-port, 100Gb Ethernet, RoCEv2, QSFP28 card for attaching up to one additional NS224 shelf, and the other is a 4-port, 12Gb SAS, mini-SAS HD card for attaching up to one additional DS224c shelf populated with SSDs. That's right folks, this new platform will only support up to 48 storage devices, though in the AFF world I don't see this being a problem. Minimum configuration is 8 NVMe SSDs, max is 48 NVMe SSDs or 24 NVMe + 24 SAS SSDs, but you won't be able to buy it with SAS SSDs. That compatibility is being included only for migrating off of or reusing an existing DS224x populated with SSDs. If that's a DS2246, you'll need to upgrade the IOM modules to 12Gb prior to attachment.

Next up in the hardware announcement is the new FAS(?)…but why the question mark, you ask? That's because this "FAS" is all-flash. That's right, the newest FAS to hit the streets is the FAS500f. Now, before I get into those details, I'd love to get into the speeds and feeds as I did above; the problem is that I would simply be repeating myself. This is the same box as the AFF A250, much like how the AFF A220 is the same box as the FAS27x0. The differences between the AFF A250 and the FAS500f are in the configurations and the abilities or restrictions imposed upon them.

While most of the information above can be ⌘-C’d, ⌘-V’d here, this box does not support the connection of any SAS-based media. That fourth mez card I mentioned, the 4-port SAS one? Can’t have it. As for storage device options, much like Henry Ford’s famous quote:

Any customer can have a car painted any color that he wants so long as it is black.

-Henry Ford

Any customer can have any size NVMe drive they want in the FAS500f, so long as it’s a 15.3TB QLC. That’s right, not only are there no choices to be made here other than drive quantity, but those drives are QLC. On the topic of quantity, the available configurations start at a minimum 24 drives and can be grown to either 36 or 48, but that’s it. So why QLC? By now, you should be aware that the 10k/15k SAS drives we are so used to today for our tier 2 workloads are going away. In fact, the current largest spindle size of 1.8TB is slated to be the last drive size in this category. NetApp’s adoption of QLC media is a direct result of the sunsetting of this line of media. While I don’t expect to get into all of the differences between Single, Multi, Triple, Quad or Penta-level (SLC, MLC, TLC, QLC, or PLC) cell NAND memory in this post, the rule of thumb is the more levels, the lower the speed, reliability, and cost are. QLC is slated to be the replacement for 10k/15k SAS yet it is expected to perform better and only be slightly more expensive. In fact, the FAS500f is expected to be able to do 333,000 IOPS at 3.6ms of latency for 100% 8KB random read workloads or 170,000 IOPS at 2ms for OLTP workloads with a 40/60 r/w split.

Those are this Fall's new platforms. If you have any questions, put them in a comment or tweet at me, @ChrisMaki, I'd love to hear your thoughts on these new platforms. See you next week at INSIGHT 2020, virtual edition!

***UPDATE: After some discussion over on Reddit, it looks like MetroCluster IP will be available on this platform at launch.

Gartner’s new Magic Quadrant for Primary Storage

Hot off the presses is Gartner’s new Magic Quadrant (GMQ) for Primary Storage and it’s great to see NetApp at the top-right, right where I’d expect them to be. This is the first time Gartner has combined rankings for primary arrays and not separated out all-flash from spinning media and hybrid arrays, acknowledging that all-flash is no longer a novelty.

As you can see on the GMQ below, the x-axis represents completeness of vision while the y-axis measures ability to execute, NetApp being tied with Pure on X and leading on Y.

As mentioned, this new MQ marks the retiring of the previous divided GMQs of Solid-State Arrays and General-Purpose Disk Arrays. To read more about NetApp’s take on this new GMQ, head over to their blog post on the subject or request a copy of the report here.

There’s a new NVMe AFF in town!

Yesterday, NetApp announced a new addition to the midrange tier of their All-Flash FAS line, the AFF A320. With this announcement, end-to-end NVMe is now available in the midrange, from the host all the way to the NVMe SSD. This new platform is a svelte 2RU that supports up to two of the new NS224 NVMe SSD shelves, which are also 2RU. NetApp has set performance expectations to be in the ~100µs range.

Up to two PCIe cards per controller can be added, options are:

  • 4-port 32Gb FC SFP+ fibre
  • 2-port 100GbE RoCEv2* QSFP28 fibre (40GbE supported)
  • 2-port 25GbE RoCEv2* SFP28 fibre
  • 4-port 10GbE SFP+ Cu and fibre
    *RoCE host-side NVMeoF support not yet available

A couple of important points to also note:

  • 200-240VAC required
  • DS, SAS-attached SSD shelves are NOT supported

An end-to-end NVMe solution obviously needs storage of some sort, so also announced today was the NS224 NVMe SSD Storage Shelf:

  • NVMe-based storage expansion shelf
  • 2RU, 24 storage SSDs
  • 400Gb/s capable, 200Gb/s per shelf module
  • Uplinked to controller via RoCEv2
  • Drive sizes available: 1.9TB, 3.8TB and 7.6TB. 15.3TB with restrictions.

Each controller in the A320 has eight 100GbE ports on-board, but not all of them are available for client-side connectivity. They are allocated as follows:

  • e0a → ClusterNet/HA
  • e0b → Second NS224 connectivity by default, or can be configured for client access, 100GbE or 40GbE
  • e0c → First NS224 connectivity
  • e0d → ClusterNet/HA
  • e0e → Second NS224 connectivity by default, or can be configured for client access, 100GbE or 40GbE
  • e0f → First NS224 connectivity
  • e0g → Client network, 100GbE or 40GbE
  • e0h → Client network, 100GbE or 40GbE

If you don't get enough client connectivity from the on-board ports, then as listed previously, there are myriad PCIe options available to populate the two available slots. In addition to all that on-board connectivity, there's also a MicroUSB and an RJ-45 port for serial console access, as well as the RJ-45 Wrench port to host e0M and out-of-band management via the BMC. As with most port pairs, the 100GbE ports are hosted by a single ASIC, which is capable of a total effective bandwidth of ~100Gb.

Food for thought…
One interesting design change in this HA pair is that there is no backplane HA interconnect as has been the case historically; instead, the HA interconnect function is placed on the same connections as ClusterNet, e0a and e0d. This enables some interesting future design possibilities, like HA pairs in differing chassis. Also of interest is the shelf connectivity being NVMe/RoCEv2; while the shelves are currently connected directly to the controllers, what's stopping NetApp from putting them on a switched fabric? Once they do that, drop the HA pair concept above and instead have N+1 controllers on a ClusterNet fabric. Scaling, failovers and upgrades just got a lot more interesting.

ONTAP 9.6

UPDATE, MAY 17: RC1 is out, you can grab it here.

It's my favourite time of year folks, yup, it's time for some new ONTAP feature announcements. It feels as though 9.6 is going to have quite the payload, so I'm not going to cover every little tidbit, just the pieces that I'm excited about. For the full release notes, go here, NetApp SSO credentials required. Or, if you're one of my customers, feel free to email me for a meeting and we can go over this release in detail.

The first thing worth mentioning is that with 9.6, NetApp is dropping the whole LTS/STS thing and all releases going forward will be considered Long-Term Support (LTS) releases. What this means is that every release gets three years of full support, plus two years of limited support.

The rest of the updates can be grouped into three themes or highlights:

  1. Simplicity and Productivity
  2. Expanded customer use cases
  3. Security and Data Protection

Some of the Simplicity highlights are:

  • System Manager gets renamed to ONTAP System Manager and overhauled, now based on REST APIs with Python SDK available at GA
    • Expect a preview of a new dashboard in 9.6
  • Automatic Inactive Data Reporting for SSD aggregates
    • This tells you how much data you could tier to an object store, freeing up that valuable SSD storage space (see the sketch just after this list)
  • FlexGroup volume management has gotten simpler with the ability to shrink them, rename them and MetroCluster support
  • Cluster setup has gotten even easier with automatic node discovery
  • Adaptive QoS support for NVMe/FC (maximums) and ONTAP Select (minimums)
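On the Inactive Data Reporting item above, here's a hedged sketch of enabling it on an aggregate and checking the setting from the CLI; the parameter name is to the best of my recollection, so verify against your release's man pages:

storage aggregate modify -aggregate <aggr_name> -is-inactive-data-reporting-enabled true
storage aggregate show -fields is-inactive-data-reporting-enabled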

Here’s what the System Manager dashboard currently looks like:

And here’s what we can look forward to in 9.6

The Network Topology Visualization is very interesting, I’m looking forward to seeing how in-depth it gets.

Expanded Customer Use Cases

  • NVMe over FC gets more host support; it now includes VMware ESXi, Windows 2012/2016, Oracle Linux, RedHat Linux and Suse Linux.
  • FabricPools improvements:
    • Gains support for two more hyperscalers: Google Cloud and Alibaba Cloud
    • The Backup policy is gone, replaced with a new All policy, great for importing known-cold data directly to the cloud
    • Inactive Data Reporting is now on by default for SSD aggregates and is viewable in ONTAP System Manager – Use this to determine how much data you could tier.
    • FabricPool aggregates can now store twice as much data
    • SVM-DR support
    • Volume move – Can now be done without re-ingesting the cloud tier, moves the meta data and hot data only
  • FlexGroup Volume Improvements:
    • Elastic sizing to automatically protect against one constituent member filling up and returning an error to the client
    • MetroCluster support, both FC and IP MetroCluster
    • Volume rename now trivial
    • Volume size reduction now available
    • SMB Continuous Availability (CA) file share support
  • FlexCache Improvements:
    • Caching to and from Cloud Volumes ONTAP
    • End-to-end data encryption
    • Max cached volumes per node increased to 100 from 10
    • Soft and hard quota (tree) on origin volume enforced on cached volume
    • fpolicy support

Security and Data Protection

  • Over-the-wire encryption for SnapMirror
    • Coupled with at-rest encryption, data can now be encrypted end-to-end
  • SnapMirror Synchronous now supports
    • NFSv4, SMB 2 & 3 and mixed NFSv3/SMB volumes
    • This is in addition to existing support for FCP, iSCSI and NFSv3
  • NetApp Aggregate Encryption (NAE)
    • This can be seen as an evolution of NetApp Volume Encryption (NVE), all volumes in the aggregate share the same key.
    • Deduplication across volumes in the aggregate is supported for added space savings
  • Multi-tenant Key Management for Data At-Rest Encryption
    • Each tenant SVM can be configured with its own key management servers
    • Neighbouring tenants are unaffected by each other's encryption actions and must maintain control of their own keys
    • This is an added license
  • MetroCluster IP Updates
    • Support for entry AFF and FAS systems!
      • Personally I think this one is a game-changer and will really drive MetroCluster adoption now that the barrier to entry is so low
    • AFF A220 and FAS2750 and newer only

And that is most of the new enhancements and features appearing in 9.6; 9.6RC1 is expected around the second half of May, and GA typically comes about six weeks later. You can bet that I'll have it running in my lab the day it comes out.

ONTAP 9.5

UPDATE: 9.5RC1 is now out and you can grab it here.

It's that time of year again, time for NetApp's annual technical conference, Insight. This also means that a Long-Term Support (LTS) release of ONTAP is due; this time it's 9.5. As I write this, I am sitting in the boarding lounge of YVR, waiting for my flight to Las Vegas for NetApp Insight, and I see the Release Candidate (RC) for 9.5 is not out quite yet, but I do have the list of new features for you nonetheless.

The primary new features of 9.5 are:

  • New FlexCache accelerates performance for key workloads with read caching across a cluster and at remote sites.
  • SnapMirror Synchronous protects critical applications with synchronous replication
  • MetroCluster-IP enhancements reduce cost for business continuity: 700km between sites; support midrange systems (A300/FAS8200)
  • FabricPool now supports automated cloud tiering for FlexGroup volumes

Now, let’s dig into each one of these new features a bit.

FlexCache: FlexCache makes its return in 9.5 and provides the ability to cache hot blocks, user data and metadata on a more performant tier while the bulk of the data sits in a volume elsewhere in the cluster, or even on a remote cluster. FlexCache can enable you to provide lower read latency while not having to store the bulk of your data on the same tier. At this time, only NFSv3 is supported, though the source volume can be on AFF, FAS or ONTAP Select. While the volume you access is a FlexGroup volume, the source volume itself cannot be a FlexGroup but rather must be a FlexVol. An additional license is required.
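For context, once the license is in place, creating a cache volume is essentially a one-liner; a rough sketch with placeholder SVM, aggregate and volume names (check the exact parameters against the 9.5 documentation):

volume flexcache create -vserver cache_svm -volume cached_vol -aggr-list aggr1 -size 100GB -junction-path /cached_vol -origin-vserver origin_svm -origin-volume source_vol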

SnapMirror Synchronous: SM-S also makes a long-awaited return to ONTAP, allowing you to provide a recovery point objective (RPO) of zero and a very low recovery time objective (RTO). FC, iSCSI and NFSv3 only at this time, your network must have a maximum round-trip latency of no more than 10ms, and FlexGroup volumes are not supported. An additional license is required.
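Setting up SM-S looks much like a regular asynchronous SnapMirror relationship, the difference being the policy; a rough sketch with placeholder paths, using the built-in Sync policy (StrictSync being the other option):

snapmirror create -source-path svm_src:vol_prod -destination-path svm_dst:vol_prod_dr -policy Sync
snapmirror initialize -destination-path svm_dst:vol_prod_dr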

MetroCluster-IP (MC-IP): NetApp continues to add value to the mid-range of appliances by bringing MC-IP support to both the AFF A300 and the FAS8200. At the same time, NetApp has increased the maximum distance to 700km, provided your application can tolerate up to 10ms of write acknowledgement latency.

FabricPool: Previously hampered by the need to tier volumes greater than 100TiB? Now that FabricPool supports FlexGroups, you are in luck. Also supported in 9.5 is end-to-end encryption of data stored in FabricPool volumes using only one encryption key. Lastly, up until now, data would only migrate to your capacity tier once your FabricPool aggregate reached a fullness of 50%; this parameter is now adjustable, though 50% remains the default.
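On that last point, my understanding is the fullness threshold becomes adjustable per aggregate once an object store is attached; the example below is a hedged sketch only, as I'm working from memory on the parameter name and it may require advanced privilege, so check the 9.5 documentation before running it:

storage aggregate object-store modify -aggregate aggr1 -object-store-name my_object_store -tiering-fullness-threshold 40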

While those are the primary features included in this latest payload, existing features continue to gain refinement, especially in the realm of storage efficiency, specifically around logical space consumption reporting, which is useful for service providers. Also, adaptive compression is now applied when 8KB compression groups (CGs) are <50% compressible, allowing CGs to be compacted together. Databases will see the most benefit here, with typical aggregate savings in the 10-15% range. Finally, provided you have provisioned your storage using System Manager's application provisioning, adaptive compression will be optimized for the database being deployed: Oracle, SQL Server or MongoDB.

That's all for now. If you want more details, come find me at NetApp Insight on the show floor near the Social Media Hub or at my Birds of a Feather session, Monday at 11:15am, where other NetApp A-Team members and I will discuss the Next Generation Data Centre.

Raw AutoSupport, tried and true – sysconfig

While NetApp keeps improving the front end that is ActiveIQ, for both pre-sales and support purposes, I constantly find myself going into the Classic AutoSupport and accessing the raw autosupport data; most often it’s sysconfig -a. Recently I was trying to explain the contents to a co-worker and I realized that I should just document it as a blog post. So here is sysconfig explained.
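If you'd rather pull the same output from a live system than dig it out of a raw AutoSupport, the 7-mode command is still reachable from clustered ONTAP via the nodeshell; a quick example, substituting your node name:

system node run -node <node_name> -command sysconfig -a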

The command sysconfig -a is the old 7-mode command to give you all the hardware information from the point of view of ONTAP. All the onboard ports are assigned to "slot 0" whereas slots 1-X are the physical PCIe slots where myriad cards can be inserted. Here's one example; I'll insert comments where I feel it's appropriate. Continue reading