Author Archives: chrismaki

Gartner’s new Magic Quadrant for Primary Storage

Hot off the presses is Gartner’s new Magic Quadrant (GMQ) for Primary Storage and it’s great to see NetApp at the top-right, right where I’d expect them to be. This is the first time Gartner has combined rankings for primary arrays and not separated out all-flash from spinning media and hybrid arrays, acknowledging that all-flash is no longer a novelty.

As you can see on the GMQ below, the x-axis represents completeness of vision while the y-axis measures ability to execute, NetApp being tied with Pure on X and leading on Y.

As mentioned, this new MQ marks the retiring of the previous divided GMQs of Solid-State Arrays and General-Purpose Disk Arrays. To read more about NetApp’s take on this new GMQ, head over to their blog post on the subject or request a copy of the report here.

There’s a new NVMe AFF in town!

Yesterday, NetApp announced a new addition to the midrange tier of their All-Flash FAS line, the AFF A320. With this announcement, end-to-end NVMe is now available in the midrange, from the host all the way to the NVMe SSD. This new platform is a svelte 2RU that supports up to two of the new NS224 NVMe SSD shelves, which are also 2RU. NetApp has set performance expectations to be in the ~100µs range.

Up to two PCIe cards per controller can be added; the options are:

  • 4-port 32Gb FC SFP+ fibre
  • 2-port 100GbE RoCEv2* QSFP28 fibre (40GbE supported)
  • 2-port 25GbE RoCEv2* SFP28 fibre
  • 4-port 10GbE SFP+ Cu and fibre
    *RoCE host-side NVMe-oF support not yet available

A couple of important points to also note:

  • 200-240VAC required
  • DS-series, SAS-attached SSD shelves are NOT supported

An end-to-end NVMe solution obviously needs storage of some sort, so also announced today was the NS224 NVMe SSD Storage Shelf:

  • NVMe-based storage expansion shelf
  • 2RU, 24 storage SSDs
  • 400Gb/s capable, 200Gb/s per shelf module
  • Uplinked to controller via RoCEv2
  • Drive sizes available: 1.9TB, 3.8TB and 7.6TB. 15.3TB with restrictions.

Each controller in the A320 has eight 100GbE ports on-board, but not all of them are available for client-side connectivity. They are allocated as follows:

  • e0a → ClusterNet/HA
  • e0b → Second NS224 connectivity by default, or can be configured for client access, 100GbE or 40GbE
  • e0c → First NS224 connectivity
  • e0d → ClusterNet/HA
  • e0e → Second NS224 connectivity by default, or can be configured for client access, 100GbE or 40GbE
  • e0f → First NS224 connectivity
  • e0g → Client network, 100GbE or 40GbE
  • e0h → Client network, 100GbE or 40GbE
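Since client capacity depends on which roles these on-board ports play, it can help to see the allocation as data. Below is an illustrative Python sketch, not a NetApp tool; the dictionary mirrors the list above and the helper name is my own:

```python
# The AFF A320 on-board port roles from the list above, as a lookup table.
A320_ONBOARD_PORTS = {
    "e0a": "ClusterNet/HA",
    "e0b": "Second NS224 by default, or client access, 100GbE or 40GbE",
    "e0c": "First NS224",
    "e0d": "ClusterNet/HA",
    "e0e": "Second NS224 by default, or client access, 100GbE or 40GbE",
    "e0f": "First NS224",
    "e0g": "Client network, 100GbE or 40GbE",
    "e0h": "Client network, 100GbE or 40GbE",
}

def client_capable_ports():
    """Ports that can serve client traffic, by default or when reconfigured."""
    return sorted(p for p, role in A320_ONBOARD_PORTS.items()
                  if "client" in role.lower())
```

Running client_capable_ports() here yields e0b, e0e, e0g and e0h, which matches the allocation above.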

If you don’t get enough client connectivity from the on-board ports, then as listed previously, there are myriad PCIe options available to populate the two available slots. In addition to all that on-board connectivity, there’s also micro-USB and RJ-45 for serial console access, as well as the RJ-45 wrench port to host e0M and out-of-band management via BMC. As with most port pairs, the 100GbE ports are hosted by a single ASIC, which is capable of a total effective bandwidth of ~100Gb.

Food for thought…
One interesting design change in this HA pair is that there is no backplane HA interconnect as has historically been the case; instead, the HA interconnect function is placed on the same connections as ClusterNet, e0a and e0d. This enables some interesting future design possibilities, like HA pairs in differing chassis. Also of interest is the shelf connectivity being NVMe/RoCEv2; while the shelves are currently connected directly to the controllers, what’s stopping NetApp from putting them on a switched fabric? Once they do that, they could drop the HA pair concept and instead have N+1 controllers on a ClusterNet fabric. Scaling, failovers and upgrades just got a lot more interesting.

ONTAP 9.6

UPDATE, MAY 17: RC1 is out, you can grab it here.

It’s my favourite time of year, folks: time for some new ONTAP feature announcements. It feels as though 9.6 is going to have quite the payload, so I’m not going to cover every little tidbit, just the pieces that I’m excited about. For the full release notes, go here (NetApp SSO credentials required). Or, if you’re one of my customers, feel free to email me for a meeting and we can go over this release in detail.

The first thing worth mentioning is that with 9.6, NetApp is dropping the whole LTS/STS distinction; all releases going forward will be considered long-term support (LTS) releases. What this means is that every release gets three years of full support, plus two years of limited support.
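For anyone planning upgrade cycles, the new model is easy to reason about. Here is a minimal Python sketch of the three-plus-two-year window; the GA date used below is an arbitrary example, not an official NetApp date:

```python
# Sketch of the 9.6+ support model: three years of full support,
# then two more years of limited support, from a release's GA date.
from datetime import date

FULL_SUPPORT_YEARS = 3
LIMITED_SUPPORT_YEARS = 2

def support_windows(ga):
    """Return (end of full support, end of all support) for a GA date."""
    def add_years(d, n):
        return d.replace(year=d.year + n)
    full_end = add_years(ga, FULL_SUPPORT_YEARS)
    limited_end = add_years(full_end, LIMITED_SUPPORT_YEARS)
    return full_end, limited_end

# Example with a made-up GA date:
full, limited = support_windows(date(2019, 7, 1))
```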

The rest of the updates can be grouped into one of three themes or highlights:

  1. Simplicity and Productivity
  2. Expanded customer use cases
  3. Security and Data Protection

Some of the Simplicity highlights are:

  • System Manager gets renamed to ONTAP System Manager and overhauled; it is now based on REST APIs, with a Python SDK available at GA
    • Expect a preview of a new dashboard in 9.6
  • Automatic Inactive Data Reporting for SSD aggregates
    • This tells you how much data you could tier to an object store, freeing up that valuable SSD storage space
  • FlexGroup volume management has gotten simpler, with the ability to shrink and rename volumes, plus MetroCluster support
  • Cluster setup has gotten even easier with automatic node discovery
  • Adaptive QoS support for NVMe/FC (maximums) and ONTAP Select (minimums)
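Because the new System Manager is built on REST APIs, any HTTP client can address the same endpoints the GUI uses. The sketch below only builds a query URL against NetApp's documented /api/storage/volumes collection; the host name and SVM are placeholders, and in practice you would send the request with authentication via the Python SDK or an HTTP library:

```python
# Build (but do not send) a query against ONTAP's REST volume collection.
# Host and SVM names below are placeholders for illustration only.
from urllib.parse import urlencode

def build_volume_query(host, svm):
    """URL to list volumes on one SVM, requesting a few chosen fields."""
    base = f"https://{host}/api/storage/volumes"
    params = urlencode({"svm.name": svm, "fields": "name,size,state"})
    return f"{base}?{params}"

url = build_volume_query("cluster1.example.com", "svm1")
```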

Here’s what the System Manager dashboard currently looks like:

And here’s what we can look forward to in 9.6

The Network Topology Visualization is very interesting, I’m looking forward to seeing how in-depth it gets.

Expanded Customer Use Cases

  • NVMe over FC gets more host support; it now includes VMware ESXi, Windows 2012/2016, Oracle Linux, Red Hat Linux and SUSE Linux.
  • FabricPools improvements:
    • Gains support for two more hyperscalers: Google Cloud and Alibaba Cloud
    • The Backup policy is gone, replaced with a new All policy, great for importing known-cold data directly to the cloud
    • Inactive Data Reporting is now on by default for SSD aggregates and is viewable in ONTAP System Manager – Use this to determine how much data you could tier.
    • FabricPool aggregates can now store twice as much data
    • SVM-DR support
    • Volume move – can now be done without re-ingesting the cloud tier; only the metadata and hot data are moved
  • FlexGroup Volume Improvements:
    • Elastic sizing to automatically protect against one constituent member filling up and returning an error to the client
    • MetroCluster support, both FC and IP MetroCluster
    • Volume rename now trivial
    • Volume size reduction now available
    • SMB Continuous Availability (CA) file share support
  • FlexCache Improvements:
    • Caching to and from Cloud Volumes ONTAP
    • End-to-end data encryption
    • Max cached volumes per node increased to 100 from 10
    • Soft and hard quota (tree) on origin volume enforced on cached volume
    • fpolicy support

Security and Data Protection

  • Over-the-wire encryption for SnapMirror
    • Coupled with at-rest encryption, data can now be encrypted end-to-end
  • SnapMirror Synchronous now supports
    • NFSv4, SMB 2 & 3 and mixed NFSv3/SMB volumes
    • This is in addition to existing support for FCP, iSCSI and NFSv3
  • NetApp Aggregate Encryption (NAE)
    • This can be seen as an evolution of NetApp Volume Encryption (NVE), all volumes in the aggregate share the same key.
    • Deduplication across volumes in the aggregate is supported for added space savings
  • Multi-tenant Key Management for Data At-Rest Encryption
    • Each tenant SVM can be configured with its own key management servers
    • Neighbouring tenants are unaffected by each other’s encryption actions and each maintains control of their own keys
    • This is an added license
  • MetroCluster IP Updates
    • Support for entry AFF and FAS systems!
      • Personally I think this one is a game-changer and will really drive MetroCluster adoption now that the barrier to entry is so low
    • AFF A220 and FAS2750 and newer only

And that is most of the new enhancements and features appearing in 9.6; 9.6RC1 is expected around the second half of May, and GA typically comes about six weeks later. You can bet that I’ll have it running in my lab the day it comes out.

ONTAP 9.5

UPDATE: 9.5RC1 is now out and you can grab it here.

It’s that time of year again, time for NetApp’s annual technical conference, Insight. This also means that a Long-Term Support (LTS) release of ONTAP is due, this time it’s 9.5. As I write this, I am sitting in the boarding lounge of YVR, waiting for my flight to Las Vegas for NetApp Insight and I see the Release Candidate (RC) for 9.5 is not out quite yet, but I do have the list of new features for you nonetheless.

The primary new features of 9.5 are:

  • New FlexCache accelerates performance for key workloads with read caching across a cluster and at remote sites.
  • SnapMirror Synchronous protects critical applications with synchronous replication
  • MetroCluster-IP enhancements reduce cost for business continuity: 700km between sites and support for midrange systems (A300/FAS8200)
  • FabricPool now supports automated cloud tiering for FlexGroup volumes

Now, let’s dig into each one of these new features a bit.

FlexCache: FlexCache makes its return in 9.5 and provides the ability to cache hot blocks, user data and metadata on a more performant tier while the bulk of the data sits in a volume elsewhere in the cluster, or even on a remote cluster. FlexCache can enable you to provide lower read latency while not having to store the bulk of your data on the same tier. At this time only NFSv3 is supported, though the source volume can be on AFF, FAS or ONTAP Select. While the cache volume you access is a FlexGroup volume, the source volume itself cannot be a FlexGroup; it must be a FlexVol. An additional license is required.

SnapMirror Synchronous: SM-S also makes a long-awaited return to ONTAP, allowing you to provide a recovery point objective (RPO) of zero and a very low recovery time objective (RTO). FC, iSCSI and NFSv3 only at this time, your network must have a maximum round-trip latency of no more than 10ms, and FlexGroup volumes are not supported. An additional license is required.

MetroCluster-IP (MC-IP): NetApp continues to add value to the mid-range of appliances by bringing MC-IP support to both the AFF A300 as well as the FAS8200. At the same time, NetApp has increased the maximum distance to 700km, provided your application can tolerate up to 10ms of write acknowledgement latency.

FabricPool: Previously hampered by the need to tier volumes greater than 100TiB? Now that FabricPool supports FlexGroups, you are in luck. Also supported in 9.5 is end-to-end encryption of data stored in FabricPool volumes using only one encryption key. Lastly, up until now data would only migrate to your capacity tier once your FabricPool aggregate reached a fullness of 50%; this parameter is now adjustable, though 50% remains the default.
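That tiering trigger is simple to express. Here is a hedged sketch of the behaviour just described, assuming the threshold is compared as a plain percentage of aggregate fullness (function and parameter names are mine, not ONTAP's):

```python
# Cold data only moves to the capacity tier once the FabricPool
# aggregate passes a fullness threshold: 50% by default, now adjustable.
DEFAULT_TIERING_FULLNESS = 50  # percent

def should_tier(used_pct, threshold=DEFAULT_TIERING_FULLNESS):
    """True once aggregate fullness has reached the tiering threshold."""
    return used_pct >= threshold
```

So an aggregate at 40% full would not tier under the default, but would if you lowered the threshold below 40.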

While those are the primary features included in this latest payload, existing features continue to gain refinement, especially in the realm of storage efficiency, specifically around logical space-consumption reporting, which is useful for service providers. Also, adaptive compression is now applied when 8KB compression groups (CGs) are <50% compressible, allowing CGs to be compacted together; databases will see the most benefit here, with typical aggregate savings in the 10-15% range. Finally, provided you have provisioned your storage using System Manager’s application provisioning, adaptive compression will be optimized for the database being deployed: Oracle, SQL Server or MongoDB.

That’s all for now. If you want more details, come find me at NetApp Insight on the show floor near the Social Media Hub, or at my Birds of a Feather session, Monday at 11:15am, where I and other NetApp A-Team members will discuss the Next Generation Data Centre.

NetApp HCI Update

As NetApp continues to make its mark on and help define the Next Generation Data Centre, the need for more node types of their HCI offering has become apparent and they are responding in kind.

Staying current by using the latest generation of Intel Skylake processors in the new nodes is a given, as is offering myriad combinations of both CPU and memory while maintaining interoperability with the current generation of HCI nodes.

First up is a raft of new compute nodes, some of which are optimized around core count, which you can use to satisfy various licensing models.

 

Model #       Processor                                Memory
H410C-14020   2 x Xeon Silver 4110 (8 core @ 2.1GHz)   384 GB
H410C-15020   2 x Xeon Silver 4110 (8 core @ 2.1GHz)   512 GB
H410C-17020   2 x Xeon Silver 4110 (8 core @ 2.1GHz)   768 GB
H410C-25020   2 x Xeon Gold 5120 (14 core @ 2.2GHz)    512 GB
H410C-27020   2 x Xeon Gold 5120 (14 core @ 2.2GHz)    768 GB
H410C-28020   2 x Xeon Gold 5120 (14 core @ 2.2GHz)    1 TB
H410C-35020   2 x Xeon Gold 5122 (4 core @ 3.6GHz)     512 GB
H410C-37020   2 x Xeon Gold 5122 (4 core @ 3.6GHz)     768 GB
H410C-57020   2 x Xeon Gold 6138 (20 core @ 2.0GHz)    768 GB
H410C-58020   2 x Xeon Gold 6138 (20 core @ 2.0GHz)    1 TB

 

Next up, the much-requested GPU-accelerated compute nodes have been announced, optimized for Windows 10 VDI deployments. This one moves away from the 2RU chassis with four compute nodes and is itself a single 2RU server, consisting of:

  • 2 x NVIDIA Tesla M10 GPUs
  • An Intel Skylake Xeon 6130 (16 cores @ 2.1GHz)
  • 512 GB RAM

On to the networking side of things: your concerns have been heard. NetApp will soon begin offering their H-Series switch, the Mellanox SN2010, to help complete your HCI build-outs. This switch is a paltry 1RU, half-width, consisting of 18 SFP+/SFP28 ports, with optional cable and transceiver bundles. Support for this switch will be NetApp-direct, so no worries around cross-vendor finger-pointing.

Keeping with the network mindset, NetApp is making things simpler by reducing the required network port count and associated infrastructure by 40%. HCI compute nodes now only require two SFP28 connections, down from four, though a vSphere Distributed Switch is a requirement.

Tied closely to NetApp’s HCI offering is their SolidFire storage, whose latest release, version 11, provides some great new features. Version 11 brings to the table the ability to SnapMirror to ONTAP Cloud, an IPv6 management network, a 16TB maximum volume size and protection domains. This last feature helps protect your HCI deployments against chassis failure by automatically detecting HCI chassis and node configuration. SolidFire’s double-helix data layout ensures that secondary blocks span domains.
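To make the protection-domain idea concrete, here is a toy sketch of the placement rule it implies: the secondary copy of a block must land in a different domain (chassis) from the primary. This is my own illustration of the concept, not SolidFire's actual placement algorithm:

```python
# Toy placement rule: a block's secondary copy never shares a
# protection domain (chassis) with its primary copy.
def place_secondary(primary_domain, domains):
    """Pick a domain for the secondary copy, never the primary's own."""
    candidates = [d for d in domains if d != primary_domain]
    if not candidates:
        raise ValueError("need at least two protection domains")
    return candidates[0]
```

With two chassis, a primary on chassis-1 forces the secondary onto chassis-2, so losing either whole chassis leaves one full copy intact.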

All the above should allow you to build a truly Next Generation Data Centre for your employer or your customers.

Raw AutoSupport, tried and true – sysconfig

While NetApp keeps improving the front end that is ActiveIQ, for both pre-sales and support purposes, I constantly find myself going into the Classic AutoSupport and accessing the raw autosupport data; most often it’s sysconfig -a. Recently I was trying to explain the contents to a co-worker and I realized that I should just document it as a blog post. So here is sysconfig explained.

The command sysconfig -a is the old 7-Mode command that gives you all the hardware information from the point of view of ONTAP. All the onboard ports are assigned to “slot 0”, whereas slots 1 through X are the physical PCIe slots where myriad cards can be inserted. Here’s one example; I’ll insert comments where I feel it is appropriate. Continue reading

ONTAP 9.4 – Improvements and Additions

While the actual payload hasn’t hit the street yet, here’s what I can tell you about the latest release in the ONTAP 9 family, which should be available any day now. EDIT: RC1 is here. 9.4 went GA today.

FabricPool

Lots of improvements to ONTAP’s object-tiering code in this release, it appears they’re really pushing development here:

  • Support for Azure Blob, both hot and cool tiers, no archive tier support
    • This adds to the already supported AWS-S3 and StorageGRID Webscale object stores
  • Support for cold-data tiering policies, whereas 9.2 and 9.3 offered only backup and snapshot-only tiering policies
    • The default definition of cold data is 31 days, but it can be adjusted to anywhere from 2 to 63 days.
    • Not all cold blocks need to be made hot again; snapshot-only blocks are one example. Random reads are considered application access, so those blocks are declared hot and written back into the performance tier, whereas sequential reads are assumed to come from indexers, virus scanners or the like, so those blocks are kept cold and not written back into the performance tier.
  • Now supported in ONTAP Select, in addition to the existing ONTAP and ONTAP Cloud. Wherever you run ONTAP, you can now run FabricPools; the SSD-aggregate caveat still exists.
  • Inactive Data Reporting by OnCommand System Manager to determine how much data would be tiered if FabricPools were implemented.
    • This one will be key to clients thinking about adopting FabricPools
  • Object Store Profiler is a new tool in ONTAP that will test the performance of the object store you’re thinking of attaching so you don’t have to dive in without knowing what your expected performance should be.
  • Object Defragmentation now optimizes your capacity tier by reclaiming space that is no longer referenced by the performance tier
  • Compaction comes to FabricPools ensuring that your write stripes are full as well as applying Compression, Deduplication
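The cold-read behaviour in that list is essentially a two-input policy, which can be sketched in a few lines. This is an illustration of the rule as described above, not NetApp's implementation:

```python
# 9.4 cold-data read policy: random reads of cold blocks count as
# application access and are rewarmed to the performance tier;
# sequential reads (indexers, virus scanners) leave blocks cold.
def rewarm_on_read(block_is_cold, access_pattern):
    """Should this read pull the block back to the performance tier?"""
    if not block_is_cold:
        return False  # already on the performance tier
    return access_pattern == "random"  # sequential stays in the capacity tier
```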

Continue reading

ONTAP 9.3 is out soon, here’s the details you need.

It’s that time of year again, time for an ONTAP release…or at least an announcement. When 9.3 drops, not only will it be an LTS (Long Term Support) version, but NetApp continues to refine and enhance ONTAP.

Simplifying operations:

  • Application-aware data management for MongoDB
  • Adaptive QoS
  • Guided cluster setup and expansion
  • Simplified data protection setup

Efficiencies:

Not so long ago, in ONTAP 9.2, NetApp introduced inline, aggregate-level dedupe. What many people may not have realized is that, due to the way ONTAP coalesces writes in NVRAM before flushing them to the durable layer, this inline aggregate dedupe’s domain was restricted to the data in NVRAM. With 9.3, a post-process aggregate scanner has been implemented to provide true aggregate-level dedupe.

Continue reading

ONTAP 9.2RC1 is out

Continuing with their new standard six month release cadence, ONTAP 9.2RC1 was released today and I continue to be impressed with the feature payload NetApp has been delivering with each new release; here are the highlights:

  • FabricPools
    • Automated cloud-tiering of NetApp Snapshots to a target that speaks S3 (AWS or StorageGRID)
  • QoS Floors or minimums
    • Allows you to reserve performance for critical workloads, SAN on AFF only.
  • Efficiencies:
    • Inline dedupe at the aggregate level
      • Previously volume-only
      • Not applicable to post-process dedupe
      • AFF only
    • Advanced Disk Partitioning now available on FAS8xxx and FAS9xxx, for the first 48 drives.
  • NetApp Volume Encryption
    • Now available for FlexGroups

Here’s the list of supported platforms for ONTAP 9.2:

AFF          FAS
AFF A700s    FAS9000
AFF A700     FAS8200
AFF A300     FAS2650
AFF A200     FAS2620
AFF80x0      FAS80x0
             FAS25xx

ONTAP Select, the software-only version of ONTAP gets a few nice improvements as well:

  • FlexGroups
  • 2-Node HA
  • ESX Robo licensing (This is a big deal for me and my customers)

Read the full release notes here.

What you need to know about NetApp’s 40GbE options

With the introduction of the new NetApp platforms back in September 2016 came 40GbE as well as 32Gb Fibre Channel connectivity.

I had my first taste of 40GbE on the NetApp side back in January when I got to install the first All Flash FAS A700 in Canada. The client requested a mix of 40GbE and 16Gb FC with some of the 40GbE being broken out into 4 × 10GbE interfaces and some being used natively.

NetApp is deploying two flavours of 40GbE cards: the X1144A for the AFF A300, AFF A700s and FAS8200, and the X91440A for the AFF A700 and FAS9000 storage systems. At first glance, you might be tempted to assume that these are the same PCIe card, since the part numbers are very similar (the latter just being in some sort of carrier to satisfy the I/O module requirement for the blade-style chassis that is home to the A700 and FAS9000). Upon further inspection, the two are not exactly equal.

The ports on most PCIe cards and onboard interfaces are deployed in pairs, with one shared application-specific integrated circuit (ASIC) on the board behind the physical ports. On the X1144A, both external ports share one ASIC with an available combined bandwidth of 40Gb/s, whereas the X91440A has two ASICs. Each has two ports, but one is internal and not connected to anything, giving you 40Gb/s per external port.

The ASIC (or controller) in question is the Intel XL710. What’s important about this is that both external ports on an X91440A can be broken out to 4 × 10GbE interfaces for a total of eight, or one can remain at 40GbE while the other is broken out. On the X1144A, however, you can either connect both ports to your switch using 40GbE connections, or you can break out port A to 4 × 10GbE, in which case port B gets disabled. According to Intel, if you connect both ports via 40GbE, “The total throughput supported by the 710 series is 40 Gb/s, even when connected via two 40 Gb/s connections.”
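Those breakout rules are fiddly enough to be worth writing down as code. The following is my own sketch of the rules as described above, returning the resulting interface speeds (in GbE) per external port; it is not a NetApp tool:

```python
# Breakout rules from the paragraph above:
#   X91440A: either external port can split into 4 x 10GbE independently.
#   X1144A:  only port A can split, and splitting it disables port B.
def breakout(card, split_a, split_b):
    """Resulting interface speeds (GbE) per external port after breakout."""
    if card == "X91440A":
        return {"a": [10] * 4 if split_a else [40],
                "b": [10] * 4 if split_b else [40]}
    if card == "X1144A":
        if split_b:
            raise ValueError("X1144A: only port A supports breakout")
        if split_a:
            return {"a": [10] * 4, "b": []}  # port B disabled
        return {"a": [40], "b": [40]}
    raise ValueError("unknown card")
```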

Now, before we get all up in arms about this, let’s really get into the weeds. Both the FAS8200/FAS9000 and the AFF A300/A700 use PCIe 3.0. Each PCIe 3.0 lane can carry 8 gigatransfers per second (GT/s); for the purposes of this post, that is close enough to 8Gb/s. The FAS8200/AFF A300 has an Intel D-1587 CPU with a maximum of eight lanes per slot, so roughly 64Gb/s of throughput, whereas the FAS9000/AFF A700 has an Intel E5-2697 with a maximum of 16 lanes per I/O slot, which gives it about 128Gb/s of throughput. So even if NetApp included a network interface card for the A300/FAS8200 with two XL710s on it, the PCIe slot it’s connected to couldn’t provide 80Gb/s of throughput, whereas the I/O modules in the A700/FAS9000 can.
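The lane arithmetic above is easy to sanity-check. A quick back-of-the-envelope calculation, treating one PCIe 3.0 lane as roughly 8Gb/s as in the text:

```python
# PCIe 3.0: ~8 GT/s per lane, treated here as ~8 Gb/s of usable bandwidth.
PCIE3_GBPS_PER_LANE = 8

def slot_bandwidth_gbps(lanes):
    """Approximate PCIe 3.0 slot bandwidth for a given lane count."""
    return lanes * PCIE3_GBPS_PER_LANE

x8_slot = slot_bandwidth_gbps(8)    # FAS8200 / AFF A300 slots
x16_slot = slot_bandwidth_gbps(16)  # FAS9000 / AFF A700 I/O modules
```

So an x8 slot tops out near 64Gb/s, short of the 80Gb/s a dual-XL710 card would want, while an x16 slot has headroom at roughly 128Gb/s.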

Say you want to change between 40GbE and 10GbE. Unlike modifying UTA2 profiles (as explained here), with the XL710, you need to get into maintenance mode first and use the nicadmin command. Here’s an example:

sysconfig output before:

slot 1: 40 Gigabit Ethernet Controller XL710 QSFP+
                 e1a MAC Address:    00:a0:98:c5:b2:fb (auto-40g_cr4-fd-up)
                 e1e MAC Address:    00:a0:98:c5:b2:ff (auto-unknown-down)

At this point I already had the breakout cable installed. That’s why the second link shows as down.

Conversion example:

*> nicadmin
 nicadmin convert -m { 40G | 10G } <port-name>
 
 
 *> nicadmin convert -m 10g e1e
 Converting e1e 40G port to four 10G ports
 Halt, install/change the cable, and then power-cycle the node for
 the conversion to take effect.  Depending on the hardware model,
 the SP (Service Processor) or BMC (Baseboard Management Controller)
 can be used to power-cycle the node.

sysconfig output after:

slot 1: 40 Gigabit Ethernet Controller XL710 QSFP+
                 e1a MAC Address:    00:a0:98:c5:b2:fb (auto-40g_cr4-fd-up)
                 e1e MAC Address:    00:a0:98:c5:b2:ff (auto-10g_twinax-fd-up)
                 e1f MAC Address:    00:a0:98:c5:b3:00 (auto-10g_twinax-fd-up)
                 e1g MAC Address:    00:a0:98:c5:b3:01 (auto-10g_twinax-fd-up)
                 e1h MAC Address:    00:a0:98:c5:b3:02 (auto-10g_twinax-fd-up)

Unfortunately, I don’t have access to either a FAS8200 or an AFF A300 with 40GbE; otherwise I’d provide the sysconfig output before and after there as well.

Now, there’s a bit of a debate going on around the viability of 40GbE versus 100GbE. While 40GbE is simply a combined 4 × 10GbE, 100GbE is likewise a combined 4 × 25GbE. With regards to production costs, to make a 40GbE QSFP+ you literally combine four lasers (hence the Q in QSFP) into the module, and the same goes for 100GbE. You only need one laser to produce the wavelength for 25GbE, and while that still means you need four for 100GbE, four times the production cost still yields 250% of the throughput of 40GbE, which makes me wonder where things will end up in a year.
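The throughput comparison above is just lane arithmetic, spelled out here for clarity: both standards gang four lanes, at 10Gb/s and 25Gb/s per lane respectively.

```python
# Both 40GbE and 100GbE aggregate four lanes; only the per-lane rate differs.
def aggregate_gbps(lanes, gbps_per_lane):
    """Total throughput of a multi-lane Ethernet link."""
    return lanes * gbps_per_lane

forty = aggregate_gbps(4, 10)        # 40GbE
hundred = aggregate_gbps(4, 25)      # 100GbE
ratio = hundred / forty              # 2.5, i.e. the 250% from the text
```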

So there you go, more than you ever wanted to know about NetApp’s recent addition of 40GbE into the ONTAP line of products as well as my personal philosophical waxing around the 40 versus 100 GbE debate.