I’ve previously written about this Ethernet controller back when 40GbE Ethernet was relatively new to NetApp’s FAS and AFF controllers. Since that article, I’ve started to come across various oddities with this Ethernet controller.
Last Fall, I had a customer who was experiencing problems with LACP during an ONTAP upgrade (9.1 → 9.3 → 9.5P6) on their AFF A700s using the X1144A, dual port 40GbE card, which uses the Intel X710 Ethernet controller. We had the first 40GbE port broken out into 4x10GbE links, 2-each to either half of a pair of Cisco Nexus N9K-C9396PX in the same vPC Domain. During a controller reboot, we noticed that the interface group using multimode_lacp, most or all of the ports wouldn’t come up and on the Cisco-side, the port(s) would become disabled due to too many link up/down events. Immediately we wanted to look at potential cable problems but quickly dismissed that idea as well. After some digging, it looked as though NetApp was referencing Cisco Bug ID CSCuv87644 as potentially related. This led me down a long path of investigating the changes made to the networking stack in ONTAP over the past couple of years, and I’ve still got a post I’m working on around that. The workaround was to increase the debounce timer value on the Cisco 9k to 525ms, the default value is 100ms.
The port debounce time is the amount of time that an interface waits to notify the supervisor of a link going down. During this time, the interface waits to see if the link comes back up. The wait period is a time when traffic is stopped.
Source: Cisco
Recently, a different customer of mine was trying to buy a Nimble HF20 and they wanted to include the Q8C17B, a four port, 10GbE NIC, also based on the Intel X710 Ethernet controller. The vendor came back to me and said they needed to know if the customer was going to be using VLAN tagging on the Q8C17B, because if they needed VLAN tagging, they’d have to choose a two port NIC instead. This confused me, but after some emails back and forth, HPE Nimble Storage Alert # EXT-0061 was referenced as the reason for this. At some point Nimble will release a patch that updates the firmware on this NIC, hopefully bringing back VLAN functionality. A bit of looking around, and the same VLAN issue has been identified by VMware in KB2149781.
Lastly, I also came across a NIST vulnerability from 2017 regarding the same Ethernet controller, it seems that has since been addressed in a firmware update.
While the above doesn’t necessarily imply a huge problem with the X710, I simply found it interesting and thought I’d include them all in one post.