Tuesday, April 22, 2014

NetApp Patents Offer Glimpse into Cache IQ Internals

The United States Patent and Trademark Office (USPTO) recently published several patent applications that reveal new details about the NetApp-acquired Cache IQ product.

Named “RapidCache”, this inline network-attached storage (NAS) appliance accelerates NFS performance by identifying the active dataset of hosts and placing it in DRAM and SSD.

Readers unfamiliar with the architecture of the RapidCache appliance may want to read this article from November 2012.

New Details
Several new details have emerged from these patent filings.

First, the RapidCache appliance can be transparently “spliced” into a network. Interestingly, it does not splice all of the connection parameters in Layers 2, 3, and 4; it splices only the connection state and the source/destination sequence numbers in Layer 4. As a result, clients believe they are communicating with the NAS system when, in fact, they are being served from cache.
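The filings do not publish the splicing code, but the general technique of Layer 4 splicing can be illustrated. When a middlebox answers a request itself, the client's view of both TCP byte streams drifts away from what the server has seen, so sequence and acknowledgment numbers must be shifted on every forwarded segment. The sketch below is a minimal illustration under that assumption; `SplicedConnection` and its methods are hypothetical names, not Cache IQ code, and it ignores SACK blocks, timestamps, and checksum updates.

```python
from dataclasses import dataclass

SEQ_MOD = 2 ** 32  # TCP sequence and acknowledgment numbers are 32-bit and wrap


@dataclass
class SplicedConnection:
    """Hypothetical per-connection state for a transparently spliced TCP flow."""
    client_bytes_absorbed: int = 0  # client data consumed by the appliance, never sent to the NAS
    server_bytes_injected: int = 0  # data the appliance sent to the client on the NAS's behalf

    def serve_from_cache(self, request_len: int, response_len: int) -> None:
        # The appliance answered a request itself, so the client's view of both
        # byte streams has moved ahead of what the NAS has actually seen or sent.
        self.client_bytes_absorbed = (self.client_bytes_absorbed + request_len) % SEQ_MOD
        self.server_bytes_injected = (self.server_bytes_injected + response_len) % SEQ_MOD

    def to_server(self, seq: int, ack: int) -> tuple:
        # Client -> NAS: remove absorbed bytes from SEQ and injected bytes from ACK.
        return ((seq - self.client_bytes_absorbed) % SEQ_MOD,
                (ack - self.server_bytes_injected) % SEQ_MOD)

    def to_client(self, seq: int, ack: int) -> tuple:
        # NAS -> client: shift SEQ past injected bytes and ACK past absorbed bytes.
        return ((seq + self.server_bytes_injected) % SEQ_MOD,
                (ack + self.client_bytes_absorbed) % SEQ_MOD)
```

For example, after `serve_from_cache(100, 4096)`, a client segment with SEQ 1100 and ACK 9096 is forwarded to the NAS as SEQ 1000 and ACK 5000, so neither endpoint notices that the appliance answered in between.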

The patents further detail how seamlessly the RapidCache appliance can be inserted into a network, noting that clients do not need to unmount their NFS exports. They also state that no configuration changes are required on either the storage systems or the clients.

Operating System
As was previously known, the RapidCache operating system (named “IQ OS”) is a heavily customized FreeBSD kernel. Not surprisingly, several FreeBSD components are mentioned throughout the patent filings, including:

  • GEOM framework (gives the storage tier access to SSDs)
  • CAM subsystem (lets different SCSI devices share a common interface)
  • FreeBSD HBA drivers (manage the HBA controller and I/O submission)

Let’s now review other features built into the appliance.

Other Features
Several details about cache policies have also emerged. One particularly useful feature of the RapidCache software is the ability to activate or disable cache policies on a time schedule. See the patent application entitled “DYNAMIC DETECTION AND SELECTION OF FILE SERVERS IN A CACHING APPLICATION OR SYSTEM” for more information on this feature.
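The filing referenced above is not quoted here, so the following is only a minimal sketch of how a time-scheduled policy could be evaluated. It assumes each policy carries a single daily window; `ScheduledPolicy`, `active_policies`, and the policy names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class ScheduledPolicy:
    """Hypothetical cache policy that is active only inside a daily time window."""
    name: str
    start: time
    end: time

    def is_active(self, now: datetime) -> bool:
        t = now.time()
        if self.start <= self.end:
            # Same-day window, e.g. 08:00-18:00.
            return self.start <= t < self.end
        # Window that wraps past midnight, e.g. 22:00-06:00.
        return t >= self.start or t < self.end


def active_policies(policies, now: datetime) -> list:
    """Names of the policies that should be enabled at the given moment."""
    return [p.name for p in policies if p.is_active(now)]
```

Handling the wrap-around case explicitly matters because overnight batch jobs (renders, backups) are a natural fit for scheduled policies.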

Another unique aspect of this product is Multi-Path Support (MPS), a validation mechanism that matters because data on the storage system often changes outside the cache, for example when volume attributes are updated, older files are restored, or a client updates files directly on the storage system. MPS works as follows:

“When a client reads a file, MPS evaluates its cache lease time to determine whether it needs to check file server attributes. If not expired, the read will be served immediately from cache. If expired, MPS checks the backend file server to confirm no changes have occurred. If changes are found, MPS will pull the data from the file server, send it to the client, reset its lease, and update the cache. With regular activity, file leases should rarely expire since they are updated on most NFS operations. Expiration only occurs on idle files. MPS timeout can be configured from, for example, a minimum (e.g., 3 seconds) to a maximum (e.g., 24 hours).”
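The read path quoted above maps onto a small lease-validated cache. The sketch below follows the quoted steps but is not NetApp's implementation: `LeaseValidatedCache`, the attribute tuple used for change detection, and the 30-second default lease are assumptions, while the 3-second and 24-hour bounds are the example values from the patent text.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict

MIN_LEASE_S = 3          # example minimum quoted in the patent
MAX_LEASE_S = 24 * 3600  # example maximum quoted in the patent


@dataclass
class CachedFile:
    data: bytes
    attrs: tuple          # e.g. (mtime, size) as last seen on the file server
    lease_expires: float


class LeaseValidatedCache:
    """Hypothetical lease-based read path modeled on the MPS description."""

    def __init__(self, get_attrs: Callable[[str], tuple],
                 get_data: Callable[[str], bytes],
                 lease_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        if not MIN_LEASE_S <= lease_s <= MAX_LEASE_S:
            raise ValueError("lease outside the configurable range")
        self.get_attrs, self.get_data = get_attrs, get_data
        self.lease_s, self.clock = lease_s, clock
        self.files: Dict[str, CachedFile] = {}

    def read(self, path: str) -> bytes:
        entry = self.files.get(path)
        if entry is not None and self.clock() < entry.lease_expires:
            return entry.data                      # lease valid: serve from cache
        attrs = self.get_attrs(path)               # lease expired (or miss): ask the file server
        if entry is None or attrs != entry.attrs:  # changed on the server: pull fresh data
            entry = CachedFile(self.get_data(path), attrs, 0.0)
            self.files[path] = entry
        self.touch(path)                           # reset the lease
        return entry.data

    def touch(self, path: str) -> None:
        """Renew a lease; called on most NFS operations, so only idle files expire."""
        entry = self.files.get(path)
        if entry is not None:
            entry.lease_expires = self.clock() + self.lease_s
```

Note the trade-off the lease length controls: while a lease is valid, a change made directly on the storage system is not seen, so a shorter lease means fresher data at the cost of more attribute checks against the file server.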

The patent filings also reveal that the Cache IQ software uses a Ranked Priority Multi-Queue (RPMQ) replacement algorithm that “balances access frequency with customer-defined priority values”. This is implemented with two sets of queues: global queues (ordered by access frequency) and per-priority shadow queues (used to choose which block to evict first).
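The exact RPMQ weighting is not described here, so the following is a minimal sketch under one simple assumption: priority is checked first, then frequency. Global queues are indexed by log2 of the access count (as in the classic multi-queue algorithm) and kept in LRU order. The shadow queues identify the lowest priority currently cached, and the global queues, scanned coldest-first, pick the victim within that priority. `RPMQSketch` and its methods are hypothetical.

```python
from collections import OrderedDict


class RPMQSketch:
    """Hypothetical ranked-priority multi-queue cache (tracks block ids only)."""

    def __init__(self, capacity: int, num_queues: int = 4):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.num_queues = num_queues
        self.hits = {}       # block -> access count
        self.priority = {}   # block -> customer-defined priority (higher = keep longer)
        # Global queues indexed by frequency level; each OrderedDict is LRU-first.
        self.global_queues = [OrderedDict() for _ in range(num_queues)]
        # Shadow queues: one LRU-ordered OrderedDict per priority value.
        self.shadow = {}

    def _level(self, count: int) -> int:
        # Multi-queue placement: queue index grows with log2(access count).
        return min(count.bit_length() - 1, self.num_queues - 1)

    def _unlink(self, block) -> None:
        del self.global_queues[self._level(self.hits[block])][block]
        del self.shadow[self.priority[block]][block]

    def access(self, block, priority: int):
        """Record an access; returns the evicted block id, if any."""
        evicted = None
        if block in self.hits:
            self._unlink(block)
            self.hits[block] += 1
        else:
            if len(self.hits) >= self.capacity:
                evicted = self._evict()
            self.hits[block] = 1
        self.priority[block] = priority
        self.global_queues[self._level(self.hits[block])][block] = None
        self.shadow.setdefault(priority, OrderedDict())[block] = None
        return evicted

    def _evict(self):
        # Shadow queues identify the lowest priority currently cached ...
        lowest = min(p for p, q in self.shadow.items() if q)
        # ... and the global queues (coldest first, LRU first) pick the victim.
        victim = next(b for q in self.global_queues for b in q
                      if self.priority[b] == lowest)
        self._unlink(victim)
        del self.hits[victim], self.priority[victim]
        return victim
```

A real implementation would more likely blend the two signals, for example letting a very hot low-priority block outlive a cold high-priority one; strict ordering is simply the easiest variant to read.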

Another interesting feature is the “greedy threshold”, which is best described in the patent entitled “SYSTEM AND METHOD FOR OPERATING A SYSTEM TO CACHE A NETWORKED FILE SYSTEM”:

“...the RPMQ caching algorithm can determine the appropriate rank for the cacheable entity. Each rank may contain a total capacity as well as a “greedy threshold.” The greedy threshold determines the amount of space within the rank that can be filled by cacheable entities that have not earned their position in the rank due to frequency of access. The purpose of the greedy threshold is to allow higher ranks with available space to partially fill even if there are not enough cacheable entities to fill the rank based off their natural queue position.”
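The quoted rule can be expressed as an admission check on a single rank. In this minimal sketch, an entity that has not earned the rank through access frequency is admitted only while free space remains and the greedy threshold has not been reached. The rule that an earned entity may displace the oldest greedy one is an assumption, not something the quote states. `Rank` and `admit` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Rank:
    """Hypothetical cache rank with a total capacity and a greedy threshold."""
    capacity: int                 # total slots in this rank
    greedy_threshold: int         # max slots for entities that have not earned the rank
    earned: Set[str] = field(default_factory=set)
    greedy: List[str] = field(default_factory=list)  # oldest first

    def free_slots(self) -> int:
        return self.capacity - len(self.earned) - len(self.greedy)

    def admit(self, entity: str, earned_by_frequency: bool) -> bool:
        if earned_by_frequency:
            # Assumption: an entity that earned its rank may displace the oldest greedy one.
            if self.free_slots() == 0 and self.greedy:
                self.greedy.pop(0)
            if self.free_slots() > 0:
                self.earned.add(entity)
                return True
            return False
        # Unearned entities may only partially fill the rank, up to the threshold.
        if self.free_slots() > 0 and len(self.greedy) < self.greedy_threshold:
            self.greedy.append(entity)
            return True
        return False
```

With `capacity=4` and `greedy_threshold=2`, two unearned entities fit but a third is refused. That refusal keeps half of the rank available for entities that reach it on merit.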

But what happens when there is a RapidCache failure?

Failure Scenarios 
During a failure, the RapidCache appliance automatically becomes a pass-through between the client and the NAS system, without any reconfiguration. An administrator can also place a data server node into manual bypass mode, which allows maintenance on that data server without downtime.
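The two modes described above can be modeled as a small state machine in front of the read path. This minimal sketch assumes a failure surfaces as an exception from the caching layer. `CacheError`, `Mode`, and `DataServerNode` are hypothetical names, and a dict stands in for the cache.

```python
from enum import Enum
from typing import Callable, MutableMapping


class CacheError(Exception):
    """Hypothetical error raised when the caching layer fails."""


class Mode(Enum):
    CACHING = "caching"
    PASS_THROUGH = "pass-through"    # entered automatically on a failure
    MANUAL_BYPASS = "manual-bypass"  # set by an administrator for maintenance


class DataServerNode:
    def __init__(self, cache: MutableMapping, nas_read: Callable[[str], bytes]):
        self.cache = cache
        self.nas_read = nas_read
        self.mode = Mode.CACHING

    def set_manual_bypass(self, enabled: bool) -> None:
        self.mode = Mode.MANUAL_BYPASS if enabled else Mode.CACHING

    def handle_read(self, path: str) -> bytes:
        if self.mode is Mode.CACHING:
            try:
                data = self.cache.get(path)
                if data is None:
                    data = self.nas_read(path)
                    self.cache[path] = data
                return data
            except CacheError:
                # No client or NAS reconfiguration: simply stop using the cache.
                self.mode = Mode.PASS_THROUGH
        # Pass-through and manual bypass both forward every request to the NAS.
        return self.nas_read(path)
```

Because the appliance already sits inline and speaks to the NAS on the client's behalf, falling back only requires forwarding every request. Clients keep the same mounts and connections.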

The patent also reveals that the solid-state drives within the RapidCache appliance are “treated as an independent virtual tier, without RAID”. It goes on to state:

“In the event of a failed SSD, the overall cache size will shrink only by the missing SSD. The previously cached data will be retrieved from the file server (as requested) and stored on available media per policy.”
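The consequence of running SSDs without RAID can be shown with a toy tier. Losing one drive removes only its capacity and the blocks stored on it; those blocks are then refetched from the file server on the next read. In this minimal sketch, `SSDTier`, the drive names, and the 400 GB sizes are illustrative. The placement rule (put new blocks on the healthy SSD holding the fewest blocks) is an assumed stand-in for the unspecified "policy".

```python
from typing import Callable, Dict, List, Tuple


class SSDTier:
    """Hypothetical cache tier in which every SSD is independent (no RAID)."""

    def __init__(self, capacities_gb: Dict[str, int]):
        self.capacities_gb = dict(capacities_gb)
        self.failed = set()
        self.blocks: Dict[str, Tuple[str, bytes]] = {}  # block id -> (ssd, data)

    def healthy(self) -> List[str]:
        return [s for s in self.capacities_gb if s not in self.failed]

    def total_capacity_gb(self) -> int:
        # The cache shrinks only by the capacity of failed drives.
        return sum(self.capacities_gb[s] for s in self.healthy())

    def fail(self, ssd: str) -> None:
        # Only blocks stored on the failed SSD are lost; nothing needs rebuilding.
        self.failed.add(ssd)
        self.blocks = {b: v for b, v in self.blocks.items() if v[0] != ssd}

    def read(self, block: str, fetch_from_server: Callable[[str], bytes]) -> bytes:
        if block in self.blocks:
            return self.blocks[block][1]
        data = fetch_from_server(block)  # miss: retrieve from the file server
        healthy = self.healthy()
        if healthy:
            # Assumed placement policy: the healthy SSD holding the fewest blocks.
            load = {s: 0 for s in healthy}
            for ssd, _ in self.blocks.values():
                load[ssd] += 1
            self.blocks[block] = (min(healthy, key=lambda s: load[s]), data)
        return data
```

With four 400 GB drives, failing one drops the cache from 1600 GB to 1200 GB. Only blocks that lived on that drive become misses, and the file server remains the authoritative copy throughout.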

Several failure scenarios are explored in the patent application entitled “SYSTEM AND METHOD FOR MANAGING A SYSTEM OF APPLIANCES THAT ARE ATTACHED TO A NETWORKED FILE SYSTEM”. The primary topics include the sequence of events when the interconnect bus is unplugged, flow director failure, and how cache coherency is maintained.

NetApp filed these patent applications on September 18, 2013, and they were published on March 20, 2014. Visit the USPTO website to read the applications in their entirety.

Sunday, April 13, 2014

NetApp Confirms ‘Heartbleed’ Bug In Certain Products

NetApp last week released an advisory confirming that seven of its current products are vulnerable to the widely publicized security flaw known as the “Heartbleed” bug. These vulnerable products are:

  • Antivirus Connector for Clustered Data ONTAP
  • NetApp Manageability SDK (5.0P1 and later)
  • OnCommand Unified Manager Core Package (5.x only)
  • OnCommand Workflow Automation (2.2RC1 and later)
  • SMI-S Agent for Data ONTAP
  • SMI-S Providers for E-Series
  • SnapProtect (10.0 and service packs)

The Heartbleed bug is a serious security vulnerability in OpenSSL 1.0.1 releases prior to 1.0.1g that allows remote attackers to obtain sensitive information from process memory via crafted packets that trigger a buffer over-read. It is caused by a missing bounds check in the handling of Transport Layer Security (TLS) heartbeat extension packets.
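The missing bounds check is easiest to see in a simulation. A heartbeat request carries a 1-byte type, a 2-byte payload length, the payload, and at least 16 bytes of padding (RFC 6520). A vulnerable server trusts the claimed length and echoes that many bytes, even past the end of the record it received. The fix in OpenSSL 1.0.1g silently discards any request where 1 + 2 + payload length + 16 exceeds the record length. The Python below is a hypothetical model of that logic, not OpenSSL code: `handle_heartbeat` is an illustrative name, and a byte string stands in for process memory.

```python
import struct
from typing import Optional

HEARTBEAT_REQUEST = 1
HEARTBEAT_RESPONSE = 2
MIN_PADDING = 16  # RFC 6520 requires at least 16 bytes of padding


def handle_heartbeat(memory: bytes, offset: int, record_len: int,
                     bounds_check: bool) -> Optional[bytes]:
    """Simulate answering a heartbeat stored at memory[offset:offset + record_len].

    `memory` stands in for the server's process memory, so bytes beyond the
    record belong to unrelated data (keys, passwords, other sessions).
    """
    if bounds_check and record_len < 1 + 2 + MIN_PADDING:
        return None  # too short to be a valid heartbeat message
    msg_type = memory[offset]
    (payload_len,) = struct.unpack_from("!H", memory, offset + 1)
    if msg_type != HEARTBEAT_REQUEST:
        return None
    if bounds_check and 1 + 2 + payload_len + MIN_PADDING > record_len:
        return None  # claimed payload exceeds what was received: discard silently
    # Without the check, payload_len is trusted and the copy runs past the record.
    start = offset + 3
    payload = bytes(memory[start:start + payload_len])
    # Real responses also carry random padding; omitted here.
    return struct.pack("!BH", HEARTBEAT_RESPONSE, payload_len) + payload
```

A request that sends 2 bytes of payload but claims 64 causes the unchecked handler to return whatever follows the record in memory. The checked handler drops the same request.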

Until software fixes are issued for the affected products, NetApp recommends deploying third-party intrusion prevention system (IPS) and intrusion detection system (IDS) products to stop attacks.
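To illustrate what such products look for, the sketch below flags a heartbeat request in a raw TLS record whose claimed payload could not fit in the record. It only works when the heartbeat is sent in cleartext, as many public exploits did by sending it before the handshake completed. Once the session is encrypted, an IDS cannot read these fields. `looks_like_heartbleed` is a hypothetical name, and this is not any vendor's signature and not NetApp guidance.

```python
import struct

TLS_HEARTBEAT = 24  # TLS record content type for heartbeat (RFC 6520)
HEARTBEAT_REQUEST = 1
MIN_PADDING = 16


def looks_like_heartbleed(packet: bytes) -> bool:
    """Flag a cleartext TLS heartbeat request whose claimed payload exceeds its record."""
    if len(packet) < 8:  # 5-byte record header + 3-byte heartbeat header
        return False
    content_type, _version, record_len = struct.unpack_from("!BHH", packet, 0)
    if content_type != TLS_HEARTBEAT:
        return False
    msg_type, payload_len = struct.unpack_from("!BH", packet, 5)
    return msg_type == HEARTBEAT_REQUEST and 1 + 2 + payload_len + MIN_PADDING > record_len
```

The check mirrors the server-side fix. The difference is that an IDS applies it on the wire, which is why it can protect unpatched servers.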

NetApp will continue to update its advisory, entitled “NTAP-20140410-heartbleed”, as more information becomes available.

Tuesday, March 25, 2014

NetApp to Recommend Increase to Cluster Port Count

With the recent debut of release candidates for clustered Data ONTAP 8.2.1, NetApp now recommends that customers running FAS6280, FAS6290, FAS8040, and FAS8060 systems use all four onboard ports for the cluster interconnect. While this is not a requirement, the additional interconnect ports are necessary to reach peak performance for "remote workloads", that is, when data is served through a logical interface (LIF) whose home port is on a different node from the one that actually owns the data.
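A back-of-the-envelope sketch shows why port count matters for remote workloads. Every byte of a remote read crosses the cluster interconnect, so the interconnect's raw aggregate bandwidth caps remote throughput. The function names and node names are hypothetical, the 10 GbE port speed is an assumption, and the arithmetic ignores protocol overhead and how ONTAP actually schedules interconnect traffic.

```python
def is_remote(lif_home_node: str, data_owner_node: str) -> bool:
    # Remote workload: the LIF's home node is not the node that owns the data,
    # so every byte read must cross the cluster interconnect.
    return lif_home_node != data_owner_node


def interconnect_gbps(cluster_ports: int, port_gbps: float = 10.0) -> float:
    # Raw aggregate interconnect bandwidth per node, ignoring protocol overhead.
    # Assumption: 10 GbE cluster ports.
    return cluster_ports * port_gbps
```

Under these assumptions, moving from 2 to 4 cluster ports raises the raw ceiling for remote reads from 20 Gb/s to 40 Gb/s per node, which is most noticeable for large sequential reads.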

This recommendation means that high-throughput applications (such as animation, rendering, or computer-aided design) can use the cluster interconnect more effectively during large sequential remote reads. Future releases of clustered Data ONTAP are expected to improve interconnect performance for additional workloads.

Even with clustered Data ONTAP 8.2.1, setting up a switched cluster still defaults to 2 cluster interfaces; however, the default can be overridden to use 4. Once configured, clustered Data ONTAP components (SpinNP, CSM, etc.) automatically load-balance across the interfaces, just as they have for many years.

It is also possible to reconfigure an existing node from 2 to 4 cluster interfaces.

Clustered Data ONTAP 8.2.1 RC2 is now available for download from the NetApp Support Site.