
SAS vs SATA

Discussion in 'PC Forum' started by Jakeman, Sep 12, 2014.

  1. Jakeman

    I am going to build a file server soon and I have been reading about SAS vs SATA. I thought I would post some explanation here so others can benefit.

    First, recognize that SAS (Serial Attached SCSI) is the interface: the protocol, the controller, and the physical connector, not the drive itself. But the term is often used to describe a type of hard drive, so it's overloaded.

    The hard drive itself:

    As I stated previously, SAS is not the drive itself. Unfortunately there is a loose association between SAS and certain hard drive attributes (high rotational speed, small capacity, high reliability), and most people accept that association without separating and understanding the two. I have read lots of guides which basically say, "get SAS if you want these hard drive attributes." But that is not necessarily true.

    A more recent category of drive, NL-SAS (nearline SAS), breaks the loose association between SAS and those attributes. NL-SAS is still SAS. "Nearline" is just a qualifier that suggests an intended application. For example, I have been looking at these drives for my file server:

    http://www.wdc.com/en/products/products.aspx?id=580

    The WD enterprise drives I am looking at are NL-SAS. These drives shift the balance between rotational speed and capacity. SAS usually means high RPM and small capacity, but these NL-SAS drives have lower RPM and large capacity. Typical numbers look like this:


    SAS:
    10k or 15k RPM
    < 1TB

    NL-SAS:
    5400 or 7200 RPM
    1 - 4TB

    Both are SAS. But the "NL-SAS" label creates a new loose association between the SAS connector and common hard drive attributes.

    The third attribute (high reliability) also changes. This is measured in bit error rate (BER), which WD calls "Non-recoverable read errors per bits read." Typical numbers look like this:


    SAS:
    1 in 10^16 BER

    NL-SAS:
    1 in 10^15 BER (beware that WD writes this as "10 in 10^16", so don't be fooled by the "^16")

    So these NL-SAS drives are 10x more likely than the typical SAS drive to hit a non-recoverable read error, meaning data the drive cannot read back. This table puts the different magnitudes of error rates in perspective to give you a better idea of what is good and what is not (and I have copied the table below):

    sas-sata-table1.jpg
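
    To put rough numbers on the error rates above, here is a small Python sketch. It assumes each bit read fails independently at the quoted rate, which is a simplification, and the 4 TB drive size and the function names are just my own illustration, not anything from a datasheet. It first checks that WD's "10 in 10^16" is the same rate as "1 in 10^15", then estimates the chance of hitting at least one non-recoverable read error while reading a whole drive once.

```python
import math
from fractions import Fraction

def error_rate(errors, bits):
    """Errors per bit read, kept exact so different notations can be compared."""
    return Fraction(errors, bits)

def prob_at_least_one_error(bytes_read, rate):
    """P(at least one non-recoverable error) when reading bytes_read bytes,
    assuming independent bit errors at the given per-bit rate."""
    bits = bytes_read * 8
    # 1 - (1 - rate)**bits, computed with log1p/expm1 so tiny rates don't round away
    return -math.expm1(bits * math.log1p(-float(rate)))

nl_sas = error_rate(10, 10**16)   # WD's "10 in 10^16" notation
sas = error_rate(1, 10**16)

drive_bytes = 4 * 10**12          # assumed 4 TB drive, read end to end once

p_nl_sas = prob_at_least_one_error(drive_bytes, nl_sas)
p_sas = prob_at_least_one_error(drive_bytes, sas)

print(nl_sas == error_rate(1, 10**15))
print(f"NL-SAS: {p_nl_sas:.2%}  SAS: {p_sas:.2%}")
```

    Under these assumptions a single full read of a 4 TB drive comes out at roughly 3% for the NL-SAS rate versus roughly 0.3% for the SAS rate, which is the 10x gap in concrete terms.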

    Again, these drive attributes are loosely associated with SAS and NL-SAS, but they really have nothing to do with the SAS connector itself.

    The SAS connector itself:

    Ignoring the commonly associated hard drive attributes, the SAS connector itself has real benefits over SATA:


    Command Queuing and Reordering: The SAS controller handles many concurrent read and write operations better, because queued commands can be serviced out of order.

    Full Duplex Operation: Again, the SAS controller and connector handle high concurrency better. Full duplex means the bus can send and receive data at the same time. It's a two-lane road, whereas half duplex (which SATA uses) is a one-lane road.

    Bad Sector Recovery of 7-15 seconds: This is as opposed to up to 30 seconds for most SATA controllers. Basically this means the SAS controller is less tolerant of sectors that are hard to read. In a RAID configuration or a file system with its own checksums and redundancy (like ZFS) this means the system will more readily make use of parity and error correction. This is good behavior to have in a storage system with fault tolerance to fall back on. A "trouble" sector will be compensated for sooner rather than tolerated.

    Internal Data Integrity Checks: The SAS controller constantly works to ensure integrity of data in transmission. If data gets corrupted during transmission to the drive then the drive would normally write out that corrupt data, but SAS works to make sure that data is not corrupted up to the point that it is handed off to the drive.
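
    As a toy illustration of why command reordering helps under load, the Python sketch below compares servicing queued requests in arrival order against a simple elevator-style order. This is not how any particular SAS controller or drive firmware actually schedules commands (real drives also account for rotational position); the block addresses, the starting head position, and the function names are all made up for the example.

```python
def head_travel(start, order):
    """Total distance the head moves when servicing requests in the given order."""
    travel, pos = 0, start
    for lba in order:
        travel += abs(lba - pos)
        pos = lba
    return travel

def elevator_order(start, pending):
    """Service everything at or above the head while moving up, then the rest moving down."""
    up = sorted(lba for lba in pending if lba >= start)
    down = sorted((lba for lba in pending if lba < start), reverse=True)
    return up + down

# Hypothetical queue of block addresses, in arrival order, with the head at 500
pending = [900, 100, 850, 150, 800]
start = 500

fifo_travel = head_travel(start, pending)
reordered = elevator_order(start, pending)
reordered_travel = head_travel(start, reordered)

print("arrival order:", pending, "travel:", fifo_travel)
print("reordered:    ", reordered, "travel:", reordered_travel)
```

    The deeper the queue, the more room there is to reorder, which is why this matters mostly for servers with many concurrent requests.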

    Conclusion:

    SAS itself is good for high usage applications and it contributes some degree of reliability independent of the drive itself. But the drive itself is mainly what defines overall reliability (look for the bit error rate mentioned previously). There is nothing to prevent a manufacturer from putting a SAS connector on an unreliable drive, so do your research and read the specs of whatever drive you are considering. As mentioned previously, so-called "nearline" SAS drives are basically SAS drives with a 10x higher error rate than is typical for hard drives with SAS connectors.

    Not all SAS is equal. Remember that you are buying a hard drive with a SAS connector, not a SAS drive. Research the hard drive specs. Don't just assume that SAS = reliability. Look for the error rate in the drive specs and use the table above to put it in perspective. And for any large storage system you should always have fault tolerance.
     
    Last edited: Sep 12, 2014
  2. smack

    Hm! I didn't know SATA and SAS had different reliability. I also didn't know about the performance advantages of SAS. That makes me want SAS.

    I've heard "Bad Sector Recovery of 7-15 seconds" referred to as TLER (time-limited error recovery), though different manufacturers have different names for it. With a shorter timeout for retrying reads (internal to the drive), the drive gives up and reports the read error to the RAID controller sooner. The controller can then serve the read by reconstructing the data from parity, instead of waiting up to a few minutes while the drive keeps trying to read the bad sector. This trades reliability for performance: technically, a drive with a lower read retry timeout is less reliable on its own, because it abandons sectors that more retries might have recovered. That may make sense in a business context where the drive is in a RAID and you only want 7 to 15 seconds of bad performance instead of a few minutes. But it probably isn't important in the home fileserver context. TLER would also increase the chance of failing to reconstruct a critical RAID (a RAID that can't have another drive fail without losing data). But I'm not sure this is significant from a probability standpoint when the drives are used in a RAID.
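
    Here is a rough Python sketch of that tradeoff. It is a toy model, not any vendor's actual behavior: the 20 second recovery time and the 120 second "long" retry limit are made-up numbers (only the 7 second limit comes from the range quoted above), parity reconstruction is treated as instant, and the names are my own.

```python
from dataclasses import dataclass

@dataclass
class ReadResult:
    source: str     # "disk", "parity", or "lost"
    seconds: float  # time spent on the drive before the array could answer

def read_sector(recovery_needed_s, drive_retry_limit_s, has_redundancy):
    """Toy model: the drive retries until it succeeds or hits its retry limit."""
    if recovery_needed_s <= drive_retry_limit_s:
        return ReadResult("disk", recovery_needed_s)
    # The drive gave up; the array can only answer if parity is still available.
    if has_redundancy:
        return ReadResult("parity", drive_retry_limit_s)
    return ReadResult("lost", drive_retry_limit_s)

needed = 20.0  # hypothetical: this sector would be readable after 20 s of retries

healthy_short = read_sector(needed, 7.0, has_redundancy=True)
healthy_long = read_sector(needed, 120.0, has_redundancy=True)
critical_short = read_sector(needed, 7.0, has_redundancy=False)
critical_long = read_sector(needed, 120.0, has_redundancy=False)

for name, result in [("healthy array, 7 s limit", healthy_short),
                     ("healthy array, 120 s limit", healthy_long),
                     ("critical array, 7 s limit", critical_short),
                     ("critical array, 120 s limit", critical_long)]:
    print(f"{name}: {result.source} after {result.seconds:.0f} s")
```

    In this model the short limit wins on a healthy array (the answer comes from parity after 7 s instead of from disk after 20 s), and it only hurts on a critical array, where the sector is lost even though a longer retry would have recovered it.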

    Not related to SAS:
    There is a good paper about drive reliability (with an enormous sample size). It says there's a strong correlation between failure rate and drive make/model.
    Failure Trends in a Large Disk Drive Population (section 3.2)

    Google won't tell us which makes/models are unreliable, but Backblaze will. :-)
    What Hard Drive Should I Buy (wtf Seagate?)
     
