Gizmodo Explains: What a RAID Hard Drive Array Is (and Why You Want One)

via Gizmodo by matt buchanan on 4/8/09

RAID: It's not just for Warcraft nerds anymore. If you've got a ton of music, photos and video and you don't know about RAID hard-drive arrays yet, read this—or wave your precious media files buh-bye.

RAID sounds vaguely scary, like something you do to a French village if you're a Viking or what you do on a Saturday night if you're a lonely, sad person, but really it just stands for redundant array of independent disks. Which really just means a bundle of hard drives acting like one superharddrive. There are a few different kinds of RAID configurations and they do different things: one kind gives you crazyfast performance, another uses redundancy to make your storage safer than a single hard drive acting alone.

The most likely place you're going to run into RAID—if you're not familiar with it already—is when you jump into the world of network attached storage, aka NAS, providing a huge bank of hard-drive goodness for storage, backup and media streaming across all of your home or office computers and devices.

But okay, let's jump into the basic kinds of RAID setups, known annoyingly as levels, even though they're mostly just different from one another rather than rungs on a ladder. If you wanna skim, the ones you should know are RAID 0, RAID 1, and RAID 5.

RAID 0 is all about performance, employing what's called striping, where data is broken up into fragments and written across multiple drives, sort of treating them as one giant drive. Let's assume we've got a setup with four hard disks. The performance edge comes from the fact you're getting massive throughput—it's like going from one lane to four, since you're writing and accessing all four drives in parallel. It's for pros and crazies handling massive files, like HD video editing. The downside is that if even one of the hard drives fails, you lose everything. Every file is now incomplete. It's not technically RAID since there's no redundancy going on—possibly hence the zero.
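To make striping concrete, here's a minimal Python sketch. It is not how any real controller works: the helper names (stripe, unstripe), the 4-byte block size and the plain lists standing in for disks are all made up for illustration.

```python
def stripe(data, num_disks, block_size=4):
    """Deal fixed-size blocks of data out to the disks, round-robin."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # Block 0 goes to disk 0, block 1 to disk 1, and so on, wrapping around.
        disks[(offset // block_size) % num_disks].append(block)
    return disks


def unstripe(disks):
    """Read the blocks back in the order they were dealt out."""
    blocks = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                blocks.append(disk[row])
    return b"".join(blocks)
```

Striping the 26 letters of the alphabet across four of these pretend disks puts ABCD and QRST on the first one, and unstripe hands the original bytes back. Empty out any one disk, though, and what comes back has holes in it, which is the "lose one, lose everything" problem in miniature.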

RAID 1 is the main configuration most novices should learn about. It writes, or mirrors, data to multiple disks, so you've got multiple hard drives that are exactly the same. Obviously, this is good for data reliability, since if one fails, you've got another. If you don't have a dedicated disk controller or host adapter, however, performance can be kind of crummy while the system tries to write to all the disks simultaneously, and it isn't going to be as good as a striped RAID configuration anyway. On top of that, you have to buy two 500GB disks just to get 500GB of storage, so it gets a little costly, too. Safety first!
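Mirroring is simple enough to sketch in a few lines of Python. The Mirror class below is hypothetical, with dictionaries standing in for disks and None marking a dead one; it only shows the idea that every write lands on every disk and a read can come from any survivor.

```python
class Mirror:
    """Toy RAID 1: every write goes to every working disk."""

    def __init__(self, num_disks=2):
        # Each pretend disk is a dict; a failed disk is replaced by None.
        self.disks = [{} for _ in range(num_disks)]

    def write(self, name, contents):
        for disk in self.disks:
            if disk is not None:
                disk[name] = contents

    def fail(self, index):
        self.disks[index] = None

    def read(self, name):
        # Any surviving copy will do, since they are all identical.
        for disk in self.disks:
            if disk is not None:
                return disk[name]
        raise OSError("all mirrors have failed")
```

Kill one disk after a write and read still returns the file. Kill both and you get the error, which is why a mirror is only as safe as its last surviving copy.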

RAID 2 stripes data like RAID 0, but at an even smaller level (bits instead of blocks), and uses additional hard drives and what's called Hamming code for error protection and parity, which allows it to recover corrupt data. Guess what? No one uses it anymore.
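If you're curious what a Hamming code actually does, here's a toy Python version of the classic Hamming(7,4) scheme: four data bits plus three parity bits, enough to find and fix any single flipped bit. Treat it as a sketch of the idea only. The encode and decode names are invented here, and the mapping to RAID 2 (picture each of the seven bits living on its own drive) is a simplification.

```python
def encode(data_bits):
    """Turn 4 data bits into a 7-bit codeword (parity at positions 1, 2 and 4)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Fix up to one flipped bit, then return the 4 data bits."""
    c = list(codeword)  # work on a copy so the caller's list is untouched
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The three checks, read as a binary number, give the 1-based position
    # of the bad bit (0 means nothing looks wrong).
    error_position = s1 + 2 * s2 + 4 * s4
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

Encode the bits 1, 0, 1, 1, flip any one of the seven resulting bits, and decode still gives back 1, 0, 1, 1. Flip two at once and it gets confused, since this particular code only promises to fix a single error.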

RAID 3 stripes data across multiple drives as well, but at the byte level, and it has a single disk dedicated to data parity and error correction. Because of the byte-level split, all the drives work together simultaneously as one unit, which means it can only do one read or write operation at a time. Pretty rare to see, and nothing you, Joe Q. Consumer, have to worry about. It's good for high transfer rates (again, HD video editing comes to mind) with a measure of security that you don't get with RAID 0, since you can lose a disk and still be okay.

RAID 4 is a striping+parity disk setup too, but at the larger block level, so disks can be more independent, and you can have multiple read operations in different places going on. Since you're only using a single disk for parity, which has to be written to every time you write data, you can still only have one write operation going on at a time.

RAID 5 is where much of the NAS-in-a-box action is today if you're not rolling with RAID 1, and it tries to offer the best of all RAID worlds, performance and redundancy, by combining the various configurations. It stripes data across multiple hard drives, but instead of just dumping parity data onto a single drive, it spreads it across all of the hard drives too, meaning there's no bottleneck from writing all the parity data to one disk. (Though writing parity data is still kind of a drag.) In this configuration, you can lose one hard drive and be okay, since whatever was on it can be rebuilt from the data and parity on the remaining disks. Sadly, there's some concern about its probability of failure over the next year as hard drives increase in size and the system expands: the bigger the drives and the array, the longer a rebuild takes and the more likely it is to hit a second error before it finishes.
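The parity trick is less magical than it sounds: it's basically XOR. Here's a rough Python sketch under some loud assumptions. The function names are made up, all blocks in a stripe are the same length, and the way parity rotates from disk to disk is simplified compared with the layouts real RAID 5 implementations use.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks, stripe_index):
    """Lay out one stripe: the data blocks plus one parity block.

    The parity block lands on a different disk for each stripe, so no
    single disk ends up holding all of the parity.
    """
    num_disks = len(data_blocks) + 1
    parity_disk = stripe_index % num_disks
    stripe = list(data_blocks)
    stripe.insert(parity_disk, xor_blocks(data_blocks))
    return stripe


def rebuild(stripe, lost_disk):
    """Recreate the block on the lost disk by XORing all the survivors."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_disk]
    return xor_blocks(survivors)
```

Because XORing every block in a stripe (parity included) always comes out to zero, any one missing block is just the XOR of the rest. That's why losing one disk is fine and losing two is not.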

RAID 6 is like RAID 5, but it uses two disks' worth of parity and correction data, set up so that if a second drive bombs out during data recovery, the system keeps on chugging.

Beyond the standard RAID levels, you've also got the multiples: RAID 1+0 and 0+1. In RAID 0+1, you take a striped set of drives (RAID 0 from above) and mirror them in a RAID 1 setup, so you get some redundancy. RAID 1+0 is the reverse: It stripes across a set of mirrored drives. With hard drives getting cheap, the army of hard drives you need to make this happen is easier to get going, so you might see more of it.
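The order of the numbers matters more than you'd think. The Python sketch below counts how many multi-disk failures each layout can shrug off. It rests on a simplifying assumption: in RAID 0+1, a whole striped set is treated as dead the moment one of its disks fails. The function names and the disk numbering are invented for the example.

```python
from itertools import combinations


def raid10_survives(failed, mirror_pairs):
    """RAID 1+0 lives as long as no mirror pair has lost both of its disks."""
    return all(not set(pair) <= failed for pair in mirror_pairs)


def raid01_survives(failed, stripe_sets):
    """RAID 0+1 lives as long as at least one striped set is fully intact."""
    return any(not (set(stripe_set) & failed) for stripe_set in stripe_sets)


def count_survivable(survives, groups, num_disks=4, failures=2):
    """Count the failure combinations the array would survive."""
    return sum(
        survives(set(combo), groups)
        for combo in combinations(range(num_disks), failures)
    )
```

With four disks grouped as (0, 1) and (2, 3), this toy model says RAID 1+0 survives 4 of the 6 possible two-disk failures, while RAID 0+1 survives only 2.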

Just to be clear: RAID's not a total substitute for backup. For average internet geeks, the best way to think about RAID is as a way to more reliably store a large bank of data (which we all have now with photos, videos, music and more) with extra protection against hard drive failure. You can still lose an entire RAID array to a virus or by accidentally overwriting it, and redundancy protects against neither.

If you buy networked storage in a box, like from Western Digital, you'll typically have options for RAID 1 or RAID 5, and which one you pick depends on what's more important to you: total redundancy for more reliability, or a combination of performance and reliability. Lifehacker has a step-by-step guide to building your own RAID array in your computer for some more hands-on how-to advice, and if you need some help picking your network storage solution, we've got you covered there too.
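To put rough numbers on the RAID 1 versus RAID 5 choice, here's a small Python sketch of how much usable space each level leaves you. The function name is made up, it assumes every disk is the same size, it treats RAID 1 as keeping a full copy on every disk, and it ignores filesystem overhead.

```python
def usable_capacity_gb(level, num_disks, disk_gb):
    """Rough usable space for an array of identical disks."""
    if level == "0":
        return num_disks * disk_gb          # no redundancy, all of it is yours
    if level == "1":
        if num_disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_gb                      # every disk holds the same copy
    if level == "5":
        if num_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (num_disks - 1) * disk_gb    # one disk's worth goes to parity
    if level == "6":
        if num_disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (num_disks - 2) * disk_gb    # two disks' worth goes to parity
    if level == "10":
        if num_disks < 4 or num_disks % 2:
            raise ValueError("RAID 1+0 needs an even number of disks, 4 or more")
        return (num_disks // 2) * disk_gb   # half the disks are mirrors
    raise ValueError("unknown RAID level: " + str(level))
```

Two 500GB disks in RAID 1 come out to 500GB, just like the example earlier, while four 500GB disks in RAID 5 come out to 1500GB.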

Still something you wanna know? Send any questions about raids, WoW or hard things to [email protected], with "Giz Explains" in the subject line.

by Ben Pike