Raid Optimization for Database Operations

Raid Optimization for Database Operations

Presented March 2, 2000 at the Unix Users Association of Southern California, Los Angeles Chapter by Brian Mann

Please note: These are outline notes only, and are not intended to be a stand-alone tutorial or lesson. You had to be there. If enough interest is expressed, I will consider refining and expanding this into a complete tutorial.

What is RAID

Defined by Patterson, Gibson & Katz at UCB in 1987
"A Case for Redundant Arrays of Inexpensive Disks (RAID)"

Need driven by expense & performance of SLED, failure rate of JBOD
Three main objectives

Performance

Reliability

Cost

Originally 5 levels

RAID 1 - Mirroring

RAID 2 - Byte striping with ECC

RAID 3 - Byte striping with parity drive

RAID 4 - Block striping with parity drive

RAID 5 - Block striping with striped parity

Additional levels
RAID 0 - Block striping, no parity

RAID 0+1 (1+0, 10) - Hybrid of striping and mirroring

RAID 0+3 (35) - Striped RAID 3 segments
RAID 0+5 (5+0,50) - Striped RAID 5 segments

RAID 6 - Block striping with striped redundant parity

RAID 7 - Asynchronous I/O w/ embedded RTOS

Implementing RAID

Hardware vs. software

Cost

Performance

Administration

Internal vs. external

Choosing a level

Capacity

Cost

Performance

Comparison of RAID Levels

Level Advantages Disadvantages
RAID 0 Fastest I/O
No overhead for parity
Simple design, easily implemented Not really RAID
One drive failure destroys all data
Not for mission-critical deployment
RAID 1 All drives usable for data reads
Can be implemented w/ 2 drives Greatest storage overhead - 100%
Highest cost/capacity ratio
RAID 3 High transfer rates
Degraded mode still fast Requires spindle synchronization
Can't do overlapped I/O
RAID 4 High read transfer rates
Efficient use of capacity Poor write rates
Parity drive can be bottleneck
RAID 5 Very high read rate
Efficient use of capacity Slower write rates
Slow rebuild times
RAID 6 Allows failure of multiple drives
Very poor write performance
Proprietary solution, rare
RAID 7 Supposed to be fastest Proprietary, very expensive
RAID 1+0 Very high reads and writes Most expensive

Level	Advantages	Disadvantages
RAID 0	Fastest I/O No overhead for parity Simple design, easily implemented	Not really RAID One drive failure destroys all data Not for mission-critical deployment
RAID 1	All drives usable for data reads Can be implemented w/ 2 drives	Greatest storage overhead - 100% Highest cost/capacity ratio
RAID 3	High transfer rates Degraded mode still fast	Requires spindle synchronization Can't do overlapped I/O
RAID 4	High read transfer rates Efficient use of capacity	Poor write rates Parity drive can be bottleneck
RAID 5	Very high read rate Efficient use of capacity	Slower write rates Slow rebuild times
RAID 6	Allows failure of multiple drives	Very poor write performance Proprietary solution, rare
RAID 7	Supposed to be fastest	Proprietary, very expensive
RAID 1+0	Very high reads and writes	Most expensive

Optimizing Database Performance

When using striping, match stripe size to data block size used by application. Use 2K stripe with 5-drive array to match 8K data blocks in Oracle,
4K stripe with 3-drive array (don't forget parity requirement!).

Other databases may have different requirements (Pick = 2K, others ?). Remember it's always a power of 2. Don't use 7 drives! Gives 6 data drives, not a factor of 2 for matching.

Rather than one big array, try to use multiple smaller ones.

Databases typically update indexes along with data, put them on different arrays. Also logging files, OS, swap, etc.
Don't run any other apps on database server!!! Data block sizes often don't match, will seriously degrade write performance.

Remember all tuning is compromise between cost, performance and reliability. I am assuming reliability is not negotiable, therefore tradeoff is price/performance.

Spend your cash on cache. The more cache you have, the better the performance, period! This also means use hardware RAID. More disks = more performance (keeping in mind power of 2 rule). For ultimate performance, use striping with mirrored drives (level 0+1). Most expensive per megabyte, works fastest in degraded mode, restores fastest.

Use multiple controllers or a multi-channel controller to get fastest sustained transfer rates. Multiple cards also give redundancy from card failure, if configured properly.
Case Study - CHA

Began when server upgrade didn't result in expected performance.

Developed performance tests matching profile of typical database usage. Performed reads and writes to files in same ratio as production environment.

Tested different controller cards using RAID levels 1 and 5, using identically configured machines.

Concluded DPT controller / level 1 best for us.

Test Results

Controller RAID Level Elapsed Time (min) (SAR)
%usr %sys %wio %idle
Mylex 1 211 10 5 84 0
DPT 1 171 18 7 75 0
Mylex 5 224 8 5 86 0
DPT 5 190 13 6 81 0

Controller	RAID Level	Elapsed Time (min)	(SAR) %usr	%sys	%wio
Mylex	1	211	10	5	84
DPT	1	171	18	7	75
Mylex	5	224	8	5	86
DPT	5	190	13	6	81

Sources

www.raid-advisory.com/

www.networkcomputing.com/811/811buyers.html

www.baydel.com/tutorial.html

www.icp-vortex.com/supp/tech/primerus.pdf

www.adtx.co.jp/english/raid.html

www.acnc.com/raid.html

www.twincom.com/raid.html

SmartRAID User's Manual, by DPT