SCSI vs. IDE/SATA Disks
From PostgreSQL Wiki
The common generalization is that SCSI disks are fast, reliable, and expensive, while IDE drives, including both PATA and SATA designs, are slow, unreliable, and cheap. The truth is a bit more complicated.
SCSI Disks
- Often have higher available maximum RPM (10K or 15K).
- Maximum capacity is lower (73-146GB are popular sizes).
- Cost/MB is high.
- Tend to be made with more expensive and possibly more reliable components, and believed to go through more thorough testing. The data from studies by Carnegie Mellon and Google don't show any significant bias toward SCSI being more reliable. But the data from Network Appliance suggests "SATA disks have an order of magnitude higher probability of developing checksum mismatches than Fibre Channel disks".
- Usually default to the write cache being turned off, ensuring reliable database operation.
- According to some sources such as NetApp, the firmware in SCSI devices tends to be better optimized for RAID use: it returns errors promptly so that data can be reconstructed from partner devices, whereas desktop-oriented SATA devices often struggle internally to repair the damage instead.
ATA Disks
- Most drives spin at lower RPM (7200 is standard, though some 10K designs exist, such as the Western Digital Raptor).
- Maximum capacity is higher (1TB available). This is achieved by putting more platters into the drive. More platters means more heat and moving parts, and all other things being equal that can contribute to a higher failure rate. More platters can also mean slower seeks and generally slower performance in cases where the read and write heads are heavier; on the flip side, in cases where you're moving across the whole disk more platters may reduce average seek time.
- Cost/MB is low. A fair performance comparison will recognize that while individual SCSI disks may be faster, if you can put more disks into the system because they're cheaper, the aggregate performance of the ATA-based solution may be better. This is limited by server space: you may reach the upper limit on disk expansion before you can add enough SATA disks to pull ahead.
- Always default to the write cache being enabled. Good (S)ATA RAID controllers will turn it off for you if set up correctly. It's possible to disable the cache on disks via the operating system, but relying on this can be dangerous: there are reports of drives that don't turn off caching regardless, and of cases where the write cache turns back on if the drive is reset. A better technique is to turn the cache off using the diagnostic tools most manufacturers provide, so that it defaults to off even after a reset. This requires some discipline on your part to make sure it happens even when disks are replaced (the time around disk replacement after a drive failure tends to be stressful).
If you get a good ATA controller, one that always turns off the individual disk caches for you, it's possible to build a reliable database system around ATA drives. But if you're just using the controller that comes integrated with your motherboard, unless you're very careful to validate that the write cache is disabled you risk database corruption if there's a crash.
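One rough way to check whether a write cache is still active is to time how fast the disk acknowledges synchronous writes. The sketch below is only an illustration, not a method taken from the sources above. It assumes a plain spinning disk with no battery-backed controller cache, and an operating system whose fsync really waits for the drive. The function names are made up for this example. The idea is that a platter can rewrite the same block at most once per rotation, so a 7200 RPM drive tops out near 120 such writes per second; a much higher measured rate suggests the writes are being acknowledged from a cache.

```python
import os
import tempfile
import time


def measure_fsync_rate(directory, writes=200):
    """Return the number of write+fsync cycles completed per second."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        block = b"x" * 8192  # 8 kB, the default PostgreSQL page size
        start = time.perf_counter()
        for _ in range(writes):
            os.lseek(fd, 0, os.SEEK_SET)  # rewrite the same block each time
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to flush the write to the device
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
        os.unlink(path)
    return writes / elapsed


def max_rotational_rate(rpm):
    """Upper bound on same-block rewrites per second for a spinning disk."""
    return rpm / 60.0


if __name__ == "__main__":
    # "." is a placeholder; point this at a directory on the disk under test.
    rate = measure_fsync_rate(".")
    limit = max_rotational_rate(7200)
    print(f"{rate:.0f} fsyncs/s measured; a 7200 RPM platter allows about {limit:.0f}")
    if rate > 2 * limit:
        print("Rate exceeds what the platter can do: a write cache is probably active.")
```

A result well above the rotational limit does not prove the data is at risk (a battery-backed controller cache gives the same result safely), so treat it as a prompt for further checking rather than a verdict.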
SCSI-based setups generally avoid this issue by having sane defaults for database use. You will also likely get higher transfer rates, and in particular better seek performance, from an individual SCSI disk than from a single ATA one. But in cases where you can throw more disks at the problem, being able to purchase more ATA disks per dollar may result in a system that's considerably faster than one built by spending the same amount on SCSI hardware.
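The disks-per-dollar tradeoff, including the limit imposed by available drive bays, can be sketched with some simple arithmetic. Everything below is hypothetical: the prices, per-disk operation rates, budget, and bay counts are placeholders chosen only to show the calculation, not measurements, and the model assumes performance scales linearly with the number of disks, which real arrays only approximate.

```python
def aggregate_throughput(budget, price_per_disk, ops_per_disk, max_bays):
    """Return (disk_count, total_ops) for one drive type.

    The disk count is capped by both the budget and the number of bays.
    """
    affordable = int(budget // price_per_disk)
    count = min(affordable, max_bays)
    return count, count * ops_per_disk


# Placeholder figures chosen only to illustrate the arithmetic.
BUDGET = 2400
BAYS = 8

scsi = aggregate_throughput(BUDGET, price_per_disk=600, ops_per_disk=180, max_bays=BAYS)
sata = aggregate_throughput(BUDGET, price_per_disk=200, ops_per_disk=80, max_bays=BAYS)

print("SCSI:", scsi)  # (4, 720)
print("SATA:", sata)  # (8, 640)

# With 12 bays the budget, not the chassis, becomes the limit for SATA.
print("SATA, 12 bays:", aggregate_throughput(BUDGET, 200, 80, 12))  # (12, 960)
```

With these made-up numbers the SATA array falls behind in an 8-bay chassis but pulls ahead once there is room for all twelve affordable disks, which is the server-space limit described above.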
Recommended SATA Controllers
- 3ware units are generally agreed to be solid, with some concerns about their RAID 5 performance.
- LSI MegaRAID controllers (but not the SATA 150-2) are considered very reliable but somewhat slower than those from the other vendors listed here.
- Areca controllers are very fast but harder to obtain, and since they're newer they're not as time-tested.
Helpful vendors of SATA RAID systems
(Derived from http://archives.postgresql.org/pgsql-performance/2006-11/msg00136.php)
Studies on drive reliability
- Failure Trends in a Large Disk Drive Population - Google study
- Disk failures in the real world - Carnegie Mellon study
- An Analysis of Data Corruption in the Storage Stack (shorter version) - University of Wisconsin-Madison and Network Appliance study
