Cover V12, I08

Tapes: A Modern History, Trends

Henry Newman

During the past 30 years or so, tape technology has not changed nearly as much as rotating storage (disks). But in most environments, especially enterprise environments, tape is still a requirement for continuity of operation. In most cases, even if you have an off-site remote mirror, tapes are still required as a third copy. Given that, most sites are still going to need tape in the future. So this month I will talk about many of the issues surrounding tape hardware.

As shown in Table 1, changes in this area have been incremental at best. If tape density and speed had increased at the same rate as CPU system performance over the same period, you would have a tape drive that writes at 629 GB/sec and a tape that holds over 73 TB uncompressed. Of course, that's impossible, just as impossible as having a disk drive spin at over 600 million revolutions per second, which is the same ratio.

A more linear improvement in enterprise tape performance has only been available with the release of the StorageTek T9940B and, most recently, the Ultrium-II drive. Given that most of the world is running RAID for critical environments, it might be fairer to compare RAID capabilities rather than those of a single disk. So, over the same period, we see:

Tape Density Increase 1333 Times
Tape Transfer Rate Increase 24 Times
Disk Density Increase 2250 Times
Disk RAID-5 8+1 LUN Density Increase 18000 Times
Disk Transfer Rate (Single, Average) Increase 21 Times
Disk RAID Transfer Rate (LUN, Average) Increase 133 Times

From whatever angle you look at it, the performance and density of tapes are out of step with those of rotating storage. If you add a comparison with CPU performance, the imbalance is even larger.

Tape Types

Most enterprise environments use tapes that write in a linear format, but another tape type exists that generally has a higher density -- known as helical tape. With linear tape, the data is written lengthwise down the tape. With helical tape, the data is written in diagonal stripes across the tape by a rotating head, which is why there is more contact between the tape and the heads on the tape drive.

Here are some general comparisons between these two tape types:

  • A very small defect on a helical tape can corrupt the data if the error correction buffer is full. Error correction space is often left on the tape and if that space fills up, the tape becomes unreadable.
  • Helical tape heads wear out long before linear tape heads because the tape heads make more intimate contact with the tape.
  • Reliability is generally higher for linear tapes than for helical tapes, for both the media and the head life of the drive, because more contact means more wear.
  • Because of media wear, high-end linear tapes generally have a longer storage life than high-end helical tapes.

Linear tape products include the IBM 3590B/E, STK 9840/9940, Quantum SuperDLT, the older DLT 7000/8000, and LTO, which many vendors sell.

Helical tape vendors include Sony, which makes AIT-1, AIT-2, and AIT-3, as well as the DTF line of tapes. Other helical types include 8mm Mammoth and Mammoth-2, and 4mm (DAT).

Even for a single tape type, different tape providers often claim better technology built into the construction. You will have to determine which claims are valid and make sense for the different tape types.

Compression

Almost all tape drives (unlike disks and RAID) automatically compress the data input stream. This is an important consideration when determining drive types because different drives have different compression algorithms. Not surprisingly, enterprise tape drives from IBM and StorageTek have higher compression rates than lower-end drives such as DLT and Mammoth. Drive vendors often provide estimated compression rates, but these are averages and your mileage may vary. Compression is important because the cost of the media can be large relative to the cost of the drive. Consider the following example:

Drive 1
Drive Cost: $35,000
Media Cost: $75
Compression: 5 to 1
Drive Size: 250 GB

Drive 2
Drive Cost: $5,000
Media Cost: $75
Compression: 2 to 1
Drive Size: 250 GB

Let's say you have 400 TB of raw data over a time period that will need to be backed up. Drive One will require 328 pieces of media at a cost of $24,600 and, including the drive itself, will cost a total of $59,600. Drive Two will require 820 pieces of media at a cost of $61,500 and, again including the drive itself, will cost a total of $66,500. (Both counts take 1 TB as 1,024 GB and round up to whole tapes.) The cost of a larger tape library and the cost of software licensing must also be considered, as some vendors license by the number of tapes.
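The arithmetic behind this comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "Drive Size" is read as the native capacity of one tape, 1 TB is taken as 1,024 GB, and partial tapes are rounded up to whole cartridges, so a count can differ by one from a figure obtained by truncation. The function names are hypothetical.

```python
import math

GB_PER_TB = 1024  # assumption: binary units, which reproduce the 820-tape figure for Drive 2


def tapes_needed(raw_tb, native_gb, compression_ratio):
    """Whole cartridges needed to hold raw_tb of data at the given compression."""
    effective_gb = native_gb * compression_ratio  # data held per tape after compression
    return math.ceil(raw_tb * GB_PER_TB / effective_gb)


def total_cost(raw_tb, native_gb, compression_ratio, media_cost, drive_cost):
    """Return (tapes, media cost, media cost plus one drive) in dollars."""
    tapes = tapes_needed(raw_tb, native_gb, compression_ratio)
    media_total = tapes * media_cost
    return tapes, media_total, media_total + drive_cost


drive_1 = total_cost(400, 250, 5, 75, 35_000)
drive_2 = total_cost(400, 250, 2, 75, 5_000)
for name, (tapes, media, total) in (("Drive 1", drive_1), ("Drive 2", drive_2)):
    print(f"{name}: {tapes} tapes, ${media:,} media, ${total:,} total")
```

Changing the compression ratio to the value you actually measure on your own data shows how quickly the cheaper drive can become the more expensive system.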

Clearly, compression must be a consideration in the total cost of ownership for tape systems but, as I said, your mileage for compression on each drive type with your data will vary. One quick way to see whether your data is compressible is to use the gzip program with the -9 option. In my experience, by using gzip -9 "file name", you will get the maximum compression achievable for the data. The tape hardware usually has two parts for hardware compression: a data dictionary to compress against, and the compression buffer. You might want to ask the vendor the size of each and the hardware implementation (LZRW1, LZO, etc.).

You will need to test each of the tape drives that are under consideration with a statistically significant sample of your data to determine how your data behaves with the drive and its compression algorithm.
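Before a drive is available for testing, a rough software estimate of compressibility can be scripted. The sketch below uses Python's standard zlib module at level 9, which applies the same DEFLATE method as gzip -9. It is an assumption that this resembles what any particular drive does: hardware compressors use different algorithms and much smaller dictionaries, so treat the result as an indication only. The function names are hypothetical.

```python
import zlib


def compression_ratio(data, level=9):
    """Return original size divided by compressed size for a bytes object."""
    if not data:
        raise ValueError("need a non-empty sample")
    return len(data) / len(zlib.compress(data, level))


def file_compression_ratio(path, level=9, chunk_size=1 << 20):
    """Stream a file through zlib in chunks so large samples need little memory."""
    compressor = zlib.compressobj(level)
    bytes_in = bytes_out = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            bytes_in += len(chunk)
            bytes_out += len(compressor.compress(chunk))
    bytes_out += len(compressor.flush())  # emit whatever is still buffered
    if bytes_in == 0:
        raise ValueError("file is empty")
    return bytes_in / bytes_out
```

Running file_compression_ratio over a representative set of files gives a ratio to plug into the media-cost estimate. A ratio near 1.0 means the data is already compressed or effectively random, and no drive is likely to shrink it much.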

How It Will Be Used

Tape drives and the associated libraries have different characteristics for tape load, tape ready position, and rewind time. In some cases, this is not important, such as in applications like backup where generally all you are doing is loading the tape and writing large amounts of data sequentially, then rewinding the tape and moving it back into a position in the library. On the other hand, with hierarchical storage management (HSM) applications, tape load, position, and rewind time become critical issues, especially for reading data back, but this will also depend on the requirements for retrieving the data. For a good definition of HSM, see:

http://www.snia.org/education/dictionary/h/

HSM applications are becoming more popular given the length of time required for backup with increasing storage densities. In fact, StorageTek developed the T9840A and B drives specifically for HSM applications with small files. They have a 4-second load time and average 8 seconds to first data byte. Typical other products take 6 to 15 times longer than the T9840A and B drives to reach the first byte. Note, however, that if the files are large, load and position time become insignificant compared with transfer time. If you have a 20-GB file and, with compression, the transfer rate is 30 MB/sec, the transfer time is about 683 seconds. With a 50-second load and position time, the overhead is only about 7.3% of the transfer time. My rule of thumb when architecting a system is to keep load and position time to less than 10% of the time to write the data.
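The 10% rule of thumb is easy to check for any mix of file size, transfer rate, and load and position time. The sketch below is illustrative only: it assumes 1 GB is 1,024 MB (which gives a transfer time of roughly 680 seconds for the 20-GB example) and measures overhead against transfer time rather than total elapsed time. The function names are hypothetical.

```python
MB_PER_GB = 1024  # assumption: binary units, matching the 20-GB example above


def transfer_seconds(file_gb, rate_mb_per_sec):
    """Time to stream the file once the tape is loaded and positioned."""
    return file_gb * MB_PER_GB / rate_mb_per_sec


def overhead_ratio(file_gb, rate_mb_per_sec, load_position_sec):
    """Load and position time as a fraction of the transfer time."""
    return load_position_sec / transfer_seconds(file_gb, rate_mb_per_sec)


def meets_rule_of_thumb(file_gb, rate_mb_per_sec, load_position_sec, limit=0.10):
    """True when load and position time stays under the chosen fraction."""
    return overhead_ratio(file_gb, rate_mb_per_sec, load_position_sec) < limit


for size_gb in (20, 1):
    ratio = overhead_ratio(size_gb, 30, 50)
    verdict = "ok" if meets_rule_of_thumb(size_gb, 30, 50) else "too much overhead"
    print(f"{size_gb} GB file: overhead {ratio:.1%} of transfer time ({verdict})")
```

The second case shows why small files are the hard case for HSM: with a 1-GB file the same 50 seconds of load and position time exceeds the transfer time itself.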

For HSM applications, reading is a different matter. Most applications can consolidate the files to ensure large amounts of data are written; reading, however, requires an understanding of the recall rate of the files, the size of the files recalled, and, most importantly, the speed requirement for recall. A credit card company that stores information to provide approval codes is far different from a research site doing genetic research recalling a gene for comparison between two people. Understanding your application(s) environment is critical to developing a good architecture.

Trends

Given all of these issues, one might ask, "Is tape dead?" A number of the large storage vendors pronounced tape dead four years, three years, two years ago, then again last year, and will likely do so this year and next year. Tape has some significant advantages over disk (rotating storage), however, that indicate to me it's not dead yet. For example:

1. Tape does not require power -- Most modern disk drives need to remain powered on for reliability. The Seagate 120 GB ATA drive, for example, uses 13 watts. That can get really expensive if you have 400 TB of secondary storage.

2. Error Rates -- Bit error rates for ATA drives are on the order of one error in 10 to the 14th bits (FC and SCSI drives are an order of magnitude better), while enterprise tape is rated at one in 10 to the 18th and other tapes (AIT and DLT) at around one in 10 to the 17th. That means tapes are between two and four orders of magnitude more reliable than both ATA and SCSI disk drives.

3. Tapes can handle higher shock than disks and still survive.

Note that all of the above information is available from the Web pages of the companies mentioned.
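The power and error-rate points above can be put into rough numbers. The sketch below is a back-of-the-envelope illustration, not a measurement. It assumes binary units (1 TB = 1,024 GB), counts only the drives themselves (no controllers, parity drives, or cooling), and reads a rating such as "10 to the 14th" as one bit error per that many bits transferred. The function names are hypothetical.

```python
import math

GB_PER_TB = 1024           # assumption: binary units
BITS_PER_TB = 8 * 1024**4  # same assumption, expressed in bits


def disk_power_watts(capacity_tb, drive_gb=120, watts_per_drive=13):
    """Return (drive count, total watts) for disks holding capacity_tb."""
    drives = math.ceil(capacity_tb * GB_PER_TB / drive_gb)
    return drives, drives * watts_per_drive


def expected_bit_errors(data_tb, bits_per_error):
    """Expected bit errors when moving data_tb at one error per bits_per_error bits."""
    return data_tb * BITS_PER_TB / bits_per_error


drives, watts = disk_power_watts(400)
print(f"400 TB on 120 GB drives: {drives} drives, {watts / 1000:.1f} kW continuous")

for label, rating in (("ATA disk", 1e14), ("AIT/DLT tape", 1e17), ("enterprise tape", 1e18)):
    print(f"{label}: about {expected_bit_errors(400, rating):.4f} expected bit errors per 400 TB")
```

Multiplying the continuous wattage by hours of operation and your local electricity rate gives a yearly power cost to set against the media cost of tape.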

What to Do

I believe that for at least the next few years, tapes and tape drives will continue to be a critical part of the storage infrastructure. This will continue because tape is far cheaper than rotating storage in total cost of ownership, given the power requirements for rotating storage and the compression support in tape drives. Almost all of the tapes on the market claim 30 years of shelf life -- even the lower-end tapes. Keeping a tape for 30 years might be possible but even if you manage to do that, how are you going to be able to read it? Tapes, like any storage medium, are dependent on outside influences like:

1. What is the interface and driver? (Try finding a SCSI-1 interface from the early 1990s, much less an IPI-3 interface 20 years from now.)

2. Will the tape drive be available to read the tape? (A little over 30 years ago, 7-track tapes were state of the art, but finding one to read a tape today will be next to impossible.)

3. What is the data format of the tape? (Some vendors write in tar format, for example, but will tar or an application like VERITAS NetBackup be available in 2033?)

4. What is the data and will any program be able to read it? (PDF is a popular format, and applications can read it today, but what about reading an MS Word 2.0 document from 10 years ago with MS Word 2002?)

All in all, I advocate a migration strategy. Whatever you do and whatever tape type you decide upon, it is critical to plan a migration strategy as part of the initial decision process. Nothing lasts forever -- especially the way you read your data.

Henry Newman has worked in the IT industry for more than 20 years. Originally at Cray Research and now with a consulting organization, he has provided expertise in systems architecture and performance analysis to customers in government, scientific research, and industry around the world. His focus is high-performance computing, storage and networking for UNIX systems, and he previously authored a monthly column about storage for Server/Workstation Expert magazine. He may be reached at: hsn@hsnewman.com.