LOSTCIRCUITS
As the Hard Disc Spins VI: Command Queuing
(Review by MS, February 26, 2004)
WD Raptor WD360GD
Seek Latencies
Seek latencies are a function of the time it takes the head to reach and settle over the cylinder that contains the specified LBA. As long as only a single LBA is specified, or sequential data located within the same track are requested, only a single head movement is necessary. However, the situation changes dramatically if multiple outstanding requests require access to a number of LBAs that are spread out over various cylinders. In that case the head has to follow a zigzag course across the width of the platter, which involves a substantial amount of movement.
Supermarkets and Elevators
We have used the analogy of a shopping list for a supermarket before: one possibility is to simply go line item by line item, which does not take into consideration the location of the items within the supermarket and, consequently, involves substantial legwork. Another, possibly more intuitive, analogy is an elevator. It was not very long ago that most elevators approached the target floors in the order in which the buttons were pressed, a rather inefficient way of servicing all commands that wasted an enormous amount of time going back and forth between the different target locations. The interesting thing is that this is exactly the method employed by most contemporary HDDs.
Over the last decades, elevators have evolved to reorder and reschedule outstanding commands, because that is the most economical way of servicing them. A side effect is faster service and, because of the overall reduced workload, less mechanical wear and tear, which translates directly into better reliability and a longer lifespan of all parts.
The elevator example is actually quite fitting. Everybody knows that it is possible to enter the elevator on the third floor and push the 7th floor button, even if the next floor originally selected was, e.g., the 10th floor. As long as the command comes in time to be inserted into the ongoing workflow before the target floor is passed, it is possible to reorder the commands and stop at the next floor. This is called dynamic reordering of a command queue and is, coincidentally, the essence of the Serial ATA Native Command Queuing scheme.
To take the elevator example a bit further: somebody may have just missed a target floor and will therefore have to reach that particular floor on the way back. Most elevators will clear the commands at the point of turnaround; the more advanced units, however, keep track of entered commands and defer their execution by creating the next queue already during the execution of the first. This decision-making as to which queue any newly entered command fits into or, by extension, whether there is another, more efficient way of executing the entire outstanding workload is also part of the NCQ scheme written into the SATA II specifications.
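The difference between the two servicing strategies can be put into numbers with a small sketch. It is a simplified illustration, not drive firmware: the cylinder numbers, the starting position and the function names total_travel and elevator_order are all hypothetical, and head travel is counted in cylinders only.

```python
def total_travel(start, order):
    """Sum of the cylinder distances the head covers visiting `order` from `start`."""
    distance, position = 0, start
    for cylinder in order:
        distance += abs(cylinder - position)
        position = cylinder
    return distance


def elevator_order(start, requests):
    """Serve everything at or above `start` on the way up, the rest on the way back."""
    upward = sorted(c for c in requests if c >= start)
    downward = sorted((c for c in requests if c < start), reverse=True)
    return upward + downward


# Hypothetical cylinder numbers, listed in the order the requests arrived.
requests = [900, 120, 750, 60, 400]
start = 300

fifo_travel = total_travel(start, requests)
swept_travel = total_travel(start, elevator_order(start, requests))
print("first come, first served:", fifo_travel, "cylinders")
print("elevator sweep:          ", swept_travel, "cylinders")
```

With these made-up requests the sweep covers less than half the distance of the arrival-order run, because the head crosses the platter once in each direction instead of zigzagging.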
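The elevator behavior described above, inserting a new stop into the running sweep when it still lies ahead and deferring it to the next queue when it has already been passed, might be sketched as follows. The class SweepQueue and its methods are hypothetical names chosen for this illustration; the actual NCQ logic inside a drive is not public and weighs far more than position alone.

```python
import bisect


class SweepQueue:
    """Toy elevator: one active sweep plus a deferred queue for the way back."""

    def __init__(self, position, direction=1):
        self.position = position
        self.direction = direction   # +1 while moving up, -1 while moving down
        self.current = []            # sorted targets still ahead of the car
        self.deferred = []           # sorted targets that were already passed

    def _ahead(self, target):
        # A target is "ahead" if it lies in the current direction of travel.
        return (target - self.position) * self.direction >= 0

    def submit(self, target):
        """Insert into the running sweep if possible, otherwise defer."""
        queue = self.current if self._ahead(target) else self.deferred
        bisect.insort(queue, target)

    def step(self):
        """Move to the next stop; returns None when nothing is outstanding."""
        if not self.current and self.deferred:
            # Turnaround: the deferred queue becomes the active sweep.
            self.current, self.deferred = self.deferred, []
            self.direction = -self.direction
        if not self.current:
            return None
        # Nearest target ahead: the lowest going up, the highest going down.
        self.position = self.current.pop(0 if self.direction > 0 else -1)
        return self.position


lift = SweepQueue(position=3)
lift.submit(10)              # floor originally selected
lift.submit(7)               # pressed later, but still ahead: inserted before 10
stops = [lift.step()]
lift.submit(5)               # already passed: deferred to the return sweep
stops += [lift.step(), lift.step()]
print(stops)
```

The late request for floor 5 is not lost at the turnaround; it is simply serviced first on the way back down.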

In any non-queued execution of commands, the sequential access of multiple random LBAs may involve several rotations along with excess head movement (left). Queuing the commands to streamline their execution dramatically reduces the overall distance that the head has to move over the surface of the platter from the first LBA to the last. Since the rotational velocity is constant, the length of the purple line (right) compared to the length of the red line (left) is proportional to the execution time in the queued and the non-queued execution scheme, respectively. Note that this only holds for the case of random I/O accesses.
To summarize the above, the shortest way is not always the fastest, and therefore much more than the raw positional data of the LBA addresses on the platters has to be taken into consideration to service outstanding commands most efficiently. At the beginning of this article, we mentioned the terms “intelligent” and “internal”. Now we are coming back to exactly these terms, because the only device that knows the physical location of the data is the drive itself, and it has to apply very sophisticated algorithms encompassing seek length, starting location, acceleration profiles of the actuators and head switching, along with operational “marching orders” such as quiet seeks.
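Why the shortest way is not always the fastest can be shown with a deliberately crude model. Everything in the sketch below is an assumption made for illustration: the linear seek model (a fixed settle time plus a per-cylinder cost), the parameter values, and the function name access_time_ms. Real drives use acceleration profiles and many more inputs, as noted above.

```python
def access_time_ms(head_cyl, head_angle, target_cyl, target_angle,
                   rpm=7200, settle_ms=1.0, ms_per_cyl=0.001):
    """Seek time plus the rotational wait that remains after the seek.

    Angles are fractions of one revolution (0.0 to 1.0). The linear seek
    model and all default values are assumptions, not measured drive data.
    """
    rev_ms = 60_000 / rpm
    seek_ms = settle_ms + ms_per_cyl * abs(target_cyl - head_cyl)
    # The platter keeps turning while the head is seeking.
    angle_after_seek = (head_angle + seek_ms / rev_ms) % 1.0
    wait_ms = ((target_angle - angle_after_seek) % 1.0) * rev_ms
    return seek_ms + wait_ms


# Head at cylinder 1000, platter at angle 0.0. Request A is close in
# cylinders, request B is far away but rotationally well placed.
pending = {"A": (1100, 0.10), "B": (4000, 0.60)}
times = {name: access_time_ms(1000, 0.0, cyl, angle)
         for name, (cyl, angle) in pending.items()}
fastest = min(times, key=times.get)
print(times, "-> serve", fastest, "first")
```

In this made-up case the nearby request A just misses its sector and has to wait almost a full rotation, so the distant request B completes sooner. A scheduler that looked at cylinder distance alone would pick the slower option.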
Rotational Latencies
We have covered some of the issues regarding rotational latencies in an earlier article and shown the principle behind out-of-order data delivery (enabled in the SATA II specs) and how it can greatly reduce rotational latencies. A similar scheme, Modify Data Pointers (MDP), is part of the SCSI specifications; however, to the best of our knowledge, nobody has adopted it for reasons of diminishing returns: the extremely low rotational latencies of SCSI devices make MDP-based out-of-order data delivery almost a moot point. Briefly, a full rotation of any 5400 rpm drive takes 11.1 msec, a 7200 rpm drive handles the same in 8.3 msec and any 10k rpm drive will manage a full rotation in 6 msec. The average rotational latency is statistically ½ of a full rotation, meaning that 5400 rpm drives will spend on average about 5.6 msec until the platter has rotated the correct LBA under the head. For a 7200 rpm drive, this value is still about 4.2 msec. Keep in mind that those numbers are statistical averages; real-life accesses can be shorter or they can be much longer.
On the other hand, it is important to understand that we are talking about milliseconds here, whereas most other processes in any halfway decent PC run on a nanosecond scale, that is, about 1 million times faster. Thus, even if the total access times remain astronomically high (in terms of CPU cycles), any saving of 1 msec frees up 3,000,000 (that is, three million) CPU cycles, assuming we are looking at a 3 GHz P4 or similar. Optimized workflow is by definition more important for multiple host / multiple client systems. However, HyperThreading is well on its way to simulating exactly this scenario even on a home PC, in that it enables quasi-simultaneous execution of unrelated threads which, in turn, adds a whole new level of independent access patterns to the average workload.
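The arithmetic behind these figures is simple: one rotation takes 60,000 / rpm milliseconds, and the statistical average wait is half of that. A short sketch (the function names are ours, chosen for illustration) reproduces the values without intermediate rounding, so the last digit may differ slightly from figures derived from already rounded numbers.

```python
def full_rotation_ms(rpm):
    """Time for one complete platter rotation, in milliseconds."""
    return 60_000 / rpm


def average_rotational_latency_ms(rpm):
    """Statistical average wait: half of a full rotation."""
    return full_rotation_ms(rpm) / 2


for rpm in (5400, 7200, 10_000):
    print(f"{rpm:>6} rpm: full rotation {full_rotation_ms(rpm):.1f} ms, "
          f"average latency {average_rotational_latency_ms(rpm):.2f} ms")
```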
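The cycle count quoted above follows directly from the clock rate. As a quick check (cycles_in is a hypothetical helper name, and the 3 GHz clock is the example value from the text):

```python
def cycles_in(milliseconds, clock_hz):
    """Number of clock cycles that elapse in the given number of milliseconds."""
    return round(clock_hz * milliseconds / 1_000)


saved = cycles_in(1, 3_000_000_000)   # 1 msec on a 3 GHz CPU
print(f"{saved:,} cycles")
```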