LOSTCIRCUITS
ATI All-In-Wonder Radeon 32 MB DDR: The Return of the ATi-Knights
(Review by MS)
PIXEL TAPESTRY™ architecture
One feature that sets the Charisma engine apart from the rest of the bunch is that it has three texture units per rendering pipeline. Most graphics processors can apply one or two textures to any given pixel per clock cycle. If multiple textures are overlaid on the same pixel, e.g. a base texture plus gloss and specular maps to give the rendered object a lifelike appearance, the number of passes required to generate and combine the individual textures depends on the capabilities of the engine. More advanced GPUs have more than one pipeline; the GeForce, for example, has four pipelines, each receiving input from two texture units. The Radeon Charisma engine has two pipelines fed by three texture units each. In other words, it can apply up to three textures in a single pass. Most current games cannot take advantage of this feature yet, and most video benchmarks are likewise optimized for multiples of two textures. This needs to be considered when looking at e.g. 3D Mark2000, which uses four textures for the fill rate test. In this case, the Radeon can do either 3 + 1 or 2 + 2 textures, but in either scenario it needs two passes, just like the GeForce, which, with its two texture units per pipeline, is less sophisticated but can handle MadOnion benchmarks with greater efficiency.
Depth is what makes 3D applications three-dimensional. It is therefore not surprising that the Z plane (depth) plays a crucial role in the generation of three-dimensional scenes. Overlapping layers do the rest to make the Z-buffer probably the busiest part of the memory subsystem and, consequently, the largest consumer of bandwidth. There are several ways to reduce this traffic and thereby conserve bandwidth.
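The pass counts above follow from a simple ceiling division. Here is a minimal sketch in Python, assuming that each pass can apply at most as many textures as there are texture units per pipeline; the function name `passes_needed` is made up for this illustration and does not come from any driver or API.

```python
import math


def passes_needed(textures: int, units_per_pipeline: int) -> int:
    """Rendering passes needed to lay `textures` layers onto one pixel.

    Assumption: one pass applies at most `units_per_pipeline` textures.
    """
    if textures < 0 or units_per_pipeline < 1:
        raise ValueError("need textures >= 0 and at least one texture unit")
    return math.ceil(textures / units_per_pipeline)


# Texture units per pipeline as stated in the text.
for name, units in (("Radeon", 3), ("GeForce", 2)):
    for layers in (2, 3, 4):
        print(f"{name}: {layers} textures -> {passes_needed(layers, units)} pass(es)")
```

With three layers the Radeon finishes in one pass where a two-unit design needs two; with four layers both need two passes, which is the 3D Mark2000 fill rate case described above.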
Z-Buffer compression
Z-buffers can be divided into two separate units, the first being the internal Z-buffer integrated into the graphics engine itself, the second being the external Z-buffer, which is stored in the local frame buffer. A useful analogy is the L1 cache and the backside L2 cache of the Slot CPUs, not in terms of clock speed but with regard to overall speed. That is, the external Z-buffer is substantially larger than the integrated internal Z-buffer, but it is also much slower. This is compensated for by subdividing the Z-buffer into 8 or 64 pixel blocks which are stored in compressed format. The benefits are twofold: compressed data take up much less space and also occupy less memory bandwidth during transfer to the internal Z-buffer, where they are decompressed if needed. The compression factor is somewhere between ½ and ¼, resulting in some 2-4 fold better usage of the external Z-buffer and similarly faster data transfer to the internal Z-buffer. At low resolutions, Z-buffer compression may not yield much benefit. With increasing resolution, however, video data exceed the space of the local frame buffer and need to be stored within the AGP aperture of the system memory. Under these conditions, any compression will speed up the data transfer to the internal Z-buffer and offset the drop in fill rate usually caused by falling back to a unified memory architecture.
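To put the ½ to ¼ compression factor into numbers, the following Python sketch estimates the size of one full Z-buffer at a few resolutions. The 32 bits of depth per pixel and the function names are assumptions for illustration only; the article does not state the depth format.

```python
def z_buffer_bytes(width: int, height: int, bits_per_pixel: int = 32) -> int:
    """Uncompressed size of one full-screen Z-buffer in bytes.

    Assumption: 32 bits of depth per pixel unless stated otherwise.
    """
    return width * height * bits_per_pixel // 8


def compressed_bytes(raw_bytes: int, factor: int) -> int:
    """Size after compression by `factor` (2 means half, 4 means a quarter)."""
    return raw_bytes // factor


for width, height in ((640, 480), (1024, 768), (1600, 1200)):
    raw = z_buffer_bytes(width, height)
    print(
        f"{width}x{height}: raw {raw / 2**20:.2f} MiB, "
        f"1/2 -> {compressed_bytes(raw, 2) / 2**20:.2f} MiB, "
        f"1/4 -> {compressed_bytes(raw, 4) / 2**20:.2f} MiB"
    )
```

The absolute savings grow with resolution, which matches the point that compression matters most once the data no longer fit comfortably into the local frame buffer.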
Fast Z-buffer clear and Hierarchical Z-buffer
Before any new frame can be rendered, the Z-buffer needs to be cleared. Clearing every single block of the internal and external Z-buffer (zero-fill of all blocks) requires an additional write step that takes up precious time and bandwidth. ATi has devised a smart workaround for this problem. Similar to formatting a hard drive the conventional way as opposed to a low-level format, the data in the external Z-buffer are merely tagged as erased, so that only the internal Z-buffer has to zero-fill its blocks. This process saves substantial time and conserves bandwidth; however, only a few current games are demanding enough to take advantage of fast Z-buffer clear.
Another way to reduce bandwidth is the use of a hierarchical Z-buffer. A hierarchical Z-buffer is basically a low-resolution map of visible pixels against which the depth of any pixel can be compared before it is rendered; whatever does not pass the comparison is thrown out. The hierarchical Z-buffer employs 64-pixel tiles (8 x 8), i.e., a very low resolution, which can lead to visual artifacts if such a tile coincides with the edges of objects at different Z-planes, since the wrong pixels may be rejected. A hierarchical Z-buffer can only be used for comparison after triangle setup has been completed, but it prevents rendering of invisible pixels.
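The idea of tagging blocks as erased instead of rewriting them can be modeled in a few lines of Python. This is a toy sketch, not ATi's hardware logic: the class `TaggedZBuffer`, the one-flag-per-block layout and the `pixel_writes` counter are all assumptions made for illustration.

```python
CLEAR_VALUE = 0.0  # the "zero-fill" value mentioned in the text


class TaggedZBuffer:
    """Toy external Z-buffer made of fixed-size blocks with an 'erased' tag."""

    def __init__(self, blocks: int, block_pixels: int = 64):
        self.block_pixels = block_pixels
        self.data = [[CLEAR_VALUE] * block_pixels for _ in range(blocks)]
        self.erased = [True] * blocks
        self.pixel_writes = 0  # counts writes that reach external memory

    def clear_slow(self) -> None:
        """Conventional clear: rewrite every pixel of every block."""
        for block in self.data:
            for i in range(self.block_pixels):
                block[i] = CLEAR_VALUE
                self.pixel_writes += 1

    def clear_fast(self) -> None:
        """Fast clear: only set the tags, leave the stale data in place."""
        self.erased = [True] * len(self.data)

    def read(self, block: int, index: int) -> float:
        # A tagged block reads as cleared without touching its stale data.
        return CLEAR_VALUE if self.erased[block] else self.data[block][index]

    def write(self, block: int, index: int, depth: float) -> None:
        if self.erased[block]:
            # First write after a fast clear: start from a zero-filled block
            # (modeled as on-chip work, so it is not counted as traffic).
            self.data[block] = [CLEAR_VALUE] * self.block_pixels
            self.erased[block] = False
        self.data[block][index] = depth
        self.pixel_writes += 1


buf = TaggedZBuffer(blocks=4)
buf.write(1, 5, 0.25)
buf.clear_fast()
print(buf.read(1, 5), buf.pixel_writes)
```

In this model `clear_fast` costs one tag per block, while `clear_slow` costs one write per pixel, which is where the time and bandwidth savings come from.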
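A coarse depth test over 8 x 8 tiles can be sketched as follows in Python. This sketch assumes a conservative variant in which each tile stores the farthest visible depth and smaller values are closer to the viewer; the class `HierarchicalZ` and its methods are hypothetical and not taken from ATi's design, which may work differently.

```python
TILE = 8      # 8 x 8 = 64 pixels per tile, as in the text
FAR = 1.0     # assumed depth of the far plane; smaller values are closer


class HierarchicalZ:
    """Full-resolution depth buffer plus one coarse depth value per tile."""

    def __init__(self, width: int, height: int):
        self.width, self.height = width, height
        self.depth = [[FAR] * width for _ in range(height)]
        tiles_x = (width + TILE - 1) // TILE
        tiles_y = (height + TILE - 1) // TILE
        # Farthest visible depth inside each tile.
        self.tile_far = [[FAR] * tiles_x for _ in range(tiles_y)]

    def tile_rejects(self, tx: int, ty: int, nearest_depth: float) -> bool:
        """True if geometry no closer than `nearest_depth` is fully hidden."""
        return nearest_depth >= self.tile_far[ty][tx]

    def write_pixel(self, x: int, y: int, z: float) -> bool:
        """Per-pixel depth test; refreshes the coarse value on success."""
        if z >= self.depth[y][x]:
            return False
        self.depth[y][x] = z
        tx, ty = x // TILE, y // TILE
        self.tile_far[ty][tx] = max(
            self.depth[yy][xx]
            for yy in range(ty * TILE, min((ty + 1) * TILE, self.height))
            for xx in range(tx * TILE, min((tx + 1) * TILE, self.width))
        )
        return True


hz = HierarchicalZ(16, 16)
for y in range(TILE):
    for x in range(TILE):
        hz.write_pixel(x, y, 0.3)   # cover tile (0, 0) with a near surface

print(hz.tile_rejects(0, 0, 0.5))   # geometry behind the surface
print(hz.tile_rejects(0, 0, 0.2))   # geometry in front of the surface
```

A single comparison per tile can discard up to 64 pixels before they are rendered. Storing the farthest depth means a partially covered tile never rejects visible geometry, at the cost of rejecting less often than a more aggressive scheme would.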