LOSTCIRCUITS


 AMD's Phenom Processor - Beyond Erratum 298
(Author: Michael Schuette, January 1, 2008)

Erratum 298

Erratum 298 will be documented in the upcoming update to the Revision Guide for AMD Family 10h Processors (PID 41322):

"The processor operation to change the accessed or dirty bits of a page translation table entry in the L2 from 0b to 1b may not be atomic. A small window of time exists where other cached operations may cause the stale page translation table entry to be installed in the L3 before the modified copy is returned to the L2. In addition, if a probe for this cache line occurs during this window of time, the processor may not set the accessed or dirty bit and may corrupt data for an unrelated cached operation. The system may experience a machine check event reporting an L3 protocol error has occurred. In this case, the MC4 status register (MSR 0000_0410) will be equal to B2000000_000B0C0F or BA000000_000B0C0F. The MC4 address register (MSR 0000_0412) will be equal to 26h."
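For readers who want to compare a logged machine check against the signature quoted above, the check can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the register values are simply those given in the erratum text, and the sketch does not read any MSRs itself.

```python
# Register numbers and values as quoted in the erratum text above.
MC4_STATUS_MSR = 0x0000_0410
MC4_ADDR_MSR = 0x0000_0412
ERRATUM_298_STATUS = {0xB2000000_000B0C0F, 0xBA000000_000B0C0F}
ERRATUM_298_ADDR = 0x26


def matches_erratum_298(mc4_status: int, mc4_addr: int) -> bool:
    """Hypothetical helper: True if both values match the quoted signature."""
    return mc4_status in ERRATUM_298_STATUS and mc4_addr == ERRATUM_298_ADDR
```

A match only says that the logged values look like the L3 protocol error described in the erratum; it is not proof of the cause.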

AMD's claim to fame is the release of the first native quad-core die, which is certainly no small accomplishment.

The Tee El Bee

What is a TLB anyway? Butchered translations of the acronym are only the worst case; beyond that, there appears to be an unhealthy amount of misinformation regarding TLBs and their role in processing. To give a reasonable explanation, it is necessary to understand at least a modicum about memory management in modern processors. The x86 architecture originally started out with a direct-mapped memory space (real mode) in the 8086 processor. The memory bus used 20-bit addressing to allow utilization of 2^20 bytes or 1 MebiByte, which for all practical purposes is equivalent to 1 MegaByte. Suffice it to say that 1 MB of memory reached its limitation relatively quickly, despite all predictions to the contrary. The 80286 (“286”) processor series extended the address space to 24-bit addressing for a total memory space of 16 MB and introduced the grandfather of all current memory management schemes, that is, protected mode addressing by the memory management unit.

In short, protected mode increases the control of the operating system over application software memory usage by introducing virtual addressing, paging and a number of other features. One issue with the 286 processor was that, once it had entered protected mode, it could not revert to real mode. This caused Bill Gates to refer to the 286 as a “brain dead processor”, since it turned out that the 286 was not capable of running multiple simultaneous instances of MS-DOS in the Microsoft Windows environment.

Bill Gates' brain death was reversed with the introduction of the 80386 processor, which started out in real mode and could then enter protected mode. To do so, it first had to create the global descriptor table (GDT) of the virtual address space, using at least three entries for the null descriptor, the code segment descriptor and the data segment descriptor, and then switch on the A20 gate (ironically borrowed from Intel's 8042 keyboard controller). The A20 gate is needed to enable the A20 address line necessary to step beyond the 20-bit limitation by electrically turning on the higher address lines. In the case of the 386 processor, the number of address lines was increased to 32, thereby increasing the address space to 32-bit or 4 GibiBytes, which essentially prevails as the 32-bit address space used today. Suffice it to say that it is possible on all current processors to run "virtual DOS machines" that simulate a DOS environment within the context of an NT environment without reverting to real mode.
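The address space figures quoted so far follow directly from the number of address lines. A minimal sketch of the arithmetic, with a helper name that is purely illustrative:

```python
def address_space_bytes(address_lines: int) -> int:
    """Number of distinct byte addresses reachable with this many address lines."""
    return 2 ** address_lines


MIB = 2 ** 20  # one mebibyte, in bytes

for cpu, lines in [("8086", 20), ("80286", 24), ("80386", 32)]:
    size = address_space_bytes(lines)
    print(f"{cpu}: {lines} address lines -> {size:,} bytes ({size // MIB:,} MiB)")
```

The 20, 24 and 32 address lines give 1 MiB, 16 MiB and 4,096 MiB (4 GiB), respectively.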

As mentioned above, protected mode allows the operating system to control the memory access of application software. In order to protect the operating system itself from unauthorized accesses by applications, the x86 architecture uses four rings of protection, with ring 0 being the most privileged and ring 3 the least. In most desktop environments, only ring 0 and ring 3 are used, for the OS (kernel memory) and applications, respectively. Since the addressing of the memory space is done by creating a virtual address space based on code and data segment descriptors, each application can have its own memory space that it can use to store and load data. The advantage of this virtual address space is that each application can have a more or less contiguous address space, or at least think that the address space is contiguous. Suffice it to say, though, that multiple applications may use the same virtual addresses within their own address space. In reality, this is not a problem, since the individual virtual address spaces are task-specific and are translated into physical addresses by the memory management unit on the CPU.

To speed up this translation process, the memory management hardware uses so-called translation lookaside buffers (TLBs), which can be considered a specialized, associative form of cache that maps virtual addresses onto physical addresses. Each TLB typically consists of a small content addressable memory (CAM), which essentially reverses the mode of operation of a standard random access memory. That is, instead of giving an address to retrieve content, the content is presented in order to look up an address. The content in this case is the virtual address. If the virtual address (content) is in the TLB, then the physical address is returned within one cycle. If not, then the memory management unit has to consult the page table entries (PTEs) stored in data structures called page tables, and that takes significantly longer. The physical page number is then combined with the page offset to generate the complete physical address. Typical miss penalties are on the order of 30 cycles compared to a single cycle for a hit. Thirty cycles is a high number, but bear in mind that this happens at the level of the CPU (including cache), and therefore the cycle times are CPU cycle times. A case in point would be the memory controller of the Phenom running at 2 GHz or 1800 MHz, in which case the 30 cycles would equal 15 ns or roughly 16.7 ns of additional access time, respectively.
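The lookup described above can be sketched as a toy model in Python. This is not how the hardware is built: the class name, the 4 KiB page size and the use of a dictionary to stand in for the CAM are all assumptions made for illustration, and the sketch ignores capacity limits and eviction. The hit and miss costs are the rough figures from the text.

```python
OFFSET_BITS = 12                 # assumed 4 KiB pages
PAGE_SIZE = 1 << OFFSET_BITS
HIT_CYCLES = 1                   # rough figures quoted in the text
MISS_CYCLES = 30


class TinyTLB:
    """Toy TLB: a dict stands in for the content addressable memory."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical page number
        self.entries = {}             # cached translations

    def translate(self, vaddr):
        """Return (physical address, cycles spent on the translation)."""
        vpn = vaddr >> OFFSET_BITS
        offset = vaddr & (PAGE_SIZE - 1)
        if vpn in self.entries:            # TLB hit
            ppn, cycles = self.entries[vpn], HIT_CYCLES
        else:                              # TLB miss: consult the page table
            ppn = self.page_table[vpn]
            self.entries[vpn] = ppn
            cycles = MISS_CYCLES
        # Combine the physical page number with the page offset.
        return (ppn << OFFSET_BITS) | offset, cycles


def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles / clock_ghz
```

With this model, the first access to a page costs the miss penalty and later accesses to the same page cost a single cycle. `cycles_to_ns(30, 2.0)` gives 15 ns, and `cycles_to_ns(30, 1.8)` gives roughly 16.7 ns.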

A PTE or TLB entry can also contain housekeeping information, that is, a dirty bit to indicate whether the page has been written to, or the accessed bit, which lets page replacement algorithms keep track of which page was least recently used. Moreover, since we are working in protected mode, permissions can be set to allow access by users and supervisors, along with flags such as cacheable, uncacheable or uncached speculative write combine (among others).
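To make the housekeeping bits more concrete, here is a small sketch of how the accessed and dirty bits get set on a page table entry. The bit positions are those of the standard 32-bit x86 page table entry, which the text above does not spell out, and the function name is hypothetical. It is this 0b to 1b transition that Erratum 298 is concerned with.

```python
# Bit positions of the standard 32-bit x86 page table entry (an assumption
# relative to the article, which does not list them).
PTE_PRESENT = 1 << 0
PTE_WRITABLE = 1 << 1
PTE_USER = 1 << 2      # set: user and supervisor access, clear: supervisor only
PTE_ACCESSED = 1 << 5
PTE_DIRTY = 1 << 6


def touch(pte: int, is_write: bool) -> int:
    """Return the entry as it looks after a read or write to its page."""
    pte |= PTE_ACCESSED        # any access sets the accessed bit
    if is_write:
        pte |= PTE_DIRTY       # only writes set the dirty bit
    return pte
```

In this sketch the update is a single step. On real hardware it is a read-modify-write of an entry that may be held in several cache levels, which is why atomicity matters.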

All of this is very straightforward as long as one is dealing with a single task. Task switching makes things a bit more complicated and requires some extra strategies for managing the TLBs. Historically, the easiest way has been to flush the TLBs, but this type of eviction is not very efficient. A different and much more elegant method, used by DEC in the Alpha EV6 architecture, is to tag each valid entry in the TLBs with an address space number (ASN) and to consider only entries with a tag that matches the task at hand.
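The difference between flushing and tagging can be illustrated with another toy model. The class and method names are hypothetical, and the dictionary keyed on the pair of ASN and virtual page number is only a stand-in for the tag comparison done in hardware.

```python
class TaggedTLB:
    """Toy TLB whose entries are tagged with an address space number (ASN)."""

    def __init__(self):
        self.entries = {}  # (asn, virtual page number) -> physical page number

    def insert(self, asn, vpn, ppn):
        self.entries[(asn, vpn)] = ppn

    def lookup(self, asn, vpn):
        """Return the physical page number, or None on a miss."""
        return self.entries.get((asn, vpn))

    def flush(self):
        """The historical alternative: drop every entry on a task switch."""
        self.entries.clear()


tlb = TaggedTLB()
tlb.insert(asn=1, vpn=0x10, ppn=0x200)  # task 1 maps virtual page 0x10
tlb.insert(asn=2, vpn=0x10, ppn=0x300)  # task 2 uses the same virtual page
```

Both tasks use virtual page 0x10, yet each lookup only sees the entry carrying its own ASN, so a task switch does not have to throw away the other task's translations.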


(AMD Phenom 9600, 2.3 GHz, HD9600WCGDBO)


General disclaimer: This page only reflects the author's personal opinion and assumes no responsibility whatsoever regarding any of the contents or any damages that may occur explicitly or implicitly from reading the contents of this site. All names and trademarks mentioned in this review are the exclusive property of the respective parent companies.
All contents of this site are protected by international copyright laws. Reproduction of the contents even in parts is not allowed except after written permission by the author and referral to this site.
Copyright 2002 - 2008 LostCircuits