LOSTCIRCUITS

SHORTCUTS:
The Ultimate Dream Machine
Snoops and Filters
Intel's Snoop Filter Implementation
SkullTrail Extreme D5400SX
System Configurations
Power Play
3D Rendering and Energy Efficiency
Cinebench
DVD-Shrink, MainConcept
VirtualDub / DivX 6.8
3DMark'06
World In Conflict
Crysis
F.E.A.R.
UnrealTournament3
The Final Analysis


 Intel's SkullTrail Extreme Platform
Playground of the Titans
(Author: Michael Schuette, February 10, 2008)

The Snoop Filter on Intel's 5400 MCH

Intel's snoop filter is not built from eDRAM but from SRAM. Below is an excerpt from the 5400 MCH data sheet:

The Snoop Filter (SF) eliminates traffic on the snooped frontside bus of the processor being snooped. By removing snoops from the snooped bus, the full bandwidth is available for other transactions. Supporting concurrent snoops effectively reduces performance degradation attributable to multiple snoop stalls.

The SF is composed of four affinity groups, each containing 4 K sets of 24-way associative entries. The SF tracks 24 MB of processor cache in total. Each affinity group supports an Active Way management algorithm. Every lookup is a 96-way lookup: the full 24 ways of the addressed set are checked in each of the 4 affinity groups to determine a hit or miss.

The snoop filter is organized in four parts referred to as the Tag RAM Affinity Groups, Affinity[3:0]. Each Affinity Group is associated with one last-level cache. Under normal conditions a snoop is completed with a one snoop stall penalty. When the processors request simultaneous snoops, the first snoop is completed with a one snoop stall penalty and the second snoop incurs a two snoop stall penalty.

During SF access arbitration, processor 0 is given priority over processor 1. Thus simultaneous snoops are resolved with a one snoop stall penalty for processor 0 and a two snoop stall penalty for processor 1.
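The fixed-priority rule described above can be illustrated with a short sketch. The function name `snoop_stalls` and the idea of passing the requesters as a list of bus numbers are assumptions made for illustration only; nothing here is taken from the data sheet beyond the rule that bus 0 is served first and that the second request pays one extra snoop stall.

```python
def snoop_stalls(requesting_buses):
    """Return {bus: snoop stall penalty} for buses requesting a snoop at the same time.

    Toy model: requests are served in fixed priority order (bus 0 before bus 1),
    and the n-th request served pays n snoop stalls.
    """
    buses = sorted(set(requesting_buses))
    if any(bus not in (0, 1) for bus in buses):
        raise ValueError("this sketch models two frontside buses: 0 and 1")
    # enumerate from 1: first served -> 1 stall, second served -> 2 stalls
    return {bus: position for position, bus in enumerate(buses, start=1)}


print(snoop_stalls([1]))     # a lone request from either bus pays one stall
print(snoop_stalls([1, 0]))  # simultaneous requests: bus 0 wins arbitration
```

A single requester always sees one stall regardless of which bus it sits on; only the simultaneous case penalizes bus 1.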

The SF stores the tags and coherency state information for all cache lines in the system. The SF is used to determine if a cache line associated with an address is cached in the system and where. The coherency protocol engine (CE) accesses the SF to look-up an entry, update/add an entry, or invalidate an entry in the snoop filter.
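As a rough illustration of the three operations the coherency engine performs (look-up, update/add, invalidate), the following sketch models the snoop filter as a plain dictionary. The class `ToySnoopFilter` and its method names are hypothetical; the model ignores sets, ways, affinity groups and replacement entirely, and only borrows the EM bit and the Bus[1:0] encoding (bit 0 for FSB0, bit 1 for FSB1) from the data sheet's feature list.

```python
class ToySnoopFilter:
    """Dictionary-backed stand-in for the tag RAM (no sets, ways, or eviction)."""

    LINE_BYTES = 64  # cache line size used in the data sheet's arithmetic

    def __init__(self):
        self._entries = {}  # line number -> (em_bit, bus_bits)

    def _line(self, address):
        return address // self.LINE_BYTES

    def lookup(self, address):
        """Return the buses that may hold the line; an empty list is a miss."""
        _, bus_bits = self._entries.get(self._line(address), (0, 0b00))
        return [bus for bus in (0, 1) if bus_bits & (1 << bus)]

    def update(self, address, bus, exclusive):
        """Record that a processor on `bus` now holds the line."""
        line = self._line(address)
        _, old_bits = self._entries.get(line, (0, 0b00))
        if exclusive:
            # E/M: exactly one bus owns the line, EM bit set
            self._entries[line] = (1, 1 << bus)
        else:
            # shared: add this bus to the owners, EM bit cleared
            self._entries[line] = (0, old_bits | (1 << bus))

    def invalidate(self, address):
        self._entries.pop(self._line(address), None)


sf = ToySnoopFilter()
sf.update(0x1000, bus=0, exclusive=True)
print(sf.lookup(0x1010))  # same 64-byte line as 0x1000
```

A miss means neither frontside bus needs to be snooped, while a hit names the bus or buses that do. Because the real filter tracks conservatively, a hit only says the line may be present there.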

The SF has the following features:

  • The Snoop Filter tracks a total of 24 MB of processor L2 cache lines. This is equivalent to: (24 * 2^20 bytes) / 64-byte CL = 393,216 cache lines.
  • The SF is configured in 4 K sets organized as a 4 DID Affinity x 24 Way x 4 K Set-Associativity array. This is equivalent to 2^12 Sets x 24 Way x 4 DID = 393,216 tag entries.
  • 4 x 24 Affinity Set-Associativity will allocate/evict entries within the 24-way corresponding to the assigned affinity group if the SF look up is a miss. Each SF look up will be based on 96-way (4x24 ways) look up.
  • The size of the snoop filter Tag RAM is: 4096 sets * 4 affinities * 24 ways * 33 bits/affinity/set/way = 1,622,016 bytes
  • The size of the snoop filter Victim Ram is: 4096 sets * 4 affinities * 8 bits = 16,384 bytes
  • The size of the snoop filter Random ROM is: 1024 addresses * 16 bits = 2,048 bytes
  • The Snoop Filter operates at 2x the Intel® 5400 chipset core frequency, i.e. 533 MHz, to provide 267 MLUU/s (where a Look-Up-Update operation is a read followed by a write operation to the tag).

    • The maximum lookup and update bandwidth of the Snoop Filter is equal to the maximum request bandwidth from both FSBs. The lookup and update bandwidth from I/O coherent transactions has to share the bandwidth with both FSBs per request weighted-round-robin arbitration.
    • The SF lookup latency is four SF clocks, or two Intel® 5400 chipset core clocks, to support a single snoop stall in idle condition (single request issued from either bus). If both buses are making requests simultaneously, the snoop filter will always select bus 0 first. In such a scenario, the bus 0 request will have one snoop stall and the bus 1 request will have two snoop stalls.
  • Active Way / Invalid / E/M / Pseudo-Random replacement algorithm, with updates on lookups and invalidates, Invalid / Pseudo-Random replacement algorithm, with updates on lookups and invalidates.
  • Tag entries support a 38-bit physical address space. The MCH supports an external address space of 38 bits as well.
  • Stores coherency state (EM) and Bus[1:0] for each valid cache line in the system. The tracking algorithm utilizes conservative tracking (super-set tracking). The processor can silently downgrade a line state from E to S/I or from S to I without any action appearing on the FSB. Therefore, a line appearing in the SF in E state may actually be missing from the corresponding processor caches. Conversely, an SF S-line will never be found in E/M state in a processor's L2 cache, and an SF miss will never be found in M/E/S state in a processor's L2 cache. The following is the summary of the snoop-filter state definitions:
    • Coherency state: the cache line is in E/M state if the bit is set; else, the line is in shared state
    • If Bus[1:0]=00, the entry is invalid.
    • If Bus[1:0]=01, the FSB0 processor(s) has ownership of the line.
    • If Bus[1:0]=10, the FSB1 processor(s) has ownership of the line.
    • If Bus[1:0]=11, both buses have ownership and the line must be shared by both FSB processors (EM must be 0).
    • EM||Bus[1:0] =111 is a reserved definition.
  • ECC coverage, with correction of single bit errors, detection of double bit errors (SEC-DED).
    • Invalid/pRandom Array does not implement ECC or parity protection. A bit failure will result in the selection of the wrong victim entry and may have a minimal impact on performance. However, the coherency engine will resolve the conflict and guarantee correctness.
In summary, the Snoop Filter reduces the amount of snooping necessary in a multiprocessor system by using a common tag RAM to consolidate the cache entry look-ups. As a result, a given CPU does not need to snoop every other processor in the system and can instead consult the snoop filter as the main "table of contents". Moreover, snoops from different CPUs residing on different buses can be carried out concurrently.
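The sizes quoted in the feature list can be cross-checked with a few lines of arithmetic. The variable names below are chosen for illustration; the input figures are the ones from the data sheet excerpt. The look-up-update rate additionally assumes that the read and the write each take one SF clock, which the excerpt implies but does not state outright.

```python
SETS = 2 ** 12           # 4 K sets
AFFINITIES = 4           # affinity groups
WAYS = 24                # ways per affinity group
LINE_BYTES = 64          # cache line size
TAG_BITS_PER_ENTRY = 33  # bits per affinity/set/way

tracked_lines = 24 * 2 ** 20 // LINE_BYTES               # 24 MB of cache in 64-byte lines
tag_entries = SETS * WAYS * AFFINITIES                   # one tag entry per tracked line
tag_ram_bytes = tag_entries * TAG_BITS_PER_ENTRY // 8    # bits -> bytes
victim_ram_bytes = SETS * AFFINITIES * 8 // 8            # 8 bits per set and affinity
random_rom_bytes = 1024 * 16 // 8                        # 1024 addresses of 16 bits
lookup_ways = AFFINITIES * WAYS                          # ways compared per lookup

# Assumption: one SF clock for the read plus one for the write per LUU.
sf_clock_mhz = 533
luu_per_second_millions = sf_clock_mhz / 2

print(tracked_lines, tag_entries)        # both 393216
print(tag_ram_bytes)                     # 1622016
print(victim_ram_bytes, random_rom_bytes)  # 16384 2048
print(lookup_ways)                       # 96
print(luu_per_second_millions)           # 266.5, i.e. roughly the quoted 267 MLUU/s
```

The number of tag entries matches the number of 64-byte lines in 24 MB exactly, which is why the filter can track every line of all four last-level caches at once.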


    General disclaimer: This page only reflects the author's personal opinion and assumes no responsibility whatsoever regarding any of the contents or any damages that may occur explicitly or implicitly from reading the contents of this site. All names and trademarks mentioned in this review are the exclusive property of the respective parent companies.
    All contents of this site are protected by international copyright laws. Reproduction of the contents even in parts is not allowed except after written permission by the author and referral to this site.
    Copyright 2002 - 2008 LostCircuits