RSS 2.0

Personal Info:

Joe Send mail to the author(s) leads the architecture of an experimental OS's developer platform, where he is also chief architect of its programming language. His current mission is to enable writing large-scale software that is reliable, secure, and scalable by-construction. Before this, Joe founded the Parallel Extensions to .NET project. He has been granted 19 patents, with 49 pending. When not working, Joe enjoys travelling with his wife, writing books, writing music, studying music theory & mathematics, and doing anything involving food & wine.

My books

My music

Disclaimer:
The content of this site are my own personal opinions and do not represent my employer's view in anyway.

© 2012, Joe Duffy

 
 Monday, April 16, 2007

To gather meaningful performance metrics, it's usually a good idea to run several iterations of the same test, averaging the numbers in some way, to eliminate noise from the results.  This is true of sequential and fine-grained parallel performance analysis alike.  Though it's clearly important for sequential code too, data locality can add enough noise to your parallel tests that you'll want to do something about it.  For example, if iteration #1 enjoys some form of temporal locality left over from iteration #0, then all but the first iteration would receive an unfair advantage.  This advantage isn't usually present in the real world -- most library code isn't called over and over again in a tight loop -- and could cause test results to appear rosier than what customers will actually experience.  Therefore, we probably want to get rid of it.

To eliminate this noise, we can clear the Dcaches (data caches) of all processors used to run tests before each iteration.  How do you do that?  Well...

Intel offers a WBINVD instruction to clear a processor's Dcache, but sadly it's a privileged instruction and there's no way to get at it from user-mode.  So that's a no-go for most Windows programmers.  There's also a Win32 function to clear a processor's Icache, but this doesn't work for Dcaches, which is what we're after, so we're out of luck there too.

Instead, we can implement a really low tech solution.  Take some random data, sized big enough to fill a processor's L2 cache, and read the whole thing from each processor whose cache we wish to clear before each iteration.  This will evict all of the existing lines in the caches that could be left over from previous iterations.  All of the new lines will be brought in as shared, and, while they will be evicted when we start using real data in the query, this effectively simulates a cold cache.  Here's an example of this:

const int s_garbageSize = 1024 * 1024 * 64; // 64MB.
static IntPtr s_garbage = System.Runtime.InteropServices.Marshal.AllocHGlobal(s_garbageSize);

unsafe static void ClearCaches() {
    for (int i = 0; i < Environment.ProcessorCount; i++) {
        SetThreadAffinityMask(GetCurrentThread(), new IntPtr(1 << i));
        long * gb = (long *)s_garbage.ToPointer();
        for (int j = 0; j < s_garbageSize / sizeof(long); j++) {
            long x = *(gb + j); // Read the line (shared).
            long y = Math.Max(Math.Min(x, 0L), 0L); // Prevent optimizing away the read.
        }
    }
    SetThreadAffinityMask(GetCurrentThread(), new IntPtr(0));
}

[DllImport("kernel32.dll")]
static extern IntPtr GetCurrentThread();
[DllImport("kernel32.dll")]
static extern IntPtr SetThreadAffinityMask(IntPtr hThread, IntPtr dwThreadAffinityMask);

This clearly isn't the most efficient implementation.  On multi-core architectures, some cores are apt to share some levels of cache, so with the above approach we'll end up doing duplicate (and wasted) work.  And few processors on the market have 64MB of L2 cache -- I've just chosen a reasonable number that's bigger than most processors -- so we could try to be more precise there too.  You could use the GetLogicalProcessorInformation API, new to Windows Server 2003 (server) and Vista (client), if you really wanted to be a stud.  In any case, this does the trick.

 

Recent Entries:

Search:

Browse by Date:
<April 2007>
SunMonTueWedThuFriSat
25262728293031
1234567
891011121314
15161718192021
22232425262728
293012345

Browse by Category:

Notables: