Outline
	Results take up the majority of the time
Why do I care?
	Sun used it during UltraSPARC development
	Intel used it for P6 development
	Linus has improved context switching and file movement 3-10x
	Both Sun & SGI had a similar page coloring bug
Example benchmark
	It's simple
Benchmark interfaces
	All the gunk you need to do accurate timing is here
	Portable as well - does not use SGI interfaces
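A minimal sketch of the kind of timing gunk meant here, assuming plain gettimeofday() as the portable clock. The names now_usecs, time_per_call and bump are hypothetical, made up for illustration, and are not the suite's actual interfaces. The idea: repeat the operation until one timed run is long enough to swamp clock granularity, then report the mean time per call.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: wall-clock time in microseconds via gettimeofday(). */
static double now_usecs(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec * 1e6 + (double)tv.tv_usec;
}

/*
 * Hypothetical helper: call op(arg) in a loop, doubling the iteration
 * count until one timed run lasts at least min_usecs (or the count gets
 * very large), then return the mean microseconds per call.
 */
static double time_per_call(void (*op)(void *), void *arg, double min_usecs)
{
    unsigned long iters = 1;

    assert(op != NULL && min_usecs > 0.0);
    for (;;) {
        double start = now_usecs();
        for (unsigned long i = 0; i < iters; i++)
            op(arg);
        double elapsed = now_usecs() - start;
        if (elapsed >= min_usecs || iters > (1UL << 30))
            return elapsed / (double)iters;
        iters *= 2;
    }
}

/* Example operation to time: bump a counter through a volatile pointer. */
static void bump(void *arg)
{
    volatile unsigned long *n = arg;

    (*n)++;
}
```

Usage: time_per_call(bump, &counter, 100000.0) gives microseconds per bump, loop overhead included.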
Results
	lots of graphs & tables
	apologize for all the numbers
	Point them at home page for latest tables & paper
Systems measured
	Tried to pick the best system from each vendor
	R4K included for comparison
	Linux on P6 & Alpha is included
Memory latencies
	OUTLINE only, don't tell them answers
How is memory latency measured?
	Explain slowly
	Explain back to back loads vs load in a vacuum
	Note that all graphs are to scale
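A sketch of back to back loads, assuming the usual pointer-chasing approach: each load's address is the result of the previous load, so no two loads can overlap. The names build_ring and chase are hypothetical, and the array size and stride are whatever the caller picks, not the values used for the graphs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Link every stride-th slot of an n-element pointer array into a ring,
 * so that following p = *p visits slot 0, stride, 2*stride, ... and
 * wraps back to slot 0. Slots off the ring are never read.
 */
static void **build_ring(size_t n, size_t stride)
{
    assert(n > 0 && stride > 0 && stride <= n);

    void **ring = malloc(n * sizeof *ring);
    if (ring == NULL)
        return NULL;

    size_t slots = n / stride;
    for (size_t i = 0; i < slots; i++) {
        size_t next = (i + 1) % slots;
        ring[i * stride] = &ring[next * stride];
    }
    return ring;
}

/*
 * Dependent loads: the next address is not known until the current load
 * completes. Returning the final pointer keeps the compiler from
 * deleting the loop.
 */
static void **chase(void **p, unsigned long loads)
{
    while (loads-- > 0)
        p = (void **)*p;
    return p;
}
```

Time a long chase() and divide by the number of loads to get latency per load. Vary the array size to move from L1 to external cache to main memory.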
R4K-UP memory latencies
	Use this graph to explain how to read the graph
HP K210 memory latencies
	Amazing external cache latencies!!!
	MP machine too
Memory latency summary
	Explain chart
		latencies are in ns, have to divide by the clock period (in ns) to get cycles
	Cache line size can be wrong if they do readahead
		Some HP systems do this
	Note HP system
		256K off chip cache that comes back one cycle after load
	Why does everyone take 2 cycles to do an L1 load?
	Alphas & P6 suck - 2 clocks to get to L1
	I could have done an efficiency slide 
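The ns to cycles conversion for the chart, sketched as a helper. The name ns_to_cycles is hypothetical, and the 200 MHz figure in the usage note is a made-up example, not one of the measured systems.

```c
#include <assert.h>

/*
 * Convert a latency in nanoseconds to clock cycles.
 * One cycle lasts 1000 / clock_mhz nanoseconds.
 */
static double ns_to_cycles(double latency_ns, double clock_mhz)
{
    assert(clock_mhz > 0.0);

    double period_ns = 1000.0 / clock_mhz;
    return latency_ns / period_ns;
}
```

For example, a 10 ns load on a hypothetical 200 MHz part has a 5 ns period, so ns_to_cycles(10.0, 200.0) is 2 cycles.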
How is context switching measured?
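A minimal sketch of one way to force context switches, assuming two processes bouncing a one-byte token over a pair of pipes. The name pingpong is hypothetical. This is the smallest case only: it does not model larger rings of processes or processes that touch a working set between switches.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Bounce a one-byte token between parent and child `rounds` times.
 * Each side blocks in read() until the other writes, so on a
 * uniprocessor every round costs two context switches.
 * Returns 0 on success, -1 on error.
 */
static int pingpong(unsigned long rounds)
{
    int to_child[2], to_parent[2], status, result = 0;
    char token = 'x';
    pid_t pid;

    assert(rounds > 0);
    if (pipe(to_child) < 0)
        return -1;
    if (pipe(to_parent) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: echo the token back until the parent closes its end. */
        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &token, 1) == 1)
            if (write(to_parent[1], &token, 1) != 1)
                _exit(1);
        _exit(0);
    }

    /* Parent: send the token, then wait for it to come back. */
    close(to_child[0]);
    close(to_parent[1]);
    for (unsigned long i = 0; i < rounds; i++) {
        if (write(to_child[1], &token, 1) != 1 ||
            read(to_parent[0], &token, 1) != 1) {
            result = -1;
            break;
        }
    }
    close(to_child[1]);   /* child sees EOF and exits */
    close(to_parent[0]);

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return result;
}
```

Time pingpong(rounds) and divide by 2 * rounds. The raw figure still includes the pipe read/write cost, which has to be measured separately and subtracted. On an MP box the two processes may land on different CPUs, so the number means something different there.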
P6/Linux context switch times
	Explain graph slowly
	left cluster is 2 processes of varying sizes
	top line is processes of 64K size
	Marketing number: 2 processes of 0 size
		Doesn't show processes falling out of cache
R4K-UP context switch times
	Reasonable numbers but weird variance
R4K-MP context switch times
	Mediocre to poor numbers
R10K-Everest context switch times
	Great numbers
	Remains to be seen if we can do this on an R10K MP
Process creation & signaling summary
	Note jump between fork/exit/wait and exec
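A sketch of the fork/exit/wait case, assuming the simplest possible child that exits immediately. The name fork_exit_wait is hypothetical. The exec case would have the child call one of the exec functions on a tiny program instead of _exit(), which adds program loading and startup to every iteration.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create n children one after another. Each child exits at once and
 * the parent reaps it before forking the next.
 * Returns 0 on success, -1 on error.
 */
static int fork_exit_wait(unsigned long n)
{
    assert(n > 0);
    for (unsigned long i = 0; i < n; i++) {
        int status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);                 /* child: nothing but exit */
        if (waitpid(pid, &status, 0) != pid)
            return -1;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            return -1;
    }
    return 0;
}
```

Time fork_exit_wait(n) and divide by n for the per-process cost.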
Interprocess communication summary
	Context switches as even as possible
	Note R10K RPC/UDP number is bad
	Note most TCP latencies better than UDP latencies
	Note UltraSPARC TCP & mention over the wire
File and VM system summary
	File create & delete
		p_tupdate, tar, makes
		smaller is better
		K210, IRIX, OSF1 do it safely
		Linux cheats
		K210?
	Pagefault uses msync & walks backwards
Communication bandwidth summary
	Hardware benchmark!
	Compare pipe to TCP - these are loopback
	Compare file reread to bcopy
	Compare mmap reread to mem read
	Compare bcopys
		libc should be better
		Note R10K bcopys - needs work
	Mem read rate
		notice that P6 is fastest on reads
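A sketch of the two memory loops being compared, assuming a hand-unrolled word copy as the alternative to libc and a summing loop for the read rate. The names unrolled_copy and read_sum are hypothetical, and the unroll factor of four is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-unrolled word copy: four words per iteration, then the remainder. */
static void unrolled_copy(unsigned long *dst, const unsigned long *src,
                          size_t nwords)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);
    for (; i + 4 <= nwords; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < nwords; i++)
        dst[i] = src[i];
}

/*
 * Read-only pass: sum every word. Returning the sum keeps the compiler
 * from throwing the loads away.
 */
static unsigned long read_sum(const unsigned long *src, size_t nwords)
{
    unsigned long sum = 0;

    assert(src != NULL);
    for (size_t i = 0; i < nwords; i++)
        sum += src[i];
    return sum;
}
```

To compare, time unrolled_copy() against memcpy() or bcopy() on buffers much larger than the caches, and report bytes moved divided by elapsed time. Time read_sum() over the same buffer for the read rate.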
