I am examining the Java version of the sparse matrix multiplication program from the JGF benchmark suite. I ran this program at many different CPU frequencies and also profiled it. I classify it as a memory-intensive program, because it has poor cache locality and performs heavy memory access. For this kind of program, the execution time at a lower frequency should be only slightly longer than at a higher frequency, because many CPU cycles are wasted stalling on memory anyway. However, in my experiments the execution time of this program scales in direct proportion to the clock period (that is, inversely with the CPU frequency), as if it were CPU-bound. What is the reason?
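The expectation above can be written as a simple two-term model: a sketch that assumes total run time splits into core work, which costs a fixed number of cycles and therefore scales as 1/f, plus memory stall time, which is a fixed number of seconds independent of f. That is, T(f) = C/f + T_mem. The class and parameter names (`FrequencyModel`, `coreCycles`, `memStallSeconds`) are hypothetical, and the numbers in `main` are illustrative only, not measurements.

```java
/**
 * Hypothetical two-term model of run time versus CPU clock frequency.
 * Assumption: core work costs a fixed number of cycles (scales as 1/f),
 * while memory stall time is a fixed number of seconds (independent of f).
 */
final class FrequencyModel {
    private FrequencyModel() {}

    /** Predicted run time in seconds at clock frequency fHz. */
    static double predictedSeconds(double coreCycles, double memStallSeconds, double fHz) {
        return coreCycles / fHz + memStallSeconds;
    }

    /** Slowdown T(fLow) / T(fHigh) predicted by the model. */
    static double slowdown(double coreCycles, double memStallSeconds, double fLow, double fHigh) {
        return predictedSeconds(coreCycles, memStallSeconds, fLow)
             / predictedSeconds(coreCycles, memStallSeconds, fHigh);
    }

    public static void main(String[] args) {
        double fHigh = 2.66e9, fLow = 1.60e9; // illustrative clock rates
        double ideal = fHigh / fLow;          // slowdown of a purely CPU-bound program
        // Illustrative split: half of the time at fHigh is spent stalled on memory.
        double coreCycles = 40.0 * fHigh;     // 40 s of core work at fHigh
        double memStall = 40.0;               // 40 s of stall, frequency-independent
        System.out.printf("CPU-bound slowdown:    %.3f%n", ideal);
        System.out.printf("Half-stalled slowdown: %.3f%n",
                slowdown(coreCycles, memStall, fLow, fHigh));
    }
}
```

In this model, a measured slowdown equal to the ratio of the two clock rates corresponds to `memStallSeconds` being close to zero.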
The matrix (array) dimension is 500000, and the program was run on an Intel Core i7-920, which has three levels of cache: 32 KB of L1 data cache and 32 KB of L1 instruction cache per core, 256 KB of L2 cache per core, and 8 MB of shared L3 cache.
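To make the "poor locality" argument concrete, the data footprint can be compared with the 8 MB L3. This is a sketch under stated assumptions: it assumes a coordinate-format kernel of the form `y[row[i]] += x[col[i]] * val[i]` (similar in spirit to the JGF kernel, not copied from it), 8-byte `double` and 4-byte `int` elements, and a hypothetical non-zero count of 2,500,000. JVM array headers are ignored, and the class name `WorkingSet` is made up for illustration.

```java
/**
 * Back-of-envelope working-set estimate for a coordinate-format sparse
 * matrix-vector kernel of the form  y[row[i]] += x[col[i]] * val[i].
 * Assumptions (not taken from the question): one double per x, y and val
 * entry, one int per row and col entry; JVM object headers are ignored.
 */
final class WorkingSet {
    private WorkingSet() {}

    /** Bytes touched by one pass over the data. */
    static long bytes(long dim, long nonZeros) {
        long vectors = 2L * dim * Double.BYTES;                          // x and y
        long entries = nonZeros * (Double.BYTES + 2L * Integer.BYTES);  // val, row, col
        return vectors + entries;
    }

    public static void main(String[] args) {
        long dim = 500_000L;          // matrix dimension from the question
        long nz = 2_500_000L;         // assumed number of non-zeros (hypothetical)
        long l3 = 8L * 1024 * 1024;   // 8 MB shared L3 on the i7-920
        long ws = bytes(dim, nz);
        System.out.printf("working set: %.1f MiB, L3: %.1f MiB, ratio %.1fx%n",
                ws / 1048576.0, l3 / 1048576.0, (double) ws / l3);
    }
}
```

Under these assumptions the data occupies 48,000,000 bytes, several times the L3 capacity, and the two dense vectors alone (8,000,000 bytes) nearly fill it, so the indirect accesses to `x[col[i]]` are likely to miss in cache when `col` is not sorted.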
I also collected execution statistics with perf:
     Performance counter stats for 'java -cp . JGFSparseMatmultBenchSizeC':

       83925.084119  task-clock-msecs   #      1.001 CPUs
              2,045  context-switches   #      0.000 M/sec
                 28  CPU-migrations     #      0.000 M/sec
             29,687  page-faults        #      0.000 M/sec
    223,130,573,396  cycles             #   2658.688 M/sec  (scaled from 66.68%)
     66,679,432,987  instructions       #      0.299 IPC    (scaled from 83.33%)
     12,779,607,690  branches           #    152.274 M/sec  (scaled from 83.32%)
         11,389,605  branch-misses      #      0.089 %      (scaled from 83.32%)
     11,056,332,293  cache-references   #    131.740 M/sec  (scaled from 83.34%)
      3,847,329,243  cache-misses       #     45.842 M/sec  (scaled from 83.35%)

       83.816412311  seconds time elapsed
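A few ratios can be derived directly from these counters. The sketch below is only arithmetic on the values quoted above; the class and method names are hypothetical. Because perf multiplexed the counters ("scaled from ..."), the ratios are estimates. According to the `perf_event_open` documentation, the generic cache-misses event usually indicates last-level-cache misses.

```java
/**
 * Derived ratios from the perf counters quoted in the question.
 * Counter values are copied verbatim; the ratios are plain arithmetic.
 */
final class PerfRatios {
    private PerfRatios() {}

    /** Instructions retired per cycle. */
    static double ipc(long instructions, long cycles) {
        return (double) instructions / cycles;
    }

    /** Fraction of cache references that missed. */
    static double missRatio(long misses, long references) {
        return (double) misses / references;
    }

    /** Misses per thousand instructions (MPKI). */
    static double mpki(long misses, long instructions) {
        return 1000.0 * misses / instructions;
    }

    /** Average clock rate in GHz implied by cycles over task-clock milliseconds. */
    static double ghz(long cycles, double taskClockMs) {
        return cycles / (taskClockMs * 1e6);
    }

    public static void main(String[] args) {
        long cycles = 223_130_573_396L;
        long instructions = 66_679_432_987L;
        long cacheRefs = 11_056_332_293L;
        long cacheMisses = 3_847_329_243L;
        double taskClockMs = 83925.084119;
        System.out.printf("IPC            %.3f%n", ipc(instructions, cycles));
        System.out.printf("miss ratio     %.1f %%%n", 100 * missRatio(cacheMisses, cacheRefs));
        System.out.printf("MPKI           %.1f%n", mpki(cacheMisses, instructions));
        System.out.printf("average clock  %.2f GHz%n", ghz(cycles, taskClockMs));
    }
}
```

With the quoted values this gives an IPC of about 0.30, roughly 35% of cache references missing, and about 58 misses per thousand instructions, at an average clock of about 2.66 GHz.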
asked Feb 2 '12 at 11:02
Integer objects that represent small values (by default, -128 to 127) may be cached by the JVM to save memory; maybe that plays some role here.
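The caching mentioned in this comment can be observed with a short sketch. The `Integer.valueOf` Javadoc guarantees that values from -128 to 127 are always cached and allows other values to be cached too; HotSpot lets the upper bound be raised with `-XX:AutoBoxCacheMax`. This only matters if the benchmark actually boxes values; a kernel working on primitive `int[]`/`double[]` arrays would not create `Integer` objects at all. The class name `IntegerCacheDemo` is hypothetical.

```java
/**
 * Demonstrates the Integer box cache: Integer.valueOf always returns the
 * same object for values in -128..127; identity outside that range is
 * not guaranteed and depends on JVM settings.
 */
final class IntegerCacheDemo {
    private IntegerCacheDemo() {}

    /** True if boxing v twice returns the identical Integer object. */
    static boolean sameBox(int v) {
        Integer a = Integer.valueOf(v);
        Integer b = Integer.valueOf(v);
        return a == b; // reference comparison on purpose
    }

    public static void main(String[] args) {
        System.out.println("sameBox(127)  = " + sameBox(127));   // guaranteed to be cached
        System.out.println("sameBox(1000) = " + sameBox(1000));  // usually not cached with default settings
    }
}
```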