Is there a performance advantage in linking GapMiner against locally compiled static GMP and MPFR libraries optimised for the host CPU, rather than using the standard distro packages, which are configured only for the host OS?
This test compares the reported performance of a standard GapMiner build with one built against optimised GMP/MPFR (where “optimised” means “compiled with GCC flags -static -march=native -mtune=native”).
Column labels map directly to miner output: pps is average primes per second, tps is average tests per second, gps is average gaps per second, glst is the size of the gaplist and l/s is the time in seconds to scan the gaplist at the reported rate of gaps per second.
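As a small illustration of the l/s definition above, the following sketch divides the gaplist size by the gap rate. The function name and the glst value are hypothetical and chosen only for the example; the gps figure is the mean reported for the standard miner in this test.

```python
def gaplist_scan_seconds(glst, gps):
    """Seconds needed to scan a gaplist of size glst at gps gaps per second."""
    if gps <= 0:
        raise ValueError("gps must be positive")
    return glst / gps


# glst is a made-up gaplist size; gps is the standard miner's mean gap rate.
scan_time = gaplist_scan_seconds(glst=178360, gps=17836)
print(scan_time)
```

A faster miner shortens this time in proportion to its gps gain, so the l/s column carries the same information as gps for a fixed gaplist size.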
standard gapminer
| measure | pps | tps | gps |
|---|---|---|---|
| mean | 381417 | 121373 | 17836 |
| max | 450402 | 143195 | 21061 |
| min | 371074 | 118095 | 17352 |
| std | 15163 | 4804 | 709 |
optimised gapminer
| measure | pps | tps | gps |
|---|---|---|---|
| mean | 409228 | 130275 | 19137 |
| max | 479724 | 152951 | 22436 |
| min | 398576 | 126829 | 18638 |
| std | 15814 | 5077 | 740 |
per cent improvement
| measure | pps | tps | gps |
|---|---|---|---|
| mean | 7.29 | 7.33 | 7.29 |
| max | 6.51 | 6.81 | 6.53 |
| min | 7.41 | 7.40 | 7.41 |
| std | 4.29 | 5.68 | 4.37 |
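
The percentages can be reproduced from the two tables of raw figures. The sketch below does so for the mean row only, assuming the improvement is defined as the relative gain of the optimised figure over the standard one; the dictionary and function names are illustrative and not taken from the original analysis.

```python
# Mean values copied from the standard and optimised tables.
standard_mean = {"pps": 381417, "tps": 121373, "gps": 17836}
optimised_mean = {"pps": 409228, "tps": 130275, "gps": 19137}


def percent_improvement(baseline, candidate):
    """Relative gain of candidate over baseline, in per cent."""
    return (candidate - baseline) / baseline * 100.0


improvement = {
    column: round(percent_improvement(standard_mean[column], optimised_mean[column]), 2)
    for column in standard_mean
}

for column, gain in improvement.items():
    print(f"{column}: {gain:.2f}%")
```

The same function applies unchanged to the max, min and std rows.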
The charts below are included for completeness. The means shown on the charts differ from the pandas-produced means in the tables because, to aid visual comparison, the chart means ignore the first fifth of the data. The plotted line itself uses the full dataset.
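To show why the chart means and the table means differ, here is a minimal sketch of a mean that skips the first fifth of the samples. The sample values are invented for illustration and the function name is hypothetical; the original charting code is not shown in this write-up.

```python
from statistics import mean


def chart_mean(samples):
    """Mean of the samples after dropping the first fifth of them."""
    start = len(samples) // 5
    return mean(samples[start:])


# Invented pps readings with two early outliers followed by steadier values.
samples = [450000, 420000, 380000, 381000, 382000,
           380000, 381000, 382000, 380000, 381000]

full = mean(samples)           # what a mean over the whole dataset reports
trimmed = chart_mean(samples)  # what the chart annotation would report

print(full, trimmed)
```

With these invented readings the trimmed mean is lower than the full mean because the early outliers are excluded; with real logs the direction and size of the difference depend on how the first fifth of the run behaves.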
Replication:
./bin/gapminer-standard -o localhost -p 31397 -u $(RPCUSER) -x $(RPCPASSWORD) -e -t 4 -f 32 > standard-miner.log 2>&1
./bin/gapminer-optimised -o localhost -p 31397 -u $(RPCUSER) -x $(RPCPASSWORD) -e -t 4 -f 32 > optimised-miner.log 2>&1