Technical convos on bitcointalk

Discourse by apparently knowledgeable Gapcoin users

Apr 6, 2021 • Graham Higgins ~ 12 min to read •

2014-11-13 bsunau7: To me the primes/s metric looks useless. It is a counter that is only incremented when a composite isn't found, which makes it a valid comparison only between runs with the same sieve parameters.

Example: assume that there are, on average, 100 non-composite numbers in the gap to test. Your chance of finding a block is then primes/s divided by 100, multiplied by the chance of a gap.

Now let's increase the efficiency of the sieve (add more primes) so that you only have 80 non-composite numbers to test. The equation used above is now wrong by 25% (100/80 = 1.25), yet your reported primes/s has hardly changed at all.

This also means profit calculators based on primes/s are not accurate if you change -s or -r when mining.

Using 10g/s or 15g/s would be a better basis for calculating profits, as it captures the effect of sieve efficiency.

2014-11-13 bsunau7: Thanks to Riecoin, my GMP is already custom-compiled and tuned. I only mentioned testnet because the difficulty was high and was increasing at a steady rate. Interesting that you get some performance out of SIMD with the code as is. One of the first things I did was to compile with -ftree-vectorize -mavx2 -ftree-vectorizer-verbose=5 to see what auto-vectorisation found. It didn't find much (anything?) to vectorise, as most of the loop bounds aren't known at compile time (there are tricks you can use, however) or the loops use a non-uniform step. PS. The fact that GCC couldn't vectorise code that should be very amenable to vectorisation is a good avenue to work on for speed-ups.
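To illustrate why auto-vectorisation struggles, here is a minimal sieve-window loop of the general kind a gap miner runs (an assumption: it is not Gapcoin's actual sieve, and the names are hypothetical). Both the first offset and the stride depend on the sieving prime, which is only known at run time:

```cpp
#include <cstdint>
#include <vector>

// Marks every number in [start, start + size) that has a factor in `primes`
// (1 = known composite, 0 = survivor that still needs a primality test).
std::vector<uint8_t> sieve_window(uint64_t start, uint32_t size,
                                  const std::vector<uint32_t>& primes) {
    std::vector<uint8_t> composite(size, 0);
    const uint64_t end = start + size;
    for (uint32_t p : primes) {
        // First multiple of p inside the window; depends on start % p at run time.
        uint64_t n = (start + p - 1) / p * p;
        if (n == p) n += p;  // p itself is prime, do not strike it out
        // The stride changes from one prime to the next, so consecutive stores
        // land p bytes apart: a strided scatter rather than a contiguous run.
        for (; n < end; n += p) composite[n - start] = 1;
    }
    return composite;
}
```

For example, sieving the 35 numbers starting at 1327 with the primes up to 37 leaves only 1327 and 1361 as survivors, the two ends of a gap of 34.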

2014-11-13 bsunau7: Not sure if anyone else thinks this is a good idea, but can we replace those 10 & 15 gap metrics with something else?

I am using primes/s and candidates/s, e.g.:

[2014-11-15 10:59:11] pps: 14669 / 14287  candidates/s 63998895
[2014-11-15 10:59:41] pps: 13121 / 14136  candidates/s 63321793
[2014-11-15 11:00:11] pps: 13572 / 14100  candidates/s 63159069

This is just the number of "numbers" scanned, in effect how fast numbers are skipped/tested. I just accumulate sievesize for every call of run_sieve.

It is the only way I can see of measuring performance across different miners and different parameters (tuning parameters is why I added it to mine).
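A minimal sketch of such a counter, assuming a miner that calls a hook after each run_sieve() with that run's sievesize (run_sieve and sievesize are named in the post; the class and method names here are hypothetical):

```cpp
#include <chrono>
#include <cstdint>

// Accumulates the number of integers covered by the sieve and turns the
// total into a candidates-per-second rate since construction.
class CandidateRate {
public:
    using Clock = std::chrono::steady_clock;

    CandidateRate() : start_(Clock::now()) {}

    // Call once per run_sieve() with the sievesize that run covered.
    void on_sieve_run(uint64_t sievesize) { total_ += sievesize; }

    uint64_t total() const { return total_; }

    // Candidates per second since construction (0 if no time has elapsed).
    double per_second() const {
        double secs = std::chrono::duration<double>(Clock::now() - start_).count();
        return secs > 0.0 ? static_cast<double>(total_) / secs : 0.0;
    }

private:
    Clock::time_point start_;
    uint64_t total_ = 0;
};
```

Printing per_second() on each stats interval gives a figure comparable to the candidates/s column above; constructing a fresh CandidateRate each interval would turn it into a per-interval rate.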

2014-11-13 bsunau7: The miner does not stop mining the sieve until the original one is verified by a slow mpz_nextprime() call. When your miner is significantly faster than the PoW function you tend to find lots of shares before the PoW can stop the sieve from mining.

2014-11-13 Supercomputing: @bsunau7 Thanks for the info. I will implement a small sieve for validation and use a single exponentiation test at each end point; it should be at least 100x faster, with less than a 1/2^128 probability of it being a larger gap.
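A minimal sketch of that validation idea, under loud assumptions: it works on 64-bit integers only (real Gapcoin candidates are hundreds of bits and would need GMP, e.g. `mpz_powm`), uses trial division by the primes up to 47 as the "small sieve", and a single base-2 Fermat test as the one exponentiation per number. The function names are hypothetical:

```cpp
#include <cstdint>

using u64 = uint64_t;
using u128 = unsigned __int128;  // GCC/Clang extension

// (base^exp) mod m by square-and-multiply; u128 avoids overflow in the products.
u64 powmod(u64 base, u64 exp, u64 m) {
    u64 result = 1 % m;
    base %= m;
    while (exp) {
        if (exp & 1) result = (u128)result * base % m;
        base = (u128)base * base % m;
        exp >>= 1;
    }
    return result;
}

const u64 kSmallPrimes[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47};

// True if n has a small prime factor other than itself (the cheap "sieve").
bool has_small_factor(u64 n) {
    for (u64 q : kSmallPrimes)
        if (n != q && n % q == 0) return true;
    return false;
}

// Probable-prime check: small trial division, then ONE base-2 Fermat test.
bool is_probable_prime(u64 n) {
    if (n < 2 || has_small_factor(n)) return false;
    for (u64 q : kSmallPrimes)
        if (n == q) return true;
    return powmod(2, n - 1, n) == 1;
}

// Validates a claimed gap [p, p + len]: both end points probable primes and
// every interior number composite. Only interior numbers that survive the
// small-factor check pay for an exponentiation.
bool validate_gap(u64 p, u64 len) {
    if (!is_probable_prime(p) || !is_probable_prime(p + len)) return false;
    for (u64 n = p + 1; n < p + len; ++n) {
        if (has_small_factor(n)) continue;        // certainly composite
        if (is_probable_prime(n)) return false;  // a prime inside: gap is shorter
    }
    return true;
}
```

Because only a Fermat test is used, base-2 pseudoprimes such as 7957 = 73 × 109 pass `is_probable_prime`; this is a probable-prime check, not a proof.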

2014-11-13 bsunau7: There was talk on the GMP mailing lists about speeding up gmp_nextprime in a very similar manner: sieve the start/end of the gap before running expensive MR (Miller-Rabin) tests. As the gap has been pre-sieved, you might be able to just MR-test the non-composites in the mining sieve and get the same result. I personally didn't bother with the PoW validation code, as for larger gaps it just isn't called often enough to matter. Also, you can tweak pprocessor->process() to terminate the sieve early, which will cure the stales and get you to the next "block" a few seconds faster, at the risk of having a slightly wrong difficulty (in effect a non-issue).
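A sketch of the "sieve first, then Miller-Rabin" idea for finding the next prime, again on 64-bit integers only (an assumption; GMP's own mpz_nextprime works on arbitrary-size numbers). The Miller-Rabin bases are the first twelve primes, which is known to be deterministic for all 64-bit inputs; the window size of 256 and the names are illustrative:

```cpp
#include <cstdint>
#include <vector>

using u64 = uint64_t;
using u128 = unsigned __int128;  // GCC/Clang extension

static u64 mulmod(u64 a, u64 b, u64 m) { return (u128)a * b % m; }

static u64 powmod(u64 b, u64 e, u64 m) {
    u64 r = 1;
    b %= m;
    for (; e; e >>= 1, b = mulmod(b, b, m))
        if (e & 1) r = mulmod(r, b, m);
    return r;
}

// Miller-Rabin with bases 2..37 (deterministic for every 64-bit n).
bool miller_rabin(u64 n) {
    if (n < 2) return false;
    static const u64 bases[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37};
    for (u64 a : bases)
        if (n % a == 0) return n == a;
    u64 d = n - 1;
    int s = 0;
    while ((d & 1) == 0) { d >>= 1; ++s; }
    for (u64 a : bases) {
        u64 x = powmod(a, d, n);
        if (x == 1 || x == n - 1) continue;
        bool composite = true;
        for (int i = 1; i < s; ++i) {
            x = mulmod(x, x, n);
            if (x == n - 1) { composite = false; break; }
        }
        if (composite) return false;
    }
    return true;
}

// Smallest prime > n: strike out multiples of small primes in a window first,
// and run Miller-Rabin only on the numbers the sieve did not eliminate.
u64 next_prime_sieved(u64 n, unsigned window = 256) {
    static const unsigned small[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47};
    for (u64 lo = n + 1;; lo += window) {
        std::vector<char> dead(window, 0);
        for (unsigned p : small) {
            u64 first = (lo + p - 1) / p * p;
            if (first == p) first += p;  // never strike out p itself
            for (u64 m = first; m < lo + window; m += p) dead[m - lo] = 1;
        }
        for (unsigned i = 0; i < window; ++i)
            if (!dead[i] && miller_rabin(lo + i)) return lo + i;
    }
}
```

Most window slots are struck out by the small primes, so only a small fraction of the numbers ever reach the comparatively expensive `miller_rabin` call.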

2014-11-13 bsunau7: Here is a little fix which increased my block count (got your attention?).

Added TCP keepalives with a sub-5-minute period to the wallet: three lines of code in net.cpp and netbase.cpp, plus a tweak of the Linux kernel settings.

net.cpp diff:

--- net.cpp_orig        2014-11-07 19:38:26.941369345 +0100
+++ net.cpp     2014-11-07 19:42:11.077371850 +0100
@@ -1594,6 +1594,7 @@
     // Different way of disabling SIGPIPE on BSD
     setsockopt(hListenSocket, SOL_SOCKET, SO_NOSIGPIPE, (void*)&nOne, sizeof(int));
 #endif
+    setsockopt(hListenSocket, SOL_SOCKET, SO_KEEPALIVE, (void*)&nOne, sizeof(int));

 #ifndef WIN32
     // Allow binding if the port is still in TIME_WAIT state after

netbase.cpp diff:

--- netbase.cpp_orig    2014-11-07 19:38:21.365369283 +0100
+++ netbase.cpp 2014-11-07 19:41:02.549371084 +0100
@@ -329,6 +329,8 @@
     int set = 1;
     setsockopt(hSocket, SOL_SOCKET, SO_NOSIGPIPE, (void*)&set, sizeof(int));
 #endif
+    int nKeepAlive = 1;
+    setsockopt(hSocket, SOL_SOCKET, SO_KEEPALIVE, (void*)&nKeepAlive, sizeof(int));

 #ifdef WIN32
     u_long fNonblock = 1;

Add to /etc/sysctl.conf:

net.ipv4.tcp_keepalive_time = 240
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 9

Then run "sysctl -p /etc/sysctl.conf" to apply the settings.

I went from spending half my time in 45-minute limbo land, mining dead blocks because the wallet hung, to no issues in the last 24 hours (no wasted mining).

It's just a quick and dirty hack (this is code for "It could have been done in a nicer way"), but it seems to have sorted out a big source of pain for my little 4-core mining rig.

Oh, this might also address the stratum mining issues this coin seems to have. Monotonic coins don't go 15+ minutes without a transaction (i.e. network traffic), so they would not see this issue.

2014-11-13 bsunau7: The starting point is derived from the block you are mining, so it is effectively random. The "absolute end point" is defined by the "shift", which you get to pick. People only mine part of the "space", which is defined by the sieve size. Difficulty is logarithmic (this is explained on the Gapcoin website), so a difficulty of 23 is significantly harder than one of 22. Summary: a random range of numbers is scanned, looking for a run (currently a few thousand long) of composite numbers.
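To put a number on "significantly harder", here is a sketch under an assumption the post does not state: the common heuristic that the gap after a random prime has a merit of at least m with probability about e^-m, with the difficulty acting as that merit threshold. Under that model each +1 of difficulty multiplies the expected work by about e ≈ 2.72. The function name is illustrative:

```cpp
#include <cmath>

// Heuristic model (assumption): P(merit >= m) is about exp(-m), so moving the
// target from one difficulty to another scales the expected work by
// exp(to - from).
double relative_work(double from_difficulty, double to_difficulty) {
    return std::exp(to_difficulty - from_difficulty);
}
```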

2014-11-13 Supercomputing: Make no mistake about it, this is a GPU coin for AMD R9 and Nvidia Maxwell GPUs. The algorithm used for this coin falls in the sweet spot as far as GPU register usage is concerned for a modular arithmetic implementation (512-bit or less). Also, sieving is much faster on these GPUs.

2014-11-13 Supercomputing: I will not have time this coming weekend. But during the following weekend, I will integrate Gapcoin mining with the open source Primecoin miner below. Less than an hour's worth of work:

2014-11-13 Supercomputing: No worries, I will help Gapcoin break the world record. I still need to optimize the GPU miner for Nvidia (Maxwell) with 2GB of VRAM. For now it only works on R9 cards with 3+ GB of VRAM.

2014-11-13 Supercomputing: ... my implementation of the GPU miner is deadly and will surely kill the coin without a doubt. The coin is too young and let us give it some time to grow. I have mined enough BTCs and XPMs therefore I am going to shelve it at this time.

GPU Miner Test Run:

[2014-11-12 17:32:17] pps: 0 / 0.0000 10g/h 0.0000 / 0.0000  15g/h 0.0000 / 0.0000

We found our gap...

1054835642665510130316115181531813465485913069989570483763342784779359674641399 765143
[2014-11-12 17:32:24] Found Share: 22.8244760624  =>  accepted
[2014-11-12 17:32:27] pps:  0 / 0.0000  10g/h 0.0000 / 0.0000  15g/h 0.0000 / 0.0000
[2014-11-12 17:32:28] Got new target: 22.5718696617

[2014-11-12 17:34:40] pps: 0 / 0.0000 10g/h 0.0000 / 0.0000  15g/h 0.0000 / 0.0000

We found our gap...

1460023432727399844421086333295273985776598611867186573289828851669587467837029 674331
[2014-11-12 17:34:41] curl_easy_perform() failed: Failed initialization
[2014-11-12 17:34:41] waiting for gapcoind ...
[2014-11-12 17:34:41] Found Share: 22.8490689436  =>  accepted
[2014-11-12 17:34:46] Got new target: 22.5826074437

2014-11-13 Supercomputing: In addition, please see the proof-of-work verification code for restrictions (see lines 99 onwards). Also, because the merit of a gap is the ratio of the gap size to the natural logarithm of the smaller prime, the size of the prime does not have to increase by much as the difficulty increases.
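A minimal sketch of the merit formula. It takes ln(p) rather than p itself, since the primes involved are far too large for built-in types; the function names are illustrative. It also shows why the prime does not need to grow much: at a fixed size, each extra unit of merit only costs another ln(p) of gap length.

```cpp
// merit = gap / ln(p), where p is the prime that starts the gap.
double merit(double gap, double ln_p) { return gap / ln_p; }

// Gap length needed to reach `target_merit` for primes of size ln(p).
double required_gap(double target_merit, double ln_p) { return target_merit * ln_p; }
```

For primes with ln(p) = 200, merit 22 needs a gap of 4400 and merit 23 a gap of 4600.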

dcct's optimisation uses the idea of skipping prime tests for the range of numbers between two primes when that range is shorter than the minimum gap length. This works only for sequential searching.
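The post does not show dcct's code, so this is only one plausible form of the idea (an assumption): from a prime p, jump ahead by the minimum gap and test downwards; the first prime q found proves that every gap starting before q is too short, so the numbers between p and q are never tested. Trial division stands in for a real primality test, and the names are hypothetical:

```cpp
#include <cstdint>
#include <optional>

// Trial-division primality test; adequate for a small demonstration range.
bool is_prime_td(uint64_t n) {
    if (n < 2) return false;
    for (uint64_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Sequential search for the first prime p >= start whose following gap is at
// least min_gap, giving up at `limit`.
std::optional<uint64_t> first_gap_at_least(uint64_t start, uint64_t min_gap, uint64_t limit) {
    uint64_t p = start;
    while (p < limit && !is_prime_td(p)) ++p;
    while (p < limit) {
        // Scan (p, p + min_gap) from the top down; stop at the first prime.
        uint64_t q = 0;
        for (uint64_t n = p + min_gap - 1; n > p; --n)
            if (is_prime_td(n)) { q = n; break; }
        if (q == 0) return p;  // no prime in the window: gap >= min_gap
        p = q;                 // every gap starting below q is too short
    }
    return std::nullopt;
}
```

For example, the first gap of at least 14 starts at 113 and the first gap of at least 30 starts at 1327.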

2014-11-20 angelovAlex: As an experiment, I decided to create a CUDA miner in my free time. I rewrote mpz's powmod function for CUDA and ran it on my old 9600 GT.

2014-11-21 Palmdetroit: “Seems like the gpu miner isn't fully optimized yet and still uses a good amount of the cpu also.”

I think a lot of the GPU speed issues may be CPU-related; look at GPU usage to see if there's a bottleneck. Hopefully someone can improve the CPU part of the GPU miner; even small improvements would likely bump speeds.

2014-11-21 SpeedDemon13: GPU usage fluctuates between 80% and 95%, and the CPU is at around 50% usage. I'm mining with the AMD CPU on a different rig from the GPU, just to see how they both do. The GPU is teamed with a Core i5. Hopefully the GPU miner will be optimized in the near future, but for now CPU mining is better...

2014-12-06 altpooler: The best I've been able to get with a Radeon 7970 is between 1,500,000 and 1,700,000 PPS with the following settings:

    -g -j 5 -n 4 -w 1000 -i 50000

The flags used above are explained below:

Enable GPU mining:

Stats interval (in the settings above, I have stats printing every 5 seconds):

Number of tests per gap per GPU run:

Number of primes for sieving:

I tried many different values for the -n and -i flags and actually had it at around 2,000,000 PPS, but tests/s kept dropping to 0. I preferred stability, so I adjusted the settings to lower the PPS and keep tests/s high.

A few other settings that may have an effect on PPS, just for some quick reference. These can all be seen by running the help command ./gapminer -h:
--work-items (gpu work items - default: 2048)
--queue-size (gpu waiting queue size - note: memory intensive)
--sieve-size (prime sieve size)
--shift (the adder shift)

2015-04-16 pdazzl: “How do you attempt particular records?”

I don't have solid proof that this is always true, but I seem to have had the best luck with large-shift block hits when mining solo at night (at night the big pool miners don't mine as hard, which makes the rounds longer), though I've set large-shift records both solo and in a pool.

All you do is append --shift with a number bigger than the default of 25. Example:

./gapminer -o -p 3385 -u <user.worker> -x <password> -c -t 16 --shift 256
The default --sieve-size and --sieve-primes have worked fine for me when being speculative with large shifts. A shift of 256 has gotten me into the 8000-9998 list several times. I did some calculations: shifts 380 and 381 land in the 10000-14998 range, where, given the current records, solving any block at difficulty 22-23 gives you a very good chance of setting a world record every time.

Keep in mind that these higher shifts will severely punish your pps rate, since you're looking for larger overall gaps AND all the numbers you're testing are bigger: shift 25 gives 85-digit numbers, while shift 256 gives 155-digit numbers. That is why solo mining probably gives you better odds (with a solo target you're putting all your chips in the middle to solve the block, not accumulating shares/coins).
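The size and gap arithmetic in this post can be roughly reproduced under one simplifying assumption: the 256-bit hash multiplied by 2^shift is about (256 + shift) bits long, and the adder is negligible. The names are illustrative:

```cpp
#include <cmath>

// Rough size model (assumption): p has about (256 + shift) bits.
constexpr int kHashBits = 256;

double ln_p_for_shift(int shift) { return (kHashBits + shift) * std::log(2.0); }

int approx_digits_for_shift(int shift) {
    return static_cast<int>(std::floor((kHashBits + shift) * std::log10(2.0))) + 1;
}

// Gap length needed to reach a given merit (difficulty) at this shift.
double gap_needed(double merit, int shift) { return merit * ln_p_for_shift(shift); }
```

Under this model shift 256 needs gaps of roughly 7,800 to 8,200 at difficulty 22 to 23, and shift 381 roughly 9,700 to 10,200, the same ballpark as the record ranges quoted above.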

If nscythe or skif (the two powerhouse pool miners) are reading this, they could probably set a couple of hundred new world records within a week with these higher shifts if they really wanted to. I would think Gapcoin taking over more of the record lists could only help it.

EDIT: Just set another record a few minutes ago with a gap of length 11202 (on block #109634). You can see from the block record that it was set with shift 391.

2015-04-226 pdazzl: “So many questions...Like what shift means exactly.”

All shift does is affect the size of the numbers you are searching. Larger shifts mean bigger raw numbers to scan/test, and larger gaps are found in larger number ranges (though, compared to smaller number ranges, you need longer gaps to reach the merit needed to solve the block).

A block solution is calculated by sha256(Blockheader) * 2^shift + adder. During a particular round, the miner is searching through all the "adder" possibilities to find a gap.

An example of what shift does, using solved block #113196:

256-bit hash = 96836026872238572127838827186667600687197793668933440162407221099313404183696
(The hash is stored in hexadecimal as d6173fb8d38ec64dabb92d9af96290e68d02c3494f758d7274931fecb4c60490)
Shift = 32
Adder = 739033261
Solution is (96836026872238572127838827186667600687197793668933440162407221099313404183696) * (2^32) + 739033261. That's the prime at which the gap of length 5034 begins. The prime that ends the gap is, of course, the solution above plus 5034.

The number above is 87 digits long, 13 digits away from a googol. Theoretically, if the shift were 381 instead, the resulting number would be 192 digits long.
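The digit counts and the merit of the block #113196 gap can be checked in log space, without big-integer arithmetic. A sketch, assuming merit = gap / ln(p) as described earlier in the thread and ignoring the adder (about 10^9, against p of about 4 × 10^86); the helper names are hypothetical:

```cpp
#include <cmath>
#include <string>

// log10 of a large positive decimal integer given as a string, using its
// leading 17 digits (double precision) plus its length.
double log10_decimal(const std::string& digits) {
    std::size_t lead = digits.size() < 17 ? digits.size() : 17;
    double head = std::stod(digits.substr(0, lead));
    return std::log10(head) + static_cast<double>(digits.size() - lead);
}

// p = hash * 2^shift + adder; the adder is negligible at this scale, so
// log10(p) is taken as log10(hash) + shift * log10(2).
double log10_solution(const std::string& hash_decimal, int shift) {
    return log10_decimal(hash_decimal) + shift * std::log10(2.0);
}

int digit_count(double log10_value) { return static_cast<int>(std::floor(log10_value)) + 1; }

// merit = gap / ln(p), with ln(p) = log10(p) * ln(10).
double gap_merit(double gap, double log10_p) { return gap / (log10_p * std::log(10.0)); }
```

With the hash above this gives 87 digits for shift 32 and 192 digits for shift 381, and a merit of about 25.2 for the 5034 gap.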

It's up to you how you choose shifts. I personally think a wider dragnet (different simultaneous shifts on different machines) will yield more solutions. In my observation some shifts also seem to give better yields than others, but I don't have concrete proof of that.