Put into an array all natural numbers up to a given limit

Obviously, crossing out of multiples need to start firstly at the square of the new chosen sieve prime. So the total effort is about

After seen some "fast" implementations of this algorithm by other people,
I decided to
write my own, really fast and less memory consuming computer program.
There are four main improvement points for

- Dense bit packing for the crossing-out flags
- To use the memory efficiently, it is necessary to mark the numbers not in a byte or even a word of the computer, but in a bit only. This must be done very clever, because bit-access is much more expensive in cpu-time than byte-access or the CPU-prefered word-access!
- Only presieved numbers in the sieve
- With exception of 2, all other primes are odd, so we need only to store the flags for
the odd numbers in the sieve. But furthermore, only the numbers
*6k+1*and*6k+5*can be primes (except 2 and 3). So, we can reduce the total amount of sieve memory by a factor of 3 storing only the flags for these numbers. If we even exclude all multiples of 5, resulting in a factor of 3.75, we need only 8 flags each 30 numbers. This is really nice, because 1 byte has 8 bits! - Don't bother with multiples of the smallest primes
- We know, that the primes, except 2 and 3,
can occur only for those numbers which have modulo 6 a remainder 1 or 5.
So we can avoid to cross out all multiples of 2 and 3, saving a
factor of 3 in sieving speed. What is more, this list of smallest primes can
be extended, e.g. including 5 and 7, and we need only to consider 48 numbers out
of 210, achieving a speed-up factor of even 4.375. Each further small prime
*p*will decrease the run time by a factor*1-1/p*, but increase the code of the program by a factor*p-1*. - Choose an appropriate sieve size
- Typically, the sieve should fit into the computers main memory and even better into its cache in modern high speed computers. Therefore, one can't use a sieve of the size of the sieve limit, but must use far smaller sieve sizes normally. So, one has to choose an appropriate fixed sieve size and sieve the total range in many parts sequentially. And hence, the dense bit packing in the sieve pays very well!

Thanks to Thomas Fiedler, University of Jena, who discovered in May 2003 an important bug (a 1 not a 0 should have been there) when segmentating sieving, I polished the source a bit and thus you can fetch the latest prime_sieve.c version 2.0c. And here is the corresponding README.

- source code availible
- very fast
- interval sieving possible
- adjustable to CPU cache size
- works up to numbers 2
^{64} - user definable macro called for each found prime number

Limit | Prime Count | CPU1 | CPU2 | CPU3 | CPU4 | CPU5 | CPU6 | CPU7 | CPU8 | CPU9 | Factor |
---|---|---|---|---|---|---|---|---|---|---|---|

10^2 | 25 | 0.0 s | 0.0 s | 0.00 s | 0.0 s | 0.00 s | 0.00 s | 0.00 s | 0.00 s | 0.00 s | 1.0955 |

10^3 | 168 | 0.0 s | 0.0 s | 0.00 s | 0.0 s | 0.00 s | 0.00 s | 0.00 s | 0.00 s | 0.00 s | 1.5010 |

10^4 | 1229 | 0.0 s | 0.1 s | 0.00 s | 0.0 s | 0.00 s | 0.00 s | 0.00 s | 0.00 s | 0.00 s | 1.7887 |

2^16 | 6542 | 0.0 s | 0.1 s | 0.01 s / 0.01 s | 0.0 s | 0.00 s | 0.00 s | 0.00 s | 0.00 s | 0.00 s | 1.9744 |

10^5 | 9592 | 0.1 s | 0.1 s | 0.01 s / 0.01 s | 0.0 s | 0.00 s | 0.00 s | 0.00 s | 0.00 s | 0.00 s | 2.0118 |

10^6 | 78498 | 0.1 s | 0.2 s | 0.02 s / 0.03 s | 0.1 s | 0.01 s | 0.01 s | 0.01 s/ 0.01 s | 0.00 s | 0.00 s | 2.1941 |

10^7 | 664579 | 0.2 s | 1.6 s | 0.14 s / 0.20 s | 0.5 s | 0.13 s | 0.08 s | 0.04 s/ 0.04 s | 0.02 s | 0.01 s | 2.3483 |

10^8 | 5761455 | 1.2 s | 17.0 s | 1.42 s / 2.02 s | 4.9 s | 1.36 s | 0.80 s | 0.36 s/ 0.37 s | 0.18 s | 0.11 s | 2.4818 |

10^9 | 50847534 | 11.3 s | 187.8 s | 16.2 s / 21.7 s | 51.3 s | 15.30 s | 8.84 s | 3.62 s/ 3.68 s | 1.55 s | 1.16 s | 2.5996 |

2^32 | 203280221 | 50.7 s | 889.5 s | 79.6 s / 104.7 s | 249.5 s | 73.85 s | 43.01 s | 15.97 s/ 15.98 s | 7.55 s | 5.14 s | 2.6676 |

10^10 | 455052511 | 120.4 s | 268.9 s | 191.35 s | 38.49 s | 16.72 s | 12.34 s | 2.7050 | |||

10^11 | 4118054813 | 1268.6 s | 4122.7 s | 482.77 s | 214.51 s | 143.71 s | 2.8003 | ||||

10^12 | 37607912018 | 15207.7 s | 7466.49 s | 3071.43 s | 2074.49 s | 2.8873 | |||||

10^13 | 346065536839 | 249032.9 s | 32327.34 s | 30955.69 s | 2.9673 | ||||||

CPU2: MIPS 3000 33MHz with 32 MB RAM (no Cache) running Ultrix V4.4 (sieve size=15KB).

CPU3: AMD K6 233MHz with 64 MB RAM (32 KB Data-Cache) running Linux 2.032 (sieve size=22KB). The C-source was compiled using gcc 2.7.2.1 for i486-linux one time for 32bit LONG and the other time for 64 bit LONG. Further, for limit > 2^32 one should increase the sieve size to get shorter running times.

CPU4: Intel Pentium 133MHz with 64 MB RAM (8KB Data-Cache) running Windows 95 (sieve size=8000B). Compiler: Visual C++ (max speed)

CPU5: DEC Alpha 21164a 400 MHz with 64 MB RAM (8 KB Data-Cache) running OSF1 V4.0 (sieve size=8000)

CPU6: Intel Pentium III 450 MHz with 128 MB RAM (16 KB Data-Cache) running Linux 2.2.14 using gcc 2.7.2.3 (i386 Linux/ELF) (sieve size=16384).

As you see, it is very important how well the code-optimizer and the caching logic of the cpu does. The sieve size are nearly optimal chosen for limits < 10^10.

CPU7: AMD Thunderbird 900 MHz with 256 MB RAM (64 KB 1st level Data-Cache) running Linux 2.2.19 using gcc 2.95.2 (i386 Linux/ELF) (sieve size=64000/64KB).

CPU8: PowerPC 970FX 2.5 GHz with 1.5 GB RAM running Mac OSX 10.3.8 using compiler IBM XLF 6.0 Advanced Edition (sieve size=32000 for limit <= 10^11 else the minimal necessary sieve size).

CPU9: AMD Athlon64 Winchester 2 GHz with 3 GB RAM (64 KB 1st level Data-Cache) running Linux 2.6.9 using gcc 3.4.2 (x86-64 Linux/ELF) (sieve size=65000).

Factor = ln

The average access for each bit in the sieve is PROD

BTW: Because the gaps between successive primes are <= 250 up to p=436273009 and <= 500 up to 304599508537, to hold a list of primes only their differences need to be stored in a byte.

For interested persons: the frequency of primes
and a little problem for the mathematicians: estimate the magnitude of

SUM* _{p<=n} 1/p * - ln(ln

Hint: Hardy & Wright tell us

Achim Flammenkamp

created: 1998-06-02 17:30 UTC+2

updated: 2005-05-19 17:10 UTC+2