diff options
author | Jim Meyering <meyering@redhat.com> | 2012-09-16 22:31:04 +0200 |
---|---|---|
committer | Jim Meyering <meyering@redhat.com> | 2012-10-04 22:06:40 +0200 |
commit | 759ebcb57db73449b5670204f85931d34881b7d2 (patch) | |
tree | 0c72d5266451db1dca993ed5034d8b323a73cf1b /src | |
parent | 49f5c21fff9241c195d74101a334fdc2c8dc33e8 (diff) | |
download | coreutils-759ebcb57db73449b5670204f85931d34881b7d2.tar.xz |
factor: merge with preexisting factor; integrate tests; avoid warnings
* src/factor.c: Renamed from factor-ng.c, with the following changes:
Adjust copyright header to be consistent with others.
Use xmalloc and xrealloc, to avoid segv upon OOM.
Switch back to using readtokens to handle input.
Diagnose invalid inputs.
s/fprintf+exit/error/
(print_factors): Add comments.
(strto2uintmax): Return strtol_error, not int.
(read_item): Remove, no longer used.
(main): Use atexit(close_stdout) so that we don't ignore failed write.
* cfg.mk: Exempt src/longlong.h from several tests.
Exempt run.sh from the test-list-consistency test.
Exempt make-prime-list.c from numerous tests, since we won't
be making it conform: it must not link with libcoreutils.a.
Exempt factor-ng.c from the no-upper-case error message test.
* AUTHORS (factor): Add Torbjörn and Niels.
* tests/local.mk (factor_tests): Encode the 37 tests.
($(factor_tests)): Rule to generate a test script for each test.
* tests/factor/run.sh: New script, marked as very expensive.
* .gitignore: Ignore new generated files.
* src/local.mk (src/primes.h): New rule.
(noinst_PROGRAMS): Add make-prime-list.
(noinst_HEADERS): Add longlong.h.
Remove all wheel-related rules and files.
* src/wheel-gen.pl: Remove file.
maint: mark set-but-not-used variables with ATTRIBUTE_UNUSED
* src/factor-ng.c (redcify, prime_p, isqrt2): Mark them, so we
don't have to disable -Wunused-but-set-variable.
maint: use __builtin_expect only if __GNUC__
* src/factor-ng.c (LIKELY, UNLIKELY) [__GNUC__]: Add #ifdef guard.
build: avoid warning about unused macro
* src/factor-ng.c (__GMP_DECLSPEC): Don't define here
* src/longlong.h (__GMP_DECLSPEC): Define if not already defined.
Diffstat (limited to 'src')
-rw-r--r-- | src/.gitignore | 2 | ||||
-rw-r--r-- | src/factor-ng.c | 2396 | ||||
-rw-r--r-- | src/factor.c | 2463 | ||||
-rw-r--r-- | src/local.mk | 31 | ||||
-rw-r--r-- | src/longlong.h | 6 | ||||
-rwxr-xr-x | src/wheel-gen.pl | 114 |
6 files changed, 2190 insertions, 2822 deletions
diff --git a/src/.gitignore b/src/.gitignore index 1c1edbbe1..18cccc1d7 100644 --- a/src/.gitignore +++ b/src/.gitignore @@ -109,8 +109,6 @@ uptime users vdir wc -wheel-size.h -wheel.h who whoami yes diff --git a/src/factor-ng.c b/src/factor-ng.c deleted file mode 100644 index 36be5e6fe..000000000 --- a/src/factor-ng.c +++ /dev/null @@ -1,2396 +0,0 @@ -/* Factoring of uintmax_t numbers. - - Contributed to the GNU project by Torbjörn Granlund and Niels Möller - Contains code from GNU MP. - -Copyright 1995, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2005, 2009, 2012 -Free Software Foundation, Inc. - -This program is free software; you can redistribute it and/or modify it under -the terms of the GNU General Public License as published by the Free Software -Foundation; either version 3 of the License, or (at your option) any later -version. - -This program is distributed in the hope that it will be useful, but WITHOUT ANY -WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A -PARTICULAR PURPOSE. See the GNU General Public License for more details. - -You should have received a copy of the GNU General Public License along with -this program. If not, see http://www.gnu.org/licenses/. */ - -/* Efficiently factor numbers that fit in one or two words (word = uintmax_t), - or, with GMP, numbers of any size. - - Code organisation: - - There are several variants of many functions, for handling one word, two - words, and GMP's mpz_t type. If the one-word variant is called foo, the - two-word variant will be foo2, and the one for mpz_t will be mp_foo. In - some cases, the plain function variants will handle both one-word and - two-word numbers, evidenced by function arguments. - - The factoring code for two words will fall into the code for one word when - progress allows that. - - Using GMP is optional. Define HAVE_GMP to make this code include GMP - factoring code. The GMP factoring code is based on GMP's demos/factorize.c - (last synched 2012-09-07). The GMP-based factoring code will stay in GMP - factoring code even if numbers get small enough for using the two-word - code. - - Algorithm: - - (1) Perform trial division using a small primes table, but without hardware - division since the primes table store inverses modulo the word base. - (The GMP variant of this code doesn't make use of the precomputed - inverses, but instead relies on GMP for fast divisibility testing.) - (2) Check the nature of any non-factored part using Miller-Rabin for - detecting composites, and Lucas for detecting primes. - (3) Factor any remaining composite part using the Pollard-Brent rho - algorithm or the SQUFOF algorithm, checking status of found factors - again using Miller-Rabin and Lucas. - - We prefer using Hensel norm in the divisions, not the more familiar - Euclidian norm, since the former leads to much faster code. In the - Pollard-Brent rho code and the the prime testing code, we use Montgomery's - trick of multiplying all n-residues by the word base, allowing cheap Hensel - reductions mod n. - - Improvements: - - * Use modular inverses also for exact division in the Lucas code, and - elsewhere. A problem is to locate the inverses not from an index, but - from a prime. We might instead compute the inverse on-the-fly. - - * Tune trial division table size (not forgetting that this is a standalone - program where the table will be read from disk for each invocation). - - * Implement less naive powm, using k-ary exponentiation for k = 3 or - perhaps k = 4. - - * Try to speed trial division code for single uintmax_t numbers, i.e., the - code using DIVBLOCK. It currently runs at 2 cycles per prime (Intel SBR, - IBR), 3 cycles per prime (AMD Stars) and 5 cycles per prime (AMD BD) when - using gcc 4.6 and 4.7. Some software pipelining should help; 1, 2, and 4 - respectively cycles ought to be possible. - - * The redcify function could be vastly improved by using (plain Euclidian) - pre-inversion (such as GMP's invert_limb) and udiv_qrnnd_preinv (from - GMP's gmp-impl.h). The redcify2 function could be vastly improved using - similar methoods. These functions currently dominate run time when using - the -w option. -*/ - -#include <config.h> - -#include <stdlib.h> -#include <stdio.h> -#include <inttypes.h> -#include <errno.h> -#include <assert.h> -#include <ctype.h> -#include <string.h> /* for memmove */ -#include <unistd.h> /* for getopt */ -#include <stdbool.h> - -#include "system.h" - -#if HAVE_GMP -# include <gmp.h> -#endif - -#ifndef STAT_SQUFOF -# define STAT_SQUFOF 0 -#endif - -#ifndef USE_LONGLONG_H -# define USE_LONGLONG_H 1 -#endif - -#if USE_LONGLONG_H - -/* Make definitions for longlong.h to make it do what it can do for us */ -# define W_TYPE_SIZE 64 /* bitcount for uintmax_t */ -# define UWtype uintmax_t -# define UHWtype unsigned long int -# undef UDWtype -# if HAVE_ATTRIBUTE_MODE -typedef unsigned int UQItype __attribute__ ((mode (QI))); -typedef int SItype __attribute__ ((mode (SI))); -typedef unsigned int USItype __attribute__ ((mode (SI))); -typedef int DItype __attribute__ ((mode (DI))); -typedef unsigned int UDItype __attribute__ ((mode (DI))); -# else -typedef unsigned char UQItype; -typedef long SItype; -typedef unsigned long int USItype; -# if HAVE_LONG_LONG -typedef long long int DItype; -typedef unsigned long long int UDItype; -# else /* Assume `long' gives us a wide enough type. Needed for hppa2.0w. */ -typedef long int DItype; -typedef unsigned long int UDItype; -# endif -# endif -# define LONGLONG_STANDALONE /* Don't require GMP's longlong.h mdep files */ -# define ASSERT(x) /* FIXME make longlong.h really standalone */ -# define __GMP_DECLSPEC /* FIXME make longlong.h really standalone */ -# define __clz_tab factor_clz_tab /* Rename to avoid glibc collision */ -# ifndef __GMP_GNUC_PREREQ -# define __GMP_GNUC_PREREQ(a,b) 1 -# endif -# if _ARCH_PPC -# define HAVE_HOST_CPU_FAMILY_powerpc 1 -# endif -# include "longlong.h" -# ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB -const unsigned char factor_clz_tab[129] = -{ - 1,2,3,3,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6, - 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, - 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, - 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, - 9 -}; -# endif - -#else /* not USE_LONGLONG_H */ - -# define W_TYPE_SIZE (8 * sizeof (uintmax_t)) -# define __ll_B ((uintmax_t) 1 << (W_TYPE_SIZE / 2)) -# define __ll_lowpart(t) ((uintmax_t) (t) & (__ll_B - 1)) -# define __ll_highpart(t) ((uintmax_t) (t) >> (W_TYPE_SIZE / 2)) - -#endif - -enum alg_type { ALG_POLLARD_RHO = 1, ALG_SQUFOF = 2 }; - -static enum alg_type alg; - -/* 2*3*5*7*11...*101 is 128 bits, and has 26 prime factors */ -#define MAX_NFACTS 26 - -struct factors -{ - uintmax_t plarge[2]; /* Can have a single large factor */ - uintmax_t p[MAX_NFACTS]; - unsigned char e[MAX_NFACTS]; - unsigned char nfactors; -}; - -#if HAVE_GMP -struct mp_factors -{ - mpz_t *p; - unsigned long int *e; - unsigned long int nfactors; -}; -#endif - -static void factor (uintmax_t, uintmax_t, struct factors *); - -#ifndef umul_ppmm -# define umul_ppmm(w1, w0, u, v) \ - do { \ - uintmax_t __x0, __x1, __x2, __x3; \ - unsigned long int __ul, __vl, __uh, __vh; \ - uintmax_t __u = (u), __v = (v); \ - \ - __ul = __ll_lowpart (__u); \ - __uh = __ll_highpart (__u); \ - __vl = __ll_lowpart (__v); \ - __vh = __ll_highpart (__v); \ - \ - __x0 = (uintmax_t) __ul * __vl; \ - __x1 = (uintmax_t) __ul * __vh; \ - __x2 = (uintmax_t) __uh * __vl; \ - __x3 = (uintmax_t) __uh * __vh; \ - \ - __x1 += __ll_highpart (__x0);/* this can't give carry */ \ - __x1 += __x2; /* but this indeed can */ \ - if (__x1 < __x2) /* did we get it? */ \ - __x3 += __ll_B; /* yes, add it in the proper pos. */ \ - \ - (w1) = __x3 + __ll_highpart (__x1); \ - (w0) = (__x1 << W_TYPE_SIZE / 2) + __ll_lowpart (__x0); \ - } while (0) -#endif - -#if !defined(udiv_qrnnd) || defined(UDIV_NEEDS_NORMALIZATION) -/* Define our own, not needing normalization. This function is - currently not performance critical, so keep it simple. Similar to - the mod macro below. */ -# undef udiv_qrnnd -# define udiv_qrnnd(q, r, n1, n0, d) \ - do { \ - uintmax_t __d1, __d0, __q, __r1, __r0; \ - \ - assert ((n1) < (d)); \ - __d1 = (d); __d0 = 0; \ - __r1 = (n1); __r0 = (n0); \ - __q = 0; \ - for (unsigned int __i = W_TYPE_SIZE; __i > 0; __i--) \ - { \ - rsh2 (__d1, __d0, __d1, __d0, 1); \ - __q <<= 1; \ - if (ge2 (__r1, __r0, __d1, __d0)) \ - { \ - __q++; \ - sub_ddmmss (__r1, __r0, __r1, __r0, __d1, __d0); \ - } \ - } \ - (r) = __r0; \ - (q) = __q; \ - } while (0) -#endif - -#if !defined (add_ssaaaa) -# define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - do { \ - uintmax_t _add_x; \ - _add_x = (al) + (bl); \ - (sh) = (ah) + (bh) + (_add_x < (al)); \ - (sl) = _add_x; \ - } while (0) -#endif - -#define rsh2(rh, rl, ah, al, cnt) \ - do { \ - (rl) = ((ah) << (W_TYPE_SIZE - (cnt))) | ((al) >> (cnt)); \ - (rh) = (ah) >> (cnt); \ - } while (0) - -#define lsh2(rh, rl, ah, al, cnt) \ - do { \ - (rh) = ((ah) << cnt) | ((al) >> (W_TYPE_SIZE - (cnt))); \ - (rl) = (al) << (cnt); \ - } while (0) - -#define ge2(ah, al, bh, bl) \ - ((ah) > (bh) || ((ah) == (bh) && (al) >= (bl))) - -#define gt2(ah, al, bh, bl) \ - ((ah) > (bh) || ((ah) == (bh) && (al) > (bl))) - -#ifndef sub_ddmmss -# define sub_ddmmss(rh, rl, ah, al, bh, bl) \ - do { \ - uintmax_t _cy; \ - _cy = (al) < (bl); \ - (rl) = (al) - (bl); \ - (rh) = (ah) - (bh) - _cy; \ - } while (0) -#endif - -#ifndef count_leading_zeros -# define count_leading_zeros(count, x) do { \ - uintmax_t __clz_x = (x); \ - unsigned int __clz_c; \ - for (__clz_c = 0; \ - (__clz_x & ((uintmax_t) 0xff << (W_TYPE_SIZE - 8))) == 0; \ - __clz_c += 8) \ - __clz_x <<= 8; \ - for (; (intmax_t)__clz_x >= 0; __clz_c++) \ - __clz_x <<= 1; \ - (count) = __clz_c; \ - } while (0) -#endif - -#ifndef count_trailing_zeros -# define count_trailing_zeros(count, x) do { \ - uintmax_t __ctz_x = (x); \ - unsigned int __ctz_c = 0; \ - while ((__ctz_x & 1) == 0) \ - { \ - __ctz_x >>= 1; \ - __ctz_c++; \ - } \ - (count) = __ctz_c; \ - } while (0) -#endif - -/* Requires that a < n and b <= n */ -#define submod(r,a,b,n) \ - do { \ - uintmax_t _t = - (uintmax_t) (a < b); \ - (r) = ((n) & _t) + (a) - (b); \ - } while (0) - -#define addmod(r,a,b,n) \ - submod((r), (a), ((n) - (b)), (n)) - -/* Modular two-word addition and subtraction. For performance reasons, the - most significant bit of n1 must be clear. The destination variables must be - distinct from the mod operand. */ -#define addmod2(r1, r0, a1, a0, b1, b0, n1, n0) \ - do { \ - add_ssaaaa ((r1), (r0), (a1), (a0), (b1), (b0)); \ - if (ge2 ((r1), (r0), (n1), (n0))) \ - sub_ddmmss ((r1), (r0), (r1), (r0), (n1), (n0)); \ - } while (0) -#define submod2(r1, r0, a1, a0, b1, b0, n1, n0) \ - do { \ - sub_ddmmss ((r1), (r0), (a1), (a0), (b1), (b0)); \ - if ((intmax_t) (r1) < 0) \ - add_ssaaaa ((r1), (r0), (r1), (r0), (n1), (n0)); \ - } while (0) - -#define HIGHBIT_TO_MASK(x) \ - (((intmax_t)-1 >> 1) < 0 \ - ? (uintmax_t)((intmax_t)(x) >> (W_TYPE_SIZE - 1)) \ - : ((x) & ((uintmax_t) 1 << (W_TYPE_SIZE - 1)) \ - ? UINTMAX_MAX : (uintmax_t) 0)) - -/* Compute r = a mod d, where r = <*t1,retval>, a = <a1,a0>, d = <d1,d0>. - Requires that d1 != 0. */ -static uintmax_t -mod2 (uintmax_t *r1, uintmax_t a1, uintmax_t a0, uintmax_t d1, uintmax_t d0) -{ - int cntd, cnta; - - assert (d1 != 0); - - if (a1 == 0) - { - *r1 = 0; - return a0; - } - - count_leading_zeros (cntd, d1); - count_leading_zeros (cnta, a1); - int cnt = cntd - cnta; - lsh2 (d1, d0, d1, d0, cnt); - for (int i = 0; i < cnt; i++) - { - if (ge2 (a1, a0, d1, d0)) - sub_ddmmss (a1, a0, a1, a0, d1, d0); - rsh2 (d1, d0, d1, d0, 1); - } - - *r1 = a1; - return a0; -} - -static uintmax_t _GL_ATTRIBUTE_CONST -gcd_odd (uintmax_t a, uintmax_t b) -{ - if ( (b & 1) == 0) - { - uintmax_t t = b; - b = a; - a = t; - } - if (a == 0) - return b; - - /* Take out least significant one bit, to make room for sign */ - b >>= 1; - - for (;;) - { - uintmax_t t; - uintmax_t bgta; - - while ((a & 1) == 0) - a >>= 1; - a >>= 1; - - t = a - b; - if (t == 0) - return (a << 1) + 1; - - bgta = HIGHBIT_TO_MASK (t); - - /* b <-- min (a, b) */ - b += (bgta & t); - - /* a <-- |a - b| */ - a = (t ^ bgta) - bgta; - } -} - -static uintmax_t -gcd2_odd (uintmax_t *r1, uintmax_t a1, uintmax_t a0, uintmax_t b1, uintmax_t b0) -{ - while ((a0 & 1) == 0) - rsh2 (a1, a0, a1, a0, 1); - while ((b0 & 1) == 0) - rsh2 (b1, b0, b1, b0, 1); - - for (;;) - { - if ((b1 | a1) == 0) - { - *r1 = 0; - return gcd_odd (b0, a0); - } - - if (gt2 (a1, a0, b1, b0)) - { - sub_ddmmss (a1, a0, a1, a0, b1, b0); - do - rsh2 (a1, a0, a1, a0, 1); - while ((a0 & 1) == 0); - } - else if (gt2 (b1, b0, a1, a0)) - { - sub_ddmmss (b1, b0, b1, b0, a1, a0); - do - rsh2 (b1, b0, b1, b0, 1); - while ((b0 & 1) == 0); - } - else - break; - } - - *r1 = a1; - return a0; -} - -static void -factor_insert_multiplicity (struct factors *factors, - uintmax_t prime, unsigned int m) -{ - unsigned int nfactors = factors->nfactors; - uintmax_t *p = factors->p; - unsigned char *e = factors->e; - - /* Locate position for insert new or increment e. */ - int i; - for (i = nfactors - 1; i >= 0; i--) - { - if (p[i] <= prime) - break; - } - - if (i < 0 || p[i] != prime) - { - for (int j = nfactors - 1; j > i; j--) - { - p[j + 1] = p[j]; - e[j + 1] = e[j]; - } - p[i + 1] = prime; - e[i + 1] = m; - factors->nfactors = nfactors + 1; - } - else - { - e[i] += m; - } -} - -#define factor_insert(f, p) factor_insert_multiplicity(f, p, 1) - -static void -factor_insert_large (struct factors *factors, - uintmax_t p1, uintmax_t p0) -{ - if (p1 > 0) - { - assert (factors->plarge[1] == 0); - factors->plarge[0] = p0; - factors->plarge[1] = p1; - } - else - factor_insert (factors, p0); -} - -#if HAVE_GMP -static void mp_factor (mpz_t, struct mp_factors *); - -static void -mp_factor_init (struct mp_factors *factors) -{ - factors->p = malloc (1); - factors->e = malloc (1); - factors->nfactors = 0; -} - -static void -mp_factor_clear (struct mp_factors *factors) -{ - for (unsigned int i = 0; i < factors->nfactors; i++) - mpz_clear (factors->p[i]); - - free (factors->p); - free (factors->e); -} - -static void -mp_factor_insert (struct mp_factors *factors, mpz_t prime) -{ - unsigned long int nfactors = factors->nfactors; - mpz_t *p = factors->p; - unsigned long int *e = factors->e; - long i; - - /* Locate position for insert new or increment e. */ - for (i = nfactors - 1; i >= 0; i--) - { - if (mpz_cmp (p[i], prime) <= 0) - break; - } - - if (i < 0 || mpz_cmp (p[i], prime) != 0) - { - p = realloc (p, (nfactors + 1) * sizeof p[0]); - e = realloc (e, (nfactors + 1) * sizeof e[0]); - - mpz_init (p[nfactors]); - for (long j = nfactors - 1; j > i; j--) - { - mpz_set (p[j + 1], p[j]); - e[j + 1] = e[j]; - } - mpz_set (p[i + 1], prime); - e[i + 1] = 1; - - factors->p = p; - factors->e = e; - factors->nfactors = nfactors + 1; - } - else - { - e[i] += 1; - } -} - -static void -mp_factor_insert_ui (struct mp_factors *factors, unsigned long int prime) -{ - mpz_t pz; - - mpz_init_set_ui (pz, prime); - mp_factor_insert (factors, pz); - mpz_clear (pz); -} -#endif /* HAVE_GMP */ - - -#define P(a,b,c,d) a, -static const unsigned char primes_diff[] = { -#include "primes.h" -0,0,0,0,0,0,0 /* 7 sentinels for 8-way loop */ -}; -#undef P - -#define PRIMES_PTAB_ENTRIES (sizeof(primes_diff) / sizeof(primes_diff[0]) - 8 + 1) - -#define P(a,b,c,d) b, -static const unsigned char primes_diff8[] = { -#include "primes.h" -0,0,0,0,0,0,0 /* 7 sentinels for 8-way loop */ -}; -#undef P - -struct primes_dtab -{ - uintmax_t binv, lim; -}; - -#define P(a,b,c,d) {c,d}, -static const struct primes_dtab primes_dtab[] = { -#include "primes.h" -{1,0},{1,0},{1,0},{1,0},{1,0},{1,0},{1,0} /* 7 sentinels for 8-way loop */ -}; -#undef P - -/* This flag is honoured just in the GMP code. */ -static int flag_verbose = 0; - -/* Prove primality or run probabilistic tests. */ -static int flag_prove_primality = 1; - -/* Number of Miller-Rabin tests to run when not proving primality. */ -#define MR_REPS 25 - -#define LIKELY(cond) __builtin_expect ((cond) != 0, 1) -#define UNLIKELY(cond) __builtin_expect ((cond) != 0, 0) - -static void -factor_insert_refind (struct factors *factors, uintmax_t p, unsigned int i, - unsigned int off) -{ - for (unsigned int j = 0; j < off; j++) - p += primes_diff[i + j]; - factor_insert (factors, p); -} - -/* Trial division with odd primes uses the following trick. - - Let p be an odd prime, and B = 2^{W_TYPE_SIZE}. For simplicity, - consider the case t < B (this is the second loop below). - - From our tables we get - - binv = p^{-1} (mod B) - lim = floor ( (B-1) / p ). - - First assume that t is a multiple of p, t = q * p. Then 0 <= q <= - lim (and all quotients in this range occur for some t). - - Then t = q * p is true also (mod B), and p is invertible we get - - q = t * binv (mod B). - - Next, assume that t is *not* divisible by p. Since multiplication - by binv (mod B) is a one-to-one mapping, - - t * binv (mod B) > lim, - - because all the smaller values are already taken. - - This can be summed up by saying that the function - - q(t) = binv * t (mod B) - - is a permutation of the range 0 <= t < B, with the curious property - that it maps the multiples of p onto the range 0 <= q <= lim, in - order, and the non-multiples of p onto the range lim < q < B. - */ - -static uintmax_t -factor_using_division (uintmax_t *t1p, uintmax_t t1, uintmax_t t0, - struct factors *factors) -{ - if (t0 % 2 == 0) - { - unsigned int cnt; - - if (t0 == 0) - { - count_trailing_zeros (cnt, t1); - t0 = t1 >> cnt; - t1 = 0; - cnt += W_TYPE_SIZE; - } - else - { - count_trailing_zeros (cnt, t0); - rsh2 (t1, t0, t1, t0, cnt); - } - - factor_insert_multiplicity (factors, 2, cnt); - } - - uintmax_t p = 3; - unsigned int i; - for (i = 0; t1 > 0 && i < PRIMES_PTAB_ENTRIES; i++) - { - for (;;) - { - uintmax_t q1, q0, hi, lo; - - q0 = t0 * primes_dtab[i].binv; - umul_ppmm (hi, lo, q0, p); - if (hi > t1) - break; - hi = t1 - hi; - q1 = hi * primes_dtab[i].binv; - if (LIKELY (q1 > primes_dtab[i].lim)) - break; - t1 = q1; t0 = q0; - factor_insert (factors, p); - } - p += primes_diff[i + 1]; - } - if (t1p) - *t1p = t1; - -#define DIVBLOCK(I) \ - do { \ - for (;;) \ - { \ - q = t0 * pd[I].binv; \ - if (LIKELY (q > pd[I].lim)) \ - break; \ - t0 = q; \ - factor_insert_refind (factors, p, i + 1, I); \ - } \ - } while (0) - - for (; i < PRIMES_PTAB_ENTRIES; i += 8) - { - uintmax_t q; - const struct primes_dtab *pd = &primes_dtab[i]; - DIVBLOCK (0); - DIVBLOCK (1); - DIVBLOCK (2); - DIVBLOCK (3); - DIVBLOCK (4); - DIVBLOCK (5); - DIVBLOCK (6); - DIVBLOCK (7); - - p += primes_diff8[i]; - if (p * p > t0) - break; - } - - return t0; -} - -#if HAVE_GMP -static void -mp_factor_using_division (mpz_t t, struct mp_factors *factors) -{ - mpz_t q; - unsigned long int p; - - if (flag_verbose > 0) - { - printf ("[trial division] "); - } - - mpz_init (q); - - p = mpz_scan1 (t, 0); - mpz_div_2exp (t, t, p); - while (p) - { - mp_factor_insert_ui (factors, 2); - --p; - } - - p = 3; - for (unsigned int i = 1; i <= PRIMES_PTAB_ENTRIES;) - { - if (! mpz_divisible_ui_p (t, p)) - { - p += primes_diff[i++]; - if (mpz_cmp_ui (t, p * p) < 0) - break; - } - else - { - mpz_tdiv_q_ui (t, t, p); - mp_factor_insert_ui (factors, p); - } - } - - mpz_clear (q); -} -#endif - -/* Entry i contains (2i+1)^(-1) mod 2^8. */ -static const unsigned char binvert_table[128] = -{ - 0x01, 0xAB, 0xCD, 0xB7, 0x39, 0xA3, 0xC5, 0xEF, - 0xF1, 0x1B, 0x3D, 0xA7, 0x29, 0x13, 0x35, 0xDF, - 0xE1, 0x8B, 0xAD, 0x97, 0x19, 0x83, 0xA5, 0xCF, - 0xD1, 0xFB, 0x1D, 0x87, 0x09, 0xF3, 0x15, 0xBF, - 0xC1, 0x6B, 0x8D, 0x77, 0xF9, 0x63, 0x85, 0xAF, - 0xB1, 0xDB, 0xFD, 0x67, 0xE9, 0xD3, 0xF5, 0x9F, - 0xA1, 0x4B, 0x6D, 0x57, 0xD9, 0x43, 0x65, 0x8F, - 0x91, 0xBB, 0xDD, 0x47, 0xC9, 0xB3, 0xD5, 0x7F, - 0x81, 0x2B, 0x4D, 0x37, 0xB9, 0x23, 0x45, 0x6F, - 0x71, 0x9B, 0xBD, 0x27, 0xA9, 0x93, 0xB5, 0x5F, - 0x61, 0x0B, 0x2D, 0x17, 0x99, 0x03, 0x25, 0x4F, - 0x51, 0x7B, 0x9D, 0x07, 0x89, 0x73, 0x95, 0x3F, - 0x41, 0xEB, 0x0D, 0xF7, 0x79, 0xE3, 0x05, 0x2F, - 0x31, 0x5B, 0x7D, 0xE7, 0x69, 0x53, 0x75, 0x1F, - 0x21, 0xCB, 0xED, 0xD7, 0x59, 0xC3, 0xE5, 0x0F, - 0x11, 0x3B, 0x5D, 0xC7, 0x49, 0x33, 0x55, 0xFF -}; - -/* Compute n^(-1) mod B, using a Newton iteration. */ -#define binv(inv,n) \ - do { \ - uintmax_t __n = (n); \ - uintmax_t __inv; \ - \ - __inv = binvert_table[(__n / 2) & 0x7F]; /* 8 */ \ - if (W_TYPE_SIZE > 8) __inv = 2 * __inv - __inv * __inv * __n; \ - if (W_TYPE_SIZE > 16) __inv = 2 * __inv - __inv * __inv * __n; \ - if (W_TYPE_SIZE > 32) __inv = 2 * __inv - __inv * __inv * __n; \ - \ - if (W_TYPE_SIZE > 64) \ - { \ - int __invbits = 64; \ - do { \ - __inv = 2 * __inv - __inv * __inv * __n; \ - __invbits *= 2; \ - } while (__invbits < W_TYPE_SIZE); \ - } \ - \ - (inv) = __inv; \ - } while (0) - -/* q = u / d, assuming d|u. */ -#define divexact_21(q1, q0, u1, u0, d) \ - do { \ - uintmax_t _di, _q0; \ - binv (_di, (d)); \ - _q0 = (u0) * _di; \ - if ((u1) >= (d)) \ - { \ - uintmax_t _p1, _p0; \ - umul_ppmm (_p1, _p0, _q0, d); \ - (q1) = ((u1) - _p1) * _di; \ - (q0) = _q0; \ - } \ - else \ - { \ - (q0) = _q0; \ - (q1) = 0; \ - } \ - } while(0) - -/* x B (mod n). */ -#define redcify(r_prim, r, n) \ - do { \ - uintmax_t _redcify_q; \ - udiv_qrnnd (_redcify_q, r_prim, r, 0, n); \ - } while (0) - -/* x B^2 (mod n). Requires x > 0, n1 < B/2 */ -#define redcify2(r1, r0, x, n1, n0) \ - do { \ - uintmax_t _r1, _r0, _i; \ - if ((x) < (n1)) \ - { \ - _r1 = (x); _r0 = 0; \ - _i = W_TYPE_SIZE; \ - } \ - else \ - { \ - _r1 = 0; _r0 = (x); \ - _i = 2*W_TYPE_SIZE; \ - } \ - while (_i-- > 0) \ - { \ - lsh2 (_r1, _r0, _r1, _r0, 1); \ - if (ge2 (_r1, _r0, (n1), (n0))) \ - sub_ddmmss (_r1, _r0, _r1, _r0, (n1), (n0)); \ - } \ - (r1) = _r1; \ - (r0) = _r0; \ - } while (0) - -/* Modular two-word multiplication, r = a * b mod m, with mi = m^(-1) mod B. - Both a and b must be in redc form, the result will be in redc form too. */ -static inline uintmax_t -mulredc (uintmax_t a, uintmax_t b, uintmax_t m, uintmax_t mi) -{ - uintmax_t rh, rl, q, th, tl, xh; - - umul_ppmm (rh, rl, a, b); - q = rl * mi; - umul_ppmm (th, tl, q, m); - xh = rh - th; - if (rh < th) - xh += m; - - return xh; -} - -/* Modular two-word multiplication, r = a * b mod m, with mi = m^(-1) mod B. - Both a and b must be in redc form, the result will be in redc form too. - For performance reasons, the most significant bit of m must be clear. */ -static uintmax_t -mulredc2 (uintmax_t *r1p, - uintmax_t a1, uintmax_t a0, uintmax_t b1, uintmax_t b0, - uintmax_t m1, uintmax_t m0, uintmax_t mi) -{ - uintmax_t r1, r0, q, p1, p0, t1, t0, s1, s0; - mi = -mi; - assert ( (a1 >> (W_TYPE_SIZE - 1)) == 0); - assert ( (b1 >> (W_TYPE_SIZE - 1)) == 0); - assert ( (m1 >> (W_TYPE_SIZE - 1)) == 0); - - /* First compute a0 * <b1, b0> B^{-1} - +-----+ - |a0 b0| - +--+--+--+ - |a0 b1| - +--+--+--+ - |q0 m0| - +--+--+--+ - |q0 m1| - -+--+--+--+ - |r1|r0| 0| - +--+--+--+ - */ - umul_ppmm (t1, t0, a0, b0); - umul_ppmm (r1, r0, a0, b1); - q = mi * t0; - umul_ppmm (p1, p0, q, m0); - umul_ppmm (s1, s0, q, m1); - r0 += (t0 != 0); /* Carry */ - add_ssaaaa (r1, r0, r1, r0, 0, p1); - add_ssaaaa (r1, r0, r1, r0, 0, t1); - add_ssaaaa (r1, r0, r1, r0, s1, s0); - - /* Next, (a1 * <b1, b0> + <r1, r0> B^{-1} - +-----+ - |a1 b0| - +--+--+ - |r1|r0| - +--+--+--+ - |a1 b1| - +--+--+--+ - |q1 m0| - +--+--+--+ - |q1 m1| - -+--+--+--+ - |r1|r0| 0| - +--+--+--+ - */ - umul_ppmm (t1, t0, a1, b0); - umul_ppmm (s1, s0, a1, b1); - add_ssaaaa (t1, t0, t1, t0, 0, r0); - q = mi * t0; - add_ssaaaa (r1, r0, s1, s0, 0, r1); - umul_ppmm (p1, p0, q, m0); - umul_ppmm (s1, s0, q, m1); - r0 += (t0 != 0); /* Carry */ - add_ssaaaa (r1, r0, r1, r0, 0, p1); - add_ssaaaa (r1, r0, r1, r0, 0, t1); - add_ssaaaa (r1, r0, r1, r0, s1, s0); - - if (ge2 (r1, r0, m1, m0)) - sub_ddmmss (r1, r0, r1, r0, m1, m0); - - *r1p = r1; - return r0; -} - -static uintmax_t _GL_ATTRIBUTE_CONST -powm (uintmax_t b, uintmax_t e, uintmax_t n, uintmax_t ni, uintmax_t one) -{ - uintmax_t y = one; - - if (e & 1) - y = b; - - while (e != 0) - { - b = mulredc (b, b, n, ni); - e >>= 1; - - if (e & 1) - y = mulredc (y, b, n, ni); - } - - return y; -} - -static uintmax_t -powm2 (uintmax_t *r1m, - const uintmax_t *bp, const uintmax_t *ep, const uintmax_t *np, - uintmax_t ni, const uintmax_t *one) -{ - uintmax_t r1, r0, b1, b0, n1, n0; - unsigned int i; - uintmax_t e; - - b0 = bp[0]; - b1 = bp[1]; - n0 = np[0]; - n1 = np[1]; - - r0 = one[0]; - r1 = one[1]; - - for (e = ep[0], i = W_TYPE_SIZE; i > 0; i--, e >>= 1) - { - if (e & 1) - { - r0 = mulredc2 (r1m, r1, r0, b1, b0, n1, n0, ni); - r1 = *r1m; - } - b0 = mulredc2 (r1m, b1, b0, b1, b0, n1, n0, ni); - b1 = *r1m; - } - for (e = ep[1]; e > 0; e >>= 1) - { - if (e & 1) - { - r0 = mulredc2 (r1m, r1, r0, b1, b0, n1, n0, ni); - r1 = *r1m; - } - b0 = mulredc2 (r1m, b1, b0, b1, b0, n1, n0, ni); - b1 = *r1m; - } - *r1m = r1; - return r0; -} - -static bool _GL_ATTRIBUTE_CONST -millerrabin (uintmax_t n, uintmax_t ni, uintmax_t b, uintmax_t q, - unsigned int k, uintmax_t one) -{ - uintmax_t y = powm (b, q, n, ni, one); - - uintmax_t nm1 = n - one; /* -1, but in redc representation. */ - - if (y == one || y == nm1) - return true; - - for (unsigned int i = 1; i < k; i++) - { - y = mulredc (y, y, n, ni); - - if (y == nm1) - return true; - if (y == one) - return false; - } - return false; -} - -static bool -millerrabin2 (const uintmax_t *np, uintmax_t ni, const uintmax_t *bp, - const uintmax_t *qp, unsigned int k, const uintmax_t *one) -{ - uintmax_t y1, y0, nm1_1, nm1_0, r1m; - - y0 = powm2 (&r1m, bp, qp, np, ni, one); - y1 = r1m; - - if (y0 == one[0] && y1 == one[1]) - return true; - - sub_ddmmss (nm1_1, nm1_0, np[1], np[0], one[1], one[0]); - - if (y0 == nm1_0 && y1 == nm1_1) - return true; - - for (unsigned int i = 1; i < k; i++) - { - y0 = mulredc2 (&r1m, y1, y0, y1, y0, np[1], np[0], ni); - y1 = r1m; - - if (y0 == nm1_0 && y1 == nm1_1) - return true; - if (y0 == one[0] && y1 == one[1]) - return false; - } - return false; -} - -#if HAVE_GMP -static bool -mp_millerrabin (mpz_srcptr n, mpz_srcptr nm1, mpz_ptr x, mpz_ptr y, - mpz_srcptr q, unsigned long int k) -{ - mpz_powm (y, x, q, n); - - if (mpz_cmp_ui (y, 1) == 0 || mpz_cmp (y, nm1) == 0) - return true; - - for (unsigned long int i = 1; i < k; i++) - { - mpz_powm_ui (y, y, 2, n); - if (mpz_cmp (y, nm1) == 0) - return true; - if (mpz_cmp_ui (y, 1) == 0) - return false; - } - return false; -} -#endif - -/* Lucas' prime test. The number of iterations vary greatly, up to a few dozen - have been observed. The average seem to be about 2. */ -static bool -prime_p (uintmax_t n) -{ - int k; - bool is_prime; - uintmax_t a_prim, one, ni; - struct factors factors; - - if (n <= 1) - return false; - - /* We have already casted out small primes. */ - if (n < (uintmax_t) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME) - return true; - - /* Precomputation for Miller-Rabin. */ - uintmax_t q = n - 1; - for (k = 0; (q & 1) == 0; k++) - q >>= 1; - - uintmax_t a = 2; - binv (ni, n); /* ni <- 1/n mod B */ - redcify (one, 1, n); - addmod (a_prim, one, one, n); /* i.e., redcify a = 2 */ - - /* Perform a Miller-Rabin test, finds most composites quickly. */ - if (!millerrabin (n, ni, a_prim, q, k, one)) - return false; - - if (flag_prove_primality) - { - /* Factor n-1 for Lucas. */ - factor (0, n - 1, &factors); - } - - /* Loop until Lucas proves our number prime, or Miller-Rabin proves our - number composite. */ - for (unsigned int r = 0; r < PRIMES_PTAB_ENTRIES; r++) - { - if (flag_prove_primality) - { - is_prime = true; - for (unsigned int i = 0; i < factors.nfactors && is_prime; i++) - { - is_prime = powm (a_prim, (n - 1) / factors.p[i], n, ni, one) != one; - } - } - else - { - /* After enough Miller-Rabin runs, be content. */ - is_prime = (r == MR_REPS - 1); - } - - if (is_prime) - return true; - - a += primes_diff[r]; /* Establish new base. */ - - /* The following is equivalent to redcify (a_prim, a, n). It runs faster - on most processors, since it avoids udiv_qrnnd. If we go down the - udiv_qrnnd_preinv path, this code should be replaced. */ - { - uintmax_t dummy, s1, s0; - umul_ppmm (s1, s0, one, a); - if (LIKELY (s1 == 0)) - a_prim = s0 % n; - else - udiv_qrnnd (dummy, a_prim, s1, s0, n); - } - - if (!millerrabin (n, ni, a_prim, q, k, one)) - return false; - } - - fprintf (stderr, "Lucas prime test failure. This should not happen\n"); - abort (); -} - -static bool -prime2_p (uintmax_t n1, uintmax_t n0) -{ - uintmax_t q[2], nm1[2]; - uintmax_t a_prim[2]; - uintmax_t one[2]; - uintmax_t na[2]; - uintmax_t ni; - unsigned int k; - struct factors factors; - - if (n1 == 0) - return prime_p (n0); - - nm1[1] = n1 - (n0 == 0); - nm1[0] = n0 - 1; - if (nm1[0] == 0) - { - count_trailing_zeros (k, nm1[1]); - - q[0] = nm1[1] >> k; - q[1] = 0; - k += W_TYPE_SIZE; - } - else - { - count_trailing_zeros (k, nm1[0]); - rsh2 (q[1], q[0], nm1[1], nm1[0], k); - } - - uintmax_t a = 2; - binv (ni, n0); - redcify2 (one[1], one[0], 1, n1, n0); - addmod2 (a_prim[1], a_prim[0], one[1], one[0], one[1], one[0], n1, n0); - - /* FIXME: Use scalars or pointers in arguments? Some consistency needed. */ - na[0] = n0; - na[1] = n1; - - if (!millerrabin2 (na, ni, a_prim, q, k, one)) - return false; - - if (flag_prove_primality) - { - /* Factor n-1 for Lucas. */ - factor (nm1[1], nm1[0], &factors); - } - - /* Loop until Lucas proves our number prime, or Miller-Rabin proves our - number composite. */ - for (unsigned int r = 0; r < PRIMES_PTAB_ENTRIES; r++) - { - bool is_prime; - uintmax_t e[2], y[2]; - - if (flag_prove_primality) - { - is_prime = true; - if (factors.plarge[1]) - { - uintmax_t pi; - binv (pi, factors.plarge[0]); - e[0] = pi * nm1[0]; - e[1] = 0; - y[0] = powm2 (&y[1], a_prim, e, na, ni, one); - is_prime = (y[0] != one[0] || y[1] != one[1]); - } - for (unsigned int i = 0; i < factors.nfactors && is_prime; i++) - { - /* FIXME: We always have the factor 2. Do we really need to handle it - here? We have done the same powering as part of millerrabin. */ - if (factors.p[i] == 2) - rsh2 (e[1], e[0], nm1[1], nm1[0], 1); - else - divexact_21 (e[1], e[0], nm1[1], nm1[0], factors.p[i]); - y[0] = powm2 (&y[1], a_prim, e, na, ni, one); - is_prime = (y[0] != one[0] || y[1] != one[1]); - } - } - else - { - /* After enough Miller-Rabin runs, be content. */ - is_prime = (r == MR_REPS - 1); - } - - if (is_prime) - return true; - - a += primes_diff[r]; /* Establish new base. */ - redcify2 (a_prim[1], a_prim[0], a, n1, n0); - - if (!millerrabin2 (na, ni, a_prim, q, k, one)) - return false; - } - - fprintf (stderr, "Lucas prime test failure. This should not happen\n"); - abort (); -} - -#if HAVE_GMP -static bool -mp_prime_p (mpz_t n) -{ - bool is_prime; - mpz_t q, a, nm1, tmp; - struct mp_factors factors; - - if (mpz_cmp_ui (n, 1) <= 0) - return false; - - /* We have already casted out small primes. */ - if (mpz_cmp_ui (n, (long) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME) < 0) - return true; - - mpz_inits (q, a, nm1, tmp, NULL); - - /* Precomputation for Miller-Rabin. */ - mpz_sub_ui (nm1, n, 1); - - /* Find q and k, where q is odd and n = 1 + 2**k * q. */ - unsigned long int k = mpz_scan1 (nm1, 0); - mpz_tdiv_q_2exp (q, nm1, k); - - mpz_set_ui (a, 2); - - /* Perform a Miller-Rabin test, finds most composites quickly. */ - if (!mp_millerrabin (n, nm1, a, tmp, q, k)) - { - is_prime = false; - goto ret2; - } - - if (flag_prove_primality) - { - /* Factor n-1 for Lucas. */ - mpz_set (tmp, nm1); - mp_factor (tmp, &factors); - } - - /* Loop until Lucas proves our number prime, or Miller-Rabin proves our - number composite. */ - for (unsigned int r = 0; r < PRIMES_PTAB_ENTRIES; r++) - { - if (flag_prove_primality) - { - is_prime = true; - for (unsigned long int i = 0; i < factors.nfactors && is_prime; i++) - { - mpz_divexact (tmp, nm1, factors.p[i]); - mpz_powm (tmp, a, tmp, n); - is_prime = mpz_cmp_ui (tmp, 1) != 0; - } - } - else - { - /* After enough Miller-Rabin runs, be content. */ - is_prime = (r == MR_REPS - 1); - } - - if (is_prime) - goto ret1; - - mpz_add_ui (a, a, primes_diff[r]); /* Establish new base. */ - - if (!mp_millerrabin (n, nm1, a, tmp, q, k)) - { - is_prime = false; - goto ret1; - } - } - - fprintf (stderr, "Lucas prime test failure. This should not happen\n"); - abort (); - - ret1: - if (flag_prove_primality) - mp_factor_clear (&factors); - ret2: - mpz_clears (q, a, nm1, tmp, NULL); - - return is_prime; -} -#endif - -static void -factor_using_pollard_rho (uintmax_t n, unsigned long int a, - struct factors *factors) -{ - uintmax_t x, z, y, P, t, ni, g; - - unsigned long int k = 1; - unsigned long int l = 1; - - redcify (P, 1, n); - addmod (x, P, P, n); /* i.e., redcify(2) */ - y = z = x; - - while (n != 1) - { - assert (a < n); - - binv (ni, n); /* FIXME: when could we use old 'ni' value? */ - - for (;;) - { - do - { - x = mulredc (x, x, n, ni); - addmod (x, x, a, n); - - submod (t, z, x, n); - P = mulredc (P, t, n, ni); - - if (k % 32 == 1) - { - if (gcd_odd (P, n) != 1) - goto factor_found; - y = x; - } - } - while (--k != 0); - - z = x; - k = l; - l = 2 * l; - for (unsigned long int i = 0; i < k; i++) - { - x = mulredc (x, x, n, ni); - addmod (x, x, a, n); - } - y = x; - } - - factor_found: - do - { - y = mulredc (y, y, n, ni); - addmod (y, y, a, n); - - submod (t, z, y, n); - g = gcd_odd (t, n); - } - while (g == 1); - - n = n / g; - - if (!prime_p (g)) - factor_using_pollard_rho (g, a + 1, factors); - else - factor_insert (factors, g); - - if (prime_p (n)) - { - factor_insert (factors, n); - break; - } - - x = x % n; - z = z % n; - y = y % n; - } -} - -static void -factor_using_pollard_rho2 (uintmax_t n1, uintmax_t n0, unsigned long int a, - struct factors *factors) -{ - uintmax_t x1, x0, z1, z0, y1, y0, P1, P0, t1, t0, ni, g1, g0, r1m; - - unsigned long int k = 1; - unsigned long int l = 1; - - redcify2 (P1, P0, 1, n1, n0); - addmod2 (x1, x0, P1, P0, P1, P0, n1, n0); /* i.e., redcify(2) */ - y1 = z1 = x1; - y0 = z0 = x0; - - while (n1 != 0 || n0 != 1) - { - binv (ni, n0); - - for (;;) - { - do - { - x0 = mulredc2 (&r1m, x1, x0, x1, x0, n1, n0, ni); - x1 = r1m; - addmod2 (x1, x0, x1, x0, 0, a, n1, n0); - - submod2 (t1, t0, z1, z0, x1, x0, n1, n0); - P0 = mulredc2 (&r1m, P1, P0, t1, t0, n1, n0, ni); - P1 = r1m; - - if (k % 32 == 1) - { - g0 = gcd2_odd (&g1, P1, P0, n1, n0); - if (g1 != 0 || g0 != 1) - goto factor_found; - y1 = x1; y0 = x0; - } - } - while (--k != 0); - - z1 = x1; z0 = x0; - k = l; - l = 2 * l; - for (unsigned long int i = 0; i < k; i++) - { - x0 = mulredc2 (&r1m, x1, x0, x1, x0, n1, n0, ni); - x1 = r1m; - addmod2 (x1, x0, x1, x0, 0, a, n1, n0); - } - y1 = x1; y0 = x0; - } - - factor_found: - do - { - y0 = mulredc2 (&r1m, y1, y0, y1, y0, n1, n0, ni); - y1 = r1m; - addmod2 (y1, y0, y1, y0, 0, a, n1, n0); - - submod2 (t1, t0, z1, z0, y1, y0, n1, n0); - g0 = gcd2_odd (&g1, t1, t0, n1, n0); - } - while (g1 == 0 && g0 == 1); - - if (g1 == 0) - { - /* The found factor is one word. */ - divexact_21 (n1, n0, n1, n0, g0); /* n = n / g */ - - if (!prime_p (g0)) - factor_using_pollard_rho (g0, a + 1, factors); - else - factor_insert (factors, g0); - } - else - { - /* The found factor is two words. This is highly unlikely, thus hard - to trigger. Please be careful before you change this code! */ - uintmax_t ginv; - - binv (ginv, g0); /* Compute n = n / g. Since the result will */ - n0 = ginv * n0; /* fit one word, we can compute the quotient */ - n1 = 0; /* modulo B, ignoring the high divisor word. */ - - if (!prime2_p (g1, g0)) - factor_using_pollard_rho2 (g1, g0, a + 1, factors); - else - factor_insert_large (factors, g1, g0); - } - - if (n1 == 0) - { - if (prime_p (n0)) - { - factor_insert (factors, n0); - break; - } - - factor_using_pollard_rho (n0, a, factors); - return; - } - - if (prime2_p (n1, n0)) - { - factor_insert_large (factors, n1, n0); - break; - } - - x0 = mod2 (&x1, x1, x0, n1, n0); - z0 = mod2 (&z1, z1, z0, n1, n0); - y0 = mod2 (&y1, y1, y0, n1, n0); - } -} - -#if HAVE_GMP -static void -mp_factor_using_pollard_rho (mpz_t n, unsigned long int a, - struct mp_factors *factors) -{ - mpz_t x, z, y, P; - mpz_t t, t2; - unsigned long long int k, l; - - if (flag_verbose > 0) - { - printf ("[pollard-rho (%lu)] ", a); - } - - mpz_inits (t, t2, NULL); - mpz_init_set_si (y, 2); - mpz_init_set_si (x, 2); - mpz_init_set_si (z, 2); - mpz_init_set_ui (P, 1); - k = 1; - l = 1; - - while (mpz_cmp_ui (n, 1) != 0) - { - for (;;) - { - do - { - mpz_mul (t, x, x); - mpz_mod (x, t, n); - mpz_add_ui (x, x, a); - - mpz_sub (t, z, x); - mpz_mul (t2, P, t); - mpz_mod (P, t2, n); - - if (k % 32 == 1) - { - mpz_gcd (t, P, n); - if (mpz_cmp_ui (t, 1) != 0) - goto factor_found; - mpz_set (y, x); - } - } - while (--k != 0); - - mpz_set (z, x); - k = l; - l = 2 * l; - for (unsigned long long int i = 0; i < k; i++) - { - mpz_mul (t, x, x); - mpz_mod (x, t, n); - mpz_add_ui (x, x, a); - } - mpz_set (y, x); - } - - factor_found: - do - { - mpz_mul (t, y, y); - mpz_mod (y, t, n); - mpz_add_ui (y, y, a); - - mpz_sub (t, z, y); - mpz_gcd (t, t, n); - } - while (mpz_cmp_ui (t, 1) == 0); - - mpz_divexact (n, n, t); /* divide by t, before t is overwritten */ - - if (!mp_prime_p (t)) - { - if (flag_verbose > 0) - { - printf ("[composite factor--restarting pollard-rho] "); - } - mp_factor_using_pollard_rho (t, a + 1, factors); - } - else - { - mp_factor_insert (factors, t); - } - - if (mp_prime_p (n)) - { - mp_factor_insert (factors, n); - break; - } - - mpz_mod (x, x, n); - mpz_mod (z, z, n); - mpz_mod (y, y, n); - } - - mpz_clears (P, t2, t, z, x, y, NULL); -} -#endif - -/* FIXME: Maybe better to use an iteration converging to 1/sqrt(n)? If - algorithm is replaced, consider also returning the remainder. */ -static uintmax_t _GL_ATTRIBUTE_CONST -isqrt (uintmax_t n) -{ - uintmax_t x; - unsigned c; - if (n == 0) - return 0; - - count_leading_zeros (c, n); - - /* Make x > sqrt(n). This will be invariant through the loop. */ - x = (uintmax_t) 1 << ((W_TYPE_SIZE + 1 - c) / 2); - - for (;;) - { - uintmax_t y = (x + n/x) / 2; - if (y >= x) - return x; - - x = y; - } -} - -static uintmax_t _GL_ATTRIBUTE_CONST -isqrt2 (uintmax_t nh, uintmax_t nl) -{ - unsigned int shift; - uintmax_t x; - - /* Ensures the remainder fits in an uintmax_t. */ - assert (nh < ((uintmax_t) 1 << (W_TYPE_SIZE - 2))); - - if (nh == 0) - return isqrt (nl); - - count_leading_zeros (shift, nh); - shift &= ~1; - - /* Make x > sqrt(n) */ - x = isqrt ( (nh << shift) + (nl >> (W_TYPE_SIZE - shift))) + 1; - x <<= (W_TYPE_SIZE - shift) / 2; - - /* Do we need more than one iteration? */ - for (;;) - { - uintmax_t q, r, y; - udiv_qrnnd (q, r, nh, nl, x); - y = (x + q) / 2; - - if (y >= x) - { - uintmax_t hi, lo; - umul_ppmm (hi, lo, x + 1, x + 1); - assert (gt2 (hi, lo, nh, nl)); - - umul_ppmm (hi, lo, x, x); - assert (ge2 (nh, nl, hi, lo)); - sub_ddmmss (hi, lo, nh, nl, hi, lo); - assert (hi == 0); - - return x; - } - - x = y; - } -} - -/* MAGIC[N] has a bit i set iff i is a quadratic residue mod N. */ -#define MAGIC64 ((uint64_t) 0x0202021202030213ULL) -#define MAGIC63 ((uint64_t) 0x0402483012450293ULL) -#define MAGIC65 ((uint64_t) 0x218a019866014613ULL) -#define MAGIC11 0x23b - -/* Returns the square root if the input is a square, otherwise 0. */ -static uintmax_t _GL_ATTRIBUTE_CONST -is_square (uintmax_t x) -{ - /* Uses the tests suggested by Cohen. Excludes 99% of the non-squares before - computing the square root. */ - if (((MAGIC64 >> (x & 63)) & 1) - && ((MAGIC63 >> (x % 63)) & 1) - /* Both 0 and 64 are squares mod (65) */ - && ((MAGIC65 >> ((x % 65) & 63)) & 1) - && ((MAGIC11 >> (x % 11) & 1))) - { - uintmax_t r = isqrt (x); - if (r*r == x) - return r; - } - return 0; -} - -/* invtab[i] = floor(0x10000 / (0x100 + i) */ -static const unsigned short invtab[0x81] = - { - 0x200, - 0x1fc, 0x1f8, 0x1f4, 0x1f0, 0x1ec, 0x1e9, 0x1e5, 0x1e1, - 0x1de, 0x1da, 0x1d7, 0x1d4, 0x1d0, 0x1cd, 0x1ca, 0x1c7, - 0x1c3, 0x1c0, 0x1bd, 0x1ba, 0x1b7, 0x1b4, 0x1b2, 0x1af, - 0x1ac, 0x1a9, 0x1a6, 0x1a4, 0x1a1, 0x19e, 0x19c, 0x199, - 0x197, 0x194, 0x192, 0x18f, 0x18d, 0x18a, 0x188, 0x186, - 0x183, 0x181, 0x17f, 0x17d, 0x17a, 0x178, 0x176, 0x174, - 0x172, 0x170, 0x16e, 0x16c, 0x16a, 0x168, 0x166, 0x164, - 0x162, 0x160, 0x15e, 0x15c, 0x15a, 0x158, 0x157, 0x155, - 0x153, 0x151, 0x150, 0x14e, 0x14c, 0x14a, 0x149, 0x147, - 0x146, 0x144, 0x142, 0x141, 0x13f, 0x13e, 0x13c, 0x13b, - 0x139, 0x138, 0x136, 0x135, 0x133, 0x132, 0x130, 0x12f, - 0x12e, 0x12c, 0x12b, 0x129, 0x128, 0x127, 0x125, 0x124, - 0x123, 0x121, 0x120, 0x11f, 0x11e, 0x11c, 0x11b, 0x11a, - 0x119, 0x118, 0x116, 0x115, 0x114, 0x113, 0x112, 0x111, - 0x10f, 0x10e, 0x10d, 0x10c, 0x10b, 0x10a, 0x109, 0x108, - 0x107, 0x106, 0x105, 0x104, 0x103, 0x102, 0x101, 0x100, - }; - -/* Compute q = [u/d], r = u mod d. Avoids slow hardware division for the case - that q < 0x40; here it instead uses a table of (Euclidian) inverses. */ -#define div_smallq(q, r, u, d) \ - do { \ - if ((u) / 0x40 < (d)) \ - { \ - int _cnt; \ - uintmax_t _dinv, _mask, _q, _r; \ - count_leading_zeros (_cnt, (d)); \ - _r = (u); \ - if (UNLIKELY (_cnt > (W_TYPE_SIZE - 8))) \ - { \ - _dinv = invtab[((d) << (_cnt + 8 - W_TYPE_SIZE)) - 0x80]; \ - _q = _dinv * _r >> (8 + W_TYPE_SIZE - _cnt); \ - } \ - else \ - { \ - _dinv = invtab[((d) >> (W_TYPE_SIZE - 8 - _cnt)) - 0x7f]; \ - _q = _dinv * (_r >> (W_TYPE_SIZE - 3 - _cnt)) >> 11; \ - } \ - _r -= _q*(d); \ - \ - _mask = -(uintmax_t) (_r >= (d)); \ - (r) = _r - (_mask & (d)); \ - (q) = _q - _mask; \ - assert ( (q) * (d) + (r) == u); \ - } \ - else \ - { \ - uintmax_t _q = (u) / (d); \ - (r) = (u) - _q * (d); \ - (q) = _q; \ - } \ - } while (0) - -/* Notes: Example N = 22117019. After first phase we find Q1 = 6314, Q - = 3025, P = 1737, representing F_{18} = (-6314, 2* 1737, 3025), - with 3025 = 55^2. - - Constructing the square root, we get Q1 = 55, Q = 8653, P = 4652, - representing G_0 = (-55, 2*4652, 8653). - - In the notation of the paper: - - S_{-1} = 55, S_0 = 8653, R_0 = 4652 - - Put - - t_0 = floor([q_0 + R_0] / S0) = 1 - R_1 = t_0 * S_0 - R_0 = 4001 - S_1 = S_{-1} +t_0 (R_0 - R_1) = 706 -*/ - -/* Multipliers, in order of efficiency: - 0.7268 3*5*7*11 = 1155 = 3 (mod 4) - 0.7317 3*5*7 = 105 = 1 - 0.7820 3*5*11 = 165 = 1 - 0.7872 3*5 = 15 = 3 - 0.8101 3*7*11 = 231 = 3 - 0.8155 3*7 = 21 = 1 - 0.8284 5*7*11 = 385 = 1 - 0.8339 5*7 = 35 = 3 - 0.8716 3*11 = 33 = 1 - 0.8774 3 = 3 = 3 - 0.8913 5*11 = 55 = 3 - 0.8972 5 = 5 = 1 - 0.9233 7*11 = 77 = 1 - 0.9295 7 = 7 = 3 - 0.9934 11 = 11 = 3 -*/ -#define QUEUE_SIZE 50 - -#if STAT_SQUFOF -# define Q_FREQ_SIZE 50 -/* Element 0 keeps the total */ -static unsigned int q_freq[Q_FREQ_SIZE + 1]; -# define MIN(a,b) ((a) < (b) ? (a) : (b)) -#endif - -/* Return true on success. Expected to fail only for numbers - >= 2^{2*W_TYPE_SIZE - 2}, or close to that limit. */ -static bool -factor_using_squfof (uintmax_t n1, uintmax_t n0, struct factors *factors) -{ - /* Uses algorithm and notation from - - SQUARE FORM FACTORIZATION - JASON E. GOWER AND SAMUEL S. WAGSTAFF, JR. - - http://homes.cerias.purdue.edu/~ssw/squfof.pdf - */ - - static const unsigned int multipliers_1[] = - { /* = 1 (mod 4) */ - 105, 165, 21, 385, 33, 5, 77, 1, 0 - }; - static const unsigned int multipliers_3[] = - { /* = 3 (mod 4) */ - 1155, 15, 231, 35, 3, 55, 7, 11, 0 - }; - - const unsigned int *m; - - struct { uintmax_t Q; uintmax_t P; } queue[QUEUE_SIZE]; - - if (n1 >= ((uintmax_t) 1 << (W_TYPE_SIZE - 2))) - return false; - - uintmax_t sqrt_n = isqrt2 (n1, n0); - - if (n0 == sqrt_n * sqrt_n) - { - uintmax_t p1, p0; - - umul_ppmm (p1, p0, sqrt_n, sqrt_n); - assert (p0 == n0); - - if (n1 == p1) - { - if (prime_p (sqrt_n)) - factor_insert_multiplicity (factors, sqrt_n, 2); - else - { - struct factors f; - - f.nfactors = 0; - if (!factor_using_squfof (0, sqrt_n, &f)) - { - /* Try pollard rho instead */ - factor_using_pollard_rho (sqrt_n, 1, &f); - } - /* Duplicate the new factors */ - for (unsigned int i = 0; i < f.nfactors; i++) - factor_insert_multiplicity (factors, f.p[i], 2*f.e[i]); - } - return true; - } - } - - /* Select multipliers so we always get n * mu = 3 (mod 4) */ - for (m = (n0 % 4 == 1) ? multipliers_3 : multipliers_1; - *m; m++) - { - uintmax_t S, Dh, Dl, Q1, Q, P, L, L1, B; - unsigned int i; - unsigned int mu = *m; - unsigned int qpos = 0; - - assert (mu * n0 % 4 == 3); - - /* In the notation of the paper, with mu * n == 3 (mod 4), we - get \Delta = 4 mu * n, and the paper's \mu is 2 mu. As far as - I understand it, the necessary bound is 4 \mu^3 < n, or 32 - mu^3 < n. - - However, this seems insufficient: With n = 37243139 and mu = - 105, we get a trivial factor, from the square 38809 = 197^2, - without any corresponding Q earlier in the iteration. - - Requiring 64 mu^3 < n seems sufficient. */ - if (n1 == 0) - { - if ((uintmax_t) mu*mu*mu >= n0 / 64) - continue; - } - else - { - if (n1 > ((uintmax_t) 1 << (W_TYPE_SIZE - 2)) / mu) - continue; - } - umul_ppmm (Dh, Dl, n0, mu); - Dh += n1 * mu; - - assert (Dl % 4 != 1); - assert (Dh < (uintmax_t) 1 << (W_TYPE_SIZE - 2)); - - S = isqrt2 (Dh, Dl); - - Q1 = 1; - P = S; - - /* Square root remainder fits in one word, so ignore high part. */ - Q = Dl - P*P; - /* FIXME: When can this differ from floor(sqrt(2 sqrt(D)))? */ - L = isqrt (2*S); - B = 2*L; - L1 = mu * 2 * L; - - /* The form is (+/- Q1, 2P, -/+ Q), of discriminant 4 (P^2 + Q Q1) = - 4 D. */ - - for (i = 0; i <= B; i++) - { - uintmax_t q, P1, t, rem; - - div_smallq (q, rem, S+P, Q); - P1 = S - rem; /* P1 = q*Q - P */ - -#if STAT_SQUFOF - assert (q > 0); - q_freq[0]++; - q_freq[MIN(q, Q_FREQ_SIZE)]++; -#endif - - if (Q <= L1) - { - uintmax_t g = Q; - - if ( (Q & 1) == 0) - g /= 2; - - g /= gcd_odd (g, mu); - - if (g <= L) - { - if (qpos >= QUEUE_SIZE) - { - fprintf (stderr, "squfof queue overflow.\n"); - exit (EXIT_FAILURE); - } - queue[qpos].Q = g; - queue[qpos].P = P % g; - qpos++; - } - } - - /* I think the difference can be either sign, but mod - 2^W_TYPE_SIZE arithmetic should be fine. */ - t = Q1 + q * (P - P1); - Q1 = Q; - Q = t; - P = P1; - - if ( (i & 1) == 0) - { - uintmax_t r = is_square (Q); - if (r) - { - unsigned int j; - - for (j = 0; j < qpos; j++) - { - if (queue[j].Q == r) - { - if (r == 1) - /* Traversed entire cycle. */ - goto next_multiplier; - - /* Need the absolute value for divisibility test. */ - if (P >= queue[j].P) - t = P - queue[j].P; - else - t = queue[j].P - P; - if (t % r == 0) - { - /* Delete entries up to and including entry - j, which matched. */ - memmove (queue, queue + j + 1, - (qpos - j - 1) * sizeof (queue[0])); - qpos -= (j + 1); - } - goto next_i; - } - } - - /* We have found a square form, which should give a - factor. */ - Q1 = r; - assert (S >= P); /* What signs are possible? */ - P += r * ((S - P) / r); - - /* Note: Paper says (N - P*P) / Q1, that seems incorrect - for the case D = 2N. */ - /* Compute Q = (D - P*P) / Q1, but we need double - precision. */ - uintmax_t hi, lo; - umul_ppmm (hi, lo, P, P); - sub_ddmmss (hi, lo, Dh, Dl, hi, lo); - udiv_qrnnd (Q, rem, hi, lo, Q1); - assert (rem == 0); - - for (;;) - { - /* Note: There appears to by a typo in the paper, - Step 4a in the algorithm description says q <-- - floor([S+P]/\hat Q), but looking at the equations - in Sec. 3.1, it should be q <-- floor([S+P] / Q). - (In this code, \hat Q is Q1). */ - div_smallq (q, rem, S+P, Q); - P1 = S - rem; /* P1 = q*Q - P */ - -#if STAT_SQUFOF - q_freq[0]++; - q_freq[MIN (q, Q_FREQ_SIZE)]++; -#endif - if (P == P1) - break; - t = Q1 + q * (P - P1); - Q1 = Q; - Q = t; - P = P1; - } - - if ( (Q & 1) == 0) - Q /= 2; - Q /= gcd_odd (Q, mu); - - assert (Q > 1 && (n1 || Q < n0)); - - if (prime_p (Q)) - factor_insert (factors, Q); - else if (!factor_using_squfof (0, Q, factors)) - factor_using_pollard_rho (Q, 2, factors); - - divexact_21 (n1, n0, n1, n0, Q); - - if (prime2_p (n1, n0)) - factor_insert_large (factors, n1, n0); - else - { - if (!factor_using_squfof (n1, n0, factors)) - { - if (n1 == 0) - factor_using_pollard_rho (n0, 1, factors); - else - factor_using_pollard_rho2 (n1, n0, 1, factors); - } - } - - return true; - } - } - next_i: - ; - } - next_multiplier: - ; - } - return false; -} - -static void -factor (uintmax_t t1, uintmax_t t0, struct factors *factors) -{ - factors->nfactors = 0; - factors->plarge[1] = 0; - - if (t1 == 0 && t0 < 2) - return; - - t0 = factor_using_division (&t1, t1, t0, factors); - - if (t1 == 0 && t0 < 2) - return; - - if (prime2_p (t1, t0)) - factor_insert_large (factors, t1, t0); - else - { - if (alg == ALG_SQUFOF) - if (factor_using_squfof (t1, t0, factors)) - return; - - if (t1 == 0) - factor_using_pollard_rho (t0, 1, factors); - else - factor_using_pollard_rho2 (t1, t0, 1, factors); - } -} - -#if HAVE_GMP -static void -mp_factor (mpz_t t, struct mp_factors *factors) -{ - mp_factor_init (factors); - - if (mpz_sgn (t) != 0) - { - mp_factor_using_division (t, factors); - - if (mpz_cmp_ui (t, 1) != 0) - { - if (flag_verbose > 0) - { - printf ("[is number prime?] "); - } - if (mp_prime_p (t)) - mp_factor_insert (factors, t); - else - mp_factor_using_pollard_rho (t, 1, factors); - } - } -} -#endif - -static int -strto2uintmax (uintmax_t *hip, uintmax_t *lop, const char *s) -{ - int errcode; - unsigned int lo_carry; - uintmax_t hi, lo; - - errcode = -1; - - hi = lo = 0; - for (;;) - { - unsigned int c = *s++; - if (c == 0) - break; - - if (UNLIKELY (c < '0' || c > '9')) - { - errcode = -1; - break; - } - c -= '0'; - - errcode = 0; /* we've seen at least one valid digit */ - - if (UNLIKELY (hi > ~(uintmax_t)0 / 10)) - { - errcode = -1; /* overflow */ - break; - } - hi = 10 * hi; - - lo_carry = (lo >> (W_TYPE_SIZE - 3)) + (lo >> (W_TYPE_SIZE - 1)); - lo_carry += 10 * lo < 2 * lo; - - lo = 10 * lo; - lo += c; - - lo_carry += lo < c; - hi += lo_carry; - if (UNLIKELY (hi < lo_carry)) - { - errcode = -1; /* overflow */ - break; - } - } - - *hip = hi; - *lop = lo; - - return errcode; -} - -static void -print_uintmaxes (uintmax_t t1, uintmax_t t0) -{ - uintmax_t q, r; - - if (t1 == 0) - printf ("%ju", t0); - else - { - /* Use very plain code here since it seems hard to write fast code - without assuming a specific word size. */ - q = t1 / 1000000000; - r = t1 % 1000000000; - udiv_qrnnd (t0, r, r, t0, 1000000000); - print_uintmaxes (q, t0); - printf ("%09u", (int) r); - } -} - -static void -factor_one (const char *input) -{ - uintmax_t t1, t0; - int errcode; - - /* Try converting the number to one or two words. If it fails, use GMP or - print an error message. The 2nd condition checks that the most - significant bit of the two-word number is clear, in a typesize neutral - way. */ - errcode = strto2uintmax (&t1, &t0, input); - if (errcode == 0 && ((t1 << 1) >> 1) == t1) - { - struct factors factors; - - print_uintmaxes (t1, t0); - printf (":"); - - factor (t1, t0, &factors); - - for (unsigned int j = 0; j < factors.nfactors; j++) - for (unsigned int k = 0; k < factors.e[j]; k++) - printf (" %ju", factors.p[j]); - - if (factors.plarge[1]) - { - printf (" "); - print_uintmaxes (factors.plarge[1], factors.plarge[0]); - } - puts (""); - } - else - { -#if HAVE_GMP - mpz_t t; - struct mp_factors factors; - - mpz_init_set_str (t, input, 10); - - gmp_printf ("%Zd:", t); - mp_factor (t, &factors); - - for (unsigned int j = 0; j < factors.nfactors; j++) - for (unsigned int k = 0; k < factors.e[j]; k++) - gmp_printf (" %Zd", factors.p[j]); - - mp_factor_clear (&factors); - mpz_clear (t); - puts (""); -#else - fprintf (stderr, "error: number %s not parsable or too large\n", input); - exit (EXIT_FAILURE); -#endif - } -} - -struct inbuf -{ - char *buf; - size_t alloc; -}; - -/* Read white-space delimited items. Return 1 on success, 0 on EOF. - Exit on I/O errors. */ -int -read_item (struct inbuf *bufstruct) -{ - int c; - size_t i; - char *buf = bufstruct->buf; - - do - c = getchar_unlocked (); - while (isspace (c)); - - for (i = 0; !isspace(c); i++) - { - if (c < 0) - { - if (ferror (stdin)) - { - fprintf (stderr, "read error on stdin: %s\n", strerror(errno)); - exit (EXIT_FAILURE); - } - if (i == 0) - return 0; - else - break; - } - - if (UNLIKELY (bufstruct->alloc <= i + 1)) /* +1 byte for terminating NUL */ - { - bufstruct->alloc = bufstruct->alloc * 5 / 4 + 1; - bufstruct->buf = realloc (bufstruct->buf, bufstruct->alloc); - buf = bufstruct->buf; - } - - buf[i] = c; - c = getchar_unlocked (); - } - - buf[i] = '\0'; - return 1; -} - -int -main (int argc, char *argv[]) -{ - int c; - - alg = ALG_POLLARD_RHO; /* Default to Pollard rho */ - - while ( (c = getopt(argc, argv, "svw")) != -1) - switch (c) - { - case 's': - alg = ALG_SQUFOF; - break; - case 'v': - flag_verbose = 1; - break; - case 'w': - flag_prove_primality = 0; - break; - case '?': - printf ("Usage: %s [-s] number ...\n", argv[0]); - return EXIT_FAILURE; - } -#if STAT_SQUFOF - if (alg == ALG_SQUFOF) - memset (q_freq, 0, sizeof (q_freq)); -#endif - - if (optind < argc) - for (int i = optind; i < argc; i++) - factor_one (argv[i]); - else - { - struct inbuf bufstruct; - bufstruct.alloc = 50; /* enough unless HAVE_GMP */ - bufstruct.buf = malloc (bufstruct.alloc); - while (read_item (&bufstruct)) - factor_one (bufstruct.buf); - } - -#if STAT_SQUFOF - if (alg == ALG_SQUFOF && q_freq[0] > 0) - { - double acc_f; - printf ("q freq. cum. freq.(total: %d)\n", q_freq[0]); - for (unsigned int i = 1, acc_f = 0.0; i <= Q_FREQ_SIZE; i++) - { - double f = (double) q_freq[i] / q_freq[0]; - acc_f += f; - printf ("%s%d %.2f%% %.2f%%\n", i == Q_FREQ_SIZE ? ">=" : "", i, - 100.0 * f, 100.0 * acc_f); - } - } -#endif - - exit (EXIT_SUCCESS); -} diff --git a/src/factor.c b/src/factor.c index e63e0e01d..84d9721c4 100644 --- a/src/factor.c +++ b/src/factor.c @@ -14,17 +14,79 @@ You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. */ -/* Written by Paul Rubin <phr@ocf.berkeley.edu>. +/* Originally written by Paul Rubin <phr@ocf.berkeley.edu>. Adapted for GNU, fixed to factor UINT_MAX by Jim Meyering. Arbitrary-precision code adapted by James Youngman from Torbjörn Granlund's factorize.c, from GNU MP version 4.2.2. + In 2012, the core was rewritten by Torbjörn Granlund and Niels Möller. + Contains code from GNU MP. */ + +/* Efficiently factor numbers that fit in one or two words (word = uintmax_t), + or, with GMP, numbers of any size. + + Code organisation: + + There are several variants of many functions, for handling one word, two + words, and GMP's mpz_t type. If the one-word variant is called foo, the + two-word variant will be foo2, and the one for mpz_t will be mp_foo. In + some cases, the plain function variants will handle both one-word and + two-word numbers, evidenced by function arguments. + + The factoring code for two words will fall into the code for one word when + progress allows that. + + Using GMP is optional. Define HAVE_GMP to make this code include GMP + factoring code. The GMP factoring code is based on GMP's demos/factorize.c + (last synched 2012-09-07). The GMP-based factoring code will stay in GMP + factoring code even if numbers get small enough for using the two-word + code. + + Algorithm: + + (1) Perform trial division using a small primes table, but without hardware + division since the primes table store inverses modulo the word base. + (The GMP variant of this code doesn't make use of the precomputed + inverses, but instead relies on GMP for fast divisibility testing.) + (2) Check the nature of any non-factored part using Miller-Rabin for + detecting composites, and Lucas for detecting primes. + (3) Factor any remaining composite part using the Pollard-Brent rho + algorithm or the SQUFOF algorithm, checking status of found factors + again using Miller-Rabin and Lucas. + + We prefer using Hensel norm in the divisions, not the more familiar + Euclidian norm, since the former leads to much faster code. In the + Pollard-Brent rho code and the prime testing code, we use Montgomery's + trick of multiplying all n-residues by the word base, allowing cheap Hensel + reductions mod n. + + Improvements: + + * Use modular inverses also for exact division in the Lucas code, and + elsewhere. A problem is to locate the inverses not from an index, but + from a prime. We might instead compute the inverse on-the-fly. + + * Tune trial division table size (not forgetting that this is a standalone + program where the table will be read from disk for each invocation). + + * Implement less naive powm, using k-ary exponentiation for k = 3 or + perhaps k = 4. + + * Try to speed trial division code for single uintmax_t numbers, i.e., the + code using DIVBLOCK. It currently runs at 2 cycles per prime (Intel SBR, + IBR), 3 cycles per prime (AMD Stars) and 5 cycles per prime (AMD BD) when + using gcc 4.6 and 4.7. Some software pipelining should help; 1, 2, and 4 + respectively cycles ought to be possible. + + * The redcify function could be vastly improved by using (plain Euclidian) + pre-inversion (such as GMP's invert_limb) and udiv_qrnnd_preinv (from + GMP's gmp-impl.h). The redcify2 function could be vastly improved using + similar methoods. These functions currently dominate run time when using + the -w option. */ #include <config.h> #include <getopt.h> -#include <stdarg.h> #include <stdio.h> -#include <sys/types.h> #if HAVE_GMP # include <gmp.h> #endif @@ -40,17 +102,540 @@ /* The official name of this program (e.g., no 'g' prefix). */ #define PROGRAM_NAME "factor" -#define AUTHORS proper_name ("Paul Rubin") +#define AUTHORS \ + proper_name ("Paul Rubin"), \ + proper_name_utf8 ("Torbjorn Granlund", "Torbj\303\266rn Granlund"), \ + proper_name_utf8 ("Niels Moller", "Niels M\303\266ller") /* Token delimiters when reading from a file. */ #define DELIM "\n\t " -static bool verbose = false; +#ifndef STAT_SQUFOF +# define STAT_SQUFOF 0 +#endif + +#ifndef USE_LONGLONG_H +# define USE_LONGLONG_H 1 +#endif + +#if USE_LONGLONG_H + +/* Make definitions for longlong.h to make it do what it can do for us */ +# define W_TYPE_SIZE 64 /* bitcount for uintmax_t */ +# define UWtype uintmax_t +# define UHWtype unsigned long int +# undef UDWtype +# if HAVE_ATTRIBUTE_MODE +typedef unsigned int UQItype __attribute__ ((mode (QI))); +typedef int SItype __attribute__ ((mode (SI))); +typedef unsigned int USItype __attribute__ ((mode (SI))); +typedef int DItype __attribute__ ((mode (DI))); +typedef unsigned int UDItype __attribute__ ((mode (DI))); +# else +typedef unsigned char UQItype; +typedef long SItype; +typedef unsigned long int USItype; +# if HAVE_LONG_LONG +typedef long long int DItype; +typedef unsigned long long int UDItype; +# else /* Assume `long' gives us a wide enough type. Needed for hppa2.0w. */ +typedef long int DItype; +typedef unsigned long int UDItype; +# endif +# endif +# define LONGLONG_STANDALONE /* Don't require GMP's longlong.h mdep files */ +# define ASSERT(x) /* FIXME make longlong.h really standalone */ +# define __clz_tab factor_clz_tab /* Rename to avoid glibc collision */ +# ifndef __GMP_GNUC_PREREQ +# define __GMP_GNUC_PREREQ(a,b) 1 +# endif +# if _ARCH_PPC +# define HAVE_HOST_CPU_FAMILY_powerpc 1 +# endif +# include "longlong.h" +# ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB +const unsigned char factor_clz_tab[129] = +{ + 1,2,3,3,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6, + 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, + 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, + 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, + 9 +}; +# endif + +#else /* not USE_LONGLONG_H */ + +# define W_TYPE_SIZE (8 * sizeof (uintmax_t)) +# define __ll_B ((uintmax_t) 1 << (W_TYPE_SIZE / 2)) +# define __ll_lowpart(t) ((uintmax_t) (t) & (__ll_B - 1)) +# define __ll_highpart(t) ((uintmax_t) (t) >> (W_TYPE_SIZE / 2)) + +#endif + +enum alg_type { ALG_POLLARD_RHO = 1, ALG_SQUFOF = 2 }; + +static enum alg_type alg; + +/* 2*3*5*7*11...*101 is 128 bits, and has 26 prime factors */ +#define MAX_NFACTS 26 + +enum +{ + VERBOSE_OPTION = CHAR_MAX + 1 +}; + +static struct option const long_options[] = +{ + {"verbose", no_argument, NULL, VERBOSE_OPTION}, + {GETOPT_HELP_OPTION_DECL}, + {GETOPT_VERSION_OPTION_DECL}, + {NULL, 0, NULL, 0} +}; + +struct factors +{ + uintmax_t plarge[2]; /* Can have a single large factor */ + uintmax_t p[MAX_NFACTS]; + unsigned char e[MAX_NFACTS]; + unsigned char nfactors; +}; + +#if HAVE_GMP +struct mp_factors +{ + mpz_t *p; + unsigned long int *e; + unsigned long int nfactors; +}; +#endif + +static void factor (uintmax_t, uintmax_t, struct factors *); + +#ifndef umul_ppmm +# define umul_ppmm(w1, w0, u, v) \ + do { \ + uintmax_t __x0, __x1, __x2, __x3; \ + unsigned long int __ul, __vl, __uh, __vh; \ + uintmax_t __u = (u), __v = (v); \ + \ + __ul = __ll_lowpart (__u); \ + __uh = __ll_highpart (__u); \ + __vl = __ll_lowpart (__v); \ + __vh = __ll_highpart (__v); \ + \ + __x0 = (uintmax_t) __ul * __vl; \ + __x1 = (uintmax_t) __ul * __vh; \ + __x2 = (uintmax_t) __uh * __vl; \ + __x3 = (uintmax_t) __uh * __vh; \ + \ + __x1 += __ll_highpart (__x0);/* this can't give carry */ \ + __x1 += __x2; /* but this indeed can */ \ + if (__x1 < __x2) /* did we get it? */ \ + __x3 += __ll_B; /* yes, add it in the proper pos. */ \ + \ + (w1) = __x3 + __ll_highpart (__x1); \ + (w0) = (__x1 << W_TYPE_SIZE / 2) + __ll_lowpart (__x0); \ + } while (0) +#endif + +#if !defined udiv_qrnnd || defined UDIV_NEEDS_NORMALIZATION +/* Define our own, not needing normalization. This function is + currently not performance critical, so keep it simple. Similar to + the mod macro below. */ +# undef udiv_qrnnd +# define udiv_qrnnd(q, r, n1, n0, d) \ + do { \ + uintmax_t __d1, __d0, __q, __r1, __r0; \ + \ + assert ((n1) < (d)); \ + __d1 = (d); __d0 = 0; \ + __r1 = (n1); __r0 = (n0); \ + __q = 0; \ + for (unsigned int __i = W_TYPE_SIZE; __i > 0; __i--) \ + { \ + rsh2 (__d1, __d0, __d1, __d0, 1); \ + __q <<= 1; \ + if (ge2 (__r1, __r0, __d1, __d0)) \ + { \ + __q++; \ + sub_ddmmss (__r1, __r0, __r1, __r0, __d1, __d0); \ + } \ + } \ + (r) = __r0; \ + (q) = __q; \ + } while (0) +#endif + +#if !defined add_ssaaaa +# define add_ssaaaa(sh, sl, ah, al, bh, bl) \ + do { \ + uintmax_t _add_x; \ + _add_x = (al) + (bl); \ + (sh) = (ah) + (bh) + (_add_x < (al)); \ + (sl) = _add_x; \ + } while (0) +#endif + +#define rsh2(rh, rl, ah, al, cnt) \ + do { \ + (rl) = ((ah) << (W_TYPE_SIZE - (cnt))) | ((al) >> (cnt)); \ + (rh) = (ah) >> (cnt); \ + } while (0) + +#define lsh2(rh, rl, ah, al, cnt) \ + do { \ + (rh) = ((ah) << cnt) | ((al) >> (W_TYPE_SIZE - (cnt))); \ + (rl) = (al) << (cnt); \ + } while (0) + +#define ge2(ah, al, bh, bl) \ + ((ah) > (bh) || ((ah) == (bh) && (al) >= (bl))) + +#define gt2(ah, al, bh, bl) \ + ((ah) > (bh) || ((ah) == (bh) && (al) > (bl))) + +#ifndef sub_ddmmss +# define sub_ddmmss(rh, rl, ah, al, bh, bl) \ + do { \ + uintmax_t _cy; \ + _cy = (al) < (bl); \ + (rl) = (al) - (bl); \ + (rh) = (ah) - (bh) - _cy; \ + } while (0) +#endif + +#ifndef count_leading_zeros +# define count_leading_zeros(count, x) do { \ + uintmax_t __clz_x = (x); \ + unsigned int __clz_c; \ + for (__clz_c = 0; \ + (__clz_x & ((uintmax_t) 0xff << (W_TYPE_SIZE - 8))) == 0; \ + __clz_c += 8) \ + __clz_x <<= 8; \ + for (; (intmax_t)__clz_x >= 0; __clz_c++) \ + __clz_x <<= 1; \ + (count) = __clz_c; \ + } while (0) +#endif + +#ifndef count_trailing_zeros +# define count_trailing_zeros(count, x) do { \ + uintmax_t __ctz_x = (x); \ + unsigned int __ctz_c = 0; \ + while ((__ctz_x & 1) == 0) \ + { \ + __ctz_x >>= 1; \ + __ctz_c++; \ + } \ + (count) = __ctz_c; \ + } while (0) +#endif + +/* Requires that a < n and b <= n */ +#define submod(r,a,b,n) \ + do { \ + uintmax_t _t = - (uintmax_t) (a < b); \ + (r) = ((n) & _t) + (a) - (b); \ + } while (0) + +#define addmod(r,a,b,n) \ + submod ((r), (a), ((n) - (b)), (n)) + +/* Modular two-word addition and subtraction. For performance reasons, the + most significant bit of n1 must be clear. The destination variables must be + distinct from the mod operand. */ +#define addmod2(r1, r0, a1, a0, b1, b0, n1, n0) \ + do { \ + add_ssaaaa ((r1), (r0), (a1), (a0), (b1), (b0)); \ + if (ge2 ((r1), (r0), (n1), (n0))) \ + sub_ddmmss ((r1), (r0), (r1), (r0), (n1), (n0)); \ + } while (0) +#define submod2(r1, r0, a1, a0, b1, b0, n1, n0) \ + do { \ + sub_ddmmss ((r1), (r0), (a1), (a0), (b1), (b0)); \ + if ((intmax_t) (r1) < 0) \ + add_ssaaaa ((r1), (r0), (r1), (r0), (n1), (n0)); \ + } while (0) + +#define HIGHBIT_TO_MASK(x) \ + (((intmax_t)-1 >> 1) < 0 \ + ? (uintmax_t)((intmax_t)(x) >> (W_TYPE_SIZE - 1)) \ + : ((x) & ((uintmax_t) 1 << (W_TYPE_SIZE - 1)) \ + ? UINTMAX_MAX : (uintmax_t) 0)) + +/* Compute r = a mod d, where r = <*t1,retval>, a = <a1,a0>, d = <d1,d0>. + Requires that d1 != 0. */ +static uintmax_t +mod2 (uintmax_t *r1, uintmax_t a1, uintmax_t a0, uintmax_t d1, uintmax_t d0) +{ + int cntd, cnta; + + assert (d1 != 0); + + if (a1 == 0) + { + *r1 = 0; + return a0; + } + + count_leading_zeros (cntd, d1); + count_leading_zeros (cnta, a1); + int cnt = cntd - cnta; + lsh2 (d1, d0, d1, d0, cnt); + for (int i = 0; i < cnt; i++) + { + if (ge2 (a1, a0, d1, d0)) + sub_ddmmss (a1, a0, a1, a0, d1, d0); + rsh2 (d1, d0, d1, d0, 1); + } + + *r1 = a1; + return a0; +} + +static uintmax_t _GL_ATTRIBUTE_CONST +gcd_odd (uintmax_t a, uintmax_t b) +{ + if ( (b & 1) == 0) + { + uintmax_t t = b; + b = a; + a = t; + } + if (a == 0) + return b; + + /* Take out least significant one bit, to make room for sign */ + b >>= 1; + + for (;;) + { + uintmax_t t; + uintmax_t bgta; + + while ((a & 1) == 0) + a >>= 1; + a >>= 1; + + t = a - b; + if (t == 0) + return (a << 1) + 1; + + bgta = HIGHBIT_TO_MASK (t); + + /* b <-- min (a, b) */ + b += (bgta & t); + + /* a <-- |a - b| */ + a = (t ^ bgta) - bgta; + } +} + +static uintmax_t +gcd2_odd (uintmax_t *r1, uintmax_t a1, uintmax_t a0, uintmax_t b1, uintmax_t b0) +{ + while ((a0 & 1) == 0) + rsh2 (a1, a0, a1, a0, 1); + while ((b0 & 1) == 0) + rsh2 (b1, b0, b1, b0, 1); + + for (;;) + { + if ((b1 | a1) == 0) + { + *r1 = 0; + return gcd_odd (b0, a0); + } + + if (gt2 (a1, a0, b1, b0)) + { + sub_ddmmss (a1, a0, a1, a0, b1, b0); + do + rsh2 (a1, a0, a1, a0, 1); + while ((a0 & 1) == 0); + } + else if (gt2 (b1, b0, a1, a0)) + { + sub_ddmmss (b1, b0, b1, b0, a1, a0); + do + rsh2 (b1, b0, b1, b0, 1); + while ((b0 & 1) == 0); + } + else + break; + } + + *r1 = a1; + return a0; +} + +static void +factor_insert_multiplicity (struct factors *factors, + uintmax_t prime, unsigned int m) +{ + unsigned int nfactors = factors->nfactors; + uintmax_t *p = factors->p; + unsigned char *e = factors->e; + + /* Locate position for insert new or increment e. */ + int i; + for (i = nfactors - 1; i >= 0; i--) + { + if (p[i] <= prime) + break; + } + + if (i < 0 || p[i] != prime) + { + for (int j = nfactors - 1; j > i; j--) + { + p[j + 1] = p[j]; + e[j + 1] = e[j]; + } + p[i + 1] = prime; + e[i + 1] = m; + factors->nfactors = nfactors + 1; + } + else + { + e[i] += m; + } +} + +#define factor_insert(f, p) factor_insert_multiplicity (f, p, 1) + +static void +factor_insert_large (struct factors *factors, + uintmax_t p1, uintmax_t p0) +{ + if (p1 > 0) + { + assert (factors->plarge[1] == 0); + factors->plarge[0] = p0; + factors->plarge[1] = p1; + } + else + factor_insert (factors, p0); +} #if HAVE_GMP -static mpz_t *factor = NULL; -static size_t nfactors_found = 0; -static size_t nfactors_allocated = 0; +static void mp_factor (mpz_t, struct mp_factors *); + +static void +mp_factor_init (struct mp_factors *factors) +{ + factors->p = xmalloc (1); + factors->e = xmalloc (1); + factors->nfactors = 0; +} + +static void +mp_factor_clear (struct mp_factors *factors) +{ + for (unsigned int i = 0; i < factors->nfactors; i++) + mpz_clear (factors->p[i]); + + free (factors->p); + free (factors->e); +} + +static void +mp_factor_insert (struct mp_factors *factors, mpz_t prime) +{ + unsigned long int nfactors = factors->nfactors; + mpz_t *p = factors->p; + unsigned long int *e = factors->e; + long i; + + /* Locate position for insert new or increment e. */ + for (i = nfactors - 1; i >= 0; i--) + { + if (mpz_cmp (p[i], prime) <= 0) + break; + } + + if (i < 0 || mpz_cmp (p[i], prime) != 0) + { + p = xrealloc (p, (nfactors + 1) * sizeof p[0]); + e = xrealloc (e, (nfactors + 1) * sizeof e[0]); + + mpz_init (p[nfactors]); + for (long j = nfactors - 1; j > i; j--) + { + mpz_set (p[j + 1], p[j]); + e[j + 1] = e[j]; + } + mpz_set (p[i + 1], prime); + e[i + 1] = 1; + + factors->p = p; + factors->e = e; + factors->nfactors = nfactors + 1; + } + else + { + e[i] += 1; + } +} + +static void +mp_factor_insert_ui (struct mp_factors *factors, unsigned long int prime) +{ + mpz_t pz; + + mpz_init_set_ui (pz, prime); + mp_factor_insert (factors, pz); + mpz_clear (pz); +} +#endif /* HAVE_GMP */ + + +#define P(a,b,c,d) a, +static const unsigned char primes_diff[] = { +#include "primes.h" +0,0,0,0,0,0,0 /* 7 sentinels for 8-way loop */ +}; +#undef P + +#define PRIMES_PTAB_ENTRIES \ + (sizeof (primes_diff) / sizeof (primes_diff[0]) - 8 + 1) + +#define P(a,b,c,d) b, +static const unsigned char primes_diff8[] = { +#include "primes.h" +0,0,0,0,0,0,0 /* 7 sentinels for 8-way loop */ +}; +#undef P + +struct primes_dtab +{ + uintmax_t binv, lim; +}; + +#define P(a,b,c,d) {c,d}, +static const struct primes_dtab primes_dtab[] = { +#include "primes.h" +{1,0},{1,0},{1,0},{1,0},{1,0},{1,0},{1,0} /* 7 sentinels for 8-way loop */ +}; +#undef P + +/* This flag is honored only in the GMP code. */ +static int verbose = 0; + +/* Prove primality or run probabilistic tests. */ +static bool flag_prove_primality = true; + +/* Number of Miller-Rabin tests to run when not proving primality. */ +#define MR_REPS 25 + +#ifdef __GNUC__ +# define LIKELY(cond) __builtin_expect ((cond), 1) +# define UNLIKELY(cond) __builtin_expect ((cond), 0) +#else +# define LIKELY(cond) (cond) +# define UNLIKELY(cond) (cond) +#endif static void debug (char const *fmt, ...) @@ -65,368 +650,1637 @@ debug (char const *fmt, ...) } static void -emit_factor (mpz_t n) +factor_insert_refind (struct factors *factors, uintmax_t p, unsigned int i, + unsigned int off) { - if (nfactors_found == nfactors_allocated) - factor = X2NREALLOC (factor, &nfactors_allocated); - mpz_init (factor[nfactors_found]); - mpz_set (factor[nfactors_found], n); - ++nfactors_found; + for (unsigned int j = 0; j < off; j++) + p += primes_diff[i + j]; + factor_insert (factors, p); } -static void -emit_ul_factor (unsigned long int i) +/* Trial division with odd primes uses the following trick. + + Let p be an odd prime, and B = 2^{W_TYPE_SIZE}. For simplicity, + consider the case t < B (this is the second loop below). + + From our tables we get + + binv = p^{-1} (mod B) + lim = floor ( (B-1) / p ). + + First assume that t is a multiple of p, t = q * p. Then 0 <= q <= lim + (and all quotients in this range occur for some t). + + Then t = q * p is true also (mod B), and p is invertible we get + + q = t * binv (mod B). + + Next, assume that t is *not* divisible by p. Since multiplication + by binv (mod B) is a one-to-one mapping, + + t * binv (mod B) > lim, + + because all the smaller values are already taken. + + This can be summed up by saying that the function + + q(t) = binv * t (mod B) + + is a permutation of the range 0 <= t < B, with the curious property + that it maps the multiples of p onto the range 0 <= q <= lim, in + order, and the non-multiples of p onto the range lim < q < B. + */ + +static uintmax_t +factor_using_division (uintmax_t *t1p, uintmax_t t1, uintmax_t t0, + struct factors *factors) { - mpz_t t; - mpz_init (t); - mpz_set_ui (t, i); - emit_factor (t); - mpz_clear (t); + if (t0 % 2 == 0) + { + unsigned int cnt; + + if (t0 == 0) + { + count_trailing_zeros (cnt, t1); + t0 = t1 >> cnt; + t1 = 0; + cnt += W_TYPE_SIZE; + } + else + { + count_trailing_zeros (cnt, t0); + rsh2 (t1, t0, t1, t0, cnt); + } + + factor_insert_multiplicity (factors, 2, cnt); + } + + uintmax_t p = 3; + unsigned int i; + for (i = 0; t1 > 0 && i < PRIMES_PTAB_ENTRIES; i++) + { + for (;;) + { + uintmax_t q1, q0, hi, lo; + + q0 = t0 * primes_dtab[i].binv; + umul_ppmm (hi, lo, q0, p); + if (hi > t1) + break; + hi = t1 - hi; + q1 = hi * primes_dtab[i].binv; + if (LIKELY (q1 > primes_dtab[i].lim)) + break; + t1 = q1; t0 = q0; + factor_insert (factors, p); + } + p += primes_diff[i + 1]; + } + if (t1p) + *t1p = t1; + +#define DIVBLOCK(I) \ + do { \ + for (;;) \ + { \ + q = t0 * pd[I].binv; \ + if (LIKELY (q > pd[I].lim)) \ + break; \ + t0 = q; \ + factor_insert_refind (factors, p, i + 1, I); \ + } \ + } while (0) + + for (; i < PRIMES_PTAB_ENTRIES; i += 8) + { + uintmax_t q; + const struct primes_dtab *pd = &primes_dtab[i]; + DIVBLOCK (0); + DIVBLOCK (1); + DIVBLOCK (2); + DIVBLOCK (3); + DIVBLOCK (4); + DIVBLOCK (5); + DIVBLOCK (6); + DIVBLOCK (7); + + p += primes_diff8[i]; + if (p * p > t0) + break; + } + + return t0; } +#if HAVE_GMP static void -factor_using_division (mpz_t t, unsigned int limit) +mp_factor_using_division (mpz_t t, struct mp_factors *factors) { - mpz_t q, r; - unsigned long int f; - int ai; - static unsigned int const add[] = {4, 2, 4, 2, 4, 6, 2, 6}; - unsigned int const *addv = add; - unsigned int failures; + mpz_t q; + unsigned long int p; - debug ("[trial division (%u)] ", limit); + debug ("[trial division] "); mpz_init (q); - mpz_init (r); - f = mpz_scan1 (t, 0); - mpz_div_2exp (t, t, f); - while (f) + p = mpz_scan1 (t, 0); + mpz_div_2exp (t, t, p); + while (p) { - emit_ul_factor (2); - --f; + mp_factor_insert_ui (factors, 2); + --p; } - while (true) + p = 3; + for (unsigned int i = 1; i <= PRIMES_PTAB_ENTRIES;) { - mpz_tdiv_qr_ui (q, r, t, 3); - if (mpz_cmp_ui (r, 0) != 0) - break; - mpz_set (t, q); - emit_ul_factor (3); + if (! mpz_divisible_ui_p (t, p)) + { + p += primes_diff[i++]; + if (mpz_cmp_ui (t, p * p) < 0) + break; + } + else + { + mpz_tdiv_q_ui (t, t, p); + mp_factor_insert_ui (factors, p); + } } - while (true) + mpz_clear (q); +} +#endif + +/* Entry i contains (2i+1)^(-1) mod 2^8. */ +static const unsigned char binvert_table[128] = +{ + 0x01, 0xAB, 0xCD, 0xB7, 0x39, 0xA3, 0xC5, 0xEF, + 0xF1, 0x1B, 0x3D, 0xA7, 0x29, 0x13, 0x35, 0xDF, + 0xE1, 0x8B, 0xAD, 0x97, 0x19, 0x83, 0xA5, 0xCF, + 0xD1, 0xFB, 0x1D, 0x87, 0x09, 0xF3, 0x15, 0xBF, + 0xC1, 0x6B, 0x8D, 0x77, 0xF9, 0x63, 0x85, 0xAF, + 0xB1, 0xDB, 0xFD, 0x67, 0xE9, 0xD3, 0xF5, 0x9F, + 0xA1, 0x4B, 0x6D, 0x57, 0xD9, 0x43, 0x65, 0x8F, + 0x91, 0xBB, 0xDD, 0x47, 0xC9, 0xB3, 0xD5, 0x7F, + 0x81, 0x2B, 0x4D, 0x37, 0xB9, 0x23, 0x45, 0x6F, + 0x71, 0x9B, 0xBD, 0x27, 0xA9, 0x93, 0xB5, 0x5F, + 0x61, 0x0B, 0x2D, 0x17, 0x99, 0x03, 0x25, 0x4F, + 0x51, 0x7B, 0x9D, 0x07, 0x89, 0x73, 0x95, 0x3F, + 0x41, 0xEB, 0x0D, 0xF7, 0x79, 0xE3, 0x05, 0x2F, + 0x31, 0x5B, 0x7D, 0xE7, 0x69, 0x53, 0x75, 0x1F, + 0x21, 0xCB, 0xED, 0xD7, 0x59, 0xC3, 0xE5, 0x0F, + 0x11, 0x3B, 0x5D, 0xC7, 0x49, 0x33, 0x55, 0xFF +}; + +/* Compute n^(-1) mod B, using a Newton iteration. */ +#define binv(inv,n) \ + do { \ + uintmax_t __n = (n); \ + uintmax_t __inv; \ + \ + __inv = binvert_table[(__n / 2) & 0x7F]; /* 8 */ \ + if (W_TYPE_SIZE > 8) __inv = 2 * __inv - __inv * __inv * __n; \ + if (W_TYPE_SIZE > 16) __inv = 2 * __inv - __inv * __inv * __n; \ + if (W_TYPE_SIZE > 32) __inv = 2 * __inv - __inv * __inv * __n; \ + \ + if (W_TYPE_SIZE > 64) \ + { \ + int __invbits = 64; \ + do { \ + __inv = 2 * __inv - __inv * __inv * __n; \ + __invbits *= 2; \ + } while (__invbits < W_TYPE_SIZE); \ + } \ + \ + (inv) = __inv; \ + } while (0) + +/* q = u / d, assuming d|u. */ +#define divexact_21(q1, q0, u1, u0, d) \ + do { \ + uintmax_t _di, _q0; \ + binv (_di, (d)); \ + _q0 = (u0) * _di; \ + if ((u1) >= (d)) \ + { \ + uintmax_t _p1, _p0; \ + umul_ppmm (_p1, _p0, _q0, d); \ + (q1) = ((u1) - _p1) * _di; \ + (q0) = _q0; \ + } \ + else \ + { \ + (q0) = _q0; \ + (q1) = 0; \ + } \ + } while (0) + +/* x B (mod n). */ +#define redcify(r_prim, r, n) \ + do { \ + uintmax_t _redcify_q ATTRIBUTE_UNUSED; \ + udiv_qrnnd (_redcify_q, r_prim, r, 0, n); \ + } while (0) + +/* x B^2 (mod n). Requires x > 0, n1 < B/2 */ +#define redcify2(r1, r0, x, n1, n0) \ + do { \ + uintmax_t _r1, _r0, _i; \ + if ((x) < (n1)) \ + { \ + _r1 = (x); _r0 = 0; \ + _i = W_TYPE_SIZE; \ + } \ + else \ + { \ + _r1 = 0; _r0 = (x); \ + _i = 2*W_TYPE_SIZE; \ + } \ + while (_i-- > 0) \ + { \ + lsh2 (_r1, _r0, _r1, _r0, 1); \ + if (ge2 (_r1, _r0, (n1), (n0))) \ + sub_ddmmss (_r1, _r0, _r1, _r0, (n1), (n0)); \ + } \ + (r1) = _r1; \ + (r0) = _r0; \ + } while (0) + +/* Modular two-word multiplication, r = a * b mod m, with mi = m^(-1) mod B. + Both a and b must be in redc form, the result will be in redc form too. */ +static inline uintmax_t +mulredc (uintmax_t a, uintmax_t b, uintmax_t m, uintmax_t mi) +{ + uintmax_t rh, rl, q, th, tl, xh; + + umul_ppmm (rh, rl, a, b); + q = rl * mi; + umul_ppmm (th, tl, q, m); + xh = rh - th; + if (rh < th) + xh += m; + + return xh; +} + +/* Modular two-word multiplication, r = a * b mod m, with mi = m^(-1) mod B. + Both a and b must be in redc form, the result will be in redc form too. + For performance reasons, the most significant bit of m must be clear. */ +static uintmax_t +mulredc2 (uintmax_t *r1p, + uintmax_t a1, uintmax_t a0, uintmax_t b1, uintmax_t b0, + uintmax_t m1, uintmax_t m0, uintmax_t mi) +{ + uintmax_t r1, r0, q, p1, p0, t1, t0, s1, s0; + mi = -mi; + assert ( (a1 >> (W_TYPE_SIZE - 1)) == 0); + assert ( (b1 >> (W_TYPE_SIZE - 1)) == 0); + assert ( (m1 >> (W_TYPE_SIZE - 1)) == 0); + + /* First compute a0 * <b1, b0> B^{-1} + +-----+ + |a0 b0| + +--+--+--+ + |a0 b1| + +--+--+--+ + |q0 m0| + +--+--+--+ + |q0 m1| + -+--+--+--+ + |r1|r0| 0| + +--+--+--+ + */ + umul_ppmm (t1, t0, a0, b0); + umul_ppmm (r1, r0, a0, b1); + q = mi * t0; + umul_ppmm (p1, p0, q, m0); + umul_ppmm (s1, s0, q, m1); + r0 += (t0 != 0); /* Carry */ + add_ssaaaa (r1, r0, r1, r0, 0, p1); + add_ssaaaa (r1, r0, r1, r0, 0, t1); + add_ssaaaa (r1, r0, r1, r0, s1, s0); + + /* Next, (a1 * <b1, b0> + <r1, r0> B^{-1} + +-----+ + |a1 b0| + +--+--+ + |r1|r0| + +--+--+--+ + |a1 b1| + +--+--+--+ + |q1 m0| + +--+--+--+ + |q1 m1| + -+--+--+--+ + |r1|r0| 0| + +--+--+--+ + */ + umul_ppmm (t1, t0, a1, b0); + umul_ppmm (s1, s0, a1, b1); + add_ssaaaa (t1, t0, t1, t0, 0, r0); + q = mi * t0; + add_ssaaaa (r1, r0, s1, s0, 0, r1); + umul_ppmm (p1, p0, q, m0); + umul_ppmm (s1, s0, q, m1); + r0 += (t0 != 0); /* Carry */ + add_ssaaaa (r1, r0, r1, r0, 0, p1); + add_ssaaaa (r1, r0, r1, r0, 0, t1); + add_ssaaaa (r1, r0, r1, r0, s1, s0); + + if (ge2 (r1, r0, m1, m0)) + sub_ddmmss (r1, r0, r1, r0, m1, m0); + + *r1p = r1; + return r0; +} + +static uintmax_t _GL_ATTRIBUTE_CONST +powm (uintmax_t b, uintmax_t e, uintmax_t n, uintmax_t ni, uintmax_t one) +{ + uintmax_t y = one; + + if (e & 1) + y = b; + + while (e != 0) { - mpz_tdiv_qr_ui (q, r, t, 5); - if (mpz_cmp_ui (r, 0) != 0) - break; - mpz_set (t, q); - emit_ul_factor (5); + b = mulredc (b, b, n, ni); + e >>= 1; + + if (e & 1) + y = mulredc (y, b, n, ni); } - failures = 0; - f = 7; - ai = 0; - while (mpz_cmp_ui (t, 1) != 0) + return y; +} + +static uintmax_t +powm2 (uintmax_t *r1m, + const uintmax_t *bp, const uintmax_t *ep, const uintmax_t *np, + uintmax_t ni, const uintmax_t *one) +{ + uintmax_t r1, r0, b1, b0, n1, n0; + unsigned int i; + uintmax_t e; + + b0 = bp[0]; + b1 = bp[1]; + n0 = np[0]; + n1 = np[1]; + + r0 = one[0]; + r1 = one[1]; + + for (e = ep[0], i = W_TYPE_SIZE; i > 0; i--, e >>= 1) { - mpz_tdiv_qr_ui (q, r, t, f); - if (mpz_cmp_ui (r, 0) != 0) + if (e & 1) { - f += addv[ai]; - if (mpz_cmp_ui (q, f) < 0) - break; - ai = (ai + 1) & 7; - failures++; - if (failures > limit) - break; + r0 = mulredc2 (r1m, r1, r0, b1, b0, n1, n0, ni); + r1 = *r1m; + } + b0 = mulredc2 (r1m, b1, b0, b1, b0, n1, n0, ni); + b1 = *r1m; + } + for (e = ep[1]; e > 0; e >>= 1) + { + if (e & 1) + { + r0 = mulredc2 (r1m, r1, r0, b1, b0, n1, n0, ni); + r1 = *r1m; + } + b0 = mulredc2 (r1m, b1, b0, b1, b0, n1, n0, ni); + b1 = *r1m; + } + *r1m = r1; + return r0; +} + +static bool _GL_ATTRIBUTE_CONST +millerrabin (uintmax_t n, uintmax_t ni, uintmax_t b, uintmax_t q, + unsigned int k, uintmax_t one) +{ + uintmax_t y = powm (b, q, n, ni, one); + + uintmax_t nm1 = n - one; /* -1, but in redc representation. */ + + if (y == one || y == nm1) + return true; + + for (unsigned int i = 1; i < k; i++) + { + y = mulredc (y, y, n, ni); + + if (y == nm1) + return true; + if (y == one) + return false; + } + return false; +} + +static bool +millerrabin2 (const uintmax_t *np, uintmax_t ni, const uintmax_t *bp, + const uintmax_t *qp, unsigned int k, const uintmax_t *one) +{ + uintmax_t y1, y0, nm1_1, nm1_0, r1m; + + y0 = powm2 (&r1m, bp, qp, np, ni, one); + y1 = r1m; + + if (y0 == one[0] && y1 == one[1]) + return true; + + sub_ddmmss (nm1_1, nm1_0, np[1], np[0], one[1], one[0]); + + if (y0 == nm1_0 && y1 == nm1_1) + return true; + + for (unsigned int i = 1; i < k; i++) + { + y0 = mulredc2 (&r1m, y1, y0, y1, y0, np[1], np[0], ni); + y1 = r1m; + + if (y0 == nm1_0 && y1 == nm1_1) + return true; + if (y0 == one[0] && y1 == one[1]) + return false; + } + return false; +} + +#if HAVE_GMP +static bool +mp_millerrabin (mpz_srcptr n, mpz_srcptr nm1, mpz_ptr x, mpz_ptr y, + mpz_srcptr q, unsigned long int k) +{ + mpz_powm (y, x, q, n); + + if (mpz_cmp_ui (y, 1) == 0 || mpz_cmp (y, nm1) == 0) + return true; + + for (unsigned long int i = 1; i < k; i++) + { + mpz_powm_ui (y, y, 2, n); + if (mpz_cmp (y, nm1) == 0) + return true; + if (mpz_cmp_ui (y, 1) == 0) + return false; + } + return false; +} +#endif + +/* Lucas' prime test. The number of iterations vary greatly, up to a few dozen + have been observed. The average seem to be about 2. */ +static bool +prime_p (uintmax_t n) +{ + int k; + bool is_prime; + uintmax_t a_prim, one, ni; + struct factors factors; + + if (n <= 1) + return false; + + /* We have already casted out small primes. */ + if (n < (uintmax_t) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME) + return true; + + /* Precomputation for Miller-Rabin. */ + uintmax_t q = n - 1; + for (k = 0; (q & 1) == 0; k++) + q >>= 1; + + uintmax_t a = 2; + binv (ni, n); /* ni <- 1/n mod B */ + redcify (one, 1, n); + addmod (a_prim, one, one, n); /* i.e., redcify a = 2 */ + + /* Perform a Miller-Rabin test, finds most composites quickly. */ + if (!millerrabin (n, ni, a_prim, q, k, one)) + return false; + + if (flag_prove_primality) + { + /* Factor n-1 for Lucas. */ + factor (0, n - 1, &factors); + } + + /* Loop until Lucas proves our number prime, or Miller-Rabin proves our + number composite. */ + for (unsigned int r = 0; r < PRIMES_PTAB_ENTRIES; r++) + { + if (flag_prove_primality) + { + is_prime = true; + for (unsigned int i = 0; i < factors.nfactors && is_prime; i++) + { + is_prime + = powm (a_prim, (n - 1) / factors.p[i], n, ni, one) != one; + } } else { - mpz_swap (t, q); - emit_ul_factor (f); - failures = 0; + /* After enough Miller-Rabin runs, be content. */ + is_prime = (r == MR_REPS - 1); } + + if (is_prime) + return true; + + a += primes_diff[r]; /* Establish new base. */ + + /* The following is equivalent to redcify (a_prim, a, n). It runs faster + on most processors, since it avoids udiv_qrnnd. If we go down the + udiv_qrnnd_preinv path, this code should be replaced. */ + { + uintmax_t s1, s0; + umul_ppmm (s1, s0, one, a); + if (LIKELY (s1 == 0)) + a_prim = s0 % n; + else + { + uintmax_t dummy ATTRIBUTE_UNUSED; + udiv_qrnnd (dummy, a_prim, s1, s0, n); + } + } + + if (!millerrabin (n, ni, a_prim, q, k, one)) + return false; } - mpz_clear (q); - mpz_clear (r); + error (0, 0, _("Lucas prime test failure. This should not happen")); + abort (); } -/* The number of Miller-Rabin tests we require. */ -enum { MR_REPS = 25 }; +static bool +prime2_p (uintmax_t n1, uintmax_t n0) +{ + uintmax_t q[2], nm1[2]; + uintmax_t a_prim[2]; + uintmax_t one[2]; + uintmax_t na[2]; + uintmax_t ni; + unsigned int k; + struct factors factors; + + if (n1 == 0) + return prime_p (n0); + + nm1[1] = n1 - (n0 == 0); + nm1[0] = n0 - 1; + if (nm1[0] == 0) + { + count_trailing_zeros (k, nm1[1]); -static void -factor_using_pollard_rho (mpz_t n, int a_int) + q[0] = nm1[1] >> k; + q[1] = 0; + k += W_TYPE_SIZE; + } + else + { + count_trailing_zeros (k, nm1[0]); + rsh2 (q[1], q[0], nm1[1], nm1[0], k); + } + + uintmax_t a = 2; + binv (ni, n0); + redcify2 (one[1], one[0], 1, n1, n0); + addmod2 (a_prim[1], a_prim[0], one[1], one[0], one[1], one[0], n1, n0); + + /* FIXME: Use scalars or pointers in arguments? Some consistency needed. */ + na[0] = n0; + na[1] = n1; + + if (!millerrabin2 (na, ni, a_prim, q, k, one)) + return false; + + if (flag_prove_primality) + { + /* Factor n-1 for Lucas. */ + factor (nm1[1], nm1[0], &factors); + } + + /* Loop until Lucas proves our number prime, or Miller-Rabin proves our + number composite. */ + for (unsigned int r = 0; r < PRIMES_PTAB_ENTRIES; r++) + { + bool is_prime; + uintmax_t e[2], y[2]; + + if (flag_prove_primality) + { + is_prime = true; + if (factors.plarge[1]) + { + uintmax_t pi; + binv (pi, factors.plarge[0]); + e[0] = pi * nm1[0]; + e[1] = 0; + y[0] = powm2 (&y[1], a_prim, e, na, ni, one); + is_prime = (y[0] != one[0] || y[1] != one[1]); + } + for (unsigned int i = 0; i < factors.nfactors && is_prime; i++) + { + /* FIXME: We always have the factor 2. Do we really need to + handle it here? We have done the same powering as part + of millerrabin. */ + if (factors.p[i] == 2) + rsh2 (e[1], e[0], nm1[1], nm1[0], 1); + else + divexact_21 (e[1], e[0], nm1[1], nm1[0], factors.p[i]); + y[0] = powm2 (&y[1], a_prim, e, na, ni, one); + is_prime = (y[0] != one[0] || y[1] != one[1]); + } + } + else + { + /* After enough Miller-Rabin runs, be content. */ + is_prime = (r == MR_REPS - 1); + } + + if (is_prime) + return true; + + a += primes_diff[r]; /* Establish new base. */ + redcify2 (a_prim[1], a_prim[0], a, n1, n0); + + if (!millerrabin2 (na, ni, a_prim, q, k, one)) + return false; + } + + error (0, 0, _("Lucas prime test failure. This should not happen")); + abort (); +} + +#if HAVE_GMP +static bool +mp_prime_p (mpz_t n) { - mpz_t x, x1, y, P; - mpz_t a; - mpz_t g; - mpz_t t1, t2; - int k, l, c; + bool is_prime; + mpz_t q, a, nm1, tmp; + struct mp_factors factors; - debug ("[pollard-rho (%d)] ", a_int); + if (mpz_cmp_ui (n, 1) <= 0) + return false; - mpz_init (g); - mpz_init (t1); - mpz_init (t2); + /* We have already casted out small primes. */ + if (mpz_cmp_ui (n, (long) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME) < 0) + return true; - mpz_init_set_si (a, a_int); - mpz_init_set_si (y, 2); - mpz_init_set_si (x, 2); - mpz_init_set_si (x1, 2); - k = 1; - l = 1; - mpz_init_set_ui (P, 1); - c = 0; + mpz_inits (q, a, nm1, tmp, NULL); - while (mpz_cmp_ui (n, 1) != 0) + /* Precomputation for Miller-Rabin. */ + mpz_sub_ui (nm1, n, 1); + + /* Find q and k, where q is odd and n = 1 + 2**k * q. */ + unsigned long int k = mpz_scan1 (nm1, 0); + mpz_tdiv_q_2exp (q, nm1, k); + + mpz_set_ui (a, 2); + + /* Perform a Miller-Rabin test, finds most composites quickly. */ + if (!mp_millerrabin (n, nm1, a, tmp, q, k)) { -S2: - mpz_mul (x, x, x); mpz_add (x, x, a); mpz_mod (x, x, n); + is_prime = false; + goto ret2; + } + + if (flag_prove_primality) + { + /* Factor n-1 for Lucas. */ + mpz_set (tmp, nm1); + mp_factor (tmp, &factors); + } - mpz_sub (t1, x1, x); mpz_mul (t2, P, t1); mpz_mod (P, t2, n); - c++; - if (c == 20) + /* Loop until Lucas proves our number prime, or Miller-Rabin proves our + number composite. */ + for (unsigned int r = 0; r < PRIMES_PTAB_ENTRIES; r++) + { + if (flag_prove_primality) { - c = 0; - mpz_gcd (g, P, n); - if (mpz_cmp_ui (g, 1) != 0) - goto S4; - mpz_set (y, x); + is_prime = true; + for (unsigned long int i = 0; i < factors.nfactors && is_prime; i++) + { + mpz_divexact (tmp, nm1, factors.p[i]); + mpz_powm (tmp, a, tmp, n); + is_prime = mpz_cmp_ui (tmp, 1) != 0; + } + } + else + { + /* After enough Miller-Rabin runs, be content. */ + is_prime = (r == MR_REPS - 1); } - k--; - if (k > 0) - goto S2; + if (is_prime) + goto ret1; - mpz_gcd (g, P, n); - if (mpz_cmp_ui (g, 1) != 0) - goto S4; + mpz_add_ui (a, a, primes_diff[r]); /* Establish new base. */ - mpz_set (x1, x); - k = l; - l = 2 * l; - unsigned int i; - for (i = 0; i < k; i++) + if (!mp_millerrabin (n, nm1, a, tmp, q, k)) { - mpz_mul (x, x, x); mpz_add (x, x, a); mpz_mod (x, x, n); + is_prime = false; + goto ret1; } - mpz_set (y, x); - c = 0; - goto S2; -S4: + } + + error (0, 0, _("Lucas prime test failure. This should not happen")); + abort (); + + ret1: + if (flag_prove_primality) + mp_factor_clear (&factors); + ret2: + mpz_clears (q, a, nm1, tmp, NULL); + + return is_prime; +} +#endif + +static void +factor_using_pollard_rho (uintmax_t n, unsigned long int a, + struct factors *factors) +{ + uintmax_t x, z, y, P, t, ni, g; + + unsigned long int k = 1; + unsigned long int l = 1; + + redcify (P, 1, n); + addmod (x, P, P, n); /* i.e., redcify(2) */ + y = z = x; + + while (n != 1) + { + assert (a < n); + + binv (ni, n); /* FIXME: when could we use old 'ni' value? */ + + for (;;) + { + do + { + x = mulredc (x, x, n, ni); + addmod (x, x, a, n); + + submod (t, z, x, n); + P = mulredc (P, t, n, ni); + + if (k % 32 == 1) + { + if (gcd_odd (P, n) != 1) + goto factor_found; + y = x; + } + } + while (--k != 0); + + z = x; + k = l; + l = 2 * l; + for (unsigned long int i = 0; i < k; i++) + { + x = mulredc (x, x, n, ni); + addmod (x, x, a, n); + } + y = x; + } + + factor_found: do { - mpz_mul (y, y, y); mpz_add (y, y, a); mpz_mod (y, y, n); - mpz_sub (t1, x1, y); mpz_gcd (g, t1, n); + y = mulredc (y, y, n, ni); + addmod (y, y, a, n); + + submod (t, z, y, n); + g = gcd_odd (t, n); + } + while (g == 1); + + n = n / g; + + if (!prime_p (g)) + factor_using_pollard_rho (g, a + 1, factors); + else + factor_insert (factors, g); + + if (prime_p (n)) + { + factor_insert (factors, n); + break; } - while (mpz_cmp_ui (g, 1) == 0); - mpz_div (n, n, g); /* divide by g, before g is overwritten */ + x = x % n; + z = z % n; + y = y % n; + } +} + +static void +factor_using_pollard_rho2 (uintmax_t n1, uintmax_t n0, unsigned long int a, + struct factors *factors) +{ + uintmax_t x1, x0, z1, z0, y1, y0, P1, P0, t1, t0, ni, g1, g0, r1m; + + unsigned long int k = 1; + unsigned long int l = 1; + + redcify2 (P1, P0, 1, n1, n0); + addmod2 (x1, x0, P1, P0, P1, P0, n1, n0); /* i.e., redcify(2) */ + y1 = z1 = x1; + y0 = z0 = x0; - if (!mpz_probab_prime_p (g, MR_REPS)) + while (n1 != 0 || n0 != 1) + { + binv (ni, n0); + + for (;;) { do { - mp_limb_t a_limb; - mpn_random (&a_limb, (mp_size_t) 1); - a_int = (int) a_limb; + x0 = mulredc2 (&r1m, x1, x0, x1, x0, n1, n0, ni); + x1 = r1m; + addmod2 (x1, x0, x1, x0, 0, a, n1, n0); + + submod2 (t1, t0, z1, z0, x1, x0, n1, n0); + P0 = mulredc2 (&r1m, P1, P0, t1, t0, n1, n0, ni); + P1 = r1m; + + if (k % 32 == 1) + { + g0 = gcd2_odd (&g1, P1, P0, n1, n0); + if (g1 != 0 || g0 != 1) + goto factor_found; + y1 = x1; y0 = x0; + } } - while (a_int == -2 || a_int == 0); + while (--k != 0); - debug ("[composite factor--restarting pollard-rho] "); - factor_using_pollard_rho (g, a_int); + z1 = x1; z0 = x0; + k = l; + l = 2 * l; + for (unsigned long int i = 0; i < k; i++) + { + x0 = mulredc2 (&r1m, x1, x0, x1, x0, n1, n0, ni); + x1 = r1m; + addmod2 (x1, x0, x1, x0, 0, a, n1, n0); + } + y1 = x1; y0 = x0; + } + + factor_found: + do + { + y0 = mulredc2 (&r1m, y1, y0, y1, y0, n1, n0, ni); + y1 = r1m; + addmod2 (y1, y0, y1, y0, 0, a, n1, n0); + + submod2 (t1, t0, z1, z0, y1, y0, n1, n0); + g0 = gcd2_odd (&g1, t1, t0, n1, n0); + } + while (g1 == 0 && g0 == 1); + + if (g1 == 0) + { + /* The found factor is one word. */ + divexact_21 (n1, n0, n1, n0, g0); /* n = n / g */ + + if (!prime_p (g0)) + factor_using_pollard_rho (g0, a + 1, factors); + else + factor_insert (factors, g0); } else { - emit_factor (g); + /* The found factor is two words. This is highly unlikely, thus hard + to trigger. Please be careful before you change this code! */ + uintmax_t ginv; + + binv (ginv, g0); /* Compute n = n / g. Since the result will */ + n0 = ginv * n0; /* fit one word, we can compute the quotient */ + n1 = 0; /* modulo B, ignoring the high divisor word. */ + + if (!prime2_p (g1, g0)) + factor_using_pollard_rho2 (g1, g0, a + 1, factors); + else + factor_insert_large (factors, g1, g0); + } + + if (n1 == 0) + { + if (prime_p (n0)) + { + factor_insert (factors, n0); + break; + } + + factor_using_pollard_rho (n0, a, factors); + return; } - mpz_mod (x, x, n); - mpz_mod (x1, x1, n); - mpz_mod (y, y, n); - if (mpz_probab_prime_p (n, MR_REPS)) + + if (prime2_p (n1, n0)) { - emit_factor (n); + factor_insert_large (factors, n1, n0); break; } - } - mpz_clear (g); - mpz_clear (P); - mpz_clear (t2); - mpz_clear (t1); - mpz_clear (a); - mpz_clear (x1); - mpz_clear (x); - mpz_clear (y); + x0 = mod2 (&x1, x1, x0, n1, n0); + z0 = mod2 (&z1, z1, z0, n1, n0); + y0 = mod2 (&y1, y1, y0, n1, n0); + } } -#else - +#if HAVE_GMP static void -debug (char const *fmt ATTRIBUTE_UNUSED, ...) +mp_factor_using_pollard_rho (mpz_t n, unsigned long int a, + struct mp_factors *factors) { -} + mpz_t x, z, y, P; + mpz_t t, t2; -#endif + debug ("[pollard-rho (%lu)] ", a); -/* The maximum number of factors, including -1, for negative numbers. */ -#define MAX_N_FACTORS (sizeof (uintmax_t) * CHAR_BIT) + mpz_inits (t, t2, NULL); + mpz_init_set_si (y, 2); + mpz_init_set_si (x, 2); + mpz_init_set_si (z, 2); + mpz_init_set_ui (P, 1); -/* The trial divisor increment wheel. Use it to skip over divisors that - are composites of 2, 3, 5, 7, or 11. The part from WHEEL_START up to - WHEEL_END is reused periodically, while the "lead in" is used to test - for those primes and to jump onto the wheel. For more information, see - http://www.utm.edu/research/primes/glossary/WheelFactorization.html */ + unsigned long long int k = 1; + unsigned long long int l = 1; -#include "wheel-size.h" /* For the definition of WHEEL_SIZE. */ -static const unsigned char wheel_tab[] = - { -#include "wheel.h" - }; + while (mpz_cmp_ui (n, 1) != 0) + { + for (;;) + { + do + { + mpz_mul (t, x, x); + mpz_mod (x, t, n); + mpz_add_ui (x, x, a); + + mpz_sub (t, z, x); + mpz_mul (t2, P, t); + mpz_mod (P, t2, n); + + if (k % 32 == 1) + { + mpz_gcd (t, P, n); + if (mpz_cmp_ui (t, 1) != 0) + goto factor_found; + mpz_set (y, x); + } + } + while (--k != 0); -#define WHEEL_START (wheel_tab + WHEEL_SIZE) -#define WHEEL_END (wheel_tab + ARRAY_CARDINALITY (wheel_tab)) + mpz_set (z, x); + k = l; + l = 2 * l; + for (unsigned long long int i = 0; i < k; i++) + { + mpz_mul (t, x, x); + mpz_mod (x, t, n); + mpz_add_ui (x, x, a); + } + mpz_set (y, x); + } -/* FIXME: comment */ + factor_found: + do + { + mpz_mul (t, y, y); + mpz_mod (y, t, n); + mpz_add_ui (y, y, a); -static size_t -factor_wheel (uintmax_t n0, size_t max_n_factors, uintmax_t *factors) -{ - uintmax_t n = n0, d, q; - size_t n_factors = 0; - unsigned char const *w = wheel_tab; + mpz_sub (t, z, y); + mpz_gcd (t, t, n); + } + while (mpz_cmp_ui (t, 1) == 0); - if (n <= 1) - return n_factors; + mpz_divexact (n, n, t); /* divide by t, before t is overwritten */ - /* The exit condition in the following loop is correct because - any time it is tested one of these 3 conditions holds: - (1) d divides n - (2) n is prime - (3) n is composite but has no factors less than d. - If (1) or (2) obviously the right thing happens. - If (3), then since n is composite it is >= d^2. */ + if (!mp_prime_p (t)) + { + debug ("[composite factor--restarting pollard-rho] "); + mp_factor_using_pollard_rho (t, a + 1, factors); + } + else + { + mp_factor_insert (factors, t); + } - d = 2; - do - { - q = n / d; - while (n == q * d) + if (mp_prime_p (n)) { - assert (n_factors < max_n_factors); - factors[n_factors++] = d; - n = q; - q = n / d; + mp_factor_insert (factors, n); + break; } - d += *(w++); - if (w == WHEEL_END) - w = WHEEL_START; + + mpz_mod (x, x, n); + mpz_mod (z, z, n); + mpz_mod (y, y, n); } - while (d <= q); - if (n != 1 || n0 == 1) + mpz_clears (P, t2, t, z, x, y, NULL); +} +#endif + +/* FIXME: Maybe better to use an iteration converging to 1/sqrt(n)? If + algorithm is replaced, consider also returning the remainder. */ +static uintmax_t _GL_ATTRIBUTE_CONST +isqrt (uintmax_t n) +{ + uintmax_t x; + unsigned c; + if (n == 0) + return 0; + + count_leading_zeros (c, n); + + /* Make x > sqrt(n). This will be invariant through the loop. */ + x = (uintmax_t) 1 << ((W_TYPE_SIZE + 1 - c) / 2); + + for (;;) { - assert (n_factors < max_n_factors); - factors[n_factors++] = n; - } + uintmax_t y = (x + n/x) / 2; + if (y >= x) + return x; - return n_factors; + x = y; + } } -/* Single-precision factoring */ -static void -print_factors_single (uintmax_t n) +static uintmax_t _GL_ATTRIBUTE_CONST +isqrt2 (uintmax_t nh, uintmax_t nl) { - uintmax_t factors[MAX_N_FACTORS]; - size_t n_factors = factor_wheel (n, MAX_N_FACTORS, factors); - size_t i; - char buf[INT_BUFSIZE_BOUND (uintmax_t)]; + unsigned int shift; + uintmax_t x; - printf ("%s:", umaxtostr (n, buf)); - for (i = 0; i < n_factors; i++) - printf (" %s", umaxtostr (factors[i], buf)); - putchar ('\n'); + /* Ensures the remainder fits in an uintmax_t. */ + assert (nh < ((uintmax_t) 1 << (W_TYPE_SIZE - 2))); + + if (nh == 0) + return isqrt (nl); + + count_leading_zeros (shift, nh); + shift &= ~1; + + /* Make x > sqrt(n) */ + x = isqrt ( (nh << shift) + (nl >> (W_TYPE_SIZE - shift))) + 1; + x <<= (W_TYPE_SIZE - shift) / 2; + + /* Do we need more than one iteration? */ + for (;;) + { + uintmax_t r ATTRIBUTE_UNUSED; + uintmax_t q, y; + udiv_qrnnd (q, r, nh, nl, x); + y = (x + q) / 2; + + if (y >= x) + { + uintmax_t hi, lo; + umul_ppmm (hi, lo, x + 1, x + 1); + assert (gt2 (hi, lo, nh, nl)); + + umul_ppmm (hi, lo, x, x); + assert (ge2 (nh, nl, hi, lo)); + sub_ddmmss (hi, lo, nh, nl, hi, lo); + assert (hi == 0); + + return x; + } + + x = y; + } } -#if HAVE_GMP -static int -mpcompare (const void *av, const void *bv) +/* MAGIC[N] has a bit i set iff i is a quadratic residue mod N. */ +#define MAGIC64 ((uint64_t) 0x0202021202030213ULL) +#define MAGIC63 ((uint64_t) 0x0402483012450293ULL) +#define MAGIC65 ((uint64_t) 0x218a019866014613ULL) +#define MAGIC11 0x23b + +/* Returns the square root if the input is a square, otherwise 0. */ +static uintmax_t _GL_ATTRIBUTE_CONST +is_square (uintmax_t x) { - mpz_t *const *a = av; - mpz_t *const *b = bv; - return mpz_cmp (**a, **b); + /* Uses the tests suggested by Cohen. Excludes 99% of the non-squares before + computing the square root. */ + if (((MAGIC64 >> (x & 63)) & 1) + && ((MAGIC63 >> (x % 63)) & 1) + /* Both 0 and 64 are squares mod (65) */ + && ((MAGIC65 >> ((x % 65) & 63)) & 1) + && ((MAGIC11 >> (x % 11) & 1))) + { + uintmax_t r = isqrt (x); + if (r*r == x) + return r; + } + return 0; } -static void -sort_and_print_factors (void) +/* invtab[i] = floor(0x10000 / (0x100 + i) */ +static const unsigned short invtab[0x81] = + { + 0x200, + 0x1fc, 0x1f8, 0x1f4, 0x1f0, 0x1ec, 0x1e9, 0x1e5, 0x1e1, + 0x1de, 0x1da, 0x1d7, 0x1d4, 0x1d0, 0x1cd, 0x1ca, 0x1c7, + 0x1c3, 0x1c0, 0x1bd, 0x1ba, 0x1b7, 0x1b4, 0x1b2, 0x1af, + 0x1ac, 0x1a9, 0x1a6, 0x1a4, 0x1a1, 0x19e, 0x19c, 0x199, + 0x197, 0x194, 0x192, 0x18f, 0x18d, 0x18a, 0x188, 0x186, + 0x183, 0x181, 0x17f, 0x17d, 0x17a, 0x178, 0x176, 0x174, + 0x172, 0x170, 0x16e, 0x16c, 0x16a, 0x168, 0x166, 0x164, + 0x162, 0x160, 0x15e, 0x15c, 0x15a, 0x158, 0x157, 0x155, + 0x153, 0x151, 0x150, 0x14e, 0x14c, 0x14a, 0x149, 0x147, + 0x146, 0x144, 0x142, 0x141, 0x13f, 0x13e, 0x13c, 0x13b, + 0x139, 0x138, 0x136, 0x135, 0x133, 0x132, 0x130, 0x12f, + 0x12e, 0x12c, 0x12b, 0x129, 0x128, 0x127, 0x125, 0x124, + 0x123, 0x121, 0x120, 0x11f, 0x11e, 0x11c, 0x11b, 0x11a, + 0x119, 0x118, 0x116, 0x115, 0x114, 0x113, 0x112, 0x111, + 0x10f, 0x10e, 0x10d, 0x10c, 0x10b, 0x10a, 0x109, 0x108, + 0x107, 0x106, 0x105, 0x104, 0x103, 0x102, 0x101, 0x100, + }; + +/* Compute q = [u/d], r = u mod d. Avoids slow hardware division for the case + that q < 0x40; here it instead uses a table of (Euclidian) inverses. */ +#define div_smallq(q, r, u, d) \ + do { \ + if ((u) / 0x40 < (d)) \ + { \ + int _cnt; \ + uintmax_t _dinv, _mask, _q, _r; \ + count_leading_zeros (_cnt, (d)); \ + _r = (u); \ + if (UNLIKELY (_cnt > (W_TYPE_SIZE - 8))) \ + { \ + _dinv = invtab[((d) << (_cnt + 8 - W_TYPE_SIZE)) - 0x80]; \ + _q = _dinv * _r >> (8 + W_TYPE_SIZE - _cnt); \ + } \ + else \ + { \ + _dinv = invtab[((d) >> (W_TYPE_SIZE - 8 - _cnt)) - 0x7f]; \ + _q = _dinv * (_r >> (W_TYPE_SIZE - 3 - _cnt)) >> 11; \ + } \ + _r -= _q*(d); \ + \ + _mask = -(uintmax_t) (_r >= (d)); \ + (r) = _r - (_mask & (d)); \ + (q) = _q - _mask; \ + assert ( (q) * (d) + (r) == u); \ + } \ + else \ + { \ + uintmax_t _q = (u) / (d); \ + (r) = (u) - _q * (d); \ + (q) = _q; \ + } \ + } while (0) + +/* Notes: Example N = 22117019. After first phase we find Q1 = 6314, Q + = 3025, P = 1737, representing F_{18} = (-6314, 2* 1737, 3025), + with 3025 = 55^2. + + Constructing the square root, we get Q1 = 55, Q = 8653, P = 4652, + representing G_0 = (-55, 2*4652, 8653). + + In the notation of the paper: + + S_{-1} = 55, S_0 = 8653, R_0 = 4652 + + Put + + t_0 = floor([q_0 + R_0] / S0) = 1 + R_1 = t_0 * S_0 - R_0 = 4001 + S_1 = S_{-1} +t_0 (R_0 - R_1) = 706 +*/ + +/* Multipliers, in order of efficiency: + 0.7268 3*5*7*11 = 1155 = 3 (mod 4) + 0.7317 3*5*7 = 105 = 1 + 0.7820 3*5*11 = 165 = 1 + 0.7872 3*5 = 15 = 3 + 0.8101 3*7*11 = 231 = 3 + 0.8155 3*7 = 21 = 1 + 0.8284 5*7*11 = 385 = 1 + 0.8339 5*7 = 35 = 3 + 0.8716 3*11 = 33 = 1 + 0.8774 3 = 3 = 3 + 0.8913 5*11 = 55 = 3 + 0.8972 5 = 5 = 1 + 0.9233 7*11 = 77 = 1 + 0.9295 7 = 7 = 3 + 0.9934 11 = 11 = 3 +*/ +#define QUEUE_SIZE 50 + +#if STAT_SQUFOF +# define Q_FREQ_SIZE 50 +/* Element 0 keeps the total */ +static unsigned int q_freq[Q_FREQ_SIZE + 1]; +# define MIN(a,b) ((a) < (b) ? (a) : (b)) +#endif + +/* Return true on success. Expected to fail only for numbers + >= 2^{2*W_TYPE_SIZE - 2}, or close to that limit. */ +static bool +factor_using_squfof (uintmax_t n1, uintmax_t n0, struct factors *factors) { - mpz_t **faclist; - size_t i; + /* Uses algorithm and notation from + + SQUARE FORM FACTORIZATION + JASON E. GOWER AND SAMUEL S. WAGSTAFF, JR. + + http://homes.cerias.purdue.edu/~ssw/squfof.pdf + */ + + static const unsigned int multipliers_1[] = + { /* = 1 (mod 4) */ + 105, 165, 21, 385, 33, 5, 77, 1, 0 + }; + static const unsigned int multipliers_3[] = + { /* = 3 (mod 4) */ + 1155, 15, 231, 35, 3, 55, 7, 11, 0 + }; + + const unsigned int *m; + + struct { uintmax_t Q; uintmax_t P; } queue[QUEUE_SIZE]; + + if (n1 >= ((uintmax_t) 1 << (W_TYPE_SIZE - 2))) + return false; - faclist = xcalloc (nfactors_found, sizeof *faclist); - for (i = 0; i < nfactors_found; ++i) + uintmax_t sqrt_n = isqrt2 (n1, n0); + + if (n0 == sqrt_n * sqrt_n) { - faclist[i] = &factor[i]; + uintmax_t p1, p0; + + umul_ppmm (p1, p0, sqrt_n, sqrt_n); + assert (p0 == n0); + + if (n1 == p1) + { + if (prime_p (sqrt_n)) + factor_insert_multiplicity (factors, sqrt_n, 2); + else + { + struct factors f; + + f.nfactors = 0; + if (!factor_using_squfof (0, sqrt_n, &f)) + { + /* Try pollard rho instead */ + factor_using_pollard_rho (sqrt_n, 1, &f); + } + /* Duplicate the new factors */ + for (unsigned int i = 0; i < f.nfactors; i++) + factor_insert_multiplicity (factors, f.p[i], 2*f.e[i]); + } + return true; + } } - qsort (faclist, nfactors_found, sizeof *faclist, mpcompare); - for (i = 0; i < nfactors_found; ++i) + /* Select multipliers so we always get n * mu = 3 (mod 4) */ + for (m = (n0 % 4 == 1) ? multipliers_3 : multipliers_1; + *m; m++) { - fputc (' ', stdout); - mpz_out_str (stdout, 10, *faclist[i]); + uintmax_t S, Dh, Dl, Q1, Q, P, L, L1, B; + unsigned int i; + unsigned int mu = *m; + unsigned int qpos = 0; + + assert (mu * n0 % 4 == 3); + + /* In the notation of the paper, with mu * n == 3 (mod 4), we + get \Delta = 4 mu * n, and the paper's \mu is 2 mu. As far as + I understand it, the necessary bound is 4 \mu^3 < n, or 32 + mu^3 < n. + + However, this seems insufficient: With n = 37243139 and mu = + 105, we get a trivial factor, from the square 38809 = 197^2, + without any corresponding Q earlier in the iteration. + + Requiring 64 mu^3 < n seems sufficient. */ + if (n1 == 0) + { + if ((uintmax_t) mu*mu*mu >= n0 / 64) + continue; + } + else + { + if (n1 > ((uintmax_t) 1 << (W_TYPE_SIZE - 2)) / mu) + continue; + } + umul_ppmm (Dh, Dl, n0, mu); + Dh += n1 * mu; + + assert (Dl % 4 != 1); + assert (Dh < (uintmax_t) 1 << (W_TYPE_SIZE - 2)); + + S = isqrt2 (Dh, Dl); + + Q1 = 1; + P = S; + + /* Square root remainder fits in one word, so ignore high part. */ + Q = Dl - P*P; + /* FIXME: When can this differ from floor(sqrt(2 sqrt(D)))? */ + L = isqrt (2*S); + B = 2*L; + L1 = mu * 2 * L; + + /* The form is (+/- Q1, 2P, -/+ Q), of discriminant 4 (P^2 + Q Q1) = + 4 D. */ + + for (i = 0; i <= B; i++) + { + uintmax_t q, P1, t, rem; + + div_smallq (q, rem, S+P, Q); + P1 = S - rem; /* P1 = q*Q - P */ + +#if STAT_SQUFOF + assert (q > 0); + q_freq[0]++; + q_freq[MIN (q, Q_FREQ_SIZE)]++; +#endif + + if (Q <= L1) + { + uintmax_t g = Q; + + if ( (Q & 1) == 0) + g /= 2; + + g /= gcd_odd (g, mu); + + if (g <= L) + { + if (qpos >= QUEUE_SIZE) + error (EXIT_FAILURE, 0, _("squfof queue overflow")); + queue[qpos].Q = g; + queue[qpos].P = P % g; + qpos++; + } + } + + /* I think the difference can be either sign, but mod + 2^W_TYPE_SIZE arithmetic should be fine. */ + t = Q1 + q * (P - P1); + Q1 = Q; + Q = t; + P = P1; + + if ( (i & 1) == 0) + { + uintmax_t r = is_square (Q); + if (r) + { + for (unsigned int j = 0; j < qpos; j++) + { + if (queue[j].Q == r) + { + if (r == 1) + /* Traversed entire cycle. */ + goto next_multiplier; + + /* Need the absolute value for divisibility test. */ + if (P >= queue[j].P) + t = P - queue[j].P; + else + t = queue[j].P - P; + if (t % r == 0) + { + /* Delete entries up to and including entry + j, which matched. */ + memmove (queue, queue + j + 1, + (qpos - j - 1) * sizeof (queue[0])); + qpos -= (j + 1); + } + goto next_i; + } + } + + /* We have found a square form, which should give a + factor. */ + Q1 = r; + assert (S >= P); /* What signs are possible? */ + P += r * ((S - P) / r); + + /* Note: Paper says (N - P*P) / Q1, that seems incorrect + for the case D = 2N. */ + /* Compute Q = (D - P*P) / Q1, but we need double + precision. */ + uintmax_t hi, lo; + umul_ppmm (hi, lo, P, P); + sub_ddmmss (hi, lo, Dh, Dl, hi, lo); + udiv_qrnnd (Q, rem, hi, lo, Q1); + assert (rem == 0); + + for (;;) + { + /* Note: There appears to by a typo in the paper, + Step 4a in the algorithm description says q <-- + floor([S+P]/\hat Q), but looking at the equations + in Sec. 3.1, it should be q <-- floor([S+P] / Q). + (In this code, \hat Q is Q1). */ + div_smallq (q, rem, S+P, Q); + P1 = S - rem; /* P1 = q*Q - P */ + +#if STAT_SQUFOF + q_freq[0]++; + q_freq[MIN (q, Q_FREQ_SIZE)]++; +#endif + if (P == P1) + break; + t = Q1 + q * (P - P1); + Q1 = Q; + Q = t; + P = P1; + } + + if ( (Q & 1) == 0) + Q /= 2; + Q /= gcd_odd (Q, mu); + + assert (Q > 1 && (n1 || Q < n0)); + + if (prime_p (Q)) + factor_insert (factors, Q); + else if (!factor_using_squfof (0, Q, factors)) + factor_using_pollard_rho (Q, 2, factors); + + divexact_21 (n1, n0, n1, n0, Q); + + if (prime2_p (n1, n0)) + factor_insert_large (factors, n1, n0); + else + { + if (!factor_using_squfof (n1, n0, factors)) + { + if (n1 == 0) + factor_using_pollard_rho (n0, 1, factors); + else + factor_using_pollard_rho2 (n1, n0, 1, factors); + } + } + + return true; + } + } + next_i:; + } + next_multiplier:; } - putchar ('\n'); - free (faclist); + return false; } static void -free_factors (void) +factor (uintmax_t t1, uintmax_t t0, struct factors *factors) { - size_t i; + factors->nfactors = 0; + factors->plarge[1] = 0; + + if (t1 == 0 && t0 < 2) + return; + + t0 = factor_using_division (&t1, t1, t0, factors); + + if (t1 == 0 && t0 < 2) + return; - for (i = 0; i < nfactors_found; ++i) + if (prime2_p (t1, t0)) + factor_insert_large (factors, t1, t0); + else { - mpz_clear (factor[i]); + if (alg == ALG_SQUFOF) + if (factor_using_squfof (t1, t0, factors)) + return; + + if (t1 == 0) + factor_using_pollard_rho (t0, 1, factors); + else + factor_using_pollard_rho2 (t1, t0, 1, factors); } - /* Don't actually free factor[] because in the case where we are - reading numbers from stdin, we may be about to use it again. */ - nfactors_found = 0; } -/* Arbitrary-precision factoring */ +#if HAVE_GMP static void -print_factors_multi (mpz_t t) +mp_factor (mpz_t t, struct mp_factors *factors) { - mpz_out_str (stdout, 10, t); - putchar (':'); + mp_factor_init (factors); if (mpz_sgn (t) != 0) { - /* Set the trial division limit according to the size of t. */ - size_t n_bits = mpz_sizeinbase (t, 2); - unsigned int division_limit = MIN (n_bits, 1000); - division_limit *= division_limit; - - factor_using_division (t, division_limit); + mp_factor_using_division (t, factors); if (mpz_cmp_ui (t, 1) != 0) { debug ("[is number prime?] "); - if (mpz_probab_prime_p (t, MR_REPS)) - emit_factor (t); + if (mp_prime_p (t)) + mp_factor_insert (factors, t); else - factor_using_pollard_rho (t, 1); + mp_factor_using_pollard_rho (t, 1, factors); } } - - mpz_clear (t); - sort_and_print_factors (); - free_factors (); } #endif +static strtol_error +strto2uintmax (uintmax_t *hip, uintmax_t *lop, const char *s) +{ + unsigned int lo_carry; + uintmax_t hi, lo; + + strtol_error err; + + hi = lo = 0; + for (;;) + { + unsigned int c = *s++; + if (c == 0) + break; + + if (UNLIKELY (!ISDIGIT (c))) + { + err = LONGINT_INVALID; + break; + } + c -= '0'; + + err = LONGINT_OK; /* we've seen at least one valid digit */ + + if (UNLIKELY (hi > ~(uintmax_t)0 / 10)) + { + err = LONGINT_OVERFLOW; + break; + } + hi = 10 * hi; + + lo_carry = (lo >> (W_TYPE_SIZE - 3)) + (lo >> (W_TYPE_SIZE - 1)); + lo_carry += 10 * lo < 2 * lo; + + lo = 10 * lo; + lo += c; + + lo_carry += lo < c; + hi += lo_carry; + if (UNLIKELY (hi < lo_carry)) + { + err = LONGINT_OVERFLOW; + break; + } + } + + *hip = hi; + *lop = lo; + + return err; +} + +static void +print_uintmaxes (uintmax_t t1, uintmax_t t0) +{ + uintmax_t q, r; + + if (t1 == 0) + printf ("%ju", t0); + else + { + /* Use very plain code here since it seems hard to write fast code + without assuming a specific word size. */ + q = t1 / 1000000000; + r = t1 % 1000000000; + udiv_qrnnd (t0, r, r, t0, 1000000000); + print_uintmaxes (q, t0); + printf ("%09u", (int) r); + } +} + +/* Single-precision factoring */ +static void +print_factors_single (uintmax_t t1, uintmax_t t0) +{ + struct factors factors; + + print_uintmaxes (t1, t0); + putchar (':'); + + factor (t1, t0, &factors); + + for (unsigned int j = 0; j < factors.nfactors; j++) + for (unsigned int k = 0; k < factors.e[j]; k++) + printf (" %ju", factors.p[j]); + + if (factors.plarge[1]) + { + putchar (' '); + print_uintmaxes (factors.plarge[1], factors.plarge[0]); + } + putchar ('\n'); +} /* Emit the factors of the indicated number. If we have the option of using either algorithm, we select on the basis of the length of the number. @@ -434,58 +2288,55 @@ print_factors_multi (mpz_t t) has enough digits, because the algorithm is better. The turnover point depends on the value. */ static bool -print_factors (char const *s) +print_factors (const char *input) { - uintmax_t n; - strtol_error err = xstrtoumax (s, NULL, 10, &n, ""); + uintmax_t t1, t0; -#if HAVE_GMP - enum { GMP_TURNOVER_POINT = 100000 }; + /* Try converting the number to one or two words. If it fails, use GMP or + print an error message. The 2nd condition checks that the most + significant bit of the two-word number is clear, in a typesize neutral + way. */ + strtol_error err = strto2uintmax (&t1, &t0, input); - if (err == LONGINT_OVERFLOW - || (err == LONGINT_OK && GMP_TURNOVER_POINT <= n)) + switch (err) { - mpz_t t; - mpz_init (t); - if (gmp_sscanf (s, "%Zd", t) == 1) + case LONGINT_OK: + if (((t1 << 1) >> 1) == t1) { - debug ("[%s]", _("using arbitrary-precision arithmetic")); - print_factors_multi (t); + debug ("[%s]", _("using single-precision arithmetic")); + print_factors_single (t1, t0); return true; } - err = LONGINT_INVALID; - } -#endif - - switch (err) - { - case LONGINT_OK: - debug ("[%s]", _("using single-precision arithmetic")); - print_factors_single (n); - return true; - - case LONGINT_OVERFLOW: - error (0, 0, _("%s is too large"), quote (s)); - return false; + break; default: - error (0, 0, _("%s is not a valid positive integer"), quote (s)); + error (0, 0, _("%s is not a valid positive integer"), quote (input)); return false; } -} -enum -{ - VERBOSE_OPTION = CHAR_MAX + 1 -}; +#if HAVE_GMP + debug ("[%s]", _("using arbitrary-precision arithmetic")); + mpz_t t; + struct mp_factors factors; -static struct option const long_options[] = -{ - {"verbose", no_argument, NULL, VERBOSE_OPTION}, - {GETOPT_HELP_OPTION_DECL}, - {GETOPT_VERSION_OPTION_DECL}, - {NULL, 0, NULL, 0} -}; + mpz_init_set_str (t, input, 10); + + gmp_printf ("%Zd:", t); + mp_factor (t, &factors); + + for (unsigned int j = 0; j < factors.nfactors; j++) + for (unsigned int k = 0; k < factors.e[j]; k++) + gmp_printf (" %Zd", factors.p[j]); + + mp_factor_clear (&factors); + mpz_clear (t); + putchar ('\n'); + return true; +#else + error (0, 0, _("%s is too large"), quote (input)); + return false; +#endif +} void usage (int status) @@ -535,9 +2386,6 @@ do_stdin (void) int main (int argc, char **argv) { - bool ok; - int c; - initialize_main (&argc, &argv); set_program_name (argv[0]); setlocale (LC_ALL, ""); @@ -546,12 +2394,23 @@ main (int argc, char **argv) atexit (close_stdout); + alg = ALG_POLLARD_RHO; /* Default to Pollard rho */ + + int c; while ((c = getopt_long (argc, argv, "", long_options, NULL)) != -1) { switch (c) { case VERBOSE_OPTION: - verbose = true; + verbose = 1; + break; + + case 's': + alg = ALG_SQUFOF; + break; + + case 'w': + flag_prove_primality = false; break; case_GETOPT_HELP_CHAR; @@ -563,18 +2422,36 @@ main (int argc, char **argv) } } +#if STAT_SQUFOF + if (alg == ALG_SQUFOF) + memset (q_freq, 0, sizeof (q_freq)); +#endif + + bool ok; if (argc <= optind) ok = do_stdin (); else { - int i; ok = true; - for (i = optind; i < argc; i++) + for (int i = optind; i < argc; i++) if (! print_factors (argv[i])) ok = false; } -#if HAVE_GMP - free (factor); + +#if STAT_SQUFOF + if (alg == ALG_SQUFOF && q_freq[0] > 0) + { + double acc_f; + printf ("q freq. cum. freq.(total: %d)\n", q_freq[0]); + for (unsigned int i = 1, acc_f = 0.0; i <= Q_FREQ_SIZE; i++) + { + double f = (double) q_freq[i] / q_freq[0]; + acc_f += f; + printf ("%s%d %.2f%% %.2f%%\n", i == Q_FREQ_SIZE ? ">=" : "", i, + 100.0 * f, 100.0 * acc_f); + } + } #endif + exit (ok ? EXIT_SUCCESS : EXIT_FAILURE); } diff --git a/src/local.mk b/src/local.mk index 98259fa80..6a01ef1c7 100644 --- a/src/local.mk +++ b/src/local.mk @@ -35,7 +35,10 @@ bin_PROGRAMS = @bin_PROGRAMS@ pkglibexec_PROGRAMS = @pkglibexec_PROGRAMS@ # Needed by the testsuite. -noinst_PROGRAMS = src/setuidgid src/getlimits +noinst_PROGRAMS = \ + src/getlimits \ + src/make-prime-list \ + src/setuidgid noinst_HEADERS = \ src/chown-core.h \ @@ -48,20 +51,18 @@ noinst_HEADERS = \ src/fs-is-local.h \ src/group-list.h \ src/ioblksize.h \ + src/longlong.h \ src/ls.h \ src/operand2sig.h \ src/prog-fprintf.h \ src/remove.h \ src/system.h \ - src/wheel-size.h \ - src/wheel.h \ src/uname.h EXTRA_DIST += \ src/dcgen \ src/dircolors.hin \ src/tac-pipe.c \ - src/wheel-gen.pl \ src/extract-magic \ src/c99-to-c89.diff @@ -134,6 +135,12 @@ src_link_LDADD = $(LDADD) src_ln_LDADD = $(LDADD) src_logname_LDADD = $(LDADD) src_ls_LDADD = $(LDADD) + +# This must *not* depend on anything in lib/, since it is used to generate +# src/primes.h. If it depended on libcoreutils.a, that would pull all lib/*.c +# into BUILT_SOURCES. +src_make_prime_list_LDADD = + src_md5sum_LDADD = $(LDADD) src_mkdir_LDADD = $(LDADD) src_mkfifo_LDADD = $(LDADD) @@ -371,19 +378,11 @@ src/dircolors.h: src/dcgen src/dircolors.hin $(AM_V_at)chmod a-w $@-t $(AM_V_at)mv $@-t $@ -wheel_size = 5 - -BUILT_SOURCES += src/wheel-size.h -src/wheel-size.h: Makefile.am - $(AM_V_GEN)rm -f $@ $@-t - $(AM_V_at)echo '#define WHEEL_SIZE $(wheel_size)' > $@-t - $(AM_V_at)chmod a-w $@-t - $(AM_V_at)mv $@-t $@ - -BUILT_SOURCES += src/wheel.h -src/wheel.h: src/wheel-gen.pl Makefile.am +BUILT_SOURCES += src/primes.h +CLEANFILES += src/primes.h +src/primes.h: src/make-prime-list $(AM_V_GEN)rm -f $@ $@-t - $(AM_V_at)$(srcdir)/src/wheel-gen.pl $(wheel_size) > $@-t + $(AM_V_at)src/make-prime-list 5000 > $@-t $(AM_V_at)chmod a-w $@-t $(AM_V_at)mv $@-t $@ diff --git a/src/longlong.h b/src/longlong.h index 84e32ee46..510f40ef2 100644 --- a/src/longlong.h +++ b/src/longlong.h @@ -1,7 +1,7 @@ /* longlong.h -- definitions for mixed size 32/64 bit arithmetic. Copyright 1991, 1992, 1993, 1994, 1996, 1997, 1999, 2000, 2001, 2002, 2003, -2004, 2005, 2007, 2008, 2009, 2011 Free Software Foundation, Inc. +2004, 2005, 2007, 2008, 2009, 2011, 2012 Free Software Foundation, Inc. This file is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free @@ -2055,6 +2055,10 @@ extern UWtype mpn_udiv_qrnnd_r (UWtype, UWtype, UWtype, UWtype *); __GMP_DECLSPEC UWtype __MPN(udiv_w_sdiv) (UWtype *, UWtype, UWtype, UWtype); #endif +#ifndef __GMP_DECLSPEC +#define __GMP_DECLSPEC /* empty */ +#endif + /* If udiv_qrnnd was not defined for this processor, use __udiv_qrnnd_c. */ #if !defined (udiv_qrnnd) #define UDIV_NEEDS_NORMALIZATION 1 diff --git a/src/wheel-gen.pl b/src/wheel-gen.pl deleted file mode 100755 index 65b60d12d..000000000 --- a/src/wheel-gen.pl +++ /dev/null @@ -1,114 +0,0 @@ -#!/usr/bin/perl -w -# Generate the spokes of a wheel, for wheel factorization. - -# Copyright (C) 2001-2012 Free Software Foundation, Inc. - -# This program is free software: you can redistribute it and/or modify -# it under the terms of the GNU General Public License as published by -# the Free Software Foundation, either version 3 of the License, or -# (at your option) any later version. - -# This program is distributed in the hope that it will be useful, -# but WITHOUT ANY WARRANTY; without even the implied warranty of -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -# GNU General Public License for more details. - -# You should have received a copy of the GNU General Public License -# along with this program. If not, see <http://www.gnu.org/licenses/>. - -eval 'exec /usr/bin/perl -S $0 ${1+"$@"}' - if 0; - -use strict; -(my $ME = $0) =~ s|.*/||; - -# A global destructor to close standard output with error checking. -sub END -{ - defined fileno STDOUT - or return; - close STDOUT - and return; - warn "$ME: closing standard output: $!\n"; - $? ||= 1; -} - -sub is_prime ($) -{ - my ($n) = @_; - use integer; - - $n == 2 - and return 1; - - my $d = 2; - my $w = 1; - while (1) - { - my $q = $n / $d; - $n == $q * $d - and return 0; - $d += $w; - $q < $d - and last; - $w = 2; - } - return 1; -} - -{ - @ARGV == 1 - or die "$ME: missing argument\n"; - - my $wheel_size = $ARGV[0]; - - my @primes = (2); - my $product = $primes[0]; - my $n_primes = 1; - for (my $i = 3; ; $i += 2) - { - if (is_prime $i) - { - push @primes, $i; - $product *= $i; - ++$n_primes == $wheel_size - and last; - } - } - - my $ws_m1 = $wheel_size - 1; - print <<EOF; -/* The first $ws_m1 elements correspond to the incremental offsets of the - first $wheel_size primes (@primes). The $wheel_size(th) element is the - difference between that last prime and the next largest integer - that is not a multiple of those primes. The remaining numbers - define the wheel. For more information, see - http://www.utm.edu/research/primes/glossary/WheelFactorization.html. */ -EOF - - my @increments; - my $prev = 2; - for (my $i = 3; ; $i += 2) - { - my $rel_prime = 1; - foreach my $divisor (@primes) - { - $i != $divisor && $i % $divisor == 0 - and $rel_prime = 0; - } - - if ($rel_prime) - { - #warn $i, ' ', $i - $prev, "\n"; - push @increments, $i - $prev; - $prev = $i; - - $product + 1 < $i - and last; - } - } - - print join (",\n", @increments), "\n"; - - exit 0; -} |