Jun 02, 2021 Article blog
Quake-III Arena is one of the classic games of the '90s. T he series of games not only has good graphics and content, but also runs extremely smoothly even when the computer configuration is low. T hat's thanks to John Carmack, the developer of its 3D engine. I n fact, back in the DOS era of the early 1990s, when it was amazing to be able to do a little animation on a PC, John Carmack launched the stone-breaking Wolf Castle Castle, and then the re-inspiration, doom, doomII, Quake... P ush 3-D technology to the extreme every time. H is 3D engine code is extremely efficient, pressing almost every instruction on the PC. At the beginning, MS Direct3D had to listen to him and modify a lot of APIs.
RECENTLY, QUAKE DEVELOPER ID SOFTWARE COMPLIED WITH THE GPL PROTOCOL AND RELEASED THE ORIGINAL CODE FOR QUADE-III, GIVING THE WORLD THE PRIVILEGE OF WITNESSING THE ORIGINAL CODE OF CARMACK'S LEGENDARY 3D ENGINE.
This is the download address of the original QUKE-III code:
http://www.fileshack.com/file.x?fid=7547
(Below is the official download URL, search "quake3-1.32b-source.zip" to find a whole bunch of Chinese pages.)
ftp://ftp.idsoftware.com/idstuff/source/quake3-1.32b-source.zip)
We know that the lower the function, the more frequently it is called. T he 3D engine is, in the final analysis, mathematical. T hen finding the lowest mathematical function (in game/code/q_math.c) must be well-written. There are a lot of interesting functions, many of which are amazing, and we can't learn them for years.
Such a piece of code was found in the game/code/q_math.c. Its purpose is to square and take down a number, and the code has been tested to be four times faster than (float) (1.0/sqrt(x)::
float Q_rsqrt( float number )
{
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number 0.5F;
y = number;
i = ( long ) &y; evil floating point bit level hacking
i = 0x5f3759df - ( i >> 1 ); what the fuck?
y = ( float ) &i;
y = y ( threehalfs - ( x2 y y ) ); 1st iteration
y = y ( threehalfs - ( x2 y y ) ); 2nd iteration, this can be removed
#ifndef Q3_VM
#ifdef linux
assert( !isnan(y) ); bk010122 - FPE?
#endif
#endif
return y;
}
The function returns 1/sqrt(x), which is more useful in image processing than sqrt(x).
Notice that this function uses only one overlay! ( In fact, there is no use for stacking, direct operation). C ompilation, experimentation, this function not only works very well, but also 4 times faster than the standard sqrt() function! You know, the compiler comes with its own functions, but after a rigorous and careful compilation optimization ah!
This concise function, at its core and most puzzling, is labeled "what the fuck?" a sentence
i = 0x5f3759df - ( i >> 1 );
Plus y s y ( threehalfs - ( x2 y y ) );
Two sentences to complete the opening operation! A nd notice that the core sentence is fixed-point shift operation, very fast! This is extremely efficient especially on many RISC structure CPUs that do not have multiplication instructions.
The principle of the algorithm is not complicated, that is, Newton iterative method, with x-f(x)/f'(x) to constantly approach the root of f(x)-a.
Simply put, for example, looking for square roots, f(x) x2, a, f'(x), 2 x, f (x)/f'(x) x/2, and f(x) in
x-f (x)/f'(x) after there is (x-a/x)/2, now we choose a-5, choose a guess value such as 2,
Then we can do that
5/2 = 2.5; ( 2.5+2)/2 = 2.25; 5 /2.25 = xxx; (2.25+xxx)/2 = xxxx ...
This iteration goes on, the result must converge in sqrt (5), yes, the general square root is so calculated
But the real bull B of The Quake3 author is that he chose a mysterious constant 0x5f3759df to calculate that guess
That's the line we annotated, and the line calculates a value very close to 1/sqrt(n) so that we only need 2 iterations to achieve the precision we need.
Well, if this isn't NB yet, then look:
Chris Lomont, a mathematician at Purdue University, was amused when he saw it and decided to study what Carmack had come up with
What's the mystery of this guess? Lomont is also a cowman, and after careful study, he theoretically deduces one
The best guess, and Carmack's numbers are very close, 0x5f37642f. Carmack' real cow, is he an alien?
The saga doesn't end here. Lomont was very satisfied with the results, so he took the starting point of his own calculations
Values and Carmack's mysterious numbers do the race to see whose numbers can find square roots faster and more accurately. The result is
Carmack won... No one knows how Carmack found the number.
Finally Lomont became angry, using violent methods to try one number after another, and finally found a bikamak number
The word is better than a small number, although in fact the results of these two numbers are very similar to this storm
The figure from the force is 0x5f375a86.
Lomont wrote a paper for this, "Fast Inverse Square Root."
Paper download address:
http://www.math.purdue.edu/~clomont/Math/Papers/2003/InvSqrt.pdf
http://www.matrix67.com/data/InvSqrt.pdf
reference:
Finally, the most streamlined 1/sqrt() function is given:
float InvSqrt(float x)
{
float xhalf = 0.5f x;
int i = (int )&x; get bits for floating VALUE
i = 0x5f375a86- (i>>1); gives initial guess y0
x = (float )&i; // convert bits BACK to float
x = x (1.5f-xhalf x x); // Newton step, repeating increases accuracy
return x;
}
You can try compiling and experimenting on PC, 51, AVR, 430, ARM, and be surprised by its efficiency.
。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。
Wikipedia Reference:
Paper: http://www.daxia.com/bibis/upload/406Fast_Inverse_Square_Root.pdf
The above is A description of R's existence;
。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。
The following is the calculation of R
http://www.guokr.com/post/91389/
Basics 1:i>>1
Operation i>>1 means moving the binary number i one bit to the right, i.e. deleting the last bit and adding 0 in the first digit
Notice that we delete the last decimal number M of an n-bit and turn it into n-1 bits, which can be seen as dividing the decimal number by 10 and then rounding the floor (M/10) down
Similarly, to say that a binary number deletes the last digit is equivalent to dividing the number by 2 and rounding the floor (M/2) down
This looks like i>>1 is like floor (i/2), because the first derivative of the function f(x) =1/sqrt (x) is -1/2 (x)-3/2, just in front of a -1/2, and one can't help but feel that 0x5f3759df - (i >> 1) is the function 1/sqrt (x) in a certain order to expand Taylor. Here's a passage from Fast Inverse Square Root:
Eberly’s explanation was that this produced linear approximation, but is incorrect; we’ll see the guess is piecewise linear, and the function being approximated is not the same in all cases.
"Eberly's explanation is that it's a linear approximation, but it's not true. W e'll see that this estimate is segmented linear, and this approximation function is not the same in every situation. ”
Why do you say that? Basics 2: Floating-point storage is required here
The storage of various types of floats can be understood by looking at IEEE745 (not knowing what it is at all).
The 32-bit single-precision float is used here, and is always a positive number, expressed as:
0| E | M
0 represents a positive number
E is an index, and E is equivalent to 2^-127
M represents a number with an absolute value less than 1, but notes that one bit is omitted here. That is, when converting bit decimals, it should be expressed as (1 plus M)
Then the number after conversion should be: (1 plus M) 2^ (E-127)
So we find that in fact i>>1 is not exactly floor (i/2) but turns a number into (floor (M/2) plus 1) and 2(floor(E-127)/2)
And according to E's parity tail may need to be added 1/2
It may be useful to make e-E-127, noting that 1/sqrt(x) is to make the index of the original number become -e/2, so that carmack may simply want to produce an e/2 and use displacement, the next step is to find a subtraction to produce -e/2 and make the mantissa as close as possible to (1 plus M) - 1/2
Since this number must be positive, it is assumed that the value is:
0| R 1| R2
The next step is to discuss the situation:
Assuming E is even, the right shift of the exponent at this time does not affect the value of the mantail:
At this time e is odd, so that e is 2d1
Then the exponential part becomes:
Note that the phase subtraction here actually converts the floating point into an integer (i.e. regularization) and then subtracts it, rather than the normal floating point addition or subtraction.
Since the initial value must be positive, we need the upper formula to be positive, because 0 is < e< is 256, so r1 > is 128
Because the E we're talking about is even, that is, the last digit must be 0, so don't consider the effect of his right shift on M, so the result of the two minus is:
Note that M/2 is used here instead of floor (M/2) because this little error is too small compared to other errors
Of course, there is also a situation, that is, R2 <m p "" 2, in fact, binary additions and subtractions and decimals are similar, if the tail number is small, then it is like a higher number of "debits", that is, in this case subtracted the result is: < "" >
We define:
Then we can combine the two cases into one function:
This function is our approximation of function 1/sqrt(x), so our goal is to make the relative error of this function | (y-y0)/y | as small as possible:
So we get an error equation:
afterwards:
Note that 1/8 here is actually a pieced together, the specific method can first assume a positive number less than one a, because 0<r2 < 1,0 < m <1, we can expand to get r1 about an interval of a. K eep a as small as possible so that the range of this interval is less than one. A ccording to the characteristic that r1 must be an integer, we can determine r1 with the smallest error. H ere it is concluded that r1 is 190, brought into the hex and moved to the right (note that there is a symbol at the beginning of 0) is the 0x5f, which is exactly the first few of the black magic constants. & lt; p="">
What if E is odd? In fact, if E is odd, then M/2 needs to add 1/2 (the first bit of the tail is equivalent to 1/2), according to the same method, we can get another relative error function:
thereinto:
Interested can calculate how much R1 should be in this case, the author is very lazy to say that because the need to allow constants to be applied to both cases, so let R1 - 190.
Then is to determine the value of R2, all kinds of segmented discussion, too tangled we do not look (in the end, anyway, also not right, stall), determined down to about 0.45, and then through the software to calculate the optimal solution.