Optimising+integer+division

//by Richard Russell, July 2010//

The integer division instructions **div** (unsigned) and **idiv** (signed) are the slowest of all the integer instructions, sometimes taking tens of clock cycles. When the divisor is a constant (that is, it is known when the code is assembled) a significantly faster execution can be achieved by using a multiplication (effectively by the reciprocal of the divisor) followed by a shift.

A difficulty with this approach is that it is non-trivial to devise an instruction sequence which is guaranteed to produce the **identical** result to the division instruction it replaces. Listed at the end of this article are two assembler macro functions **FN_udiv** and **FN_sdiv** which automatically generate the optimum code sequence for //any// divisor, giving identical results to the **div** and **idiv** instructions respectively.

It is probably easiest to illustrate the use of these macros by example. Suppose you want to divide the unsigned value in the **ebx** register by 123; the conventional code would be something like this: code 00B4174C 8B C3                         mov eax,ebx 00B4174E 33 D2                         xor edx,edx 00B41750 B9 7B 00 00 00                mov ecx,123 00B41755 F7 F1                         div ecx code This will return the unsigned quotient in the eax register. To speed up the operation use the **FN_udiv** macro as follows: code 00B423E2 B8 53 08 34 85 F7 E3        05 53 08 34 85 83 D2         00 C1 EA 06                    OPT FN_udiv("ebx", 123) code Although the code is longer (18 bytes compared with 11) the execution will be significantly faster. Note that the quotient is returned in **edx** not in **eax**.

The actual code generated by the macro is as follows: code 00B423E2 B8 53 08 34 85                mov eax,&85340853 00B423E7 F7 E3                         mul ebx 00B423E9 05 53 08 34 85                add eax,&85340853 00B423EE 83 D2 00                      adc edx,0 00B423F1 C1 EA 06                      shr edx,6 code

Suppose instead that one wishes to divide the **signed** value stored in the memory location **[edi]** by 123. The conventional code would be something like this: code 00B42424 8B 07                         mov eax,[edi] 00B42426 99                            cdq 00B42427 B9 7B 00 00 00                mov ecx,123 00B4242C F7 F9                         idiv ecx code This will return the signed quotient in the eax register.

To speed up the operation use the **FN_sdiv** macro as follows: code 00B423EE B8 15 02 4D 21 F7 2F 8B 07 C1 FA 04 C1 E8        1F 03 D0                       OPT FN_sdiv("dword [edi]", 123) code Again the the quotient is returned in **edx** not in **eax**.

The actual code generated by the macro in this case is as follows: code 00B423EE B8 15 02 4D 21                mov eax,&214D0215 00B423F3 F7 2F                         imul dword [edi] 00B423F5 8B 07                         mov eax,dword [edi] 00B423F7 C1 FA 04                      sar edx,4 00B423FA C1 E8 1F                      shr eax,31 00B423FD 03 D0                         add edx,eax code

In the case of **FN_udiv** the dividend can be any register (except eax) or a memory location. For **FN_sdiv** the dividend can be any register (except eax and edx) or a memory location. The divisor can be any 32-bit value except zero.

Here is the code of the macros: code format="bb4w" REM Generate assembler code for fast unsigned division by a constant REM Dividend may be a memory location or a register other than eax DEF FN_udiv(dividend$, d%) LOCAL opt%, i%, l%, f#, i#     opt% = ?419 >> 4 l% = FNlog2(d%) f# = 2.0#^(l%+32) / FNuint(d%) i# = INT(f#) IF f# = i# THEN PROCasm(opt%, "mov edx,"+dividend$) ELSEIF (f# - i#) < 0.5# THEN; i% = FNintu(i#) WHILE (i% AND 1)=0 AND l%>0 i% = i% >>> 1 l% -= 1 ENDWHILE PROCasm(opt%, "mov eax,&"+STR$~i%) PROCasm(opt%, "mul "+dividend$) PROCasm(opt%, "add eax,&"+STR$~i%) PROCasm(opt%, "adc edx,0") ELSE i% = FNintu(i#+1) WHILE (i% AND 1)=0 AND l%>0 i% = i% >>> 1 l% -= 1 ENDWHILE PROCasm(opt%, "mov eax,&"+STR$~i%) PROCasm(opt%, "mul "+dividend$) ENDIF IF l% PROCasm(opt%, "shr edx,"+STR$l%) = opt% REM Generate assembler code for fast *signed* division by a constant REM Dividend may be a memory location or a register other than eax or edx DEF FN_sdiv(dividend$, d%) LOCAL opt%, e%, i%, l%, j#, k#, l#, m#     opt% = ?419 >> 4 e% = d%     d% = ABS(d%) l% = FNlog2(d%) IF (d% AND (d%-1)) = 0 THEN PROCasm(opt%, "mov eax,"+dividend$) PROCasm(opt%, "cdq") PROCasm(opt%, "and edx,&"+STR$~(d%-1)) PROCasm(opt%, "add edx,eax") IF l% PROCasm(opt%, "sar edx,"+STR$l%) IF e%<0 PROCasm(opt%, "neg edx") = opt% ENDIF j# = 2.0#^31 / d%     j# = INT(d% * (j# - INT(j#))) k# = INT(2.0#^(l%+32) / (2.0#^31 - j#)) l# = INT(2.0#^(l%+32) / d%) m# = INT((2.0#^(l%+32) + k#) / d%) WHILE INT(l#/2)0 l# = INT(l#/2) m# = INT(m#/2) l% -= 1 ENDWHILE i% = FNintu(m#) PROCasm(opt%, "mov eax,&"+STR$~i%) PROCasm(opt%, "imul "+dividend$) PROCasm(opt%, "mov eax,"+dividend$) IF (i% AND &80000000) PROCasm(opt%, "add edx,eax") IF l% PROCasm(opt%, "sar edx,"+STR$l%) PROCasm(opt%, "shr eax,31") PROCasm(opt%, "add edx,eax") IF e%<0 PROCasm(opt%, "neg edx") = opt% DEF FNlog2(i%) LOCAL t%     REPEAT i% = i% >>> 1 t% += 1 UNTIL i% = 0 = t% - 1 DEF PROCasm(opt%, code$) LOCAL chan% code$ = "[OPT " + STR$(3) + " : " + code$ + " : ]" chan% = OPENOUT(@tmp$+"ASMDIV") PRINT #chan%, CHR$(LEN(code$)+4) + CHR$0 + CHR$0 + code$ : BPUT #chan%,0 CLOSE #chan% CALL @tmp$+"ASMDIV" ENDPROC DEF FNintu(N) = ((N / 2) << 1) - (INT(N / 2) <> N / 2) DEF FNuint(N%) = (N% >>> 1) * 2 + (N% AND 1) code