Addition and Subtraction
Addition is the most common arithmetic operation a processor performs. When two n-bit numbers are added together, the result can require n + 1 bits because of a carry out of the most significant bit. For two's complement addition of two numbers, there are three cases to consider:

- If both numbers are positive, the sum can exceed the largest representable positive value; overflow is indicated by a result whose sign bit is 1.
- If both numbers are negative, the sum can fall below the most negative representable value; overflow is indicated by a result whose sign bit is 0.
- If the numbers have opposite signs, the magnitude of the sum is no larger than the larger operand, so overflow cannot occur.
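The short C sketch below illustrates these three cases for 8-bit operands; the function name and signature are illustrative only and are not part of the blockset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two's complement addition of two 8-bit numbers with wraparound, flagging
   overflow.  Overflow can only occur when the operands share a sign (the
   first two cases above); operands of opposite sign never overflow. */
int8_t add_s8(int8_t b, int8_t c, bool *overflow)
{
    int sum = b + c;                                  /* exact sum: -256..254 */
    *overflow = (sum > INT8_MAX) || (sum < INT8_MIN); /* needs a ninth bit?   */

    /* Wrap the exact sum back into the 8-bit two's complement range. */
    if (sum > INT8_MAX) sum -= 256;
    if (sum < INT8_MIN) sum += 256;
    return (int8_t)sum;
}
```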
Fixed-Point Blockset Summation Process
Consider the summation of two numbers. Ideally, the real-world values obey the equation

Va = Vb + Vc
where Vb and Vc are the input values and Va is the output value. To see how the summation is actually implemented, the three ideal values should be replaced by the general slope/bias encoding scheme described in Scaling.
The solution of the resulting equation for the stored integer, Qa, is given by the equation in Addition. Using shorthand notation, that equation becomes

Qa = Fsb·2^(Eb - Ea)·Qb + Fsc·2^(Ec - Ea)·Qc + Bnet
where the adjusted fractional slopes are Fsb = Fb/Fa and Fsc = Fc/Fa, and the net bias is Bnet = (Bb + Bc - Ba)/(Fa·2^Ea). The offline and online conversions and operations are discussed below.
Offline Conversions. Fsb, Fsc, and Bnet are computed offline using round-to-nearest and saturation. Furthermore, Bnet is stored using the output data type.
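For example, the net bias could be computed offline along the lines of the following sketch; the function name, the double-precision inputs, and the choice of a signed 8-bit output type are assumptions made for illustration.

```c
#include <math.h>
#include <stdint.h>

/* Offline computation of Bnet = (Bb + Bc - Ba) / (Fa * 2^Ea) for a signed
   8-bit output, using round-to-nearest and saturation.  The parameter names
   follow the slope/bias scheme V = F * 2^E * Q + B. */
int8_t compute_Bnet(double Bb, double Bc, double Ba, double Fa, int Ea)
{
    double bnet    = (Bb + Bc - Ba) / (Fa * pow(2.0, Ea));
    double rounded = floor(bnet + 0.5);           /* round to nearest (ties up) */

    if (rounded > INT8_MAX) rounded = INT8_MAX;   /* saturate to the output type */
    if (rounded < INT8_MIN) rounded = INT8_MIN;
    return (int8_t)rounded;
}
```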
Online Conversions and Operations. The remaining operations are performed online by the fixed-point processor, and depend on the slopes and biases for the input and output data types. The worst (most inefficient) case occurs when the slopes and biases are mismatched. The worst-case conversions and operations are given by these steps:

1. The initial value of Qa is given by the net bias, Bnet.
2. The first input stored integer, Qb, is multiplied by the adjusted slope, Fsb.
3. The previous product is converted to the modified output data type. This conversion includes any necessary bit shifting, rounding, or overflow handling.
4. The summation Qa = Qa + Qb is performed. This summation includes any necessary overflow handling.
5. Steps 2 through 4 are repeated for every remaining input (see the sketch after the note below).
It is important to note that bit shifting, rounding, and overflow handling are applied to the intermediate steps (3 and 4) and not to the overall sum.
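A minimal C sketch of this worst-case sequence follows, assuming each adjusted fractional slope has been reduced offline to an integer multiplier plus a right shift; the type and function names (AdjustedSlope, convert_to_output, fixpt_sum) are hypothetical and do not represent the blockset's generated code.

```c
#include <stdint.h>

/* Hypothetical representation of an adjusted fractional slope: an integer
   multiplier plus a power-of-two right shift, both computed offline. */
typedef struct {
    int32_t mult;    /* integer approximation of Fs, scaled up by 2^shift      */
    int     shift;   /* right shift that removes the scaling and applies 2^(E-Ea) */
} AdjustedSlope;

/* Convert an intermediate product to the output data type.  Rounding is
   round-to-floor here (assumes an arithmetic right shift); saturation is
   omitted for brevity. */
int16_t convert_to_output(int32_t product, int shift)
{
    return (int16_t)(product >> shift);
}

/* Worst-case summation of two inputs Qb and Qc into the output Qa. */
int16_t fixpt_sum(int16_t Qb, AdjustedSlope sb,
                  int16_t Qc, AdjustedSlope sc,
                  int16_t Bnet)
{
    int16_t Qa = Bnet;                                        /* step 1            */

    int32_t prod = (int32_t)Qb * sb.mult;                     /* step 2            */
    Qa = (int16_t)(Qa + convert_to_output(prod, sb.shift));   /* steps 3 and 4     */

    prod = (int32_t)Qc * sc.mult;                             /* step 5: repeat    */
    Qa = (int16_t)(Qa + convert_to_output(prod, sc.shift));   /* steps 2-4 for Qc  */
    return Qa;
}
```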
Streamlining Simulations and Generated Code
If the scaling of the input and output signals is matched, the number of summation operations is reduced from the worst (most inefficient) case. For example, when an input has the same fractional slope as the output, step 2 reduces to multiplication by one and can be eliminated. Trivial steps in the summation process are eliminated for both simulation and code generation. Exclusive use of radix point-only scaling for both input signals and output signals is a common way to eliminate the occurrence of mismatched slopes and biases, and results in the most efficient simulations and generated code.
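For instance, if the inputs and the output all share the same radix point-only scaling, the adjusted slopes equal one and the net bias is zero, so the online work reduces to a single integer addition, as in this hypothetical sketch:

```c
#include <stdint.h>

/* With matched radix point-only scaling (for example, every signal scaled
   by 2^-4), Fsb = Fsc = 1 and Bnet = 0, so only the addition and its
   overflow handling remain. */
int16_t sum_matched(int16_t Qb, int16_t Qc)
{
    return (int16_t)(Qb + Qc);   /* saturation or wrapping omitted for brevity */
}
```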
Example: The Summation Process
Suppose you want to sum three numbers. Each of these numbers is represented by an 8-bit word, and each has a different radix point-only scaling. Additionally, the output is restricted to an 8-bit word with radix point-only scaling of 2^-3.
The summation is shown below for the input values 19.875, 5.4375, and 4.84375.
Applying the rules from the previous section, the sum follows these steps:

1. The first number, 19.875, is already stored with the output scaling of 2^-3, so no conversion is needed and it initializes the sum.
2. The second number, 5.4375, is converted from its scaling of 2^-4 to the output scaling of 2^-3. A loss in precision of one bit occurs, with the resulting value of QTemp determined by the rounding mode. For this example, round-to-floor is used, so the converted value is 5.375. Overflow cannot occur in this case since the bits and radix point are both shifted to the right.
3. The converted value is added to the running sum. Overflow did not occur, but it is possible for this operation.
4. The third number, 4.84375, is converted from its scaling of 2^-5 to the output scaling of 2^-3. A loss in precision of two bits occurs, with the resulting value of QTemp determined by the rounding mode. For this example, round-to-floor is used, so the converted value is 4.75. Overflow cannot occur in this case since the bits and radix point are both shifted to the right.
5. The converted value is added to the running sum. Overflow did not occur, but it is possible for this operation.
Because of the precision lost in the two conversions, the final result is 30.0, which differs from the ideal sum of 30.15625.
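A minimal C sketch of this example is shown below, assuming the stored integers are held in unsigned 8-bit words and that the round-to-floor conversions are implemented as right shifts; the variable names are illustrative only.

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Stored integers for the three inputs (radix point-only scaling). */
    uint8_t Qb = 159;   /* 19.875   = 159 * 2^-3 */
    uint8_t Qc = 87;    /* 5.4375   =  87 * 2^-4 */
    uint8_t Qd = 155;   /* 4.84375  = 155 * 2^-5 */

    uint8_t QTemp;
    uint8_t Qa = Qb;               /* 2^-3 already matches the output scaling      */

    QTemp = (uint8_t)(Qc >> 1);    /* 2^-4 -> 2^-3: one bit lost, round to floor   */
    Qa    = (uint8_t)(Qa + QTemp); /* overflow possible in general, not here       */

    QTemp = (uint8_t)(Qd >> 2);    /* 2^-5 -> 2^-3: two bits lost, round to floor  */
    Qa    = (uint8_t)(Qa + QTemp);

    /* Prints: Qa = 240, real-world value = 30 (the ideal sum is 30.15625). */
    printf("Qa = %u, real-world value = %g\n", (unsigned)Qa, Qa * 0.125);
    return 0;
}
```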
Blocks that perform addition and subtraction include the FixPt Sum, FixPt Matrix Gain, and FixPt FIR blocks.