FMADD - Floating Multiply-Add Instruction | PowerPC Instruction Set Reference

Instruction Syntax

Mnemonic	Format	Flags
fmadd	frD,frA,frC,frB	Rc = 0
fmadd.	frD,frA,frC,frB	Rc = 1

Instruction Encoding

Field	Bits	Description
Primary Opcode	0-5	111111 (0x3F)
frD	6-10	Destination floating-point register
frA	11-15	Source floating-point register A (multiplicand)
frB	16-20	Source floating-point register B (addend)
frC	21-25	Source floating-point register C (multiplier)
XO	26-30	11101 (29)
Rc	31	Record Condition Register
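
The field layout above can be checked by packing the fields into a 32-bit word. The following is a minimal Python sketch, not part of any assembler; the function name encode_fmadd is hypothetical. It assumes the PowerPC convention that bit 0 is the most significant bit, so the primary opcode occupies the top six bits of the word.

```python
def encode_fmadd(frd, fra, frc, frb, rc=0):
    """Pack an fmadd / fmadd. instruction into a 32-bit word (hypothetical helper)."""
    for reg in (frd, fra, frc, frb):
        if not 0 <= reg <= 31:
            raise ValueError("FPR number must be in 0..31")
    # PowerPC numbers bits from the most significant end: bit 0 is 1 << 31.
    return (
        (63 << 26)      # primary opcode, bits 0-5
        | (frd << 21)   # frD, bits 6-10
        | (fra << 16)   # frA, bits 11-15
        | (frb << 11)   # frB, bits 16-20
        | (frc << 6)    # frC, bits 21-25
        | (29 << 1)     # extended opcode XO, bits 26-30
        | (rc & 1)      # Rc, bit 31
    )


# fmadd f4,f1,f3,f2 (operand order in the mnemonic is frD,frA,frC,frB)
print(hex(encode_fmadd(4, 1, 3, 2)))  # 0xfc8110fa
```

Note that frB precedes frC in the encoded word even though frC precedes frB in the assembler syntax.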

Operation

frD ← (frA × frC) + frB

The contents of floating-point register frA are multiplied by the contents of floating-point register frC. The contents of floating-point register frB are then added to this intermediate product. The result is placed into floating-point register frD.

Note: fmadd is a fused multiply-add. The intermediate product (frA × frC) is kept at full precision and is not rounded before frB is added. A single rounding, in the mode selected by FPSCR[RN], is applied to the final sum, so the result carries one rounding error instead of the two incurred by a separate fmul followed by fadd. Because it also replaces two instructions with one, fmadd is the core operation in numerical inner loops such as dot products, polynomial evaluation, and digital filters.
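
The effect of the single rounding can be illustrated with a small model. The sketch below is Python rather than PowerPC code, and fmadd_model is a hypothetical helper: it computes the product and sum exactly with rational arithmetic and rounds once when converting back to a double. It assumes finite operands, a result within double range, and round-to-nearest mode.

```python
from fractions import Fraction


def fmadd_model(a, c, b):
    """Model frD = (frA * frC) + frB with a single round-to-nearest step.

    Fraction arithmetic is exact; float() performs the one rounding.
    Assumes finite operands (hypothetical helper, not a real API).
    """
    return float(Fraction(a) * Fraction(c) + Fraction(b))


a = c = 1.0 + 2.0**-27        # exactly representable in double precision
b = -(1.0 + 2.0**-26)

separate = (a * c) + b        # product is rounded, then the sum is rounded
fused = fmadd_model(a, c, b)  # only the final sum is rounded

print(separate)  # 0.0
print(fused)     # 5.551115123125783e-17
```

The exact product is 1 + 2^-26 + 2^-54. Rounding it to double precision discards the 2^-54 term, so the separate multiply and add return zero, while the fused form preserves it.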

Affected Registers

Condition Register (CR1 field)

(if Rc = 1)

Set to the FPSCR bits FX, FEX, VX, and OX (exception summary, enabled exception summary, invalid operation summary, and overflow)
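
When Rc = 1, CR1 receives a copy of the four most significant FPSCR bits (FX, FEX, VX, OX). As a rough illustration, the Python sketch below extracts that field from a 32-bit FPSCR image; the function name and the mask constants are hypothetical names chosen for this example.

```python
def cr1_from_fpscr(fpscr):
    """Return the 4-bit CR1 value copied from FPSCR bits 0-3 (FX, FEX, VX, OX)."""
    return (fpscr >> 28) & 0xF


# Bit masks within the 4-bit CR1 field (names chosen for this sketch)
FX, FEX, VX, OX = 0x8, 0x4, 0x2, 0x1

cr1 = cr1_from_fpscr(0x90000000)  # FPSCR image with FX and OX set
print(bool(cr1 & FX), bool(cr1 & FEX), bool(cr1 & VX), bool(cr1 & OX))
# True False False True
```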

Floating-Point Status and Control Register (FPSCR)

Affected fields:

FPRF (Floating-Point Result Flags)
FX (Floating-Point Exception Summary)
OX (Overflow Exception)
UX (Underflow Exception)
XX (Inexact Exception)
VXSNAN (Invalid Operation Exception for a signaling NaN operand)
VXISI (Invalid Operation Exception for ∞ - ∞)
VXIMZ (Invalid Operation Exception for 0 × ∞)
FR (Fraction Rounded)
FI (Fraction Inexact)

For more information on floating-point status, see Section 2.1.4, "Floating-Point Status and Control Register (FPSCR)," in the PowerPC Microprocessor Family: The Programming Environments manual.
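
The two operand-dependent invalid-operation cases, 0 × ∞ and ∞ - ∞, can be sketched as a classification over the three source operands. This Python model is a simplification with a hypothetical function name: it does not classify NaN operands, because Python floats do not expose the quiet/signaling distinction.

```python
import math


def invalid_flags(a, c, b):
    """Return the invalid-operation bits that a*c + b would raise (sketch).

    Assumption: NaN operands are not classified and yield an empty set.
    """
    if math.isnan(a) or math.isnan(c) or math.isnan(b):
        return set()
    if (math.isinf(a) and c == 0.0) or (a == 0.0 and math.isinf(c)):
        return {"VXIMZ"}  # infinity multiplied by zero
    if (math.isinf(a) or math.isinf(c)) and math.isinf(b):
        product_sign = math.copysign(1.0, a) * math.copysign(1.0, c)
        if product_sign != math.copysign(1.0, b):
            return {"VXISI"}  # infinite product plus infinity of opposite sign
    return set()


print(invalid_flags(0.0, math.inf, 1.0))        # {'VXIMZ'}
print(invalid_flags(math.inf, 2.0, -math.inf))  # {'VXISI'}
```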

Examples

Basic Multiply-Add Operation

lfd f1, multiplicand(r0)   # Load multiplicand (A)
lfd f2, addend(r0)         # Load addend (B)
lfd f3, multiplier(r0)     # Load multiplier (C)
fmadd f4, f1, f3, f2       # f4 = (f1 × f3) + f2
stfd f4, result(r0)        # Store result

Matrix Multiplication (Dot Product)

# Calculate dot product: result = a₁×b₁ + a₂×b₂ + a₃×b₃ + a₄×b₄
lfd f1, a1(r0)             # Load a₁
lfd f2, b1(r0)             # Load b₁
lfd f3, a2(r0)             # Load a₂
lfd f4, b2(r0)             # Load b₂
lfd f5, a3(r0)             # Load a₃
lfd f6, b3(r0)             # Load b₃
lfd f7, a4(r0)             # Load a₄
lfd f8, b4(r0)             # Load b₄
fmul f9, f1, f2            # f9 = a₁ × b₁
fmadd f9, f3, f4, f9       # f9 = (a₂ × b₂) + f9 = a₁×b₁ + a₂×b₂
fmadd f9, f5, f6, f9       # f9 = (a₃ × b₃) + f9 = a₁×b₁ + a₂×b₂ + a₃×b₃
fmadd f9, f7, f8, f9       # f9 = (a₄ × b₄) + f9 = final dot product
stfd f9, dot_product(r0)   # Store result
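
The same accumulation pattern generalizes to vectors of any length: one fmul for the first product, then one fmadd per remaining pair. The Python sketch below models that chain; fmadd_model and dot_fmadd are hypothetical helpers, and the model assumes finite operands and round-to-nearest mode.

```python
from fractions import Fraction


def fmadd_model(a, c, b):
    """(a * c) + b with one rounding; finite operands assumed (hypothetical helper)."""
    return float(Fraction(a) * Fraction(c) + Fraction(b))


def dot_fmadd(a_vals, b_vals):
    """Dot product as one multiply followed by a chain of fused multiply-adds."""
    acc = a_vals[0] * b_vals[0]           # fmul: first product
    for a, b in zip(a_vals[1:], b_vals[1:]):
        acc = fmadd_model(a, b, acc)      # fmadd: acc = (a * b) + acc
    return acc


print(dot_fmadd([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))  # 70.0
```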

Polynomial Evaluation (Horner's Method)

# Evaluate polynomial: f(x) = ax³ + bx² + cx + d using Horner's method
# f(x) = ((ax + b)x + c)x + d
lfd f1, coeff_a(r0)        # Load coefficient a
lfd f2, coeff_b(r0)        # Load coefficient b
lfd f3, coeff_c(r0)        # Load coefficient c
lfd f4, coeff_d(r0)        # Load coefficient d
lfd f5, x_value(r0)        # Load x
fmadd f6, f1, f5, f2       # f6 = ax + b
fmadd f6, f6, f5, f3       # f6 = (ax + b)x + c
fmadd f6, f6, f5, f4       # f6 = ((ax + b)x + c)x + d
stfd f6, polynomial_result(r0) # Store final result
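
Horner's method maps onto fmadd directly, because each step has the form (accumulator × x) + next coefficient. A minimal Python model for a polynomial of any degree follows; the helper names are hypothetical, coefficients are listed from the highest power down, and finite operands are assumed.

```python
from fractions import Fraction


def fmadd_model(a, c, b):
    """(a * c) + b with one rounding; finite operands assumed (hypothetical helper)."""
    return float(Fraction(a) * Fraction(c) + Fraction(b))


def horner_fmadd(coeffs, x):
    """Evaluate a polynomial with one fused multiply-add per coefficient after the first."""
    acc = coeffs[0]
    for coeff in coeffs[1:]:
        acc = fmadd_model(acc, x, coeff)  # fmadd: acc = (acc * x) + coeff
    return acc


# f(x) = 2x^3 - 3x^2 + 4x - 5 at x = 3
print(horner_fmadd([2.0, -3.0, 4.0, -5.0], 3.0))  # 34.0
```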

3D Graphics Transformation

# Transform 3D point: P' = M × P + T (matrix multiplication + translation)
# Calculate one component: result = m₁₁×x + m₁₂×y + m₁₃×z + tₓ
lfd f1, matrix_11(r0)      # Load m₁₁
lfd f2, matrix_12(r0)      # Load m₁₂
lfd f3, matrix_13(r0)      # Load m₁₃
lfd f4, point_x(r0)        # Load x coordinate
lfd f5, point_y(r0)        # Load y coordinate
lfd f6, point_z(r0)        # Load z coordinate
lfd f7, translation_x(r0)  # Load translation tₓ
fmul f8, f1, f4            # f8 = m₁₁ × x
fmadd f8, f2, f5, f8       # f8 = (m₁₂ × y) + f8 = m₁₁×x + m₁₂×y
fmadd f8, f3, f6, f8       # f8 = (m₁₃ × z) + f8 = m₁₁×x + m₁₂×y + m₁₃×z
fadd f8, f8, f7            # f8 = f8 + tₓ (final transformed coordinate)
stfd f8, transformed_x(r0) # Store transformed X coordinate

Physics: Kinematic Equation

# Calculate position using kinematic equation: s = ut + ½at²
# Rearranged as: s = u×t + (½a)×t²
lfd f1, initial_velocity(r0) # Load initial velocity (u)
lfd f2, time(r0)           # Load time (t)
lfd f3, acceleration(r0)   # Load acceleration (a)
lfd f4, half_constant(r0)  # Load 0.5
fmul f5, f4, f3            # f5 = ½a
fmul f6, f1, f2            # f6 = u×t
fmul f7, f2, f2            # f7 = t²
fmadd f8, f5, f7, f6       # f8 = (½a)×t² + u×t = ut + ½at²
stfd f8, position(r0)      # Store final position

Audio Processing: FIR Filter

# FIR filter tap: output += coefficient × sample
lfd f1, filter_coeff(r0)   # Load filter coefficient
lfd f2, audio_sample(r0)   # Load audio sample
lfd f3, accumulator(r0)    # Load current accumulator value
fmadd f4, f1, f2, f3       # f4 = (coeff × sample) + accumulator
stfd f4, accumulator(r0)   # Store updated accumulator

Scientific Computing: Newton-Raphson Iteration

# Newton-Raphson iteration: xₙ₊₁ = xₙ - f(xₙ)/f'(xₙ)
# For the square root of n: xₙ₊₁ = ½(xₙ + n/xₙ) = ½xₙ + ½(n/xₙ)
lfd f1, current_x(r0)      # Load current approximation
lfd f2, target_n(r0)       # Load number to find square root of
lfd f3, half_constant(r0)  # Load 0.5
fdiv f4, f2, f1            # f4 = n/x
fmul f5, f3, f1            # f5 = ½x
fmadd f5, f3, f4, f5       # f5 = ½(n/x) + ½x = ½(x + n/x)
stfd f5, next_x(r0)        # Store next iteration
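
Repeating the step above converges quickly on the square root. The Python sketch below models one iteration as a divide, a multiply, and a fused multiply-add, then applies it six times for n = 2 starting from 1.0. The helper names are hypothetical, and the model assumes finite, positive operands.

```python
from fractions import Fraction


def fmadd_model(a, c, b):
    """(a * c) + b with one rounding; finite operands assumed (hypothetical helper)."""
    return float(Fraction(a) * Fraction(c) + Fraction(b))


def sqrt_step(x, n):
    """One Newton-Raphson step for the square root of n."""
    quotient = n / x                           # fdiv: n/x
    half_x = 0.5 * x                           # fmul: x/2
    return fmadd_model(0.5, quotient, half_x)  # fmadd: (n/x)/2 + x/2


x = 1.0
for _ in range(6):
    x = sqrt_step(x, 2.0)
print(x)  # close to 1.4142135623730951
```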

Financial Calculation: Compound Interest

# Compound interest: A = P(1 + r)ⁿ (simplified for one iteration)
# For monthly compounding: new_amount = amount × (1 + monthly_rate) + monthly_deposit
lfd f1, current_amount(r0) # Load current amount
lfd f2, monthly_rate(r0)   # Load monthly interest rate
lfd f3, monthly_deposit(r0) # Load monthly deposit
lfd f4, one_constant(r0)   # Load 1.0
fadd f5, f4, f2            # f5 = 1 + monthly_rate
fmadd f6, f1, f5, f3       # f6 = amount × (1 + rate) + deposit
stfd f6, new_amount(r0)    # Store new amount

Game Physics: Velocity Integration

# Update velocity: v = v₀ + at (acceleration integration)
# Update position: p = p₀ + vt (velocity integration)
lfd f1, velocity(r0)       # Load current velocity
lfd f2, acceleration(r0)   # Load acceleration
lfd f3, delta_time(r0)     # Load time step
lfd f4, position(r0)       # Load current position
fmadd f5, f2, f3, f1       # f5 = acceleration × Δt + velocity = new velocity
fmadd f6, f5, f3, f4       # f6 = new_velocity × Δt + position = new position
stfd f5, velocity(r0)      # Store updated velocity
stfd f6, position(r0)      # Store updated position

Signal Processing: Biquad Filter

# Biquad filter (Direct Form I): y[n] = b₀×x[n] + b₁×x[n-1] + b₂×x[n-2] - a₁×y[n-1] - a₂×y[n-2]
lfd f1, b0_coeff(r0)       # Load b₀ coefficient
lfd f2, b1_coeff(r0)       # Load b₁ coefficient  
lfd f3, b2_coeff(r0)       # Load b₂ coefficient
lfd f4, a1_coeff(r0)       # Load a₁ coefficient
lfd f5, a2_coeff(r0)       # Load a₂ coefficient
lfd f6, input_n(r0)        # Load x[n]
lfd f7, input_n1(r0)       # Load x[n-1]
lfd f8, input_n2(r0)       # Load x[n-2]
lfd f9, output_n1(r0)      # Load y[n-1]
lfd f10, output_n2(r0)     # Load y[n-2]
fmul f11, f1, f6           # f11 = b₀ × x[n]
fmadd f11, f2, f7, f11     # f11 = b₁×x[n-1] + f11
fmadd f11, f3, f8, f11     # f11 = b₂×x[n-2] + f11
fnmsub f11, f4, f9, f11    # f11 = -((a₁×y[n-1]) - f11) = f11 - a₁×y[n-1]
fnmsub f11, f5, f10, f11   # f11 = -((a₂×y[n-2]) - f11) = f11 - a₂×y[n-2]
stfd f11, output_n(r0)     # Store y[n]
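
Subtracting the feedback terms requires the form b - (a × c), which is what the negated multiply-subtract (fnmsub) computes: frD ← -((frA × frC) - frB). The plain fmsub form computes (frA × frC) - frB, which has the opposite sign. The Python sketch below models one Direct Form I output sample; all helper names are hypothetical, and finite operands are assumed (signed-zero behavior is not modeled).

```python
from fractions import Fraction


def fmadd_model(a, c, b):
    """(a * c) + b, rounded once (hypothetical helper)."""
    return float(Fraction(a) * Fraction(c) + Fraction(b))


def fnmsub_model(a, c, b):
    """-((a * c) - b) = b - (a * c), rounded once (hypothetical helper)."""
    return float(Fraction(b) - Fraction(a) * Fraction(c))


def biquad_df1(b0, b1, b2, a1, a2, x0, x1, x2, y1, y2):
    """One Direct Form I output sample, mirroring an fmul/fmadd/fnmsub chain."""
    acc = b0 * x0                    # fmul:   b0*x[n]
    acc = fmadd_model(b1, x1, acc)   # fmadd:  + b1*x[n-1]
    acc = fmadd_model(b2, x2, acc)   # fmadd:  + b2*x[n-2]
    acc = fnmsub_model(a1, y1, acc)  # fnmsub: - a1*y[n-1]
    acc = fnmsub_model(a2, y2, acc)  # fnmsub: - a2*y[n-2]
    return acc


print(biquad_df1(0.5, 0.25, 0.125, 0.5, 0.25, 4.0, 8.0, 16.0, 2.0, 4.0))  # 4.0
```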

Machine Learning: Neural Network Layer

# Neural network forward pass: output = weight × input + bias
lfd f1, weight(r0)         # Load weight
lfd f2, input_value(r0)    # Load input
lfd f3, bias(r0)           # Load bias
fmadd f4, f1, f2, f3       # f4 = weight × input + bias
stfd f4, neuron_output(r0) # Store neuron output (before activation)

Related Instructions

fmadds, fmsub, fnmadd, fnmsub, fmul, fadd
