Instruction Syntax
| Mnemonic | Format | Flags |
|---|---|---|
| fmadd | frD,frA,frC,frB | Rc = 0 |
| fmadd. | frD,frA,frC,frB | Rc = 1 |
Instruction Encoding
| Field | Bits | Description |
|---|---|---|
| Primary Opcode | 0-5 | 111111 (0x3F) |
| frD | 6-10 | Destination floating-point register |
| frA | 11-15 | Source floating-point register A (multiplicand) |
| frB | 16-20 | Source floating-point register B (addend) |
| frC | 21-25 | Source floating-point register C (multiplier) |
| XO | 26-30 | 11101 (29) |
| Rc | 31 | Record Condition Register |
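The field layout above can be turned into a small encoder. The following C sketch is illustrative only: the function name `encode_fmadd` is hypothetical, and it assumes the PowerPC convention that bit 0 is the most significant bit of the 32-bit instruction word, so a field that ends at bit b is shifted left by 31 − b.

```c
#include <stdint.h>

/* Hypothetical helper: assemble the 32-bit word for fmadd / fmadd.
 * Argument order follows the assembler syntax frD,frA,frC,frB.
 * PowerPC numbers bits from the most significant end (bit 0 = MSB). */
uint32_t encode_fmadd(unsigned frD, unsigned frA, unsigned frC,
                      unsigned frB, unsigned rc)
{
    return (63u << 26)            /* primary opcode, bits 0-5   */
         | ((frD & 31u) << 21)    /* frD, bits 6-10             */
         | ((frA & 31u) << 16)    /* frA, bits 11-15            */
         | ((frB & 31u) << 11)    /* frB, bits 16-20            */
         | ((frC & 31u) << 6)     /* frC, bits 21-25            */
         | (29u << 1)             /* extended opcode, bits 26-30 */
         | (rc & 1u);             /* Rc, bit 31                 */
}
```

Under these assumptions, `encode_fmadd(4, 1, 3, 2, 0)` corresponds to `fmadd f4, f1, f3, f2` and yields `0xFC8110FA`; setting `rc` to 1 changes only the lowest bit.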
Operation
frD ← (frA × frC) + frB
The contents of floating-point register frA are multiplied by the contents of floating-point register frC. The contents of floating-point register frB are then added to this intermediate product. The result is placed into floating-point register frD.
Note: This is a fused multiply-add operation. The intermediate product (frA × frC) is computed to infinite precision and is not rounded before frB is added. Only one rounding operation occurs, at the end, so the result is at least as accurate as a separate multiply followed by an add, which rounds twice. The instruction also replaces two instructions with one, which benefits kernels such as dot products, polynomial evaluation, and digital filters.
Affected Registers
Condition Register (CR1 field)
(if Rc = 1)
- Receives a copy of the FPSCR exception summary bits FX, FEX, VX, and OX
Floating-Point Status and Control Register (FPSCR)
Affected fields:
- FPRF (Floating-Point Result Flags)
- FX (Floating-Point Exception Summary)
- OX (Overflow Exception)
- UX (Underflow Exception)
- XX (Inexact Exception)
- VXSNAN (Invalid Operation Exception for a signaling NaN operand)
- VXISI (Invalid Operation Exception for ∞ − ∞)
- VXIMZ (Invalid Operation Exception for 0 × ∞)
- FR (Fraction Rounded)
- FI (Fraction Inexact)
For more information on floating-point status see Section 2.1.4, "Floating-Point Status and Control Register (FPSCR)," in the PowerPC Microprocessor Family: The Programming Environments manual.
Examples
Basic Multiply-Add Operation
    lfd f1, multiplicand(r0)   # Load multiplicand (A)
    lfd f2, addend(r0)         # Load addend (B)
    lfd f3, multiplier(r0)     # Load multiplier (C)
    fmadd f4, f1, f3, f2       # f4 = (f1 × f3) + f2
    stfd f4, result(r0)        # Store result
Matrix Multiplication (Dot Product)
    # Calculate dot product: result = a₁×b₁ + a₂×b₂ + a₃×b₃ + a₄×b₄
    lfd f1, a1(r0)             # Load a₁
    lfd f2, b1(r0)             # Load b₁
    lfd f3, a2(r0)             # Load a₂
    lfd f4, b2(r0)             # Load b₂
    lfd f5, a3(r0)             # Load a₃
    lfd f6, b3(r0)             # Load b₃
    lfd f7, a4(r0)             # Load a₄
    lfd f8, b4(r0)             # Load b₄
    fmul f9, f1, f2            # f9 = a₁ × b₁
    fmadd f9, f3, f4, f9       # f9 = (a₂ × b₂) + f9 = a₁×b₁ + a₂×b₂
    fmadd f9, f5, f6, f9       # f9 = (a₃ × b₃) + f9 = a₁×b₁ + a₂×b₂ + a₃×b₃
    fmadd f9, f7, f8, f9       # f9 = (a₄ × b₄) + f9 = final dot product
    stfd f9, dot_product(r0)   # Store result
Polynomial Evaluation (Horner's Method)
    # Evaluate polynomial: f(x) = ax³ + bx² + cx + d using Horner's method
    # f(x) = ((ax + b)x + c)x + d
    lfd f1, coeff_a(r0)            # Load coefficient a
    lfd f2, coeff_b(r0)            # Load coefficient b
    lfd f3, coeff_c(r0)            # Load coefficient c
    lfd f4, coeff_d(r0)            # Load coefficient d
    lfd f5, x_value(r0)            # Load x
    fmadd f6, f1, f5, f2           # f6 = ax + b
    fmadd f6, f6, f5, f3           # f6 = (ax + b)x + c
    fmadd f6, f6, f5, f4           # f6 = ((ax + b)x + c)x + d
    stfd f6, polynomial_result(r0) # Store final result
3D Graphics Transformation
    # Transform 3D point: P' = M × P + T (matrix multiplication + translation)
    # Calculate one component: result = m₁₁×x + m₁₂×y + m₁₃×z + tₓ
    lfd f1, matrix_11(r0)      # Load m₁₁
    lfd f2, matrix_12(r0)      # Load m₁₂
    lfd f3, matrix_13(r0)      # Load m₁₃
    lfd f4, point_x(r0)        # Load x coordinate
    lfd f5, point_y(r0)        # Load y coordinate
    lfd f6, point_z(r0)        # Load z coordinate
    lfd f7, translation_x(r0)  # Load translation tₓ
    fmul f8, f1, f4            # f8 = m₁₁ × x
    fmadd f8, f2, f5, f8       # f8 = (m₁₂ × y) + f8 = m₁₁×x + m₁₂×y
    fmadd f8, f3, f6, f8       # f8 = (m₁₃ × z) + f8 = m₁₁×x + m₁₂×y + m₁₃×z
    fadd f8, f8, f7            # f8 = f8 + tₓ (final transformed coordinate)
    stfd f8, transformed_x(r0) # Store transformed X coordinate
Physics: Kinematic Equation
    # Calculate position using kinematic equation: s = ut + ½at²
    # Rearranged as: s = u×t + (½a)×t²
    lfd f1, initial_velocity(r0)  # Load initial velocity (u)
    lfd f2, time(r0)              # Load time (t)
    lfd f3, acceleration(r0)      # Load acceleration (a)
    lfd f4, half_constant(r0)     # Load 0.5
    fmul f5, f4, f3               # f5 = ½a
    fmul f6, f1, f2               # f6 = u×t (plain fmul: no FPR is hardwired to zero)
    fmul f7, f2, f2               # f7 = t²
    fmadd f8, f5, f7, f6          # f8 = (½a)×t² + u×t = ut + ½at²
    stfd f8, position(r0)         # Store final position
Audio Processing: FIR Filter
    # FIR filter tap: output += coefficient × sample
    lfd f1, filter_coeff(r0)   # Load filter coefficient
    lfd f2, audio_sample(r0)   # Load audio sample
    lfd f3, accumulator(r0)    # Load current accumulator value
    fmadd f4, f1, f2, f3       # f4 = (coeff × sample) + accumulator
    stfd f4, accumulator(r0)   # Store updated accumulator
Scientific Computing: Newton-Raphson Iteration
    # Newton-Raphson iteration: xₖ₊₁ = xₖ − f(xₖ)/f'(xₖ)
    # For the square root of n: xₖ₊₁ = ½(xₖ + n/xₖ) = ½xₖ + ½(n/xₖ)
    lfd f1, current_x(r0)      # Load current approximation
    lfd f2, target_n(r0)       # Load number to find square root of
    lfd f3, half_constant(r0)  # Load 0.5
    fdiv f4, f2, f1            # f4 = n/x
    fmul f5, f3, f1            # f5 = ½x (plain fmul: no FPR is hardwired to zero)
    fmadd f5, f3, f4, f5       # f5 = ½(n/x) + ½x = ½(x + n/x)
    stfd f5, next_x(r0)        # Store next iteration
Financial Calculation: Compound Interest
    # Compound interest: A = P(1 + r)ⁿ (simplified for one iteration)
    # For monthly compounding: new_amount = amount × (1 + monthly_rate) + monthly_deposit
    lfd f1, current_amount(r0)   # Load current amount
    lfd f2, monthly_rate(r0)     # Load monthly interest rate
    lfd f3, monthly_deposit(r0)  # Load monthly deposit
    lfd f4, one_constant(r0)     # Load 1.0
    fadd f5, f4, f2              # f5 = 1 + monthly_rate
    fmadd f6, f1, f5, f3         # f6 = amount × (1 + rate) + deposit
    stfd f6, new_amount(r0)      # Store new amount
Game Physics: Velocity Integration
    # Update velocity: v = v₀ + at (acceleration integration)
    # Update position: p = p₀ + vt (velocity integration)
    lfd f1, velocity(r0)       # Load current velocity
    lfd f2, acceleration(r0)   # Load acceleration
    lfd f3, delta_time(r0)     # Load time step
    lfd f4, position(r0)       # Load current position
    fmadd f5, f2, f3, f1       # f5 = acceleration × Δt + velocity = new velocity
    fmadd f6, f5, f3, f4       # f6 = new_velocity × Δt + position = new position
    stfd f5, velocity(r0)      # Store updated velocity
    stfd f6, position(r0)      # Store updated position
Signal Processing: Biquad Filter
    # Biquad filter (Direct Form I): y[n] = b₀×x[n] + b₁×x[n-1] + b₂×x[n-2] - a₁×y[n-1] - a₂×y[n-2]
    lfd f1, b0_coeff(r0)       # Load b₀ coefficient
    lfd f2, b1_coeff(r0)       # Load b₁ coefficient
    lfd f3, b2_coeff(r0)       # Load b₂ coefficient
    lfd f4, a1_coeff(r0)       # Load a₁ coefficient
    lfd f5, a2_coeff(r0)       # Load a₂ coefficient
    lfd f6, input_n(r0)        # Load x[n]
    lfd f7, input_n1(r0)       # Load x[n-1]
    lfd f8, input_n2(r0)       # Load x[n-2]
    lfd f9, output_n1(r0)      # Load y[n-1]
    lfd f10, output_n2(r0)     # Load y[n-2]
    fmul f11, f1, f6           # f11 = b₀ × x[n]
    fmadd f11, f2, f7, f11     # f11 = b₁×x[n-1] + f11
    fmadd f11, f3, f8, f11     # f11 = b₂×x[n-2] + f11
    fnmsub f11, f4, f9, f11    # f11 = f11 - a₁×y[n-1] (fnmsub gives frB - frA×frC; fmsub would give frA×frC - frB)
    fnmsub f11, f5, f10, f11   # f11 = f11 - a₂×y[n-2]
    stfd f11, output_n(r0)     # Store y[n]
Machine Learning: Neural Network Layer
    # Neural network forward pass: output = weight × input + bias
    lfd f1, weight(r0)         # Load weight
    lfd f2, input_value(r0)    # Load input
    lfd f3, bias(r0)           # Load bias
    fmadd f4, f1, f2, f3       # f4 = weight × input + bias
    stfd f4, neuron_output(r0) # Store neuron output (before activation)