lfdux
Instruction Syntax
| Mnemonic | Format | Flags |
| lfdux | frD,rA,rB | - |
Instruction Encoding
| Field | Bits | Description |
| Primary Opcode | 0-5 | 011111 (0x1F) |
| frD | 6-10 | Destination floating-point register |
| rA | 11-15 | Source register A |
| rB | 16-20 | Source register B |
| XO | 21-30 | 631 (Extended opcode) |
| Rc | 31 | Reserved (0) |
Operation
EA ← (rA) + (rB) frD ← MEM(EA, 8) rA ← EA
A double-precision floating-point value (64 bits) is loaded from memory and placed in floating-point register frD. The effective address is computed by adding the contents of registers rA and rB. After the load, the effective address is stored back into register rA.
Note: This instruction cannot be used with rA=0. The update form requires a valid base register. This is the most advanced addressing mode for double-precision loads, combining indexed addressing with automatic pointer advancement. Essential for high-performance scientific computing and matrix operations.
Affected Registers
rA - Updated with the effective address after the load operation.
For more information on floating-point operations see Section 2.1.4, "Floating-Point Status and Control Register (FPSCR)," in the PowerPC Microprocessor Family: The Programming Environments manual.
Examples
Scientific Computing - Advanced Matrix Operations
# Perform matrix multiplication with optimized access patterns
lis r3, matrix_a@ha
addi r3, r3, matrix_a@l
lis r4, matrix_b@ha
addi r4, r4, matrix_b@l
lis r5, result_matrix@ha
addi r5, r5, result_matrix@l
lwz r6, matrix_size(r0) # N x N matrix
# Triple nested loop for matrix multiplication
li r7, 0 # i (row index)
outer_loop:
li r8, 0 # j (column index)
middle_loop:
lfd f10, zero_double(r0) # Initialize sum = 0.0
li r9, 0 # k (inner loop index)
# Calculate base addresses for this iteration
mullw r10, r7, r6 # i * N
slwi r11, r10, 3 # * 8 (bytes per double)
add r12, r3, r11 # &matrix_a[i][0]
slwi r13, r8, 3 # j * 8
add r14, r4, r13 # &matrix_b[0][j]
# Calculate stride for matrix_b (advance to next row)
slwi r15, r6, 3 # N * 8 (row stride in bytes)
inner_loop:
# Load matrix_a[i][k] with automatic advance
slwi r16, r9, 3 # k * 8
lfdux f1, r12, r16 # Load a[i][k] and advance r12
# Load matrix_b[k][j] with stride advancement
lfdux f2, r14, r15 # Load b[k][j] and advance to next row
# Multiply-accumulate: sum += a[i][k] * b[k][j]
fmadd f10, f1, f2, f10
addi r9, r9, 1 # k++
cmpw r9, r6 # k < N?
blt inner_loop # Continue inner loop
# Store result[i][j] = sum
mullw r17, r7, r6 # i * N
add r18, r17, r8 # i * N + j
slwi r19, r18, 3 # Convert to byte offset
stfdx f10, r5, r19 # Store result
addi r8, r8, 1 # j++
cmpw r8, r6 # j < N?
blt middle_loop # Continue middle loop
addi r7, r7, 1 # i++
cmpw r7, r6 # i < N?
blt outer_loop # Continue outer loop
Quantum Mechanics - Wavefunction Analysis
# Analyze quantum wavefunction with variable grid spacing lis r3, wavefunction_data@ha addi r3, r3, wavefunction_data@l lis r4, grid_spacing@ha addi r4, r4, grid_spacing@l lwz r5, num_grid_points(r0) # Number of spatial grid points # Calculate probability density and expectation values lfd f20, zero_double(r0) # Total probability lfd f21, zero_double(r0) # Position expectationlfd f22, zero_double(r0) # Position squared expectation quantum_analysis_loop: # Load current grid spacing (variable for adaptive grids) lwz r6, 0(r4) # Load spacing offset # Load complex wavefunction component with adaptive spacing lfdux f1, r3, r6 # Load ψ_real and advance by spacing lwz r7, 8 # Standard 8-byte advance for imaginary part lfdux f2, r3, r7 # Load ψ_imag and advance # Calculate probability density: |ψ|² = ψ_real² + ψ_imag² fmul f3, f1, f1 # ψ_real² fmadd f4, f2, f2, f3 # ψ_real² + ψ_imag² = |ψ|² # Load current position coordinate lwz r8, 4 # Advance to position data lfdux f5, r3, r8 # Load x coordinate and advance # Update total probability (for normalization check) fadd f20, f20, f4 # total_probability += |ψ|² # Update position expectation value: += x * |ψ|² fmadd f21, f5, f4, f21 # += x * |ψ|² # Update position squared expectation: += x² * |ψ|² fmul f6, f5, f5 # x² fmadd f22, f6, f4, f22 # += x² * |ψ|² addi r4, r4, 4 # Next grid spacing subi r5, r5, 1 # Decrement grid counter cmpwi r5, 0 bne quantum_analysis_loop # Continue analysis # Normalize results and calculate uncertainty fdiv f23, f21, f20 # = Σ(x|ψ|²) / Σ(|ψ|²) fdiv f24, f22, f20 # = Σ(x²|ψ|²) / Σ(|ψ|²) # Calculate uncertainty: Δx = √( - ²) fmul f25, f23, f23 # ² fsub f26, f24, f25 # - ² fsqrt f27, f26 # Δx = √( - ²) # Store results stfd f23, position_expectation(r0) stfd f27, position_uncertainty(r0)
Computational Fluid Dynamics - Turbulence Modeling
# Simulate turbulent flow using Large Eddy Simulation (LES)
lis r3, velocity_field@ha
addi r3, r3, velocity_field@l
lis r4, grid_metrics@ha
addi r4, r4, grid_metrics@l
lwz r5, num_cells(r0) # Number of computational cells
# Each cell contains: [u, v, w, p, ρ, τ_xx, τ_yy, τ_zz, τ_xy, τ_xz, τ_yz]
# where u,v,w are velocities, p is pressure, ρ is density, τ are stress tensors
turbulence_simulation_loop:
# Load grid metrics for adaptive mesh refinement
lwz r6, 0(r4) # Load cell size offset
# Load velocity components with variable grid spacing
lfdux f1, r3, r6 # Load u-velocity and advance by cell size
lwz r7, 8 # Standard advance for next component
lfdux f2, r3, r7 # Load v-velocity and advance
lfdux f3, r3, r7 # Load w-velocity and advance
# Load pressure and density
lfdux f4, r3, r7 # Load pressure and advance
lfdux f5, r3, r7 # Load density and advance
# Load stress tensor components
lfdux f6, r3, r7 # Load τ_xx and advance
lfdux f7, r3, r7 # Load τ_yy and advance
lfdux f8, r3, r7 # Load τ_zz and advance
lfdux f9, r3, r7 # Load τ_xy and advance
lfdux f10, r3, r7 # Load τ_xz and advance
lfdux f11, r3, r7 # Load τ_yz and advance
# Calculate strain rate tensor components
# S_ij = 0.5 * (∂u_i/∂x_j + ∂u_j/∂x_i)
bl calculate_strain_rate # Compute strain rate from velocity gradients
# Apply Smagorinsky subgrid-scale model
# τ_sgs = -2 * ρ * (C_s * Δ)² * |S| * S_ij
lfd f12, smagorinsky_constant(r0) # C_s
lfd f13, filter_width(r0) # Δ (filter width)
# Calculate |S| = √(2 * S_ij * S_ij)
bl calculate_strain_magnitude # Returns |S| in f14
# Calculate subgrid stress
fmul f15, f12, f13 # C_s * Δ
fmul f16, f15, f15 # (C_s * Δ)²
fmul f17, f16, f14 # (C_s * Δ)² * |S|
fmul f18, f5, f17 # ρ * (C_s * Δ)² * |S|
fadd f18, f18, f18 # 2 * ρ * (C_s * Δ)² * |S|
fneg f19, f18 # -2 * ρ * (C_s * Δ)² * |S|
# Update stress tensor with subgrid contributions
fmadd f6, f19, f14, f6 # τ_xx += τ_sgs
fmadd f7, f19, f14, f7 # τ_yy += τ_sgs
fmadd f8, f19, f14, f8 # τ_zz += τ_sgs
# Store updated flow variables (move pointer back to overwrite)
stfd f1, -88(r3) # Store updated u-velocity
stfd f2, -80(r3) # Store updated v-velocity
stfd f3, -72(r3) # Store updated w-velocity
stfd f4, -64(r3) # Store updated pressure
stfd f5, -56(r3) # Store updated density
stfd f6, -48(r3) # Store updated τ_xx
stfd f7, -40(r3) # Store updated τ_yy
stfd f8, -32(r3) # Store updated τ_zz
stfd f9, -24(r3) # Store updated τ_xy
stfd f10, -16(r3) # Store updated τ_xz
stfd f11, -8(r3) # Store updated τ_yz
addi r4, r4, 4 # Next grid metric
subi r5, r5, 1 # Decrement cell counter
cmpwi r5, 0
bne turbulence_simulation_loop # Continue simulation
Financial Engineering - Monte Carlo Path Generation
# Generate stochastic paths for multi-asset portfolio using Cholesky decomposition
lis r3, correlation_matrix@ha
addi r3, r3, correlation_matrix@l
lis r4, random_numbers@ha
addi r4, r4, random_numbers@l
lwz r5, num_assets(r0) # Number of assets in portfolio
lwz r6, num_timesteps(r0) # Number of time steps in simulation
# Generate correlated random variables using Cholesky decomposition
# Z = L * W where L is lower triangular Cholesky matrix, W is independent normals
monte_carlo_path_loop:
li r7, 0 # Asset index i
asset_loop_i:
lfd f10, zero_double(r0) # Initialize correlated random variable
li r8, 0 # Asset index j (for Cholesky sum)
cholesky_sum_loop:
# Calculate matrix element address: L[i][j] = base + (i*(i+1)/2 + j)*8
mullw r9, r7, r7 # i²
add r10, r9, r7 # i² + i = i*(i+1)
srwi r11, r10, 1 # i*(i+1)/2
add r12, r11, r8 # i*(i+1)/2 + j
slwi r13, r12, 3 # Convert to byte offset
# Load Cholesky matrix element with indexing
lfdux f1, r3, r13 # Load L[i][j] and advance
# Load independent random number for asset j
slwi r14, r8, 3 # j * 8
lfdx f2, r4, r14 # Load W[j]
# Accumulate: Z[i] += L[i][j] * W[j]
fmadd f10, f1, f2, f10
addi r8, r8, 1 # j++
cmpw r8, r7 # j <= i (lower triangular)
ble cholesky_sum_loop # Continue Cholesky sum
# Store correlated random variable
lis r15, correlated_randoms@ha
addi r15, r15, correlated_randoms@l
slwi r16, r7, 3 # i * 8
stfdx f10, r15, r16 # Store Z[i]
addi r7, r7, 1 # i++
cmpw r7, r5 # i < num_assets?
blt asset_loop_i # Continue asset loop
# Generate asset price paths using geometric Brownian motion
# S(t+dt) = S(t) * exp((μ - σ²/2)*dt + σ*√dt*Z)
li r17, 0 # Asset index for price update
price_update_loop:
# Load current asset price
lis r18, asset_prices@ha
addi r18, r18, asset_prices@l
slwi r19, r17, 3 # Asset index to byte offset
lfdx f11, r18, r19 # Load S(t)
# Load asset parameters
lis r20, drift_rates@ha
addi r20, r20, drift_rates@l
lfdx f12, r20, r19 # Load μ (drift rate)
lis r21, volatilities@ha
addi r21, r21, volatilities@l
lfdx f13, r21, r19 # Load σ (volatility)
# Load correlated random variable for this asset
lis r15, correlated_randoms@ha
addi r15, r15, correlated_randoms@l
lfdx f14, r15, r19 # Load Z[asset]
# Calculate drift term: (μ - σ²/2)*dt
fmul f15, f13, f13 # σ²
lfd f16, half_constant(r0) # 0.5
fmul f17, f15, f16 # σ²/2
fsub f18, f12, f17 # μ - σ²/2
lfd f19, time_step(r0) # dt
fmul f20, f18, f19 # (μ - σ²/2)*dt
# Calculate diffusion term: σ*√dt*Z
fsqrt f21, f19 # √dt
fmul f22, f13, f21 # σ*√dt
fmul f23, f22, f14 # σ*√dt*Z
# Total exponent: (μ - σ²/2)*dt + σ*√dt*Z
fadd f24, f20, f23
# Calculate new price: S(t+dt) = S(t) * exp(exponent)
bl compute_exp # exp(f24) -> f25
fmul f26, f11, f25 # S(t+dt) = S(t) * exp(exponent)
# Store updated price
stfdx f26, r18, r19 # Store new price
addi r17, r17, 1 # Next asset
cmpw r17, r5 # All assets updated?
blt price_update_loop # Continue price updates
# Advance to next time step
add r4, r4, r5 # Advance random number pointer
subi r6, r6, 1 # Decrement time steps
cmpwi r6, 0
bne monte_carlo_path_loop # Continue simulation
Molecular Dynamics - Force Calculation
# Calculate intermolecular forces using Lennard-Jones potential
lis r3, particle_positions@ha
addi r3, r3, particle_positions@l
lis r4, force_vectors@ha
addi r4, r4, force_vectors@l
lis r5, neighbor_list@ha
addi r5, r5, neighbor_list@l
lwz r6, num_particles(r0) # Number of particles
# Each particle has: [x, y, z, mass, type, charge]
# Each force vector: [fx, fy, fz]
particle_force_loop:
# Load current particle data
lwz r7, 0 # Particle index
slwi r8, r7, 5 # Particle offset (* 32 for 4 doubles)
# Load particle position with automatic advancement
lfdux f1, r3, r8 # Load x coordinate and advance
lwz r9, 8 # Standard advance
lfdux f2, r3, r9 # Load y coordinate and advance
lfdux f3, r3, r9 # Load z coordinate and advance
lfdux f4, r3, r9 # Load mass and advance
# Initialize force accumulator
lfd f10, zero_double(r0) # fx = 0
lfd f11, zero_double(r0) # fy = 0
lfd f12, zero_double(r0) # fz = 0
# Process neighbor list for this particle
lwz r10, 0(r5) # Load number of neighbors
addi r5, r5, 4 # Advance to neighbor indices
neighbor_loop:
cmpwi r10, 0 # Any neighbors left?
beq force_complete # Skip if no neighbors
# Load neighbor index and calculate neighbor position offset
lwz r11, 0(r5) # Load neighbor index
slwi r12, r11, 5 # Neighbor offset (* 32)
# Load neighbor position
lis r13, particle_positions@ha
addi r13, r13, particle_positions@l
lfdux f5, r13, r12 # Load neighbor x and advance
lwz r9, 8
lfdux f6, r13, r9 # Load neighbor y and advance
lfdux f7, r13, r9 # Load neighbor z and advance
# Calculate distance vector: dr = r_neighbor - r_current
fsub f13, f5, f1 # dx = x_neighbor - x_current
fsub f14, f6, f2 # dy = y_neighbor - y_current
fsub f15, f7, f3 # dz = z_neighbor - z_current
# Calculate distance squared: r² = dx² + dy² + dz²
fmul f16, f13, f13 # dx²
fmadd f17, f14, f14, f16 # dx² + dy²
fmadd f18, f15, f15, f17 # r² = dx² + dy² + dz²
# Calculate Lennard-Jones force: F = 24ε[(2σ¹²/r¹³) - (σ⁶/r⁷)]
lfd f19, lj_sigma(r0) # σ (Lennard-Jones size parameter)
lfd f20, lj_epsilon(r0) # ε (Lennard-Jones energy parameter)
# Calculate σ²/r²
fmul f21, f19, f19 # σ²
fdiv f22, f21, f18 # σ²/r²
# Calculate (σ²/r²)³ = σ⁶/r⁶
fmul f23, f22, f22 # (σ²/r²)²
fmul f24, f23, f22 # (σ²/r²)³ = σ⁶/r⁶
# Calculate (σ⁶/r⁶)² = σ¹²/r¹²
fmul f25, f24, f24 # σ¹²/r¹²
# Calculate force magnitude: F = 24ε[(2σ¹²/r¹²) - (σ⁶/r⁶)]/r²
fadd f26, f25, f25 # 2σ¹²/r¹²
fsub f27, f26, f24 # 2σ¹²/r¹² - σ⁶/r⁶
lfd f28, twentyfour_constant(r0) # 24
fmul f29, f28, f20 # 24ε
fmul f30, f29, f27 # 24ε[(2σ¹²/r¹²) - (σ⁶/r⁶)]
fdiv f31, f30, f18 # Force magnitude / r²
# Calculate force components: F_i = force_magnitude * dr_i
fmul f0, f31, f13 # fx_component
fmul f1, f31, f14 # fy_component
fmul f2, f31, f15 # fz_component
# Accumulate forces
fadd f10, f10, f0 # total_fx += fx_component
fadd f11, f11, f1 # total_fy += fy_component
fadd f12, f12, f2 # total_fz += fz_component
addi r5, r5, 4 # Next neighbor index
subi r10, r10, 1 # Decrement neighbor count
b neighbor_loop # Continue neighbor processing
force_complete:
# Store total force for this particle
slwi r14, r7, 4 # Force vector offset (* 16 for 2 doubles)
stfdx f10, r4, r14 # Store fx
addi r15, r14, 8
stfdx f11, r4, r15 # Store fy
addi r16, r15, 8
stfdx f12, r4, r16 # Store fz
addi r7, r7, 1 # Next particle
cmpw r7, r6 # All particles processed?
blt particle_force_loop # Continue force calculation