lfdu
Instruction Syntax
| Mnemonic | Format | Flags |
| lfdu | frD,d(rA) | - |
Instruction Encoding
| Field | Bits | Description |
| Primary Opcode | 0-5 | 110011 (0x33) |
| frD | 6-10 | Destination floating-point register |
| rA | 11-15 | Base address register; updated with the effective address (must not be 0) |
| d | 16-31 | 16-bit signed displacement |
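The field layout above can be cross-checked with a small C sketch that packs and unpacks the instruction word. The helper names `lfdu_encode` and `lfdu_decode` are hypothetical and exist only for this illustration; bit 0 is the most significant bit, following PowerPC numbering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the 32-bit lfdu instruction word. */
uint32_t lfdu_encode(unsigned frD, unsigned rA, int16_t d)
{
    assert(frD < 32 && rA < 32);
    assert(rA != 0);                   /* rA = 0 is an invalid form      */
    return (0x33u << 26)               /* primary opcode, bits 0-5       */
         | ((uint32_t)frD << 21)       /* frD, bits 6-10                 */
         | ((uint32_t)rA << 16)        /* rA, bits 11-15                 */
         | (uint32_t)(uint16_t)d;      /* d, bits 16-31 (two's complement) */
}

/* Hypothetical helper: split an instruction word back into its fields. */
void lfdu_decode(uint32_t insn, unsigned *frD, unsigned *rA, int16_t *d)
{
    *frD = (insn >> 21) & 0x1Fu;
    *rA  = (insn >> 16) & 0x1Fu;
    *d   = (int16_t)(insn & 0xFFFFu);  /* reinterpret low 16 bits as signed */
}
```

For example, `lfdu f1, 8(r3)` packs to the word `0xCC230008` under this layout.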
Operation
EA ← (rA) + EXTS(d)
frD ← MEM(EA, 8)
rA ← EA
A double-precision floating-point value (64 bits) is loaded from memory and placed in floating-point register frD. The effective address is computed by adding the sign-extended displacement to the contents of register rA. After the load, the effective address is stored back into register rA.
Note: The instruction form is invalid when rA = 0, because the update form needs a base register to write the effective address back to. The effective address should be doubleword-aligned (divisible by 8) for best performance. Because the base register advances with each load, this instruction is well suited to walking sequentially through arrays of double-precision values.
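The operation can be modeled on a host machine with a short C sketch. Everything here is an assumption made for illustration: the `Cpu` structure, the 256-byte memory, and the helper `store_double` are hypothetical, and the host is assumed to use IEEE 754 doubles with the same byte order for integers and doubles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical machine state, for illustration only. */
typedef struct {
    uint32_t gpr[32];   /* general-purpose registers      */
    double   fpr[32];   /* floating-point registers       */
    uint8_t  mem[256];  /* tiny big-endian memory image   */
} Cpu;

/* Model of: lfdu frD, d(rA) */
void lfdu(Cpu *cpu, unsigned frD, unsigned rA, int16_t d)
{
    assert(rA != 0);                                     /* invalid form    */
    uint32_t ea = cpu->gpr[rA] + (uint32_t)(int32_t)d;   /* EA <- (rA) + EXTS(d) */
    assert(ea <= sizeof cpu->mem - 8);
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)                          /* big-endian read */
        bits = (bits << 8) | cpu->mem[ea + i];
    memcpy(&cpu->fpr[frD], &bits, sizeof bits);          /* frD <- MEM(EA, 8) */
    cpu->gpr[rA] = ea;                                   /* rA <- EA        */
}

/* Hypothetical helper: write a double into memory in big-endian order. */
void store_double(Cpu *cpu, uint32_t addr, double v)
{
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);
    for (int i = 7; i >= 0; i--) {
        cpu->mem[addr + i] = (uint8_t)bits;
        bits >>= 8;
    }
}
```

With the base register set 8 bytes below the first element, each call with d = 8 loads the next element and leaves the register pointing at the element just loaded, which is the pre-adjust pattern used in the examples that follow.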
Affected Registers
rA - Updated with the effective address after the load operation.
For more information on floating-point operations see Section 2.1.4, "Floating-Point Status and Control Register (FPSCR)," in the PowerPC Microprocessor Family: The Programming Environments manual.
Examples
Scientific Array Processing
# Process array of double-precision values with automatic advance
lis r3, scientific_data@ha
addi r3, r3, scientific_data@l
lwz r4, array_length(r0) # Number of elements
subi r3, r3, 8 # Pre-adjust for first lfdu
compute_loop:
lfdu f1, 8(r3) # Load next double and advance pointer
# Perform scientific computation (e.g., statistical analysis)
# The helper routines are assumed to preserve r3 and r4 and to take and return their value in f1
bl compute_square_root # f1 = sqrt(f1)
bl compute_logarithm # f1 = log(f1)
# Store result back
stfd f1, 0(r3) # Store computed result
subi r4, r4, 1 # Decrement counter
cmpwi r4, 0
bne compute_loop # Continue if more elements
Matrix Row Processing
# Process matrix rows using update addressing
lis r3, matrix_data@ha
addi r3, r3, matrix_data@l
lwz r4, num_rows(r0) # Number of rows
lwz r5, num_cols(r0) # Number of columns
subi r3, r3, 8 # Pre-adjust pointer
row_loop:
mr r6, r5 # Column counter
col_loop:
lfdu f1, 8(r3) # Load matrix element and advance
# Apply transformation (e.g., normalization)
lfd f2, normalization_factor(r0)
fmul f3, f1, f2 # Apply scaling factor
# Apply bias
lfd f4, bias_value(r0)
fadd f5, f3, f4 # Add bias
# Store transformed value
stfd f5, 0(r3) # Store back to matrix
subi r6, r6, 1 # Decrement column counter
cmpwi r6, 0
bne col_loop # Continue row
subi r4, r4, 1 # Decrement row counter
cmpwi r4, 0
bne row_loop # Continue matrix
Financial Time Series Analysis
# Process financial time series data with moving averages
lis r3, price_data@ha
addi r3, r3, price_data@l
lwz r4, data_points(r0) # Number of data points
li r5, 20 # Moving average window
subi r3, r3, 8 # Pre-adjust pointer
# Initialize moving average sum
lfd f10, zero_constant(r0) # Running sum
li r6, 0 # Current index
time_series_loop:
lfdu f1, 8(r3) # Load next price and advance
# Add to running sum
fadd f10, f10, f1 # sum += current_price
# Check whether the window is full: index >= window_size - 1
subi r8, r5, 1 # r8 = window_size - 1
cmpw r6, r8 # index >= window_size - 1?
blt skip_average # Skip if not enough data
# Convert the integer window size to double; a plain stw/lfs pair would
# only reinterpret the integer bits as a float
lis r7, 0x4330 # High word of the 2^52 bias
stw r7, temp_window(r1) # temp_window: 8-byte aligned scratch slot
xoris r7, r5, 0x8000 # Window size with the sign bit flipped
stw r7, temp_window+4(r1)
lfd f11, temp_window(r1) # f11 = 2^52 + 2^31 + window_size
lfd f12, int_to_double_magic(r0) # Assumed constant 0x4330000080000000
fsub f11, f11, f12 # f11 = (double)window_size
fdiv f12, f10, f11 # average = sum / window_size
# Store moving average
lis r7, moving_avg_array@ha
addi r7, r7, moving_avg_array@l
sub r8, r6, r5
addi r8, r8, 1 # r8 = index - window_size + 1 (output index and oldest sample in window)
slwi r9, r8, 3 # Convert to byte offset
stfdx f12, r7, r9 # Store moving average
# Remove oldest value from sum (sliding window); r3 is not modified here
lis r10, price_data@ha
addi r10, r10, price_data@l
lfdx f13, r10, r9 # Load oldest value in the window
fsub f10, f10, f13 # Remove from sum
skip_average:
addi r6, r6, 1 # Increment index
subi r4, r4, 1 # Decrement remaining count
cmpwi r4, 0
bne time_series_loop # Continue processing
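The sliding-window bookkeeping is easy to get off by one, so a C reference for the same calculation can be useful when checking the assembly. This is a minimal sketch; the function name `moving_average` and its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Writes one average per full window to out and returns how many were written.
   out must have room for count - window + 1 values when count >= window. */
size_t moving_average(const double *prices, size_t count,
                      size_t window, double *out)
{
    assert(window > 0);
    double sum = 0.0;
    size_t written = 0;
    for (size_t i = 0; i < count; i++) {
        sum += prices[i];                    /* add the newest sample        */
        if (i + 1 >= window) {               /* window holds `window` samples */
            size_t oldest = i + 1 - window;  /* first sample in the window   */
            out[oldest] = sum / (double)window;
            sum -= prices[oldest];           /* slide the window forward     */
            written++;
        }
    }
    return written;
}
```

The index of the oldest sample in the window is also the index of the output slot, so a single offset serves both the store and the subtraction.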
Digital Signal Processing - FFT Data Loading
# Load complex numbers for FFT processing with automatic advance
lis r3, complex_data@ha
addi r3, r3, complex_data@l
lwz r4, fft_size(r0) # FFT size (number of complex samples)
subi r3, r3, 8 # Pre-adjust pointer
li r5, 0 # Sample index
# Separate real and imaginary arrays for FFT algorithm
lis r6, real_array@ha
addi r6, r6, real_array@l
lis r7, imag_array@ha
addi r7, r7, imag_array@l
subi r6, r6, 8 # Pre-adjust real array pointer
subi r7, r7, 8 # Pre-adjust imag array pointer
load_complex_loop:
# Load real part
lfdu f1, 8(r3) # Load real part and advance
stfdu f1, 8(r6) # Store in real array and advance
# Load imaginary part
lfdu f2, 8(r3) # Load imaginary part and advance
stfdu f2, 8(r7) # Store in imag array and advance
addi r5, r5, 1 # Increment sample counter
cmpw r5, r4 # Check if done
blt load_complex_loop # Continue loading
# Apply windowing function to reduce spectral leakage
lis r8, window_function@ha
addi r8, r8, window_function@l
subi r8, r8, 8 # Pre-adjust window pointer
mr r5, r4 # Reset sample counter
slwi r9, r4, 3 # Bytes written to each array = fft_size * 8
sub r6, r6, r9 # Rewind real array pointer to (start - 8)
sub r7, r7, r9 # Rewind imag array pointer to (start - 8)
apply_window_loop:
lfdu f3, 8(r8) # Load window coefficient and advance
lfdu f4, 8(r6) # Load real sample and advance
lfdu f5, 8(r7) # Load imag sample and advance
# Apply window function
fmul f6, f4, f3 # real *= window
fmul f7, f5, f3 # imag *= window
# Store windowed samples back
stfd f6, 0(r6) # Store windowed real
stfd f7, 0(r7) # Store windowed imag
subi r5, r5, 1 # Decrement counter
cmpwi r5, 0
bne apply_window_loop # Continue windowing
Machine Learning - Neural Network Training
# Load training data for neural network with automatic advance
lis r3, training_data@ha
addi r3, r3, training_data@l
lwz r4, num_samples(r0) # Number of training samples
lwz r5, input_size(r0) # Size of each input vector
subi r3, r3, 8 # Pre-adjust pointer
training_loop:
# Load input vector
mr r6, r5 # Input vector counter
lis r7, input_vector@ha
addi r7, r7, input_vector@l
subi r7, r7, 8 # Pre-adjust input vector pointer
load_input_loop:
lfdu f1, 8(r3) # Load input component and advance
stfdu f1, 8(r7) # Store in input vector and advance
subi r6, r6, 1 # Decrement input counter
cmpwi r6, 0
bne load_input_loop # Continue loading input
# Load target output
lfdu f2, 8(r3) # Load target value and advance
stfd f2, target_output(r0) # Store target
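# Forward propagation through network (the routine is assumed to preserve r3, r4, and r5)
bl forward_propagation # Process input vector
# Load network output and reload the target (f2 is not preserved across calls under the standard ABIs)
lfd f3, network_output(r0)
lfd f2, target_output(r0)
# Calculate error: error = target - output
fsub f4, f2, f3 # f4 = error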
# Calculate squared error for loss function
fmul f5, f4, f4 # f5 = error^2
lfd f6, total_loss(r0) # Load accumulated loss
fadd f7, f6, f5 # Add to total loss
stfd f7, total_loss(r0) # Store updated loss
# Backpropagation to update weights
bl backpropagation # Update network weights
subi r4, r4, 1 # Decrement sample counter
cmpwi r4, 0
bne training_loop # Continue training
Computational Physics - Particle Simulation
# Process particle data for physics simulation
lis r3, particle_array@ha
addi r3, r3, particle_array@l
lwz r4, num_particles(r0) # Number of particles
subi r3, r3, 8 # Pre-adjust pointer
# Particle structure: [x, y, z, vx, vy, vz, mass, charge]
# Each field is 8 bytes (double precision)
particle_loop:
# Load position
lfdu f1, 8(r3) # Load x and advance
lfdu f2, 8(r3) # Load y and advance
lfdu f3, 8(r3) # Load z and advance
# Load velocity
lfdu f4, 8(r3) # Load vx and advance
lfdu f5, 8(r3) # Load vy and advance
lfdu f6, 8(r3) # Load vz and advance
# Load mass and charge
lfdu f7, 8(r3) # Load mass and advance
lfdu f8, 8(r3) # Load charge and advance
# Calculate forces (simplified electromagnetic force)
lfd f9, electric_field_x(r0) # Load electric field components
lfd f10, electric_field_y(r0)
lfd f11, electric_field_z(r0)
# Force = charge * electric_field
fmul f12, f8, f9 # fx = charge * Ex
fmul f13, f8, f10 # fy = charge * Ey
fmul f14, f8, f11 # fz = charge * Ez
# Calculate acceleration: a = F / mass
fdiv f15, f12, f7 # ax = fx / mass
fdiv f16, f13, f7 # ay = fy / mass
fdiv f17, f14, f7 # az = fz / mass
# Load time step
lfd f18, time_step(r0) # dt
# Update velocity: v = v + a * dt
fmadd f4, f15, f18, f4 # vx += ax * dt
fmadd f5, f16, f18, f5 # vy += ay * dt
fmadd f6, f17, f18, f6 # vz += az * dt
# Update position: x = x + v * dt
fmadd f1, f4, f18, f1 # x += vx * dt
fmadd f2, f5, f18, f2 # y += vy * dt
fmadd f3, f6, f18, f3 # z += vz * dt
# Store updated particle data (r3 points at charge, the last field)
stfd f1, -56(r3) # Store x (7 fields back)
stfd f2, -48(r3) # Store y
stfd f3, -40(r3) # Store z
stfd f4, -32(r3) # Store vx
stfd f5, -24(r3) # Store vy
stfd f6, -16(r3) # Store vz
# mass and charge unchanged
subi r4, r4, 1 # Decrement particle counter
cmpwi r4, 0
bne particle_loop # Continue simulation
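After the eight update-form loads, r3 rests on the last field of the particle, so every store-back offset is measured from `charge`. The C sketch below spells out that layout and the same update step. The `Particle` type, the `REL_TO_LAST` macro, and `particle_step` are hypothetical names; the C version also rounds after each multiply unless the compiler fuses operations, so results can differ from the fmadd-based assembly in the last bit.

```c
#include <assert.h>
#include <stddef.h>

/* Eight doubles, 64 bytes, in the order given in the comment above. */
typedef struct {
    double x, y, z;
    double vx, vy, vz;
    double mass, charge;
} Particle;

/* Byte offset of a field relative to the last field (charge). */
#define REL_TO_LAST(field) \
    ((long)offsetof(Particle, field) - (long)offsetof(Particle, charge))

/* One step for a particle in a uniform electric field (ex, ey, ez). */
void particle_step(Particle *p, double ex, double ey, double ez, double dt)
{
    /* a = F / mass, with F = charge * E */
    p->vx += (p->charge * ex / p->mass) * dt;
    p->vy += (p->charge * ey / p->mass) * dt;
    p->vz += (p->charge * ez / p->mass) * dt;
    /* position uses the already-updated velocity, as in the assembly */
    p->x += p->vx * dt;
    p->y += p->vy * dt;
    p->z += p->vz * dt;
}
```

Under this layout `REL_TO_LAST(x)` is -56 and `REL_TO_LAST(vz)` is -16, which are the displacements needed to write the position and velocity back without moving the pointer.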
Audio Processing - Convolution Reverb
# Apply convolution reverb using impulse response
lis r3, audio_input@ha
addi r3, r3, audio_input@l
lis r4, impulse_response@ha
addi r4, r4, impulse_response@l
lis r5, audio_output@ha
addi r5, r5, audio_output@l
lwz r6, audio_length(r0) # Length of audio signal
lwz r7, impulse_length(r0) # Length of impulse response
subi r3, r3, 8 # Pre-adjust input pointer
li r8, 0 # Current sample index
convolution_loop:
lfd f10, zero_constant(r0) # Initialize accumulator
# Convolution: output[n] = sum over k of input[n-k] * impulse[k]
li r9, 0 # Impulse index
mr r10, r3 # Current input position
mr r11, r4 # Reset impulse pointer
subi r11, r11, 8 # Pre-adjust impulse pointer
impulse_loop:
cmpw r9, r7 # Check if done with impulse
bge impulse_done
# Check bounds for input signal
sub r12, r8, r9 # input_index = current - impulse_index
cmpwi r12, 0 # Check if negative
blt skip_impulse # Skip if before start of input
# Load input sample and impulse coefficient
slwi r13, r12, 3 # Convert index to byte offset
lis r14, audio_input@ha
addi r14, r14, audio_input@l
lfdx f1, r14, r13 # Load input[input_index]
lfdu f2, 8(r11) # Load impulse coefficient and advance
# Multiply and accumulate
fmadd f10, f1, f2, f10 # accumulator += input * impulse
skip_impulse:
addi r9, r9, 1 # Next impulse sample
b impulse_loop
impulse_done:
# Store convolution result
slwi r15, r8, 3 # Convert output index to byte offset
stfdx f10, r5, r15 # Store output[current_sample]
# Advance to next input sample
lfdu f3, 8(r3) # Load current input (advance pointer)
addi r8, r8, 1 # Increment sample index
subi r6, r6, 1 # Decrement remaining samples
cmpwi r6, 0
bne convolution_loop # Continue convolution
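For comparison, the same direct-form convolution can be written as a short C reference. This is a sketch under the assumption that samples before the start of the input count as zero; the function name `convolve` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* out[i] = sum over k of in[i - k] * h[k], for k < m and k <= i.
   out must have room for n values. */
void convolve(const double *in, size_t n,
              const double *h, size_t m, double *out)
{
    for (size_t i = 0; i < n; i++) {
        double acc = 0.0;
        for (size_t k = 0; k < m && k <= i; k++)
            acc += in[i - k] * h[k];   /* terms with i - k < 0 are skipped */
        out[i] = acc;
    }
}
```

The inner loop stops as soon as the input index would go negative, which is the same condition the assembly tests before each multiply-accumulate.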
Geological Data Analysis
# Analyze seismic data with automatic pointer advancement
lis r3, seismic_data@ha
addi r3, r3, seismic_data@l
lwz r4, num_traces(r0) # Number of seismic traces
lwz r5, samples_per_trace(r0) # Samples per trace
subi r3, r3, 8 # Pre-adjust pointer
# Global statistics across all traces
lfd f20, max_double(r0) # Global minimum (starts at +max_double so any sample is smaller)
fneg f21, f20 # Global maximum (starts at -max_double so any sample is larger)
lfd f22, zero_constant(r0) # Global sum for mean
trace_loop:
mr r6, r5 # Sample counter for current trace
lfd f10, zero_constant(r0) # Trace sum
lfd f11, max_double(r0) # Trace minimum (starts at +max_double)
fneg f12, f11 # Trace maximum (starts at -max_double so negative amplitudes register)
sample_loop:
lfdu f1, 8(r3) # Load seismic amplitude and advance
# Update trace statistics
fadd f10, f10, f1 # Add to trace sum
# Check for new minimum
fcmpu cr0, f1, f11 # Compare with current min
bge check_max # Skip if not smaller
fmr f11, f1 # Update trace minimum
check_max:
fcmpu cr0, f1, f12 # Compare with current max
ble update_global # Skip if not larger
fmr f12, f1 # Update trace maximum
update_global:
# Update global statistics
fcmpu cr0, f11, f20 # Compare trace min with global min
bge check_global_max # Skip if not smaller
fmr f20, f11 # Update global minimum
check_global_max:
fcmpu cr0, f12, f21 # Compare trace max with global max
ble continue_sample # Skip if not larger
fmr f21, f12 # Update global maximum
continue_sample:
fadd f22, f22, f1 # Add to global sum
subi r6, r6, 1 # Decrement sample counter
cmpwi r6, 0
bne sample_loop # Continue trace
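# Calculate trace mean: convert the integer sample count to double first
# (a plain stw/lfs pair would only reinterpret the integer bits as a float)
lis r7, 0x4330 # High word of the 2^52 bias
stw r7, temp_samples(r1) # temp_samples: 8-byte aligned scratch slot
xoris r7, r5, 0x8000 # Sample count with the sign bit flipped
stw r7, temp_samples+4(r1)
lfd f13, temp_samples(r1) # f13 = 2^52 + 2^31 + count
lfd f14, int_to_double_magic(r0) # Assumed constant 0x4330000080000000
fsub f13, f13, f14 # f13 = (double)count
fdiv f14, f10, f13 # trace_mean = trace_sum / num_samples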
# Store trace statistics
lis r7, trace_stats@ha
addi r7, r7, trace_stats@l
subi r8, r4, 1 # Calculate trace index (reverse count)
li r9, 4 # 4 stats per trace (sum, mean, min, max)
mullw r10, r8, r9 # trace_index * 4
slwi r11, r10, 3 # Convert to byte offset (* 8)
stfdx f10, r7, r11 # Store trace sum
addi r11, r11, 8
stfdx f14, r7, r11 # Store trace mean
addi r11, r11, 8
stfdx f11, r7, r11 # Store trace minimum
addi r11, r11, 8
stfdx f12, r7, r11 # Store trace maximum
subi r4, r4, 1 # Decrement trace counter
cmpwi r4, 0
bne trace_loop # Continue processing
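# Calculate and store global statistics
lwz r12, total_samples(r0) # Total number of samples
lis r9, 0x4330 # High word of the 2^52 bias
stw r9, temp_total(r1) # temp_total: 8-byte aligned scratch slot
xoris r12, r12, 0x8000 # Total with the sign bit flipped
stw r12, temp_total+4(r1)
lfd f15, temp_total(r1) # f15 = 2^52 + 2^31 + total
lfd f16, int_to_double_magic(r0) # Assumed constant 0x4330000080000000
fsub f15, f15, f16 # f15 = (double)total_samples
fdiv f16, f22, f15 # global_mean = global_sum / total_samples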
stfd f20, global_min(r0) # Store global minimum
stfd f21, global_max(r0) # Store global maximum
stfd f16, global_mean(r0) # Store global mean
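The mean calculations need the integer sample counts as floating-point values. 32-bit PowerPC implementations have no direct integer-to-double instruction, and one common idiom builds the double from a biased bit pattern and subtracts a constant. The C sketch below models that arithmetic on a host; `bits_to_double` and `int32_to_double` are hypothetical names, and the host is assumed to use IEEE 754 doubles with matching integer byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double (what lfd does with memory). */
double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Model of the idiom: high word 0x43300000, low word n with its sign bit flipped. */
double int32_to_double(int32_t n)
{
    uint64_t low    = (uint32_t)n ^ 0x80000000u;        /* n + 2^31 as unsigned   */
    uint64_t biased = 0x4330000000000000ull | low;      /* value 2^52 + 2^31 + n  */
    uint64_t magic  = 0x4330000080000000ull;            /* value 2^52 + 2^31      */
    return bits_to_double(biased) - bits_to_double(magic);  /* exactly n          */
}
```

Both operands are exactly representable and their difference fits in 32 bits, so the subtraction is exact for every signed 32-bit input.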
Quantum Mechanics - Wavefunction Evolution
# Evolve quantum wavefunction using time-dependent Schrödinger equation
lis r3, wavefunction_real@ha
addi r3, r3, wavefunction_real@l
lis r4, wavefunction_imag@ha
addi r4, r4, wavefunction_imag@l
lis r5, hamiltonian@ha
addi r5, r5, hamiltonian@l
lwz r6, grid_points(r0) # Number of spatial grid points
subi r3, r3, 8 # Pre-adjust real part pointer
subi r4, r4, 8 # Pre-adjust imaginary part pointer
subi r5, r5, 8 # Pre-adjust Hamiltonian pointer
# Load physical constants
lfd f20, hbar(r0) # ℏ (reduced Planck constant)
lfd f21, time_step(r0) # Δt
fdiv f22, f21, f20 # Δt/ℏ
evolution_loop:
# Load wavefunction components
lfdu f1, 8(r3) # Load ψ_real and advance
lfdu f2, 8(r4) # Load ψ_imag and advance
# Load Hamiltonian matrix element (simplified as potential)
lfdu f3, 8(r5) # Load H and advance
# Apply time evolution operator: ψ(t+dt) = exp(-iHdt/ℏ)ψ(t)
# For small dt, use first-order approximation:
# ψ_new = ψ - i(H*dt/ℏ)ψ
# Calculate H*ψ (simplified)
fmul f4, f3, f1 # H * ψ_real
fmul f5, f3, f2 # H * ψ_imag
# Apply time evolution: ψ_new = ψ - i(H*dt/ℏ)ψ
# Real part: ψ_real_new = ψ_real - (-H*ψ_imag*dt/ℏ) = ψ_real + H*ψ_imag*dt/ℏ
fmadd f6, f5, f22, f1 # ψ_real_new = ψ_real + H*ψ_imag*Δt/ℏ
# Imaginary part: ψ_imag_new = ψ_imag - H*ψ_real*dt/ℏ
fnmsub f7, f4, f22, f2 # ψ_imag_new = ψ_imag - H*ψ_real*Δt/ℏ
# Store evolved wavefunction
stfd f6, 0(r3) # Store new real part
stfd f7, 0(r4) # Store new imaginary part
subi r6, r6, 1 # Decrement grid point counter
cmpwi r6, 0
bne evolution_loop # Continue evolution
# Normalize wavefunction to preserve probability
lis r3, wavefunction_real@ha
addi r3, r3, wavefunction_real@l
lis r4, wavefunction_imag@ha
addi r4, r4, wavefunction_imag@l
lwz r6, grid_points(r0) # Reset grid counter
lfd f10, zero_constant(r0) # Normalization sum
subi r3, r3, 8 # Pre-adjust pointers
subi r4, r4, 8
norm_loop:
lfdu f1, 8(r3) # Load real part and advance
lfdu f2, 8(r4) # Load imaginary part and advance
# Calculate |ψ|² = ψ_real² + ψ_imag²
fmadd f3, f1, f1, f10 # sum += ψ_real²
fmadd f10, f2, f2, f3 # sum += ψ_imag²
subi r6, r6, 1 # Decrement counter
cmpwi r6, 0
bne norm_loop # Continue normalization calculation
# Calculate normalization factor: 1/√(∫|ψ|²dx)
fsqrt f11, f10 # √(∫|ψ|²dx); fsqrt is an optional instruction and is not implemented on every PowerPC processor
lfd f12, one_constant(r0) # Load 1.0
fdiv f13, f12, f11 # Normalization factor = 1/√(∫|ψ|²dx)
# Apply normalization
# ... (similar loop to multiply all wavefunction values by f13)
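The first-order update used in the evolution loop can be checked against a C sketch of the same arithmetic. The function names and the array-based signature are hypothetical, and the Hamiltonian is treated as a per-point potential, as in the assembly.

```c
#include <assert.h>
#include <stddef.h>

/* One first-order step: psi_new = psi - i * (H * dt / hbar) * psi.
   With psi = a + i*b and c = H * dt / hbar:
     real part: a + c * b
     imag part: b - c * a                                             */
void evolve_step(double *re, double *im, const double *h,
                 size_t n, double dt_over_hbar)
{
    for (size_t i = 0; i < n; i++) {
        double a = re[i], b = im[i];
        double c = h[i] * dt_over_hbar;
        re[i] = a + c * b;
        im[i] = b - c * a;
    }
}

/* Sum of |psi|^2 over the grid, the quantity the normalization loop accumulates. */
double norm_squared(const double *re, const double *im, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += re[i] * re[i] + im[i] * im[i];
    return sum;
}
```

Each point's squared magnitude grows by a factor of (1 + c²) under this update, so the first-order step does not preserve the norm on its own, which is why a normalization pass follows.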