Instruction Syntax
Mnemonic | Format | Flags |
lfsu | frD,d(rA) | - |
Instruction Encoding
Field | Bits | Description |
Primary Opcode | 0-5 | 110001 (0x31) |
frD | 6-10 | Destination floating-point register |
rA | 11-15 | Source register A |
d | 16-31 | 16-bit signed displacement |
Operation
EA ← (rA) + EXTS(d) frD ← DOUBLE(MEM(EA, 4)) rA ← EA
A single-precision floating-point value (32 bits) is loaded from memory, converted to double-precision format, and placed in floating-point register frD. The effective address is computed by adding the sign-extended displacement to the contents of register rA. After the load, the effective address is stored back into register rA.
Note: This instruction cannot be used with rA=0. The update form requires a valid base register. The loaded single-precision value is automatically converted to double-precision before being stored in the FPR. The effective address should be word-aligned (divisible by 4) for optimal performance.
Affected Registers
rA - Updated with the effective address after the load operation.
For more information on floating-point operations see Section 2.1.4, "Floating-Point Status and Control Register (FPSCR)," in the PowerPC Microprocessor Family: The Programming Environments manual.
Examples
Audio Sample Processing
# Process audio samples with automatic advance (32-bit float samples) lis r3, audio_buffer@ha addi r3, r3, audio_buffer@l lwz r4, num_samples(r0) # Number of audio samples subi r3, r3, 4 # Pre-adjust for first lfsu # Load reverb impulse response lis r5, reverb_impulse@ha addi r5, r5, reverb_impulse@l lwz r6, impulse_length(r0) # Length of impulse response audio_process_loop: lfsu f1, 4(r3) # Load next audio sample and advance pointer # Apply dynamic range compression lfs f2, compression_threshold(r0) fcmpu cr0, f1, f2 # Compare with threshold ble no_compression # Skip if below threshold # Apply compression: output = threshold + (input - threshold) * ratio lfs f3, compression_ratio(r0) # Load compression ratio (0.0 - 1.0) fsub f4, f1, f2 # (input - threshold) fmul f5, f4, f3 # (input - threshold) * ratio fadd f1, f2, f5 # threshold + compressed_amount no_compression: # Apply high-pass filter for clarity lfs f6, prev_sample(r0) # Load previous sample lfs f7, hp_coefficient(r0) # High-pass filter coefficient fsub f8, f1, f6 # Current - previous fmul f9, f8, f7 # Apply filter coefficient stfs f1, prev_sample(r0) # Store current as previous for next iteration # Apply reverb (convolution with impulse response) fmr f10, f9 # Start with filtered sample mr r7, r5 # Reset impulse pointer subi r7, r7, 4 # Pre-adjust for lfsu li r8, 0 # Impulse index reverb_loop: cmpw r8, r6 # Check if processed entire impulse bge reverb_done lfsu f11, 4(r7) # Load impulse coefficient and advance # Calculate delayed sample index sub r9, r8, 0 # For simplicity, use direct convolution cmpwi r9, 0 blt skip_reverb # Skip if negative index # Load delayed sample (simplified - normally would use circular buffer) slwi r10, r9, 2 # Convert to byte offset sub r11, r3, r10 # Calculate delayed sample address cmpw r11, r3 # Bounds check (simplified) bgt skip_reverb lfs f12, 0(r11) # Load delayed sample fmadd f10, f11, f12, f10 # accumulate reverb: result += impulse * delayed_sample skip_reverb: addi r8, r8, 1 # Next impulse coefficient b reverb_loop reverb_done: # Apply final gain and store processed sample lfs f13, output_gain(r0) # Load output gain fmul f14, f10, f13 # Apply gain stfs f14, 0(r3) # Store processed sample back to buffer subi r4, r4, 1 # Decrement sample counter cmpwi r4, 0 bne audio_process_loop # Continue processing
3D Graphics Vertex Processing
# Process 3D vertex data with transformation matrices lis r3, vertex_array@ha addi r3, r3, vertex_array@l lwz r4, num_vertices(r0) # Number of vertices subi r3, r3, 4 # Pre-adjust pointer # Load transformation matrix (4x4 model-view-projection matrix) lis r5, mvp_matrix@ha addi r5, r5, mvp_matrix@l vertex_transform_loop: # Load vertex position (x, y, z, w) lfsu f1, 4(r3) # Load x and advance lfsu f2, 4(r3) # Load y and advance lfsu f3, 4(r3) # Load z and advance lfsu f4, 4(r3) # Load w and advance # Matrix transformation: result = matrix * vertex # Row 0: result.x = m[0]*x + m[1]*y + m[2]*z + m[3]*w lfs f5, 0(r5) # m[0][0] lfs f6, 4(r5) # m[0][1] lfs f7, 8(r5) # m[0][2] lfs f8, 12(r5) # m[0][3] fmul f9, f5, f1 # m[0][0] * x fmadd f9, f6, f2, f9 # + m[0][1] * y fmadd f9, f7, f3, f9 # + m[0][2] * z fmadd f9, f8, f4, f9 # + m[0][3] * w = result.x # Row 1: result.y lfs f5, 16(r5) # m[1][0] lfs f6, 20(r5) # m[1][1] lfs f7, 24(r5) # m[1][2] lfs f8, 28(r5) # m[1][3] fmul f10, f5, f1 # m[1][0] * x fmadd f10, f6, f2, f10 # + m[1][1] * y fmadd f10, f7, f3, f10 # + m[1][2] * z fmadd f10, f8, f4, f10 # + m[1][3] * w = result.y # Row 2: result.z lfs f5, 32(r5) # m[2][0] lfs f6, 36(r5) # m[2][1] lfs f7, 40(r5) # m[2][2] lfs f8, 44(r5) # m[2][3] fmul f11, f5, f1 # m[2][0] * x fmadd f11, f6, f2, f11 # + m[2][1] * y fmadd f11, f7, f3, f11 # + m[2][2] * z fmadd f11, f8, f4, f11 # + m[2][3] * w = result.z # Row 3: result.w lfs f5, 48(r5) # m[3][0] lfs f6, 52(r5) # m[3][1] lfs f7, 56(r5) # m[3][2] lfs f8, 60(r5) # m[3][3] fmul f12, f5, f1 # m[3][0] * x fmadd f12, f6, f2, f12 # + m[3][1] * y fmadd f12, f7, f3, f12 # + m[3][2] * z fmadd f12, f8, f4, f12 # + m[3][3] * w = result.w # Perspective divide (x/w, y/w, z/w) fdiv f13, f9, f12 # x/w fdiv f14, f10, f12 # y/w fdiv f15, f11, f12 # z/w # Store transformed vertex (overwrite original) stfs f13, -16(r3) # Store transformed x stfs f14, -12(r3) # Store transformed y stfs f15, -8(r3) # Store transformed z stfs f12, -4(r3) # Store w (for clipping tests) subi r4, r4, 1 # Decrement vertex counter cmpwi r4, 0 bne vertex_transform_loop # Continue processing vertices
Real-Time Signal Processing - IIR Filter
# Apply Infinite Impulse Response (IIR) filter to signal lis r3, signal_input@ha addi r3, r3, signal_input@l lwz r4, signal_length(r0) # Number of signal samples subi r3, r3, 4 # Pre-adjust pointer # IIR filter coefficients (2nd order Butterworth low-pass filter) lis r5, filter_coeffs@ha addi r5, r5, filter_coeffs@l # Coefficients layout: [b0, b1, b2, a1, a2] where: # y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] # Initialize delay lines (previous input and output samples) lfs f20, zero_constant(r0) # x[n-1] = 0 lfs f21, zero_constant(r0) # x[n-2] = 0 lfs f22, zero_constant(r0) # y[n-1] = 0 lfs f23, zero_constant(r0) # y[n-2] = 0 # Load filter coefficients lfs f10, 0(r5) # b0 lfs f11, 4(r5) # b1 lfs f12, 8(r5) # b2 lfs f13, 12(r5) # a1 lfs f14, 16(r5) # a2 iir_filter_loop: lfsu f1, 4(r3) # Load input sample x[n] and advance # Calculate IIR filter output # y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] fmul f2, f10, f1 # b0 * x[n] fmadd f2, f11, f20, f2 # + b1 * x[n-1] fmadd f2, f12, f21, f2 # + b2 * x[n-2] fnmsub f2, f13, f22, f2 # - a1 * y[n-1] fnmsub f2, f14, f23, f2 # - a2 * y[n-2] # Update delay lines for next iteration fmr f21, f20 # x[n-2] = x[n-1] fmr f20, f1 # x[n-1] = x[n] fmr f23, f22 # y[n-2] = y[n-1] fmr f22, f2 # y[n-1] = y[n] # Store filtered output stfs f2, 0(r3) # Store filtered sample back to buffer subi r4, r4, 1 # Decrement sample counter cmpwi r4, 0 bne iir_filter_loop # Continue filtering
Machine Learning - Neural Network Inference
# Forward pass through dense neural network layer lis r3, input_layer@ha addi r3, r3, input_layer@l lwz r4, input_size(r0) # Number of input neurons lwz r5, output_size(r0) # Number of output neurons lis r6, weight_matrix@ha addi r6, r6, weight_matrix@l # Weight matrix [output_size x input_size] lis r7, output_layer@ha addi r7, r7, output_layer@l subi r3, r3, 4 # Pre-adjust input pointer # Process each output neuron li r8, 0 # Output neuron index output_neuron_loop: lfs f10, zero_constant(r0) # Initialize accumulator for this neuron # Reset input pointer for this output neuron lis r9, input_layer@ha addi r9, r9, input_layer@l subi r9, r9, 4 # Pre-adjust for lfsu mr r10, r4 # Reset input counter # Calculate weight matrix offset for current output neuron mullw r11, r8, r4 # output_index * input_size slwi r12, r11, 2 # Convert to byte offset (* 4) add r13, r6, r12 # Weight pointer for this output neuron subi r13, r13, 4 # Pre-adjust for lfsu input_neuron_loop: lfsu f1, 4(r9) # Load input activation and advance lfsu f2, 4(r13) # Load weight and advance # Multiply-accumulate: sum += input * weight fmadd f10, f1, f2, f10 subi r10, r10, 1 # Decrement input counter cmpwi r10, 0 bne input_neuron_loop # Continue for all inputs # Load bias for this output neuron lis r14, bias_array@ha addi r14, r14, bias_array@l slwi r15, r8, 2 # Convert output index to byte offset lfsx f3, r14, r15 # Load bias # Add bias: activation = sum + bias fadd f11, f10, f3 # Apply ReLU activation function: max(0, x) lfs f4, zero_constant(r0) fcmpu cr0, f11, f4 # Compare with 0 blt use_zero # Use 0 if negative fmr f12, f11 # Use computed value if positive b store_activation use_zero: fmr f12, f4 # Use 0 for negative values store_activation: # Store output activation slwi r16, r8, 2 # Convert output index to byte offset stfsx f12, r7, r16 # Store activation in output layer addi r8, r8, 1 # Next output neuron cmpw r8, r5 # Check if done with all outputs blt output_neuron_loop # Continue for all output neurons # Apply softmax for classification (optional) # First pass: find maximum for numerical stability lfs f20, neg_infinity(r0) # Start with very negative value li r8, 0 # Reset output index find_max_loop: slwi r16, r8, 2 lfsx f13, r7, r16 # Load activation fcmpu cr0, f13, f20 # Compare with current max ble not_new_max fmr f20, f13 # Update maximum not_new_max: addi r8, r8, 1 cmpw r8, r5 blt find_max_loop # Second pass: compute exp(x - max) and sum lfs f21, zero_constant(r0) # Sum of exponentials lis r17, temp_exp@ha addi r17, r17, temp_exp@l # Temporary array for exponentials li r8, 0 # Reset index subi r17, r17, 4 # Pre-adjust for stfsu exp_sum_loop: slwi r16, r8, 2 lfsx f14, r7, r16 # Load activation fsub f15, f14, f20 # x - max bl compute_exp # Compute exp(x - max) -> result in f16 stfsu f16, 4(r17) # Store exponential and advance fadd f21, f21, f16 # Add to sum addi r8, r8, 1 cmpw r8, r5 blt exp_sum_loop # Third pass: divide by sum to get probabilities lis r17, temp_exp@ha addi r17, r17, temp_exp@l subi r17, r17, 4 # Pre-adjust for lfsu li r8, 0 # Reset index softmax_loop: lfsu f17, 4(r17) # Load exponential and advance fdiv f18, f17, f21 # exp / sum = probability slwi r16, r8, 2 stfsx f18, r7, r16 # Store probability addi r8, r8, 1 cmpw r8, r5 blt softmax_loop
Digital Image Processing - Convolution
# Apply 2D convolution filter to image (e.g., edge detection, blur) lis r3, image_data@ha addi r3, r3, image_data@l lwz r4, image_width(r0) # Image width in pixels lwz r5, image_height(r0) # Image height in pixels lis r6, filter_kernel@ha addi r6, r6, filter_kernel@l # 3x3 convolution kernel lis r7, output_image@ha addi r7, r7, output_image@l # Process each pixel (excluding border for simplicity) li r8, 1 # Start from row 1 (skip border) subi r9, r5, 1 # End at height-1 (skip border) row_loop: li r10, 1 # Start from column 1 (skip border) subi r11, r4, 1 # End at width-1 (skip border) col_loop: lfs f10, zero_constant(r0) # Initialize convolution sum # Apply 3x3 kernel li r12, -1 # Kernel row offset (-1, 0, 1) li r13, 3 # Kernel row counter kernel_row_loop: li r14, -1 # Kernel column offset (-1, 0, 1) li r15, 3 # Kernel column counter kernel_col_loop: # Calculate source pixel position add r16, r8, r12 # source_row = current_row + kernel_row_offset add r17, r10, r14 # source_col = current_col + kernel_col_offset # Calculate source pixel address mullw r18, r16, r4 # source_row * image_width add r19, r18, r17 # + source_col slwi r20, r19, 2 # Convert to byte offset (* 4) add r21, r3, r20 # Source pixel address # Load source pixel value lfs f1, 0(r21) # Load pixel value # Calculate kernel coefficient address addi r22, r12, 1 # Convert kernel row offset to index (0-2) mulli r23, r22, 3 # kernel_row_index * 3 addi r24, r14, 1 # Convert kernel col offset to index (0-2) add r25, r23, r24 # kernel_index = row_index * 3 + col_index slwi r26, r25, 2 # Convert to byte offset add r27, r6, r26 # Kernel coefficient address # Load kernel coefficient lfs f2, 0(r27) # Load kernel coefficient # Multiply and accumulate fmadd f10, f1, f2, f10 # sum += pixel * kernel_coeff addi r14, r14, 1 # Next kernel column offset subi r15, r15, 1 # Decrement column counter cmpwi r15, 0 bne kernel_col_loop # Continue kernel column addi r12, r12, 1 # Next kernel row offset subi r13, r13, 1 # Decrement row counter cmpwi r13, 0 bne kernel_row_loop # Continue kernel row # Store convolution result mullw r28, r8, r4 # current_row * image_width add r29, r28, r10 # + current_col slwi r30, r29, 2 # Convert to byte offset add r31, r7, r30 # Output pixel address stfs f10, 0(r31) # Store convolved pixel addi r10, r10, 1 # Next column cmpw r10, r11 # Check if done with row blt col_loop # Continue row addi r8, r8, 1 # Next row cmpw r8, r9 # Check if done with image blt row_loop # Continue image processing
Financial Modeling - Monte Carlo Simulation
# Monte Carlo simulation for option pricing lis r3, random_numbers@ha addi r3, r3, random_numbers@l lwz r4, num_simulations(r0) # Number of Monte Carlo paths subi r3, r3, 4 # Pre-adjust pointer # Load option parameters lfs f20, spot_price(r0) # Current stock price lfs f21, strike_price(r0) # Option strike price lfs f22, risk_free_rate(r0) # Risk-free interest rate lfs f23, volatility(r0) # Stock volatility lfs f24, time_to_expiry(r0) # Time to expiration lfs f25, zero_constant(r0) # Zero for max calculations lfs f26, zero_constant(r0) # Accumulator for option values # Precalculate constants # drift = (r - 0.5 * σ²) * T lfs f27, half_constant(r0) # 0.5 fmul f28, f23, f23 # σ² fmul f29, f27, f28 # 0.5 * σ² fsub f30, f22, f29 # r - 0.5 * σ² fmul f31, f30, f24 # drift = (r - 0.5 * σ²) * T # vol_sqrt_t = σ * √T fsqrt f0, f24 # √T fmul f1, f23, f0 # σ * √T monte_carlo_loop: lfsu f2, 4(r3) # Load random number (standard normal) and advance # Calculate stock price at expiration using Black-Scholes formula # S_T = S_0 * exp(drift + σ*√T*Z) where Z is standard normal random fmul f3, f1, f2 # σ * √T * Z fadd f4, f31, f3 # drift + σ * √T * Z bl compute_exp # exp(drift + σ * √T * Z) -> result in f5 fmul f6, f20, f5 # S_T = S_0 * exp(...) # Calculate option payoff (European call option) # payoff = max(S_T - K, 0) fsub f7, f6, f21 # S_T - K fcmpu cr0, f7, f25 # Compare with 0 blt zero_payoff # Payoff is 0 if S_T < K fmr f8, f7 # Payoff = S_T - K b add_payoff zero_payoff: fmr f8, f25 # Payoff = 0 add_payoff: fadd f26, f26, f8 # Add to accumulator subi r4, r4, 1 # Decrement simulation counter cmpwi r4, 0 bne monte_carlo_loop # Continue simulation # Calculate option price # price = exp(-r*T) * (sum_of_payoffs / num_simulations) lwz r5, num_simulations(r0) # Reload total number of simulations stw r5, temp_simulations(r1) # Store as float lfs f9, temp_simulations(r1) # Load as float fdiv f10, f26, f9 # Average payoff fmul f11, f22, f24 # r * T fneg f12, f11 # -r * T bl compute_exp # exp(-r * T) -> result in f13 fmul f14, f13, f10 # Discounted expected payoff = option price stfs f14, option_price(r0) # Store calculated option price # Calculate additional Greeks (delta, gamma, etc.) if needed # Delta approximation using finite differences would require additional simulations # with slightly perturbed spot prices
Scientific Computing - Numerical Integration
# Adaptive quadrature integration using Simpson's rule lis r3, function_values@ha addi r3, r3, function_values@l lwz r4, num_intervals(r0) # Number of integration intervals subi r3, r3, 4 # Pre-adjust pointer # Integration parameters lfs f20, integration_start(r0) # Lower bound lfs f21, integration_end(r0) # Upper bound lfs f22, zero_constant(r0) # Integral accumulator # Calculate step size: h = (b - a) / n fsub f23, f21, f20 # b - a lwz r5, num_intervals(r0) stw r5, temp_intervals(r1) lfs f24, temp_intervals(r1) # Convert to float fdiv f25, f23, f24 # h = (b - a) / n # Simpson's rule coefficients lfs f26, one_constant(r0) # 1 lfs f27, four_constant(r0) # 4 lfs f28, two_constant(r0) # 2 lfs f29, six_constant(r0) # 6 # Simpson's rule: ∫f(x)dx ≈ (h/3)[f(x₀) + 4f(x₁) + 2f(x₂) + 4f(x₃) + ... + f(xₙ)] li r6, 0 # Interval index integration_loop: lfsu f1, 4(r3) # Load function value f(xᵢ) and advance # Determine Simpson's coefficient based on position cmpwi r6, 0 # First point? beq first_point cmpw r6, r4 # Last point? beq last_point # Check if even or odd index (excluding first and last) andi. r7, r6, 1 # Check if odd bne odd_point # Odd indices get coefficient 4 # Even point (coefficient 2) fmul f2, f1, f28 # f(xᵢ) * 2 b add_to_integral first_point: last_point: # First and last points get coefficient 1 fmul f2, f1, f26 # f(xᵢ) * 1 b add_to_integral odd_point: # Odd points get coefficient 4 fmul f2, f1, f27 # f(xᵢ) * 4 add_to_integral: fadd f22, f22, f2 # Add weighted function value to sum addi r6, r6, 1 # Next interval cmpw r6, r4 # Check if done ble integration_loop # Continue (≤ because we need n+1 points) # Final result: integral = (h/3) * sum fdiv f30, f25, f29 # h/3 fmul f31, f30, f22 # (h/3) * sum stfs f31, integral_result(r0) # Store final integral value # Error estimation using Richardson extrapolation (optional) # This would involve computing the integral with half the step size # and comparing results for adaptive refinement