Instruction Syntax
| Mnemonic | Format | Flags |
|----------|--------|-------|
| lwzux | rD,rA,rB | - |
Instruction Encoding
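| Field | Bits | Description |
|-------|------|-------------|
| Primary Opcode | 0-5 | 011111 (31, 0x1F) |
| rD | 6-10 | Destination register |
| rA | 11-15 | Base register (receives the effective address) |
| rB | 16-20 | Index register |
| XO | 21-30 | 55 (extended opcode) |
| Reserved | 31 | 0 (lwzux has no Rc/record form) |

Bits use IBM numbering, in which bit 0 is the most significant bit of the 32-bit instruction word.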
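The field table above can be cross-checked with a short Python sketch. It packs the operands of lwzux into a 32-bit instruction word and then splits a word back into its X-form fields. The function names are illustrative only and are not part of any assembler or toolchain.

```python
# X-form field layout for lwzux, in IBM bit numbering (bit 0 = most significant bit).
PRIMARY_OPCODE = 31
XO_LWZUX = 55


def encode_lwzux(rd: int, ra: int, rb: int) -> int:
    """Assemble lwzux rD,rA,rB into a 32-bit instruction word."""
    for reg in (rd, ra, rb):
        if not 0 <= reg <= 31:
            raise ValueError(f"register number out of range: {reg}")
    return (
        (PRIMARY_OPCODE << 26)  # bits 0-5
        | (rd << 21)            # bits 6-10
        | (ra << 16)            # bits 11-15
        | (rb << 11)            # bits 16-20
        | (XO_LWZUX << 1)       # bits 21-30; bit 31 stays 0
    )


def decode_x_form(word: int) -> dict:
    """Split a 32-bit X-form instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rD": (word >> 21) & 0x1F,
        "rA": (word >> 16) & 0x1F,
        "rB": (word >> 11) & 0x1F,
        "xo": (word >> 1) & 0x3FF,
        "bit31": word & 1,
    }


word = encode_lwzux(3, 4, 5)
print(hex(word))            # 0x7c64286e
print(decode_x_form(word))
```

For example, `lwzux r3,r4,r5` encodes to 0x7C64286E. Decoding that word gives back opcode 31, XO 55 and registers 3, 4 and 5.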
Operation
    EA ← (rA) + (rB)
    rD ← MEM(EA, 4)
    rA ← EA
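The effective address (EA) is the sum of the contents of registers rA and rB. The word (32 bits) at EA is loaded into register rD, and EA is then written back into register rA. On 64-bit implementations the word is zero-extended into rD, which is where the "and Zero" in the instruction name (Load Word and Zero with Update Indexed) comes from.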
Note: The instruction form is invalid if rA = 0 or rA = rD. After the load, rA holds the address of the word just loaded, so repeating lwzux with the same rB steps through memory without a separate add instruction. lwzu takes its displacement from a 16-bit immediate. lwzux instead takes its offset from rB, so the stride can be computed at run time. This makes lwzux useful for walking arrays and records whose stride or size is known only at run time.
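The register-transfer description can be modeled in a few lines of Python. This is a minimal sketch. It assumes a 32-bit big-endian machine whose memory is a flat byte string starting at address 0. The names `lwzux` and `InvalidForm` are invented for the sketch. The architecture leaves the result of an invalid form undefined, so raising an exception here is a modeling choice, not hardware behavior.

```python
class InvalidForm(Exception):
    """Raised for operand combinations the architecture defines as invalid."""


def lwzux(regs: list, memory: bytes, rd: int, ra: int, rb: int) -> None:
    """Execute lwzux rD,rA,rB against the register list `regs` (32-bit, big-endian)."""
    if ra == 0 or ra == rd:
        raise InvalidForm(f"lwzux r{rd},r{ra},r{rb}: rA must be nonzero and differ from rD")
    ea = (regs[ra] + regs[rb]) & 0xFFFFFFFF               # EA <- (rA) + (rB), modulo 2**32
    if ea + 4 > len(memory):
        raise IndexError(f"EA 0x{ea:08x} is outside the modeled memory")
    regs[rd] = int.from_bytes(memory[ea:ea + 4], "big")   # rD <- MEM(EA, 4)
    regs[ra] = ea                                         # rA <- EA


memory = bytes(range(16))   # addresses 0-15 hold the bytes 0x00 ... 0x0f
regs = [0] * 32
regs[4] = 0                 # base pointer
regs[5] = 4                 # stride
lwzux(regs, memory, 3, 4, 5)
print(hex(regs[3]), regs[4])   # 0x4050607 4
lwzux(regs, memory, 3, 4, 5)
print(hex(regs[3]), regs[4])   # 0x8090a0b 8
```

Each call leaves rA pointing at the word just loaded, so the second call reads the next word without a separate add.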
Affected Registers
rD - Receives the word loaded from memory.
rA - Updated with the effective address after the load operation.
For more information on memory addressing see Section 2.1.6, "Effective Address Calculation," in the PowerPC Microprocessor Family: The Programming Environments manual.
Examples
Operating System - Process Table Management
    # Traverse process control blocks (PCBs) of variable size.
    # PCB layout (assumed): [pid, state, priority, memory_base, time_slice, ...]
    # pcb_sizes[i] holds the size in bytes of PCB i; PCBs are stored back to back.
    # Loop state lives in non-volatile r28-r31 so it survives "bl context_switch";
    # the enclosing function must save r28-r31 and LR.
        lis     r31, process_table@ha
        addi    r31, r31, process_table@l   # r31 -> first PCB
        lis     r30, pcb_sizes@ha
        addi    r30, r30, pcb_sizes@l       # r30 -> size of the current PCB
        lis     r29, num_processes@ha
        lwz     r29, num_processes@l(r29)   # number of active processes
        cmpwi   r29, 0
        beq     scheduler_done
        li      r28, 0                      # stride to the first PCB is 0
    process_scheduler_loop:
        lwzux   r7, r31, r28                # r31 += stride; r7 = pid
        lwz     r9, 4(r31)                  # state
        lwz     r10, 8(r31)                 # priority
        lwz     r11, 12(r31)                # memory_base
        lwz     r28, 0(r30)                 # stride to the next PCB = size of this one
        addi    r30, r30, 4                 # next entry in pcb_sizes
        # Scheduling decision: only ready processes consume their time slice
        cmpwi   r9, PROCESS_READY
        bne     skip_scheduling
        lwz     r12, 16(r31)                # time slice remaining
        subi    r12, r12, 1                 # decrement time slice
        cmpwi   r12, 0
        bgt     update_timeslice            # time remains
        bl      context_switch              # slice expired: switch to the next process
        li      r12, DEFAULT_TIMESLICE      # reset the time slice
    update_timeslice:
        stw     r12, 16(r31)                # store updated time slice
    skip_scheduling:
        subi    r29, r29, 1                 # decrement process counter
        cmpwi   r29, 0
        bne     process_scheduler_loop
    scheduler_done:
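The stride logic of the scheduler loop can be checked off-target with a Python model. The model makes these assumptions:

- PCBs are stored back to back in big-endian byte order.
- The first five words of each PCB are pid, state, priority, memory_base and time_slice.
- The values of PROCESS_READY and DEFAULT_TIMESLICE are placeholders.
- Appending to `expired` stands in for the call to context_switch.

The key point is that each step's stride is the size of the PCB just visited, and the first stride is 0.

```python
import struct

PROCESS_READY = 1        # placeholder state code
DEFAULT_TIMESLICE = 5    # placeholder reset value


def schedule_pass(table: bytearray, pcb_sizes: list) -> list:
    """Run one scheduler pass over back-to-back PCBs; return pids whose slice expired."""
    expired = []
    ptr, stride = 0, 0                                 # first lwzux uses a stride of 0
    for size in pcb_sizes:
        ptr += stride                                  # lwzux: EA = ptr + stride; ptr <- EA
        pid, state = struct.unpack_from(">II", table, ptr)
        stride = size                                  # next stride = size of this PCB
        if state != PROCESS_READY:
            continue
        (left,) = struct.unpack_from(">i", table, ptr + 16)   # time slice (signed, like cmpwi)
        left -= 1
        if left <= 0:
            expired.append(pid)                        # stands in for bl context_switch
            left = DEFAULT_TIMESLICE
        struct.pack_into(">i", table, ptr + 16, left)
    return expired


table = bytearray(struct.pack(">5I", 10, PROCESS_READY, 0, 0x1000, 1))   # 20-byte PCB
table += struct.pack(">6I", 11, PROCESS_READY, 0, 0x2000, 3, 0)          # 24-byte PCB
print(schedule_pass(table, [20, 24]))                                   # [10]
print(struct.unpack_from(">i", table, 16)[0], struct.unpack_from(">i", table, 36)[0])  # 5 2
```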
Database Engine - Index B-Tree Traversal
    # Search a B-tree index for the key stored at search_key.
    # Node layout (assumed): [num_keys, key1, rec1, child1, ..., keyn, recn, childn, child_right]
    # child_i covers keys below key_i, child_right covers keys above keyn, 0 means no child.
    # Record and child pointers use separate slots so a record is never followed as a node.
    # The enclosing function must save LR (bl process_record).
        lis     r3, btree_root@ha
        addi    r3, r3, btree_root@l        # r3 -> root node
        lis     r4, search_key@ha
        lwz     r5, search_key@l(r4)        # key to search for
        li      r10, 4                      # word stride for lwzux
    btree_search_loop:
        lwz     r8, 0(r3)                   # num_keys sits at offset 0 of every node
        li      r9, 0                       # key index
    key_search_loop:
        cmpw    r9, r8                      # all keys examined?
        bge     key_not_found
        lwzux   r11, r3, r10                # r3 += 4; r11 = key_i
        cmpw    r5, r11                     # compare search key with key_i
        beq     key_found                   # exact match
        blt     follow_left_ptr             # search key < key_i
        addi    r3, r3, 8                   # search key > key_i: skip rec_i and child_i
        addi    r9, r9, 1                   # increment key index
        b       key_search_loop
    follow_left_ptr:
        lwz     r12, 8(r3)                  # r12 = child_i
        b       descend
    key_not_found:
        lwzux   r12, r3, r10                # r12 = child_right
    descend:
        cmpwi   r12, 0                      # no child: the key is not in the tree
        beq     search_failed
        mr      r3, r12                     # continue in the child node
        b       btree_search_loop
    key_found:
        lwz     r3, 4(r3)                   # rec_i, passed to process_record in r3
        bl      process_record              # (lwzux r3,r3,rB would be an invalid form)
        b       search_complete
    search_failed:
        li      r3, -1                      # return error code
    search_complete:
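The node walk can be modeled in Python to check the pointer arithmetic. This sketch assumes a layout with a separate record slot and child slot per key: [num_keys, key1, rec1, child1, ..., keyn, recn, childn, child_right]. child_i covers keys below key_i, and 0 means no child. Nodes are looked up in a dict by an integer address. `btree_search` and the sample tree are illustrative.

```python
def btree_search(nodes: dict, root: int, key: int) -> int:
    """Return the record pointer stored with `key`, or -1 if the key is absent."""
    node = root
    while True:
        words = nodes[node]
        pos = 0                          # mirrors r3: index of the last word consumed
        child = None
        for _ in range(words[0]):        # words[0] = num_keys
            pos += 1                     # lwzux: advance one word and read key_i
            if key == words[pos]:
                return words[pos + 1]    # rec_i
            if key < words[pos]:
                child = words[pos + 2]   # child_i covers keys below key_i
                break
            pos += 2                     # skip rec_i and child_i
        if child is None:
            child = words[pos + 1]       # child_right
        if child == 0:                   # no child: the key is not in the tree
            return -1
        node = child


nodes = {
    1: [1, 50, 5000, 2, 3],                 # root: key 50, children 2 and 3
    2: [2, 10, 1000, 0, 20, 2000, 0, 0],    # leaf
    3: [1, 70, 7000, 0, 0],                 # leaf
}
print(btree_search(nodes, 1, 20))   # 2000
print(btree_search(nodes, 1, 50))   # 5000
print(btree_search(nodes, 1, 65))   # -1
```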
Graphics Engine - Polygon Mesh Processing
    # Compute the area of each polygon face by fan triangulation from its first vertex
    # (exact for planar convex faces).
    # Face layout (assumed): [vertex_count, v1_index, ..., vn_index, material_id], n >= 3.
    # Vertices are 16-byte records [x, y, z, pad] of single-precision floats.
    # Faces are self-describing, so r31 walks them word by word: r31 always points at the
    # last word consumed, and each "lwzux rX, r31, r8" (r8 = 4) reads the next word.
    # Loop state lives in non-volatile r27-r31; the enclosing function must save them and LR.
        lis     r31, mesh_data@ha
        addi    r31, r31, mesh_data@l
        subi    r31, r31, 4                 # bias so the first lwzux reads offset 0
        lis     r29, num_faces@ha
        lwz     r29, num_faces@l(r29)       # number of faces in the mesh
        lis     r28, vertex_array@ha
        addi    r28, r28, vertex_array@l
        lis     r27, face_areas@ha
        addi    r27, r27, face_areas@l      # output: one double per face
        li      r30, 0                      # byte offset of the current face's area
        cmpwi   r29, 0
        beq     mesh_done
    mesh_processing_loop:
        li      r8, 4                       # word stride (r8 is volatile across calls)
        lwzux   r6, r31, r8                 # r6 = vertex_count (n)
        lwzux   r10, r31, r8                # r10 = v1 index (apex of the fan)
        lwzux   r11, r31, r8                # r11 = v2 index
        lis     r12, zero_double@ha
        lfd     f10, zero_double@l(r12)     # face area accumulator = 0.0
        li      r7, 2                       # loop runs n - 2 times (one per triangle)
    vertex_loop:
        cmpw    r7, r6
        bge     face_area_complete
        lwzux   r12, r31, r8                # r12 = next vertex index (v3)
        slwi    r9, r10, 4                  # v1 byte offset (16 bytes per vertex)
        add     r9, r9, r28
        lfs     f1, 0(r9)                   # v1.x
        lfs     f2, 4(r9)                   # v1.y
        lfs     f3, 8(r9)                   # v1.z
        slwi    r9, r11, 4                  # v2
        add     r9, r9, r28
        lfs     f4, 0(r9)
        lfs     f5, 4(r9)
        lfs     f6, 8(r9)
        slwi    r9, r12, 4                  # v3
        add     r9, r9, r28
        lfs     f7, 0(r9)
        lfs     f8, 4(r9)
        lfs     f9, 8(r9)
        # Edge vectors a = v2 - v1 and b = v3 - v1
        fsub    f4, f4, f1
        fsub    f5, f5, f2
        fsub    f6, f6, f3
        fsub    f7, f7, f1
        fsub    f8, f8, f2
        fsub    f9, f9, f3
        # Cross product a x b (fnmsub frD,frA,frC,frB computes frB - frA*frC)
        fmul    f11, f5, f9                 # a.y*b.z
        fnmsub  f11, f6, f8, f11            # - a.z*b.y = cross.x
        fmul    f12, f6, f7                 # a.z*b.x
        fnmsub  f12, f4, f9, f12            # - a.x*b.z = cross.y
        fmul    f13, f4, f8                 # a.x*b.y
        fnmsub  f13, f5, f7, f13            # - a.y*b.x = cross.z
        # ||a x b|| is twice the triangle area
        fmul    f0, f11, f11                # cross.x^2
        fmadd   f0, f12, f12, f0            # + cross.y^2
        fmadd   f0, f13, f13, f0            # + cross.z^2
        fsqrt   f0, f0                      # optional instruction; check the target core
        fadd    f10, f10, f0                # accumulate
        mr      r11, r12                    # v2 = v3 for the next triangle in the fan
        addi    r7, r7, 1
        b       vertex_loop
    face_area_complete:
        lis     r12, half_constant@ha
        lfd     f0, half_constant@l(r12)    # 0.5
        fmul    f10, f10, f0                # face area = 0.5 * sum of ||a x b||
        stfdx   f10, r27, r30               # face_areas[face] = area
        addi    r30, r30, 8                 # next double
        lwzux   r3, r31, r8                 # r3 = material_id; r31 -> word before next face
        bl      process_material            # material_id passed in r3
        subi    r29, r29, 1                 # decrement face counter
        cmpwi   r29, 0
        bne     mesh_processing_loop
    mesh_done:
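The geometry in the mesh loop can be checked with a short Python version of the same fan triangulation. The polygon is split into n - 2 triangles that share its first vertex. Each triangle contributes half the length of the cross product of its two edge vectors. Summing triangle areas this way gives the polygon's area only for planar convex faces. The function names are illustrative.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def fan_area(vertices, face_indices):
    """Area of a planar convex polygon, fan-triangulated from its first vertex."""
    v0 = vertices[face_indices[0]]
    total = 0.0
    for i in range(1, len(face_indices) - 1):        # n - 2 triangles
        v1 = vertices[face_indices[i]]
        v2 = vertices[face_indices[i + 1]]
        a = tuple(p - q for p, q in zip(v1, v0))      # edge v1 - v0
        b = tuple(p - q for p, q in zip(v2, v0))      # edge v2 - v0
        total += math.sqrt(sum(c * c for c in cross(a, b)))
    return 0.5 * total


square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
print(fan_area(square, [0, 1, 2, 3]))   # 1.0
```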
Compiler Optimization - Loop Unrolling Analysis
    # Gather per-block statistics to guide loop unrolling.
    # Block layout (assumed): [block_id, instruction_count, instr1, ..., instrN, exit_targets...]
    # block_sizes[i] is the size in bytes of block i (exit_targets make blocks variable-length).
    # Loop state lives in non-volatile r21-r31 so it survives the helper calls, which
    # receive the instruction word in r3; the enclosing function must save r21-r31 and LR.
        lis     r31, basic_blocks@ha
        addi    r31, r31, basic_blocks@l    # r31 -> first block
        lis     r30, block_sizes@ha
        addi    r30, r30, block_sizes@l
        lis     r29, num_blocks@ha
        lwz     r29, num_blocks@l(r29)      # number of basic blocks
        cmpwi   r29, 0
        beq     analysis_done
        li      r28, 0                      # stride to the first block is 0
    loop_analysis_loop:
        lwzux   r21, r31, r28               # r31 += stride; r21 = block_id
        lwz     r28, 0(r30)                 # stride to the next block = size of this one
        addi    r30, r30, 4
        lwz     r25, 4(r31)                 # instruction_count
        addi    r27, r31, 4                 # r27 -> instruction_count; lwzux reads instr1 next
        li      r26, 0                      # instruction index
        li      r24, 0                      # arithmetic instruction counter
        li      r23, 0                      # memory operation counter
        li      r22, 0                      # branch counter
    instruction_analysis_loop:
        cmpw    r26, r25                    # all instructions analyzed?
        bge     block_analysis_complete
        li      r8, 4                       # word stride (r8 is volatile across calls)
        lwzux   r3, r27, r8                 # r3 = instruction word
        srwi    r9, r3, 26                  # primary opcode (bits 0-5)
        cmpwi   r9, 0x20                    # lwz
        beq     memory_operation
        cmpwi   r9, 0x24                    # stw
        beq     memory_operation
        cmpwi   r9, 0x10                    # bc (conditional branch)
        beq     branch_operation
        cmpwi   r9, 0x12                    # b (unconditional branch)
        beq     branch_operation
        cmpwi   r9, 0x1F                    # opcode 31: extended opcodes
        bne     continue_analysis
        extrwi  r10, r3, 10, 21             # XO field (bits 21-30)
        cmpwi   r10, 266                    # add (OE=0)
        beq     arithmetic_operation
        cmpwi   r10, 40                     # subf
        beq     arithmetic_operation
        cmpwi   r10, 235                    # mullw
        beq     arithmetic_operation
        b       continue_analysis
    memory_operation:
        addi    r23, r23, 1                 # count memory operation
        extrwi  r10, r3, 5, 11              # rA field (bits 11-15)
        cmpwi   r10, 0                      # rA = 0: EA is the displacement alone
        beq     continue_analysis
        bl      analyze_induction_variable  # base register may be an induction variable
        b       continue_analysis
    branch_operation:
        addi    r22, r22, 1                 # count branch
        bl      analyze_loop_branch         # is this a loop back-edge?
        b       continue_analysis
    arithmetic_operation:
        addi    r24, r24, 1                 # count arithmetic instruction
        bl      analyze_dependencies        # check loop-carried dependencies
    continue_analysis:
        addi    r26, r26, 1                 # next instruction
        b       instruction_analysis_loop
    block_analysis_complete:
        # unroll factor = min(max_unroll_factor, arithmetic ops / memory ops)
        li      r10, MAX_UNROLL             # default when there are no memory operations
        cmpwi   r23, 0
        beq     clamp_unroll_factor
        divw    r10, r24, r23               # arithmetic ops per memory op
    clamp_unroll_factor:
        lis     r9, max_unroll_factor@ha
        lwz     r9, max_unroll_factor@l(r9)
        cmpw    r10, r9                     # compare with the maximum allowed
        ble     store_unroll_factor
        mr      r10, r9                     # clamp to the maximum
    store_unroll_factor:
        lis     r11, optimization_data@ha
        addi    r11, r11, optimization_data@l
        slwi    r12, r21, 4                 # block_id * 16 (4 words per record)
        add     r11, r11, r12
        stw     r10, 0(r11)                 # unroll factor
        stw     r24, 4(r11)                 # arithmetic instruction count
        stw     r23, 8(r11)                 # memory operation count
        stw     r22, 12(r11)                # branch count
        subi    r29, r29, 1                 # decrement block counter
        cmpwi   r29, 0
        bne     loop_analysis_loop
    analysis_done:
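The bit-field extraction in the analysis loop is easy to get wrong, because PowerPC documentation numbers bits from the most significant end. The following Python sketch performs the same classification, taking field positions from the encoding table at the top of this page. XO occupies bits 21-30, so the word must be shifted right by one before masking. The helper names are illustrative.

```python
def primary_opcode(word: int) -> int:
    return (word >> 26) & 0x3F            # bits 0-5   (srwi rX, rY, 26)


def xo_field(word: int) -> int:
    return (word >> 1) & 0x3FF            # bits 21-30 (extrwi rX, rY, 10, 21)


def ra_field(word: int) -> int:
    return (word >> 16) & 0x1F            # bits 11-15 (extrwi rX, rY, 5, 11)


def classify(word: int) -> str:
    """Classify an instruction word into the analysis loop's categories."""
    op = primary_opcode(word)
    if op in (32, 36):                    # lwz, stw
        return "memory"
    if op in (16, 18):                    # bc, b
        return "branch"
    if op == 31 and xo_field(word) in (266, 40, 235):   # add, subf, mullw (OE=0)
        return "arithmetic"
    return "other"


print(classify(0x7C642A14))   # arithmetic  (add r3,r4,r5)
print(classify(0x80640008))   # memory      (lwz r3,8(r4))
print(classify(0x7C64286E))   # other       (lwzux r3,r4,r5)
```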
Network Stack - Protocol Header Processing
    # Walk a buffer of packets with variable-length headers.
    # Buffer format (assumed): [header_type, header words..., payload_length, payload...],
    # with each header and payload padded to a multiple of 4 bytes.
    # header_lengths[i] is the header size in bytes of packet i, excluding header_type;
    # it is used only to skip headers of unknown type.
    # r31 always points at the last word consumed, so each "lwzux rX, r31, r8" (r8 = 4)
    # reads the next word. Loop state lives in non-volatile r28-r31; the enclosing
    # function must save them and LR.
        lis     r31, packet_buffer@ha
        addi    r31, r31, packet_buffer@l
        subi    r31, r31, 4                 # bias so the first lwzux reads offset 0
        lis     r30, header_lengths@ha
        addi    r30, r30, header_lengths@l
        lis     r29, num_packets@ha
        lwz     r29, num_packets@l(r29)     # number of packets to process
        cmpwi   r29, 0
        beq     packets_done
    packet_processing_loop:
        li      r8, 4                       # word stride (r8 is volatile across calls)
        lwzux   r7, r31, r8                 # r7 = header_type
        cmpwi   r7, ETHERNET_TYPE
        beq     process_ethernet
        cmpwi   r7, IP_TYPE
        beq     process_ip
        cmpwi   r7, TCP_TYPE
        beq     process_tcp
        cmpwi   r7, UDP_TYPE
        beq     process_udp
        b       unknown_protocol
    process_ethernet:
        # Ethernet header: 14 bytes, padded to 16 in this buffer format
        lwzux   r9, r31, r8                 # destination MAC bytes 0-3
        lwzux   r10, r31, r8                # destination MAC bytes 4-5, source MAC bytes 0-1
        lwzux   r11, r31, r8                # source MAC bytes 2-5
        lwzux   r12, r31, r8                # EtherType (upper halfword) + 2 pad bytes
        srwi    r12, r12, 16                # r12 = EtherType
        bl      validate_ethernet_frame
        b       continue_processing
    process_ip:
        # IPv4 header: 20 bytes plus options
        lwzux   r9, r31, r8                 # version/IHL/ToS/total length
        extrwi  r10, r9, 4, 4               # IHL (bits 4-7), in 32-bit words
        slwi    r10, r10, 2                 # header length in bytes (IHL * 4)
        lwzux   r11, r31, r8                # identification/flags/fragment offset
        lwzux   r12, r31, r8                # TTL/protocol/header checksum
        lwzux   r5, r31, r8                 # source address
        lwzux   r6, r31, r8                 # destination address
        subi    r10, r10, 20                # options length
        cmpwi   r10, 0
        ble     no_ip_options
        add     r31, r31, r10               # skip options
    no_ip_options:
        bl      process_ip_packet
        b       continue_processing
    process_tcp:
        # TCP header: 20 bytes plus options
        lwzux   r9, r31, r8                 # source port / destination port
        lwzux   r10, r31, r8                # sequence number
        lwzux   r11, r31, r8                # acknowledgment number
        lwzux   r12, r31, r8                # data offset/flags/window
        srwi    r5, r12, 28                 # data offset (upper 4 bits), in words
        slwi    r5, r5, 2                   # header length in bytes
        lwzux   r6, r31, r8                 # checksum / urgent pointer
        subi    r5, r5, 20                  # options length
        cmpwi   r5, 0
        ble     no_tcp_options
        add     r31, r31, r5                # skip options
    no_tcp_options:
        bl      process_tcp_segment
        b       continue_processing
    process_udp:
        # UDP header: 8 bytes, fixed
        lwzux   r9, r31, r8                 # source port / destination port
        lwzux   r10, r31, r8                # length / checksum
        bl      process_udp_datagram
        b       continue_processing
    unknown_protocol:
        lwz     r9, 0(r30)                  # header length of this packet in bytes
        add     r31, r31, r9                # skip the unparsed header
        bl      handle_unknown_protocol
    continue_processing:
        li      r8, 4                       # reload the stride (clobbered by the calls)
        lwzux   r28, r31, r8                # r28 = payload length (kept across the call)
        bl      process_packet_payload
        add     r31, r31, r28               # skip the payload
        addi    r30, r30, 4                 # next header length
        subi    r29, r29, 1                 # decrement packet counter
        cmpwi   r29, 0
        bne     packet_processing_loop
    packets_done:
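The header-length fields that the packet loop extracts can be checked with a Python sketch. It applies the same shifts to 32-bit big-endian header words and follows the IPv4 (RFC 791) and TCP (RFC 793) header layouts. In both protocols, IHL and Data Offset count 32-bit words. The function names are illustrative.

```python
import struct


def ipv4_header_length(word0: int) -> int:
    """IPv4 header length in bytes from the header's first 32-bit word."""
    ihl = (word0 >> 24) & 0x0F            # IHL: bits 4-7, the low nibble of byte 0
    return ihl * 4


def tcp_header_length(word3: int) -> int:
    """TCP header length in bytes from the word at offset 12 (data offset/flags/window)."""
    data_offset = (word3 >> 28) & 0x0F    # upper 4 bits
    return data_offset * 4


def ethertype(word3: int) -> int:
    """EtherType from the word at offset 12 of an Ethernet header (upper halfword)."""
    return (word3 >> 16) & 0xFFFF


(word0,) = struct.unpack(">I", bytes.fromhex("45000054"))   # version 4, IHL 5, total length 84
print(ipv4_header_length(word0))          # 20
print(tcp_header_length(0x801001F5))      # 32 (data offset 8)
print(hex(ethertype(0x08000000)))         # 0x800
```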