Instruction Syntax
Mnemonic | Format | Flags |
lhzux | rD,rA,rB | - |
Instruction Encoding
Field | Bits | Description |
Primary Opcode | 0-5 | 011111 (0x1F) |
rD | 6-10 | Destination register |
rA | 11-15 | Source register A |
rB | 16-20 | Source register B |
XO | 21-30 | 311 (Extended opcode) |
Rc | 31 | Reserved (0) |
Operation
EA ← (rA) + (rB) rD ← 0x0000 || MEM(EA, 2) rA ← EA
A halfword (16 bits) is loaded from memory and placed in the low-order 16 bits of register rD. The upper 16 bits of rD are set to zero. The effective address is computed by adding the contents of registers rA and rB. After the load, the effective address is stored back into register rA.
Note: This instruction cannot be used with rA=0. The update form requires a valid base register. This indexed update form is particularly useful for traversing 16-bit data structures with dynamic stride patterns while maintaining automatic pointer advancement.
Affected Registers
rA - Updated with the effective address after the load operation.
For more information on memory addressing see Section 2.1.6, "Effective Address Calculation," in the PowerPC Microprocessor Family: The Programming Environments manual.
Examples
Unicode String Processing with Variable Character Widths
# Process Unicode string with mixed single and double-byte characters lis r3, unicode_string@ha addi r3, r3, unicode_string@l li r4, 2 # Default advance (2 bytes for BMP characters) unicode_process_loop: mr r5, r4 # Use calculated advance lhzux r6, r3, r5 # Load Unicode code point and advance # Check for end of string cmpwi r6, 0 # Null terminator beq string_complete # Check Unicode plane (Basic Multilingual Plane vs. supplementary) lis r7, 0xD800 # High surrogate start cmpw r6, r7 blt bmp_character # Basic Multilingual Plane character lis r8, 0xDBFF # High surrogate end cmpw r6, r8 bgt bmp_character # Not a surrogate pair # High surrogate found - load low surrogate li r9, 2 # Advance 2 more bytes lhzux r10, r3, r9 # Load low surrogate # Combine surrogates into 21-bit code point # code_point = ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000 lis r11, 0xD800 sub r12, r6, r11 # high - 0xD800 slwi r13, r12, 10 # (high - 0xD800) << 10 lis r14, 0xDC00 sub r15, r10, r14 # low - 0xDC00 add r16, r13, r15 # Combine parts lis r17, 1 # 0x10000 add r18, r16, r17 # Final 21-bit code point # Process supplementary character bl process_supplementary_char # Pass code point in r18 li r4, 2 # Next advance = 2 bytes (already advanced 4 total) b unicode_process_loop bmp_character: # Process Basic Multilingual Plane character bl process_bmp_char # Pass code point in r6 li r4, 2 # Next advance = 2 bytes b unicode_process_loop string_complete:
16-bit Image Processing with ROI (Region of Interest)
# Process 16-bit grayscale image with variable stride for ROI lis r3, image_data@ha addi r3, r3, image_data@l lwz r4, image_width(r0) # Full image width lwz r5, roi_start_x(r0) # ROI start column lwz r6, roi_start_y(r0) # ROI start row lwz r7, roi_width(r0) # ROI width lwz r8, roi_height(r0) # ROI height # Calculate starting position in image mullw r9, r6, r4 # roi_start_y * image_width add r10, r9, r5 # + roi_start_x slwi r11, r10, 1 # Convert to byte offset (* 2 for 16-bit) add r3, r3, r11 # Adjust base pointer to ROI start subi r3, r3, 2 # Pre-adjust for first lhzux # Calculate stride to next ROI row sub r12, r4, r7 # image_width - roi_width slwi r13, r12, 1 # Convert to bytes (* 2) li r14, 0 # Current ROI row roi_row_loop: mr r15, r7 # ROI width counter li r16, 2 # Column advance (2 bytes per pixel) roi_col_loop: lhzux r17, r3, r16 # Load 16-bit pixel and advance # Apply image processing (e.g., histogram equalization) # Normalize pixel value to 0-255 range for lookup srwi r18, r17, 8 # Convert 16-bit to 8-bit index lis r19, histogram_lut@ha addi r19, r19, histogram_lut@l lhzx r20, r19, r18 # Load equalized value from lookup table # Apply additional processing (e.g., noise reduction) lis r21, noise_threshold@ha lhz r22, noise_threshold@l(r21) cmpw r20, r22 # Compare with noise threshold blt noise_pixel # Handle noise pixel # Normal pixel processing bl enhance_pixel # Apply enhancement algorithm b store_pixel noise_pixel: # Apply noise reduction (e.g., median filter result) bl apply_noise_reduction # Returns processed value in r20 store_pixel: sth r20, 0(r3) # Store processed pixel back subi r15, r15, 1 # Decrement column counter cmpwi r15, 0 bne roi_col_loop # Continue ROI row # Move to next ROI row add r3, r3, r13 # Add stride to skip non-ROI pixels addi r14, r14, 1 # Next ROI row cmpw r14, r8 # Check if done with ROI blt roi_row_loop # Continue processing
Audio Sample Rate Conversion
# Convert 16-bit audio from 44.1kHz to 48kHz using linear interpolation lis r3, input_audio@ha addi r3, r3, input_audio@l lis r4, output_audio@ha addi r4, r4, output_audio@l lwz r5, input_samples(r0) # Number of input samples # Conversion ratio: 48000/44100 = 1.08843537... # We'll use fixed-point arithmetic: ratio = 1.08843537 * 65536 = 7132.97... li r6, 7133 # Fixed-point ratio (16.16 format) li r7, 0 # Fixed-point position in input li r8, 0 # Output sample counter subi r3, r3, 2 # Pre-adjust for lhzux subi r4, r4, 2 # Pre-adjust for sthux resample_loop: # Calculate integer and fractional parts of input position srwi r9, r7, 16 # Integer part andi r10, r7, 0xFFFF # Fractional part (0-65535) # Check bounds cmpw r9, r5 # Check if beyond input bge resample_done # Done if past end # Load current and next sample for interpolation slwi r11, r9, 1 # Convert sample index to byte offset lis r12, input_audio@ha addi r12, r12, input_audio@l lhzx r13, r12, r11 # Load current sample # Check if next sample exists addi r14, r9, 1 # Next sample index cmpw r14, r5 # Check bounds bge use_current_sample # Use current if no next sample addi r15, r11, 2 # Next sample byte offset lhzx r16, r12, r15 # Load next sample # Linear interpolation: result = current + (next - current) * fraction sub r17, r16, r13 # next - current mullw r18, r17, r10 # (next - current) * fraction srwi r19, r18, 16 # Divide by 65536 (shift right 16) add r20, r13, r19 # current + interpolated_difference b store_sample use_current_sample: mr r20, r13 # Use current sample as-is store_sample: li r21, 2 # Advance 2 bytes sthux r20, r4, r21 # Store interpolated sample and advance # Advance input position by ratio add r7, r7, r6 # Add fixed-point ratio addi r8, r8, 1 # Increment output counter # Check if we need more output samples lwz r22, target_output_samples(r0) cmpw r8, r22 # Check if done blt resample_loop # Continue resampling resample_done:
Network Packet Header Parsing
# Parse variable-length network headers with 16-bit fields lis r3, packet_data@ha addi r3, r3, packet_data@l lwz r4, packet_length(r0) # Total packet length # Parse Ethernet header (14 bytes) li r5, 2 # Advance 2 bytes per field lhzux r6, r3, r5 # Load destination MAC bytes 0-1 and advance lhzux r7, r3, r5 # Load destination MAC bytes 2-3 and advance lhzux r8, r3, r5 # Load destination MAC bytes 4-5 and advance lhzux r9, r3, r5 # Load source MAC bytes 0-1 and advance lhzux r10, r3, r5 # Load source MAC bytes 2-3 and advance lhzux r11, r3, r5 # Load source MAC bytes 4-5 and advance lhzux r12, r3, r5 # Load EtherType and advance # Check EtherType to determine next header li r13, 0x0800 # IPv4 cmpw r12, r13 beq parse_ipv4 li r14, 0x86DD # IPv6 cmpw r12, r14 beq parse_ipv6 li r15, 0x8100 # VLAN tag cmpw r12, r15 beq parse_vlan b unknown_ethertype parse_vlan: # Parse VLAN tag (4 bytes total) lhzux r16, r3, r5 # Load VLAN priority/DEI/VID and advance lhzux r17, r3, r5 # Load actual EtherType and advance # Extract VLAN ID andi r18, r16, 0x0FFF # Extract 12-bit VLAN ID stw r18, vlan_id(r0) # Store VLAN ID # Process actual EtherType after VLAN tag mr r12, r17 # Use inner EtherType li r13, 0x0800 # IPv4 cmpw r12, r13 beq parse_ipv4 li r14, 0x86DD # IPv6 cmpw r12, r14 beq parse_ipv6 b unknown_ethertype parse_ipv4: # Parse IPv4 header (20+ bytes) lhzux r19, r3, r5 # Load version/IHL/ToS/Total Length and advance # Extract and validate version srwi r20, r19, 12 # Extract version (upper 4 bits of first halfword) andi r21, r20, 0x0F # Isolate version cmpwi r21, 4 # Check IPv4 version bne invalid_ipv4 # Extract header length srwi r22, r19, 8 # Shift to get IHL in lower bits andi r23, r22, 0x0F # Extract IHL (Internet Header Length) slwi r24, r23, 2 # Convert to bytes (IHL * 4) stw r24, ip_header_len(r0) # Store header length lhzux r25, r3, r5 # Load ID/Flags/Fragment Offset and advance lhzux r26, r3, r5 # Load TTL/Protocol/Header Checksum and advance # Extract protocol field for further parsing andi r27, r26, 0x00FF # Extract protocol (lower 8 bits) stw r27, ip_protocol(r0) # Store protocol lhzux r28, r3, r5 # Load source IP bytes 0-1 and advance lhzux r29, r3, r5 # Load source IP bytes 2-3 and advance lhzux r30, r3, r5 # Load dest IP bytes 0-1 and advance lhzux r31, r3, r5 # Load dest IP bytes 2-3 and advance # Check for IP options (if header length > 20) cmpwi r24, 20 # Compare with minimum header length ble no_ip_options # No options if length = 20 # Skip IP options subi r24, r24, 20 # Calculate options length add r3, r3, r24 # Skip options bytes no_ip_options: # Parse next protocol based on protocol field cmpwi r27, 6 # TCP beq parse_tcp cmpwi r27, 17 # UDP beq parse_udp cmpwi r27, 1 # ICMP beq parse_icmp b unknown_protocol parse_ipv6: # Parse IPv6 header (40 bytes fixed) lhzux r19, r3, r5 # Load version/traffic class/flow label part 1 lhzux r20, r3, r5 # Load flow label part 2/payload length lhzux r21, r3, r5 # Load next header/hop limit # Extract next header for protocol parsing srwi r22, r21, 8 # Extract next header field andi r23, r22, 0x00FF # Isolate next header stw r23, ipv6_next_header(r0) # Store next header # Skip IPv6 addresses (32 bytes = 16 halfwords) li r24, 16 # Number of halfwords to skip skip_ipv6_addresses: lhzux r25, r3, r5 # Load and skip address halfword subi r24, r24, 1 # Decrement counter cmpwi r24, 0 bne skip_ipv6_addresses # Continue skipping # Parse next protocol cmpwi r23, 6 # TCP beq parse_tcp cmpwi r23, 17 # UDP beq parse_udp cmpwi r23, 58 # ICMPv6 beq parse_icmpv6 b unknown_protocol parse_tcp: # Parse TCP header (20+ bytes) lhzux r26, r3, r5 # Load source port and advance lhzux r27, r3, r5 # Load destination port and advance lhzux r28, r3, r5 # Load sequence number part 1 and advance lhzux r29, r3, r5 # Load sequence number part 2 and advance lhzux r30, r3, r5 # Load acknowledgment number part 1 and advance lhzux r31, r3, r5 # Load acknowledgment number part 2 and advance lhzux r6, r3, r5 # Load data offset/flags and advance # Extract TCP header length srwi r7, r6, 12 # Extract data offset (upper 4 bits) andi r8, r7, 0x0F # Isolate data offset slwi r9, r8, 2 # Convert to bytes (offset * 4) stw r9, tcp_header_len(r0) # Store TCP header length lhzux r10, r3, r5 # Load window size and advance lhzux r11, r3, r5 # Load checksum/urgent pointer and advance # Skip TCP options if present cmpwi r9, 20 # Compare with minimum TCP header length ble no_tcp_options # No options if length = 20 subi r12, r9, 20 # Calculate options length add r3, r3, r12 # Skip TCP options no_tcp_options: # Process TCP payload bl process_tcp_payload b parsing_complete parse_udp: # Parse UDP header (8 bytes) lhzux r13, r3, r5 # Load source port and advance lhzux r14, r3, r5 # Load destination port and advance lhzux r15, r3, r5 # Load length and advance lhzux r16, r3, r5 # Load checksum and advance # Process UDP payload bl process_udp_payload b parsing_complete parse_icmp: parse_icmpv6: # Parse ICMP header (8+ bytes depending on type) lhzux r17, r3, r5 # Load type/code and advance lhzux r18, r3, r5 # Load checksum and advance lhzux r19, r3, r5 # Load identifier and advance lhzux r20, r3, r5 # Load sequence number and advance bl process_icmp_payload b parsing_complete unknown_ethertype: bl handle_unknown_ethertype b parsing_complete invalid_ipv4: bl handle_invalid_ipv4 b parsing_complete unknown_protocol: bl handle_unknown_protocol parsing_complete:
Digital Music Synthesis - Wavetable
# Synthesize audio using 16-bit wavetable with variable pitch lis r3, wavetable@ha addi r3, r3, wavetable@l lis r4, output_buffer@ha addi r4, r4, output_buffer@l lwz r5, wavetable_size(r0) # Size of wavetable in samples lwz r6, num_output_samples(r0) # Number of samples to generate # Synthesis parameters lfs f1, base_frequency(r0) # Base frequency of wavetable (e.g., 440 Hz) lfs f2, target_frequency(r0) # Desired output frequency lfs f3, sample_rate(r0) # Audio sample rate (e.g., 44100 Hz) # Calculate phase increment per sample # phase_inc = (target_freq / base_freq) * wavetable_size * (1 / sample_rate) fdiv f4, f2, f1 # target_freq / base_freq stw r5, temp_wavetable_size(r1) lfs f5, temp_wavetable_size(r1) # Convert wavetable size to float fmul f6, f4, f5 # * wavetable_size lfs f7, one_constant(r0) # 1.0 fdiv f8, f7, f3 # 1 / sample_rate fmul f9, f6, f8 # phase_inc per sample # Convert to fixed-point for efficiency (16.16 format) lfs f10, fixed_point_scale(r0) # 65536.0 fmul f11, f9, f10 # phase_inc * 65536 fctiwz f12, f11 # Convert to integer stfd f12, temp_phase_inc(r1) lwz r7, temp_phase_inc+4(r1) # Load fixed-point phase increment li r8, 0 # Current phase (fixed-point) subi r4, r4, 2 # Pre-adjust output pointer subi r3, r3, 2 # Pre-adjust wavetable pointer synthesis_loop: # Extract integer and fractional parts of phase srwi r9, r8, 16 # Integer part (wavetable index) andi r10, r8, 0xFFFF # Fractional part for interpolation # Wrap phase around wavetable divw r11, r9, r5 # index / wavetable_size mullw r12, r11, r5 # (index / size) * size sub r13, r9, r12 # wrapped_index = index % size # Load current and next wavetable samples slwi r14, r13, 1 # Convert index to byte offset lis r15, wavetable@ha addi r15, r15, wavetable@l lhzx r16, r15, r14 # Load current sample # Calculate next sample index (with wrapping) addi r17, r13, 1 # next_index cmpw r17, r5 # Check if beyond end blt no_wrap_needed # No wrap needed li r17, 0 # Wrap to beginning no_wrap_needed: slwi r18, r17, 1 # Convert next index to byte offset lhzx r19, r15, r18 # Load next sample # Linear interpolation between samples # result = current + (next - current) * fraction sub r20, r19, r16 # next - current mullw r21, r20, r10 # (next - current) * fraction srwi r22, r21, 16 # Divide by 65536 add r23, r16, r22 # current + interpolated difference # Apply envelope and effects here if needed # For now, just apply simple amplitude scaling lhz r24, amplitude_scale(r0) # Load amplitude multiplier mullw r25, r23, r24 # Scale amplitude srwi r26, r25, 8 # Normalize (assuming scale was 8-bit) # Store synthesized sample li r27, 2 # Advance 2 bytes sthux r26, r4, r27 # Store sample and advance output pointer # Advance phase add r8, r8, r7 # Add phase increment subi r6, r6, 1 # Decrement output sample counter cmpwi r6, 0 bne synthesis_loop # Continue synthesis
Data Compression - LZ77 Sliding Window
# LZ77 compression using sliding window with 16-bit tokens lis r3, input_data@ha addi r3, r3, input_data@l lis r4, compressed_output@ha addi r4, r4, compressed_output@l lwz r5, input_size(r0) # Size of input data li r6, 32768 # Sliding window size (32KB) li r7, 258 # Maximum match length subi r3, r3, 2 # Pre-adjust input pointer subi r4, r4, 2 # Pre-adjust output pointer li r8, 0 # Current position in input compress_loop: cmpw r8, r5 # Check if done with input bge compression_done # Find longest match in sliding window li r9, 0 # Best match distance li r10, 0 # Best match length # Calculate window start position sub r11, r8, r6 # window_start = current_pos - window_size cmpwi r11, 0 # Check for negative bge window_start_ok li r11, 0 # Clamp to 0 window_start_ok: mr r12, r11 # Search position in window window_search_loop: cmpw r12, r8 # Check if reached current position bge search_done # Compare bytes starting at search position mr r13, r12 # Window position mr r14, r8 # Current position li r15, 0 # Match length match_loop: cmpw r14, r5 # Check bounds for current position bge match_done cmpw r15, r7 # Check maximum match length bge match_done # Load bytes to compare slwi r16, r13, 1 # Convert to byte offset lis r17, input_data@ha addi r17, r17, input_data@l lhzx r18, r17, r16 # Load window byte slwi r19, r14, 1 # Convert current pos to byte offset lhzx r20, r17, r19 # Load current byte cmpw r18, r20 # Compare bytes bne match_done # Stop if no match addi r13, r13, 1 # Next window position addi r14, r14, 1 # Next current position addi r15, r15, 1 # Increment match length b match_loop match_done: # Check if this is the best match so far cmpw r15, r10 # Compare with best match length ble not_best_match # Skip if not better mr r10, r15 # Update best match length sub r9, r8, r12 # Calculate match distance add r9, r9, r15 # Adjust distance not_best_match: addi r12, r12, 1 # Next search position b window_search_loop search_done: # Encode the result cmpwi r10, 3 # Minimum match length blt literal_byte # Use literal if match too short # Encode as length/distance pair # Format: [1-bit flag][15-bit distance][16-bit length] ori r21, r9, 0x8000 # Set flag bit for match li r22, 2 # Advance 2 bytes sthux r21, r4, r22 # Store distance with flag and advance li r22, 2 # Advance 2 bytes sthux r10, r4, r22 # Store length and advance # Advance input position by match length add r8, r8, r10 b compress_loop literal_byte: # Encode as literal byte # Format: [1-bit flag=0][15-bit literal value] slwi r23, r8, 1 # Convert position to byte offset lis r24, input_data@ha addi r24, r24, input_data@l lhzx r25, r24, r23 # Load literal byte andi r26, r25, 0x7FFF # Clear flag bit (literal) li r22, 2 # Advance 2 bytes sthux r26, r4, r22 # Store literal and advance addi r8, r8, 1 # Advance input position by 1 b compress_loop compression_done:
Real-Time Spectral Analysis
# Real-time spectral analysis using sliding FFT window lis r3, audio_input@ha addi r3, r3, audio_input@l lis r4, fft_buffer@ha addi r4, r4, fft_buffer@l lwz r5, fft_size(r0) # FFT size (e.g., 1024) lwz r6, hop_size(r0) # Hop size for overlap (e.g., 512) lwz r7, num_frames(r0) # Number of audio frames to process subi r3, r3, 2 # Pre-adjust input pointer li r8, 0 # Current frame index spectral_analysis_loop: # Fill FFT buffer with windowed audio samples mr r9, r5 # Sample counter lis r10, fft_buffer@ha addi r10, r10, fft_buffer@l subi r10, r10, 2 # Pre-adjust FFT buffer pointer # Calculate input offset for current frame mullw r11, r8, r6 # frame_index * hop_size slwi r12, r11, 1 # Convert to byte offset lis r13, audio_input@ha addi r13, r13, audio_input@l add r14, r13, r12 # Input position for this frame subi r14, r14, 2 # Pre-adjust for lhzux # Load windowing function lis r15, window_function@ha addi r15, r15, window_function@l subi r15, r15, 2 # Pre-adjust window pointer window_sample_loop: li r16, 2 # Advance 2 bytes lhzux r17, r14, r16 # Load audio sample and advance li r16, 2 # Advance 2 bytes lhzux r18, r15, r16 # Load window coefficient and advance # Apply window function: windowed_sample = sample * window_coeff mullw r19, r17, r18 # Multiply sample by window srwi r20, r19, 15 # Normalize (assuming 15-bit window coefficients) li r16, 2 # Advance 2 bytes sthux r20, r10, r16 # Store windowed sample and advance subi r9, r9, 1 # Decrement sample counter cmpwi r9, 0 bne window_sample_loop # Continue windowing # Perform FFT on windowed data lis r21, fft_buffer@ha addi r21, r21, fft_buffer@l bl perform_fft_16bit # FFT with 16-bit input/output # Compute magnitude spectrum lis r22, magnitude_spectrum@ha addi r22, r22, magnitude_spectrum@l subi r22, r22, 2 # Pre-adjust magnitude pointer mr r23, r5 # FFT size counter srwi r24, r23, 1 # Half FFT size (real spectrum) lis r25, fft_buffer@ha addi r25, r25, fft_buffer@l subi r25, r25, 4 # Pre-adjust for complex pairs (4 bytes per complex) magnitude_loop: li r26, 2 # Advance 2 bytes lhzux r27, r25, r26 # Load real part and advance li r26, 2 # Advance 2 bytes lhzux r28, r25, r26 # Load imaginary part and advance # Calculate magnitude: sqrt(real² + imag²) mullw r29, r27, r27 # real² mullw r30, r28, r28 # imag² add r31, r29, r30 # real² + imag² # Fast integer square root approximation bl fast_sqrt_16bit # Returns sqrt in r31 li r26, 2 # Advance 2 bytes sthux r31, r22, r26 # Store magnitude and advance subi r24, r24, 1 # Decrement spectrum counter cmpwi r24, 0 bne magnitude_loop # Continue magnitude calculation # Analyze spectrum for features (peak detection, spectral centroid, etc.) bl analyze_spectrum_features # Update display or processing based on spectral features bl update_spectral_display addi r8, r8, 1 # Next frame cmpw r8, r7 # Check if done with all frames blt spectral_analysis_loop # Continue analysis