dcbtst
Data Cache Block Touch for Store - 7C 00 01 EC
Instruction Syntax
Mnemonic | Format | Flags |
dcbtst | rA,rB | - |
Instruction Encoding
Bits | 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31 |
Value | 011111 | 00000 | AAAAA | BBBBB | 0011110110 | 0 |

Field | Bits | Description |
Primary Opcode | 0-5 | 011111 (31, 0x1F) |
Reserved | 6-10 | 00000 |
rA | 11-15 | Register A (base address; a field value of 0 means no base) |
rB | 16-20 | Register B (index) |
XO | 21-30 | 0011110110 (246) |
Reserved | 31 | 0 |
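The field layout can be checked by assembling the instruction word by hand. The following is a minimal sketch in C, not taken from the manual; `encode_dcbtst` is a hypothetical helper name. PowerPC numbers bits from the most significant end, so a field that ends at bit e is shifted left by 31 - e.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: builds the 32-bit word for "dcbtst rA,rB". */
static uint32_t encode_dcbtst(unsigned ra, unsigned rb)
{
    assert(ra < 32 && rb < 32);      /* register numbers are 5-bit fields */

    const uint32_t primary = 31u;    /* primary opcode, ends at bit 5 */
    const uint32_t xo      = 246u;   /* extended opcode, ends at bit 30 */

    return (primary << 26)           /* 31 - 5  = 26 */
         | ((uint32_t)ra << 16)      /* 31 - 15 = 16 */
         | ((uint32_t)rb << 11)      /* 31 - 20 = 11 */
         | (xo << 1);                /* 31 - 30 = 1; bit 31 stays 0 */
}
```

With both register fields zero the function returns 0x7C0001EC, which matches the bytes 7C 00 01 EC given at the top of this page.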
Operation
if rA = 0 then EA ← (rB) else EA ← (rA) + (rB)
if block not in cache and EA is cacheable then load to cache for store
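The effective-address rule above can be modeled in a few lines of C. This is an illustrative sketch only: the `gpr` array, the function names, and the 32-byte block size are assumptions made for the example (the block size matches the listings on this page but varies between processors).

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SIZE 32u   /* assumption: 32-byte cache blocks */

/* Effective address as in the Operation pseudocode:
   an rA field of 0 means "no base", not "contents of r0". */
static uint32_t dcbtst_ea(const uint32_t gpr[32], unsigned ra, unsigned rb)
{
    assert(ra < 32 && rb < 32);
    uint32_t base = (ra == 0) ? 0u : gpr[ra];
    return base + gpr[rb];           /* unsigned add wraps modulo 2^32 */
}

/* Start address of the cache block that contains ea. */
static uint32_t block_base(uint32_t ea)
{
    return ea & ~(LINE_SIZE - 1u);
}
```

Any effective address inside a block names the same block, so `dcbtst` does not need an aligned address.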
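The data cache block touch for store instruction is a hint that the program will probably store soon to the cache block containing the effective address. If that block is not already in the data cache and the address is cacheable, the processor may fetch the block into the cache in preparation for the store. An implementation is free to ignore the hint, and program results are the same whether or not it is honored.
Note: This instruction is purely a performance hint for write-intensive code. On implementations that distinguish store touches from load touches, the block may be fetched with exclusive ownership so that the later store can complete without a further ownership request.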
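From C, a store touch is usually requested through a compiler builtin rather than inline assembly. The sketch below assumes GCC or Clang, whose `__builtin_prefetch(addr, rw, locality)` takes `rw = 1` for an expected write. On PowerPC targets these compilers typically lower that case to `dcbtst`, but the exact instruction depends on the compiler and its options. `touch_for_store` and `store_hinted` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: hint that the block containing p will be written.
   The second argument (1) marks the access as a write; the third (3)
   asks for high temporal locality. Both must be compile-time constants. */
static inline void touch_for_store(const void *p)
{
    __builtin_prefetch(p, 1, 3);
}

/* Store a word after hinting its block. The hint never changes the
   result; it can only change how long the store takes. */
static void store_hinted(uint32_t *p, uint32_t value)
{
    assert(p != NULL);
    touch_for_store(p);
    *p = value;
}
```

In practice the hint is only useful when it is issued well before the store, so that the fetch overlaps with other work.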
Affected Registers
None - This instruction does not affect any registers.
For more information on cache management see Section 3.2, "Cache Management Instructions," in the PowerPC Microprocessor Family: The Programming Environments manual.
Examples
Basic Store Prefetch
# Prefetch the cache line that will be written to
    lis    r3, output_buffer@ha      # Load high part of buffer address
    addi   r3, r3, output_buffer@l   # Complete buffer address
    dcbtst 0, r3                     # Hint: this block will be stored to
    stw    r4, 0(r3)                 # Store is more likely to hit in the cache
Buffer Initialization
# Prefetch cache lines before initializing buffer
    lis    r3, init_buffer@ha
    addi   r3, r3, init_buffer@l
    li     r4, 0                  # Start offset
    li     r5, 1024               # Buffer size
    li     r6, 32                 # Cache line size
    lis    r8, 0x1234             # Pattern to store: li only takes a 16-bit
    ori    r8, r8, 0x5678         # immediate, so build 0x12345678 in two steps
init_loop:
    add    r7, r3, r4             # Calculate address
    dcbtst 0, r7                  # Prefetch for store
    # Initialize cache line (32 bytes)
    stw    r8, 0(r7)              # Store pattern
    stw    r8, 4(r7)
    stw    r8, 8(r7)
    stw    r8, 12(r7)
    stw    r8, 16(r7)
    stw    r8, 20(r7)
    stw    r8, 24(r7)
    stw    r8, 28(r7)
    add    r4, r4, r6             # Next cache line
    cmpw   r4, r5                 # Check if done
    blt    init_loop              # Continue if more data
Memory Copy Optimization
# Optimize memory copy with prefetching
    lis    r3, source_addr@ha
    addi   r3, r3, source_addr@l
    lis    r4, dest_addr@ha
    addi   r4, r4, dest_addr@l
    li     r5, 0                  # Offset
    li     r6, 2048               # Copy size
    li     r7, 64                 # Prefetch distance
copy_loop:
    add    r8, r3, r5             # Source address
    add    r9, r4, r5             # Destination address
    add    r10, r8, r7            # Prefetch source
    add    r11, r9, r7            # Prefetch dest
    dcbt   0, r10                 # Prefetch source for read
    dcbtst 0, r11                 # Prefetch dest for write
    # Copy 32 bytes (one cache line)
    lwz    r12, 0(r8)
    lwz    r13, 4(r8)
    lwz    r14, 8(r8)
    lwz    r15, 12(r8)
    stw    r12, 0(r9)
    stw    r13, 4(r9)
    stw    r14, 8(r9)
    stw    r15, 12(r9)
    lwz    r12, 16(r8)
    lwz    r13, 20(r8)
    lwz    r14, 24(r8)
    lwz    r15, 28(r8)
    stw    r12, 16(r9)
    stw    r13, 20(r9)
    stw    r14, 24(r9)
    stw    r15, 28(r9)
    addi   r5, r5, 32             # Next cache line
    cmpw   r5, r6                 # Check bounds (register compare, not cmpwi)
    blt    copy_loop              # Continue copying
Array Processing with Store Prefetch
# Process array with write prefetching
    lis    r3, input_array@ha
    addi   r3, r3, input_array@l
    lis    r4, output_array@ha
    addi   r4, r4, output_array@l
    li     r5, 0                  # Index
    li     r6, 512                # Array elements
    li     r7, 128                # Prefetch ahead (32 elements)
process_loop:
    mulli  r8, r5, 4              # Calculate byte offset
    add    r9, r3, r8             # Input address
    add    r10, r4, r8            # Output address
    add    r11, r10, r7           # Prefetch output address
    dcbtst 0, r11                 # Prefetch output for store
    lwz    r12, 0(r9)             # Load input value
    # Process data (example: multiply by 2)
    slwi   r12, r12, 1            # Shift left (multiply by 2)
    stw    r12, 0(r10)            # Store result
    addi   r5, r5, 1              # Next element
    cmpw   r5, r6                 # Check if done
    blt    process_loop           # Continue processing
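The loop above issues `dcbtst` for every element, although one touch per cache block is enough. The following C sketch does the same processing with one hint per 32-byte block. It assumes GCC or Clang (`__builtin_prefetch` with a second argument of 1 is a write hint), a 32-byte block size, and an output array that starts on a block boundary; `double_array` and the two constants are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES   32u    /* assumed cache block size */
#define AHEAD_BYTES 128u    /* prefetch distance, as in the listing above */

/* out[i] = in[i] * 2, with a store hint issued once per output block. */
static void double_array(const uint32_t *in, uint32_t *out, size_t n)
{
    assert(in != NULL && out != NULL);

    const size_t per_line = LINE_BYTES / sizeof(uint32_t);   /* 8 elements */
    const size_t ahead    = AHEAD_BYTES / sizeof(uint32_t);  /* 32 elements */

    for (size_t i = 0; i < n; i++) {
        /* Hint only at the first element of each block, and only while
           the target is still inside the output array. */
        if (i % per_line == 0 && i + ahead < n)
            __builtin_prefetch(&out[i + ahead], 1, 3);
        out[i] = in[i] << 1;
    }
}
```

Skipping the hint near the end of the array is a design choice rather than a requirement. It avoids pulling blocks beyond the buffer into the cache.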