dcbt
Data Cache Block Touch - 7C 00 02 2C
Instruction Syntax
Mnemonic | Format | Flags |
dcbt | rA,rB | - |
Instruction Encoding
Bits | 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31 |
Value | 011111 | 00000 | rA | rB | 0100010110 | 0 |
Field | Bits | Description |
Primary Opcode | 0-5 | 011111 (0x1F) |
Reserved | 6-10 | 00000 |
rA | 11-15 | Register A (base address) |
rB | 16-20 | Register B (index) |
XO | 21-30 | 0100010110 (278) |
Reserved | 31 | 0 |
Operation
if rA = 0 then EA ← (rB) else EA ← (rA) + (rB)
if the block containing EA is not in the data cache and EA is cacheable, then the processor may load that block into the data cache
The dcbt instruction hints that the cache block containing the effective address will be read soon. If that block is not already in the data cache and the address is cacheable, the processor may fetch it into the data cache. The hint does not change program results, and an implementation is free to ignore it.
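Note: Used well, this instruction can reduce cache misses on subsequent accesses to the prefetched data. The touch must be issued far enough ahead of the first access for the fetch to complete, and unnecessary touches consume memory bandwidth and can evict data that is still in use.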
Affected Registers
None - This instruction does not affect any registers.
For more information on cache management see Section 3.2, "Cache Management Instructions," in the PowerPC Microprocessor Family: The Programming Environments manual.
Examples
Basic Cache Prefetch
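```asm
# Prefetch data that will be used soon
# (assumes 32-byte cache blocks)
    lis   r3, data_array@ha    # Load high part of array address
    addi  r3, r3, data_array@l # Complete array address
    li    r4, 32               # Offset of the second cache line
    dcbt  0, r3                # Prefetch first cache line (rA = 0, so EA = r3)
    dcbt  r4, r3               # Prefetch second cache line (EA = r4 + r3)
```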
Loop Prefetching
```asm
# Prefetch data ahead of processing loop
# (assumes 32-byte cache blocks)
    lis   r3, process_buffer@ha
    addi  r3, r3, process_buffer@l
    li    r4, 0                # Current offset
    li    r5, 2048             # Buffer size in bytes
    li    r6, 64               # Prefetch distance (2 cache lines ahead)
process_loop:
    add   r7, r3, r4           # Current address
    add   r8, r7, r6           # Prefetch address
    dcbt  0, r8                # Prefetch data ahead
    # Process current data
    lwz   r9, 0(r7)            # Load current data
    # ... process data ...
    addi  r4, r4, 32           # Next cache line
    cmpw  r4, r5               # Check bounds (register compare, not cmpwi)
    blt   process_loop         # Continue until done
```
Sequential Access Optimization
```asm
# Optimize sequential memory access pattern
    lis   r3, large_array@ha
    addi  r3, r3, large_array@l
    li    r4, 0                # Start offset
    li    r5, 8192             # Array size
    li    r6, 128              # Prefetch ahead distance
sequential_loop:
    add   r7, r3, r4           # Current position
    dcbt  r6, r7               # Prefetch ahead
    # Process 32 bytes (one cache line)
    lwz   r8, 0(r7)            # Process data
    lwz   r9, 4(r7)
    lwz   r10, 8(r7)
    lwz   r11, 12(r7)
    # ... continue processing ...
    addi  r4, r4, 32           # Move to next cache line
    cmpw  r4, r5               # Check if done
    blt   sequential_loop      # Continue processing
```
Matrix Processing Prefetch
```asm
# Prefetch for matrix operations
# (assumes 32-byte cache blocks, so each 64-byte row spans two blocks)
    lis    r3, matrix_a@ha
    addi   r3, r3, matrix_a@l
    lis    r4, matrix_b@ha
    addi   r4, r4, matrix_b@l
    li     r5, 0               # Row counter
    li     r6, 64              # Row size in bytes
matrix_loop:
    mullw  r7, r5, r6          # Calculate row offset (mullw: r6 is a register)
    add    r8, r3, r7          # Matrix A row address
    add    r9, r4, r7          # Matrix B row address
    dcbt   0, r8               # Prefetch first block of Matrix A row
    dcbt   0, r9               # Prefetch first block of Matrix B row
    dcbt   r6, r8              # Prefetch first block of next Matrix A row
    dcbt   r6, r9              # Prefetch first block of next Matrix B row
    # Process current row
    # ... matrix calculations ...
    addi   r5, r5, 1           # Next row
    cmpwi  r5, 16              # Check if done (16 rows)
    blt    matrix_loop         # Continue processing
```