dcbt
Data Cache Block Touch - 7C 00 02 2C
Instruction Syntax
| Mnemonic | Format | Flags |
| dcbt | rA,rB | - |
Instruction Encoding
| Bits | 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31 |
| Value | 011111 | 00000 | rA | rB | 0100010110 | 0 |
| Field | Bits | Description |
| Primary Opcode | 0-5 | 011111 (0x1F) |
| Reserved | 6-10 | 00000 |
| rA | 11-15 | Register A (base address; rA = 0 means the value 0, not GPR0) |
| rB | 16-20 | Register B (index) |
| XO | 21-30 | 0100010110 (278) |
| Reserved | 31 | 0 |
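As a cross-check of the field layout, the following C sketch packs the fields into a 32-bit word. PowerPC numbers bits from 0 (most significant) to 31, so a field whose last bit is n is shifted left by 31 - n. The helper name `encode_dcbt` is hypothetical. With rA = rB = 0 it should reproduce 0x7C00022C, the 7C 00 02 2C pattern in the title.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the 32-bit word for `dcbt rA,rB`.
   PowerPC numbers bits 0 (MSB) to 31 (LSB), so a field whose last
   bit is n is shifted left by 31 - n. Bits 6-10 and 31 stay zero. */
static uint32_t encode_dcbt(unsigned ra, unsigned rb)
{
    assert(ra < 32 && rb < 32);        /* rA and rB are 5-bit fields */
    const uint32_t opcd = 31;          /* primary opcode, bits 0-5 */
    const uint32_t xo = 278;           /* extended opcode, bits 21-30 */
    return (opcd << 26)                /* ends at bit 5  -> shift 26 */
         | ((uint32_t)ra << 16)        /* ends at bit 15 -> shift 16 */
         | ((uint32_t)rb << 11)        /* ends at bit 20 -> shift 11 */
         | (xo << 1);                  /* ends at bit 30 -> shift 1  */
}
```

For example, `encode_dcbt(0, 3)` corresponds to `dcbt 0, r3` and yields 0x7C001A2C.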
Operation
if rA = 0 then EA ← (rB) else EA ← (rA) + (rB)
if block containing EA is not in the data cache and EA is cacheable then the processor may fetch that block into the data cache (hint only)
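The address calculation above can be modelled in a few lines of C. This sketch assumes a 32-bit implementation (32-bit effective addresses) and represents the GPRs as a plain array; `dcbt_ea` and `block_base` are illustrative names, not part of any real API. `block_base` shows which cache block a touch affects: the EA with its low-order offset bits cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of dcbt rA,rB on a 32-bit implementation (assumed).
   An rA field of 0 means the literal value 0, not the contents of GPR0. */
static uint32_t dcbt_ea(const uint32_t gpr[32], unsigned ra, unsigned rb)
{
    assert(ra < 32 && rb < 32);
    uint32_t base = (ra == 0) ? 0 : gpr[ra];
    return base + gpr[rb];             /* unsigned add wraps modulo 2^32 */
}

/* Start address of the cache block containing ea.
   line_size must be a power of two (for example 32 bytes). */
static uint32_t block_base(uint32_t ea, uint32_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    return ea & ~(line_size - 1);
}
```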
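The Data Cache Block Touch instruction is a performance hint that the cache block containing the effective address (EA) will be read soon. If the block is not already in the data cache and the address is cacheable, the processor may load it into the data cache. An implementation is free to ignore the hint, and dcbt does not change memory contents or the value returned by any later load.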
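Note: Issued far enough ahead of the first access, dcbt can hide some or all of the cache-miss latency for that block. Touching a block that is already cached, or touching it only just before it is used, gives little or no benefit.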
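In C, GCC and Clang expose a comparable hint through the `__builtin_prefetch(addr, rw, locality)` builtin. For read prefetches on PowerPC targets GCC typically emits `dcbt`, but which instruction (if any) is generated is up to the compiler. In this minimal sketch, `struct record` and `lookup_value` are hypothetical; the touch is issued before some independent work so that the fetch can overlap it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type used only for this sketch. */
struct record { int key; int value; };

/* Hint the record's cache block early, do unrelated work while the fetch
   (if any) is in flight, then read the record. */
static int lookup_value(const struct record *r, const int *other, size_t n)
{
    assert(r != NULL);
    __builtin_prefetch(r, 0, 3);       /* 0 = read, 3 = keep in cache */
    int acc = 0;
    for (size_t i = 0; i < n; i++)     /* independent work overlaps the fetch */
        acc += other[i];
    return acc + r->value;
}
```

Because the prefetch is only a hint, `lookup_value` returns the same result whether or not the hardware acts on it.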
Affected Registers
None - This instruction does not affect any registers.
For more information on cache management see Section 3.2, "Cache Management Instructions," in the PowerPC Microprocessor Family: The Programming Environments manual.
Examples
Basic Cache Prefetch
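# Prefetch data that will be used soon (assumes 32-byte cache lines)
lis r3, data_array@ha # Load high part of array address
addi r3, r3, data_array@l # Complete array address
li r4, 32 # Offset of second cache line (rA and rB must be registers)
dcbt 0, r3 # Prefetch first cache line (EA = r3)
dcbt r4, r3 # Prefetch second cache line (EA = r3 + 32)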
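The two `dcbt` instructions are meant to touch the blocks containing `data_array` and `data_array + 32`. That covers the first 64 bytes of the array only when `data_array` is 32-byte aligned; otherwise those 64 bytes straddle three blocks and the third is not prefetched. The hypothetical helper below counts the blocks a byte range covers, assuming a power-of-two line size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of cache blocks covered by [addr, addr + len).
   len must be nonzero and line must be a power of two. */
static unsigned lines_spanned(uintptr_t addr, size_t len, size_t line)
{
    assert(len > 0 && line != 0 && (line & (line - 1)) == 0);
    uintptr_t first = addr / line;             /* block index of first byte */
    uintptr_t last = (addr + len - 1) / line;  /* block index of last byte */
    return (unsigned)(last - first + 1);
}
```

With 32-byte lines, 64 bytes starting at 0x1000 span two blocks, while 64 bytes starting at 0x1010 span three.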
Loop Prefetching
# Prefetch data ahead of processing loop
lis r3, process_buffer@ha
addi r3, r3, process_buffer@l
li r4, 0 # Current offset
li r5, 2048 # Buffer size
li r6, 64 # Prefetch distance (2 cache lines ahead)
process_loop:
add r7, r3, r4 # Current address
add r8, r7, r6 # Prefetch address
dcbt 0, r8 # Prefetch data ahead
# Process current data
lwz r9, 0(r7) # Load current data
# ... process data ...
addi r4, r4, 32 # Next cache line
cmpw r4, r5 # Check bounds (register compare; cmpwi takes an immediate)
blt process_loop # Continue until done
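A C rendering of the same loop, as a sketch under the same 32-byte-line assumption, with the GCC/Clang `__builtin_prefetch` builtin in place of `dcbt`; `walk_buffer`, `LINE_BYTES` and `AHEAD_BYTES` are hypothetical names. On its last two iterations the assembly touches the two cache lines just past the end of `process_buffer`. The C version skips those prefetches instead, because even computing an out-of-bounds pointer is undefined behaviour in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 32u                 /* assumed cache-line size */
#define AHEAD_BYTES (2 * LINE_BYTES)   /* prefetch distance: two lines */

/* Hypothetical counterpart of process_loop: read the first word of each
   cache line and hint the line AHEAD_BYTES further on. */
static uint32_t walk_buffer(const uint32_t *buf, size_t nwords)
{
    const size_t step = LINE_BYTES / sizeof(uint32_t);   /* words per line */
    const size_t ahead = AHEAD_BYTES / sizeof(uint32_t); /* words ahead */
    assert(nwords % step == 0);
    uint32_t acc = 0;
    for (size_t i = 0; i < nwords; i += step) {
        if (i + ahead < nwords)        /* never form a pointer past the end */
            __builtin_prefetch(&buf[i + ahead], 0, 3);
        acc += buf[i];                 /* like lwz r9, 0(r7) */
    }
    return acc;
}
```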
Sequential Access Optimization
# Optimize sequential memory access pattern
lis r3, large_array@ha
addi r3, r3, large_array@l
li r4, 0 # Start offset
li r5, 8192 # Array size
li r6, 128 # Prefetch ahead distance
sequential_loop:
add r7, r3, r4 # Current position
dcbt r6, r7 # Prefetch 128 bytes (four lines) ahead: EA = r6 + r7
# Process 32 bytes (one cache line)
lwz r8, 0(r7) # Process data
lwz r9, 4(r7)
lwz r10, 8(r7)
lwz r11, 12(r7)
# ... continue processing ...
addi r4, r4, 32 # Move to next cache line
cmpw r4, r5 # Check if done
blt sequential_loop # Continue processing
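The sequential loop translates to C in the same way. This sketch, with the hypothetical function `sum_sequential`, consumes all eight words of each assumed 32-byte line (the assembly loads four and elides the rest) and prefetches 128 bytes, four lines, ahead. As before, the prefetch is skipped near the end of the array rather than pointing past it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    LINE_WORDS = 8,   /* 32-byte line of 4-byte words (assumed) */
    AHEAD_WORDS = 32  /* 128 bytes = four lines ahead */
};

/* Hypothetical counterpart of sequential_loop: consume every word of each
   line while hinting the line 128 bytes ahead. */
static uint32_t sum_sequential(const uint32_t *a, size_t n)
{
    assert(n % LINE_WORDS == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i += LINE_WORDS) {
        if (i + AHEAD_WORDS < n)       /* skip touches beyond the array */
            __builtin_prefetch(&a[i + AHEAD_WORDS], 0, 3);
        for (size_t j = 0; j < LINE_WORDS; j++)
            sum += a[i + j];
    }
    return sum;
}
```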
Matrix Processing Prefetch
# Prefetch for matrix operations
lis r3, matrix_a@ha
addi r3, r3, matrix_a@l
lis r4, matrix_b@ha
addi r4, r4, matrix_b@l
li r5, 0 # Row counter
li r6, 64 # Row size in bytes
matrix_loop:
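mullw r7, r5, r6 # Calculate row offset (register multiply; mulli takes an immediate)
add r8, r3, r7 # Matrix A row address
add r9, r4, r7 # Matrix B row address
dcbt 0, r8 # Prefetch first cache line of current Matrix A row
dcbt 0, r9 # Prefetch first cache line of current Matrix B row
dcbt r6, r8 # Prefetch first cache line of next Matrix A row (EA = r8 + 64)
dcbt r6, r9 # Prefetch first cache line of next Matrix B row (EA = r9 + 64)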
# Process current row
# ... matrix calculations ...
addi r5, r5, 1 # Next row
cmpwi r5, 16 # Check if done (16 rows)
blt matrix_loop # Continue processing
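With 32-byte lines and 32-byte-aligned rows, each 64-byte row spans two cache blocks, so the assembly above touches only the first half of each row. The C sketch below (hypothetical `ROWS`, `COLS` and `dot_rows`, same 16-row, 64-byte-row layout) instead hints both lines of the next row of each matrix before processing the current one, again using `__builtin_prefetch` as a stand-in for `dcbt`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROWS 16
#define COLS 16   /* 16 four-byte words = one 64-byte row */

/* Hypothetical row-wise multiply-accumulate over two ROWS x COLS matrices
   stored row-major. Before each row is processed, both 32-byte lines of the
   next row of each matrix are hinted (assumes 32-byte-aligned rows). */
static uint32_t dot_rows(const uint32_t *a, const uint32_t *b)
{
    assert(a != NULL && b != NULL);
    uint32_t acc = 0;
    for (size_t r = 0; r < ROWS; r++) {
        const uint32_t *ra = a + r * COLS;
        const uint32_t *rb = b + r * COLS;
        if (r + 1 < ROWS) {            /* no next row after the last one */
            __builtin_prefetch(ra + COLS, 0, 3);            /* next A row, line 1 */
            __builtin_prefetch(ra + COLS + COLS / 2, 0, 3); /* next A row, line 2 */
            __builtin_prefetch(rb + COLS, 0, 3);            /* next B row, line 1 */
            __builtin_prefetch(rb + COLS + COLS / 2, 0, 3); /* next B row, line 2 */
        }
        for (size_t c = 0; c < COLS; c++)
            acc += ra[c] * rb[c];
    }
    return acc;
}
```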