dcbt
Data Cache Block Touch - 7C 00 02 2C

Instruction Syntax

Mnemonic Format Flags
dcbt rA,rB -

Instruction Encoding

Bits:   0-5     6-10   11-15  16-20  21-30       31
Value:  011111  00000  AAAAA  BBBBB  0100010110  0

Field           Bits   Description
Primary Opcode  0-5    011111 (31, 0x1F)
Reserved        6-10   00000
rA              11-15  Register A (base address; a field value of 0 means the literal 0, not r0)
rB              16-20  Register B (index)
XO              21-30  0100010110 (278)
Reserved        31     0
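
As a cross-check on the field layout above, the following C sketch assembles the instruction word from the two register numbers. The function name `encode_dcbt` is hypothetical and exists only for illustration. It assumes the classic encoding with bits 6-10 and bit 31 set to zero, and PowerPC bit numbering, in which bit 0 is the most significant bit.

```c
#include <stdint.h>

/* Hypothetical helper: build the 32-bit dcbt word from register numbers.
 * Bit 0 is the most significant bit, so a field whose last bit is n
 * is shifted left by (31 - n). */
uint32_t encode_dcbt(unsigned ra, unsigned rb)
{
    const uint32_t primary = 31u;   /* primary opcode, ends at bit 5 */
    const uint32_t xo      = 278u;  /* extended opcode, ends at bit 30 */

    return (primary << 26)          /* 31 - 5  = 26 */
         | ((ra & 0x1Fu) << 16)     /* rA ends at bit 15: 31 - 15 = 16 */
         | ((rb & 0x1Fu) << 11)     /* rB ends at bit 20: 31 - 20 = 11 */
         | (xo << 1);               /* 31 - 30 = 1; bit 31 stays 0 */
}
```

With both register fields zero, the function returns 0x7C00022C, the base encoding given in the heading. For `dcbt 0, r3` it returns 0x7C001A2C.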

Operation

if rA = 0 then EA ← (rB)
else EA ← (rA) + (rB)

if block not in cache and EA is cacheable then load to cache

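The data cache block touch instruction is a hint that the program will soon load from the cache block containing the effective address (EA). If that block is not already in the data cache and the address is cacheable, the processor may fetch it ahead of the access. The instruction never changes the program's results, and an implementation may treat it as a no-op.

Note: The hint pays off only when it is issued far enough ahead of the first access for the block to arrive. Touching blocks that are already cached, or that are never used, costs instruction bandwidth and can evict useful data.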
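
The effective-address rule in the pseudocode above can be modeled in a few lines of C. This is a minimal sketch, not a processor model: the names `dcbt_effective_address` and `cache_block_base` are hypothetical, the `gpr` array stands in for the general-purpose registers of a 32-bit implementation, and the 32-byte line size used in the usage note is the value the examples on this page assume.

```c
#include <stdint.h>

/* Hypothetical model of the EA rule: an rA field of 0 means the
 * literal value 0, not the contents of r0. gpr[] stands in for the
 * 32 general-purpose registers. */
uint32_t dcbt_effective_address(const uint32_t gpr[32], unsigned ra, unsigned rb)
{
    uint32_t base = (ra == 0) ? 0u : gpr[ra];
    return base + gpr[rb];          /* unsigned add wraps modulo 2^32 */
}

/* Start address of the cache block containing ea.
 * line_size must be a power of two. */
uint32_t cache_block_base(uint32_t ea, uint32_t line_size)
{
    return ea & ~(line_size - 1u);
}
```

For example, with r3 = 0x1000 and r4 = 0x24, `dcbt r3, r4` gives EA = 0x1024, which falls in the 32-byte block starting at 0x1020. `dcbt 0, r3` gives EA = 0x1000 regardless of what r0 holds.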

Affected Registers

None - This instruction does not affect any registers.

For more information on cache management see Section 3.2, "Cache Management Instructions," in the PowerPC Microprocessor Family: The Programming Environments manual.

Examples

Basic Cache Prefetch

# Prefetch data that will be used soon
lis r3, data_array@ha   # Load high part of array address
addi r3, r3, data_array@l   # Complete array address
li r4, 32               # Offset of second cache line (32-byte lines assumed)
dcbt 0, r3              # Prefetch first cache line (EA = r3)
dcbt r3, r4             # Prefetch second cache line (EA = r3 + r4)

Loop Prefetching

# Prefetch data ahead of processing loop
lis r3, process_buffer@ha
addi r3, r3, process_buffer@l
li r4, 0                # Current offset
li r5, 2048             # Buffer size
li r6, 64               # Prefetch distance (2 cache lines ahead)

process_loop:
    add r7, r3, r4      # Current address
    add r8, r7, r6      # Prefetch address
    dcbt 0, r8          # Prefetch data ahead
    
    # Process current data
    lwz r9, 0(r7)       # Load current data
    # ... process data ...
    
    addi r4, r4, 32     # Next cache line
    cmpw r4, r5         # Check bounds (register compare)
    blt process_loop    # Continue until done

Sequential Access Optimization

# Optimize sequential memory access pattern
lis r3, large_array@ha
addi r3, r3, large_array@l
li r4, 0                # Start offset
li r5, 8192             # Array size
li r6, 128              # Prefetch ahead distance

sequential_loop:
    add r7, r3, r4      # Current position
    dcbt r6, r7         # Prefetch ahead
    
    # Process 32 bytes (one cache line)
    lwz r8, 0(r7)       # Process data
    lwz r9, 4(r7)
    lwz r10, 8(r7)
    lwz r11, 12(r7)
    # ... continue processing ...
    
    addi r4, r4, 32     # Move to next cache line
    cmpw r4, r5         # Check if done
    blt sequential_loop # Continue processing
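
The prefetch-ahead pattern of the two loops above can also be written in C. The sketch below assumes GCC or Clang, whose `__builtin_prefetch` builtin is commonly compiled to a touch instruction such as dcbt on PowerPC targets; that mapping is an assumption about the toolchain, not something this page specifies. The function name, the constants, and the 32-byte line size are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u                 /* assumed line size */
#define WORDS_PER_LINE   (CACHE_LINE_BYTES / sizeof(uint32_t))
#define PREFETCH_LINES   4u                  /* 4 * 32 = 128 bytes ahead */

/* Sum an array one cache line at a time, hinting a few lines ahead. */
uint32_t sum_with_prefetch(const uint32_t *data, size_t count)
{
    const size_t ahead = PREFETCH_LINES * WORDS_PER_LINE;
    uint32_t total = 0;

    for (size_t i = 0; i < count; i += WORDS_PER_LINE) {
        /* Only hint addresses inside the array so the pointer stays valid. */
        if (i + ahead < count)
            __builtin_prefetch(&data[i + ahead], 0 /* read */, 3 /* keep cached */);

        size_t end = i + WORDS_PER_LINE;
        if (end > count)
            end = count;
        for (size_t j = i; j < end; j++)
            total += data[j];
    }
    return total;
}
```

The bounds check on the hint is a choice made for portable C, where forming a pointer past the end of the array is not allowed. The prefetch distance of four lines matches the 128-byte distance in the sequential example and would need tuning on real hardware.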

Matrix Processing Prefetch

# Prefetch for matrix operations
lis r3, matrix_a@ha
addi r3, r3, matrix_a@l
lis r4, matrix_b@ha
addi r4, r4, matrix_b@l
li r5, 0                # Row counter
li r6, 64               # Row size in bytes

matrix_loop:
    mullw r7, r5, r6    # Row offset = row index * row size
    add r8, r3, r7      # Matrix A row address
    add r9, r4, r7      # Matrix B row address
    
    dcbt 0, r8          # Prefetch start of current Matrix A row
    dcbt 0, r9          # Prefetch start of current Matrix B row
    dcbt r6, r8         # Prefetch start of next Matrix A row
    dcbt r6, r9         # Prefetch start of next Matrix B row
    
    # Process current row
    # ... matrix calculations ...
    
    addi r5, r5, 1      # Next row
    cmpwi r5, 16        # Check if done (16 rows)
    blt matrix_loop     # Continue processing
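
Each matrix row here is 64 bytes, while the earlier examples treat a cache line as 32 bytes, so a single dcbt per row touches only part of the row. The helper below is a hypothetical sketch for working out how many touches a byte range needs; the name `cache_lines_spanned` and the 32-byte line size in the usage note are assumptions.

```c
#include <stdint.h>

/* Number of cache blocks touched by the byte range [addr, addr + len).
 * line_size and len must be nonzero, and the range must not wrap. */
uint32_t cache_lines_spanned(uint32_t addr, uint32_t len, uint32_t line_size)
{
    uint32_t first = addr / line_size;
    uint32_t last  = (addr + len - 1u) / line_size;
    return last - first + 1u;
}
```

A 64-byte row that starts on a line boundary spans two 32-byte lines. The same row starting 16 bytes into a line spans three, so unaligned rows need an extra touch.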

Related Instructions

dcba, dcbf, dcbi, dcbst, dcbtst, dcbz, icbt
