Using MIPS microAptiv UP Processor CorExtend UDI interface

CorExtend is a feature of the MIPS32 microAptiv microprocessor, which is available in the MIPSfpga project as real, unobfuscated industrial RTL. The MIPSfpga sources can be downloaded after joining the Imagination University Programme: https://community.imgtec.com/university/. CorExtend allows system designers to define and add their own instructions that operate on data in the general-purpose registers in the same manner as standard MIPS instructions.

This post describes the CorExtend, or User-Defined Instruction (UDI), interface protocol. The interface allows a custom CorExtend UDI block to be connected directly to the MIPS32 microAptiv UP processor core.

This post can also be downloaded as a PDF file: MIPS microAptiv UP Processor CorExtend UDI interface protocol guide.

An example project with simulation sources can be downloaded from GitHub: https://github.com/zatslogic/UDI_example. The project is described later in this post.

The position of CorExtend in the top-level RTL hierarchy of the m14k microAptiv processor core is shown below.
CorExtend RTL Hierarchy
All core signals at the m14k_cpu level, including the CorExtend UDI signals, are listed in the MIPS32 microAptiv UP Processor Core Family Integrator's Guide (Table 2.3, Signal Descriptions for m14k_cpu Level). The table below lists only the signals related to CorExtend UDI.


Description of CorExtend signals connected to m14k_cpu
Signal Name | Type | Description
UDI_ir_e[31:0] | Out | The complete instruction word. Although the module also receives the rs and rt source operands, the full instruction is provided so that all or part of the source register fields can be used to hold immediate values. Note that the implementer is responsible for decoding the Opcode and Function fields.
UDI_irvalid_e | Out | Indicates whether the instruction word (UDI_ir_e) is valid.
UDI_rs_e[31:0] | Out | Source operand rs after the bypass mux.
UDI_rt_e[31:0] | Out | Source operand rt after the bypass mux.
UDI_endianb_e | Out | Indicates that the instruction is executing in big-endian mode. This signal is generally not needed unless (a) the UDI instruction works on endian-dependent sub-word data and (b) the UDI block is designed to be bi-endian.
UDI_kd_mode_e | Out | Indicates that the instruction is executing in kernel or debug mode. This can be used to prevent certain UDI instructions from being executed in user mode.
UDI_kill_m | Out | Late-arriving kill signal caused by an exception generated by an earlier instruction. This signal may optionally be used to deassert the UDI_stall_m output for improved interrupt latency on multi-cycle UDIs whose results won't be used.
UDI_start_e | Out | The mpc_run_ie signal from the core pipeline control logic.
UDI_run_m | Out | The mpc_run_m signal, used to qualify UDI_kill_m.
UDI_greset | Out | Reset signal for any state machines in the UDI block.
UDI_gclk | Out | Clock input to the UDI block.
UDI_gscanenable | Out | Global scan enable.
UDI_ri_e | In | A one-bit signal which, when high, indicates that the SPECIAL2 instruction currently being executed is illegal (i.e., reserved). The Master Pipeline Control (MPC) block within the core uses it to signal an illegal instruction; however, MPC samples it only if the current instruction is within the SPECIAL2 range of user-defined instructions (bits [5:4] of the instruction are 2'b01).
UDI_rd_m[31:0] | In | The 32-bit result of the executed instruction, available in the M stage.
UDI_wrreg_e[4:0] | In | Register to which the result of this user-defined instruction is written. This value is also passed on to MPC.
UDI_stall_m | In | Signals that the UDI block is processing a multicycle instruction and needs to stall the pipeline because its outputs must be written into the register file. Should be set to 0 for single-cycle instructions. This is an M-stage signal.
UDI_present | In | Static signal indicating whether any UDI support is available.
UDI_honor_cee | In | Indicates whether the core should honor the CorExtend Enable (CEE) bit in the Status register. When this signal is asserted, Status.CEE is deasserted, and a UDI operation is attempted, the core takes a CorExtend Unusable exception.

In addition to the signals connected to m14k_cpu, a custom CorExtend block has variable-width external signals (table below) that are propagated out of m14k_top.

Description of external CorExtend signals
Signal Name | Type | Description
UDI_toudi[x-1:0] | In | Variable-width external input to a custom CorExtend block.
UDI_fromudi[x-1:0] | Out | Variable-width external output from a custom CorExtend block.

To implement a custom CorExtend block, the modules m14k_edp_buf_misc and m14k_udi_stub must be modified. In m14k_edp_buf_misc, the input and output signals should be connected to each other, for example, as follows:



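// Core -> UDI block: instruction word, operands and execution mode
assign UDI_ir_e[31:0] = mpc_ir_e ;
assign UDI_irvalid_e = mpc_irval_e ;
assign UDI_rs_e[31:0] = edp_abus_e ;
assign UDI_rt_e[31:0] = edp_bbus_e ;
assign UDI_endianb_e = cpz_rbigend_e ;
assign UDI_kd_mode_e = cpz_kuc_e ;
// Core -> UDI block: pipeline control
assign UDI_kill_m = mpc_killmd_m ;
assign UDI_start_e = mpc_run_ie ;
assign UDI_run_m = mpc_run_m ;
// Core -> UDI block: reset, scan enable and clock
assign UDI_greset = greset ;
assign UDI_gscanenable = gscanenable ;
assign UDI_gclk = gclk ;
// UDI block -> core: destination register, reserved-instruction flag, stall and static configuration
assign edp_udi_wrreg_e[4:0] = UDI_wrreg_e ;
assign edp_udi_ri_e = UDI_ri_e ;
assign edp_udi_stall_m = UDI_stall_m ;
assign edp_udi_present = UDI_present ;
assign edp_udi_honor_cee = UDI_honor_cee ;
// M-stage result mux: mpc_udislt_sel_m selects between asp_m and the UDI result UDI_rd_m
mvp_mux2 #(32) _res_m_31_0_(res_m[31:0], mpc_udislt_sel_m, asp_m, UDI_rd_m);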

The actual custom CorExtend block should replace m14k_udi_stub. An example of the interaction between CorExtend and the microAptiv UP core is shown in the waveform below.

CorExtend interface protocol waveform
The UDI_present signal must be tied high. UDI_honor_cee can be tied low; if it is tied high, the Status.CEE bit must be set with an mtc0 instruction before any attempt to execute a CorExtend instruction. Otherwise, a CorExtend Unusable exception occurs, and UDI_kill_m is asserted for two clock cycles, starting in the clock cycle after UDI_start_e is asserted.

Every instruction word executed by the core arrives on UDI_ir_e[31:0] together with the UDI_irvalid_e signal. UDI_start_e marks the execute (E) stage of the microAptiv UP core pipeline. If the instruction has RS and/or RT operands, they arrive on UDI_rs_e[31:0] and UDI_rt_e[31:0], respectively, in the same cycle as UDI_start_e.
Some fields of the instruction must be decoded in the same cycle in which UDI_start_e arrives. This is crucial for generating UDI_ri_e, which must be asserted in the same cycle as UDI_start_e if the instruction is illegal. If the instruction writes its result to one of the processor's general-purpose registers, the RD address must be presented on UDI_wrreg_e[4:0] in the same cycle as UDI_start_e. Other fields of the instruction may be registered and decoded later.
UDI_wrreg_e[4:0] can address 31 of the processor's general-purpose registers ($1 to $31); the value 5'd0 means that no register is written.
A result that is to be written to the register file must be presented on UDI_rd_m[31:0] in the cycle after UDI_start_e. If the result becomes available later, UDI_stall_m must be asserted in the cycle after UDI_start_e and deasserted in the clock cycle before the result is present on UDI_rd_m[31:0].
Figure 3 shows the UDI instruction format. UDI instructions use the SPECIAL2 major opcode, 6'd28. The RS and RT fields address the source operand registers. Bits 15..6 may be used for the purposes of the custom CorExtend block; for example, they can hold the address of the destination register for the result. In the function field, bits 5..4 must be 2'b01, and bits 3..0 can encode up to 16 UDI instructions.
UDI instruction
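The format rules above are all that is needed to recognise a UDI in the instruction stream. As an illustration, the C sketch below models that check and the field extraction in software. It is not RTL, and the type and function names are hypothetical. Checking the major opcode alone is not enough, because SPECIAL2 also holds standard MIPS32 instructions such as MUL, whose function codes do not have bits 5..4 equal to 2'b01.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UDI_MAJOR_OPCODE 28u   /* SPECIAL2, 6'd28 */

/* Fields of a UDI instruction word (hypothetical software model). */
typedef struct {
    unsigned rs;      /* bits 25..21: source register RS          */
    unsigned rt;      /* bits 20..16: source register RT          */
    unsigned custom;  /* bits 15..6: free for the CorExtend block */
    unsigned n;       /* bits 3..0: UDI number, 0..15             */
} udi_fields_t;

/* A word is a UDI if its major opcode is 28 and function bits 5..4 are 2'b01. */
static bool udi_is_udi(uint32_t ir)
{
    return (ir >> 26) == UDI_MAJOR_OPCODE && ((ir >> 4) & 0x3u) == 0x1u;
}

/* Splits a UDI word into its fields; only valid for words accepted by udi_is_udi(). */
static udi_fields_t udi_fields(uint32_t ir)
{
    assert(udi_is_udi(ir));
    udi_fields_t f;
    f.rs     = (ir >> 21) & 0x1Fu;
    f.rt     = (ir >> 16) & 0x1Fu;
    f.custom = (ir >> 6) & 0x3FFu;
    f.n      = ir & 0xFu;
    return f;
}
```

For example, 0x71095010 (udi0 $8 $9 $10 in the test program later in this post) decodes to rs = 8, rt = 9, n = 0 and custom = 0x140. In contrast, 0x70000002 (a SPECIAL2 MUL) is not recognised as a UDI.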
The implementation of a custom CorExtend block is illustrated below with the example of a DSP accelerator block.

The block performs several closely related operations. It calculates the instantaneous power P(t) of a quadrature signal, defined as

$$P(t) = a^2(t) + b^2(t)$$
where a(t) and b(t) are the real and imaginary parts of the quadrature signal, respectively.
This operation is useful for signal detection by comparing the power with a threshold.
The table below lists the implemented UDI instructions.
List of implemented UDI instructions
Instruction | Explanation | Function field
UDI0 RD; RS; RT | RD = RS[31:16]^2 + RT[31:16]^2 | 6'b010000
UDI1 RD; RS; RT | RD = (RS[31:16]^2 + RT[31:16]^2) >> 1 | 6'b010001
UDI2 RD; RS | RD = RS[31:16]^2 | 6'b010010
UDI3 RS | stored_threshold = RS | 6'b010011
UDI4 RD; RS; RT | RD = ((RS[31:16]^2 + RT[31:16]^2) > stored_threshold) ? 1 : 0 | 6'b010100
UDI5 RD; RS; RT | RD = (((RS[31:16]^2 + RT[31:16]^2) >> 1) > stored_threshold) ? 1 : 0 | 6'b010101
UDI6 RD; RS; RT | RD = (RS[31:16]^2 > stored_threshold) ? 1 : 0 | 6'b010110
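The following C sketch restates the table as a behavioural reference model, which could serve as a golden model when checking simulation results. The table does not state whether RS[31:16] and RT[31:16] are signed, or whether the comparison with stored_threshold is signed. This sketch therefore assumes signed 16-bit (two's complement) half-words and an unsigned 32-bit comparison. These are our assumptions, chosen because they agree with the simulation results described later in this post; the example project's Verilog may differ. The names udi_model_t, hi_sq and udi_exec are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical golden model of the example CorExtend block. */
typedef struct {
    uint32_t stored_threshold;   /* written by UDI3 */
} udi_model_t;

/* Square of the upper half-word, ASSUMED to be a signed 16-bit sample.
   (x ^ 0x8000) - 0x8000 portably sign-extends a 16-bit value. */
static uint32_t hi_sq(uint32_t reg)
{
    int32_t s = (int32_t)((reg >> 16) ^ 0x8000u) - 0x8000;
    return (uint32_t)(s * s);            /* at most 2^30, no overflow */
}

/* Executes UDI number n (0..6) and returns the value for RD.
   UDI3 only updates the internal threshold; its return value is ignored. */
static uint32_t udi_exec(udi_model_t *m, unsigned n, uint32_t rs, uint32_t rt)
{
    uint32_t p_full = hi_sq(rs) + hi_sq(rt);   /* at most 2^31, fits in uint32_t */
    uint32_t p_half = p_full >> 1;
    uint32_t p_real = hi_sq(rs);

    switch (n) {
    case 0: return p_full;
    case 1: return p_half;
    case 2: return p_real;
    case 3: m->stored_threshold = rs; return 0;
    /* ASSUMPTION: unsigned comparison against the stored threshold. */
    case 4: return p_full > m->stored_threshold ? 1u : 0u;
    case 5: return p_half > m->stored_threshold ? 1u : 0u;
    case 6: return p_real > m->stored_threshold ? 1u : 0u;
    default:
        assert(0 && "UDI7..UDI15 are not implemented in this example");
        return 0;
    }
}
```

Under these assumptions each square is at most 2^30, so the UDI0 sum fits in an unsigned 32-bit result. It can, however, reach 2^31, one more than the largest signed 32-bit value; after the shift in UDI1 and UDI5, the value stays within the signed range.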
UDI0 calculates the instantaneous power. RS and RT are source operands whose upper half-words, RS[31:16] and RT[31:16], contain the 16-bit real and imaginary parts of the quadrature signal. The 32-bit result is written to the destination register RD.

UDI1 performs essentially the same operation as UDI0, except that it shifts the result right by one bit to prevent overflow.

UDI2 calculates the instantaneous power using only the real part of the quadrature signal; the RT operand is not used.

UDI3 stores a 32-bit threshold value (stored_threshold) in an internal register of the CorExtend block; no result is returned.

UDI4, UDI5, and UDI6 perform the UDI0, UDI1, and UDI2 operations, respectively, and compare the result with the stored threshold value. If the result exceeds the threshold, 32'd1 is returned; otherwise, 32'd0 is returned.

All instructions except UDI3 write their results to the register file and therefore require the address of the destination register. For this purpose, an RD field was added to the instruction word, as shown in the figure below.
Example of custom UDI instruction
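UDI_ri_e and UDI_wrreg_e[4:0] must be valid in the same cycle as UDI_start_e, so they have to come from purely combinational decoding of UDI_ir_e. The C function below is a behavioural sketch of that decode for the example block. It assumes that RD is held in bits 15..11 (where the test program below places it) and that the unimplemented function codes UDI7 to UDI15 are reported as reserved. In RTL this would be a handful of assign statements. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* E-stage outputs of the example block (hypothetical model). */
typedef struct {
    bool    ri;      /* UDI_ri_e: reserved UDI function code         */
    uint8_t wrreg;   /* UDI_wrreg_e[4:0]: destination GPR, 0 = none  */
} udi_e_outputs_t;

static udi_e_outputs_t udi_decode_e(uint32_t ir, bool irvalid)
{
    udi_e_outputs_t out = { false, 0 };
    bool is_udi = (ir >> 26) == 28u && ((ir >> 4) & 0x3u) == 0x1u;
    if (!irvalid || !is_udi)
        return out;                      /* MPC ignores UDI_ri_e outside the UDI range */

    unsigned n = ir & 0xFu;
    if (n > 6u) {                        /* ASSUMPTION: UDI7..UDI15 are reserved */
        out.ri = true;
        return out;
    }
    if (n != 3u)                         /* UDI3 only updates stored_threshold */
        out.wrreg = (uint8_t)((ir >> 11) & 0x1Fu);   /* ASSUMPTION: RD in bits 15..11 */
    return out;
}
```

For udi0 $8 $9 $10 (0x71095010), the model drives UDI_wrreg_e = 10 and UDI_ri_e = 0. For udi3 $11 (0x71600013), it drives UDI_wrreg_e = 0, so no GPR is written.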
The listing below shows a MIPS assembly program that tests all the implemented UDI instructions.


Machine Code Instruction Address Assembly Code
3c088000 // bfc00000: lui $8, 0x8000
3c09beaf // bfc00004: lui $9, 0xbeaf
71095010 // bfc00008: udi0 $8 $9 $10
71095011 // bfc0000c: udi1 $8 $9 $10
71005012 // bfc00010: udi2 $8 $10
3c0bbeaf // bfc00014: lui $11, 0xbeaf
356bdead // bfc00018: ori $11,$11, 0xdead
71600013 // bfc0001c: udi3 $11
71095014 // bfc00020: L1: udi4 $8 $9 $10
71095015 // bfc00024: udi5 $8 $9 $10
71095016 // bfc00028: udi6 $8 $9 $10
3c0b0001 // bfc0002c: lui $11, 0x0001
356bfeed // bfc00030: ori $11,$11, 0xfeed
71600013 // bfc00034: udi3 $11
1000fff9 // bfc00038: beq $0, $0, L1
00000000 // bfc0003c: nop
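The UDI machine codes above follow the UDI instruction format with RD placed in bits 15..11. The major opcode 28 occupies bits 31..26, RS bits 25..21, and RT bits 20..16. Bits 10..6 are zero, and the function field is 6'b01nnnn. Operands omitted in the assembly (RT for udi2 and udi3, RD for udi3) are encoded as 0. If a toolchain does not accept the udiN mnemonics, such words could be emitted with a .word directive. The hypothetical helper below computes them and reproduces every UDI word in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: builds a UDI word in the format used by the example
   (RD in bits 15..11, bits 10..6 zero). */
static uint32_t udi_word(unsigned n, unsigned rs, unsigned rt, unsigned rd)
{
    assert(n < 16u && rs < 32u && rt < 32u && rd < 32u);
    return ((uint32_t)28 << 26)   /* SPECIAL2 major opcode           */
         | ((uint32_t)rs << 21)   /* RS                              */
         | ((uint32_t)rt << 16)   /* RT                              */
         | ((uint32_t)rd << 11)   /* RD (example-specific position)  */
         | 0x10u                  /* function bits 5..4 = 2'b01      */
         | n;                     /* function bits 3..0 = UDI number */
}
```

For instance, udi_word(3, 11, 0, 0) returns 0x71600013, the encoding of udi3 $11.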

An example project implementing the custom CorExtend block described above in Verilog can be downloaded from https://github.com/zatslogic/UDI_example.

The project includes all the sources needed for simulation except the files in the rtl_up directory. To obtain them, you need to register with the Imagination University Programme and request the download (https://community.imgtec.com/downloads/mipsfpga-getting-started-version-1-2/). You may also need the XilinxCoreLib library for simulation; it can be compiled in Vivado with the Tcl command compile_simlib.
The example project contains two variants of the custom CorExtend block. The first performs all UDI instructions in one cycle. The second has additional pipelining and requires more cycles for some instructions; it was made specifically to exercise the UDI_stall_m signal.

The waveforms below show the simulation of the assembly program above.
For the first, unpipelined variant, the execution of the first three instructions, UDI0, UDI1, and UDI2, is shown in the figure below.
The instructions arrive on UDI_ir_e together with UDI_irvalid_e and UDI_start_e. The operands are valid on UDI_rs_e and UDI_rt_e in the same cycle, and the address of the destination GPR is also formed in this cycle. In the next cycle, the result is valid on UDI_rd_m.
The register file (rf) signals are also shown in the waveform. The result is written to the GPR whose address is displayed on mpc_dest_w; the data value can be seen on edp_wrdata_w, with the write strobe on mpc_rfwrite_w. The addresses of the operands being read from the GPRs are shown on mpc_rega_i and mpc_regb_i.
The figure below shows the instructions UDI3, UDI4, UDI5, and UDI6.
As the code listing shows, UDI3 writes the value 0xbeafdead to stored_threshold. None of the computed values exceeds this threshold, so zero is written as the result.
In the next waveform, UDI3 stores the new threshold 0x0001feed, and UDI4, UDI5, and UDI6 are executed again after the branch back to L1 is taken. This time the threshold value is lower than the computed values, so the results of these instructions are 1 (0x00000001).
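The zero and one results above can be checked by hand for the operands of the test program, $8 = 0x80000000 and $9 = 0xbeaf0000. The instruction table does not specify signedness, so the arithmetic below assumes that the half-words are signed 16-bit values and that the comparison with stored_threshold is unsigned. This is an assumption, not a statement about the project's Verilog, but it is consistent with both waveforms. Treating the half-words as unsigned would instead give 0x8000² + 0xBEAF² = 0xCE083BA1. That value exceeds 0xBEAFDEAD, so the first pass of UDI4 would return 1 rather than 0.

$$
\begin{aligned}
\mathrm{RS}[31{:}16] &= \mathtt{0x8000} = -32768, \qquad \mathrm{RT}[31{:}16] = \mathtt{0xBEAF} = -16721\\
\text{UDI0, UDI4:}\quad (-32768)^2 + (-16721)^2 &= 1073741824 + 279591841 = \mathtt{0x50AA3BA1}\\
\text{UDI1, UDI5:}\quad \mathtt{0x50AA3BA1} \gg 1 &= \mathtt{0x28551DD0}\\
\text{UDI2, UDI6:}\quad (-32768)^2 &= \mathtt{0x40000000}
\end{aligned}
$$

All three values are below 0xBEAFDEAD, giving 0 on the first pass, and above 0x0001FEED, giving 1 after the threshold is updated.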
The next three waveforms show the simulation of the pipelined UDI block.
The figure below shows the execution of UDI0, UDI1, and UDI2. UDI_stall_m is asserted while the computation is in progress in the UDI block. The result arrives on UDI_rd_m in the cycle after UDI_stall_m is deasserted, and it is written to the GPR in the following cycle.
In the waveform below, UDI4, UDI5, and UDI6 are executed using the UDI_stall_m signal.
In the next waveform, UDI4, UDI5, and UDI6 are executed again; the result values differ from those in the previous figure.