Execution units
The main characteristic of an execution unit (EU) is the way in which it wakes up the instructions which depend on it.
Static wake : For an EU with a fixed latency (no stall), a static wake can be requested from the issue queue. This can reduce the completion-to-use delay by two cycles.
Dynamic wake : For an EU with a variable latency (which can stall), the EU is responsible for sending wake commands to the issue queue.
Here is an illustration with two EUs (one with static wake, one with dynamic wake) interacting with the issue queue to wake up dependent instructions.
Custom instruction
There are multiple ways to add custom instructions to NaxRiscv. The following sections provide a demo.
SIMD add
Let’s define a plugin which will implement a SIMD add (4x8bits adder), working on the integer register file.
The plugin will be based on ExecutionUnitElementSimple, which makes implementing ALU plugins simpler. Such a plugin can then be used to compose a given execution unit (hosted by an ExecutionUnitBase).
For instance, the plugin configuration could be :
plugins += new ExecutionUnitBase("ALU0")
plugins += new IntFormatPlugin("ALU0")
plugins += new SrcPlugin("ALU0", earlySrc = true)
plugins += new IntAluPlugin("ALU0", aluStage = 0)
plugins += new ShiftPlugin("ALU0" , aluStage = 0)
plugins += new SimdAddPlugin("ALU0") // <- We will implement this plugin
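Before looking at the hardware, it can help to pin down the intended arithmetic with a plain software model. The following is a minimal sketch in ordinary Scala (no SpinalHDL involved). The names Add4Model and add4 are hypothetical and are not part of NaxRiscv. It only illustrates the "4x8 bits adder" behaviour described above, where each byte lane is added independently and carries never cross into the next lane.

```scala
// Plain-Scala reference model of the ADD4 behaviour (software only, no SpinalHDL).
// Add4Model and add4 are hypothetical names used for illustration.
object Add4Model {
  // Adds the four 8-bit lanes of rs1 and rs2 independently.
  def add4(rs1: Int, rs2: Int): Int =
    (0 until 4).map { lane =>
      val shift = lane * 8
      val a = (rs1 >>> shift) & 0xFF
      val b = (rs2 >>> shift) & 0xFF
      ((a + b) & 0xFF) << shift // keep 8 bits, dropping the lane carry-out
    }.reduce(_ | _)

  def main(args: Array[String]): Unit = {
    val result = add4(0x01234567, 0x01FF01FF)
    println(f"$result%08x") // prints 02224666
  }
}
```

The operands used in main are the ones from the assembly test later on this page, so the printed value can be compared with what the simulation reports.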
Plugin implementation
Here is an example of how this plugin could be implemented : (https://github.com/SpinalHDL/NaxRiscv/blob/d44ac3a3a3a4328cf2c654f9a46171511a798fae/src/main/scala/naxriscv/execute/SimdAddPlugin.scala#L36)
package naxriscv.execute
import spinal.core._
import spinal.lib._
import naxriscv._
import naxriscv.riscv._
import naxriscv.riscv.IntRegFile
import naxriscv.interfaces.{RS1, RS2}
import naxriscv.utilities.Plugin
//This plugin example will add a new instruction named SIMD_ADD which does the following :
//
//RD : Regfile Destination, RS : Regfile Source
//RD( 7 downto  0) = RS1( 7 downto  0) + RS2( 7 downto  0)
//RD(15 downto  8) = RS1(15 downto  8) + RS2(15 downto  8)
//RD(23 downto 16) = RS1(23 downto 16) + RS2(23 downto 16)
//RD(31 downto 24) = RS1(31 downto 24) + RS2(31 downto 24)
//
//Instruction encoding :
//0000000----------000-----0001011   <- Custom0 func3=0 func7=0
//       |RS2||RS1|   |RD |
//
//Note : RS1, RS2, RD positions follow the RISC-V spec and are common for all instructions of the ISA

object SimdAddPlugin{
  //Define the instruction type and encoding that we will use
  val ADD4 = IntRegFile.TypeR(M"0000000----------000-----0001011")
}

//ExecutionUnitElementSimple is a base class which will be coupled to the pipeline provided by an ExecutionUnitBase with
//the same euId. It provides quite a few utilities to ease the implementation of custom instructions.
//Here we will implement a plugin which provides SIMD add on the register file.
//staticLatency=true specifies that our plugin will never halt the pipeline, allowing the issue queue to statically
//wake up instructions which depend on its result.
class SimdAddPlugin(val euId : String) extends ExecutionUnitElementSimple(euId, staticLatency = true) {
  //We will assume our plugin is fully combinatorial
  override def euWritebackAt = 0

  //The setup code is used by plugins to specify things to each other before it is too late
  //create early blockOfCode will
  override val setup = create early new Setup{
    //Let's assume we only support RV32 for now
    assert(Global.XLEN.get == 32)

    //Specify to the ExecutionUnitBase that the current plugin will implement the ADD4 instruction
    add(SimdAddPlugin.ADD4)
  }

  override val logic = create late new Logic{
    val process = new ExecuteArea(stageId = 0) {
      //Get the RISC-V RS1/RS2 values from the register file
      val rs1 = stage(eu(IntRegFile, RS1)).asUInt
      val rs2 = stage(eu(IntRegFile, RS2)).asUInt

      //Do some computation (each 8-bit lane is added independently)
      val rd = UInt(32 bits)
      rd( 7 downto  0) := rs1( 7 downto  0) + rs2( 7 downto  0)
      rd(15 downto  8) := rs1(15 downto  8) + rs2(15 downto  8)
      rd(23 downto 16) := rs1(23 downto 16) + rs2(23 downto 16)
      rd(31 downto 24) := rs1(31 downto 24) + rs2(31 downto 24)

      //Provide the computation value for the writeback
      wb.payload := rd.asBits
    }
  }
}
NaxRiscv generation
Then, to generate a NaxRiscv with this new plugin, we could run the following App : (https://github.com/SpinalHDL/NaxRiscv/blob/d44ac3a3a3a4328cf2c654f9a46171511a798fae/src/main/scala/naxriscv/execute/SimdAddPlugin.scala#L71)
object SimdAddNaxGen extends App{
  import naxriscv.compatibility._
  import naxriscv.utilities._

  def plugins = {
    //Get a default list of plugins
    val l = Config.plugins(
      withRdTime = false,
      aluCount = 2,
      decodeCount = 2
    )
    //Add our plugin to the two ALUs
    l += new SimdAddPlugin("ALU0")
    l += new SimdAddPlugin("ALU1")
    l
  }

  //Create a SpinalHDL configuration that will be used to generate the hardware
  val spinalConfig = SpinalConfig(inlineRom = true)
  spinalConfig.addTransformationPhase(new MemReadDuringWriteHazardPhase)
  spinalConfig.addTransformationPhase(new MultiPortWritesSymplifier)

  //Generate the NaxRiscv verilog file
  val report = spinalConfig.generateVerilog(new NaxRiscv(xlen = 32, plugins))

  //Generate some C header files used by the verilator testbench to connect to the DUT
  report.toplevel.framework.getService[DocPlugin].genC()
}
To run this App, you can go to the NaxRiscv directory and run :
sbt "runMain naxriscv.execute.SimdAddNaxGen"
Software test
Then let’s write some assembly test code : (https://github.com/SpinalHDL/NaxSoftware/tree/849679c70b238ceee021bdfd18eb2e9809e7bdd0/baremetal/simdAdd)
.globl _start
_start:

#include "../../driver/riscv_asm.h"
#include "../../driver/sim_asm.h"
#include "../../driver/custom_asm.h"

    //Test 1
    li x1, 0x01234567
    li x2, 0x01FF01FF
    opcode_R(CUSTOM0, 0x0, 0x00, x3, x1, x2) //x3 = ADD4(x1, x2)

    //Print result value
    li x4, PUT_HEX
    sw x3, 0(x4)

    //Check result
    li x5, 0x02224666
    bne x3, x5, fail

    j pass

pass:
    j pass
fail:
    j fail
Compile it with
make clean rv32im
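To make the link between the assembly test and the encoding given in the plugin more concrete, here is a small software-only sketch in plain Scala that packs the R-type word for the test instruction and checks it against the same bit pattern as M"0000000----------000-----0001011". The names Add4Encoding, encodeR and isAdd4 are hypothetical. The sketch also assumes that the opcode_R macro takes its arguments in the order (opcode, func3, func7, rd, rs1, rs2), which is inferred from the comment in the test and not verified against custom_asm.h.

```scala
// Software-only sketch of the ADD4 encoding (no SpinalHDL).
// Add4Encoding and its members are hypothetical names used for illustration.
object Add4Encoding {
  val Custom0 = 0x0B // 0001011, the RISC-V custom-0 major opcode

  // Packs an R-type instruction word: funct7 | rs2 | rs1 | funct3 | rd | opcode
  def encodeR(opcode: Int, funct3: Int, funct7: Int, rd: Int, rs1: Int, rs2: Int): Int =
    (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

  // Mask/value pair equivalent to the pattern 0000000----------000-----0001011 :
  // only funct7, funct3 and the opcode are compared, the register fields are "don't care".
  val Mask  = 0xFE00707F
  val Value = 0x0000000B
  def isAdd4(word: Int): Boolean = (word & Mask) == Value

  def main(args: Array[String]): Unit = {
    // Assumed equivalent of: opcode_R(CUSTOM0, 0x0, 0x00, x3, x1, x2)
    val word = encodeR(Custom0, 0x0, 0x00, rd = 3, rs1 = 1, rs2 = 2)
    println(f"$word%08x")  // prints 0020818b
    println(isAdd4(word))  // prints true
  }
}
```

A regular add x3, x1, x2 (word 0x002081B3) does not match, since its major opcode differs from custom-0.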
Simulation
Then run a simulation in src/test/cpp/naxriscv (you will have to set things up as described in its readme first) :
make clean compile
./obj_dir/VNaxRiscv --load-elf ../../../../ext/NaxSoftware/baremetal/simdAdd/build/rv32im/simdAdd.elf --spike-disable --pass-symbol pass --fail-symbol fail --trace
Which will output the value 2224666 in the shell :D
Conclusion
Overall, this example did not cover how to specify additional decoding, nor how to define a multi-cycle ALU (TODO). You can however take a look at the IntAluPlugin, ShiftPlugin, DivPlugin, MulPlugin and BranchPlugin, which do those things using the same ExecutionUnitElementSimple base class.
Also, you don’t have to use the ExecutionUnitElementSimple base class. You can use lower-level accesses instead, as the LoadPlugin, StorePlugin and EnvCallPlugin do.
Hardcore way
Note, here is an example of the same instruction, but implemented without the ExecutionUnitElementSimple facilities : (https://github.com/SpinalHDL/NaxRiscv/blob/72b80e3345ecc3a25ca913f2b741e919a3f4c970/src/main/scala/naxriscv/execute/SimdAddPlugin.scala#L100)
object SimdAddRawPlugin{
  val SEL = Stageable(Bool()) //Will be used to identify when we are executing an ADD4
  val ADD4 = IntRegFile.TypeR(M"0000000----------000-----0001011")
}

class SimdAddRawPlugin(euId : String) extends Plugin {
  import SimdAddRawPlugin._

  val setup = create early new Area{
    val eu = findService[ExecutionUnitBase](_.euId == euId)
    eu.retain() //We don't want the EU to generate itself before we are done with it

    //Specify all the ADD4 requirements
    eu.addMicroOp(ADD4)
    eu.setCompletion(ADD4, stageId = 0)
    eu.setStaticWake(ADD4, stageId = 0)
    eu.setDecodingDefault(SEL, False)
    eu.addDecoding(ADD4, SEL, True)

    //IntFormatPlugin provides a shared point to write into the register file with some optional carry extensions
    val intFormat = findService[IntFormatPlugin](_.euId == euId)
    val writeback = intFormat.access(stageId = 0, writeLatency = 0)
  }

  val logic = create late new Area{
    val eu = setup.eu
    val writeback = setup.writeback
    val stage = eu.getExecute(stageId = 0)

    //Get the RISC-V RS1/RS2 values from the register file
    val rs1 = stage(eu(IntRegFile, RS1)).asUInt
    val rs2 = stage(eu(IntRegFile, RS2)).asUInt

    //Do some computation (each 8-bit lane is added independently)
    val rd = UInt(32 bits)
    rd( 7 downto  0) := rs1( 7 downto  0) + rs2( 7 downto  0)
    rd(15 downto  8) := rs1(15 downto  8) + rs2(15 downto  8)
    rd(23 downto 16) := rs1(23 downto 16) + rs2(23 downto 16)
    rd(31 downto 24) := rs1(31 downto 24) + rs2(31 downto 24)

    //Provide the computation value for the writeback
    writeback.valid := stage(SEL)
    writeback.payload := rd.asBits

    //Now the EU has all its requirements set for the generation (from this plugin's perspective)
    eu.release()
  }
}