Execution units

The main characteristic of an execution unit is the way in which it will wake up the instruction which depend on it.

  • Static wake : For EU with fixed latency (no stall), a static wake can be asked to the issue queue. This can reduce the completion-to-use delay by two cycles

  • Dynamic wake : For EU with variable latency (which can stall), the EU is responsible to send a wakes commands to the issue queue.

Here is an illustration with two EU (one with static wake, one with dynamic wake) interacting with the issue queue to wake up dependent instructions

../../_images/eu_wakes.png

Custom instruction

There are multiple ways you can add custom instructions into NaxRiscv. The following chapter will provide some demo.

SIMD add

Let’s define a plugin which will implement a SIMD add (4x8bits adder), working on the integer register file.

The plugin will be based on the ExecutionUnitElementSimple which makes implementing ALU plugins simpler. Such a plugin can then be used to compose a given execution unit (hosted by a ExecutionUnitBase).

For instance the Plugin configuration could be :

plugins += new ExecutionUnitBase("ALU0")
plugins += new IntFormatPlugin("ALU0")
plugins += new SrcPlugin("ALU0", earlySrc = true)
plugins += new IntAluPlugin("ALU0", aluStage = 0)
plugins += new ShiftPlugin("ALU0" , aluStage = 0)
plugins += new ShiftPlugin("ALU0" , aluStage = 0)
plugins += new SimdAddPlugin("ALU0") // <- We will implement this plugin

Plugin implementation

Here is a example how this plugin could be implemented : (https://github.com/SpinalHDL/NaxRiscv/blob/d44ac3a3a3a4328cf2c654f9a46171511a798fae/src/main/scala/naxriscv/execute/SimdAddPlugin.scala#L36)

package naxriscv.execute

import spinal.core._
import spinal.lib._
import naxriscv._
import naxriscv.riscv._
import naxriscv.riscv.IntRegFile
import naxriscv.interfaces.{RS1, RS2}
import naxriscv.utilities.Plugin

//This plugin example will add a new instruction named SIMD_ADD which do the following :
//
//RD : Regfile Destination, RS : Regfile Source
//RD( 7 downto  0) = RS1( 7 downto  0) + RS2( 7 downto  0)
//RD(16 downto  8) = RS1(16 downto  8) + RS2(16 downto  8)
//RD(23 downto 16) = RS1(23 downto 16) + RS2(23 downto 16)
//RD(31 downto 24) = RS1(31 downto 24) + RS2(31 downto 24)
//
//Instruction encoding :
//0000000----------000-----0001011   <- Custom0 func3=0 func7=0
//       |RS2||RS1|   |RD |
//
//Note :  RS1, RS2, RD positions follow the RISC-V spec and are common for all instruction of the ISA


object SimdAddPlugin{
  //Define the instruction type and encoding that we wll use
  val ADD4 = IntRegFile.TypeR(M"0000000----------000-----0001011")
}

//ExecutionUnitElementSimple Is a base class which will be coupled to the pipeline provided by a ExecutionUnitBase with
//the same euId. It provide quite a few utilities to ease the implementation of custom instruction.
//Here we will implement a plugin which provide SIMD add on the register file.
//staticLatency=true specify that our plugin will never halt the pipeling, allowing the issue queue to statically
//wake up instruction which depend on its result.
class SimdAddPlugin(val euId : String) extends ExecutionUnitElementSimple(euId, staticLatency = true) {
  //We will assume our plugin is fully combinatorial
  override def euWritebackAt = 0

  //The setup code is by plugins to specify things to each others before it is too late
  //create early blockOfCode will
  override val setup = create early new Setup{
    //Let's assume we only support RV32 for now
    assert(Global.XLEN.get == 32)

    //Specify to the ExecutionUnitBase that the current plugin will implement the ADD4 instruction
    add(SimdAddPlugin.ADD4)
  }

  override val logic = create late new Logic{
    val process = new ExecuteArea(stageId = 0) {
      //Get the RISC-V RS1/RS2 values from the register file
      val rs1 = stage(eu(IntRegFile, RS1)).asUInt
      val rs2 = stage(eu(IntRegFile, RS2)).asUInt

      //Do some computation
      val rd = UInt(32 bits)
      rd( 7 downto  0) := rs1( 7 downto  0) + rs2( 7 downto  0)
      rd(16 downto  8) := rs1(16 downto  8) + rs2(16 downto  8)
      rd(23 downto 16) := rs1(23 downto 16) + rs2(23 downto 16)
      rd(31 downto 24) := rs1(31 downto 24) + rs2(31 downto 24)

      //Provide the computation value for the writeback
      wb.payload := rd.asBits
    }
  }
}

NaxRiscv generation

Then, to generate a NaxRiscv with this new plugin, we could run the following App : (https://github.com/SpinalHDL/NaxRiscv/blob/d44ac3a3a3a4328cf2c654f9a46171511a798fae/src/main/scala/naxriscv/execute/SimdAddPlugin.scala#L71)

object SimdAddNaxGen extends App{
  import naxriscv.compatibility._
  import naxriscv.utilities._

  def plugins = {
    //Get a default list of plugins
    val l = Config.plugins(
      withRdTime = false,
      aluCount    = 2,
      decodeCount = 2
    )
    //Add our plugin to the two ALUs
    l += new SimdAddPlugin("ALU0")
    l += new SimdAddPlugin("ALU1")
    l
  }

  //Create a SpinalHDL configuration that will be used to generate the hardware
  val spinalConfig = SpinalConfig(inlineRom = true)
  spinalConfig.addTransformationPhase(new MemReadDuringWriteHazardPhase)
  spinalConfig.addTransformationPhase(new MultiPortWritesSymplifier)

  //Generate the NaxRiscv verilog file
  val report = spinalConfig.generateVerilog(new NaxRiscv(xlen = 32, plugins))

  //Generate some C header files used by the verilator testbench to connect to the DUT
  report.toplevel.framework.getService[DocPlugin].genC()
}

To run this App, you can go to the NaxRiscv directory and run :

sbt "runMain naxriscv.execute.SimdAddNaxGen"

Software test

Then let’s write some assembly test code : (https://github.com/SpinalHDL/NaxSoftware/tree/849679c70b238ceee021bdfd18eb2e9809e7bdd0/baremetal/simdAdd)

.globl _start
_start:

#include "../../driver/riscv_asm.h"
#include "../../driver/sim_asm.h"
#include "../../driver/custom_asm.h"

    //Test 1
    li x1, 0x01234567
    li x2, 0x01FF01FF
    opcode_R(CUSTOM0, 0x0, 0x00, x3, x1, x2) //x3 = ADD4(x1, x2)

    //Print result value
    li x4, PUT_HEX
    sw x3, 0(x4)

    //Check result
    li x5, 0x02224666
    bne x3, x5, fail

    j pass

pass:
    j pass
fail:
    j fail

Compile it with

make clean rv32im

Simulation

And the run a simulation in src/test/cpp/naxriscv (You will have to setup things as described in its readme first)

make clean compile
./obj_dir/VNaxRiscv --load-elf ../../../../ext/NaxSoftware/baremetal/simdAdd/build/rv32im/simdAdd.elf --spike-disable --pass-symbol pass --fail-symbol fail --trace

Which will output the value 2224666 in the shell :D

Conclusion

So overall this example didn’t introduce how to specify some additional decoding, nor how to define multi-cycle ALU. (TODO). But you can take a look in the IntAluPlugin, ShiftPlugin, DivPlugin, MulPlugin and BranchPlugin which are doing those things using the same ExecutionUnitElementSimple base class.

Also, you don’t have to use the ExecutionUnitElementSimple base class, you can have more fundamental accesses, as the LoadPlugin, StorePlugin, EnvCallPlugin.

Hardcore way

Note, here is an example of the same instruction, but implemented without the ExecutionUnitElementSimple facilities : (https://github.com/SpinalHDL/NaxRiscv/blob/72b80e3345ecc3a25ca913f2b741e919a3f4c970/src/main/scala/naxriscv/execute/SimdAddPlugin.scala#L100)

object SimdAddRawPlugin{
  val SEL = Stageable(Bool()) //Will be used to identify when we are executing a ADD4
  val ADD4 = IntRegFile.TypeR(M"0000000----------000-----0001011")
}

class SimdAddRawPlugin(euId : String) extends Plugin {
  import SimdAddRawPlugin._
  val setup = create early new Area{
    val eu = findService[ExecutionUnitBase](_.euId == euId)
    eu.retain() //We don't want the EU to generate itself before we are done with it

    //Specify all the ADD4 requirements
    eu.addMicroOp(ADD4)
    eu.setCompletion(ADD4, stageId = 0)
    eu.setStaticWake(ADD4, stageId = 0)
    eu.setDecodingDefault(SEL, False)
    eu.addDecoding(ADD4, SEL, True)

    //IntFormatPlugin provide a shared point to write into the register file with some optional carry extensions
    val intFormat = findService[IntFormatPlugin](_.euId == euId)
    val writeback = intFormat.access(stageId = 0, writeLatency = 0)
  }

  val logic = create late new Area{
    val eu = setup.eu
    val writeback = setup.writeback
    val stage = eu.getExecute(stageId = 0)

    //Get the RISC-V RS1/RS2 values from the register file
    val rs1 = stage(eu(IntRegFile, RS1)).asUInt
    val rs2 = stage(eu(IntRegFile, RS2)).asUInt

    //Do some computation
    val rd = UInt(32 bits)
    rd( 7 downto  0) := rs1( 7 downto  0) + rs2( 7 downto  0)
    rd(16 downto  8) := rs1(16 downto  8) + rs2(16 downto  8)
    rd(23 downto 16) := rs1(23 downto 16) + rs2(23 downto 16)
    rd(31 downto 24) := rs1(31 downto 24) + rs2(31 downto 24)

    //Provide the computation value for the writeback
    writeback.valid   := stage(SEL)
    writeback.payload := rd.asBits

    //Now the EU has every requirements set for the generation (from this plugin perspective)
    eu.release()
  }
}