Custom instruction
There are multiple ways you can add custom instructions into VexiiRiscv. The following chapter will provide some demo.
SIMD add
Let's define a plugin which will implement a SIMD add (4x8bits adder), working on the integer register file.
The plugin will be based on the ExecutionUnitElementSimple which makes implementing ALU plugins simpler. Such a plugin can then be used to compose a given execution lane layer
For instance the Plugin configuration could be :
plugins += new SrcPlugin(early0, executeAt = 0, relaxedRs = relaxedSrc)
plugins += new IntAluPlugin(early0, formatAt = 0)
plugins += new BarrelShifterPlugin(early0, formatAt = relaxedShift.toInt)
plugins += new IntFormatPlugin("lane0")
plugins += new BranchPlugin(early0, aluAt = 0, jumpAt = relaxedBranch.toInt, wbAt = 0)
plugins += new SimdAddPlugin(early0) // <- We will implement this plugin
Plugin implementation
Here is a example how this plugin could be implemented :
package vexiiriscv.execute
import spinal.core._
import spinal.lib._
import spinal.lib.pipeline.Stageable
import vexiiriscv.Generate.args
import vexiiriscv.{Global, ParamSimple, VexiiRiscv}
import vexiiriscv.compat.MultiPortWritesSymplifier
import vexiiriscv.riscv.{IntRegFile, RS1, RS2, Riscv}
// This plugin example will add a new instruction named SIMD_ADD which do the following :
//
// RD : Regfile Destination, RS : Regfile Source
// RD( 7 downto  0) = RS1( 7 downto  0) + RS2( 7 downto  0)
// RD(16 downto  8) = RS1(16 downto  8) + RS2(16 downto  8)
// RD(23 downto 16) = RS1(23 downto 16) + RS2(23 downto 16)
// RD(31 downto 24) = RS1(31 downto 24) + RS2(31 downto 24)
//
// Instruction encoding :
// 0000000----------000-----0001011   <- Custom0 func3=0 func7=0
//        |RS2||RS1|   |RD |
//
// Note :  RS1, RS2, RD positions follow the RISC-V spec and are common for all instruction of the ISA
object SimdAddPlugin{
  // Define the instruction type and encoding that we wll use
  val ADD4 = IntRegFile.TypeR(M"0000000----------000-----0001011")
}
// ExecutionUnitElementSimple is a plugin base class which will integrate itself in a execute lane layer
// It provide quite a few utilities to ease the implementation of custom instruction.
// Here we will implement a plugin which provide SIMD add on the register file.
class SimdAddPlugin(val layer : LaneLayer) extends ExecutionUnitElementSimple(layer) {
  // Here we create an elaboration thread. The Logic class is provided by ExecutionUnitElementSimple to provide functionalities
  val logic = during setup new Logic {
    // Here we could have lock the elaboration of some other plugins (ex CSR), but here we don't need any of that
    // as all is already sorted out in the Logic base class.
    // So we just wait for the build phase
    awaitBuild()
    // Let's assume we only support RV32 for now
    assert(Riscv.XLEN.get == 32)
    // Let's get the hardware interface that we will use to provide the result of our custom instruction
    val wb = newWriteback(ifp, 0)
    // Specify that the current plugin will implement the ADD4 instruction
    val add4 = add(SimdAddPlugin.ADD4).spec
    // We need to specify on which stage we start using the register file values
    add4.addRsSpec(RS1, executeAt = 0)
    add4.addRsSpec(RS2, executeAt = 0)
    // Now that we are done specifying everything about the instructions, we can release the Logic.uopRetainer
    // This will allow a few other plugins to continue their elaboration (ex : decoder, dispatcher, ...)
    uopRetainer.release()
    // Let's define some logic in the execute lane [0]
    val process = new el.Execute(id = 0) {
      // Get the RISC-V RS1/RS2 values from the register file
      val rs1 = el(IntRegFile, RS1).asUInt
      val rs2 = el(IntRegFile, RS2).asUInt
      // Do some computation
      val rd = UInt(32 bits)
      rd( 7 downto  0) := rs1( 7 downto  0) + rs2( 7 downto  0)
      rd(16 downto  8) := rs1(16 downto  8) + rs2(16 downto  8)
      rd(23 downto 16) := rs1(23 downto 16) + rs2(23 downto 16)
      rd(31 downto 24) := rs1(31 downto 24) + rs2(31 downto 24)
      // Provide the computation value for the writeback
      wb.valid := SEL
      wb.payload := rd.asBits
    }
  }
}
VexiiRiscv generation
Then, to generate a VexiiRiscv with this new plugin, we could run the following App :
object VexiiSimdAddGen extends App {
  val param = new ParamSimple()
  val sc = SpinalConfig()
  assert(new scopt.OptionParser[Unit]("VexiiRiscv") {
    help("help").text("prints this usage text")
    param.addOptions(this)
  }.parse(args, Unit).nonEmpty)
  sc.addTransformationPhase(new MultiPortWritesSymplifier)
  val report = sc.generateVerilog {
    val pa = param.pluginsArea()
    pa.plugins += new SimdAddPlugin(pa.early0)
    VexiiRiscv(pa.plugins)
  }
}
To run this App, you can go to the NaxRiscv directory and run :
sbt "runMain vexiiriscv.execute.VexiiSimdAddGen"
Software test
Then let's write some assembly test code : (https://github.com/SpinalHDL/NaxSoftware/tree/849679c70b238ceee021bdfd18eb2e9809e7bdd0/baremetal/simdAdd)
.globl _start
_start:
#include "../../driver/riscv_asm.h"
#include "../../driver/sim_asm.h"
#include "../../driver/custom_asm.h"
    // Test 1
    li x1, 0x01234567
    li x2, 0x01FF01FF
    opcode_R(CUSTOM0, 0x0, 0x00, x3, x1, x2) // x3 = ADD4(x1, x2)
    // Print result value
    li x4, PUT_HEX
    sw x3, 0(x4)
    // Check result
    li x5, 0x02224666
    bne x3, x5, fail
    j pass
pass:
    j pass
fail:
    j fail
Compile it with
make clean rv32im
Simulation
You could run a simulation using this testbench :
object VexiiSimdAddSim extends App {
  val param = new ParamSimple()
  val testOpt = new TestOptions()
  val genConfig = SpinalConfig()
  genConfig.includeSimulation
  val simConfig = SpinalSimConfig()
  simConfig.withFstWave
  simConfig.withTestFolder
  simConfig.withConfig(genConfig)
  assert(new scopt.OptionParser[Unit]("VexiiRiscv") {
    help("help").text("prints this usage text")
    testOpt.addOptions(this)
    param.addOptions(this)
  }.parse(args, Unit).nonEmpty)
  println(s"With Vexiiriscv parm :\n - ${param.getName()}")
  val compiled = simConfig.compile {
    val pa = param.pluginsArea()
    pa.plugins += new SimdAddPlugin(pa.early0)
    VexiiRiscv(pa.plugins)
  }
  testOpt.test(compiled)
}
Which can be run with :
sbt "runMain vexiiriscv.execute.VexiiSimdAddSim --load-elf ext/NaxSoftware/baremetal/simdAdd/build/rv32ima/simdAdd.elf --trace-all --no-rvls-check"
Which will output the value 02224666 in the shell and show traces in simWorkspace/VexiiRiscv/test :D
Note that --no-rvls-check is required as spike do not implement that custom simdAdd.
Conclusion
So overall this example didn't introduce how to specify some additional decoding, nor how to define multi-cycle ALU. (TODO). But you can take a look in the IntAluPlugin, ShiftPlugin, DivPlugin, MulPlugin and BranchPlugin which are doing those things using the same ExecutionUnitElementSimple base class.