NaxRiscv is a core currently characterised by :

  • Out of order execution with register renaming

  • Superscalar (e.g. 2 decode, 3 execution units, 2 retire)

  • (RV32/RV64)IMAFDCSU (Linux / Buildroot works on hardware)

  • Portable HDL, but targets FPGAs with distributed RAM (Xilinx 7 series is the reference used so far)

  • Targets a (relatively) low area usage and a high fmax (not the best IPC)

  • Decentralized hardware elaboration (Empty toplevel parametrized with plugins)

  • Frontend implemented around a pipelining framework to ease customisation

  • Non-blocking Data cache with multiple refill and writeback slots

  • BTB + GSHARE + RAS branch predictors

  • Hardware refilled MMU (SV32, SV39)

  • Load-to-use latency of 3 cycles via a speculative cache hit predictor

  • Pipeline visualisation via Verilator simulation and Konata (gem5 file format)

  • JTAG / OpenOCD / GDB support by implementing the RISC-V External Debug Support v0.13.2
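To give an idea of what the GSHARE predictor listed above does, here is a minimal behavioral model in Python. This is an illustrative sketch only, not the NaxRiscv RTL; the table size and counter initialisation are hypothetical choices.

```python
# Behavioral model of a GSHARE branch predictor (illustrative only, not the
# NaxRiscv implementation). A table of 2-bit saturating counters is indexed
# by the PC XORed with a global branch history register.

class Gshare:
    def __init__(self, index_bits=10):
        self.index_bits = index_bits
        self.history = 0                         # global branch history register
        self.table = [1] * (1 << index_bits)     # 2-bit counters, init "weakly not taken"

    def _index(self, pc):
        # XOR the word-aligned PC with the history to spread aliasing branches
        return ((pc >> 2) ^ self.history) & ((1 << self.index_bits) - 1)

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2  # counter MSB set -> predict taken

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.index_bits) - 1)
```

A loop back-edge, for example, starts out predicted not taken and flips to taken once its counter is trained through a stable history pattern.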

Project development and status

  • This project is free and open source

  • It can run upstream buildroot/linux on hardware (ArtyA7-35T / Litex)

  • It started in October 2021 and later received some funding from NLnet.

  • Unfortunately the project has been developed by a single person so far (not by choice); contributions are welcome.

Why an OoO core targeting FPGA

There are a few reasons :

  • Improving single-threaded performance. During the tests made with VexRiscv running Linux, it was clear that even if multiple cores can help, “most” applications aren’t written to take advantage of them.

  • Hiding the memory latency (there isn’t enough on-chip memory for a big L2 cache on FPGA)

  • To experiment with more advanced hardware description paradigms (scala / SpinalHDL)

  • By personal interest

Also, there weren’t many open-source OoO softcores out there in the wild (Marocchino, RSD, OPA, ..). The bet was that it was possible to do better on some metrics, and hopefully be good enough to justify, in some projects, replacing a single-issue in-order softcore by providing better performance (at the cost of area).


NaxRiscv is composed of multiple pipelines :

  • Fetch : Which provides the raw instruction data for a given PC, a first level of branch prediction and the PC memory translation

  • Frontend : Which aligns and decodes the fetched raw data, provides a second level of branch prediction, then allocates and renames the resources required by the instruction

  • Execution units : Which get the resources required by a given instruction (e.g. register file, PC, ..), execute it, write its result back and notify completion.

  • LSU load : Which reads the data cache, translates the memory address, checks for older store dependencies and possibly bypasses a store value.

  • LSU store : Which translates the memory address and checks for younger loads that notified completion while depending on the given store

  • LSU writeback : Which writes committed stores into the data cache
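The older-store check done by the LSU load path can be sketched as a search of the pending store queue. This is a toy Python model under simplifying assumptions (full-word accesses only, no partial overlaps, no speculation); the real hardware does this with CAM-style matching.

```python
# Toy model of store-to-load bypassing (illustrative, not the NaxRiscv RTL):
# a load scans the queue of pending (not yet written back) stores for an
# older store to the same address, and bypasses the youngest matching one.

def load_lookup(store_queue, load_age, load_addr, memory):
    """store_queue: list of (age, addr, data) tuples for pending stores.
    Returns the bypassed store data if an older store matches, else the
    value read from memory (a dict: address -> data)."""
    best = None
    for age, addr, data in store_queue:
        if age < load_age and addr == load_addr:   # older store, same address
            if best is None or age > best[0]:      # keep the youngest such store
                best = (age, data)
    return best[1] if best is not None else memory.get(load_addr, 0)
```

With stores of ages 1 and 3 pending to the same address, a load of age 5 receives the age-3 value, while a load of age 0 ignores both and reads memory.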

It should also be noted that there are a few state machines :

  • LSU Atomic : Which sequentially executes SC/AMO operations once they are next in line for commit.

  • MMU refill : Which walks the page tables to refill the TLB

  • Privileged : Which checks and updates the CSRs required for traps and privilege switches (mret, sret)
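The page-table walk performed by the MMU refill state machine follows the RISC-V privileged specification. Here is a simplified Python model of an SV32 walk (two levels, PTE format per the spec); permission and alignment checks are omitted, and superpages are assumed to be aligned.

```python
# Simplified model of an SV32 page-table walk (illustrative, not the
# NaxRiscv RTL). phys_mem maps physical address -> 32-bit PTE word.

PAGE = 4096

def sv32_walk(root_ppn, vaddr, phys_mem):
    """Returns the translated physical address, or None on a page fault."""
    vpn = [(vaddr >> 22) & 0x3FF, (vaddr >> 12) & 0x3FF]  # VPN[1], VPN[0]
    a = root_ppn * PAGE                                   # current table base
    for level in range(2):
        pte = phys_mem.get(a + vpn[level] * 4)
        if pte is None or (pte & 1) == 0:                 # V bit clear -> fault
            return None
        if pte & 0b1110:                                  # R/W/X set -> leaf PTE
            ppn = pte >> 10
            if level == 0:                                # 4 MiB superpage leaf
                return ppn * PAGE + (vaddr & 0x3FFFFF)    # (assumes alignment)
            return ppn * PAGE + (vaddr & 0xFFF)
        a = (pte >> 10) * PAGE                            # pointer to next level
    return None
```

For example, a root entry pointing to a second-level table whose entry maps to physical page 5 translates virtual 0x80001234 to 0x5234, while an unmapped address returns None (a TLB miss that turns into a page fault).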

Here is a general architectural diagram :


How to use

There are currently two ways to try NaxRiscv :

Hardware description

NaxRiscv was designed using SpinalHDL (a Scala hardware description library). One goal of the implementation was to explore new hardware elaboration paradigms such as :

  • Automatic pipelining framework

  • Distributed hardware elaboration

  • Software paradigms applied to hardware elaboration (ex : software interface)
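The "empty toplevel parametrized with plugins" idea can be illustrated with a toy model. This is plain Python, not SpinalHDL, and every class and service name below is made up for illustration; NaxRiscv's real plugin API differs and is covered in the Abstractions / HDL chapter.

```python
# Toy illustration of decentralized elaboration: the toplevel contains no
# hardcoded logic, only a list of plugins that discover each other through
# a service registry and contribute hardware in a later build phase.

class Toplevel:
    def __init__(self, plugins):
        self.services = {}
        for p in plugins:        # phase 1: every plugin publishes its services
            p.setup(self)
        for p in plugins:        # phase 2: plugins use each other's services
            p.build(self)        #          and elaborate their hardware

class DecoderPlugin:
    def setup(self, top):
        self.instructions = []
        top.services["decoder"] = self   # let other plugins register opcodes

    def add(self, name):
        self.instructions.append(name)

    def build(self, top):
        pass                             # here: generate the decoding logic

class AluPlugin:
    def setup(self, top):
        pass

    def build(self, top):
        top.services["decoder"].add("ADD")  # contribute an instruction

top = Toplevel([DecoderPlugin(), AluPlugin()])
```

Adding a feature then means adding a plugin to the toplevel's parameter list, rather than editing a monolithic core description.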

A few notes about it :

  • You can generate the Verilog or the VHDL from it.

  • A whole chapter (Abstractions / HDL) of the doc is dedicated to exploring the different paradigms used