:sectnums:
== Overview

The NEORV32 RISC-V Processor is an open-source RISC-V compatible processor system that is intended as a
*ready-to-go* auxiliary processor within larger SoC designs or as a stand-alone custom / customizable
microcontroller.

The system is highly configurable and provides optional common peripherals like embedded memories,
timers, serial interfaces, general purpose IO ports and an external bus interface to connect custom IP like
memories, NoCs and other peripherals. On-line and in-system debugging is supported by an OpenOCD/gdb
compatible on-chip debugger accessible via JTAG.

Special focus is placed on **execution safety** to provide defined and predictable behavior at any time.
Therefore, the CPU ensures that all memory accesses are acknowledged and no invalid/malformed instructions
are executed. Whenever an unexpected situation occurs, the application code is informed via hardware exceptions.

The software framework of the processor comes with application makefiles, software libraries for all CPU
and processor features, a bootloader, a runtime environment and several example programs - including a port
of the CoreMark MCU benchmark and the official RISC-V architecture test suite. RISC-V GCC is used as
the default toolchain (https://github.com/stnolting/riscv-gcc-prebuilt[prebuilt toolchains are also provided]).

Check out the processor's **https://stnolting.github.io/neorv32/ug[online User Guide]**
that provides hands-on tutorials to get you started.


**Structure**

[start=2]
. <<_neorv32_processor_soc>>
. <<_neorv32_central_processing_unit_cpu>>
. <<_software_framework>>
. <<_on_chip_debugger_ocd>>
. <<_legal>>


**Annotations**

[WARNING]
Warning

[IMPORTANT]
Important

[NOTE]
Note

[TIP]
Tip


<<<
// ####################################################################################################################

include::rationale.adoc[]



// ####################################################################################################################
:sectnums:
=== Project Key Features

**Project**

* all-in-one package: **CPU** + **SoC** + **Software Framework & Tooling**
* completely described in behavioral, platform-independent VHDL - no vendor- or technology-specific primitives, attributes, macros, libraries, etc. are used at all
* all-Verilog "version" available (auto-generated netlist)
* extensive configuration options for adapting the processor to the requirements of the application
* highly extensible hardware - on CPU, SoC and system level
* aims to be as small as possible while being as RISC-V-compliant as possible - with a reasonable area-vs-performance trade-off
* FPGA friendly (e.g. all internal memories can be mapped to block RAM - including the register file)
* optimized for high clock frequencies to ease timing closure and integration
* from zero to _"hello world!"_ - completely open source and documented
* easy to use even for FPGA/RISC-V starters – intended to _work out of the box_

**NEORV32 CPU (the core)**

* 32-bit RISC-V CPU
* fully compatible with the RISC-V ISA specs. - checked by the https://github.com/stnolting/neorv32-riscof[official RISCOF architecture tests]
* base ISA + privileged ISA + several optional standard and custom ISA extensions
* option to add user-defined RISC-V instructions as custom ISA extension
* rich set of customization options (ISA extensions, design goal: performance / area / energy, tuning options, ...)
* <<_full_virtualization>> capabilities to increase execution safety
* official RISC-V open source architecture ID

**NEORV32 Processor (the SoC)**

* highly-configurable full-scale microcontroller-like processor system
* based on the NEORV32 CPU
* optional standard serial interfaces (UART, TWI, SPI (host and device), 1-Wire)
* optional timers and counters (watchdog, system timer)
* optional general purpose IO and PWM; a native NeoPixel(c)-compatible smart LED interface
* optional embedded memories and caches for data, instructions and bootloader
* optional external memory interface for custom connectivity
* optional execute in-place (XIP) module to execute code directly from an external SPI flash
* optional DMA controller for CPU-independent data transfers
* optional CRC module to check data integrity
* on-chip debugger compatible with OpenOCD and gdb including hardware trigger module

**Software framework**

* GCC-based toolchain - https://github.com/stnolting/riscv-gcc-prebuilt[prebuilt toolchains available]; application compilation based on GNU makefiles
* internal bootloader with serial user interface (via UART)
* core libraries and HAL for high-level usage of the provided functions and peripherals
* processor-specific runtime environment and several example programs
* doxygen-based documentation of the software framework; a deployed version is available at https://stnolting.github.io/neorv32/sw/files.html
* FreeRTOS port + demos available


**Extensibility and Customization**

The NEORV32 processor is designed to ease customization and extensibility and provides several options for adding
application-specific custom hardware modules and accelerators. The three most common options for adding custom
on-chip modules are listed below.

* <<_processor_external_memory_interface_wishbone>> to attach processor-external IP modules
* <<_custom_functions_subsystem_cfs>> for tightly-coupled processor-internal co-processors
* <<_custom_functions_unit_cfu>> for custom RISC-V instructions

[TIP]
A more detailed comparison of the extension/customization options can be found in section
https://stnolting.github.io/neorv32/ug/#_adding_custom_hardware_modules[Adding Custom Hardware Modules]
of the user guide.


<<<
// ####################################################################################################################
:sectnums:
=== Project Folder Structure

...................................
neorv32                - Project home folder
│
├docs                  - Project documentation
│├datasheet            - AsciiDoc sources for the NEORV32 data sheet
│├figures              - Figures and logos
│├references           - Data sheets and RISC-V specs
│├sources              - Sources for the images in 'figures/'
│└userguide            - AsciiDoc sources for the NEORV32 user guide
│
├rtl                   - VHDL sources
│├core                 - Core sources of the CPU & SoC
││└mem                 - SoC-internal memories (default architectures)
│├legacy               - Deprecated/legacy HDL modules
│├processor_templates  - Pre-configured SoC wrappers
│├system_integration   - System wrappers for advanced connectivity
│└test_setups          - Minimal test setup "SoCs" used in the User Guide
│
├sim                   - Simulation files (see User Guide)
│
└sw                    - Software framework
 ├bootloader           - Sources of the processor-internal bootloader
 ├common               - Linker script, crt0.S start-up code and central makefile
 ├example              - Example programs for the core and the SoC modules
 ├lib                  - Processor core library
 │├include             - Header files (*.h)
 │└source              - Source files (*.c)
 ├image_gen            - Helper program to generate NEORV32 executables
 ├ocd_firmware         - Firmware for the on-chip debugger's "park loop"
 ├openocd              - OpenOCD configuration files
 └svd                  - Processor system view description file (CMSIS-SVD)
...................................



<<<
// ####################################################################################################################
:sectnums:
=== VHDL File Hierarchy

All necessary VHDL hardware description files are located in the project's `rtl/core` folder. The top entity
of the entire processor including all the required configuration generics is `neorv32_top.vhd`.

.Compile Order
[IMPORTANT]
Most of the RTL sources use **entity instantiation**. Hence, the RTL compile order might be relevant.
The list below shows the hierarchical compile order starting at the top.

.VHDL Library
[IMPORTANT]
All core VHDL files from the list below have to be assigned to a **new library** named `neorv32`.

...................................
┌neorv32_package.vhd             - Processor/CPU main VHDL package file
├neorv32_clockgate.vhd           - Generic clock gating switch
├neorv32_fifo.vhd                - Generic FIFO component
│
│ ┌neorv32_cpu_cp_bitmanip.vhd   - Bit-manipulation co-processor (B ext.)
│ ├neorv32_cpu_cp_cfu.vhd        - Custom instructions co-processor (Zxcfu ext.)
│ ├neorv32_cpu_cp_cond.vhd       - Integer conditional operations (Zicond ext.)
│ ├neorv32_cpu_cp_fpu.vhd        - Floating-point co-processor (Zfinx ext.)
│ ├neorv32_cpu_cp_shifter.vhd    - Bit-shift co-processor (base ISA)
│ ├neorv32_cpu_cp_muldiv.vhd     - Mul/Div co-processor (M ext.)
│ │
│┌neorv32_cpu_alu.vhd            - Arithmetic/logic unit
│├neorv32_cpu_pmp.vhd            - Physical memory protection unit (Smpmp ext.)
│├neorv32_cpu_lsu.vhd            - Load/store unit
││ ┌neorv32_cpu_decompressor.vhd - Compressed instructions decoder (C ext.)
│├neorv32_cpu_control.vhd        - CPU control, exception system and CSRs
│├neorv32_cpu_regfile.vhd        - Data register file
││
├neorv32_cpu.vhd                 - NEORV32 CPU TOP ENTITY
│
├mem/neorv32_dmem.default.vhd    - *Default* data memory (architecture-only)
├mem/neorv32_imem.default.vhd    - *Default* instruction memory (architecture-only)
│
│┌neorv32_bootloader_image.vhd   - Bootloader ROM memory image
├neorv32_boot_rom.vhd            - Bootloader ROM
│
│┌neorv32_application_image.vhd  - IMEM application initialization image
├neorv32_imem.entity.vhd         - Processor-internal instruction memory (entity-only!)
│
├neorv32_cfs.vhd                 - Custom functions subsystem
├neorv32_crc.vhd                 - Cyclic redundancy check unit
├neorv32_dcache.vhd              - Processor-internal data cache
├neorv32_debug_dm.vhd            - on-chip debugger: debug module
├neorv32_debug_dtm.vhd           - on-chip debugger: debug transfer module
├neorv32_dma.vhd                 - Direct memory access controller
├neorv32_dmem.entity.vhd         - Processor-internal data memory (entity-only!)
├neorv32_gpio.vhd                - General purpose input/output port unit
├neorv32_gptmr.vhd               - General purpose 32-bit timer
├neorv32_icache.vhd              - Processor-internal instruction cache
├neorv32_intercon.vhd            - SoC bus infrastructure
├neorv32_mtime.vhd               - Machine system timer
├neorv32_neoled.vhd              - NeoPixel (TM) compatible smart LED interface
├neorv32_onewire.vhd             - One-Wire serial interface controller
├neorv32_pwm.vhd                 - Pulse-width modulation controller
├neorv32_sdi.vhd                 - Serial data interface controller (SPI device)
├neorv32_slink.vhd               - Stream link interface
├neorv32_spi.vhd                 - Serial peripheral interface controller (SPI host)
├neorv32_sysinfo.vhd             - System configuration information memory
├neorv32_trng.vhd                - True random number generator
├neorv32_twi.vhd                 - Two wire serial interface controller
├neorv32_uart.vhd                - Universal async. receiver/transmitter
├neorv32_wdt.vhd                 - Watchdog timer
├neorv32_wishbone.vhd            - External (Wishbone) bus interface
├neorv32_xip.vhd                 - Execute in place module
├neorv32_xirq.vhd                - External interrupt controller
│
neorv32_top.vhd                  - NEORV32 PROCESSOR TOP ENTITY
...................................

[NOTE]
The processor-internal instruction and data memories (IMEM and DMEM) are split into two design files each:
a plain entity definition (`neorv32_*mem.entity.vhd`) and the actual architecture definition
(`mem/neorv32_*mem.default.vhd`). The `*.default.vhd` architecture definitions from `rtl/core/mem` provide a _generic_ and
_platform independent_ memory design (inferring embedded memory blocks). You can replace/modify the architecture
source file in order to use platform-specific features (like advanced memory resources) or to improve technology mapping
and/or timing.


<<<
// ####################################################################################################################
:sectnums:
=== FPGA Implementation Results

This section shows **exemplary** FPGA implementation results for the NEORV32 CPU and NEORV32 Processor modules.

[IMPORTANT]
The results are generated by manual synthesis runs. Hence, they might not represent the latest version of the processor.

[discrete]
==== CPU

[cols="<2,<8"]
[grid="topbot"]
|=======================
| HW version: | `1.7.8.5`
| Top entity: | `rtl/core/neorv32_cpu.vhd`
| FPGA: | Intel Cyclone IV E `EP4CE22F17C6`
| Toolchain: | Quartus Prime Lite 21.1
| Constraints: | **no timing constraints**, "balanced optimization", f~max~ from "_Slow 1200mV 0C Model_"
|=======================

[cols="<6,>1,>1,>1,>1,>1"]
[options="header",grid="rows"]
|=======================
| CPU ISA Configuration | LEs | FFs | MEM bits | DSPs | _f~max~_
| `rv32i_Zicsr` | 1223 | 607 | 1024 | 0 | 130 MHz
| `rv32i_Zicsr_Zicntr` | 1578 | 773 | 1024 | 0 | 130 MHz
| `rv32im_Zicsr_Zicntr` | 2087 | 983 | 1024 | 0 | 130 MHz
| `rv32imc_Zicsr_Zicntr` | 2338 | 992 | 1024 | 0 | 130 MHz
| `rv32imcb_Zicsr_Zicntr` | 3175 | 1247 | 1024 | 0 | 130 MHz
| `rv32imcbu_Zicsr_Zicntr` | 3186 | 1254 | 1024 | 0 | 130 MHz
| `rv32imcbu_Zicsr_Zicntr_Zifencei` | 3187 | 1254 | 1024 | 0 | 130 MHz
| `rv32imcbu_Zicsr_Zicntr_Zifencei_Zfinx` | 4450 | 1906 | 1024 | 7 | 123 MHz
| `rv32imcbu_Zicsr_Zicntr_Zifencei_Zfinx_DebugMode` | 4825 | 2018 | 1024 | 7 | 123 MHz
|=======================

.Goal-Driven Optimization
[TIP]
The CPU provides further options to reduce the area footprint or to increase performance.
See section <<_processor_top_entity_generics>> for more information. Also, take a look at the User Guide section
https://stnolting.github.io/neorv32/ug/#_application_specific_processor_configuration[Application-Specific Processor Configuration].


[discrete]
==== Processor - Modules

[cols="<2,<8"]
[grid="topbot"]
|=======================
| HW version: | `1.8.6.7`
| Top entity: | `rtl/core/neorv32_top.vhd`
| FPGA: | Intel Cyclone IV E `EP4CE22F17C6`
| Toolchain: | Quartus Prime Lite 21.1
| Constraints: | **no timing constraints**, "balanced optimization"
|=======================

.Hardware utilization by processor module
[cols="<2,<8,>1,>1,>2,>1"]
[options="header",grid="rows"]
|=======================
| Module | Description | LEs | FFs | MEM bits | DSPs
| BOOT ROM | Bootloader ROM (4kB) | 2 | 2 | 32768 | 0
| Bus switch (core) | _SoC bus infrastructure_ | 28 | 15 | 0 | 0
| Bus switch (DMA) | _SoC bus infrastructure_ | 159 | 9 | 0 | 0
| CFS | Custom functions subsystem footnote:[Resource utilization depends on custom design logic.] | - | - | - | -
| CRC | Cyclic redundancy check unit | 130 | 117 | 0 | 0
| dCACHE | Data cache (4 blocks, 64 bytes per block) | 300 | 167 | 2112 | 0
| DM | On-chip debugger - debug module | 377 | 241 | 0 | 0
| DTM | On-chip debugger - debug transfer module (JTAG) | 262 | 220 | 0 | 0
| DMA | Direct memory access controller | 365 | 291 | 0 | 0
| DMEM | Processor-internal data memory (8kB) | 6 | 2 | 65536 | 0
| Gateway | _SoC bus infrastructure_ | 215 | 91 | 0 | 0
| GPIO | General purpose input/output ports | 102 | 98 | 0 | 0
| GPTMR | General purpose timer | 150 | 105 | 0 | 0
| IO Switch | _SoC bus infrastructure_ | 217 | 0 | 0 | 0
| iCACHE | Instruction cache (2x4 blocks, 64 bytes per block) | 458 | 296 | 4096 | 0
| IMEM | Processor-internal instruction memory (16kB) | 7 | 2 | 131072 | 0
| MTIME | Machine system timer | 307 | 166 | 0 | 0
| NEOLED | Smart LED interface (NeoPixel/WS2812) (FIFO_depth=1) | 171 | 129 | 0 | 0
| ONEWIRE | 1-Wire interface | 105 | 77 | 0 | 0
| PWM | Pulse-width modulation controller (4 channels) | 91 | 81 | 0 | 0
| Reservation Set | Reservation set controller for LR/SC instructions | 52 | 33 | 0 | 0
| SDI | Serial data interface | 103 | 77 | 512 | 0
| SLINK | Stream link interface (RX/TX FIFO depth=32) | 96 | 73 | 2048 | 0
| SPI | Serial peripheral interface | 137 | 97 | 1024 | 0
| SYSINFO | System configuration information memory | 11 | 11 | 0 | 0
| TRNG | True random number generator | 140 | 108 | 512 | 0
| TWI | Two-wire interface | 93 | 64 | 0 | 0
| UART0, UART1 | Universal asynchronous receiver/transmitter 0/1 (FIFO_depth=1) | 222 | 142 | 1024 | 0
| WDT | Watchdog timer | 107 | 89 | 0 | 0
| WISHBONE | External memory interface | 122 | 112 | 0 | 0
| XIP | Execute in place module | 369 | 276 | 0 | 0
| XIRQ | External interrupt controller (4 channels) | 35 | 29 | 0 | 0
|=======================


<<<
// ####################################################################################################################
:sectnums:
=== CPU Performance

The performance of the NEORV32 was tested and evaluated using the https://www.eembc.org/coremark/[CoreMark CPU benchmark].
The corresponding sources can be found in the `sw/example/coremark` folder.
The CoreMark score is the number of CoreMark iterations per second; dividing it by the clock frequency in MHz yields the normalized CoreMarks/MHz result.

.Configuration
[cols="<2,<8"]
[grid="topbot"]
|=======================
| HW version: | `1.5.7.10`
| Hardware: | 32kB int. IMEM, 16kB int. DMEM, no caches, 100MHz clock
| CoreMark: | 2000 iterations, MEM_METHOD is MEM_STACK
| Compiler: | RISCV32-GCC 10.2.0 (compiled with `march=rv32i mabi=ilp32`)
| Compiler flags: | default but with `-O3`, see makefile
|=======================

.CoreMark results
[cols="<5,^1,^1,^1"]
[options="header",grid="rows"]
|=======================
| CPU | CoreMark Score | CoreMarks/MHz | Average CPI
| _small_ (`rv32i_Zicsr_Zifencei`) | 33.89 | **0.3389** | **4.04**
| _medium_ (`rv32imc_Zicsr_Zifencei`) | 62.50 | **0.6250** | **5.34**
| _performance_ (`rv32imc_Zicsr_Zifencei` + perf. options) | 95.23 | **0.9523** | **3.54**
|=======================

The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of
several consecutive micro operations. The average CPI (cycles per instruction) depends on the instruction
mix of a specific application and also on the available CPU extensions. More information regarding the execution
time of each implemented instruction can be found in section <<_instruction_sets_and_extensions>>.