:sectnums:
== Overview

The NEORV32 RISC-V Processor is an open-source RISC-V compatible processor system that is intended as a
*ready-to-go* auxiliary processor within larger SoC designs or as a stand-alone custom / customizable
microcontroller.

The system is highly configurable and provides optional common peripherals like embedded memories,
timers, serial interfaces, general purpose IO ports and an external bus interface to connect custom IP like
memories, NoCs and other peripherals. On-line and in-system debugging is supported by an OpenOCD/gdb
compatible on-chip debugger accessible via JTAG.

Special focus is placed on **execution safety** to provide defined and predictable behavior at any time.
Therefore, the CPU ensures that all memory accesses are acknowledged and no invalid/malformed instructions
are executed. Whenever an unexpected situation occurs, the application code is informed via hardware exceptions.

The software framework of the processor comes with application makefiles, software libraries for all CPU
and processor features, a bootloader, a runtime environment and several example programs - including a port
of the CoreMark MCU benchmark and the official RISC-V architecture test suite. RISC-V GCC is used as the
default toolchain (https://github.com/stnolting/riscv-gcc-prebuilt[prebuilt toolchains are also provided]).

Check out the processor's **https://stnolting.github.io/neorv32/ug[online User Guide]**
that provides hands-on tutorials to get you started.


**Structure**

[start=2]
. <<_neorv32_processor_soc>>
. <<_neorv32_central_processing_unit_cpu>>
. <<_software_framework>>
. <<_on_chip_debugger_ocd>>
. <<_legal>>


**Annotations**

[WARNING]
Warning

[IMPORTANT]
Important

[NOTE]
Note

[TIP]
Tip


<<<
// ####################################################################################################################

include::rationale.adoc[]



// ####################################################################################################################
:sectnums:
=== Project Key Features

**Project**

* all-in-one package: **CPU** + **SoC** + **Software Framework & Tooling**
* completely described in behavioral, platform-independent VHDL - no vendor- or technology-specific primitives, attributes, macros, libraries, etc. are used at all
* all-Verilog "version" available (auto-generated netlist)
* extensive configuration options for adapting the processor to the requirements of the application
* highly extensible hardware - on CPU, SoC and system level
* aims to be as small as possible while being as RISC-V-compliant as possible - with a reasonable area-vs-performance trade-off
* FPGA friendly (e.g. all internal memories can be mapped to block RAM - including the register file)
* optimized for high clock frequencies to ease timing closure and integration
* from zero to _"hello world!"_ - completely open source and documented
* easy to use even for FPGA/RISC-V starters - intended to _work out of the box_

**NEORV32 CPU (the core)**

* 32-bit RISC-V CPU
* fully compatible with the RISC-V ISA specs. - checked by the https://github.com/stnolting/neorv32-riscof[official RISCOF architecture tests]
* base ISA + privileged ISA + several optional standard and custom ISA extensions
* option to add user-defined RISC-V instructions as custom ISA extension
* rich set of customization options (ISA extensions, design goal: performance / area / energy, tuning options, ...)
* <<_full_virtualization>> capabilities to increase execution safety
* official RISC-V open source architecture ID

**NEORV32 Processor (the SoC)**

* highly configurable full-scale microcontroller-like processor system
* based on the NEORV32 CPU
* optional standard serial interfaces (UART, TWI, SPI (host and device), 1-Wire)
* optional timers and counters (watchdog, system timer)
* optional general purpose IO and PWM; a native NeoPixel(c)-compatible smart LED interface
* optional embedded memories and caches for data, instructions and bootloader
* optional external memory interface for custom connectivity
* optional execute in-place (XIP) module to execute code directly from an external SPI flash
* optional DMA controller for CPU-independent data transfers
* optional CRC module to check data integrity
* on-chip debugger compatible with OpenOCD and gdb including hardware trigger module

**Software framework**

* GCC-based toolchain - https://github.com/stnolting/riscv-gcc-prebuilt[prebuilt toolchains available]; application compilation based on GNU makefiles
* internal bootloader with serial user interface (via UART)
* core libraries and HAL for high-level usage of the provided functions and peripherals
* processor-specific runtime environment and several example programs
* doxygen-based documentation of the software framework; a deployed version is available at https://stnolting.github.io/neorv32/sw/files.html
* FreeRTOS port + demos available


**Extensibility and Customization**

The NEORV32 processor is designed to ease customization and extensibility and provides several options for adding
application-specific custom hardware modules and accelerators. The three most common options for adding custom
on-chip modules are listed below.

* <<_processor_external_memory_interface_wishbone>> to attach processor-external IP modules
* <<_custom_functions_subsystem_cfs>> for tightly-coupled processor-internal co-processors
* <<_custom_functions_unit_cfu>> for custom RISC-V instructions

[TIP]
A more detailed comparison of the extension/customization options can be found in section
https://stnolting.github.io/neorv32/ug/#_adding_custom_hardware_modules[Adding Custom Hardware Modules]
of the user guide.


<<<
// ####################################################################################################################
:sectnums:
=== Project Folder Structure

...................................
neorv32 - Project home folder
│
├docs - Project documentation
│├datasheet - AsciiDoc sources for the NEORV32 data sheet
│├figures - Figures and logos
│├references - Data sheets and RISC-V specs
│├sources - Sources for the images in 'figures/'
│└userguide - AsciiDoc sources for the NEORV32 user guide
│
├rtl - VHDL sources
│├core - Core sources of the CPU & SoC
││└mem - SoC-internal memories (default architectures)
│├legacy - Deprecated/legacy HDL modules
│├processor_templates - Pre-configured SoC wrappers
│├system_integration - System wrappers for advanced connectivity
│└test_setups - Minimal test setup "SoCs" used in the User Guide
│
├sim - Simulation files (see User Guide)
│
└sw - Software framework
 ├bootloader - Sources of the processor-internal bootloader
 ├common - Linker script, crt0.S start-up code and central makefile
 ├example - Example programs for the core and the SoC modules
 ├lib - Processor core library
 │├include - Header files (*.h)
 │└source - Source files (*.c)
 ├image_gen - Helper program to generate NEORV32 executables
 ├ocd_firmware - Firmware for the on-chip debugger's "park loop"
 ├openocd - OpenOCD configuration files
 └svd - Processor system view description file (CMSIS-SVD)
...................................



<<<
// ####################################################################################################################
:sectnums:
=== VHDL File Hierarchy

All necessary VHDL hardware description files are located in the project's `rtl/core` folder. The top entity
of the entire processor including all the required configuration generics is `neorv32_top.vhd`.

.Compile Order
[IMPORTANT]
Most of the RTL sources use **entity instantiation**. Hence, the RTL compile order might be relevant.
The list below shows the hierarchical compile order starting at the top.

.VHDL Library
[IMPORTANT]
All core VHDL files from the list below have to be assigned to a **new library** named `neorv32`.

...................................
┌neorv32_package.vhd - Processor/CPU main VHDL package file
├neorv32_clockgate.vhd - Generic clock gating switch
├neorv32_fifo.vhd - Generic FIFO component
│
│ ┌neorv32_cpu_cp_bitmanip.vhd - Bit-manipulation co-processor (B ext.)
│ ├neorv32_cpu_cp_cfu.vhd - Custom instructions co-processor (Zxcfu ext.)
│ ├neorv32_cpu_cp_cond.vhd - Integer conditional operations (Zicond ext.)
│ ├neorv32_cpu_cp_fpu.vhd - Floating-point co-processor (Zfinx ext.)
│ ├neorv32_cpu_cp_shifter.vhd - Bit-shift co-processor (base ISA)
│ ├neorv32_cpu_cp_muldiv.vhd - Mul/Div co-processor (M ext.)
│ │
│┌neorv32_cpu_alu.vhd - Arithmetic/logic unit
│├neorv32_cpu_pmp.vhd - Physical memory protection unit (Smpmp ext.)
│├neorv32_cpu_lsu.vhd - Load/store unit
││ ┌neorv32_cpu_decompressor.vhd - Compressed instructions decoder (C ext.)
│├neorv32_cpu_control.vhd - CPU control, exception system and CSRs
│├neorv32_cpu_regfile.vhd - Data register file
││
├neorv32_cpu.vhd - NEORV32 CPU TOP ENTITY
│
├mem/neorv32_dmem.default.vhd - *Default* data memory (architecture-only)
├mem/neorv32_imem.default.vhd - *Default* instruction memory (architecture-only)
│
│┌neorv32_bootloader_image.vhd - Bootloader ROM memory image
├neorv32_boot_rom.vhd - Bootloader ROM
│
│┌neorv32_application_image.vhd - IMEM application initialization image
├neorv32_imem.entity.vhd - Processor-internal instruction memory (entity-only!)
│
├neorv32_cfs.vhd - Custom functions subsystem
├neorv32_crc.vhd - Cyclic redundancy check unit
├neorv32_dcache.vhd - Processor-internal data cache
├neorv32_debug_dm.vhd - on-chip debugger: debug module
├neorv32_debug_dtm.vhd - on-chip debugger: debug transfer module
├neorv32_dma.vhd - Direct memory access controller
├neorv32_dmem.entity.vhd - Processor-internal data memory (entity-only!)
├neorv32_gpio.vhd - General purpose input/output port unit
├neorv32_gptmr.vhd - General purpose 32-bit timer
├neorv32_icache.vhd - Processor-internal instruction cache
├neorv32_intercon.vhd - SoC bus infrastructure
├neorv32_mtime.vhd - Machine system timer
├neorv32_neoled.vhd - NeoPixel (TM) compatible smart LED interface
├neorv32_onewire.vhd - One-Wire serial interface controller
├neorv32_pwm.vhd - Pulse-width modulation controller
├neorv32_sdi.vhd - Serial data interface controller (SPI device)
├neorv32_slink.vhd - Stream link interface
├neorv32_spi.vhd - Serial peripheral interface controller (SPI host)
├neorv32_sysinfo.vhd - System configuration information memory
├neorv32_trng.vhd - True random number generator
├neorv32_twi.vhd - Two wire serial interface controller
├neorv32_uart.vhd - Universal async. receiver/transmitter
├neorv32_wdt.vhd - Watchdog timer
├neorv32_wishbone.vhd - External (Wishbone) bus interface
├neorv32_xip.vhd - Execute in place module
├neorv32_xirq.vhd - External interrupt controller
│
neorv32_top.vhd - NEORV32 PROCESSOR TOP ENTITY
...................................

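The library requirement above can be sketched with a small Python helper that prints one analysis command per source file, in the listed order, assigning every file to the `neorv32` library. This is an illustration only: GHDL is assumed as the simulator (any VHDL tool with library support works similarly), the names `RTL_CORE`, `COMPILE_ORDER` and `ghdl_analyze_commands` are hypothetical, and the file list is deliberately shortened to three entries from the hierarchy.

```python
import shlex

# Assumption: the sources live in rtl/core, as stated in this section.
RTL_CORE = "rtl/core"

# Abbreviated for illustration only. A real list contains every file from
# the hierarchy above, in the same top-to-bottom order.
COMPILE_ORDER = [
    "neorv32_package.vhd",
    "neorv32_cpu.vhd",
    "neorv32_top.vhd",
]


def ghdl_analyze_commands(files, library="neorv32", rtl_dir=RTL_CORE):
    """Return one 'ghdl -a' command line per file, preserving the order."""
    commands = []
    for name in files:
        # -a analyzes a design file, --work selects the target library.
        argv = ["ghdl", "-a", f"--work={library}", f"{rtl_dir}/{name}"]
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    for line in ghdl_analyze_commands(COMPILE_ORDER):
        print(line)
```

Keeping the order in a plain list makes it easy to mirror the hierarchy shown above and to reuse the same list for other tools.
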
[NOTE]
The processor-internal instruction and data memories (IMEM and DMEM) are split into two design files each:
a plain entity definition (`neorv32_*mem.entity.vhd`) and the actual architecture definition
(`mem/neorv32_*mem.default.vhd`). The `*.default.vhd` architecture definitions from `rtl/core/mem` provide a _generic_ and
_platform independent_ memory design (inferring embedded memory blocks). You can replace/modify the architecture
source file in order to use platform-specific features (like advanced memory resources) or to improve technology mapping
and/or timing.


<<<
// ####################################################################################################################
:sectnums:
=== FPGA Implementation Results

This section shows **exemplary** FPGA implementation results for the NEORV32 CPU and NEORV32 Processor modules.

[IMPORTANT]
The results are generated by manual synthesis runs. Hence, they might not represent the latest version of the processor.

[discrete]
==== CPU

[cols="<2,<8"]
[grid="topbot"]
|=======================
| HW version: | `1.7.8.5`
| Top entity: | `rtl/core/neorv32_cpu.vhd`
| FPGA: | Intel Cyclone IV E `EP4CE22F17C6`
| Toolchain: | Quartus Prime Lite 21.1
| Constraints: | **no timing constraints**, "balanced optimization", f~max~ from "_Slow 1200mV 0C Model_"
|=======================

[cols="<6,>1,>1,>1,>1,>1"]
[options="header",grid="rows"]
|=======================
| CPU ISA Configuration | LEs | FFs | MEM bits | DSPs | _f~max~_
| `rv32i_Zicsr` | 1223 | 607 | 1024 | 0 | 130 MHz
| `rv32i_Zicsr_Zicntr` | 1578 | 773 | 1024 | 0 | 130 MHz
| `rv32im_Zicsr_Zicntr` | 2087 | 983 | 1024 | 0 | 130 MHz
| `rv32imc_Zicsr_Zicntr` | 2338 | 992 | 1024 | 0 | 130 MHz
| `rv32imcb_Zicsr_Zicntr` | 3175 | 1247 | 1024 | 0 | 130 MHz
| `rv32imcbu_Zicsr_Zicntr` | 3186 | 1254 | 1024 | 0 | 130 MHz
| `rv32imcbu_Zicsr_Zicntr_Zifencei` | 3187 | 1254 | 1024 | 0 | 130 MHz
| `rv32imcbu_Zicsr_Zicntr_Zifencei_Zfinx` | 4450 | 1906 | 1024 | 7 | 123 MHz
| `rv32imcbu_Zicsr_Zicntr_Zifencei_Zfinx_DebugMode` | 4825 | 2018 | 1024 | 7 | 123 MHz
|=======================

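Because each row of the CPU table adds one extension to the previous configuration, the difference between consecutive LE counts gives a rough estimate of what that extension costs in this particular synthesis run. The Python sketch below computes those differences from the table values. The names `CPU_RESULTS` and `le_increments` are hypothetical, and the increments should be read as indicative only, since synthesis optimizes across module boundaries.

```python
# (ISA configuration, logic elements) copied from the CPU table above.
CPU_RESULTS = [
    ("rv32i_Zicsr", 1223),
    ("rv32i_Zicsr_Zicntr", 1578),
    ("rv32im_Zicsr_Zicntr", 2087),
    ("rv32imc_Zicsr_Zicntr", 2338),
    ("rv32imcb_Zicsr_Zicntr", 3175),
    ("rv32imcbu_Zicsr_Zicntr", 3186),
    ("rv32imcbu_Zicsr_Zicntr_Zifencei", 3187),
    ("rv32imcbu_Zicsr_Zicntr_Zifencei_Zfinx", 4450),
    ("rv32imcbu_Zicsr_Zicntr_Zifencei_Zfinx_DebugMode", 4825),
]


def le_increments(results):
    """Return (configuration, additional LEs vs. the previous row) pairs."""
    return [
        (name, les - prev_les)
        for (_, prev_les), (name, les) in zip(results, results[1:])
    ]


if __name__ == "__main__":
    for name, delta in le_increments(CPU_RESULTS):
        print(f"{name}: +{delta} LEs")
```

In this data set the `Zfinx` floating-point unit is the largest single step (+1263 LEs), while user mode (`u`) and `Zifencei` add only a handful of logic elements.
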
.Goal-Driven Optimization
[TIP]
The CPU provides further options to reduce the area footprint or to increase performance.
See section <<_processor_top_entity_generics>> for more information. Also, take a look at the User Guide section
https://stnolting.github.io/neorv32/ug/#_application_specific_processor_configuration[Application-Specific Processor Configuration].


[discrete]
==== Processor - Modules

[cols="<2,<8"]
[grid="topbot"]
|=======================
| HW version: | `1.8.6.7`
| Top entity: | `rtl/core/neorv32_top.vhd`
| FPGA: | Intel Cyclone IV E `EP4CE22F17C6`
| Toolchain: | Quartus Prime Lite 21.1
| Constraints: | **no timing constraints**, "balanced optimization"
|=======================

.Hardware utilization by processor module
[cols="<2,<8,>1,>1,>2,>1"]
[options="header",grid="rows"]
|=======================
| Module | Description | LEs | FFs | MEM bits | DSPs
| BOOT ROM | Bootloader ROM (4kB) | 2 | 2 | 32768 | 0
| Bus switch (core) | _SoC bus infrastructure_ | 28 | 15 | 0 | 0
| Bus switch (DMA) | _SoC bus infrastructure_ | 159 | 9 | 0 | 0
| CFS | Custom functions subsystem footnote:[Resource utilization depends on custom design logic.] | - | - | - | -
| CRC | Cyclic redundancy check unit | 130 | 117 | 0 | 0
| dCACHE | Data cache (4 blocks, 64 bytes per block) | 300 | 167 | 2112 | 0
| DM | On-chip debugger - debug module | 377 | 241 | 0 | 0
| DTM | On-chip debugger - debug transfer module (JTAG) | 262 | 220 | 0 | 0
| DMA | Direct memory access controller | 365 | 291 | 0 | 0
| DMEM | Processor-internal data memory (8kB) | 6 | 2 | 65536 | 0
| Gateway | _SoC bus infrastructure_ | 215 | 91 | 0 | 0
| GPIO | General purpose input/output ports | 102 | 98 | 0 | 0
| GPTMR | General purpose timer | 150 | 105 | 0 | 0
| IO Switch | _SoC bus infrastructure_ | 217 | 0 | 0 | 0
| iCACHE | Instruction cache (2x4 blocks, 64 bytes per block) | 458 | 296 | 4096 | 0
| IMEM | Processor-internal instruction memory (16kB) | 7 | 2 | 131072 | 0
| MTIME | Machine system timer | 307 | 166 | 0 | 0
| NEOLED | Smart LED interface (NeoPixel/WS2812) (FIFO_depth=1) | 171 | 129 | 0 | 0
| ONEWIRE | 1-Wire interface | 105 | 77 | 0 | 0
| PWM | Pulse-width modulation controller (4 channels) | 91 | 81 | 0 | 0
| Reservation Set | Reservation set controller for LR/SC instructions | 52 | 33 | 0 | 0
| SDI | Serial data interface | 103 | 77 | 512 | 0
| SLINK | Stream link interface (RX/TX FIFO depth=32) | 96 | 73 | 2048 | 0
| SPI | Serial peripheral interface | 137 | 97 | 1024 | 0
| SYSINFO | System configuration information memory | 11 | 11 | 0 | 0
| TRNG | True random number generator | 140 | 108 | 512 | 0
| TWI | Two-wire interface | 93 | 64 | 0 | 0
| UART0, UART1 | Universal asynchronous receiver/transmitter 0/1 (FIFO_depth=1) | 222 | 142 | 1024 | 0
| WDT | Watchdog timer | 107 | 89 | 0 | 0
| WISHBONE | External memory interface | 122 | 112 | 0 | 0
| XIP | Execute in place module | 369 | 276 | 0 | 0
| XIRQ | External interrupt controller (4 channels) | 35 | 29 | 0 | 0
|=======================


<<<
// ####################################################################################################################
:sectnums:
=== CPU Performance

The performance of the NEORV32 was tested and evaluated using the https://www.eembc.org/coremark/[CoreMark CPU benchmark].
The corresponding sources can be found in the `sw/example/coremark` folder.
The resulting normalized CoreMark score (CoreMarks/MHz) is defined as CoreMark iterations per second per MHz.

.Configuration
[cols="<2,<8"]
[grid="topbot"]
|=======================
| HW version: | `1.5.7.10`
| Hardware: | 32kB int. IMEM, 16kB int. DMEM, no caches, 100MHz clock
| CoreMark: | 2000 iterations, MEM_METHOD is MEM_STACK
| Compiler: | RISCV32-GCC 10.2.0 (compiled with `march=rv32i mabi=ilp32`)
| Compiler flags: | default but with `-O3`, see makefile
|=======================

.CoreMark results
[cols="<5,^1,^1,^1"]
[options="header",grid="rows"]
|=======================
| CPU | CoreMark Score | CoreMarks/MHz | Average CPI
| _small_ (`rv32i_Zicsr_Zifencei`) | 33.89 | **0.3389** | **4.04**
| _medium_ (`rv32imc_Zicsr_Zifencei`) | 62.50 | **0.6250** | **5.34**
| _performance_ (`rv32imc_Zicsr_Zifencei` + perf. options) | 95.23 | **0.9523** | **3.54**
|=======================

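The CoreMarks/MHz column follows directly from the CoreMark score and the 100MHz clock given in the configuration table. The short Python sketch below reproduces that normalization. The function name `coremarks_per_mhz` and the dictionary keys are hypothetical, while the numbers are taken from the tables above.

```python
import math


def coremarks_per_mhz(iterations_per_second, clock_hz):
    """Normalize a CoreMark score (iterations/s) to the clock frequency in MHz."""
    return iterations_per_second / (clock_hz / 1_000_000)


CLOCK_HZ = 100_000_000  # 100MHz clock from the configuration table

# CoreMark scores (iterations per second) from the results table.
SCORES = {"small": 33.89, "medium": 62.50, "performance": 95.23}

NORMALIZED = {
    name: coremarks_per_mhz(score, CLOCK_HZ) for name, score in SCORES.items()
}

if __name__ == "__main__":
    for name, value in NORMALIZED.items():
        print(f"{name}: {value:.4f} CoreMarks/MHz")
    # Sanity check against the table value of the small configuration.
    assert math.isclose(NORMALIZED["small"], 0.3389)
```

Normalizing to the clock frequency makes results comparable between setups that run at different clock speeds.
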
The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of
several consecutive micro operations. The average CPI (cycles per instruction) depends on the instruction
mix of a specific application and also on the available CPU extensions. More information regarding the execution
time of each implemented instruction can be found in section <<_instruction_sets_and_extensions>>.
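
To show how the instruction mix drives the average CPI, the following Python sketch computes a weighted average over instruction classes. All classes, counts and cycle numbers in `EXAMPLE_MIX` are made up for illustration and are not NEORV32 figures, and `average_cpi` is a hypothetical helper name.

```python
def average_cpi(instruction_mix):
    """Weighted average CPI for a mix of {class: (count, cycles_per_instruction)}."""
    total_instructions = sum(count for count, _ in instruction_mix.values())
    if total_instructions == 0:
        raise ValueError("instruction mix must contain at least one instruction")
    total_cycles = sum(count * cycles for count, cycles in instruction_mix.values())
    return total_cycles / total_instructions


# Hypothetical mix: instruction counts and per-class cycle costs are assumptions.
EXAMPLE_MIX = {
    "alu": (600, 2),
    "load_store": (250, 5),
    "branch": (150, 6),
}

if __name__ == "__main__":
    # (600*2 + 250*5 + 150*6) cycles / 1000 instructions
    print(f"average CPI: {average_cpi(EXAMPLE_MIX):.2f}")
```

Shifting the mix toward the more expensive classes raises the average, which is why the same CPU can show different CPI values for different applications.
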