:sectnums:
|
|
== NEORV32 Central Processing Unit (CPU)
|
|
|
|
The NEORV32 CPU is an area-optimized RISC-V core implementing the `rv32i_zicsr_zifencei` base (privileged) ISA and
|
|
supporting several additional/optional ISA extensions. The CPU's micro architecture is based on a von-Neumann
|
|
machine built upon a mixture of multi-cycle and pipelined execution schemes.
|
|
|
|
[NOTE]
|
|
This chapter assumes that the reader is familiar with the official
|
|
RISC-V _User_ and _Privileged Architecture_ specifications.
|
|
|
|
**Section Structure**
|
|
|
|
* <<_risc_v_compatibility>>
|
|
* <<_cpu_top_entity_signals>> and <<_cpu_top_entity_generics>>
|
|
* <<_architecture>> and <<_full_virtualization>>
|
|
* <<_instruction_sets_and_extensions>> and <<_custom_functions_unit_cfu>>
|
|
* <<_control_and_status_registers_csrs>>
|
|
* <<_traps_exceptions_and_interrupts>>
|
|
* <<_bus_interface>>
|
|
|
|
|
|
// ####################################################################################################################
|
|
:sectnums:
|
|
=== RISC-V Compatibility
|
|
|
|
The NEORV32 CPU passes the tests of the **official RISCOF RISC-V Architecture Test Framework**. This framework is used to check
|
|
RISC-V implementations for compatibility with the official RISC-V user/privileged ISA specifications. The NEORV32 port of this
|
|
test framework is available in a separate repository on GitHub: https://github.com/stnolting/neorv32-riscof
|
|
|
|
.Unsupported ISA Extensions
|
|
[TIP]
|
|
Executing instructions or accessing CSRs from yet unsupported ISA extensions will raise an illegal
|
|
instruction exception (see section <<_full_virtualization>>).
|
|
|
|
|
|
**Incompatibility Issues and Limitations**
|
|
|
|
.`time[h]` CSRs (Wall Clock Time)
|
|
[IMPORTANT]
|
|
The NEORV32 does not implement the `time[h]` registers. Any access to these registers will trap. It is
|
|
recommended that the trap handler software provides a means of accessing the platform-defined <<_machine_system_timer_mtime>>.
|
|
|
|
.No Hardware Support of Misaligned Memory Accesses
|
|
[IMPORTANT]
|
|
The CPU does not resolve unaligned memory accesses in hardware (this is not a
RISC-V incompatibility, but it is important to know). Any kind of unaligned memory access
will raise an exception. Such accesses can be **emulated** in software by the application,
for example using the NEORV32 runtime environment. See section <<_application_context_handling>>
|
|
for more information.
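
The idea behind such a software emulation can be illustrated with a small host-side model. The following Python sketch is not the NEORV32 runtime environment code; the function names `aligned_load_word` and `emulated_load_word` are hypothetical, and little-endian byte order is assumed. It shows how a misaligned 32-bit load could be assembled from the two aligned words that contain the requested bytes.

```python
# Host-side model of emulating a misaligned 32-bit load with aligned accesses.
# All names are illustrative; this is not the NEORV32 runtime environment code.

def aligned_load_word(mem: bytes, addr: int) -> int:
    """Model of a hardware word load: only 4-byte-aligned addresses are legal."""
    if addr % 4 != 0:
        raise ValueError("misaligned load address")  # stands in for the trap
    return int.from_bytes(mem[addr:addr + 4], "little")

def emulated_load_word(mem: bytes, addr: int) -> int:
    """Combine the two aligned words that contain the requested bytes."""
    offset = addr % 4
    base = addr - offset
    low = aligned_load_word(mem, base)
    if offset == 0:
        return low  # already aligned, nothing to emulate
    high = aligned_load_word(mem, base + 4)
    combined = (high << 32) | low
    return (combined >> (8 * offset)) & 0xFFFFFFFF

memory = bytes(range(16))  # bytes 0x00, 0x01, ..., 0x0F
```

In a real trap handler the same combination step would run after decoding the faulting instruction; that part is omitted here.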
|
|
|
|
.No Atomic Read-Modify-Write Operations
|
|
[IMPORTANT]
|
|
The NEORV32 <<_a_isa_extension>> only supports the load-reservate (LR) and store-conditional (SC) instructions.
|
|
The remaining read-modify-write operations are not supported. However, these missing instructions can
|
|
be emulated. The NEORV32 <<_core_libraries>> provide an emulation wrapper for the missing AMO/read-modify-write
|
|
instructions that is based on LR/SC pairs. A demo/program can be found in `sw/example/atomic_test`.
|
|
|
|
|
|
<<<
|
|
// ####################################################################################################################
|
|
:sectnums:
|
|
=== CPU Top Entity - Signals
|
|
|
|
The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
|
|
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
|
|
direction as seen from the CPU.
|
|
|
|
.NEORV32 CPU Signal List
|
|
[cols="<3,^3,^1,<5"]
|
|
[options="header", grid="rows"]
|
|
|=======================
|
|
| Signal | Width/Type | Dir. | Description
|
|
4+^| **Global Signals**
|
|
| `clk_i` | 1 | in | Global clock line; all registers trigger on the rising edge. This clock can be switched off during <<_sleep_mode>>
|
|
| `clk_aux_i` | 1 | in | Always-on clock, used to keep the sleep control active when `clk_i` is switched off
|
|
| `rstn_i` | 1 | in | Global reset, low-active
|
|
| `sleep_o` | 1 | out | CPU is in <<_sleep_mode>> when set
|
|
| `debug_o` | 1 | out | CPU is in <<_cpu_debug_mode,debug mode>> when set
|
|
4+^| **Interrupts (<<_traps_exceptions_and_interrupts>>)**
|
|
| `msi_i` | 1 | in | RISC-V machine software interrupt
|
|
| `mei_i` | 1 | in | RISC-V machine external interrupt
|
|
| `mti_i` | 1 | in | RISC-V machine timer interrupt
|
|
| `firq_i` | 16 | in | Custom fast interrupt request signals
|
|
| `dbi_i` | 1 | in | Request CPU to halt and enter debug mode (RISC-V <<_on_chip_debugger_ocd>>)
|
|
4+^| **Instruction <<_bus_interface>>**
|
|
| `ibus_req_o` | `bus_req_t` | out | Instruction fetch bus request
|
|
| `ibus_rsp_i` | `bus_rsp_t` | in | Instruction fetch bus response
|
|
4+^| **Data <<_bus_interface>>**
|
|
| `dbus_req_o` | `bus_req_t` | out | Data access (load/store) bus request
|
|
| `dbus_rsp_i` | `bus_rsp_t` | in | Data access (load/store) bus response
|
|
|=======================
|
|
|
|
.Bus Interface Protocol
|
|
[TIP]
|
|
See section <<_bus_interface>> for the instruction fetch and data access interface protocol and the
|
|
according interface types (`bus_req_t` and `bus_rsp_t`).
|
|
|
|
|
|
<<<
|
|
// ####################################################################################################################
|
|
:sectnums:
|
|
=== CPU Top Entity - Generics
|
|
|
|
Most of the CPU configuration generics are a subset of the actual Processor configuration generics
|
|
(see section <<_processor_top_entity_generics>>) and are not listed here. However, the CPU provides
|
|
some _specific_ generics that are used to configure the CPU for the NEORV32 processor setup. These generics
|
|
are assigned by the processor setup only and are not available for user-defined configuration.
|
|
The specific generics are listed below.
|
|
|
|
.Table Abbreviations
|
|
[NOTE]
|
|
The generic type "suv(x:y)" defines a `std_ulogic_vector(x downto y)`.
|
|
|
|
.NEORV32 CPU-Exclusive Generic List
|
|
[cols="<4,^2,<8"]
|
|
[options="header",grid="rows"]
|
|
|=======================
|
|
| Name | Type | Description
|
|
| `CPU_BOOT_ADDR` | suv(31:0) | CPU reset address. See section <<_address_space>>.
|
|
| `CPU_DEBUG_PARK_ADDR` | suv(31:0) | "Park loop" entry address for the <<_on_chip_debugger_ocd>>, has to be 4-byte aligned.
|
|
| `CPU_DEBUG_EXC_ADDR` | suv(31:0) | "Exception" entry address for the <<_on_chip_debugger_ocd>>, has to be 4-byte aligned.
|
|
| `CPU_EXTENSION_RISCV_Sdext` | boolean | Implement RISC-V-compatible "debug" CPU operation mode required for the <<_on_chip_debugger_ocd>>.
|
|
| `CPU_EXTENSION_RISCV_Sdtrig` | boolean | Implement RISC-V-compatible trigger module. See section <<_on_chip_debugger_ocd>>.
|
|
|=======================
|
|
|
|
|
|
<<<
|
|
// ####################################################################################################################
|
|
:sectnums:
|
|
=== Architecture
|
|
|
|
image::neorv32_cpu.png[align=center]
|
|
|
|
The CPU implements a pipelined multi-cycle architecture: each instruction is executed as a series of consecutive
|
|
micro-operations. In order to increase performance, the CPU's front-end (instruction fetch) and back-end
|
|
(instruction execution) are decoupled via a FIFO (the instruction prefetch buffer). Thus, the front-end can already
|
|
fetch new instructions while the back-end is still processing the previously-fetched instructions.
|
|
|
|
Basically, the CPU's micro architecture is somewhere between a classical pipelined architecture, where each stage
|
|
requires exactly one processing cycle (if not stalled) and a classical multi-cycle architecture, which executes
|
|
every single instruction (_including_ fetch) in a series of consecutive micro-operations. The combination of these
|
|
two design paradigms allows an increased instruction throughput compared to a pure multi-cycle approach (due to
|
|
overlapping operation of fetch and execute) at a reduced hardware footprint (due to the multi-cycle concept).
|
|
|
|
As a Von-Neumann machine, the CPU provides independent interfaces for instruction fetch and data access. However,
|
|
these two bus interfaces are merged into a single processor-internal bus via a prioritizing bus switch (data accesses
|
|
have higher priority). Hence, _all_ memory addresses including peripheral devices are mapped to a single unified 32-bit
|
|
<<_address_space>>.
|
|
|
|
[NOTE]
|
|
The CPU does not perform any speculative/out-of-order operations at all. Hence, it is not vulnerable to security issues
|
|
caused by speculative execution (like Spectre or Meltdown).
|
|
|
|
|
|
:sectnums:
|
|
==== CPU Register File
|
|
|
|
The data register file contains the general purpose architecture registers `x0` to `x31`. For the `rv32e` ISA only the lower
|
|
16 registers are implemented. Register zero (`x0`/`zero`) always reads as zero and any write access to it has no effect.
|
|
Up to four individual synchronous read ports allow fetching up to four register operands at once. The write and read accesses
|
|
are mutually exclusive as they happen in separate cycles. Hence, there is no need to consider things like "read-during-write"
|
|
behavior.
|
|
|
|
The register file provides two different implementation options configured via the top's `REGFILE_HW_RST` generic.
|
|
|
|
* `REGFILE_HW_RST = false` (default): In this configuration the register file is implemented as a plain memory array without a
dedicated hardware reset. This architecture allows FPGA block RAM to be inferred for the entire register file, resulting in minimal
|
|
logic utilization and optimal timing.
|
|
* `REGFILE_HW_RST = true`: This configuration is based on individual FFs that do provide a dedicated hardware reset.
|
|
Hence, the register file cannot be mapped to FPGA block RAM. This option should only be selected if the application requires a
|
|
reset of the register file (e.g. for security reasons) or if the design shall be synthesized for an **ASIC** implementation.
|
|
|
|
The state of this configuration generic can be checked by software via the <<_mxisa>> CSR.
|
|
|
|
.FPGA Implementation
|
|
[WARNING]
|
|
Enabling the `REGFILE_HW_RST` option for FPGA implementation is not recommended as this will massively increase the amount
|
|
of required logic resources.
|
|
|
|
.Implementation of the `zero` Register within FPGA Block RAM
|
|
[NOTE]
|
|
Register `zero` is also mapped to a _physical memory location_ within the register file's block RAM. By this, there is no need
|
|
to add a further multiplexer to "insert" zero if reading from register `zero` reducing logic requirements and shortening the
|
|
critical path. However, this also requires that the physical storage bits of register `zero` are explicitly initialized (set
|
|
to zero) by the hardware. This is done transparently by the CPU control requiring no additional processing overhead.
|
|
|
|
.Block RAM Ports
|
|
[NOTE]
|
|
The default register file configuration uses two access ports: a read-only port for reading register `rs2` (second source operand)
|
|
and a read/write port for reading register `rs1` (first source operand) and for writing processing results to register `rd`
|
|
(destination register). Hence, a simple dual-port RAM can be used to implement the entire register file. From a functional point
|
|
of view, read and write accesses to the register file never occur in the same clock cycle, so no bypass logic is required at all.
|
|
|
|
|
|
:sectnums:
|
|
==== CPU Arithmetic Logic Unit
|
|
|
|
The arithmetic/logic unit (ALU) is used for actual data processing as well as generating memory and branch addresses.
|
|
All "simple" <<_i_isa_extension>> computational instructions (like `add` and `or`) are implemented as plain combinatorial logic
|
|
requiring only a single cycle to complete. More sophisticated instructions like shift operations or multiplications are processed
|
|
by so-called "ALU co-processors".
|
|
|
|
The co-processors are implemented as iterative units that require several cycles to complete processing. Besides the base ISA's
|
|
shift instructions, the co-processors are used to implement all further processing-based ISA extensions (e.g. <<_m_isa_extension>>
|
|
and <<_b_isa_extension>>).
|
|
|
|
.Multi-Cycle Execution Monitor
|
|
[NOTE]
|
|
The CPU control will raise an illegal instruction exception if a multi-cycle functional unit (like the <<_custom_functions_unit_cfu>>)
|
|
does not complete processing in a bounded amount of time (configured via the package's `monitor_mc_tmo_c` constant; default = 512 clock cycles).
|
|
|
|
.Tuning Options
|
|
[TIP]
|
|
The ALU architecture can be tuned for an application-specific area-vs-performance trade-off. The `FAST_MUL_EN` and `FAST_SHIFT_EN`
|
|
generics can be used to implement performance-optimized DSP-block multipliers and barrel shifters, respectively. See sections <<_i_isa_extension>>,
|
|
<<_b_isa_extension>> and <<_m_isa_extension>> for specific examples.
|
|
|
|
|
|
:sectnums:
|
|
==== CPU Bus Unit
|
|
|
|
The bus unit takes care of handling data memory accesses via load and store instructions. It handles data adjustment when accessing
|
|
sub-word data quantities (16-bit or 8-bit) and performs sign-extension for signed load operations. The bus unit also includes the optional
|
|
<<_pmp_isa_extension>> that performs permission checks for all data and instruction accesses.
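
The data adjustment and sign-extension described above can be sketched as a small behavioral model. This Python sketch is an illustration only, not a transcription of the RTL: the function names are hypothetical, little-endian byte lanes are assumed, and the mapping "byte-enable bit _i_ selects byte lane _i_" is an assumption as well.

```python
# Behavioral sketch of sub-word load adjustment and store byte enables.
# Function names and the byte-lane mapping are assumptions for illustration.

def load_adjust(word: int, addr: int, size: int, signed: bool) -> int:
    """Extract a 1-, 2- or 4-byte quantity from a 32-bit bus word."""
    if size not in (1, 2, 4) or addr % size != 0:
        raise ValueError("unsupported size or misaligned address")
    shift = 8 * (addr % 4)              # select the byte lane(s)
    bits = 8 * size
    value = (word >> shift) & ((1 << bits) - 1)
    if signed and value & (1 << (bits - 1)):
        value -= 1 << bits              # sign-extend negative quantities
    return value & 0xFFFFFFFF           # present as a 32-bit register value

def store_byte_enable(addr: int, size: int) -> int:
    """Byte-enable mask for a store of `size` bytes at `addr`."""
    if size not in (1, 2, 4) or addr % size != 0:
        raise ValueError("unsupported size or misaligned address")
    return ((1 << size) - 1) << (addr % 4)
```

For example, a signed byte load from address offset 1 of the word `0x123480FF` selects `0x80` and extends it to `0xFFFFFF80`.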
|
|
|
|
A list of the bus interface signals and a detailed description of the protocol can be found in section <<_bus_interface>>.
|
|
All bus interface signals are driven/buffered by registers; so even a complex SoC interconnection bus network will not
|
|
affect the maximum operating frequency.
|
|
|
|
.Unaligned Accesses
|
|
[WARNING]
|
|
The CPU does not support a hardware-based handling of unaligned memory accesses! Any unaligned access will raise a bus load/store unaligned
|
|
address exception. The exception handler can be used to _emulate_ unaligned memory accesses in software.
|
|
See the NEORV32 Runtime Environment's <<_application_context_handling>> section for more information.
|
|
|
|
|
|
:sectnums:
|
|
==== CPU Control Unit
|
|
|
|
The CPU control unit is responsible for generating all the control signals for the different CPU modules.
|
|
The control unit is split into a "front-end" and a "back-end".
|
|
|
|
|
|
**Front-End**
|
|
|
|
The front-end is responsible for fetching instructions in chunks of 32 bits. This can be a single aligned 32-bit instruction,
|
|
two aligned 16-bit instructions or a mixture of those. The instructions including control and exception information are stored
|
|
to a FIFO queue - the instruction prefetch buffer (IPB). This FIFO has a depth of two entries by default but can be customized
|
|
via the `ipb_depth_c` VHDL package constant.
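
How a stream of 32-bit fetch chunks decomposes into 16-bit and 32-bit instructions can be sketched as follows. The Python model below relies only on the standard RISC-V encoding rule that an instruction whose two lowest bits are not `11` is a 16-bit (compressed) instruction. The function name `split_instructions` is hypothetical, and the prefetch buffer itself (depth, control and exception information) is not modeled.

```python
from collections import deque

def split_instructions(fetch_words):
    """Return a list of (width_in_bits, encoding) for 32-bit fetch chunks."""
    halfwords = deque()
    for word in fetch_words:
        halfwords.append(word & 0xFFFF)          # lower half comes first
        halfwords.append((word >> 16) & 0xFFFF)
    instructions = []
    while halfwords:
        low = halfwords.popleft()
        if low & 0b11 != 0b11:
            instructions.append((16, low))       # compressed instruction
        elif halfwords:
            high = halfwords.popleft()
            instructions.append((32, (high << 16) | low))
        else:
            break  # upper half of a 32-bit instruction not fetched yet
    return instructions
```

Note that a 32-bit instruction may straddle two fetch chunks when it follows a single compressed instruction, which is the "mixture" case mentioned above.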
|
|
|
|
The FIFO allows the front-end to do "speculative" instruction fetches, as it keeps fetching the next consecutive instruction
|
|
all the time. This also decouples the front-end (instruction fetch) from the back-end (instruction execution) so both modules
|
|
can operate in parallel to increase performance. However, all potential side effects that are caused by this "speculative"
|
|
instruction fetch are already handled by the CPU front-end, ensuring a defined execution state while preventing
side-channel attacks.
|
|
|
|
|
|
**Back-End**
|
|
|
|
Instruction data from the instruction prefetch buffer is decompressed (if the `C` ISA extension is enabled) and sent to the
|
|
CPU back-end for actual execution. Execution is conducted by a state-machine that controls all of the CPU modules. The back-end also
|
|
includes the <<_control_and_status_registers_csrs>> as well as the trap controller.
|
|
|
|
|
|
==== Sleep Mode
|
|
|
|
The NEORV32 CPU provides a single sleep mode that can be entered to power-down the core reducing
|
|
dynamic power consumption. Sleep mode is entered by executing the `wfi` ("wait for interrupt") instruction.
|
|
|
|
[NOTE]
|
|
The `wfi` instruction will raise an illegal instruction exception when executed in user-mode
|
|
if `TW` in <<_mstatus>> is set. When executed in debug-mode or during single-stepping, `wfi` will behave as a
simple `nop` without entering sleep mode.
|
|
|
|
After executing the `wfi` instruction the CPU's `sleep_o` signal (<<_cpu_top_entity_signals>>) will become set
|
|
as soon as the CPU has fully halted ("CPU is sleeping"):
|
|
|
|
[start=1]
|
|
. The front-end (instruction fetch) is stopped. There is no pending instruction fetch bus access.
. The back-end (instruction execution) is stopped. There is no pending data bus access.
. There is no enabled interrupt pending.
|
|
|
|
CPU-external modules like memories, timers and peripheral interfaces are not affected by this. Furthermore, the CPU will
|
|
continue to buffer/enqueue incoming interrupts. The CPU will leave sleep mode as soon as any _enabled_ (via <<_mie>>)
|
|
interrupt source becomes _pending_ or if a debug session is started.
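
The wake-up condition stated above can be written as a one-line predicate. This is a minimal Python sketch under the assumption that "enabled and pending" means a set bit at the same position in the `mie` and `mip` CSRs; the function name `leaves_sleep` and the `debug_request` flag are hypothetical.

```python
def leaves_sleep(mip: int, mie: int, debug_request: bool = False) -> bool:
    """True if an enabled interrupt is pending or a debug session starts."""
    return (mip & mie) != 0 or debug_request
```

A pending interrupt whose enable bit is cleared therefore does not wake the CPU in this model.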
|
|
|
|
===== Power-Down Mode
|
|
|
|
Optionally, the sleep mode can also be used to shut down the CPU's main clock to further reduce power consumption
|
|
by halting the core's clock tree. This clock gating mode is enabled by the `CLOCK_GATING_EN` generic
|
|
(<<_processor_top_entity_generics>>). See section <<_processor_clocking>> for more information.
|
|
|
|
|
|
==== Full Virtualization
|
|
|
|
Just like the RISC-V ISA, the NEORV32 aims to provide _maximum virtualization_ capabilities on CPU and SoC level to
|
|
allow a high standard of **execution safety**. The CPU supports **all** traps specified by the official RISC-V
|
|
specifications. Thus, the CPU provides defined hardware fall-backs via traps for any expected and unexpected situations
|
|
(e.g. executing a malformed or unsupported instruction or accessing a non-allocated memory address). For any kind
|
|
of trap the core is always in a defined and fully synchronized state throughout the whole system (i.e. there are no
|
|
out-of-order operations that might have to be reverted). This allows a defined and predictable execution behavior
|
|
at any time improving overall execution safety.
|
|
|
|
|
|
<<<
|
|
// ####################################################################################################################
|
|
:sectnums:
|
|
=== Bus Interface
|
|
|
|
The NEORV32 CPU provides separate instruction fetch and data access interfaces making it a **Harvard Architecture**:
the instruction fetch interface (`ibus_*` signals) is used for fetching instructions and the data access interface
(`dbus_*` signals) is used to access data via load and store operations. Each of these interfaces can access an address
space of up to 2^32^ bytes (4 GiB).
|
|
|
|
The bus interface uses two custom interface types: `bus_req_t` is used to propagate the bus access **requests**. These
|
|
signals are driven by the _accessing_ device (i.e. the CPU core). `bus_rsp_t` is used to return the bus **response** and
|
|
is driven by the _accessed_ device or bus system (i.e. a processor-internal memory or IO device).
|
|
|
|
.Bus Interface - Request Bus (`bus_req_t`)
|
|
[cols="^1,^1,<6"]
|
|
[options="header",grid="rows"]
|
|
|=======================
|
|
| Signal | Width | Description
|
|
| `addr` | 32 | Access address (byte addressing)
|
|
| `data` | 32 | Write data
|
|
| `ben` | 4 | Byte-enable for each byte in `data`
|
|
| `stb` | 1 | Request trigger ("strobe", single-shot)
|
|
| `rw` | 1 | Access direction (`0` = read, `1` = write)
|
|
| `src` | 1 | Access source (`0` = instruction fetch, `1` = load/store)
|
|
| `priv` | 1 | Set if privileged (M-mode) access
|
|
| `rvso` | 1 | Set if current access is a reservation-set operation (atomic `lr` or `sc` instruction)
|
|
| `fence` | 1 | Data/instruction fence operation; valid without `stb` being set
|
|
|=======================
|
|
|
|
.Bus Interface - Response Bus (`bus_rsp_t`)
|
|
[cols="^1,^1,<6"]
|
|
[options="header",grid="rows"]
|
|
|=======================
|
|
| Signal | Width | Description
|
|
| `data` | 32 | Read data (single-shot)
|
|
| `ack` | 1 | Transfer acknowledge / success (single-shot)
|
|
| `err` | 1 | Transfer error / fail (single-shot)
|
|
|=======================
|
|
|
|
|
|
:sectnums:
|
|
==== Bus Interface Protocol
|
|
|
|
Transactions are triggered entirely by the request bus. A new bus request is initiated by setting the _strobe_
|
|
signal `stb` high for exactly one cycle. All remaining signals of the bus are set together with `stb` and will
|
|
remain unchanged until the transaction is completed.
|
|
|
|
The transaction is completed when the accessed device returns a response via the response interface:
|
|
`ack` is high for exactly one cycle if the transaction was completed successfully. `err` is high for exactly
|
|
one cycle if the transaction failed to complete. These two signals are mutually exclusive. In case of a read
|
|
access the read data is returned together with the `ack` signal. Otherwise, the return data signal is
|
|
kept at all-zero allowing wired-or interconnection of all response buses.
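
The request/response handshake and the wired-or merging of response buses can be illustrated with a simplified, untimed model. The Python sketch below is an assumption-laden illustration: the classes `BusReq`, `BusRsp` and `WordMemory` and the function `merge_responses` are hypothetical, only a subset of the request signals is modeled, a whole transaction is collapsed into a single call, and the byte-enable signal is ignored for brevity.

```python
from dataclasses import dataclass

@dataclass
class BusReq:
    addr: int = 0
    data: int = 0
    ben: int = 0
    stb: bool = False
    rw: bool = False      # False = read, True = write

@dataclass
class BusRsp:
    data: int = 0
    ack: bool = False
    err: bool = False

class WordMemory:
    """Toy device that answers requests inside [base, base + size)."""
    def __init__(self, base: int, size: int):
        self.base, self.size = base, size
        self.words = {}

    def access(self, req: BusReq) -> BusRsp:
        selected = self.base <= req.addr < self.base + self.size
        if not req.stb or not selected:
            return BusRsp()                       # idle: all-zero response
        if req.rw:
            self.words[req.addr & ~3] = req.data & 0xFFFFFFFF  # `ben` ignored
            return BusRsp(ack=True)
        return BusRsp(data=self.words.get(req.addr & ~3, 0), ack=True)

def merge_responses(responses) -> BusRsp:
    """OR all response buses together; valid because idle devices drive zeros."""
    merged = BusRsp()
    for rsp in responses:
        merged.data |= rsp.data
        merged.ack |= rsp.ack
        merged.err |= rsp.err
    return merged
```

Because every non-addressed device returns an all-zero response, a plain OR over all devices yields exactly the response of the addressed one, which is the property the text above relies on.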
|
|
|
|
The figure below shows three exemplary bus accesses:
|
|
|
|
[start=1]
|
|
. A read access to address `A_addr` returning `rdata` after several cycles (slow response; `ACK` arrives after several cycles).
|
|
. A write access to address `B_addr` writing `wdata` (fastest response; `ACK` arrives right in the next cycle).
|
|
. A failing read access to address `C_addr` (slow response; `ERR` arrives after several cycles).
|
|
|
|
.Three Exemplary Bus Transactions
|
|
image::bus_interface.png[700]
|
|
|
|
|
|
:sectnums:
|
|
==== Atomic Accesses
|
|
|
|
The load-reservate (`lr.w`) and store-conditional (`sc.w`) instructions from the <<_a_isa_extension>> execute as standard
|
|
load/store bus transactions but with the `rvso` ("reservation set operation") signal being set. It is the task of the
|
|
<<_reservation_set_controller>> to handle these LR/SC bus transactions accordingly.
|
|
|
|
.Reservation Set Controller
|
|
[NOTE]
|
|
See section <<_address_space>> / <<_reservation_set_controller>> for more information.
|
|
|
|
.Read-Modify-Write Operations
|
|
[IMPORTANT]
|
|
Read-modify-write operations (like an atomic swap / `amoswap.w`) are **not** supported. However, the NEORV32
|
|
<<_core_libraries>> provide an emulation wrapper for those unsupported instructions that is
|
|
based on LR/SC pairs. A demo/program can be found in `sw/example/atomic_test`.
|
|
|
|
The figure below shows three exemplary bus accesses (1 to 3 from left to right). The `req` signal record represents
|
|
the CPU-side of the bus interface. For easier understanding the current state of the reservation set is added as `rvs_valid` signal.
|
|
|
|
[start=1]
|
|
. A load-reservate (LR) instruction using `addr` as address. This instruction returns the loaded data `rdata` via `rsp.data`
|
|
and also registers a reservation for the address `addr` (`rvs_valid` becomes set).
|
|
. A store-conditional (SC) instruction attempts to write `wdata1` to address `addr`. This SC operation **succeeds**, so
|
|
`wdata1` is actually written to address `addr`. The successful operation is indicated by a **0** being returned via
|
|
`rsp.data` together with `ack`. As the LR/SC is completed the registered reservation is invalidated (`rvs_valid` becomes cleared).
|
|
. Another store-conditional (SC) instruction attempts to write `wdata2` to address `addr`. As the reservation set is already
|
|
invalidated (`rvs_valid` is `0`) the store access fails, so `wdata2` is **not** written to address `addr` at all. The failed
|
|
operation is indicated by a **1** being returned via `rsp.data` together with `ack`.
|
|
|
|
.Three Exemplary LR/SC Bus Transactions
|
|
image::bus_interface_atomic.png[700]
|
|
|
|
.SC Status
|
|
[NOTE]
|
|
The "normal" load data mechanism is used to return success/failure of the `sc.w` instruction to the CPU (via the LSB of `rsp.data`).
|
|
|
|
|
|
<<<
|
|
// ####################################################################################################################
|
|
:sectnums:
|
|
=== Instruction Sets and Extensions
|
|
|
|
The NEORV32 CPU provides several optional RISC-V and custom ISA extensions. The extensions can be enabled/configured
|
|
via the according <<_processor_top_entity_generics>>. This chapter gives a brief overview of the different ISA extensions.
|
|
|
|
.NEORV32 Instruction Set Extensions
|
|
[cols="<2,<5,<3"]
|
|
[options="header",grid="rows"]
|
|
|=======================
|
|
| Name | Description | <<_processor_top_entity_generics, Enabled by Generic>>
|
|
| <<_a_isa_extension,`A`>> | Atomic memory access instructions | `CPU_EXTENSION_RISCV_A`
|
|
| <<_b_isa_extension,`B`>> | Bit-manipulation instructions | `CPU_EXTENSION_RISCV_B`
|
|
| <<_c_isa_extension,`C`>> | Compressed (16-bit) instructions | `CPU_EXTENSION_RISCV_C`
|
|
| <<_e_isa_extension,`E`>> | Embedded CPU extension (reduced register file size) | `CPU_EXTENSION_RISCV_E`
|
|
| <<_i_isa_extension,`I`>> | Integer base ISA | Enabled if `CPU_EXTENSION_RISCV_E` is **not** enabled
|
|
| <<_m_isa_extension,`M`>> | Integer multiplication and division instructions | `CPU_EXTENSION_RISCV_M`
|
|
| <<_u_isa_extension,`U`>> | Less-privileged _user_ mode extension | `CPU_EXTENSION_RISCV_U`
|
|
| <<_x_isa_extension,`X`>> | Platform-specific / NEORV32-specific extension | Always enabled
|
|
| <<_zifencei_isa_extension,`Zifencei`>> | Instruction stream synchronization instruction | Always enabled
|
|
| <<_zfinx_isa_extension,`Zfinx`>> | Floating-point instructions using integer registers | `CPU_EXTENSION_RISCV_Zfinx`
|
|
| <<_zicntr_isa_extension,`Zicntr`>> | Base counters extension | `CPU_EXTENSION_RISCV_Zicntr`
|
|
| <<_zicond_isa_extension,`Zicond`>> | Integer conditional operations | `CPU_EXTENSION_RISCV_Zicond`
|
|
| <<_zicsr_isa_extension,`Zicsr`>> | Control and status register access instructions | Always enabled
|
|
| <<_zihpm_isa_extension,`Zihpm`>> | Hardware performance monitors extension | `CPU_EXTENSION_RISCV_Zihpm`
|
|
| <<_zmmul_isa_extension,`Zmmul`>> | Integer multiplication-only instructions | `CPU_EXTENSION_RISCV_Zmmul`
|
|
| <<_zcfu_isa_extension,`Zxcfu`>> | Custom / user-defined instructions | `CPU_EXTENSION_RISCV_Zxcfu`
|
|
| <<_pmp_isa_extension,`PMP`>> | Physical memory protection extension | `PMP_NUM_REGIONS`
|
|
| <<_sdext_isa_extension,`Sdext`>> | External debug support extension | `ON_CHIP_DEBUGGER_EN`
|
|
| <<_sdtrig_isa_extension,`Sdtrig`>> | Trigger module extension | `ON_CHIP_DEBUGGER_EN`
|
|
|=======================
|
|
|
|
.RISC-V ISA Specifications
|
|
[TIP]
|
|
For more information regarding the RISC-V ISA extensions please refer to the "RISC-V Instruction Set Manual - Volume
I: Unprivileged ISA" and "The RISC-V Instruction Set Manual Volume II: Privileged Architecture". A copy of all currently
implemented ISA extensions can be found in the project's `docs/references` folder.
|
|
|
|
.Discovering ISA Extensions
|
|
[TIP]
|
|
Software can discover available ISA extensions via the <<_misa>> and <<_mxisa>> CSRs or by executing an instruction
|
|
and checking for an illegal instruction exception (i.e. <<_full_virtualization>>).
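
As an illustration of the CSR-based discovery, the following Python sketch decodes a raw `misa` value. It assumes the standard RISC-V layout for a 32-bit `misa` (extension letters `A` to `Z` in bits 0 to 25, base width in bits 31:30); the function names are hypothetical and the example value is made up, not read from actual NEORV32 hardware.

```python
def decode_misa(misa: int) -> str:
    """Return the extension letters whose bits are set in a 32-bit misa value."""
    letters = [chr(ord("A") + bit) for bit in range(26) if (misa >> bit) & 1]
    return "".join(letters)

def misa_mxl(misa: int) -> int:
    """Return the base width field misa[31:30] (1 encodes a 32-bit base ISA)."""
    return (misa >> 30) & 0b11

# Made-up example value: MXL = 1 with the C, I and M bits set.
example_misa = (1 << 30) | (1 << 2) | (1 << 8) | (1 << 12)
```

On the target, the value would be obtained with a CSR read instruction; the decoding step stays the same.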
|
|
|
|
.ISA Extensions-Specific CSRs
|
|
[NOTE]
|
|
The <<_control_and_status_registers_csrs>> section lists the according ISA extensions for all CSRs.
|
|
|
|
|
|
==== `A` ISA Extension
|
|
|
|
The `A` ISA extension adds instructions and mechanisms for atomic memory access operations. Note that the NEORV32 `A`
|
|
only includes the _load-reservate_ (`lr.w`) and _store-conditional_ (`sc.w`) instructions - the remaining read-modify-write
|
|
instructions (like `amoswap`) are **not supported**. However, these missing instructions can be emulated using the
|
|
LR and SC operations.
|
|
|
|
.AMO Emulation
|
|
[NOTE]
|
|
The NEORV32 <<_core_libraries>> provide an emulation wrapper for the missing AMO/read-modify-write instructions that is
|
|
based on LR/SC pairs. A demo/program can be found in `sw/example/atomic_test`.
|
|
|
|
Atomic instructions make it possible to notify an application if a certain memory location has been altered by another instance
|
|
(like another process running on the same CPU or a DMA access). Hence, they can be used to implement synchronization
|
|
mechanisms like mutexes and semaphores.
|
|
|
|
The NEORV32 `A` extension is enabled via the `CPU_EXTENSION_RISCV_A` generic (see <<_processor_top_entity_generics>>).
|
|
When enabled the following additional instructions are available.
|
|
|
|
.Instructions and Timing
|
|
[cols="<2,<4,<3"]
|
|
[options="header", grid="rows"]
|
|
|=======================
|
|
| Class | Instructions | Execution cycles
|
|
| Load-reservate word | `lr.w` | 5
|
|
| Store-conditional word | `sc.w` | 5
|
|
|=======================
|
|
|
|
The `lr.w` instruction loads one word from a word-aligned address and registers a _reservation set_. The `sc.w`
|
|
instruction stores a word to a word-aligned address only if the reservation set is still valid. Furthermore, the
|
|
`sc.w` operation returns the state of the reservation set (0 = reservation set still valid, data has been written;
|
|
1 = reservation set was broken, no data has been written). The reservation set is invalidated if another `lr.w` instruction
|
|
is executed or if any write access to the _reservated_ address takes place. Traps and/or CPU privilege level changes
|
|
do not modify current reservation sets.
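
The reservation rules described above, and the way a read-modify-write operation can be built from an LR/SC pair, can be sketched with a small behavioral model. This Python sketch is not the emulation wrapper from the NEORV32 core libraries; the class `ReservationMemory` and the function `emulated_amoswap` are hypothetical names. It assumes a single reservation, that a new `lr.w` replaces any earlier reservation, and that every `sc.w` clears the reservation.

```python
class ReservationMemory:
    """Word memory with a single reservation (behavioral sketch only)."""

    def __init__(self):
        self.words = {}
        self.reserved_addr = None   # None means: no valid reservation

    def store(self, addr: int, value: int) -> None:
        """Plain store; a write to the reserved address breaks the reservation."""
        self.words[addr] = value & 0xFFFFFFFF
        if self.reserved_addr == addr:
            self.reserved_addr = None

    def lr_w(self, addr: int) -> int:
        """Load a word and register a reservation for its address."""
        self.reserved_addr = addr
        return self.words.get(addr, 0)

    def sc_w(self, addr: int, value: int) -> int:
        """Return 0 and write if the reservation is valid, otherwise return 1."""
        valid = self.reserved_addr == addr
        self.reserved_addr = None
        if valid:
            self.words[addr] = value & 0xFFFFFFFF
            return 0
        return 1

def emulated_amoswap(mem: ReservationMemory, addr: int, new_value: int) -> int:
    """Swap via an LR/SC retry loop; returns the previous memory content."""
    while True:
        old = mem.lr_w(addr)
        if mem.sc_w(addr, new_value) == 0:
            return old
```

The retry loop is the usual pattern for LR/SC-based emulation: the operation is repeated until the conditional store reports success.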
|
|
|
|
.`aq` and `rl` Bits
|
|
[NOTE]
|
|
The instruction word's `aq` and `rl` memory ordering bits are not evaluated by the hardware at all.
|
|
|
|
.Atomic Memory Access on Hardware Level
|
|
[NOTE]
|
|
More information regarding the atomic memory accesses and the according reservation
|
|
sets can be found in section <<_reservation_set_controller>>.
|
|
|
|
.Cache Coherency
|
|
[IMPORTANT]
|
|
Atomic operations **always bypass** the CPU caches using direct/uncached accesses. Care must be taken
|
|
to maintain data cache coherency (e.g. by using the `fence` instruction).
|
|
|
|
|
|
==== `B` ISA Extension
|
|
|
|
The `B` ISA extension adds instructions for bit-manipulation operations.
|
|
This ISA extension is implemented as a multi-cycle ALU co-processor (`rtl/core/neorv32_cpu_cp_bitmanip.vhd`).
|
|
The NEORV32 `B` ISA extension includes the following sub-extensions:
|
|
|
|
* `Zba` - Address-generation instructions
|
|
* `Zbb` - Basic bit-manipulation instructions
|
|
* `Zbc` - Carry-less multiplication instructions
|
|
* `Zbs` - Single-bit instructions
|
|
|
|
.Instructions and Timing
|
|
[cols="<2,<4,<3"]
|
|
[options="header", grid="rows"]
|
|
|=======================
|
|
| Class | Instructions | Execution cycles
|
|
| Arithmetic/logic | `min[u]` `max[u]` `sext.b` `sext.h` `andn` `orn` `xnor` `zext.h`(pack) `rev8`(grevi) `orc.b`(gorci) | 4
|
|
| Shifts | `clz` `ctz` | 3 + 1..32; FAST_SHIFT: 4
|
|
| Shifts | `cpop` | 36; FAST_SHIFT: 4
|
|
| Shifts | `rol` `ror[i]` | 4 + _shift_amount_; FAST_SHIFT: 4
|
|
| Shifted-add | `sh1add` `sh2add` `sh3add` | 4
|
|
| Single-bit | `bset[i]` `bclr[i]` `binv[i]` `bext[i]` | 4
|
|
| Carry-less multiply | `clmul` `clmulh` `clmulr` | 36
|
|
|=======================
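
For readers unfamiliar with carry-less multiplication, the following Python sketch gives reference semantics for the three `Zbc` instructions on 32-bit operands, following the definitions in the RISC-V bit-manipulation specification. It is a functional model only and says nothing about how the NEORV32 co-processor is implemented; the helper name `clmul64` is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def clmul64(a: int, b: int) -> int:
    """Full 64-bit carry-less product of two 32-bit operands."""
    a &= MASK32
    b &= MASK32
    product = 0
    for i in range(32):
        if (b >> i) & 1:
            product ^= a << i   # XOR instead of add: no carries propagate
    return product

def clmul(a: int, b: int) -> int:
    """Lower 32 bits of the carry-less product."""
    return clmul64(a, b) & MASK32

def clmulh(a: int, b: int) -> int:
    """Upper 32 bits (bits 63:32) of the carry-less product."""
    return (clmul64(a, b) >> 32) & MASK32

def clmulr(a: int, b: int) -> int:
    """Bits 62:31 of the carry-less product ("reversed" variant)."""
    return (clmul64(a, b) >> 31) & MASK32
```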

.Barrel Shifter
[TIP]
Shift operations can be accelerated (at the cost of additional logic resources) by enabling the `FAST_SHIFT_EN`
configuration option, which replaces the (time-variant) bit-serial shifter with a (time-constant) barrel shifter.


==== `C` ISA Extension

The "compressed" ISA extension provides 16-bit encodings of commonly used instructions to reduce code size.

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| ALU | `c.addi4spn` `c.nop` `c.add[i]` `c.li` `c.addi16sp` `c.lui` `c.and[i]` `c.sub` `c.xor` `c.or` `c.mv` | 2
| ALU | `c.srli` `c.srai` `c.slli` | 3 + 1..32; FAST_SHIFT: 4
| Branches | `c.beqz` `c.bnez` | taken: 6; not taken: 3
| Jumps / calls | `c.jal[r]` `c.j` `c.jr` | 6
| Memory access | `c.lw` `c.sw` `c.lwsp` `c.swsp` | 4
| System | `c.ebreak` | 3
|=======================
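
With the `C` extension enabled, 16-bit and 32-bit instructions can be mixed freely in the instruction stream. The following Python sketch shows how the length of an instruction can be derived from its two lowest bits, assuming the standard RISC-V encoding rule (anything other than `0b11` marks a 16-bit instruction). The helper names are illustrative and do not correspond to any NEORV32 source file.

```python
# Sketch of RISC-V instruction-length detection for a core with the `C` extension.
# Function names are illustrative. Input is a list of 16-bit parcels in fetch order.


def is_compressed(parcel: int) -> bool:
    """A parcel starts a 16-bit instruction unless its two lowest bits are 0b11."""
    return (parcel & 0b11) != 0b11


def split_stream(parcels):
    """Group 16-bit parcels into (size_in_bits, instruction_word) tuples."""
    instructions = []
    index = 0
    while index < len(parcels):
        low = parcels[index] & 0xFFFF
        if is_compressed(low):
            instructions.append((16, low))
            index += 1
        else:
            if index + 1 >= len(parcels):
                raise ValueError("truncated 32-bit instruction")
            high = parcels[index + 1] & 0xFFFF
            # The lower parcel is fetched first (little-endian parcel order).
            instructions.append((32, (high << 16) | low))
            index += 2
    return instructions
```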


==== `E` ISA Extension

The "embedded" ISA extension reduces the size of the general purpose register file from 32 entries to 16 entries to
shrink hardware size. It provides the same instructions as the base `I` ISA extension.

[NOTE]
Due to the reduced register file size an alternate toolchain ABI (`ilp32e*`) is required.


==== `I` ISA Extension

The `I` ISA extension is the base RISC-V integer ISA that is always enabled.

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| ALU | `add[i]` `slt[i]` `slt[i]u` `xor[i]` `or[i]` `and[i]` `sub` `lui` `auipc` | 2
| ALU shifts | `sll[i]` `srl[i]` `sra[i]` | 3 + 1..32; FAST_SHIFT: 4
| Branches | `beq` `bne` `blt` `bge` `bltu` `bgeu` | taken: 6; not taken: 3
| Jump/call | `jal[r]` | 6
| Load/store | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 5
| System | `ecall` `ebreak` | 3
| Data fence | `fence` | 5
| System | `wfi` | 3
| System | `mret` | 5
| Illegal inst. | - | 3
|=======================
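
The cycle counts of the `I` timing table can be used for rough run-time estimates. The following Python sketch is only a back-of-the-envelope model: it assumes that the listed values apply without any additional memory or bus wait states, it leaves out shifts (whose latency depends on the shift amount and on `FAST_SHIFT_EN`), and all names are illustrative.

```python
# Rough cycle estimate based on the `I` timing table.
# Assumption: no additional bus/memory wait states; shift instructions are not modelled.

CYCLES = {
    "alu": 2,
    "branch_taken": 6,
    "branch_not_taken": 3,
    "jump": 6,
    "load_store": 5,
    "fence": 5,
    "system": 3,  # ecall, ebreak, wfi
    "mret": 5,
}


def estimate_cycles(instruction_classes):
    """Sum the table values for a straight-line sequence of instruction classes."""
    return sum(CYCLES[name] for name in instruction_classes)


def loop_cycles(body, iterations):
    """Estimate a loop whose closing branch is taken on all but the last iteration."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    body_cycles = estimate_cycles(body)
    looping = (iterations - 1) * (body_cycles + CYCLES["branch_taken"])
    final = body_cycles + CYCLES["branch_not_taken"]
    return looping + final
```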

.`fence` Instruction
[NOTE]
The _predecessor_ and _successor_ bits of the `fence` instruction word (used for memory ordering) are not evaluated
by the hardware at all. On the NEORV32 the `fence` instruction behaves exactly like the `fence.i` instruction
(see <<_zifencei_isa_extension>>). However, software should still use distinct `fence` and `fence.i` instructions
to remain portable across platforms and to indicate the actual intention of each fence.

.`wfi` Instruction
[NOTE]
The `wfi` instruction is used to enter <<_sleep_mode>>. Executing the `wfi` instruction in user-mode
will raise an illegal instruction exception if the `TW` bit of <<_mstatus>> is set.

.Barrel Shifter
[TIP]
The shift operations are implemented as a multi-cycle ALU co-processor (`rtl/core/neorv32_cpu_cp_shifter.vhd`).
These operations can be accelerated (at the cost of additional logic resources) by enabling the `FAST_SHIFT_EN`
configuration option, which replaces the (time-variant) bit-serial shifter with a (time-constant) barrel shifter.


==== `M` ISA Extension

Hardware-accelerated integer multiplication and division operations are available via the RISC-V `M` ISA extension.
This ISA extension is implemented as a multi-cycle ALU co-processor (`rtl/core/neorv32_cpu_cp_muldiv.vhd`).

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| Multiplication | `mul` `mulh` `mulhsu` `mulhu` | 36; FAST_MUL: 4
| Division | `div` `divu` `rem` `remu` | 36
|=======================
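
The difference between the four multiplication instructions lies in how the operands are interpreted and which half of the 64-bit product is returned. The following Python sketch models this together with the corner cases of signed division, assuming the standard RISC-V `M` semantics. It is an illustrative reference model with made-up function names, not a description of the NEORV32 co-processor.

```python
# Behavioural sketch of RV32 `M` instructions; operands/results are unsigned 32-bit values.

MASK = 0xFFFFFFFF


def to_signed(value: int) -> int:
    """Interpret a 32-bit pattern as a two's complement number."""
    value &= MASK
    return value - (1 << 32) if value & 0x80000000 else value


def mul(a: int, b: int) -> int:
    """Lower 32 bits of the product (identical for signed and unsigned operands)."""
    return (a * b) & MASK


def mulh(a: int, b: int) -> int:
    """Upper 32 bits of signed x signed."""
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK


def mulhsu(a: int, b: int) -> int:
    """Upper 32 bits of signed x unsigned."""
    return ((to_signed(a) * (b & MASK)) >> 32) & MASK


def mulhu(a: int, b: int) -> int:
    """Upper 32 bits of unsigned x unsigned."""
    return (((a & MASK) * (b & MASK)) >> 32) & MASK


def div(a: int, b: int) -> int:
    """Signed division, rounding towards zero, with the RISC-V corner cases."""
    dividend, divisor = to_signed(a), to_signed(b)
    if divisor == 0:
        return MASK  # division by zero returns all ones (-1)
    if dividend == -(1 << 31) and divisor == -1:
        return a & MASK  # signed overflow returns the dividend
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    return quotient & MASK
```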

.DSP Blocks
[TIP]
Multiplication operations can be accelerated (at the cost of additional logic resources) by enabling the `FAST_MUL_EN`
configuration option, which replaces the (time-variant) bit-serial multiplier with (time-constant) FPGA DSP blocks.


==== `U` ISA Extension

In addition to the highest-privileged machine-mode, the user-mode ISA extension adds a second, **less-privileged**
operation mode. Code executed in user-mode has reduced CSR access rights. Furthermore, user-mode accesses to the address space
(like peripheral/IO devices) can be constrained via the physical memory protection.
Any kind of privilege rights violation will raise an exception to allow <<_full_virtualization>>.


==== `X` ISA Extension

The NEORV32-specific ISA extension `X` is always enabled. The most important points of the NEORV32-specific extensions are:

* The CPU provides 16 _fast interrupt_ request lines (`FIRQ`), which are controlled via custom bits in the <<_mie>>
and <<_mip>> CSRs. These extensions are mapped to CSR bits that are available for custom use according to the
RISC-V specs. Also, custom trap codes for <<_mcause>> are implemented.
* All undefined/unimplemented/malformed/illegal instructions raise an illegal instruction exception (see <<_full_virtualization>>).
* There are <<_neorv32_specific_csrs>>.


==== `Zifencei` ISA Extension

The `Zifencei` CPU extension allows manual synchronization of the instruction stream. This extension is always enabled.

.NEORV32 Fence Instructions
[NOTE]
The NEORV32 treats both fence instructions (`fence` = data fence, `fence.i` = instruction fence) in exactly the same way.
Both instructions cause a flush of the CPU's instruction prefetch buffer and also send a fence request via the system
bus (see <<_bus_interface>>). This system bus fence operation will, for example, clear/flush all downstream caches.

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| Instruction fence | `fence.i` | 5
|=======================


==== `Zfinx` ISA Extension

The `Zfinx` floating-point extension is an _alternative_ to the standard `F` floating-point ISA extension.
It uses the integer register file `x` to store and operate on floating-point data
instead of a dedicated floating-point register file. Thus, the `Zfinx` extension requires
fewer hardware resources and features faster context changes. This also implies that there are NO dedicated `f`
register file-related load/store or move instructions. The `Zfinx` extension's floating-point unit is controlled
via dedicated <<_floating_point_csrs>>.
This ISA extension is implemented as a multi-cycle ALU co-processor (`rtl/core/neorv32_cpu_cp_fpu.vhd`).

.Fused Multiply-Add and Division Instructions
[WARNING]
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!

.Subnormal Numbers
[WARNING]
Subnormal numbers ("de-normalized" numbers, i.e. exponent = 0) are not supported by the NEORV32 FPU.
Subnormal numbers are _flushed to zero_, i.e. set to +/- 0, before being processed by **any** FPU operation.
If a computational instruction generates a subnormal result it is also flushed to zero during normalization.

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| Arithmetic | `fadd.s` | 110
| Arithmetic | `fsub.s` | 112
| Arithmetic | `fmul.s` | 22
| Compare | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Conversion | `fcvt.w.s` `fcvt.wu.s` `fcvt.s.w` `fcvt.s.wu` | 48
| Misc | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
|=======================
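
The flush-to-zero handling of subnormal numbers described in this section can be modelled on the bit level. The following Python sketch assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 mantissa bits) and keeps the sign when flushing. The function names are illustrative; the sketch only mimics the documented behaviour and is not derived from the FPU source code.

```python
# Bit-level sketch of "flush to zero" for single-precision subnormal numbers.
import struct


def float_to_bits(value: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of a Python float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def flush_subnormal(bits: int) -> int:
    """Replace a subnormal operand (exponent = 0, mantissa != 0) by +/- 0."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0 and mantissa != 0:
        return bits & 0x80000000  # keep only the sign bit
    return bits
```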


==== `Zicntr` ISA Extension

The `Zicntr` ISA extension adds the basic <<_cycleh>>, <<_mcycleh>>, <<_instreth>> and <<_minstreth>>
counter CSRs. Section <<_machine_counter_and_timer_csrs>> shows a list of all `Zicntr`-related CSRs.

[NOTE]
The user-mode `time[h]` CSRs are **not implemented**. Any access will trap, allowing the trap handler to
retrieve system time from the <<_machine_system_timer_mtime>>.

[NOTE]
This extension is stated as _mandatory_ by the RISC-V spec. However, area-constrained setups may remove
support for these counters.


==== `Zicond` ISA Extension

The `Zicond` ISA extension adds integer conditional move primitives that allow branch-less
control flows to be implemented. It is enabled by the top's `CPU_EXTENSION_RISCV_Zicond` generic.
This ISA extension is implemented as a multi-cycle ALU co-processor (`rtl/core/neorv32_cpu_cp_cond.vhd`).

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| Conditional | `czero.eqz` `czero.nez` | 3
|=======================
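
A branch-less "select" can be composed from the two `Zicond` primitives and a bitwise OR. The following Python sketch models this, assuming the standard RISC-V definition (`czero.eqz` yields zero if the condition operand is zero, `czero.nez` yields zero if it is non-zero). The `select` helper is an illustrative name and not part of any NEORV32 library.

```python
# Behavioural sketch of the `Zicond` primitives (unsigned 32-bit values).

MASK = 0xFFFFFFFF


def czero_eqz(rs1: int, rs2: int) -> int:
    """Return 0 if the condition rs2 is zero, otherwise rs1."""
    return 0 if (rs2 & MASK) == 0 else rs1 & MASK


def czero_nez(rs1: int, rs2: int) -> int:
    """Return 0 if the condition rs2 is non-zero, otherwise rs1."""
    return 0 if (rs2 & MASK) != 0 else rs1 & MASK


def select(condition: int, if_true: int, if_false: int) -> int:
    """Branch-less `condition ? if_true : if_false` using two czero ops and an OR."""
    return czero_eqz(if_true, condition) | czero_nez(if_false, condition)
```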


==== `Zicsr` ISA Extension

This ISA extension provides instructions for accessing the <<_control_and_status_registers_csrs>> as well as further
privileged-architecture extensions. This extension is mandatory and cannot be disabled. Hence, there is no generic
for enabling/disabling this ISA extension.

[NOTE]
If `rd=x0` for the `csrrw[i]` instructions there will be no actual read access to the according CSR.
However, access privileges are still enforced so these instruction variants _do_ cause side-effects
(the RISC-V spec. states that these combinations "shall" not cause any side-effects).

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| System | `csrrw[i]` `csrrs[i]` `csrrc[i]` | 3
|=======================
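
The read-modify-write behaviour of the three CSR access instructions can be summarized in a few lines. The following Python sketch assumes the standard RISC-V semantics (write, set bits, clear bits) and ignores access privileges and read/write side-effects. The function name and the string-based operation selector are purely illustrative.

```python
# Behavioural sketch of the `Zicsr` read-modify-write operations (32-bit values).

MASK = 0xFFFFFFFF


def csr_access(op: str, csr_value: int, operand: int):
    """Return (value written to rd, new CSR value) for one CSR instruction.

    `operand` is the rs1 value, or the zero-extended 5-bit immediate for the
    `csrr*i` variants.
    """
    old = csr_value & MASK
    operand &= MASK
    if op == "csrrw":
        new = operand                   # plain write
    elif op == "csrrs":
        new = old | operand             # set the bits given in the operand
    elif op == "csrrc":
        new = old & ~operand & MASK     # clear the bits given in the operand
    else:
        raise ValueError("unknown CSR operation: " + op)
    return old, new
```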


==== `Zihpm` ISA Extension

In addition to the base counters the NEORV32 CPU provides up to 13 hardware performance monitors (HPM 3..15),
which can be used to benchmark applications. Each HPM consists of an N-bit wide counter (split into a high-word 32-bit
CSR and a low-word 32-bit CSR), where N is defined via the top's
`HPM_CNT_WIDTH` generic, and a corresponding event configuration CSR. The event configuration
CSR defines the architectural events that lead to an increment of the associated HPM counter. See section
<<_hardware_performance_monitors_hpm_csrs>> for a list of all HPM-related CSRs and event configurations.

[TIP]
Auto-increment of the HPMs can be deactivated individually via the <<_mcountinhibit>> CSR.
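
Because each counter is split into a low-word and a high-word CSR, software that needs the full value has to guard against a carry between the two reads. The following Python sketch shows the common "high, low, high again" read sequence. The two callables stand in for the actual CSR reads and `SimulatedCounter` is a hypothetical stand-in for the hardware that exists only to make the sketch runnable.

```python
# Sketch of a carry-safe read of a counter that is split into two 32-bit words.


def read_counter64(read_low, read_high) -> int:
    """Re-read the high word until it is stable around the low-word read."""
    while True:
        high = read_high()
        low = read_low()
        if read_high() == high:
            return (high << 32) | low


class SimulatedCounter:
    """Hypothetical counter model that advances by one on every low-word read."""

    def __init__(self, start: int):
        self.value = start

    def read_low(self) -> int:
        low = self.value & 0xFFFFFFFF
        self.value += 1  # the counter keeps running between the reads
        return low

    def read_high(self) -> int:
        return (self.value >> 32) & 0xFFFFFFFF
```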


==== `Zmmul` ISA Extension

This is a sub-extension of the <<_m_isa_extension>>. It implements only the multiplication operations
of the `M` extension and is intended for size-constrained setups that require hardware-based
integer multiplications but not hardware-based divisions, which will be computed entirely in software.


==== `Zxcfu` ISA Extension

The `Zxcfu` extension is a NEORV32-specific ISA extension. It adds the <<_custom_functions_unit_cfu>> to
the CPU core, which allows custom RISC-V instructions to be added to the processor core.
For detailed information regarding the CFU, its hardware and the according software interface
see section <<_custom_functions_unit_cfu>>.

Software can utilize the custom instructions by using _intrinsics_, which are basically inline assembly functions that
behave like regular C functions but that evaluate to a single custom instruction word (no calling overhead at all).


==== `PMP` ISA Extension

The NEORV32 physical memory protection (PMP, also known as `Smpmp` ISA extension) provides an elementary memory
protection mechanism that can be used to constrain read, write and execute rights of arbitrary memory regions.
The NEORV32 PMP is fully compatible with the RISC-V Privileged Architecture Specifications. In general, the PMP can
**grant permissions to user mode**, which by default has none, and can **revoke permissions from M-mode**, which
by default has full permissions. The PMP is configured via the <<_machine_physical_memory_protection_csrs>>.

Several <<_processor_top_entity_generics>> are provided to fine-tune the CPU's PMP capabilities:

* `PMP_NUM_REGIONS` defines the number of implemented PMP regions
* `PMP_MIN_GRANULARITY` defines the minimal granularity of each region
* `PMP_TOR_MODE_EN` controls the implementation of the top-of-region (TOR) mode
* `PMP_NAP_MODE_EN` controls the implementation of the naturally-aligned-power-of-two (NA4 and NAPOT) modes
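
For the NAPOT mode, region base and size are packed into a single address register. The following Python sketch shows this encoding and the matching configuration byte, assuming the layout defined by the RISC-V Privileged Architecture Specification (the address register holds address bits 33:2; the number of trailing one-bits encodes the region size). The helper names are illustrative, and the sketch does not check the limit imposed by `PMP_MIN_GRANULARITY`.

```python
# Sketch of the RISC-V PMP NAPOT encoding (helper names are illustrative).

PMP_R = 1 << 0        # read permission
PMP_W = 1 << 1        # write permission
PMP_X = 1 << 2        # execute permission
PMP_A_NAPOT = 3 << 3  # address-matching mode field "A" = NAPOT
PMP_L = 1 << 7        # lock bit


def napot_pmpaddr(base: int, size: int) -> int:
    """Encode a naturally aligned power-of-two region of at least 8 bytes."""
    if size < 8 or size & (size - 1):
        raise ValueError("size must be a power of two >= 8")
    if base % size:
        raise ValueError("base must be aligned to size")
    return (base >> 2) | ((size >> 3) - 1)


def napot_region(pmpaddr: int):
    """Decode a NAPOT address register value into (base, size)."""
    ones = 0
    while (pmpaddr >> ones) & 1:
        ones += 1  # count trailing one-bits
    size = 1 << (ones + 3)
    base = (pmpaddr & ~((1 << ones) - 1)) << 2
    return base, size
```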

.PMP Rules when in Debug Mode
[NOTE]
When in debug-mode all PMP rules are ignored, giving the debugger maximum access rights.

[IMPORTANT]
Instruction fetches are issued even when they are denied by a PMP rule. However, the fetched instruction(s)
will not be executed and will not change CPU core state.


==== `Sdext` ISA Extension

This ISA extension enables the RISC-V-compatible "external debug support" by implementing
the CPU "debug mode", which is required for the on-chip debugger.
See section <<_on_chip_debugger_ocd>> / <<_cpu_debug_mode>> for more information.

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| System | `dret` | 5
|=======================


==== `Sdtrig` ISA Extension

This ISA extension implements the RISC-V-compatible "trigger module".
See section <<_on_chip_debugger_ocd>> / <<_trigger_module>> for more information.


<<<
// ####################################################################################################################

include::cpu_cfu.adoc[]


<<<
// ####################################################################################################################
include::cpu_csr.adoc[]


<<<
// ####################################################################################################################
:sectnums:
==== Traps, Exceptions and Interrupts

In this document the following terminology is used (derived from the RISC-V trace specification
available at https://github.com/riscv-non-isa/riscv-trace-spec):

* **exception**: an unusual condition occurring at run time associated (i.e. _synchronous_) with an instruction in a RISC-V hart
* **interrupt**: an external _asynchronous_ event that may cause a RISC-V hart to experience an unexpected transfer of control
* **trap**: the transfer of control to a trap handler caused by either an _exception_ or an _interrupt_

Whenever an exception or interrupt is triggered, the CPU switches to machine-mode (if not already in machine-mode)
and continues operation at the address stored in the <<_mtvec>> CSR. The cause of the trap can be determined via the
<<_mcause>> CSR. A list of all implemented `mcause` values and the according description can be found below in section
<<_neorv32_trap_listing>>. The address that reflects the current program counter when a trap was taken is stored in the
<<_mepc>> CSR. Additional information regarding the cause of the trap can be retrieved from the <<_mtval>> and <<_mtinst>> CSRs.

The traps are prioritized. If several _exceptions_ occur at once only the one with highest priority is triggered
while all remaining exceptions are ignored and discarded. If several _interrupts_ trigger at once, the one with highest priority
is serviced first while the remaining ones stay _pending_. After completing the interrupt handler the interrupt with
the second highest priority will get serviced and so on until no further interrupts are pending.
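
The prioritization rules can be expressed as a small model. The following Python sketch uses the priority order given in the "Prio." column of the <<_neorv32_trap_listing>> (with the `TRAP_CODE_` prefix dropped for brevity) and assumes that each interrupt handler clears its own request. All function names are illustrative.

```python
# Sketch of the trap prioritization: one exception wins, interrupts are serviced in order.

EXCEPTION_PRIORITY = [
    "I_ACCESS", "I_ILLEGAL", "I_MISALIGNED", "MENV_CALL", "UENV_CALL",
    "BREAKPOINT", "S_MISALIGNED", "L_MISALIGNED", "S_ACCESS", "L_ACCESS",
]
INTERRUPT_PRIORITY = ["FIRQ_%d" % n for n in range(16)] + ["MEI", "MSI", "MTI"]


def highest_priority(candidates, order):
    """Return the candidate that appears first in the priority order (or None)."""
    for name in order:
        if name in candidates:
            return name
    return None


def interrupt_service_order(pending):
    """Order in which simultaneously pending interrupts get serviced."""
    remaining = set(pending)
    serviced = []
    while remaining:
        name = highest_priority(remaining, INTERRUPT_PRIORITY)
        if name is None:
            raise ValueError("unknown interrupt source")
        serviced.append(name)
        remaining.discard(name)  # assumption: the handler clears the request
    return serviced
```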

.Interrupts when in User-Mode
[IMPORTANT]
If the core is currently operating in less privileged user-mode, interrupts are globally enabled
even if <<_mstatus>>.mie is cleared.

.Interrupt Signal Requirements - Standard RISC-V Interrupts
[IMPORTANT]
All standard RISC-V interrupt request signals are **high-active**. A request has to stay at high-level
until it is explicitly acknowledged by the CPU software (for example by writing to a specific memory-mapped register).

.Interrupt Signal Requirements - NEORV32-Specific Fast Interrupt Requests
[IMPORTANT]
The NEORV32-specific FIRQ request lines are triggered (= become pending) by a one-shot high-level.

.Instruction Atomicity
[NOTE]
All instructions execute as atomic operations - interrupts can only trigger _between_ consecutive instructions.
Even if there is a permanent interrupt request, exactly one instruction from the interrupted program will be executed before
another interrupt handler can start. This allows program progress even if there are permanent interrupt requests.


:sectnums:
===== Memory Access Exceptions

If a load operation causes any exception, the instruction's destination register is **not written** at all. Furthermore,
exceptions caused by a misaligned memory address or a physical memory protection fault do not trigger a memory access request at all.

For 32-bit-only instructions (= no `C` extension) the misaligned instruction exception is raised if bit 1 of the fetch
address is set (i.e. not on a 32-bit boundary). If the `C` extension is implemented there will **never** be a misaligned
instruction exception at all.
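
The alignment rules can be written down as two simple checks. The following Python sketch is a plain restatement of the conditions given above ("naturally aligned" data accesses and the bit-1 test for instruction fetches). The function names are illustrative.

```python
# Sketch of the alignment checks behind the "misaligned" exceptions.


def data_access_misaligned(address: int, size_bytes: int) -> bool:
    """True if a load/store of 1, 2 or 4 bytes is not naturally aligned."""
    if size_bytes not in (1, 2, 4):
        raise ValueError("size_bytes must be 1, 2 or 4")
    return (address & (size_bytes - 1)) != 0


def fetch_misaligned(address: int, c_extension: bool) -> bool:
    """True if an instruction fetch address raises the misaligned exception."""
    if c_extension:
        return False  # never raised when the `C` extension is implemented
    return (address & 0b10) != 0  # bit 1 set: not on a 32-bit boundary
```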


:sectnums:
===== Custom Fast Interrupt Request Lines

As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
entity signals. These interrupts have custom configuration and status flags in the <<_mie>> and <<_mip>> CSRs and also
provide custom trap codes in <<_mcause>>. These FIRQs are reserved for NEORV32 processor-internal usage only.
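
The relation between a FIRQ channel number, its trap code and its CSR flag can be sketched as follows. The `mcause` values follow the <<_neorv32_trap_listing>> (`0x80000010` for channel 0). The bit position in <<_mie>> and <<_mip>> is an assumption based on the RISC-V convention that the bit index equals the interrupt cause code, so it should be checked against the CSR description. The function names are illustrative.

```python
# Sketch: mapping a FIRQ channel (0..15) to its mcause value and mie/mip bit mask.

FIRQ_COUNT = 16
FIRQ_BASE_CODE = 16  # cause code of FIRQ channel 0 (mcause = 0x80000010)


def _check_channel(channel: int) -> None:
    if not 0 <= channel < FIRQ_COUNT:
        raise ValueError("FIRQ channel must be in 0..15")


def firq_mcause(channel: int) -> int:
    """mcause value: interrupt flag (bit 31) plus the channel's cause code."""
    _check_channel(channel)
    return 0x80000000 | (FIRQ_BASE_CODE + channel)


def firq_mie_mask(channel: int) -> int:
    """Bit mask for mie/mip (assumption: bit index equals the cause code)."""
    _check_channel(channel)
    return 1 << (FIRQ_BASE_CODE + channel)
```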


:sectnums:
===== NEORV32 Trap Listing

The following tables show all traps that are currently supported by the NEORV32 CPU. They also show the prioritization
and the CSR side-effects.

**Table Annotations**

The "Prio." column shows the priority of each trap with the highest priority being 1. The "RTE Trap ID" aliases are
defined by the NEORV32 core library (the runtime environment _RTE_) and can be used in plain C code when interacting
with the pre-defined RTE functions. The <<_mcause>>, <<_mepc>>, <<_mtval>> and <<_mtinst>> columns show the value being
written to the according CSRs when a trap is triggered:

* **I-PC** - address of intercepted instruction (instruction has _not_ been executed yet)
* **PC** - address of instruction that caused the trap (instruction has been executed)
* **ADR** - bad data memory access address that caused the trap
* **INS** - the transformed/decompressed instruction word that caused the trap
* **0** - zero

.NEORV32 Trap Listing
[cols="1,4,8,10,2,2,2"]
[options="header",grid="rows"]
|=======================
| Prio. | `mcause` | RTE Trap ID | Cause | `mepc` | `mtval` | `mtinst`
7+^| **Exceptions** (_synchronous_ to instruction execution)
| 1 | `0x00000001` | `TRAP_CODE_I_ACCESS` | instruction access fault | I-PC | 0 | INS
| 2 | `0x00000002` | `TRAP_CODE_I_ILLEGAL` | illegal instruction | PC | 0 | INS
| 3 | `0x00000000` | `TRAP_CODE_I_MISALIGNED` | instruction address misaligned | PC | 0 | INS
| 4 | `0x0000000b` | `TRAP_CODE_MENV_CALL` | environment call from M-mode | PC | 0 | INS
| 5 | `0x00000008` | `TRAP_CODE_UENV_CALL` | environment call from U-mode | PC | 0 | INS
| 6 | `0x00000003` | `TRAP_CODE_BREAKPOINT` | software breakpoint / trigger firing | PC | 0 | INS
| 7 | `0x00000006` | `TRAP_CODE_S_MISALIGNED` | store address misaligned | PC | ADR | INS
| 8 | `0x00000004` | `TRAP_CODE_L_MISALIGNED` | load address misaligned | PC | ADR | INS
| 9 | `0x00000007` | `TRAP_CODE_S_ACCESS` | store access fault | PC | ADR | INS
| 10 | `0x00000005` | `TRAP_CODE_L_ACCESS` | load access fault | PC | ADR | INS
7+^| **Interrupts** (_asynchronous_ to instruction execution)
| 11 | `0x80000010` | `TRAP_CODE_FIRQ_0` | fast interrupt request channel 0 | I-PC | 0 | 0
| 12 | `0x80000011` | `TRAP_CODE_FIRQ_1` | fast interrupt request channel 1 | I-PC | 0 | 0
| 13 | `0x80000012` | `TRAP_CODE_FIRQ_2` | fast interrupt request channel 2 | I-PC | 0 | 0
| 14 | `0x80000013` | `TRAP_CODE_FIRQ_3` | fast interrupt request channel 3 | I-PC | 0 | 0
| 15 | `0x80000014` | `TRAP_CODE_FIRQ_4` | fast interrupt request channel 4 | I-PC | 0 | 0
| 16 | `0x80000015` | `TRAP_CODE_FIRQ_5` | fast interrupt request channel 5 | I-PC | 0 | 0
| 17 | `0x80000016` | `TRAP_CODE_FIRQ_6` | fast interrupt request channel 6 | I-PC | 0 | 0
| 18 | `0x80000017` | `TRAP_CODE_FIRQ_7` | fast interrupt request channel 7 | I-PC | 0 | 0
| 19 | `0x80000018` | `TRAP_CODE_FIRQ_8` | fast interrupt request channel 8 | I-PC | 0 | 0
| 20 | `0x80000019` | `TRAP_CODE_FIRQ_9` | fast interrupt request channel 9 | I-PC | 0 | 0
| 21 | `0x8000001a` | `TRAP_CODE_FIRQ_10` | fast interrupt request channel 10 | I-PC | 0 | 0
| 22 | `0x8000001b` | `TRAP_CODE_FIRQ_11` | fast interrupt request channel 11 | I-PC | 0 | 0
| 23 | `0x8000001c` | `TRAP_CODE_FIRQ_12` | fast interrupt request channel 12 | I-PC | 0 | 0
| 24 | `0x8000001d` | `TRAP_CODE_FIRQ_13` | fast interrupt request channel 13 | I-PC | 0 | 0
| 25 | `0x8000001e` | `TRAP_CODE_FIRQ_14` | fast interrupt request channel 14 | I-PC | 0 | 0
| 26 | `0x8000001f` | `TRAP_CODE_FIRQ_15` | fast interrupt request channel 15 | I-PC | 0 | 0
| 27 | `0x8000000b` | `TRAP_CODE_MEI` | machine external interrupt (MEI) | I-PC | 0 | 0
| 28 | `0x80000003` | `TRAP_CODE_MSI` | machine software interrupt (MSI) | I-PC | 0 | 0
| 29 | `0x80000007` | `TRAP_CODE_MTI` | machine timer interrupt (MTI) | I-PC | 0 | 0
|=======================
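
A trap handler typically starts by splitting the `mcause` value into its interrupt flag (bit 31) and its cause code. The following Python sketch decodes the values of the trap listing into the corresponding RTE trap IDs. The lookup tables are copied from the listing, while the function name and the `"unknown"` fallback are illustrative choices.

```python
# Sketch: decoding an mcause value into (is_interrupt, RTE trap ID).

EXCEPTIONS = {
    0x0: "TRAP_CODE_I_MISALIGNED",
    0x1: "TRAP_CODE_I_ACCESS",
    0x2: "TRAP_CODE_I_ILLEGAL",
    0x3: "TRAP_CODE_BREAKPOINT",
    0x4: "TRAP_CODE_L_MISALIGNED",
    0x5: "TRAP_CODE_L_ACCESS",
    0x6: "TRAP_CODE_S_MISALIGNED",
    0x7: "TRAP_CODE_S_ACCESS",
    0x8: "TRAP_CODE_UENV_CALL",
    0xB: "TRAP_CODE_MENV_CALL",
}

INTERRUPTS = {
    0x3: "TRAP_CODE_MSI",
    0x7: "TRAP_CODE_MTI",
    0xB: "TRAP_CODE_MEI",
}
# FIRQ channels 0..15 use the cause codes 16..31.
INTERRUPTS.update({16 + n: "TRAP_CODE_FIRQ_%d" % n for n in range(16)})


def decode_mcause(mcause: int):
    """Split mcause into the interrupt flag (bit 31) and the named cause."""
    is_interrupt = bool((mcause >> 31) & 1)
    code = mcause & 0x7FFFFFFF
    table = INTERRUPTS if is_interrupt else EXCEPTIONS
    return is_interrupt, table.get(code, "unknown")
```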

.NEORV32 Trap Description
[cols="<3,<7"]
[options="header",grid="rows"]
|=======================
| Trap ID [C] | Triggered when ...
| `TRAP_CODE_I_ACCESS` | bus timeout, bus access error or <<_pmp_isa_extension,PMP>> rule violation during instruction fetch
| `TRAP_CODE_I_ILLEGAL` | trying to execute an invalid instruction word (malformed or not supported) or on a privilege violation
| `TRAP_CODE_I_MISALIGNED` | fetching a 32-bit instruction word that is not 32-bit-aligned (see note below)
| `TRAP_CODE_MENV_CALL` | executing `ecall` instruction in machine-mode
| `TRAP_CODE_UENV_CALL` | executing `ecall` instruction in user-mode
| `TRAP_CODE_BREAKPOINT` | executing `ebreak` instruction or if <<_trigger_module>> fires
| `TRAP_CODE_S_MISALIGNED` | storing data to an address that is not naturally aligned to the data size (half/word)
| `TRAP_CODE_L_MISALIGNED` | loading data from an address that is not naturally aligned to the data size (half/word)
| `TRAP_CODE_S_ACCESS` | bus timeout, bus access error or <<_pmp_isa_extension,PMP>> rule violation during store data operation
| `TRAP_CODE_L_ACCESS` | bus timeout, bus access error or <<_pmp_isa_extension,PMP>> rule violation during load data operation
| `TRAP_CODE_FIRQ_*` | caused by interrupt-condition of **processor-internal modules**, see <<_neorv32_specific_fast_interrupt_requests>>
| `TRAP_CODE_MEI` | machine external interrupt (via dedicated <<_processor_top_entity_signals>>)
| `TRAP_CODE_MSI` | machine software interrupt (via dedicated <<_processor_top_entity_signals>>)
| `TRAP_CODE_MTI` | machine timer interrupt (internal <<_machine_system_timer_mtime>> or via dedicated <<_processor_top_entity_signals>>)
|=======================

.Resumable Exceptions
[WARNING]
Note that not all exceptions are resumable. For example, the "instruction access fault" exception or the "instruction
address misaligned" exception are not resumable in most cases. These exceptions might indicate a fatal memory hardware failure.