
:sectnums:
== NEORV32 Central Processing Unit (CPU)

The NEORV32 CPU is an area-optimized RISC-V core implementing the `rv32i_zicsr_zifencei` base (privileged) ISA and
supporting several additional/optional ISA extensions. The CPU's micro architecture is based on a von-Neumann
machine built upon a mixture of multi-cycle and pipelined execution schemes.

[NOTE]
This chapter assumes that the reader is familiar with the official
RISC-V _User_ and _Privileged Architecture_ specifications.

**Section Structure**

* <<_risc_v_compatibility>>
* <<_cpu_top_entity_signals>> and <<_cpu_top_entity_generics>>
* <<_architecture>> and <<_full_virtualization>>
* <<_instruction_sets_and_extensions>> and <<_custom_functions_unit_cfu>>
* <<_control_and_status_registers_csrs>>
* <<_traps_exceptions_and_interrupts>>
* <<_bus_interface>>


// ####################################################################################################################
:sectnums:
=== RISC-V Compatibility

The NEORV32 CPU passes the tests of the **official RISCOF RISC-V Architecture Test Framework**. This framework is used to check
RISC-V implementations for compatibility with the official RISC-V user/privileged ISA specifications. The NEORV32 port of this
test framework is available in a separate repository on GitHub: https://github.com/stnolting/neorv32-riscof

.Unsupported ISA Extensions
[TIP]
Executing instructions or accessing CSRs from not yet supported ISA extensions will raise an illegal
instruction exception (see section <<_full_virtualization>>).


**Incompatibility Issues and Limitations**

.`time[h]` CSRs (Wall Clock Time)
[IMPORTANT]
The NEORV32 does not implement the `time[h]` registers. Any access to these registers will trap. It is
recommended that the trap handler software provides a means of accessing the platform-defined <<_machine_system_timer_mtime>>.

.No Hardware Support of Misaligned Memory Accesses
[IMPORTANT]
The CPU does not resolve unaligned memory accesses in hardware (this is not a
RISC-V incompatibility, but it is important to know). Any kind of unaligned memory access
will raise an exception, which allows a _software-based_ emulation provided by the application. Such
accesses can be **emulated** using the NEORV32 runtime environment. See section <<_application_context_handling>>
for more information.

.No Atomic Read-Modify-Write Operations
[IMPORTANT]
The NEORV32 <<_a_isa_extension>> only supports the load-reservate (LR) and store-conditional (SC) instructions.
The remaining read-modify-write operations are not supported. However, these missing instructions can
be emulated. The NEORV32 <<_core_libraries>> provide an emulation wrapper for the missing AMO/read-modify-write
instructions that is based on LR/SC pairs. A demo/program can be found in `sw/example/atomic_test`.


<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Signals

The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir" column shows the signal
direction as seen from the CPU.

.NEORV32 CPU Signal List
[cols="<3,^3,^1,<5"]
[options="header", grid="rows"]
|=======================
| Signal | Width/Type | Dir | Description
4+^| **Global Signals**
| `clk_i`      |           1 | in  | Global clock line, all registers triggering on rising edge, this clock can be switched off during <<_sleep_mode>>
| `clk_aux_i`  |           1 | in  | Always-on clock, used to keep the sleep control active when `clk_i` is switched off
| `rstn_i`     |           1 | in  | Global reset, low-active
| `sleep_o`    |           1 | out | CPU is in <<_sleep_mode>> when set
| `debug_o`    |           1 | out | CPU is in <<_cpu_debug_mode,debug mode>> when set
4+^| **Interrupts (<<_traps_exceptions_and_interrupts>>)**
| `msi_i`      |           1 | in  | RISC-V machine software interrupt
| `mei_i`      |           1 | in  | RISC-V machine external interrupt
| `mti_i`      |           1 | in  | RISC-V machine timer interrupt
| `firq_i`     |          16 | in  | Custom fast interrupt request signals
| `dbi_i`      |           1 | in  | Request CPU to halt and enter debug mode (RISC-V <<_on_chip_debugger_ocd>>)
4+^| **Instruction <<_bus_interface>>**
| `ibus_req_o` | `bus_req_t` | out | Instruction fetch bus request
| `ibus_rsp_i` | `bus_rsp_t` | in  | Instruction fetch bus response
4+^| **Data <<_bus_interface>>**
| `dbus_req_o` | `bus_req_t` | out | Data access (load/store) bus request
| `dbus_rsp_i` | `bus_rsp_t` | in  | Data access (load/store) bus response
|=======================

.Bus Interface Protocol
[TIP]
See section <<_bus_interface>> for the instruction fetch and data access interface protocol and the
according interface types (`bus_req_t` and `bus_rsp_t`).


<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Generics

Most of the CPU configuration generics are a subset of the actual Processor configuration generics
(see section <<_processor_top_entity_generics>>) and are not listed here. However, the CPU provides
some _specific_ generics that are used to configure the CPU for the NEORV32 processor setup. These generics
are assigned by the processor setup only and are not available for user-defined configuration.
The specific generics are listed below.

.Table Abbreviations
[NOTE]
The generic type "suv(x:y)" defines a `std_ulogic_vector(x downto y)`.

.NEORV32 CPU-Exclusive Generic List
[cols="<4,^2,<8"]
[options="header",grid="rows"]
|=======================
| Name | Type | Description
| `CPU_BOOT_ADDR`              | suv(31:0) | CPU reset address. See section <<_address_space>>.
| `CPU_DEBUG_PARK_ADDR`        | suv(31:0) | "Park loop" entry address for the <<_on_chip_debugger_ocd>>, has to be 4-byte aligned.
| `CPU_DEBUG_EXC_ADDR`         | suv(31:0) | "Exception" entry address for the <<_on_chip_debugger_ocd>>, has to be 4-byte aligned.
| `CPU_EXTENSION_RISCV_Sdext`  | boolean   | Implement RISC-V-compatible "debug" CPU operation mode required for the <<_on_chip_debugger_ocd>>.
| `CPU_EXTENSION_RISCV_Sdtrig` | boolean   | Implement RISC-V-compatible trigger module. See section <<_on_chip_debugger_ocd>>.
|=======================


<<<
// ####################################################################################################################
:sectnums:
=== Architecture

image::neorv32_cpu.png[align=center]

The CPU implements a pipelined multi-cycle architecture: each instruction is executed as a series of consecutive
micro-operations. In order to increase performance, the CPU's front-end (instruction fetch) and back-end
(instruction execution) are decoupled via a FIFO (the instruction prefetch buffer). Thus, the front-end can already
fetch new instructions while the back-end is still processing the previously-fetched instructions.

Basically, the CPU's micro architecture is somewhere between a classical pipelined architecture, where each stage
requires exactly one processing cycle (if not stalled) and a classical multi-cycle architecture, which executes
every single instruction (_including_ fetch) in a series of consecutive micro-operations. The combination of these
two design paradigms allows a higher instruction throughput compared to a pure multi-cycle approach (due to
overlapping operation of fetch and execute) at a reduced hardware footprint (due to the multi-cycle concept).

As a von-Neumann machine, the CPU provides independent interfaces for instruction fetch and data access. However,
these two bus interfaces are merged into a single processor-internal bus via a prioritizing bus switch (data accesses
have higher priority). Hence, _all_ memory addresses including peripheral devices are mapped to a single unified 32-bit
<<_address_space>>.

[NOTE]
The CPU does not perform any speculative/out-of-order operations at all. Hence, it is not vulnerable to security issues
caused by speculative execution (like Spectre or Meltdown).


:sectnums:
==== CPU Register File

The data register file contains the general purpose architecture registers `x0` to `x31`. For the `rv32e` ISA only the lower
16 registers are implemented. Register zero (`x0`/`zero`) always reads as zero and any write access to it has no effect.
Up to four individual synchronous read ports allow fetching up to 4 register operands at once. The write and read accesses
are mutually exclusive as they happen in separate cycles. Hence, there is no need to consider things like "read-during-write"
behavior.

The register file provides two different implementation options configured via the top's `REGFILE_HW_RST` generic.

* `REGFILE_HW_RST = false` (default): In this configuration the register file is implemented as a plain memory array without a
dedicated hardware reset. This architecture allows inferring FPGA block RAM for the entire register file, resulting in minimal
logic utilization and optimal timing.
* `REGFILE_HW_RST = true`: This configuration is based on individual FFs that do provide a dedicated hardware reset.
Hence, the register file cannot be mapped to FPGA block RAM. This option should only be selected if the application requires a
reset of the register file (e.g. for security reasons) or if the design shall be synthesized for an **ASIC** implementation.

The state of this configuration generic can be checked by software via the <<_mxisa>> CSR.

.FPGA Implementation
[WARNING]
Enabling the `REGFILE_HW_RST` option for FPGA implementation is not recommended as this will massively increase the amount
of required logic resources.

.Implementation of the `zero` Register within FPGA Block RAM
[NOTE]
Register `zero` is also mapped to a _physical memory location_ within the register file's block RAM. This way, there is no need
to add a further multiplexer to "insert" zero when reading from register `zero`, which reduces logic requirements and shortens the
critical path. However, this also requires that the physical storage bits of register `zero` are explicitly initialized (set
to zero) by the hardware. This is done transparently by the CPU control requiring no additional processing overhead.

.Block RAM Ports
[NOTE]
The default register file configuration uses two access ports: a read-only port for reading register `rs2` (second source operand)
and a read/write port for reading register `rs1` (first source operand) and for writing processing results to register `rd`
(destination register). Hence, a simple dual-port RAM can be used to implement the entire register file. From a functional point
of view, read and write accesses to the register file never occur in the same clock cycle, so no bypass logic is required at all.


:sectnums:
==== CPU Arithmetic Logic Unit

The arithmetic/logic unit (ALU) is used for actual data processing as well as generating memory and branch addresses.
All "simple" <<_i_isa_extension>> computational instructions (like `add` and `or`) are implemented as plain combinatorial logic
requiring only a single cycle to complete. More sophisticated instructions like shift operations or multiplications are processed
by so-called "ALU co-processors".

The co-processors are implemented as iterative units that require several cycles to complete processing. Besides the base ISA's
shift instructions, the co-processors are used to implement all further processing-based ISA extensions (e.g. <<_m_isa_extension>>
and <<_b_isa_extension>>).

.Multi-Cycle Execution Monitor
[NOTE]
The CPU control will raise an illegal instruction exception if a multi-cycle functional unit (like the <<_custom_functions_unit_cfu>>)
does not complete processing within a bounded amount of time (configured via the package's `monitor_mc_tmo_c` constant; default = 512 clock cycles).

.Tuning Options
[TIP]
The ALU architecture can be tuned for an application-specific area-vs-performance trade-off. The `FAST_MUL_EN` and `FAST_SHIFT_EN`
generics can be used to implement performance-optimized DSP blocks and barrel shifters, respectively. See sections <<_i_isa_extension>>,
<<_b_isa_extension>> and <<_m_isa_extension>> for specific examples.

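The area-vs-performance trade-off of the shifter can be illustrated with a small behavioral model. The following is a minimal sketch in C (the real unit is written in VHDL), assuming that the serial shifter moves the operand by one bit position per iteration. The names `serial_sll`, `barrel_sll` and `shift_result_t` are hypothetical and only used for illustration.

```c
#include <stdint.h>

/* Behavioral model of a bit-serial left shifter (illustration only).
 * Assumption: the operand moves by one bit position per iteration. */
typedef struct {
    uint32_t result;     /* shifted value */
    unsigned iterations; /* number of single-bit steps performed */
} shift_result_t;

static shift_result_t serial_sll(uint32_t value, unsigned shamt)
{
    shift_result_t r = { value, 0 };
    shamt &= 31u; /* RV32 only uses the lower 5 bits of the shift amount */
    while (shamt != 0) {
        r.result <<= 1;
        r.iterations++;
        shamt--;
    }
    return r;
}

/* A barrel shifter yields the same result in a single, constant-time step. */
static uint32_t barrel_sll(uint32_t value, unsigned shamt)
{
    return value << (shamt & 31u);
}
```

Both functions return the same value. Only the number of steps differs, which mirrors the time-variant versus time-constant behavior of the two implementation options.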

:sectnums:
==== CPU Bus Unit

The bus unit takes care of handling data memory accesses via load and store instructions. It handles data adjustment when accessing
sub-word data quantities (16-bit or 8-bit) and performs sign-extension for signed load operations. The bus unit also includes the optional
<<_pmp_isa_extension>> that performs permission checks for all data and instruction accesses.

A list of the bus interface signals and a detailed description of the protocol can be found in section <<_bus_interface>>.
All bus interface signals are driven/buffered by registers, so even a complex SoC interconnection bus network will not
affect the maximum operating frequency.

.Unaligned Accesses
[WARNING]
The CPU does not support a hardware-based handling of unaligned memory accesses! Any unaligned access will raise a bus load/store unaligned
address exception. The exception handler can be used to _emulate_ unaligned memory accesses in software.
See the NEORV32 Runtime Environment's <<_application_context_handling>> section for more information.

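The core of any software-based handling of unaligned accesses is to split the access into naturally aligned pieces. The following is a minimal sketch in C, assuming little-endian byte order, that only uses byte accesses (which are always aligned). The function names are hypothetical. This is not the code of the NEORV32 runtime environment.

```c
#include <stdint.h>

/* Read a 32-bit little-endian word from an arbitrary (possibly unaligned)
 * address using four byte loads. */
static uint32_t load_u32_unaligned(const volatile uint8_t *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit little-endian word to an arbitrary (possibly unaligned)
 * address using four byte stores. */
static void store_u32_unaligned(volatile uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```

Application code can call such helpers directly to avoid the exception altogether. A trap handler would perform the same steps after decoding the faulting instruction.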

:sectnums:
==== CPU Control Unit

The CPU control unit is responsible for generating all the control signals for the different CPU modules.
The control unit is split into a "front-end" and a "back-end".


**Front-End**

The front-end is responsible for fetching instructions in chunks of 32 bits. This can be a single aligned 32-bit instruction,
two aligned 16-bit instructions or a mixture of those. The instructions including control and exception information are stored
to a FIFO queue - the instruction prefetch buffer (IPB). This FIFO has a depth of two entries by default but can be customized
via the `ipb_depth_c` VHDL package constant.

The FIFO allows the front-end to do "speculative" instruction fetches, as it keeps fetching the next consecutive instruction
all the time. Decoupling the front-end (instruction fetch) from the back-end (instruction execution) also allows both modules
to operate in parallel, which increases performance. However, all potential side effects that are caused by this "speculative"
instruction fetch are already handled by the CPU front-end, ensuring a defined execution state while preventing
side-channel attacks.


**Back-End**

Instruction data from the instruction prefetch buffer is decompressed (if the `C` ISA extension is enabled) and sent to the
CPU back-end for actual execution. Execution is conducted by a state-machine that controls all of the CPU modules. The back-end also
includes the <<_control_and_status_registers_csrs>> as well as the trap controller.


==== Sleep Mode

The NEORV32 CPU provides a single sleep mode that can be entered to power-down the core reducing
dynamic power consumption. Sleep mode is entered by executing the `wfi` ("wait for interrupt") instruction.

[NOTE]
The `wfi` instruction will raise an illegal instruction exception when executed in user-mode
if `TW` in <<_mstatus>> is set. When executed in debug-mode or during single-stepping `wfi` will behave as
a simple `nop` without entering sleep mode.

After executing the `wfi` instruction the CPU's `sleep_o` signal (<<_cpu_top_entity_signals>>) will become set
as soon as the CPU has fully halted ("CPU is sleeping"):

[start=1]
. The front-end (instruction fetch) is stopped. There is no pending instruction fetch bus access.
. The back-end (instruction execution) is stopped. There is no pending data bus access.
. There is no enabled interrupt pending.

CPU-external modules like memories, timers and peripheral interfaces are not affected by this. Furthermore, the CPU will
continue to buffer/enqueue incoming interrupts. The CPU will leave sleep mode as soon as any _enabled_ (via <<_mie>>)
interrupt source becomes _pending_ or if a debug session is started.

===== Power-Down Mode

Optionally, the sleep mode can also be used to shut down the CPU's main clock to further reduce power consumption
by halting the core's clock tree. This clock gating mode is enabled by the `CLOCK_GATING_EN` generic
(<<_processor_top_entity_generics>>). See section <<_processor_clocking>> for more information.

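From the software side, entering sleep mode boils down to executing a single `wfi` instruction, typically inside an idle loop. The following is a minimal sketch in C. The function names `cpu_sleep_sketch` and `wait_for_event` are hypothetical, and the `__riscv` guard is only there so that the fragment also compiles on a non-RISC-V host (where it does nothing).

```c
/* Execute `wfi` on RISC-V targets; no operation on any other target. */
static inline void cpu_sleep_sketch(void)
{
#if defined(__riscv)
    __asm__ volatile ("wfi");
#endif
}

/* Idle loop: sleep until an interrupt handler has set `*flag`.
 * The flag is re-checked after every wake-up because any enabled
 * interrupt source (not only the expected one) ends sleep mode. */
static void wait_for_event(volatile int *flag)
{
    while (*flag == 0) {
        cpu_sleep_sketch();
    }
}
```

The loop around `wfi` is a deliberate choice: since `wfi` may behave like a `nop` (for example in debug mode), software should not assume that returning from it implies that the awaited event has occurred.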

==== Full Virtualization

Just like the RISC-V ISA, the NEORV32 aims to provide _maximum virtualization_ capabilities on CPU and SoC level to
allow a high standard of **execution safety**. The CPU supports **all** traps specified by the official RISC-V
specifications. Thus, the CPU provides defined hardware fall-backs via traps for any expected and unexpected situations
(e.g. executing a malformed or not supported instruction or accessing a non-allocated memory address). For any kind
of trap the core is always in a defined and fully synchronized state throughout the whole system (i.e. there are no
out-of-order operations that might have to be reverted). This allows a defined and predictable execution behavior
at any time improving overall execution safety.


<<<
// ####################################################################################################################
:sectnums:
=== Bus Interface

The NEORV32 CPU provides separated instruction fetch and data access interfaces making it a **Harvard Architecture**:
the instruction fetch interface (`ibus_*` signals) is used for fetching instructions and the data access interface
(`dbus_*` signals) is used to access data via load and store operations. Each of these interfaces can access an address
space of up to 2^32^ bytes (4 GB).

The bus interface uses two custom interface types: `bus_req_t` is used to propagate the bus access **requests**. These
signals are driven by the _accessing_ device (i.e. the CPU core). `bus_rsp_t` is used to return the bus **response** and
is driven by the _accessed_ device or bus system (i.e. a processor-internal memory or IO device).

.Bus Interface - Request Bus (`bus_req_t`)
[cols="^1,^1,<6"]
[options="header",grid="rows"]
|=======================
| Signal  | Width | Description
| `addr`  |    32 | Access address (byte addressing)
| `data`  |    32 | Write data
| `ben`   |     4 | Byte-enable for each byte in `data`
| `stb`   |     1 | Request trigger ("strobe", single-shot)
| `rw`    |     1 | Access direction (`0` = read, `1` = write)
| `src`   |     1 | Access source (`0` = instruction fetch, `1` = load/store)
| `priv`  |     1 | Set if privileged (M-mode) access
| `rvso`  |     1 | Set if current access is a reservation-set operation (atomic `lr` or `sc` instruction)
| `fence` |     1 | Data/instruction fence operation; valid without `stb` being set
|=======================

.Bus Interface - Response Bus (`bus_rsp_t`)
[cols="^1,^1,<6"]
[options="header",grid="rows"]
|=======================
| Signal | Width | Description
| `data` |    32 | Read data (single-shot)
| `ack`  |     1 | Transfer acknowledge / success (single-shot)
| `err`  |     1 | Transfer error / fail (single-shot)
|=======================

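The relation between `addr`, `data` and `ben` for sub-word stores can be sketched in C. This is an illustrative model only (the real interface types are VHDL records). It assumes that `ben` bit _i_ selects data bits 8_i_+7 down to 8_i_ and that the store data is placed in the selected byte lanes. The actual hardware may drive the unselected lanes differently. `bus_req_model_t` and `make_store_req` are hypothetical names.

```c
#include <stdint.h>

/* C mirror of the request bus fields (illustration only). */
typedef struct {
    uint32_t addr;  /* byte address */
    uint32_t data;  /* write data */
    uint8_t  ben;   /* byte enable, lower 4 bits used */
    uint8_t  stb, rw, src, priv, rvso, fence;
} bus_req_model_t;

/* Fill in a store request for `size` bytes (1, 2 or 4).
 * Returns 0 on success or -1 if size is invalid or addr is unaligned. */
static int make_store_req(bus_req_model_t *req, uint32_t addr,
                          uint32_t value, unsigned size)
{
    unsigned lane = addr & 3u; /* byte lane within the 32-bit word */

    if ((size != 1 && size != 2 && size != 4) || (addr % size) != 0) {
        return -1; /* the CPU would raise a misaligned-address exception */
    }
    if (size < 4) {
        value &= (1u << (8u * size)) - 1u; /* keep only the stored bytes */
    }
    req->addr  = addr;
    req->data  = value << (8u * lane);
    req->ben   = (uint8_t)(((1u << size) - 1u) << lane);
    req->stb   = 1; /* single-shot request trigger */
    req->rw    = 1; /* write access */
    req->src   = 1; /* load/store (not instruction fetch) */
    req->priv  = 1; /* assumption: access issued from M-mode */
    req->rvso  = 0;
    req->fence = 0;
    return 0;
}
```

For example, a byte store to an address ending in `0b01` yields `ben = 0b0010`, and a half-word store to an address ending in `0b10` yields `ben = 0b1100`.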

:sectnums:
==== Bus Interface Protocol

Transactions are triggered entirely by the request bus. A new bus request is initiated by setting the _strobe_
signal `stb` high for exactly one cycle. All remaining signals of the bus are set together with `stb` and will
remain unchanged until the transaction is completed.

The transaction is completed when the accessed device returns a response via the response interface:
`ack` is high for exactly one cycle if the transaction was completed successfully. `err` is high for exactly
one cycle if the transaction failed to complete. These two signals are mutually exclusive. In case of a read
access the read data is returned together with the `ack` signal. Otherwise, the return data signal is
kept at all-zero allowing wired-or interconnection of all response buses.

The figure below shows three exemplary bus accesses:

[start=1]
. A read access to address `A_addr` returning `rdata` after several cycles (slow response; `ACK` arrives after several cycles).
. A write access to address `B_addr` writing `wdata` (fastest response; `ACK` arrives right in the next cycle).
. A failing read access to address `C_addr` (slow response; `ERR` arrives after several cycles).

.Three Exemplary Bus Transactions
image::bus_interface.png[700]

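The handshake described in this section can be reproduced with a small cycle-stepped model. The following is a sketch in C under simplifying assumptions: a single hypothetical device with one storage word, a configurable response latency and an optional error response. All type and function names are made up for illustration. The model is not derived from the RTL.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t data; bool ack; bool err; } bus_rsp_model_t;

/* Hypothetical device with one storage word. */
typedef struct {
    unsigned latency;       /* cycles between stb and response (min. 1) */
    bool     fail;          /* answer with err instead of ack */
    uint32_t mem;           /* storage word */
    unsigned countdown;     /* internal: remaining cycles, 0 = idle */
    bool     pending_rw;    /* internal: latched access direction */
    uint32_t pending_wdata; /* internal: latched write data */
} device_model_t;

/* Advance the device by one clock cycle and return its response. */
static bus_rsp_model_t device_clock(device_model_t *d, bool stb, bool rw,
                                    uint32_t wdata)
{
    bus_rsp_model_t rsp = { 0, false, false }; /* idle: all-zero response */

    if (d->countdown > 0 && --d->countdown == 0) {
        if (d->fail) {
            rsp.err = true;            /* single-shot error */
        } else {
            rsp.ack = true;            /* single-shot acknowledge */
            if (d->pending_rw) {
                d->mem = d->pending_wdata;
            } else {
                rsp.data = d->mem;     /* read data valid together with ack */
            }
        }
    }
    if (stb) { /* a new request is latched in the strobe cycle */
        d->countdown     = d->latency ? d->latency : 1;
        d->pending_rw    = rw;
        d->pending_wdata = wdata;
    }
    return rsp;
}

/* Run one complete transaction. Returns the number of cycles that pass
 * between the strobe cycle and the cycle carrying ack or err. */
static unsigned bus_access(device_model_t *d, bool rw, uint32_t wdata,
                           bus_rsp_model_t *rsp)
{
    unsigned cycles = 0;
    *rsp = device_clock(d, true, rw, wdata);      /* stb high for one cycle */
    while (!rsp->ack && !rsp->err) {
        *rsp = device_clock(d, false, rw, wdata); /* request signals held */
        cycles++;
    }
    return cycles;
}
```

With `latency = 1` the acknowledge arrives in the cycle right after the strobe, which corresponds to the fastest response in the list above. Larger values model the slow read and the failing access.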

:sectnums:
==== Atomic Accesses

The load-reservate (`lr.w`) and store-conditional (`sc.w`) instructions from the <<_a_isa_extension>> execute as standard
load/store bus transactions but with the `rvso` ("reservation set operation") signal being set. It is the task of the
<<_reservation_set_controller>> to handle these LR/SC bus transactions accordingly.

.Reservation Set Controller
[NOTE]
See section <<_address_space>> / <<_reservation_set_controller>> for more information.

.Read-Modify-Write Operations
[IMPORTANT]
Read-modify-write operations (like an atomic swap / `amoswap.w`) are **not** supported. However, the NEORV32
<<_core_libraries>> provide an emulation wrapper for those unsupported instructions that is
based on LR/SC pairs. A demo/program can be found in `sw/example/atomic_test`.

The figure below shows three exemplary bus accesses (1 to 3 from left to right). The `req` signal record represents
the CPU-side of the bus interface. For easier understanding the current state of the reservation set is added as `rvs_valid` signal.

[start=1]
. A load-reservate (LR) instruction using `addr` as address. This instruction returns the loaded data `rdata` via `rsp.data`
and also registers a reservation for the address `addr` (`rvs_valid` becomes set).
. A store-conditional (SC) instruction attempts to write `wdata1` to address `addr`. This SC operation **succeeds**, so
`wdata1` is actually written to address `addr`. The successful operation is indicated by a **0** being returned via
`rsp.data` together with `ack`. As the LR/SC is completed the registered reservation is invalidated (`rvs_valid` becomes cleared).
. Another store-conditional (SC) instruction attempts to write `wdata2` to address `addr`. As the reservation set is already
invalidated (`rvs_valid` is `0`) the store access fails, so `wdata2` is **not** written to address `addr` at all. The failed
operation is indicated by a **1** being returned via `rsp.data` together with `ack`.

.Three Exemplary LR/SC Bus Transactions
image::bus_interface_atomic.png[700]

.SC Status
[NOTE]
The "normal" load data mechanism is used to return success/failure of the `sc.w` instruction to the CPU (via the LSB of `rsp.data`).

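The behavior of the three LR/SC transactions described above can be captured in a small software model. The following C sketch is an illustration under assumptions, not a description of the actual reservation set controller: it tracks a single reservation, compares the full address on `sc`, and lets every `sc` (successful or not) clear the reservation. The names `rvs_model_t`, `model_lr`, `model_sc` and `model_store` are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool     rvs_valid; /* state of the reservation set */
    uint32_t rvs_addr;  /* reserved address */
    uint32_t mem[16];   /* tiny word-organized memory */
} rvs_model_t;

/* Load-reservate: return the data and register a reservation. */
static uint32_t model_lr(rvs_model_t *m, uint32_t addr)
{
    m->rvs_valid = true;
    m->rvs_addr  = addr;
    return m->mem[(addr >> 2) & 15u];
}

/* Store-conditional: returns 0 on success and 1 on failure. */
static uint32_t model_sc(rvs_model_t *m, uint32_t addr, uint32_t wdata)
{
    bool ok = m->rvs_valid && (m->rvs_addr == addr);
    if (ok) {
        m->mem[(addr >> 2) & 15u] = wdata; /* data is only written on success */
    }
    m->rvs_valid = false; /* the LR/SC sequence is finished either way */
    return ok ? 0u : 1u;
}

/* Ordinary store: breaks the reservation if it hits the reserved address. */
static void model_store(rvs_model_t *m, uint32_t addr, uint32_t wdata)
{
    m->mem[(addr >> 2) & 15u] = wdata;
    if (m->rvs_valid && m->rvs_addr == addr) {
        m->rvs_valid = false;
    }
}
```

Calling `model_lr`, then `model_sc` twice on the same address reproduces the sequence from the figure: the first `sc` returns 0 and writes its data, the second one returns 1 and leaves the memory untouched.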

<<<
// ####################################################################################################################
:sectnums:
=== Instruction Sets and Extensions

The NEORV32 CPU provides several optional RISC-V and custom ISA extensions. The extensions can be enabled/configured
via the according <<_processor_top_entity_generics>>. This chapter gives a brief overview of the different ISA extensions.

.NEORV32 Instruction Set Extensions
[cols="<2,<5,<3"]
[options="header",grid="rows"]
|=======================
| Name | Description | <<_processor_top_entity_generics, Enabled by Generic>>
| <<_a_isa_extension,`A`>> | Atomic memory access instructions | `CPU_EXTENSION_RISCV_A`
| <<_b_isa_extension,`B`>> | Bit-manipulation instructions | `CPU_EXTENSION_RISCV_B`
| <<_c_isa_extension,`C`>> | Compressed (16-bit) instructions | `CPU_EXTENSION_RISCV_C`
| <<_e_isa_extension,`E`>> | Embedded CPU extension (reduced register file size) | `CPU_EXTENSION_RISCV_E`
| <<_i_isa_extension,`I`>> | Integer base ISA | Enabled if `CPU_EXTENSION_RISCV_E` is **not** enabled
| <<_m_isa_extension,`M`>> | Integer multiplication and division instructions | `CPU_EXTENSION_RISCV_M`
| <<_u_isa_extension,`U`>> | Less-privileged _user_ mode extension | `CPU_EXTENSION_RISCV_U`
| <<_x_isa_extension,`X`>> | Platform-specific / NEORV32-specific extension | Always enabled
| <<_zifencei_isa_extension,`Zifencei`>> | Instruction stream synchronization instruction | Always enabled
| <<_zfinx_isa_extension,`Zfinx`>> | Floating-point instructions using integer registers | `CPU_EXTENSION_RISCV_Zfinx`
| <<_zicntr_isa_extension,`Zicntr`>> | Base counters extension | `CPU_EXTENSION_RISCV_Zicntr`
| <<_zicond_isa_extension,`Zicond`>> | Integer conditional operations | `CPU_EXTENSION_RISCV_Zicond`
| <<_zicsr_isa_extension,`Zicsr`>> | Control and status register access instructions | Always enabled
| <<_zihpm_isa_extension,`Zihpm`>> | Hardware performance monitors extension | `CPU_EXTENSION_RISCV_Zihpm`
| <<_zmmul_isa_extension,`Zmmul`>> | Integer multiplication-only instructions | `CPU_EXTENSION_RISCV_Zmmul`
| <<_zxcfu_isa_extension,`Zxcfu`>> | Custom / user-defined instructions | `CPU_EXTENSION_RISCV_Zxcfu`
| <<_pmp_isa_extension,`PMP`>> | Physical memory protection extension | `PMP_NUM_REGIONS`
| <<_sdext_isa_extension,`Sdext`>> | External debug support extension | `ON_CHIP_DEBUGGER_EN`
| <<_sdtrig_isa_extension,`Sdtrig`>> | Trigger module extension | `ON_CHIP_DEBUGGER_EN`
|=======================

.RISC-V ISA Specifications
[TIP]
For more information regarding the RISC-V ISA extensions please refer to the "RISC-V Instruction Set Manual - Volume
I: Unprivileged ISA" and "The RISC-V Instruction Set Manual Volume II: Privileged Architecture". A copy of all currently
implemented ISA extensions can be found in the project's `docs/references` folder.

.Discovering ISA Extensions
[TIP]
Software can discover available ISA extensions via the <<_misa>> and <<_mxisa>> CSRs or by executing an instruction
and checking for an illegal instruction exception (i.e. <<_full_virtualization>>).

.ISA Extensions-Specific CSRs
[NOTE]
The <<_control_and_status_registers_csrs>> section lists the according ISA extensions for all CSRs.

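Decoding the standard single-letter extension bits of `misa` is straightforward: according to the RISC-V privileged specification, bit 0 corresponds to `A`, bit 1 to `B` and so on up to bit 25 for `Z`, and the upper two bits of the 32-bit register hold the `MXL` field. The following C sketch shows the decoding step only. The helper names are hypothetical, and the `misa` value itself would have to be obtained with a CSR read instruction on the target.

```c
#include <stdint.h>
#include <stdbool.h>

/* Check whether the single-letter extension `letter` is flagged in `misa`. */
static bool misa_has_extension(uint32_t misa, char letter)
{
    if (letter >= 'a' && letter <= 'z') {
        letter = (char)(letter - 'a' + 'A'); /* accept lower-case letters */
    }
    if (letter < 'A' || letter > 'Z') {
        return false;
    }
    return ((misa >> (letter - 'A')) & 1u) != 0;
}

/* Extract the MXL field (bits 31:30); the value 1 denotes a 32-bit base ISA. */
static unsigned misa_mxl(uint32_t misa)
{
    return misa >> 30;
}
```

Sub-extensions without a dedicated `misa` bit (such as the `Z*` extensions) cannot be detected this way. For those, the <<_mxisa>> CSR or the trap-based probing mentioned above is needed.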

==== `A` ISA Extension

The `A` ISA extension adds instructions and mechanisms for atomic memory access operations. Note that the NEORV32 `A`
only includes the _load-reservate_ (`lr.w`) and _store-conditional_ (`sc.w`) instructions - the remaining read-modify-write
instructions (like `amoswap`) are **not supported**. However, these missing instructions can be emulated using the
LR and SC operations.

.AMO Emulation
[NOTE]
The NEORV32 <<_core_libraries>> provide an emulation wrapper for the missing AMO/read-modify-write instructions that is
based on LR/SC pairs. A demo/program can be found in `sw/example/atomic_test`.

Atomic instructions allow an application to be notified if a certain memory location has been altered by another instance
(like another process running on the same CPU or a DMA access). Hence, they can be used to implement synchronization
mechanisms like mutexes and semaphores.

The NEORV32 `A` extension is enabled via the `CPU_EXTENSION_RISCV_A` generic (see <<_processor_top_entity_generics>>).
When enabled the following additional instructions are available.

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| Load-reservate word    | `lr.w` | 5
| Store-conditional word | `sc.w` | 5
|=======================

The `lr.w` instruction loads one word from a word-aligned address and registers a _reservation set_. The `sc.w`
instruction stores a word to a word-aligned address only if the reservation set is still valid. Furthermore, the
`sc.w` operation returns the state of the reservation set (0 = reservation set still valid, data has been written;
1 = reservation set was broken, no data has been written). The reservation set is invalidated if another `lr.w` instruction
is executed or if any write access to the _reservated_ address takes place. Traps and/or CPU privilege level changes
do not modify current reservation sets.

.`aq` and `rl` Bits
[NOTE]
The instruction word's `aq` and `rl` memory ordering bits are not evaluated by the hardware at all.

.Atomic Memory Access on Hardware Level
[NOTE]
More information regarding the atomic memory accesses and the according reservation
sets can be found in section <<_reservation_set_controller>>.

.Cache Coherency
[IMPORTANT]
Atomic operations **always bypass** the CPU caches using direct/uncached accesses. Care must be taken
to maintain data cache coherency (e.g. by using the `fence` instruction).

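The idea of emulating read-modify-write operations with a retry loop can be sketched in portable C11. This is not the emulation wrapper from the NEORV32 core libraries. It is a generic illustration that uses `atomic_compare_exchange_weak_explicit` from `<stdatomic.h>`. On RISC-V targets, compilers typically lower such a compare-exchange to an `lr.w`/`sc.w` retry loop, but this depends on the toolchain and is stated here as an assumption. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Emulated atomic swap: store new_value and return the previous value. */
static uint32_t emulated_amoswap(_Atomic uint32_t *addr, uint32_t new_value)
{
    uint32_t old = atomic_load_explicit(addr, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               addr, &old, new_value,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* `old` now holds the current memory value; try again */
    }
    return old;
}

/* Emulated atomic add: add `inc` and return the previous value. */
static uint32_t emulated_amoadd(_Atomic uint32_t *addr, uint32_t inc)
{
    uint32_t old = atomic_load_explicit(addr, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               addr, &old, old + inc,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* the sum is recomputed from the refreshed `old` on every retry */
    }
    return old;
}
```

The structure is the same as for a hand-written LR/SC sequence: read the old value, compute the new one, attempt the conditional store and repeat if another access interfered in between.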

==== `B` ISA Extension

The `B` ISA extension adds instructions for bit-manipulation operations.
This ISA extension is implemented as multi-cycle ALU co-processor (`rtl/core/neorv32_cpu_cp_bitmanip.vhd`).
The NEORV32 `B` ISA extension includes the following sub-extensions:

* `Zba` - Address-generation instructions
* `Zbb` - Basic bit-manipulation instructions
* `Zbc` - Carry-less multiplication instructions
* `Zbs` - Single-bit instructions

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| Arithmetic/logic    | `min[u]` `max[u]` `sext.b` `sext.h` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 4
| Shifts              | `clz` `ctz`                                                                                       | 3 + 1..32; FAST_SHIFT: 4
| Shifts              | `cpop`                                                                                            | 36; FAST_SHIFT: 4
| Shifts              | `rol` `ror[i]`                                                                                    | 4 + _shift_amount_; FAST_SHIFT: 4
| Shifted-add         | `sh1add` `sh2add` `sh3add`                                                                        | 4
| Single-bit          | `bset[i]` `bclr[i]` `binv[i]` `bext[i]`                                                           | 4
| Carry-less multiply | `clmul` `clmulh` `clmulr`                                                                         | 36
|=======================

.Barrel Shifter
[TIP]
Shift operations can be accelerated (at the cost of additional logic resources) by enabling the `FAST_SHIFT_EN`
configuration option that will replace the (time-variant) bit-serial shifter by a (time-constant) barrel shifter.

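As a reference for the less common carry-less multiplication instructions, the following C sketch implements the 32-bit semantics of `clmul` and `clmulh` as defined by the RISC-V `Zbc` specification. The function names are hypothetical, and the code is a plain software model that is useful for cross-checking results. It does not describe how the co-processor is built internally.

```c
#include <stdint.h>

/* Lower 32 bits of the carry-less product of rs1 and rs2. */
static uint32_t clmul32(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = 0;
    for (unsigned i = 0; i < 32; i++) {
        if ((rs2 >> i) & 1u) {
            x ^= rs1 << i; /* XOR instead of add: no carries propagate */
        }
    }
    return x;
}

/* Upper 32 bits of the carry-less product of rs1 and rs2. */
static uint32_t clmulh32(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = 0;
    for (unsigned i = 1; i < 32; i++) {
        if ((rs2 >> i) & 1u) {
            x ^= rs1 >> (32u - i);
        }
    }
    return x;
}
```

For example, `clmul32(3, 3)` evaluates to 5 (binary `11` times `11` without carries is `101`), whereas an ordinary multiplication would give 9.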

==== `C` ISA Extension

The "compressed" ISA extension provides 16-bit encodings of commonly used instructions to reduce code space size.

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| ALU           | `c.addi4spn` `c.nop` `c.add[i]` `c.li` `c.addi16sp` `c.lui` `c.and[i]` `c.sub` `c.xor` `c.or` `c.mv` | 2
| ALU           | `c.srli` `c.srai` `c.slli`                                                                           | 3 + 1..32; FAST_SHIFT: 4
| Branches      | `c.beqz` `c.bnez`                                                                                    | taken: 6; not taken: 3
| Jumps / calls | `c.jal[r]` `c.j` `c.jr`                                                                              | 6
| Memory access | `c.lw` `c.sw` `c.lwsp` `c.swsp`                                                                      | 4
| System        | `c.ebreak`                                                                                           | 3
|=======================


==== `E` ISA Extension

The "embedded" ISA extension reduces the size of the general purpose register file from 32 entries to 16 entries to
shrink hardware size. It provides the same instructions as the base `I` ISA extension.

[NOTE]
Due to the reduced register file size an alternate toolchain ABI (`ilp32e*`) is required.


==== `I` ISA Extension

The `I` ISA extension is the base RISC-V integer ISA that is always enabled.

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| ALU           | `add[i]` `slt[i]` `slt[i]u` `xor[i]` `or[i]` `and[i]` `sub` `lui` `auipc` | 2
| ALU shifts    | `sll[i]` `srl[i]` `sra[i]`                                                | 3 + 1..32; FAST_SHIFT: 4
| Branches      | `beq` `bne` `blt` `bge` `bltu` `bgeu`                                     | taken: 6; not taken: 3
| Jump/call     | `jal[r]`                                                                  | 6
| Load/store    | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw`                                 | 5
| System        | `ecall` `ebreak`                                                          | 3
| Data fence    | `fence`                                                                   | 5
| System        | `wfi`                                                                     | 3
| System        | `mret`                                                                    | 5
| Illegal inst. | -                                                                         | 3
|=======================

.`fence` Instruction
[NOTE]
The `fence` instruction word's _predecessor_ and _successor_ bits (used for memory ordering) are not evaluated
by the hardware at all. On the NEORV32, the `fence` instruction behaves exactly like the `fence.i` instruction
(see <<_zifencei_isa_extension>>). However, software should still use distinct `fence` and `fence.i` instructions to
remain portable across platforms and to indicate the actual intention of the respective fence instruction(s).

.`wfi` Instruction
[NOTE]
The `wfi` instruction is used to enter <<_sleep_mode>>. Executing the `wfi` instruction in user-mode
will raise an illegal instruction exception if the `TW` bit of <<_mstatus>> is set.

.Barrel Shifter
[TIP]
The shift operations are implemented as a multi-cycle ALU co-processor (`rtl/core/neorv32_cpu_cp_shifter.vhd`).
These operations can be accelerated (at the cost of additional logic resources) by enabling the `FAST_SHIFT_EN`
configuration option that will replace the (time-variant) bit-serial shifter by a (time-constant) barrel shifter.


==== `M` ISA Extension

Hardware-accelerated integer multiplication and division operations are available via the RISC-V `M` ISA extension.
This ISA extension is implemented as a multi-cycle ALU co-processor (`rtl/core/neorv32_cpu_cp_muldiv.vhd`).

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| Multiplication | `mul` `mulh` `mulhsu` `mulhu` | 36; FAST_MUL: 4
| Division       | `div` `divu` `rem` `remu`     | 36
|=======================

.DSP Blocks
[TIP]
Multiplication operations can be accelerated (at the cost of additional logic resources) by enabling the `FAST_MUL_EN`
configuration option that will replace the (time-variant) bit-serial multiplier by (time-constant) FPGA DSP blocks.


==== `U` ISA Extension

In addition to the highest-privileged machine-mode, the user-mode ISA extension adds a second **less-privileged**
operation mode. Code executed in user-mode has reduced CSR access rights. Furthermore, user-mode accesses to the address space
(like peripheral/IO devices) can be constrained via the physical memory protection.
Any kind of privilege rights violation will raise an exception to allow <<_full_virtualization>>.


==== `X` ISA Extension

The NEORV32-specific ISA extension `X` is always enabled. The most important points of this extension are:

* The CPU provides 16 _fast interrupt request_ (`FIRQ`) lines, which are controlled via custom bits in the <<_mie>>
and <<_mip>> CSRs. These extensions are mapped to CSR bits that are available for custom use according to the
RISC-V specs. Also, custom trap codes for <<_mcause>> are implemented.
* All undefined/unimplemented/malformed/illegal instructions raise an illegal instruction exception (see <<_full_virtualization>>).
* There are <<_neorv32_specific_csrs>>.


==== `Zifencei` ISA Extension

The `Zifencei` CPU extension allows manual synchronization of the instruction stream. This extension is always enabled.

.NEORV32 Fence Instructions
[NOTE]
The NEORV32 treats both fence instructions (`fence` = data fence, `fence.i` = instruction fence) in exactly the same way.
Both instructions cause a flush of the CPU's instruction prefetch buffer and also send a fence request via the system
bus (see <<_bus_interface>>). This system bus fence operation will, for example, clear/flush all downstream caches.

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| Instruction fence | `fence.i` | 5
|=======================


==== `Zfinx` ISA Extension

The `Zfinx` floating-point extension is an _alternative_ to the standard `F` floating-point ISA extension.
It uses the integer register file `x` to store and operate on floating-point data
instead of a dedicated floating-point register file. Thus, the `Zfinx` extension requires
fewer hardware resources and features faster context changes. This also implies that there are NO dedicated `f`
register file-related load/store or move instructions. The `Zfinx` extension's floating-point unit is controlled
via dedicated <<_floating_point_csrs>>.
This ISA extension is implemented as a multi-cycle ALU co-processor (`rtl/core/neorv32_cpu_cp_fpu.vhd`).

.Fused Multiply-Add and Division Instructions
[WARNING]
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!

.Subnormal Numbers
[WARNING]
Subnormal numbers ("de-normalized" numbers, i.e. exponent = 0) are not supported by the NEORV32 FPU.
Subnormal numbers are _flushed to zero_ setting them to +/- 0 before being processed by **any** FPU operation.
If a computational instruction generates a subnormal result it is also flushed to zero during normalization.
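
Because `Zfinx` keeps floating-point data in the integer register file, the flush-to-zero behavior described above
can be illustrated directly on raw 32-bit patterns. The following C sketch is a behavioral model only, not the
FPU's actual implementation; the function name `flush_subnormal_to_zero` is hypothetical.

```c
#include <stdint.h>

/* Flush a subnormal single-precision bit pattern to a signed zero.
 * Subnormal: exponent field (bits 30:23) is zero, mantissa (bits 22:0) is non-zero. */
uint32_t flush_subnormal_to_zero(uint32_t bits) {
  const uint32_t exponent = (bits >> 23) & 0xFFu;
  const uint32_t mantissa = bits & 0x007FFFFFu;
  if ((exponent == 0u) && (mantissa != 0u)) {
    return bits & 0x80000000u; /* keep only the sign bit: +0 or -0 */
  }
  return bits; /* normal numbers, zeros, infinities and NaNs pass through unchanged */
}
```

In this model the smallest positive subnormal `0x00000001` becomes `+0` (`0x00000000`), while the smallest
positive normal number `0x00800000` is left untouched.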

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| Arithmetic | `fadd.s`                                      | 110
| Arithmetic | `fsub.s`                                      | 112
| Arithmetic | `fmul.s`                                      | 22
| Compare    | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s`     | 13
| Conversion | `fcvt.w.s` `fcvt.wu.s` `fcvt.s.w` `fcvt.s.wu` | 48
| Misc       | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s`    | 12
|=======================


==== `Zicntr` ISA Extension

The `Zicntr` ISA extension adds the basic <<_cycleh>>, <<_mcycleh>>, <<_instreth>> and <<_minstreth>>
counter CSRs. Section <<_machine_counter_and_timer_csrs>> shows a list of all `Zicntr`-related CSRs.

[NOTE]
The user-mode `time[h]` CSRs are **not implemented**. Any access will trap allowing the trap handler to
retrieve system time from the <<_machine_system_timer_mtime>>.

[NOTE]
This extension is stated as _mandatory_ by the RISC-V spec. However, area-constrained setups may remove
support for these counters.


==== `Zicond` ISA Extension

The `Zicond` ISA extension adds integer conditional move primitives that allow implementing branch-less
control flow. It is enabled by the top's `CPU_EXTENSION_RISCV_Zicond` generic.
This ISA extension is implemented as a multi-cycle ALU co-processor (`rtl/core/neorv32_cpu_cp_cond.vhd`).

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| Conditional | `czero.eqz` `czero.nez` | 3
|=======================


==== `Zicsr` ISA Extension

This ISA extension provides instructions for accessing the <<_control_and_status_registers_csrs>> as well as further
privileged-architecture extensions. This extension is mandatory and cannot be disabled. Hence, there is no generic
for enabling/disabling this ISA extension.

[NOTE]
If `rd=x0` for the `csrrw[i]` instructions there will be no actual read access to the respective CSR.
However, access privileges are still enforced so these instruction variants _do_ cause side-effects
(the RISC-V spec. states that these combinations "shall" not cause any side-effects).

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| System | `csrrw[i]` `csrrs[i]` `csrrc[i]` | 3
|=======================


==== `Zihpm` ISA Extension

In addition to the base counters the NEORV32 CPU provides up to 13 hardware performance monitors (HPM 3..15),
which can be used to benchmark applications. Each HPM consists of an N-bit wide counter (split in a high-word 32-bit
CSR and a low-word 32-bit CSR), where N is defined via the top's
`HPM_CNT_WIDTH` generic and a corresponding event configuration CSR. The event configuration
CSR defines the architectural events that lead to an increment of the associated HPM counter. See section
<<_hardware_performance_monitors_hpm_csrs>> for a list of all HPM-related CSRs and event configurations.

[TIP]
Auto-increment of the HPMs can be deactivated individually via the <<_mcountinhibit>> CSR.
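
Since each counter is split into a high-word and a low-word CSR, software has to take care of a possible low-word
overflow between the two read accesses. A common approach is to read the high word twice. The following C sketch
illustrates this; the `csr_read_fn` type and all function names are hypothetical, and the `sim_*` functions are a
host-side stand-in for the actual CSR read instructions (for example of `mhpmcounter3h` and `mhpmcounter3`).

```c
#include <stdint.h>

/* Hypothetical accessor type: returns the current value of one 32-bit counter CSR. */
typedef uint32_t (*csr_read_fn)(void);

/* Read a counter that is split into a high-word and a low-word CSR.
 * The high word is read twice; if it changed, the low word overflowed
 * between the reads and the whole sequence is repeated. */
uint64_t read_split_counter(csr_read_fn read_high, csr_read_fn read_low) {
  uint32_t hi, lo, hi_check;
  do {
    hi       = read_high();
    lo       = read_low();
    hi_check = read_high();
  } while (hi != hi_check);
  return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Simulated counter (test double) that overflows its low word on the first low-word read. */
static uint64_t sim_counter = 0x00000001FFFFFFFFull;

uint32_t sim_read_high(void) {
  return (uint32_t)(sim_counter >> 32);
}

uint32_t sim_read_low(void) {
  const uint32_t lo = (uint32_t)sim_counter;
  sim_counter++; /* the counter keeps running while software reads it */
  return lo;
}
```

With the simulated counter the first pass observes a changed high word and is discarded; the second pass returns a
consistent 64-bit value.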


==== `Zmmul` ISA Extension

This is a sub-extension of the <<_m_isa_extension>>. It implements only the multiplication operations
of the `M` extension and is intended for size-constrained setups that require hardware-based
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
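
When only `Zmmul` is available, division operators in C code are typically resolved by the toolchain into calls to
library routines (for GCC-based toolchains, libgcc helpers such as `__udivsi3`). To give an idea of the work such a
routine has to do, the following sketch shows a simple bit-serial restoring division. It is an illustration only
and not the implementation used by any particular library; the function name `soft_udiv32` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-serial restoring division: one quotient bit is computed per iteration.
 * Division by zero returns an all-ones quotient and the dividend as remainder,
 * mirroring the results the RISC-V `divu`/`remu` instructions define for that case. */
uint32_t soft_udiv32(uint32_t dividend, uint32_t divisor, uint32_t *remainder) {
  uint32_t quotient = 0u;
  uint64_t rem = 0u; /* wider than 32 bit so the shift below cannot overflow */

  if (divisor == 0u) {
    if (remainder != NULL) { *remainder = dividend; }
    return 0xFFFFFFFFu;
  }

  for (int i = 31; i >= 0; i--) {
    rem = (rem << 1) | ((dividend >> i) & 1u); /* shift in the next dividend bit */
    if (rem >= divisor) {                      /* trial subtraction succeeds */
      rem -= divisor;
      quotient |= (1u << i);
    }
  }

  if (remainder != NULL) { *remainder = (uint32_t)rem; }
  return quotient;
}
```

The loop always runs for 32 iterations, which is why setups with many divisions may prefer the full `M` extension.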


==== `Zxcfu` ISA Extension

`Zxcfu` is a NEORV32-specific ISA extension. It adds the <<_custom_functions_unit_cfu>> to
the CPU core, which allows adding custom RISC-V instructions to the processor core.
For detailed information regarding the CFU, its hardware and the according software interface
see section <<_custom_functions_unit_cfu>>.

Software can utilize the custom instructions by using _intrinsics_, which are basically inline assembly functions that
behave like regular C functions but that evaluate to a single custom instruction word (no calling overhead at all).
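
Such an intrinsic ultimately evaluates to one 32-bit instruction word. As a host-side illustration, the sketch
below assembles a standard RISC-V R-type instruction word from its fields. The helper name `encode_r_type` is
hypothetical, and using the RISC-V `custom-0` major opcode here is an assumption made for this example only; see
section <<_custom_functions_unit_cfu>> for the instruction formats the CFU actually provides.

```c
#include <stdint.h>

#define OPCODE_CUSTOM0 0x0Bu /* RISC-V "custom-0" major opcode (0b0001011) */

/* Assemble a RISC-V R-type instruction word:
 * funct7[31:25] | rs2[24:20] | rs1[19:15] | funct3[14:12] | rd[11:7] | opcode[6:0] */
uint32_t encode_r_type(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                       uint32_t funct3, uint32_t rd, uint32_t opcode) {
  return ((funct7 & 0x7Fu) << 25)
       | ((rs2    & 0x1Fu) << 20)
       | ((rs1    & 0x1Fu) << 15)
       | ((funct3 & 0x07u) << 12)
       | ((rd     & 0x1Fu) << 7)
       |  (opcode & 0x7Fu);
}
```

For instance, `encode_r_type(0, 2, 1, 0, 3, OPCODE_CUSTOM0)` describes a custom operation that reads `x1` and `x2`
and writes its result to `x3`.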


==== `PMP` ISA Extension

The NEORV32 physical memory protection (PMP, also known as `Smpmp` ISA extension) provides an elementary memory
protection mechanism that can be used to constrain read, write and execute rights of arbitrary memory regions.
The NEORV32 PMP is fully compatible with the RISC-V Privileged Architecture Specifications. In general, the PMP can
**grant permissions to user mode**, which by default has none, and can **revoke permissions from M-mode**, which
by default has full permissions. The PMP is configured via the <<_machine_physical_memory_protection_csrs>>.

Several <<_processor_top_entity_generics>> are provided to fine-tune the CPU's PMP capabilities:

* `PMP_NUM_REGIONS` defines the number of implemented PMP regions
* `PMP_MIN_GRANULARITY` defines the minimal granularity of each region
* `PMP_TOR_MODE_EN` controls the implementation of the top-of-region (TOR) mode
* `PMP_NAP_MODE_EN` controls the implementation of the naturally-aligned-power-of-two (NA4 and NAPOT) modes
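
When the NAPOT mode is used, the region's base address and size are both encoded in a single `pmpaddr` CSR value as
defined by the RISC-V privileged specification. The following C sketch computes this value; the helper name
`pmp_napot_addr` is hypothetical and the function does not check the size against the configured
`PMP_MIN_GRANULARITY`, which the caller is assumed to respect.

```c
#include <stdint.h>

/* Compute the `pmpaddr` value for a NAPOT region.
 * base: region start address, has to be aligned to size
 * size: region size in bytes, has to be a power of two and at least 8
 * Returns 1 on success and 0 if the arguments do not describe a valid NAPOT region. */
int pmp_napot_addr(uint32_t base, uint32_t size, uint32_t *pmpaddr) {
  if ((size < 8u) || ((size & (size - 1u)) != 0u)) {
    return 0; /* too small or not a power of two */
  }
  if ((base & (size - 1u)) != 0u) {
    return 0; /* base is not naturally aligned */
  }
  /* pmpaddr holds the address shifted right by 2; the trailing one-bits encode the size */
  *pmpaddr = (base >> 2) | ((size >> 3) - 1u);
  return 1;
}
```

A 4 KiB region starting at `0x80000000`, for example, results in a `pmpaddr` value of `0x200001ff`.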

.PMP Rules when in Debug Mode
[NOTE]
When in debug-mode all PMP rules are ignored, giving the debugger maximum access rights.

[IMPORTANT]
Instruction fetches are still issued even if they are denied by a PMP rule. However, the fetched instruction(s)
will not be executed and will not change CPU core state.


==== `Sdext` ISA Extension

This ISA extension enables the RISC-V-compatible "external debug support" by implementing
the CPU "debug mode", which is required for the on-chip debugger.
See section <<_on_chip_debugger_ocd>> / <<_cpu_debug_mode>> for more information.

.Instructions and Timing
[cols="<2,<4,<3"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| System | `dret` | 5
|=======================

==== `Sdtrig` ISA Extension

This ISA extension implements the RISC-V-compatible "trigger module".
See section <<_on_chip_debugger_ocd>> / <<_trigger_module>> for more information.


<<<
// ####################################################################################################################

include::cpu_cfu.adoc[]


<<<
// ####################################################################################################################
include::cpu_csr.adoc[]


<<<
// ####################################################################################################################
:sectnums:
==== Traps, Exceptions and Interrupts

In this document the following terminology is used (derived from the RISC-V trace specification
available at https://github.com/riscv-non-isa/riscv-trace-spec):

* **exception**: an unusual condition occurring at run time associated (i.e. _synchronous_) with an instruction in a RISC-V hart
* **interrupt**: an external _asynchronous_ event that may cause a RISC-V hart to experience an unexpected transfer of control
* **trap**: the transfer of control to a trap handler caused by either an _exception_ or an _interrupt_

Whenever an exception or interrupt is triggered, the CPU switches to machine-mode (if not already in machine-mode)
and continues operation at the address stored in the <<_mtvec>> CSR. The cause of the trap can be determined via the
<<_mcause>> CSR. A list of all implemented `mcause` values and the according description can be found below in section
<<_neorv32_trap_listing>>. The address that reflects the current program counter when a trap was taken is stored in the
<<_mepc>> CSR. Additional information regarding the cause of the trap can be retrieved from the <<_mtval>> and <<_mtinst>> CSRs.

The traps are prioritized. If several _exceptions_ occur at once only the one with highest priority is triggered
while all remaining exceptions are ignored and discarded. If several _interrupts_ trigger at once, the one with highest priority
is serviced first while the remaining ones stay _pending_. After completing the interrupt handler the interrupt with
the second highest priority will get serviced and so on until no further interrupts are pending.
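
The interrupt arbitration can be modelled in a few lines of C. The sketch assumes the usual RISC-V convention that
an interrupt's `mcause` code equals its bit position in <<_mip>> and <<_mie>> (FIRQ channels 0..15 in bits 16..31,
MSI in bit 3, MTI in bit 7, MEI in bit 11) and uses the priority order from section <<_neorv32_trap_listing>>.
The function name is hypothetical and the code is a behavioral illustration, not the hardware implementation.

```c
#include <stdint.h>

/* Return the mcause value of the highest-priority interrupt that is pending and enabled,
 * or 0 if there is none. Priority order: FIRQ0 ... FIRQ15, MEI, MSI, MTI. */
uint32_t next_interrupt_cause(uint32_t mip, uint32_t mie) {
  const uint32_t pending = mip & mie;

  for (uint32_t code = 16u; code <= 31u; code++) { /* FIRQ channels 0..15 */
    if ((pending & (1u << code)) != 0u) {
      return 0x80000000u | code;
    }
  }
  if ((pending & (1u << 11)) != 0u) { return 0x8000000Bu; } /* machine external interrupt */
  if ((pending & (1u << 3))  != 0u) { return 0x80000003u; } /* machine software interrupt */
  if ((pending & (1u << 7))  != 0u) { return 0x80000007u; } /* machine timer interrupt */
  return 0u; /* nothing to service */
}
```

Calling the function again after the serviced interrupt's pending bit has been cleared yields the interrupt with
the next-highest priority.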

.Interrupts when in User-Mode
[IMPORTANT]
If the core is currently operating in less privileged user-mode, interrupts are globally enabled
even if <<_mstatus>>.mie is cleared.

.Interrupt Signal Requirements - Standard RISC-V Interrupts
[IMPORTANT]
All standard RISC-V interrupt request signals are **high-active**. A request has to stay at high-level
until it is explicitly acknowledged by the CPU software (for example by writing to a specific memory-mapped register).

.Interrupt Signal Requirements - NEORV32-Specific Fast Interrupt Requests
[IMPORTANT]
The NEORV32-specific FIRQ request lines are triggered (= becoming pending) by a one-shot high-level.

.Instruction Atomicity
[NOTE]
All instructions execute as atomic operations - interrupts can only trigger _between_ consecutive instructions.
Even if there is a permanent interrupt request, exactly one instruction from the interrupted program will be executed before
another interrupt handler can start. This allows program progress even if there are permanent interrupt requests.


:sectnums:
===== Memory Access Exceptions

If a load operation causes any exception, the instruction's destination register is **not written** at all. Furthermore,
exceptions caused by a misaligned memory address or a physical memory protection fault do not trigger a memory access request at all.

For 32-bit-only instructions (= no `C` extension) the misaligned instruction exception is raised if bit 1 of the fetch
address is set (i.e. not on a 32-bit boundary). If the `C` extension is implemented there will **never** be a misaligned
instruction exception at all.
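
The alignment rules can be summarized as two small predicates. The following C sketch is a behavioral model with
hypothetical function names; it assumes that data accesses have a size of 1, 2 or 4 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* A data access is misaligned if the address is not a multiple of the access size
 * (size_bytes = 1, 2 or 4 for byte, half-word and word accesses). */
bool data_access_misaligned(uint32_t addr, uint32_t size_bytes) {
  return (addr & (size_bytes - 1u)) != 0u;
}

/* Without the `C` extension a fetch address is misaligned if bit 1 is set;
 * with the `C` extension this exception is never raised. */
bool fetch_misaligned(uint32_t addr, bool c_extension_enabled) {
  return (!c_extension_enabled) && ((addr & 0x2u) != 0u);
}
```

Byte accesses can never be misaligned in this model because every address is a multiple of one.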


:sectnums:
===== Custom Fast Interrupt Request Lines

As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
entity signals. These interrupts have custom configuration and status flags in the <<_mie>> and <<_mip>> CSRs and also
provide custom trap codes in <<_mcause>>. These FIRQs are reserved for NEORV32 processor-internal usage only.


:sectnums:
===== NEORV32 Trap Listing

The following tables show all traps that are currently supported by the NEORV32 CPU. They also show the prioritization
and the CSR side-effects.

**Table Annotations**

The "Prio." column shows the priority of each trap with the highest priority being 1. The "RTE Trap ID" aliases are
defined by the NEORV32 core library (the runtime environment _RTE_) and can be used in plain C code when interacting
with the pre-defined RTE functions. The <<_mcause>>, <<_mepc>>, <<_mtval>> and <<_mtinst>> columns show the value being
written to the according CSRs when a trap is triggered:

* **I-PC** - address of intercepted instruction (instruction has _not_ been executed yet)
* **PC** - address of instruction that caused the trap (instruction has been executed)
* **ADR** - bad data memory access address that caused the trap
* **INS** - the transformed/decompressed instruction word that caused the trap
* **0** - zero

.NEORV32 Trap Listing
[cols="1,4,8,10,2,2,2"]
[options="header",grid="rows"]
|=======================
| Prio. | `mcause`     | RTE Trap ID              | Cause                                | `mepc` | `mtval` | `mtinst`
7+^| **Exceptions** (_synchronous_ to instruction execution)
| 1     | `0x00000001` | `TRAP_CODE_I_ACCESS`     | instruction access fault             | I-PC   | 0       | INS
| 2     | `0x00000002` | `TRAP_CODE_I_ILLEGAL`    | illegal instruction                  | PC     | 0       | INS
| 3     | `0x00000000` | `TRAP_CODE_I_MISALIGNED` | instruction address misaligned       | PC     | 0       | INS
| 4     | `0x0000000b` | `TRAP_CODE_MENV_CALL`    | environment call from M-mode         | PC     | 0       | INS
| 5     | `0x00000008` | `TRAP_CODE_UENV_CALL`    | environment call from U-mode         | PC     | 0       | INS
| 6     | `0x00000003` | `TRAP_CODE_BREAKPOINT`   | software breakpoint / trigger firing | PC     | 0       | INS
| 7     | `0x00000006` | `TRAP_CODE_S_MISALIGNED` | store address misaligned             | PC     | ADR     | INS
| 8     | `0x00000004` | `TRAP_CODE_L_MISALIGNED` | load address misaligned              | PC     | ADR     | INS
| 9     | `0x00000007` | `TRAP_CODE_S_ACCESS`     | store access fault                   | PC     | ADR     | INS
| 10    | `0x00000005` | `TRAP_CODE_L_ACCESS`     | load access fault                    | PC     | ADR     | INS
7+^| **Interrupts** (_asynchronous_ to instruction execution)
| 11    | `0x80000010` | `TRAP_CODE_FIRQ_0`       | fast interrupt request channel 0     | I-PC   | 0       | 0
| 12    | `0x80000011` | `TRAP_CODE_FIRQ_1`       | fast interrupt request channel 1     | I-PC   | 0       | 0
| 13    | `0x80000012` | `TRAP_CODE_FIRQ_2`       | fast interrupt request channel 2     | I-PC   | 0       | 0
| 14    | `0x80000013` | `TRAP_CODE_FIRQ_3`       | fast interrupt request channel 3     | I-PC   | 0       | 0
| 15    | `0x80000014` | `TRAP_CODE_FIRQ_4`       | fast interrupt request channel 4     | I-PC   | 0       | 0
| 16    | `0x80000015` | `TRAP_CODE_FIRQ_5`       | fast interrupt request channel 5     | I-PC   | 0       | 0
| 17    | `0x80000016` | `TRAP_CODE_FIRQ_6`       | fast interrupt request channel 6     | I-PC   | 0       | 0
| 18    | `0x80000017` | `TRAP_CODE_FIRQ_7`       | fast interrupt request channel 7     | I-PC   | 0       | 0
| 19    | `0x80000018` | `TRAP_CODE_FIRQ_8`       | fast interrupt request channel 8     | I-PC   | 0       | 0
| 20    | `0x80000019` | `TRAP_CODE_FIRQ_9`       | fast interrupt request channel 9     | I-PC   | 0       | 0
| 21    | `0x8000001a` | `TRAP_CODE_FIRQ_10`      | fast interrupt request channel 10    | I-PC   | 0       | 0
| 22    | `0x8000001b` | `TRAP_CODE_FIRQ_11`      | fast interrupt request channel 11    | I-PC   | 0       | 0
| 23    | `0x8000001c` | `TRAP_CODE_FIRQ_12`      | fast interrupt request channel 12    | I-PC   | 0       | 0
| 24    | `0x8000001d` | `TRAP_CODE_FIRQ_13`      | fast interrupt request channel 13    | I-PC   | 0       | 0
| 25    | `0x8000001e` | `TRAP_CODE_FIRQ_14`      | fast interrupt request channel 14    | I-PC   | 0       | 0
| 26    | `0x8000001f` | `TRAP_CODE_FIRQ_15`      | fast interrupt request channel 15    | I-PC   | 0       | 0
| 27    | `0x8000000b` | `TRAP_CODE_MEI`          | machine external interrupt (MEI)     | I-PC   | 0       | 0
| 28    | `0x80000003` | `TRAP_CODE_MSI`          | machine software interrupt (MSI)     | I-PC   | 0       | 0
| 29    | `0x80000007` | `TRAP_CODE_MTI`          | machine timer interrupt (MTI)        | I-PC   | 0       | 0
|=======================
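
A trap handler usually starts by decoding the `mcause` value. The following C sketch shows how the values from the
table above can be split into the interrupt flag and the trap code, and how the FIRQ channel can be derived from a
fast interrupt cause. The function names are hypothetical; application code would normally use the `TRAP_CODE_*`
aliases of the RTE instead of hard-coded numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of mcause distinguishes interrupts (1) from exceptions (0). */
bool mcause_is_interrupt(uint32_t mcause) {
  return (mcause >> 31) != 0u;
}

/* The remaining bits hold the trap code. */
uint32_t mcause_code(uint32_t mcause) {
  return mcause & 0x7FFFFFFFu;
}

/* Return the FIRQ channel (0..15) for a fast interrupt cause, or -1 for any other trap. */
int mcause_firq_channel(uint32_t mcause) {
  const uint32_t code = mcause_code(mcause);
  if (mcause_is_interrupt(mcause) && (code >= 16u) && (code <= 31u)) {
    return (int)(code - 16u);
  }
  return -1;
}
```

For example, `0x8000001a` decodes to an interrupt with trap code 26, that is fast interrupt request channel 10.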

.NEORV32 Trap Description
[cols="<3,<7"]
[options="header",grid="rows"]
|=======================
| Trap ID [C] | Triggered when ...
| `TRAP_CODE_I_ACCESS`     | bus timeout, bus access error or <<_pmp_isa_extension,PMP>> rule violation during instruction fetch
| `TRAP_CODE_I_ILLEGAL`    | trying to execute an invalid instruction word (malformed or not supported) or on a privilege violation
| `TRAP_CODE_I_MISALIGNED` | fetching a 32-bit instruction word that is not 32-bit-aligned (see note below)
| `TRAP_CODE_MENV_CALL`    | executing `ecall` instruction in machine-mode
| `TRAP_CODE_UENV_CALL`    | executing `ecall` instruction in user-mode
| `TRAP_CODE_BREAKPOINT`   | executing `ebreak` instruction or if <<_trigger_module>> fires
| `TRAP_CODE_S_MISALIGNED` | storing data to an address that is not naturally aligned to the data size (half/word)
| `TRAP_CODE_L_MISALIGNED` | loading data from an address that is not naturally aligned to the data size (half/word)
| `TRAP_CODE_S_ACCESS`     | bus timeout, bus access error or <<_pmp_isa_extension,PMP>> rule violation during store data operation
| `TRAP_CODE_L_ACCESS`     | bus timeout, bus access error or <<_pmp_isa_extension,PMP>> rule violation during load data operation
| `TRAP_CODE_FIRQ_*`       | caused by interrupt-condition of **processor-internal modules**, see <<_neorv32_specific_fast_interrupt_requests>>
| `TRAP_CODE_MEI`          | machine external interrupt (via dedicated <<_processor_top_entity_signals>>)
| `TRAP_CODE_MSI`          | machine software interrupt (via dedicated <<_processor_top_entity_signals>>)
| `TRAP_CODE_MTI`          | machine timer interrupt (internal <<_machine_system_timer_mtime>> or via dedicated <<_processor_top_entity_signals>>)
|=======================

.Resumable Exceptions
[WARNING]
Note that not all exceptions are resumable. For example, the "instruction access fault" exception or the "instruction
address misaligned" exception are not resumable in most cases. These exceptions might indicate a fatal memory hardware failure.