// ####################################################################################################################
:sectnums:
== NEORV32 Processor (SoC)
The NEORV32 Processor is based on the NEORV32 CPU. Together with common peripheral
interfaces and embedded memories it provides a RISC-V-based full-scale microcontroller-like SoC platform.
.The NEORV32 Processor (Block Diagram)
image::neorv32_processor.png[align=center]
**Section Structure**
* <<_processor_top_entity_signals>> and <<_processor_top_entity_generics>>
* <<_processor_clocking>> and <<_processor_reset>>
* <<_processor_interrupts>>
* <<_address_space>> and <<_boot_configuration>>
* <<_processor_internal_modules>>
**Key Features**
* _optional_ processor-internal data and instruction memories (<<_data_memory_dmem,**DMEM**>>/<<_instruction_memory_imem,**IMEM**>>)
* _optional_ caches (<<_processor_internal_instruction_cache_icache,**iCACHE**>>/<<_processor_internal_data_cache_dcache,**dCACHE**>>)
* _optional_ internal bootloader (<<_bootloader_rom_bootrom,**BOOTROM**>>) with UART console & SPI flash boot option
* _optional_ machine system timer (<<_machine_system_timer_mtime,**MTIME**>>), RISC-V-compatible
* _optional_ two independent universal asynchronous receivers and transmitters (<<_primary_universal_asynchronous_receiver_and_transmitter_uart0,**UART0**>>,
<<_secondary_universal_asynchronous_receiver_and_transmitter_uart1,**UART1**>>) with optional hardware flow control (RTS/CTS)
* _optional_ serial peripheral interface host controller (<<_serial_peripheral_interface_controller_spi,**SPI**>>) with 8 dedicated CS lines
* _optional_ 8-bit serial data device interface (<<_serial_data_interface_controller_sdi,**SDI**>>)
* _optional_ two wire serial interface controller (<<_two_wire_serial_interface_controller_twi,**TWI**>>), compatible to the I²C standard
* _optional_ general purpose parallel IO port (<<_general_purpose_input_and_output_port_gpio,**GPIO**>>), 64xOut, 64xIn
* _optional_ 32-bit external bus interface, Wishbone b4 / AXI4-Lite compatible (<<_processor_external_memory_interface_wishbone,**WISHBONE**>>)
* _optional_ watchdog timer (<<_watchdog_timer_wdt,**WDT**>>)
* _optional_ PWM controller with up to 12 channels & 8-bit duty cycle resolution (<<_pulse_width_modulation_controller_pwm,**PWM**>>)
* _optional_ ring-oscillator-based true random number generator (<<_true_random_number_generator_trng,**TRNG**>>)
* _optional_ custom functions subsystem for custom co-processor extensions (<<_custom_functions_subsystem_cfs,**CFS**>>)
* _optional_ NeoPixel(TM)/WS2812-compatible smart LED interface (<<_smart_led_interface_neoled,**NEOLED**>>)
* _optional_ external interrupt controller with up to 32 channels (<<_external_interrupt_controller_xirq,**XIRQ**>>)
* _optional_ general purpose 32-bit timer (<<_general_purpose_timer_gptmr,**GPTMR**>>) with capture input
* _optional_ execute in-place module (<<_execute_in_place_module_xip,**XIP**>>)
* _optional_ 1-wire serial interface controller (<<_one_wire_serial_interface_controller_onewire,**ONEWIRE**>>), compatible to the 1-wire standard
* _optional_ autonomous direct memory access controller (<<_direct_memory_access_controller_dma,**DMA**>>)
* _optional_ stream link interface (<<_stream_link_interface_slink,**SLINK**>>), AXI4-Stream compatible
* _optional_ cyclic redundancy check unit (<<_cyclic_redundancy_check_crc,**CRC**>>)
* _optional_ on-chip debugger with JTAG TAP (<<_on_chip_debugger_ocd,**OCD**>>)
* system configuration information memory to check HW configuration via software (<<_system_configuration_information_memory_sysinfo,**SYSINFO**>>)
<<<
// ####################################################################################################################
:sectnums:
=== Processor Top Entity - Signals
The following table shows all interface signals of the processor top entity (`rtl/core/neorv32_top.vhd`).
All signals are of type `std_ulogic` or `std_ulogic_vector`, respectively.
.Default Values of Inputs
[NOTE]
All _optional_ input signals provide default values in case they are not explicitly assigned during instantiation.
The weak driver strengths of VHDL (`'L'` and `'H'`) are used to model a pull-down or pull-up resistor.
.Configurable Amount of Channels
[NOTE]
Some peripherals allow configuring the number of implemented channels via a generic (for example the number
of PWM channels). The corresponding input/output signals have a fixed size regardless of the actually configured
number of channels. If fewer than the maximum number of channels are configured, only the LSB-aligned channels are used:
in case of an _input port_ the remaining bits/channels are left unconnected; in case of an _output port_ the remaining
bits/channels are hardwired to zero.
.Tri-State Interfaces
[NOTE]
Some interfaces (like the TWI and the 1-Wire bus) require tri-state drivers in the design's top module.
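
The following is a minimal sketch of such a tri-state driver, not code taken from the NEORV32 sources. The entity name `open_drain_pad` and its port names are hypothetical. It assumes a bidirectional pin with an external pull-up resistor and the "pull low only" behavior listed for the `twi_*_o` and `onewire_o` signals in the table below.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical open-drain pad wrapper (not part of the NEORV32 sources).
entity open_drain_pad is
  port (
    drive_i : in    std_ulogic; -- from the processor, e.g. twi_sda_o ('0' = pull line low)
    sense_o : out   std_ulogic; -- to the processor, e.g. twi_sda_i
    pad_io  : inout std_logic   -- bidirectional pin, assumed to have an external pull-up
  );
end entity;

architecture rtl of open_drain_pad is
begin
  pad_io  <= '0' when (drive_i = '0') else 'Z'; -- actively pull low or release the line
  sense_o <= std_ulogic(pad_io);                -- read back the actual line state
end architecture;
----

One instance per bus line would be needed (for example two for the TWI: SDA and SCL).
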
.NEORV32 Processor Signal List
[cols="<3,^1,^1,^1,<8"]
[options="header",grid="rows"]
|=======================
| Name | Width | Direction | Default | Description
5+^| **Global Control (<<_processor_clocking>> and <<_processor_reset>>)**
| `clk_i` | 1 | in | none | global clock line, all registers triggering on rising edge
| `rstn_i` | 1 | in | none | global reset, asynchronous, **low-active**
5+^| **JTAG Access Port for <<_on_chip_debugger_ocd>>**
| `jtag_trst_i` | 1 | in | `'H'` | TAP reset, low-active (optional)
| `jtag_tck_i` | 1 | in | `'L'` | serial clock
| `jtag_tdi_i` | 1 | in | `'L'` | serial data input
| `jtag_tdo_o` | 1 | out | - | serial data output
| `jtag_tms_i` | 1 | in | `'L'` | mode select
5+^| **<<_processor_external_memory_interface_wishbone>>**
| `wb_tag_o` | 3 | out | - | tag (access type identifier)
| `wb_adr_o` | 32 | out | - | destination address
| `wb_dat_i` | 32 | in | `'L'` | read data
| `wb_dat_o` | 32 | out | - | write data
| `wb_we_o` | 1 | out | - | write enable ('0' = read transfer)
| `wb_sel_o` | 4 | out | - | byte enable
| `wb_stb_o` | 1 | out | - | strobe
| `wb_cyc_o` | 1 | out | - | valid cycle
| `wb_lock_o` | 1 | out | - | exclusive access request
| `wb_ack_i` | 1 | in | `'L'` | transfer acknowledge
| `wb_err_i` | 1 | in | `'L'` | transfer error
5+^| **<<_stream_link_interface_slink>>**
| `slink_rx_dat_i` | 32 | in | `'L'` | RX data
| `slink_rx_val_i` | 1 | in | `'L'` | RX data valid
| `slink_rx_lst_i` | 1 | in | `'L'` | RX last element of stream
| `slink_rx_rdy_o` | 1 | out | - | RX ready to receive
| `slink_tx_dat_o` | 32 | out | - | TX data
| `slink_tx_val_o` | 1 | out | - | TX data valid
| `slink_tx_lst_o` | 1 | out | - | TX last element of stream
| `slink_tx_rdy_i` | 1 | in | `'L'` | TX allowed to send
5+^| **<<_execute_in_place_module_xip>>**
| `xip_csn_o` | 1 | out | - | chip select, low-active
| `xip_clk_o` | 1 | out | - | serial clock
| `xip_dat_i` | 1 | in | `'L'` | serial data input
| `xip_dat_o` | 1 | out | - | serial data output
5+^| **<<_general_purpose_input_and_output_port_gpio>>**
| `gpio_o` | 64 | out | - | general purpose parallel output
| `gpio_i` | 64 | in | `'L'` | general purpose parallel input
5+^| **<<_primary_universal_asynchronous_receiver_and_transmitter_uart0>>**
| `uart0_txd_o` | 1 | out | - | serial transmitter
| `uart0_rxd_i` | 1 | in | `'L'` | serial receiver
| `uart0_rts_o` | 1 | out | - | RX ready to receive new char
| `uart0_cts_i` | 1 | in | `'L'` | TX allowed to start sending, low-active
5+^| **<<_secondary_universal_asynchronous_receiver_and_transmitter_uart1>>**
| `uart1_txd_o` | 1 | out | - | serial transmitter
| `uart1_rxd_i` | 1 | in | `'L'` | serial receiver
| `uart1_rts_o` | 1 | out | - | RX ready to receive new char
| `uart1_cts_i` | 1 | in | `'L'` | TX allowed to start sending, low-active
5+^| **<<_serial_peripheral_interface_controller_spi>>**
| `spi_clk_o` | 1 | out | - | controller clock line
| `spi_dat_o` | 1 | out | - | serial data output
| `spi_dat_i` | 1 | in | `'L'` | serial data input
| `spi_csn_o` | 8 | out | - | select (low-active)
5+^| **<<_serial_data_interface_controller_sdi>>**
| `sdi_clk_i` | 1 | in | `'L'` | controller clock line
| `sdi_dat_o` | 1 | out | - | serial data output
| `sdi_dat_i` | 1 | in | `'L'` | serial data input
| `sdi_csn_i` | 1 | in | `'H'` | chip select, low-active
5+^| **<<_two_wire_serial_interface_controller_twi>>**
| `twi_sda_i` | 1 | in | `'H'` | serial data line sense input
| `twi_sda_o` | 1 | out | - | serial data line output (pull low only)
| `twi_scl_i` | 1 | in | `'H'` | serial clock line sense input
| `twi_scl_o` | 1 | out | - | serial clock line output (pull low only)
5+^| **<<_one_wire_serial_interface_controller_onewire>>**
| `onewire_i` | 1 | in | `'H'` | 1-wire bus sense input
| `onewire_o` | 1 | out | - | 1-wire bus output (pull low only)
5+^| **<<_pulse_width_modulation_controller_pwm>>**
| `pwm_o` | 12 | out | - | pulse-width modulated channels
5+^| **<<_custom_functions_subsystem_cfs>>**
| `cfs_in_i` | 32 | in | `'L'` | custom CFS input signal conduit
| `cfs_out_o` | 32 | out | - | custom CFS output signal conduit
5+^| **<<_smart_led_interface_neoled>>**
| `neoled_o` | 1 | out | - | asynchronous serial data output
5+^| **<<_machine_system_timer_mtime>>**
| `mtime_time_o` | 64 | out | - | MTIME system time output
5+^| **<<_general_purpose_timer_gptmr>>**
| `gptmr_trig_i` | 1 | in | `'L'` | timer capture input
5+^| **<<_external_interrupt_controller_xirq>>**
| `xirq_i` | 32 | in | `'L'` | external interrupt requests
5+^| **RISC-V Machine-Mode <<_processor_interrupts>>**
| `mtime_irq_i` | 1 | in | `'L'` | machine timer interrupt (RISC-V), high-level-active
| `msw_irq_i` | 1 | in | `'L'` | machine software interrupt (RISC-V), high-level-active
| `mext_irq_i` | 1 | in | `'L'` | machine external interrupt (RISC-V), high-level-active
|=======================
<<<
// ####################################################################################################################
:sectnums:
=== Processor Top Entity - Generics
This section lists all configuration generics of the NEORV32 processor top entity (`rtl/core/neorv32_top.vhd`).
.Customization
[TIP]
The NEORV32 generics allow configuring the system according to your needs. The generics
control the implementation of certain CPU extensions and peripheral modules and even allow
optimizing the system for certain design goals like minimal area or maximum performance.
.Default Values
[NOTE]
All _optional_ configuration generics provide default values in case they are not explicitly assigned during instantiation.
.Software Discovery of Configuration
[TIP]
Software can determine the actual CPU configuration via the <<_misa>> and <<_mxisa>> CSRs. The SoC/processor
configuration can be determined via the <<_system_configuration_information_memory_sysinfo, SYSINFO>> memory-mapped registers.
.Excluded Modules and Extensions
[NOTE]
If optional modules (like CPU extensions or peripheral devices) are not enabled the according hardware
will not be synthesized at all. Hence, the disabled modules do not increase area and power requirements
and do not impact timing.
.Table Abbreviations
[NOTE]
The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downto y)`".
.NEORV32 Processor Generic List
[cols="<3,^2,^2,<8"]
[options="header",grid="rows"]
|=======================
| Name | Type | Default | Description
4+^| **General**
| `CLOCK_FREQUENCY` | natural | - | The clock frequency of the processor's `clk_i` input port in Hertz (Hz).
| `CLOCK_GATING_EN` | boolean | false | Enable clock gating when CPU is in sleep mode (see sections <<_sleep_mode>> and <<_processor_clocking>>).
| `INT_BOOTLOADER_EN` | boolean | false | Implement the processor-internal <<_bootloader_rom_bootrom>>, pre-initialized with the default <<_bootloader>> image.
| `HART_ID` | suv(31:0) | 0x00000000 | The hardware thread (hart) ID of the CPU (passed to <<_mhartid>> CSR).
| `VENDOR_ID` | suv(31:0) | 0x00000000 | JEDEC ID (passed to <<_mvendorid>> CSR).
4+^| **<<_on_chip_debugger_ocd>>**
| `ON_CHIP_DEBUGGER_EN` | boolean | false | Implement the on-chip debugger and the CPU debug mode.
| `DM_LEGACY_MODE` | boolean | false | Debug module spec. version: `false` = v1.0, `true` = v0.13 (legacy mode).
4+^| **CPU <<_instruction_sets_and_extensions>>**
| `CPU_EXTENSION_RISCV_A` | boolean | false | Enable <<_a_isa_extension>> (atomic memory accesses).
| `CPU_EXTENSION_RISCV_B` | boolean | false | Enable <<_b_isa_extension>> (bit-manipulation).
| `CPU_EXTENSION_RISCV_C` | boolean | false | Enable <<_c_isa_extension>> (compressed instructions).
| `CPU_EXTENSION_RISCV_E` | boolean | false | Enable <<_e_isa_extension>> (reduced register file size).
| `CPU_EXTENSION_RISCV_M` | boolean | false | Enable <<_m_isa_extension>> (hardware-based integer multiplication and division).
| `CPU_EXTENSION_RISCV_U` | boolean | false | Enable <<_u_isa_extension>> (less-privileged user mode).
| `CPU_EXTENSION_RISCV_Zfinx` | boolean | false | Enable <<_zfinx_isa_extension>> (single-precision floating-point unit).
| `CPU_EXTENSION_RISCV_Zicntr` | boolean | true | Enable <<_zicntr_isa_extension>> (CPU base counters).
| `CPU_EXTENSION_RISCV_Zicond` | boolean | false | Enable <<_zicond_isa_extension>> (integer conditional operations).
| `CPU_EXTENSION_RISCV_Zihpm` | boolean | false | Enable <<_zihpm_isa_extension>> (hardware performance monitors).
| `CPU_EXTENSION_RISCV_Zmmul` | boolean | false | Enable <<_zmmul_isa_extension>> (hardware-based integer multiplication).
| `CPU_EXTENSION_RISCV_Zxcfu` | boolean | false | Enable NEORV32-specific <<_zxcfu_isa_extension>> (custom RISC-V instructions).
4+^| **CPU <<_architecture>> Tuning Options**
| `FAST_MUL_EN` | boolean | false | Implement fast but large full-parallel multipliers (trying to infer DSP blocks); see section <<_cpu_arithmetic_logic_unit>>.
| `FAST_SHIFT_EN` | boolean | false | Implement fast but large full-parallel barrel shifters; see section <<_cpu_arithmetic_logic_unit>>.
| `REGFILE_HW_RST` | boolean | false | Implement full hardware reset for register file (prevent inferring of BRAM); see section <<_cpu_register_file>>.
4+^| **Physical Memory Protection (<<_pmp_isa_extension>>)**
| `PMP_NUM_REGIONS` | natural | 0 | Number of implemented PMP regions (0..16).
| `PMP_MIN_GRANULARITY` | natural | 4 | Minimal region granularity in bytes. Has to be a power of two, min 4.
| `PMP_TOR_MODE_EN` | boolean | true | Implement support for top-of-region (TOR) mode.
| `PMP_NAP_MODE_EN` | boolean | true | Implement support for naturally-aligned power-of-two (NAPOT & NA4) modes.
4+^| **Hardware Performance Monitors (<<_zihpm_isa_extension>>)**
| `HPM_NUM_CNTS` | natural | 0 | Number of implemented hardware performance monitor counters (0..13).
| `HPM_CNT_WIDTH` | natural | 40 | Total LSB-aligned size of each HPM counter. Min 0, max 64.
4+^| **Atomic Memory Access Reservation Set Granularity (<<_a_isa_extension>>)**
| `AMO_RVS_GRANULARITY` | natural | 4 | Size in bytes, has to be a power of 2, min 4.
4+^| **Internal <<_instruction_memory_imem>>**
| `MEM_INT_IMEM_EN` | boolean | false | Implement the processor-internal instruction memory.
| `MEM_INT_IMEM_SIZE` | natural | 16*1024 | Size in bytes of the processor internal instruction memory (use a power of 2).
4+^| **Internal <<_data_memory_dmem>>**
| `MEM_INT_DMEM_EN` | boolean | false | Implement the processor-internal data memory.
| `MEM_INT_DMEM_SIZE` | natural | 8*1024 | Size in bytes of the processor-internal data memory (use a power of 2).
4+^| **<<_processor_internal_instruction_cache_icache>>**
| `ICACHE_EN` | boolean | false | Implement the instruction cache.
| `ICACHE_NUM_BLOCKS` | natural | 4 | Number of blocks ("pages" or "lines"). Has to be a power of two.
| `ICACHE_BLOCK_SIZE` | natural | 64 | Size in bytes of each block. Has to be a power of two.
| `ICACHE_ASSOCIATIVITY` | natural | 1 | Associativity (number of sets). Allowed configurations: `1` = 1 set, direct mapped; `2` = 2-way set-associative.
4+^| **<<_processor_internal_data_cache_dcache>>**
| `DCACHE_EN` | boolean | false | Implement the data cache.
| `DCACHE_NUM_BLOCKS` | natural | 4 | Number of blocks ("pages" or "lines"). Has to be a power of two.
| `DCACHE_BLOCK_SIZE` | natural | 64 | Size in bytes of each block. Has to be a power of two.
4+^| **<<_processor_external_memory_interface_wishbone>>**
| `MEM_EXT_EN` | boolean | false | Implement the external bus interface.
| `MEM_EXT_TIMEOUT` | natural | 255 | Clock cycles after which a pending external bus access will auto-terminate and raise a bus fault exception.
| `MEM_EXT_PIPE_MODE` | boolean | false | Use _standard_ ("classic") Wishbone protocol when false. Use _pipelined_ Wishbone protocol when true.
| `MEM_EXT_BIG_ENDIAN` | boolean | false | Use BIG endian data order interface for external bus.
| `MEM_EXT_ASYNC_RX` | boolean | false | Disable input registers when true.
| `MEM_EXT_ASYNC_TX` | boolean | false | Disable output registers when true.
4+^| **<<_execute_in_place_module_xip>>**
| `XIP_EN` | boolean | false | Implement the execute in-place module.
| `XIP_CACHE_EN` | boolean | false | Implement XIP cache.
| `XIP_CACHE_NUM_BLOCKS` | natural | 8 | Number of blocks in XIP cache. Has to be a power of two.
| `XIP_CACHE_BLOCK_SIZE` | natural | 256 | Number of bytes per XIP cache block. Has to be a power of two, min 4.
4+^| **<<_external_interrupt_controller_xirq>>**
| `XIRQ_NUM_CH` | natural | 0 | Number of channels of the external interrupt controller. Valid values are 0..32.
| `XIRQ_TRIGGER_TYPE` | suv(31:0) | 0xFFFFFFFF | Trigger type (one bit per channel): `0` = level-triggered, `1` = edge-triggered.
| `XIRQ_TRIGGER_POLARITY` | suv(31:0) | 0xFFFFFFFF | Trigger polarity (one bit per channel): `0` = low-level/falling-edge, `1` = high-level/rising-edge.
4+^| **Peripheral/IO Modules**
| `IO_GPIO_NUM` | natural | 0 | Number of general purpose input/output pairs of the <<_general_purpose_input_and_output_port_gpio>>.
| `IO_MTIME_EN` | boolean | false | Implement the <<_machine_system_timer_mtime>>.
| `IO_UART0_EN` | boolean | false | Implement the <<_primary_universal_asynchronous_receiver_and_transmitter_uart0>>.
| `IO_UART0_RX_FIFO` | natural | 1 | UART0 RX FIFO depth, has to be a power of two, minimum value is 1, max 32768.
| `IO_UART0_TX_FIFO` | natural | 1 | UART0 TX FIFO depth, has to be a power of two, minimum value is 1, max 32768.
| `IO_UART1_EN` | boolean | false | Implement the <<_secondary_universal_asynchronous_receiver_and_transmitter_uart1>>.
| `IO_UART1_RX_FIFO` | natural | 1 | UART1 RX FIFO depth, has to be a power of two, minimum value is 1, max 32768.
| `IO_UART1_TX_FIFO` | natural | 1 | UART1 TX FIFO depth, has to be a power of two, minimum value is 1, max 32768.
| `IO_SPI_EN` | boolean | false | Implement the <<_serial_peripheral_interface_controller_spi>>.
| `IO_SPI_FIFO` | natural | 1 | Depth of the <<_serial_peripheral_interface_controller_spi>> FIFO. Has to be a power of two, min 1, max 32768.
| `IO_SDI_EN` | boolean | false | Implement the <<_serial_data_interface_controller_sdi>>.
| `IO_SDI_FIFO` | natural | 1 | Depth of the <<_serial_data_interface_controller_sdi>> FIFO. Has to be a power of two, min 1, max 32768.
| `IO_TWI_EN` | boolean | false | Implement the <<_two_wire_serial_interface_controller_twi>>.
| `IO_PWM_NUM_CH` | natural | 0 | Number of channels of the <<_pulse_width_modulation_controller_pwm>> to implement (0..12).
| `IO_WDT_EN` | boolean | false | Implement the <<_watchdog_timer_wdt>>.
| `IO_TRNG_EN` | boolean | false | Implement the <<_true_random_number_generator_trng>>.
| `IO_TRNG_FIFO` | natural | 1 | Depth of the TRNG data FIFO. Has to be a power of two, min 1, max 32768.
| `IO_CFS_EN` | boolean | false | Implement the <<_custom_functions_subsystem_cfs>>.
| `IO_CFS_CONFIG` | suv(31:0) | 0x00000000 | "Conduit" generic to pass user-defined flags to the <<_custom_functions_subsystem_cfs>>.
| `IO_CFS_IN_SIZE` | natural | 32 | Size of the <<_custom_functions_subsystem_cfs>> input signal conduit (`cfs_in_i`).
| `IO_CFS_OUT_SIZE` | natural | 32 | Size of the <<_custom_functions_subsystem_cfs>> output signal conduit (`cfs_out_o`).
| `IO_NEOLED_EN` | boolean | false | Implement the <<_smart_led_interface_neoled>>.
| `IO_NEOLED_TX_FIFO` | natural | 1 | TX FIFO depth of the <<_smart_led_interface_neoled>>. Has to be a power of two, min 1, max 32768.
| `IO_GPTMR_EN` | boolean | false | Implement the <<_general_purpose_timer_gptmr>>.
| `IO_ONEWIRE_EN` | boolean | false | Implement the <<_one_wire_serial_interface_controller_onewire>>.
| `IO_DMA_EN` | boolean | false | Implement the <<_direct_memory_access_controller_dma>>.
| `IO_SLINK_EN` | boolean | false | Implement the <<_stream_link_interface_slink>>.
| `IO_SLINK_RX_FIFO` | natural | 1 | SLINK RX FIFO depth, has to be a power of two, minimum value is 1, max 32768.
| `IO_SLINK_TX_FIFO` | natural | 1 | SLINK TX FIFO depth, has to be a power of two, minimum value is 1, max 32768.
| `IO_CRC_EN` | boolean | false | Implement the <<_cyclic_redundancy_check_crc>> unit.
|=======================
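
As an illustration of how the generics and signals listed above fit together, here is a minimal instantiation sketch. The wrapper name `neorv32_minimal_example`, the library name `neorv32`, and the 100 MHz clock frequency are assumptions made for this example. All generic and port names are taken from the tables above, and everything not assigned keeps its default value.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

library neorv32; -- assumption: the rtl/core files are compiled into a library named "neorv32"

entity neorv32_minimal_example is -- hypothetical wrapper, not part of the NEORV32 sources
  port (
    clk_i       : in  std_ulogic;
    rstn_i      : in  std_ulogic;
    gpio_o      : out std_ulogic_vector(7 downto 0);
    uart0_txd_o : out std_ulogic;
    uart0_rxd_i : in  std_ulogic
  );
end entity;

architecture rtl of neorv32_minimal_example is
  signal gpio : std_ulogic_vector(63 downto 0); -- port width is fixed, regardless of IO_GPIO_NUM
begin

  neorv32_top_inst: entity neorv32.neorv32_top
  generic map (
    CLOCK_FREQUENCY       => 100_000_000, -- assumed board clock in Hz
    INT_BOOTLOADER_EN     => true,        -- boot via the internal bootloader
    CPU_EXTENSION_RISCV_C => true,
    CPU_EXTENSION_RISCV_M => true,
    MEM_INT_IMEM_EN       => true,
    MEM_INT_IMEM_SIZE     => 16*1024,
    MEM_INT_DMEM_EN       => true,
    MEM_INT_DMEM_SIZE     => 8*1024,
    IO_GPIO_NUM           => 8,
    IO_MTIME_EN           => true,
    IO_UART0_EN           => true
  )
  port map (
    clk_i       => clk_i,
    rstn_i      => rstn_i,
    gpio_o      => gpio,
    uart0_txd_o => uart0_txd_o,
    uart0_rxd_i => uart0_rxd_i
  );

  gpio_o <= gpio(7 downto 0); -- only the LSB-aligned channels are used

end architecture;
----
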
<<<
// ####################################################################################################################
:sectnums:
=== Processor Clocking
The processor is implemented as a fully synchronous logic design using a single clock domain that is driven entirely by the
top's `clk_i` signal. This clock signal is used by all internal registers and memories, which trigger on the rising edge of
this clock signal - except for the <<_processor_reset>> and the clock switching gate that trigger on a falling edge.
External "clocks" like the OCD's JTAG clock or the SDI's serial clock are synchronized into the processor's clock domain
before being further processed.
==== Clock Gating
The single clock domain of the processor can be split into an always-on clock domain and a switchable clock domain.
The switchable clock domain is used to clock the CPU core, the CPU's bus switch and - if implemented - the caches.
This domain can be deactivated to reduce power consumption. The always-on clock domain is used to clock all other
processor modules like peripherals, memories and IO devices. Hence, these modules can continue operation (e.g. a
timer keeps running) even if the CPU is shut down.
The splitting into two clock domains is enabled by the `CLOCK_GATING_EN` generic (<<_processor_top_entity_generics>>).
When enabled, a generic clock switching gate is added to decouple the switchable clock from the always-on clock domain
(VHDL file `neorv32_clockgate.vhd`). Whenever the CPU enters <<_sleep_mode>> the CPU clock domain is shut down.
.Clock Switch Hardware
[NOTE]
By default, a generic clock gate is used (`rtl/core/neorv32_clockgate.vhd`) to shut down the CPU clock.
Especially for FPGA setups it is highly recommended to replace this default version by a technology-specific primitive
or macro wrapper to improve efficiency (clock skew, global clock tree usage, etc.).
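
To illustrate the principle of a generic clock switching gate, the following sketch updates the enable on the falling clock edge, so the gated clock only changes while the input clock is low. This is an assumption-based illustration with hypothetical entity and port names. It is not the content of `neorv32_clockgate.vhd`.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

entity clockgate_sketch is -- hypothetical, for illustration only
  port (
    clk_i  : in  std_ulogic; -- always-on clock
    rstn_i : in  std_ulogic; -- asynchronous reset, low-active
    halt_i : in  std_ulogic; -- '1' = shut down the switchable clock
    clk_o  : out std_ulogic  -- switchable clock
  );
end entity;

architecture rtl of clockgate_sketch is
  signal enable : std_ulogic;
begin

  -- update the enable on the falling edge: clk_i is low at that point,
  -- so changing the enable cannot produce a glitch on clk_o
  process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      enable <= '1'; -- clock is running after reset
    elsif falling_edge(clk_i) then
      enable <= not halt_i;
    end if;
  end process;

  clk_o <= clk_i and enable;

end architecture;
----
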
==== Peripheral Clocks
Many processor modules like the UARTs or the timers provide a programmable time base for operations. In order to simplify
the hardware, the processor implements a global "clock generator" that provides _clock enables_ for certain frequencies that
are derived from the main clock. Hence, these clock enable signals are synchronous to the system's main clock and will be high
for only a single cycle. The processor modules can use these enables for sub-main-clock operations while still providing a single
clock domain only.
In total, 8 sub-main-clock signals are available. All processor modules, which feature a time-based configuration, provide a
programmable three-bit prescaler select in their control register to select one of the 8 available clocks. The
mapping of the prescaler select bits to the according clock source is shown in the table below. Here, _f_ represents the
processor main clock from the top entity's `clk_i` signal.
[cols="<3,^1,^1,^1,^1,^1,^1,^1,^1"]
[grid="rows"]
|=======================
| Prescaler bits: | `0b000` | `0b001` | `0b010` | `0b011` | `0b100` | `0b101` | `0b110` | `0b111`
| Resulting clock: | _f/2_ | _f/4_ | _f/8_ | _f/64_ | _f/128_ | _f/1024_| _f/2048_| _f/4096_
|=======================
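
The clock-enable concept can be sketched as a free-running counter with a rising-edge detector per counter bit. Bit _i_ of the counter produces a one-cycle pulse every 2^(i+1)^ clock cycles, which yields the rates from the table above. The entity and signal names are hypothetical; this is a simplified illustration, not the processor's actual clock generator.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity clkgen_sketch is -- hypothetical, for illustration only
  port (
    clk_i    : in  std_ulogic;
    rstn_i   : in  std_ulogic; -- asynchronous reset, low-active
    clk_en_o : out std_ulogic_vector(7 downto 0) -- index = prescaler select value
  );
end entity;

architecture rtl of clkgen_sketch is
  signal cnt, cnt_ff, rise : unsigned(11 downto 0);
begin

  process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      cnt    <= (others => '0');
      cnt_ff <= (others => '0');
    elsif rising_edge(clk_i) then
      cnt    <= cnt + 1; -- free-running counter
      cnt_ff <= cnt;     -- previous counter state for edge detection
    end if;
  end process;

  -- rising edge of counter bit i: high for exactly one cycle every 2^(i+1) cycles
  rise <= cnt and (not cnt_ff);

  clk_en_o(0) <= rise(0);  -- f/2
  clk_en_o(1) <= rise(1);  -- f/4
  clk_en_o(2) <= rise(2);  -- f/8
  clk_en_o(3) <= rise(5);  -- f/64
  clk_en_o(4) <= rise(6);  -- f/128
  clk_en_o(5) <= rise(9);  -- f/1024
  clk_en_o(6) <= rise(10); -- f/2048
  clk_en_o(7) <= rise(11); -- f/4096

end architecture;
----

A module would then use the selected enable as a register clock-enable inside a normal `rising_edge(clk_i)` process, so everything stays in the single main clock domain.
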
.Power Saving
[TIP]
If no peripheral module requires a clock signal from the internal clock generator (all according modules are disabled by
clearing the enable bit in the according module's control register) the generator is automatically deactivated to reduce
dynamic power consumption.
<<<
// ####################################################################################################################
:sectnums:
=== Processor Reset
.Processor Reset Signal
[IMPORTANT]
Always make sure to connect the processor's reset signal `rstn_i` to a valid reset source (a button, the "locked"
signal of a PLL, a dedicated reset controller, etc.).
The processor-wide reset can be triggered by any of the following sources:
* the asynchronous low-active `rstn_i` top entity input signal
* the <<_on_chip_debugger_ocd>>
* the <<_watchdog_timer_wdt>>
.Reset Cause
[TIP]
The actual reset cause can be determined via the <<_watchdog_timer_wdt>>.
If any of these sources trigger a reset, the internal reset will be triggered for at least 4 clock cycles ensuring
a valid reset of the entire processor. The internal global reset is asserted _asynchronously_ if triggered by the external
`rstn_i` signal. For internal reset sources, the global reset is asserted _synchronously_. If the reset cause gets inactive
the internal reset is de-asserted _synchronously_ at a falling clock edge.
Internally, **all registers** that are not meant for mapping to blockRAM (like the register file) do provide a dedicated and
low-active **asynchronous hardware reset**. This asynchronous reset ensures that the entire processor logic is reset to a
defined state even if the main clock is not operational yet.
[NOTE]
The system reset will only reset the control registers of each implemented IO/peripheral module. This control register
reset will also reset the according "module enable flag" to zero, which - in turn - will cause a _synchronous_
module-internal reset of the remaining logic.
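
The reset behavior for the external `rstn_i` source (asynchronous assertion, synchronous de-assertion at a falling clock edge, at least 4 clock cycles of reset) can be sketched with a small shift register. The entity and signal names are hypothetical, and the internal reset sources (OCD, WDT) are left out; this is an illustration of the principle rather than the processor's actual reset generator.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

entity reset_sketch is -- hypothetical, for illustration only
  port (
    clk_i      : in  std_ulogic;
    rstn_ext_i : in  std_ulogic; -- external reset, asynchronous, low-active
    rstn_sys_o : out std_ulogic  -- internal global reset, low-active
  );
end entity;

architecture rtl of reset_sketch is
  signal sreg : std_ulogic_vector(3 downto 0);
begin

  process(rstn_ext_i, clk_i)
  begin
    if (rstn_ext_i = '0') then
      sreg <= (others => '0'); -- assert immediately, no running clock required
    elsif falling_edge(clk_i) then
      sreg <= sreg(2 downto 0) & '1'; -- shift in ones once the reset cause is gone
    end if;
  end process;

  -- released synchronously after the fourth falling clock edge
  rstn_sys_o <= sreg(3);

end architecture;
----
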
<<<
// ####################################################################################################################
:sectnums:
=== Processor Interrupts
The NEORV32 Processor provides several interrupt request signals (IRQs) for custom platform use.
:sectnums:
==== RISC-V Standard Interrupts
The processor setup features the standard machine-level RISC-V interrupt lines for "machine timer interrupt", "machine
software interrupt" and "machine external interrupt". Their usage is defined by the RISC-V privileged architecture
specifications. However, bare-metal systems can also repurpose these interrupts. See CPU section
<<_traps_exceptions_and_interrupts>> for more information.
[cols="<4,<10"]
[options="header",grid="rows"]
|=======================
| Top signal | Description
| `mtime_irq_i` | Machine timer interrupt from _processor-external_ MTIME unit (`MTI`). This IRQ is only available if the processor-internal <<_machine_system_timer_mtime>> unit is not implemented.
| `msw_irq_i` | Machine software interrupt (`MSI`). This interrupt is used for inter-processor interrupts in multi-core systems. However, it can also be used for any custom purpose.
| `mext_irq_i` | Machine external interrupt (`MEI`). This interrupt is used for any processor-external interrupt source (like a platform interrupt controller).
|=======================
.Trigger Type
[IMPORTANT]
The RISC-V standard interrupts are **level-triggered and high-active**. Once set, the signal has to remain high until
the interrupt request is explicitly acknowledged (e.g. writing to a memory-mapped register). The RISC-V standard interrupts
**CANNOT** be acknowledged/cleared by writing zero to the according <<_mip>> CSR bit.
:sectnums:
==== NEORV32-Specific Fast Interrupt Requests
As part of the NEORV32-specific CPU extensions, the processor core features 16 fast interrupt request signals
(`FIRQ0` - `FIRQ15`) providing dedicated bits in the <<_mip>> and <<_mie>> CSRs and custom <<_mcause>> trap codes.
The FIRQ signals are reserved for _processor-internal_ modules only (for example for the communication
interfaces to signal "available incoming data" or "ready to send new data").
The mapping of the 16 FIRQ channels to the according processor-internal modules is shown in the following
table (the channel number also corresponds to the according FIRQ priority: 0 = highest, 15 = lowest):
.NEORV32 Fast Interrupt Request (FIRQ) Mapping
[cols="^2,<2,<6"]
[options="header",grid="rows"]
|=======================
| Channel | Source | Description
| 0 | <<_watchdog_timer_wdt,WDT>> | watchdog timeout interrupt
| 1 | <<_custom_functions_subsystem_cfs,CFS>> | custom functions subsystem (CFS) interrupt (user-defined)
| 2 | <<_primary_universal_asynchronous_receiver_and_transmitter_uart0,UART0>> | UART0 RX interrupt
| 3 | <<_primary_universal_asynchronous_receiver_and_transmitter_uart0,UART0>> | UART0 TX interrupt
| 4 | <<_secondary_universal_asynchronous_receiver_and_transmitter_uart1,UART1>> | UART1 RX interrupt
| 5 | <<_secondary_universal_asynchronous_receiver_and_transmitter_uart1,UART1>> | UART1 TX interrupt
| 6 | <<_serial_peripheral_interface_controller_spi,SPI>> | SPI interrupt
| 7 | <<_two_wire_serial_interface_controller_twi,TWI>> | TWI transmission done interrupt
| 8 | <<_external_interrupt_controller_xirq,XIRQ>> | External interrupt controller interrupt
| 9 | <<_smart_led_interface_neoled,NEOLED>> | NEOLED TX buffer interrupt
| 10 | <<_direct_memory_access_controller_dma,DMA>> | DMA transfer done interrupt
| 11 | <<_serial_data_interface_controller_sdi,SDI>> | SDI interrupt
| 12 | <<_general_purpose_timer_gptmr,GPTMR>> | General purpose timer interrupt
| 13 | <<_one_wire_serial_interface_controller_onewire,ONEWIRE>> | 1-wire operation done interrupt
| 14 | <<_stream_link_interface_slink,SLINK>> | SLINK FIFO level interrupt
| 15 | <<_true_random_number_generator_trng,TRNG>> | TRNG FIFO level interrupt
|=======================
.Trigger Type
[IMPORTANT]
The fast interrupt request channels become pending after being triggered by a one-cycle-high signal.
A pending FIRQ has to be explicitly cleared by writing zero to the according <<_mip>> CSR bit.
<<<
// ####################################################################################################################
:sectnums:
=== Address Space
As a 32-bit architecture the NEORV32 can access a 4 GB physical address space. By default, this address space is
split into six main regions. Each region provides specific _physical memory attributes_ ("PMAs") that define
the access capabilities (`rwxac`; `r` = read permission, `w` = write permission, `x` = execute permission,
`a` = atomic access support, `c` = cached CPU access).
.NEORV32 Processor Address Space (Default Configuration)
image::address_space.png[900]
.Main Address Regions
[cols="<1,^4,^2,<7"]
[options="header",grid="rows"]
|=======================
| # | Region | PMAs | Description
| 1 | Internal IMEM address space | `rwxac` | For instructions (=code) and constants; mapped to the internal <<_instruction_memory_imem>>.
| 2 | Internal DMEM address space | `rwxac` | For application runtime data (heap, stack, etc.); mapped to the internal <<_data_memory_dmem>>.
| 3 | Memory-mapped XIP flash | `r-xac` | Memory-mapped access to the <<_execute_in_place_module_xip>> SPI flash.
| 4 | Bootloader address space | `r-xa-` | Read-only memory for the internal <<_bootloader_rom_bootrom>> containing the default <<_bootloader>>.
| 5 | IO/peripheral address space | `rwxa-` | Processor-internal peripherals / IO devices.
| 6 | The "**void**" | `rwxac` | Unmapped address space. All accesses to this region(s) are redirected to the <<_processor_external_memory_interface_wishbone>> (if implemented).
|=======================
.Custom PMAs
[NOTE]
Physical memory attributes can be customized (constrained) using the CPU's <<_pmp_isa_extension>>.
The CPU can access all of the 32-bit address space from the instruction fetch interface and also from the data access
interface. Both interfaces can be equipped with optional caches (<<_processor_internal_data_cache_dcache>> and
<<_processor_internal_instruction_cache_icache>>). The two CPU interfaces are multiplexed by a simple bus switch into
a single processor-internal bus. Optionally, this bus is further switched by another instance of the bus switch so the
<<_direct_memory_access_controller_dma>> controller can also access the entire address space. Accesses via the
resulting SoC bus are split by the <<_bus_gateway>> that redirects accesses to the according main address regions.
Accesses to the processor-internal IO/peripheral devices are further redirected via a dedicated <<_io_switch>>.
.Processor-Internal Bus Architecture
image::neorv32_bus.png[1300]
.Bus Interface
[TIP]
See sections CPU <<_architecture>> and <<_bus_interface>> for more information regarding the CPU bus accesses.
:sectnums:
==== Bus Gateway
The central bus gateway serves two purposes: **redirect** core accesses to the according modules (e.g. memory accesses
vs. memory-mapped IO accesses) and **monitor** all bus transactions. The redirection of access requests is based on a
customizable memory map implemented via VHDL constants in the main package file (`rtl/core/neorv32_package.vhd`):
.Main Address Regions Configuration in the VHDL Package File
[source,vhdl]
----
-- Main Address Regions ---
constant mem_imem_base_c : std_ulogic_vector(31 downto 0) := x"00000000"; -- IMEM size via generic
constant mem_dmem_base_c : std_ulogic_vector(31 downto 0) := x"80000000"; -- DMEM size via generic
constant mem_xip_base_c : std_ulogic_vector(31 downto 0) := x"e0000000";
constant mem_xip_size_c : natural := 256*1024*1024;
constant mem_boot_base_c : std_ulogic_vector(31 downto 0) := x"ffffc000";
constant mem_boot_size_c : natural := 8*1024;
constant mem_io_base_c : std_ulogic_vector(31 downto 0) := x"ffffe000";
constant mem_io_size_c : natural := 8*1024;
----
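
The address decoding performed by the gateway can be sketched as a small behavioral C model. This is an illustration
only, not the actual hardware implementation. The base addresses and sizes mirror the package constants above;
`MEM_IMEM_SIZE` and `MEM_DMEM_SIZE` are assumed example values (the real sizes are set via generics) and the names
`region_t`, `in_region` and `gateway_decode` are hypothetical.

.Behavioral Sketch of the Main Address Region Decoding (Assumed Example Sizes)
[source,c]
----
#include <assert.h>
#include <stdint.h>

/* Values mirrored from the VHDL package constants */
#define MEM_IMEM_BASE 0x00000000u
#define MEM_DMEM_BASE 0x80000000u
#define MEM_XIP_BASE  0xE0000000u
#define MEM_XIP_SIZE  (256u * 1024u * 1024u)
#define MEM_BOOT_BASE 0xFFFFC000u
#define MEM_BOOT_SIZE (8u * 1024u)
#define MEM_IO_BASE   0xFFFFE000u
#define MEM_IO_SIZE   (8u * 1024u)

/* ASSUMPTION: example sizes only; the real IMEM/DMEM sizes are set via generics */
#define MEM_IMEM_SIZE (16u * 1024u)
#define MEM_DMEM_SIZE (8u * 1024u)

typedef enum {
  REGION_IMEM, REGION_DMEM, REGION_XIP, REGION_BOOT, REGION_IO, REGION_VOID
} region_t;

/* True if addr lies in [base, base+size). The unsigned subtraction keeps the
   check correct even when base+size would wrap past the end of the 32-bit space. */
static int in_region(uint32_t addr, uint32_t base, uint32_t size) {
  assert(size != 0u);
  return (uint32_t)(addr - base) < size;
}

/* Map a 32-bit address to one of the six main regions. */
region_t gateway_decode(uint32_t addr) {
  if (in_region(addr, MEM_IMEM_BASE, MEM_IMEM_SIZE)) return REGION_IMEM;
  if (in_region(addr, MEM_DMEM_BASE, MEM_DMEM_SIZE)) return REGION_DMEM;
  if (in_region(addr, MEM_XIP_BASE,  MEM_XIP_SIZE))  return REGION_XIP;
  if (in_region(addr, MEM_BOOT_BASE, MEM_BOOT_SIZE)) return REGION_BOOT;
  if (in_region(addr, MEM_IO_BASE,   MEM_IO_SIZE))   return REGION_IO;
  return REGION_VOID; /* everything else: the "void" */
}
----

With these constants the bootloader region ends at `0xffffdfff` and the IO region covers `0xffffe000` up to `0xffffffff`.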
Besides the delegation of bus requests the gateway also implements a bus monitor (aka "the bus keeper") that tracks all
active bus transactions to ensure _safe_ and _deterministic_ operations.
Whenever a memory-mapped device is accessed (a real memory, a memory-mapped IO or some processor-external module) the bus
monitor starts an internal timer. The accessed module has to respond ("ACK") to the bus request within a specific
**time window**. This time window is defined by a global constant in the processor's VHDL package file
(`rtl/core/neorv32_package.vhd`).
.Internal Bus Timeout Configuration
[source,vhdl]
----
constant bus_timeout_c : natural := 15;
----
This constant defines the _maximum_ number of cycles after which a non-responding bus request (i.e. no `ack`
and no `err` signal) will time out, raising a bus access fault exception. For example, this can happen when accessing
"address space holes" - addresses that are not mapped to any physical module. The resulting exception type corresponds
to the according access type, i.e. instruction fetch access exception, load access exception or store access exception.
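
The timeout behavior can be illustrated with a minimal C model of the bus monitor. This is a sketch under assumptions,
not the actual hardware: the exact cycle in which the hardware timer expires may differ, the treatment of an `err`
response as an immediate fault is an assumption, and all names (`bus_monitor_t`, `bus_request`, `bus_tick`,
`bus_access`) are hypothetical.

.Behavioral Sketch of the Bus Monitor Timeout
[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUS_TIMEOUT 15u /* mirrors bus_timeout_c */

typedef enum { BUS_PENDING, BUS_DONE, BUS_FAULT } bus_state_t;

typedef struct {
  bus_state_t state;
  unsigned    cycles; /* cycles elapsed without any response */
} bus_monitor_t;

/* A new bus request (re)starts the timer. */
void bus_request(bus_monitor_t *m) {
  m->state  = BUS_PENDING;
  m->cycles = 0u;
}

/* Advance the monitor by one clock cycle using the sampled response signals. */
bus_state_t bus_tick(bus_monitor_t *m, bool ack, bool err) {
  if (m->state != BUS_PENDING) {
    return m->state; /* no transaction in flight */
  }
  assert(m->cycles < BUS_TIMEOUT); /* invariant while a request is pending */
  if (err) {
    m->state = BUS_FAULT; /* ASSUMPTION: an error response ends in an access fault */
  } else if (ack) {
    m->state = BUS_DONE;
  } else if (++m->cycles >= BUS_TIMEOUT) {
    m->state = BUS_FAULT; /* time window elapsed without ack/err */
  }
  return m->state;
}

/* Simulate one access; the device ACKs in cycle ack_cycle (0 = never responds). */
bus_state_t bus_access(unsigned ack_cycle, unsigned *cycles_used) {
  bus_monitor_t m;
  unsigned cycle = 0u;
  bus_request(&m);
  while (m.state == BUS_PENDING) {
    cycle++;
    bus_tick(&m, (ack_cycle != 0u) && (cycle == ack_cycle), false);
  }
  if (cycles_used != NULL) {
    *cycles_used = cycle;
  }
  return m.state;
}
----

In this model an access to an unmapped address (`ack_cycle = 0`) ends in `BUS_FAULT`, which corresponds to the
access fault exception of the respective access type.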
.XIP Timeout
[NOTE]
Accesses to the memory-mapped XIP flash (via the <<_execute_in_place_module_xip>>) will _never_ time out.
.External Bus Interface Timeout
[NOTE]
Accesses that are delegated to the external bus interface have a different maximum timeout value that is defined by a
dedicated processor generic. See section <<_processor_external_memory_interface_wishbone>> for more information.
:sectnums:
==== Reservation Set Controller
The reservation set controller is responsible for handling the load-reserved and store-conditional bus transactions that
are triggered by the `lr.w` (LR) and `sc.w` (SC) instructions from the CPU's <<_a_isa_extension>>.
A "reservation" defines an address or address range that provides a guarding mechanism to support atomic accesses. A new
reservation is registered by the LR instruction. The address provided by this instruction defines the memory location
that is now monitored for atomic accesses. The according SC instruction evaluates the state of this reservation. If
the reservation is still valid the write access triggered by the SC instruction is finally executed and the instruction
returns a "success" state (`rd` = 0). If the reservation has been invalidated the SC instruction will not write to memory
and will return a "failed" state (`rd` = 1).
The reservation is invalidated if...
* an SC instruction is executed that accesses an address **outside** of the reservation set of the previous LR instruction.
This SC instruction will **fail** (not writing to memory).
* an SC instruction is executed that accesses an address **inside** of the reservation set of the previous LR instruction.
This SC instruction will **succeed** (finally writing to memory).
* a normal store operation accesses an address **inside** of the current reservation set (by the CPU or by the DMA).
* a hardware reset is triggered.
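
The rules above can be summarized in a small behavioral C model. This is a sketch for illustration, not the actual
controller logic; it uses a word-aligned 4-byte reservation set and the names `rvs_t`, `rvs_reset`, `rvs_lr`,
`rvs_sc` and `rvs_store` are hypothetical.

.Behavioral Sketch of the Reservation Set Rules
[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RVS_GRANULE 4u /* word-aligned 4-byte reservation set */

typedef struct {
  bool     valid; /* a reservation is currently registered */
  uint32_t base;  /* granule-aligned base address of the reservation set */
} rvs_t;

static uint32_t granule_base(uint32_t addr) {
  return addr & ~(RVS_GRANULE - 1u);
}

/* Hardware reset: invalidates any reservation. */
void rvs_reset(rvs_t *r) {
  assert((RVS_GRANULE & (RVS_GRANULE - 1u)) == 0u); /* granule must be a power of two */
  r->valid = false;
  r->base  = 0u;
}

/* LR: registers a new reservation (overriding a previous one). */
void rvs_lr(rvs_t *r, uint32_t addr) {
  r->valid = true;
  r->base  = granule_base(addr);
}

/* SC: returns the rd value (0 = success, 1 = failed) and always
   invalidates the reservation, no matter whether it hits or misses. */
uint32_t rvs_sc(rvs_t *r, uint32_t addr) {
  bool success = r->valid && (granule_base(addr) == r->base);
  r->valid = false;
  return success ? 0u : 1u;
}

/* Normal store (CPU or DMA): invalidates only if it hits the reservation set. */
void rvs_store(rvs_t *r, uint32_t addr) {
  if (r->valid && (granule_base(addr) == r->base)) {
    r->valid = false;
  }
}
----

For example, `rvs_lr(&r, 0x100)` followed by `rvs_store(&r, 0x102)` makes a subsequent `rvs_sc(&r, 0x100)` fail in
this model, whereas a store to `0x104` would leave the reservation intact.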
.Consecutive LR Instructions
[NOTE]
If an LR instruction is followed by another LR instruction the reservation set of the former one is overridden
by the reservation set of the latter one.
.Bus Access Errors
[IMPORTANT]
If the LR operation causes a bus access error (raising a load access exception) the reservation **is registered anyway**.
If the SC operation causes a bus access error (raising a store access exception) an already registered reservation set
**is invalidated anyway**.
.Strong Semantic
[IMPORTANT]
The LR/SC mechanism follows the _strong semantic_ approach: the LR/SC instruction pair fails only if there is a write
access to the referenced memory location between the LR and SC instructions (by the CPU itself or by the DMA).
Context changes, interrupts, traps, etc. neither affect nor invalidate the reservation state.
The controller supports only a single global reservation set. By default this reservation set "monitors" a word-aligned
4-byte granule. However, the granularity can be customized via the `AMO_RVS_GRANULARITY` top entity generic (see
<<_processor_top_entity_generics>>) to cover an arbitrarily large naturally aligned address region. The only constraint is
that the size of the address region has to be a power of two. The configured granularity can be determined by software via
the <<_system_configuration_information_memory_sysinfo>> module.
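
Whether two addresses belong to the same reservation set follows directly from the power-of-two constraint: both
addresses only need to be compared after masking out the granule-internal offset bits. The following C sketch
illustrates this; the function name `rvs_same_granule` is hypothetical and the granule size is assumed to be given
in bytes.

.Sketch: Comparing Addresses at Reservation Set Granularity
[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* granule: size of the reservation set in bytes (ASSUMPTION: unit is bytes);
   has to be a power of two and at least one word (4 bytes). */
bool rvs_same_granule(uint32_t addr_a, uint32_t addr_b, uint32_t granule) {
  assert((granule >= 4u) && ((granule & (granule - 1u)) == 0u));
  uint32_t mask = ~(granule - 1u); /* clears the granule-internal offset bits */
  return (addr_a & mask) == (addr_b & mask);
}
----

With a 64-byte granule, for instance, the addresses `0x100` and `0x13f` share one reservation set while `0x140`
already belongs to the next one.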
.Physical Memory Attributes
[NOTE]
The reservation set can be set for _any_ address (only constrained by the configured granularity). This also
includes cached memory, memory-mapped IO devices and processor-external address spaces.
Bus transactions triggered by the LR instruction register a new reservation set and are delegated to the addressed
memory/device. Bus transactions triggered by the SC instruction remove a reservation set and are forwarded to the addressed
memory/device only if the SC operation succeeds. Otherwise, the access request is not forwarded and a local ACK is
generated to terminate the bus transaction.
.LR/SC Bus Protocol
[NOTE]
More information regarding the LR/SC bus transactions and the according protocol can be found in section
<<_bus_interface>> / <<_atomic_accesses>>.
.Cache Coherency
[IMPORTANT]
Atomic operations **always bypass** the cache using direct/uncached accesses. Care must be taken
to maintain data cache coherency (e.g. by using the `fence` instruction).
:sectnums:
==== IO Switch
The IO switch further decodes the address when accessing the processor-internal IO/peripheral devices and forwards
the access request to the according module. Note that a total address space size of 256 bytes is assigned to each
IO module in order to simplify address decoding. The IO-specific address map is also defined in the main VHDL
package file (`rtl/core/neorv32_package.vhd`).
.Exemplary Cut-Out from the IO Address Map
[source,vhdl]
----
-- IO Address Map --
constant iodev_size_c : natural := 256; -- size of a single IO device (bytes)
constant base_io_cfs_c : std_ulogic_vector(31 downto 0) := x"ffffeb00";
constant base_io_slink_c : std_ulogic_vector(31 downto 0) := x"ffffec00";
constant base_io_dma_c : std_ulogic_vector(31 downto 0) := x"ffffed00";
----
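
Because every IO module occupies an aligned 256-byte window, selecting a module reduces to comparing the upper
address bits with the module's base address. The following C sketch illustrates the idea using the three base
addresses from the cut-out above; it is not the actual switch implementation and the function names `io_select`
and `io_offset` are hypothetical.

.Behavioral Sketch of the IO Address Decoding
[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirrored from the IO address map cut-out */
#define IODEV_SIZE    256u
#define BASE_IO_CFS   0xFFFFEB00u
#define BASE_IO_SLINK 0xFFFFEC00u
#define BASE_IO_DMA   0xFFFFED00u

/* True if addr falls into the 256-byte window of the module located at base. */
bool io_select(uint32_t addr, uint32_t base) {
  assert((base & (IODEV_SIZE - 1u)) == 0u); /* base has to be aligned to the window size */
  return (addr & ~(IODEV_SIZE - 1u)) == base;
}

/* Byte offset of addr inside its module's address window. */
uint32_t io_offset(uint32_t addr) {
  return addr & (IODEV_SIZE - 1u);
}
----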
:sectnums:
==== Boot Configuration
Due to the flexible memory configuration, the NEORV32 Processor provides several different boot scenarios.
The following section illustrates the two most common boot scenarios.
.NEORV32 Boot Configurations
image::neorv32_boot_configurations.png[800]
There are two general boot scenarios: _Indirect Boot_ (1a and 1b) and _Direct Boot_ (2a and 2b) configured via the
`INT_BOOTLOADER_EN` generic. If this generic is `true` the _indirect boot scenario_ is used. This is also the
default boot configuration of the processor. If `INT_BOOTLOADER_EN` is `false` the _direct boot scenario_ is used.
:sectnums!:
===== Indirect Boot
The indirect boot scenarios **1a** and **1b** are based on the processor-internal <<_bootloader>>. This boot setup is enabled
by setting the `INT_BOOTLOADER_EN` generic to `true`, which will implement the processor-internal <<_bootloader_rom_bootrom>>.
This read-only memory is pre-initialized during synthesis with the default bootloader firmware. The bootloader provides several
options to upload an executable and copy it to the beginning of the _instruction address space_ so the CPU can execute it.
Boot scenario **1a** uses the processor-internal IMEM. This scenario implements the internal <<_instruction_memory_imem>>
as non-initialized RAM so the bootloader can copy the actual executable to it.
Boot scenario **1b** uses a processor-external IMEM that is connected via the processor's bus interface. In this scenario
the internal <<_instruction_memory_imem>> is not implemented at all and the bootloader will copy the executable to the
processor-external memory. Hence, the external memory has to be implemented as RAM.
:sectnums!:
===== Direct Boot
The direct boot scenarios **2a** and **2b** do not use the processor-internal bootloader since the `INT_BOOTLOADER_EN`
generic is set to `false`. In this configuration the <<_bootloader_rom_bootrom>> is not implemented at all and the CPU will
directly begin executing code from the beginning of the instruction address space after reset. An application-specific
"pre-initialization" mechanism is required in order to provide an executable inside the memory.
Boot scenario **2a** uses the processor-internal IMEM, which is implemented as _read-only memory_ in this scenario.
It is pre-initialized (by the bitstream) with the actual application executable during synthesis.
In contrast, boot scenario **2b** uses a processor-external IMEM. In this scenario the system designer is responsible for
providing an initialized external memory that contains the actual application to be executed.
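
The four boot scenarios can be summarized by a small C helper. This is only an illustrative sketch: the parameter
`imem_internal` (internal IMEM implemented or not), the type and function names, and the assumption that the
indirect boot scenarios start execution at the base of the bootloader ROM (`0xffffc000`) are not taken from the
text above.

.Sketch: Deriving the Boot Scenario from the Configuration
[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_IMEM_BASE 0x00000000u /* beginning of the instruction address space */
#define MEM_BOOT_BASE 0xFFFFC000u /* bootloader ROM base */

typedef enum { BOOT_1A, BOOT_1B, BOOT_2A, BOOT_2B } boot_scenario_t;

typedef struct {
  boot_scenario_t scenario;
  uint32_t        reset_addr;    /* where the CPU starts executing after reset */
  bool            imem_writable; /* instruction memory has to be RAM (bootloader copies into it) */
  bool            imem_preinit;  /* instruction memory has to contain the executable at reset */
} boot_cfg_t;

boot_cfg_t boot_config(bool int_bootloader_en, bool imem_internal) {
  boot_cfg_t cfg;
  if (int_bootloader_en) { /* indirect boot */
    cfg.scenario      = imem_internal ? BOOT_1A : BOOT_1B;
    cfg.reset_addr    = MEM_BOOT_BASE; /* ASSUMPTION: execution starts in the bootloader ROM */
    cfg.imem_writable = true;
    cfg.imem_preinit  = false;
  } else {                 /* direct boot */
    cfg.scenario      = imem_internal ? BOOT_2A : BOOT_2B;
    cfg.reset_addr    = MEM_IMEM_BASE;
    cfg.imem_writable = false; /* not required; 2a implements the IMEM as ROM */
    cfg.imem_preinit  = true;
  }
  return cfg;
}

/* Scenario label as used in the boot configuration figure. */
const char *boot_scenario_name(boot_scenario_t s) {
  static const char *const names[] = { "1a", "1b", "2a", "2b" };
  assert((unsigned)s < 4u);
  return names[s];
}
----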
<<<
// ####################################################################################################################
:sectnums:
=== Processor-Internal Modules
.Module Address Space Mapping
[IMPORTANT]
The base address of each component/module has to be aligned to the total size of the module's occupied address space.
The occupied address space has to be a power of two (minimum 4 bytes). Addresses of peripheral modules must not overlap.
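
The mapping constraints above can be checked with a few lines of C, for example when adding a custom module to the
address map. The helper names `module_map_ok` and `modules_overlap` are hypothetical; this is a sketch, not part of
the NEORV32 software framework.

.Sketch: Checking the Address Mapping Constraints
[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if size is a power of two (minimum 4 bytes) and base is aligned to size. */
bool module_map_ok(uint32_t base, uint32_t size) {
  bool pow2 = (size >= 4u) && ((size & (size - 1u)) == 0u);
  return pow2 && ((base & (size - 1u)) == 0u);
}

/* True if the address ranges [base_a, base_a+size_a) and [base_b, base_b+size_b) overlap. */
bool modules_overlap(uint32_t base_a, uint32_t size_a, uint32_t base_b, uint32_t size_b) {
  assert((size_a > 0u) && (size_b > 0u));
  /* 64-bit end addresses avoid wrap-around at the top of the 32-bit address space */
  uint64_t end_a = (uint64_t)base_a + size_a;
  uint64_t end_b = (uint64_t)base_b + size_b;
  return (base_a < end_b) && (base_b < end_a);
}
----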
.Full-Word Write Accesses Only
[IMPORTANT]
All peripheral/IO devices should only be written in full-word mode (i.e. 32-bit). Byte or half-word (8/16-bit) write accesses
might cause undefined behavior.
.IO Module's Address Space
[IMPORTANT]
Each peripheral/IO module occupies an address space of 256 bytes (64 words). Most devices do not fully utilize this address
space and will simply _mirror_ the available interface registers across the entire 256 bytes of address space.
.Unimplemented Modules / Address Holes
[NOTE]
When accessing an IO device that has not been implemented (disabled via the according generic)
or when accessing an address that is actually unused, a load/store access fault exception is raised.
.Module Interrupts
[NOTE]
Most peripheral/IO devices provide some kind of interrupt (for example to signal available incoming data). These
interrupts are entirely mapped to the CPU's <<_custom_fast_interrupt_request_lines>>.
See section <<_processor_interrupts>> for more information.
.CMSIS System View Description (SVD)
[TIP]
A CMSIS-SVD-compatible **System View Description (SVD)** file including all peripherals is available in `sw/svd`.
include::soc_imem.adoc[]
include::soc_dmem.adoc[]
include::soc_bootrom.adoc[]
include::soc_icache.adoc[]
include::soc_dcache.adoc[]
include::soc_dma.adoc[]
include::soc_wishbone.adoc[]
include::soc_slink.adoc[]
include::soc_gpio.adoc[]
include::soc_crc.adoc[]
include::soc_wdt.adoc[]
include::soc_mtime.adoc[]
include::soc_uart.adoc[]
include::soc_spi.adoc[]
include::soc_sdi.adoc[]
include::soc_twi.adoc[]
include::soc_onewire.adoc[]
include::soc_pwm.adoc[]
include::soc_trng.adoc[]
include::soc_cfs.adoc[]
include::soc_neoled.adoc[]
include::soc_xirq.adoc[]
include::soc_gptmr.adoc[]
include::soc_xip.adoc[]
include::soc_sysinfo.adoc[]