:sectnums:
=== Rationale

[discrete]
==== Why did you make this?

Processor and CPU architecture designs are fascinating things: they are the magic frontier where software meets hardware.
This project started as something like a _journey_ into this magic realm to understand how things actually work down on this very low level,
and it evolved over time into a capable system on chip.

But there is more: when I started to dive into the emerging RISC-V ecosystem I felt overwhelmed by the complexity.
As a beginner it is hard to get an overview - especially when you want to set up a minimal platform to tinker with...
Which core to use? How to get the right toolchain? What features do I need? How does booting work?
How do I create an actual executable? How to get that into the hardware? How to customize things?
**_Where to start???_**

This project aims to provide a _simple to understand_ and _easy to use_ yet _powerful_ and _flexible_ platform
that targets FPGA and RISC-V beginners as well as advanced users.

[discrete]
==== Why a _soft-core_ processor?

As a matter of fact, soft-core processors _cannot_ compete with discrete processors (like FPGA hard macros) in terms of
performance, energy efficiency and size. But they do fill a niche in the FPGA design space: for example, soft-core processors
allow implementing the _control flow part_ of certain applications (e.g. communication protocol handling) in software, e.g. in plain C.
This provides high flexibility, as software can easily be changed, re-compiled and re-uploaded.

Furthermore, the concept of flexibility applies to all aspects of a soft-core processor.
The user can add _exactly_ the features that are required by the application:
additional memories, custom interfaces, specialized co-processors and even user-defined instructions.

[discrete]
==== Why RISC-V?

image::riscv_logo.png[width=250,align=left]

[quote, RISC-V International, https://riscv.org/about/]
____
RISC-V is a free and open ISA enabling a new era of processor innovation through open standard collaboration.
____

Open-source is a great thing!
While open-source has already become quite popular in _software_, hardware-focused projects still need to catch up.
Admittedly, there has been quite some development, but mainly in terms of _platforms_ and _applications_ (i.e. schematics, PCBs, etc.).
Although processors and CPUs are the heart of almost every digital system, truly open-source silicon is still a rarity.
RISC-V aims to change that - and even though it is _just one approach_, it helps pave the way for future development.

Furthermore, I highly appreciate the community aspect of RISC-V. The ISA and everything beyond is developed in direct contact
with the community: this includes businesses and professionals but also hobbyists, amateurs and people that are just curious.
Everyone can join discussions and contribute to RISC-V in their very own way.

Finally, I really like the RISC-V ISA itself. It aims to be a clean, orthogonal and "intuitive" ISA that reflects
the basic concepts of _RISC_: simple yet effective.

[discrete]
==== Yet another RISC-V core? What makes it special?

The NEORV32 is not based on another RISC-V core. It was built entirely from the ground up (just following the official ISA specs).
The project does not intend to replace certain RISC-V cores or to beat existing ones like
https://github.com/SpinalHDL/VexRiscv[VexRiscv] in terms of performance or https://github.com/olofk/serv[SERV] in terms of size.
It was built with a different design goal in mind.
The project aims to provide _another option_ in the RISC-V / soft-core design space with a different performance vs. size
trade-off and a different focus: _embrace_ concepts like documentation, platform-independence / portability, RISC-V compatibility,
_extensibility & customization_ and _ease of use_ (see the <<_project_key_features>> below).

Furthermore, the NEORV32 puts a special focus on _execution safety_ using <<_full_virtualization>>.
The CPU aims to provide fall-backs for _everything that could go wrong_. This includes malformed instruction words,
privilege escalations and even memory accesses, which are checked for address space holes and for deterministic response
times of memory-mapped devices. Precise exceptions allow a defined and fully synchronized CPU state at any time and in any situation.

[discrete]
==== A multi-cycle architecture?!?!

Most mainstream CPUs out there are pipelined architectures to increase throughput. In contrast, most CPUs used for teaching
are single-cycle designs since they are probably the easiest to understand. But what about multi-cycle architectures?

In terms of energy, throughput, area and maximal clock frequency, multi-cycle architectures are somewhere in between
single-cycle and fully-pipelined designs: they provide higher throughput and clock speed than their single-cycle counterparts
while having less hardware complexity (= area) than fully-pipelined designs.
I decided to use the multi-cycle approach for the following reasons:

* Multi-cycle architectures are quite small! There is no need for pipeline hazard detection and resolution logic (e.g. forwarding).
Furthermore, parts of the core can be "re-used" for several tasks (e.g. the ALU is used for the actual data processing,
but also for address generation, branch condition checks and branch target computation).
* Single-cycle architectures require memories that can be read asynchronously - something that is not feasible in real-world
applications (i.e. FPGA block RAM is entirely synchronous). Furthermore, such designs usually have a very long critical path,
tremendously reducing the maximal operating frequency.
* Pipelined designs increase performance by having several instructions "in flight" at the same time. But this also means there
is some kind of "out-of-order" behavior: if an instruction at the end of the pipeline causes an exception, all the instructions
in earlier stages have to be invalidated. Potential architecture state changes have to be _undone_, requiring additional
(exception-handling) logic. In a multi-cycle architecture this situation cannot occur because only a single instruction
is "in flight" at a time.
* Having only a single instruction in flight not only reduces hardware costs, it also simplifies simulation/verification/debugging,
state preservation/restoring during exceptions and extensibility (no need to care about pipeline hazards) -
but of course at the cost of reduced throughput.

To counteract the loss of performance implied by a _pure_ multi-cycle architecture, the NEORV32 CPU uses a _mixed_ approach:
instruction fetch (front-end) and instruction execution (back-end) are de-coupled to operate independently of each other.
Data is exchanged via a queue, forming a simple 2-stage pipeline. Each "pipeline" stage in turn is implemented as a
multi-cycle architecture to simplify the hardware and to provide _precise_ state control (e.g. during exceptions).
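The following VHDL fragment is a minimal _conceptual sketch_ of this decoupling - it is *not* the actual NEORV32 RTL.
The entity name (`fetch_execute_demo`), the single-entry "queue", the three-state back-end FSM and all signal names are
purely illustrative assumptions made for this example; they only demonstrate the general idea of a fetch front-end feeding
a multi-cycle execute back-end through a queue.

[source,vhdl]
----
-- Conceptual sketch only - NOT the actual NEORV32 implementation.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fetch_execute_demo is
  port (
    clk_i       : in  std_ulogic; -- global clock
    rstn_i      : in  std_ulogic; -- global reset, async, low-active
    -- simplified instruction memory interface --
    imem_addr_o : out std_ulogic_vector(31 downto 0);
    imem_data_i : in  std_ulogic_vector(31 downto 0);
    imem_ack_i  : in  std_ulogic
  );
end entity;

architecture rtl of fetch_execute_demo is
  -- tiny 1-entry "queue" decoupling front-end and back-end --
  signal queue_data : std_ulogic_vector(31 downto 0);
  signal queue_full : std_ulogic;
  signal queue_pop  : std_ulogic;
  signal fetch_pc   : unsigned(31 downto 0);
  -- multi-cycle back-end: only one instruction in flight at a time --
  type exe_state_t is (DISPATCH, EXECUTE, WRITE_BACK);
  signal exe_state  : exe_state_t;
  signal exe_instr  : std_ulogic_vector(31 downto 0);
begin

  -- front-end: keep fetching as long as there is space in the queue --
  fetch_frontend: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      fetch_pc   <= (others => '0');
      queue_full <= '0';
    elsif rising_edge(clk_i) then
      if (queue_full = '0') and (imem_ack_i = '1') then
        queue_data <= imem_data_i; -- enqueue fetched instruction word
        queue_full <= '1';
        fetch_pc   <= fetch_pc + 4;
      elsif (queue_pop = '1') then
        queue_full <= '0'; -- back-end has consumed the entry
      end if;
    end if;
  end process fetch_frontend;

  imem_addr_o <= std_ulogic_vector(fetch_pc);

  -- back-end: small FSM, each instruction takes several cycles --
  execute_backend: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      exe_state <= DISPATCH;
      queue_pop <= '0';
      exe_instr <= (others => '0');
    elsif rising_edge(clk_i) then
      queue_pop <= '0'; -- default
      case exe_state is
        when DISPATCH => -- wait for the front-end to deliver an instruction
          if (queue_full = '1') then
            exe_instr <= queue_data; -- (decoding omitted in this sketch)
            queue_pop <= '1';
            exe_state <= EXECUTE;
          end if;
        when EXECUTE => -- shared ALU cycle: data processing or address generation
          exe_state <= WRITE_BACK;
        when WRITE_BACK => -- commit architectural state, then start over
          exe_state <= DISPATCH;
      end case;
    end if;
  end process execute_backend;

end architecture;
----

Whatever the real implementation looks like in detail, the property this sketch illustrates is the point made above:
the front-end can fetch ahead independently, while the back-end processes only a single instruction at a time,
which keeps the architectural state precise and easy to preserve/restore during exceptions.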