MIPSfpga+ allows loading programs via UART and has a switchable clock

Originally published on January 1, 2016
This revision is from February 1, 2016

MIPSfpga+ / mipsfpga-plus / MFP is a cleaned-up and improved variant of the MIPSfpga-based system defined in the MIPSfpga Getting Started package (MFGS). The new features include:

  1. The ability to load a software program from a PC into a synthesized system on an FPGA board using a USB-to-UART connection instead of Bus Blaster. Some FPGA boards already have a USB-to-UART interface; other boards can use a ubiquitous $5 FTDI-based USB-to-UART connector instead of the $50 Bus Blaster, which is difficult to get in some parts of the world.
  2. The ability to change the clock frequency on the fly from 50 or 25 MHz down to 12 Hz and 0.75 Hz (less than one cycle a second) to observe the CPU at work in real time, including cache misses and pipeline forwarding.
  3. An example of integrating a light sensor that uses the SPI protocol.
  4. A smaller software initialization sequence that fits in 1 KB instead of 32 KB of memory, which allows porting MIPSfpga to a wider selection of FPGA boards without using external memory.
  5. Miscellaneous fixes, such as improving the AHB-Lite slave to handle narrow uncached writes of 1 or 2 bytes.

The hierarchy of synthesizable modules for Digilent Nexys 4 DDR with Xilinx Artix-7 FPGA:

The hierarchy for RTL simulation:

MIPSfpga+ currently works on two FPGA boards:

  1. Digilent Nexys 4 DDR board with Xilinx Artix-7 FPGA. See Appendix A for how the board is connected to the applicable peripherals.
  2. Terasic DE0-CV with Altera Cyclone V. See Appendix B for how the board is connected to the applicable peripherals.

There is also one unfinished port and five planned ports:

  1. Terasic DE0-Nano board with Altera Cyclone IV FPGA. This port is implemented but it has some issues with clocking and interfacing to be investigated and fixed. See the Appendix C about how the board is connected with the applicable peripherals.
  2. Marsohod 3 board with Altera MAX10 FPGA
  3. Digilent Basys 3 with Xilinx Artix-7. MIPSfpga+ is likely to work on this board with no modification except adding Basys 3 wrapper (top-level Verilog and pin constraints).
  4. Digilent Arty with Xilinx Artix-7. MIPSfpga+ is likely to work on this board with no modification except adding the board wrapper (top-level Verilog and pin constraints).
  5. Terasic DE2-115 with Altera Cyclone IV
  6. Terasic DE1 with Altera Cyclone II

The description of MIPSfpga+ starts with the description of a basic system derived from MIPSfpga Getting Started package (later called MFGS), and gradually proceeds by adding more and more components and features. The source code for MIPSfpga+ is located at http://github.com/MIPSfpga/mipsfpga-plus; this code does not include any source code of MIPS microAptiv UP CPU core from MIPSfpga Getting Started package. A user of MIPSfpga+ is supposed to download Getting Started package version 1.2 from Imagination Technologies web site http://community.imgtec.com/downloads/mipsfpga-getting-started-version-1-2.

After downloading both MIPSfpga from Imagination site and MIPSfpga+ from GitHub, the user is expected to install MIPSfpga under 64-bit Microsoft Windows (either Windows 7 or Windows 8) by placing MIPSfpga into directory C:\MIPSfpga and MIPSfpga+ into C:\github\mipsfpga_plus. The paths inside MIPSfpga+ synthesis and simulation scripts rely on such installation.

MIPSfpga+ (as well as the original MFGS package) can also be used on a workstation with 32-bit Windows, 32-bit Linux or 64-bit Linux, with or without a Windows or Linux virtual machine. It is possible to install MIPSfpga+ in different directories and to use it with a number of Verilog simulators and synthesis tools: Synopsys VCS, Cadence IES, Mentor ModelSim, Icarus Verilog with GTKWave, Xilinx ISim and Vivado, Altera Quartus II, Synopsys Synplify Pro and others. Some usage scenarios require modifying the scripts and adhering to specific versions of EDA and software development tools, for example:

1. Basic System

1.1. General cleanup

Before adding any new features, it was necessary to make the original MFGS code more consistent in formatting, make module names more uniform, move some registers across the hierarchy to avoid separating closely related registers into different modules, and so on. The resulting MIPSfpga+ basic system had the following structure:

The hierarchy of synthesizable modules for Digilent Nexys 4 DDR with Xilinx Artix-7 FPGA:

The hierarchy for RTL simulation:

1.2. Adding batch files to Makefile

A standard practice is to use makefiles to run tasks like software build and simulation. However, in order to make the first experience with MIPSfpga more digestible and less confusing, a set of batch files with self-descriptive names was introduced in the directory of each software example:

  • 00_clean_all.bat
  • 01_compile_c_to_assembly.bat
  • 02_compile_and_link.bat
  • 03_check_program_size.bat
  • 04_disassemble.bat
  • 05_generate_verilog_readmemh_file.bat
  • 06_simulate_with_modelsim.bat
  • 07_simulate_with_icarus.bat
  • 08_generate_motorola_s_record_file.bat
  • 09_upload_to_xilinx_board_using_bus_blaster.bat
  • 10_upload_to_altera_board_using_bus_blaster.bat
  • 11_check_which_com_port_is_used.bat
  • 12_upload_to_the_board_using_uart.bat

Some students do not like the complexity of running Mentor ModelSim. For those students two shortcuts were added:

  • Two scripts (a batch file and a ModelSim Tcl script) that run ModelSim automatically and display the resulting waveforms: 06_simulate_with_modelsim.bat and modelsim_script.tcl
  • A script that runs the free (but slow) Icarus Verilog and displays the resulting waveforms using the GTKWave VCD viewer: 07_simulate_with_icarus.bat

1.3. A compact software boot sequence

Some FPGA boards have a very limited amount of internal block memory. These boards include Terasic DE0-Nano with Altera Cyclone IV and (not tried) Digilent Nexys 3 with Xilinx Spartan-6. For those boards, spending 32 or even 8 kilobytes of block memory on the boot/reset sequence is not an option: the design will simply not fit into the FPGA. It is possible, of course, to add an interface to external memory, but there is a simpler software-only solution: a version of the boot code that fits in just 1 kilobyte.

The major difference between this small version and the original MFGS version is that the small version does not link to crt0, the C startup code of the ANSI C library. The MIPSfpga+ small boot is based on code developed by Serge Vakulenko for LiteBSD.

2. Fixing AHB-Lite interface

2.1. Synthesis

The original MFGS memory slave did not handle AHB-Lite transactions resulting from narrow uncached writes of 1 or 2 bytes. MIPSfpga+ fixes this by splitting each 4-byte-wide memory into four 1-byte-wide memories and forming a proper write mask based on the AHB-Lite transaction size and alignment.

The original RAM instantiation in AHB-Lite slave:

The instantiations of four RAMs and the forming of the mask in the improved AHB-Lite slave:

This functionality during synthesis can be turned off by defining a macro MFP_USE_WORD_MEMORY.
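The masking idea can be illustrated with a small behavioral model. This is only a sketch in Python, not the MIPSfpga+ Verilog: the names byte_write_mask and ByteLaneMemory are hypothetical, the HSIZE encoding (0 = byte, 1 = halfword, 2 = word) follows the AHB-Lite specification, and little-endian byte lanes are assumed.

```python
def byte_write_mask(hsize: int, haddr: int) -> int:
    """Return a 4-bit mask; bit i set means byte lane i is written."""
    if hsize == 0:                      # byte: one lane selected by HADDR[1:0]
        return 0b0001 << (haddr & 3)
    if hsize == 1:                      # halfword: lanes 0-1 or 2-3, by HADDR[1]
        return 0b0011 << (haddr & 2)
    if hsize == 2:                      # word: all four lanes
        return 0b1111
    raise ValueError("transfer wider than the 32-bit memory")


class ByteLaneMemory:
    """Four 1-byte-wide memories that together behave as one 4-byte-wide memory."""

    def __init__(self, words: int):
        self.lanes = [bytearray(words) for _ in range(4)]

    def write(self, haddr: int, hsize: int, hwdata: int) -> None:
        mask = byte_write_mask(hsize, haddr)
        index = haddr >> 2              # word index inside the memory
        for lane in range(4):
            if (mask >> lane) & 1:      # only enabled lanes are updated
                self.lanes[lane][index] = (hwdata >> (8 * lane)) & 0xFF

    def read_word(self, haddr: int) -> int:
        index = haddr >> 2
        return sum(self.lanes[lane][index] << (8 * lane) for lane in range(4))


mem = ByteLaneMemory(16)
mem.write(0x10, 2, 0x11223344)          # full word
mem.write(0x11, 0, 0x0000AA00)          # single byte on lane 1
print(hex(mem.read_word(0x10)))
```

A single 4-byte-wide memory without per-byte write enables would overwrite the whole word on the second write; with the mask only lane 1 changes.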

2.2. Simulation

The original MFGS package prepared the HEX file to load into ModelSim for simulation using the objdump utility from the standard GCC toolchain, in combination with a time-consuming Windows-only script specific to MFGS. The HEX file in MIPSfpga+ is generated in a different, faster and somewhat more standard way, using the objcopy utility from the GCC toolchain with the option «-O verilog», in combination with a MIPSfpga+-specific utility called ad_hoc_program_hex_splitter:

mips-mti-elf-objcopy program.elf -O verilog program.hex
..\utilities\ad_hoc_program_hex_splitter

The utility ad_hoc_program_hex_splitter splits the file program.hex into two files, program_00000000.hex and program_1fc00000.hex, that correspond to two physical memory locations, starting from addresses 0x00000000 and 0x1fc00000 respectively. While splitting, ad_hoc_program_hex_splitter also converts virtual addresses into byte offsets in the corresponding memories.

In order to load the file created with objcopy and ad_hoc_program_hex_splitter into 4-byte-wide ram register array, the following testbench code is used in MFP_USE_WORD_MEMORY and non-MFP_USE_WORD_MEMORY modes:

3. Light sensor integration

Digilent PmodALS (Ambient Light Sensor) is an inexpensive peripheral with a simple version of the SPI protocol. The original MIPSfpga Fundamentals package demonstrates the SPI protocol using an LCD display peripheral as an example. Using the Digilent light sensor for the labs is a nice alternative: since the students have already worked with an output device (7-segment display), they may want to add an input device (light sensor) and build a useful system that reads data from the sensor and shows it on the 7-segment display.

The version of the SPI protocol used by the light sensor is very simple; it is described in just two paragraphs of the Digilent documentation:

http://digilentinc.com/Data/Products/PMOD-ALS/PmodALS_RM.pdf

The code needed to get data from the sensor is correspondingly also very simple:

Adding the sensor to the basic MIPSfpga+ system can be converted into an exercise / student lab in a fashion similar to Lab 8 in MIPSfpga Fundamentals. The students would be required to create the module above, instantiate it in some reasonable place in MIPSfpga+ basic system, and modify GPIO slave to map the sensor’s output to some software memory address.
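For students who want to check their hardware module against a software model, the decoding step can be sketched as below. The frame layout is an assumption based on a reading of the Digilent reference manual linked above (three leading zeros, eight data bits with the most significant bit first, four trailing zeros) and should be verified against that manual; decode_als_frame is a hypothetical helper, not part of MIPSfpga+.

```python
def decode_als_frame(bits):
    """Extract the 8-bit light value from 15 serially received bits.

    `bits` holds 0/1 values in the order they arrive from the sensor,
    starting with the first leading zero.
    """
    if len(bits) != 15:
        raise ValueError("expected 3 + 8 + 4 = 15 bits")
    if any(bits[:3]) or any(bits[11:]):
        raise ValueError("leading or trailing bits are not zero: check alignment")
    value = 0
    for bit in bits[3:11]:            # MSB first
        value = (value << 1) | bit
    return value


frame = [0, 0, 0,  1, 0, 1, 0, 0, 1, 0, 1,  0, 0, 0, 0]
print(decode_als_frame(frame))        # 0b10100101
```

The zero checks are useful in a lab setting: if the hardware samples on the wrong clock edge or starts one cycle late, the frame shifts and the check fails instead of silently producing a wrong reading.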

4. A hardware-only solution that loads programs into the synthesized system via UART

The original MIPSfpga Getting Started package (MFGS) allowed two ways of loading a software program into the memory of a synthesized MIPSfpga-based system. One way is to hardcode the program during RTL synthesis using Xilinx Vivado or Altera Quartus II. The other way is to load the program using a Bus Blaster debug probe in combination with OpenOCD software. Developers can buy such probes in places like SeeedStudio for $43.95.

Unfortunately, the Bus Blaster / OpenOCD solution is relatively new and has a history of driver conflicts under some operating systems. Besides, it is difficult to buy in some parts of the world.

MIPSfpga+ introduces a third, alternative way to load the program: through a USB-to-UART connection. Some FPGA boards (most notably Digilent Nexys 4 DDR and Basys 3) already have the necessary chip for this interface; other boards can use easy-to-find FTDI-based USB-to-UART connectors that cost less than $5 (in fact less than $2 on AliExpress). The MIPSfpga+ UART loader, or simply serial loader, also avoids any software driver conflict by putting all the functionality (UART communication, file parsing and filling the memory) into hardware.

4.1. Hardware compatible with serial loader

A picture of Bus Blaster:

A picture of an FTDI-based USB-to-UART connector. Note that you need to set the 3.3V/5V jumper on this connector to the 3.3V position to avoid potential damage to some sensitive FPGAs:

The serial loader is also compatible with the PL2303TA USB TTL to RS232 Converter Serial Cable module for win XP/VISTA/7/8/8.1. This cable is convenient for connecting a PC to Terasic/Altera boards with male GPIO pins, notably Terasic DE0-Nano. There is an alternative cable based on the PL2303HX chip; however, this cable has more compatibility problems with Windows 8.x, and we recommend using cables based on PL2303TA instead.

A picture of connecting Terasic DE0-Nano board to PC using PL2303TA USB TTL to RS232 Converter Serial Cable:

4.2. Module hierarchy with serial loader

Four new modules: mfp_ahb_lite_matrix_with_loader takes the place of mfp_ahb_lite_matrix from earlier hierarchy. mfp_ahb_lite_matrix_with_loader wraps previous mfp_ahb_lite_matrix together with three pieces of new functionality — mfp_uart_receiver, mfp_srec_parser and mfp_srec_parser_to_ahb_lite_bridge.

4.2.1. mfp_uart_receiver

mfp_uart_receiver receives data serially from the UART RX pin and outputs 8-bit bytes when data is ready. It assumes a simple version of the UART protocol, without control signals and with one start bit. The baud rate and the expected main clock rate are hardcoded. The module contains a state machine that waits for a negative edge (detecting a start bit) and samples data bits by counting clock cycles. Since each symbol is quite wide, 50,000,000 Hz / 115,200 baud = 434 clock cycles (or 217 for 25 MHz), this method of getting the data is quite reliable:
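The sampling scheme described above can be modeled cycle by cycle. The following Python sketch is not a translation of mfp_uart_receiver; it is a simplified model under the assumptions of 8 data bits sent least significant bit first, one stop bit and no parity, with the hypothetical helper uart_waveform standing in for the transmitter.

```python
CLOCK_HZ = 50_000_000
BAUD = 115_200
CLOCKS_PER_BIT = round(CLOCK_HZ / BAUD)   # 434 (217 for a 25 MHz clock)


def uart_waveform(data: bytes, clocks_per_bit: int = CLOCKS_PER_BIT) -> list:
    """RX line level on every clock cycle: idle high, start bit low, stop bit high."""
    levels = [1] * (2 * clocks_per_bit)
    for byte in data:
        bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
        for bit in bits:
            levels += [bit] * clocks_per_bit
    return levels + [1] * (2 * clocks_per_bit)


def uart_receive(levels: list, clocks_per_bit: int = CLOCKS_PER_BIT) -> bytes:
    """Wait for a falling edge, then sample each data bit in the middle of its slot."""
    received = bytearray()
    i = 1
    while i < len(levels):
        if not (levels[i - 1] == 1 and levels[i] == 0):
            i += 1
            continue
        if i + 10 * clocks_per_bit > len(levels):
            break                          # incomplete frame at the end of the trace
        byte = 0
        for bit_index in range(8):
            sample_at = i + (bit_index + 1) * clocks_per_bit + clocks_per_bit // 2
            byte |= levels[sample_at] << bit_index
        received.append(byte)
        i += 9 * clocks_per_bit + clocks_per_bit // 2   # jump into the stop bit
    return bytes(received)


print(CLOCKS_PER_BIT, uart_receive(uart_waveform(b"S3\n")))
```

Sampling in the middle of a 434-cycle slot leaves roughly 200 cycles of margin on either side, which is why small mismatches between the two clocks do not corrupt the data.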

4.2.2. srec_parser

srec_parser receives data from mfp_uart_receiver and parses it as text in Motorola S-record file format. During parsing, the state machine inside srec_parser forms transactions to the memory of the MIPSfpga+ synthesized system, filling the memory with the specified bytes at the specified locations:

Here is the description of Motorola S-record format from Wikipedia:

Text in Motorola S-record format produced by the standard GCC toolchain utility objcopy from ELF file:

mips-mti-elf-objcopy program.elf -O srec program.rec
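To make the record layout concrete, here is a software sketch of what a parser has to do with each line of such a file. It follows the publicly documented S-record layout (type, byte count, big-endian address, data, one's-complement checksum); the function names are hypothetical, and the hardware srec_parser may handle errors and record types differently.

```python
# Width of the address field, in bytes, for each record type.
ADDRESS_BYTES = {"0": 2, "1": 2, "2": 3, "3": 4, "5": 2, "7": 4, "8": 3, "9": 2}


def parse_srec_line(line: str):
    """Return (record type, address, data bytes) for one S-record line."""
    line = line.strip()
    if len(line) < 2 or line[0] != "S" or line[1] not in ADDRESS_BYTES:
        raise ValueError(f"not a supported S-record: {line!r}")
    payload = bytes.fromhex(line[2:])
    count, body, checksum = payload[0], payload[1:-1], payload[-1]
    if count != len(payload) - 1:
        raise ValueError("byte count does not match the record length")
    if (sum(payload[:-1]) & 0xFF) ^ 0xFF != checksum:
        raise ValueError("checksum mismatch")
    width = ADDRESS_BYTES[line[1]]
    return line[1], int.from_bytes(body[:width], "big"), body[width:]


def load_srec(text: str):
    """Fill a {address: byte} dictionary; stop at the termination record."""
    memory, entry = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, address, data = parse_srec_line(line)
        if kind in "123":                  # data records
            for offset, value in enumerate(data):
                memory[address + offset] = value
        elif kind in "789":                # termination records carry the entry point
            entry = address
            break
    return memory, entry


memory, entry = load_srec("S30980000000010203046C\nS705BFC000007B\n")
print({hex(a): v for a, v in memory.items()}, hex(entry))
```

The two example records were constructed by hand for this sketch: one S3 record that stores bytes 01 02 03 04 at 0x80000000, and one S7 termination record with address 0xBFC00000.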

No specialized software is required to send this file from a PC to MIPSfpga+; the user just has to run the following three commands («21» is an example number assigned by Windows to the virtual COM port on USB; each user should set it to the port number shown in Windows Device Manager):

set a=21
mode com%a% baud=115200 parity=n data=8 stop=1 to=off xon=off odsr=off octs=off dtr=off rts=off idsr=off
type FPGA_Ram.rec >\\.\COM%a%

In theory, the same approach can be used on a PC running Linux, only with different commands (this has not been tried yet):

stty -F /dev/ttyUSB0 raw 115200
cat program.rec > /dev/ttyUSB0

A user doing this must be a member of the dialout group.

4.2.3. mfp_srec_parser_to_ahb_lite_bridge

mfp_srec_parser_to_ahb_lite_bridge is a glue between srec_parser and AHB-Lite bus. It also edits the addresses, converting virtual addresses into physical according to the rules of fixed mapping (see MIPS microAptiv UP core software documentation):

4.3. Miscellaneous

The output of srec_parser, a signal called in_progress, is used as a reset for the microAptiv UP processor core. This means that while the serial loader fills the memory with S-record file data, the processor is not accessing the memory. Once srec_parser gets the termination record (S7), the core wakes up and starts to fetch the program from the newly filled memory.

The serial loader mechanism does not disable interfacing with the regular Bus Blaster / OpenOCD. Both ways of loading programs, serial loader and Bus Blaster, can be used without re-synthesizing the system.

A major disadvantage of loading the program through the serial loader is that you cannot debug the software this way. To use a debugger like gdb, you still need Bus Blaster.

5. Switchable clock down to less than one cycle a second

The switchable clock is a feature of MIPSfpga+ that enables a whole new category of student labs. A student can run the processor at its usual multi-megahertz frequency, then switch it to a few clocks per second and observe how it works live. A typical usage is to connect the CPU signals that control cache evictions or pipeline forwarding to external LEDs and observe the LED patterns when running different sequences of code.

5.1. Switchable clock implementation

A few words about MIPS microAptiv UP frequencies.

When implemented in ASIC using 28 nm technology, MIPS microAptiv UP can run up to 500 MHz; when implemented using 65 nm — more than 300 MHz.

When MIPS microAptiv UP is synthesized for FPGA, the frequency is much lower — around 50-60 MHz, both for Xilinx and Altera.

The introductory student boards tested for MIPSfpga all have clock generators able to generate a clock signal with a frequency of 50-100 MHz. This frequency can be increased or decreased using a phase-locked loop (PLL). Unfortunately, a PLL cannot be used to lower the frequency below approximately 100 kHz. In order to lower the frequency even further, other methods have to be used.

Altera has a special macro for such situations called ALTCLKCTRL, but for some reason it did not work. As a result, the switchable clock was implemented using a combination of a counter and a global buffer (Xilinx macro BUFG and Altera macro global).
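The slow frequencies quoted in this document (12 Hz and 0.75 Hz) are consistent with taking the clock from a single bit of a free-running binary counter, although the text does not state the divider values. Under that assumption, the arithmetic can be checked with a short Python sketch; divided_frequency is a hypothetical helper.

```python
def divided_frequency(source_hz: float, counter_bit: int) -> float:
    """Frequency of one bit of a counter incremented on every source clock.

    Bit 0 toggles on every cycle, so it runs at source_hz / 2;
    each higher bit halves the frequency again.
    """
    return source_hz / 2 ** (counter_bit + 1)


for source_hz, bits in ((50e6, (21, 25)), (25e6, (20, 24))):
    for bit in bits:
        print(f"{source_hz / 1e6:.0f} MHz, counter bit {bit}: "
              f"{divided_frequency(source_hz, bit):.3f} Hz")
```

With a 50 MHz source, division by 2^22 gives about 11.9 Hz and division by 2^26 gives about 0.745 Hz. Because the divided signal comes from ordinary logic rather than a dedicated clock output, it has to be routed through a global buffer before it can clock the rest of the design, which is the role of BUFG and global mentioned above.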

The frequency is controlled by two switches. Switches also have to be debounced.

This is how switchable clock is instantiated for Xilinx:

This is how switchable clock is instantiated for Altera:

The modules with a counter:

The whole thing also requires adding appropriate constraints. It is possible that these constraints are not perfect right now and require more work. Incomplete constraints may be the reason the switchable clock works well on Nexys 4 DDR and Terasic DE0-CV but fails on some systems with Terasic DE0-Nano.

5.2. An example of a student experiment: the switchable clock makes it possible to directly observe the CPU cache in action

The switchable clock makes it possible to show the internals of the processor to the students live. Here is an example: a signal that indicates cache eviction is connected to an external LED. Now it is possible to observe cache misses when a program fills a two-dimensional array. This example can be run twice: once with the array filled by columns and once with the array filled by rows. These runs generate different patterns of LED blinking.

Specifically, since a cache line of MIPS microAptiv UP has a size of four words, the following pattern appears when filling the array column after column: miss hit hit hit miss hit hit hit … When the array is filled row by row, the observed pattern is different: miss miss miss … 8 times … miss hit hit hit … 24 times …
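Both patterns can be reproduced with a very small cache model. The Python sketch below makes several simplifying assumptions that are not taken from the source: the array has 8 x 4 one-word elements (dimensions chosen to match the 8 and 24 counts above), it starts on a cache line boundary, a store that misses brings the whole line into the cache, and the cache is large enough that nothing is evicted during the run.

```python
LINE_WORDS = 4        # cache line size from the text: four words
DIM_A, DIM_B = 8, 4   # hypothetical array dimensions: a[DIM_A][DIM_B]


def access_pattern(word_indices):
    """Classify each access as a miss (line not yet loaded) or a hit."""
    loaded_lines = set()
    pattern = []
    for word in word_indices:
        line = word // LINE_WORDS
        pattern.append("hit" if line in loaded_lines else "miss")
        loaded_lines.add(line)
    return pattern


# Word index of element [i][j] in a contiguous, row-major C array.
sequential = [i * DIM_B + j for i in range(DIM_A) for j in range(DIM_B)]
strided = [i * DIM_B + j for j in range(DIM_B) for i in range(DIM_A)]

print(" ".join(access_pattern(sequential)[:8]), "...")
print(access_pattern(strided).count("miss"), "misses, then",
      access_pattern(strided).count("hit"), "hits")
```

The traversal that touches consecutive addresses produces the repeating miss hit hit hit pattern, while the traversal that jumps by a full line on every access produces 8 misses followed by 24 hits. Which loop nest is the sequential one depends on how the array is declared and indexed: with C's row-major layout it is the one whose inner loop walks the last index.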

To run the demonstration program below, a user has to start with fast clock, go through the initialization sequence with switch 2 off, then switch the clock from 25 MHz to 12 Hz, turn switch 2 on and observe the pattern. After that a user has to modify the program, compile and load it to the board, and run the whole thing again.

Note that such demos are very sensitive to compiler optimizations, so the code should be kept simple and straightforward, otherwise the compiler moves actions around and the pattern becomes unclear. Also note that the first ~3 cache misses likely result from instruction fetches filling L1 instruction cache, not L1 data cache:

6. Contributors to MIPSfpga+

MIPSfpga+ is based on MIPSfpga Getting Started. The contributors to MIPSfpga Getting Started are listed in MIPSfpga Getting Started presentation inside the package. The contributors to MIPSfpga+ itself include:

  • Yuri Panchul (most of coding)
  • Alex Belits (board bringup, UART debug)
  • Anonymous (board and Bus Blaster bringup)
  • Ilya Neganov (clocking)
  • Christos Sakellariou (clocking)
  • Serge Vakulenko (boot / reset sequence)
  • Alexey Frunze (the idea of cache lab)
  • Members of MIPSfpga and Silicon Russia Google groups

7. Useful links:

MIPSfpga programme opens up the MIPS architecture to universities worldwide

Connecting the parts and running the synthesis for Digilent Nexys 4 DDR board with Xilinx Artix 7 FPGA

Connecting the parts and running the synthesis for Terasic DE0 CV board with Altera Cyclone V FPGA

Connecting the parts for Terasic DE0 Nano board with Altera Cyclone IV FPGA

Peripherals from Digilent useful in creating student labs using MIPSfpga

MIPSfpga forum on MIPS Insider