Notes and Tips
...because it's hard to remember all those command lines...
- massive WTF warning: migen does not deterministically generate code!! You are not going crazy!! Running the exact same script twice can and will produce different netlists. This is because Python randomizes string hashing per process, so the iteration order of sets (and of dictionaries on older Pythons such as 3.5) changes from run to run, which seems to occasionally trigger edge-case bugs (in my case, some CSR connections were being eliminated randomly). The workaround seems to be to set the "PYTHONHASHSEED" environment variable. My fix is to put this check at the top of main:
import os, sys
if os.environ.get('PYTHONHASHSEED') != "0":  # .get() so an unset variable also aborts cleanly instead of raising KeyError
    print("PYTHONHASHSEED must be 0 for deterministic compilation results; aborting")
    sys.exit(1)
If your goal is to do "production-grade" hardware which includes customer support patches that don't involve extensively re-validating every single corner case, you'll want this check. This also means you shouldn't use migen to generate hard-coded security keys to be embedded in your verilog netlist. But I'd rather have a deterministic CSR space, especially when my application doesn't require random number generation.
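If you want to see the effect for yourself, here's a minimal sketch. It is not migen code, just plain Python that launches child interpreters with different PYTHONHASHSEED values and prints the iteration order of a small set of strings. The helper name `set_order` and the `csr_*` strings are made up for illustration.

```python
import os
import subprocess
import sys

# Run in a fresh interpreter: print the iteration order of a set of strings.
SNIPPET = "print(list({'csr_a', 'csr_b', 'csr_c', 'csr_d'}))"

def set_order(seed):
    """Return the set iteration order seen by a child Python started with this PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run([sys.executable, "-c", SNIPPET], env=env,
                            stdout=subprocess.PIPE, universal_newlines=True,
                            check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    for seed in (0, 0, 1, 2, 3):
        print(seed, set_order(seed))
```

The two runs with seed 0 always agree with each other; the other seeds will usually (not always) print a different order, which is the same kind of variation that leaks into the generated netlist.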
-
If your video comes out "semi-backwards", try using the parameter "reverse=True" when requesting the DRAM port on either the read or write side (but not both). "semi-backwards" means, the overall scan order is correct, but every 16 or so pixels is scanned in reverse order. Basically what's happening is the memory wants to do a 256-bit burst transaction, and the order in which the smaller pixel data is piled into the burst packet is reversed by one side or the other.
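A rough model of what "semi-backwards" looks like, as a sketch only. It assumes 16-bit pixels (so 16 pixels per 256-bit burst) and uses made-up helper names; the real packing is done by the DRAM port, not by code like this.

```python
PIXEL_BITS = 16        # assumption: 256-bit burst / 16 pixels per burst
PIXELS_PER_BURST = 16
PIXEL_MASK = (1 << PIXEL_BITS) - 1

def pack_burst(pixels, reverse=False):
    """Pack one burst worth of pixels into an integer; pixel 0 lands in the LSBs unless reverse is set."""
    assert len(pixels) == PIXELS_PER_BURST
    ordered = pixels[::-1] if reverse else pixels
    word = 0
    for slot, pixel in enumerate(ordered):
        word |= (pixel & PIXEL_MASK) << (slot * PIXEL_BITS)
    return word

def unpack_burst(word, reverse=False):
    """Pull the pixels back out of a burst word, optionally in reversed slot order."""
    pixels = [(word >> (slot * PIXEL_BITS)) & PIXEL_MASK
              for slot in range(PIXELS_PER_BURST)]
    return pixels[::-1] if reverse else pixels

def scan_line(line, write_reverse=False, read_reverse=False):
    """Push a line of pixels through write-side packing and read-side unpacking."""
    out = []
    for start in range(0, len(line), PIXELS_PER_BURST):
        word = pack_burst(line[start:start + PIXELS_PER_BURST], reverse=write_reverse)
        out.extend(unpack_burst(word, reverse=read_reverse))
    return out

line = list(range(32))
print(scan_line(line, read_reverse=True))  # each group of 16 comes out backwards
```

When only one side is reversed, the bursts arrive in the right order but each group of 16 is flipped. Reversing both sides (or neither) gives back the original line, which is why the fix goes on one side but not both.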
-
Array indices in Python are foo[offset:offset+length]. So if you want to select the bottom 10 bits of a signal, you do a[:10] instead of a[9:0] (an omitted start index implicitly means 0 in Python). The next 10 bits are a[10:20], instead of a[19:10]. You can't "reverse" MSB/LSB in Python like you can in Verilog by changing the index order in the brackets.
- FYI - This is the standard Python "slicing" syntax - https://www.pythoncentral.io/how-to-slice-listsarrays-and-tuples-in-python/
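If you think in Verilog ranges, a tiny translation helper can keep you honest. This is just a sketch: the function names are hypothetical, and a plain list of bits (LSB first) stands in for a migen Signal.

```python
def verilog_to_slice(msb, lsb):
    """Translate a Verilog [msb:lsb] part-select into the Python slice [lsb:msb+1]."""
    if msb < lsb:
        raise ValueError("msb must be >= lsb; a Python slice can't flip the bit order")
    return slice(lsb, msb + 1)

def to_bits(value, width):
    """Bits of value as a list, LSB first, so index 0 is bit 0."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Inverse of to_bits: rebuild the integer from an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))

a = to_bits(0xABCDE, 20)
low10 = from_bits(a[verilog_to_slice(9, 0)])     # same bits as a[:10]
next10 = from_bits(a[verilog_to_slice(19, 10)])  # same bits as a[10:20]
print(hex(low10), hex(next10))
```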
-
If your installation is corrupted or there's a hard-fork of migen/litex, the packages are installed here by default: /usr/local/lib/python3.5/dist-packages. Nuke litex* and migen* and then re-run the install scripts. You'll also want to nuke your build/software directory because some Makefile paths are hard-coded (assuming you're using Florent's build env).
-
Apparently Python has package management issues, so people are recommending sandboxing it. There are far too many options for doing this (I count four), which means I'm bound to get it wrong because I have no idea why I would want to use any one option over another, so I'm sticking with what the scripts do "by default", following the installation procedure to the letter as written in their README files.
-
Mismatches in bit vector lengths are kicked down the road to Verilog. So,
a = Signal(31)
b = Signal(16)
self.comb += a.eq(b)
Just generates the verilog "assign a = b".
It's best to confirm if the compiler does what you think it does in this case. There is a spec but there are extensions that can, for example, allow for signed numbers.
Confirmed that Vivado's convention is:
- if lvalue is wider than rvalue, then LSB-align rvalue and zero-pad
- if lvalue is narrower than rvalue, then lvalue gets the LSB's of rvalue
So wire [15:0] a; wire[7:0] b; assign a = b; is implicitly assign a[15:0] = {8'b0, b};
and
wire [7:0] a; wire[15:0] b; assign a = b; is implicitly assign a[7:0] = b[7:0];
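The same convention as a few lines of Python, in case you want to predict what a mismatched assignment will do before you build. This is a sketch for unsigned wires only (signed extension is not modeled), and the function name is made up.

```python
def verilog_assign(rvalue, rvalue_width, lvalue_width):
    """Model `assign lvalue = rvalue;` for unsigned wires of different widths."""
    rvalue &= (1 << rvalue_width) - 1          # rvalue only holds rvalue_width bits
    # Truncate to the lvalue width; when the lvalue is wider, the upper
    # bits are simply zero, which is the zero-padding case.
    return rvalue & ((1 << lvalue_width) - 1)

print(hex(verilog_assign(0xAB, 8, 16)))    # wider lvalue: zero-padded
print(hex(verilog_assign(0xABCD, 16, 8)))  # narrower lvalue: keeps the LSBs
```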
-
Bug open here -> https://github.com/m-labs/migen/issues/102
-
Clock domains. The "migen way" is to late-bind clock domain definitions. If you don't specify a domain, it defaults to "sys". Otherwise, you use clock domain renamers to bind the clock domains of modules.
I haven't found good docs on the clock domain crossing functions. Based on some test code I ran and an attempt to read the very difficult Python code:
- Every Module (a Python class, not a function) has an implicit clock domain of "sys", which is 100 MHz "by default"
- If you want other clocks, you need a line like "self.clock_domains.cd_YOURDOMAIN = ClockDomain()" where YOURDOMAIN is the name of your clock domain. There's some magic thing you can do to create a reset synchronizer by including "AsyncResetSynchronizer(self.cd_YOURDOMAIN, RESET_SIGNAL)" where RESET_SIGNAL is your asynchronous (external) reset signal. Later on you can do "ResetSignal("YOURDOMAIN")" to summon the synchronized reset signal in any rvalue.
- Late-bind clock domains using ClockDomainsRenamer(). Assuming you wrote a function FOO just using the ambiguous "self.sync +=" idiom, you can bind all the "self.sync" statements inside FOO to a new domain NEWCLOCKDOMAIN with "bar = ClockDomainsRenamer("NEWCLOCKDOMAIN")(FOO())" and then "self.submodules += bar".
- If you have more than one domain to rename within a function, use a dictionary:
def _to_hdmi_in0_pix(m):
    return ClockDomainsRenamer(
        {"pix_o": "hdmi_in0_pix_o",  # format seems to be "old name" : "new name"
         "pix5x_o": "hdmi_in0_pix5x_o"}
    )(m)
self.submodules.hdmi_out0_clk_gen = _to_hdmi_in0_pix(hdmi_out0_clk_gen)
The self.submodules += bar is critical; if you forget it migen silently fails to actually create FOO() and just gives you the initial value of whatever the signals were inside FOO().
"bar" is now your instance of FOO() and you have to pull signals out of bar by using the "." notation to reference the signal instances within the bar object.
-
Still trying to figure out why some things are self.submodules += while some things are self.name = bar or name = bar; it seems to have to do with scoping rules and a lot of .. implicit stuff that's probably obvious to a Python programmer but totally opaque to me.
-
If you have signals in domain A and you want them in domain CLKWONK, then you use MultiReg(). Specifically, "self.specials += MultiReg(a, b, "CLKWONK")". Note the assignment here is a->b, unlike the right-to-left nature of a.eq(b). MultiReg() in this case creates two back-to-back FFs to cross the clock domains.
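To make the "two back-to-back FFs" concrete, here is a behavioral sketch in plain Python. It is not MultiReg itself, the class name is hypothetical, and it does not model metastability; it only shows the ordering and latency in the destination domain.

```python
class TwoFlopSync:
    """Behavioral model of two back-to-back flip-flops clocked in the destination domain."""

    def __init__(self, reset=0):
        self.ff1 = reset
        self.ff2 = reset

    def tick(self, a):
        """One destination-clock edge: sample input a, return the new output b."""
        # Both flops update on the same edge: ff2 takes the old ff1, ff1 takes a.
        self.ff2, self.ff1 = self.ff1, a
        return self.ff2

sync = TwoFlopSync()
print([sync.tick(a) for a in [1, 1, 1, 0, 0, 0]])
```

The input change only shows up on b at the second destination-clock edge after it is first sampled, which is the latency you pay for the crossing.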
-
There seem to be some other really useful primitives inside litex's cdc.py with evocative names like "Gearbox", "PulseSynchronizer", "ElasticBuffer", and "BusSynchronizer". I wish someone would tell me what they do and how to use them. Like a datasheet, or some sort of documentation like that.
-
Here's an example of the kind of documentation you can expect (from litex/soc/interconnect/csr.py):
write_from_dev : bool
Allow the design to update the CSRStorage value.
*Warning*: The atomicity of reads by the CPU is not guaranteed.
alignment_bits : int
???
name : string
Provide (or override the name) of the ``CSRStatus`` register.
It's actually kind of critical to know how they treat alignment bits, because I have a lot of bugs where the firmware is jamming in words when the registers expect bytes, and vice versa. The convention is not fixed and if you don't write your code carefully you can even get into CSRs that are implicitly aligned to for example cache-line or burst-width boundaries which is terribly non-obvious.
- All signals are active high by convention, unless indicated by _b or _n suffix.
- Streams: outside the module, a source of the module is where the data comes from, sink is where the data goes
- A stream without any special modifiers degenerates to simply a self.comb += statement. Has to have modifiers glommed onto it, e.g. sink.AsyncFIFO() to incorporate some logic into that stream
- Boundary scan SPI repos: https://github.com/jordens/bscan_spi_bitstreams - set of precompiled bitstreams that allow bscan update of SPI flash.
- scripts/enter-env.sh # run this before trying anything in litex
- export TARGET= # to change which target the environment builds
- ssh [email protected] -L1234:localhost:1234 -v # forward port to localhost via ssh
edit -- what you really want is pycharm. You'll need to make sure it's pointing to python3.5 and you also need to set up the litex paths (using python3 setup.py in the litex directory), but once it's running you can control-click through the object hierarchy.
Use ipython or bpython (via pip install bpython) for browsing netlist structure
from targets.netv2.video import SoC
from platforms.netv2 import Platform
s=SoC(Platform())
Now you can use dir/completion to navigate the hierarchy via the "s" object. Signals resolve as "signal" and these can be dropped into litescope.
There's some nice features and sanity checks available in the Vivado UI that LiteX doesn't have. Just because it's all command-line doesn't mean you have to lose this.
Inside your build's gateware directory (e.g. build/platform_target_cpu/gateware) there is a "top.tcl" file. This is a script that drives vivado. If you want to e.g. run additional analyses on the automated compilation, you can just comment out the "quit" statement at the end and then run the tcl file, and it'll stop with the UI open.
If you want to use a pre-built run, you can just load a checkpoint. Start vivado from the build's gateware directory (the command is literally "vivado", assuming you've already entered the litex environment using the "source scripts/enter-env.sh" command) and then just type
open_checkpoint top_route.dcp
(there's also a menu option to "open checkpoint" under the File menu)
This will load in the entire design just after place and route, which is where all the analysis steps get run. From here you can view the graphical floorplan, schematics, and run additional timing/clock interaction reports.
To make a clock global (so you can pull it in submodules without passing it explicitly through the args), you declare a "clock domain"
self.clock_domains.cd_george = ClockDomain()
self.specials += Instance("BUFG", name="george", i_I=georgeunbuf, o_O=self.cd_george.clk)
self.specials += AsyncResetSynchronizer(self.cd_george, unsynced_reset_signal) # unsynced_reset_signal is any signal from another clock domain that's to serve as a reset, eg a PLL lock etc
In the submodule, you can then use
self.george_clk = ClockSignal("george") # use george_clk wherever the clock is needed
self.res_george_sync = ResetSignal("george") # use res_george_sync wherever the reset is needed
If you want to use an existing verilog module you've written in a LiteX project, you need to first import it by calling add_source for every file in your module's hierarchy, e.g.
self.platform.add_source("full/path/to_module/module1.v")
self.platform.add_source("full/path/to_module/module2.v")
By "full/path/to_module" I mean the full path from the directory where you type "make", e.g. the top level of the build, not the absolute path on the filesystem.
You want to put those calls in your "target" file, inside the init function of your top-level SoC (not in the file with the list of pin mappings).
Then, you use the "specials" construct to instantiate the module. This can be done at any level in the LiteX hierarchy once the source files are incorporated in your target file.
The python syntax:
self.specials += Instance("module1",
i_port1=litex_signal1,
o_port2=litex_signal2
)
Would correspond to this verilog template:
module module1 (
    input wire port1,
    output wire port2
);
The "i_" and "o_" prefixes on the Python side mark the port direction; the rest of the keyword (port1, port2) has to match the port name in the Verilog module.
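To illustrate the naming rule, here's a toy renderer that turns Instance-style keyword arguments into the instantiation text you'd expect to find in the generated Verilog. It is only a sketch of the mapping, not migen's actual code, and `render_instance` is a made-up name.

```python
def render_instance(module, **ports):
    """Render a Verilog instantiation from Instance-style keyword arguments."""
    connections = []
    for key, signal in sorted(ports.items()):   # sorted for a stable order
        prefix, _, port = key.partition("_")
        if prefix not in ("i", "o", "io") or not port:
            raise ValueError("expected an i_/o_/io_ prefix, got %r" % key)
        # The direction prefix is dropped; what is left is the Verilog port name.
        connections.append("    .%s(%s)" % (port, signal))
    return "%s %s (\n%s\n);" % (module, module, ",\n".join(connections))

print(render_instance("module1",
                      i_port1="litex_signal1",
                      o_port2="litex_signal2"))
```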
Litescope is the LiteX equivalent of Chipscope ILA.
Litescope can export its data through several ports: a dedicated UART port (separate from the firmware/CLI port), ethernet, PCI-express...in this example, we assume we've created an additional UART port dedicated to the Litescope interface. This is the slowest method but also the simplest and good for bootstrap debugging.
You will want to insert these lines inside your top-level .py SoC description file. At the top:
from litex.soc.cores import uart # these two lines may already be present
from litex.soc.cores.uart import UARTWishboneBridge
from litescope import LiteScopeAnalyzer
Then inside your top-level SoC instantiation function:
class MySoC(BaseSoC):
    csr_peripherals = {
        "probably_several_other_peripherals_already",  ## you'll have several of these
        "analyzer",  ## << this line
    }
    csr_map_update(BaseSoC.csr_map, csr_peripherals)  ## you probably have this already

    def __init__(self, platform, *args, **kwargs):  ## you definitely have this
        BaseSoC.__init__(self, platform, *args, **kwargs)
        # these two lines add in the debug interface
        self.submodules.bridge = UARTWishboneBridge(platform.request("debug_serial"), self.clk_freq, baudrate=115200)
        self.add_wb_master(self.bridge.wishbone)
        ### all of your SoC code here
        ## now connect your analyzer to the signals you want to look at
        analyzer_signals = [
            self.hdmi_in0.frame.de,
            self.crg.rst,
            self.hdmi_in0.resdetection.valid_i
        ]
        self.submodules.analyzer = LiteScopeAnalyzer(analyzer_signals, 1024)  # 1024 is the depth of the analyzer

    ## add this function to create the analyzer configuration file during compilation
    def do_exit(self, vns, filename="test/analyzer.csv"):
        self.analyzer.export_csv(vns, filename)
After you run "make gateware", there will be a "test" sub-directory in your build/ directory. The test directory will contain an analyzer.csv and a csr.csv file. These configure the host-side drivers for the litescope.
Now in order to engage litescope, you have to start the litex_server:
litex_server uart /dev/ttyUSB0 115200 &
This of course assumes you have your UART cable at /dev/ttyUSB0 and connected properly to the board. If it works well, you'll get some output to the effect of:
LiteX remote server [CommUART] port: /dev/ttyUSB0 / baudrate: 115200 / tcp port: 1234
You can telnet to this port to confirm it's up and running.
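If you'd rather check from a script than from telnet, a plain TCP connect does the same job. A minimal sketch, assuming the server is on localhost and the default tcp port 1234 shown in the output above; `port_is_open` is a made-up helper, not part of litex.

```python
import socket

def port_is_open(host="localhost", port=1234, timeout=1.0):
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("litex_server reachable:", port_is_open())
```

This only tells you the TCP side is listening; it says nothing about whether the UART link to the board actually works.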
Once the server is running you can run the following script to drive the interface:
#!/usr/bin/env python3
import time
from litex.soc.tools.remote import RemoteClient
from litescope.software.driver.analyzer import LiteScopeAnalyzerDriver
wb = RemoteClient()
wb.open()
# # #
analyzer = LiteScopeAnalyzerDriver(wb.regs, "analyzer", debug=True)
analyzer.configure_trigger(cond={"hdmi_in0_frame_de" : 1}) # only include this if you want a trigger condition
# analyzer.configure_trigger(cond={"foo": 0xa5, "bar":0x5a}) # you can add multiple conditions by building a "dictionary"
analyzer.configure_subsampler(1)
analyzer.run(offset=32, length=128) # controls the "pre-trigger" offset plus length to download for this run, up to the length of the total analyzer config in hardware
analyzer.wait_done()
analyzer.upload()
analyzer.save("dump.vcd")
# # #
You'll get a "dump.vcd" file which you can view using gtkwave (which you should be able to apt-get install if you don't have it).
Check https://github.com/enjoy-digital/litescope/blob/master/litescope/core.py#L199 for arguments to Litescope.
So when instantiating the analyzer, do this:
self.submodules.analyzer = LiteScopeAnalyzer(analyzer_signals, 1024, cd=my_domain, cd_ratio=1)
If my_domain is slower than or equal to sys_clk, use cd_ratio=1; if my_domain is faster than sys_clk, use cd_ratio=2. The fastest that you can go is 2x sys_clk.
At least in my design, sys_clk is set to 100MHz, so that would give a max of 200MHz.
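The rule of thumb above, written out so you can sanity-check a clock plan before kicking off a build. This is a sketch of the rule as stated in these notes, not something taken from the litescope source, and `pick_cd_ratio` is a hypothetical helper.

```python
def pick_cd_ratio(domain_hz, sys_hz=100e6):
    """Pick cd_ratio for a capture domain, following the rule of thumb above."""
    if domain_hz <= sys_hz:
        return 1
    if domain_hz <= 2 * sys_hz:
        return 2
    raise ValueError("capture domain is more than 2x sys_clk; too fast for the analyzer")

print(pick_cd_ratio(74.25e6))   # slower than a 100 MHz sys_clk
print(pick_cd_ratio(148.5e6))   # between 1x and 2x sys_clk
```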
Florent's version of LiteX combines the platform+target files into a single file, along with most of the code you need for a specific FPGA. This makes a lot of sense.
To bootstrap into the environment, you'll need to cross-compile gcc for lm32 first. Don't clone the gcc source repo. That's a waste of time. Follow this gcc build guide up until the point where they say to run configure. Then, run
mkdir build && cd build
../configure --target=lm32-elf
make
sudo make install
Then you can run Florent's "setup.py" script, first by doing setup.py init, then setup.py install
The specific SoC you're working on is in the -soc directory, e.g. netv2-soc for netv2mvp.
To use bpython3 to try and wade through problems in code, entering is a bit different. Start bpython3 and use these commands:
bpython3
exec(open("./netv2mvp.py").read())
platform=Platform()
soc=VideoSoc(platform)
And that will get you to a point where you can browse around in the hierarchy for figuring out what connects to what.
If you get some complaint about the litex environments not being found, go up a dir and run "bpython3 setup.py update" -- this will do some weird magic that maps the libraries into the current version of bpython3 that you're using. This broke a couple of times for me because as I updated my python environment something in the library mappings would break.
Oh also -- note that if you installed any python stuff from within the litex build environment, it "eats" it and keeps it local. So you have to re-install e.g. pip, bpython3 and so forth.
The litescope analyzer discussed above is great until the CPU stops responding.
This git repo: https://github.com/bunnie/netv2-soc/tree/lm32-debug-example
Gives an example of how to connect a scope that doesn't rely on the CPU.
The first step is to do a "manual" instantiation of litescope, using the following idiom:
# litescope
litescope_serial = platform.request("serial", 1)
litescope_bus = Signal(128)
litescope_i = Signal(16)
litescope_o = Signal(16)
self.specials += [
    Instance("litescope",
        i_clock=ClockSignal(),
        i_reset=ResetSignal(),
        i_serial_rx=litescope_serial.rx,
        o_serial_tx=litescope_serial.tx,
        i_bus=litescope_bus,
        i_i=litescope_i,
        o_o=litescope_o
    )
]
platform.add_source(os.path.join("litescope", "litescope.v"))
# litescope test
self.comb += [
    litescope_bus.eq(0x12345678ABCFEF),
    platform.request("user_led", 1).eq(litescope_o[0]),
    platform.request("user_led", 2).eq(litescope_o[1]),
    litescope_i.eq(0x5AA5)
]
This basically pulls in a Verilog version of litescope. To build the verilog litescope, you need to run "build.py" located here: https://github.com/bunnie/netv2-soc/blob/lm32-debug-example/litescope/build.py
Once you run this script a litescope.v file, analyzer.csv and csr.csv files will be made for you.
The configuration in this example gives you a 128-bit bus to monitor, plus 16 I/Os controllable via CSRs. You would use the I/Os to, for example, stimulate or monitor signal status using an external Python script running on the host via the UART bridge. I don't know how to do this yet, I just know that's why it's there.
If you can connect to the signals using the Python/litex idioms, then great. However, I was unable to find any way to hook directly into the lm32's signals using the idioms provided in Litex. In particular, I wanted to grab an I-cache trace at the moment a certain interrupt is triggered. This example does that by ganking signals directly at the Verilog level that would normally be hidden at the Python level. This might also be a handy way to debug FSM state because currently there isn't support for that either in litescope.
In order to do this, run the master script that generates the top.v file. It'll start invoking Vivado on it; just break out of that build, you don't need to wait for the first build to finish.
Edit the top.v file, and look for the line where the litescope signals get assigned. You can just search for the "magic number" it's hard-coded to in the Python file. Once you've done that, you need to hook up the LM32's signals. I've left an example of this here:
https://github.com/bunnie/netv2-soc/blob/lm32-debug-example/litescope/build.py#L79
Basically, the LM32 has an I-bus and a D-bus. You have to monitor the two individually; I only have enough signals to really watch the I-bus address, I-bus data, interrupt vector, the handshaking bits to Wishbone, and also 24 bits of DMA address (which I'm watching just to make sure we're not clobbering CPU code space). This is enough anyways to get the instruction traces out.
Drop into the build directory, and then fire up Vivado, and run "source top.tcl". This should cause the full compile to run on the edited top.v file, with the UI present, until the "quit" line is called at the end and Vivado exits automatically (you can edit out the terminating quit command if you want to use the Vivado UI some more to post-mortem stuff).
You now have a top.bit file which has litescope connected to the CPU.
Of course, if you run the Python script again, it'll nuke your top.v file and all the connection edits you've done will be overwritten. So this is a very specific point-purpose tool for debugging when you're really at your wit's end.
To analyze, you use the same commands as above (litex_server uart /dev/ttyUSB0 3000000) (oh btw did I mention 115200 baud is slow, and FTDIs support 3Mbps), and then python3 ./test_analyzer.py.
The only significant deviation of this from the prior methodology is triggering. Because the signal structure isn't there, you have to figure out which bit of the 128 you want to trigger on. This is why in the Verilog I was very meticulous to specify all my bit-widths for every signal, even though I could have left them implicit and got the same result.
Below is an example of how the trigger condition is specified.
#analyzer.configure_trigger(cond={"hdmi_in1_interrupt" : 1}) # this idiom can't work
t = getattr(analyzer, "frontend_trigger_value") # instead we define trigger & mask manually
m = getattr(analyzer, "frontend_trigger_mask")
t.write(0x80000000000000000)
m.write(0x80000000000000000)
With this, I was able to capture the instruction trace that was run upon interrupt.
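Counting hex zeros by hand gets old fast, so here's a small sketch for building the value/mask pair from bit positions on the 128-bit bus. The helper name is made up; the only fact taken from the example above is that 0x80000000000000000 is bit 67.

```python
BUS_WIDTH = 128  # width of litescope_bus in this example

def trigger_words(bits):
    """Build (value, mask) from a dict of {bit position: required level (0 or 1)}."""
    value = 0
    mask = 0
    for position, level in bits.items():
        if not 0 <= position < BUS_WIDTH:
            raise ValueError("bit %d is outside the %d-bit bus" % (position, BUS_WIDTH))
        mask |= 1 << position          # this bit takes part in the comparison
        if level:
            value |= 1 << position     # and must be 1 for the trigger to fire
    return value, mask

value, mask = trigger_words({67: 1})
print(hex(value), hex(mask))
```

The two numbers would then go to the same t.write() and m.write() calls used above.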
Also handy:
lm32-elf-objdump -D firmware.elf
Creates a disassembled version of your firmware file, which you can use to help figure out what code your CPU is running based on the fetch addresses. You of course need to have built the lm32-gcc package, which is a standard prerequisite for Florent's environment but not for Tim's Litex variant.