
Notes and Tips

bunnie edited this page Mar 22, 2018 · 49 revisions

...because it's hard to remember all those command lines...

WTFs

  • massive WTF warning: migen does not deterministically generate code!! You are not going crazy!! Running the exact same script twice can and will produce different netlists. Python randomizes string hashing per process, so the iteration order of sets (and, before Python 3.6, of dictionaries) can change from run to run, and this seems to occasionally trigger edge-case bugs (in my case, some CSR connections were being eliminated randomly). The workaround seems to be to set the environment variable PYTHONHASHSEED. My fix is to put this check at the top of the main script:
import os
import sys

if os.environ.get('PYTHONHASHSEED') != "0":
  print( "PYTHONHASHSEED must be 0 for deterministic compilation results; aborting" )
  sys.exit(1)

If you're building "production-grade" hardware, where you want to ship customer-support patches without extensively re-validating every single corner case, you'll want this check. It also means you shouldn't use migen to generate hard-coded security keys to be embedded in your verilog netlist. But I'd rather have a deterministic CSR space, especially when my application doesn't require random number generation.
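If you'd rather not have to remember to export the variable, a variation on the check above is to have the script re-launch itself with PYTHONHASHSEED=0. This is a minimal sketch using only the standard library; both function names are made up for illustration. The second helper just demonstrates that a fixed seed gives the same set iteration order across separate interpreter runs.

```python
import os
import subprocess
import sys


def ensure_deterministic_hashing():
    """Re-exec the current script with PYTHONHASHSEED=0 unless it is already set.

    Hypothetical helper: call it as the very first thing in main().
    """
    if os.environ.get("PYTHONHASHSEED") != "0":
        env = dict(os.environ, PYTHONHASHSEED="0")
        # os.execve replaces this process, so nothing after this call runs here.
        os.execve(sys.executable, [sys.executable] + sys.argv, env)


def set_order(seed):
    """Return the printed iteration order of a small set of strings in a
    fresh interpreter started with the given PYTHONHASHSEED."""
    code = "print(list({'sys', 'pix', 'pix5x', 'hdmi_in0', 'analyzer'}))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=dict(os.environ, PYTHONHASHSEED=seed),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()
```

Using os.execve rather than a subprocess means the build's output and exit code pass straight through to whatever called the script.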

  • If your video comes out "semi-backwards", try using the parameter "reverse=True" when requesting the DRAM port on either the read or write side (but not both). "semi-backwards" means, the overall scan order is correct, but every 16 or so pixels is scanned in reverse order. Basically what's happening is the memory wants to do a 256-bit burst transaction, and the order in which the smaller pixel data is piled into the burst packet is reversed by one side or the other.
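Here's a toy model of why the order flips only within each burst. It assumes 16-bit pixels packed 16 at a time into a 256-bit word, with pixel 0 in the least significant bits unless reverse is set. The functions are made up for illustration and are not LiteDRAM's converter code.

```python
PIX_BITS = 16
PIX_PER_BURST = 256 // PIX_BITS  # 16 pixels per 256-bit burst
PIX_MASK = (1 << PIX_BITS) - 1


def pack(pixels, reverse=False):
    """Pack one burst of pixels into a 256-bit integer, pixel 0 in the LSBs
    (or in the MSBs when reverse=True)."""
    order = list(reversed(pixels)) if reverse else list(pixels)
    word = 0
    for i, p in enumerate(order):
        word |= (p & PIX_MASK) << (i * PIX_BITS)
    return word


def unpack(word, reverse=False):
    """Split a 256-bit integer back into pixels using the given convention."""
    pixels = [(word >> (i * PIX_BITS)) & PIX_MASK for i in range(PIX_PER_BURST)]
    return pixels[::-1] if reverse else pixels


def round_trip(scanline, write_reverse, read_reverse):
    """Push a scanline (length a multiple of 16) through write-side packing
    and read-side unpacking, one burst at a time."""
    out = []
    for start in range(0, len(scanline), PIX_PER_BURST):
        burst = scanline[start:start + PIX_PER_BURST]
        out += unpack(pack(burst, write_reverse), read_reverse)
    return out
```

With mismatched settings, a scanline of pixels 0..31 comes back as 15..0 followed by 31..16: the bursts stay in order but each one is reversed. Setting reverse on both sides cancels out, which is why it goes on the read or the write side, not both.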

  • Slices in Python are foo[start:stop], where stop is exclusive, i.e. foo[offset:offset+length]. So if you want to select the bottom 10 bits of a signal, you write a[:10] instead of a[9:0] (an omitted start index defaults to 0). The next 10 bits are b[10:20], instead of b[19:10]. You can't "reverse" MSB/LSB in Python like you can in Verilog by changing the index order in the brackets.
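To make the mapping concrete, here's a plain-integer sketch of a Python-style slice next to a Verilog-style [msb:lsb] select. The helper names are made up; they only model the start/stop convention described above, using bit 0 as the LSB.

```python
def py_slice(value, start, stop):
    """Bits [start, stop) of an integer, bit 0 = LSB, like sig[start:stop]."""
    return (value >> start) & ((1 << (stop - start)) - 1)


def verilog_select(value, msb, lsb):
    """Bits msb down to lsb inclusive, like Verilog value[msb:lsb]."""
    return py_slice(value, lsb, msb + 1)


word = 0xABCDE                   # 20-bit example value
low10 = py_slice(word, 0, 10)    # a[:10]   <-> Verilog a[9:0]
next10 = py_slice(word, 10, 20)  # b[10:20] <-> Verilog b[19:10]
```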

  • If your installation is corrupted or there's a hard-fork of migen/litex, the packages are installed here by default: /usr/local/lib/python3.5/dist-packages. Nuke litex* and migen* and then re-run the install scripts. You'll also want to nuke your build/software directory because some Makefile paths are hard-coded (assuming you're using Florent's build env).

  • Apparently Python has package management issues, so people recommend sandboxing it. There are far too many options for doing this (I count four), which means I'm bound to get it wrong because I have no idea why I would want to use any one option over another, so I'm sticking with what the scripts do "by default", following the installation procedure to the letter as written in their README files.

  • Mismatches in bit-vector lengths are kicked down the road to Verilog. So,

a = Signal(31)
b = Signal(16)
self.comb += a.eq(b)

Just generates the verilog "assign a = b".

It's best to confirm whether the compiler does what you think it does in this case. There is a spec, but there are extensions that can, for example, allow for signed numbers.

Confirmed that Vivado's convention is:

  • if lvalue is wider than rvalue, then LSB-align rvalue and zero-pad
  • if lvalue is narrower than rvalue, then lvalue gets the LSB's of rvalue

So wire [15:0] a; wire[7:0] b; assign a = b; is implicitly assign a[15:0] = {8'b0, b};

and

wire [7:0] a; wire[15:0] b; assign a = b; is implicitly assign a[7:0] = b[7:0];

  • Bug open here -> https://github.com/m-labs/migen/issues/102
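A plain-Python model of the unsigned convention above, handy for sanity-checking what a mismatched a.eq(b) will end up doing once Vivado sees it. It's a sketch that ignores the signed extensions mentioned above, and verilog_assign is a made-up name.

```python
def verilog_assign(rvalue, rvalue_width, lvalue_width):
    """Value of an unsigned lvalue of lvalue_width bits after `assign lvalue = rvalue;`."""
    # The rvalue only carries rvalue_width bits in the first place.
    rvalue &= (1 << rvalue_width) - 1
    # Zero-extension is implicit for Python ints; masking keeps only the LSBs
    # when the lvalue is narrower than the rvalue.
    return rvalue & ((1 << lvalue_width) - 1)
```

For the a = Signal(31), b = Signal(16) case, verilog_assign(b_value, 16, 31) just returns b_value with the upper 15 bits of a zeroed.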

  • Clock domains. The "migen way" is to late-bind clock domain definitions. If you don't specify one, it defaults to "sys". Otherwise, you use clock domain renamers to bind the clock domains of modules.

I haven't found good docs on the clock domain crossing functions. Based on some test code I ran and an attempt to read the very difficult Python code:

  • Every module (class, really) has an implicit clock domain of "sys" for its self.sync statements. In my targets sys runs at 100 MHz, but that frequency is set by the target, not by migen.
  • If you want other clocks, you need a line like "self.clock_domains.cd_YOURDOMAIN = ClockDomain()" where YOURDOMAIN is the name of your clock domain. There's some magic thing you can do to create a reset synchronizer by including "AsyncResetSynchronizer(self.cd_YOURDOMAIN, RESET_SIGNAL)" where RESET_SIGNAL is your asynchronous (external) reset signal. Later on you can do "ResetSignal("YOURDOMAIN")" to summon the synchronized reset signal in any rvalue.
  • Late-bind clock domains using ClockDomainsRenamer(). Assuming you wrote a module FOO using just the ambiguous "self.sync +=" idiom, you can bind all the "self.sync" statements inside FOO to a new domain NEWCLOCKDOMAIN with "bar = ClockDomainsRenamer("NEWCLOCKDOMAIN")(FOO())" and then "self.submodules += bar".
  • If you have more than one domain to rename within a function, use a dictionary:
        def _to_hdmi_in0_pix(m):
            return  ClockDomainsRenamer(
                {"pix_o": "hdmi_in0_pix_o",        # format is "old name" : "new name"
                 "pix5x_o": "hdmi_in0_pix5x_o"}
                )(m)

        self.submodules.hdmi_out0_clk_gen = _to_hdmi_in0_pix(hdmi_out0_clk_gen)

The self.submodules += bar is critical; if you forget it migen silently fails to actually create FOO() and just gives you the initial value of whatever the signals were inside FOO().

"bar" is now your instance of FOO() and you have to pull signals out of bar by using the "." notation to reference the signal instances within the bar object.

  • Still trying to figure out why some things are self.submodules += bar while others are self.name = bar or just name = bar. It seems to have to do with scoping rules and a lot of implicit stuff that's probably obvious to a Python programmer but totally opaque to me.

  • If you have signals in domain A and you want them in domain CLKWONK, use MultiReg(); specifically, "self.specials += MultiReg(a, b, "CLKWONK")". Note the assignment here is a->b, unlike the right-to-left nature of a.eq(b). MultiReg() in this case creates two back-to-back FFs to cross the clock domains.
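A cycle-level toy model of what the two back-to-back flip-flops do: a value sampled from domain A only shows up at the output on the second CLKWONK edge. This is an illustration, not migen code, and it can't show the metastability that is the whole reason for the second flop.

```python
class TwoFlopSync:
    """Toy model of a MultiReg-style synchronizer clocked by the destination domain."""

    def __init__(self, stages=2, reset=0):
        self.regs = [reset] * stages

    def tick(self, d):
        """One destination-domain clock edge: shift d in, return the last stage."""
        self.regs = [d] + self.regs[:-1]
        return self.regs[-1]
```

Feeding it 1, 1, 1, 0, 0 on successive edges yields 0, 1, 1, 1, 0: two edges of latency each way. Because each bit is synchronized independently, a multi-bit bus passed through MultiReg can be caught mid-change, which is presumably what the bus-oriented primitives below are for.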

  • There seems to be some other really useful primitives inside litex's cdc.py with evocative names like "Gearbox", "PulseSynchronizer", "ElasticBuffer", and "BusSynchronizer". I wish someone would tell me what they did and how to use them. Like a datasheet, or some sort of documentation like that.

  • Here's an example of the kind of documentation you can expect (from litex/soc/interconnect/csr.py):

    write_from_dev : bool
        Allow the design to update the CSRStorage value.
        *Warning*: The atomicity of reads by the CPU is not guaranteed.

    alignment_bits : int
        ???

    name : string
        Provide (or override the name) of the ``CSRStatus`` register.

It's actually kind of critical to know how they treat alignment bits, because I have a lot of bugs where the firmware is jamming in words when the registers expect bytes, and vice versa. The convention is not fixed and if you don't write your code carefully you can even get into CSRs that are implicitly aligned to for example cache-line or burst-width boundaries which is terribly non-obvious.
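For what it's worth, here's a sketch of the byte-vs-word mismatch. It assumes the 8-bit CSR data width (csr_data_width=8) that LiteX targets of this era typically used: each register is split into chunks, most significant chunk first, and each chunk sits in its own 32-bit word, so consecutive chunks are 4 byte-addresses apart. The base address and helper name are hypothetical; check your generated csr.h before trusting this layout.

```python
def csr_chunk_writes(base, value, width_bits, csr_data_width=8, stride=4):
    """List (address, data) pairs a CPU would write to update a wide CSR.

    Assumes MSB chunk at the lowest address and one csr_data_width chunk
    per 32-bit bus word (a stride of 4 byte addresses).
    """
    nchunks = (width_bits + csr_data_width - 1) // csr_data_width
    mask = (1 << csr_data_width) - 1
    writes = []
    for i in range(nchunks):
        shift = (nchunks - 1 - i) * csr_data_width
        writes.append((base + i * stride, (value >> shift) & mask))
    return writes
```

Under these assumptions, writing 0x12345678 as a single word to the register's base address would only land the low byte (0x78) in the most significant chunk, which is exactly the kind of silent bug described above.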

Streams & records

It seems there is almost no documentation on streams and records in the migen 0.5 manual. Here's what I've managed to piece together.

A record is a bundle of named signals of various sizes. An example is

  my_record = [
      ("data", 32),
      ("address", 20)
  ]
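A layout is just a list of (name, width) tuples, possibly nested. This plain-Python sketch totals the width and bit offsets of each field, assuming fields are packed LSB-first in list order (which is how I understand migen flattens a record into raw bits). It only handles integer widths and nested lists, not signed shapes; an optional third (direction) entry in a tuple is ignored. The helper name is made up.

```python
def layout_offsets(layout, prefix="", start=0):
    """Return ({field_name: (lsb, stop)}, total_width) for a record layout."""
    fields = {}
    pos = start
    for field in layout:
        name, shape = field[0], field[1]  # ignore an optional direction entry
        if isinstance(shape, list):       # nested sub-record
            sub, width = layout_offsets(shape, prefix + name + "_", pos)
            fields.update(sub)
        else:
            width = shape
            fields[prefix + name] = (pos, pos + width)
        pos += width
    return fields, pos - start
```

For my_record above this gives data at bits [0, 32), address at [32, 52), 52 bits in total.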

These can be used as templates for payloads on streams. Directionality of signals within a record isn't established until you call .connect() or .eq(), and every signal can have its own direction, and directions can be different in different instances depending upon how you ultimately connected the elements in the record. Flexible. But confusing.

Streams, in their most basic form, are a way to connect records together quickly without having to, e.g., build an iterator that goes through all the elements in a record and does a self.comb += for each one.

A design pattern you'll often see in litex code is a module being instantiated with few if any arguments and a "sink" and "source":

class TimingGenerator(Module):
    def __init__(self):
        self.sink = sink = stream.Endpoint(frame_parameter_layout)
        self.source = source = stream.Endpoint(frame_timing_layout)

When you see this:

  • "sinks" are the inputs to the module
  • "sources" are the outputs of the module

So in the example above, the frame_parameter_layout record is a bunch of signals that you would use to define a frame: hsync width, vsync width, hactive, vactive, etc. These are the inputs to the Timing Generator. The frame_timing_layout is a bunch of signals that are the timing for a frame: the actual hsync, vsync, and de. These are the outputs of the Timing Generator.

One gripe I have about this is that you're basically stuck with a typeless interface with little if any documentation or enforcement of what the parameters should be. It's flexible, but basically, it's the wild west in terms of hooking stuff up.

Streams have a set of implicit signals that seem to go along with them:

  • valid - master->slave signal indicating everything else has meaning
  • ready - slave->master signal for flow control
  • first - master->slave pulsed on the first element of a stream
  • last - master->slave pulsed on the last element of a stream
  • payload - master->slave "data" -- so this is what your stream-defining record (argument of stream.Endpoint) is bound to
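A cycle-by-cycle sketch of the valid/ready handshake described above, in plain Python: a beat transfers only on a cycle where the source has valid high and the sink has ready high; otherwise the source holds the same payload (and first/last flags) and tries again next cycle. simulate_stream is a made-up name and this isn't LiteX's stream code.

```python
def simulate_stream(payloads, ready_per_cycle):
    """Return (cycle, payload, first, last) for each beat that actually transfers."""
    transfers = []
    idx = 0
    for cycle, ready in enumerate(ready_per_cycle):
        if idx == len(payloads):
            break                       # source has nothing left: valid goes low
        valid = True                    # source always has data until it runs out
        first = idx == 0
        last = idx == len(payloads) - 1
        if valid and ready:             # handshake: both high on the same cycle
            transfers.append((cycle, payloads[idx], first, last))
            idx += 1                    # advance only after the sink accepted
    return transfers
```

With ready going 1, 0, 1, 1, three payloads transfer on cycles 0, 2 and 3; cycle 1 is a stall where "b" simply waits on the bus.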

The role of master and slave is not determined until you use the .connect() call. The convention is

  self.comb += master.connect(slave)

Don't forget the "self.comb +=" -- migen is perfectly happy to run master.connect(slave) and not throw an error, but it also doesn't generate anything in the netlist.

Also, the /names/ are arbitrary, so you could reverse the statement to slave.connect(master) and that's valid, just the slave is now the actual master. Confusing, right? Why would you want to do this?

It's because "sink" are the "inputs" to a module, "source" are the "outputs" of a module. If you're inside the module, you might want to connect input to output, so sink.connect(source). But if you're outside the module and connecting two blocks together that are using streams to talk to each other, you'd do something more like foo.source.connect(bar.sink), connecting the output of foo to the input of bar.

Finally, note that you don't have to use the out of band signals in a stream. I've seen code that doesn't bind/use the first/last bits. It's "up to you" how you want to use it.

TODO:

  • write up AsyncFIFO instantiations
  • write up the "keep" signal trick

Conventions

  • All signals are active high by convention, unless indicated by _b or _n suffix.
  • Streams: "outside the module, a source of the module is where the data comes from, sink is where the data goes"
  • Valid/first/last/data are master->slave; ready is slave->master. This is handled by the "connect" function.
  • A stream without any special modifiers degenerates to simply a self.comb += statement. To incorporate some logic into the stream, it has to have modules glommed onto it, e.g. an AsyncFIFO inserted between source and sink.

Links

Command line tips

  • scripts/enter-env.sh # run this before trying anything in litex
  • export TARGET= # to change which target the environment builds
  • ssh [email protected] -L1234:localhost:1234 -v # forward port to localhost via ssh

Digging through the design hierarchy's name space

edit -- what you really want is PyCharm. You'll need to make sure it's pointing to python3.5, and you also need to set up the litex paths (using python3 setup.py in the litex directory), but once it's running you can control-click through the object hierarchy.

Use ipython or bpython (via pip install bpython) for browsing netlist structure

 from targets.netv2.video import SoC
 from platforms.netv2 import Platform
 s=SoC(Platform())

Now you can use dir/completion to navigate the hierarchy via the "s" object. Signals resolve as "signal" and these can be dropped into litescope.
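If tab-completion gets tedious, a small recursive search over attribute names can help find where a signal lives. This is a sketch: find_paths is a made-up helper that walks plain Python __dict__s, nothing migen-specific. It skips underscore-prefixed attributes, which I believe includes submodules added anonymously with self.submodules +=, so those won't show up.

```python
from types import SimpleNamespace


def find_paths(obj, needle, prefix="s", max_depth=6, _seen=None):
    """Yield dotted attribute paths below obj whose last component contains needle."""
    seen = set() if _seen is None else _seen
    seen.add(id(obj))
    attrs = getattr(obj, "__dict__", None)
    if not isinstance(attrs, dict) or max_depth == 0:
        return
    for name, child in sorted(attrs.items()):
        if name.startswith("_"):
            continue  # skip private/internal attributes
        path = prefix + "." + name
        if needle in name:
            yield path
        if id(child) not in seen:  # avoid looping on back-references
            yield from find_paths(child, needle, path, max_depth - 1, seen)


# Tiny stand-in for an SoC object, just to show the output format.
demo_soc = SimpleNamespace(
    crg=SimpleNamespace(rst=object()),
    hdmi_in0=SimpleNamespace(
        frame=SimpleNamespace(de=object()),
        resdetection=SimpleNamespace(valid_i=object()),
    ),
)
```

In bpython, after s=SoC(Platform()), something like list(find_paths(s, "valid_i")) prints the dotted paths you can then paste into your analyzer signal list.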

Getting into the Vivado UI

There's some nice features and sanity checks available in the Vivado UI that LiteX doesn't have. Just because it's all command-line doesn't mean you have to lose this.

Inside your build's gateware directory (e.g. build/platform_target_cpu/gateware) there is a "top.tcl" file. This is a script that drives vivado. If you want to e.g. run additional analyses on the automated compilation, you can just comment out the "quit" statement at the end and then run the tcl file, and it'll stop with the UI open.

If you want to use a pre-built run, you can just load a checkpoint. Start vivado from the build's gateware directory (the command is literally "vivado", assuming you've already entered the litex environment using the "source scripts/enter-env.sh" command) and then just type

 open_checkpoint top_route.dcp

(there's also a menu option to "open checkpoint" under the File menu)

This will load in the entire design just after place and route, which is where all the analysis steps get run. From here you can view the graphical floorplan, schematics, and run additional timing/clock interaction reports.

Global clocks and resets

To make a clock global (so you can pull it in submodules without passing it explicitly through the args), you declare a "clock domain"

 self.clock_domains.cd_george = ClockDomain()

 self.specials += Instance("BUFG", name="george", i_I=georgeunbuf, o_O=self.cd_george.clk)
 self.specials += AsyncResetSynchronizer(self.cd_george, unsynced_reset_signal) # unsynced_reset_signal is any signal from another clock domain that's to serve as a reset, eg a PLL lock etc

In the submodule, you can then use

 self.george_clk = ClockSignal("george")  #  use george_clk wherever the clock is needed
 self.res_george_sync = ResetSignal("george")  # use res_george_sync wherever the reset is needed

Using your own verilog modules

If you want to use an existing verilog module you've written in a LiteX project, you need to first import it by calling add_source for every file in your module's hierarchy, e.g.

 self.platform.add_source("full/path/to_module/module1.v") 
 self.platform.add_source("full/path/to_module/module2.v") 

By "full/path/to_module" I mean the path relative to the directory where you type "make" (i.e. the top level of the build), not the absolute path in the filesystem.

You want to put those calls in your "target" file, inside the init function of your top-level SoC (not in the file with the list of pin mappings).

Then, you use the "specials" construct to instantiate the module. This can be done at any level in the LiteX hierarchy once the source files are incorporated in your target file.

The python syntax:

 self.specials += Instance("module1",
    i_port1=litex_signal1,
    o_port2=litex_signal2
 )

Would correspond to this verilog template:

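  module module1 (
      input wire port1,
      output wire port2
  );

The "i_" and "o_" prefixes are not part of the Verilog port names: you add them to the keyword arguments in Python to mark each port's direction, and migen strips them off to get the port names in the Verilog template. There are also "io_" for inout ports and "p_" for parameters.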
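A rough plain-Python illustration of the prefix convention (my own model, not migen's source): the prefix picks the direction, and the rest of the keyword is the Verilog port or parameter name. The function name is made up.

```python
PREFIXES = (("io_", "inout"), ("i_", "input"), ("o_", "output"), ("p_", "parameter"))


def describe_instance_kwargs(**kwargs):
    """Map Instance-style keyword names to (verilog_name, kind) pairs."""
    result = {}
    for key in kwargs:
        for prefix, kind in PREFIXES:
            if key.startswith(prefix):
                result[key] = (key[len(prefix):], kind)
                break
        else:
            raise ValueError("no i_/o_/io_/p_ prefix on " + key)
    return result
```

So i_port1=litex_signal1 ends up on the Verilog port named port1, which is why the names after the prefix have to match the module's port list exactly.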

Litescope

Litescope is the LiteX equivalent of Chipscope ILA.

Litescope can export its data through several ports: a dedicated UART port (separate from the firmware/CLI port), ethernet, PCI-express... In this example, we assume we've created an additional UART port dedicated to the Litescope interface. This is the slowest method but also the simplest, and good for bootstrap debugging.

You will want to insert these lines inside your top-level .py SoC description file. At the top:

  from litex.soc.cores import uart                       # these two lines may already be present
  from litex.soc.cores.uart import UARTWishboneBridge

  from litescope import LiteScopeAnalyzer 

Then inside your top-level SoC instantiation function:

class MySoC(BaseSoC):
    csr_peripherals = [   ## a list, not a set: set order isn't deterministic (see the WTF above)
        "probably_several_other_peripherals_already", ## you'll have several of these
        "analyzer",   ## << this line
    ]
    csr_map_update(BaseSoC.csr_map, csr_peripherals)  ## you probably have this already

    def __init__(self, platform, *args, **kwargs):     ## you definitely have this
        BaseSoC.__init__(self, platform, *args, **kwargs)

        # these two lines add in the debug interface
        self.submodules.bridge = UARTWishboneBridge(platform.request("debug_serial"), self.clk_freq, baudrate=115200)
        self.add_wb_master(self.bridge.wishbone)

        ### all of your SoC code here

        ## now connect your analyzer to the signals you want to look at
        analyzer_signals = [
            self.hdmi_in0.frame.de,
            self.crg.rst,
            self.hdmi_in0.resdetection.valid_i
        ]
        self.submodules.analyzer = LiteScopeAnalyzer(analyzer_signals, 1024)  # 1024 is the depth of the analyzer

    ## add this method to create the analyzer configuration file during compilation
    def do_exit(self, vns, filename="test/analyzer.csv"):
        self.analyzer.export_csv(vns, filename)

After you run "make gateware", there will be a "test" sub-directory in your build/ directory. The test directory will contain an analyzer.csv and a csr.csv file. These configure the host-side drivers for the litescope.

Now in order to engage litescope, you have to start the litex_server:

  litex_server uart /dev/ttyUSB0 115200 &

This of course assumes you have your UART cable at /dev/ttyUSB0 and connected properly to the board. If it works well, you'll get some output to the effect of:

LiteX remote server [CommUART] port: /dev/ttyUSB0 / baudrate: 115200 / tcp port: 1234

You can telnet to this port to confirm it's up and running.

Once the server is running you can run the following script to drive the interface:

#!/usr/bin/env python3
import time

from litex.soc.tools.remote import RemoteClient
from litescope.software.driver.analyzer import LiteScopeAnalyzerDriver

wb = RemoteClient()
wb.open()

# # #

analyzer = LiteScopeAnalyzerDriver(wb.regs, "analyzer", debug=True)

analyzer.configure_trigger(cond={"hdmi_in0_frame_de" : 1})  # only include this if you want a trigger condition
# analyzer.configure_trigger(cond={"foo": 0xa5, "bar":0x5a}) # you can add more conditions by building a "dictionary"

analyzer.configure_subsampler(1)
analyzer.run(offset=32, length=128)  # controls the "pre-trigger" offset plus length to download for this run, up to the length of the total analyzer config in hardware
analyzer.wait_done()
analyzer.upload()
analyzer.save("dump.vcd")

# # #

wb.close()

You'll get a "dump.vcd" file which you can view using gtkwave (which you should be able to apt-get install if you don't have it).

Setting Litescope Clocks

Check https://github.com/enjoy-digital/litescope/blob/master/litescope/core.py#L199 for arguments to Litescope.

So when instantiating the analyzer, do this:

     self.submodules.analyzer = LiteScopeAnalyzer(analyzer_signals, 1024, cd="my_domain", cd_ratio=1)

If my_domain is <= sys_clk, cd_ratio=1; if my_domain is > sys_clk, cd_ratio=2. The fastest that you can go is 2x sys_clk.

At least in my design, sys_clk is set to 100MHz, so that would give a max of 200MHz.
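The rule above, written down as a helper so a too-fast domain fails loudly instead of silently picking the wrong ratio. This only encodes the rule of thumb from this page (including the 2x sys_clk ceiling); it's not a function from litescope, and the name is made up.

```python
def litescope_cd_ratio(domain_hz, sys_clk_hz):
    """Pick cd_ratio for LiteScopeAnalyzer per the rule of thumb above."""
    if domain_hz <= sys_clk_hz:
        return 1
    if domain_hz <= 2 * sys_clk_hz:
        return 2
    raise ValueError("analyzer domain is faster than 2x sys_clk; LiteScope can't keep up")
```

Hypothetical usage inside the SoC: cd_ratio=litescope_cd_ratio(pix_clk_hz, self.clk_freq), where pix_clk_hz is whatever frequency your domain actually runs at.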

Using Florent's Build Env

Florent's version of LiteX combines the platform+target files into a single file, along with most of the code you need for a specific FPGA. This makes a lot of sense.

To bootstrap into the environment, you'll need to cross-compile gcc for lm32 first. Don't clone the gcc source repo. That's a waste of time. Follow this gcc build guide up until the point where they say to run configure. Then, run

  mkdir build && cd build
  ../configure --target=lm32-elf
  make
  sudo make install

Then you can run Florent's "setup.py" script, first with setup.py init, then setup.py install.

The specific SoC you're working on is in the -soc directory, e.g. netv2-soc for netv2mvp.

To use bpython3 to try and wade through problems in code, entering is a bit different. Start bpython3 and use these commands:

  bpython3
  exec(open("./netv2mvp.py").read())
  platform=Platform()
  soc=VideoSoc(platform)

And that will get you to a point where you can browse around in the hierarchy for figuring out what connects to what.

If you get some complaint about the litex environments not being found, go up a dir and run "bpython3 setup.py update" -- this will do some weird magic that maps the libraries into the current version of bpython3 that you're using. This broke a couple of times for me because, as I updated my python environment, something in the library mappings broke.

Oh also -- note that if you installed any python stuff from within the litex build environment, it "eats" it and keeps it local. So you have to re-install e.g. pip, bpython3 and so forth.

Debugging when the CPU is wedged

The litescope analyzer discussed above is great until the CPU stops responding.

This git repo: https://github.com/bunnie/netv2-soc/tree/lm32-debug-example

Gives an example of how to connect a scope that doesn't rely on the CPU.

The first step is to do a "manual" instantiation of litescope, using the following idiom:

# litescope
litescope_serial = platform.request("serial", 1)
litescope_bus = Signal(128)
litescope_i = Signal(16)
litescope_o = Signal(16)
self.specials += [
    Instance("litescope",
        i_clock=ClockSignal(),
        i_reset=ResetSignal(),
        i_serial_rx=litescope_serial.rx,
        o_serial_tx=litescope_serial.tx,
        i_bus=litescope_bus,
        i_i=litescope_i,
        o_o=litescope_o
    )
]
platform.add_source(os.path.join("litescope", "litescope.v"))  # needs "import os" at the top of the file

# litescope test
self.comb += [
    litescope_bus.eq(0x12345678ABCFEF),
    platform.request("user_led", 1).eq(litescope_o[0]),
    platform.request("user_led", 2).eq(litescope_o[1]),
    litescope_i.eq(0x5AA5)
]

This basically pulls in a Verilog version of litescope. To build the verilog litescope, you need to run "build.py" located here: https://github.com/bunnie/netv2-soc/blob/lm32-debug-example/litescope/build.py

Once you run this script a litescope.v file, analyzer.csv and csr.csv files will be made for you.

The configuration in this example gives you a 128-bit bus to monitor, plus 16 I/Os controllable via CSRs. You would use the I/Os to, for example, stimulate or monitor signal status using an external Python script running on the host via the UART bridge. I don't know how to do this yet, I just know that's why it's there.

If you can connect to the signals using the Python/litex idioms, then great. However, I was unable to find any way to hook directly into the lm32's signals using the idioms provided in Litex. In particular, I wanted to grab an I-cache trace at the moment a certain interrupt is triggered. This example does that by ganking signals directly at the Verilog level that would normally be hidden at the Python level. This might also be a handy way to debug FSM state because currently there isn't support for that either in litescope.

In order to do this, run the master script that generates the top.v file. It'll start invoking Vivado on it; just break out of that build, you don't need to wait for the first build to finish.

Edit the top.v file, and look for the line where the litescope signals get assigned. You can just search for the "magic number" it's hard-coded to in the Python file. Once you've done that, you need to hook up the LM32's signals. I've left an example of this here:

https://github.com/bunnie/netv2-soc/blob/lm32-debug-example/litescope/build.py#L79

Basically, the LM32 has an I-bus and a D-bus. You have to monitor the two individually; I only have enough signals to really watch the I-bus address, I-bus data, interrupt vector, the handshaking bits to Wishbone, and also 24 bits of DMA address (which I'm watching just to make sure we're not clobbering CPU code space). This is enough anyway to get the instruction traces out.

Drop into the build directory, and then fire up Vivado, and run "source top.tcl". This should cause the full compile to run on the edited top.v file, with the UI present, until the "quit" line is called at the end and Vivado exits automatically (you can edit out the terminating quit command if you want to use the Vivado UI some more to post-mortem stuff).

You now have a top.bit file which has litescope connected to the CPU.

Of course, if you run the Python script again, it'll nuke your top.v file and all the connection edits you've done will be overwritten. So this is a very specific point-purpose tool for debugging when you're really at your wit's end.

To analyze, you use the same commands as above (litex_server uart /dev/ttyUSB0 3000000) (oh btw did I mention 115200 baud is slow, and FTDIs support 3Mbps), and then python3 ./test_analyzer.py.

The only significant deviation of this from the prior methodology is triggering. Because the signal structure isn't there, you have to figure out which bit of the 128 you want to trigger on. This is why in the Verilog I was very meticulous to specify all my bit-widths for every signal, even though I could have left them implicit and got the same result.

Below is an example of how the trigger condition is specified.

#analyzer.configure_trigger(cond={"hdmi_in1_interrupt" : 1})  # this idiom can't work
t = getattr(analyzer, "frontend_trigger_value")  # instead we define trigger & mask manually
m = getattr(analyzer, "frontend_trigger_mask")
t.write(0x80000000000000000)
m.write(0x80000000000000000)

With this, I was able to capture the instruction trace that was run upon interrupt.
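Since the trigger here is just a value/mask pair over the raw bus, a small helper saves counting hex zeros by hand. This is a sketch: the helper names are made up, it assumes (as in the example above) that a 1 in the mask means "compare this bit", and the bit positions have to come from however you wired up the bus in top.v.

```python
def trigger_for_field(lsb, width, value):
    """Return (trigger_value, trigger_mask) to match `value` in bits
    [lsb, lsb + width) of the monitored bus; all other bits are don't-care."""
    mask = ((1 << width) - 1) << lsb
    return (value << lsb) & mask, mask


def combine(*pairs):
    """OR together several (value, mask) pairs so all fields must match at once."""
    value = mask = 0
    for v, m in pairs:
        value |= v
        mask |= m
    return value, mask


# Hypothetical usage with the CSRs from the snippet above:
#   value, mask = trigger_for_field(67, 1, 1)   # single bit 67 must be high
#   t.write(value)
#   m.write(mask)
```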

Also handy:

lm32-elf-objdump -D firmware.elf

Creates a disassembled version of your firmware file, which you can use to help figure out what code your CPU is running based on the fetch addresses. You of course need to have built the lm32-gcc package, which is a standard prerequisite for Florent's environment but not for Tim's Litex variant.
