
Notes and Tips

bunnie edited this page Apr 23, 2018 · 49 revisions

...because it's hard to remember all those command lines...

WTFs

  • massive WTF warning: migen does not deterministically generate code!! You are not going crazy!! Running the exact same script twice can and will produce different netlists. Python randomizes string hashing per process, so the iteration order of sets (and, on the Python 3.5 I'm using, dictionaries) changes from run to run. That seems to occasionally trigger edge-case bugs; in my case, some CSR connections were being eliminated randomly. The workaround seems to be to set the PYTHONHASHSEED environment variable to a fixed value. My fix is to put this check at the top of main:
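import os
import sys

# Run as: PYTHONHASHSEED=0 python3 your_script.py
if os.environ.get("PYTHONHASHSEED") != "0":  # .get() so an unset variable aborts cleanly instead of raising KeyError
  print( "PYTHONHASHSEED must be 0 for deterministic compilation results; aborting" )
  sys.exit(1)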

If your goal is to do "production-grade" hardware which includes customer support patches that don't involve extensively re-validating every single corner case, you'll want this check. This also means you shouldn't use migen to generate hard-coded security keys to be embedded in your verilog netlist. But I'd rather have a deterministic CSR space, especially when my application doesn't require random number generation.
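You can see the underlying effect without migen in a small pure-Python sketch. It runs the same one-liner in fresh interpreters and prints the iteration order of a set of strings. The names are made up and only stand in for signal/CSR names. With PYTHONHASHSEED=0 the order is identical on every run; with other seeds it may change.

```python
import os
import subprocess
import sys

# Made-up names standing in for the signal/CSR names migen iterates over internally.
SNIPPET = "print(list({'sys', 'pix', 'hdmi_in0', 'csr', 'ddrphy', 'analyzer'}))"


def set_order_under_seed(seed):
    """Run SNIPPET in a fresh interpreter with PYTHONHASHSEED=seed and return what it printed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run([sys.executable, "-c", SNIPPET], env=env,
                            stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print("seed 0, run 1:", set_order_under_seed(0))
    print("seed 0, run 2:", set_order_under_seed(0))  # identical to run 1
    print("seed 1:       ", set_order_under_seed(1))  # may differ from seed 0
    print("seed 2:       ", set_order_under_seed(2))  # may differ again
```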

  • SPI programming via boundary scan (https://github.com/quartiq/bscan_spi_bitstreams): these bitstreams require an older version of the bscan-spi protocol and don't work with the latest openocd. Use the version of openocd maintained by m-labs: https://github.com/m-labs/openocd

  • Cache bugs: no root cause known, but there seems to be a bug with two conditions: (a) your code is large enough that substantial portions miss in cache, and (b) you're using a lot of DRAM bandwidth for video. When both hold, the CPU will lock up if too many cache lines get evicted. Symptomatically, a system with "no load" (no video plugged in) is stable indefinitely. A "loaded" system (video plugged in) is stable right after boot, but after perhaps 1-2 minutes of idling at the command line, the next command hard-crashes the CPU. The "fix" is to drop in parameters that make the command line interpreter (the largest code path) get fully visited on every major loop of the system. This keeps its code from being fully evicted from the cache and thus prevents the crash. I suspect it has to do with a very specific arrangement of evicted cache lines that's only achievable with a very specific type of code. This bug was found on the lm32, but may apply to other CPUs if the problem is in the L2 wishbone cache.

  • Scoping: foo, self.foo, and self.submodules. Within a class's __init__, a plain local variable called "foo" is only accessible inside that method. It has no impact on the verilog output unless foo ends up in self.comb, self.sync, or self.submodules. So for example, this code:

   foo = TimingDelay(3)        # create foo
   self.comb += [
      foo.sink.eq(bar),        # wire up its input
      baz.eq(foo.source),      # wire up its output (the lvalue is on the left: baz <= foo.source)
   ]

Will actually do nothing useful in the verilog netlist, because you created the TimingDelay() object in Python but never actually told migen it needs to be turned into a verilog entity. This is very confusing, because you'll be like, hey, my code compiled perfectly, no warnings about dangling nets in the verilog netlist, but the outputs of the module are always stuck at zero. This is because the source/sink nets are initialized to zero by default. Without the verilog entity to override the initialization, you've got legal Verilog code that just ties all the output nets to zero and throws away the inputs.

So you have to add this line:

   self.submodules += foo   # add it to the list of submodules to evaluate into a verilog entity

For anything to actually happen.

Here's another scoping issue:

   class MyBlock(Module):
      def __init__(self):
         foo = Signal()
         self.bar = bar = Signal()

In the code above, foo is "strictly local" to MyBlock. So you can't "read" the value of foo from outside the context of the module. bar, however, is visible outside the context of the module as follows:

   self.submodules.my_block_instance = my_block_instance = MyBlock()
   analyzer_signals = [
      my_block_instance.bar,
      my_block_instance.foo,  ## this will throw an error because foo is not visible outside MyBlock
   ]

First note we did the self.submodules thing to make sure my_block_instance actually turns into verilog code. Then we can access the member of my_block_instance called "bar" because we had bound it to self.bar.

I /think/ migen does some magic that if you add something to self.submodules, you can also access it directly as a self member object, e.g. this works identically to above:

   self.submodules.my_block_instance = MyBlock()
   analyzer_signals = [
      self.my_block_instance.bar,
      self.my_block_instance.foo,  ## this will throw an error because foo is not visible outside MyBlock
   ]

But I find this confusing so I like to explicitly assign it to a local name rather than using the migen implicit remap.
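The scoping rules above are plain Python. The only migen-specific part is the self.submodules helper. Below is a toy model, in pure Python, of the behavior described here. ToyModule and _SubmoduleProxy are hypothetical, simplified stand-ins, not migen's source. The model shows three cases:

- self.submodules.name = m registers m and also exposes it as self.name.
- self.submodules += m only registers m.
- A module that is merely assigned to an attribute is never registered. That is the "outputs stuck at zero" trap.

```python
class _SubmoduleProxy:
    """Toy stand-in for migen's `self.submodules` helper (hypothetical, simplified)."""

    def __init__(self, owner):
        object.__setattr__(self, "_owner", owner)

    def __setattr__(self, name, module):
        # self.submodules.name = m: register m AND expose it as owner.name
        self._owner.registered.append(module)
        setattr(self._owner, name, module)

    def __iadd__(self, module):
        # self.submodules += m: register m only; no attribute name is created
        self._owner.registered.append(module)
        return self


class ToyModule:
    @property
    def registered(self):
        # Modules that would be elaborated into the netlist
        return self.__dict__.setdefault("_registered", [])

    @property
    def submodules(self):
        return _SubmoduleProxy(self)

    @submodules.setter
    def submodules(self, proxy):
        # `self.submodules += m` finishes by assigning the proxy back; nothing to store
        if not isinstance(proxy, _SubmoduleProxy):
            raise TypeError("use self.submodules.name = m or self.submodules += m")


class MyBlock(ToyModule):
    def __init__(self):
        foo = "local"           # plain local variable: discarded when __init__ returns
        self.bar = "attribute"  # instance attribute: visible from outside


class Top(ToyModule):
    def __init__(self):
        self.submodules.my_block_instance = my_block_instance = MyBlock()
        self.submodules += MyBlock()   # anonymous, but still registered
        self.forgotten = MyBlock()     # attribute only: never registered, so never elaborated


top = Top()
print(top.my_block_instance.bar)              # attribute
print(hasattr(top.my_block_instance, "foo"))  # False
print(len(top.registered))                    # 2
```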

  • Apparently no type checking is a feature, not a bug. If you see the section on "streams" down below, complex parameter passing isn't done by actually passing arguments to the constructor; it's done by creating "streams" and sticking them together. If you happen to stick the wrong format of one stream into another module, there's no inherent type check that will throw an error or even a warning. This is particularly bad when a module doesn't reference the stream fields by name, but rather iterates through them based on the expected record layout. For example:
   class HardToDebug(Module):
      def __init__(self):
         self.sink = stream.Endpoint(my_layout)    # inputs
         self.source = stream.Endpoint(my_layout)  # outputs

         for name in list_signals(my_layout):  # iterate through my expected layout's signal names
            s = getattr(self.sink, name)       # grab the signal and do something with it
            # etc.

Suppose I take the HardToDebug() module and give its "sink" and "source" some layout other than "my_layout" (say, "other_layout"). The code won't fail or throw any errors. Any names that happen to overlap between "my_layout" and "other_layout" will be iterated through. The rest are silently left untouched and generate no verilog code.

Maybe in the Python world this is considered to be "flexible" and "featureful" because you can implement partial iterators on records with two different functions using layouts that are strict subsets of other layouts, but from where I stand this sounds like a system with no typechecking that can permit subtle bugs that will later bite you in the ass.
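If you want some of that checking back, you can do it yourself in Python at elaboration time. Here is a minimal sketch. assert_same_layout and layout_fields are hypothetical helpers, not part of migen/LiteX, and they only handle flat layouts of (name, width[, direction]) tuples. The helper compares two layouts and raises on missing, extra, or differently-sized fields.

```python
def layout_fields(layout):
    """Map a flat migen-style layout [(name, width[, direction]), ...] to {name: width}."""
    return {field[0]: field[1] for field in layout}


def assert_same_layout(actual, expected, what="endpoint"):
    """Raise TypeError unless both layouts have the same field names and widths."""
    a, e = layout_fields(actual), layout_fields(expected)
    missing = sorted(set(e) - set(a))
    extra = sorted(set(a) - set(e))
    width_mismatch = sorted(n for n in set(a) & set(e) if a[n] != e[n])
    if missing or extra or width_mismatch:
        raise TypeError("{} layout mismatch: missing={} extra={} width_mismatch={}".format(
            what, missing, extra, width_mismatch))


my_layout = [("data", 32), ("address", 20)]
other_layout = [("data", 16), ("address", 20), ("strobe", 1)]

assert_same_layout(my_layout, my_layout)  # passes silently
try:
    assert_same_layout(other_layout, my_layout, "HardToDebug.sink")
except TypeError as err:
    print(err)
```

One way to use it is to pass the layout into the module's constructor instead of hard-coding it (a hypothetical refactor of HardToDebug). Then call assert_same_layout(layout, my_layout) at the top of __init__, so a mismatch fails loudly instead of silently generating nothing.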

  • If your video comes out "semi-backwards", try using the parameter "reverse=True" when requesting the DRAM port on either the read or write side (but not both). "Semi-backwards" means the overall scan order is correct, but every group of 16 or so pixels is scanned in reverse order. Basically, the memory wants to do a 256-bit burst transaction, and one side or the other reverses the order in which the smaller pixel data is piled into the burst packet.
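Here is a pure-Python model of that symptom. It assumes 16-bit pixels and 256-bit bursts (hence 16 pixels per burst). It models reverse=True simply as flipping the order of pixel slots within a burst on one side. That mechanism is my reading of the symptom, not LiteDRAM's actual implementation. When the write and read sides disagree on the slot order, each group of 16 pixels comes back reversed, but the groups stay in order.

```python
def pack_burst(pixels, pixel_bits=16, lsb_first=True):
    """Pile pixels into one wide burst word, first pixel at the LSB end (or at the MSB end)."""
    n, word = len(pixels), 0
    for i, p in enumerate(pixels):
        slot = i if lsb_first else n - 1 - i
        word |= (p & ((1 << pixel_bits) - 1)) << (slot * pixel_bits)
    return word


def unpack_burst(word, n, pixel_bits=16, lsb_first=True):
    """Pull n pixels back out of a burst word using the given slot order."""
    mask = (1 << pixel_bits) - 1
    return [(word >> ((i if lsb_first else n - 1 - i) * pixel_bits)) & mask for i in range(n)]


def through_dram(pixels, burst_bits=256, pixel_bits=16, write_lsb_first=True, read_lsb_first=True):
    """Write pixels burst by burst, then read each burst back."""
    per_burst = burst_bits // pixel_bits  # 256 / 16 = 16 pixels per burst
    out = []
    for start in range(0, len(pixels), per_burst):
        chunk = pixels[start:start + per_burst]
        word = pack_burst(chunk, pixel_bits, write_lsb_first)
        out += unpack_burst(word, len(chunk), pixel_bits, read_lsb_first)
    return out


line = list(range(32))                          # two bursts' worth of pixels
print(through_dram(line))                       # matching order: unchanged
print(through_dram(line, read_lsb_first=False)) # mismatched: 15..0, then 31..16
```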

  • Slices in Python (and migen) are foo[start:stop], with stop exclusive, i.e. foo[offset:offset+length]. So to select the bottom 10 bits of a signal you write a[:10] instead of Verilog's a[9:0]; an omitted start index defaults to 0. The next 10 bits are b[10:20], instead of b[19:10]. Index 0 is always the LSB. You can't "reverse" MSB/LSB in Python like you can in Verilog by changing the index order in the brackets.
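To translate between the two conventions, here is a small pure-Python model using plain ints in place of migen signals. take_bits and verilog_range are hypothetical helpers for illustration. Index 0 is the LSB and the stop index is exclusive. Verilog's sig[msb:lsb] becomes sig[lsb:msb + 1].

```python
def take_bits(value, s):
    """Model of migen-style slicing on a plain int: index 0 is the LSB and `stop` is exclusive."""
    start = 0 if s.start is None else s.start
    width = s.stop - start
    return (value >> start) & ((1 << width) - 1)


def verilog_range(msb, lsb):
    """Verilog sig[msb:lsb] selects the same bits as Python/migen sig[lsb:msb + 1]."""
    return slice(lsb, msb + 1)


value = 0b1101_0110_1001_1100_0101           # a 20-bit example value
low10 = take_bits(value, slice(None, 10))    # like a[:10]   / Verilog a[9:0]
next10 = take_bits(value, slice(10, 20))     # like b[10:20] / Verilog b[19:10]
print(bin(low10), bin(next10))
```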

  • If your installation is corrupted or there's a hard-fork of migen/litex, the packages are installed here by default: /usr/local/lib/python3.5/dist-packages. Nuke litex* and migen* and then re-run the install scripts. You'll also want to nuke your build/software directory because some Makefile paths are hard-coded (assuming you're using Florent's build env).

  • Apparently Python has package management issues, so people recommend sandboxing it. There are far too many options for doing this (I count four). I have no idea why I would want to use any one option over another, so I'm bound to get it wrong. Instead, I'm sticking with what the scripts do "by default", following the installation procedure to the letter as written in their README files.

  • Mismatches in bit vector lengths are kicked down the road to Verilog. So,

a = Signal(31)
b = Signal(16)
self.comb += a.eq(b)

Just generates the verilog "assign a = b".

It's best to confirm whether the compiler does what you think it does in this case. There is a spec, but signed types (and tool-specific extensions) can change the outcome; for example, a signed value gets sign-extended instead of zero-padded.

Confirmed that Vivado's convention is:

  • if lvalue is wider than rvalue, then LSB-align rvalue and zero-pad
  • if lvalue is narrower than rvalue, then lvalue gets the LSB's of rvalue

So wire [15:0] a; wire[7:0] b; assign a = b; is implicitly assign a[15:0] = {8'b0, b};

and

wire [7:0] a; wire[15:0] b; assign a = b; is implicitly assign a[7:0] = b[7:0];
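The rules above can be modeled on plain integers. assign() is a hypothetical helper that returns what an lvalue of a given width ends up holding. The unsigned cases follow the Vivado behavior confirmed above. The signed=True branch models the case where the whole assignment is signed, so the rvalue is sign-extended. That branch is my understanding of the Verilog rules, not something I checked in Vivado.

```python
def assign(lvalue_width, rvalue, rvalue_width, signed=False):
    """Value an lvalue of `lvalue_width` bits ends up with after `assign lvalue = rvalue;`."""
    rvalue &= (1 << rvalue_width) - 1  # the rvalue as it exists on its own wires
    if signed and lvalue_width > rvalue_width and (rvalue >> (rvalue_width - 1)) & 1:
        # sign-extend: fill the new upper bits with copies of the rvalue's MSB
        rvalue |= ((1 << lvalue_width) - 1) ^ ((1 << rvalue_width) - 1)
    return rvalue & ((1 << lvalue_width) - 1)  # truncate to the lvalue, keeping the LSBs


print(hex(assign(16, 0xAB, 8)))                # 0xab: wider lvalue, LSB-aligned and zero-padded
print(hex(assign(8, 0x1234, 16)))              # 0x34: narrower lvalue keeps the LSBs
print(hex(assign(16, 0x80, 8, signed=True)))   # 0xff80: signed case, sign-extended
```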

  • Bug open here -> https://github.com/m-labs/migen/issues/102

  • Clock domains. The "migen way" is to late-bind clock domain definitions. If you just use self.sync += with no modifiers inside a module, all the synchronous blocks are bound to the "sys" clock. This seems to be the preferred way to code a module that's all in one clock domain. Then, when instantiating the module, use ClockDomainsRenamer() to bind the clock, e.g.:

class MySingleDomainBlock(Module):
  def __init__(self):
     self.sync += [  # no spec on sync, so "defaults" to "sys"
        # put synchronous logic here
     ]

class MyMultiDomainBlock(Module):
  def __init__(self):
     self.submodules.blockA = blockA = ClockDomainsRenamer("clkA")( MySingleDomainBlock() ) # maps sys -> clkA
     self.submodules.blockB = blockB = ClockDomainsRenamer("clkB")( MySingleDomainBlock() ) # maps sys -> clkB

I haven't found good docs on the clock domain crossing functions. Based on some test code I ran and an attempt to read the very difficult Python code:

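  • Every module (class) has an implicit clock domain of "sys": plain "self.sync +=" statements land there. In my design sys is 100 MHz, but the actual frequency depends on your platform and clock/reset generator.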
  • If you want other clocks, you need a line like "self.clock_domains.cd_YOURDOMAIN = ClockDomain()" where YOURDOMAIN is the name of your clock domain. There's some magic thing you can do to create a reset synchronizer by including "AsyncResetSynchronizer(self.cd_YOURDOMAIN, RESET_SIGNAL)" where RESET_SIGNAL is your asynchronous (external) reset signal. Later on you can do "ResetSignal("YOURDOMAIN")" to summon the synchronized reset signal in any rvalue.
  • Late-bind clock domains using ClockDomainsRenamer(). Assuming you wrote a function FOO just using the ambiguous "self.sync +=" idiom, you can bind all the "self.sync" statements inside FOO to a new domain NEWCLOCKDOMAIN with "bar = ClockDomainsRenamer("NEWCLOCKDOMAIN")(FOO())" and then "self.submodules += bar".
  • If you have more than one domain to rename within a module, use a dictionary:
        def _to_hdmi_in0_pix(m):
            return  ClockDomainsRenamer(
                {"pix_o": "hdmi_in0_pix_o",        # format seems to be "old name" : "new name"
                 "pix5x_o": "hdmi_in0_pix5x_o"}
                )(m)

        self.submodules.hdmi_out0_clk_gen = _to_hdmi_in0_pix(hdmi_out0_clk_gen)

In fact, it seems that when you just pass a single argument to ClockDomainsRenamer, it's shorthand that's expanded to a dictionary of {"sys" : single_argument} before the function proceeds.
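Here is a toy model of that renaming, in pure Python. A module's synchronous statements are kept as a {domain: [statements]} dict. normalize_remapping and rename_domains are hypothetical helpers, not migen's code. They encode the two behaviors observed here. First, a bare string is shorthand for {"sys": string}. Second, domains missing from the mapping keep their names.

```python
def normalize_remapping(cd_remapping):
    """A bare string is shorthand for {"sys": <string>}; a dict is used as-is."""
    if isinstance(cd_remapping, str):
        return {"sys": cd_remapping}
    return dict(cd_remapping)


def rename_domains(sync_by_domain, cd_remapping):
    """Re-key a module's {domain: [statements]} using an {"old name": "new name"} mapping."""
    mapping = normalize_remapping(cd_remapping)
    renamed = {}
    for domain, statements in sync_by_domain.items():
        renamed.setdefault(mapping.get(domain, domain), []).extend(statements)
    return renamed


# A single-domain block only ever wrote `self.sync += ...`, so everything sits in "sys".
print(rename_domains({"sys": ["counter <= counter + 1"]}, "clkA"))
# Multi-domain case: "sys" is not in the mapping, so it keeps its name.
print(rename_domains({"pix_o": ["a"], "pix5x_o": ["b"], "sys": ["c"]},
                     {"pix_o": "hdmi_in0_pix_o", "pix5x_o": "hdmi_in0_pix5x_o"}))
```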

The self.submodules += bar is critical; if you forget it migen silently fails to actually create FOO() and just gives you the initial value of whatever the signals were inside FOO().

"bar" is now your instance of FOO() and you have to pull signals out of bar by using the "." notation to reference the signal instances within the bar object.

  • If you have signals in domain A and you want them in domain CLKWONK, use MultiReg(); specifically, "self.specials += MultiReg(a, b, "CLKWONK")". Note the assignment here is a->b, unlike the right-to-left nature of a.eq(b). In this case MultiReg() creates two back-to-back flip-flops in the destination domain to cross the clock domains.
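A cycle-level model makes the latency of that two-flip-flop chain concrete. MultiRegModel is a hypothetical pure-Python stand-in, not migen's MultiReg. Each call to tick() is one edge of the destination clock. A change on the input shows up on the output after the second edge. Metastability is not modeled; only the shifting is.

```python
class MultiRegModel:
    """Chain of flip-flops clocked by the destination domain (2 stages by default)."""

    def __init__(self, stages=2, reset=0):
        self.regs = [reset] * stages

    def tick(self, i):
        """One destination-clock edge: each stage samples the one before it."""
        self.regs = [i] + self.regs[:-1]
        return self.regs[-1]  # the output is the last flip-flop


sync = MultiRegModel()
print([sync.tick(a) for a in [1, 1, 1, 0, 0, 0]])  # [0, 1, 1, 1, 0, 0]
```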

  • There seem to be some other really useful primitives inside litex's cdc.py with evocative names like "Gearbox", "PulseSynchronizer", "ElasticBuffer", and "BusSynchronizer". I wish someone would tell me what they did and how to use them. Like a datasheet, or some sort of documentation like that.

  • Here's an example of the kind of documentation you can expect (from litex/soc/interconnect/csr.py):

    write_from_dev : bool
        Allow the design to update the CSRStorage value.
        *Warning*: The atomicity of reads by the CPU is not guaranteed.

    alignment_bits : int
        ???

    name : string
        Provide (or override the name) of the ``CSRStatus`` register.

It's actually kind of critical to know how they treat alignment bits, because I have a lot of bugs where the firmware is jamming in words when the registers expect bytes, and vice versa. The convention is not fixed and if you don't write your code carefully you can even get into CSRs that are implicitly aligned to for example cache-line or burst-width boundaries which is terribly non-obvious.

Streams & records

It seems there is almost no documentation on streams and records in the migen 0.5 manual. Here's what I've managed to piece together.

A record is a bundle of named signals of various sizes, described by a layout: a list of (name, width) tuples. An example is

  my_record = [
      ("data", 32),
      ("address", 20)
  ]

These can be used as templates for payloads on streams. Directionality of signals within a record isn't established until you attempt to do a .connect() or .eq() operator, and every signal can have its own direction, and directions can be different in different instances depending upon how you ultimately connected the elements in the record. Flexible. But confusing.

Streams, in their most basic form, are a way to connect records together quickly. Without them you'd have to, e.g., build an iterator that goes through all the elements in a record and does a self.comb += thing. They also imply the presence of some handshaking signals that you are free to ignore if you don't need them. The one caveat, I /think/, is when connecting stream endpoints to records. For example, say you have a streamthing = stream.Endpoint(channel_layout) and a recordthing = Record(channel_layout). Doing self.comb += streamthing.eq(recordthing) won't throw an error, but will silently generate no code(!!!). Confusingly, the opposite works: self.comb += recordthing.eq(streamthing) will generate the code you think it should...

A design pattern you'll often see in litex code is a module being instantiated with few if any arguments and a "sink" and "source":

class TimingGenerator(Module):
    def __init__(self):
        self.sink = sink = stream.Endpoint(frame_parameter_layout)
        self.source = source = stream.Endpoint(frame_timing_layout)
        # body of code...

# the parameter records are often in another file. For example, these were in common.py:
frame_parameter_layout = [
    ("hres",        hbits),
    ("hsync_start", hbits),
    ("hsync_end",   hbits),
    ("hscan",       hbits),
    ("vres",        vbits),
    ("vsync_start", vbits),
    ("vsync_end",   vbits),
    ("vscan",       vbits)
]

frame_timing_layout = [
    ("hsync", 1),
    ("vsync", 1),
    ("de",    1)
]

When you see this:

  • "sinks" are the inputs to the module
  • "sources" are the outputs of the module

So in the example above, the frame_parameter_layout record is a bunch of signals that define a frame: horizontal and vertical resolution, sync start/end positions, and total scan lengths. These are the inputs to the TimingGenerator. The frame_timing_layout is the actual timing for a frame: hsync, vsync, and de. These are the outputs of the TimingGenerator.

One gripe I have about this is that you're basically stuck with a typeless interface with little if any documentation or enforcement of what the parameters should be. It's flexible, but basically, it's the wild west in terms of hooking stuff up.

Stream formats

Streams have a set of implicit signals (I think it's implemented as a record itself) that seem to go along with them:

  • valid - master->slave signal indicating everything else has meaning
  • ready - slave->master signal for flow control
  • first - master->slave pulsed on the first element of a stream
  • last - master->slave pulsed on the last element of a stream
  • payload - master->slave "data" -- so this is the binding point for the stream-defining record (argument of stream.Endpoint)

The role of master and slave is not determined until you use the .connect() call. The convention is:

  self.comb += master.connect(slave)

Don't forget the "self.comb +=" -- migen is perfectly happy to run master.connect(slave) and not throw an error, but it also doesn't generate anything in the netlist, and all the signals will just be initialized to zero and stay that way.

Also, the /names/ are arbitrary, so you could reverse the statement to slave.connect(master) and that's valid, just the slave is now the actual master. Confusing, right? Why would you want to do this?

It's because "sink" are the "inputs" to a module, "source" are the "outputs" of a module. If you're inside the module, you might want to connect input to output, so sink.connect(source). But if you're outside the module and connecting two blocks together that are using streams to talk to each other, you'd do something more like foo.source.connect(bar.sink), connecting the output of foo to the input of bar.

Finally, note that you don't have to use the out of band signals in a stream. I've seen code that doesn't bind/use the first/last bits, ignore the ready signal, tie valid to 1, etc. It's "up to you" how you want to use it.
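The handshake signals follow the usual valid/ready rule. A beat of data crosses the interface only on a cycle where valid and ready are both 1; a source that sees ready=0 holds its data for the next cycle. Here is a pure-Python, cycle-by-cycle sketch of that rule. transfers and packets are hypothetical helpers that take one list entry per clock cycle. packets also uses last to close a packet.

```python
def transfers(valid, ready, data):
    """Beats that actually cross the interface: cycles where valid and ready are both 1."""
    return [d for v, r, d in zip(valid, ready, data) if v and r]


def packets(valid, ready, last, data):
    """Group transferred beats into packets, closing a packet on a beat with last=1."""
    out, current = [], []
    for v, r, l, d in zip(valid, ready, last, data):
        if v and r:
            current.append(d)
            if l:
                out.append(current)
                current = []
    return out


# Cycle 2: the sink stalls (ready=0), so the source holds 12 until cycle 3.
print(packets(valid=[1, 1, 1, 1, 1],
              ready=[1, 1, 0, 1, 1],
              last=[0, 1, 0, 0, 1],
              data=[10, 11, 12, 12, 13]))  # [[10, 11], [12, 13]]
```

Tying valid to 1 and ignoring ready, as some code does, turns this into "every cycle is a transfer". That is fine as long as the sink can genuinely accept data every cycle.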

Connecting, Splitting, and Merging Streams

Because streams are used to pass parameters between modules, there are situations where you don't want all the items in a stream to be mapped. One example is a module that represents the CSR register bank configuring your entire block, where many modules within your block want different bits of the CSR. You could make a CSR block for each module. I think they don't do this because it clutters the namespace at the firmware level, and maybe there's some issue with merging CSR lists within a module. Anyways, here's an example:

class Initiator(Module, AutoCSR):
    def __init__(self, cd):
        self.source = stream.Endpoint(frame_parameter_layout +
                                      frame_dma_layout)

Here a module called "Initiator" embodies the interface between the sys-clk domain registers and the pix-clk domain frame signals. Inside is an async FIFO that crosses the clock boundary. There is also a mechanism that seems to queue frame changes, so new frame parameters only get loaded at the end of each frame. So it makes sense to bundle up both the DMA and frame parameters together. The "source" (output) of the Initiator is thus described by "adding" the two layouts together (Python list concatenation, which merges the arrays).

Later on, you'll want to assign the signals to just the right blocks, with some code that looks like this:

        self.comb += [
            initiator.source.connect(timing.sink, keep=list_signals(frame_parameter_layout)),
            initiator.source.connect(dma.sink, keep=list_signals(frame_dma_layout)),
        ]

The "keep" parameter allows you to pick a subset of a stream and assign it to another stream. The opposite function, "omit" is also available.

The way the .connect() algorithm seems to work is:

  • If no keep/omit is specified, take the entire stream record (including the implicit signals valid, ready, etc.) and wire everything up according to the master/slave direction rules. The direction comes from an optional third field in each layout entry (DIR_M_TO_S or DIR_S_TO_M). stream.Endpoint tags ready as slave->master and everything else as master->slave.
  • If keep is specified, only the named signals are connected; the implicits (valid, ready, etc.) are tossed out unless you list them too.
  • If omit is specified, everything except the named signals is connected, so the implicits are kept unless you list them.

Decorating Streams

A stream defaults to just being a wire. It has some handshaking signals, but those only have meaning based on what the master and slave actually do with them, i.e. an unenforced convention. These are managed by just creating a stream.Endpoint().

However, you can also use stream.AsyncFIFO(), for example. It also looks like the stream package includes upconverters, downconverters, sync FIFOs and stride converters. There are no docs on any of these. For the one variant I've actually looked at, stream.AsyncFIFO(), the design pattern looks something like this:

        cdc = stream.AsyncFIFO(record_layout, depth)
        cdc = ClockDomainsRenamer({"write": "source_clock_domain",
                                   "read": "destination_clock_domain"})(cdc)
        self.submodules += cdc

        self.comb += [
           input_from_source.connect(cdc.sink, omit={"valid"}),  # master.connect(slave); omit takes a set/list of names. valid is broken out so you can control it below
           cdc.source.connect(output_to_dest),

           cdc.sink.valid.eq(enable_signal),  # some enable_signal to indicate when new values are coming in
        ]

Maybe there will be more documentation for the other tasty features inside streams, but this is the limit of what I know.
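One piece of how an async FIFO crosses the boundary is worth knowing. The usual technique is to pass the read and write pointers between domains in Gray code. Only one bit changes per increment, so a synchronizer can never sample a half-updated multi-bit value. As far as I can tell, migen's AsyncFIFO does this with a Gray counter whose output goes through MultiReg. Treat the sketch below as the general technique in pure Python, not that source.

```python
def bin_to_gray(n):
    """Binary -> Gray code: adjacent values differ in exactly one bit."""
    return n ^ (n >> 1)


def gray_to_bin(g):
    """Gray code -> binary: XOR together all right-shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a, b):
    return bin(a ^ b).count("1")


# A 4-bit pointer: binary 7 -> 8 flips four bits at once; Gray flips exactly one,
# including on the wrap from 15 back to 0.
for i in range(16):
    j = (i + 1) % 16
    print(i, "->", j, "binary:", bits_changed(i, j),
          "gray:", bits_changed(bin_to_gray(i), bin_to_gray(j)))
```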

Conventions

  • All signals are active high by convention, unless indicated by _b or _n suffix.
  • Streams: "outside the module, a source of the module is where the data comes from, sink is where the data goes"
  • Valid/first/last/data are master->slave; ready is slave->master. This is handled by the "connect" function.
  • A stream without any special modifiers degenerates to simply a self.comb += statement. To incorporate some logic into the stream, you have to wrap it in something, e.g. stream.AsyncFIFO().

Links

Command line tips

  • scripts/enter-env.sh # run this before trying anything in litex
  • export TARGET= # to change which target the environment builds
  • ssh user@host -L1234:localhost:1234 -v # forward port 1234 (the litex_server TCP port) to localhost via ssh

Digging through the design hierarchy's name space

edit -- what you really want is pycharm. You'll need to make sure it's pointing to python3.5, and you also need to set up the litex paths (using python3 setup.py in the litex directory). Once it's running you can control-click through the object hierarchy.

Use ipython or bpython (via pip install bpython) for browsing netlist structure

 from targets.netv2.video import SoC
 from platforms.netv2 import Platform
 s=SoC(Platform())

now you can use dir()/tab completion to navigate the hierarchy via the "s" object. Signals resolve as "signal" and these can be dropped into litescope.

Getting into the Vivado UI

There's some nice features and sanity checks available in the Vivado UI that LiteX doesn't have. Just because it's all command-line doesn't mean you have to lose this.

Inside your build's gateware directory (e.g. build/platform_target_cpu/gateware) there is a "top.tcl" file. This is a script that drives vivado. If you want to e.g. run additional analyses on the automated compilation, you can just comment out the "quit" statement at the end and then run the tcl file, and it'll stop with the UI open.

If you want to use a pre-built run, you can just load a checkpoint. Start vivado from the build's gateware directory (the command is literally "vivado", assuming you've already entered the litex environment with "source scripts/enter-env.sh") and then just type

 open_checkpoint top_route.dcp

(there's also a menu option to "open checkpoint" under the File menu)

This will load in the entire design just after place and route, which is where all the analysis steps get run. From here you can view the graphical floorplan and schematics, and run additional timing/clock interaction reports.

Global clocks and resets

To make a clock global (so you can pull it in submodules without passing it explicitly through the args), you declare a "clock domain"

 self.clock_domains.cd_george = ClockDomain()

 self.specials += Instance("BUFG", name="george", i_I=georgeunbuf, o_O=self.cd_george.clk)
 self.specials += AsyncResetSynchronizer(self.cd_george, unsynced_reset_signal) # unsynced_reset_signal is any signal from another clock domain that's to serve as a reset, eg a PLL lock etc

In the submodule, you can then use

 self.george_clk = ClockSignal("george")       # use george_clk wherever the clock is needed
 self.res_george_sync = ResetSignal("george")  # use res_george_sync wherever the reset is needed

Using your own verilog modules

If you want to use an existing verilog module you've written in a LiteX project, you first need to import it by calling add_source for every file in your module's hierarchy, e.g.

 self.platform.add_source("full/path/to_module/module1.v") 
 self.platform.add_source("full/path/to_module/module2.v") 

By "full/path/to_module" I mean the path from the directory where you type "make" (i.e. the top level of the build), not the absolute path in the filesystem.

You want to put those calls in your "target" file, inside the init function of your top-level SoC (not in the file with the list of pin mappings).

Then, you use the "specials" construct to instantiate the module. This can be done at any level in the LiteX hierarchy once the source files are incorporated in your target file.

The python syntax:

 self.specials += Instance("module1",
    i_port1=litex_signal1,
    o_port2=litex_signal2
 )

Would correspond to this verilog template:

  module module1 (
      input wire port1,
      output wire port2
  );

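You add the "i_" (input) and "o_" (output) prefixes to the Verilog port names yourself. "io_" for inouts and "p_" for parameters also exist. Migen strips the prefix when it writes out the instance, so i_port1 connects to the port named port1.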
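Here is a toy model of that naming rule in pure Python. classify_instance_kwargs is a hypothetical helper, not migen's Instance code. It takes Instance-style keyword names and splits each into its kind and the bare Verilog name it refers to.

```python
PREFIXES = {"i_": "input", "o_": "output", "io_": "inout", "p_": "parameter"}


def classify_instance_kwargs(**kwargs):
    """Split Instance-style keyword names into (kind, verilog_name, value) triples."""
    ports = []
    for key, value in kwargs.items():
        for prefix, kind in PREFIXES.items():
            if key.startswith(prefix):
                ports.append((kind, key[len(prefix):], value))
                break
        else:
            raise ValueError("no i_/o_/io_/p_ prefix on " + repr(key))
    return ports


print(classify_instance_kwargs(i_port1="litex_signal1", o_port2="litex_signal2"))
# [('input', 'port1', 'litex_signal1'), ('output', 'port2', 'litex_signal2')]
```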

Litescope

Litescope is the LiteX equivalent of Chipscope ILA.

Litescope can export its data through several ports: a dedicated UART port (separate from the firmware/CLI port), ethernet, PCI-express...in this example, we assume we've created an additional UART port dedicated to the Litescope interface. This is the slowest method but also the simplest and good for bootstrap debugging.

You will want to insert these lines inside your top-level .py SoC description file. At the top:

  from litex.soc.cores import uart                       # these two lines may already be present
  from litex.soc.cores.uart import UARTWishboneBridge

  from litescope import LiteScopeAnalyzer 

Then inside your top-level SoC instantiation function:

class MySoC(BaseSoC):
    csr_peripherals = [   ## a list, not a set, so the CSR numbering doesn't change from run to run
        "probably_several_other_peripherals_already",  ## you'll have several of these
        "analyzer",   ## << this line
    ]
    csr_map_update(BaseSoC.csr_map, csr_peripherals)  ## you probably have this already

    def __init__(self, platform, *args, **kwargs):     ## you definitely have this
        BaseSoC.__init__(self, platform, *args, **kwargs)

        # these two lines add in the debug interface
        self.submodules.bridge = UARTWishboneBridge(platform.request("debug_serial"), self.clk_freq, baudrate=115200)
        self.add_wb_master(self.bridge.wishbone)

        ### all of your SoC code here

        ## now connect your analyzer to the signals you want to look at
        analyzer_signals = [
            self.hdmi_in0.frame.de,
            self.crg.rst,
            self.hdmi_in0.resdetection.valid_i
        ]
        self.submodules.analyzer = LiteScopeAnalyzer(analyzer_signals, 1024)  # 1024 is the depth of the analyzer

    ## add this method to create the analyzer configuration file during compilation
    def do_exit(self, vns, filename="test/analyzer.csv"):
        self.analyzer.export_csv(vns, filename)

After you run "make gateware", there will be a "test" sub-directory in your build/ directory. The test directory will contain an analyzer.csv and a csr.csv file. These configure the host-side drivers for the litescope.

Now in order to engage litescope, you have to start the litex_server:

  litex_server uart /dev/ttyUSB0 115200 &

This of course assumes you have your UART cable at /dev/ttyUSB0 and connected properly to the board. If it works well, you'll get some output to the effect of:

LiteX remote server [CommUART] port: /dev/ttyUSB0 / baudrate: 115200 / tcp port: 1234

You can telnet to this port to confirm it's up and running.

Once the server is running you can run the following script to drive the interface:

#!/usr/bin/env python3
import time

from litex.soc.tools.remote import RemoteClient
from litescope.software.driver.analyzer import LiteScopeAnalyzerDriver

wb = RemoteClient()
wb.open()

# # #

analyzer = LiteScopeAnalyzerDriver(wb.regs, "analyzer", debug=True)

analyzer.configure_trigger(cond={"hdmi_in0_frame_de" : 1})  # only include this if you want a trigger condition
# analyzer.configure_trigger(cond={"foo": 0xa5, "bar":0x5a}) # you can add more conditions by building a "dictionary"

analyzer.configure_subsampler(1)
analyzer.run(offset=32, length=128)  # controls the "pre-trigger" offset plus length to download for this run, up to the length of the total analyzer config in hardware
analyzer.wait_done()
analyzer.upload()
analyzer.save("dump.vcd")

# # #

wb.close()

You'll get a "dump.vcd" file which you can view using gtkwave (which you should be able to apt-get install if you don't have it).

Setting Litescope Clocks

Check https://github.com/enjoy-digital/litescope/blob/master/litescope/core.py#L199 for arguments to Litescope.

So when instantiating the analyzer, do this:

     self.submodules.analyzer = LiteScopeAnalyzer(analyzer_signals, 1024, cd=my_domain, cd_ratio=1) 

If my_domain is at or below sys_clk, use cd_ratio=1; if it is faster than sys_clk, use cd_ratio=2. The fastest that you can go is 2x sys_clk.

At least in my design, sys_clk is set to 100MHz, so that would give a max of 200MHz.
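Here is the cd_ratio rule of thumb above written as a tiny helper. pick_cd_ratio is hypothetical and not part of LiteScope. It just encodes the rule so you can sanity-check a clock plan before a long build. The 1x/2x limits come from these notes, not from LiteScope's source.

```python
def pick_cd_ratio(domain_hz, sys_hz):
    """cd_ratio for LiteScopeAnalyzer per the rule above: 1 up to sys_clk, 2 up to 2x sys_clk."""
    if domain_hz <= sys_hz:
        return 1
    if domain_hz <= 2 * sys_hz:
        return 2
    raise ValueError("analyzer domain is faster than 2x sys_clk; not supported")


print(pick_cd_ratio(74.25e6, 100e6))  # 1 (e.g. a 720p pixel clock with a 100 MHz sys_clk)
print(pick_cd_ratio(148.5e6, 100e6))  # 2 (e.g. a 1080p pixel clock)
```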

Using Florent's Build Env

Florent's version of LiteX combines the platform+target files into a single file, along with most of the code you need for a specific FPGA. This makes a lot of sense.

To bootstrap into the environment, you'll need to cross-compile gcc for lm32 first. Don't clone the gcc source repo. That's a waste of time. Follow this gcc build guide up until the point where they say to run configure. Then, run

  mkdir build && cd build
  ../configure --target=lm32-elf
  make
  sudo make install

Then you can run Florent's "setup.py" script, first with "setup.py init", then "setup.py install".

The specific SoC you're working on is in the -soc directory, e.g. netv2-soc for netv2mvp.

To use bpython3 to try and wade through problems in code, entering is a bit different. Start bpython3 and use these commands:

  bpython3
  exec(open("./netv2mvp.py").read())
  platform=Platform()
  soc=VideoSoc(platform)

And that will get you to a point where you can browse around in the hierarchy for figuring out what connects to what.

If you get some complaint about the litex environments not being found, go up a dir and run "bpython3 setup.py update". This does some weird magic that maps the libraries into the version of bpython3 you're currently using. This broke a couple of times for me because, as I updated my python environment, something in the library mappings broke.

Oh also -- note that if you installed any python stuff from within the litex build environment, it "eats" it and keeps it local. So you have to re-install e.g. pip, bpython3 and so forth.

Debugging when the CPU is wedged

The litescope analyzer discussed above is great until the CPU stops responding.

This git repo: https://github.com/bunnie/netv2-soc/tree/lm32-debug-example

Gives an example of how to connect a scope that doesn't rely on the CPU.

The first step is to do a "manual" instantiation of litescope, using the following idiom:

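# litescope: fragment from the SoC's __init__ (needs os, Signal, Instance,
# ClockSignal and ResetSignal in scope, as elsewhere in the SoC file)
litescope_serial = platform.request("serial", 1)  # second serial port, dedicated to the scope
litescope_bus = Signal(128)  # the 128 bits to capture
litescope_i = Signal(16)     # inputs read by litescope, visible via CSRs
litescope_o = Signal(16)     # outputs driven by litescope, set via CSRs
self.specials += [
    Instance("litescope",
        i_clock=ClockSignal(),
        i_reset=ResetSignal(),
        i_serial_rx=litescope_serial.rx,
        o_serial_tx=litescope_serial.tx,
        i_bus=litescope_bus,
        i_i=litescope_i,
        o_o=litescope_o,
    )
]
platform.add_source(os.path.join("litescope", "litescope.v"))

# litescope test: a recognizable "magic number" on the bus, two outputs
# driving LEDs, and a fixed pattern on the inputs
self.comb += [
    litescope_bus.eq(0x12345678ABCFEF),
    platform.request("user_led", 1).eq(litescope_o[0]),
    platform.request("user_led", 2).eq(litescope_o[1]),
    litescope_i.eq(0x5AA5),
]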

This pulls in a Verilog version of litescope. To build that Verilog, run the build.py script located here: https://github.com/bunnie/netv2-soc/blob/lm32-debug-example/litescope/build.py

Running this script generates litescope.v, analyzer.csv and csr.csv.

The configuration in this example gives you a 128-bit bus to monitor, plus 16 inputs and 16 outputs accessible via CSRs. You would use these I/Os to, for example, stimulate signals or monitor their status from an external Python script running on the host via the UART bridge. I don't know how to do this yet; I just know that's why they're there.
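A first step toward driving those I/Os from the host is finding their register addresses in the generated csr.csv. Here is a minimal sketch, assuming register rows of the form csr_register,<name>,<address>,<size>,<mode>. load_csr_registers is a hypothetical helper, and the register names and addresses in SAMPLE are made up for illustration; check your own csr.csv for the real ones.

```python
import csv
import io


def load_csr_registers(csv_text: str) -> dict:
    """Map register name -> (address, size, mode) from csr.csv-style text.

    Assumes register rows look like: csr_register,<name>,<address>,<size>,<mode>
    Every other row (csr_base, constants, comments, ...) is skipped.
    """
    regs = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 5 and row[0] == "csr_register":
            regs[row[1]] = (int(row[2], 0), int(row[3]), row[4])
    return regs


# Hypothetical excerpt; real names and addresses come from your csr.csv.
SAMPLE = """\
csr_base,io,0xe0005000,,
csr_register,io_input,0xe0005000,2,ro
csr_register,io_output,0xe0005008,2,rw
"""

for name, (addr, size, mode) in load_csr_registers(SAMPLE).items():
    print(f"{name:10s} {addr:#010x} size={size} {mode}")
```

The addresses can then be handed to whatever host-side client talks to the UART bridge.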

If you can connect to the signals using the Python/LiteX idioms, then great. However, I couldn't find any way to hook directly into the lm32's signals using the idioms LiteX provides. In particular, I wanted to grab an I-cache trace at the moment a certain interrupt is triggered. This example does that by ganking signals directly at the Verilog level that would normally be hidden at the Python level. This might also be a handy way to debug FSM state, since litescope currently has no support for that either.

To do this, run the master script that generates top.v. It will start invoking Vivado on it; just break out of that build, since you don't need to wait for the first build to finish.

Edit top.v and look for the line where the litescope signals get assigned. You can just search for the "magic number" (0x12345678ABCFEF) that the bus is hard-coded to in the Python file. Once you've found it, hook up the LM32's signals in its place. I've left an example of this here:

https://github.com/bunnie/netv2-soc/blob/lm32-debug-example/litescope/build.py#L79
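The Python file writes the magic number in hex, but as far as I can tell migen's Verilog backend prints integer constants as <width>'d<decimal>, so the hex string may not show up verbatim in top.v. If a search for it comes up empty, this sketch gives a decimal string to search for instead. migen_const_literal is a hypothetical helper, and the literal format is an assumption worth checking against your own top.v; searching for just the digits after 'd is the safer bet.

```python
def migen_const_literal(value: int) -> str:
    """Guess how migen's Verilog backend prints a non-negative int constant.

    Assumes the <width>'d<decimal> form, with the width sized to fit the value
    (minimum 1 bit, so 0 prints as 1'd0).
    """
    if value < 0:
        raise ValueError("this sketch only handles unsigned constants")
    width = max(value.bit_length(), 1)
    return f"{width}'d{value}"


# The magic number from the litescope test snippet; grep top.v for this string.
print(migen_const_literal(0x12345678ABCFEF))
```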

The LM32 has separate I-bus and D-bus interfaces, and you have to monitor the two individually. 128 bits is only enough to really watch the I-bus address, I-bus data, interrupt vector, the Wishbone handshaking bits, and 24 bits of DMA address (which I'm watching just to make sure we're not clobbering CPU code space). That's enough to get instruction traces out, anyway.

Drop into the build directory, fire up Vivado, and run "source top.tcl". This runs the full compile on the edited top.v with the UI present, until the "quit" line at the end makes Vivado exit automatically. If you want to keep using the Vivado UI for post-mortem work, edit out that final quit command.

You now have a top.bit file which has litescope connected to the CPU.

Of course, if you run the Python script again, it'll nuke your top.v file and all the connection edits you've done will be overwritten. So this is a very specific point-purpose tool for debugging when you're really at your wit's end.

To analyze, use the same commands as above: start the bridge, then run the analyzer script.

  litex_server uart /dev/ttyUSB0 3000000
  python3 ./test_analyzer.py

(Oh btw, did I mention that 115200 baud is slow, and FTDI adapters support 3 Mbps?)
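To put a number on "slow", here is a rough estimate of how long reading back a capture takes at each baud rate. It is a lower bound under stated assumptions: 8N1 framing (10 line bits per payload byte), no bridge command/address overhead, and a hypothetical capture depth of 1024 samples; uart_readback_seconds is my own helper.

```python
def uart_readback_seconds(sample_bits: int, depth: int, baud: int) -> float:
    """Lower bound on the time to read a capture back over the UART bridge.

    Assumes 8N1 framing (10 line bits per payload byte) and ignores the
    bridge's own command and address overhead, so real dumps take longer.
    """
    payload_bytes = (sample_bits + 7) // 8 * depth
    return payload_bytes * 10 / baud


DEPTH = 1024  # hypothetical capture depth; use the depth your analyzer was built with
for baud in (115200, 3000000):
    print(f"{baud:>7} baud: {uart_readback_seconds(128, DEPTH, baud):.3f} s")
```

Because the time scales linearly with baud rate, 3 Mbps is about 26x faster than 115200 for any capture size.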

The only significant difference from the earlier methodology is triggering. Because the named signal structure isn't there, you have to figure out which of the 128 bits you want to trigger on. This is why I was meticulous about specifying the bit-width of every signal in the Verilog, even though leaving them implicit would have given the same result.

Below is an example of how the trigger condition is specified.

#analyzer.configure_trigger(cond={"hdmi_in1_interrupt" : 1})  # can't work: no named signals, just the raw bus
t = getattr(analyzer, "frontend_trigger_value")  # instead we define trigger & mask manually
m = getattr(analyzer, "frontend_trigger_mask")
t.write(0x80000000000000000)  # one-hot: the bus bit carrying the interrupt
m.write(0x80000000000000000)  # compare only that bit; all others are don't-care

With this, I was able to capture the instruction trace that was run upon interrupt.
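Since triggering boils down to knowing where each field sits in the 128 bits, a small helper can compute the value/mask pair instead of counting hex zeros by hand. This is a sketch: LAYOUT is a hypothetical bus layout (not necessarily the one in the example repo), field_trigger is my own helper rather than a LiteScope API, and it assumes the analyzer fires when the masked bus bits equal the masked value.

```python
# Hypothetical layout of the 128-bit debug bus, packed LSB first. The names,
# widths and order are illustrative; copy them from your own edited top.v.
LAYOUT = [
    ("ibus_adr", 32),
    ("ibus_dat", 32),
    ("interrupt", 32),
    ("wb_handshake", 8),
    ("dma_adr", 24),
]


def field_offsets(layout):
    """Return {name: (lsb, width)} for fields packed LSB first."""
    offsets, lsb = {}, 0
    for name, width in layout:
        offsets[name] = (lsb, width)
        lsb += width
    if lsb > 128:
        raise ValueError(f"layout is {lsb} bits, bus is only 128")
    return offsets


def field_trigger(layout, name, value, field_mask=None):
    """Return (trigger_value, trigger_mask) matching `value` on one field.

    field_mask selects which bits of the field to compare (default: all of
    them); bits outside the field stay don't-care (mask 0).
    """
    lsb, width = field_offsets(layout)[name]
    full = (1 << width) - 1
    if field_mask is None:
        field_mask = full
    if (value | field_mask) & ~full:
        raise ValueError(f"value/mask do not fit in {name} ({width} bits)")
    return (value & field_mask) << lsb, field_mask << lsb


# Fire when bit 0 of the interrupt field goes high.
value, mask = field_trigger(LAYOUT, "interrupt", 1, field_mask=1)
print(hex(value), hex(mask))
# These would then be written the same way as before: t.write(value); m.write(mask)
```

To trigger on several fields at once, OR together the values and masks returned for each field.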

Also handy:

  lm32-elf-objdump -D firmware.elf

This creates a disassembly of your firmware, which you can use to figure out what code your CPU is running based on the fetch addresses. You of course need the lm32-elf toolchain for this (objdump is part of binutils, the other half of the lm32 cross toolchain alongside gcc); that's a standard prerequisite for Florent's environment but not for Tim's LiteX variant.
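Matching fetch addresses against the disassembly by eye gets tedious with a long trace. Here is a small sketch that maps a captured address to the enclosing function, assuming the usual "<address> <symbol>:" header lines that objdump prints before each symbol. parse_symbols and symbol_for are hypothetical helpers, and the sample addresses are made up. Also note that, as far as I know, LiteX's Wishbone buses carry word addresses (byte address >> 2), so whether you need a shift depends on which signal you tapped.

```python
import bisect
import re

# Symbol header lines in `objdump -D` output look like "00000100 <isr>:".
SYMBOL_RE = re.compile(r"^([0-9a-fA-F]+) <([^>]+)>:\s*$")


def parse_symbols(disasm_text):
    """Return a sorted list of (start_address, name) from objdump -D output."""
    symbols = []
    for line in disasm_text.splitlines():
        match = SYMBOL_RE.match(line)
        if match:
            symbols.append((int(match.group(1), 16), match.group(2)))
    symbols.sort()
    return symbols


def symbol_for(symbols, address):
    """Name of the closest symbol starting at or below `address`, else None."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, address) - 1
    return symbols[i][1] if i >= 0 else None


# Hypothetical excerpt (instruction lines omitted); in practice, read the
# saved output of: lm32-elf-objdump -D firmware.elf > firmware.lst
SAMPLE = """\
00000000 <_reset_handler>:
00000100 <isr>:
00000200 <main>:
"""

symbols = parse_symbols(SAMPLE)
fetch_adr = 0x104  # byte address taken from the captured trace
print(symbol_for(symbols, fetch_adr))  # isr
# If you captured a Wishbone word address instead, convert it first:
# symbol_for(symbols, word_adr << 2)
```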
