DMA and wide RTIO issues #1521
Comments
Let's dial it down a bit. You are making a few assumptions that are not true and others need to be checked first.
That would be nice but I guess you are aware of the limitations and uncertainties of DMA, wide RTIO, and DRTIO.
It's likely that that has got nothing to do with Fastino.
That depends on the other assumptions being true.
You may well be seeing two different issues that have little to do with Fastino: DMA and DRTIO for wide events. Also double-check your code and ensure that it's consistent and clean. The git hash on your master that you appear to be using (816a6f2) is from early 2018. Other things indicate newer code. It might also be a build system issue. |
The problem persists without using DMA. In fact, without DMA, I can't seem to get any continuous playback. My code is below. Please let me know if I misunderstand how to use the API. With the update interval below, the code underflows. Increasing the delay by a factor of 2, The gateware was built based off our local ARTIQ variant. I will rebuild based off the latest version here (with the necessary modification to
from artiq.experiment import *


class Saw(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.setattr_device("fastino_0")
        print("build")

    def run(self):
        print("run")
        self.n = 10
        self.vlist = [[0.]*32] * self.n
        for i in range(self.n):
            for ii in range(32):
                self.vlist[i][ii] = -10.0 +i * 20.0/self.n
        self.do()

    @kernel
    def do(self):
        self.core.reset()
        f = self.fastino_0
        self.core.break_realtime()
        f.init()
        delay(1*us)
        f.update(0xffffffff)
        delay(1*us)
        for i in range(10):
            f.set_leds(0xaa)
            delay(.1*s)
            f.set_leds(0x55)
            delay(.1*s)
        print("start playback")
        self.core.break_realtime()
        delay(0.5)
        while True:
            self.play()
            self.core.wait_until_mu(now_mu())

    @kernel
    def play(self):
        k0 = 14*7//2*8*2
        k0 = k0 *(1<<10)
        for i in range(self.n):
            self.fastino_0.set_group(0, self.vlist[i])
            delay_mu(k0) |
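For reference, the delay constant computed in play() can be related to the sample rates quoted elsewhere in this thread. The following is only a minimal sketch in plain Python (no ARTIQ needed). It assumes that one RTIO machine unit equals 1 ns, and that 14*7//2*8 corresponds to one Fastino frame period; neither is stated in the thread. The helper name update_rate_hz is hypothetical.

```python
# Assumption (not stated in the thread): 1 RTIO machine unit (mu) = 1 ns.
MU_SECONDS = 1e-9


def update_rate_hz(delay_mu):
    """Update rate implied by issuing one update every `delay_mu` machine units."""
    return 1.0 / (delay_mu * MU_SECONDS)


# Same integer arithmetic as in play(), evaluated left to right.
frame_mu = 14 * 7 // 2 * 8        # 392 mu, assumed to be one frame period
k0 = frame_mu * 2                 # 784 mu, first assignment in play()
k0_scaled = k0 * (1 << 10)        # 802816 mu, the value passed to delay_mu()

print(frame_mu, update_rate_hz(frame_mu))    # roughly 2.55e6 updates/s
print(k0, update_rate_hz(k0))                # roughly 1.28e6 updates/s
print(k0_scaled, update_rate_hz(k0_scaled))  # roughly 1.25e3 updates/s
```

Under these assumptions, the posted code (with the factor of 1<<10) updates at roughly 1.2 kS/s, while the 392 mu and 784 mu delays correspond to the 2.55 MS/s and 1.3 MS/s figures.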
Yes. Without DMA the sustained rate is limited by the CPU speed. |
Removing the loop and using 10 updates gives the following error: https://pastebin.pl/view/79e186da In this scenario the hardware doesn't update at all. Edit: this was run with gateware derived from the most recent commit. |
You've probably neglected to update ARTIQ on the master as well – the RPC protocol recently changed. |
That was indeed the cause of the error. With the host updated, the error no longer occurs. There are now no errors, but there is still no correct hardware playback. |
self.vlist = [[0.]*32] * self.n doesn't do what you think it does. |
Indeed, it does not 😅 Fixed by converting to a list comprehension. The wide interface operates fine at 1.3 MS/s with up to 500 updates (no DMA). |
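The aliasing problem pointed out above can be reproduced in plain Python, without any hardware. The following is a minimal sketch: the ramp formula is copied from run() in the posted experiment, while the function names are hypothetical and only serve the illustration.

```python
def make_ramp_aliased(n, width=32):
    """Build the table the way the original run() did (buggy)."""
    # The outer multiplication copies references: all n rows are the SAME list.
    vlist = [[0.0] * width] * n
    for i in range(n):
        for ii in range(width):
            vlist[i][ii] = -10.0 + i * 20.0 / n
    return vlist


def make_ramp(n, width=32):
    """Build the table with a list comprehension: one independent row per step."""
    return [[-10.0 + i * 20.0 / n] * width for i in range(n)]


bad = make_ramp_aliased(10)
good = make_ramp(10)

print(bad[0] is bad[9])        # True: every row is the same object
print(bad[0][0], bad[9][0])    # both rows hold the last value written
print(good[0][0], good[9][0])  # first and last steps of the ramp differ
```

With the aliased table, every call to set_group() receives the same voltages, so no ramp is played back regardless of the update rate.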
And 2.55 MS/s as well, right? |
Writing a single burst at 2.55 MS/s using the wide interface and DMA does update the DAC. However, it does not play back the correct waveform. If I write a 100-sample ramp, the ramp appears to be downsampled to a ~20-sample ramp. |
Please elaborate and be thorough and precise. Give yourself some time. |
Running the code above with n=10 results in the experiment not terminating. |
I thought we agreed to test without dma. |
I clearly stated I was using DMA. Given the issue name, testing without DMA does seem a bit strange. |
@pathfinder49 there is clearly an issue here. The question is where. Is it in the fastino wide interface, or the rtio/dma infrastructure when using the wide interface. A good way of narrowing down the problem is taking a variable out of the equation — in this case by eliminating DMA. So the question is: do you see similar behaviour if you don’t use DMA (to the extent you can reproduce this given the finite sustained event rate/fifo size). |
There is clearly a DMA issue. DMA does not work correctly at 1.3 MS/s. Non-DMA writes work correctly at 1.3 MS/s. I can attempt 2.55 MS without DMA. However, this is a different issue to the DMA incompatibility. |
@pathfinder49 can you post the output traces for a linear ramp with and without DMA at 1.3 MS/s, so we can directly illustrate this? |
What’s the conclusion then? DMA is currently broken with wide RTIO in general? |
@pathfinder49 ignoring the breakage, can you summarise what we can achieve in terms of event rates for wide/narrow interfaces with / without DMA? |
Please verify that you can reproduce working max rate bursts with Fastino (again, without DMA and DRTIO). That's the first checkpoint I'm looking for and it's a requirement for everything else to work. As I mentioned explicitly before, you should dramatically reduce your problem size. Please take the advice, otherwise this will be painful. As a small example: a DRTIO link can currently sustain 1 Gb/s at best. The Fastino data at 32 channels and 2.55 MS/s is already more than 1.3 Gb/s, and that's still without any overhead or round trips. I don't even know what the max practical throughput is for DRTIO and what it is for wide RTIO events. And I don't know what it is for DMA. |
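The throughput figure in the comment above can be checked with a short back-of-the-envelope calculation. This sketch assumes 16 bits per channel per sample (the DAC word size, which the thread does not state explicitly). The channel count, sample rate and 1 Gb/s link figure are taken from the comment.

```python
CHANNELS = 32
BITS_PER_SAMPLE = 16           # assumption: 16-bit DAC words
SAMPLE_RATE = 2.55e6           # samples per second per channel
DRTIO_LINK_BPS = 1e9           # best-case sustained DRTIO rate quoted above

# Raw payload only: no event headers, timestamps, or round trips.
payload_bps = CHANNELS * BITS_PER_SAMPLE * SAMPLE_RATE

print(payload_bps / 1e9)             # payload in Gb/s
print(payload_bps > DRTIO_LINK_BPS)  # True: payload alone exceeds the link rate
```

Under that assumption the raw payload is about 1.31 Gb/s, consistent with the "more than 1.3 Gb/s" estimate.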
No data in this issue uses DRTIO. Please explain what further verification of the max burst rate performance you want. The plot above shows it working correctly at 2.55 MS/s without DMA.
Wide interface
Narrow interface
|
Your screenshot above shows DRTIO being used, and @dnadlinger's comment about the master version points to it as well. You haven't given much information about your setup (we have the issue templates requesting that), so I have to speculate. For DMA with wide RTIO or overall DMA throughput, maybe @sbourdeauducq or @cjbe can shed some light. |
Not to add to the confusion (I certainly don't have to add anything to the debugging effort), but I was referring to master as in the artiq_master process, not as in the DRTIO node. |
Summary of discussion on IRC today: the tests we've already done seem to exclude this being an issue that's directly related to Fastino. Instead, it seems to be Fastino uncovering a bug in ARTIQ DMA, wide RTIO, etc (or some combination thereof). The Fastino "narrow" interface does seem to work as expected, but the sustained event rate is rather low due to #946 |
Summarizing a long thread... @pathfinder49 with reference to #1521 (comment)
So, this is consistent with the claim that you can fill up the RTIO output FIFOs and drain them without issue, but the rate at which the CPU can generate wide events is much lower than 2.55MSPs. Am I right in thinking that in this case you're updating all 32 channels with each sample? Out of curiosity, do you have any idea what the maximum sustained event rate is for updating all 32 channels (using the _mu commands)? I'd assume that the rate in terms of samples/channel/second is still much higher than the narrow interface.
AFAICT you tested this at both 2.55 MS/s and 1.3 MS/s and found that in both cases there were what looked like missing samples (this is what "bad" means, right?). c.f. #1521 (comment) This was tested with 100-sample DMA records. This could possibly be related to throughput (we haven't tested it with very low sample rates)? It's hard to tell without seeing screenshots of the behavior at different sample rates (I didn't see anything posted at 1.3 MS/s).
For reference/comparison with the above, this is updating only one channel at a time, and includes the CPU overhead for generating a sample (which is, unsurprisingly, much less than with the wide interface). It's not a direct comparison with the wide RTIO interface data above since that was for bursts.
I think that what you mean by this is 2.55MS/(channel*s), right? i.e. you can saturate a single channel (maybe even ~1.5 channels), but not two channels simultaneously. But, importantly there were no glitches observed with the narrow interface.
You didn't mention the number of channels in your post, but I assume that here you mean that the number of (samples*channels) will be lower for narrow than wide interface due to the FIFO size. tl;dr the narrow interface works about as well as expected from the DMA performance. The wide interface seems to hit a bug somewhere in ARTIQ when used with DMA. The wide interface may still be useful (e.g. if you want a few hundred updates on multiple channels, that's currently possible with the current wide interface, but not with the narrow interface), but long term someone needs to look at this. @jbqubit FYI...if this is an issue with wide RTIO and DMA (which is a possibility given the above) it may also affect Sayma. As far as I'm aware there has been no testing of wide RTIO and DMA (but correct me if I'm wrong @sbourdeauducq ). |
For reference: the Zynq DMA is about a factor of 5-10 better than Kasli right now. Presumably this is similar to the performance we could get with the current hardware by implementing #946 (comment). See #946 for details. |
Any chance this will be tackled soon? Or a point where to start in the ARTIQ codebase?
is not enough info to start unfortunately. |
Other than locating it in the DMA/RTIO code area there has not been any further pinpointing that I know of.
It is where we start debugging it. |
Likely fixed by ea9fe9b |
Bug Report
One-Line Summary
The wide interface for Fastino shows several unexpected behaviors that make it unusable when combined with DMA.
Issue Details
Edit: These issues appear to arise from combining the wide interface and DMA
The 32 channel wide API for Fastino is broken.
RTIODestinationUnreachable errors after a few hardware updates.
Both the frequency and voltage step number do not respond to update rate changes.
Steps to Reproduce
log2_width parameter in artiq.gateware.eem.Fastino set to 5.
log2_width
Expected Behavior
Your System (omit irrelevant parts)
Operating System: Windows
Version of the gateware and runtime loaded in the core device:
Hardware involved: Kasli 1.1 (speed grade 3) and Fastino 1.1