FullMemoryWE leads to excessive blockram usage on ECP5 #70

Closed
david-sawatzke opened this issue Jun 19, 2021 · 6 comments

@david-sawatzke
Contributor

The colorlight target no longer builds because the design needs more block RAM than the device provides. The tx/rx slots seem to account for an excessive share of it.

Versions

Yosys 0.9+4081 (git sha1 5a73f296c, gcc 11.1.0 -march=native -mtune=generic -O2 -fno-plt -fPIC -Os)
nextpnr-ecp5 -- Next Generation Place and Route (Version c73d4cf6)
Project Trellis ecppack Version 1.0-505-g0e6a320
(current git versions)

Without Ethernet

$ ./colorlight_5a_75x.py --build --with-ethernet
Info: Device utilisation:
Info: 	       TRELLIS_SLICE:  4064/12144    33%
Info: 	          TRELLIS_IO:    51/  197    25%
Info: 	                DCCA:     2/   56     3%
Info: 	              DP16KD:    41/   56    73%
Info: 	          MULT18X18D:     4/   28    14%
Info: 	              ALU54B:     0/   14     0%
Info: 	             EHXPLLL:     1/    2    50%
Info: 	             EXTREFB:     0/    1     0%
Info: 	                DCUA:     0/    1     0%
Info: 	           PCSCLKDIV:     0/    2     0%
Info: 	             IOLOGIC:     4/  128     3%
Info: 	            SIOLOGIC:    44/   69    63%
Info: 	                 GSR:     0/    1     0%
Info: 	               JTAGG:     0/    1     0%
Info: 	                OSCG:     0/    1     0%
Info: 	               SEDGA:     0/    1     0%
Info: 	                 DTR:     0/    1     0%
Info: 	             USRMCLK:     0/    1     0%
Info: 	             CLKDIVF:     0/    4     0%
Info: 	           ECLKSYNCB:     0/   10     0%
Info: 	             DLLDELD:     0/    8     0%
Info: 	              DDRDLL:     0/    4     0%
Info: 	             DQSBUFM:     0/    8     0%
Info: 	     TRELLIS_ECLKBUF:     0/    8     0%
Info: 	        ECLKBRIDGECS:     0/    2     0%

With Ethernet

$ ./colorlight_5a_75x.py --build --with-ethernet
Info: Device utilisation:
Info: 	       TRELLIS_SLICE:  5400/12144    44%
Info: 	          TRELLIS_IO:    66/  197    33%
Info: 	                DCCA:     3/   56     5%
Info: 	              DP16KD:    69/   56   123%
Info: 	          MULT18X18D:     4/   28    14%
Info: 	              ALU54B:     0/   14     0%
Info: 	             EHXPLLL:     1/    2    50%
Info: 	             EXTREFB:     0/    1     0%
Info: 	                DCUA:     0/    1     0%
Info: 	           PCSCLKDIV:     0/    2     0%
Info: 	             IOLOGIC:    15/  128    11%
Info: 	            SIOLOGIC:    44/   69    63%
Info: 	                 GSR:     0/    1     0%
Info: 	               JTAGG:     0/    1     0%
Info: 	                OSCG:     0/    1     0%
Info: 	               SEDGA:     0/    1     0%
Info: 	                 DTR:     0/    1     0%
Info: 	             USRMCLK:     0/    1     0%
Info: 	             CLKDIVF:     0/    4     0%
Info: 	           ECLKSYNCB:     0/   10     0%
Info: 	             DLLDELD:     0/    8     0%
Info: 	              DDRDLL:     0/    4     0%
Info: 	             DQSBUFM:     0/    8     0%
Info: 	     TRELLIS_ECLKBUF:     0/    8     0%
Info: 	        ECLKBRIDGECS:     0/    2     0%

With Ethernet (ntxslot = 1)

$ ./colorlight_5a_75x.py --build --with-ethernet
Info: Device utilisation:
Info: 	       TRELLIS_SLICE:  5339/12144    43%
Info: 	          TRELLIS_IO:    66/  197    33%
Info: 	                DCCA:     3/   56     5%
Info: 	              DP16KD:    61/   56   108%
Info: 	          MULT18X18D:     4/   28    14%
Info: 	              ALU54B:     0/   14     0%
Info: 	             EHXPLLL:     1/    2    50%
Info: 	             EXTREFB:     0/    1     0%
Info: 	                DCUA:     0/    1     0%
Info: 	           PCSCLKDIV:     0/    2     0%
Info: 	             IOLOGIC:    15/  128    11%
Info: 	            SIOLOGIC:    44/   69    63%
Info: 	                 GSR:     0/    1     0%
Info: 	               JTAGG:     0/    1     0%
Info: 	                OSCG:     0/    1     0%
Info: 	               SEDGA:     0/    1     0%
Info: 	                 DTR:     0/    1     0%
Info: 	             USRMCLK:     0/    1     0%
Info: 	             CLKDIVF:     0/    4     0%
Info: 	           ECLKSYNCB:     0/   10     0%
Info: 	             DLLDELD:     0/    8     0%
Info: 	              DDRDLL:     0/    4     0%
Info: 	             DQSBUFM:     0/    8     0%
Info: 	     TRELLIS_ECLKBUF:     0/    8     0%
Info: 	        ECLKBRIDGECS:     0/    2     0%

With Ethernet (ntxslot = 1, nrxslot = 1)

$ ./colorlight_5a_75x.py --build --with-ethernet
Info: Device utilisation:
Info: 	       TRELLIS_SLICE:  5306/12144    43%
Info: 	          TRELLIS_IO:    66/  197    33%
Info: 	                DCCA:     3/   56     5%
Info: 	              DP16KD:    57/   56   101%
Info: 	          MULT18X18D:     4/   28    14%
Info: 	              ALU54B:     0/   14     0%
Info: 	             EHXPLLL:     1/    2    50%
Info: 	             EXTREFB:     0/    1     0%
Info: 	                DCUA:     0/    1     0%
Info: 	           PCSCLKDIV:     0/    2     0%
Info: 	             IOLOGIC:    15/  128    11%
Info: 	            SIOLOGIC:    44/   69    63%
Info: 	                 GSR:     0/    1     0%
Info: 	               JTAGG:     0/    1     0%
Info: 	                OSCG:     0/    1     0%
Info: 	               SEDGA:     0/    1     0%
Info: 	                 DTR:     0/    1     0%
Info: 	             USRMCLK:     0/    1     0%
Info: 	             CLKDIVF:     0/    4     0%
Info: 	           ECLKSYNCB:     0/   10     0%
Info: 	             DLLDELD:     0/    8     0%
Info: 	              DDRDLL:     0/    4     0%
Info: 	             DQSBUFM:     0/    8     0%
Info: 	     TRELLIS_ECLKBUF:     0/    8     0%
Info: 	        ECLKBRIDGECS:     0/    2     0%

With Ethernet (ntxslot = 0, nrxslot = 1)

$ ./colorlight_5a_75x.py --build --with-ethernet
Info: Device utilisation:
Info: 	       TRELLIS_SLICE:  5214/12144    42%
Info: 	          TRELLIS_IO:    66/  197    33%
Info: 	                DCCA:     3/   56     5%
Info: 	              DP16KD:    49/   56    87%
Info: 	          MULT18X18D:     4/   28    14%
Info: 	              ALU54B:     0/   14     0%
Info: 	             EHXPLLL:     1/    2    50%
Info: 	             EXTREFB:     0/    1     0%
Info: 	                DCUA:     0/    1     0%
Info: 	           PCSCLKDIV:     0/    2     0%
Info: 	             IOLOGIC:    15/  128    11%
Info: 	            SIOLOGIC:    44/   69    63%
Info: 	                 GSR:     0/    1     0%
Info: 	               JTAGG:     0/    1     0%
Info: 	                OSCG:     0/    1     0%
Info: 	               SEDGA:     0/    1     0%
Info: 	                 DTR:     0/    1     0%
Info: 	             USRMCLK:     0/    1     0%
Info: 	             CLKDIVF:     0/    4     0%
Info: 	           ECLKSYNCB:     0/   10     0%
Info: 	             DLLDELD:     0/    8     0%
Info: 	              DDRDLL:     0/    4     0%
Info: 	             DQSBUFM:     0/    8     0%
Info: 	     TRELLIS_ECLKBUF:     0/    8     0%
Info: 	        ECLKBRIDGECS:     0/    2     0%

With Ethernet (ntxslot = 4, nrxslot = 1)

$ ./colorlight_5a_75x.py --build --with-ethernet
Info: Device utilisation:
Info: 	       TRELLIS_SLICE:  5403/12144    44%
Info: 	          TRELLIS_IO:    66/  197    33%
Info: 	                DCCA:     3/   56     5%
Info: 	              DP16KD:    81/   56   144%
Info: 	          MULT18X18D:     4/   28    14%
Info: 	              ALU54B:     0/   14     0%
Info: 	             EHXPLLL:     1/    2    50%
Info: 	             EXTREFB:     0/    1     0%
Info: 	                DCUA:     0/    1     0%
Info: 	           PCSCLKDIV:     0/    2     0%
Info: 	             IOLOGIC:    15/  128    11%
Info: 	            SIOLOGIC:    44/   69    63%
Info: 	                 GSR:     0/    1     0%
Info: 	               JTAGG:     0/    1     0%
Info: 	                OSCG:     0/    1     0%
Info: 	               SEDGA:     0/    1     0%
Info: 	                 DTR:     0/    1     0%
Info: 	             USRMCLK:     0/    1     0%
Info: 	             CLKDIVF:     0/    4     0%
Info: 	           ECLKSYNCB:     0/   10     0%
Info: 	             DLLDELD:     0/    8     0%
Info: 	              DDRDLL:     0/    4     0%
Info: 	             DQSBUFM:     0/    8     0%
Info: 	     TRELLIS_ECLKBUF:     0/    8     0%
Info: 	        ECLKBRIDGECS:     0/    2     0%

It seems that each tx slot takes 8 DP16KD and each rx slot takes 4 DP16KD. (A naive calculation puts a tx slot at 16 kB, which is probably not how the block RAMs are actually used.)
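The per-slot figures can be cross-checked against the reports above. This is only a rough sketch: the DP16KD counts are copied from the logs, and it assumes that runs which do not state a slot count use 2 slots (an assumption inferred from the differences between runs, not something stated in this thread). The names `dp16kd_used` and `predict_dp16kd` are made up for illustration.

```python
# DP16KD counts from the nextpnr reports above, keyed by (ntxslot, nrxslot).
# Assumption: runs that do not state a slot count use 2 slots.
AVAILABLE_DP16KD = 56

dp16kd_used = {
    (2, 2): 69,  # "With Ethernet"
    (1, 2): 61,  # "ntxslot = 1"
    (1, 1): 57,  # "ntxslot = 1, nrxslot = 1"
    (0, 1): 49,  # "ntxslot = 0, nrxslot = 1"
    (4, 1): 81,  # "ntxslot = 4, nrxslot = 1"
}

# Cost of one tx slot: compare 4 tx slots against 0 tx slots at nrxslot = 1.
tx_cost = (dp16kd_used[(4, 1)] - dp16kd_used[(0, 1)]) // 4
# Cost of one rx slot: compare 2 rx slots against 1 rx slot at ntxslot = 1.
rx_cost = dp16kd_used[(1, 2)] - dp16kd_used[(1, 1)]
# Everything that is left once the slots are subtracted.
base = dp16kd_used[(0, 1)] - rx_cost


def predict_dp16kd(ntxslot, nrxslot):
    """Linear model: fixed usage plus a constant cost per slot."""
    return base + tx_cost * ntxslot + rx_cost * nrxslot


for (ntx, nrx), used in sorted(dp16kd_used.items()):
    predicted = predict_dp16kd(ntx, nrx)
    status = "fits" if used <= AVAILABLE_DP16KD else "overflows"
    print(f"ntxslot={ntx} nrxslot={nrx}: reported {used}, "
          f"predicted {predicted}, {status}")
```

Under these assumptions the linear model reproduces all five reported counts, with 8 DP16KD per tx slot and 4 per rx slot.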

In my (very naive) search for ntxslots memory usage I only found

mems[n] = Memory(dw, depth)
which should only need one blockram, so I'm not quite sure where the other ones come from.

@david-sawatzke
Contributor Author

david-sawatzke commented Aug 3, 2021

I've investigated this a bit more. The memory has a port with a write-enable (we) granularity of 8, which means that FullMemoryWE splits the buffer into 4 grains, i.e. 4 separate memory instances in the generated Verilog. Yosys probably had a regression and no longer catches this case, which would explain why it was fine before.
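To illustrate what splitting into grains means, here is a minimal plain-Python model. It does not use Migen and is not the FullMemoryWE implementation; the 32-bit width is inferred from "granularity 8, 4 grains", the depth of 512 is an arbitrary assumption, and all function names are hypothetical.

```python
def grain_ranges(data_width, we_granularity):
    """Bit ranges [lo, hi) that each end up in their own memory."""
    if data_width % we_granularity != 0:
        raise ValueError("data width must be a multiple of the granularity")
    return [(lo, lo + we_granularity)
            for lo in range(0, data_width, we_granularity)]


def write_word(grains, addr, value, we_mask, we_granularity=8):
    """Write only the lanes whose write-enable bit is set."""
    lane_mask = (1 << we_granularity) - 1
    for lane, mem in enumerate(grains):
        if (we_mask >> lane) & 1:
            mem[addr] = (value >> (lane * we_granularity)) & lane_mask


def read_word(grains, addr, we_granularity=8):
    """Reassemble a full word from the per-lane memories."""
    return sum(mem[addr] << (lane * we_granularity)
               for lane, mem in enumerate(grains))


ranges = grain_ranges(32, 8)              # assumed 32-bit wide buffer
grains = [[0] * 512 for _ in ranges]      # assumed depth of 512 words

write_word(grains, 0, 0xAABBCCDD, we_mask=0b1111)  # full-word write
write_word(grains, 0, 0x00000011, we_mask=0b0001)  # overwrite lowest byte only
print(len(grains), hex(read_word(grains, 0)))
```

Each grain is an independent memory with a plain (full-word) write enable, so a synthesis tool that maps every memory to at least one block RAM would need at least four blocks here instead of one.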

If I remove FullMemoryWE:

With Ethernet

Info: Device utilisation:
Info: 	       TRELLIS_SLICE:  5405/12144    44%
Info: 	          TRELLIS_IO:    66/  197    33%
Info: 	                DCCA:     3/   56     5%
Info: 	              DP16KD:    51/   56    91%
Info: 	          MULT18X18D:     4/   28    14%
Info: 	              ALU54B:     0/   14     0%
Info: 	             EHXPLLL:     1/    2    50%
Info: 	             EXTREFB:     0/    1     0%
Info: 	                DCUA:     0/    1     0%
Info: 	           PCSCLKDIV:     0/    2     0%
Info: 	             IOLOGIC:    15/  128    11%
Info: 	            SIOLOGIC:    44/   69    63%
Info: 	                 GSR:     0/    1     0%
Info: 	               JTAGG:     0/    1     0%
Info: 	                OSCG:     0/    1     0%
Info: 	               SEDGA:     0/    1     0%
Info: 	                 DTR:     0/    1     0%
Info: 	             USRMCLK:     0/    1     0%
Info: 	             CLKDIVF:     0/    4     0%
Info: 	           ECLKSYNCB:     0/   10     0%
Info: 	             DLLDELD:     0/    8     0%
Info: 	              DDRDLL:     0/    4     0%
Info: 	             DQSBUFM:     0/    8     0%
Info: 	     TRELLIS_ECLKBUF:     0/    8     0%
Info: 	        ECLKBRIDGECS:     0/    2     0%

EDIT:

Sorry, it turns out I had accidentally removed FullMemoryWE for these results. It's still broken; the comment has been adjusted accordingly.

I'm not sure what FullMemoryWE is good for. It was introduced with 6c3af74, so this is probably not a regression in Yosys.

@david-sawatzke david-sawatzke reopened this Aug 4, 2021
@david-sawatzke david-sawatzke changed the title Too much blockram usage on ecp5 (colorlight 5a-75b)/Excessive blockram usage FullMemoryWE leads to excessive blockram usage on ECP5 Aug 4, 2021
@rowanG077
Contributor

For reference, I don't have this problem, but I use the standalone core generator.

@mithro
Collaborator

mithro commented Aug 4, 2021

@umarcor - You were talking about this in reference to GHDL or something?

david-sawatzke added a commit to david-sawatzke/liteeth that referenced this issue Aug 8, 2021
On ecp5 `FullMemoryWE` leads to an increase of DP16KD block mem, while
it works better on Intel/Altera devices according to
6c3af74.

Simple solution: Allow it to be toggled
david-sawatzke added a commit to david-sawatzke/liteeth that referenced this issue Aug 8, 2021
On ecp5 `FullMemoryWE` leads to an increase of DP16KD block mem, while
it works better on Intel/Altera devices according to
6c3af74.

Simple solution: Make it configurable
david-sawatzke added a commit to david-sawatzke/litex-boards that referenced this issue Aug 8, 2021
Leads to an increase in DP16KD, first noticed in
enjoy-digital/liteeth#70.
With full_mem_we:
```
Info: 	              DP16KD:    41/   56    73%
```
Without:
```
Info: 	              DP16KD:    29/   56    51%
```
@Disasm

Disasm commented Aug 8, 2021

I found the commit: 6c3af74
Before: DP16KD: 51/ 56 91%
After: DP16KD: 69/ 56 123%

UPD: oops, it was already mentioned.

@david-sawatzke
Contributor Author

For some reason github didn't pick it up, but I opened a PR to make this configurable: #72
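As a rough sketch of what "configurable" means here, the transform is applied only when a flag is set. Everything below is a stand-in written for illustration: `Sram`, `SplitByWriteEnable`, `build_sram` and the `full_memory_we` flag name are hypothetical and are not taken from the PR, and the default value is an assumption.

```python
class Sram:
    """Stand-in for the MAC slot memory; only tracks how many memories exist."""

    def __init__(self, data_width=32):
        self.data_width = data_width
        self.memories = 1


class SplitByWriteEnable:
    """Stand-in for a FullMemoryWE-like transform: one memory per grain."""

    def __init__(self, we_granularity=8):
        self.we_granularity = we_granularity

    def __call__(self, sram):
        sram.memories *= sram.data_width // self.we_granularity
        return sram


def build_sram(full_memory_we=True):
    """Apply the split only when requested (default value is an assumption)."""
    sram = Sram()
    if full_memory_we:
        sram = SplitByWriteEnable()(sram)
    return sram


print(build_sram(full_memory_we=True).memories)   # split into grains
print(build_sram(full_memory_we=False).memories)  # single memory kept
```

A target such as the ECP5 board discussed here could then pass `False`, while targets that benefit from the split keep it enabled.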

@umarcor

umarcor commented Aug 9, 2021

> @umarcor - You were talking about this in reference to GHDL or something?

Yes. @stnolting wants to use "tool agnostic VHDL" in NEORV32, so we did several tests for inferring memories with GHDL and Yosys. There were two main issues: global enable and byte granularity.

Ref: antonblanchard/microwatt#294 (comment)

enjoy-digital added a commit that referenced this issue Aug 11, 2021
mac: Allow configuring usage of FullMemoryWE (fixes #70)
AEW2015 pushed a commit to AEW2015/litex-boards that referenced this issue Jun 2, 2022
Leads to an increase in DP16KD, first noticed in
enjoy-digital/liteeth#70.
With full_mem_we:
```
Info: 	              DP16KD:    41/   56    73%
```
Without:
```
Info: 	              DP16KD:    29/   56    51%
```
david-sawatzke added a commit to david-sawatzke/hub75_colorlight75_stuff that referenced this issue Dec 26, 2024
Also reduces memory needed by bios & reduces tx/rxslots, since (somehow)
the current version with the current toolchain needs more blockram than
is available.

Opened an issue
[here](enjoy-digital/liteeth#70), since this
also affects the example.