Skip to content

Latest commit

 

History

History
298 lines (268 loc) · 20.6 KB

README.md

File metadata and controls

298 lines (268 loc) · 20.6 KB

MacroPlacement

MacroPlacement is an open, transparent effort to provide a public, baseline implementation of Google Brain's Circuit Training (Morpheus) deep RL-based placement method. We will provide (1) testcases in open enablements, along with multiple EDA tool flows; (2) implementations of missing or binarized elements of Circuit Training; (3) reproducible example macro placement solutions produced by our implementation; and (4) post-routing results obtained by full completion of the synthesis-place-and-route flow using both proprietary and open-source tools.

Our Latest Progress

Table of Contents

  • Testcases contains open-source designs such as Ariane, MemPool and NVDLA.
  • Enablements contains PDKs for open-source enablements such as NanGate45, ASAP7 and SKY130HD with FakeStack. Memories required by the designs are also included.
  • Flows contains tool setups and runscripts for both proprietary and open-source SP&R tools such as Cadence Genus/Innovus and OpenROAD.
  • Code Elements contains implementation of engines such as Clustering, Grouping, Gridding as well as Format translators required by Circuit Training flow.
  • Baseline for Circuit Training provides a baseline for Google Brain's Circuit Training.
  • FAQ
  • Related Links

Testcases

The list of available testcases is as follows.

In the Nature Paper, the authors report results for an Ariane design with 133 memory (256x16, single ported SRAM) macros. We observe that synthesizing from the available Ariane RTL in the lowRISC GitHub repository using 256x16 memories results in an Ariane design that has 136 memory macros. We outline the steps to instantiate the memories for Ariane 136 here and we show how we convert the Ariane 136 design to an Ariane 133 design that matches Google's memory macros count here.

We provide flop count, macro type and macro count for all the testcases in the the following table.

Testcase Flop Count Macro Details (macro type x macro count)
Ariane136 19839 (256x16-bit SRAM) x 136
Ariane133 19807 (256x16-bit SRAM) x 133
MemPool tile 18278 (256x32-bit SRAM) x 16 + (64x64-bit SRAM) x 4
MemPool group 360724 (256x32-bit SRAM) x 256 + (64x64-bit SRAM) x 64 + (128x256-bit SRAM) x 2 + (128x32-bit SRAM) x 2
NVDLA 45295 (256x64-bit SRAM) x 128
BlackParrot 214441 (512x64-bit SRAM) x 128 + (64x62-bit SRAM) x 32 + (32x32-bit SRAM) x 32 + (64x124-bit SRAM) x 16 + (128x16-bit SRAM) x 8 + (256x48-bit SRAM) x 4

All the testcases are available in the Testcases directory. Details of the sub-directories are

  • rtl: directory contains all the required rtl files to synthesize the design.
  • sv2v: If the main repository contains multiple Verilog files or SystemVerilog files, then we convert it to a single synthesizable Verilog RTL. This is available in the sv2v sub-drectory.

Enablements

The list of available enablements is as follows.

Open-source enablements NanGate45, ASAP7 and SKY130HD are utilized in our SP&R flow. All the enablements are available under the Enablements directory. Details of the sub-directories are:

  • lib directory contains all the required liberty files for standard cells and hard macros.
  • lef directory contains all the required lef files.
  • qrc directory contains all the required qrc tech files.

We also provide the steps to generate the fakeram models for each of the enablements based on the required memory configurations.

Flows

We provide multiple flows for each of the testcases and enablements. They are: (1) a logical synthesis-based SP&R flow using Cadence Genus and Innovus (Flow-1), (2) a physical synthesis-based SP&R flow using Cadence Genus iSpatial and Innovus (Flow-2), (3) a logical synthesis-based SP&R flow using Yosys and OpenROAD (Flow-3), and (4) creation of input data for Physical synthesis-based Circuit Training using Genus iSpatial (Flow-4).

The details of each flow are are given in the following.

  • Flow-1:
    Flow-1
  • Flow-2:
    Flow-2
  • Flow-3:
    Flow-3
  • Flow-4:
    Flow-4

In the following table, we provide the status details of each testcase on each of the enablements for the different flows.

Test Cases Nangate45 ASAP7 SKY130HD FakeStack
Flow-1 Flow-2 Flow-3 Flow-4 Flow-1 Flow-2 Flow-3 Flow-4 Flow-1 Flow-2 Flow-3 Flow-4
Ariane 136 Link Link Link Link Link Link N/A Link Link Link Link Link
Ariane 133 Link Link Link Link Link Link N/A Link Link Link Link Link
MemPool tile Link Link Link Link Link Link N/A Link Link Link Link Link
MemPool group Link Link N/A Link N/A N/A N/A N/A N/A N/A N/A N/A
NVDLA Link Link N/A Link Link Link N/A Link Link Link N/A Link
BlackParrot Link Link N/A Link N/A N/A N/A N/A N/A N/A N/A N/A

The directory structure is : ./Flows/<enablement>/<testcase>/<constraint|def|netlist|scripts|run>/. Details of the sub-directories for each testcase on each enablement are as follows.

  • constraint directory contains the .sdc file.
  • def directory contains the def file with pin placement and die area information.
  • scripts directory contains required scripts to run SP&R using the Cadence and OpenROAD tools.
  • netlist directory contains the synthesized netlist. We provide a synthesized netlist that can be used to run P&R.
  • run directory to run the scripts provided in the scripts directory.

Code Elements

The code elements below are the most crucial undocumented portions of Circuit Training. We thank Google engineers for Q&A in a shared document, as well as live discussions on May 19, 2022, that have explained aspects of several of the following code elements used in Circuit Training. All errors of understanding and implementation are the authors'. We will rectify such errors as soon as possible after being made aware of them.

  • Gridding determines a dissection of the layout canvas into some number of rows (n_rows) and some number of columns (n_cols) of gridcells. In Circuit Training, the purpose of gridding is to control the size of the macro placement solution space, thus allowing RL to train within reasonable runtimes. Gridding enables hard macros to find locations consistent with high solution quality, while allowing soft macros (standard-cell clusters) to also find good locations.
  • Grouping ensures that closely-related logic is kept close to hard macros and to clumps of IOs. The clumps of IOs are induced by IO locations with respect to the row and column coordinates in the gridded layout canvas.
  • Hypergraph clustering clusters millions of standard cells into a few thousand clusters. In Circuit Training, the purpose of clustering is to enable an approximate but fast standard cell placement that facilitates policy network optimization.
  • Force-directed placement places the center of each standard cell cluster onto centers of gridcells generated by Gridding.
  • Simulated annealing places the center of each macro onto centers of gridcells generated by Gridding. In Circuit Training, simulated annealing is used as a baseline to show the relative sample efficiency of RL.
  • LEF/DEF and Bookshelf (OpenDB, RosettaStone) translators ease the translation between different representations of the same netlist.
  • Plc client implements all three components of the proxy cost function: wirelength cost, density cost and congestion cost.

A Human Baseline for Circuit Training

We provide a human-generated baseline for Google Brain's Circuit Training by placing macros manually following similar (grid-restricted location) rules as the RL agent. The example for Ariane133 implemented on NanGate45 is shown here. We generate the manual macro placement in two steps:
(1) we call the gridding scripts to generate grid cells (27 x 27 in our case); (2) we manually place macros on the centers of grid cells.

FAQ

Why are you doing this?

  • The challenges of data and benchmarking in EDA research have, in our view, been contributing factors in the controversy regarding the Nature work. The mission of the TILOS AI Institute includes finding solutions to these challenges -- in high-stakes applied optimization domains (such as IC EDA), and at community-scale. We hope that our effort will become an existence proof for transparency, reproducibility, and democratization of research in EDA. [We applaud and thank Cadence Design Systems for allowing their tool runscripts to be shared openly by researchers, enabling reproducibility of results obtained via use of Cadence tools.]
  • We do understand that Google has been working hard to complete the open-sourcing of Morpheus, and that this effort continues today. However, as pointed out in this Doc, updated here, it has been more than a year since "Data and Code Availability" was committed with publication of the Nature paper. We consider our work a "backstop" or "safety net" for Google's internal efforts, and a platform for researchers to build on.

What can others contribute?

  • Our shopping list (updated August 2022) includes the following. Please join in!
    • simulated annealing on the gridded canvas: documentation and implementation
    • force-directed placement: documentation and implementation
    • donated cloud resources (credits) for experimental studies
    • relevant testcases with reference implementations and implementation flows (Cadence, OpenROAD preferred since scripts can be shared)
    • improved "fakeram" generator for the ASAP7 research PDK

What is your timeline?

  • We showed our progress at the Open-Source EDA and Benchmarking Summit birds-of-a-feather meeting on July 12 at DAC-2022.
  • We are now (late August 2022) studying benefits and limitations of the CT methodology itself, following a thread of experimental questions as noted here and here.

Related Links

  • F. -C. Chang, Y. -W. Tseng, Y. -W. Yu, S. -R. Lee, A. Cioba, et al., "Flexible multiple-objective reinforcement learning for chip placement", arXiv:2204.06407, 2022. [paper]
  • S. Yue, E. M. Songhori, J. W. Jiang, T. Boyd, A. Goldie, A. Mirhoseini and S. Guadarrama, "Scalability and Generalization of Circuit Training for Chip Floorplanning", ISPD, 2022. [paper][ppt]
  • R. Cheng and J. Yan, "On joint learning for solving placement and routing in chip design", Proc. NeurIPS, 2021. [paper] [code]
  • S. Guadarrama, S. Yue, T. Boyd, J. Jiang, E. Songhori, et al., "Circuit training: an open-source framework for generating chip floor plans with distributed deep reinforcement learning", 2021. [code]
  • A. Mirhoseini, A. Goldie, M. Yazgan, J. Jiang, E. Songhori, et al., "A graph placement methodology for fast chip design", Nature, 594(7862) (2021), pp. 207-212. [paper]
  • A. Mirhoseini, A. Goldie, M. Yazgan, J. Jiang, E. Songhori, et al., "Chip Placement with Deep Reinforcement Learning", arXiv:2004.10746, 2020. [paper]
  • Z. Jiang, E. Songhori, S. Wang, A. Goldie, A. Mirhoseini, et al., "Delving into Macro Placement with Reinforcement Learning", MLCAD, 2021. [paper]
  • A Gentle Introduction to Graph Neural Networks. [Link]
  • TILOS AI Institute. [link]