Tracy profiler on BH not working #17099

Open
mywoodstock opened this issue Jan 24, 2025 · 2 comments

@mywoodstock
Contributor

On Blackhole (BH), we initially get a kernel compile error, but with the following patch it compiles and runs:

diff --git a/tt_metal/hw/inc/blackhole/dev_mem_map.h b/tt_metal/hw/inc/blackhole/dev_mem_map.h
index 075edd005c..b97e3c5601 100644
--- a/tt_metal/hw/inc/blackhole/dev_mem_map.h
+++ b/tt_metal/hw/inc/blackhole/dev_mem_map.h
@@ -48,7 +48,7 @@

 /////////////
 // Firmware/kernel code holes
-#define MEM_BRISC_FIRMWARE_SIZE (5 * 1024 + 128)
+#define MEM_BRISC_FIRMWARE_SIZE (5 * 1024 + 256)
 // TODO: perhaps put NCRISC FW in the scratch area and free 1.5K after init (GS/WH)
 #define MEM_NCRISC_FIRMWARE_SIZE 1536
 #define MEM_TRISC0_FIRMWARE_SIZE 1536
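
For reference, the arithmetic behind the patch can be checked with a short sketch. The two size expressions are copied from the diff; the interpretation that the profiler-enabled BRISC firmware no longer fit in the old region is an inference from the compile error, not something stated in the thread, and `hypothetical_image_size` is a made-up value used only for illustration.

```python
# Sizes in bytes, copied from the MEM_BRISC_FIRMWARE_SIZE lines in the diff.
OLD_BRISC_FW_SIZE = 5 * 1024 + 128
NEW_BRISC_FW_SIZE = 5 * 1024 + 256


def fits(image_size: int, region_size: int) -> bool:
    """Return True if an image of image_size bytes fits in region_size bytes."""
    return image_size <= region_size


growth = NEW_BRISC_FW_SIZE - OLD_BRISC_FW_SIZE
print(OLD_BRISC_FW_SIZE, NEW_BRISC_FW_SIZE, growth)  # 5248 5376 128

# Hypothetical firmware image size (NOT measured): anything between the two
# limits would fail to fit before the patch and fit after it.
hypothetical_image_size = 5300
print(fits(hypothetical_image_size, OLD_BRISC_FW_SIZE))  # False
print(fits(hypothetical_image_size, NEW_BRISC_FW_SIZE))  # True
```

So the patch grows the BRISC firmware region by 128 bytes, from 5248 to 5376.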

But when trying to profile ResNet50, the run with the profiler hangs (it passes fine without the profiler).

Branch: asarje/bh-rn50-20250123
Compile with profiler: ./build_metal.sh -p --debug
Run the model with profiler: python -m tracy -p -r -v -m pytest "\"tests/ttnn/integration_tests/resnet/test_ttnn_functional_resnet50.py::test_resnet_50[pretrained_weight_false-batch_size=16-act_dtype=DataType.BFLOAT8_B-weight_dtype=DataType.BFLOAT8_B-math_fidelity=MathFidelity.LoFi-device_params={'l1_small_size': 24576}]\""

The run will hang.
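
Because the failure mode is a hang rather than an error, it can help to wrap the reproduction in a timeout so that it is reported instead of blocking forever. The following is a minimal sketch using only the Python standard library; `run_with_timeout` and the status strings are hypothetical names, and the sleeping child process is a stand-in for the real `python -m tracy ...` command above.

```python
import os
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, extra_env=None):
    """Run cmd and classify the outcome as 'passed', 'failed', or 'hung'."""
    env = dict(os.environ)
    env.update(extra_env or {})
    try:
        # subprocess.run kills the child process if the timeout expires.
        result = subprocess.run(cmd, env=env, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hung"
    return "passed" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # Stand-in for the profiler run: a child that sleeps past the timeout.
    stand_in = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(stand_in, timeout_s=0.5))
```

Note that the timeout only kills the direct child; if the real command spawns further processes, those may need to be cleaned up separately, and a hung device may still need a reset.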

@mo-tenstorrent
Contributor

As per a conversation with Paul, the first thing to check is running with the L1 data cache disabled entirely; the cache is set in risc_common.

@mo-tenstorrent
Contributor

mo-tenstorrent commented Jan 25, 2025

Confirming that disabling the L1 data cache fixes the issue.

Hanging run: https://github.com/tenstorrent/tt-metal/actions/runs/12959399845

Passing run: https://github.com/tenstorrent/tt-metal/actions/runs/12966853376/job/36168151536#step:9:407

TT_METAL_DISABLE_L1_DATA_CACHE_RISCVS="BR,NC,TR" python -m tracy -p -r -v -m pytest "\"tests/ttnn/integration_tests/resnet/test_ttnn_functional_resnet50.py::test_resnet_50[pretrained_weight_false-batch_size=16-act_dtype=DataType.BFLOAT8_B-weight_dtype=DataType.BFLOAT8_B-math_fidelity=MathFidelity.LoFi-device_params={'l1_small_size': 24576}]\""
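
To script the same workaround, for example to compare runs with and without the cache, the environment can be built programmatically. This is a minimal sketch: the variable name and the value `BR,NC,TR` are taken from the command above, while `env_with_l1_cache_disabled` is a hypothetical helper. Whether other subsets of cores are accepted is not confirmed anywhere in this thread.

```python
import os

# Environment variable name as used in the passing run above.
ENV_VAR = "TT_METAL_DISABLE_L1_DATA_CACHE_RISCVS"


def env_with_l1_cache_disabled(riscvs=("BR", "NC", "TR"), base_env=None):
    """Return a copy of the environment with the L1 data cache disabled.

    The default matches the value used in the thread; other combinations
    are an untested assumption.
    """
    env = dict(os.environ if base_env is None else base_env)
    env[ENV_VAR] = ",".join(riscvs)
    return env


env = env_with_l1_cache_disabled(base_env={})
print(env[ENV_VAR])  # BR,NC,TR
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching the profiler command.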
