Support XiP from PSRAM #2083

Open
Dominaezzz opened this issue Sep 4, 2024 · 7 comments
Labels
peripheral:psram PSRAM peripheral

Comments

@Dominaezzz
Collaborator

https://docs.espressif.com/projects/esp-idf/en/stable/esp32s3/api-guides/external-ram.html#execute-in-place-xip-from-psram

CONFIG_SPIRAM_FETCH_INSTRUCTIONS and CONFIG_SPIRAM_RODATA are the specific esp-idf configs.

In short, this feature lets you copy the application's code and read-only data from flash to PSRAM at startup, which means flash is no longer needed for the rest of your application.

This is important for #2081, where you want to store huge buffers in PSRAM and DMA them to a peripheral, but you don't want flash access temporarily disabling PSRAM access. That starves the DMA of data, and the peripheral gets garbage when it inevitably runs too far ahead.

Somewhat related: #1083.

@bjoernQ
Contributor

bjoernQ commented Sep 17, 2024

Turns out that, at least on ESP32-S3, we can execute code from PSRAM right after initializing PSRAM as we currently do. It's just that it's not writable via ibus (but we can write via dbus, and then need to make sure to synchronize the caches).

I think there are two ways

@yanshay
Contributor

yanshay commented Dec 23, 2024

I'd like to place my vote in favor of implementing this issue and explain the rationale.

If I understand this right, this feature is practically required for applications that need to drive large RGB displays (where the frame buffer is in PSRAM) while at the same time using PSRAM intensively for the application itself.

For complex applications with a user interface, WiFi, SSL, async, etc., the limited amount of RAM on the ESP32 is quickly exhausted and PSRAM is required for memory allocations.
And for large displays (over 3.5"), RGB is mandatory and requires PSRAM for buffering.

There are many applications of this sort out there, and currently they are all written in C++ and out of the reach of Rust developers.

Therefore, to enter the space of UI-based applications, this feature is a must. I think that would be an effort wisely invested.

And to sum up:
"That's one small step for esp-hal, one giant leap for rustaceans."

@bjoernQ
Contributor

bjoernQ commented Jan 16, 2025

The simplest thing to try I can think of would be:

  • map PSRAM as we already do; flash is still mapped
  • just memcpy everything from the mapped flash to PSRAM
  • map PSRAM to the origin of external memory, invalidate caches

I think ESP-IDF reads flash and writes PSRAM directly via SPI, but just doing a memcpy between the mapped flash and the mapped PSRAM should be easier (no code to port from ESP-IDF).
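The three steps above can be sketched as a host-side model, with plain byte slices standing in for flash and PSRAM and a slice of `u32` standing in for the MMU table. This is not hardware code: the `MMU_INVALID` and `MMU_ACCESS_SPIRAM` values are the ones used in esp-hal's `init_psram`, while `PAGE_MASK`, the function name, and the assumption that flash entries carry no access bit are illustrative. On a real chip the caches would also need invalidating after the remap, and the copy loop must not run from a page that is being remapped.

```rust
const PAGE_SIZE: usize = 0x10000; // 64 KiB MMU page
const MMU_INVALID: u32 = 1 << 14; // entry maps nothing
const MMU_ACCESS_SPIRAM: u32 = 1 << 15; // entry targets PSRAM
const PAGE_MASK: u32 = 0x3FFF; // assumption: low bits hold the page number

/// Copies every flash page referenced by `table` into consecutive PSRAM
/// pages and rewrites each entry to point at its copy.
/// Returns the number of PSRAM pages consumed.
/// Panics if `flash` or `psram` is too small for the pages involved.
fn copy_flash_to_psram(table: &mut [u32], flash: &[u8], psram: &mut [u8]) -> usize {
    let mut next_page = 0usize;
    for entry in table.iter_mut() {
        // Skip unmapped entries and entries that already point at PSRAM.
        if *entry & (MMU_INVALID | MMU_ACCESS_SPIRAM) != 0 {
            continue;
        }
        let src = (*entry & PAGE_MASK) as usize * PAGE_SIZE;
        let dst = next_page * PAGE_SIZE;
        // Step 2: plain memcpy from the flash page to the PSRAM page.
        psram[dst..dst + PAGE_SIZE].copy_from_slice(&flash[src..src + PAGE_SIZE]);
        // Step 3: point the same virtual page at the PSRAM copy.
        *entry = MMU_ACCESS_SPIRAM | next_page as u32;
        next_page += 1;
    }
    next_page
}
```

The return value is the number of PSRAM pages that are no longer available to the heap, which matters when deciding where the remaining PSRAM range starts.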

I haven't tried that (since then we would already have it :) ) but I guess it could work this way. All necessary changes would be in

pub(crate) fn init_psram(config: PsramConfig) {
    let mut config = config;
    utils::psram_init(&mut config);

    const CONFIG_ESP32S3_INSTRUCTION_CACHE_SIZE: u32 = 0x4000;
    const CONFIG_ESP32S3_ICACHE_ASSOCIATED_WAYS: u8 = 8;
    const CONFIG_ESP32S3_INSTRUCTION_CACHE_LINE_SIZE: u8 = 32;
    const CONFIG_ESP32S3_DATA_CACHE_SIZE: u32 = 0x8000;
    const CONFIG_ESP32S3_DCACHE_ASSOCIATED_WAYS: u8 = 8;
    const CONFIG_ESP32S3_DATA_CACHE_LINE_SIZE: u8 = 32;
    const MMU_ACCESS_SPIRAM: u32 = 1 << 15;
    const START_PAGE: u32 = 0;

    extern "C" {
        fn rom_config_instruction_cache_mode(
            cfg_cache_size: u32,
            cfg_cache_ways: u8,
            cfg_cache_line_size: u8,
        );
        fn Cache_Suspend_DCache();
        fn rom_config_data_cache_mode(
            cfg_cache_size: u32,
            cfg_cache_ways: u8,
            cfg_cache_line_size: u8,
        );
        fn Cache_Resume_DCache(param: u32);
        /// Set DCache mmu mapping.
        ///
        /// [`ext_ram`]: u32 DPORT_MMU_ACCESS_FLASH for flash, DPORT_MMU_ACCESS_SPIRAM for spiram, DPORT_MMU_INVALID for invalid.
        /// [`vaddr`]: u32 Virtual address in CPU address space.
        /// [`paddr`]: u32 Physical address in external memory. Should be aligned by psize.
        /// [`psize`]: u32 Page size of DCache, in kilobytes. Should be 64 here.
        /// [`num`]: u32 Pages to be set.
        /// [`fixed`]: u32 0 for physical pages grow with virtual pages, other for virtual pages map to same physical page.
        fn cache_dbus_mmu_set(
            ext_ram: u32,
            vaddr: u32,
            paddr: u32,
            psize: u32,
            num: u32,
            fixed: u32,
        ) -> i32;
    }

    let start = unsafe {
        const MMU_PAGE_SIZE: u32 = 0x10000;
        const ICACHE_MMU_SIZE: usize = 0x800;
        const FLASH_MMU_TABLE_SIZE: usize = ICACHE_MMU_SIZE / core::mem::size_of::<u32>();
        const MMU_INVALID: u32 = 1 << 14;
        const DR_REG_MMU_TABLE: u32 = 0x600C5000;

        // calculate the PSRAM start address to map
        // the linker scripts can produce a gap between mapped IROM and DROM segments
        // bigger than a flash page - i.e. we will see an unmapped memory slot
        // start from the end and find the last mapped flash page
        //
        // More general information about the MMU can be found here:
        // https://docs.espressif.com/projects/esp-idf/en/stable/esp32s3/api-reference/system/mm.html#introduction
        let mmu_table_ptr = DR_REG_MMU_TABLE as *const u32;
        let mut mapped_pages = 0;
        for i in (0..FLASH_MMU_TABLE_SIZE).rev() {
            if mmu_table_ptr.add(i).read_volatile() != MMU_INVALID {
                mapped_pages = (i + 1) as u32;
                break;
            }
        }
        let start = EXTMEM_ORIGIN + (MMU_PAGE_SIZE * mapped_pages);
        debug!("PSRAM start address = {:x}", start);

        // Configure the mode of instruction cache : cache size, cache line size.
        rom_config_instruction_cache_mode(
            CONFIG_ESP32S3_INSTRUCTION_CACHE_SIZE,
            CONFIG_ESP32S3_ICACHE_ASSOCIATED_WAYS,
            CONFIG_ESP32S3_INSTRUCTION_CACHE_LINE_SIZE,
        );

        // If we need use SPIRAM, we should use data cache. Configure the mode of data :
        // cache size, cache line size.
        Cache_Suspend_DCache();
        rom_config_data_cache_mode(
            CONFIG_ESP32S3_DATA_CACHE_SIZE,
            CONFIG_ESP32S3_DCACHE_ASSOCIATED_WAYS,
            CONFIG_ESP32S3_DATA_CACHE_LINE_SIZE,
        );

        if cache_dbus_mmu_set(
            MMU_ACCESS_SPIRAM,
            start,
            START_PAGE << 16,
            64,
            config.size.get() as u32 / 1024 / 64, // number of pages to map
            0,
        ) != 0
        {
            panic!("cache_dbus_mmu_set failed");
        }

        let extmem = &*esp32s3::EXTMEM::PTR;
        extmem.dcache_ctrl1().modify(|_, w| {
            w.dcache_shut_core0_bus()
                .clear_bit()
                .dcache_shut_core1_bus()
                .clear_bit()
        });

        Cache_Resume_DCache(0);

        start
    };

    unsafe {
        crate::soc::MAPPED_PSRAM.memory_range = start as usize..start as usize + config.size.get();
    }
}
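The address arithmetic in `init_psram` can be isolated into pure functions that run on a host, which makes the "scan from the end for the last mapped page" rule easier to check. This is a sketch rather than esp-hal code: the function names are made up for illustration, the MMU table is passed in as a slice instead of being read from `DR_REG_MMU_TABLE`, and `EXTMEM_ORIGIN = 0x3C00_0000` is an assumed value for the ESP32-S3 external-memory window.

```rust
const MMU_PAGE_SIZE: u32 = 0x10000; // 64 KiB, as in init_psram
const MMU_INVALID: u32 = 1 << 14; // value of an unmapped entry, as in init_psram
const EXTMEM_ORIGIN: u32 = 0x3C00_0000; // assumption: ESP32-S3 external memory base

/// Returns the first virtual address after the last mapped MMU entry.
/// Gaps between mapped entries are skipped over, never reused.
fn psram_start(mmu_table: &[u32]) -> u32 {
    let mapped_pages = mmu_table
        .iter()
        .rposition(|&entry| entry != MMU_INVALID)
        .map_or(0, |last| last as u32 + 1);
    EXTMEM_ORIGIN + MMU_PAGE_SIZE * mapped_pages
}

/// Number of 64 KiB pages needed to map `size_bytes` of PSRAM,
/// using the same division as the `cache_dbus_mmu_set` call.
fn psram_pages(size_bytes: usize) -> u32 {
    (size_bytes / 1024 / 64) as u32
}
```

With entries 0, 1 and 4 mapped, the gap at entries 2 and 3 is left unused and PSRAM starts at page 5. An 8 MiB PSRAM needs 128 pages.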

@EliteTK
Contributor

EliteTK commented Jan 19, 2025

The recent 38C3 talk on liberating the WiFi on the ESP32 mentioned that reverse-engineering the ROM is allowed by Espressif. So, after a bit of reverse-engineering work and studying the IDF sources, I've gotten XiP from PSRAM working on my ESP32-S3.

https://gist.github.com/EliteTK/5a409431082b4a4c34bb560243f2cf61

The code is ported from the equivalent ESP-IDF code with any excess fluff removed. The comments contain the reverse-engineered, cleaned-up, mostly valid C which represents what the Cache_ functions that get called are actually doing.

I haven't actually tested this with a display yet, as I never tried writing the DMA-from-PSRAM code mentioned in the related issues: I knew (from experience with esp-idf-hal, before I enabled XiP from PSRAM) that it would be glitchy and unusable. But I hope to do that tomorrow or next week.

I think it would make sense to put this in init_psram, with additional PsramConfig fields for specifying that you want this feature. Putting the .text and .rodata in PSRAM reduces the amount of remaining available PSRAM, which is another reason why I think it makes sense to do this there. I plan on putting together a draft PR to this effect.

The code for doing this for the ESP32-S2 and ESP32-S3 is very similar. There's a v2 file in the IDF which handles ESP32-P4. I don't actually know enough about ESP32 to know if that's the full set of SoCs which support this feature.

Lastly, I have minimal experience with the ESP32 and esp-hal, limited experience with embedded Rust, limited experience with unsafe Rust, and only about a year of proper Rust experience, so please do comment on the gist about anything you think looks odd or could be improved. I'll happily incorporate it as I keep working on this.

@bjoernQ
Contributor

bjoernQ commented Jan 20, 2025

Wow - didn't notice there is a Cache_Flash_To_SPIRAM_Copy ROM function - nice

@EliteTK
Contributor

EliteTK commented Jan 20, 2025

> Wow - didn't notice there is a Cache_Flash_To_SPIRAM_Copy ROM function - nice

Funny thing about it: It has a bug.

For some reason, mappings to page 0 in flash (and only page 0) all get coalesced when doing the copy to PSRAM, so that once everything is copied, anything which previously mapped to page 0 in flash should map to a single page in PSRAM. (Someone who knows more about the ESP32 or the IDF might know why specifically the zero page; I would love to know.) But the function handles this incorrectly: the zero-page address is one higher than it should be when it's assigned to the in-out parameter which holds this information. This means that the first zero-page mapping will point at the copied zero page in PSRAM, but any subsequent zero-page mapping will point to the next PSRAM page after the copied zero page.

I verified this by setting up the scenario and then calling the ROM function, just in case my reading of the disassembly and Ghidra's decompilation were both wildly wrong.

I could be wrong and this could be intentional (I don't see how), but I think for the esp-hal implementation I'll just re-create the function in Rust, without the bug.
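A minimal host-side model of the zero-page bookkeeping described above, written the way the corrected behavior is described rather than derived from the ROM. The function name, the `Option<u32>` in-out parameter, and `PAGE_MASK` are illustrative assumptions. The flag values match esp-hal's `init_psram`. The function only decides which PSRAM page each entry should use and does not copy any data.

```rust
const MMU_INVALID: u32 = 1 << 14;
const MMU_ACCESS_SPIRAM: u32 = 1 << 15;
const PAGE_MASK: u32 = 0x3FFF; // assumption: low bits hold the page number

/// Picks a PSRAM page for every flash mapping in `entries`.
/// `next_page` and `zero_page` are in-out so that the bookkeeping carries
/// over between calls (for example one call per bus region).
fn assign_psram_pages(
    entries: &[u32],
    next_page: &mut u32,
    zero_page: &mut Option<u32>,
) -> Vec<Option<u32>> {
    entries
        .iter()
        .map(|&entry| {
            if entry & (MMU_INVALID | MMU_ACCESS_SPIRAM) != 0 {
                return None; // nothing to copy for this entry
            }
            let is_zero = entry & PAGE_MASK == 0;
            if is_zero {
                if let Some(page) = *zero_page {
                    return Some(page); // reuse the one copied zero page
                }
            }
            let page = *next_page;
            *next_page += 1;
            if is_zero {
                // Remember the page that was actually used. Storing
                // `page + 1` here would reproduce the behavior described
                // for the ROM function.
                *zero_page = Some(page);
            }
            Some(page)
        })
        .collect()
}
```

Calling it once per region with shared `next_page` and `zero_page` makes every flash page-0 mapping land on the same PSRAM page, whichever region it appears in.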

There's also the question of why the Cache_Flash_To_SPIRAM_Copy function is only used to copy half of the mapped memory (0-127 when calling it with CACHE_IBUS, and 128-255 when calling it with CACHE_DBUS). Presumably another artifact of the ESP-IDF that I don't understand.

It uses a spare MMU entry (511, the last one) to perform the copy, by mapping successive PSRAM pages to that memory location and then memcpying the data. There is no problem while the copy only applies to the first 256 mapped entries. But if for some reason the entire MMU table was filled with flash mappings, and the weird "only the first 256 entries" restriction was relaxed to cover the entire table, the last mapping would never get copied to PSRAM. I guess at that point, since your MMU table was full, you would have no reason to bother with XiP from PSRAM, so maybe it's fine to error out in those cases. Really, even if the page isn't copied, there's no guarantee (however unlikely) that the code isn't currently executing from that memory area, which means that when the page is temporarily remapped, the code would crash. So guarding against this seems sensible anyway; properly fixing it would require running the setup code from RAM.

Lastly, I also found a bug in the ESP-IDF's workaround for a known ROM bug in the Cache_Count_Flash_Pages function. The function needs to account for the extra zero-page mapping the first time it counts a zero page and then never again, hence the in-out parameter. However, this incorrectly counts a zero page if the zero-page count was zero, even if the new zero-page count is still zero (meaning that there is no zero page that needs accounting for yet). This means that if you have no zero pages mapped in either of the regions of the MMU table, you will end up with an extra page for each call to the count function. Anyway, long story short, the patched version will over-compensate and under-report the page count in some scenarios.
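The "error out" guard suggested above could look roughly like this. It is a sketch with made-up names (`XipError`, `check_scratch_entry`). The table size of 512 entries follows from `ICACHE_MMU_SIZE / 4` in esp-hal's `init_psram`, and treating the `MMU_INVALID` bit as "entry unused" is an assumption.

```rust
const MMU_ENTRIES: usize = 512; // 0x800 bytes of table, 4 bytes per entry
const SCRATCH_ENTRY: usize = MMU_ENTRIES - 1; // entry 511 is used as the copy window
const MMU_INVALID: u32 = 1 << 14;

#[derive(Debug, PartialEq)]
enum XipError {
    /// The last MMU entry already maps something, so it cannot be
    /// borrowed as a temporary window for the copy.
    ScratchEntryInUse(u32),
}

/// Refuses to start the flash-to-PSRAM copy when the scratch entry is live.
fn check_scratch_entry(table: &[u32; MMU_ENTRIES]) -> Result<(), XipError> {
    let entry = table[SCRATCH_ENTRY];
    if entry & MMU_INVALID == 0 {
        Err(XipError::ScratchEntryInUse(entry))
    } else {
        Ok(())
    }
}
```

Checking this before touching the table means a full MMU table fails cleanly instead of remapping a page that code might currently be executing from.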

So, all things considered, I think it would be clearer to just replicate the relatively simple Cache_Count_Flash_Pages in Rust for the esp-hal version.
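A possible shape for such a replacement, as a host-side sketch over a slice of MMU entries. It follows the rule described above (count flash page 0 once across all calls, count every other flash mapping individually). The function name and `PAGE_MASK` are assumptions, and the in-out parameter is simplified to a `bool` here rather than mirroring the ROM function's signature.

```rust
const MMU_INVALID: u32 = 1 << 14;
const MMU_ACCESS_SPIRAM: u32 = 1 << 15;
const PAGE_MASK: u32 = 0x3FFF; // assumption: low bits hold the page number

/// Counts the PSRAM pages needed to hold the flash pages mapped in `entries`.
/// `page0_counted` carries over between calls so that flash page 0 is
/// counted at most once in total.
fn count_flash_pages(entries: &[u32], page0_counted: &mut bool) -> u32 {
    let mut count = 0;
    for &entry in entries {
        if entry & (MMU_INVALID | MMU_ACCESS_SPIRAM) != 0 {
            continue; // unmapped, or already in PSRAM
        }
        if entry & PAGE_MASK == 0 {
            if *page0_counted {
                continue; // shares the single copied zero page
            }
            *page0_counted = true;
        }
        count += 1;
    }
    count
}
```

When no zero page is mapped in any region, the flag simply stays `false` and nothing extra is added or subtracted, which is the case the thread describes as going wrong.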

Really this whole thing has left me with just as many questions as I had before, except they're different questions, so I guess I learned something.

I have now tried running a framebuffer from PSRAM, but I'm struggling just to get the screen to display it remotely correctly, never mind displaying it without the kinds of glitches you see when you're running code from flash while the framebuffer is in PSRAM.

@EliteTK
Contributor

EliteTK commented Jan 23, 2025

I've now opened a WIP PR for this in #3024. I would appreciate any and all feedback.
