-
Notifications
You must be signed in to change notification settings - Fork 721
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wasm2c: prettify/change name mangling #2142
Conversation
8e7853c
to
c80455a
Compare
Will review this as soon as I get a chance. But broadly, the new naming scheme looks way nicer! :) |
ac6168b
to
d37b911
Compare
Can you update the PR description with the message you would like to include when landing this? (or if you prefer you would post that as a comment) |
test/wasm2c/hello.txt
Outdated
assert(wasm_rt_is_initialized()); | ||
init_instance_import(instance, Z_wasi_snapshot_preview1_instance); | ||
init_instance_import(instance, w2c_wasi0x5Fsnapshot0x5Fpreview1_instance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given how common underscores are likely to be in module names this seems like a somewhat of a regression.
I'm not sure what we can do about? Some other kind of escaping mechanism for modules with underscores in their name? Use double underscores to mark the start and end of the module name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can definitely prettify this at the expense of making the mangling rules more complicated.
One possibility would be: In all names, we escape leading and trailing underscores and underscores that follow underscores. Additionally, in a module name, we transform a single underscore into a double underscore. The boundary between module name and export name is the first single underscore after w2c_
.
So wasi_snapshot_preview1
would become wasi__snapshot__preview1
. Unfortunately an export named _start
would become 0x5Fstart
which is a little uglier but maybe still acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pushed a new commit that does this -- hope it looks cleaner.
Sure, I think I had imagined it would be the current commit message for the first (main) commit:
wdyt? |
Maybe add something about the new |
d37b911
to
baf5666
Compare
This refactors the wasm2c name-mangling in two big ways: 1) Removing the `Z_` prefix and trying to make the names somewhat ergonomic/pretty. Previously the `factorial` export from a `fac` module looked like this: ``` u32 Z_facZ_factorial(Z_fac_instance_t*, u32); ``` After this commit, it looks like this: ``` u32 w2c_fac_factorial(w2c_fac*, u32); ``` Symbols defined by wasm2c itself (including instantiate, free, get_func_type and the imported memory limits) are now prefixed with `wasm2c_` to avoid conflicting with names defined by the module. 2) Using globally unique (module-prefixed) names for functions, types, segments, and tags, even though they are currently static (internal-linkage) symbols in the .c output. This is preparation for a future "multiple .c output" option where these symbols will need to have external linkage.
baf5666
to
07d9a2d
Compare
Done in 07d9a2d |
In all names, we now escape leading underscores, trailing underscores, and underscores that follow underscores. Additionally, in a module name, we transform a single underscore into a double underscore. The boundary between module name and export name is the first single underscore after w2c_. `wasi_snapshot_preview1` becomes `wasi__snapshot__preview1`. Unfortunately an export named `_start` now becomes `0x5Fstart`.
e022fea
to
ac71de5
Compare
wasm2c/examples/rot13/main.c
Outdated
wasm_rt_memory_t memory; | ||
char* input; | ||
}; | ||
} w2c_host; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is adding this extra typedef part of the this change too? Or was this file just out-of-date?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it probably should have been a typedef all along (so main.c can refer to it as w2c_host
instead of struct w2c_host
, and to match the wasm2c generated code which always typedef's the name of the struct), but it's not critical...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance we can split that out as well? .. sorry for being so pedantic, but it helps my rather slow brain understand PR much better when its focused on just one thing. Feel free to push back on this if you think its too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, no problem. Stand by...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in a5a8164. I'm guessing there's going to be more in c-writer.cc that you'll want split out, which I may try to get to first to save you the trouble...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(done for now)
58e9ada
to
0e87340
Compare
@@ -455,7 +458,7 @@ module doesn't use any globals, memory or tables. | |||
The most interesting part is the definition of the function `fac`: | |||
|
|||
```c | |||
static u32 w2c_fac(Z_fac_instance_t* instance, u32 w2c_p0) { | |||
static u32 w2c_fac_fac_0(w2c_fac* instance, u32 w2c_p0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the _0
suffix on the function name here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see we have an internal name and then an exported name that simply forwards its arg to the internal one.
As a followup should we potentially just export the function itself rather than a forwarding function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, exactly. In this case the names had to be deconflicted because the ID of the function itself is $fac, and also the export is named "fac", so the internal function gets the _0
suffix.
I think trying to "directly" export a function like this may be doable but a bit complicated. :-( The internal functions are named and called based on their generated name or debug name (e.g. $f0 or $fac), whereas the exports are named based on their export name (in this case, also "fac"). The same function can be exported multiple times under different names, so the module could export this function as "fac" and "fac2" and "fac3" and they would all have to be in the .h file.
If you think this is valuable, I think what we'd try to do is detect a "direct" export (an exported function where the export name is identical to the name of the function itself) and handle it specially by declaring it directly in the .h
file, and then for any other export of the same function, we just forward back to the original implementation. I can see a pathway here but it might be best done in another PR if you're amenable. :-/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I yes, that makes total sense. I think continue to separate internal names from export names makes sense give that context. I don't think we need to special case for when they happen to match.
I do wonder if we could avoid declaring extra function though.. maybe function exports could be some kind of symbol alias instead? I guess that runs into visibility issues, and is kind of compiler-specific. Its a shame C doesn't really have anything like aliases built-in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think if we were willing to limit ourselves to GCC and clang, we could use __attribute__((alias))
no problem. But I don't know how to do this on MSVC. :-/
src/c-writer.cc
Outdated
}; | ||
auto my_ishexdigit = [&](uint8_t ch) { | ||
return my_isdigit(ch) || (ch >= 'A' && ch <= 'F'); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about simply declaring these as normal static C functions outside this method body?
I guess its mostly aesthetic, but unless we actually need to capture stuff from the current function I'm not a huge fan of inner lambda functions like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 6974bae
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
d4870c4
to
6974bae
Compare
The ultimate bike-shedding PR. :-) Currently sequenced behind #2137.
This refactors the wasm2c name-mangling in two big ways:
Removing the 'Z_' prefix and trying to make the names somewhat ergonomic/pretty. Previously the
factorial
export from afac
module looked like this:After this PR, it looks like this:
The name of the instance is now
w2c_fac
and the exported function is simplyw2c_
+ module_name +_
+ export_name.It gets less pretty if there are forbidden chars in the module or export name, because those still have to be escaped/mangled (and an underscore in a module name now has to be escaped), but I think in the common case these are an ergonomic improvement.
Symbols defined by wasm2c itself (including
instantiate
/free
/get_func_type
and the imported memory properties) are now prefixed withwasm2c_
to avoid conflicting with anything defined by the module (prefixed withwasm_
(joining thewasm_rt_...
family)w2c_
).Using globally unique (module-prefixed) names for functions, types, segments, and tags, even though these are currently static (internal-linkage) symbols in the .c output. This is preparation for a future "multiple .c output" option where these symbols will need to have external linkage (wasm2c roadmap ideas #2019).
Internally, this tries to tidy up the name-handling logic into more-understandable routines (e.g. ClaimName, FindUniqueName) that encapsulate the raw manipulation of the symbol set and map.