Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wasm2c: prettify/change name mangling #2142

Merged
merged 9 commits into from
Feb 23, 2023
Merged

wasm2c: prettify/change name mangling #2142

merged 9 commits into from
Feb 23, 2023

Conversation

keithw
Copy link
Member

@keithw keithw commented Feb 4, 2023

The ultimate bike-shedding PR. :-) Currently sequenced behind #2137.

This refactors the wasm2c name-mangling in two big ways:

  1. Removing the 'Z_' prefix and trying to make the names somewhat ergonomic/pretty. Previously the factorial export from a fac module looked like this:

    u32 Z_facZ_factorial(Z_fac_instance_t*, u32);
    

    After this PR, it looks like this:

    u32 w2c_fac_factorial(w2c_fac*, u32);
    

    The name of the instance is now w2c_fac and the exported function is simply w2c_ + module_name + _ + export_name.

    It gets less pretty if there are forbidden chars in the module or export name, because those still have to be escaped/mangled (and an underscore in a module name now has to be escaped), but I think in the common case these are an ergonomic improvement.

    Symbols defined by wasm2c itself (including instantiate/free/get_func_type and the imported memory properties) are now prefixed with wasm2c_ wasm_ (joining the wasm_rt_... family) to avoid conflicting with anything defined by the module (prefixed with w2c_).

  2. Using globally unique (module-prefixed) names for functions, types, segments, and tags, even though these are currently static (internal-linkage) symbols in the .c output. This is preparation for a future "multiple .c output" option where these symbols will need to have external linkage (wasm2c roadmap ideas #2019).

Internally, this tries to tidy up the name-handling logic into more-understandable routines (e.g. ClaimName, FindUniqueName) that encapsulate the raw manipulation of the symbol set and map.

@keithw keithw requested review from sbc100 and shravanrn February 4, 2023 19:52
@keithw keithw force-pushed the w2c-prettify-names branch 7 times, most recently from 8e7853c to c80455a Compare February 5, 2023 17:09
@shravanrn
Copy link
Collaborator

Will review this as soon as I get a chance. But broadly, the new naming scheme looks way nicer! :)

Base automatically changed from w2c-fix-param-label-duplicates to main February 7, 2023 16:50
test/wasm2c/add.txt Outdated Show resolved Hide resolved
@keithw keithw force-pushed the w2c-prettify-names branch 3 times, most recently from ac6168b to d37b911 Compare February 13, 2023 19:12
@keithw keithw requested a review from sbc100 February 14, 2023 18:58
@sbc100
Copy link
Member

sbc100 commented Feb 14, 2023

Can you update the PR description with the message you would like to include when landing this? (or if you prefer you would post that as a comment)

assert(wasm_rt_is_initialized());
init_instance_import(instance, Z_wasi_snapshot_preview1_instance);
init_instance_import(instance, w2c_wasi0x5Fsnapshot0x5Fpreview1_instance);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how common underscores are likely to be in module names this seems like a somewhat of a regression.

I'm not sure what we can do about? Some other kind of escaping mechanism for modules with underscores in their name? Use double underscores to mark the start and end of the module name?

Copy link
Member Author

@keithw keithw Feb 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can definitely prettify this at the expense of making the mangling rules more complicated.

One possibility would be: In all names, we escape leading and trailing underscores and underscores that follow underscores. Additionally, in a module name, we transform a single underscore into a double underscore. The boundary between module name and export name is the first single underscore after w2c_.

So wasi_snapshot_preview1 would become wasi__snapshot__preview1. Unfortunately an export named _start would become 0x5Fstart which is a little uglier but maybe still acceptable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed a new commit that does this -- hope it looks cleaner.

@keithw
Copy link
Member Author

keithw commented Feb 14, 2023

Can you update the PR description with the message you would like to include when landing this? (or if you prefer you would post that as a comment)

Sure, I think I had imagined it would be the current commit message for the first (main) commit:

wasm2c: prettify/change name-mangling

This refactors the wasm2c name-mangling in two big ways:

  1. Removing the 'Z_' prefix and trying to make the names somewhat
    ergonomic/pretty. Previously the factorial export from a fac
    module looked like this:

    u32 Z_facZ_factorial(Z_fac_instance_t*, u32);
    

    After this commit, it looks like this:

    u32 w2c_fac_factorial(w2c_fac*, u32);
    
  2. Using globally unique (module-prefixed) names for functions,
    types, segments, and tags, even though they are currently static
    (internal-linkage) symbols in the .c output. This is preparation
    for a future "multiple .c output" option where these symbols
    will need to have external linkage.

wdyt?

@sbc100
Copy link
Member

sbc100 commented Feb 14, 2023

Can you update the PR description with the message you would like to include when landing this? (or if you prefer you would post that as a comment)

Sure, I think I had imagined it would be the current commit message for the first (main) commit:

wasm2c: prettify/change name-mangling

This refactors the wasm2c name-mangling in two big ways:

  1. Removing the 'Z_' prefix and trying to make the names somewhat
    ergonomic/pretty. Previously the factorial export from a fac
    module looked like this:

    u32 Z_facZ_factorial(Z_fac_instance_t*, u32);
    

    After this commit, it looks like this:

    u32 w2c_fac_factorial(w2c_fac*, u32);
    
  2. Using globally unique (module-prefixed) names for functions,
    types, segments, and tags, even though they are currently static
    (internal-linkage) symbols in the .c output. This is preparation
    for a future "multiple .c output" option where these symbols
    will need to have external linkage.

wdyt?

Maybe add something about the new wasm2c_ prefix as distinct from the w2c_ prefix?

@keithw keithw force-pushed the w2c-prettify-names branch from d37b911 to baf5666 Compare February 14, 2023 20:40
This refactors the wasm2c name-mangling in two big ways:

1) Removing the `Z_` prefix and trying to make the names somewhat
   ergonomic/pretty. Previously the `factorial` export from a `fac`
   module looked like this:

   ```
   u32 Z_facZ_factorial(Z_fac_instance_t*, u32);
   ```

   After this commit, it looks like this:

   ```
   u32 w2c_fac_factorial(w2c_fac*, u32);
   ```

   Symbols defined by wasm2c itself (including instantiate, free,
   get_func_type and the imported memory limits) are now prefixed with
   `wasm2c_` to avoid conflicting with names defined by the module.

2) Using globally unique (module-prefixed) names for functions, types,
   segments, and tags, even though they are currently static
   (internal-linkage) symbols in the .c output. This is preparation for
   a future "multiple .c output" option where these symbols will need to
   have external linkage.
@keithw keithw force-pushed the w2c-prettify-names branch from baf5666 to 07d9a2d Compare February 14, 2023 20:41
@keithw
Copy link
Member Author

keithw commented Feb 14, 2023

Maybe add something about the new wasm2c_ prefix as distinct from the w2c_ prefix?

Done in 07d9a2d

In all names, we now escape leading underscores, trailing underscores,
and underscores that follow underscores. Additionally, in a module name,
we transform a single underscore into a double underscore. The boundary
between module name and export name is the first single underscore after
w2c_.

`wasi_snapshot_preview1` becomes `wasi__snapshot__preview1`.
Unfortunately an export named `_start` now becomes `0x5Fstart`.
@keithw keithw force-pushed the w2c-prettify-names branch from e022fea to ac71de5 Compare February 14, 2023 21:46
wasm2c/README.md Outdated Show resolved Hide resolved
test/run-spec-wasm2c.py Outdated Show resolved Hide resolved
src/c-writer.cc Outdated Show resolved Hide resolved
src/c-writer.cc Outdated Show resolved Hide resolved
wasm_rt_memory_t memory;
char* input;
};
} w2c_host;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is adding this extra typedef part of the this change too? Or was this file just out-of-date?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it probably should have been a typedef all along (so main.c can refer to it as w2c_host instead of struct w2c_host, and to match the wasm2c generated code which always typedef's the name of the struct), but it's not critical...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance we can split that out as well? .. sorry for being so pedantic, but it helps my rather slow brain understand PR much better when its focused on just one thing. Feel free to push back on this if you think its too much.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, no problem. Stand by...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in a5a8164. I'm guessing there's going to be more in c-writer.cc that you'll want split out, which I may try to get to first to save you the trouble...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(done for now)

@keithw keithw force-pushed the w2c-prettify-names branch from 58e9ada to 0e87340 Compare February 14, 2023 22:41
@keithw keithw requested a review from sbc100 February 23, 2023 18:44
@@ -455,7 +458,7 @@ module doesn't use any globals, memory or tables.
The most interesting part is the definition of the function `fac`:

```c
static u32 w2c_fac(Z_fac_instance_t* instance, u32 w2c_p0) {
static u32 w2c_fac_fac_0(w2c_fac* instance, u32 w2c_p0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the _0 suffix on the function name here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see we have an internal name and then an exported name that simply forwards its arg to the internal one.

As a followup should we potentially just export the function itself rather than a forwarding function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, exactly. In this case the names had to be deconflicted because the ID of the function itself is $fac, and also the export is named "fac", so the internal function gets the _0 suffix.

I think trying to "directly" export a function like this may be doable but a bit complicated. :-( The internal functions are named and called based on their generated name or debug name (e.g. $f0 or $fac), whereas the exports are named based on their export name (in this case, also "fac"). The same function can be exported multiple times under different names, so the module could export this function as "fac" and "fac2" and "fac3" and they would all have to be in the .h file.

If you think this is valuable, I think what we'd try to do is detect a "direct" export (an exported function where the export name is identical to the name of the function itself) and handle it specially by declaring it directly in the .h file, and then for any other export of the same function, we just forward back to the original implementation. I can see a pathway here but it might be best done in another PR if you're amenable. :-/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I yes, that makes total sense. I think continue to separate internal names from export names makes sense give that context. I don't think we need to special case for when they happen to match.

I do wonder if we could avoid declaring extra function though.. maybe function exports could be some kind of symbol alias instead? I guess that runs into visibility issues, and is kind of compiler-specific. Its a shame C doesn't really have anything like aliases built-in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think if we were willing to limit ourselves to GCC and clang, we could use __attribute__((alias)) no problem. But I don't know how to do this on MSVC. :-/

src/c-writer.cc Outdated
};
auto my_ishexdigit = [&](uint8_t ch) {
return my_isdigit(ch) || (ch >= 'A' && ch <= 'F');
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about simply declaring these as normal static C functions outside this method body?

I guess its mostly aesthetic, but unless we actually need to capture stuff from the current function I'm not a huge fan of inner lambda functions like this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 6974bae

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@keithw keithw force-pushed the w2c-prettify-names branch from d4870c4 to 6974bae Compare February 23, 2023 21:26
@keithw keithw enabled auto-merge (squash) February 23, 2023 21:28
@keithw keithw merged commit 046451f into main Feb 23, 2023
@keithw keithw deleted the w2c-prettify-names branch February 23, 2023 21:45
@keithw keithw restored the w2c-prettify-names branch February 23, 2023 21:53
@keithw keithw deleted the w2c-prettify-names branch February 23, 2023 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants