String encoding; direct access to strings without copying #37

Open
phst opened this issue Nov 9, 2015 · 2 comments

@phst
Collaborator

phst commented Nov 9, 2015

We should define what string encodings are expected and what is returned to module code. With that specification we might be able to implement zero-copy access to string/buffer contents.

@aaptel
Owner

aaptel commented Nov 13, 2015

Is the internal Emacs encoding clearly documented somewhere?

@phst
Collaborator Author

phst commented Nov 16, 2015

There is some documentation in https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html and https://www.gnu.org/software/emacs/manual/html_node/elisp/Coding-System-Basics.html, and mule-conf.el contains:

;; The encoding used internally.  This encoding is meant to be able to save
;; any multibyte buffer without losing information.  It can change between
;; Emacs releases, tho, so should only be used for internal files.
(define-coding-system-alias 'emacs-internal 'utf-8-emacs-unix)

So module authors could, for example, compare (coding-system-base 'emacs-internal) against a list of expected values. However, we cannot expect authors to actually do that, and we also shouldn't require the internal coding system to remain fixed (the comment clearly indicates that it isn't).

I propose the following. First, change all functions taking a string (char*) parameter to take an emacs_value instead, except for make_string and copy_string_contents. This way encoding issues are dealt with in a central place. Second, add arguments to make_string and copy_string_contents to specify the coding system explicitly. Third, introduce a fast get_string_pointer function that also takes a coding system name and fails if the internal coding system is different.

The string functions would then have the following signatures (hopefully the coding system name itself is always an ASCII string without embedded nulls!):

emacs_value make_string(emacs_env *env, const char *coding_system, const char *contents, size_t size);
bool copy_string_contents(emacs_env *env, emacs_value value, const char *coding_system, char *buffer, size_t *size);
bool get_string_pointer(emacs_env *env, emacs_value value, const char *coding_system, const char **pointer, size_t *size);
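Since get_string_pointer is meant to fail when the requested coding system differs from the internal one, callers would probably want a fallback to copy_string_contents. The following is only a sketch of that pattern against mock stand-ins for the proposed functions, not the real emacs-module.h. The struct string_view type, the read_string helper, the internal_coding field, and the mock_* functions are hypothetical. It also assumes that copy_string_contents reports the required size when called with a NULL buffer and that sizes exclude any terminating null; the proposal does not specify either. Handling of a pending error state after a failed call is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Mock stand-ins for the proposed API (hypothetical, not emacs-module.h). */
typedef struct mock_string *emacs_value;
typedef struct emacs_env emacs_env;

struct mock_string { const char *bytes; size_t size; };

struct emacs_env {
  const char *internal_coding; /* mock-only: name of the internal coding system */
  bool (*copy_string_contents)(emacs_env *env, emacs_value value,
                               const char *coding_system, char *buffer,
                               size_t *size);
  bool (*get_string_pointer)(emacs_env *env, emacs_value value,
                             const char *coding_system, const char **pointer,
                             size_t *size);
};

/* Mock copy: performs no real conversion, just copies the bytes.
   Assumption: a NULL buffer means "report the required size". */
bool mock_copy_string_contents(emacs_env *env, emacs_value value,
                               const char *coding_system, char *buffer,
                               size_t *size) {
  (void)env; (void)coding_system;
  if (buffer == NULL) { *size = value->size; return true; }
  if (*size < value->size) return false;
  memcpy(buffer, value->bytes, value->size);
  *size = value->size;
  return true;
}

/* Mock zero-copy access: succeeds only for the internal coding system. */
bool mock_get_string_pointer(emacs_env *env, emacs_value value,
                             const char *coding_system, const char **pointer,
                             size_t *size) {
  if (strcmp(coding_system, env->internal_coding) != 0) return false;
  *pointer = value->bytes;
  *size = value->size;
  return true;
}

/* Hypothetical result type: owned is non-NULL only when a copy was made,
   in which case the caller must free it. */
struct string_view { const char *data; size_t size; char *owned; };

/* Try the zero-copy path first, then fall back to an explicit copy. */
bool read_string(emacs_env *env, emacs_value value, const char *coding_system,
                 struct string_view *out) {
  assert(out != NULL);
  out->owned = NULL;
  if (env->get_string_pointer(env, value, coding_system, &out->data, &out->size))
    return true;
  size_t needed = 0;
  if (!env->copy_string_contents(env, value, coding_system, NULL, &needed))
    return false;
  char *buffer = malloc(needed ? needed : 1);
  if (buffer == NULL) return false;
  if (!env->copy_string_contents(env, value, coding_system, buffer, &needed)) {
    free(buffer);
    return false;
  }
  out->data = out->owned = buffer;
  out->size = needed;
  return true;
}
```

With this shape, a module that asks for the internal coding system pays nothing for the access, while any other request still works at the cost of one allocation and copy.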

Unfortunately the new signatures make operations such as interning more awkward; instead of return env->intern(env, "foo") you would have to write:

emacs_value sym = env->make_string(env, "us-ascii", "foo", strlen("foo"));
if (!sym) return NULL;
return env->intern(env, sym);

However, such code can be easily abstracted in a convenience function, whereas the current API implicitly requires using the internal encoding without specifying it.
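As a rough illustration, such a convenience function might look like the following. This is a sketch against mock stand-ins for the proposed make_string and intern signatures, not the real emacs-module.h. The intern_ascii name, the struct mock_value type, and the mock_* functions are hypothetical and exist only to make the example compile on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mock stand-ins for the proposed API (hypothetical, not emacs-module.h). */
typedef struct mock_value *emacs_value;
typedef struct emacs_env emacs_env;

struct mock_value { char text[32]; bool is_symbol; };

struct emacs_env {
  emacs_value (*make_string)(emacs_env *env, const char *coding_system,
                             const char *contents, size_t size);
  emacs_value (*intern)(emacs_env *env, emacs_value name);
};

/* Tiny fixed pool so the mock needs no dynamic allocation. */
static struct mock_value pool[8];
static size_t pool_used;

emacs_value mock_alloc(const char *text, size_t size, bool is_symbol) {
  if (pool_used == sizeof pool / sizeof pool[0] || size >= sizeof pool[0].text)
    return NULL;
  struct mock_value *v = &pool[pool_used++];
  memcpy(v->text, text, size);
  v->text[size] = '\0';
  v->is_symbol = is_symbol;
  return v;
}

/* Mock make_string: this stand-in only understands "us-ascii". */
emacs_value mock_make_string(emacs_env *env, const char *coding_system,
                             const char *contents, size_t size) {
  (void)env;
  if (strcmp(coding_system, "us-ascii") != 0) return NULL;
  return mock_alloc(contents, size, false);
}

/* Mock intern: takes a string value, as in the proposal, and returns a symbol. */
emacs_value mock_intern(emacs_env *env, emacs_value name) {
  (void)env;
  return mock_alloc(name->text, strlen(name->text), true);
}

/* Hypothetical convenience wrapper: intern a symbol from an ASCII C string,
   so call sites read almost like the old env->intern(env, "foo"). */
emacs_value intern_ascii(emacs_env *env, const char *name) {
  assert(name != NULL);
  emacs_value str = env->make_string(env, "us-ascii", name, strlen(name));
  if (!str) return NULL;
  return env->intern(env, str);
}
```

A call site then shrinks back to return intern_ascii(env, "foo"), and the coding system is stated once, inside the helper, rather than being implied by the API.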
