String encoding; direct access to strings without copying #37

phst · 2015-11-09T18:53:50Z

We should define what string encodings are expected and what is returned to module code. With that specification we might be able to implement zero-copy access to string/buffer contents.

aaptel · 2015-11-13T01:26:01Z

Is the internal Emacs encoding clearly documented somewhere?

phst · 2015-11-16T22:30:39Z

There is some documentation in https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html and https://www.gnu.org/software/emacs/manual/html_node/elisp/Coding-System-Basics.html, and mule-conf.el has

;; The encoding used internally.  This encoding is meant to be able to save
;; any multibyte buffer without losing information.  It can change between
;; Emacs releases, tho, so should only be used for internal files.
(define-coding-system-alias 'emacs-internal 'utf-8-emacs-unix)

So module authors could, for example, compare (coding-system-base 'emacs-internal) against a list of expected values. However, we cannot expect authors to actually do that, and we also shouldn't require the internal coding system to remain fixed (the comment clearly indicates that it is not).

I propose the following:

1. Change all functions taking a string (char*) parameter to take an emacs_value instead, except for make_string and copy_string_contents. This way encoding issues are dealt with in a central place.
2. Add arguments to make_string and copy_string_contents to specify the coding system explicitly.
3. Introduce a fast get_string_pointer function that also takes a coding system name and fails if the internal coding system is different.

The string functions would then have the following signatures (hopefully the coding system name itself is always an ASCII string without embedded nulls!):

emacs_value make_string(emacs_env *env, const char *coding_system, const char *contents, size_t size);
bool copy_string_contents(emacs_env *env, emacs_value value, const char *coding_system, char *buffer, size_t *size);
bool get_string_pointer(emacs_env *env, emacs_value value, const char *coding_system, const char **pointer, size_t *size);

Unfortunately that makes operations such as interning more awkward; instead of return env->intern(env, "foo") you now have to say:

emacs_value name = env->make_string(env, "us-ascii", "foo", strlen("foo"));
if (!name) return NULL;
return env->intern(env, name);

However, such code can be easily abstracted in a convenience function, whereas the current API implicitly requires using the internal encoding without specifying it.
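
As a rough sketch of what such a convenience function could look like: the helper name intern_ascii is made up for illustration, and the emacs_env and emacs_value definitions below are mock stand-ins that only mirror the signatures proposed in this comment, not the real module header. The mock make_string accepts only "us-ascii" and short strings, and the mock intern simply hands back its argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock stand-ins (assumptions, not the real module API) so that the
   sketch compiles on its own. */
typedef struct mock_value *emacs_value;
typedef struct emacs_env emacs_env;

struct mock_value { char text[32]; };

struct emacs_env {
  /* Proposed signature: the coding system is named explicitly. */
  emacs_value (*make_string)(emacs_env *env, const char *coding_system,
                             const char *contents, size_t size);
  /* Proposed signature: takes an emacs_value instead of a char*. */
  emacs_value (*intern)(emacs_env *env, emacs_value name);
};

/* Single storage slot used by the mock; a real environment would
   allocate a Lisp string instead. */
static struct mock_value mock_slot;

emacs_value mock_make_string(emacs_env *env, const char *coding_system,
                             const char *contents, size_t size)
{
  (void) env;
  assert(coding_system != NULL);
  /* The mock understands only "us-ascii" and strings that fit the slot. */
  if (strcmp(coding_system, "us-ascii") != 0 || size >= sizeof mock_slot.text)
    return NULL;
  memcpy(mock_slot.text, contents, size);
  mock_slot.text[size] = '\0';
  return &mock_slot;
}

emacs_value mock_intern(emacs_env *env, emacs_value name)
{
  (void) env;
  /* The mock treats a string and its symbol as the same object. */
  return name;
}

/* Hypothetical convenience helper: intern a NUL-terminated ASCII name.
   Returns NULL if the string could not be created. */
emacs_value intern_ascii(emacs_env *env, const char *name)
{
  emacs_value str = env->make_string(env, "us-ascii", name, strlen(name));
  if (!str) return NULL;
  return env->intern(env, str);
}
```

With a helper like this, the call site shrinks back to return intern_ascii(env, "foo"), while the encoding stays explicit in exactly one place.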
