-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: use BV class as python value for Cryptol bit sequences #116
Conversation
This Cryptol branch has the corresponding submodule bump and tests updated to reflect these changes: https://github.com/GaloisInc/cryptol/tree/cryptol-bits-as-BV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't run the code, but it looks generally good. There's some style stuff that I think we should follow up on, though.
3cf029b
to
96cc7eb
Compare
This PR does not remove the previously added support for |
Is there support in the new “BV” module for moving to and from BitVector? |
@weaversa Not currently, but there could be! Would that be useful? |
Well, yes. If you’re planning to remove support for BitVector, it would be nice to have a ready way to merge the new stuff into existing code that uses BitVector already. |
0ee7dbd
to
a978288
Compare
.github/workflows/ci.yml
Outdated
- name: Build | ||
run: | | ||
cabal build all | ||
- name: Cabal argo tests | ||
run: cabal test argo | ||
- name: Cabal file-echo-api tests | ||
run: cabal test file-echo-api | ||
- name: Python Cryptol unit tests | ||
working-directory: ./python/cryptol | ||
run: python3 -m unittest discover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One final issue: can we set this up to run as part of cabal test
? We already run Python scripts as part of our normal test suite, and having some tests require a different command to run than the others seems like a recipe for ongoing confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1539ba2 tweaks things so cabal test python
will run the python unit test and discovery, and the success of any discovered python unit tests in the directory structure is a part of the existing Haskell test suite's success. Feels a little odd to have the haskell test framework invoking the python unit test framework... but I think it's fine for now and at least it's simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay... but the test are failing in a way I don't see locally and specifically running mypy on the files in question doesn't produce the error either... hmmmm....
It would be nice for the cryptol.call function to accept strings as input. Otherwise, I've written this:
Also, BitVector accepts named parameters, so one can write |
This has already simplified my client code quite a bit by allowing me to pass results from |
RE strings as arguments: we can make something like that work, but I don't think it fits well with Python's design. Python is pretty careful to keep a distinction between Unicode strings, and their realizations under various encodings as byte arrays. We could say "It's all UTF-8", but that seems a bit risky vs asking clients to call |
👍 on keyword arguments to the constructor :-) |
I see. I don't know Python all that well. What is a good way to handle passing strings? I'm fine with enforcing encode, and I'm also fine using the helper function I wrote, but perhaps something similar could be supported in the package? So, am I correct to think we could have something like:
or
|
As of e6e76ed you should be able to use |
You're right (it's number 2). Python strings can be converted to bytes by calling their
(pardon my Danish - had to find something with a non-trivial representation in UTF-8) |
Ok! So that’s a fine solution. Thanks for showing me |
I'm hesitant to have it call encode on strings by default, but certainly open to the possibility. The argument for not doing so is two-fold:
The best argument for doing a silent coercion is that Cryptol does one:
Unfortunately for us, most of the world these days expects silent coercions to be done in UTF-8, but Cryptol seems to be doing ISO-8859-1. This makes me think that it's in users' interest to be careful here. |
Cryptol gives some wonderful error messages when it's provided with Unicode that isn't representable in ISO-8858-1!
|
If this is common enough to warrant a package function, would something like this fit the bill? @staticmethod
def byte_seq(source : Union[str, bytes], encoding : str = 'UTF-8') -> List['BV']:
"""Convert ``source`` into a sequence of 8-bit ``BV``s.
If ``source`` is a string it is treated as UTF-8 encoded bytes by default.
Other encodings can be selected via the optional ``encoding`` keyword.
See python's ``str.encode()`` function for supported encodings.
"""
if isinstance(source, str):
return [BV(8, c) for c in source.encode(encoding=encoding)]
elif isinstance(source, bytes):
return [BV(8, c) for c in source]
else: # type: ignore
raise ValueError(f'Cannot convert {source!r} into a byte sequence.') Which could be called as follows:
|
You know, that is interesting. Shouldn’t the |
Here, we get into a conflict between the default Python interpretation (UTF-8) and the default Cryptol one (ISO-8859-1). Cryptol string notation is really a convenience notation for finite sequences of bytes, rather than representing actual strings, and it seems that the fact that it allows non-ASCII characters may even be a bug. |
The magical green CI checkmark is back! @david-christiansen should we consider merging this now and continue the other features/etc separately? Or delay? |
I think we should go ahead and get this thing merged. |
Would you mind if we continued this as a Cryptol issue? More folks over there. |
I brought it up in our internal chat and it's being discussed quite a lot! But yes, I think that's where it's most appropriate. |
GaloisInc/cryptol#863 seems to be the place where it is most natural. |
I made this issue to summarize the general problem and allow us to include Python-specific discussion if needed without mucking up the pure-cryptol questions: #119 |
Adds a python class
BV
to represent bitvectors / cryptol bit sequences and uses that for the canonical return value for bit sequences returned from the python cryptol API.(The previous class we experimented with (BitVector) seemed optimized for use cases that did not line up well with our needs.)