Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ABI decoding byte array offsets allows overflows #1602

Closed
wadealexc opened this issue Sep 6, 2019 · 15 comments
Closed

ABI decoding byte array offsets allows overflows #1602

wadealexc opened this issue Sep 6, 2019 · 15 comments

Comments

@wadealexc
Copy link

Version Information

  • vyper Version (output of vyper --version): 0.1.0-beta.12

What's your issue about?

The ABI uses calldata offsets to point to the location of dynamically-sized data. Vyper does not check for overflow when evaluating these offsets. For example:

@public
@constant
def temp(arr: bytes[1024]) -> bytes[1024]:
    return arr

The above contract does not correctly decode calldata provided by this Solidity contract:

pragma solidity ^0.5.0;

contract Test {

    bytes constant DATA = bytes("Hello, world!");

    function test(address _vyContract) public view returns (string memory) {
        bytes memory payload 
            = abi.encodeWithSignature("temp(bytes)", DATA);
        
        assembly {
            let neg := sub(0, 4)
            mstore(add(0x24, payload), neg) // modify offset of bytes array
            let res := staticcall(gas, _vyContract, add(0x20, payload), mload(payload), 0, 0)
            returndatacopy(0, 0, returndatasize)
            return(0, returndatasize)
        }
    }
}

Specifically, this line in the LLL contains the overflow issue: [calldatacopy, 320, [add, 4, [calldataload, 4]], 1056]. This is located here in the code.

How can it be fixed?

Check offset calculations for overflow, or make sure offsets are pointing within the bounds of CALLDATASIZE

@charles-cooper
Copy link
Member

Thanks for the report! I am having trouble why this is a vyper issue. What is the expected behavior when the caller provides malformed data?

@wadealexc
Copy link
Author

The data isn't really malformed. It's valid, ABI-encoded data that points outside of the range of calldata. Vyper should detect this and revert.

@charles-cooper
Copy link
Member

I guess I still don't understand. Doesn't the spec say that the offset word points to tail(X(i))? https://solidity.readthedocs.io/en/v0.5.3/abi-spec.html#formal-specification-of-the-encoding. I always get tripped up reading this spec so I am happy to be corrected if my understanding is incorrect.

@wadealexc
Copy link
Author

Sorry, I don't think my initial reply was particularly clear. The example I gave shows that Vyper accepts an overflow during ABI decoding which leads to it reading the first 32 bytes of calldata and treating it as an array size. In every variation I tried, this led to the call failing because CALLDATACOPY inevitably reads from a massive index in calldata. While I'm not convinced this is exploitable behavior, it seems much safer to check for the overflow and explicitly revert the transaction, rather than allow the behavior and fail because of the gas limit.

@charles-cooper
Copy link
Member

My understanding of the spec is that the offset word must point to where the data is. If it doesn't, it is ipso facto invalid input. Unless I am misunderstanding the spec.

@wadealexc
Copy link
Author

That is certainly true, but my point is that (because of the overflow) Vyper is not reading from where the offset points. If there were data there (nevermind the cost of accessing it), Vyper would not read it correctly.

@charles-cooper
Copy link
Member

charles-cooper commented Sep 8, 2019

Yes but the same would hold true if the offset pointed to any other location. I suppose that Vyper could validate that all the offsets are valid (i.e., start at dynamic_section_start, and check offset_n is equal to offset_n-1 + len_n-1 for all offsets), but this seems like unnecessary gas to spend on something where there is no attack vector (and the entire point of ABIv2 is to have O(1) access to data instead of O(n) access). If we can construct an attack vector (and I am beginning to suspect that there are indeed vectors here) then we should definitely consider doing this validation, otherwise it seems the worst case is an inconvenience to callers who provide invalid input getting tx failure instead of revert.

@wadealexc
Copy link
Author

What attack vectors are you beginning to suspect? I'm very curious!

Regardless of the existence of a tangible attack vector - this is the type of edge-case in behavior that can exacerbate other issues. Cleaning it up allows you to make much stronger statements about the behavior of the abi decoder, which is why I recommend it!

@charles-cooper
Copy link
Member

charles-cooper commented Sep 8, 2019

What attack vectors are you beginning to suspect? I'm very curious!

I don't have it fully fleshed out but an attacker might be able to take advantage of the offset pointing somewhere unexpected. So for example, let's say you have a contract which checks some bytearrays against some value. For instance,

def check_withdraw(sig1: bytes[100], sig2: bytes[100]) : 
  if sig1 == sig2 : 
    send(msg.sender, balance)

Using the technique in this issue, an attacker could set the offset of sig2 to be the offset of sig1, and then the check always evaluates to true.

@wadealexc
Copy link
Author

Yes, that's actually allowed in Solidity - and you don't need to do any overflow weirdness. Just set them to the same offset. By the ABI specification, those are the same array :)

@charles-cooper
Copy link
Member

Well then maybe there are some solidity contracts which are susceptible to this attack. I don't think that calldata which has the offset of sig2 set to the offset of sig1 is actually valid ABI-encoded data. Whether contracts check for this is a different matter. From https://solidity.readthedocs.io/en/v0.5.3/abi-spec.html#design-criteria-for-the-encoding:

The data of a variable or array element is not interleaved with other data and it is relocatable, i.e. it only uses relative “addresses”.

@charles-cooper
Copy link
Member

After thinking about it some more, I don't think this would result in a serious attack vector (any more than a caller switching two variables).

@fubuloubu
Copy link
Member

@charles-cooper what's the outcome of this issue?

@charles-cooper
Copy link
Member

@fubuloubu not sure, let's bring it up at another meeting

@charles-cooper
Copy link
Member

i don't think we need to check for the calldata case since calldata is always user-controlled. however, there is an issue with abi_decode when it decodes from memory, since that can allow introspection of memory. closing this issue as wontfix, see #3899 for the memory case.

@charles-cooper charles-cooper closed this as not planned Won't fix, can't repro, duplicate, stale Apr 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants