Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request for Guidance on Mythril’s Code Transformation Process #1890

Open
matheus-junio-da-silva opened this issue Dec 10, 2024 · 2 comments

Comments

@matheus-junio-da-silva
Copy link

Dear Mythril Developers,

I hope this message finds you well. My name is M. Silva, and I’m a Computer Science student currently working on a project that involves understanding how Mythril transforms smart contract code for analysis.

I’ve been diving deep into the documentation and even debugging the codebase, but I’m struggling to fully grasp the intricacies, particularly when it comes to the symbolic execution process. I understand this is a foundational aspect of Mythril, and I truly want to comprehend it, not just for my project but also to deepen my understanding of these critical security concepts.

This project is very important to me, and I’m reaching out with a heartfelt request: could you kindly provide some guidance or resources that might help clarify how Mythril processes and transforms code during analysis? I’d be immensely grateful for any insight you could share.

Thank you so much for your time and for the incredible work you’ve done with Mythril. It’s an inspiring tool, and I hope to understand it better with your help.

Warm regards,
M. Silva

@norhh
Copy link
Collaborator

norhh commented Dec 27, 2024

Hi, Mythril operates on bytecode. So, each instruction has to be symbolically modeled as in this file: https://github.com/Consensys/mythril/blob/develop/mythril/laser/ethereum/instructions.py.

Consider JUMPI instruction as an example.
The condition and the jump destination are popped from the stack (EVM is a stack based machine). Usually, if the condition is true, program counter jumps to the jump destination, otherwise, the program counter is just incremented. In symbolic execution, when the condition is a symbol and can resolve to both true and false, both the paths get added to the list of states. Hence, this instruction returns two states.

Whenever an instruction is being evaluated by symbolic vm, this code is wrapped by analysis modules. One such example is arbitrary jump destination module. This module wraps around JUMP and JUMPI instructions as prehooks, so, this analysis module is executed before symbolically executing the instructions JUMP and JUMPI. This analysis module returns an issue if the jump destination popped from the stack (the jump dest still exists in stack during pre_hooks) is symbolic rather than a concrete value.

Feel free to ask me any questions.

@matheus-junio-da-silva
Copy link
Author

Dear @norhh,
I hope you're doing well!

First, thank you so much for your detailed and thoughtful response to my previous question—it really helped me make significant progress in understanding the symbolic execution process in Mythril. Since our last interaction, I’ve been studying the parts of the codebase you mentioned, particularly the symbolic modeling in instructions.py, and I also started exploring the external_calls detection module.

However, I’ve hit a roadblock that I was hoping you could help me with. You’re the only person I can think of who might have the insight I need, and I’d be immensely grateful for any guidance. I know this is a complex topic, and I’ve been working day and night to understand it better, but I feel a bit stuck. If my questions seem too basic, I sincerely apologize—it’s just that I don’t have a strong theoretical background in this area yet.

Here’s what I’ve been working on: I’m analyzing a simple contract that Mythril detects as having the vulnerability “External Call To User-Supplied Address.” While studying the external_calls module, I noticed that the detection seems to involve constraints from two sources:

Constraints explicitly defined in the module itself, such as verifying that the available gas is greater than 2300 and checking if the destination address matches the attacker's:

constraints = [  
    UGT(2_gas, 2300),  
    1271270613000041655817448348132275889066893754095 ==  
    Concat(0,  
        If(1_calldatasize <= 12, 0, 1_calldata[12]),  
        If(1_calldatasize <= 13, 0, 1_calldata[13]),  
        ...  
        If(1_calldatasize <= 31, 0, 1_calldata[31]))  
]  

Could you explain how this Concat function works in this context? Specifically, what is calldata and calldatasize, and how are they used here?

Constraints from state.world_state.constraints. I couldn’t fully grasp how these constraints are gathered and added to the detection process. What criteria are used to determine whether something qualifies as a constraint and gets added to state.world_state.constraints?

From my understanding, these constraints are then provided to the SMT solver to check if there’s any sequence of transactions that satisfies them, ultimately identifying vulnerabilities. Are there optimizations applied to these constraints before they are passed to the solver?

Additionally, are detection modules in Mythril generally structured in a similar way—that is, by combining module-specific constraints with those from world_state and passing them to the solver?

Once again, I apologize if this question seems simple, but I really want to understand how these pieces fit together. Your insight would mean a lot to me as I try to overcome this learning plateau.

Thank you so much for your time and for all the work you do in this field. Mythril is such an inspiring tool, and your support has been invaluable.
Best regards,
M. Silva

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants