❗❗❗ This repository is deprecated and merged into our MANDO-HGT repository.
This repository is an implementation of HGT-EVM: Heterogeneous Graph Transformers for Vulnerability Detection in Smart Contract Bytecode. The source code is based on the Deep Graph Library (DGL) implementation of the HGT model.
The following figure shows a smart contract in the several forms it takes in our pipeline, together with an intuitive view of the corresponding source-code graph.
Table of contents
- Smart Contract Vulnerabilities
- Multi-Level Graph Embeddings
- How to train the models?
- Statistics
- We prepared datasets for the experiments.
We ran all experiments on:
- Ubuntu 20.04
- CUDA 11.1
- NVIDIA RTX 3080
Install the required Python packages:
pip install -r requirements.txt -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html -f https://data.pyg.org/whl/torch-1.8.0+cu111.html -f https://data.dgl.ai/wheels/repo.html
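After installation, a quick sanity check can confirm the environment resolved correctly; this is a minimal sketch, and the expected version strings are assumptions inferred from the wheel indexes above:

```python
# Environment sanity check (a sketch; expected versions are assumptions
# based on the torch 1.8 + CUDA 11.1 wheel indexes used above).
import torch
import dgl

print("torch:", torch.__version__)            # expected: 1.8.x+cu111
print("dgl:", dgl.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # e.g. an RTX 3080
```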
We provide inspection scripts for the graph classification task as well as the required data.
Training Phase
python -m experiments.graph_classification --epochs 50 --repeat 20 --model hgt --bytecode runtime
To show the result table:
python -m experiments.graph_classification --result
usage: HGT-EVM Graph Classifier [-h] [-s SEED] [-ld LOG_DIR] [--output_models OUTPUT_MODELS] [--compressed_graph COMPRESSED_GRAPH] [--dataset DATASET] [--testset TESTSET] [--label LABEL] [--checkpoint CHECKPOINT]
[--feature_extractor FEATURE_EXTRACTOR] [--node_feature NODE_FEATURE] [--k_folds K_FOLDS] [--test] [--non_visualize]
optional arguments:
-h, --help show this help message and exit
-s SEED, --seed SEED Random seed
Storage:
Directories for storing results
-ld LOG_DIR, --log-dir LOG_DIR
Directory for saving training logs and visualization
--output_models OUTPUT_MODELS
Where you want to save your models
Dataset:
Dataset paths
--compressed_graph COMPRESSED_GRAPH
Compressed graphs of the dataset, extracted by the graph helper tools
--dataset DATASET Directory of all source code files which were used to extract the compressed graph
--testset TESTSET Directory of the source code files which form the testing partition of the dataset
--label LABEL Labels of the sources in the source code storage
--checkpoint CHECKPOINT
Checkpoint of trained models
Node feature:
Define the way to get node features
--feature_extractor FEATURE_EXTRACTOR
If "node_feature" is "GAE" or "LINE" or "Node2vec", we need a extracted features from those models
--node_feature NODE_FEATURE
Kind of node features to use; one of "nodetype", "metapath2vec", "gae", "line", "node2vec"
Optional configurations:
Advanced options
--k_folds K_FOLDS Number of folds for the cross-validation strategy
--test Set this flag if you only want to run the test phase
--non_visualize Whether you want to visualize the metrics
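For reference, the compressed graphs passed via --compressed_graph are gpickle files; the sketch below shows one way to inspect one. The "node_type" attribute name is an assumption, not something confirmed by this README.

```python
# A minimal sketch: inspect a compressed graph (.gpickle).
# The "node_type" attribute name is an assumption.
import pickle

path = "./experiments/ge-sc-data/byte_code/smartbugs/runtime/gpickles/access_control/clean_57_buggy_curated_0/compressed_graphs/runtime_balanced_compressed_graphs.gpickle"
with open(path, "rb") as f:
    graph = pickle.load(f)  # a networkx graph, per the .gpickle convention

print(graph.number_of_nodes(), graph.number_of_edges())
# Node types are what make the control flow graph heterogeneous.
print({data.get("node_type") for _, data in graph.nodes(data=True)})
```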
- We prepared some scripts for the custom HGT-EVM structures below:
- Graph Classification for Heterogeneous Control Flow Graphs (HCFGs), which detects vulnerabilities at the contract level.
- LINE as node features
python graph_classifier.py -ld ./logs/graph_classification/byte_code/line/access_control --output_models ./models/graph_classification/byte_code/line/access_control --dataset ./experiments/ge-sc-data/byte_code/smartbugs/runtime/gpickles/access_control/clean_57_buggy_curated_0/ --compressed_graph ./experiments/ge-sc-data/byte_code/smartbugs/runtime/gpickles/access_control/clean_57_buggy_curated_0/compressed_graphs/runtime_balanced_compressed_graphs.gpickle --label ./experiments/ge-sc-data/byte_code/smartbugs/contract_labels/access_control/creation_buggy_curated_contract_labels.json --node_feature line --feature_extractor ./experiments/ge-sc-data/byte_code/smartbugs/runtime/gpickles/gesc_matrices_node_embedding/balanced/matrix_line_dim128_of_core_graph_of_access_control_runtime_balanced_cfg_compressed_graphs.pkl --seed 1
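The --feature_extractor argument above points at a pickled embedding matrix. Below is a hedged sketch of peeking inside it; the internal layout of the pickle is an assumption, so inspect the container before indexing into it:

```python
# Sketch: load the precomputed LINE embedding matrix passed to
# --feature_extractor. Its internal layout is an assumption; "dim128"
# in the filename suggests 128-dimensional node vectors.
import pickle

path = ("./experiments/ge-sc-data/byte_code/smartbugs/runtime/gpickles/"
        "gesc_matrices_node_embedding/balanced/"
        "matrix_line_dim128_of_core_graph_of_access_control_runtime_balanced_cfg_compressed_graphs.pkl")
with open(path, "rb") as f:
    features = pickle.load(f)

print(type(features))  # check the container type before assuming a shape
```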
- For now, testing runs automatically after the training phase.
- You can also use TensorBoard to inspect the metric trends for both the training and testing phases.
tensorboard --logdir LOG_DIR
We also provide some statistics about the dataset and the final embeddings from the last layer of our models.
Our datasets were combined from three sources: SB Curated, SolidiFI Benchmark, and SmartBugs Wild. The chart below shows the proportion of smart contracts that contain a bug versus those that do not.
We plot the final embedding of each smart contract after applying PCA to explore the classification ability of our models. For the reentrancy bug, the following figure shows the embeddings of the smart contracts in the test set using several kinds of input node features, for both creation and runtime bytecode.
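A minimal sketch of this PCA projection, assuming the final-layer contract embeddings and their labels are already available as arrays; the variable names here are hypothetical placeholders:

```python
# Sketch of the 2-D PCA projection described above. `embeddings` and
# `labels` are hypothetical stand-ins for the real model outputs.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

embeddings = np.random.randn(100, 128)  # stand-in for final-layer embeddings
labels = np.random.randint(0, 2, 100)   # 1 = buggy, 0 = clean (assumed)

points = PCA(n_components=2).fit_transform(embeddings)
plt.scatter(points[:, 0], points[:, 1], c=labels, cmap="coolwarm", s=12)
plt.title("Smart contract embeddings after PCA (reentrancy test set)")
plt.show()
```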
Check out this link to see more analysis.