Python3 adaptation #291

Open: 53 commits to be merged into base: master
a5d062f
updated to python3
Aug 23, 2020
f04f22f
make develop runs
Aug 23, 2020
2886772
add some of the extras, makes the install neater
Aug 27, 2020
c2fe3a8
figuring out s3 references
Aug 27, 2020
1b990fc
update to make it clear its python3
Aug 27, 2020
4f48ac4
mistake in ProTECT_config
Sep 3, 2020
3e934f3
fixed to fastq files generated using picards SamToFastq
drkthomp Sep 5, 2020
b5ca956
fixing sort mechanism (TypeError: '<' not supported between instances…
drkthomp Sep 5, 2020
eab7983
broke the makefile oops. actually installs ProTECT now.
drkthomp Sep 5, 2020
0e1fa41
update universal options
drkthomp Sep 5, 2020
56ad6c4
change to local
drkthomp Oct 8, 2020
235519c
final update, including mem changes
drkthomp Oct 11, 2020
bba742a
error logs insufficient resources fixed 2020101023XX
drkthomp Oct 11, 2020
efad0ca
pip installing s3am is useless as it is python2. local only for now
drkthomp Oct 11, 2020
1d2bdb9
debug from 2 to 3 due to changes in check_output
drkthomp Oct 11, 2020
f70d319
version is python2 only. TODO figure out fix
drkthomp Oct 11, 2020
66053d3
ignore jobStore
drkthomp Oct 11, 2020
2bebd75
mem downgrade, not tested with toil (check still commented out)
drkthomp Oct 11, 2020
9d5edbf
new error. possible /data path is wrong or docker?
drkthomp Oct 11, 2020
a2b27f7
more exact error
drkthomp Oct 19, 2020
e6cedc0
error
drkthomp Oct 20, 2020
351e855
file is not valid for python2. probably want iobase
drkthomp Oct 20, 2020
5a4c50d
IOBASE import
drkthomp Oct 23, 2020
07f185e
errors
drkthomp Nov 8, 2020
18e04a8
mustard errors
Nov 23, 2020
53496d0
core 80.0 error, hardcoding 20
Nov 26, 2020
5b98383
pandas ix deprication fix
Nov 26, 2020
d3c291d
gunzip error
Nov 28, 2020
66bc12d
gunzip error
Nov 28, 2020
8ab1a8b
working run
Nov 28, 2020
cba1725
Update READ.me
drkthomp Nov 28, 2020
6442962
record changes
drkthomp Nov 28, 2020
09f0b0a
Update README.md
drkthomp Nov 28, 2020
2cda713
touchups
drkthomp Nov 28, 2020
2041f7b
Update README.md
drkthomp Nov 28, 2020
c0a32d4
remove outdated bd2k-lib
drkthomp Nov 28, 2020
91955fd
Merge branch 'master' of https://github.com/Dranion/protect
drkthomp Nov 28, 2020
395d66f
functioning virtualenv check
drkthomp Nov 28, 2020
68074fb
ignore extras
drkthomp Nov 28, 2020
793117e
WIP: custom s3am and bd2k-lib py3 install
drkthomp Nov 28, 2020
2a71513
Update MANUAL.md
drkthomp Nov 28, 2020
0cc5566
update manual to dranion ver
drkthomp Nov 28, 2020
bfdefcd
quick push
drkthomp Nov 28, 2020
46f52f7
Merge branch 'master' of https://github.com/Dranion/protect
drkthomp Nov 28, 2020
ba9c294
Delete \
drkthomp Nov 28, 2020
f54b7cf
Update MANUAL.md
drkthomp Nov 28, 2020
a8a2a95
fix check_venv with toil fix
drkthomp Nov 28, 2020
5f02442
Update MANUAL.md
drkthomp Nov 28, 2020
336e2e2
Update MANUAL.md
drkthomp Nov 29, 2020
8b69e11
debug statement cleanup
drkthomp Dec 7, 2020
a1b58eb
remove extra
drkthomp Dec 7, 2020
8a69d07
Update README.md
drkthomp Jan 12, 2021
5c8c718
Remove custom yaml
drkthomp Jan 12, 2021
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,10 +1,12 @@
bd2k-extras/
pimmuno.py
pimmuno_2.py
*.pyc
/src/*.egg-info/
develop_data/
venv/
.cache/
jobStore/
test-report.xml
__pycache__
*.DONE
8 changes: 8 additions & 0 deletions .idea/.gitignore
6 changes: 6 additions & 0 deletions .idea/inspectionProfiles/profiles_settings.xml
7 changes: 7 additions & 0 deletions .idea/misc.xml
8 changes: 8 additions & 0 deletions .idea/modules.xml
18 changes: 18 additions & 0 deletions .idea/protect3.iml
13 changes: 13 additions & 0 deletions .idea/runConfigurations/Basic_Run_.xml
6 changes: 6 additions & 0 deletions .idea/vcs.xml

64 changes: 18 additions & 46 deletions MANUAL.md
@@ -27,87 +27,59 @@ ProTECT is implemented in the [Toil](https://github.com/BD2KGenomics/toil.git) f
runs the workflow described in [protect/Flowchart.txt](
https://github.com/BD2KGenomics/protect/blob/master/Flowchart.txt).

**This manual is a quick adaptation of the ProTECT manual for the Python 3 port**


# Installation

ProTECT requires Toil and we recommend installing ProTECT and its requirements in a
[virtualenv](http://docs.python-guide.org/en/latest/dev/virtualenvs/).

ProTECT also requires [s3am](https://github.com/BD2KGenomics/s3am.git) version 2.0.1 to download and
~ProTECT also requires [s3am](https://github.com/BD2KGenomics/s3am.git) version 2.0.1 to download and
upload files from S3. We recommend installing s3am in its own virtualenv using the directions in
the s3am manual, then putting the s3am binary on your $PATH. ProTECT will NOT attempt to install
s3am during installation.
s3am during installation.~

Currently a WIP. For now, **only references to local files will work**. Anything that requires access to s3am (S3 buckets) will **fail**.

ProTECT uses pkg_resources from setuptools to verify versions of tools during install. As of setuptools
~ProTECT uses pkg_resources from setuptools to verify versions of tools during install. As of setuptools
39.0.1, some modules were moved to the packaging module. If your machine has setuptools >=39.0.1, you
will need the packaging module.
will need the packaging module.~

Lastly, ProTECT uses [docker](https://www.docker.com/) to run the various sub-tools in a
reproducible, platform independent manner. ProTECT will NOT attempt to install docker during
installation.

### Method 1 - Using PIP (recommended)

First create a virtualenv at your desired location (Here we create it in the folder ~/venvs)

virtualenv ~/venvs/protect

Activate the virtualenv

source ~/venvs/protect/bin/activate

NOTE: Installation was tested using pip 7.1.2 and 8.1.1. We have seen issues with the installation
of pyYAML with lower versions of pip and recommend upgrading pip before installing ProTECT.

pip install --upgrade pip

Install Toil

pip install toil[aws]==3.5.2

Install packaging (required if setuptools>=39.0.1)

pip install packaging

Install ProTECT and all dependencies in the virtualenv

pip install protect

~Method 1 - Using PIP (recommended)~
### Method 2 - Installing from Source

This will install ProTECT in an editable mode.

Obtain the source from Github

git clone https://www.github.com/BD2KGenomics/protect.git
git clone https://www.github.com/Dranion/protect.git

Create and activate a virtualenv in the project folder (Important since the Makefile checks for
this and will fail if it detects that you are not in a virtual environment)

cd protect
virtualenv venv
virtualenv --python=python3 venv
source venv/bin/activate

Install Toil and pytest

make prepare

Install packaging (required if setuptools>=39.0.1)
Install the python3 conversion of bd2k and s3am. *s3am is untested as I am running locally*

pip install packaging
make special_install

Install ProTECT

make develop

## Method 3 - Using Docker
~Method 3 - Using Docker~

Dockerized versions of ProTECT releases can be found at https://quay.io/organization/ucsc_cgl. These
Docker containers run the ProTECT pipeline in single machine mode. The only difference between the
Docker and Python versions of the pipeline is that the Docker container takes the config options,
described below, as command line arguments as opposed to a config file. Running the container
without any arguments will list all the available options. Also, currently the dockerized version of
ProTECT only supports local file export.

# Running ProTECT

@@ -173,7 +145,7 @@ in the pipeline, and the information on the input samples. Elements before a `:`
dictionary read into ProTECT and should **NOT** be modified (Barring the patient ID key in the
patients dictionary). Only values to the right of the `:` should be edited.

Every required reference file is provided in the AWS bucket `cgl-pipeline-inputs` under the folder
Every required reference file is provided in the AWS bucket `protect-data` under the folder
`protect/hg19_references` or `protect/hg38_references`. The `README` file in the same location
describes in detail how each file was generated. To use a file located in an s3 bucket, replace
`/path/to` in the following descriptions with `s3://<databucket>/<folder_in_bucket>`.
@@ -547,7 +519,7 @@ purposes:
12: g/f/jobO4yiE4 return self.run(fileStore)
13: g/f/jobO4yiE4 File "/home/ucsc/arjun/tools/dev/toil_clean/src/toil/job.py", line 1406, in run
14: g/f/jobO4yiE4 rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
15: g/f/jobO4yiE4 File "/home/ucsc/arjun/tools/protect_toil_clean/local/lib/python2.7/site-packages/protect/binding_prediction/common.py", line 566, in merge_mhc_peptide_calls
15: g/f/jobO4yiE4 File "/home/ucsc/arjun/tools/protect_toil_clean/local/lib/python3/site-packages/protect/binding_prediction/common.py", line 566, in merge_mhc_peptide_calls
16: g/f/jobO4yiE4 raise RuntimeError('No peptides available for ranking')
17: g/f/jobO4yiE4 RuntimeError: No peptides available for ranking
18: g/f/jobO4yiE4 ERROR:toil.worker:Exiting the worker because of a failed job on host sjcb10st7
@@ -581,9 +553,9 @@ do not store logs from tools (see BD2KGenomics/protect#275). The error looks sim
Z/O/job1uH92D return self.run(fileStore)
Z/O/job1uH92D File "/home/ucsc/arjun/tools/dev/toil_clean/src/toil/job.py", line 1406, in run
Z/O/job1uH92D rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
Z/O/job1uH92D File "/home/ucsc/arjun/tools/protect_toil_clean/local/lib/python2.7/site-packages/protect/mutation_calling/radia.py", line 238, in run_filter_radia
Z/O/job1uH92D File "/home/ucsc/arjun/tools/protect_toil_clean/local/lib/python3/site-packages/protect/mutation_calling/radia.py", line 238, in run_filter_radia
Z/O/job1uH92D tool_version=radia_options['version'])
Z/O/job1uH92D File "/home/ucsc/arjun/tools/protect_toil_clean/local/lib/python2.7/site-packages/protect/common.py", line 138, in docker_call
Z/O/job1uH92D File "/home/ucsc/arjun/tools/protect_toil_clean/local/lib/python3/site-packages/protect/common.py", line 138, in docker_call
Z/O/job1uH92D 'for command \"%s\"' % ' '.join(call),)
Z/O/job1uH92D RuntimeError: docker command returned a non-zero exit status (1)for command "docker run --rm=true -v /scratch/bio/ucsc/toil-681c097c-61da-4687-b734-c5051f0aa19f/tmped2fnu/f041f939-5c0d-40be-a884-68635e929d09:/data --log-driver=none aarjunrao/filterradia:bcda721fc1f9c28d8b9224c2f95c440759cd3a03 TCGA-CH-5788 17 /data/radia.vcf /data /home/radia/scripts -d /data/radia_dbsnp -r /data/radia_retrogenes -p /data/radia_pseudogenes -c /data/radia_cosmic -t /data/radia_gencode --noSnpEff --noBlacklist --noTargets --noRnaBlacklist -f /data/hg38.fa --log=INFO -g /data/radia_filtered_chr17_radia.log"
Z/O/job1uH92D ERROR:toil.worker:Exiting the worker because of a failed job on host sjcb10st1
20 changes: 12 additions & 8 deletions Makefile
100644 → 100755
@@ -45,17 +45,22 @@ help:
@echo "$$help"


python=python2.7
pip=pip2.7
python=python
pip=pip
tests=src/protect/test/unit
extras=

green=\033[0;32m
normal=\033[0m
red=\033[0;31m

# WIP
special_install: check_venv
git clone https://github.com/Dranion/bd2k-extras.git
make -C bd2k-extras/bd2k-python-lib develop
make -C bd2k-extras/s3am develop

prepare: check_venv
@$(pip) install toil==3.8.0 pytest==2.8.3
@$(pip) install toil pytest

develop: check_venv
$(pip) install -e .$(extras)
@@ -107,11 +112,10 @@ clean_pypi:

clean: clean_develop clean_sdist clean_pypi


check_venv:
@$(python) -c 'import sys; sys.exit( int( not hasattr(sys, "real_prefix") ) )' \
|| ( echo "$(red)A virtualenv must be active.$(normal)" ; false )

@$(python) -c 'import sys; sys.exit( int( not (hasattr(sys, "real_prefix") or ( hasattr(sys, "base_prefix") and sys.base_prefix != sys.prefix ) ) ) )' \
|| [ ! -z "${VIRTUAL_ENV}" ] \
|| ( echo "$(red)A virtualenv must be active.$(normal)\n" ; false )

check_clean_working_copy:
@echo "$(green)Checking if your working copy is clean ...$(normal)"
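
The revised `check_venv` target above combines three signals: `sys.real_prefix`, a `sys.base_prefix` that differs from `sys.prefix`, and the `VIRTUAL_ENV` environment variable. The same logic may be easier to read as a standalone Python sketch. The function name `in_virtualenv` and its two parameters are hypothetical and not part of this PR; they exist only so the check can be exercised with stand-in values.

```python
import os
import sys
from types import SimpleNamespace


def in_virtualenv(sys_mod=sys, environ=os.environ):
    """Return True if the interpreter appears to run inside a virtual environment.

    Hypothetical helper mirroring the Makefile one-liner; not part of ProTECT.
    """
    # Older virtualenv releases record the base interpreter in sys.real_prefix.
    if hasattr(sys_mod, "real_prefix"):
        return True
    # The stdlib venv module keeps the base install in sys.base_prefix,
    # which differs from sys.prefix only inside an environment.
    if getattr(sys_mod, "base_prefix", sys_mod.prefix) != sys_mod.prefix:
        return True
    # Fallback used by the Makefile: activation scripts export VIRTUAL_ENV.
    return bool(environ.get("VIRTUAL_ENV"))


if __name__ == "__main__":
    # Stand-ins for the sys module, for illustration only.
    system_python = SimpleNamespace(prefix="/usr", base_prefix="/usr")
    venv_python = SimpleNamespace(prefix="/home/me/venv", base_prefix="/usr")
    print(in_virtualenv(system_python, {}))
    print(in_virtualenv(venv_python, {}))
    print(in_virtualenv())  # result depends on the running interpreter
```

The environment-variable fallback makes the check more permissive: it passes whenever `VIRTUAL_ENV` is set, even if `$(python)` resolves to an interpreter outside that environment.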
7 changes: 5 additions & 2 deletions README.md
@@ -1,7 +1,10 @@
[![Stories in Ready](https://badge.waffle.io/BD2KGenomics/protect.png?label=ready&title=Ready)](https://waffle.io/BD2KGenomics/protect)
# ProTECT
### **Pr**ediction **o**f **T**-Cell **E**pitopes for **C**ancer **T**herapy

Adaptation of ProTECT to use Python 3.8 instead of 2.7. A complete run has been tested using fastq files from [HCC1395 WGS Exome RNA Seq Data](https://github.com/genome/gms/wiki/HCC1395-WGS-Exome-RNA-Seq-Data), with identical results in both versions of Python.

Adaptation done using 2to3 and manual bug testing. Manual changes are recorded [in changes.md](https://github.com/Dranion/protect/blob/master/changes.md). Since s3am is Python 2 only, **this port is currently local only**; however, an untested Python 3 version of s3am exists [here](https://github.com/Dranion/bd2k-extras/tree/main). Continuing to the original README:

This repo contains the Python libraries for the Precision Immunology Pipeline developed at UCSC.

src/protect/pipeline/ProTECT.py - The python script for running the pipeline.
@@ -20,6 +23,6 @@ All docker images used in this pipeline are available at


To learn how the pipeline can be run on a sample, head over to the [ProTECT Manual](
https://github.com/BD2KGenomics/protect/blob/master/MANUAL.md)
https://github.com/Dranion/protect/blob/master/MANUAL.md)

ProTECT is currently in its infancy and is under continuous development. We would appreciate users sharing the level 3 data produced by ProTECT with us such that we can better train our predictive models.