RayyGao/two-sigma

two-sigma

This project contains our work on the Two Sigma competition on Kaggle. See the Two Sigma description for details of the competition.

Installation

Clone the repository, cd into the project directory, and run make.

This Git repository

The organization of this repository is borrowed from here; below is a condensed summary of some of the ideas from that page.

Overall thought process

  • Each notebook keeps a historical (and dated) record of the analysis as it’s being explored.
  • The notebook is not meant to be anything other than a place for experimentation and development.
  • Each notebook is controlled by a single author: a data scientist on the team (marked with initials).
  • Notebooks can be split when they get too long.
  • Notebooks can be split by topic, if it makes sense.
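The list above asks for notebooks that are dated and tagged with their author's initials. A minimal sketch of a filename helper is below; the helper name `notebook_name` and the `YYYY-MM-DD-<initials>-<description>.ipynb` pattern are assumptions for illustration, not a convention this repository is confirmed to enforce.

```python
import re
from datetime import date
from typing import Optional


def notebook_name(initials: str, description: str, day: Optional[date] = None) -> str:
    """Build a dated, author-tagged notebook filename (hypothetical convention)."""
    day = day or date.today()
    # Keep only lower-case alphanumeric words from the description, joined by hyphens.
    slug = "-".join(re.findall(r"[a-z0-9]+", description.lower()))
    # ISO 8601 dates sort chronologically, so a directory listing doubles as a history.
    return f"{day.isoformat()}-{initials.lower()}-{slug}.ipynb"


if __name__ == "__main__":
    print(notebook_name("XG", "Explore target distribution", date(2017, 3, 1)))
    # prints 2017-03-01-xg-explore-target-distribution.ipynb
```

Putting the date first keeps each author's notebooks in the order they were explored, and splitting a long notebook just means starting a new file with a later date and a narrower description.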

Version Control

When issuing pull requests, the diffs between versions of an .ipynb file are hard to read, because .ipynb files are saved as JSON. One common workaround is to commit a conversion of the notebook to .py instead. This makes changes to the input code easy to see (at the cost of discarding the output). However, when reviewing data science work, it is also very important to see the output itself.
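To make the trade-off concrete, the sketch below pulls the code cells out of a notebook's JSON and joins them into a script, dropping markdown cells and outputs. It is a simplified stand-in for what `jupyter nbconvert --to script` does, not nbconvert itself; the function name `code_cells_as_script` is hypothetical, and it assumes the nbformat v4 layout (a top-level `cells` list whose entries have `cell_type` and `source`).

```python
import json


def code_cells_as_script(ipynb_text: str) -> str:
    """Return the code cells of a v4 notebook as one script; outputs are discarded."""
    nb = json.loads(ipynb_text)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are skipped
        src = cell.get("source", "")
        # nbformat allows source as a single string or a list of lines.
        if isinstance(src, list):
            src = "".join(src)
        chunks.append(src.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


if __name__ == "__main__":
    sample = {
        "cells": [
            {"cell_type": "markdown", "source": ["# Title"]},
            {"cell_type": "code", "source": ["x = 1\n", "y = x + 1"],
             "outputs": [{"output_type": "execute_result"}]},
            {"cell_type": "code", "source": "print(y)\n", "outputs": []},
        ],
        "metadata": {}, "nbformat": 4, "nbformat_minor": 2,
    }
    print(code_cells_as_script(json.dumps(sample)))
```

The `outputs` entries never reach the script, which is exactly why a .py diff alone cannot show how results changed.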

We get around these difficulties by committing the .ipynb, .py, and .html versions of every notebook. The .py and .html files can be generated automatically by editing the config file at ~/.jupyter/jupyter_notebook_config.py. If you don't have this file, run `jupyter notebook --generate-config` to create it. Then add the following code to the config file:

```python
c = get_config()  # provided by Jupyter when it loads this config file

### If you want to auto-save .html and .py versions of your notebook:
# modified from: https://github.com/ipython/ipython/issues/8009
import os
from subprocess import check_call

def post_save(model, os_path, contents_manager):
    """Post-save hook: convert a saved notebook to a .py script and an .html page."""
    if model['type'] != 'notebook':
        return  # only do this for notebooks
    d, fname = os.path.split(os_path)
    # Run nbconvert from the notebook's own directory so the outputs land next to it.
    check_call(['jupyter', 'nbconvert', '--to', 'script', fname], cwd=d)
    check_call(['jupyter', 'nbconvert', '--to', 'html', fname], cwd=d)

# Register the hook so it runs after every save.
c.FileContentsManager.post_save_hook = post_save
```
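The hook only fires when a notebook is saved from Jupyter, so notebooks committed before it was installed have no .py or .html companions. A one-off batch conversion could be sketched as below; `convert_all` is a hypothetical helper that reuses the same two `jupyter nbconvert` commands as the hook, and its `dry_run` flag (an assumption, not an nbconvert option) only collects the commands without running them.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def convert_all(root: str, dry_run: bool = False) -> List[Tuple[str, List[str]]]:
    """Run the hook's nbconvert commands on every notebook under root (hypothetical helper)."""
    commands = []
    for nb_path in sorted(Path(root).rglob("*.ipynb")):
        # Skip Jupyter's autosave copies, which live in .ipynb_checkpoints folders.
        if ".ipynb_checkpoints" in nb_path.parts:
            continue
        for fmt in ("script", "html"):
            cmd = ["jupyter", "nbconvert", "--to", fmt, nb_path.name]
            commands.append((str(nb_path.parent), cmd))
            if not dry_run:
                # Same as the hook: run from the notebook's directory.
                subprocess.check_call(cmd, cwd=nb_path.parent)
    return commands


if __name__ == "__main__":
    for cwd, cmd in convert_all(".", dry_run=True):
        print(cwd, " ".join(cmd))
```

Running it once with `dry_run=True` shows which notebooks would be touched before anything is written.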

Directory structure

The directory structure should be mostly self-explanatory; see the blog post linked above if you need clarification.

TODO

  • Move final products in /develop to /src.

Authors

This is a team project for the Kaggle Two Sigma competition. Team "NULL": Xu Gao, Yaxiong Huang, Scott Edenbaum, Dodge Coates.

About

Analysis of Kaggle's "Two Sigma" contest
