<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />
<title>discussion</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<style>
html,body{color:black}*:not('#mkdbuttons'){margin:0;padding:0}#wrapper{font:15px helvetica,arial,freesans,clean,sans-serif;-webkit-font-smoothing:antialiased;line-height:1.7;padding:3px;background:#fff;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px}p{margin:1em 0}a{color:#4183c4;text-decoration:none}#wrapper{background-color:#fff;padding:30px;margin:15px;font-size:15px;line-height:1.6}#wrapper>*:first-child{margin-top:0 !important}#wrapper>*:last-child{margin-bottom:0 !important}@media screen{#wrapper{box-shadow:0 0 0 1px #cacaca, 0 0 0 4px #eee}}h1,h2,h3,h4,h5,h6{font-weight:700;line-height:1.7;cursor:text;position:relative;margin:1em 0 15px;padding:0}h1{font-size:2.5em;border-bottom:1px solid #ddd}h2{font-size:2em;border-bottom:1px solid #eee}h3{font-size:1.5em}h4{font-size:1.2em}h5{font-size:1em}h6{color:#777;font-size:1em}p,blockquote,table,pre{margin:15px 0}ul{padding-left:30px}ol{padding-left:30px}ol li ul:first-of-type{margin-top:0px}hr{background:transparent url() repeat-x 0 0;border:0 none;color:#ccc;height:4px;margin:15px 0;padding:0}#wrapper>h2:first-child{margin-top:0;padding-top:0}#wrapper>h1:first-child{margin-top:0;padding-top:0}#wrapper>h1:first-child+h2{margin-top:0;padding-top:0}#wrapper>h3:first-child,#wrapper>h4:first-child,#wrapper>h5:first-child,#wrapper>h6:first-child{margin-top:0;padding-top:0}a:first-child h1,a:first-child h2,a:first-child h3,a:first-child h4,a:first-child h5,a:first-child h6{margin-top:0;padding-top:0}h1+p,h2+p,h3+p,h4+p,h5+p,h6+p,ul li>:first-child,ol li>:first-child{margin-top:0}dl{padding:0}dl dt{font-size:14px;font-weight:bold;font-style:italic;padding:0;margin:15px 0 5px}dl dt:first-child{padding:0}dl dt>:first-child{margin-top:0}dl dt>:last-child{margin-bottom:0}dl dd{margin:0 0 15px;padding:0 15px}dl dd>:first-child{margin-top:0}dl dd>:last-child{margin-bottom:0}blockquote{border-left:4px solid #DDD;padding:0 
15px;color:#777}blockquote>:first-child{margin-top:0}blockquote>:last-child{margin-bottom:0}table{border-collapse:collapse;border-spacing:0;font-size:100%;font:inherit}table th{font-weight:bold;border:1px solid #ccc;padding:6px 13px}table td{border:1px solid #ccc;padding:6px 13px}table tr{border-top:1px solid #ccc;background-color:#fff}table tr:nth-child(2n){background-color:#f8f8f8}img{max-width:100%}code,tt{margin:0 2px;padding:0 5px;white-space:nowrap;border:1px solid #eaeaea;background-color:#f8f8f8;border-radius:3px;font-family:Consolas, 'Liberation Mono', Courier, monospace;font-size:12px;color:#333}pre>code{margin:0;padding:0;white-space:pre;border:none;background:transparent}.highlight pre{background-color:#f8f8f8;border:1px solid #ccc;font-size:13px;line-height:19px;overflow:auto;padding:6px 10px;border-radius:3px}pre{background-color:#f8f8f8;border:1px solid #ccc;font-size:14px;line-height:19px;overflow:auto;padding:6px 10px;border-radius:3px;margin:26px 0}pre code,pre tt{background-color:transparent;border:none}.poetry pre{font-family:Georgia, Garamond, serif !important;font-style:italic;font-size:110% !important;line-height:1.6em;display:block;margin-left:1em}.poetry pre code{font-family:Georgia, Garamond, serif !important;word-break:break-all;word-break:break-word;-webkit-hyphens:auto;-moz-hyphens:auto;hyphens:auto;white-space:pre-wrap}sup,sub,a.footnote{font-size:1.4ex;height:0;line-height:1;vertical-align:super;position:relative}sub{vertical-align:sub;top:-1px}@media print{body{background:#fff}img,pre,table,figure{page-break-inside:avoid}#wrapper{background:#fff;border:none}pre code{overflow:visible}}@media screen{body.inverted{color:#eee !important;border-color:#555;box-shadow:none}.inverted #wrapper,.inverted hr,.inverted p,.inverted td,.inverted li,.inverted h1,.inverted h2,.inverted h3,.inverted h4,.inverted h5,.inverted h6,.inverted th,.inverted .math,.inverted caption,.inverted dd,.inverted dt,.inverted blockquote{color:#eee 
!important;border-color:#555;box-shadow:none}.inverted td,.inverted th{background:#333}.inverted pre,.inverted code,.inverted tt{background:#eeeeee !important;color:#111}.inverted h2{border-color:#555555}.inverted hr{border-color:#777;border-width:1px !important}::selection{background:rgba(157,193,200,0.5)}h1::selection{background-color:rgba(45,156,208,0.3)}h2::selection{background-color:rgba(90,182,224,0.3)}h3::selection,h4::selection,h5::selection,h6::selection,li::selection,ol::selection{background-color:rgba(133,201,232,0.3)}code::selection{background-color:rgba(0,0,0,0.7);color:#eeeeee}code span::selection{background-color:rgba(0,0,0,0.7) !important;color:#eeeeee !important}a::selection{background-color:rgba(255,230,102,0.2)}.inverted a::selection{background-color:rgba(255,230,102,0.6)}td::selection,th::selection,caption::selection{background-color:rgba(180,237,95,0.5)}.inverted{background:#0b2531;background:#252a2a}.inverted #wrapper{background:#252a2a}.inverted a{color:#acd1d5}}.highlight .c{color:#998;font-style:italic}.highlight .err{color:#a61717;background-color:#e3d2d2}.highlight .k,.highlight .o{font-weight:bold}.highlight .cm{color:#998;font-style:italic}.highlight .cp{color:#999;font-weight:bold}.highlight .c1{color:#998;font-style:italic}.highlight .cs{color:#999;font-weight:bold;font-style:italic}.highlight .gd{color:#000;background-color:#fdd}.highlight .gd .x{color:#000;background-color:#faa}.highlight .ge{font-style:italic}.highlight .gr{color:#a00}.highlight .gh{color:#999}.highlight .gi{color:#000;background-color:#dfd}.highlight .gi .x{color:#000;background-color:#afa}.highlight .go{color:#888}.highlight .gp{color:#555}.highlight .gs{font-weight:bold}.highlight .gu{color:#800080;font-weight:bold}.highlight .gt{color:#a00}.highlight .kc,.highlight .kd,.highlight .kn,.highlight .kp,.highlight .kr{font-weight:bold}.highlight .kt{color:#458;font-weight:bold}.highlight .m{color:#099}.highlight .s{color:#d14}.highlight .na{color:#008080}.highlight 
.nb{color:#0086B3}.highlight .nc{color:#458;font-weight:bold}.highlight .no{color:#008080}.highlight .ni{color:#800080}.highlight .ne,.highlight .nf{color:#900;font-weight:bold}.highlight .nn{color:#555}.highlight .nt{color:#000080}.highlight .nv{color:#008080}.highlight .ow{font-weight:bold}.highlight .w{color:#bbb}.highlight .mf,.highlight .mh,.highlight .mi,.highlight .mo{color:#099}.highlight .sb,.highlight .sc,.highlight .sd,.highlight .s2,.highlight .se,.highlight .sh,.highlight .si,.highlight .sx{color:#d14}.highlight .sr{color:#009926}.highlight .s1{color:#d14}.highlight .ss{color:#990073}.highlight .bp{color:#999}.highlight .vc,.highlight .vg,.highlight .vi{color:#008080}.highlight .il{color:#099}.highlight .gc{color:#999;background-color:#EAF2F5}.type-csharp .highlight .k,.type-csharp .highlight .kt{color:#00F}.type-csharp .highlight .nf{color:#000;font-weight:normal}.type-csharp .highlight .nc{color:#2B91AF}.type-csharp .highlight .nn{color:#000}.type-csharp .highlight .s,.type-csharp .highlight .sc{color:#A31515}body.dark #wrapper{background:transparent !important;box-shadow:none !important}
@media print{
#generated-toc-clone,#generated-toc{display:none!important}hr{border:none!important;page-break-after:always!important}
}
body { font-size: 14px }
#wrapper * { font-size: 100%!important; }
</style>
</head>
<body class="normal">
<div id="wrapper">
<h1 id="whatwouldyoulikeyourstudentstolearnina2ndcourseonstatisticalcomputing">What would you like your students to learn in a 2nd course on statistical computing?</h1>
<blockquote>
<p>Make it run, make it right, make it fast. </p>
</blockquote>
<h2 id="background">Background</h2>
<ul>
<li>Offering STA 633: Statistical computing and computation in Spring 2015</li>
<li>Schedule is 2 lectures (75 mins) and 1 lab (75 mins) per week</li>
<li>This is a <em>2nd</em> course in statistical computing - the prerequisite is Colin Rundel’s class
<ul>
<li><a href="https://stat.duke.edu/~cr173/Sta523_Fa14/">STA 523</a></li>
<li>Quite fast-paced - recommended books are
<ul>
<li>Advanced R - Wickham</li>
<li>R Packages - Wickham</li>
</ul></li>
<li>STA 523 will cover the <code>Unix shell</code>, <code>make</code>, <code>git</code>, <code>markdown</code> and programming in R</li>
<li>Students who have not taken STA 523 will take a pretest to determine eligibility</li>
</ul></li>
</ul>
<h2 id="proposedlearningobjectives">Proposed learning objectives</h2>
<ul>
<li>Basically teach <em>all the computing that we would personally like to see in a PhD student or postdoc working with us</em>
<ul>
<li>Comfortable using both high-level (Python/Julia?/R) and low-level (C/C++) languages</li>
<li>Understand data management and the use of relational databases
<ul>
<li>Working with “bad” data
<ul>
<li><font color=red>Examples?</font></li>
</ul></li>
<li>Hands-on exercise building a normalized database from a spreadsheet and querying it via SQL</li>
</ul></li>
<li>Can build reproducible data analysis pipelines (testing + make + literate programming)</li>
<li>Can convert a statistical model (e.g. from a manuscript or textbook) into a numerical algorithm
<ul>
<li>Understanding of basic algorithms for optimization, simulation and smoothing
<ul>
<li>Building blocks for large classes of statistical algorithms</li>
<li><font color=red>What algorithms should students know?</font></li>
</ul></li>
<li>Pragmatic usage of libraries for established numerical routines
<ul>
<li><font color=red>Recommendations for C/C++ libraries</font></li>
</ul></li>
</ul></li>
<li>Can write code that is <em>correct</em>
<ul>
<li>How much and what kind of testing is appropriate?</li>
<li>How to test code with stochastic elements</li>
</ul></li>
<li>Can write code that runs <em>fast</em>
<ul>
<li>Trade-off between computation and programmer time (premature optimization)</li>
<li>Some understanding of complexity trade-offs for algorithms and data structures</li>
<li>Benchmarking and profiling</li>
<li>JIT compilation</li>
<li>Writing native code</li>
<li>Exploiting multiple cores (threading, multiprocessing, OpenMP)</li>
<li>Exploiting multiple machines (MPI)</li>
<li>Exploiting GPUs (CUDA, maybe OpenCL)</li>
<li>Working with really big data (MapReduce)</li>
</ul></li>
</ul></li>
</ul>
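<p>As one possible answer to "how to test code with stochastic elements", the sketch below fixes the random seed so the test is repeatable and accepts the result if it lies within a few standard errors of a known exact value. The function names, the seed, and the 5-standard-error tolerance are illustrative assumptions, not part of the course plan.</p>

```python
import math
import random


def mc_mean_of_squares(n, rng):
    """Monte Carlo estimate of E[U^2] for U ~ Uniform(0, 1); the exact value is 1/3."""
    return sum(rng.random() ** 2 for _ in range(n)) / n


def check_estimator(n=100_000, seed=2015, z=5.0):
    """Return True if the estimate is within z standard errors of 1/3."""
    rng = random.Random(seed)  # fixed seed (arbitrary choice) makes the check repeatable
    estimate = mc_mean_of_squares(n, rng)
    # Var(U^2) = E[U^4] - (E[U^2])^2 = 1/5 - 1/9 = 4/45
    std_error = math.sqrt(4 / 45 / n)
    return z * std_error >= abs(estimate - 1 / 3)


if __name__ == "__main__":
    print("estimator check passed:", check_estimator())
```

<p>Tying the tolerance to the standard error, rather than to a hand-picked constant, keeps the check meaningful when <code>n</code> changes. The fixed seed means a failure points to a code change rather than to sampling noise.</p>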
<h2 id="units">Units</h2>
<p>Unit 1: Reproducible analysis and introducing Python as a glue language (10%)<br/>
Unit 2: Working with data - data munging and relational databases (10%)<br/>
Unit 3: Exploratory data analysis and visualization (10%)<br/>
Unit 4: Core statistical algorithms and libraries (40%)<br/>
Unit 5: C bootcamp, code profiling and writing native code (15%)<br/>
Unit 6: Parallel computing and working with big data (15%) </p>
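<p>For Unit 2, the hands-on exercise of building a normalized database from a spreadsheet might look roughly like the sketch below, which uses Python's standard <code>sqlite3</code> module. The flat rows, table names, and column names are all hypothetical and stand in for a real spreadsheet export.</p>

```python
import sqlite3

# Hypothetical flat "spreadsheet": one row per visit, subject details repeated.
ROWS = [
    ("alice", "F", "2014-09-01", 5.0),
    ("alice", "F", "2014-09-08", 5.5),
    ("bob", "M", "2014-09-01", 6.0),
]


def build_database(rows):
    """Split the flat rows into a subject table and a measurement table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subject (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL,
            sex TEXT NOT NULL
        );
        CREATE TABLE measurement (
            id INTEGER PRIMARY KEY,
            subject_id INTEGER NOT NULL REFERENCES subject(id),
            visit_date TEXT NOT NULL,
            reading REAL NOT NULL
        );
        """
    )
    for name, sex, visit_date, reading in rows:
        # Each subject is stored once; repeated rows are ignored.
        conn.execute(
            "INSERT OR IGNORE INTO subject (name, sex) VALUES (?, ?)", (name, sex)
        )
        (subject_id,) = conn.execute(
            "SELECT id FROM subject WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO measurement (subject_id, visit_date, reading) VALUES (?, ?, ?)",
            (subject_id, visit_date, reading),
        )
    conn.commit()
    return conn


def mean_reading_by_subject(conn):
    """Join the two tables and average the readings per subject."""
    query = """
        SELECT s.name, AVG(m.reading)
        FROM measurement AS m
        JOIN subject AS s ON s.id = m.subject_id
        GROUP BY s.name
        ORDER BY s.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(mean_reading_by_subject(build_database(ROWS)))
```

<p>Storing each subject once and referring to it by key is the normalization step. The query then rebuilds the per-subject view with a <code>JOIN</code>.</p>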
<h2 id="discussion">Discussion</h2>
<ul>
<li><font color=red>Overall course objectives?</font></li>
<li><font color=red>Overall course content?</font>
<ul>
<li>Are there useful classes of topics we have left out?</li>
<li>Within each topic, what content should students learn?
<ul>
<li>Unit 1: Reproducible analysis and introducing Python as a glue language (10%)</li>
<li>Unit 2: Working with data - data munging and relational databases (10%)</li>
<li>Unit 3: Exploratory data analysis and visualization (10%)</li>
<li>Unit 4: Core statistical algorithms and libraries (40%)</li>
<li>Unit 5: C bootcamp, code profiling and writing native code (15%)</li>
<li>Unit 6: Parallel computing and working with big data (15%)</li>
</ul></li>
</ul></li>
<li><font color=red>How can programming be taught effectively?</font>
<ul>
<li>Every good programmer I know is self-taught …</li>
<li>Weekly multiple-choice questions (MCQs) as a rapid sanity check on the level of understanding</li>
<li>Less talking, more doing - mini-project after each unit</li>
<li>Individual or group work?</li>
</ul></li>
<li><font color=red>Which statistical algorithms should students know?</font>
<ul>
<li>Know the theory and how to use a good implementation
<ul>
<li>Teach understanding with a toy example</li>
<li>Use a library to solve a more realistic problem</li>
</ul></li>
<li>Examples
<ul>
<li>Linear algebra - e.g. projection, normal equations</li>
<li>Optimization - e.g. Newton, IRLS, multivariate gradient descent, EM</li>
<li>Simulation - resampling methods, Monte Carlo, MCMC</li>
<li>Others? Smoothing, interpolation, etc.</li>
</ul></li>
</ul></li>
<li><font color=red>What are good data sets and problems to use for teaching?</font>
<ul>
<li>Bad data</li>
<li>Big data</li>
<li>Slow and fast versions</li>
</ul></li>
</ul>
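<p>As an illustration of "teach understanding with a toy example", here is a minimal sketch of Newton's method for a one-parameter maximum likelihood problem. The exponential model, the data, and the function name are assumptions chosen because the closed-form answer (one over the sample mean) makes the result easy to check.</p>

```python
def newton_exponential_mle(data, rate=0.5, tol=1e-10, max_iter=50):
    """Newton's method for the rate of an exponential model.

    Log-likelihood: l(rate) = n * log(rate) - rate * sum(data).
    The starting value is assumed to lie between 0 and 2 / mean(data),
    where this iteration is known to converge.
    """
    n = len(data)
    total = sum(data)
    for _ in range(max_iter):
        score = n / rate - total      # first derivative of l
        hessian = -n / rate ** 2      # second derivative of l
        step = score / hessian
        rate = rate - step            # Newton update
        if tol > abs(step):
            break
    return rate


if __name__ == "__main__":
    toy_data = [0.5, 1.5, 1.0, 2.0, 1.0]  # made-up observations
    print("Newton estimate:", newton_exponential_mle(toy_data))
    print("Closed form:    ", len(toy_data) / sum(toy_data))
```

<p>A natural follow-up exercise is to solve a problem with no closed form using an established library routine and compare the two approaches.</p>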
<!-- ##END MARKED WRAPPER## -->
</div>
</body>
</html>