% author: sam tenka
% change: 2023-05-25
% create: 2022-05-11
\documentclass[11pt, justified]{tufte-book}
\geometry{
left = 0.90in, % left margin
textwidth = 4.95in, % main text block
marginparsep = 0.15in, % gutter between main text block and margin notes
marginparwidth = 2.30in, % width of margin notes
% 0.20in % width from margin to edge
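% (these sum to 0.90 + 4.95 + 0.15 + 2.30 + 0.20 = 8.50in, the width of US letter paper)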
}
\input{sam.sty}
\usepackage{tikz-cd}
\begin{document}\samtitle{basics of machine learning}
\sampassage{contributors}
\input{CONTRIBUTORS}
\sampassage{hallo!}
These optional and still-evolving notes for 6.86x give a big-picture view of
our conceptual landscape. They might help you study, but you can do all your
assignments without them. The notes treat many, but not all, of the topics
of our course; they also treat a few topics not emphasized in class, to
fill gaps for curious learners.

I'm happy to answer questions about the notes on the forum. If you want to
help improve these notes, ask me; I'll be happy to list you as a contributor!

We compiled these notes using the following color palette; let us know if
the colors are hard to distinguish.
{\Huge\vspace{-1.5cm}\[\substack{%
{\sky\blacksquare}
{\blu\blacksquare}
{\sea\blacksquare}\\
{\pch\blacksquare}
{\rng\blacksquare}
{\bro\blacksquare}}
\]}
\marginnote[-6cm]{%
\textsc{Clickable Table of Contents}\vspace{0.05cm}
\begin{description}
  \item[\hyperlink{0}{0. prologue}] \phdot \hfill\pageref{part:0}
    \begin{description}
      \item[\hyperlink{0a}{{\color{gray}0} what is learning?}]
      \item[\hyperlink{0b}{{\color{gray}1} a tiny example: handwritten digits}]
      \item[\hyperlink{0c}{{\color{gray}2} how well did we do?}]
      \item[\hyperlink{0d}{{\color{gray}3} how can we do better?}]
    \end{description}
  \item[\hyperlink{1}{1. Learning from Examples}] \phdot \hfill\pageref{part:1}
    \begin{description}
      \item[\hyperlink{1a}{{\color{gray}0} goodness of fit}]
      \item[\hyperlink{1b}{{\color{gray}1} smarter optimization}]
      \item[\hyperlink{1c}{{\color{gray}2} intrinsic plausibility}]
      \item[\hyperlink{1d}{{\color{gray}3} model selection}]
    \end{description}
  \item[\hyperlink{2}{2. Engineering Features (and Friends) by Hand}] \phdot \hfill\pageref{part:2}
    \begin{description}
      \item[\hyperlink{2a}{{\color{gray}0} featurization}]
      \item[\hyperlink{2b}{{\color{gray}1} quantifying uncertainty}]
      \item[\hyperlink{2c}{{\color{gray}2} iterative optimization}]
      \item[\hyperlink{2d}{{\color{gray}3} data-dependent features}]
    \end{description}
  \item[\hyperlink{3}{3. Learning Features from Data}] \phdot \hfill\pageref{part:3}
    \begin{description}
      \item[\hyperlink{3a}{{\color{gray}0} shallow learning}]
      \item[\hyperlink{3b}{{\color{gray}1} deep learning: ideas in architecture}]
      \item[\hyperlink{3c}{{\color{gray}2} symmetry and locality: convolution}]
      \item[\hyperlink{3d}{{\color{gray}3} symmetry and locality: attention}]
    \end{description}
  \item[\hyperlink{4}{4. Modeling Structured Uncertainties}] \phdot \hfill\pageref{part:4}
    \begin{description}
      \item[\hyperlink{4a}{{\color{gray}0} probabilistic models}]
      \item[\hyperlink{4b}{{\color{gray}1} inference by variation: EM}]
      \item[\hyperlink{4c}{{\color{gray}2} inference by sampling: MH}]
      \item[\hyperlink{4d}{{\color{gray}3} deep generative models}]
    \end{description}
  \item[\hyperlink{5}{5. Learning while Acting (and from Rewards)}] \phdot \hfill\pageref{part:5}
    \begin{description}
      \item[\hyperlink{5a}{{\color{gray}0} rewards, actions, states, exploration}]
      \item[\hyperlink{5b}{{\color{gray}1} mdps, dynamic programming, bootstrapping}]
      \item[\hyperlink{5c}{{\color{gray}2} q-learning: reduce to supervision}]
      \item[\hyperlink{5d}{{\color{gray}3} curiosity, language, instruction}]
    \end{description}
  %
  \item[] \vspace{0.05cm} \hrule \vspace{-0.05cm}
  %
  \item[\hyperlink{F}{F. Prereq Helpers}] \phdot \hfill\pageref{part:F}
    \begin{description}
      \item[\hyperlink{Fa}{{\color{gray}0} linear algebra primer}]
      \item[\hyperlink{Fb}{{\color{gray}1} probability primer}]
      \item[\hyperlink{Fc}{{\color{gray}2} derivatives primer}]
      \item[\hyperlink{Fd}{{\color{gray}3} programming and numpy and pytorch primer}]
    \end{description}
%\item[\hyperlink{part:G}{G. rambles}] \phdot \hfill\pageref{part:G}
% \begin{description}
% \item[\hyperlink{G0}{{\color{gray}0} }]
% \item[\hyperlink{G1}{{\color{gray}1} }]
% \item[\hyperlink{G2}{{\color{gray}2} }]
% \item[\hyperlink{G3}{{\color{gray}3} }]
% \end{description}
\end{description}
}
\sampassage{navigating these notes}
We divide these notes into bite-size passages; each begins with a gray
heading (e.g.\ `navigating these notes' to the left). All passages are
optional in the sense that you'll be able to complete all assignments
without ever studying these notes. Yet, aside from a few bonus passages
marked \textsf{\blu VERY OPTIONAL}, the passages offer complementary
discussion of our course topics that may help you understand, appreciate,
and internalize the lectures. After all, only with two lines of sight can
we see depth.

These notes highlight probability theory and model architecture more than
the lectures do; clustering and low-rank approximation, less. We can use
different projections to tear and squash a globe into a flat map of the
earth, and different views reveal different aspects of geography. In order
to ``keep australia from being split into two parts'' and to ``display the
south pole'', I chose a different map projection than the lectures use. So
the notes traverse topics in a different order than the lectures do.
Still, we cover essentially the same overall ground.
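
% Structural macros: \SSS{label}{title} starts a new part on a fresh page,
% creating hypertarget `label' and \label{part:label} (both used by the
% clickable table of contents above); \sss{label}{file}{title} starts a
% section with hypertarget `label' and pulls its body from tex-source/file.tex.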
\newcommand{\SSS}[2]{\newpage\hypertarget{#1}{}\sampart{#2}\phantomsection\label{part:#1}}
\newcommand{\sss}[3]{\hypertarget{#1}{}\samsection{#3}\input{tex-source/#2}}
\newpage
\SSS{0}{prologue}
\sss{0a}{body.0.0.what-is-learning}{what is learning?}
\sss{0b}{body.0.1.our-first-learning-algorithm}{a tiny example: classifying handwritten digits}
\sss{0c}{body.0.2.how-well-did-we-do}{how well did we do? analyzing our error}
\sss{0d}{body.0.3.how-can-we-do-better}{how can we do better? (survey of rest of notes)}
\newpage
% a complete, simple classifier (unit 1)
\SSS{1}{Learning from Examples}
\sss{1a}{body.1.0.likelihoods}{goodness of fit}
\sss{1b}{body.1.1.gradients}{smarter optimization}
\sss{1c}{body.1.2.priors}{intrinsic plausibility}
\sss{1d}{body.1.3.model-selection}{model selection}
\newpage
% squeezing more out of linear models (unit 2)
\SSS{2}{Engineering Features (and Friends) by Hand}
\sss{2a}{body.2.0.featurization}{featurization}
\sss{2b}{body.2.1.quantify-uncertainty}{richer outputs: quantifying uncertainty}
\sss{2c}{body.2.2.iterative-optimization}{iterative optimization}
\sss{2d}{body.2.3.data-dependent-features}{data-dependent features}
\newpage
% bend those lines to capture rich patterns (unit 3)
\SSS{3}{Learning Features from Data}
\sss{3a}{body.3.0.shallow-learning}{shallow learning}
\sss{3b}{body.3.1.deep-architecture}{deep learning: ideas in architecture}
\sss{3c}{body.3.2.convolution}{symmetry and locality: convolution}
\sss{3d}{body.3.3.attention}{symmetry and locality: attention}
\newpage
% thicken those lines to quantify uncertainty (unit 4)
\SSS{4}{Modeling Structured Uncertainties}
\sss{4a}{body.4.0.probabilistic-models}{probabilistic models}
\sss{4b}{body.4.1.expectation-maximization}{inference by variation: EM}
\sss{4c}{body.4.2.metropolis-hastings}{inference by sampling: MH}
\sss{4d}{body.4.3.deep-generators}{deep generative models}
\newpage
% beyond learning-from-examples (unit 5)
\SSS{5}{Learning while Acting (and from Rewards)}
\sss{5a}{body.5.0.reinforcement}{rewards, actions, states. exploration}
\sss{5b}{body.5.1.mdps}{qualitative solutions to mdps, dynamic programming, bootstrapping}
% state (dependence on previous xs and on actions); RL; partial observations
\sss{5c}{body.5.2.q-learning}{reduce to supervision using q-learning}
\sss{5d}{body.5.3.beyond}{non-technical discussion of: curiosity, language, instruction, world-models}
% =============================================================================
% == _ ======================================================================
% =============================================================================
\newpage
\SSS{F}{Prereq Helpers}
\sss{Fa}{body.F.0.linear-algebra}{high dimensions and linear algebra}
%-- linear "spaces" and linear "maps"
%-- dimension, trace, determinant, visually
%-- vectors, covectors, higher tensors
%-- dot product, angles, and svd, visually
\sss{Fb}{body.F.1.probability}{uncertainty and probability}
%-- fundamentals; frequentism vs bayesianism
%-- 2 sharp tools: independence and averaging
%-- log loss and friends
%-- common families of distributions
\sss{Fc}{body.F.2.derivatives}{optimization and derivatives}
%-- derivatives are best linear approximations: chain rule and linearity
%-- product rule, visually
%-- optimization : critical points, lagrange, higher derivatives
%-- write your own automatic differentiator: a short exercise
\sss{Fd}{body.F.3.programming}{python, numpy, pytorch}
\end{document}