<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Science of Learning</title>
<style>
</style>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-yFRtMMDnQtDRO8rLpMIKrtPCD5jdktao2TV19YiZYWMDkUR5GQZR/NOVTdquEx1j" crossorigin="anonymous">
<link href="https://cdn.jsdelivr.net/npm/katex-copytex@latest/dist/katex-copytex.min.css" rel="stylesheet" type="text/css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/markdown.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/highlight.css">
<style>
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe WPC', 'Segoe UI', system-ui, 'Ubuntu', 'Droid Sans', sans-serif;
font-size: 14px;
line-height: 1.6;
}
</style>
<style>
.task-list-item { list-style-type: none; } .task-list-item-checkbox { margin-left: -20px; vertical-align: middle; }
</style>
<script src="https://cdn.jsdelivr.net/npm/katex-copytex@latest/dist/katex-copytex.min.js"></script>
</head>
<body class="vscode-body vscode-light">
<h1 id="science-of--learning">Science of Learning</h1>
<p>V. Vapnik said that "Nothing is more practical than a good theory."
Here we focus on theoretical machine learning.</p>
<ul>
<li><a href="https://www.helsinki.fi/en/researchgroups/constraint-reasoning-and-optimization">https://www.helsinki.fi/en/researchgroups/constraint-reasoning-and-optimization</a></li>
<li><a href="https://www.math.ubc.ca/~erobeva/seminar.html">https://www.math.ubc.ca/~erobeva/seminar.html</a></li>
<li><a href="https://www.deel.ai/theoretical-guarantees/">https://www.deel.ai/theoretical-guarantees/</a></li>
<li><a href="http://www.vanderschaar-lab.com/NewWebsite/index.html">http://www.vanderschaar-lab.com/NewWebsite/index.html</a></li>
<li><a href="https://nthu-datalab.github.io/ml/index.html">https://nthu-datalab.github.io/ml/index.html</a></li>
<li><a href="http://www.cs.cornell.edu/~shmat/research.html">http://www.cs.cornell.edu/~shmat/research.html</a></li>
<li><a href="http://www.prace-ri.eu/best-practice-guide-deep-learning">http://www.prace-ri.eu/best-practice-guide-deep-learning</a></li>
<li><a href="https://math.ethz.ch/sam/research/reports.html?year=2019">https://math.ethz.ch/sam/research/reports.html?year=2019</a></li>
<li><a href="http://gr.xjtu.edu.cn/web/jjx323/home">http://gr.xjtu.edu.cn/web/jjx323/home</a></li>
<li><a href="https://zhouchenlin.github.io/">https://zhouchenlin.github.io/</a></li>
<li><a href="https://www.math.tamu.edu/~bhanin/">https://www.math.tamu.edu/~bhanin/</a></li>
<li><a href="https://yani.io/annou/">https://yani.io/annou/</a></li>
<li><a href="https://probability.dmi.unibas.ch/seminar.html">https://probability.dmi.unibas.ch/seminar.html</a></li>
<li><a href="http://mjt.cs.illinois.edu/courses/dlt-f19/">http://mjt.cs.illinois.edu/courses/dlt-f19/</a></li>
<li><a href="http://danroy.org/">http://danroy.org/</a></li>
<li><a href="https://www.symbiont-project.org/events/Slides-2018-03/SYMBIONT-2018-03-zimmermann.pdf">https://www.symbiont-project.org/events/Slides-2018-03/SYMBIONT-2018-03-zimmermann.pdf</a></li>
<li><a href="https://losslandscape.com/faq/">https://losslandscape.com/faq/</a></li>
<li><a href="https://mcallester.github.io/ttic-31230/">https://mcallester.github.io/ttic-31230/</a></li>
<li><a href="http://deep-phenomena.org/">http://deep-phenomena.org/</a></li>
<li><a href="https://ijcai20interpretability.github.io/">https://ijcai20interpretability.github.io/</a></li>
<li><a href="https://niceworkshop.org/">https://niceworkshop.org/</a></li>
</ul>
<p>Deep learning is a transformative technology that has delivered impressive improvements in image classification and speech recognition.
Many researchers are trying to better understand how to improve prediction performance and also how to improve training methods.
<a href="https://stats385.github.io/">Some researchers use experimental techniques; others use theoretical approaches.</a></p>
<ul>
<li><a href="https://www.cl.cam.ac.uk/~rja14/">https://www.cl.cam.ac.uk/~rja14/</a></li>
</ul>
<p>Deep learning is related to, at the very least, kernel methods, projection pursuit, and neural networks.</p>
<ul>
<li><a href="#science-of--learning">Science of Learning</a>
<ul>
<li><a href="#resource--on-deep-learning-theory">Resource on Deep Learning Theory</a>
<ul>
<li><a href="#deep-learning-reading-group">Deep Learning Reading Group</a></li>
</ul>
</li>
<li><a href="#interpretability-in-ai">Interpretability in AI</a>
<ul>
<li><a href="#interpretability-of-neural-networks">Interpretability of Neural Networks</a></li>
<li><a href="#deeplever">DeepLEVER</a></li>
<li><a href="#dlphi">DLphi</a></li>
<li><a href="#scientific-machine-learning">Scientific Machine Learning</a></li>
</ul>
</li>
<li><a href="#physics-and-deep-learning">Physics and Deep Learning</a>
<ul>
<li><a href="#machine-learning-for-physics">Machine Learning for Physics</a>
<ul>
<li><a href="#deep-learning-for-physics">Deep Learning for Physics</a></li>
</ul>
</li>
<li><a href="#physics-for-machine-learning">Physics for Machine Learning</a>
<ul>
<li><a href="#physics-informed-machine-learning">Physics Informed Machine Learning</a></li>
</ul>
</li>
<li><a href="#statistical-mechanics-and-deep-learning">Statistical Mechanics and Deep Learning</a></li>
<li><a href="#born-machine">Born Machine</a></li>
<li><a href="#quantum-machine-learning">Quantum Machine learning</a></li>
</ul>
</li>
<li><a href="#mathematics-of-deep-learning">Mathematics of Deep Learning</a>
<ul>
<li><a href="#discrete-mathematics-and--neural-networks">Discrete Mathematics and Neural Networks</a></li>
<li><a href="#numerical-analysis-for-deep-learning">Numerical Analysis for Deep Learning</a>
<ul>
<li><a href="#resnets">ResNets</a></li>
<li><a href="#differential-equations-motivated-deep-learning-methods">Differential Equations Motivated Deep Learning Methods</a></li>
</ul>
</li>
<li><a href="#control-theory-and-deep-learning">Control Theory and Deep Learning</a></li>
<li><a href="#neural-ordinary-differential-equations">Neural Ordinary Differential Equations</a></li>
</ul>
</li>
<li><a href="#dynamics-and-deep-learning">Dynamics and Deep Learning</a>
<ul>
<li><a href="#stability-for-neural-networks">Stability For Neural Networks</a></li>
</ul>
</li>
<li><a href="#differential-equation-and-deep-learning">Differential Equation and Deep Learning</a>
<ul>
<li><a href="#deep-learning-for-pdes">Deep Learning for PDEs</a></li>
<li><a href="#mathcal-h-matrix-and-deep-learning"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="script">H</mi></mrow><annotation encoding="application/x-tex">\mathcal H</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathcal" style="margin-right:0.00965em;">H</span></span></span></span> matrix and deep learning</a></li>
<li><a href="#stochastic-differential-equations-and-deep-learning">Stochastic Differential Equations and Deep Learning</a></li>
<li><a href="#finite-element-methods-and-deep-learning">Finite Element Methods and Deep Learning</a></li>
</ul>
</li>
<li><a href="#approximation-theory-for-deep-learning">Approximation Theory for Deep Learning</a>
<ul>
<li><a href="#workshop">Workshop</a></li>
<li><a href="#labs-and-groups">Labs and Groups</a></li>
<li><a href="#the-f-principle">The F-Principle</a></li>
</ul>
</li>
<li><a href="#inverse-problem-and-deep-learning">Inverse Problem and Deep Learning</a>
<ul>
<li><a href="#deep-learning-for-inverse-problems">Deep Learning for Inverse Problems</a></li>
<li><a href="#deep-inverse-optimization">Deep Inverse Optimization</a></li>
</ul>
</li>
<li><a href="#random-matrix-theory-and-deep-learning">Random Matrix Theory and Deep Learning</a>
<ul>
<li><a href="#nonlinear-random-matrix-theory">Nonlinear Random Matrix Theory</a></li>
</ul>
</li>
<li><a href="#deep-learning-and-optimal-transport">Deep learning and Optimal Transport</a>
<ul>
<li><a href="#generative-models-and-optimal-transport">Generative Models and Optimal Transport</a></li>
</ul>
</li>
<li><a href="#geometric-analysis-approach-to-ai">Geometric Analysis Approach to AI</a>
<ul>
<li><a href="#tropical-geometry-of-deep-neural-networks">Tropical Geometry of Deep Neural Networks</a></li>
</ul>
</li>
<li><a href="#topology-and-deep-learning">Topology and Deep Learning</a>
<ul>
<li><a href="#topology-optimization-and--deep-learning">Topology Optimization and Deep Learning</a></li>
</ul>
</li>
<li><a href="#algebra-and-deep-learning">Algebra and Deep Learning</a>
<ul>
<li><a href="#tensor-network">Tensor network</a></li>
<li><a href="#group-equivariant-convolutional-networks">Group Equivariant Convolutional Networks</a></li>
<li><a href="#complex-valued-neural-networks">Complex Valued Neural Networks</a></li>
<li><a href="#quaternion-neural-networks">Quaternion Neural Networks</a></li>
</ul>
</li>
<li><a href="#probabilistic-theory-and-deep-learning">Probabilistic Theory and Deep Learning</a>
<ul>
<li><a href="#bayesian-deep-learning">Bayesian Deep Learning</a></li>
</ul>
</li>
<li><a href="#statistics-and-deep-learning">Statistics and Deep Learning</a>
<ul>
<li><a href="#statistical-relational-ai">Statistical Relational AI</a></li>
<li><a href="#principal-component-neural-networks">Principal Component Neural Networks</a></li>
<li><a href="#least-squares-support-vector-machines">Least squares support vector machines</a></li>
</ul>
</li>
<li><a href="#information-theory-and-deep-learning">Information Theory and Deep Learning</a>
<ul>
<li><a href="#information-bottleneck-theory">Information bottleneck theory</a></li>
</ul>
</li>
<li><a href="#brain-science-and-ai">Brain Science and AI</a>
<ul>
<li><a href="#spiking-neural-networks">Spiking neural networks</a></li>
<li><a href="#the-thousand-brains-theory-of-intelligence">The Thousand Brains Theory of Intelligence</a></li>
</ul>
</li>
<li><a href="#cognition-science-and-deep-learning">Cognition Science and Deep Learning</a></li>
<li><a href="#the-lottery-ticket-hypothesis">The lottery ticket hypothesis</a></li>
<li><a href="#double-descent">Double Descent</a></li>
</ul>
</li>
</ul>
<h2 id="resource--on-deep-learning-theory">Resource on Deep Learning Theory</h2>
<ul>
<li><a href="http://pwp.gatech.edu/fdl-2018/program/">http://pwp.gatech.edu/fdl-2018/program/</a></li>
<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6052125/">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6052125/</a></li>
<li><a href="https://ori.ox.ac.uk/labs/a2i/">https://ori.ox.ac.uk/labs/a2i/</a></li>
<li><a href="https://deep-learning-drizzle.github.io/">Deep Learning Drizzle</a></li>
<li><a href="https://github.com/Stephlat/DeepRegression">A Comprehensive Analysis of Deep Regression</a></li>
<li><a href="https://gangwg.github.io/research.html">https://gangwg.github.io/research.html</a></li>
<li><a href="http://www.mit.edu/~k2smith/">http://www.mit.edu/~k2smith/</a></li>
<li><a href="http://www.dfki.de/semdeep-4/">4th Workshop on Semantic Deep Learning (SemDeep-4)</a></li>
<li><a href="https://www.lri.fr/~gcharpia/deeppractice/">Deep Learning in Practice</a></li>
<li><a href="http://guillefix.me/cosmos/static/Deep%2520learning%2520theory">Deep learning theory</a></li>
<li><a href="http://web.fsktm.um.edu.my/~cschan/iredlia.html">2018 Workshop on Interpretable & Reasonable Deep Learning and its Applications (IReDLiA)</a></li>
<li><a href="https://stats385.github.io/">Analyses of Deep Learning (STATS 385) 2019</a></li>
<li><a href="http://www.cs.ox.ac.uk/people/yarin.gal/website/blog_5058.html">The Science of Deep Learning</a></li>
<li><a href="http://workshop.tbsi.edu.cn/index.html">TBSI 2019 Retreat Conference</a></li>
<li><a href="https://people.csail.mit.edu/madry/6.883/">6.883 Science of Deep Learning: Bridging Theory and Practice -- Spring 2018</a></li>
<li><a href="http://mitliagkas.github.io/ift6085-dl-theory-class/">(Winter 2018) IFT 6085: Theoretical principles for deep learning</a></li>
<li><a href="http://principlesofdeeplearning.com/">http://principlesofdeeplearning.com/</a></li>
<li><a href="https://cbmm.mit.edu/education/courses">https://cbmm.mit.edu/education/courses</a></li>
<li><a href="http://dalimeeting.org/dali2018/workshopTheoryDL.html">DALI 2018 - Data, Learning and Inference</a></li>
<li><a href="http://www.deeplearningpatterns.com/doku.php?id=theory">On Theory@http://www.deeplearningpatterns.com </a></li>
<li><a href="https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/85815724">https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/85815724</a></li>
<li><a href="https://uvadlc.github.io/">UVA DEEP LEARNING COURSE</a></li>
<li><a href="https://rakeshchada.github.io/Neural-Embedding-Animation.html">Understanding Neural Networks by embedding hidden representations</a></li>
<li><a href="https://www.cs.washington.edu/research/tractable-deep-learning">Tractable Deep Learning</a></li>
<li><a href="https://stats385.github.io/">Theories of Deep Learning (STATS 385)</a></li>
<li><a href="https://github.com/joanbruna/stat212b">Topics Course on Deep Learning for Spring 2016 by Joan Bruna, UC Berkeley, Statistics Department</a></li>
<li><a href="http://elmos.scripts.mit.edu/mathofdeeplearning/">Mathematical aspects of Deep Learning</a></li>
<li><a href="https://deeplearning-math.github.io/">MATH 6380p. Advanced Topics in Deep Learning Fall 2018</a></li>
<li><a href="https://www.advancedtopicsindeeplearning.com/">CoMS E6998 003: Advanced Topics in Deep Learning</a></li>
<li><a href="http://www.mit.edu/~9.520/fall17/Classes/deep_learning_theory.html">Deep Learning Theory: Approximation, Optimization, Generalization</a></li>
<li><a href="https://sites.google.com/site/deeplearningtheory/">Theory of Deep Learning, ICML'2018</a></li>
<li><a href="http://dalimeeting.org/dali2018/workshopTheoryDL.html">DALI 2018, Data Learning and Inference</a></li>
<li><a href="https://github.com/joanbruna/MathsDL-spring18">MATHEMATICS OF DEEP LEARNING, NYU, Spring 2018</a></li>
<li><a href="https://www.researchgate.net/project/Theory-of-Deep-Learning">Theory of Deep Learning, project in researchgate</a></li>
<li><a href="https://physicsml.github.io/blog/DL-theory.html">THE THEORY OF DEEP LEARNING - PART I</a></li>
<li><a href="http://cognitivemedium.com/magic_paper/index.html">Magic paper</a></li>
<li><a href="https://www.padl.ws/">Principled Approaches to Deep Learning</a></li>
<li><a href="https://arxiv.org/pdf/1811.03962.pdf">A Convergence Theory for Deep Learning via Over-Parameterization</a></li>
<li><a href="https://github.com/brendenlake/AAI-site">Advancing AI through cognitive science</a></li>
<li><a href="http://stillbreeze.github.io/Deep-Learning-and-the-Demand-For-Interpretability/">Deep Learning and the Demand for Interpretability</a></li>
<li><a href="https://www.robots.ox.ac.uk/~vedaldi//research/idiu/idiu.html">Integrated and detailed image understanding</a></li>
<li><a href="http://nips2018dltheory.rice.edu/">NeuroIP 2018 workshop on Deep Learning Theory</a></li>
<li><a href="http://networkinterpretability.org/">http://networkinterpretability.org/</a></li>
<li><a href="https://interpretablevision.github.io/">https://interpretablevision.github.io/</a></li>
<li><a href="https://www.msra.cn/zh-cn/news/people-stories/wei-chen">https://www.msra.cn/zh-cn/news/people-stories/wei-chen</a></li>
<li><a href="https://www.microsoft.com/en-us/research/people/tyliu/">https://www.microsoft.com/en-us/research/people/tyliu/</a></li>
<li><a href="https://zhuanlan.zhihu.com/p/22353056">https://zhuanlan.zhihu.com/p/22353056</a></li>
<li><a href="http://qszhang.com/index.php/team/">http://qszhang.com/index.php/team/</a></li>
<li><a href="https://www.researchgate.net/profile/Hatef_Monajemi">https://www.researchgate.net/profile/Hatef_Monajemi</a></li>
<li><a href="https://indico.cern.ch/event/781223/">Symposium Artificial Intelligence for Science, Industry and Society</a></li>
<li><a href="https://arxiv.org/abs/1909.13458">https://arxiv.org/abs/1909.13458</a></li>
<li><a href="https://www.lri.fr/TAU_seminars/">TAU & GTDeepNet seminars</a></li>
</ul>
<h3 id="deep-learning-reading-group">Deep Learning Reading Group</h3>
<p><a href="http://www.cs.virginia.edu//papers.htm">yanjun</a> organized a wonderful reading group on deep learning.</p>
<ul>
<li><a href="https://a2i2.deakin.edu.au/">https://a2i2.deakin.edu.au/</a></li>
<li><a href="https://qdata.github.io/deep2Read/">https://qdata.github.io/deep2Read/</a></li>
<li><a href="https://dlta-reading.github.io/">https://dlta-reading.github.io/</a></li>
</ul>
<ul>
<li><a href="http://www.mlnl.cs.ucl.ac.uk/readingroup.html">http://www.mlnl.cs.ucl.ac.uk/readingroup.html</a></li>
<li><a href="https://labrosa.ee.columbia.edu/cuneuralnet/">https://labrosa.ee.columbia.edu/cuneuralnet/</a></li>
<li><a href="http://www.ub.edu/cvub/reading-group/">http://www.ub.edu/cvub/reading-group/</a></li>
<li><a href="https://team.inria.fr/perception/deeplearning/">https://team.inria.fr/perception/deeplearning/</a></li>
<li><a href="https://scholar.princeton.edu/csmlreading">https://scholar.princeton.edu/csmlreading</a></li>
<li><a href="https://junjuew.github.io/elijah-reading-group/">https://junjuew.github.io/elijah-reading-group/</a></li>
<li><a href="http://www.sribd.cn/DL/schedule.html">http://www.sribd.cn/DL/schedule.html</a></li>
<li><a href="http://lear.inrialpes.fr/people/gaidon/lear_xrce_deep_learning_01.html">http://lear.inrialpes.fr/people/gaidon/lear_xrce_deep_learning_01.html</a></li>
<li><a href="https://simons.berkeley.edu/events/reading-group-deep-learning">https://simons.berkeley.edu/events/reading-group-deep-learning</a></li>
<li><a href="https://csml.princeton.edu/readinggroup">https://csml.princeton.edu/readinggroup</a></li>
<li><a href="http://www.bicv.org/deep-learning/">http://www.bicv.org/deep-learning/</a></li>
<li><a href="https://www.cs.ubc.ca/labs/lci/mlrg/">https://www.cs.ubc.ca/labs/lci/mlrg/</a></li>
<li><a href="https://calculatedcontent.com/2015/03/25/why-does-deep-learning-work/">https://calculatedcontent.com/2015/03/25/why-does-deep-learning-work/</a></li>
<li><a href="https://project.inria.fr/deeplearning/">https://project.inria.fr/deeplearning/</a></li>
<li><a href="https://hustcv.github.io/reading-list.html">https://hustcv.github.io/reading-list.html</a></li>
</ul>
<h2 id="interpretability-in-ai">Interpretability in AI</h2>
<ul>
<li><a href="https://ec.europa.eu/jrc/communities/en/node/1162/article/interpretability-ai-and-its-relation-fairness-transparency-reliability-and-trust">https://ec.europa.eu/jrc/communities/en/node/1162/article/interpretability-ai-and-its-relation-fairness-transparency-reliability-and-trust</a></li>
<li><a href="https://github.com/jphall663/awesome-machine-learning-interpretability">https://github.com/jphall663/awesome-machine-learning-interpretability</a></li>
<li><a href="https://people.mpi-sws.org/~manuelgr/">https://people.mpi-sws.org/~manuelgr/</a></li>
<li><a href="https://ec.europa.eu/jrc/communities/en/community/humaint/event/2nd-humaint-winter-school-fairness-accountability-and-transparency">https://ec.europa.eu/jrc/communities/en/community/humaint/event/2nd-humaint-winter-school-fairness-accountability-and-transparency</a></li>
<li><a href="https://facctconference.org/network/">https://facctconference.org/network/</a></li>
<li><a href="https://facctconference.org/network/">https://facctconference.org/network/</a></li>
</ul>
<h3 id="interpretability-of-neural-networks">Interpretability of Neural Networks</h3>
<p><a href="http://academic.hep.com.cn/fitee/CN/10.1631/FITEE.1700808#1"> Although deep neural networks have exhibited superior performance in various tasks, interpretability is always Achilles’ heel of deep neural networks.</a>
At present, deep neural networks obtain high discrimination power at the cost of a low interpretability of their black-box representations.
We believe that high model interpretability may help people break several bottlenecks of deep learning,
e.g., learning from a few annotations, learning via human–computer communications at the semantic level,
and semantically debugging network representations.
We focus on convolutional neural networks (CNNs), and revisit the visualization of CNN representations,
methods of diagnosing representations of pre-trained CNNs, approaches for disentangling pre-trained CNN representations, learning of CNNs
with disentangled representations, and middle-to-end learning based on model interpretability.
Finally, we discuss prospective trends in explainable artificial intelligence.</p>
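<p>One of the simplest visualization tools of this kind is a gradient-based saliency map: back-propagate the class score to the input and inspect which pixels receive the largest gradients. Below is a minimal sketch, assuming PyTorch; the tiny untrained CNN and the random image are placeholders rather than the method of any particular paper cited here.</p>
<pre><code class="language-python"># Gradient-saliency sketch: which input pixels most affect the top class
# score? The tiny CNN is an untrained stand-in for a real classifier.
import torch

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
    torch.nn.Linear(8, 10)).eval()

image = torch.randn(1, 3, 32, 32, requires_grad=True)  # placeholder image
score = model(image)[0].max()                          # top-class logit
score.backward()

# Saliency map: maximum absolute gradient over colour channels, per pixel.
saliency = image.grad.abs().max(dim=1).values.squeeze(0)
print(saliency.shape)                                  # torch.Size([32, 32])
</code></pre>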
<ul>
<li><a href="https://www.transai.org/">https://www.transai.org/</a></li>
<li><a href="http://games-cn.org/games-webinar-20190509-93/">GAMES Webinar 2019 – 93期(深度学习可解释性专题课程) </a></li>
<li><a href="http://games-cn.org/games-webinar-20190516-94/">GAMES Webinar 2019 – 94期(深度学习可解释性专题课程) | 刘日升(大连理工大学),张拳石(上海交通大学)</a></li>
<li><a href="http://qszhang.com/index.php/publications/">http://qszhang.com/index.php/publications/</a></li>
<li><a href="https://arxiv.org/abs/1812.07169">Explaining Neural Networks Semantically and Quantitatively</a></li>
<li><a href="https://www.jiqizhixin.com/articles/0211">https://www.jiqizhixin.com/articles/0211</a></li>
<li><a href="https://www.jiqizhixin.com/articles/030205">https://www.jiqizhixin.com/articles/030205</a></li>
<li><a href="https://mp.weixin.qq.com/s/xY7Cpe6idbOTJuyD3vwD3w">https://mp.weixin.qq.com/s/xY7Cpe6idbOTJuyD3vwD3w</a></li>
<li><a href="http://academic.hep.com.cn/fitee/CN/10.1631/FITEE.1700808#1">http://academic.hep.com.cn/fitee/CN/10.1631/FITEE.1700808#1</a></li>
<li><a href="https://arxiv.org/pdf/1905.11833.pdf">https://arxiv.org/pdf/1905.11833.pdf</a></li>
<li><a href="http://www.cs.sjtu.edu.cn/~leng-jw/">http://www.cs.sjtu.edu.cn/~leng-jw/</a></li>
<li><a href="https://lemondan.github.io">https://lemondan.github.io</a></li>
<li><a href="http://ise.sysu.edu.cn/teacher/teacher02/1136886.htm">http://ise.sysu.edu.cn/teacher/teacher02/1136886.htm</a></li>
<li><a href="http://www.cs.cmu.edu/~zhitingh/data/hu18texar.pdf">http://www.cs.cmu.edu/~zhitingh/data/hu18texar.pdf</a></li>
<li><a href="https://datasciencephd.eu/DSSS19/slides/GiannottiPedreschi-ExplainableAI.pdf">https://datasciencephd.eu/DSSS19/slides/GiannottiPedreschi-ExplainableAI.pdf</a></li>
<li><a href="http://www.cs.cmu.edu/~zhitingh/">http://www.cs.cmu.edu/~zhitingh/</a></li>
<li><a href="https://graphreason.github.io/">https://graphreason.github.io/</a></li>
</ul>
<ul>
<li><a href="https://beenkim.github.io/">https://beenkim.github.io/</a></li>
<li><a href="https://www.math.ucla.edu/~montufar/">https://www.math.ucla.edu/~montufar/</a></li>
<li><a href="https://link.springer.com/book/10.1007/978-3-030-28954-6">Explainable AI: Interpreting, Explaining and Visualizing Deep Learning</a></li>
<li><a href="http://www.prcv2019.com/en/index.html">http://www.prcv2019.com/en/index.html</a></li>
<li><a href="http://gr.xjtu.edu.cn/web/jiansun">http://gr.xjtu.edu.cn/web/jiansun</a></li>
<li><a href="http://www.shixialiu.com/">http://www.shixialiu.com/</a></li>
<li><a href="http://irc.cs.sdu.edu.cn/">http://irc.cs.sdu.edu.cn/</a></li>
<li><a href="https://www.seas.upenn.edu/~minchenl/">https://www.seas.upenn.edu/~minchenl/</a></li>
<li><a href="https://cs.nyu.edu/~yixinhu/">https://cs.nyu.edu/~yixinhu/</a></li>
<li><a href="http://www.cs.utexas.edu/~huangqx/">http://www.cs.utexas.edu/~huangqx/</a></li>
<li><a href="https://stats385.github.io/">https://stats385.github.io/</a></li>
</ul>
<p>Not everyone can understand relativity or quantum theory.</p>
<ul>
<li><a href="https://github.com/zqs1022/interpretableCNN">Interpretable Convolutional Neural Networks</a></li>
</ul>
<h3 id="deeplever">DeepLEVER</h3>
<blockquote>
<p>DeepLEVER aims at explaining and verifying machine learning systems via combinatorial optimization in general and SAT in particular.
<a href="http://anitideeplever.laas.fr/project">The main thesis of the DeepLever project</a> is that a solution to address the challenges faced by ML models is at the intersection of formal methods (FM) and AI. (A recent Summit on Machine Learning Meets Formal Methods offered supporting evidence to how strategic this topic is.) The DeepLever project envisions two main lines of research, concretely explanation and verification of deep ML models, supported by existing and novel constraint reasoning technologies.</p>
</blockquote>
<ul>
<li><a href="http://anitideeplever.laas.fr/deeplever-project-has-started">DeepLEVER</a></li>
<li><a href="https://aniti.univ-toulouse.fr/index.php/en/">https://aniti.univ-toulouse.fr/index.php/en/</a></li>
<li><a href="https://jpmarquessilva.github.io/">https://jpmarquessilva.github.io/</a></li>
<li><a href="https://www.researchgate.net/profile/Martin_Cooper3">https://www.researchgate.net/profile/Martin_Cooper3</a></li>
<li><a href="http://homepages.laas.fr/ehebrard/Home.html">http://homepages.laas.fr/ehebrard/Home.html</a></li>
<li><a href="http://www.merl.com/">http://www.merl.com/</a></li>
</ul>
<h3 id="dlphi">DLphi</h3>
<blockquote>
<p>Together with the participants of the Oberwolfach Seminar: Mathematics of Deep Learning, <a href="http://www.pc-petersen.eu/">I wrote a (not entirely serious) paper</a> called "The Oracle of DLPhi" proving that <code>Deep Learning techniques can perform accurate classifications on test data that is entirely uncorrelated to the training data</code>. This, however, requires a couple of non-standard assumptions such as uncountably many data points and the axiom of choice. In a sense this shows that mathematical results on machine learning need to be approached with a bit of scepticism.</p>
</blockquote>
<ul>
<li><a href="https://github.com/juliusberner/oberwolfach_workshop">https://github.com/juliusberner/oberwolfach_workshop</a></li>
<li><a href="http://www.pc-petersen.eu/">http://www.pc-petersen.eu/</a></li>
<li><a href="http://voigtlaender.xyz/">http://voigtlaender.xyz/</a></li>
<li><a href="https://math.ethz.ch/sam/research/reports.html">https://math.ethz.ch/sam/research/reports.html</a></li>
<li><a href="https://arxiv.org/abs/1901.05744">The Oracle of DLphi</a></li>
<li><a href="https://faculty.washington.edu/kutz/">https://faculty.washington.edu/kutz/</a></li>
</ul>
<h3 id="scientific-machine-learning">Scientific Machine Learning</h3>
<p><a href="https://thewinnower.com/papers/25359-the-essential-tools-of-scientific-machine-learning-scientific-ml">Scientific machine learning is a burgeoning discipline which blends scientific computing and machine learning. Traditionally, scientific computing focuses on large-scale mechanistic models, usually differential equations, that are derived from scientific laws that simplified and explained phenomena. On the other hand, machine learning focuses on developing non-mechanistic data-driven models which require minimal knowledge and prior assumptions. The two sides have their pros and cons: differential equation models are great at extrapolating, the terms are explainable, and they can be fit with small data and few parameters. Machine learning models on the other hand require "big data" and lots of parameters but are not biased by the scientists ability to correctly identify valid laws and assumptions.</a></p>
<ul>
<li><a href="https://www.scd.stfc.ac.uk/Pages/Scientific-Machine-Learning.aspx">https://www.scd.stfc.ac.uk/Pages/Scientific-Machine-Learning.aspx</a></li>
<li><a href="https://mitmath.github.io/18337/">https://mitmath.github.io/18337/</a></li>
<li><a href="https://www.stat.purdue.edu/~fmliang/STAT598Purdue/MLS.pdf">https://www.stat.purdue.edu/~fmliang/STAT598Purdue/MLS.pdf</a></li>
<li><a href="https://sciml.ai/">https://sciml.ai/</a></li>
<li><a href="https://github.com/mitmath/18S096SciML">https://github.com/mitmath/18S096SciML</a></li>
<li><a href="https://ml4sci.lbl.gov/">https://ml4sci.lbl.gov/</a></li>
<li><a href="https://www.nottingham.ac.uk/conference/fac-sci/maths-sci/scientific-computation-using-machine-learning-algorithms/">https://www.nottingham.ac.uk/conference/fac-sci/maths-sci/scientific-computation-using-machine-learning-algorithms/</a></li>
<li><a href="https://sites.google.com/lbl.gov/ml4sci/">https://sites.google.com/lbl.gov/ml4sci/</a></li>
<li><a href="https://github.com/sciann/sciann/">SciANN: Neural Networks for Scientific Computations</a></li>
</ul>
<h2 id="physics-and-deep-learning">Physics and Deep Learning</h2>
<p>Neuronal networks have enjoyed a resurgence both in the worlds of neuroscience, where they yield mathematical frameworks for thinking about complex neural datasets, and in machine learning, where they achieve state-of-the-art results on a variety of tasks, including machine vision, speech recognition, and language translation.<br>
Despite their empirical success, a mathematical theory of how deep neural circuits, with many layers of cascaded nonlinearities, learn and compute remains elusive.<br>
We will discuss three recent vignettes in which ideas from statistical physics can shed light on this issue.<br>
In particular, we show how dynamical criticality can help in neural learning, how the non-intuitive geometry of high dimensional error landscapes can be exploited to speed up learning, and how modern ideas from non-equilibrium statistical physics, like the Jarzynski equality, can be extended to yield powerful algorithms for modeling complex probability distributions.<br>
<a href="https://physics.berkeley.edu/news-events/events/20151005/the-statistical-physics-of-deep-learning-on-the-beneficial-roles-of">Time permitting, we will also discuss the relationship between neural network learning dynamics and the developmental time course of semantic concepts in infants.</a></p>
<p>In recent years, artificial intelligence has made remarkable advancements, impacting many industrial sectors dependent on complex decision-making and optimization.
Physics-leaning disciplines also face hard inference problems in complex systems: climate prediction, density matrix estimation for many-body quantum systems, material phase detection, protein-fold quality prediction, parametrization of effective models of high-dimensional neural activity, energy landscapes of transcription factor-binding, etc.
Methods using artificial intelligence have in fact already advanced progress on such problems.
<a href="http://www.physics.mcgill.ca/ai2019/">So, the question is not whether, but how AI serves as a powerful tool for data analysis in academic research, and physics-leaning disciplines in particular.</a></p>
<img src="https://d2r55xnwy6nx47.cloudfront.net/uploads/2017/09/InfoBottleneck_2880x1620.jpg" width="80%"/>
<ul>
<li><a href="https://zhuanlan.zhihu.com/p/94249675">https://zhuanlan.zhihu.com/p/94249675</a></li>
<li><a href="https://web.stanford.edu/~montanar/index.html">https://web.stanford.edu/~montanar/index.html</a></li>
<li><a href="https://www.microsoft.com/en-us/research/event/physics-ml-workshop/">Physics Meets ML</a></li>
<li><a href="http://apagom.com/physicsforests/">physics forests</a></li>
<li><a href="https://www.appliedmldays.org/">Applied Machine Learning Days</a></li>
<li><a href="http://www.ncsa.illinois.edu/Conferences/DeepLearningLSST/">DEEP LEARNING FOR MULTIMESSENGER ASTROPHYSICS: REAL-TIME DISCOVERY AT SCALE</a></li>
<li><a href="http://indico.ictp.it/event/8722/">Workshop on Science of Data Science | (smr 3283)</a></li>
<li><a href="http://www.physics.mcgill.ca/ai2019/">Physics & AI Workshop</a></li>
<li><a href="https://physicsml.github.io/pages/papers.html">https://physicsml.github.io/pages/papers.html</a></li>
<li><a href="http://super-ms.mit.edu/physics-ai.html">Physics-AI opportunities at MIT</a></li>
<li><a href="https://gogul.dev/software/deep-learning-meets-physics">https://gogul.dev/software/deep-learning-meets-physics</a></li>
<li><a href="https://github.com/2prime/ODE-DL/blob/master/DL_Phy.html">https://github.com/2prime/ODE-DL/blob/master/DL_Phy.md</a></li>
<li><a href="https://physics-ai.com/">https://physics-ai.com/</a></li>
<li><a href="http://physics.usyd.edu.au/quantum/Coogee2015/Presentations/Svore.pdf">http://physics.usyd.edu.au/quantum/Coogee2015/Presentations/Svore.pdf</a></li>
<li><a href="https://ocw.mit.edu/resources/res-9-003-brains-minds-and-machines-summer-course-summer-2015/index.htm">Brains, Minds and Machines Summer Course</a></li>
<li><a href="http://amos3.aapm.org/abstracts/pdf/127-36916-419554-130797.pdf">deep medcine</a></li>
<li><a href="http://www.dam.brown.edu/people/mraissi/publications/">http://www.dam.brown.edu/people/mraissi/publications/</a></li>
<li><a href="http://www.physics.rutgers.edu/gso/SSPAR/">http://www.physics.rutgers.edu/gso/SSPAR/</a></li>
<li><a href="https://community.singularitynet.io/c/education/course-brains-minds-machines">https://community.singularitynet.io/c/education/course-brains-minds-machines</a></li>
<li><a href="https://physai.sciencesconf.org/">ARTIFICIAL INTELLIGENCE AND PHYSICS</a></li>
<li><a href="http://inspirehep.net/record/1680302/references">http://inspirehep.net/record/1680302/references</a></li>
<li><a href="https://www.pnnl.gov/computing/philms/Announcements.stm">https://www.pnnl.gov/computing/philms/Announcements.stm</a></li>
<li><a href="https://tacocohen.wordpress.com/">https://tacocohen.wordpress.com/</a></li>
<li><a href="https://cnls.lanl.gov/External/workshops.php">https://cnls.lanl.gov/External/workshops.php</a></li>
<li><a href="https://www.researchgate.net/profile/Jinlong_Wu3">https://www.researchgate.net/profile/Jinlong_Wu3</a></li>
<li><a href="http://djstrouse.com/">http://djstrouse.com/</a></li>
<li><a href="https://www.researchgate.net/scientific-contributions/2135376837_Maurice_Weiler">https://www.researchgate.net/scientific-contributions/2135376837_Maurice_Weiler</a></li>
<li><a href="https://arxiv.org/abs/1710.06096">Spontaneous Symmetry Breaking in Neural Networks</a></li>
<li><a href="https://physai.sciencesconf.org/">https://physai.sciencesconf.org/</a></li>
</ul>
<h3 id="machine-learning-for-physics">Machine Learning for Physics</h3>
<ul>
<li><a href="https://dlonsc.github.io/ISC2019/7_Keynote_DL_HEP_SofiaVallecorsa.pdf">Deep Learning in High Energy Physics</a></li>
<li><a href="https://www.ipam.ucla.edu/programs/long-programs/machine-learning-for-physics-and-the-physics-of-learning/">Machine Learning for Physics and the Physics of Learning</a></li>
<li><a href="https://machine-learning-for-physicists.org/">Machine Learning for Physics</a></li>
<li><a href="http://www.thp2.nat.uni-erlangen.de/index.php/2017_Machine_Learning_for_Physicists,_by_Florian_Marquardt">2017 Machine Learning for Physicists, by Florian Marquardt</a></li>
<li><a href="https://ml4physicalsciences.github.io/2020/">https://ml4physicalsciences.github.io/2020/</a></li>
<li><a href="http://phys.cts.nthu.edu.tw/actnews/content.php?Sn=468">Machine Learning in Physics School/Workshop</a></li>
<li><a href="http://deeplearnphysics.org/">http://deeplearnphysics.org/</a></li>
</ul>
<h4 id="deep-learning-for-physics">Deep Learning for Physics</h4>
<ul>
<li><a href="https://inspirehep.net/literature/1680302">https://inspirehep.net/literature/1680302</a></li>
<li><a href="https://www.in.tum.de/cg/teaching/winter-term-1819/deep-learning-in-physics/">Master-Seminar - Deep Learning in Physics (IN2107, IN0014)</a></li>
<li><a href="https://www.ml4science.org/agenda-physics-in-ml">https://www.ml4science.org/agenda-physics-in-ml</a></li>
<li><a href="https://www.ias.edu/events/deep-learning-physics">https://www.ias.edu/events/deep-learning-physics</a></li>
<li><a href="https://dl4physicalsciences.github.io/">https://dl4physicalsciences.github.io/</a></li>
</ul>
<h3 id="physics-for-machine-learning">Physics for Machine Learning</h3>
<ul>
<li><a href="https://tartakovsky.stanford.edu/research/physics-informed-machine-learning">https://tartakovsky.stanford.edu/research/physics-informed-machine-learning</a></li>
<li><a href="https://bids.berkeley.edu/events/physics-machine-learning-workshop">Physics in Machine Learning Workshop</a></li>
<li><a href="https://www.ml4science.org/astrophysics-in-machine-learning-workshop">Physics in Machine Learning Workshop</a></li>
<li><a href="http://phys.csail.mit.edu/papers/1.pdf">A Differentiable Physics Engine for Deep Learning</a></li>
<li><a href="https://pbdl2019.github.io/">Physics Based Vision meets Deep Learning (PBDL)</a></li>
<li><a href="https://github.com/thunil/Physics-Based-Deep-Learning">Physics-Based Deep Learning</a></li>
</ul>
<h4 id="physics-informed-machine-learning">Physics Informed Machine Learning</h4>
<ul>
<li><a href="https://sites.google.com/view/icml2019phys4dl/schedule">https://sites.google.com/view/icml2019phys4dl/schedule</a></li>
<li><a href="https://icml.cc/Conferences/2019/ScheduleMultitrack?event=3531">Theoretical Physics for Deep Learning</a></li>
<li><a href="https://sites.google.com/view/icml2019phys4dl/schedule">https://sites.google.com/view/icml2019phys4dl/schedule</a></li>
<li><a href="http://www.databookuw.com/page-5/">Physics Informed Machine Learning Workshop</a></li>
<li><a href="https://github.com/maziarraissi/PINNs">Physics Informed Neural Networks</a></li>
<li><a href="https://maziarraissi.github.io/PINNs/">https://maziarraissi.github.io/PINNs/</a></li>
</ul>
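<p>To make the PINN idea referenced above concrete: the network maps coordinates to the solution value, and the squared residual of the differential equation at random collocation points is itself the training loss. A minimal sketch assuming PyTorch, for the illustrative ODE u'(t) = -u(t) with u(0) = 1 (chosen for brevity, not taken from the linked papers):</p>
<pre><code class="language-python"># Minimal physics-informed network for u'(t) = -u(t), u(0) = 1 on [0, 1].
# The loss is the mean squared ODE residual plus the initial condition.
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for step in range(2000):
    t = torch.rand(64, 1, requires_grad=True)   # random collocation points
    u = net(t)
    du = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    loss = ((du + u) ** 2).mean() + (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(net(torch.tensor([[1.0]])).item())  # should approach exp(-1) = 0.3679
</code></pre>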
<h3 id="statistical-mechanics-and-deep-learning">Statistical Mechanics and Deep Learning</h3>
<p><a href="https://www.annualreviews.org/doi/pdf/10.1146/annurev-conmatphys-031119-050745">The recent striking success of deep neural networks in machine learning raises profound questions about the theoretical principles underlying their success. For example, what can such deep networks compute? How can we train them? How does information propagate through them? Why can they generalize? And how can we teach them to imagine? We review recent work in which methods of physical analysis rooted in statistical mechanics have begun to shed conceptual insights into these questions. These insights yield connections between deep learning and diverse physical and mathematical topics, including random landscapes, spin glasses, jamming, dynamical phase transitions, chaos, Riemannian geometry, random matrix theory, free probability, and nonequilibrium statistical mechanics. Indeed, the fields of statistical mechanics and machine learning have long enjoyed a rich history of strongly coupled interactions, and recent advances at the intersection of statistical mechanics and deep learning suggest these interactions will only deepen going forward.</a></p>
<ul>
<li><a href="https://www.icts.res.in/discussion-meeting/spmml2020">Statistical Physics of Machine Learning</a></li>
<li><a href="http://smml.io/">statistical mechanics // machine learning</a></li>
<li><a href="https://arxiv.org/abs/1906.10228">A Theoretical Connection Between Statistical Physics and Reinforcement Learning</a></li>
<li><a href="https://phys.org/news/2017-02-thermodynamics.html">The thermodynamics of learning</a></li>
<li><a href="https://calculatedcontent.com/2015/03/25/why-does-deep-learning-work/">WHY DOES DEEP LEARNING WORK?</a></li>
<li><a href="https://calculatedcontent.com/2015/04/01/why-deep-learning-works-ii-the-renormalization-group/">WHY DEEP LEARNING WORKS II: THE RENORMALIZATION GROUP</a></li>
<li><a href="https://github.com/CalculatedContent/ImplicitSelfRegularization">https://github.com/CalculatedContent/ImplicitSelfRegularization</a></li>
<li><a href="https://sites.google.com/site/torbenkruegermath/home/graduate-seminar-random-matrices-spin-glasses-deep-learning">torbenkruegermath</a></li>
<li><a href="https://calculatedcontent.com/2019/12/03/towards-a-new-theory-of-learning-statistical-mechanics-of-deep-neural-networks/">TOWARDS A NEW THEORY OF LEARNING: STATISTICAL MECHANICS OF DEEP NEURAL NETWORKS</a></li>
<li><a href="https://www.annualreviews.org/doi/pdf/10.1146/annurev-conmatphys-031119-050745">Statistical Mechanics of Deep Learning</a></li>
<li><a href="https://zhuanlan.zhihu.com/p/90096775">https://zhuanlan.zhihu.com/p/90096775</a></li>
</ul>
<h3 id="born-machine">Born Machine</h3>
<p>A Born machine is a probabilistic generative model whose distribution is given by the Born rule of quantum mechanics; a toy sketch follows the references below.</p>
<ul>
<li><a href="https://journals.aps.org/prx/abstract/10.1103/PhysRevX.8.031012#fulltext">Unsupervised Generative Modeling Using Matrix Product States</a></li>
<li><a href="https://wangleiphy.github.io/talks/BornMachine-USTC.pdf">https://wangleiphy.github.io/talks/BornMachine-USTC.pdf</a></li>
<li><a href="https://github.com/congzlwag/UnsupGenModbyMPS">https://github.com/congzlwag/UnsupGenModbyMPS</a></li>
<li><a href="https://congzlwag.github.io/UnsupGenModbyMPS/">https://congzlwag.github.io/UnsupGenModbyMPS/</a></li>
<li><a href="https://github.com/congzlwag/BornMachineTomo">https://github.com/congzlwag/BornMachineTomo</a></li>
<li><a href="https://wangleiphy.github.io/talks/BornMachine.pdf">From Baltzman machine to Born Machine</a></li>
<li><a href="https://quantum.ustc.edu.cn/web/node/623">Born Machines: A fresh approach to quantum machine learning</a></li>
<li><a href="https://github.com/GiggleLiu/QuantumCircuitBornMachine">Gradient based training of Quantum Circuit Born Machine (QCBM)</a></li>
</ul>
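<p>The defining rule is that probabilities are squared amplitudes, p(x) = |&psi;(x)|<sup>2</sup>/Z. A toy NumPy sketch over 3-bit strings, where an explicit amplitude table stands in for the matrix product state or quantum circuit parameterizations used in the references above:</p>
<pre><code class="language-python"># Toy Born machine over 3-bit strings: p(x) = |psi(x)|^2 / Z (Born rule).
# A real Born machine parameterizes psi with a matrix product state or a
# quantum circuit; here psi is an explicit table of 2**3 amplitudes.
import numpy as np

rng = np.random.default_rng(1)
psi = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # amplitudes
p = np.abs(psi) ** 2
p /= p.sum()                       # normalization constant Z

for s in rng.choice(8, size=5, p=p):
    print(format(s, "03b"), round(p[s], 3))
</code></pre>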
<h3 id="quantum-machine-learning">Quantum Machine learning</h3>
<p><a href="https://peterwittek.com/">Quantum Machine Learning: What Quantum Computing Means to Data Mining explains the most relevant concepts of machine learning, quantum mechanics, and quantum information theory, and contrasts classical learning algorithms to their quantum counterparts.</a></p>
<ul>
<li><a href="https://www.quantummachinelearning.org/events.html">Combining quantum information and machine learning</a></li>
<li><a href="https://www.mpl.mpg.de/divisions/marquardt-division/workshops/2019-machine-learning-for-quantum-technology/">machine learning for quantum technology/</a></li>
<li><a href="https://wangleiphy.github.io/">https://wangleiphy.github.io/</a></li>
<li><a href="https://tacocohen.wordpress.com">https://tacocohen.wordpress.com</a></li>
<li><a href="https://peterwittek.com/qml-in-2015.html">https://peterwittek.com/qml-in-2015.html</a></li>
<li><a href="https://github.com/krishnakumarsekar/awesome-quantum-machine-learning">https://github.com/krishnakumarsekar/awesome-quantum-machine-learning</a></li>
<li><a href="https://peterwittek.com/">https://peterwittek.com/</a></li>
</ul>
<ul>
<li><a href="https://wangleiphy.github.io/lectures/DL.pdf">Lecture Note on Deep Learning and Quantum Many-Body Computation</a></li>
<li><a href="http://www.math.chalmers.se/~stig/project4.pdf">Quantum Deep Learning and Renormalization</a></li>
</ul>
<hr>
<ul>
<li><a href="https://scholar.harvard.edu/madvani/home">https://scholar.harvard.edu/madvani/home</a></li>
<li><a href="https://www.elen.ucl.ac.be/esann/index.php?pg=specsess#statistical">https://www.elen.ucl.ac.be/esann/index.php?pg=specsess#statistical</a></li>
<li><a href="https://krzakala.github.io/cargese.io/program.html">https://krzakala.github.io/cargese.io/program.html</a></li>
<li><a href="https://www.quantamagazine.org/new-theory-cracks-open-the-black-box-of-deep-learning-20170921/">New Theory Cracks Open the Black Box of Deep Learning</a></li>
<li><a href="https://ai.googleblog.com/2019/03/unifying-physics-and-deep-learning-with.html">Unifying Physics and Deep Learning with TossingBot</a></li>
</ul>
<h2 id="mathematics-of-deep-learning">Mathematics of Deep Learning</h2>
<ul>
<li><a href="https://www.4tu.nl/ami/en/Agenda-Events/">Meeting on Mathematics of Deep Learning</a></li>
<li><a href="http://www.yanivplan.com/math-608d">Probability in high dimensions</a></li>
<li><a href="https://math.ethz.ch/sam/research/reports.html?year=2019">https://math.ethz.ch/sam/research/reports.html?year=2019</a></li>
<li><a href="http://rt.dgyblog.com/ref/ref-learning-deep-learning.html">Learning Deep Learning</a></li>
<li><a href="https://github.com/leiwu1990/course.math_theory_nn">Summer school on Deep Learning Theory by Weinan E</a></li>
<li><a href="http://www.mit.edu/~9.520/fall18/">.520/6.860: Statistical Learning Theory and Applications, Fall 2018</a></li>
<li><a href="https://zhuanlan.zhihu.com/p/40097048">2018上海交通大学深度学习理论前沿研讨会 - 凌泽南的文章 - 知乎</a></li>
<li><a href="https://www.researchgate.net/project/Theories-of-Deep-Learning">Theories of Deep Learning</a></li>
</ul>
<p>A mathematical theory of deep networks and of why they work as well as they do is now emerging.
<a href="http://www.mit.edu/~9.520/fall17/Classes/deep_learning_theory.html">I will review some recent theoretical results on the approximation power of deep networks</a>
including conditions under which they can be exponentially better than shallow learning.
A class of deep convolutional networks represent an important special case of these conditions,
though weight sharing is not the main reason for their exponential advantage.
I will also discuss another puzzle around deep networks: what guarantees that they generalize and
do not overfit, despite the number of weights being larger than the number of training data and despite the absence of explicit regularization in the optimization?</p>
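<p>A standard concrete instance of the exponential separation mentioned above is the triangle-wave construction in the style of Telgarsky: composing a two-piece "hat" function with itself t times yields 2<sup>t</sup> linear pieces, so each extra layer doubles the piece count, while a one-hidden-layer ReLU network needs a number of units proportional to the number of pieces. A short NumPy sketch, illustrative rather than taken from the linked course:</p>
<pre><code class="language-python"># Depth efficiency in one picture: the hat function g has 2 linear pieces,
# and its t-fold composition has 2**t pieces; each composition is one more
# ReLU layer, so the piece count grows exponentially with depth.
import numpy as np

def hat(x):
    # g(x) = 2x on [0, 1/2] and 2(1 - x) on [1/2, 1].
    return 2 * np.minimum(x, 1 - x)

x = np.linspace(0.0, 1.0, 1601)   # grid aligned with the kinks at k/16
y = x
for _ in range(4):                # compose the hat map: depth t = 4
    y = hat(y)

# The slope flips sign at every kink, so counting sign changes of the
# first difference counts the linear pieces.
s = np.sign(np.diff(y))
pieces = 1 + int(np.count_nonzero(s[1:] != s[:-1]))
print(pieces)                     # 16 = 2**4
</code></pre>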
<p>Talk: <em>Deep Neural Networks and Partial Differential Equations: Approximation Theory and Structural Properties</em>, Philipp Petersen, University of Oxford.</p>
<ul>
<li><a href="https://memento.epfl.ch/event/a-theoretical-analysis-of-machine-learning-and-par/">https://memento.epfl.ch/event/a-theoretical-analysis-of-machine-learning-and-par/</a></li>
<li><a href="http://at.yorku.ca/c/b/p/g/30.htm">http://at.yorku.ca/c/b/p/g/30.htm</a></li>
<li><a href="https://mat.univie.ac.at/~grohs/">https://mat.univie.ac.at/~grohs/</a></li>
<li><a href="https://www.math.tamu.edu/~bhanin/DL2018.html">Deep Learning: Theory and Applications (Math 689 Fall 2018)</a></li>
<li><a href="https://joanbruna.github.io/MathsDL-spring18/">Topics course Mathematics of Deep Learning, NYU, Spring 18</a></li>
<li><a href="https://skymind.ai/ebook/Skymind_The_Math_Behind_Neural_Networks.pdf">https://skymind.ai/ebook/Skymind_The_Math_Behind_Neural_Networks.pdf</a></li>
<li><a href="https://github.com/markovmodel/deeptime">https://github.com/markovmodel/deeptime</a></li>
<li><a href="https://omar-florez.github.io/scratch_mlp/">https://omar-florez.github.io/scratch_mlp/</a></li>
<li><a href="https://joanbruna.github.io/MathsDL-spring19/">https://joanbruna.github.io/MathsDL-spring19/</a></li>
<li><a href="https://github.com/isikdogan/deep_learning_tutorials">https://github.com/isikdogan/deep_learning_tutorials</a></li>
<li><a href="https://www.brown.edu/research/projects/crunch/machine-learning-x-seminars">https://www.brown.edu/research/projects/crunch/machine-learning-x-seminars</a></li>
<li><a href="http://anotherdatum.com/tce_2018.html">Deep Learning: Theory & Practice</a></li>
<li><a href="https://www.math.ias.edu/wtdl">https://www.math.ias.edu/wtdl</a></li>
<li><a href="https://www.ml.tu-berlin.de/menue/mitglieder/klaus-robert_mueller/">https://www.ml.tu-berlin.de/menue/mitglieder/klaus-robert_mueller/</a></li>
<li><a href="https://www-m15.ma.tum.de/Allgemeines/MathFounNN">https://www-m15.ma.tum.de/Allgemeines/MathFounNN</a></li>
<li><a href="https://www.math.purdue.edu/~buzzard/MA598-Spring2019/index.shtml">https://www.math.purdue.edu/~buzzard/MA598-Spring2019/index.shtml</a></li>
<li><a href="http://mathematics-in-europe.eu/?p=801">http://mathematics-in-europe.eu/?p=801</a></li>
<li><a href="https://cims.nyu.edu/~bruna/">https://cims.nyu.edu/~bruna/</a></li>
<li><a href="https://www.math.ias.edu/wtdl">https://www.math.ias.edu/wtdl</a></li>
<li><a href="https://www.pims.math.ca/scientific-event/190722-pcssdlcm">https://www.pims.math.ca/scientific-event/190722-pcssdlcm</a></li>
<li><a href="https://www.embl.de/training/events/2020/MAC20-01/">Deep Learning for Image Analysis EMBL COURSE</a></li>
<li><a href="https://deeplearning-math.github.io/2018spring.html">MATH 6380o. Deep Learning: Towards Deeper Understanding, Spring 2018</a></li>
<li><a href="https://github.com/joanbruna/MathsDL-spring19">Mathematics of Deep Learning, Courant Insititute, Spring 19</a></li>
<li><a href="http://voigtlaender.xyz/">http://voigtlaender.xyz/</a></li>
<li><a href="http://www.mit.edu/~9.520/fall19/">http://www.mit.edu/~9.520/fall19/</a></li>
<li><a href="https://gateway.newton.ac.uk/event/ofbw46/programme">The Mathematics of Deep Learning and Data Science - Programme</a></li>
</ul>
<ul>
<li><a href="https://www.brown.edu/research/projects/crunch/">Home of Math + Machine Learning + X</a></li>
<li><a href="http://crm.sns.it/event/451/">Mathematical and Computational Aspects of Machine Learning</a></li>
<li><a href="https://www.researchgate.net/project/Mathematical-Theory-for-Deep-Neural-Networks">Mathematical Theory for Deep Neural Networks</a></li>
<li><a href="https://www.researchgate.net/project/Theory-of-Deep-Learning">Theory of Deep Learning</a></li>
<li><a href="http://dalimeeting.org/dali2018/workshopTheoryDL.html">DALI 2018 - Data, Learning and Inference</a></li>
<li><a href="https://www.math-berlin.de/academics/summer-schools/2019">BMS Summer School 2019: Mathematics of Deep Learning</a></li>
<li><a href="https://www.siam.org/conferences/cm/conference/mds20">SIAM Conference on Mathematics of Data Science (MDS20)</a></li>
</ul>
<ul>
<li><a href="http://web.cs.ucla.edu/~qgu/research.html">http://web.cs.ucla.edu/~qgu/research.html</a></li>
<li><a href="https://sgo-workshop.github.io/">BRIDGING GAME THEORY AND DEEP LEARNING</a></li>
</ul>
<h3 id="discrete-mathematics-and--neural-networks">Discrete Mathematics and Neural Networks</h3>
<ul>
<li><a href="http://proceedings.mlr.press/v28/ermon13.html">http://proceedings.mlr.press/v28/ermon13.html</a></li>
<li><a href="https://sites.google.com/view/ijcai2019dso/">https://sites.google.com/view/ijcai2019dso/</a></li>
<li><a href="http://www.cas.mcmaster.ca/~deza/tokyo2018progr.html">http://www.cas.mcmaster.ca/~deza/tokyo2018progr.html</a></li>
<li><a href="https://www.cs.cornell.edu/~bistra/">https://www.cs.cornell.edu/~bistra/</a></li>
<li><a href="https://epubs.siam.org/doi/book/10.1137/1.9780898718539?mobileUi=0">Discrete Mathematics of Neural Networks: Selected Topics</a></li>
<li><a href="https://www.math.uwaterloo.ca/~bico/co759/2018/index.html">Deep Learning in Computational Discrete Optimization</a></li>
<li><a href="http://www.ams.jhu.edu/~wcook12/dl/index.html">Deep Learning in Discrete Optimization</a></li>
<li><a href="https://web-app.usc.edu/soc/syllabus/20201/30126.pdf">https://web-app.usc.edu/soc/syllabus/20201/30126.pdf</a></li>
<li><a href="http://www.columbia.edu/~yf2414/Slides.pdf">http://www.columbia.edu/~yf2414/Slides.pdf</a></li>
<li><a href="http://www.columbia.edu/~yf2414/teach.html">http://www.columbia.edu/~yf2414/teach.html</a></li>
<li><a href="https://opt-ml.org/cfp.html">https://opt-ml.org/cfp.html</a></li>
</ul>
<hr>
<ul>
<li><a href="http://www.cas.mcmaster.ca/~deza/slidesRIKEN2019/huchette.pdf">Strong mixed-integer programming formulations for trained neural networks by Joey Huchette1</a></li>
<li><a href="https://link.springer.com/article/10.1007/s10601-018-9285-6">Deep neural networks and mixed integer linear optimization</a></li>
<li><a href="http://www.dei.unipd.it/~fisch/papers/slides/2018%20Dagstuhl%20%5BFischetti%20on%20DL%5D.pdf">Matteo Fischetti, University of Padova</a></li>
<li><a href="https://arxiv.org/abs/1712.06174">Deep Neural Networks as 0-1 Mixed Integer Linear Programs: A Feasibility Study</a></li>
<li><a href="https://www.researchgate.net/profile/Matteo_Fischetti">https://www.researchgate.net/profile/Matteo_Fischetti</a></li>
<li><a href="http://www.amp.i.kyoto-u.ac.jp/tecrep/ps_file/2019/2019-001.pdf">A Mixed Integer Linear Programming Formulation to Artificial Neural Networks</a></li>
<li><a href="http://www.optimization-online.org/DB_FILE/2019/07/7276.pdf">ReLU Networks as Surrogate Models in Mixed-Integer Linear Programs</a></li>
</ul>
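<p>As a concrete instance of the big-M formulations referenced above: a trained ReLU unit y = max(0, w&middot;x + b) with known pre-activation bounds L &le; w&middot;x + b &le; U is encoded exactly with one binary variable z. A sketch assuming the PuLP modelling library; the weights and bounds below are made up:</p>
<pre><code class="language-python"># Big-M encoding of one trained ReLU unit y = max(0, a), a = w.x + b,
# given valid bounds L &lt;= a &lt;= U. The binary z is 1 iff the unit is active.
import pulp

w, b = [2.0, -1.0], 0.5
L, U = -3.5, 3.5                     # valid pre-activation bounds (big-Ms)

prob = pulp.LpProblem("relu_unit", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", lowBound=-1, upBound=1) for i in range(2)]
y = pulp.LpVariable("y", lowBound=0)              # y &gt;= 0
z = pulp.LpVariable("z", cat="Binary")

a = pulp.lpSum(wi * xi for wi, xi in zip(w, x)) + b
prob += y                            # objective: maximize the unit output
prob += y &gt;= a                       # y &gt;= w.x + b
prob += y &lt;= a - L * (1 - z)         # z = 1 (active): y &lt;= w.x + b
prob += y &lt;= U * z                   # z = 0 (inactive): forces y = 0

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.value(y))                 # 3.5, attained at x = (1, -1)
</code></pre>
<p>Stacking one such block per neuron turns questions about a trained network (maximal activations, robustness verification) into a mixed-integer linear program, which is the approach taken in the papers above.</p>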
<h3 id="numerical-analysis-for-deep-learning">Numerical Analysis for Deep Learning</h3>
<p>The dynamics view of deep learning treats a deep network as a discrete dynamical system.
For example, a feedforward network can be written in the recurrent form:</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>x</mi><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow></msup><mo>=</mo><msub><mi>f</mi><mi>t</mi></msub><mo stretchy="false">(</mo><msup><mi>x</mi><mi>t</mi></msup><mo stretchy="false">)</mo><mo separator="true">,</mo><mi>t</mi><mo>∈</mo><mo stretchy="false">[</mo><mn>0</mn><mo separator="true">,</mo><mn>1</mn><mo separator="true">,</mo><mo>⋯</mo><mtext> </mtext><mo separator="true">,</mo><mi>T</mi><mo stretchy="false">]</mo></mrow><annotation encoding="application/x-tex">x^{t+1} = f_t(x^{t}),t\in [0,1,\cdots, T]
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.864108em;vertical-align:0em;"></span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.864108em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">t</span><span class="mbin mtight">+</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.093556em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.10764em;">f</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:-0.10764em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.843556em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">t</span></span></span></span></span></span></span></span></span><span class="mclose">)</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault">t</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">[</span><span class="mord">0</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord">1</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="minner">⋯</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">T</span><span class="mclose">]</span></span></span></span></span></p>
<p>where <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>f</mi><mi>t</mi></msub></mrow><annotation encoding="application/x-tex">f_t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8888799999999999em;vertical-align:-0.19444em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.10764em;">f</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:-0.10764em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span> is some nonlinear function and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>t</mi></mrow><annotation encoding="application/x-tex">t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.61508em;vertical-align:0em;"></span><span class="mord mathdefault">t</span></span></span></span> is discrete.</p>
<p>However, it is not easy to select a proper nonlinear function <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>f</mi><mi>t</mi></msub><mtext> </mtext><mtext> </mtext><mi mathvariant="normal">∀</mi><mi>t</mi><mo>∈</mo><mo stretchy="false">[</mo><mn>0</mn><mo separator="true">,</mo><mn>1</mn><mo separator="true">,</mo><mo>⋯</mo><mtext> </mtext><mo separator="true">,</mo><mi>T</mi><mo stretchy="false">]</mo></mrow><annotation encoding="application/x-tex">f_t \,\,\forall t\in[0,1,\cdots, T]</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8888799999999999em;vertical-align:-0.19444em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.10764em;">f</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:-0.10764em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord">∀</span><span class="mord mathdefault">t</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">[</span><span class="mord">0</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord">1</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="minner">⋯</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">T</span><span class="mclose">]</span></span></span></span> and the number <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>T</mi></mrow><annotation encoding="application/x-tex">T</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">T</span></span></span></span>.
In other words, there is as yet no unified scientific principle or guideline for designing the structure of deep neural network models.</p>
<p>Many recursive formulas share this <code>feedback</code> form or hidden structure, in which the next input is the output of previous steps, historical records, or generated points; a minimal sketch follows.</p>
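<p>As a concrete illustration (a minimal NumPy sketch, not from any of the cited sources: the layer maps <code>f_t(x) = tanh(W_t x + b_t)</code>, the weights <code>Ws</code>, and the biases <code>bs</code> are hypothetical stand-ins for whatever <code>f_t</code> a given architecture chooses), the recurrence above is just an iterated application of layer maps:</p>
<pre><code class="language-python">import numpy as np

def forward(x, Ws, bs, sigma=np.tanh):
    """Iterate x^{t+1} = f_t(x^t) with f_t(x) = sigma(W_t x + b_t)."""
    for W, b in zip(Ws, bs):
        x = sigma(W @ x + b)
    return x

# T = 3 illustrative random layers acting on a 4-dimensional state
rng = np.random.default_rng(0)
Ws = [0.1 * rng.standard_normal((4, 4)) for _ in range(3)]
bs = [np.zeros(4) for _ in range(3)]
print(forward(rng.standard_normal(4), Ws, bs))
</code></pre>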
<ul>
<li><a href="http://www.vvz.ethz.ch/Vorlesungsverzeichnis/lerneinheit.view?lerneinheitId=136996&semkez=2020S&lang=en">401-3650-19L Numerical Analysis Seminar: Mathematics of Deep Neural Network Approximation</a></li>
<li><a href="http://www.mathcs.emory.edu/~lruthot/talks/">http://www.mathcs.emory.edu/~lruthot/talks/</a></li>
<li><a href="http://www.mathcs.emory.edu/~lruthot/courses/math789r-sp20.html">CS 584 / MATH 789R - Numerical Methods for Deep Learning</a></li>
<li><a href="https://github.com/IPAIopen/NumDL-CourseNotes">Numerical methods for deep learning</a></li>
<li><a href="http://www.mathcs.emory.edu/~lruthot/courses/NumDL/index.html">Short Course on Numerical Methods for Deep Learning</a></li>
<li><a href="http://www.ms.uky.edu/~qye/MA721/ma721F17.html">MA 721: Topics in Numerical Analysis: Deep Learning</a></li>
</ul>
<ul>
<li><a href="http://phys2018.csail.mit.edu/papers/29.pdf">Physics-Based Deep Learning for Fluid Flow</a></li>
</ul>
<h4 id="resnets">ResNets</h4>
<p><code>Deep Residual Networks</code> won 1st place in ImageNet classification, ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
They have since inspired more efficient feed-forward convolutional architectures.</p>
<p>They take a standard feed-forward ConvNet and add skip connections that bypass (or shortcut) a few convolution layers at a time. Each bypass gives rise to a residual block in which the convolution layers predict a residual that is added to the block’s input tensor.</p>
<img src="https://raw.githubusercontent.com/torch/torch.github.io/master/blog/_posts/images/resnets_1.png" width="40%"/>
<ul>
<li><a href="https://github.com/KaimingHe/deep-residual-networks">https://github.com/KaimingHe/deep-residual-networks</a></li>
<li><a href="http://torch.ch/blog/2016/02/04/resnets.html">http://torch.ch/blog/2016/02/04/resnets.html</a></li>
<li><a href="https://zh.gluon.ai/chapter_convolutional-neural-networks/resnet.html">https://zh.gluon.ai/chapter_convolutional-neural-networks/resnet.html</a></li>
<li><a href="https://www.jiqizhixin.com/articles/042201">https://www.jiqizhixin.com/articles/042201</a></li>
<li><a href="http://www.smartchair.org/hp/MSML2020/Paper/">http://www.smartchair.org/hp/MSML2020/Paper/</a></li>
<li><a href="https://github.com/liuzhuang13/DenseNet">https://github.com/liuzhuang13/DenseNet</a></li>
<li><a href="https://arxiv.org/abs/1810.11741">https://arxiv.org/abs/1810.11741</a></li>
<li><a href="https://www.sciencedirect.com/science/article/pii/S0893608019301820?via%3Dihub">Depth with nonlinearity creates no bad local minima in ResNets</a></li>
<li><a href="https://arxiv.org/abs/1910.13157">LeanConvNets: Low-cost Yet Effective Convolutional Neural Networks</a></li>
</ul>
<p><strong>Reversible Residual Network</strong></p>
<ul>
<li><a href="https://arxiv.org/abs/1707.04585">The Reversible Residual Network: Backpropagation Without Storing Activations</a></li>
<li><a href="https://ai.googleblog.com/2020/01/reformer-efficient-transformer.html">https://ai.googleblog.com/2020/01/reformer-efficient-transformer.html</a></li>
<li><a href="https://arxiv.org/abs/2001.04451">https://arxiv.org/abs/2001.04451</a></li>
<li><a href="https://ameroyer.github.io/reading-notes/architectures/2019/05/07/the_reversible_residual_network.html">https://ameroyer.github.io/reading-notes/architectures/2019/05/07/the_reversible_residual_network.html</a></li>
<li><a href="https://arxiv.org/abs/1812.04352">Layer-Parallel Training of Deep Residual Neural Networks</a></li>
</ul>
<h4 id="differential-equations-motivated-deep-learning-methods">Differential Equations Motivated Deep Learning Methods</h4>
<p>This section collects insights from numerical analysis that inspire more effective deep learning architectures.</p>
<p><a href="https://web.stanford.edu/~yplu/proj/lm/">Many effective networks can be interpreted as different numerical discretizations of differential equations. This finding brings us a brand new perspective on the design of effective deep architectures.</a></p>
<p><a href="http://www.mathcs.emory.edu/~lruthot/courses/NumDL/index.html">We show that residual neural networks can be interpreted as discretizations of a nonlinear time-dependent ordinary differential equation that depends on unknown parameters, i.e., the network weights. We show how this insight has been used, e.g., to study the <code>stability of neural networks, design new architectures, or use established methods from optimal control methods for training ResNets</code>. Finally, we discuss open questions and opportunities for mathematical advances in this area.</a></p>
<ul>
<li><a href="https://elsc.huji.ac.il/all-publications/1050">Path integral approach to random neural networks</a></li>
<li><a href="https://rkevingibson.github.io/blog/neural-networks-as-ordinary-differential-equations/">NEURAL NETWORKS AS ORDINARY DIFFERENTIAL EQUATIONS</a></li>
<li><a href="https://zhenyu-liao.github.io/pdf/pre/GDD_iCODE.pdf">Dynamical aspects of Deep Learning</a></li>
<li><a href="http://www.doc.ic.ac.uk/~ae/teaching.html#complex">Dynamical Systems and Deep Learning</a></li>
<li><a href="https://zhuanlan.zhihu.com/p/71747175">https://zhuanlan.zhihu.com/p/71747175</a></li>
<li><a href="https://web.stanford.edu/~yplu/project.html">https://web.stanford.edu/~yplu/project.html</a></li>
<li><a href="https://github.com/2prime/ODE-DL/">https://github.com/2prime/ODE-DL/</a></li>
<li><a href="https://arxiv.org/pdf/1804.04272.pdf">Deep Neural Networks Motivated by Partial Differential Equations</a></li>
</ul>
<ul>
<li><a href="https://www.researchgate.net/scientific-contributions/2107227289_Eldad_Haber">https://www.researchgate.net/scientific-contributions/2107227289_Eldad_Haber</a></li>
</ul>
<p>Residual networks as discretizations of dynamical systems:</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>Y</mi><mn>1</mn></msub><mo>=</mo><msub><mi>Y</mi><mn>0</mn></msub><mo>+</mo><mi>h</mi><mi>σ</mi><mo stretchy="false">(</mo><msub><mi>K</mi><mn>0</mn></msub><msub><mi>Y</mi><mn>0</mn></msub><mo>+</mo><msub><mi>b</mi><mn>0</mn></msub><mo stretchy="false">)</mo><mspace linebreak="newline"></mspace><mi><mi mathvariant="normal">⋮</mi><mpadded height="+0em" voffset="0em"><mspace mathbackground="black" width="0em" height="1.5em"></mspace></mpadded></mi><mspace linebreak="newline"></mspace><msub><mi>Y</mi><mi>N</mi></msub><mo>=</mo><msub><mi>Y</mi><mrow><mi>N</mi><mo>−</mo><mn>1</mn></mrow></msub><mo>+</mo><mi>h</mi><mi>σ</mi><mo stretchy="false">(</mo><msub><mi>K</mi><mrow><mi>N</mi><mo>−</mo><mn>1</mn></mrow></msub><msub><mi>Y</mi><mrow><mi>N</mi><mo>−</mo><mn>1</mn></mrow></msub><mo>+</mo><msub><mi>b</mi><mrow><mi>N</mi><mo>−</mo><mn>1</mn></mrow></msub><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">Y_1 = Y_0 +h \sigma(K_0 Y_0 + b_0)\\
\vdots \\
Y_N = Y_{N-1} +h \sigma(K_{N-1} Y_{N-1} + b_{N-1})
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.22222em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.22222em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault">h</span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.07153em;">K</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.07153em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.22222em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span 
class="mord"><span class="mord mathdefault">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span></span><span class="mspace newline"></span><span class="base"><span class="strut" style="height:1.53em;vertical-align:-0.03em;"></span><span class="mord"><span class="mord">⋮</span><span class="mord rule" style="border-right-width:0em;border-top-width:1.5em;bottom:0em;"></span></span></span><span class="mspace newline"></span><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.32833099999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.22222em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.10903em;">N</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.891661em;vertical-align:-0.208331em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.328331em;"><span style="top:-2.5500000000000003em;margin-left:-0.22222em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.10903em;">N</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault">h</span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.07153em;">K</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.328331em;"><span style="top:-2.5500000000000003em;margin-left:-0.07153em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" 
style="margin-right:0.10903em;">N</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.328331em;"><span style="top:-2.5500000000000003em;margin-left:-0.22222em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.10903em;">N</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathdefault">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.328331em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.10903em;">N</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span></p>
<p>This is nothing but a forward Euler discretization of the <code>Ordinary Differential Equation (ODE)</code>:</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∂</mi><mi>Y</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo>=</mo><mi>σ</mi><mo stretchy="false">(</mo><mi>K</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mi>Y</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo>+</mo><mi>b</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo stretchy="false">)</mo><mo separator="true">,</mo><mi>Y</mi><mo stretchy="false">(</mo><mn>0</mn><mo stretchy="false">)</mo><mo>=</mo><msub><mi>Y</mi><mn>0</mn></msub><mo separator="true">,</mo><mi>t</mi><mo>∈</mo><mo stretchy="false">[</mo><mn>0</mn><mo separator="true">,</mo><mi>T</mi><mo stretchy="false">]</mo><mi mathvariant="normal">.</mi></mrow><annotation encoding="application/x-tex">\partial Y(t)=\sigma(K(t) Y(t) + b(t)), Y(0)=Y_0, t\in[0, T].
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.07153em;">K</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault">b</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mclose">)</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mopen">(</span><span class="mord">0</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.8777699999999999em;vertical-align:-0.19444em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.22222em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault">t</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">[</span><span class="mord">0</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">T</span><span class="mclose">]</span><span class="mord">.</span></span></span></span></span></p>
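<p>To see the correspondence numerically, the hedged NumPy sketch below takes <code>N</code> forward-Euler steps of size <code>h = T/N</code> on this ODE; each step is exactly one residual block of the recursion above (a time-independent <code>K</code> and <code>b</code> are assumed purely for brevity):</p>
<pre><code class="language-python">import numpy as np

def euler_resnet(Y0, K, b, T=1.0, N=10, sigma=np.tanh):
    """Forward Euler on dY/dt = sigma(K Y + b); one Euler step == one residual block."""
    h = T / N
    Y = Y0
    for _ in range(N):
        Y = Y + h * sigma(K @ Y + b)
    return Y

rng = np.random.default_rng(1)
K = 0.5 * rng.standard_normal((3, 3))
print(euler_resnet(rng.standard_normal(3), K, np.zeros(3)))
</code></pre>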
<p>The goal is to plan a path (via <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.07153em;">K</span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>b</mi></mrow><annotation encoding="application/x-tex">b</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord mathdefault">b</span></span></span></span>) such that the initial data can be linearly separated.</p>
<img src="http://www.mathcs.emory.edu/~lruthot/img/DeepLearning.png" width="80%" />
<p>Another idea is to ensure stability by design / constraints on <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>σ</mi></mrow><annotation encoding="application/x-tex">\sigma</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo separator="true">,</mo><mi>b</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">K(t), b(t)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.07153em;">K</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault">b</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span></span></span></span>.</p>
<p>ResNet with antisymmetric transformation matrix:</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∂</mi><mi>Y</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo>=</mo><mi>σ</mi><mo stretchy="false">(</mo><mo stretchy="false">[</mo><mi>K</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo>−</mo><mi>K</mi><mo stretchy="false">(</mo><mi>t</mi><msup><mo stretchy="false">)</mo><mi>T</mi></msup><mo stretchy="false">]</mo><mi>Y</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo>+</mo><mi>b</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo stretchy="false">)</mo><mo separator="true">,</mo><mi>Y</mi><mo stretchy="false">(</mo><mn>0</mn><mo stretchy="false">)</mo><mo>=</mo><msub><mi>Y</mi><mn>0</mn></msub><mo separator="true">,</mo><mi>t</mi><mo>∈</mo><mo stretchy="false">[</mo><mn>0</mn><mo separator="true">,</mo><mi>T</mi><mo stretchy="false">]</mo><mi mathvariant="normal">.</mi></mrow><annotation encoding="application/x-tex">\partial Y(t)=\sigma([K(t)-K(t)^T] Y(t) + b(t)), Y(0)=Y_0, t\in[0, T].
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="mopen">(</span><span class="mopen">[</span><span class="mord mathdefault" style="margin-right:0.07153em;">K</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.1413309999999999em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.07153em;">K</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span><span class="mclose">]</span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault">b</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mclose">)</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mopen">(</span><span class="mord">0</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.8777699999999999em;vertical-align:-0.19444em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.22222em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault">t</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">[</span><span class="mord">0</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">T</span><span class="mclose">]</span><span class="mord">.</span></span></span></span></span></p>
<p>Hamiltonian-like ResNet:</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mi mathvariant="normal">d</mi><mrow><mi mathvariant="normal">d</mi><mi>t</mi></mrow></mfrac><mo stretchy="false">(</mo><mi>Y</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo separator="true">,</mo><mi>Z</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><msup><mo stretchy="false">)</mo><mi>T</mi></msup><mo>=</mo><mi>σ</mi><mo stretchy="false">[</mo><mo stretchy="false">(</mo><mi>K</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mi>Z</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo separator="true">,</mo><mo>−</mo><mi>K</mi><mo stretchy="false">(</mo><mi>t</mi><msup><mo stretchy="false">)</mo><mi>T</mi></msup><mi>Y</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><msup><mo stretchy="false">)</mo><mi>T</mi></msup><mo>+</mo><mi>b</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo stretchy="false">]</mo><mo separator="true">,</mo><mi>t</mi><mo>∈</mo><mo stretchy="false">[</mo><mn>0</mn><mo separator="true">,</mo><mi>T</mi><mo stretchy="false">]</mo><mi mathvariant="normal">.</mi></mrow><annotation encoding="application/x-tex">\frac{\mathrm d}{\mathrm d t}(Y(t), Z(t))^T=\sigma[(K(t)Z(t), -K(t)^T Y(t))^T + b(t)], t\in[0, T].
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:2.05744em;vertical-align:-0.686em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.37144em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathrm">d</span><span class="mord mathdefault">t</span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathrm">d</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.07153em;">Z</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.1413309999999999em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="mopen">[</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.07153em;">K</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mord mathdefault" style="margin-right:0.07153em;">Z</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord">−</span><span class="mord mathdefault" style="margin-right:0.07153em;">K</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span 
class="mclose">)</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault">b</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mclose">]</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault">t</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">[</span><span class="mord">0</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">T</span><span class="mclose">]</span><span class="mord">.</span></span></span></span></span></p>
<p><code>Parabolic Residual Neural Networks</code> (heat-equation-like, dissipative dynamics; form as in the Ruthotto–Haber paper "Deep Neural Networks Motivated by Partial Differential Equations" linked above):</p>
<p>$$\partial_t Y(t) = -K(t)^T \sigma(K(t) Y(t) + b(t)), \quad Y(0) = Y_0, \quad t \in [0, T].$$</p>
<p><code>Hyperbolic Residual Neural Networks</code> (second order in time, wave-equation-like and reversible; same source):</p>
<p>$$\partial_t^2 Y(t) = -K(t)^T \sigma(K(t) Y(t) + b(t)), \quad Y(0) = Y_0, \quad \partial_t Y(0) = 0, \quad t \in [0, T].$$</p>
<p><code>Hamiltonian CNN</code> (a first-order system with an auxiliary state <code>Z</code> that conserves energy; same source):</p>
<p>$$\partial_t Y(t) = -K(t)^T \sigma(K(t) Z(t) + b(t)), \quad \partial_t Z(t) = K(t)^T \sigma(K(t) Y(t) + b(t)), \quad t \in [0, T].$$</p>
<ul>
<li><a href="https://github.com/IPAIopen/NumDL-CourseNotes">Numerical methods for deep learning</a></li>
<li><a href="http://www.mathcs.emory.edu/~lruthot/courses/NumDL/index.html">Short Course on Numerical Methods for Deep Learning</a></li>
<li><a href="http://www.mathcs.emory.edu/~lruthot/talks/2019-LR-IPAM-ODE-handout.pdf">Deep Neural Networks Motivated By Ordinary Differential Equations</a></li>
<li><a href="http://www.mathcs.emory.edu/~lruthot/courses/NumDL/3-NumDNNshort-ContinuousModels.pdf">Continuous Models: Numerical Methods for Deep Learning</a></li>
<li><a href="https://arxiv.org/abs/1905.10484">Fully Hyperbolic Convolutional Neural Networks</a></li>
<li><a href="https://eldad-haber.webnode.com/selected-talks/">https://eldad-haber.webnode.com/selected-talks/</a></li>
<li><a href="http://www.mathcs.emory.edu/~lruthot/courses/NumDL/3-NumDNNshort-ContinuousModels.pdf">http://www.mathcs.emory.edu/~lruthot/courses/NumDL/3-NumDNNshort-ContinuousModels.pdf</a></li>
</ul>
<img src="https://pic4.zhimg.com/80/v2-542db02f15d327ccc7558df7a8e6e137_hd.jpg" width="60%"/>
<p><code>Numerical differential equation inspired networks</code>:</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtable width="100%"><mtr><mtd width="50%"></mtd><mtd><mrow><msub><mi>Y</mi><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo><mo stretchy="false">(</mo><mn>1</mn><mo>−</mo><msub><mi>k</mi><mi>t</mi></msub><mo stretchy="false">)</mo><msub><mi>Y</mi><mrow><mi>t</mi><mo>−</mo><mn>1</mn></mrow></msub><mo>+</mo><msub><mi>k</mi><mi>t</mi></msub><msub><mi>Y</mi><mi>t</mi></msub><mo>+</mo><mi>h</mi><mi>σ</mi><mo stretchy="false">(</mo><msub><mi>K</mi><mi>t</mi></msub><msub><mi>Y</mi><mi>t</mi></msub><mo>+</mo><msub><mi>b</mi><mi>t</mi></msub><mo stretchy="false">)</mo><mi mathvariant="normal">.</mi></mrow></mtd><mtd width="50%"></mtd><mtd><mtext>(Linear multi-step structure)</mtext></mtd></mtr></mtable><annotation encoding="application/x-tex">Y_{t+1} = (1-k_t)Y_{t-1} + k_t Y_t + h \sigma(K_{t} Y_{t} + b_{t})\tag{Linear multi-step structure}.
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.891661em;vertical-align:-0.208331em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.301108em;"><span style="top:-2.5500000000000003em;margin-left:-0.22222em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">t</span><span class="mbin mtight">+</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord">1</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:-0.03148em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.301108em;"><span style="top:-2.5500000000000003em;margin-left:-0.22222em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">t</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.84444em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:-0.03148em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault 
mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:-0.22222em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault">h</span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.07153em;">K</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:-0.07153em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">t</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:-0.22222em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">t</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathdefault">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">t</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mord">.</span></span><span class="tag"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord 
text"><span class="mord">(</span><span class="mord"><span class="mord">L</span><span class="mord">i</span><span class="mord">n</span><span class="mord">e</span><span class="mord">a</span><span class="mord">r</span><span class="mord"> </span><span class="mord">m</span><span class="mord">u</span><span class="mord">l</span><span class="mord">t</span><span class="mord">i</span><span class="mord">-</span><span class="mord">s</span><span class="mord">t</span><span class="mord">e</span><span class="mord">p</span><span class="mord"> </span><span class="mord">s</span><span class="mord">t</span><span class="mord">r</span><span class="mord">u</span><span class="mord">c</span><span class="mord">t</span><span class="mord">u</span><span class="mord">r</span><span class="mord">e</span></span><span class="mord">)</span></span></span></span></span></span></p>
<ul>
<li><a href="https://web.stanford.edu/~yplu/proj/lm/">Bridging Deep Architects and Numerical Differential Equations</a></li>
<li><a href="http://helper.ipam.ucla.edu/publications/glws3/glws3_15460.pdf">BRIDGING DEEP NEURAL NETWORKS AND DIFFERENTIAL EQUATIONS FOR IMAGE ANALYSIS AND BEYOND</a></li>
<li><a href="https://arxiv.org/abs/1710.10121">Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations</a></li>
<li><a href="http://bicmr.pku.edu.cn/~dongbin/">http://bicmr.pku.edu.cn/~dongbin/</a></li>
<li><a href="https://arxiv.org/pdf/1906.02762.pdf">https://arxiv.org/pdf/1906.02762.pdf</a></li>
<li><a href="https://zhuanlan.zhihu.com/p/87999707">Neural ODE Paper List</a></li>
</ul>
<ul>
<li><a href="https://ieeexplore.ieee.org/document/8281501">A Multiscale and Multidepth Convolutional Neural Network for Remote Sensing Imagery Pan-Sharpening</a></li>
<li><a href="https://arxiv.org/abs/1808.02376">https://arxiv.org/abs/1808.02376</a></li>
<li><a href="https://www.nature.com/articles/s41598-018-22871-z">Multimodal and Multiscale Deep Neural Networks for the Early Diagnosis of Alzheimer’s Disease using structural MR and FDG-PET images</a></li>
</ul>
<p><code>MgNet</code></p>
<p><a href="https://arxiv.org/pdf/1901.10415.pdf">As the solution space is often the dual of the data space in PDEs, the
analogous concept of feature space and data space (which are dual to each other) is introduced
in CNN. With such connections and new concept in the unified model, the function of various
convolution operations and pooling used in CNN can be better understood.</a></p>
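<p>A schematic PyTorch sketch of this multigrid reading of a CNN, with features <code>u</code> in the role of the solution and <code>f</code> in the role of the data: a few smoothing steps update <code>u</code> against the residual, and a strided convolution restricts both to the next coarser grid. Channel counts and the number of smoothing steps are assumptions; this is a loose sketch of the structure, not the paper's reference implementation.</p>
<pre><code class="language-python">import torch
import torch.nn as nn

class MgBlock(nn.Module):
    """One grid level: smoothing iterations u += B(relu(f - A(u))),
    then restriction (a strided convolution) to the next coarser grid."""
    def __init__(self, ch=16, smoothings=2):
        super().__init__()
        self.A = nn.Conv2d(ch, ch, 3, padding=1)   # "data" operator
        self.B = nn.Conv2d(ch, ch, 3, padding=1)   # "smoother"
        self.restrict = nn.Conv2d(ch, ch, 3, stride=2, padding=1)
        self.smoothings = smoothings

    def forward(self, u, f):
        for _ in range(self.smoothings):
            u = u + self.B(torch.relu(f - self.A(u)))
        return self.restrict(u), self.restrict(f)

block = MgBlock()
u = torch.zeros(1, 16, 32, 32)        # features ("solution space")
f = torch.randn(1, 16, 32, 32)        # data ("data space")
u2, f2 = block(u, f)                  # both now live on a 16x16 grid
</code></pre>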
<ul>
<li><a href="https://arxiv.org/pdf/1901.10415.pdf">MgNet: A Unified Framework of Multigrid and Convolutional Neural Network</a></li>
<li><a href="http://www.multigrid.org/img2019/img2019/Index/shortcourse.html">http://www.multigrid.org/img2019/img2019/Index/shortcourse.html</a></li>
<li><a href="https://deepai.org/machine-learning/researcher/jinchao-xu">https://deepai.org/machine-learning/researcher/jinchao-xu</a></li>
</ul>
<hr>
<ul>
<li><a href="http://www.ms.uky.edu/~qye/MA721/ma721F17.html">MA 721: Topics in Numerical Analysis: Deep Learning</a></li>
<li><a href="http://www.mathcs.emory.edu/~lruthot/teaching.html">http://www.mathcs.emory.edu/~lruthot/teaching.html</a></li>
<li><a href="https://www.math.ucla.edu/applied/cam">https://www.math.ucla.edu/applied/cam</a></li>
<li><a href="http://www.mathcs.emory.edu/~lruthot/">http://www.mathcs.emory.edu/~lruthot/</a></li>
<li><a href="https://autodiff-workshop.github.io/slides/Hueckelheim_nips_autodiff_CNN_PDE.pdf">Automatic Differentiation of Parallelised Convolutional Neural Networks - Lessons from Adjoint PDE Solvers</a></li>
<li><a href="https://www.math.tu-berlin.de/fachgebiete_ag_modnumdiff/angewandtefunktionalanalysis/v_menue/mitarbeiter/kutyniok/v_menue/kutyniok_publications/">A Theoretical Analysis of Deep Neural Networks and Parametric PDEs.</a></li>
<li><a href="https://raoyongming.github.io/">https://raoyongming.github.io/</a></li>
<li><a href="https://sites.google.com/prod/view/haizhaoyang/">https://sites.google.com/prod/view/haizhaoyang/</a></li>
<li><a href="https://github.com/HaizhaoYang">https://github.com/HaizhaoYang</a></li>
<li><a href="https://www.stat.uchicago.edu/events/rtg/index.shtml">https://www.stat.uchicago.edu/events/rtg/index.shtml</a></li>
</ul>
<h3 id="control-theory-and-deep-learning">Control Theory and Deep Learning</h3>
<p><a href="http://scriptedonachip.com/ml-control">It arose out of control theory literature when people were trying to identify highly complex and nonlinear dynamical systems. Neural networks – artificial neural networks – were first used in a supervised learning scenario in control theory. Hornik, if I remember correctly, was the first to find that neural networks were universal approximators.</a></p>
<blockquote>
<p>Supervised Deep Learning Problem
Given training data, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>Y</mi><mn>0</mn></msub></mrow><annotation encoding="application/x-tex">Y_0</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.22222em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span>, and labels, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>C</mi></mrow><annotation encoding="application/x-tex">C</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.07153em;">C</span></span></span></span>, find network parameters <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>θ</mi></mrow><annotation encoding="application/x-tex">\theta</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.02778em;">θ</span></span></span></span> and
classification weights <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>W</mi><mo separator="true">,</mo><mi>μ</mi></mrow><annotation encoding="application/x-tex">W, \mu</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8777699999999999em;vertical-align:-0.19444em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">W</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault">μ</span></span></span></span> such that the DNN predicts the data-label
relationship (and generalizes to new data), i.e., solve</p>
</blockquote>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mo><mi mathvariant="normal">minimize</mi><mo></mo></mo><mrow><mi>θ</mi><mo separator="true">,</mo><mi>W</mi><mo separator="true">,</mo><mi>μ</mi></mrow></msub><mi>l</mi><mi>o</mi><mi>s</mi><mi>s</mi><mo stretchy="false">[</mo><mi>g</mi><mo stretchy="false">(</mo><mi>W</mi><mo separator="true">,</mo><mi>μ</mi><mo stretchy="false">)</mo><mo separator="true">,</mo><mi>C</mi><mo stretchy="false">]</mo><mo>+</mo><mi>r</mi><mi>e</mi><mi>g</mi><mi>u</mi><mi>l</mi><mi>a</mi><mi>r</mi><mi>i</mi><mi>z</mi><mi>e</mi><mi>r</mi><mo stretchy="false">[</mo><mi>θ</mi><mo separator="true">,</mo><mi>W</mi><mo separator="true">,</mo><mi>μ</mi><mo stretchy="false">]</mo></mrow><annotation encoding="application/x-tex">\operatorname{minimize}_{ \theta,W,\mu} loss[g(W, \mu), C] + regularizer[\theta,W,\mu]
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.036108em;vertical-align:-0.286108em;"></span><span class="mop"><span class="mop"><span class="mord mathrm">m</span><span class="mord mathrm">i</span><span class="mord mathrm">n</span><span class="mord mathrm">i</span><span class="mord mathrm">m</span><span class="mord mathrm">i</span><span class="mord mathrm">z</span><span class="mord mathrm">e</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361079999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02778em;">θ</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight" style="margin-right:0.13889em;">W</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">μ</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.01968em;">l</span><span class="mord mathdefault">o</span><span class="mord mathdefault">s</span><span class="mord mathdefault">s</span><span class="mopen">[</span><span class="mord mathdefault" style="margin-right:0.03588em;">g</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.13889em;">W</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault">μ</span><span class="mclose">)</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.07153em;">C</span><span class="mclose">]</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.02778em;">r</span><span class="mord mathdefault">e</span><span class="mord mathdefault" style="margin-right:0.03588em;">g</span><span class="mord mathdefault">u</span><span class="mord mathdefault" style="margin-right:0.01968em;">l</span><span class="mord mathdefault">a</span><span class="mord mathdefault" style="margin-right:0.02778em;">r</span><span class="mord mathdefault">i</span><span class="mord mathdefault" style="margin-right:0.04398em;">z</span><span class="mord mathdefault">e</span><span class="mord mathdefault" style="margin-right:0.02778em;">r</span><span class="mopen">[</span><span class="mord mathdefault" style="margin-right:0.02778em;">θ</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">W</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault">μ</span><span class="mclose">]</span></span></span></span></span></p>
<p>This can be rewritten in compact form as</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mo><mi mathvariant="normal">minimize</mi><mo></mo></mo><mrow><mi>θ</mi><mo separator="true">,</mo><mi>W</mi><mo separator="true">,</mo><mi>μ</mi></mrow></msub><mi>l</mi><mi>o</mi><mi>s</mi><mi>s</mi><mo stretchy="false">[</mo><mi>g</mi><mo stretchy="false">(</mo><mi>W</mi><mo stretchy="false">(</mo><mi>T</mi><mo stretchy="false">)</mo><mi>Y</mi><mo stretchy="false">(</mo><mi>T</mi><mo stretchy="false">)</mo><mo>+</mo><mi>μ</mi><mo stretchy="false">)</mo><mo separator="true">,</mo><mi>C</mi><mo stretchy="false">]</mo><mo>+</mo><mi>r</mi><mi>e</mi><mi>g</mi><mi>u</mi><mi>l</mi><mi>a</mi><mi>r</mi><mi>i</mi><mi>z</mi><mi>e</mi><mi>r</mi><mo stretchy="false">[</mo><mi>θ</mi><mo separator="true">,</mo><mi>W</mi><mo separator="true">,</mo><mi>μ</mi><mo stretchy="false">]</mo><mspace linebreak="newline"></mspace><mtext>subject to </mtext><msub><mi mathvariant="normal">∂</mi><mi>t</mi></msub><mi>Y</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo>=</mo><mi>f</mi><mo stretchy="false">(</mo><mi>Y</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo separator="true">,</mo><mi>θ</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo><mo stretchy="false">)</mo><mo separator="true">,</mo><mi>Y</mi><mo stretchy="false">(</mo><mn>0</mn><mo stretchy="false">)</mo><mo>=</mo><msub><mi>Y</mi><mn>0</mn></msub><mi mathvariant="normal">.</mi></mrow><annotation encoding="application/x-tex">\operatorname{minimize}_{ \theta,W,\mu} loss[g(W(T)Y(T)+\mu), C] + regularizer[\theta,W,\mu]\\
\text{subject to }\partial_t Y(t) = f (Y(t), \theta(t)), Y(0) = Y_0.</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.036108em;vertical-align:-0.286108em;"></span><span class="mop"><span class="mop"><span class="mord mathrm">m</span><span class="mord mathrm">i</span><span class="mord mathrm">n</span><span class="mord mathrm">i</span><span class="mord mathrm">m</span><span class="mord mathrm">i</span><span class="mord mathrm">z</span><span class="mord mathrm">e</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361079999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.02778em;">θ</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight" style="margin-right:0.13889em;">W</span><span class="mpunct mtight">,</span><span class="mord mathdefault mtight">μ</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.286108em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.01968em;">l</span><span class="mord mathdefault">o</span><span class="mord mathdefault">s</span><span class="mord mathdefault">s</span><span class="mopen">[</span><span class="mord mathdefault" style="margin-right:0.03588em;">g</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.13889em;">W</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.13889em;">T</span><span class="mclose">)</span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.13889em;">T</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault">μ</span><span class="mclose">)</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.07153em;">C</span><span class="mclose">]</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.02778em;">r</span><span class="mord mathdefault">e</span><span class="mord mathdefault" style="margin-right:0.03588em;">g</span><span class="mord mathdefault">u</span><span class="mord mathdefault" style="margin-right:0.01968em;">l</span><span class="mord mathdefault">a</span><span class="mord mathdefault" style="margin-right:0.02778em;">r</span><span class="mord mathdefault">i</span><span class="mord mathdefault" style="margin-right:0.04398em;">z</span><span class="mord mathdefault">e</span><span class="mord mathdefault" style="margin-right:0.02778em;">r</span><span 
class="mopen">[</span><span class="mord mathdefault" style="margin-right:0.02778em;">θ</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.13889em;">W</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault">μ</span><span class="mclose">]</span></span><span class="mspace newline"></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord text"><span class="mord">subject to </span></span><span class="mord"><span class="mord" style="margin-right:0.05556em;">∂</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:-0.05556em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.02778em;">θ</span><span class="mopen">(</span><span class="mord mathdefault">t</span><span class="mclose">)</span><span class="mclose">)</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="mopen">(</span><span class="mord">0</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.22222em;">Y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:-0.22222em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord">.</span></span></span></span></span></p>
<ul>
<li><a href="https://arxiv.org/abs/1908.10920">Deep Learning Theory Review: An Optimal Control and Dynamical Systems Perspective</a></li>
<li><a href="http://proceedings.mlr.press/v80/li18b/li18b.pdf">An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks</a></li>
<li><a href="https://web.stanford.edu/~yplu/DynamicOCNN.pdf">Dynamic System and Optimal Control Perspective of Deep Learning</a></li>
<li><a href="https://www.nsf.gov/awardsearch/showAward?AWD_ID=1751636">A Flexible Optimal Control Framework for Efficient Training of Deep Neural Networks</a></li>
<li><a href="https://arxiv.org/pdf/1904.05657.pdf">Deep learning as optimal control problems: models and numerical methods</a></li>
<li><a href="https://deepai.org/publication/a-mean-field-optimal-control-formulation-of-deep-learning">A Mean-Field Optimal Control Formulation of Deep Learning</a></li>
<li><a href="http://scriptedonachip.com/ml-control">Control Theory and Machine Learning</a></li>
<li><a href="https://faculty.sites.uci.edu/khargonekar/files/2018/04/Control_ML_AI_Final.pdf">Advancing Systems and Control Research in the Era of ML and AI</a></li>
<li><a href="http://marcogallieri.micso.it/Home.html">http://marcogallieri.micso.it/Home.html</a></li>
<li><a href="http://www.eventideib.polimi.it/events/deep-learning-meets-control-theory-research-at-nnaisense-and-polimi/">Deep Learning meets Control Theory: Research at NNAISENSE and Polimi</a></li>
<li><a href="https://github.com/lakehanne/awesome-neurocontrol">Machine Learning-based Control</a></li>
<li><a href="https://www.nsf.gov/awardsearch/showAward?AWD_ID=1751636">CAREER: A Flexible Optimal Control Framework for Efficient Training of Deep Neural Networks</a></li>
<li><a href="https://www.zhihu.com/question/315809187/answer/623687046">https://www.zhihu.com/question/315809187/answer/623687046</a></li>
<li><a href="https://www4.comp.polyu.edu.hk/~cslzhang/paper/CVPR19-FOCNet.pdf">https://www4.comp.polyu.edu.hk/~cslzhang/paper/CVPR19-FOCNet.pdf</a></li>
</ul>
<h3 id="neural-ordinary-differential-equations">Neural Ordinary Differential Equations</h3>
<p><code>Neural ODE</code></p>
<ul>
<li><a href="http://www.cs.toronto.edu/~rtqichen/pdfs/neural_ode_slides.pdf">Neural Ordinary Differential Equations</a></li>
</ul>
<img src="https://rkevingibson.github.io/img/ode_networks_1.png" width="80%" />
<ul>
<li><a href="https://www.arxiv-vanity.com/papers/1908.03190/">NeuPDE: Neural Network Based Ordinary and Partial Differential Equations for Modeling Time-Dependent Data</a></li>
<li><a href="https://rajatvd.github.io/Neural-ODE-Adversarial/">Neural Ordinary Differential Equations and Adversarial Attacks</a></li>
<li><a href="http://ganguli-gang.stanford.edu/">Neural Dynamics and Computation Lab</a></li>
<li><a href="https://arxiv.org/abs/1908.03190">NeuPDE: Neural Network Based Ordinary and Partial Differential Equations for Modeling Time-Dependent Data</a></li>
<li><a href="https://math.ethz.ch/sam/research/reports.html?year=2019">https://math.ethz.ch/sam/research/reports.html?year=2019</a></li>
</ul>
<h2 id="dynamics-and-deep-learning">Dynamics and Deep Learning</h2>
<ul>
<li><a href="http://roseyu.com/">http://roseyu.com/</a></li>
<li><a href="https://link.springer.com/article/10.1007/s40304-017-0103-z">A Proposal on Machine Learning via Dynamical Systems</a></li>
<li><a href="http://www.scholarpedia.org/article/Attractor_network">http://www.scholarpedia.org/article/Attractor_network</a></li>
<li><a href="http://proceedings.mlr.press/v37/jozefowicz15.pdf">An Empirical Exploration of Recurrent Network Architectures</a></li>
<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3984152/">An Attractor-Based Complexity Measurement for Boolean Recurrent Neural Networks</a></li>
<li><a href="https://doaj.org/article/9d9172e9bf324cc6ac6d48ff8e234a85">Deep learning for universal linear embeddings of nonlinear dynamics</a></li>
<li><a href="http://ganguli-gang.stanford.edu/pdf/DynamLearn.pdf">Exact solutions to the nonlinear dynamics of learning in deep linear neural networks</a></li>
<li><a href="https://www.sciencedirect.com/science/article/pii/S0925231213009338">Continuous attractors of higher-order recurrent neural networks with infinite neurons</a></li>
<li><a href="https://www.researchgate.net/profile/Jiali_Yu3">https://www.researchgate.net/profile/Jiali_Yu3</a></li>
<li><a href="https://cbmm.mit.edu/sites/default/files/publications/aaai-abstract%20%281%29.pdf">Markov Transitions between Attractor States in a Recurrent Neural Network</a></li>
<li><a href="https://sagarverma.github.io/others/lit_rev_physics.pdf">A Survey on Machine Learning Applied to Dynamic Physical Systems</a></li>
<li><a href="https://deepdrive.berkeley.edu/project/dynamical-view-machine-learning-systems">https://deepdrive.berkeley.edu/project/dynamical-view-machine-learning-systems</a></li>
</ul>
<h3 id="stability-for-neural-networks">Stability For Neural Networks</h3>
<ul>
<li><a href="https://folk.uio.no/vegarant/">https://folk.uio.no/vegarant/</a></li>
<li><a href="https://www.mn.uio.no/math/english/people/aca/vegarant/index.html">https://www.mn.uio.no/math/english/people/aca/vegarant/index.html</a></li>
<li><a href="https://arxiv.org/pdf/1710.11029.pdf">https://arxiv.org/pdf/1710.11029.pdf</a></li>
<li><a href="http://www.vision.jhu.edu/tutorials/ICCV15-Tutorial-Math-Deep-Learning-Raja.pdf">http://www.vision.jhu.edu/tutorials/ICCV15-Tutorial-Math-Deep-Learning-Raja.pdf</a></li>
<li><a href="https://arxiv.org/abs/1705.03341">https://arxiv.org/abs/1705.03341</a></li>
<li><a href="https://izmailovpavel.github.io/">https://izmailovpavel.github.io/</a></li>
<li><a href="https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Zheng_Improving_the_Robustness_CVPR_2016_paper.pdf">https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Zheng_Improving_the_Robustness_CVPR_2016_paper.pdf</a></li>
</ul>
<h2 id="differential-equation-and-deep-learning">Differential Equation and Deep Learning</h2>
<p>This section covers how to use deep learning, and machine learning more generally, to solve differential equations numerically.</p>
<p>We derive upper bounds on the complexity of ReLU neural networks approximating the solution maps of parametric partial differential equations.
In particular, without any knowledge of its concrete shape, we use the inherent low-dimensionality of the solution manifold to obtain approximation rates
which are significantly superior to those provided by classical approximation results.
We use this low dimensionality to guarantee the existence of a reduced basis.
<a href="https://www.math.tu-berlin.de/fileadmin/i26_fg-kutyniok/Kutyniok/Papers/Parametric_PDEs_and_NNs_.pdf">Then, for a large variety of parametric partial differential equations, we construct neural networks that yield approximations of the parametric maps not suffering from a curse of dimension and essentially only depending on the size of the reduced basis.</a></p>
<ul>
<li><a href="https://math.ethz.ch/sam/research/reports.html?year=2019">https://math.ethz.ch/sam/research/reports.html?year=2019</a></li>
<li><a href="https://aimath.org/workshops/upcoming/deeppde/">https://aimath.org/workshops/upcoming/deeppde/</a></li>
<li><a href="https://github.com/IBM/pde-deep-learning">https://github.com/IBM/pde-deep-learning</a></li>
<li><a href="https://arxiv.org/abs/1804.04272">https://arxiv.org/abs/1804.04272</a></li>
<li><a href="https://deepai.org/machine-learning/researcher/weinan-e">https://deepai.org/machine-learning/researcher/weinan-e</a></li>
<li><a href="https://deepxde.readthedocs.io/en/latest/">https://deepxde.readthedocs.io/en/latest/</a></li>
<li><a href="https://github.com/IBM/pde-deep-learning">https://github.com/IBM/pde-deep-learning</a></li>
<li><a href="https://github.com/ZichaoLong/PDE-Net">https://github.com/ZichaoLong/PDE-Net</a></li>
<li><a href="https://github.com/amkatrutsa/DeepPDE">https://github.com/amkatrutsa/DeepPDE</a></li>
<li><a href="https://github.com/maziarraissi/DeepHPMs">https://github.com/maziarraissi/DeepHPMs</a></li>
<li><a href="https://github.com/markovmodel/deeptime">https://github.com/markovmodel/deeptime</a></li>
<li><a href="https://arxiv.org/abs/1801.06637">Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations</a></li>
<li><a href="https://rse-lab.cs.washington.edu/papers/spnets2018.pdf">SPNets: Differentiable Fluid Dynamics for Deep Neural Networks</a></li>
<li><a href="https://maziarraissi.github.io/DeepHPMs/">https://maziarraissi.github.io/DeepHPMs/</a></li>
<li><a href="https://www.math.tu-berlin.de/fileadmin/i26_fg-kutyniok/Kutyniok/Papers/Parametric_PDEs_and_NNs_.pdf">A Theoretical Analysis of Deep Neural Networks and Parametric PDEs</a></li>
<li><a href="http://ins.sjtu.edu.cn:3300/conferences/7/talks/314">Deep Approximation via Deep Learning</a></li>
</ul>
<h3 id="deep-learning-for-pdes">Deep Learning for PDEs</h3>
<ul>
<li><a href="https://link.springer.com/article/10.1007/s40304-018-0127-z">The Deep Ritz Method: A Deep Learning-Based Numerical Algorithm for Solving Variational Problems</a></li>
<li><a href="http://utstat.toronto.edu/~ali/papers/PDEandDeepLearning.pdf">Solving Nonlinear and High-Dimensional Partial Differential Equations via Deep Learning</a></li>
<li><a href="https://www.sciencedirect.com/science/article/pii/S0021999118305527">DGM: A deep learning algorithm for solving partial differential equations</a></li>
<li><a href="https://julialang.org/blog/2017/10/gsoc-NeuralNetDiffEq">NeuralNetDiffEq.jl: A Neural Network solver for ODEs</a></li>
<li><a href="https://www.pims.math.ca/scientific-event/190722-pcssdlcm">PIMS CRG Summer School: Deep Learning for Computational Mathematics</a></li>
</ul>
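<p>A minimal sketch in the spirit of these collocation-based solvers (closer to DGM-style residual minimization than to the Deep Ritz energy formulation): a small network is trained so that the residual of -u'' = f vanishes at random collocation points, with the boundary condition built into the ansatz. The toy problem -u'' = pi^2 sin(pi x) on [0, 1] with u(0) = u(1) = 0 (exact solution sin(pi x)) and the architecture are assumptions:</p>
<pre><code class="language-python">import math
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def u(x):
    # The ansatz x * (1 - x) * net(x) satisfies u(0) = u(1) = 0 exactly.
    return x * (1 - x) * net(x)

for step in range(2000):
    x = torch.rand(128, 1, requires_grad=True)        # collocation points
    ux = torch.autograd.grad(u(x).sum(), x, create_graph=True)[0]
    uxx = torch.autograd.grad(ux.sum(), x, create_graph=True)[0]
    f = (math.pi ** 2) * torch.sin(math.pi * x)
    loss = ((-uxx - f) ** 2).mean()                   # residual of -u'' = f
    opt.zero_grad()
    loss.backward()
    opt.step()
# The trained u should approximate the exact solution sin(pi * x).
</code></pre>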
<ul>
<li><a href="https://arxiv.org/abs/1806.07366">https://arxiv.org/abs/1806.07366</a></li>
<li><a href="https://mat.univie.ac.at/~grohs/">https://mat.univie.ac.at/~grohs/</a></li>
<li><a href="https://rse-lab.cs.washington.edu/">https://rse-lab.cs.washington.edu/</a></li>
<li><a href="http://www.ajentzen.de/">http://www.ajentzen.de/</a></li>
<li><a href="https://web.math.princeton.edu/~jiequnh/">https://web.math.princeton.edu/~jiequnh/</a></li>
</ul>
<h3 id="mathcal-h-matrix-and-deep-learning"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="script">H</mi></mrow><annotation encoding="application/x-tex">\mathcal H</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathcal" style="margin-right:0.00965em;">H</span></span></span></span> matrix and deep learning</h3>
<p><a href="https://web.stanford.edu/~lexing/mnnh.pdf">In this work we introduce a new multiscale artificial neural network based on the structure of H-matrices. This network generalizes the latter to the nonlinear case by introducing a local deep neural network at each spatial scale. Numerical results indicate that the network is able to efficiently approximate discrete nonlinear maps obtained from discretized nonlinear partial differential equations, such as those arising from nonlinear Schodinger equations and the KohnSham density functional theory.</a></p>
<ul>
<li><a href="https://web.stanford.edu/~lexing/mnnh.pdf">A multiscale neural network based on hierarchical matrices</a></li>
<li><a href="https://link.springer.com/article/10.1007%2Fs40687-019-0183-3">A multiscale neural network based on hierarchical nested bases</a></li>
</ul>
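<p>A loose PyTorch sketch of the multiscale idea, not the paper's architecture: a small local network acts at each spatial scale of the input, and the upsampled contributions are summed, mimicking the scale-by-scale structure of a hierarchical-matrix product. Channel counts, kernel sizes, and the number of scales are assumptions:</p>
<pre><code class="language-python">import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscaleNet(nn.Module):
    """Loose sketch: a small local network acts at each spatial scale,
    and the upsampled contributions are summed, mimicking the
    scale-by-scale structure of a hierarchical-matrix product."""
    def __init__(self, scales=3, ch=16):
        super().__init__()
        self.nets = nn.ModuleList([
            nn.Sequential(nn.Conv1d(1, ch, 5, padding=2), nn.ReLU(),
                          nn.Conv1d(ch, 1, 5, padding=2))
            for _ in range(scales)])

    def forward(self, v):                      # v: (batch, 1, n)
        out = torch.zeros_like(v)
        for s, net in enumerate(self.nets):
            coarse = F.avg_pool1d(v, 2 ** s) if s else v
            y = net(coarse)
            if s:
                y = F.interpolate(y, size=v.shape[-1], mode='linear')
            out = out + y
        return out

model = MultiscaleNet()
print(model(torch.randn(2, 1, 64)).shape)      # torch.Size([2, 1, 64])
</code></pre>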
<p><a href="https://www.researchgate.net/project/Mathematical-Theory-for-Deep-Neural-Networks">We aim to build a theoretical foundation for the analysis of deep neural networks to answer questions such as "What are the correct approximation spaces for deep neural networks?", "What is the advantage of deep versus shallow networks?", or "To which extent are deep neural networks able to detect low dimensional structures in high dimensional data?".</a></p>
<ul>
<li><a href="https://www.researchgate.net/profile/Gitta_Kutyniok">https://www.researchgate.net/profile/Gitta_Kutyniok</a></li>
<li><a href="https://www.researchgate.net/project/Mathematical-Theory-for-Deep-Neural-Networks">https://www.researchgate.net/project/Mathematical-Theory-for-Deep-Neural-Networks</a></li>
<li><a href="https://www.academia-net.org/profil/prof-dr-gitta-kutyniok/1133890">https://www.academia-net.org/profil/prof-dr-gitta-kutyniok/1133890</a></li>
<li><a href="https://www.tu-berlin.de/index.php?id=168945">https://www.tu-berlin.de/index.php?id=168945</a></li>
<li><a href="https://www.math.tu-berlin.de/?108957">https://www.math.tu-berlin.de/?108957</a></li>
<li><a href="https://arxiv.org/abs/1801.05894">Deep Learning: An Introduction for Applied Mathematicians</a></li>
</ul>
<h3 id="stochastic-differential-equations-and-deep-learning">Stochastic Differential Equations and Deep Learning</h3>
<ul>
<li><a href="http://www.stochasticlifestyle.com/neural-jump-sdes-jump-diffusions-and-neural-pdes/">Neural Jump SDEs (Jump Diffusions) and Neural PDEs</a></li>
<li><a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3366314">Deep-Learning Based Numerical BSDE Method for Barrier Options</a></li>
<li><a href="https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2017/2017-49.pdf">Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations</a></li>
</ul>
<h3 id="finite-element-methods-and-deep-learning">Finite Element Methods and Deep Learning</h3>
<ul>
<li><a href="http://www.multigrid.org/index.php?id=13">http://www.multigrid.org/index.php?id=13</a></li>
<li><a href="http://casopisi.junis.ni.ac.rs/index.php/FUMechEng/article/view/309">http://casopisi.junis.ni.ac.rs/index.php/FUMechEng/article/view/309</a></li>
<li><a href="http://people.math.sc.edu/imi/DASIV/">http://people.math.sc.edu/imi/DASIV/</a></li>
<li><a href="https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2019/2019-07.pdf">Deep ReLU Networks and High-Order Finite Element Methods</a></li>
<li><a href="https://math.psu.edu/events/35992">https://math.psu.edu/events/35992</a></li>
<li><a href="https://olemiss.edu/sciencenet/trefftz/Trefftz/Exeter/Javadi.pdf">Neural network for constitutive modelling in finite element analysis</a></li>
<li><a href="https://arxiv.org/abs/1807.03973">https://arxiv.org/abs/1807.03973</a></li>
<li><a href="https://royalsocietypublishing.org/doi/10.1098/rsif.2017.0844">A deep learning approach to estimate stress distribution: a fast and accurate surrogate of finite-element analysis</a></li>
<li><a href="https://repository.tudelft.nl/islandora/object/uuid%3A615f2151-bcae-4e78-a2cb-3f1891a28275">An Integrated Machine Learning and Finite Element Analysis Framework, Applied to Composite Substructures including Damage</a></li>
<li><a href="https://github.com/oleksiyskononenko/mlfem">https://github.com/oleksiyskononenko/mlfem</a></li>
<li><a href="https://people.math.gatech.edu/~wliao60/">https://people.math.gatech.edu/~wliao60/</a></li>
<li><a href="https://www.math.tu-berlin.de/fileadmin/i26_fg-kutyniok/Kutyniok/Papers/main.pdf">https://www.math.tu-berlin.de/fileadmin/i26_fg-kutyniok/Kutyniok/Papers/main.pdf</a></li>
</ul>
<h2 id="approximation-theory-for-deep-learning">Approximation Theory for Deep Learning</h2>
<p>Universal approximation theorems establish the expressive power of wide but shallow neural networks.
This section extends such approximation results to deep neural networks.</p>
<p><a href="https://epubs.siam.org/doi/pdf/10.1137/18M118709X">We derive fundamental lower bounds on the connectivity and the memory requirements of deep neural networks guaranteeing uniform approximation rates for arbitrary function classes in <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>L</mi><mn>2</mn></msup><mo stretchy="false">(</mo><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">L^2(\mathbb R^d)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.099108em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathdefault">L</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathbb">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.849108em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">d</span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span>. In other words, we establish a connection between the complexity of a function class and the complexity of deep neural networks approximating functions from this class to within a prescribed accuracy.</a></p>
<ul>
<li><a href="https://arxiv.org/abs/1901.02220">Deep Neural Network Approximation Theory</a></li>
<li><a href="https://cpb-us-w2.wpmucdn.com/blog.nus.edu.sg/dist/d/11132/files/2019/07/paper_cnn_copy.pdf">Approximation Analysis of Convolutional Neural Networks</a></li>
<li><a href="https://arxiv.org/abs/1608.03287">Deep vs. shallow networks : An approximation theory perspective</a></li>
<li><a href="https://arxiv.org/abs/1901.02220">Deep Neural Network Approximation Theory</a></li>
<li><a href="https://cpsc.yale.edu/sites/default/files/files/tr1513(1).pdf">Provable approximation properties for deep neural networks</a></li>
<li><a href="https://epubs.siam.org/doi/pdf/10.1137/18M118709X">Optimal Approximation with Sparsely Connected Deep Neural Networks</a></li>
<li><a href="http://helper.ipam.ucla.edu/publications/dlt2018/dlt2018_14936.pdf">Deep Learning: Approximation of Functions by Composition</a></li>
<li><a href="http://www.mit.edu/~9.520/fall16/Classes/deep_approx.html">Deep Neural Networks: Approximation Theory and Compositionality</a></li>
<li><a href="http://voigtlaender.xyz/DNNBonnHandout.pdf">DNN Bonn</a></li>
<li><a href="http://npfsa2017.uni-jena.de/l_notes/vybiral.pdf">From approximation theory to machine learning</a></li>
<li><a href="https://arxiv.org/abs/1808.04947">Collapse of Deep and Narrow Neural Nets</a></li>
<li><a href="https://www.math.tamu.edu/~foucart/publi/DDFHP.pdf">Nonlinear Approximation and (Deep) ReLU Networks</a></li>
<li><a href="http://www.ipam.ucla.edu/abstract/?tid=15953&pcode=GLWS3">Deep Approximation via Deep Learning</a></li>
<li><a href="https://github.com/loliverhennigh/Steady-State-Flow-With-Neural-Nets">Convolutional Neural Networks for Steady Flow Approximation</a></li>
<li><a href="https://www.eurandom.tue.nl/wp-content/uploads/2018/11/Johannes-Schmidt-Hieber-lecture-1-2.pdf">https://www.eurandom.tue.nl/wp-content/uploads/2018/11/Johannes-Schmidt-Hieber-lecture-1-2.pdf</a></li>
<li><a href="https://arxiv.org/abs/2006.00294">https://arxiv.org/abs/2006.00294</a></li>
<li><a href="https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2019/2019-64_fp.pdf">Efficient approximation of high-dimensional functions with deep neural networks</a></li>
</ul>
<h4 id="workshop">Workshop</h4>
<ul>
<li><a href="https://www.mfo.de/occasion/1842b">https://www.mfo.de/occasion/1842b</a></li>
<li><a href="https://www.mfo.de/occasion/1947a">https://www.mfo.de/occasion/1947a</a></li>
<li><a href="https://github.com/juliusberner/oberwolfach_workshop">https://github.com/juliusberner/oberwolfach_workshop</a></li>
<li><a href="https://www.math.tu-berlin.de/fileadmin/i26_fg-kutyniok/Petersen/DGD_Approximation_Theory.pdf">DGD Approximation Theory Workshop</a></li>
</ul>
<h4 id="labs-and-groups">Labs and Groups</h4>
<ul>
<li><a href="https://deepai.org/profile/julius-berner">https://deepai.org/profile/julius-berner</a></li>
<li><a href="https://www.cityu.edu.hk/ma/people/profile/zhoudx.htm">https://www.cityu.edu.hk/ma/people/profile/zhoudx.htm</a></li>
<li><a href="https://dblp.uni-trier.de/pers/hd/y/Yang:Haizhao">https://dblp.uni-trier.de/pers/hd/y/Yang:Haizhao</a></li>
<li><a href="https://math.duke.edu/people/ingrid-daubechies">https://math.duke.edu/people/ingrid-daubechies</a></li>
<li><a href="http://www.pc-petersen.eu/">http://www.pc-petersen.eu/</a></li>
<li><a href="https://wwwhome.ewi.utwente.nl/~schmidtaj/">https://wwwhome.ewi.utwente.nl/~schmidtaj/</a></li>
<li><a href="https://personal-homepages.mis.mpg.de/montufar/">https://personal-homepages.mis.mpg.de/montufar/</a></li>
<li><a href="https://www.math.tamu.edu/~foucart/">https://www.math.tamu.edu/~foucart/</a></li>
<li><a href="http://www.damtp.cam.ac.uk/user/sl767/#about">http://www.damtp.cam.ac.uk/user/sl767/#about</a></li>
<li><a href="http://voigtlaender.xyz/publications.html">http://voigtlaender.xyz/publications.html</a></li>
</ul>
<h3 id="the-f-principle">The F-Principle</h3>
<blockquote>
<p>Understanding the training process of Deep Neural Networks (DNNs) is a fundamental problem in the area of deep learning. Studying the training process from the frequency perspective makes important progress in understanding the strengths and weaknesses of DNNs, such as generalization and convergence speed, and may form part of “a reasonably complete picture about the main reasons behind the success of modern machine learning” (E et al., 2019).</p>
</blockquote>
<blockquote>
<p>The “Frequency Principle” was first named in the paper (Xu et al., 2018); then (Xu 2018; Xu et al., 2019) used more convincing experiments and a simple theory to demonstrate the universality of the Frequency Principle. Bengio's paper (Rahaman et al., 2019) also uses the simple theory in (Xu 2018; Xu et al., 2019) to understand the mechanism underlying the Frequency Principle for the ReLU activation function. Note that the second version of Rahaman et al. (2019) points out this citation clearly, but they reorganize this citation into “related works” in the final version. Later, Luo et al. (2019) studied the Frequency Principle in the general setting of deep neural networks and mathematically proved the Frequency Principle under the assumption of infinite samples. Zhang et al. (2019) studied the Frequency Principle in the NTK regime with finite sample points, explicitly characterizing the converging speed for each frequency and accurately predicting the learning results.</p>
</blockquote>
<p><a href="https://www.researchgate.net/project/Deep-learning-in-Fourier-domain">We aim to develop a theoretical framework on Fourier domain to analyze the Deep Neural Network (DNN) training process and understand the DNN generalization. We exemplified our theoretical results through DNNs fitting 1-d functions and the MNIST dataset.</a></p>
<ul>
<li><a href="https://www.researchgate.net/project/Deep-learning-in-Fourier-domain">Deep learning in Fourier domain</a></li>
<li><a href="http://ins.sjtu.edu.cn:3300/conferences/7/talks/319">Deep Learning Theory: The F-Principle and An Optimization Framework</a></li>
<li><a href="https://arxiv.org/abs/1901.06523">Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks</a></li>
<li><a href="https://arxiv.org/abs/1811.01316">Nonlinear Collaborative Scheme for Deep Neural Networks</a></li>
<li><a href="https://arxiv.org/abs/1906.00425">The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies</a></li>
<li><a href="https://arxiv.org/abs/1811.10146">Frequency Principle in Deep Learning with General Loss Functions and Its Potential Application</a></li>
<li><a href="https://arxiv.org/pdf/1906.09235v1.pdf">Theory of the Frequency Principle for General Deep Neural Networks</a></li>
<li><a href="https://arxiv.org/pdf/1905.10264.pdf">Explicitizing an Implicit Bias of the Frequency Principle in Two-layer Neural Networks</a></li>
<li><a href="https://www.researchgate.net/profile/Zhiqin_Xu">https://www.researchgate.net/profile/Zhiqin_Xu</a></li>
<li><a href="https://github.com/xuzhiqin1990/F-Principle">https://github.com/xuzhiqin1990/F-Principle</a></li>
<li><a href="https://ins.sjtu.edu.cn/people/xuzhiqin/">https://ins.sjtu.edu.cn/people/xuzhiqin/</a></li>
</ul>
<h2 id="inverse-problem-and-deep-learning">Inverse Problem and Deep Learning</h2>
<p><a href="https://deep-inverse.org/">There is a long history of algorithmic development for solving inverse problems arising in sensing and imaging systems and beyond.
Examples include medical and computational imaging, compressive sensing, as well as community detection in networks. Until recently,
most algorithms for solving inverse problems in the imaging and network sciences were based on static signal models derived from physics or intuition,
such as wavelets or sparse representations.</a></p>
<p><a href="https://deep-inverse.org/">Today</a>, the best performing approaches for the aforementioned image reconstruction and sensing problems are based on deep learning,
which learn various elements of the method including
i) signal representations,
ii) stepsizes and parameters of iterative algorithms,
iii) regularizers, and iv) entire inverse functions.
For example, it has recently been shown that solving a variety of inverse problems by transforming an iterative, physics-based algorithm into a deep network
whose parameters can be learned from training data, offers faster convergence and/or a better quality solution.
Moreover, even with very little or no learning, deep neural networks enable superior performance for classical linear inverse problems
such as denoising and compressive sensing. Motivated by those success stories, researchers are redesigning traditional imaging and sensing systems.</p>
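<p>A minimal sketch of this unrolling idea: a few iterations of ISTA for sparse recovery from y = Ax become network layers with a learned stepsize and soft-threshold each (a simplification of LISTA, which also learns the matrices; the forward operator <code>A</code> and all hyperparameters here are assumptions):</p>
<pre><code class="language-python">import torch
import torch.nn as nn

class UnrolledISTA(nn.Module):
    """K iterations of soft-thresholded gradient descent on
    0.5 * ||y - A x||^2, with a learned stepsize and threshold per layer."""
    def __init__(self, A, K=10):
        super().__init__()
        self.A = A
        self.steps = nn.Parameter(0.1 * torch.ones(K))
        self.thetas = nn.Parameter(0.01 * torch.ones(K))

    def forward(self, y):
        x = torch.zeros(self.A.shape[1], y.shape[1])
        for t, theta in zip(self.steps, self.thetas):
            x = x - t * (self.A.T @ (self.A @ x - y))             # gradient step
            x = torch.sign(x) * torch.relu(torch.abs(x) - theta)  # soft-threshold
        return x

A = torch.randn(20, 50)               # hypothetical forward operator
model = UnrolledISTA(A)
y = torch.randn(20, 5)                # 5 measurement vectors
x_hat = model(y)                      # trained end-to-end in practice
</code></pre>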
<ul>
<li><a href="https://earthscience.rice.edu/mathx2019/">MATH + X SYMPOSIUM ON INVERSE PROBLEMS AND DEEP LEARNING IN SPACE EXPLORATION</a></li>
</ul>
<ul>
<li><a href="http://cpaior2019.uowm.gr/">Sixteenth International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research</a></li>
<li><a href="https://github.com/mughanibu/Deep-Learning-for-Inverse-Problems">https://github.com/mughanibu/Deep-Learning-for-Inverse-Problems</a></li>
<li><a href="https://cv.snu.ac.kr/research/VDSR/">Accurate Image Super-Resolution Using Very Deep Convolutional Networks</a></li>
<li><a href="https://earthscience.rice.edu/mathx2019/">https://earthscience.rice.edu/mathx2019/</a></li>
<li><a href="https://www.researchgate.net/publication/329395098_On_Deep_Learning_for_Inverse_Problems">https://www.researchgate.net/publication/329395098_On_Deep_Learning_for_Inverse_Problems</a></li>
<li><a href="https://www.dlip.org/">Deep Learning and Inverse Problem</a></li>
<li><a href="https://www.scec.org/publication/8768">https://www.scec.org/publication/8768</a></li>
<li><a href="https://amds123.github.io/">https://amds123.github.io/</a></li>
<li><a href="https://github.com/IPAIopen">https://github.com/IPAIopen</a></li>
<li><a href="https://imaginary.org/snapshot/deep-learning-and-inverse-problems">https://imaginary.org/snapshot/deep-learning-and-inverse-problems</a></li>
<li><a href="https://www.researchgate.net/scientific-contributions/2150388821_Jaweria_Amjad">https://www.researchgate.net/scientific-contributions/2150388821_Jaweria_Amjad</a></li>
<li><a href="https://zif.ai/inverse-reinforcement-learning/">https://zif.ai/inverse-reinforcement-learning/</a></li>
<li><a href="https://kailaix.github.io/ADCMESlides/Inverse.pdf">Physics Based Machine Learning for Inverse Problems</a></li>
<li><a href="https://www.ece.nus.edu.sg/stfpage/elechenx/Papers/TGRS_Learning.pdf">https://www.ece.nus.edu.sg/stfpage/elechenx/Papers/TGRS_Learning.pdf</a></li>
</ul>
<h3 id="deep-learning-for-inverse-problems">Deep Learning for Inverse Problems</h3>
<ul>
<li><a href="https://arxiv.org/abs/1803.00092">Deep Learning for Inverse Problems</a></li>
<li><a href="https://deep-inverse.org/">Solving inverse problems with deep networks</a></li>
<li><a href="https://arxiv.org/abs/1901.03707">Neumann Networks for Inverse Problems in Imaging</a></li>
<li><a href="https://deepai.org/publication/unsupervised-deep-learning-algorithm-for-pde-based-forward-and-inverse-problems">https://deepai.org/publication/unsupervised-deep-learning-algorithm-for-pde-based-forward-and-inverse-problems</a></li>
</ul>
<h3 id="deep-inverse-optimization">Deep Inverse Optimization</h3>
<ul>
<li><a href="https://github.com/tankconcordia/deep_inv_opt">deep inverse optimization</a></li>
<li><a href="https://ori.ox.ac.uk/deep-irl/">https://ori.ox.ac.uk/deep-irl/</a></li>
<li><a href="https://physai.sciencesconf.org/data/pages/perez_2019_03_Institut_Pascal_AI_and_Physics_noanim.pdf">https://physai.sciencesconf.org/data/pages/perez_2019_03_Institut_Pascal_AI_and_Physics_noanim.pdf</a></li>
</ul>
<h2 id="random-matrix-theory-and-deep-learning">Random Matrix Theory and Deep Learning</h2>
<p>Random matrix theory studies matrices whose entries are sampled from specific probability distributions.
The weight matrices of a deep neural network are initialized at random, and because the model is over-parameterized,
it is hard to pin down the role of any individual parameter; random matrix theory instead characterizes the spectrum of a weight matrix as a whole (see the sketch immediately below).</p>
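<p>As a minimal sketch, assuming NumPy, of the basic computation these papers build on: the eigenvalue spectrum of a randomly initialized weight matrix concentrates on the support of the Marchenko-Pastur law, which is the baseline against which trained weights are compared in works such as the implicit self-regularization paper listed below. The layer shape and initialization scale are illustrative.</p>
<pre><code class="language-python">import numpy as np

n_out, n_in = 500, 1000                            # illustrative layer shape
W = np.random.randn(n_out, n_in) / np.sqrt(n_in)   # common random-init scaling
eigs = np.linalg.eigvalsh(W @ W.T)                 # spectrum of the correlation matrix W W^T

# Marchenko-Pastur support for aspect ratio q = n_out / n_in (unit variance):
q = n_out / n_in
lam_min, lam_max = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2
print(f"empirical spectrum: [{eigs.min():.3f}, {eigs.max():.3f}]")
print(f"MP support:         [{lam_min:.3f}, {lam_max:.3f}]")
</code></pre>
<p>After training, deviations of the empirical spectrum from this support (e.g., heavy tails or outlier eigenvalues) are the kind of signal that this line of work studies.</p>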
<ul>
<li><a href="http://romaincouillet.hebfree.org/">http://romaincouillet.hebfree.org/</a></li>
<li><a href="https://zhenyu-liao.github.io/">https://zhenyu-liao.github.io/</a></li>
<li><a href="https://dionisos.wp.imt.fr/">https://dionisos.wp.imt.fr/</a></li>
<li><a href="https://project.inria.fr/paiss/">https://project.inria.fr/paiss/</a></li>
<li><a href="https://zhenyu-liao.github.io/activities/">https://zhenyu-liao.github.io/activities/</a></li>
<li><a href="https://arxiv.org/abs/1810.01075">Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning</a></li>
<li><a href="https://zhenyu-liao.github.io/pdf/pre/Matrix_talk_liao_handout.pdf">Recent Advances in Random Matrix Theory for Modern Machine Learning</a></li>
<li><a href="https://ir.library.louisville.edu/cgi/viewcontent.cgi?article=2227&context=etd">Features extraction using random matrix theory</a></li>
<li><a href="https://papers.nips.cc/paper/6857-nonlinear-random-matrix-theory-for-deep-learning.pdf">Nonlinear random matrix theory for deep learning</a></li>
<li><a href="https://arxiv.org/pdf/1702.05419.pdf">A RANDOM MATRIX APPROACH TO NEURAL NETWORKS</a></li>
<li><a href="http://proceedings.mlr.press/v48/couillet16.pdf">A Random Matrix Approach to Echo-State Neural Networks</a></li>
<li><a href="https://hal.archives-ouvertes.fr/hal-01962073">Harnessing neural networks: A random matrix approach</a></li>
<li><a href="https://www.csail.mit.edu/event/tensor-programs-swiss-army-knife-nonlinear-random-matrix-theory-deep-learning-and-beyond">Tensor Programs: A Swiss-Army Knife for Nonlinear Random Matrix Theory of Deep Learning and Beyond</a></li>
<li><a href="https://arxiv.org/abs/1902.04760">Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation</a></li>
<li><a href="http://www-math.mit.edu/~edelman/publications/random_matrix_theory_innovative.pdf">Random Matrix Theory and its Innovative Applications∗</a></li>
<li><a href="https://romaincouillet.hebfree.org/docs/conf/ELM_icassp.pdf">https://romaincouillet.hebfree.org/docs/conf/ELM_icassp.pdf</a></li>
<li><a href="https://romaincouillet.hebfree.org/docs/conf/NN_ICML.pdf">https://romaincouillet.hebfree.org/docs/conf/NN_ICML.pdf</a></li>
<li><a href="http://www.vision.jhu.edu/tutorials/CVPR16-Tutorial-Math-Deep-Learning-Raja.pdf">http://www.vision.jhu.edu/tutorials/CVPR16-Tutorial-Math-Deep-Learning-Raja.pdf</a></li>
<li><a href="https://www.lri.fr/TAU_seminars/videos/Romain_Couillet_12juin2017/talk_lri.pdf">A Random Matrix Framework for BigData Machine Learning</a></li>
</ul>
<h3 id="nonlinear-random-matrix-theory">Nonlinear Random Matrix Theory</h3>
<ul>
<li><a href="https://ai.google/research/pubs/pub46342">https://ai.google/research/pubs/pub46342</a></li>
<li><a href="http://people.cs.uchicago.edu/~pworah/nonlinear_rmt.pdf">http://people.cs.uchicago.edu/~pworah/nonlinear_rmt.pdf</a></li>
<li><a href="https://toc.csail.mit.edu/node/1314">A SWISS-ARMY KNIFE FOR NONLINEAR RANDOM MATRIX THEORY OF DEEP LEARNING AND BEYOND</a></li>
<li><a href="https://simons.berkeley.edu/talks/9-24-mahoney-deep-learning">https://simons.berkeley.edu/talks/9-24-mahoney-deep-learning</a></li>
<li><a href="https://cs.stanford.edu/people/mmahoney/">https://cs.stanford.edu/people/mmahoney/</a></li>
<li><a href="https://www.stat.berkeley.edu/~mmahoney/f13-stat260-cs294/">https://www.stat.berkeley.edu/~mmahoney/f13-stat260-cs294/</a></li>
<li><a href="https://arxiv.org/abs/1902.04760">https://arxiv.org/abs/1902.04760</a></li>
<li><a href="https://melaseddik.github.io/">https://melaseddik.github.io/</a></li>
<li><a href="https://thayafluss.github.io/">https://thayafluss.github.io/</a></li>
</ul>