initial build

PyTorchKorea · Apr 24, 2022 · 82cdb4b
1 parent 8ca04d7
Showing 479 changed files with 201,083 additions and 0 deletions.
diff --git a/docs/.buildinfo b/docs/.buildinfo
@@ -0,0 +1,4 @@
+# Sphinx build info version 1
+# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
+config: 64a5804c254257c0ad7a8dc6e19c1484
+tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/docs/.nojekyll b/docs/.nojekyll
diff --git a/docs/CNAME b/docs/CNAME
@@ -0,0 +1 @@
+docs.pytorchlightning.kr
diff --git a/docs/_images/figure-parity-times.png b/docs/_images/figure-parity-times.png
diff --git a/docs/_images/lr_finder.png b/docs/_images/lr_finder.png
diff --git a/docs/_images/profiler.png b/docs/_images/profiler.png
diff --git a/docs/_modules/index.html b/docs/_modules/index.html
diff --git a/docs/_modules/pytorch_lightning/callbacks/base.html b/docs/_modules/pytorch_lightning/callbacks/base.html
diff --git a/docs/_modules/pytorch_lightning/core/datamodule.html b/docs/_modules/pytorch_lightning/core/datamodule.html
diff --git a/docs/_modules/pytorch_lightning/core/lightning.html b/docs/_modules/pytorch_lightning/core/lightning.html
diff --git a/docs/_modules/pytorch_lightning/loggers/comet.html b/docs/_modules/pytorch_lightning/loggers/comet.html
diff --git a/docs/_modules/pytorch_lightning/loggers/csv_logs.html b/docs/_modules/pytorch_lightning/loggers/csv_logs.html
diff --git a/docs/_modules/pytorch_lightning/loggers/mlflow.html b/docs/_modules/pytorch_lightning/loggers/mlflow.html
diff --git a/docs/_modules/pytorch_lightning/loggers/neptune.html b/docs/_modules/pytorch_lightning/loggers/neptune.html
diff --git a/docs/_modules/pytorch_lightning/loggers/tensorboard.html b/docs/_modules/pytorch_lightning/loggers/tensorboard.html
diff --git a/docs/_modules/pytorch_lightning/loggers/wandb.html b/docs/_modules/pytorch_lightning/loggers/wandb.html
diff --git a/docs/_modules/pytorch_lightning/loops/base.html b/docs/_modules/pytorch_lightning/loops/base.html
diff --git a/docs/_modules/pytorch_lightning/trainer/trainer.html b/docs/_modules/pytorch_lightning/trainer/trainer.html
diff --git a/docs/_sources/accelerators/accelerator_prepare.rst.txt b/docs/_sources/accelerators/accelerator_prepare.rst.txt
@@ -0,0 +1,165 @@
+:orphan:
+
+.. _gpu_prepare:
+
+########################################
+Hardware agnostic training (preparation)
+########################################
+
+To train on CPU/GPU/TPU without changing your code, we need to build a few good habits :)
+
+----
+
+*****************************
+Delete .cuda() or .to() calls
+*****************************
+
+Delete any calls to ``.cuda()`` or ``.to(device)``. Lightning places the model and its inputs on the correct device for you.
+
+.. testcode::
+
+    # before lightning
+    def forward(self, x):
+        x = x.cuda(0)
+        layer_1.cuda(0)
+        x_hat = layer_1(x)
+
+
+    # after lightning
+    def forward(self, x):
+        x_hat = layer_1(x)
+
+----
+
+**********************************************
+Init tensors using type_as and register_buffer
+**********************************************
+When you need to create a new tensor, call ``type_as`` on it with an existing tensor, such as the input.
+The new tensor then follows the type of the existing one, so your code scales to any number of GPUs or TPUs with Lightning.
+
+.. testcode::
+
+    # before lightning
+    def forward(self, x):
+        z = torch.Tensor(2, 3)
+        z = z.cuda(0)
+
+
+    # with lightning
+    def forward(self, x):
+        z = torch.Tensor(2, 3)
+        z = z.type_as(x)
+
+The :class:`~pytorch_lightning.core.lightning.LightningModule` knows what device it is on. You can access it via ``self.device``.
+Sometimes it is necessary to store tensors as module attributes. However, if they are not parameters, they will
+remain on the CPU even if the module is moved to a new device. To prevent that and remain device agnostic,
+register the tensor as a buffer in your module's ``__init__`` method with :meth:`~torch.nn.Module.register_buffer`.
+
+.. testcode::
+
+    class LitModel(LightningModule):
+        def __init__(self):
+            ...
+            self.register_buffer("sigma", torch.eye(3))
+            # you can now access self.sigma anywhere in your module
+
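The difference between a registered buffer and a plain tensor attribute can be checked without a GPU. The sketch below is only an illustration: the class name ``WithBuffer`` and the attribute ``plain`` are made up, and a dtype conversion with ``.double()`` is used as a stand-in for a device move, on the assumption that both go through the same module-wide conversion of parameters and buffers.

```python
import torch
from torch import nn


class WithBuffer(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        # Registered buffer: tracked by the module and converted with it.
        self.register_buffer("sigma", torch.eye(3))
        # Plain attribute: the module does not know about this tensor.
        self.plain = torch.eye(3)


# .double() stands in for moving the module to another device.
module = WithBuffer().double()

print(module.sigma.dtype)  # the buffer was converted together with the module
print(module.plain.dtype)  # the plain attribute was left untouched
print("sigma" in module.state_dict())
```

The buffer also appears in ``state_dict()``, so it is saved and restored with the model, while the plain attribute is not.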
+----
+
+***************
+Remove samplers
+***************
+
+:class:`~torch.utils.data.distributed.DistributedSampler` is automatically handled by Lightning.
+
+See :ref:`replace-sampler-ddp` for more information.
+
+----
+
+***************************************
+Synchronize validation and test logging
+***************************************
+
+When running in distributed mode, we have to ensure that the validation and test step logging calls are synchronized across processes.
+This is done by adding ``sync_dist=True`` to all ``self.log`` calls in the validation and test steps.
+This ensures that each GPU worker sees the same values when tracking model checkpoints, which is important for downstream tasks such as testing the best checkpoint across all workers.
+The ``sync_dist`` option can also be used in logging calls in the other step methods, but be aware that this can add significant communication overhead and slow down your training.
+
+Note that if you use any built-in metrics or custom metrics that use `TorchMetrics <https://torchmetrics.readthedocs.io/>`_, these do not need to be updated; they are handled for you automatically.
+
+.. testcode::
+
+    def validation_step(self, batch, batch_idx):
+        x, y = batch
+        logits = self(x)
+        loss = self.loss(logits, y)
+        # Add sync_dist=True to sync logging across all GPU workers (may have performance impact)
+        self.log("validation_loss", loss, on_step=True, on_epoch=True, sync_dist=True)
+
+
+    def test_step(self, batch, batch_idx):
+        x, y = batch
+        logits = self(x)
+        loss = self.loss(logits, y)
+        # Add sync_dist=True to sync logging across all GPU workers (may have performance impact)
+        self.log("test_loss", loss, on_step=True, on_epoch=True, sync_dist=True)
+
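To see why unsynchronized logging makes workers disagree, here is a plain-Python simulation. It is not Lightning code: the per-rank losses are made-up numbers, both helper functions are hypothetical, and it assumes the values are combined with a mean reduction.

```python
from statistics import mean


def log_without_sync(local_losses):
    """Each rank records only the loss it computed on its own data shard."""
    return list(local_losses)


def log_with_sync(local_losses):
    """Every rank records the mean over all ranks (simulated all-reduce)."""
    reduced = mean(local_losses)
    return [reduced for _ in local_losses]


# Made-up validation losses, one per rank.
local_losses = [0.5, 0.25, 0.75, 0.5]

unsynced = log_without_sync(local_losses)
synced = log_with_sync(local_losses)

# Without syncing, ranks monitor different values and may pick different
# "best" checkpoints; with syncing, all ranks agree on one value.
print(len(set(unsynced)))
print(len(set(synced)))
```

The simulated reduction is also where the communication overhead mentioned above comes from: every synchronized log call needs the values from all ranks.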
+It is possible to perform some computation manually and log the reduced result on rank 0 as follows:
+
+.. testcode::
+
+    def test_step(self, batch, batch_idx):
+        x, y = batch
+        tensors = self(x)
+        return tensors
+
+
+    def test_epoch_end(self, outputs):
+        mean = torch.mean(self.all_gather(outputs))
+
+        # When logging only on rank 0, don't forget to add
+        # ``rank_zero_only=True`` to avoid deadlocks on synchronization.
+        if self.trainer.is_global_zero:
+            self.log("my_reduced_metric", mean, rank_zero_only=True)
+
+----
+
+**********************
+Make models pickleable
+**********************
+It's very likely your code is already `pickleable <https://docs.python.org/3/library/pickle.html>`_,
+in which case no change is necessary.
+However, if you run a distributed model and get the following error:
+
+.. code-block::
+
+    self._launch(process_obj)
+    File "/net/software/local/python/3.6.5/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47,
+    in _launch reduction.dump(process_obj, fp)
+    File "/net/software/local/python/3.6.5/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
+    ForkingPickler(file, protocol).dump(obj)
+    _pickle.PicklingError: Can't pickle <function <lambda> at 0x2b599e088ae8>:
+    attribute lookup <lambda> on __main__ failed
+
+This means something in your model definition, transforms, optimizer, dataloader or callbacks cannot be pickled, and the following code will fail:
+
+.. code-block:: python
+
+    import pickle
+
+    pickle.dumps(some_object)
+
+This is a limitation of using multiple processes for distributed training within PyTorch.
+To fix this issue, find the piece of code that cannot be pickled. The end of the stack trace
+is usually helpful.
+For example, in the stack trace shown here, there appears to be a lambda function somewhere in the code
+that cannot be pickled.
+
+.. code-block::
+
+    self._launch(process_obj)
+    File "/net/software/local/python/3.6.5/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47,
+    in _launch reduction.dump(process_obj, fp)
+    File "/net/software/local/python/3.6.5/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
+    ForkingPickler(file, protocol).dump(obj)
+    _pickle.PicklingError: Can't pickle [THIS IS THE THING TO FIND AND DELETE]:
+    attribute lookup <lambda> on __main__ failed
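A quick way to locate the offending object is to try pickling candidates one at a time. The sketch below uses only the standard library. The helper ``is_picklable`` is a hypothetical name, and replacing the lambda with ``functools.partial`` is one possible fix among several (a named module-level function works as well).

```python
import functools
import operator
import pickle


def is_picklable(obj) -> bool:
    """Return True if ``obj`` survives ``pickle.dumps`` (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A lambda has no importable name, so pickle cannot look it up.
double_lambda = lambda x: x * 2

# One possible replacement: a partial over an importable function.
double_partial = functools.partial(operator.mul, 2)

print(is_picklable(double_lambda))   # False
print(is_picklable(double_partial))  # True

restored = pickle.loads(pickle.dumps(double_partial))
print(restored(21))  # 42
```

Running the same check on transforms, callbacks, and dataloader arguments narrows down which one needs to change.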
diff --git a/docs/_sources/accelerators/gpu.rst.txt b/docs/_sources/accelerators/gpu.rst.txt
@@ -0,0 +1,63 @@
+.. _gpu:
+
+Accelerator: GPU training
+=========================
+
+.. raw:: html
+
+    <div class="display-card-container">
+        <div class="row">
+
+.. Add callout items below this line
+
+.. displayitem::
+   :header: Prepare your code (Optional)
+   :description: Prepare your code to run on any hardware
+   :col_css: col-md-4
+   :button_link: accelerator_prepare.html
+   :height: 150
+   :tag: basic
+
+.. displayitem::
+   :header: Basic
+   :description: Learn the basics of single and multi-GPU training.
+   :col_css: col-md-4
+   :button_link: gpu_basic.html
+   :height: 150
+   :tag: basic
+
+.. displayitem::
+   :header: Intermediate
+   :description: Learn about different distributed strategies, torchelastic and how to optimize communication layers.
+   :col_css: col-md-4
+   :button_link: gpu_intermediate.html
+   :height: 150
+   :tag: intermediate
+
+.. displayitem::
+   :header: Advanced
+   :description: Train 1 trillion+ parameter models with these techniques.
+   :col_css: col-md-4
+   :button_link: gpu_advanced.html
+   :height: 150
+   :tag: advanced
+
+.. displayitem::
+   :header: Expert
+   :description: Develop new strategies for training and deploying larger and larger models.
+   :col_css: col-md-4
+   :button_link: gpu_expert.html
+   :height: 150
+   :tag: expert
+
+.. displayitem::
+   :header: FAQ
+   :description: Frequently asked questions about GPU training.
+   :col_css: col-md-4
+   :button_link: gpu_faq.html
+   :height: 150
+
+.. raw:: html
+
+        </div>
+    </div>
diff --git a/docs/_sources/accelerators/gpu_advanced.rst.txt b/docs/_sources/accelerators/gpu_advanced.rst.txt
@@ -0,0 +1,16 @@
+:orphan:
+
+.. _gpu_advanced:
+
+GPU training (Advanced)
+=======================
+**Audience:** Users looking to scale massive models (e.g., 1 trillion parameters).
+
+----
+
+For experts pushing the state of the art in model development, Lightning offers various techniques to enable models at the scale of a trillion parameters or more.
+
+----
+
+..
+    .. include:: ../advanced/model_parallel.rst
diff --git a/docs/_sources/accelerators/gpu_basic.rst.txt b/docs/_sources/accelerators/gpu_basic.rst.txt
@@ -0,0 +1,97 @@
+:orphan:
+
+.. _gpu_basic:
+
+GPU training (Basic)
+====================
+**Audience:** Users looking to save money and run large models faster using single or multiple GPUs.
+
+----
+
+What is a GPU?
+--------------
+A Graphics Processing Unit (GPU) is a specialized hardware accelerator designed to speed up the mathematical computations used in gaming and deep learning.
+
+----
+
+Train on 1 GPU
+--------------
+
+Make sure you're running on a machine with at least one GPU. There's no need to specify any NVIDIA flags
+as Lightning will do it for you.
+
+.. testcode::
+    :skipif: torch.cuda.device_count() < 1
+
+    trainer = Trainer(accelerator="gpu", devices=1)
+
+----------------
+
+
+.. _multi_gpu:
+
+Train on multiple GPUs
+----------------------
+
+To use multiple GPUs, pass either the number of devices or the indices of the GPUs to the Trainer.
+
+.. code::
+
+    trainer = Trainer(accelerator="gpu", devices=4)
+
+Choosing GPU devices
+^^^^^^^^^^^^^^^^^^^^
+
+You can select the GPU devices using a count, a list of indices, or a string containing
+a comma-separated list of GPU ids:
+
+.. testsetup::
+
+    k = 1
+
+.. testcode::
+    :skipif: torch.cuda.device_count() < 2
+
+    # DEFAULT (int) specifies how many GPUs to use per node
+    Trainer(accelerator="gpu", devices=k)
+
+    # Above is equivalent to
+    Trainer(accelerator="gpu", devices=list(range(k)))
+
+    # Specify which GPUs to use (don't use when running on cluster)
+    Trainer(accelerator="gpu", devices=[0, 1])
+
+    # Equivalent using a string
+    Trainer(accelerator="gpu", devices="0, 1")
+
+    # To use all available GPUs put -1 or '-1'
+    # equivalent to list(range(torch.cuda.device_count()))
+    Trainer(accelerator="gpu", devices=-1)
+
+The table below lists examples of possible input formats and how they are interpreted by Lightning.
+
++------------------+-----------+---------------------+---------------------------------+
+| ``devices``      | Type      | Parsed              | Meaning                         |
++==================+===========+=====================+=================================+
+| 3                | int       | [0, 1, 2]           | first 3 GPUs                    |
++------------------+-----------+---------------------+---------------------------------+
+| -1               | int       | [0, 1, 2, ...]      | all available GPUs              |
++------------------+-----------+---------------------+---------------------------------+
+| [0]              | list      | [0]                 | GPU 0                           |
++------------------+-----------+---------------------+---------------------------------+
+| [1, 3]           | list      | [1, 3]              | GPUs 1 and 3                    |
++------------------+-----------+---------------------+---------------------------------+
+| "3"              | str       | [0, 1, 2]           | first 3 GPUs                    |
++------------------+-----------+---------------------+---------------------------------+
+| "1, 3"           | str       | [1, 3]              | GPUs 1 and 3                    |
++------------------+-----------+---------------------+---------------------------------+
+| "-1"             | str       | [0, 1, 2, ...]      | all available GPUs              |
++------------------+-----------+---------------------+---------------------------------+
+
+.. note::
+
+    When specifying the number of ``devices`` as an integer (``devices=k``), setting the trainer flag
+    ``auto_select_gpus=True`` will automatically help you find ``k`` GPUs that are not
+    occupied by other processes. This is especially useful when GPUs are configured
+    to be in "exclusive mode", such that only one process at a time can access them.
+    For more details see the :doc:`trainer guide <../common/trainer>`.
diff --git a/docs/_sources/accelerators/gpu_expert.rst.txt b/docs/_sources/accelerators/gpu_expert.rst.txt
@@ -0,0 +1,21 @@
+:orphan:
+
+.. _gpu_expert:
+
+GPU training (Expert)
+=====================
+**Audience:** Experts creating new scaling techniques such as DeepSpeed or FSDP.
+
+----
+
+Lightning lets experts who research new ways of optimizing distributed training and inference create their own strategies and plug them into Lightning.
+
+For example, Lightning worked closely with the Microsoft team to develop a DeepSpeed integration and with the Facebook (Meta) team to develop an FSDP integration.
+
+----
+
+.. include:: ../advanced/strategy_registry.rst
+
+----
+
+.. include:: ../extensions/strategy.rst