Lightning-Hydra-Template
A clean and scalable template to kickstart your deep learning project
Click on Use this template to initialize a new repository.
Suggestions are always welcome!
Introduction
Why you should use it:
- Convenient all-in-one technology stack for deep learning prototyping – allows you to rapidly iterate over new models, datasets and tasks on different hardware accelerators like CPU, multi-GPU or TPU.
- A collection of best practices for efficient workflow and reproducibility.
- Thoroughly commented – you can use this repo as a reference and educational resource.
Why you shouldn’t use it:
- Lightning and Hydra are still evolving and integrate many libraries, which means sometimes things break – for the list of currently known problems visit this page.
- The template is not really adjusted for data science and building data pipelines that depend on each other (it's much more useful for model prototyping on ready-to-use data).
- The configuration setup is built with simple lightning training in mind (you might need to put in some effort to adjust it for different use cases, e.g. Lightning Lite).
- It limits you as much as PyTorch Lightning limits you.
Keep in mind this is an unofficial community project.
Main Technologies
PyTorch Lightning – a lightweight PyTorch wrapper for high-performance AI research. Think of it as a framework for organizing your PyTorch code.
Hydra – a framework for elegantly configuring complex applications. The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line.
Main Ideas Of This Template
- Predefined Structure: clean and scalable so that work can easily be extended | # Project Structure
- Rapid Experimentation: thanks to hydra command line superpowers | # Your Superpowers
- Little Boilerplate: thanks to automating pipelines with config instantiation | # How It Works
- Main Configs: specify default training configuration | # Main Config
- Experiment Configs: override chosen hyperparameters | # Experiment Config
- Workflow: comes down to 4 simple steps | # Workflow
- Experiment Tracking: Tensorboard, W&B, Neptune, Comet, MLFlow and CSVLogger | # Experiment Tracking
- Logs: all logs (checkpoints, configs, etc.) are stored in a dynamically generated folder structure | # Logs
- Hyperparameter Search: made easier with Hydra plugins like Optuna Sweeper | # Hyperparameter Search
- Tests: generic, easy-to-adapt tests for speeding up the development | # Tests
- Continuous Integration: automatically test your repo with Github Actions | # Continuous Integration
- Best Practices: a couple of recommended tools, practices and standards | # Best Practices
Project Structure
The directory structure of a new project looks like this:
├── configs                   <- Hydra configuration files
│   ├── callbacks             <- Callbacks configs
│   ├── datamodule            <- Datamodule configs
│   ├── debug                 <- Debugging configs
│   ├── experiment            <- Experiment configs
│   ├── extras                <- Extra utilities configs
│   ├── hparams_search        <- Hyperparameter search configs
│   ├── hydra                 <- Hydra configs
│   ├── local                 <- Local configs
│   ├── logger                <- Logger configs
│   ├── model                 <- Model configs
│   ├── paths                 <- Project paths configs
│   ├── trainer               <- Trainer configs
│   │
│   ├── eval.yaml             <- Main config for evaluation
│   └── train.yaml            <- Main config for training
│
├── data                      <- Project data
│
├── logs                      <- Logs generated by hydra and lightning loggers
│
├── notebooks                 <- Jupyter notebooks. Naming convention is a number (for ordering),
│                                the creator's initials, and a short `-` delimited description,
│                                e.g. `1.0-jqp-initial-data-exploration.ipynb`.
│
├── scripts                   <- Shell scripts
│
├── src                       <- Source code
│   ├── datamodules           <- Lightning datamodules
│   ├── models                <- Lightning models
│   ├── utils                 <- Utility scripts
│   │
│   ├── eval.py               <- Run evaluation
│   └── train.py              <- Run training
│
├── tests                     <- Tests of any kind
│
├── .env.example              <- Example of file for storing private environment variables
├── .gitignore                <- List of files ignored by git
├── .pre-commit-config.yaml   <- Configuration of pre-commit hooks for code formatting
├── Makefile                  <- Makefile with commands like `make train` or `make test`
├── pyproject.toml            <- Configuration options for testing and linting
├── requirements.txt          <- File for installing python dependencies
├── setup.py                  <- File for installing project as a package
└── README.md
Quickstart
# clone project
git clone https://github.com/ashleve/lightning-hydra-template
cd lightning-hydra-template
# [OPTIONAL] create conda environment
conda create -n myenv python=3.9
conda activate myenv
# install pytorch according to instructions
# https://pytorch.org/get-started/
# install requirements
pip install -r requirements.txt
The template contains an example with MNIST classification.
When running python src/train.py, you should see the training progress printed in the terminal.
Your Superpowers
Override any config parameter from command line
Train on CPU, GPU, multi-GPU and TPU
Train with mixed precision
Train model with any logger available in PyTorch Lightning, like W&B or Tensorboard
Train model with chosen experiment config
Attach some callbacks to run
Use different tricks available in Pytorch Lightning
Easily debug
Resume training from checkpoint
Evaluate checkpoint on test dataset
Create a sweep over hyperparameters
Create a sweep over hyperparameters with Optuna
Execute all experiments from folder
Execute run for multiple different seeds
Execute sweep on a remote AWS cluster
Use Hydra tab completion
Apply pre-commit hooks
Run tests
Use tags
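Most of these come down to a single command. A few illustrative examples follow; the config group values such as trainer=gpu, logger=wandb and experiment=example are assumptions based on the template's default configs and may differ in your copy:
# override any config parameter from the command line
python train.py trainer.max_epochs=20 model.lr=1e-4
# train on GPU
python train.py trainer=gpu
# train with mixed precision
python train.py trainer=gpu +trainer.precision=16
# train with the Weights&Biases logger
python train.py logger=wandb
# train with a chosen experiment config
python train.py experiment=example
# resume training from a checkpoint
python train.py ckpt_path="/path/to/ckpt/name.ckpt"
# evaluate a checkpoint on the test dataset
python src/eval.py ckpt_path="/path/to/ckpt/name.ckpt"
# create a sweep over hyperparameters (multirun)
python train.py -m datamodule.batch_size=32,64,128 model.lr=0.001,0.0005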
Contributions
Have a question? Found a bug? Missing a specific feature? Feel free to file a new issue, discussion or PR with respective title and description.
Before making an issue, please verify that:
- The problem still exists on the current main branch.
- Your python dependencies are updated to recent versions.
Suggestions for improvements are always welcome!
How It Works
All PyTorch Lightning modules are dynamically instantiated from module paths specified in config. Example model config:
_target_: src.models.mnist_model.MNISTLitModule
lr: 0.001
net:
  _target_: src.models.components.simple_dense_net.SimpleDenseNet
  input_size: 784
  lin1_size: 256
  lin2_size: 256
  lin3_size: 256
  output_size: 10
Using this config we can instantiate the object with the following line:
model = hydra.utils.instantiate(config.model)
This allows you to easily iterate over new models! Every time you create a new one, just specify its module path and parameters in the appropriate config file.
Switch between models and datamodules with command line arguments:
python train.py model=mnist
Example pipeline managing the instantiation logic: src/train.py.
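To make this concrete, here is a minimal sketch of such an entry point (illustrative only – the template's actual src/train.py additionally handles logging, callbacks, testing and utilities):
# minimal sketch of a Hydra + Lightning training entry point (not the template's full train.py)
import hydra
from omegaconf import DictConfig
from pytorch_lightning import LightningDataModule, LightningModule, Trainer


@hydra.main(version_base="1.3", config_path="../configs", config_name="train.yaml")
def main(cfg: DictConfig) -> None:
    # each object is created from the `_target_` path specified in its config
    datamodule: LightningDataModule = hydra.utils.instantiate(cfg.datamodule)
    model: LightningModule = hydra.utils.instantiate(cfg.model)
    trainer: Trainer = hydra.utils.instantiate(cfg.trainer)

    trainer.fit(model=model, datamodule=datamodule)


if __name__ == "__main__":
    main()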
Main Config
Location: configs/train.yaml
The main project config contains the default training configuration.
It determines how the config is composed when simply executing the command python train.py.
Show main project config
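For reference, an abbreviated sketch of what it typically contains (key names follow the template's defaults and may differ in your version):
# configs/train.yaml (abbreviated sketch)
defaults:
  - _self_
  - datamodule: mnist.yaml
  - model: mnist.yaml
  - callbacks: default.yaml
  - logger: null # set a logger here or from the command line, e.g. `python train.py logger=csv`
  - trainer: default.yaml
  - paths: default.yaml
  - extras: default.yaml
  - hydra: default.yaml
  - experiment: null
  - hparams_search: null
  - debug: null

task_name: "train"
tags: ["dev"]
train: True
test: True
ckpt_path: null
seed: null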
Experiment Config
Location: configs/experiment
Experiment configs allow you to overwrite parameters from the main config.
For example, you can use them to version control best hyperparameters for each combination of model and dataset.
Show example experiment config
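A sketch of what such a config can look like (the concrete overrides below are just an example):
# configs/experiment/example.yaml (sketch)
# @package _global_

# to execute this experiment run:
# python train.py experiment=example

defaults:
  - override /datamodule: mnist.yaml
  - override /model: mnist.yaml
  - override /trainer: default.yaml

# all parameters below will be merged with parameters from the default configurations set above
tags: ["mnist", "simple_dense_net"]
seed: 12345

trainer:
  max_epochs: 10

model:
  lr: 0.002

datamodule:
  batch_size: 64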
Workflow
Basic workflow
- Write your PyTorch Lightning module (see models/mnist_module.py for example)
- Write your PyTorch Lightning datamodule (see datamodules/mnist_datamodule.py for example)
- Write your experiment config, containing paths to model and datamodule
- Run training with chosen experiment config:
python src/train.py experiment=experiment_name.yaml
Experiment design
Say you want to execute many runs to plot how accuracy changes with respect to batch size.
- Execute the runs with some config parameter that allows you to identify them easily, like tags:
python train.py -m logger=csv datamodule.batch_size=16,32,64,128 tags=["batch_size_exp"]
- Write a script or notebook that searches over the logs/ folder, retrieves the csv logs from runs containing the given tags in their config, and plots the results (see the sketch below).
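A sketch of such a script, assuming the CSV logger writes a metrics.csv file, Hydra stores the composed config under .hydra/config.yaml, and the MNIST example logs a val/acc metric (adjust names to your setup):
# sketch: collect csv logs from runs tagged "batch_size_exp" and plot accuracy vs batch size
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
import yaml

results = []
for hydra_dir in Path("logs").rglob(".hydra"):
    cfg = yaml.safe_load((hydra_dir / "config.yaml").read_text())
    if "batch_size_exp" not in (cfg.get("tags") or []):
        continue
    metrics_files = list(hydra_dir.parent.rglob("metrics.csv"))
    if not metrics_files:
        continue
    metrics = pd.read_csv(metrics_files[0])
    best_acc = metrics["val/acc"].dropna().max()  # metric name taken from the MNIST example
    results.append((cfg["datamodule"]["batch_size"], best_acc))

results.sort()
plt.plot([bs for bs, _ in results], [acc for _, acc in results], marker="o")
plt.xlabel("batch size")
plt.ylabel("best validation accuracy")
plt.savefig("batch_size_vs_acc.png")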
Logs
Hydra creates a new output directory for every executed run.
Default logging structure:
├── logs
│   ├── task_name
│   │   ├── runs                        # Logs generated by single runs
│   │   │   ├── YYYY-MM-DD_HH-MM-SS     # Datetime of the run
│   │   │   │   ├── .hydra              # Hydra logs
│   │   │   │   ├── csv                 # Csv logs
│   │   │   │   ├── wandb               # Weights&Biases logs
│   │   │   │   ├── checkpoints         # Training checkpoints
│   │   │   │   └── ...                 # Any other thing saved during training
│   │   │   └── ...
│   │   │
│   │   └── multiruns                   # Logs generated by multiruns
│   │       ├── YYYY-MM-DD_HH-MM-SS     # Datetime of the multirun
│   │       │   ├── 1                   # Multirun job number
│   │       │   ├── 2
│   │       │   └── ...
│   │       └── ...
│   │
│   └── debugs                          # Logs generated when debugging config is attached
│       └── ...
You can change this structure by modifying paths in the Hydra configuration.
Experiment Tracking
PyTorch Lightning supports many popular logging frameworks: Weights&Biases, Neptune, Comet, MLFlow, Tensorboard.
These tools help you keep track of hyperparameters and output metrics and allow you to compare and visualize results. To use one of them simply complete its configuration in configs/logger and run:
python train.py logger=logger_name
You can use many of them at once (see configs/logger/many_loggers.yaml for example).
You can also write your own logger.
Lightning provides convenient method for logging custom metrics from inside LightningModule. Read the docs or take a look at MNIST example.
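For example, a metric can be logged from any LightningModule hook with self.log (a minimal sketch; compute_loss is a hypothetical helper, and the template's MNIST module additionally uses torchmetrics):
from pytorch_lightning import LightningModule


class MyLitModule(LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper, replace with your own loss
        # logged values are forwarded to every logger attached to the Trainer
        self.log("train/loss", loss, on_step=False, on_epoch=True, prog_bar=True)
        return loss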
Tests
The template comes with generic tests implemented with pytest.
# run all tests
pytest
# run tests from specific file
pytest tests/test_train.py
# run all tests except the ones marked as slow
pytest -k "not slow"
Most of the implemented tests don’t check for any specific output – they simply verify that executing some commands doesn’t end up throwing exceptions. You can execute them once in a while to speed up development.
Currently, the tests cover cases like:
- running 1 train, val and test step
- running 1 epoch on 1% of data, saving ckpt and resuming for the second epoch
- running 2 epochs on 1% of data, with DDP simulated on CPU
And many others. You should be able to modify them easily for your use case.
There is also a @RunIf decorator that allows you to run tests only if certain conditions are met, e.g. a GPU is available or the system is not Windows. See the examples.
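For illustration, a GPU-only test might look like this (the import path and the min_gpus argument are assumed from the template's tests helpers):
from tests.helpers.run_if import RunIf  # import path assumed


@RunIf(min_gpus=1)
def test_something_on_gpu():
    # this test is skipped automatically when no GPU is available
    ...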
Hyperparameter Search
You can define a hyperparameter search by adding a new config file to configs/hparams_search.
Show example hyperparameter search config
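An abbreviated sketch of such a config using the Optuna sweeper plugin (exact fields depend on the plugin version):
# configs/hparams_search/mnist_optuna.yaml (abbreviated sketch)
# @package _global_

defaults:
  - override /hydra/sweeper: optuna

# the metric to optimize; it has to be logged during training
optimized_metric: "val/acc_best"

hydra:
  sweeper:
    direction: maximize
    n_trials: 20
    params:
      model.lr: interval(0.0001, 0.1)
      datamodule.batch_size: choice(32, 64, 128)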
Next, execute it with: python train.py -m hparams_search=mnist_optuna
Using this approach doesn’t require adding any boilerplate to code, everything is defined in a single config file. The only necessary thing is to return the optimized metric value from the launch file.
You can use different optimization frameworks integrated with Hydra, like Optuna, Ax or Nevergrad.
The optimization_results.yaml file will be available under the logs/task_name/multirun folder.
This approach doesn’t support advanced techniques like pruning – for a more sophisticated search, you should probably write a dedicated optimization task (without the multirun feature).
Continuous Integration
The template comes with CI workflows implemented in GitHub Actions:
- .github/workflows/test.yaml: running all tests with pytest
- .github/workflows/code-quality-main.yaml: running pre-commits on the main branch for all files
- .github/workflows/code-quality-pr.yaml: running pre-commits on pull requests for modified files only
Note: You need to enable GitHub Actions from the settings in your repository.
Distributed Training
Lightning supports multiple ways of doing distributed training. The most common one is DDP, which spawns a separate process for each GPU and averages gradients between them. To learn about other approaches read the lightning docs.
You can run DDP on the MNIST example with 4 GPUs like this:
python train.py trainer=ddp
Note: When using DDP you have to be careful how you write your models – read the docs.
Accessing Datamodule Attributes In Model
The simplest way is to pass a datamodule attribute directly to the model on initialization:
# ./src/train.py
datamodule = hydra.utils.instantiate(config.datamodule)
model = hydra.utils.instantiate(config.model, some_param=datamodule.some_param)
Note: This is not a very robust solution, since it assumes all your datamodules have a some_param attribute available.
Similarly, you can pass a whole datamodule config as an init parameter:
# ./src/train.py
model = hydra.utils.instantiate(config.model, dm_conf=config.datamodule, _recursive_=False)
You can also pass a datamodule config parameter to your model through variable interpolation:
# ./configs/model/my_model.yaml
_target_: src.models.my_module.MyLitModule
lr: 0.01
some_param: ${datamodule.some_param}
Another approach is to access the datamodule in the LightningModule directly through the Trainer:
# ./src/models/mnist_module.py
def on_train_start(self):
    self.some_param = self.trainer.datamodule.some_param
Note: This only works after the training starts, since otherwise the trainer won’t yet be available in the LightningModule.
Best Practices
Use Miniconda for GPU environments
Use automatic code formatting
Set private environment variables in .env file
Name metrics using ‘/’ character
Use torchmetrics
Follow PyTorch Lightning style guide
Version control your data and models with DVC
Support installing project as a package
Keep local configs out of code versioning
Resources
This template was inspired by:
- PyTorchLightning/deep-learning-project-template
- drivendata/cookiecutter-data-science
- lucmos/nn-template
Other useful repositories:
- jxpress/lightning-hydra-template-vertex-ai – lightning-hydra-template integration with Vertex AI hyperparameter tuning and custom training job
- pytorch/hydra-torch – safely configuring PyTorch classes with Hydra
- romesco/hydra-lightning – safely configuring PyTorch Lightning classes with Hydra
- PyTorchLightning/lightning-transformers – official Lightning Transformers repo built with Hydra
License
Lightning-Hydra-Template is licensed under the MIT License.
MIT License
Copyright (c) 2021 ashleve
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
DELETE EVERYTHING ABOVE FOR YOUR PROJECT
Description
What it does
How to run
Install dependencies
# clone project
git clone https://github.com/YourGithubName/your-repo-name
cd your-repo-name
# [OPTIONAL] create conda environment
conda create -n myenv python=3.9
conda activate myenv
# install pytorch according to instructions
# https://pytorch.org/get-started/
# install requirements
pip install -r requirements.txt
Train model with default configuration
# train on CPU
python src/train.py trainer=cpu
# train on GPU
python src/train.py trainer=gpu
Train model with chosen experiment configuration from configs/experiment/
python src/train.py experiment=experiment_name.yaml
You can override any parameter from the command line like this:
python src/train.py trainer.max_epochs=20 datamodule.batch_size=64