PyTorch Lightning

PyTorch Lightning is a batteries-included deep learning framework for professional AI researchers and machine learning engineers who need maximum flexibility while boosting performance at scale.

1. Installing PyTorch Lightning
$ pip install lightning
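
To confirm that the install succeeded, one option is to query the installed version from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for this example and is not part of Lightning.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("lightning")
    if version:
        print(f"lightning version: {version}")
    else:
        print("lightning is not installed")
```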
2. Define a LightningModule
import os
import torch  # needed later for torch.rand
from torch import optim, nn, utils, Tensor
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
import lightning as L

# define any number of nn.Modules (or use your current ones)
encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))


# define the LightningModule
class LitAutoEncoder(L.LightningModule):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop.
        # it is independent of forward
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        # Logging to TensorBoard (if installed) by default
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer


# init the autoencoder
autoencoder = LitAutoEncoder(encoder, decoder)
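
To see what `training_step` and `configure_optimizers` correspond to, it can help to write the same logic as a hand-rolled PyTorch loop. The sketch below is only an approximation of what the Trainer automates (it leaves out device placement, logging, and checkpointing), and it assumes random tensors in place of MNIST so that it runs without a download.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Same architecture as the encoder/decoder above.
encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))

# Mirrors configure_optimizers().
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = optim.Adam(params, lr=1e-3)

# Assumption: random tensors stand in for MNIST batches of shape (N, 1, 28, 28).
batches = [
    (torch.rand(8, 1, 28, 28), torch.zeros(8, dtype=torch.long))
    for _ in range(5)
]

losses = []
for batch_idx, batch in enumerate(batches):
    # Body of training_step().
    x, _ = batch
    x = x.view(x.size(0), -1)
    x_hat = decoder(encoder(x))
    loss = nn.functional.mse_loss(x_hat, x)

    # Steps the Trainer performs around training_step().
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses)
```

In the LightningModule version, only the body of the loop and the optimizer choice remain in user code; the surrounding loop is handled by the Trainer.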
3. Define a dataset
# setup data
dataset = MNIST(os.getcwd(), download=True, transform=ToTensor())
train_loader = utils.data.DataLoader(dataset)
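
Note that `DataLoader(dataset)` with no further arguments uses PyTorch's defaults of `batch_size=1` and no shuffling. A sketch of a batched, shuffled loader is shown below; the `batch_size=64` and `shuffle=True` values are illustrative choices rather than anything from the original, and a small random `TensorDataset` stands in for MNIST.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Assumption: 256 random "images" and labels stand in for MNIST.
images = torch.rand(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
dataset = TensorDataset(images, labels)

# batch_size and shuffle are illustrative values, not from the original.
train_loader = DataLoader(dataset, batch_size=64, shuffle=True)

x, y = next(iter(train_loader))
print(x.shape, y.shape)   # torch.Size([64, 1, 28, 28]) torch.Size([64])
print(len(train_loader))  # 4 batches of 64 samples
```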
4. Train the model
# train the model (hint: here are some helpful Trainer arguments for rapid idea iteration)
trainer = L.Trainer(limit_train_batches=100, max_epochs=1)
trainer.fit(model=autoencoder, train_dataloaders=train_loader)
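
The `limit_train_batches=100, max_epochs=1` setting also explains the `step=100` that shows up in the checkpoint filename used in the next section. A rough way to reason about it, assuming an integer `limit_train_batches`, one optimizer step per batch (no gradient accumulation), and a single device; the helper `expected_steps` is hypothetical:

```python
def expected_steps(
    dataset_size: int,
    batch_size: int,
    limit_train_batches: int,
    max_epochs: int,
) -> int:
    """Estimate optimizer steps, assuming one step per batch."""
    # Ceiling division: the last partial batch still counts (drop_last=False).
    batches_per_epoch = -(-dataset_size // batch_size)
    return min(batches_per_epoch, limit_train_batches) * max_epochs


# MNIST's training split has 60,000 images; DataLoader defaults to batch_size=1.
steps = expected_steps(
    dataset_size=60_000, batch_size=1, limit_train_batches=100, max_epochs=1
)
print(steps)  # 100
```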

Run the script to start training.

5. Use the trained model
# load checkpoint
checkpoint = "./lightning_logs/version_0/checkpoints/epoch=0-step=100.ckpt"
autoencoder = LitAutoEncoder.load_from_checkpoint(checkpoint, encoder=encoder, decoder=decoder)

# choose your trained nn.Module
encoder = autoencoder.encoder
encoder.eval()

# embed 4 fake images!
fake_image_batch = torch.rand(4, 28 * 28, device=autoencoder.device)
embeddings = encoder(fake_image_batch)
print("⚡" * 20, "\nPredictions (4 image embeddings):\n", embeddings, "\n", "⚡" * 20)

The output is:

(hello-D1UArRDQ-py3.11) umbrella:hello zcj$ poetry run python Lightning_module_demo.py 
⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡
Predictions (4 image embeddings):
tensor([[-0.5218, -0.0958,  0.4148],
        [-0.6634, -0.0083,  0.5347],
        [-0.6266,  0.0502,  0.4794],
        [-0.6974, -0.0774,  0.5666]], grad_fn=<AddmmBackward0>)
⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡
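
The checkpoint path above is hardcoded to `version_0`, but the default logger creates a new `version_N` directory on each run, so the path goes stale after retraining. One possible workaround is to pick the most recently modified checkpoint. This is a sketch using only the standard library; `latest_checkpoint` is a hypothetical helper, and it assumes the default `lightning_logs/version_N/checkpoints/` layout seen in the path above.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(log_dir: str = "lightning_logs") -> Optional[Path]:
    """Return the most recently modified .ckpt file under log_dir, or None."""
    checkpoints = list(Path(log_dir).glob("version_*/checkpoints/*.ckpt"))
    if not checkpoints:
        return None
    # Newest file by modification time.
    return max(checkpoints, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_checkpoint())
```

The returned path could then be passed to `LitAutoEncoder.load_from_checkpoint` in place of the hardcoded string.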
6. Visualize training

If you have TensorBoard installed, you can use it to visualize experiments. Run this on your command line and open your browser to http://localhost:6006/

$ tensorboard --logdir .
7. Supercharge training

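Enable advanced training features through Trainer arguments. These are state-of-the-art techniques that are integrated into your training loop automatically, with no changes to your code.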

import lightning as L
from lightning.pytorch.callbacks import StochasticWeightAveraging

# train on 4 GPUs
trainer = L.Trainer(
    devices=4,
    accelerator="gpu",
)

# train 1TB+ parameter models with Deepspeed/fsdp
trainer = L.Trainer(
    devices=4,
    accelerator="gpu",
    strategy="deepspeed_stage_2",
    precision=16,
)

# 20+ helpful flags for rapid idea iteration
trainer = L.Trainer(
    max_epochs=10,
    min_epochs=5,
    overfit_batches=1,
)

# access the latest state of the art techniques
trainer = L.Trainer(callbacks=[StochasticWeightAveraging(...)])
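
The hardcoded `devices=4, accelerator="gpu"` only works on a machine with four GPUs. One way to keep a script portable is to derive those arguments from the number of GPUs actually available. The sketch below is pure Python so it runs anywhere; `trainer_kwargs` is a hypothetical helper, not a Lightning function.

```python
def trainer_kwargs(num_gpus: int) -> dict:
    """Build Trainer keyword arguments from the number of visible GPUs."""
    if num_gpus > 0:
        return {"accelerator": "gpu", "devices": num_gpus}
    # Fall back to a single CPU process when no GPU is visible.
    return {"accelerator": "cpu", "devices": 1}


print(trainer_kwargs(4))  # {'accelerator': 'gpu', 'devices': 4}
print(trainer_kwargs(0))  # {'accelerator': 'cpu', 'devices': 1}
```

With torch and lightning installed, this could be used as `L.Trainer(**trainer_kwargs(torch.cuda.device_count()))`. The Trainer also accepts `accelerator="auto"` and `devices="auto"`, which may be simpler when no custom logic is needed.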

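Lightning's core guiding principle is to always provide maximum flexibility without hiding any of the underlying PyTorch. Depending on the complexity of your project, Lightning offers 5 levels of added flexibility.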