Notes on Neural Network Compression Methods --- Implementation (Pruning)
The previous note covered the principles behind pruning and my thoughts on them; this note walks through how pruning is actually implemented in code.
Most of the code is adapted from online sources. The source model being pruned is a model.pt I trained myself, roughly 2 MB in size. After pruning and fine-tuning, the model shrank by about 70% while keeping good recognition accuracy on a simple task.
PyTorch ships a fairly mature pruning utility, prune, which is accessed through torch.nn.utils.prune.
1. Types of pruning
- Unstructured pruning
- Structured pruning
Pruning can be applied to a single layer (local) or across multiple/all layers (global); structured pruning in PyTorch only supports the local form.
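For orientation, here is a minimal sketch of both flavors using torch.nn.utils.prune (the two-conv toy model is purely illustrative): local structured pruning is applied per module, while the global variant currently only comes in an unstructured flavor.

import torch.nn as nn
import torch.nn.utils.prune as prune

# toy model, for illustration only
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Conv2d(8, 16, 3))

# local structured pruning: drop 50% of the first conv's output channels (L2 norm along dim 0)
prune.ln_structured(model[0], name="weight", amount=0.5, n=2, dim=0)

# global unstructured pruning: drop the 30% smallest-magnitude weights across both convs
prune.global_unstructured(
    [(model[0], "weight"), (model[1], "weight")],
    pruning_method=prune.L1Unstructured,
    amount=0.3,
)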
2. Implementation (PyTorch)
Classes in PyTorch that can be used to implement pruning:
class BasePruningMethod(ABC):
class PruningContainer(BasePruningMethod):
class Identity(BasePruningMethod):
class RandomUnstructured(BasePruningMethod):
class L1Unstructured(BasePruningMethod):
class RandomStructured(BasePruningMethod):
class LnStructured(BasePruningMethod):
The general workflow (a short sketch follows the list):
- Choose a pruning method (or subclass BasePruningMethod to implement a custom one)
- Specify the module to prune and the parameter name
- Set the method's arguments, such as the pruning ratio and scope
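A minimal sketch of that workflow with the built-in API (the Conv1d module and the 30% ratio below are just illustrative):

import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv1d(1, 6, kernel_size=3)                    # module to prune (illustrative)
prune.l1_unstructured(conv, name="weight", amount=0.3)   # zero out the 30% smallest-magnitude weights

# the pruning is stored as weight_orig + weight_mask; conv.weight is recomputed from them
print(list(dict(conv.named_buffers()).keys()))    # ['weight_mask']
print(float((conv.weight == 0).float().mean()))   # ~0.3 sparsity

prune.remove(conv, "weight")   # make the pruning permanent (folds the mask into weight)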
You can also write the pruning logic yourself. To get a feel for the whole process, the rest of this note implements the pruning pipeline from first principles.
I looked through quite a few resources but did not find a reasonably complete onnx-based pruning tutorial; if you know of a good one, please share 😀
1. Define a network, train it, and save it in pt format
According to the paper Learning Efficient Convolutional Networks through Network Slimming, an L1 regularization term can be applied to the BN layers so that the BN scaling factors are pushed toward zero, which makes the important channels easier to identify. The paper argues that channel-level sparsity "provides a nice tradeoff between flexibility and ease of implementation" compared to weight-level and layer-level pruning.
import numpy as np
import torch
import torch.nn as nn

def updateBN(model, s=0.0001):
    # add the (sub)gradient of an L1 penalty on the BN scale factors
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.data.add_(s * torch.sign(m.weight.data))

if __name__ == "__main__":
    model = net()  # net() is the simple network defined earlier (definition omitted)
    # from torchsummary import summary
    # print(summary(model, (3, 20, 20), 1))
    # x = torch.rand((1, 3, 20, 20))
    # print(model(x))
    optimer = torch.optim.Adam(model.parameters())
    loss_fn = torch.nn.CrossEntropyLoss()
    for e in range(100):
        x = torch.rand((1, 3, 20, 20))
        y = torch.tensor(np.random.randint(0, 2, (1))).long()
        out = model(x)
        loss = loss_fn(out, y)
        optimer.zero_grad()
        loss.backward()
        # sparsify the BN weights
        updateBN(model)
        optimer.step()
    torch.save(model.state_dict(), "net.pt")
*Code adapted from the CSDN blog post 模型压缩(一)通道剪枝-BN层
In effect this just adds a regularization term to the BN layer weights; it does not increase the parameter count.
One open question: sparsity is induced through this BN regularization while training the initial model, so does the BN regularization still need to be applied during the later fine-tuning? My feeling is probably not; corrections are welcome!
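Side note: the same sparsity term can equivalently be written into the loss itself instead of patching the gradients after backward(); a minimal sketch, reusing loss_fn, out, y, model and s from the training snippet above:

def bn_l1_penalty(model, s=0.0001):
    # L1 penalty on the BN scale factors; its gradient is s * sign(weight),
    # i.e. exactly what updateBN() adds to the gradients by hand after backward()
    return s * sum(m.weight.abs().sum()
                   for m in model.modules()
                   if isinstance(m, nn.BatchNorm2d))

# inside the training loop, instead of calling updateBN() after backward():
loss = loss_fn(out, y) + bn_l1_penalty(model)
loss.backward()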
First, the model-construction code used for pruning, which introduces a cfg argument as the pruning reference:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import torch
import torch.nn as nn

## model definition
class ld_model(nn.Module):
    def __init__(self, *, inputlength=60, kernelsize=3, kindsoutput=2, cfg=None):
        super(ld_model, self).__init__()
        ## cfg describes the conv-layer configuration, see below
        if cfg:
            self.features = self.make_layer(cfg)
            self.dropout = nn.Dropout(0.2)
            self.fc1 = nn.Sequential(
                nn.Linear(in_features=cfg[2] * (inputlength - 8), out_features=512),
                nn.ReLU(),
            )
            self.fc2 = nn.Sequential(
                nn.Linear(in_features=512, out_features=256),
                nn.ReLU(),
            )
            self.fc3 = nn.Sequential(
                nn.Linear(in_features=256, out_features=128),
                nn.ReLU(),
            )
            self.out = nn.Linear(128, kindsoutput)
        else:
            layers = []
            layers += [nn.Conv1d(in_channels=1, out_channels=6, kernel_size=3),
                       nn.BatchNorm1d(6),
                       nn.ReLU(inplace=True),
                       nn.AvgPool1d(kernel_size=3, stride=1)]
            layers += [nn.Conv1d(in_channels=6, out_channels=16, kernel_size=3),
                       nn.BatchNorm1d(16),
                       nn.ReLU(inplace=True),
                       nn.AvgPool1d(kernel_size=3, stride=1)]
            self.features = nn.Sequential(*layers)
            self.dropout = nn.Dropout(0.2)
            self.fc1 = nn.Sequential(
                nn.Linear(in_features=16 * (inputlength - 8), out_features=512),
                nn.ReLU(),
            )
            self.fc2 = nn.Sequential(
                nn.Linear(in_features=512, out_features=256),
                nn.ReLU(),
            )
            self.fc3 = nn.Sequential(
                nn.Linear(in_features=256, out_features=128),
                nn.ReLU(),
            )
            self.out = nn.Linear(128, kindsoutput)

    def make_layer(self, cfg):
        layers = []
        layers += [nn.Conv1d(in_channels=1, out_channels=cfg[0], kernel_size=3),
                   nn.BatchNorm1d(cfg[0]),
                   nn.ReLU(inplace=True),
                   nn.AvgPool1d(kernel_size=3, stride=1)]
        layers += [nn.Conv1d(in_channels=cfg[0], out_channels=cfg[2], kernel_size=3),
                   nn.BatchNorm1d(cfg[2]),
                   nn.ReLU(inplace=True),
                   nn.AvgPool1d(kernel_size=3, stride=1)]
        return nn.Sequential(*layers)

    def forward(self, x):
        x = x.to(torch.float32)
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.dropout(x)
        x = self.fc3(x)
        x = self.dropout(x)
        x = self.out(x)
        x = x.to(torch.float32)
        return x

def updateBN(model, s=0.0001):
    for m in model.modules():
        if isinstance(m, nn.BatchNorm1d):
            m.weight.grad.data.add_(s * torch.sign(m.weight.data))

## the actual training loop is omitted
## save as a pt model
torch.save(model.state_dict(), 'model_bn_sparse.pt')
2. Prune the pt model obtained in the previous step
import numpy as np
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

path = r'model_bn_sparse.pt'
model = ld_model()  # ld_model is the class defined above
model.load_state_dict(torch.load(path))
model = model.to(device)

total = 0  ## count the total number of BN channels
for m in model.modules():
    if isinstance(m, nn.BatchNorm1d):
        # print(m.weight.data.shape[0])
        # print(m.weight.data)
        total += m.weight.data.shape[0]
print("total BN channels:", total)

bn_data = torch.zeros(total)  ## buffer for the BN scale factors
index = 0
for m in model.modules():
    # copy each BN layer's scale factors into bn_data
    if isinstance(m, nn.BatchNorm1d):
        size = m.weight.data.shape[0]
        bn_data[index:(index + size)] = m.weight.data.abs().clone()
        index += size

# sort the collected BN scale factors
data, _ = torch.sort(bn_data)
percent = 0.7  # prune 70% of the BN channels (keep 30%)
thresh_index = int(total * percent)
thresh = data[thresh_index]  # the value at thresh_index of the sorted list is the cut-off threshold
# build the masks
pruned_num = 0   # number of pruned BN channels
cfg = []         # per-layer counts of retained channels
cfg_mask = []    # BN-layer masks: pruned channels are 0, kept channels are 1
## the pruning logic below has been adapted for 1-D convolutions; watch the parameter dimensions when reusing it
for k, m in enumerate(model.modules()):
    if isinstance(m, nn.BatchNorm1d):
        weight_copy = m.weight.data.abs().clone()
        # print(weight_copy)
        mask = weight_copy.gt(thresh).float()  # threshold the scale factors
        # print(mask)
        # exit()
        pruned_num += mask.shape[0] - torch.sum(mask)
        # print(pruned_num)
        m.weight.data.mul_(mask)  # zero out the BN weights of pruned channels
        m.bias.data.mul_(mask)
        cfg.append(int(torch.sum(mask)))  # record how many channels survive in this layer
        cfg_mask.append(mask.clone())
        print("layer index:{:d}\t total channel:{:d}\t remaining channel:{:d}".format(k, mask.shape[0], int(torch.sum(mask))))
    elif isinstance(m, nn.AvgPool1d):
        cfg.append("A")  ## record pooling-layer positions; pooling layers are left untouched

pruned_ratio = pruned_num / total
print("fraction of channels pruned:", pruned_ratio)
print(cfg)
newmodel = ld_model(cfg=cfg)  # rebuild the network from cfg (retained channel counts and layer positions); the structure has changed, so the parameters must be copied over next
newmodel = newmodel.to(device)
# print(newmodel)
# from torchsummary import summary
# print(summary(newmodel, (3, 20, 20), 1))

layer_id_in_cfg = 0                    # index of the current mask
start_mask = torch.ones(1)             # note: the size of start_mask matches the number of input channels
end_mask = cfg_mask[layer_id_in_cfg]   # mask of the first BN layer
# print(cfg_mask)
# print(end_mask)
for (m0, m1) in zip(model.modules(), newmodel.modules()):  # iterate the old and new model in lockstep
    if isinstance(m0, nn.BatchNorm1d):
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))  # indices of the nonzero (unpruned) channels in the mask
        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))
        # copy the old model's parameters into the new model
        m1.weight.data = m0.weight.data[idx1.tolist()].clone()
        m1.bias.data = m0.bias.data[idx1.tolist()].clone()
        m1.running_mean = m0.running_mean[idx1.tolist()].clone()
        m1.running_var = m0.running_var[idx1.tolist()].clone()
        layer_id_in_cfg += 1  # move on to the next mask
        start_mask = end_mask.clone()
        if layer_id_in_cfg < len(cfg_mask):
            end_mask = cfg_mask[layer_id_in_cfg]
    elif isinstance(m0, nn.Conv1d):  # convolution: select both input and output channels
        ## channels whose mask value is 0 are removed, nonzero channels are kept
        idx0 = np.squeeze(np.argwhere(np.asarray(start_mask.cpu().numpy())))  # idx0: nonzero indices of start_mask (input channels)
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))    # idx1: nonzero indices of end_mask (output channels)
        if idx0.size == 1:
            idx0 = np.resize(idx0, (1,))
        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))
        ## weight update: first keep the surviving input channels, then the surviving output channels
        w1 = m0.weight.data[:, idx0.tolist(), :].clone()
        ## shape = w1.shape
        w1 = w1[idx1.tolist(), :, :].clone()
        ## shape = w1.shape
        ## bias update
        b1 = m0.bias.data.clone()
        b1 = b1[idx1.tolist()].clone()
        m1.weight.data = w1.clone()
        m1.bias.data = b1.clone()
    # elif isinstance(m0, nn.Linear):
    #     idx0 = np.squeeze(np.argwhere(np.asarray(start_mask.cpu().numpy())))  # nonzero input indices
    #     if idx0.size == 1:
    #         idx0 = np.resize(idx0, (1,))
    #     m1.weight.data = m0.weight.data[:, idx0].clone()
    #     m1.bias.data = m0.bias.data.clone()

torch.save(newmodel.state_dict(), "prune_net.pth")
print(newmodel)
*Code adapted from the CSDN blog post 模型压缩(一)通道剪枝-BN层
The original code pruned 2-D convolutions; for my work I adapted it to 1-D convolutions. The trickiest part is getting the parameter dimensions right, since dimension-mismatch errors are easy to hit. I also removed the original code's handling of fully connected layers: the original network had only one FC layer so no mismatch could occur there, but my model has several FC layers, and applying that logic to the later FC layers would break, so I dropped it. The results are still acceptable.
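For reference, if one did want to carry the first fully connected layer over instead of re-initializing it, note that forward() flattens a (N, C, L) tensor with x.view(N, -1), so each kept channel contributes a contiguous block of L columns. A hedged sketch (not part of the original reference code), continuing right after the copy loop above, where idx1 is the index of the last BN mask and the spatial length equals inputlength - 8 (52 for the default inputlength of 60):

# hypothetical remapping of the first Linear layer after channel pruning
L_spatial = 60 - 8  # inputlength - 8, the length entering x.view(N, -1)
kept_channels = idx1.tolist()  # channels kept by the last conv/BN mask
cols = np.concatenate([np.arange(c * L_spatial, (c + 1) * L_spatial)
                       for c in kept_channels])
newmodel.fc1[0].weight.data = model.fc1[0].weight.data[:, cols.tolist()].clone()
newmodel.fc1[0].bias.data = model.fc1[0].bias.data.clone()

The later fully connected layers (fc2, fc3, out) have identical shapes in both models, so they could simply be copied over verbatim if desired.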
3. Fine-tune the pruned model
The fine-tuning loop is essentially the same as the initial training; just remember to lower the learning rate for the fine-tuning stage.
newmodel.load_state_dict(torch.load("prune_net.pth"))

# use a lower learning rate for fine-tuning (the value here is only illustrative)
optimer = torch.optim.Adam(newmodel.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()
for e in range(100):
    # dummy data with the 1-D input shape this model expects: (batch, channels, inputlength)
    x = torch.rand((1, 1, 60)).to(device)
    y = torch.tensor(np.random.randint(0, 2, (1))).long().to(device)
    out = newmodel(x)
    loss = loss_fn(out, y)
    optimer.zero_grad()
    loss.backward()
    optimer.step()
torch.save(newmodel.state_dict(), "prune_net.pth")
In the reference code, the argument passed to the optimizer was model.parameters(), but I believe the optimization target should be the new model, i.e. newmodel.parameters(); I'm not sure whether my understanding is off, corrections welcome.
When using this approach, you need to track the conv-layer positions manually when rebuilding the network from cfg. In my example, cfg = [1, 'A', 5, 'A'] means the first layer is a conv with 1 channel, the third layer is a conv with 5 channels, and layers 2 and 4 are pooling layers that do not take part in the network reconstruction.
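As a quick sanity check (the cfg value and dummy input below are illustrative and match the defaults inputlength = 60 and kindsoutput = 2), you can rebuild the pruned model from cfg, run a dummy forward pass, and compare its parameter count with the unpruned model:

cfg = [1, 'A', 5, 'A']        # 1 and 5 channels kept in the two convs
pruned = ld_model(cfg=cfg)
full = ld_model()

x = torch.rand(1, 1, 60)      # (batch, channels, inputlength)
print(pruned(x).shape)        # torch.Size([1, 2])

n_full = sum(p.numel() for p in full.parameters())
n_pruned = sum(p.numel() for p in pruned.parameters())
print(f"params: {n_full} -> {n_pruned} ({1 - n_pruned / n_full:.1%} fewer)")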
That is roughly where my learning stands at the moment. Discussion is welcome; if you know better pruning methods or implementations, please do share 😀