The previous note covered the theory behind pruning and some of my thoughts on it; this note looks at how to actually implement pruning in code.

Most of the code here comes from the web. The source model used for pruning is a model.pt I trained myself, roughly 2 MB in size. After pruning and fine-tuning, the model shrank by about 70% while keeping good recognition accuracy on a simple task.

The code below is adapted from https://www.cnblogs.com/armcvai/p/17149914.html#%E4%BA%8Cpytorch-%E7%9A%84%E5%89%AA%E6%9E%9D and https://blog.csdn.net/qq_33952811/article/details/124354155.

PyTorch ships with a fairly mature pruning module that can be called through torch.nn.utils.prune.

1. Types of pruning

  • Unstructured pruning
  • Structured pruning

Pruning can be applied layer by layer (locally) or across several/all layers (globally); structured pruning in PyTorch only supports the local form.
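
To make the local/global distinction concrete, here is a minimal sketch of my own (not from the referenced posts); the layer shapes and pruning ratios are arbitrary:

import torch.nn as nn
import torch.nn.utils.prune as prune

conv1 = nn.Conv2d(3, 8, kernel_size=3)
conv2 = nn.Conv2d(8, 16, kernel_size=3)

# local structured pruning: remove 50% of conv1's output channels, ranked by L2 norm
prune.ln_structured(conv1, name="weight", amount=0.5, n=2, dim=0)

# global unstructured pruning: zero the smallest 30% of weights taken across both layers
prune.global_unstructured(
    [(conv1, "weight"), (conv2, "weight")],
    pruning_method=prune.L1Unstructured,
    amount=0.3,
)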

2. Implementation in PyTorch

Classes available in PyTorch for implementing pruning:

class BasePruningMethod(ABC):

class PruningContainer(BasePruningMethod):

class Identity(BasePruningMethod):

class RandomUnstructured(BasePruningMethod):

class L1Unstructured(BasePruningMethod):

class RandomStructured(BasePruningMethod):

class LnStructured(BasePruningMethod):

The typical workflow:

  1. Choose a pruning method (or subclass BasePruningMethod to implement a custom one)
  2. Specify the module to prune and the name of the parameter within it
  3. Set the method's arguments, such as the pruning ratio and scope
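
As a concrete illustration of these three steps, here is a minimal sketch (my own example, assuming a toy Conv2d layer and a 30% ratio):

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(3, 16, kernel_size=3)

# steps 1-3: L1 unstructured method, applied to the "weight" parameter, pruning 30% of it
prune.l1_unstructured(conv, name="weight", amount=0.3)

# the module now holds weight_orig plus a weight_mask buffer; conv.weight is their product
print([name for name, _ in conv.named_buffers()])   # ['weight_mask']
print(float((conv.weight == 0).float().mean()))     # roughly 0.3

# make the pruning permanent by removing the reparametrization
prune.remove(conv, "weight")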

You can also write the pruning logic yourself. To understand the whole process, the rest of this note implements a pruning pass from first principles.

I searched through quite a few resources but did not find a mature ONNX-based pruning workflow; if you know of a good tutorial, please share 😀

1. Define a network, train it, and save it in .pt format

According to the paper Learning Efficient Convolutional Networks through Network Slimming, an L1 regularization term can be added to the BN layers' scaling factors so that they are pushed toward zero, which makes the important channels easier to identify. The paper argues that "channel-level sparsity provides a nice tradeoff between flexibility and ease of implementation" compared to "weight-level" and "layer-level" pruning.
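
As I understand the paper, the training objective simply adds an L1 penalty on all BN scaling factors γ to the ordinary task loss:

L = Σ l(f(x, W), y) + λ · Σ |γ|

where the first sum is the usual loss over the training data and the second runs over every BN scaling factor; λ plays the role of the constant s in the code below.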

import numpy as np
import torch
import torch.nn as nn

def updateBN(model, s=0.0001):
    # add the L1 sub-gradient s * sign(gamma) to each BN scaling factor's gradient,
    # pushing the scaling factors toward zero (the Network Slimming trick)
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.data.add_(s * torch.sign(m.weight.data))

if __name__ == "__main__":
    model = net()  # `net` is the network class defined for this step (definition omitted here)
    # from torchsummary import summary
    # print(summary(model, (3, 20, 20), 1))
    # x = torch.rand((1, 3, 20, 20))
    # print(model(x))
    optimer = torch.optim.Adam(model.parameters())
    loss_fn = torch.nn.CrossEntropyLoss()
    for e in range(100):
        x = torch.rand((1, 3, 20, 20))
        y = torch.tensor(np.random.randint(0, 2, (1))).long()
        out = model(x)
        loss = loss_fn(out, y)
        optimer.zero_grad()
        loss.backward()
        # sparsify the BN weights before the optimizer step
        updateBN(model)
        optimer.step()
    torch.save(model.state_dict(), "net.pt")

 *Code adapted from the CSDN post 模型压缩(一)通道剪枝-BN层

In effect this just adds a regularization term to the BN weights; it does not increase the parameter count.

One open question: sparsity is induced via BN regularization while training the initial model, but do the BN layers still need this regularization during the later fine-tune? My guess is probably not. Corrections welcome!

First comes the model definition used for pruning, which takes a cfg argument as the pruning specification:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import numpy as np
import torch
import torch.nn as nn

## model definition
class ld_model(nn.Module):
    def __init__(self, *, inputlength = 60, kernelsize = 3, kindsoutput = 2, cfg = None):
        super(ld_model,self).__init__()
        ## cfg specifies the conv layers' channel counts and the pooling positions; see below
        if cfg:
            self.features = self.make_layer(cfg)
            self.dropout = nn.Dropout(0.2)
            self.fc1 = nn.Sequential(
                        nn.Linear(in_features = cfg[2]*(inputlength - 8), out_features = 512),
                        nn.ReLU(),
            )
            self.fc2 = nn.Sequential(
                        nn.Linear(in_features = 512, out_features = 256),
                        nn.ReLU(),
            )
            self.fc3 = nn.Sequential(
                        nn.Linear(in_features = 256, out_features = 128),
                        nn.ReLU(),
            )
            self.out = nn.Linear(128, kindsoutput)
        
        else:
            layers = []
            layers += [ nn.Conv1d(in_channels = 1, out_channels = 6, kernel_size = 3),
                        nn.BatchNorm1d(6),
                        nn.ReLU(inplace=True),
                        nn.AvgPool1d(kernel_size = 3, stride = 1)]

            layers += [ nn.Conv1d(in_channels = 6, out_channels = 16, kernel_size = 3),
                        nn.BatchNorm1d(16),
                        nn.ReLU(inplace=True),
                        nn.AvgPool1d(kernel_size = 3, stride = 1)]
            self.features = nn.Sequential(*layers)
            self.dropout = nn.Dropout(0.2)
            self.fc1 = nn.Sequential(
                        nn.Linear(in_features = 16*(inputlength - 8), out_features = 512),
                        nn.ReLU(),
            )
            self.fc2 = nn.Sequential(
                        nn.Linear(in_features = 512, out_features = 256),
                        nn.ReLU(),
            )
            self.fc3 = nn.Sequential(
                        nn.Linear(in_features = 256, out_features = 128),
                        nn.ReLU(),
            )
            self.out = nn.Linear(128, kindsoutput)
    
    def make_layer(self, cfg):
        layers = []
        layers += [ nn.Conv1d(in_channels = 1, out_channels = cfg[0], kernel_size = 3),
                    nn.BatchNorm1d(cfg[0]),
                    nn.ReLU(inplace=True),
                    nn.AvgPool1d(kernel_size = 3, stride = 1)]

        layers += [ nn.Conv1d(in_channels = cfg[0], out_channels = cfg[2], kernel_size = 3),
                    nn.BatchNorm1d(cfg[2]),
                    nn.ReLU(inplace=True),
                    nn.AvgPool1d(kernel_size = 3, stride = 1)]
        
        return nn.Sequential(*layers)

    def forward(self, x):
        x = x.to(torch.float32)
        x = self.features(x)
        x = x.view(x.size(0),-1)
        x = self.fc1(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.dropout(x)
        x = self.fc3(x)
        x = self.dropout(x)
        x = self.out(x)
        x = x.to(torch.float32)
        return x


def updateBN(model, s=0.0001):
    for m in model.modules():
        if isinstance(m, nn.BatchNorm1d):
            m.weight.grad.data.add_(s * torch.sign(m.weight.data))


## the actual training loop is omitted

## save the sparsely trained model as a .pt file
torch.save(model.state_dict(), 'model_bn_sparse.pt')
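
Before pruning, it is worth a quick check that sparse training really pushed many BN scaling factors toward zero, since that is what makes threshold-based channel selection work. A small sanity check of my own (not part of the referenced code); the 1e-2 cutoff is an arbitrary choice:

gammas = torch.cat([m.weight.data.abs().view(-1)
                    for m in model.modules() if isinstance(m, nn.BatchNorm1d)])
print("near-zero BN gammas:", int((gammas < 1e-2).sum()), "of", gammas.numel())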

2. Prune the .pt model obtained in the previous step

path = r'model_bn_sparse.pt'
model = ld_model()
model.load_state_dict(torch.load(path))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

total = 0  ## total number of BN channels
for m in model.modules():
    if isinstance(m, nn.BatchNorm1d):
        # print(m.weight.data.shape[0])
        # print(m.weight.data)
        total += m.weight.data.shape[0]
print("Total BN channels:", total)

bn_data = torch.zeros(total)  ## buffer holding every BN scaling factor
index = 0
for m in model.modules():
    # copy each BN layer's |weight| values into bn_data
    if isinstance(m, nn.BatchNorm1d):
        size = m.weight.data.shape[0]
        bn_data[index:(index+size)] = m.weight.data.abs().clone()
        index += size
# sort all BN weights
data, ids = torch.sort(bn_data)
percent = 0.7  # fraction of BN channels to prune (~70% pruned, ~30% kept)
thresh_index = int(total*percent)
thresh = data[thresh_index]  # the value at this index of the sorted BN weights becomes the pruning threshold
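
A toy illustration (made-up values, my own addition) of what percent = 0.7 does: with ten BN weights, the 8th smallest |γ| becomes the cutoff, and only channels strictly above it survive.

bn_demo = torch.tensor([0.9, 0.05, 0.4, 0.01, 0.7, 0.2, 0.03, 0.6, 0.08, 0.5])
sorted_vals, _ = torch.sort(bn_demo.abs())
cut = sorted_vals[int(len(bn_demo) * 0.7)]     # -> tensor(0.6000)
print(int((bn_demo.abs() > cut).sum()))        # -> 2 of the 10 channels survive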


# build the masks
pruned_num = 0  # number of pruned BN channels
cfg = []        # surviving channel count per layer
cfg_mask = []   # per-BN-layer masks: pruned channels are 0, kept channels are 1

## the pruning logic below was adapted for 1-D convolutions; watch the parameter dimensions
for k, m in enumerate(model.modules()):
    if isinstance(m, nn.BatchNorm1d):
        weight_copy = m.weight.data.abs().clone()
        # print(weight_copy)
        mask = weight_copy.gt(thresh).float()  # split the weights at the threshold
        # print(mask)
        # exit()
        pruned_num += mask.shape[0] - torch.sum(mask)
        # print(pruned_num)
        m.weight.data.mul_(mask)  # zero out the BN weights of the pruned channels
        m.bias.data.mul_(mask)

        cfg.append(int(torch.sum(mask)))  # record how many channels survive
        cfg_mask.append(mask.clone())
        print("layer index:{:d}\t total channel:{:d}\t remaining channel:{:d}".format(k, mask.shape[0], int(torch.sum(mask))))
    elif isinstance(m, nn.AvgPool1d):
        cfg.append("A")  # record the pooling layer's position; pooling layers are left untouched


pruned_ratio = pruned_num / total
print("Fraction of channels pruned:", pruned_ratio)
print(cfg)

# rebuild the network from the channel counts and layer positions recorded in cfg;
# the structure has changed, so the old parameters must be copied over next
newmodel = ld_model(cfg=cfg)
newmodel = newmodel.to(device)
# print(newmodel)
# from torchsummary import summary
# print(summary(newmodel,(3,20,20),1))
 
layer_id_in_cfg = 0  # index into cfg_mask

start_mask = torch.ones(1)  ## note: start_mask is initialized to the input data's channel dimension (1 here)
end_mask = cfg_mask[layer_id_in_cfg]  # mask of the first BN layer
# print(cfg_mask)
# print(end_mask)
 
for (m0, m1) in zip(model.modules(), newmodel.modules()):  # zip stops at the shorter of the two
    if isinstance(m0, nn.BatchNorm1d):
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.numpy())))  # indices of the non-zero mask entries, i.e. the channels that were not pruned

        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))

        # copy the surviving parameters from the old model into the new one
        m1.weight.data = m0.weight.data[idx1.tolist()].clone()
        m1.bias.data = m0.bias.data[idx1.tolist()].clone()
        m1.running_mean = m0.running_mean[idx1.tolist()].clone()
        m1.running_var = m0.running_var[idx1.tolist()].clone()

        layer_id_in_cfg += 1  # move on to the next mask
        start_mask = end_mask.clone()
        if layer_id_in_cfg < len(cfg_mask):
            end_mask = cfg_mask[layer_id_in_cfg]

    elif isinstance(m0, nn.Conv1d):
        ## a mask value of 0 means the channel is removed; non-zero means it is kept
        idx0 = np.squeeze(np.argwhere(np.asarray(start_mask.numpy())))  # idx0: indices of the non-zero values in start_mask (input channels)
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.numpy())))    # idx1: indices of the non-zero values in end_mask (output channels)

        if idx0.size == 1:
            idx0 = np.resize(idx0, (1,))
        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))

        ## weight update: keep the surviving input channels, then the surviving output channels
        w1 = m0.weight.data[:, idx0.tolist(), :].clone()
        ## shape = w1.shape
        w1 = w1[idx1.tolist(), :, :].clone()
        ## shape = w1.shape
        ## bias update
        b1 = m0.bias.data.clone()
        b1 = b1[idx1.tolist()].clone()

        m1.weight.data = w1.clone()
        m1.bias.data = b1.clone()

    # elif isinstance(m0, nn.Linear):
        # idx0 = np.squeeze(np.argwhere(np.asarray(start_mask.numpy())))  # non-zero input indices
        # if idx0.size == 1:
            # idx0 = np.resize(idx0, (1,))

        # m1.weight.data = m0.weight.data[:, idx0].clone()
        # m1.bias.data = m0.bias.data.clone()

 
torch.save(newmodel.state_dict(),"prune_net.pth")
print(newmodel)

*Code adapted from the CSDN post 模型压缩(一)通道剪枝-BN层

The original code prunes 2-D convolutions; for my work I adapted it to 1-D convolutions. The most troublesome part is getting the parameter dimensions right, since dimension mismatches are easy to run into. I also removed the original handling of the fully connected layers: the original model has only one FC layer, so no dimension mismatch arises there, but my model has several FC layers, and applying that logic to the later FC layers breaks it, so I dropped it. In terms of results it still works well enough.
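
For reference, if one did want to prune the first fully connected layer consistently, the surviving conv channels would have to be mapped to their blocks of flattened positions. This is a hypothetical sketch of my own, assuming the features are flattened channel-major (as x.view(x.size(0), -1) does) and using the default inputlength of 60; fc1[0] is the Linear inside the Sequential:

L_out = 60 - 8                  # feature length after the two conv + pool stages
kept = idx1.tolist()            # surviving channels of the last conv/BN layer
fc_in_idx = [c * L_out + p for c in kept for p in range(L_out)]
# newmodel.fc1[0].weight.data = model.fc1[0].weight.data[:, fc_in_idx].clone()
# newmodel.fc1[0].bias.data   = model.fc1[0].bias.data.clone()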

3. Fine-tune the pruned model

Fine-tuning works much like the initial training; just remember to lower the learning rate for the fine-tuning phase.

newmodel.load_state_dict(torch.load("prune_net.pth"))

# a lower learning rate than in the original training is advisable here, e.g. Adam(newmodel.parameters(), lr=1e-4)
optimer = torch.optim.Adam(newmodel.parameters())
loss_fn = torch.nn.CrossEntropyLoss()
for e in range(100):
    x = torch.rand((1, 1, 60))  # dummy input matching ld_model's expected shape (batch, channels=1, inputlength=60)
    y = torch.tensor(np.random.randint(0, 2, (1))).long()
    out = newmodel(x)
    loss = loss_fn(out, y)
    optimer.zero_grad()
    loss.backward()
    optimer.step()
torch.save(newmodel.state_dict(), "prune_net.pth")

In the reference code, the optimizer was given model.parameters(), but I believe the object being optimized should be the new model, i.e. newmodel.parameters(). Please correct me if my understanding is off.

When building the network from cfg with this approach, the positions of the conv layers have to be fixed by hand. In my example, cfg = [1, 'A', 5, 'A'] means the first layer is a conv layer with 1 channel, the third layer is a conv layer with 5 channels, and the 2nd and 4th entries are pooling layers, which do not take part in the network reconstruction.
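
For example, rebuilding the pruned network from that cfg looks like this (the channel counts are just the values from my run; yours will differ):

cfg = [1, 'A', 5, 'A']        # conv1 keeps 1 channel, conv2 keeps 5; 'A' marks the pooling layers
newmodel = ld_model(cfg=cfg)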

That is roughly where my study of this topic stands. Feel free to discuss; if you know better pruning methods or implementations, please do share 😀
