Notes on Neural Network Compression Methods --- Implementation (Pruning)
The previous note covered the principles behind pruning and my thoughts on them; this note walks through how pruning is actually implemented in code.
Most of the code is adapted from online sources. The source model being pruned is a model.pt I trained myself, roughly 2 MB in size. After pruning and fine-tuning, the model shrank by about 70% while keeping good recognition accuracy on a simple task.
PyTorch ships a fairly mature pruning utility, prune, which is accessed through torch.nn.utils.prune.
1. Types of pruning
- Unstructured pruning
- Structured pruning
Pruning can be applied to a single layer (local) or across multiple/all layers (global); structured pruning in PyTorch only supports the local form.
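For orientation, here is a minimal sketch of both flavors using torch.nn.utils.prune (the two-conv toy model is purely illustrative): local structured pruning is applied per module, while the global variant currently only comes in an unstructured flavor.

import torch.nn as nn
import torch.nn.utils.prune as prune

# toy model, for illustration only
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Conv2d(8, 16, 3))

# local structured pruning: drop 50% of the first conv's output channels (L2 norm along dim 0)
prune.ln_structured(model[0], name="weight", amount=0.5, n=2, dim=0)

# global unstructured pruning: drop the 30% smallest-magnitude weights across both convs
prune.global_unstructured(
    [(model[0], "weight"), (model[1], "weight")],
    pruning_method=prune.L1Unstructured,
    amount=0.3,
)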
2. Implementation (PyTorch)
Classes in PyTorch that can be used to implement pruning:
class BasePruningMethod(ABC):
class PruningContainer(BasePruningMethod):
class Identity(BasePruningMethod):
class RandomUnstructured(BasePruningMethod):
class L1Unstructured(BasePruningMethod):
class RandomStructured(BasePruningMethod):
class LnStructured(BasePruningMethod):
The general workflow (a short sketch follows the list):
- Choose a pruning method (or subclass BasePruningMethod to implement a custom one)
- Specify the module to prune and the parameter name
- Set the method's arguments, such as the pruning ratio and scope
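A minimal sketch of that workflow with the built-in API (the Conv1d module and the 30% ratio below are just illustrative):

import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv1d(1, 6, kernel_size=3)                    # module to prune (illustrative)
prune.l1_unstructured(conv, name="weight", amount=0.3)   # zero out the 30% smallest-magnitude weights

# the pruning is stored as weight_orig + weight_mask; conv.weight is recomputed from them
print(list(dict(conv.named_buffers()).keys()))    # ['weight_mask']
print(float((conv.weight == 0).float().mean()))   # ~0.3 sparsity

prune.remove(conv, "weight")   # make the pruning permanent (folds the mask into weight)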
You can also write the pruning logic yourself. To get a feel for the whole process, the rest of this note implements the pruning pipeline from first principles.
I looked through quite a few resources but did not find a reasonably complete onnx-based pruning tutorial; if you know of a good one, please share 😀
1. Define a network, train it, and save it in pt format
According to the paper Learning Efficient Convolutional Networks through Network Slimming, an L1 regularization term can be applied to the BN layers so that the BN scaling factors are pushed toward zero, which makes the important channels easier to identify. The paper argues that channel-level sparsity "provides a nice tradeoff between flexibility and ease of implementation" compared to weight-level and layer-level pruning.
import numpy as np
import torch
import torch.nn as nn

def updateBN(model, s=0.0001):
    # add the (sub)gradient of an L1 penalty on the BN scale factors
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.data.add_(s * torch.sign(m.weight.data))

if __name__ == "__main__":
    model = net()  # net() is the simple network defined earlier (definition omitted)
    # from torchsummary import summary
    # print(summary(model, (3, 20, 20), 1))
    # x = torch.rand((1, 3, 20, 20))
    # print(model(x))
    optimer = torch.optim.Adam(model.parameters())
    loss_fn = torch.nn.CrossEntropyLoss()
    for e in range(100):
        x = torch.rand((1, 3, 20, 20))
        y = torch.tensor(np.random.randint(0, 2, (1))).long()
        out = model(x)
        loss = loss_fn(out, y)
        optimer.zero_grad()
        loss.backward()
        # sparsify the BN weights
        updateBN(model)
        optimer.step()
    torch.save(model.state_dict(), "net.pt")
*Code adapted from the CSDN blog post 模型压缩(一)通道剪枝-BN层
In effect this just adds a regularization term to the BN layer weights; it does not increase the parameter count.
One open question: sparsity is induced through this BN regularization while training the initial model, so does the BN regularization still need to be applied during the later fine-tuning? My feeling is probably not; corrections are welcome!
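Side note: the same sparsity term can equivalently be written into the loss itself instead of patching the gradients after backward(); a minimal sketch, reusing loss_fn, out, y, model and s from the training snippet above:

def bn_l1_penalty(model, s=0.0001):
    # L1 penalty on the BN scale factors; its gradient is s * sign(weight),
    # i.e. exactly what updateBN() adds to the gradients by hand after backward()
    return s * sum(m.weight.abs().sum()
                   for m in model.modules()
                   if isinstance(m, nn.BatchNorm2d))

# inside the training loop, instead of calling updateBN() after backward():
loss = loss_fn(out, y) + bn_l1_penalty(model)
loss.backward()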
First, the model-construction code used for pruning, which introduces a cfg argument as the pruning reference:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import torch
import torch.nn as nn

## model definition
class ld_model(nn.Module):
    def __init__(self, *, inputlength=60, kernelsize=3, kindsoutput=2, cfg=None):
        super(ld_model, self).__init__()
        ## cfg describes the conv-layer configuration, see below
        if cfg:
            self.features = self.make_layer(cfg)
            self.dropout = nn.Dropout(0.2)
            self.fc1 = nn.Sequential(
                nn.Linear(in_features=cfg[2] * (inputlength - 8), out_features=512),
                nn.ReLU(),
            )
            self.fc2 = nn.Sequential(
                nn.Linear(in_features=512, out_features=256),
                nn.ReLU(),
            )
            self.fc3 = nn.Sequential(
                nn.Linear(in_features=256, out_features=128),
                nn.ReLU(),
            )
            self.out = nn.Linear(128, kindsoutput)
        else:
            layers = []
            layers += [nn.Conv1d(in_channels=1, out_channels=6, kernel_size=3),
                       nn.BatchNorm1d(6),
                       nn.ReLU(inplace=True),
                       nn.AvgPool1d(kernel_size=3, stride=1)]
            layers += [nn.Conv1d(in_channels=6, out_channels=16, kernel_size=3),
                       nn.BatchNorm1d(16),
                       nn.ReLU(inplace=True),
                       nn.AvgPool1d(kernel_size=3, stride=1)]
            self.features = nn.Sequential(*layers)
            self.dropout = nn.Dropout(0.2)
            self.fc1 = nn.Sequential(
                nn.Linear(in_features=16 * (inputlength - 8), out_features=512),
                nn.ReLU(),
            )
            self.fc2 = nn.Sequential(
                nn.Linear(in_features=512, out_features=256),
                nn.ReLU(),
            )
            self.fc3 = nn.Sequential(
                nn.Linear(in_features=256, out_features=128),
                nn.ReLU(),
            )
            self.out = nn.Linear(128, kindsoutput)

    def make_layer(self, cfg):
        layers = []
        layers += [nn.Conv1d(in_channels=1, out_channels=cfg[0], kernel_size=3),
                   nn.BatchNorm1d(cfg[0]),
                   nn.ReLU(inplace=True),
                   nn.AvgPool1d(kernel_size=3, stride=1)]
        layers += [nn.Conv1d(in_channels=cfg[0], out_channels=cfg[2], kernel_size=3),
                   nn.BatchNorm1d(cfg[2]),
                   nn.ReLU(inplace=True),
                   nn.AvgPool1d(kernel_size=3, stride=1)]
        return nn.Sequential(*layers)

    def forward(self, x):
        x = x.to(torch.float32)
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.dropout(x)
        x = self.fc3(x)
        x = self.dropout(x)
        x = self.out(x)
        x = x.to(torch.float32)
        return x

def updateBN(model, s=0.0001):
    for m in model.modules():
        if isinstance(m, nn.BatchNorm1d):
            m.weight.grad.data.add_(s * torch.sign(m.weight.data))

## the actual training loop is omitted
## save as a pt model
torch.save(model.state_dict(), 'model_bn_sparse.pt')
2. Prune the pt model obtained in the previous step
import numpy as np
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

path = r'model_bn_sparse.pt'
model = ld_model()  # ld_model is the class defined above
model.load_state_dict(torch.load(path))
model = model.to(device)

total = 0  ## count the total number of BN channels
for m in model.modules():
    if isinstance(m, nn.BatchNorm1d):
        # print(m.weight.data.shape[0])
        # print(m.weight.data)
        total += m.weight.data.shape[0]
print("total BN channels:", total)

bn_data = torch.zeros(total)  ## buffer for the BN scale factors
index = 0
for m in model.modules():
    # copy each BN layer's scale factors into bn_data
    if isinstance(m, nn.BatchNorm1d):
        size = m.weight.data.shape[0]
        bn_data[index:(index + size)] = m.weight.data.abs().clone()
        index += size

# sort the collected BN scale factors
data, _ = torch.sort(bn_data)
percent = 0.7  # prune 70% of the BN channels (keep 30%)
thresh_index = int(total * percent)
thresh = data[thresh_index]  # the value at thresh_index of the sorted list is the cut-off threshold
# build the masks
pruned_num = 0   # number of pruned BN channels
cfg = []         # per-layer counts of retained channels
cfg_mask = []    # BN-layer masks: pruned channels are 0, kept channels are 1
## the pruning logic below has been adapted for 1-D convolutions; watch the parameter dimensions when reusing it
for k, m in enumerate(model.modules()):
    if isinstance(m, nn.BatchNorm1d):
        weight_copy = m.weight.data.abs().clone()
        # print(weight_copy)
        mask = weight_copy.gt(thresh).float()  # threshold the scale factors
        # print(mask)
        # exit()
        pruned_num += mask.shape[0] - torch.sum(mask)
        # print(pruned_num)
        m.weight.data.mul_(mask)  # zero out the BN weights of pruned channels
        m.bias.data.mul_(mask)
        cfg.append(int(torch.sum(mask)))  # record how many channels survive in this layer
        cfg_mask.append(mask.clone())
        print("layer index:{:d}\t total channel:{:d}\t remaining channel:{:d}".format(k, mask.shape[0], int(torch.sum(mask))))
    elif isinstance(m, nn.AvgPool1d):
        cfg.append("A")  ## record pooling-layer positions; pooling layers are left untouched

pruned_ratio = pruned_num / total
print("fraction of channels pruned:", pruned_ratio)
print(cfg)
newmodel = ld_model(cfg=cfg)  # rebuild the network from cfg (retained channel counts and layer positions); the structure has changed, so the parameters must be copied over next
newmodel = newmodel.to(device)
# print(newmodel)
# from torchsummary import summary
# print(summary(newmodel, (3, 20, 20), 1))

layer_id_in_cfg = 0                    # index of the current mask
start_mask = torch.ones(1)             # note: the size of start_mask matches the number of input channels
end_mask = cfg_mask[layer_id_in_cfg]   # mask of the first BN layer
# print(cfg_mask)
# print(end_mask)
for (m0, m1) in zip(model.modules(), newmodel.modules()):  # iterate the old and new model in lockstep
    if isinstance(m0, nn.BatchNorm1d):
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))  # indices of the nonzero (unpruned) channels in the mask
        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))
        # copy the old model's parameters into the new model
        m1.weight.data = m0.weight.data[idx1.tolist()].clone()
        m1.bias.data = m0.bias.data[idx1.tolist()].clone()
        m1.running_mean = m0.running_mean[idx1.tolist()].clone()
        m1.running_var = m0.running_var[idx1.tolist()].clone()
        layer_id_in_cfg += 1  # move on to the next mask
        start_mask = end_mask.clone()
        if layer_id_in_cfg < len(cfg_mask):
            end_mask = cfg_mask[layer_id_in_cfg]
    elif isinstance(m0, nn.Conv1d):  # convolution: select both input and output channels
        ## channels whose mask value is 0 are removed, nonzero channels are kept
        idx0 = np.squeeze(np.argwhere(np.asarray(start_mask.cpu().numpy())))  # idx0: nonzero indices of start_mask (input channels)
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))    # idx1: nonzero indices of end_mask (output channels)
        if idx0.size == 1:
            idx0 = np.resize(idx0, (1,))
        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))
        ## weight update: first keep the surviving input channels, then the surviving output channels
        w1 = m0.weight.data[:, idx0.tolist(), :].clone()
        ## shape = w1.shape
        w1 = w1[idx1.tolist(), :, :].clone()
        ## shape = w1.shape
        ## bias update
        b1 = m0.bias.data.clone()
        b1 = b1[idx1.tolist()].clone()
        m1.weight.data = w1.clone()
        m1.bias.data = b1.clone()
    # elif isinstance(m0, nn.Linear):
    #     idx0 = np.squeeze(np.argwhere(np.asarray(start_mask.cpu().numpy())))  # nonzero input indices
    #     if idx0.size == 1:
    #         idx0 = np.resize(idx0, (1,))
    #     m1.weight.data = m0.weight.data[:, idx0].clone()
    #     m1.bias.data = m0.bias.data.clone()

torch.save(newmodel.state_dict(), "prune_net.pth")
print(newmodel)
*Code adapted from the CSDN blog post 模型压缩(一)通道剪枝-BN层
The original code pruned 2-D convolutions; for my work I adapted it to 1-D convolutions. The trickiest part is getting the parameter dimensions right, since dimension-mismatch errors are easy to hit. I also removed the original code's handling of fully connected layers: the original network had only one FC layer so no mismatch could occur there, but my model has several FC layers, and applying that logic to the later FC layers would break, so I dropped it. The results are still acceptable.
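For reference, if one did want to carry the first fully connected layer over instead of re-initializing it, note that forward() flattens a (N, C, L) tensor with x.view(N, -1), so each kept channel contributes a contiguous block of L columns. A hedged sketch (not part of the original reference code), continuing right after the copy loop above, where idx1 is the index of the last BN mask and the spatial length equals inputlength - 8 (52 for the default inputlength of 60):

# hypothetical remapping of the first Linear layer after channel pruning
L_spatial = 60 - 8  # inputlength - 8, the length entering x.view(N, -1)
kept_channels = idx1.tolist()  # channels kept by the last conv/BN mask
cols = np.concatenate([np.arange(c * L_spatial, (c + 1) * L_spatial)
                       for c in kept_channels])
newmodel.fc1[0].weight.data = model.fc1[0].weight.data[:, cols.tolist()].clone()
newmodel.fc1[0].bias.data = model.fc1[0].bias.data.clone()

The later fully connected layers (fc2, fc3, out) have identical shapes in both models, so they could simply be copied over verbatim if desired.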
3. Fine-tune the pruned model
The fine-tuning loop is essentially the same as the initial training; just remember to lower the learning rate for the fine-tuning stage.
newmodel.load_state_dict(torch.load("prune_net.pth"))

# use a lower learning rate for fine-tuning (the value here is only illustrative)
optimer = torch.optim.Adam(newmodel.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()
for e in range(100):
    # dummy data with the 1-D input shape this model expects: (batch, channels, inputlength)
    x = torch.rand((1, 1, 60)).to(device)
    y = torch.tensor(np.random.randint(0, 2, (1))).long().to(device)
    out = newmodel(x)
    loss = loss_fn(out, y)
    optimer.zero_grad()
    loss.backward()
    optimer.step()
torch.save(newmodel.state_dict(), "prune_net.pth")
In the reference code, the argument passed to the optimizer was model.parameters(), but I believe the optimization target should be the new model, i.e. newmodel.parameters(); I'm not sure whether my understanding is off, corrections welcome.
When using this approach, you need to track the conv-layer positions manually when rebuilding the network from cfg. In my example, cfg = [1, 'A', 5, 'A'] means the first layer is a conv with 1 channel, the third layer is a conv with 5 channels, and layers 2 and 4 are pooling layers that do not take part in the network reconstruction.
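As a quick sanity check (the cfg value and dummy input below are illustrative and match the defaults inputlength = 60 and kindsoutput = 2), you can rebuild the pruned model from cfg, run a dummy forward pass, and compare its parameter count with the unpruned model:

cfg = [1, 'A', 5, 'A']        # 1 and 5 channels kept in the two convs
pruned = ld_model(cfg=cfg)
full = ld_model()

x = torch.rand(1, 1, 60)      # (batch, channels, inputlength)
print(pruned(x).shape)        # torch.Size([1, 2])

n_full = sum(p.numel() for p in full.parameters())
n_pruned = sum(p.numel() for p in pruned.parameters())
print(f"params: {n_full} -> {n_pruned} ({1 - n_pruned / n_full:.1%} fewer)")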
That is roughly where my learning stands at the moment. Discussion is welcome; if you know better pruning methods or implementations, please do share 😀