VGGNet

1. VGGNet Architecture and Innovations

VGGNet took second place in the 2014 ILSVRC (ImageNet Large Scale Visual Recognition Challenge) competition, addressing the 1000-class image classification and localization tasks on ImageNet; first place went to GoogLeNet.

VGG stands for Visual Geometry Group, the group at Oxford that proposed it. After AlexNet appeared, many researchers improved accuracy by modifying AlexNet's architecture, mainly in two directions: smaller convolution kernels and multi-scale processing. The VGG authors chose a different direction: increasing network depth. Their main contribution was demonstrating that increasing depth can, to a certain extent, improve a network's final performance.

VGGNet innovations:

Stack two 3×3 convolution kernels to replace one 5×5 kernel; stack three 3×3 kernels to replace one 7×7 kernel.

Same receptive field, fewer trainable parameters.

Introduction to the Receptive Field

Definition: the size of the input region that a single element of the output layer corresponds to. For example, with a 5×5 input layer and a 1×1 output layer, the receptive field is 5.

Computation: receptive field = (previous layer's receptive field − 1) × stride + kernel size
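
This recursion is easy to check in a few lines of Python. The receptive_field helper below is our own illustration (not from the original post); it applies the formula layer by layer from the output back toward the input:

def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, ordered from input to output."""
    rf = 1  # one element of the output layer
    for k, s in reversed(layers):  # apply the formula from the last layer back to the first
        rf = (rf - 1) * s + k
    return rf

print(receptive_field([(5, 1)]))          # one 5x5 convolution, stride 1 -> 5
print(receptive_field([(3, 1), (3, 1)]))  # two stacked 3x3 convolutions  -> 5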

Introduction to the Parameter Count (Param)

Convolutional layer: (weight parameters of one kernel + bias parameter) × number of kernels

For example, 2 kernels of size 3×3 over 3 input channels: (3 channels × 3×3 weights + 1 bias) × 2 kernels = 56 parameters.

Worked example: a single 5×5 convolution.

Output size = (input size − kernel size + padding pixels) / stride + 1

Output size = (5 − 5 + 0)/1 + 1 = 1. The input size is 5, the kernel size is 5, and with no padding the padding-pixel count is 0.

The output is therefore 1×1.

Receptive field = 5, computed as (1 − 1) × 1 + 5 = 5

Parameters = 5×5 + 1 = 26

Worked example: two stacked 3×3 convolutions.

Output size = (input size − kernel size + padding pixels) / stride + 1

Output size = (5 − 3 + 0)/1 + 1 = 3 (after the first 3×3 convolution)

Output size = (3 − 3 + 0)/1 + 1 = 1 (after the second 3×3 convolution)

Receptive field = 5. Layer 1: (1 − 1) × 1 + 3 = 3; layer 2: (3 − 1) × 1 + 3 = 5

Parameters: (3×3 + 1) × 2 = 20

This verifies the claim: two 3×3 kernels can replace one 5×5 kernel, so for the same receptive field the number of trainable parameters drops (26 → 20), as the sketch below confirms.
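
These counts can also be verified directly in PyTorch; here is a minimal sketch for the single-channel case used in the hand calculation above:

import torch.nn as nn

conv5 = nn.Conv2d(1, 1, kernel_size=5)                   # one 5x5 convolution
conv3x2 = nn.Sequential(nn.Conv2d(1, 1, kernel_size=3),  # two stacked 3x3 convolutions
                        nn.Conv2d(1, 1, kernel_size=3))

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(conv5))    # 5*5 + 1 = 26
print(n_params(conv3x2))  # (3*3 + 1) * 2 = 20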

(Network architecture diagram.) Black: convolution layer + ReLU activation; red: pooling layer; blue: fully connected layer.

conv3: kernel size 3×3, stride 1, padding 1.

Output: (224 − 3 + 2×1)/1 + 1 = 224 (width and height unchanged)

maxpool: pooling kernel 2×2, stride 2.

Output: (224 − 2 + 0)/2 + 1 = 112 (width and height halved)

Pooling changes the width and height; convolution changes the depth (channel count).
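
A quick shape check of this arithmetic, sketched with the first VGG convolution block as an example:

import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)  # N x C x H x W
conv3 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
maxpool = nn.MaxPool2d(kernel_size=2, stride=2)

y = conv3(x)
print(y.shape)           # torch.Size([1, 64, 224, 224]) - depth changes, width/height kept
print(maxpool(y).shape)  # torch.Size([1, 64, 112, 112]) - width/height halved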

2. VGGNet Training and Prediction

The remote-sensing image dataset contains 2100 images (21 classes × 100 each), 256×256 RGB, with 90 images per class used for training and 10 held out for validation.

1. Split the dataset into training set : validation set = 9 : 1 -------------------------------split.py

# -*- coding: utf-8 -*-
import os
from shutil import copy
import random
 
 
def mkfile(file):
    if not os.path.exists(file):
        os.makedirs(file)
 
 
# Get the names of all folders under the images directory, excluding .txt files (i.e., the 21 class names)
file_path = '/home/data1/ffeng_data/3.ycx/RS-SC/Image-classification/UCMerced_LandUse/Images'
flower_class = [cla for cla in os.listdir(file_path) if ".txt" not in cla]
 
# Create the train1 folder and a subfolder for each of the 21 class names
mkfile('/home/data1/ffeng_data/3.ycx/RS-SC/Image-classification/UCMerced_LandUse/train1')
for cla in flower_class:
    mkfile('/home/data1/ffeng_data/3.ycx/RS-SC/Image-classification/UCMerced_LandUse/train1/' + cla)
 
# Create the val1 folder and a subfolder for each of the 21 class names
mkfile('/home/data1/ffeng_data/3.ycx/RS-SC/Image-classification/UCMerced_LandUse/val1')
for cla in flower_class:
    mkfile('/home/data1/ffeng_data/3.ycx/RS-SC/Image-classification/UCMerced_LandUse/val1/' + cla)
 
# Split ratio: training set : validation set = 9 : 1
split_rate = 0.1
 
# Iterate over all images of the 21 classes and split them into training and validation sets
for cla in flower_class:
    cla_path = file_path + '/' + cla + '/'  # subdirectory of one class
    images = os.listdir(cla_path)  # the images list holds the names of all images in this directory
    num = len(images)
    eval_index = random.sample(images, k=int(num * split_rate))  # randomly draw k image names from images
    for index, image in enumerate(images):
        # image names in eval_index go to the validation set val1
        if image in eval_index:
            image_path = cla_path + image
            new_path = '/home/data1/ffeng_data/3.ycx/RS-SC/Image-classification/UCMerced_LandUse/val1/' + cla
            copy(image_path, new_path)  # copy the selected image to the new path
 
        # the remaining images go to the training set train1
        else:
            image_path = cla_path + image
            new_path = '/home/data1/ffeng_data/3.ycx/RS-SC/Image-classification/UCMerced_LandUse/train1/' + cla
            copy(image_path, new_path)
        print("r[{}] processing [{}/{}]".format(cla, index + 1, num), end="")  # processing bar
    print()
 
print("processing done!")
 
 

2. Define the model: the VGGNet network ------------------------------------------------------------------model.py

# -*- coding: utf-8 -*-
import torch.nn as nn
import torch

# official pretrain weights
model_urls = {
    'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
    'vgg13': 'https://download.pytorch.org/models/vgg13-c768596a.pth',
    'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth',
    'vgg19': 'https://download.pytorch.org/models/vgg19-dcbb9e9d.pth'
}


class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=False):
        super(VGG, self).__init__()
        self.features = features
        self.classifier = nn.Sequential(
            nn.Linear(512*7*7, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes)
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.features(x)
        # N x 512 x 7 x 7
        x = torch.flatten(x, start_dim=1)
        # N x 512*7*7
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
                # nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)


def make_features(cfg: list):
    layers = []
    in_channels = 3
    for v in cfg:
        if v == "M":
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            layers += [conv2d, nn.ReLU(True)]
            in_channels = v
    return nn.Sequential(*layers)


cfgs = {
    'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


def vgg(model_name="vgg16", **kwargs):
    assert model_name in cfgs, "Warning: model name {} not in cfgs dict!".format(model_name)
    cfg = cfgs[model_name]

    model = VGG(make_features(cfg), **kwargs)
    return model
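
A quick sanity check of the model definition (an assumed usage sketch, not part of the original scripts): build vgg16 with 21 output classes and push a dummy batch through it.

import torch
from model import vgg

net = vgg(model_name="vgg16", num_classes=21, init_weights=True)
x = torch.randn(1, 3, 224, 224)  # dummy batch: N x 3 x 224 x 224
print(net(x).shape)              # expected: torch.Size([1, 21])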

3. Train: compute the loss, compute the accuracy, and save the trained network weights -------------------------------train.py

# -*- coding: utf-8 -*-
import os
import sys
import json

import torch
import torch.nn as nn
from torchvision import transforms, datasets
import torch.optim as optim
from tqdm import tqdm

from model import vgg


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
        "val": transforms.Compose([transforms.Resize((224, 224)),
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}

    data_root = os.path.abspath(os.path.join(os.getcwd(), "/home/data1/ffeng_data/3.ycx/RS-SC/Image-classification"))  # get data root path
    image_path = os.path.join(data_root, "UCMerced_LandUse")  # remote-sensing data set path
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train1"),
                                         transform=data_transform["train"])
    train_num = len(train_dataset)

    # class_to_idx maps each class name to its index
    flower_list = train_dataset.class_to_idx
    cla_dict = dict((val, key) for key, val in flower_list.items())
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=2)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    batch_size = 32  # batch size
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using {} dataloader workers every process'.format(nw))

    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size, shuffle=True,
                                               num_workers=nw)

    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val1"),
                                            transform=data_transform["val"])
    val_num = len(validate_dataset)
    validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                                  batch_size=batch_size, shuffle=False,
                                                  num_workers=nw)
    print("using {} images for training, {} images for validation.".format(train_num,
                                                                           val_num))

    # test_data_iter = iter(validate_loader)
    # test_image, test_label = test_data_iter.next()

    model_name = "vgg16"
    net = vgg(model_name=model_name, num_classes=21, init_weights=True)  # 21 classes
    net.to(device)
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adam(net.parameters(), lr=0.0001)  # learning rate 1e-4

    epochs = 10  # train for 10 epochs
    best_acc = 0.0
    save_path = './{}Net.pth'.format(model_name)
    train_steps = len(train_loader)
    for epoch in range(epochs):
        # train
        net.train()
        running_loss = 0.0
        train_bar = tqdm(train_loader, file=sys.stdout)
        for step, data in enumerate(train_bar):
            images, labels = data
            optimizer.zero_grad()
            outputs = net(images.to(device))
            loss = loss_function(outputs, labels.to(device))
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()

            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1,
                                                                     epochs,
                                                                     loss)

        # validate
        net.eval()
        acc = 0.0  # accumulate accurate number / epoch
        with torch.no_grad():
            val_bar = tqdm(validate_loader, file=sys.stdout)
            for val_data in val_bar:
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))
                predict_y = torch.max(outputs, dim=1)[1]
                acc += torch.eq(predict_y, val_labels.to(device)).sum().item()

        val_accurate = acc / val_num
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_accurate))

        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)

    print('Finished Training')


if __name__ == '__main__':
    main()

4. Using the trained network weights, run classification tests on your own images --------------------------predict.py

# -*- coding: utf-8 -*-
import torch
from model import vgg
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt
import json


def main():

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    # preprocessing
    data_transform = transforms.Compose(
        [transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    # load the image to predict
    image_path = "1.jpg"
    img = Image.open(image_path).convert('RGB')
    plt.imshow(img)
    # [N, C, H, W]
    img = data_transform(img)
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)

    # read class_indict
    try:
        with open('./class_indices.json', 'r') as json_file:
            class_indict = json.load(json_file)
    except Exception as e:
        print(e)
        exit(-1)

    # create model
    model = vgg(num_classes=21)
    # load model weights
    model_weight_path = "./vgg16Net.pth"
    model.load_state_dict(torch.load(model_weight_path, map_location=device))

    # switch to eval mode (turns off Dropout)
    model.eval()
    with torch.no_grad():
        # predict class
        output = torch.squeeze(model(img))  # squeeze away the batch dimension
        predict = torch.softmax(output, dim=0)
    predict_cla = torch.argmax(predict).numpy()
    print(class_indict[str(predict_cla)], predict[predict_cla].item())
    plt.show()

if __name__ == '__main__':
    main()

Without transfer learning, such a large network is a poor match for this small dataset, so the accuracy is somewhat lower.
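
As a hedged sketch (not part of the original scripts), transfer learning could be added by initializing from the official vgg16 weights listed in model_urls and replacing only the final classifier layer; this assumes the layer ordering in model.py matches the official torchvision checkpoint:

import torch
import torch.nn as nn
from model import vgg, model_urls

net = vgg(model_name="vgg16", num_classes=1000)  # match the pretrained head first
state_dict = torch.hub.load_state_dict_from_url(model_urls["vgg16"])
net.load_state_dict(state_dict)

# optionally freeze the convolutional features and train only the classifier
for p in net.features.parameters():
    p.requires_grad = False

net.classifier[6] = nn.Linear(4096, 21)  # new output head for the 21 classes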

Summary:

1. Stacking two 3×3 kernels replaces a 5×5 kernel; stacking three 3×3 kernels replaces a 7×7 kernel. Goal: same receptive field with fewer parameters.

2. Increasing network depth improves performance.

3. The trade-off is increased computational resource consumption.
