Deep learning study notes: running mmdetection on Kaggle


phily123


phily123 · published 2021-12-15 16:42:56

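I. Environment setup

1. Upgrading the PyTorch version:

Kaggle's default torch version is 1.4, but the minimum torch version mmcv requires is higher than 1.4, so torch has to be upgraded. The selector on the PyTorch home page only offered builds for CUDA 10.2 and 11.3, so I had to find another way. I first pinned the version with pip install torch==1.7. pip show torch confirmed that 1.7 had been downloaded, yet torch.__version__ still reported the original 1.4. I never figured out why, but that setup was clearly unusable. It turned out the official site does list installs of specific torch versions for other CUDA versions; following those official instructions worked.
Official link: https://pytorch.org/get-started/previous-versions/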

pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
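
Because pip reported 1.7 while torch.__version__ still said 1.4, it can help to compare the two directly after installing. The following is a minimal sketch, not part of the original notes: the helper names strip_local, report_versions and versions_agree are made up for illustration, and the hint that a kernel restart may clear a mismatch is an assumption (pip does not reload a module the running process has already imported).

```python
import importlib
from importlib import metadata


def strip_local(version):
    """Drop a local build tag such as '+cu101' from a version string."""
    return version.split('+', 1)[0]


def report_versions(dist_name, module_name=None):
    """Return (version in pip metadata, version of the imported module).

    Either value is None when it cannot be determined.
    """
    module_name = module_name or dist_name
    try:
        on_disk = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        on_disk = None
    try:
        module = importlib.import_module(module_name)
        in_process = getattr(module, '__version__', None)
    except ImportError:
        in_process = None
    return on_disk, in_process


def versions_agree(on_disk, in_process):
    """True only when both versions are known and equal, ignoring build tags."""
    if on_disk is None or in_process is None:
        return False
    return strip_local(on_disk) == strip_local(in_process)


on_disk, in_process = report_versions('torch')
print('pip metadata:', on_disk, '| imported:', in_process)
if not versions_agree(on_disk, in_process):
    # Assumption: a stale import in the running kernel is one possible cause.
    print('Versions differ or torch is missing; try restarting the kernel.')
```

If the two values still differ after a restart, a second copy of torch earlier on sys.path is another possibility worth checking.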

2. Installing mmcv
https://github.com/open-mmlab/mmcv#installation
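
The mmcv 1.x install instructions pointed pip at a find-links page keyed by CUDA version and torch version. The sketch below builds that URL from the versions installed above. The function name mmcv_find_links is hypothetical, and both the URL pattern and the rounding of torch 1.x.y to 1.x.0 are assumptions based on those instructions, so check the page linked above before relying on them.

```python
def mmcv_find_links(torch_version, cuda_version=None):
    """Build the assumed find-links URL for prebuilt mmcv-full wheels.

    torch_version: e.g. '1.7.1' or '1.7.1+cu101'
    cuda_version:  e.g. '10.1', or None for a CPU-only build
    """
    # Assumption: wheels are published per torch 1.x.0, so drop the patch number.
    major, minor = torch_version.split('+', 1)[0].split('.')[:2]
    cu = 'cu' + cuda_version.replace('.', '') if cuda_version else 'cpu'
    return ('https://download.openmmlab.com/mmcv/dist/'
            f'{cu}/torch{major}.{minor}.0/index.html')


# Print the command to run in a notebook cell or shell.
print('pip install mmcv-full -f ' + mmcv_find_links('1.7.1+cu101', '10.1'))
```
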
3. Matching mmdetection and mmcv versions
https://mmdetection.readthedocs.io/en/v2.19.1/get_started.html

II. Kernel

Kaggle directory layout: /kaggle contains input, lib, and working
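
To see this layout from inside a notebook, a short listing is enough. This is an illustrative sketch; list_children is a made-up helper, and it returns an empty list when run outside Kaggle, where these paths do not exist.

```python
import os


def list_children(root):
    """Return the sorted entries directly under root, or [] if root is absent."""
    if not os.path.isdir(root):
        return []
    return sorted(os.listdir(root))


for name in ('input', 'lib', 'working'):
    path = os.path.join('/kaggle', name)
    # Show at most five entries per directory to keep the output short.
    print(path, '->', list_children(path)[:5])
```

Attached datasets appear under /kaggle/input, which is read-only, and files the notebook writes belong in /kaggle/working.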

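Uninstalling packages on Kaggle: a Kaggle notebook cannot take interactive input while uninstalling a package, so there is no way to answer "yes" to the confirmation prompt. The following works around that: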

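import subprocess
import sys

def _pip(*args):
    # Run pip in a subprocess; pip.main() is not available in pip 10 and later.
    return subprocess.call([sys.executable, '-m', 'pip', *args])

def pip_install(package):
    _pip('install', package)
def pip_list():
    _pip('list')
def pip_uninstall(package):
    # '-y' answers the confirmation prompt automatically.
    _pip('uninstall', '-y', package)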

III. Training logs

First run on Kaggle:
[Screenshot: training log]

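The runs below were done on a rented server.

**Warning:** this run trains Cascade R-CNN starting from a model pretrained with HTC. The size mismatch is expected, because the number of classes is different. The log also reports unexpected key in source state_dict, because the two models differ: the HTC-pretrained model carries many extra parameters.
[Screenshot: training log]
In the next run the log shows not only unexpected key in source state_dict but also missing keys in source state_dict, which means the pretrained weights and the model match quite poorly.

[Screenshot: training log]
With the official cascade rcnn resnet 50 fpn 1x config and the pretrained model provided for it, everything matches apart from the size mismatch.

[Screenshot: training log]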
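
The three kinds of message in these logs can be reproduced on toy data. The sketch below compares parameter names and shapes only; it does not call mmcv or torch, compare_state_dicts is a hypothetical helper, and the parameter names and shapes are made up for illustration.

```python
def compare_state_dicts(source_shapes, model_shapes):
    """Classify differences between a checkpoint and a model.

    Both arguments map parameter name -> shape tuple.
    Returns (unexpected, missing, size_mismatch) as sorted lists of names.
    """
    source_keys, model_keys = set(source_shapes), set(model_shapes)
    # In the checkpoint but not in the model: "unexpected key in source state_dict".
    unexpected = sorted(source_keys - model_keys)
    # In the model but not in the checkpoint: "missing keys in source state_dict".
    missing = sorted(model_keys - source_keys)
    # Present in both but with different shapes: "size mismatch".
    size_mismatch = sorted(
        k for k in source_keys & model_keys
        if tuple(source_shapes[k]) != tuple(model_shapes[k])
    )
    return unexpected, missing, size_mismatch


# Made-up shapes: the checkpoint has an 81-way classifier and a mask head,
# the model has a 4-way classifier and one layer the checkpoint lacks.
checkpoint = {
    'backbone.conv1.weight': (64, 3, 7, 7),
    'roi_head.bbox_head.fc_cls.weight': (81, 1024),
    'roi_head.mask_head.conv.weight': (256, 256, 3, 3),
}
model = {
    'backbone.conv1.weight': (64, 3, 7, 7),
    'roi_head.bbox_head.fc_cls.weight': (4, 1024),
    'neck.extra.weight': (256,),
}
unexpected, missing, size_mismatch = compare_state_dicts(checkpoint, model)
print('unexpected:', unexpected)
print('missing:', missing)
print('size mismatch:', size_mismatch)
```

A size mismatch confined to the classification layers is what a change in the number of classes produces, while long unexpected or missing lists point to a checkpoint from a different architecture.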
IV. Saving the trained model

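import os
from IPython.display import FileLink

# Work from the notebook's output directory so the link path is relative to it.
os.chdir('/kaggle/working')
print(os.getcwd())
print(os.listdir('/kaggle/working'))
# Render a clickable download link for the final checkpoint.
FileLink('epoch_12.pth')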
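
When the number of epochs changes, hard-coding epoch_12.pth stops working. As a hedged sketch, the checkpoint with the highest epoch number can be picked automatically; latest_checkpoint is a made-up helper, and it assumes checkpoints are named epoch_N.pth like the file above.

```python
import os
import re


def latest_checkpoint(work_dir):
    """Return the epoch_N.pth file name with the largest N, or None if absent."""
    pattern = re.compile(r'^epoch_(\d+)\.pth$')
    best = None
    for name in os.listdir(work_dir):
        match = pattern.match(name)
        # Compare epochs as integers so epoch_12 ranks above epoch_9.
        if match and (best is None or int(match.group(1)) > best[0]):
            best = (int(match.group(1)), name)
    return best[1] if best else None


work_dir = '/kaggle/working'
if os.path.isdir(work_dir):
    print('latest checkpoint:', latest_checkpoint(work_dir))
```

The returned name can then be passed to FileLink in place of the fixed file name.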