mmdet 10: Weight initialization
During training, a proper initialization strategy helps speed up training or reach higher performance. MMCV provides some commonly used methods for initializing modules such as nn.Conv2d. Model initialization in MMDetection mainly uses init_cfg. Users can initialize models in the following two steps:
- Define init_cfg for a model or its components in model_cfg. The init_cfg of a child component has higher priority and overrides the init_cfg of its parent modules.
- Build the model as usual, then call the model.init_weights() method explicitly; the model parameters will be initialized as configured.
The high-level workflow of initialization in MMDetection is:
model_cfg(init_cfg) -> build_from_cfg -> model -> init_weights() -> initialize(self, self.init_cfg) -> children's init_weights()
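The order of calls in this workflow can be illustrated with a small toy model. This is only a sketch in plain Python: `ToyModule` and the `build_from_cfg` function below are hypothetical stand-ins written for this example, not the MMCV classes, and the "initialization" is reduced to recording which step ran.

```python
# Toy illustration only: these classes are NOT the MMCV implementation.
class ToyModule:
    """Minimal stand-in for a module that carries an init_cfg."""

    def __init__(self, name, init_cfg=None, children=()):
        self.name = name
        self.init_cfg = init_cfg
        self.children = list(children)

    def init_weights(self, log=None):
        log = [] if log is None else log
        # Step 1: apply this module's own init_cfg (the "initialize" step).
        if self.init_cfg is not None:
            log.append(f"initialize({self.name}, {self.init_cfg['type']})")
        # Step 2: recurse, so each child applies its own init_cfg afterwards.
        for child in self.children:
            child.init_weights(log)
        return log


def build_from_cfg(cfg):
    """Hypothetical builder: turn a nested dict into ToyModule objects."""
    cfg = dict(cfg)
    children = [build_from_cfg(c) for c in cfg.pop("children", [])]
    return ToyModule(cfg["name"], cfg.get("init_cfg"), children)


model_cfg = dict(
    name="detector",
    init_cfg=dict(type="Normal"),
    children=[dict(name="head", init_cfg=dict(type="Constant"))],
)
model = build_from_cfg(model_cfg)
steps = model.init_weights()
print(steps)
```

In this toy version the child runs after its parent, so whatever the child's init_cfg sets is what remains, which is one simple way to picture why a child's init_cfg takes priority.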
Description
init_cfg is a dict or list[dict] and contains the following keys and values:
- type (str): the initializer name in INITIALIZERS, followed by the arguments of the initializer.
- layer (str or list[str]): the names of basic layers in PyTorch or MMCV with learnable parameters that will be initialized, e.g. 'Conv2d', 'DeformConv2d'.
- override (dict or list[dict]): the sub-modules that do not inherit from BaseModule and whose initialization configuration differs from that of the other layers listed under the 'layer' key. The initializer defined in type applies to all layers defined in layer, so if a sub-module is not a subclass of BaseModule but can be initialized in the same way as the layers in layer, override is not needed. override contains:
  - type, followed by the arguments of the initializer;
  - name, indicating the sub-module to be initialized.
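As a rough illustration of the accepted shapes, the sketch below normalizes an init_cfg so that it is always a list of dicts, with layer and override always stored as lists. The helper name `normalize_init_cfg` is hypothetical and written only for this example; it is not part of MMCV or MMDetection.

```python
def normalize_init_cfg(init_cfg):
    """Return init_cfg as a list of dicts, with list-valued layer/override.

    Hypothetical helper for illustration; not an MMCV function.
    """
    cfgs = [init_cfg] if isinstance(init_cfg, dict) else list(init_cfg)
    normalized = []
    for cfg in cfgs:
        if not isinstance(cfg, dict):
            raise TypeError(
                f"init_cfg entries must be dict, got {type(cfg).__name__}")
        if "type" not in cfg:
            raise KeyError("each init_cfg entry needs a 'type' key")
        cfg = dict(cfg)  # shallow copy so the caller's dict is untouched
        if isinstance(cfg.get("layer"), str):
            cfg["layer"] = [cfg["layer"]]
        if isinstance(cfg.get("override"), dict):
            cfg["override"] = [cfg["override"]]
        normalized.append(cfg)
    return normalized


single = normalize_init_cfg(
    dict(type="Constant", layer="Conv2d", val=1,
         override=dict(type="Constant", name="reg", val=3)))
print(single[0]["layer"])      # layer is now a list
print(single[0]["override"])   # override is now a list of dicts
```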
Initialize parameters
Inherit a new model from mmcv.runner.BaseModule or mmdet.models. Here we show an example of FooModel.

    import torch.nn as nn
    from mmcv.runner import BaseModule

    class FooModel(BaseModule):
        def __init__(self, arg1, arg2, init_cfg=None):
            super(FooModel, self).__init__(init_cfg)
            ...
- Initialize model by using init_cfg directly in code

      import torch.nn as nn
      from mmcv.runner import BaseModule
      # or directly inherit mmdet models

      class FooModel(BaseModule):
          def __init__(self, arg1, arg2, init_cfg=XXX):
              super(FooModel, self).__init__(init_cfg)
              ...

- Initialize model by using init_cfg directly in mmcv.Sequential or mmcv.ModuleList code

      from mmcv.runner import BaseModule, ModuleList

      class FooModel(BaseModule):
          def __init__(self, arg1, arg2, init_cfg=None):
              super(FooModel, self).__init__(init_cfg)
              ...
              self.conv1 = ModuleList(init_cfg=XXX)

- Initialize model by using init_cfg in config file

      model = dict(
          ...
          model = dict(
              type='FooModel',
              arg1=XXX,
              arg2=XXX,
              init_cfg=XXX),
          ...
      )
Usage of init_cfg
- Initialize model by layer key

  If we only define layer, only the layers listed under the layer key are initialized.

  NOTE: The value of the layer key is the class name of a PyTorch layer that has weight and bias attributes, so layers such as MultiheadAttention are not supported.
  - Define layer key for initializing modules with the same configuration.

        init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d', 'Linear'], val=1)
        # initialize whole module with same configuration

  - Define layer key for initializing layers with different configurations.

        init_cfg = [dict(type='Constant', layer='Conv1d', val=1),
                    dict(type='Constant', layer='Conv2d', val=2),
                    dict(type='Constant', layer='Linear', val=3)]
        # nn.Conv1d will be initialized with dict(type='Constant', val=1)
        # nn.Conv2d will be initialized with dict(type='Constant', val=2)
        # nn.Linear will be initialized with dict(type='Constant', val=3)
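To make the mapping from the layer key to individual layer classes concrete, here is a small plain-Python sketch that expands an init_cfg into a table of layer class name to initializer arguments. The function `resolve_layer_cfgs` is hypothetical and exists only for this illustration. It also assumes that when the same layer name appears in several entries, the later entry wins; that is an assumption of the sketch, not a statement about MMCV's behavior.

```python
def resolve_layer_cfgs(init_cfg):
    """Map each layer class name to the initializer args that apply to it.

    Hypothetical helper for illustration; not an MMCV function.
    Assumption: for a repeated layer name, the later entry replaces the earlier.
    """
    cfgs = [init_cfg] if isinstance(init_cfg, dict) else init_cfg
    resolved = {}
    for cfg in cfgs:
        # Everything except layer/override is the initializer spec itself.
        args = {k: v for k, v in cfg.items() if k not in ("layer", "override")}
        layers = cfg.get("layer") or []
        if isinstance(layers, str):
            layers = [layers]
        for layer_name in layers:
            resolved[layer_name] = args
    return resolved


init_cfg = [
    dict(type="Constant", layer="Conv1d", val=1),
    dict(type="Constant", layer="Conv2d", val=2),
    dict(type="Constant", layer="Linear", val=3),
]
table = resolve_layer_cfgs(init_cfg)
for layer_name, args in table.items():
    print(layer_name, args)
```

A layer class that does not appear in the resulting table is simply left untouched by this init_cfg.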
- Initialize model by override key

  - When initializing a specific part by its attribute name, we can use the override key; the values in override take precedence over the values in init_cfg for that part.

        # layers:
        # self.feat = nn.Conv1d(3, 1, 3)
        # self.reg = nn.Conv2d(3, 3, 3)
        # self.cls = nn.Linear(1, 2)

        init_cfg = dict(type='Constant',
                        layer=['Conv1d', 'Conv2d'], val=1, bias=2,
                        override=dict(type='Constant', name='reg', val=3, bias=4))
        # self.feat will be initialized with dict(type='Constant', val=1, bias=2)
        # self.cls (nn.Linear) is not listed in layer, so it keeps PyTorch's default initialization
        # The module called 'reg' will be initialized with dict(type='Constant', val=3, bias=4)

  - If layer is None in init_cfg, only the sub-module named in override will be initialized, and type and other args in override can be omitted.

        # layers:
        # self.feat = nn.Conv1d(3, 1, 3)
        # self.reg = nn.Conv2d(3, 3, 3)
        # self.cls = nn.Linear(1, 2)

        init_cfg = dict(type='Constant', val=1, bias=2, override=dict(name='reg'))
        # self.feat and self.cls will be initialized by PyTorch
        # The module called 'reg' will be initialized with dict(type='Constant', val=1, bias=2)

  - If we define neither the layer key nor the override key, nothing will be initialized.

  - Invalid usage

        # It is invalid for override not to have the name key
        init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d'], val=1, bias=2,
                        override=dict(type='Constant', val=3, bias=4))

        # It is also invalid for override to have name and other args but no type
        init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d'], val=1, bias=2,
                        override=dict(name='reg', val=3, bias=4))
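The override rules described in this section (name is required; an override with only name reuses the enclosing type and arguments; extra arguments without type are rejected) can be summarized in a short plain-Python sketch. The function `resolve_override` is a hypothetical helper written for this example from the rules as stated here; it is not MMCV's code, and the error messages are invented for illustration.

```python
import copy


def resolve_override(init_cfg):
    """Return {sub-module name: initializer config} for the override entries.

    Hypothetical helper for illustration; not an MMCV function.
    """
    # The enclosing initializer spec, without the layer/override keys.
    base = {k: v for k, v in init_cfg.items() if k not in ("layer", "override")}
    override = init_cfg.get("override") or []
    if isinstance(override, dict):
        override = [override]
    resolved = {}
    for entry in override:
        entry = copy.deepcopy(entry)
        name = entry.pop("name", None)
        if name is None:
            raise ValueError("override must contain the key 'name'")
        if not entry:
            # Only name was given: reuse type and args of the enclosing init_cfg.
            entry = copy.deepcopy(base)
        elif "type" not in entry:
            raise ValueError("override with other args must also give 'type'")
        resolved[name] = entry
    return resolved


explicit = resolve_override(
    dict(type="Constant", layer=["Conv1d", "Conv2d"], val=1, bias=2,
         override=dict(type="Constant", name="reg", val=3, bias=4)))
inherited = resolve_override(
    dict(type="Constant", val=1, bias=2, override=dict(name="reg")))
print(explicit)
print(inherited)
```

Passing either of the two invalid configurations from this section to the sketch raises a ValueError, which mirrors the restrictions described in the text.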
- Initialize model with the pretrained model

      init_cfg = dict(type='Pretrained', checkpoint='torchvision://resnet50')
For more details, refer to the documentation in MMCV and MMCV PR #780.