TUTORIAL 10: WEIGHT INITIALIZATION
During training, a proper initialization strategy helps speed up training or obtain higher performance. MMCV provides some commonly used methods for initializing modules such as nn.Conv2d. Model initialization in MMDetection mainly uses init_cfg. Users can initialize models in the following two steps:
- Define init_cfg for a model or its components in model_cfg. The init_cfg of child components has higher priority and overrides the init_cfg of parent modules.
- Build the model as usual, then call model.init_weights() explicitly; the model parameters will be initialized according to the configuration.
The high-level workflow of initialization in MMDetection is:
model_cfg(init_cfg) -> build_from_cfg -> model -> init_weights() -> initialize(self, self.init_cfg) -> children's init_weights()
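The priority rule in this workflow (a child's init_cfg overrides its parent's) follows from the call order: a module first applies its own init_cfg to everything beneath it, and then each child applies its own. The sketch below mimics that order in plain Python. ToyModule is a hypothetical stand-in written for illustration, not MMCV's BaseModule, and it assumes a Constant-style initializer with only a val argument.

```python
class ToyModule:
    """Hypothetical stand-in for a module tree; not an MMCV class."""

    def __init__(self, kind, children=None, init_cfg=None):
        self.kind = kind                  # plays the role of the layer class name
        self.children = children or {}    # attribute name -> ToyModule
        self.init_cfg = init_cfg
        self.weight = None                # None means "framework default"

    def walk(self):
        """Yield this module and every descendant."""
        yield self
        for child in self.children.values():
            yield from child.walk()

    def init_weights(self):
        # Step 1: apply this module's own init_cfg to the whole subtree.
        if self.init_cfg is not None:
            for module in self.walk():
                if module.kind in self.init_cfg['layer']:
                    module.weight = self.init_cfg['val']
        # Step 2: recurse, so a child's init_cfg is applied last and wins.
        for child in self.children.values():
            child.init_weights()


head = ToyModule(
    'Head',
    children={'conv': ToyModule('Conv2d')},
    init_cfg=dict(type='Constant', layer=['Conv2d'], val=2))
model = ToyModule(
    'Detector',
    children={'stem': ToyModule('Conv2d'), 'head': head},
    init_cfg=dict(type='Constant', layer=['Conv2d'], val=1))

model.init_weights()
print(model.children['stem'].weight)   # set by the parent's init_cfg
print(head.children['conv'].weight)    # set by the child's own init_cfg
```

In this toy tree the stem keeps the parent's value, while the convolution inside the head ends up with the head's value because the head's init_weights() runs after the parent's pass.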
Description
init_cfg is a dict or list[dict] and contains the following keys and values:
- type (str): the initializer name in INITIALIZERS, followed by the arguments of the initializer.
- layer (str or list[str]): the names of basic layers in PyTorch or MMCV with learnable parameters that will be initialized, e.g. 'Conv2d', 'DeformConv2d'.
- override (dict or list[dict]): the sub-modules that do not inherit from BaseModule and whose initialization configuration differs from that of the layers listed in the 'layer' key. The initializer defined in type applies to all layers listed in layer, so a sub-module that is not derived from BaseModule but can be initialized in the same way as the layers in layer does not need override. override contains type, followed by the arguments of the initializer, and name, which indicates the sub-module to be initialized.
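Because init_cfg, layer and override each accept either a single value or a list, it can help to see them in one canonical shape. The helper below is a hypothetical pure-Python sketch (normalize_init_cfg is a name made up here, not an MMCV function) that wraps single values in lists without changing their meaning.

```python
import copy


def normalize_init_cfg(init_cfg):
    """Return init_cfg as a list of dicts whose 'layer' and 'override'
    entries, when present, are lists. Hypothetical helper for illustration."""
    cfgs = [init_cfg] if isinstance(init_cfg, dict) else list(init_cfg)
    normalized = []
    for cfg in cfgs:
        cfg = copy.deepcopy(cfg)  # leave the caller's config untouched
        if isinstance(cfg.get('layer'), str):
            cfg['layer'] = [cfg['layer']]
        if isinstance(cfg.get('override'), dict):
            cfg['override'] = [cfg['override']]
        normalized.append(cfg)
    return normalized


single = dict(type='Constant', layer='Conv2d', val=1,
              override=dict(type='Constant', name='reg', val=3))
for cfg in normalize_init_cfg(single):
    print(cfg['type'], cfg['layer'], [o['name'] for o in cfg['override']])
```

The single-dict form and the one-element list form describe the same initialization, so a config can start in the short form and grow into a list when different layers need different settings.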
Initialize parameters
Inherit a new model from mmcv.runner.BaseModule or from a model in mmdet.models. Here we show an example with FooModel.

    import torch.nn as nn
    from mmcv.runner import BaseModule

    class FooModel(BaseModule):
        def __init__(self,
                     arg1,
                     arg2,
                     init_cfg=None):
            super(FooModel, self).__init__(init_cfg)
            ...
- Initialize model by using init_cfg directly in code

    import torch.nn as nn
    from mmcv.runner import BaseModule
    # or directly inherit mmdet models

    class FooModel(BaseModule):
        def __init__(self,
                     arg1,
                     arg2,
                     init_cfg=XXX):
            super(FooModel, self).__init__(init_cfg)
            ...

- Initialize model by using init_cfg directly in mmcv.Sequential or mmcv.ModuleList code

    from mmcv.runner import BaseModule, ModuleList

    class FooModel(BaseModule):
        def __init__(self,
                     arg1,
                     arg2,
                     init_cfg=None):
            super(FooModel, self).__init__(init_cfg)
            ...
            self.conv1 = ModuleList(init_cfg=XXX)

- Initialize model by using init_cfg in config file

    model = dict(
        ...
        model = dict(
            type='FooModel',
            arg1=XXX,
            arg2=XXX,
            init_cfg=XXX),
        ...
    )
Usage of init_cfg
- Initialize model by layer key
  If we only define layer, only the layers listed in the layer key are initialized.
  NOTE: The value of the layer key must be the class name of a PyTorch layer that has weight and bias attributes, so layers such as MultiheadAttention are not supported.
  - Define layer key for initializing modules with the same configuration.

        init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d', 'Linear'], val=1)
        # initialize whole module with same configuration

  - Define layer key for initializing layers with different configurations.

        init_cfg = [dict(type='Constant', layer='Conv1d', val=1),
                    dict(type='Constant', layer='Conv2d', val=2),
                    dict(type='Constant', layer='Linear', val=3)]
        # nn.Conv1d will be initialized with dict(type='Constant', val=1)
        # nn.Conv2d will be initialized with dict(type='Constant', val=2)
        # nn.Linear will be initialized with dict(type='Constant', val=3)
- Initialize model by override key
  - When initializing a specific part by its attribute name, we can use the override key; the values in override take precedence over the top-level values in init_cfg.

        # layers:
        # self.feat = nn.Conv1d(3, 1, 3)
        # self.reg = nn.Conv2d(3, 3, 3)
        # self.cls = nn.Linear(1,2)
        init_cfg = dict(type='Constant',
                        layer=['Conv1d','Conv2d'], val=1, bias=2,
                        override=dict(type='Constant', name='reg', val=3, bias=4))
        # self.feat will be initialized with dict(type='Constant', val=1, bias=2)
        # self.cls is an nn.Linear, which is not listed in layer, so it keeps PyTorch's default initialization
        # The module called 'reg' will be initialized with dict(type='Constant', val=3, bias=4)

  - If layer is None in init_cfg, only the sub-module named in override will be initialized, and type and the other arguments in override can be omitted.

        # layers:
        # self.feat = nn.Conv1d(3, 1, 3)
        # self.reg = nn.Conv2d(3, 3, 3)
        # self.cls = nn.Linear(1,2)
        init_cfg = dict(type='Constant', val=1, bias=2, override=dict(name='reg'))
        # self.feat and self.cls will be initialized by PyTorch
        # The module called 'reg' will be initialized with dict(type='Constant', val=1, bias=2)

  - If we define neither the layer key nor the override key, nothing will be initialized.
  - Invalid usage

        # It is invalid for override to lack the name key
        init_cfg = dict(type='Constant', layer=['Conv1d','Conv2d'], val=1, bias=2,
                        override=dict(type='Constant', val=3, bias=4))

        # It is also invalid for override to have name and other args but no type
        init_cfg = dict(type='Constant', layer=['Conv1d','Conv2d'], val=1, bias=2,
                        override=dict(name='reg', val=3, bias=4))

- Initialize model with the pretrained model

        init_cfg = dict(type='Pretrained',
                        checkpoint='torchvision://resnet50')
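The layer and override rules above can be mimicked with a small pure-Python resolver that reports which configuration each attribute would receive. This is a simplified sketch under stated assumptions, not MMCV's implementation: resolve_init is a hypothetical name, the module tree is reduced to a dict of attribute name to layer class name, init_cfg is a single dict whose layer value is a list, and a module whose class is not listed in layer is assumed to be left untouched (reported as None).

```python
def resolve_init(modules, init_cfg):
    """Map each attribute name to the initializer config it would receive.

    modules:  dict of attribute name -> layer class name (a toy module tree).
    init_cfg: a single dict with optional 'layer' (list) and 'override' (dict).
    Hypothetical helper; None in the result means "left at PyTorch defaults".
    """
    cfg = dict(init_cfg)
    override = cfg.pop('override', None)
    layers = cfg.pop('layer', None) or []

    # Layers listed in 'layer' receive the top-level initializer.
    result = {name: (dict(cfg) if kind in layers else None)
              for name, kind in modules.items()}

    if override is not None:
        override = dict(override)
        name = override.pop('name', None)
        if name is None:
            raise ValueError('override must contain the key "name"')
        if not override:
            override = dict(cfg)          # only 'name' given: reuse top-level args
        elif 'type' not in override:
            raise ValueError('override with extra args must also give "type"')
        if name in modules:
            result[name] = override       # applied last, so it wins
    return result


modules = dict(feat='Conv1d', reg='Conv2d', cls='Linear')
init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d'], val=1, bias=2,
                override=dict(type='Constant', name='reg', val=3, bias=4))
for name, cfg in resolve_init(modules, init_cfg).items():
    print(name, cfg)
```

Passing either of the two invalid configurations listed above to this sketch raises ValueError, which mirrors the two rules: override needs a name, and an override that carries extra arguments also needs its own type.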
For more details, refer to the documentation in MMCV and MMCV PR #780.