mmdet train

Published by onesixx on

W&B (Weights & Biases)

DataSet

Data collection

Existing datasets: download

New datasets: build your own

1. Image collection

  • Collect images yourself
  • Extract frames from videos (using ffmpeg)
    ffmpeg -i example.mp4 -vf fps=N Afolder/ex_detect_%4d.jpg   (N = number of frames to extract per second)
  • Collect from Google
  • Generate synthetic data with Unity
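The frame-extraction step can also be driven from Python, which helps when many videos need the same treatment. This is a minimal sketch that assumes ffmpeg is on PATH. The names build_ffmpeg_cmd and extract_frames and the default fps=1 are illustrative and do not come from the original post.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(video, out_dir, fps=1, prefix="ex_detect"):
    """Return the ffmpeg argument list that samples `fps` frames per second from `video`."""
    # ffmpeg replaces %4d with a zero-padded frame counter (0001, 0002, ...)
    pattern = str(Path(out_dir) / f"{prefix}_%4d.jpg")
    return ["ffmpeg", "-i", str(video), "-vf", f"fps={fps}", pattern]


def extract_frames(video, out_dir, fps=1):
    """Create out_dir (ffmpeg does not create it) and run the extraction."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_cmd(video, out_dir, fps), check=True)
```

For example, extract_frames("example.mp4", "Afolder", fps=2) writes two JPEGs per second of video into Afolder/.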

2. Annotation (using CVAT)

  • Bbox
  • Polygon: segmentation
  • Key-point (top-down, bottom-up)

3. Building the dataset (linking images and annotations)

  • Export from CVAT in COCO dataset format
    (CVAT does not export keypoints in COCO format, so export them in CVAT (XML) format and then convert them to a COCO dataset with Datumaro.)
  • Custom dataset
    Register the custom dataset in mmdetection, then use it

Dataset split (train / valid & test)

Preparing the train / valid & test datasets: leave the images where they are and prepare a separate annotation file for each split.
(For a COCO dataset, the IDs have to be renumbered after annotating anew.)
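The post does not show how the split or the renumbering is done. Below is a minimal standard-library sketch. It assumes a single COCO JSON (images / annotations / categories) that is split at random by image. The helpers split_coco and save_split are hypothetical. Images stay where they are and only the annotation files differ. Image and annotation IDs are renumbered from 1 within each split.

```python
import json
import random


def split_coco(coco, val_ratio=0.2, seed=0):
    """Split a COCO dict by image into (train, val) dicts with renumbered IDs."""
    images = list(coco["images"])
    random.Random(seed).shuffle(images)
    n_val = int(len(images) * val_ratio)
    parts = {"val": images[:n_val], "train": images[n_val:]}

    result = {}
    for name, imgs in parts.items():
        id_map = {}  # old image id -> new image id (1, 2, 3, ...)
        new_images = []
        for new_id, img in enumerate(sorted(imgs, key=lambda im: im["id"]), start=1):
            id_map[img["id"]] = new_id
            new_images.append({**img, "id": new_id})

        new_anns = []
        for ann in coco["annotations"]:
            if ann["image_id"] in id_map:
                new_anns.append({**ann, "id": len(new_anns) + 1,
                                 "image_id": id_map[ann["image_id"]]})

        # categories, info, licenses, etc. are copied unchanged
        result[name] = {**coco, "images": new_images, "annotations": new_anns}
    return result["train"], result["val"]


def save_split(src_json, train_json, val_json, val_ratio=0.2):
    with open(src_json) as f:
        coco = json.load(f)
    train, val = split_coco(coco, val_ratio)
    for path, part in ((train_json, train), (val_json, val)):
        with open(path, "w") as f:
            json.dump(part, f)
```

For example, save_split("anno.json", "anno_cc.json", "anno_cc_val.json") writes the two files. Here "anno.json" is a hypothetical source file.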

In the config, decide the directory layout of the trn, val and tst sources.

For a custom dataset

Register MyCustomDataset (adapting load_annotations carefully)

Creating the dataset

  • datasets = [build_dataset(cfg.data.train)] # in /tools/train.py

Converting to a COCO set

Convert to COCO whenever possible.
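Here is a minimal sketch of such a conversion. It assumes a hypothetical simple source format in which each image record lists corner-format boxes (x1, y1, x2, y2, class name). to_coco is an illustrative helper, not an mmdetection function. The detail that is easiest to get wrong is that COCO stores boxes as [x, y, width, height].

```python
def to_coco(records, class_names):
    """Build a COCO detection dict from simple per-image records.

    Assumed (hypothetical) input record:
    {"file_name": "a.jpg", "width": 1280, "height": 720,
     "boxes": [(x1, y1, x2, y2, "TRAY_A_1"), ...]}
    """
    categories = [{"id": i, "name": n} for i, n in enumerate(class_names, start=1)]
    cat_id = {c["name"]: c["id"] for c in categories}
    images, annotations = [], []
    for img_id, rec in enumerate(records, start=1):
        images.append({"id": img_id, "file_name": rec["file_name"],
                       "width": rec["width"], "height": rec["height"]})
        for x1, y1, x2, y2, name in rec["boxes"]:
            w, h = x2 - x1, y2 - y1
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": img_id,
                "category_id": cat_id[name],
                "bbox": [x1, y1, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
    return {"images": images, "annotations": annotations, "categories": categories}
```

Writing the result with json.dump gives a file that CocoDataset can read through ann_file.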

Model

Choosing a model

The model zoo of open-mmlab/mmdetection

From the model zoo:

ex) faster_rcnn

https://comlini8-8.tistory.com/86

MMDet divides a model into five kinds of components:

  • backbone: an FCN network that extracts feature maps (e.g. ResNet, MobileNet)
  • neck: the part connecting the backbone and the heads (e.g. FPN, PAFPN)
  • head: the part for a specific task (e.g. bbox prediction, mask prediction)
  • roi extractor: the part that extracts RoI features from the feature maps (e.g. RoI Align)
  • loss: the component in the head that computes the loss (e.g. FocalLoss, L1Loss, GHMLoss)
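To show where each of these components sits in a config, here is a trimmed sketch of a Faster R-CNN model dict. It is modeled on MMDet 2.x's configs/_base_/models/faster_rcnn_r50_fpn.py. Most fields (anchor generator, bbox coder, train_cfg/test_cfg and others) are left out, so use it for orientation only; it cannot be trained as is.

```python
# Trimmed sketch: only the fields needed to show the five component kinds.
model = dict(
    type='FasterRCNN',
    backbone=dict(type='ResNet', depth=50),                       # backbone
    neck=dict(type='FPN', in_channels=[256, 512, 1024, 2048],
              out_channels=256, num_outs=5),                      # neck
    rpn_head=dict(
        type='RPNHead',                                           # dense head
        loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),          # losses
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',                            # roi extractor
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',                             # head
            num_classes=80,
            loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0))))
```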


Preparing checkpoint files

ConvNeXt (CVPR'2022) in mmdet.

Download the pretrained model weights from the Model Zoo.

mkdir -p checkpoints
wget -O checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth  http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth

configuration – Config

Fetching an existing config

Generate the initial config with sixxconfigs/makeConfig.py

import os
from mmcv import Config

os.chdir('/home/oschung_skcc/my/git/mmdetection')

config_file = 'configs/convnext/cascade_mask_rcnn_convnext-s_p4_w7_fpn_giou_4conv1f_fp16_ms-crop_3x_coco.py'
out_config = 'sixxconfigs/cascade_mask_rcnn_convnext-s_p4_w7_fpn_giou_4conv1f_fp16_ms-crop_3x_coco_sixx.py'

# Load the config together with everything it inherits through _base_
cfg = Config.fromfile(config_file)
#print(cfg.pretty_text)

# Create the output folder if it is missing, then write the fully expanded config
os.makedirs(os.path.dirname(out_config), exist_ok=True)
with open(out_config, 'w') as f:
    f.write(cfg.pretty_text)

Command-line version (sixxtools/makeConfig_sixx.py, used in the commands below):

import argparse
import logging
import os

from mmcv import Config

parser = argparse.ArgumentParser(description="Write an expanded (fully merged) MMDetection config")
parser.add_argument("-i", "--fromconfig", default='', type=str, metavar="PATH", help="config file to read")
parser.add_argument("-o", "--toconfig",   default='', type=str, metavar="PATH", help="file to write (default: sixxconfigs/<same name>)")

def print_info(message: str):
    logging.info(message)

def main():
    # logging.info() is silent unless the level is lowered from the default WARNING
    logging.basicConfig(level=logging.INFO)
    print_info("Starting...")
    args = parser.parse_args()

    if not args.fromconfig:
        print("Warning! Nothing to set.\nPlease specify a path with --fromconfig.")
        print_info("Exiting...")
        return
    config_file = args.fromconfig

    if not args.toconfig:
        out_config = os.path.join('sixxconfigs', os.path.basename(config_file))
    else:
        out_config = args.toconfig
    print_info(f"Writing {out_config}")

    # Relative paths are resolved from the mmdetection root
    os.chdir('/home/oschung_skcc/my/git/mmdetection')
    # config_file = 'configs/convnext/cascade_mask_rcnn_convnext-s_p4_w7_fpn_giou_4conv1f_fp16_ms-crop_3x_coco.py'
    # out_config =  'sixxconfigs/cascade_mask_rcnn_convnext-s_p4_w7_fpn_giou_4conv1f_fp16_ms-crop_3x_coco_sixx.py'

    cfg = Config.fromfile(config_file)
    #print(cfg.pretty_text)

    # Create the output folder if needed, then write the expanded config
    os.makedirs(os.path.dirname(out_config) or '.', exist_ok=True)
    with open(out_config, 'w') as f:
        f.write(cfg.pretty_text)

    print_info("... End")

if __name__ == "__main__":
    main()

Creating an expanded config for easy editing

$ python sixxtools/makeConfig_sixx.py \
  --fromconfig configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \
  --toconfig sixxconfigs/faster_rcnn_r50_fpn_1x_coco_001.py
$ python sixxtools/misc/print_config.py \
  configs/convnext/cascade_mask_rcnn_convnext-s_p4_w7_fpn_giou_4conv1f_fp16_ms-crop_3x_coco.py

Config top-level categories and main settings

  • dataset: dataset type (CustomDataset, CocoDataset, etc.), the train/val/test dataset types, data_root, and the main parameters of each train/val/test dataset (type, ann_file, img_prefix, pipeline, etc.)
  • model: detailed settings for each main part of the object detection model: backbone, neck, dense head, roi extractor, roi head (e.g. num_classes=4)
  • scheduler: optimizer type (SGD, Adam, RMSprop, etc.), initial learning rate, policy for changing the learning rate during training (step, cyclic, cosine annealing, etc.), and the number of training epochs
  • runtime: mainly hook (callback) settings, e.g. how often checkpoint files (in epochs) and log entries (in iterations) are written during training

Editing the config

Take an existing config, create from it the config file used for training,
sixxconfigs/faster_rcnn_r50_fpn_1x_coco_sixx.py, and edit it:

  • change num_classes=4 (under model)
  • check dataset_type = 'CocoDataset'
  • change data_root = 'data/msc_pilot2/'
  • add classes = ['TRAY_A_1', 'TRAY_A_2', 'TRAY_A_3', 'TRAY_B_1']

gpu

  • samples_per_gpu
  • workers_per_gpu

data

  • train / val / test
    – change ann_file
    – add classes

Config edit example

model = dict(
    roi_head=dict(
        bbox_head=dict(
            num_classes=4)))

dataset_type = 'CocoDataset'
#data_root = 'data/coco/'
data_root = 'data/msc_pilot2/'
classes = ('Car', 'Truck', 'Pedestrian', 'Cyclist')

data = dict(
    train=dict(
        type='CocoDataset',
        ann_file='data/kitti_tiny/anno_cc.json',
        #img_prefix='data/kitti_tiny/training/image_2',
        classes=classes),
    val=dict(
        type='CocoDataset',
        ann_file='data/kitti_tiny/anno_cc_val.json',
        #img_prefix='data/kitti_tiny/training/image_2',
        classes=classes),
    test=dict(
        type='CocoDataset',
        ann_file='data/kitti_tiny/anno_cc_val.json',
        #img_prefix='data/kitti_tiny/training/image_2',
        classes=classes))
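Two things in these edits must agree: num_classes in every bbox head, and the number of entries in classes. Below is a small checker over the config dicts. check_classes is a hypothetical helper; because it reads plain dicts, it can be run on the values in an expanded config file. It also catches a trailing comma after the classes tuple, which silently turns classes into a one-element tuple of tuples.

```python
def check_classes(model, data, classes):
    """Return a list of problems; an empty list means the class settings agree."""
    problems = []
    bbox_head = model['roi_head']['bbox_head']
    # Cascade R-CNN uses a list with one bbox_head per stage
    heads = bbox_head if isinstance(bbox_head, list) else [bbox_head]
    for i, head in enumerate(heads):
        if head.get('num_classes') != len(classes):
            problems.append(f"bbox_head[{i}].num_classes={head.get('num_classes')} "
                            f"but len(classes)={len(classes)}")
    for split in ('train', 'val', 'test'):
        if tuple(data.get(split, {}).get('classes', ())) != tuple(classes):
            problems.append(f'data.{split}.classes does not match classes')
    return problems
```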

W&B setup example

https://onesixx.com/wandb/

#log_config = dict(interval=1, hooks=[dict(type='TextLoggerHook')])

log_config = dict(
    interval=10, #500
    hooks=[
        dict(type='TextLoggerHook',  interval=500),  
        dict(type='WandbLoggerHook', interval=1000,
            init_kwargs=dict(
                project='faster_rcnn_r50_fpn_1x',
                #entity = 'ENTITY name',
                name='sixx_tray')
        )
    ]
)

# workflow = [('train', 1)]
# To run both training and validation in every epoch:
workflow = [('train', 1), ('val', 1)]

transfer learning

load_from = 'checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'
# Save a .pth file every 50 epochs and write a log line every 100 iterations.
# Evaluate once every 200 epochs. CocoDataset accepts metrics such as 'bbox', 'segm' and 'proposal'
# ('mIoU' is a segmentation metric and 'mAP' is the VOC-style metric).
evaluation = dict(interval=200, metric='bbox')
runner = dict(type='EpochBasedRunner', max_epochs=400)
checkpoint_config = dict(interval=50)
log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])

# Learning-rate settings. 0.02 is the reference lr for 8 GPUs x 2 images per GPU.
optimizer = dict(type='SGD', lr=0.02/8, momentum=0.9, weight_decay=0.0001)
lr_config = dict(
    policy='step',
    warmup=None,
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])  # from the 12-epoch (1x) schedule: with max_epochs=400 the lr stays at 1/100 after the first 11 epochs

# Resume training from the latest checkpoint (resume_from takes precedence over load_from)
resume_from = 'work_dirs/sixx_faster_rcnn_r50_fpn_1x_coco/latest.pth'

Example 2:

model = dict(
    roi_head=dict(
        # Cascade R-CNN has one bbox_head per stage, so num_classes appears three times.
        # Lists are replaced as a whole when merged with _base_, so each dict must keep its other fields.
        bbox_head=[
            dict(
                num_classes=4,
                # ...
            ),
            dict(
                num_classes=4,
                # ...
            ),
            dict(
                num_classes=4,
                # ...
            ),
        ],
        mask_head=dict(
            num_classes=4)))

dataset_type = 'CocoDataset'
data_root = 'data/kitti_tiny/'
classes = ('Car', 'Truck', 'Pedestrian', 'Cyclist')  # no trailing comma, or classes becomes a nested tuple

data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type='CocoDataset',
        ann_file='data/kitti_tiny/anno_cc.json',
        img_prefix='data/kitti_tiny/training/image_2',
        classes=classes),
    val=dict(
        type='CocoDataset',
        ann_file='data/kitti_tiny/anno_cc_val.json',
        img_prefix='data/kitti_tiny/training/image_2',
        classes=classes),
    test=dict(
        type='CocoDataset',
        ann_file='data/kitti_tiny/anno_cc_val.json',
        img_prefix='data/kitti_tiny/training/image_2',
        classes=classes))

evaluation = dict(metric=['bbox', 'segm'], save_best='auto', interval=50)
runner = dict(type='EpochBasedRunner', max_epochs=10000)
checkpoint_config = dict(interval=500)

# workflow = [('train', 1)]
# To run both training and validation in every epoch:
workflow = [('train', 1), ('val', 1)]

#load_from = 'checkpoints/cascade_mask_rcnn_convnext-s_p4_w7_fpn_giou_4conv1f_fp16_ms-crop_3x_coco_20220510_201004-3d24f5a4.pth'
load_from = 'https://download.openmmlab.com/mmclassification/v0/convnext/downstream/convnext-small_3rdparty_32xb128-noema_in1k_20220301-303e75e3.pth'

Set runner's max_epochs to the desired number of epochs (46)

Batch_size

step: 1473 / 46 ≈ 32 … = iterations

Evaluation: once every 50 epochs

checkpoint: once every epoch

log_config: 1
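The iteration count in these notes follows from the dataset size and the batch size. Below is a sketch of the arithmetic. It assumes MMDet 2.x semantics, where one iteration processes samples_per_gpu images on every GPU. The helper names are illustrative.

```python
import math


def iters_per_epoch(num_images, samples_per_gpu, num_gpus=1):
    """Approximate iterations per epoch: images / (samples_per_gpu * num_gpus), rounded up.

    MMDet's group samplers pad each aspect-ratio group to a multiple of
    samples_per_gpu, so the count shown in the log can be slightly larger.
    """
    return math.ceil(num_images / (samples_per_gpu * num_gpus))


def total_iters(num_images, samples_per_gpu, num_gpus, max_epochs):
    """Iterations over a whole run, e.g. to compare with warmup_iters."""
    return iters_per_epoch(num_images, samples_per_gpu, num_gpus) * max_epochs
```

For example, 1,000 training images with samples_per_gpu=4 on 2 GPUs give about 125 iterations per epoch. These numbers are illustrative, not the author's dataset.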

CUDA_VISIBLE_DEVICES=2,3 PORT=29506 sixxtools/dist_train.sh "sixxconfigs/cascade_mask_rcnn_convnext-s_p4_w7_fpn_giou_4conv1f_fp16_ms-crop_3x_coco_sixx.py" 2
       
work_dir = './work_dirs/cascade_mask_rcnn_convnext-s_p4_w7_fpn_giou_4conv1f_fp16_ms-crop_3x_coco_sixx'
auto_resume = False
gpu_ids = range(0, 2)

model

  • num_classes

dataset_type

data_root

classes

data

  • samples_per_gpu
  • workers_per_gpu
  • train / val / test
    – ann_file
    – classes

load_from


evaluation

  • save_best='auto', interval=50

checkpoint_config


Reduce the optimizer's lr

lr_config = dict(
    policy='step',      # which scheduler to use
    warmup='linear',    # warmup type (None = no warmup)
    warmup_iters=500,   # how many iterations the warmup lasts
    warmup_ratio=0.001, # warmup starts from lr * warmup_ratio
    step=[8, 11])       # epochs at which the lr is decayed (x0.1 each time by default)
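The sketch below shows what these settings do to the learning rate. It is a stand-alone version of the step policy with linear warmup, written to follow mmcv's formulas: the lr is multiplied by gamma (0.1 by default) at each epoch listed in step, and during the first warmup_iters iterations it ramps linearly up from lr * warmup_ratio. step_lr is a hypothetical helper for inspection, not an mmcv function.

```python
def step_lr(base_lr, epoch, cur_iter, steps=(8, 11), gamma=0.1,
            warmup='linear', warmup_iters=500, warmup_ratio=0.001):
    """Learning rate for a 0-based `epoch` and global iteration `cur_iter`."""
    # one decay by gamma for every step epoch already reached
    n_decays = sum(1 for s in steps if epoch >= s)
    lr = base_lr * gamma ** n_decays
    # linear warmup: starts at lr * warmup_ratio, reaches lr at warmup_iters
    if warmup == 'linear' and cur_iter < warmup_iters:
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        lr = lr * (1 - k)
    return lr
```

With lr=0.02/8 as in the configs here, step_lr(0.0025, epoch=8, cur_iter=10_000) gives 0.00025.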

runner (_1x means 12 epochs, _2x means 24 epochs, _20e means 20 epochs)

  • max_epochs=10000


auto_resume

gpu_ids

https://onesixx.com/mmdet-log/


Monitoring GPU usage (nvidia-smi, nvitop, gpustat)

watch -d -n 0.5 nvidia-smi

$ conda update -n base -c defaults conda

# https://anaconda.org/conda-forge/nvitop
# https://github.com/XuehaiPan/nvitop

$ conda install -c conda-forge nvitop
$ nvitop


# https://anaconda.org/conda-forge/gpustat
$ conda install -c conda-forge gpustat
$ gpustat
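nvidia-smi can also print machine-readable values for use in scripts, for example to decide which GPUs to put in CUDA_VISIBLE_DEVICES. The sketch below uses its --query-gpu and --format=csv,noheader,nounits options. parse_gpu_csv, query_gpus and idle_gpus are hypothetical helpers, and the 500 MiB idle threshold is an arbitrary choice.

```python
import subprocess

QUERY = "index,name,utilization.gpu,memory.used,memory.total"


def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=QUERY --format=csv,noheader,nounits` output."""
    gpus = []
    for line in text.strip().splitlines():
        idx, name, util, used, total = [f.strip() for f in line.split(",")]
        gpus.append({"index": int(idx), "name": name, "util_pct": int(util),
                     "mem_used_mib": int(used), "mem_total_mib": int(total)})
    return gpus


def query_gpus():
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)


def idle_gpus(gpus, max_mem_mib=500):
    """Indices of GPUs whose used memory is below max_mem_mib."""
    return [g["index"] for g in gpus if g["mem_used_mib"] < max_mem_mib]
```

For example, ",".join(map(str, idle_gpus(query_gpus()))) yields a string that can be used as CUDA_VISIBLE_DEVICES.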

Run training

$ python sixxtools/train.py "sixxconfigs/faster_rcnn_r50_fpn_1x_coco_sixx.py"

~/my/git/mmdetection/tools ==> sixxtools (a copy of mmdetection's tools folder)

$ python sixxtools/train.py \
  "sixxconfigs/cascade_rcnn_r50_fpn_1x_coco.py" \
  --work-dir "work_dirs/ttt"

A working folder is created under work_dirs.

Check the modified cfg (train.py also saves a copy of the config in that folder).

epoch_69.pth (a PyTorch model checkpoint) is created.

Editing the config for model experiments

https://pebpung.github.io/wandb/2021/10/06/WandB-1.html

Editing the config by hand for every experiment is inefficient.
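A less error-prone alternative is to keep a single config file and override values per run from the command line. MMDet 2.x's tools/train.py accepts --cfg-options key=value pairs, with dotted keys for nested dicts. Assuming sixxtools/train.py keeps that option, the sketch below generates one command per combination in a small grid. sweep_commands and the work_dirs/sweep_* naming are hypothetical.

```python
import itertools
import shlex


def sweep_commands(config, grid, script="sixxtools/train.py"):
    """Yield one train command (argv list) per combination of values in `grid`.

    grid maps dotted config keys (e.g. "optimizer.lr") to lists of values.
    """
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        opts = [f"{k}={v}" for k, v in zip(keys, values)]
        tag = "_".join(f"{k.split('.')[-1]}{v}" for k, v in zip(keys, values))
        work_dir = f"work_dirs/sweep_{tag}"  # hypothetical naming scheme
        yield ["python", script, config, "--work-dir", work_dir, "--cfg-options", *opts]


if __name__ == "__main__":
    grid = {"optimizer.lr": [0.01, 0.0025], "runner.max_epochs": [12, 24]}
    for cmd in sweep_commands("sixxconfigs/faster_rcnn_r50_fpn_1x_coco_sixx.py", grid):
        print(shlex.join(cmd))
```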

Using multiple GPUs

$ CUDA_VISIBLE_DEVICES=2,3 PORT=29506 sixxtools/dist_train.sh work_dirs/sixx_faster_rcnn_r50_fpn_1x_coco.py 2
$ CUDA_VISIBLE_DEVICES=2,3,4,5,6,7 PORT=29506 sixxtools/dist_train.sh sixxconfigs/faster_rcnn_r50_fpn_1x_coco_sixx.py 6

Limit the GPUs to use with CUDA_VISIBLE_DEVICES (the last argument is the number of GPUs and should match that list),

give the job its own port (dist_train.sh reads the uppercase PORT variable, default 29501),

then run.

CUDA_VISIBLE_DEVICES=2,3 python train.py

Reference (dist_train.sh)

#!/usr/bin/env bash

CONFIG=$1
GPUS=$2
NNODES=${NNODES:-1}
NODE_RANK=${NODE_RANK:-0}
PORT=${PORT:-29501}
MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}

PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
python -m torch.distributed.launch \
    --nnodes=$NNODES \
    --node_rank=$NODE_RANK \
    --master_addr=$MASTER_ADDR \
    --nproc_per_node=$GPUS \
    --master_port=$PORT \
    $(dirname "$0")/train.py \
    $CONFIG \
    --seed 0 \
    --launcher pytorch ${@:3}
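dist_train.sh takes the master port from PORT, and two jobs on one machine cannot share a port. Instead of choosing a number by hand, you can ask the operating system for an unused port by binding to port 0. Below is a small sketch; find_free_port is a hypothetical helper. The port is only known to be free at the moment it is checked, so another process could still take it before training starts.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Return a TCP port that is currently unused on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 lets the OS choose a free port
        return s.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

If the snippet is saved as, say, find_free_port.py, it can be used as PORT=$(python find_free_port.py) in front of the dist_train.sh command.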

gpu_ids = range(1,3)

$ watch -d -n0.5 nvidia-smi

~/my/git/mmdetection$ bash sixx/dist_train.sh work_dirs/sixx_faster_rcnn_r50_fpn_1x_coco/sixx_faster_rcnn_r50_fpn_1x_coco.py  3

Training again with the config in work_dirs

sixx/dist_train.sh: line 2: $'\r': command not found

If you get this error, the script has Windows line endings: convert every line break (carriage return + newline, \r\n) to a plain newline (\n).

sed -i -e 's/\r$//' ./sixx/dist_train.sh
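The same fix can be done from Python, for example when several scripts were edited on Windows. Below is a sketch with hypothetical helper names. Unlike the sed command, which strips a \r at the end of every line, it only replaces \r\n pairs; that covers ordinary CRLF files.

```python
from pathlib import Path


def to_lf(data: bytes) -> bytes:
    """Turn Windows (CRLF) line endings into Unix (LF) ones."""
    return data.replace(b"\r\n", b"\n")


def fix_line_endings(path):
    """Rewrite `path` in place with LF endings; return True if anything changed."""
    p = Path(path)
    original = p.read_bytes()
    fixed = to_lf(original)
    if fixed != original:
        p.write_bytes(fixed)
    return fixed != original
```

For example, fix_line_endings("sixx/dist_train.sh").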

https://github.com/open-mmlab/mmdetection/issues/334

$ bash tools/dist_train.sh configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoints.py 1 \
    --work-dir work_dirs/slowonly_r50_u48_240e_ntu120_xsub_keypoints \
    --validate \
    --test-best \
    --seed 0 \
    --deterministic

https://github.com/facebookresearch/maskrcnn-benchmark

export NGPUS=2
CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train.py configs/faster_rcnn_r101_fpn_1x.py --gpus 2

https://artiiicy.tistory.com/61

Restricting the GPUs CUDA can see with "CUDA_VISIBLE_DEVICES"

CUDA always starts numbering from GPU 0 (torch.cuda.current_device()). With CUDA_VISIBLE_DEVICES=2,3, CUDA can see only physical GPUs 2 and 3, so allocating on GPU 0 actually uses physical GPU 2.

For multiple GPUs, however, the model still has to be wrapped with nn.DataParallel().
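The renumbering described above can be made explicit with a small helper. The helper is hypothetical and assumes that CUDA_VISIBLE_DEVICES holds numeric indices rather than GPU UUIDs. Which physical card an index refers to also depends on CUDA_DEVICE_ORDER; with PCI_BUS_ID, the numbering matches nvidia-smi.

```python
import os


def physical_gpu(logical_index, visible=None):
    """Return the physical GPU index behind CUDA device `logical_index`.

    `visible` defaults to the CUDA_VISIBLE_DEVICES environment variable;
    if that is unset, every GPU is visible and the two indices coincide.
    """
    if visible is None:
        visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        return logical_index
    ids = [int(v) for v in visible.split(",") if v.strip()]
    return ids[logical_index]
```

With CUDA_VISIBLE_DEVICES=2,3, torch.device('cuda:0') therefore lands on physical GPU 2.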

1-2) Running from a Python script or a Jupyter notebook ("~.ipynb")

When running inside Python, e.g. in a "~.ipynb" notebook, you can set the environment with os.environ[...] as follows. Do this before CUDA is first initialized (e.g. before importing torch):

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # number GPUs by PCI bus ID, matching nvidia-smi
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"       # use GPUs 2 and 3

$ python sixx/train.py work_dirs/sixx_faster_rcnn_r50_fpn_1x_coco.py

$ python sixxtools/train.py sixxconfigs/cascade_rcnn_r50_fpn_1x_coco.py

Categories: vision
