[PyTorch] RuntimeError: CUDA out of memory when training a VGG model

2012-张同学


Error 1: RuntimeError: CUDA out of memory

RuntimeError: CUDA out of memory

Code that raised the error:

            vgg16_model.eval()
            for j, data in enumerate(valid_loader):
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)

                bs, ncrops, c, h, w = inputs.size()
                outputs = vgg16_model(inputs.view(-1, c, h, w))
                outputs_avg = outputs.view(bs, ncrops, -1).mean(1)

                loss = criterion(outputs_avg, labels)

                _, predicted = torch.max(outputs_avg.data, 1)
                total_val += labels.size(0)
                correct_val += (predicted == labels).squeeze().cpu().sum().numpy()

                loss_val += loss.item()

            loss_val_mean = loss_val/len(valid_loader)
            valid_curve.append(loss_val_mean)
            print("Valid:\t Epoch[{:0>3}/{:0>3}] Iteration[{:0>3}/{:0>3}] Loss: {:.4f} Acc:{:.2%}".format(
                epoch, MAX_EPOCH, j+1, len(valid_loader), loss_val_mean, correct_val / total_val))
            vgg16_model.train()

Revised code: wrap the validation loop in with torch.no_grad():

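'''
Wrap the validation loop in with torch.no_grad() to indicate that no gradient computation is needed here. Calling model.eval() on its own only switches layers such as dropout and batch normalization to evaluation behaviour; it does not turn off gradient tracking, so the validation forward passes still keep their activations for a backward pass and use extra GPU memory.
'''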
        if (epoch+1) % val_interval == 0:

            correct_val = 0.
            total_val = 0.
            loss_val = 0.
            vgg16_model.eval()
            # no gradients are needed for validation, so skip building the graph
            with torch.no_grad():
                for j, data in enumerate(valid_loader):
                    inputs, labels = data
                    inputs, labels = inputs.to(device), labels.to(device)

                    # each sample holds ncrops crops; fold them into the batch dimension
                    bs, ncrops, c, h, w = inputs.size()
                    outputs = vgg16_model(inputs.view(-1, c, h, w))
                    # average the predictions over the crops of each sample
                    outputs_avg = outputs.view(bs, ncrops, -1).mean(1)

                    loss = criterion(outputs_avg, labels)

                    _, predicted = torch.max(outputs_avg, 1)
                    total_val += labels.size(0)
                    correct_val += (predicted == labels).sum().item()

                    loss_val += loss.item()

                loss_val_mean = loss_val/len(valid_loader)
                valid_curve.append(loss_val_mean)
                print("Valid:\t Epoch[{:0>3}/{:0>3}] Iteration[{:0>3}/{:0>3}] Loss: {:.4f} Acc:{:.2%}".format(
                    epoch, MAX_EPOCH, j+1, len(valid_loader), loss_val_mean, correct_val / total_val))
            vgg16_model.train()
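The difference between the two calls can be checked directly. The following is a minimal sketch, assuming a tiny stand-in network (`tiny_model` is hypothetical, not the VGG16 model from this post) so that it runs on a CPU: with `eval()` alone the output is still attached to the autograd graph, while inside `torch.no_grad()` it is not.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for vgg16_model, small enough to run on a CPU.
tiny_model = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.5), nn.Linear(16, 3)
)
inputs = torch.randn(4, 8)

# eval() only switches layers such as Dropout and BatchNorm to inference behaviour.
tiny_model.eval()
out_eval = tiny_model(inputs)

# no_grad() stops autograd from recording the graph, so intermediate
# activations are not kept for a backward pass.
with torch.no_grad():
    out_no_grad = tiny_model(inputs)

print(out_eval.requires_grad, out_eval.grad_fn is not None)        # True True
print(out_no_grad.requires_grad, out_no_grad.grad_fn is not None)  # False False
```

Both calls are still worth keeping in a validation loop: `eval()` for correct layer behaviour, `no_grad()` for memory.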

Error 2: after this change, the following error appeared:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:247: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
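The second message is a device-side assertion from the loss kernel. It typically means that a target label falls outside the range `[0, n_classes)`. Below is a minimal sketch of a check that could be run on a batch of labels before computing the loss, assuming a hypothetical helper `labels_in_range` and a known class count; neither appears in the original code.

```python
import torch

def labels_in_range(labels: torch.Tensor, n_classes: int) -> bool:
    """Return True if every label is a valid class index in [0, n_classes)."""
    return bool(((labels >= 0) & (labels < n_classes)).all())

print(labels_in_range(torch.tensor([0, 1, 2]), 3))  # True
print(labels_in_range(torch.tensor([1, 2, 3]), 3))  # False
```

A common cause is a dataset whose labels start at 1 rather than 0, or a final layer with fewer outputs than there are classes.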

Code before the change (the backward step that raised the RuntimeError):

            # backward
            optimizer.zero_grad()
            loss = criterion(outputs, labels)
            loss.backward()

            # update weights
            optimizer.step()

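Code after the change: add loss = loss.requires_grad_(). This suppresses the error, but the loss still has no graph behind it, so it is worth checking that the model parameters actually receive gradients: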

                # backward
                optimizer.zero_grad()
                loss = criterion(outputs, labels)
                loss = loss.requires_grad_()
                loss.backward()
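What this workaround does can be checked on a small example. The sketch below assumes a hypothetical one-layer `net` in place of the VGG16 model. It also assumes the loss lost its graph because the forward pass ran with gradients disabled, which is one possible cause; the post does not show that part of the training loop. With `requires_grad_()`, `backward()` runs but the parameters get no gradient. With the forward pass run under gradient tracking, they do.

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 3)                # hypothetical stand-in for the real model
criterion = nn.CrossEntropyLoss()
inputs = torch.randn(5, 4)
labels = torch.randint(0, 3, (5,))

# Case 1: forward pass with gradients disabled, then the workaround.
with torch.no_grad():
    outputs = net(inputs)
loss = criterion(outputs, labels)    # loss.grad_fn is None here
loss = loss.requires_grad_()         # the loss becomes a leaf tensor
loss.backward()                      # no error, but nothing reaches net
grad_after_workaround = net.weight.grad

# Case 2: forward pass with gradient tracking enabled.
with torch.enable_grad():
    outputs = net(inputs)
    loss = criterion(outputs, labels)
loss.backward()
grad_after_normal = net.weight.grad

print(grad_after_workaround is None)  # True
print(grad_after_normal is not None)  # True
```

If the same error shows up in a training loop, it may help to check that the training forward pass is not inside `torch.no_grad()` and that at least some parameters have `requires_grad=True`.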

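Article by a 拜师教育 student. Author: 2012-张同学. If you repost or copy this article, please link back to it and credit 拜师资源博客 as the source.
Original post: 《[pytorch]运行VGG训练模型出现:RuntimeError: CUDA out of memory》, published 2022-01-22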
