
CUDA Development in VSCode: Environment, Build, Debugging, and Profiling

A practical walkthrough of CUDA development in VSCode: environment setup, CMake/Make builds, VSCode configuration, debugging with cuda-gdb, and profiling with Nsight, including a PlantUML flow diagram and examples.


Quick Environment Check

nvidia-smi            # check driver and GPU status
nvcc --version        # confirm the CUDA Toolkit is available
which cuda-gdb        # the debugger (usually installed with the toolkit)

If any of these report not found, install or repair the driver and CUDA Toolkit, and make sure PATH and LD_LIBRARY_PATH (Linux) include the CUDA installation paths.
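For a typical Linux install, exporting the paths looks like this (a sketch; /usr/local/cuda is the common default, adjust to your actual install location):

```shell
# add the CUDA Toolkit to PATH and the runtime libraries to LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"
```

Put these lines in ~/.bashrc (or your shell's equivalent) so new terminals, and VSCode's integrated terminal, pick them up.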

Project Layout and a CMake Example

CMake (3.18+) with first-class CUDA language support is recommended:

# CMakeLists.txt
cmake_minimum_required(VERSION 3.18)
project(cuda_vscode_demo LANGUAGES CXX CUDA)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CUDA_STANDARD 17)
set(CMAKE_CUDA_ARCHITECTURES 75) # set to match your GPU, e.g. 86/89/90

add_executable(app src/main.cu)
target_compile_options(app PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:-Xcompiler=-Wall>)

Example code:

// src/main.cu
#include <cstdio>

__global__ void vec_add(const float* a, const float* b, float* c, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) c[i] = a[i] + b[i];
}

int main() {
  int n = 1 << 20; size_t bytes = n * sizeof(float);
  float *a, *b, *c, *d_a, *d_b, *d_c;
  a = (float*)malloc(bytes); b = (float*)malloc(bytes); c = (float*)malloc(bytes);
  for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }  // initialize host inputs
  cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
  cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
  cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);
  dim3 block(256); dim3 grid((n + block.x - 1) / block.x);
  vec_add<<<grid, block>>>(d_a, d_b, d_c, n);
  cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);
  printf("c[0]=%f\n", c[0]);
  cudaFree(d_a); cudaFree(d_b); cudaFree(d_c); free(a); free(b); free(c);
  return 0;
}

Build commands:

cmake -S . -B build
cmake --build build -j
./build/app
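The demo above skips error checking for brevity. In real projects, wrapping every CUDA runtime call in a small checking macro is common practice (a sketch; the name CUDA_CHECK is my own choice, not part of the CUDA API):

```cuda
// cuda_check.cu -- illustrative helper, build with: nvcc cuda_check.cu
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file/line context whenever a CUDA runtime call fails.
#define CUDA_CHECK(call)                                          \
  do {                                                            \
    cudaError_t err__ = (call);                                   \
    if (err__ != cudaSuccess) {                                   \
      fprintf(stderr, "CUDA error: %s at %s:%d\n",                \
              cudaGetErrorString(err__), __FILE__, __LINE__);     \
      exit(EXIT_FAILURE);                                         \
    }                                                             \
  } while (0)

int main() {
  float* d = nullptr;
  CUDA_CHECK(cudaMalloc(&d, 1024));
  // After a kernel launch, check both the launch and the execution:
  //   CUDA_CHECK(cudaGetLastError());        // launch errors
  //   CUDA_CHECK(cudaDeviceSynchronize());   // asynchronous execution errors
  CUDA_CHECK(cudaFree(d));
  return 0;
}
```

Kernel launches themselves return no error code, hence the cudaGetLastError / cudaDeviceSynchronize pair after a launch.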

VSCode Configuration (tasks.json / launch.json)

.vscode/tasks.json

{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "cmake-config",
      "type": "shell",
      "command": "cmake -S . -B build",
      "group": "build"
    },
    {
      "label": "cmake-build",
      "type": "shell",
      "command": "cmake --build build -j",
      "group": "build",
      "dependsOn": ["cmake-config"]
    },
    {
      "label": "run-app",
      "type": "shell",
      "command": "./build/app",
      "dependsOn": ["cmake-build"]
    }
  ]
}

.vscode/launch.json (debugging with cuda-gdb):

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Debug CUDA (cuda-gdb)",
      "type": "cppdbg",
      "request": "launch",
      "program": "${workspaceFolder}/build/app",
      "cwd": "${workspaceFolder}",
      "miDebuggerPath": "/usr/local/cuda/bin/cuda-gdb",
      "MIMode": "gdb",
      "setupCommands": [
        { "text": "-enable-pretty-printing" }
      ],
      "environment": [
        { "name": "LD_LIBRARY_PATH", "value": "/usr/local/cuda/lib64:${env:LD_LIBRARY_PATH}" }
      ]
    }
  ]
}

Notes:

  • Depending on the distribution, CUDA may be installed under /opt/cuda or /usr/local/cuda; adjust the paths accordingly.
  • If you use the CMake Tools extension for VSCode, you can instead set cmake.configureSettings and cmake.buildDirectory directly, which gives a smoother experience.
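If you take the CMake Tools route, a settings file along these lines is one possible starting point (a sketch; the values are illustrative and should match your project):

```json
// .vscode/settings.json (VSCode settings allow comments)
{
  "cmake.buildDirectory": "${workspaceFolder}/build",
  "cmake.configureSettings": {
    "CMAKE_CUDA_ARCHITECTURES": "75",
    "CMAKE_BUILD_TYPE": "Debug"
  }
}
```

With this in place, CMake Tools handles the configure/build tasks, so the manual tasks.json entries above become optional.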

Nsight Profiling (Optional)

  • Install Nsight Systems and Nsight Compute for system-level and kernel-level analysis, respectively.
  • In VSCode, you can invoke nsys/ncu from a task:
nsys profile -o report ./build/app
ncu --set full ./build/app

Quick Troubleshooting

  • Driver/Toolkit mismatch: update to compatible versions, guided by the CUDA Version / Driver Version reported by nvidia-smi.
  • Wrong device architecture: CMAKE_CUDA_ARCHITECTURES or -gencode must match the GPU's Compute Capability.
  • Runtime libraries not found: set LD_LIBRARY_PATH or configure an rpath at link time.
  • Breakpoints in device code never hit: make sure you are running under cuda-gdb and build with -G (a debug build; performance will drop).
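On the last point, one way to pass -G only in Debug builds with the CMake setup above is a generator expression (a sketch; -G disables most device optimizations, so keep it out of Release builds):

```cmake
# device-side debug info for Debug builds only
target_compile_options(app PRIVATE
  $<$<AND:$<COMPILE_LANGUAGE:CUDA>,$<CONFIG:Debug>>:-G>)
```

Then configure with -DCMAKE_BUILD_TYPE=Debug when you want to step through device code in cuda-gdb.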

PlantUML: VSCode CUDA Development Flow

@startuml
title VSCode CUDA Dev Flow
actor Developer
participant VSCode
participant "CMake/nvcc" as Build
participant "cuda-gdb" as Debugger
participant Nsight

Developer -> VSCode : edit *.cu / CMakeLists.txt
VSCode -> Build : configure & build (nvcc)
Build --> VSCode : binary (app)
VSCode -> Debugger : launch cuda-gdb
Debugger --> Developer : device/host step & inspect
VSCode -> Nsight : nsys/ncu analyze
Nsight --> Developer : metrics & bottlenecks
@enduml

Summary

With CMake plus VSCode task and debug configurations, you get an efficient CUDA development and troubleshooting workflow; combine it with the Nsight toolchain to profile and progressively optimize key metrics such as memory access patterns and occupancy. Feel free to share your specific environment and issues in the comments, and I can provide targeted configuration examples.

This post is licensed under CC BY 4.0 by the author.


Using the Chirpy theme for Jekyll.