CUDA Development in VSCode: Environment, Build, Debugging, and Profiling
Practical notes on CUDA development with VSCode: environment setup, CMake/Make builds, VSCode configuration, debugging with cuda-gdb, and profiling with Nsight, including a PlantUML flowchart and examples.
Quick Environment Check
nvidia-smi       # check driver and GPU status
nvcc --version   # confirm the CUDA Toolkit is available
which cuda-gdb   # the debugger (usually installed with the toolkit)
If any of these report "not found", install or repair the driver and CUDA Toolkit, and make sure PATH and LD_LIBRARY_PATH (on Linux) include the CUDA installation paths.
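For a default Linux install under /usr/local/cuda (the prefix is an assumption; adjust it to your system), the paths can be added like this:

```shell
# Put the CUDA toolchain on the current shell's search paths
# (the /usr/local/cuda prefix is an assumption; adjust to your install)
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

Add the same lines to `~/.bashrc` (or your shell's profile) to make them persistent.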
Project Structure and CMake Example
CMake (3.18+) with first-class CUDA language support is recommended:
# CMakeLists.txt
cmake_minimum_required(VERSION 3.18)
project(cuda_vscode_demo LANGUAGES CXX CUDA)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CUDA_STANDARD 17)
set(CMAKE_CUDA_ARCHITECTURES 75) # set as needed, e.g. 86/89/90
add_executable(app src/main.cu)
target_compile_options(app PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:-Xcompiler=-Wall>)
Example code:
// src/main.cu
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < n) c[i] = a[i] + b[i];
}
int main() {
int n = 1 << 20; size_t bytes = n * sizeof(float);
float *a, *b, *c, *d_a, *d_b, *d_c;
a = (float*)malloc(bytes); b = (float*)malloc(bytes); c = (float*)malloc(bytes);
for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; } // initialize host data
cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);
dim3 block(256); dim3 grid((n + block.x - 1) / block.x); // ceil(n / block.x)
vec_add<<<grid, block>>>(d_a, d_b, d_c, n);
cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost); // D2H copy also synchronizes
printf("c[0]=%f\n", c[0]); // expect 3.000000
cudaFree(d_a); cudaFree(d_b); cudaFree(d_c); free(a); free(b); free(c);
return 0;
}
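The example above ignores the error codes returned by cudaMalloc/cudaMemcpy. A common pattern is a checking macro; a minimal sketch (the name CUDA_CHECK is our own, not a library API):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file/line context if a CUDA runtime call fails
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err__ = (call);                                       \
    if (err__ != cudaSuccess) {                                       \
      fprintf(stderr, "CUDA error %s at %s:%d: %s\n",                 \
              cudaGetErrorName(err__), __FILE__, __LINE__,            \
              cudaGetErrorString(err__));                             \
      exit(EXIT_FAILURE);                                             \
    }                                                                 \
  } while (0)

// Usage: CUDA_CHECK(cudaMalloc(&d_a, bytes));
// After a kernel launch: CUDA_CHECK(cudaGetLastError());
```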
Build commands:
cmake -S . -B build
cmake --build build -j
./build/app
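Debugging device code with cuda-gdb requires compiling with -G. One way to wire that into the CMakeLists.txt above, for Debug builds only (a sketch, assuming the app target from before):

```cmake
# Device debug info in Debug configurations only;
# -G disables most device-code optimizations
target_compile_options(app PRIVATE
  $<$<AND:$<COMPILE_LANGUAGE:CUDA>,$<CONFIG:Debug>>:-G>)
```

Configure with `cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug` and rebuild before launching the debugger.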
VSCode Configuration (tasks.json/launch.json)
.vscode/tasks.json:
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "cmake-config",
      "type": "shell",
      "command": "cmake -S . -B build",
      "group": "build"
    },
    {
      "label": "cmake-build",
      "type": "shell",
      "command": "cmake --build build -j",
      "group": "build",
      "problemMatcher": ["$gcc"],
      "dependsOn": ["cmake-config"]
    },
    {
      "label": "run-app",
      "type": "shell",
      "command": "./build/app",
      "problemMatcher": [],
      "dependsOn": ["cmake-build"]
    }
  ]
}
.vscode/launch.json (debugging with cuda-gdb):
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Debug CUDA (cuda-gdb)",
      "type": "cppdbg",
      "request": "launch",
      "program": "${workspaceFolder}/build/app",
      "cwd": "${workspaceFolder}",
      "miDebuggerPath": "/usr/local/cuda/bin/cuda-gdb",
      "MIMode": "gdb",
      "setupCommands": [
        { "text": "-enable-pretty-printing" }
      ],
      "environment": [
        { "name": "LD_LIBRARY_PATH", "value": "/usr/local/cuda/lib64:${env:LD_LIBRARY_PATH}" }
      ]
    }
  ]
}
Notes:
- Depending on the distribution, CUDA may be installed under /opt/cuda or /usr/local/cuda; adjust paths accordingly.
- If you use the VSCode CMake Tools extension, you can instead configure cmake.configureSettings and cmake.buildDirectory directly, which gives a smoother experience.
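With CMake Tools, a minimal .vscode/settings.json along these lines (the values are illustrative) replaces the manual configure task:

```json
{
  "cmake.buildDirectory": "${workspaceFolder}/build",
  "cmake.configureSettings": {
    "CMAKE_CUDA_ARCHITECTURES": "75"
  }
}
```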
Nsight Profiling (Optional)
- Install Nsight Systems and Nsight Compute for system-level and kernel-level analysis, respectively.
- In VSCode, nsys/ncu can be invoked through tasks:
nsys profile -o report ./build/app
ncu --set full ./build/app
Common Problems: Quick Reference
- Driver/Toolkit mismatch: update to compatible versions, guided by the CUDA Version / Driver Version that nvidia-smi reports.
- Wrong device architecture: CMAKE_CUDA_ARCHITECTURES (or -gencode) must match the GPU's Compute Capability.
- Runtime libraries not found: set LD_LIBRARY_PATH or configure an rpath at link time.
- Breakpoints in device code never hit: make sure you are running cuda-gdb and that the build used -G (a debug build; performance will drop).
PlantUML: The VSCode CUDA Development Workflow
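A minimal sketch of the workflow covered above:

```plantuml
@startuml
start
:Check environment\n(nvidia-smi / nvcc / cuda-gdb);
:Write CMakeLists.txt and .cu sources;
:Configure and build with CMake\n(tasks.json);
if (Need to debug?) then (yes)
  :Rebuild with -G (Debug);
  :Launch cuda-gdb via launch.json;
else (no)
endif
:Profile with nsys / ncu;
:Optimize and iterate;
stop
@enduml
```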
Summary
With CMake plus VSCode task/debug configuration, you get an efficient develop-and-diagnose loop for CUDA; combine it with the Nsight toolchain to profile and progressively optimize key metrics such as memory access patterns and occupancy. Feel free to describe your specific environment and problems in the comments, and I can provide targeted configuration examples.
This post is licensed under CC BY 4.0 by the author.