GDKVM | Reproduction Process

1. Environment Setup

0. Clean Environment

If a virtual environment is already active in your shell (the prompt shows (base) or (env) on the left), deactivate it first.

For Conda environments:

conda deactivate

For a standard venv:

deactivate

Note: uv environments are well isolated. In most cases, creating a uv virtual environment without first deactivating Conda does not cause dependency conflicts.
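To see whether anything needs deactivating, the shell's own variables can be inspected. The sketch below assumes the usual conventions: `conda activate` sets `CONDA_DEFAULT_ENV`, and venv/virtualenv activate scripts set `VIRTUAL_ENV`. The function name `active_envs` is made up for this example.

```shell
# Print every active environment and return 0; return 1 if none is active.
active_envs() {
  status=1
  if [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
    echo "conda: ${CONDA_DEFAULT_ENV}"
    status=0
  fi
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "venv: ${VIRTUAL_ENV}"
    status=0
  fi
  return "$status"
}

if active_envs; then
  echo "An environment is active; deactivate it before continuing."
else
  echo "No Conda or venv environment is active."
fi
```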

1. Preparation: Install uv

This project uses uv for dependency management. uv is written in Rust and resolves dependencies quickly.

If the server cannot reach the external internet directly, install uv through pip:

pip install uv

Or update to the latest version:

pip install --upgrade uv

Verify the installation:

uv -V

Baseline version: 0.9.10 (the current release at the time of writing).

2. Get Project Code

Clone the repository and specify the local directory name.
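A quick way to compare the installed uv against the 0.9.10 baseline is sketched below. It assumes `sort -V` (version sort, as in GNU coreutils) is available and that `uv -V` prints the version number as its second field. The helper name `version_ge` is hypothetical.

```shell
# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

BASELINE="0.9.10"   # baseline version stated in this guide
if command -v uv >/dev/null 2>&1; then
  # `uv -V` prints a line such as "uv 0.9.10 ..."; take the second field
  installed="$(uv -V | awk '{print $2}')"
  if version_ge "$installed" "$BASELINE"; then
    echo "uv $installed is at or above the baseline $BASELINE"
  else
    echo "uv $installed is older than the baseline $BASELINE; consider upgrading"
  fi
else
  echo "uv not found on PATH"
fi
```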

git clone https://github.com/wangrui2025/GDKVM.git gdkvm_20251215

Enter the project directory:

cd gdkvm_20251215

Project structure overview:

.
├── .python-version    # Specifies the Python version
├── pyproject.toml     # Main project configuration file
├── uv.lock            # Lock file (Ensures consistency)
└── ...                # Other source code files
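
Before syncing, it can help to confirm that the three files listed above are present. This is a minimal sketch; the function name `check_project_files` is made up for illustration.

```shell
# Report which of the uv-related files exist in a directory (default: current directory).
# Returns 0 if all are present, 1 otherwise.
check_project_files() {
  dir="${1:-.}"
  missing=0
  for f in .python-version pyproject.toml uv.lock; do
    if [ -f "$dir/$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

check_project_files . || echo "Run this from the repository root (e.g. gdkvm_20251215)."
```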

3. Initialization & Sync

uv reads the configuration files, creates a virtual environment, and installs all dependencies.

For a newer environment, you can use the env_02 configuration; see env/env_02/pyproject.toml.

Environment Self-Check

(
  bar="=========================================="
  dash="------------------------------------------"
  printf '\n%s\n🔍 1. GPU Drivers (Hardware Foundation)\n' "$bar"
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | awk '{print "Driver: " $1}'
    nvidia-smi 2>/dev/null | grep "CUDA Version" | awk '{print "Max CUDA: " $9}'
  else
    echo "Driver: Unknown"; echo "Max CUDA: Unknown"
  fi
  printf '\n%s\n🐧 2. OS & GLIBC\n' "$bar"
  [ -f /etc/os-release ] && . /etc/os-release && echo "OS: ${PRETTY_NAME}"
  ldd --version | head -n 1
  printf '\n%s\n🏗️  3. Compiler (JIT Critical)\n' "$bar"
  printf 'GCC: '
  if command -v gcc >/dev/null 2>&1; then gcc --version | head -n 1; else echo "Not found"; fi
  printf '\n%s\n🛠️  4. CUDA Toolkit\n' "$bar"
  if command -v nvcc >/dev/null 2>&1; then nvcc -V | grep release; else echo "⚠️ nvcc not found"; fi
  printf '%s\nCUDA Physical Directories (/usr/local):\n' "$dash"
  ls -l /usr/local 2>/dev/null | grep cuda
  printf '%s\nLD_LIBRARY_PATH (Runtime Libs):\n%s\n' "$dash" "${LD_LIBRARY_PATH:-⚠️ Not set}"
  printf '\n%s\n🐍 5. Python Environment\n' "$bar"
  printf 'Python: '; python3 --version 2>&1 || echo "Not found"
  printf 'Path:   '; command -v python3 || echo "Not found"
  printf '%s\n' "$bar"
)

env_01 (Stable Environment, better compatibility)

    ==========================================
    🔍 1. GPU Drivers (Hardware Foundation)
    Driver: 525.85.05
    Driver: 525.85.05
    Driver: 525.85.05
    Driver: 525.85.05
    Driver: 525.85.05
    Driver: 525.85.05
    Driver: 525.85.05
    Driver: 525.85.05
    Max CUDA: 12.0

    ==========================================
    🐧 2. OS & GLIBC
    OS: Ubuntu 18.04.6 LTS
    ldd (Ubuntu GLIBC 2.27-3ubuntu1.6) 2.27

    ==========================================
    🏗️  3. Compiler (JIT Critical)
    GCC: gcc (GCC) 7.4.0

    ==========================================
    🛠️  4. CUDA Toolkit
    Cuda compilation tools, release 11.8, V11.8.89
    ------------------------------------------
    CUDA Physical Directories (/usr/local):
    lrwxrwxrwx  1 root root    20 11月 21  2022 cuda -> /usr/local/cuda-10.0
    drwxr-xr-x 17 root root  4096 11月  9  2022 cuda-10.1
    drwxr-xr-x 18 root root  4096 7月  18  2022 cuda-10.2
    drwxr-xr-x 17 root root  4096 4月  19  2024 cuda-11.8
    drwxr-xr-x 17 root root  4096 7月   9  2022 cuda-8.0
    drwxr-xr-x 18 root root  4096 7月  18  2022 cuda-9.0
    ------------------------------------------
    LD_LIBRARY_PATH (Runtime Libs):
    /usr/local/cuda-11.8/lib64:

    ==========================================
    🐍 5. Python Environment
    Python: Python 3.12.12
    Path:   /data/Anon/Repo/gdkvm-20251216/.venv/bin/python3
    ==========================================

env_02 (Newer Environment)

    ==========================================
    🔍 1. GPU Drivers (Hardware Foundation)
    Driver: 570.133.07
    Driver: 570.133.07
    Driver: 570.133.07
    Driver: 570.133.07
    Max CUDA: 12.8

    ==========================================
    🐧 2. OS & GLIBC
    OS: Ubuntu 20.04.6 LTS
    ldd (Ubuntu GLIBC 2.31-0ubuntu9.16) 2.31

    ==========================================
    🏗️  3. Compiler (JIT Critical)
    GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0

    ==========================================
    🛠️  4. CUDA Toolkit
    Cuda compilation tools, release 11.8, V11.8.89
    ------------------------------------------
    CUDA Physical Directories (/usr/local):
    lrwxrwxrwx  1 root root   21 4月   4  2023 cuda -> /usr/local/cuda-11.7/
    drwxr-xr-x 19 root root 4096 9月   8  2022 cuda-10.0
    drwxr-xr-x 18 root root 4096 10月 31  2022 cuda-10.1
    drwxr-xr-x 15 root root 4096 9月   8  2022 cuda-11.3
    drwxr-xr-x 16 root root 4096 9月  21  2021 cuda-11.4
    drwxr-xr-x 16 root root 4096 4月   4  2023 cuda-11.7
    drwxr-xr-x 17 root root 4096 4月  22  2024 cuda-11.8
    drwxr-xr-x 17 root root 4096 12月  1 15:59 cuda-12.6
    ------------------------------------------
    LD_LIBRARY_PATH (Runtime Libs):
    /usr/local/cuda-11.8/lib64:

    ==========================================
    🐍 5. Python Environment
    Python: Python 3.13.7
    Path:   /data/Anon/Repo/gdkvm-rtx6/.venv/bin/python3
    ==========================================

uv sync

Troubleshooting: Python Installation on an Intranet or Restricted Network

Scenario: If uv sync fails (e.g., with SSL errors or connection timeouts), the usual cause is a network policy that prevents uv from downloading the Python interpreter automatically.

Solution: Install Python manually through a mirror.

Check available versions:

uv python list

Install via the mirror (3.12.12 as an example):

uv python install 3.12.12 --mirror https://github-proxy.lixxing.top/https://github.com/astral-sh/python-build-standalone/releases/download

After the installation succeeds, run uv sync again.
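
As an alternative to passing --mirror each time, uv also reads the mirror location from the UV_PYTHON_INSTALL_MIRROR environment variable. The sketch below reuses the mirror URL from the command above; whether that proxy is reachable from your network is an assumption.

```shell
# Same mirror URL as in the command above; uv substitutes it for the default
# python-build-standalone download prefix when it fetches an interpreter.
export UV_PYTHON_INSTALL_MIRROR="https://github-proxy.lixxing.top/https://github.com/astral-sh/python-build-standalone/releases/download"
echo "UV_PYTHON_INSTALL_MIRROR=${UV_PYTHON_INSTALL_MIRROR}"
```

With the variable exported in the current shell, a plain `uv python install 3.12.12` or `uv sync` should pick up the mirror without the extra flag.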

Version Constraints & Compatibility

Pydantic Version Conflict (<2.12)

Issue: Pydantic 2.12+ (2025-10) applies strict schema validation that conflicts with wandb's field declarations and may cause crashes in multi-process runs.

Solution: The configuration file pins pydantic<2.12.

Wandb System Compatibility (glibc)

Issue: wandb>=0.22.2 no longer provides pre-built packages for Ubuntu 18.04 (glibc 2.27), so installation fails.

Solution: Install a Go compiler so that wandb can be built from source, or upgrade the operating system.
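
To find out whether a machine is affected, its glibc version and the presence of a Go toolchain can be checked as sketched below. It assumes a glibc-based Linux system where `getconf GNU_LIBC_VERSION` is available; the helper name `glibc_version` is made up for this example.

```shell
# Print the glibc version number (e.g. 2.27), or "unknown" on non-glibc systems.
glibc_version() {
  v="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"
  printf '%s\n' "${v:-unknown}"
}

echo "glibc: $(glibc_version)"
if command -v go >/dev/null 2>&1; then
  echo "Go:    $(go version)"
else
  echo "Go:    not found (needed only if wandb must be built from source)"
fi
```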

SSL/TLS Certificate Errors (Intranet/Proxy Environments)

Issue: uv add or uv sync fails (Certificate Expired) because the self-signed certificates used by intranet firewalls or proxies are not trusted by uv's default Rust TLS stack.

Solution: Switch to system-native TLS validation.

Option A (Temporary, Single Command)
```
uv add wandb --native-tls
```
Option B (Permanent, Recommended)
```
export UV_NATIVE_TLS=1
```
* Consider adding this command to ~/.bashrc or ~/.zshrc.
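
One way to add the export line without duplicating it on repeated runs is sketched here. The helper `add_line_once` is hypothetical, and the actual call is left commented out so that nothing is written until you choose the target file.

```shell
# Append a line to a shell startup file only if an identical line is not already there.
add_line_once() {
  line="$1"
  file="$2"
  touch "$file"
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Example (uncomment to apply to your own startup file):
# add_line_once 'export UV_NATIVE_TLS=1' "$HOME/.bashrc"
```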

4. Activate Environment

source .venv/bin/activate

Note: Activating the environment explicitly is recommended, so that you can debug interactively (e.g., in the Python REPL) and check package status with pip.

Verify the environment (the output should be a path inside the project's .venv directory):

which python

Example output:
/data/Anon/Repo/gdkvm_20251215/.venv/bin/python
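
The same check can be scripted, for instance at the top of a job script. This is a sketch that assumes it is run from the repository root; the function name `is_project_python` is made up for illustration.

```shell
# Succeed if the interpreter path $1 lives under <project root $2>/.venv/bin/.
is_project_python() {
  case "$1" in
    "$2"/.venv/bin/*) return 0 ;;
    *) return 1 ;;
  esac
}

py="$(command -v python || true)"
if is_project_python "$py" "$PWD"; then
  echo "OK: using project interpreter $py"
else
  echo "Warning: python resolves to '${py:-nothing}', not $PWD/.venv/bin/"
fi
```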

2. Data Preparation

We use the CAMUS and EchoNet-Dynamic datasets.

Dataset Download

Processed data: CAMUS (🤗 HuggingFace); EchoNet-Dynamic (todo)

Raw data: CAMUS (official website, 🤗 mirror); EchoNet-Dynamic (official website, 🤗 mirror)

3. Training & Evaluation

3.1 Model Training

1. Configuration

Select the train.sh script that matches your shell (zsh or bash), and set the following environment variables to match your hardware:

CUDA_VISIBLE_DEVICES: 0,1 # GPU device IDs to use
MASTER_PORT: 29500        # Master port for distributed training; change it to avoid conflicts
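
In shell syntax, the two settings above look like the sketch below. Whether train.sh reads them from the calling shell or assigns them itself is not stated here, so treat the export form as an assumption and check the script. The helper `count_devices` is made up for this example.

```shell
export CUDA_VISIBLE_DEVICES=0,1   # GPU device IDs, as in the example above
export MASTER_PORT=29500          # master port for distributed training

# Count the devices listed in a comma-separated ID string.
count_devices() {
  printf '%s\n' "$1" | awk -F',' '{print NF}'
}

echo "GPUs requested: $(count_devices "$CUDA_VISIBLE_DEVICES"), master port: $MASTER_PORT"
```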

2. Hyperparameters

Edit the configuration file config/config_gdkvm_01.yaml and adjust the key hyperparameters for your experiment:

data_path: /data/Anon/dataset/camus_png256x256_10f_20250709/   # Actual location of the dataset
batch_size: 8                                                  # Samples per training step
learning_rate: 1.0e-4                                          # Learning rate
num_iterations: 3000                                           # Total number of iterations
eval_stage:
  num_vis: 0                                                   # Number of images to visualize
wandb_mode: "offline"                                          # Set to "offline"
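
A wrong data_path is an easy mistake to make here, so a pre-flight check may be useful. The sketch below pulls the value out of the YAML by plain text matching, not with a YAML parser: it assumes the key sits at the top level on a single line, optionally followed by a comment. The helper name `yaml_get` is hypothetical.

```shell
# Extract a top-level scalar value (e.g. data_path) from a simple YAML file.
# Strips the "key:" prefix and any trailing "# comment".
yaml_get() {
  key="$1"
  file="$2"
  sed -n "s/^${key}:[[:space:]]*//p" "$file" | sed 's/[[:space:]]*#.*$//' | head -n 1
}

cfg="config/config_gdkvm_01.yaml"
if [ -f "$cfg" ]; then
  data_path="$(yaml_get data_path "$cfg")"
  if [ -d "$data_path" ]; then
    echo "data_path exists: $data_path"
  else
    echo "data_path not found: $data_path"
  fi
else
  echo "Config not found: $cfg (run from the repository root)"
fi
```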

3. Execute Training

Grant execute permission:

chmod +x ./train.sh

Start training:

./train.sh

3.2 Outputs

Training artifacts (model weights, visualizations, etc.) are saved to the directory specified by hydra.run.dir in train.sh.

gdkvm_20251215/outputs

3.3 WandB Monitoring

Experiments are logged offline with Weights & Biases (WandB).

Log path: gdkvm_20251215/wandb
Run directory: a subfolder named with a timestamp and a hash (e.g., offline-run-20251215_123456-abcdef1gh).

Upload Offline Logs

After training, sync the offline data to the WandB cloud with:

wandb sync gdkvm_20251215/wandb/offline-run-20251215_123456-abcdef1gh
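
When several offline runs have accumulated, they can be synced in a loop. This is a minimal bash/sh sketch that assumes the log path shown above; the helper name `list_offline_runs` is made up for illustration.

```shell
# List offline run directories under a wandb log directory, one per line.
list_offline_runs() {
  for d in "$1"/offline-run-*; do
    if [ -d "$d" ]; then
      printf '%s\n' "$d"
    fi
  done
}

# Sync each run in turn; does nothing if wandb is not installed or no runs exist.
if command -v wandb >/dev/null 2>&1; then
  list_offline_runs gdkvm_20251215/wandb | while IFS= read -r run; do
    wandb sync "$run"
  done
fi
```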
