Reproduction Process

GDKVM: Echocardiography Video Segmentation via Spatiotemporal Key-Value Memory with Gated Delta Rule

1. Environment Setup

0. Clean Environment

If a virtual environment is active in your shell (the prompt shows (base) or (env) on the left), deactivate it first.

For Conda environments:

conda deactivate

For a standard venv:

deactivate

Note: uv environments are well isolated. In most cases, creating a uv virtual environment without first deactivating Conda does not cause dependency conflicts.
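
To see whether an environment is still active before continuing, a small check like the one below may help. It is only a sketch: it relies on the VIRTUAL_ENV and CONDA_DEFAULT_ENV variables that venv and Conda set on activation, and the function name active_env is made up for this example.

```shell
# Hypothetical helper: report which environment manager, if any, is active.
# VIRTUAL_ENV is set by venv activation scripts;
# CONDA_DEFAULT_ENV is set by `conda activate`.
active_env() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "venv: ${VIRTUAL_ENV}"
  elif [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
    echo "conda: ${CONDA_DEFAULT_ENV}"
  else
    echo "none"
  fi
}

active_env
```

If the output is not "none", run the matching deactivate command first.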

1. Preparation: Install uv

This project uses uv for dependency management. uv is written in Rust and resolves dependencies efficiently.

If the server cannot reach the external internet directly, install uv via pip:

pip install uv

Or update to the latest version:

pip install --upgrade uv

Verify the installation:

uv -V

Baseline version: 0.9.10 (the latest release at the time of writing).
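
The installed version can be compared against the baseline with a short script. This is a sketch rather than part of the project: version_ge is a hypothetical helper, it relies on `sort -V` from GNU coreutils, and it assumes `uv -V` prints the version number as its second field.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# `sort -V` sorts version strings; `-C` only checks whether the input is sorted.
version_ge() {
  printf '%s\n%s\n' "$2" "$1" | sort -V -C
}

if command -v uv >/dev/null 2>&1; then
  # Assumption: `uv -V` prints "uv <version> ...".
  installed="$(uv -V | awk '{print $2}')"
  if version_ge "$installed" "0.9.10"; then
    echo "uv $installed is at or above the 0.9.10 baseline"
  else
    echo "uv $installed is older than the 0.9.10 baseline"
  fi
else
  echo "uv not found"
fi
```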

2. Get Project Code

Clone the repository into a named local directory.

git clone https://github.com/wangrui2025/GDKVM.git gdkvm_20251215

Enter the project directory:

cd gdkvm_20251215

Project structure overview:

.
├── .python-version    # Python version used by the project
├── pyproject.toml     # Main project configuration file
├── uv.lock            # Lock file (ensures environment consistency)
└── ...                # Other source code files

3. Initialization & Sync

uv reads the configuration files, creates a virtual environment, and installs all dependencies.

For a newer system environment, you can use the env_02 configuration; see env/env_02/pyproject.toml.

Environment Self-Check
(
  printf "\n==========================================\n🔍 1. GPU Drivers (Hardware Foundation)\n"
  nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | awk '{print "Driver: " $1; n++} END {if (n == 0) print "Driver: Unknown"}'
  nvidia-smi 2>/dev/null | awk '/CUDA Version/ {print "Max CUDA: " $9; n++} END {if (n == 0) print "Max CUDA: Unknown"}'
  printf "\n==========================================\n🐧 2. OS & GLIBC\n"
  [ -f /etc/os-release ] && . /etc/os-release && echo "OS: ${PRETTY_NAME}"
  ldd --version | head -n 1
  printf "\n==========================================\n🏗️  3. Compiler (JIT Critical)\n"
  printf "GCC: "
  if command -v gcc >/dev/null 2>&1; then gcc --version | head -n 1; else echo "Not found"; fi
  printf "\n==========================================\n🛠️  4. CUDA Toolkit\n"
  if command -v nvcc >/dev/null 2>&1; then nvcc -V | grep release; else echo "⚠️ nvcc not found"; fi
  printf -- "------------------------------------------\nCUDA Physical Directories (/usr/local):\n"
  ls -l /usr/local 2>/dev/null | grep cuda
  printf -- "------------------------------------------\nLD_LIBRARY_PATH (Runtime Libs):\n"
  printf "%s\n" "${LD_LIBRARY_PATH:-⚠️ Not set}"
  printf "\n==========================================\n🐍 5. Python Environment\n"
  printf "Python: "
  python3 --version 2>&1 || echo "Not found"
  printf "Path:   "
  command -v python3 || echo "Not found"
  printf "==========================================\n"
)

env_01 (Stable Environment, better compatibility)

==========================================
🔍 1. GPU Drivers (Hardware Foundation)
Driver: 525.85.05
Driver: 525.85.05
Driver: 525.85.05
Driver: 525.85.05
Driver: 525.85.05
Driver: 525.85.05
Driver: 525.85.05
Driver: 525.85.05
Max CUDA: 12.0

==========================================
🐧 2. OS & GLIBC
OS: Ubuntu 18.04.6 LTS
ldd (Ubuntu GLIBC 2.27-3ubuntu1.6) 2.27

==========================================
🏗️ 3. Compiler (JIT Critical)
GCC: gcc (GCC) 7.4.0

==========================================
🛠️ 4. CUDA Toolkit
Cuda compilation tools, release 11.8, V11.8.89
------------------------------------------
CUDA Physical Directories (/usr/local):
lrwxrwxrwx 1 root root 20 11月 21 2022 cuda -> /usr/local/cuda-10.0
drwxr-xr-x 17 root root 4096 11月 9 2022 cuda-10.1
drwxr-xr-x 18 root root 4096 7月 18 2022 cuda-10.2
drwxr-xr-x 17 root root 4096 4月 19 2024 cuda-11.8
drwxr-xr-x 17 root root 4096 7月 9 2022 cuda-8.0
drwxr-xr-x 18 root root 4096 7月 18 2022 cuda-9.0
------------------------------------------
LD_LIBRARY_PATH (Runtime Libs):
/usr/local/cuda-11.8/lib64:

==========================================
🐍 5. Python Environment
Python: Python 3.12.12
Path: /data/Anon/Repo/gdkvm-20251216/.venv/bin/python3
==========================================

env_02 (Newer Environment)

==========================================
🔍 1. GPU Drivers (Hardware Foundation)
Driver: 570.133.07
Driver: 570.133.07
Driver: 570.133.07
Driver: 570.133.07
Max CUDA: 12.8

==========================================
🐧 2. OS & GLIBC
OS: Ubuntu 20.04.6 LTS
ldd (Ubuntu GLIBC 2.31-0ubuntu9.16) 2.31

==========================================
🏗️ 3. Compiler (JIT Critical)
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0

==========================================
🛠️ 4. CUDA Toolkit
Cuda compilation tools, release 11.8, V11.8.89
------------------------------------------
CUDA Physical Directories (/usr/local):
lrwxrwxrwx 1 root root 21 4月 4 2023 cuda -> /usr/local/cuda-11.7/
drwxr-xr-x 19 root root 4096 9月 8 2022 cuda-10.0
drwxr-xr-x 18 root root 4096 10月 31 2022 cuda-10.1
drwxr-xr-x 15 root root 4096 9月 8 2022 cuda-11.3
drwxr-xr-x 16 root root 4096 9月 21 2021 cuda-11.4
drwxr-xr-x 16 root root 4096 4月 4 2023 cuda-11.7
drwxr-xr-x 17 root root 4096 4月 22 2024 cuda-11.8
drwxr-xr-x 17 root root 4096 12月 1 15:59 cuda-12.6
------------------------------------------
LD_LIBRARY_PATH (Runtime Libs):
/usr/local/cuda-11.8/lib64:

==========================================
🐍 5. Python Environment
Python: Python 3.13.7
Path: /data/Anon/Repo/gdkvm-rtx6/.venv/bin/python3
==========================================

Create the virtual environment and install the locked dependencies:

uv sync

Troubleshooting: Python Installation on an Intranet or Restricted Network

Scenario: If uv sync fails (e.g., with SSL errors or connection timeouts), network policy is usually preventing uv from downloading the Python interpreter automatically.

Solution: Install Python manually via a mirror.

Check available versions:

uv python list

Install via a mirror (using 3.12.12 as an example):

uv python install 3.12.12 --mirror https://github-proxy.lixxing.top/https://github.com/astral-sh/python-build-standalone/releases/download

After the installation succeeds, run uv sync again.

Version Constraints & Compatibility
  1. Pydantic Version Conflict (<2.12)

    Issue: The strict schema validation in Pydantic 2.12+ (2025-10) conflicts with wandb's field declarations and may cause crashes in multi-process mode.

    Solution: The configuration file already pins pydantic<2.12.

  2. Wandb System Compatibility (glibc)

    Issue: wandb>=0.22.2 no longer provides pre-built packages for Ubuntu 18.04 (glibc 2.27), so installation fails.

    Solution: Make sure a Go compiler is installed so that wandb can be built from source, or upgrade the operating system.

  3. SSL/TLS Certificate Errors (Intranet/Proxy Environments)

    Issue: uv add/sync fails (Certificate Expired) because the self-signed certificates used by intranet firewalls or proxies are not trusted by uv's default Rust TLS.

    Solution: Switch to system-native TLS validation.

    Option A (temporary, single command)
    uv add wandb --native-tls
    Option B (permanent, recommended)
    export UV_NATIVE_TLS=1

    * We recommend adding this command to ~/.bashrc or ~/.zshrc.
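
One way to add the export line without duplicating it on repeated runs is sketched below. The helper name append_once is made up for this example, and the call that touches ~/.bashrc is left commented out so that nothing is modified unintentionally.

```shell
# Hypothetical helper: append line $1 to file $2 only if it is not already present.
append_once() {
  line="$1"
  file="$2"
  touch "$file"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example call (use ~/.zshrc instead if your shell is zsh):
# append_once 'export UV_NATIVE_TLS=1' "$HOME/.bashrc"
```

Open a new shell, or source the file, for the setting to take effect.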

4. Activate Environment

source .venv/bin/activate

Note: Activating the environment explicitly is recommended so that you can debug interactively (e.g., in the Python REPL) and check package status with pip.

Verify the environment (the output should be a path inside the project's .venv directory):

which python

Example output:
/data/Anon/Repo/gdkvm_20251215/.venv/bin/python
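
The same check can be scripted. The sketch below assumes it is run from the project root, so that the expected interpreter lives under $PWD/.venv; path_is_inside is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if path $1 lies inside directory $2.
path_is_inside() {
  case "$1" in
    "$2"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: the current directory is the project root.
python_path="$(command -v python || true)"
if path_is_inside "$python_path" "$PWD/.venv"; then
  echo "OK: python resolves to $python_path"
else
  echo "Warning: python resolves to '${python_path:-nothing}', not to $PWD/.venv"
fi
```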

2. Data Preparation

We use the CAMUS and EchoNet-Dynamic datasets.

Dataset Download

Processed data: CAMUS (🤗 HuggingFace); EchoNet-Dynamic (TODO)
Raw data: CAMUS (official website, 🤗 mirror); EchoNet-Dynamic (official website, 🤗 mirror)

3. Training & Evaluation

3.1 Model Training

1. Configuration

Select the train.sh script that matches your shell (zsh or bash), and set the following environment variables for your hardware:

CUDA_VISIBLE_DEVICES: 0,1 # GPU device IDs to use
MASTER_PORT: 29500        # Master port for distributed training (choose an unused port to avoid conflicts)
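
In shell syntax these settings might look like the sketch below. It assumes train.sh reads them as ordinary exported variables, which this guide does not confirm; count_devices is a hypothetical helper for checking how many GPUs a device list selects.

```shell
# Assumption: train.sh picks these up as exported shell variables.
export CUDA_VISIBLE_DEVICES=0,1
export MASTER_PORT=29500

# Hypothetical helper: count the devices in a comma-separated ID list.
count_devices() {
  if [ -z "$1" ]; then
    echo 0
  else
    echo "$1" | awk -F',' '{print NF}'
  fi
}

echo "GPUs selected: $(count_devices "$CUDA_VISIBLE_DEVICES"), master port: $MASTER_PORT"
```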

2. Hyperparameters

Edit the configuration file config/config_gdkvm_01.yaml and adjust the key hyperparameters for your experiment:

data_path: /data/Anon/dataset/camus_png256x256_10f_20250709/   # Actual location of the dataset
batch_size: 8                                                  # Number of samples per training step
learning_rate: 1.0e-4                                          # Learning rate
num_iterations: 3000                                           # Total number of iterations
eval_stage:
  num_vis: 0                                                   # Number of images to visualize
wandb_mode: "offline"                                          # WandB logging mode; set to "offline"
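
Before launching a run, it can be worth confirming that data_path points to an existing directory. The sketch below reads the value with sed instead of a YAML parser, so it only handles flat top-level `key: value  # comment` lines like those in this config; yaml_value is a hypothetical helper.

```shell
# Hypothetical helper: print the value of a flat top-level key ($1) in a YAML file ($2).
# Strips trailing comments and whitespace; this is not a general YAML parser.
yaml_value() {
  sed -n "s/^$1:[[:space:]]*//p" "$2" \
    | sed -e 's/[[:space:]]*#.*$//' -e 's/[[:space:]]*$//' \
    | head -n 1
}

config="config/config_gdkvm_01.yaml"
if [ -f "$config" ]; then
  data_path="$(yaml_value data_path "$config")"
  if [ -d "$data_path" ]; then
    echo "Dataset directory found: $data_path"
  else
    echo "Dataset directory missing: $data_path"
  fi
fi
```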

3. Execute Training

Grant execute permission:

chmod +x ./train.sh

Start training:

./train.sh

3.2 Training Outputs

Training artifacts (model weights, visualizations, etc.) are saved to the directory specified by hydra.run.dir in train.sh:

gdkvm_20251215/outputs

3.3 WandB Monitoring

Experiments are logged offline with Weights & Biases (WandB).

Upload Offline Logs

After training, sync the offline data to the WandB cloud with the following command:

wandb sync gdkvm_20251215/wandb/offline-run-20251215_123456-abcdef1gh
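
When several offline runs have accumulated, a loop can sync them one by one. This sketch assumes the runs sit under gdkvm_20251215/wandb, as in the command above; list_offline_runs is a hypothetical helper. The loop only prints each command until the echo is removed.

```shell
# Hypothetical helper: list offline run directories under a wandb directory ($1).
list_offline_runs() {
  for run_dir in "$1"/offline-run-*; do
    [ -d "$run_dir" ] && echo "$run_dir"
  done
  return 0
}

# Dry run: prints one `wandb sync` command per run. Remove `echo` to upload.
list_offline_runs "gdkvm_20251215/wandb" | while IFS= read -r run_dir; do
  echo wandb sync "$run_dir"
done
```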