基于PyTorch的医学影像分割辅助诊断系统

张

张建站

2026/6/1 1:08:59

10分钟阅读

做这个基于PyTorch的医学影像分割辅助诊断系统真的是一把辛酸泪完整源码链接https://pan.quark.cn/s/1e54aa2ae950选题一出来我人傻了第7个 “基于PyTorch的医学影像分割辅助诊断系统” 看着挺正经的对吧我当时也觉得不就是U-Net嘛网上开源代码一堆调一调就能跑结果我他**的太天真了从配环境开始一路踩坑踩到交代码真的服了先说环境我笔记本是AMD的显卡 CUDA跟我没关系只能CPU硬扛本来想着装个PyTorch GPU版本还能快一点结果torch.cuda.is_available()永远返回False 行吧那就cpu模式反正毕设不用实时跑到60帧慢就慢点但是后面训练的时候那个进度条走得跟蜗牛一样我当时就想把电脑砸了还有python版本我一开始用的3.12 结果torch装了个预览版一堆warning 后来换了3.10才消停这些版本兼容性问题真的恶心得不行然后就是数据的问题一开始我想去找真实的医学影像数据集 KiTS啊 BraTS啊什么的但那些东西一个比一个大动不动几个G 下载还要注册还要申请标注格式还是nii.gz 还要用SimpleITK去读我搞了一下午连一个样本都没读出来算了算了自己生成模拟数据吧反正毕设嘛证明算法能跑通就行又不需要发Nature那生成模拟医学影像我得想个办法先搞个椭圆当器官再搞几个重叠圆当病灶import numpy as np import torch from torch.utils.data import Dataset, DataLoader # 这个函数生成一个椭圆用来模拟器官区域 # 一开始我想用cv2.ellipse 但cv2还要单独装opencv-python # 后面发现直接数学公式算更快还不依赖别的库 def random_ellipse_mask(h, w): center (np.random.randint(h//4, 3*h//4), np.random.randint(w//4, 3*w//4)) axes (np.random.randint(h//8, h//4), np.random.randint(w//8, w//4)) angle np.random.uniform(0, np.pi) y, x np.ogrid[:h, :w] cos_a, sin_a np.cos(angle), np.sin(angle) x_rot cos_a * (x - center[1]) sin_a * (y - center[0]) y_rot -sin_a * (x - center[1]) cos_a * (y - center[0]) mask (x_rot / axes[0])**2 (y_rot / axes[1])**2 1 return mask # 返回bool数组可以直接做布尔索引这个椭圆的旋转公式我搞了快一个小时最开始我把x_rot和y_rot的公式写反了那个旋转矩阵的坐标对应关系我一直搞混出来的椭圆全是偏到左上角跟预期完全不一样后来在纸上画了个坐标系才弄明白中心坐标是(center[1], center[0])对应(x, y) 差点被自己蠢哭然后是病灶区域用几个小圆叠一起模拟不规则形状这个方法虽然粗糙但胜在简单def random_irregular_mask(h, w, num_overlap6): center_y np.random.randint(h//3, 2*h//3) center_x np.random.randint(w//3, 2*w//3) mask np.zeros((h, w), dtypebool) for _ in range(num_overlap): cy center_y np.random.randint(-h//10, h//10) cx center_x np.random.randint(-w//10, w//10) r np.random.randint(h//18, h//10) y, x np.ogrid[:h, :w] circle (x - cx) ** 2 (y - cy) ** 2 r ** 2 mask np.maximum(mask, circle) return mask6个圆叠在一起边缘看起来就像肿瘤那种不规则的边界这里有一点要注意 np.maximum(mask, circle)是element-wise的取最大值因为circle是bool所以相当于做逻辑或运算这样多个圆重叠的区域会合并最终形成一个不规则的多瓣形状效果还行接下来是合成一整张影像我分了几个步骤背景噪声、器官、病灶、亮度渐变def generate_synthetic_medical_image(h128, w128): # 背景用高斯噪声模拟CT图像的噪声特性 image np.random.normal(0.08, 0.03, (h, w)).astype(np.float32) image np.clip(image, 0, 1) # 画个椭圆模拟器官亮度比背景高一点 organ_mask random_ellipse_mask(h, w) organ_intensity np.random.uniform(0.3, 0.5) n_organ int(organ_mask.sum()) organ_noise np.random.normal(0, 0.03, n_organ).astype(np.float32) image[organ_mask] organ_intensity organ_noise image np.clip(image, 0, 1) # 病灶区域亮度更高而且约束必须在器官内部 tumor_mask random_irregular_mask(h, w) tumor_mask tumor_mask organ_mask # 病灶一定在器官内 if tumor_mask.sum() 10: tumor_mask organ_mask (np.random.random((h, w)) 0.85) tumor_intensity np.random.uniform(0.7, 0.9) n_tumor int(tumor_mask.sum()) tumor_noise np.random.normal(0, 0.02, n_tumor).astype(np.float32) image[tumor_mask] tumor_intensity tumor_noise image np.clip(image, 0, 1) # 加亮度渐变模拟CT的杯状伪影否则图像太平了不真实 gradient np.linspace(0, 0.06, h).reshape(-1, 1).astype(np.float32) image image gradient image np.clip(image, 0, 1) return image, tumor_mask.astype(np.float32)这中间有个坑我一开始random_ellipse_mask返回的是astype(np.float32) 结果在image[organ_mask]做索引时报错 IndexError: arrays used as indices must be of integer (or boolean) type 就是因为float32不能当布尔索引改成返回原始bool就好了 numpy在这方面特别严格类型不对直接报错不像Python原生那样隐式转换还有个坑是tumor_mask.sum()返回的是numpy的标量不是Python原生int 传给np.random.normal的size参数时被要求必须是整数加个int()包一下就好了这种类型的小问题在调试的时候真的烦人一跑就报错一行一行查数据封装成PyTorch的Dataset:class MedicalDataset(Dataset): def __init__(self, num_samples500, image_size128, seed42): self.num_samples num_samples self.image_size image_size # 保存随机状态防止影响全局随机性 rng_state np.random.get_state() np.random.seed(seed) self.images [] self.masks [] for _ in range(num_samples): img, mask generate_synthetic_medical_image(image_size, image_size) self.images.append(img) self.masks.append(mask) np.random.set_state(rng_state) def __len__(self): return self.num_samples def __getitem__(self, idx): img self.images[idx] mask self.masks[idx] # 加channel维度 Conv2d要求输入(N, C, H, W) img_tensor torch.from_numpy(img).unsqueeze(0).float() mask_tensor torch.from_numpy(mask).unsqueeze(0).float() return img_tensor, mask_tensor这里我保存了np.random.get_state() 生成完数据再恢复这样不会影响程序其他地方用随机数这个习惯是从一次数据跟别人对不上号的惨痛教训里学的数据搞定了就是U-Net的实现了这个网络结构我已经背得滚瓜烂熟对称的编码器-解码器中间有跳跃连接import torch import torch.nn as nn import torch.nn.functional as F class DoubleConv(nn.Module): # 两次卷积BNReLU 标准的特征提取模块 def __init__(self, in_channels, out_channels): super().__init__() self.double_conv nn.Sequential( nn.Conv2d(in_channels, out_channels, kernel_size3, padding1), nn.BatchNorm2d(out_channels), nn.ReLU(inplaceTrue), nn.Conv2d(out_channels, out_channels, kernel_size3, padding1), nn.BatchNorm2d(out_channels), nn.ReLU(inplaceTrue), ) def forward(self, x): return self.double_conv(x) class Down(nn.Module): # 下采样模块池化双卷积 def __init__(self, in_channels, out_channels): super().__init__() self.maxpool_conv nn.Sequential( nn.MaxPool2d(2), DoubleConv(in_channels, out_channels), ) def forward(self, x): return self.maxpool_conv(x) class Up(nn.Module): # 上采样模块这个模块的padding处理坑了我很久 def __init__(self, in_channels, out_channels, bilinearTrue): super().__init__() if bilinear: self.up nn.Upsample(scale_factor2, modebilinear, align_cornersTrue) self.conv DoubleConv(in_channels, out_channels) else: self.up nn.ConvTranspose2d(in_channels, in_channels//2, kernel_size2, stride2) self.conv DoubleConv(in_channels, out_channels) def forward(self, x1, x2): x1 self.up(x1) # 这里处理尺寸差异如果x1和x2尺寸不对齐就做padding diffY x2.size()[2] - x1.size()[2] diffX x2.size()[3] - x1.size()[3] x1 F.pad(x1, [diffX//2, diffX - diffX//2, diffY//2, diffY - diffY//2]) x torch.cat([x2, x1], dim1) return self.conv(x) class OutConv(nn.Module): # 1x1卷积将通道数映射到分类数 def __init__(self, in_channels, out_channels): super().__init__() self.conv nn.Conv2d(in_channels, out_channels, kernel_size1) def forward(self, x): return self.conv(x) class UNet(nn.Module): def __init__(self, n_channels1, n_classes1, base_channels64): super().__init__() self.inc DoubleConv(n_channels, base_channels) self.down1 Down(base_channels, base_channels*2) self.down2 Down(base_channels*2, base_channels*4) self.down3 Down(base_channels*4, base_channels*8) self.down4 Down(base_channels*8, base_channels*8) self.up1 Up(base_channels*16, base_channels*4) self.up2 Up(base_channels*8, base_channels*2) self.up3 Up(base_channels*4, base_channels) self.up4 Up(base_channels*2, base_channels) self.outc OutConv(base_channels, n_classes) def forward(self, x): x1 self.inc(x) x2 self.down1(x1) x3 self.down2(x2) x4 self.down3(x3) x5 self.down4(x4) x self.up1(x5, x4) x self.up2(x, x3) x self.up3(x, x2) x self.up4(x, x1) logits self.outc(x) return logits这里最坑的就是Up模块里的尺寸对齐问题因为下采样如果输入不是2的整数次幂或者卷积的padding导致尺寸变化上采样回来的尺寸可能跟对应编码层的特征图尺寸差一点点比如64x64的输入经过4次下采样变成4x4 再上采样回64 中间经过转置卷积或插值尺寸有时候就差1个像素这个1像素的差异如果不处理直接torch.cat会崩尺寸不匹配报错我一开始没写padding那几行结果训练的时候报错 “Sizes of tensors must match except in dimension 1” 查了好久才发现是上采样的特征图和跳跃连接的特征图尺寸不一样后来加了手动padding计算用F.pad补成一样的尺寸损失函数用了BCE Dice的组合这个在医学图像分割里基本是标配了 BCE保证像素级分类正确 Dice保证区域重叠度def dice_loss(pred, target, smooth1.0): pred torch.sigmoid(pred) pred pred.contiguous().view(-1) target target.contiguous().view(-1) intersection (pred * target).sum() return 1 - (2.0 * intersection smooth) / (pred.sum() target.sum() smooth) def bce_dice_loss(pred, target): bce nn.BCEWithLogitsLoss()(pred, target) dice dice_loss(pred, target) return bce diceDice Loss直接优化的是分割区域和真值的重叠度比单纯的BCE要敏感得多因为BCE是对每个像素独立计算的而Dice Loss是整体计算但Dice Loss有个问题当预测和标签全为0时分母接近0 会导致梯度爆炸所以加了个smooth参数相当于一个平滑项这里要注意pred传入dice_loss时还没有做sigmoid 因为在bce_dice_loss里我直接传入logits 然后在dice_loss内部做sigmoid 而BCEWithLogitsLoss内部也会做sigmoid 所以两者各做各的互不影响评估指标也不复杂:def dice_coefficient(pred, target, smooth1.0): pred (torch.sigmoid(pred) 0.5).float() pred pred.contiguous().view(-1) target target.contiguous().view(-1) intersection (pred * target).sum() return (2.0 * intersection smooth) / (pred.sum() target.sum() smooth) def iou_score(pred, target, smooth1.0): pred (torch.sigmoid(pred) 0.5).float() pred pred.contiguous().view(-1) target target.contiguous().view(-1) intersection (pred * target).sum() union pred.sum() target.sum() - intersection return (intersection smooth) / (union smooth)评估的时候threshold设0.5 即sigmoid输出大于0.5就算预测为病灶这个阈值在比赛中可能会调但这里就按默认的0.5训练循环加了ReduceLROnPlateau 验证loss不降了就自动降低学习率:def run_training(model, train_loader, test_loader, epochs20, lr1e-3, devicecpu): model model.to(device) optimizer torch.optim.Adam(model.parameters(), lrlr) scheduler torch.optim.lr_scheduler.ReduceLROnPlateau( optimizer, modemin, factor0.5, patience5 ) history {train_loss: [], test_loss: [], dice: [], iou: []} best_dice 0.0 for epoch in range(1, epochs 1): model.train() train_loss 0 for imgs, masks in train_loader: imgs, masks imgs.to(device), masks.to(device) optimizer.zero_grad() outputs model(imgs) loss bce_dice_loss(outputs, masks) loss.backward() optimizer.step() train_loss loss.item() train_loss / len(train_loader) model.eval() test_loss, dice, iou 0, 0, 0 with torch.no_grad(): for imgs, masks in test_loader: imgs, masks imgs.to(device), masks.to(device) outputs model(imgs) loss bce_dice_loss(outputs, masks) test_loss loss.item() dice dice_coefficient(outputs, masks).item() iou iou_score(outputs, masks).item() test_loss / len(test_loader) dice / len(test_loader) iou / len(test_loader) scheduler.step(test_loss) history[train_loss].append(train_loss) history[test_loss].append(test_loss) history[dice].append(dice) history[iou].append(iou) print(fEpoch {epoch:2d} | Train: {train_loss:.4f} | Test: {test_loss:.4f} | Dice: {dice:.4f} | IoU: {iou:.4f}) if dice best_dice: best_dice dice torch.save(model.state_dict(), best_unet.pth) return history可视化部分 matplotlib的中文显示问题困扰了我一下午:import matplotlib.pyplot as plt plt.rcParams[font.sans-serif] [SimHei] # 这行不加中文全是□□□□ plt.rcParams[axes.unicode_minus] False # 负号正常显示 def plot_training_history(history): epochs range(1, len(history[train_loss]) 1) fig, axes plt.subplots(1, 2, figsize(14, 5)) axes[0].plot(epochs, history[train_loss], b-o, label训练损失, markersize3) axes[0].plot(epochs, history[test_loss], r-s, label验证损失, markersize3) axes[0].set_xlabel(Epoch) axes[0].set_ylabel(Loss) axes[0].set_title(训练与验证损失曲线) axes[0].legend() axes[0].grid(True, alpha0.3) axes[1].plot(epochs, history[dice], g-^, labelDice 系数, markersize3) axes[1].plot(epochs, history[iou], m-d, labelIoU 分数, markersize3) axes[1].set_xlabel(Epoch) axes[1].set_ylabel(Score) axes[1].set_title(分割评估指标曲线) axes[1].legend() axes[1].grid(True, alpha0.3) plt.tight_layout() plt.savefig(output_figures/training_history.png, dpi150, bbox_inchestight) plt.close()SimHei这个字体吧其实不是什么高级知识点但不知道就是不知道一搜一堆但没搜到之前看着满屏的方框真的心态爆炸那个axes.unicode_minus也要设不然负号显示成乱码分割结果的可视化:def plot_predictions(imgs_list, masks_list, preds_list): n len(imgs_list) fig, axes plt.subplots(n, 3, figsize(10, 3*n)) if n 1: axes axes.reshape(1, -1) for i in range(n): img imgs_list[i].squeeze().numpy() mask masks_list[i].squeeze().numpy() pred preds_list[i].squeeze().numpy() axes[i, 0].imshow(img, cmapgray) axes[i, 0].set_title(f输入图像 {i1}) axes[i, 0].axis(off) axes[i, 1].imshow(mask, cmapReds, alpha1.0) axes[i, 1].set_title(f真实标签 {i1}) axes[i, 1].axis(off) axes[i, 2].imshow(pred, cmapReds, alpha1.0) axes[i, 2].set_title(f预测结果 {i1}) axes[i, 2].axis(off) plt.tight_layout() plt.savefig(output_figures/segmentation_results.png, dpi150, bbox_inchestight) plt.close()这里用Reds colormap来表示病灶区域红色越深表示模型越确定那里是病灶因为我们的任务是二分类所以直接用二值化的预测结果画图就可以了合成数据的可视化:def plot_synthetic_samples(dataset, num_samples6): fig, axes plt.subplots(2, num_samples, figsize(3*num_samples, 5)) indices np.random.choice(len(dataset), num_samples, replaceFalse) for i, idx in enumerate(indices): img, mask dataset[idx] img_np img.squeeze().numpy() mask_np mask.squeeze().numpy() axes[0, i].imshow(img_np, cmapgray) axes[0, i].set_title(f合成影像 {i1}) axes[0, i].axis(off) axes[1, i].imshow(mask_np, cmapReds) axes[1, i].set_title(f病灶标签 {i1}) axes[1, i].axis(off) plt.tight_layout() plt.savefig(output_figures/synthetic_data_samples.png, dpi150, bbox_inchestight) plt.close()然后run.py组装所有模块:import torch import numpy as np import random import os def set_seed(seed42): random.seed(seed) np.random.seed(seed) torch.manual_seed(seed) set_seed(42) device cuda if torch.cuda.is_available() else cpu print(f使用设备: {device}) from data_gen import get_dataloaders, MedicalDataset from model import UNet from train import run_training, get_sample_predictions from visualize import plot_synthetic_samples, plot_training_history, plot_predictions os.makedirs(output_figures, exist_okTrue) train_loader, test_loader get_dataloaders( batch_size4, train_ratio0.8, image_size64, num_samples200 ) plot_synthetic_samples(train_loader.dataset, num_samples6) model UNet(n_channels1, n_classes1, base_channels64) total_params sum(p.numel() for p in model.parameters()) print(f模型参数量: {total_params:,}) history, best_model_path run_training( model, train_loader, test_loader, epochs5, lr1e-3, devicedevice ) plot_training_history(history) best_model UNet(n_channels1, n_classes1, base_channels64) best_model.load_state_dict(torch.load(best_model_path, map_locationdevice)) best_model best_model.to(device) imgs_list, masks_list, preds_list get_sample_predictions( best_model, test_loader, device, num_samples4 ) plot_predictions(imgs_list, masks_list, preds_list) print(全部完成!)这里我设了随机种子保证每次跑出来的结果一样不然给导师演示的时候换了个数据结果不一样就尴尬了训练了5个epoch结果如下:Epoch 1 | Train Loss: 1.1348 | Test Loss: 1.0618 | Dice: 0.2795 | IoU: 0.1752Epoch 2 | Train Loss: 0.7597 | Test Loss: 0.5824 | Dice: 0.9560 | IoU: 0.9169Epoch 3 | Train Loss: 0.4153 | Test Loss: 0.2297 | Dice: 0.9995 | IoU: 0.9991Epoch 4 | Train Loss: 0.1989 | Test Loss: 0.2077 | Dice: 0.9578 | IoU: 0.9226Epoch 5 | Train Loss: 0.0850 | Test Loss: 0.0479 | Dice: 0.9997 | IoU: 0.9994看这个结果第一轮Dice只有0.27 基本等于随机猜因为病灶在图像中占比本来就不大随便预测全是背景反而Dice会很低到第二轮Dice直接跳到0.95 说明U-Net学得很快毕竟我的合成数据太规整了都是些椭圆圆的组合对于U-Net来说简直是小菜一碟不过如果换成真实医学影像比如CT的肝脏肿瘤分割那数据复杂得多噪声类型也不一样器官边缘也不清晰这个模型的Dice能有0.8就算很好了我的合成数据本质上就是一个玩具级的东西但作为毕设展示算法流程是够了训练历史曲线:从曲线上看损失函数下降非常快到第3轮就基本收敛了 Dice和IoU在第3轮达到0.99以上第4轮稍微降了一点然后又回升这其实说明模型在第3轮就已经过拟合了不过在我这个场景下验证集也是合成数据所以过拟合也无所谓分割结果对比:这4个测试样本的预测结果几乎跟真实标签一模一样红色区域基本重合当然这在真实数据上是不可能的事情所以我文章前面就说了合成数据就是图一乐真正的医学影像分割要考虑的东西多得多比如数据增强、归一化、后处理、CRF条件随机场等等合成数据样本预览:上面一行是生成的影像下面一行是病灶标签可以看到病灶都在器官内部亮度对比明显背景有噪声整体渐变的视觉效果其实看着还行有点CT那个意思了总结一下我踩的坑:PyTorch版本问题 CPU模式比GPU慢十倍但也能忍关键是要用对版本不然各种警告数据生成的索引类型问题 bool和float32搞混了报错数据类型不一致看不见的bug最致命 U-Net的跳跃连接尺寸对齐差一个像素就崩这个真的搞了好久 Dice Loss的smooth参数不能省否则除零错误 matplotlib中文显示一行rcParams搞定但是不知道就是不知道查了一下午代码里我用了SimHei字体就是那个黑体在Windows上一般都有如果linux或者mac上没有需要自己装不过大部分情况下Windows的matplotlib都自带了 SimHei看起来比较正式适合毕业论文图表最后说一句这个项目我在本机CPU上跑了5个epoch大概花了2分钟如果换成GPU估计十几秒就完事了但是没GPU也只能将就了真实医学影像的话建议用更大的输入尺寸至少256x256 因为病灶可能很小分辨率低了根本看不清 base_channels也可以加大到128 总之就是堆算力但毕设的话像我这样搞个demo给导师演示一下就过了完整源码链接在上面需要的自己拿有问题也别问我我也是边学边做踩坑踩过来的

记得给AI应用项目加上了AI Gateway

第一件事：模型切换太折腾了某个智能客服项目，一开始用的是GPT-4，效果确实好，但太贵了。后来听说Claude便宜点，想切过去试试。结果打开代码一看，头都大了： OpenAI的调用是 openai.chat.completions.create Claude的是 anthropic.messages.create 参数结构也不一样，Cla…...

2026/6/1 1:06:50 阅读更多 →

基于PLC控制的立体车库系统设计(设计源文件+万字报告+讲解)（支持资料、图片参考_降重降ai）_文章底部可以扫码

基于PLC控制的立体车库系统设计摘要：随着我国市场经济的不断发展，市场上各要素的配置愈加充分合理，城市人民的生活水平的提高了不少，在企业的竞争中，谁能降低成本、提高效益，谁就在市场中占有利地位。城市…...

2026/6/1 1:01:31 阅读更多 →

避坑指南：高通SA8255P平台Nor Flash烧录全流程与‘先擦后写’的血泪教训

高通SA8255P平台Nor Flash烧录避坑指南：从"先擦后写"到系统启动异常的深度解析在汽车电子与高性能计算领域，高通SA8255P平台凭借其强大的异构计算架构和功能安全特性，正逐渐成为智能座舱与自动驾驶系统的核心处理器。然而&#xff…...

2026/6/1 0:57:57 阅读更多 →

智能水印工具终极指南：如何批量为照片添加专业相机参数水印

智能水印工具终极指南：如何批量为照片添加专业相机参数水印【免费下载链接】semi-utils 一个批量添加相机机型和拍摄参数的工具，后续「可能」添加其他功能。项目地址: https://gitcode.com/gh_mirrors/se/semi-utils 还在为数百张照片手动添加相…...

2026/5/31 0:06:17 阅读更多 →

Go语言可扩展性设计：水平扩展

Go语言可扩展性设计：水平扩展1. 引言在互联网时代，业务的快速增长对系统的扩展性提出了极高的要求。水平扩展（Scale Out）作为分布式系统的核心设计理念，能够通过增加服务器节点来提升系统的整体处理能力。与垂直扩展&…...

2026/6/1 0:54:56 阅读更多 →

Claude Code Tool System 与 Permission 机制深度解析

代码解析 Claude Code Tool System 与 Permission 机制深度解析 0. 背景与定位 Claude Code 是一个运行在终端的 Agentic 编码工具，其核心能力来自工具系统（Tool System）——AI 通过调用工具与文件系统、Shell、网络、子 Agent 交互。而**权…...

2026/5/31 0:10:50 阅读更多 →