# Parallel Layer Replacement Code Examples

cannbot-skills: CANNBot is a family of agents for CANN development that aims to improve development efficiency; this repository provides reusable Skills modules for it. Project address: https://gitcode.com/cann/cannbot-skills

Author's note: parts of this article were generated with AI assistance (AIGC) and are for reference only.

> **API differences:** In the current repository, `ColumnParallelLinear` / `RowParallelLinear` / `QKVParallelLinear` take `tp_size: int` + `tp_rank: int` parameters and do not accept `tp_group`. The `tp_group` form used in the examples below corresponds to the refactored API; in the current repository, replace it with `tp_rank=dist.get_rank(self.hccl_comm_dict["xxx_group"])`. `VocabParallelEmbedding` uses `tp_size` + `tp_rank` in both versions.

## Attention layer

When `attn_tp_size > 1`:

| Original component | Replace with | Communication group | Notes |
| --- | --- | --- | --- |
| QKV Linear | `QKVParallelLinear` | `attn_tp_group` | Column split; Q/K/V are partitioned by head count |
| O Linear | `RowParallelLinear` | `attn_tp_group` | Row split; includes AllReduce |

```python
from module.linear import QKVParallelLinear, RowParallelLinear

# QKV projection
self.qkv_proj = QKVParallelLinear(
    hidden_size=config.hidden_size,
    num_heads=config.num_attention_heads,
    num_key_value_heads=config.num_key_value_heads,
    head_dim=config.head_dim,
    tp_group=self.attn_tp_group,
    tp_size=self.attn_tp_size,
)

# O projection
self.o_proj = RowParallelLinear(
    config.hidden_size,
    config.hidden_size,
    tp_group=self.attn_tp_group,
    tp_size=self.attn_tp_size,
)
```

### Independent o_proj_tp_size configuration

When `o_proj_tp_size ≠ attn_tp_size` (for example, in MLA models):

```python
self.o_proj = RowParallelLinear(
    config.hidden_size,
    config.hidden_size,
    tp_group=self.oproj_tp_group,  # dedicated communication group
    tp_size=self.o_proj_tp_size,
)
```

## Dense FFN layer

When `dense_tp_size > 1`:

| Original component | Replace with | Communication group | Notes |
| --- | --- | --- | --- |
| Gate Linear | `ColumnParallelLinear` | `dense_tp_group` | Column split |
| Up Linear | `ColumnParallelLinear` | `dense_tp_group` | Column split |
| Down Linear | `RowParallelLinear` | `dense_tp_group` | Row split; includes AllReduce |

```python
from module.linear import ColumnParallelLinear, RowParallelLinear

self.gate_proj = ColumnParallelLinear(
    config.hidden_size,
    config.intermediate_size,
    tp_group=self.dense_tp_group,
    tp_size=self.dense_tp_size,
)
self.up_proj = ColumnParallelLinear(
    config.hidden_size,
    config.intermediate_size,
    tp_group=self.dense_tp_group,
    tp_size=self.dense_tp_size,
)
self.down_proj = RowParallelLinear(
    config.intermediate_size,
    config.hidden_size,
    tp_group=self.dense_tp_group,
    tp_size=self.dense_tp_size,
)
```

## Embedding / LMHead

When `embed_tp_size > 1` or `lmhead_tp_size > 1`:

```python
import torch
import torch.distributed as dist

from module.linear import VocabParallelEmbedding, ColumnParallelLinear

# Embedding (parameters are tp_size + tp_rank; there is no tp_group)
self.embed_tokens = VocabParallelEmbedding(
    config.vocab_size,
    config.hidden_size,
    self.padding_idx,
    torch.bfloat16,
    tp_size=self.embed_tp_size,
    tp_rank=dist.get_rank(self.hccl_comm_dict["embed_tp_group"]) if self.embed_tp_size > 1 else 0,
)

# LMHead (current repository: tp_size + tp_rank, same as Embedding)
self.lm_head = ColumnParallelLinear(
    config.hidden_size,
    config.vocab_size,
    tp_size=self.lmhead_tp_size,
    tp_rank=dist.get_rank(self.hccl_comm_dict["lmhead_tp_group"]) if self.lmhead_tp_size > 1 else 0,
)
```

## Data rearrangement between modules

When adjacent modules use different TP degrees, the activations must be regathered or rescattered at the boundary:

```python
# Embed (embed_tp=16) → Attention (attn_tp=1)
dist.all_gather_into_tensor(full_input, embed_output, group=embed_tp_group)

# Input and output of the Dense FFN (dense_tp=8)
dist.all_gather_into_tensor(x_output, x, group=dense_tp_group)  # gather the input
# ... FFN computation ...
dist.reduce_scatter_tensor(mlp_res, down_proj, group=dense_tp_group)  # scatter the output
```

Reference implementations:

- `cann-recipes-infer/models/longcat-flash/models/modeling_longcat_flash.py`: search for `all_gather_into_tensor` and `reduce_scatter_tensor`
- `cann-recipes-infer/models/deepseek_r1/models/modeling_deepseek.py`: search for the same two calls
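The gather-then-scatter pattern that `dist.all_gather_into_tensor` and `dist.reduce_scatter_tensor` implement around the Dense FFN can be hard to picture without devices at hand. The following is a minimal pure-Python sketch of those semantics, not the repository's implementation: plain lists stand in for tensors, and the helper names `all_gather` and `reduce_scatter`, the 4-rank group, and the `weight_slices` values are all hypothetical choices made for illustration.

```python
from typing import List


def all_gather(shards: List[List[int]]) -> List[List[int]]:
    """Simulate an all-gather: every rank receives all shards concatenated in rank order."""
    full = [value for shard in shards for value in shard]
    return [list(full) for _ in shards]


def reduce_scatter(partials: List[List[int]]) -> List[List[int]]:
    """Simulate a sum reduce-scatter: rank r keeps the r-th chunk of the elementwise sum."""
    world_size = len(partials)
    length = len(partials[0])
    if any(len(p) != length for p in partials):
        raise ValueError("all ranks must contribute inputs of the same length")
    if length % world_size != 0:
        raise ValueError("input length must be divisible by the group size")
    summed = [sum(column) for column in zip(*partials)]
    chunk = length // world_size
    return [summed[r * chunk:(r + 1) * chunk] for r in range(world_size)]


# Hypothetical toy setup: 4 ranks, each holding 2 values of the activation.
shards = [[0, 1], [2, 3], [4, 5], [6, 7]]

# Input gather: afterwards every rank sees the full activation.
gathered = all_gather(shards)

# Stand-in for a row-parallel down projection: rank r produces a partial
# result scaled by its own weight slice; the slices sum to 10.
weight_slices = [1, 2, 3, 4]
partials = [[w * x for x in gathered[r]] for r, w in enumerate(weight_slices)]

# Output scatter: each rank ends up with its own chunk of the summed result.
scattered = reduce_scatter(partials)

print(gathered[0])
print(scattered)
```

Each rank finishes with exactly its original slice of the activation multiplied by 10, which is what a full (unsharded) projection with weight 10 would have produced. This is why the row-parallel output can be scattered straight back to the sharded layout instead of being all-reduced and then re-split.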