# Block Mmad Pingpong

catlass is CANN's operator template library. It provides template examples for high-performance matrix multiplication and related fused operators on NPU. Project address: https://gitcode.com/cann/catlass

Code location

## Functional description

A partial specialization of `BlockMmad` that implements block-level mmad computation. It performs no bias computation, is not asynchronous, and is not a TLA implementation.

## Dispatch policy

```cpp
// Now ENABLE_UNIT_FLAG_ must be false when input element is int8
template <bool ENABLE_UNIT_FLAG_ = false>
struct MmadAtlasA2Pingpong : public MmadAtlasA2 {
    static constexpr uint32_t STAGES = 2;
    static constexpr bool ENABLE_UNIT_FLAG = ENABLE_UNIT_FLAG_;
};
```

When `ENABLE_UNIT_FLAG_` is `true`, L0C can be moved out and written at the same time, which improves pipeline parallelism.

## Usage example

Block assembly (see basic_matmul):

```cpp
constexpr bool enableUnitFlag = true;
using MmadDispatchPolicy = Gemm::MmadAtlasA2Pingpong<enableUnitFlag>;
using L1TileShape = GemmShape<128, 256, 256>;
using L0TileShape = GemmShape<128, 256, 64>;
using AType = Gemm::GemmType<half, LayoutA>;
using BType = Gemm::GemmType<half, LayoutB>;
using CType = Gemm::GemmType<half, LayoutC>;

using BlockMmad = Gemm::Block::BlockMmad<MmadDispatchPolicy, L1TileShape, L0TileShape, AType, BType, CType>;
```

Block instantiation (see basic_matmul), inside the kernel's `void operator()<AscendC::AIC>` function:

```cpp
Arch::Resource<ArchTag> resource;
BlockMmad blockMmad(resource);
```

Block execution (see basic_matmul), inside the kernel's `void operator()<AscendC::AIC>` function:

```cpp
blockMmad(gmA[gmOffsetA],    // start address in GM of this block of matrix A
          params.layoutA,    // layout of matrix A in GM
          gmB[gmOffsetB],    // start address in GM of this block of matrix B
          params.layoutB,    // layout of matrix B in GM
          gmC[gmOffsetC],    // start address in GM of this block of matrix C
          params.layoutC,    // layout of matrix C in GM
          actualBlockShape); // actual shape of the block
```

## Constraints

The template parameter `BiasType_` is not actually used; bias computation is not supported.
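The way a dispatch policy selects a `BlockMmad` partial specialization can be illustrated with a small host-only mock. This is a minimal sketch, not catlass code: `MockBlockMmad` and `BufferSchedule` are hypothetical names introduced here, the empty `MmadAtlasA2` base is a stand-in for the real one, and the alternating buffer index is only an assumption about what a two-stage ping-pong (`STAGES = 2`) typically implies.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for the real base policy tag (assumption: an empty tag is enough here).
struct MmadAtlasA2 {};

// Same shape as the policy shown in this document.
template <bool ENABLE_UNIT_FLAG_ = false>
struct MmadAtlasA2Pingpong : public MmadAtlasA2 {
    static constexpr uint32_t STAGES = 2;
    static constexpr bool ENABLE_UNIT_FLAG = ENABLE_UNIT_FLAG_;
};

// Hypothetical primary template, left undefined so that an
// unsupported policy fails at compile time.
template <class DispatchPolicy>
struct MockBlockMmad;

// Partial specialization picked whenever the policy is MmadAtlasA2Pingpong<...>.
template <bool ENABLE_UNIT_FLAG_>
struct MockBlockMmad<MmadAtlasA2Pingpong<ENABLE_UNIT_FLAG_>> {
    using Policy = MmadAtlasA2Pingpong<ENABLE_UNIT_FLAG_>;
    static constexpr uint32_t STAGES = Policy::STAGES;
    static constexpr bool ENABLE_UNIT_FLAG = Policy::ENABLE_UNIT_FLAG;

    // Returns the buffer index each loop iteration would use: 0, 1, 0, 1, ...
    std::vector<uint32_t> BufferSchedule(uint32_t loops) const {
        std::vector<uint32_t> schedule;
        uint32_t bufferId = 0;
        for (uint32_t i = 0; i < loops; ++i) {
            schedule.push_back(bufferId);
            bufferId = (bufferId + 1) % STAGES;  // flip between the two stages
        }
        return schedule;
    }
};

// The policy carries its settings as compile-time constants.
static_assert(MmadAtlasA2Pingpong<>::ENABLE_UNIT_FLAG == false, "default is false");
static_assert(MockBlockMmad<MmadAtlasA2Pingpong<true>>::STAGES == 2, "two stages");
```

Because the settings are compile-time constants of the policy type, changing `enableUnitFlag` in the assembly step selects a different instantiation without touching the kernel code that calls the block.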