chap10 Attention Mechanisms (Transformer)

Published 2021-12-26, updated 2021-12-26

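Layer norm normalizes within each sentence, over all of that sentence's tokens; batch norm normalizes the d-th feature across all sentences in the batch.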
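
The difference can be sketched in plain Python on a toy batch indexed as `x[b][t][d]` (sentence, token, feature). This is a minimal illustration under stated assumptions, not a reference implementation: it reads `d` as a feature index, it omits the learnable scale and shift, and the helper names `standardize`, `layer_norm`, and `batch_norm` are made up for this example. Layer norm here takes its statistics over a whole sentence, as the note describes; many implementations instead normalize each token's feature vector on its own.

```python
import math

def standardize(values, eps=1e-5):
    """Shift and scale a flat list of numbers to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def layer_norm(x, eps=1e-5):
    """x[b][t][d]: normalize each sentence b over all of its tokens and features."""
    out = []
    for sentence in x:
        d_model = len(sentence[0])
        flat = standardize([v for token in sentence for v in token], eps)
        # Reshape the flat list back to (seq_len, d_model).
        out.append([flat[i:i + d_model] for i in range(0, len(flat), d_model)])
    return out

def batch_norm(x, eps=1e-5):
    """x[b][t][d]: normalize each feature d over all sentences and tokens."""
    d_model = len(x[0][0])
    # One standardized column per feature, in (sentence, token) order.
    columns = [
        standardize([token[d] for sentence in x for token in sentence], eps)
        for d in range(d_model)
    ]
    out, row = [], 0
    for sentence in x:
        normed = []
        for _ in sentence:
            normed.append([columns[d][row] for d in range(d_model)])
            row += 1
        out.append(normed)
    return out

# Toy batch: 2 sentences, 2 tokens each, 3 features per token.
x = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]],
]
ln = layer_norm(x)  # each sentence uses only its own statistics
bn = batch_norm(x)  # each feature uses statistics from the whole batch
```

One consequence worth noting: the layer-norm output for a sentence does not change when other sentences are added to or removed from the batch, while the batch-norm output does, because its mean and variance are shared across the batch.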