Preface
This article was first published on the WeChat official account NewBeeNLP.
This installment of the "modified Transformers" series focuses on the discussion and optimization of positional information in the original model (a short sketch of the original sinusoidal encoding follows the paper list below):
Self-Attention with RPR from Google, NAACL 2018
Self-Attention with SPR from Tencent, EMNLP 2019
TENER from FDU
Encoding Word Order in Complex Embedding, ICLR 2020
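Before diving into these papers, it helps to recall the baseline they all modify: the fixed sinusoidal positional encoding of the original Transformer (Vaswani et al., 2017). The NumPy sketch below is only illustrative; the function name and parameters are assumptions for this post and do not come from any of the papers above.

import numpy as np

def sinusoidal_position_encoding(max_len, d_model):
    # Original Transformer encoding (assumes d_model is even):
    #   PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    #   PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    positions = np.arange(max_len)[:, None]                            # (max_len, 1)
    div_term = np.power(10000.0, np.arange(0, d_model, 2) / d_model)   # (d_model / 2,)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(positions / div_term)
    pe[:, 1::2] = np.cos(positions / div_term)
    return pe

# The encoding is simply added to the token embeddings before the first layer, e.g.:
# x = token_embeddings + sinusoidal_position_encoding(seq_len, d_model)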
1、Self-Attention with RPR
Lite Transformer with Long-Short Range Attention
@inproceedings{Wu2020LiteTransformer,
title={Lite Transformer with Long-Short Range Attention},
author={Zhanghao Wu* and Zhijian Liu* and Ji Lin and Yujun Lin and Song Han},
booktitle={International Conference on Learning Represe