File name: ALBERT_presentation.pdf
  Category: Deep learning
  Development tool:
  File size: 2 MB
  Downloads: 0
  Upload date: 2019-10-20
  Provider: u0113*****
Detailed description: ALBERT: A Lite BERT for Language Understanding (the authors' presentation slides). Extracted slide content:

We are witnessing a big shift in the approach to natural language understanding in the last two years: full-network pre-training, which shares most of the parameters between pre-training and fine-tuning.

[Figure: BERT pre-training and fine-tuning. Pre-training runs the NSP and masked-LM heads over an unlabeled sentence A/B pair; fine-tuning reuses the same network for downstream tasks such as MNLI, NER, and SQuAD (start/end span prediction) on question/paragraph or question/answer pairs.]

Howard, Jeremy, and Sebastian Ruder. "Universal Language Model Fine-tuning for Text Classification." 2018.
Radford, Alec, et al. "Improving Language Understanding by Generative Pre-Training." 2018.
Devlin, Jacob, et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." 2018.

Full-network pre-training is the AlexNet of NLU.

[Figure: improvements on ImageNet over the years (2010-2015) shown next to improvements on RACE in the last two years: Gated AR (Oct. 2017), GPT (Jun. 2018), BERT (Nov. 2018), XLNet (Jun. 2019), RoBERTa (Jul. 2019), compared against Turker and human performance. RACE: an English reading comprehension exam for Chinese middle/high school students.]

Can we improve full-network pre-training models similar to what the computer vision community did for AlexNet?

BERT uses two self-supervised losses. Masked LM: recover the randomly masked tokens. NSP: predict whether the two input sentences are next to each other or not.

[Figure: the BERT encoder. Token embeddings plus positional encoding for the unlabeled sentence A/B pair feed L stacked Transformer blocks (multi-head attention, add & norm, feed-forward, add & norm); the [CLS] output feeds the NSP head and the masked positions feed the masked-LM heads.]

What happens after AlexNet? The revolution of depth.

[Figure: ImageNet classification top-5 error (%) by ILSVRC year: shallow models 28.2 (2010) and 25.8 (2011), AlexNet with 8 layers 16.4 (2012), 8 layers 11.7 (2013), VGG with 19 layers 7.3 and GoogLeNet with 22 layers 6.7 (2014), ResNet with 152 layers 3.57 (2015). Slide credit: Kaiming He.]

For BERT, increasing the depth (#L) and width (#H) of the network leads to better performance:

  Hyperparams         Dev set accuracy
  #L    #H   #A   LM (ppl)   MNLI-m   MRPC   SST-2
   3   768   12     5.84      77.9     79.8   88.4
   6   768    3     5.24      80.6     82.2   90.7
   6   768   12     4.68      81.9     84.8   91.3
  12   768   12     3.99      84.4     86.7   92.9
  12  1024   16     3.54      85.7     86.9   93.3
  24  1024   16     3.23      86.6     87.8   93.7

Devlin, Jacob, et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." 2018.

Is having a better NLU model as easy as increasing the model size?
(Automatically generated by the system; the description can be used to preview the content before downloading.)
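The table above ends by asking whether better NLU is simply a matter of increasing model size. As a rough sense of what that scaling costs, the short Python sketch below (added here for illustration, not part of the slides) estimates parameter counts for the table's configurations; the 30,522-token vocabulary and the ~12*H^2 weights-per-layer approximation are simplifying assumptions that ignore biases, LayerNorm, and position/segment embeddings.

    # Rough parameter-count estimate for a BERT-style encoder with #L layers
    # and hidden size #H. Counts only the token embedding matrix and the
    # dominant per-layer weights: ~4*H^2 for the multi-head attention
    # projections plus ~8*H^2 for the feed-forward block.
    VOCAB_SIZE = 30_522  # BERT's WordPiece vocabulary (assumption for this sketch)

    def approx_params(num_layers: int, hidden_size: int) -> int:
        embeddings = VOCAB_SIZE * hidden_size     # token embedding matrix
        per_layer = 12 * hidden_size ** 2         # attention + feed-forward weights
        return embeddings + num_layers * per_layer

    for L, H in [(3, 768), (6, 768), (12, 768), (12, 1024), (24, 1024)]:
        print(f"#L={L:2d}  #H={H:4d}  ->  ~{approx_params(L, H) / 1e6:.0f}M parameters")

Under these assumptions the #L=12, #H=768 row lands near BERT-base's roughly 110M parameters and the #L=24, #H=1024 row near BERT-large's roughly 340M, the scale at which the closing question, and ALBERT's parameter-reduction techniques, become relevant.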

