[TOC]
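TD (Temporal-Difference) Algorithms
Model-Based Reinforcement Learning
Planning and Learning
DP (Dynamic Programming) Algorithms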
[TOC]
Deploying the Ceph Quincy Release
[TOC]
gym
Multi-Armed Bandits
[TOC]
Multi-Armed Bandits
[TOC]
Attention Mechanism
[TOC]
Common Linux Usage
[TOC]
10. How the Transformer Works
This article is translated from Ketan Doshi's blog series "Transformers Explained Visually", which consists of the following parts:
- Overview of Functionality: Components of the architecture and its behavior during training and inference
- How it works, step-by-step: How data flows and what computations are performed, including matrix representations
- Multi-head Attention: Inner workings of the Attention module throughout the Transformer
- Why Attention Boosts Performance: How does Attention capture the relationships between words in a sentence?
http://fancyerii.github.io/2019/03/09/transformer-illustrated/
[TOC]