Efficient Transformers
Universal Transformer
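An architecture whose depth is not fixed in advance: a per-position halting mechanism (Adaptive Computation Time) decides dynamically how many refinement steps each position receives. The Universal Transformer repeatedly applies the same shared-weight block, self-attention followed by a transition function, to every position until that position halts or a maximum number of steps is reached.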
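As a rough illustration of the halting loop, the following Python sketch uses one scalar state per position and a hypothetical `shared_step` in place of the real self-attention and transition block. The names `universal_loop`, `halt_w`, and `halt_b`, the toy weights, and the 0.99 threshold are assumptions made for illustration and are not taken from any particular implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_step(state, w=0.5, b=0.1):
    # Hypothetical stand-in for the shared block (self-attention + transition).
    # The same parameters w and b are reused at every depth step.
    return [math.tanh(w * s + b) for s in state]


def universal_loop(state, max_steps=8, threshold=0.99, halt_w=1.0, halt_b=0.0):
    """Apply shared_step repeatedly with per-position adaptive halting.

    Returns (output, steps): the halting-weighted average of intermediate
    states, and the number of updates each position received.
    """
    state = list(state)            # do not mutate the caller's list
    n = len(state)
    halting = [0.0] * n            # cumulative halting probability per position
    steps = [0] * n                # updates applied per position
    output = [0.0] * n             # weighted sum of intermediate states

    for t in range(max_steps):
        if all(h >= threshold for h in halting):
            break                  # every position has halted
        new_state = shared_step(state)
        for i in range(n):
            if halting[i] >= threshold:
                continue           # halted positions keep their state
            # Assumed halting unit: a sigmoid of the updated state.
            p = sigmoid(halt_w * new_state[i] + halt_b)
            is_last = t == max_steps - 1
            if halting[i] + p >= threshold or is_last:
                weight = 1.0 - halting[i]   # remainder, so weights sum to 1
                halting[i] = 1.0
            else:
                weight = p
                halting[i] += p
            output[i] += weight * new_state[i]
            state[i] = new_state[i]
            steps[i] += 1
    return output, steps


if __name__ == "__main__":
    out, steps = universal_loop([-3.0, 3.0])
    print("steps per position:", steps)
    print("outputs:", out)
```

In this toy setting the two positions stop after different numbers of steps, which is the point of the mechanism: positions that are "easy" under the halting unit consume fewer applications of the shared block, while `max_steps` bounds the total depth.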