Vision Transformers (ViT)
Image Patch Tokenization
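Process of splitting an image into non-overlapping fixed-size patches, typically 16x16 pixels, each of which is flattened and linearly projected into an embedding vector, so that the image becomes a sequence of tokens a transformer can process.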
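A minimal NumPy sketch of this step, assuming a 224x224 RGB image in (height, width, channels) layout and a 16x16 patch size. The names `patchify` and `embed_dim`, and the random projection matrix standing in for learned weights, are illustrative assumptions and not part of any particular library.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row (left to right, top to bottom).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size  # patch grid size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (gh * gw, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

# Dummy image: 224 x 224 pixels, 3 channels.
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
patches = patchify(image, patch_size=16)
print(patches.shape)  # (196, 768): 14 x 14 patches, each 16 * 16 * 3 values

# Linear projection to token embeddings. In a trained model this matrix
# is learned; random values are used here only to show the shapes.
embed_dim = 768  # hypothetical embedding width
rng = np.random.default_rng(0)
projection = rng.normal(size=(patches.shape[1], embed_dim)).astype(np.float32)
tokens = patches @ projection
print(tokens.shape)  # (196, 768)
```

Each row of `tokens` corresponds to one patch. A full model would typically also add positional information so the transformer knows where each patch came from; that step is omitted here.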