Model Parallelism
Sharded Data Parallelism
A combination of data parallelism and the ZeRO (Zero Redundancy Optimizer) strategy: each worker still processes its own batch of data, but the model weights are partitioned (sharded) across workers instead of being fully replicated on every one.
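The idea can be illustrated with a toy, single-process simulation. This is only a sketch under stated assumptions, not how any particular library implements it: the "workers" are plain Python lists, the model is a small linear regressor, and every function name (`all_gather`, `reduce_scatter`, `sharded_step`, `local_gradient`) is hypothetical and chosen for illustration. Each worker stores only its own slice of the weights, the full weights are gathered just for the computation, and the averaged gradients are scattered back so that each worker updates only its shard.

```python
# Toy simulation of sharded data parallelism in one process.
# No real devices or networking; all names are hypothetical.

def all_gather(shards):
    """Concatenate every worker's weight shard into the full weight vector."""
    return [w for shard in shards for w in shard]

def local_gradient(weights, batch):
    """Gradient of mean squared error for a linear model on one worker's batch."""
    grad = [0.0] * len(weights)
    for x, y in batch:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xi in enumerate(x):
            grad[j] += 2.0 * err * xi / len(batch)
    return grad

def reduce_scatter(grads, shard_size):
    """Average the workers' gradients, then give each worker only its own slice."""
    n = len(grads)
    avg = [sum(g[j] for g in grads) / n for j in range(len(grads[0]))]
    return [avg[k * shard_size:(k + 1) * shard_size] for k in range(n)]

def sharded_step(shards, batches, lr):
    """One training step: gather weights, compute per-worker grads, update shards."""
    full = all_gather(shards)                           # full weights exist only for compute
    grads = [local_gradient(full, b) for b in batches]  # data parallel: one batch per worker
    grad_shards = reduce_scatter(grads, len(shards[0]))
    return [[w - lr * g for w, g in zip(s, gs)]
            for s, gs in zip(shards, grad_shards)]

# Two workers, four weights: each worker permanently stores only two of them.
shards = [[0.0, 0.0], [0.0, 0.0]]
batches = [
    [([1.0, 0.0, 2.0, 0.0], 1.0), ([0.0, 1.0, 0.0, 1.0], 2.0)],  # worker 0's data
    [([1.0, 1.0, 0.0, 0.0], 3.0), ([0.0, 0.0, 1.0, 1.0], 4.0)],  # worker 1's data
]
new_shards = sharded_step(shards, batches, lr=0.1)

# Reference: the same step on a single worker holding all weights and all data.
full_batch = batches[0] + batches[1]
ref_grad = local_gradient([0.0] * 4, full_batch)
reference = [0.0 - 0.1 * g for g in ref_grad]
```

With equally sized batches, the sharded update matches the single-worker reference up to floating-point rounding, which is the point of the scheme: the arithmetic of data parallelism is preserved while each worker keeps only a fraction of the weights. Real systems typically gather weights layer by layer rather than all at once and may shard gradients and optimizer state as well.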