Model Parallelism

Here are some common methods for achieving model parallelism across multiple GPUs. Nowadays, data parallelism is used more often; the other two methods are suited to training a large network on GPUs with limited VRAM.
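To make the basic idea concrete, here is a toy pure-Python sketch of splitting a network by layer so that each device stores only its own weights and only activations move between devices. It is an illustration under stated assumptions, not any framework's API: the `Device` class, the `model_parallel_forward` function, and the device names are hypothetical, and plain lists stand in for GPU memory.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical stand-in for one GPU that holds a single layer."""
    name: str
    weights: list  # weight matrix as rows, shape (out_features, in_features)

    def forward(self, x):
        # Matrix-vector product followed by ReLU, using only this device's weights.
        return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
                for row in self.weights]


def model_parallel_forward(devices, x):
    # Only the activation vector is handed from one device to the next;
    # no device ever needs another device's weights.
    for device in devices:
        x = device.forward(x)
    return x


# Assumed toy network: layer 1 (2 -> 3) on one device, layer 2 (3 -> 1) on another.
gpu0 = Device("gpu0", [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
gpu1 = Device("gpu1", [[1.0, 1.0, 1.0]])

output = model_parallel_forward([gpu0, gpu1], [2.0, 3.0])
print(output)
```

Because each device keeps only a slice of the parameters, the per-device memory footprint shrinks, which is why this style suits GPUs with limited VRAM. The trade-off in this naive form is that the devices work one after another rather than at the same time.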