We have now covered everything planned for this series. In this section, we’ll briefly summarize what we have discussed and outline the plans for the future of the series.

Summary

Over the ten sections of this tutorial, we worked our way up from low-level building blocks (tensors) to high-level ones (modules). In detail, the structure looks like this:

Tensor operations (Sec 1, 2)
Tensor-wise operations (Sec 3)
Module basics (Sec 4)
Implementation in pure Python (Sec 5, ResNet)
Implementation in CUDA (Sec 6, 7, 8, 9)

Conclusion

From this tutorial, we know that a model is composed of nn.Modules. We implement each module’s forward() function with tensor-wise operations to perform the forward pass.
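As a reminder of what this looks like in practice, here is a minimal sketch. The class name TinyMLP and its layer sizes are hypothetical and chosen only for illustration; they are not taken from the earlier sections.

```python
import torch
from torch import nn


class TinyMLP(nn.Module):
    """Hypothetical two-layer network, used only for illustration."""

    def __init__(self, in_features: int, hidden: int, out_features: int):
        super().__init__()
        # Sub-modules registered here are tracked by the parent nn.Module.
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The forward pass is just a chain of tensor-wise operations.
        h = torch.relu(self.fc1(x))
        return self.fc2(h)


model = TinyMLP(8, 16, 4)
x = torch.randn(2, 8)   # a batch of 2 samples with 8 features each
y = model(x)            # calls forward() under the hood
y.sum().backward()      # autograd computes the gradients; no manual backward
```

Note that we never wrote a backward pass: because forward() only uses built-in tensor operations, autograd handles the gradients.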

PyTorch is highly optimized, and the Python side is enough for most cases, so it is usually unnecessary to implement an algorithm in C++/CUDA. (Refer to Sec 9: our CUDA matrix multiplication operator is slower than PyTorch’s.) In addition, when we write in native Python, we don’t need to worry about the correctness of the gradient calculation, because autograd derives the backward pass for us.

In some rare cases, however, the forward() implementation is complicated and may contain Python for loops, so performance is low. Under such circumstances, you may consider writing the operator yourself. But keep in mind that:

You need to check that both the forward and backward propagations are correct;
You need to run benchmarks: is your operator really faster?
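The two checks above can be sketched as follows. This is only an illustration under stated assumptions: MySquare is a hypothetical hand-written operator (written in Python here to stand in for a CUDA one), correctness is checked with torch.autograd.gradcheck, and timing uses torch.utils.benchmark.Timer. The tensor sizes and the number of timing runs are arbitrary.

```python
import torch
from torch.autograd import Function, gradcheck
import torch.utils.benchmark as benchmark


class MySquare(Function):
    """Hypothetical hand-written operator: y = x * x with a manual backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # d(x^2)/dx = 2x, chained with the incoming gradient.
        return 2 * x * grad_output


# 1) Correctness: compare the forward result with a reference, and compare
#    the manual backward with numerical gradients (gradcheck wants float64).
x = torch.randn(4, 5, dtype=torch.double, requires_grad=True)
forward_ok = torch.allclose(MySquare.apply(x), x.detach() ** 2)
backward_ok = gradcheck(MySquare.apply, (x,), eps=1e-6, atol=1e-4)

# 2) Benchmark: time the custom operator against the built-in expression.
data = torch.randn(256, 256)
t_custom = benchmark.Timer(
    stmt="op(data)", globals={"op": MySquare.apply, "data": data}
).timeit(20)
t_builtin = benchmark.Timer(
    stmt="data * data", globals={"data": data}
).timeit(20)

print(f"forward ok: {forward_ok}, backward ok: {backward_ok}")
print(f"custom:  {t_custom.mean * 1e6:.1f} us per call")
print(f"builtin: {t_builtin.mean * 1e6:.1f} us per call")
```

For a real CUDA operator, the same pattern applies with the tensors moved to the GPU; only keep the custom operator if the benchmark shows a clear win.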

Therefore, manually writing an optimized CUDA operator is time-consuming and complicated, and it requires solid CUDA knowledge. But once you have written a good CUDA operator, your program can run many times faster. It is all about trade-offs.

Announce in Advance

Finally, let’s talk about what I plan to do next:

This series will not end here. From article 11 onward, we’ll talk about implementations of some well-known models.
As I said above, writing CUDA operators requires proficient CUDA knowledge, so I’ll set up a new series on how to write good CUDA programs: CUDA Medium Tutorials.

Future's blog

PyTorch Practical Hand-Written Modules Basics 10--Summary and Conclusion
