We are developing a novel deep learning method for 3D data. As in deep neural networks, we use automatic differentiation and gradient descent to optimise the parameters of a function (see YouTube).
Currently we work with PyTorch, a popular Python framework for deep learning. It provides automatic differentiation, but performance can be improved further by implementing certain functions in C++/CUDA. That is the task for this project.
I have already implemented one of the required functions and obtained a performance improvement of about 10x, but it can be improved further. Other functions would also benefit from being ported from PyTorch to CUDA. Such a port requires deriving and implementing the gradient computation, which I would help you with.
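
To give an idea of what "deriving and implementing the gradient computation" involves, here is a minimal sketch on the PyTorch side. The function y = exp(-0.5 * x^2) and the names GaussianFn and check_gradient are hypothetical stand-ins chosen for illustration; they are not the actual functions of this project. In a C++/CUDA port, the bodies of forward and backward would presumably call the compiled kernels instead of PyTorch operations.

```python
import torch


class GaussianFn(torch.autograd.Function):
    """Hypothetical stand-in: y = exp(-0.5 * x^2) with a hand-derived gradient."""

    @staticmethod
    def forward(ctx, x):
        y = torch.exp(-0.5 * x * x)
        # Store the tensors that the backward pass will need.
        ctx.save_for_backward(x, y)
        return y

    @staticmethod
    def backward(ctx, grad_out):
        x, y = ctx.saved_tensors
        # Derived by hand: dy/dx = -x * exp(-0.5 * x^2) = -x * y.
        # The chain rule multiplies this by the incoming gradient.
        return grad_out * (-x) * y


def check_gradient():
    # gradcheck compares the hand-written backward pass against
    # finite differences; it needs double precision inputs.
    x = torch.randn(8, 3, dtype=torch.double, requires_grad=True)
    return torch.autograd.gradcheck(GaussianFn.apply, (x,))
```

Calling check_gradient() is a quick way to catch mistakes in a derived gradient before any time is spent on optimising the kernel itself.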