@bachelorsthesis{Komon2022,
  title =      "Distributed Surface Reconstruction",
  author =     "Patrick Komon",
  year =       "2022",
  abstract =   "As the accessibility of high-quality 3D scans increases,
               processing the scanned data becomes more challenging. 3D
               scanners obtain very large, unstructured sets of points, so
               called point clouds. To be able to use the data in a
               meaningful way it is necessary to reconstruct the surface of
               the scanned object from the point cloud, resulting in a 3D
               model. This is called the problem of 3D surface 
               reconstruction. Processing very large point clouds (in a
               reasonable time) is necessary in order to keep up with ever
               increasing scanning technology. In this thesis, we
               construct, implement and evaluate a distributed surface
               reconstruction algorithm called DistributedBallFilter. It is
               a distributed-memory parallel version of the recently
               developed BallFilter algorithm [Ohr22]. Firstly, the input
               point cloud is subdivided into chunks called tiles using a
               3D grid. To ensure the correctness of the results, tiles are
               slightly overlapping on their borders. After splitting the
               input, each tile can be processed independently from each
               other. The tiles are assigned and distributed to a number of
               p processes. The assignment of tiles to processes is
               calculated using longest-processing-time-first list
               scheduling. Then all processes reconstruct the 3D surface of
               all their assigned tiles in parallel. After all tiles are
               processed, the result is merged back together into a single
               3D model, containing the reconstructed surface of the entire
               input point cloud. The asymptotic run time complexity is O(n
               log n) in the worst case (same as BallFilter) and O(n + n
               log n p ) in the best case, depending on the distribution of
               points within the input data. Furthermore, we implemented
               the algorithm in C++. The input splitting is run on a GPU
               using CUDA and discussed thoroughly in its dedicated paper
               [Bru22]. For each tile a single file is output, which is
               communicated to each process via a distributed file system.
               The MPI standard is used for sending all local results to a
               single process, which is also responsible for merging and
               outputting the final 3D model. Finally we executed the
               algorithm on the VSC3+ cluster, a high-performance cluster
               based in Vienna. It was run against several test data sets.
               We visualize the results and analysed the behavior of the
               running time when scaling the number of processes as well as
               the input size. In our tests, DistributedBallFilter managed
               to be up to around five times faster than BallFilter
               depending on the number of nodes used and the input size.
               The largest observed speedup was by a factor of 5.89
               compared to BallFilter.",
  month =      sep,
  pages =      "2",
  address =    "Favoritenstrasse 9-11/E193-02, A-1040 Vienna, Austria",
  school =     "Research Unit of Computer Graphics, Institute of Visual
               Computing and Human-Centered Technology, Faculty of
               Informatics, TU Wien ",
  URL =        "https://www.cg.tuwien.ac.at/research/publications/2022/Komon2022/",
}