Abstract Global optical flow estimation is the foundation stone for obtaining odometry which is used to enable aerial robot navigation. However, such a method has to be of low latency and high robustness whilst also respecting the size, weight, area and power (SWAP) constraints of the robot. A combination of cameras coupled with inertial measurement units (IMUs) has proven to be the best combination in order to obtain such low latency odometry on resource‐constrained aerial robots. Recently, deep learning approaches for visual inertial fusion have gained momentum due to their high accuracy and robustness. However, an equally noteworthy benefit for robotics of these techniques are their inherent scalability (adaptation to different sized aer...