System Overview
Overview: AutoCrane uses visual and LIDAR data. Incoming visual sensor data is pre-processed in parallel by means of a highly distributed system architecture: for each imaging sensor (currently camera and 3D LIDAR), a dedicated process (a "sensor node") is provided. A sensor node processes incoming data at high bandwidth and sends only the processing results, at significantly lower bandwidth, into the synchronous control loop running in the core. Sensor processing is designed to be completed entirely within one control cycle, either in "free run" clocked by the respective sensor, or synchronized with the main loop in the central software.
The camera images are processed at 20 Hz by a multi-layer image processing chain, adapted to the perspective and the task, which extracts the required information. While the later steps of this chain also make use of classical machine vision algorithms, a neural network is always used in the first step. In this way, we achieve an accuracy and robustness with respect to lighting, weather, environmental characteristics and other changes that would not be practically achievable with classical methods alone.
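The two-stage idea (neural network first, classical machine vision afterwards) can be sketched as follows. This is a minimal illustration, not the production chain: a trivial brightness threshold stands in for the neural network, and all function names are ours.

```python
import numpy as np

def neural_first_stage(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the neural first stage: returns a per-pixel mask.
    (A simple brightness threshold replaces the real network here.)"""
    gray = frame.mean(axis=2)
    return (gray > 0.5).astype(np.uint8)

def classical_second_stage(mask: np.ndarray) -> dict:
    """Classical machine-vision step: extract a bounding box and area
    from the mask produced by the first stage."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return {"found": False}
    return {
        "found": True,
        "bbox": (xs.min(), ys.min(), xs.max(), ys.max()),
        "area_px": int(mask.sum()),
    }

def process_frame(frame: np.ndarray) -> dict:
    # The chain: neural network first, classical algorithms afterwards.
    return classical_second_stage(neural_first_stage(frame))
```

The point of the split is that the network absorbs the variability (lighting, weather, perspective), so the downstream classical steps can operate on a clean, normalized representation.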
The proprietary neural networks we use are based on the "All Convolutional Net" and "U-Net" architectures developed in Freiburg, Germany, which have proven themselves in widespread use worldwide. Using pre-trained networks "off the shelf" was ruled out by the unusual domain and specific requirements. Besides dealing with the unusual perspective and rotating objects, we pay special attention to achieving low inference times (typically at most 5-15 ms) and low data requirements (typically 20,000 images, compared to the 14 million images of the ImageNet dataset) for the networks and training algorithms.
The system uses LIDAR data to create height maps and to determine the position and shape of objects in conditions where visual information is insufficient. The height maps are generated automatically from incoming measurements (the 3D point cloud of a single scan) and continuously updated. They are part of the world model and are used for collision avoidance, path planning, and determining placement and pickup locations in the log yard. Known objects are recognized in the point clouds, and their 6D pose (position in space and orientation about all three spatial axes) can be determined.
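The height-map update can be sketched in a few lines: each incoming point cloud is rasterized onto a 2D grid, and each cell keeps the highest z value seen so far. This is a simplified sketch (grid origin at zero, no decay or clearing of stale cells); the cell size and function name are our assumptions.

```python
import numpy as np

def update_height_map(points: np.ndarray, height_map: np.ndarray,
                      cell_size: float = 0.25) -> np.ndarray:
    """Fold one LIDAR measurement (N x 3 point cloud, x/y/z in metres)
    into a 2D height map: each grid cell keeps the highest z seen so far."""
    # Map x/y coordinates to grid indices.
    ix = (points[:, 0] / cell_size).astype(int)
    iy = (points[:, 1] / cell_size).astype(int)
    # Keep only points that fall inside the map.
    ok = (ix >= 0) & (ix < height_map.shape[0]) & \
         (iy >= 0) & (iy < height_map.shape[1])
    # np.maximum.at handles several points landing in the same cell.
    np.maximum.at(height_map, (ix[ok], iy[ok]), points[ok, 2])
    return height_map
```

Calling this for every scan keeps the map current, which is what collision avoidance and path planning read from the world model.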
By means of this "registration", very precise localization and relative distance determination between several objects is possible, for example between the grapple and the load. KAI uses these measurements as an independent, secondary measurement when unloading trucks, and as the primary information when unloading trains. Here, the grapple of the crane and the train itself are registered in the point cloud using a variant of the Iterative Closest Point (ICP) algorithm. The 3D models required for this can either be created from previous LIDAR scans of the objects or extracted automatically from CAD models.
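To make the "registration" step concrete, here is a minimal point-to-point ICP in Python: alternate between nearest-neighbour matching and a least-squares rigid alignment (Kabsch/SVD). The production variant is certainly more elaborate (outlier rejection, point-to-plane metrics, acceleration structures); this sketch only shows the core loop.

```python
import numpy as np

def best_fit_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid transform (R, t) mapping src onto dst (Kabsch/SVD)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t

def icp(src: np.ndarray, dst: np.ndarray, iters: int = 20):
    """Tiny point-to-point ICP: brute-force nearest neighbours,
    then SVD alignment, repeated. Returns the accumulated (R, t)."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        # Brute-force nearest neighbours (fine for small clouds).
        d = np.linalg.norm(cur[:, None, :] - dst[None, :, :], axis=2)
        matched = dst[d.argmin(axis=1)]
        R, t = best_fit_transform(cur, matched)
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

The returned rotation and translation are exactly the 6D pose mentioned above: three degrees of freedom in position and three in orientation.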
Training: The neural networks have to be trained for each crane and grapple type and for different loads. For this purpose, thousands of images are collected; the informative ones are selected and then annotated individually by humans (i.e., the training labels are created manually).
This is done in "CRID", a cloud component developed by us. Alongside the images, CRID stores frequency, specific activities, time of day and environmental conditions, as well as malfunctions (retrospectively, using the buffer). CRID links and indexes the images with machine data from the PLC and log files from the AI. This makes it possible to answer queries such as "Return all images taken in the morning in which a truck can be seen" over hundreds of millions of images in real time.
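The kind of query described above can be illustrated with a toy in-memory version. CRID's internals are proprietary, so everything here (record fields, function names) is illustrative: a real deployment would answer this from an index rather than a linear scan.

```python
from dataclasses import dataclass, field

@dataclass
class ImageRecord:
    image_id: str
    hour: int                                   # local time of capture, 0-23
    detected: set = field(default_factory=set)  # object classes seen in the image

def query(records, *, hour_range=None, must_contain=()):
    """Answer queries such as 'all morning images showing a truck'.
    hour_range is a half-open (start, end) interval in hours."""
    out = []
    for r in records:
        if hour_range and not (hour_range[0] <= r.hour < hour_range[1]):
            continue
        if not set(must_contain) <= r.detected:
            continue
        out.append(r.image_id)
    return out
```

For example, `query(records, hour_range=(6, 12), must_contain={"truck"})` returns the IDs of all morning images in which a truck was detected.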

The training of the networks takes place in several iterations of a so-called "data loop" and follows the idea of "bootstrapping": the steps data collection, curation, annotation, network training, and network evaluation are executed several times. With each iteration, more and increasingly helpful images are collected and new, improved networks are created, until the desired network quality is achieved. By analyzing the "confidence" of the network output, we can find and flag "difficult" and "interesting" images in CRID in a fully automated, time-saving way.
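The confidence-based mining step of the data loop reduces to a simple filter: images whose top network confidence falls in an ambiguous middle band are the ones worth annotating next. The thresholds below are illustrative placeholders, not the values used in production.

```python
def mine_hard_examples(predictions, low=0.35, high=0.65):
    """Flag 'difficult' images for the next data-loop iteration.
    `predictions` maps image id -> top softmax confidence in [0, 1];
    confidences inside [low, high] are considered ambiguous."""
    return sorted(
        img for img, conf in predictions.items() if low <= conf <= high
    )
```

Feeding exactly these ambiguous images back into curation and annotation is what makes each bootstrapping iteration more data-efficient than collecting images at random.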
Interested in learning more?
Get in touch directly with our Managing Director of Sales. Simply press the button below, and we’ll connect you personally to discuss how our solutions can support your operations.

Volker Voss
Managing Director Sales
Get in touch
