Machine Learning and Artificial Intelligence for Embedded Systems

With advances in artificial intelligence research, more and more areas of application are emerging. These traditionally resource-hungry technologies are increasingly finding their way into small, mobile, and even wearable systems. This development is made possible by specialised hardware and clever optimisations that drastically reduce resource and energy requirements, while mature automation and templates increasingly lower the barriers to adoption.

Optimizations for Neural Networks

Our research focuses on optimizing AI models, particularly deep learning (DL) architectures, for deployment on resource-constrained IoT devices such as embedded FPGAs. We focus especially on the quantization of neural networks, i.e. reducing the numerical precision of model parameters to decrease memory usage and computational load, as well as on the use of transformer models.

In this context, we generally investigate optimizations for the training of networks while striving for the fastest and most efficient computation possible, for instance through the use of separable convolutions.
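As an illustration, the following minimal PyTorch sketch shows how a separable convolution factors a standard convolution into a depthwise and a pointwise step, cutting the number of multiply-accumulate operations; the module and channel sizes are illustrative, not taken from our models.

    import torch
    import torch.nn as nn

    class SeparableConv2d(nn.Module):
        """Depthwise separable convolution: a per-channel (depthwise) conv
        followed by a 1x1 (pointwise) conv that mixes channels."""

        def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
            super().__init__()
            # Depthwise: one filter per input channel (groups=in_channels).
            self.depthwise = nn.Conv2d(
                in_channels, in_channels, kernel_size,
                padding=kernel_size // 2, groups=in_channels, bias=False,
            )
            # Pointwise: 1x1 convolution combines information across channels.
            self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.pointwise(self.depthwise(x))

    x = torch.randn(1, 16, 32, 32)
    y = SeparableConv2d(16, 32)(x)  # -> torch.Size([1, 32, 32, 32])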

In addition, in quantised neural networks (QNNs), a low bit width of two or fewer bits, combined with the properties of FPGAs, allows operations within the network to be pre-computed and stored in the LUTs of the configurable logic blocks for the shortest possible access times.
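The principle can be sketched in plain Python: for a small group of inputs, every possible input pattern is enumerated once and the corresponding partial sum stored in a table, which is exactly the structure a 4-input LUT realises in hardware. The weights and group size below are illustrative assumptions.

    from itertools import product

    # Pre-compute the partial dot product of a group of 4 binary activations
    # with fixed 2-bit signed weights. On an FPGA the resulting table maps
    # directly onto a 4-input LUT; here we emulate the lookup in software.
    weights = [1, -2, 0, 1]  # illustrative 2-bit signed weights

    # Enumerate all 2^4 input patterns once, at "synthesis time".
    lut = {
        bits: sum(w * b for w, b in zip(weights, bits))
        for bits in product((0, 1), repeat=4)
    }

    # At inference time a multiply-accumulate becomes a single table lookup.
    activations = (1, 0, 1, 1)
    partial_sum = lut[activations]  # == 1*1 + (-2)*0 + 0*1 + 1*1 == 2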

Contact: Lukas Einhaus, M.Sc.


Our research focuses on energy-efficient time-series analysis on resource-constrained embedded FPGAs. We work on quantization and hardware-aware deployment of deep neural networks (especially Transformers) on embedded FPGAs. Our work covers forecasting, classification, and anomaly detection, with comparative studies of LSTM, 1D-CNN, and hybrid architectures. Key methods include multi-objective optimization and mixed-precision quantization to balance accuracy, latency, and energy use, supported by an automated deployment framework across different FPGA platforms.
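A core building block of such multi-objective optimization is Pareto-front selection over the competing objectives. The following sketch illustrates the principle with hypothetical candidates and invented numbers; it is not output from our framework.

    from dataclasses import dataclass

    @dataclass
    class Candidate:
        name: str
        error: float       # 1 - accuracy (minimise)
        latency_ms: float  # latency per inference (minimise)
        energy_mj: float   # energy per inference (minimise)

    def dominates(a: Candidate, b: Candidate) -> bool:
        """a dominates b if it is no worse in every objective and better in at least one."""
        objs = lambda c: (c.error, c.latency_ms, c.energy_mj)
        return all(x <= y for x, y in zip(objs(a), objs(b))) and objs(a) != objs(b)

    def pareto_front(candidates):
        return [c for c in candidates
                if not any(dominates(o, c) for o in candidates if o is not c)]

    # Invented numbers for illustration, not measurements from our studies.
    models = [
        Candidate("lstm",        error=0.08, latency_ms=4.1, energy_mj=2.0),
        Candidate("1d-cnn",      error=0.10, latency_ms=1.2, energy_mj=0.9),
        Candidate("transformer", error=0.06, latency_ms=6.5, energy_mj=3.1),
        Candidate("cnn-large",   error=0.10, latency_ms=2.0, energy_mj=1.5),
    ]
    print([c.name for c in pareto_front(models)])  # "cnn-large" is dominated and filtered out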

Our automated toolchain, ElasticAI.creator, is used for model development and deployment. The goal of this research is to strike a balance between model accuracy and hardware costs such as latency and energy consumption. These findings are applied in our research project RIWWER.

Contact: Tianheng Ling, M.Sc.

Spike sorting and online training for quantised neural networks

Figure: Sp:Ai:ke pipeline (raw data, filtering, sorting, etc.)

Another exciting application of quantised neural networks, currently being studied in the Sp:Ai:Ke research project, is the time-series analysis of (real, biological) neural signals.

The records of such signals always include whole bundles of signals as well as a certain amount of background activity. To get a clear picture of individual neural activities, sophisticated signal processing is required, which in addition has to have adaptive qualities.
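A typical first stage of such a pipeline is band-pass filtering followed by threshold-based spike detection, as in the sketch below. The sampling rate, band edges, and threshold rule are common defaults from the spike-sorting literature, not parameters of the Sp:Ai:Ke system.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    FS = 30_000  # samples per second (illustrative assumption)

    def detect_spikes(raw: np.ndarray, low=300.0, high=3000.0, k=4.5):
        # Band-pass filter to isolate the spike band from slow field
        # potentials and high-frequency noise.
        sos = butter(4, [low, high], btype="bandpass", fs=FS, output="sos")
        filtered = sosfiltfilt(sos, raw)
        # Robust noise estimate via the median absolute deviation.
        sigma = np.median(np.abs(filtered)) / 0.6745
        # Spike candidates: samples crossing the negative threshold.
        crossings = np.flatnonzero(filtered < -k * sigma)
        return filtered, crossings

    filtered, spike_indices = detect_spikes(np.random.randn(FS))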

More precisely, due to neuronal drift, adjustments to the signal classification become necessary at runtime (i.e. online): the deployed network must be continuously readjusted with only a few exemplary data samples.
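Conceptually, such online readjustment can look like the following PyTorch sketch, which fine-tunes only a small classifier head on a handful of freshly labelled spikes; the model shape, learning rate, and step count are illustrative assumptions.

    import torch
    import torch.nn as nn

    backbone = nn.Sequential(nn.Linear(64, 32), nn.ReLU())  # frozen feature extractor
    head = nn.Linear(32, 5)                                 # e.g. 5 putative units

    for p in backbone.parameters():
        p.requires_grad = False  # only the head is adapted online

    optimizer = torch.optim.SGD(head.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    def adapt(few_x: torch.Tensor, few_y: torch.Tensor, steps: int = 10):
        """Readjust the classifier with only a few exemplary samples."""
        for _ in range(steps):
            optimizer.zero_grad()
            loss = loss_fn(head(backbone(few_x)), few_y)
            loss.backward()
            optimizer.step()

    adapt(torch.randn(8, 64), torch.randint(0, 5, (8,)))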

Contact: Leo Buron, M.Sc.

Hardware-aware neural architecture search

Finding the ideal neural architecture for a particular use case can take a lot of time and trial and error. Neural Architecture Search (NAS for short) automates this design step and, in some cases, even finds architectures that outperform manually created ones. Furthermore, even the estimation of the required resources can be automated by a deep neural network. In this way, the tailored deployment of neural networks is greatly simplified.
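Such a resource estimator can be as simple as a small regression network that maps a fixed-length encoding of an architecture to a predicted hardware cost, replacing slow synthesis-and-measure cycles during the search. The encoding and layer sizes below are assumptions for illustration.

    import torch
    import torch.nn as nn

    # Hypothetical latency predictor: architecture encoding in, cost estimate out.
    predictor = nn.Sequential(
        nn.Linear(12, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 1),  # predicted latency, e.g. in milliseconds
    )

    arch_encoding = torch.rand(1, 12)  # e.g. layer counts, widths, bit widths
    predicted_latency = predictor(arch_encoding)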

The method we currently focus on combines evolutionary algorithms and reinforcement learning and discovers optimal architectures by deliberately mutating, recombining, and weeding out contenders. Since our requirements typically include efficient operation on resource-constrained hardware, hardware costs such as latency or energy consumption are included as optimisation criteria in the architecture search.
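The evolutionary part of this loop can be sketched as follows; the search space, the cost model, and all hyperparameters are toy assumptions, not our actual setup.

    import random

    def random_genome():
        # A genome is simply a list of layer widths in this toy search space.
        return [random.choice([8, 16, 32, 64]) for _ in range(4)]

    def fitness(genome):
        accuracy_proxy = sum(genome) / 256               # larger nets score higher...
        cost_proxy = sum(w * w for w in genome) / 16384  # ...but hardware cost grows faster
        return accuracy_proxy - cost_proxy

    def mutate(genome):
        g = genome.copy()
        g[random.randrange(len(g))] = random.choice([8, 16, 32, 64])
        return g

    def crossover(a, b):
        cut = random.randrange(1, len(a))
        return a[:cut] + b[cut:]

    population = [random_genome() for _ in range(20)]
    for generation in range(30):
        population.sort(key=fitness, reverse=True)
        parents = population[:10]  # weed out the weakest contenders
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(10)]
        population = parents + children

    best = max(population, key=fitness)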

Our ongoing studies once again focus on applications in the domains of signal processing and time series analysis, e.g. to find latency-optimized architectures for the simulation of time-variant non-linear audio effects.

Contact: Christopher Ringhofer, M.Sc.





Building on this work, it is becoming increasingly evident that the performance of neural architectures strongly depends on subsequent compression techniques. In most cases, NAS and compression methods such as quantization are considered in completely independent steps to keep the search space manageable. However, this approach has the drawback that dependencies between the chosen architectural parameters and the effects of the applied quantization scheme are not taken into account during NAS, which can lead to suboptimal models. Our current research therefore investigates ways to combine both approaches in a meaningful manner, with a particular focus on expanding the search space and refining the evaluation strategy for the sampled models.
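A minimal sketch of such an expanded search space samples every layer jointly with its quantization bit width, so that quantization effects become visible during the search; the operator choices and the cost heuristic are illustrative assumptions.

    import random

    LAYER_CHOICES = ["conv3x3", "conv5x5", "separable3x3"]
    BIT_CHOICES = [2, 4, 8]

    def sample_candidate(num_layers: int = 4):
        # Each layer carries both an architectural choice and a bit width.
        return [
            {"op": random.choice(LAYER_CHOICES), "bits": random.choice(BIT_CHOICES)}
            for _ in range(num_layers)
        ]

    def memory_cost(candidate, params_per_layer: int = 10_000):
        # The bit width now directly shapes the cost seen during the search.
        return sum(layer["bits"] * params_per_layer for layer in candidate) / 8  # bytes

    candidate = sample_candidate()
    print(candidate, memory_cost(candidate))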

Contact: Natalie Maman, M.Sc.


The TransfAIr project plays a key role in translating this research into practical applications. Within the project, the development of the ElasticAI.Explorer toolbox is being advanced. This toolbox is designed to provide tools that enable AI models to be executed on various embedded systems and to be automatically optimized for them through HW-NAS.
The goal is to facilitate and improve the “TransfAIr” of AI models across diverse hardware environments. In particular, the focus lies on deployment on different hardware platforms as well as the extension and enhancement of the software stack.

Contact: Robin Feldmann, M.Sc.