Analytical Data Science

"Many laboratory scientists view data analysis as a hobby—the last thing performed when writing up a paper or report on a Friday afternoon. They often spend months or years acquiring data and at the very last minute produce some statistical analysis to include in their paper to please referees and editors."

 

— Richard G. Brereton, A short history of chemometrics: a personal view

 

Brereton's rather provocative statement addresses all of us, and we must admit that he has hit the nail on the head. Over the years, measurement data have become larger and more complex, and with the ongoing digitalization of the laboratory, processing and analyzing these data will only become more important. At the same time, the number of chemometric methods is almost endless, yet there is a striking lack of standardization. Choosing the right method depends strongly on the initial research question and on the structure of the data. In many cases, after searching in vain, we fall back on methods we are already familiar with, knowing full well that they are of limited use for answering our questions. That is a pity, because it means the full potential of our scientific experiments remains untapped. To address this, the IAC has established a new junior research group dedicated to data processing and analysis. The group will work closely with all IAC scientists, develop new chemometric workflows, and implement automation concepts to handle large or even big datasets.

The first project focuses on non-target analysis (NTA) data from liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS). In this context, Lotta Hohrenk-Danzouma et al. from the IAC have already confirmed the assumption that NTA results contain many false positives. Moreover, results evaluated with algorithm A do not match those evaluated with algorithm B or C. This does not mean that the conventional NTA algorithms are inadequate; rather, the study showed that it is practically impossible to set up and optimize all evaluation parameters for each individual algorithm. In addition, most NTA algorithms provide no information about how reliable their results are. The project therefore aims to find new, more robust ways to evaluate NTA data and to develop an additional output variable that conveys information about result reliability.
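To make the mismatch between algorithms concrete, here is a minimal Python sketch, purely illustrative and not the group's actual workflow: it matches the feature lists of two hypothetical peak-picking algorithms within m/z and retention-time tolerances and counts, for each feature, how many algorithms report it. A consensus count of this kind is only a toy stand-in for the reliability variable the project seeks to develop; all feature values and tolerances below are invented.

```python
# Illustrative sketch: comparing NTA feature lists from two peak-picking
# algorithms. All numbers are made up; this is not the project's method.
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float  # mass-to-charge ratio of the detected feature
    rt: float  # retention time in minutes


def same_feature(a: Feature, b: Feature,
                 mz_tol: float = 0.005, rt_tol: float = 0.2) -> bool:
    """Treat two features as identical if m/z and RT agree within tolerances."""
    return abs(a.mz - b.mz) <= mz_tol and abs(a.rt - b.rt) <= rt_tol


def consensus_count(feature: Feature, other_algorithms: list[list[Feature]]) -> int:
    """Count in how many other algorithms' feature lists a match is found."""
    return sum(
        any(same_feature(feature, other) for other in feature_list)
        for feature_list in other_algorithms
    )


# Hypothetical feature lists from two algorithms ("A" and "B") for one sample.
algo_a = [Feature(301.1410, 5.32), Feature(445.2071, 8.90), Feature(212.0503, 3.10)]
algo_b = [Feature(301.1412, 5.35), Feature(118.0865, 2.40)]

for f in algo_a:
    hits = 1 + consensus_count(f, [algo_b])  # +1 for algorithm A itself
    print(f"m/z {f.mz:.4f} at {f.rt:.2f} min -> reported by {hits} of 2 algorithms")
```

In this toy picture, features reported by only one algorithm are exactly the candidates that would deserve a low reliability score or closer inspection.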

Tools we typically use in our field:

MATLAB
R
Python
Julia
Java
C++
LaTeX

Projects: