Data Champion: Xuan Liu

CRC/TRR 196: MARIE

Xuan Liu: How collaborative projects pose new challenges to RDM

Dr. Xuan Liu completed her PhD thesis on a MEMS-enabled reconfigurable reflectarray for terahertz beam steering as part of CRC MARIE. She received the Data Champion Award for her contributions to developing a metadata scheme for the CRC and for her data-sharing practice. In our interview, she discusses the challenges of research data management (RDM) in collaborative work, the process of developing metadata descriptions, and how to motivate researchers to improve their RDM practice.

RDS: Thanks a lot, Xuan, for making time for this interview and congratulations on being a data champion for MARIE!

Xuan Liu (XL): Thank you!

RDS: You’re a data champion because of your work with Dataverse and your contributions to the development of metadata descriptions. How did you use data management tools? How did you benefit from that?

XL: Data management is important for me because my work generates lots of data – both simulation data and measurement data, which I need for my dissertation and for publications later on. Data management is a matter of good scientific practice: we have to make our data understandable to others and we have to be able to provide proof for how we worked.

How do I use it in my day-to-day research? When I generate data, it is first stored on my local computer, but most of the data I generate is not just the result of my own work, but part of a collaboration. MARIE is a Collaborative Research Center, so we needed a sharing platform to work on data together as well as an archive in which we can store our data so others can use it later on for their own research.

So for MARIE, we introduced RDM tools like Nextcloud and Dataverse to make collaboration easier. For example, to share hot data, meaning data that we still have to clean, process, and analyze, we use Nextcloud so that multiple people can access, modify, and extract the useful data. Once we've processed the data and published our findings, the data is cold and needs to be archived. That's what we use Dataverse for. Take my own case: I am leaving MARIE and won't be there in the third funding phase, so I won't be available over the next two or three years to tell the people taking over my work, or anyone who wants to use my data, what formats I used and so on. But in Dataverse we have metadata schemes for all kinds of data, so people can easily understand my datasets and use them for their own research.

RDS: Looking back at the development of metadata schemes, what do you think was the biggest challenge to describe the different kinds of experiments, simulations and measurements in MARIE?

XL: The biggest challenge was that it's almost impossible to develop a metadata scheme that covers everything. Just imagine – we have more than 20 subprojects and more than 40 PhD students working in MARIE. We are from different areas: some are in physics, some are from the photonic side, some from the electronic side, some are engineers … our research is very different and the simulations we do are also very different.

For example, we work with simulation data and this data is created by some kind of software, right? But what kind? So when you put the software in the metadata, then people will have a better idea of what kind of simulation it is – is it an electromagnetic simulation or a numerical simulation with MATLAB? In the metadata you can also see the version of the software, which is important when someone wants to reproduce a simulation.

These are basic metadata fields, and it's impossible to cover everything. But we can also add extra information: the simplest way is to upload a readme file with your dataset to describe whatever the Dataverse metadata scheme cannot cover. That is why readme files were recommended throughout the workshops.

RDS: Exactly, that was the great work you did in the RDM workshops – you discussed standards and identified ten common or shared fields so that you as a group in MARIE could agree on a basic standard as a compromise. And since experiments are complex, readme files are the right tool for everything beyond that.

XL: People from many different groups attended the workshops to develop the metadata scheme. We wrote down everything that was relevant for us, and then the Research Data Services team gathered all the information, cleaned it, extracted what was important, and developed a metadata scheme. Later on, there was a beta phase of Dataverse so we could test whether it worked for us and whether anything needed to be improved. So we went back and forth a couple of times before the final version.

As a result of these three workshops, most of the information we need is covered in the metadata scheme. Attending the workshops gave us a chance to contribute our opinions, because we really differ in our research topics, our perspectives, the tools and equipment that we use, the way we think, the way we design stuff. And if you hold that kind of workshop again in the third phase, maybe the metadata scheme can be refined even more.

RDS: Thank you, it’s good to hear that you appreciate these workshops and the discussion with your colleagues since it’s so important for the collaboration. Finally, is there anything that’s missing for you personally? Anything that you would have liked to have or anything that could improve RDM in the future?

XL: I think the tools provided are sometimes more than researchers are willing to use. They feel that metadata entry takes too much time, they think, “I already have so many things to do, I have to do publications, I have to do measurements, I have to do exercises – and now this.”

Metadata has to be specific and detailed, so yes, it will take some time, but the process has already been improved! For example, you can use a template, or you can write a script so you don't have to fill in every field every single time, but people still think it's too complicated. They just have to take the time to do it. The challenge is to motivate them, to raise awareness that RDM is a matter of good scientific practice, that it's part of being a good researcher, a good scientist. It's not actually that complicated – so many features have already been implemented to reduce the workload. What's missing really needs to come from the researchers.

Actually, measures like the Data Champion Award already help – after the ceremony, people came and asked me what I did to get the award, so I told them about the workshops and about sharing my data. And between the announcement of the award and the day of the ceremony, a lot of datasets were uploaded to Dataverse, so it has been a big motivator.

RDS: That’s good to hear! Thank you for the interview, and congratulations again on the award!

Contact

Do you have any questions? Please do not hesitate to contact us.