Measurement data storage

Hello everyone!

I was wondering what would be the safest and most convenient way to organize data storage in the group (to be specific: measurement results).

Right now we are using the K-drive, which is convenient since it’s available on all the measurement computers. Basically, we store all the results directly on the K-drive. But this method is unsafe, since (as far as I understand) there is no backup or versioning.

Another idea I was thinking about is to use Syncthing, but then it would be impossible to store all the information locally on all the computers (well, we would have to get an extra hard drive for every PC we use). And we would love to access all the information on all the computers, for example to check the SEM image of the measured device or to compare with previous measurements.

Maybe it’s possible to set up a combination of the K-drive and something else, like Syncthing, or maybe some cloud storage. It would probably be enough just to have a single storage location with a backup of the K-drive folder.
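To make the "backup of the K-drive folder" idea concrete, here is a minimal sketch of a one-way, never-overwrite copy in Python. It is only an illustration under assumptions: the function names and any folder paths are hypothetical, nothing here is an ICT-provided tool, and a real setup would also need scheduling and off-site storage.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_new_files(source: Path, backup: Path) -> list[Path]:
    """Copy files from source into backup without ever overwriting.

    If a file's content differs from the copy already in the backup, the new
    version is stored next to the old one with a short hash in its name, so
    earlier versions are kept. Returns the files written to the backup.
    """
    written = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        dst = backup / src.relative_to(source)
        if dst.exists():
            src_hash = sha256_of(src)
            if sha256_of(dst) == src_hash:
                continue  # identical copy is already in the backup
            # Content changed: keep the old copy and add a hash-tagged one.
            dst = dst.with_name(f"{dst.stem}.{src_hash[:8]}{dst.suffix}")
            if dst.exists():
                continue  # this exact version was backed up earlier
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        written.append(dst)
    return written
```

Calling `backup_new_files(Path(...), Path(...))` with the shared folder and a second disk would give a crude form of versioning, because a changed or accidentally overwritten file never replaces its earlier backup copy.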

I’ve seen a post by @gsteele13; we should also check whether Synology is what we need.

I would be glad to hear any recommendations on this issue!

Synology could certainly do it (and a lot more!) but it is also quite a bit of work to set it all up.

It would be great to hear if there is a better lightweight data synchronisation option out there.

(I am also not 100% happy with the Synology Drive syncing software, and might be interested in switching to another software solution to sync the data to my Synology.)

One thing one could consider is the “research drive” software from Surf:

https://www.surf.nl/en/research-drive-securely-and-easily-store-and-share-research-data

Cheers,
Gary

BTW, here is my earlier post on the data storage / management implementation in my group:

Data management and documentation

I think the K drive (together with the other ICT drives) is backed up regularly by ICT, but I have also heard about incidents long ago in which they lost some data.

We are using the group’s gitlab server, with a typical dataset size being somewhat small—under a gigabyte. I am not psyched about it, but it works, and it’s backed up by the TUD ICT.

There is also a university page listing storage options, which links to a couple of internal pages describing the different options.

Some time ago the university ICT experimented with iRODS, a dedicated data storage system, but I’m not sure what their conclusion was.

I was involved in the iRODS discussion

The conclusion was that it was far too complicated for typical use cases, and that the university should provide a sync service to get all the data into one place

I think they are trying to do that using Research Drive: there was discussion of a university subscription. But I don’t know what happened

Sync seems easy, but it’s actually quite a complex concurrency problem (I once took a computer science theory course on concurrency… not an easy nut to crack…)
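As a small illustration of why two-way sync is harder than it looks, here is a sketch of the decision a sync tool has to make for a single file. It assumes the tool remembers a content hash from the last successful sync; the function name and return labels are made up for this example and do not describe how Syncthing, Synology Drive or Research Drive actually work.

```python
def classify(base, local, remote):
    """Decide what a two-way sync must do for one file.

    Each argument is the file's content hash at the last sync (base), on this
    machine (local) and on the other machine (remote). None means "absent".
    """
    if local == remote:
        return "in sync"      # both sides already agree
    if local == base:
        return "take remote"  # only the other side changed
    if remote == base:
        return "take local"   # only this side changed (or deleted the file)
    return "conflict"         # both sides changed since the last sync
```

The last case is the hard one: if two measurement PCs edit the same file while offline, no rule can pick a winner safely, so tools typically keep both copies and ask a human.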

There was a famous incident many years ago where a fire alarm went off in error. The system then flooded the servers with the fire-suppression agent, and they lost loads of drives. The data was not backed up in two locations back then.

My understanding is that most of the drives (K: for sure and also a lot of the bulk drive L:) are now at least replicated across two different physical locations (second replication is in Leiden).

If you’re not sure, André van den Berg is a good person to contact.

The project drive (U:) is currently the recommended storage solution for this. It can be accessed by multiple members of the group (using their own devices), and it is backed up by TU Delft ICT at two physical locations if you select that option when you request the drive (like the K drive, although I’m double-checking with André whether the K drive is indeed backed up at two physical locations). If you’re interested in options with version control, GitLab is also a recommended storage solution, but it won’t be ideal for large data files; for those, Subversion may be more appropriate. I see Anton has already linked the page with more information, but just to be sure, here it is again: https://www.tudelft.nl/en/library/current-topics/research-data-management/r/manage/storage/

Gary mentioned Research Drive, but we found out about half a year ago that it was still in development. Perhaps they have improved it in the meantime, but as far as I’m aware we do not have TU Delft licenses for it. We can try to get you started if you want to test it, but judging by the Data Steward team’s experience this may be slow.

The iRODS project was terminated. I think the latest development is implementing OneDrive as the new storage location, with everyone having access to at least 1 TB. This is expected to happen within a year, so it may still take some time. I have been told that this solution would also be accessible to students, so I’m looking forward to it being implemented.

You can also follow Gary’s approach, but please do contact André van den Berg and Lolke Boonstra from ICT, so that they are aware you are taking this approach and so that you get the right NAS in case you want to house it at the Data Center. We had some bumps with the last person/group trying to implement this, so you may meet some resistance from ICT: they do not want to deal with every research group separately, as that would be a huge burden on them in the long term. As this is not a TU Delft solution, ICT will enable you to set it up but will not really support it, so you would have to do most things yourself.

Happy to clarify if anything is unclear from my message!
