iRODS Wiki – Research Data Management

iRODS is a decentralized virtual file system that is separate from your normal directories in Linux or Windows. In iRODS, directories are called collections, and they can contain further sub-collections (sub-directories) or data objects (files). For simplicity, you can treat collections and data objects as directories and files. Files and folders in your own local and mounted directories can be archived (copied) to the remote iRODS file system provided by the ZDV. To archive data, you first create a directory (collection) in the remote path and then copy your local files and directories to that remote path in iRODS. The archived files can be copied back or downloaded years later. Archives can be kept private for long-term storage only, restricted to research groups, or offered for public download.

Archiving in iRODS is currently possible via the command line of the ZDV Linux work PCs, the web server linux.zdv.uni-mainz.de and the supercomputer MOGON II/MOGON-NHR. MOGON also has a special iRODS wiki. In the future, there will be new and more user-friendly options for accessing iRODS. Currently, only users with a university account can use iRODS. If your current work computer does not have a standard ZDV Linux installation, you can access the Linux servers via SSH. From there, you can use the available iRODS iCommands to archive data. Personal computers with an iRODS iCommand client installation can also access the ZDV iRODS archive.

Before accessing the iRODS server, a configuration file must be available and you must authenticate yourself with your JGU account. The configuration file “irods_environment.json” is located in the hidden folder “.irods” in your home directory (~/.irods/irods_environment.json) and has the following content:

{ 
  "irods_authentication_scheme": "pam_password",
  "irods_client_server_negotiation": "request_server_negotiation",
  "irods_client_server_policy": "CS_NEG_REQUIRE",
  "irods_encryption_algorithm": "AES-256-CBC",
  "irods_encryption_key_size": 32,
  "irods_encryption_num_hash_rounds": 16,
  "irods_encryption_salt_size": 8,
  "irods_host": "irods.zdv.uni-mainz.de",
  "irods_port": 1247,
  "irods_user_name": "FILL IN YOUR USERNAME HERE",
  "irods_home": "/zdv/home/FILL IN YOUR USERNAME HERE",
  "irods_zone_name": "zdv"
}

Replace both placeholders with your JGU user name. Then run the command “iinit” and enter your JGU password when prompted.
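If you need this file on several machines, it can also be generated by a script. The following Python sketch writes the configuration shown above to ~/.irods/irods_environment.json; the function name write_irods_env is hypothetical and not part of iRODS, and restricting the folder to the owner (mode 0o700) is a precaution assumed here, not a documented ZDV requirement. Authentication with iinit is still required afterwards.

```python
import json
from pathlib import Path

# Values taken from the configuration file shown above; only the user name varies.
ZDV_SETTINGS = {
    "irods_authentication_scheme": "pam_password",
    "irods_client_server_negotiation": "request_server_negotiation",
    "irods_client_server_policy": "CS_NEG_REQUIRE",
    "irods_encryption_algorithm": "AES-256-CBC",
    "irods_encryption_key_size": 32,
    "irods_encryption_num_hash_rounds": 16,
    "irods_encryption_salt_size": 8,
    "irods_host": "irods.zdv.uni-mainz.de",
    "irods_port": 1247,
    "irods_zone_name": "zdv",
}


def write_irods_env(username, home=None):
    """Write .irods/irods_environment.json below `home` (default: your home directory)."""
    if not username or "/" in username or " " in username:
        raise ValueError(f"not a valid user name: {username!r}")
    home = Path.home() if home is None else Path(home)
    env = dict(ZDV_SETTINGS)
    env["irods_user_name"] = username
    env["irods_home"] = f"/zdv/home/{username}"
    irods_dir = home / ".irods"
    # Assumption: keeping the folder private to the user is a sensible precaution.
    irods_dir.mkdir(mode=0o700, exist_ok=True)
    target = irods_dir / "irods_environment.json"
    target.write_text(json.dumps(env, indent=2) + "\n", encoding="utf-8")
    return target
```

For example, call write_irods_env("MUSTERPERSON") once and then run iinit in the shell.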

Here is a short summary of the most important iRODS commands with some important command line parameters. Note that they all start with an “i”:

Command	Parameters	Description
`ipwd`		Print the current iRODS working directory (collection)
`ils`	-l, -L, -A, -r	List the contents of an iRODS directory (collection) (-l: with details; -L: more details; -A: access control lists (ACLs); -r: recursive)
`icd`		Change the iRODS working directory (collection) to the target path
`imkdir`	-p	Create a new directory (collection) (-p: create the full path including parent collections)
`iput`	-K, -r, --metadata	Upload files/directories (-K: calculate and verify checksums; -r: recursive; --metadata: attach descriptive metadata to the data, e.g. "Title;My data;;Description;Research data;;")
`iget`	-r, -f	Download iRODS data to the current local path (-r: recursive; -f: overwrite existing files)
`iticket`	create read	Create a ticket for downloading files over the Internet
`imeta`	ls -d/-C Name	List the metadata of an iRODS file (-d) or directory/collection (-C)
`imeta`	set -d/-C Name Key “Value”	Add or update metadata of an iRODS file (-d) or directory/collection (-C); e.g. Key = Title, “Value” = My research data
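Two iput options from the table lend themselves to a small helper. The Python sketch below builds the --metadata string in the pattern shown in the table (attribute;value;unit; per triplet) and computes a local checksum for comparison with what iRODS stores. Both function names are hypothetical. The checksum function assumes the server uses SHA-256, which iRODS reports as "sha2:" followed by the base64-encoded digest; a server configured for MD5 would report a hex digest instead.

```python
import base64
import hashlib


def iput_metadata_arg(avus):
    """Build the `iput --metadata` string from (attribute, value, unit) triplets.

    Each triplet is written as attribute;value;unit; which reproduces the pattern
    "Title;My data;;Description;Research data;;" for empty units.
    """
    parts = []
    for attribute, value, unit in avus:
        if not attribute or not value:
            raise ValueError("attribute and value must not be empty")
        if any(";" in field for field in (attribute, value, unit)):
            raise ValueError("fields must not contain ';', the separator")
        parts.append(f"{attribute};{value};{unit};")
    return "".join(parts)


def irods_sha256_checksum(path, chunk_size=1 << 20):
    """Return a SHA-256 checksum of a local file in the 'sha2:<base64>' notation.

    Assumption: the iRODS server is configured for SHA-256 checksums.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")
```

The result of iput_metadata_arg can be passed as iput --metadata "..." and the checksum compared with the value that ichksum prints for the uploaded file.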

Here is a brief summary of our iRODS wrapper commands, which simplify archiving research data. Note that the wrappers start with “i_”, unlike the plain iRODS commands, which start with “i”:

Command	Parameters	Description
`source /usr/local/bin/i_init.sh`	For linux.zdv.uni-mainz.de	Initializes iRODS and displays the current remote iRODS directory after each subsequent command.
`i_exit`		Resets the current remote iRODS directory and cleans up the local settings.
`i_archive`	local path, .json file	Uploads the local file or directory given as the first argument to the current remote iRODS directory (collection) and adds the metadata from the .json file given as the second argument to all uploaded files.
`i_metaupdate`	remote path, .json file	Updates the metadata of the remote iRODS file or directory (collection) given as the first argument, using the .json file given as the second argument.
`i_publish`	remote path	Creates an iRODS web download ticket and links for all files in the path given as the first argument. Anyone can use the links to download the files.
`i_ticketget`	remote path	Prints the iRODS ticket of a remote iRODS path, if available.
`i_downlinkget`		Uses the iRODS ticket and prints the public web download links.

Using the wrappers, a simplified procedure for archiving research data looks as follows:

source /usr/local/bin/i_init.sh
provides the wrapper commands and initializes iRODS (on linux.zdv.uni-mainz.de);

imkdir -p /zdv/home/MUSTERPERSON/project123/
creates a new remote iRODS path to which the data is copied;

icd /zdv/home/MUSTERPERSON/project123/
changes the current remote iRODS directory to the newly created one;

i_archive /fullpath/some.file metadata.json
archives the local file some.file into the current remote iRODS path by copying it, and attaches the metadata from the file metadata.json;

i_publish /zdv/home/MUSTERPERSON/project123/some.file
creates a ticket number (e.g.: ABCDEFG123456890) and prints the generated web download link.
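When several files go into the same collection, the steps above can be generated by a script. This Python sketch only prints the command lines (quoted with shlex.quote) for review and pasting into a shell; it does not execute them, since the i_ wrappers become available only after sourcing i_init.sh in that shell. The function name archive_script is hypothetical, and the i_publish paths assume that i_archive stores each item under its own name in the current collection.

```python
import posixpath
import shlex


def archive_script(collection, files, metadata_json, publish=False):
    """Return the wrapper workflow for several local files as shell command lines."""
    if not collection.startswith("/zdv/home/"):
        raise ValueError("collection must be an absolute path below /zdv/home/")
    lines = [
        "source /usr/local/bin/i_init.sh",
        f"imkdir -p {shlex.quote(collection)}",
        f"icd {shlex.quote(collection)}",
    ]
    for local_file in files:
        lines.append(f"i_archive {shlex.quote(local_file)} {shlex.quote(metadata_json)}")
    if publish:
        for local_file in files:
            # Assumption: i_archive keeps the item's own name inside the collection.
            name = posixpath.basename(local_file.rstrip("/"))
            remote = posixpath.join(collection, name)
            lines.append(f"i_publish {shlex.quote(remote)}")
    return "\n".join(lines)
```

For example, print(archive_script("/zdv/home/MUSTERPERSON/project123/", ["/fullpath/some.file"], "metadata.json", publish=True)) prints the complete sequence for one file.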

Metadata is accepted by iRODS as triplets (AVUs): Attribute (e.g. Title), Value (e.g. “Research Data from X publication”) and Unit (usually left empty).
The first two fields Attribute and Value are mandatory fields and must not be empty, the Unit is optional.
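The rule that attribute and value are mandatory while the unit is optional can be captured in a small data type. The AVU class below is a hypothetical illustration in Python, not part of an iRODS client library; as_imeta_args returns the trailing arguments of an imeta add or set call, passing the unit only when one is given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AVU:
    """An iRODS metadata triplet: attribute and value are mandatory, the unit is optional."""

    attribute: str
    value: str
    unit: str = ""

    def __post_init__(self):
        # Mirror the rule above: the first two fields must not be empty.
        if not self.attribute.strip():
            raise ValueError("attribute must not be empty")
        if not self.value.strip():
            raise ValueError("value must not be empty")

    def as_imeta_args(self):
        """Trailing arguments for `imeta add/set`; the unit is passed only if present."""
        return [self.attribute, self.value] + ([self.unit] if self.unit else [])
```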

You can manually add metadata to each remote iRODS file and directory after uploading by running, for example:
imeta set -d some.file Title "My research data from Publication X"

Doing this for every file and every attribute is tedious and time-consuming, so it is advisable to define one .json metadata file for all files and directories of a single archive tree.
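To see what a .json file saves you, the following Python sketch turns a flat metadata dictionary into one imeta set call per attribute. It is an illustration only: how the ZDV wrappers i_archive and i_metaupdate work internally is not documented here, and the function names are hypothetical. Attributes with an empty value are skipped, because iRODS requires a value.

```python
import shlex
import subprocess


def imeta_set_commands(target, metadata, collection=False):
    """Build one `imeta set` argument list per attribute of a flat metadata dict."""
    flag = "-C" if collection else "-d"  # -C for collections, -d for data objects
    commands = []
    for attribute, value in metadata.items():
        value = str(value).strip()
        if not value:
            continue  # empty values would be rejected by iRODS
        commands.append(["imeta", "set", flag, target, attribute, value])
    return commands


def apply_metadata(target, metadata, collection=False, dry_run=True):
    """Print the commands (dry run) or run them with the installed iCommands."""
    for argv in imeta_set_commands(target, metadata, collection):
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)
```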

i_archive some.file metadata.json
Can be used to upload your local files with metadata all at once.

i_metaupdate /zdv/home/MUSTERPERSON/project123/some.file metadata.json
Can be used to update a single remote file/directory with metadata.

A flat metadata.json file must contain at least the following attributes (an empty template, followed by a filled-in example):

{
 "Title":"",
 "ResourceType":"",
 "Project":"",
 "Keywords":""
}

{
 "Title":"My Scientific Data from XYZ publication",
 "ResourceType":"Tables, Texts, Images",
 "Project":"BMBF-12345, DFG-67890",
 "Keywords":"Thermodynamics, Simulation, HPC, MPI, XYZ, BMBF, DFG"
}
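Before uploading, a metadata file can be checked locally. The Python sketch below (check_metadata is a hypothetical name) parses the file strictly, so mistakes such as a trailing comma after the last entry are reported, and then confirms that the four minimal attributes are present and non-empty and that all values are plain strings, as in a flat file.

```python
import json

REQUIRED = ("Title", "ResourceType", "Project", "Keywords")


def check_metadata(text):
    """Parse metadata.json content and return a list of problems (empty if OK)."""
    try:
        data = json.loads(text)  # strict JSON: trailing commas are rejected
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(data, dict):
        return ["top level must be a flat JSON object"]
    problems = []
    for key in REQUIRED:
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty attribute: {key}")
    for key, value in data.items():
        if not isinstance(value, str):
            problems.append(f"attribute {key} must be a string in a flat file")
    return problems
```

For example, check_metadata(open("metadata.json", encoding="utf-8").read()) returns an empty list for the filled-in example above.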

The following attributes are set automatically when uploading files/directories and are not required in the .json file:

Creator, Publisher, Location, Date, ExpiryDate (Date + 10 years), protected (default: "false").
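The ExpiryDate rule (Date + 10 years) is simple except for uploads on 29 February, because the target year is usually not a leap year. The Python sketch below makes one possible choice and falls back to 28 February; this is an assumption for illustration, since how the server handles that case is not documented here.

```python
from datetime import date


def expiry_date(upload, years=10):
    """Return the upload date plus `years` years."""
    try:
        return upload.replace(year=upload.year + years)
    except ValueError:
        # Assumption: 29 February maps to 28 February in a non-leap target year.
        return upload.replace(year=upload.year + years, day=28)
```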

The following attributes are recommended in your .json metadata file:

{
 "Title":"",
 "ResourceType":"",
 "Project":"",
 "Keywords":"",
 "Contributor":"",
 "Reference":"",
 "License":""
}

{
 "Title":"My Scientific Data from XYZ publication",
 "ResourceType":"Tables, Texts, Images",
 "Project":"BMBF-12345, DFG-67890",
 "Keywords":"Thermodynamics, Simulation, HPC, MPI, XYZ, BMBF, DFG",
 "Contributor":"Co-author1, Co-author2, Co-author3",
 "Reference":"",
 "License":"GPLX, CC0, CC-BY"
}
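A template with all recommended attributes can be generated rather than typed by hand. In this Python sketch (function names hypothetical), metadata_template fills in the attributes you pass and leaves the rest empty, while empty_recommended lists the recommended attributes that still need a value before upload.

```python
import json

RECOMMENDED = ("Title", "ResourceType", "Project", "Keywords",
               "Contributor", "Reference", "License")


def metadata_template(**values):
    """Return metadata.json text containing all recommended attributes.

    Attributes not passed in `values` are left empty; extra attributes are kept.
    """
    data = {key: "" for key in RECOMMENDED}
    data.update(values)
    return json.dumps(data, indent=1, ensure_ascii=False) + "\n"


def empty_recommended(text):
    """List the recommended attributes that are missing or still empty."""
    data = json.loads(text)
    return [key for key in RECOMMENDED if not str(data.get(key, "")).strip()]
```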

You can confirm that the metadata has been set by using:
imeta ls -d /zdv/home/MUSTERPERSON/project123/some.file for files
imeta ls -C /zdv/home/MUSTERPERSON/project123/ for directories.

Summary of imeta:

Parameters	Description
add|set|rm|ls|cp	Subcommand, see the next table for details (ls and cp do not need an AVU triplet)
-d dataObject | -C directory/collection	Which data object (file) or collection (directory) is to be queried or edited
Attribute Value [Unit]	AVU triplet, in which the unit is optional

Command description:

Command	Description
add	Add an AV(U) triplet
set	Set an attribute to a single value, replacing any existing values of that attribute
rm	Remove an AV(U) triplet
ls	List existing metadata. If an attribute is specified, only metadata of that attribute is listed
cp	Copy existing metadata. Requires a source and a destination (e.g. `imeta cp -d source -C target`)

Publishing research data:

A ticket for downloading files must be created for public access.
With this ticket and the path to the remote iRODS storage location, anyone can download the data.
Data that has already been archived is published with the wrapper i_publish:

i_publish /zdv/home/MUSTERPERSON/project123/some.file
generates a ticket number (e.g. ABCDEFG123456890) and prints the generated web download link.
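For many files, download links can also be assembled in a script. The Python sketch below assumes the link format used in the manual wget example further down: the REST base URL, then the absolute iRODS path, then ?ticket= and the ticket number. The function name is hypothetical, and percent-encoding special characters such as spaces is an assumption about what the REST service accepts.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the manual wget example in this section.
REST_BASE = "https://irods-web.zdv.uni-mainz.de/irods-rest/rest/fileContents"


def download_url(irods_path, ticket):
    """Combine an absolute iRODS path and a read ticket into a web download link."""
    if not irods_path.startswith("/"):
        raise ValueError("irods_path must be absolute, e.g. /zdv/home/...")
    # Assumption: the REST service accepts percent-encoded paths (e.g. %20 for spaces).
    return f"{REST_BASE}{quote(irods_path)}?{urlencode({'ticket': ticket})}"
```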

You can also perform the publishing process manually using iRODS commands instead:
iticket create read /zdv/home/MUSTERPERSON/project123/some.file
generates a ticket number (e.g. ABCDEFG123456890), which is used to download some.file over the network.
Then combine the ticket with the path, so that the download command looks like this: wget "https://irods-web.zdv.uni-mainz.de/irods-rest/rest/fileContents/zdv/home/MUSTERPERSON/project123/some.file?ticket=ABCDEFG123456890"

Data Policy and Licensing

The “creator” is responsible for ensuring that the further use of third-party data is lawful (copyright law) and that personal data is handled correctly (GDPR).
This applies to all archived data, even if the “creator” is no longer employed at the university or if the data is not public.
This decision aid can help you to decide whether your data may be published.

If your data can actually be published, there are different types of licenses for different cases:

Software
- Apache
- GPL, GPLv2, GPLv3
- MIT
- BSD
Art, pictures, text, etc.
- Creative Commons (Text, Arts, Photos, …)
Data records
- Open Data Commons

Using CC-BY licenses for software is not recommended: see the CC recommendation and discussion. The same applies to datasets, for which publication under any CC licence other than CC0 is questionable. Other licenses for datasets can be found at the Open Definition Licenses Service.

Avoid proprietary file formats, as you don’t know if the software to open them will still be available in a few years’ time. Try to use open standards.