
Institutional Research Data Repository of HSB: GRO.data


Welcome to the wiki on the world of Research Data Management (RDM), Open Science and Open Education!

Research data is a shared resource and a common good that is essential for excellent research, education and science. Be prepared to share and publish your research data in accordance with Good Research Practice in order to gain academic credit and recognition for your research, and to contribute to the advancement of science in your field for the benefit of society and global unity.

The FDM@HSB team is here to support your research, providing RDM tools and services including the HSB Institutional Research Data Repository.

This RDM Wiki provides background information on the HSB institutional research data repository and demonstrates how it can support your research. It also provides the information you need to use it for your research data.

We, the FDM@HSB team, greatly appreciate you taking the time to read through the wiki for your research and sincerely thank you for your valuable contribution and commitment to using the repository! We are here to help and welcome your feedback, as this plays a key role in improving our service.

Should you have any questions or need further assistance, please do not hesitate to contact us at fdm@hs-bremen.de.

Let’s work together to

archive and/or share your research data,
enhance its discoverability,
facilitate its reuse by others, and
enable your contribution to the ideal of open science.

See also:

Forschungsdaten-Policy der Hochschule Bremen (Research Data Policy of Hochschule Bremen), February 3, 2025, in German
Good Research Practice
- Guidelines for Safeguarding Good Research Practice, Code of Conduct
- For Building Trust in Science: Good Research Practice (GRP), Leibniz Institute on Aging – Fritz Lipmann Institute
FAIR principles
- GO FAIR
- FORCE11
- and its first publication Wilkinson et al. 2016.

If you are already familiar with repositories and research data management, or prefer a practical approach, please refer to our one-page Lightning Start Graphic and Quick Start Guide.

1. Institutional research data repository of HSB: GRO.data

What is GRO.data?

GRO.data is an institutional Research Data Repository.
It is managed by the eResearch Alliance, a joint group of SUB, GWDG and UMG on the Göttingen Campus.
In the digital world, a repository is a place where digital information is stored and can be found and retrieved.

A Research Data Repository in the context of Research Data Management (RDM) is a digital storage space that enables researchers and academics to store and archive their data, and make them more discoverable, reusable and accessible.

An institutional Research Data Repository generally serves the researchers or academics of that institution or organisation.

GRO.data is based on the open source software Dataverse.

What is Dataverse?

Dataverse is an open source web-based research data repository that is designed to help researchers and organisations to manage, archive, publish, share, and preserve their data.

Dataverse was initially developed at Harvard University and has been in use on almost every continent since 2006. It has a strong and vibrant community that constantly improves the software.

It provides a robust and user-friendly environment for data management, ensuring that valuable research data is well organised, accessible, maintained and preserved over time.

Dataverse promotes open science by facilitating data sharing and collaboration within the research community. It follows the FAIR principles, making data Findable, Accessible, Interoperable and Reusable. This makes it easier for researchers to replicate the work of others.

See below for more details about the software Dataverse.
See also: Dataverse, Harvard Dataverse Repository.
To learn about the initial idea of Dataverse, please read [King 2007].

Why should I use a research data repository?

A research data repository helps you

archive and/or publish your research data.

ensure your research data can be found, and reused by others.

follow best practice and uphold integrity and transparency, the core values of Good Research Practice.

get the academic credit and academic recognition you deserve.

achieve long-term trust, recognition and sustainable success.

contribute to the ideal of open science and open education.

See also:

Guidelines for Safeguarding Good Research Practice, Code of Conduct, DFG: Deutsche Forschungsgemeinschaft
Good Research Practice, DFG
FAIR principles, GO FAIR Initiative
The FAIR Data Principles, FORCE11: The Future of Research Communications and e-Scholarship
Good Research Practice, Leibniz Institute on Aging – Fritz Lipmann Institute

In general, we recommend choosing a subject-specific repository that matches your intended use, because such repositories serve a discipline-specific community and are highly relevant within their scientific field.

If you are looking for a subject-specific repository, re3data is a global registry of research data repositories that can help facilitate your search.

GRO.data is a general-purpose research data repository adapted by the Göttingen eResearch Alliance. It supports you in publishing, archiving and sharing your research data.

Your data can be enriched with metadata and assigned persistent identifiers (PIDs) such as DOIs. These metadata are propagated to the central DOI database of DataCite, which is used by search services such as DataCite Search.
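In Dataverse, and therefore in GRO.data, a dataset's DOI is usually written as a persistent ID of the form `doi:<DOI>`, while the same DOI resolves on the web via `https://doi.org/<DOI>`. The minimal sketch below converts between the two notations. The helper names and the DOI `10.12345/EXAMPLE` are hypothetical placeholders, not part of GRO.data.

```python
# Minimal sketch: convert between the "doi:" persistent-ID notation used by
# Dataverse and a resolvable https://doi.org URL.
# Helper names and the example DOI are hypothetical.

DOI_SCHEME = "doi:"
RESOLVER = "https://doi.org/"


def to_resolver_url(persistent_id: str) -> str:
    """Turn 'doi:10.xxxx/suffix' into 'https://doi.org/10.xxxx/suffix'."""
    if not persistent_id.lower().startswith(DOI_SCHEME):
        raise ValueError(f"not a DOI persistent ID: {persistent_id!r}")
    doi = persistent_id[len(DOI_SCHEME):]
    # A DOI consists of a prefix starting with "10." and a suffix, separated by "/".
    prefix, sep, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not sep or not suffix:
        raise ValueError(f"malformed DOI: {doi!r}")
    return RESOLVER + doi


def to_persistent_id(url: str) -> str:
    """Turn 'https://doi.org/10.xxxx/suffix' back into 'doi:10.xxxx/suffix'."""
    if not url.startswith(RESOLVER):
        raise ValueError(f"not a doi.org URL: {url!r}")
    return DOI_SCHEME + url[len(RESOLVER):]


if __name__ == "__main__":
    url = to_resolver_url("doi:10.12345/EXAMPLE")
    print(url)                    # https://doi.org/10.12345/EXAMPLE
    print(to_persistent_id(url))  # doi:10.12345/EXAMPLE
```

The resolver URL is the form to use in publications and on web pages; the `doi:` form is what Dataverse expects wherever it asks for a persistent ID.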

In addition, GRO.data offers features such as versioning, data citation, file previews, licence assignment, file restriction, and access control through roles and permissions. The software's functionality is also highly extensible.

HSB provides GRO.data as its institutional research data repository in cooperation with the GWDG, which manages and operates GRO.data.

Its primary purpose is to support the storage and archiving of "cold" research data that is infrequently accessed, and to make it available to others.

What is the GWDG?

The GWDG is a joint institution of the University of Göttingen and the Max Planck Society.

It serves as an IT competence centre and data centre with over 50 years of experience. It is one of the locations for a supercomputer operated by the North German Supercomputing Alliance, a collaboration of seven federal states in Northern Germany.

The GWDG provides future-oriented, customer-oriented and reliable services and infrastructure that support and advance science and excellence in research and teaching, now and in the future. See also Mission Statement.

Data processing has always been an important service of the GWDG, and as technologies have developed, the GWDG has continued to provide new solutions to support science and research. Examples include data archiving, data management plans, data repositories and publication management, all of which support researchers, scientists, data scientists and data stewards in handling their research data.

Where and how is my data stored? And how secure is my data?

GRO.data is hosted in an academic data centre in accordance with German data protection and data security regulations. Authentication and authorisation are handled via the Academic Cloud, which uses the DFN-AAI service and the software “Shibboleth”. DFN-AAI is a service infrastructure for research and education communities in Germany.

The Academic Cloud was developed to support scientific use and research in Lower Saxony. It is provided by the GWDG in Göttingen, supported in planning and operation by LANIT, and co-funded by the MWK in Hannover. The Academic Cloud offers proven software applications as reliable cloud services, which users can access with a single account through a uniform portal.

The data is stored in multiple locations across Göttingen to ensure redundancy. Daily backups are stored on disks and tapes, and the backup drives are not on the same machine as the application itself.

Data storage is one of the GWDG's most important assets. For example, access to internal systems, including the software application and the storage system, requires two-factor authentication with dedicated hardware and is governed by highly structured authentication and authorisation management.

See also: Terms and conditions, Imprint, Privacy Notice of the GRO.data website.

What is a Cloud and Cloud Computing?

Cloud computing is the on-demand use of computing resources, such as physical or virtual servers, data storage, networking capabilities and software applications, over the internet.

This technology can be more eco-friendly than traditional IT solutions, reducing energy consumption and the ecological footprint while enhancing the performance, scalability, availability, accessibility and flexibility of IT services and their environment. See figure cloud computing.

Do you perhaps have a visual representation of GRO.data and the Academic Cloud?

See:

Figure: GRO.data and the Academic Cloud
Figure: Conceptual View of GRO.data and the Academic Cloud

See also:

Dataverse
eResearch Alliance
GWDG
- 50 Jahre GWDG – eine Zeitreise durch 50 Jahre wissenschaftliche Datenverarbeitung bei der GWDG (50 years of GWDG: a journey through 50 years of scientific data processing at the GWDG; only in German)
- Northern Germany’s fastest computer: Göttingen’s supercomputer “Emmy”, fifth fastest in Germany and 47th in the world
SUB, Göttingen State and University Library
The Max Planck Society
FAIR principles:
- GO FAIR, GO FAIR Initiative
- FORCE11, The Future of Research Communications and e-Scholarship

2. Getting Started

Here we present the key concepts and main features of GRO.data/Dataverse in more detail, so that you can understand the software better. We also highlight the important aspects of preparing and handling your data before storing, archiving and publishing it, which are key to getting your data into the GRO.data repository smoothly and efficiently.

We strive to prepare everything we can to save you work and time. However, if you have any further questions, please feel free to contact us at: fdm@hs-bremen.de.

Let's work together to walk through the preparation process

for publishing and/or archiving your intended research data to achieve your research goals,

for others who will reuse your data,

for a meaningful global world of research and science.

2.1 Dataverse Basics and Main Features

2.1.1. Data Organisation and Management in the Software Dataverse

"Dataverse collection" and "Dataset" are two basic concepts of data organisation in the software Dataverse.

The word “dataverse” has a double meaning in this context. To avoid confusion, we write “dataverse” (lower case) for a “dataverse collection” and “Dataverse” (capitalised) for the software, i.e., “dataverse” ⇒ “dataverse collection”, and “Dataverse” ⇒ the software Dataverse.

A dataverse is a container for your datasets and for other dataverses. It is a virtual archive in which you can organise your data, and it can be set up for individual researchers, departments, organisations, etc. It helps you manage your data and works like a folder.

Each dataverse contains datasets and/or other dataverses, and each dataset contains descriptive metadata and data files, including a description of the methods, documentation and code associated with the data. All of this makes it easier for other researchers to discover and understand your dataset, i.e., dataset ⇒ metadata & data & code & documentation. To organise your research work, you can also nest dataverses inside other dataverses if you wish.

All datasets and files in GRO.data are automatically assigned a DOI when they are published. You can control access to the data: for example, you can make your data openly available to the general public or restrict access to it. Permissions and access control can also be applied to a single file.
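The containment described above (dataverses hold datasets and other dataverses; datasets hold metadata and files) can be pictured as a small tree. The sketch below is a hypothetical, simplified model for illustration only: the class and field names are our own and do not reflect Dataverse's internal data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataFile:
    name: str
    restricted: bool = False  # restriction affects file access only, not metadata


@dataclass
class Dataset:
    title: str
    metadata: dict[str, str]                          # descriptive metadata, always visible
    files: list[DataFile] = field(default_factory=list)
    doi: Optional[str] = None                         # assigned when the dataset is published


@dataclass
class DataverseCollection:
    """A 'dataverse': holds datasets and nested dataverses (hypothetical model)."""
    name: str
    datasets: list[Dataset] = field(default_factory=list)
    children: list[DataverseCollection] = field(default_factory=list)

    def all_datasets(self) -> list[Dataset]:
        """Return datasets in this collection and, recursively, in nested ones."""
        found = list(self.datasets)
        for child in self.children:
            found.extend(child.all_datasets())
        return found


if __name__ == "__main__":
    survey = Dataset("Survey 2024", {"author": "N.N."},
                     [DataFile("responses.csv"), DataFile("consent.pdf", restricted=True)])
    pilot = Dataset("Pilot study", {"author": "N.N."}, [DataFile("pilot.csv")])
    project = DataverseCollection("Project X", datasets=[survey])
    faculty = DataverseCollection("Faculty dataverse", datasets=[pilot], children=[project])
    print([d.title for d in faculty.all_datasets()])  # ['Pilot study', 'Survey 2024']
```

Modelling the nesting as a tree makes it clear why a dataverse "works like a folder": searching a top-level dataverse also reaches everything nested below it.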


2.1.2. Metadata Support

A) Supported Metadata

A dataset supports three types of metadata:

Citation Metadata

This is required and is the default. It provides a standardised citation for datasets, making it easier for researchers to publish their data and receive credit and recognition for their work. It comprises the metadata needed to generate a data citation.

Domain Specific Metadata
Dataverse currently has special support for Social Science, Life Science, Geospatial, Journal, and Astronomy and Astrophysics datasets:
- Geospatial Metadata
- Social Science & Humanities Metadata
- Life Sciences Metadata
- Journal Metadata
- Astronomy and Astrophysics Metadata

File-level Metadata

This varies depending on the type of data file. Examples include editing the file name as needed, adding file descriptions (file-level “terms”) and adding file-level tags. See also Edit File Metadata.

You can also create custom metadata if needed.

Before creating custom metadata, consider how best to use the existing metadata, and carefully evaluate whether custom metadata are necessary and useful. While custom metadata offer advantages such as unique complementary information, creating them is also time-consuming. For a unique, large-scale and long-term project that produces significant amounts of unique data, custom metadata can be beneficial. One example is the Collaborative Research Centre 990 (CRC990); see CRC990.

B) Supported Metadata Export Formats

Once a dataset has been published, its metadata can be exported in a variety of other metadata standards and formats, which help make datasets more discoverable and usable in other systems, such as other data repositories. The following metadata export formats are available:

Dublin Core
DDI (Data Documentation Initiative Codebook 2.5)
DDI HTML Codebook (A more human-readable, HTML version of the DDI Codebook 2.5 metadata export)
DataCite 4
JSON (native Dataverse Software format)
OAI_ORE
OpenAIRE
Schema.org JSON-LD
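The export formats listed above can also be retrieved programmatically: Dataverse's Native API offers an export endpoint of the form `/api/datasets/export?exporter=<name>&persistentId=<DOI>`. The sketch below only builds such URLs. It assumes that GRO.data is reachable at `https://data.goettingen-research-online.de` and that the exporter names below (taken from the Dataverse API guide) are enabled on the server; both assumptions should be checked against the current GRO.data installation. The DOI is a hypothetical placeholder.

```python
from urllib.parse import urlencode

# Assumption: base URL of the GRO.data production server.
GRO_DATA_URL = "https://data.goettingen-research-online.de"

# Assumption: exporter names as documented in the Dataverse Native API guide;
# which exporters are enabled depends on the server's Dataverse version.
EXPORTERS = {
    "Dublin Core": "dcterms",
    "DDI Codebook 2.5": "ddi",
    "DataCite 4": "Datacite",
    "JSON (native Dataverse format)": "dataverse_json",
    "OAI_ORE": "OAI_ORE",
    "Schema.org JSON-LD": "schema.org",
}


def export_url(persistent_id: str, fmt: str, base_url: str = GRO_DATA_URL) -> str:
    """Build the metadata export URL for a published dataset."""
    try:
        exporter = EXPORTERS[fmt]
    except KeyError:
        raise ValueError(f"unknown format {fmt!r}; choose one of {sorted(EXPORTERS)}") from None
    # urlencode percent-encodes the DOI, e.g. ':' -> %3A and '/' -> %2F.
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url.rstrip('/')}/api/datasets/export?{query}"


if __name__ == "__main__":
    print(export_url("doi:10.12345/EXAMPLE", "Dublin Core"))
```

The resulting URL can be opened in a browser or fetched with `urllib.request.urlopen(url).read()`. As noted above, export applies to published datasets.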


2.1.3 Licenses, Custom Licenses and Custom Terms of Use

The DEAL consortium recommends the “CC BY” licence; see “Open Access Means CC BY”: https://deal-konsortium.de/en/why-ccby
However, you can select any licence that fits your data and research, or define your own terms of use for your data.

Custom licenses and custom terms of use allow users to define specific conditions for the access and usage of their data. While the default setting is CC0 1.0, which effectively waives all rights and allows unrestricted use of the data, users can choose to apply more tailored licenses. These custom licenses can specify restrictions on commercial use, attribution requirements, or modifications.

Custom terms of use provide additional flexibility by outlining specific conditions related to data access, sharing, and redistribution. This ensures that data creators can maintain control over how their work is used while still making it available to others.


2.1.4 Access Control and Permissions

Access control and permissions in the software Dataverse are quite flexible. Access can be controlled at three levels:

Dataverse level access control
Dataset level access control
File level access control

User accounts can be granted roles that define which actions they are allowed to take on specific dataverses, datasets and/or files. Each role comes with a set of permissions that define the specific actions users may take.

This means you can also restrict individual files in your datasets. Restrictions apply to file access only, since metadata is always visible. If you restrict a file, do not do so without terms: explain why it is restricted, otherwise your data isn’t FAIR.
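The interplay of roles, permissions and restricted files can be illustrated with a small model. The sketch below is deliberately simplified and hypothetical: the role names follow Dataverse's built-in roles, but the permission sets are a reduced illustration, not the software's full permission matrix, which is documented in the Dataverse User Guide.

```python
from __future__ import annotations

# Simplified, hypothetical model of role-based file access in the spirit of Dataverse.
# Role names follow Dataverse's built-in roles; the permission sets are reduced for illustration.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "Member": {"ViewUnpublishedDataset"},
    "File Downloader": {"DownloadFile"},
    "Contributor": {"ViewUnpublishedDataset", "EditDataset", "DownloadFile"},
    "Curator": {"ViewUnpublishedDataset", "EditDataset", "DownloadFile", "PublishDataset"},
}


def has_permission(roles: list[str], permission: str) -> bool:
    """True if any of the user's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


def can_view_metadata(roles: list[str]) -> bool:
    # Metadata of a published dataset is always visible, whatever the roles.
    return True


def can_download(file_restricted: bool, roles: list[str]) -> bool:
    # Unrestricted files of a published dataset are open to everyone;
    # restricted files require a role that grants DownloadFile.
    return not file_restricted or has_permission(roles, "DownloadFile")
```

For example, an anonymous visitor (`roles=[]`) can read the metadata and download unrestricted files, but needs a role such as File Downloader for a restricted file. Separating "who may see metadata" from "who may download files" is what lets a dataset stay findable even when its files are restricted.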

The figure below shows an overview of roles and permissions in the software Dataverse.


2.1.5 Other useful features

File hierarchy
File previews (previewers are available for these file types)
Versioning (Dataset Versions, Replace Files)
Usage statistics and metrics
Guestbook
Faceted search
…

For more information see Dataverse Features
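Faceted search is also available through Dataverse's Search API at `/api/search`. The minimal sketch below builds a query URL that searches for a term, limits results to datasets and applies facet filters. It assumes the GRO.data base URL shown and uses the Search API parameters `q`, `type`, `fq` and `per_page`; the search term and the filter `publicationDate:2024` are hypothetical examples, and the available facet fields depend on the server configuration.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Assumption: base URL of the GRO.data production server.
GRO_DATA_URL = "https://data.goettingen-research-online.de"


def search_url(term: str, types: tuple[str, ...] = ("dataset",),
               filters: tuple[str, ...] = (), per_page: int = 10,
               base_url: str = GRO_DATA_URL) -> str:
    """Build a Search API URL; 'type' and 'fq' may be repeated."""
    params: list[tuple[str, str]] = [("q", term)]
    params += [("type", t) for t in types]    # dataverse, dataset and/or file
    params += [("fq", f) for f in filters]    # facet filters of the form "field:value"
    params.append(("per_page", str(per_page)))
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


if __name__ == "__main__":
    print(search_url("soil moisture", filters=("publicationDate:2024",)))
```

Passing the parameters as a list of pairs, rather than a dict, lets the same key (`type` or `fq`) appear several times, which is how several result types or facet filters are combined.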

2.2 GRO.data Servers

If you feel confident with data handling and the process of archiving and publishing data, feel free to go straight to the GRO.data production server; otherwise, use the GRO.data test server to try everything out before putting your data on the production server.

You log in to both servers with your Academic Cloud account (AcademicID): go to the server website and choose the Academic Cloud login.

Please keep in mind: never publish your REAL DATA on the GRO.data Test Server; real data must be published on the GRO.data Production Server!

2.3 Prepare your data

2.3.1. Preparation process for archiving and publishing data

The following diagram guides you in preparing your data thoroughly and professionally before publishing and archiving it. If you have a lightweight dataset to archive, you might finish your preparation quickly, but do allow more time than you expect to need.

Once everything is done, you can rely on the repository to do much of the work for you: for example, your data is backed up daily, and your published data is available around the clock, all year round, to all interested users.


2.3.2. File formats: consideration and recommendation for selecting suitable file types and formats for archiving and sharing

Here are the considerations and recommendations for selecting suitable file formats for archiving and sharing:

Use open, non-proprietary formats where possible: they do not depend on specific proprietary software and are therefore more likely to remain usable in the long term.
Prefer formats that are international standards.
Prefer formats that are commonly used in your research area, by research communities or by other interested parties.
Preferably use human-readable formats, e.g., plain text files rather than binary files.
Formats that were not developed by a vendor-independent standards organisation or community can, once their development has stabilised, be treated for practical purposes as equivalent to open formats (quasi-standards). One example is TIFF, which is proprietary but widely used and well documented; another is the archive format ZIP.
Formats that are not strictly platform-independent are acceptable if they are supported on Windows, macOS and Unix/Linux-based systems.
Text files should be in ASCII or UTF-8 encoding.
Avoid file formats that are not widely available and understood.
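The encoding recommendation above can be checked before upload. Because ASCII is a subset of UTF-8, any file that decodes as UTF-8 satisfies it. The sketch below lists text files in a folder that are not valid UTF-8; the set of suffixes treated as text is an assumption to adapt to your data.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: file suffixes treated as plain text; adjust to your dataset.
TEXT_SUFFIXES = {".txt", ".csv", ".tsv", ".md", ".json", ".xml"}


def is_utf8(data: bytes) -> bool:
    """True if the bytes decode as UTF-8 (pure ASCII always does)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_text_files(folder: str | Path) -> list[Path]:
    """Return text files below `folder` whose contents are not valid UTF-8."""
    return sorted(
        path for path in Path(folder).rglob("*")
        if path.is_file()
        and path.suffix.lower() in TEXT_SUFFIXES
        and not is_utf8(path.read_bytes())
    )
```

For example, `non_utf8_text_files("my_dataset")` (a hypothetical folder name) returns the files to re-save as UTF-8; a typical culprit is a CSV exported from a spreadsheet in a legacy encoding such as Latin-1.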

Currently, we recommend the following file types and formats for archiving and sharing: see Table 1.

More about Organising Files, File Formats and Document Data:

Organizing Data, researchdata.org
File Formats for Archiving, ETH Zürich, in German
File Formats, forschungsdaten.info, in German
Sustainability of Digital Formats, Library of Congress
Harvard Library Digital Preservation
A Study of Digital Preservation File Format, University of Illinois, 2014
Document Data, University Library Zurich
Documentation in Research Data Management, Crystal Lewis, Data Management in Large-Scale Education Research

Feel free to take a look at our Survival Kit for research data management, with short information and useful links on metadata, data security, versioning and data formats, as well as on preparing and publishing data.

2.3.3 A test Dataset with folders

Here we have a test dataset with folders as an example of a possible folder structure. Feel free to use your own folder structure to suit your dataset, e.g., with raw data and/or aggregated data.

If you upload a dataset with a folder structure as a ZIP file, the Dataverse software will automatically fill in the file path information for each file contained in the ZIP file. If the dataset contains more than one file and at least one of them has a non-empty directory path, the dataset page offers an option to switch between the traditional table view and a tree view of the files that shows the folder structure, as in the test dataset example.
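If you prepare such a ZIP file yourself, the paths stored inside it should be relative to the dataset's top folder, so that the file path information Dataverse fills in mirrors your folder structure. A minimal sketch using Python's standard `zipfile` module follows; the function name and folder names are hypothetical, and the ZIP file must be written outside the folder being packed.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def zip_dataset_folder(folder: str | Path, zip_path: str | Path) -> list[str]:
    """Pack every file below `folder` into `zip_path`, keeping relative paths.

    Returns the archive member names, e.g. 'raw_data/measurements.csv'.
    """
    folder = Path(folder).resolve()
    zip_path = Path(zip_path).resolve()
    if folder in zip_path.parents:
        # Writing the archive into the folder would make it try to include itself.
        raise ValueError("write the ZIP file outside the folder being packed")
    names: list[str] = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(folder).as_posix()  # e.g. 'docs/README.txt'
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

For a hypothetical layout such as `my_dataset/raw_data/…`, `my_dataset/aggregated_data/…` and `my_dataset/README.txt`, calling `zip_dataset_folder("my_dataset", "my_dataset.zip")` stores entries like `raw_data/…` without the top folder name, so the tree view shows `raw_data` and `aggregated_data` as top-level folders.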

2.3.4 Example Data Formats for some Special User Groups

To save you time and work, we have collected some special example data and data formats here for your convenience.

DICOM data: Examples from Harvard Dataverse

GRO.data User Guide

GRO.data User Guide: Part I

GRO.data overview
Account and Login
Creating a Dataverse
Adding Dataset

GRO.data User Guide: Part II

How to publish
Roles and permissions
Dataset Terms - License/Data Use Agreement
Data citation
Finding and using data

Feel free to visit our website Research Data Management at HSB City University of Applied Sciences, in German.

We endeavour to ensure the accuracy of the information on the wiki and provide references for it. Please note that we cannot guarantee the completeness of the information. We warmly welcome your feedback, comments and suggestions.

We are committed to ensuring that your intellectual property rights are not infringed. If you believe this has nonetheless happened, please notify us immediately.