Institutional Research Data Repository of HSB: GRO.data
Welcome to the wiki World of Research Data Management (RDM), the world of Open Science and Open Education!
Research data is a shared resource and a common good that is essential for excellent research, education, and science. Be prepared to share and publish your research data in accordance with Good Research Practice in order to gain academic credit and recognition for your research, and to contribute to the advancement of science in your field for the benefit of society and global unity.
The FDM@HSB team is here to support your research, providing RDM tools and services including the HSB Institutional Research Data Repository.
This RDM Wiki provides background information on the HSB institutional research data repository and demonstrates how it can support your research. It also provides the information you need to use it for your research data.
We, the FDM@HSB team, greatly appreciate you taking the time to read through the wiki for your research and sincerely thank you for your valuable contribution and commitment to using the repository! We are here to help and welcome your feedback, as this plays a key role in improving our service.
Should you have any questions or need further assistance, please do not hesitate to contact us at fdm@hs-bremen.de.
Let’s work together to
- archive and/or share your research data,
- enhance its discoverability,
- facilitate its reuse by others, and
- enable your contribution to the ideal of open science.
See also:
- Forschungsdaten-Policy der Hochschule Bremen, February 3, 2025, in German
- For Building Trust in Science: Good Research Practice (GRP), Leibniz Institute on Aging – Fritz Lipmann Institute
- FAIR principles and their first publication, Wilkinson et al. 2016.
1. Institutional research data repository of HSB: GRO.data
What is GRO.data?
- GRO.data is an institutional Research Data Repository. It is managed by the eResearch Alliance, a joint group of SUB, GWDG and UMG at the Göttingen Campus.
- A Repository in the digital world is a place where digital information is stored and can be found and retrieved.
- A Research Data Repository in the context of Research Data Management (RDM) is a digital storage space that enables researchers and academics to store and archive their data, and make them more discoverable, reusable and accessible.
- An institutional Research Data Repository generally serves the researchers or academics of that institution or organisation.
- GRO.data is based on the open source software Dataverse.
What is Dataverse?
- Dataverse is an open source web-based research data repository that is designed to help researchers and organisations to manage, archive, publish, share, and preserve their data.
- Dataverse was initially developed at Harvard University and has been used on almost all continents since 2006. It has a strong and vibrant community that constantly improves the software.
- It provides a robust and user-friendly environment for data management, ensuring that valuable research data is well organised, accessible, maintained and preserved over time.
- Dataverse promotes open science by facilitating data sharing and collaboration within the research community. It follows the FAIR principles, making data Findable, Accessible, Interoperable and Reusable. This enables researchers to more easily replicate the work of others.
See below for more details about the software Dataverse.
See also: Dataverse, Harvard Dataverse Repository.
To learn about the initial idea of Dataverse, please read [King 2007].
Why should I use a research data repository?
A research data repository helps you
- archive and/or publish your research data.
- ensure your research data can be found, and reused by others.
- follow best practice and act with integrity and transparency, which are core values of Good Research Practice.
- get the academic credit and academic recognition you deserve.
- achieve long-term trust, recognition and sustainable success.
- contribute to the ideal of open science and open education.
See also:
- Guidelines for Safeguarding Good Research Practice, Code of Conduct, DFG: Deutsche Forschungsgemeinschaft
- FAIR principles, GO FAIR Initiative
- The FAIR Data Principles, FORCE11: The Future of Research Communications and e-Scholarship
In general, we recommend selecting a subject-specific repository that fits your intended use, because it serves a discipline-specific community and is highly relevant to your scientific field.
If you are looking for a subject-specific repository, re3data is a global registry of research data repositories that can help facilitate your search.
GRO.data is a general-purpose research data repository adapted by the Göttingen eResearch Alliance. It helps you publish, archive, and share your research data.
Your data can be enriched with metadata and assigned persistent identifiers (PIDs) such as DOIs. The metadata are propagated to the central DOI database, which is used by search engines such as DataCite Search.
In addition, GRO.data offers features such as versioning, data citation, file previews, license assignment, file restriction, and controlled access permissions via roles and responsibilities. Furthermore, the functionalities are highly extensible.
HSB provides GRO.data as an institutional research data repository in cooperation with the GWDG, which manages and operates GRO.data.
Its primary purpose is to support the storage and archiving of "cold" research data that is infrequently accessed, and to make it available to others.
What is the GWDG?
- The GWDG serves as an IT competence centre and data centre with over 50 years of experience. It is one of the locations for a supercomputer operated by the North German Supercomputing Alliance, a collaboration of seven federal states in Northern Germany.
- The GWDG provides reliable, future-oriented and customer-oriented services and infrastructure to support and advance excellence in research and teaching. See also Mission Statement.
- Data processing has always been an important service of the GWDG. As technologies develop, the GWDG continues to provide a range of solutions to support science and research. Examples include data archiving, data management plans, a data repository, and publication management, which support researchers, scientists, data scientists and data stewards in working with their research data.
Where and how is my data stored? And how secure is my data?
- GRO.data is hosted in an academic data centre in accordance with German data protection and data security directives. Authentication and authorisation are handled via the Academic Cloud, using the DFN-AAI service and the software “Shibboleth”. DFN-AAI is a service infrastructure for research and education communities in Germany.
- The Academic Cloud was developed to support scientific use and research in Lower Saxony. It is provided by the GWDG in Göttingen, supported in planning and operation by LANIT, and co-funded by the MWK in Hannover. The Academic Cloud offers proven software applications as reliable cloud services. Users can access the services with a single account through a uniform portal.
- The data is stored in multiple locations across Göttingen to ensure redundancy. The data is backed up daily to disks and tapes, and the backup drives are not on the same machine as the application itself.
- The data storage process is one of the most important assets of the GWDG. For example, access to internal systems, including the software application and the storage system, requires two-factor authentication with dedicated hardware and is governed by highly structured authentication and authorisation management.
See also: Terms and conditions, Imprint, Privacy Notice of the GRO.data website.
What is a Cloud and Cloud Computing?
- Cloud computing is the on-demand use of computing resources, such as physical or virtual servers, data storage, networking capabilities and software applications, over the internet.
- This technology is more eco-friendly than traditional IT solutions: it reduces energy consumption and the ecological footprint while enhancing the performance, scalability, availability, accessibility and flexibility of IT services and their environment. See figure cloud computing.
Do you perhaps have a visual representation of GRO.data and the Academic Cloud?
See:
- Figure: GRO.data and the Academic Cloud
- Figure: Conceptual View of GRO.data and the Academic Cloud
See also:
- Northern Germany’s fastest computer, Göttingen’s supercomputer “Emmy”, fifth fastest in Germany and 47th in the world
- SUB, Göttingen State and University Library
- FAIR principles
2. Getting Started
Here we present the key concepts and main features of GRO.data/Dataverse in more detail to help you understand the software better. We also highlight the important aspects of preparing and handling your data before you store, archive and publish it, which are key to getting your data into the GRO.data repository smoothly and efficiently.
We strive to prepare everything we can to save you work and time. However, if you have any further questions, please feel free to contact us at: fdm@hs-bremen.de.
Let's walk through the preparation process together
- for publishing and/or archiving your intended research data to achieve your research goals,
- for others who will reuse your data,
- for a meaningful global world of research and science.
2.1 Dataverse Basics and Main Features
2.1.1. Data Organisation and Management in the software Dataverse
"Dataverse collection" and "Dataset" are two basic concepts of data organisation in the software Dataverse.
The word “dataverse” has a double meaning in this context. To avoid confusion, we write “dataverse” (lower case) for a “dataverse collection” and “Dataverse” (capitalised) for the software, i.e., “dataverse” ⇒ “dataverse collection”, and “Dataverse” ⇒ the software Dataverse.
A dataverse is a container for your datasets and other dataverses. It is a virtual archive in which you can organise and manage your data, and it works like a folder. It can be set up for individual researchers, departments, organisations, etc.
Each dataverse contains datasets and/or other dataverses, and each dataset contains descriptive metadata and data files, including a description of the methods, documentation and code associated with the data, i.e., dataset ⇒ metadata & data & code & documentation. All of this makes it easier for other researchers to discover and understand your dataset. To organise your research work, you can also nest dataverses within other dataverses if you wish.
All datasets and files are automatically assigned a DOI in the GRO.data repository when they are published. You can control access to the data: for example, you can open your data to the general public or restrict access to it. Permissions and access control can also be applied to a single file.
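The containment relationship described above (dataverses hold datasets and other dataverses; datasets hold metadata and files) can be pictured with a small Python sketch. This is only a conceptual model for illustration, not the internal data model of the Dataverse software; all class names, field names and example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    name: str
    restricted: bool = False  # access can be restricted for a single file


@dataclass
class Dataset:
    title: str
    metadata: dict = field(default_factory=dict)  # descriptive metadata
    files: List[DataFile] = field(default_factory=list)  # data, code, documentation


@dataclass
class DataverseCollection:
    name: str
    datasets: List[Dataset] = field(default_factory=list)
    children: List["DataverseCollection"] = field(default_factory=list)

    def all_datasets(self) -> List[Dataset]:
        """Collect datasets from this collection and all nested collections."""
        found = list(self.datasets)
        for child in self.children:
            found.extend(child.all_datasets())
        return found


# Hypothetical example: a department dataverse with one nested project dataverse.
project = DataverseCollection(
    name="Project A",
    datasets=[
        Dataset(
            title="Survey 2024",
            metadata={"author": "Doe, Jane"},
            files=[DataFile("README.txt"), DataFile("raw_data.csv", restricted=True)],
        )
    ],
)
department = DataverseCollection(name="Department X", children=[project])

titles = [dataset.title for dataset in department.all_datasets()]
print(titles)
```

The recursive `all_datasets` method mirrors the folder-like behaviour: a dataset placed in a nested dataverse is still reachable from the parent dataverse.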
2.1.2. Metadata Support
A) Supported Metadata
A dataset has three types of metadata schemata:
- Citation Metadata
This is required and is the default setting. It provides a standardised citation for datasets, making it easier for researchers to publish their data and get credit and recognition for their work. It comprises the metadata needed to generate a data citation.
- Domain Specific Metadata
Dataverse currently has special support for Social Science, Life Science, Geospatial, Journal, and Astronomy and Astrophysics datasets:
- Geospatial Metadata
- Social Science & Humanities Metadata
- Life Sciences Metadata
- Journal Metadata
- Astronomy and Astrophysics Metadata
- File-level Metadata
This varies depending on the type of data file. Examples include: editing file name as needed; adding file descriptions (file level “terms”); adding tags at the file level. See also Edit File Metadata
In addition, you can define custom metadata if you wish.
Before creating custom metadata, consider how to best utilise existing metadata, and carefully evaluate the necessity and usefulness of the custom metadata. While creating custom metadata offers advantages such as unique complementary information, it is also time-consuming. For a unique, large-scale and long-term project that produces significant amounts of unique data, using custom metadata can be beneficial. One example is the Collaborative Research Centre 990, CRC990. See CRC990.
B) Supported Metadata Export Formats
Once a dataset has been published, its metadata can be exported in a variety of other metadata standards and formats, which help make datasets more discoverable and usable in other systems, such as other data repositories. The following metadata export formats are available:
- Dublin Core
- DDI (Data Documentation Initiative Codebook 2.5)
- DDI HTML Codebook (A more human-readable, HTML version of the DDI Codebook 2.5 metadata export)
- DataCite 4
- JSON (native Dataverse Software format)
- OAI_ORE
- OpenAIRE
- Schema.org JSON-LD
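As a rough illustration of how such an export might be requested programmatically, the sketch below builds the URL for the dataset export endpoint of the Dataverse native API. It only constructs the URL and sends no request. The base URL and the DOI are placeholders, and the exporter identifiers are assumptions based on the Dataverse API guide; the identifiers actually offered can differ between installations, so please check them against the server you use.

```python
from urllib.parse import urlencode

# Placeholder: replace with the address of the repository server you use.
BASE_URL = "https://repository.example.org"

# Assumption: exporter identifiers as listed in the Dataverse API guide;
# the set offered by a given installation may differ.
EXPORTERS = {
    "Dublin Core": "oai_dc",
    "DDI": "ddi",
    "DataCite": "Datacite",
    "JSON": "dataverse_json",
    "OAI_ORE": "OAI_ORE",
    "Schema.org JSON-LD": "schema.org",
}


def export_url(persistent_id: str, format_name: str, base_url: str = BASE_URL) -> str:
    """Build the metadata export URL for a published dataset."""
    exporter = EXPORTERS[format_name]
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url}/api/datasets/export?{query}"


# Placeholder DOI for illustration only; it does not refer to a real dataset.
url = export_url("doi:10.1234/EXAMPLE", "DDI")
print(url)
```

The same exports are also available in the web interface of a published dataset, so the API route is mainly useful for automated harvesting.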
2.1.3 Licenses, Custom Licenses and Custom Terms of Use
The default setting is CC0 1.0, which effectively waives all rights and allows unrestricted use of the data. However, you can select any license that fits your data and research, or define your own terms of use for your data.
Custom licenses and custom terms of use allow users to define specific conditions for the access and use of their data. Custom licenses can specify restrictions on commercial use, attribution requirements, or modifications.
Custom terms of use provide additional flexibility by outlining specific conditions related to data access, sharing, and redistribution. This ensures that data creators can maintain control over how their work is used while still making it available to others.
2.1.4 Access Control and Permissions
Access control and permissions in the software Dataverse are quite flexible. Access can be controlled at different levels:
- Dataverse level access control
- Dataset level access control
- File level access control
User accounts can be granted roles that define which actions they are allowed to take on specific dataverses, datasets and/or files. Each role comes with a set of permissions, which define the specific actions that users may take.
This means you can also restrict any files in your datasets. Permissions here refer to access to files, as metadata is always visible. If you restrict access to a file, do not do so without stating terms: explain why it is restricted, otherwise your data is not FAIR.
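The idea that a role is a named set of permissions, assigned to a user for a specific dataverse, dataset or file, can be sketched as follows. The role names, permission names, users and objects are hypothetical and simplified; they are not the identifiers used by the Dataverse software.

```python
from typing import Dict, Set, Tuple

# Hypothetical, simplified roles: each role is a set of permitted actions.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "curator": {"view_unpublished", "edit_dataset", "publish_dataset", "download_file"},
    "contributor": {"view_unpublished", "edit_dataset", "download_file"},
    "file_downloader": {"download_file"},
}

# A role is granted to a user on one object (dataverse, dataset or file).
assignments: Dict[Tuple[str, str], str] = {
    ("alice", "dataverse:lab"): "curator",
    ("bob", "dataset:survey-2024"): "contributor",
    ("carol", "file:raw_data.csv"): "file_downloader",
}


def is_allowed(user: str, obj: str, action: str) -> bool:
    """Return True if the user's role on this object includes the action."""
    role = assignments.get((user, obj))
    return role is not None and action in ROLE_PERMISSIONS[role]


print(is_allowed("bob", "dataset:survey-2024", "edit_dataset"))
print(is_allowed("bob", "dataset:survey-2024", "publish_dataset"))
print(is_allowed("carol", "file:raw_data.csv", "download_file"))
```

In this sketch a user without an assignment on an object is simply denied; a real installation additionally inherits some roles from parent dataverses, which is left out here for brevity.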
The figure below shows an overview of roles and permissions in the software Dataverse.
See also: Roles & Permissions
2.1.5 Other useful features
- File hierarchy
- File previews: Previewers are available for these file types
- Versioning: Dataset Versions, Replace Files
- Usage statistics and metrics
- Guestbook
- Faceted search
- …
For more information see Dataverse Features
2.2 GRO.data Servers
If you feel confident with data handling and the process of archiving and publishing data, feel free to go directly to the GRO.data production server. Otherwise, you can use the GRO.data test server to try everything out before putting your data on the production server.
You log in to both servers via the Academic Cloud/AcademicID: go to the server website and follow the Academic Cloud login.
2.3 Prepare your data
2.3.1. Preparation process for archiving and publishing data
The following diagram guides you in preparing your data thoroughly and professionally before you publish and archive it. If you have a lightweight dataset to archive, you might finish your preparation quickly, but do plan more time than you expect to need.
Once everything is done, the repository software does much of the work for you: for example, your data is backed up daily, and your published data is available around the clock, all year round, to all interested users.
2.3.2. File formats: considerations and recommendations for selecting suitable file types and formats for archiving and sharing
Here are the considerations and recommendations for selecting suitable file formats for archiving and sharing:
- Open and non-proprietary data formats when possible: they do not depend on specific non-open software, so they have a high likelihood of long-term sustainability.
- File formats that are an international standard for your files.
- File formats that are commonly used in your research area, by research communities, or by other interested parties.
- Preferably data formats that are human-readable, e.g., plain text files in contrast to binary files.
- File formats that were not developed by a vendor-independent standards organisation or a community, but whose development has stabilised. For practical purposes, these formats can be regarded as quasi-standards equivalent to open formats. One example is TIFF, which is proprietary but widely used and well documented; another example is the archiving format ZIP.
- File formats that are not strictly platform-independent but are supported on Windows machines, Macs, and Unix- and Linux-based systems.
- Text files should be in ASCII or UTF-8 encoding.
- Avoid file formats that are not widely available and understood.
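The recommendation on text encoding can be checked before upload. The following minimal sketch tests whether the content of a file decodes as UTF-8; since ASCII is a subset of UTF-8, plain ASCII files pass as well. The function names are our own, and the check reads the whole file into memory, which is fine for typical text files but an assumption you may want to revisit for very large ones.

```python
def is_utf8_text(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 (ASCII is a subset of UTF-8)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_file(path: str) -> bool:
    """Read a file in binary mode and test whether it is valid UTF-8."""
    with open(path, "rb") as handle:
        return is_utf8_text(handle.read())


# Small self-contained demonstration with in-memory byte strings.
ascii_ok = is_utf8_text("plain ASCII text".encode("ascii"))
utf8_ok = is_utf8_text("Göttingen".encode("utf-8"))
latin1_ok = is_utf8_text("Göttingen".encode("latin-1"))
print(ascii_ok, utf8_ok, latin1_ok)  # True True False
```

A file that fails the check was most likely saved in a legacy encoding such as Latin-1 and can usually be re-saved as UTF-8 from a text editor.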
Currently, we recommend the following file types and formats for archiving and sharing: see Table 1.
More about Organising Files, File Formats and Document Data:
- Organizing Data, researchdata.org
- File Formats for Archiving, ETH Zürich, in German
- File Formats, forschungsdaten.info, in German
- Sustainability of Digital Formats, Library of Congress
- A Study of Digital Preservation File Format, University of Illinois, 2014
- Document Data, University Library Zurich
- Documentation in Research Data Management, Crystal Lewis, Data Management in Large-Scale Education Research
Feel free to take a look at our Survival Kit for research data management, which offers short information and useful links on metadata, data security, versioning and data formats, as well as on the preparation and publication of data.
2.3.3 A test Dataset with folders
Here we have a test dataset with folders as an example of a possible folder structure. Feel free to use your own folder structure that fits your dataset, e.g., with raw data and/or aggregated data etc.
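If you would like to try out a folder layout locally before uploading, a sketch like the following creates an empty skeleton in a temporary directory. The folder and file names are purely hypothetical and are not the structure of the test dataset mentioned above; adapt them to your own data.

```python
import tempfile
from pathlib import Path

# Hypothetical layout for illustration only; adapt it to your dataset.
LAYOUT = [
    "README.txt",
    "data/raw/measurements.csv",
    "data/aggregated/summary.csv",
    "code/analysis.py",
    "docs/methods.txt",
]


def create_layout(root: Path, layout=LAYOUT) -> list:
    """Create empty placeholder files and return their relative paths, sorted."""
    for relative in layout:
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    return sorted(
        path.relative_to(root).as_posix() for path in root.rglob("*") if path.is_file()
    )


# The temporary directory is removed again when the block ends.
with tempfile.TemporaryDirectory() as tmp:
    created = create_layout(Path(tmp))

print("\n".join(created))
```

Separating raw data, aggregated data, code and documentation in this way is one common convention; what matters most is that the structure is described in a README file.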
2.3.4 Example Data Formats for some Special User Groups
To maximise our support and to save your time and work, we collect some special example data and data formats here for your convenience.
DICOM data: Examples from Harvard Dataverse
GRO.data User Guide
- GRO.data overview
- Account and Login
- Creating a Dataverse
- Adding Dataset
- How to publish
- Roles and permissions
- Dataset Terms - License/Data Use Agreement
- Data citation
- Finding and using data
Feel free to visit our website Research Data Management at HSB City University of Applied Sciences, in German.
We are committed to ensuring that your intellectual property rights are not infringed upon. If you believe that your rights have been infringed, please notify us at once.









