Data Management

Definition: Data management comprises all the disciplines related to managing data as a valuable resource. (source: Wikipedia)

Data management is more than a folder to drop files into. It is an entire methodology that includes:

  • actual storage media,
  • retention plans,
  • compliance requirements, and
  • sharing capabilities.
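
As an illustration only, the four components above can be tracked as a simple checklist. The sketch below is a minimal example; the class name, field names, and sample values are all hypothetical and are not part of any university or funding-agency template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataManagementRecord:
    """Hypothetical record covering the four components listed above."""
    storage_media: str = ""        # where the data lives, e.g. "lab file server"
    retention_years: int = 0       # how long the data must be kept
    compliance_requirements: List[str] = field(default_factory=list)
    sharing: str = ""              # who may access the data, and how


def missing_components(record: DataManagementRecord) -> List[str]:
    """Return the names of components that have not been filled in yet."""
    missing = []
    if not record.storage_media:
        missing.append("storage media")
    if record.retention_years <= 0:
        missing.append("retention plan")
    if not record.compliance_requirements:
        missing.append("compliance requirements")
    if not record.sharing:
        missing.append("sharing capabilities")
    return missing


# A draft that covers storage and retention but nothing else yet.
draft = DataManagementRecord(storage_media="lab file server", retention_years=5)
print(missing_components(draft))
```

A check like this only confirms that each component has been considered; it says nothing about whether the choices satisfy a particular sponsor's requirements.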

While each university user’s needs will differ, a variety of storage options are offered by both on-campus and off-campus providers.

The university library offers assistance with bringing research data into compliance with National Science Foundation data management requirements. The sidebar at right links to some of those resources.

Additional resources for learning about data management plans are also available from the NSF, the NIH, and peer institutions.

This list also includes off-campus resources that are targeted at specific use cases. When the use case fits, however, local resources often cannot match them in either capacity or cost.

Data Storage

For departments, labs, PIs, and schools, Information and Technology Services (ITS) at U-M offers a variety of storage options. Backup services can also be purchased from ITS.

ITS Suite of Storage Options and Backup Service

Templates for Data Management Plans

Several U.S. funding agencies, including the National Science Foundation and the National Institutes of Health, require researchers to supply detailed, cost-effective plans for managing research data.

The DMPTool has been developed by several universities and organizations to help researchers meet these new requirements. The University of Michigan is one of the contributing institutions.

If you are submitting a proposal to NSF’s Directorate for Engineering, the College of Engineering has developed a template: NSF Data Management Plan Template for the College of Engineering.

Repositories

Databib
Databib is a tool for helping people identify and locate online repositories of research data. Users and bibliographers create and curate searchable records that describe data repositories.

Deep Blue
Deep Blue is the university’s user-contributed repository. Users can deposit their work, access topics of interest, and preserve work for future generations. It is intended to lower barriers to access and to disseminate the university’s intellectual and creative output widely.

Inter-university Consortium for Political and Social Research (ICPSR)
ICPSR is an international consortium of approximately 700 academic institutions and research organizations. ICPSR maintains a data archive of more than 500,000 files of research in the social sciences. It hosts 16 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields. ICPSR supports data deposit as well as dissemination and search capabilities. The University of Michigan Institute for Social Research manages the ICPSR collections.

HathiTrust Digital Library
The HathiTrust Digital Library is a collaboration of more than thirteen universities. The partnership focuses on preserving and providing access to digitized book and journal content from the partner library collections. The HathiTrust collections can be accessed through APIs, and users can add their own collections by using the Collection Builder tool.

High-Speed Storage for TeraGrid Systems

Indiana University Data Capacitor
The Indiana University Data Capacitor is primarily used as high-speed, high-bandwidth Lustre storage for NSF TeraGrid system users. Space allocations on the Data Capacitor can be requested in much the same way as computational resources, through the TeraGrid POPS website.

Massive, Scalable, Mid-Performance Storage

Amazon S3 and S3 Reduced Redundancy*
Amazon’s S3 storage services are a good fit for users who wish to leverage Amazon’s vast compute platform for computational analysis, data dispersion, and data redundancy. Web-based file management clients are available. In addition, Amazon’s computational offerings, such as MapReduce, are automatically configured to use S3 storage for output collection.

Google Storage for Developers*
The primary purpose of Google Storage for Developers is to house data used by applications built on the Google App Engine platform. The service can be used for general-purpose storage, but developing applications on the Google platform is the best way to take full advantage of this storage offering.

Rackspace Cloud Files*
Rackspace Cloud Files is the Rackspace storage solution. Like Amazon S3, Rackspace Cloud Files offers a RESTful API for application development. Cloud Files supports fairly large files and provides redundancy across geographic zones, similar to Amazon S3.
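
To give a sense of what a RESTful storage API looks like in practice, the sketch below builds (but does not send) an HTTP PUT request that would upload one object into a container. It is a generic illustration, not the Rackspace or Amazon API: the endpoint URL, the token value, the `X-Auth-Token` header name, and the container and object names are placeholder assumptions, and each vendor defines its own authentication scheme.

```python
import urllib.request

# Hypothetical placeholders: not a real vendor endpoint or credential.
ENDPOINT = "https://storage.example.org/v1/my-account"
AUTH_TOKEN = "replace-with-a-real-token"


def build_upload_request(container: str, object_name: str, data: bytes) -> urllib.request.Request:
    """Build a PUT request that addresses one object by its URL."""
    url = f"{ENDPOINT}/{container}/{object_name}"
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            "X-Auth-Token": AUTH_TOKEN,  # assumed header name; vendors differ
            "Content-Type": "application/octet-stream",
        },
    )


request = build_upload_request("results", "run-001.csv", b"a,b\n1,2\n")
print(request.get_method(), request.full_url)
# The request is deliberately not sent; urllib.request.urlopen(request)
# would perform the upload against a real endpoint.
```

The point of the REST style is that every stored object has its own URL, so uploads, downloads, and deletions map onto the standard HTTP verbs PUT, GET, and DELETE.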

General Purpose Storage

Dropbox*
Dropbox is a general-purpose, publicly offered storage service intended for users who simply need a location to drop or back up files.

Google Storage*
Slightly different from Google Storage for Developers, general Google Storage is a cost-effective way to extend the capacity of established Google App storage. Google App storage spans multiple applications, including Picasa, Gmail, Documents, Sites, and more. It is primarily a lower performance

Data Transfer

Globus Online
Many users may be more familiar with Globus as a transfer protocol for high-performance computing grids and clusters. Globus has been used as data transfer software on HPC systems, and also in novel solutions involving MATLAB computations and other middleware data transfer applications. Globus Online is the online interface to Globus transfer capabilities.

Data Federation

iRODS (Integrated Rule-Oriented Data System)
iRODS is a policy-based data management, data sharing, and preservation system. While deploying the overall iRODS architecture is typically a project for university-wide IT groups, any device that can hold data can be federated to an iRODS system. Lab data servers, library archives, and even a personal hard drive can potentially serve data to the iRODS system. Once on iRODS, data can be manipulated by application APIs and moved between repositories as appropriate. iRODS offers the potential federation of repositories of all types, across university, regional, and national boundaries.

Additional Data Management Plan Resources From Other Universities

Michigan State University Data Management Planning Guide
Pennsylvania State Research Data Management Guidelines
University of Nebraska-Lincoln Data Management Plan Guides for NSF, NIH, NASA, NEH, NIJ

*Disclaimer: Researchers should not assume that third-party vendors or their applications or services have been institutionally scrutinized for compliance with relevant university policy, privacy, data security, and legal requirements. Consequently, it is important that you make a good-faith effort to ensure that the most critical compliance risks are reasonably accounted for before completing registration or assenting to a click-through agreement with any third-party cloud computing service.

We also encourage you to review the vendor’s Terms of Use, especially language dealing with accounts, passwords, privacy, security, and data recovery. Unless specifically stated and contracted for by Procurement Services, U-M does not endorse any third-party vendors; all transactions are between the researcher and the vendor.