GOALS OF DATA SHARING
Data sharing promotes many goals of the NIH research endeavor. It is particularly
important for unique
data that cannot be readily replicated. Data sharing allows scientists
to expedite the translation of research results into knowledge, products,
and procedures to improve human health.
There are many reasons to share data from NIH-supported studies. Sharing
data reinforces open scientific inquiry, encourages diversity of analysis
and opinion, promotes new research, makes possible the testing of new or alternative
hypotheses and methods of analysis, supports studies on data collection methods
and measurement, facilitates the education of new researchers, enables the
exploration of topics not envisioned by the initial investigators, and permits
the creation of new datasets when data from multiple sources are combined.
In NIH's view, all data should be considered for data sharing. Data should
be made as widely and freely available as possible while safeguarding the
privacy of participants, and protecting confidential and proprietary data.
To facilitate data sharing, investigators submitting a research application
requesting $500,000 or more of direct costs in any single year to NIH on or
after October 1, 2003 are expected to include a plan for sharing final research
data for research purposes, or state why data sharing is not possible.
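The applicability test in the preceding sentence is simple enough to express as a check. The following is a minimal Python sketch, assuming an application is described by its requested direct costs for each budget year and its submission date; the function and constant names are hypothetical and not part of any NIH system. The threshold applies to any single year, not to the project total.

```python
from datetime import date
from typing import Sequence

# Values taken from the policy text; names are hypothetical.
DIRECT_COST_THRESHOLD = 500_000     # US dollars of direct costs in any single year
EFFECTIVE_DATE = date(2003, 10, 1)  # applies to applications submitted on/after this date


def plan_expected(annual_direct_costs: Sequence[int], submitted: date) -> bool:
    """Return True if a data-sharing plan (or an explanation of why sharing
    is not possible) is expected for this application."""
    # The threshold is checked year by year, not against the project total.
    exceeds_in_some_year = any(cost >= DIRECT_COST_THRESHOLD for cost in annual_direct_costs)
    return exceeds_in_some_year and submitted >= EFFECTIVE_DATE


print(plan_expected([400_000] * 5, date(2004, 1, 15)))                # False
print(plan_expected([450_000, 520_000, 300_000], date(2003, 11, 3)))  # True
```

Under these assumptions, a five-year request of $400,000 per year ($2 million in total) would not trigger the expectation, while a request with a single $520,000 year would.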
APPLICABILITY
The NIH policy on data sharing applies:
- To the sharing of final research data for research purposes.
- To basic research, clinical studies, surveys, and other types of research
supported by NIH. It applies to research that involves human subjects and
laboratory research that does not involve human subjects. It is especially
important to share unique data that cannot be readily replicated.
- To applicants seeking $500,000 or more in direct costs in any year of
the proposed project period through grants, cooperative agreements, or contracts.
- To research applications submitted beginning October 1, 2003.
Policies with respect to data sharing vary across countries. Investigators
from foreign institutions and U.S. investigators collecting data in other
countries should familiarize themselves with the policies governing data sharing
in the countries in which they plan to work and should address any specific limitations
in the data-sharing plan in their application.
Even if NIH support is sought to transform or link datasets (as opposed to
producing a new set of data), the investigator should still include a data-sharing
plan in the application. If there are limitations associated with a data-sharing
agreement for the original data that preclude subsequent sharing, then the
applicant should explain this in the application.
IMPLEMENTATION
The NIH data-sharing policy applies to applicants seeking $500,000 or more
in direct costs in any year of the proposed research. The $500,000 threshold
corresponds to the threshold set in the October 16, 2001 NIH Guide, where
applicants requesting $500,000 or more in direct costs for any year must seek
agreement by NIH Institute or Center (IC) staff to accept assignment of their
application at least 6 weeks prior to the anticipated submission date.
(See http://grants2.nih.gov/grants/guide/notice-files/NOT-OD-02-004.html
for details).
That policy directs applicants to contact IC program staff, in writing or by
telephone, during the development of the application but no later than
6 weeks before the anticipated submission date. Applicants are encouraged
to discuss their proposed data-sharing plan with IC program staff at that
time.
Final research data are recorded factual material commonly accepted in the
scientific community as necessary to document, support, and validate research
findings. This does not mean summary statistics or tables; rather, it means
the data on which summary statistics and tables are based. For most studies,
final research data will be a computerized dataset. For example, the final
research data for a clinical study would include the computerized dataset
upon which the accepted publication was based, not the underlying pathology
reports and other clinical source documents. For some but not all scientific
areas, the final dataset might include both raw data and derived variables,
which would be described in the documentation associated with the dataset.
Given the breadth and variety of science that NIH supports, neither the precise
content for the data documentation, nor the formatting, presentation, or transport
mode for data is stipulated. What is sensible in one field or one study may
not work at all for others. It would be helpful for members of multiple disciplines
and their professional societies to discuss data sharing, determine what
standards and best practices should be proposed, and create a social environment
that supports data sharing. NIH is planning to convene workshops where investigators
with experience in data sharing will share their expertise with others. These
workshops will address areas such as cleaning and formatting data, writing
documentation, redacting data to protect subjects' identities and proprietary
information, and estimating costs to prepare documentation and data for sharing.
When the Principal Investigator (PI) and the authorized institutional official
sign the face page of an NIH application, they are assuring compliance with
policies and regulations governing research awards. NIH expects grantees to
follow these rules and to conduct the work described in the application. Thus,
if an application describes a data-sharing plan, NIH expects that plan to
be enacted. If progress has been made with the data-sharing plan, then
the grantee should note this in the progress report. In the final progress
report, if not sooner, the grantee should note what steps have been taken
with respect to the data-sharing plan. In the case of noncompliance (depending
on its severity and duration) NIH can take various actions to protect the
Federal Government's interests. In some instances, for example, NIH may make
data sharing an explicit term and condition of subsequent awards.
Grantees should note that, under the NIH Grants Policy Statement, they are
required to keep the data for 3 years following closeout of a grant or contract
agreement. (Contracts may specify different time periods.) For the most part,
NIH makes awards to institutions and not individuals (with very few exceptions,
such as F32 awards). Thus, the grantee institution may have additional policies
and procedures regarding the custody, distribution, and required retention
period for data produced under research awards.
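As an illustration of the retention arithmetic, the following Python sketch computes the date 3 years after closeout. It is a hypothetical helper, not an NIH tool. The handling of a February 29 closeout (rolling back to February 28) is an assumption, and because contracts and grantee institutions may set different periods, the number of years is a parameter.

```python
from datetime import date

RETENTION_YEARS = 3  # default from the NIH Grants Policy Statement, as described above


def retention_end(closeout: date, years: int = RETENTION_YEARS) -> date:
    """Return the date `years` years after closeout (hypothetical helper)."""
    try:
        return closeout.replace(year=closeout.year + years)
    except ValueError:
        # Assumption: a Feb 29 closeout maps to Feb 28 in a non-leap target year.
        return closeout.replace(year=closeout.year + years, day=28)


print(retention_end(date(2005, 6, 30)))  # 2008-06-30
print(retention_end(date(2004, 2, 29)))  # 2007-02-28
```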
Timeliness of Data Sharing
Recognizing that the value of data often depends on their timeliness,
data sharing should occur in a timely fashion. NIH expects the timely release
and sharing of data to be no later than the acceptance for publication of
the main findings from the final dataset. The specific time will be influenced
by the nature of the data collected. Data from small studies can be analyzed
and submitted for publication relatively quickly. If data from large epidemiologic
or longitudinal studies are collected over several discrete time periods
or waves, it is reasonable to expect that the data would be released in
waves as data become available or main findings from waves of the data are
published. NIH recognizes that the investigators who collected the data
have a legitimate interest in benefiting from their investment of time and
effort. NIH continues to expect that the initial investigators may benefit
from first and continuing use but not from prolonged exclusive use.
Human Subjects and Privacy Issues
The rights and privacy of human subjects who participate in NIH-sponsored
research must be protected at all times. It is the responsibility of the
investigators, their Institutional Review Board (IRB), and their institution
to protect the rights of subjects and the confidentiality of the data. Prior
to sharing, data should be redacted to strip all identifiers, and effective
strategies should be adopted to minimize risks of unauthorized disclosure
of personal identifiers. Stripping a dataset of items that could identify
individual participants is referred to by several different terms, such
as "data redaction," "de-identification of data," and
"anonymizing data." In addition to removing direct identifiers, e.g., name,
address, telephone numbers, and Social Security Numbers, researchers should
consider removing indirect identifiers and other information that could
lead to "deductive disclosure" of participants' identities. Deductive
disclosure of individual subjects becomes more likely when several unusual
variables occur together, so that a subject has a rare combination of characteristics. Samples
drawn from small geographic areas, rare populations, and linked datasets
can present particular challenges to the protection of subjects' identities.
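One common way to screen for deductive-disclosure risk, borrowed from the statistical disclosure control literature rather than prescribed by NIH, is a k-anonymity-style small-cell check: after removing direct identifiers, count how many records share each combination of indirect identifiers and flag combinations held by fewer than k records. The Python sketch below assumes records are dictionaries and uses hypothetical column names (zip3, age, sex) and an illustrative threshold of k = 5.

```python
from collections import Counter

# Hypothetical column names; a real dataset will have its own.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "ssn"}
QUASI_IDENTIFIERS = ("zip3", "age", "sex")


def strip_direct_identifiers(records):
    """Return copies of the records with direct identifiers removed."""
    return [{k: v for k, v in r.items() if k not in DIRECT_IDENTIFIERS} for r in records]


def rare_combinations(records, keys=QUASI_IDENTIFIERS, k=5):
    """Return {combination: count} for indirect-identifier combinations
    shared by fewer than k records (candidates for deductive disclosure).
    k=5 is an illustrative choice, not an NIH requirement."""
    counts = Counter(tuple(r[key] for key in keys) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


records = [{"name": f"P{i}", "ssn": f"000-00-000{i}", "zip3": "100", "age": 40, "sex": "F"}
           for i in range(5)]
records.append({"name": "Q", "ssn": "000-00-0009", "zip3": "998", "age": 101, "sex": "M"})

cleaned = strip_direct_identifiers(records)
print(rare_combinations(cleaned))  # {('998', 101, 'M'): 1}
```

Flagged combinations are candidates for remedies such as withholding or coarsening the variables involved. A passing check does not guarantee that a dataset cannot be linked to outside sources.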
Investigators may use different methods to reduce the risk of subject identification.
One possible approach is to withhold some part of the data. Another approach
is to statistically alter the data in ways that will not compromise secondary
analyses but will protect individual subjects' identities. Alternatively,
an investigator may restrict access to the data at a controlled site, sometimes
referred to as a data
enclave. Some investigators may employ hybrid methods, such as releasing
a highly redacted dataset for general use but providing access to more sensitive
data with stricter controls through a data enclave.
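The hybrid approach can be sketched as deriving two versions of the same dataset. In the minimal Python example below, the public-use version withholds some fields and top-codes age (reporting everyone older than 90 as 90), while the complete records are reserved for access through an enclave. The field names, the cutoff, and the choice of top-coding as the alteration technique are all illustrative assumptions.

```python
# Hypothetical field names and cutoff, for illustration only.
WITHHELD_FIELDS = {"zip5", "date_of_birth"}  # left out of the public-use file
AGE_TOP_CODE = 90                            # ages above this are reported as 90


def public_use_version(record):
    """Derive a redacted public-use record from a full record."""
    out = {k: v for k, v in record.items() if k not in WITHHELD_FIELDS}
    if out.get("age") is not None and out["age"] > AGE_TOP_CODE:
        out["age"] = AGE_TOP_CODE  # top-coding: 90 means "90 or older"
    return out


def mixed_mode_release(records):
    """Split one dataset into a general-use file and an enclave-only file."""
    return {
        "public": [public_use_version(r) for r in records],
        "enclave": [dict(r) for r in records],  # full detail, controlled access only
    }


release = mixed_mode_release(
    [{"id": 1, "age": 97, "zip5": "10001", "date_of_birth": "1906-01-01", "dx": "A"}]
)
print(release["public"])  # [{'id': 1, 'age': 90, 'dx': 'A'}]
```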
Researchers who seek access to individual-level data are typically required
to enter into a data-sharing agreement. Data-sharing agreements, which go
by many terms, including "license agreements" and "data
distribution agreements," generally include requirements to protect
participants' privacy and data confidentiality. They may prohibit the recipient
from transferring the data to other users or require that the data be used
for research purposes only, among other provisions, and they may stipulate
penalties for violations. For further information on these alternative mechanisms
to share data while protecting participant confidentiality, see also the
section concerning "Methods
for Data Sharing." In most instances, sharing and archiving of
data is possible without compromising confidentiality and privacy rights.
The procedures adopted to share data while protecting privacy should be
individually tailored to the specific dataset.
Investigators seeking NIH support for clinical trials may wish to consider
several factors as they develop their data-sharing plan. Researchers who
are planning clinical trials and intend to share the resulting data should
think carefully about the study design, the informed consent documents,
and the structure of the resulting dataset prior to the initiation of the
study. For example, many early phase clinical trials use small samples,
which make it difficult to protect the privacy of the participants. Furthermore,
some study designs afford greater privacy protection to subjects than others.
For example, longitudinal research poses challenges because of the need to
retain identifiers in order to link individual-specific data collected at
different time points.
NIH recognizes that the sharing of data from clinical trials and under
other situations may require making the data anonymous or sharing under
more controlled means, as through a restricted access data enclave. Sharing
through data enclaves would grant access only to researchers who agree to
preserve the privacy of subjects and provide means to protect the confidentiality
of the data.
Investigators who are working for or who are themselves covered
entities under the Health Insurance Portability and Accountability Act
(HIPAA) must consider issues related to the Privacy Rule, a Federal regulation
under HIPAA that governs the protection of individually identifiable health
information. The Department of Health and Human Services (DHHS) provides
guidance on research and the Privacy Rule on the website of its Office for Civil Rights:
http://www.hhs.gov/ocr/
It should be noted that the Privacy Rule is relatively new, and additional
information and guidance will be shared on the DHHS website as soon as it
is available.
If research participants are promised that their data will not be shared
with other researchers, the application should explain the reasons for such
promises. Such promises should not be made routinely and without adequate
justification. For the most part, it is not appropriate for the initial
investigator to place limits on the research questions or methods other
investigators might pursue with the data. It is also not appropriate for
the investigator who produced the data to require coauthorship as a condition
for sharing the data.
Many research efforts supported by NIH do not include human subjects. Final
research datasets from studies that do not include human subjects generally
should not be constrained by the limitations deemed necessary and appropriate
for human subjects.
Proprietary Data
Although Small Business Innovation Research (SBIR) applicants are also expected
to address data sharing in their applications, under the Small Business
Act, SBIR grantees may withhold their data for 4 years after the end of
the award. The Small Business Act provides authority for NIH to protect
from disclosure and nongovernmental use all SBIR data developed from work
performed under an SBIR funding agreement for a period of 4 years after
the closeout of either a phase I or phase II grant unless NIH obtains permission
from the awardee to disclose these data. The data rights protection period
lapses only upon expiration of the protection period applicable to the SBIR
award, or by agreement between the small business concern and NIH.
Issues related to proprietary data also can arise when cofunding is provided
by the private sector (e.g., the pharmaceutical or biotechnology industries)
with corresponding constraints on public disclosure. NIH recognizes the
need to protect patentable and other proprietary data. Any restrictions
on data sharing due to cofunding arrangements should be discussed in the
data-sharing plan section of an application and will be considered by program
staff. While NIH understands that an institution's desire to exercise its
intellectual property rights may justify a need to delay disclosure of research
findings, a delay of 30 to 60 days is generally viewed as a reasonable period
for such activity.
Methods for Data Sharing
There are many ways to share data:
- Under the auspices of the PI
- Through a data archive
- Through a data enclave
- Mixed mode sharing
The method for sharing that an investigator selects is likely to depend
on several factors, including the sensitivity of the data, the size and
complexity of the dataset, and the volume of requests anticipated. Investigators
sharing under their own auspices may simply mail a CD with the data to
the requestor, or post the data on their institutional or personal Website.
Although not a condition for data access, some investigators sharing under
their own auspices may form collaborations with other investigators seeking
their data in order to pursue research of mutual interest. Others may
simply share the data by transferring them to a data archive facility
to distribute more widely to interested users, to maintain associated
documentation, and to meet reporting requirements. Data archives can be
particularly attractive for investigators concerned about a large volume
of requests, vetting frivolous or inappropriate requests, or providing
technical assistance for users seeking help with analyses.
There are several mechanisms for data sharing that investigators can
use. For example, investigators sharing under their own auspices should
consider using a data-sharing agreement to impose appropriate limitations
on users. Such an agreement usually indicates the criteria for data access
and whether or not there are any conditions for research use; it can also incorporate
privacy and confidentiality standards to ensure data security at the recipient
site and prohibit manipulation of data for the purpose of identifying
subjects. Many examples of data-sharing agreements for specific datasets
are available on the Internet, including the following:
AHRQ National Inpatient Sample at:
http://www.ahcpr.gov/data/hcup/datause.htm
Russian Longitudinal Monitoring Survey at:
http://www.cpc.unc.edu/dataarch/iprimary/rlms.html
Center for Medicare and Medicaid Services Data at:
http://hrsonline.isr.umich.edu/rda/userdocs/cmsdua.pdf
Alternatively, researchers may want to add their data to a data archive
or a data enclave. Datasets that cannot be distributed to the general
public, for example, because of participant confidentiality concerns,
third-party licensing or use agreements that prohibit redistribution,
or national security considerations, can be accessed through a data
enclave. A data enclave provides a controlled, secure environment in
which eligible researchers can perform analyses using restricted data
resources.
Investigators may also wish to develop a "mixed mode" for
data sharing that allows for more than one version of the dataset and
provides different levels of access depending on the version. For example,
a redacted dataset could be made available for general use, but stricter
controls through a data enclave would be applied if access to more sensitive
data were required.
Investigators will need to determine which method of data sharing is
best for their particular dataset. The Data Sharing Workbook (available in PDF
and MS Word formats) provides information and examples of how others have shared
data.
Data Documentation
Regardless of the mechanism used to share data, each dataset will require
documentation. (Some fields refer to data documentation by other terms,
such as metadata or codebooks.) Proper documentation is needed to ensure
that others can use the dataset and to prevent misuse, misinterpretation,
and confusion. Documentation provides information about the methodology
and procedures used to collect the data, details about codes, definitions
of variables, variable field locations, frequencies, and the like. The precise
content of documentation will vary by scientific area, study design, the
type of data collected, and characteristics of the dataset.
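Because documentation content varies by field, there is no single required format. As one illustration, the Python sketch below assembles a plain codebook listing each variable's position, definition, and value frequencies from a list of records. The function names and the dictionary-based record layout are hypothetical; real codebooks would typically also describe collection methods and the meaning of each code in prose.

```python
from collections import Counter


def codebook(records, definitions):
    """Build a simple codebook: one entry per variable, in file order, with
    its column position, definition, and value frequencies."""
    book = []
    for position, var in enumerate(definitions, start=1):
        freqs = Counter(r.get(var) for r in records)
        book.append({
            "position": position,  # field location in the exported file
            "variable": var,
            "definition": definitions[var],
            "frequencies": dict(freqs.most_common()),
        })
    return book


def render(book):
    """Format the codebook as plain text."""
    lines = []
    for entry in book:
        lines.append(f"{entry['position']:>3}  {entry['variable']}: {entry['definition']}")
        for value, n in entry["frequencies"].items():
            lines.append(f"       {value!r}: {n}")
    return "\n".join(lines)


records = [{"sex": "F", "smoker": 1}, {"sex": "M", "smoker": 0}, {"sex": "F", "smoker": 0}]
definitions = {"sex": "Self-reported sex (F/M)", "smoker": "Current smoker (1 = yes, 0 = no)"}
print(render(codebook(records, definitions)))
```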
It is appropriate for scientific authors to acknowledge the source of data
upon which their manuscript is based. Many investigators include this information
in the methods and/or reference sections of their manuscripts. Journals
generally include an acknowledgment section, in which the authors can recognize
people who helped them gain access to the data. Authors using shared data
should check the policies of the journal to which they plan to submit to
determine the precise location in the manuscript for such acknowledgment.
Most journals now expect that DNA and amino acid sequences that appear in
articles will be submitted to a sequence database before publication.
Funds for Data Sharing
NIH recognizes that it takes time and money to prepare data for sharing.
Thus, applicants can request funds for data sharing and archiving in their
grant application. (See also the section on "What
to Include in an NIH Application.") Investigators who incorporate data
sharing in the initial design of the study may more readily and economically
establish adequate procedures for protecting the identities of participants
and share a useful dataset with appropriate documentation.
Review Considerations
Reviewers will not factor the proposed data-sharing plan into the determination
of scientific merit or priority score. Program staff will be responsible
for overseeing the data-sharing policy and for assessing the appropriateness
and adequacy of the proposed data-sharing plan.
WHAT TO INCLUDE IN AN NIH APPLICATION
Investigators seeking $500,000 or more in direct costs in any year should
include a description of how final research data will be shared, or explain
why data sharing is not possible. It is expected that the data-sharing discussion
will be provided primarily in the form of a brief paragraph immediately
following the Research Plan Section of the PHS 398 application form (i.e.,
immediately after I. Letters of Support), and will not count towards the
application page limit.
Data Sharing Plan (to follow immediately after the Research Plan Section)
The precise content of the data-sharing plan will vary, depending on the
data being collected and how the investigator is planning to share the data.
Applicants who are planning to share data may wish to describe briefly the
expected schedule for data sharing, the format of the final dataset, the documentation
to be provided, whether or not any analytic tools also will be provided, whether
or not a data-sharing agreement will be required and, if so, a brief description
of such an agreement (including the criteria for deciding who can receive
the data and whether or not any conditions will be placed on their use), and
the mode of data sharing (e.g., under their own auspices by mailing a disk
or posting data on their institutional or personal website, through a data
archive or enclave). Investigators choosing to share under their own auspices
may wish to enter into a data-sharing agreement.
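For applicants who find it useful to draft the plan from a checklist, the Python sketch below models the elements listed above as a dataclass and renders them as a short paragraph. The class, field names, and wording are hypothetical illustrations, not an NIH template.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataSharingPlan:
    """Hypothetical checklist of plan elements; not an NIH template."""
    schedule: str                         # when data will be released
    dataset_format: str                   # format of the final dataset
    documentation: str                    # documentation to be provided
    mode: str                             # own auspices, archive, enclave, or mixed
    analytic_tools: Optional[str] = None  # None if no tools will be provided
    agreement: Optional[str] = None       # criteria/conditions, or None if no agreement

    def as_paragraph(self) -> str:
        """Render the elements as a short plan paragraph."""
        parts = [
            f"Data will be shared {self.schedule}.",
            f"The final dataset will be provided as {self.dataset_format}, "
            f"accompanied by {self.documentation}.",
            f"Analytic tools will also be provided: {self.analytic_tools}."
            if self.analytic_tools else "No analytic tools will be provided.",
            f"Access will require a data-sharing agreement: {self.agreement}."
            if self.agreement else "No data-sharing agreement will be required.",
            f"Data will be distributed {self.mode}.",
        ]
        return " ".join(parts)


plan = DataSharingPlan(
    schedule="no later than acceptance for publication of the main findings",
    dataset_format="comma-separated files",
    documentation="a codebook",
    mode="through a data archive",
    agreement="research use only, no attempts to identify participants",
)
print(plan.as_paragraph())
```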
References to data sharing may also be appropriate in other sections of the
application, as discussed below.
Budget and Budget Justification Sections
Applicants may request funds in their application for data sharing. If
funds are being sought, the applicant should address the financial issues
in the budget and budget justification sections. Some investigators have
more experience than others in estimating costs associated with preparing
the dataset and associated documentation, and providing support to data
users. As investigators gain experience with the process, their ability
to estimate costs will improve. Investigators working with archives can
get help with data preparation and cost estimation. Investigators who are
concerned about paying for data-sharing costs at the end of their grant
can make prior arrangements with archives. Investigators facing considerable
delays in the preparation of the final dataset for sharing should consult
with NIH program staff about how to manage this situation, such as by requesting
a no-cost extension.
Background and Significance Section (PHS 398 Research Plan Section B)
If support is being sought to develop a large database that will serve
as an important resource for the scientific community, the applicant may
wish to make a statement about this in the significance section of the application.
Human Subjects Section (PHS 398 Research Plan Section E)
If the research involves human subjects and the data are intended to be
shared, the application should discuss how the rights and confidentiality
of participants would be protected. In the Human Subjects section of the
application, the applicant should discuss the potential risks to research
participants posed by data sharing and steps taken to address those risks.
EXAMPLES OF DATA-SHARING PLANS
The precise content and level of detail to be included in a data-sharing
plan depends on several factors, such as whether or not the investigator is
planning to share data, the size and complexity of the dataset, and the like.
Below are several examples of data-sharing plans.
Example 1
The proposed research will involve a small sample (fewer than 20 subjects)
with Williams syndrome recruited from clinical facilities in the New York City
area. This rare craniofacial disorder is associated with distinguishing
facial features, as well as mental retardation. Even with the removal of
all identifiers, we believe that it would be difficult if not impossible
to protect the identities of subjects given the physical characteristics
of subjects, the type of clinical data (including imaging) that we will
be collecting, and the relatively restricted area from which we are recruiting
subjects. Therefore, we are not planning to share the data.
Example 2
The proposed research will include data from approximately 500 subjects
being screened for three bacterial sexually transmitted diseases (STDs)
at an inner city STD clinic. The final dataset will include self-reported
demographic and behavioral data from interviews with the subjects and laboratory
data from urine specimens provided. Because the STDs being studied are reportable
diseases, we will be collecting identifying information. Even though the
final dataset will be stripped of identifiers prior to release for sharing,
we believe that there remains the possibility of deductive disclosure of
subjects with unusual characteristics. Thus, we will make the data and associated
documentation available to users only under a data-sharing agreement that
provides for:
1.) a commitment to using the data only for research purposes and not
to identify any individual participant;
2.) a commitment to securing the data using appropriate computer technology;
and
3.) a commitment to destroying or returning the data after analyses are
completed.
Example 3
This application requests support to collect public-use data from a survey
of more than 22,000 Americans over the age of 50 every 2 years. Data products
from this study will be made available without cost to researchers and analysts
through the study website (https://ssl.isr.umich.edu/hrs/).
User registration is required in order to access or download files. As
part of the registration process, users must agree to the conditions of
use governing access to the public release data, including restrictions
against attempting to identify study participants, destruction of the data
after analyses are completed, reporting responsibilities, restrictions
on redistribution of the data to third parties, and proper acknowledgment
of the data resource. Registered users will receive user support, as well
as information related to errors in the data, future releases, workshops,
and publication lists. The information provided to users will not
be used for commercial purposes, and will not be redistributed
to third parties.
DEFINITIONS
Covered Entity
A covered entity is defined as a health care clearinghouse, health plan,
or health care provider that electronically transmits health information
in connection with a transaction for which DHHS has adopted standards under
the Health Insurance Portability and Accountability Act (HIPAA). An example
of a researcher who may be a covered entity is a physician who electronically
bills for health care services and conducts clinical trials. A set of decision
tools on "Am I a covered entity?" is available from the DHHS Office for Civil
Rights Website:
http://www.hhs.gov/ocr/
Final Research Data
Recorded factual material commonly accepted in the scientific community
as necessary to document and support research findings. This does not mean
summary statistics or tables; rather, it means the data on which summary
statistics and tables are based. For the purposes of this policy, final
research data do not include laboratory notebooks, partial datasets, preliminary
analyses, drafts of scientific papers, plans for future research, peer review
reports, communications with colleagues, or physical objects, such as gels
or laboratory specimens. NIH has separate guidance on the sharing of research
resources, which can be found at:
http://grants.nih.gov/grants/policy/nihgps_2003/NIHGPS_Part7.htm#_Toc54600131
Restricted Data
Datasets that cannot be distributed to the general public, because of,
for example, participant confidentiality concerns, third-party licensing
or use agreements, or national security considerations.
Timeliness
In general, NIH considers the timely release and sharing of data to be
no later than the acceptance for publication of the main findings from the
final dataset. However, the actual time will be influenced by the nature
of the data collected.
Unique Data
Data that cannot be readily replicated. Examples of studies producing unique
data include: large surveys that are too expensive to replicate; studies
of unique populations, such as centenarians; studies conducted at unique
times, such as a natural disaster; studies of rare phenomena, such as rare
metabolic diseases.
This list of Frequently Asked Questions will be updated as
we receive additional questions and finalize the NIH statement on sharing research
data. We encourage readers to check back regularly for updates.
March 5, 2003:
-
Why should I share my final research data?
Data sharing achieves many important goals for the scientific community,
such as:
reinforcing open scientific inquiry;
encouraging diversity of analysis and opinion;
promoting new research;
testing new or alternative hypotheses and methods of analysis;
supporting studies on data collection methods and measurement;
facilitating the education of new researchers;
enabling the exploration of topics not envisioned by the initial investigators; and
permitting the creation of new datasets by combining data from multiple sources.
-
Who benefits from data sharing?
Everyone benefits, including investigators, funding agencies, the scientific
community, and, most importantly, the public. Data sharing provides more
effective use of NIH resources by avoiding unnecessary duplication of data
collection. It also conserves research funds to support more investigators.
The initial investigator benefits, because as the data are used and published
more broadly, the initial investigator's reputation grows.
-
Is data sharing widely accepted as a good practice?
National scientific organizations have made a commitment to the sharing
and archiving of data through their ethical codes (e.g., the American Sociological
Association) or publication policies (e.g., the American Psychological Association).
More than 15 years ago, the National Academy of Sciences described the benefits
of sharing data. (See http://books.nap.edu/catalog/2033.html)
For many years, the National Science Foundation (NSF) Economics Program
has required data underlying an article arising from an NSF grant to be
placed in a public archive. Similar expectations exist at the National Institute
of Justice. Moreover, many scientific journals require that authors make
available the data included in their publications. In the biological sciences,
protein and DNA sequences are made available to researchers through data
archives, such as GenBank. Since 1996, NIH has required data sharing in
several areas, such as DNA sequences, mapping information, and crystallographic
coordinates.
-
What do you mean by final research data?
By "final research data", we mean recorded factual material commonly
accepted in the scientific community as necessary to validate research findings.
Final research data do not include laboratory notebooks, partial datasets,
preliminary analyses, drafts of scientific papers, plans for future research,
peer review reports, communications with colleagues, or physical objects,
such as gels or laboratory specimens.
-
Does "final research data" include data that were not originally
produced under an NIH grant or contract?
Sometimes. For example, where NIH support is sought to transform or link
datasets (as opposed to producing new data), the investigator should include
a data-sharing plan in the application.
-
What do you mean by unique data?
By "unique data" we mean data that cannot be readily replicated.
Examples of studies producing unique data include: large surveys that are
too expensive to replicate; studies of unique populations, such as centenarians;
studies conducted at unique times, such as a natural disaster; studies of
rare phenomena, such as rare metabolic diseases.
-
What kinds of data are candidates for sharing?
Potentially all kinds of data are candidates for sharing, but unique data
are especially important. Some biologic sciences already have data-sharing
plans in place, such as genetic mapping. But other basic science data are
also amenable to sharing. Data from human subjects (e.g., surveys, clinical
studies) also can be shared if the identity and privacy of research participants
can be protected.
-
Can you give me some examples of data that have been shared?
Examples of shared epidemiologic data include the Framingham Heart Study,
the Honolulu Heart Program, the Atherosclerosis Risk in Communities Study, Epidemiology
of Chronic Disease in the Oldest Old, and the Iowa 65+ Rural Health Study.
Examples of shared data from clinical trials include the Asymptomatic Cardiac
Ischemia Pilot, the Intermittent Positive Pressure Breathing Study, and
the Safety and Efficacy Trial of Zidovudine for Asymptomatic HIV Infected
Individuals. Examples of shared datasets from the basic sciences include
a growing number of genome sequences and maps, as well as protein and nucleotide
databases (see Entrez at http://www.ncbi.nlm.nih.gov/Database/index.html
and other resources for molecular biology at the National Center
for Biotechnology Information at http://www.ncbi.nlm.nih.gov).
-
Data from my studies are generated from a very small number of rats,
and I publish the final data. Am I expected to provide these data to other
investigators as well?
Publishing these final data constitutes an acceptable mechanism for sharing
data.
-
How soon after data collection am I obliged to share the final data?
Because the value of data often depends on their timeliness, data
sharing should occur in a timely fashion. NIH expects the timely release
and sharing of data to be no later than the acceptance for publication of
the main findings from the final dataset. This time point will be influenced
by the nature of the data collected. Data from small studies can be analyzed
and submitted for publication relatively quickly. If data from large epidemiologic
or longitudinal studies are collected over several discrete time periods
or waves, data should be released in waves as data become available or main
findings from waves of the data are published. NIH recognizes that the investigators
who collected the data have a legitimate interest in benefiting from their
investment of time and effort. NIH continues to expect that the initial
investigators may benefit from the first and continuing use, but not from
prolonged exclusive use. While NIH also understands that an institution's
desire to exercise its intellectual property rights may justify a need to
delay disclosure of research findings, a delay of 30 to 60 days is generally
viewed as a reasonable period for such activity.
-
Does data sharing pertain only to published data?
No. Data-sharing plans should encompass all data from funded research that
can be shared without compromising individual subjects' rights and privacy,
regardless of whether the data have been used in a publication. Furthermore,
data sharing prior to the publication of major results is encouraged in
many instances, for example, when data are collected to provide a resource
for the scientific community (as in the case of many large surveys).
-
Due to circumstances beyond my control (an earthquake!), I was unable
to recontact a substantial portion of the sample in my longitudinal study.
I was planning to put my data in an archive, but the resulting high rate
of attrition makes the data minimally useful. Should I still archive the
final dataset?
Investigators need to find a balance between the value of the final data
and the costs associated with archiving. If the data are of limited usefulness,
then it is probably not worth the expense and effort of putting them in
an archive. However, if the investigator has published results based on
this dataset, then the dataset should be shared.
-
I am preparing an SBIR application. Am I required to submit a data-sharing
plan?
Yes. The specific nature of the data you will collect will determine whether
or not you may share the final dataset. If the final data are not amenable
to sharing, for example, if they are proprietary, then you need to explain
this in your application. Under the Small Business Act, SBIR grantees may
withhold their data for 4 years after the end of the award. The Small Business
Act provides authority for NIH to protect from disclosure and nongovernmental
use all SBIR data developed from work performed under an SBIR funding agreement
for a period of 4 years after the closeout of either a Phase I or Phase
II grant unless NIH obtains permission from the awardee to disclose these
data. The data rights protection period lapses only upon expiration of the
protection period applicable to the SBIR award, or by agreement between
the small business concern and NIH.
-
I don't want to share my data, which were generated under an NIH grant.
Can I be forced to do so?
When the PI and the authorized institutional official sign the face page
of an NIH application, they are assuring compliance with policies and regulations
governing research awards. NIH expects grantees to follow these rules and
to conduct the work described in the application. Thus, if an application
describes a data sharing plan, NIH expects that plan to be enacted. In some
instances, for example, NIH may make data sharing a term and condition of
award.
Under specific circumstances, your data may also be accessible through the
Freedom of Information Act (FOIA): if your competitive grant was awarded
after April 17, 2000, and your data were cited in a Federal regulation
or administrative order, then your data may be accessible through FOIA.
(See http://grants.nih.gov/grants/policy/a110/a110_guidance_dec1999.htm).
-
Will the data-sharing plan affect the priority score of my application?
No. Reviewers will not factor the proposed data-sharing plan into the determination
of scientific merit or priority score. Program staff is responsible for
overseeing the data-sharing policy and for assessing the appropriateness
and adequacy of the proposed data-sharing plan. Program concerns must be
resolved prior to making any award.
-
My research, which seeks support from both the public and private sectors,
will involve proprietary data. How do I deal with the data-sharing issue
in my application?
NIH recognizes that there may be circumstances where a cofunder has requested
restrictions on data sharing as a condition of funding. These restrictions
should be identified in the application and a proposal made about how data
from the cofunded project will be shared. Should you believe that you are
unable to share any of the data, your justification will be considered by
NIH program staff.
-
I'm a busy investigator. I don't have time to process requests for my
data. What should I do?
In addition to publishing small datasets, there are several alternatives
to responding to each separate request to share data (e.g., placing data
in an archive or restricted-access facility, or setting up a website for
data access). Archives and data enclaves provide technical assistance for
users with questions or problems and may spare busy investigators time.
-
Can I share data with colleagues under my own auspices?
Yes. Your data-sharing plans should indicate the criteria for deciding who
can receive your data and whether or not you will place any conditions on
their use. Data should be made as widely and freely available as possible
while safeguarding the confidentiality of the data and privacy of participants.
You should not place limits on the questions or methods others might pursue
nor should you require co-authorship as a condition for receiving the data.
-
Should the data source be cited or acknowledged in papers that rely
on shared data?
It is appropriate to acknowledge the source of data upon which a manuscript
is based. Many investigators include this information in the methods and/or
reference sections of their manuscripts. Journals generally include an acknowledgment
section, in which the authors can recognize people who helped them gain
access to the data. However, you should check the policies of the journal
to which you plan to submit.
-
Should I consider contributing my research data to a data archive?
Maybe. Archives are organizations that collect and distribute data. They
understand what is needed to prepare data and user documentation for wider
distribution. They provide stable, reliable, and cost-effective means for distributing
data. They also provide protections for the dataset and technical assistance
for requestors.
-
Where can I find guidance on preparing data for sharing and archiving?
Guidance is available from a variety of sources. For example, the Inter-University
Consortium for Political and Social Research at the University of Michigan
has prepared an excellent set of guidelines for preparing data for archiving.
While these guidelines were written with social science data in mind, they
are broadly applicable. See http://www.icpsr.umich.edu/ACCESS/dpm.html
For molecular biology information, the National Center for Biotechnology
Information (NCBI), a division of the National Library of Medicine (NLM)
at the National Institutes of Health, is ready to assist researchers who
have genome-specific and molecular data to submit. For more information
about submitting and accessing NCBI data, see the NCBI Website at http://www.ncbi.nlm.nih.gov/Genbank/index.html
-
How do I pay for preparing data for sharing and archiving?
NIH recognizes that it takes time and money to prepare data for sharing.
You can request funds for data archiving and sharing as part of your grant
application for collecting the data. If you have already collected the data,
you may want to ask your NIH Project Officer about a competitive or administrative
supplement. NIH recommends that you consider procedures and costs for data
sharing during the application process rather than after the data have been
collected.
-
Should I address data sharing in my NIH application?
Yes. Beginning with the October 1, 2003 application receipt date, NIH requests that
all extramural applicants seeking $500,000 or more in direct costs in any
one year provide a data-sharing plan in their applications.
-
What do I need to include in my application and where do I put the information
about data sharing?
Scientists submitting grant, cooperative agreement, or contract applications should
include a data-sharing plan, or provide justification for the absence of
such a plan, in a brief paragraph to be placed immediately after the Research
Plan Section (i.e., immediately after PHS 398 Section I. Letters of Support
in the Research Plan Section of their application) so it does not count
toward the application page limit. Additional information on data sharing
might be included in other sections of the application, as appropriate.
For example, if you are producing a large dataset that will become an important
resource for the scientific community, you probably want to mention this
in the significance section. If you are requesting funds to prepare, document,
and archive the data, you would want to include relevant information in
the budget and budget justification sections. In the Human Subjects section
of the application, you should discuss the potential risks to research participants
posed by data sharing and steps you will take to address those risks.
-
The informed consent form for my recently completed study states explicitly
that only my research team will see the data provided and that we will not
share the data. Am I now expected to share them?
No, but if you plan to collect additional data from those subjects under
a grant with a data-sharing plan, you should revise the consent procedure
to be consistent with the data-sharing plan. In preparing and submitting
a data-sharing plan during the application process, investigators should
avoid developing or relying on consent processes that promise research participants
not to share data with other researchers. Such promises should not be made
routinely or without adequate justification described in the data-sharing
plan.
-
How can I protect the privacy of my subjects?
It is the responsibility of the investigators, their IRB, and their institution
to protect the rights of participants and the confidentiality of their data.
Data should be redacted to strip all individual identifiers, and effective
strategies should be adopted to minimize risk of disclosing a participant's
identity. Options to protect privacy include: withholding part of the data,
statistically altering the data in ways that will not compromise secondary
analyses, requiring researchers who seek data to commit to protect privacy
and confidentiality, and providing data access in a controlled site, sometimes
referred to as a data enclave. Some investigators use hybrid methods, releasing
a redacted dataset for general use but providing access to more sensitive
data through a user contract or data enclave. In most instances, sharing
data is possible without compromising participant confidentiality and privacy.
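The first two options above (stripping identifiers and statistically altering the data) can be illustrated with a minimal Python sketch. This is not an NIH-prescribed procedure: the field names, the set of direct identifiers, and the choice to top-code ages at 90 (similar in spirit to the HIPAA Safe Harbor treatment of ages over 89) are assumptions for illustration only. The hypothetical `study_id` field is assumed to be an arbitrary study code that cannot be linked to a participant without a key held by the investigator. Removing direct identifiers alone does not rule out deductive disclosure, so a real dataset would still need review by the investigator and IRB.

```python
# Hypothetical field names; a real study would work from its own codebook.
DIRECT_IDENTIFIERS = {"name", "ssn", "street_address", "phone", "email"}
TOP_CODE_AGE = 90  # assumption: ages 90 and above collapsed into "90+"


def redact_record(record: dict) -> dict:
    """Return a copy of one participant record with direct identifiers
    removed and age top-coded; the input record is left unchanged."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    age = cleaned.get("age")
    if isinstance(age, int) and age >= TOP_CODE_AGE:
        cleaned["age"] = f"{TOP_CODE_AGE}+"
    return cleaned


def redact_dataset(records: list[dict]) -> list[dict]:
    """Apply redact_record to every record in the dataset."""
    return [redact_record(r) for r in records]


if __name__ == "__main__":
    sample = [
        {"study_id": "P001", "name": "A. Smith", "ssn": "000-00-0000",
         "age": 93, "bp_systolic": 142},
        {"study_id": "P002", "name": "B. Jones", "ssn": "000-00-0001",
         "age": 47, "bp_systolic": 118},
    ]
    for row in redact_dataset(sample):
        print(row)
```

A hybrid release, as described above, might share the output of such a script for general use while keeping the unredacted records available only through a user contract or data enclave.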
-
Can institutions and investigators subject to the Federal Health Insurance
Portability and Accountability Act (HIPAA) Privacy Rule share data in accord with
the NIH Data Sharing policy?
Yes. NIH recognizes that data sharing may be complicated or limited, in
some cases, by institutional policies or local IRB rules, as well as by
local, state and Federal laws and regulations like the Privacy Rule. To
protect the rights and privacy of people who participate in NIH-sponsored
research, data intended for broader use should be free of identifiers that
would permit linkages to individual research participants, and exclude variables
that could lead to deductive disclosure of the identity of individual subjects.
When data sharing is limited, applicants should explain such limitations
in their data sharing plans.
-
I collect data on sensitive and, sometimes, illegal behaviors. Are these
data too sensitive to be shared?
Not necessarily. The collection of sensitive data does not preclude sharing.
For example, the National Center for Chronic Disease Prevention and Health
Promotion at CDC operates the Youth Risk Behavior Surveillance System (YRBSS),
available at http://www.cdc.gov/nccdphp/dash/yrbs/,
which provides data on six health risk behaviors among youth: unintentional
injuries and violence, tobacco use, alcohol and other drug use, sexual behaviors,
dietary behaviors, and physical activity. Similarly, data from the National
Survey of Family Growth, which includes statistical data on family life,
marriage and divorce, contraception, sexual experience, pregnancy, and infertility,
can be obtained from the National Center for Health Statistics.
Sensitive data can be shared so long as appropriate privacy safeguards are
in place. Investigators must determine if and how the rights and privacy
of the subjects can be protected. Investigators collecting data on sensitive
and illegal behaviors should also obtain a Certificate of Confidentiality (http://grants.nih.gov/grants/policy/coc/)
to protect against the involuntary release of data that could identify research
participants.
-
Can data from a clinical trial be shared?
It depends. Participants' privacy must be protected in accord with all applicable
laws and regulations. Clinical trial datasets are frequently rich in items
that could potentially identify individual subjects. For example, many early-phase
trials use small samples, which make it difficult to protect the privacy
of the participants. Researchers who are planning clinical trials and intend
to share the resulting data should think carefully about the study design,
the informed consent documents, and the structure of the resulting data
prior to the initiation of the study.
There are many precedents for sharing of clinical trial data. For example,
data from a number of clinical trials supported by the National Heart, Lung,
and Blood Institute (NHLBI) are available for research use (See http://www.nhlbi.nih.gov/resources/deca/directry.htm).
The National Institute of Allergy and Infectious Diseases (NIAID) also lists
the clinical trial datasets that it has made available through the
National Technical Information Service (NTIS) for public use (See http://www.niaid.nih.gov/research/aidsdata.htm).
-
Are data on DNA and protein sequences archived?
Yes. For example, GenBank (http://www.ncbi.nih.gov/Genbank/)
and Entrez (http://www.ncbi.nlm.nih.gov/Entrez/)
archive gene sequencing data. The sharing of materials, data, and software
in a timely manner has been an essential element in the rapid progress that
has been made in the genetic analysis of mammalian genomes.
-
I did not request support for sharing data in my application, which
was funded. Can I charge requestors for the costs associated with sharing
the data?
Yes, as long as such costs are reasonable and not excessive and reflect
actual costs associated with complying with the request. These expenses
for preparing and shipping the data might include costs of personnel, computing
time, supplies, and other directly related expenses. NIH requirements for
accountability for various types of income under NIH grants are specified
elsewhere; see http://grants.nih.gov/grants/policy/nihgps_2003/NIHGPS_Part8.htm#_Toc54600138
-
I am working on a select pathogen and cannot share the data for reasons
of national security. Is this an acceptable reason for not sharing?
Yes.
-
If I am required to submit a revised data-sharing plan, what do I need
to do?
As is the case with PIs who submit any additional or revised application
material, your revised data-sharing plan must be signed by your institutional
official and by you.
-
I want to request a dataset from a recent publication. How do I do this?
You should check the publication to see if reference is made to an archive,
an enclave, or a Website where the data might be available. If no such information
is provided, you may wish to send a letter to the PI to see if the data
are available for sharing, and where you might be able to get the data and
associated documentation.
February 16, 2004:
-
I am a PI on a P30 center grant with a budget in excess of $500,000
(direct costs) in each year. Some of the research projects that collect
survey data benefit from the infrastructure support provided by the P30, but
these research projects are not funded by NIH. Am I still expected to share
data from these research grants?
If any NIH support (including partial support) is provided for resource development,
even if those research resources were developed primarily with non-NIH funds,
then those research resources must be shared in line with NIH policy as
if NIH funded the entire project. It should be emphasized that although
a data-sharing plan is required only of grants with direct costs of
$500,000 or more in any one year, data sharing itself (without a specific
plan submission) continues to be a requirement of all NIH-funded grants.
If the P30 maintains core resources that actually house and are the final
repository of the data, e.g., a high throughput array analysis core, then
any project using the center’s resources would be subject to the center’s
data sharing plan.