Research Remix

October 8, 2010

A look into the revised NSF data sharing policy

Filed under: Notes, Policies — Heather Piwowar @ 6:09 am

Curious about details on the NSF’s revised policy on Dissemination and Sharing of Research Results?  I’ve been digging into the documents released by the NSF and its Directorates.  Here are my notes, in case they are useful for someone:  excerpts from the docs, grouped by topic.

Refs:

  • [SES] Division of Social and Economic Sciences
  • [EAR] Division of Earth Sciences
  • [ENG] Engineering Directorate
  • [OCE] Division of Ocean Sciences
  • [IODP] Integrated Ocean Drilling Program
  • [MPS]  Mathematical and Physical Sciences Directorate

What is considered “data”/research results covered by this policy?

  • “may include, but is not limited to: data, publications, samples, physical collections, software and models”  FAQ
  • “Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants”  AAG
  • “Investigators and grantees are encouraged to share software and inventions created under the grant or otherwise make them or their products widely available and usable.”  AAG
  • “Qualitative resources.   If it is appropriate for other researchers to have access to them, the investigators should specify a time at which they will be made generally available, in an appropriate form and at a reasonable cost.”  SES
  • “In addition, complete information on how an experiment was conducted and any unusual stimulus materials should be made available, so that failures to replicate will not turn out to depend on one scientist’s incomplete understanding of another’s procedure.”  SES
  • “Mathematical and computer models.  Investigators should plan to make these models available to others wanting to apply them to other data sets or experimental situations. In some cases, the descriptions in published articles are sufficient; more often, it will be necessary for investigators to prepare fully documented and robust versions of these models, typically on disk, so that they can be provided to others.”  SES
  • “Preservation of all data, samples, physical collections and other supporting materials needed for long- term earth science research and education”  EAR
  • “Experimental Research: In experimental research, individuals, be they people, animals, or objects, are subjected to preplanned conditions and their responses tabulated in some fashion. Investigators should plan to make these tabulated data available to other investigators requesting them” SES
  • “Data archives must include easily accessible information about the data holdings, including quality assessments”  EAR
  • “Archiving of both physical and digital data must be addressed in the plan”  ENG
  • “Under the following definitions, all data must be included in the DMP that result fully or in part from activities supported by ENG.”  ENG
  • “Research data are formally defined as “the recorded factual material commonly accepted in the scientific community as necessary to validate research findings” by the U.S. Office of Management and Budget (1999).”  ENG
  • “The basic level of digital data to be archived and made available includes (1) analyzed data and (2) the metadata that define how these data were generated…. Analyzed data are (but are not restricted to) digital information that would be published, including digital images, published tables, and tables of the numbers used for making published graphs.  Necessary metadata are (but are not restricted to) descriptions or suitable citations of experiments, apparatuses, raw materials, computational calculation input conditions”  ENG
  • “These are data that are or that should be published in theses, dissertations, refereed journal articles, supplemental data attachments for manuscripts, books and book chapters, and other print or electronic publication formats.”  ENG
  • “What data are not included at the basic level? The Office of Management and Budget statement (1999) specifies that this definition does not include “preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues.” Raw data fall into this category as “preliminary analyses.””  ENG
  • “Describe the types of data and products that will be generated in the research, such as images of astronomical objects, spectra, data tables, time series, theoretical formalisms, computational strategies, software, and curriculum materials.”  MPS astronomy
  • “Particular attention should be paid to data sets that are products of well-defined surveys.”  MPS

Where?  Public repository?  other?

  • “There is no public database for my type of data. What can I do to provide data access? Contact the cognizant NSF Program Officer for assistance in this situation.”  FAQ
  • “Quantitative Social and Economic Data Sets.  This may be the Inter-University Consortium for Political and Social Research (ICPSR) at the University of Michigan, but other public archives are also available.”  SES
  • “institutional archives that are standard for a particular discipline (e.g. IRIS for seismological data, UNAVCO for GPS data)” EAR
  • “Experimental research. SES will work with the research community to identify and resolve problems with developing and establishing centralized archives.”  SES
  • “to other investigators requesting them” SES
  • “Where no data or sample repository exists for the collected data or samples, metadata must be prepared and made available. The Principal Investigator (PI) is required to address alternative strategies for complying with the general philosophy of sharing research products and data as described above”  OCE
  • “The PI is invited to discuss this issue with NSF Program Officers in advance of submitting proposals.”  OCE
  • “for most ocean data there are designated National Data Centers where data must be deposited… Appendix I. National Data Centers”  OCE
  • “For some special programs and focused community initiatives, alternative database activities exist… Principal Investigators are encouraged to submit their data to these databases when appropriate. Since such databases may not provide long-term archival capabilities, such submission will satisfy the Principal Investigator’s obligations only if the database submits the data to one of the National Data Centers….  Appendix III: Other Database Activities…. Appendix IV. Sample Repositories”  OCE
  • Experimental Research:  “at a minimum along the lines suggested by Geoffrey Loftus in his editorial in the January, 1993, issue of Memory and Cognition”  SES  [Loftus, G.R. (1993). Editorial Comment. Memory & Cognition, 21(1), 1-3.  pdf]
  • “Describe your plans, if any, for providing such general access to data, including websites maintained by your research group, and direct contributions to public databases (e.g., the Protein Data Bank, Cambridge Crystallographic Data Centre, Inorganic Crystal Structure Database in Karlsruhe, Zeolite Structure Database).”  MPS
  • “Finally, note as well any anticipated inclusion of your data into databases that mine the published literature (e.g., PubChem, NIST Chemistry WebBook).”

Who needs access:  researchers, educators, public?

  • “The National Science Foundation is committed to the principle that the various forms of data collected with public funds belong in the public domain.”  SES
  • But the policy is a bit confused about who the audience is, even within one paragraph:  “The National Science Foundation is committed to the principle that the various forms of data collected with public funds belong in the public domain. Therefore, the Division of Social and Economic Sciences has formulated a policy to facilitate the process of making data that has been collected with NSF support available to other researchers.”  SES
  • “for research and education” EAR
  • “Data inventories should be published or entered into a public database periodically and when there is a significant change in type, location or frequency of such observations.” EAR
  • “Policies for public access and sharing should be described”  ENG
  • “samples and data to research scientists (Science Party members and postmoratorium researchers), educators, museums, and outreach institutions” IODP
  • “interested parties” MPS

Timeliness

  • “The expectation is that all data will be made available after a reasonable length of time.” FAQ
  • “One standard of timeliness is to make the data or samples accessible immediately after publication.” FAQ
  • “However, what constitutes a reasonable length of time will be determined by the community of interest through the process of peer review and program management” FAQ
  • “Quantitative Social and Economic Data Sets: For appropriate data sets, researchers should be prepared to place their data in fully cleaned and documented form in a data archive or library within one year after the expiration of an award.”  SES
  • “For those programs in which selected principal investigators have initial periods of exclusive data use, data should be made openly available as soon as possible, but no later than two (2) years after the data were collected. This period may be extended under exceptional circumstances, but only by agreement between the Principal Investigator and the National Science Foundation. For continuing observations or for long-term (multi-year) projects, data are to be made public annually.”  EAR
  • “Publication delay policies (if applicable) must be clearly stated.  Investigators are expected to submit significant findings for publication quickly that are consistent with the publication delay obligations of key partners, such as industrial members of a research center.”  ENG
  • “Public release of data should be at the earliest reasonable time. A reasonable standard of timeliness is to make the data accessible immediately after publication, where submission for publication is also expected to be timely.”  ENG
  • “Principal Investigators are required to submit all environmental data collected to the designated National Data Centers (Appendix I) as soon as possible, but no later than two (2) years after the data are collected. Inventories (metadata) of all marine environmental data collected should be submitted to the designated National Data Centers within sixty (60) days after the observational period/cruise. For continuing observations, data inventories should be submitted periodically if there is a significant change in location, type or frequency of such observations.”  OCE
  • “Also describe your practice or policies regarding the release of data for access, for example whether data are posted before or after formal publication.”  MPS-AST
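
The OCE deadlines quoted above (inventories within sixty days after the cruise, data no later than two years after collection) are simple date arithmetic. Here is a minimal sketch of my own, not anything from the NSF docs. The function names and the example dates are hypothetical, and the leap-day handling is my assumption, since the policy text does not say how to count from February 29.

```python
from datetime import date, timedelta


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later.

    Assumption (not from the policy text): February 29 maps to
    February 28 when the target year is not a leap year.
    """
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def oce_deadlines(collected: date, cruise_end: date) -> dict:
    """Latest submission dates under the OCE wording quoted above."""
    return {
        # "within sixty (60) days after the observational period/cruise"
        "inventory_due": cruise_end + timedelta(days=60),
        # "no later than two (2) years after the data are collected"
        "data_due": add_years(collected, 2),
    }


if __name__ == "__main__":
    # Hypothetical cruise, for illustration only.
    deadlines = oce_deadlines(collected=date(2011, 2, 20),
                              cruise_end=date(2011, 3, 1))
    for name, due in deadlines.items():
        print(name, due.isoformat())
```

These are upper bounds only: the policy asks for submission “as soon as possible,” and any extension would still need agreement with the Program Officer.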

Data retention and preservation

  • “Minimum data retention of research data is three years after conclusion of the award or three years after public release, whichever is later.”  ENG
  • “Exceptions requiring longer retention periods may occur when data supports patents, when questions arise from inquiries or investigations with respect to research, or when a student is involved, requiring data to be retained a timely period after the degree is awarded.”  ENG
  • “Research data that support patents should be retained for the entire term of the patent”  ENG
  • “Longer retention periods may also be necessary when data represents a large collection that is widely useful to the research community. For example, special circumstances arise from the collection and analysis of large, longitudinal data sets that may require retention for more than three years. Project data-retention and data-sharing policies should account for these needs”  ENG
  • “If maintenance of a web site or database is the direct responsibility of your group, provide information about the period of time the web site or data base is expected to be maintained.”  MPS-AST
  • “Describe how data will be archived and how preservation of access will be handled. For example, will hardcopy notebooks, instrument outputs, and physical samples be stored in a location where there are safeguards against fire or water damage? Is there a plan to transfer digitized information to new storage media or devices as technological standards or practices change? Will there be an easily accessible index that documents where all archived data are stored and how they can be accessed?”  MPS-CHE
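
To make the ENG minimum above concrete (“three years after conclusion of the award or three years after public release, whichever is later”), here is a minimal sketch of my own. The function names and example dates are hypothetical. It covers only the three-year floor, not the longer periods ENG mentions for patents, investigations, students, or large longitudinal collections.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later.

    Assumption (not from the policy text): February 29 maps to
    February 28 when the target year is not a leap year.
    """
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def minimum_retention_end(award_end: date,
                          public_release: Optional[date] = None) -> date:
    """Earliest date the ENG three-year minimum would be satisfied.

    Takes the later of (award end + 3 years) and, if the data have been
    publicly released, (release date + 3 years).
    """
    candidates = [add_years(award_end, 3)]
    if public_release is not None:
        candidates.append(add_years(public_release, 3))
    return max(candidates)


if __name__ == "__main__":
    # Hypothetical award ending 31 Aug 2012, data released 1 May 2013.
    print(minimum_retention_end(date(2012, 8, 31), date(2013, 5, 1)))
    # prints 2016-05-01
```

In this example the public release date drives the answer, because it falls after the award ended.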

Program-specific additional requirements

  • Several docs noted that some programs, institutions, and communities may have more stringent requirements.  A few (e.g. OCE) go into some specifics.

Reporting, review, and consequences

  • “The Data Management Plan will be reviewed as an integral part of the proposal, coming under Intellectual Merit or Broader Impacts or both, as appropriate for the scientific community of relevance.”  GPG, MPS
  • MPS Divisions will rely heavily on the merit review process in this initial phase to determine those types of plan that best serve each community and update the information accordingly.  MPS
  • “NSF program management will implement these policies for dissemination and sharing of research results, in ways appropriate to field and circumstances, through the proposal review process; through award negotiations and conditions; and through appropriate support and incentives for data cleanup, documentation, dissemination, storage and the like.”  AAG
  • “Within the proposal review process, compliance with these data guidelines will be considered in the Program Officer’s overall evaluation of a Principal Investigator’s record of prior support.” EAR
  • “Efficiency and effectiveness of the DMP will be considered by NSF and its reviewers during the proposal review process.”  ENG
  • “After an award is made, data management will be monitored primarily through the normal Annual and Final Report process and through evaluation of subsequent proposals.  Subsequent proposals. Data management must be reported in subsequent proposals by the PI and Co-PIs under “Results of prior NSF support.””  ENG
  • “Strategies and eventual compliance with this policy will be evaluated not only by proposal peer review but also through project monitoring by NSF program officers, by division and directorate Committees of Visitors, and by the National Science Board.”  ENG
  • “Plans for the handling of data and other products will be considered in the review process.”  OCE
  • “Annual reports, required for all projects, should address progress on data and research product sharing. The Division of Ocean Sciences requires that final reports document compliance or explain why it did not occur. In cases where the final report is due before the required data or sample submission, the PI must report submission of metadata and plans for final submission. The PI should notify the cognizant Program Officer by e-mail after final data and/or sample submission.”  OCE
  • “Within the proposal review process, compliance with these data guidelines will be considered in the Program Officer’s overall evaluation of a Principal Investigator’s record of prior support.”  OCE
  • “Many of the proposals to DMS that require significant data management plans will be interdisciplinary submissions… DMS expects principal investigators to address the customary data practices of partner disciplines in their proposals’ data management plans, and reviewers are likely to be asked to comment on the suitability of those plans from the perspectives of the relevant disciplines.”  MPS-DMS

Exceptions?

  • All documents recognize the special needs of sensitive (e.g. human subjects) data and the need to protect IP rights.
  • “A valid Data Management Plan may include only the statement that no detailed plan is needed, as long as the statement is accompanied by a clear justification.”  GPG
  • “It is acceptable to state in the Data Management Plan that the project is not anticipated to generate data or samples that require management and/or sharing.  PIs should note that the statement will be subject to peer review.”  FAQ
  • “legal rights to intellectual property [..] Such incentives do not, however, reduce the responsibility that investigators and organizations have as members of the scientific and engineering community, to make results, data and collections available to other researchers.”  AAG
  • “General adjustments and, where essential, exceptions to this sharing expectation may be specified by the funding NSF Program or Division/Office for a particular field or discipline to safeguard the rights of individuals and subjects, the validity of results, or the integrity of collections or to accommodate the legitimate interest of investigators.”  AAG
  • “For example, human subjects protection requires removing identifiers, which may be prohibitively expensive or render the data meaningless in research that relies heavily on extensive in-depth interviews.”  SES
  • “These guidelines are considered to be a binding condition on all EAR-supported projects” EAR
  • Not subject to peer review?  “Exceptions to these data guidelines require agreement between the Principal Investigator and the NSF Program Officer.”  EAR
  • “Some proposals may involve proprietary or other restricted data. For example, projects having proprietary information that will eventually lead to commercialization, such as [..].  In addition, membership agreements, contracts, involvement with other agencies, and similar obligations may place some restrictions on data sharing.  The proposal’s DMP would address the distinction between released and restricted data and how they would be managed.”  ENG
  • “Exceptions to the basic data-management policy should be discussed with the cognizant program officer before submission of such proposals.” ENG
  • “if you plan to provide data and images on your website, will the website contain disclaimers, or conditions regarding the use of the data in other publications or products? If the data or products (e.g., images) are copyrighted (by a journal, for example), how will this be noted on the website?”  MPS-AST

Money

  • “Should the budget and its justification specifically address the costs of implementing the Data Management Plan?  As long as the costs are allowable in accordance with the applicable cost principles, and necessary to implement the Data Management Plan, such costs may be included (typically on Line G2) of the proposal budget, and justified in the budget justification.”  FAQ
  • “It is NSF’s strong expectation that investigators will share with other researchers, at no more than incremental cost”  FAQ  “no more than incremental cost” means that they can charge researchers to recover costs.
  • “These plans should cover how and where these materials will be stored at reasonable cost, and how access will be provided to other researchers, generally at their cost.”  SES

What to put in the data management plan?

  • “This supplement may include types of research output expected to be created, standards to be used, policies for sharing, provisions for reuse, and plans for preservation.”  GPG with emphasis added
  • “The DMP should clearly articulate how “sharing of primary data” is to be implemented…. The DMP should describe the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project. It should then describe the expected types of data to be retained…. The DMP should describe the period of data retention… The DMP should describe the specific data formats, media, and dissemination approaches that will be used to make data available to others, including any metadata.”  ENG
  • “It should outline the rights and obligations of all parties as to their roles and responsibilities in the management and retention of research data. It must also consider changes to roles and responsibilities that will occur should a principal investigator or co-PI leave the institution.”  ENG
  • “Any costs should be explained in the Budget Justification pages.”  ENG
  • “requires that proposal Project Descriptions outline plans for preservation, documentation, and sharing of data, samples, physical collections, curriculum materials and other related research and education products”  OCE
  • “DMR PIs should include in their Data Management Plan those aspects of data retention and sharing that would allow them to respond to a question about a published result.”  MPS-DMR
  • “Due to the diverse communities supported by DMR, the Division is not in a position to recommend a Division-specific single data sharing and archiving approach.”  MPS-DMR
  • “The Physics Division is not in a position to recommend a Division-specific single data sharing and archiving approach applicable to the disparate communities supported through the Division. The Division will rely on the process of peer review to allow each of these communities to identify best practices.”  MPS-PHY

Other notes:

  • I didn’t go into detail extracting info from the IODP doc.  Useful, clear, lengthy doc!
  • Looking forward to hearing from the Biology Directorate.  Others?
  • “Goal: Provide for clear, effective, and transparent implementation of NSF policy for data management and dissemination”  ENG.  Awesome.
  • “Where data are stored in unusual or not generally accessible formats, explain how the data may be converted to a more accessible format or otherwise made available to interested parties. In general, solutions and remedies should be provided.”  MPS
  • “Ensure that dissemination of the scientific findings of all IODP drilling projects/expeditions are planned so as to gain maximum scientific and public exposure”  IODP.
  • A lot of emphasis on “other researchers.”  Obligations to share data with commercial researchers are not clear, except where the language emphasizes “public”
  • Overall, I’m pretty impressed by all of this.  I was hesitant about the new NSF policy based on preliminary info:  it felt like too small a step.  But the Directorates have stepped up and given it meat and a backbone.  Nice work.  NIH, your turn again.

Related documents

  • Committee on Strategy and Budget Task Force on Data Policies Charge and timeline (Draft final report expected first half of 2011)
  • NSF Press Release 10-077:  Scientists Seeking NSF Funding Will Soon Be Required to Submit Data Management Plans (May 10 2010)

ETA:  added MPS guidelines

8 Comments

  1. “in case they are useful for someone…” I should say!! I am preparing a talk on Open Science for the annual meeting of the Pacific Northwest Chapter of the Medical Library Association and this is a gold mine of material. And this post is really fascinating in and of itself–you are really providing a very valuable public service by pointing out that major grantors (like the feds) are starting to require Open Science-friendly practices by applicants for funding.

    Comment by Hope Leman — October 8, 2010 @ 6:35 am

  2. Sounds like a great experiment for studies of hedged language (or more pointedly, weasel words). I can’t make heads or tails of what all that’s asking me to do with my funded work.

    The biggy, in addition to “reasonable time”, is “the need to protect IP rights.” This is compounded by the fact that very few projects or researchers are funded by a single source and that it’s often a rather arbitrary decision as to who paid for what. Under the old DARPA setup, companies would claim they developed software with their own money and did “research” with DARPA money.

    Are they kidding about “fully documented and robust models”? Most researchers don’t have the skills required to produce full documentation or robust models even if they wanted to. Of course, “fully” is up for grabs, as is “robust”.

    Comment by lingpipe — October 14, 2010 @ 2:08 pm

  3. It’s not at all clear to me that this NSF policy is compatible with my university’s policies on research. Columbia claims IP rights to everything done using Columbia equipment. Given that most researchers work in their offices over the campus internet, just about everything an academic does on campus uses the university’s equipment in some way.

    The distinction between “research” and “commercial” is very unclear. Given that universities (like Columbia) regularly assert ownership and patent rights over “research” and companies (like LingPipe) often give away our “commercial” work, the intention of the distributor is often perverted.

    Comment by lingpipe — October 14, 2010 @ 2:12 pm

  4. [...] Categorized excerpts of NSF policy requirements and guidelines by Heather Piwowar at researchremix [...]

    Pingback by NSF policy on dissemination and sharing of research results « Dryad news and views — November 15, 2010 @ 11:14 am

  5. [...] provided by the directorates and divisions have given it real substance.  As you can see from this summary, the guidelines definitely vary across [...]

    Pingback by Other NSF Directorates, where are your data sharing guidelines? « Research Remix — November 15, 2010 @ 1:03 pm

  6. Thanks, Heather
    Gathering these bits together helped me a lot for researching a paper on a deadline! – Nicely organized

    Comment by Dave Fearon — February 11, 2011 @ 7:49 pm

  7. Heather, a wonderful and essential job.

    This killed me: “The National Science Foundation is committed to the principle that the various forms of data collected with public funds belong in the public domain. Therefore, the Division of Social and Economic Sciences has formulated a policy to facilitate the process of making data that has been collected with NSF support available to other researchers.”

    Though really, if you’re in Data Mgt you know there is a lot of complexity in the technology, and in the policy implications. It will take a while to come together.

    Comment by John Graybeal — May 4, 2011 @ 4:00 pm

  8. [...] the right place for datasets. Journals, along with others in the scholarly ecosystem (spurred on by recent requirements by funders for increased data availability, and evidence that researchers often don’t make data [...]

    Pingback by thoughts on where journals are now, what to do next « Research Remix — December 3, 2011 @ 2:59 pm

