National Database for Autism Research
Standards

To be an effective research community resource, NDAR has adopted a number of important standards and conventions.

GUID

One of the most important standards NDAR supports is the NDAR Global Unique IDentifier (GUID). The GUID allows users of NDAR to share data specific to a research subject without exposing personally identifiable information (PII). Use of the GUID minimizes risks to study participants because research data are stored under an anonymous identifier rather than under identifying details. This also keeps each subject’s data separate. Using the GUID, NDAR can link together various types of information on a single participant that may have been collected at different times or locations, thereby giving a researcher access to the full range of information related to that participant. Additionally, NDAR uses the GUID to define specific study populations or sub-populations supporting a particular hypothesis. Use of GUIDs, or pseudo-GUIDs, is required for all data submitted to NDAR. The fields required to generate a GUID at a research site are:

Important: When collecting the above information at a research site, be sure to ask for the full legal name of the subject as it appeared on the birth certificate. It is also important to record the town or municipality of birth as it appeared on the subject's birth certificate. When this information is acquired and entered accurately at the research site, subjects can be identified across research projects and a subject population can be defined without sharing protected personally identifiable information.

Other fields may optionally be provided to increase GUID precision. For sites that do not have the required data, or the informed consent needed to collect them and generate a GUID, NDAR allows a pseudo-GUID to be generated. The pseudo-GUID is essentially a random identifier. Although supported, a pseudo-GUID limits how the data associated with it can be queried or used within NDAR. When possible, a GUID is preferred.
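To make the distinction between a GUID and a pseudo-GUID concrete, the following is a minimal sketch and not NDAR's actual GUID algorithm, which is not described on this page. It assumes a one-way hash over normalized identifying fields for the GUID-like case and a purely random value for the pseudo-GUID-like case. The function names, the `DEMO_` prefix and the field names are all hypothetical.

```python
import hashlib
import secrets
import unicodedata
from typing import Mapping


def normalize(value: str) -> str:
    """Canonicalize a field so spacing, case and accents do not change the result."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if ch.isalnum()).upper()


def illustrative_guid(fields: Mapping[str, str]) -> str:
    """Derive a repeatable identifier from identifying fields (illustration only)."""
    # Sort by field name so the result does not depend on insertion order.
    canonical = "|".join(f"{name}={normalize(fields[name])}" for name in sorted(fields))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "DEMO_" + digest[:12].upper()


def illustrative_pseudo_guid() -> str:
    """Return a random identifier that is not derived from any subject data."""
    return "DEMO_PSEUDO_" + secrets.token_hex(6).upper()
```

Under these assumptions, two sites that enter the same birth-certificate details would obtain the same identifier without exchanging the details themselves, which is why accurate entry matters. A random identifier cannot be matched across sites, which is consistent with the query limitations of the pseudo-GUID noted above.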

For more information on how the GUID works, refer to the GUID User Manual.

For more information on the GUID and its privacy protections, refer to the NDAR Policy and Procedures.

Data Dictionary

As of December 2008, NDAR supports approximately 10,000 data elements related to phenotypic, imaging and genomic data. Clearly, 10,000 data elements represent only a small fraction of the data elements collected and potentially shareable within the ASD research community. NDAR recognizes that a mechanism to support an order-of-magnitude increase in the number of data elements must be developed, and that this mechanism will need to accommodate local differences in the methods used to collect data at different research sites. Furthermore, the data dictionary must be a collaborative solution if it is to be accepted by the research community. To support these requirements, NDAR is working to make the Data Dictionary available online, with the capability for NDAR users to add new data elements and to modify existing ones. This community-based solution to a community-wide need will foster a common understanding of important data-related issues. NDAR, in collaboration with the research community, is working through the requirements and procedures for this important community resource. Those wishing to help define what we hope will be the ASD data standard should contact ndar@mail.nih.gov.

The current NDAR Data Dictionary for clinical assessments and imaging is available for download at http://ndar.nih.gov/ndarpublicweb/Documents/NDAR%20Codebook_CA_IMG.xls [Last Update: June 5, 2009].

The current genomics data dictionary is available at http://ndar.nih.gov/ndarpublicweb/Documents/Genomics_codebook.xls

Imaging

NDAR supports the receipt of raw brain images in DICOM format. NDAR also supports processed images in a variety of formats including DICOM, MINC 1.0 and 2.0, Analyze, NIfTI-1, AFNI and SPM. If you are using a different file format, please contact ndar@mail.nih.gov so that we can consider adding it to our list of supported standards.

Genetics

NDAR supports the submission of SNP and gene expression microarray data. NDAR also stores descriptions associated with these data, such as pedigree, biological samples, experiment design, experiment samples, reagents, and protocols. Note that NDAR does not store any specimens or samples, but does provide reference data for their storage location.

CDISC

NDAR has adopted the Clinical Data Interchange Standards Consortium (CDISC) ODM data standard. To support a simple data submission process, NDAR has provided a tool that will convert common tabular datasets into the CDISC ODM standard. For more information and/or to receive a copy of this tool, please contact ndar@mail.nih.gov.
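As an illustration of what converting a tabular dataset into CDISC ODM involves, here is a minimal sketch. It is not the NDAR conversion tool. It assumes a CSV input with one row per subject and emits only the ODM clinical-data nesting (ClinicalData, SubjectData, StudyEventData, FormData, ItemGroupData, ItemData). The function name, the `subject_id` column and the OID values are hypothetical. A schema-valid ODM file would also need the ODM namespace, the required root attributes and a metadata section, all omitted here.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_odm(csv_text: str, study_oid: str, form_oid: str,
                subject_column: str = "subject_id") -> str:
    """Wrap each CSV row in the ODM clinical-data hierarchy (simplified)."""
    odm = ET.Element("ODM")
    clinical = ET.SubElement(odm, "ClinicalData",
                             StudyOID=study_oid, MetaDataVersionOID="MDV.DEMO")
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = ET.SubElement(clinical, "SubjectData",
                                SubjectKey=row[subject_column])
        event = ET.SubElement(subject, "StudyEventData", StudyEventOID="SE.DEMO")
        form = ET.SubElement(event, "FormData", FormOID=form_oid)
        group = ET.SubElement(form, "ItemGroupData", ItemGroupOID="IG.DEMO")
        # Every remaining column becomes one ItemData element.
        for column, value in row.items():
            if column != subject_column:
                ET.SubElement(group, "ItemData", ItemOID=column, Value=value)
    return ET.tostring(odm, encoding="unicode")
```

Using the column header directly as the ItemOID is a simplification. In practice each column would be mapped to an element defined in the data dictionary.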

Data Sharing

Before NDAR receives any data, the data must first pass automated quality checks defined in the NDAR Validation Tool. Once data are submitted to NDAR, they remain private to the investigator and to individuals to whom the investigator has given specific access. The expectation is that data will be shared as specified by the NDAR Policy, which is usually within 9 months of the date the data were provided to NDAR. An investigator who would like to use NDAR as a platform for scientific collaboration may request permission to allow data to reside in NDAR without the expectation of broad sharing until publication or until primary objectives have been met, whichever comes first. See Data Sharing for the different options available to an investigator for sharing and communicating research using NDAR.

Federation

NDAR supports two methods to make relevant ASD research data available via NDAR: data submission and data federation. Data submission is the process used by those investigators who wish to contribute to NDAR by submitting a copy of their research data directly into the NDAR Central Repository, hosted at the NIH. Data federation allows NDAR to access other established ASD data repositories controlled by the NIH or others. Each offers its own unique advantages.

Data federation provides a significant benefit to the research community. A working data federation infrastructure hides the complexities of data location, data ownership, and data maintenance from the user. NDAR users will be able to query federated data resources, each listed as just another NDAR Collection, in the same way as they would query data resident in the NDAR Central Repository.

For data federation, NDAR uses the BIRN data mediator software to query other data resources. However, no specific data federation software is required to federate with NDAR. Instead, NDAR will need access to an ANSI SQL-99 relational database (e.g. Oracle, SQL Server, MySQL, Postgres) with appropriate data views established. NDAR supports different permissions, allowing the federated data resource to define different levels of access for NDAR users. For genetics, imaging and other rich datasets, which are often large, NDAR supports file-based access. These datasets, whose locations are defined in a database, can be made available either through the BIRN SRB resident at the federated site or simply by allowing NDAR access to an SFTP file server, or a server using a similar technical standard, that contains the data.
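The idea of exposing "appropriate data views" can be sketched as follows. This is a hypothetical example and not an NDAR schema. The table, column and view names are invented for illustration, and an in-memory SQLite database stands in for the site's relational database. SQLite has no user accounts, so the permission grants a real site would define are not shown.

```python
import sqlite3


def build_demo_site() -> sqlite3.Connection:
    """Create a stand-in site database with one table and one shareable view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE assessment (
            guid            TEXT NOT NULL,
            local_record_id TEXT NOT NULL,   -- internal to the site, never exposed
            instrument      TEXT NOT NULL,
            score           INTEGER,
            shareable       INTEGER NOT NULL -- 1 when the record may be shared
        );
        INSERT INTO assessment VALUES
            ('DEMO_0001', 'site-17', 'INSTRUMENT_A', 12, 1),
            ('DEMO_0002', 'site-18', 'INSTRUMENT_A',  9, 0),
            ('DEMO_0003', 'site-19', 'INSTRUMENT_B', 30, 1);
        -- The view is the only object a federating client would read.
        CREATE VIEW ndar_assessment_view AS
            SELECT guid, instrument, score
            FROM assessment
            WHERE shareable = 1;
        """
    )
    return conn


def query_view(conn: sqlite3.Connection, instrument: str) -> list:
    """Run the kind of read-only query a federating client might issue."""
    cursor = conn.execute(
        "SELECT guid, instrument, score FROM ndar_assessment_view "
        "WHERE instrument = ? ORDER BY guid",
        (instrument,),
    )
    return cursor.fetchall()
```

The view lets the site keep its internal table layout and local record identifiers private while still presenting a stable set of columns to query.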

Those interested in pursuing data federation should refer to Submission Planning for details on how to provide data to NDAR through federation.


Validation Tool

NDAR supplies a Validation Tool to assist researchers with the submission of data into the NDAR Central Repository. The Validation Tool imports the NDAR Data Dictionary and checks the metadata associated with the files the NDAR user has selected for submission against that dictionary. The tool provides a report of any data discrepancies and warnings. If errors are found, a submission package cannot be created. The tool is a Java Web Start application that runs locally on the user’s computer and requires the Java runtime environment to be installed.
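The validation behavior described above (check records against a data dictionary, report errors and warnings, and block packaging when errors exist) can be sketched as below. This is an illustration under assumptions and not the Validation Tool's implementation. The rule structure, the element names, and the choice to treat unknown elements as warnings rather than errors are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ElementRule:
    """One hypothetical data dictionary entry."""
    required: bool = False
    parse: Callable[[str], object] = str
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def validate(records: List[Dict[str, str]],
             dictionary: Dict[str, ElementRule]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) found when checking records against the dictionary."""
    errors: List[str] = []
    warnings: List[str] = []
    for index, record in enumerate(records, start=1):
        for name in record:
            if name not in dictionary:
                # Assumption: unknown elements are reported but do not block packaging.
                warnings.append(f"row {index}: unknown element '{name}'")
        for name, rule in dictionary.items():
            raw = record.get(name, "")
            if raw == "":
                if rule.required:
                    errors.append(f"row {index}: missing required element '{name}'")
                continue
            try:
                value = rule.parse(raw)
            except ValueError:
                errors.append(f"row {index}: '{name}' has invalid value '{raw}'")
                continue
            if rule.minimum is not None and value < rule.minimum:
                errors.append(f"row {index}: '{name}' is below {rule.minimum}")
            if rule.maximum is not None and value > rule.maximum:
                errors.append(f"row {index}: '{name}' is above {rule.maximum}")
    return errors, warnings


def can_create_package(errors: List[str]) -> bool:
    """A submission package is allowed only when no errors were found."""
    return not errors
```

A call such as `validate(rows, {"subject_guid": ElementRule(required=True), "age_months": ElementRule(required=True, parse=int, minimum=0)})` returns both lists, so a caller can print the full report before deciding whether packaging may proceed.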

Query Tool

The NDAR Query Tool is used to search the data in NDAR. It can be run against one or many NDAR Collections to which a user has been granted access, regardless of whether the NDAR Collection data are located in the NDAR Central Repository or elsewhere and reached through NDAR’s data federation capabilities. The tool may also be used to query one or many NDAR Studies, allowing others to query against a defined population or sub-population. Queries are specified by the user through selections made on the query entry page of the Query Tool. When the user submits a query, the Query Tool sends a request to each data resource and collects the results. The results are displayed to the user on the Query Tool results page and can be saved in XML or CSV format. Queries can be formed to unify Clinical Assessments, Imaging, and Genomics data into a single result. Subjects returned in the results may then be saved to an NDAR Study.
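As a rough sketch of how results from several data resources might be unified into a single result and saved as CSV, consider the following. It is not the Query Tool's implementation. It assumes each resource returns rows as dictionaries that share a `guid` key, and the function names and the `resource.column` naming scheme are hypothetical.

```python
import csv
import io
from typing import Dict, List


def unify_by_guid(sources: Dict[str, List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Merge rows from several resources into one row per subject."""
    merged: Dict[str, Dict[str, str]] = {}
    for source_name, rows in sources.items():
        for row in rows:
            subject = merged.setdefault(row["guid"], {"guid": row["guid"]})
            for column, value in row.items():
                if column != "guid":
                    # Prefix with the resource name so columns cannot collide.
                    subject[f"{source_name}.{column}"] = value
    return [merged[guid] for guid in sorted(merged)]


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize unified rows, leaving blanks where a resource had no data."""
    other_columns = sorted({column for row in rows for column in row} - {"guid"})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["guid"] + other_columns,
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keying the merge on the GUID is what allows clinical, imaging and genomics rows collected at different sites to land on the same subject row.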

This page was last updated: Jun 4, 2009