Systematic Review of the Literature Regarding the Diagnosis of Sleep Apnea


1. Systematic Review of the Literature Regarding the Diagnosis of Sleep Apnea

THIS EVIDENCE REPORT IS OUTDATED AND IS NO LONGER VIEWED AS GUIDANCE FOR CURRENT MEDICAL PRACTICE. IT IS MAINTAINED FOR ARCHIVAL PURPOSES ONLY.

Evidence Report/Technology Assessment

Number 1

Prepared for:
Agency for Health Care Policy and Research

Department of Health and Human Services
U.S. Public Health Service
2101 East Jefferson Street
Rockville, MD 20852
http://www.ahcpr.gov

Contract No. 290-97-0016

Prepared by: MetaWorks, Inc., Boston, MA
Susan D. Ross, MD, FRCPC
EPC Project Director

I. Elaine Allen, PhD
Katherine J. Harrison, BA
Marion Kvasz, MD
Janet Connelly, BS
Iris A. Sheinhait, MA

Investigators

AHCPR Publication No. 99-E002

February 1999

Preface

The Agency for Health Care Policy and Research (AHCPR), through its Evidence-based Practice Centers (EPCs), sponsors the development of evidence reports and technology assessments to assist public- and private-sector organizations in their efforts to improve the quality of health care in the United States. The reports and assessments provide organizations with comprehensive, science-based information on common, costly medical conditions and new health care technologies. The EPCs systematically review the relevant scientific literature on topics assigned to them by AHCPR and conduct additional analyses when appropriate prior to developing their reports and assessments.

To bring the broadest range of experts into the development of evidence reports and health technology assessments, AHCPR encourages the EPCs to form partnerships and enter into collaborations with other medical and research organizations. The EPCs work with these partner organizations to ensure that the evidence reports and technology assessments they produce will become building blocks for health care quality improvement projects throughout the Nation. The reports undergo peer review prior to their release.

AHCPR expects that the EPC evidence reports and technology assessments will inform individual health plans, providers, and purchasers as well as the health care system as a whole by providing important information to help improve health care quality.

We welcome written comments on this evidence report. They may be sent to: Director, Center for Practice and Technology Assessment, Agency for Health Care Policy and Research, 6010 Executive Boulevard, Suite 300, Rockville, MD 20852.

Douglas B. Kamerow, M.D.
Director, Center for Practice and Technology Assessment
Agency for Health Care Policy and Research

John M. Eisenberg, M.D.
Administrator
Agency for Health Care Policy and Research

The authors of this report are responsible for its content. Statements in the report should not be construed as endorsement by the Agency for Health Care Policy and Research or the U.S. Department of Health and Human Services of a particular drug, device, test, treatment, or other clinical service.

Structured Abstract

Objective. The objective was to establish the evidence base for diagnosing sleep apnea (SA) in adult patients using systematic review methods. Tests covered were sleep monitoring devices, radiologic imaging, laboratory assays, and clinical signs and symptoms posited for use in screening or diagnosing SA. The standard sleep lab polysomnogram (PSG) was the gold standard.

Search strategy. Literature published from 1980 through November 1, 1997 (cutoff) was searched using Medline and Current Contents, supplemented by a manual review of the bibliographies of all accepted papers.

Selection criteria. Eligible studies included at least 10 adult patients suspected of or diagnosed with SA and reported the results of any test used to establish or support a diagnosis of SA, relative to a standard PSG-derived apnea index (AI, the number of apneic episodes per hour of sleep); apnea-hypopnea index (AHI, the total apneas plus hypopneas during total time asleep, divided by the number of hours asleep); or respiratory distress index (RDI). Eligible languages were English, German, French, Spanish, and Italian. Diagnostic papers reporting prevalence or clinical comorbidities of SA were also accepted.

Data collection and analysis. Based on scores for study characteristics (e.g., random order test, blinding of test readers, and use of PSG comparison), 147 studies met or exceeded the minimum evidence score. From these, data on study, patient, and test characteristics and on results were collected. Nondiagnostic studies reporting prevalence or clinical comorbidities were separately extracted.

Study and patient-level covariates were summarized and the results were analyzed using fixed effects models. Results were evaluated using summary receiver operating characteristic (ROC) curves where data were available.

Main results. In 71 analyzable diagnostic or screening studies (7,572 patients), the sensitivity and specificity of partial channel and partial time PSGs appeared most promising as possible prescreening tests or replacements for full PSG. Prediction models achieved good sensitivity and specificity. Results of studies of portable devices were variable due to study and device heterogeneity. Radiologic studies and several miscellaneous studies of questionnaires, anthropometric signs, and ears/nose/throat (ENT) exams could not be analyzed due to insufficient data. Global clinical impressions and oximetry provided moderate sensitivity and specificity. Least accurate were flow volume loops. The review and analysis were limited by variability in PSG definitions of apnea and hypopnea, and in thresholds for SA diagnosis. For sensitivity and specificity determinations, the lowest AI/AHI threshold for SA diagnosis was used. Necessary components of "standard" PSG were not consistent.

SA prevalence studies in different patient populations were reviewed. Few such studies utilized gold standard PSG to diagnose SA, so the diagnosis was based on unvalidated tests. Such prevalence estimates are suspect.

Conclusions. The best available evidence from literature sources suggests the diagnosis of SA is still best accomplished with full PSG. Progress has been made in establishing reasonable sensitivity and specificity of tests other than full PSG, and future researchers should focus on building this evidence base. Standardization of terms and diagnostic criteria is an absolute requirement to expedite development and enhance the utility of this literature in the future.

Suggested citation:
Ross SD, Allen IE, Harrison KJ, et al. Systematic Review of the Literature Regarding the Diagnosis of Sleep Apnea. Evidence Report/Technology Assessment No. 1. (Prepared by MetaWorks Inc. under Contract No. 290-97-0016.) AHCPR Publication No. 99-E002. Rockville, MD: Agency for Health Care Policy and Research. February 1999.

Summary

Overview

In this study, MetaWorks investigators have developed an evidence base via a systematic review of the literature pertinent to diagnostic testing and screening for sleep apnea in adult patients. Sleep apnea (SA) is a recently recognized disorder of sleep characterized by recurrent apneic and hypopneic episodes. Apnea was typically defined as complete cessation of airflow, although some studies used a reduction in airflow of more than 80 percent. Hypopnea was defined in most papers as a reduction in airflow of 50 percent or more, with or without a coincident O₂ desaturation of 2 percent to 4 percent relative to an average SaO₂ over a preceding interval of time. In view of its high prevalence and serious associated morbidity, SA has recently been described as a major public health concern. A major problem in the field in 1998 is diagnosis: whom to test, how to test, and what test results imply about the risk of serious clinical sequelae.
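Because these definitions varied between studies, the same recording can yield different event counts and therefore different indices. The following is a minimal sketch of that effect, assuming each candidate event has already been reduced to a percent airflow reduction and an accompanying desaturation. The `Event` and `EventCriteria` types, their field names, and the example values are hypothetical and are not taken from the report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One candidate respiratory event (hypothetical summary of a PSG trace)."""
    airflow_reduction_pct: float  # 100 = complete cessation of airflow
    desaturation_pct: float       # drop in SaO2 relative to a preceding baseline


@dataclass(frozen=True)
class EventCriteria:
    """Scoring thresholds; the report notes that these differed between studies."""
    apnea_min_reduction_pct: float = 100.0
    hypopnea_min_reduction_pct: float = 50.0
    hypopnea_min_desat_pct: Optional[float] = None  # None = no desaturation required


def classify(event, criteria):
    """Return 'apnea', 'hypopnea', or 'none' (cutoffs applied as >=, an assumption)."""
    if event.airflow_reduction_pct >= criteria.apnea_min_reduction_pct:
        return "apnea"
    if event.airflow_reduction_pct >= criteria.hypopnea_min_reduction_pct:
        required = criteria.hypopnea_min_desat_pct
        if required is None or event.desaturation_pct >= required:
            return "hypopnea"
    return "none"


def indices(events, hours_asleep, criteria):
    """Return (AI, AHI): apneas, and apneas plus hypopneas, per hour of sleep."""
    if hours_asleep <= 0:
        raise ValueError("hours_asleep must be positive")
    labels = [classify(event, criteria) for event in events]
    apneas = labels.count("apnea")
    hypopneas = labels.count("hypopnea")
    return apneas / hours_asleep, (apneas + hypopneas) / hours_asleep


# Illustrative events only, scored under two sets of criteria.
events = [Event(100, 4), Event(85, 3), Event(60, 1), Event(55, 4), Event(30, 5)]
strict = EventCriteria(apnea_min_reduction_pct=100.0, hypopnea_min_desat_pct=4.0)
lenient = EventCriteria(apnea_min_reduction_pct=80.0, hypopnea_min_desat_pct=None)

print("strict  (AI, AHI):", indices(events, 2.0, strict))
print("lenient (AI, AHI):", indices(events, 2.0, lenient))
```

In this toy example the lenient criteria double both indices relative to the strict criteria, which is the kind of between-study variation that makes a fixed AI or AHI threshold hard to compare across papers.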

Sleep apnea is a condition where the gold standard diagnostic method, overnight full channel polysomnography (PSG) in a sleep lab, is intrusive and costly, and the interpretation can be difficult. A standard PSG typically consists of electroencephalogram (EEG), submental (± tibialis) electromyogram (EMG), electrooculogram (EOG), respiratory airflow (usually by oronasal flow monitors), respiratory effort (usually by plethysmography), and oxygen saturation (oximetry). Electrocardiography (ECG) and body position are also frequently monitored in formal sleep studies and stated to be standard requirements of PSG by some groups.

If, however, the estimated prevalence of sleep apnea at 2 percent to 4 percent of middle-aged adults is correct, the costs of full PSGs to diagnose all suspected cases would be prohibitive. The development of simpler and less costly alternatives for diagnostic testing would be highly desirable, as would simpler prescreening tests prior to full PSG. Diagnostic approaches which might be viewed either as alternatives to PSGs or as screening tests to better select patients for PSG include: partial channel PSGs; partial night or daytime PSGs; portable sleep monitoring devices for use at home; radiologic imaging of the head and neck for anatomic abnormalities predictive of sleep apnea, including cephalometry, magnetic resonance imaging (MRI), and computed tomography (CT) scans; anthropometric measurements, such as neck circumference; nasopharyngeal and laryngeal endoscopic measurements of both structure and function; and focused questionnaires. All such interventions were within the scope of this review, provided they compared results against the gold standard diagnostic test, the standard PSG.

Although the type of sleep evaluation study preferred (and reimbursed) varies widely among physicians, sleep centers, and managed care organizations, MetaWorks investigators have avoided making specific recommendations in this review. MetaWorks investigators also did not review technical considerations related to data acquisition, storage, retrieval, and analysis of various devices, which were beyond the scope of this project. Rather, it is intended that this synthesis of the best available evidence will serve as an information resource for local decisionmakers and developers of guidelines/recommendations. It should also serve to highlight gaps in literature and areas ripe for future research.

Reporting the Evidence

The key questions that guided this review were: 1) What diagnostic and screening tests are presently available? 2) What is the strength of the evidence in support of each? 3) What is the predictive value of these tests in different populations (which requires estimating the prevalence of SA in different populations)? 4) What are the implications of certain PSG results in terms of serious clinical events occurring as comorbidities in association with a diagnosis of SA?
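Question 3 rests on the fact that a test's predictive value depends on the prevalence of SA in the population tested, even when sensitivity and specificity are unchanged. The following is a minimal sketch of that relationship using Bayes' rule. The function name and the input values are illustrative assumptions, not results from the report.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence.

    All three inputs are proportions between 0 and 1.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    # Expected fraction of the population falling in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    all_pos = true_pos + false_pos
    all_neg = true_neg + false_neg
    ppv = true_pos / all_pos if all_pos > 0 else float("nan")
    npv = true_neg / all_neg if all_neg > 0 else float("nan")
    return ppv, npv


# Illustrative inputs only: the same hypothetical test in three populations.
for prevalence in (0.04, 0.30, 0.55):
    ppv, npv = predictive_values(0.85, 0.85, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

The positive predictive value rises and the negative predictive value falls as prevalence increases, which is why prevalence estimates for each target population are needed before a screening test can be interpreted.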

Methodology

In general, MetaWorks investigators used systematic review methods derived from the evolving science of review research. The review followed a protocol that was developed a priori and shared with the nominating partners on the project (Blue Cross/Blue Shield [BC/BS] of Massachusetts and the Sleep Disorders Centre of Metropolitan Toronto); a panel of technical experts (with representation from consumer groups and professional specialties: neurology, pulmonology, dentistry, otolaryngology, epidemiology, and nursing); and the Task Order Officers at the Agency for Health Care Policy and Research (AHCPR). The protocol outlined the methods to be used for the literature search, study eligibility criteria, data elements for extraction, and methodological strategies to minimize bias and maximize precision during the process of data collection, extraction, and synthesis.

The published literature was searched from 1980 through the search cutoff date of November 1, 1997; the retrieval cutoff date was January 30, 1998. The search started with a broad Medline search using the terms "sleep apnea syndrome" and "monitoring, physiologic," "sleep apnea syndrome" and "airway resistance," and "human." Also, MetaWorks investigators searched "sleep apnea syndromes," "sleep apnea syndrome" and "index." In addition, the 1997 Current Contents CD-ROM was searched ("sleep apnea") to the same cutoff date. All citations and abstracts were printed and screened at MetaWorks for any mention of diagnostic tests in adults with SA, for which full papers were obtained. The electronic searches noted above were supplemented by a thorough search of the reference lists of all eligible studies and relevant review articles. To be included in the review, studies had to report results of any diagnostic test or intervention to establish or support a diagnosis of SA in adults, with at least 10 patients as total sample size. Studies reported in the following Western European languages - English, German, French, Spanish, or Italian - were accepted.

All eligible papers were scored on features pertinent to diagnostic test study design, execution, and reporting, with a range of possible scores from 0 to 44. Those falling in the lowest 20 percent of the distribution of actual scores were dropped from data extraction and analysis. Each accepted diagnostic study was extracted in duplicate by investigators with one extractor using a blinded copy of each study report, masked as to source of financial support, authors, and journal. The agreement between extractors was approximately 78 percent and differences were resolved by consensus.
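The two quality-control steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure MetaWorks used: the report does not say how ties at the cutoff were handled or how extractor agreement was calculated, so keeping tied studies and using simple field-by-field percent agreement are both assumptions, and all study identifiers, scores, and field names are hypothetical.

```python
def drop_lowest_fraction(scores, fraction=0.20):
    """Split studies into (kept, dropped) by removing the lowest-scoring fraction.

    scores: dict mapping a study id to its evidence score (0 to 44 in the report).
    Studies tied with the cutoff score are kept (an assumption).
    """
    if not scores:
        return {}, {}
    n_drop = int(fraction * len(scores))
    ordered = sorted(scores.values())
    cutoff = ordered[n_drop]  # lowest score that is retained
    kept = {sid: s for sid, s in scores.items() if s >= cutoff}
    dropped = {sid: s for sid, s in scores.items() if s < cutoff}
    return kept, dropped


def percent_agreement(extraction_a, extraction_b):
    """Percent of shared data fields on which two extractors recorded the same value."""
    fields = extraction_a.keys() & extraction_b.keys()
    if not fields:
        raise ValueError("the two extractions share no fields")
    matches = sum(extraction_a[f] == extraction_b[f] for f in fields)
    return 100.0 * matches / len(fields)


# Hypothetical evidence scores for ten studies.
scores = {"A": 12, "B": 16, "C": 16, "D": 20, "E": 25,
          "F": 34, "G": 18, "H": 22, "I": 30, "J": 14}
kept, dropped = drop_lowest_fraction(scores)
print("dropped:", sorted(dropped))

# Hypothetical duplicate extraction of one study.
first = {"n_patients": 50, "ahi_threshold": 5, "blinded": True, "design": "case series"}
second = {"n_patients": 50, "ahi_threshold": 10, "blinded": True, "design": "case series"}
print("agreement (%):", percent_agreement(first, second))
```

Fields on which the two extractions disagree, such as `ahi_threshold` in this example, are the ones that would go to consensus resolution.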

Key data elements sought for extraction from each study included study level, patient level, and test characteristics. Only clearly reported aggregate results were extracted from studies. Results that were only given for individual patients and results that would require extrapolations from graphs or derivations from figures or tables were not captured. For all tests, sensitivity, specificity, positive predictive value, negative predictive value, and correlation coefficients of each test relative to PSG AI or AHI (RDI) results were sought. (Apnea index [AI] is defined as the number of apneic episodes/hour sleep, and apnea-hypopnea index [AHI] is the total apneas plus hypopneas during total time asleep, divided by the number of hours asleep. The respiratory distress index [RDI] is the same as AHI.)
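The accuracy measures sought for each test all derive from a 2x2 cross-tabulation of the candidate test against full PSG at a chosen AI or AHI threshold. The following is a minimal sketch of that calculation. The function names and patient values are hypothetical, and treating an index at or above the threshold as a positive PSG diagnosis is an assumption.

```python
def two_by_two(test_positive, psg_index, threshold):
    """Cross-tabulate a candidate test against full PSG.

    test_positive: list of bool, one per patient (candidate test result).
    psg_index:     list of AI or AHI values from full PSG for the same patients.
    threshold:     PSG index at or above which SA is considered present
                   (>= is an assumption; studies differed).
    Returns (tp, fp, fn, tn).
    """
    if len(test_positive) != len(psg_index):
        raise ValueError("inputs must have one entry per patient")
    tp = fp = fn = tn = 0
    for positive, index in zip(test_positive, psg_index):
        has_sa = index >= threshold
        if positive and has_sa:
            tp += 1
        elif positive:
            fp += 1
        elif has_sa:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV as proportions."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical paired results for eight patients.
psg_ahi = [2, 4, 7, 12, 30, 8, 3, 15]
test_positive = [False, True, True, True, True, False, False, True]

for threshold in (5, 10):
    cells = two_by_two(test_positive, psg_ahi, threshold)
    print(threshold, cells, accuracy(*cells))
```

The same test results give different sensitivity and specificity at thresholds of 5 and 10, which is why the threshold used by each study has to be recorded alongside its accuracy figures.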

The main objective of the analysis was to evaluate the diagnostic accuracy of alternatives to full PSG for the diagnosis of SA as compared to a full PSG (gold standard). Initially, weighted averages using Mantel-Haenszel fixed effects models combining the comparative summary statistics were calculated and summarized for groups based on diagnostic test category. Study and patient-level covariates and study evidence scores were also summarized for each diagnostic test category. A summary receiver operating characteristic (ROC) curve was calculated for each diagnostic group where data were available. While differences among studies may be an argument against estimating one common sensitivity and specificity using fixed or random effects models, these factors can be described using the summary ROCs, which both display and summarize the heterogeneity.
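As a rough illustration of these two steps, the sketch below pools study-level 2x2 counts and fits a summary ROC curve. It is not the report's exact computation: pooling is done by summing counts across studies as a simple fixed-effect stand-in for the Mantel-Haenszel weighting, and the summary ROC uses the Moses-Littenberg regression, one common approach that the report does not name. The 0.5 continuity correction, the function names, and the study counts are likewise assumptions.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pooled_sensitivity_specificity(studies):
    """Pool (tp, fp, fn, tn) tuples by summing counts across studies."""
    tp = sum(s[0] for s in studies)
    fp = sum(s[1] for s in studies)
    fn = sum(s[2] for s in studies)
    tn = sum(s[3] for s in studies)
    return tp / (tp + fn), tn / (tn + fp)


def fit_sroc(studies):
    """Fit D = intercept + slope * S by least squares (Moses-Littenberg).

    D = logit(TPR) - logit(FPR) is the log diagnostic odds ratio and
    S = logit(TPR) + logit(FPR) reflects the positivity threshold.
    """
    d_vals, s_vals = [], []
    for counts in studies:
        tp, fp, fn, tn = (c + 0.5 for c in counts)  # continuity correction
        tpr = tp / (tp + fn)
        fpr = fp / (fp + tn)
        d_vals.append(logit(tpr) - logit(fpr))
        s_vals.append(logit(tpr) + logit(fpr))
    n = len(studies)
    mean_s = sum(s_vals) / n
    mean_d = sum(d_vals) / n
    sxx = sum((s - mean_s) ** 2 for s in s_vals)
    if sxx == 0:
        raise ValueError("need at least two studies with different S values")
    slope = sum((s - mean_s) * (d - mean_d) for s, d in zip(s_vals, d_vals)) / sxx
    return mean_d - slope * mean_s, slope


def sroc_sensitivity(fpr, intercept, slope):
    """Sensitivity on the fitted summary ROC curve at a given false positive rate."""
    if slope == 1.0:
        raise ValueError("curve is undefined when slope equals 1")
    x = intercept / (1.0 - slope) + (1.0 + slope) / (1.0 - slope) * logit(fpr)
    return expit(x)


# Hypothetical (tp, fp, fn, tn) counts for three studies of one test category.
studies = [(40, 5, 10, 45), (30, 10, 5, 55), (18, 2, 12, 68)]
print("pooled:", pooled_sensitivity_specificity(studies))
intercept, slope = fit_sroc(studies)
for fpr in (0.1, 0.2, 0.3):
    print(f"FPR={fpr:.1f}  SROC sensitivity={sroc_sensitivity(fpr, intercept, slope):.2f}")
```

A slope far from zero indicates that diagnostic accuracy changes with the positivity threshold, which is one way the summary curve exposes heterogeneity that a single pooled sensitivity and specificity would hide.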

A group of 22 peer reviewers drawn from consumer groups and professional organizations, along with our technical experts and partners, was assembled to review and provide suggestions to the draft final report describing this project. Their feedback, as well as that from AHCPR, was incorporated wherever possible within the original scope of the project.

Findings

All Studies: PSG

71 studies (7,572 patients), mean evidence score = 20.6 (range, 16 to 34). Level III to IV evidence overall (that is, primarily derived from case series and observational studies).
Variability in PSG definitions of apnea and hypopnea, and AI or AHI thresholds for diagnosis, with or without presence of clinical signs and/or symptoms.
Variability in components of "standard" PSG is evident, and requirement for all "standard" PSG channels not established in SA diagnosis. Night to night PSG reproducibility is not well documented and may differ by SA diagnostic thresholds.

Partial Channel PSGs

3 studies of partial channel PSGs (213 patients), mean evidence score = 17.7 (range, 17 to 19).
Sensitivity ranged from 82 percent to 94 percent and specificity from 82 percent to 100 percent.
Sensitivity and specificity of partial channel PSGs appear promising as possible prescreening tests or replacements for full PSG.

Portable Devices

25 studies of portable monitoring devices (1,631 patients), mean evidence score = 22.1 (range, 16 to 34).
Portable device results were mostly from supervised sleep labs, not at home.
Reliability in unattended home use, equipment failure rates, night to night reproducibility, price, compliance, and safety are rarely reported.
Sensitivity ranged from 32 percent to 100 percent and specificity from 33 percent to 100 percent.
Studies of portable devices were variable due to study and device heterogeneity.

Oximetry

12 studies of oximetry alone (1,784 patients); mean evidence score = 20 (range, 16 to 32).
Mean sensitivity and mean specificity are 87.4 percent (range, 36 percent to 100 percent) and 64.9 percent (range, 23 percent to 99 percent), respectively.
Oximetry studies provided moderate sensitivity and specificity.

Partial Time PSGs

7 studies of partial time PSGs (505 patients), mean evidence score = 18.6 (range, 17 to 20).
Mean sensitivity at AI/AHI threshold of 5 was 69.7 percent (range, 66 percent to 93 percent), and at threshold of 10, 79.5 percent (range, 42 percent to 89 percent). Specificity at AI/AHI threshold of 5 was 87.4 percent (range, 50 percent to 100 percent) and at threshold of 10, 86.7 percent (range, 57 percent to 100 percent).
Sensitivity and specificity of partial time PSGs appear promising as possible prescreening tests or replacements for full PSG.

Radiologic

5 radiologic studies - 1 MRI, 3 cephalometry, and 1 CT + cephalometry - not meta-analyzable.
Radiology studies could not be analyzed due to insufficient data.

Miscellaneous

17 clinical studies (too few studies each for anthropometric signs or ears/nose/throat [ENT] exams). Also, 1 chemical assay and 3 questionnaire studies not meta-analyzable.
4 studies of flow volume loops (595 patients), mean evidence score = 18.3 (range, 17 to 20). When both FEF₅₀/FIF₅₀ (a measure of extrathoracic airway obstruction) and the sawtooth sign (indicative of pharyngeal fluttering during respirations) were analyzed together, the mean sensitivity was 39.1 percent (range, 41 percent to 59 percent) and mean specificity was 60.5 percent (range, 54 percent to 85 percent).
4 studies of global impressions of clinicians (1,139 patients), mean evidence score = 23 (range, 19 to 28). Mean sensitivity = 58.9 percent (range, 52 percent to 79 percent), specificity = 65.6 percent (range, 50 percent to 100 percent).
Several miscellaneous studies of questionnaires, anthropometric signs, and ENT exams could not be analyzed due to insufficient data.
Global clinical impressions provided moderate sensitivity and specificity; least accurate were flow volume loops.

Prediction Equations

8 models (1,908 patients), mean evidence score = 21.5 (range, 17 to 30). Mean sensitivity = 66.5 percent (range, 61 percent to 98 percent) and mean specificity = 88.7 percent (range, 21 percent to 100 percent).
Prediction models achieved high sensitivity and specificity.

Prevalence Studies

General populations: 11 prevalence studies (2,410 patients), mean prevalence of SA = 9.2 percent (range, 0 to 33 percent).
Healthy elderly: 7 prevalence studies (469 patients), mean prevalence of SA = 34.6 percent (range, 2 percent to 43 percent).
Coronary artery disease: 8 studies (461 patients), mean prevalence of SA = 54.9 percent (range, 50 percent to 100 percent).
Hypertension: 4 studies (166 patients), mean prevalence of SA = 26.9 percent (range, 22 percent to 30 percent).
Erectile dysfunction/impotence: 3 studies (1,138 men), mean prevalence of SA = 42.2 percent (range, 11 percent to 44 percent).
Other special populations (stroke, end stage renal disease, congestive heart failure, Alzheimer's disease, depression, and healthy offspring of SA patients): too few studies to summarize.
Caveat: Prevalence studies may be underrepresented in this set due to search strategy of identifying primarily diagnostic studies.
Caveat: Few prevalence studies here utilized gold standard PSG to diagnose SA, so the diagnosis was based upon unvalidated tests. Such prevalence estimates are suspect.

Comorbidity Studies

Conditions associated with SA:

Hypertension: 24 studies (3,497 SA patients), mean proportion with hypertension = 42.0 percent (range, 9 percent to 77 percent).
Coronary artery disease: 9 studies (1,086 SA patients), mean proportion with coronary artery disease (manifest as angina or myocardial infarction [MI]) = 20.3 percent (range, 2 percent to 33 percent).
Ventricular arrhythmias: 5 studies (205 SA patients), mean proportion with ventricular arrhythmias (usually complex arrhythmias, during nocturnal monitoring) = 13.1 percent (range, 3 percent to 47 percent).
Mortality: 5 studies (2,281 SA patients) with prolonged follow-up (5 to 13 years) reported deaths (all causes) in 6 percent to 11 percent of patients, mean = 7.0 percent.
Caveat: Studies with actual clinical consequences of certain AIs, with or without signs and symptoms, and with or without treatment, are not well represented in this set. Inclusion of treatment studies might be useful. Clinical implications of SA diagnosis are unclear.

Future Research

Future studies of diagnostic test strategies should address the many limitations of the literature noted above. The field could benefit from adoption of a common terminology and definitions for fundamental concepts such as apnea and hypopnea, and the relation between AI and AHI should be established in order to allow conversions and comparisons across studies. Researchers should seek to clarify the frequency of sleep apnea/hypopnea in general populations by gender and age. More naturalistic sleep studies (in the home) are still of interest, as MetaWorks investigators suspect that much of the uncertainty about the nature of SA, its pathophysiology, its risk factors, and its clinical consequences derives from the fact that the phenomenon of SA may be altered by the act of observing it via standard PSG. Long term follow-up studies are recommended to better document the findings of treated vs. untreated SA. Lastly, all sleep monitoring systems which are proposed as prequalifiers or replacements for PSG must be validated in the setting in which they are intended to be used.