Systematic Literature Review
Scott A. Shipman, M.D., M.P.H.,a Mark Helfand, M.D., M.P.H.,b Virginia A. Moyer, M.D., M.P.H.,c Barbara P. Yawn, M.D., M.Sc.d
The authors of this article are responsible for its contents,
including any clinical or treatment recommendations. No statement in this article
should be construed as an official position of the Agency for Healthcare Research
and Quality or the U.S. Department of Health and Human Services.
Address correspondence to: Scott A. Shipman, M.D., M.P.H., Department
of Pediatrics, Oregon Health & Science University, 707 SW Gaines Rd., CDRC-P, Portland, Oregon 97239. E-mail: shipmans@ohsu.edu.
Contents
Abstract
Introduction
Methods
Results
Discussion
Future Research
References
Notes
Abstract
Background: Developmental
dysplasia of the hip (DDH) represents a spectrum of anatomic abnormalities that
can result in permanent disability.
Objective: We sought to gather and synthesize the
published evidence regarding screening for DDH by primary care providers.
Methods: We performed a
systematic review of the literature using a best evidence approach as used by
the U.S. Preventive Services Task Force.
The review focused on screening relevant to primary care in infants from
birth to 6 months of age, and on interventions employed before 1 year of age.
Results: The literature on
screening and interventions for DDH suffers from significant methodological
shortcomings. No published trials
directly link screening to improved functional outcomes. Clinical examination and ultrasound identify
somewhat different groups of newborns at risk for DDH. A significant proportion of hip abnormalities
identified through clinical examination or ultrasound in the newborn period
will spontaneously resolve. Very few
studies examine the functional outcomes of patients who have undergone therapy
for DDH. Due to the high rate and
unpredictable nature of spontaneous resolution of DDH and the absence of
rigorous comparative studies, the effectiveness of interventions is not known. All surgical and nonsurgical interventions
have been associated with avascular necrosis of the femoral head, the most
common and most severe harm associated with treatment of DDH.
Conclusion: Screening with
clinical examination or ultrasound can identify newborns at increased risk for
DDH, but due to the high rate of spontaneous resolution of neonatal hip
instability and dysplasia and the lack of evidence of the effectiveness of
intervention on functional outcomes, the net benefits of screening are not
clear.
Key Words: developmental
dysplasia of the hip, DDH, hip dysplasia, mass screening, infants, systematic
review
Introduction
Developmental
dysplasia of the hip (DDH) represents a spectrum of anatomical abnormalities in
which the femoral head and the acetabulum are in improper alignment and/or grow
abnormally. The precise definition of DDH is controversial.1,2 The spectrum includes hips that are
dysplastic, subluxated, dislocatable and dislocated. Clinical instability of the hip is the
traditional hallmark of the disorder. In an unstable hip, the femoral head and
acetabulum may not have a normal tight, concentric anatomic relationship, which
can lead to abnormal growth of the hip joint and may result in permanent
disability. DDH can lead to premature
degenerative joint disease, impaired walking, and chronic pain.
Estimates of the
incidence of DDH in infants vary between 1.5 and 20 per 1000 births.3 The incidence of DDH in infants is influenced
by a number of factors, including diagnostic criteria, gender, genetic and
racial factors, and age of the population in question.4 The reported incidence has increased
significantly since the advent of clinical and sonographic screening,
suggesting possible overdiagnosis.2 In addition to a higher prevalence of DDH in
females, reported risk factors for the development of DDH include a family
history of DDH, breech intrauterine positioning, and additional in utero postural deformities.5-7 However, the majority of cases of DDH have no
identifiable risk factors.8
Self-limited hip
instability is a common finding in newborns.9 More than 80% of clinically unstable hips
noted at birth have been shown to resolve spontaneously.10 Because of the potential for subsequent
impairment and the widespread belief that earlier treatment leads to improved
outcomes, screening newborns for DDH has become commonplace. However, the high rate of spontaneous
resolution raises uncertainty about the most appropriate plan of action when a
newborn has a positive screening examination for an unstable hip.
Intervention for
DDH includes both nonsurgical and surgical options. A variety of abduction
devices are used to treat DDH nonsurgically, with the Pavlik method among the
most common. These
devices place the legs and hips in an abducted and flexed position in an effort
to promote proper alignment and stabilization of the hip joint. The duration of treatment varies from center
to center. Complications of
nonsurgical therapy are not trivial, with avascular necrosis of the femoral
head among the most serious.3
Surgical
intervention may be necessary when DDH is severe, when it is diagnosed late, or
after an unsuccessful trial of nonsurgical methods.11 Many surgical procedures are
used to treat DDH, most of which involve manual reduction of the femoral head
into the acetabulum, with or without additional procedures on the adductor
and/or iliopsoas tendons, the femur, or the acetabulum. Preoperative management may include a period
of traction, and postoperative management typically includes a period of fixed
positioning in a spica cast. The
duration and specific approach to pre- and post-operative management are highly
variable. Surgical intervention places
the hip at risk of avascular necrosis, in addition to standard operative risks
including general anesthesia, intraoperative complications, and post-operative
wound infections.
This evidence synthesis assesses the literature on screening and intervention for
developmental dysplasia of the hip. It
was conducted for the U.S. Preventive Services Task Force (USPSTF), which had
no previous recommendations for this condition. Two systematic reviews of DDH
have been published previously, one by the Canadian Task Force on Preventive
Health Care (CTFPHC)3 and another by the American
Academy of Pediatrics (AAP).1,4 This evidence synthesis
summarizes this previous work as applicable, and incorporates studies published
since these reviews were completed.
Methods
The analytic
framework and key questions (Figure 1) guiding the literature review were
developed in consultation with liaisons from the USPSTF. We focused on screening in infants from birth
through 6 months of age. The overarching question (KQ1) considers direct
evidence linking screening to improved patient outcomes. The remaining key questions examine critical
links in the logic underlying screening. To be effective, screening must
identify cases of DDH earlier than they would be identified in the usual course
of care (KQ2, 3). In addition, early
identification must lead to earlier treatment, and earlier treatment must lead
to better functional outcomes than late treatment (KQ5). The benefits of early identification
and treatment must also outweigh the harms of screening and of the treatments
themselves (KQ4, 6). Finally, pending
sufficient evidence of effectiveness, evidence regarding the cost-effectiveness
of screening is considered.
Literature Search Strategy
Two recent
systematic reviews of screening for DDH, by the AAP and the CTFPHC, targeted
several questions also relevant to this review.
We utilized the previous reviews to focus the search strategy and
eligibility criteria for our review.12 When questions had substantial overlap, we reviewed all studies
identified in these reviews and searched the literature for studies published
subsequently (after 1996 for the AAP review and 2000 for the CTFPHC review).
Additionally,
relevant studies were identified from multiple searches of MEDLINE® (1966 to
January 2005) and the Cochrane Library databases through June of 2004. Specific search strategies are available from
the authors. Additional articles were
obtained by reviewing reference lists of other pertinent studies, reviews,
editorials, and Web sites, and by consulting experts. This strategy was modified for assessments of
screening modalities in Key Question 3, in which we focused our review on the
relevant literature beginning in 1996, the year in which the AAP review
concluded.
Inclusion/Exclusion
Criteria
Investigators
reviewed all abstracts identified in the searches and the previous systematic
reviews and determined eligibility by applying inclusion and exclusion criteria
specific to key questions.12 Full-text papers of included
abstracts were then reviewed for relevance.
Eligible studies had English-language abstracts, were applicable to U.S.
clinical practice, and provided primary data relevant to key questions. Non-English literature with English abstracts
was reviewed to identify any controlled trials.
We excluded so-called teratological DDH, that is, DDH occurring in children with
neuromuscular disorders or other congenital malformations. For all included studies, initial screening
had to be conducted in children less than 6 months of age, and screening
studies needed to be prospective, primary care based or population based in
design. Studies of risk factors also had
to be primary care based or population based.
Intervention and outcomes studies had to report results of children
diagnosed before 6 months of age, and interventions had to be employed earlier
than 1 year of age on average. For
intervention studies, we were particularly interested in functional outcomes,
including: gait, pain, physical
functioning, activity level, peer relations, family relations, school and
occupational performance. For
noninvasive interventions, another potential benefit is a reduced need for
surgery later in childhood. Therefore,
intervention studies were eligible if they reported one of these functional
outcomes and/or a subsequent need for surgery.
Studies that reported only radiological measures of anatomic structural
relationships and development, which have not been shown to be valid predictors
of functional outcomes, were excluded (indicated by a dotted line in the
analytic framework). For avascular
necrosis (AVN), the predominant harm from interventions, studies needed to
report the rate of this complication in the treated patient population, meet
age-based inclusion criteria, have at least 1 year of followup, and not
experience excessive (>50%) loss to followup.
We used a "best
evidence" approach13; that is, for each key
question, we included studies with weaker designs only if better-designed
studies were not available. Case
reports, series with 5 or fewer subjects, editorials, letters, nonsystematic
review articles, and commentaries were excluded from the evidence review.
Most
studies of DDH are observational, uncontrolled or poorly controlled, and have
significant flaws in design. To assess
the quality of these studies, we considered the following: study design,
clarity of diagnostic standards, comparability of subjects, variation in
screening approach and/or intervention protocol, duration of followup, loss to
followup, efforts to control for confounding and minimize bias, masking of
outcome assessment, and validity and standardization of outcomes measured.14
Size
of Literature Reviewed
Investigators reviewed 1,145 abstracts of
English-language articles identified by the searches, excluding 679 citations
on first review. Review of an additional 544 abstracts of non-English
language articles identified no controlled trials. A total of 466 full-text articles were retrieved and reviewed; 416 were from the
electronic searches and 50 were from reference lists or experts' suggestions (list
of expert reviewers available upon request from the authors). The following met inclusion criteria: 13 papers about risk factors; 59 about
screening, including 3 controlled trials; 5 about harms of screening; 47 about
interventions and harms of interventions, including no controlled trials; and 8
about cost.
Results
Key Question 1. Does
Screening for DDH Lead To Improved Outcomes (including reduced need for surgery
and improved functional outcomes such as:
gait, physical functioning, activity level, peer relations, family
relations, school and occupational performance)?
There are no prospective studies—either randomized
or observational—comparing a screened to a non-screened population with
measurement of functional outcomes after an adequate period of followup. There are also no controlled trials that
compare surgical or nonsurgical treatment for early DDH to observation only.
In theory, early
application of noninvasive treatments (e.g., a harness) to obtain a concentric
and stable reduction of the femoral head in the acetabulum may obviate the need
for surgery later on. However, the
evidence that screening leads to a reduced rate of surgery is weak and
indirect. The 2000 CTFPHC report, citing
several descriptive studies, concluded "With serial clinical examination, the
operative rate for DDH has decreased by more than 50% to 0.2-0.7% per 1000."3 It should be noted that this reduction was
observed at an ecological level: descriptive studies in screened populations
were compared, indirectly, to unscreened populations or to historical rates.
The studies were not comparative and did not report functional outcomes. In addition, while some studies suggest that
surgical rates have declined since the adoption of universal screening
programs, they do not indicate why. The
decline might be attributable to increased rates of screening, but other
factors, such as wider use of a period of observation before recommending
surgery, could also account for the declining use of these surgical
procedures.
The outcome
measure used in many studies was the proportion of infants and children with
DDH who had surgical intervention. If
screening identifies more cases than usual care, it could reduce this
proportion even if the same number of cases required surgery as before. For this reason it is difficult to determine
whether a decrease in the surgical rate over time reflects the efficacy of
noninvasive intervention or the inclusion of additional cases in the
denominator who are at little or no risk of requiring surgery.
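The denominator effect described above can be illustrated with a short arithmetic sketch. All numbers below are hypothetical and chosen only for illustration; they are not drawn from any of the reviewed studies. The sketch assumes that the number of infants who truly need surgery is the same with and without screening, and that screening adds mild cases to the diagnosed group.

```python
def surgical_proportion(cases_needing_surgery: int, cases_diagnosed: int) -> float:
    """Proportion of diagnosed DDH cases that go on to surgery."""
    if cases_diagnosed <= 0:
        raise ValueError("cases_diagnosed must be positive")
    return cases_needing_surgery / cases_diagnosed


# Hypothetical cohort (illustrative assumptions, not study data).
births = 10_000
surgeries = 8                # assumed identical with and without screening
diagnosed_usual_care = 20    # cases identified in the usual course of care
diagnosed_screening = 100    # screening adds mild cases at little risk of surgery

before = surgical_proportion(surgeries, diagnosed_usual_care)
after = surgical_proportion(surgeries, diagnosed_screening)
per_1000_births = 1000 * surgeries / births

print(f"Surgery per diagnosed case, usual care: {before:.0%}")
print(f"Surgery per diagnosed case, screening:  {after:.0%}")
print(f"Surgeries per 1000 births (both eras):  {per_1000_births:.1f}")
```

Under these assumptions the proportion of diagnosed cases undergoing surgery falls from 40% to 8%, while the number of operations per 1000 births is unchanged. This is why an outcome expressed per live birth is easier to interpret than one expressed per diagnosed case.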
The findings are also inconsistent: some studies observed a decrease in operative
rates,15-18 while others saw no change19,20 or an increase.21-23 Ascertainment of cases was often flawed, and
the studies span several decades, making it difficult to assess whether the
varied results represent artifacts of data quality, secular trends, or
differences in local practice styles.24 These studies are also limited because they
typically do not follow the screen-negative population with the same vigilance
as the screen-positive population, and they experience significant loss to followup
in the screen-positive population, which can bias the outcomes.
More recent
studies also have conflicting results.
In 1998, the MRC Working Party on Congenital Dislocation of the Hip
reported operative rates in a randomly selected, population-based survey of 20%
of all births in the U.K.24 After adjustment for differences in
ascertainment that had been overlooked in previous reports, the incidence of a
first operative procedure for congenital dislocation of the hip was similar
before and after screening was introduced (pre-screening rate range 0.66-0.85
per 1000, post-screening rate 0.78 per 1000 live births, 95% CI 0.72-0.84 per
1000). Even in the screening era, 70% of
the cases reported by surgeons to the registry had not been detected by
screening. In 1999, Australian
investigators reported the operative rate in the post-screening era using an
existing perinatal database and an inpatient discharge database to identify
infants with congenital dislocation of the hip.25 In contrast to the U.K. study above, they
reported an operative rate of 0.46 per 1000 live births and found that 97.6% of
congenital dislocation cases were diagnosed before 3 months of age. The reasons for the conflicting findings
of these two studies are unknown.
Key Question 2. Can Infants at High Risk for DDH Be
Identified, and Does This Group Warrant a Different Approach to Screening Than
Children at Average Risk?
Risk factors are
considered an adjunct to, rather than a substitute for, universal screening by
physical examination. For example, the
AAP recommends using risk factors to identify newborns whose risk for DDH may
exceed the comfort level of physicians, prompting additional screening using
ultrasound. The rationale for this
approach is that, in high-risk newborns, clinical examination alone can miss
many cases of DDH that ultrasound may be able to identify. The assumptions underlying this approach are:
- Risk factors can identify a group of newborns at a high risk of DDH.
- Ultrasound is more sensitive than clinical examination for identifying infants at risk of complications from DDH.
In case control
and observational studies, breech positioning at delivery, family history of
DDH, and female gender have been most consistently shown to have an association
with the diagnosis of DDH. Additional
risk factors may include maternal primiparity, high birthweight,
oligohydramnios, and congenital anomalies.
Primary care and
population-based cohort studies26-36 that include one or more of
the major risk factors are summarized in Table 1. Consistently, only a minority
(10-27%) of all infants diagnosed with DDH in population-based studies have
identified risk factors (with the exception of female gender)30,32,33,35 and among those with risk
factors, between 1% and 10% have DDH.30,33,35 This wide range illustrates
the impact of the reference standard on the relative importance of risk
factors. Those studies with a more
strict standard for diagnosing "true" DDH, for instance limited to those
patients that receive treatment, demonstrate substantially lower rates of DDH
among those with risk factors. For
example, in a recent cohort study of 29,323 births at one hospital, the prevalence
of treated DDH was 20/1000 in breech females, versus 110/1000 in this group if
the diagnosis of DDH had been based upon an abnormal clinical exam. Under the stricter
reference standard, the rates of DDH in other groups were 12/1000 in family history positive females, 4/1000 in
breech males, and 5/1000 and 0.3/1000 in females and males with no risk factors,
respectively.28
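As a rough aid to reading these figures, the sketch below computes crude rate ratios against the same-sex group with no risk factors, using only the treated-DDH rates quoted above. The ratios are not reported in the cited study; they are simple divisions of the quoted rates, carry no confidence intervals, and the dictionary keys and function name are labels invented here for illustration.

```python
# Treated-DDH rates per 1000 births, as quoted in the text (reference 28).
rates_per_1000 = {
    ("female", "breech"): 20.0,
    ("female", "family history"): 12.0,
    ("male", "breech"): 4.0,
    ("female", "none"): 5.0,
    ("male", "none"): 0.3,
}


def rate_ratio(sex: str, risk_factor: str) -> float:
    """Crude ratio of a group's rate to the same-sex rate with no risk factors."""
    return rates_per_1000[(sex, risk_factor)] / rates_per_1000[(sex, "none")]


for (sex, factor), rate in rates_per_1000.items():
    if factor == "none":
        continue
    unaffected = 1000 - rate  # infants per 1000 in the group without treated DDH
    print(f"{sex}, {factor}: {rate}/1000, ratio {rate_ratio(sex, factor):.1f}, "
          f"{unaffected:.0f}/1000 without treated DDH")
```

Even in the highest-risk group, the large majority of infants do not go on to treated DDH, which is consistent with the limited predictive value of risk factors discussed in this section.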
Lehmann and
colleagues conducted a meta-analysis of studies published through 1996 to
estimate the probability of having a positive screening test for the three
leading risk factors.1 Breech females (84/1000) had a dramatically
higher than average risk (calculated at 8.6/1000 for all newborns) of being
screen-positive, followed by family history positive females (24/1000), breech
males (18/1000), and females with no risk factors (14/1000); males with no risk
factors had the lowest risk (3/1000).
When considering these prevalence estimates, it should be noted that the
reference standard used in Lehmann's synthesis was a positive Barlow or
Ortolani test at the newborn screening examination. While this is a commonly used measure of the
disorder, it may overestimate the number of infants with "true" DDH, i.e., those
that do not spontaneously resolve and thus require therapy. The substantial
differences in prevalence between the AAP review and the previous
population-based study are likely to reflect different diagnostic standards, and
they affect the predictive value of risk factors for DDH. Further implications of the lack of a
practically applied "gold standard" for diagnosing DDH are discussed in greater
detail under KQ3.
Several potential
biases should be considered in evaluating risk factor data. In studies where the examiner is aware of
patients' risk factor status, the diagnosis of DDH may be overestimated due to
more careful or thorough examinations or more aggressive followup and
reexamination in infants with known risk factors. Moreover, in retrospective studies researchers
apply criteria to improve the reliability of their record review; this
approach, while necessary to conduct such a study, reduces the influence of an
equivocal or inaccurate history. A
predictor such as family history may be less reliable in a prospective,
practice-based study than in case control studies which exclude patients
(charts) that have equivocal or incomplete information about it. Finally, investigators' awareness of the
subjects' final diagnoses could influence the way risk factor information is
handled in retrospective studies.
Key Question 3. What Is the Accuracy of Screening Tests for
DDH, and Does Screening for DDH Lead to Early Identification of Children With
DDH?
The most common
methods of screening for DDH involve the physical examination of the hips and
lower extremities. Provocative testing includes the Barlow and Ortolani
maneuvers, which involve adduction of the flexed hip with gentle posterior
force, and abduction of the flexed hip with gentle anterior force,
respectively. The Barlow test attempts
to identify a dislocatable hip,10,37 while the Ortolani exam
attempts to relocate a dislocated hip.38 Due to variations in
technique, the Barlow and Ortolani tests have been shown to have a high degree
of operator dependence.39 In addition, confusion about
the identification of a "click" versus a "clunk" on these tests, and the
significance of each of these findings, can lead to disparate conclusions
between examiners. Additional findings
sometimes reported on clinical examinations for DDH in infants include
asymmetry of gluteal and thigh skin folds, discrepant leg lengths, and
diminished range of motion (particularly abduction) in an affected hip.4
To measure
sensitivity of a test directly in a prospective study, infants who had negative
initial screening tests must be followed and examined at older ages to identify
false negative initial test results.
Measuring sensitivity is also difficult because results of the Barlow
test can be classified into several levels, rather than just two ("positive" or
"negative"). Conversely, measuring
specificity and false positives is difficult because, in most studies, all
infants who have a positive screening test are treated with a nonsurgical
intervention; the great majority improve, and it is impossible to say how many
of them "responded" and how many of them did not have DDH in the first
place.
Assessing the
impact of a screening program on the rate of late diagnosis of DDH provides an
indirect measure of sensitivity. It is
apparent that screening tests performed soon after birth identify some
individuals at risk of developing DDH sooner than they would otherwise be
identified: most children would otherwise not come to medical attention until
they present with crawling or gait delays or disturbances. However, it is difficult to quantify the
impact of screening tests on the incidence of late diagnosis with the available
literature. Studies of the impact of
screening programs on the frequency of late diagnosis have had mixed results.16-18,21,25,40-52 Most of these studies report the experience
of a screening program in a defined geographic or hospital service area over
many years. The comparisons are
ecological, and these studies have the same methodological problems as those
that examined the effect of screening on rates of surgical treatment (discussed
above under KQ1). Some studies in this group reported that, after a screening
program was adopted, late diagnosis was very rare, while others report that
screening had no effect on the rate of late diagnosis, and that unexplained
fluctuations in late diagnosis rates were observed from year to year within the
post-screening era (Figure 2).16-18,20-22,29,33,40,45,50,53
The lack of a
practical confirmatory "gold standard" diagnostic test for DDH makes it
difficult to assess—or define—false positives. Various reference standards
appear in the literature, including positive clinical examination, ultrasound
confirmation, radiographic confirmation, arthrography, persistence of abnormal
findings on serial exam or ultrasound over weeks to months, diagnosis by an
orthopedist, and use of treatment. The most meaningful reference standard
defines "true" DDH as "those neonatal hips, which, if left untreated, would
develop any kind of dysplasia and, therefore, are to be included in the
determination of DDH incidence."2
To apply this
standard, a cohort study must follow infants for a long enough period without
applying any treatment, in order to determine whether or not the abnormal
findings persist and lead to clinical problems.
In one good-quality prospective cohort study that followed untreated
infants for 2 to 6 weeks, approximately 9 of 10 infants with initially abnormal
ultrasound examinations reverted to normal.2 Similarly, by 2-4 weeks of age, over 60% of
infants identified at birth by abnormal clinical examination (Barlow or
Ortolani tests) have reverted to normal when judged by repeat clinical examination
or by ultrasound examination.10,37,54 Longer prospective studies28,53-59 and a systematic review of
observational studies of ultrasound screening60 demonstrate that in untreated
hips, mild dysplasia without frank instability usually (consistently over 90%)
resolves spontaneously between 6 weeks and 6 months.
The clinical exam
approach to diagnosis for DDH shifts over time. Barlow and Ortolani tests
become less sensitive as infants age, due to factors including increased
strength, bulk, and size.3,4 In their place, assessment of hip abduction
becomes the preferred examination, because infants with dislocated hips have
increased contractures of the hip adductors.4 In general, the specificity
of examination improves as infants age, because the hips of the newborn infant
are more likely to exhibit transient and clinically insignificant laxity than
they will subsequently.37 Two recent studies provide indirect insight
into the changing signs of DDH as the infant ages. In a study of 1071 referred infants at one
center, only 2 of 34 (6%) hips in patients with positive Barlow or Ortolani
tests, confirmed as dislocatable by ultrasound, had any limitation in abduction
at 1-2 weeks of age, suggesting that limited abduction has poor sensitivity in
newborns.61 Specificity of limited hip abduction in
newborns was also poor: among 203 infants aged 1-2 weeks
with limited abduction, <20% had abnormalities on
ultrasound. These findings contrasted
with those in older children: of the eight
patients who presented after six months of age with dislocatable hips, hip
abduction was limited in 7 (87.5%). In the
second study, a prospective observational study limited to infants greater than
3 months of age (N=683), unilateral limited hip abduction had a sensitivity of
69% (156/226), and a specificity of 54% (247/457).62 The reference standard in this study was any
ultrasound abnormality; among the subset of subluxable and dislocatable hips,
sensitivity of limited hip abduction was >82%. Of the 136 patients with limited abduction
and normal ultrasound findings at the initial exam, none showed exam or gait
abnormalities at 5 years of age. Though
not conclusive, these studies suggest that hip abduction is a relatively
insensitive and nonspecific marker of DDH in early infancy, but becomes more
accurate after 3-6 months of age and with more severely affected hips.
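The sensitivity and specificity quoted for the second study can be reproduced from the reported fractions, and likelihood ratios give a compact sense of how little a finding of limited abduction shifts the probability of an ultrasound abnormality. The sketch below uses only the fractions stated in the text (156/226 and 247/457); the likelihood ratios are standard derived quantities that the source does not report, and the function name is chosen here for illustration.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Return (LR+, LR-) for a binary test with the given accuracy."""
    if not (0.0 < specificity <= 1.0) or not (0.0 <= sensitivity <= 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1]")
    if specificity == 1.0:
        raise ValueError("LR+ is undefined when specificity is exactly 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Fractions as reported for infants older than 3 months (N=683, reference 62).
sens = 156 / 226   # limited abduction among hips with any ultrasound abnormality
spec = 247 / 457   # normal abduction among hips with normal ultrasound
lr_pos, lr_neg = likelihood_ratios(sens, spec)

print(f"Sensitivity: {sens:.0%}")
print(f"Specificity: {spec:.0%}")
print(f"LR+: {lr_pos:.2f}")
print(f"LR-: {lr_neg:.2f}")
```

A positive likelihood ratio close to 1 means the finding changes the odds of disease only modestly, which matches the conclusion that limited abduction is a weak marker against a reference standard of any ultrasound abnormality.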
Additional
physical examination findings sometimes linked to DDH include asymmetrical
gluteal and thigh skinfolds, and leg length discrepancy. No studies from the past 40 years were
identified which assessed the value of these findings in diagnosing DDH. In 1962, Barlow pointed out the lack of
utility of asymmetric skin folds due to their poor sensitivity and specificity,10 and in 1961 Palmén studied
500 random newborns, finding that 27% had no thigh skinfolds, 40% were
symmetrical, and 33% asymmetrical; 4 of these 500 babies had an abnormal
provocative test of stability, of which 2 had symmetrical skinfolds.63 Based on this scarce and unsupportive literature,
it is difficult to conclude that these additional findings on exam are useful.
The degree of
training and experience with the clinical examination of the hip in infants has
been shown to be a strong predictor of the test characteristics. Pediatricians have been shown to have a case
identification rate of 8/1000, whereas orthopedists identify approximately
11/1000.1 Two studies show that having duplicate
blinded examinations by a pediatrician and an orthopedist improves the
sensitivity, specificity, and predictive value of clinical exam screening.64,65 Additional studies show that well-trained
non-physicians, including physiotherapists and neonatal nurse practitioners,
perform at least as well as physician examiners, and better than physician
trainees.66-68 In one single site
longitudinal study, as the number of pediatricians involved in screening
infants increased (holding steady the overall number of newborns screened), a
greater number of cases of DDH were missed despite an increased rate of
suspected cases identified.69 In other words, both sensitivity and
specificity suffered when there was less centralized oversight of the newborn
screening program and when fewer infants were screened, on average, by each
pediatrician.
Studies comparing
pediatricians with orthopedic surgeons often employ a study design in which the
orthopedist reviews a subset of hips found to be positive or questionable by a
previous examiner. This second exam may
happen days after the initial examination.
Also, the surgeons often have at their disposal the results of ultrasonography,
and their clinical examination is not blinded to the ultrasound findings. Not
surprisingly, such studies show a higher sensitivity and specificity of
clinical examination in the hands of the specialist.