Data Extraction
The Evidence Review Team (ERT) designed data extraction forms to capture information on various aspects of the primary studies. Data fields for each study included study setting, funding source, eligibility criteria, study design characteristics, patient demographics, co-morbidities, number of subjects, description of surgical and anesthetic techniques, description of relevant risk factors or interventions, description of outcomes, statistical methods, results, study quality, applicability (see below), and free text fields for comments and assessment of biases. Work Group members were apprised of the entire data extraction process. They also reviewed the data extraction form.
Data from each article were extracted by one member of the ERT. A second member verified each extraction, and discrepancies were resolved through discussion. Work Group members reviewed the results of the data extraction.
Outcomes of interest included symptomatic, clinically documented pulmonary embolism (PE) including treatment for PE, clinically documented PE-related death, all-cause death, other PE-related clinical events, rehospitalization due to venous thromboembolism or bleeding, major infection (not including superficial infections), major bleeding (as defined by authors, but generally including life threatening, intraocular, intracerebral, a bleed requiring more than a specified number of transfusions, extending the length of hospital stay, or resulting in a return to the operating room), and bleeding-related death.
Summary Tables
Summary tables describe the studies according to four dimensions: study size and important characteristics, results, methodological quality, and applicability. The ERT generated summary tables using data from extraction forms and/or the articles. Work Group members reviewed the summary tables.
Grading of Individual Studies
Methodological Quality Assessment
Methodological quality (or internal validity) refers to the design, conduct, and reporting of the clinical study. Many methods have been devised to measure study quality, and there remains controversy regarding how different aspects of study design and quality may impact study results. The ERT used a three-category grading system (A, B, C) to denote the methodological quality of each study. This system has been used for a range of systematic reviews and clinical practice guidelines, and it defines a generic grading system that is applicable to different study designs. The quality rating was based primarily on the study design and on the quality of reporting as it pertained specifically to PE, major bleeding, and death.
A Good quality: Likely to have the least bias; results are considered valid. These studies have a clear protocol; a clear description of the population, setting, and interventions; appropriate measurement and reporting of rates of PE or death due to PE; appropriate statistical and analytic methods; no obvious reporting errors; less than 20% dropout with a clear explanation of dropouts; and no obvious bias.
B Fair quality: Susceptible to some bias, but not sufficient to invalidate the results. These studies do not meet all the criteria for good quality because they have some deficiencies, but none likely to cause major bias. The studies may be missing information, making it difficult to assess limitations and potential problems.
C Poor quality: Significant bias that may invalidate the results. These studies have serious errors in design, analysis, or reporting; large amounts of missing information; or discrepancies in reporting. Studies that reported poorly defined results for a specific outcome were downgraded to poor for that outcome (e.g., if it was unclear whether all the PEs reported were confirmed).
Applicability Assessment
Applicability addresses the relevance of a given study to a population of interest. Every study applies certain eligibility criteria when selecting study subjects. Most of these criteria are explicitly stated (e.g., disease status, age, comorbidities). Others may be implicit or due to unintentional biases, such as those related to location (e.g., multicenter vs. single center; urban vs. rural setting), intervention (e.g., an outmoded dose), factors resulting in study withdrawals, or issues related to compliance with stated criteria. The applicability of a study is dictated by the key questions, populations, and interventions of interest to these specific guidelines (as opposed to those of interest to the original investigators).
The Work Group determined that short duration studies (follow-up duration of less than 6 weeks) were of limited applicability for estimating rates of PE and total death after arthroplasty. It was also the opinion of the Work Group that surgical techniques and post-operative management had changed significantly over time, and that the care of patients enrolled prior to 1996 was therefore sufficiently different from current practice. The Work Group reached consensus to exclude these patients from the review.
To address these issues, we categorized studies within a target population into three categories of applicability, defined as follows:
Wide: Sample is representative of the target population. It should be sufficiently large to cover a range of patient ages, other demographic features, and reasons for arthroplasty, with minimal exclusions based on age, comorbidities, or underlying risk of bleeding or venous thromboembolism. In addition, the intervention should be comparable to currently used interventions, including dose and duration. Complete reporting of baseline characteristics. Follow-up duration of at least 6 weeks with respect to the PE-related outcomes and total death.
Moderate: Sample is representative of a relevant sub-group of the target population, but not the entire population. Limitations include such factors as exclusion of patients based on medical or surgical history, or narrow age range. Adequate reporting of baseline characteristics. Follow-up duration for at least 6 weeks with respect to the PE-related outcomes and total death.
Narrow: Sample is representative of a narrow subgroup of subjects only, and is of limited applicability to other subgroups. Multiple deficiencies regarding applicability or poor reporting of eligibility criteria and/or baseline characteristics. Follow-up duration may have been less than 6 weeks. Studies with less than 6 weeks follow-up may have been graded Narrow for PE, PE-related death, and total death, but Moderate or Wide for bleeding-related outcomes.
Statistical Methods
The primary units of analysis were rates of clinical outcomes. For the few relevant randomized trials with two interventions of interest or an intervention and a no intervention control, the odds ratios for the clinical outcomes were also analyzed. Rates of clinical outcomes of interest were calculated for each study based on the number of reported events and the best estimate of the denominator (the number of evaluated patients). For each event rate, a 95% confidence interval of the rate was calculated using an exact confidence interval approach.
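The exact confidence interval approach referred to above is conventionally the Clopper-Pearson interval, which inverts the binomial tail probabilities rather than relying on a normal approximation. A minimal Python sketch (the function names are ours, not the report's; assuming the Clopper-Pearson form of the exact interval) computes it by bisection on the binomial CDF:

```python
from math import comb

def binom_cdf(k, n, p):
    # P(X <= k) for X ~ Binomial(n, p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson exact (1 - alpha) CI for an event rate x/n."""
    def bisect(f, lo, hi, tol=1e-10):
        # f is decreasing; find its root on [lo, hi]
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # lower bound: p with P(X >= x | p) = alpha/2
    lower = 0.0 if x == 0 else bisect(
        lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)), 0.0, 1.0)
    # upper bound: p with P(X <= x | p) = alpha/2
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper
```

For a study with 0 events in 100 patients, this yields an upper 95% bound of about 3.6%, illustrating why zero-event studies still carry non-trivial uncertainty.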
Several of the studies reported only event rates after hospitalization. These studies randomized patients at discharge and specifically evaluated post-hospitalization interventions. Since these studies excluded patients who had thromboembolic events—including PE—during hospitalization, they were not included in our calculations of rates of events after arthroplasty.
However, these studies were fully evaluated and reviewed by the Work Group members. Because the event rates for most outcomes of interest were very small (less than 1%) and no individual study included enough patients to provide precise estimates of the outcomes of interest, the estimated event rates were not normally distributed across studies. In this situation there are no adequate (i.e., reliable) methods for meta-analyzing rates. Nevertheless, to provide the best available estimates of event rates for the different interventions, four different statistical approaches were used to pool the data.
Medians. For each analysis in which there were at least 3 cohorts of patients, the median value across cohorts was documented. The size of the cohorts and the confidence intervals of the study rates were not considered.
Simple Pooling. For each analysis, the total number of events was divided by the total number of patients across studies. This is equivalent to a fixed effects meta-analysis weighted by sample size (or a simple average). The confidence interval for the pooled estimate was calculated using the exact confidence interval approach.
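As a quick illustration of the simple pooling step (a sketch using our own function name, not taken from the report), dividing total events by total patients is algebraically identical to averaging the study rates with weights proportional to sample size:

```python
def simple_pool(counts):
    """Pool event rates across studies: total events / total patients.

    counts: list of (events, n) pairs, one per study. Equivalent to a
    fixed-effects estimate with weights proportional to sample size.
    """
    total_events = sum(x for x, _ in counts)
    total_n = sum(n for _, n in counts)
    return total_events / total_n
```

For example, pooling studies with 2/100, 0/50, and 3/150 events gives 5/300, the same value as the sample-size-weighted average of the three study rates.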
Random Effects Model Meta-Analysis of Logit of Event Rate. The logit [ln(rate/(1-rate))] for each study was calculated. When the event rate was zero, 0.5 was added to all 4 cells of the 2x2 table. The logit values were then meta-analyzed using standard DerSimonian and Laird random effects model meta-analysis. However, a large number of studies had zero event rates, and because of the relatively small sample sizes, adding 0.5 to cells frequently produced anomalous results. Use of smaller "fudge factors" (Woolf's corrections) sometimes resulted in exceedingly large confidence intervals. Thus, when summary estimates of rates were outside the range of estimates among the constituent studies, these estimates were discarded.
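The logit transform and DerSimonian-Laird weighting described above can be sketched as follows. This is a minimal illustration with our own function name, assuming the standard method-of-moments estimator of the between-study variance; it applies the 0.5 continuity correction to both cells of a study's event/non-event table when the event count is zero (or equals n):

```python
from math import log, exp

def dl_pooled_rate(counts):
    """DerSimonian-Laird random-effects pooling of logit event rates.

    counts: list of (events, n) pairs, one per study.
    """
    ys, vs = [], []
    for x, n in counts:
        if x == 0 or x == n:
            x, n = x + 0.5, n + 1.0   # add 0.5 to both cells of the table
        ys.append(log(x / (n - x)))           # logit of the event rate
        vs.append(1.0 / x + 1.0 / (n - x))    # approx. variance of the logit
    w = [1.0 / v for v in vs]
    ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, ys))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c)  # between-study variance
    wstar = [1.0 / (v + tau2) for v in vs]
    pooled_logit = sum(wi * yi for wi, yi in zip(wstar, ys)) / sum(wstar)
    return 1.0 / (1.0 + exp(-pooled_logit))   # back-transform to a rate
```

The anomaly noted in the text is visible here: for a small study, the continuity correction shifts the logit substantially, so zero-heavy data sets can produce summary rates outside the range of the constituent studies.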
Bayesian Meta-analysis of Proportions. The event rates in each study were modeled as binomial distributions. Prior probability information was elicited as relatively non-informative beta distributions. Details on the parameterization of the Bayesian models and the specifications of the priors per analysis are available upon request. As specified, the prior distributions are incompatible with a zero event prevalence; therefore we did not perform these analyses when all numerators were zero across studies.
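Since the report's exact parameterization and priors are stated to be available only on request, the following is only an illustrative sketch (function name and prior are ours): under the simplifying assumption of a single event rate common to all studies, a beta prior is conjugate to the pooled binomial likelihood, so the posterior is available in closed form.

```python
def beta_binomial_posterior(counts, a=0.5, b=0.5):
    """Conjugate posterior for a common event rate across studies.

    counts: list of (events, n) pairs. Assumes one shared rate and a
    Beta(a, b) prior; Beta(0.5, 0.5) (Jeffreys) is an illustrative
    non-informative choice, not necessarily the report's prior.
    Returns (posterior alpha, posterior beta, posterior mean).
    """
    x = sum(e for e, _ in counts)
    n = sum(m for _, m in counts)
    post_a, post_b = a + x, b + n - x
    return post_a, post_b, post_a / (post_a + post_b)
```

Note that any Beta(a, b) prior with a > 0 places zero probability mass exactly at a prevalence of 0, which is consistent with the text's remark that these analyses were not performed when all numerators were zero.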
Individual study estimates and all four sets of summary estimates were graphed to highlight the relative rates across interventions and across outcomes. Because of the low event rates of outcomes of interest and the small sample sizes of the randomized trials (frequently resulting in 0 events in both arms), and because only one or two randomized trials were comparable in interventions, controls, and surgeries, odds ratios of events were not calculated.
For all analyses, studies that reported only outpatient events and failed to adequately describe events during hospitalization were excluded. The Work Group, however, did review these studies.