Ten articles were abstracted in parallel, yielding similar results and demonstrating that the procedure was reliable. Data were abstracted on the study as a whole, its design and quality, the patient groups, and the outcomes and their values. Efforts were made to standardize the vocabulary used to record methodology and results, resulting in a controlled vocabulary of 1,262 terms.
The definitions of the study designs were as follows: uncontrolled experiment, an unspecified group of participants received an intervention and follow-up data were provided; case-series cross section, data were provided on a single group of participants; case-series follow-up, baseline and follow-up data were provided on a single group of participants; cohort cross section, a single group of participants was divided into 2 or more groups on the basis of a specified feature (e.g., history or laboratory tests) and described; cohort follow-up, baseline and follow-up data were provided on a single group of participants who were divided into 2 or more groups; case control, 2 or more groups were assembled and retrospective or current data were provided; and randomized controlled trial (RCT), participants were randomly assigned to interventions and follow-up data were provided. Because the questions were all comparative, only studies with 2 or more arms were included in this review (thereby excluding case series).
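As a rough illustration (not part of the guideline), the taxonomy above can be sketched as a rule-based classifier over a few abstracted features. The function and feature names are hypothetical, and the single-group "uncontrolled experiment" category is folded into the case-series branches for simplicity.

```python
# Hypothetical sketch of the study-design taxonomy as a rule-based
# classifier. Feature names are illustrative; "uncontrolled experiment"
# (a single intervention group with follow-up) is folded into the
# case-series branches for simplicity.

def classify_design(randomized: bool, n_groups: int,
                    has_followup: bool, retrospective: bool) -> str:
    """Map a few abstracted study features to a design label."""
    if randomized:
        return "randomized controlled trial"
    if retrospective:
        # 2 or more groups assembled, retrospective or current data
        return "case control"
    if n_groups >= 2:
        return "cohort follow-up" if has_followup else "cohort cross section"
    # single-group designs (excluded from this comparative review)
    return "case-series follow-up" if has_followup else "case-series cross section"
```

Under this sketch, only studies falling into the 2-or-more-arm branches would enter the comparative review.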
Table 3 of the technical report (see "Availability of Companion Documents" field) shows the types of articles and studies as processed in the review. Most articles were rejected on the basis of title review. Some articles were rejected at more than 1 point in the process, as indicated by the overlaps. Ninety-four articles were left for review. Sixty-four articles underwent full data abstraction, which included separating an article into 1 or more studies when, for example, a baseline case-control study was reported together with a cohort follow-up; hence, the 64 articles yielded 83 studies.
An article might have more than 1 study in it if, for instance, there was an initial cohort cross-section with a subsequent cohort follow-up reported in the same article. Of the 83 studies for which methodology was reviewed, 46 were case control, 20 were cohort cross section, 10 were cohort follow-up, and 7 were RCTs. A recent systematic review of treatments in recurrent abdominal pain identified the same RCTs, providing validation for this approach.
Data abstracted for each study as a whole included study city; study country; single or multiple site; site type (community, physician office, academic pediatric setting, gastroenterologist office); funding source; age range, mean, and standard deviation (SD); sample size; number of groups; number of outcomes; and number of time points.
A methodology review was performed for each study, based on the Newcastle-Ottawa Scale for assessing the quality of nonrandomized studies in meta-analyses. Inclusion and exclusion criteria were noted by using the controlled vocabulary. The evidence was characterized in terms of outcome type (based on the controlled vocabulary), outcome name (specific to the study), outcome units (for continuous outcomes), outcome time point (baseline or later), method (how the outcome was assessed), sample size at outset, and sample size at termination (a difference from the sample size at outset indicated loss to follow-up). Data for continuous outcomes (in which a quantity was measured within a participant) were usually characterized by the mean and SD. For categorical data (in which participants were counted once), the abstracted items were the category (labeled by using the controlled vocabulary), the test statistic name, the P value (and comments), and the data source (page/figure/table).
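The abstraction items listed above can be pictured as one record per outcome. The following dataclass is a hypothetical sketch; the field names are assumptions that mirror the items in the text, not the report's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative record for one abstracted outcome. Field names are
# assumptions mirroring the items listed in the text, not the
# report's actual data model.

@dataclass
class OutcomeRecord:
    outcome_type: str            # from the controlled vocabulary
    outcome_name: str            # specific to the study
    units: Optional[str]         # continuous outcomes only
    time_point: str              # "baseline" or later
    method: str                  # how the outcome was assessed
    n_outset: int                # sample size at outset
    n_termination: int           # sample size at termination

    @property
    def lost_to_followup(self) -> int:
        # a difference between outset and termination indicates attrition
        return self.n_outset - self.n_termination
```

A continuous outcome would additionally carry a mean and SD; a categorical one, its category label, test statistic name, P value, and data source.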
Figure 1 of the technical report summarizes the geographic distribution of the studies for which the country was provided. Of the 89 articles for which these data were provided, 62 (70%) described research performed at a single site, 30 (34%) were based on research at an academic pediatric center, 27 (31%) at a gastroenterology clinic, 17 (19%) at the community level, and 9 (10%) at general pediatric offices.
The guideline developers calculated a quality score for each study as the ratio of the number of quality items attained to the total number of items. The average, SD, and confidence intervals are given for each design in Table 4 of the technical report. Although the quality scores appeared to increase for the more preferred designs, the confidence intervals all overlapped.
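A minimal sketch of the quality-score calculation described above, assuming a simple proportion per study and a normal-approximation 95% confidence interval for the per-design average (the report's exact CI method is not specified here):

```python
import math

def quality_score(items_attained: int, items_total: int) -> float:
    """Quality score = proportion of quality items attained."""
    return items_attained / items_total

def mean_sd_ci(scores, z=1.96):
    """Mean, sample SD, and a normal-approximation 95% CI for a list
    of per-study quality scores. The CI method is an assumption; the
    report does not state which interval it used."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))
    half = z * sd / math.sqrt(n)
    return mean, sd, (mean - half, mean + half)
```

Overlapping intervals across designs, as in Table 4, mean the apparent ordering of average scores is not statistically distinguishable.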
Evidence tables for each of the 8 questions were generated across studies and grouped according to arm type, method, or outcome, as pertinent to the question. There were 685 outcomes across the studies, categorized as history outcomes (550 [80%]), tissue/physiologic outcomes (115 [17%]), physical examination outcomes (15 [2%]), and use of medications (5 [1%]). Among the 685 outcomes, 161 of the P values (23%) were not statistically significant, and an additional 316 (46%) were not provided by the investigators. Each subcommittee member took responsibility for 1 or more questions, reviewed the evidence tables and the primary articles, and generated a summary of the research. The scale for rating evidence is described in "Rating Scheme for the Strength of the Evidence."
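The category breakdown above amounts to a tally with percentages of the grand total. A minimal sketch (the `tally` helper is hypothetical, not from the report):

```python
from collections import Counter

def tally(categories):
    """Count items per category and report each count with its
    percentage of the total, rounded to the nearest whole percent
    (illustrative helper, not from the report)."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: (n, round(100 * n / total)) for cat, n in counts.items()}
```

Applied to the 685 outcomes, this reproduces the reported 80/17/2/1% split.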
The reviews were discussed by the subcommittee, and the nominal group technique was used to achieve consensus.