
Short-Term Energy Outlook

Model Documentation Statistical Overview




Introduction

The STIFS model consists of over 300 equations (excluding equations used to convert standard units into energy equivalents such as British thermal units (Btu)), of which just over 100 are estimated. The estimated equations are regression equations that together form a system of interrelated forecasting equations. The functional form and the estimation technique are generally selected on an equation-by-equation basis. The general method of estimation is ordinary least squares. Some equations incorporate a correction for autocorrelation of the error term.


Data Sources

The historical energy data used to estimate the model come primarily from the IMDS electronic database. IMDS merges data regularly reported in several EIA publications: Quarterly Coal Report, Petroleum Supply Monthly, Petroleum Marketing Monthly, Electric Power Monthly, Natural Gas Monthly, and Monthly Energy Review. Because of data limitations there are inconsistencies in the level of disaggregation of each type of fuel. For example, electricity and natural gas demands are represented by market sector, but petroleum products are generally represented only as national totals or for a combination of sectors (distillate and residual fuel oil are exceptions). Market-level data are available for the regulated industries (electricity and natural gas) while product-level data are available for the petroleum product markets, particularly for data frequencies higher than annual.

These energy price and volume data are supplemented by data from outside sources; the most common are listed below.

Most of the data sources provide monthly data and are used directly. Quarterly data are interpolated into monthly series.


Variable Naming Convention

Over 600 variables are used in the STIFS model for estimation, simulation, and report writing. Most of these variables follow this naming convention:

Characters   Positions   Identity
MG           1 and 2     Type of energy
TC           3 and 4     Energy activity or consumption end-use sector
P            5           Type of data
US           6 and 7     Geographic area or special equation factor
A            8           Data treatment

In this example, MGTCPUSA is the identifying code for deseasonalized total consumption of finished motor gasoline in the United States, measured in standardized physical units.
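The positional convention above can be illustrated with a short parser. This is a minimal sketch, not part of STIFS: the names StifsName and parse_stifs_name are hypothetical, and the code assumes that a 7-character code (such as DSTCPUS) simply omits the data-treatment character.

```python
from typing import NamedTuple


class StifsName(NamedTuple):
    """Fields of a STIFS variable code (hypothetical helper type)."""
    energy: str      # positions 1 and 2: type of energy
    activity: str    # positions 3 and 4: energy activity or end-use sector
    data_type: str   # position 5: type of data
    area: str        # positions 6 and 7: geographic area or equation factor
    treatment: str   # position 8: data treatment ("" when absent)


def parse_stifs_name(code: str) -> StifsName:
    """Split a 7- or 8-character code by fixed character positions."""
    if len(code) not in (7, 8):
        raise ValueError(f"expected 7 or 8 characters, got {code!r}")
    # code[7:8] is the empty string for a 7-character code.
    return StifsName(code[0:2], code[2:4], code[4], code[5:7], code[7:8])


print(parse_stifs_name("MGTCPUSA"))
```

The parser only splits the code by position; looking up each field in the category lists that follow is left to the reader.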

Type of energy categories:
AB = aviation gasoline blending components
CC = coal coke
CL = coal
CO = crude oil, including lease condensate
CP = crude oil and pentanes plus
CU = crude oil and unfinished oils
DF = distillate fuel, including diesel fuel and heating oil
DS = diesel fuel
D2 = heating oil
EL = electricity
ES = electricity sales
ET = ethane
FE = petrochemical feedstocks
GE = geothermal energy
HY = hydroelectric power
JF = jet fuel
JK = jet fuel, kerosene-type
LG = liquefied petroleum gases
LX = liquefied petroleum gases, excluding ethane
MB = motor gasoline blending components
MG = finished motor gasoline
MI = miscellaneous petroleum products
NG = natural gas
NL = natural gas liquids
NU = nuclear power
OH = other hydrocarbons/alcohol
PA = all petroleum products
PC = petroleum coke
PP = pentanes plus
PR = propane
PS = other petroleum products
RF = residual fuel
RS = raw steel
UO = unfinished oils
WN = combined wind, photovoltaic, and solar thermal energy

Energy activity or consumption end-use sectors:
AC = transportation sector consumption
CA = capacity
CC = commercial sector consumption
CM = commercial sector consumption
EO = electricity production
ES = sales to end-users
EU = electricity sector consumption
EX = gross export
FC = synfuels consumption
FP = field production
HC = residential/commercial sector consumption
IN or IC = industrial sector consumption
IM = gross import
KC = coke oven consumption
LO = losses
NI = net import
NS = nonutility supply
PR = production
PS = petroleum product stocks
RC = residential sector consumption
RI = refinery input
RO = refinery output
RT = retail sales
TC = total consumption of all sectors
TX = Federal, state, and local taxes
UN = unaccounted for
WH = wholesale sales

Type of data:
D = price per million Btu
K = factor for converting data from kilowatthours to Btu
M = data in alternative physical units
P = data in standardized physical units
S = share or ratio expressed as a fraction
U = price per standardized physical unit
Z = factor for converting data from barrels to Btu

The physical units for data series in the STIFS model, represented by a "P" in the fifth character, include some of the following:

Conversion factors, represented by a "K" in the fifth character, are applied to the physical unit data to convert the data to Btu's, a common unit for all forms of energy.

Geographic identification or special equation factor:
AD = "Add" factor
AK = Alaska
MU = "Multiply" factor
48 = The contiguous 48 states and the District of Columbia
US = United States

Data treatment:
A = deseasonalized data series
S = seasonal factors derived from Census X-11 method
B,Q,X,Z = temporary variables


Mathematical Specifications

This section summarizes the characteristics of the equations that appear in the STIFS model.

Regression Equations and Estimated Coefficients

Most equations are estimated using ordinary least squares (OLS); equations with autoregressive error corrections are estimated using maximum likelihood (ML). In all equations, each estimated coefficient appears before its associated right-hand-side variable. A standard naming convention for coefficients is used in most equations. The first three or four letters of a coefficient name correspond to the first three or four letters of the dependent (endogenous) variable, followed by an underscore and then two letters from the associated independent (right-hand-side) variable. For example, for nonutility distillate fuel demand:

DSTCPUS(t) = DSTC_01 + DSTC_AC * DFACPUS(t)

The coefficient DSTC_01 is the estimated equation intercept and DSTC_AC is the estimated coefficient associated with distillate demand in the transportation sector, DFACPUS.
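As an illustration of the naming convention, the sketch below fits a one-regressor equation by ordinary least squares and labels the results in the style of DSTC_01 and DSTC_AC. The helper names are hypothetical, the data in the usage line are invented, and two assumptions are made that the text does not state: the two letters are taken from positions 3 and 4 of the regressor name, and the intercept always carries the suffix "01".

```python
def fit_simple_ols(y, x):
    """Closed-form OLS for y = intercept + slope * x."""
    n = len(y)
    if n != len(x) or n < 2:
        raise ValueError("y and x must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def coefficient_name(dependent, regressor=None):
    """Build a coefficient label (assumed rule: characters 3-4 of the regressor)."""
    suffix = "01" if regressor is None else regressor[2:4]
    return f"{dependent[:4]}_{suffix}"


def fit_named(dependent, y, regressor, x):
    """Return the fitted coefficients keyed by their convention-style names."""
    intercept, slope = fit_simple_ols(y, x)
    return {
        coefficient_name(dependent): intercept,
        coefficient_name(dependent, regressor): slope,
    }


# Invented data lying exactly on y = 2 + 3x.
print(fit_named("DSTCPUS", [5.0, 8.0, 11.0, 14.0], "DFACPUS", [1.0, 2.0, 3.0, 4.0]))
```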

Autocorrelation Correction

When time series data are used in regression analysis, often the error term is not independent through time. If the error term is autocorrelated, the efficiency of ordinary least-squares parameter estimates is adversely affected and standard error estimates are biased. The Durbin-Watson statistic is used to test for the presence of first-order autocorrelation in OLS residuals and is reported in the regression results. For equations in which a lagged dependent variable is present, the Durbin h statistic is reported.
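For reference, both test statistics can be computed directly from regression output. The sketch below uses the standard textbook formulas rather than anything specific to STIFS; the function names are hypothetical. The Durbin h statistic uses the approximation rho = 1 - d/2 and requires the estimated variance of the coefficient on the lagged dependent variable.

```python
import math


def durbin_watson(residuals):
    """d = sum of squared first differences of residuals / sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    return numerator / denominator


def durbin_h(dw, n_obs, var_lag_coef):
    """h = (1 - d/2) * sqrt(n / (1 - n * V)), V = variance of the lagged-y coefficient."""
    denominator = 1.0 - n_obs * var_lag_coef
    if denominator <= 0:
        raise ValueError("h is undefined when n * V >= 1")
    return (1.0 - dw / 2.0) * math.sqrt(n_obs / denominator)


# Residuals that alternate in sign give d well above 2 (negative autocorrelation).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
```

A value of d near 2 indicates no first-order autocorrelation; values toward 0 indicate positive and values toward 4 indicate negative autocorrelation.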

Autocorrelation correction involves estimating the parameters of a linear model whose error term is assumed to be an autoregressive process of a given order p, denoted AR(p). The model for an autoregressive process is of the form:

y(t) = b0 + b1 x(t) + u(t)

where,
u(t) = e(t) - a1 u(t-1) -...- ap u(t-p)
e(t) = normally and independently distributed white noise disturbance

The autoregression coefficients, ai, are designated in the regression estimation results as the name of the endogenous variable followed by "_Lp", where p refers to the specified order (usually 1). For example, the nonutility distillate fuel demand is estimated with a first-order autoregressive error term:

DSTCPUS(t) = DSTC_01 + DSTC_AC * DFACPUS(t) + u(t)

where
u(t) = e(t) - DSTCPUS_L1 * u(t-1)
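The sign convention in the error equation above is easy to misread, so a small sketch may help. It builds the error series u(t) from a white noise series e(t) using u(t) = e(t) - a1 u(t-1) - ... - ap u(t-p), and inverts the relation. The function names are hypothetical, and setting pre-sample values of u to zero is an assumption made only for illustration.

```python
def ar_errors(innovations, ar_coefs):
    """Build u(t) = e(t) - a1*u(t-1) - ... - ap*u(t-p); pre-sample u assumed 0."""
    u = []
    for t, e in enumerate(innovations):
        value = e
        for lag, a in enumerate(ar_coefs, start=1):
            if t - lag >= 0:
                value -= a * u[t - lag]
        u.append(value)
    return u


def recover_innovations(u, ar_coefs):
    """Invert the relation: e(t) = u(t) + a1*u(t-1) + ... + ap*u(t-p)."""
    e = []
    for t in range(len(u)):
        value = u[t]
        for lag, a in enumerate(ar_coefs, start=1):
            if t - lag >= 0:
                value += a * u[t - lag]
        e.append(value)
    return e


# A single unit shock with a1 = -0.5 decays geometrically in u(t).
print(ar_errors([1.0, 0.0, 0.0], [-0.5]))
```

Under this convention a negative coefficient (for example, DSTCPUS_L1 = -0.5) corresponds to positive serial correlation in u(t), because the coefficient enters with a minus sign.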

Distributed Lag Terms

Some equations explain the current values of endogenous variables as functions of past values of exogenous variables using a polynomial distributed lag structure. For a regression equation in which the effect of a right hand side variable, x(t), has a polynomial distributed lag structure of the form:

y(t) = b0 + b1 x(t) + b2 x(t-1) + b3 x(t-2) + ... + b(k+1) x(t-k)

then,
b(i+1) = a0 * i^0 + a1 * i^1 + ... + an * i^n
that is, the sum of aj * i^j over j = 0 to n, n = degree of polynomial used
i = 0 to k, k = number of lags

The polynomial distributed lag is identified in the text as:

distlag( exogenous, degree=n, lags=k)

For example, air travel capacity (equation for RMZT) involves a distributed lag on aircraft utilization (RMZZ):

RMZT(t) = ...+ distlag( RMZZ, degree = 2, lags = 2) +...

The estimated coefficients, aj, are reported in the Appendix A estimation results as the name of the endogenous variable followed by "k_j", where k refers to the distributed lag term (usually equal to 1 unless an equation contains more than one distributed lag term) and j refers to the degree of the polynomial (j = 0 to n).

RMZT(t) = ...+ b0 RMZZ(t) + b1 RMZZ(t-1) + b2 RMZZ(t-2) + ...

where,
b0 = LDRZM1_0
b1 = LDRZM1_0 + LDRZM1_1 + LDRZM1_2
b2 = LDRZM1_0 + 2 * LDRZM1_1 + 2^2 * LDRZM1_2
b3 = LDRZM1_0 + 3 * LDRZM1_1 + 3^2 * LDRZM1_2

. . . etc . . .
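The expansion from polynomial coefficients to lag weights shown above can be sketched as follows. The function names are hypothetical and the numeric values in the usage lines are invented for illustration; they are not estimates from the model.

```python
def pdl_weights(poly_coefs, lags):
    """Weight on x(t-i) is the sum over j of a_j * i**j, for i = 0..lags."""
    # Python evaluates 0**0 as 1, so the weight at lag 0 is a_0.
    return [
        sum(a * i ** j for j, a in enumerate(poly_coefs))
        for i in range(lags + 1)
    ]


def distlag_term(series, t, weights):
    """Contribution of the distributed lag at time t: sum of w_i * x(t-i)."""
    if t < len(weights) - 1:
        raise ValueError("not enough history for the requested lags")
    return sum(w * series[t - i] for i, w in enumerate(weights))


# Degree-2 polynomial with invented coefficients a0=1, a1=2, a2=3 and two lags.
weights = pdl_weights([1.0, 2.0, 3.0], lags=2)
print(weights)
print(distlag_term([10.0, 20.0, 30.0], t=2, weights=weights))
```

With a degree-2 polynomial, only three coefficients are estimated regardless of the number of lags, which is the main reason for using this structure.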

Deseasonalized Variables

Several regression equations are estimated using seasonally adjusted data. If a variable name ends in an "A", such as ETTCPUSA, then the data for ETTCPUS have been deseasonalized using seasonal factors (in this case, ETTCPUSS) from the U.S. Census X-11 multiplicative seasonal adjustment routine. To obtain non-seasonally adjusted projections, the forecasts developed from seasonally adjusted equations are then reseasonalized using the Census X-11 seasonal adjustment factors.
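Because the adjustment is multiplicative, the relationship among the three series can be sketched in a few lines. This sketch assumes the seasonal factors are already available (it does not implement the X-11 routine itself); the function names are hypothetical and the numbers are invented.

```python
def deseasonalize(raw, factors):
    """Multiplicative adjustment: adjusted(t) = raw(t) / factor(t)."""
    if len(raw) != len(factors):
        raise ValueError("raw and factors must have the same length")
    return [r / f for r, f in zip(raw, factors)]


def reseasonalize(adjusted, factors):
    """Reverse step applied to forecasts: raw(t) = adjusted(t) * factor(t)."""
    if len(adjusted) != len(factors):
        raise ValueError("adjusted and factors must have the same length")
    return [a * f for a, f in zip(adjusted, factors)]


ettcpus = [90.0, 110.0]    # invented unadjusted values (role of ETTCPUS)
ettcpuss = [0.9, 1.1]      # invented seasonal factors (role of ETTCPUSS)
ettcpusa = deseasonalize(ettcpus, ettcpuss)  # role of ETTCPUSA
print(ettcpusa)
print(reseasonalize(ettcpusa, ettcpuss))
```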



File last modified: June 6, 1998

Contact:
Tancred Lidderdale
Email: tlidderd@eia.doe.gov
Phone: (202) 586-7321
Fax: (202) 586-9753

This page's URL: http://www.eia.doe.gov/emeu/steo/pub/document/partb.html
